The new genomic frontier: Next-generation data management takes flight

Genomics is changing the world. It will revolutionise healthcare for millions, with the power to transform research, drug development, and diagnosis, particularly in rare disease and oncology. Its potential is reflected in stratospheric growth estimates: a compound annual growth rate (CAGR) of 19% to 2032, with market value surging from US$32 billion in 2022 to US$178 billion by 2032.
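
As a quick consistency check, here is the compound annual growth rate implied by moving from US$32 billion in 2022 to US$178 billion in 2032, a span of ten years:

```latex
\mathrm{CAGR} = \left(\frac{178}{32}\right)^{1/10} - 1 \approx 1.187 - 1 = 0.187
```

This works out to roughly 18.7% a year, which rounds to the quoted 19%.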

Yes, genomics is changing the world, but more slowly than we’d like. To date, sequencing has been at the forefront of innovation, and it has let us gather the data, but not the insights it holds.

We need to shift our focus towards simplifying and standardising genomic data management and interpretation, because this is where we will find the new high-growth, high-impact genomics frontier, and where genomics will truly realise its potential.

Vast genomic libraries — but few readers

Although sequencing is a relatively new technology, advances in it have unlocked huge economies of scale. While it cost US$2.7 billion to map a single human genome in 2003, today a genome can be sequenced for around US$100, and the number of genomes sequenced probably runs into the tens of millions, and beyond.

As a result, we now have vast and constantly growing libraries of complex genomic data. The problem is that few people are able to analyse and interpret the information they contain. Broadly speaking, sequencing has been automated and industrialised, but our ability to gain insight from the data still relies heavily on highly specialised, highly skilled, and extremely rare human expertise.

To use another metaphor: there’s genomic gold in the mountains of sequencing data we’re amassing, but we’re still mining it with picks and shovels.

Genomic data — the challenges of dynamic interrogation and scale

One barrier to effective data management is that genomic data is different. Whereas other forms of medical data are historical snapshots that can be stored in flat file systems, genomic data must remain dynamically accessible and analysable, because it is constantly reinterpreted: a variant of uncertain significance today may be reclassified as pathogenic or benign as evidence accumulates.
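
To illustrate why a static snapshot falls short, here is a minimal Python sketch. It assumes a hypothetical in-memory store of each sample's variants and two versions of a variant-classification knowledge base. All identifiers and classifications are invented for illustration. When the knowledge base is updated, re-running the interpretation over the stored data surfaces the samples whose findings have changed, without re-sequencing anyone.

```python
# Hypothetical per-sample variant store (identifiers invented for illustration).
SAMPLE_VARIANTS = {
    "sample_A": {"var1", "var2"},
    "sample_B": {"var3"},
}

# Two hypothetical versions of a classification knowledge base.
KB_V1 = {"var1": "benign", "var2": "uncertain", "var3": "uncertain"}
KB_V2 = {"var1": "benign", "var2": "pathogenic", "var3": "uncertain"}


def interpret(variants, kb):
    """Look up each variant's current classification; unknown variants are 'unclassified'."""
    return {v: kb.get(v, "unclassified") for v in variants}


def changed_findings(samples, old_kb, new_kb):
    """Return, per sample, the variants whose classification differs between KB versions."""
    changes = {}
    for sample, variants in samples.items():
        before = interpret(variants, old_kb)
        after = interpret(variants, new_kb)
        diff = {v: (before[v], after[v]) for v in sorted(variants) if before[v] != after[v]}
        if diff:
            changes[sample] = diff
    return changes


print(changed_findings(SAMPLE_VARIANTS, KB_V1, KB_V2))
# {'sample_A': {'var2': ('uncertain', 'pathogenic')}}
```

An archive of past reports could not answer this question. The underlying variant calls have to stay queryable for reinterpretation to be possible.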

To compound the challenge, research suggests that anywhere between 2 and 40 exabytes of genomic data could be generated every year by 2025 (40 exabytes could store tens of billions of 720p feature films). Research published in PLOS Biology finds that genomics is perhaps the most demanding data domain in terms of acquisition, storage, distribution, and analysis.

As the National Human Genome Research Institute states, “Our ability to sequence DNA has far outpaced our ability to decipher the information it contains.”

Data science — the new genomic frontier

I have focused on unlocking the value of genomic data for over a decade. Working with Genomics England on its 100,000 Genomes Project in 2015, we quickly realised that the scale of genomic data demanded a fundamental rethink of storage, access, and sharing.

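Work conceived at Genomics England, developed by the University of Cambridge, and carried forward by Zetta Genomics has now led to a genomic-native tertiary analysis platform: one built for the interpretation stage that follows primary analysis (base calling) and secondary analysis (alignment and variant calling).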

Today, data technologies have caught up with sequencing, and can mine these data mountains and extract ‘genomic gold’ at speed and scale.

CRO and pharma applications

This new frontier also offers compelling advantages to pharma and clinical research organisations (CROs).

Drug development costs, for example, are spiralling: Deloitte reports average R&D costs of US$2.3 billion. Clinical trials are a significant part of this, with average costs per participant of over US$44,000 and trial sizes ranging from just four participants to more than 8,000.

The accepted wisdom is that larger sample sizes lower the standard error and so give greater statistical precision. Recruiting and retaining large participant populations, however, is both complex and costly, not least because many of those recruited may turn out to be unsuitable candidates.
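
For a sample mean with population standard deviation σ and sample size n, the standard error falls only with the square root of n. Halving it therefore takes four times as many participants, which is why buying precision through recruitment alone gets expensive:

```latex
\mathrm{SE}(\bar{x}) = \frac{\sigma}{\sqrt{n}}, \qquad
\frac{\sigma}{\sqrt{4n}} = \frac{1}{2}\cdot\frac{\sigma}{\sqrt{n}}
```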

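Next-generation genomic data management solutions can let researchers target recruitment at genomically pre-selected, trial-relevant individuals. If a treatment acts mainly on patients with a particular genomic profile, enrolling only those patients increases the average treatment effect observed in the trial, so trials can be smaller without losing statistical power. These technologies can enhance trial efficiency, efficacy, and safety, while cutting costs.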
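
The effect of pre-selection on trial size can be sketched with the standard normal-approximation formula for comparing two means: n per arm = 2(z₁₋α/₂ + z₁₋β)²σ²/δ², where δ is the treatment effect. The Python below is a simplified illustration, not a trial-design tool. It makes several hypothetical assumptions: only a fraction of unselected participants carry a responsive genomic profile, non-carriers show no effect, and outcome variance is the same in every group. All numbers are invented.

```python
import math
from statistics import NormalDist


def n_per_arm(effect: float, sd: float, alpha: float = 0.05, power: float = 0.8) -> int:
    """Participants per arm for a two-sided, two-sample comparison of means (normal approximation)."""
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = z.inv_cdf(power)  # about 0.84 for 80% power
    return math.ceil(2 * ((z_alpha + z_beta) * sd / effect) ** 2)


def diluted_effect(effect_in_carriers: float, carrier_fraction: float) -> float:
    """Average effect in an unselected cohort, assuming non-carriers do not respond at all."""
    return effect_in_carriers * carrier_fraction


# Hypothetical inputs, for illustration only.
SD = 1.0
EFFECT_IN_CARRIERS = 0.5
CARRIER_FRACTION = 0.25

unselected = n_per_arm(diluted_effect(EFFECT_IN_CARRIERS, CARRIER_FRACTION), SD)
enriched = n_per_arm(EFFECT_IN_CARRIERS, SD)
print(f"unselected: {unselected} per arm, genomically pre-selected: {enriched} per arm")
```

Because the required n scales with 1/δ², diluting the effect to a quarter of its size inflates the sample roughly sixteen-fold. With these assumed inputs, the pre-selected trial needs 63 participants per arm against 1,005 for the unselected one.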

Democratise genomic data

Genomic data management is a truly horizontal technology. It can bring value to the research, pharma, and clinical sectors, and to different disciplines within them. Consequently, technologies need to democratise both access and insight.

Solutions should be customisable, simply because their full range of capabilities can be overwhelming. They must also simplify, with intuitive dashboards that bring meaningful insight to data specialists and non-specialists alike.

Access must also bring genomic data to whoever needs it, wherever they need it. This demands multi-cloud platforms that free organisations to use whichever providers best meet their needs, and single sign-on (SSO) protocols that give authorised users instant access.

Genomics is also a numbers game, in which database size really does matter. Solutions must therefore bring these resources together and support data federation, so that data can be queried where it resides rather than copied into a single store. Platforms should allow both public and private databases to be incorporated and annotated, multiplying their effectiveness.
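
One common way to federate is for each site to compute aggregate counts locally and share only those. The Python below sketches this for a single variant. The site names, the `SiteSummary` schema, and the genotype data are hypothetical. Real federated platforms also need authentication, access control, and disclosure safeguards, all of which this sketch omits.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SiteSummary:
    """Aggregate counts a site shares for one variant (hypothetical schema)."""
    site: str
    allele_count: int   # alternate alleles observed at this site
    allele_number: int  # total alleles genotyped (two per diploid sample)


def local_summary(site: str, genotypes: list[int]) -> SiteSummary:
    """Runs inside each site; genotypes are alternate-allele dosages (0, 1 or 2) per sample."""
    if any(g not in (0, 1, 2) for g in genotypes):
        raise ValueError("genotype dosages must be 0, 1 or 2")
    return SiteSummary(site, sum(genotypes), 2 * len(genotypes))


def federated_frequency(summaries: list[SiteSummary]) -> float:
    """Pool per-site counts into one alternate-allele frequency; raw genotypes never leave the sites."""
    total_ac = sum(s.allele_count for s in summaries)
    total_an = sum(s.allele_number for s in summaries)
    return total_ac / total_an if total_an else 0.0


summaries = [
    local_summary("site_A", [0, 1, 0, 2]),
    local_summary("site_B", [0, 0, 1]),
]
print(round(federated_frequency(summaries), 4))  # 4 / 14 -> 0.2857
```

Pooling counts rather than genotypes lets a variant frequency be estimated across institutions while individual-level data stays where it was generated.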

Customer learning

Genomic data management is a new concept, and customers — particularly non-specialists — will be feeling their way through the technologies as they discover capabilities. The ability to listen, learn, and react to customer feedback will be critical.

Sales teams will need deep knowledge of customers’ genomic challenges to provide meaningful solutions. Engineering teams must work hand in glove with UI and UX experts to ensure comprehensive accessibility. Regular customer feedback will be the north star of product development.

This customer-centric understanding will help solution providers not only meet customers’ immediate objectives, but also explore untapped opportunities. By delivering short-, medium-, and long-term value, we can continually improve clinical impact.

Converging technologies

History offers a precedent that shows where we are in genomic development.

Twenty years after the Wright brothers first flew, aircraft were still largely made of canvas. Innovations were emerging by the late 1920s, but air travel was expensive and enjoyed by few. Yet, just a generation later, converging technologies ushered in the jet age that democratised air travel.

A century later and 20 years on from sequencing the first human genome, innovations are emerging, but genomic medicine is still relatively expensive and enjoyed by few. Yet, in just a generation, I believe converging technologies will usher in the genomic age and democratise precision medicine.

Genomic data solutions are a key convergent technology and the new growth frontier. As the National Human Genome Research Institute predicts, “genomic data science will be a vibrant field of research for many years to come.”

Ignacio Medina