What AI really means for biology and the pharmaceutical industry
A blanket statement like ‘We need to use AI’ today is just as futile as saying ‘We need to use electricity’ during the second industrial revolution.
It’s not that I don’t believe that AI will change the face of biology. I am confident that it will. But this rallying cry falls short for two key reasons. First, it neglects to mention the changes needed to help realise such a transformation. Back in the late 19th century, electricity alone wasn’t enough to boost productivity. It was the new technology, plus people changing the way they worked - the shift from arranging factories around huge drive-shafts powered by steam engines to arranging them into production lines - that made the difference.
The second reason? AI covers a huge amount of ground. You can apply it in countless ways, across every part of the value chain. Without detail, ‘We need to use AI’ doesn’t tell you how, when, or why to use it. Saying ‘harnessing active learning can help us optimise assay development for early discovery’ or ‘applying large language models will improve user interfaces for our complex methodologies and equipment’ is instantly more meaningful.
I predict that changes to our working methods will be the most significant way AI alters biology and the pharmaceutical industry. Ultimately, this stems from my belief that biological research and AI haven’t really found their synergies yet - and they won’t until we change how we approach science from an operational standpoint. Not only do teams and organisations need to embrace a new way of thinking, but they also need a new and improved toolkit, and new scientific processes to support it.
Which means that, when we’re talking about AI, we’re really talking about data. The question now becomes this: what properties would the data you’re using to feed an AI system need to unravel biology’s complexity?
Data volume is vital, but so are data context and quality
Biology is an emergent phenomenon: its complexity lies in the distinct behaviours and patterns that arise from interactions between simpler components. Emergent features can’t be predicted with confidence from the properties of individual components alone. Which means that really getting to grips with a biological system involves understanding its dynamics as any number of factors change.
Yet much of the big data produced in biology studies today is multi-omic: incredibly detailed molecular snapshots of a system. The trouble is that, apart from genomic data, all of these readouts change over time in response to myriad stimuli - and a snapshot can’t capture that.
To gain a better understanding of biology, we need an interconnected, comprehensive dataset. But it’s not enough to measure lots of things; we also have to measure them in the context of this ever-changing, multifactorial landscape. We have to give AI a good view by methodically running experiments that explore and map out that space.
Sequencing alone won’t get us there. To fully capture biology’s complexity, we must shift away from one-dimensional data and towards a more holistic view of how a biological system works with, and responds to, different stimuli.
Context is just as important. It’s too easy to lose vital information about how experimental data was produced. And sufficiently recording that context - our methods, why we chose them, conditions in the lab, the liquid classes used in automated pipettors - is no easy feat with today’s tools and processes.
This need for better data recording is widely felt: a worrying 43% of R&D leaders lack confidence in the quality of their experimental data, according to research we recently conducted. Crucially, this also underscores the need to generate higher-quality data from the outset. To make the necessary improvements, we’ll need to understand how that data was created. Which, by extension, means we’ll have to change our working methods, because experimental metadata will need to sit at the centre of all future AI strategies.
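As a rough illustration of what putting metadata at the centre could mean in practice, here is a minimal Python sketch of a structured experiment record. All field names are illustrative assumptions, not any real standard or product; the point is simply that the context listed above - methods, rationale, lab conditions, liquid classes - travels with the data rather than living in a notebook.

```python
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone

@dataclass
class ExperimentMetadata:
    """Hypothetical record of how an experiment was run, not just its results."""
    method: str            # the protocol used
    rationale: str         # why this method was chosen
    lab_conditions: dict   # e.g. temperature, humidity at run time
    liquid_classes: list   # settings used by automated pipettors
    recorded_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

# Example: capture the context alongside an assay run
record = ExperimentMetadata(
    method="sandwich ELISA",
    rationale="highest sensitivity for a low-abundance target",
    lab_conditions={"temperature_c": 21.5, "humidity_pct": 45},
    liquid_classes=["aqueous_low_volume"],
)

# A plain dict that can be stored next to the experimental data itself
print(asdict(record))
```

A record like this, generated automatically by lab software rather than typed up after the fact, is the kind of metadata an AI system could actually learn from.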
For AI to truly transform our industry, it will have to cover the end-to-end process: help biologists design the best possible experiment, run it, analyse the full span of experimental data and metadata, then use those results to decide on the next experiment.
To realise AI’s potential, we must lead the charge
Beyond the sheer amount of data, it’s the depth of context that really counts. Companies like Recursion and Insitro are already ahead of the curve in how they approach biological data - especially when it comes to AI. They have comprehensive, automated platforms built for a fully digitalised, methodical exploration of biological systems. By consistently generating high-quality, multidimensional data along with comprehensive metadata, they offer a glimpse of the future.
The approaches of these pioneers hold promise for the pharmaceutical industry and for biology as a whole. It’s data like this that will form the basis for AI - and for how we use it to transform the way we interact with and understand biology.