AI and the Big Data paradigm – big ambitions in novel drug discovery

Over the past few decades, data generation has veritably exploded. However, the ‘Big Data paradigm’ is concerned less with the volume of that data than with how businesses, and indeed entire industries, can derive meaningful insights from what has become a glut of information.

With the currently popular approach to artificial intelligence (AI) also centred on the Big Data paradigm, pharmaphorum spoke with Adityo Prakash, CEO of Verseon, about the whys and wherefores. The conversation covered how today's mountain of data can be generated and processed, and the purposes for which it can be used constructively and efficiently.

Data generation and chess

“The fundamental underlying assumption is that an enormous amount of data is available to teach an AI programme how to handle the problem at hand,” Prakash began. More specifically, he explained, it assumes that “the number of known examples to train AI is at least many thousands of times larger than the number of variables or features to be tracked.”

“These training examples can be either real-world data or data synthetically generated by computer software,” he continued. “AI chatbots use real-world examples – the text of hundreds of millions of web pages […] Of course, for every real-world problem that has large, available data sets, there are dozens of others that have only limited training data.”

This mention of ‘real-world’ data is, of course, a pressing concern within the pharma industry at present. Ongoing conversations seek to ensure that the data being collected is useful precisely because it is real, i.e., that it reflects the variables of day-to-day patient life.

By contrast, Prakash noted, AlphaZero, the chess program developed by Google’s DeepMind, “relies on self-generated synthetic data. The system plays millions of games against another instance of itself, generating example data from each match.”

A slow journey to drug discovery

Asked about the impetus behind Verseon’s work in the sector, Prakash explained that he began the company “to change the way the world finds new drugs”. Not an unambitious goal.

“Today’s pharmaceutical drug discovery process essentially relies on trial and error,” he expanded. “To find potential drug candidates, disease-associated protein targets are tested against a small pool of less than ten million distinct chemotypes synthesised at great effort and expense over the past 100 years. This slow and erratic process fundamentally limits our ability to explore vast numbers of novel drug-like molecules in search of the new treatments we need,” he said.

It might seem like the perfect entry point for AI, but there are caveats to simply hopping on the bandwagon.

“The industry now hopes that feeding the same limited experiential data to AI may help tweak existing drug-like compounds slightly faster than before. Unfortunately, using AI in this manner does nothing to help find new drugs for all the diseases we can’t treat today.”

Beating the bottleneck, molecularly

How, then, can such a limiting problem be circumvented? For Prakash, the key is physics-based molecular modelling.

“At Verseon, we’ve developed significant advances in multiple distinct areas of science over the past two decades to overcome […] bottlenecks in current drug discovery and development,” he said. “On the computer, we design billions of novel drug-like molecules that have never been made before. Using physics-based molecular modelling, we determine whether new molecules will bind to a target protein without having to first make them in the lab.”

The process doesn’t end there, however.

“We then select the most promising candidates, synthesise them in our chemistry lab, advance them through extensive biochemical testing, and optimise them using AI. Through this process, we systematically generate multiple clinical candidates with unique therapeutic profiles,” Prakash explained. “As multiple candidates advance through the clinic, our adaptive clinical-trial AI helps segment patient populations and personalise therapies.”

From Big Data to small

As far back as 2020 (this is a fast-moving sector), Andrew Ng, computer scientist, technology entrepreneur, and co-founder and former head of Google Brain, was talking to Forbes about Big Data and small data problems, saying: “The number of teams that can build good AI systems from big data […] is in the hundreds, but a much smaller number can build good AI with small data.”

With Verseon having acquired Edammo to tackle the ‘small data problem’, pharmaphorum asked Prakash to explain in more detail what the company hoped to achieve with the move.

“Most AI systems require massive amounts of dense data to make accurate predictions,” he said. “But in fields like the life sciences – especially drug development and clinical trials – the amount of data is small and sparsely distributed, compared to the number of variables or features an AI model must track.”

This is where problems arise.

“In these settings, traditional Big-Data AI systems struggle to produce accurate results – if they can even create a predictive model from the thinly available data at all,” Prakash continued. “Verseon has continued to develop its own specialised AI tools internally to handle these situations [but] also kept an eye on external developments and found that Edammo’s Extreme AutoML technology performs particularly well in a variety of life sciences tasks.”

In short, what Edammo brings to Verseon’s goals is efficiency:

“As we collect data on our novel compounds through laboratory tests and clinical trials, Edammo’s technology will help generate insights from the data more efficiently. These insights will help us bring to market several treatment alternatives for each disease we address and offer options to patients who currently have none,” Prakash explained. “We expect that the analyses performed by Edammo’s AI will help us to personalise disease treatments to a degree not [yet] possible today.”

Bucking the compound library trend, and the future

Asked whether such technology had been used across the company’s product pipeline, Prakash answered emphatically in the affirmative.

“To date, all of the candidates in Verseon’s pipeline have been developed using our […] platform. Unlike other current companies that tout so-called ‘AI-first’ or ‘AI-only’ approaches,” Prakash claimed, “we do not depend on university labs or industry partners to find candidates for us. The drug candidates we find are vastly different from the compound libraries used by the rest of our industry.”

As for the future of AI drug discovery, it seems the medicinal horizon will become ever more intertwined with this technology, at least for Verseon.

“[Using the platform,] we expect the growth of our pipeline to accelerate over the next several years as we target an ever-increasing number of health conditions that impact the quality of our lives. We’re not only changing the efficiency with which drugs can be discovered, but also changing what people can expect from 21st-century medicine.”

About the interviewee

Adityo Prakash is CEO of Verseon. He enjoys building fundamental science-based solutions to major business problems that impact society. Prakash has led the development of Verseon’s drug discovery platform, novel drug pipeline, and overall business strategy. Previously, he was the CEO of Pulsent Corporation. Prakash received his BSc in Physics and Mathematics from Caltech.