Talking techbio with NVIDIA: Accelerated computing, NLP, and GenAI in drug discovery


As Genentech and NVIDIA enter into a multi-year strategic AI research collaboration to accelerate drug discovery and development, web editor Nicole Raleigh revisits a recent discussion with NVIDIA’s vice president and general manager of healthcare, Kimberly Powell, while she was in London – a conversation in which they explored accelerated computing, natural language processing (NLP) and generative AI, and protein structures and genomics.

With the newly announced collaboration, the companies will optimise Genentech’s proprietary machine learning (ML) algorithms and models on NVIDIA DGX Cloud, a training-as-a-service platform built on NVIDIA AI supercomputing and software, including NVIDIA BioNeMo for generative AI (GenAI) applications in drug discovery. The collaboration will also help accelerate Genentech’s “lab in a loop”, where extensive experimental data feeds computational models that uncover patterns and make new, experimentally testable predictions.

A blended journey of technology and healthcare

Powell’s own journey took her from studying electrical and computer engineering, to an internship, to a job at a diagnostic imaging medical device company.

“I got exposed to both technology and healthcare,” Powell explained. “In that work, I actually contacted NVIDIA to help build part of the solution. As a tiny little 60-person company, NVIDIA was just very receptive. We formed a great relationship and then NVIDIA recruited me over in about 2008, so 15 years ago, to start the healthcare practice.”

Needless to say, the company has undergone some transformation since then.

“It was at a very interesting time for NVIDIA when we were transitioning from what we're known for, which is a computer graphics company and the inventor of the GPU, into an accelerated computing company,” said Powell. “We were realising that compute needs were going to continue to increase at exponential rates, and your typical architectures were not going to be able to keep up with that, so we needed to invent a new computing model, and that was accelerated computing.”

Indeed, some of the earliest use cases of accelerated computing came out of the medical field, imaging in particular.

“If you think about all of the diagnostic imaging devices out there - CT, MR, ultrasound - these are really sensors,” Powell explained. “They were improving the sensor technology so much that they were generating a lot of data, and they needed to do a lot of computational mathematics on the back end of that. To do it on your typical computers at the time, you'd have to build big data centres for each CT scanner [and] accelerated computing was really the ability that allowed for new algorithms like iterative reconstruction to make its way to CT.”

Imaging was a key application area for accelerated computing at NVIDIA, and it remains so today because imaging is, in Powell’s words, “the lifeblood of the patient journey.”

“It's everything from screening, all the way through to diagnosis and treatment and post-treatment,” she said. “We continue to innovate in that area. As things evolved, more applications in healthcare were discovering that accelerated computing could help them do new and amazing things.”

Indeed, it wasn’t just a case of CT scans and imaging.

“Another area came out of the pharmaceutical industry [and] trying to simulate molecular behaviour and molecular modelling,” said Powell. “Molecular dynamics simulation was another really hard application that required a lot of computing. They could do it at longer timescales to […] get insights into the chemical process.”

The AI revolution and genomics

Genomics was another extremely important application area, and there is excitement around NVIDIA’s work in genomics in partnership with Oxford Nanopore.

“In this era of accelerated computing, a revolution was happening in that artificial intelligence had discovered computer vision, and AI and deep learning had discovered that accelerated computing was the perfect architecture to build new neural networks and improve their capabilities. At that time, you could, again, see all of the amazing critical applications that AI could have in healthcare,” she said.

So it was that NVIDIA decided to start building specific computing platforms that leverage both accelerated computing and AI for the healthcare industry.

“Our platform is called NVIDIA Clara. It's after Clara Barton. She's the inventor of the American Red Cross,” Powell explained. “It's meant to be a platform to really lift the whole industry and provide [it with] these capabilities. Since then, we've been in earnest building amazing solutions for medical imaging AI. We have a platform for that, for doing accelerated and AI genomics analysis. We have all of the high-performance computing for scientific computing around quantum chemistry, molecular dynamics, and even quantum computing.”

DeepMind, AlphaFold, and structural biology

And then came AlphaFold, the state-of-the-art AI system developed by DeepMind in partnership with EMBL’s European Bioinformatics Institute (EMBL-EBI) for computational prediction of protein structures. More than 200 million AlphaFold protein structure predictions have since been released and are freely and openly available to the global scientific community.

“[It’s] leveraging all this generative AI capability that's been being built up over the years,” commented Powell. “[Take] ChatGPT, which I'm sure you know, GPT stands for generative pre-trained transformer. This is an AI model architecture […] What we've done is we've essentially vetted a bunch of natural language and, these AI applications, they can essentially encode all of that knowledge. For somebody who's not super technically familiar with it, you're injecting a bunch of information into a brain, and that brain continues to get larger and larger and larger. It has now the ability, as you've experienced with ChatGPT, to be able to reason over that data and to generate information from that data, to synthesise that data.”

“AlphaFold was a moment in time in structural biology where they used the same technique,” she continued. “They used an amino acid sequence of proteins. A very critical stage in drug discovery is, ‘What is the structure of my protein?’, because what you're really trying to understand is the structure, so you can put something in it and stop its behaviour. That is, it's a lock and a key kind of a thing. The protein is the biology, and when it misfolds, folds in a certain way, you're trying to stop that misfolding from happening by introducing another molecule in the middle of it.”

Structural biology, of course, has been a core part of all drug discovery for many decades.

“What AlphaFold was able to do is train a model on all these amino acid sequences and infer the structure,” said Powell. “The historical way to do that is actually through a pretty arduous process using an imaging instrument called cryoelectron microscopy. You have to cryogenically freeze the protein. You have to take all these incredible images. You then have to reconstruct all the images. Sometimes, you couldn't even get the imaging to give you the structure, and it could take up to a year.”

“To be able to now go from sequence to structure just through an AI query was a tremendous breakthrough,” she enthused. “It really illuminated to the world that not only are we able to use this in natural language, but this is going to be incredibly useful for all of biology and chemistry in fact. Our DNA is a sequence of characters of As, Ts, Cs, and Gs. It's just three billion letters long. Proteins are amino acid sequences, and those are 20 different characters that are in a sequence of 20,000 to hundreds of thousands long. Even chemistry is represented as a language. They call it SMILES, and they're just these different characters that can represent the structure of a chemical. This was very fortuitous that the world has made the invention of being able to use natural language and generate these models.”
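
Powell's point that DNA and proteins are sequences over small alphabets can be illustrated with a minimal Python sketch. It turns each letter into an integer token, which is roughly the first step before a language-style model can read a sequence. The names `DNA_ALPHABET`, `PROTEIN_ALPHABET`, and `encode` are assumptions made for this illustration and do not come from NVIDIA or DeepMind software. SMILES strings would need a slightly richer tokeniser, since some atoms (such as Cl) are written with two characters.

```python
# Illustrative alphabets; all names here are assumptions for this sketch.
DNA_ALPHABET = "ACGT"                      # the four DNA bases
PROTEIN_ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # one-letter codes of the 20 standard amino acids


def encode(sequence: str, alphabet: str) -> list[int]:
    """Map each character of a sequence to its index in the given alphabet."""
    index = {ch: i for i, ch in enumerate(alphabet)}
    try:
        return [index[ch] for ch in sequence.upper()]
    except KeyError as err:
        raise ValueError(f"character {err.args[0]!r} is not in the alphabet") from None


dna_tokens = encode("GATTACA", DNA_ALPHABET)      # [2, 0, 3, 3, 0, 1, 0]
protein_tokens = encode("MKT", PROTEIN_ALPHABET)  # [10, 8, 16]
```

Once a sequence is a list of integers like this, it can in principle be fed to the same kind of model architecture used for words in a sentence.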

Evolving drug discovery with a techbio ecosystem

To put it mildly, it’s pretty mind-blowing stuff.

“The number of potential chemicals in the world is 10 to the 16th: it's more particles than in the universe,” exclaimed Powell. “That's the problem in drug discovery, right? You have this protein, and you have to find that one in that gigantic chemical space that is going to have the right effect on that protein. For proteins, they're also being used as therapeutics, mRNA, biologics... There are 10 to 160 potential proteins that could be used as a therapeutic. It's infinitely large, this problem space. What is magical about these models is they're going to be able to, and they are already, inventing proteins that nature has never seen before. That 10 to the 160 doesn't mean that we've evolved and we've seen every single one of those. That is not the case, because evolution is quite efficient. If something works, great, move on.”
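
To give a feel for the scale of the protein figure Powell quotes, the short Python sketch below counts the possible sequences over the 20 standard amino acids. The article does not say what chain length the 10 to the 160 figure assumes, so the length computed here is only an illustration, and the function name `log10_sequence_count` is an assumption of this sketch.

```python
import math

AMINO_ACIDS = 20  # size of the standard amino-acid alphabet


def log10_sequence_count(length: int, alphabet_size: int = AMINO_ACIDS) -> float:
    """Return log10 of the number of distinct sequences of the given length."""
    return length * math.log10(alphabet_size)


# Shortest chain length whose number of possible sequences reaches 10^160.
target_exponent = 160
min_length = math.ceil(target_exponent / math.log10(AMINO_ACIDS))

print(min_length)                                   # 123
print(round(log10_sequence_count(min_length), 2))   # 160.03
```

In other words, a chain of only 123 amino acids already has roughly 10 to the 160 possible sequences, which is why exhaustive search is out of the question.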

However, in the case of drug discovery, those other potentials are very much sought.

“We’re going to want to invent something that the world's never seen before [and] these artificial intelligence models can generate those ideas,” stated Powell. “This is what's just tremendously powerful and part of the NVIDIA Clara platform is we're giving the world the capability to not only train these models, but also run them at scale. We work with the academic community, all of the latest and greatest research, and we productise it. We invent our own AI models, and they're being implemented inside of the drug discovery ecosystem.”

“There's a brand new ecosystem that has evolved in the last couple of years that is called the techbio ecosystem, not the biotech,” she continued. “It's the opposite on purpose in that the realisation that, on the relatively cheap now, we can generate massive amounts of biological data. We have a data solution. We have the AI methods, GPT, and we have NVIDIA's large-scale computing platform. We have this perfect storm, if you will, of the three ingredients to really push the capabilities of AI in a given field.”

“The first field was in computer vision because we had all the data from our phones and taking pictures and cats and all that jazz. We were building to this point, and then we had all the language. We had the internet. We had every book that's been written. We had all the doctor's notes we've been putting into digital format for the last couple of decades. Natural language was the next one, and here we are at the forefront of biology because we have all the ability to generate the data now. We have the methods, and we have the compute platform,” Powell explained.

The future of the field

AlphaFold, nevertheless, was but a piece of the puzzle.

“Now, there are new AI models that are coming out of the literature as we speak: it's an absolute exponential curve in the publications of using generative AI in biology,” said Powell. “The techbios, they are building their own AI methods, and they are validating what these AIs are discovering in the lab in some very formidable ways. We've had some news a while back with a company called Evozyne. They're a protein engineering company. They took AlphaFold maybe even a step further, which is, ‘I know the sequence, what is its function?’ Protein is really sequence, structure, then determine its function. They went from sequence to function, and they have a unique capability to make the protein and measure that function in the lab. They discovered a protein never seen before. It had the properties of the protein family they were trying to create, the functional properties, and they measured it in the lab. This is truly protein engineering.”

“The other method is called directed evolution, where you're taking a sequence and you're changing out one of the amino acids, and then you have to go make it and measure it, and then you do that again and again, amino acid by amino acid,” she continued. “It won the Nobel Prize, but it's also a very arduous process [but] you can now enhance the notion of directed evolution through AI. We're really seeing this have early indicators of future success. Some of these techbio companies… Exscientia is one here in the UK to be so proud of. They have six drugs in clinical phase now that have been discovered on their platform. We have a lot of proof. Insilico Medicine is another company that very well documented the process of using generative methods completely to not only discover a novel target, they also discovered a novel small molecule for lung fibrosis that is now in phase two clinical trials. They were able to do that discovery in 30 months for $2 million versus five years and $200 million. It's almost unbelievable. This is really encouraging. This is a huge area of investment.”
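
The directed-evolution loop Powell describes (change one amino acid, make it, measure it, repeat) can be sketched as a toy Python program. Everything here is a simplifying assumption: `TARGET` is a made-up sequence, `measure` is a stand-in for a real lab assay, and the greedy one-mutation-per-round walk is far simpler than laboratory practice, which typically screens many variants per round.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # one-letter codes of the 20 standard amino acids
TARGET = "MKTAYIAKQR"  # hypothetical "ideal" sequence, used only by the toy assay


def measure(sequence: str) -> int:
    """Toy stand-in for a lab assay: count positions that match TARGET."""
    return sum(a == b for a, b in zip(sequence, TARGET))


def directed_evolution(start: str, rounds: int, seed: int = 0) -> str:
    """Mutate one residue per round; keep the variant if it measures no worse."""
    rng = random.Random(seed)
    best, best_score = start, measure(start)
    for _ in range(rounds):
        pos = rng.randrange(len(best))
        variant = best[:pos] + rng.choice(AMINO_ACIDS) + best[pos + 1:]
        score = measure(variant)
        if score >= best_score:
            best, best_score = variant, score
    return best


start = "A" * len(TARGET)
evolved = directed_evolution(start, rounds=500)
print(start, measure(start))
print(evolved, measure(evolved))
```

In this toy, every call to `measure` represents an experiment. One way AI could enhance the process, as Powell suggests, is for a trained model to rank candidate variants first so that fewer of them need to be made and measured in the lab.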

Certainly, NVIDIA has been deeply situated in the field.

“We did early work with AstraZeneca in 2020, when AlphaFold also just came out and GPT-3 just came out,” Powell explained. “We worked on building a generative model for chemistry. We really pioneered it back then and now we're enhancing it with being able to work with DNA and RNA and all the protein sequences. If pharma companies have decades' worth of their own proprietary data, we're giving them the capability to train up their own models that are unique and very high value to them, as well as all the academic community.”

“We're turning all of these models into cloud APIs, so they're very simple to call,” she continued. “Just literally through a website, you can enter an amino acid sequence, press generate, and back comes a structure. To me, what's going to be so transformational is, essentially, a lot of biologists [becoming] a computer scientist and even a lot of computer scientists [becoming] a biologist because we made it so accessible and so easy to build these models and to deploy these models.”

Towards the holy grail of precision medicine

The lab, however, will always be in the loop, including in the new collaboration with Genentech.

“This is becoming the trend now. It makes a lot of sense to have a dry lab and wet lab right next to each other,” said Powell. “The dry lab can think outside the box. It can generate new ideas. It can search what is a nearly infinite space, but do it with intelligence and do it much more effectively. Then, put it in the lab, observe it, take all that data, and put it back into your model.”

These are the things NVIDIA is working on with industry and enabling across academia, start-ups, and large pharma. And the excitement for the future is palpable.

“If you can reduce the time it takes to discover a drug to days or hours, and it also costs maybe dollars, that's when we're truly going to be able to present therapies that are tailored to you or to me individually: that's the holy grail,” said Powell. “You can see it coming into the sights now a bit because we're able to improve so much on the processes. I think that's a dream, but I think you're starting to see it come into view a bit with these methodologies and, being able to use, almost creating digital twin of me, just like the internet.”