OpenBind unveils its first AI model for drug discovery

A research consortium that aims to make the UK a leader in AI-driven drug discovery, OpenBind, has hit its first major milestone with the publication of an experimental dataset and predictive AI model.

First announced last year, OpenBind has been set up to generate what aims to become the world's largest collection of data on how drugs interact with proteins, 20 times larger than any other initiative in recent decades. The database will be used to support the training of AI models that can identify promising new drugs.

The first fruits of the programme are detailed X-ray images of 699 compounds binding to a protein in EV-A71 enterovirus, an organism associated with mild cases of a common childhood infection known as hand, foot, and mouth disease (HFMD).

The OpenBind team have also generated binding strength measurements for 601 of the compounds, saying that it is already one of the largest public datasets for a single protein target, as well as an accompanying EV-A71 2A protease target-specific AI model that is being made available to researchers as a basis for developing and testing new computational approaches.

"This first release is an important step because it shows we can now generate high-quality, standardised data at scale, specifically designed for AI in drug discovery," commented Prof Charlotte Deane, professor of structural bioinformatics at the University of Oxford and a senior OpenBind investigator.

"As the dataset grows, it will give researchers the kind of consistent, reliable information needed to improve how these models perform," she added.

A new general predictive model, OpenBind v1, is scheduled for release around the end of the month.

Co-founded by the University of Oxford and Diamond Light Source – the UK's national synchrotron facility at the Harwell Science Campus in Oxfordshire – the consortium also includes scientists from Columbia University, Memorial Sloan Kettering Cancer Center, the Open Molecular Software Foundation, and the University of Washington, alongside industry partners such as London startup Isomorphic Labs.

Even the most advanced AI systems used in structural biology and drug discovery, such as Google DeepMind's AlphaFold and Recursion's Boltz, are limited by the data they are trained on, according to OpenBind. While they can model biological structures similar to those in their training data, predicting new targets that look significantly different remains a challenge.

OpenBind was set up with £8 million of investment from the UK Department for Science, Innovation and Technology's recently established Sovereign AI fund.

"High-quality experimental data is essential for developing new and improved AI models," commented Dr Fergus Imrie, associate professor in the statistics department at Oxford University and an OpenBind computational researcher.

"As AI performance improves, this in turn helps guide future experiments, helping to accelerate discovery," he added. "The lessons from these early cycles are already helping us improve the speed, consistency and reproducibility of the pipeline, which will be critical as OpenBind grows."