Basecamp Research unveils improved protein structure model

Visual comparison of structure predictions from AlphaFold2 (orange) and BaseFold (cyan)

Basecamp Research says it has augmented the AlphaFold2 artificial intelligence tool for predicting protein structures to create a platform that is more accurate and better at modelling small-molecule interactions.

The new model, called BaseFold, enables more reliable 3D structure predictions for larger and more complex proteins and, according to the company, is “poised to greatly accelerate AI-based drug discovery efforts.”

Google DeepMind and EMBL’s freely available AlphaFold2 has had a big impact on research into the role of proteins in disease since its launch in 2021 and is already being used to develop new medicines that can bind to problem proteins more effectively.

Basecamp says that BaseFold builds on that model, which predicts the 3D structure of a protein based on its amino acid sequence, by layering in a foundational dataset (BaseGraph) that considers how proteins behave in the ‘real world’, i.e., in the context of more than six billion chemical, biological, and metagenomic relationships.

The result is a deep learning model that improves the accuracy of AlphaFold2’s predicted structures by up to six times, and delivers a three-fold improvement in modelling accuracy for small-molecule interactions with protein targets, according to Basecamp, which has just published a paper on BaseFold on a preprint server.

“We have redesigned and rebuilt the entire data acquisition process, making us the first team ever to collect and annotate biodiversity data with the same quality as human clinical genetic data – all purpose-built for the AI era,” said Dr Phil Lorenz, Basecamp’s chief technology officer.

There are other augmented versions of AlphaFold2 available, including ColabFold, ESMFold, OpenFold, and RoseTTAFold. Basecamp says, however, that their performance depends on “public protein databases that are widely seen as unfit for biotech’s AI era,” because those databases draw on proteins from lab organisms, which represent less than 0.000001% of life on Earth.

For example, AlphaFold2 draws on the public MGnify database, which Basecamp claims has issues with incomplete sequences that make it hard to predict the structure of larger proteins.

“AlphaFold is one of the most useful AI tools in drug discovery, and for good reason. It enables researchers to better predict how medicines may interact with proteins in the body, shaving off years of work,” said Basecamp co-founder Dr Glen Gowers.

“However, AlphaFold still has significant room for improvement – particularly when being used to predict large, complex, and underrepresented proteins, which are often the most critical for the development of new therapeutics,” he added. “Even just a few percentage points of error can have major implications in accurately predicting protein-molecule interactions.”

Alongside the publication and launch, Basecamp also announced a collaboration with NVIDIA to include BaseFold within NVIDIA BioNeMo, the tech company’s AI platform for drug discovery.