Google's DeepMind puts human proteome online for free

The most complete database of protein structures ever assembled, developed with the help of Google's artificial intelligence unit DeepMind, has been made freely available to researchers around the world.

DeepMind partnered with the European Molecular Biology Laboratory (EMBL) to create the AlphaFold database, which holds predicted three-dimensional structures for the human proteome – nearly all (98.5%) of the 20,000 or so proteins expressed by the human genome.

AlphaFold has doubled the number of protein structures known to researchers, which could help accelerate research into how diseases affect the body and support the development of new medicines that latch on to problem proteins effectively.

The database is one of the most significant contributions that AI has made to the advancement of science, according to its developers, although they acknowledge that some of the predictions will still need to be validated experimentally. Around a third of the structures are considered detailed and precise enough to allow drug design.

The team has also used the AI to model hundreds of thousands of proteins from other species, including 20 model organisms used in research, such as the mouse, nematode worm and fruit fly, and human pathogens such as the malaria parasite.

DeepMind's founder and chief executive Demis Hassabis said that AlphaFold solves one of the biggest problems in biology, namely what shape a protein takes in the body, and whether that shape can be predicted simply from the amino acid sequence encoded by its gene.

He said the database has applications in understanding the fundamental mechanisms of life, in drug design, and in other areas such as creating designer proteins with different functions. The source code behind the latest version of AlphaFold was released a few days before the paper describing the proteome database was published.

"Our goal at DeepMind has always been to build AI and then use it as a tool to help accelerate the pace of scientific discovery itself, thereby advancing our understanding of the world around us," he said.

"In the coming months we plan to vastly expand the coverage to almost every sequenced protein known to science – over 100 million structures," he added.

The database was made freely available to ensure that AlphaFold has the greatest possible scientific and societal impact, according to Hassabis.

AlphaFold is already being used by early partners researching neglected diseases, studying antibiotic resistance, recycling single-use plastics, and understanding the biology of SARS-CoV-2, the virus that causes COVID-19.

EMBL director general Edith Heard said that AlphaFold "was trained using data from public resources built by the scientific community so it makes sense for its predictions to be public."

She added: "I believe that AlphaFold is truly a revolution for the life sciences, just as genomics was several decades ago."