Basecamp lifts veil on Trillion Gene Atlas genomics push
Basecamp Research has launched a new scientific initiative, called the Trillion Gene Atlas, to bring together genomic data from millions of species and use it to guide and scale AI-powered design of new therapeutics.
The project has started in collaboration with AI company Anthropic, '$100 genome' company Ultima Genomics, and gene sequencing specialist PacBio. According to the partners, it will "expand known evolutionary genetic diversity 100-fold by collecting genomic data from more than 100 million species across thousands of sites worldwide."
In a statement, the companies said the project is "on the scale of the Human Genome Project" and its ultimate aim is to "provide the vast, diverse training data required for AI systems to learn from evolution to design new medicines on demand."
NVIDIA is providing the AI infrastructure to power the initiative, which has set a two-year timeframe for the biological data gathering and analysis that will create the atlas. Current gene sequence-based foundation models rely on variants of public repositories, with 80% of them trained on a public database containing fewer than 250 million sequences.
"Today's biological AI models are trained on a narrow slice of life on Earth," said Glen Gowers, co-founder and chief executive of Basecamp Research, which is based in London, UK, and Massachusetts. Gowers presented details of the project at an event in Austin.
"The Trillion Gene Atlas expands the known genetic universe by orders of magnitude beyond what is in public databases," he added. "Training models at this scale establishes a new paradigm for programmable therapeutic design."
The atlas will draw on Basecamp Research's EDEN foundation models, which launched earlier this year with training on more than 10 billion novel genes from over 1 million new species and underpin the company's aiPGI™ platform, designed for large-scale, programmable, and precise gene insertion into the human genome.
The company has suggested that the EDEN models offer a new way to carry out tasks like replacing faulty genes and reprogramming cells for therapeutic applications in areas like cancer and genetic disorders, and reckons the Trillion Gene Atlas will build on this approach by expanding the breadth and contextual depth of genomic data suitable for AI training.
As part of the Atlas launch, Basecamp Research is announcing new partnerships in Chile and Argentina, along with an expanded collaboration in Antarctica, extending its network of scientific collaborators across 31 countries.
