UK Biobank releases largest-ever genome sequencing dataset
After five years of painstaking lab work, more than 350,000 hours of genome sequencing and £200 million ($253 million) in investment, UK Biobank has completed and released the full sequencing data from its 500,000 volunteers, which should become a rich resource for the discovery of new diagnostics and therapeutics.
The genomic data drop is said to be the largest that has ever been made available for research purposes and is being offered to scientists worldwide – provided they plan to use it for appropriate studies.
Crucially, the genome data is accompanied by 15 years of information from the patient cohort on lifestyle, whole-body imaging scans, biological samples and other health measures that can be harnessed for research studies. According to some, that combination makes it one of the most important scientific assets held by the UK. The entire dataset has been anonymised to protect patient privacy, according to UK Biobank.
Around 30,000 researchers are already registered to access the data, which started to be released in 2012. It has already been used to find genes associated with obesity and type 2 diabetes, to identify individuals at high risk of heart disease and some forms of cancer, and to uncover a link between physical activity and Parkinson’s disease that could allow the illness to be predicted years in advance using smartwatch data.
Last month, a public-private partnership revealed how it had used the resource to generate a map of the interactions between genes and proteins, around 80% of which had never been described before.
“This is a veritable treasure trove for approved scientists undertaking health research, and I expect it to have transformative results for diagnoses, treatments, and cures around the globe,” said Professor Sir Rory Collins, principal investigator at UK Biobank.
“The sheer amount of genetic data is exceptional – it is twice as much as anywhere else – but UK Biobank’s data is so illuminating because we’ve been able to follow the health of our brilliant volunteers for around 15 years,” he added.
The organisation says the dataset can allow for more targeted drug discovery and development – pointing out that therapeutics developed based on evidence from human genetics are twice as likely to be approved for clinical use – as well as helping to understand the biological mechanisms behind diseases; for example, through the discovery of variants in non-coding DNA that may play a role.
Commenting on the announcement, Cheryl Moore, chief research programmes officer at the Wellcome Trust, which is one of the founders of UK Biobank, said it will be invaluable for “early-career researchers and those in low- and middle-income countries, in turn offering huge potential to unlock new discoveries and enhance our understanding of health to improve lives around the world.”
Pharma companies Amgen, AstraZeneca, GSK, and Johnson & Johnson provided funding for the project, along with Wellcome and UK Research and Innovation (UKRI). In return for the investment, UK Biobank granted nine months’ exclusive data access to industry members of the consortium, but from today other researchers can access the data using a cloud-based analysis platform.
The four drugmakers have pledged to publicly share the summary statistical analyses from the collaboration, including genome-wide association results, with the research community, sparing other researchers the costly and time-consuming burden of analysing the raw data.
One acknowledged limitation of UK Biobank is the predominance of people with European heritage, who tend to be healthier and more affluent than the overall UK population; relatively few participants are of African or Asian descent.
A US take on the biobank, meanwhile, is the All of Us project, launched by the National Institutes of Health (NIH) in 2018. It is more ethnically diverse and will eventually have data on a million people, although sequencing has so far been completed for only around a quarter of that total. It also lacks the close integration of other health data that is core to UK Biobank.