Open science: disrupting scientific community collaboration
According to the Association of the British Pharmaceutical Industry (ABPI), the number of clinical trials started in the UK each year fell by 41% between 2017 and 2021 – a worrying signal for the discovery of new therapeutic areas and pathways. With the NHS struggling to reduce waiting lists, and signs that big pharmaceutical companies may be scaling back their activity in the UK, the quiet rise of collaboration among pharma companies has never been more important as a driving force behind future discovery and innovation.
One of the biggest challenges facing data scientists and bioinformaticians is processing large datasets across multiple infrastructures, which often means downloading reams of research and pipeline data that is frequently siloed. In a highly regulated industry, processing and analysing this data requires sophisticated data management and analysis tools, as well as powerful computing resources, and doing so is not always time or cost effective. The problem is compounded by the fact that research results are often not immediately reproducible or usable for future research: more than 70% of computational research is non-reproducible, according to analysis by the National Academy of Sciences.
Solving these data analysis challenges requires scalable, flexible tools that streamline such processes and ensure each workflow is viable and reproducible, enabling faster and more efficient research. In a paradigm shift for this type of analysis, such tools and platforms are rapidly changing how Big Pharma companies process big datasets and large-scale pipelines in the search for new vaccines and diagnostics.
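To make ‘reproducible’ concrete, the Python sketch below shows one way a workflow step can record everything needed to repeat it – input checksums, parameters, and the software environment – alongside its results. It is a minimal illustration only: the run_step function, file names, and parameters are hypothetical rather than the API of any particular platform, and dedicated workflow managers go much further, for example by pinning containerised tool versions for every step.

```python
import hashlib
import json
import platform
import sys
from pathlib import Path


def sha256(path: Path) -> str:
    """Checksum an input file so a later run can verify it saw the same data."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def run_step(input_file: Path, params: dict, out_dir: Path) -> None:
    """Run one (hypothetical) analysis step and write a manifest capturing
    what is needed to reproduce it: input checksum, parameters, environment."""
    out_dir.mkdir(parents=True, exist_ok=True)

    # ... the actual analysis (alignment, variant calling, etc.) would go here ...

    manifest = {
        "input": str(input_file),
        "input_sha256": sha256(input_file),
        "params": params,
        "python": sys.version,
        "platform": platform.platform(),
    }
    (out_dir / "manifest.json").write_text(json.dumps(manifest, indent=2))


if __name__ == "__main__":
    demo = Path("sample.fastq")                 # tiny stand-in input for the demo
    demo.write_text("@read1\nACGT\n+\n!!!!\n")
    run_step(demo, {"min_quality": 20}, Path("results"))
```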
Industry data-sharing trends
There has been a steady move towards an ‘open science’ approach in bioinformatics, and in genomics in particular. Collaboration in this sector is nothing new, but as demand for personalised medicine grows, the ability to build scalable, repeatable tools that point to potential cures based on an individual’s genomic data is driving the current market. This type of research and treatment is hugely expensive, so accessing data shared by other scientists working in the same areas saves both money and time – allowing for more innovation, faster.
We are seeing this collaboration firsthand through our platform, which has helped improve productivity and efficiency for pharmaceutical and biotech researchers, especially those specialising in genomics, global health, and beyond. The UK’s 100,000 Genomes Project, which offers whole genome sequencing as standard to those with rare diseases or cancer, is one example of how recent innovations in this space are revolutionising the way healthcare is administered. This method of diagnosis can save hundreds of thousands of pounds, adding to a growing data bank that scientists can use to study how and why these diseases occur.
In these cases, the impetus is no longer driven by one or two companies but by many players, requiring collaboration at an unprecedented scale. Smaller, more agile biotechs often have more advanced technological capabilities, but lack access to the vast datasets Big Pharma has acquired over the years. Combining highly advanced research and analysis capability with large datasets may therefore hold the key to unlocking potential at this stage of the pipeline, bringing new therapies and diagnostics to market much earlier. This idea, centred on democratising access to scientific data, is gaining traction as the new standard in the industry.
The future of cloud and collaboration
Overall, efficient data orchestration that facilitates global collaboration has the potential to help solve some of the biggest issues in healthcare. The global effort to track the spread of COVID-19 and its variants, which fed directly into variant-specific vaccines, is a best-practice example of how organisations around the world brought viable pharmaceutical products to market in record time. This allowed for a robust response at a time when it was greatly needed, and likely saved many lives. It underscores how collaboration, supported by new technologies, can speed up how we react to some of the planet’s most pressing issues.
As a discipline, bioinformatics has always been quite an open practice, with scientists designing software tools to interpret biological data – such as whether a biopsy of a tumour is cancerous – and sharing them to benefit a wider array of global researchers. Traditionally, much of this knowledge-sharing has happened on platforms such as GitHub, and recent innovations in the industry mean scientists are coming to understand the benefit of using scalable pipelines to access this crucial data.
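To give a flavour of the kind of small, shareable tool this tradition produces, here is a deliberately simplified Python example (the FASTA parsing is minimal and for illustration only) that computes GC content – a basic sequence statistic often used as a quick quality check in genomics – for each record in a FASTA file.

```python
from pathlib import Path


def gc_content(fasta: Path) -> dict[str, float]:
    """Return the GC fraction of each sequence in a (minimal) FASTA file."""
    results: dict[str, float] = {}
    name, seq = None, []

    def record(n, parts):
        # GC fraction = (count of G + count of C) / sequence length
        s = "".join(parts).upper()
        results[n] = (s.count("G") + s.count("C")) / max(len(s), 1)

    for line in fasta.read_text().splitlines():
        if line.startswith(">"):
            if name is not None:
                record(name, seq)
            name, seq = line[1:].split()[0], []  # header line: ">id description"
        elif line.strip():
            seq.append(line.strip())
    if name is not None:
        record(name, seq)
    return results


if __name__ == "__main__":
    demo = Path("demo.fasta")
    demo.write_text(">seq1\nACGTGGCC\n>seq2\nATATATAT\n")
    print(gc_content(demo))  # {'seq1': 0.75, 'seq2': 0.0}
```

Small utilities like this, published openly, are the building blocks from which larger shared pipelines are assembled.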
As advances in cloud infrastructure, AI, and machine learning continue to shape this landscape, this movement towards collaboration and knowledge-sharing is likely to spread across multiple industries. The cloud allows companies to access, modify, and distribute data freely and effectively, which is why it has grown popular in the bioinformatics space as demand for personalised medicines and therapies grows.
In the long term, freeing up resources previously spent building such infrastructure can speed up innovation and reduce its cost at a time when healthcare services need it most. Now, more than ever, these advancements will shape patient outcomes, as well as healthcare systems’ ability to rebuild and reinforce their services and to tackle some of the problems they face post-pandemic.
About the author
Evan Floden is CEO and co-founder of Seqera Labs and the open-source project Nextflow. He holds a Doctorate in Biomedicine from Universitat Pompeu Fabra (ES) for work on the large-scale deployment of analyses and is the author of 14 peer-reviewed articles.