Hybrid and multi-cloud: The next step for biotech to secure flexible, scalable scientific workflows
An unmissable trend for those working in biotech research over the past decade has been the transition to cloud computing. The appeal of highly resilient, fast, scalable, and secure technical infrastructure is evident for companies working with such extensive data sets, and the growth in informatics-led drug discovery is, in many ways, a testament to the innovations cloud computing has unlocked.
Such are the demands posed by next-generation sequencing (NGS) data, and such is the potency of cloud infrastructure, that the life sciences sector has transitioned its batch analysis processes faster than many other industries that rely on scientific computing. Major pharmaceutical companies have already paved the way, turning to cloud environments to speed up drug discovery, and the trend is even more pronounced among companies working with genomics data.
Seqera’s own State of the Workflow Community survey showed public cloud use by bioinformatics researchers rose 20% in the last year alone, with 70% indicating that they plan to migrate workloads soon. Research scientists are on board with cloud computing and have been for some time. What we have seen less of is R&D teams making the most of the technology across the board and truly reaping the promised benefits of flexible research processes and scalable infrastructure.
The reasons for this are plentiful. Biotech and biopharma organisations often lack cost transparency across their existing processes, and unpredictable, spiky workloads make it difficult to know whether a cloud platform is the most cost-effective option. The flexibility required to deal with this is not hard to achieve, but it does involve a mindset shift for many working in research organisations: from one which sees cloud computing as the great consolidator of research workflows to one which embraces a hybrid or multi-cloud future and the flexible software solutions that come with it.
Embracing complexity
Cloud computing has already made significant strides in enhancing agility and dismantling the data silos prevalent in biopharma research. But the reason consolidating on a single cloud provider is often not the preferred route for a biotech speaks to its dynamic nature.
Firstly, research-led organisations are complex and changeable entities. Decision-making often occurs across a range of research units, and acquisitions change the range of work being completed within a portfolio.
Secondly, research organisations conduct collaborative, alliance-led work. Using partnerships to plug gaps in expertise, or to draw on the specialist knowledge that some companies possess, has been a key driver of growth for many of the largest pharmaceutical and biotech companies.
Whilst this dynamism increases the complexity of work a biopharma company can take on, it also makes it difficult to manage data pipelines through a single cloud provider. This internal complexity can also limit the flexibility of individual research arms running workloads at scale in the cloud. As a cloud data store grows at one end of a research organisation, it creates data gravity: researchers move their own data to the existing store to take advantage of the cloud’s ability to scale quickly and absorb performance issues. The result is large data stores consolidating around different vendors at different ends of the organisation, and because moving data between storage environments can be highly expensive, cost issues compound and consolidation is disincentivised.
Add in the commercial incentives that cloud providers offer to keep their customers tethered to their platforms, and we start to get a sense of why so many research organisations are running data stores across multiple cloud providers.
But why is this an issue? Does it even have to be? There is, of course, the cost issue to contend with: underused or over-provisioned resources spread across cloud systems, at a time when funding is limited and companies are actively looking to streamline their cloud workloads. But trying to consolidate onto a single cloud vendor can often exacerbate that issue. Vendor lock-in limits flexibility over the long term, and it hinders an organisation’s capacity to allocate resources efficiently or foster collaboration across different teams. Countering these challenges requires a mindset shift: from one that sees a multitude of cloud providers as a complication, to one that is at least open to embracing the multi-cloud future many are already part of. Doing that requires software solutions that enable that sort of flexibility.
Research tools that empower flexible pipelines
Firstly, a key reason many researchers lack flexible data pipelines is friction between their research software and the cloud system it runs on. To reduce that friction, organisations should use pipeline infrastructure that is platform agnostic.
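As a minimal sketch of what platform-agnostic pipeline infrastructure looks like in practice, assuming a Nextflow-based workflow (the open-source workflow manager developed by Seqera), consider the process definition below. The process name, container reference, and resource figures are all illustrative.

```groovy
// Hypothetical alignment step written as a Nextflow process.
// The pipeline declares what it needs (a container image, CPUs,
// memory) rather than where it runs, so the same definition can
// execute on a laptop, an on-premises cluster, or a cloud batch
// service without modification.
process ALIGN_READS {
    container 'example-registry.io/bwa:0.7.17'   // illustrative image reference
    cpus 8
    memory '16 GB'

    input:
    tuple val(sample_id), path(reads)
    path reference

    output:
    tuple val(sample_id), path("${sample_id}.sam")

    script:
    """
    bwa mem -t ${task.cpus} ${reference} ${reads} > ${sample_id}.sam
    """
}
```

Because the process describes requirements rather than infrastructure, the decision about where it runs can be deferred to configuration rather than baked into the research code.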
Secondly, organisations must ensure all data workflows are portable across compute environments, which promotes maximum flexibility for researchers. Some research tools on the market are now built with an abstraction layer between the pipeline and the underlying compute environment: the compute environment is selected at runtime, and the pipeline can run anywhere without modification. This empowers researchers across different arms of an organisation to work together and share knowledge rather than being inhibited by siloed data.
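In Nextflow, for example, that abstraction layer surfaces as configuration profiles: the workflow code stays the same, and the compute environment is chosen when a run is launched. A minimal sketch follows, with placeholder queue names, regions, and project IDs.

```groovy
// nextflow.config: hypothetical profiles mapping one pipeline onto
// several compute back ends; the workflow code itself never changes.
profiles {
    standard {
        process.executor = 'local'           // run on the local machine
    }
    hpc {
        process.executor = 'slurm'           // on-premises HPC cluster
        process.queue    = 'research'        // placeholder queue name
    }
    aws {
        process.executor = 'awsbatch'        // AWS Batch
        process.queue    = 'genomics-queue'  // placeholder queue name
        aws.region       = 'eu-west-1'
    }
    gcp {
        process.executor = 'google-batch'    // Google Cloud Batch
        google.project   = 'my-project-id'   // placeholder project ID
        google.location  = 'europe-west2'
    }
}
```

The same pipeline can then be launched against any of these environments, for example with `nextflow run my-pipeline -profile aws`, without touching the workflow itself.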
Finally, ensuring research infrastructure is built with collaboration in mind can help an organisation get the most out of the cloud’s flexibility and scalability. Positioning research tools in shareable compute environments that provision storage flexibly, according to researcher need, can go a long way to improving flexibility across a research organisation. Coupled with research tools that offer a common user experience, this helps researchers navigate pipelines regardless of the underlying cloud platform.
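One simple expression of this, again sketched in Nextflow configuration with a placeholder bucket name, is pointing runs at shared object storage so that intermediate files and results are provisioned on demand and visible to collaborators regardless of which compute environment produced them.

```groovy
// Shared, on-demand storage: any authorised team member can point a
// run at the same bucket, whichever compute environment executes it.
workDir       = 's3://shared-research-bucket/work'      // placeholder bucket
params.outdir = 's3://shared-research-bucket/results'   // placeholder bucket
```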
What all of this speaks to is the need for flexible research tools to go hand in hand with a more dynamic approach to the cloud. Embracing the multi-cloud environment many biotechs already find themselves in, or actively pursuing hybrid models, does not have to come with added complexity and cost. If anything, the opposite is true. When the workflow tools and pipeline software deployed are sufficiently adaptable, navigating variable workloads or collaborating with new knowledge centres becomes straightforward. And this is where innovation is unlocked.