Leveraging the FAIR principles of data in pharma

The new FAIR guidelines aim to help pharma realise the full potential of data.

As the volume of data available to pharmaceutical companies continues to grow at an accelerating pace, it is becoming increasingly difficult to manage. Data is often gathered from numerous sources and held in many different silos, which makes it extremely challenging to integrate and use effectively. It is widely accepted that data underpins digital transformation, but only if it is managed efficiently enough that useful insights can be drawn from it quickly and accurately.

Data scientists in the pharma industry all too often have to navigate multiple data repositories holding information in multiple formats, with no clarity on whether that data is current and relevant. This frequently leads to out-of-date data being used, which results in inaccurate reporting and can have serious consequences in areas such as the development of new drugs.

To help address the data management issues facing not only pharma organisations but every industry in which data analysis is key, the FAIR Data Principles were established to provide guidance on how data should be managed so that insights can be drawn from it faster and more accurately. The acronym stands for Findable, Accessible, Interoperable and Re-usable, and it is by embracing these principles that the potential of data can be truly maximised.

If applied correctly, the FAIR Data Principles can advance the digital transformation of the pharma industry, enabling pharma businesses to realise a range of operational efficiencies while also reducing time to market and cutting R&D costs.

Metadata is key to making data ‘Findable’

Accurate queries on data sets require that the data be of high quality when it is ingested into a database. This means that, from the moment a record is created until its most recent update, all of the metadata attached to it must be accurate and current. This includes the author, who has access to the data, who has modified it, and a whole series of characteristics such as category, security classification and file type.

Only with accurate metadata in place can intelligent databases filter vast data sets to retrieve the records relevant to a query. Semantic capabilities in such databases can also enrich data sets, allowing relationships between disparate pieces of data to be tracked and recorded for future use.
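As a rough illustration of how metadata makes records findable, the Python sketch below wraps each record in a metadata envelope carrying the fields mentioned above and filters on them. The `RecordMetadata` class, its field names and the `find_records` helper are hypothetical; a real database would index these fields rather than scan a list.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class RecordMetadata:
    """Hypothetical metadata envelope attached to a data record."""
    record_id: str
    author: str
    category: str
    security_classification: str
    file_type: str
    readers: set[str] = field(default_factory=set)        # who has access
    modified_by: list[str] = field(default_factory=list)  # who has edited it
    last_updated: datetime | None = None


def find_records(records: list[RecordMetadata], **criteria) -> list[RecordMetadata]:
    """Return the records whose metadata matches every criterion exactly."""
    return [
        r for r in records
        if all(getattr(r, key) == value for key, value in criteria.items())
    ]


records = [
    RecordMetadata("r1", "alice", "clinical-trial", "confidential", "csv"),
    RecordMetadata("r2", "bob", "assay", "internal", "json"),
]
hits = find_records(records, category="clinical-trial", security_classification="confidential")
print([r.record_id for r in hits])  # ['r1']
```

The filter only works as well as the metadata it reads: a record tagged with the wrong category simply never appears in the results.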

In addition, accurately recording the data’s provenance and wrapping data with as much contextual information as possible makes search far more powerful. Rather than maintaining separate sources for metadata and data, pharma organisations need to integrate all of this information into a single unified view if they are to truly derive value from their data.

Making data ‘Accessible’ by embracing secure sharing and open standards

At a time when regulations restrict how data can and cannot be used, it is imperative that the right stakeholders have timely access to the right data, while those who do not have permission cannot access it under any circumstances.

Metadata remains central here, but it is also crucial to examine the different types of permission that can be granted or denied, and how to govern them effectively in real time. In most cases, the majority of users who need to access a data record only require read-only permission, while a separate group of users needs to be authorised to modify it. Furthermore, in certain categories those authorised users should be the only ones who can create new records and delete obsolete data. It is also important that access is partially redacted for certain users, and that accountability is ensured whenever these records are edited.
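The tiered permissions described above can be sketched as a deny-by-default policy. The role names (`viewer`, `editor`) and the `POLICY` table are hypothetical assumptions for illustration; a real platform would likely attach such rules to each record or category and evaluate them at query time.

```python
from enum import Flag, auto


class Permission(Flag):
    READ = auto()
    UPDATE = auto()
    CREATE = auto()
    DELETE = auto()


# Hypothetical policy for one data category: most users can only read;
# a separate, smaller group may also modify, create and delete records.
POLICY = {
    "viewer": Permission.READ,
    "editor": Permission.READ | Permission.UPDATE | Permission.CREATE | Permission.DELETE,
}


def is_allowed(role: str, action: Permission) -> bool:
    """Deny by default: roles missing from the policy get no permissions."""
    granted = POLICY.get(role, Permission(0))
    return action in granted


print(is_allowed("viewer", Permission.READ))    # True
print(is_allowed("viewer", Permission.UPDATE))  # False
print(is_allowed("guest", Permission.READ))     # False
```

Denying unknown roles by default reflects the requirement that users without permission cannot reach the data under any circumstances.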

This is particularly relevant in the pharma industry, where researchers, doctors and nurses need access to patient records but have varying requirements for the data itself. In some cases, it is only necessary to analyse a patient’s symptoms, condition, prognosis and/or treatment, meaning that the patient’s personally identifiable details can be blocked from view.
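Field-level redaction of this kind might look like the following minimal sketch, which masks identifying fields for any role not explicitly cleared to see them. The field list, the role names and `view_record` are hypothetical; which fields count as personally identifiable would in practice be set by policy and regulation.

```python
# Hypothetical field classification and clearance list.
PII_FIELDS = {"name", "date_of_birth", "address"}
ROLES_CLEARED_FOR_PII = {"treating_clinician"}


def view_record(record: dict, role: str) -> dict:
    """Return a copy of a patient record, masking identifying fields for
    roles that only need clinical details (e.g. researchers)."""
    if role in ROLES_CLEARED_FOR_PII:
        return dict(record)
    return {k: ("[REDACTED]" if k in PII_FIELDS else v) for k, v in record.items()}


patient = {
    "name": "Jane Doe",
    "date_of_birth": "1970-01-01",
    "condition": "type 2 diabetes",
    "treatment": "metformin",
}
print(view_record(patient, "researcher"))
# {'name': '[REDACTED]', 'date_of_birth': '[REDACTED]', 'condition': 'type 2 diabetes', 'treatment': 'metformin'}
```

The researcher still sees the clinical fields needed for analysis, while the stored record itself is left unchanged.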

Controlling access at a granular level is key to driving collaboration and insight across the enterprise. It is important that unnecessary private information does not fall into the wrong hands, but it is equally important in the pharma industry that the necessary information is easily accessible to the right stakeholders. Getting this balance right addresses privacy concerns while also improving and speeding up research and development.

At the interface level, it should be possible to read and write the permission information attached to data in a single system, or at least in multiple systems that integrate seamlessly. This means that the data platform used to track this information must embrace open standards and securely integrate data from multiple, disparate sources regardless of structure and language.

Achieving a single source of truth through ‘Interoperable’ data

Data is interoperable when it can be integrated with other data sources into a single, unified view so that other applications can easily use it. In the pharma industry this is crucial, because many different organisations, such as clinical practices, research institutes and government bodies, often need to access the same data but may each use different programs, applications and filing systems to view it.
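One simple way to picture integration into a unified view is to map each source’s field names onto a shared schema. In this sketch the two source layouts, the mapping tables and the common field names (`subject_id`, `diagnosis`, `date`) are all hypothetical; a real integration would more likely map onto a published standard or ontology.

```python
# Hypothetical mappings from each source's field names to a shared schema.
CLINIC_FIELDS = {"pt_id": "subject_id", "dx": "diagnosis", "visit": "date"}
INSTITUTE_FIELDS = {"SubjectID": "subject_id", "Diagnosis": "diagnosis", "SampleDate": "date"}


def to_common(record: dict, mapping: dict) -> dict:
    """Rename source-specific fields to the shared schema; unmapped fields are dropped."""
    return {common: record[src] for src, common in mapping.items() if src in record}


def unified_view(*sources):
    """Each source is a (records, mapping) pair; returns one list in the shared schema."""
    return [to_common(r, mapping) for records, mapping in sources for r in records]


clinic = [{"pt_id": "S1", "dx": "asthma", "visit": "2020-03-01"}]
institute = [{"SubjectID": "S2", "Diagnosis": "COPD", "SampleDate": "2020-04-15"}]
print(unified_view((clinic, CLINIC_FIELDS), (institute, INSTITUTE_FIELDS)))
# [{'subject_id': 'S1', 'diagnosis': 'asthma', 'date': '2020-03-01'}, {'subject_id': 'S2', 'diagnosis': 'COPD', 'date': '2020-04-15'}]
```

Each source keeps its own layout; only the mapping needs to be agreed for applications downstream to read both sets of records the same way.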

When ontologies (formal frameworks that define the concepts in a domain and the relationships between them) are used effectively to categorise data and make it usable by different parties, they enrich the data and make it more reliable and accurate. Properties of data can be defined authoritatively across data sets, which increases confidence in the facts derived from the data.

Classifying data against the relevant ontology gives users a centralised definition of that data, against which data sets can be validated. As a result, data sets from disparate sources and separate organisations can be integrated easily, for the whole industry’s benefit. Ontologies also offer greater insight into the connections within data and even make it possible to infer non-obvious facts.
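To show how an ontology can support inference, the sketch below follows subclass links upwards, loosely mirroring the transitive `rdfs:subClassOf` relation from RDF Schema. The mini-ontology is a hypothetical example and, unlike real ontologies, allows only one parent per term.

```python
# Hypothetical mini-ontology: each term maps to its single parent class.
SUBCLASS_OF = {
    "aspirin": "NSAID",
    "ibuprofen": "NSAID",
    "NSAID": "anti-inflammatory agent",
    "anti-inflammatory agent": "drug",
}


def ancestors(term: str) -> list[str]:
    """Follow subclass-of links upwards; every step beyond the first is an inferred fact."""
    result = []
    seen = {term}
    while term in SUBCLASS_OF:
        term = SUBCLASS_OF[term]
        if term in seen:  # guard against cycles in a malformed ontology
            break
        seen.add(term)
        result.append(term)
    return result


def is_a(term: str, category: str) -> bool:
    return category in ancestors(term)


print(ancestors("aspirin"))  # ['NSAID', 'anti-inflammatory agent', 'drug']
print(is_a("ibuprofen", "anti-inflammatory agent"))  # True
```

Nothing in the data states directly that aspirin is a drug; the fact follows from the chain of definitions, which is the kind of non-obvious inference described above.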

By integrating data and making it interoperable, pharma organisations can increase the speed and accuracy of the critical insights drawn from their data and, perhaps more importantly, ensure that the data they use is up-to-date and relevant. Crucial data should no longer be lost in smaller parts of the organisation, nor the wrong data used in critical situations.

Faster R&D through ‘Reusing’ critical data

In the pharma industry the re-use of data is paramount, as research and experimentation rely so heavily on participant information, the manipulation of variables and the findings of previous trials. Because such trials are carried out by multiple independent organisations, re-usable data is key to significantly reducing the cost and duration of R&D, ultimately shortening the time to market for new and improved drugs.

To streamline this process, data must be linkable. Linking builds up a network of relationships between data points, which is particularly useful for repurposing information and finding new uses for existing data, perhaps in ways not envisioned when the data was created.
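Linked data is often modelled as subject-predicate-object triples. The sketch below builds a graph from a few hypothetical triples and uses breadth-first search to find how two data points are connected; the identifiers and predicates are invented for illustration.

```python
from collections import defaultdict, deque

# Hypothetical triples (subject, predicate, object) linking data points.
TRIPLES = [
    ("trial:T1", "tested", "compound:C7"),
    ("compound:C7", "targets", "protein:P53"),
    ("trial:T2", "measured", "protein:P53"),
]


def build_graph(triples):
    graph = defaultdict(set)
    for subject, _predicate, obj in triples:
        graph[subject].add(obj)
        graph[obj].add(subject)  # treat links as navigable in both directions
    return graph


def connected(graph, start, goal):
    """Breadth-first search: return a shortest path of linked data points, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in graph[path[-1]]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


graph = build_graph(TRIPLES)
print(connected(graph, "trial:T1", "trial:T2"))
# ['trial:T1', 'compound:C7', 'protein:P53', 'trial:T2']
```

The two trials share no direct link, yet the path shows that both touch the same protein, a connection that might suggest re-using one trial’s data in the context of the other.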

Metadata must also be meticulously recorded, including the details of every version of a record as it is amended. Whenever a document is updated, the original is kept and a new version is created with an incremented ID and a new timestamp. This makes it possible to see every change made to a document, who made it and when, so that no critical data is lost during the amendment process.
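The versioning scheme described above can be sketched as an append-only history in which every update creates a new version rather than overwriting the old one. `VersionedDocument` and `Version` are hypothetical names, and storing the full content of each version is a simplification; a real system might store differences or rely on a database’s own versioning features.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Version:
    number: int
    content: str
    author: str
    timestamp: datetime


class VersionedDocument:
    """Append-only history: updates never overwrite earlier versions."""

    def __init__(self, content: str, author: str):
        self._versions = [Version(1, content, author, datetime.now(timezone.utc))]

    @property
    def current(self) -> Version:
        return self._versions[-1]

    def update(self, content: str, author: str) -> Version:
        # New version gets the next number and its own timestamp.
        new = Version(self.current.number + 1, content, author, datetime.now(timezone.utc))
        self._versions.append(new)
        return new

    def history(self) -> list[Version]:
        return list(self._versions)


doc = VersionedDocument("Protocol draft", "alice")
doc.update("Protocol v2: dosage revised", "bob")
for v in doc.history():
    print(v.number, v.author, v.timestamp.isoformat())
```

Making `Version` frozen means a stored version cannot be edited in place, so the full audit trail of who changed what, and when, is preserved.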

Data plays a critical role in the pharma industry but, put simply, not enough is being done to unlock its true potential. By adopting data platform technology that builds the FAIR principles of Findability, Accessibility, Interoperability and Re-usability into the operational fabric of the enterprise, pharma organisations can achieve faster time to results and greater agility in responding to changing business demands.

About the author

Bill Fox is global CSO healthcare and life sciences at MarkLogic.