Solving the Big Data problem in pharma innovation

The effectiveness of AI applications can be undermined by the volumes of unstructured data prevalent in the pharma industry. What can be done to overcome this issue?

We live in an exciting time for the pharmaceutical industry. Cutting-edge technologies like artificial intelligence (AI) and blockchain are making headlines and revolutionising everything from drug discovery to clinical trials. Many of these innovations are built upon the same foundation: Big Data. But a longstanding challenge within Big Data must be overcome in order for technologies like AI to achieve their full potential. That challenge is unstructured data.

Unstructured data and pharmaceutical AI

The need to overcome this challenge becomes clear when we examine how unstructured data limits the effectiveness of AI applications within the pharmaceutical and life science industries.

As I’ve written in the past, the history of AI can be seen through the lens of three distinct waves. The first wave brought 'knowledge engineering' software that enabled efficient solutions to practical challenges. The second wave brought machine learning programs that enabled automated pattern recognition and advanced statistical analysis. We’ve now entered the third wave of AI, which has the power to generate novel hypotheses by analysing massive sets of data.

Third-wave AI has the potential to significantly accelerate the research and development process for new drugs, as companies like Merck & Co and Sanofi have begun to discover. Applications of third-wave AI programs have powered medical discoveries such as the connection between fish oil and Raynaud’s disease.

But third-wave AI applications have also suffered a series of failures in healthcare and pharmaceutical contexts. MD Anderson’s problems with IBM Watson serve as a notable example. In that instance, the problems all started when MD Anderson changed its electronic medical record (EMR) provider, preventing Watson from accessing the data that it needed. This example illustrates the challenge posed by unstructured data and the corresponding need for greater data integrity within life science industries.

Data integrity in life sciences

Many of today’s AI programs depend on good, clean data in order to operate effectively. If access to such data is compromised, the AI program’s ability to conduct analysis and generate hypotheses is undermined.

Data sets within the pharmaceutical and life science industries pose a particular challenge for AI programs because of the unusual density, depth, and diversity of biological data. Because the complexity of biological data renders it incomprehensible to many AI programs, the majority of pharmaceutical research today is carried out manually. Human researchers curate data, generate hypotheses, and perform experiments in much the same way that they have for decades. Lacking automation, the drug discovery, development, and testing process is inefficient, expensive, and often inaccurate.

The inefficiency of this process causes prolonged delays between the completion of an experiment and the publication of its results in scientific journals or databases. This delay has contributed to a significant problem with publication bias and inaccuracy in the industry. Even the open-science movement, which is attempting to increase access to not-yet-published clinical research results, depends on manually curated data sets that are usually created by companies with proprietary interests.

Even heavily curated data sets are often too inconsistent to be meaningfully analysed by AI. Take, for example, the challenge posed by abbreviations and acronyms within the pharmaceutical industry. The same abbreviation may carry different meanings depending on its context. 'Ca', for instance, could mean 'cancer' in one context and 'calcium' in another. Most AI programs depend on accurate and nuanced contextual information, and manually curated data sets often fall short of this mark.
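
To make the 'Ca' example concrete, here is a minimal Python sketch of context-based disambiguation. It is an illustration only, not how any particular product works: the cue-word lists and the function name `disambiguate_ca` are assumptions invented for this example, and a real system would learn such signals from data rather than hard-code them.

```python
import re

# Hypothetical cue words for each sense of 'Ca' (assumed for illustration).
SENSE_CUES = {
    "cancer": {"tumour", "tumor", "metastasis", "oncology", "carcinoma", "biopsy"},
    "calcium": {"serum", "mmol", "bone", "channel", "supplement", "vitamin"},
}


def disambiguate_ca(sentence: str) -> str:
    """Guess the sense of 'Ca' from cue words found in the same sentence."""
    # Lower-case the sentence and keep alphabetic tokens only.
    tokens = set(re.findall(r"[a-z]+", sentence.lower()))
    # Score each sense by how many of its cue words appear.
    scores = {sense: len(tokens & cues) for sense, cues in SENSE_CUES.items()}
    ranked = sorted(scores.values(), reverse=True)
    # With no cue words, or a tie, the context is not enough to decide.
    if ranked[0] == 0 or ranked[0] == ranked[1]:
        return "ambiguous"
    return max(scores, key=scores.get)


print(disambiguate_ca("Serum Ca was 2.4 mmol/L"))
print(disambiguate_ca("Biopsy confirmed Ca of the breast with metastasis"))
print(disambiguate_ca("Ca levels noted"))
```

The third call returns 'ambiguous' because the sentence carries no cue words at all, which is the situation a data set stripped of its context leaves an AI program in.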

Overcoming the unstructured data challenge

Fortunately, some of the world’s leading firms have begun to explore two possible ways to overcome these challenges. One approach is to simply improve the state of available data sets. 2009’s HITECH Act modelled this approach by standardising EMR systems to create richer, more comprehensive, and more up-to-date biological data sets. As a result, diverse data from biological patents, clinical trials, academic theses, and other sources can increasingly be analysed by advanced AI programs.

The second way to overcome the unstructured data challenge is simply to build better AI. Recent innovations have brought 'context normalisation' AI technology that can process and analyse unstructured, heterogeneous data points using a combination of natural language processing, machine learning, and cutting-edge text analytics. Finally, the most advanced AI programs are able to utilise disparate, incongruous data to generate novel hypotheses without the need for costly human curation.

Innovations like these are allowing researchers to analyse data, generate hypotheses, and conduct conclusive clinical trials at unprecedented levels of speed and accuracy. This is good news for pharmaceutical companies, medical professionals, and consumers alike.

About the author:

Gunjan Bhardwaj is the founder and CEO of Innoplexus, a leader in AI and analytics as a service for life science industries. With a background at Boston Consulting Group and Ernst & Young, he bridges the worlds of AI, consulting, and life science to drive innovation.

