The Internet, emerging data analytics and patient-reported outcomes: a new approach to pharmaceutical research and development

Carol Hills and Stephen Doogan

Prism Ideas and dMetrics

Throughout the process of drug development and commercialisation, pharmaceutical companies use a variety of information sources to guide decisions on trial designs, feasibility and marketing strategy. Traditionally, much of this information has relied upon de novo collection of data from opinion leader advisors, focus groups and market research studies. In today’s environment, the Internet provides a rich source of real-world data to complement traditional research methods. Pharmaceutical companies have long used the web to present information on their company, disease areas and products to analysts, patients and prescribers through closely controlled corporate websites and largely ‘one-way’ social media brand pages. To date, however, the industry’s primary use of the Internet has been as a vehicle to push and disseminate information rather than as a means to listen and extract valuable market insights.

The explosion of Internet forums, chat rooms and other social media sites has allowed patients to openly share experiences of their diseases and treatments. Not only does this provide potential participants with valuable information and advice from other patients, it also represents a largely untapped source of information for healthcare organisations. There are numerous examples that demonstrate how online platforms have been used to extract clinically relevant information. Within individual patient communities, online questionnaires have been used to elicit information on the effect of imatinib dose changes on self-reported disease progression [1] and patient preferences for reduced toxicities [2] in patients with gastrointestinal tumours. In addition, patient data from other forums have been used to evaluate off-label drug use [3] and develop patient-reported outcome instruments [4].

Unlike traditional data resources such as primary market research studies and clinical trials, the vast majority of patient-reported information within Internet discussions is unstructured, which presents a considerable challenge to conventional analytical methods. In particular, the complex nature of human communication means that methods such as keyword identification cannot link individual pieces of information to reveal the motivation and sentiment behind a patient’s actions. To date, this has limited the value of Internet chatter as a meaningful data source for healthcare research.

This situation is ripe for change. Advances in health-focused natural-language processing technology enable Big Data to be harnessed to identify the decisions patients make, and more importantly, why they make them. By joining pieces of information and putting them in context, this new technology allows patients’ actions, such as changing therapy, to be linked to the underlying reasons (Figure 1). The voices of thousands of patients can be analysed to provide precise information on patient outcomes, reported in their own words. As patients only talk about things that really matter to them, the insights gained have the potential to inform drug development and marketing strategies to produce new medicines that significantly impact patients’ lives. Furthermore, with increasing pressure from regulators and payers to demonstrate true patient benefit, the analysis of Internet chatter with advanced language processing now has the potential to provide accurate and quantified real-world patient-reported outcomes data.

Figure 1: 22-year-old female patient with allergies
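
The difference between spotting keywords and linking an action to its reason can be illustrated with a deliberately simple sketch. It is not the technology described in this article: the word lists, the function names and the example post are all assumptions made for illustration, and a production system would rely on clinical vocabularies and trained language models rather than fixed rules.

```python
import re

# Hypothetical toy vocabularies, invented for this illustration only.
ACTIONS = ("stopped", "switched", "started")
CAUSAL_CUES = ("because of", "because", "due to")  # longer cue listed first


def keyword_hits(post, keywords):
    """Keyword spotting: report which terms occur, with no link between them."""
    text = post.lower()
    return [k for k in keywords if k in text]


def link_action_to_reason(post):
    """Return (action, reason) pairs that occur within the same sentence."""
    pairs = []
    for sentence in re.split(r"[.!?]+", post.lower()):
        for action in ACTIONS:
            if action not in sentence:
                continue
            for cue in CAUSAL_CUES:
                marker = f" {cue} "
                if marker in sentence:
                    # Everything after the causal cue is treated as the reason.
                    reason = sentence.split(marker, 1)[1].strip()
                    pairs.append((action, reason))
                    break
    return pairs


# A made-up post, not taken from any real patient forum.
post = (
    "I stopped taking my antihistamine because it made me too drowsy. "
    "My hay fever is bad this spring."
)

print(keyword_hits(post, ["stopped", "drowsy", "hay fever"]))
print(link_action_to_reason(post))
```

Keyword spotting reports that "stopped", "drowsy" and "hay fever" all appear, but only the second function ties the decision to stop treatment to the drowsiness rather than to the hay fever.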

The success of applying such technology in healthcare relies on an overlay of medical expertise, so that Internet chatter can be interrogated with clinically relevant questions pertinent to a given disease. Moreover, the refinement of the algorithms that perform the language analysis requires expert feedback to ensure fast and accurate automated data analysis. Depending on the questions asked and algorithms used, patient discussions can help to define clinical development programmes and product positioning pre-launch by identifying unmet need and providing competitor profiles, respectively. Once a product is launched, patient discussion will generate signals that can be used in pharmacovigilance and risk evaluation and mitigation programmes, which in turn may refine future commercial strategies.

While the Internet provides a vast source of information, it has been perceived that analysing chatter from online sources has the potential for bias by limiting reports to those patients active in discussion forums. In addition, it has not previously been possible to filter posts so that an individual patient is only represented once in a given data set. Today, the Internet is not the domain of a selected user group, but is widely used by all groups of society and thus the potential for bias is profoundly reduced. Furthermore, with the specific technology available today, it is possible to mirror observational clinical trials by applying appropriate inclusion and exclusion criteria, thus minimising bias and ensuring that the patients analysed are representative of the usual disease population. It is also possible to identify multiple posts from a single individual, detect linguistic patterns and de-duplicate the resultant data set to lessen or eliminate the impact of an individual with outlying views.
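
As a rough illustration of the two steps described above, the following sketch applies inclusion and exclusion criteria to a set of posts and then keeps one record per individual. The `Post` fields, the age limits and the use of a normalised forum handle to recognise the same person are all assumptions made for this example. The linguistic pattern detection mentioned above is considerably more sophisticated than matching handles.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Post:
    author: str     # hypothetical forum handle
    age: int        # hypothetical self-reported age
    condition: str  # hypothetical self-reported condition
    text: str


def meets_criteria(post, condition, min_age, max_age):
    """Inclusion/exclusion check, in the spirit of an observational study."""
    return post.condition == condition and min_age <= post.age <= max_age


def one_record_per_patient(posts):
    """Keep the first post per author so that prolific posters count once."""
    seen = set()
    unique = []
    for post in posts:
        key = post.author.strip().lower()  # simplistic identity matching
        if key not in seen:
            seen.add(key)
            unique.append(post)
    return unique


def build_cohort(posts, condition, min_age=18, max_age=65):
    """Filter to eligible posts first, then de-duplicate by individual."""
    eligible = [p for p in posts if meets_criteria(p, condition, min_age, max_age)]
    return one_record_per_patient(eligible)


# Made-up records for illustration only.
posts = [
    Post("AnnaK", 34, "allergy", "Switched tablets last month."),
    Post("annak ", 34, "allergy", "Still happy with the new tablets."),
    Post("Ben77", 16, "allergy", "Nasal spray works for me."),
    Post("Cara", 41, "asthma", "My inhaler was changed."),
    Post("Dev", 52, "allergy", "Stopped treatment over the winter."),
]

cohort = build_cohort(posts, "allergy")
print([p.author for p in cohort])
```

Filtering before de-duplicating means that a person is represented by an eligible post where one exists, rather than being dropped because an earlier post fell outside the criteria.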

Unlike conventional market research and clinical trials, which prospectively generate new data, Internet data often exist in advance of the study planning and design process. By harnessing this resource it is now possible to conduct in silico studies to answer the same questions usually resolved by market research programmes, observational clinical trials and adverse events signal detection activities, but with substantially shorter timelines and potentially lower cost.

In summary, the use of the Internet as a research tool has been limited to date, and the application of generic technical approaches to analysing self-reported chatter has been disappointing. Recent advances in language analysis, together with the ability to overlay healthcare expertise, allow the pharmaceutical industry to incorporate the wealth of real-world patient data available into clinical and commercial strategy in a time-efficient manner, so that ongoing research programmes can focus their priorities on the needs of patients.


1. Call J, Scherzer NJ, Josephy PD, Walentas C. Evaluation of self-reported progression and correlation of imatinib dose to survival in patients with metastatic gastrointestinal stromal tumors: an open cohort study. J Gastrointest Cancer 2010, 41(1): 60-70.

2. Hauber AB, Gonzalez JM, Coombs J, Sirulnik A, Palacios D, Scherzer N. Patient preferences for reducing toxicities of treatments for gastrointestinal stromal tumor (GIST). Patient Prefer Adherence 2011, 5: 307-14.

3. Frost J, Okun S, Vaughan T, Heywood J, Wicks P. Patient-reported outcomes as a source of evidence in off-label prescribing: analysis of data from PatientsLikeMe. J Med Internet Res 2011, 13(1): e6.

4. Wicks P, Massagli M, Kulkarni A, Dastani H. Use of an online community to develop patient-reported outcome instruments: the Multiple Sclerosis Treatment Adherence Questionnaire (MS-TAQ). J Med Internet Res 2011, 13(1): e12.

About the authors:

Carol Hills is the Chief Operating Officer, Prism Ideas, Nantwich, Cheshire.

Carol’s career spans 20 years in the pharmaceutical and medical communications industries. After gaining a degree in Natural Sciences from Cambridge University, Carol spent 8 years in preclinical research at Glaxo, where she completed a PhD in receptor pharmacology. Carol then moved to a career in medical communications, leading the editorial development of a range of products. Carol expanded her experience in medical marketing at AstraZeneca, working across all aspects of global brand development and communication for several oncology brands. Before moving to Prism 3 years ago, Carol was Scientific Director at a communications agency providing strategic input to communication plans and leading the development of medical education programmes.

Telephone: +44 1270 621724

Stephen Doogan is the VP of Healthcare Operations, dMetrics, Boston, USA.

After completing a degree in Linguistics at Leeds University, Stephen held clinical research and computational pharmacovigilance roles at Kitasato University, Tokyo and Hitachi, respectively. He has also run 15 healthcare social networks. He is currently responsible for Healthcare Operations at dMetrics in Boston.


Is it time that pharma unleashed the wealth of real-world patient data?