Beyond the RCT: mitigating bias in observational data

Although randomised controlled trials (RCTs) have long been considered the best method for establishing causal relationships between exposures and outcomes, researchers are increasingly using observational (non-randomised) 'real world' data.

This trend has been acknowledged in a series of guidelines from the International Society for Pharmacoeconomics and Outcomes Research (ISPOR) Good Research Practices Task Forces¹.

These data allow the effectiveness of a treatment, as administered in clinical practice, to be measured. Convenience is an immediate benefit of using such data: RCTs are costly and limited in scope, both in terms of the cohort included and the period of follow-up. There are also situations where RCTs cannot answer the research question, either because ethical considerations restrict their use or because interest lies in clinical effectiveness rather than efficacy.

Scepticism

The use of observational data may be convenient, but many decision makers view it with scepticism because of its inherent biases. Inferring causality from observational data is challenging because of the possible presence of confounders, such as patient age and disease severity². RCTs are, of course, designed to minimise this through randomisation³.

When using observational data, the researcher needs to ensure that the treatment effect of interest is estimated from comparable and representative populations. The process of achieving comparable populations is known as matching. The first step is to determine which covariates to match on. Literature reviews and stratification analyses are recommended to inform this choice. Covariates may include age, treatment start date (to account for changes in practice and outcomes over time), comorbidities and disease-specific measures of severity. Any kind of matching exercise assumes that, in addressing imbalance in observed confounders such as age, balance is also achieved in unobserved confounders, such as virus strain or genotype, or patient medical history⁴.

Propensity score matching

Once the covariates have been chosen, a method such as propensity score matching is employed. A propensity score is calculated for each patient; as described by Rosenbaum and Rubin, it reflects 'the conditional probability of assignment to a particular treatment given a vector of observed covariates'⁵.
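In symbols, the quoted definition can be written as follows. The notation is an assumption made here for illustration and is not taken from the article: Z is the treatment indicator (1 for treated, 0 for control) and X is the vector of observed covariates.

```latex
e(x) = \Pr(Z = 1 \mid X = x)
```

Two patients with the same value of e(x) are treated as equally likely to have received the treatment, whatever their individual covariate values.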

This score represents the probability that a patient would be in a particular treatment group given his or her characteristics. For example, if patients on treatment are generally older than control patients, a younger patient in the cohort would have a lower propensity score than an older patient who is otherwise comparable. Patients can then be matched on the basis of this single consolidated score rather than on every individual covariate. This methodology can be time consuming for the researcher, however, since careful consideration must be given both to which covariates are included and to how the propensity score model is specified.
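The idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the method used in any particular study: the cohort is synthetic, the propensity model is a plain logistic regression fitted by gradient descent, and the function names (`fit_propensity_model`, `match_nearest`), the two standardised covariates and the caliper of 0.05 are all hypothetical choices made for this example.

```python
import math
import random

def propensity(weights, x):
    """Probability of treatment for covariate vector x under a logistic model."""
    s = weights[0] + sum(w * v for w, v in zip(weights[1:], x))
    s = max(-30.0, min(30.0, s))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-s))

def fit_propensity_model(X, z, lr=0.5, epochs=500):
    """Fit logistic regression by batch gradient descent (intercept first)."""
    n, p = len(X), len(X[0])
    weights = [0.0] * (p + 1)
    for _ in range(epochs):
        grad = [0.0] * (p + 1)
        for x, treated in zip(X, z):
            err = propensity(weights, x) - treated
            grad[0] += err
            for j, v in enumerate(x):
                grad[j + 1] += err * v
        weights = [w - lr * g / n for w, g in zip(weights, grad)]
    return weights

def match_nearest(scores, z, caliper=0.05):
    """Greedy 1:1 nearest-neighbour matching on the score, without replacement."""
    controls = [i for i, flag in enumerate(z) if flag == 0]
    pairs = []
    for t in (i for i, flag in enumerate(z) if flag == 1):
        if not controls:
            break
        c = min(controls, key=lambda i: abs(scores[i] - scores[t]))
        if abs(scores[c] - scores[t]) <= caliper:  # discard poor matches
            pairs.append((t, c))
            controls.remove(c)
    return pairs

# Synthetic cohort (illustrative only): older, sicker patients are
# more likely to receive the treatment.
random.seed(0)
X, z = [], []
for _ in range(300):
    age = random.gauss(0.0, 1.0)       # standardised age (assumed covariate)
    severity = random.gauss(0.0, 1.0)  # standardised severity (assumed covariate)
    p_treat = 1.0 / (1.0 + math.exp(-(0.8 * age + 0.5 * severity)))
    X.append([age, severity])
    z.append(1 if random.random() < p_treat else 0)

weights = fit_propensity_model(X, z)
scores = [propensity(weights, x) for x in X]
pairs = match_nearest(scores, z)  # list of (treated index, control index)
```

Greedy matching without replacement is only one of several options; matching with replacement or optimal matching would change which controls are retained, and the caliper trades off match quality against the number of treated patients kept.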

With advances in computational power, newer matching methods are taking a more iterative, algorithmic approach. These include 'Genetic Matching', a method introduced in 2012 with a statistical package available for its implementation⁴.

Genetic Matching iteratively tests and compares a range of matching models until it determines that the covariate balance can be improved no further. Algorithmic approaches have the potential to significantly reduce the burden for researchers whilst allowing them to check the performance of the matching exercise using statistical measures and visual inspection of the data. Once matched cohorts are achieved, the researcher may still wish to account for any remaining imbalance by including the identified patient covariates as explanatory variables in the outcome analysis (such as a survival analysis); this practice is known as doubly robust estimation.
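One statistical measure commonly used to check matching performance is the standardised mean difference of a covariate between the treated and control groups, computed before and after matching. The sketch below is an illustration only: the ages are invented for the example, and the function name `standardised_mean_difference` is hypothetical rather than part of any package mentioned in this article.

```python
import math

def standardised_mean_difference(treated, control):
    """Difference in group means divided by the pooled standard deviation."""
    mean_t = sum(treated) / len(treated)
    mean_c = sum(control) / len(control)
    var_t = sum((v - mean_t) ** 2 for v in treated) / (len(treated) - 1)
    var_c = sum((v - mean_c) ** 2 for v in control) / (len(control) - 1)
    pooled_sd = math.sqrt((var_t + var_c) / 2.0)
    return (mean_t - mean_c) / pooled_sd

# Invented ages (illustrative only).
treated_ages = [68, 72, 65, 70, 74]
all_control_ages = [50, 55, 60, 66, 71, 45, 73, 69, 72]  # before matching
matched_control_ages = [66, 71, 73, 69, 72]              # after matching

smd_before = standardised_mean_difference(treated_ages, all_control_ages)
smd_after = standardised_mean_difference(treated_ages, matched_control_ages)
print(f"SMD before matching: {smd_before:.2f}")
print(f"SMD after matching:  {smd_after:.2f}")
```

A value closer to zero after matching indicates that the two groups have become more alike on that covariate. The same check would be repeated for every covariate included in the matching exercise.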

In summary, there is a place for the use of observational data in supporting health economics and outcomes research, but due care and consideration should be given to addressing the limitations of these data. With the advent of convenient and robust matching methods, researchers are adopting approaches to overcome the limitations associated with the use of observational data.

References

1. ISPOR Task Force. ISPOR Task Force Index [Internet]. ISPOR Task Force. Available from: http://www.ispor.org/taskforces/tfindex.asp
2. Cochran W G, Rubin D B. Controlling Bias in Observational Studies: A Review. Sankhyā Indian J Stat Ser 1961-2002. 1973 Dec 1;35(4):417–46.
3. Kendall J. Designing a research project: randomised controlled trials and their principles. Emerg Med J (EMJ). 2003 Mar;20(2):164–8.
4. Diamond A, Sekhon J S. Genetic Matching for Estimating Causal Effects: A General Multivariate Matching Method for Achieving Balance in Observational Studies. Rev Econ Stat. 2012 Oct 10;95(3):932–45.
5. Rosenbaum P R, Rubin D B. The central role of the propensity score in observational studies for causal effects. Biometrika. 1983 Apr 1;70(1):41–55.

About the author:

Catrin Treharne is a Health Economist at Abacus International. Catrin holds a BSc in Mathematics and an MSc in Health Economics. She has several years' experience in designing and developing a variety of health economics models and sales tools and has been involved in a number of HTA submissions. She has experience in disease areas including cardiovascular disease, diabetes, end-stage renal disease, oncology, HIV and hepatitis C.

For over 17 years, the Abacus International health economics team has developed cost-effectiveness and budget impact models for many of the world's leading pharmaceutical and medical device companies. In addition to delivering projects, its health economists are creating a series of educational health economics information pieces.

Have your say: Is there a place for the use of observational data in supporting health economics and outcomes research?