Sharing raw data from clinical trials: what are the challenges?

Adam Jacobs

Dianthus Medical Limited

Adam Jacobs of Dianthus Medical Limited discusses access to raw data and the associated challenges and benefits.

Access to raw data from clinical research is a hot topic. Currently, it is rare for anyone outside the research team and regulators to have access to raw data: most of us must content ourselves with summary statistics from a published manuscript.

But there is now talk of going further than that. Access to raw data would mean that in addition to publishing the usual information about studies in the form of summary statistics, we would also post the entire study database on a website so that anyone could access the data and run their own analyses.


There are undoubted benefits to being able to access raw data, such as facilitating individual patient-level meta-analyses and allowing others’ results to be confirmed by independently replicating their analyses.

So why are data from clinical trials not shared as a matter of routine?

The answer is complex. A number of barriers hinder wider sharing of data; some are easily surmountable, others less so.

One problem is that researchers are often in competition with each other. Pharmaceutical companies are literally in competition in the marketplace, but even academic researchers frequently regard other researchers in their field as competitors. Academic promotions can depend on being able to publish exciting new results before the other guy does.

This culture of competition gives rise to a classic prisoner’s dilemma problem. If I share my data and you don’t, then you have an advantage and I lose. However, if we both share our data, then neither of us is at a disadvantage and we both gain from the increased access to data. But since neither of us has anything to gain from unilaterally sharing our data, the data remain hidden.
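The logic of that dilemma can be sketched with a small payoff table. The numbers below are purely illustrative assumptions (higher is better), chosen only to match the ordering described above; they are not measurements of anything.

```python
# Illustrative payoffs only: (my payoff, your payoff). The values are assumptions
# chosen to reflect the ordering described in the text, not real data.
PAYOFFS = {
    ("share", "share"): (3, 3),  # we both gain from wider access to data
    ("share", "hide"): (0, 5),   # I share alone: you gain an advantage, I lose
    ("hide", "share"): (5, 0),   # you share alone: I gain an advantage
    ("hide", "hide"): (1, 1),    # the status quo: the data remain hidden
}


def best_response(their_choice):
    """Return my payoff-maximising choice given the other researcher's choice."""
    return max(("share", "hide"), key=lambda mine: PAYOFFS[(mine, their_choice)][0])


for theirs in ("share", "hide"):
    print(f"If you {theirs}, my best response is to {best_response(theirs)}")
```

Under these assumed payoffs, hiding is the best response whatever the other researcher does, even though both would be better off if both shared.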

This could be solved if some third party in a position of authority were able to mandate that everyone share their data. Two obvious third parties that could fulfil that role are journals and regulators. Happily, we are seeing moves from both.

The British Medical Journal has recently announced a policy requiring authors of papers submitted to it about drug or device studies to make their raw data available. It’s disappointing that the policy is limited to drug and device studies, particularly given that the BMJ publishes such a large proportion of studies on other interventions: the arguments about the benefits of access to raw data apply irrespective of the nature of the intervention being tested. It also remains to be seen whether this new policy is enforced, or whether any other journals will follow suit. However, this is undoubtedly a good start and moves things in the right direction.


The European Medicines Agency held a meeting in November 2012 to discuss how regulators could make raw data available. The outcome of this meeting is that new policies will be developed over the coming months, with the stated intention that they will come into force on 1 January 2014.

And despite the “prisoner’s dilemma” aspects to releasing raw data unilaterally, GlaxoSmithKline has announced that they intend to make their raw data available anyway. This is a further welcome development.

None of those initiatives proposes simply posting raw data on a publicly accessible website. Rather, data are to be made available on request to bona fide researchers (although how bona fide researchers will be distinguished from the other kind is not clear). Although in some respects making the data completely open would be desirable, there is another important barrier to doing this: patient confidentiality.

Obviously no-one is proposing to include patients’ names and addresses in a dataset. But nonetheless, a typical dataset from a clinical trial would include details such as date of birth, sex, medical history, and the name of the institution at which the patient is being treated. That would frequently be sufficient to identify an individual patient. This gives us a tricky problem.

At the EMA meeting on data transparency, I was surprised to see (via Twitter; I wasn’t able to attend in person) the psychiatrist David Healy apparently suggesting in this context that it was better to ask not whether patients consent to having their data shared, but whether they “consent to it being hidden”. I find that attitude shocking. Patients do not have to consent to their data being kept confidential. Confidentiality is, and must continue to be, the default position in a doctor-patient relationship.


We must not underestimate the importance of confidentiality in medicine. Patients expect their data to be kept confidential, although no doubt many would be happy to have their de-identified data posted on websites if the benefits to research were explained to them. Some might argue that the benefits of sharing data more widely trump the rights of the few patients who would prefer that their data were not shared. I think we go down a very dangerous route indeed if we attempt to use that kind of utilitarian argument to override patients’ individual rights.

How, then, do we solve that problem?

Current plans to make data available on request to bona fide researchers seem a good compromise. In the future, it would be nice to see raw data made more easily available, but this would, in my opinion, need two things to happen first. First, researchers need to take great care to ensure that patients are not identifiable, perhaps by removing data fields such as date of birth that allow patients to be identified. Defining what makes patients identifiable is complex, but work has already been done in this area. Second, the surest way of avoiding ethical problems with posting the data would be if patients were to give specific consent when they consent to join the trial. We should all be writing something about this on our informed consent forms for clinical trials already, but I suspect few of us are.
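As a rough illustration of the first point, here is a minimal sketch of stripping identifying fields from a single patient record before release. The field names (`date_of_birth`, `site_name`) and the choice of ten-year age bands are assumptions made for the example, not a recognised de-identification standard, and removing these fields alone would not guarantee that a patient cannot be identified from the combination of what remains.

```python
from datetime import date

# Hypothetical field names; a real trial database will use its own.
FIELDS_TO_DROP = {"date_of_birth", "site_name"}


def age_band(date_of_birth, reference_date, width=10):
    """Return a coarse age band such as '40-49' in place of an exact date of birth."""
    age = reference_date.year - date_of_birth.year
    # Subtract a year if the birthday has not yet occurred in the reference year.
    if (reference_date.month, reference_date.day) < (date_of_birth.month, date_of_birth.day):
        age -= 1
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"


def deidentify(record, reference_date):
    """Drop the listed fields and replace date of birth with an age band."""
    cleaned = {k: v for k, v in record.items() if k not in FIELDS_TO_DROP}
    cleaned["age_band"] = age_band(record["date_of_birth"], reference_date)
    return cleaned


record = {
    "subject_id": "001",
    "date_of_birth": date(1970, 6, 15),
    "sex": 2,
    "site_name": "Example Hospital",
}
print(deidentify(record, reference_date=date(2013, 1, 1)))
```

Replacing the date of birth with a band rather than deleting it outright keeps age usable in analyses while discarding the exact date.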

Another problem is simply one of resources. Raw data don’t post themselves. There is a cost to making data available. Ideally, it would be nice if it could be confirmed that the benefits of making raw data available outweigh the costs, but I’m not holding my breath waiting for anyone to do that analysis. That’s not the way regulators work. Regulators, on the whole, have no problem with things that increase costs as long as those costs result from gainful employment for regulators.


There is also the problem of data standards and meta-data. A dataset is no use to another researcher unless it is clear what all the various data fields mean. For example, in a field labelled “sex”, where the data are coded as 1 and 2, does that mean 1 = men and 2 = women, or vice versa? And if we come across a variable in the dataset labelled “bp_hms_x3”, then what on earth could the data in that column actually be?
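A minimal sketch of the kind of metadata that resolves this: a data dictionary shipped alongside the dataset. The coding shown for “sex” (1 = male, 2 = female) is an assumption made purely for illustration, and the dictionary deliberately has no entry for “bp_hms_x3”, because nothing in the dataset itself tells us what it means.

```python
# Hypothetical data dictionary; the codes for "sex" are assumed for illustration.
DATA_DICTIONARY = {
    "sex": {
        "label": "Sex of participant",
        "codes": {1: "male", 2: "female"},
    },
}


def decode(variable, value, dictionary=DATA_DICTIONARY):
    """Translate a coded value into its meaning, or fail loudly if undocumented."""
    entry = dictionary.get(variable)
    if entry is None:
        raise KeyError(f"No metadata for variable {variable!r}")
    if value not in entry["codes"]:
        raise ValueError(f"Value {value!r} is not a documented code for {variable!r}")
    return entry["codes"][value]


print(decode("sex", 2))
```

Calling `decode("bp_hms_x3", 120)` raises a `KeyError`, which is the programmatic version of the problem: without documentation, the column cannot be interpreted.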

Happily, those problems are the most easily solved of all the problems I’ve mentioned so far. The Clinical Data Interchange Standards Consortium (CDISC) has been doing heroic work in data standardisation for many years. If we were all to make our data available using CDISC’s Study Data Tabulation Model, then anyone else using the dataset would know exactly what the data mean.

This is an area where practices are developing rapidly, and I’m sure we will see many changes over the next few years. I have volunteered to join the EMA’s working party on developing standards for releasing raw data, so I hope I’ll be able to keep up with any developments. We would all be wise to watch closely to see what happens.

About the author:

Adam is an experienced medical writer and statistician. Before setting up Dianthus Medical in 1999, he worked as a medical writer for both a small contract research organisation and a large medical communication agency. Adam has a PhD in organic chemistry from the University of Cambridge and an MSc in medical statistics from the London School of Hygiene and Tropical Medicine.

He takes an active role in the European Medical Writers Association (EMWA), and was president of the association in 2004-2005. In 2003, he set up EMWA’s ghostwriting task force, as a result of which he was co-author of EMWA’s guidelines on the role of medical writers in peer-reviewed publications. He is a regular workshop leader for EMWA’s training workshops and a columnist in their journal, The Write Stuff, and was among the first few people to be awarded EMWA’s advanced professional development certificate. He is also a fellow of the Institute of Clinical Research and a Chartered Scientist.

In his spare time, he enjoys cooking, gardening, karate, long-distance running, travel, and hill walking (but not usually all at the same time).

He can be contacted via Twitter at @dianthusmed
