Simulated data, real-world patient insights

The UK has some of the best patient data available anywhere in the world, and a preeminent position in the collection of cancer data at both an individual tumour and patient level.  

In trying to realise the potential of this resource, the health sector faces the conundrum of protecting patients’ identities while allowing life science researchers access to their data and enabling industry to benefit from the vital disease insights it can bring. 

A newly-launched simulated data initiative is set to both cement the UK as a destination for oncology research and solve many of the challenges associated with improving healthcare through the wider use of patient data. 

Known as the Simulacrum, the synthetic database was developed by Health Data Insight in partnership with IQVIA and AstraZeneca. It comprises only artificial data modelled on real patient data collected by the National Cancer Registration and Analysis Service (NCRAS) in England, and it is the first major application of synthetic data to population-level cancer data. 

A pharmaphorum webinar, held to coincide with the Simulacrum going live at the end of November 2018 as a free-to-access resource, brought together an expert panel to discuss the new database, the thinking behind it and the value it could offer pharmaceutical researchers. 

Simulated data in the cancer care pathway 

One of the first points the panel addressed was what simulated data is and how it differs from real patient data. As Jem Rashbass, medical director at Health Data Insight, explained: “It’s data that looks and feels like patient data, but that hasn’t come from real patients. The maths behind the Simulacrum is sufficiently complex that we built it from anonymous data so there is never a real patient in there.” 

Rashbass, who is also responsible for Public Health England’s NCRAS, is keenly aware of the challenges in cancer care. “One in two of the population is likely to get cancer in their lifetime and what we need to understand is what the risk factors are for individuals, what ways can we intervene in those risk factors, how can we prevent cancer, how can we detect cancer earlier and, ultimately, how we can ensure the best possible life after a cancer diagnosis.” 

Doing this requires data, he explained: “We need data across the whole care pathway that removes variation in treatment access and practice. We need timely data, accurate data and very detailed data. As we move towards an era of personalised medicine, to understand how to deliver the best care to the individual, we need information from very large numbers of people.” 

NCRAS collects data on every NHS patient diagnosed with cancer and holds the largest data collection of its kind in the world. This high-quality data is abstracted into the Cancer Analysis Service (CAS), but at this level it still contains potentially identifiable information and is therefore highly protected, with significant controls on its access and use. 

“That is why we set about to create a dataset that had the look and feel and the same structure as the data held by Public Health England but was in fact entirely synthetic – and that’s what the Simulacrum is,” Rashbass said. 

In fact, the database is a complete replica of the data model used in CAS. “The elegance of this approach is that people can write and test their queries on the Simulacrum, knowing that should they wish to do so in real data they wouldn’t need to modify their query,” Rashbass explained. 

Patient data and the patient view 

Although the formal basis for collecting and using NHS data in this way is clear, understanding how patient groups view the concept of simulated data is vital. 

Bringing this perspective to the discussion was Chris Carrigan, an expert data adviser and advocate at use MY data, a movement of patients, carers and relatives focused on the benefits of health data sharing. Reflecting on conversations he has had with members of the movement, he said: “Fundamentally they see data sharing as important. There’s a recognition that we are in a very strong, unique position with the NHS and that should be protected and not exploited.” 

He added: “Openness and transparency are the key points, and their lack in the past has caused some problems. Patients out there know that this is difficult, particularly those in the cancer world who have been exposed to the work of the registration service and the complexity of the cancer data that exists out there. But there’s a real keenness to look for ways to fix that with new models of using the data.” 

Generating real-world insights from simulated data  

Now that the Simulacrum is live and has the backing of the NHS, Public Health England and patient groups, how can researchers use it to develop better cancer medicines and improve patient outcomes? 

One key area will be generating real-world evidence in cancer, as Adam Reich, an expert in real-world and analytics solutions at IQVIA, explained. 

“A lot of time and effort has been put into exploring the use of Simulacrum to create a model to access the real data being collected by the National Cancer Registry and Analysis Service and show that it’s a model that researchers and industry can use going forward,” he said. 

“Simulated datasets can also allow us to potentially answer some questions without requiring actual patient data at all. Meanwhile, in many more circumstances, the Simulacrum will tell us whether there are enough patients to run a study using anonymous, aggregated data, where individual patients cannot be identified,” Reich said. 

External stakeholders such as payers and providers, especially in the UK, have already recognised the value of real-world evidence in supporting decisions on reimbursement and access to medicines, and in some cases have mandated its use. Here too, Reich said, there were benefits. “The application of simulated datasets, and Simulacrum particularly, can help address challenges with generating real-world evidence and establish a gateway to improved real-world evidence in oncology.” 

Simulacrum: unlocking the potential of real-world evidence 

The launch of the Simulacrum is just the first step in bringing simulated data into the mainstream and giving researchers real-world applications for it in drug development, market access and approval. 

The simulated data within the database was originally generated using a machine learning algorithm and, as Rashbass noted, “in the future this rich data source could have a lot of potential for use with artificial intelligence”. 

Expansion of the Simulacrum is also on the agenda. “Obviously the data will go out of date over time and our aim is to update the data, and we anticipate at least annual updates of the dataset to coincide with the annual updates of cancer registration data,” said Rashbass. “However, the other thing that we need to add in is additional datasets and there is work going on at the moment to add in radiotherapy and molecular data to increase the richness of the datasets that are there.” 

Ultimately, as Reich said, the intention is “to build on it and add to it so that we can use the data to help more people and improve outcomes”.