Data hackathon analyses truth behind TB vaccine and COVID-19

Is a century-old vaccine a ‘game-changer’ for COVID-19? Anita de Waard from Elsevier and Radoslav Kirkov from Estafet tell us how a hackathon is harnessing data science to look beyond the hype and seek definitive clinical evidence.

Today, the notion of ‘data science’ has permeated almost every area of society. Words like machine learning, artificial intelligence and deep learning have entered the everyday business lexicon. From government agencies to online retailers, a ‘big data strategy’ is a must-have. This year, as the COVID-19 pandemic has spread, there has been increased talk of statistics, modelling, predictive analytics, and using data to solve the serious issues we face.

But often, what is presented as data science amounts to little more than a spurious correlation between unrelated data sets. The phrase ‘data science’ is often used to describe any form of data analysis, however rudimentary, and regardless of whether it rests on scientific understanding. Given how much faith we increasingly place in algorithms to make decisions on our behalf, whether in our hospitals, our courts or our education system, we need a much deeper understanding of how these correlations are drawn, and what they are based on, if we are to apply data science for good.

This is especially true in the search for effective COVID-19 therapies and a vaccine. Speed is truly of the essence, but the integrity of the science underpinning any clinical recommendation must be maintained. With so many research projects, collaborations and clinical trials under way to contain and prevent the spread of the virus, we have to be clear about how decisions are being made and what the data behind an apparent breakthrough are really telling us.

“In a worst-case scenario, misplaced hype could lead to a sudden rush to buy doses of the BCG vaccine. In nations where TB is widespread, this could put many lives at stake”

Understanding the link between COVID-19 and the BCG vaccine

A good example of this phenomenon is the sudden hype around the Bacillus Calmette–Guérin (BCG) vaccine, which is primarily used against tuberculosis (TB). This century-old vaccine came to prominence recently when a number of early ecological studies (which compare populations rather than individuals) seemed to show a strong correlation between national BCG vaccination policies and lower rates of COVID-19 infection and death.

Some studies described the link as a “game-changer” and a “silver bullet”. On closer examination, however, the correlation proved tenuous, and no clear conclusions could be drawn from it. Indeed, the World Health Organization said that, “Such ecological studies are prone to significant bias from many confounders, including differences in national demographics and disease burden, testing rates for COVID-19 virus infections, and the stage of the pandemic in each country.”

The world-leading TB researcher Prof. Madhukar Pai was also quick to warn of the serious limitations of this approach and the need to be aware of confounding variables. In a worst-case scenario, misplaced hype could lead to a sudden rush to buy doses of the BCG vaccine. For developed nations with low TB rates, this would have little impact. But in nations where TB is more widespread, such as India, a sudden shortage of the BCG vaccine could put many lives at stake.

The focus now must be on providing stronger clinical trial evidence on whether BCG vaccination is linked to the incidence of COVID-19, so that decisions can be led by data. Current ecological studies have clear shortcomings: they take aggregated, country-level data and use it to draw inferences about individuals. If the data are not representative, or confounders are not taken into account, the results can be badly misleading.
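To make the WHO's point about confounders concrete, here is a minimal sketch in Python using entirely synthetic, hypothetical data (not the BCG Atlas or any real study). By construction, each simulated country's COVID-19 death rate depends only on its median age, and BCG has no effect at all; the assumption that younger countries are more likely to keep a universal BCG policy is illustrative only. The raw correlation nonetheless makes BCG look protective, while a partial correlation that adjusts for age makes the apparent effect vanish. Simple adjustment like this is no substitute for randomised trials, since it can only correct for the confounders you think to measure.

```python
import math
import random
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


def residuals(y, x):
    """What is left of y after a least-squares straight-line fit y ~ a + b*x."""
    mx, my = mean(x), mean(y)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum(
        (xi - mx) ** 2 for xi in x
    )
    a = my - b * mx
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]


def simulate(n=5000, seed=42):
    """Hypothetical countries: deaths depend on median age only, never on BCG."""
    rng = random.Random(seed)
    ages, bcg, deaths = [], [], []
    for _ in range(n):
        age = rng.uniform(20, 45)
        # Assumption for illustration: younger populations more often keep universal BCG.
        p_bcg = 0.85 if age < 32 else 0.15
        bcg.append(1.0 if rng.random() < p_bcg else 0.0)
        ages.append(age)
        # Hypothetical deaths per million rise with median age; BCG plays no part.
        deaths.append(10 * (age - 20) + rng.gauss(0, 40))
    return ages, bcg, deaths


ages, bcg, deaths = simulate()
raw = pearson(bcg, deaths)
# Partial correlation: correlate what remains of each variable once age is regressed out.
adjusted = pearson(residuals(bcg, ages), residuals(deaths, ages))
print(f"raw correlation:        {raw:+.2f}")
print(f"age-adjusted (partial): {adjusted:+.2f}")
```

Running the script prints a clearly negative raw correlation, which an ecological study might read as protection, and an age-adjusted value close to zero; the exact figures depend on the random seed.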

Establishing an evidence-backed link

The only way to truly understand the relationship between COVID-19 and the BCG vaccine is to conduct randomised trials combined with deep analysis of existing data. To that end, Estafet and Elsevier have initiated a two-stage hackathon. The two organisations are working with the BCG World Atlas team, led by Dr Alice Zwerling, an infectious disease specialist at the University of Ottawa. The BCG Atlas is an open-source database of global BCG vaccination policies and practices, founded in 2011.

Many of the ecological studies mentioned above were based on data from the BCG Atlas, so the first stage of the hackathon aimed to extend and improve the Atlas with additional data and health records on BCG vaccination, located using natural language processing (NLP) methods. Thirty volunteers worldwide took part, including judges, organisers and data gatherers, and prizes were awarded to those judged to have extended the data most. The winner was Dimitrina Zlatkova of Sofia University, who contributed 57 additional data points, followed by developer Marouane Benmeida of Morocco, who added 33.

The hackathon now moves to stage two, in which the volunteers will seek to answer a series of questions, such as whether BCG vaccination is causally related to reduced COVID-19 mortality, or whether other factors, such as lockdowns and the average age of the population, are responsible for the differences in mortality rates. If BCG vaccination does reduce COVID-19 mortality, what are the key factors? For example, how long does the immunity from BCG last after vaccination? Does the strain of BCG vaccine affect immunity? The team is now looking for more volunteers to get involved as the hackathon progresses. Once the hackathon is complete, its most valuable insights will be shared with the ongoing BCG COVID-19 clinical trials.

Data science for good

When it comes to COVID-19, data science will certainly be critical, but what is vital is the blend of scientific understanding and technical acumen that good data science brings together.

It is the job of all of us engaged in data science projects, whether in academic, commercial or government research, to stem the hype. It is important to assess the veracity of a claim before accepting any conclusions, and to empower the public to do the same. This habit of mind is important not only in the development of treatments and vaccines, but also paramount to establishing broad public trust in data-led decision making.

About the authors

Anita de Waard is VP research collaborations at Elsevier and Radoslav Kirkov is technology director at Estafet.