Study finds big differences between top symptom checker apps
Apps that patients can use to report symptoms and seek advice on treatment are highly variable in their accuracy, but some come close to matching GPs, says a new study.
The peer-reviewed study – published in the journal BMJ Open – compared eight of the most popular symptom assessment apps to a control group of seven GPs against a series of 200 primary care scenarios or “vignettes” designed to mimic real-world patient experiences and gleaned from the NHS 111 telephone triage service.
The apps – Ada, Babylon, Buoy, K Health, Mediktor, Symptomate, WebMD, and Your.MD – were put through their paces against three criteria, namely the breadth of content covered, and the accuracy and safety of advice given compared to a GP consultation.
The researchers from Brown University in the US and German digital health company Ada Health – which developed the Ada app – suggest there are wide differences between the apps on all of these measures, raising questions about whether some are fit for purpose in clinical settings.
The paper found that coverage of the conditions in the vignettes ranged from 51.5% with Babylon to 99% with Ada, with an average overall of 69.5%, while GPs provided 100% coverage.
Those at the bottom of the coverage list were not able to suggest conditions for significant numbers of cases, including scenarios involving children, patients with a mental health condition, or pregnancy, according to the German company.
Ada was also rated the most accurate, suggesting the right condition among its top three suggestions 71% of the time, while the average across the other apps was just 38% – meaning they failed to identify the correct condition in the majority of cases. Once again, GPs came top with 82% accuracy.
On the final measure, most apps gave safe advice most of the time, but only three came close to the 97% rating achieved by doctors. Among these, Ada came top again, matching the doctors at 97%, followed by Babylon at 95% and Buoy at 80%.
It’s worth pointing out that some of the apps that fared less well – including Buoy, K Health and WebMD – were designed for the US market, so may have been penalised by the use of NHS-derived vignettes.
Noting that symptom assessment apps are now used by tens of millions of patients annually in the US and UK alone, Dr Hamish Fraser of Brown's Center for Biomedical Informatics said the study is an important indicator of how valuable they are.
“Compared to a similar study from five years ago, this larger and more rigorous study shows improved performance with results closer to those of physicians,” according to Fraser. “It also demonstrates the importance of knowing when apps cannot handle certain conditions.”
The results could also be used to determine which of the apps are ready for clinical testing in observational studies and then randomised controlled trials, he added.