Forget the doctor, why not just ask ChatGPT?
In a study of random patient questions drawn from the Reddit social media platform, evaluators responded more positively to answers provided by ChatGPT than to those written by a physician.
That may seem a disturbing finding, coming just as Geoffrey Hinton – the so-called 'godfather of AI' at tech giant Google – has quit the company so that he can speak freely about the dangers of the emerging technology, which he says is developing at an alarming pace and starting to produce a form of intelligence very different from our own.
The researchers behind the work say, however, that it demonstrates the potential of AI-powered chatbots to draft patient answers – for example, email responses to questions – that a doctor could review before sending, reducing workload and the risk of burnout.
The study found that chatbot responses to 195 patient questions were preferred over physician responses, and were rated significantly higher for both quality and empathy by a panel of licensed healthcare professionals who assessed the answers without knowing their source.
All told, chatbot answers were more than three times as likely to be rated good or very good in quality, and around ten times as likely to be rated empathetic or very empathetic.
The multidisciplinary team – led by behavioural scientist John Ayers of the University of California San Diego, La Jolla – note that this is a preliminary, subjective assessment, and that randomised trials will be needed to assess whether the use of AI assistants could “improve responses, lower clinician burnout, and improve patient outcomes.”
Commentators have been quick to point out the limitations of the study, and particularly that the responses were provided without the benefit of a patient’s medical history or other relevant context.
They also voiced concerns that the tools could eventually end up being used without physician oversight – which could be dangerous in a medical setting.
“From the examples of answers shown in the paper, the doctors gave succinct advice, whereas ChatGPT’s answers were similar to what you would find from a search engine selection of websites, but without the quality control that you would get by selecting (say) an NHS website,” commented Prof Martyn Thomas, professor of IT at Gresham College in London, UK.
“ChatGPT has no medical quality control or accountability, and [large language models] are known to invent convincing answers that are untrue,” he said. “If patients just want sympathy and general information, ChatGPT would seem to offer that.”
One key difference noted was that the ChatGPT responses tended to be much longer than those of physicians, raising questions about whether those assessing the answers were truly blinded to the source. That may also have skewed the perception of empathy, according to IT specialist Prof James Davenport at the University of Bath.
AI expert Prof Nello Cristianini – also at Bath – said the study was “impressive”, albeit with many limitations, and that it showed ChatGPT can perform as well as doctors at the simple function of providing information.
“However, this ‘text only’ interaction is the natural mode in which GPT is trained, and not the natural setting for human doctors,” he continued. “Framing the comparison in terms of textual prompt and textual answer means missing a series of important points about human doctors. I would prefer to see these tools used by a doctor, when addressing a patient, in a human-to-human relation, which is part of the therapy.”
Others see an opportunity for the technology to help doctors improve their performance. “Chatbots could also serve as educational tools to help professionals identify opportunities for responding more empathetically to their patients,” said Dr Heba Sailem, head of the biomedical AI and data science group at King’s College London.