Autonomous AI can spot cognitive decline in medical notes
Researchers in the US have developed an agentic AI that can screen for people in the early stages of cognitive impairment by sifting through routine clinical documents.
The team from Mass General Brigham say the AI is fully autonomous, requiring no human intervention after it is deployed, and has shown 98% specificity in real-world validation testing. They have published two large language model (LLM) workflows for the AI approach in Nature's npj Digital Medicine journal.
In the paper, they point to the potential for LLMs to "revolutionise clinical workflows by systematically processing and interpreting the complex narrative threads woven throughout medical documentation."
The researchers have also released an open-source tool, called Pythia, that they say can enable any healthcare system or research institution to develop and deploy autonomous AI screening applications for their own purposes.
"We didn't build a single AI model – we built a digital clinical team," said corresponding author Hossein Estiri, director of the Clinical Augmented Intelligence (CLAI) research group and Associate Professor of Medicine at Massachusetts General Hospital. "This AI system includes five specialised agents that critique each other and refine their reasoning, just like clinicians would in a case conference."
The hope is that AI agents can complement and improve on current tools for detecting cognitive decline, like the Mini-Mental State Examination and Montreal Cognitive Assessment, which can be cumbersome and time-consuming to administer and can yield variable results.
At the same time, with drugs now reaching the market that can help slow down cognitive decline in diseases like Alzheimer's, there is an urgency behind early detection that can allow them to be used when they will be most effective.
"By the time many patients receive a formal diagnosis, the optimal treatment window may have closed,” said co-lead study author Lidia Moura of the Center for Healthcare Intelligence at Mass General Brigham's neurology unit.
"Clinical notes contain whispers of cognitive decline that busy clinicians can’t systematically surface [and] this system listens at scale."
The study analysed more than 3,300 clinical notes from 200 anonymised patients, produced during routine healthcare visits, for signs of cognitive decline. The AI agents' conclusions were reviewed by humans, and where there was disagreement, an independent expert stepped in with a re-evaluation.
The system achieved 91% sensitivity – the ability to correctly find cases – under balanced testing, but that fell to 62% under real-world conditions. On the other hand, specificity – ruling out negative cases – was near-perfect.
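Both metrics come from a simple confusion-matrix calculation. The counts below are hypothetical, chosen only to reproduce figures of the same order as those reported, not taken from the study's data:

```python
def sensitivity(tp: int, fn: int) -> float:
    # Sensitivity (recall): the share of true cases the screen correctly flags.
    return tp / (tp + fn)

def specificity(tn: int, fp: int) -> float:
    # Specificity: the share of non-cases the screen correctly rules out.
    return tn / (tn + fp)

# Hypothetical counts for a balanced 200-patient test set:
tp, fn = 91, 9   # impaired patients flagged vs. missed
tn, fp = 98, 2   # unimpaired patients cleared vs. wrongly flagged

print(f"sensitivity = {sensitivity(tp, fn):.0%}")  # 91%
print(f"specificity = {specificity(tn, fp):.0%}")  # 98%
```

The asymmetry the study reports, with sensitivity dropping in real-world conditions while specificity stays near-perfect, means missed cases (false negatives) rose while false alarms stayed rare.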
Where there was disagreement between the AI and human reviewers, the expert validated the AI's reasoning 58% of the time, suggesting it was making sound clinical judgments that the initial human team had missed.
"We're publishing exactly the areas in which AI struggles," said Estiri. "The field needs to stop hiding these calibration challenges if we want clinical AI to be trusted."
