Generative AI correctly diagnoses 39% of complex, challenging medical cases

Jun 20, 2023

The generative artificial intelligence (AI) model GPT-4 correctly diagnosed 39% of challenging medical cases in a study published in JAMA.

GPT-4 is the successor to the technology behind ChatGPT, the chatbot that took discussions of generative AI into the mainstream late last year. Such generative models are trained on large datasets and can provide detailed text-based responses to written prompts. Because the models can interpret and respond to language, they could theoretically mimic the deductions of doctors and diagnose disease.

To test that theory, researchers at Beth Israel Deaconess Medical Center (BIDMC) applied GPT-4 to a set of 70 clinicopathological case conferences (CPCs). CPCs are complex and challenging cases published in the New England Journal of Medicine for educational purposes. The reports include clinical and laboratory data, imaging studies, and histopathological findings.

“We wanted to know if such a generative model could ‘think’ like a doctor, so we asked one to solve standardized complex diagnostic cases used for educational purposes. It did really, really well,” Dr. Adam Rodman, co-director of the Innovations in Media and Education Delivery Initiative at BIDMC, said in a statement. Rodman is an instructor in medicine at Harvard Medical School.

GPT-4 matched the final CPC diagnosis 39% of the time. For 64% of cases, the AI included the final CPC diagnosis in its list of possible conditions that could account for a patient’s symptoms, medical history, clinical findings, and laboratory or imaging results.

The JAMA paper lacks a control arm showing how GPT-4’s success rate compares with that of doctors, but the results of another study suggest the AI performed well. In that study, published last year, two people described as “medical doctors, but non-experts” reached the correct diagnosis for 28% of the 50 CPCs they reviewed.

Although GPT-4 performed well in the study, its ability to solve CPCs does not mean it can execute the process physicians use to diagnose patients. Experts create CPCs for physicians to learn from and include all the data needed to reach a diagnosis. In the real world, physicians start with limited information and need to form hypotheses and generate data to test them, working step by step toward a diagnosis.

Even so, the study suggests generative AI could play a role in supporting doctors, as first author Dr. Zahir Kanjee, a physician at BIDMC and assistant professor of medicine at Harvard Medical School, said in a statement.

“It has the potential to help physicians make sense of complex medical data and broaden or refine our diagnostic thinking. We need more research on the optimal uses, benefits, and limits of this technology, and a lot of privacy issues need sorting out, but these are exciting findings for the future of diagnosis and patient care,” Kanjee said.