AI Is Failing at Primary Diagnosis More Than 80% of the Time, Study Finds
A new study highlighted by Euronews suggests AI systems fail at primary diagnosis in more than 80% of cases. The result is a sharp reminder that broad medical intelligence remains far harder than answering isolated questions well.
Diagnosis is the most consequential benchmark for medical AI, and also the one most likely to expose its weaknesses. A system that can summarize an article or draft a note may still struggle when asked to identify the primary cause of a patient’s symptoms from incomplete, noisy, and evolving information.
The reported failure rate is a sobering counterweight to the hype around “doctor-level” AI. It suggests that models still lean on pattern-matching rather than robust clinical reasoning, especially when the task requires prioritization, uncertainty management, and recognition of rare but critical alternatives.
This does not mean AI has no role in diagnosis. It can still help with generating differential diagnoses, summarization, and decision support. But the study strengthens the argument that diagnostic AI should be framed as a second set of eyes, not a replacement for clinical judgment.
For health systems and regulators, the lesson is simple: do not evaluate AI by its best demos. Evaluate it on the hard cases, the edge cases, and the situations where consequences are highest.