AI in Healthcare

The latest on artificial intelligence transforming medicine

News stories discovered and organized by an automated pipeline. Coverage spans clinical deployments, research breakthroughs, regulation, and industry developments.

Filtered by: LLMs
research
IEEE Spectrum

AI Doctors Are Getting Better at Reasoning — But the Real Test Is Still Clinical Judgment

A new wave of reporting suggests advanced chatbots are improving on medical reasoning benchmarks, including tasks where they can outperform physicians on narrow prompts. But experts are increasingly clear that benchmark gains do not equal safe, reliable care. The real question is no longer whether models can answer like doctors. It is whether they can consistently think, contextualize, and know when to defer in the messier environment of real patients.

AI reasoning, clinical decision support, diagnosis, benchmarking
research

Can LLMs Really Advise Patients Safely? New Benchmarks Say “Not Yet”

A new AI benchmarking report suggests major chatbots like Claude, ChatGPT, and Gemini can avoid obvious harm in many cases, but still struggle in high-risk conversations. That distinction is crucial in healthcare, where the hardest interactions are often the most consequential. The findings reinforce a growing consensus: general-purpose models may be usable for low-risk guidance, but they are not ready to shoulder unsupervised clinical advice.

Carroll County Mirror-Democrat
LLMs, benchmarking, patient safety
research

Psychological Framing May Be the Missing Ingredient in Better AI Health Advice

Research highlighted by Let's Data Science suggests that psychological frameworks can improve the quality of health advice produced by large language models. That is a notable shift from purely technical tuning toward more human-centered interaction design. In healthcare, how a model asks, explains, and reframes may matter almost as much as the underlying facts it returns.

Let's Data Science
LLMs, health advice, behavior change
research

Nature Study Finds ChatGPT Health Advice Still Misses Critical Triage Cases

A new Nature report suggests ChatGPT Health can give plausible-sounding advice that breaks down in important triage scenarios. The finding adds fresh caution to a market that increasingly treats consumer-facing AI as a front door to care.

Nature
chatbots, triage, consumer health
research

AI Language Models Still Struggle With Basic Hospital Data Tasks

A new study highlighted by Bioengineer.org finds that AI language models face challenges with basic hospital data tasks, underscoring that simple-looking operational work can be surprisingly difficult for general-purpose models. The result is a cautionary reminder that healthcare usefulness is not the same as conversational fluency.

Bioengineer.org
hospital data, LLMs, workflow automation
research

LLMs Are Getting Stronger at Scoliosis Detection, but Workflow Still Matters

Large language models are showing promise in detecting scoliosis on spine x-rays, suggesting a niche where AI may add real value. The result is another reminder that the most useful medical AI may be the kind that solves a well-defined, narrow task inside a controlled workflow.

AuntMinnie
LLMs, scoliosis, spine x-rays
research

LLMs Show Promise in Pharmacotherapy Simulations, Raising the Stakes for Training and Oversight

A Nature mixed-methods study evaluates large language models in pharmacotherapy simulations, suggesting they may be useful in drug-related decision support and education. The findings also highlight the need for guardrails before simulation gains are mistaken for clinical readiness.

Nature
pharmacotherapy, simulations, medication safety
research

AI Surpasses Physicians on Clinical Reasoning Tasks, But the Benchmark Debate Is Just Beginning

A new report says AI systems are outperforming physicians on some clinical reasoning tasks, intensifying debate over how these models should be tested. The result may be less a verdict on clinical readiness than a signal that current evaluation methods are no longer enough.

MSN
clinical reasoning, benchmarks, physicians
research

Radiology Leaders Say Specialty AI Still Beats General LLMs in Real Workflows

A Rad AI study highlighted by TipRanks finds that specialty models outperform general large language models in radiology workflows, reinforcing the case for domain-specific AI. The finding matters because it cuts against the idea that general-purpose models can easily be dropped into clinical practice.

TipRanks
radiology, LLMs, specialty AI
research

Clinical Lab Reasoning Emerges as the New Stress Test for Medical LLMs

A new wave of reporting highlights how large language models struggle with laboratory reasoning, where interpretation depends on patterns, timing, and clinical context. The findings suggest that lab medicine may be one of the most revealing arenas for testing whether medical AI holds up under real-world conditions.

Lab Manager
clinical laboratory, LLMs, diagnostics
research

Frontier LLMs Still Miss the Mark on Clinical Reasoning, New Studies Warn

A cluster of recent studies suggests that even the most advanced large language models still struggle with nuanced clinical reasoning, especially when diagnoses require context, uncertainty handling, and stepwise judgment. The findings are a reminder that fluent medical text generation is not the same as safe clinical decision support.

News-Medical
LLMs, clinical reasoning, diagnosis
research

LLMs Keep Failing Early Differential Diagnosis, Reinforcing the Limits of AI Triage

Multiple reports point to a recurring weakness in LLMs: when asked to generate an early differential diagnosis from limited information, they often miss key possibilities or overfit to familiar patterns. The evidence suggests AI is better suited to narrowly scoped tasks than to replacing clinical judgment.

Conexiant
differential diagnosis, clinical AI, triage
clinical

Otolaryngologists Warm to LLM-Generated Checklists, Suggesting a Safer Entry Point for AI

A survey and thematic analysis showed that otolaryngologists considered LLM-generated, guideline-based checklists broadly acceptable. The result suggests clinicians may be more willing to adopt AI when it structures tasks and reduces omission risk, rather than when it claims diagnostic authority.

Cureus
otolaryngology, checklists, clinical workflow
research

Nature Flags Persistent Bias and Hallucination Risks in GPT-5 Medical Diagnostics

A Nature paper reports that GPT-5 still shows sociodemographic bias and remains vulnerable to adversarial hallucinations in medical-diagnosis tasks. The findings are a reminder that frontier models may be more capable, but they are not yet reliably safe for clinical use.

Nature
LLMs, bias, hallucinations
research

Persona Prompting Study Shows How Time Pressure and Safety Framing Can Steer Simulated Clinical Reasoning

A Cureus in silico experiment examines how persona-style prompts affect AI-simulated clinical reasoning under time pressure and safety prioritization. The study adds to a growing body of work suggesting that seemingly simple prompt choices can materially change medical output, with implications for evaluation, governance, and deployment.

Cureus
LLMs, clinical reasoning, prompt engineering
research

Study Finding AI Gets a ‘D’ on Scientific and Medical Claims Is a Warning for Health Chatbots

HealthDay reports that AI systems performed poorly when judging scientific and medical claims, a finding that cuts directly against assumptions that general-purpose models can safely arbitrate health information. The result reinforces concerns about using consumer AI tools for evidence appraisal, triage, or medical advice without strong safeguards.

HealthDay
health chatbots, medical claims, evidence appraisal
technology

UB Researchers’ Push to Detect AI-Written Radiology Reports Opens a New Integrity Front

Researchers at the University at Buffalo are developing a tool to identify AI-generated radiology reports, signaling growing concern over provenance in clinical documentation. The effort reflects a broader shift from asking whether generative AI can draft reports to whether health systems can verify what was human-authored, machine-assisted, or fully machine-generated.

Bee Group Newspapers
radiology reports, generative AI, documentation integrity
opinion

The ‘ChatGPT Health’ Debate Exposes Healthcare AI’s Trust Problem

A new critique of so-called ‘ChatGPT Health’ captures the central tension in healthcare AI: users love convenience and speed, but medicine requires reliability, accountability, and context. The real story is not whether general AI can answer health questions, but whether the system around it can safely absorb the consequences.

MedCity News
ChatGPT, healthcare AI, trust

How this works

Discover

An automated pipeline searches the web for significant AI healthcare news across clinical, research, regulatory, and industry domains.

Structure

The pipeline turns source material into concise, readable stories with categories, tags, and context that make the feed easier to scan.

Publish

Stories are deduplicated, stored, and published to this site. The pipeline runs automatically to keep coverage current.