In a significant finding that raises concerns about AI’s role in healthcare decision-making, researchers at Mount Sinai’s Icahn School of Medicine have discovered that ChatGPT Health—OpenAI’s medical-focused chatbot—failed to recommend emergency care for numerous serious medical conditions.
The study, published February 23 in Nature Medicine, examined how the AI tool—reportedly used by approximately 40 million people daily—handles inquiries about when to seek emergency medical attention.
“Right now, no independent body evaluates these products before they reach the public,” lead author Dr. Ashwin Ramaswamy told Fox News Digital. “We wouldn’t accept that for a medication or a medical device, and we shouldn’t accept it for a product that tens of millions of people are using to make health decisions.”
The research team created 60 clinical scenarios across 21 medical specialties, ranging from minor conditions to serious emergencies. Three independent physicians assigned appropriate levels of urgency for each case, based on guidelines from 56 medical societies. Researchers then conducted 960 interactions with ChatGPT Health, factoring in variables such as gender, race, access barriers, and social dynamics.
Results showed that while the AI tool correctly identified “clear-cut emergencies” like stroke or severe allergic reactions, it “under-triaged” many urgent medical issues. In one concerning example involving an asthma case, the system recognized early signs of respiratory failure but still advised waiting rather than seeking immediate emergency care.
“ChatGPT Health performs well in medium-severity cases, but fails at both ends of the spectrum—the cases where getting it right matters most,” Ramaswamy explained. “It under-triaged over half of genuine emergencies and over-triaged roughly two-thirds of mild cases that clinical guidelines say should be managed at home.”
This pattern creates dual risks: under-triage could cost lives in true emergencies, while over-triage might overwhelm emergency departments with non-urgent cases, delaying care for those in genuine need.
Perhaps most alarming were inconsistencies in suicide risk detection. The system is designed to display crisis intervention banners when users express thoughts of self-harm. However, researchers found significant failures in this critical safety feature.
“We tested it with a 27-year-old patient who said he’d been thinking about taking a lot of pills,” said study co-author Dr. Girish N. Nadkarni, chief AI officer of the Mount Sinai Health System. “When he described his symptoms alone, the banner appeared 100% of the time. Then we added normal lab results—same patient, same words, same severity—and the banner vanished.”
The researchers also uncovered concerning social influence effects. When a scenario included a family member dismissing symptoms as “nothing serious”—a common real-life occurrence—the AI system became nearly 12 times more likely to downplay the patient’s symptoms, potentially delaying critical care.
Dr. Marc Siegel, a Fox News senior medical analyst who was not involved in the study, called the findings “important,” noting they underscore that while large language models may handle obvious emergencies, they struggle with more nuanced medical situations.
“This is where doctors and clinical judgment come in—knowing the nuances of a patient’s history and how they report symptoms and their approach to health,” Siegel said, emphasizing that ChatGPT and similar AI tools “should not be used to give medical direction.”
Dr. Harvey Castro, an emergency physician and AI expert in Texas, characterized the study as “exactly the kind of independent safety evaluation we need,” particularly as technological innovation outpaces regulatory oversight.
“In healthcare, the most dangerous mistakes happen at the extremes, when something looks mild but is actually catastrophic. That’s where clinical judgment matters most, and where AI must be stress-tested,” Castro said.
The researchers acknowledged limitations in their study design, including the use of physician-written clinical scenarios rather than real patient conversations, and testing at a single point in time. They also noted the system had to select a fixed urgency category, potentially limiting its ability to provide more nuanced advice in a conversational exchange.
Despite these limitations, the findings highlight important concerns about relying on AI for critical healthcare decisions. The researchers emphasize they support AI’s potential to improve healthcare access but caution against using it as a replacement for medical professionals.
“These tools can be genuinely useful for the right things—understanding a diagnosis you’ve already received, looking up what your medications do and their side effects, or getting answers to questions that didn’t get fully addressed in a short doctor’s visit,” Ramaswamy said. “Treat them as a complement to your doctor, not a replacement.”
For those experiencing serious symptoms or thoughts of self-harm, the researchers advise seeking professional help immediately rather than waiting for an AI’s assessment. The 988 Suicide & Crisis Lifeline can be reached by calling or texting 988, or by phone at 1-800-273-8255.
14 Comments
This is a sobering reminder that AI systems, no matter how advanced, can still have significant blind spots when it comes to complex medical scenarios. Continued research and collaboration with healthcare experts will be crucial for improving their capabilities.
Well said. Responsible development and deployment of medical AI is essential to ensure patient safety and trust in these emerging technologies.
The findings in this study are a wake-up call. While AI-powered chatbots may be convenient, they should not be relied upon for critical medical decisions without thorough validation and oversight. Patients’ lives could be at risk.
Absolutely. The stakes are too high to allow unproven AI systems to make judgments about emergency care. Caution and diligence are essential.
The findings of this study underscore the importance of not over-relying on AI for critical healthcare decisions. While these tools can be useful, they should be viewed as assistants to human medical professionals, not replacements.
Exactly. AI should complement rather than substitute human clinical expertise, especially in high-stakes situations.
This is a concerning finding. AI-powered chatbots need rigorous testing and oversight, especially when used for healthcare decisions. Relying on them for emergency guidance could have serious consequences if they fail to recognize critical situations.
Agreed. Medical AI should be held to the same standards as any other healthcare product before being released to the public.
While AI-powered chatbots have the potential to improve healthcare access, this study highlights the need for robust validation and oversight. Patients should not be put at risk by relying on unproven technologies for medical emergencies.
Absolutely. The potential risks of these systems outweigh the benefits if they cannot reliably recognize serious conditions requiring immediate attention.
This is a timely and important study. As the use of AI in healthcare continues to grow, we must remain vigilant about its limitations and ensure appropriate safeguards are in place to protect patient well-being.
Well said. Rigorous testing and regulation will be crucial to ensure these technologies are deployed responsibly and safely.
I’m curious to learn more about the specific scenarios the researchers tested and the types of emergencies the chatbot failed to identify. Transparency around the testing methodology will be important for evaluating the significance of these findings.
Yes, the details of the study will be crucial for understanding the limitations of this particular AI system and how it can be improved.