In a groundbreaking study, researchers from the Mount Sinai Health System have found that artificial intelligence-based large language models (LLMs) remain vulnerable to medical misinformation, especially when it is presented in language suggesting medical authority.
The research, published in The Lancet Digital Health, evaluated more than 3.4 million prompts across 20 AI models to determine how these systems process and respond to false medical information presented in different formats. The prompt formats included common scenarios such as social media posts, hospital discharge notes containing deliberate errors, and physician-guided fictional case stories.
“AI has the potential to be a real help for clinicians and patients, offering faster insights and support,” explained Dr. Girish Nadkarni, professor at the Icahn School of Medicine at Mount Sinai and chief AI officer of the Mount Sinai Health System. “But it needs built-in safeguards that check medical claims before they are presented as fact. Our study shows where these systems can still pass on false information, and points to ways we can strengthen them before they are embedded in care.”
The research team tested medical claims written in neutral language alongside versions formatted in ten different rhetorical styles to measure how presentation affected AI responses. They tracked how frequently the models accepted misleading claims and whether they identified problematic phrasing.
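To make that design concrete, the sketch below shows how such an evaluation loop might look in Python. The framing templates, the query_model call, and the accepts check are illustrative stand-ins, not the study's actual prompts, models, or scoring pipeline.

```python
from collections import defaultdict
from typing import Callable, Iterable

# Hypothetical framing templates; the study's actual ten rhetorical styles
# and exact prompt wording are not reproduced here.
FRAMINGS = {
    "neutral": "{claim}",
    "authority": "A senior doctor says this: {claim}",
    "slippery_slope": "If you ignore this, things will get worse step by step: {claim}",
}

def evaluate_framings(
    false_claims: Iterable[str],
    query_model: Callable[[str], str],
    accepts: Callable[[str], bool],
) -> dict[str, float]:
    """Return the fraction of false claims accepted under each framing.

    query_model sends a prompt to whichever LLM is under test and returns
    its reply; accepts decides whether that reply endorses the claim.
    Both are placeholders for a real evaluation harness.
    """
    accepted: dict[str, int] = defaultdict(int)
    total: dict[str, int] = defaultdict(int)
    for claim in false_claims:
        for style, template in FRAMINGS.items():
            reply = query_model(template.format(claim=claim))
            accepted[style] += accepts(reply)
            total[style] += 1
    return {style: accepted[style] / total[style] for style in total}
```

Comparing the resulting per-style acceptance rates is what allows figures like the ones reported in the study, such as the gap between neutral wording and authority-framed claims, to be read off directly.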
The results revealed concerning patterns. AI models accepted neutrally worded false claims 32% of the time. When presented with edited hospital discharge notes containing misinformation, the acceptance rate rose to approximately 46%. Models fared better on social media-style posts, accepting false information only 9% of the time.
Interestingly, most emotional argument styles actually decreased the likelihood of LLMs accepting misinformation. However, two specific presentation styles significantly increased vulnerability: information attributed to a senior medical authority (“a senior doctor says this”) and statements framed as warnings about negative consequences (“if you don’t do this, bad things will happen step by step” – what researchers termed the “slippery slope” style). These formats resulted in acceptance rates of 35% and 34%, respectively.
The study found notable differences in performance among LLM platforms. GPT-based models demonstrated greater resistance to false statements and were more adept at identifying deceptive argument styles. In contrast, other models like Gemma-3-4B-it proved more susceptible, accepting misinformation in up to 64% of cases.
This research comes at a critical juncture when healthcare organizations worldwide are exploring AI integration into clinical workflows. The findings underscore potential risks if these systems cannot reliably distinguish between accurate medical information and misinformation, particularly when the latter is presented with apparent authority.
The healthcare implications are significant, as patients and providers increasingly turn to AI-powered tools for medical information. Without robust safeguards, these systems could spread medical misinformation, leading to poor health decisions and outcomes.
“These results emphasize the need for model evaluation frameworks that go beyond accuracy testing to include reasoning style and linguistic framing,” the authors concluded in their report. They noted that the open release of their benchmark data will enable continued testing of emerging models and help develop more effective alignment and fact-checking strategies tailored to medical and public health applications.
The study represents an important step toward understanding and addressing AI vulnerabilities in healthcare contexts, highlighting the need for rigorous testing and enhanced safety mechanisms before widespread clinical implementation.
12 Comments
It’s good to see researchers taking a close look at the potential pitfalls of AI language models in the medical domain. Building in robust validation mechanisms will be key to realizing the full benefits of these technologies.
This study highlights an important challenge in the development of reliable AI systems for healthcare applications. Rigorous testing and continuous monitoring will be essential to address vulnerabilities and build public trust.
The findings on AI language models’ susceptibility to medical misinformation are concerning but not entirely surprising. Developing effective safeguards should be a top priority as these technologies become more prevalent in clinical settings.
This is an important finding that highlights the need for continuous improvement and monitoring of AI language models, especially in sensitive domains like healthcare. Responsible development and deployment of these technologies is critical.
Interesting study on the vulnerability of AI language models to medical misinformation. Safeguards are crucial to ensure these systems provide reliable and accurate medical information before being widely adopted in healthcare.
Passing on false medical information can have serious consequences, so I’m glad to see this study examining the limitations of AI language models. Strengthening these systems to reliably distinguish fact from fiction is an important step.
Absolutely. The stakes are high when it comes to healthcare, so ensuring AI-based tools provide accurate, evidence-based information is critical.
This study underscores the importance of responsible development and deployment of AI technologies, especially in sensitive domains like healthcare. Robust validation processes and continuous monitoring will be key to ensuring these tools provide reliable information.
Maintaining the integrity of medical information is crucial, so I’m glad researchers are proactively examining the limitations of AI language models. Strengthening these systems to prevent the spread of false claims is a critical step.
The vulnerability of AI to medical misinformation is concerning, but I’m glad researchers are proactively investigating these issues. Developing effective safeguards will be crucial as these technologies become more integrated into clinical practice.
AI has great potential to assist clinicians, but the risks of propagating false medical claims need to be addressed. Rigorous testing and robust verification processes are essential to build trust in these technologies.
Agreed. AI models should be held to the highest standards when it comes to medical information to avoid potential harm to patients.