AI Chatbots Struggle with Health Misinformation, Study Reveals
Popular artificial intelligence chatbots are performing poorly when answering questions about health topics prone to misinformation, according to new research published in the medical journal BMJ Open. The study raises concerns about the reliability of AI systems as sources of health information at a time when public use of these tools is rapidly expanding.
Researchers from Harbor-UCLA Medical Center, led by Dr. Nicholas B. Tiller, conducted a comprehensive audit of five major chatbots: Google’s Gemini, High-Flyer’s DeepSeek, Meta AI, OpenAI’s ChatGPT, and xAI’s Grok. The evaluation, carried out in February 2025, tested how these AI systems responded to health questions across five sensitive categories: cancer, vaccines, stem cells, nutrition, and athletic performance.
The results paint a concerning picture of AI reliability in healthcare communication. Nearly half (49.6%) of all responses generated by the chatbots were deemed problematic by expert reviewers, with 30% classified as “somewhat problematic” and 19.6% as “highly problematic.”
While performance was relatively consistent across most platforms, Elon Musk’s Grok was the outlier: it generated significantly more highly problematic responses than would be expected if such answers were distributed evenly across the five chatbots, according to the study’s statistical analysis.
The research revealed clear patterns in topic performance. Chatbots demonstrated their strongest capabilities when discussing vaccines and cancer, subjects with well-established scientific consensus and extensive medical literature. However, they struggled considerably with questions about stem cells, athletic performance, and especially nutrition – areas often plagued by conflicting information, evolving research, and commercial interests.
Reference quality was another major weakness identified in the study. The median completeness score for citations was just 40%, with researchers noting that no chatbot produced a fully accurate reference list. Many responses contained hallucinated or entirely fabricated citations – a particularly troubling finding given that proper sourcing is essential for verifying health information.
Readability presented an additional barrier: the chatbots’ responses were generally graded “difficult,” at a college sophomore-to-senior reading level, making them potentially inaccessible to many users seeking health guidance.
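The article does not name the readability formula the study used, but grades like these typically come from a Flesch-style metric, which scores text on average sentence length and syllables per word. Below is a minimal sketch of the Flesch-Kincaid grade-level calculation, with a naive regex-based syllable counter standing in for the more careful tokenizers real readability tools use; the sample sentence and function names are illustrative, not from the study.

```python
import re

def count_syllables(word: str) -> int:
    """Rough syllable estimate: count vowel groups, discounting a trailing silent 'e'."""
    word = word.lower()
    groups = re.findall(r"[aeiouy]+", word)
    count = len(groups)
    if word.endswith("e") and count > 1:
        count -= 1
    return max(count, 1)

def flesch_kincaid_grade(text: str) -> float:
    """Flesch-Kincaid grade = 0.39*(words/sentences) + 11.8*(syllables/words) - 15.59."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * (len(words) / max(len(sentences), 1))
            + 11.8 * (syllables / max(len(words), 1))
            - 15.59)

# Dense, jargon-heavy prose of the kind chatbots often produce scores far
# above the grade level of typical patient-facing health material.
sample = ("Immunotherapy harnesses the adaptive immune system to recognize "
          "and eliminate malignant cells expressing tumor-specific antigens.")
print(round(flesch_kincaid_grade(sample), 1))  # well into the college range
```

For context, a score of roughly 13–16 corresponds to college-level text, while health-literacy guidance commonly recommends writing patient materials at about a sixth-to-eighth-grade level.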
“By default, chatbots do not access real-time data but instead generate outputs by inferring statistical patterns from their training data and predicting likely word sequences,” the study authors explained. “They do not reason or weigh evidence, nor are they able to make ethical or value-based judgments. This behavioral limitation means that chatbots can reproduce authoritative-sounding but potentially flawed responses.”
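To see why that matters, consider a deliberately tiny sketch: a bigram model that has learned nothing except which word tends to follow which in its training text. It bears no resemblance to a production chatbot in scale or architecture, but it illustrates the mechanism the authors describe, in that the statistically most likely continuation wins regardless of whether it is true. The corpus and wording here are invented for illustration.

```python
from collections import Counter, defaultdict

# Invented toy corpus: the false claim appears more often than the accurate one.
corpus = (
    "vitamin c cures the common cold . "
    "vitamin c supports the immune system . "
    "vitamin c cures fatigue . "
).split()

# Tally which word follows which -- the only "knowledge" this model has.
next_word = defaultdict(Counter)
for prev, cur in zip(corpus, corpus[1:]):
    next_word[prev][cur] += 1

def generate(start: str, length: int = 5) -> str:
    """Greedily emit the most frequent continuation at each step."""
    words = [start]
    for _ in range(length):
        candidates = next_word[words[-1]]
        if not candidates:
            break
        words.append(candidates.most_common(1)[0][0])
    return " ".join(words)

print(generate("vitamin"))
# -> "vitamin c cures the common cold": fluent and confident, chosen only
#    because "cures" outnumbers "supports" in the training text.
```

Production systems operate with vastly more data, context, and tuning, which makes their output far more fluent and authoritative-sounding; it does not, by itself, make the output correct.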
The findings come at a critical juncture as AI companies increasingly position their chatbots as helpful tools for information seeking, including in health domains. The healthcare industry has witnessed rapid AI adoption, with applications ranging from administrative support to clinical decision assistance. Market analysts project the global healthcare AI market to exceed $200 billion by 2030.
Public health experts have previously expressed concerns about the potential for AI systems to either combat or amplify health misinformation. While properly designed AI tools could theoretically help address the infodemic of misleading health claims online, this study suggests current mainstream chatbots may instead contribute to the problem.
The research underscores the need for continued improvement in AI systems before they can be considered reliable sources for health information. It also highlights the importance of media literacy and critical thinking when evaluating AI-generated content on sensitive topics like healthcare.
As AI technology integrates further into everyday information seeking, these findings are a reminder that human expertise remains essential for evaluating complex health information. AI outputs should be approached with appropriate skepticism, especially on topics vulnerable to misinformation.