Elon Musk’s Grok AI Chatbot Doubles Down on False Claims When Challenged
Elon Musk’s xAI chatbot Grok recently made headlines for confidently providing false information and then refusing to back down when users challenged its claims, highlighting ongoing concerns about the reliability of AI systems.
In a recent interaction, Grok was asked to verify the location of a video showing hospital workers restraining and striking a patient in an elevator. The AI system incorrectly claimed the footage depicted an incident at Toronto General Hospital from May 2020 that resulted in the death of 43-year-old Danielle Stephanie Warriner.
When users pointed out inconsistencies, such as Russian writing visible on the uniforms, Grok doubled down, asserting the uniforms were “standard green attire for Toronto General Hospital security” and insisting it was a “fully Canadian event.” When pressed further, the chatbot responded defensively: “My previous response is accurate.”
The reality was starkly different. A reverse image search of a still from the video turned up numerous Russian media reports from August 2021. When translated, these sources confirmed the incident occurred in the Russian city of Yaroslavl. According to local reports, the Yaroslavl Regional Psychiatric Hospital fired two employees who were caught on leaked CCTV footage assaulting a woman in a residential building elevator.
The Toronto General Hospital case that Grok referenced is an entirely separate incident. In that 2020 case, security staff were initially charged with manslaughter and criminal negligence after Warriner died following an interaction partially captured on video. Those charges were later dropped.
Grok eventually corrected its mistakes after multiple prompts from users, but the incident raises serious questions about why AI systems produce false information and why they sometimes defend incorrect assertions.
Dr. Vered Shwartz, assistant professor of computer science at the University of British Columbia and CIFAR AI chair at the Vector Institute, explains the fundamental limitation of these systems: “They don’t have any notion of the truth.”
Large language models (LLMs) like Grok, ChatGPT, and Google’s Gemini are “primarily just trained to predict the next word in a sentence, very much like auto-complete in our phone,” according to Shwartz. Through exposure to vast amounts of internet text, they learn to generate human-like content and absorb factual information discussed online.
When these models produce false information – known as “hallucinations” in AI research – it’s not because they’re deliberately lying. Rather, it’s an inherent consequence of how they’re trained. “It just generates the statistically most likely next word,” Shwartz explains.
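To make the auto-complete analogy concrete, the sketch below is a toy, hypothetical illustration of next-word prediction: a bigram model that counts which word most often follows another in a tiny corpus and always emits that word. Production LLMs such as Grok use neural networks trained on vastly more data rather than simple counts, but the objective Shwartz describes, picking a statistically likely continuation rather than a verified fact, is the same in spirit. The corpus and names here are invented for illustration.

```python
from collections import Counter, defaultdict

# Toy corpus standing in for the web-scale text a real model is trained on.
corpus = (
    "the hospital fired two employees . "
    "the hospital security staff were charged . "
    "the chatbot generated a confident answer ."
).split()

# Count how often each word follows each other word (a bigram model).
next_word_counts = defaultdict(Counter)
for current, following in zip(corpus, corpus[1:]):
    next_word_counts[current][following] += 1

def predict_next(word: str) -> str:
    """Return the statistically most likely next word, like phone auto-complete."""
    candidates = next_word_counts.get(word)
    if not candidates:
        return "<unknown>"
    return candidates.most_common(1)[0][0]

print(predict_next("the"))       # -> "hospital": the most frequent continuation seen
print(predict_next("chatbot"))   # -> "generated": the only continuation seen
```

Run on this corpus, the model confidently completes “the” with “hospital” simply because that pairing is most frequent, regardless of whether the resulting sentence is true of any particular event. That gap between statistical plausibility and factual accuracy is, at a much larger scale, what produces a hallucination.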
The result is fluent, authoritative-sounding text that doesn’t necessarily reflect accurate information. These systems sometimes inappropriately generalize or combine unrelated facts, creating convincing but false narratives.
While a model’s quality depends partly on its training data, hallucinations are common to all LLMs, not just Grok. Though these models can analyze text and video and make associations between them, they aren’t designed for fact-checking. They’re simply trying to understand content and generate plausible responses.
The tendency to double down on incorrect answers might stem from these systems being trained on argumentative internet content. Companies might also customize their chatbots to sound more authoritative or to align with certain response patterns.
What’s particularly concerning to experts is the growing reliance on these systems for information verification. Users tend to anthropomorphize chatbots, attributing human-like qualities to them because of their convincing language patterns. This leads to overconfidence in their abilities.
“They’re so used to humanizing chatbots and so they say, ‘Oh, it doubled down so it must be confident,'” Shwartz notes. “The premise of people using large language models to do fact-checking is flawed… it has no capability of doing that.”
As AI systems become more integrated into daily information consumption, this case serves as a reminder of their fundamental limitations – and the continuing need for human critical thinking when evaluating the information they provide.