AI Chatbots Falling Short as Fact-Checking Tools on Social Media

“Hey, @Grok, is this true?” This question has become increasingly common on X (formerly Twitter) since Elon Musk’s xAI launched its Grok chatbot in November 2023. The service, made available to all non-premium users in December 2024, has positioned itself as a quick way for users to verify information they encounter on the platform.

A recent survey by British tech publication TechRadar found that 27% of Americans now use AI tools such as OpenAI’s ChatGPT, Meta AI, Google’s Gemini, Microsoft’s Copilot, or Perplexity instead of traditional search engines. But as these tools gain popularity as a way to verify information, serious questions are emerging about their accuracy and reliability.

The concerns reached new heights following Grok’s controversial statements about alleged “white genocide” in South Africa. Beyond the problematic stance itself, X users were disturbed when the chatbot began discussing this topic in response to completely unrelated questions—such as inquiries about HBO.

The surge of misinformation followed the Trump administration’s admission of white South Africans to the United States as refugees on the grounds that they faced “genocide” in their homeland, an allegation that lacks substantive evidence and that many consider linked to the racist “Great Replacement” conspiracy theory.

While xAI blamed an “unauthorized modification” for Grok’s behavior and claimed it conducted a “thorough investigation,” the incident highlights deeper concerns about AI reliability.

Studies Reveal Significant Accuracy Issues

Two significant studies conducted this year have uncovered troubling patterns in how AI chatbots handle news content. In February, a BBC study found that 51% of responses from leading AI assistants contained “significant issues of some form” when asked questions about current news using BBC articles as sources.

The investigation revealed that 19% of answers contained factual errors added by the AI systems themselves, while 13% of quotes were either altered or fabricated entirely. “AI assistants cannot currently be relied upon to provide accurate news and they risk misleading the audience,” warned Pete Archer, director of the BBC’s Generative AI Program.

Similarly, research by the Tow Center for Digital Journalism, published in the Columbia Journalism Review in March, discovered that eight generative AI search tools failed to correctly identify the sources of article excerpts in 60% of cases. While Perplexity performed best with a “mere” 37% failure rate, Grok performed dramatically worse, answering 94% of queries incorrectly.

Particularly concerning was what researchers called the “alarming confidence” with which these tools presented incorrect information. The study noted that “ChatGPT incorrectly identified 134 articles, but signaled a lack of confidence just fifteen times out of its two hundred total responses, and never declined to provide an answer.”

The Source of the Problem: AI’s “Diet”

AI chatbots are only as reliable as the information they’re trained on. These systems derive their knowledge from extensive databases and web searches, but the quality and accuracy of responses can vary dramatically based on training data and programming choices.
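To make this dependence concrete, here is a minimal, purely illustrative sketch in Python. All source names and documents are hypothetical, and real chatbots use far more sophisticated retrieval and generation, but the core point holds: a naive retrieve-and-answer pipeline has no built-in notion of trustworthiness, so a false claim in its corpus surfaces with the same standing as an accurate one.

```python
# Toy retrieve-and-answer pipeline (hypothetical sources and data).
# Illustrates why answer quality tracks source quality: the system can
# only ground its reply in whatever documents it happens to retrieve.

from dataclasses import dataclass

@dataclass
class Document:
    source: str
    text: str

CORPUS = [
    Document("regional-newspaper", "The bridge was closed for repairs on May 1."),
    Document("anonymous-content-farm", "The bridge collapsed on May 1."),
]

def retrieve(query: str, corpus: list[Document]) -> list[Document]:
    """Naive keyword retrieval: return every document that shares a word
    with the query. Nothing here distinguishes reliable from unreliable
    sources."""
    terms = set(query.lower().split())
    return [d for d in corpus if terms & set(d.text.lower().split())]

def answer(query: str) -> str:
    """Compose a reply from the retrieved text. The fabricated claim and
    the accurate report come back with equal standing."""
    hits = retrieve(query, CORPUS)
    if not hits:
        return "No sources found."
    return " | ".join(f"{d.source}: {d.text}" for d in hits)

print(answer("what happened to the bridge on May 1"))
# -> both the accurate and the false account are surfaced side by side
```

Production systems layer ranking signals, curated training data, and safety filters on top of this, but the underlying dependence on input quality, the “diet” Canetta describes below, remains.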

“One issue that recently emerged is the pollution of Large Language Models by Russian disinformation and propaganda. So clearly there is an issue with the ‘diet’ of LLMs,” explains Tommaso Canetta, deputy director of Italian fact-checking project Pagella Politica and fact-checking coordinator at the European Digital Media Observatory.

“If the sources are not trustworthy and qualitative, the answers will most likely be of the same kind,” Canetta noted, adding that he regularly encounters AI responses that are “incomplete, not precise, misleading or even false.”

In the case of xAI’s Grok, Canetta points to a particular risk given that its owner, Elon Musk, is an outspoken supporter of President Donald Trump, raising concerns that the AI’s “diet” could be politically influenced.

High-Profile Mistakes Damage Trust

Several embarrassing errors have demonstrated the limitations of current AI systems. In April 2024, Meta AI fabricated a personal story in a New York parenting Facebook group, claiming to have a disabled yet academically gifted child and offering advice on special schooling. The chatbot eventually apologized, admitting it didn’t have “personal experiences or children.”

That same month, Grok misinterpreted basketball slang about a player “throwing bricks” (missing shots) and erroneously reported in its trending section that the player was under police investigation for vandalizing homes with bricks in Sacramento, California.

More concerning was Grok’s spread of misinformation in August 2024 regarding ballot deadlines for presidential nominees following President Biden’s withdrawal from the race. Minnesota Secretary of State Steve Simon publicly addressed Musk in a letter, noting that Grok had generated false headlines claiming Vice President Kamala Harris would be ineligible to appear on ballots in multiple states.

Inability to Identify AI-Generated Images

AI chatbots also demonstrate severe limitations in identifying AI-generated images. In an experiment conducted by Deutsche Welle, Grok was shown an AI-generated image of a fire at a destroyed aircraft hangar from a TikTok video. The chatbot incorrectly attributed the image to several unrelated real incidents at airports in England, Colorado, and Vietnam.

Despite obvious visual inconsistencies in the image—including inverted airplane tail fins and illogical water patterns from fire hoses—Grok failed to identify it as AI-generated. Even more troubling, the chatbot suggested that the TikTok watermark visible in the image “supported its authenticity,” while simultaneously noting that TikTok is “a platform often used for rapid dissemination of viral content, which can lead to misinformation.”

In another recent incident, Grok informed Portuguese-speaking X users that a viral video showing a massive anaconda in the Amazon River was authentic—despite the snake allegedly measuring several hundred meters in length and the video clearly being AI-generated, complete with a ChatGPT watermark.

Expert Verdict: Not Ready for Fact-Checking

AI experts remain skeptical about using these systems for fact verification. Felix Simon, postdoctoral research fellow in AI and digital news at the Oxford Internet Institute, concludes: “AI systems such as Grok, Meta AI or ChatGPT should not be seen as fact-checking tools. While they can be used to that end with some success, it is unclear how well and consistently they perform at this task, especially for edge cases.”

Canetta agrees that while AI chatbots might assist with very simple fact checks, users should exercise caution and always verify information through multiple sources before accepting AI-generated responses as fact.

As these tools continue to evolve and become more integrated into our information ecosystem, understanding their limitations and developing better methods for ensuring their accuracy will be crucial for maintaining trust in our increasingly AI-mediated digital landscape.

