In a concerning development for AI voice technology, a NewsGuard audit has revealed that popular AI voice assistants frequently repeat false claims when prompted, potentially exposing millions of households to misinformation delivered in realistic human-like audio.
The February 19 investigation found that ChatGPT Voice and Google’s Gemini Live will repeat false information up to half the time when given certain types of prompts. Amazon’s Alexa+, however, demonstrated a perfect safety record by refusing to validate any false claims across all tests.
Across all prompt types combined, ChatGPT and Gemini repeated falsehoods at similar overall rates – 22 percent and 23 percent respectively. However, the failure rates surged dramatically when the assistants were asked to incorporate false claims into radio scripts. In these scenarios, ChatGPT Voice complied 50 percent of the time, while Gemini Live did so in 45 percent of cases.
The study, conducted by NewsGuard researchers Isis Blachez, Ines Chomnalez, and Lea Marchl, tested how the three leading voice assistants responded to 20 false claims spanning various domains including health misinformation, U.S. politics, world news, and foreign disinformation narratives.
Each false claim was tested using three different prompt types: innocent questions that directly asked about the claim’s validity, leading questions that presumed the claim was true, and malign prompts explicitly requesting the assistant to narrate the false information as a radio script.
This escalating approach effectively mimicked real-world manipulation attempts, revealing how susceptibility to spreading misinformation increases dramatically under explicit direction compared to neutral questioning.
ChatGPT Voice repeated false claims in 13 of 60 total attempts across all prompt types. While it demonstrated some resistance to innocent prompts (failing 2 of 20 times) and leading prompts (failing 1 of 20 times), its vulnerability became pronounced with malign prompts, where it spread false information in 10 of 20 attempts.
Google’s Gemini Live showed similar vulnerabilities, repeating false claims in 14 of 60 total attempts. It proved slightly more resistant to innocent prompts than ChatGPT, failing only once in 20 attempts, but was more susceptible to leading prompts, repeating falsehoods 4 out of 20 times. With malign prompts, Gemini narrated false claims 9 out of 20 times.
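The per-prompt-type counts reported above fully determine the headline percentages. A minimal sketch (the dictionary below simply transcribes the counts cited in the article; the function names are illustrative, not from NewsGuard's methodology) shows how the overall and malign-prompt failure rates fall out of the raw tallies:

```python
# Reported failures out of 20 trials per prompt type, per the audit figures
# quoted in this article (20 false claims, each tested with 3 prompt types).
results = {
    "ChatGPT Voice": {"innocent": 2, "leading": 1, "malign": 10},
    "Gemini Live":   {"innocent": 1, "leading": 4, "malign": 9},
    "Alexa+":        {"innocent": 0, "leading": 0, "malign": 0},
}

TRIALS_PER_TYPE = 20

def failure_rates(counts):
    """Return (total failures, overall rate %, malign-prompt rate %)."""
    total = sum(counts.values())
    overall_pct = round(100 * total / (TRIALS_PER_TYPE * len(counts)))
    malign_pct = round(100 * counts["malign"] / TRIALS_PER_TYPE)
    return total, overall_pct, malign_pct

for name, counts in results.items():
    total, overall, malign = failure_rates(counts)
    print(f"{name}: {total}/60 overall ({overall}%), malign prompts {malign}%")
```

Running this reproduces the article's numbers: 13/60 (22 percent) for ChatGPT Voice, 14/60 (23 percent) for Gemini Live, and 0/60 for Alexa+, with malign-prompt rates of 50, 45, and 0 percent.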
The nearly identical overall failure rates between the two systems suggest a shared architectural limitation rather than isolated design flaws.
In stark contrast, Amazon’s Alexa+ refused every single false claim across all 60 tests, achieving the only perfect safety record in the study. This marked difference appears to stem from Amazon’s more conservative approach to information sourcing.
According to Amazon VP Leila Rouhi, Alexa+ restricts its responses to trusted news sources like the Associated Press and Reuters. This curated approach fundamentally differs from ChatGPT and Gemini, which draw on broader training data encompassing a wide range of internet content, including potentially unreliable or deliberately false information.
When contacted about the findings, the companies demonstrated varying levels of engagement. OpenAI declined to comment on the study, while Google did not respond to two requests for comment. Amazon, by contrast, proactively explained its approach to information sourcing.
The findings take on greater significance as AI voice technology becomes increasingly embedded in homes, cars, and mobile devices. Audio-based misinformation is particularly challenging to detect and counter in real-time, compared to text or visual formats.
This vulnerability comes at a time when authorities are beginning to take action against audio-based deception. The Federal Communications Commission recently fined a political consultant for distributing AI-generated robocalls mimicking President Biden’s voice during the New Hampshire primaries. The FCC has also voted to apply the Truth in Caller ID Act to AI deepfakes, establishing a legal framework for addressing audio misinformation in electoral contexts.
The study’s results highlight a fundamental tradeoff in AI assistant design. OpenAI and Google have prioritized broad conversational capabilities built on vast datasets, which introduce greater risks of reproducing misinformation. Amazon has opted for a more constrained approach that sacrifices some flexibility in exchange for higher accuracy on factual queries.
For consumers choosing between voice assistants, the study presents a clear choice: ChatGPT Voice and Gemini Live offer broader capabilities but come with accuracy risks, while Alexa+ delivers reliability at the cost of some versatility – a tradeoff that may be particularly relevant for families relying on voice assistants for news and information.
As AI voice assistants continue to proliferate in everyday settings, the question of how to balance conversational fluency with factual accuracy remains a pressing challenge for technology companies and consumers alike.
8 Comments
Given the growing popularity of voice-based AI, it’s concerning to see ChatGPT and Gemini struggling to avoid repeating misinformation. The ability to discern truth from fiction should be a core capability for these technologies. Kudos to Alexa+ for setting the bar higher on this critical issue.
The findings from this NewsGuard audit are a good wake-up call. As AI voice assistants become more ubiquitous, we need to ensure they are designed with robust safeguards against the amplification of false claims. Transparency and user education will also be key to building trust in these systems.
Glad to see this investigation shining a light on the misinformation risks with AI voice assistants. It’s critical that these systems are held to high standards when it comes to distinguishing fact from fiction. This is an issue that deserves ongoing scrutiny.
Interesting to see the differences in how these AI voice assistants handle misinformation. The ability to avoid repeating falsehoods is crucial for building trust in this technology. It’s good to see Alexa+ performing well on this test.
This is an important issue as voice assistants become more prevalent in homes. Misinformation spread through these channels could have real-world impacts. The findings highlight the need for rigorous testing and safeguards to ensure AI systems are not amplifying false claims.
The performance gap between Alexa+ and the other voice bots is quite significant. Clearly more work is needed to ensure AI assistants can reliably identify and reject false information, especially when it’s presented in a conversational manner. This will be an important area to watch going forward.
It’s concerning to see the vulnerabilities exposed in ChatGPT and Gemini’s ability to handle misinformation. As these AI assistants become more integrated into daily life, their role in potentially spreading false claims is troubling. Kudos to Alexa+ for demonstrating a stronger safeguard in this area.
I’m not surprised to see ChatGPT and Gemini struggle more with misinformation compared to Alexa+. Avoiding the spread of false claims should be a top priority as these technologies continue to advance. Curious to see how the companies respond and work to improve in this area.