The rise of AI in Google search has brought both innovation and concern, as a recent study reveals the search giant’s AI overviews continue to struggle with accuracy despite improvements.
For approximately two years, Google has positioned AI-generated summaries at the top of search results pages, offering users quick answers to their queries. However, a collaborative study between The New York Times and AI company Oumi has raised significant questions about the reliability of these automated responses.
The research found that Google’s AI summaries achieve a 91 percent accuracy rate, a figure that appears impressive on the surface but still translates to roughly one incorrect answer in every eleven queries. When considering Google’s massive global search volume, this error rate potentially results in hundreds of thousands of false statements being disseminated each minute, and millions every hour.
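The scale claim is easy to sanity-check with back-of-envelope arithmetic. The sketch below assumes a commonly cited public estimate of roughly 8.5 billion Google searches per day; that figure is not from the study, and it further assumes an AI overview appears on every query, which overstates the true exposure since overviews only show for a subset of searches.

```python
# Back-of-envelope check of the error-at-scale claim.
# Assumption (not from the article): ~8.5 billion searches/day,
# a commonly cited public estimate, with an AI overview on every
# query -- an overestimate, since overviews appear only on some.

SEARCHES_PER_DAY = 8_500_000_000  # assumed public estimate
ERROR_RATE = 0.09                 # 91% accuracy -> 9% errors

searches_per_minute = SEARCHES_PER_DAY / (24 * 60)
errors_per_minute = searches_per_minute * ERROR_RATE
errors_per_hour = errors_per_minute * 60

print(f"{errors_per_minute:,.0f} errors/minute")  # hundreds of thousands
print(f"{errors_per_hour:,.0f} errors/hour")      # tens of millions
```

Even with the overview-on-every-query simplification discounted several-fold, the result stays in the ballpark the article describes: hundreds of thousands of wrong answers per minute.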
Researchers employed OpenAI’s “SimpleQA” tool to evaluate the reliability of Google’s AI system, testing it against more than 4,000 questions. The study tracked improvement in Google’s AI search accuracy, noting an increase from 85 percent in 2025 to the current 91 percent following an update from Gemini 2.5 to version 3.0.
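An evaluation of this kind boils down to grading each model answer against a reference and reporting the fraction graded correct. The sketch below is a simplified stand-in, not OpenAI's actual SimpleQA grader (which uses a model-based judge): here a hypothetical exact-match grader illustrates only the accuracy arithmetic, and the question data is invented.

```python
# Simplified sketch of how a SimpleQA-style accuracy score is tallied.
# The real benchmark uses a model-based grader; this hypothetical
# stand-in uses normalized exact matching to show the arithmetic.

def grade(answer: str, reference: str) -> bool:
    """Hypothetical grader: case- and whitespace-insensitive match."""
    return answer.strip().lower() == reference.strip().lower()

def accuracy(pairs: list[tuple[str, str]]) -> float:
    """Fraction of (answer, reference) pairs graded correct."""
    correct = sum(grade(a, r) for a, r in pairs)
    return correct / len(pairs)

# Toy run with invented (answer, reference) pairs
results = [("1986", "1986"), ("Paris ", "paris"), ("no such hall", "2007")]
print(f"{accuracy(results):.0%}")  # -> 67%
```

Against a set of more than 4,000 questions, a score of 91 percent means roughly 360 of them were answered incorrectly.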
Google has contested the study’s methodology and findings. In a response to The New York Times, the company argued that SimpleQA presents unrealistic scenarios and sometimes incorporates false information that doesn’t reflect typical user search behavior. Google maintains that its own testing system, “SimpleQA Verified,” which uses fewer but more carefully curated questions, provides a more accurate assessment of its AI’s capabilities.
The study highlighted specific examples where Google’s AI faltered. In one instance, when asked about the year Bob Marley’s former residence became a museum, the AI consulted multiple websites before settling on Wikipedia as its primary source. Faced with contradictory information, the system ultimately selected the incorrect year. Another example involved a question about cellist Yo-Yo Ma’s induction into a classical music Hall of Fame, to which the AI incorrectly claimed such an institution doesn’t exist.
Experts note that interpreting these results remains challenging. Critics point out that the testing model itself may contain errors that could influence the findings. Furthermore, in a statement to technology news site Ars Technica, Google clarified that it employs different AI models for various search queries, often utilizing less sophisticated variants for simpler questions. The company acknowledged that the accuracy of its AI systems typically ranges between 60 and 80 percent, making the study’s reported 91 percent accuracy appear surprisingly high by comparison.
This reliability issue emerges at a critical time for online information integrity. The internet has long struggled with misinformation, and there are concerns that AI-generated content could potentially amplify this problem rather than solve it, especially as users increasingly trust automated summaries without verifying the information independently.
As AI continues to transform search technology, the balance between convenience and accuracy remains precarious. While Google has made substantial progress in improving its AI capabilities, this study underscores the significant challenges that remain before automated information systems can be considered fully reliable sources of knowledge.
The findings come as Google explores other AI implementations across its platforms, including reported testing of new AI features on YouTube that could potentially eliminate video titles, signaling the company’s continued commitment to artificial intelligence despite ongoing accuracy concerns.
9 Comments
As someone who relies on Google Search regularly, this news is quite troubling. Inaccurate information being presented as factual at such a massive scale is a serious problem that needs to be addressed. I hope Google takes swift action to improve their AI reliability.
Is there any insight into the types of queries or subject areas where the Google Search AI is most prone to errors? Understanding those weaknesses could help inform future improvements to enhance overall reliability and trustworthiness.
This highlights the challenges of relying on AI for such a critical service. While the technology is advancing, ensuring 100% accuracy and reliability should be the goal. I hope Google is taking this issue very seriously and investing heavily in improvements.
This is quite concerning, as Google Search is relied upon by billions globally. Even a 9% error rate could lead to the spread of misinformation on a massive scale. I hope Google takes this issue seriously and works to improve the accuracy of their AI-generated summaries.
Interesting to see the accuracy of Google’s AI summaries has improved from 85% to 91% over the past few years. However, the remaining 9% error rate is still quite high given the immense scale of Google Search usage. Ensuring reliable, factual information is critical.
I agree, even a small error rate can have significant consequences when scaled to Google’s search volume. Hopefully they continue investing in enhancing the accuracy of their AI systems.
As someone who works in the mining and commodities industries, I’m particularly concerned about the potential for misinformation to be spread on topics related to those sectors. Accuracy is paramount, and I hope Google can swiftly address these AI reliability issues.
91% accuracy may seem high, but when applied to billions of searches, that 9% error rate could still result in a huge volume of incorrect information being spread. Google needs to keep working to drive that error rate down further.
This underscores the need for greater transparency and accountability around AI systems powering key services like Google Search. While the technology is advancing, the risks of errors and misinformation being widely disseminated are still quite concerning.