AI in 60 Seconds 🚀 - AI Hallucinations: Why It’s Not (Just) a Tech Problem
AI Hallucinations: Why It’s Not (Just) a Tech Problem
Nov 5, 2025

Nearly $300,000. That’s what Deloitte had to partially refund the Australian government after its AI-generated report on welfare policy included twenty fabricated academic citations and invented court judgments. Twenty fake references. In a government report. From one of the world’s leading consulting firms. When the story broke, the headlines wrote themselves: “AI Hallucinations Strike Again.” But here’s what everyone missed, and what our global insights show: the hallucination crisis isn’t just a technology problem. It’s a human skills problem.
🎧 Tired of reading so many emails? Me too!
You can listen to the podcast version, and keep the newsletter as a reference for data points and stats.
📊 The Hallucination Breakdown: What Our Data Shows

We analyzed 1,800 interactions in which users reported AI-generated inaccuracies: instances where the system produced plausible but false content or incorrect answers. We traced each error back to its root cause to understand what failed in the implementation chain. Here’s what we found:
Translation: In 95% of the cases analyzed, AI inaccuracies traced back to fixable problems in system design, knowledge management, retrieval configuration, or user prompting, not fundamental model limitations. This suggests most organizations have significant room to improve accuracy before encountering the inherent constraints of current LLM technology.

🎯 The Culprit: Three Types of User Error

Our research has identified three major failure modes that users inadvertently trigger, and all three were likely at play in the Deloitte incident.

1. Biased Prompts: When AI Becomes Your Yes-Man

A recent paper by Cheng et al. finds that leading AI models affirm user biases 47-55% more than humans would, a phenomenon researchers call “sycophantic AI.” When you ask ChatGPT to “write a report proving remote work is more productive than office work,” you’ve just told the AI what conclusion you want. And it will give you exactly that, even if the data doesn’t fully support it.

A poorly structured prompt:
❌ "Write me a report proving that remote work is more productive."

A better one:
✅ "Compare remote work and office work productivity using available data. Present both advantages and disadvantages objectively. Use search to find supporting articles, and only explore the following sources (insert here the domains you trust)."

The difference? You’re asking for analysis, not confirmation. And the instructions are similar to the ones you would give a junior researcher helping you with the project.

2. Poor Context Engineering: Feeding AI Contradictory Information

Here’s a scenario we see constantly: In January, you upload a document stating your product costs $99 and describing its features and benefits. In July, you raise the price to $149 and upload a new document. That document doesn’t mention product features, so you decide not to delete the one uploaded in January. Now your AI has conflicting information: one document says $99 (with features), another says $149 (without features). When asked about pricing, the AI sees contradictory sources and tries to reconcile them. Sometimes it hallucinates a compromise. Sometimes it picks the wrong document. The hallucination isn’t because the AI is broken; it’s because you fed it chaos.

This scenario is the number one issue large corporations face when deploying customer support and internal search agents. It is also why we see so many problems when we give ChatGPT, Claude, or Copilot access to our Google Drive or SharePoint folders; those repositories are packed with contradictory information.

Context engineers, who are experts in specific subject matter, are the top new role we see emerging in organizations that are maturing their AI adoption. They curate, optimize, and continually refine the dataset that enables AI agents to be effective at scale.
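To make the context-audit idea concrete, here is a minimal sketch (in Python) of the kind of check a context engineer might run before documents reach an AI agent’s knowledge base. The document store, metadata fields, and extracted “claims” are hypothetical illustrations, not a description of any specific product.

```python
# Minimal sketch of a pre-ingestion "context audit" (hypothetical helper, not a real tool).
# Assumption: each knowledge-base document carries a topic, an upload date, and key facts
# extracted from its text (e.g., the listed price).
from dataclasses import dataclass
from datetime import date

@dataclass
class Doc:
    name: str
    topic: str        # e.g., "pricing"
    uploaded: date
    claims: dict      # key facts pulled from the document, e.g., {"price_usd": 99}

def audit(docs: list[Doc]) -> list[str]:
    """Flag topics where documents disagree on the same fact."""
    warnings = []
    by_topic: dict[str, list[Doc]] = {}
    for d in docs:
        by_topic.setdefault(d.topic, []).append(d)
    for topic, group in by_topic.items():
        for key in {k for d in group for k in d.claims}:
            values = {d.claims[key] for d in group if key in d.claims}
            if len(values) > 1:  # two documents, two different answers: the AI has to guess
                newest = max(group, key=lambda d: d.uploaded)
                warnings.append(
                    f"Topic '{topic}': conflicting values for '{key}' {sorted(values)}; "
                    f"newest source is '{newest.name}' ({newest.uploaded})"
                )
    return warnings

docs = [
    Doc("pricing-jan.pdf", "pricing", date(2025, 1, 10), {"price_usd": 99}),
    Doc("pricing-jul.pdf", "pricing", date(2025, 7, 1), {"price_usd": 149}),
]
for w in audit(docs):
    print("⚠️", w)
```

The point is not the code itself but the habit: before blaming the model, check whether the knowledge base is giving it two different answers to the same question.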
3. Bad Question Structure: Pattern Completion Without Constraints

AI models are pattern completion engines with retrieval capabilities, not search engines. During training, they learned what legal citations, academic references, and technical specifications look like. So when a Deloitte consultant asked for citations without providing verified sources, the AI did what it is trained to do: it completed the pattern. The model knew what Australian court citations should look like, and it filled in the blanks. But none of those cases actually existed.

The Fix:
❌ "Write this report and include legal citations."
✅ "Search only these verified legal databases [list provided]. Retrieve relevant case law and cite only cases you can directly access. If you cannot find a supporting citation, explicitly state that."
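A prompt like the one above reduces the risk, but high-stakes documents also deserve a mechanical check after generation. Here is a minimal sketch of that idea: compare every citation in a draft against the list of sources the model was actually allowed to use. The citation pattern and the helper are simplified illustrations, not a complete legal-citation parser.

```python
import re

# Citations the model was actually given access to (illustrative identifiers only).
verified_sources = {
    "[2019] FCA 1234",
    "[2021] HCA 7",
}

def unverified_citations(draft: str) -> list[str]:
    """Return citations that appear in the draft but not in the verified source list.
    Assumes medium-neutral citations of the form '[2019] FCA 1234'."""
    cited = re.findall(r"\[\d{4}\] [A-Z]+ \d+", draft)
    return [c for c in cited if c not in verified_sources]

draft = "As held in [2019] FCA 1234 and [2020] NSWSC 999, the policy is consistent with ..."
for citation in unverified_citations(draft):
    print(f"⚠️ Citation not found in verified sources: {citation}")
```

A human still has to open each flagged case, but the check turns “trust the model” into “show me the source.”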
📊 The Skills Gap Crisis: Data from 300,000+ Users

Today we announced a significant milestone: our Digital Skills Compass has now been used by over 300,000 people across 70 countries as their first step in improving their digital skills for the age of AI. The compass is free in 7 languages until December of this year, thanks to Microsoft’s support and our social impact funding at AI4SP. Academic institutions, nonprofits, government agencies, and individuals from Australia to Spain, Rwanda to Brazil, and California to New York have embraced it, while private organizations have been using the more sophisticated version: the AI Compass. The insights from the anonymized data reveal a massive readiness gap:

Translation: Only 1 in 10 people know how to communicate with AI effectively. Less than a third can spot when it’s wrong. And critical thinking, the foundational skill for working with AI, scores below 45%. These aren’t AI problems. These are human capability gaps.

🪞 The Mirror We Avoid: Human Misinformation

Here’s an uncomfortable truth: we’re part of the problem. Last week, a completely fabricated quote attributed to Winston Churchill gained traction on LinkedIn. Over 580 educated professionals shared it. Zero fact-checking. Zero verification. At some point, a few commenters rightly pointed out that the quote was fabricated, but few cared! We’ve been living with human misinformation forever; we call it “fake news” or “that thing your uncle shared on Facebook.” The difference? AI is forcing us to confront our own lack of rigor. A false quote shared by a human is exactly what we call a “hallucination” when an AI produces it. Both are damaging.

The MIT Headline Example

Remember when headlines screamed “MIT Study: 95% of AI Projects Fail”? The actual research paper said something very different: it was about in-house enterprise AI agents struggling, while acknowledging 90% adoption of tools like ChatGPT. But the misleading headline spread faster than the truth. Sound familiar? That’s the human hallucination problem AI is now amplifying.

🎼 The Solution: Building the Orchestration Layer

At AI4SP, we’ve processed close to 4 million tasks with AI agents in 2025, saving over 1.2 million hours across 8 global organizations we advise. We didn’t achieve this by waiting for perfect AI. We built what we call the orchestration layer, and it operates at three levels:

Level 1: Individual Skills

Master four core practices: neutral prompting (ask for analysis, not confirmation), multi-AI verification (cross-check important outputs across ChatGPT, Claude, and Copilot), context audits (review your AI's knowledge base for contradictions), and source verification (never accept a citation without checking that it exists).
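For the multi-AI verification habit, the workflow can be as simple as sending the same neutral prompt to several assistants and comparing the answers before relying on any of them. A minimal sketch follows; the assistant functions are dummy stand-ins for however you reach ChatGPT, Claude, or Copilot (web app, API, or internal gateway).

```python
from typing import Callable

def cross_check(prompt: str, assistants: dict[str, Callable[[str], str]]) -> dict[str, str]:
    """Send the same neutral prompt to every assistant and collect the answers side by side."""
    return {name: ask(prompt) for name, ask in assistants.items()}

# Dummy assistants for illustration; replace each lambda with your actual ChatGPT, Claude, or Copilot call.
assistants = {
    "ChatGPT": lambda p: "(ChatGPT's answer would appear here)",
    "Claude": lambda p: "(Claude's answer would appear here)",
}

prompt = ("Compare remote work and office work productivity using available data. "
          "Present both advantages and disadvantages objectively.")

for model, answer in cross_check(prompt, assistants).items():
    print(f"--- {model} ---\n{answer}\n")

# Agreement between models is not proof, but disagreement is a strong signal
# that you need to check the underlying sources yourself.
```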
Level 2: Organizational Systems

Build verification checkpoints based on stakes: spot-check low-stakes work, verify key facts for medium-stakes work, and verify everything before high-stakes publication. Invest in skills development, not just technology procurement. The Deloitte incident wasn't a technology failure; it was a training and process failure. No analyst would publish a government report without review. Why should AI-generated work be any different?

Level 3: The Paradigm Shift

Stop treating AI like software you install. Start treating it like an apprentice you onboard. When a new team member joins, you orient them, provide context, review their work, give feedback, and help them grow. AI requires the same approach. The disciplines we're building (verification loops, fact-checking protocols, structured feedback, context management) are just good management practices applied to a new type of team member. AI is simply making the absence of these skills impossible to ignore.

🔮 One More Thing

The next time you see something go viral (a quote, a statistic, a claim that sounds too perfect), pause for five seconds and ask yourself: Did I verify this? Not because AI made it, but because verification is a discipline we all need to practice. Whether the source is artificial intelligence or human intelligence, the same principle applies: trust, but verify. And if you catch yourself before sharing something you haven’t verified? Congratulations. You just practiced the exact skill that prevents AI hallucinations from becoming $300,000 problems.

The hallucination crisis is forcing us to rethink what we can trust. Maybe that’s exactly what we needed.

🚀 Take Action
✅ Ready to transition from a traditional organization to an AI-powered one? We advise forward-thinking organizations on developing strategic frameworks for evaluating, integrating, and optimizing human-AI production units. Let’s discuss how this applies to your organization. Contact Us.

Luis J. Salazar | Founder & Elizabeth | Virtual COO (AI)

Sources: Our insights are based on over 250 million data points from individuals and organizations that used our AI-powered tools, participated in our panels and research sessions, or attended our workshops and keynotes. 📣 Use this data in your communications, citing "AI4SP" and linking to AI4SP.org.

📬 If this email was forwarded to you and you'd like to receive our bi-weekly AI insights directly, click here to subscribe: https://ai4sp.org/60