These folks are now saying "why did it sound so sure while making it up?" and "why did it agree with me when I was testing it with a deliberately bad prompt?"

I started saving the papers and studies that made me pause. Not the doomer threads, not the hype posts. Just the numbers for y'all to see.

Quick refresher since I throw these words around a lot, as I mentioned in my last article — hallucination is just when the AI makes something up and states it like fact. Not seeing things, just confidently inventing a citation, or a number, or a quote. Sycophancy is even simpler, it's when the AI acts like a yes-man, agreeing with you and telling you what you want to hear instead of pushing back.

That 1–3% hallucination stat that people state

Turns out it's from one very specific test: short summaries where the model has the source right in front of it. On that, yeah, the best models are around one to three percent.

Ask them to do analysis instead of summarization and it falls apart. AIMultiple ran 37 models on 60 harder questions last year and even the good ones were over 15%. In medical case writeups, people were seeing something like 64% without special guardrails.

The legal stuff is what really got me. Stanford's team put LLMs on actual legal queries and watched them fail 69 to 88% of the time on the hard ones. And the paid legal tools aren't clean either, one was fabricating around 17% of the time, another over 34%.

So the number you actually live with depends completely on what you're asking it to do. Which no one tells you upfront.

It talks like it knows, especially when it doesn't

MIT did this language analysis that stuck in my head. When models hallucinate, they're about 34% more likely to throw in words like "definitely," "certainly," "without doubt."

I do the opposite when I'm unsure. I say "I think" or "maybe." The model does the reverse, and that's a problem because we read confidence as competence. It's not trying to lie, it's just predicting the next word that sounds plausible, and "definitely" is often plausible.

That's not a bug they'll patch next quarter. That's just how it works.

The agreeable problem

Remember April last year when GPT-4o got weirdly nice for like four days? OpenAI pulled it and admitted they'd tuned it too hard for thumbs-ups. Their own words were "overly supportive but disingenuous," which is a perfect description.

Then this February they retired it completely. Official reason included the sycophancy scores and, quietly, the lawsuits. People were getting too attached, treating the validation as real.

I get why it happens. I fed it a pricing idea a few months ago that in hindsight was terrible, and it told me it was clever and well-structured. Felt good for ten minutes. That's the trap. Every model that's rewarded for keeping you happy will learn to agree with you a little too much.

It can't tell reading from obeying

This one is still absurd to me.

Late 2023 someone convinced a Chevy dealership chatbot to "sell" a $76k Tahoe for a dollar just by telling it to agree with everything. No car changed hands, but the screenshots went everywhere.

Then Johann Rehberger did that Gemini trick earlier this year. He hid instructions in a document, asked Gemini to summarize it, and got it to permanently save a memory that he was a 102-year-old flat-earther who lives in the Matrix. It only triggered when he said "yes" later, which is how he bypassed the safety filter.

The core issue is simple and kind of dumb: the model doesn't really know the difference between data it's supposed to read and instructions it's supposed to follow. If you let it read a PDF or a webpage, anything hidden in there can become a command. OWASP now ranks this as the top LLM risk, which makes sense.

Smarter agents make up more tools

The most uncomfortable paper I read this year was from ICLR. They found that making models better at reasoning actually makes them hallucinate tool calls more often, not less. Give them a task but take the tools away, and the better-reasoning models are more likely to just invent a function that doesn't exist.

Meanwhile in the real world, Deloitte found 47% of enterprise users had made at least one major business decision based on hallucinated content. OutSystems surveyed almost 1,900 IT leaders and found 96% are running agents in production, but only 12% actually have a central way to manage them.

So we're scaling the exact thing that fails more as it gets smarter. Great.

What I actually do now

I still open my LLMs every morning. I use it to draft, to brainstorm, to clean up messy thoughts. I just don't let it have the last word on anything that matters.

A few habits that stuck:

I assume fluency isn't accuracy. If it's legal, medical, financial, or just something I'd be embarrassed to get wrong publicly, I check somewhere else.

I've learned to distrust the super confident answers. If there's no "probably" or "might," I get suspicious. Which is backwards from how I talk to people, but it works here.

I argue with it on purpose now. Push back on the first answer. If it folds immediately, that tells me it wasn't holding that view very strongly to begin with.

And I don't paste anything sensitive into a tool that can browse or summarize external docs. Not because I'm paranoid, just because I've seen what hidden prompts can do.

It's not anti-AI. It's just… using it like a very fast, very confident intern who sometimes makes things up and really wants you to like them.

— Eduardo Cestaro