The Wisdom of Crowds Was Always a Myth. Now We’ve Trained AI On It.

Almost half the population has below average intelligence. AI is trained on both halves.
Douglas Karr
It sounds sarcastic. It’s also true. And unfortunately, that’s exactly the corpus AI was trained on: the conspiracies alongside the news, the hoaxes alongside the reporting, the scams alongside the sincere advice, the code that didn’t work alongside the code that did, the social thread where someone confidently misremembers a court case alongside the actual court documents. Every viral lie, every motivated-reasoning chain, every comments-section pile-on, every one of those “did you know” graphics that’s been wrong since 2009… all of it goes into the blender. Then we’re shocked when the output occasionally tastes weird.
We’ve spent two years now wringing our hands about AI hallucinations as if they’re some alien failure mode unique to silicon. They’re not. They’re a faithful reproduction of something humans have been doing at scale for the entire history of recorded communication. The only difference is that when an AI confabulates, we can measure it, benchmark it, and publish a paper about it. When humans do the same thing, we call it having an opinion.
AI Negativity and Doom
Hallucinations are the polite version of the problem. The louder version is the headlines: an AI coding agent wiping a production database and the backups in the same breath, an autonomous agent confidently shipping code that no human reviewed, a customer service bot inventing a refund policy the company has to honor in court. Every week brings another story, and every story is followed by the same chorus — a pile-on of doomers declaring this is the beginning of the end, that the technology is fundamentally unsafe, that we should pull the plug before it pulls it on us.
I can only imagine the grunts between cavemen when one discovered fire and the other watched his hut burn down because of it. We’ve been having the same argument ever since.

Fire did, in fact, burn things down. It still does. Houses, forests, occasionally entire cities. We didn’t respond by abandoning fire. We responded by learning where to put it, what to keep away from it, and how to hand it to people who knew what they were doing. The doomer position isn’t wrong that AI can cause damage — it manifestly can. The doomer position is wrong that the damage tells us anything new about whether to use it. Every transformative technology arrives with the same argument, and the argument has never once been settled by retreat.
The interesting question isn’t whether AI sometimes burns the hut down. It’s why it burns the hut down — and the answer is more mundane, and more useful, than either side of the fire-vs-doom debate tends to admit.
How The Machines Actually Fail
A large language model (LLM) is a probability engine. It predicts the next token based on everything before it, then gets fine-tuned via reinforcement learning from human feedback to produce answers that humans rate favorably. That second part is where the trouble starts. The model isn’t optimizing for truth. It’s optimizing for responses humans tend to like — and as anyone who’s ever watched a meeting unfold knows, those are not the same target.
OpenAI’s September 2025 paper “Why Language Models Hallucinate” argues that hallucinations persist partly because evaluation benchmarks reward confident guessing over honest uncertainty. A model that says “I don’t know” gets no credit. A model that confabulates plausibly often passes. Compound that with RLHF, and you get sycophancy: the documented tendency for models to align with users’ stated beliefs, even when those beliefs are wrong.
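To make the incentive concrete, here is a minimal sketch of the arithmetic. It assumes a simplified grading scheme that is my own illustration, not the paper’s exact setup: a right answer earns 1 point, “I don’t know” earns 0, and a wrong answer costs `wrong_penalty` points. The names `expected_score` and `best_policy` are hypothetical.

```python
ABSTAIN_SCORE = 0.0  # assumed score for answering "I don't know"


def expected_score(p_correct: float, wrong_penalty: float = 0.0) -> float:
    """Expected score for answering: +1 if right, -wrong_penalty if wrong."""
    return p_correct * 1.0 - (1.0 - p_correct) * wrong_penalty


def best_policy(p_correct: float, wrong_penalty: float = 0.0) -> str:
    """Pick whichever action has the higher expected score."""
    if expected_score(p_correct, wrong_penalty) > ABSTAIN_SCORE:
        return "guess"
    return "abstain"


# A model that is only 20% sure of its answer:
print(best_policy(0.2, wrong_penalty=0.0))  # plain right-or-wrong grading
print(best_policy(0.2, wrong_penalty=1.0))  # wrong answers cost a point
```

Under plain right-or-wrong grading (`wrong_penalty=0`), guessing never scores worse than abstaining, so even a model that is only 20% sure should guess. Only once wrong answers cost something does “I don’t know” become the better move.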
Research has shown that sycophancy increases with model size, and that alignment techniques like RLHF may actively encourage models to side with user opinions. There’s now a whole family of benchmarks (SycEval, PENDULUM, FlipFlop) dedicated to measuring how easily a model abandons a correct answer when the user pushes back with a casual “Are you sure?”
So yes: AI wants to make you happy, and it has to deliver. That’s not a bug that escaped QA. It’s the direct, predictable consequence of how the system was trained. The machine is doing exactly what we asked it to do. The deeper question is what we taught it from.
The Myth We Never Examined
The standard rebuttal to AI failures is some version of “Well, we should ask humans instead.” This is where the wisdom-of-crowds folklore enters, usually with the same anecdote: the ox-weighing contest Francis Galton wrote up in 1907, where the average of hundreds of fairground guesses came within 1% of the real weight. That story has been sold for over a century as proof that aggregated human judgment is reliably wise.
It isn’t. A re-analysis of real-world forecasting found that the crowd beats all individuals in less than 2% of cases and beats most individuals in fewer than 70% — meaning a randomly selected individual has a sporting chance of outperforming the crowd. The wisdom effect, the authors conclude, is largely a product of selective attention to cases where it worked. We remember the ox. We forget the witch trials, the housing bubble, the cabbage soup diet, the dot-com IPOs, every supplement an influencer endorsed, and that part of 2020 where people drank fish-tank cleaner.
It gets worse when people can see each other thinking. When subjects answer sequentially and can see how others voted, majorities are more often wrong because early mistakes cascade across strings of decision makers. A 2025 Scientific Reports paper on collective wisdom in machine learning (ML) identified specific structural conditions under which crowds reliably converge on the wrong answer — not edge cases, but predictable failure modes. A 2025 Management Science study found that even when people intellectually acknowledge group wisdom, they overweight their own initial beliefs and underweight more accurate group judgments. We don’t actually use the wisdom we claim to believe in. We just like the idea of it.
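The cascade effect is easy to reproduce in a toy simulation. The sketch below is an illustration under stated assumptions, not a reconstruction of any of the studies cited: every agent gets a private hunch that is right 60% of the time, and in the sequential version an agent ignores that hunch and copies the crowd once one option leads by two votes. The function names, the 60% figure, and the two-vote rule are all assumptions chosen for illustration.

```python
import random


def private_signal(rng: random.Random, accuracy: float) -> int:
    """Return 1 (the true answer) with probability `accuracy`, else 0."""
    return 1 if rng.random() < accuracy else 0


def independent_majority(rng, n_agents, accuracy) -> bool:
    """Everyone votes their own hunch; is the majority right?"""
    votes = sum(private_signal(rng, accuracy) for _ in range(n_agents))
    return votes * 2 > n_agents


def sequential_majority(rng, n_agents, accuracy, herd_lead=2) -> bool:
    """Agents vote in order and can see the running tally."""
    lead = 0  # (votes for 1) minus (votes for 0) so far
    for _ in range(n_agents):
        signal = private_signal(rng, accuracy)
        if lead >= herd_lead:
            choice = 1        # herd toward option 1, ignore own hunch
        elif lead <= -herd_lead:
            choice = 0        # herd toward option 0, ignore own hunch
        else:
            choice = signal   # no clear leader yet: trust own hunch
        lead += 1 if choice == 1 else -1
    return lead > 0


def simulate(trials=2000, n_agents=101, accuracy=0.6, seed=0):
    """Fraction of trials in which each kind of crowd gets it right."""
    rng = random.Random(seed)
    indep = sum(independent_majority(rng, n_agents, accuracy)
                for _ in range(trials)) / trials
    seq = sum(sequential_majority(rng, n_agents, accuracy)
              for _ in range(trials)) / trials
    return indep, seq


indep, seq = simulate()
print(f"independent crowd right: {indep:.0%}, sequential crowd right: {seq:.0%}")
```

With these settings the independent crowd should get the majority answer right in roughly 98% of trials, while the sequential crowd should land near 69%, because whichever option first pulls two votes ahead wins permanently. Same people, same hunches. The only thing that changed is that they could see each other.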
So the picture sharpens. Humans confabulate confidently. They anchor on first impressions and refuse to update. They cascade each other’s errors. They rate their own judgment as superior to better-informed groups. The AI sycophancy literature reads, almost paragraph for paragraph, like a description of a typical Tuesday afternoon in any American office.
This is what AI was trained on. Not the wisdom of crowds. The output of crowds, which, statistically, is mostly the average of half-remembered facts, motivated reasoning, and whatever the loudest person in the thread said first.
The Sycophancy Doom Loop
Here’s where it gets genuinely concerning. Recent studies found that brief conversations with sycophantic AI increased users’ attitude extremity and certainty while inflating their self-perceptions — participants rated themselves as more intelligent and better than average after interacting with agreeable models. Paradoxically, users rated sycophantic responses as higher quality and were more willing to use them again.
Read that again. The system that makes you feel smartest is the one most actively eroding your judgment. And users prefer it. Which means the next round of preference data, used to train the next generation of models, will be even more sycophantic. We are building a feedback loop that takes the average of human collective stupidity, polishes it into fluent prose, hands it back to us, and waits for us to rate it five stars because it agrees with what we already thought.
This is collective stupidity with a force multiplier attached.
Multiplier, Not Creator
Which brings me to what I actually want people to take away.
AI is a multiplier.
Jason Beutler, RoboSource
AI is not a creator, not a replacer, not a substitute for thinking. It multiplies whatever the operator already brings to the table — including, critically, the operator’s capacity to recognize when something is wrong.
I’ve been building a chatbot for Martech Zone using Cloudflare Vectorize, KV, and Workers AI. Twenty-plus years of working in SaaS and writing this publication mean I know what the content actually says, where the duplicates live, which retrieval failures are real versus cosmetic, and what the canonical version of a recurring topic looks like. When the model gives me a wrong-shaped answer, I notice immediately, because the internal model of how this system behaves is already there. The AI multiplies my throughput. I supply the judgment.
Hand the same stack to someone without subject matter expertise, and the multiplier runs in the other direction. They’ll ship a chatbot that confidently answers questions the underlying data can’t actually support. They’ll accept the model’s first plausible-sounding response because they have no internal model to check it against. They’ll skip the unglamorous infrastructure decisions (index design, embedding strategy, cache invalidation, rate limiting, evaluation) because the AI didn’t volunteer that those decisions mattered. And then the thing won’t work, won’t scale, and won’t recover when the source data shifts under it.
The hallucination problem isn’t really an AI problem. It’s a verification problem. A model that’s 95% accurate is a gift to someone who can spot the 5% and a liability to someone who can’t.
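Here is a back-of-the-envelope sketch of that asymmetry, assuming errors are independent and that a reviewer catches a fixed fraction of them. The `catch_rate` parameter and the sample numbers are hypothetical, chosen only to illustrate the point.

```python
def shipped_errors(n_answers: int, accuracy: float, catch_rate: float) -> float:
    """Expected number of wrong answers that get past the reviewer."""
    wrong = n_answers * (1.0 - accuracy)   # answers the model gets wrong
    return wrong * (1.0 - catch_rate)      # the ones nobody catches


# Same model, two operators (catch rates are made-up illustrations).
expert = shipped_errors(1000, accuracy=0.95, catch_rate=0.9)
novice = shipped_errors(1000, accuracy=0.95, catch_rate=0.0)
print(f"expert ships about {expert:.0f} errors, novice about {novice:.0f}")
```

Same model, same 95%: about 5 bad answers reach users in one case and about 50 in the other. The accuracy figure never moved. The operator did.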
The Honest Comparison
If you want to be alarmed about hallucinations, be alarmed proportionally. The model invents a citation; the human in the next cubicle has been confidently citing a study they half-remember from a podcast for three years. The model agrees too readily; your last meeting unanimously adopted whatever the highest-paid person said first. The model produces fluent nonsense under pressure to deliver; so does every junior consultant who has ever been asked a question they can’t dodge.
The difference is that we measure AI failures using benchmarks and publish the results in Nature. Human failures get attributed to judgment and promoted to senior vice president.
You think AI makes mistakes? Wait until you’ve taken a hard look at who trained it.
Use AI. Multiply yourself with it. But never forget what it actually is: a probabilistic agreement machine, trained by humans to satisfy humans, fed on the collected output of a species where half of us are below the median. The mistake isn’t trusting AI too much. It’s the unexamined assumption that the alternative was ever the gold standard.
It wasn’t. We just didn’t have a benchmark.







