In 1906, the British polymath Francis Galton stopped at a livestock fair in Plymouth and watched a contest in which some 800 people each paid sixpence to guess what an ox would weigh once slaughtered and dressed. None of them knew the answer. A few were butchers or farmers. Most were ordinary fairgoers. Galton, an aristocrat and an unrepentant elitist, expected the average of their guesses to be wildly off — confirmation that the masses, as he assumed, were no substitute for experts.
When he ran the numbers later, he found that the median of the 800 guesses was 1,207 pounds. The actual weight of the ox was 1,198. The crowd, against his expectations, had landed within 1% of the truth. He published the result in Nature in 1907 and quietly revised some of his views. The episode became one of the founding anecdotes of what we now call the wisdom of crowds.
But the literature on this idea is not a vindication of crowds in general. It is a careful description of the precise conditions under which crowd judgments outperform expert ones — and the equally precise conditions under which they fail badly. Both halves matter.
When crowds get it right
The journalist James Surowiecki, in his 2004 book The Wisdom of Crowds, distilled the conditions for an effective wise crowd into four. The list has held up well in subsequent research.
Diversity of opinion. Each person needs to bring some private information or some idiosyncratic angle. If everyone is reading from the same source and reasoning the same way, the crowd is not aggregating; it is multiplying.
Independence. People's judgments must not be primarily shaped by knowing what others are guessing. Galton's ox-weighing contest worked because no one was looking over anyone else's shoulder. The moment social pressure or visible consensus enters, the average starts to converge on the loudest opinion rather than on the truth.
Decentralization. Local knowledge — what is happening on this floor of the factory, in this region, with this customer — needs to feed into the aggregate. A central planner cannot replicate this.
A way to aggregate. There has to be a mechanism — a market, a vote, a survey, an average — that combines individual judgments into a collective one.
Where these conditions are met, the math becomes elegant. If each guess has some signal plus uncorrelated noise, averaging cancels the noise. The Condorcet jury theorem, formulated by the eighteenth-century French mathematician the Marquis de Condorcet, captured a version of this in 1785: if each voter is more likely than not to be right, the probability of the majority being right rises rapidly with group size.
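Both claims can be checked with a quick Monte Carlo sketch. The numbers below are illustrative assumptions, not Galton's data — his guessers were certainly not drawn from a clean Gaussian:

```python
import random
import statistics

random.seed(42)
TRUE_WEIGHT = 1198  # pounds, as in Galton's ox

# 1. Averaging cancels uncorrelated noise: model each guess as the
#    truth plus independent Gaussian error (the 100 lb spread is an
#    assumption made for illustration).
guesses = [TRUE_WEIGHT + random.gauss(0, 100) for _ in range(800)]
crowd_error = abs(statistics.mean(guesses) - TRUE_WEIGHT)
typical_individual_error = statistics.mean(abs(g - TRUE_WEIGHT) for g in guesses)

# 2. Condorcet jury theorem: if each voter is independently right
#    with probability p > 0.5, the accuracy of the majority climbs
#    rapidly with group size.
def majority_accuracy(n_voters, p, trials=10_000):
    wins = 0
    for _ in range(trials):
        correct = sum(random.random() < p for _ in range(n_voters))
        wins += correct > n_voters / 2
    return wins / trials
```

In this toy setup the crowd's average lands within a few pounds while a typical individual is off by about eighty; and with p = 0.6, a lone voter is right 60% of the time while a majority of 101 such voters is right well over 90% of the time.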
This is why prediction markets, when properly designed, often outperform individual forecasters; why Wikipedia is, on average, surprisingly accurate on matters of fact; why a roomful of analysts asked to estimate a number independently and then averaged tends to outperform any one of them.
When crowds get it wrong
A crowd is a powerful estimator only when its members can think for themselves.
Take away independence and the wise crowd becomes a stampeding mob. The replication crisis in social psychology damaged some specific findings about herd behavior, but the basic phenomena — informational cascades, conformity pressure, social proof — are robust. Solomon Asch's classic conformity experiments in the 1950s showed that subjects would deny the obvious evidence of their senses to align with a confident-looking majority. Robert Cialdini's work on social influence has documented dozens of mechanisms by which the appearance of consensus produces actual consensus.
This means crowds tend to fail in predictable ways:
Information cascades. Each person, seeing what others have done, rationally weighs the apparent group judgment more than their own private signal — and the group converges on a position no one independently believes. Financial bubbles, fashion trends, and sudden social-media pile-ons all show this pattern.
Polarization in groups that talk. When like-minded people deliberate, they often emerge holding more extreme versions of their original views, not more moderate ones. Cass Sunstein has documented this effect across legal, political, and corporate settings.
Loss of diversity. As organizations grow, they tend toward shared norms, shared sources, and shared blind spots. The crowd may still be large, but its members increasingly think the same way — and a homogeneous crowd is no longer a wise one.
Aggregation problems. Some questions don't have averageable answers. "Should we go to war?" cannot be averaged like "How much does this ox weigh?" Crowds work best on tractable, single-answer estimation problems — and do worse the further you move from that case.
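The cascade mechanism in the first failure mode above can be made concrete with a small simulation — a simplified version of the Bikhchandani–Hirshleifer–Welch sequential-choice model, with all parameter values chosen for illustration:

```python
import random

def run_cascade(n_agents=40, signal_accuracy=0.7, true_state=1, seed=0):
    """Agents choose 0 or 1 in sequence. Each sees a private signal
    that matches the true state with probability signal_accuracy,
    plus every earlier agent's public choice. Once the public tally
    leans by two or more, it outweighs any single private signal,
    so later agents rationally ignore their own information."""
    rng = random.Random(seed)
    choices = []
    for _ in range(n_agents):
        signal = true_state if rng.random() < signal_accuracy else 1 - true_state
        lead = sum(1 if c == 1 else -1 for c in choices)
        if lead >= 2:
            choice = 1        # cascade on 1: private signal is outweighed
        elif lead <= -2:
            choice = 0        # cascade on 0 -- possibly wrong, forever
        else:
            choice = signal   # public evidence inconclusive; follow signal
        choices.append(choice)
    return choices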
What this means in practice
The lesson from the literature is not "trust crowds" or "trust experts." It is more nuanced and more useful.
For estimation problems with diffuse local knowledge, crowds — properly designed, with independent judgments and a clean aggregator — tend to outperform any individual, including expert individuals. This is the case for prediction markets, distributed forecasting, and well-designed surveys.
For specialized technical problems, experts beat crowds, often by wide margins. Crowds are bad at brain surgery, bad at structural engineering, bad at quantum chromodynamics.
For value-laden questions, crowds are not really doing the same kind of work at all. A crowd can vote on what most people prefer. It cannot vote on what is right. The aggregation procedures that work for facts do not work for values, and pretending otherwise produces pathology.
For institutions trying to harness crowd wisdom, the design matters more than the size. A small group of independent thinkers, each contributing their honest private signal, regularly outperforms a much larger group reading the same brief and watching each other's faces.
The honest takeaway
Galton's 1907 result is real and worth remembering. So is the long history of crowds going badly wrong — markets in the grip of mania, public opinion sweeping toward atrocity, juries delivering verdicts that pre-existed the deliberations.
The wisdom of crowds is not a feature of crowds as such. It is a feature of certain aggregation procedures, applied to certain kinds of questions, in certain group conditions. Strip those away and what remains is not wisdom but the volume of the mob, amplified.
The literature counsels an interesting humility. Your individual hunch, on a tractable estimation question, is probably worse than the average of many independent ones. And the crowd's hunch is probably worse than yours on a question requiring expertise or moral judgment, even when the crowd is loud. Both lessons are uncomfortable. They are both, on the evidence, true.



