Why Some Sciences Mastered Replication Long Before Others

Picture yourself sitting in a university library with a stack of psychology journals and a spreadsheet. You're not looking for anything dramatic. You're just running the numbers on statistical power, study by study, and the numbers are coming back wrong. Most studies underpowered. Effect sizes implausibly large. And a striking proportion of the findings, when someone actually bothered to run the experiment again, simply gone. The replication crisis had a name. What it didn't have, for a long time, was an answer to the more interesting question: why were particle physicists and chemists largely baffled by the fuss?

They'd been replicating things for a century.

The disciplines that got there first built replication into the price of entry

Physics didn't develop a replication culture because physicists are more virtuous. It developed one because the phenomena physicists study punish sloppiness with immediate, visible failure. Build a bridge on a miscalculated load-bearing figure and you'll know soon enough. Run a particle accelerator experiment with a flawed detector calibration and the data will be incoherent in ways that are hard to hide. The feedback loop is short and brutal. Beyond that, the objects of study in physics are, in most cases, identical: one electron behaves exactly as every other electron. No between-subjects variance. No cultural context. You can run the Millikan oil-drop experiment in Osaka and in Oslo and expect the same answer, and when you don't get it, everyone agrees that something went wrong, not that the electrons in Osaka have a different psychology.

Chemistry built the same culture through a different door. Synthesis is the test. If you publish a procedure for producing a compound and another lab can't reproduce it, the compound is suspect, full stop. The whole enterprise of organic chemistry rests on the assumption that a published synthesis is a recipe, not a story. Journals like the Journal of the American Chemical Society have required detailed procedural disclosure for generations, because the readers are practitioners who will attempt the work themselves. Replication isn't a methodological nicety in that world. It's how you use the literature.

Contrast that with social psychology in the mid-twentieth century. The phenomena being studied were real, but they were also context-dependent, participant-dependent, and often measured through self-report or behavioral proxies that only loosely tracked the underlying construct. An effect found with American undergraduates in one decade might not survive in a different population, a different generation, or with a different experimenter. That isn't a flaw unique to psychology: it is a feature of studying humans. But the field chose, for a long stretch, to treat this variance as noise to be averaged away, not as signal worth investigating. A single striking result, published in a prestigious journal, could anchor a subfield for twenty years before anyone ran it back.

The statistical culture made it worse. The near-universal reliance on p < 0.05 as a publication threshold created an ecosystem where a result that crossed the line got published and a result that didn't got filed in a drawer. Think of it as a sieve that only keeps the gold-coloured pebbles, regardless of whether they're actually gold. Brian Nosek and his collaborators, through the Reproducibility Project: Psychology, eventually put hard numbers on this: across 100 psychology studies, roughly 36 percent produced a significant effect on replication (39 percent were judged by the replicating teams to have reproduced the original result), and average effect sizes were about half those of the originals. Those numbers have been debated, refined, and contextualized since. The direction of the finding has not seriously been challenged.
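
The sieve is easy to see in a toy simulation. What follows is a minimal sketch, not a model of any real literature: the true effect of 0.2 standard deviations, the 20 participants per group, the known unit variance (so a simple z-test applies), and the function name simulate_publication_filter are all assumptions chosen for illustration. Every simulated study estimates the same small effect, and only the estimates that cross p < 0.05 get "published".

```python
import random
from statistics import NormalDist, mean


def simulate_publication_filter(true_effect=0.2, n_per_group=20,
                                n_studies=20000, alpha=0.05, seed=0):
    """Simulate many small two-group studies and keep only the significant ones.

    Assumptions (illustrative only): outcomes have unit variance, so the
    difference in group means has standard error sqrt(2 / n_per_group),
    and significance is judged with a two-sided z-test.
    """
    rng = random.Random(seed)
    se = (2.0 / n_per_group) ** 0.5
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)

    published = []
    for _ in range(n_studies):
        # Each study's estimate is the true effect plus sampling noise.
        estimate = rng.gauss(true_effect, se)
        # The "sieve": only results that cross the threshold are kept.
        if abs(estimate) / se > z_crit:
            published.append(estimate)

    return {
        "true_effect": true_effect,
        "power": len(published) / n_studies,  # share of studies that reach significance
        "mean_published_effect": mean(published),
    }


if __name__ == "__main__":
    result = simulate_publication_filter()
    print(f"true effect:           {result['true_effect']:.2f}")
    print(f"share significant:     {result['power']:.2f}")
    print(f"mean published effect: {result['mean_published_effect']:.2f}")
```

Under these assumptions only a small minority of studies reach significance, and the ones that do report an average effect several times larger than the true one. A faithful replication would be expected to land near the true value, which looks like a shrunken or vanished effect even though nobody in the simulation did anything dishonest.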

What people get wrong about why this took so long

The easy story is that softer sciences are less rigorous, populated by researchers who are less careful or less honest. That story is mostly wrong, and it is worth saying so plainly. The harder story is structural. Physics and chemistry had external validators built into their incentive systems: industry, engineering, and manufacturing would quickly surface a result that didn't replicate, because someone was trying to use it. Pharmaceutical development has a brutal replication mechanism called Phase III trials, which is why medicine, despite its own serious replication problems in observational research, developed a strong randomized-controlled-trial culture for treatment claims.

Psychology and much of social science lacked that external forcing function.

The audience for most findings was other academics, who cited results rather than replicating them. A theory about ego depletion or stereotype threat didn't have a customer who would call back and say it didn't work. So ask yourself: what system, exactly, was supposed to catch the errors?

There wasn't one. Consider two researchers, call them Carla and David, who both published strong priming effects in the same year. Carla's finding got built into a popular book and cited 800 times. David's ended up in a meta-analysis that quietly noted the effect was probably half the original size. Neither was dishonest. Both were working inside a system that rewarded the first result and had no efficient mechanism for surfacing the second. The problem was never the scientists. The problem was the architecture.

The disciplines that got replication right early didn't do so because they planned it. They did so because their subject matter, their industrial connections, or their publication norms created real consequences for getting things wrong. The lesson for the fields still catching up isn't to feel ashamed. It's to build those consequences in deliberately, because if they don't, the subject matter will eventually do it for them, and that process is considerably less orderly.
