Why Some Scientific Findings Don't Replicate Across Cultures

The Study That Travelled Badly

You read the headline, nod, and move on. Humans are fundamentally competitive. That facial expression means the same thing in Tokyo as in Toledo. The study looked rigorous, the sample size respectable, and the write-up carried the quiet authority of peer review. Then, a few years later, a shorter and much less celebrated follow-up paper noted that the finding had dissolved the moment researchers tested it somewhere else. Not weakened. Dissolved.

This is not a fringe problem. It sits at the centre of what researchers now call the replication crisis, and the cultural dimension of that crisis is the part most popular accounts skip entirely. Some scientific findings are genuinely universal: the boiling point of water, the mechanism by which a virus hijacks a cell, the relationship between sleep deprivation and reaction time. Others turn out to be local products, as specific to their place of origin as a regional dialect. Understanding which is which, and why, is one of the more consequential methodological arguments in contemporary science.

WEIRD, and Why That Label Sticks

The acronym that reframed the debate comes from a 2010 paper by psychologists Joseph Henrich, Steven Heine, and Ara Norenzayan. They coined WEIRD: Western, Educated, Industrialized, Rich, Democratic. Their argument was blunt and, at the time, uncomfortable. The overwhelming majority of subjects in behavioural and social science research were drawn from a slice of humanity representing perhaps twelve percent of the global population, and that slice was being treated as a proxy for all of it.

The numbers are hard to argue with. Studies published in top psychology journals have historically drawn somewhere between sixty and ninety percent of their participants from the United States alone. When Henrich and colleagues examined the actual distribution of human variation on traits like spatial reasoning, fairness intuitions, and visual perception, they found that WEIRD populations were frequently outliers, not averages. The undergraduates filling out questionnaires in exchange for course credit were, in many measurable respects, the least representative sample a researcher could have chosen.

The label names the problem. It doesn't explain the mechanism.

What Actually Breaks When You Cross a Border

Think of scientific findings as sitting on a spectrum between two poles. At one end: findings anchored in biology so basic that culture cannot reach them. At the other: findings that are essentially descriptions of a particular social environment, dressed up in universal language.

The Müller-Lyer illusion is a useful case. Two lines of equal length, one with inward-pointing arrows at each end and one with outward-pointing arrows, reliably look different in length to most observers. For decades this was treated as a hardwired perceptual quirk, the kind of thing textbooks reproduce without caveat. Then cross-cultural testing showed that the magnitude of the illusion varies significantly between populations, with people raised in environments full of right-angled architecture (carpentered worlds, as researchers call them) showing stronger susceptibility. The biology is shared. The calibration is cultural.

Now scale that principle up to something murkier, like social conformity. The Asch conformity experiments, conducted in mid-twentieth-century America, showed that a meaningful proportion of participants would agree with an obviously wrong answer if the group around them endorsed it. The finding replicated in many places. But the rate of conformity shifted substantially between cultures, tracking something like the collectivism-individualism axis that researchers like Geert Hofstede had been mapping since the 1970s. The phenomenon exists broadly. The magnitude is local.

The catch: most published papers report a finding as though the magnitude were the finding. "People conform" is different from "American undergraduates conform at this rate." Conflating the two is how a result becomes a headline and then becomes a failed replication.

The Mechanism, With a Worked Example

Imagine two researchers, call them Dr. Okonkwo in Lagos and Dr. Lindqvist in Stockholm, both replicating a classic Western study on ultimatum bargaining. The original study found that people routinely reject financial offers they perceive as unfair, even when accepting would leave them better off than refusing. The conclusion: humans have an evolved sense of fairness that overrides pure self-interest.

Dr. Okonkwo runs the experiment in a market community where the norms around negotiation are explicit and well-rehearsed. Participants treat the lab game as a negotiation, not a moral test. Rejection rates fall. Dr. Lindqvist runs it with participants embedded in a strong welfare-state culture where financial inequality carries particular social stigma. Rejection rates rise. Neither result is wrong. Both are accurate descriptions of what happened in that room, with those people, carrying those expectations into a scenario the original researchers assumed was culturally neutral.

The original paper had no mechanism for knowing which part of its result was the universal claim and which was the local calibration. It couldn't know, because it only looked in one place. A finding from a single population is a hypothesis, not a conclusion. Treating it as a conclusion is where the trouble starts, and that distinction is one the field has been painfully slow to enforce.
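For readers who like to see the arithmetic, the scenario above can be sketched in a few lines of Python. This is a minimal illustration, not an analysis of any real study: the counts are invented, the names Site, pooled_rate, and homogeneity_chi2 are hypothetical, and the hand-rolled Pearson chi-squared statistic is one simple way among several to ask whether the sites share a single rejection rate.

```python
from dataclasses import dataclass


@dataclass
class Site:
    """One replication site. All counts used below are invented."""
    name: str
    rejected: int  # unfair offers that participants rejected
    offers: int    # unfair offers made in total

    @property
    def rate(self) -> float:
        return self.rejected / self.offers


def pooled_rate(sites):
    """Rejection rate if every site is lumped into one sample."""
    return sum(s.rejected for s in sites) / sum(s.offers for s in sites)


def homogeneity_chi2(sites):
    """Pearson chi-squared statistic for 'all sites share one rate'.

    Assumes the pooled rate is strictly between 0 and 1, otherwise
    the expected counts would be zero.
    """
    p = pooled_rate(sites)
    stat = 0.0
    for s in sites:
        expected_rejected = s.offers * p
        expected_accepted = s.offers * (1 - p)
        accepted = s.offers - s.rejected
        stat += (s.rejected - expected_rejected) ** 2 / expected_rejected
        stat += (accepted - expected_accepted) ** 2 / expected_accepted
    return stat


# Critical value of chi-squared at the 0.05 level with 2 degrees of
# freedom, which is the right comparison for exactly three sites.
CHI2_CRIT_DF2 = 5.991

# Hypothetical data: the original lab plus the two imagined replications.
sites = [
    Site("Original", rejected=50, offers=100),
    Site("Lagos", rejected=20, offers=100),
    Site("Stockholm", rejected=70, offers=100),
]

for s in sites:
    print(f"{s.name}: {s.rate:.2f}")
print(f"Pooled: {pooled_rate(sites):.2f}")
print("Sites disagree:", homogeneity_chi2(sites) > CHI2_CRIT_DF2)
```

With numbers like these, the pooled rate describes none of the three rooms well, and the statistic lands far above the threshold. The universal part of the claim (some people reject unfair offers everywhere) survives; the single headline rate does not.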

What People Get Wrong About This

The common misreading is that this is a story about bad scientists or sloppy methods. It isn't, mostly. The replication problem in cross-cultural research is largely structural, and blaming individual researchers for it is like blaming individual cab drivers for traffic. Academic incentives reward novelty and clean results. Running a study in four countries can cost roughly four times as much and produces a messier paper that is harder to publish. Funding bodies prefer a sharp finding to a nuanced one. The result is a literature systematically skewed toward single-site, single-culture data reported as universal truth.

There is also a subtler error worth naming: the assumption that findings which do replicate are therefore culturally neutral. They may simply reflect shared exposure to similar conditions. Findings about screen time and attention, for instance, might replicate across wealthy urban populations worldwide not because the effect is universal, but because those populations share the relevant environmental variable. Replication across ten countries is more reassuring than replication across two. It is still not the same as replication across the full range of human conditions.

The findings that hold most robustly tend to be the ones with the most direct biological substrate and the fewest intervening social variables: basic conditioning, certain aspects of memory encoding, physiological stress responses. The further a finding gets from the body and into the realm of judgment, preference, or social behaviour, the more it needs to earn its universality rather than assume it. That earning takes time, money, and a willingness to publish complicated answers. The incentive structure of academic science currently rewards none of those things.

The Consequence Worth Sitting With

Ask yourself this: how many public health campaigns, school curricula, or workplace interventions have you seen justified by a study that was, at its foundation, a snapshot of one particular kind of person in one particular kind of place?

Policy built on behavioural research inherits its assumptions. A public health intervention designed around findings on risk perception, or a school curriculum shaped by a theory of motivation, carries the cultural fingerprint of wherever the underlying research was done. Like a key copied from a copy rather than the original, each iteration drifts a little further from what it was meant to fit. When that research was done in one particular kind of university, in one particular kind of country, the intervention may work brilliantly in similar environments and achieve almost nothing elsewhere.

The researchers who take this most seriously are now calling for something they describe as a credibility revolution: pre-registration of studies, multi-site replication as a condition of publication, and explicit statements about the population from which conclusions can be drawn. It is slow. It is expensive. It produces less exciting headlines.

Still, the alternative is a science that mistakes its own reflection for a portrait of humanity, and the cost of that error does not stay inside the laboratory. It travels, badly, into the world.
