Statistical Significance

A result is statistically significant when it would be unlikely to arise from chance alone. This is quantified by the p-value: the probability, assuming the null hypothesis is true, of observing a result at least as extreme as the one actually seen.
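As a concrete illustration of that definition, here is a minimal sketch (not from the original text) that computes a one-sided p-value for a coin-flip experiment using the exact binomial distribution: the probability, under the null of a fair coin, of seeing a result at least as extreme as the observed count of heads.

```python
import math

def binom_p_value(k, n, p=0.5):
    """One-sided p-value: the probability of observing k or more
    successes in n trials, assuming each success has probability p
    (the null hypothesis -- here, a fair coin)."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i)
               for i in range(k, n + 1))

# Observing 60 heads in 100 flips of a supposedly fair coin:
p = binom_p_value(60, 100)
print(f"p-value = {p:.4f}")  # about 0.028 -- "significant" at the 0.05 level
```

Note that this number says nothing about how likely the coin is to be biased; it only says how surprising 60 heads would be if the coin were fair.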

Why be cautious about "significant"?

Significance gives a sharable convention for “this isn’t just noise”, but the convention is routinely overread as proof of importance or truth.

Conventional thresholds:

  • p < 0.05: "significant" (5% chance of a false positive under the null)
  • p < 0.01: "highly significant"
  • p < 0.001: "very highly significant"
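The conventional labels above amount to a simple threshold ladder, which can be sketched as a small helper function (the function name and structure are illustrative, not from the original text):

```python
def significance_label(p):
    """Map a p-value to the conventional verbal label.
    Thresholds checked from strictest to loosest."""
    if p < 0.001:
        return "very highly significant"
    if p < 0.01:
        return "highly significant"
    if p < 0.05:
        return "significant"
    return "not significant"

print(significance_label(0.03))   # "significant"
print(significance_label(0.2))    # "not significant"
```

Keep in mind these cutoffs are conventions, not laws of nature: a p-value of 0.049 and one of 0.051 describe nearly identical evidence.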

What significance does NOT mean

  • Not the probability the hypothesis is true
  • Not the probability the result is real
  • Does not measure effect size; a tiny effect can be highly significant given a huge sample
  • Does not mean the result will replicate

A medication lowering blood pressure by 0.2 mmHg in a 100,000-person trial can be “highly significant” yet clinically worthless. Always ask how big the effect is, not just whether it is nonzero.
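The arithmetic behind that example can be checked directly. Assuming a between-patient standard deviation of 10 mmHg (a hypothetical figure chosen for illustration; the original gives only the effect and sample sizes), a two-arm z-test on the 0.2 mmHg difference looks like this:

```python
import math

def two_sided_p_from_z(z):
    """Two-sided p-value for a z statistic under a standard normal null."""
    return math.erfc(abs(z) / math.sqrt(2))

# Hypothetical trial: true drop of 0.2 mmHg, SD 10 mmHg, 100,000 per arm.
effect, sd, n = 0.2, 10.0, 100_000
se = sd * math.sqrt(2 / n)          # standard error of the difference in means
z = effect / se
p = two_sided_p_from_z(z)
print(f"z = {z:.2f}, p = {p:.1e}")  # tiny p despite a clinically trivial effect
```

With these assumed numbers, z comes out near 4.5 and p well below 0.001: "very highly significant", yet the effect itself remains 0.2 mmHg. Significance scales with sample size; importance does not.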

p-hacking

Run enough tests and some will hit by chance. This is the Texas Sharpshooter Fallacy applied to statistics: drawing the target around the bullets after the fact.
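The "run enough tests and some will hit" claim is easy to demonstrate by simulation. The sketch below (illustrative; seed and group sizes are arbitrary) runs 100 "studies" in which the null hypothesis is true by construction, both groups drawn from the same distribution, and counts how many nonetheless clear p < 0.05:

```python
import math
import random

random.seed(0)

def z_test_p(a, b):
    """Two-sided z-test p-value for a difference in means
    (Welch-style standard error; fine for these sample sizes)."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    va = sum((x - ma) ** 2 for x in a) / (na - 1)
    vb = sum((x - mb) ** 2 for x in b) / (nb - 1)
    z = (ma - mb) / math.sqrt(va / na + vb / nb)
    return math.erfc(abs(z) / math.sqrt(2))

# 100 studies where there is genuinely no effect: both groups come
# from the same normal distribution.
hits = 0
for _ in range(100):
    a = [random.gauss(0, 1) for _ in range(50)]
    b = [random.gauss(0, 1) for _ in range(50)]
    if z_test_p(a, b) < 0.05:
        hits += 1
print(f"{hits} of 100 null studies were 'significant' at p < 0.05")
```

On average about 5 of the 100 null studies come up "significant", which is exactly what the 5% false-positive rate promises. Reporting only those hits, and hiding the other ~95 tests, is the sharpshooter's target drawn after the shots.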