Statistical Significance
A result is statistically significant when it would be unlikely to arise from chance alone, quantified by the p-value: the probability of observing a result at least as extreme as the data assuming the null hypothesis is true.
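The definition is easy to make concrete by simulation. A minimal sketch, using a made-up experiment (60 heads in 100 coin flips) and a fair coin as the null hypothesis:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: we observed 60 heads in 100 flips.
observed_heads = 60
n_flips = 100

# Simulate 100,000 experiments in a world where the null is true
# (a fair coin, p = 0.5).
null_worlds = rng.binomial(n=n_flips, p=0.5, size=100_000)

# p-value: the fraction of null worlds at least as extreme as the data
# (one-sided here: 60 or more heads).
p_value = (null_worlds >= observed_heads).mean()
print(p_value)
```

The one-sided tail probability here is roughly 0.03, so by the convention below this run would be called "significant", even though a fair coin produces it about once in every 35 tries.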
Why be cautious about "significant"?
Significance gives a shared convention for saying "this isn't just noise", but the convention is routinely overread as proof of importance or truth.
Conventional thresholds:
- p < 0.05: "significant" (5% chance of a false positive when the null is true)
- p < 0.01: "highly significant" (1% chance)
- p < 0.001: "very highly significant" (0.1% chance)
What significance does NOT mean
- Not the probability the hypothesis is true
- Not the probability the result is real
- Does not measure effect size; a tiny effect can be highly significant given a huge sample
- Does not mean the result will replicate
A medication lowering blood pressure by 0.2 mmHg in a 100,000-person trial can be “highly significant” yet clinically worthless. Always ask how big the effect is, not just whether it is nonzero.
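The arithmetic behind that example can be checked directly. A sketch with assumed numbers (a 0.2 mmHg mean reduction, 10 mmHg standard deviation, 50,000 patients per arm):

```python
import math

# All values are assumptions echoing the blood-pressure example above.
diff, sd, n_per_arm = 0.2, 10.0, 50_000

se = sd * math.sqrt(2 / n_per_arm)   # standard error of the mean difference
z = diff / se                        # test statistic: grows with sample size
# Two-sided p-value from the standard normal CDF.
p = 2 * (1 - 0.5 * (1 + math.erf(z / math.sqrt(2))))
cohens_d = diff / sd                 # effect size: does NOT grow with n

print(f"z = {z:.2f}, p = {p:.4f}, Cohen's d = {cohens_d:.3f}")
```

With these numbers p lands well below 0.01 ("highly significant"), while Cohen's d is 0.02, an order of magnitude below even a conventionally "small" effect: significance and importance have come apart.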
p-hacking
Run enough tests and some will hit by chance. This is the Texas Sharpshooter Fallacy applied to statistics: drawing the target around the bullets after the fact.
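A quick simulation, with all parameters invented for illustration, shows the base rate: test 1,000 pure-noise datasets at the 0.05 level and about 5% of them "hit" anyway.

```python
import math
import numpy as np

rng = np.random.default_rng(42)

# 1,000 hypothetical "studies", each 500 samples of pure noise:
# the null hypothesis (true mean = 0) holds in every single one.
n_tests, n_samples = 1000, 500
data = rng.standard_normal((n_tests, n_samples))

# z-statistic for "mean != 0" in each study; |z| > 1.96 means p < 0.05.
z = data.mean(axis=1) * math.sqrt(n_samples)
false_positives = int((np.abs(z) > 1.96).sum())

print(false_positives)  # about 5% of 1000: dozens of "discoveries" from noise
```

Reporting only those hits, and never mentioning the other ~950 tests, is exactly the drawn-on target.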