2025-11-12
"Association is not causation" is perhaps the most important lesson one can learn in a statistics class; "correlation is not causation" is another way of saying the same thing.
Throughout the statistics part of the book, we have described tools useful for quantifying associations between variables.
However, we must be careful not to over-interpret these associations.
Before interpreting an association as causal, several criteria are commonly required:

- Empirical association (i.e., correlation): a relationship must exist between the cause and the effect.
- Temporal priority: the cause must happen before the effect. This is considered the most widely accepted requirement for a causal relationship.
- Dose-response relationship: increasing the "dose" of the cause increases the effect.
- Plausibility: there is a plausible mechanism that explains how the cause leads to the effect.
- Exploration and elimination of alternative causes: find confounders and understand their role.
There are many reasons that a variable \(X\) can be correlated with a variable \(Y\), without having any direct effect on \(Y\).
Today we describe four: spurious correlation, data dredging, reverse causation, and confounders.
With reverse causation, the correlation is real; it is just not causal.
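To see how data dredging can manufacture seemingly impressive correlations out of pure noise, imagine computing the correlation between two completely independent variables in each of a very large number of groups and then reporting only the strongest ones. Below is a minimal sketch of such a simulation; the group size of 25 and the use of one million groups are assumptions chosen to be consistent with the output that follows, not necessarily the original code.

```r
library(dplyr)

N <- 25     # observations per group (assumed)
g <- 1e6    # number of groups (assumed)

# X and Y are independent, so any correlation we find is pure chance
sim_data <- tibble(group = rep(1:g, each = N),
                   x = rnorm(N * g),
                   y = rnorm(N * g))

# compute the within-group correlations and sort from highest to lowest
res <- sim_data |>
  group_by(group) |>
  summarize(r = cor(x, y)) |>
  arrange(desc(r))

res
```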
# A tibble: 1,000,000 × 2
    group     r
    <int> <dbl>
 1 487822 0.809
 2 312927 0.799
 3 458790 0.785
 4 824422 0.781
 5 160316 0.772
 6 658446 0.767
 7 646127 0.761
 8 444083 0.758
 9 242805 0.757
10 898776 0.754
# ℹ 999,990 more rows
This particular form of data dredging is referred to as p-hacking.
P-hacking is a topic of much discussion because it poses a problem in scientific publications.
Since publishers tend to reward statistically significant results over negative results, there is an incentive to report significant results.
In epidemiology and the social sciences, for example, researchers may look for associations between an adverse outcome and several exposures, and report only the one exposure that resulted in a small p-value.
Similarly, in high-throughput biology it is routine to test hundreds to thousands of genes, genetic variants, or epigenetic sites; unless care is taken to perform some form of p-value adjustment, these results are unlikely to replicate.
This does not necessarily happen due to unethical behavior, but rather as a result of statistical ignorance or wishful thinking.
In advanced statistics courses, you can learn methods to adjust for these multiple comparisons.
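As a sketch of what such an adjustment can look like (one possible approach, not necessarily the methods taught in those courses), base R's p.adjust() applies standard corrections such as Bonferroni or Benjamini-Hochberg to a vector of p-values:

```r
set.seed(1)

# 1,000 tests in which the null hypothesis is true for every single one
pvals <- replicate(1000, t.test(rnorm(25), rnorm(25))$p.value)

sum(pvals < 0.05)                                    # "significant" by chance alone
sum(p.adjust(pvals, method = "bonferroni") < 0.05)   # after Bonferroni correction
sum(p.adjust(pvals, method = "BH") < 0.05)           # after Benjamini-Hochberg (FDR)
```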
Outliers are another source of seemingly strong associations: in examples like the one referred to here, the high correlation is driven entirely by one outlier.
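A sketch of this effect with simulated data (not the original example): a single extreme point inflates the Pearson correlation between two otherwise independent variables, while the rank-based Spearman correlation is barely affected.

```r
set.seed(1)

# 100 independent observations, plus one extreme outlier appended to each variable
x <- c(rnorm(100), 20)
y <- c(rnorm(100), 20)

cor(x, y)                        # Pearson: high, driven entirely by the outlier
cor(x, y, method = "spearman")   # Spearman (ranks): close to zero
```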
Another way association is confused with causation is when the cause and effect are reversed.
An example of this is claiming that tutoring makes students perform worse because tutored students test lower than peers who are not tutored.
In this case, the tutoring is not causing the low test scores, but the other way around.
A quote from The New York Times:
When we examined whether regular help with homework had a positive impact on children’s academic performance, we were quite startled by what we found. Regardless of a family’s social class, racial or ethnic background, or a child’s grade level, consistent homework help almost never improved test scores or grades…
As an example, suppose we fit the following model to father and son height data, with \(X_i\) the father's height and \(y_i\) the son's height:

\[ X_i = \beta_0 + \beta_1 y_i + \varepsilon_i, \quad i=1, \dots, N \]
The model fits the data very well.
The model is technically correct.
The estimates and p-values were obtained correctly as well.
But we cannot interpret the fit as saying that the sons' heights have any causal effect on the fathers' heights.
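A sketch of such a fit, assuming Galton's father and son height data from the HistData package (first-born sons only); the original example may have used a different dataset or filtering:

```r
library(HistData)
library(dplyr)

data("GaltonFamilies")

# keep first-born sons and their fathers
galton_heights <- GaltonFamilies |>
  filter(childNum == 1 & gender == "male") |>
  select(father, son = childHeight)

# regress the *father's* height on the *son's* height
fit <- lm(father ~ son, data = galton_heights)
summary(fit)$coefficients   # a significant slope, but no causal interpretation
```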
Confounders are a common reason for associations being misinterpreted.
When studying the relationship between an exposure and an outcome, a confounder is a third variable that:

- is related to the exposure, and
- affects the outcome, but is not caused by the exposure.

Hence, the confounder can distort the perceived relationship between the exposure and the outcome.
Earlier, when studying baseball data, we saw how Home Runs were a confounder that resulted in a higher correlation than expected when studying the relationship between Bases on Balls and Runs.
In some cases, we can use linear models to account for confounders.
However, this is not always the case.
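As a sketch of the adjustment in the baseball example, assuming the Teams table from the Lahman package and per-game rates (details may differ from the earlier chapter's exact code), a multivariable model estimates the bases-on-balls effect while holding home runs fixed:

```r
library(Lahman)
library(dplyr)

dat <- Teams |>
  filter(yearID %in% 1961:2001) |>
  mutate(R_per_game  = R / G,
         BB_per_game = BB / G,
         HR_per_game = HR / G)

# the BB coefficient is now adjusted for HR, the confounder
fit <- lm(R_per_game ~ BB_per_game + HR_per_game, data = dat)
summary(fit)$coefficients
```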
Incorrect interpretations due to confounders are ubiquitous in the lay press, and the underlying confounders are often hard to detect.
Here, we present a widely used example related to college admissions.
library(dslabs)   # for the admissions dataset
library(dplyr)

two_by_two <- admissions |> group_by(gender) |>
  summarize(total_admitted = round(sum(admitted / 100 * applicants)),
            not_admitted = sum(applicants) - sum(total_admitted))

two_by_two |>
  mutate(percent = total_admitted / (total_admitted + not_admitted) * 100)

# A tibble: 2 × 4
  gender total_admitted not_admitted percent
  <chr>           <dbl>        <dbl>   <dbl>
1 men              1198         1493    44.5
2 women             557         1278    30.4
What’s going on? The totals suggest that women were admitted at a much lower rate than men, yet this kind of pattern can arise when an unaccounted-for confounder is driving most of the variability. Here the confounder is the major: women tended to apply to more selective majors with lower admission rates.
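A sketch of how one might check this with the same admissions table (assuming, as above, columns major, gender, admitted, and applicants): compute each major's overall admission rate alongside the share of its applicants who are women.

```r
admissions |>
  group_by(major) |>
  summarize(selectivity = sum(admitted / 100 * applicants) / sum(applicants) * 100,
            percent_women_applicants =
              sum(applicants * (gender == "women")) / sum(applicants) * 100) |>
  arrange(selectivity)
```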
The case we have just covered is an example of Simpson's paradox. It is called a paradox because we see the sign of the correlation flip when comparing the entire population to specific strata.
To see how this can happen, suppose we construct a dataset in which \(X\) and \(Y\) are negatively correlated overall. Once we stratify by a third variable \(Z\), however, another pattern emerges: it is really \(Z\) that is negatively correlated with \(X\), and within each stratum of \(Z\), \(X\) and \(Y\) are actually positively correlated, as the simulation below illustrates.
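A sketch of such a construction with simulated data (the strata sizes and means are arbitrary choices): within each level of \(Z\), \(X\) and \(Y\) are positively correlated, but the stratum means are arranged so that the overall correlation is negative.

```r
set.seed(1)
library(dplyr)
library(ggplot2)

# three strata of Z; higher Z shifts X down and Y up
sim <- bind_rows(lapply(1:3, function(z) {
  x <- rnorm(100, mean = -2 * z)
  data.frame(z = z, x = x, y = x + 5 * z + rnorm(100))
}))

cor(sim$x, sim$y)                                 # overall: negative
sim |> group_by(z) |> summarize(r = cor(x, y))    # within each stratum: positive

ggplot(sim, aes(x, y, color = factor(z))) + geom_point()   # colors reveal the strata
```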
When studying medical records, an interesting phenomenon called collider bias can occur. An example from Wikipedia (https://en.wikipedia.org/wiki/Berkson%27s_paradox), where it is known as Berkson's paradox:
An example presented by mathematician Jordan Ellenberg is that of a dating pool, measured on axes of niceness and handsomeness. A person might conclude from their own dating experience that “the handsome ones tend not to be nice, and the nice ones tend not to be handsome”.
In this example, Alice decides to date only men who are above some combined threshold of handsomeness and niceness. As a result, the really handsome men in her pool do not have to be that nice, and the really nice ones do not need to be that handsome. Alice therefore notices a negative correlation in her dating pool, even though no such correlation exists in the population at large.
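A sketch of this selection effect with simulated data: niceness and handsomeness are generated independently, but restricting attention to people whose combined score clears a threshold induces a negative correlation (the threshold of 1.5 is an arbitrary choice).

```r
set.seed(1)

# independent in the population
nice     <- rnorm(10000)
handsome <- rnorm(10000)
cor(nice, handsome)                  # approximately zero

# Alice only considers people above a combined threshold
dated <- nice + handsome > 1.5
cor(nice[dated], handsome[dated])    # clearly negative in the selected pool
```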
The same selection effect means that studies of hospital records can come to the wrong conclusions. Being in the hospital requires that a person have something serious and medically adverse, so unless care is taken, spurious negative associations can be found. For example, when looking at everyone hospitalized for appendicitis, researchers may find that few if any of them have kidney disease, and when comparing the appendicitis patients to controls drawn from other patients in the same hospital, they may find this negative association to be statistically significant. It would be wrong to conclude that kidney disease is protective against appendicitis: the hospital patients are there because they have something wrong with them, and since both kidney disease and appendicitis are rare, it is very unlikely that many patients have both.
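A sketch of this scenario with simulated binary data (the disease prevalences are arbitrary): appendicitis and kidney disease are independent in the population, but among hospital patients, modeled here as people with at least one of several conditions, they appear negatively associated.

```r
set.seed(1)

n <- 1e6
appendicitis   <- rbinom(n, 1, 0.01)
kidney_disease <- rbinom(n, 1, 0.01)
other_illness  <- rbinom(n, 1, 0.05)

# independent in the full population: the two rates below are about equal
mean(kidney_disease[appendicitis == 1])
mean(kidney_disease[appendicitis == 0])

# hospitalization is a collider: it requires having at least one condition
in_hospital <- appendicitis == 1 | kidney_disease == 1 | other_illness == 1

# among hospital patients the two diseases look negatively associated
mean(kidney_disease[in_hospital & appendicitis == 1])   # still about 1%
mean(kidney_disease[in_hospital & appendicitis == 0])   # much higher
```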