Math 115
Previously, we compared two groups using the t-test.
But what if we have 3 or more groups?
We need a new approach!
With 4 groups, we could run a t-test on every pair of groups, which takes \(\binom{4}{2} = 6\) tests.
The problem: Each test has a 5% chance of Type I error (if α = 0.05)
Overall probability of at least one Type I error across the 6 tests (treating them as independent):
\[1 - 0.95^6 = 0.265 = 26.5\%\]
This inflated error rate is unacceptable!
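The arithmetic above can be checked in a few lines of Python. This is only a sketch: like the formula, it treats the tests as independent, and the function name `familywise_error_rate` is made up for illustration.

```python
from math import comb

def familywise_error_rate(num_groups, alpha=0.05):
    """Chance of at least one Type I error across all pairwise tests.

    Assumes the tests are independent, as the formula above does.
    """
    num_tests = comb(num_groups, 2)  # number of distinct pairs of groups
    return 1 - (1 - alpha) ** num_tests

# Show how quickly the error rate grows with the number of groups.
for k in (3, 4, 5):
    print(k, comb(k, 2), round(familywise_error_rate(k), 3))
```

For 4 groups this gives 6 tests and an error rate of about 0.265, matching the calculation above.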
Instead of many pairwise tests, do one holistic test:
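Hypotheses:
\(H_0\): \(\mu_1 = \mu_2 = \cdots = \mu_k\) (all group means are equal)
\(H_A\): at least one group mean differs from the others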
If we reject \(H_0\), THEN we can explore which groups differ (with appropriate adjustments).
This approach is called Analysis of Variance (ANOVA).
Research question: Do vocabulary test scores differ by self-identified social class?
| Class | n | Mean | SD |
|---|---|---|---|
| Lower | 41 | 5.07 | 2.24 |
| Middle | 331 | 6.76 | 1.89 |
| Upper | 16 | 6.19 | 2.34 |
| Working | 407 | 5.75 | 1.87 |
Two sources of variability in the data:
Between groups: How different are the group means from each other?
Within groups: How much do individuals vary within each group?
If \(H_0\) is true (all means equal), between-group variability should be small relative to within-group variability.
Between groups: How far are group means from the overall mean?
\[SSG = \sum_{i=1}^{k} n_i(\bar{x}_i - \bar{x})^2\]
Within groups: How far are individual values from their group means?
\[SSE = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i)^2\]
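Both sums of squares can be approximated from the summary table of group sizes, means, and SDs shown earlier, without the raw scores. The sketch below assumes only those rounded table values, so its results should land close to, but not exactly on, the sums of squares computed from the full data. The variable names are ours.

```python
# Summary statistics copied from the vocabulary table: (n, mean, SD) per class.
groups = {
    "Lower":   (41, 5.07, 2.24),
    "Middle":  (331, 6.76, 1.89),
    "Upper":   (16, 6.19, 2.34),
    "Working": (407, 5.75, 1.87),
}

n_total = sum(n for n, _, _ in groups.values())

# The overall mean is the size-weighted average of the group means.
grand_mean = sum(n * mean for n, mean, _ in groups.values()) / n_total

# SSG: squared distance of each group mean from the overall mean, weighted by n_i.
ssg = sum(n * (mean - grand_mean) ** 2 for n, mean, _ in groups.values())

# SSE: a group's variance s_i^2 is its inner sum of squares divided by (n_i - 1),
# so the inner sum can be recovered as (n_i - 1) * s_i^2.
sse = sum((n - 1) * sd ** 2 for n, _, sd in groups.values())

print(n_total, round(grand_mean, 2), round(ssg, 1), round(sse, 1))
```

Because the table's means and SDs are rounded to two decimals, expect small discrepancies from software output based on the raw data.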
To compare variability, we convert sums of squares to mean squares by dividing by degrees of freedom:
Between Groups:
\[MSG = \frac{SSG}{df_G}\]
\[df_G = k - 1\]
(number of groups minus 1)
Within Groups:
\[MSE = \frac{SSE}{df_E}\]
\[df_E = n - k\]
(total observations minus number of groups)
Why divide by df? This accounts for how many groups and observations we have, making the variability measures comparable.
\[F = \frac{\text{Variability Between Groups}}{\text{Variability Within Groups}} = \frac{MSG}{MSE}\]
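Interpretation:
If \(H_0\) is true, MSG and MSE estimate the same variability, so F should be close to 1.
An F much larger than 1 is evidence that the group means differ.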
For the vocabulary example:
| | Between Groups | Within Groups |
|---|---|---|
| Degrees of freedom | \(df_G = 4 - 1 = 3\) | \(df_E = 795 - 4 = 791\) |
| Sum of Squares | \(SSG = 236.56\) | \(SSE = 2869.8\) |
| Mean Square | \(MSG = \frac{236.56}{3} = 78.85\) | \(MSE = \frac{2869.8}{791} = 3.63\) |
F statistic: \(F = \frac{MSG}{MSE} = \frac{78.85}{3.63} \approx 21.73\) (computed from the unrounded mean squares)
This F is much larger than 1!
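The table's arithmetic can be reproduced from the two sums of squares. A minimal sketch follows; the helper name `f_from_sums_of_squares` is hypothetical, and the inputs are the SSG and SSE values reported above.

```python
def f_from_sums_of_squares(ssg, sse, k, n):
    """Return (MSG, MSE, F) for k groups and n total observations."""
    df_g = k - 1   # between-groups degrees of freedom
    df_e = n - k   # within-groups degrees of freedom
    msg = ssg / df_g
    mse = sse / df_e
    return msg, mse, msg / mse

msg, mse, f_stat = f_from_sums_of_squares(ssg=236.56, sse=2869.8, k=4, n=795)
print(round(msg, 2), round(mse, 2), round(f_stat, 2))  # 78.85 3.63 21.73
```

Dividing the rounded mean squares (78.85 / 3.63) gives 21.72 instead, which is why it helps to round only at the last step.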
Here are the first rows of the data with 5 random permutations (columns randPerm1 to randPerm5):
# A tibble: 795 × 8
id wordsum class randPerm1 randPerm2 randPerm3 randPerm4 randPerm5
<int> <dbl> <chr> <fct> <fct> <fct> <fct> <fct>
1 1 6 MIDDLE MIDDLE MIDDLE MIDDLE MIDDLE MIDDLE
2 2 9 WORKING WORKING WORKING WORKING WORKING WORKING
3 3 6 WORKING WORKING WORKING WORKING WORKING WORKING
4 4 5 WORKING WORKING WORKING WORKING WORKING WORKING
5 5 6 WORKING WORKING WORKING WORKING WORKING WORKING
6 6 6 WORKING WORKING WORKING WORKING WORKING WORKING
7 7 8 MIDDLE MIDDLE MIDDLE MIDDLE MIDDLE MIDDLE
8 8 10 WORKING WORKING WORKING WORKING WORKING WORKING
9 9 8 WORKING WORKING WORKING WORKING WORKING WORKING
10 10 9 UPPER UPPER UPPER UPPER UPPER UPPER
# ℹ 785 more rows
Here is the dotplot of 100 simulations
Histogram of F scores (null distribution) for 1,000 random permutations of word scores. Dashed vertical line indicates observed F score.
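The permutation idea behind this histogram can be sketched in plain Python: shuffle the group labels, recompute F, and repeat. The data below are synthetic stand-ins, not the vocabulary survey data, and the function names are ours. This is an illustration of the technique, not the code that produced the slides.

```python
import random

def f_statistic(values, labels):
    """One-way ANOVA F statistic for values grouped by labels."""
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    ssg = sse = 0.0
    for members in groups.values():
        mean = sum(members) / len(members)
        ssg += len(members) * (mean - grand_mean) ** 2
        sse += sum((x - mean) ** 2 for x in members)
    return (ssg / (k - 1)) / (sse / (n - k))

def permutation_test(values, labels, reps=1000, seed=0):
    """Return the observed F, the null F values, and the permutation p-value."""
    rng = random.Random(seed)
    observed = f_statistic(values, labels)
    shuffled = list(labels)
    null_fs = []
    for _ in range(reps):
        rng.shuffle(shuffled)  # break any link between label and score
        null_fs.append(f_statistic(values, shuffled))
    # Proportion of shuffled F values at least as large as the observed one.
    p_value = sum(f >= observed for f in null_fs) / reps
    return observed, null_fs, p_value

# Synthetic example (assumption): three groups of 20 with clearly different centers.
rng = random.Random(1)
labels = ["A"] * 20 + ["B"] * 20 + ["C"] * 20
centers = {"A": 5.0, "B": 7.0, "C": 9.0}
values = [rng.gauss(centers[label], 1.0) for label in labels]

observed, null_fs, p_value = permutation_test(values, labels)
print(round(observed, 2), p_value)
```

The list `null_fs` plays the role of the histogram, and `observed` plays the role of the dashed line. When the groups truly differ, almost no shuffled F reaches the observed one.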
Jamovi reports the full ANOVA table:
| | Sum of Squares | df | Mean Square | F | p |
|---|---|---|---|---|---|
| class | 236.56 | 3 | 78.85 | 21.73 | < 0.001 |
| Residuals | 2869.80 | 791 | 3.63 | | |
When conditions are met, the F-statistic follows an F-distribution with \(df_1 = k - 1\) and \(df_2 = n - k\) degrees of freedom if \(H_0\) is true.
Computing the p-value: the p-value is the area in the right tail of this F-distribution, at or beyond the observed F.
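In practice software reports this tail area, but it can also be approximated by hand. The sketch below uses only the standard library. It relies on the identity \(P(F \ge f) = I_t(df_2/2,\ df_1/2)\) with \(t = df_2 / (df_2 + df_1 f)\), where \(I_t\) is the regularized incomplete beta function, and evaluates the integral with a simple midpoint rule. The function name `f_right_tail` and the step count are assumptions for illustration.

```python
from math import exp, lgamma, log

def f_right_tail(f_obs, df1, df2, steps=100_000):
    """Approximate P(F >= f_obs) for an F(df1, df2) distribution."""
    a, b = df2 / 2, df1 / 2
    t_max = df2 / (df2 + df1 * f_obs)
    # log of the beta function B(a, b), computed on the log scale for stability
    log_beta = lgamma(a) + lgamma(b) - lgamma(a + b)
    width = t_max / steps
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) * width  # midpoint of the i-th slice, strictly inside (0, 1)
        total += exp((a - 1) * log(t) + (b - 1) * log(1 - t) - log_beta)
    return total * width

# Vocabulary example: F = 21.73 with 3 and 791 degrees of freedom.
p_value = f_right_tail(21.73, df1=3, df2=791)
print(p_value < 0.001)
```

A quick sanity check: for \(df_1 = 2\) the tail has the closed form \((1 + 2f/df_2)^{-df_2/2}\), so `f_right_tail(3.0, 2, 10)` should be close to \(0.625^5 \approx 0.0954\).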
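Independence: observations are independent within and across groups.
Normality: scores within each group are approximately normal.
Equal variance: variability is about the same in every group; here the group SDs range from 1.87 to 2.34.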
Conditions are met for using the F-distribution.
Results: \(F = 21.73\) with \(df_1 = 3\) and \(df_2 = 791\), p-value < 0.001
Decision: Reject \(H_0\)
Conclusion: There is convincing evidence that mean vocabulary scores differ across self-identified social classes. At least one group has a different mean.
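ANOVA tells us: whether there is evidence that at least one group mean differs from the others.
ANOVA does NOT tell us: which groups differ, or by how much. That requires follow-up comparisons with appropriate adjustments.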
ANOVA compares means across 3+ groups:
| Component | Formula/Value |
|---|---|
| F statistic | \(F = \frac{MSG}{MSE}\) |
| Degrees of freedom | \(df_1 = k - 1\), \(df_2 = n - k\) |
| P-value | Right tail of F-distribution |
Intuition: Large F means between-group variability is large relative to within-group variability → evidence that means differ
Conditions: Independence + Normality + Equal variance
| | Two Means | Multiple Means |
|---|---|---|
| Groups | 2 | 3 or more |
| Test statistic | T | F |
| Null distribution | t-distribution | F-distribution |
| Hypothesis | \(\mu_1 = \mu_2\) | \(\mu_1 = \mu_2 = \cdots = \mu_k\) |
| Alternative | \(\mu_1 \neq \mu_2\) | At least one differs |
For k = 2, ANOVA and the two-sample t-test with pooled variance give equivalent results!
(In fact, \(F = T^2\) when comparing two groups)
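The \(F = T^2\) relationship is easy to verify numerically. The sketch below uses two small made-up samples (not from the course data) and the pooled-variance form of the t statistic, which is the version that matches ANOVA's equal-variance assumption. The function names are ours.

```python
from math import sqrt

def pooled_t(a, b):
    """Two-sample t statistic using the pooled (equal-variance) standard error."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    ss_a = sum((x - mean_a) ** 2 for x in a)
    ss_b = sum((x - mean_b) ** 2 for x in b)
    pooled_var = (ss_a + ss_b) / (len(a) + len(b) - 2)
    return (mean_a - mean_b) / sqrt(pooled_var * (1 / len(a) + 1 / len(b)))

def anova_f(*groups):
    """One-way ANOVA F statistic, F = MSG / MSE."""
    n = sum(len(g) for g in groups)
    k = len(groups)
    grand_mean = sum(sum(g) for g in groups) / n
    ssg = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    sse = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)
    return (ssg / (k - 1)) / (sse / (n - k))

# Made-up data for illustration only.
group_a = [1.0, 2.0, 3.0]
group_b = [2.0, 4.0, 6.0]

t_stat = pooled_t(group_a, group_b)
f_stat = anova_f(group_a, group_b)
print(round(t_stat ** 2, 4), round(f_stat, 4))  # 2.4 2.4
```

With an unpooled (Welch) t statistic the two values would generally not match, since that version does not assume equal variances.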