The linear model that R fit to the data is \[\begin{array}{rcl}\widehat{Sodium} &=& -113 + 3.28\times Calories\\ && +11\times TypeMeat + 183\times TypePoultry\end{array}\]
We can recast the standard ANCOVA model in a similar form \[y_{ij} = (\mu-\beta\bar{\bar{X}}+\alpha_1) + (\alpha_i - \alpha_1) + \beta X_{ij} + \varepsilon_{ij}\]
By identifying terms in this model with the regression output, we can estimate the coefficients in the standard model
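For reference, the identification can be written out term by term. This is a sketch under two assumptions: the standard model centers the covariate at its grand mean, and Beef is level 1 (the reference level in the R output, since only TypeMeat and TypePoultry appear as indicators). \[y_{ij} = \mu + \alpha_i + \beta(X_{ij}-\bar{\bar{X}}) + \varepsilon_{ij}\] Matching this to the regression coefficients gives \[\begin{array}{rcl}\text{Intercept} &=& \mu-\beta\bar{\bar{X}}+\alpha_1\\ \text{Calories} &=& \beta\\ \text{TypeMeat} &=& \alpha_2-\alpha_1\\ \text{TypePoultry} &=& \alpha_3-\alpha_1\end{array}\]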
Prediction model in standard form \[\widehat{Sodium} = 428+3.28\times (Calories-145) + \left\{\begin{array}{ll}-65 & \text{if } Beef\\
-53 & \text{if } Meat \\ 118 & \text{if } Poultry\end{array}\right.\]
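The conversion from regression output to standard form can be sketched with a few lines of arithmetic. This assumes the usual constraint that the \(\alpha_i\) sum to zero (the estimates above do sum to zero) and uses only the rounded coefficients shown on the slide, so the results agree with the slide only up to rounding. The variable names are illustrative.

```r
# Rounded coefficients from the regression output above
b0    <- -113                                   # intercept (Beef is the reference level)
b_cal <- 3.28                                   # common slope for Calories
delta <- c(Beef = 0, Meat = 11, Poultry = 183)  # alpha_i - alpha_1
xbar  <- 145                                    # grand mean of Calories used in the standard form

# Assumption: sum of alpha_i is 0, so alpha_1 = -mean(alpha_i - alpha_1)
alpha1 <- -mean(delta)
alpha  <- alpha1 + delta

# Intercept = mu - b * xbar + alpha_1, solved for mu
mu <- b0 + b_cal * xbar - alpha1

round(alpha, 1)
round(mu, 1)
```

With rounded inputs the results land within about one unit of the values in the prediction model above; the small differences presumably come from rounding in the displayed coefficients.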
Sodium vs Calories, faceted by Type with linear model
Hypotheses
We will test the hypotheses
\(H_0: \alpha_1=\alpha_2=\alpha_3=0\)
\(H_A:\) at least one \(\alpha_i \neq 0\)
However, this time our analysis (ANCOVA) will take into account the relationship between Sodium and Calories
ANOVA table
hdsc_lm |> anova() |> tidy()
# A tibble: 3 × 6
  term         df   sumsq  meansq statistic   p.value
  <chr>     <int>   <dbl>   <dbl>     <dbl>     <dbl>
1 Calories      1 106270. 106270.      34.7  3.28e- 7
2 Type          2 227386. 113693.      37.1  1.34e-10
3 Residuals    50 153331.   3067.      NA   NA
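As a check on how the Type row is built, the F statistic and p-value can be recomputed from the sums of squares and degrees of freedom in the table. This is a minimal sketch using the rounded values printed above; the variable names are illustrative.

```r
# Values copied from the ANOVA table above (rounded as printed)
ss_type  <- 227386; df_type  <- 2
ss_resid <- 153331; df_resid <- 50

ms_type  <- ss_type / df_type      # mean square for Type
ms_resid <- ss_resid / df_resid    # residual mean square

f_type <- ms_type / ms_resid       # F statistic for Type, adjusted for Calories
p_type <- pf(f_type, df_type, df_resid, lower.tail = FALSE)

round(f_type, 1)
signif(p_type, 3)
```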
A different conclusion
When we take the covariate (Calories) into account, we come to a different conclusion
We reject the null hypothesis. After adjusting for calories, there is an association between sodium and hot dog type
The ANCOVA compared the intercepts of the three parallel lines (common slope, one intercept per type)
We found that at least one vertical distance between the lines is significantly different from 0
Adjusting for Calories
We can adjust Sodium for Calories by subtracting \(b(X_{ij}-\bar{\bar{X}})\) from each \(y_{ij}\), where \(b\) is the estimate of the slope
Sodium Adjusted for Calories vs Calories, faceted by Type with adjusted linear model
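The adjustment can be sketched as a small helper function. It assumes the slope estimate \(b = 3.28\) and the grand mean of 145 calories from the standard-form model above; the function name and the example hot dog (170 calories, 500 mg sodium) are hypothetical and not taken from the data.

```r
# Hypothetical helper: remove the part of Sodium explained by Calories,
# relative to a hot dog with the grand-mean calorie count
adjust_for_calories <- function(sodium, calories, b = 3.28, xbar = 145) {
  sodium - b * (calories - xbar)
}

# Made-up example: 25 calories above the mean, so 3.28 * 25 = 82 is subtracted
adjusted <- adjust_for_calories(sodium = 500, calories = 170)
adjusted
```

A hot dog with exactly 145 calories is left unchanged by the adjustment.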
Sequential sums of squares
R's anova() computes sums of squares sequentially by default
First, the sum of squares for Calories is calculated (as a regression sum of squares) \[SS_{Calories}=\sum_{i=1}^n(\hat{y}_i-\bar{y})^2\]
\(\hat{y}\) is based on a model that does not account for hot dog type
lm(Sodium ~ Calories + Type, data = hotdog) |> anova() |> tidy()
# A tibble: 3 × 6
  term         df   sumsq  meansq statistic   p.value
  <chr>     <int>   <dbl>   <dbl>     <dbl>     <dbl>
1 Calories      1 106270. 106270.      34.7  3.28e- 7
2 Type          2 227386. 113693.      37.1  1.34e-10
3 Residuals    50 153331.   3067.      NA   NA
Compare to \(SS_{Calories}\) without Type
lm(Sodium ~ Calories, data = hotdog) |> anova() |> tidy()
# A tibble: 2 × 6
  term         df   sumsq  meansq statistic   p.value
  <chr>     <int>   <dbl>   <dbl>     <dbl>     <dbl>
1 Calories      1 106270. 106270.      14.5  0.000369
2 Residuals    52 380718.   7321.      NA   NA
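The two tables give the same \(SS_{Calories}\) but different F statistics, because the residual mean square in the denominator changes once Type is in the model. A short sketch using the rounded values printed in the tables (variable names are illustrative):

```r
ss_calories <- 106270            # identical in both tables

# Residual mean squares from the two fits
ms_resid_cal_only <- 380718 / 52 # Sodium ~ Calories
ms_resid_full     <- 153331 / 50 # Sodium ~ Calories + Type

f_cal_only <- ss_calories / ms_resid_cal_only
f_full     <- ss_calories / ms_resid_full

round(c(calories_only = f_cal_only, with_type = f_full), 1)
```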
Next, the sum of squares for hot dog type is calculated, accounting for calories \[SS_{Type}=\left(\sum_{i=1}^a\sum_{j=1}^{n_i}(\hat{y}_{ij}-\bar{\bar{y}})^2\right)-SS_{Calories}\]
Here, the prediction \(\hat{y}_{ij}\) uses the full model: a different intercept for each type of hot dog (but same slope)
This is the sum of squares that is accounted for by the full model that is not accounted for by calories alone
Compare \(SS_{Type}\) accounting for Calories
lm(Sodium ~ Calories + Type, data = hotdog) |> anova() |> tidy()
# A tibble: 3 × 6
  term         df   sumsq  meansq statistic   p.value
  <chr>     <int>   <dbl>   <dbl>     <dbl>     <dbl>
1 Calories      1 106270. 106270.      34.7  3.28e- 7
2 Type          2 227386. 113693.      37.1  1.34e-10
3 Residuals    50 153331.   3067.      NA   NA
Compare to \(SS_{Type}\) without accounting for Calories
lm(Sodium ~ Type, data = hotdog) |> anova() |> tidy()
# A tibble: 2 × 6
  term         df   sumsq meansq statistic p.value
  <chr>     <int>   <dbl>  <dbl>     <dbl>   <dbl>
1 Type          2  31739. 15869.      1.78   0.179
2 Residuals    51 455249.  8926.     NA     NA
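Both versions of \(SS_{Type}\) can be recovered from residual sums of squares alone, which makes the "extra sum of squares" interpretation concrete. This sketch uses the rounded values printed in the tables above, so results match the tables only up to rounding; variable names are illustrative.

```r
# Residual sums of squares from the three fits shown above
rss_cal_only  <- 380718   # Sodium ~ Calories
rss_type_only <- 455249   # Sodium ~ Type
rss_full      <- 153331   # Sodium ~ Calories + Type

# Total sum of squares, reconstructed from the Calories-only table
ss_total <- 106270 + rss_cal_only

# Type accounting for Calories: what the full model explains beyond Calories alone
ss_type_given_cal <- rss_cal_only - rss_full

# Type ignoring Calories: what Type alone explains
ss_type_alone <- ss_total - rss_type_only

c(given_calories = ss_type_given_cal, alone = ss_type_alone)
```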
Sequential Sums of Squares: Order
lm(Sodium ~ Calories + Type, data = hotdog) |> anova() |> tidy()
# A tibble: 3 × 6
  term         df   sumsq  meansq statistic   p.value
  <chr>     <int>   <dbl>   <dbl>     <dbl>     <dbl>
1 Calories      1 106270. 106270.      34.7  3.28e- 7
2 Type          2 227386. 113693.      37.1  1.34e-10
3 Residuals    50 153331.   3067.      NA   NA
Compare to the model with Type entered first
lm(Sodium ~ Type + Calories, data = hotdog) |> anova() |> tidy()
# A tibble: 3 × 6
  term         df   sumsq  meansq statistic   p.value
  <chr>     <int>   <dbl>   <dbl>     <dbl>     <dbl>
1 Type          2  31739.  15869.      5.17  9.07e- 3
2 Calories      1 301917. 301917.     98.5   2.09e-13
3 Residuals    50 153331.   3067.     NA    NA
If there is one factor of interest (Type), but we want to account for another variable (Calories), the factor of interest should enter the model last
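The reason the factor of interest should enter last is that its sequential sum of squares is then adjusted for everything before it. The sketch below illustrates this on simulated data (not the hot dog data; the group labels, means, and effect sizes are made up for illustration) using base R only. It checks that the Type row from the Calories-first fit equals the drop in residual sum of squares between nested models.

```r
set.seed(42)

# Simulated stand-in data: three groups whose calorie distributions differ
n_per <- 18
sim <- data.frame(
  Type     = factor(rep(c("A", "B", "C"), each = n_per)),
  Calories = c(rnorm(n_per, 155, 20), rnorm(n_per, 160, 25), rnorm(n_per, 120, 20))
)
type_shift <- c(A = 0, B = 10, C = 180)   # made-up intercept shifts
sim$Sodium <- -100 + 3 * sim$Calories +
  unname(type_shift[as.character(sim$Type)]) + rnorm(nrow(sim), sd = 50)

fit_cal_only   <- lm(Sodium ~ Calories, data = sim)
fit_cal_first  <- lm(Sodium ~ Calories + Type, data = sim)
fit_type_first <- lm(Sodium ~ Type + Calories, data = sim)

tab_cal_first  <- anova(fit_cal_first)    # Type entered last
tab_type_first <- anova(fit_type_first)   # Type entered first

# Sequential SS for Type (entered last) vs. extra SS from nested models
ss_type_last <- tab_cal_first["Type", "Sum Sq"]
extra_ss     <- deviance(fit_cal_only) - deviance(fit_cal_first)

# Explicit nested-model comparison gives the same F test for Type
cmp <- anova(fit_cal_only, fit_cal_first)

tab_cal_first
tab_type_first
cmp
```

In both orderings the residual row is identical; only the split of the explained sum of squares between Calories and Type changes.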