Interactions in Linear and Logistic Models

Additional Topic
Math 219

Iris

The iris dataset has 150 observations of 5 variables
We want to explore the relationship between petal length (Petal.Length) and petal width (Petal.Width)
We can run simple a linear regression on the data

lm1 <- lm(Petal.Length ~ Petal.Width, data = iris)
lm1 |>  tidy()

# A tibble: 2 × 5
  term        estimate std.error statistic  p.value
  <chr>          <dbl>     <dbl>     <dbl>    <dbl>
1 (Intercept)     1.08    0.0730      14.8 4.04e-31
2 Petal.Width     2.23    0.0514      43.4 4.68e-86

summary(lm1)$adj.r.squared

[1] 0.9266173

Below is the resulting regression line

While the results are statistically significant, we can improve the quality of the model by adding, for example, predictor Species

Recall, previously we fit a parallel-slopes (or additive) model to the relationship between petal width and petal length for 3 different species, Iris setosa, Iris versicolor, and Iris virginica

lm2 <- lm(Petal.Length ~ Petal.Width + Species, data = iris)
lm2 |>  tidy()

# A tibble: 4 × 5
  term              estimate std.error statistic  p.value
  <chr>                <dbl>     <dbl>     <dbl>    <dbl>
1 (Intercept)           1.21    0.0652     18.6  2.88e-40
2 Petal.Width           1.02    0.152       6.69 4.41e-10
3 Speciesversicolor     1.70    0.181       9.38 1.17e-16
4 Speciesvirginica      2.28    0.281       8.09 2.08e-13

summary(lm2)$adj.r.squared

[1] 0.9542099

The equation of the multivariate regression line is \[\widehat{Petal.Length}=1.21+1.02\times Petal.Width+1.70\times Specesversicolor+2.28\times Speciesvirginica\]

Scatter plot of petal length vs. petal width colored by species, along with parallel slopes model.

\[\widehat{Petal.Length}=\left\{\begin{array}{cl}1.21+1.02\times Petal.Width, & \textrm{if } Species = ``setosa''\\(1.21+1.70)+1.02\times Petal.Width, & \textrm{if } Species = ``versicolor''\\(1.21+2.28)+1.02\times Petal.Width, & \textrm{if } Species = ``virginica''\end{array}\right.\]

Interaction

We can include an interaction between species and petal width in the model
There is an interaction if different species have different relationships between the response petal length and predictor petal width
In other words, there is an interaction between Petal.width and Species
Unlike the additive model, a model with an interaction has a different slope for each species

There is a significant interaction between species and petal width

lm(Petal.Length ~ Petal.Width * Species, data = iris) |>
  tidy()

# A tibble: 6 × 5
  term                          estimate std.error statistic  p.value
  <chr>                            <dbl>     <dbl>     <dbl>    <dbl>
1 (Intercept)                      1.33      0.131    10.1   1.45e-18
2 Petal.Width                      0.546     0.490     1.12  2.67e- 1
3 Speciesversicolor                0.454     0.374     1.21  2.27e- 1
4 Speciesvirginica                 2.91      0.406     7.17  3.53e-11
5 Petal.Width:Speciesversicolor    1.32      0.555     2.38  1.85e- 2
6 Petal.Width:Speciesvirginica     0.101     0.525     0.192 8.48e- 1

Note that the model also includes the slopes for each individual predictor as well
In the models with interaction we shouldn’t remove the main effect of predictors from the model (hierarchy principle)

The equation of the multivariate regression with interactions is: \[\widehat{Petal.Length}=1.33+0.546\times Petal.Width+0.454\times Specesversicolor+2.91\times Speciesvirginica\\+1.32\times Petal.Width\times Speciesversicilor+0.101\times Petal.width\times Speciesvirginica\]

Scatter plot of petal length vs. petal width along with model with interaction.

\[\widehat{Petal.Length}=\left\{\begin{array}{cl}1.33+0.546\times Petal.Width, & \textrm{if } Species = ``setosa''\\(1.33+0.454)+(0.546+1.32)\times Petal.Width, & \textrm{if } Species = ``versicolor''\\(1.33+2.91)+(0.546+0.101)\times Petal.Width, & \textrm{if } Species = ``virginica''\end{array}\right.\]

Quality of models

Look at the p-value associated with the coefficient of the interaction term:
- In our case, the coefficient of the interaction term is statistically significant for at least on of the species.
- This means that there is strong evidence of an interaction between species and petal width.
Compare adjusted \(R^2\) of the model without interaction to that of the model with interaction:

summary(lm(Petal.Length ~ Petal.Width + Species, data = iris))$adj.r.squared

[1] 0.9542099

summary(lm(Petal.Length ~ Petal.Width * Species, data = iris))$adj.r.squared

[1] 0.9580704

Palmer Penguins

penguins dataset ¹
Measurements for three species of penguins from Palmer Archipelago
Previously, we considered an additive model predicting body mass using bill depth, flipper length, and sex

Three-way interaction

With three predictors we can evaluate whether or not there is evidence of a three-way interaction
A three-way interaction is more difficult to interpret than a two-way interaction

lm1 <- lm(body_mass_g ~ bill_depth_mm * sex * flipper_length_mm, data = penguins) 
lm1 |>  tidy()

# A tibble: 8 × 5
  term                                     estimate std.error statistic  p.value
  <chr>                                       <dbl>     <dbl>     <dbl>    <dbl>
1 (Intercept)                             -39789.     6398.       -6.22 1.54e- 9
2 bill_depth_mm                             2198.      394.        5.58 4.95e- 8
3 sexmale                                  16903.     9041.        1.87 6.24e- 2
4 flipper_length_mm                          222.       31.6       7.01 1.42e-11
5 bill_depth_mm:sexmale                    -1054.      534.       -1.97 4.94e- 2
6 bill_depth_mm:flipper_length_mm            -11.2       1.97     -5.71 2.56e- 8
7 sexmale:flipper_length_mm                  -79.2      44.0      -1.80 7.27e- 2
8 bill_depth_mm:sexmale:flipper_length_mm      5.14      2.63      1.95 5.16e- 2

summary(lm1)$adj.r.squared

[1] 0.843684

The adjusted R^2 is

summary(lm1)$adj.r.squared

[1] 0.843684

The three-way interaction is not significant, so we drop it from the model and consider the possible two-way interactions
Of these, the only two-way interaction that is significant is between flipper_length_mm and bill_depth_mm
Drop the others from the model

# A tibble: 7 × 5
  term                             estimate std.error statistic  p.value
  <chr>                               <dbl>     <dbl>     <dbl>    <dbl>
1 (Intercept)                     -30542.     4324.      -7.06  9.91e-12
2 bill_depth_mm                     1624.      263.       6.18  1.92e- 9
3 sexmale                           -561.     1365.      -0.411 6.81e- 1
4 flipper_length_mm                  176.       21.2      8.28  3.28e-15
5 bill_depth_mm:sexmale              -11.5      31.3     -0.369 7.12e- 1
6 sexmale:flipper_length_mm            6.30      4.52     1.39  1.64e- 1
7 bill_depth_mm:flipper_length_mm     -8.35      1.31    -6.38  6.22e-10

The adjusted R^2 is

summary(lm2)$adj.r.squared

[1] 0.8423331

The negative coefficient for the interaction between flipper length and bill depth indicates that the rate at which body mass increases with bill depth decreases as flipper length increases

# A tibble: 5 × 5
  term                             estimate std.error statistic  p.value
  <chr>                               <dbl>     <dbl>     <dbl>    <dbl>
1 (Intercept)                     -28121.     4201.       -6.69 9.44e-11
2 sexmale                            498.       49.0      10.2  2.75e-21
3 bill_depth_mm                     1434.      245.        5.85 1.16e- 8
4 flipper_length_mm                  164.       20.3       8.07 1.36e-14
5 bill_depth_mm:flipper_length_mm     -7.43      1.20     -6.22 1.51e- 9

[1] 0.839635

The adjusted R^2 is

summary(lm3)$adj.r.squared

[1] 0.839635

The model

\[\begin{array}{rcl}\widehat{body\_mass\_g} &=& -28121 + 498\times sexmale \\ & & + 1434 \times bill\_depth\_mm \\ & & + 164\times flipper\_length\_mm \\ & & - 7.34 \times bill\_depth\_mm\times flipper\_length\_mm \end{array}\]

can also be written as

\[\begin{array}{rcl}\widehat{body\_mass\_g} &=& -28121 + 498\times sexmale \\ & & + 164\times flipper\_length\_mm \\ & & +(1434 - 7.34 \times flipper\_length\_mm)\times bill\_depth\_mm\end{array}\]

Comments

Interaction terms represent joint effects
For example, for a two predictor model using predictors \(x_1\) and \(x_2\) and response variable \(y\), the equation of the model is \[\hat{y}=\beta_0+\beta_1 \cdot x_1+\beta_2 \cdot x_2+\beta_{12} \cdot x_1x_2+\epsilon\]
Typically, regression models that include interactions between quantitative predictors adhere to the hierarchy principle, which says that if your model includes an interaction term, \(x_1x_2\), and the coefficient at the term \(x_1x_2\) is shown to be a statistically significant predictor of \(y\), then your model should also include the “main effects” \(x_1\) and \(x_2\), whether or not the coefficients for these main effects are significant.

We have looked primarily at “first order” interactions, and only at interactions between two variables at a time. However, second order interactions, or interactions between three or more variables are also possible.
A general practical problem with all interactions is that they can be hard to detect in small or moderately sized data sets,

Interaction in Logistic Models

We will deal with the well-known data on Titanic (titanic_train)

glimpse(titanic_train)

Rows: 891
Columns: 12
$ PassengerId <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17,…
$ Survived    <int> 0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 0, 1, 0, 1…
$ Pclass      <int> 3, 1, 3, 1, 3, 3, 1, 3, 3, 2, 3, 1, 3, 3, 3, 2, 3, 2, 3, 3…
$ Name        <chr> "Braund, Mr. Owen Harris", "Cumings, Mrs. John Bradley (Fl…
$ Sex         <chr> "male", "female", "female", "female", "male", "male", "mal…
$ Age         <dbl> 22, 38, 26, 35, 35, NA, 54, 2, 27, 14, 4, 58, 20, 39, 14, …
$ SibSp       <int> 1, 1, 0, 1, 0, 0, 0, 3, 0, 1, 1, 0, 0, 1, 0, 0, 4, 0, 1, 0…
$ Parch       <int> 0, 0, 0, 0, 0, 0, 0, 1, 2, 0, 1, 0, 0, 5, 0, 0, 1, 0, 0, 0…
$ Ticket      <chr> "A/5 21171", "PC 17599", "STON/O2. 3101282", "113803", "37…
$ Fare        <dbl> 7.2500, 71.2833, 7.9250, 53.1000, 8.0500, 8.4583, 51.8625,…
$ Cabin       <chr> "", "C85", "", "C123", "", "", "E46", "", "", "", "G6", "C…
$ Embarked    <chr> "S", "C", "S", "S", "S", "Q", "S", "S", "S", "C", "S", "S"…

dat <- titanic_train

We will look at Sex and Fare and their interaction as predictors of the survival probability

Logistic Model

glm1 <- glm(Survived ~ Fare*Sex, data = dat,
            family = binomial)
tidy(glm1)

# A tibble: 4 × 5
  term         estimate std.error statistic  p.value
  <chr>           <dbl>     <dbl>     <dbl>    <dbl>
1 (Intercept)    0.408    0.190        2.15 3.16e- 2
2 Fare           0.0199   0.00537      3.70 2.15e- 4
3 Sexmale       -2.10     0.230       -9.12 7.79e-20
4 Fare:Sexmale  -0.0116   0.00593     -1.96 5.03e- 2

The equation of the logistic regression is:

\[\begin{array}{rcl}\log\left(\frac{\hat{p}}{1-\hat{p}}\right) &=& 0.408 + 0.0199\times Fare -2.10\times Sexmale\\ && -0.0116\times Fare\times Sexmale\end{array} \]

For example, a male who paid \(Fare=300\) has predicted probability of survival \[\hat{p} = \frac{e^{0.408+0.0199\cdot 300-2.10-0.0116\cdot 300}}{1+e^{0.408+0.0199\cdot 300-2.10-0.0116\cdot 300}}=0.69\]
A female who paid \(Fare=300\) has predicted probability of survival \[\hat{p} = \frac{e^{0.408+0.0199\cdot 300}}{1+e^{0.408+0.0199\cdot 300}}=0.99\]

Plotting Interaction

Distribution of the logits

dat %>% 
  mutate(pred_logit = predict(glm1)) -> dat
dat |> ggplot(aes(x = pred_logit, fill = Sex)) +
  geom_density(alpha = .5)

Let’s plot the interaction on logit scale
Diverging lines indicates the presence of an interaction effect between Sex and Fare

dat %>% 
  ggplot() +
  aes(x = Fare, y = pred_logit, color = Sex) +
  geom_point() +
  geom_line()

Plotting the predicted results

dat %>% 
  ggplot() +
  aes(x = Fare, y = plogis(pred_logit), color = Sex) +
  geom_point() +
  geom_line()

Discrimination in Hiring

Data set resume(from openintro package)
Does perceived race or sex of an applicant affect job application callback rates?
Randomly assigned a name to each resume
Name implied applicant’s race (Black or White) and sex (male or female)

Previously we fit a logistic model to predict the probability of receiving a call back using job city, years experience, honors, and race
All of these predictors are statistically significant, including race

lm1 <- glm(received_callback ~ job_city + years_experience + honors + race,
    family = binomial, data = resume)
lm1 |> tidy()

# A tibble: 5 × 5
  term             estimate std.error statistic  p.value
  <chr>               <dbl>     <dbl>     <dbl>    <dbl>
1 (Intercept)       -2.77     0.134      -20.6  1.45e-94
2 job_cityChicago   -0.350    0.109       -3.22 1.29e- 3
3 years_experience   0.0264   0.00958      2.76 5.85e- 3
4 honors             0.793    0.183        4.34 1.43e- 5
5 racewhite          0.440    0.108        4.08 4.55e- 5

Is there evidence of an interaction between race and any of the other predictors?
For example, does the relationship between the probability of receiving a call back and years of experience depend on the race of the applicant?
Let’s test all of the possible two-way interactions involving race

We do not find convincing evidence of an interaction between race an any of the other predictors
We would proceed using the earlier model without interactions

glm(received_callback ~ job_city*race + years_experience*race +
      honors*race, family = binomial, data = resume) |>
  tidy()

# A tibble: 8 × 5
  term                       estimate std.error statistic  p.value
  <chr>                         <dbl>     <dbl>     <dbl>    <dbl>
1 (Intercept)                -2.83       0.185    -15.3   1.27e-52
2 job_cityChicago            -0.319      0.170     -1.87  6.13e- 2
3 racewhite                   0.534      0.241      2.22  2.66e- 2
4 years_experience            0.0320     0.0148     2.17  3.00e- 2
5 honors                      0.703      0.290      2.42  1.54e- 2
6 job_cityChicago:racewhite  -0.0535     0.221     -0.242 8.09e- 1
7 racewhite:years_experience -0.00947    0.0194    -0.488 6.25e- 1
8 racewhite:honors            0.151      0.374      0.403 6.87e- 1

Interaction Between Quantitative Predictors

Data set swallows
We will at some data resulting from a study in which the researchers (Colby, et al, 1987) wanted to determine if nestling bank swallows alter the way they breathe in order to survive the poor air quality conditions of their underground burrows.
In reality, the researchers studied not only the breathing behavior of nestling bank swallows, but that of adult bank swallows as well.

The researchers conducted the following randomized experiment on 120 nestling bank swallows.
- In an underground burrow, they varied the percentage of oxygen at four different levels (13%, 15%, 17%, and 19%)
- Percentage of carbon dioxide at five different levels (0%, 3%, 4.5%, 6%, and 9%).
- Under each of the resulting \(5 \times 4 = 20\) experimental conditions, the researchers observed the total volume of air breathed per minute for each of 6 nestling bank swallows.
- They replicated the same randomized experiment on 120 adult bank swallows.

glimpse(swallows)

Rows: 240
Columns: 5
$ Bird <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19…
$ Type <chr> "nestling", "nestling", "nestling", "nestling", "nestling", "nest…
$ Vent <int> -49, 0, -98, 148, 49, 49, -24, 25, -123, 222, 123, 74, 11, 60, -8…
$ O2   <int> 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 1…
$ CO2  <dbl> 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 4.5, …

Response (\(y\)): percentage increase in “minute ventilation”, (Vent), i.e., total volume of air breathed per minute.
Quantitative Predictor (\(x1\)): percentage of oxygen (O2) in the air the swallows breathe
Quantitative Predictor (\(x2\)): percentage of carbon dioxide (CO2) in the air the swallows breathe
Qualitative Predictor (\(x3\)): (Type) bird is an adult or a nestling

EDA

library(plotly)
plot_ly(x=swallows$O2, y=swallows$CO2, z=swallows$Vent, type="scatter3d", mode="markers",split=swallows$Type)

Multiple regression model

We can start by formulating the following multiple regression model with two quantitative predictors and one qualitative predictor: \[\hat{y}_i=\beta_0+\beta_1 \cdot x_{i1}+\beta_2 \cdot x_{i2}+\beta_3 \cdot x_{i3}+\epsilon_i\] where:
- \(y_i\) is the percentage of minute ventilation for the \(i\)th swallow
- \(x_{i1}\) is the percentage of oxygen for the \(i\)th swallow
- \(x{i2}\) is the percentage of carbon dioxide for the \(i\)th swallow
- \(x{i3}\) is the type of bird (1, if nestling and 0, if adult) for for the \(i\)th swallow

and the independent error terms \(\epsilon_i\) follow a normal distribution with mean 0 and equal variance \(\sigma^2\).

lm1 <- lm(Vent~O2+CO2+Type,data=swallows)
lm1 |> tidy()

# A tibble: 4 × 5
  term         estimate std.error statistic  p.value
  <chr>           <dbl>     <dbl>     <dbl>    <dbl>
1 (Intercept)    147.       79.3      1.85  6.57e- 2
2 O2              -8.83      4.76    -1.85  6.50e- 2
3 CO2             32.3       3.55     9.08  4.35e-17
4 Typenestling    -9.92     21.3     -0.466 6.42e- 1

Model with interactions

There is a risk in omitting an important interaction term. Therefore, let’s instead formulate the following multiple regression model with three interaction terms: \[\hat{y}_i=\beta_0+\beta_1 \cdot x_{i1}+\beta_2 \cdot x_{i2}+\beta_3 \cdot x_{i3}+\beta_{12} \cdot x_{i1}x_{i2}+\beta_{13} \cdot x_{i1}x_{i3}+\beta_{23}\cdot x_{i2}x_{i3}+\epsilon_i\]

lm2 <- lm(Vent~O2+CO2+Type+O2*CO2+O2*Type+CO2*Type,data=swallows)
lm2 |> tidy()

# A tibble: 7 × 5
  term             estimate std.error statistic p.value
  <chr>               <dbl>     <dbl>     <dbl>   <dbl>
1 (Intercept)         93.3     160.       0.583  0.561 
2 O2                  -5.82      9.85    -0.591  0.555 
3 CO2                 56.6      26.0      2.18   0.0304
4 Typenestling      -112.      158.      -0.708  0.480 
5 O2:CO2              -1.45      1.59    -0.909  0.364 
6 O2:Typenestling      7.01      9.56     0.733  0.464 
7 CO2:Typenestling    -2.31      7.13    -0.324  0.746

It looks like the model’s quality hasn’t improved after the introduction of the interaction terms
All interaction terms are not statistically significant

Regression surface: Swallows

Regression surface: Adult Swallows

Regression surface: Nestlings Swallows

Both plots appear to be fairly planar which suggests there is little evidence for two-way interactions between oxygen level, and carbon dioxide level
We also see that there are no significant interactions between type of bird and other terms. If you tried to “draw” the “best fitting” function through each scatter plot, the two functions would probably look like two parallel planes.

Regression surface: Comparison

Strength

Data set manuf related to a manufacturing process
The independent variables (temperature, and pressure) affect the response variable (product strength)

glimpse(manuf)

Rows: 130
Columns: 3
$ Strength    <dbl> 96.08112, 93.71216, 91.62754, 85.04418, 91.93630, 90.20789…
$ Temperature <dbl> 100.59, 109.50, 103.98, 99.39, 107.25, 103.38, 103.80, 106…
$ Pressure    <dbl> 81.10, 72.38, 74.62, 65.04, 67.42, 68.28, 69.70, 71.78, 67…

EDA

plot_ly(x=manuf$Pressure, y=manuf$Temperature, z=manuf$Strength, type="scatter3d", mode="markers")

As before, we can run a multivariate linear regression
The equation of the model is \[\hat{y}_i=\beta_0+\beta_1 \cdot x_{i1}+\beta_2 \cdot x_{i2}+\epsilon_i\] where
- \(y_i\) is the \(i\)th Strength observation
- \(x_{i1}\) is the \(i\)th Pressure observation
- \(x{i2}\) is is the \(i\)th Temperature observation

lm1 <- lm(Strength~Temperature+Pressure, data=manuf)
lm1 |> tidy()

# A tibble: 3 × 5
  term        estimate std.error statistic  p.value
  <chr>          <dbl>     <dbl>     <dbl>    <dbl>
1 (Intercept)   23.4      3.79        6.16 9.03e- 9
2 Temperature    0.455    0.0264     17.2  4.72e-35
3 Pressure       0.292    0.0328      8.90 4.76e-15

summary(lm1)$adj.r.squared

[1] 0.729865

Regression Plane

The residual plot

We will include interaction term Pressure*Temperature
The equation of the new model is \[\hat{y}_i=\beta_0+\beta_1 \cdot x_{i1}+\beta_2 \cdot x_{i2}+\beta_{12} \cdot x_{i1}x_{i2}+\epsilon_i\] where:
- \(y_i\) is the \(i\)th Strength observation
- \(x_{i1}\) is the \(i\)th Pressure observation
- \(x{i2}\) is is the \(i\)th Temperature observation

lm1 <- lm(Strength~Pressure*Temperature, data=manuf)
lm1 |> tidy()

# A tibble: 4 × 5
  term                  estimate std.error statistic  p.value
  <chr>                    <dbl>     <dbl>     <dbl>    <dbl>
1 (Intercept)          -633.      27.2         -23.3 1.64e-47
2 Pressure                9.50     0.381        25.0 1.34e-50
3 Temperature             6.74     0.260        25.9 2.34e-52
4 Pressure:Temperature   -0.0882   0.00365     -24.2 3.27e-49

summary(lm1)$adj.r.squared

[1] 0.9518061

Regression surface

The residual plot