Interactions in Linear and Logistic Models

Additional Topic
Math 219

Iris

  • The iris dataset has 150 observations of 5 variables
  • We want to explore the relationship between petal length (Petal.Length) and petal width (Petal.Width)
  • We can run simple a linear regression on the data
lm1 <- lm(Petal.Length ~ Petal.Width, data = iris)
lm1 |>  tidy()
# A tibble: 2 × 5
  term        estimate std.error statistic  p.value
  <chr>          <dbl>     <dbl>     <dbl>    <dbl>
1 (Intercept)     1.08    0.0730      14.8 4.04e-31
2 Petal.Width     2.23    0.0514      43.4 4.68e-86
summary(lm1)$adj.r.squared
[1] 0.9266173
  • Below is the resulting regression line
  • While the results are statistically significant, we can improve the quality of the model by adding, for example, predictor Species
  • Recall, previously we fit a parallel-slopes (or additive) model to the relationship between petal width and petal length for 3 different species, Iris setosa, Iris versicolor, and Iris virginica
lm2 <- lm(Petal.Length ~ Petal.Width + Species, data = iris)
lm2 |>  tidy()
# A tibble: 4 × 5
  term              estimate std.error statistic  p.value
  <chr>                <dbl>     <dbl>     <dbl>    <dbl>
1 (Intercept)           1.21    0.0652     18.6  2.88e-40
2 Petal.Width           1.02    0.152       6.69 4.41e-10
3 Speciesversicolor     1.70    0.181       9.38 1.17e-16
4 Speciesvirginica      2.28    0.281       8.09 2.08e-13
summary(lm2)$adj.r.squared
[1] 0.9542099

The equation of the multivariate regression line is \[\widehat{Petal.Length}=1.21+1.02\times Petal.Width+1.70\times Specesversicolor+2.28\times Speciesvirginica\]

Scatter plot of petal length vs. petal width colored by species, along with parallel slopes model.

\[\widehat{Petal.Length}=\left\{\begin{array}{cl}1.21+1.02\times Petal.Width, & \textrm{if } Species = ``setosa''\\(1.21+1.70)+1.02\times Petal.Width, & \textrm{if } Species = ``versicolor''\\(1.21+2.28)+1.02\times Petal.Width, & \textrm{if } Species = ``virginica''\end{array}\right.\]

Interaction

  • We can include an interaction between species and petal width in the model
  • There is an interaction if different species have different relationships between the response petal length and predictor petal width
  • In other words, there is an interaction between Petal.width and Species
  • Unlike the additive model, a model with an interaction has a different slope for each species
  • There is a significant interaction between species and petal width
lm(Petal.Length ~ Petal.Width * Species, data = iris) |>
  tidy()
# A tibble: 6 × 5
  term                          estimate std.error statistic  p.value
  <chr>                            <dbl>     <dbl>     <dbl>    <dbl>
1 (Intercept)                      1.33      0.131    10.1   1.45e-18
2 Petal.Width                      0.546     0.490     1.12  2.67e- 1
3 Speciesversicolor                0.454     0.374     1.21  2.27e- 1
4 Speciesvirginica                 2.91      0.406     7.17  3.53e-11
5 Petal.Width:Speciesversicolor    1.32      0.555     2.38  1.85e- 2
6 Petal.Width:Speciesvirginica     0.101     0.525     0.192 8.48e- 1
  • Note that the model also includes the slopes for each individual predictor as well
  • In the models with interaction we shouldn’t remove the main effect of predictors from the model (hierarchy principle)

The equation of the multivariate regression with interactions is: \[\widehat{Petal.Length}=1.33+0.546\times Petal.Width+0.454\times Specesversicolor+2.91\times Speciesvirginica\\+1.32\times Petal.Width\times Speciesversicilor+0.101\times Petal.width\times Speciesvirginica\]

Scatter plot of petal length vs. petal width along with model with interaction.

\[\widehat{Petal.Length}=\left\{\begin{array}{cl}1.33+0.546\times Petal.Width, & \textrm{if } Species = ``setosa''\\(1.33+0.454)+(0.546+1.32)\times Petal.Width, & \textrm{if } Species = ``versicolor''\\(1.33+2.91)+(0.546+0.101)\times Petal.Width, & \textrm{if } Species = ``virginica''\end{array}\right.\]

Quality of models

  • Look at the p-value associated with the coefficient of the interaction term:
    • In our case, the coefficient of the interaction term is statistically significant for at least on of the species.
    • This means that there is strong evidence of an interaction between species and petal width.
  • Compare adjusted \(R^2\) of the model without interaction to that of the model with interaction:
summary(lm(Petal.Length ~ Petal.Width + Species, data = iris))$adj.r.squared
[1] 0.9542099
summary(lm(Petal.Length ~ Petal.Width * Species, data = iris))$adj.r.squared
[1] 0.9580704

Palmer Penguins

  • penguins dataset 1
  • Measurements for three species of penguins from Palmer Archipelago
  • Previously, we considered an additive model predicting body mass using bill depth, flipper length, and sex

Three-way interaction

  • With three predictors we can evaluate whether or not there is evidence of a three-way interaction
  • A three-way interaction is more difficult to interpret than a two-way interaction
lm1 <- lm(body_mass_g ~ bill_depth_mm * sex * flipper_length_mm, data = penguins) 
lm1 |>  tidy()
# A tibble: 8 × 5
  term                                     estimate std.error statistic  p.value
  <chr>                                       <dbl>     <dbl>     <dbl>    <dbl>
1 (Intercept)                             -39789.     6398.       -6.22 1.54e- 9
2 bill_depth_mm                             2198.      394.        5.58 4.95e- 8
3 sexmale                                  16903.     9041.        1.87 6.24e- 2
4 flipper_length_mm                          222.       31.6       7.01 1.42e-11
5 bill_depth_mm:sexmale                    -1054.      534.       -1.97 4.94e- 2
6 bill_depth_mm:flipper_length_mm            -11.2       1.97     -5.71 2.56e- 8
7 sexmale:flipper_length_mm                  -79.2      44.0      -1.80 7.27e- 2
8 bill_depth_mm:sexmale:flipper_length_mm      5.14      2.63      1.95 5.16e- 2
summary(lm1)$adj.r.squared
[1] 0.843684
  • The adjusted R^2 is
summary(lm1)$adj.r.squared
[1] 0.843684
  • The three-way interaction is not significant, so we drop it from the model and consider the possible two-way interactions
  • Of these, the only two-way interaction that is significant is between flipper_length_mm and bill_depth_mm
  • Drop the others from the model
# A tibble: 7 × 5
  term                             estimate std.error statistic  p.value
  <chr>                               <dbl>     <dbl>     <dbl>    <dbl>
1 (Intercept)                     -30542.     4324.      -7.06  9.91e-12
2 bill_depth_mm                     1624.      263.       6.18  1.92e- 9
3 sexmale                           -561.     1365.      -0.411 6.81e- 1
4 flipper_length_mm                  176.       21.2      8.28  3.28e-15
5 bill_depth_mm:sexmale              -11.5      31.3     -0.369 7.12e- 1
6 sexmale:flipper_length_mm            6.30      4.52     1.39  1.64e- 1
7 bill_depth_mm:flipper_length_mm     -8.35      1.31    -6.38  6.22e-10
  • The adjusted R^2 is
summary(lm2)$adj.r.squared
[1] 0.8423331
  • The negative coefficient for the interaction between flipper length and bill depth indicates that the rate at which body mass increases with bill depth decreases as flipper length increases
# A tibble: 5 × 5
  term                             estimate std.error statistic  p.value
  <chr>                               <dbl>     <dbl>     <dbl>    <dbl>
1 (Intercept)                     -28121.     4201.       -6.69 9.44e-11
2 sexmale                            498.       49.0      10.2  2.75e-21
3 bill_depth_mm                     1434.      245.        5.85 1.16e- 8
4 flipper_length_mm                  164.       20.3       8.07 1.36e-14
5 bill_depth_mm:flipper_length_mm     -7.43      1.20     -6.22 1.51e- 9
[1] 0.839635
  • The adjusted R^2 is
summary(lm3)$adj.r.squared
[1] 0.839635

The model

\[\begin{array}{rcl}\widehat{body\_mass\_g} &=& -28121 + 498\times sexmale \\ & & + 1434 \times bill\_depth\_mm \\ & & + 164\times flipper\_length\_mm \\ & & - 7.34 \times bill\_depth\_mm\times flipper\_length\_mm \end{array}\]

can also be written as

\[\begin{array}{rcl}\widehat{body\_mass\_g} &=& -28121 + 498\times sexmale \\ & & + 164\times flipper\_length\_mm \\ & & +(1434 - 7.34 \times flipper\_length\_mm)\times bill\_depth\_mm\end{array}\]

Comments

  • Interaction terms represent joint effects
  • For example, for a two predictor model using predictors \(x_1\) and \(x_2\) and response variable \(y\), the equation of the model is \[\hat{y}=\beta_0+\beta_1 \cdot x_1+\beta_2 \cdot x_2+\beta_{12} \cdot x_1x_2+\epsilon\]
  • Typically, regression models that include interactions between quantitative predictors adhere to the hierarchy principle, which says that if your model includes an interaction term, \(x_1x_2\), and the coefficient at the term \(x_1x_2\) is shown to be a statistically significant predictor of \(y\), then your model should also include the “main effects” \(x_1\) and \(x_2\), whether or not the coefficients for these main effects are significant.
  • We have looked primarily at “first order” interactions, and only at interactions between two variables at a time. However, second order interactions, or interactions between three or more variables are also possible.

  • A general practical problem with all interactions is that they can be hard to detect in small or moderately sized data sets,

Interaction in Logistic Models

  • We will deal with the well-known data on Titanic (titanic_train)
glimpse(titanic_train)
Rows: 891
Columns: 12
$ PassengerId <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17,…
$ Survived    <int> 0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 0, 1, 0, 1…
$ Pclass      <int> 3, 1, 3, 1, 3, 3, 1, 3, 3, 2, 3, 1, 3, 3, 3, 2, 3, 2, 3, 3…
$ Name        <chr> "Braund, Mr. Owen Harris", "Cumings, Mrs. John Bradley (Fl…
$ Sex         <chr> "male", "female", "female", "female", "male", "male", "mal…
$ Age         <dbl> 22, 38, 26, 35, 35, NA, 54, 2, 27, 14, 4, 58, 20, 39, 14, …
$ SibSp       <int> 1, 1, 0, 1, 0, 0, 0, 3, 0, 1, 1, 0, 0, 1, 0, 0, 4, 0, 1, 0…
$ Parch       <int> 0, 0, 0, 0, 0, 0, 0, 1, 2, 0, 1, 0, 0, 5, 0, 0, 1, 0, 0, 0…
$ Ticket      <chr> "A/5 21171", "PC 17599", "STON/O2. 3101282", "113803", "37…
$ Fare        <dbl> 7.2500, 71.2833, 7.9250, 53.1000, 8.0500, 8.4583, 51.8625,…
$ Cabin       <chr> "", "C85", "", "C123", "", "", "E46", "", "", "", "G6", "C…
$ Embarked    <chr> "S", "C", "S", "S", "S", "Q", "S", "S", "S", "C", "S", "S"…
dat <- titanic_train
  • We will look at Sex and Fare and their interaction as predictors of the survival probability

Logistic Model

glm1 <- glm(Survived ~ Fare*Sex, data = dat,
            family = binomial)
tidy(glm1)
# A tibble: 4 × 5
  term         estimate std.error statistic  p.value
  <chr>           <dbl>     <dbl>     <dbl>    <dbl>
1 (Intercept)    0.408    0.190        2.15 3.16e- 2
2 Fare           0.0199   0.00537      3.70 2.15e- 4
3 Sexmale       -2.10     0.230       -9.12 7.79e-20
4 Fare:Sexmale  -0.0116   0.00593     -1.96 5.03e- 2
  • The equation of the logistic regression is:

\[\begin{array}{rcl}\log\left(\frac{\hat{p}}{1-\hat{p}}\right) &=& 0.408 + 0.0199\times Fare -2.10\times Sexmale\\ && -0.0116\times Fare\times Sexmale\end{array} \]

  • For example, a male who paid \(Fare=300\) has predicted probability of survival \[\hat{p} = \frac{e^{0.408+0.0199\cdot 300-2.10-0.0116\cdot 300}}{1+e^{0.408+0.0199\cdot 300-2.10-0.0116\cdot 300}}=0.69\]
  • A female who paid \(Fare=300\) has predicted probability of survival \[\hat{p} = \frac{e^{0.408+0.0199\cdot 300}}{1+e^{0.408+0.0199\cdot 300}}=0.99\]

Plotting Interaction

  • Distribution of the logits
dat %>% 
  mutate(pred_logit = predict(glm1)) -> dat
dat |> ggplot(aes(x = pred_logit, fill = Sex)) +
  geom_density(alpha = .5)
  • Let’s plot the interaction on logit scale
  • Diverging lines indicates the presence of an interaction effect between Sex and Fare
dat %>% 
  ggplot() +
  aes(x = Fare, y = pred_logit, color = Sex) +
  geom_point() +
  geom_line()

Plotting the predicted results

dat %>% 
  ggplot() +
  aes(x = Fare, y = plogis(pred_logit), color = Sex) +
  geom_point() +
  geom_line()

Discrimination in Hiring

  • Data set resume(from openintro package)
  • Does perceived race or sex of an applicant affect job application callback rates?
  • Randomly assigned a name to each resume
  • Name implied applicant’s race (Black or White) and sex (male or female)
  • Previously we fit a logistic model to predict the probability of receiving a call back using job city, years experience, honors, and race
  • All of these predictors are statistically significant, including race
lm1 <- glm(received_callback ~ job_city + years_experience + honors + race,
    family = binomial, data = resume)
lm1 |> tidy()
# A tibble: 5 × 5
  term             estimate std.error statistic  p.value
  <chr>               <dbl>     <dbl>     <dbl>    <dbl>
1 (Intercept)       -2.77     0.134      -20.6  1.45e-94
2 job_cityChicago   -0.350    0.109       -3.22 1.29e- 3
3 years_experience   0.0264   0.00958      2.76 5.85e- 3
4 honors             0.793    0.183        4.34 1.43e- 5
5 racewhite          0.440    0.108        4.08 4.55e- 5
  • Is there evidence of an interaction between race and any of the other predictors?
  • For example, does the relationship between the probability of receiving a call back and years of experience depend on the race of the applicant?
  • Let’s test all of the possible two-way interactions involving race
  • We do not find convincing evidence of an interaction between race an any of the other predictors
  • We would proceed using the earlier model without interactions
glm(received_callback ~ job_city*race + years_experience*race +
      honors*race, family = binomial, data = resume) |>
  tidy()
# A tibble: 8 × 5
  term                       estimate std.error statistic  p.value
  <chr>                         <dbl>     <dbl>     <dbl>    <dbl>
1 (Intercept)                -2.83       0.185    -15.3   1.27e-52
2 job_cityChicago            -0.319      0.170     -1.87  6.13e- 2
3 racewhite                   0.534      0.241      2.22  2.66e- 2
4 years_experience            0.0320     0.0148     2.17  3.00e- 2
5 honors                      0.703      0.290      2.42  1.54e- 2
6 job_cityChicago:racewhite  -0.0535     0.221     -0.242 8.09e- 1
7 racewhite:years_experience -0.00947    0.0194    -0.488 6.25e- 1
8 racewhite:honors            0.151      0.374      0.403 6.87e- 1

Interaction Between Quantitative Predictors

  • Data set swallows
  • We will at some data resulting from a study in which the researchers (Colby, et al, 1987) wanted to determine if nestling bank swallows alter the way they breathe in order to survive the poor air quality conditions of their underground burrows.
  • In reality, the researchers studied not only the breathing behavior of nestling bank swallows, but that of adult bank swallows as well.
  • The researchers conducted the following randomized experiment on 120 nestling bank swallows.
    • In an underground burrow, they varied the percentage of oxygen at four different levels (13%, 15%, 17%, and 19%)
    • Percentage of carbon dioxide at five different levels (0%, 3%, 4.5%, 6%, and 9%).
    • Under each of the resulting \(5 \times 4 = 20\) experimental conditions, the researchers observed the total volume of air breathed per minute for each of 6 nestling bank swallows.
    • They replicated the same randomized experiment on 120 adult bank swallows.
glimpse(swallows)
Rows: 240
Columns: 5
$ Bird <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19…
$ Type <chr> "nestling", "nestling", "nestling", "nestling", "nestling", "nest…
$ Vent <int> -49, 0, -98, 148, 49, 49, -24, 25, -123, 222, 123, 74, 11, 60, -8…
$ O2   <int> 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 1…
$ CO2  <dbl> 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 4.5, …
  • Response (\(y\)): percentage increase in “minute ventilation”, (Vent), i.e., total volume of air breathed per minute.
  • Quantitative Predictor (\(x1\)): percentage of oxygen (O2) in the air the swallows breathe
  • Quantitative Predictor (\(x2\)): percentage of carbon dioxide (CO2) in the air the swallows breathe
  • Qualitative Predictor (\(x3\)): (Type) bird is an adult or a nestling

EDA

library(plotly)
plot_ly(x=swallows$O2, y=swallows$CO2, z=swallows$Vent, type="scatter3d", mode="markers",split=swallows$Type)

Multiple regression model

  • We can start by formulating the following multiple regression model with two quantitative predictors and one qualitative predictor: \[\hat{y}_i=\beta_0+\beta_1 \cdot x_{i1}+\beta_2 \cdot x_{i2}+\beta_3 \cdot x_{i3}+\epsilon_i\] where:

    • \(y_i\) is the percentage of minute ventilation for the \(i\)th swallow
    • \(x_{i1}\) is the percentage of oxygen for the \(i\)th swallow
    • \(x{i2}\) is the percentage of carbon dioxide for the \(i\)th swallow
    • \(x{i3}\) is the type of bird (1, if nestling and 0, if adult) for for the \(i\)th swallow

and the independent error terms \(\epsilon_i\) follow a normal distribution with mean 0 and equal variance \(\sigma^2\).

lm1 <- lm(Vent~O2+CO2+Type,data=swallows)
lm1 |> tidy()
# A tibble: 4 × 5
  term         estimate std.error statistic  p.value
  <chr>           <dbl>     <dbl>     <dbl>    <dbl>
1 (Intercept)    147.       79.3      1.85  6.57e- 2
2 O2              -8.83      4.76    -1.85  6.50e- 2
3 CO2             32.3       3.55     9.08  4.35e-17
4 Typenestling    -9.92     21.3     -0.466 6.42e- 1

Model with interactions

  • There is a risk in omitting an important interaction term. Therefore, let’s instead formulate the following multiple regression model with three interaction terms: \[\hat{y}_i=\beta_0+\beta_1 \cdot x_{i1}+\beta_2 \cdot x_{i2}+\beta_3 \cdot x_{i3}+\beta_{12} \cdot x_{i1}x_{i2}+\beta_{13} \cdot x_{i1}x_{i3}+\beta_{23}\cdot x_{i2}x_{i3}+\epsilon_i\]
lm2 <- lm(Vent~O2+CO2+Type+O2*CO2+O2*Type+CO2*Type,data=swallows)
lm2 |> tidy()
# A tibble: 7 × 5
  term             estimate std.error statistic p.value
  <chr>               <dbl>     <dbl>     <dbl>   <dbl>
1 (Intercept)         93.3     160.       0.583  0.561 
2 O2                  -5.82      9.85    -0.591  0.555 
3 CO2                 56.6      26.0      2.18   0.0304
4 Typenestling      -112.      158.      -0.708  0.480 
5 O2:CO2              -1.45      1.59    -0.909  0.364 
6 O2:Typenestling      7.01      9.56     0.733  0.464 
7 CO2:Typenestling    -2.31      7.13    -0.324  0.746 
  • It looks like the model’s quality hasn’t improved after the introduction of the interaction terms
  • All interaction terms are not statistically significant
  • Regression surface: Swallows
  • Regression surface: Adult Swallows
  • Regression surface: Nestlings Swallows
  • Both plots appear to be fairly planar which suggests there is little evidence for two-way interactions between oxygen level, and carbon dioxide level

  • We also see that there are no significant interactions between type of bird and other terms. If you tried to “draw” the “best fitting” function through each scatter plot, the two functions would probably look like two parallel planes.

  • Regression surface: Comparison

Strength

  • Data set manuf related to a manufacturing process

  • The independent variables (temperature, and pressure) affect the response variable (product strength)

glimpse(manuf)
Rows: 130
Columns: 3
$ Strength    <dbl> 96.08112, 93.71216, 91.62754, 85.04418, 91.93630, 90.20789…
$ Temperature <dbl> 100.59, 109.50, 103.98, 99.39, 107.25, 103.38, 103.80, 106…
$ Pressure    <dbl> 81.10, 72.38, 74.62, 65.04, 67.42, 68.28, 69.70, 71.78, 67…

EDA

plot_ly(x=manuf$Pressure, y=manuf$Temperature, z=manuf$Strength, type="scatter3d", mode="markers")
  • As before, we can run a multivariate linear regression
  • The equation of the model is \[\hat{y}_i=\beta_0+\beta_1 \cdot x_{i1}+\beta_2 \cdot x_{i2}+\epsilon_i\] where
    • \(y_i\) is the \(i\)th Strength observation
    • \(x_{i1}\) is the \(i\)th Pressure observation
    • \(x{i2}\) is is the \(i\)th Temperature observation
lm1 <- lm(Strength~Temperature+Pressure, data=manuf)
lm1 |> tidy()
# A tibble: 3 × 5
  term        estimate std.error statistic  p.value
  <chr>          <dbl>     <dbl>     <dbl>    <dbl>
1 (Intercept)   23.4      3.79        6.16 9.03e- 9
2 Temperature    0.455    0.0264     17.2  4.72e-35
3 Pressure       0.292    0.0328      8.90 4.76e-15
summary(lm1)$adj.r.squared
[1] 0.729865
  • Regression Plane
  • The residual plot
  • We will include interaction term Pressure*Temperature

  • The equation of the new model is \[\hat{y}_i=\beta_0+\beta_1 \cdot x_{i1}+\beta_2 \cdot x_{i2}+\beta_{12} \cdot x_{i1}x_{i2}+\epsilon_i\] where:

    • \(y_i\) is the \(i\)th Strength observation
    • \(x_{i1}\) is the \(i\)th Pressure observation
    • \(x{i2}\) is is the \(i\)th Temperature observation
lm1 <- lm(Strength~Pressure*Temperature, data=manuf)
lm1 |> tidy()
# A tibble: 4 × 5
  term                  estimate std.error statistic  p.value
  <chr>                    <dbl>     <dbl>     <dbl>    <dbl>
1 (Intercept)          -633.      27.2         -23.3 1.64e-47
2 Pressure                9.50     0.381        25.0 1.34e-50
3 Temperature             6.74     0.260        25.9 2.34e-52
4 Pressure:Temperature   -0.0882   0.00365     -24.2 3.27e-49
summary(lm1)$adj.r.squared
[1] 0.9518061

Regression surface

  • The residual plot