
IMS1 Ch. 8
Math 215
Box art from Mario Kart.
mariokart data from 143 eBay sales, 12 variables, including:

| Variable | Description |
|---|---|
| total_pr | Total price (auction price + shipping) |
| start_pr | Starting price of auction |
| duration | Auction length (days) |
| cond | Condition (new or used) |
| wheels | Number of steering wheels included |
| n_bids | Number of bids |

# A tibble: 3 × 2
total_pr title
<dbl> <fct>
1 327. "Nintedo Wii Console Bundle Guitar Hero 5 Mario Kart "
2 118. "10 Nintendo Wii Games - MarioKart Wii, SpiderMan 3, etc"
3 75 "NEW MARIO KART WITH WII WHEEL+2 GT PRO WHITE WII WHEEL"

Model with one predictor (total_pr ~ wheels):
\[\widehat{total\_pr} = 37.5 + 8.64\times wheels\]
Coefficient of determination: \(R^2=0.642\)
condused is the recoded variable (condused = 0 if cond = new, condused = 1 if cond = used)
\[\widehat{total\_pr} = 42.4+7.23\times wheels-5.58\times condused\]
The model \[\widehat{total\_pr} = 42.4+7.23\times wheels-5.58\times condused\] can be rewritten as
\[\widehat{total\_pr} = \left\{\begin{array}{cc}42.4+7.23\times wheels, & \textrm{if } cond = ``new''\\36.8+7.23\times wheels, & \textrm{if } cond = ``used''\end{array}\right.\]
Since this model is composed of two lines with the same slope, it is sometimes called a parallel slopes model.
Scatter plot of total price vs. number of steering wheels colored by condition, along with parallel slopes model.
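The recoding of cond into condused and the resulting pair of parallel lines can be checked with a few lines of arithmetic. The following is a minimal Python sketch, not the code used to fit the model; the function name `predict_total_pr` is hypothetical, and the coefficients are copied from the fitted equation above.

```python
# Coefficients copied from the fitted model above.
INTERCEPT = 42.4
B_WHEELS = 7.23
B_CONDUSED = -5.58


def predict_total_pr(wheels, cond):
    """Predicted total price for a number of wheels and a condition."""
    condused = 1 if cond == "used" else 0  # indicator: 0 = new, 1 = used
    return INTERCEPT + B_WHEELS * wheels + B_CONDUSED * condused


# The new-vs-used gap is the same at every number of wheels,
# which is what makes the two lines parallel.
for wheels in range(3):
    new = predict_total_pr(wheels, "new")
    used = predict_total_pr(wheels, "used")
    print(f"wheels={wheels}: new={new:.2f}, used={used:.2f}, gap={new - used:.2f}")
```

At wheels = 0 the two predictions are the two intercepts, 42.4 and 36.8 (after rounding), and the gap between the lines stays at 5.58 for every value of wheels.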
Model summary for total_pr ~ wheels:
# A tibble: 1 × 12
r.squared adj.r.squared sigma statistic p.value df logLik AIC BIC
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 0.642 0.639 5.48 249. 9.05e-33 1 -439. 884. 892.
# ℹ 3 more variables: deviance <dbl>, df.residual <int>, nobs <int>
Model summary for the two-predictor model (wheels and cond):
# A tibble: 1 × 12
r.squared adj.r.squared sigma statistic p.value df logLik AIC BIC
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 0.717 0.712 4.89 174. 1.68e-38 2 -422. 853. 864.
# ℹ 3 more variables: deviance <dbl>, df.residual <int>, nobs <int>
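The adj.r.squared column in the two summaries above penalizes \(R^2\) for the number of predictors. The standard formula is \(R^2_{adj} = 1 - (1-R^2)\frac{n-1}{n-k-1}\), where \(n\) is the number of observations and \(k\) the number of predictors. Here is a minimal Python sketch of that formula. The function name `adjusted_r2` is hypothetical, and n = 141 is an assumption: the nobs column is hidden in the output above, so the number of rows actually used in the fit is not shown.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2 for a model with k predictors fit to n observations."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# One-predictor model: R^2 = 0.642 from the summary above.
# n = 141 is an assumed sample size (nobs is not visible in the output).
print(round(adjusted_r2(0.642, n=141, k=1), 3))
```

Because the inputs are rounded to three digits, a value computed this way can differ from the printed adj.r.squared in the last digit.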
# A tibble: 6 × 5
term estimate std.error statistic p.value
<chr> <dbl> <dbl> <dbl> <dbl>
1 (Intercept) 39.4 1.80 21.9 3.16e-46
2 wheels 6.72 0.508 13.2 3.82e-26
3 condused -4.77 0.935 -5.10 1.10e- 6
4 start_pr 0.180 0.0335 5.35 3.62e- 7
5 duration -0.275 0.171 -1.61 1.11e- 1
6 n_bids 0.191 0.0866 2.20 2.93e- 2
The fitted model is
\[\begin{array}{rcl}\widehat{total\_pr} & = & 39.4\\ & + & 6.72\times wheels\\ & - & 4.77\times condused\\ & + & 0.180 \times start\_pr\\ & - & 0.275\times duration\\ & + & 0.191\times n\_bids\end{array}\]
In general, a multiple regression model with \(k\) predictors has the form \[\hat{y}=b_0+b_1x_1+b_2x_2+\cdots+b_kx_k\]
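The general form translates directly into code: the prediction is the intercept plus the sum of each coefficient times its predictor value. The sketch below is a minimal Python illustration using the five-predictor estimates from the table above. The function `predict` and the example auction are hypothetical and chosen only to show the arithmetic.

```python
# Estimates copied from the five-predictor coefficient table above.
B0 = 39.4
COEFS = {
    "wheels": 6.72,
    "condused": -4.77,
    "start_pr": 0.180,
    "duration": -0.275,
    "n_bids": 0.191,
}


def predict(b0, coefs, x):
    """Compute y-hat = b0 + b1*x1 + ... + bk*xk for one observation."""
    return b0 + sum(coefs[name] * x[name] for name in coefs)


# Hypothetical auction: used game, 1 wheel, $10 start, 7 days, 12 bids.
auction = {"wheels": 1, "condused": 1, "start_pr": 10.0, "duration": 7, "n_bids": 12}
print(round(predict(B0, COEFS, auction), 2))
```

Storing the coefficients in a dictionary keyed by predictor name keeps each \(b_i\) paired with the right \(x_i\), whatever order the predictors are listed in.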
I performed backward selection starting with the 5-predictor Mario Kart total price model. The results are summarized below.
| Step | Predictors | Adjusted \(R^2\) |
|---|---|---|
| 0 | wheels, cond, start_pr, duration, n_bids | 0.761 |
| 1 | wheels, cond, start_pr, duration | 0.755 |
| 1 | wheels, cond, start_pr, n_bids | 0.766* |
| 1 | wheels, cond, duration, n_bids | 0.713 |
| 1 | wheels, start_pr, duration, n_bids | 0.718 |
| 1 | cond, start_pr, duration, n_bids | 0.456 |
| 2 | wheels, cond, start_pr | 0.752 |
| 2 | wheels, cond, n_bids | 0.714 |
| 2 | wheels, start_pr, n_bids | 0.691 |
| 2 | cond, start_pr, n_bids | 0.433 |
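The selected model has 4 predictors (duration was dropped from the model). Dropping duration raised adjusted \(R^2\) from 0.761 to 0.766, and no further removal at step 2 improved on 0.766.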
# A tibble: 5 × 5
term estimate std.error statistic p.value
<chr> <dbl> <dbl> <dbl> <dbl>
1 (Intercept) 38.7 1.76 22.0 1.10e-46
2 wheels 6.86 0.503 13.6 3.19e-27
3 condused -5.38 0.859 -6.26 4.68e- 9
4 start_pr 0.170 0.0332 5.12 1.04e- 6
5 n_bids 0.187 0.0871 2.14 3.38e- 2
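The backward selection procedure summarized above can be sketched as a short loop: at each step, try dropping each remaining predictor, keep the drop that gives the highest adjusted \(R^2\), and stop when no drop improves on the current model. The Python sketch below is an illustration only. It does not refit any models; the scores are looked up from the adjusted \(R^2\) values in the table, and the function name `backward_select` is hypothetical.

```python
# Adjusted R^2 values copied from the backward selection table above.
ADJ_R2 = {
    frozenset({"wheels", "cond", "start_pr", "duration", "n_bids"}): 0.761,
    frozenset({"wheels", "cond", "start_pr", "duration"}): 0.755,
    frozenset({"wheels", "cond", "start_pr", "n_bids"}): 0.766,
    frozenset({"wheels", "cond", "duration", "n_bids"}): 0.713,
    frozenset({"wheels", "start_pr", "duration", "n_bids"}): 0.718,
    frozenset({"cond", "start_pr", "duration", "n_bids"}): 0.456,
    frozenset({"wheels", "cond", "start_pr"}): 0.752,
    frozenset({"wheels", "cond", "n_bids"}): 0.714,
    frozenset({"wheels", "start_pr", "n_bids"}): 0.691,
    frozenset({"cond", "start_pr", "n_bids"}): 0.433,
}


def backward_select(predictors, score):
    """Drop one predictor at a time while the score keeps improving."""
    current = frozenset(predictors)
    best = score(current)
    while len(current) > 1:
        # Every model obtained by removing exactly one predictor.
        candidates = [current - {p} for p in current]
        top = max(candidates, key=score)
        if score(top) <= best:
            break  # no removal improves the score, so stop
        current, best = top, score(top)
    return current, best


selected, adj_r2 = backward_select(
    ["wheels", "cond", "start_pr", "duration", "n_bids"], ADJ_R2.__getitem__
)
print(sorted(selected), adj_r2)
```

With a real data set, the `score` argument would be a function that fits the model on the given predictors and returns its adjusted \(R^2\); the table lookup stands in for that step here.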
The iris dataset has 150 observations of 5 variables.
Species is a categorical variable with 3 levels.
Use Petal.Width and Species to predict Petal.Length.
Including Species as a predictor introduces coefficients for Speciesversicolor and Speciesvirginica to the model.
setosa is the reference level of the Species variable (first alphabetically).
# A tibble: 4 × 5
term estimate std.error statistic p.value
<chr> <dbl> <dbl> <dbl> <dbl>
1 (Intercept) 1.21 0.0652 18.6 2.88e-40
2 Petal.Width 1.02 0.152 6.69 4.41e-10
3 Speciesversicolor 1.70 0.181 9.38 1.17e-16
4 Speciesvirginica 2.28 0.281 8.09 2.08e-13
The model can be written in two ways:
\[\widehat{Petal.Length}=1.21 + 1.02\times Petal.Width + 1.70\times Speciesversicolor + 2.28\times Speciesvirginica\]
or
\[\widehat{Petal.Length}=\left\{\begin{array}{cl}1.21+1.02\times Petal.Width, & \textrm{if } Species = ``setosa''\\2.91+1.02\times Petal.Width, & \textrm{if } Species = ``versicolor''\\3.49+1.02\times Petal.Width, & \textrm{if } Species = ``virginica''\end{array}\right.\]
This is another example of a parallel slopes model.
Scatter plot of petal length vs. petal width colored by species, along with parallel slopes model.
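The two ways of writing the iris model are linked by indicator (dummy) coding: a three-level factor becomes two 0/1 variables, one for each non-reference level. The Python sketch below illustrates that coding with the estimates from the coefficient table. The function names `dummy_code` and `predict_petal_length` are hypothetical.

```python
LEVELS = ["setosa", "versicolor", "virginica"]  # setosa is the reference level
B0 = 1.21
B_WIDTH = 1.02
OFFSETS = {"versicolor": 1.70, "virginica": 2.28}


def dummy_code(species):
    """Indicator variables for every non-reference level of Species."""
    return {level: int(species == level) for level in LEVELS[1:]}


def predict_petal_length(petal_width, species):
    """Predicted petal length from the parallel slopes model."""
    dummies = dummy_code(species)
    shift = sum(OFFSETS[level] * dummies[level] for level in OFFSETS)
    return B0 + B_WIDTH * petal_width + shift


# With Petal.Width = 0, each prediction is that species' intercept.
for species in LEVELS:
    print(species, dummy_code(species), round(predict_petal_length(0, species), 2))
```

For setosa both indicators are 0, so the prediction reduces to the baseline line; for the other two species exactly one indicator is 1, which shifts the intercept to 2.91 or 3.49 while the slope stays at 1.02.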