Simple Linear Regression

Topic 17

Math 115

From Categories to Numbers

In previous topics, our explanatory variable was categorical:

  • Groups (treatment vs control)
  • Categories (Democrat, Independent, Republican)

Now: Both variables are numerical

  • How does one numerical variable relate to another?
  • Can we use one variable to predict another?

Scatterplots

A scatterplot visualizes the relationship between two numerical variables.

Data: Body measurements from 507 physically active adults.

Direction of Association

Positive association: As one variable increases, the other tends to increase.

Negative association: As one variable increases, the other tends to decrease.

Height and weight have a positive association — taller people tend to weigh more.

Fitting a Line

The data appear to fall roughly along a line. We can add a line of best fit:

The Linear Model Equation

\[\hat{y} = b_0 + b_1 x\]

  • \(b_0\) = intercept (where line crosses the y-axis)
  • \(b_1\) = slope (change in y for each unit increase in x)
  • \(\hat{y}\) = predicted value (the “hat” indicates a prediction)

Statistics vs Parameters:

  • \(b_0\), \(b_1\) are statistics (calculated from sample)
  • \(\beta_0\), \(\beta_1\) are parameters (true population values, unknown)

Variable Roles

Response variable (\(y\)): What we’re trying to predict

  • Also called: outcome, dependent variable
  • Here: weight (wgt)

Predictor variable (\(x\)): What we use to make predictions

  • Also called: explanatory, independent variable
  • Here: height (hgt)

\[\widehat{wgt} = -105 + 1.02 \times hgt\]

Making Predictions

Predict the weight of someone who is 170 cm tall:

\[\widehat{wgt} = -105 + 1.02 \times 170 = 68.4 \text{ kg}\]

Predict the weight of someone who is 180 cm tall:

\[\widehat{wgt} = -105 + 1.02 \times 180 = 78.6 \text{ kg}\]

The 10 cm difference in height corresponds to a 10.2 kg difference in predicted weight.
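The same predictions can be reproduced in a few lines of code. This is a minimal sketch that hard-codes the rounded coefficients from the equation above; the function name `predict_weight` is made up for illustration.

```python
# Rounded coefficients from the fitted equation on this slide.
B0 = -105.0   # intercept, in kg
B1 = 1.02     # slope, in kg per cm

def predict_weight(hgt_cm):
    """Return the predicted weight (kg) for a height given in cm."""
    return B0 + B1 * hgt_cm

for h in (170, 180):
    print(f"{h} cm -> {predict_weight(h):.1f} kg")

# The gap between the two predictions is 10 times the slope.
print(f"difference: {predict_weight(180) - predict_weight(170):.1f} kg")
```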

Interpreting the Slope

\[\widehat{wgt} = -105 + 1.02 \times hgt\]

Slope (\(b_1 = 1.02\)):

For each additional centimeter of height, we expect weight to increase by 1.02 kg, on average.

Template: “For each additional [unit of x], we expect [y] to [increase/decrease] by [slope] [units of y], on average.”

Interpreting the Intercept

Intercept (\(b_0 = -105\)):

The predicted weight for someone 0 cm tall is -105 kg.

This is often not meaningful! (No one is 0 cm tall.)

Better interpretation: The intercept positions the line vertically so it passes through the data cloud.

Extrapolation

Extrapolation: Predicting outside the range of observed data.

  • Our data: heights from 147 to 198 cm
  • Predicting for 205 cm? Risky — pattern may not hold.
  • Predicting for 5 cm? Nonsensical.

Rule: Only use the model within the range of your data.
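One way to enforce this rule in code is to refuse predictions outside the observed range. The sketch below assumes the 147 to 198 cm range and the rounded coefficients quoted above; the function name and the choice to raise an error (rather than warn) are illustrative assumptions.

```python
# Observed height range (cm), as stated on this slide.
HGT_MIN, HGT_MAX = 147, 198

def predict_weight_checked(hgt_cm, b0=-105.0, b1=1.02):
    """Predict weight (kg), refusing to extrapolate beyond observed heights."""
    if not HGT_MIN <= hgt_cm <= HGT_MAX:
        raise ValueError(
            f"{hgt_cm} cm is outside the observed range "
            f"{HGT_MIN}-{HGT_MAX} cm; this would be extrapolation"
        )
    return b0 + b1 * hgt_cm

print(predict_weight_checked(170))      # inside the range: allowed
try:
    predict_weight_checked(205)         # outside the range: rejected
except ValueError as err:
    print("Refused:", err)
```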

How Is the Line Determined?

Many lines could go through the data. Which one is “best”?

We need a criterion for “best fit.”

Residuals

The residual is the difference between observed and predicted:

\[e_i = y_i - \hat{y}_i = \text{Observed} - \text{Predicted}\]

  • Positive residual: point above line (underpredicted)
  • Negative residual: point below line (overpredicted)
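A short sketch of the residual calculation, using the fitted height-weight line. The two people below are hypothetical, chosen only to give one positive and one negative residual.

```python
def residual(hgt_cm, observed_wgt_kg, b0=-105.0, b1=1.02):
    """Residual = observed weight - predicted weight (both in kg)."""
    predicted = b0 + b1 * hgt_cm
    return observed_wgt_kg - predicted

# Hypothetical person A: 170 cm, 72 kg. The line predicts 68.4 kg.
e_a = residual(170, 72)
# Hypothetical person B: 180 cm, 75 kg. The line predicts 78.6 kg.
e_b = residual(180, 75)

for label, e in (("A", e_a), ("B", e_b)):
    side = "above the line (underpredicted)" if e > 0 else "below the line (overpredicted)"
    print(f"Person {label}: residual = {e:+.1f} kg, {side}")
```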

The Least Squares Line

The least squares regression line minimizes:

\[\sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2\]

Why squared?

  • Makes all contributions positive
  • Penalizes large errors more than small ones
  • Has nice mathematical properties
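For simple linear regression the minimizing values have a closed form: \(b_1 = \sum (x_i - \bar{x})(y_i - \bar{y}) / \sum (x_i - \bar{x})^2\) and \(b_0 = \bar{y} - b_1 \bar{x}\). The sketch below applies these standard formulas to five made-up data points (not the course dataset) and checks that nudging the line in any direction makes the sum of squared residuals larger.

```python
def fit_line(xs, ys):
    """Return (b0, b1) of the least squares line for paired lists xs, ys."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

def sse(xs, ys, b0, b1):
    """Sum of squared residuals for the line y-hat = b0 + b1 * x."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

# Made-up illustrative data, NOT the 507-person dataset.
hgt_cm = [150, 160, 170, 180, 190]
wgt_kg = [52, 60, 66, 80, 84]

b0, b1 = fit_line(hgt_cm, wgt_kg)
best = sse(hgt_cm, wgt_kg, b0, b1)
print(f"b0 = {b0:.2f}, b1 = {b1:.2f}, SSE = {best:.2f}")

# Any other line does worse: try shifting the intercept or the slope.
print(f"intercept + 1:  SSE = {sse(hgt_cm, wgt_kg, b0 + 1, b1):.2f}")
print(f"slope + 0.05:   SSE = {sse(hgt_cm, wgt_kg, b0, b1 + 0.05):.2f}")
```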

Using Software

Software finds \(b_0\) and \(b_1\) that minimize \(\sum e_i^2\):

Term        Estimate
Intercept   -105.01
hgt         1.02

You will learn to do this in Jamovi.

Correlation Coefficient (r)

The correlation coefficient measures the strength AND direction of a linear relationship.

\[-1 \leq r \leq 1\]

Value of r       Interpretation
r close to +1    Strong positive linear relationship
r close to -1    Strong negative linear relationship
r close to 0     Weak or no linear relationship

Visualizing Correlation

Scatter plots with different correlations. From IMS2 Figure 7.10.

Correlation: Properties

Key properties:

  • Unitless (doesn’t depend on measurement scale)
  • Symmetric: same value if we exchange the roles of \(x\) and \(y\)
  • Only measures linear relationships
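The first two properties can be checked numerically. The sketch below computes \(r\) with the standard formula \(r = S_{xy} / \sqrt{S_{xx} S_{yy}}\) on made-up data (not the course dataset), then swaps the variables and changes the units; the conversion factors are approximate.

```python
from math import sqrt

def correlation(xs, ys):
    """Pearson correlation coefficient r of two equal-length lists."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Made-up illustrative data, NOT the 507-person dataset.
hgt_cm = [150, 160, 170, 180, 190]
wgt_kg = [52, 60, 66, 80, 84]

r = correlation(hgt_cm, wgt_kg)

# Symmetric: exchanging the roles of x and y gives the same r.
r_swapped = correlation(wgt_kg, hgt_cm)

# Unitless: converting cm to inches and kg to pounds leaves r unchanged.
hgt_in = [h / 2.54 for h in hgt_cm]
wgt_lb = [w * 2.20462 for w in wgt_kg]
r_rescaled = correlation(hgt_in, wgt_lb)

print(round(r, 3), round(r_swapped, 3), round(r_rescaled, 3))
```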

Correlation Example

\(r = 0.717\) indicates a moderately strong positive linear relationship.

Coefficient of Determination (R²)

\(R^2\) measures how well the model fits the data.

\[R^2 = r^2 \quad \text{(for simple linear regression)}\]

Interpretation: The proportion of variability in \(y\) that is explained by \(x\).

  • R² close to 1: Model explains most of the variability
  • R² close to 0: Model explains little of the variability

R² Example

For our height-weight data:

\[R^2 = r^2 = (0.717)^2 \approx 0.514\]

Height explains about 51.4% of the variability in weight.

The remaining 48.6% is due to other factors (muscle mass, bone density, etc.).
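"Proportion of variability explained" can be made concrete as \(R^2 = 1 - SSE/SST\), where SSE is the sum of squared residuals and SST is the total squared deviation of \(y\) from its mean. This sum-of-squares form is the standard definition, though it is not spelled out on these slides. The sketch below uses made-up data (not the course dataset) to check that it matches \(r^2\).

```python
from math import sqrt

# Made-up illustrative data, NOT the 507-person dataset.
hgt_cm = [150, 160, 170, 180, 190]
wgt_kg = [52, 60, 66, 80, 84]

n = len(hgt_cm)
x_bar, y_bar = sum(hgt_cm) / n, sum(wgt_kg) / n
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(hgt_cm, wgt_kg))
sxx = sum((x - x_bar) ** 2 for x in hgt_cm)
sst = sum((y - y_bar) ** 2 for y in wgt_kg)   # total variability in y

# Least squares line and its sum of squared residuals.
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(hgt_cm, wgt_kg))

r = sxy / sqrt(sxx * sst)
r_squared = 1 - sse / sst   # share of variability the line accounts for

print(f"r^2         = {r ** 2:.4f}")
print(f"1 - SSE/SST = {r_squared:.4f}")
```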

Residual Plots

A residual plot shows residuals vs. predicted values.

What to look for: No obvious patterns → linear model is appropriate.

Interpreting Residual Patterns

Scatter plots (top) and residual plots (bottom). From IMS2.

  • Curved pattern: Relationship is non-linear
  • Fan shape: Variance changes with x
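The curved pattern is easy to reproduce. In this sketch a straight line is fitted to made-up data that follow \(y = x^2\) exactly; the residuals come out positive at both ends and negative in the middle, which is the U shape a residual plot would show.

```python
# Made-up data with a deliberately non-linear relationship: y = x^2.
xs = [0, 1, 2, 3, 4]
ys = [x ** 2 for x in xs]

# Least squares line via the closed-form formulas.
n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n
b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
      / sum((x - x_bar) ** 2 for x in xs))
b0 = y_bar - b1 * x_bar

# Residual = observed - predicted, one per data point.
residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]

for x, e in zip(xs, residuals):
    print(f"x = {x}: residual = {e:+.1f}")
```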

Outliers and Leverage

Outliers: Points far from the overall pattern.

High leverage points: Points with extreme x-values.

  • Far from the mean of x
  • Have potential to strongly influence the line
  • Example: a 210 cm tall person would be a high leverage point in our data (observed heights run from 147 to 198 cm)

Influential Points

Influential points: Points that actually change the regression line substantially.

Test: Remove the point, refit the line. If the slope changes a lot, the point is influential.

Key insight: High leverage + doesn’t follow pattern = influential
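The remove-and-refit test can be sketched directly. The data below are made up (not the course dataset): five points that follow a clear trend, plus one extra person at 210 cm. When the extra point follows the trend the slope barely moves; when it does not, the slope changes a lot.

```python
def slope(xs, ys):
    """Least squares slope b1 for paired lists xs, ys."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx

# Made-up illustrative data, NOT the 507-person dataset.
hgt_cm = [150, 160, 170, 180, 190]
wgt_kg = [52, 60, 66, 80, 84]

base = slope(hgt_cm, wgt_kg)

# High leverage, follows the pattern: 210 cm and 102 kg.
on_trend = slope(hgt_cm + [210], wgt_kg + [102])

# High leverage, does NOT follow the pattern: 210 cm and only 60 kg.
off_trend = slope(hgt_cm + [210], wgt_kg + [60])

print(f"slope without extra point:     {base:.2f}")
print(f"slope with on-trend point:     {on_trend:.2f}  (not influential)")
print(f"slope with off-trend point:    {off_trend:.2f}  (influential)")
```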

Identifying Influential Points

Scatter plots with outliers. From IMS2 Figure 7.16.

Which points are high leverage? Which are influential?

Key Concepts

Component               Description
Model                   \(\hat{y} = b_0 + b_1 x\)
Slope (\(b_1\))         Change in y per unit increase in x
Intercept (\(b_0\))     Predicted y when x = 0
Residual                \(e = y - \hat{y}\) (observed - predicted)
Correlation (r)         Strength and direction of linear relationship
\(R^2\)                 Proportion of variability in y explained by the model

What’s Next?

Next, we’ll learn:

  • How to make inferences about the slope
  • Is there really a relationship, or could it be due to chance?
  • Confidence intervals for the slope
  • Hypothesis tests for the slope

References