
Simple Linear Regression

Topic 17

Math 115

From Categories to Numbers

In previous topics, our explanatory variable was categorical:

Groups (treatment vs control)
Categories (Democrat, Independent, Republican)

Now: Both variables are numerical

How does one numerical variable relate to another?
Can we use one variable to predict another?

Scatterplots

A scatterplot visualizes the relationship between two numerical variables.

Data: Body measurements from 507 physically active adults.

Direction of Association

Positive association: As one variable increases, the other tends to increase.

Negative association: As one variable increases, the other tends to decrease.

Height and weight have a positive association — taller people tend to weigh more.

Fitting a Line

The data appear to fall roughly along a line. We can add a line of best fit:

The Linear Model Equation

\[\hat{y} = b_0 + b_1 x\]

\(b_0\) = intercept (where line crosses the y-axis)
\(b_1\) = slope (predicted change in y for each one-unit increase in x)
\(\hat{y}\) = predicted value (the “hat” indicates a prediction)

Statistics vs Parameters:

\(b_0\), \(b_1\) are statistics (calculated from sample)
\(\beta_0\), \(\beta_1\) are parameters (true population values, unknown)

Variable Roles

Response variable (\(y\)): What we’re trying to predict

Also called: outcome, dependent variable
Here: weight (wgt)

Predictor variable (\(x\)): What we use to make predictions

Also called: explanatory, independent variable
Here: height (hgt)

\[\widehat{wgt} = -105 + 1.02 \times hgt\]

Making Predictions

Predict the weight of someone who is 170 cm tall:

\[\widehat{wgt} = -105 + 1.02 \times 170 = 68.4 \text{ kg}\]

Predict the weight of someone who is 180 cm tall:

\[\widehat{wgt} = -105 + 1.02 \times 180 = 78.6 \text{ kg}\]

The 10 cm difference in height corresponds to a 10.2 kg difference in predicted weight.
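The prediction arithmetic can be checked with a minimal Python sketch. The function name `predict_weight` is made up for illustration; the coefficients are the rounded values from the fitted line above.

```python
# Rounded coefficients from the fitted line on these slides.
INTERCEPT = -105.0  # b0, in kg
SLOPE = 1.02        # b1, in kg per cm


def predict_weight(height_cm: float) -> float:
    """Return the predicted weight (kg) for a given height (cm)."""
    return INTERCEPT + SLOPE * height_cm


if __name__ == "__main__":
    w170 = predict_weight(170)
    w180 = predict_weight(180)
    print(f"170 cm: {w170:.1f} kg")
    print(f"180 cm: {w180:.1f} kg")
    # The gap depends only on the slope: 10 cm * 1.02 kg/cm = 10.2 kg.
    print(f"difference: {w180 - w170:.1f} kg")
```

The intercept cancels when two predictions are subtracted, so the difference is always the slope times the difference in height.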

Interpreting the Slope

\[\widehat{wgt} = -105 + 1.02 \times hgt\]

Slope (\(b_1 = 1.02\)):

For each additional centimeter of height, we expect weight to increase by 1.02 kg, on average.

Template: “For each additional [unit of x], we expect [y] to [increase/decrease] by [slope] [units of y], on average.”

Interpreting the Intercept

Intercept (\(b_0 = -105\)):

The predicted weight for someone 0 cm tall is -105 kg.

This is often not meaningful! (No one is 0 cm tall.)

Better interpretation: The intercept positions the line vertically so it passes through the data cloud.

Extrapolation

Extrapolation: Predicting outside the range of observed data.

Our data: heights from 147 to 198 cm
Predicting for 205 cm? Risky — pattern may not hold.
Predicting for 5 cm? Nonsensical.

Rule: Only use the model within the range of your data.
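One way to enforce this rule in practice is to check the input before predicting. The sketch below is illustrative only: the function names are hypothetical, and the range 147 to 198 cm is the observed range quoted above.

```python
# Observed height range in the data, as stated on this slide (cm).
MIN_HEIGHT_CM = 147.0
MAX_HEIGHT_CM = 198.0


def is_extrapolation(height_cm: float) -> bool:
    """True if the height lies outside the observed range of the data."""
    return not (MIN_HEIGHT_CM <= height_cm <= MAX_HEIGHT_CM)


def predict_weight_checked(height_cm: float) -> float:
    """Predict weight (kg), refusing to extrapolate beyond the data."""
    if is_extrapolation(height_cm):
        raise ValueError(
            f"{height_cm} cm is outside the observed range "
            f"{MIN_HEIGHT_CM}-{MAX_HEIGHT_CM} cm"
        )
    return -105.0 + 1.02 * height_cm


if __name__ == "__main__":
    for h in (170, 205, 5):
        label = "extrapolation" if is_extrapolation(h) else "within range"
        print(h, "cm:", label)
```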

How Is the Line Determined?

Many lines could go through the data. Which one is “best”?

We need a criterion for “best fit.”

Residuals

The residual is the difference between observed and predicted:

\[e_i = y_i - \hat{y}_i = \text{Observed} - \text{Predicted}\]

Positive residual: point above line (underpredicted)
Negative residual: point below line (overpredicted)
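As a small illustration, the sketch below computes residuals for two made-up people using the fitted line from these slides. The observed weights are hypothetical and do not come from the data set.

```python
def predict_weight(height_cm: float) -> float:
    """Predicted weight (kg) from the fitted line on these slides."""
    return -105.0 + 1.02 * height_cm


def residual(observed: float, predicted: float) -> float:
    """Residual = observed - predicted."""
    return observed - predicted


if __name__ == "__main__":
    # Hypothetical people, for illustration only: (height in cm, weight in kg).
    people = [(170, 72.0), (170, 60.0)]
    for height, weight in people:
        e = residual(weight, predict_weight(height))
        side = "above the line" if e > 0 else "below the line"
        print(f"height {height} cm, weight {weight} kg: e = {e:.1f} ({side})")
```

The first person weighs more than predicted (positive residual, underpredicted); the second weighs less (negative residual, overpredicted).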

The Least Squares Line

The least squares regression line minimizes:

\[\sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2\]

Why squared?

Makes every contribution non-negative, so positive and negative residuals cannot cancel
Penalizes large errors more than small ones
Leads to a unique solution with simple formulas for \(b_0\) and \(b_1\)
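To make the criterion concrete, here is a minimal sketch that fits a least squares line to a tiny made-up data set and compares its sum of squared residuals with that of another candidate line. It uses the standard closed-form solution for simple linear regression, \(b_1 = \sum (x_i - \bar{x})(y_i - \bar{y}) / \sum (x_i - \bar{x})^2\) and \(b_0 = \bar{y} - b_1 \bar{x}\). The data and function names are assumptions for illustration.

```python
def least_squares(xs, ys):
    """Return (b0, b1) for the least squares line through the points."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def sum_squared_residuals(xs, ys, b0, b1):
    """Sum of squared residuals for the line y-hat = b0 + b1 * x."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))


if __name__ == "__main__":
    # Toy data, made up for illustration.
    xs = [1, 2, 3, 4]
    ys = [2, 4, 5, 8]
    b0, b1 = least_squares(xs, ys)
    print(f"least squares line: b0 = {b0:.2f}, b1 = {b1:.2f}")
    print("SSE, least squares line:", round(sum_squared_residuals(xs, ys, b0, b1), 2))
    # Another plausible-looking line, y-hat = 2x, has a larger SSE.
    print("SSE, line y-hat = 2x:  ", round(sum_squared_residuals(xs, ys, 0.0, 2.0), 2))
```

Any other choice of intercept and slope gives a sum of squared residuals at least as large as the least squares line's.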

Using Software

Software finds \(b_0\) and \(b_1\) that minimize \(\sum e_i^2\):

Term	Estimate
Intercept	-105.01
hgt	1.02

You will learn to do this in Jamovi.

Correlation Coefficient (r)

The correlation coefficient measures the strength AND direction of a linear relationship.

\[-1 \leq r \leq 1\]

Value of r	Interpretation
r close to +1	Strong positive linear relationship
r close to -1	Strong negative linear relationship
r close to 0	Weak or no linear relationship

Visualizing Correlation

Scatter plots with different correlations. From IMS2 Figure 7.10.

Correlation: Properties

Key properties:

Unitless (doesn’t depend on measurement scale)
Symmetric: same value if we exchange the roles of \(x\) and \(y\)
Only measures linear relationships
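These three properties can be checked numerically. The sketch below computes \(r\) by hand on made-up data; the heights and weights are hypothetical, not the course data set.

```python
from math import sqrt


def correlation(xs, ys):
    """Pearson correlation coefficient r for paired data."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


if __name__ == "__main__":
    # Hypothetical heights (cm) and weights (kg), for illustration only.
    heights_cm = [150, 160, 170, 180, 190]
    weights_kg = [52, 61, 66, 80, 85]

    # Unitless: converting cm to inches leaves r unchanged.
    heights_in = [h / 2.54 for h in heights_cm]
    print("r (cm):    ", round(correlation(heights_cm, weights_kg), 4))
    print("r (inches):", round(correlation(heights_in, weights_kg), 4))

    # Symmetric: swapping x and y leaves r unchanged.
    print("r (swapped):", round(correlation(weights_kg, heights_cm), 4))

    # Only linear: a perfect but curved relationship (y = x^2) can give r = 0.
    xs = [-2, -1, 0, 1, 2]
    ys = [x ** 2 for x in xs]
    print("r (parabola):", correlation(xs, ys))
```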

Correlation Example

For our height-weight data, \(r = 0.717\), which indicates a moderately strong positive linear relationship.

Coefficient of Determination (R²)

R² measures how well the model fits the data.

\[R^2 = r^2 \quad \text{(for simple linear regression)}\]

Interpretation: The proportion of variability in \(y\) that is explained by \(x\).

R² close to 1: Model explains most of the variability
R² close to 0: Model explains little of the variability

R² Example

For our height-weight data:

\[R^2 = r^2 = (0.717)^2 \approx 0.51\]

Height explains about 51% of the variability in weight.

The remaining 49% or so is not explained by height and reflects other factors (muscle mass, bone density, etc.).
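The phrase "proportion of variability explained" can be made concrete. The sketch below, on made-up toy data, computes \(R^2\) two ways: as \(r^2\), and as one minus the ratio of residual variability to total variability in \(y\). For a least squares line with one predictor the two agree. The data and function names are assumptions for illustration.

```python
def fit_line(xs, ys):
    """Least squares intercept and slope for simple linear regression."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


def r_squared_from_residuals(xs, ys):
    """R^2 = 1 - (sum of squared residuals) / (total variability in y)."""
    b0, b1 = fit_line(xs, ys)
    y_bar = sum(ys) / len(ys)
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    sst = sum((y - y_bar) ** 2 for y in ys)
    return 1 - sse / sst


def r_squared_from_correlation(xs, ys):
    """R^2 computed as the square of the correlation coefficient."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy ** 2 / (sxx * syy)


if __name__ == "__main__":
    xs = [1, 2, 3, 4]  # toy data, made up for illustration
    ys = [2, 4, 5, 8]
    print("1 - SSE/SST:", round(r_squared_from_residuals(xs, ys), 4))
    print("r squared:  ", round(r_squared_from_correlation(xs, ys), 4))
```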

Residual Plots

A residual plot shows residuals vs. predicted values.

What to look for: No obvious patterns → linear model is appropriate.

Interpreting Residual Patterns

Scatter plots (top) and residual plots (bottom). From IMS2.

Curved pattern: Relationship is non-linear
Fan shape: Variance changes with x
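A curved pattern is easy to reproduce. In the sketch below, a straight line is fitted to made-up data that actually follow \(y = x^2\); the residuals come out positive at both ends and negative in the middle, which is the U shape a residual plot would show. The data and function names are hypothetical.

```python
def fit_line(xs, ys):
    """Least squares intercept and slope for simple linear regression."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


def residuals(xs, ys):
    """Residuals (observed - predicted) from the least squares line."""
    b0, b1 = fit_line(xs, ys)
    return [y - (b0 + b1 * x) for x, y in zip(xs, ys)]


if __name__ == "__main__":
    xs = [1, 2, 3, 4, 5]
    ys = [x ** 2 for x in xs]  # curved relationship, made up for illustration
    for x, e in zip(xs, residuals(xs, ys)):
        print(f"x = {x}: residual = {e:+.1f}")
```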

Outliers and Leverage

Outliers: Points far from the overall pattern.

High leverage points: Points with extreme x-values.

Far from the mean of x
Have potential to strongly influence the line
Example: A 210 cm tall person, if one were added to our data (observed heights run from 147 to 198 cm)

Influential Points

Influential points: Points that actually change the regression line substantially.

Test: Remove the point, refit the line. If the slope changes a lot, the point is influential.

Key insight: High leverage + doesn’t follow pattern = influential
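The remove-and-refit test can be sketched in a few lines. The example below uses made-up data: five points on a perfect line, plus one extra point with an extreme x-value. When the extra point follows the pattern, the slope barely moves; when it does not, the slope changes a lot. All data and names here are assumptions for illustration.

```python
def slope(points):
    """Least squares slope for a list of (x, y) points."""
    n = len(points)
    x_bar = sum(x for x, _ in points) / n
    y_bar = sum(y for _, y in points) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in points)
    sxx = sum((x - x_bar) ** 2 for x, _ in points)
    return sxy / sxx


if __name__ == "__main__":
    # Made-up base data lying exactly on y = 2x.
    base = [(1, 2), (2, 4), (3, 6), (4, 8), (5, 10)]

    # High leverage (x = 15 is far from the mean of x), follows the pattern.
    follows = base + [(15, 30)]
    # High leverage, does NOT follow the pattern.
    deviates = base + [(15, 5)]

    print("slope, base data:               ", round(slope(base), 3))
    print("slope, with on-pattern point:   ", round(slope(follows), 3))
    print("slope, with off-pattern point:  ", round(slope(deviates), 3))
```

Only the off-pattern point is influential here, even though both extra points have the same leverage.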

Identifying Influential Points

Scatter plots with outliers. From IMS2 Figure 7.16.

Which points are high leverage? Which are influential?

Key Concepts

Component	Description
Model	\(\hat{y} = b_0 + b_1 x\)
Slope (\(b_1\))	Change in y per unit increase in x
Intercept (\(b_0\))	Predicted y when x = 0
Residual	\(e = y - \hat{y}\) (observed - predicted)
Correlation (r)	Strength and direction of linear relationship
R²	Proportion of variability explained by model

What’s Next?

Next, we’ll learn:

How to make inferences about the slope
Is there really a relationship, or could it be due to chance?
Confidence intervals for the slope
Hypothesis tests for the slope

References

Introduction to Modern Statistics (2e) textbook by Mine Çetinkaya-Rundel and Johanna Hardin
Section 1.2
Chapter 7