Inference for Paired Means

Topic 13

Math 115

A Different Kind of Comparison

Previously, we compared two independent groups (different subjects).

Sometimes data come in pairs — two measurements on the same unit.

Examples of paired data:

Blood pressure before and after taking medication (same patients)
Prices of the same textbook at Amazon vs. bookstore
Reaction time with left hand vs. right hand (same people)

Is It Paired? Quick Examples

Scenario	Paired or Independent?
Compare test scores of tutored vs. non-tutored students	Independent
Compare weight before and after a diet program	Paired
Compare salaries of men vs. women at a company	Independent
Compare prices of same product at two stores	Paired
Compare typing speed with one hand vs. two hands	Paired

Key question: Is there a natural pairing between observations?

Why Does Pairing Matter?

Pairing controls for individual differences.

Each subject serves as their own control
Removes subject-to-subject variability
Often results in more powerful tests

Example: Testing a new blood pressure medication

Paired design: Measure same patients before and after → differences remove variation between patients
Independent design: Compare different people → must account for natural differences between individuals

Studying with Music

Many students study while listening to music.
Does it hurt their ability to focus?
In “Checking It Out: Does music interfere with studying?”, Stanford professor Clifford Nass claims that the human brain processes song lyrics with the same part that handles word processing.
Instrumental music is, for the most part, processed on the other side of the brain, and Nass claims that reading while listening to instrumental music causes virtually no interference.

Experimental designs:

Experiment 1: Random assignment to 2 groups (independent groups)
- 27 students were randomly assigned to 1 of 2 groups:
  - One group listens to music with lyrics
  - One group listens to music without lyrics
- Students play a memorization game while listening to the music they were assigned.
Experiment 2: Paired design using repeated measures
- All students play the memorization game twice (in randomly assigned order):
  - Once while listening to music with lyrics
  - Once while listening to music without lyrics
Experiment 3: Paired design using matching
- Test each student on memorization.
- Pair up students with similar scores.
- Within each pair, randomly assign one student to play the game while listening to music with lyrics and the other while listening to music without lyrics.

What if everyone could remember exactly 2 more words when they listened to a song without lyrics?
Treated as two independent groups, the two sets of scores could overlap so much that the 2-word difference would be difficult to detect.
We need to focus on the differences within matching pairs.
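The "exactly 2 more words" scenario can be sketched in Python. The scores below are hypothetical, made up only for illustration, and the independent-groups standard error uses the usual two-sample formula $\sqrt{s_1^2/n_1 + s_2^2/n_2}$, which is assumed from the earlier two-group topic rather than stated on these slides.

```python
import math
import statistics

# Hypothetical scores (words recalled) with lyrics; illustrative, not real data.
with_lyrics = [8, 11, 14, 9, 17, 12, 15, 10]
# Suppose every student recalls exactly 2 more words without lyrics.
without_lyrics = [score + 2 for score in with_lyrics]

# Independent-groups view: student-to-student spread swamps the 2-word shift.
n = len(with_lyrics)
sd_with = statistics.stdev(with_lyrics)
sd_without = statistics.stdev(without_lyrics)
se_independent = math.sqrt(sd_with**2 / n + sd_without**2 / n)

# Paired view: one difference per student, so the spread between students cancels.
diffs = [b - a for a, b in zip(with_lyrics, without_lyrics)]
mean_diff = statistics.mean(diffs)
sd_diff = statistics.stdev(diffs)

print(f"SD within each group: {sd_with:.2f} and {sd_without:.2f}")
print(f"SE treating groups as independent: {se_independent:.2f}")
print(f"Mean of paired differences: {mean_diff}, SD of differences: {sd_diff}")
```

In this idealized case every difference equals 2, so the differences have no spread at all, while the two groups of raw scores overlap heavily.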

The Key Insight

With paired data:

Compute the difference for each pair
Analyze the differences as a single sample

This reduces the two-sample problem to a one-sample problem!

We already know how to do this (Inference for a Single Mean).
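The two steps above (compute the differences, then treat them as a single sample) might look like this in Python. The function name `paired_summary` and the blood pressure readings are hypothetical and serve only as an illustration.

```python
import math
import statistics

def paired_summary(first, second):
    """Return (n, mean difference, SD of differences, SE) for paired data."""
    if len(first) != len(second):
        raise ValueError("paired data need the same number of observations")
    # Step 1: compute one difference per pair (first minus second).
    diffs = [a - b for a, b in zip(first, second)]
    # Step 2: analyze the differences as a single sample.
    n = len(diffs)
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)
    se = sd_d / math.sqrt(n)
    return n, mean_d, sd_d, se

# Hypothetical blood pressure readings for six patients (not real data).
before = [142, 138, 150, 145, 160, 155]
after = [135, 136, 144, 140, 151, 150]

n, mean_d, sd_d, se = paired_summary(before, after)
print(f"n = {n}, mean difference = {mean_d:.2f}, s_d = {sd_d:.2f}, SE = {se:.2f}")
```

The order of subtraction (here, before minus after) is a choice; what matters is that it is the same for every pair and is kept in mind when interpreting the sign of the mean difference.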

Statistics and Parameters for Paired Data

	Statistic (sample)	Parameter (population)
Mean difference	$\bar{x}_d$	$\mu_d$
SD of differences	$s_d$	$\sigma_d$

Statistic of interest: $\bar{x}_d$ (sample mean of differences)

Goal: Make inferences about $\mu_d$ (population mean difference)

Connection to Single Mean Scenario

	One Mean	Paired Data
Data	Single measurements	Differences
Statistic	$\bar{x}$	$\bar{x}_d$
Parameter	$\mu$	$\mu_d$
SE	$s/\sqrt{n}$	$s_d/\sqrt{n}$
df	$n - 1$	$n_{pairs} - 1$

Same formulas, just applied to differences!

Textbook Prices

Will you save money buying textbooks from Amazon instead of the UCLA bookstore?

Data: 68 textbooks with prices at both locations
This is PAIRED: Same book measured at two stores
Variable: price_diff = UCLA price − Amazon price

Research question: On average, do UCLA bookstore prices differ from Amazon prices?

ucla_textbooks_f18

# A tibble: 68 × 5
   subject                 course_num bookstore_new amazon_new price_diff
   <fct>                   <fct>              <dbl>      <dbl>      <dbl>
 1 American Indian Studies M10                48.0       47.4       0.520
 2 Anthropology            2                  14.3       13.6       0.710
 3 Arts and Architecture   10                 13.5       12.5       0.97 
 4 Asian                   M60W               49.3       55.0      -5.69 
 5 Astronomy               4                 120.       125.       -4.83 
 6 Communication           10                 17.0       11.8       5.18 
 7 Comparative Literature  2CW                12.0       10.9       1.09 
 8 Dance                   10                 26.8       38.9     -12.2  
 9 English                 19                  9.96       8.99      0.97 
10 English Composition     1A                 40.0       35         4.97 
# ℹ 58 more rows

Side-by-side Boxplots

[Figure: side-by-side plots of UCLA bookstore and Amazon textbook prices]

Paired data

One way to analyze the data would be to treat the books on Amazon and the books at the bookstore as two groups. We could then compare the group means as we did in Chapter 20.
Each observation would be a book on Amazon or a book at the bookstore.
However, this ignores the paired structure of the data: the two prices for the same book are not independent.
Such an analysis would not use all the available information and would have lower power.

EDA: Textbook Price Differences

n	$\bar{x}_d$	$s_d$
68	$3.58	$13.42

In this sample, UCLA bookstore prices are on average $3.58 higher than Amazon prices.

Hypotheses for Textbook Prices

Let $\mu_d$ = mean price difference (UCLA − Amazon) for all textbooks

Hypotheses:

$H_0: \mu_d = 0$ (no difference in prices on average)
$H_A: \mu_d \neq 0$ (prices differ on average)

This is a two-sided test.

Checking Conditions

Independence:

Random sample of textbooks ✓
Each textbook is independent of others ✓

Normality:

n = 68 ≥ 30 ✓
However, the price differences are skewed and there are outliers

We will proceed cautiously with the t-distribution, but a randomization test would be preferred.

Test Statistic

Degrees of freedom: $df = n - 1 = 68 - 1 = 67$

Standard error:

\[SE = \frac{s_d}{\sqrt{n}} = \frac{13.42}{\sqrt{68}} = 1.63\]

T-statistic:

\[T = \frac{\bar{x}_d - 0}{SE} = \frac{3.58 - 0}{1.63} = 2.2\]
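The arithmetic above can be checked with a few lines of Python, using only the summary statistics reported on the slides. The variable names are illustrative.

```python
import math

# Summary statistics for the textbook price differences (UCLA - Amazon).
n = 68
mean_diff = 3.58
sd_diff = 13.42
null_value = 0  # value of mu_d under H_0

df = n - 1
se = sd_diff / math.sqrt(n)
t_stat = (mean_diff - null_value) / se

print(f"df = {df}, SE = {se:.2f}, T = {t_stat:.2f}")
```

Carrying the unrounded standard error through gives a T-statistic of about 2.20, which agrees with the hand calculation that uses the rounded value 1.63.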

Calculating the P-value

Use a t-distribution (df = 67).

P-value = 0.0312 (from Jamovi)
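The slides take the P-value from Jamovi. As a rough cross-check, the two-sided tail area can be approximated by numerically integrating the t density with the Python standard library. This is a sketch under that assumption, not how Jamovi computes it, and the helper names `t_pdf` and `two_sided_p_value` are hypothetical. In practice a statistics package would be used.

```python
import math

def t_pdf(x, df):
    """Density of the t-distribution with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const - (df + 1) / 2 * math.log1p(x * x / df))

def two_sided_p_value(t_stat, df, steps=2000):
    """Two-sided P-value via Simpson's rule on [0, |t|]; steps must be even."""
    upper = abs(t_stat)
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2
        total += weight * t_pdf(i * h, df)
    area_zero_to_t = total * h / 3
    # By symmetry, the area within |t| of zero is twice this; the rest is the tails.
    return 1 - 2 * area_zero_to_t

p_value = two_sided_p_value(2.2, 67)
print(f"Two-sided P-value for T = 2.2, df = 67: {p_value:.3f}")
```

The result should land close to the Jamovi value of 0.0312. Small discrepancies are expected because the slides round T to 2.2, while Jamovi presumably works from the unrounded data.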

Conclusion: Hypothesis Test

Results:

T = 2.2
P-value = 0.0312
Using α = 0.05: P-value < 0.05

Decision: Reject $H_0$

Conclusion: The data provide convincing evidence that, on average, UCLA bookstore prices differ from Amazon prices for textbooks. However, we should be cautious given the skew and outliers present in the data and should verify our results using randomization.

CI for Mean Difference

Given: $\bar{x}_d = 3.58$, $SE = 1.63$, $df = 67$

Critical value: $t^*_{67} = 1.996$ (from Jamovi)

95% CI:

\[3.58 \pm 1.996 \times 1.63 = (0.33, 6.83)\]
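The interval can be reproduced in Python from the values given above. The critical value 1.996 is taken as given from the slides (Jamovi) and is not recomputed here. The variable names are illustrative.

```python
# Values from the slides: point estimate, standard error, and critical value.
mean_diff = 3.58
se = 1.63
t_star = 1.996  # t* for 95% confidence with df = 67, as reported by Jamovi

margin = t_star * se
lower = mean_diff - margin
upper = mean_diff + margin
contains_zero = lower <= 0 <= upper

print(f"Margin of error: {margin:.2f}")
print(f"95% CI: ({lower:.2f}, {upper:.2f}); contains 0: {contains_zero}")
```

Checking whether 0 falls inside the interval gives the same answer as the two-sided test at α = 0.05.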

Interpreting the CI

95% CI: ($0.33, $6.83)

Interpretation: We are 95% confident that the mean price difference (UCLA − Amazon) for all textbooks is between $0.33 and $6.83.

Does CI include 0? No

This is consistent with our hypothesis test—we rejected $H_0$, and 0 is not in the CI.

When to Use Paired Analysis

Use paired analysis when there is a natural pairing:

Same subjects measured twice: before/after, pre/post
Same item under two conditions: two stores, two methods
Matched pairs: twins, matched controls

Key insight: If you can match each observation in one group with exactly one observation in the other group, the data are paired.

Temperature

The temperature data set contains temperature readings from 32 NASA-GISS stations, chosen based on a random sample of latitude-longitude coordinates.
The variable Past is the average reading from 1901 through 1950, and the variable Recent is the average temperature recorded from 1951 through 2000.
This is a matched-pairs design because we are comparing temperatures at the same locations in two different time periods. The differences can be analyzed to determine whether there is statistically significant evidence that temperatures have risen.

Summary

Paired data → Compute differences → Analyze as one sample

Component	Formula
Standard Error	$SE = \frac{s_d}{\sqrt{n}}$
Degrees of freedom	$df = n_{pairs} - 1$
T-statistic	$T = \frac{\bar{x}_d - 0}{SE}$
CI	$\bar{x}_d \pm t^*_{df} \times SE$

Conditions: Independence of pairs + Normality of differences

Same as Topic 11, just applied to differences!

References

Introduction to Modern Statistics (2e) textbook by Mine Çetinkaya-Rundel and Johanna Hardin
Section 21.3
