Inference for Paired Means Topic 13
A Different Kind of Comparison
Previously, we compared two independent groups (different subjects).
Sometimes data come in pairs — two measurements on the same unit.
Examples of paired data:
- Blood pressure before and after taking medication (same patients)
- Prices of the same textbook at Amazon vs. bookstore
- Reaction time with left hand vs. right hand (same people)
Is It Paired? Quick Examples
| Scenario | Paired or Independent? |
|---|---|
| Compare test scores of tutored vs. non-tutored students | Independent |
| Compare weight before and after a diet program | Paired |
| Compare salaries of men vs. women at a company | Independent |
| Compare prices of same product at two stores | Paired |
| Compare typing speed with one hand vs. two hands | Paired |
Key question: Is there a natural pairing between observations?
Why Does Pairing Matter?
Pairing controls for individual differences.
- Each subject serves as their own control
- Removes subject-to-subject variability
- Often results in more powerful tests
Example: Testing a new blood pressure medication
- Paired design: Measure same patients before and after → differences remove variation between patients
- Independent design: Compare different people → must account for natural differences between individuals
Studying with Music
- Many students study while listening to music.
- Does it hurt their ability to focus?
- In “Checking It Out: Does music interfere with studying?”, Stanford professor Clifford Nass claims that the human brain processes song lyrics with the same part that handles word processing.
- Instrumental music is, for the most part, processed on the other side of the brain, and Nass claims that reading while listening to instrumental music causes virtually no interference.
Experimental designs:
- Experiment 1—Random assignment to 2 groups (Independent groups)
- 27 students were randomly assigned to 1 of 2 groups:
- One group listens to music with lyrics
- One group listens to music without lyrics
- Students play a memorization game while listening to the particular music that they were assigned.
- Experiment 2—Paired design using repeated measures
- All students play the memorization game twice (randomly assigning the order):
- Once while listening to music with lyrics
- Once while listening to music without lyrics.
- Experiment 3—Paired design using matching
- Test each student on memorization.
- Match students up with similar scores and randomly:
- Have one play the game while listening to music with lyrics and the other while listening to music without lyrics.
- What if everyone could remember exactly 2 more words when they listened to a song without lyrics?
- There could be a lot of overlap between the two sets of scores and it would be difficult to detect a difference as shown here.
- We need to focus on differences within matching pairs
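To make this concrete, here is a minimal Python sketch using made-up memorization scores (the data and variable names are hypothetical, not taken from any study). Every student recalls exactly 2 more words without lyrics, yet the two sets of scores overlap almost completely. Only the within-pair differences make the effect obvious.

```python
from statistics import stdev

# Hypothetical scores (words recalled) for 6 students; not real study data.
with_lyrics = [8, 11, 14, 17, 20, 23]
# Scenario from the slide: everyone remembers exactly 2 more words without lyrics.
without_lyrics = [score + 2 for score in with_lyrics]

# Treated as two separate groups, the score ranges overlap heavily.
overlap_low = max(min(with_lyrics), min(without_lyrics))
overlap_high = min(max(with_lyrics), max(without_lyrics))

# The spread within each group is large relative to the 2-word shift.
sd_with = stdev(with_lyrics)
sd_without = stdev(without_lyrics)

# Within-pair differences remove the student-to-student variation.
differences = [b - a for a, b in zip(with_lyrics, without_lyrics)]
sd_diff = stdev(differences)

print(f"Overlap of score ranges: {overlap_low} to {overlap_high}")
print(f"SD within groups: {sd_with:.2f} and {sd_without:.2f}")
print(f"Differences: {differences}, SD of differences: {sd_diff:.2f}")
```

In this artificial case the differences have no spread at all, so a t-statistic could not even be computed. Real data would show some variation in the differences, but usually far less than the variation between students.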
The Key Insight
With paired data:
- Compute the difference for each pair
- Analyze the differences as a single sample
This reduces the two-sample problem to a one-sample problem!
We already know how to do this (Inference for a Single Mean).
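As a rough illustration of this reduction, the sketch below turns two paired lists into a single list of differences and summarizes it. The function name `summarize_differences` and the blood pressure readings are hypothetical, chosen only for the example.

```python
from statistics import mean, stdev

def summarize_differences(first, second):
    """Reduce paired measurements to a one-sample summary of differences."""
    if len(first) != len(second):
        raise ValueError("Paired data need the same number of measurements.")
    diffs = [a - b for a, b in zip(first, second)]  # one difference per pair
    n = len(diffs)
    return {
        "differences": diffs,
        "mean_diff": mean(diffs),   # sample mean of differences (x-bar_d)
        "sd_diff": stdev(diffs),    # sample SD of differences (s_d)
        "n_pairs": n,
        "df": n - 1,                # degrees of freedom use the number of pairs
    }

# Hypothetical before/after blood pressure readings for 5 patients.
before = [150, 142, 138, 160, 155]
after = [144, 140, 139, 151, 150]
summary = summarize_differences(before, after)
print(summary)
```

Once the differences are in hand, the original two columns are no longer needed. Everything that follows works with `mean_diff`, `sd_diff`, and `n_pairs` alone.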
Statistics and Parameters for Paired Data
| | Statistic (sample) | Parameter (population) |
|---|---|---|
| Mean difference | \(\bar{x}_d\) | \(\mu_d\) |
| SD of differences | \(s_d\) | \(\sigma_d\) |
Statistic of interest: \(\bar{x}_d\) (sample mean of differences)
Goal: Make inferences about \(\mu_d\) (population mean difference)
Connection to Single Mean Scenario
| | Single mean | Paired means |
|---|---|---|
| Data | Single measurements | Differences |
| Statistic | \(\bar{x}\) | \(\bar{x}_d\) |
| Parameter | \(\mu\) | \(\mu_d\) |
| SE | \(s/\sqrt{n}\) | \(s_d/\sqrt{n}\) |
| df | \(n - 1\) | \(n_{pairs} - 1\) |
Same formulas, just applied to differences!
Textbook Prices
Will you save money buying textbooks from Amazon instead of the UCLA bookstore?
- Data: 68 textbooks with prices at both locations
- This is PAIRED: Same book measured at two stores
- Variable: price_diff = UCLA price − Amazon price
Research question: On average, do UCLA bookstore prices differ from Amazon prices?
# A tibble: 68 × 5
subject course_num bookstore_new amazon_new price_diff
<fct> <fct> <dbl> <dbl> <dbl>
1 American Indian Studies M10 48.0 47.4 0.520
2 Anthropology 2 14.3 13.6 0.710
3 Arts and Architecture 10 13.5 12.5 0.97
4 Asian M60W 49.3 55.0 -5.69
5 Astronomy 4 120. 125. -4.83
6 Communication 10 17.0 11.8 5.18
7 Comparative Literature 2CW 12.0 10.9 1.09
8 Dance 10 26.8 38.9 -12.2
9 English 19 9.96 8.99 0.97
10 English Composition 1A 40.0 35 4.97
# ℹ 58 more rows
Side-by-side Boxplots
Here are side-by-side dotplots
Paired data
- One way to analyze the data would be to treat the books on Amazon and the books at the bookstore as two groups. Then we could compare the difference in the group means as we did in Chapter 20.
- Each observation would be a book on Amazon or a book at the bookstore
- However, this ignores the paired structure of the data (observations are not independent)
- Such an analysis would not use all the available information and would have lower power
EDA: Textbook Price Differences
In this sample, UCLA bookstore prices are $3.58 higher than Amazon prices on average.
Hypotheses for Textbook Prices
Let \(\mu_d\) = mean price difference (UCLA − Amazon) for all textbooks
Hypotheses:
- \(H_0: \mu_d = 0\) (no difference in prices on average)
- \(H_A: \mu_d \neq 0\) (prices differ on average)
This is a two-sided test.
Checking Conditions
Independence:
- Random sample of textbooks ✓
- Each textbook is independent of others ✓
Normality:
- n = 68 ≥ 30 ✓
- The data are skewed and there are outliers
We will proceed cautiously with the t-distribution, although a randomization approach would be preferred.
Test Statistic
Degrees of freedom: \(df = n - 1 = 68 - 1 = 67\)
Standard error:
\[SE = \frac{s_d}{\sqrt{n}} = \frac{13.42}{\sqrt{68}} = 1.63\]
T-statistic:
\[T = \frac{\bar{x}_d - 0}{SE} = \frac{3.58 - 0}{1.63} = 2.2\]
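The arithmetic above can be checked with a few lines of Python. This sketch uses only the summary statistics quoted in the slides; the variable names are illustrative.

```python
from math import sqrt

# Summary statistics reported in the slides for the textbook data.
mean_diff = 3.58   # x-bar_d, in dollars (UCLA - Amazon)
sd_diff = 13.42    # s_d, in dollars
n_pairs = 68       # number of textbooks (pairs)
null_value = 0     # value of mu_d under H0

df = n_pairs - 1
se = sd_diff / sqrt(n_pairs)
t_stat = (mean_diff - null_value) / se

print(f"df = {df}, SE = {se:.2f}, T = {t_stat:.2f}")
```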
Calculating the P-value
Use a t-distribution (df = 67).
P-value = 0.0312 (from Jamovi)
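The slides take the P-value from Jamovi. As a rough cross-check, the sketch below approximates the two-sided P-value by numerically integrating the t density with Simpson's rule, using only the Python standard library. The helper names `t_density` and `two_sided_p_value` are made up for this example, and this is not how Jamovi computes it. Because it starts from the rounded T = 2.2, the result should land close to, but not necessarily exactly on, 0.0312.

```python
from math import exp, lgamma, log, pi

def t_density(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_const = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * log(df * pi)
    return exp(log_const - (df + 1) / 2 * log(1 + x * x / df))

def two_sided_p_value(t_stat, df, steps=2000):
    """Two-sided P-value via Simpson's rule (steps must be even)."""
    upper = abs(t_stat)
    h = upper / steps
    total = t_density(0, df) + t_density(upper, df)
    for i in range(1, steps):
        weight = 4 if i % 2 == 1 else 2
        total += weight * t_density(i * h, df)
    # Area between -|t| and |t|: twice the integral from 0 to |t|.
    central_area = 2 * (h / 3) * total
    return 1 - central_area

p_value = two_sided_p_value(2.2, 67)
print(f"Two-sided P-value = {p_value:.4f}")
```

The two-sided P-value is the total area in both tails beyond ±2.2, which is why the code subtracts the central area from 1.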
Conclusion: Hypothesis Test
Results:
- T = 2.2
- P-value = 0.0312
- Using α = 0.05: P-value < 0.05
Decision: Reject \(H_0\)
Conclusion: The data provide convincing evidence that, on average, UCLA bookstore prices differ from Amazon prices for textbooks. However, we should be cautious given the skew and outliers present in the data and should verify our results using randomization.
CI for Mean Difference
Given: \(\bar{x}_d = 3.58\), \(SE = 1.63\), \(df = 67\)
Critical value: \(t^*_{67} = 1.996\) (from Jamovi)
95% CI:
\[3.58 \pm 1.996 \times 1.63 = (0.33, 6.83)\]
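The interval can be reproduced from the three quantities given above. This is a minimal sketch with illustrative variable names; the critical value is the one reported from Jamovi rather than computed here.

```python
mean_diff = 3.58   # x-bar_d from the slides, in dollars
se = 1.63          # standard error from the slides
t_star = 1.996     # critical value for 95% confidence, df = 67 (from Jamovi)

margin = t_star * se                 # margin of error
lower = mean_diff - margin
upper = mean_diff + margin

print(f"Margin of error: {margin:.2f}")
print(f"95% CI: ({lower:.2f}, {upper:.2f})")
```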
Interpreting the CI
95% CI: ($0.33, $6.83)
Interpretation: We are 95% confident that the mean price difference (UCLA − Amazon) for all textbooks is between $0.33 and $6.83.
Does CI include 0? No
This is consistent with our hypothesis test—we rejected \(H_0\), and 0 is not in the CI.
When to Use Paired Analysis
Use paired analysis when there is a natural pairing:
- Same subjects measured twice: before/after, pre/post
- Same item under two conditions: two stores, two methods
- Matched pairs: twins, matched controls
Key insight: If each observation in one group has a natural match with exactly one observation in the other group, the data are paired.
Temperature
- The temperature data set contains temperature readings from 32 NASA-GISS stations, chosen based on a random sample of latitude-longitude coordinates.
- The variable Past is the average reading from 1901 through 1950, and the variable Recent is the average temperature recorded from 1951 through 2000.
- This is a matched-pairs design since we are comparing temperatures from the same locations in two different time periods. The differences can be analyzed to determine whether there is statistically significant evidence of a rise in temperature.
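One possible way to write the hypotheses for this example, assuming each difference is defined as Recent − Past (the slides do not specify the order) and \(\mu_d\) is the mean difference across all such locations:

\[H_0: \mu_d = 0 \qquad H_A: \mu_d > 0\]

Because the question is about a rise in temperature, this sketch uses a one-sided alternative. With 32 stations, the t-distribution would have \(df = 32 - 1 = 31\), provided the conditions on the differences are met.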
Summary
Paired data → Compute differences → Analyze as one sample
| Standard Error | \(SE = \frac{s_d}{\sqrt{n}}\) |
| Degrees of freedom | \(df = n_{pairs} - 1\) |
| T-statistic | \(T = \frac{\bar{x}_d - 0}{SE}\) |
| CI | \(\bar{x}_d \pm t^*_{df} \times SE\) |
Conditions: Independence of pairs + Normality of differences
Same as Topic 11, just applied to differences!