Topic 9

Inference for Two Proportions

Randomization

Math 115

What We’ve Done vs. What We’ll Do

Previously: Inference for a single proportion

One binary categorical variable
Hypothesis testing: \(H_0: p = p_0\) (single value)
Confidence interval: What is \(p\)?

Now: Inference for comparing two proportions

Two binary categorical variables
Hypothesis testing: \(H_0: p_1 - p_2 = 0\) (no difference between groups)
Confidence interval: What is \(p_1 - p_2\)?

Two Binary Categorical Variables

When we have two binary categorical variables:

One explanatory variable (defines the groups)
One response variable (defines success/failure)

We compare the proportion of successes between groups

Examples:

Stent vs No Stent \(\rightarrow\) Stroke rate
Drug vs Placebo \(\rightarrow\) Symptom improvement
With sign vs Without sign \(\rightarrow\) Recycling contamination

CPR Study Introduction

Researchers studied whether blood thinners affect survival during CPR

Data set cpr
90 patients who received CPR for cardiac arrest
Patients randomly assigned to treatment or control
Explanatory: Group (treatment = blood thinner, control = none)
Response: Outcome (survived at least 24 hours, or died)

Research question: Does the blood thinner affect the survival rate?

CPR Study Hypotheses

Let \(p_T\) = true survival rate for treatment, \(p_C\) = true survival rate for control

Hypotheses:

\(H_0: p_T - p_C = 0\) (no difference in survival rates)
\(H_A: p_T - p_C \neq 0\) (survival rates differ between groups)

Sometimes we state the hypotheses in words as

\(H_0:\) There is no association between blood thinner use and survival
\(H_A:\) There is an association between blood thinner use and survival

CPR Study Data

Group	Died	Survived	Total
Control	39	11	50
Treatment	26	14	40
Total	65	25	90

Survival proportions:

Treatment: \(\hat{p}_T = \frac{14}{40} = 0.35\) (35%)
Control: \(\hat{p}_C = \frac{11}{50} = 0.22\) (22%)
Observed difference: \(\hat{p}_T - \hat{p}_C = 0.13\)
This difference is the statistic of interest
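
The arithmetic above can be checked with a short Python sketch. The dictionary layout and the helper name `survival_proportion` are illustrative choices, not part of the course materials; the counts come straight from the table.

```python
# Counts taken from the CPR data table.
counts = {
    "control": {"died": 39, "survived": 11},
    "treatment": {"died": 26, "survived": 14},
}

def survival_proportion(group):
    """Proportion of patients in `group` who survived."""
    n = counts[group]["died"] + counts[group]["survived"]
    return counts[group]["survived"] / n

p_hat_t = survival_proportion("treatment")  # 14 / 40
p_hat_c = survival_proportion("control")    # 11 / 50
observed_diff = p_hat_t - p_hat_c           # the statistic of interest

print(round(p_hat_t, 2), round(p_hat_c, 2), round(observed_diff, 2))
# prints: 0.35 0.22 0.13
```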

The New Challenge

For a single proportion: We used a spinner to simulate outcomes under the null

\(H_0: p = 0.10\) \(\rightarrow\) set spinner to 10% success
Generate many samples, build null distribution
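
As a reminder of how the spinner approach works, here is a minimal Python sketch. The sample size `N = 50`, the seed, and the function name `spin_sample` are assumptions made only for illustration; the slides give only the null value 0.10.

```python
import random

P_NULL = 0.10   # spinner set to 10% success, as in H0: p = 0.10
N = 50          # hypothetical sample size (not given in the slides)

def spin_sample(rng, n=N, p=P_NULL):
    """Spin n times and return the sample proportion of successes."""
    successes = sum(rng.random() < p for _ in range(n))
    return successes / n

rng = random.Random(115)  # arbitrary seed, only for reproducibility
null_props = [spin_sample(rng) for _ in range(1000)]

# The simulated proportions should pile up around the null value 0.10.
mean_null_prop = sum(null_props) / len(null_props)
print(round(mean_null_prop, 3))
```

The spinner needs a known success probability, which is exactly what is missing in the two-proportion setting.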

For two proportions: We’re testing \(H_0: p_1 - p_2 = 0\)

We don’t know \(p_1\) or \(p_2\), only that they are equal under the null
Can’t build a spinner without knowing the probabilities
Need a different approach: permutation test

The Key Insight

Under the null hypothesis (\(p_T = p_C\)):

The groups have the same survival rate
Whether a patient is in treatment or control doesn’t affect their outcome
The outcome is independent of group assignment

Key idea: If group doesn’t matter, we can shuffle the outcomes!

Permutation: Shuffling Response Values

A permutation randomly reassigns the response values (outcomes):

Group assignments stay fixed (40 treatment, 50 control)
We randomly shuffle which outcome goes with which patient
This simulates “what if the outcome had nothing to do with group”

After shuffling, we calculate the difference in proportions

This gives us ONE value from the null distribution
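
A single permutation might look like the following Python sketch. The patient lists are rebuilt from the counts in the CPR table; the names `diff_in_proportions` and `one_null_value` and the seed are illustrative assumptions.

```python
import random

# Rebuild the 90 patients from the CPR table. Group labels stay fixed.
groups = ["treatment"] * 40 + ["control"] * 50
outcomes = (["survived"] * 14 + ["died"] * 26      # treatment patients
            + ["survived"] * 11 + ["died"] * 39)   # control patients

def diff_in_proportions(groups, outcomes):
    """Survival proportion in treatment minus survival proportion in control."""
    def rate(label):
        in_group = [o for g, o in zip(groups, outcomes) if g == label]
        return in_group.count("survived") / len(in_group)
    return rate("treatment") - rate("control")

rng = random.Random(115)   # arbitrary seed, only for reproducibility
shuffled = outcomes[:]     # copy, so the original data are left untouched
rng.shuffle(shuffled)      # randomly reassign outcomes to patients

# One value from the null distribution.
one_null_value = diff_in_proportions(groups, shuffled)
print(one_null_value)
```

Shuffling changes which patient gets which outcome, but the totals (25 survived, 65 died) never change.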

Permutation Demo

View interactive permutation demo

The demo shows:

Original data with treatment/control groups
Shuffling outcomes between groups (simulating null)
Calculating difference in proportions after each shuffle
Building the null distribution one permutation at a time

Building the Null Distribution

Repeat the permutation many times (1000+)
Each permutation gives a difference in proportions
Distribution of all these differences = null distribution

The null distribution shows what differences we’d expect if there’s truly no effect (i.e., if \(H_0\) is true)
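
Repeating the shuffle can be sketched in Python as below. Outcomes are coded 1 = survived and 0 = died using the table totals; the function name `permuted_diff` and the seed are assumptions for illustration, and the exact values produced depend on the seed.

```python
import random

N_TREATMENT, N_CONTROL = 40, 50
outcomes = [1] * 25 + [0] * 65   # 1 = survived, 0 = died (totals from the table)

def permuted_diff(rng):
    """Shuffle the outcomes once and return the difference in proportions."""
    shuffled = outcomes[:]
    rng.shuffle(shuffled)
    # Group sizes are fixed, so giving the first 40 shuffled outcomes to
    # treatment is equivalent to shuffling outcomes across fixed labels.
    treatment = shuffled[:N_TREATMENT]
    control = shuffled[N_TREATMENT:]
    return sum(treatment) / N_TREATMENT - sum(control) / N_CONTROL

rng = random.Random(115)   # arbitrary seed, only for reproducibility
null_diffs = [permuted_diff(rng) for _ in range(1000)]

# The null distribution should be centered near 0.
mean_null = sum(null_diffs) / len(null_diffs)
print(round(mean_null, 3))
```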

CPR Null Distribution

1000 permutations. Centered at 0 (as expected under null). Shaded region shows the smaller tail used for p-value.

Calculating the P-value

Method: Double the smaller tail

For a two-sided test (\(H_A: p_T - p_C \neq 0\)):

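Count permutations at or to the LEFT of 0.13: 951
Count permutations at or to the RIGHT of 0.13: 117
(Permutations exactly equal to 0.13 are counted in both tails, which is why the two counts sum to more than 1000)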
Smaller tail: 117
P-value = \(2 \times \frac{117}{1000} = 0.234\)
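
The doubling rule can be written as a small Python function. The name `two_sided_p_value`, the tolerance `tol`, and the toy list are illustrative assumptions. Ties with the observed value are counted in both tails here, which is one reading consistent with the two counts above summing to more than 1000.

```python
def two_sided_p_value(null_diffs, observed, tol=1e-9):
    """Double the smaller tail of the null distribution.

    Values equal to `observed` (within `tol`) count in both tails.
    """
    left = sum(d <= observed + tol for d in null_diffs)
    right = sum(d >= observed - tol for d in null_diffs)
    smaller_tail = min(left, right)
    # Doubling can exceed 1 when the observed value is near the center.
    return min(1.0, 2 * smaller_tail / len(null_diffs))

# Toy null distribution (made-up values, NOT the CPR permutations).
toy_null = [-0.2, -0.1, 0.0, 0.0, 0.1, 0.13, 0.2, -0.05, 0.05, 0.3]
print(two_sided_p_value(toy_null, 0.13))   # 3 of 10 at or right of 0.13 -> 0.6

# The arithmetic from the slide, using the reported counts.
print(2 * 117 / 1000)                      # prints 0.234
```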

CPR Conclusion

Results:

Observed difference: \(\hat{p}_T - \hat{p}_C = 0.13\)
P-value \(\approx 0.23\)
Using \(\alpha = 0.05\): Do not reject \(H_0\)

Conclusion: The data do not provide convincing evidence that blood thinners affect survival rate during CPR.

Note: This was a randomized experiment, so we could have drawn a causal conclusion had we found a significant difference.

Why Permutation Works

The permutation test assumes exchangeability under the null:

If the null is true, outcomes are independent of group
Shuffling outcomes simulates this independence
The permuted differences show what’s expected by chance alone
If the observed difference is rare in this distribution \(\rightarrow\) evidence against the null

From Testing to Estimation

The permutation test answers: “Is there a difference?”

But we also want to know: “How big is the difference?”

Solution: Bootstrap confidence interval for \(p_1 - p_2\)

Gives a range of plausible values for the true difference
More informative than just a yes/no answer

Bootstrap Approach

Similar to single proportion bootstrap:

Resample WITH replacement from the original sample
Calculate statistic of interest (now: difference in proportions)
Repeat many times to build the bootstrap distribution
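
These three steps can be sketched in Python as follows. Each patient is stored as a (group, outcome) pair rebuilt from the CPR table, and all 90 patients are resampled together. The names `patients` and `bootstrap_diff` and the seed are illustrative assumptions.

```python
import random

# One (group, outcome) pair per patient; outcome 1 = survived, 0 = died.
patients = ([("treatment", 1)] * 14 + [("treatment", 0)] * 26
            + [("control", 1)] * 11 + [("control", 0)] * 39)

def bootstrap_diff(rng):
    """Resample all 90 patients WITH replacement; return p_T - p_C."""
    sample = rng.choices(patients, k=len(patients))
    treatment = [o for g, o in sample if g == "treatment"]
    control = [o for g, o in sample if g == "control"]
    # Group sizes vary from resample to resample. An empty group is
    # possible in principle but so unlikely here that this sketch ignores it.
    return sum(treatment) / len(treatment) - sum(control) / len(control)

rng = random.Random(115)   # arbitrary seed, only for reproducibility
boot_diffs = [bootstrap_diff(rng) for _ in range(1000)]

# The bootstrap distribution should be centered near the observed 0.13.
mean_boot = sum(boot_diffs) / len(boot_diffs)
print(round(mean_boot, 3))
```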

CPR Bootstrap Demo

View interactive bootstrap demo

The demo shows:

Original 90 patients (circles = treatment, squares = control)
Resampling WITH replacement from entire sample
Group sizes vary in each bootstrap sample
Building the bootstrap distribution

CPR Bootstrap Distribution

1000 bootstrap samples. Centered near the observed difference (0.13), NOT at 0

CPR Confidence Interval

95% Bootstrap Percentile CI:

Find the 2.5th and 97.5th percentiles of the bootstrap distribution:

\[(-0.065, 0.311)\]

Interpretation: We are 95% confident that the true difference in survival rates (treatment - control) is between -0.065 and 0.311.
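
Finding the two percentiles can be sketched with Python's standard `statistics.quantiles`. The function name `percentile_interval_95` is an assumption for illustration, and the placeholder list is evenly spaced made-up data, NOT the CPR bootstrap distribution; in practice the list of bootstrap differences would be passed in.

```python
import statistics

def percentile_interval_95(values):
    """Return the 2.5th and 97.5th percentiles of `values`."""
    # n=40 cuts the data into 40 equal-probability pieces of 2.5% each,
    # so the first and last cut points are the two percentiles we need.
    cuts = statistics.quantiles(values, n=40)
    return cuts[0], cuts[-1]

# Placeholder data: 1000 evenly spaced values from 0.000 to 0.999.
placeholder = [i / 1000 for i in range(1000)]
lower, upper = percentile_interval_95(placeholder)
print(lower, upper)   # close to 0.025 and 0.975 for this placeholder
```

Different software uses slightly different percentile conventions, so the endpoints can differ in the third decimal place.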

CI and Test Agreement

The confidence interval and hypothesis test give consistent results:

For CPR study:

95% CI: (-0.065, 0.311) includes 0
P-value \(\approx 0.23 > 0.05\) \(\rightarrow\) Don’t reject \(H_0\)
Both results indicate that 0 is a plausible value for \(p_T-p_C\)
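
The agreement can be checked directly from the reported numbers. This is only a small illustrative sketch; the variable names are assumptions, and the values are the ones reported for the CPR study above.

```python
ci_lower, ci_upper = -0.065, 0.311   # 95% bootstrap percentile CI
p_value = 0.234                      # permutation p-value
alpha = 0.05                         # significance level

ci_contains_zero = ci_lower <= 0 <= ci_upper
reject_null = p_value < alpha

# Consistent results: 0 is inside the interval and H0 is not rejected.
print(ci_contains_zero, reject_null)   # prints: True False
```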

Permutation vs Bootstrap

	Permutation Test	Bootstrap CI
Purpose	Hypothesis testing	Estimation
Question	Is there a difference?	How big is the difference?
Assumes	Null is true	Nothing about null
Method	Shuffle response values	Resample with replacement
Centered at	0	Observed difference
Result	P-value	Confidence interval

References

Introduction to Modern Statistics (2e) textbook by Mine Çetinkaya-Rundel and Johanna Hardin
Chapter 11
Sections 17.1, 17.2