Comparing Two Proportions

IMS2 Ch. 17
Math 115

Yurk

CPR Study

Revisit the cpr data set we explored in Ch. 14
2 variables
- group: treatment (received blood thinner) or control (did not)
- outcome: died or survived (for at least 24 hours)
90 patients (40 treatment, 50 control, randomly assigned)

Hypotheses:

\(H_0\): Blood thinners do not affect survival rate. \(p_T-p_C = 0\)
\(H_A\): Blood thinners affect survival rate. \(p_T-p_C \neq 0\)

Data:

group	died	survived	total
control	39	11	50
treatment	26	14	40
total	65	25	90

Difference in proportions of “survived”: \[\hat{p}_T-\hat{p}_C=\frac{14}{40}-\frac{11}{50}=0.13\]
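
The arithmetic above can be reproduced directly from the counts in the table. This is a minimal Python sketch; the variable names are illustrative and not part of the course materials.

```python
# Counts of "survived" and group sizes, taken from the data table
survived_treatment, n_treatment = 14, 40
survived_control, n_control = 11, 50

p_hat_T = survived_treatment / n_treatment   # sample proportion, treatment
p_hat_C = survived_control / n_control       # sample proportion, control

# Observed difference in proportions (treatment - control)
obs_diff = p_hat_T - p_hat_C
print(round(obs_diff, 2))
```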

Hypothesis Test Using Random Permutation

1,000 random permutations simulating true null hypothesis
Values of response (outcome) shuffled each time
Calculate difference in proportions for each simulated sample

Null distribution for difference in proportions that survived (treatment - control). Observed difference in proportions indicated by dashed line.

For a two-sided test, count the number of simulated differences that are
- greater than or equal to the observed difference
- less than or equal to the observed difference
Double the smaller count and divide by the number of simulations to get the p-value
p-value = \(2\times 55/1000 = 0.11\)
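
The permutation procedure can be sketched in Python as follows. This is an illustration under stated assumptions, not the code used to produce the slide: the seed is arbitrary, the individual outcomes are rebuilt from the table totals, and exact fractions are used so that ties with the observed difference are counted reliably. The counts depend on the random shuffles and on how ties with the observed difference are handled, so the printed p-value may differ from the one reported above.

```python
import random
from fractions import Fraction

rng = random.Random(115)  # arbitrary seed, used only for reproducibility

# Rebuild the 90 individual outcomes from the table (1 = survived, 0 = died)
n_treatment, n_control = 40, 50
outcomes = [1] * 25 + [0] * 65   # 25 survived, 65 died overall

def diff_in_proportions(values):
    """Proportion survived in treatment minus proportion survived in control.

    The first n_treatment values are treated as the treatment group.
    Fractions keep comparisons with the observed difference exact.
    """
    treated = values[:n_treatment]
    control = values[n_treatment:]
    return Fraction(sum(treated), n_treatment) - Fraction(sum(control), n_control)

observed = Fraction(14, 40) - Fraction(11, 50)   # equals 13/100

n_sims = 1000
sim_diffs = []
for _ in range(n_sims):
    shuffled = outcomes[:]     # copy, then shuffle outcomes across the groups
    rng.shuffle(shuffled)
    sim_diffs.append(diff_in_proportions(shuffled))

# Two-sided p-value: double the smaller tail count, capped at 1
count_ge = sum(d >= observed for d in sim_diffs)
count_le = sum(d <= observed for d in sim_diffs)
p_value = min(1.0, 2 * min(count_ge, count_le) / n_sims)
print(count_ge, count_le, p_value)
```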

Mathematical Model for Difference in Proportions

Sampling distribution of \(\hat{p}_1-\hat{p}_2\)

The sampling distribution of \(\hat{p}_1-\hat{p}_2\) based on samples of size \(n_1\) and \(n_2\) and population proportions \(p_1\) and \(p_2\) will be approximately normal with mean \(p_1-p_2\) and standard error \[SE(\hat{p}_1-\hat{p}_2)=\sqrt{\frac{p_1(1-p_1)}{n_1}+\frac{p_2(1-p_2)}{n_2}}\]
if the following technical conditions are met:

Data are independent within and between the two groups (e.g., observations from two independent random samples or from a randomized experiment)
(success-failure condition) At least 10 expected successes and at least 10 expected failures in each group.

Hypothesis Test Using Normal Approximation

Under the null hypothesis \(p_1=p_2\)
We use the pooled proportion of successes, \(\hat{p}_{pool}\), to approximate this common proportion \[\hat{p}_{pool}=\frac{\text{number of successes}}{\text{number of cases}}=\frac{\hat{p}_1n_1+\hat{p}_2n_2}{n_1+n_2}\]
In the CPR example 25 survived out of 90 total cases, so \(\hat{p}_{pool}=25/90=0.278\)

Checking Conditions for Hypothesis Test

The expected numbers of successes and failures in group 1 are \(n_1\hat{p}_{pool}\) and \(n_1(1-\hat{p}_{pool})\)
In group 2 they are \(n_2\hat{p}_{pool}\) and \(n_2(1-\hat{p}_{pool})\)

In the CPR example we expect

Treatment group
- \(0.278\cdot40=11.1\) successes
- \((1-0.278)\cdot40=28.9\) failures
Control group
- \(0.278\cdot50=13.9\) successes
- \((1-0.278)\cdot50=36.1\) failures

Since there are at least 10 expected successes and failures in each group, a normal approximation of the null distribution is appropriate
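
The pooled proportion and the success-failure check can be written out as a short Python sketch. The names `p_pool`, `expected`, and `condition_met` are illustrative choices, and the threshold of 10 is the one stated in the technical conditions.

```python
# Totals from the data table
survived_total, n_total = 25, 90
group_sizes = {"treatment": 40, "control": 50}

# Pooled proportion of successes ("survived") under the null hypothesis
p_pool = survived_total / n_total

# Expected successes and failures in each group under the null
expected = {}
for group, n in group_sizes.items():
    expected[group] = (n * p_pool, n * (1 - p_pool))
    print(group, round(expected[group][0], 1), round(expected[group][1], 1))

# Success-failure condition: every expected count is at least 10
condition_met = all(min(pair) >= 10 for pair in expected.values())
print(condition_met)
```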

SE for Hypothesis Test Using Normal Approximation

We also use the pooled proportion to approximate the SE \[\begin{array}{rcl}SE(\hat{p}_1-\hat{p}_2) & \approx & \sqrt{\frac{\hat{p}_{pool}(1-\hat{p}_{pool})}{n_1}+\frac{\hat{p}_{pool}(1-\hat{p}_{pool})}{n_2}}\\ & = & \sqrt{\hat{p}_{pool}(1-\hat{p}_{pool})\left(\frac{1}{n_1}+\frac{1}{n_2}\right)}\end{array}\]
For the CPR study \[SE\approx \sqrt{0.278\cdot(1-0.278)\left(\frac{1}{40}+\frac{1}{50}\right)}=0.095\]

Z Score for Two Proportions

The hypothesis test using a normal approximation uses the \(Z\) score as the test statistic \[Z = \frac{(\hat{p}_1-\hat{p}_2) - 0}{\sqrt{\hat{p}_{pool}(1-\hat{p}_{pool})\left(\frac{1}{n_1}+\frac{1}{n_2}\right)}}\]
Note that the denominator is the SE estimate we saw in the previous slide
When the conditions are met and the null hypothesis is true, \(Z\) will have approximately a standard normal distribution (\(N(0,1)\))

For the CPR example the Z score is \[Z=\frac{(\hat{p}_T-\hat{p}_C)-0}{SE}=\frac{0.13}{0.095}=1.37\]
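
As a check on the arithmetic, the pooled standard error and the Z score can be computed in a few lines of Python. This is a sketch with illustrative variable names; it uses unrounded intermediate values, so the results agree with the slide values only after rounding.

```python
import math

n1, n2 = 40, 50                      # treatment, control group sizes
p_hat_1, p_hat_2 = 14 / 40, 11 / 50  # sample proportions that survived
p_pool = 25 / 90                     # pooled proportion under the null

# Pooled standard error of the difference in proportions
se_pooled = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))

# Z score: observed difference minus the null value (0), divided by the SE
z = ((p_hat_1 - p_hat_2) - 0) / se_pooled
print(round(se_pooled, 3), round(z, 2))
```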

P-value

Standard normal curve with shaded area corresponding to p-value

The 2-sided p-value is twice the area under the standard normal curve to the right of \(Z = 1.37\)
p-value = 0.171
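
The tail area can be obtained from the standard normal CDF. A minimal sketch using `statistics.NormalDist` from the Python standard library, taking the rounded Z score of 1.37 as input:

```python
from statistics import NormalDist

z = 1.37  # Z score from the CPR example

# Area under the standard normal curve to the right of |z|
right_tail = 1 - NormalDist().cdf(abs(z))

# Two-sided p-value: double the one-tail area
p_value = 2 * right_tail
print(round(p_value, 3))
```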

Compare this p-value (0.171) to the one we calculated using random permutation (0.11)

Bootstrap Percentile Confidence Interval

We can calculate a bootstrap percentile 95% confidence interval in much the same way that we did for a single proportion in Ch. 12
We think about the two samples (groups) as being our best approximation of the population and resample with replacement (bootstrap) from each group (\(n_1\) from group 1, \(n_2\) from group 2)
The bootstrap proportions \(\hat{p}_{1,boot}\) and \(\hat{p}_{2,boot}\) will tend to be centered on \(\hat{p}_1\) and \(\hat{p}_2\) but will vary between replicates

Calculate difference in bootstrap proportions \(\hat{p}_{1,boot}-\hat{p}_{2,boot}\) for each of a large number of replicates (at least 1,000)
95% CI is given by 2.5% to 97.5% percentiles

Let’s compute 1,000 differences in bootstrapped proportions using the CPR data.

The 95% bootstrap percentile confidence interval for the difference in survival rates (treatment - control) is between -0.0416 and 0.330.
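
The bootstrap steps can be sketched in Python as follows. This is an illustration, not the code behind the interval reported above: the seed is arbitrary, the individual outcomes are rebuilt from the table counts, and the percentiles come from `statistics.quantiles` with its default method. Because the resampling is random, the endpoints will vary somewhat from run to run and from the reported interval.

```python
import random
from statistics import quantiles

rng = random.Random(115)  # arbitrary seed, used only for reproducibility

# Rebuild each group's outcomes from the table (1 = survived, 0 = died)
treatment = [1] * 14 + [0] * 26   # n_1 = 40
control = [1] * 11 + [0] * 39     # n_2 = 50

def boot_proportion(sample):
    """Resample with replacement (same size) and return the proportion of 1s."""
    resample = rng.choices(sample, k=len(sample))
    return sum(resample) / len(resample)

# Difference in bootstrap proportions (treatment - control) for each replicate
n_reps = 1000
boot_diffs = [boot_proportion(treatment) - boot_proportion(control)
              for _ in range(n_reps)]

# quantiles(..., n=40) returns 39 cut points at 2.5%, 5%, ..., 97.5%
cuts = quantiles(boot_diffs, n=40)
ci_lower, ci_upper = cuts[0], cuts[-1]
print(round(ci_lower, 3), round(ci_upper, 3))
```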

Confidence Interval Using Normal Approximation

We can also use a normal approximation to calculate a confidence interval if the technical conditions are met
In this case, we use \(\hat{p}_1\) and \(\hat{p}_2\) as the best approximations of \(p_1\) and \(p_2\)

Checking Conditions for Using Normal Approximation for CI

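In this case, the expected numbers of successes and failures in each group are the same as the counts of successes and failures in the samples
The CPR data satisfy the success-failure condition: 14 successes and 26 failures in the treatment group, 11 successes and 39 failures in the control group, all at least 10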

SE for Using Normal Approximation for CI

The standard error approximation is \[SE\approx\sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1}+\frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}\]
For the CPR study \[SE\approx\sqrt{\frac{0.35\cdot(1-0.35)}{40}+\frac{0.22\cdot(1-0.22)}{50}}=0.0955\]

95% Confidence Interval

Using the normal approximation, the 95% confidence interval for the difference in survival rates is \[0.13\pm 1.96\cdot 0.0955\]
Thus, the 95% confidence interval is between -0.057 and 0.317
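
The interval can be reproduced with a short Python sketch. Variable names are illustrative; the critical value is taken from `statistics.NormalDist().inv_cdf(0.975)`, which is approximately 1.96, and the success-failure check uses the observed counts as described for the confidence interval.

```python
import math
from statistics import NormalDist

x1, n1 = 14, 40   # treatment: survivors, group size
x2, n2 = 11, 50   # control: survivors, group size
p_hat_1, p_hat_2 = x1 / n1, x2 / n2

# Success-failure check for the CI uses the observed counts
condition_met = min(x1, n1 - x1, x2, n2 - x2) >= 10

# Unpooled standard error of the difference in proportions
se = math.sqrt(p_hat_1 * (1 - p_hat_1) / n1 + p_hat_2 * (1 - p_hat_2) / n2)

# Critical value for 95% confidence (about 1.96)
z_star = NormalDist().inv_cdf(0.975)

diff = p_hat_1 - p_hat_2
ci = (diff - z_star * se, diff + z_star * se)
print(condition_met, round(se, 4), round(ci[0], 3), round(ci[1], 3))
```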

Comparison of 95% Confidence Intervals

Type	Interval
Bootstrap Percentile	(-0.042, 0.330)
Normal Approximation	(-0.057, 0.317)