The opportunity_cost data set is available here.
All students were given the following statement:
“Imagine that you have been saving some extra money on the side to make some purchases, and on your most recent visit to the video store you come across a special sale on a new video. This video is one with your favorite actor or actress, and your favorite type of movie (such as a comedy, drama, thriller, etc.). This particular video that you are considering is one you have been thinking about buying for a long time. It is available for a special sale price of $14.99. What would you do in this situation? Please circle one of the options below.”
The control and treatment groups were given different options.
Control group: “(A) Buy this entertaining video. (B) Not buy this entertaining video.”
Treatment group: “(A) Buy this entertaining video. (B) Not buy this entertaining video. Keep the $14.99 for other purchases.”
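In words: the null hypothesis is that reminding students they can keep the money for other purchases has no effect on their decision; the alternative hypothesis is that the reminder makes students more likely not to buy the video.
In symbols: \(H_0: p_T - p_C = 0\) versus \(H_A: p_T - p_C > 0\), where \(p_T\) and \(p_C\) are the proportions of students who would not buy the video in the treatment and control groups.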
Histogram of 1,000 differences in randomized proportions; the test statistic (0.2) is the dashed vertical line.
Central Limit Theorem for proportions
The sample proportion (or difference in proportions) will follow a bell-shaped curve called the normal distribution if the following technical conditions are met:
Two examples of normal distributions with different means and standard deviations
How good is the approximation?
Null distribution from randomly permuted data and normal approximation
We can find the p-value by calculating the area under the curve in the same region (\(\ge 0.2\))
The shaded area is the probability that the value is less than 0.1
The probability that the value is “less than 0.1” is 0.895
The probability that the value is “at least 0.1” is \(1 - 0.895 = 0.105\)
We can standardize the observed difference in proportions in the opportunity costs problem \[Z = \frac{0.2 - 0}{0.0791} = 2.528\]
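As a minimal sketch, the standardization and the matching upper-tail area can be computed with `statistics.NormalDist` from Python's standard library. The variable names are illustrative, and the one-sided p-value assumes the region of interest is the upper tail (\(\ge 0.2\)), as in the area-under-the-curve step above.

```python
from statistics import NormalDist

observed_diff = 0.2   # observed difference in proportions (from the slides)
null_value = 0.0      # difference in proportions under the null hypothesis
se = 0.0791           # standard error (from the slides)

# Standardize: how many standard errors is the observation from the null value?
z = (observed_diff - null_value) / se

# Upper-tail area under the standard normal curve, P(Z >= z)
p_value = 1 - NormalDist().cdf(z)

print(f"Z = {z:.3f}")
print(f"one-sided p-value = {p_value:.4f}")
```

Working on the standardized scale gives the same area as working with a normal curve centered at 0 with standard deviation 0.0791.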
Standard normal distribution
Example: SAT scores follow a nearly normal distribution with a mean of 1500 points and a standard deviation of 300 points. ACT scores also follow a nearly normal distribution with a mean of 21 points and a standard deviation of 5 points. Suppose Nel scored 1800 points on their SAT and Sian scored 24 points on their ACT. Who performed better?
Nel’s z-score is \[Z_{Nel} = \frac{1800-1500}{300}=1\] Sian’s z-score is \[Z_{Sian} = \frac{24-21}{5}=0.6\] Nel scored 1 standard deviation above the SAT mean, while Sian scored 0.6 standard deviations above the ACT mean, so Nel performed better relative to other test takers.
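The comparison can be sketched in a few lines of Python. The helper `z_score` is a hypothetical name introduced here for illustration; the means, standard deviations, and scores are the ones given in the example.

```python
def z_score(value, mean, sd):
    """Number of standard deviations `value` lies above (+) or below (-) `mean`."""
    return (value - mean) / sd

z_nel = z_score(1800, mean=1500, sd=300)   # SAT score
z_sian = z_score(24, mean=21, sd=5)        # ACT score

# The larger z-score is the better performance relative to other test takers.
better = "Nel" if z_nel > z_sian else "Sian"
print(f"Z_Nel = {z_nel}, Z_Sian = {z_sian}, better relative performance: {better}")
```

Z-scores put the two tests on a common scale, which is why scores with different units can be compared at all.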
If the sampling distribution is reasonably normal, we can construct a 95% confidence interval from a point estimate using the 68-95-99.7 rule
95% of the values will fall within 1.96 SD of the true value
95% confidence interval: \[\textrm{point estimate}\pm1.96\times SE\]
0.2 is a point estimate of the difference in proportions of students who would not buy a video (\(p_T-p_C\))
SE = 0.0791
A 95% confidence interval for the difference in proportions is \[0.2 \pm 1.96\times0.0791 = 0.2 \pm 0.155\]
The quantity 0.155 is called the margin of error
We can also write the 95% confidence interval as \[(0.045, 0.355)\]
We are 95% confident that, in the population, the probability of not buying the video when the opportunity cost is highlighted is 4.5 to 35.5 percentage points higher than the probability of not buying it when the opportunity cost is not highlighted.
Note that the value of the null hypothesis (i.e., \(0\)) is not in the confidence interval, which is consistent with the fact that we rejected the null hypothesis
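The interval arithmetic above can be reproduced with a short sketch. The variable names are illustrative; the point estimate, standard error, and critical value are the ones quoted in the slides.

```python
point_estimate = 0.2   # observed difference in proportions
se = 0.0791            # standard error
z_star = 1.96          # critical value for 95% confidence

margin_of_error = z_star * se
lower = point_estimate - margin_of_error
upper = point_estimate + margin_of_error

# Check whether the null value (0) falls inside the interval.
null_in_interval = lower <= 0 <= upper

print(f"margin of error = {margin_of_error:.3f}")
print(f"95% CI = ({lower:.3f}, {upper:.3f})")
print(f"null value inside interval: {null_in_interval}")
```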
| Confidence Level | Critical Value (\(z^{\ast}\)) |
|---|---|
| 90% | 1.645 |
| 95% | 1.96 |
| 99% | 2.576 |
For a 99% CI, we want the cutoff \(z^{\ast}\) such that 99% of the values fall between \(-z^{\ast}\) and \(z^{\ast}\), leaving 0.5% in the right tail and 0.5% in the left tail; this is the 99.5th percentile of the standard normal distribution
Thus, a 99% CI for the difference in proportions is \[0.2\pm 2.576 \times 0.0791=0.2 \pm 0.204\]
Note that the margin of error is larger with a higher confidence level
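The critical values in the table can be recovered from the standard normal quantile function. This is a sketch using `statistics.NormalDist.inv_cdf` from the standard library; `critical_value` is a hypothetical helper name, and the standard error 0.0791 is the one used throughout these slides.

```python
from statistics import NormalDist

def critical_value(confidence_level):
    """z* such that the central `confidence_level` of N(0, 1) lies in (-z*, z*)."""
    tail = (1 - confidence_level) / 2          # area left in each tail
    return NormalDist().inv_cdf(1 - tail)      # percentile with `tail` above it

se = 0.0791
for level in (0.90, 0.95, 0.99):
    z_star = critical_value(level)
    print(f"{level:.0%}: z* = {z_star:.3f}, margin of error = {z_star * se:.3f}")
```

Because the critical value grows with the confidence level while the standard error stays fixed, the margin of error grows with it.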
The figure below shows twenty-five 95% confidence intervals for a proportion that were constructed from 25 different datasets that all came from the same population, where the true proportion was \(p = 0.3\).
Of these 25 confidence intervals, 24 contain the true value, but 1 (4% of the intervals, close to the expected 5%) happened not to include it
From IMS2 Figure 13.11.
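The idea behind the figure can be explored with a small simulation. This is a sketch under stated assumptions, not a reproduction of the figure: the sample size `n=100` and the random seed are arbitrary choices made here, the function name `simulate_coverage` is hypothetical, and the interval uses the usual standard error for a single proportion, \(\sqrt{\hat{p}(1-\hat{p})/n}\).

```python
import random
from statistics import NormalDist

def simulate_coverage(p_true=0.3, n=100, n_datasets=25, confidence=0.95, seed=1):
    """Fraction of simulated confidence intervals that contain p_true."""
    rng = random.Random(seed)                       # seeded for repeatability
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    covered = 0
    for _ in range(n_datasets):
        # Draw one dataset of n yes/no observations with success probability p_true.
        successes = sum(rng.random() < p_true for _ in range(n))
        p_hat = successes / n
        se = (p_hat * (1 - p_hat) / n) ** 0.5       # SE of a sample proportion
        lower, upper = p_hat - z_star * se, p_hat + z_star * se
        covered += lower <= p_true <= upper
    return covered / n_datasets

print("coverage with 25 datasets:  ", simulate_coverage())
print("coverage with 2000 datasets:", simulate_coverage(n_datasets=2000))
```

With only 25 datasets the share of intervals that miss can easily differ from 5%; with many datasets the coverage settles close to the nominal 95%.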