
We will estimate the difference in mean birth weights, \(\mu_n-\mu_s\) (nonsmokers minus smokers), using a confidence interval.
The `births14` dataset:
- `habit` is smoking habit (“smoker” or “nonsmoker”)
- `weight` is birth weight in pounds

| habit | n | mean (lb) | sd (lb) |
|---|---|---|---|
| nonsmoker | 867 | 7.27 | 1.23 |
| smoker | 114 | 6.68 | 1.60 |
The observed difference in means is \[\begin{array}{lcr}\bar{x}_n-\bar{x}_s &=& 7.27-6.68\\ &=& 0.59\end{array}\]
Histogram of differences in means (null distribution) calculated from 1,000 random permutations of birth weights. Observed difference is 0.59.
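The permutation null distribution in the caption can be sketched in Python with only the standard library. This is a minimal illustration, not the code behind the figure: the function names and the toy numbers in the usage lines are hypothetical, and the p-value uses the common (count + 1)/(n_perm + 1) convention.

```python
import random
from statistics import mean


def permutation_null(x, y, n_perm=1000, seed=1):
    """Differences mean(x) - mean(y) after randomly reassigning the group labels."""
    rng = random.Random(seed)
    pooled = list(x) + list(y)
    n_x = len(x)
    diffs = []
    for _ in range(n_perm):
        rng.shuffle(pooled)  # shuffle all values, then split into groups of the original sizes
        diffs.append(mean(pooled[:n_x]) - mean(pooled[n_x:]))
    return diffs


def two_sided_p_value(observed, null_diffs):
    """Share of permuted differences at least as extreme as the observed one (+1 correction)."""
    extreme = sum(abs(d) >= abs(observed) for d in null_diffs)
    return (extreme + 1) / (len(null_diffs) + 1)


# Hypothetical toy weights in pounds (NOT the births14 data)
nonsmoker = [7.5, 8.1, 7.2, 6.9, 7.8]
smoker = [6.2, 6.8, 5.9, 7.0, 6.4]
observed = mean(nonsmoker) - mean(smoker)
null = permutation_null(nonsmoker, smoker, n_perm=1000, seed=42)
print(round(observed, 2), two_sided_p_value(observed, null))
```

Shuffling the pooled values and re-splitting them at the original group sizes is what "random permutations of birth weights" means: under the null hypothesis the habit label carries no information about weight.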
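Note: When the null hypothesis is true and the conditions below are met, the \(T\) score has a \(t\)-distribution with \(df=n_1+n_2-2\) degrees of freedom.
- Observations are independent within and between the two groups.
- Each population is approximately normal (or both samples are large).
- The two populations have equal variances.
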
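The \(df=n_1+n_2-2\) in the note belongs to the pooled, equal-variance test. As a minimal standard-library sketch (the function names are illustrative, not from any package), the code below contrasts the pooled standard error and df with Welch's unpooled version, which drops the equal-variance condition and approximates df with the Welch-Satterthwaite formula. Welch's version is what R's `t.test` uses by default. The inputs are the `births14` summary statistics from the table, with nonsmokers as group 1.

```python
import math


def pooled_se_df(n1, s1, n2, s2):
    """SE and df assuming equal population variances (pooled test)."""
    sp = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
    se = sp * math.sqrt(1 / n1 + 1 / n2)
    return se, n1 + n2 - 2


def welch_se_df(n1, s1, n2, s2):
    """SE and Welch-Satterthwaite df without assuming equal variances."""
    a = s1**2 / n1
    b = s2**2 / n2
    se = math.sqrt(a + b)
    df = (a + b) ** 2 / (a**2 / (n1 - 1) + b**2 / (n2 - 1))
    return se, df


# births14 summary statistics: nonsmoker = group 1, smoker = group 2
print(pooled_se_df(867, 1.23, 114, 1.60))  # SE about 0.127, df = 979
print(welch_se_df(867, 1.23, 114, 1.60))   # SE about 0.156, df about 131
```

Here the two standard errors differ noticeably because the smaller smoker group has the larger SD, which is the situation where the equal-variance assumption matters most.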
In R:
- Use the `t_test` function in the infer package to calculate a p-value.
- With `var.equal = TRUE`, these calculations use the equal variance (pooled) assumption.
- `var.equal = FALSE` (the default) relaxes the equal variance assumption (Welch's t-test).
- Use the `t_test` function to calculate a confidence interval.

| Group | n | Sample mean (cm) | Sample SD (cm) |
|---|---|---|---|
| Setosa | 30 | 5.006 | 0.3525 |
| Versicolor | 70 | 5.936 | 0.5162 |
Research question:
Is the true mean sepal length for setosa different from the true mean sepal length for versicolor?
The pooled sample standard deviation is \[s_p = \sqrt{\frac{(n_1-1)s_1^2+(n_2-1)s_2^2}{n_1+n_2-2}}\]
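To see the formula at work, here is a small Python sketch (standard library only; `pooled_sd` is an illustrative name) that plugs in the setosa and versicolor summary statistics from the table, treating setosa as group 1.

```python
import math


def pooled_sd(n1, s1, n2, s2):
    """Pooled SD: square root of the df-weighted average of the two sample variances."""
    return math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))


# Setosa (group 1) and versicolor (group 2) summary statistics from the table
sp = pooled_sd(30, 0.3525, 70, 0.5162)
print(round(sp, 4))  # about 0.4737
```

The result lies between the two sample SDs and closer to versicolor's, because that group contributes 69 of the 98 degrees of freedom.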
The \(T\) statistic is \[T=\frac{(\bar{x}_1-\bar{x}_2)-0}{s_p\sqrt{\frac{1}{n_1}+\frac{1}{n_2}}}\]
The degrees of freedom are \(df=n_1+n_2-2\).
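The confidence interval for the difference in means is \[(\bar{x}_1-\bar{x}_2)\pm t^{\ast}_{df}\times SE,\] where \(SE = s_p\sqrt{\frac{1}{n_1}+\frac{1}{n_2}}\) and \(t^{\ast}_{df}\) is the critical value of the \(t\)-distribution with \(df\) degrees of freedom for the chosen confidence level.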
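Putting the pooled SD, the \(T\) statistic, the degrees of freedom, and the interval together gives the minimal Python sketch below for setosa (group 1) versus versicolor (group 2). Python's standard library has no \(t\) quantile function, so the critical value is passed in by hand. The value 1.9845 is an approximation of the 97.5th percentile of the \(t\)-distribution with 98 df (in R, `qt(0.975, 98)`). Function and argument names are illustrative.

```python
import math


def pooled_t_interval(x1, s1, n1, x2, s2, n2, t_star):
    """Pooled two-sample T statistic, df, and CI for mu1 - mu2 from summary statistics.

    t_star is the critical value t*_df, supplied by the caller (e.g. from a t table).
    """
    df = n1 + n2 - 2
    sp = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df)
    se = sp * math.sqrt(1 / n1 + 1 / n2)
    diff = x1 - x2
    t_stat = (diff - 0) / se  # null value of mu1 - mu2 is 0
    ci = (diff - t_star * se, diff + t_star * se)
    return t_stat, df, ci


# Setosa = group 1, versicolor = group 2; t* (95%, df = 98) is approximately 1.9845
t_stat, df, ci = pooled_t_interval(5.006, 0.3525, 30, 5.936, 0.5162, 70, t_star=1.9845)
print(round(t_stat, 2), df, [round(v, 3) for v in ci])
```

With these numbers \(T\) is about \(-9.0\) on 98 df and the 95% interval is roughly \((-1.135, -0.725)\) cm. Since the interval lies entirely below 0, it is consistent with a smaller true mean sepal length for setosa, provided the pooled-test conditions hold.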