
The births14 dataset is available here. The variables used are:
- habit: smoking habit ("smoker" or "nonsmoker")
- weight: birth weight in pounds
| habit | n | mean (lb) | sd (lb) |
|---|---|---|---|
| nonsmoker | 867 | 7.27 | 1.23 |
| smoker | 114 | 6.68 | 1.60 |
The test statistic is the observed difference in means, where the subscripts \(n\) and \(s\) denote nonsmokers and smokers: \[\boxed{\begin{array}{lcr}\bar{x}_n-\bar{x}_s &=& 7.27-6.68\\ &=& 0.59\end{array}}\]
Histogram of differences in means (null distribution) calculated from 1,000 random permutations of birth weights. Observed difference is 0.59.
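The permutation procedure behind this histogram can be sketched in a few lines. This is a minimal sketch, assuming the individual weights of each group are available as Python lists; the function names `permutation_null` and `permutation_p_value` and the toy weights are hypothetical (the toy values are not births14 data), and the seed is fixed only so the output is reproducible.

```python
import random
from statistics import mean


def permutation_null(group_a, group_b, n_perm=1000, seed=1):
    """Differences in means after randomly reassigning values to the two groups."""
    rng = random.Random(seed)  # fixed seed only for reproducibility
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    diffs = []
    for _ in range(n_perm):
        rng.shuffle(pooled)
        # The first n_a shuffled values play group A; the rest play group B.
        diffs.append(mean(pooled[:n_a]) - mean(pooled[n_a:]))
    return diffs


def permutation_p_value(group_a, group_b, n_perm=1000, seed=1):
    """Observed difference in means and its two-sided permutation p-value."""
    observed = mean(group_a) - mean(group_b)
    diffs = permutation_null(group_a, group_b, n_perm, seed)
    # Share of permuted differences at least as far from 0 as the observed one
    extreme = sum(1 for d in diffs if abs(d) >= abs(observed))
    return observed, extreme / n_perm


# Hypothetical toy weights in pounds (NOT the births14 data)
nonsmoker = [7.9, 7.1, 8.2, 6.8, 7.5, 7.7]
smoker = [6.2, 6.9, 5.8, 7.0]
observed, p_value = permutation_p_value(nonsmoker, smoker)
print(f"observed difference = {observed:.2f}, p-value = {p_value:.3f}")
```

With the real data, the two lists would hold the 867 nonsmoker and 114 smoker weights, and the 1,000 values in `diffs` correspond to the histogram.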
First, we compute the pooled sample standard deviation, taking nonsmokers as group 1 and smokers as group 2: \[s_p = \sqrt{\frac{(n_1-1)s_1^2+(n_2-1)s_2^2}{n_1+n_2-2}}\]
The pooled sample standard deviation in birth weights is \[\begin{array}{rcl} s_p &=& \sqrt{\frac{(867-1)\cdot 1.23^2+(114-1)\cdot 1.60^2}{867+114-2}}\\ &=& 1.28\end{array}\]
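The same arithmetic can be checked in code. A minimal sketch; the helper name `pooled_sd` is hypothetical, and the inputs are the summary statistics from the table.

```python
from math import sqrt


def pooled_sd(n1, s1, n2, s2):
    """Pooled sample SD of two groups (assumes equal population variances)."""
    return sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))


# births14 summary statistics: nonsmokers (group 1), smokers (group 2)
sp = pooled_sd(867, 1.23, 114, 1.60)
print(round(sp, 2))  # 1.28
```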
The \(T\) statistic is \[T=\frac{(\bar{x}_1-\bar{x}_2)-0}{s_p\sqrt{\frac{1}{n_1}+\frac{1}{n_2}}}\]
For the birth weight example, the value is \[T=\frac{0.59-0}{1.28\cdot\sqrt{\frac{1}{867}+\frac{1}{114}}} = 4.63\]
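A short sketch of the same computation; `pooled_t` is a hypothetical helper, and it uses the rounded \(s_p = 1.28\) from above so the result matches the hand calculation.

```python
from math import sqrt


def pooled_t(xbar1, xbar2, sp, n1, n2):
    """Pooled two-sample T statistic for H0: mu1 - mu2 = 0."""
    se = sp * sqrt(1 / n1 + 1 / n2)  # standard error of the difference
    return (xbar1 - xbar2 - 0) / se


t_stat = pooled_t(7.27, 6.68, 1.28, 867, 114)
print(round(t_stat, 2))  # 4.63
```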
Note
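When the null hypothesis is true and the technical conditions are met (independent observations, approximately normal populations or large samples, and equal population variances), the \(T\) score has a \(t\)-distribution with \(df=n_1+n_2-2\) degrees of freedom.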
The \(t\)-distribution with 979 degrees of freedom. The observed \(T\)-statistic is 4.63. The p-value is the total area to the left of -4.63 or to the right of 4.63.
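The shaded two-sided area can be approximated without a statistics library. This is a minimal sketch, assuming only the standard library: it integrates the \(t\) density with Simpson's rule over \([0, |T|]\) and uses symmetry, \(p = 1 - 2\int_0^{|T|} f(x)\,dx\). The names `t_pdf` and `t_two_sided_p` are hypothetical; in practice a library routine such as SciPy's `scipy.stats.t.sf` (as `2 * t.sf(abs(T), df)`) would normally be used.

```python
from math import exp, lgamma, log, pi


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_c = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * log(df * pi)
    return exp(log_c - (df + 1) / 2 * log(1 + x * x / df))


def t_two_sided_p(t_obs, df, steps=2000):
    """Area to the left of -|t_obs| plus area to the right of |t_obs|.

    Simpson's rule (steps must be even) approximates the area from 0 to |t_obs|;
    by symmetry the two tails together are 1 - 2 * that area.
    """
    b = abs(t_obs)
    h = b / steps
    total = t_pdf(0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 1 - 2 * (total * h / 3)


p = t_two_sided_p(4.63, 979)
print(f"p-value = {p:.2e}")
```

For the birth weight data the p-value is far below 0.05.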
If the technical conditions are met, including the equal variance assumption, then we can also use the \(t\)-distribution to construct a confidence interval for the difference in means.
We can calculate a confidence interval for the difference in means as \[(\bar{x}_1-\bar{x}_2)\pm t^{\ast}_{df}\times SE\]
Note that the standard error, \(SE = s_p\sqrt{\frac{1}{n_1}+\frac{1}{n_2}}\), is the same quantity as the denominator of the \(T\) statistic.
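The interval for the birth weight data can be sketched as follows. The notes do not fix a confidence level, so 95% is an assumption here. The critical value \(t^{\ast}_{df}\) is found by bisection on the central area of the \(t\)-distribution (computed with Simpson's rule), which is a standard-library stand-in for a quantile function such as SciPy's `scipy.stats.t.ppf`. The helper names `central_area`, `t_critical`, and `pooled_ci` are hypothetical.

```python
from math import exp, lgamma, log, pi, sqrt


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_c = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * log(df * pi)
    return exp(log_c - (df + 1) / 2 * log(1 + x * x / df))


def central_area(b, df, steps=2000):
    """P(-b < T < b) for T ~ t(df), via Simpson's rule on [0, b] and symmetry."""
    h = b / steps
    total = t_pdf(0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 2 * total * h / 3


def t_critical(conf, df):
    """t* whose central area equals conf, by bisection (assumes t* < 100)."""
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if central_area(mid, df) < conf:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def pooled_ci(xbar1, xbar2, s1, s2, n1, n2, conf=0.95):
    """Pooled-variance confidence interval for mu1 - mu2."""
    df = n1 + n2 - 2
    sp = sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df)
    se = sp * sqrt(1 / n1 + 1 / n2)  # same SE as in the T statistic
    margin = t_critical(conf, df) * se
    diff = xbar1 - xbar2
    return diff - margin, diff + margin


ci_low, ci_high = pooled_ci(7.27, 6.68, 1.23, 1.60, 867, 114)
print(f"95% CI: ({ci_low:.2f}, {ci_high:.2f})")  # 95% CI: (0.34, 0.84)
```

Here \(s_p\) is left unrounded, which shifts the endpoints only in the third decimal place.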
| Species | \(n\) | Sample mean sepal length (cm) | Sample SD (cm) |
|---|---|---|---|
| Setosa | 30 | 5.006 | 0.3525 |
| Versicolor | 70 | 5.936 | 0.5162 |
Research question:
Is the true mean sepal length for setosa different from the true mean sepal length for versicolor?
The pooled sample standard deviation is \[s_p = \sqrt{\frac{(n_1-1)s_1^2+(n_2-1)s_2^2}{n_1+n_2-2}}\]
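As a worked instance, taking setosa as group 1 and versicolor as group 2 and plugging in the values from the iris table:
\[\begin{array}{rcl} s_p &=& \sqrt{\frac{(30-1)\cdot 0.3525^2+(70-1)\cdot 0.5162^2}{30+70-2}}\\ &=& 0.4737\end{array}\]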
The \(T\) statistic is \[T=\frac{(\bar{x}_1-\bar{x}_2)-0}{s_p\sqrt{\frac{1}{n_1}+\frac{1}{n_2}}}\]
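With setosa as group 1, versicolor as group 2, and \(s_p = 0.4737\) from the iris table values, the statistic works out to
\[T=\frac{(5.006-5.936)-0}{0.4737\cdot\sqrt{\frac{1}{30}+\frac{1}{70}}} = -9.00\]
The negative sign reflects that setosa has the smaller sample mean.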
The degrees of freedom are \(df=n_1+n_2-2\).
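For the iris samples this gives
\[df = 30+70-2 = 98\]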
The confidence interval for the difference in means is \[(\bar{x}_1-\bar{x}_2)\pm t^{\ast}_{df}\times SE\]
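As a worked instance for the iris data (setosa minus versicolor), assuming a 95% confidence level, the critical value is \(t^{\ast}_{98}\approx 1.984\) and \(SE = 0.4737\cdot\sqrt{\frac{1}{30}+\frac{1}{70}}\approx 0.1034\), so
\[\begin{array}{rcl}(5.006-5.936)\pm 1.984\times 0.1034 &=& -0.930\pm 0.205\\ &=& (-1.135,\ -0.725)\end{array}\]
The interval does not contain 0, which is consistent with a difference between the true mean sepal lengths.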