
run17 dataset (from the cherryblossom package); time is the finishing time in minutes.
| n | mean | sd | min | max |
|---|---|---|---|---|
| 100 | 99.02 | 17.93 | 53.27 | 139.07 |
Here are the original running times with 5 random bootstrap resamples (each drawn with replacement from the original sample).
Histogram showing 1,000 bootstrapped means.
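The resampling behind a histogram of bootstrapped means can be sketched in a few lines. This is a minimal Python illustration using only the standard library, not the code that produced the figure. The `times` list is simulated placeholder data (the raw run17 sample is not reproduced here), and `bootstrap_means` is a hypothetical helper name.

```python
import random
import statistics

def bootstrap_means(sample, n_boot=1000, seed=1):
    """Return n_boot means, each computed from a resample drawn with replacement."""
    rng = random.Random(seed)
    n = len(sample)
    means = []
    for _ in range(n_boot):
        resample = rng.choices(sample, k=n)  # same size as the original sample
        means.append(statistics.mean(resample))
    return means

# Placeholder data: 100 simulated finishing times in minutes, NOT the real run17 sample.
data_rng = random.Random(0)
times = [data_rng.gauss(99, 18) for _ in range(100)]

boot = bootstrap_means(times)
print(len(boot), round(statistics.mean(boot), 2), round(statistics.stdev(boot), 2))
```

The standard deviation of `boot` is the bootstrap estimate of the standard error of the sample mean.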
Central Limit Theorem for Sample Mean
When the following conditions are met, the sampling distribution of \(\bar{x}\) for samples of size \(n\) from a population with mean \(\mu\) and standard deviation \(\sigma\) will be approximately normal with mean \(\mu\) and standard error \[SE=\frac{\sigma}{\sqrt{n}}\]
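A small simulation can make the standard error formula concrete. The sketch below assumes a skewed Exponential(1) population (mean 1, standard deviation 1), chosen only for illustration; it is unrelated to the run17 data. It compares the spread of many simulated sample means with \(\sigma/\sqrt{n}\).

```python
import math
import random
import statistics

rng = random.Random(42)
n = 100          # sample size
reps = 2000      # number of simulated samples
sigma = 1.0      # sd of the assumed Exponential(1) population

# Draw `reps` samples of size n and record each sample mean.
sample_means = [
    statistics.mean([rng.expovariate(1.0) for _ in range(n)])
    for _ in range(reps)
]

theoretical_se = sigma / math.sqrt(n)           # sigma / sqrt(n) = 0.1
empirical_se = statistics.stdev(sample_means)   # spread of the simulated means
print(theoretical_se, round(empirical_se, 3))
```

The two printed values should be close, even though the population itself is far from normal.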
We can use this rule of thumb for the normality check:
Mathematical Model for \(T\)
The \(T\) statistic (\(T\) score) will have a \(t\)-distribution with \(df=n-1\) degrees of freedom if the following conditions are met:
Comparison of normal distribution and \(t\)-distributions with different degrees of freedom (IMS1 Figure 19.8).
| Type | 95% CI |
|---|---|
| One sample \(t\)-interval | (95.46, 102.56) |
| Bootstrap SE | (95.51, 102.49) |
| Bootstrap percentile | (95.53, 102.55) |
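The two bootstrap intervals in the table are built differently from the same set of bootstrapped means. The sketch below shows one way to compute each. It runs on simulated placeholder data, so it will not reproduce the table values. The function names are hypothetical, and the multiplier 1.96 is an assumption, since the source does not state which multiplier was used.

```python
import random
import statistics

def bootstrap_se_interval(point_estimate, boot_means, multiplier=1.96):
    """Point estimate +/- multiplier * (sd of the bootstrapped means)."""
    se = statistics.stdev(boot_means)
    return (point_estimate - multiplier * se, point_estimate + multiplier * se)

def bootstrap_percentile_interval(boot_means, level=0.95):
    """Middle `level` share of the sorted bootstrapped means (simple index rule)."""
    ordered = sorted(boot_means)
    alpha = (1 - level) / 2
    lo_idx = round(alpha * len(ordered))
    hi_idx = round((1 - alpha) * len(ordered)) - 1
    return (ordered[lo_idx], ordered[hi_idx])

# Placeholder data: simulated finishing times in minutes, NOT the real run17 sample.
rng = random.Random(0)
times = [rng.gauss(99, 18) for _ in range(100)]
boot_means = [statistics.mean(rng.choices(times, k=len(times))) for _ in range(1000)]

xbar = statistics.mean(times)
se_ci = bootstrap_se_interval(xbar, boot_means)
pct_ci = bootstrap_percentile_interval(boot_means)
print(se_ci, pct_ci)
```

Other quantile conventions give slightly different percentile endpoints. With a roughly symmetric bootstrap distribution the two intervals should nearly agree, as they do in the table.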
The \(T\)-statistic is \[T = \frac{\bar{x}-\text{null value}}{s/\sqrt{n}} = \frac{99.02-93.29}{17.93/\sqrt{100}} = \frac{99.02-93.29}{1.793} = 3.20\]
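The same arithmetic can be checked numerically. This short Python sketch uses only the summary statistics and null value quoted in the text; the variable names are illustrative.

```python
import math

x_bar = 99.02       # sample mean (minutes)
s = 17.93           # sample standard deviation
n = 100             # sample size
null_value = 93.29  # null value from the T-statistic formula

se = s / math.sqrt(n)                # standard error of the mean
t_stat = (x_bar - null_value) / se   # distance from the null value in SE units
print(round(se, 3), round(t_stat, 2))   # prints 1.793 3.2
```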
Since the alternative hypothesis is two-sided, the p-value is the total area under the two symmetric tails of the density curve for \(t_{99}\) (the \(t\)-distribution with \(df=99\)) that are at least as extreme as the test statistic \(T\).
We find the area in the left tail, below \(-|T|\), using pt and double it (the \(t\)-distribution is symmetric).
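For readers without R at hand, the doubled tail area can be approximated in Python with the standard library alone. The sketch below is not how pt is implemented. It integrates the \(t\) density numerically with Simpson's rule, and `t_pdf` and `t_cdf` are hypothetical helper names.

```python
import math

def t_pdf(x, df):
    """Density of the t-distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_cdf(x, df, steps=2000):
    """P(T <= x), using Simpson's rule on [0, |x|] plus symmetry about 0."""
    a = abs(x)
    h = a / steps                      # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central = total * h / 3            # area between 0 and |x|
    return 0.5 + central if x >= 0 else 0.5 - central

t_stat = 3.20
df = 99
p_value = 2 * t_cdf(-abs(t_stat), df)   # left-tail area, doubled
print(round(p_value, 4))
```

The result is a little under 0.002, well below \(\alpha=0.05\).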
If we reject the null hypothesis with \(\alpha=0.05\), the null value will not be included as a plausible value in the 95% CI.
If we fail to reject the null hypothesis with \(\alpha=0.05\), the null value will be included as a plausible value in the 95% CI.
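This correspondence can be checked directly with the numbers from this section. The sketch below tests whether the null value 93.29 falls inside the one-sample \(t\)-interval from the table; `null_is_plausible` is a hypothetical helper name.

```python
def null_is_plausible(null_value, ci):
    """True if the null value lies inside the confidence interval."""
    lower, upper = ci
    return lower <= null_value <= upper

t_interval = (95.46, 102.56)   # one-sample t-interval from the table
null_value = 93.29             # null value used in the T-statistic

print(null_is_plausible(null_value, t_interval))   # prints False
```

The null value lies outside the interval, which agrees with rejecting the null hypothesis at \(\alpha=0.05\).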