Math 115
So far we’ve focused on inference for proportions (categorical variables).
Now we turn to inference for means (quantitative variables).
Same framework:
We begin with inference for a single mean (studies involving a single numerical variable).
| | Statistic (sample) | Parameter (population) |
|---|---|---|
| Mean | \(\bar{x}\) | \(\mu\) |
| Standard deviation | \(s\) | \(\sigma\) |
Statistic of interest: \(\bar{x}\) (sample mean)
Goal: Make inferences about \(\mu\) (population mean)
For the remaining topics, we focus on model-based inference.
When conditions are NOT met → consider randomization methods instead.
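The Cherry Blossom Run is an annual 10-mile race in Washington, D.C. The table below summarizes finish times (in minutes) for a sample of 100 runners from the 2017 race.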
| n | \(\bar{x}\) | s | min | max |
|---|---|---|---|---|
| 100 | 99.02 | 17.93 | 53.27 | 139.07 |
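Two different types of variability:
Sample standard deviation (s): how much individual observations (finish times) vary around the sample mean.
Standard error (SE): how much the sample mean \(\bar{x}\) varies from sample to sample.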
The standard error for a sample mean is:
\[SE = \frac{\sigma}{\sqrt{n}}\]
Problem: We don’t know σ (population SD)
Solution: Estimate with sample SD:
\[SE \approx \frac{s}{\sqrt{n}} = \frac{17.93}{\sqrt{100}} = 1.793\]
With \(n = 100\), the spread of the sample mean is \(\frac{1}{10}\) of the spread of the individual data values, because \(\sqrt{100} = 10\).
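The 1/10 relationship can be checked by simulation. The sketch below uses only Python's standard library and draws repeated samples of size 100 from a hypothetical normal population. The population mean of 99 and SD of 18 are assumptions chosen to resemble the sample, not the real race data. It then compares the spread of the resulting sample means with \(\sigma/\sqrt{n}\).

```python
import math
import random
import statistics

random.seed(115)  # fixed seed so the sketch is reproducible

# Hypothetical population of finish times (minutes). These values are
# assumptions for illustration only, not the real race data.
POP_MEAN, POP_SD = 99.0, 18.0
n = 100        # sample size, matching the Cherry Blossom sample
reps = 5000    # number of repeated samples to draw

sample_means = []
for _ in range(reps):
    sample = [random.gauss(POP_MEAN, POP_SD) for _ in range(n)]
    sample_means.append(statistics.fmean(sample))

sd_of_means = statistics.stdev(sample_means)  # simulated spread of x-bar
theoretical_se = POP_SD / math.sqrt(n)        # sigma / sqrt(n)

print(f"SD of individual times (sigma): {POP_SD:.2f}")
print(f"SD of {reps} sample means:       {sd_of_means:.3f}")
print(f"sigma / sqrt(n):                {theoretical_se:.3f}")
```

The last two printed values should be close to each other and roughly one tenth of the population SD.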
When conditions are met, the sampling distribution of \(\bar{x}\) is approximately normal with mean \(\mu\) and standard error \(\frac{\sigma}{\sqrt{n}}\).
Conditions:
Normality:
These are rules of thumb, not hard cutoffs.
When we estimate σ with s, we add extra uncertainty.
To account for this, we use the t-distribution instead of the normal distribution.
The t-distribution is characterized by degrees of freedom (df).
For one mean: df = n − 1
As df increases, the t-distribution approaches the normal distribution.
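To see the t-distribution approach the normal, one option is to compare 95% critical values for several df. The sketch below is a standard-library approximation, not how Jamovi computes them: it integrates the t density with Simpson's rule and finds \(t^*\) by bisection. The function names `t_pdf`, `central_area`, and `t_critical` are made up for this illustration.

```python
import math
from statistics import NormalDist

def t_pdf(x, df):
    """Density of the t-distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def central_area(t, df, steps=2000):
    """Area under the t density between -t and t (Simpson's rule)."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 2 * total * h / 3  # area on [0, t], doubled by symmetry

def t_critical(df, confidence=0.95):
    """Bisection search for t* whose central area equals `confidence`."""
    lo, hi = 0.0, 50.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if central_area(mid, df) < confidence:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

for df in (5, 30, 99, 1000):
    print(f"df = {df:>4}: t* = {t_critical(df):.3f}")
print(f"normal:    z* = {NormalDist().inv_cdf(0.975):.3f}")
```

The critical values shrink toward the normal value of about 1.96 as df grows, which is the extra uncertainty from estimating σ fading away in larger samples.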
When conditions are met:
\[\text{CI} = \bar{x} \pm t^*_{df} \times \frac{s}{\sqrt{n}}\]
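Where \(\bar{x}\) is the sample mean, \(t^*_{df}\) is the critical value from the t-distribution with \(df = n - 1\), and \(\frac{s}{\sqrt{n}}\) is the standard error.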
Use Jamovi’s Model-Based Inference calculator to find \(t^*\).
Independence:
Normality:
Conditions are met for using the t-distribution.
Given: \(\bar{x} = 99.02\), \(s = 17.93\), \(n = 100\), \(df = 99\)
Critical value: \(t^*_{99} = 1.984\) (from Jamovi)
Standard error: \[SE = \frac{17.93}{\sqrt{100}} = 1.793\]
95% CI: \[99.02 \pm 1.984 \times 1.793 = (95.46, 102.58)\]
95% CI: (95.46, 102.58) minutes
Interpretation: We are 95% confident that the true mean finish time for all 2017 Cherry Blossom 10-mile runners is between 95.46 and 102.58 minutes.
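The interval arithmetic can be reproduced in a few lines. This is a minimal sketch that takes the critical value 1.984 as given (as reported by Jamovi for df = 99) rather than computing it. The variable names are illustrative.

```python
import math

# Summary statistics from the Cherry Blossom sample (finish times in minutes)
x_bar, s, n = 99.02, 17.93, 100
t_star = 1.984  # critical value for df = 99, taken as given

se = s / math.sqrt(n)   # standard error of the sample mean
margin = t_star * se    # margin of error
lower, upper = x_bar - margin, x_bar + margin

print(f"SE = {se:.3f}, margin of error = {margin:.3f}")
print(f"95% CI: ({lower:.2f}, {upper:.2f}) minutes")
```

Changing `t_star` to the critical value for another confidence level gives the corresponding interval with the same standard error.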
Does the 95% CI include 93.29 minutes, the 2006 mean? No.
This suggests that the 2017 mean differs from the 2006 mean.
Let μ = mean finish time for all 2017 Cherry Blossom 10-mile runners
Hypotheses: \(H_0: \mu = 93.29\) versus \(H_A: \mu \neq 93.29\)
This is a two-sided test.
The T-statistic measures how far \(\bar{x}\) is from the null value in SE units:
\[T = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}\]
For Cherry Blossom:
\[T = \frac{99.02 - 93.29}{1.793} = \frac{5.73}{1.793} \approx 3.20\]
Use Jamovi’s Model-Based Inference calculator with t-distribution (df = 99).
P-value = 0.0019
Results:
Decision: Reject \(H_0\)
Conclusion: The data provide convincing evidence that the mean finish time in 2017 is different from 93.29 minutes (the 2006 mean).
Note: This is consistent with our CI—93.29 is outside the 95% CI.
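The test statistic and two-sided p-value can be approximated without Jamovi. The sketch below assumes a plain numerical approach (Simpson's rule on the t density) and uses only the standard library. The helper names `t_pdf` and `two_sided_p_value` are hypothetical, and the result should land close to the reported p-value, not match it digit for digit.

```python
import math

def t_pdf(x, df):
    """Density of the t-distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def two_sided_p_value(t_stat, df, steps=2000):
    """Area in both tails beyond |t_stat| under the t density."""
    t = abs(t_stat)
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area_0_to_t = total * h / 3      # Simpson's rule on [0, |t|]
    return 2 * (0.5 - area_0_to_t)   # each tail is 0.5 minus that area

# Cherry Blossom summary statistics and the null value (2006 mean)
x_bar, s, n = 99.02, 17.93, 100
mu_0 = 93.29

se = s / math.sqrt(n)
t_stat = (x_bar - mu_0) / se
p_value = two_sided_p_value(t_stat, df=n - 1)

print(f"T = {t_stat:.3f}, df = {n - 1}, p-value = {p_value:.4f}")
```

Because the test is two-sided, the function doubles the upper-tail area. A one-sided test would use a single tail instead.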
All inference is built on one idea: statistics vary from sample to sample.
The standard error quantifies this variability—it tells us how much we expect the statistic to vary across different samples.
Whether we use randomization or mathematical models, the goal is the same: describe the sampling variability so we can make inferences.
CI = statistic ± multiplier × SE
Smaller SE → more precision → narrower CI
Test statistic = (observed − null) / SE
Under \(H_0\), we expect statistics to land near the null value—but not exactly, due to sampling variability.
As a rough sense of scale:
| | Proportion | Mean |
|---|---|---|
| Statistic | \(\hat{p}\) | \(\bar{x}\) |
| SE formula | \(\sqrt{\frac{p(1-p)}{n}}\) | \(\frac{s}{\sqrt{n}}\) |
| CI | statistic ± multiplier × SE | statistic ± multiplier × SE |
| Test statistic | \(Z=\frac{\text{observed} - \text{null}}{SE}\) | \(T=\frac{\text{observed} - \text{null}}{SE}\) |
The formulas differ, but the logic is identical:
Quantify sampling variability (SE) → Use it to measure uncertainty (CI) or surprise (p-value)
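The shared logic can be written as two small functions that work for either kind of statistic. This is an illustrative sketch. The mean example uses the Cherry Blossom numbers, while the proportion example (\(\hat{p} = 0.56\), \(n = 200\), null value 0.50) is entirely hypothetical. It also assumes the common convention of plugging \(\hat{p}\) into the SE for the interval and the null value into the SE for the test.

```python
import math

def confidence_interval(statistic, multiplier, se):
    """CI = statistic +/- multiplier * SE."""
    return statistic - multiplier * se, statistic + multiplier * se

def standardized_statistic(observed, null, se):
    """Test statistic = (observed - null) / SE."""
    return (observed - null) / se

# Mean: Cherry Blossom finish times (minutes), t* = 1.984 taken as given
x_bar, s, n = 99.02, 17.93, 100
se_mean = s / math.sqrt(n)
ci_mean = confidence_interval(x_bar, 1.984, se_mean)
t_stat = standardized_statistic(x_bar, 93.29, se_mean)

# Proportion: hypothetical values, for illustration only
p_hat, n_prop, p_null = 0.56, 200, 0.50
se_ci = math.sqrt(p_hat * (1 - p_hat) / n_prop)      # SE for the interval
se_test = math.sqrt(p_null * (1 - p_null) / n_prop)  # SE under the null
ci_prop = confidence_interval(p_hat, 1.96, se_ci)
z_stat = standardized_statistic(p_hat, p_null, se_test)

print(f"Mean:       CI = ({ci_mean[0]:.2f}, {ci_mean[1]:.2f}), T = {t_stat:.3f}")
print(f"Proportion: CI = ({ci_prop[0]:.3f}, {ci_prop[1]:.3f}), Z = {z_stat:.3f}")
```

Only the SE formula and the multiplier change between the two cases. The two functions are untouched.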