Chapters 13 (Part 2), 14, 16, 17, 18, 19, 20, 21, Comparing Many Means (Part I and Part II), 24, 25
Big ideas. This chapter introduces the logic behind inference for proportions. The main theme is that sample proportions vary from sample to sample, but when conditions are right that variation can be described with a model. You should be able to connect a real question to a null model, a standardized statistic, and a confidence interval for the population proportion.
\[ \hat{p}=\frac{x}{n} \]
Computes the sample proportion from the number of successes out of the sample size.
\[ SE_{H_0}(\hat{p})=\sqrt{\frac{p_0(1-p_0)}{n}} \]
Gives the null standard error for a one-proportion hypothesis test.
\[ z=\frac{\hat{p}-p_0}{SE_{H_0}(\hat{p})} \]
Standardizes the difference between the observed proportion and the null value.
\[ \hat{p}\pm z^*\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \]
Builds a confidence interval for a population proportion.
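The null standard error and z statistic above can be sketched in a few lines of Python. This is only an illustration: the function name `one_prop_z_test` and the counts (58 successes in 100 trials, null value 0.5) are made up for the example, and the two-sided p-value assumes the normal approximation is reasonable.

```python
from math import sqrt
from statistics import NormalDist

def one_prop_z_test(x, n, p0):
    """Return (p_hat, z, two-sided p-value) for H0: p = p0."""
    p_hat = x / n                          # sample proportion
    se_null = sqrt(p0 * (1 - p0) / n)      # standard error under H0
    z = (p_hat - p0) / se_null             # standardized statistic
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_hat, z, p_value

# Hypothetical data: 58 successes out of 100, null value p0 = 0.5
p_hat, z, p_value = one_prop_z_test(58, 100, 0.5)
print(f"p_hat={p_hat:.2f}, z={z:.2f}, p-value={p_value:.4f}")
```

Note that the test uses \(p_0\) in the standard error, while the confidence interval uses \(\hat{p}\).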
Big ideas. This chapter explains that hypothesis testing is a decision process with possible mistakes. A test can wrongly reject a true null hypothesis or fail to detect a real effect, and those two risks are connected to significance level, power, and sample size. You should be able to describe what each error means in words and explain the tradeoff between being cautious and being sensitive.
\[ P(\text{Type I error})=\alpha \]
States that the probability of a Type I error equals the significance level.
\[ P(\text{Type II error})=\beta=1-\text{Power} \]
Power is the probability of rejecting \(H_0\) when it is false, so the probability of a Type II error is one minus the power. Studies are commonly designed to have power of about 80%.
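To see how significance level, sample size, and power are connected, here is a rough sketch that approximates the power of a one-sided one-proportion z test. The function `power_one_prop`, the null value 0.5, and the assumed true proportion 0.6 are all hypothetical choices for illustration, and the calculation relies on the normal approximation.

```python
from math import sqrt
from statistics import NormalDist

def power_one_prop(p0, p1, n, alpha=0.05):
    """Approximate power of H0: p = p0 vs HA: p > p0 when the truth is p1."""
    z_crit = NormalDist().inv_cdf(1 - alpha)
    # Reject H0 when the sample proportion exceeds this cutoff
    cutoff = p0 + z_crit * sqrt(p0 * (1 - p0) / n)
    # Under the alternative, p_hat is roughly normal around p1
    se_alt = sqrt(p1 * (1 - p1) / n)
    return 1 - NormalDist(mu=p1, sigma=se_alt).cdf(cutoff)

for n in (50, 100, 200):
    power = power_one_prop(0.5, 0.6, n)
    print(f"n={n}: power={power:.3f}, P(Type II error)={1 - power:.3f}")
```

Running it with a smaller `alpha` lowers the power at a fixed sample size, which is the tradeoff between being cautious and being sensitive.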
Big ideas. This chapter focuses on making inferences about a single population proportion. The key skill is deciding whether a claimed proportion is plausible and estimating the true proportion with uncertainty. You should know when a simulation approach is helpful, when the normal approximation is reasonable, and how to interpret the result in context.
\[ np_0\ge 10, \quad n(1-p_0)\ge 10 \]
Checks whether expected successes and failures are large enough for the normal approximation.
\[ z=\frac{\hat{p}-p_0}{\sqrt{p_0(1-p_0)/n}} \]
Computes the z statistic for testing one population proportion.
\[ \hat{p}\pm z^*\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \]
Builds a confidence interval for a population proportion.
\[ n\hat{p}\ge 10, \quad n(1-\hat{p})\ge 10 \]
Checks whether observed successes and failures are large enough for the normal approximation.
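The two condition checks and the confidence interval can be sketched as follows. The helper names `success_failure_ok` and `one_prop_ci` and the counts are assumptions made for the example; the small tolerance only guards against floating-point rounding when a product is exactly 10.

```python
from math import sqrt
from statistics import NormalDist

def success_failure_ok(n, p, minimum=10):
    """Check n*p >= 10 and n*(1-p) >= 10 (tolerance guards float rounding)."""
    tol = 1e-9
    return n * p >= minimum - tol and n * (1 - p) >= minimum - tol

def one_prop_ci(x, n, conf_level=0.95):
    """Confidence interval p_hat +/- z* sqrt(p_hat(1 - p_hat)/n)."""
    p_hat = x / n
    z_star = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)
    margin = z_star * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

# Hypothetical data: 58 successes out of 100, null value p0 = 0.5
x, n, p0 = 58, 100, 0.5
print("test condition (uses p0):", success_failure_ok(n, p0))
print("interval condition (uses p_hat):", success_failure_ok(n, x / n))
low, high = one_prop_ci(x, n)
print(f"95% CI: ({low:.3f}, {high:.3f})")
```

The test condition uses \(p_0\) because the null model is assumed true; the interval condition uses \(\hat{p}\) because no null value is assumed.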
Big ideas. This chapter compares two groups on a categorical outcome. The central idea is that the difference in sample proportions estimates the difference in population proportions, and inference tells you whether the observed gap is too large to explain by chance alone. You should know why tests often use a pooled standard error while confidence intervals do not.
\[ \hat{p}_1-\hat{p}_2 \]
Represents the observed difference between two sample proportions.
\[ \hat{p}_{pool}=\frac{x_1+x_2}{n_1+n_2} \]
Computes the pooled proportion used in the null standard error for a two-proportion test.
Here \(x_1\) and \(x_2\) are the number of successes in group 1 and group 2 respectively.
\[ SE_{H_0}=\sqrt{\hat{p}_{pool}(1-\hat{p}_{pool})\left(\frac{1}{n_1}+\frac{1}{n_2}\right)} \]
Gives the pooled standard error for testing equality of two proportions.
\[ z=\frac{(\hat{p}_1-\hat{p}_2)-0}{SE_{H_0}} \]
Standardizes the observed difference between two sample proportions under the null.
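\[ n_1\hat{p}_{pool}\ge 10, \quad n_1(1-\hat{p}_{pool})\ge 10, \quad n_2\hat{p}_{pool}\ge 10, \quad n_2(1-\hat{p}_{pool})\ge 10 \]
Checks whether expected successes and failures are large enough (\(\ge 10\)) in each group for the normal approximation used in the test.
\[ n_1\hat{p}_1\ge 10, \quad n_1(1-\hat{p}_1)\ge 10, \quad n_2\hat{p}_2\ge 10, \quad n_2(1-\hat{p}_2)\ge 10 \]
Checks whether observed successes and failures are large enough (\(\ge 10\)) in each group for the normal approximation used in the confidence interval.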
\[ (\hat{p}_1-\hat{p}_2) \pm z^*\sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1}+\frac{\hat{p}_2(1-\hat{p}_2)}{n_2}} \]
Builds a confidence interval for the difference between two population proportions.
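A short sketch may help show why the test and the interval use different standard errors. The function names and the counts (60 of 100 versus 45 of 100) are hypothetical; both functions assume the normal approximation holds.

```python
from math import sqrt
from statistics import NormalDist

def two_prop_z_test(x1, n1, x2, n2):
    """z statistic and two-sided p-value for H0: p1 = p2 (pooled SE)."""
    p1, p2 = x1 / n1, x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)
    se_null = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se_null
    return z, 2 * (1 - NormalDist().cdf(abs(z)))

def two_prop_ci(x1, n1, x2, n2, conf_level=0.95):
    """Confidence interval for p1 - p2 (unpooled SE)."""
    p1, p2 = x1 / n1, x2 / n2
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z_star = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)
    diff = p1 - p2
    return diff - z_star * se, diff + z_star * se

# Hypothetical counts: 60/100 successes in group 1, 45/100 in group 2
z, p_value = two_prop_z_test(60, 100, 45, 100)
low, high = two_prop_ci(60, 100, 45, 100)
print(f"z={z:.3f}, p-value={p_value:.4f}, 95% CI=({low:.3f}, {high:.3f})")
```

The test pools because its null hypothesis says the two proportions are equal; the interval makes no such assumption, so each group keeps its own \(\hat{p}\).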
Big ideas. This chapter studies categorical counts by comparing observed counts to expected counts under a null model. The main idea is that the chi-square statistic becomes large when the observed pattern is much different from what the null hypothesis predicts. You should know how to distinguish between goodness-of-fit and independence problems and how to compute degrees of freedom.
\[ E=\frac{(\text{row total})(\text{column total})}{\text{grand total}} \]
Computes an expected count in a contingency table under the null of independence.
\[ X^2=\sum \frac{(O-E)^2}{E} \]
Measures the total discrepancy between observed and expected counts.
\[ df=(r-1)(c-1) \]
Gives degrees of freedom for a chi-square test of independence.
\[ df=k-1 \]
Gives degrees of freedom for a chi-square goodness-of-fit test.
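The expected counts, the chi-square statistic, and both degrees-of-freedom rules can be sketched as below. The function names and the small tables are invented for illustration; the code returns only the statistic and its degrees of freedom, so a p-value would still need a chi-square table or software.

```python
def expected_counts(table):
    """Expected counts under independence: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    return [[r * c / grand for c in col_totals] for r in row_totals]

def chi_square_independence(table):
    """Return (X^2, df) with df = (r - 1)(c - 1)."""
    expected = expected_counts(table)
    stat = sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, expected)
               for o, e in zip(obs_row, exp_row))
    return stat, (len(table) - 1) * (len(table[0]) - 1)

def chi_square_gof(observed, null_probs):
    """Return (X^2, df) with df = k - 1 for a goodness-of-fit test."""
    n = sum(observed)
    stat = sum((o - n * p) ** 2 / (n * p) for o, p in zip(observed, null_probs))
    return stat, len(observed) - 1

# Hypothetical 2x2 table and hypothetical three-category counts
print(chi_square_independence([[30, 20], [20, 30]]))
print(chi_square_gof([18, 22, 20], [1 / 3, 1 / 3, 1 / 3]))
```

In the independence problem the expected counts come from the table margins; in the goodness-of-fit problem they come from the proportions stated in the null hypothesis.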
Big ideas. This chapter introduces inference for a population mean when the population standard deviation is unknown. The big idea is that the sample mean estimates the population mean, and the t distribution accounts for extra uncertainty from using the sample standard deviation. You should be comfortable with one-sample t tests, confidence intervals, and checking whether the method is reasonable for the data.
\[ \bar{x}=\frac{1}{n}\sum_{i=1}^n x_i \]
Computes the sample mean, the point estimate of the population mean \(\mu\).
\[ SE(\bar{x})=\frac{s}{\sqrt{n}} \]
Measures the standard error of the sample mean.
\[ t=\frac{\bar{x}-\mu_0}{s/\sqrt{n}} \]
Computes the one-sample t statistic.
\[ \bar{x}\pm t^*\frac{s}{\sqrt{n}} \]
Builds a confidence interval for one population mean.
\[ df=n-1 \]
Gives the degrees of freedom for the t distribution used in one-sample inference.
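As a small worked sketch of the one-sample formulas, the code below computes the t statistic, its degrees of freedom, and a confidence interval. The data and the null value 5 are made up, and the critical value \(t^*=2.306\) is read from a t table for \(df=8\) at 95% confidence rather than computed.

```python
from math import sqrt
from statistics import mean, stdev

def one_sample_t(data, mu0):
    """Return (t, df) for H0: mu = mu0 using the sample standard deviation."""
    n = len(data)
    se = stdev(data) / sqrt(n)             # SE(x_bar) = s / sqrt(n)
    return (mean(data) - mu0) / se, n - 1

def t_interval(data, t_star):
    """Return x_bar +/- t* s / sqrt(n); t_star must match df = n - 1."""
    margin = t_star * stdev(data) / sqrt(len(data))
    return mean(data) - margin, mean(data) + margin

# Hypothetical sample of 9 measurements
data = [5.1, 4.9, 5.6, 5.2, 5.4, 4.8, 5.3, 5.5, 5.0]
t, df = one_sample_t(data, mu0=5.0)
low, high = t_interval(data, t_star=2.306)   # table value for df = 8, 95%
print(f"t={t:.3f}, df={df}, 95% CI=({low:.3f}, {high:.3f})")
```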
Big ideas. This chapter compares the means of two independent groups. The key question is whether the observed difference in sample means is large relative to the variability expected from random sampling. You should know how to set up the parameter, interpret the difference in means, and choose the correct t-based method.
\[ \bar{x}_1-\bar{x}_2 \]
Represents the observed difference between two sample means, which estimates \(\mu_1-\mu_2\).
\[ s_p=\sqrt{\frac{(n_1-1)s_1^2+(n_2-1)s_2^2}{n_1+n_2-2}} \]
Computes the pooled standard deviation, which combines the two sample variances under the assumption that the two populations have equal variances.
\[ SE=s_p\sqrt{\frac{1}{n_1}+\frac{1}{n_2}} \]
Gives the pooled standard error of the difference in sample means; the matching t distribution has \(n_1+n_2-2\) degrees of freedom.
\[ t=\frac{(\bar{x}_1-\bar{x}_2)-0}{SE} \]
Computes the t statistic for comparing two independent population means.
\[ (\bar{x}_1-\bar{x}_2)\pm t^*SE \]
Builds a confidence interval for the difference between two population means.
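The pooled two-sample computation can be sketched as follows, assuming equal population variances as the pooled formulas require. The function name, the two small groups, and the table value \(t^*=2.306\) (for \(df=8\), 95% confidence) are illustrative assumptions.

```python
from math import sqrt
from statistics import mean, variance

def pooled_t(group1, group2):
    """Return (difference, SE, t, df) for the pooled two-sample t test."""
    n1, n2 = len(group1), len(group2)
    # Pooled standard deviation: weighted average of the sample variances
    s_p = sqrt(((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2))
               / (n1 + n2 - 2))
    se = s_p * sqrt(1 / n1 + 1 / n2)
    diff = mean(group1) - mean(group2)
    return diff, se, diff / se, n1 + n2 - 2

# Hypothetical measurements from two independent groups
diff, se, t, df = pooled_t([12, 15, 14, 13, 16], [10, 11, 13, 12, 9])
t_star = 2.306                               # table value for df = 8, 95%
print(f"diff={diff}, SE={se:.3f}, t={t:.3f}, df={df}")
print(f"95% CI=({diff - t_star * se:.3f}, {diff + t_star * se:.3f})")
```

If the two sample standard deviations look very different, the pooled method may not be the right choice.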
Big ideas. This chapter handles matched observations such as before-and-after data or naturally linked pairs. The essential idea is that once you compute the difference within each pair, the problem becomes a one-sample inference problem on those differences. You should focus on defining the difference consistently and interpreting the average difference in context.
\[ d_i=x_{i,1}-x_{i,2} \]
Defines the paired difference for observation i.
\[ \bar{x}_d=\frac{1}{n}\sum_{i=1}^n d_i \]
Computes the mean of the paired differences.
\[ SE(\bar{x}_d)=\frac{s_d}{\sqrt{n}} \]
Gives the standard error of the mean difference, where \(s_d\) is the standard deviation of the differences and \(n\) is the number of pairs.
\[ t=\frac{\bar{x}_d-0}{s_d/\sqrt{n}} \]
Computes the t statistic for paired data.
\[ \bar{x}_d\pm t^*\frac{s_d}{\sqrt{n}} \]
Builds a confidence interval for the population mean difference, using \(df=n-1\).
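Because paired inference is one-sample inference on the differences, a sketch needs only one extra step. The function name and the before/after values are hypothetical, and \(t^*=2.776\) is the table value for \(df=4\) at 95% confidence.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(first, second):
    """Return (mean difference, SE, t, df) with d_i = first_i - second_i."""
    diffs = [a - b for a, b in zip(first, second)]   # one difference per pair
    n = len(diffs)
    d_bar = mean(diffs)
    se = stdev(diffs) / sqrt(n)
    return d_bar, se, d_bar / se, n - 1

# Hypothetical before/after measurements on the same five subjects
before = [80, 75, 90, 85, 70]
after = [78, 70, 88, 84, 66]
d_bar, se, t, df = paired_t(before, after)
t_star = 2.776                                       # table value for df = 4, 95%
print(f"mean diff={d_bar:.2f}, t={t:.3f}, df={df}")
print(f"95% CI=({d_bar - t_star * se:.3f}, {d_bar + t_star * se:.3f})")
```

Here a positive difference means "before" is larger; reversing the order of subtraction flips the sign of every result, so the direction should be fixed before interpreting.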
Big ideas. This chapter extends inference for means to more than two groups. The core idea is that ANOVA compares variation between groups to variation within groups, and the F statistic summarizes that comparison. You should know what the null hypothesis means, what a significant F test tells you, and why it does not by itself identify which means differ.
\[ H_0: \mu_1=\mu_2=\cdots=\mu_k \]
States the null hypothesis that all \(k\) population means are equal.
\[ H_A:\text{ at least one mean differs} \]
States the alternative hypothesis that at least one population mean differs from the others.
\[ SST = SSG + SSE \]
Splits the total sum of squares (SST) into a between-group part (SSG) and a within-group part (SSE).
\[ MSG=\frac{SSG}{k-1}, \qquad MSE=\frac{SSE}{n-k} \]
Converts the sums of squares into mean squares by dividing by their degrees of freedom, where \(k\) is the number of groups and \(n\) is the total sample size.
\[ F=\frac{MSG}{MSE} \]
Compares between-group variation to within-group variation in ANOVA.
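The ANOVA decomposition and F statistic can be sketched with plain Python. The function `anova_f` and the three small groups are invented for the example; the code stops at the F statistic and its degrees of freedom, leaving the p-value to a table or software.

```python
from statistics import mean

def anova_f(groups):
    """Return the sums of squares, F statistic, and (df1, df2) for one-way ANOVA."""
    all_values = [x for g in groups for x in g]
    n, k = len(all_values), len(groups)
    grand_mean = mean(all_values)
    # Between-group variation: group means around the grand mean
    ssg = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group variation: observations around their own group mean
    sse = sum((x - mean(g)) ** 2 for g in groups for x in g)
    sst = sum((x - grand_mean) ** 2 for x in all_values)
    msg, mse = ssg / (k - 1), sse / (n - k)
    return {"SSG": ssg, "SSE": sse, "SST": sst, "F": msg / mse, "df": (k - 1, n - k)}

# Hypothetical data: three groups of three observations
result = anova_f([[4, 5, 6], [6, 7, 8], [8, 9, 10]])
print(result)
```

For these values SSG and SSE add up to SST, matching the decomposition above.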
Big ideas. This chapter explains what happens after an overall test, such as ANOVA, shows evidence that not all group means are equal. The main issue is that once you start making many pairwise comparisons, the chance of getting at least one false positive increases. Because of that, multiple-comparison procedures are used to control the familywise error rate.
The Bonferroni correction does this by making each individual comparison use a smaller significance level, while Holm’s method improves on Bonferroni by ordering p-values from smallest to largest and testing them step by step. You should understand that both methods are designed to reduce Type I errors across a whole family of comparisons, but Holm’s method is usually less conservative and therefore often more powerful.
\[ \alpha^* = \frac{\alpha}{m} \]
This Bonferroni correction sets the significance level for each individual comparison by dividing the overall significance level by the number of comparisons.
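\[ p_{\text{adj}} = \min(1,\; m p) \]
This Bonferroni-adjusted p-value multiplies a single comparison's p-value by the number of comparisons \(m\), capped at 1 because a probability cannot exceed 1, to reflect that multiple tests are being performed.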
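To compare the two procedures side by side, the sketch below computes Bonferroni-adjusted and Holm-adjusted p-values. The function names and the three p-values are hypothetical. Both adjustments are capped at 1, since a probability cannot exceed 1, and the Holm version keeps a running maximum so adjusted values never decrease as the raw p-values increase.

```python
def bonferroni(p_values):
    """Multiply each p-value by the number of comparisons m, capped at 1."""
    m = len(p_values)
    return [min(1.0, m * p) for p in p_values]

def holm(p_values):
    """Holm step-down adjustment, returned in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])   # smallest p first
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # The smallest p-value is multiplied by m, the next by m - 1, and so on
        running_max = max(running_max, (m - rank) * p_values[i])
        adjusted[i] = min(1.0, running_max)
    return adjusted

# Hypothetical p-values from three pairwise comparisons
p_values = [0.01, 0.04, 0.03]
print("Bonferroni:", bonferroni(p_values))
print("Holm:      ", holm(p_values))
```

Each Holm-adjusted value is no larger than its Bonferroni counterpart, which is why Holm is the less conservative of the two.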
Big ideas. This chapter studies whether a quantitative explanatory variable has a meaningful linear association with a response variable. The slope is the parameter of interest, and inference asks whether a slope of zero is plausible. You should know how the estimated slope, its standard error, and residual variation combine in the t test for slope.
\[ \hat{y}=b_0+b_1x \]
Gives the fitted simple linear regression line.
\[ t=\frac{b_1-\beta_{1,0}}{SE(b_1)} \]
Computes the t statistic for the slope, where \(\beta_{1,0}\) is the null value of the slope (usually 0).
\[ SE(b_1)=\frac{s}{\sqrt{\sum (x_i-\bar{x})^2}} \]
Gives the standard error of the estimated slope, where \(s\) is the residual standard deviation.
\[ s=\sqrt{\frac{SSE}{n-2}} \]
Computes the residual standard deviation from the sum of squared residuals (SSE), using \(n-2\) degrees of freedom.
\[ b_1\pm t^*SE(b_1) \]
Builds a confidence interval for the population slope \(\beta_1\), using \(df=n-2\).
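The slope formulas fit together as in the following sketch. The least-squares estimates for \(b_0\) and \(b_1\) are the standard ones and are not stated above; the function name, the five data points, and the table value \(t^*=3.182\) (for \(df=3\), 95% confidence) are assumptions for the example.

```python
from math import sqrt
from statistics import mean

def slope_inference(x, y):
    """Return (b0, b1, SE(b1), t for H0: slope = 0, df)."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    # Standard least-squares slope and intercept
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s = sqrt(sse / (n - 2))                  # residual standard deviation
    se_b1 = s / sqrt(sxx)
    return b0, b1, se_b1, b1 / se_b1, n - 2

# Hypothetical data
b0, b1, se_b1, t, df = slope_inference([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
t_star = 3.182                               # table value for df = 3, 95%
print(f"y_hat = {b0:.2f} + {b1:.2f}x, SE(b1)={se_b1:.3f}, t={t:.3f}, df={df}")
print(f"95% CI for slope=({b1 - t_star * se_b1:.3f}, {b1 + t_star * se_b1:.3f})")
```

In this example the interval contains 0, so a slope of zero remains plausible for these data.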
Big ideas. This chapter extends regression to several predictors at once. The most important idea is that each coefficient describes the expected change in the response for a one-unit change in that predictor while holding the other predictors fixed. You should also understand why standardization can help compare predictors and why multicollinearity can make interpretation harder.
\[ \hat{y}=b_0+b_1x_1+b_2x_2+\cdots+b_kx_k \]
Gives the fitted multiple regression equation.
\[ t=\frac{b_j-0}{SE(b_j)} \]
Computes the t statistic for testing whether coefficient \(\beta_j\) is zero, given that the other predictors are in the model.
\[ u_{ij}=\frac{x_{ij}-\bar{x}_j}{s_j} \]
Standardizes a predictor so coefficients become more comparable in scale.
\[ VIF_j=\frac{1}{1-R_j^2} \]
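Calculates the variance inflation factor (VIF) for predictor \(j\), where \(R_j^2\) comes from regressing predictor \(j\) on the other predictors. Large values signal multicollinearity.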
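Standardization and the VIF can be illustrated in the simplest case of two predictors, where \(R_j^2\) equals the squared correlation between them. The function names and data are hypothetical, and the shortcut does not extend to three or more predictors, where each \(R_j^2\) needs its own regression.

```python
from statistics import mean, stdev

def standardize(values):
    """Return u_i = (x_i - x_bar) / s for each value."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

def vif_two_predictors(x1, x2):
    """VIF when the model has exactly two predictors (R_j^2 = r^2)."""
    u1, u2 = standardize(x1), standardize(x2)
    # Correlation is the average product of standardized values (n - 1 divisor)
    r = sum(a * b for a, b in zip(u1, u2)) / (len(x1) - 1)
    return 1 / (1 - r ** 2)

# Hypothetical predictor values
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 5]
print("standardized x1:", [round(u, 3) for u in standardize(x1)])
print(f"VIF = {vif_two_predictors(x1, x2):.3f}")
```

With only two predictors the VIF is the same for both; it grows without bound as the correlation between them approaches 1 in absolute value.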