So far we’ve learned to describe data:
Now we learn to make inferences:
Examples:
In statistics, we use \(\hat{}\) (called a “hat”) to indicate an estimate from our sample:
\(p\) = Population parameter (what we want to know)
\(\hat{p}\) = Sample statistic (what we can calculate)
Example: If 12 students prefer online classes in a sample of 50, then \(\hat{p}=\frac{12}{50}=0.24\)
Suppose we flipped a coin 30 times and ended up with 21 heads and 9 tails (so that the sample statistic is \(\hat{p}=\frac{21}{30}=0.7\))
Is that strong evidence that the coin is not fair?
Simulations:
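A minimal sketch of such a simulation, assuming a fair coin and using Python's standard `random` module. The number of repetitions (10,000), the seed, and the choice to count results with at least as many heads as observed are illustrative assumptions, not part of the original example.

```python
import random

rng = random.Random(0)  # fixed seed so the run is repeatable (arbitrary choice)

n_flips = 30          # flips per simulated experiment (from the example)
observed_heads = 21   # heads actually observed (from the example)
n_sims = 10_000       # number of simulated experiments (illustrative assumption)


def count_heads(n, rng):
    """Flip a fair coin n times and return the number of heads."""
    return sum(rng.random() < 0.5 for _ in range(n))


heads_counts = [count_heads(n_flips, rng) for _ in range(n_sims)]

# Share of fair-coin experiments that produced at least as many heads as we saw
share_at_least_observed = sum(h >= observed_heads for h in heads_counts) / n_sims
print(f"Share of simulations with >= {observed_heads} heads: {share_at_least_observed:.3f}")
```

If a fair coin rarely produces 21 or more heads in 30 flips, the observed result is hard to explain by chance alone.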
Even when the population proportion \(p\) is fixed, different samples give different proportions (different values of \(\hat{p}\)).
Example: If the true proportion of left-handed people is \(p=0.13\) (13%)…
This natural variation is called sampling variability
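To see sampling variability in action, here is a small sketch that draws several samples from a population with \(p=0.13\). The sample size of 100 and the number of samples (5) are hypothetical choices made only for illustration.

```python
import random

rng = random.Random(1)  # fixed seed so the run is repeatable (arbitrary choice)

p = 0.13        # true proportion of left-handed people (from the example)
n = 100         # people per sample (hypothetical; not given in the example)
n_samples = 5   # how many samples to draw (hypothetical)


def sample_proportion(n, p, rng):
    """Draw n people, each left-handed with probability p; return p-hat."""
    return sum(rng.random() < p for _ in range(n)) / n


p_hats = [sample_proportion(n, p, rng) for _ in range(n_samples)]
print(p_hats)  # each value is a different estimate of the same fixed p
```

The population proportion never changes, yet each sample yields its own \(\hat{p}\).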
When we observe something unusual in our sample:
Is it a real effect, or just due to chance?
Hypothesis testing helps us answer this question
Medical Consultants
consult dataset is available here
Some organ donors work with a medical consultant who helps them throughout the process
Assume that the average complication rate for liver donor surgeries in the United States is about 10%
One consultant claims she has a lower complication rate than the national average.
She has served as a consultant for 62 liver donor surgeries
3 (4.8%) resulted in complications
Sample statistic \(\hat{p} =\frac{3}{62}=0.048\)
Is her claim supported?
\(p\) = Population parameter (what we want)
\(\hat{p}\) = Sample statistic (what we can calculate)
Question: Is this evidence that \(p<0.1\)?
How do we explain the consultant’s low complication rate?
Explanation 1: The consultant is no different from the national average
Explanation 2: The consultant truly has a lower complication rate
How do we decide which is more plausible?
Strategy: Start by assuming there’s nothing special going on
This is like a courtroom:
Start with “innocent (\(p=0.1\)) until proven guilty (\(p<0.1\))”
Look for evidence against innocence
Need strong evidence to convict
The null hypothesis represents the skeptical position:
\[H_0: p = 0.10\]
Key idea: The null hypothesis typically represents “no effect” or “no difference”
Courtroom parallel: \(H_0\) = presumption of innocence
The alternative hypothesis represents the claim we want to test:
\[H_A: p < 0.10\]
Medical Consultant Example:
| | In words | In symbols |
|---|---|---|
| \(H_0:\) | Consultant’s rate equals national average | \(p = 0.10\) |
| \(H_A:\) | Consultant’s rate is lower than national average | \(p < 0.10\) |
Important:
Assume \(H_0\) is true (consultant rate = 10%)
Simulate what we’d expect to see under this assumption
Compare our observed data to these simulations
Decide:
Question: If \(H_0\) is true, what values of \(\hat{p}\) should we expect?
Answer: We need the null distribution
How do we get it? Simulation!
Instead of a spinner we could use a random number generator
Random numbers between 1 and 10 (e.g., random.org)
To simulate one sample under \(H_0\):
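One way to sketch a single simulated sample with a random number generator: draw 62 integers between 1 and 10, one per surgery. Treating a draw of 1 as a complication is an assumption made here so that each surgery has a 10% chance of a complication; any single fixed number would work equally well.

```python
import random

rng = random.Random(2)  # fixed seed so the run is repeatable (arbitrary choice)

n_surgeries = 62  # the consultant's number of surgeries (from the example)

# One random integer from 1 to 10 per surgery
draws = [rng.randint(1, 10) for _ in range(n_surgeries)]

# Assumption: a draw of 1 stands for "complication" (probability 1/10 under H0)
complications = draws.count(1)
p_hat_sim = complications / n_surgeries

print(f"Simulated complications: {complications}, simulated p-hat: {p_hat_sim:.3f}")
```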
Activity: Simulate your own sample under \(H_0\)
Activity: Let’s build the null distribution together!
Instructions: Draw a dot above your simulated \(\hat{p}\) value
Complete the questions in In Class Activity 4 about your \(\hat{p}\) and the null distribution
Here are the first 100 simulations
Our question: Is \(\hat{p} = 0.048\) unusually low, or just normal variation?
9 out of 100 simulations were at or below 0.048
Note that since the alternative hypothesis has the form “less than”, the p-value region is in the left tail
So the proportion of times when the simulated value was at 0.048 or below is 0.09
Is this unusual enough to reject \(H_0\)? Let’s do more simulations for a clearer picture…
P-value = Probability of observing a result as extreme as, or more extreme than, the observed result (i.e. 0.048) if \(H_0\) is true
Interpretation: How likely is it to see a sample proportion of 0.048 or less if the true rate is 0.10?
Courtroom parallel: P-value = strength of prosecution’s evidence
Understanding “More Extreme”
“More extreme” means in the direction of \(H_A\):
From our 1000 simulations:
\[\text{P-value} = \frac{111}{1000} = 0.111\]
About 11.1% of simulations produced a result as extreme as (or more extreme than) what we observed
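The calculation can be sketched in Python as follows. This is a minimal illustration, not the tool used to produce the slides: the seed is arbitrary, so the simulated count will generally differ a little from 111 out of 1000.

```python
import random

rng = random.Random(3)  # fixed seed so the run is repeatable (arbitrary choice)

n_surgeries = 62             # sample size (from the example)
p_null = 0.10                # complication rate assumed under H0
observed_complications = 3   # what the consultant actually observed
n_sims = 1000                # number of simulated samples


def simulate_count(n, p, rng):
    """Simulate n surgeries under H0 and return the number of complications."""
    return sum(rng.random() < p for _ in range(n))


# Null distribution: complication counts we would expect if H0 were true
null_counts = [simulate_count(n_surgeries, p_null, rng) for _ in range(n_sims)]

# Left-tail p-value: share of simulations at or below the observed count
n_extreme = sum(c <= observed_complications for c in null_counts)
p_value = n_extreme / n_sims

print(f"{n_extreme} of {n_sims} simulations had <= {observed_complications} complications")
print(f"Simulated p-value: {p_value:.3f}")
```

Counts are compared rather than proportions so that floating-point rounding of \(\hat{p}\) cannot affect which simulations land in the tail.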
How would the p-value change with different observed results?
Suppose the consultant still had 62 surgeries, but…
| Observed complications | \(\hat{p}\) | Expected p-value |
|---|---|---|
| 1 complication | 0.016 | Very small (≈0.01) |
| 3 complications | 0.048 | 0.11 (our actual result) |
| 6 complications | 0.097 | Large (≈0.57) |
| 7 complications | 0.113 | Large (≈0.72) |
Pattern: Lower \(\hat{p}\) → smaller p-value → stronger evidence against \(H_0\) (why?)
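The pattern in the table can be checked without simulation. The sketch below swaps in an exact binomial calculation (not the simulation approach used in these slides) to compute the left-tail probability for each observed count, assuming the 62 surgeries are independent with a 10% complication rate under \(H_0\). The helper name `left_tail_probability` is made up for this illustration.

```python
from math import comb

n_surgeries = 62   # sample size (from the example)
p_null = 0.10      # complication rate assumed under H0


def left_tail_probability(k, n, p):
    """Exact P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


for observed in (1, 3, 6, 7):
    p_hat = observed / n_surgeries
    tail = left_tail_probability(observed, n_surgeries, p_null)
    print(f"{observed} complications: p-hat = {p_hat:.3f}, left-tail probability = {tail:.3f}")
```

The exact values sit close to the approximate p-values in the table; small differences from the simulated 0.111 are expected because a simulation only approximates the null distribution.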
P-value \(\approx\) 0.111 means:
“If the consultant truly has the same 10% complication rate as the national average, there is about an 11% chance of observing 3 or fewer complications in 62 surgeries.”
Is this surprising?
To make decisions, we set a significance level (\(\alpha\)) in advance
Common choice: \(\alpha = 0.05\) (5%)
Decision rule: reject \(H_0\) if the p-value is less than \(\alpha\); otherwise, fail to reject \(H_0\)
Courtroom parallel: \(\alpha\) = standard for “beyond reasonable doubt”
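Our results: p-value \(\approx 0.111\) and \(\alpha = 0.05\)
Comparison: \(0.111 > 0.05\), so the p-value is larger than \(\alpha\)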
Conclusion:
We fail to reject \(H_0\) at the \(\alpha = 0.05\) level.
The data do not provide convincing evidence that the consultant has a lower complication rate than the national average.
For each p-value below, decide whether to reject \(H_0\) (using \(\alpha = 0.05\)):
| P-value | Decision? |
|---|---|
| 0.001 | |
| 0.049 | |
| 0.15 | |
| 0.50 | |
Answers: Reject, Reject, Fail to reject, Fail to reject
Notice: Smaller p-values = stronger evidence against \(H_0\)
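The quick check above can be reproduced with a few lines of Python. The function name `decide` is hypothetical, and rejecting when the p-value is strictly less than \(\alpha\) is an assumption about how ties are handled.

```python
def decide(p_value, alpha=0.05):
    """Return the decision for a p-value at significance level alpha.

    Assumption: reject only when p_value is strictly less than alpha.
    """
    return "Reject" if p_value < alpha else "Fail to reject"


p_values = [0.001, 0.049, 0.15, 0.50]
decisions = [decide(pv) for pv in p_values]

for pv, decision in zip(p_values, decisions):
    print(f"p-value = {pv}: {decision}")
```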
Important distinctions:
| What we CAN say | What we CANNOT say |
|---|---|
| Data don’t provide strong evidence against \(H_0\) | \(H_0\) is true |
| 10% is a plausible value for the consultant’s rate | Consultant’s rate is exactly 10% |
| We can’t rule out that the consultant is average | Consultant is definitely not better |
“Fail to reject” ≠ “Accept”
The consultant might have a lower rate — we just don’t have enough evidence to conclude that
Courtroom parallel: Failed to find enough evidence for the “guilty” verdict
Watch Out: Three Different “p” Terms!
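Don’t confuse these three: \(p\) (the population parameter), \(\hat{p}\) (the sample statistic), and the p-value (a probability calculated assuming \(H_0\) is true)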
They all use the letter “p” but mean completely different things!
Question: Does the majority of all payday borrowers in MI support additional regulation that would require payday lenders to do a credit check?
Data:
payday data set is available here
In words:
\(H_0:\) the proportion of all payday borrowers in MI that support additional regulation is 50%.
\(H_A:\) The majority of all payday borrowers in MI (more than 50%) support additional regulation.
In symbols:
\(H_0: p = 0.5\)
\(H_A: p > 0.5\)
Step 1: Identify the parameter
Step 2: State the hypotheses
Step 3: Calculate sample statistic
Step 4: Simulate 1000 samples under \(H_0\)
Step 5: Calculate the p-value
Open the data set payday.csv if it is not already open
Go to “Randomize” -> “Single Proportion:Hypothesis Testing”
Move variable legislation to “Variable” box
Make sure that your “Test Value” is the same as the null hypothesis value (0.5 in our case)
Choose the appropriate “Alternative hypothesis” (“Greater” in our case)
Set the “Number of simulated samples” to 1000 and press “Enter”
Your conclusions:
Decision (using \(\alpha = 0.05\)): Decide whether to reject \(H_0\)
Interpretation
In the context of the problem: The data provide statistically significant evidence that MORE than half of all payday borrowers support new legislation
Summary of the process:
State hypotheses: \(H_0\) and \(H_A\)
Collect data: Calculate observed statistic (\(\hat{p}\))
Simulate null distribution: What would we see if \(H_0\) were true?
Calculate p-value: How extreme is our observation?
Make decision: Compare p-value to \(\alpha\)
State conclusion: In context of the original question
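The whole process can be wrapped in one small function. This is a sketch under stated assumptions rather than the software used in class: the function name `simulated_p_value`, its parameters, and the seed are all made up for illustration, and it handles only one-sided alternatives. It is shown with the medical consultant numbers from earlier.

```python
import random


def simulated_p_value(n, observed_count, p0, alternative, n_sims=1000, seed=0):
    """Simulation-based p-value for a single proportion.

    n              -- sample size
    observed_count -- number of "successes" observed in the sample
    p0             -- proportion assumed under H0
    alternative    -- "less" (H_A: p < p0) or "greater" (H_A: p > p0)
    """
    if alternative not in ("less", "greater"):
        raise ValueError("alternative must be 'less' or 'greater'")

    rng = random.Random(seed)
    extreme = 0
    for _ in range(n_sims):
        # One simulated sample under H0: count of successes out of n
        count = sum(rng.random() < p0 for _ in range(n))
        if alternative == "less" and count <= observed_count:
            extreme += 1
        elif alternative == "greater" and count >= observed_count:
            extreme += 1
    return extreme / n_sims


# Medical consultant example: 3 complications in 62 surgeries, H0: p = 0.10
alpha = 0.05
p_val = simulated_p_value(n=62, observed_count=3, p0=0.10, alternative="less")
decision = "Reject H0" if p_val < alpha else "Fail to reject H0"
print(f"p-value = {p_val:.3f}: {decision}")
```

The `alternative` argument decides which tail counts as “more extreme”, matching the direction of \(H_A\).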