Intro to Hypothesis Testing: Single Proportion

Topic 3
Math 115

The Big Picture

So far we’ve learned to describe data:

Summarize with statistics (mean, proportion, etc.)
Visualize distributions

Now we learn to make inferences:

Draw conclusions about populations from samples

One Proportion - A Simple Starting Point

As we begin to explore statistical inference we will focus on studies that involve a single binary, categorical variable
Binary = two levels (one identified as a success)
The statistic of interest in these studies is a single proportion (of successes)

Examples:

Proportion of voters supporting a candidate
Proportion of patients experiencing side effects
Proportion of students passing an exam
Proportion of products that are defective

Statistical Notation: The “Hat”

In statistics, we use \(\hat{}\) (called a “hat”) to indicate an estimate from our sample:

\(p\) = Population parameter (what we want to know)

The TRUE proportion

\(\hat{p}\) = Sample statistic (what we can calculate)

Our ESTIMATE from data

Example: If 12 students prefer online classes in a sample of 50:

\(\hat{p} = \frac{12}{50} = 0.24\)
\(p\) - true proportion of all students who prefer online classes

Flipping a coin

What if you want to analyze whether a coin is fair (i.e., a coin whose true probability of “Head” is 0.5)?
Maybe we can start by flipping it a number of times and computing the proportion of “Head” outcomes
- Is it enough to make 10 flips? 20 flips?
- Even if the coin is fair, do we expect exactly half of the outcomes to be “Head”?

Suppose we flipped it 30 times and ended up with 21 heads and 9 tails (so that the sample statistic \(\hat{p}=\frac{21}{30}=0.7\))
Is that strong evidence that the coin is not fair?
Simulations:
- I can simulate the flips of a fair coin
- I can repeatedly flip coins (30 times per simulation) and see what proportion will be heads
- After many such sequences of 30 flips, I will see the sampling distribution of the proportion of heads
🎯 Open Spinner Demo
- Or on Moodle go to “General Information” -> “Single Proportion Simulator”
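For readers who prefer code to the spinner, the fair-coin simulation described above can be sketched in Python. This is only an illustrative sketch: the function name `simulate_head_proportions`, the seed, and the choice of 1,000 repetitions are assumptions, not part of the course materials.

```python
import random

def simulate_head_proportions(n_flips=30, n_sims=1000, p_head=0.5, seed=115):
    """Return one proportion of heads per simulated sequence of flips."""
    rng = random.Random(seed)  # fixed seed (arbitrary) so the sketch is reproducible
    proportions = []
    for _ in range(n_sims):
        # Each flip lands heads with probability p_head
        heads = sum(rng.random() < p_head for _ in range(n_flips))
        proportions.append(heads / n_flips)
    return proportions

props = simulate_head_proportions()

# Share of fair-coin sequences that gave at least 21 heads out of 30
share_extreme = sum(p >= 21 / 30 for p in props) / len(props)

print(f"mean of simulated proportions: {sum(props) / len(props):.3f}")
print(f"share with p-hat >= 0.7: {share_extreme:.3f}")
```

The list `props` plays the role of the sampling distribution of the proportion of heads; the last line reports how often a fair coin produced a result at least as lopsided as 21 heads in 30 flips.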

Sampling Variability

Even when the population proportion \(p\) is fixed, different samples give different proportions (different values of \(\hat{p}\)).

Example: If the true proportion of left-handed people is \(p=0.13\) (13%)…

One sample of 100 people might have 11 left-handers: \(\hat{p}=0.11\) (11%)
Another sample might have 16: \(\hat{p}=0.16\) (16%)
Another sample might have 12: \(\hat{p}=0.12\) (12%)

This natural variation is called sampling variability
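A small sketch can make sampling variability concrete. It assumes the true proportion really is 0.13 and draws a few independent samples of 100 people; the helper name `sample_proportion` and the seed are hypothetical choices for illustration.

```python
import random

TRUE_P = 0.13       # assumed true proportion of left-handers
SAMPLE_SIZE = 100   # people per sample

rng = random.Random(13)  # arbitrary seed for reproducibility

def sample_proportion(p, n, rng):
    """Draw one sample of size n and return its sample proportion."""
    left_handers = sum(rng.random() < p for _ in range(n))
    return left_handers / n

# Five samples from the same population give five values of p-hat
p_hats = [sample_proportion(TRUE_P, SAMPLE_SIZE, rng) for _ in range(5)]
print(p_hats)
```

The population value never changes, yet the printed sample proportions typically differ from one another and from 0.13.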

The Key Question

When we observe something unusual in our sample:

Is it a real effect, or just due to chance?

Could be due to sampling variability (“luck of the draw”)
Or could indicate a real difference in the population

Hypothesis testing helps us answer this question

Simulation Based Inference

Medical Consultants
- consult dataset is available here
- Some organ donors work with a medical consultant who helps them throughout the process
- Assume that the average complication rate for liver donor surgeries in the United States is about 10%
- One consultant claims she has a lower rate of complications than the national average.
- She has served as a consultant for 62 liver donor surgeries
- 3 (4.8%) resulted in complications
- Sample statistic \(\hat{p} =\frac{3}{62}=0.048\)
Is her claim supported?
- Let \(p\) be the consultant’s long-run true complication rate
Question: Is this evidence of a truly lower rate?

Notation Check

\(p\) = Population parameter (what we want to know)

The consultant’s true, long-run proportion of complications (value \(p\) is unknown)

\(\hat{p}\) = Sample statistic (what we can calculate)

Proportion of complications observed in the sample of 62 surgeries (\(\hat{p}=0.048\))

Question: Is this evidence that \(p<0.1\)?

Two Competing Explanations

How do we explain the consultant’s low complication rate?

Explanation 1: The consultant is no different from the national average

True rate is 10% (\(p=0.1\))
She just happened to get a “lucky” sample

Explanation 2: The consultant truly has a lower complication rate

True rate is less than 10% (\(p<0.1\))
The sample reflects a real difference

How do we decide which is more plausible?

The Skeptical Approach

Strategy: Start by assuming there’s nothing special going on

Assume the consultant is no different (10% rate)
Ask: How surprising is our data under this assumption?
Decide: If very surprising → evidence against our assumption

Innocent Until Proven Guilty

This is like a courtroom:
Start with “innocent (\(p=0.1\)) until proven guilty (\(p<0.1\))”
Look for evidence against innocence
Need strong evidence to convict

Null Hypothesis (\(H_0\))

The null hypothesis represents the skeptical position:

\[H_0: p = 0.10\]

“The consultant’s complication rate equals the national average”
This is what we assume to be true
We look for evidence against it

Key idea: The null hypothesis typically represents “no effect” or “no difference”

Courtroom parallel: \(H_0\) = presumption of innocence

Alternative Hypothesis (\(H_A\))

The alternative hypothesis represents the claim we want to test:

\[H_A: p < 0.10\]

“The consultant’s complication rate is lower than the national average”
This is what we conclude if we reject \(H_0\)

Stating Hypotheses

Medical Consultant Example:

	In words	In symbols
\(H_0:\)	Consultant’s rate equals national average	\(p = 0.10\)
\(H_A:\)	Consultant’s rate is lower than national average	\(p < 0.10\)

Important:

Hypotheses are about the population parameter (\(p\)), not the sample statistic (\(\hat{p}\))
We already know \(\hat{p} = 0.048\); we’re asking about \(p\)

The Testing Strategy

Assume \(H_0\) is true (consultant rate = 10%)
Simulate what we’d expect to see under this assumption
Compare our observed data to these simulations
Decide:
- If our data is very unusual → reject \(H_0\)
- If our data is plausible → cannot reject \(H_0\)

What We Need: A Null Distribution

Question: If \(H_0\) is true, what values of \(\hat{p}\) should we expect?

Answer: We need the null distribution

The distribution of \(\hat{p}\) values we’d see if \(H_0\) were true
Shows us what’s “typical” under the null hypothesis
Lets us judge whether our observed \(\hat{p}\) is unusual

How do we get it? Simulation!

Random Number Generator

Instead of a spinner we could use a random number generator

Random numbers between 1 and 10 (e.g., random.org)

1: “complication”
2-10: “no complication”

To simulate one sample under \(H_0\):

Draw 62 numbers (one per surgery)
Count how many are 1: “complication”
Calculate \(\hat{p} = \frac{\text{number of complications}}{62}\)
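The three steps above can be sketched in Python, with the standard library's `random.randint` standing in for random.org. The function name `simulate_one_sample` and the seed are assumptions made for illustration only.

```python
import random

def simulate_one_sample(n_surgeries=62, rng=None):
    """Simulate one sample under H0 (10% complication rate); return p-hat."""
    rng = rng or random.Random()
    # One integer from 1 to 10 per surgery; the value 1 means "complication"
    draws = [rng.randint(1, 10) for _ in range(n_surgeries)]
    complications = draws.count(1)
    return complications / n_surgeries

p_hat_sim = simulate_one_sample(rng=random.Random(2024))  # arbitrary seed
print(f"simulated p-hat: {p_hat_sim:.3f}")
```

Because each of the ten integers is equally likely, the value 1 appears with probability 0.10, which matches the rate assumed under \(H_0\).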

Let’s Try It!

Activity: Simulate your own sample under \(H_0\)

Go to www.random.org/integers/
Generate 62 random integers between 1 and 10
Count the number of times you get 1 (“complication”)
Calculate \(\hat{p}\) for your simulated sample

Class Null Distribution

Activity: Let’s build the null distribution together!

Instructions: Draw a dot above your simulated \(\hat{p}\) value

Complete the questions in In Class Activity 4 about your \(\hat{p}\) and the null distribution

Hypothesis Test Using Randomization

We can use simulation to approximate the null distribution
We simulate the outcome by spinning a spinner with 10% of the area representing “complication” and 90% representing “no complication”
For each simulation, spin the spinner 62 times (sample size) and record the proportion of complications in the sample
Repeat to obtain proportions for 1,000 simulated samples
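As a rough stand-in for the spinner, the sketch below repeats the 62-surgery simulation 1,000 times and prints a text version of the dot plot. The names `simulate_null_counts` and `tally`, the seed, and the scale of one star per five simulations are illustrative assumptions.

```python
import random
from collections import Counter

def simulate_null_counts(n=62, p0=0.10, n_sims=1000, seed=62):
    """Return the number of complications in each simulated sample under H0."""
    rng = random.Random(seed)  # arbitrary seed for reproducibility
    return [sum(rng.random() < p0 for _ in range(n)) for _ in range(n_sims)]

counts = simulate_null_counts()
tally = Counter(counts)

# Text histogram of the null distribution: one star per 5 simulated samples
for k in sorted(tally):
    print(f"{k:2d} complications (p-hat = {k / 62:.3f}): {'*' * (tally[k] // 5)}")
```

The histogram should be centered near 6 complications (10% of 62), which is what is typical when \(H_0\) is true.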

Here are the first 100 simulations

Each dot = one simulated sample proportion under \(H_0\)
Red dashed line is our observed value (\(\hat{p} = 0.048\))
There are 9 simulated statistics \(\le 0.048\) out of 100

Key Question

Our question: Is \(\hat{p} = 0.048\) unusually low, or just normal variation?

9 out of 100 simulations were at or below 0.048
Note that since the alternative hypothesis has the form “less than”, the p-value region is in the left tail
So the proportion of simulations with a value of 0.048 or below is 0.09

Is this unusual enough to reject \(H_0\)? Let’s do more simulations for a clearer picture…

Now, there are 111 simulated statistics \(\le 0.048\) out of 1000

Measuring Surprise: The P-Value

P-value = Probability of observing a result at least as extreme as the observed result (i.e. \(\hat{p} = 0.048\)) if \(H_0\) is true

Interpretation: How likely is it to see a sample proportion of 0.048 or less if the true rate is 0.10?

Courtroom parallel: P-value = strength of prosecution’s evidence

Understanding “More Extreme”

“More extreme” means in the direction of \(H_A\):

Since \(H_A: p < 0.10\) (consultant’s complication rate is SMALLER than 0.1)
“More extreme” = LOWER proportions
We count simulations where \(\hat{p} \leq 0.048\)

Computing the P-Value

From our 1000 simulations:

Count simulations where \(\hat{p} \leq 0.048\)
Result: 111 out of 1000

\[\text{P-value} = \frac{111}{1000} = 0.111\]

About 11.1% of simulations produced a result as extreme as (or more extreme than) what we observed
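The counting step can be sketched as follows. The helper `left_tail_p_value` and the seed are hypothetical, and a fresh set of simulations will not reproduce the 111 out of 1000 from the slides exactly. The sketch compares complication counts (3 or fewer) rather than rounded proportions, since \(3/62 \approx 0.0484\) is slightly above the rounded value 0.048.

```python
import random

def left_tail_p_value(simulated_counts, observed_count):
    """Proportion of simulated counts at or below the observed count."""
    at_or_below = sum(c <= observed_count for c in simulated_counts)
    return at_or_below / len(simulated_counts)

rng = random.Random(3)  # arbitrary seed for reproducibility

# 1000 simulated samples of 62 surgeries under H0: p = 0.10
sims = [sum(rng.random() < 0.10 for _ in range(62)) for _ in range(1000)]

p_value = left_tail_p_value(sims, observed_count=3)
print(f"simulated p-value: {p_value:.3f}")
```

Different seeds give slightly different p-values, all in the neighborhood of the 0.111 reported above.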

Building Intuition: What If?

How would the p-value change with different observed results?

Suppose the consultant still had 62 surgeries, but…

Observed complications	\(\hat{p}\)	Expected p-value
1 complication	0.016	Very small (≈0.01)
3 complications	0.048	0.11 (our actual result)
6 complications	0.097	Large (≈0.57)
7 complications	0.113	Large (≈0.72)

Pattern: Lower \(\hat{p}\) → smaller p-value → stronger evidence against \(H_0\) (why?)
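The table's p-values can be checked without simulation. The sketch below uses the exact binomial left-tail probability, which is a swapped-in technique rather than the simulation method used in these slides; the function name `binomial_left_tail` is an assumption.

```python
from math import comb

def binomial_left_tail(k, n=62, p0=0.10):
    """Exact P(X <= k) when X ~ Binomial(n, p0)."""
    return sum(comb(n, i) * p0**i * (1 - p0) ** (n - i) for i in range(k + 1))

for k in (1, 3, 6, 7):
    print(f"{k} complications: p-hat = {k / 62:.3f}, "
          f"exact left-tail probability = {binomial_left_tail(k):.3f}")
```

The exact values should land close to the table's approximate entries, with small differences from the simulated 0.111 because simulation results vary from run to run.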

Interpreting the P-Value

P-value \(\approx\) 0.111 means:

“If the consultant truly has the same 10% complication rate as the national average, there is about an 11% chance of observing 3 or fewer complications in 62 surgeries.”

Is this surprising?

Not extremely rare, but not common either
We need a standard to make a decision

The Significance Level (\(\alpha\))

To make decisions, we set a significance level (\(\alpha\)) in advance

Common choice: \(\alpha = 0.05\) (5%)

Decision rule:

If p-value \(< \alpha\): Reject \(H_0\) (result is “statistically significant”)
If p-value \(\geq \alpha\): Fail to reject \(H_0\) (result is not statistically significant)

Courtroom parallel: \(\alpha\) = standard for “beyond reasonable doubt”

Making a Decision

Our results:

P-value \(\approx\) 0.111
Significance level \(\alpha = 0.05\)

Comparison:

Is 0.111 \(<\) 0.05? No

Conclusion:

We fail to reject \(H_0\) at the \(\alpha = 0.05\) level.

The data do not provide convincing evidence that the consultant has a lower complication rate than the national average.
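The decision rule amounts to a single comparison, sketched here with a hypothetical helper named `decide`:

```python
def decide(p_value, alpha=0.05):
    """Apply the decision rule: reject H0 only when p-value < alpha."""
    return "reject H0" if p_value < alpha else "fail to reject H0"

print(decide(0.111))  # the consultant example
```

Note that a p-value exactly equal to \(\alpha\) falls on the “fail to reject” side, because the rule uses a strict inequality.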

Practice: Making Decisions

For each p-value below, decide whether to reject \(H_0\) (using \(\alpha = 0.05\)):

P-value	Decision?
0.001
0.049
0.15
0.50

Answers: Reject, Reject, Fail to reject, Fail to reject

Notice: Smaller p-values = stronger evidence against \(H_0\)

What Does “Fail to Reject” Mean?

Important distinctions:

What we CAN say	What we CANNOT say
Data don’t provide strong evidence against \(H_0\)	\(H_0\) is true
10% is a plausible value for the consultant’s rate	Consultant’s rate is exactly 10%
We can’t rule out that the consultant is average	Consultant is definitely not better

“Fail to reject” ≠ “Accept”

The consultant might have a lower rate — we just don’t have enough evidence to conclude that

Courtroom parallel: Failed to find enough evidence for the “guilty” verdict

Watch Out: Three Different “p” Terms!

Don’t confuse these three:

\(p\) = population parameter (the true proportion)
\(\hat{p}\) = sample statistic (our calculated estimate)
p-value = probability of observing our result (or more extreme) if \(H_0\) is true

They all use the letter “p” but mean completely different things!

Payday Loan Regulations

Borrowers use payday loans to get a cash advance before their next payday
Borrower writes a check for loan amount + service fee
Lender holds check until borrower’s payday
Very high APR equivalent (often over 300%)

Question: Does the majority of all payday borrowers in MI support additional regulation that would require payday lenders to do a credit check?

Data:

payday data set is available here
Researchers selected a random sample of 830 payday borrowers
- 443 out of 830 (a proportion of 0.534) said they would support the regulation
- \(\hat{p}=0.534\)
- Let \(p\) be the true long-run proportion of all MI payday borrowers who support additional regulation

Hypotheses

In words:

\(H_0:\) The proportion of all payday borrowers in MI who support additional regulation is 50%.
\(H_A:\) The majority (more than 50%) of all payday borrowers in MI support additional regulation.

In symbols:

\(H_0: p = 0.5\)
\(H_A: p > 0.5\)

Guided Practice: Set Up

Step 1: Identify the parameter

\(p\) = proportion of ALL payday borrowers who support regulations

Step 2: State the hypotheses

\(H_0: p = 0.50\) (exactly half support)
\(H_A: p > 0.50\) (more than half support)

Step 3: Calculate sample statistic

\(\hat{p} = \frac{443}{830} = 0.534\)

Guided Practice: Simulate

Step 4: Simulate 1000 samples under \(H_0\)

Use the Randomize module in Jamovi

Step 5: Calculate the p-value

The Randomize module will do this for us

Using Randomize Module

Open the data set payday.csv if it is not already open
Go to “Randomize” -> “Single Proportion:Hypothesis Testing”
Move variable legislation to “Variable” box
Make sure that your “Test Value” is the same as the null hypothesis value (0.5 in our case)
Choose the appropriate “Alternative hypothesis” (“Greater” in our case)
Set the “Number of simulated samples” to 1000 and press “Enter”

Your screen should look like the picture below
Note that your p-value could be slightly different from the one in the picture
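For readers without Jamovi, the same idea can be sketched in Python. This is not the Randomize module's implementation, only a rough equivalent under stated assumptions: 1,000 simulated samples of 830 borrowers with \(p = 0.5\), a right-tail count because \(H_A\) is “greater”, and an arbitrary seed. The names `N`, `OBSERVED`, `P0`, and `sim_counts` are illustrative.

```python
import random

N = 830          # sample size
OBSERVED = 443   # borrowers who said they would support the regulation
P0 = 0.5         # proportion assumed under H0
N_SIMS = 1000    # number of simulated samples

rng = random.Random(830)  # arbitrary seed for reproducibility

# Number of supporters in each simulated sample under H0
sim_counts = [sum(rng.random() < P0 for _ in range(N)) for _ in range(N_SIMS)]

# H_A is "greater", so count simulations at or above the observed count
p_value = sum(c >= OBSERVED for c in sim_counts) / N_SIMS
print(f"simulated p-value: {p_value:.3f}")
```

As with Jamovi, the simulated p-value changes a little from run to run.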

Guided Practice: Interpret Results

Your conclusions:

Decision (using \(\alpha = 0.05\)): Decide whether to reject \(H_0\)
Interpretation

The p-value of the test of significance is \(< 0.05\)
- We reject the null hypothesis in favor of the alternative hypothesis
In the context of the problem: The data provide statistically significant evidence that MORE than half of all payday borrowers support new legislation

The Hypothesis Testing Framework

Summary of the process:

State hypotheses: \(H_0\) and \(H_A\)
Collect data: Calculate observed statistic (\(\hat{p}\))
Simulate null distribution: What would we see if \(H_0\) were true?
Calculate p-value: How extreme is our observation?
Make decision: Compare p-value to \(\alpha\)
State conclusion: In context of the original question

References

Introduction to Modern Statistics (2e) textbook by Mine Cetinkaya-Rundel and Johanna Hardin
Chapter 11, Section 16.1