Intro to Hypothesis Testing: Single Proportion

Topic 3
Math 115

The Big Picture

So far we’ve learned to describe data:

  • Summarize with statistics (mean, proportion, etc.)
  • Visualize distributions

Now we learn to make inferences:

  • Draw conclusions about populations from samples

One Proportion - A Simple Starting Point

  • As we begin to explore statistical inference we will focus on studies that involve a single binary, categorical variable
  • Binary = two levels (one identified as a success)
  • The statistic of interest in these studies is a single proportion (of successes)

Examples:

  • Proportion of voters supporting a candidate
  • Proportion of patients experiencing side effects
  • Proportion of students passing an exam
  • Proportion of products that are defective

Statistical Notation: The “Hat”

In statistics, we use \(\hat{}\) (called a “hat”) to indicate an estimate from our sample:

\(p\) = Population parameter (what we want to know)

  • The TRUE proportion

\(\hat{p}\) = Sample statistic (what we can calculate)

  • Our ESTIMATE from data

Example: If 12 students prefer online classes in a sample of 50:

  • \(\hat{p} = \frac{12}{50} = 0.24\)
  • \(p\) - true proportion of all students who prefer online classes

Flipping a coin

  • What if you want to analyze whether a coin is fair (i.e., a coin whose true probability of “Head” is 0.5)?
  • Maybe we can flip it a number of times and compute the proportion of “Head” outcomes
    • Is it enough to make 10 flips? 20 flips?
    • Even if the coin is fair, do we expect exactly half of the outcomes to be “Head”?
  • Suppose we flipped it 30 times and ended up with 21 heads and 9 tails (so that the sample statistic is \(\hat{p}=\frac{21}{30}=0.7\))

  • Is that strong evidence that the coin is not fair?

  • Simulations:

    • I can simulate the flips of a fair coin
    • I can repeatedly flip coins (30 times per simulation) and see what proportion will be heads
    • After many such sequences of 30 flips I will see what the sampling distribution of the proportion of heads looks like

    🎯 Open Spinner Demo

    • Or on Moodle go to “General Information” -> “Single Proportion Simulator”
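
A minimal Python sketch of the same idea, using the standard `random` module as a stand-in for the spinner demo. The function name `simulate_head_proportions`, the seed, and the choice of 1,000 repetitions are assumptions made for illustration, not part of the course tools.

```python
import random

def simulate_head_proportions(n_flips=30, n_sims=1000, p_heads=0.5, seed=1):
    """Return one simulated proportion of heads per sequence of n_flips flips."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    proportions = []
    for _ in range(n_sims):
        # Each flip is a head with probability p_heads (0.5 for a fair coin)
        heads = sum(rng.random() < p_heads for _ in range(n_flips))
        proportions.append(heads / n_flips)
    return proportions

proportions = simulate_head_proportions()
observed = 21 / 30  # the sample statistic from our 30 real flips

# Share of fair-coin sequences with a proportion of heads at least as large as ours
share_at_least = sum(p >= observed for p in proportions) / len(proportions)
print(f"share of simulations with proportion >= {observed:.2f}: {share_at_least:.3f}")
```

If only a small share of the fair-coin sequences reach 0.7 or more, then 21 heads in 30 flips would be hard to explain by chance alone.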

Sampling Variability

Even when the population proportion \(p\) is fixed, different samples give different proportions (different values of \(\hat{p}\)).

Example: If the true proportion of left-handed people is \(p=0.13\) (13%)…

  • One sample of 100 people might have 11 left-handers: \(\hat{p}=0.11\) (11%)
  • Another sample might have 16: \(\hat{p}=0.16\) (16%)
  • Another sample might have 12: \(\hat{p}=0.12\) (12%)

This natural variation is called sampling variability

The Key Question

When we observe something unusual in our sample:

Is it a real effect, or just due to chance?

  • Could be due to sampling variability (“luck of the draw”)
  • Or could indicate a real difference in the population

Hypothesis testing helps us answer this question

Simulation Based Inference

  • Medical Consultants

    • consult dataset is available here

    • Some organ donors work with a medical consultant who helps them throughout the process

    • Assume that the average complication rate for liver donor surgeries in the United States is about 10%

    • One consultant claims she has a lower rate of complications than the national average.

    • She has served as a consultant for 62 liver donor surgeries

    • 3 (4.8%) resulted in complications

    • Sample statistic \(\hat{p} =\frac{3}{62}=0.048\)

  • Is her claim supported?

    • Let \(p\) be the consultant’s long-run true complication rate

    • Question: Is this evidence of a truly lower rate?

Notation Check

\(p\) = Population parameter (what we want)

  • The consultant’s true, long-run proportion of complications (value \(p\) is unknown)

\(\hat{p}\) = Sample statistic (what we can calculate)

  • Proportion of complications observed in the sample of 62 surgeries (\(\hat{p}=\frac{3}{62}=0.048\))

Question: Is this evidence that \(p<0.1\)?

Two Competing Explanations

How do we explain the consultant’s low complication rate?

Explanation 1: The consultant is no different from the national average

  • True rate is 10% (\(p=0.1\))
  • She just happened to get a “lucky” sample

Explanation 2: The consultant truly has a lower complication rate

  • True rate is less than 10% (\(p<0.1\))
  • The sample reflects a real difference

How do we decide which is more plausible?

The Skeptical Approach

Strategy: Start by assuming there’s nothing special going on

  1. Assume the consultant is no different (10% rate)
  2. Ask: How surprising is our data under this assumption?
  3. Decide: If very surprising → evidence against our assumption

Innocent Until Proven Guilty

This is like a courtroom:

  • Start with “innocent (\(p=0.1\)) until proven guilty (\(p<0.1\))”
  • Look for evidence against innocence
  • Need strong evidence to convict

Null Hypothesis (\(H_0\))

The null hypothesis represents the skeptical position:

\[H_0: p = 0.10\]

  • “The consultant’s complication rate equals the national average”
  • This is what we assume to be true
  • We look for evidence against it

Key idea: The null hypothesis typically represents “no effect” or “no difference”

Courtroom parallel: \(H_0\) = presumption of innocence

Alternative Hypothesis (\(H_A\))

The alternative hypothesis represents the claim we want to test:

\[H_A: p < 0.10\]

  • “The consultant’s complication rate is lower than the national average”
  • This is what we conclude if we reject \(H_0\)

Stating Hypotheses

Medical Consultant Example:

| | In words | In symbols |
|---|---|---|
| \(H_0\) | Consultant’s rate equals national average | \(p = 0.10\) |
| \(H_A\) | Consultant’s rate is lower than national average | \(p < 0.10\) |

Important:

  • Hypotheses are about the population parameter (\(p\)), not the sample statistic (\(\hat{p}\))
  • We already know \(\hat{p} = 0.048\); we’re asking about \(p\)

The Testing Strategy

  1. Assume \(H_0\) is true (consultant rate = 10%)

  2. Simulate what we’d expect to see under this assumption

  3. Compare our observed data to these simulations

  4. Decide:

    • If our data is very unusual → reject \(H_0\)
    • If our data is plausible → cannot reject \(H_0\)

What We Need: A Null Distribution

Question: If \(H_0\) is true, what values of \(\hat{p}\) should we expect?

Answer: We need the null distribution

  • The distribution of \(\hat{p}\) values we’d see if \(H_0\) were true
  • Shows us what’s “typical” under the null hypothesis
  • Lets us judge whether our observed \(\hat{p}\) is unusual

How do we get it? Simulation!

Random Number Generator

Instead of a spinner we could use a random number generator

Random numbers between 1 and 10 (e.g., random.org)

  • 1: “complication”
  • 2-10: “no complication”

To simulate one sample under \(H_0\):

  1. Draw 62 numbers (one per surgery)
  2. Count how many are 1: “complication”
  3. Calculate \(\hat{p} = \frac{\text{number of complications}}{62}\)
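
The three steps above can be sketched in Python. This uses the standard `random` module in place of random.org. The seed and the variable names are arbitrary choices for illustration.

```python
import random

rng = random.Random(2024)  # arbitrary seed, only for reproducibility

n_surgeries = 62

# Step 1: one integer from 1 to 10 per surgery; a 1 stands for "complication"
draws = [rng.randint(1, 10) for _ in range(n_surgeries)]

# Step 2: count how many draws are 1
complications = draws.count(1)

# Step 3: the simulated sample proportion under H0
p_hat_sim = complications / n_surgeries
print(f"{complications} complications out of {n_surgeries}: p-hat = {p_hat_sim:.3f}")
```

Changing the seed gives a different simulated sample, just as each student in class will get a different \(\hat{p}\).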

Let’s Try It!

Activity: Simulate your own sample under \(H_0\)

  • Go to www.random.org/integers/
  • Generate 62 random integers between 1 and 10
  • Count the number of times you get 1 (“complication”)
  • Calculate \(\hat{p}\) for your simulated sample

Class Null Distribution

Activity: Let’s build the null distribution together!

Instructions: Draw a dot above your simulated \(\hat{p}\) value

Complete the questions in In Class Activity 4 about your \(\hat{p}\) and the null distribution

Hypothesis Test Using Randomization

  • We can use simulation to approximate the null distribution
  • We simulate the outcome by spinning a spinner with 10% of the area representing “complication” and 90% representing “no complication”
  • For each simulation, spin the spinner 62 times (sample size) and record the proportion of complications in the sample
  • Repeat to obtain proportions for 1,000 simulated samples

Here are the first 100 simulations

  • Each dot = one simulated sample proportion under \(H_0\)
  • Red dashed line is our observed value (\(\hat{p} = 0.048\))
  • There are 9 simulated statistics \(\le 0.048\) out of 100

Key Question

Our question: Is \(\hat{p} = 0.048\) unusually low, or just normal variation?

  • 9 out of 100 simulations were at or below 0.048

  • Note that since the alternative hypothesis has the form “less than”, the p-value region is in the left tail

  • So the proportion of times when the simulated value was at 0.048 or below is 0.09

Is this unusual enough to reject \(H_0\)? Let’s do more simulations for a clearer picture…

  • Now, there are 111 simulated statistics \(\le 0.048\) out of 1000

Measuring Surprise: The P-Value

P-value = Probability of observing a result at least as extreme as the observed result (here, \(\hat{p} = 0.048\)) if \(H_0\) is true

Interpretation: How likely is it to see a sample proportion of 0.048 or less if the true rate is 0.10?

Courtroom parallel: P-value = strength of prosecution’s evidence

Understanding “More Extreme”

“More extreme” means in the direction of \(H_A\):

  • Since \(H_A: p < 0.10\) (consultant’s complication rate is SMALLER than 0.1)
  • “More extreme” = LOWER proportions
  • We count simulations where \(\hat{p} \leq 0.048\)

Computing the P-Value

From our 1000 simulations:

  • Count simulations where \(\hat{p} \leq 0.048\)
  • Result: 111 out of 1000

\[\text{P-value} = \frac{111}{1000} = 0.111\]

About 11.1% of simulations produced a result as extreme as (or more extreme than) what we observed
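
A minimal sketch of this computation in Python, assuming the standard `random` module as the source of randomness instead of a spinner. The function name `simulate_null_counts` and the seed are illustrative assumptions. A different seed will give a slightly different p-value, so the result will generally be close to 0.111 but not equal to it.

```python
import random

def simulate_null_counts(n=62, p_null=0.10, n_sims=1000, seed=115):
    """Simulate n_sims samples of n surgeries under H0; return complication counts."""
    rng = random.Random(seed)
    return [sum(rng.random() < p_null for _ in range(n)) for _ in range(n_sims)]

counts = simulate_null_counts()
observed_count = 3  # complications in the consultant's 62 surgeries

# Compare counts rather than rounded proportions: 3/62 is about 0.0484,
# so testing against the rounded value 0.048 would wrongly exclude it.
n_extreme = sum(c <= observed_count for c in counts)
p_value = n_extreme / len(counts)
print(f"{n_extreme} of {len(counts)} simulations had 3 or fewer complications")
print(f"simulated p-value: {p_value:.3f}")
```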

Building Intuition: What If?

How would the p-value change with different observed results?

Suppose the consultant still had 62 surgeries, but…

| Observed complications | \(\hat{p}\) | Expected p-value |
|---|---|---|
| 1 complication | 0.016 | Very small (≈0.01) |
| 3 complications | 0.048 | 0.11 (our actual result) |
| 6 complications | 0.097 | Large (≈0.57) |
| 7 complications | 0.113 | Large (≈0.72) |

Pattern: Lower \(\hat{p}\) → smaller p-value → stronger evidence against \(H_0\) (why?)
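
One way to check these values without simulation is to add up exact binomial probabilities. This is a different technique from the simulation used in these slides: it assumes the number of complications follows a Binomial(62, 0.10) distribution when \(H_0\) is true. The helper name `binom_cdf` is made up for this sketch.

```python
from math import comb

def binom_cdf(k, n, p):
    """Exact P(X <= k) when X counts successes in n independent trials."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

n, p_null = 62, 0.10
for k in (1, 3, 6, 7):
    print(f"{k} complications: p-hat = {k / n:.3f}, "
          f"P(X <= {k}) = {binom_cdf(k, n, p_null):.3f}")
```

The exact values should land near the entries in the table. For 3 complications the exact probability is roughly 0.12, slightly different from the simulated 0.111, because 1000 simulations only approximate the null distribution.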

Interpreting the P-Value

P-value \(\approx\) 0.111 means:

“If the consultant truly has the same 10% complication rate as the national average, there is about an 11% chance of observing 3 or fewer complications in 62 surgeries.”

Is this surprising?

  • Not extremely rare, but not common either
  • We need a standard to make a decision

The Significance Level (\(\alpha\))

To make decisions, we set a significance level (\(\alpha\)) in advance

Common choice: \(\alpha = 0.05\) (5%)

Decision rule:

  • If p-value \(< \alpha\): Reject \(H_0\) (result is “statistically significant”)
  • If p-value \(\geq \alpha\): Fail to reject \(H_0\) (result is not statistically significant)

Courtroom parallel: \(\alpha\) = standard for “beyond reasonable doubt”

Making a Decision

Our results:

  • P-value \(\approx\) 0.111
  • Significance level \(\alpha = 0.05\)

Comparison:

  • Is 0.111 \(<\) 0.05? No

Conclusion:

We fail to reject \(H_0\) at the \(\alpha = 0.05\) level.

The data do not provide convincing evidence that the consultant has a lower complication rate than the national average.

Practice: Making Decisions

For each p-value below, decide whether to reject \(H_0\) (using \(\alpha = 0.05\)):

P-value Decision?
0.001
0.049
0.15
0.50

Answers: Reject, Reject, Fail to reject, Fail to reject

Notice: Smaller p-values = stronger evidence against \(H_0\)
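
The decision rule is simple enough to write as a small function. The sketch below applies it to the four practice p-values. The function name `decide` is an illustrative choice.

```python
def decide(p_value, alpha=0.05):
    """Apply the decision rule: reject H0 only when the p-value is below alpha."""
    return "Reject" if p_value < alpha else "Fail to reject"

practice_p_values = [0.001, 0.049, 0.15, 0.50]
decisions = [decide(p) for p in practice_p_values]

for p, d in zip(practice_p_values, decisions):
    print(f"p-value = {p}: {d} H0")
```

Note that a p-value exactly equal to \(\alpha\) falls on the “fail to reject” side, because the rule requires p-value \(< \alpha\).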

What Does “Fail to Reject” Mean?

Important distinctions:

| What we CAN say | What we CANNOT say |
|---|---|
| Data don’t provide strong evidence against \(H_0\) | \(H_0\) is true |
| 10% is a plausible value for the consultant’s rate | Consultant’s rate is exactly 10% |
| We can’t rule out that the consultant is average | Consultant is definitely not better |

“Fail to reject” ≠ “Accept”

The consultant might have a lower rate — we just don’t have enough evidence to conclude that

Courtroom parallel: Failed to find enough evidence for the “guilty” verdict

Watch Out: Three Different “p” Terms!

Don’t confuse these three:

  • \(p\) = population parameter (the true proportion)
  • \(\hat{p}\) = sample statistic (our calculated estimate)
  • p-value = probability of observing our result (or more extreme) if \(H_0\) is true

They all use the letter “p” but mean completely different things!

Payday Loan Regulations

  • Borrowers use payday loans to get a cash advance before their next payday
  • Borrower writes a check for loan amount + service fee
  • Lender holds check until borrower’s payday
  • Very high APR equivalent (often over 300%)

Question: Does the majority of all payday borrowers in MI support additional regulation that would require payday lenders to do a credit check?

Data:

  • payday data set is available here
  • Researchers selected a random sample of 830 payday borrowers
    • 443 out of 830 (a proportion of 0.534) said they would support a regulation
    • \(\hat{p}=0.534\)
    • Let \(p\) be the true long-run proportion of all MI payday borrowers who support additional regulation

Hypotheses

In words:

  • \(H_0:\) the proportion of all payday borrowers in MI that support additional regulation is 50%.

  • \(H_A:\) The majority (more than 50%) of all payday borrowers in MI support additional regulation.

In symbols:

  • \(H_0: p = 0.5\)

  • \(H_A: p > 0.5\)

Guided Practice: Set Up

Step 1: Identify the parameter

  • \(p\) = proportion of ALL payday borrowers who support regulations

Step 2: State the hypotheses

  • \(H_0: p = 0.50\) (exactly half support)
  • \(H_A: p > 0.50\) (more than half support)

Step 3: Calculate sample statistic

  • \(\hat{p} = \frac{443}{830} = 0.534\)

Guided Practice: Simulate

Step 4: Simulate 1000 samples under \(H_0\)

  • Use the Randomize module in Jamovi

Step 5: Calculate the p-value

  • The Randomize module will do this for us
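
For readers who want to see what Steps 4 and 5 involve, here is a rough Python sketch of the same kind of simulation. It is not the Randomize module. It uses the standard `random` module, and the seed and variable names are assumptions for illustration. Because \(H_A\) has the form “greater than”, the p-value region is in the right tail.

```python
import random

rng = random.Random(830)  # arbitrary seed for reproducibility

n = 830          # sample size
observed = 443   # borrowers in the sample who support regulation
p_null = 0.5     # value of p under H0
n_sims = 1000    # number of simulated samples

at_least_as_extreme = 0
for _ in range(n_sims):
    # One simulated sample: each borrower supports regulation with probability 0.5
    supporters = sum(rng.random() < p_null for _ in range(n))
    # "More extreme" points toward H_A, so count samples with 443 or more supporters
    if supporters >= observed:
        at_least_as_extreme += 1

p_value = at_least_as_extreme / n_sims
print(f"simulated p-value: {p_value:.3f}")
```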

Using Randomize Module

  • Open the data set payday.csv if it is not already open

  • Go to “Randomize” -> “Single Proportion:Hypothesis Testing”

  • Move variable legislation to “Variable” box

  • Make sure that your “Test Value” is the same as the null hypothesis value (0.5 in our case)

  • Choose the appropriate “Alternative hypothesis” (“Greater” in our case)

  • Set the “Number of simulated samples” to 1000 and press “Enter”

  • Your screen should look like the picture below
  • Note that your p-value could be slightly different from the one in the picture

Guided Practice: Interpret Results

Your conclusions:

  1. Decision (using \(\alpha = 0.05\)): Decide whether to reject \(H_0\)

  2. Interpretation

  • The p-value of the test of significance is \(< 0.05\)
    • We reject the null hypothesis in favor of the alternative hypothesis

    In the context of the problem: The data provide statistically significant evidence that MORE than half of all payday borrowers support new legislation

The Hypothesis Testing Framework

Summary of the process:

  1. State hypotheses: \(H_0\) and \(H_A\)

  2. Collect data: Calculate observed statistic (\(\hat{p}\))

  3. Simulate null distribution: What would we see if \(H_0\) were true?

  4. Calculate p-value: How extreme is our observation?

  5. Make decision: Compare p-value to \(\alpha\)

  6. State conclusion: In context of the original question
