Hypothesis Testing with Randomization

Chapter 11
Math 215

Flavor Preferences

Research question: Do people on the East Coast have a higher preference for cola than people on the West Coast?
soda dataset
2 variables
- location: East or West
- drink preference: Orange or Cola
60 individuals (34 from East, 26 from West)

location	Cola	Orange	total
East	28	6	34
West	19	7	26
total	47	13	60

Standardized barplot showing proportions of drink preferences

Difference in proportions

Success: drink = Cola
Statistic of interest: difference in proportions \[\hat{p}_E-\hat{p}_W\]
Observed difference: \[\frac{28}{34}-\frac{19}{26}=0.09276\]

Hypothesis Test

From the sample it appears that there is a stronger preference for cola on the East Coast
It may be that there is no real difference in preference in the population, and the observed difference is not surprising when selecting a sample of this size from the population
A hypothesis test states these two possibilities formally as hypotheses then weighs them against each other using the results from the sample as evidence

Hypotheses

The null hypothesis, denoted \(H_0\), represents a skeptical perspective or a claim of no difference
The alternative hypothesis, denote \(H_A\), represents an alternative claim of difference.
As statisticians, we usually establish hypotheses before viewing the data in order to avoid bias
Depending on you research question, you can have \(H_A\) in form “\(<\)” or “\(\neq\)”

In words:

    \(H_0:\) Location has no
    effect on preference for
    cola over orange soda.

    \(H_A:\) There is a higher
    preference for cola
    over orange soda on the
    East Coast than on the
    West Coast.

In symbols:

\(H_0: p_E - p_W = 0\)
\(H_A: p_E - p_W > 0\)

Note that \(p_E\) and \(p_W\) are parameters,i.e. long-run proportions of all people who prefer Cola on East Coast and West Coast

Null Distribution

We test the null hypothesis by comparing the observed value of the statistic to a null distribution
If the null hypothesis is true and we select different samples of the same size from the population, we would expect the value of the statistic to vary between samples
The null distribution is the distribution that describes those values
It is an example of a sampling distribution (distribution of a statistic)

Null Distribution Using Random Permutation

Suppose that I suspect Hope students that sit in the front of class had a higher high school GPA than students that sit in the back
I ask each of you to write your high school GPA on a sheet of paper and I calculate the difference in mean GPA for students in the front and in the back
I want to know how that difference compares to differences I would measure if there is no difference

The GPAs I collected is my best picture of what the distribution of GPAs is like at Hope
To simulate the null hypothesis being true (no difference between front and back), I could mix up your GPAs and hand them back to you
Then I could collect them again and measure the difference in means between front and back
If I do this many times it will give me a good idea of what the differences would look like if the null hypothesis is true (the null distribution)

Mixing up the values of the response variable as in the GPA example is called random permutation
I can use random permutation to create a null distribution
Usually we will do this with a computer, because we want to calculate the statistic for 1,000 or 10,000 random permutations

Here is the original soda data with 5 random permutations.

Random Shuffle

Now let’s simulate 100 samples assuming true null hypothesis
We’ll calculate a difference in proportions for each permutation
Use infer package

set.seed(8675309)

library(infer)
soda_perm <- soda |> 
  specify(drink ~ location, success = "Cola") |>
  hypothesize("independence") |>
  generate(reps = 100, type = "permute") |>
  calculate(stat = "diff in props", order = c("East", "West"))

Dot plot of 100 differences in randomized proportions (null distribution), showing observed difference as dashed vertical line.

Red dots are as large or larger than the observed test statistic.

p-Value

To test the null hypothesis (\(p_E-p_W = 0\)) we consider how probable it would be to get a difference in proportions that is at least as large as the observed difference if \(H_0\) is true
This probability is called a p-value
We use the null distribution to calculate the p-value

There are 28 differences in randomization proportions that are greater than or equal to the observed value (0.09276). So we estimate the p-value to be 28/100 = 0.28.

The p-value is the proportion of red dots.

Significance Level

Before we conduct a study, we define a significance level, denoted \(\alpha\)
We decide that in order to reject the null hypothesis as false, the p-value must be less than \(\alpha\)
The significance level is the standard of evidence we will use to judge the null hypothesis
We presume the null hypothesis is true, but we are willing to reject it if the evidence against it is strong enough (the p-value is less than \(\alpha\))

Typical values for \(\alpha\) are 0.05 and 0.01
Sometimes other values are used
Unless otherwise noted, we will always use \(\alpha = 0.05\)
The significance level \(\alpha\) is the probability of rejecting the null hypothesiswhen it is true
The error that you make in this case is called Type I Error

Conclusion

In the soda example, the observed difference in proportions (\(\hat{p}_E-\hat{p}_W = 0.09276\)) does not allow us to reject the null hypothesis (p = 0.28) at the \(\alpha = .05\) significance level.
So, our formal conclusion is that we failed to reject the null hypothesis
The difference in the proportions is not statistically significant
This means that it is plausible that there is no difference in the proportions of people who prefer cola to orange soda between the East and West Coast.