Hypothesis Testing with Randomization

Chapter 11
Math 215

Flavor Preferences

  • Research question: Do people on the East Coast have a higher preference for cola than people on the West Coast?
  • soda dataset
  • 2 variables
    • location: East or West
    • drink preference: Orange or Cola
  • 60 individuals (34 from East, 26 from West)

Results (EDA)

location   Cola   Orange   total
East         28        6      34
West         19        7      26
total        47       13      60
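The table above can be reproduced in R. This is a minimal sketch that rebuilds a stand-in data frame from the four cell counts, since the actual soda dataset is not shown here; the column names location and drink follow the variable list, and the row order is an assumption.

```r
# Rebuild one row per person from the four cell counts in the table.
# Assumption: the real soda dataset has the same columns but may be ordered differently.
soda <- data.frame(
  location = rep(c("East", "East", "West", "West"), times = c(28, 6, 19, 7)),
  drink    = rep(c("Cola", "Orange", "Cola", "Orange"), times = c(28, 6, 19, 7))
)

# Two-way table of counts, then the same table with row and column totals
counts <- table(soda$location, soda$drink)
addmargins(counts)

# Row proportions: share of each drink within each location
# (these are the values a standardized barplot displays)
prop.table(counts, margin = 1)
```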

Standardized barplot showing proportions of drink preferences

Difference in proportions

  • Success: drink = Cola
  • Statistic of interest: difference in proportions \[\hat{p}_E-\hat{p}_W\]
  • Observed difference: \[\frac{28}{34}-\frac{19}{26}=0.09276\]
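As a quick check of the arithmetic, the observed difference can be computed directly from the counts in the table. The variable names below are illustrative only.

```r
# Counts taken from the results table
cola_east <- 28; n_east <- 34
cola_west <- 19; n_west <- 26

p_hat_east <- cola_east / n_east  # sample proportion preferring cola, East
p_hat_west <- cola_west / n_west  # sample proportion preferring cola, West

obs_diff <- p_hat_east - p_hat_west
round(obs_diff, 5)  # 0.09276
```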

Hypothesis Test

  • From the sample it appears that there is a stronger preference for cola on the East Coast
  • It may be that there is no real difference in preference in the population, and the observed difference is not surprising when selecting a sample of this size from the population
  • A hypothesis test states these two possibilities formally as hypotheses then weighs them against each other using the results from the sample as evidence

Hypotheses

  • The null hypothesis, denoted \(H_0\), represents a skeptical perspective or a claim of no difference
  • The alternative hypothesis, denoted \(H_A\), represents an alternative claim of difference.
  • As statisticians, we usually establish hypotheses before viewing the data in order to avoid bias
  • Depending on your research question, \(H_A\) can take the form “\(>\)”, “\(<\)”, or “\(\neq\)”

In words:

    \(H_0:\) Location has no effect on preference for cola over orange soda.
    \(H_A:\) There is a higher preference for cola over orange soda on the East Coast than on the West Coast.

In symbols:

    \(H_0: p_E - p_W = 0\)
    \(H_A: p_E - p_W > 0\)
  • Note that \(p_E\) and \(p_W\) are parameters, i.e., the long-run proportions of all people on the East Coast and on the West Coast, respectively, who prefer cola

Null Distribution

  • We test the null hypothesis by comparing the observed value of the statistic to a null distribution
  • If the null hypothesis is true and we select different samples of the same size from the population, we would expect the value of the statistic to vary between samples
  • The null distribution is the distribution that describes those values
  • It is an example of a sampling distribution (distribution of a statistic)

Null Distribution Using Random Permutation

  • Suppose that I suspect Hope students who sit in the front of class had a higher high school GPA than students who sit in the back
  • I ask each of you to write your high school GPA on a sheet of paper and I calculate the difference in mean GPA for students in the front and in the back
  • I want to know how that difference compares to the differences I would measure if seating position had no relationship to GPA
  • The GPAs I collected are my best picture of what the distribution of GPAs is like at Hope
  • To simulate the null hypothesis being true (no difference between front and back), I could mix up your GPAs and hand them back to you
  • Then I could collect them again and measure the difference in means between front and back
  • If I do this many times it will give me a good idea of what the differences would look like if the null hypothesis is true (the null distribution)
  • Mixing up the values of the response variable as in the GPA example is called random permutation
  • I can use random permutation to create a null distribution
  • Usually we will do this with a computer, because we want to calculate the statistic for 1,000 or 10,000 random permutations
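A single random permutation can be sketched in base R with sample(), which shuffles the response values while the group labels stay fixed. The data frame is a stand-in rebuilt from the soda counts, and the helper diff_in_props() and the seed are hypothetical, not part of any package.

```r
# Stand-in for the soda data, rebuilt from the table counts (assumed layout)
soda <- data.frame(
  location = rep(c("East", "East", "West", "West"), times = c(28, 6, 19, 7)),
  drink    = rep(c("Cola", "Orange", "Cola", "Orange"), times = c(28, 6, 19, 7))
)

# Hypothetical helper: proportion preferring cola in the East minus the West
diff_in_props <- function(drink, location) {
  mean(drink[location == "East"] == "Cola") -
    mean(drink[location == "West"] == "Cola")
}

set.seed(1)  # arbitrary seed, used only for reproducibility

# Mix up the responses: every person keeps their location
# but receives a randomly reassigned drink preference
shuffled_drink <- sample(soda$drink)

diff_in_props(soda$drink, soda$location)      # observed difference
diff_in_props(shuffled_drink, soda$location)  # one difference under the null
```

Shuffling keeps the overall counts (47 Cola, 13 Orange) unchanged; only their assignment to East and West varies.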

Here is the original soda data with 5 random permutations.

Random Shuffle

  • Now let’s simulate 100 samples assuming the null hypothesis is true
  • We’ll calculate a difference in proportions for each permutation
  • Use infer package
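library(infer)

set.seed(8675309)  # fix the random seed so the permutations are reproducible

# Build the null distribution: shuffle drink labels, then recompute the statistic
soda_perm <- soda |>
  specify(drink ~ location, success = "Cola") |>
  hypothesize(null = "independence") |>
  generate(reps = 100, type = "permute") |>
  calculate(stat = "diff in props", order = c("East", "West"))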

Dot plot of 100 differences in randomized proportions (null distribution), showing observed difference as dashed vertical line.

Red dots mark randomized differences that are at least as large as the observed difference.

p-Value

  • To test the null hypothesis (\(p_E-p_W = 0\)) we consider how probable it would be to get a difference in proportions that is at least as large as the observed difference if \(H_0\) is true
  • This probability is called a p-value
  • We use the null distribution to calculate the p-value

There are 28 randomized differences in proportions that are greater than or equal to the observed value (0.09276), so we estimate the p-value to be 28/100 = 0.28.

The p-value is the proportion of red dots.
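The same calculation can be sketched in base R: build a null distribution by permutation, then take the proportion of permuted differences at least as large as the observed one. The data frame and the helper diff_in_props() are stand-ins. Because base R draws different permutations than infer does, the estimate will generally not equal 0.28 exactly. (infer also provides get_p_value() with direction = "greater" for this step.)

```r
# Stand-in for the soda data, rebuilt from the table counts (assumed layout)
soda <- data.frame(
  location = rep(c("East", "East", "West", "West"), times = c(28, 6, 19, 7)),
  drink    = rep(c("Cola", "Orange", "Cola", "Orange"), times = c(28, 6, 19, 7))
)

# Hypothetical helper: proportion preferring cola in the East minus the West
diff_in_props <- function(drink, location) {
  mean(drink[location == "East"] == "Cola") -
    mean(drink[location == "West"] == "Cola")
}

obs_diff <- diff_in_props(soda$drink, soda$location)

set.seed(8675309)

# 100 permuted differences: an approximate null distribution
null_diffs <- replicate(100, diff_in_props(sample(soda$drink), soda$location))

# One-sided p-value for H_A: p_E - p_W > 0
# (proportion of permuted differences at least as large as the observed one)
p_value <- mean(null_diffs >= obs_diff)
p_value
```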

Significance Level

  • Before we conduct a study, we define a significance level, denoted \(\alpha\)
  • We decide that in order to reject the null hypothesis as false, the p-value must be less than \(\alpha\)
  • The significance level is the standard of evidence we will use to judge the null hypothesis
  • We presume the null hypothesis is true, but we are willing to reject it if the evidence against it is strong enough (the p-value is less than \(\alpha\))
  • Typical values for \(\alpha\) are 0.05 and 0.01
  • Sometimes other values are used
  • Unless otherwise noted, we will always use \(\alpha = 0.05\)
  • The significance level \(\alpha\) is the probability of rejecting the null hypothesis when it is true
  • The error that you make in this case is called a Type I error

Conclusion

  • In the soda example, the observed difference in proportions (\(\hat{p}_E-\hat{p}_W = 0.09276\)) does not allow us to reject the null hypothesis (p = 0.28) at the \(\alpha = 0.05\) significance level.
  • So, our formal conclusion is that we fail to reject the null hypothesis
  • The difference in the proportions is not statistically significant
  • This means that it is plausible that there is no difference between the East and West Coasts in the proportion of people who prefer cola to orange soda.
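The decision rule itself is a single comparison. A minimal sketch using the values from this example (the variable names are illustrative):

```r
alpha   <- 0.05  # significance level, chosen before looking at the data
p_value <- 0.28  # estimated from the 100 random permutations

# Reject H_0 only when the p-value is less than alpha
reject_null <- p_value < alpha
reject_null  # FALSE

if (reject_null) {
  message("Reject H0: the difference is statistically significant.")
} else {
  message("Fail to reject H0: the difference is not statistically significant.")
}
```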