Compare Paired Means

IMS2 Ch. 21
Math 115

Yurk

Textbook Prices

  • Will you save money if you buy textbooks from Amazon instead of a university bookstore?
  • We will compare prices of books from Amazon and the UCLA bookstore
  • For each book in the data we will calculate the difference between the book’s price at the UCLA bookstore and its price on Amazon
  • Since our data consists of a single difference for each book, the analysis will be similar to the single mean case

Inference

  • We will estimate the mean difference in book price \(\mu_{diff}\) using a confidence interval
  • We will conduct a hypothesis test with hypotheses
    • \(H_0: \mu_{diff} = 0\)
    • \(H_A: \mu_{diff} \neq 0\)
  • We will calculate differences with the order UCLA - Amazon

Data

  • ucla_textbooks_f18 dataset is available here
  • Sample of 68 books used in courses at UCLA in 2018
  • bookstore_new is price of new book at bookstore
  • amazon_new is price of new book on Amazon
  • One way to analyze the data would be to treat the books on Amazon and the books at the bookstore as two groups. Then we could compare the difference in the group means
  • Each observation would be a book on Amazon or a book at the bookstore
  • This ignores the paired structure of the data (observations are not independent)
  • Analysis would be inappropriate and have lower power
  • By analyzing the difference in price, we account for the paired structure
  • Each observation is a different book
  • We can create a new variable in Jamovi that gives the price difference for each book (UCLA bookstore - Amazon)
  • The next slide shows the data with the new variable, price_diff
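
The new variable can also be sketched outside Jamovi. The Python below is only an illustration, not the course's method: the first row is the book shown in these slides, the other two rows are hypothetical values made up for the example, and the helper name add_price_diff is an assumption.

```python
# Each record is one book; prices are in USD.
# Only the first row comes from the slides; the other rows are hypothetical.
books = [
    {"subject": "American Indian Studies", "course_num": "M10",
     "bookstore_new": 47.97, "amazon_new": 47.45},
    {"subject": "Hypothetical A", "course_num": "1",
     "bookstore_new": 30.00, "amazon_new": 25.50},
    {"subject": "Hypothetical B", "course_num": "2",
     "bookstore_new": 12.00, "amazon_new": 14.25},
]


def add_price_diff(rows):
    """Return copies of the rows with price_diff = bookstore_new - amazon_new."""
    return [
        {**row, "price_diff": round(row["bookstore_new"] - row["amazon_new"], 2)}
        for row in rows
    ]


with_diff = add_price_diff(books)
for row in with_diff:
    print(row["subject"], row["course_num"], row["price_diff"])
```

A positive price_diff means the book costs more at the bookstore; a negative value means it costs more on Amazon.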

EDA

Price differences (USD) between UCLA bookstore and Amazon for 68 books.
n mean median sd iqr
68 3.58 0.625 13.4 3.98
  • The observed mean difference is \(\bar{x}_{diff}=3.58\)
  • Based on the shape of the distribution, you could easily argue that the median is a more appropriate measure of center!

Hypothesis Test Using Random Permutation

  • We can use randomization to simulate variability in the statistic under a true null hypothesis
  • To simulate independence between price and bookseller, we randomly reassign the two prices within each book to the two booksellers
  • E.g., here are the data for the first book
subject course_num bookstore_new amazon_new price_diff
American Indian Studies M10 47.97 47.45 0.52
  • Random reassignment results in one of two possible outcomes: original prices or swapped prices
subject course_num bookstore_new amazon_new price_diff
American Indian Studies M10 47.97 47.45 0.52

Or

subject course_num bookstore_new amazon_new price_diff
American Indian Studies M10 47.45 47.97 -0.52
  • We can think of the randomization as flipping a coin for each book to determine which of the two assignments will occur in the randomized sample
  • We can use the Randomize module in Jamovi to perform the randomization
  • Let’s create 1,000 random permutations of the data

Histogram of 1,000 means of randomized differences (null distribution). Dashed vertical line at 3.58 (observed mean difference).

  • There were 17 randomized mean differences that were at least as large as the observed mean difference (a proportion of 17/1,000 = 0.017)
  • Since this is a two-sided test, the p-value is twice this proportion \[\text{p-value}=2\times0.017=0.034\]
  • We reject the null hypothesis, and conclude that Amazon prices are, on average, different from UCLA bookstore prices
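
The coin-flip randomization can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not what the Randomize module does internally: the function name randomization_p_value, the seed, and the example differences are all hypothetical. Swapping the two prices of a book is the same as flipping the sign of its difference.

```python
import random
from statistics import mean


def randomization_p_value(diffs, n_perm=1000, seed=0):
    """Two-sided p-value from a sign-flip randomization of paired differences."""
    rng = random.Random(seed)
    observed = mean(diffs)
    count = 0
    for _ in range(n_perm):
        # Coin flip per book: keep the original prices or swap them,
        # which keeps or flips the sign of that book's difference.
        randomized = [d if rng.random() < 0.5 else -d for d in diffs]
        randomized_mean = mean(randomized)
        # Count randomized means at least as extreme as the observed mean,
        # in the direction of the observed mean.
        if observed >= 0 and randomized_mean >= observed:
            count += 1
        elif observed < 0 and randomized_mean <= observed:
            count += 1
    # Double the one-tail proportion, capped at 1.
    return min(1.0, 2 * count / n_perm)


# Hypothetical differences (USD), not the UCLA data.
example_diffs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
print(randomization_p_value(example_diffs))
```

With the slide's counts, the same final step gives 2 × 17/1,000 = 0.034.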

Bootstrap Confidence Intervals

  • We can calculate bootstrap percentile confidence intervals using the same approach as in the single mean case
  • We resample the price differences (UCLA - Amazon) from the sample with replacement to simulate the variability in the statistic
  • We can use the randomize module in Jamovi to do this

Histogram of 1,000 means of bootstrapped differences

  • The 95% bootstrap percentile confidence interval for the mean price difference is \((\$0.81, \$7.05)\).
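
The resampling behind a percentile interval can be sketched as follows. This is an illustrative Python version under stated assumptions, not Jamovi's implementation: the function name bootstrap_percentile_ci, the seed, and the example differences are hypothetical, and other percentile conventions exist.

```python
import random
from statistics import mean


def bootstrap_percentile_ci(diffs, n_boot=1000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for the mean difference."""
    rng = random.Random(seed)
    n = len(diffs)
    # Resample n differences with replacement and record the mean each time.
    boot_means = sorted(mean(rng.choices(diffs, k=n)) for _ in range(n_boot))
    alpha = 1 - level
    # Cut off alpha/2 of the bootstrap means in each tail.
    lo_idx = round(alpha / 2 * n_boot)
    hi_idx = round((1 - alpha / 2) * n_boot) - 1
    return boot_means[lo_idx], boot_means[hi_idx]


# Hypothetical differences (USD), not the UCLA data.
example_diffs = [0.5, -1.2, 3.4, 10.0, 0.0, 2.2, -0.8, 6.1]
print(bootstrap_percentile_ci(example_diffs))
```

With 1,000 resamples and a 95% level, the interval runs from the 26th smallest to the 975th smallest bootstrap mean.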

Hypothesis Test Using a Mathematical Model

  • We can use the same mathematical model as the single mean case to conduct a hypothesis test
  • The standard error for the mean difference is \[SE_{diff}=\frac{s_{diff}}{\sqrt{n_{diff}}}=\frac{13.4}{\sqrt{68}}=1.62\]
  • The \(T\) statistic, using the standard error before rounding (1.625), is \[T=\frac{\bar{x}_{diff}-0}{SE_{diff}}=\frac{3.58-0}{1.625}=2.20\]
  • The degrees of freedom are \(df = 68-1=67\)
  • The p-value is calculated by finding the areas below and above the observed value of \(T\) (\(T=2.2\)). The two-sided p-value is twice the smaller area, which here is the area \(\geq2.2\).
  • Since the \(t\)-distribution is symmetric, the p-value can also be calculated as total area that is \(\leq-2.2\) or \(\geq2.2\)
  • We can use the Randomize module in Jamovi to calculate this area

The \(t\)-distribution with 67 degrees of freedom. The observed \(T\)-statistic is 2.2. The p-value is the total area to the left of -2.2 or to the right of 2.2 (red areas).

  • The resulting p-value is 0.0313
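
As a rough check of these calculations, here is a Python sketch using only the standard library. It works from the rounded summary statistics in the EDA table, so its p-value may differ slightly from 0.0313. The helper names t_pdf and two_sided_p_value are assumptions, and the tail area is approximated by numerical integration (Simpson's rule) rather than by a statistics package.

```python
import math


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def two_sided_p_value(t_stat, df, steps=2000):
    """Total area at or beyond |t_stat| in both tails (steps must be even)."""
    upper = abs(t_stat)
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area_zero_to_t = total * h / 3  # Simpson's rule on [0, |t_stat|]
    # By symmetry, the central area is twice this; the tails are the rest.
    return 1.0 - 2.0 * area_zero_to_t


# Rounded summary statistics from the EDA table.
n, x_bar, s = 68, 3.58, 13.4
se = s / math.sqrt(n)
t_stat = (x_bar - 0) / se
p_value = two_sided_p_value(t_stat, df=n - 1)
print(f"SE = {se:.3f}, T = {t_stat:.2f}, p-value = {p_value:.4f}")
```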

Confidence Interval Using a Mathematical Model

  • We can also use a mathematical model to calculate confidence intervals

  • The interval is \[\bar{x}_{diff}\pm t^{\ast}_{df}\times SE_{diff}\]

  • We can use the Randomize module in Jamovi to calculate \(t^{\ast}_{67}=1.996\) for a 95% CI

  • A 95% CI is given by \(3.58\pm1.996\times1.62=(\$0.35, \$6.81)\)
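
The interval arithmetic can be reproduced with a short Python sketch. It takes \(t^{\ast}_{67}=1.996\) from the slide as given and rounds the standard error to 1.62, as the slide does; the variable names are illustrative.

```python
import math

# Rounded summary statistics from the EDA table.
n, x_bar, s = 68, 3.58, 13.4
t_star = 1.996  # critical value for a 95% CI with df = 67, taken from the slide

# Standard error rounded to two decimals (1.62), matching the slide.
se = round(s / math.sqrt(n), 2)
margin = t_star * se
ci = (x_bar - margin, x_bar + margin)
print(f"({ci[0]:.2f}, {ci[1]:.2f})")  # prints (0.35, 6.81)
```

Using the unrounded standard error instead shifts each endpoint by about a cent.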