Inference: Two-Way Tables

Chapter 18
Math 115

Government Spending

  • Do people who identify as belonging to different U.S. political parties have different views about government spending?
  • We will explore the relationship between party affiliation and opinions on government spending on both national defense and space exploration

Data

  • The gss2016 dataset contains 149 respondents and is available here
  • Subset of General Social Survey (GSS) data from 2016
  • Variables:
    • party (Dem, Ind, or Rep)
    • natarms: opinion on current level of government spending on national defense
    • natspac: opinion on current level of government spending on space exploration

Two-way Tables

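Opinion on national defense spending (natarms) by party:

Party    TOO LITTLE    ABOUT RIGHT    TOO MUCH    Total
Dem          17            14            12         43
Ind          20            28            24         72
Rep          24             8             2         34
Total        61            50            38        149

Opinion on space exploration spending (natspac) by party:

Party    TOO LITTLE    ABOUT RIGHT    TOO MUCH    Total
Dem           8            22            13         43
Ind          13            37            22         72
Rep           9            17             8         34
Total        30            76            43        149
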
  • With more than 2 groups, we can’t use a single difference in proportions to compare groups
  • Since there are more than two values for each categorical variable, there are no “successes” or “failures”
  • We will use the \(\chi^2\) statistic (chi-squared) to measure the difference between groups

Military Spending

  • Hypotheses stated in terms of an association

    • \(H_0\): There is no association between opinions on government spending on national defense and political affiliations.
    • \(H_A\): There is an association between opinions on government spending on national defense and political affiliations.
  • Hypotheses stated in terms of differences

    • \(H_0\): There is no difference in opinions on government spending on national defense between people with different political affiliations.
    • \(H_A\): There is some difference in opinions on government spending on national defense between people with different political affiliations.

Expected Counts

Observed counts for national defense spending, with expected counts in parentheses:

Party    TOO LITTLE    ABOUT RIGHT    TOO MUCH      Total
Dem      17 (17.60)    14 (14.43)     12 (10.97)      43
Ind      20 (29.48)    28 (24.16)     24 (18.36)      72
Rep      24 (13.92)     8 (11.41)      2 (8.67)       34
Total    61            50             38             149

  • First compute the expected cell counts, assuming \(H_0\)
  • Overall proportion of people that said “too little” spending = \(61/149\) = \(0.4094\)
  • If there is no association between party and opinion, we expect the same proportion of Democrats, Independents and Republicans to have this opinion
    • Expected count for Democrats with opinion “too little” = \(0.4094\cdot 43 = 17.60\)
    • Expected count for Independents with opinion “too little” = \(0.4094\cdot 72 = 29.48\)
    • Expected count for Republicans with opinion “too little” = \(0.4094\cdot 34 = 13.92\)
  • An easy way to remember how to compute each expected cell count is \[\frac{row\_total \cdot column\_total}{table\_total}\]
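
The row total × column total / table total rule can be checked with a few lines of Python. This is only a sketch, not part of the Jamovi workflow used in this chapter; the variable names are illustrative, and the counts are copied from the national defense table.

```python
# Observed counts for opinion on national defense spending
# (rows: Dem, Ind, Rep; columns: TOO LITTLE, ABOUT RIGHT, TOO MUCH).
observed = [
    [17, 14, 12],
    [20, 28, 24],
    [24, 8, 2],
]

row_totals = [sum(row) for row in observed]        # [43, 72, 34]
col_totals = [sum(col) for col in zip(*observed)]  # [61, 50, 38]
table_total = sum(row_totals)                      # 149

# Expected count under H0 = row total * column total / table total
expected = [
    [r * c / table_total for c in col_totals]
    for r in row_totals
]

for party, row in zip(["Dem", "Ind", "Rep"], expected):
    print(party, [round(e, 2) for e in row])
```

The printed values should agree with the expected counts shown in parentheses in the table.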

Contribution of each cell to the \(\chi^2\) statistic:

Party    TOO LITTLE                               ABOUT RIGHT                              TOO MUCH
Dem      \(\frac{(17-17.60)^2}{17.60}=0.02\)    \(\frac{(14-14.43)^2}{14.43}=0.01\)    \(\frac{(12-10.97)^2}{10.97}=0.10\)
Ind      \(\frac{(20-29.48)^2}{29.48}=3.05\)    \(\frac{(28-24.16)^2}{24.16}=0.61\)    \(\frac{(24-18.36)^2}{18.36}=1.73\)
Rep      \(\frac{(24-13.92)^2}{13.92}=7.30\)    \(\frac{(8-11.41)^2}{11.41}=1.02\)     \(\frac{(2-8.67)^2}{8.67}=5.13\)

  • Compute \(\frac{(observed\,count-expected\,count)^2}{expected\,count}\) for each cell
  • Add values to obtain \(\chi^2\) statistic, \[\begin{array}{rcl}\chi^2 &=& 0.02+0.01+0.10 \\ & + & 3.05 + 0.61 + 1.73 \\ & + & 7.30 + 1.02 + 5.13 \\ & =& 18.97\end{array}\]
  • Or just ask Jamovi…
  • Use the Independent Samples analysis listed under the Frequencies menu
  • Drag party into Rows and natarms into Columns

 χ² Tests                              
 ───────────────────────────────────── 
         Value       df    p           
 ───────────────────────────────────── 
   χ²    18.96998     4    0.0007967   
   N          149                      
 ───────────────────────────────────── 
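
As a cross-check on both the hand calculation and the Jamovi output, the statistic can also be computed directly from the observed counts. The function below is a hypothetical helper written for this illustration; it uses unrounded expected counts, so it lands closer to Jamovi's value than the sum of rounded cell contributions does.

```python
def chi_squared_statistic(observed):
    """Return (chi-squared statistic, degrees of freedom) for a two-way table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            exp = row_totals[i] * col_totals[j] / n  # expected count under H0
            stat += (obs - exp) ** 2 / exp           # this cell's contribution
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df

# National defense table (rows: Dem, Ind, Rep).
natarms = [[17, 14, 12], [20, 28, 24], [24, 8, 2]]
stat, df = chi_squared_statistic(natarms)
print(round(stat, 2), df)
```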

Randomization Test for Independence

  • We can randomly permute the response (opinion) to simulate the null hypothesis being true
  • For each permuted sample, we calculate value of the \(\chi^2\) statistic
  • Let’s construct a null distribution for the military spending question

Here is the original GSS data with 5 random permutations.

Let us begin plotting the resulting \(\chi^2\) values on a histogram:

Histogram of \(X^2\) statistics for 5 random permutations. Observed value (\(18.97\)) indicated by dashed vertical line.

Here is the resulting histogram of 1,000 simulations

Histogram of \(X^2\) statistics for 1,000 random permutations. Observed value (\(18.97\)) indicated by dashed vertical line.

  • Note that the shape of the histogram is neither symmetric nor bell-shaped. In fact, the \(\chi^2\) statistic takes only non-negative values
  • The p-value is always in the right tail (the proportion of simulated values as large as or larger than the observed \(\chi^2\) statistic)
  • From the histogram, there were no simulated values of \(\chi^2\) that were as extreme as the observed value
  • So the p-value is approximately 0
  • We reject the null hypothesis
  • In the context of the problem, we can conclude that there is strong evidence of an association between opinions on military spending and political party (the two variables are not independent)
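
The permutation procedure described above can be sketched in plain Python. This is an illustration under stated assumptions, not the code that produced the histograms: the individual records are rebuilt from the two-way table, the seed is arbitrary, and the helper name is hypothetical.

```python
import random

def chi_squared_from_labels(party, opinion):
    """Chi-squared statistic for two parallel lists of category labels."""
    parties = sorted(set(party))
    opinions = sorted(set(opinion))
    counts = {(p, o): 0 for p in parties for o in opinions}
    for p, o in zip(party, opinion):
        counts[(p, o)] += 1
    n = len(party)
    row = {p: sum(counts[(p, o)] for o in opinions) for p in parties}
    col = {o: sum(counts[(p, o)] for p in parties) for o in opinions}
    return sum(
        (counts[(p, o)] - row[p] * col[o] / n) ** 2 / (row[p] * col[o] / n)
        for p in parties for o in opinions
    )

# Rebuild one record per respondent from the national defense table.
table = {
    "Dem": {"TOO LITTLE": 17, "ABOUT RIGHT": 14, "TOO MUCH": 12},
    "Ind": {"TOO LITTLE": 20, "ABOUT RIGHT": 28, "TOO MUCH": 24},
    "Rep": {"TOO LITTLE": 24, "ABOUT RIGHT": 8, "TOO MUCH": 2},
}
party, opinion = [], []
for p, cells in table.items():
    for o, k in cells.items():
        party += [p] * k
        opinion += [o] * k

observed_stat = chi_squared_from_labels(party, opinion)

rng = random.Random(115)  # arbitrary seed so the run is repeatable
n_perm = 1000
shuffled = opinion[:]
null_stats = []
for _ in range(n_perm):
    rng.shuffle(shuffled)  # permuting opinions breaks any link to party
    null_stats.append(chi_squared_from_labels(party, shuffled))

# Right-tail proportion: simulated statistics at least as large as observed.
p_value = sum(s >= observed_stat for s in null_stats) / n_perm
print(round(observed_stat, 2), p_value)
```

Because the permutations are random, the estimated p-value varies slightly from run to run and from the histogram shown here, but it should stay very close to 0.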

Test for Independence Using a Mathematical Model

Chi-squared test for assessing independence between categorical variables

When the null hypothesis is true and the following conditions are met, \(X^2\) has a chi-squared distribution with \(df=(r-1)\times(c-1)\) degrees of freedom:

  1. Independent observations
  2. Large samples: an expected count of at least 5 in each cell
  • \(r\) is the number of rows and \(c\) is the number of columns in the two-way table (not counting totals)
  • Both two-way tables satisfy the large samples condition (an expected count of at least 5 in each cell)
  • In both cases there are 3 rows and 3 columns in the table, so \(df=(r-1)\times(c-1)=(3-1)\times(3-1)=4\)
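
The large samples condition and the degrees of freedom can be verified for both tables with a short sketch. The helper name and the `summary` dictionary are illustrative choices, not part of the course materials.

```python
def expected_counts(observed):
    """Expected counts under independence: row total * column total / table total."""
    row_totals = [sum(r) for r in observed]
    col_totals = [sum(c) for c in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

tables = {
    "national defense": [[17, 14, 12], [20, 28, 24], [24, 8, 2]],
    "space exploration": [[8, 22, 13], [13, 37, 22], [9, 17, 8]],
}

summary = {}
for name, obs in tables.items():
    exp = expected_counts(obs)
    smallest = min(min(row) for row in exp)   # must be at least 5
    df = (len(obs) - 1) * (len(obs[0]) - 1)   # (r - 1) * (c - 1)
    summary[name] = (smallest, df)
    print(f"{name}: smallest expected count = {smallest:.2f}, df = {df}")
```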

Chi-squared distribution with \(df=4\). Purple line shows observed value for space question. Red line shows observed value for military question.

  • The p-value is the area under the curve that is beyond the observed \(X^2\) value
  • Here is the p-value for the hypothesis test on military spending

Chi-squared distribution with \(df=4\). Red line shows test statistic for military question.

  • Here is the Jamovi output

 χ² Tests                              
 ───────────────────────────────────── 
         Value       df    p           
 ───────────────────────────────────── 
   χ²    18.96998     4    0.0007967   
   N          149                      
 ───────────────────────────────────── 
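
To make "area under the curve beyond the observed value" concrete, here is a sketch that computes the right-tail area without statistical software. It relies on a closed form that holds only when \(df\) is even (as it is here, with \(df=4\)); for general \(df\), software such as Jamovi or SciPy's `scipy.stats.chi2.sf` would be used instead. The function name is hypothetical.

```python
import math

def chi2_right_tail_even_df(x, df):
    """P(X >= x) for a chi-squared distribution with a positive even df.

    Uses the identity P(X >= x) = exp(-x/2) * sum_{i < df/2} (x/2)^i / i!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only covers positive even df")
    half = x / 2
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(df // 2))

# Observed statistic for the military spending question, from the Jamovi output.
p_military = chi2_right_tail_even_df(18.96998, 4)
print(f"{p_military:.7f}")
```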

Test Results

  • As with the randomization-based test, the p-value is very small (<0.001) for the military spending question
  • Conclusion
    • We reject the null hypothesis in this case
    • There is strong evidence that opinion on military spending and political affiliation are associated
    • We can generalize these results to a larger population since the data come from a representative sample
    • We cannot draw a cause-and-effect conclusion since this was an observational study

Spending on Space Exploration

  • Hypotheses stated in terms of an association

    • \(H_0\): There is no association between opinions on government spending on space exploration and political affiliations.
    • \(H_A\): There is an association between opinions on government spending on space exploration and political affiliations.
  • Hypotheses stated in terms of differences

    • \(H_0\): There is no difference in opinions on government spending on space exploration between people with different political affiliations.
    • \(H_A\): There is some difference in opinions on government spending on space exploration between people with different political affiliations.

Test of Significance

Observed counts for space exploration spending:

Party    TOO LITTLE    ABOUT RIGHT    TOO MUCH    Total
Dem           8            22            13         43
Ind          13            37            22         72
Rep           9            17             8         34
Total        30            76            43        149

Observed counts with expected counts in parentheses:

Party    TOO LITTLE    ABOUT RIGHT    TOO MUCH      Total
Dem       8 (8.66)     22 (21.93)     13 (12.41)      43
Ind      13 (14.50)    37 (36.72)     22 (20.78)      72
Rep       9 (6.85)     17 (17.34)      8 (9.81)       34
Total    30            76             43             149

Use Jamovi to calculate \(\chi^2\) statistic


 χ² Tests                              
 ───────────────────────────────────── 
         Value       df    p           
 ───────────────────────────────────── 
   χ²    1.326060     4    0.8569388   
   N          149                      
 ───────────────────────────────────── 
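
For readers who want to reproduce this output outside Jamovi, the whole test can be sketched as one function that returns the statistic, the degrees of freedom, and the p-value. The function is a hypothetical helper; its p-value step uses a closed form valid only for even \(df\), which covers this 3 × 3 table.

```python
import math

def chi_squared_test(observed):
    """Return (statistic, df, p-value) for a two-way table of counts.

    The p-value uses a closed form that is only valid when df is even.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    stat = sum(
        (obs - r * c / n) ** 2 / (r * c / n)
        for row, r in zip(observed, row_totals)
        for obs, c in zip(row, col_totals)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    if df % 2 != 0:
        raise ValueError("closed-form p-value here needs an even df")
    half = stat / 2
    p_value = math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(df // 2))
    return stat, df, p_value

# Space exploration table (rows: Dem, Ind, Rep).
natspac = [[8, 22, 13], [13, 37, 22], [9, 17, 8]]
stat, df, p_value = chi_squared_test(natspac)
print(f"chi-squared = {stat:.3f}, df = {df}, p = {p_value:.3f}")
```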

\(X^2\) statistics for 1,000 random permutations. Observed value (\(1.326\)) indicated by dashed vertical line.

Chi-squared distribution with \(df=4\). Dashed red line is Chi-squared test statistic.

Test Results

  • The p-value is quite large (\(p=0.857\)) for the space exploration question
  • Conclusion
    • We fail to reject the null hypothesis in this case
    • There is no significant evidence that opinion on government spending on space exploration and political affiliation are associated. It is plausible that these two variables are independent
    • We can generalize these results to a larger population since the data come from a representative sample
    • We cannot draw a cause-and-effect conclusion since this was an observational study

\(X^2\) distributions for different \(df\)

Chi-squared distributions with different degrees of freedom (df).

  • The chi-squared distribution is more sharply peaked for lower \(df\)
  • The right tail is thicker for higher \(df\)
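
Both observations can be checked numerically. The sketch below evaluates the chi-squared density at its peak (located at \(df-2\) when \(df>2\)) and the right-tail area beyond 10 for a few values of \(df\). The particular \(df\) values and the cutoff of 10 are arbitrary choices for illustration and may not match the figure.

```python
import math

def chi2_pdf(x, df):
    """Density of the chi-squared distribution with df degrees of freedom at x > 0."""
    k = df / 2
    return x ** (k - 1) * math.exp(-x / 2) / (2 ** k * math.gamma(k))

def chi2_right_tail_even_df(x, df):
    """P(X >= x); closed form valid only for even df."""
    half = x / 2
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(df // 2))

peak_heights = {}
tail_beyond_10 = {}
for df in (4, 6, 10):  # illustrative choices, all even and greater than 2
    mode = df - 2      # the density peaks at df - 2 when df > 2
    peak_heights[df] = chi2_pdf(mode, df)
    tail_beyond_10[df] = chi2_right_tail_even_df(10, df)
    print(f"df = {df}: peak height = {peak_heights[df]:.3f}, "
          f"P(X >= 10) = {tail_beyond_10[df]:.3f}")
```

The peak height shrinks as \(df\) grows, while the area beyond a fixed cutoff grows.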

Hospital Admissions

  • The admission data set is available here
  • The primary goal is to determine if there is a statistically significant association between a patient’s Age Group and the Primary Cause of their Hospital Admission.
  • Variables and Sample: A total of \(n=120\) patient records were sampled for analysis.
  • Two key nominal categorical variables were recorded for each patient:
    • Age Group: Patients were classified into three major life stages.
      • Youth: Patients under 18 years old.
      • Adult: Patients aged 18 to 64 years old.
      • Senior: Patients 65 years and older.
    • Admission: The primary medical reason documented for the patient’s admission to the hospital, categorized into four common groups.
      • Cardiovascular: Heart and circulation-related issues (e.g., heart attacks, strokes).
      • Respiratory: Lung and breathing-related issues (e.g., severe flu, pneumonia).
      • Injury: Trauma or accidents.
      • Infectious: Illnesses caused by bacteria or viruses (e.g., severe infections).

Below is the contingency table:

Age Group Cardiovascular Respiratory Injury Infectious Total
Youth 5 10 20 5 40
Adult 15 5 5 15 40
Senior 5 15 15 5 40
Total 25 30 40 25 120

Research question: Is there an association between the primary cause of a hospital admission and the patient’s age group?

  • State the null and alternative hypotheses
  • Find the expected count for one of the cells using the formula
  • Use Jamovi to find the test statistic and p-value of the test
  • State a full conclusion in the context of the problem
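
After working through the steps by hand and in Jamovi, the results can be checked with a short Python sketch. It is offered only as a cross-check, with illustrative names; the p-value step uses a closed form valid for even \(df\), which applies to this 3 × 4 table.

```python
import math

# Rows: Youth, Adult, Senior.
# Columns: Cardiovascular, Respiratory, Injury, Infectious.
admissions = [
    [5, 10, 20, 5],
    [15, 5, 5, 15],
    [5, 15, 15, 5],
]

row_totals = [sum(row) for row in admissions]
col_totals = [sum(col) for col in zip(*admissions)]
n = sum(row_totals)

# Expected count for every cell: row total * column total / table total.
expected = [[r * c / n for c in col_totals] for r in row_totals]

stat = sum(
    (obs - exp) ** 2 / exp
    for obs_row, exp_row in zip(admissions, expected)
    for obs, exp in zip(obs_row, exp_row)
)
df = (len(row_totals) - 1) * (len(col_totals) - 1)

# Right-tail area; this closed form requires an even df.
half = stat / 2
p_value = math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(df // 2))

print(f"expected count, Youth/Cardiovascular = {expected[0][0]:.2f}")
print(f"chi-squared = {stat:.2f}, df = {df}, p = {p_value:.6f}")
```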

Goodness-of-Fit Test (Optional)

  • The Chi-square goodness-of-fit test checks if observed categorical data fit an expected distribution.

  • Formula: \[\chi^2 = \sum \frac{(Obs - Exp)^2}{Exp}\]

  • Used in genetics, health studies, and marketing.

  • Example: Do observed blood type frequencies in a population match a known distribution?

Blood Type Example

  • Assume that the expected probabilities of various blood types in the general population are:
    A = 0.40, B = 0.11, AB = 0.04, O = 0.45

  • Suppose we have a random sample of 350 people with the following observed blood types:

Blood Type    Observed Count
A                  155
B                   40
AB                  15
O                  140
Total              350

  • Research Question: Is there significant evidence that the distribution of blood types in the sample is different from the population’s distribution?
  • We need to calculate the \(\chi^2\)-statistic for this data set.
  • The expected counts are calculated as “Sample Size” \(\times\) “Assumed Probability”

Blood Type    Observed Count    Expected Count
A                  155          350*0.40 = 140
B                   40          350*0.11 = 38.5
AB                  15          350*0.04 = 14
O                  140          350*0.45 = 157.5

  • If the validity condition is met (all expected counts \(\ge\) 5), then the test statistic follows a \(\chi^2\) distribution with \(d-1\) degrees of freedom, where \(d\) is the number of values of the categorical variable. (In our example \(d = 4\), so \(df = 3\).)

Chi-square Calculation and p-value

The value of the \(\chi^2\) statistic is:\[\chi^2=\frac{(155-140)^2}{140}+\frac{(40-38.5)^2}{38.5}+\frac{(15-14)^2}{14}+\frac{(140-157.5)^2}{157.5}=3.6815\]

Chi-Square distribution and p-value

Chi_Square DF P_Value
3.6815 3 0.298
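
The goodness-of-fit calculation can be reproduced with a small Python sketch. The dictionary names are illustrative. The p-value line uses a closed form specific to \(df=3\); other values of \(df\) would need statistical software.

```python
import math

# Assumed population probabilities and observed counts from the example.
probs = {"A": 0.40, "B": 0.11, "AB": 0.04, "O": 0.45}
observed = {"A": 155, "B": 40, "AB": 15, "O": 140}

n = sum(observed.values())  # sample size, 350
expected = {blood_type: n * p for blood_type, p in probs.items()}

# Sum of (Obs - Exp)^2 / Exp over the four blood types.
chi2 = sum((observed[t] - expected[t]) ** 2 / expected[t] for t in probs)
df = len(probs) - 1

# Right-tail area for df = 3 only:
# P(X >= x) = erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2)
p_value = math.erfc(math.sqrt(chi2 / 2)) + math.sqrt(2 * chi2 / math.pi) * math.exp(-chi2 / 2)

print(f"chi-squared = {chi2:.4f}, df = {df}, p = {p_value:.3f}")
```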

Interpretation

  • If p-value < 0.05 → the observed distribution differs significantly from the expected distribution.

  • If p-value > 0.05 → we do not have significant evidence that the observed distribution differs from the expected distribution.

  • Here, \(p = 0.298 > 0.05\), so the sample does not provide significant evidence that its blood type distribution differs from the expected proportions.

Summary

  • The Chi-square goodness-of-fit test compares observed and expected categorical frequencies.
  • This test is widely used in biology, genetics, and social sciences.
  • Based on the p-value, this data set does not provide significant evidence that the distribution of the observed blood types differs from the known population proportions.