topic14

Inference for Contingency Tables

Chi-square Test

Math 115

Extending Our Framework

Previously, we compared two groups with a binary outcome (2×2 table).

Used difference in proportions: \(\hat{p}_1 - \hat{p}_2\)
Used Z-statistic for hypothesis testing

But what if we have:

More than 2 groups? (e.g., Dem, Ind, Rep)
Non-binary outcomes? (e.g., Spending: Too Little, About Right, Too Much)

We need a new approach!

Two Ways to Generalize

Generalization 1: More groups

Explanatory variable has 3+ categories
Example: Democrat, Independent, Republican

Generalization 2: Non-binary response

Response variable has 3+ categories
Example (Government Spending): “Too Little”, “About Right”, “Too Much”

With either generalization, we can’t use a single difference in proportions!

The Hypothesis Framework

Same concept as before, different statistic:

\(H_0\): No association between the variables (independence)
\(H_A\): There is an association (variables are not independent)

New statistic: Chi-square, \(\chi^2\), measures deviation from independence.

Government Spending

Research question: Is political party associated with opinions on military spending?

Explanatory: Party (Dem, Ind, Rep) — 3 categories
Response: Opinion on military spending (“Too Little”, “About Right”, “Too Much”) — 3 categories
Data: General Social Survey, 2016 (n = 149)

This creates a 3×3 contingency table.

Military Spending

Hypotheses stated in terms of an association
- \(H_0\): There is no association between opinions on government spending on national defense and political affiliations.
- \(H_A\): There is an association between opinions on government spending on national defense and political affiliations.
Hypotheses stated in terms of differences
- \(H_0\): There is no difference in opinions on government spending on national defense between people with different political affiliations.
- \(H_A\): There is some difference in opinions on government spending on national defense between people with different political affiliations.

The Contingency Table

Party	Too Little	About Right	Too Much	Total
Dem	17	14	12	43
Ind	20	28	24	72
Rep	24	8	2	34
Total	61	50	38	149

Beyond simple \(2\times 2\) tables, we can’t compute just one difference in proportions.

We need to compare all observed counts to what we’d expect under \(H_0\).

What Would We Expect Under \(H_0\)?

If \(H_0\) is true (no association), each party should have the same distribution of opinions.

Expected count for any cell:

\[\text{Expected} = \frac{\text{row total} \times \text{column total}}{\text{grand total}}\]

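This applies the overall opinion proportions to every party: for example, 61/149 ≈ 41% of all respondents say “Too Little”, so under \(H_0\) we expect about 41% of each party to say so.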

Expected Counts

Example: Democrats who say “Too Little”

\[\text{Expected} = \frac{43 \times 61}{149} = 17.6\]

Party	Too Little	About Right	Too Much
Dem	17 (17.6)	14 (14.4)	12 (11)
Ind	20 (29.5)	28 (24.2)	24 (18.4)
Rep	24 (13.9)	8 (11.4)	2 (8.7)

Observed (Expected) — How different are they?
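The expected-count formula can be applied to all nine cells at once. The R sketch below assumes the observed counts are typed in by hand from the table; the object names (`observed`, `expected`) are illustrative choices, not part of the original analysis.

```r
# Observed counts from the contingency table (rows: party, columns: opinion)
observed <- matrix(
  c(17, 14, 12,
    20, 28, 24,
    24,  8,  2),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    party   = c("Dem", "Ind", "Rep"),
    opinion = c("Too Little", "About Right", "Too Much")
  )
)

# Expected count = (row total x column total) / grand total, for every cell
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)
round(expected, 1)
```

The expected counts keep the same row and column totals as the observed table; only the way counts are spread across cells changes.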

The Chi-Square Statistic

Measures total deviation of observed from expected:

\[\chi^2 = \sum_{\text{all cells}} \frac{(\text{Observed} - \text{Expected})^2}{\text{Expected}}\]

Why this formula?

Squaring: Makes all contributions positive
Dividing by Expected: Standardizes (a difference of 5 matters more when Expected is 10 than when Expected is 100)
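As a quick check of the standardizing effect, the same difference of 5 contributes ten times as much when the expected count is 10 as when it is 100:

\[\frac{5^2}{10} = 2.5 \qquad \text{versus} \qquad \frac{5^2}{100} = 0.25\]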

Computing \(\chi^2\)

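For each cell, compute \(\frac{(O - E)^2}{E}\). The results below use unrounded expected counts, so a few differ slightly from what the rounded values shown would give: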

Party	Too Little	About Right	Too Much
Dem	\(\frac{(17 - 17.6)^2}{17.6} = 0.02\)	\(\frac{(14 - 14.4)^2}{14.4} = 0.01\)	\(\frac{(12 - 11)^2}{11} = 0.10\)
Ind	\(\frac{(20 - 29.5)^2}{29.5} = 3.05\)	\(\frac{(28 - 24.2)^2}{24.2} = 0.61\)	\(\frac{(24 - 18.4)^2}{18.4} = 1.73\)
Rep	\(\frac{(24 - 13.9)^2}{13.9} = 7.30\)	\(\frac{(8 - 11.4)^2}{11.4} = 1.02\)	\(\frac{(2 - 8.7)^2}{8.7} = 5.13\)

\[\chi^2 = 0.02 + 0.01 + 0.10 + 3.05 + 0.61 + 1.73 + 7.30 + 1.02 + 5.13 = 18.97\]
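The cell-by-cell calculation can be reproduced in a few lines of R. This is a minimal sketch that assumes the counts are entered by hand; `contributions` and `chi_sq` are illustrative names.

```r
observed <- matrix(
  c(17, 14, 12,
    20, 28, 24,
    24,  8,  2),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    party   = c("Dem", "Ind", "Rep"),
    opinion = c("Too Little", "About Right", "Too Much")
  )
)

# Expected counts under independence
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Contribution of each cell: (O - E)^2 / E
contributions <- (observed - expected)^2 / expected
round(contributions, 2)

# The chi-square statistic is the sum of all cell contributions
chi_sq <- sum(contributions)
round(chi_sq, 2)
```

The Republican row accounts for most of the total, which matches the large gaps between observed and expected counts in that row.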

What Does \(\chi^2\) Tell Us?

\(\chi^2 = 0\) means observed = expected exactly (perfect independence)
Larger \(\chi^2\) means greater deviation from independence
Larger \(\chi^2\) → stronger evidence against \(H_0\)

But how large is “large enough” to reject \(H_0\)?

We need a reference (null) distribution → from randomization, or from the chi-square distribution

Randomization Test for Independence

We can randomly permute one of the variables (in the table below, the party labels are shuffled while the opinions stay fixed) to simulate the null hypothesis being true
For each permuted sample, we calculate the value of the \(\chi^2\) statistic
Let’s construct a null distribution for the military spending question

Here is the original GSS data with 5 random permutations.

# A tibble: 149 × 8
      id natarms    party randPerm1 randPerm2 randPerm3 randPerm4 randPerm5
   <int> <fct>      <chr> <fct>     <fct>     <fct>     <fct>     <fct>    
 1     1 TOO LITTLE Ind   Rep       Ind       Ind       Ind       Dem      
 2     2 TOO MUCH   Ind   Dem       Dem       Ind       Rep       Ind      
 3     3 TOO MUCH   Dem   Dem       Rep       Rep       Rep       Dem      
 4     4 TOO MUCH   Ind   Rep       Ind       Dem       Rep       Rep      
 5     5 TOO MUCH   Ind   Ind       Dem       Rep       Ind       Ind      
 6     6 TOO MUCH   Ind   Dem       Rep       Rep       Dem       Ind      
 7     7 TOO MUCH   Ind   Dem       Dem       Ind       Ind       Dem      
 8     8 TOO MUCH   Dem   Ind       Ind       Ind       Dem       Rep      
 9     9 TOO LITTLE Dem   Ind       Dem       Ind       Rep       Ind      
10    10 TOO LITTLE Ind   Dem       Dem       Ind       Rep       Ind      
# ℹ 139 more rows

Here is the dotplot of the corresponding \(\chi^2\) statistics.

Here is the resulting histogram of 1000 simulations

Histogram of \(\chi^2\) statistics for 1,000 random permutations. Observed value (\(18.97\)) indicated by dashed vertical line.

Note that the histogram is neither symmetric nor bell-shaped: it is right-skewed, and the statistic takes only non-negative values.

The p-value is always in the right tail (the proportion of permuted \(\chi^2\) statistics as large as or larger than the observed one)
From the histogram, none of the 1,000 permuted \(\chi^2\) values was as extreme as the observed value
So the p-value is approximately 0
We reject the null hypothesis
In the context of the problem, we can conclude that there is strong evidence of an association between opinions on military spending and political party (the two variables are not independent)
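The randomization test can be sketched in R as follows. This is not the code behind the slides: it assumes the individual-level data can be rebuilt from the published cell counts (rather than read from the GSS file), it shuffles the party labels as in the permutation table above, and the seed, the helper `chi_sq_stat()`, and the other object names are illustrative.

```r
set.seed(115)  # arbitrary seed so the sketch is reproducible

# Rebuild one row per respondent from the published cell counts
counts <- matrix(
  c(17, 14, 12,
    20, 28, 24,
    24,  8,  2),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    party   = c("Dem", "Ind", "Rep"),
    opinion = c("Too Little", "About Right", "Too Much")
  )
)
cells  <- as.data.frame(as.table(counts))
people <- cells[rep(seq_len(nrow(cells)), cells$Freq), c("party", "opinion")]

# Chi-square statistic for a two-way table of counts
chi_sq_stat <- function(tab) {
  expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)
  sum((tab - expected)^2 / expected)
}

observed_stat <- chi_sq_stat(table(people$party, people$opinion))

# Shuffle the party labels to break any association, then recompute the statistic
perm_stats <- replicate(1000, {
  chi_sq_stat(table(sample(people$party), people$opinion))
})

# Proportion of permuted statistics at least as large as the observed one
p_value <- mean(perm_stats >= observed_stat)
p_value
```

With only 1,000 permutations the estimated p-value can come out as exactly 0; that means "smaller than the simulation can resolve", not a true probability of zero.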

Test for Independence Using a Mathematical Model

Chi-squared test for assessing independence between categorical variables

When the null hypothesis is true and the following conditions are met, \(\chi^2\) has a chi-square distribution with \(df=(r-1)\times(c-1)\) degrees of freedom:

Independent observations
Large samples: expected count of at least 5 in each cell

\(r\) is the number of rows and \(c\) is the number of columns in the two-way table (not counting the totals row and column)

Degrees of Freedom

For contingency tables:

\[df = (\text{rows} - 1) \times (\text{columns} - 1)\]

For our 3×3 table:

\[df = (3-1) \times (3-1) = 2 \times 2 = 4\]

Intuition: After fixing row and column totals, only \(df\) cells can vary freely.
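The intuition can be checked directly: once the row and column totals are fixed, filling in any 2 × 2 block of the 3 × 3 table forces the other five cells. The sketch below uses the top-left block of the military spending table; the choice of block and the object names are illustrative.

```r
row_totals <- c(Dem = 43, Ind = 72, Rep = 34)
col_totals <- c("Too Little" = 61, "About Right" = 50, "Too Much" = 38)

# The df = 4 "free" cells: top-left 2 x 2 block of the observed table
free_cells <- matrix(c(17, 14,
                       20, 28), nrow = 2, byrow = TRUE)

full <- matrix(NA_real_, nrow = 3, ncol = 3,
               dimnames = list(names(row_totals), names(col_totals)))
full[1:2, 1:2] <- free_cells

# Last column of the first two rows is forced by the row totals
full[1:2, 3] <- row_totals[1:2] - rowSums(free_cells)

# Last row is forced by the column totals
full[3, ] <- col_totals - colSums(full[1:2, ])
full
```

The recovered table is the original observed table, so only four of the nine cells were free to vary.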

The contingency table satisfies the large samples condition (expected count of at least 5 in each cell)

Figure: \(\chi^2(4)\) distribution
Figure: \(\chi^2(4)\) density overlaid on the histogram of permuted \(\chi^2\) statistics

The Chi-Square Distribution

Always right-skewed, starts at 0
As df increases, distribution shifts right
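A quick numerical check of the rightward shift, using R's built-in quantile function `qchisq()`. The three df values are arbitrary illustrative choices.

```r
df_values <- c(2, 4, 9)

# Median and 95th percentile of the chi-square distribution for each df
medians    <- qchisq(0.50, df = df_values)
upper_5pct <- qchisq(0.95, df = df_values)

data.frame(
  df        = df_values,
  median    = round(medians, 2),
  cutoff_95 = round(upper_5pct, 2)
)
```

Both columns increase with df, so a \(\chi^2\) value that is extreme for a small table may be unremarkable for a larger one.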

The Null Distribution

When conditions are met, we can use the chi-square distribution as our reference:

Use the chi-square distribution with \(df = (r-1) \times (c-1)\)
This distribution describes what \(\chi^2\) values we’d expect if \(H_0\) is true

Computing the p-value:

P-value = area to the right of observed \(\chi^2\) under the chi-square curve
Larger \(\chi^2\) values are more extreme → stronger evidence against \(H_0\)

Conditions for Chi-Square Test

Independence:

Random sample ✓
Observations independent of each other ✓

Expected Counts (under \(H_0\)):

Expected count of at least 5 in each cell ✓

Check expected counts, not observed counts!
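The military spending table shows why the distinction matters: one observed count is below 5, yet every expected count is above 5. A minimal sketch, with the counts entered by hand:

```r
observed <- matrix(
  c(17, 14, 12,
    20, 28, 24,
    24,  8,  2),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    party   = c("Dem", "Ind", "Rep"),
    opinion = c("Too Little", "About Right", "Too Much")
  )
)
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

min(observed)        # smallest observed count (Rep, "Too Much")
min(expected)        # smallest expected count
all(expected >= 5)   # the condition is stated in terms of expected counts
```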

Finding the P-value

Chi-squared distribution with \(df=4\)

P-value < 0.001
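The right-tail area can be computed with `pchisq()`, the same function used in the goodness-of-fit code later in these slides. The statistic is typed in from the earlier calculation.

```r
chi_sq <- 18.97               # observed statistic from the 3 x 3 table
df     <- (3 - 1) * (3 - 1)   # (rows - 1) x (columns - 1)

# Area to the right of the observed statistic under the chi-square(4) curve
p_value <- pchisq(chi_sq, df = df, lower.tail = FALSE)
p_value
```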

Conclusion: Military Spending

Results:

\(\chi^2 = 18.97\)
df = 4
P-value < 0.001

Comparison with the randomization test:

The p-value is again very small (< 0.001), in agreement with the randomization-based test for the military spending question

Conclusion
- We reject the null hypothesis in this case
- There is strong evidence that opinion on military spending and political affiliation are associated
- We can generalize these results to a larger population since it was a representative sample
- We cannot draw a cause-and-effect conclusion since it was an observational study

Note: The chi-square test gives evidence that there IS an association, but it does not tell us the direction or which groups differ.
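R's built-in `chisq.test()` carries out the whole test in one call. The sketch below assumes the counts are entered by hand. As one possible way to see where the deviation comes from, it also prints the residuals \((O - E)/\sqrt{E}\), whose squares are the cell contributions computed earlier; this is a descriptive aid, not a formal follow-up test.

```r
observed <- matrix(
  c(17, 14, 12,
    20, 28, 24,
    24,  8,  2),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    party   = c("Dem", "Ind", "Rep"),
    opinion = c("Too Little", "About Right", "Too Much")
  )
)

result <- chisq.test(observed)

result$statistic   # chi-square statistic
result$parameter   # degrees of freedom
result$p.value     # right-tail p-value

# Signed residuals (O - E) / sqrt(E): positive means more than expected
round(result$residuals, 2)
```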

Second Example: Space Exploration

Same parties, same opinion categories, but asking about space exploration spending.

Party	Too Little	About Right	Too Much	Total
Dem	8	22	13	43
Ind	13	37	22	72
Rep	9	17	8	34

\(\chi^2 = 1.33\), df = 4, P-value = 0.857
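The same calculation for the space exploration table, as a minimal sketch with the counts entered by hand (`space` and `space_test` are illustrative names):

```r
space <- matrix(
  c( 8, 22, 13,
    13, 37, 22,
     9, 17,  8),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    party   = c("Dem", "Ind", "Rep"),
    opinion = c("Too Little", "About Right", "Too Much")
  )
)

space_test <- chisq.test(space)

round(space_test$expected, 1)   # check that every expected count is at least 5
space_test$statistic
space_test$parameter
space_test$p.value
```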

Comparing the Two Results

Issue	\(\chi^2\)	P-value	Conclusion
Military	18.97	< 0.001	Strong association
Space	1.33	0.857	No evidence of association

Same groups, different topics, very different conclusions!

Political party is associated with military spending views
No evidence that political party is associated with space exploration views

Goodness-of-Fit Test (Optional)

The Chi-square goodness-of-fit test checks if observed categorical data fit an expected distribution.
Formula: \[\chi^2 = \sum \frac{(Obs - Exp)^2}{Exp}\]
Used with genetics, health, or marketing data.
Example: Do observed blood type frequencies in a sample match a known population distribution?

Blood Type

Assume that the expected probabilities of various blood types in the general population are:
A = 0.40, B = 0.11, AB = 0.04, O = 0.45

Suppose we have a random sample of 350 people with the following observed blood types:

Blood Type	Observed Count
A	155
B	40
AB	15
O	140
Total	350

Research Question: Is there significant evidence that the distribution of blood types in the population this sample came from differs from the assumed distribution?

We need to calculate the \(\chi^2\)-statistic for this data set.
The expected counts are calculated as “Sample Size” \(\times\) “Assumed Probability”

Blood Type	Observed Count	Expected Counts
A	155	350*0.40 = 140
B	40	350*0.11 = 38.5
AB	15	350*0.04 = 14
O	140	350*0.45 = 157.5

If the validity conditions are met (each expected count \(\ge\) 5), then the distribution of the test statistic is \(\chi^2(d-1)\), where \(d\) is the number of categories of the categorical variable. (In our example \(d = 4\))

Chi-square Calculation and p-value

The value of the \(\chi^2\) statistic is:\[\chi^2=\frac{(155-140)^2}{140}+\frac{(40-38.5)^2}{38.5}+\frac{(15-14)^2}{14}+\frac{(140-157.5)^2}{157.5}=3.6815\]

# Observed data
observed <- c(A = 155, B = 40, AB = 15, O = 140)

# Assumed distribution of proportions
expected_prop <- c(A = 0.40, B = 0.11, AB = 0.04, O = 0.45)

# Expected counts
total <- sum(observed)
expected <- total * expected_prop

# Chi-square statistic
chi_sq <- sum((observed - expected)^2 / expected)
chi_sq

[1] 3.681457

# Degrees of freedom = (categories - 1)
df <- length(observed) - 1
df

[1] 3

# p-value
p_val <- pchisq(chi_sq, df, lower.tail = FALSE)

Chi-Square distribution and p-value

Chi_Square	DF	P_Value
3.6815	3	0.298
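The hand calculation above can be cross-checked with the built-in `chisq.test()`, which performs a goodness-of-fit test when it is given a vector of counts and a vector of null probabilities through its `p` argument. The object name `gof` is illustrative.

```r
observed      <- c(A = 155, B = 40, AB = 15, O = 140)
expected_prop <- c(A = 0.40, B = 0.11, AB = 0.04, O = 0.45)

# Goodness-of-fit test against the assumed proportions
gof <- chisq.test(x = observed, p = expected_prop)

gof$expected    # expected counts: sample size x assumed probability
gof$statistic   # chi-square statistic
gof$parameter   # degrees of freedom (categories - 1)
gof$p.value     # right-tail p-value
```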

Interpretation

If p-value < 0.05 → the observed distribution differs significantly from the expected one.
If p-value ≥ 0.05 → we don’t have significant evidence that the observed distribution differs from the expected one.
Here, the p-value is 0.298 > 0.05, so we do not have significant evidence that the blood type distribution differs from the assumed proportions.

Summary

The chi-square test extends the two-proportion test to larger tables:

Component	Formula/Value
Expected count	\(\frac{\text{row total} \times \text{column total}}{\text{grand total}}\)
Chi-square statistic	\(\chi^2 = \sum \frac{(O - E)^2}{E}\)
Degrees of freedom	\(df = (r-1) \times (c-1)\)
P-value	Right tail of chi-square distribution

Conditions: Independence + Expected counts ≥ 5 in each cell

Connection to Topic 10

	Two Proportions	Chi-Square Test
Table size	2×2 only	Any r×c
Test statistic	Z	\(\chi^2\)
Null distribution	Normal	Chi-square
What we test	\(p_1 - p_2 = 0\)	Independence
Direction	Can be one-sided	No direction (p-value always from the right tail)

Chi-square generalizes the two-proportion test to larger tables!

(For 2×2 tables, the chi-square test and the two-sided Z-test with a pooled standard error give equivalent results, since \(\chi^2 = Z^2\))
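The 2×2 equivalence can be checked numerically. The counts below are hypothetical, made up purely for illustration. The sketch assumes the Z statistic uses the pooled proportion in its standard error and that the chi-square test is run without the continuity correction (`correct = FALSE`), which R would otherwise apply to 2×2 tables.

```r
# Hypothetical 2 x 2 counts, made up purely for illustration
tab <- matrix(
  c(30, 20,
    18, 32),
  nrow = 2, byrow = TRUE,
  dimnames = list(group = c("Group 1", "Group 2"), outcome = c("Yes", "No"))
)

n      <- rowSums(tab)                   # group sizes
p_hat  <- tab[, "Yes"] / n               # sample proportions
p_pool <- sum(tab[, "Yes"]) / sum(n)     # pooled proportion under H0

# Two-proportion Z statistic with the pooled standard error
z <- unname((p_hat[1] - p_hat[2]) /
              sqrt(p_pool * (1 - p_pool) * (1 / n[1] + 1 / n[2])))
z_p_value <- 2 * pnorm(abs(z), lower.tail = FALSE)   # two-sided p-value

# Chi-square test without the continuity correction
chi <- chisq.test(tab, correct = FALSE)

c(z_squared = z^2, chi_square = unname(chi$statistic))
c(z_p_value = z_p_value, chi_p_value = chi$p.value)
```

Squaring Z discards its sign, which is why the chi-square test cannot distinguish the direction of a difference.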

References

Introduction to Modern Statistics (2e) textbook by Mine Çetinkaya-Rundel and Johanna Hardin
Section 18.2