Math 115
Previously: Examined variables individually
Now: Examine relationships between two variables
When we examine two variables together:
Our goal: Determine if two categorical variables are associated

A contingency table (also called a two-way table or cross-tabulation) summarizes the relationship between two categorical variables:
| Alignment | No Dual | Public | Secret | Unknown | Total |
|---|---|---|---|---|---|
| Bad | 453 | 2,106 | 4,352 | 7 | 6,918 |
| Good | 640 | 2,905 | 2,430 | 0 | 5,975 |
| Neutral | 377 | 946 | 908 | 2 | 2,233 |
| Reformed Criminals | 0 | 1 | 1 | 0 | 2 |
| Total | 1,470 | 5,958 | 7,691 | 9 | 15,128 |
Raw counts alone don’t reveal patterns clearly…
Conditional proportions: Proportions calculated within groups
Formula (conditioned on rows): (count in cell) / (row total)
Example:
\[\begin{array}{c} \text{Proportion of} \\ \text{bad characters with} \\ \text{secret identities} \end{array} = \frac{\text{Bad characters with secret identity}}{\text{Total bad characters}}\]
| Alignment | No Dual | Public | Secret | Unknown |
|---|---|---|---|---|
| Bad | 0.065 | 0.304 | 0.629 | 0.001 |
| Good | 0.107 | 0.486 | 0.407 | 0.000 |
| Neutral | 0.169 | 0.424 | 0.407 | 0.001 |
| Reformed Criminals | 0.000 | 0.500 | 0.500 | 0.000 |
Key finding: Bad characters appear more likely to have secret identities (62.9%) than good characters (40.7%)
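The row-conditioned formula above can be checked with a few lines of Python. This is a minimal sketch: the counts are copied from the contingency table, while the names `counts` and `row_proportions` are illustrative choices, not part of the course materials.

```python
# Counts copied from the contingency table (alignment x identity).
counts = {
    "Bad": {"No Dual": 453, "Public": 2106, "Secret": 4352, "Unknown": 7},
    "Good": {"No Dual": 640, "Public": 2905, "Secret": 2430, "Unknown": 0},
    "Neutral": {"No Dual": 377, "Public": 946, "Secret": 908, "Unknown": 2},
    "Reformed Criminals": {"No Dual": 0, "Public": 1, "Secret": 1, "Unknown": 0},
}


def row_proportions(table):
    """Divide each cell by its row total (conditioning on rows)."""
    result = {}
    for row_name, cells in table.items():
        row_total = sum(cells.values())
        result[row_name] = {col: n / row_total for col, n in cells.items()}
    return result


props = row_proportions(counts)
for alignment, cells in props.items():
    # Round only for display; the stored proportions keep full precision.
    print(alignment, {col: round(p, 3) for col, p in cells.items()})
```

Each row of proportions sums to 1 because every cell is divided by its own row total. Conditioning on columns instead would divide by column totals and answer a different question.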
Three types of bar plots for two categorical variables:
Standardized bar plots are most useful for detecting associations
Standardized bar plot showing identity proportions by alignment
Clearly shows bad characters have higher proportion of secret identities
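To see what "standardized" means, the sketch below draws a text-only stand-in for the plot. Every bar is stretched to the same width, so only the proportions within each alignment are visible. The symbols, the bar width of 50 characters, and the function name `standardized_bar` are assumptions made for illustration; a real plot would normally come from a plotting library.

```python
# Counts copied from the contingency table (alignment x identity).
counts = {
    "Bad": {"No Dual": 453, "Public": 2106, "Secret": 4352, "Unknown": 7},
    "Good": {"No Dual": 640, "Public": 2905, "Secret": 2430, "Unknown": 0},
    "Neutral": {"No Dual": 377, "Public": 946, "Secret": 908, "Unknown": 2},
    "Reformed Criminals": {"No Dual": 0, "Public": 1, "Secret": 1, "Unknown": 0},
}

# One character per identity category (illustrative choice).
SYMBOLS = {"No Dual": "n", "Public": "p", "Secret": "S", "Unknown": "?"}


def standardized_bar(cells, width=50):
    """Return a fixed-width bar whose segments follow the row proportions."""
    total = sum(cells.values())
    bar = ""
    running = 0
    for category, n in cells.items():
        # Round cumulative boundaries so the segments always add up to `width`.
        start = round(running / total * width)
        running += n
        end = round(running / total * width)
        bar += SYMBOLS[category] * (end - start)
    return bar


bars = {name: standardized_bar(cells) for name, cells in counts.items()}
for name, bar in bars.items():
    print(f"{name:<20}{bar}")
```

Because all bars have equal length, the longer run of `S` in the Bad row reflects a higher proportion of secret identities, not a larger group.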
When we suspect one variable may affect another:
Not all studies have clear explanatory/response roles
Researchers studied stent effectiveness for preventing strokes:
| group | outcome |
|---|---|
| control | no event |
| treatment | no event |
| treatment | stroke |
| treatment | no event |
| control | no event |
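Recall from before: The first scope of inference question (generalization): can the results be generalized to the population?
NEW: The second scope of inference question (causation): can we conclude that one variable causes changes in the other?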
These are TWO SEPARATE QUESTIONS:
| Question | Determined by | Key feature |
|---|---|---|
| Generalization | Sampling method | Random sampling |
| Causation | Study design | Random assignment |
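The table can be read as two independent yes/no checks. The sketch below encodes that reading. The function name `scope_of_inference` and its dictionary keys are hypothetical, chosen only to mirror the table rows.

```python
def scope_of_inference(random_sampling: bool, random_assignment: bool) -> dict:
    """Answer the two scope-of-inference questions separately.

    Generalization depends only on the sampling method;
    causation depends only on the study design.
    """
    return {
        "generalize_to_population": random_sampling,
        "causal_conclusion": random_assignment,
    }


# Enumerate all four combinations of sampling method and study design.
for sampling in (True, False):
    for assignment in (True, False):
        print(sampling, assignment, scope_of_inference(sampling, assignment))
```

Neither input affects the other answer, which is the point of treating these as two separate questions.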
A study can:
Observational study: Researchers observe cases without manipulating any variable
Experiment: Researchers deliberately manipulate the explanatory variable
A surprising finding: Strong positive association between ice cream sales and drowning deaths
Does eating ice cream cause drowning?
No! A confounding variable explains both:
A confounding variable is associated with BOTH the explanatory and response variables
Diagram: Temperature (confounder) → Ice cream sales; Temperature (confounder) → Drowning deaths
Problem: Confounding variables create alternative explanations for observed associations
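A small simulation can make the ice cream example concrete. Everything in this sketch is synthetic: the temperature range, the coefficients, and the noise levels are made-up assumptions, not real sales or drowning data. Temperature drives both variables, and ice cream sales have no direct effect on drownings.

```python
import random


def pearson(xs, ys):
    """Pearson correlation, written out to keep the sketch self-contained."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs) ** 0.5
    spread_y = sum((y - mean_y) ** 2 for y in ys) ** 0.5
    return cov / (spread_x * spread_y)


rng = random.Random(115)  # fixed seed so the run is reproducible

# Synthetic daily data: temperature is the only shared cause (assumed values).
temperature = [rng.uniform(0, 35) for _ in range(365)]
ice_cream = [10 * t + rng.gauss(0, 20) for t in temperature]
drownings = [0.2 * t + rng.gauss(0, 1) for t in temperature]

raw_r = pearson(ice_cream, drownings)

# Subtract temperature's contribution, using the coefficients assumed above.
ice_resid = [s - 10 * t for s, t in zip(ice_cream, temperature)]
drown_resid = [d - 0.2 * t for d, t in zip(drownings, temperature)]
adjusted_r = pearson(ice_resid, drown_resid)

print("association ignoring temperature:", round(raw_r, 2))
print("association after removing temperature:", round(adjusted_r, 2))
```

The first value is strongly positive and the second is close to zero. In the simulation, then, the whole association comes from the confounder. With real data the confounder's effect is not known in advance, which is why study design matters.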
The solution to confounding: Random assignment
Random assignment: Cases are randomly assigned to treatment groups
Key benefit: Ensures treatment groups are similar on average with respect to all possible confounders
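Mechanically, random assignment can be as simple as shuffling the cases and splitting the shuffled list. The sketch below is an illustration only, not the procedure the stent researchers used. The patient IDs are hypothetical, and the group sizes (224 treatment, 227 control) are taken from the stent results table in these notes.

```python
import random


def randomly_assign(case_ids, n_treatment, seed=None):
    """Shuffle the cases, then split them into treatment and control."""
    rng = random.Random(seed)
    shuffled = list(case_ids)  # copy so the caller's list is left untouched
    rng.shuffle(shuffled)
    return {
        "treatment": shuffled[:n_treatment],
        "control": shuffled[n_treatment:],
    }


patients = list(range(1, 452))  # 451 hypothetical patient IDs
groups = randomly_assign(patients, n_treatment=224, seed=115)

print(len(groups["treatment"]), "in treatment;", len(groups["control"]), "in control")
```

Because chance alone decides each patient's group, no patient characteristic (measured or unmeasured) can systematically favor one group.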
How random assignment worked in the stent study:
Result of random assignment: Treatment and control groups similar on average in:
What if the stent study had been observational?
Problem: Patients with more severe stenosis may be more likely to:
Severity of stenosis is a confounder:
Diagram: Severity of stenosis (confounder) → Stent treatment; Severity of stenosis (confounder) → Stroke outcome
Random assignment breaks the confounder link:
In the actual randomized stent study:
Without random assignment, we couldn’t distinguish:
| | Random Sampling | Random Assignment |
|---|---|---|
| What | How cases are selected from population | How cases are assigned to groups |
| Purpose | Enable generalization | Enable causal conclusions |
| Controls for | Sampling bias | Confounding variables |
These are different concepts that address different questions!
| Group | Stroke | No Event |
|---|---|---|
| control | 28 | 199 |
| treatment | 45 | 179 |
Stroke proportions: Control 12%, Treatment 20%
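The stroke proportions are row-conditioned proportions, computed the same way as for the comics table. A minimal sketch, with the counts copied from the table above and variable names chosen for illustration:

```python
# Counts copied from the stent study table.
stent = {
    "control": {"stroke": 28, "no event": 199},
    "treatment": {"stroke": 45, "no event": 179},
}

# Proportion of strokes within each group (conditioning on rows).
stroke_rate = {
    group: cells["stroke"] / sum(cells.values()) for group, cells in stent.items()
}
difference = stroke_rate["treatment"] - stroke_rate["control"]

for group, rate in stroke_rate.items():
    print(f"{group}: {rate:.1%}")
print(f"difference (treatment - control): {difference:.1%}")
```

The treatment group's stroke proportion is about 7.8 percentage points higher than the control group's.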
It turns out that this difference is statistically significant.
This was a randomized experiment → Can we conclude stents caused more strokes? Yes! Random assignment makes the groups similar on average, so confounders cannot explain the difference.
Question: Does a sign above a recycling bin reduce contamination?
Study design: Students compare two bins over one month
Is this an experiment? No: the students do not manipulate sign placement
Possible confounder: Location type (public vs. residential spaces may have different behaviors regardless of signs)
Same question, different design:
Now it’s an experiment!
Summarizing relationships:
Key concepts:
Two separate questions about what we can conclude: