Blocking in Experiments

Controlling Variation in Experiments
Math 215

Experiments with two factors

  • A factor is a categorical variable that we think may explain some of the variability in the response variable
  • Two factors interact when the effect of one factor on the response depends on the level of the other factor
    • A change in the level of factor A results in different changes in the response at different levels of factor B
  • A treatment factor is a factor that we want to evaluate to see if there is an effect
  • A blocking factor is a factor that we believe will have an effect on the response, but that is not itself of primary interest
  • The treatment factor is the factor of interest
  • The blocking factor is used to reduce the unexplained variability in the response
  • The “hotdogs” example we looked at in the previous topic:
    • There was a lot of “within group” variation
    • When within-group variation is large, it can be difficult to detect a treatment effect
  • We can reduce the variation in a study by imposing an inclusion criterion (for example use only men aged 45-55 in the study)
  • However, this also limits the generalizability of the study
  • Alternatively, we can diversify observational units (for example, include gender as a factor)
  • We then need to separate the variation due to this factor from other sources of variation and make sure that it is not confounded with the treatment
  • An example of this approach is the matched pairs design
    • We can keep track of three sources of variation: due to explanatory variable, variation between pairs and unexplained variation
  • We will expand this idea to more general block designs
    • More than two repeated measures and/or grouping of observational units
    • Assigning the treatment conditions to each group

Block Design

  • A block design creates blocks of experimental units that are similar to each other
  • Randomly assigns the treatments within each block
  • Analyzes the data in a way which accounts for block-to-block variations
  • When only two treatments are being compared, it is called a matched pair design
  • The term comes from agricultural experiments in large fields where separate parts of the field were called “blocks”
  • An experiment with two factors may have two treatment factors (in this case we use Two-way ANOVA, more on this later)
  • Or it may have a treatment factor and a blocking factor

Experimental Designs

Randomized complete block design (RCBD)

  • One of the most used experimental designs
  • Groups similar experimental units into blocks or replicates
  • The blocks of experimental units should be as uniform as possible

One treatment factor with \(t\) levels, one blocking factor with \(b\) levels

  • Each block has exactly \(t\) experimental units (one for each treatment level)
  • Experimental units in the same block are expected to respond similarly if treated similarly
  • Matched pair design: Each block contains \(t=2\) units. Blocking factor is the pairing criterion
  • Repeated measures design: Each block is a single individual, subject to all \(t\) treatments. Blocking factor is each individual
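
The claim that a matched pair design is a block design with \(t=2\) can be checked numerically. The sketch below uses simulated data only: the number of pairs, the means, and the spread of the pair effects are made-up assumptions, not values from any study. It compares a paired t-test with a block-design ANOVA on the same data. With two treatments, the squared paired t statistic should equal the treatment F statistic.

```r
# Simulated illustration only: all numbers below are hypothetical.
set.seed(215)
n_pairs <- 8
pair_effect <- rnorm(n_pairs, mean = 0, sd = 3)   # shared within each pair (block)
y_a <- 10 + pair_effect + rnorm(n_pairs)          # responses under treatment A
y_b <- 11 + pair_effect + rnorm(n_pairs)          # responses under treatment B

# Paired t-test on the within-pair differences
paired <- t.test(y_a, y_b, paired = TRUE)

# Same data in long format, analyzed as a block design with t = 2
long <- data.frame(
  y         = c(y_a, y_b),
  treatment = factor(rep(c("A", "B"), each = n_pairs)),
  pair      = factor(rep(seq_len(n_pairs), times = 2))
)
block_fit <- anova(lm(y ~ pair + treatment, data = long))

t_squared   <- unname(paired$statistic)^2
f_treatment <- block_fit["treatment", "F value"]
c(t_squared = t_squared, f_treatment = f_treatment)
```

The two p-values agree as well, so the paired t-test can be read as the simplest case of the block analysis.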

Factorial design (more on this later)

  • More general than RCBD
  • Two or more factors
  • One or more observations per cell (replication)
  • Examining several factors simultaneously
  • Each level of one independent variable is combined with each level of the others to produce all possible combinations.
  • Each combination, then, becomes a condition in the experiment.

Strawberries

  • Does changing the air in which strawberries are stored affect the firmness of the berries?
  • Data from Smith and Skog (1992)
  • Inspired by an example from Tintle et al. (2020)
  • Response: Firmness (force in N to pierce berry)

  • Treatment factor: Storage with 3 levels (\(t=3\)):

    • “Control” (not stored)
    • “Air” (21% O2, 0.04% CO2)
    • “ModifiedAir” (15% CO2, 18% O2)
  • Blocking factor: Variety with 5 levels (\(b=5\)): “Allstar”, “Bounty”, “Kent”, “Selva”, “Vesper”

  • Stored berries were stored at \(0.5^{\circ}\)C for two days

  • Three clamshells of each variety were randomly assigned to the three treatments (Storage), one clamshell per treatment
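
Randomizing within blocks could look like the sketch below. It is an illustration of the idea, not the procedure the original researchers used: the seed and the `Clamshell` numbering are arbitrary assumptions, while the variety and storage names come from the description above.

```r
# Hypothetical randomization sketch; the seed and clamshell labels are arbitrary.
set.seed(1992)
varieties <- c("Allstar", "Bounty", "Kent", "Selva", "Vesper")   # blocks (b = 5)
storage   <- c("Control", "Air", "ModifiedAir")                  # treatments (t = 3)

# Within each variety, shuffle the three treatments across the three clamshells
design <- do.call(rbind, lapply(varieties, function(v) {
  data.frame(Variety = v, Clamshell = seq_along(storage), Storage = sample(storage))
}))

design
# Each treatment should appear exactly once in every block
table(design$Variety, design$Storage)
```

Because `sample()` is called separately for each variety, the randomization is restricted to within a block, which is what distinguishes this design from a completely randomized one.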

Statistical Model

\[y_{ij}=\mu+\tau_i+\rho_j+\varepsilon_{ij}\]

  • \(\mu\) is the overall mean
  • \(\tau_i\) is the differential effect of treatment \(i\)
  • \(\rho_j\) is the differential effect of block \(j\)
  • \(\varepsilon_{ij}\sim N(0,\sigma^2)\) represents the error
  • This is an additive model: the treatment is assumed to have the same effect in each block
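
To get a feel for the additive model, the sketch below simulates one observation per treatment and block combination and fits the model with and without the block term. All parameter values (`mu`, `tau`, `rho`, `sigma`) are hypothetical choices for illustration and are not estimates from the strawberry data.

```r
# Simulated illustration only: mu, tau, rho and sigma are hypothetical values.
set.seed(42)
mu    <- 10
tau   <- c(-1, 0, 1)           # treatment effects, t = 3
rho   <- c(-4, -2, 0, 2, 4)    # block effects, b = 5
sigma <- 0.7

sim <- expand.grid(treatment = 1:3, block = 1:5)   # one unit per cell
sim$y <- mu + tau[sim$treatment] + rho[sim$block] +
  rnorm(nrow(sim), mean = 0, sd = sigma)
sim$treatment <- factor(sim$treatment)
sim$block     <- factor(sim$block)

fit_no_block <- lm(y ~ treatment, data = sim)
fit_block    <- lm(y ~ block + treatment, data = sim)

# Residual sum of squares and residual degrees of freedom for each model
c(rss_no_block = deviance(fit_no_block), rss_block = deviance(fit_block))
c(df_no_block = df.residual(fit_no_block), df_block = df.residual(fit_block))
```

Including the block term moves the block-to-block variation out of the residuals, at the cost of \(b-1\) residual degrees of freedom.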

Hypothesis Test

  • We expect firmness to vary according to variety (blocking factor)

  • However, we are interested in the effect of the air type (treatment)

  • We conduct a hypothesis test for the treatment variable:

    • \(H_0: \tau_1 = \tau_2 = \tau_3 = 0\)
    • \(H_A:\) At least one \(\tau_i \neq 0\)

EDA

Strawberry firmness for different storage conditions.

Firmness for different strawberry varieties.

Firmness for different strawberry varieties with different storage conditions.

ANOVA Table (Not accounting for Variety)

  • First we consider ANOVA without accounting for Variety
  • Standard ANOVA model \[y_{ij}=\mu+\tau_i+\varepsilon_{ij}\]
library(broom)  # provides tidy()

lm(Firmness ~ Storage, data = strawberries) |>
  anova() |>
  tidy()
# A tibble: 2 × 6
  term         df sumsq meansq statistic p.value
  <chr>     <int> <dbl>  <dbl>     <dbl>   <dbl>
1 Storage       2  11.2   5.59     0.995   0.398
2 Residuals    12  67.4   5.62    NA      NA    
  • With a p-value of 0.398, we are unable to detect a treatment effect: there is no significant difference in mean Firmness between the Storage groups
  • Much of the variability in firmness is due to differences between varieties, and this model leaves it unexplained (it ends up in the residuals)
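
The F-statistic and p-value in the table can be reproduced by hand from the mean squares. The sketch below uses the rounded values printed in the table, so it should agree with the output only up to rounding.

```r
# Values copied (rounded) from the ANOVA table above
ms_storage  <- 5.59   # mean square for Storage, df = 2
ms_residual <- 5.62   # residual mean square, df = 12

f_stat  <- ms_storage / ms_residual
p_value <- pf(f_stat, df1 = 2, df2 = 12, lower.tail = FALSE)
round(c(F = f_stat, p = p_value), 3)
```

An F-statistic close to 1 means the variation between storage groups is about the same size as the variation within them.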

ANOVA Table (Accounting for variety)

  • This time we account for Variety
  • Statistical model that includes blocking \[y_{ij}=\mu+\tau_i+\rho_j+\varepsilon_{ij}\]
lm(Firmness ~ Variety + Storage, data = strawberries) |>
  anova() |>
  tidy()
# A tibble: 3 × 6
  term         df sumsq meansq statistic    p.value
  <chr>     <int> <dbl>  <dbl>     <dbl>      <dbl>
1 Variety       4 63.5  15.9        32.4  0.0000547
2 Storage       2 11.2   5.59       11.4  0.00455  
3 Residuals     8  3.93  0.491      NA   NA        
  • Note that each F-statistic is still calculated as MSG/MSE: the mean square for the factor divided by the residual mean square
  • For example
    • F-statistic for the difference of means of Variety is \(15.9/0.491=32.4\)
    • F-statistic for the difference of means of Storage is \(5.59/0.491=11.4\)
  • Now the treatment effect is apparent
  • We reject \(H_0\) and conclude that there is convincing evidence of a treatment effect.
  • Since we rejected the null hypothesis, we can perform the follow-up analysis
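
Comparing the two ANOVA tables shows where the improvement comes from. The sketch below uses the rounded values from the tables, so the sums match only approximately. The residual sum of squares from the model without Variety splits into a Variety part and a much smaller residual part, while the Storage sum of squares is unchanged.

```r
# Values copied (rounded) from the two ANOVA tables
ss_resid_oneway <- 67.4    # residual SS without blocking, df = 12
ss_variety      <- 63.5    # block SS, df = 4
ss_resid_block  <- 3.93    # residual SS with blocking, df = 8
ms_storage      <- 5.59    # unchanged between the two models

# The old residual SS splits into a block part and a smaller residual part
ss_variety + ss_resid_block          # compare with ss_resid_oneway

ms_resid_block <- ss_resid_block / 8
f_storage <- ms_storage / ms_resid_block
p_storage <- pf(f_storage, df1 = 2, df2 = 8, lower.tail = FALSE)
c(F = f_storage, p = p_storage)
```

The numerator of the Storage F-statistic is the same in both analyses. Only the denominator shrinks, from about 5.62 to about 0.491.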

Pairwise Comparisons

  • We can follow up with pairwise comparisons between the treatment levels
  • We adjust for multiple comparisons using Tukey's HSD
aov(Firmness ~ Variety + Storage, data = strawberries) |>
  TukeyHSD("Storage")
  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = Firmness ~ Variety + Storage, data = strawberries)

$Storage
                    diff        lwr      upr     p adj
Control-Air         0.10 -1.1659048 1.365905 0.9723992
ModifiedAir-Air     1.88  0.6140952 3.145905 0.0070490
ModifiedAir-Control 1.78  0.5140952 3.045905 0.0095586

There is a significant difference in mean firmness between “ModifiedAir” and “Air” (adjusted p = 0.007) and between “ModifiedAir” and “Control” (adjusted p = 0.010), but not between “Control” and “Air” (adjusted p = 0.972).
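
The Tukey intervals can be approximated by hand from the blocked ANOVA table. The sketch below assumes a balanced design with \(b=5\) observations per storage level, in which case the half-width of each interval is the studentized range quantile times \(\sqrt{MSE/b}\). It uses the rounded residual mean square, so the result should match the output only to about two or three decimal places.

```r
mse <- 0.491   # residual mean square from the blocked ANOVA table (rounded)
b   <- 5       # observations per storage level (one per variety)

# Critical value of the studentized range for 3 means and 8 residual df
q_crit     <- qtukey(0.95, nmeans = 3, df = 8)
half_width <- q_crit * sqrt(mse / b)

diff_hat <- 1.88   # ModifiedAir - Air, from the output above
c(lwr = diff_hat - half_width, upr = diff_hat + half_width)

# Adjusted p-value for the same comparison
p_adj <- ptukey(diff_hat / sqrt(mse / b), nmeans = 3, df = 8, lower.tail = FALSE)
p_adj
```

Because the design is balanced, all three intervals have the same half-width. Only the centers differ.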

Scope of Inference

  • Because this was a randomized experiment, we can conclude that the difference in storage method caused the difference in firmness
  • Storing strawberries in air enriched in CO2 increases firmness compared to not storing the berries at all or storing them in normal air
  • Choosing varieties with large variability in firmness (large variation between blocks) broadens the scope of inference
  • For example, we could have controlled unexplained variability by focusing on a single variety
  • Then our conclusions would be limited to that single variety