bdims: body measurement dataset.
507 physically active individuals (247 men, 260 women)
age, weight (wgt), height (hgt), sex, 21 body girth variables (e.g., hip girth)
Observations of wgt vs. hgt and least squares line for the entire population.
# A tibble: 2 × 5
  term        estimate std.error statistic  p.value
  <chr>          <dbl>     <dbl>     <dbl>    <dbl>
1 (Intercept)    -105.      7.54     -13.9 1.50e-37
2 hgt             1.02    0.0440      23.1 2.83e-81
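A coefficient table like the one above can be produced from a fitted least squares model. The sketch below is a minimal illustration in base R, not the code behind these slides: it uses simulated heights and weights (the data frame `sim`, the noise level, and the height distribution are all assumptions) rather than the bdims data. The tables in this document look like the output of `broom::tidy()`, which returns the same four quantities as a tibble; base R's `summary()` is used here so that no extra packages are needed.

```r
# Simulated stand-in for a height/weight data set (NOT the bdims data).
# The intercept and slope used to generate it are borrowed from the table
# above purely for illustration; the noise level is an assumption.
set.seed(1)
n   <- 507
hgt <- rnorm(n, mean = 171, sd = 9)            # heights in cm (assumed scale)
wgt <- -105 + 1.02 * hgt + rnorm(n, sd = 9)    # weights in kg, linear in height plus noise
sim <- data.frame(hgt = hgt, wgt = wgt)

# Least squares fit of weight on height
fit <- lm(wgt ~ hgt, data = sim)

# Coefficient table: estimate, standard error, t statistic, p-value
coef_table <- summary(fit)$coefficients
print(coef_table)
```

Each row of `coef_table` corresponds to one term of the model, in the same order as the `term` column of the tibble.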
bdims data
Observations of wgt vs. hgt and least squares line for first sample of 20.
Sample 1
# A tibble: 2 × 5
  term        estimate std.error statistic   p.value
  <chr>          <dbl>     <dbl>     <dbl>     <dbl>
1 (Intercept)    -186.      47.3     -3.94  0.000964
2 hgt             1.51     0.276      5.46 0.0000346
Observations of wgt vs. hgt and least squares lines for first two samples of 20.
Sample 2
# A tibble: 2 × 5
  term        estimate std.error statistic  p.value
  <chr>          <dbl>     <dbl>     <dbl>    <dbl>
1 (Intercept)    -119.      42.8     -2.77   0.0125
2 hgt             1.10     0.247      4.47 0.000299
Observations of wgt vs. hgt and least squares lines for first three samples of 20.
Sample 3
# A tibble: 2 × 5
  term        estimate std.error statistic    p.value
  <chr>          <dbl>     <dbl>     <dbl>      <dbl>
1 (Intercept)    -117.      26.2     -4.46   0.000299
2 hgt             1.07     0.151      7.05 0.00000140
Least squares lines for 100 random samples of 20.
Dotplot of slopes of least squares lines from 100 random samples.
# A tibble: 1 × 3
      n  mean    sd
  <int> <dbl> <dbl>
1   100  1.01 0.221
We can form a 95% bootstrap CI for the slope:
# A tibble: 1 × 2
  ci_lo ci_hi
  <dbl> <dbl>
1 0.606  1.46
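The repeated-sampling idea behind the summary and interval above can be sketched as follows. This is an illustration under stated assumptions, not the slides' own code: the "population" `pop` is simulated, the helper `sample_slope()` is hypothetical, and the interval is taken as the middle 95% of the sampled slopes, which is one plausible reading of how the interval above was formed.

```r
set.seed(2)

# Hypothetical population of 507 individuals (simulated; not the bdims data)
N   <- 507
pop <- data.frame(hgt = rnorm(N, mean = 171, sd = 9))
pop$wgt <- -105 + 1.02 * pop$hgt + rnorm(N, sd = 9)

# Fit a least squares line to one random sample of rows and return its slope
sample_slope <- function(data, size = 20) {
  rows <- sample(nrow(data), size)   # sample rows without replacement
  unname(coef(lm(wgt ~ hgt, data = data[rows, ]))["hgt"])
}

# Slopes from 100 random samples of 20
slopes <- replicate(100, sample_slope(pop))

# Number of slopes, their mean, and their standard deviation
slope_summary <- data.frame(n = length(slopes), mean = mean(slopes), sd = sd(slopes))
print(slope_summary)

# Middle 95% of the sampled slopes
slope_interval <- quantile(slopes, c(0.025, 0.975))
print(slope_interval)
```

The standard deviation of `slopes` plays the role of the slope's standard error: it measures how much the fitted slope changes from one sample of 20 to the next.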
Note
When the null hypothesis is true and the following conditions are met, the \(T\) score has a \(t\)-distribution with \(df=n-2\) degrees of freedom.
One way to check conditions is to look at residual plots.
Linearity? Independent observations? Normality of residuals? Constant variability?
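The T score in the note above can be computed by hand and compared with what `lm()` reports. The sketch below assumes the usual null hypothesis that the true slope is 0 (the slides do not state the null value explicitly) and uses a simulated sample of 20; the variable names are illustrative.

```r
set.seed(3)

# One simulated sample of 20 (hypothetical data)
n <- 20
x <- rnorm(n, mean = 171, sd = 9)
y <- -105 + 1.02 * x + rnorm(n, sd = 9)
fit <- lm(y ~ x)

coefs <- summary(fit)$coefficients
est <- coefs["x", "Estimate"]     # sample slope
se  <- coefs["x", "Std. Error"]   # its standard error

# T score: (estimate - null value) / standard error, with null value 0
t_score <- (est - 0) / se

# Degrees of freedom: n minus the two estimated coefficients
df <- n - 2

# Two-sided p-value from the t-distribution with n - 2 degrees of freedom
p_value <- 2 * pt(-abs(t_score), df = df)

print(c(t_score = t_score, df = df, p_value = p_value))
```

The hand-computed `t_score` and `p_value` match the `t value` and `Pr(>|t|)` entries in the slope row of `summary(fit)$coefficients`, which correspond to the `statistic` and `p.value` columns in the tables above.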
restNYC dataset
Price (USD, includes tip and drink)
Food (rating: 1 to 30)
Scatter plot of Price vs Food with least squares line.
Linearity? Independent observations? Normality of residuals? Constant variability?
Residual plot.
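Residual plots like the one above can be drawn in base R as sketched below. The data are simulated stand-ins for Price and Food (the sample size, rating range, and noise level are assumptions), so the plots illustrate the technique, not the restNYC results. A residuals-versus-fitted plot speaks to linearity and constant variability, and a normal Q-Q plot speaks to normality of residuals; independence usually has to be judged from how the data were collected.

```r
set.seed(4)

# Simulated stand-in for the restaurant data (NOT the restNYC data)
n     <- 150
Food  <- runif(n, min = 15, max = 25)             # hypothetical food ratings
Price <- -17.8 + 2.94 * Food + rnorm(n, sd = 6)   # hypothetical prices
fit   <- lm(Price ~ Food)

res <- resid(fit)    # observed minus fitted values
fv  <- fitted(fit)   # fitted values on the least squares line

op <- par(mfrow = c(1, 2))

# Residuals vs fitted: look for curvature (linearity) and a fan shape
# (non-constant variability)
plot(fv, res, xlab = "Fitted values", ylab = "Residuals",
     main = "Residuals vs fitted")
abline(h = 0, lty = 2)

# Normal Q-Q plot: points near the line suggest roughly normal residuals
qqnorm(res)
qqline(res)

par(op)
```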
# A tibble: 2 × 5
  term        estimate std.error statistic  p.value
  <chr>          <dbl>     <dbl>     <dbl>    <dbl>
1 (Intercept)    -17.8      5.86     -3.04 2.74e- 3
2 Food            2.94     0.283      10.4 9.63e-20
Randomly permute Price to simulate the null hypothesis of no relationship between Price and Food.
Histogram of slopes from different random permutations of Price (null distribution).
p-value \(\approx0\)
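The permutation approach above can be sketched in base R as follows. The data are simulated (not the restNYC data), the helper `slope()` is hypothetical, and the choice of 1000 permutations and a two-sided p-value are assumptions made for illustration.

```r
set.seed(5)

# Simulated stand-in for the restaurant data (NOT the restNYC data)
n     <- 150
Food  <- runif(n, min = 15, max = 25)
Price <- -17.8 + 2.94 * Food + rnorm(n, sd = 6)

# Slope of the least squares line of y on x
slope <- function(x, y) unname(coef(lm(y ~ x))[2])

obs_slope <- slope(Food, Price)

# Shuffling Price breaks any link with Food, so each refit slope is a draw
# from the null distribution of "no relationship"
null_slopes <- replicate(1000, slope(Food, sample(Price)))

# Null distribution with the observed slope marked
hist(null_slopes, xlim = range(c(null_slopes, obs_slope)),
     main = "Null distribution of slopes", xlab = "Slope")
abline(v = obs_slope, lty = 2)

# Two-sided p-value: share of permuted slopes at least as extreme as observed
p_value <- mean(abs(null_slopes) >= abs(obs_slope))
print(p_value)
```

When the observed slope lies far outside the permuted slopes, none of the permutations is as extreme and the estimated p-value is reported as approximately 0.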
Histogram of slopes from bootstrapped data.
95% bootstrap percentile confidence interval: (2.38, 3.45)
# A tibble: 1 × 2
  lower_ci upper_ci
     <dbl>    <dbl>
1     2.38     3.45
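A bootstrap percentile interval of this kind can be sketched in base R as below. The data frame `dat` is simulated (not the restNYC data), `boot_slope()` is a hypothetical helper, and 1000 resamples is an assumed number; the column names `lower_ci` and `upper_ci` simply mirror the table above.

```r
set.seed(6)

# Simulated stand-in for the restaurant data (NOT the restNYC data)
n   <- 150
dat <- data.frame(Food = runif(n, min = 15, max = 25))
dat$Price <- -17.8 + 2.94 * dat$Food + rnorm(n, sd = 6)

# Resample rows WITH replacement, refit the line, and return the slope
boot_slope <- function(data) {
  rows <- sample(nrow(data), replace = TRUE)
  unname(coef(lm(Price ~ Food, data = data[rows, ]))["Food"])
}

boot_slopes <- replicate(1000, boot_slope(dat))

# Percentile interval: the 2.5th and 97.5th percentiles of the bootstrap slopes
ci <- quantile(boot_slopes, c(0.025, 0.975))
boot_ci <- data.frame(lower_ci = ci[[1]], upper_ci = ci[[2]])
print(boot_ci)
```

Unlike the permutation step, the bootstrap keeps each Price paired with its Food rating, so the resampled slopes are centered near the observed slope instead of near 0.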