bdims: body measurement dataset.
507 physically active individuals (247 men, 260 women).
Variables: age, weight (wgt, kg), height (hgt, cm), sex, and 21 body dimension measurements (skeletal diameters and girths, e.g., hip girth).
Observations of wgt vs. hgt and least squares line for the entire population.
# A tibble: 2 × 5
term estimate std.error statistic p.value
<chr> <dbl> <dbl> <dbl> <dbl>
1 (Intercept) -105. 7.54 -13.9 1.50e-37
2 hgt 1.02 0.0440 23.1 2.83e-81
Let us run a test to see whether there is significant evidence of a linear relationship between weight and height.
Since the direction of the test is not indicated, we will use a two-sided alternative: \[H_0:\beta_1=0\] \[H_A:\beta_1 \ne 0\]
We can randomly permute the values of the response (wgt) to simulate the null hypothesis.
Each time, compute the slope of the relationship between wgt and hgt.
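The permutation procedure can be sketched in plain Python using only the standard library. This is a minimal illustration, not the code that produced the output in these slides. The helper names `ls_slope` and `permutation_test_slope`, the 1,000 permutations, and the synthetic height/weight data are all hypothetical stand-ins for the bdims variables.

```python
import random


def ls_slope(x, y):
    """Least squares slope b1 = S_xy / S_xx."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    return s_xy / s_xx


def permutation_test_slope(x, y, reps=1000, seed=1):
    """Two-sided permutation test of H0: beta_1 = 0 (hypothetical helper).

    Shuffling the response breaks any association between x and y,
    so the shuffled slopes approximate the null distribution.
    Returns (observed slope, number of extreme slopes, p-value).
    """
    rng = random.Random(seed)
    observed = ls_slope(x, y)
    y_perm = list(y)
    num_extreme = 0
    for _ in range(reps):
        rng.shuffle(y_perm)  # permute the response values
        # two-sided: count slopes at least as far from 0 as the observed one
        if abs(ls_slope(x, y_perm)) >= abs(observed):
            num_extreme += 1
    return observed, num_extreme, num_extreme / reps


# Hypothetical data loosely mimicking height (cm) and weight (kg)
data_rng = random.Random(42)
hgt = [data_rng.uniform(150, 195) for _ in range(50)]
wgt = [-105 + 1.02 * h + data_rng.gauss(0, 9) for h in hgt]

obs_slope, num_extreme, pval = permutation_test_slope(hgt, wgt, reps=1000)
print(f"observed slope = {obs_slope:.2f}, num_extreme = {num_extreme}, p-value = {pval}")
```

Sometimes no permuted slope is as extreme as the observed one. The estimate is then exactly 0, which is better read as "p-value below 1/reps" than as a true zero.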
p-value \(\approx0\)
# A tibble: 1 × 2
num_extreme pval
<int> <dbl>
1 0 0
Note
When the null hypothesis is true and the following conditions are met, the \(T\) score has a \(t\)-distribution with \(df=n-2\) degrees of freedom.
One way to check conditions is to look at residual plots.
Linearity? Independent observations? Normality of residuals? Constant variability?
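When these conditions hold, the \(T\) score measures how many standard errors the estimated slope \(b_1\) lies from its null value of 0. The worked check below uses the hgt row of the bdims regression output (\(n = 507\)). The critical value \(t^*_{505} \approx 1.965\) is our own approximation rather than a number from the slides.

\[T = \frac{b_1 - 0}{SE_{b_1}} = \frac{1.02}{0.0440} \approx 23.2, \qquad df = n - 2 = 507 - 2 = 505\]

The output reports 23.1, presumably because it divides the unrounded estimate. The same \(t\)-distribution also yields a 95% confidence interval for the slope:

\[b_1 \pm t^*_{df}\, SE_{b_1} = 1.02 \pm 1.965 \times 0.0440 \approx (0.934,\ 1.106)\]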
bdims data: observations of wgt vs. hgt and least squares line for the first sample of 20.
Sample 1
# A tibble: 2 × 5
term estimate std.error statistic p.value
<chr> <dbl> <dbl> <dbl> <dbl>
1 (Intercept) -186. 47.3 -3.94 0.000964
2 hgt 1.51 0.276 5.46 0.0000346
Observations of wgt vs. hgt and least squares lines for first two samples of 20.
Sample 2
# A tibble: 2 × 5
term estimate std.error statistic p.value
<chr> <dbl> <dbl> <dbl> <dbl>
1 (Intercept) -119. 42.8 -2.77 0.0125
2 hgt 1.10 0.247 4.47 0.000299
Observations of wgt vs. hgt and least squares lines for first three samples of 20.
Sample 3
# A tibble: 2 × 5
term estimate std.error statistic p.value
<chr> <dbl> <dbl> <dbl> <dbl>
1 (Intercept) -117. 26.2 -4.46 0.000299
2 hgt 1.07 0.151 7.05 0.00000140
Least squares lines for 100 random samples of 20.
Mean and standard deviation of the 100 sample slopes:
# A tibble: 1 × 3
n mean sd
<int> <dbl> <dbl>
1 100 1.01 0.221
Based on these 100 simulated samples, the middle 95% of the sample slopes (2.5th to 97.5th percentiles) runs from:
# A tibble: 1 × 2
ci_lo ci_hi
<dbl> <dbl>
1 0.606 1.46
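The repeated-sampling experiment behind these numbers can be sketched as follows:

1. Treat a data set as the population.
2. Draw 100 samples of size 20 without replacement.
3. Fit a least squares line to each sample.
4. Summarize the slopes.

The population here is synthetic and hypothetical, a stand-in for the 507 bdims individuals, so its numbers will not match the slides.

```python
import random
import statistics


def ls_slope(x, y):
    """Least squares slope b1 = S_xy / S_xx."""
    x_bar = statistics.fmean(x)
    y_bar = statistics.fmean(y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    return s_xy / s_xx


# Hypothetical population of 507 (height, weight) pairs
rng = random.Random(2024)
pop_hgt = [rng.gauss(171, 9.4) for _ in range(507)]
pop_wgt = [-105 + 1.02 * h + rng.gauss(0, 9) for h in pop_hgt]
pop_slope = ls_slope(pop_hgt, pop_wgt)

# Draw 100 samples of size 20 (without replacement) and record each slope
slopes = []
for _ in range(100):
    idx = rng.sample(range(len(pop_hgt)), 20)
    slopes.append(ls_slope([pop_hgt[i] for i in idx], [pop_wgt[i] for i in idx]))

mean_slope = statistics.mean(slopes)
sd_slope = statistics.stdev(slopes)
# quantiles(n=40) returns 39 cut points: the first is the 2.5th percentile, the last the 97.5th
cuts = statistics.quantiles(slopes, n=40)
ci_lo, ci_hi = cuts[0], cuts[-1]
print(f"population slope = {pop_slope:.3f}")
print(f"n = {len(slopes)}, mean = {mean_slope:.3f}, sd = {sd_slope:.3f}")
print(f"middle 95% of sample slopes: ({ci_lo:.3f}, {ci_hi:.3f})")
```

The standard deviation of these slopes is the quantity that the std.error column estimates from a single sample. Compare it with the standard errors reported for Samples 1 to 3.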
95% bootstrap percentile confidence interval: \((0.933, 1.10)\)
# A tibble: 1 × 2
lower_ci upper_ci
<dbl> <dbl>
1 0.933 1.10
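In practice we have only one sample, so the bootstrap resamples that sample instead of the population. Draw \(n\) rows with replacement, refit the line, and repeat. Then take the 2.5th and 97.5th percentiles of the bootstrap slopes. Below is a minimal standard-library sketch. The function name `bootstrap_slope_ci`, the 1,000 resamples, and the synthetic data are hypothetical choices, not the slides' own code.

```python
import random
import statistics


def ls_slope(x, y):
    """Least squares slope b1 = S_xy / S_xx."""
    x_bar = statistics.fmean(x)
    y_bar = statistics.fmean(y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    return s_xy / s_xx


def bootstrap_slope_ci(x, y, reps=1000, seed=1):
    """95% bootstrap percentile CI for the slope, resampling (x, y) pairs (hypothetical helper)."""
    rng = random.Random(seed)
    n = len(x)
    boot_slopes = []
    while len(boot_slopes) < reps:
        idx = rng.choices(range(n), k=n)  # resample rows with replacement
        xs = [x[i] for i in idx]
        if len(set(xs)) < 2:  # skip degenerate resamples where every x is identical
            continue
        boot_slopes.append(ls_slope(xs, [y[i] for i in idx]))
    cuts = statistics.quantiles(boot_slopes, n=40)  # cut points at 2.5%, 5%, ..., 97.5%
    return cuts[0], cuts[-1]


# Hypothetical sample standing in for the bdims data
data_rng = random.Random(7)
hgt = [data_rng.gauss(171, 9.4) for _ in range(100)]
wgt = [-105 + 1.02 * h + data_rng.gauss(0, 9) for h in hgt]

lower_ci, upper_ci = bootstrap_slope_ci(hgt, wgt)
print(f"observed slope = {ls_slope(hgt, wgt):.3f}")
print(f"95% bootstrap percentile CI: ({lower_ci:.3f}, {upper_ci:.3f})")
```

Resampling (x, y) pairs together keeps each individual's height and weight linked. The percentile interval is simply the middle 95% of the bootstrap slopes.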
restNYC dataset: Price (USD, includes tip and drink), Food (rating: 1 to 30).
Scatter plot of Price vs Food with least squares line.
Linearity? Independent observations? Normality of residuals? Constant variability?
Residual plot.
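A residual plot puts the fitted values (or the explanatory variable) on the horizontal axis and the residuals \(e_i = y_i - \hat{y}_i\) on the vertical axis. Below is a minimal standard-library sketch of computing those residuals. The function name `fit_and_residuals` and the toy Food/Price-like numbers are hypothetical, and the plotting step itself is left out.

```python
def fit_and_residuals(x, y):
    """Fit y = b0 + b1 * x by least squares; return b0, b1, fitted values, residuals."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar  # the least squares line passes through (x_bar, y_bar)
    fitted = [b0 + b1 * xi for xi in x]
    residuals = [yi - fi for yi, fi in zip(y, fitted)]
    return b0, b1, fitted, residuals


# Hypothetical toy data (food rating, price); plot residuals against fitted values
x = [16, 18, 19, 21, 22, 24]
y = [30, 35, 38, 44, 45, 52]
b0, b1, fitted, residuals = fit_and_residuals(x, y)
for f, e in zip(fitted, residuals):
    print(f"fitted = {f:6.2f}  residual = {e:6.2f}")
```

Least squares residuals always sum to zero, up to floating-point rounding. What matters is their pattern: a curve suggests non-linearity, and a fan shape suggests non-constant variability.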
# A tibble: 2 × 5
term estimate std.error statistic p.value
<chr> <dbl> <dbl> <dbl> <dbl>
1 (Intercept) -17.8 5.86 -3.04 2.74e- 3
2 Food 2.94 0.283 10.4 9.63e-20
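As a quick check on the Food row, the \(T\) score is the estimate divided by its standard error:

\[T = \frac{b_1 - 0}{SE_{b_1}} = \frac{2.94}{0.283} \approx 10.4\]

This matches the reported statistic. If the conditions above are met, it is compared to a \(t\)-distribution with \(df = n - 2\) degrees of freedom.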
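We can randomly permute the values of the response (Price) to simulate the null hypothesis.
Each time, compute the slope of the relationship between Price and Food.
Histogram of slopes from different random permutations of Price (null distribution).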
p-value \(\approx0\)
Histogram of slopes from bootstrapped data.
95% bootstrap percentile confidence interval: \((2.39, 3.48)\)
# A tibble: 1 × 2
lower_ci upper_ci
<dbl> <dbl>
1 2.39 3.48