We will consider two data sets:
payday
consult
We will use the first data set as an example of inference based on a mathematical model and the second as an example of simulation-based inference
The choice of method in each case is based on the success-failure condition (also known as the validity condition) of the Central Limit Theorem
Sampling distribution of \(\hat{p}\)
The sampling distribution of \(\hat{p}\) based on a sample of size \(n\) from a population with true proportion \(p\) will be approximately normal with mean \(p\) and standard error \[SE=\sqrt{\frac{p(1-p)}{n}}\]
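if the following technical conditions are met:
The observations in the sample are independent
The success-failure condition holds: \(np \ge 10\) and \(n(1-p) \ge 10\)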
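As a quick sketch of how the success-failure check can be carried out, the Python snippet below tests whether the expected numbers of successes and failures both reach 10. The threshold of 10 is the common textbook convention, the function name `success_failure_ok` is hypothetical, and the sample sizes in the usage lines are illustrative assumptions, not values taken from the data sets.

```python
def success_failure_ok(n, p, threshold=10):
    """Return True when both expected counts reach the threshold."""
    expected_successes = n * p
    expected_failures = n * (1 - p)
    return expected_successes >= threshold and expected_failures >= threshold


# Illustrative (ASSUMED) inputs, not taken from the payday or consult data.
print(success_failure_ok(1000, 0.5))  # large sample, p = 0.5
print(success_failure_ok(62, 0.15))   # small sample, p = 0.15
```

When the check fails, the normal approximation is not reliable and a simulation-based approach is the usual fallback.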
In words:
\(H_0:\) the proportion of all payday borrowers in MI that support additional regulation is 50%.
\(H_A:\) a majority (more than 50%) of all payday borrowers in MI support additional regulation.
In symbols:
\(H_0: p = 0.5\)
\(H_A: p > 0.5\)
Data
payday data set
\[Z = \frac{\hat{p}-p_0}{SE(\hat{p})}=\frac{\hat{p}-p_0}{\sqrt{p_0(1-p_0)/n}}\]
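Under \(H_0\), \(Z\) approximately follows the normal model \(N(0,1)\). The p-value is the area of the shaded region, that is, the upper tail to the right of the observed \(Z\).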
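The calculation of \(Z\) and the upper-tail p-value can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function name `one_prop_z_test` is hypothetical, and the inputs \(\hat{p} = 0.53\) and \(n = 1000\) are placeholders chosen for the example, not the payday results.

```python
from math import sqrt
from statistics import NormalDist


def one_prop_z_test(p_hat, p0, n):
    """Upper-tailed one-proportion z-test; returns (z, p_value)."""
    se = sqrt(p0 * (1 - p0) / n)       # standard error computed under H_0
    z = (p_hat - p0) / se
    p_value = 1 - NormalDist().cdf(z)  # area to the right of z under N(0, 1)
    return z, p_value


# Placeholder inputs (ASSUMED for illustration, not the payday data).
z, p_value = one_prop_z_test(p_hat=0.53, p0=0.5, n=1000)
print(f"Z = {z:.2f}, p-value = {p_value:.4f}")
```

Note that the standard error uses the null value \(p_0\), not \(\hat{p}\), because the test statistic is evaluated under the assumption that \(H_0\) is true.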
Medical Consultants
Is her claim supported?
Hypotheses:
\(H_0: p = 0.15\)
\(H_A: p < 0.15\)
Data:
consult dataset
Consultant study
Parametric bootstrap simulation is equivalent to the following physical simulation:
We can do the bootstrapping using the infer package
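The infer code itself is not reproduced here. As a language-neutral sketch of the same idea in plain Python: draw many samples of size \(n\) from a population whose true proportion equals the null value, and count how often the simulated number of complications is at most the observed one. The null value 0.15 and \(\hat{p} = 0.048\) come from this section. The sample of 3 complications in 62 cases is an assumption (it matches \(3/62 \approx 0.048\)), as is the lower-tailed alternative; the function name, seed, and number of repetitions are arbitrary choices.

```python
import random


def simulate_null_counts(n, p0, reps, seed=1):
    """Simulate `reps` samples of size n under H_0 and return the success counts."""
    rng = random.Random(seed)
    return [sum(rng.random() < p0 for _ in range(n)) for _ in range(reps)]


n, observed = 62, 3   # ASSUMED sample: 3 complications in 62 cases (3/62 is about 0.048)
p0 = 0.15             # null value of the complication rate
reps = 10_000         # arbitrary number of simulated samples

counts = simulate_null_counts(n, p0, reps)
# Lower-tailed p-value: share of simulations at least as extreme as the data.
n_extreme = sum(c <= observed for c in counts)
p_val = n_extreme / reps
print(n_extreme, p_val)
```

Counts are compared instead of proportions so that the "at least as extreme" check does not depend on floating-point rounding.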
Here are the first 100 simulations with the test statistic (\(\hat{p} = 0.048\)) indicated by a dashed line

# A tibble: 1 × 2
  n_extreme p_val
      <int> <dbl>
1        14 0.014
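The reported p-value is the share of simulated proportions that are at least as extreme as the observed one. Assuming 1,000 simulations were run (a number inferred from the two reported values, not stated in the output), the arithmetic is:
\[\text{p-value} = \frac{n_{\text{extreme}}}{\text{number of simulations}} = \frac{14}{1000} = 0.014\]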

# A tibble: 1 × 2
  ci_lo ci_hi
  <dbl> <dbl>
1     0 0.113
We are 95% confident that the consultant’s long-run complication rate is between 0 and 0.113
Note that the value we used in the null hypothesis (0.15) is not in the confidence interval, which is consistent with rejecting 15% as a plausible value for the parameter of interest at significance level \(\alpha=0.05\)
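One common way to obtain an interval of this kind is a percentile bootstrap: resample the observed data many times, compute \(\hat{p}\) for each resample, and take the 2.5th and 97.5th percentiles. The sketch below illustrates that approach in plain Python. Whether the interval above was produced exactly this way is an assumption, as are the sample of 3 complications in 62 cases, the function name `bootstrap_percentile_ci`, the seed, and the number of resamples.

```python
import random


def bootstrap_percentile_ci(successes, n, reps=10_000, level=0.95, seed=1):
    """Percentile bootstrap interval for a proportion."""
    rng = random.Random(seed)
    p_hat = successes / n
    # Resampling a 0/1 sample with replacement is the same as
    # drawing n independent Bernoulli(p_hat) values.
    props = sorted(
        sum(rng.random() < p_hat for _ in range(n)) / n for _ in range(reps)
    )
    alpha = 1 - level
    lo = props[int(reps * alpha / 2)]            # lower percentile
    hi = props[int(reps * (1 - alpha / 2)) - 1]  # upper percentile
    return lo, hi


ci_lo, ci_hi = bootstrap_percentile_ci(successes=3, n=62)  # ASSUMED data: 3 of 62
null_value = 0.15
# True if the null value lies inside the interval, False otherwise.
print(ci_lo, ci_hi, ci_lo <= null_value <= ci_hi)
```

Because the statistic can only take the values \(0/62, 1/62, 2/62, \dots\), the interval endpoints are always multiples of \(1/62\) under this assumption.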