---
title: "Decision Errors"
subtitle: |
  | Chapter 14 
  | Math 215
##author: "Yurk"
format: 
  revealjs:
    theme: beige
editor: source
---

## CPR Study

<!-- Useful: https://quarto.org/docs/presentations/revealjs/ -->

<!-- this, too: https://quarto.org/docs/authoring/markdown-basics.html -->

```{r}
#| include: false
#| echo: false

library(tidyverse)
library(infer)
library(openintro)
```

-   **Research question:** Do blood thinners affect the survival rate of heart attack patients who have received CPR?
-   `cpr` [^1] dataset
-   2 variables
    -   *group*: treatment (received blood thinner) or control (did not)
    -   *outcome*: died or survived (for at least 24 hours)
-   90 patients (40 treatment, 50 control, randomly assigned)

[^1]: `cpr` is from the *openintro* package

## Hypotheses

-   Blood thinners can be administered to treat a clot that is causing a heart attack
-   CPR can cause internal injuries
-   Blood thinners can make it more difficult for these injuries to heal
-   Do blood thinners affect survival in a positive or negative way?
-   Alternative hypothesis reflects the fact that we don't have expectations about the direction of the relationship

------------------------------------------------------------------------

Two-sided hypothesis test

-   $H_0$: Blood thinners do not affect survival rate. $p_T-p_C = 0$

-   $H_A$: Blood thinners affect survival rate. $p_T-p_C \neq 0$

Example of one-sided hypothesis test

-   $H_0$: Blood thinners do not affect survival rate. $p_T-p_C = 0$

-   $H_A$: Blood thinners increase survival rate. $p_T-p_C > 0$

## Results (EDA)

::: panel-tabset
### Counts

```{r}
#| include: false
#| echo: false

cpr |>
  count(group, outcome) |>
  pivot_wider(names_from = outcome, values_from = n) |>
  mutate(total = died + survived)
```

| group     | died | survived | total |
|-----------|:----:|:--------:|------:|
| control   |  39  |    11    |    50 |
| treatment |  26  |    14    |    40 |
| total     |  65  |    25    |    90 |

### Bar Plot

```{r}
#| include: true
#| echo: false
#| fig-cap: "Standardized barplot showing proportions of patients that died or survived in the control and treatment groups"

cpr |>
  ggplot(aes(x = group, fill = outcome)) +
  geom_bar(position = "fill") +
  labs(y = "proportion") +
  theme_minimal()
```
:::

## Difference in proportions

```{r}
#| include: false
#| echo: false

cpr |>
  group_by(group) |>
  summarize(prop = mean(outcome == "survived"))  |>
  summarize(dif = diff(prop)) #treatment - control
```

-   Success: outcome = "survived"
-   Statistic of interest: difference in proportions $$\hat{p}_T-\hat{p}_C$$
-   Observed difference: $$\frac{14}{40}-\frac{11}{50}=0.13$$
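
------------------------------------------------------------------------

As a quick check, the observed difference can be reproduced from the counts in the results table. This is a minimal sketch; the object names (`survived`, `n`, `p_hat`, `obs_diff`) are illustrative and not part of the `cpr` data.

```{r}
#| echo: true

# Counts copied from the results table (survivors and group sizes)
survived <- c(treatment = 14, control = 11)
n        <- c(treatment = 40, control = 50)

# Sample proportion of survivors in each group
p_hat <- survived / n

# Observed statistic: treatment minus control
obs_diff <- unname(p_hat["treatment"] - p_hat["control"])
obs_diff
```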

## Hypothesis test

```{r}
#| include: false
#| echo: false

cpr |>
  group_by(group) |>
  summarize(n = n(), prop = mean(outcome == "survived")) |>
  mutate(varpi = prop*(1-prop)/n) |>
  summarize(se = sqrt(sum(varpi)))
```

-   Technical conditions met for using a normal approximation
-   SE = 0.0955 (more on estimating this in Ch. 17)
-   Null distribution: $N(0, 0.0955)$
-   We will use significance level $\alpha=0.05$
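
------------------------------------------------------------------------

A sketch of where the value 0.0955 comes from, assuming the standard error is estimated by adding the two groups' estimated variances (the full treatment is left to Ch. 17). Here $n_T = 40$ and $n_C = 50$ are the group sizes, and $\hat{p}_T = 0.35$ and $\hat{p}_C = 0.22$ are the sample survival proportions:

$$
SE = \sqrt{\frac{\hat{p}_T(1-\hat{p}_T)}{n_T} + \frac{\hat{p}_C(1-\hat{p}_C)}{n_C}}
   = \sqrt{\frac{0.35 \times 0.65}{40} + \frac{0.22 \times 0.78}{50}}
   \approx 0.0955
$$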

------------------------------------------------------------------------

```{r}
#| include: true
#| echo: false
#| fig-cap: "Density function for N(0, 0.0955) with tails shaded beyond 0.13."

normTail(m = 0, s = 0.0955, L = -0.13, U = 0.13)
```

-   The p-value is the probability, assuming $H_0$ is true, that the statistic is at least as extreme as the observed value (0.13)
-   2-sided test: at least as extreme means $\geq 0.13$ or $\leq -0.13$

------------------------------------------------------------------------

-   Since the null distribution is symmetric, the p-value for a 2-sided test is twice the area in one tail

```{r}
#| include: true
#| echo: true

2 * pnorm(-0.13, mean = 0, sd = 0.0955)
```

-   Since the p-value is greater than 0.05, we do not reject the null hypothesis
-   It is plausible that there is no difference in survival rates between the two groups
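
------------------------------------------------------------------------

The same decision can be reached through a z-score. The sketch below standardizes the observed difference (about 1.36 standard errors from 0) and converts it to a p-value of roughly 0.17. The variable names are illustrative, and the one-sided p-value is included only for comparison with the one-sided alternative shown earlier.

```{r}
#| echo: true

obs_diff <- 0.13    # observed difference in proportions
se       <- 0.0955  # standard error of the difference
alpha    <- 0.05    # significance level

# Standardize: how many standard errors the statistic is from 0
z <- (obs_diff - 0) / se

# Two-sided p-value: area in both tails beyond |z|
p_two_sided <- 2 * pnorm(-abs(z))

# One-sided p-value (H_A: p_T - p_C > 0), for comparison only
p_one_sided <- pnorm(z, lower.tail = FALSE)

c(z = z, p_two_sided = p_two_sided, p_one_sided = p_one_sided)

# Decision rule: reject H_0 only when the p-value falls below alpha
p_two_sided < alpha
```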

## Decision Errors

-   There are two types of errors we can make in a hypothesis test
-   **Type 1 Error** occurs if we reject the null hypothesis when it is actually true (false alarm)
-   **Type 2 Error** occurs if we fail to reject the null hypothesis even though it is false (missed opportunity)

## Example of Type 1 and Type 2 Errors

-   In the blood thinner study, we did not reject the null hypothesis: we found no significant difference in survival rates. If the survival rates of the two groups actually differ, we committed a Type 2 Error

-   If the p-value had been lower than the significance level, we would have rejected the null hypothesis. If the survival rates were actually the same, that would have been a Type 1 Error

-   Note that once a decision is made (based on the p-value or z-score), only one type of error is possible: Type 1 if we rejected the null hypothesis, Type 2 if we did not

## Summary of Type 1 and Type 2 Errors

![](https://faculty.hope.edu/bekmetjev/215/Slides/randomsamplesvsassign.png)

## Controlling Type 1 Errors

-   Type 1 Error is typically considered more severe
-   The probability of making a type 1 error (assuming the null is true) is equal to the significance level
-   Can reduce $\alpha$ to make type 1 errors less likely
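
------------------------------------------------------------------------

A small simulation can illustrate why the type 1 error rate matches $\alpha$. This is a sketch under the assumption that the statistic follows the null distribution $N(0, 0.0955)$ exactly; the seed and the number of repetitions are arbitrary choices.

```{r}
#| echo: true

set.seed(215)     # arbitrary seed, for reproducibility only
se    <- 0.0955   # standard error from the CPR study
alpha <- 0.05     # significance level
reps  <- 100000   # number of simulated studies (arbitrary)

# Simulate the statistic in a world where H_0 is true: N(0, se)
null_stats <- rnorm(reps, mean = 0, sd = se)

# Two-sided p-value for each simulated study
p_values <- 2 * pnorm(-abs(null_stats), mean = 0, sd = se)

# Share of simulated studies that (wrongly) reject H_0
type1_rate <- mean(p_values < alpha)
type1_rate
```

The rejection rate should land close to 0.05. Rerunning with a smaller `alpha` lowers it accordingly.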

## Controlling Type 2 Errors

-   There is a trade-off between the two types of errors
-   All else being equal, decreasing the probability of a type 1 error increases the probability of a type 2 error
-   **Power** is the probability of rejecting the null hypothesis if the alternative is true
-   $\Pr[\text{Type 2 Error}] = 1 - \text{Power}$
-   Higher power reduces the chance of making a type 2 error
-   Power depends on, among other things, effect size (larger effects are easier to detect) and sample size (larger samples give more power)
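
------------------------------------------------------------------------

These relationships can be explored numerically. The sketch below uses a hypothetical helper, `power_two_sided()`, which is not from any package. It assumes the statistic is normal with mean equal to the true difference, and that the standard error stays the same whatever the true difference is, which is a simplification. The true differences plugged in are illustrative assumptions, not estimates.

```{r}
#| echo: true

# Hypothetical helper: power of a two-sided z-test when the statistic
# is N(true_diff, se). Assumes se does not change with true_diff.
power_two_sided <- function(true_diff, se, alpha = 0.05) {
  cutoff <- qnorm(1 - alpha / 2) * se   # reject H_0 when |statistic| > cutoff
  lower <- pnorm(-cutoff, mean = true_diff, sd = se)
  upper <- pnorm(cutoff, mean = true_diff, sd = se, lower.tail = FALSE)
  lower + upper                         # total probability of rejecting H_0
}

se <- 0.0955  # standard error from the CPR study

power_two_sided(0.13, se)                # assumed true difference of 0.13
power_two_sided(0.26, se)                # larger effect: more power
power_two_sided(0.13, se / 2)            # smaller SE (e.g. 4x the sample): more power
power_two_sided(0.13, se, alpha = 0.01)  # stricter alpha: less power
1 - power_two_sided(0.13, se)            # Pr[Type 2 Error] in the first scenario
```

When the true difference is set to 0, the function returns $\alpha$, which ties power back to the type 1 error rate.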