---
title: "Regression and Other Stories: Fake midterm and final exam"
author: "Andrew Gelman, Jennifer Hill, Aki Vehtari"
date: "`r format(Sys.Date())`"
output:
  html_document:
    theme: readable
    toc: true
    toc_depth: 2
    toc_float: true
    code_download: true
---
Fake dataset of 1000 students' scores on a midterm and final
exam. See Chapter 6 in Regression and Other Stories.

-------------


```{r setup, include=FALSE}
knitr::opts_chunk$set(message=FALSE, error=FALSE, warning=FALSE, comment=NA)
# switch this to TRUE to save figures in separate files
savefigs <- FALSE
```

#### Load packages

```{r }
library("rstanarm")
```

#### Simulate fake data

```{r }
n <- 1000
# set the random seed to get reproducible results
# change the seed to experiment with variation due to random noise
set.seed(2243)
true_ability <- rnorm(n, 50, 10)
noise_1 <- rnorm(n, 0, 10)
noise_2 <- rnorm(n, 0, 10)
midterm <- true_ability + noise_1
final <- true_ability + noise_2
exams <- data.frame(midterm, final)
```

#### Linear regression

The option `refresh = 0` supresses the default Stan sampling
progress output. This is useful for small data with fast
computation. For more complex models and bigger data, it can be
useful to see the progress.

```{r }
fit_1 <- stan_glm(final ~ midterm, data=exams, refresh = 0)
print(fit_1, digits=2)
```

#### Plot midterm and final exam scores

```{r eval=FALSE, include=FALSE}
if (savefigs) pdf(here("FakeMidtermFinal/figs","FakeMidtermFinal1.pdf"), height=4, width=4)
```
```{r }
par(mar=c(3, 3, 2, 1), mgp=c(1.7, .5, 0), tck=-.01)
par(pty="s")
plot(midterm, final, xlab="Midterm exam score", ylab="Final exam score", xlim=c(0,100), ylim=c(0,100), xaxs="i", yaxs="i", xaxt="n", yaxt="n", pch=20, cex=.5)
x <- seq(0,100,20)
axis(1, x)
axis(2, x)
for (i in x){
  abline(h=i, col="gray70", lty=2)
  abline(v=i, col="gray70", lty=2)
}
abline(coef(fit_1))
```
```{r eval=FALSE, include=FALSE}
if (savefigs) dev.off()
```