--- title: "Regression and Other Stories: Sesame street" author: "Andrew Gelman, Jennifer Hill, Aki Vehtari" date: "`r format(Sys.Date())`" output: html_document: theme: readable toc: true toc_depth: 2 toc_float: true code_download: true --- Causal analysis of Sesame Street experiment. See Chapters 18 and 21 in Regression and Other Stories. ------------- ```{r setup, include=FALSE} knitr::opts_chunk$set(message=FALSE, error=FALSE, warning=FALSE, comment=NA) ``` #### Load packages ```{r } library("rprojroot") root<-has_file(".ROS-Examples-root")$make_fix_file() library("rstanarm") library("brms") ``` Set random seed for reproducability ```{r } SEED <- 1234 ``` #### Load data ```{r } sesame <- read.csv(root("Sesame/data","sesame.csv")) head(sesame) ``` ## Compliance ```{r } (sesame_tab <- table(sesame[,c('watched','encouraged')])) round(prop.table(sesame_tab, margin=2), digits=2) ``` ## Wald estimator Estimate the intent-to-treat (ITT) effect of the instrument (encouragement) on the treatment (regular watching), that is, percentage of children actually induced to watch Sesame Street by the intervention ```{r } itt_zt <- stan_glm(watched ~ encouraged, data=sesame, seed=SEED, refresh=0) print(itt_zt, digits=2) ``` Estimate the intent-to-treat (ITT) estimate on the outcome (post-treatment letter identification) ```{r } itt_zy <- stan_glm(postlet ~ encouraged, data=sesame, refresh=0) print(itt_zy, digits=1) ``` Calculate Wald estimate, ie the ratio of the above two estimates ```{r } wald_est <- coef(itt_zy)["encouraged"] / coef(itt_zt)["encouraged"] round(wald_est, digits=1) ``` ## Two stage approach #### Predict the "treatment" variable on the randomized instrument The first step is to regress the "treatment" variable---an indicator for regular watching (watched)---on the randomized instrument, encouragement to watch (encouraged). ```{r } fit_2a <- stan_glm(watched ~ encouraged, data=sesame, seed=SEED, refresh=0) print(fit_2a, digits=2) summary(fit_2a$fitted, digits=2) sesame$watched_hat <- fit_2a$fitted ``` Then we plug predicted values of watched into the equation predicting the letter recognition outcome. ```{r } fit_2b <- stan_glm(postlet ~ watched_hat, data=sesame, seed=SEED, refresh=0) print(fit_2b, digits = 1) ``` ## Two stage approach with instrumental variables Two stage approach with adjusting for covariates in an instrumental variables framework. #### Predict the "treatment" variable on the randomized instrument and pre-treatment variables. The first step is to regress the "treatment" variable on the randomized instrument, encouragement to watch (encouraged) and pre-treatment variables. ```{r } fit_3a <- stan_glm(watched ~ encouraged + prelet + as.factor(site) + setting, data=sesame, seed=SEED, refresh=0) print(fit_3a, digits=2) summary(fit_3a$fitted, digits=2) ``` Then we plug predicted values of watched into the equation predicting the letter recognition outcome. ```{r } watched_hat_3 <- fit_3a$fitted fit_3b <- stan_glm(postlet ~ watched_hat_3 + prelet + as.factor(site) + setting, data=sesame, seed=SEED, refresh=0) print(fit_3b, digits = 1) ``` #### Estimate the standard errors Use the predictor matrix from this second-stage regression. ```{r } X_adj <- X <- model.matrix(fit_3b) X_adj[,"watched_hat_3"] <- sesame$watched n <- nrow(X) p <- ncol(X) ``` #### Compute the standard deviation of the adjusted residuals ```{r } RMSE1 <- sqrt(sum((sesame$postlet - X %*% coef(fit_3b))^2)/(n-p)) RMSE2 <- sqrt(sum((sesame$postlet - X_adj %*% coef(fit_3b))^2)/(n-p)) se_adj <- se(fit_3b)["watched_hat_3"] * sqrt(RMSE1 / RMSE2) print(se_adj, digits=2) ``` ## Two-stage approach with brms #### Predict the "treatment" variable on the randomized instrument ```{r } f1 <- bf(watched ~ encour) f2 <- bf(postlet ~ watched) ``` ```{r results='hide'} IV_brm_a <- brm(f1 + f2, data=sesame, seed=SEED) ``` ```{r } print(IV_brm_a, digits=1) ``` #### Incorporate other pre-treatment variables as controls ```{r } f1 <- bf(watched ~ encour + prelet + setting + factor(site)) f2 <- bf(postlet ~ watched + prelet + setting + factor(site)) ``` ```{r results='hide'} IV_brm_b <- brm(f1 + f2, data=sesame, seed=SEED) ``` ```{r } print(IV_brm_b, digits=1) ```