Chapter 2 The big picture

On approach

Whether reviewing or preparing to write a manuscript, it pays to consider the following five questions.

  1. What is the research question?
  2. Why do I (or, why should anyone else) care about it?
  3. What methodology is adopted?
  4. What is the paper’s key estimating equation?
  5. What are the main findings?

Note: This layout also works for your job interview. It’s also a fantastic way to organize your thoughts in preparation for informal questions like “What are you working on these days?”

Note: I would argue that property rights are only established at the third of these. A research question with no thought given to how you would answer it is not enough to start advertising it everywhere, in my opinion.


The role of assumption

Establishing causation is a primary challenge facing empirical researchers in the social sciences. Throughout this course, we will identify a number of errors and biases that undermine causality, including omitted variable bias, selection bias, and measurement error. Causality can often be confidently rejected, but we do not have a test available that proves causality.

As you read through the articles for class, think about the types of errors and biases that undermine causality (some outlined below) and decide whether a causal interpretation can be drawn from the findings. In other words, actively assess whether or not you believe the argument and the results presented. Remember… a goal of this course is to prepare you to be critical consumers of empirical research.


With any causal estimator, the result is assumed. It is always by assumption that we make inference. This has important implications.

  1. Notwithstanding sloppy analysis, you cannot be wrong until you make inference. (This should be comforting.)
  2. You can only be wrong insofar as your inference does not follow from your assumptions, stated or implied.

If you can clearly articulate the assumptions required in order to interpret \({\hat \beta}\) as the causal effect of \(X\) on \(y\)… the expected change in \(y\) that would result from a one-unit change in \(X\), all else equal… then you cannot be wrong.
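As an illustrative sketch of what "clearly articulating the assumptions" can look like, consider the simplest linear case. The symbols \(\alpha\) and \(\varepsilon\) are introduced here for illustration only, and this is one common formulation rather than the only one:

\[
y = \alpha + \beta X + \varepsilon, \qquad E[\varepsilon \mid X] = 0 .
\]

Under this assumption, \(E[y \mid X] = \alpha + \beta X\), so a one-unit change in \(X\) shifts the expected value of \(y\) by \(\beta\). Anyone who rejects the causal reading of \({\hat \beta}\) in this setting is, in effect, declining to assume \(E[\varepsilon \mid X] = 0\).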


That doesn’t mean that your advisor, or your committee, or your referees, or even your mother will believe the inference you are making to be reasonable.

However, their objection… your objection to the inference of others… should be in the form “I am unwilling to make the assumption required in order to agree that a one-unit change in \(X\) would induce a change in \(y\) of \({\hat \beta}\).”

In general… any identification strategy is “like” random assignment when the assumptions that justify that strategy hold in the data.

That is what you rely on as an applied econometrician.


Two dimensions to validity

We will summarize these later (see the section on Manuscript Review), along with a collection of threats to each, but consider two notions of validity up front.

Internal validity: Should the study be believed?

If the study is not claiming to have identified a causal implication, the nature of these questions changes a bit. That said, let us assume that it is a causal relationship that is claimed.

A study has internal validity if one can conclude from statistical results that one variable causally affects another variable within the context of the study.

  • Is the model accurately measuring what it claims to measure?
  • Should the study be believed?

Much of our time will be spent considering the establishment of internal validity.


External validity: Can the study’s results be reasonably generalized?

First, note that without internal validity, there is little reason to extend the result more broadly. Given internal validity, a study has external validity if its results can be generalized to different individuals, contexts, and outcomes. In other words, it is the extent to which the results can or should be applied to the outside world. There are three general threats to external validity: people, place, and time.


So, what to do when randomization is not possible?

Randomization is clearly the gold standard. Why is this?

  • Because the assumption required to make causal inference is easily believed.
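A small simulation can make this concrete. The sketch below is illustrative only: the data-generating process, the unobserved "ability" variable, and all parameter values are assumptions invented for the example. It compares a simple difference in means when treatment is assigned by coin flip with the same estimator when individuals select into treatment on an unobserved characteristic.

```python
import random
import statistics


def simulate(randomize, n=20000, true_effect=2.0, seed=0):
    """Return the treated-minus-control difference in mean outcomes."""
    rng = random.Random(seed)
    treated, control = [], []
    for _ in range(n):
        # Unobserved characteristic that also drives the outcome (hypothetical).
        ability = rng.gauss(0, 1)
        if randomize:
            # Coin-flip assignment: treatment is independent of ability.
            d = rng.random() < 0.5
        else:
            # Self-selection: higher-ability individuals are more likely treated.
            d = ability + rng.gauss(0, 1) > 0
        y = true_effect * d + 3.0 * ability + rng.gauss(0, 1)
        (treated if d else control).append(y)
    return statistics.mean(treated) - statistics.mean(control)


diff_randomized = simulate(randomize=True)
diff_selected = simulate(randomize=False)
print(f"difference in means, randomized:    {diff_randomized:.2f}")
print(f"difference in means, self-selected: {diff_selected:.2f}")
```

In this setup the randomized comparison lands close to the true effect of 2.0, while the self-selected comparison overstates it, because the treated group also has higher ability on average. The estimator is identical in both cases; only the believability of the assumption behind it changes.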

What, then, do we do when the causal parameter has not been identified through direct randomization … when the treatment variable of interest is potentially endogenous?

  1. Selection on observables: when treatment and control groups differ from each other only on observable characteristics.
    • Difference estimators
    • OLS
    • Matching
  2. Selection on unobservables: when treatment and control groups differ from each other on unobservable characteristics.
    • When treatment and controls are observed before and after treatment
      • difference-in-differences, panel estimators, synthetic control, event studies
    • When the selection mechanism is known
      • regression-discontinuity designs
    • When an exogenous variable induces variation in treatment
      • instrumental variables
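To give a flavor of one entry in this list, here is a minimal sketch of the two-group, two-period difference-in-differences calculation. The outcome values are hypothetical numbers made up for illustration, and the function name `did` is not from any library. The causal reading of the result rests on an assumption that the code cannot check: absent treatment, the treated group's outcome would have changed by the same amount as the control group's.

```python
from statistics import mean

# Hypothetical outcomes, invented for illustration only.
outcomes = {
    ("treated", "before"): [10.0, 12.0, 11.0],
    ("treated", "after"): [15.0, 17.0, 16.0],
    ("control", "before"): [8.0, 9.0, 10.0],
    ("control", "after"): [10.0, 11.0, 12.0],
}


def did(data):
    """Change in the treated group's mean minus change in the control group's mean."""
    change_treated = mean(data[("treated", "after")]) - mean(data[("treated", "before")])
    change_control = mean(data[("control", "after")]) - mean(data[("control", "before")])
    return change_treated - change_control


estimate = did(outcomes)
print(f"difference-in-differences estimate: {estimate:.1f}")
```

Here the treated group's mean rises by 5 and the control group's by 2, so the estimate is 3. The control group's change stands in for what would have happened to the treated group without treatment, which is exactly the assumption a skeptical reader may be unwilling to make.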