- So far we know how to use both continuous and categorical variables to predict continuous variables. What if we want to predict categorical variables, i.e. use them as the dependent variable? In sociology, much of what we care about is categorical, not continuous.
- Let's say we are interested in explaining why individuals fall into different categories of a dichotomous variable.
- Let's take our Titanic example. Let $y_i$ equal 1 if person $i$ survived the Titanic and 0 if the person did not. Let's look at how survivorship varied as a function of the fare (in British pounds) that a person paid:

| Variable | Coefficient |
|---|---|
| Constant | 0.3058 (19.27) |
| Fare paid (in British pounds) | 0.0023 (9.10) |

- What do the intercept and slope mean in this particular case? The slope is the change in the probability of survival associated with paying one more pound for your fare. In this case, for every extra pound paid, your probability of surviving goes up by about 0.23 percentage points. Those who paid no fare are estimated to have a probability of survivorship of 30.58%. This is called the **Linear Probability Model**.
- The IQR for fare paid was approximately 17 pounds and the maximum fare paid was 512 pounds, so it seems that fare had a fairly sizeable effect on survivorship, which is not surprising given that fare is a good proxy for the social class of passengers.
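
To make the interpretation concrete, here is a minimal sketch that plugs the reported coefficients into the fitted line. The function name `lpm_probability` is made up for illustration; only the two coefficients and the 17-pound IQR come from the notes.

```python
# Coefficients as reported above.
INTERCEPT = 0.3058
SLOPE = 0.0023  # change in survival probability per extra pound of fare

def lpm_probability(fare):
    """Predicted survival probability for a given fare under the LPM."""
    return INTERCEPT + SLOPE * fare

# Predicted probability for a passenger who paid no fare.
print(round(lpm_probability(0), 4))   # 0.3058

# Change in predicted probability across the interquartile range (about 17 pounds).
print(round(SLOPE * 17, 4))           # 0.0391
```

Moving across the interquartile range of fares therefore shifts the predicted probability of survival by roughly four percentage points.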
- There are two problems with the linear probability model.
- The i.i.d. assumption is violated. The dependent variable $y_i$ is distributed as a Bernoulli variable. The variance of a Bernoulli variable depends on the true probability of a "yes." Thus the variance of the error term will not be constant. In technical terms, this is the problem of **heteroskedasticity**.
- The linear probability model can produce fitted values that are outside the range of [0,1]. Take the passenger who paid the highest fare: the predicted probability of survival is $0.3058 + 0.0023 \times 512 \approx 1.48$, which is not a valid probability.
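
A quick way to see the first problem is to evaluate the Bernoulli variance $p(1-p)$ at a few probabilities. This is only an illustrative sketch; the helper name `bernoulli_variance` and the probabilities chosen are not from the notes.

```python
def bernoulli_variance(p):
    """Variance of a 0/1 variable whose probability of a 1 is p."""
    return p * (1 - p)

# The variance peaks at p = 0.5 and shrinks toward 0 as p approaches 0 or 1,
# so observations with different true probabilities have different error variances.
for p in (0.1, 0.3, 0.5, 0.9):
    print(p, round(bernoulli_variance(p), 2))
```

Because the true probability changes with fare, the error variance changes from passenger to passenger.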

- The first problem can be dealt with using the technique of **generalized least squares** (GLS).
- Generalized least squares allows for a weighting matrix to be included in the estimation of the $\beta$'s.

Before, we had the OLS estimator $\hat{\beta} = (X'X)^{-1}X'y$. Now we have $\hat{\beta} = (X'WX)^{-1}X'Wy$, where $W$ is the **weighting matrix**. It has the effect of transforming the $y$ and $x$ variables. If we put the right values into our weighting matrix, we will recover our i.i.d. assumptions, even in the face of heteroskedasticity and autocorrelation.
- The diagonals of the weighting matrix tell us how to adjust our variance to reflect heteroskedasticity. The off-diagonal cells tell us about the autocorrelation between error terms.
- Let's just focus on the heteroskedasticity part for now; time-series models make use of GLS to estimate autocorrelation.
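
The role of the weighting matrix can be sketched in a few lines of NumPy. Writing the weighting matrix as `W` (a notation assumed here), the estimator is $(X'WX)^{-1}X'Wy$. The function name `gls` and the tiny data set are made up for illustration.

```python
import numpy as np

def gls(X, y, W):
    """GLS estimate (X'WX)^{-1} X'Wy for a given weighting matrix W."""
    XtW = X.T @ W
    return np.linalg.solve(XtW @ X, XtW @ y)

# Tiny made-up data set: a column of ones plus one predictor.
X = np.column_stack([np.ones(5), np.array([0.0, 1.0, 2.0, 3.0, 4.0])])
y = np.array([0.0, 0.0, 1.0, 0.0, 1.0])

# With the identity as weighting matrix, GLS reduces to OLS.
beta_ols = gls(X, y, np.eye(5))

# A diagonal weighting matrix: a larger entry counts that observation more heavily.
W = np.diag([1.0, 1.0, 4.0, 1.0, 1.0])
beta_weighted = gls(X, y, W)

print(beta_ols)
print(beta_weighted)
```

In this sketch only the diagonal of `W` is used, which matches the heteroskedasticity-only case; nonzero off-diagonal entries would be needed to model autocorrelation.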
- Because we only have sample data, we don't know exactly what values to put into our weighting matrix. We have to estimate them from our data. When we do that, we call it **feasible generalized least squares** (FGLS).
- The values to put into the diagonal should be the inverse of the variance of our error terms (let's take an example).
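- In our binary case above, the variance of $\epsilon_i$ is proportional to $p_i(1-p_i)$, where $p_i$ is the underlying true probability for the $i$th observation.
- If we put the inverse of these values into the diagonal of our weighting matrix, we will recover our i.i.d. assumption.

$$w_{ii} = \frac{1}{p_i(1-p_i)} \qquad (1)$$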
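
One way to see why inverse-variance weights work is to divide each observation by the standard deviation of its error. The sketch below assumes a two-variable model with $x_i$ the fare, $\epsilon_i$ the error term, and $p_i$ the true probability of a 1 for observation $i$; this notation is chosen here for illustration.

$$
\frac{y_i}{\sqrt{p_i(1-p_i)}} = \beta_0 \frac{1}{\sqrt{p_i(1-p_i)}} + \beta_1 \frac{x_i}{\sqrt{p_i(1-p_i)}} + \frac{\epsilon_i}{\sqrt{p_i(1-p_i)}},
\qquad
\mathrm{Var}\!\left(\frac{\epsilon_i}{\sqrt{p_i(1-p_i)}}\right) = \frac{p_i(1-p_i)}{p_i(1-p_i)} = 1
$$

The transformed errors all have the same variance, so OLS on the transformed variables gives the same estimates as least squares with weights $1/(p_i(1-p_i))$ on the original variables.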

- Of course, we don't know what the true $p_i$ is for each observation, so we have to use an estimate. How do we get it? Run a linear probability model and use the resulting $\hat{y}_i$ in place of $p_i$. Then we have FGLS.
- FGLS can usually be improved by iterating the result, at which point it becomes **Iteratively Reweighted Least Squares** (IRLS).
  1. Run a linear probability model using OLS. Get the predicted values $\hat{y}_i$ from this run.
  2. Use $\hat{y}_i$ in the weighting matrix to run an FGLS. Get the predicted values $\hat{y}_i$ from this run.
  3. Use the updated $\hat{y}_i$ in the weighting matrix of another FGLS. Get the predicted values $\hat{y}_i$ from this run.
  4. Repeat step 3 until the estimates of $\beta$ no longer change very much.
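
The steps above can be sketched as follows. This is a minimal illustration, not the code used for these notes: the function names, the tolerance, and the simulated data (which are not the Titanic data) are all assumptions. The sketch simply stops with an error if a fitted value leaves the open interval (0, 1), since the weights are then undefined or negative.

```python
import numpy as np

def wls(X, y, w):
    """Weighted least squares with one weight per observation (diagonal W)."""
    Xw = X * w[:, None]                      # scale each row of X by its weight
    return np.linalg.solve(Xw.T @ X, Xw.T @ y)

def irls_lpm(X, y, tol=1e-8, max_iter=100):
    """Iterate FGLS on a linear probability model until the betas settle."""
    beta = wls(X, y, np.ones(len(y)))        # step 1: plain OLS
    for _ in range(max_iter):
        p_hat = X @ beta                     # fitted probabilities
        if np.any(p_hat <= 0) or np.any(p_hat >= 1):
            raise ValueError("fitted value outside (0, 1); weights undefined")
        w = 1.0 / (p_hat * (1.0 - p_hat))    # inverse of the estimated variance
        new_beta = wls(X, y, w)              # steps 2-3: FGLS with updated weights
        if np.max(np.abs(new_beta - beta)) < tol:
            return new_beta                  # step 4: estimates stopped changing
        beta = new_beta
    return beta

# Simulated data (hypothetical): true probability of a 1 is 0.3 + 0.4 * x.
rng = np.random.default_rng(0)
x = rng.uniform(0, 1, size=2000)
y = (rng.uniform(0, 1, size=2000) < 0.3 + 0.4 * x).astype(float)
X = np.column_stack([np.ones_like(x), x])

beta_irls = irls_lpm(X, y)
print(beta_irls)
```

The convergence check compares successive coefficient vectors, which mirrors the "no longer change very much" stopping rule in step 4.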

- Let's try it out on our Titanic data:

| Iteration | Intercept | Slope |
|---|---|---|
| 1 | 0.30588 | 0.00229 |
| 2 | 0.31292 | 0.00186 |
| 3 | 0.3074 | 0.00211 |
| 4 | 0.30987 | 0.00199 |
| 5 | 0.30857 | 0.00205 |
| 6 | 0.30921 | 0.00202 |
| 7 | 0.30888 | 0.00204 |
| 8 | 0.30905 | 0.00203 |
| 9 | 0.30896 | 0.00203 |
| 10 | 0.30901 | 0.00203 |
| 11 | 0.30898 | 0.00203 |
| 12 | 0.30899 | 0.00203 |
| 13 | 0.30899 | 0.00203 |
| 14 | 0.30899 | 0.00203 |
| 15 | 0.30899 | 0.00203 |

- You can still run into problems with FGLS if your estimated values of $\hat{y}_i$ extend beyond the range of [0,1], because this can lead to negative weights. In this example, I had to "hack" a few observations that were outside the range by substituting the maximum value observed within the range.
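
The substitution described above might look something like the sketch below. The function name and the example fitted values are hypothetical, and the rule for values at or below 0 (use the smallest in-range value) is a mirror-image assumption that the notes do not state.

```python
import numpy as np

def substitute_out_of_range(p_hat):
    """Replace fitted values outside (0, 1) with the nearest in-range extreme."""
    p_hat = np.asarray(p_hat, dtype=float)
    in_range = p_hat[(p_hat > 0) & (p_hat < 1)]
    fixed = p_hat.copy()
    fixed[p_hat >= 1] = in_range.max()   # the substitution described in the notes
    fixed[p_hat <= 0] = in_range.min()   # assumed mirror-image rule for the low side
    return fixed

# Hypothetical fitted values; the last one is above 1.
fitted = np.array([0.31, 0.45, 0.92, 1.48])
print(substitute_out_of_range(fitted))   # the 1.48 is replaced by 0.92
```

After the substitution every value lies strictly between 0 and 1, so the weights $1/(\hat{y}_i(1-\hat{y}_i))$ are all positive.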

- GLS can resolve the problem of error terms which are not i.i.d., but it can still produce fitted values outside the range of the dependent variable. This second problem requires us to move to **generalized linear models**.
