Next: Contingency tables and Chi-squared Up: Data Collection and Statistical Previous: Errors in hypothesis testing:   Contents

## t-tests and the t-distribution

• Problem with previous examples
• I played a trick on you: we don't usually know the population standard deviation $\sigma$.

• Therefore, we can't compute the z-scores we used in our hypothesis test.
• Normally, we replace $\sigma$ with our sample estimate $s$.
• This is problematic because $s$ is itself an estimate, and thus there will be greater uncertainty in our test.
• In fact, when we replace $\sigma$ with $s$, our test statistic $\frac{\bar{x} - \mu_0}{s/\sqrt{n}}$ will no longer have the standard normal distribution (draw a picture to remind them), but instead comes from what is known as the t-distribution.

• The t-distribution
• A t-distribution has one parameter: degrees of freedom, given by $df = n - 1$.
• A t-distribution looks similar to the standard normal distribution except that it has fatter tails and a smaller peak. This difference reflects the higher degree of uncertainty.
• As the degrees of freedom increase (the sample size increases), our t-distribution looks more and more like the standard normal.
• (Pass out a copy of the table from the back of the book.) This table gives the critical values for upper-tail probabilities. You see that the necessary critical values shrink as $n$ increases. When $n$ is large, the critical values for the t-distribution are very close to the critical z-values from a standard normal table (show the example of an upper-tail probability of .025, where $z^* = 1.96$).

| df   | t*    |
|------|-------|
| 20   | 2.086 |
| 50   | 2.009 |
| 100  | 1.984 |
| 1000 | 1.962 |
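These table values can be checked numerically. The sketch below uses only the Python standard library; the density formula is the standard t-density, and the function names are my own:

```python
import math

def t_pdf(x, df):
    """Density of the t-distribution with df degrees of freedom."""
    # lgamma avoids overflow for large df (math.gamma overflows past df ~ 340)
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2))
    return c / math.sqrt(df * math.pi) * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(x, df, steps=2000):
    """P(T > x) via composite Simpson's rule on [0, x]; T is symmetric about 0."""
    h = x / steps
    total = t_pdf(0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

def t_critical(alpha, df):
    """Bisect for the value whose upper-tail probability equals alpha."""
    lo, hi = 0.0, 50.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_upper_tail(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Reproduce the table: critical values for an upper-tail probability of .025
for df in (20, 50, 100, 1000):
    print(df, round(t_critical(0.025, df), 3))
```

In practice one would just call something like `scipy.stats.t.ppf(0.975, df)`; the hand-rolled version only shows where the table numbers come from.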

• Using $s$
• We can substitute $s$ for $\sigma$ in our equation for confidence intervals.

Before: $\bar{x} \pm z^* \frac{\sigma}{\sqrt{n}}$

Now: $\bar{x} \pm t^* \frac{s}{\sqrt{n}}$

With 20 degrees of freedom, we would use $t^* = 2.086$ for a 95% confidence interval.
• We can use the t-distribution for our one-sample hypothesis tests, where $t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$.
• For small $n$, this will still not be accurate if the underlying population distribution is non-normal.
• However, it turns out that the t-distribution is fairly robust to violations of the normality assumptions, meaning it gives a good approximation even for small sample sizes. Use the following rule of thumb:
• $n < 15$: if data look close to normal with no outliers, use the t-test
• $15 \le n < 40$: use the t-test except in cases of big outliers or extreme skewness
• $n \ge 40$: use the t-test and you should be approximately accurate.
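A minimal sketch of the one-sample t-test and confidence interval, using made-up data and the table value $t^* = 2.093$ for 19 degrees of freedom (all names and numbers below are illustrative):

```python
import math
from statistics import mean, stdev

def one_sample_t(data, mu0):
    """t = (xbar - mu0) / (s / sqrt(n)); returns (t, degrees of freedom)."""
    n = len(data)
    xbar, s = mean(data), stdev(data)   # stdev uses the n - 1 denominator
    return (xbar - mu0) / (s / math.sqrt(n)), n - 1

# Hypothetical sample of n = 20 observations; test H0: mu = 50
data = [52.1, 48.3, 55.0, 51.2, 49.8, 53.4, 50.9, 54.1, 47.6, 52.8,
        51.5, 49.2, 53.9, 50.1, 52.4, 48.9, 54.6, 51.8, 50.5, 53.1]
t, df = one_sample_t(data, 50)

# 95% confidence interval: xbar +/- t* s/sqrt(n), with t* = 2.093 for df = 19
tstar = 2.093
xbar, s, n = mean(data), stdev(data), len(data)
lo, hi = xbar - tstar * s / math.sqrt(n), xbar + tstar * s / math.sqrt(n)
print(f"t = {t:.3f} on {df} df; 95% CI = ({lo:.2f}, {hi:.2f})")
```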

• The two-sample t-test
• So far we have compared one sample to a hypothesized population mean. More often what we are interested in is comparing the means of two groups for which we only have samples.
• We call this the two-sample t-test.
• $\bar{x}_1$ = mean for the first group
• $\bar{x}_2$ = mean for the second group
• We are interested in $\mu_1 - \mu_2$, i.e. the difference between the groups.
• $H_0: \mu_1 = \mu_2$, means are the same
• $H_a: \mu_1 \ne \mu_2$, two-sided alternative
• $H_a: \mu_1 > \mu_2$, a one-sided alternative
• Our observed difference is $\bar{x}_1 - \bar{x}_2$.

• Our test statistic is:

$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$
• We are estimating two standard deviations here, so it turns out that our test statistic is only approximately distributed as a $t$. It is fairly robust, particularly if the groups are equal in size.
• What should the degrees of freedom be? Two options:
• Take the smaller of $n_1 - 1$ or $n_2 - 1$. This will give you a conservative estimate of the degrees of freedom.
• Use a complicated formula, which has little intuition:

$df = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{1}{n_1 - 1}\left(\frac{s_1^2}{n_1}\right)^2 + \frac{1}{n_2 - 1}\left(\frac{s_2^2}{n_2}\right)^2}$

• This formula gives very precise results for $df$. It is always at least as large as the smaller sample size minus one, and never larger than $n_1 + n_2 - 2$.
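Both the unpooled statistic and the complicated degrees-of-freedom formula (the Welch-Satterthwaite approximation) are easy to compute from summary statistics. A sketch with hypothetical numbers:

```python
import math

def welch_t(xbar1, s1, n1, xbar2, s2, n2):
    """Unpooled two-sample t statistic and Welch-Satterthwaite df."""
    v1, v2 = s1**2 / n1, s2**2 / n2
    t = (xbar1 - xbar2) / math.sqrt(v1 + v2)
    df = (v1 + v2)**2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, df

# Hypothetical summary statistics for two groups
t, df = welch_t(31.4, 14.0, 200, 29.8, 13.0, 600)
print(f"unpooled t = {t:.3f}, df = {df:.1f}")
# Note df falls between min(n1, n2) - 1 and n1 + n2 - 2
```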
• If you assume that $\sigma_1 = \sigma_2$, then this equation simplifies somewhat.
• Calculate a pooled estimate of the variance - just a weighted average:

$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$
• Then the variance of the difference between means simplifies:

$s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)$
• So the t-statistic is:

$t = \frac{\bar{x}_1 - \bar{x}_2}{s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$
• The degrees of freedom are $n_1 + n_2 - 2$.
• When we work with proportions, the pooled variance rule always holds (under $H_0$ both groups share the same $p$, and hence the same variance $p(1-p)$). When we work with continuous variables, however, it is an assumption.
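A matching sketch for the pooled version, assuming $\sigma_1 = \sigma_2$ (same hypothetical summary statistics as a moment ago; the function name is mine):

```python
import math

def pooled_t(xbar1, s1, n1, xbar2, s2, n2):
    """Two-sample t statistic assuming equal population variances."""
    # Pooled variance: a weighted average of the two sample variances
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    t = (xbar1 - xbar2) / math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2

# Hypothetical summary statistics for two groups
t, df = pooled_t(31.4, 14.0, 200, 29.8, 13.0, 600)
print(f"pooled t = {t:.3f}, df = {df}")
```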

• Example: let's take the observed age difference between survivors and non-survivors on the Titanic. Earlier in the semester, we calculated that survivors of the Titanic were 1.63 years older on average than non-survivors. Taking this particular sinking as one of many possible sinkings, is this difference due to random chance or is there a real age advantage?

• Calculate the numerator of the test statistic:

$\bar{x}_1 - \bar{x}_2 = 1.63$

• Calculate the denominator of the test statistic, plugging in each group's sample standard deviation and size:

$\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$

• Calculate $t$:

$t = \frac{1.63}{\sqrt{s_1^2/n_1 + s_2^2/n_2}}$
• Let's use the conservative estimate of the degrees of freedom and a two-sided test. What is the critical value?
• Now let's pool our estimate of the standard deviation:

$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$

• So our test statistic becomes:

$t = \frac{1.63}{s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$
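The group standard deviations and sample sizes from class are not reproduced above, so the following sketch substitutes hypothetical values for them; only the observed 1.63-year difference comes from the notes:

```python
import math

# Observed difference in mean age (from the notes); the standard
# deviations and group sizes below are hypothetical stand-ins
diff = 1.63
s1, n1 = 14.2, 450   # survivors (hypothetical)
s2, n2 = 13.5, 800   # non-survivors (hypothetical)

# Unpooled test, with the conservative df = min(n1, n2) - 1
se = math.sqrt(s1**2 / n1 + s2**2 / n2)
t = diff / se
print(f"unpooled: t = {t:.2f}, conservative df = {min(n1, n2) - 1}")

# Pooled test, df = n1 + n2 - 2
sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
t_pooled = diff / math.sqrt(sp2 * (1 / n1 + 1 / n2))
print(f"pooled:   t = {t_pooled:.2f}, df = {n1 + n2 - 2}")

# Compare |t| to the two-sided critical value (about 1.96-1.97 at
# these degrees of freedom) to decide whether to reject H0
```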
Aaron 2005-12-20