Lecture 9/30

Confidence Intervals (m_y)

Motivation

Let y denote a measurement of interest from some population with mean m_y and standard deviation s_y. Consider the sampling distribution of ybar for n large. For the sake of this motivation, let us consider the interval [L, U] constructed about m_ybar thus:

L: m_ybar - 2s_ybar
U: m_ybar + 2s_ybar

Since we know that, for n large, the sampling distribution of ybar is approximately normal with mean m_ybar = m_y and standard deviation s_ybar = s_y/sqrt(n), L and U define boundaries between which about 95.45% of the ybar's for all possible samples of size n will fall. You may think of these boundaries as dividing the ybar's into those that are relatively close to m_ybar (and so m_y) and those that are relatively distant from m_ybar (and so m_y).
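The 95.45% figure can be checked directly from the standard normal distribution. The short Python sketch below uses only the standard library; the function name area_within is an illustrative choice and not part of the lecture.

```python
from statistics import NormalDist

def area_within(k: float) -> float:
    """Area under the standard normal curve between -k and +k."""
    z = NormalDist()  # standard normal: mean 0, standard deviation 1
    return z.cdf(k) - z.cdf(-k)

# Fraction of sample means within 2 standard deviations (s_ybar) of m_ybar
print(f"{100 * area_within(2):.2f}%")  # prints 95.45%
```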

For a particular sample we would like to know if the sample mean (ybar) is one of those that is close to m_y. Unfortunately we do not know m_y and so we cannot tell whether the sample mean is close to m_y or not. However, we can address this issue indirectly by making use of our understanding of sampling distributions and proceeding thus:

Note: To keep the argument simple we will assume that s_y is known.

Construct an interval [L', U'] as we did above but about ybar:
L': ybar - 2s_ybar
U': ybar + 2s_ybar
Claim that m_y is in this interval. Notice that this claim will be true if our ybar happens to be one of those that is close (as defined above) to m_y.
State that we are 95.45% confident in our claim. We can safely do this since 95.45% of all possible samples will result in a ybar which will make our claim true.
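The claim that about 95.45% of all possible samples lead to a true statement can be illustrated by simulation. The sketch below is only an illustration under stated assumptions: the population is taken to be normal, the values m_y = 10.0 and s_y = 0.5 are made up for the demonstration, and the function name coverage is hypothetical.

```python
import random
from math import sqrt

def coverage(mu: float, sigma: float, n: int, trials: int, seed: int = 0) -> float:
    """Fraction of simulated samples whose interval ybar +/- 2*s_ybar contains mu."""
    rng = random.Random(seed)       # fixed seed so the run is repeatable
    se = sigma / sqrt(n)            # s_ybar, with sigma (s_y) assumed known
    hits = 0
    for _ in range(trials):
        # Draw one sample of size n and compute its mean
        ybar = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        # The claim "m_y is in [L', U']" is true when this holds
        if ybar - 2 * se <= mu <= ybar + 2 * se:
            hits += 1
    return hits / trials

# Assumed (made-up) population values, used for illustration only
print(coverage(mu=10.0, sigma=0.5, n=100, trials=5000))
```

The printed fraction should land close to 0.9545, although it will vary a little with the seed and the number of trials.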

This strategy captures the basic ideas of confidence interval theory. In practice there are two differences: we will construct intervals that allow us to claim whatever confidence we desire, and, since s_y is generally not known, we will estimate it with the standard deviation of our sample, which the formulas below also denote s_y.

Definition

An a% confidence interval for the population parameter m_y is an interval constructed from a sample mean (ybar) within which you expect m_y to be with a% confidence. There are two cases that need to be considered:

n large:
L': ybar - z_a(s_ybar)
U': ybar + z_a(s_ybar)
where s_ybar = s_y/sqrt(n) and z_a is the z value from the standard normal table such that the area between -z_a and z_a is a%.
n small:
L': ybar - t_n-1;(100-a)/2(s_ybar)
U': ybar + t_n-1;(100-a)/2(s_ybar)
where s_ybar = s_y/sqrt(n-1) and t_n-1;(100-a)/2 is the t value from the standard "t" table indexed by n-1 degrees of freedom such that the area to the right of t is (100-a)/2%.
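The two cases can be sketched in Python as follows. This is a minimal sketch, not a definitive implementation: the function names are hypothetical, the z value is computed with the standard library instead of being read from a table, and the t value is passed in by the caller because the Python standard library has no t table. The numbers in the usage lines are made up for illustration.

```python
from math import sqrt
from statistics import NormalDist

def z_value(a: float) -> float:
    """z such that the area between -z and z is a% (a given in percent)."""
    # Area to the left of z is 50% + a/2%, expressed as a proportion
    return NormalDist().inv_cdf(0.5 + a / 200)

def large_n_interval(ybar: float, s_y: float, n: int, a: float):
    """[L', U'] for n large: ybar -/+ z_a * s_y/sqrt(n)."""
    margin = z_value(a) * s_y / sqrt(n)
    return ybar - margin, ybar + margin

def small_n_interval(ybar: float, s_y: float, n: int, t_value: float):
    """[L', U'] for n small: ybar -/+ t * s_y/sqrt(n-1).

    t_value must be looked up in a t table with n-1 degrees of
    freedom and right-tail area (100-a)/2%.
    """
    margin = t_value * s_y / sqrt(n - 1)
    return ybar - margin, ybar + margin

# Illustrative (made-up) inputs
print(large_n_interval(ybar=50.0, s_y=4.0, n=64, a=95))
print(small_n_interval(ybar=50.0, s_y=3.0, n=10, t_value=2.262))  # t for 9 df, 2.5% tail
```

Because z_value is computed to full precision, large-n results may differ in the third decimal place from those obtained with rounded table values.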

Terminology

s_ybar=s_y/sqrt(n) or s_ybar=s_y/sqrt(n-1) is often referred to as the "error" in your estimate of m_y.

z₉₅(s_ybar) or t_n-1;2.5(s_ybar) is often referred to as the "margin of error".

The term "point estimate" refers to the value of the statistic used to estimate the corresponding parameter.

Problems:

Consider the population of disk drives manufactured from a particular production run. You are interested in estimating the mean seek time of this population and select a sample of n=100 drives for examination. You discover that the sample mean is 9.9ms with a standard deviation of 0.5ms.

Construct and interpret a 90% confidence interval for the population mean (m_y).
Construct and interpret a 99% confidence interval for the population mean (m_y).
Let the sample size be 26 instead of 100. Construct and interpret a 90% confidence interval for the population mean (m_y).
Let the sample size be 17 instead of 100. Construct and interpret a 99% confidence interval for the population mean (m_y).

Solutions:

Since s_y=0.5 and n is large then s_ybar=s_y/sqrt(n)=0.5/10=0.05. Also, z_a=z₉₀=1.65, hence the 90% confidence interval for m_y is [9.9-1.65(0.05), 9.9+1.65(0.05)]=[9.8175, 9.9825]. Hence, you are 90% confident that the mean seek time for the population is between 9.8175ms and 9.9825ms.
In this case s_ybar=0.05 as above but z_a=z₉₉=2.58, hence the 99% confidence interval for m_y is [9.9-2.58(0.05), 9.9+2.58(0.05)]=[9.771, 10.029]. Hence, you are 99% confident that the mean seek time for the population is between 9.771ms and 10.029ms.
Since n is small, s_ybar=s_y/sqrt(n-1)=0.5/5=0.1. Also, t_n-1;(100-a)/2=t_25;5=1.71, hence the 90% confidence interval for m_y is [9.9-1.71(0.1), 9.9+1.71(0.1)]=[9.729, 10.071]. Hence, you are 90% confident that the mean seek time for the population is between 9.729ms and 10.071ms.
Since n is small, s_ybar=s_y/sqrt(n-1)=0.5/4=0.125. Also, t_n-1;(100-a)/2=t_16;0.5=2.92, hence the 99% confidence interval for m_y is [9.9-2.92(0.125), 9.9+2.92(0.125)]=[9.535, 10.265]. Hence, you are 99% confident that the mean seek time for the population is between 9.535ms and 10.265ms.

Note: Notice the tradeoff between confidence and interval length: for a fixed sample size, raising the confidence level from 90% to 99% lengthens the interval. Also, observe that for a particular interval length you must increase sample size to obtain a higher level of confidence.
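Both observations in the note can be explored numerically. The sketch below assumes the large-n formula throughout and uses s_y = 0.5 from the disk drive problem; the target half-length of 0.1ms and the function names are hypothetical choices for illustration. The required sample size comes from solving z_a(s_y/sqrt(n)) = E for n, which gives n = (z_a(s_y)/E)^2.

```python
from math import ceil, sqrt
from statistics import NormalDist

def z_value(a: float) -> float:
    """z such that the area between -z and z is a% (a given in percent)."""
    return NormalDist().inv_cdf(0.5 + a / 200)

def margin(s_y: float, n: int, a: float) -> float:
    """Half-length of the large-n interval: z_a * s_y/sqrt(n)."""
    return z_value(a) * s_y / sqrt(n)

def required_n(s_y: float, a: float, target_margin: float) -> int:
    """Smallest n whose large-n half-length is at most target_margin."""
    return ceil((z_value(a) * s_y / target_margin) ** 2)

s_y = 0.5  # sample standard deviation from the disk drive problem (ms)
for a in (90, 95, 99):
    # Column 2: half-length at n=100; column 3: n needed for a 0.1ms half-length
    print(a, round(margin(s_y, 100, a), 4), required_n(s_y, a, 0.1))
```

Reading down the output, the half-length at n=100 grows with the confidence level, and so does the sample size needed to hold the half-length at 0.1ms.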

Summary

We initially set out to answer the question:

"How good is ybar as a point estimate of m_y?"

By "good" we mean how close ybar is to m_y. By considering the sampling distribution of ybar we have proposed the following:

s_ybar as the error in our estimate of m_y.
An a% confidence interval as an interval estimate for m_y.

Notice that our confidence interval provides a plus/minus amount (the margin of error, also known as the level of accuracy) which expresses how close we think ybar is to m_y with a% confidence.