Confidence Intervals (μ_y)
Motivation
Let y denote a measurement of interest from some population with mean μ_y and standard deviation σ_y. Consider the sampling distribution of the sample mean ybar for large n. For the sake of this motivation, let us consider the interval [L, U] constructed about μ_ybar thus:
L: μ_ybar - 2σ_ybar
U: μ_ybar + 2σ_ybar
Since the sampling distribution of ybar is approximately normal for large n, with mean μ_ybar = μ_y and standard deviation σ_ybar = σ_y/sqrt(n), L and U define boundaries between which about 95.45% of the ybar's for all possible samples of size n will fall. You may think of these boundaries as dividing the ybar's into those that are relatively close to μ_ybar (and so to μ_y) and those that are relatively distant from μ_ybar (and so from μ_y).
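The 95.45% figure can be checked numerically. The following is a minimal sketch, not part of the original notes: it computes the normal probability within two standard deviations and then simulates sample means from a uniform population on [0, 1]. The uniform population, the function names, and the values n=50 and 5000 replicates are assumptions chosen only for illustration.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def central_coverage(k):
    """Probability that a normal variable lies within k standard deviations of its mean."""
    z = NormalDist()  # standard normal: mean 0, standard deviation 1
    return z.cdf(k) - z.cdf(-k)


def simulated_coverage(n=50, replicates=5000, seed=0):
    """Fraction of simulated sample means that fall inside [L, U].

    Assumed population: uniform on [0, 1], so mu_y = 0.5 and sigma_y = sqrt(1/12).
    """
    rng = random.Random(seed)
    mu_y = 0.5
    sigma_y = sqrt(1 / 12)
    sigma_ybar = sigma_y / sqrt(n)      # standard deviation of ybar
    lower = mu_y - 2 * sigma_ybar       # L
    upper = mu_y + 2 * sigma_ybar       # U
    inside = 0
    for _ in range(replicates):
        ybar = fmean(rng.random() for _ in range(n))
        if lower <= ybar <= upper:
            inside += 1
    return inside / replicates


print(f"theoretical coverage: {central_coverage(2.0):.4f}")
print(f"simulated coverage:   {simulated_coverage():.4f}")
```

The simulated fraction should land near the theoretical value even though the population itself is not normal, which is the point of working with the sampling distribution of ybar for large n.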
For a particular sample we would like to know whether the sample mean (ybar) is one of those that is close to μ_y. Unfortunately, we do not know μ_y, so we cannot tell directly whether the sample mean is close to it. However, we can address this issue indirectly by making use of our understanding of sampling distributions and proceeding thus:
Note: To keep the argument simple we will assume that σ_y is known:
This strategy captures the basic ideas of confidence interval theory. The only difference is that we will construct intervals that allow us to claim whatever confidence we desire. Also, since σ_y is usually not known, we will estimate it with our sample standard deviation s_y.
Definition
An α% confidence interval for the population parameter μ_y is an interval constructed from a sample mean (ybar) within which you expect μ_y to lie with α% confidence. There are two cases that need to be considered:
Terminology
Problems:
Consider the
population of disk drives manufactured from a particular
production run. You are interested in estimating the mean seek
time of this population and select a sample of n=100 drives for
examination. You discover that the sample mean is 9.9ms with a
standard deviation of 0.5ms.
μ_y).
Solutions:
Since s_y = 0.5 and n is large, σ_ybar ≈ s_y/sqrt(n) = 0.5/10 = 0.05.
Also, z_α = z_90 = 1.65, hence the 90% confidence interval for μ_y is
[9.9 - 1.65(0.05), 9.9 + 1.65(0.05)] = [9.8175, 9.9825]. Hence, you are
90% confident that the mean seek time for the population is
between 9.8175ms and 9.9825ms.
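The arithmetic in this solution can be reproduced with a short script. This is a sketch under the large-sample assumption used above; the function name mean_confidence_interval is hypothetical. It takes the critical value from the standard normal distribution rather than from a rounded table, so it uses about 1.645 where the worked solution uses 1.65, and the endpoints differ in the fourth decimal place.

```python
from math import sqrt
from statistics import NormalDist


def mean_confidence_interval(ybar, s_y, n, confidence):
    """Large-sample confidence interval for the population mean.

    ybar: sample mean, s_y: sample standard deviation,
    n: sample size, confidence: level as a fraction (0.90 for 90%).
    """
    # Critical value z such that the central area under the standard normal is `confidence`.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * s_y / sqrt(n)  # z times the estimated standard deviation of ybar
    return ybar - half_width, ybar + half_width


# Disk drive example: n = 100, sample mean 9.9 ms, sample standard deviation 0.5 ms.
lower, upper = mean_confidence_interval(ybar=9.9, s_y=0.5, n=100, confidence=0.90)
print(f"90% confidence interval: [{lower:.4f}, {upper:.4f}] ms")
```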
Note: Notice the tradeoff between confidence and interval length. Also, observe that for a particular interval length you must increase sample size to obtain a higher level of confidence.
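The tradeoff can be made concrete with the disk drive numbers. The sketch below is illustrative only: the helper names half_width and required_sample_size are hypothetical, and the sample-size formula n = (z s_y / E)^2, rounded up, is the standard large-sample rearrangement of the interval half-width E = z s_y/sqrt(n).

```python
from math import ceil, sqrt
from statistics import NormalDist


def critical_value(confidence):
    """z such that the central area under the standard normal equals `confidence`."""
    return NormalDist().inv_cdf(0.5 + confidence / 2)


def half_width(s_y, n, confidence):
    """Plus/minus amount of the large-sample confidence interval."""
    return critical_value(confidence) * s_y / sqrt(n)


def required_sample_size(s_y, target_half_width, confidence):
    """Smallest n whose half-width does not exceed target_half_width."""
    return ceil((critical_value(confidence) * s_y / target_half_width) ** 2)


s_y, n = 0.5, 100
for confidence in (0.90, 0.95, 0.99):
    # Same sample, higher confidence: the interval gets longer.
    e = half_width(s_y, n, confidence)
    # Same interval length as the 90% example: the sample must get larger.
    needed = required_sample_size(s_y, 0.0825, confidence)
    print(f"{confidence:.0%}: half-width {e:.4f} ms, n needed for +/-0.0825 ms: {needed}")
```

With the sample held fixed, raising the confidence level lengthens the interval; with the interval length held fixed, raising the confidence level requires a larger sample.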
Summary
We initially set out to answer the question:
By "good" we mean how close ybar is to μ_y.
By considering the sampling distribution of ybar we have proposed the
following:
Notice that our confidence interval provides a plus/minus amount (known as the level of accuracy) which expresses how close we think ybar is to μ_y.