Lecture 10/14

Sample Size Determination

Motivation

Consider the situation where you would like to estimate a population parameter to some "level of accuracy" for a particular "level of confidence". You know that a probability sample is necessary but you do not know the minimum sample size necessary to achieve the desired accuracy and confidence.

This is the sample size determination problem. We may utilize confidence interval theory to determine the sample size. We do so by treating the "level of accuracy" as the plus/minus amount in our confidence interval expression and solving for n.

Definition (m_y)

Let n denote the sample size, D the level of accuracy and a the level of confidence. An a% confidence interval for the population parameter m_y may be expressed thus:

L': ybar - D

U': ybar + D

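where z_a is the standard normal critical value for confidence level a and s_ybar is the standard error of ybar: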

D = z_a(s_ybar) = z_a(s_y/sqrt(n))

hence, squaring and rearranging terms:

D² = z_a²(s_y²/n)

n = z_a²(s_y²/D²)

Note that s_y is unknown. However, we may address this in one of three ways:

s_y may be available from another study.
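The range (i.e. R = max - min) may be available from another study, in which case we estimate s_y = R/4. This rule of thumb assumes roughly normal data, where about 95% of values lie within two standard deviations of the mean, so the range spans about four standard deviations.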
Conduct a pilot study. That is, select a small sample and determine s_y from the sample.

Note: You will always round up to the next integer. Rounding down does not make sense since you would merely be ensuring that your sample is fractionally too small to achieve the accuracy and confidence required.
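
The steps above can be sketched in a few lines of Python. This is only an illustration, not part of the lecture: the function names s_y_from_range, s_y_from_pilot and min_sample_size are hypothetical, and z_a is assumed to have been looked up in a normal table beforehand.

```python
import math
import statistics


def s_y_from_range(low, high):
    """Rule-of-thumb estimate s_y = R/4, where R = max - min."""
    return (high - low) / 4


def s_y_from_pilot(sample):
    """Estimate s_y as the sample standard deviation of a small pilot sample."""
    return statistics.stdev(sample)


def min_sample_size(z_a, s_y, D):
    """Apply n = z_a^2 * (s_y^2 / D^2) and round up to the next integer."""
    n_exact = z_a**2 * s_y**2 / D**2
    # Always round up: rounding down would leave the sample slightly too small.
    return math.ceil(n_exact)
```

For example, min_sample_size(2.0, 1.0, 0.3) evaluates 4/0.09 = 44.4... and returns 45.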

Problem:

Consider the disk drive problem. You would like to estimate the seek time of disk drives to an accuracy of 0.01 ms with 95% confidence. What is the minimum sample size required to achieve this level of accuracy and confidence? Assume that a previous study indicates that seek times range between 9.8 ms and 10.2 ms.

Solution:

Since a is 95%, z_a = 1.96. We may estimate s_y from the range R. That is, R = 10.2 - 9.8 = 0.4, hence s_y = R/4 = 0.1. Also, D = 0.01 and so n = 1.96²(0.1²/0.01²) = 384.16. Since you always round up, 385 drives are required to achieve a level of accuracy of 0.01 ms with 95% confidence.
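
As a cross-check, the same calculation can be run in Python with the critical value taken from the standard normal distribution rather than from a table. This is a minimal sketch under the problem's assumptions (s_y estimated as R/4); the helper name z_for_confidence is hypothetical, while NormalDist is part of Python's standard statistics module.

```python
import math
from statistics import NormalDist


def z_for_confidence(conf):
    """Two-sided standard normal critical value for a confidence level in (0, 1)."""
    # A central interval with coverage conf leaves (1 - conf)/2 in each tail.
    return NormalDist().inv_cdf(0.5 + conf / 2)


z_a = z_for_confidence(0.95)      # critical value for 95% confidence
s_y = (10.2 - 9.8) / 4            # range rule: s_y = R/4
D = 0.01                          # desired accuracy in ms

n_exact = z_a**2 * s_y**2 / D**2  # n = z_a^2 * (s_y^2 / D^2)
n = math.ceil(n_exact)            # always round up

print(f"z_a = {z_a:.4f}, n_exact = {n_exact:.2f}, n = {n}")
```

Because the critical value is computed rather than read from a table, n_exact may differ in the second decimal place from a hand calculation that uses a rounded z_a.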