Sample Size Determination
Motivation
Consider the situation where you would like to estimate a population parameter to some "level of accuracy" at a particular "level of confidence". You know that a probability sample is necessary, but you do not know the minimum sample size needed to achieve the desired accuracy and confidence.
This is the sample size determination problem. We may utilize confidence interval theory to determine the sample size. We do so by treating the "level of accuracy" as the plus/minus amount in our confidence interval expression and solving for n.
Definition ($\mu$)
Let $n$ denote the sample size, $D$ the level of accuracy and $\alpha$ the level of confidence. An $\alpha$% confidence interval for the population mean $\mu$ may be expressed as:
$$L' = \bar{y} - D$$
$$U' = \bar{y} + D$$
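where
$$D = z_\alpha s_{\bar{y}} = z_\alpha \frac{s_y}{\sqrt{n}}$$
and $z_\alpha$ is the standard normal critical value for a two-sided $\alpha$% interval (1.96 for 95%).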
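For illustration, the critical value $z_\alpha$ and the accuracy $D$ for a given $n$ can be computed with Python's standard library. This is a minimal sketch assuming Python 3.8+ (for `statistics.NormalDist`) and a two-sided interval; `critical_z` and `half_width` are hypothetical helper names, and the confidence level is passed as a proportion (0.95) rather than a percentage.

```python
from math import sqrt
from statistics import NormalDist


def critical_z(confidence: float) -> float:
    """Two-sided standard normal critical value for a confidence level in (0, 1)."""
    # Half of the leftover probability goes in each tail, so take the upper quantile.
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)


def half_width(s_y: float, n: int, confidence: float) -> float:
    """Accuracy D = z * s_y / sqrt(n) of the interval ybar +/- D."""
    return critical_z(confidence) * s_y / sqrt(n)


print(round(critical_z(0.95), 2))  # 1.96
```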
hence, squaring and rearranging terms:
$$D^2 = z_\alpha^2 \frac{s_y^2}{n}$$
$$n = z_\alpha^2 \frac{s_y^2}{D^2}$$
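As a quick check on the algebra, the sketch below evaluates $n = z_\alpha^2 s_y^2 / D^2$ without rounding and substitutes the result back into $D = z_\alpha s_y/\sqrt{n}$. The helper name `raw_sample_size` and the numbers ($z_\alpha = 2$, $s_y = 3$, $D = 0.5$) are hypothetical, chosen only so the arithmetic is easy to follow.

```python
from math import sqrt


def raw_sample_size(z: float, s_y: float, D: float) -> float:
    """Unrounded n = z^2 * s_y^2 / D^2, from solving D = z * s_y / sqrt(n) for n."""
    return (z * s_y / D) ** 2


# Illustrative values only.
z, s_y, D = 2.0, 3.0, 0.5
n = raw_sample_size(z, s_y, D)  # (2 * 3 / 0.5)^2 = 144.0
# Round-trip: plugging n back into the half-width formula recovers D.
print(n, z * s_y / sqrt(n))     # 144.0 0.5
```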
Note that $s_y$ is unknown before the sample is taken. However, we may address this in one of three ways:
Note: Always round $n$ up to the next integer. Rounding down makes no sense, since it would leave your sample fractionally too small to achieve the required accuracy and confidence.
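The rounding rule can be checked numerically: the ceiling of the raw value meets the accuracy target, while the floor just misses it. This is a minimal sketch; `required_n` is a hypothetical helper, the input values are illustrative, and the small tolerance `tol` is an added assumption that absorbs floating-point noise when the raw value is mathematically an exact integer.

```python
from math import ceil, floor, sqrt


def required_n(z: float, s_y: float, D: float, tol: float = 1e-9) -> int:
    """Smallest integer n with z * s_y / sqrt(n) <= D (always rounds up)."""
    raw = (z * s_y / D) ** 2
    # Subtracting a tiny tolerance keeps noise such as 144.00000000000003
    # from pushing an exact integer result up by one.
    return ceil(raw - tol)


z, s_y, D = 1.96, 1.0, 0.25            # illustrative values; raw n = 61.4656
n_up = required_n(z, s_y, D)            # 62
n_down = floor((z * s_y / D) ** 2)      # 61
print(n_up, z * s_y / sqrt(n_up))       # 62, half-width about 0.2489 (meets 0.25)
print(n_down, z * s_y / sqrt(n_down))   # 61, half-width about 0.2510 (misses 0.25)
```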
Problem:
Consider the disk drive problem. You would like to estimate the seek time of disk drives to an accuracy of 0.01 ms with 95% confidence. What is the minimum sample size required to achieve this level of accuracy and confidence? Assume that a previous study indicates that seek times range between 9.8 ms and 10.2 ms.
Solution:
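Since $\alpha$ is 95%, $z_\alpha = 1.96$. We may estimate $s_y$ from the range $R$: for a roughly normal population about 95% of values lie within two standard deviations of the mean, so the range spans roughly $4s_y$. Here $R = 10.2 - 9.8 = 0.4$, hence $s_y \approx R/4 = 0.1$. With $D = 0.01$,
$$n = 1.96^2 \frac{0.1^2}{0.01^2} = 384.16.$$
Since you always round up, 385 drives are required to achieve an accuracy of 0.01 ms with 95% confidence.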
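The disk drive calculation can be reproduced in a few lines. This sketch assumes Python 3.8+ for `statistics.NormalDist` and uses the exact two-sided 95% critical value (about 1.96) together with the range rule $s_y \approx R/4$; the raw value comes out near 384.15, which rounds up to 385.

```python
from math import ceil
from statistics import NormalDist

confidence = 0.95
low, high = 9.8, 10.2   # seek-time range from the previous study (ms)
D = 0.01                # required accuracy (ms)

# Two-sided critical value: 2.5% in each tail for 95% confidence.
z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96
s_y = (high - low) / 4                               # range rule: s_y ~ R/4 = 0.1
n = ceil((z * s_y / D) ** 2)                         # raw value about 384.15
print(n)  # 385
```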