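Definition: A probability sample is a sample selected in such a way that every item in the population has a known, nonzero chance of inclusion in the sample. When every item (more precisely, every possible sample of a given size) has the same chance of selection, the result is a simple random sample; the two phrases are often used interchangeably.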

Note: All samples discussed in this class will be probability samples. We use probability samples so that the sample can be expected to be representative of the population. This allows us to use sample statistics to make sound inferences about population parameters.

Sampling Distributions – Mean

Let n denote sample size and let y denote a measurement of interest from some population with mean μ_y and standard deviation σ_y.

Theorem 1

Let n be large (i.e. n >= 30). Consider all possible samples of size n that may be selected from the population. Compute the sample mean (i.e. ybar) in each case. The distribution of the ybar’s will be approximately normal with mean μ_ybar = μ_y and standard deviation σ_ybar = σ_y/sqrt(n).
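Theorem 1 can be checked empirically. The sketch below is an illustration only: the exponential population (mean 1, standard deviation 1), the number of samples, and the function name simulate_sample_means are assumptions chosen for the demonstration, not part of the notes. It draws many samples of size n = 100 and compares the mean and standard deviation of the resulting ybar values with μ_y and σ_y/sqrt(n).

```python
import random
import statistics

def simulate_sample_means(n, num_samples, seed=0):
    """Draw num_samples samples of size n from an exponential population
    (mean 1, standard deviation 1; an assumed, deliberately non-normal choice)
    and return the list of sample means."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.expovariate(1.0) for _ in range(n))
        for _ in range(num_samples)
    ]

means = simulate_sample_means(n=100, num_samples=5000)

# Theorem 1 predicts a mean near 1 and a standard deviation near 1/sqrt(100) = 0.1.
print("mean of the ybar values:", round(statistics.fmean(means), 3))
print("sd of the ybar values:  ", round(statistics.pstdev(means), 3))
```

Even though the population itself is strongly skewed, the ybar values should cluster symmetrically around 1 with a spread close to 0.1.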

Problem #1: Consider the population of disk drives manufactured for a particular production run with mean seek time 10 ms and standard deviation 0.1 ms. What proportion of samples of size 100 would you expect to result in a mean less than 9.98 ms?

Solution #1: From Theorem 1, μ_ybar = μ_y = 10 ms and σ_ybar = σ_y/sqrt(n) = 0.1/10 = 0.01 ms. Hence z = (9.98 - 10)/0.01 = -2, and so the desired proportion (the area to the left of z = -2 under the standard normal curve) is 2.275%.
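The same calculation can be reproduced without a table. This is a minimal sketch using Python's standard library; the helper name proportion_below is hypothetical and introduced only for illustration.

```python
from math import sqrt
from statistics import NormalDist

def proportion_below(x, mu, sigma, n):
    """Return (z, P(ybar < x)) using the normal sampling distribution of Theorem 1."""
    se = sigma / sqrt(n)          # standard deviation of ybar
    z = (x - mu) / se             # standardized value
    return z, NormalDist().cdf(z)

z, p = proportion_below(9.98, mu=10.0, sigma=0.1, n=100)
print(f"z = {z:.2f}, proportion = {p:.5f}")  # z = -2.00, proportion = 0.02275
```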

Problem #2: Consider the population of C++ programs in a portfolio with a mean development cost per line of code (LOC) of $3.00 and standard deviation $1.00. What proportion of samples of size 100 would you expect to result in a mean greater than $3.40?

Solution #2: From Theorem 1, μ_ybar = μ_y = $3.00 and σ_ybar = σ_y/sqrt(n) = 1/10 = $0.10. Hence z = (3.4 - 3)/0.1 = 4, and so the desired proportion (the area to the right of z = 4) is 0.003%.
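Most printed normal tables stop before z = 4, so a tail this small is easier to compute directly. The sketch below uses the complementary error function, which stays accurate for very small upper-tail areas; the helper name upper_tail is an assumption made for this example.

```python
from math import erfc, sqrt

def upper_tail(z):
    """P(Z > z) for a standard normal Z, via the complementary error function."""
    return 0.5 * erfc(z / sqrt(2))

se = 1.00 / sqrt(100)        # standard deviation of ybar, in dollars
z = (3.40 - 3.00) / se       # standardized value, approximately 4
p = upper_tail(z)
print(f"z = {z:.2f}, proportion = {100 * p:.4f}%")
```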

Theorem 2

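Let n be small (i.e. n < 30). Consider all possible samples of size n that may be selected from the population. Compute the sample mean (i.e. ybar) in each case. If y is normally distributed, then the standardized values t = (ybar - μ_y)/s_ybar will follow a "Student t" distribution with n-1 degrees of freedom (df), where μ_ybar = μ_y and s_ybar = s_y/sqrt(n-1), with s_y the standard deviation estimated from the sample using divisor n. (If σ_y is known exactly and y is normal, ybar itself is normal with standard deviation σ_y/sqrt(n) for any n.)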

Note: The "Student t" distribution is bell-shaped and symmetric about the mean but, unlike the normal distribution, the empirical rule does not hold. The shape depends on the degrees of freedom, with fatter tails associated with smaller degrees of freedom. For df >= 30 it is (to all intents and purposes) indistinguishable from the normal distribution.
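The effect of the degrees of freedom on the tails can be made concrete by comparing the area to the right of 2 under several t curves with the corresponding normal area. The sketch below is one way to do this with the standard library only: it integrates the t density numerically with Simpson's rule. The function names, the cut-off of 2, and the step count are assumptions chosen for illustration.

```python
from math import exp, lgamma, log, pi
from statistics import NormalDist

def t_pdf(x, df):
    """Density of the Student t distribution with df degrees of freedom."""
    log_c = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * log(df * pi)
    return exp(log_c - (df + 1) / 2 * log(1 + x * x / df))

def t_upper_tail(x, df, steps=2000):
    """P(T > x) for x >= 0: 0.5 minus a Simpson's-rule integral of the
    density over [0, x]. steps must be even."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

normal_tail = 1 - NormalDist().cdf(2.0)
for df in (2, 5, 10, 30, 100):
    print(f"df = {df:3d}: P(T > 2) = {t_upper_tail(2.0, df):.4f}")
print(f"normal  : P(Z > 2) = {normal_tail:.4f}")
```

The tail areas shrink toward the normal value as df grows, which is the sense in which the two distributions become hard to tell apart for large df.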

Note: If the mean is zero and the standard deviation one it is referred to as the "Standard Student t" distribution. As for the normal distribution, tables are available for "Standard Student t" distributions. However, since the shape is determined by the df, these tables are summarized into one table indexed by df.

Problem #1: Consider a normally distributed population with mean 0 and a sample-estimated standard deviation of 3:
a) What proportion of samples of size 10 would you expect to result in a mean greater than 1.75?
b) What proportion of samples of size 10 would you expect to result in a mean greater than 3.5?

Solution #1: From Theorem 2, μ_ybar = μ_y = 0 and s_ybar = s_y/sqrt(n-1) = 3/sqrt(9) = 3/3 = 1.
a) Since n is small and the sampling distribution of ybar has mean zero and standard deviation one, t = (1.75 - 0)/1 = 1.75 and we may use the "Standard Student t" table for 9 df directly. Since 1.75 is between 1.38 (upper-tail area 10%) and 1.83 (upper-tail area 5%), the desired proportion is between 5% and 10%.
b) In this case t = 3.5 is greater than the largest tabled entry (3.25, upper-tail area 0.5%), and so the desired proportion is less than 0.5%.
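The table-lookup reasoning used in this solution can be written as a small routine. The sketch below hard-codes the 9 df row of a typical upper-tail t table (values rounded to two decimals); the dictionary name and the function bracket_upper_tail are hypothetical, and the routine assumes a positive t value.

```python
# Upper-tail area -> critical t value for 9 df (standard table row, 2 decimals).
T_TABLE_9DF = {0.10: 1.38, 0.05: 1.83, 0.025: 2.26, 0.01: 2.82, 0.005: 3.25}

def bracket_upper_tail(t, table):
    """Return (low, high) bounds on P(T > t) for t > 0 using tabled values."""
    low, high = 0.0, 0.5   # for t > 0 the upper tail lies between 0% and 50%
    for area, critical in table.items():
        if t >= critical:
            high = min(high, area)   # t is at or beyond this critical value
        else:
            low = max(low, area)     # t falls short of this critical value
    return low, high

print(bracket_upper_tail(1.75, T_TABLE_9DF))  # (0.05, 0.1)
print(bracket_upper_tail(3.5, T_TABLE_9DF))   # (0.0, 0.005)
```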

Problem #2: Consider the population of disk drives manufactured for a particular production run with normally distributed seek times, mean seek time 10 ms and a sample-estimated standard deviation of 1 ms.
a) What proportion of samples of size 26 would you expect to result in a mean less than 10.4 ms?
b) What proportion of samples of size 26 would you expect to result in a mean less than 9.8 ms?

Solution #2: From Theorem 2, μ_ybar = μ_y = 10 ms and s_ybar = s_y/sqrt(n-1) = 1/sqrt(25) = 1/5 = 0.2 ms.
a) Hence t = (10.4 - 10)/0.2 = 2. Since df = 25 and 2 is between 1.71 (upper-tail area 5%) and 2.06 (upper-tail area 2.5%), the area to the right of 2 is between 2.5% and 5%. We are interested in the area to the left of 2, so the desired proportion is between 95% and 97.5%.
b) In this case t = (9.8 - 10)/0.2 = -1. We are interested in the area to the left of -1, but since the "t" distribution is symmetric about the mean this is equivalent to the area to the right of 1. Since df = 25 and 1 is between 0.68 (upper-tail area 25%) and 1.32 (upper-tail area 10%), the desired proportion is between 10% and 25%.
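The tabled areas for 25 df can be checked by simulation. The sketch below is an illustration under stated assumptions: it draws normal samples of size 26 with mean 10 and standard deviation 1, forms the t statistic from each sample's own standard deviation (divisor n, scaled by sqrt(n-1)), and reports how often t falls below 2 and below -1. It checks the t areas used in the solution rather than the sample means directly; the function name, seed, and number of samples are arbitrary choices.

```python
import random
from math import sqrt

def simulate_t_statistics(n, mu, sigma, num_samples, seed=0):
    """Return t = (ybar - mu) / (s / sqrt(n - 1)) for num_samples normal
    samples of size n, where s is the sample standard deviation with divisor n."""
    rng = random.Random(seed)
    stats = []
    for _ in range(num_samples):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        ybar = sum(sample) / n
        s = sqrt(sum((y - ybar) ** 2 for y in sample) / n)
        stats.append((ybar - mu) / (s / sqrt(n - 1)))
    return stats

t_values = simulate_t_statistics(n=26, mu=10.0, sigma=1.0, num_samples=20000)
below_2 = sum(t < 2 for t in t_values) / len(t_values)
below_minus_1 = sum(t < -1 for t in t_values) / len(t_values)

# The table brackets are 95% to 97.5% for part a) and 10% to 25% for part b).
print(f"proportion with t < 2:  {below_2:.3f}")
print(f"proportion with t < -1: {below_minus_1:.3f}")
```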