Hypothesis Testing - Proportions:

Problem #1: An application development manager claims that 75% of programs in the production portfolio are Y2K compliant. The EDP auditor disputes this claim and believes that the proportion is less than claimed. She examines 200 of these programs and discovers that 142 are compliant.

1. Give the appropriate null and one-sided research hypotheses that correspond to the auditor's belief.
H0: pi = 0.75
Ha: pi < 0.75

2. Calculate the p-value for the hypotheses stated above. Note that in this case you have a large sample.
The sample proportion is p = 142/200 = 0.71 and, assuming H0 is true, SD(p) = sqrt((0.75)*(0.25)/200) = 0.031. The z value is therefore:
z = (0.71-0.75)/SD(p) = -1.3
About 80.64% of the standard normal distribution lies between -1.3 and +1.3, so the area in the lower tail is (100-80.64)/2 = 9.68%, which is approximately 10%. This is a p-value of about 0.1.

3. Is this significant, highly significant, or non-significant?
Non-significant.

4. Comment on the auditor's belief.
Since this is a non-significant result, the EDP auditor has insufficient evidence to reject H0 and so has insufficient evidence to challenge the manager's claim.

Problem #2: A production manager claims that 50% of disk drives coming off a production line have faster seek times than stipulated in the specifications. The QA manager believes that the actual proportion is less than claimed. She takes a sample of 350 drives from a recent production run and finds that 102 are faster.

1. Give the appropriate null and one-sided research hypotheses that correspond to the QA manager's belief.
H0: pi = 0.5
Ha: pi < 0.5

2. Conduct a test of hypotheses.
The sample proportion is p = 102/350 = 0.29 and, assuming H0 is true, SD(p) = sqrt((0.5)*(0.5)/350) = 0.027. Then:
z = (0.29-0.5)/0.027 = -7.8
The p-value is almost zero. This is a highly significant result and so the null hypothesis should be rejected. The QA manager therefore has enough evidence to support her belief that the true proportion is less than claimed.
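The two tests above can be checked numerically. The following is a minimal sketch rather than part of the original solutions: the function names `normal_cdf` and `lower_tail_proportion_test` are hypothetical, and the p-value is taken from the standard normal CDF (computed with `math.erfc`) instead of a printed table.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal CDF, P(Z <= z), via the complementary error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def lower_tail_proportion_test(successes: int, n: int, pi0: float):
    """Large-sample z-test of H0: pi = pi0 against Ha: pi < pi0.

    Returns (sample proportion, SD(p) under H0, z statistic, p-value).
    """
    p_hat = successes / n
    # The standard deviation of p is computed under H0, so it uses pi0, not p_hat.
    sd_p = math.sqrt(pi0 * (1.0 - pi0) / n)
    z = (p_hat - pi0) / sd_p
    # Lower-tail alternative: the p-value is the area to the left of z.
    return p_hat, sd_p, z, normal_cdf(z)


if __name__ == "__main__":
    problems = [("Problem #1", 142, 200, 0.75), ("Problem #2", 102, 350, 0.50)]
    for label, x, n, pi0 in problems:
        p_hat, sd_p, z, p_value = lower_tail_proportion_test(x, n, pi0)
        print(f"{label}: p = {p_hat:.3f}, SD(p) = {sd_p:.4f}, "
              f"z = {z:.2f}, p-value = {p_value:.3g}")
```

Without intermediate rounding, Problem #1 gives z of about -1.31 and a p-value of about 0.096, which agrees with the hand calculation of roughly 0.1.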
Sample Size Determination:

Problem #1: Estimate the sample size required to estimate the mean seek time of disk drives from a production run to within 0.025ms with 99% confidence. Assume that SD(y) is known to be 0.075 from a previous study.

Solution: The required z value for a 99% CI is 2.6 (2.576 rounded), and you know from CI theory that the error in your estimate is z(99)*(SD(y)/sqrt(n)). Then:
0.025 = 2.6*(SD(y)/sqrt(n))
Rearranging terms:
n = ((2.6)*(0.075)/0.025)**2 = 60.84
Rounding up, 61 disk drives would be required.

Problem #2: Estimate the sample size required to estimate the proportion of programs in a portfolio that are Y2K compliant to an accuracy of 0.05 with 95% confidence.

Solution: Since an estimate of pi is not provided, the most conservative choice is pi = 0.5, because pi*(1-pi) is largest there. You know from CI theory that the error in your estimate is z(95)*sqrt(pi*(1-pi)/n). Taking z(95) as 1.95:
0.05 = 1.95*sqrt(0.5*0.5/n)
Rearranging terms:
n = (1.95**2)*(0.5*0.5)/0.05**2 = 380.25
Hence 381 programs would be a conservative estimate of the number of programs required. (With the more precise value z(95) = 1.96, n = 384.16, so 385 programs.)

Problem #3: Assume that for other companies like yours 75% of programs are Y2K compliant. Use this fact in estimating the sample size.

Solution: In this case it is reasonable to use 0.75 as an estimate for pi. The working is the same as #2 above:
0.05 = 1.95*sqrt(0.75*0.25/n)
Rearranging terms:
n = (1.95**2)*(0.75*0.25)/0.05**2 = 285.19
Hence 286 programs would be required. (With z(95) = 1.96, n = 288.12, so 289 programs.)
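The three sample-size calculations above follow the same pattern: solve the error formula for n and round up. The sketch below is illustrative only. The helper names `sample_size_mean`, `sample_size_proportion`, and `z_for_confidence` are hypothetical, and the z values passed in the usage section are the rounded ones used in the solutions (2.6 and 1.95).

```python
import math
from statistics import NormalDist


def z_for_confidence(level: float) -> float:
    """Two-sided critical z value for a confidence level such as 0.95."""
    return NormalDist().inv_cdf(0.5 + level / 2.0)


def sample_size_mean(z: float, sd: float, margin: float) -> int:
    """Smallest n such that z * sd / sqrt(n) <= margin."""
    return math.ceil((z * sd / margin) ** 2)


def sample_size_proportion(z: float, pi: float, margin: float) -> int:
    """Smallest n such that z * sqrt(pi * (1 - pi) / n) <= margin."""
    return math.ceil(z ** 2 * pi * (1.0 - pi) / margin ** 2)


if __name__ == "__main__":
    # Rounded z values as used in the worked solutions.
    print(sample_size_mean(2.6, 0.075, 0.025))       # Problem #1
    print(sample_size_proportion(1.95, 0.5, 0.05))   # Problem #2
    print(sample_size_proportion(1.95, 0.75, 0.05))  # Problem #3

    # The same calculation with an unrounded critical value.
    z95 = z_for_confidence(0.95)
    print(round(z95, 3), sample_size_proportion(z95, 0.5, 0.05))
```

Passing z explicitly keeps the rounded table values from the solutions reproducible, while `z_for_confidence` shows how much the answer shifts when the critical value is not rounded.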