Sample statistics are:

  size: mean = 800; SD = 200
  # of change requests: mean = 15; SD = 5
  corr coeff = 0.6

1) What proportion of all programs had more than 20 change requests?

Since you are interested in "all" programs, this does not require regression. The z score is:

  z = (20 - 15)/5 = 1

From the standard normal table, 68.27% of the area under the curve is between -1 and 1. The desired proportion is therefore (100 - 68.27)/2 = 15.865%, or about 15.87%.

2) Estimate the # of change requests for a program of 600 lines.

Since you are estimating (that is, predicting) the # of change requests for a fixed size, this is a regression problem and does not require normal theory. The slope of the regression line is corr coeff * (SD of change requests / SD of size) = 0.6(5/200). Note that the run is 600 - 800 = -200, and so:

  rise = -200(0.6(5/200)) = -3

Hence 15 - 3 = 12 change requests can be expected.

3) For programs of 800 lines, what proportion had more than 15 change requests?

Since (800, 15) is on the regression line, 15 must be the mean # of change requests for programs of size 800, and so the proportion is 50%.

4) For programs of 600 lines, what proportion had more than 20 change requests?

We already know from part 2 that the mean for these programs is 12, so we only need to compute the SD around the regression line:

  SD(y|x) = 5*sqrt(1 - 0.6^2) = 5*0.8 = 4

Compute z:

  z = (20 - 12)/4 = 2

From the standard normal table, 95.45% of the area under the curve is between -2 and 2. The desired proportion is therefore (100 - 95.45)/2 = 2.275%.
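
The four answers above can be checked numerically. The following is a minimal Python sketch, not part of the original solution. It assumes, as the table lookups already do, that change requests are roughly normal overall and roughly normal around the regression line. The function names (`upper_tail`, `predicted_requests`, `conditional_sd`, `proportion_above`) are made up for illustration.

```python
from math import sqrt
from statistics import NormalDist  # standard library, Python 3.8+

# Sample statistics from the problem statement.
MEAN_SIZE, SD_SIZE = 800.0, 200.0   # program size in lines
MEAN_CR, SD_CR = 15.0, 5.0          # number of change requests
R = 0.6                             # correlation coefficient


def upper_tail(z):
    """Area under the standard normal curve to the right of z."""
    return 1.0 - NormalDist().cdf(z)


def predicted_requests(size):
    """Regression estimate of change requests for a program of `size` lines."""
    slope = R * SD_CR / SD_SIZE            # rise per line of code
    return MEAN_CR + slope * (size - MEAN_SIZE)


def conditional_sd():
    """SD of change requests around the regression line, SD(y|x)."""
    return SD_CR * sqrt(1.0 - R ** 2)


def proportion_above(threshold, size=None):
    """Proportion of programs with more than `threshold` change requests.

    With size=None, all programs are considered (no regression needed).
    Otherwise only programs of the given size are considered (assumes
    normal scatter around the regression line).
    """
    if size is None:
        mean, sd = MEAN_CR, SD_CR
    else:
        mean, sd = predicted_requests(size), conditional_sd()
    return upper_tail((threshold - mean) / sd)


if __name__ == "__main__":
    print("1)", proportion_above(20))        # all programs, > 20 requests
    print("2)", predicted_requests(600))     # estimate for 600 lines
    print("3)", proportion_above(15, 800))   # 800 lines, > 15 requests
    print("4)", proportion_above(20, 600))   # 600 lines, > 20 requests
```

Run as a script, this should give roughly 0.1587, 12, 0.5 and 0.0228, matching parts 1 through 4 up to the rounding of the normal table.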