Question: A colleague majoring in IS wants to use regression to develop an equation that predicts the number of change requests from module size. She designs an experiment, collects data, and computes the following statistics:

Size (lines of code): mean = 1600; std dev = 200
Number of change requests: mean = 30; std dev = 5
Correlation between size and number of change requests: r = 0.8

Assume normality where necessary.

1. What proportion of programs had fewer than 30 change requests?
2. For programs of size 1600, what proportion had fewer than 30 change requests?
3. Estimate the number of change requests for programs of size 1900.
4. For programs of size 1900, what proportion had fewer than 30 change requests?

Solution:

1. We are not interested in programs of a particular size, so regression is not required. The relevant mean is the mean over all programs (30). Because 30 is exactly this mean, under normality the proportion below it is 50%.

2. Here we are interested in programs of a particular size, so regression applies. Note, however, that the size is 1600, which is also the overall mean size. Because the regression line always passes through (xbar, ybar), the predicted mean number of change requests at this size is also 30, so the proportion is again 50%.

3. Again we are interested in programs of a particular size, so we need the mean number of change requests for that size. Since we condition on size, size is the predictor x. Moving from the mean size to 1900 gives run = 1900 - 1600 = 300, and:

rise = run × r × (Sy/Sx) = 300 × 0.8 × (5/200) = 6

Hence the required mean, and so the desired estimate, is 30 + 6 = 36 change requests.

4. Here we are interested in programs of a particular size, and we already know the mean for this size from part 3, so we only need the standard deviation before applying normal theory.

STEP 1: Determine the mean.
mean = 36 change requests

STEP 2: Determine the standard deviation about the regression line.
Sy|x = Sy × sqrt(1 - r^2) = 5 × sqrt(1 - 0.8^2) = 5 × 0.6 = 3

STEP 3: Assume normality and determine z.
z = (30 - 36)/3 = -2

About 95.45% of a normal distribution lies within 2 standard deviations of its mean, so the desired proportion is (100 - 95.45)/2 ≈ 2.275%.
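The four answers can be checked numerically with the sketch below. It is a minimal illustration, assuming the summary statistics can be treated as known values. The helper names (`predicted_mean`, `conditional_sd`, `proportion_below`) are hypothetical. It uses Python's standard `statistics.NormalDist` in place of a normal table. Note that `Sy|x = Sy × sqrt(1 - r^2)` is the large-sample form used in the solution: the sample size is not given, so the usual (n - 1)/(n - 2) correction inside the square root is ignored.

```python
from math import sqrt
from statistics import NormalDist

# Summary statistics from the question, treated as known values (assumption)
MEAN_X, SD_X = 1600.0, 200.0  # size (lines of code)
MEAN_Y, SD_Y = 30.0, 5.0      # number of change requests
R = 0.8                       # correlation between size and change requests


def predicted_mean(x: float) -> float:
    """Regression estimate of mean change requests at size x (hypothetical helper).

    Starts at (MEAN_X, MEAN_Y), which always lies on the least-squares line,
    and moves along it with slope R * SD_Y / SD_X.
    """
    run = x - MEAN_X
    rise = run * R * (SD_Y / SD_X)
    return MEAN_Y + rise


def conditional_sd() -> float:
    """Spread about the line, SD_Y * sqrt(1 - R**2) (large-sample form, no n correction)."""
    return SD_Y * sqrt(1.0 - R ** 2)


def proportion_below(y: float, mean: float, sd: float) -> float:
    """P(Y < y) when Y is Normal(mean, sd)."""
    return NormalDist(mu=mean, sigma=sd).cdf(y)


# Q1: all programs, so use the overall (marginal) distribution of change requests
q1 = proportion_below(30, MEAN_Y, SD_Y)
# Q2: size 1600 equals MEAN_X, so the conditional mean equals MEAN_Y
q2 = proportion_below(30, predicted_mean(1600), conditional_sd())
# Q3: point estimate of change requests at size 1900
q3 = predicted_mean(1900)
# Q4: conditional normal distribution at size 1900
q4 = proportion_below(30, predicted_mean(1900), conditional_sd())

print(f"Q1: {q1:.2%}")
print(f"Q2: {q2:.2%}")
print(f"Q3: {q3:.1f}")
print(f"Q4: {q4:.3%}")
```

Running it prints 50.00%, 50.00%, 36.0 and 2.275%, in agreement with the hand calculation. `NormalDist.cdf` gives the exact normal tail directly. This avoids the detour through the 95.45% central area.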