pp198; #5 b)

This part requires both regression and normal distribution methods. You are given the following statistics:

Midterm: Mean=50; SD=25
Final: Mean=55; SD=15
r=0.6

Observe that we are only interested in students who scored 80 on the midterm. From this group we wish to determine the percentage that scored over 80 on the final. The dependent (y) variable is therefore final score, and the independent (x) variable is midterm score. (NOTE: The wording of midterm questions will be such that the dependent variable can be easily determined.)

Three steps are required.

STEP1 - Determine the mean final score for students who earned 80 on the midterm.
STEP2 - Determine the SD of final scores for students who earned 80 on the midterm.
STEP3 - Apply normal distribution methods to find the percentage.

Applying the three steps:

STEP1: Since the mean midterm score is 50 and we are interested in midterm scores of 80, the run is 80-50=30 and we must compute the rise.

rise/run = r(SD(y)/SD(x))
rise/30 = 0.6(15/25) = 0.36
rise = 0.36*30 = 10.8

Therefore the mean final score for these students is 55+10.8=65.8.

STEP2: Since final score is the dependent variable (i.e. y variable) we know that SD(y) is 15 and we can then compute SD(y|x=80):

SD(y|x=80) = SQRT(1-0.6**2)*15 = 0.8*15 = 12

STEP3: We can now apply normal distribution methods. Since we are interested in the percentage that scored over 80 on the final we need the area to the right of 80. Transforming:

z = (80-65.8)/12 = 1.18

Rounding z to 1.20 for the table lookup, the area between -1.20 and 1.20 is about 76.99%, and since we need the area to the right we compute (100-76.99)/2=11.5. Therefore about 11.5% of such students scored over 80 on the final.

pp198; #5 a)

You should also be able to work problems like part a). Note that part a) does not require regression methods. Considering all students, we wish to determine the percentage that scored over 80 on the final. In other words, we do not care what the midterm score is; we are only interested in final scores.
First standardize to get the z value:

z = (80 - 55)/15 = 25/15 = 1.67

Rounding z to 1.65 for the table lookup, the area between -1.65 and 1.65 is about 90.11%, and since we need the area to the right (by symmetry) we compute (100-90.11)/2 = 4.95. Therefore about 5% of students scored over 80 on the final.

The following problem is from a previous MIDTERM:

A recent DePaul graduate is asked to analyze the portfolio of programs at her new job. She finds the following:

size (lines of code): average = 550; SD = 150
change requests: average = 12; SD = 5
r = 0.6

The objective is to use regression methods to develop an equation to predict number of change requests (dependent variable) from program size (independent variable). Assume normality where necessary.

a) Estimate the number of change requests for a program of size 800 lines of code.

The run is 800-550=250.

rise/run=r(SD(y)/SD(x))
rise/250=0.6(5/150)=0.02
rise=250(0.02)=5

Hence 12+5=17 change requests should be expected for a program of size 800.

b) Derive the equation for the regression line of number of change requests as a function of size.

b=r(SD(y)/SD(x))=0.6(5/150)=0.02

Since (xbar, ybar) is on the regression line:

12=a + 0.02(550)
a=12 - 0.02(550)=12 - 11=1

The regression equation is:

y=1 + 0.02(x)

or more descriptively:

change requests= 1 + 0.02(size)

c) For programs of size 800 lines, what proportion required more than 20 change requests?

We already know from part a) that the mean is 17, so we need to compute the SD of y for fixed x (i.e. size=800):

SD(y|x=800)=SQRT(1-0.6**2)*5=0.8*5=4

We can now apply normal theory. We need the proportion greater than 20, so we transform:

z = (20-17)/4 = 0.75

Since about 55% of observations are between -0.75 and 0.75, the required proportion is (100-55)/2=22.5%.
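
The three steps used in both problems above (regression estimate, SD for fixed x, normal tail area) can be checked with a short Python sketch. This is only an illustration, not part of the original solutions: the function name conditional_tail and its parameter names are hypothetical, and it uses the exact normal curve from Python's standard library rather than a printed table, so its percentages differ slightly from the table-based answers.

```python
from math import sqrt
from statistics import NormalDist


def conditional_tail(x_mean, x_sd, y_mean, y_sd, r, x, y_cut):
    """Hypothetical helper: mean and SD of y for fixed x, plus the
    fraction of y values above y_cut under the normal approximation."""
    slope = r * y_sd / x_sd                    # rise/run = r(SD(y)/SD(x))
    y_given_x = y_mean + slope * (x - x_mean)  # STEP1: mean of y for this x
    sd_given_x = sqrt(1 - r**2) * y_sd         # STEP2: SD of y for this x
    # STEP3: area to the right of y_cut under the normal curve
    tail = 1 - NormalDist(y_given_x, sd_given_x).cdf(y_cut)
    return y_given_x, sd_given_x, tail


# pp198; #5 b): midterm score 80, percentage over 80 on the final
final_mean, final_sd, final_tail = conditional_tail(
    50, 25, 55, 15, 0.6, x=80, y_cut=80)

# Previous MIDTERM, part c): size 800, proportion with more than 20 requests
cr_mean, cr_sd, cr_tail = conditional_tail(
    550, 150, 12, 5, 0.6, x=800, y_cut=20)

print(f"final:    mean={final_mean:.1f} SD={final_sd:.1f} tail={final_tail:.1%}")
print(f"requests: mean={cr_mean:.1f} SD={cr_sd:.1f} tail={cr_tail:.1%}")
```

The means and SDs agree with the hand calculations (65.8 and 12; 17 and 4). The tail areas come out near 11.8% and 22.7% rather than 11.5% and 22.5%, because the sketch does not round z or the table area.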