*** SAS source code file ***
---------------------------------------------------------------------------------------
/* Exercise 2.48, page 148. */

options nodate;

data foot;
   /* Tab-delimited text file; '09'x is the tab character. */
   infile 'C:\1_Courses\223\Lectures\Week6\sasexpl\ta02-002.txt' delimiter='09'x;
   * HAV is the response and MA the explanatory variable;
   input HAV MA;
run;

proc reg data=foot;
   title 'Regression Data for HAV on MA';
   model HAV = MA;
   /* Save predicted values and residuals in data set foot1. */
   output out=foot1 predicted=HAVhat residual=r;
run;

/* interpol=r draws the fitted regression line through the points. */
symbol value=dot color=blue interpol=r;

proc gplot data=foot;
   title 'Scatter plot HAV vs. MA (With Regression Line)';
   plot hav*ma / haxis = 5 to 40 by 5
                 vaxis = 5 to 50 by 5;
run;
----------------------------------------------------------------------------------------
*** SAS output file produced by running above program ***
----------------------------------------------------------------------------------------
                     Regression Data for HAV on MA

                           The REG Procedure
                             Model: MODEL1
                        Dependent Variable: HAV

                          Analysis of Variance

                                 Sum of           Mean
Source             DF           Squares         Square    F Value    Pr > F
Model               1         188.71350      188.71350       3.62    0.0652
Error              36        1878.54965       52.18193
Corrected Total    37        2067.26316

Root MSE            7.22371    R-Square    0.0913
Dependent Mean     25.42105    Adj R-Sq    0.0660
Coeff Var          28.41624

                          Parameter Estimates

                     Parameter      Standard
Variable     DF       Estimate         Error    t Value    Pr > |t|
Intercept     1       19.72327       3.21717       6.13      <.0001
MA            1        0.33884       0.17818       1.90      0.0652
------------------------------------------------------------------------------------------
*** Discussion ***

1. The Exercise

For the purposes of Exercise 2.48, much of the above output can be ignored at this point in the course.

(a) The table labeled 'Parameter Estimates' gives the least-squares regression line as HAV = 19.72327 + 0.33884 * MA, i.e., a = 19.72327 and b = 0.33884.

(b) So for MA = 25 degrees, we would predict HAV = 19.723 + 0.339 * 25 = 28.2 degrees.

(c) The scatter plot shows a lot of spread, so predictions based on the regression line will not be very reliable.
Note that R-Square is 0.0913, so only about 9% of the variation in HAV is explained by the regression line.

2. The Source Code

In proc reg, the model statement causes the least-squares line of the form y = a + bx to be calculated, where y is the response variable (HAV) and x is the explanatory variable (MA).

The output statement creates a SAS data set called foot1 with 38 observations (the Corrected Total DF of 37 is n - 1) and four variables: HAV, MA, HAVhat and r. HAVhat is the predicted value for an observation and r is the residual for an observation, i.e., the difference between the observed value of HAV and the predicted value.

Note that proc gplot in this program reads data=foot, not foot1, and does not use the residuals. It is the symbol statement with interpol=r that tells proc gplot to fit the least-squares line and draw it on the graph. The output statement is therefore not required for the plot and could be omitted from proc reg; it is useful when the predicted values or residuals are wanted in later steps.

proc gplot then produces a scatter plot of HAV versus MA with the least-squares line drawn on the same graph. This is shown in: 2-48.bmp.
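
The following is a minimal sketch of one way to put the foot1 data set to use, assuming the proc reg step above has already been run. It first checks that the saved residual r equals HAV - HAVhat, and then draws the fitted line from the saved predicted values rather than from interpol=r. The names check, diff and foot1s, the red line color and the plot title are illustrative choices, not part of the original program.

```sas
/* Sketch only: assumes foot1 was created by the output statement
   in the proc reg step of the program above.                     */

/* Recompute the residual by hand; diff should match r.           */
data check;
   set foot1;
   diff = HAV - HAVhat;
run;

/* Show the first few observations for a visual comparison.       */
proc print data=check(obs=5);
   var HAV MA HAVhat r diff;
run;

/* Sort by MA so that joining the predicted values left to right
   traces a single straight line.                                 */
proc sort data=foot1 out=foot1s;
   by MA;
run;

/* symbol1: observed points only; symbol2: joined line, no markers. */
symbol1 value=dot  color=blue interpol=none;
symbol2 value=none color=red  interpol=join;

proc gplot data=foot1s;
   title 'HAV vs. MA with fitted line from foot1';
   plot HAV*MA=1 HAVhat*MA=2 / overlay;
run;
quit;
```

Because the predicted values all lie on the line HAVhat = a + b * MA, joining them should reproduce the line that interpol=r draws. Symbol statements are global in SAS, so symbol1 here replaces the interpol=r setting from the earlier symbol statement.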