*** SAS source code file ***
---------------------------------------------------------------------------------------
/* Exercise 2.48, page 148. */
options nodate;
data foot;
infile 'C:\1_Courses\223\Lectures\Week6\sasexpl\ta02-002.txt' delimiter='09'x;
* HAV is the response and MA the explanatory variable;
input HAV MA;
run;
proc reg data=foot;
title 'Regression Data for HAV on MA';
model HAV = MA;
output out=foot1 predicted=HAVhat residual=r;
run;
symbol value=dot color=blue interpol=r;
proc gplot data=foot;
title 'Scatter plot HAV vs. MA (With Regression Line)';
plot hav*ma/ haxis = 5 to 40 by 5 vaxis = 5 to 50 by 5;
run;
----------------------------------------------------------------------------------------
*** SAS output file produced by running above program ***
----------------------------------------------------------------------------------------
Regression Data for HAV on MA
The REG Procedure
Model: MODEL1
Dependent Variable: HAV
Analysis of Variance
                              Sum of         Mean
Source              DF       Squares       Square    F Value    Pr > F
Model                1     188.71350    188.71350       3.62    0.0652
Error               36    1878.54965     52.18193
Corrected Total     37    2067.26316

Root MSE             7.22371    R-Square    0.0913
Dependent Mean      25.42105    Adj R-Sq    0.0660
Coeff Var           28.41624

Parameter Estimates
                    Parameter    Standard
Variable      DF     Estimate       Error    t Value    Pr > |t|
Intercept      1     19.72327     3.21717       6.13      <.0001
MA             1      0.33884     0.17818       1.90      0.0652
------------------------------------------------------------------------------------------
*** Discussion ***
1. The Exercise
For the purposes of solving Exercise 2.48, much of the above output can be ignored at
this point in the course.
(a) The table labeled 'Parameter Estimates' gives the least-squares regression line
as HAV = 19.72327 + 0.33884 * MA, i.e., a = 19.72327 and b = 0.33884.
(b) So for MA = 25 degrees, we would predict HAV = 19.723 + 0.339 * 25 = 28.2 degrees.
(c) The scatter plot shows a lot of spread, so predictions based on the regression
line will not be very reliable. Note that R-Square is 0.0913, so only about 9% of the
variation in HAV is explained by the regression line.
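The prediction in part (b) can also be obtained from SAS itself. The following is a
sketch, not part of the original program: it assumes the data set foot has already been
created, and the names newobs, foot_plus and pred are made up for illustration. An
observation with a missing response is not used in fitting the line, but proc reg still
computes a predicted value for it.

```sas
/* Hypothetical extra observation: MA = 25 with HAV missing. */
data newobs;
   MA = 25;
   HAV = .;
run;

/* Stack the extra observation under the original data. */
data foot_plus;
   set foot newobs;
run;

/* The row with missing HAV is left out of the fit, but the
   output data set still holds a predicted value for it. */
proc reg data=foot_plus;
   model HAV = MA;
   output out=pred predicted=HAVhat;
run;
quit;

/* Show only the added row. */
proc print data=pred;
   where HAV = .;
   var MA HAVhat;
run;
```

The printed value of HAVhat should agree with the hand calculation of about 28.2 degrees.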
2. The Source Code
In proc reg, the model statement causes the least-squares line of the form y = a + bx to
be calculated, where y is the response variable and x is the explanatory variable.
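As a check on proc reg, the slope and intercept of the least-squares line can be
computed by hand from summary statistics, using b = r * sy / sx and a = ybar - b * xbar,
where r is the correlation between x and y, sx and sy are their standard deviations,
and xbar and ybar are their means. A minimal sketch of how to obtain these quantities,
assuming the data set foot exists; this step is not part of the original program:

```sas
/* Prints N, mean and standard deviation of each variable,
   followed by the Pearson correlation between MA and HAV. */
proc corr data=foot;
   var MA HAV;
run;
```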
The output statement in proc reg is not needed for the plot: proc gplot reads the
original data set foot, and interpol=r in the symbol statement fits and draws the
least-squares line itself. The output statement saves the fitted values and residuals
for later use, and it could be omitted from proc reg if they are not wanted. It creates
a SAS data set called foot1 with 38 observations (the Corrected Total DF of 37 is
n - 1) and four variables, HAV, MA, HAVhat and r. HAVhat is the predicted value for an
observation and r is the residual for an observation, i.e., the difference between the
observed value of HAV and the predicted value.
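To look at what the output statement saved, the data set foot1 can be printed and the
residuals plotted against MA. This is a sketch that assumes the proc reg step above has
been run; it is not part of the original program. The symbol statement is reissued with
interpol=none so that no regression line is drawn through the residuals.

```sas
/* List the first few observations of the output data set. */
proc print data=foot1 (obs=5);
   var HAV MA HAVhat r;
run;

/* Residual plot with a horizontal reference line at zero. */
symbol value=dot color=blue interpol=none;
proc gplot data=foot1;
   title 'Residuals vs. MA';
   plot r*MA / vref=0;
run;
quit;
```

If this is run before the scatter plot step, the original symbol statement with
interpol=r must be issued again to restore the regression line.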
Finally, proc gplot produces a scatter plot of HAV versus MA, and the symbol statement
with interpol=r causes the least-squares line to be drawn on the graph as well. The
result is shown in 2-48.bmp.