*** SAS source code file ***

---------------------------------------------------------------------------------------
/* Exercise 2.48, page 148. */
options nodate;
data foot;
* Read the tab-delimited data file (no datalines statement is needed with infile);
infile 'C:\1_Courses\223\Lectures\Week6\sasexpl\ta02-002.txt' delimiter='09'x;
* HAV is the response and MA the explanatory variable;
input HAV MA;
run;

proc reg data=foot;
title 'Regression Data for HAV on MA';
model HAV = MA;
output out=foot1 predicted=HAVhat residual=r;
run;

symbol value=dot color=blue interpol=r;
proc gplot data=foot;
title 'Scatter plot HAV vs. MA (With Regression Line)';
plot hav*ma/ haxis = 5 to 40 by 5 vaxis = 5 to 50 by 5;
run;
----------------------------------------------------------------------------------------

*** SAS output file produced by running above program ***

----------------------------------------------------------------------------------------
                                 Regression Data for HAV on MA                          

                                       The REG Procedure
                                         Model: MODEL1
                                    Dependent Variable: HAV

                                      Analysis of Variance

                                             Sum of           Mean
         Source                   DF        Squares         Square    F Value    Pr > F

         Model                     1      188.71350      188.71350       3.62    0.0652
         Error                    36     1878.54965       52.18193
         Corrected Total          37     2067.26316


                      Root MSE              7.22371    R-Square     0.0913
                      Dependent Mean       25.42105    Adj R-Sq     0.0660
                      Coeff Var            28.41624


                                      Parameter Estimates

                                   Parameter       Standard
              Variable     DF       Estimate          Error    t Value    Pr > |t|

              Intercept     1       19.72327        3.21717       6.13      <.0001
              MA            1        0.33884        0.17818       1.90      0.0652
------------------------------------------------------------------------------------------

*** Discussion ***

1. The Exercise

For the purposes of Exercise 2.48, much of the above output can be ignored at this
point in the course.
(a) The table labeled 'Parameter Estimates' gives the least-squares regression line
as HAV = 19.72327 + 0.33884 * MA, i.e., a = 19.72327 and b = 0.33884.
(b) So for MA = 25 degrees, we would predict HAV = 19.723 + 0.339 * 25 = 28.2 degrees.
(c) The scatter plot shows a lot of spread, so predictions based on the regression
line will not be very reliable. Note that R-Square is 0.0913, so only about 9% of the
variation in HAV is explained by the regression line.
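
The arithmetic in (b) and the R-Square value quoted in (c) can be checked with a short
data step. This is only a sketch: the variable names a, b and rsq are made up for
illustration, and the numbers are copied by hand from the output above.

---------------------------------------------------------------------------------------
data _null_;
* Intercept and slope copied from the Parameter Estimates table;
a = 19.72327;
b = 0.33884;
MA = 25;
* Predicted HAV for MA = 25 degrees;
HAVhat = a + b*MA;
* R-Square is the Model sum of squares over the Corrected Total sum of squares;
rsq = 188.71350 / 2067.26316;
* Write both values to the SAS log;
put HAVhat= rsq=;
run;
---------------------------------------------------------------------------------------

The log should show a prediction of about 28.2 degrees and a ratio of about 0.0913,
in agreement with the values used in (b) and (c).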

2. The Source Code

In proc reg, the model statement causes the least-squares line of the form y = a + bx to
be calculated, where y is the response variable and x is the explanatory variable.
The output statement in proc reg is not needed for the plot: proc gplot reads the
original data set foot, and interpol=r fits the least-squares line itself. The output
statement is included to save the predicted values and residuals for later use, and it
could be omitted from proc reg if they are not wanted. It creates a SAS data set called
foot1 with 38 observations (the Corrected Total DF of 37 is n - 1) and four variables,
HAV, MA, HAVhat and r. HAVhat is the predicted value for an observation and r is the
residual for an observation, i.e., the difference between the observed value of HAV and
the predicted value.
Proc gplot then produces a scatter plot of HAV versus MA, and the symbol statement with
interpol=r causes the least-squares line to be drawn on the same graph. The result is
shown in 2-48.bmp.
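
Since foot1 holds the predicted values and residuals, one possible follow-up is to list
a few of its observations and to plot the residuals against MA. The sketch below is not
part of the original program. It assumes foot1 was created by the proc reg step above;
the titles and the choice of five observations are arbitrary.

---------------------------------------------------------------------------------------
* List the first five observations: HAV, MA, HAVhat and r;
proc print data=foot1(obs=5);
title 'First Five Observations of foot1';
run;

* Turn off interpol=r so that no line is fitted through the residuals;
symbol value=dot color=blue interpol=none;
proc gplot data=foot1;
title 'Residuals vs. MA';
* vref=0 draws a horizontal reference line at residual = 0;
plot r*ma / vref=0;
run;
---------------------------------------------------------------------------------------

If the points scatter around the reference line with no clear pattern, a straight line
is a reasonable description of the relationship, even though the fit here is weak.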