Sample Analysis
Programming Assignment #2

Model:

We wish to model the relationship between starting income, measured in dollars, and grade point average (GPA) for a population of recent non-computer science graduates (class of 1996). If y denotes starting income and x denotes GPA, the simple linear regression model for this problem is:

y_ij = a + b x_i + e_ij

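where the errors e_ij are assumed to be independent with mean zero, to have constant standard deviation s for fixed x, and to be normally distributed.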

Model parameters are: a - population intercept; b - population slope; s - standard deviation about the regression line for fixed x
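
As an illustration of what the three parameters measure, the following is a minimal least-squares sketch in Python. The estimates in this report come from SAS, not from this code. The function name and the data values below are made up for illustration and are not the assignment data.

```python
def fit_simple_linear_regression(x, y):
    """Return (a, b, s): intercept, slope, and residual standard deviation."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Corrected sums of squares and cross-products.
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx              # estimated slope
    a = y_bar - b * x_bar      # estimated intercept
    # Residual sum of squares; n - 2 degrees of freedom because
    # two parameters (a and b) are estimated from the data.
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    s = (sse / (n - 2)) ** 0.5
    return a, b, s


# Hypothetical values for illustration only, NOT the assignment data.
gpa = [2.0, 2.5, 3.0, 3.5, 4.0]
income = [16000.0, 21000.0, 24000.0, 29000.0, 32000.0]

a_hat, b_hat, s_hat = fit_simple_linear_regression(gpa, income)
print(f"a = {a_hat:.2f}, b = {b_hat:.2f}, s = {s_hat:.2f}")
```

For these made-up values the fitted line is y = 400 + 8000x, with s of about 632.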

Residual Analysis:

To assess these assumptions we conducted a residual analysis. The box-plot of residuals, as well as the residual plot, indicates that there are no distinct outliers and no bias or systematic dependence. In addition the residual plot suggests constant standard deviation for fixed values of x. These plots support the first two model assumptions. Hypotheses for testing the normality assumption are:

H0: Residuals are drawn from a normal distribution
Ha: Residuals are not drawn from a normal distribution

The p-value is 0.6960, so we have insufficient evidence to reject the null hypothesis. In addition, the normal plot is quite linear, providing additional support for our conclusion that the normality assumption is tenable. These findings support the assumptions of the regression model and indicate that the simple linear regression model is appropriate for this problem.
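
The linearity of a normal plot can also be summarized numerically. The sketch below computes the correlation between the ordered residuals and approximate normal scores, using Python's standard library. This is not the test that produced the p-value above (that came from SAS, and the test is not named here). The function name, the choice of Blom plotting positions, and the residual values are assumptions made for illustration.

```python
from statistics import NormalDist


def normal_plot_correlation(residuals):
    """Correlation between ordered residuals and approximate normal scores.

    Values close to 1 correspond to a nearly linear normal plot.
    """
    n = len(residuals)
    ordered = sorted(residuals)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    # Blom plotting positions (i - 0.375) / (n + 0.25): an assumed choice,
    # other conventions give slightly different scores.
    scores = [std_normal.inv_cdf((i - 0.375) / (n + 0.25))
              for i in range(1, n + 1)]
    r_bar = sum(ordered) / n
    z_bar = sum(scores) / n
    num = sum((r - r_bar) * (z - z_bar) for r, z in zip(ordered, scores))
    den = (sum((r - r_bar) ** 2 for r in ordered)
           * sum((z - z_bar) ** 2 for z in scores)) ** 0.5
    return num / den


# Hypothetical residuals for illustration only, NOT the assignment data.
example_residuals = [-2100.0, -900.0, -300.0, 150.0, 400.0, 1000.0, 1750.0]
pp_corr = normal_plot_correlation(example_residuals)
print(f"normal plot correlation = {pp_corr:.3f}")
```

A value near 1 agrees with a visually linear normal plot. It is a descriptive summary only and does not replace the formal test.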

Discussion:

From the SAS Analysis of Variance report, parameter estimates are:

a = 271.54
b = 7994.98
s = 2239.25

Hence, the regression equation is y = 271.54 + 7994.98x, which indicates that a unit increase in GPA is associated with an estimated increase in mean starting income of about $7995. Also, the equation gives a starting income of $271.54 for a GPA of zero. Since the minimum observed GPA is 1.6, a GPA of zero lies well outside the range of the data, so this interpretation of the intercept is probably not meaningful.
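
The fitted equation can be used for prediction within the observed range of GPA. The sketch below uses the reported estimates and refuses to extrapolate below the minimum observed GPA of 1.6. The function and constant names are hypothetical. The maximum observed GPA is not given in this report, so no upper bound is checked.

```python
INTERCEPT = 271.54       # a, from the SAS report
SLOPE = 7994.98          # b, from the SAS report
MIN_OBSERVED_GPA = 1.6   # smallest GPA in the data


def predicted_income(gpa):
    """Estimated mean starting income in dollars for a given GPA."""
    if gpa < MIN_OBSERVED_GPA:
        # Below the observed range the line is an extrapolation.
        raise ValueError("GPA is below the observed range; "
                         "prediction would be an extrapolation")
    return INTERCEPT + SLOPE * gpa


print(f"{predicted_income(3.0):.2f}")  # 24256.48
# The difference between two predictions one GPA unit apart is the slope.
print(f"{predicted_income(3.0) - predicted_income(2.0):.2f}")  # 7994.98
```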

The r-square value of 0.8677 indicates that about 87% of the variation in y about its mean is explained by the regression model. Since the model is a simple linear regression model and the estimated slope is positive, the correlation coefficient is the positive square root of the r-square value (about 0.93), indicating strong positive correlation between starting income and GPA.
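
The link between r-square and the correlation coefficient in simple linear regression can be checked numerically. The sketch below computes r-square as 1 - SSE/SST and the correlation coefficient directly, then compares them. The data values and variable names are made up for illustration and are not the assignment data.

```python
# Hypothetical values for illustration only, NOT the assignment data.
gpa = [2.0, 2.5, 3.0, 3.5, 4.0]
income = [16000.0, 21000.0, 24000.0, 29000.0, 32000.0]

n = len(gpa)
x_bar = sum(gpa) / n
y_bar = sum(income) / n
sxx = sum((x - x_bar) ** 2 for x in gpa)
syy = sum((y - y_bar) ** 2 for y in income)   # total sum of squares (SST)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(gpa, income))

# Least-squares fit.
slope = sxy / sxx
intercept = y_bar - slope * x_bar

# r-square: share of the variation in y about its mean explained by the line.
sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(gpa, income))
r_squared = 1.0 - sse / syy

# Correlation coefficient computed directly from the sums of squares.
r = sxy / (sxx * syy) ** 0.5

# In simple linear regression r has the sign of the slope and r**2 == r-square.
print(f"r-square = {r_squared:.4f}, r = {r:.4f}, r**2 = {r ** 2:.4f}")
```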

Summary:

Simple linear regression is appropriate for this problem. Furthermore, the model explains about 87% of the variation in starting income, indicating that, for these non-computer science graduates, GPA is a good predictor of starting income.

The regression equation, which represents this functional relationship, is y = 271.54 + 7994.98x. Hence, an increase in starting income of about $7995 can be expected for a unit increase in GPA. However, since the minimum observed GPA is 1.6, the intercept is probably not meaningful.