Simple Linear Regression
Theorem 1a:
Consider a bivariate sample from some population, and suppose we are
interested in the relationship between x and y, where x is the
independent variable and y is the dependent variable.
If Simple Linear Regression (SLR) is appropriate, r is the correlation
between x and y, sx and sy are the standard deviations of x and y, and
xbar and ybar are the means of x and y, then the slope b and intercept a
of the regression line may be determined thus:
b = r(sy/sx)
a = ybar - b(xbar)
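The two formulas of Theorem 1a can be sketched in a few lines of Python. This is only an illustration: the function name regression_line and its parameter names are hypothetical (they do not come from these notes), and the sample values are the income/gpa statistics used in the problem in this section.

```python
def regression_line(r, s_x, s_y, x_bar, y_bar):
    """Return (a, b) for the line y = a + b*x, computed from summary statistics.

    Hypothetical helper for illustration; not part of any library.
    """
    if s_x <= 0:
        raise ValueError("s_x must be positive")
    b = r * (s_y / s_x)    # slope: b = r(sy/sx)
    a = y_bar - b * x_bar  # intercept: a = ybar - b(xbar)
    return a, b


# Income/gpa statistics: income is y, gpa is x.
a, b = regression_line(r=0.8, s_x=0.4, s_y=6000, x_bar=3.0, y_bar=50000)
print(f"income = {a:.0f} + {b:.0f}(gpa)")
```

Note that only five summary numbers are needed; the raw data are not required once r, the two standard deviations and the two means are known.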
Problem:
Consider the income/gpa problem. Given the following statistics,
derive the regression equation:
income: mean=50000; std dev=6000
gpa: mean=3.0; std dev=0.4; r=0.8
Solution:
Since b = r(sy/sx), b = 0.8(6000/0.4) = 12000. Also,
a = ybar - b(xbar) = 50000 - 12000(3.0) = 14000. Hence the regression
equation is:
income = 14000 + 12000(gpa)
Theorem 1b:
Consider the bivariate population mentioned in Theorem 1a above. The
population may be represented by the following SLR model if x and y
are related linearly:
y = α + βx + ε
where, for fixed x, the error term ε is assumed to:
Notes:
Problem:
Consider the income/gpa problem above. Using the sample statistics to
estimate population parameters, determine the proportion of graduates
with gpa=4.0 that got starting salaries more than $50K.
Solution:
Since income = 14000 + 12000(gpa), the mean income of these graduates
would be 14000 + 12000(4.0) = $62K. Also, the std dev of these incomes
would be 6000(sqrt(1 - 0.8^2)) = 3600. Since normality of these incomes
follows if we assume the SLR model, z = (50000 - 62000)/3600 = -3.33, and
so 99.96% of graduates with gpa=4.0 got starting salaries more than $50K.
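The calculation above can be checked numerically. The following is a minimal sketch using Python's standard-library statistics.NormalDist; the function name proportion_above and its parameter names are hypothetical, and the sketch assumes, as the solution does, that incomes at a fixed gpa are normal with std dev sy(sqrt(1 - r^2)).

```python
from math import sqrt
from statistics import NormalDist


def proportion_above(threshold, x, a, b, r, s_y):
    """Proportion of y values above threshold at a fixed x (hypothetical helper)."""
    mean_y = a + b * x             # mean of y at this x, from the regression line
    sd_y = s_y * sqrt(1 - r ** 2)  # std dev of y at fixed x: sy(sqrt(1 - r^2))
    # Upper-tail probability of a normal distribution with this mean and std dev.
    return 1 - NormalDist(mean_y, sd_y).cdf(threshold)


p = proportion_above(50000, x=4.0, a=14000, b=12000, r=0.8, s_y=6000)
print(f"{p:.2%}")
```

With the income/gpa statistics this gives roughly 99.96%, in line with the table-based answer for z = -3.33.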
Optional Readings: SAS: Chapter 9, pp. 289-298. You will need to
familiarize yourself with the output of the SAS reg procedure.
Note: Interpretation - Slope: predicted change in y for a unit increase
in x; Intercept: predicted value of y when x is zero. Remember that some
thought may be needed, given the context of the problem, to determine if
the intercept makes sense.
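As an illustration of this interpretation, the sketch below evaluates the income/gpa regression equation from this section. The function name predict_income is hypothetical, and the remark about gpa=0 being far from the data is an inference from the quoted gpa mean (3.0) and std dev (0.4), not a statement from the notes.

```python
def predict_income(gpa, a=14000, b=12000):
    """Predicted income from the fitted line income = 14000 + 12000(gpa)."""
    return a + b * gpa


# Slope: the prediction changes by b for each unit increase in gpa.
slope_effect = predict_income(3.5) - predict_income(2.5)

# Intercept: the prediction at gpa = 0. This lies far from the observed gpa
# values (mean 3.0, std dev 0.4), so it may not be meaningful in context.
intercept_value = predict_income(0.0)

print(slope_effect, intercept_value)
```

Here the slope has a direct reading (about $12,000 more predicted income per additional gpa point), while the intercept is an extrapolation whose practical meaning is doubtful.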
Equations and statistics referenced above:
a = ybar - b(xbar)
income: mean=50000; std dev=6000
gpa: mean=3.0; std dev=0.4; r=0.8
income = 14000 + 12000(gpa)
y = α + βx + ε
α = μy - βμx