Multiple Regression
Consider some population where, for each item in the population, we have p numeric characteristics of interest. That is, for the ith item in the population, we observe the p-tuple:
$(y_i, x_{i,1}, x_{i,2}, \ldots, x_{i,p-1})$
We believe that each $y_i;\ i=1,\ldots,N$ is linearly related to the corresponding $x_{i,j}$'s; $j=1,\ldots,p-1$. That is, an expression that describes the relationship would have the following form:
$y = b_0 + b_1 x_1 + b_2 x_2 + \cdots + b_{p-1} x_{p-1}$
Notice that we use $x_j$ to represent the N instances of the jth x value (i.e. $x_{i,j};\ i=1,\ldots,N$).
So, if you think of each item in our population as a point in p-dimensional space, then this expression defines a surface which bisects these points and describes how y changes as the $x_j$'s; $j=1,\ldots,p-1$ change.
We refer to y as the response or dependent variable and to the $x_j$'s; $j=1,\ldots,p-1$ as the explanatory or independent variables.
General Linear Model
Given the scenario above, we may express each $y_i$ in terms of the corresponding $x_{i,j}$'s; $j=1,\ldots,p-1$ by the following general linear model:
$y_i = b_0 + b_1 x_{i,1} + b_2 x_{i,2} + \cdots + b_{p-1} x_{i,p-1} + e_i$
where, for any setting of the (p-1)-tuple $(x_{i,1},\ldots,x_{i,p-1})$, the corresponding error term $e_i$ is assumed to be a random variable with mean 0 and constant variance $\sigma^2$, independent of the other errors.
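As a concrete illustration, here is a minimal Python sketch that simulates data from such a model with p - 1 = 2 explanatory variables; the sample size, coefficient values, and error standard deviation are hypothetical values chosen for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100                                 # illustrative number of observations
b0, b1, b2 = 2.0, 0.5, -1.3             # illustrative parameter values
sigma = 0.8                             # illustrative error standard deviation

x1 = rng.uniform(0, 10, n)              # explanatory variable x_1
x2 = rng.uniform(0, 10, n)              # explanatory variable x_2
e = rng.normal(0, sigma, n)             # errors: mean 0, constant variance sigma^2

y = b0 + b1 * x1 + b2 * x2 + e          # the general linear model
```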
The parameters of the model are $b_0, b_1, \ldots, b_{p-1}$ and $\sigma^2$.
Note: any $x_j$ may itself be a function of other explanatory variables (e.g. a power of one of them or a product of two of them).
So, multiple regression models may be used to express the relationship between a dependent variable and either several independent variables or higher-order terms of a single independent variable. Note that the term linear indicates that the model is linear in the parameters. Following are examples of multiple regression models:
$y_i = b_0 + b_1 x_{i,1} + b_2 x_{i,2} + b_3 x_{i,3} + e_i$
$y_i = b_0 + b_1 x_i + b_2 x_i^2 + e_i$
Least Squares Estimation: We may use the least squares estimation method to derive estimates of the $b_j$'s from a sample of n observations. Remember that $b_j;\ j=1,\ldots,p-1$ is the expected change in y for a unit increase in $x_j$ when all other $x_k$'s ($k \neq j$) are held constant. The least squares approach achieves this by finding the set of $b_j$ values that minimizes the squared residuals
$e_i = y_i - (b_0 + b_1 x_{i,1} + b_2 x_{i,2} + \cdots + b_{p-1} x_{i,p-1}).$
Now, let $E = \sum_{i=1}^{n} e_i^2$; then:
$E = \sum_{i=1}^{n} \{y_i - (b_0 + b_1 x_{i,1} + b_2 x_{i,2} + \cdots + b_{p-1} x_{i,p-1})\}^2$
So we may minimize the $e_i$'s by minimizing E with respect to the $b_j$'s, and we may accomplish this by solving the following p equations:
$\partial E/\partial b_0 = 0,\ \partial E/\partial b_1 = 0,\ \ldots,\ \partial E/\partial b_{p-1} = 0$
That is, the first two and the pth are:
$\partial E/\partial b_0 = \sum 2\{y_i - (b_0 + b_1 x_{i,1} + b_2 x_{i,2} + \cdots + b_{p-1} x_{i,p-1})\}(-1) = 0$
$\partial E/\partial b_1 = \sum 2\{y_i - (b_0 + b_1 x_{i,1} + b_2 x_{i,2} + \cdots + b_{p-1} x_{i,p-1})\}(-x_{i,1}) = 0$
$\partial E/\partial b_{p-1} = \sum 2\{y_i - (b_0 + b_1 x_{i,1} + b_2 x_{i,2} + \cdots + b_{p-1} x_{i,p-1})\}(-x_{i,p-1}) = 0$
Dividing each equation by -2 and rearranging, the first two and the pth become:
$\sum y_i = n b_0 + \sum (b_1 x_{i,1} + b_2 x_{i,2} + \cdots + b_{p-1} x_{i,p-1})$
$\sum x_{i,1} y_i = \sum (b_0 x_{i,1} + b_1 x_{i,1}^2 + b_2 x_{i,2} x_{i,1} + \cdots + b_{p-1} x_{i,p-1} x_{i,1})$
$\sum x_{i,p-1} y_i = \sum (b_0 x_{i,p-1} + b_1 x_{i,1} x_{i,p-1} + b_2 x_{i,2} x_{i,p-1} + \cdots + b_{p-1} x_{i,p-1}^2)$
These are known as the normal equations.
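As a sketch of how these equations can be set up and solved numerically, the following Python snippet builds exactly the sums appearing above for simulated data (all data and variable names are hypothetical, chosen for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
x1, x2 = rng.uniform(0, 10, n), rng.uniform(0, 10, n)
y = 2.0 + 0.5 * x1 - 1.3 * x2 + rng.normal(0, 0.8, n)   # simulated data

# Left-hand sides: sum of y, sum of x1*y, sum of x2*y (as in the equations above).
lhs = np.array([y.sum(), (x1 * y).sum(), (x2 * y).sum()])

# Coefficients of b0, b1, b2: n, sums of x's, their squares and cross-products.
X = np.column_stack([np.ones(n), x1, x2])
coef = X.T @ X

b = np.linalg.solve(coef, lhs)   # solve p equations in p unknowns
print(b)                         # close to the simulating values (2.0, 0.5, -1.3)
```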
Notice that we have p equations in p unknowns (i.e. the $b_j$'s) and so we may use a variety of techniques to solve for the $b_j$'s. For the case where p=2 we have the following:
$\sum y_i = n b_0 + b_1 \sum x_{i,1}$
$\sum x_{i,1} y_i = b_0 \sum x_{i,1} + b_1 \sum x_{i,1}^2$
Note: Since we are minimizing, we need to make sure that the second partial derivative is greater than zero in each case; here, for example, $\partial^2 E/\partial b_0^2 = 2n > 0$.
We solve for $b_0$ thus:
$n b_0 = \sum y_i - b_1 \sum x_{i,1}$
$b_0 = \bar{y} - b_1 \bar{x}$
To solve for $b_1$, we first substitute for $b_0$:
$\sum x_{i,1} y_i = \sum (\bar{y} - b_1 \bar{x}) x_{i,1} + b_1 \sum x_{i,1}^2$
$\sum x_{i,1} y_i = \bar{y} \sum x_{i,1} + b_1 (\sum x_{i,1}^2 - \bar{x} \sum x_{i,1})$
$\sum x_{i,1} y_i = \bar{y}(n\bar{x}) + b_1 (\sum x_{i,1}^2 - \bar{x}(n\bar{x}))$
$b_1 = [\sum x_{i,1} y_i - n\bar{x}\bar{y}] \,/\, [\sum x_{i,1}^2 - n\bar{x}^2]$
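These p=2 formulas can be checked directly in code; a minimal sketch with made-up data (np.polyfit is used only as an independent cross-check):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
n = len(x)

b1 = (np.sum(x * y) - n * x.mean() * y.mean()) / (np.sum(x**2) - n * x.mean()**2)
b0 = y.mean() - b1 * x.mean()            # b0 = ybar - b1*xbar, as derived above
print(b0, b1)
print(np.polyfit(x, y, 1))               # returns [slope, intercept]; should agree
```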
Matrix Algebra Representation:
Let Y be the vector of the n $y_i$ values. Let X be an $n \times p$ matrix, where the first column of X is a column of n 1's and the remaining columns of X are the vectors of each of the p-1 x's. Let b be the vector of coefficients (i.e. $b_0, b_1, \ldots, b_{p-1}$). Let e be the vector of the n residual values (i.e. $e_i$). The multiple regression model may be expressed in matrix terms thus:
Y = Xb + e
Let the surface that bisects p-dimensional space be:
Y = Xb
Since X is an $n \times p$ matrix it is not square, and so $X^{-1}$ does not exist. Therefore, we cannot solve for b by simply pre-multiplying by $X^{-1}$. However, we may proceed by pre-multiplying both sides by $X^T$:
$X^T Y = X^T X b$
Note that the entries of $X^TXb$ are the right-hand sides of the normal equations and the entries of $X^TY$ are the left-hand sides of the normal equations. To see this, consider a simple multiple regression model example with two variables $x_1, x_2$. Say we have n observations, and denote the columns of X by the vectors $C_1, C_2, C_3$. These vectors would be $C_1^T = [1\ 1\ \cdots\ 1]$, $C_2^T = [x_{1,1}, x_{2,1}, \ldots, x_{n,1}]$, $C_3^T = [x_{1,2}, x_{2,2}, \ldots, x_{n,2}]$. Also, $Y^T = [y_1, y_2, \ldots, y_n]$. If you carry out the matrix multiplication above you will see that you obtain the normal equations.
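A quick numerical check of this claim, with hypothetical data: the entries of $X^TY$ should equal the sums on the left-hand sides of the normal equations.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10
x1, x2, y = rng.uniform(0, 5, (3, n))          # arbitrary data for the check
X = np.column_stack([np.ones(n), x1, x2])      # columns C1, C2, C3 as above

# X^T Y reproduces [sum(y), sum(x1*y), sum(x2*y)]:
print(np.allclose(X.T @ y, [y.sum(), (x1 * y).sum(), (x2 * y).sum()]))   # True
```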
Now, $X^TX$ is square and so, assuming $X^TX$ is of full rank, we may solve for b thus:
$(X^TX)^{-1}X^TY = (X^TX)^{-1}X^TXb$
$(X^TX)^{-1}X^TY = Ib$
That is, $b = (X^TX)^{-1}X^TY$.
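In code, this solution can be computed directly; a sketch with simulated data (in practice a dedicated least squares routine such as np.linalg.lstsq is preferred to forming the inverse explicitly, for numerical stability):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50
X = np.column_stack([np.ones(n), rng.uniform(0, 10, (n, 2))])  # n x p design matrix, p = 3
y = X @ np.array([2.0, 0.5, -1.3]) + rng.normal(0, 0.8, n)     # simulated response

b = np.linalg.inv(X.T @ X) @ (X.T @ y)       # b = (X^T X)^{-1} X^T Y
b_ls = np.linalg.lstsq(X, y, rcond=None)[0]  # numerically preferred equivalent
print(np.allclose(b, b_ls))                  # True
```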
You should note the following:
$\hat{Y} = Xb$
$\hat{Y} = X(X^TX)^{-1}X^TY$
We refer to $X(X^TX)^{-1}X^T$ as the hat matrix and denote it H (it "puts the hat on" Y). Hence:
$\hat{Y} = HY$
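A brief numerical check of the hat matrix, including two standard facts not derived in these notes (H is symmetric and idempotent):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 20
X = np.column_stack([np.ones(n), rng.uniform(0, 10, n)])
y = X @ np.array([1.0, 2.0]) + rng.normal(0, 0.5, n)

H = X @ np.linalg.inv(X.T @ X) @ X.T    # the hat matrix
b = np.linalg.inv(X.T @ X) @ (X.T @ y)  # coefficients, as derived above
print(np.allclose(H @ y, X @ b))        # True: HY equals the fitted values Xb
print(np.allclose(H, H.T))              # True: H is symmetric
print(np.allclose(H @ H, H))            # True: H is idempotent
```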
Multiple Regression - Model Building
Multiple regression models need to be considered in most practical situations, but in particular should be considered in the following cases:
Model building involves several steps: