Multiple Regression

Consider a population where, for each item, we have $p$ numeric characteristics of interest. That is, for the $i$th item in the population, we observe the $p$-tuple:

$(y_i, x_{i,1}, \ldots, x_{i,p-1})$

We believe that each $y_i$; $i = 1, \ldots, N$ is linearly related to the corresponding $x_{i,j}$'s; $j = 1, \ldots, p-1$. That is, an expression that describes the relationship would have the following form:

$y = \beta_0 + \beta_1 x_1 + \cdots + \beta_{p-1} x_{p-1}$

Notice that we use $x_j$ to represent the $N$ instances of the $j$th $x$ value (i.e. $x_{i,j}$; $i = 1, \ldots, N$).

So, if you think of each item in our population as a point in $p$-dimensional space, then this expression defines a surface that passes through these points and describes how $y$ changes as the $x_j$'s; $j = 1, \ldots, p-1$ change.

We refer to $y$ as the response or dependent variable and to the $x_j$'s; $j = 1, \ldots, p-1$ as the explanatory or independent variables.

General Linear Model

Given the scenario above, we may express each $y_i$ in terms of the corresponding $x_{i,j}$'s; $j = 1, \ldots, p-1$ by the following general linear model.

$y_i = \beta_0 + \beta_1 x_{i,1} + \beta_2 x_{i,2} + \cdots + \beta_{p-1} x_{i,p-1} + \varepsilon_i$

where, for any setting of the $(p-1)$-tuple $(x_{i,1}, \ldots, x_{i,p-1})$, the corresponding $\varepsilon_i$:

  1. are normally distributed.
  2. have mean zero.
  3. have constant standard deviation $\sigma_\varepsilon$.

The parameters of the model are:

  1. $\beta_0$ - intercept
  2. $\beta_j$ - slope parameters, where $j = 1, \ldots, p-1$
  3. $\sigma_\varepsilon$ - standard deviation about the regression surface for fixed $(x_1, \ldots, x_{p-1})$

Remember that $\beta_j$; $j = 1, \ldots, p-1$ is the expected change in $y$ for a unit increase in $x_j$ when all other $x_k$'s ($k \neq j$) are held constant.

Note: Any $x_{i,j}$ ($i = 1, \ldots, N$; $j = 1, \ldots, p-1$) term may be first order or higher order (i.e. polynomial as well as interaction terms), or even a function of $x_{i,j}$. Also, any $x_{i,j}$ term may be either quantitative or qualitative.
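To make these assumptions concrete, here is a minimal simulation sketch in Python; the coefficient values, sample size, and error standard deviation are arbitrary illustrative choices, not values from these notes.

    import numpy as np

    rng = np.random.default_rng(0)

    n = 100                              # sample size (hypothetical)
    beta = np.array([1.0, 2.0, -0.5])    # beta_0, beta_1, beta_2 (hypothetical)
    sigma_e = 0.3                        # constant error standard deviation

    x1 = rng.uniform(0, 10, n)
    x2 = rng.uniform(0, 5, n)
    eps = rng.normal(0.0, sigma_e, n)    # normal, mean zero, constant sd

    # The general linear model with p - 1 = 2 explanatory variables:
    # y_i = beta_0 + beta_1*x_{i,1} + beta_2*x_{i,2} + e_i
    y = beta[0] + beta[1] * x1 + beta[2] * x2 + eps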

Examples:

So, multiple regression models may be used to express the relationship between a dependent variable and either several independent variables or higher order terms of a single independent variable. Note that the term linear indicates that the model is linear in the parameters. Following are examples of multiple regression models.

  1. $y_i = \beta_0 + \beta_1 x_{i,1} + \beta_2 x_{i,2} + \beta_3 x_{i,3} + \varepsilon_i$
  2. $y_i = \beta_0 + \beta_1 x_{i,1} + \beta_2 x_{i,2}^2 + \beta_3 x_{i,1}^3 + \varepsilon_i$
  3. $y_i = \beta_0 + \beta_1 x_{i,1} + \beta_2 x_{i,1}^2 + \beta_3 x_{i,1} x_{i,2} + \beta_4 x_{i,2}^2 + \varepsilon_i$
  4. $y_i = \beta_0 e^{\beta_1 x_{i,1}} + \varepsilon_i$

Note that model 4, as written, is not linear in the parameters; it must first be linearized (e.g. by a log transform, when the error enters multiplicatively) before the methods below apply.
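Models 2 and 3 remain linear in the parameters because each higher order or interaction term is simply another column of the design matrix. A minimal sketch for fitting model 3 with numpy, using made-up data and hypothetical coefficients:

    import numpy as np

    rng = np.random.default_rng(1)
    n = 50
    x1 = rng.uniform(0, 10, n)
    x2 = rng.uniform(0, 5, n)
    y = (1.0 + 0.8 * x1 - 0.2 * x1**2 + 0.5 * x1 * x2
         + 0.1 * x2**2 + rng.normal(0, 0.5, n))   # hypothetical coefficients

    # Each term of model 3 becomes a column: 1, x1, x1^2, x1*x2, x2^2.
    X = np.column_stack([np.ones(n), x1, x1**2, x1 * x2, x2**2])
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)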

 

Least Squares Estimation:

We may use the least squares estimation method to derive estimates of the $\beta_j$ parameters. Remember that we are seeking the surface which best describes the relationship between the dependent variable and the independent variables.

The least squares approach achieves this by finding the $\beta_j$'s that, taken together, make the $\varepsilon_i$'s; $i = 1, \ldots, n$ as small as possible. So, consider $\varepsilon_i$:

$\varepsilon_i = y_i - (\beta_0 + \beta_1 x_{i,1} + \beta_2 x_{i,2} + \cdots + \beta_{p-1} x_{i,p-1})$

Now, let $E = \sum \varepsilon_i^2$; then:

$E = \sum \{y_i - (\beta_0 + \beta_1 x_{i,1} + \beta_2 x_{i,2} + \cdots + \beta_{p-1} x_{i,p-1})\}^2$

So we may minimize the $\varepsilon_i$'s by minimizing $E$ with respect to the $\beta_j$'s, and we may accomplish this by solving the following $p$ equations:

$\partial E/\partial \beta_0 = 0,\ \partial E/\partial \beta_1 = 0,\ \ldots,\ \partial E/\partial \beta_{p-1} = 0$

That is, the first two and the pth are:

$\partial E/\partial \beta_0 = \sum 2\{y_i - (\beta_0 + \beta_1 x_{i,1} + \beta_2 x_{i,2} + \cdots + \beta_{p-1} x_{i,p-1})\}(-1) = 0$

$\partial E/\partial \beta_1 = \sum 2\{y_i - (\beta_0 + \beta_1 x_{i,1} + \beta_2 x_{i,2} + \cdots + \beta_{p-1} x_{i,p-1})\}(-x_{i,1}) = 0$

$\partial E/\partial \beta_{p-1} = \sum 2\{y_i - (\beta_0 + \beta_1 x_{i,1} + \beta_2 x_{i,2} + \cdots + \beta_{p-1} x_{i,p-1})\}(-x_{i,p-1}) = 0$

Simplifying and rearranging, the first two and the pth become:

$\sum y_i = n\beta_0 + \sum (\beta_1 x_{i,1} + \beta_2 x_{i,2} + \cdots + \beta_{p-1} x_{i,p-1})$

$\sum x_{i,1} y_i = \sum (\beta_0 x_{i,1} + \beta_1 x_{i,1}^2 + \beta_2 x_{i,2} x_{i,1} + \cdots + \beta_{p-1} x_{i,p-1} x_{i,1})$

$\sum x_{i,p-1} y_i = \sum (\beta_0 x_{i,p-1} + \beta_1 x_{i,1} x_{i,p-1} + \beta_2 x_{i,2} x_{i,p-1} + \cdots + \beta_{p-1} x_{i,p-1}^2)$

These are known as the normal equations.

Notice that we have $p$ equations in $p$ unknowns (i.e. the $\beta_j$'s) and so we may use a variety of techniques to solve for the $\beta_j$'s. For the case where $p = 2$ (simple linear regression) we have the following:

$\sum y_i = n\beta_0 + \beta_1 \sum x_{i,1}$

$\sum x_{i,1} y_i = \beta_0 \sum x_{i,1} + \beta_1 \sum x_{i,1}^2$

Note: Since we are minimizing, we need to make sure that the second partial derivative is greater than zero in each case.

We solve for $\beta_0$ thus:

$n\beta_0 = \sum y_i - \beta_1 \sum x_{i,1}$

$\beta_0 = \bar{y} - \beta_1 \bar{x}$

To solve for $\beta_1$, we first substitute for $\beta_0$:

$\sum x_{i,1} y_i = \sum (\bar{y} - \beta_1 \bar{x}) x_{i,1} + \beta_1 \sum x_{i,1}^2$

$\sum x_{i,1} y_i = \bar{y} \sum x_{i,1} + \beta_1 \left(\sum x_{i,1}^2 - \bar{x} \sum x_{i,1}\right)$

$\sum x_{i,1} y_i = \bar{y}(n\bar{x}) + \beta_1 \left(\sum x_{i,1}^2 - \bar{x}(n\bar{x})\right)$

$\beta_1 = \left[\sum x_{i,1} y_i - n\bar{x}\bar{y}\right] \big/ \left[\sum x_{i,1}^2 - n\bar{x}^2\right]$
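As a quick numerical check of these closed-form formulas, here is a sketch with made-up data (the true intercept and slope are arbitrary choices); it should agree with numpy's built-in fit:

    import numpy as np

    rng = np.random.default_rng(2)
    n = 30
    x = rng.uniform(0, 10, n)
    y = 3.0 + 1.5 * x + rng.normal(0, 1.0, n)   # hypothetical true line

    xbar, ybar = x.mean(), y.mean()

    # beta_1 = [sum(x*y) - n*xbar*ybar] / [sum(x^2) - n*xbar^2]
    b1 = (np.sum(x * y) - n * xbar * ybar) / (np.sum(x**2) - n * xbar**2)
    # beta_0 = ybar - beta_1 * xbar
    b0 = ybar - b1 * xbar

    # np.polyfit returns [slope, intercept] for degree 1; values agree.
    print(b0, b1, np.polyfit(x, y, 1))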

 

Matrix Algebra Representation:

Let $Y$ be the vector of the $n$ $y_i$ values. Let $X$ be an $n \times p$ matrix, where the first column of $X$ is a column of $n$ 1's and the remaining columns of $X$ are vectors of observations for each of the $p-1$ $x$'s. Let $\beta$ be the vector of coefficients (i.e. $\beta_0, \beta_1, \ldots, \beta_{p-1}$). Let $\varepsilon$ be the vector of the $n$ error terms (i.e. $\varepsilon_i$). The multiple regression model may be expressed in matrix terms thus:

$Y = X\beta + \varepsilon$

Let the regression surface in $p$-dimensional space be:

$Y = X\beta$

Since $X$ is an $n \times p$ matrix it is not square, and so $X^{-1}$ does not exist. Therefore, we cannot solve for $\beta$ by simply pre-multiplying by $X^{-1}$. However, we may proceed by pre-multiplying both sides by $X^T$:

$X^T Y = X^T X \beta$

Note that the entries of $X^T X \beta$ are the right-hand sides of the normal equations and the entries of $X^T Y$ are the left-hand sides. To see this, consider a simple multiple regression model example, with two variables $x_1, x_2$. Say we have $n$ observations, and denote the columns of $X$ by the vectors $C_1, C_2, C_3$. These vectors would be $C_1^T = [1\ 1\ \cdots\ 1]$, $C_2^T = [x_{1,1}, x_{2,1}, \ldots, x_{n,1}]$, $C_3^T = [x_{1,2}, x_{2,2}, \ldots, x_{n,2}]$. Also, $Y^T = [y_1, y_2, \ldots, y_n]$. If you carry out the matrix multiplication above you will obtain the normal equations.
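A small numerical illustration of this correspondence, with made-up data: the entries of $X^T Y$ are exactly the sums on the left-hand sides of the normal equations.

    import numpy as np

    rng = np.random.default_rng(3)
    n = 10
    x1, x2 = rng.uniform(0, 1, n), rng.uniform(0, 1, n)
    y = rng.uniform(0, 1, n)

    # Columns C1 = 1's, C2 = x1, C3 = x2, as in the example above.
    X = np.column_stack([np.ones(n), x1, x2])

    # X^T Y reproduces sum(y_i), sum(x_{i,1} y_i), sum(x_{i,2} y_i).
    print(X.T @ y)
    print(np.sum(y), np.sum(x1 * y), np.sum(x2 * y))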

Now, $X^T X$ is square ($p \times p$) and so, assuming $X^T X$ is of full rank, we may solve for $\beta$ thus:

$(X^T X)^{-1} X^T Y = (X^T X)^{-1} X^T X \beta$

$(X^T X)^{-1} X^T Y = I\beta$

That is, $\beta = (X^T X)^{-1} X^T Y$.

You should note the following, writing $\hat{Y}$ for the vector of fitted values:

$\hat{Y} = X\beta$

$\hat{Y} = X(X^T X)^{-1} X^T Y$

We refer to $X(X^T X)^{-1} X^T$ as the hat matrix and denote it $H$, since it puts the "hat" on $Y$. Hence:

$\hat{Y} = HY$
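Putting the matrix results together in a short sketch (illustrative data; in practice one would prefer np.linalg.solve or np.linalg.lstsq to an explicit inverse, for numerical stability):

    import numpy as np

    rng = np.random.default_rng(4)
    n = 20
    x1, x2 = rng.uniform(0, 10, n), rng.uniform(0, 5, n)
    y = 2.0 + 1.0 * x1 - 0.5 * x2 + rng.normal(0, 0.5, n)  # hypothetical model

    X = np.column_stack([np.ones(n), x1, x2])

    # beta = (X^T X)^{-1} X^T Y, exactly as derived above.
    beta = np.linalg.inv(X.T @ X) @ X.T @ y

    # Hat matrix H = X (X^T X)^{-1} X^T, so that Yhat = H Y.
    H = X @ np.linalg.inv(X.T @ X) @ X.T
    yhat = H @ y

    print(np.allclose(yhat, X @ beta))   # True: H Y equals the fitted values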

 

Multiple Regression - Model Building

Multiple regression models need to be considered in most practical situations but, in particular, should be considered in the following cases:

  1. If a simple linear regression model explains less than 80% of the variability in the data and there are other candidate variables.
  2. If strong prior evidence of dependence on multiple independent variables exists.

However, we should always seek a balance between parsimony and accuracy. Compact yet expressive models with interpretable parameters are usually preferred. In any event, the challenge is in determining the appropriate model for the problem. This is the model building problem.

Model building involves several steps:

  1. Construct a list of candidate independent variables.
    1. Qualitative and quantitative variables may be included.
    2. If prior knowledge of suitable candidates does not exist then start with all available (for this class assume that all provided are suitable).
  2. Generate all possible first order simple linear regression models. If there are m candidate variables, build m models.
    1. Select the best simple linear regression model.
      Note: For now we will consider best to be highest $R^2$.
    2. If more than 80% of the variation is explained, consider stopping and continuing at step 4.
  3. Generate all possible first order k-variable multiple regression models, starting with k=2, by adding each remaining variable, one at a time, to the variable(s) already in the model. (A code sketch of steps 2 and 3 follows this list.)
    1. Select the best model.
    2. Increase k by 1 and repeat step 3 until there are no more variables or until a less than 5% increase in $R^2$ is observed. If there are no more variables to be added, continue at step 4.
  4. Check model assumptions by examining residual plots.
  5. If model assumptions hold, model building is done. Complete the exercise by interpreting the parameters.
  6. If model assumptions do not hold consider remedial measures:
    1. Transform independent variables. If there is bias, consider higher order terms.
    2. Transform the dependent variable. Only transform the dependent variable if there is non-constant standard deviation. We will consider log transforms only.
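Finally, here is a sketch of steps 2 and 3 as a hypothetical forward-selection helper in Python; the 80% and 5% thresholds come from the steps above, while the function names and structure are my own, not from these notes.

    import numpy as np

    def r_squared(X, y):
        # Fit by least squares and return R^2 = 1 - SSE/SSTO.
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        sse = np.sum((y - X @ beta) ** 2)
        ssto = np.sum((y - y.mean()) ** 2)
        return 1.0 - sse / ssto

    def forward_select(x_cols, y, stop_r2=0.80, min_gain=0.05):
        # Greedy forward selection on first order terms, best = highest R^2.
        n = len(y)
        remaining = list(range(len(x_cols)))
        selected, best_r2 = [], 0.0
        while remaining:
            # Step 3: try adding each remaining variable to the current model.
            scores = []
            for j in remaining:
                cols = [np.ones(n)] + [x_cols[k] for k in selected + [j]]
                scores.append((r_squared(np.column_stack(cols), y), j))
            r2, j = max(scores)
            if selected and r2 - best_r2 < min_gain:
                break                  # less than a 5% increase in R^2: stop
            selected.append(j)
            remaining.remove(j)
            best_r2 = r2
            if best_r2 > stop_r2:
                break                  # more than 80% explained: stop early
        return selected, best_r2

    # Usage (hypothetical): selected, r2 = forward_select([x1, x2, x3], y)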