Multiple Regression

Consider a population where, for each item, we have $p$ numeric characteristics of interest. That is, for the $i$th item in the population, we observe the $p$-tuple:

$(y_i, x_{i,1}, \ldots, x_{i,p-1})$

We believe that each $y_i$; $i = 1, \ldots, N$ is linearly related to the corresponding $x_{i,j}$'s; $j = 1, \ldots, p-1$. That is, an expression that describes the relationship would have the following form:

$y = \beta_0 + \beta_1 x_1 + \cdots + \beta_{p-1} x_{p-1}$

Notice that we use $x_j$ to represent the $N$ instances of the $j$th $x$ value (i.e. $x_{i,j}$; $i = 1, \ldots, N$).

So, if you think of each item in our population as a point in $p$-dimensional space, then this expression defines a surface that passes through these points and describes how $y$ changes as the $x_j$'s; $j = 1, \ldots, p-1$ change.

We refer to $y$ as the response or dependent variable and to the $x_j$'s; $j = 1, \ldots, p-1$ as the explanatory or independent variables.

General Linear Model

Given the scenario above, we may express each $y_i$ in terms of the corresponding $x_{i,j}$'s; $j = 1, \ldots, p-1$ by the following general linear model.

$y_i = \beta_0 + \beta_1 x_{i,1} + \beta_2 x_{i,2} + \cdots + \beta_{p-1} x_{i,p-1} + \varepsilon_i$

where, for any setting of the $(p-1)$-tuple $(x_{i,1}, \ldots, x_{i,p-1})$, the corresponding $\varepsilon_i$:

  1. are normally distributed.
  2. have mean zero.
  3. have constant standard deviation $\sigma_\varepsilon$.

The parameters of the model are:

  1. $\beta_0$ - intercept
  2. $\beta_j$ - slope parameters, where $j = 1, \ldots, p-1$
  3. $\sigma_\varepsilon$ - standard deviation about the regression surface for fixed $(x_1, \ldots, x_{p-1})$

Remember that $\beta_j$; $j = 1, \ldots, p-1$ is the expected change in $y$ for a unit increase in $x_j$ when all other $x_k$'s ($k \neq j$) are held constant.

Note: Any $x_{i,j}$ ($i = 1, \ldots, N$; $j = 1, \ldots, p-1$) term may be first order or higher order (i.e. polynomial as well as interaction terms), or even a function of $x_{i,j}$. Also, any $x_{i,j}$ term may be either quantitative or qualitative.
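To make these assumptions concrete, here is a minimal simulation sketch in Python; the coefficient values, sample size, and error standard deviation are arbitrary illustrative choices, not values from these notes.

    import numpy as np

    rng = np.random.default_rng(0)

    n = 100                              # sample size (hypothetical)
    beta = np.array([1.0, 2.0, -0.5])    # beta_0, beta_1, beta_2 (hypothetical)
    sigma_e = 0.3                        # constant error standard deviation

    x1 = rng.uniform(0, 10, n)
    x2 = rng.uniform(0, 5, n)
    eps = rng.normal(0.0, sigma_e, n)    # normal, mean zero, constant sd

    # The general linear model with p - 1 = 2 explanatory variables:
    # y_i = beta_0 + beta_1*x_{i,1} + beta_2*x_{i,2} + e_i
    y = beta[0] + beta[1] * x1 + beta[2] * x2 + eps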

Examples:

So, multiple regression models may be used to express the relationship between a dependent variable and either several independent variables or higher order terms of a single independent variable. Note that the term linear indicates that the model is linear in the parameters. Following are examples of multiple regression models.

  1. $y_i = \beta_0 + \beta_1 x_{i,1} + \beta_2 x_{i,2} + \beta_3 x_{i,3} + \varepsilon_i$
  2. $y_i = \beta_0 + \beta_1 x_{i,1} + \beta_2 x_{i,2}^2 + \beta_3 x_{i,1}^3 + \varepsilon_i$
  3. $y_i = \beta_0 + \beta_1 x_{i,1} + \beta_2 x_{i,1}^2 + \beta_3 x_{i,1} x_{i,2} + \beta_4 x_{i,2}^2 + \varepsilon_i$
  4. $y_i = \beta_0 e^{\beta_1 x_{i,1}} + \varepsilon_i$

Note that model 4, as written, is not linear in the parameters; it must first be linearized (e.g. by a log transform, when the error enters multiplicatively) before the methods below apply.
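Models 2 and 3 remain linear in the parameters because each higher order or interaction term is simply another column of the design matrix. A minimal sketch for fitting model 3 with numpy, using made-up data and hypothetical coefficients:

    import numpy as np

    rng = np.random.default_rng(1)
    n = 50
    x1 = rng.uniform(0, 10, n)
    x2 = rng.uniform(0, 5, n)
    y = (1.0 + 0.8 * x1 - 0.2 * x1**2 + 0.5 * x1 * x2
         + 0.1 * x2**2 + rng.normal(0, 0.5, n))   # hypothetical coefficients

    # Each term of model 3 becomes a column: 1, x1, x1^2, x1*x2, x2^2.
    X = np.column_stack([np.ones(n), x1, x1**2, x1 * x2, x2**2])
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)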

 

Least Squares Estimation:

We may use the least squares estimation method to derive estimates of the $\beta_j$ parameters. Remember that we are seeking the surface which best describes the relationship between the dependent variable and the independent variables.

The least squares approach achieves this by finding the $\beta_j$'s that, taken together, make the $\varepsilon_i$'s; $i = 1, \ldots, n$ as small as possible. So, consider $\varepsilon_i$:

$\varepsilon_i = y_i - (\beta_0 + \beta_1 x_{i,1} + \beta_2 x_{i,2} + \cdots + \beta_{p-1} x_{i,p-1})$

Now, let $E = \sum \varepsilon_i^2$; then:

$E = \sum \{y_i - (\beta_0 + \beta_1 x_{i,1} + \beta_2 x_{i,2} + \cdots + \beta_{p-1} x_{i,p-1})\}^2$

So we may minimize the $\varepsilon_i$'s by minimizing $E$ with respect to the $\beta_j$'s, and we may accomplish this by solving the following $p$ equations:

$\partial E/\partial \beta_0 = 0,\ \partial E/\partial \beta_1 = 0,\ \ldots,\ \partial E/\partial \beta_{p-1} = 0$

That is, the first two and the pth are:

$\partial E/\partial \beta_0 = \sum 2\{y_i - (\beta_0 + \beta_1 x_{i,1} + \beta_2 x_{i,2} + \cdots + \beta_{p-1} x_{i,p-1})\}(-1) = 0$

$\partial E/\partial \beta_1 = \sum 2\{y_i - (\beta_0 + \beta_1 x_{i,1} + \beta_2 x_{i,2} + \cdots + \beta_{p-1} x_{i,p-1})\}(-x_{i,1}) = 0$

$\partial E/\partial \beta_{p-1} = \sum 2\{y_i - (\beta_0 + \beta_1 x_{i,1} + \beta_2 x_{i,2} + \cdots + \beta_{p-1} x_{i,p-1})\}(-x_{i,p-1}) = 0$

Simplifying and rearranging, the first two and the pth become:

$\sum y_i = n\beta_0 + \sum (\beta_1 x_{i,1} + \beta_2 x_{i,2} + \cdots + \beta_{p-1} x_{i,p-1})$

$\sum x_{i,1} y_i = \sum (\beta_0 x_{i,1} + \beta_1 x_{i,1}^2 + \beta_2 x_{i,2} x_{i,1} + \cdots + \beta_{p-1} x_{i,p-1} x_{i,1})$

$\sum x_{i,p-1} y_i = \sum (\beta_0 x_{i,p-1} + \beta_1 x_{i,1} x_{i,p-1} + \beta_2 x_{i,2} x_{i,p-1} + \cdots + \beta_{p-1} x_{i,p-1}^2)$

These are known as the normal equations.

Notice that we have $p$ equations in $p$ unknowns (i.e. the $\beta_j$'s) and so we may use a variety of techniques to solve for the $\beta_j$'s. For the case where $p = 2$ (simple linear regression) we have the following:

$\sum y_i = n\beta_0 + \beta_1 \sum x_{i,1}$

$\sum x_{i,1} y_i = \beta_0 \sum x_{i,1} + \beta_1 \sum x_{i,1}^2$

Note: Since we are minimizing, we need to make sure that the second partial derivative is greater than zero in each case.

We solve for $\beta_0$ thus:

$n\beta_0 = \sum y_i - \beta_1 \sum x_{i,1}$

$\beta_0 = \bar{y} - \beta_1 \bar{x}$

To solve for $\beta_1$, we first substitute for $\beta_0$:

$\sum x_{i,1} y_i = \sum (\bar{y} - \beta_1 \bar{x}) x_{i,1} + \beta_1 \sum x_{i,1}^2$

$\sum x_{i,1} y_i = \bar{y} \sum x_{i,1} + \beta_1 \left(\sum x_{i,1}^2 - \bar{x} \sum x_{i,1}\right)$

$\sum x_{i,1} y_i = \bar{y}(n\bar{x}) + \beta_1 \left(\sum x_{i,1}^2 - \bar{x}(n\bar{x})\right)$

$\beta_1 = \left[\sum x_{i,1} y_i - n\bar{x}\bar{y}\right] \big/ \left[\sum x_{i,1}^2 - n\bar{x}^2\right]$
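As a quick numerical check of these closed-form formulas, here is a sketch with made-up data (the true intercept and slope are arbitrary choices); it should agree with numpy's built-in fit:

    import numpy as np

    rng = np.random.default_rng(2)
    n = 30
    x = rng.uniform(0, 10, n)
    y = 3.0 + 1.5 * x + rng.normal(0, 1.0, n)   # hypothetical true line

    xbar, ybar = x.mean(), y.mean()

    # beta_1 = [sum(x*y) - n*xbar*ybar] / [sum(x^2) - n*xbar^2]
    b1 = (np.sum(x * y) - n * xbar * ybar) / (np.sum(x**2) - n * xbar**2)
    # beta_0 = ybar - beta_1 * xbar
    b0 = ybar - b1 * xbar

    # np.polyfit returns [slope, intercept] for degree 1; values agree.
    print(b0, b1, np.polyfit(x, y, 1))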

 

Matrix Algebra Representation:

Let $Y$ be the vector of the $n$ $y_i$ values. Let $X$ be an $n \times p$ matrix, where the first column of $X$ is a column of $n$ 1's and the remaining columns of $X$ are vectors of observations for each of the $p-1$ $x$'s. Let $\beta$ be the vector of coefficients (i.e. $\beta_0, \beta_1, \ldots, \beta_{p-1}$). Let $\varepsilon$ be the vector of the $n$ error terms (i.e. $\varepsilon_i$). The multiple regression model may be expressed in matrix terms thus:

$Y = X\beta + \varepsilon$

Let the regression surface in $p$-dimensional space be:

$Y = X\beta$

Since $X$ is an $n \times p$ matrix it is not square, and so $X^{-1}$ does not exist. Therefore, we cannot solve for $\beta$ by simply pre-multiplying by $X^{-1}$. However, we may proceed by pre-multiplying both sides by $X^T$:

$X^T Y = X^T X \beta$

Note that the entries of $X^T X \beta$ are the right-hand sides of the normal equations and the entries of $X^T Y$ are the left-hand sides. To see this, consider a simple multiple regression model example, with two variables $x_1, x_2$. Say we have $n$ observations, and denote the columns of $X$ by the vectors $C_1, C_2, C_3$. These vectors would be $C_1^T = [1\ 1\ \cdots\ 1]$, $C_2^T = [x_{1,1}, x_{2,1}, \ldots, x_{n,1}]$, $C_3^T = [x_{1,2}, x_{2,2}, \ldots, x_{n,2}]$. Also, $Y^T = [y_1, y_2, \ldots, y_n]$. If you carry out the matrix multiplication above you will obtain the normal equations.
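A small numerical illustration of this correspondence, with made-up data: the entries of $X^T Y$ are exactly the sums on the left-hand sides of the normal equations.

    import numpy as np

    rng = np.random.default_rng(3)
    n = 10
    x1, x2 = rng.uniform(0, 1, n), rng.uniform(0, 1, n)
    y = rng.uniform(0, 1, n)

    # Columns C1 = 1's, C2 = x1, C3 = x2, as in the example above.
    X = np.column_stack([np.ones(n), x1, x2])

    # X^T Y reproduces sum(y_i), sum(x_{i,1} y_i), sum(x_{i,2} y_i).
    print(X.T @ y)
    print(np.sum(y), np.sum(x1 * y), np.sum(x2 * y))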

Now, $X^T X$ is square ($p \times p$) and so, assuming $X^T X$ is of full rank, we may solve for $\beta$ thus:

$(X^T X)^{-1} X^T Y = (X^T X)^{-1} X^T X \beta$

$(X^T X)^{-1} X^T Y = I\beta$

That is, $\beta = (X^T X)^{-1} X^T Y$.

You should note the following, writing $\hat{Y}$ for the vector of fitted values:

$\hat{Y} = X\beta$

$\hat{Y} = X(X^T X)^{-1} X^T Y$

We refer to $X(X^T X)^{-1} X^T$ as the hat matrix and denote it $H$, since it puts the "hat" on $Y$. Hence:

$\hat{Y} = HY$
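Putting the matrix results together in a short sketch (illustrative data; in practice one would prefer np.linalg.solve or np.linalg.lstsq to an explicit inverse, for numerical stability):

    import numpy as np

    rng = np.random.default_rng(4)
    n = 20
    x1, x2 = rng.uniform(0, 10, n), rng.uniform(0, 5, n)
    y = 2.0 + 1.0 * x1 - 0.5 * x2 + rng.normal(0, 0.5, n)  # hypothetical model

    X = np.column_stack([np.ones(n), x1, x2])

    # beta = (X^T X)^{-1} X^T Y, exactly as derived above.
    beta = np.linalg.inv(X.T @ X) @ X.T @ y

    # Hat matrix H = X (X^T X)^{-1} X^T, so that Yhat = H Y.
    H = X @ np.linalg.inv(X.T @ X) @ X.T
    yhat = H @ y

    print(np.allclose(yhat, X @ beta))   # True: H Y equals the fitted values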

 

Multiple Regression - Model Building

Multiple regression models need to be considered in most practical situations but, in particular, should be considered in the following cases:

  1. If a simple linear regression model explains less than 80% of the variability in the data and there are other candidate variables.
  2. If strong prior evidence of dependence on multiple independent variables exists.

However, we should always seek a balance between parsimony and accuracy. Compact yet expressive models with interpretable parameters are usually preferred. In any event, the challenge is in determining the appropriate model for the problem. This is the model building problem.

Model building involves several steps:

  1. Construct a list of candidate independent variables.
    1. Qualitative and quantitative variables may be included.
    2. If prior knowledge of suitable candidates does not exist then start with all available (for this class assume that all provided are suitable).
  2. Generate all possible first order simple linear regression models. If there are m candidate variables, build m models.
    1. Select the best simple linear regression model.
      Note: For now we will consider best to be highest $R^2$.
    2. If more than 80% of the variation is explained, consider stopping and continuing at step 4.
  3. Generate all possible first order k-variable multiple regression models, starting with k=2, by adding each remaining variable, one at a time, to the variable(s) already in the model. (A code sketch of steps 2 and 3 follows this list.)
    1. Select the best model.
    2. Increase k by 1 and repeat step 3 until there are no more variables or until a less than 5% increase in $R^2$ is observed. If there are no more variables to be added, continue at step 4.
  4. Check model assumptions by examining residual plots.
  5. If model assumptions hold, model building is done. Complete the exercise by interpreting the parameters.
  6. If model assumptions do not hold consider remedial measures:
    1. Transform independent variables. If there is bias, consider higher order terms.
    2. Transform the dependent variable. Only transform the dependent variable if there is non-constant standard deviation. We will consider log transforms only.
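Finally, here is a sketch of steps 2 and 3 as a hypothetical forward-selection helper in Python; the 80% and 5% thresholds come from the steps above, while the function names and structure are my own, not from these notes.

    import numpy as np

    def r_squared(X, y):
        # Fit by least squares and return R^2 = 1 - SSE/SSTO.
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        sse = np.sum((y - X @ beta) ** 2)
        ssto = np.sum((y - y.mean()) ** 2)
        return 1.0 - sse / ssto

    def forward_select(x_cols, y, stop_r2=0.80, min_gain=0.05):
        # Greedy forward selection on first order terms, best = highest R^2.
        n = len(y)
        remaining = list(range(len(x_cols)))
        selected, best_r2 = [], 0.0
        while remaining:
            # Step 3: try adding each remaining variable to the current model.
            scores = []
            for j in remaining:
                cols = [np.ones(n)] + [x_cols[k] for k in selected + [j]]
                scores.append((r_squared(np.column_stack(cols), y), j))
            r2, j = max(scores)
            if selected and r2 - best_r2 < min_gain:
                break                  # less than a 5% increase in R^2: stop
            selected.append(j)
            remaining.remove(j)
            best_r2 = r2
            if best_r2 > stop_r2:
                break                  # more than 80% explained: stop early
        return selected, best_r2

    # Usage (hypothetical): selected, r2 = forward_select([x1, x2, x3], y)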