Fisher's LDA

Linear Discriminant Analysis

We consider the problem of distinguishing between two populations, given a sample of items from each population, where each item has p features (i.e. variables x₁,...,x_p). That is, can we develop a rule, or discriminant function, from the observed features of the sampled items that will allow us to assign a new item to the correct population by examining its features only?

The general approach is to construct an optimal linear combination of the observed variables x₁,...,x_p, which we refer to as the discriminant function. We consider the two-population problem only and focus on the approach proposed by Fisher in his 1936 paper "The Use of Multiple Measurements in Taxonomic Problems", Annals of Eugenics. For discriminant problems involving more than two populations, see the text Multivariate Statistical Methods by Morrison.

Fisher's Linear Discriminant Function:

Let X be an nxp matrix of observations on p variables x₁,...,x_p. Let g be the px1 vector of weights that optimally defines the linear discriminant function:

l = g₁x₁ + g₂x₂ + ... + g_px_p

This discriminant function therefore maps each of the n observations from p-dimensional space to a point in 1-dimensional space. We may express this mapping in matrix terms thus:

l = Xg

Notice that l is an nx1 vector. We refer to the values of this vector as the discriminant scores of the original observations. Now, consider the sum of the squares of these discriminant scores.

l^T l = (Xg)^T Xg = g^TX^T Xg
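
The identity above can be checked numerically. The following is a minimal Python sketch using made-up data and an arbitrary weight vector (both are assumptions for illustration, not values from these notes); it computes the scores l = Xg and compares l^T l with g^T(X^TX)g.

```python
# Made-up data: n = 4 observations on p = 2 variables, and arbitrary weights.
X = [[1.0, 2.0], [2.0, 3.0], [3.0, 3.0], [4.0, 5.0]]
g = [0.5, -1.0]

n, p = len(X), len(g)

# Discriminant scores l = Xg: one score per observation.
scores = [sum(X[i][j] * g[j] for j in range(p)) for i in range(n)]

# Left-hand side: l^T l, the sum of squared scores.
sum_sq_scores = sum(s * s for s in scores)

# Right-hand side: g^T (X^T X) g, built from the p x p matrix X^T X.
XtX = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(p)]
       for a in range(p)]
quad_form = sum(g[a] * XtX[a][b] * g[b] for a in range(p) for b in range(p))

print(scores)
print(sum_sq_scores, quad_form)
```

The two printed totals agree, which is all the identity claims; nothing here is specific to discriminant analysis yet.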

Now, X^TX is a pxp sum of squares matrix, provided the columns of X are expressed as deviations from their overall means. Since our observations are from two groups, this matrix is usually referred to as the total sum of squares matrix. Let us denote it by T. We may partition it into the sum of squares due to "within" group differences and the sum of squares due to "between" group differences thus:

T = B + W

Hence:

l^T l = g^TX^T Xg = g^T(B+W) g = g^TB g + g^TW g

The terms g^TB g and g^TW g are the "between" and "within" group components of the sum of squares of the discriminant scores.
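
The partition can be illustrated with a small Python sketch. The two groups below are made-up numbers, and the helper names (mean_vector, scatter, quad_form) are hypothetical; T is computed about the grand mean, W about each group's own mean, and B from the group means, after which the sketch checks that the quadratic forms add up for an arbitrary g.

```python
# Two small groups of observations on p = 2 variables (made-up numbers).
group1 = [[1.0, 2.0], [2.0, 3.0], [3.0, 3.0], [4.0, 5.0]]
group2 = [[4.0, 1.0], [5.0, 2.0], [6.0, 2.0], [7.0, 4.0]]
p = 2

def mean_vector(rows):
    """Column means of a list of rows."""
    return [sum(r[j] for r in rows) / len(rows) for j in range(p)]

def scatter(rows, centre):
    """Sum over rows of (x - centre)(x - centre)^T, a p x p matrix."""
    s = [[0.0] * p for _ in range(p)]
    for r in rows:
        d = [r[j] - centre[j] for j in range(p)]
        for a in range(p):
            for b in range(p):
                s[a][b] += d[a] * d[b]
    return s

def quad_form(g, m):
    """g^T M g for a vector g and a square matrix M."""
    return sum(g[a] * m[a][b] * g[b] for a in range(p) for b in range(p))

all_rows = group1 + group2
grand = mean_vector(all_rows)
m1, m2 = mean_vector(group1), mean_vector(group2)

# Total, within-group and between-group sums of squares and products.
T = scatter(all_rows, grand)
W1, W2 = scatter(group1, m1), scatter(group2, m2)
W = [[W1[a][b] + W2[a][b] for b in range(p)] for a in range(p)]
B = [[len(group1) * (m1[a] - grand[a]) * (m1[b] - grand[b])
      + len(group2) * (m2[a] - grand[a]) * (m2[b] - grand[b])
      for b in range(p)] for a in range(p)]

g = [1.0, -1.0]  # an arbitrary (not optimal) weight vector
total = quad_form(g, T)
between = quad_form(g, B)
within = quad_form(g, W)
print(total, between + within)
```

For this g the total sum of squares of the scores splits exactly into its between and within parts; a different g changes the split but not the identity.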

The objective of discriminant analysis is to find a weight vector g that maximizes the ratio of these sum of squares components. That is, we want to maximize λ where λ is given by:

λ = (g^TB g)/ (g^TW g)

You may think of the discriminant function as defining a hyperplane that divides the points in p-dimensional space in such a way that, when the observations are projected onto the direction perpendicular to this hyperplane, you can optimally discriminate between the groups.

We use calculus to find the weights by differentiating λ with respect to g and setting the derivative to zero.

dλ/ dg = {2(Bg) (g^TW g) - 2(g^TB g) (W g)}/ (g^TW g) ² = 0

We may simplify by setting the numerator to zero, dividing through by 2(g^TW g) and substituting λ for the ratio (g^TB g)/ (g^TW g), to obtain:

(B - λW) g = 0

Provided W is nonsingular, we may find λ by solving the characteristic polynomial equation:

|W^-1B - λI| = 0

Hence the problem of finding the optimal weight vector g reduces to the problem of finding the eigenvalues and eigenvectors of the matrix W^-1B. The eigenvector associated with the largest eigenvalue gives the weights of the discriminant function, since that eigenvalue is the maximum value of the ratio.
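
For p = 2 the eigenproblem can be solved by hand, which the Python sketch below does. The matrices W and B are illustrative values chosen for this example (an assumption, not data from these notes), and lam stands for the ratio being maximized. The sketch forms W^-1B, solves the quadratic characteristic equation, and reads off an eigenvector for the larger root.

```python
import math

# Illustrative 2 x 2 matrices (assumed values): W is the within-group and
# B the between-group sum of squares matrix for two groups.
W = [[10.0, 9.0], [9.0, 9.5]]
B = [[18.0, -6.0], [-6.0, 2.0]]

# Invert W with the closed-form 2 x 2 inverse, then form M = W^-1 B.
det_w = W[0][0] * W[1][1] - W[0][1] * W[1][0]
W_inv = [[W[1][1] / det_w, -W[0][1] / det_w],
         [-W[1][0] / det_w, W[0][0] / det_w]]
M = [[sum(W_inv[a][k] * B[k][b] for k in range(2)) for b in range(2)]
     for a in range(2)]

# For a 2 x 2 matrix, |M - lam I| = lam^2 - trace(M) lam + det(M) = 0.
trace = M[0][0] + M[1][1]
det_m = M[0][0] * M[1][1] - M[0][1] * M[1][0]
disc = max(trace * trace - 4.0 * det_m, 0.0)  # guard against rounding
lam_max = (trace + math.sqrt(disc)) / 2.0
lam_min = (trace - math.sqrt(disc)) / 2.0

# An eigenvector for lam_max solves (M - lam_max I) g = 0. The first row of
# that system gives g = (M[0][1], lam_max - M[0][0]) up to scale; this
# shortcut is valid here because M[0][1] is nonzero.
g = [M[0][1], lam_max - M[0][0]]

# Residual of (B - lam W) g, which should be numerically zero.
residual = [sum((B[a][b] - lam_max * W[a][b]) * g[b] for b in range(2))
            for a in range(2)]
print(lam_max, lam_min, g, residual)
```

With this B, which has rank 1, the smaller root is zero up to rounding, so only one eigenvector carries any discriminating information. For larger p a general eigenvalue routine would replace the quadratic formula.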

Note that in the two-population case there is only one eigenvector with a nonzero eigenvalue, because B then has rank 1. In fact, for the two-population case we can show that this eigenvector is:

g = S^-1 (m₁ - m₂)

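Here m₁ and m₂ are the px1 sample mean vectors of the p variables x₁,...,x_p in the first and second group respectively, and S is the pooled within-group covariance matrix, S = W/(n - 2). Since an eigenvector is defined only up to a scale factor, W^-1 (m₁ - m₂) serves equally well.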
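
A sketch of why the eigenvector takes this form, written in LaTeX. It introduces notation that is not in the text above: n_1 and n_2 for the group sizes and d for the difference of the mean vectors. It also assumes that S is the pooled within-group covariance matrix, hence a scalar multiple of W, and that the eigenvalue λ is nonzero.

```latex
\begin{align*}
B &= \frac{n_1 n_2}{n_1 + n_2}\, d\, d^{T}, \qquad d = m_1 - m_2, \\
Bg &= \frac{n_1 n_2}{n_1 + n_2}\,(d^{T} g)\, d, \\
(B - \lambda W)\,g = 0 \;&\Longrightarrow\;
  Wg = \frac{n_1 n_2\,(d^{T} g)}{\lambda\,(n_1 + n_2)}\, d \;\propto\; d, \\
g &\propto W^{-1} d \;\propto\; S^{-1}(m_1 - m_2).
\end{align*}
```

The key step is that Bg always points along d, whatever g is, so the eigen-equation forces Wg to point along d as well.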

This weight vector defines what is known as Fisher's linear discriminant function.
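
The closed form can be computed directly from two samples, as in the following Python sketch. The data are made up, the helper names are hypothetical, and S is taken to be the pooled within-group covariance matrix W/(n₁ + n₂ - 2). The classification step at the end uses the midpoint between the two mean scores as a cutoff; that rule is a common convention assumed here for illustration, not something stated in these notes.

```python
# Two small groups of observations on p = 2 variables (made-up numbers).
group1 = [[1.0, 2.0], [2.0, 3.0], [3.0, 3.0], [4.0, 5.0]]
group2 = [[4.0, 1.0], [5.0, 2.0], [6.0, 2.0], [7.0, 4.0]]
p = 2

def mean_vector(rows):
    """Column means of a list of rows."""
    return [sum(r[j] for r in rows) / len(rows) for j in range(p)]

def scatter(rows, centre):
    """Sum over rows of (x - centre)(x - centre)^T, a p x p matrix."""
    return [[sum((r[a] - centre[a]) * (r[b] - centre[b]) for r in rows)
             for b in range(p)] for a in range(p)]

n1, n2 = len(group1), len(group2)
m1, m2 = mean_vector(group1), mean_vector(group2)
W1, W2 = scatter(group1, m1), scatter(group2, m2)

# Pooled within-group covariance matrix S = W / (n1 + n2 - 2) (assumed form).
S = [[(W1[a][b] + W2[a][b]) / (n1 + n2 - 2) for b in range(p)]
     for a in range(p)]

# Closed-form inverse of the 2 x 2 matrix S.
det_s = S[0][0] * S[1][1] - S[0][1] * S[1][0]
S_inv = [[S[1][1] / det_s, -S[0][1] / det_s],
         [-S[1][0] / det_s, S[0][0] / det_s]]

# Fisher's weights g = S^-1 (m1 - m2).
d = [m1[j] - m2[j] for j in range(p)]
g = [sum(S_inv[a][b] * d[b] for b in range(p)) for a in range(p)]

def score(x):
    """Discriminant score g^T x of a single observation."""
    return sum(g[j] * x[j] for j in range(p))

# Assumed rule: assign to population 1 when the score exceeds the midpoint
# of the two mean scores, otherwise to population 2.
cutoff = (score(m1) + score(m2)) / 2.0

def classify(x):
    return 1 if score(x) > cutoff else 2

print(g, cutoff)
print([classify(x) for x in group1], [classify(x) for x in group2])
```

On this toy data every sampled item falls on the correct side of the cutoff; real samples will generally overlap, so some misclassification is to be expected.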

Note: Two-population discriminant analysis may be formulated as a multiple regression problem. The dependent variable codes group membership and is therefore binary valued. The independent variables are the p variables x₁,...,x_p. The coefficients of the regression equation are not equal to the weights mentioned above. We can however show that the two vectors are proportional, so that one can be derived from the other.
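
The relationship between the regression coefficients and the discriminant weights can be checked numerically. The Python sketch below uses made-up data and a 1/0 coding of group membership (the coding is an assumption; a different coding changes the constant of proportionality). It obtains the least-squares slopes from the centred normal equations and compares them with the direction W^-1 (m₁ - m₂); solve2 is a hypothetical helper.

```python
# Two small groups of observations on p = 2 variables (made-up numbers).
group1 = [[1.0, 2.0], [2.0, 3.0], [3.0, 3.0], [4.0, 5.0]]
group2 = [[4.0, 1.0], [5.0, 2.0], [6.0, 2.0], [7.0, 4.0]]
p = 2

def solve2(A, v):
    """Solve the 2 x 2 linear system A x = v by Cramer's rule."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [(v[0] * A[1][1] - A[0][1] * v[1]) / det,
            (A[0][0] * v[1] - A[1][0] * v[0]) / det]

def mean_vector(rows):
    """Column means of a list of rows."""
    return [sum(r[j] for r in rows) / len(rows) for j in range(p)]

def scatter(rows, centre):
    """Sum over rows of (x - centre)(x - centre)^T, a p x p matrix."""
    return [[sum((r[a] - centre[a]) * (r[b] - centre[b]) for r in rows)
             for b in range(p)] for a in range(p)]

rows = group1 + group2
y = [1.0] * len(group1) + [0.0] * len(group2)  # binary group indicator
xbar = mean_vector(rows)
ybar = sum(y) / len(y)

# Regression of y on x1, x2 with an intercept: the slopes solve T b = c,
# where T and c are cross-products of deviations from the overall means.
T = scatter(rows, xbar)
c = [sum((r[a] - xbar[a]) * (yi - ybar) for r, yi in zip(rows, y))
     for a in range(p)]
b = solve2(T, c)

# Discriminant direction W^-1 (m1 - m2), using group-wise deviations.
m1, m2 = mean_vector(group1), mean_vector(group2)
W1, W2 = scatter(group1, m1), scatter(group2, m2)
W = [[W1[a][k] + W2[a][k] for k in range(p)] for a in range(p)]
g = solve2(W, [m1[j] - m2[j] for j in range(p)])

# If the vectors are proportional, the element-wise ratios coincide.
ratio = [b[j] / g[j] for j in range(p)]
print(b, g, ratio)
```

Both ratios come out equal, so on this data the regression slopes are a rescaled copy of the discriminant weights.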

SAS:

We have seen that the discriminant analysis problem may be solved by finding the eigenvalues and eigenvectors of the matrix W^-1B. Hence, we may easily code an IML routine to obtain the weight vector.

Alternatively, we may use the SAS procedure discrim to find the discriminant function. The general form of a proc discrim step is:

  proc discrim options;
     class variable;
     by variables;
     id variable;
     var variables;

Only the proc discrim and class statements are required.