Principal Components Analysis

Definition

Consider the k variables x_1,..., x_k. Let r_{mn} denote the correlation between x_m and x_n, and suppose r_{mn} ≠ 0 for some m ≠ n, where m = 1,...,k and n = 1,...,k.

Principal Components Analysis (PCA) is a method of transforming the k original variables (i.e. x_1,..., x_k) into k new variables p_1,..., p_k, referred to as principal components, where the new variables are uncorrelated (i.e. r_{mn} = 0 for all m ≠ n). These principal components are linear combinations of the x_i's, i = 1,...,k, that is:

p_i = w_{i,1} x_1 + w_{i,2} x_2 + ... + w_{i,k} x_k

To determine the weights (i.e. the w_{i,j}) we first need to review some additional matrix algebra, including determinants, and then we can discuss eigenvalues, eigenvectors, and the spectral decomposition. A brief computational preview of the whole procedure follows the notes below.

Note:

  1. The x_i's are not divided into response and explanatory variables and so this is not the multiple regression problem.
  2. Principal components are ranked by variance. That is, we use the notation p_1 to denote the principal component with the largest variance and so p_k would have the smallest variance.
  3. PCA was first proposed by Karl Pearson, in 1901, but is usually attributed to Harold Hotelling for work done in 1933 which appeared as the paper "Analysis of a complex of statistical variables into principal components," Journal of Educational Psychology, 24:417-441.
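
As a preview of the whole procedure, the sketch below uses Python with NumPy (an assumed choice of tooling, with made-up data) to take the weights w_{i,j} from the eigenvectors of the covariance matrix and form the principal components as the corresponding linear combinations; the machinery behind this is developed in the sections that follow.

import numpy as np

# Toy data, assumed for illustration: n = 5 observations of k = 3 variables.
X = np.array([[2.0, 4.1, 1.0],
              [3.5, 5.0, 0.5],
              [1.0, 2.9, 2.2],
              [4.2, 6.3, 0.1],
              [2.8, 4.4, 1.4]])

Xs = X - X.mean(axis=0)               # center each variable
S = (Xs.T @ Xs) / (X.shape[0] - 1)    # sample covariance matrix

# The weights w_{i,j} come from the eigenvectors of S; each principal
# component p_i is the corresponding linear combination of the x_i's.
eigvals, eigvecs = np.linalg.eigh(S)      # eigh handles symmetric matrices
order = np.argsort(eigvals)[::-1]         # p_1 gets the largest variance
W = eigvecs[:, order]                     # columns are the weight vectors
P = Xs @ W                                # principal component scores

# The new variables are uncorrelated: the covariance matrix of P is diagonal.
print(np.round(np.cov(P, rowvar=False), 10))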

Matrix Algebra

Suppose we have n observations of the k variables x_1,..., x_k, and let the n×k matrix X denote these observations. We may then define the following:

  1. Mean Vector:

    The mean vector x̄ is the vector of the k means of the k variables in X:

    x̄^T = (1/n)(1^T X)

    where 1 is the n×1 vector of ones.
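
    A minimal check of this formula in NumPy (the small data matrix X is an assumed toy example):

    import numpy as np

    X = np.array([[1.0, 2.0],
                  [3.0, 4.0],
                  [5.0, 9.0]])          # n = 3 observations of k = 2 variables
    n = X.shape[0]
    ones = np.ones(n)

    xbar = (ones @ X) / n               # (1/n)(1^T X)
    print(xbar)                         # same as X.mean(axis=0)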

  2. Covariance Matrix:

    The covariance matrix is a symmetric k×k matrix of the covariances of the k variables in X. That is, if s_{mn} denotes the covariance between x_m and x_n, and S is the covariance matrix, then s_{mn} will be the entry in the mth row and nth column of S. Note that s_{mn} = s_{nm} and s_{mm} = s_m^2 (i.e. the variance of the values of x_m).

    Let:

    X_s = X - 1 x̄^T

    then:

    S = (1/(n-1)) (X_s^T X_s)
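
    A sketch of the same computation in NumPy (toy data assumed; np.cov is used only as a cross-check):

    import numpy as np

    X = np.array([[1.0, 2.0],
                  [3.0, 4.0],
                  [5.0, 9.0]])
    n = X.shape[0]

    xbar = X.mean(axis=0)
    Xs = X - np.outer(np.ones(n), xbar)     # X_s = X - 1 x̄^T
    S = (Xs.T @ Xs) / (n - 1)               # S = (1/(n-1)) X_s^T X_s

    print(S)
    print(np.cov(X, rowvar=False))          # agrees with S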

  3. Correlation Matrix:

    The correlation matrix is a symmetric k×k matrix of the correlations of the k variables in X. That is, if r_{mn} denotes the correlation between x_m and x_n, and R is the correlation matrix, then r_{mn} will be the entry in the mth row and nth column of R. Note that r_{mn} = s_{mn}/sqrt(s_{mm} s_{nn}).

    Let D be the k×k diagonal matrix of the variances s_m^2, m = 1,...,k, so that D^{-1/2} is the diagonal matrix whose diagonal entries are the reciprocals of the standard deviations s_m, m = 1,...,k. Then let:

    X_r = X_s D^{-1/2}

    then:

    R = (1/(n-1)) (X_r^T X_r)
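
    The corresponding sketch for R in NumPy (toy data assumed; np.corrcoef is used only as a cross-check):

    import numpy as np

    X = np.array([[1.0, 2.0],
                  [3.0, 4.0],
                  [5.0, 9.0]])
    n = X.shape[0]

    Xs = X - X.mean(axis=0)                            # centered data X_s
    S = (Xs.T @ Xs) / (n - 1)                          # covariance matrix S
    D_inv_half = np.diag(1.0 / np.sqrt(np.diag(S)))    # D^{-1/2}

    Xr = Xs @ D_inv_half                               # X_r = X_s D^{-1/2}
    R = (Xr.T @ Xr) / (n - 1)                          # R = (1/(n-1)) X_r^T X_r

    print(R)
    print(np.corrcoef(X, rowvar=False))                # agrees with R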

  4. Determinants:

    The determinant of the k×k matrix A, denoted by |A|, is the scalar:

    1. |A| = a_{11}, if k = 1
    2. |A| = Σ_{j=1}^{k} (-1)^{1+j} a_{1j} |A_{1j}|, if k > 1
    where A_{1j} is the (k-1)×(k-1) matrix obtained by deleting the first row and jth column of the k×k matrix A. (A recursive sketch of this expansion follows the note below.)

    Note

    1. If I is the k×k identity matrix then |I| = 1
    2. If A, B are k×k matrices then:
      1. |A| = |A^T|
      2. If each element of a row or column of A is zero then |A| = 0
      3. If two rows of A are identical then |A| = 0
      4. |AB| = |A||B|
      5. If c is a scalar then |cA| = c^k |A|
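
    The cofactor expansion in the definition above can be coded directly, as in this sketch (Python/NumPy assumed; np.linalg.det is used only as a cross-check):

    import numpy as np

    def det(A):
        # Cofactor expansion along the first row, as in the definition above.
        k = A.shape[0]
        if k == 1:
            return A[0, 0]
        total = 0.0
        for j in range(k):
            A1j = np.delete(np.delete(A, 0, axis=0), j, axis=1)  # drop row 1 and column j+1
            sign = (-1.0) ** j       # equals (-1)^(1+j) with the formula's 1-based j
            total += sign * A[0, j] * det(A1j)
        return total

    A = np.array([[1.0, 2.0, 3.0],
                  [0.0, 4.0, 5.0],
                  [1.0, 0.0, 6.0]])
    print(det(A), np.linalg.det(A))      # both give 22.0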

  5. Trace:

    Let A be a k×k matrix. The trace of A, denoted tr(A), is the sum of the diagonal elements of A. That is, tr(A) = Σ_{i=1}^{k} a_{ii}.

    If A, B are k×k matrices and c is a scalar then:

    1. tr(cA) = c tr(A)
    2. tr(A+B) = tr(A) + tr(B)
    3. tr(A-B) = tr(A) - tr(B)
    4. tr(AB) = tr(BA)
    5. tr(B^{-1}AB) = tr(A), for nonsingular B
    6. tr(AA^T) = Σ_{i=1}^{k} Σ_{j=1}^{k} a_{ij}^2
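
    The last few of these identities are easy to check numerically, as in this sketch (random matrices are an assumed illustration):

    import numpy as np

    rng = np.random.default_rng(0)
    k = 4
    A = rng.normal(size=(k, k))
    B = rng.normal(size=(k, k))

    print(np.trace(A @ B), np.trace(B @ A))                  # property 4
    print(np.trace(np.linalg.inv(B) @ A @ B), np.trace(A))   # property 5
    print(np.trace(A @ A.T), np.sum(A ** 2))                 # property 6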

  6. Orthogonal Matrix:

    A square matrix A is said to be orthogonal if its rows, considered as vectors, are mutually perpendicular and have unit lengths.

    Note that this means:

    A A^T = I

    Also, A is orthogonal iff:

    A^{-1} = A^T

    Note:

    1. Recall that the length of a vector X, denoted L_x, is:

      L_x = sqrt(X^T X)

    2. Recall that the cosine of the angle θ between two vectors X, Y is:

      cos(θ) = X^T Y/(L_x L_y)

      and so X and Y are perpendicular if:

      X^T Y = 0

      Mutually perpendicular vectors are linearly independent.
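
    The two conditions above, A A^T = I and A^{-1} = A^T, can be verified numerically; the 2×2 rotation matrix used here is an assumed example of an orthogonal matrix:

    import numpy as np

    theta = 0.3                                   # any angle gives an orthogonal rotation matrix
    A = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])

    print(np.allclose(A @ A.T, np.eye(2)))        # rows are perpendicular unit vectors
    print(np.allclose(np.linalg.inv(A), A.T))     # A^{-1} = A^T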

  7. Eigenvalues

    Let A be a k×k matrix and I the k×k identity matrix. Also, let λ_1, λ_2,..., λ_k satisfy the equation:

    |A - λI| = 0

    Then the λ_i's, i = 1,...,k, are called the eigenvalues (or the characteristic roots) of A. The equation |A - λI| = 0 is called the characteristic equation, and |A - λI| is a polynomial of degree k in λ.

    Example:

    Let A be a 2×2 matrix and, using IML notation, let A = {1 0, 1 3}. Find the eigenvalues of A.

    Solution:

    |A - λI| = |{1 0, 1 3} - λ{1 0, 0 1}|

    Hence:

    |A - λI| = |{1-λ 0, 1 3-λ}| = (1-λ)(3-λ) - (0)(1) = (1-λ)(3-λ) = 0

    Hence: λ_1 = 1 and λ_2 = 3.
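
    The same eigenvalues can be obtained numerically; the NumPy call below is an assumed illustration and returns the roots of the characteristic equation (possibly in a different order):

    import numpy as np

    A = np.array([[1.0, 0.0],
                  [1.0, 3.0]])          # A = {1 0, 1 3} in IML notation

    print(np.linalg.eigvals(A))         # 1.0 and 3.0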