Principal Components Analysis

Definition

Consider the k variables x_1,..., x_k. Let r_{mn} denote the correlation between x_m and x_n, and suppose r_{mn} ≠ 0 for some m ≠ n, where m = 1,...,k and n = 1,...,k.

Principal Components Analysis (PCA) is a method of transforming the k original variables (i.e. x_1,..., x_k) into k new variables p_1,..., p_k, referred to as principal components, where the new variables are uncorrelated (i.e. r_{mn} = 0 for all m ≠ n). These principal components are linear combinations of the x_i's, i = 1,...,k, that is:

p_i = w_{i,1} x_1 + w_{i,2} x_2 + ... + w_{i,k} x_k

To determine the weights (i.e. the w_{i,j}) we first need to review some additional matrix algebra, including determinants, and then we can discuss eigenvalues, eigenvectors, and the spectral decomposition. A brief computational preview of the whole procedure follows the notes below.

Note:

  1. The x_i's are not divided into response and explanatory variables and so this is not the multiple regression problem.
  2. Principal components are ranked by variance. That is, we use the notation p_1 to denote the principal component with the largest variance and so p_k would have the smallest variance.
  3. PCA was first proposed by Karl Pearson, in 1901, but is usually attributed to Harold Hotelling for work done in 1933 which appeared as the paper "Analysis of a complex of statistical variables into principal components," Journal of Educational Psychology, 24:417-441.
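
As a preview of the whole procedure, the sketch below uses Python with NumPy (an assumed choice of tooling, with made-up data) to take the weights w_{i,j} from the eigenvectors of the covariance matrix and form the principal components as the corresponding linear combinations; the machinery behind this is developed in the sections that follow.

import numpy as np

# Toy data, assumed for illustration: n = 5 observations of k = 3 variables.
X = np.array([[2.0, 4.1, 1.0],
              [3.5, 5.0, 0.5],
              [1.0, 2.9, 2.2],
              [4.2, 6.3, 0.1],
              [2.8, 4.4, 1.4]])

Xs = X - X.mean(axis=0)               # center each variable
S = (Xs.T @ Xs) / (X.shape[0] - 1)    # sample covariance matrix

# The weights w_{i,j} come from the eigenvectors of S; each principal
# component p_i is the corresponding linear combination of the x_i's.
eigvals, eigvecs = np.linalg.eigh(S)      # eigh handles symmetric matrices
order = np.argsort(eigvals)[::-1]         # p_1 gets the largest variance
W = eigvecs[:, order]                     # columns are the weight vectors
P = Xs @ W                                # principal component scores

# The new variables are uncorrelated: the covariance matrix of P is diagonal.
print(np.round(np.cov(P, rowvar=False), 10))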

Matrix Algebra

Suppose we have n observations of the k variables x_1,..., x_k, and let the n×k matrix X denote these observations. We may then define the following:

  1. Mean Vector:

    The mean vector x̄ is the vector of the k means of the k variables in X:

    x̄^T = (1/n)(1^T X)

    where 1 is the n×1 vector of ones.
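
    A minimal check of this formula in NumPy (the small data matrix X is an assumed toy example):

    import numpy as np

    X = np.array([[1.0, 2.0],
                  [3.0, 4.0],
                  [5.0, 9.0]])          # n = 3 observations of k = 2 variables
    n = X.shape[0]
    ones = np.ones(n)

    xbar = (ones @ X) / n               # (1/n)(1^T X)
    print(xbar)                         # same as X.mean(axis=0)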

  2. Covariance Matrix:

    The covariance matrix is a symmetric k×k matrix of the covariances of the k variables in X. That is, if s_{mn} denotes the covariance between x_m and x_n, and S is the covariance matrix, then s_{mn} will be the entry in the mth row and nth column of S. Note that s_{mn} = s_{nm} and s_{mm} = s_m^2 (i.e. the variance of the values of x_m).

    Let:

    X_s = X - 1 x̄^T

    then:

    S = (1/(n-1)) (X_s^T X_s)
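
    A sketch of the same computation in NumPy (toy data assumed; np.cov is used only as a cross-check):

    import numpy as np

    X = np.array([[1.0, 2.0],
                  [3.0, 4.0],
                  [5.0, 9.0]])
    n = X.shape[0]

    xbar = X.mean(axis=0)
    Xs = X - np.outer(np.ones(n), xbar)     # X_s = X - 1 x̄^T
    S = (Xs.T @ Xs) / (n - 1)               # S = (1/(n-1)) X_s^T X_s

    print(S)
    print(np.cov(X, rowvar=False))          # agrees with S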

  3. Correlation Matrix:

    The correlation matrix is a symmetric k×k matrix of the correlations of the k variables in X. That is, if r_{mn} denotes the correlation between x_m and x_n, and R is the correlation matrix, then r_{mn} will be the entry in the mth row and nth column of R. Note that r_{mn} = s_{mn}/sqrt(s_{mm} s_{nn}).

    Let D be the k×k diagonal matrix of the variances s_m^2, m = 1,...,k, so that D^{-1/2} is the diagonal matrix whose diagonal entries are the reciprocals of the standard deviations s_m, m = 1,...,k. Then let:

    X_r = X_s D^{-1/2}

    then:

    R = (1/(n-1)) (X_r^T X_r)
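
    The corresponding sketch for R in NumPy (toy data assumed; np.corrcoef is used only as a cross-check):

    import numpy as np

    X = np.array([[1.0, 2.0],
                  [3.0, 4.0],
                  [5.0, 9.0]])
    n = X.shape[0]

    Xs = X - X.mean(axis=0)                            # centered data X_s
    S = (Xs.T @ Xs) / (n - 1)                          # covariance matrix S
    D_inv_half = np.diag(1.0 / np.sqrt(np.diag(S)))    # D^{-1/2}

    Xr = Xs @ D_inv_half                               # X_r = X_s D^{-1/2}
    R = (Xr.T @ Xr) / (n - 1)                          # R = (1/(n-1)) X_r^T X_r

    print(R)
    print(np.corrcoef(X, rowvar=False))                # agrees with R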

  4. Determinants:

    The determinant of the k×k matrix A, denoted by |A|, is the scalar:

    1. |A| = a_{11}, if k = 1
    2. |A| = Σ_{j=1}^{k} (-1)^{1+j} a_{1j} |A_{1j}|, if k > 1
    where A_{1j} is the (k-1)×(k-1) matrix obtained by deleting the first row and jth column of the k×k matrix A. (A recursive sketch of this expansion follows the note below.)

    Note

    1. If I is the k×k identity matrix then |I| = 1
    2. If A, B are k×k matrices then:
      1. |A| = |A^T|
      2. If each element of a row or column of A is zero then |A| = 0
      3. If two rows of A are identical then |A| = 0
      4. |AB| = |A||B|
      5. If c is a scalar then |cA| = c^k |A|
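
    The cofactor expansion in the definition above can be coded directly, as in this sketch (Python/NumPy assumed; np.linalg.det is used only as a cross-check):

    import numpy as np

    def det(A):
        # Cofactor expansion along the first row, as in the definition above.
        k = A.shape[0]
        if k == 1:
            return A[0, 0]
        total = 0.0
        for j in range(k):
            A1j = np.delete(np.delete(A, 0, axis=0), j, axis=1)  # drop row 1 and column j+1
            sign = (-1.0) ** j       # equals (-1)^(1+j) with the formula's 1-based j
            total += sign * A[0, j] * det(A1j)
        return total

    A = np.array([[1.0, 2.0, 3.0],
                  [0.0, 4.0, 5.0],
                  [1.0, 0.0, 6.0]])
    print(det(A), np.linalg.det(A))      # both give 22.0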

  5. Trace:

    Let A be a k×k matrix. The trace of A, denoted tr(A), is the sum of the diagonal elements of A. That is, tr(A) = Σ_{i=1}^{k} a_{ii}.

    If A, B are k×k matrices and c is a scalar then:

    1. tr(cA) = c tr(A)
    2. tr(A+B) = tr(A) + tr(B)
    3. tr(A-B) = tr(A) - tr(B)
    4. tr(AB) = tr(BA)
    5. tr(B^{-1}AB) = tr(A), for nonsingular B
    6. tr(AA^T) = Σ_{i=1}^{k} Σ_{j=1}^{k} a_{ij}^2
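
    The last few of these identities are easy to check numerically, as in this sketch (random matrices are an assumed illustration):

    import numpy as np

    rng = np.random.default_rng(0)
    k = 4
    A = rng.normal(size=(k, k))
    B = rng.normal(size=(k, k))

    print(np.trace(A @ B), np.trace(B @ A))                  # property 4
    print(np.trace(np.linalg.inv(B) @ A @ B), np.trace(A))   # property 5
    print(np.trace(A @ A.T), np.sum(A ** 2))                 # property 6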

  6. Orthogonal Matrix:

    A square matrix A is said to be orthogonal if its rows, considered as vectors, are mutually perpendicular and have unit lengths.

    Note that this means:

    A A^T = I

    Also, A is orthogonal iff:

    A^{-1} = A^T

    Note:

    1. Recall that the length of a vector X, denoted L_x, is:

      L_x = sqrt(X^T X)

    2. Recall that the cosine of the angle θ between two vectors X, Y is:

      cos(θ) = X^T Y/(L_x L_y)

      and so X and Y are perpendicular if:

      X^T Y = 0

      Mutually perpendicular vectors are linearly independent.
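
    The two conditions above, A A^T = I and A^{-1} = A^T, can be verified numerically; the 2×2 rotation matrix used here is an assumed example of an orthogonal matrix:

    import numpy as np

    theta = 0.3                                   # any angle gives an orthogonal rotation matrix
    A = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])

    print(np.allclose(A @ A.T, np.eye(2)))        # rows are perpendicular unit vectors
    print(np.allclose(np.linalg.inv(A), A.T))     # A^{-1} = A^T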

  7. Eigenvalues

    Let A be a k×k matrix and I the k×k identity matrix. Also, let λ_1, λ_2,..., λ_k satisfy the equation:

    |A - λI| = 0

    Then the λ_i's, i = 1,...,k, are called the eigenvalues (or the characteristic roots) of A. The equation |A - λI| = 0 is called the characteristic equation, and |A - λI| is a polynomial of degree k in λ.

    Example:

    Let A be a 2×2 matrix and, using IML notation, let A = {1 0, 1 3}. Find the eigenvalues of A.

    Solution:

    |A - λI| = |{1 0, 1 3} - λ{1 0, 0 1}|

    Hence:

    |A - λI| = |{1-λ 0, 1 3-λ}| = (1-λ)(3-λ) - (0)(1) = (1-λ)(3-λ) = 0

    Hence: λ_1 = 1 and λ_2 = 3.
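
    The same eigenvalues can be obtained numerically; the NumPy call below is an assumed illustration and returns the roots of the characteristic equation (possibly in a different order):

    import numpy as np

    A = np.array([[1.0, 0.0],
                  [1.0, 3.0]])          # A = {1 0, 1 3} in IML notation

    print(np.linalg.eigvals(A))         # 1.0 and 3.0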