Correlation

 

Definition:

The correlation coefficient (i.e. the Pearson correlation coefficient) is a quantitative measure of the linear association between two variables. The population parameter is denoted by ρ (rho) and the sample statistic by r.
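As a rough illustration of how the sample statistic r can be computed, here is a minimal Python sketch using the standard textbook formula (the sum of cross-products of deviations divided by the square root of the product of the sums of squared deviations). The function name pearson_r and the hours/score data are made up for illustration and are not part of the course material.

```python
import math

def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squares of the deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: hours studied and exam score for five students
hours = [1, 2, 3, 4, 5]
score = [52, 55, 61, 64, 70]
r = pearson_r(hours, score)
print(f"r = {r:.3f}")
```

In practice a statistics package (such as the SAS corr procedure mentioned in the readings) would report this value directly; the sketch only shows what is being computed.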

Properties:

  1. The correlation coefficient ranges between -1 and 1 (i.e. -1 <= ρ <= 1 or -1 <= r <= 1).
    1. ρ=0 (r=0) indicates no linear association between the two variables. The scatterplot often looks like random scatter, although a nonlinear pattern is still possible.
    2. ρ=1 or -1 (r=1 or -1) indicates a perfect linear association between the two variables.
    3. ρ>0 (r>0) indicates an upward sloping trend (sometimes referred to as a positive linear association). That is, as one variable increases the other also increases.
    4. ρ<0 (r<0) indicates a downward sloping trend (sometimes referred to as a negative linear association). That is, as one variable increases the other decreases.
    5. |ρ|>0.7 (|r|>0.7) is often used as a threshold to identify a non-trivial correlation coefficient (the || indicates absolute value).
  2. The correlation coefficient squared, and expressed as a percentage, indicates the proportion of variability in one variable accounted for by the other. Notice that for a non-trivial correlation you would expect about 50% (or more) of the variability in one variable to be accounted for by the other.
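The link between the 0.7 threshold and the "about 50%" figure can be checked with a few lines of Python. This is only a sketch: the helper names percent_variability_explained and is_nontrivial are hypothetical, and the 0.7 cutoff is the rule of thumb from the list above, not a universal standard.

```python
def percent_variability_explained(r):
    """Square r and express it as a percentage."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie between -1 and 1")
    return 100.0 * r ** 2

def is_nontrivial(r, threshold=0.7):
    """Rule-of-thumb check from the notes: |r| > 0.7."""
    return abs(r) > threshold

# Illustrative values of r, chosen arbitrarily
for r in (-0.9, -0.7, 0.3, 0.7, 0.8):
    pct = percent_variability_explained(r)
    print(f"r = {r:+.1f}: about {pct:.0f}% of variability, non-trivial: {is_nontrivial(r)}")
```

Since 0.7 squared is 0.49, a correlation right at the threshold corresponds to roughly half of the variability being accounted for, and the sign of r has no effect on that percentage.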

 

These properties suggest the following two-step procedure for interpreting the correlation coefficient for variables x and y:

  1. Examine the sign of ρ or r. If the sign is negative then state that there is a negative linear association, that is, as x increases y decreases (or vice versa). If the sign is positive then state that there is a positive linear association, that is, as x increases y also increases (or vice versa).
  2. Square ρ or r. Express the result as a percentage (n%) and state that n% of the variability in x is accounted for by y (or vice versa).
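The two steps above can be sketched as a small Python function that turns a correlation coefficient into the kind of statement described. The function name interpret_correlation, its wording, and the example variable names are illustrative assumptions, not a prescribed format.

```python
def interpret_correlation(r, x_name="x", y_name="y"):
    """Apply the two-step interpretation: sign first, then r squared as a percentage."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie between -1 and 1")

    # Step 1: examine the sign
    if r > 0:
        step1 = (f"There is a positive linear association: "
                 f"as {x_name} increases, {y_name} also increases.")
    elif r < 0:
        step1 = (f"There is a negative linear association: "
                 f"as {x_name} increases, {y_name} decreases.")
    else:
        step1 = f"There is no linear association between {x_name} and {y_name}."

    # Step 2: square r and express it as a percentage
    percent = 100.0 * r ** 2
    step2 = (f"About {percent:.0f}% of the variability in {x_name} "
             f"is accounted for by {y_name} (or vice versa).")
    return step1 + " " + step2

# Hypothetical example: r = -0.8 between outdoor temperature and heating cost
print(interpret_correlation(-0.8, "temperature", "heating cost"))
```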

Note: The sample correlation coefficient r is a statistic, so if the sample is a probability sample then r is a good estimate of the population parameter ρ.

 

Optional Readings:

SAS: Chapter 9, pp. 281-289

You should familiarize yourself with the output of the SAS corr procedure. In particular you should be able to find the correlation coefficient given the output.