Correlation
Definition:
The correlation coefficient (i.e. the Pearson
correlation coefficient) is a quantitative measure of the linear
association between two variables. The population parameter is
denoted by ρ and the sample statistic by r.
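To make the definition concrete, the following is a minimal Python sketch that computes the sample statistic r from the standard deviation-based formula (the sum of cross-products divided by the square root of the product of the two sums of squares). The function name pearson_r and the data on hours studied and exam scores are hypothetical and are used only for illustration; they do not come from the course materials.

```python
import math

def pearson_r(x, y):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squared deviations and of cross-products about the means.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one variable is constant")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data (an assumption for illustration only).
hours = [1, 2, 3, 4, 5]
score = [52, 60, 61, 70, 77]
r = pearson_r(hours, score)
print(round(r, 3))
```

The value printed is close to 1, reflecting the strong upward trend in this made-up data set.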
Properties:
- The correlation coefficient ranges between
-1 and 1 (i.e. -1<=ρ<=1 and -1<=r<=1)
- ρ=0 (r=0) indicates no linear association
between the two variables; in a scatterplot this
typically appears as random scatter.
- ρ=1 or -1 (r=1 or -1) indicates a perfect
linear association between the two variables.
- ρ>0 (r>0) indicates an upward-sloping
trend (sometimes referred to as a
positive linear association). That is, as one
variable increases the other also increases.
- ρ<0 (r<0) indicates a downward-sloping
trend (sometimes referred to as a
negative linear association). That is, as one
variable increases the other decreases.
- |ρ|>0.7 (|r|>0.7) is often used as a threshold to
identify a non-trivial correlation coefficient
(the vertical bars | | indicate absolute value).
- The correlation coefficient squared, and
expressed as a percentage, indicates the proportion of
variability in one variable accounted for by the other.
Notice that, since 0.7 squared is 0.49, for a non-trivial
correlation you would expect about 50% (or more) of the
variability in one variable to be accounted for by the other.
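The limiting cases in the list above can be checked numerically. The sketch below is illustrative only: the helper pearson_r and the three small data sets are assumptions made for this example. It shows a perfect upward line, a perfect downward line, and a curved (quadratic) pattern for which r is 0 even though y depends on x, which is why r=0 should be read as "no linear association".

```python
import math

def pearson_r(x, y):
    """Sample Pearson correlation coefficient (deviation-based formula)."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / math.sqrt(sxx * syy)

# Hypothetical x values, symmetric about 0.
x = [-2, -1, 0, 1, 2]

r_up = pearson_r(x, [2 * xi + 1 for xi in x])     # points on an upward line
r_down = pearson_r(x, [-2 * xi + 1 for xi in x])  # points on a downward line
r_curve = pearson_r(x, [xi ** 2 for xi in x])     # symmetric curve, not a line

print(r_up, r_down, r_curve)

# The 0.7 threshold corresponds to about 49% of variability accounted for.
print(round(100 * 0.7 ** 2), "%")
```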
These properties suggest the following two-step
procedure for interpreting the correlation coefficient for
variables x and y:
- Examine the sign of ρ or r. If
the sign is negative then state that there is a negative
linear association, that is, as x increases y decreases
(or vice versa). If the sign is positive then state that
there is a positive linear association, that is, as x
increases y also increases (or vice versa).
- Square ρ or r. Express the result as a
percentage (n%) and state that n% of the variability in x
is accounted for by y (or vice versa).
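The two steps above can be sketched as a small Python function. This is only an illustration under stated assumptions: the function name interpret_r, its parameter names, and the wording of the returned statements are hypothetical, and the 0.7 threshold is the rule of thumb from the properties list rather than a formal test.

```python
def interpret_r(r, x_name="x", y_name="y", threshold=0.7):
    """Return the direction statement, percent of variability, and a threshold flag."""
    if not -1 <= r <= 1:
        raise ValueError("a correlation coefficient must lie between -1 and 1")

    # Step 1: examine the sign.
    if r > 0:
        direction = (f"positive linear association: as {x_name} increases, "
                     f"{y_name} also increases")
    elif r < 0:
        direction = (f"negative linear association: as {x_name} increases, "
                     f"{y_name} decreases")
    else:
        direction = "no linear association"

    # Step 2: square the coefficient and express it as a percentage.
    percent = 100 * r ** 2

    # Rule-of-thumb check against the |r| > 0.7 threshold.
    non_trivial = abs(r) > threshold
    return direction, percent, non_trivial

direction, percent, non_trivial = interpret_r(-0.8)
print(direction)
print(f"{percent:.0f}% of the variability in x is accounted for by y")
print("non-trivial:", non_trivial)
```

For r = -0.8 the sketch reports a negative linear association and about 64% of the variability accounted for.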
Note: The sample correlation
coefficient r is a statistic, and so if the sample is a
probability sample then it is a good estimate of the population
parameter ρ.
Optional Readings:
SAS: Chapter 9, pp. 281-289
You should familiarize yourself with the output of the SAS CORR
procedure. In particular, you should be able to find the
correlation coefficient in that output.