## Linear Correlation

### Introduction

• The news is filled with examples of correlations and associations:

Drinking a glass of red wine per day may decrease your chances of a heart attack.

Taking one aspirin per day may decrease your chances of stroke or of a heart attack.

Eating lots of certain kinds of fish may improve your health and make you smarter.

Driving slower reduces your chances of getting killed in a traffic accident.

Taller people tend to weigh more.

Pregnant women who smoke tend to have low-birthweight babies.

Animals with large brains tend to be more intelligent.

The more you study for an exam, the higher the score you are likely to receive.

• The correlation, denoted by r, measures the strength and direction of the linear association between two variables.

• r is always between -1 and 1 inclusive.

• The R-squared value, denoted by R², is the square of the correlation. It measures the proportion of the variation in the dependent variable that can be explained by its linear relationship with the independent variable.

• The R-squared value R² is always between 0 and 1, inclusive.

• Perfect positive linear association. The points are exactly on the trend line.
Correlation r = 1; R-squared = 1.00

• Large positive linear association. The points are close to the linear trend line.
Correlation r = 0.9; R-squared = 0.81.

• Small positive linear association. The points are far from the trend line.
Correlation r = 0.45; R-squared = 0.2025.

• No association. There is no association between the variables.
Correlation r = 0.0; R-squared = 0.0.

• Small negative association.
Correlation r = -0.3; R-squared = 0.09.

• Large negative association.
Correlation r = -0.95; R-squared = 0.9025.

• Perfect negative association.
Correlation r = -1; R-squared = 1.00.

• How high must a correlation be to be considered meaningful? It depends on the discipline.
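The definitions above can be turned directly into code. A minimal Python sketch of computing r and R² from their definitions (the hours/scores dataset here is made up purely for illustration):

```python
from math import sqrt

def correlation(xs, ys):
    """Pearson correlation r: the average of the products of z-scores,
    standardizing with the population SD (divide by n)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sdx = sqrt(sum((x - mx) ** 2 for x in xs) / n)
    sdy = sqrt(sum((y - my) ** 2 for y in ys) / n)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n * sdx * sdy)

# Illustrative (made-up) data: hours studied vs. exam score
hours  = [1, 2, 3, 4, 5, 6]
scores = [55, 60, 58, 70, 72, 80]
r = correlation(hours, scores)
print(f"r = {r:.3f}, R-squared = {r * r:.3f}")
```

Because these made-up scores rise fairly steadily with hours studied, r comes out large and positive, and R² is its square.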

### Calculating the Correlation

• To calculate the correlation, first standardize both the x and y variables:

z_x,i = (x_i - x̄) / SD_x      z_y,i = (y_i - ȳ) / SD_y

• Then compute r as the average of the products z_x,i · z_y,i.
• Example:   Compute the correlation r of this dataset:

| x | y  |
|---|----|
| 1 | 5  |
| 3 | 9  |
| 4 | 7  |
| 5 | 1  |
| 7 | 13 |

• We calculate:

x̄ = (1 + 3 + 4 + 5 + 7) / 5 = 4

ȳ = (5 + 9 + 7 + 1 + 13) / 5 = 7

SD_x² = [(1-4)² + (3-4)² + (4-4)² + (5-4)² + (7-4)²] / 5 = 4

SD_x = √4 = 2

SD_y² = [(5-7)² + (9-7)² + (7-7)² + (1-7)² + (13-7)²] / 5 = 16

SD_y = √16 = 4

• Now compute the z-scores of the x- and y-variables and average their products:

| x | y  | z_x  | z_y  | z_x·z_y |
|---|----|------|------|---------|
| 1 | 5  | -1.5 | -0.5 | 0.75    |
| 3 | 9  | -0.5 | 0.5  | -0.25   |
| 4 | 7  | 0.0  | 0.0  | 0.00    |
| 5 | 1  | 0.5  | -1.5 | -0.75   |
| 7 | 13 | 1.5  | 1.5  | 2.25    |

Average of z_x·z_y: 0.40

• Thus the correlation r is 0.4.

• Remember: the correlation is always between -1 and 1, inclusive.
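The worked example above can be reproduced step by step in Python; a short sketch of exactly the calculation just carried out:

```python
from math import sqrt

x = [1, 3, 4, 5, 7]
y = [5, 9, 7, 1, 13]
n = len(x)

mx, my = sum(v for v in x) / n, sum(v for v in y) / n   # means: 4 and 7
sdx = sqrt(sum((v - mx) ** 2 for v in x) / n)           # population SD: 2
sdy = sqrt(sum((v - my) ** 2 for v in y) / n)           # population SD: 4

zx = [(v - mx) / sdx for v in x]                        # z-scores of x
zy = [(v - my) / sdy for v in y]                        # z-scores of y
r = sum(a * b for a, b in zip(zx, zy)) / n              # average of products
print(r)  # 0.4
```

Every intermediate value (the z-scores and their products) matches the table above, and the average of the products is the correlation r = 0.4.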

• Why does the average of the z-score products measure association? Consider three cases:

1. In diagram (a), the x- and y-variables have a positive relationship. Most of the (x, y) points lie in quadrants I and III, where the product z_x·z_y is positive. Therefore r > 0.

2. In diagram (b), the x- and y-variables have a negative relationship. Most of the (x, y) points lie in quadrants II and IV, where the product z_x·z_y is negative. Therefore r < 0.

3. In diagram (c), the x- and y-variables have no relationship. The positive products in quadrants I and III cancel out the negative products in quadrants II and IV, so the average of the products is close to 0; r is also close to 0.
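The quadrant argument can be checked numerically. In this sketch, synthetic positively related data is generated (the data-generating choices — normal x, slope 1, noise SD 0.5 — are assumptions for illustration), and most standardized points indeed land in quadrants I and III:

```python
import random

random.seed(1)
n = 1000
xs = [random.gauss(0, 1) for _ in range(n)]
ys = [x + random.gauss(0, 0.5) for x in xs]   # positive relationship

mx, my = sum(xs) / n, sum(ys) / n
sdx = (sum((v - mx) ** 2 for v in xs) / n) ** 0.5
sdy = (sum((v - my) ** 2 for v in ys) / n) ** 0.5

products = [((x - mx) / sdx) * ((y - my) / sdy) for x, y in zip(xs, ys)]
positive = sum(p > 0 for p in products)   # points in quadrants I and III
print(positive / n)                       # most products are positive
print(sum(products) / n)                  # their average is r, well above 0
```

For negatively related data the products would be mostly negative, and for unrelated data the positive and negative products would roughly cancel, just as in cases 2 and 3.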

### Calculating the Correlation with SD+

• Compute the correlation r of this dataset:

| x | y  |
|---|----|
| 1 | 5  |
| 3 | 9  |
| 4 | 7  |
| 5 | 1  |
| 7 | 13 |

• Use SPSS to calculate the descriptive statistics and z-scores. (SPSS reports SD+, the standard deviation computed with divisor n - 1.)

x̄ = 4.00    SD_x+ = 2.236    ȳ = 7.00    SD_y+ = 4.472

| x | y  | z_x      | z_y      | z_x·z_y |
|---|----|----------|----------|---------|
| 1 | 5  | -1.34164 | -0.44721 | 0.60    |
| 3 | 9  | -0.44721 | 0.44721  | -0.20   |
| 4 | 7  | 0.0      | 0.0      | 0.00    |
| 5 | 1  | 0.44721  | -1.34164 | -0.60   |
| 7 | 13 | 1.34164  | 1.34164  | 1.80    |

Average of z_x·z_y: 0.32

• Multiply by the correction factor n / (n-1):

(ave of zxzy) * n / (n-1) = 0.32 * 5 / (5-1) = 0.32 * 5 / 4 = 0.4.

This is the same answer obtained previously using SDx and SDy.
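This bookkeeping can be checked with Python's statistics module, whose stdev function is the SD+ used here (divisor n - 1), while pstdev is the population SD used in the previous section. A sketch:

```python
from statistics import mean, stdev

x = [1, 3, 4, 5, 7]
y = [5, 9, 7, 1, 13]
n = len(x)

mx, my = mean(x), mean(y)
sdx_plus, sdy_plus = stdev(x), stdev(y)   # SD+ divides by n - 1

zx = [(v - mx) / sdx_plus for v in x]
zy = [(v - my) / sdy_plus for v in y]
avg = sum(a * b for a, b in zip(zx, zy)) / n   # 0.32
r = avg * n / (n - 1)                          # correction factor: 0.4
print(round(avg, 2), round(r, 2))
```

The uncorrected average is 0.32, and multiplying by n / (n - 1) = 5/4 recovers r = 0.4, agreeing with the population-SD calculation.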