Correlation Coefficient
The correlation coefficient is used to measure the strength of
the relationship between two variables, say x and y. It is typically
referred to as r. It is calculated using the formula:
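A common way of writing the product moment form of r is sketched below. It assumes a sample of n paired observations (x_i, y_i) with sample means \bar{x} and \bar{y}; these symbols are introduced here for illustration only.

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\;\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
```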
The correlation coefficient indicates how strong the relationship between
two variables is, but it does not quantify the relationship itself (for
example, how much y changes for a given change in x). It can provide the
basis for further analysis designed to determine a causal relationship.
The nature of the relationship is indicated by the value of r:
| Relationship | r |
| Linear | 1.0 |
| Inverse Linear | -1.0 |
| Non-Linear/Random | 0 < abs(r) < 1 |
| No Relationship | 0 |
Related Topics
Linear Regression
Critical values of r
From one to almost zero in three graphics
The graphics below illustrate the way that the correlation
coefficient varies with the relationship between x and y for a sample of
15 points.
Case 1
If y = x, there is a linear relationship between the two variables.
Case 2
If y = x + random(-20,20), the relationship between the two variables is
weaker.
Case 3
If x = random(1,100) and y = random(1,100), there is no relationship between
the variables and the value of r is low.
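The three cases can be simulated with a few lines of Python. This is a minimal sketch rather than the code behind the original graphics: the helper name `pearson_r`, the seed, the even spacing of x over 1 to 100 in Cases 1 and 2, and the use of continuous (not integer) random values are all assumptions made for illustration.

```python
import math
import random


def pearson_r(xs, ys):
    """Product moment correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(2008)  # arbitrary seed so that reruns give the same numbers
n = 15                     # sample size used in the three cases

# Assumption: x is spread evenly over 1..100 for Cases 1 and 2.
x = [1 + 99 * i / (n - 1) for i in range(n)]

# Case 1: y = x
r_case1 = pearson_r(x, x)

# Case 2: y = x + random(-20, 20)
y_noisy = [xi + rng.uniform(-20, 20) for xi in x]
r_case2 = pearson_r(x, y_noisy)

# Case 3: x and y drawn independently from random(1, 100)
x_rand = [rng.uniform(1, 100) for _ in range(n)]
y_rand = [rng.uniform(1, 100) for _ in range(n)]
r_case3 = pearson_r(x_rand, y_rand)

for label, r in [("Case 1", r_case1), ("Case 2", r_case2), ("Case 3", r_case3)]:
    print(f"{label}: r = {r:+.2f}")
```

The exact values for Cases 2 and 3 depend on the random draw, so they will not match the figures quoted on this page; only Case 1 is fixed at r = 1.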
A.K.A.
The correlation coefficient is sometimes referred to as the product moment
correlation coefficient.
Significance Testing
If the correlation coefficient of the population is
zero, the value of the function below for a sample of size n will follow
a t distribution with n-2 degrees of freedom.
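The usual form of this statistic is given below. It is stated here as an assumption about the function the text refers to, although it does reproduce the real-world t value tabulated later on this page (r = 0.91 with n = 6 gives t of about 4.39).

```latex
t = \frac{r\,\sqrt{n-2}}{\sqrt{1-r^{2}}}
```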

This is shown in graphic form below with the values based on the sample
data set at the 95% confidence level:

Thus, if the value of r exceeds the value that would be expected by chance
alone at the chosen significance level, the relationship between x and y is
unlikely to be due to chance.
This relationship provides the basis for tables of the critical value of r
at a given significance level.
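As a sketch of how such a table could be built, the t statistic can be inverted to give the critical r for a given critical t. This assumes the statistic has the usual form t = r·sqrt(n-2)/sqrt(1-r²); the function name `critical_r` is hypothetical, and the critical t values are the ones quoted in the tables on this page.

```python
import math


def critical_r(t_crit, n):
    """Critical r for a sample of n points, given the critical t for n - 2 df.

    Obtained by solving t = r * sqrt(n - 2) / sqrt(1 - r**2) for r.
    """
    df = n - 2
    return t_crit / math.sqrt(t_crit ** 2 + df)


print(round(critical_r(2.16, 15), 2))  # 15 points, 13 degrees of freedom -> 0.51
print(round(critical_r(2.78, 6), 2))   # 6 points, 4 degrees of freedom -> 0.81
```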
The table below shows the value of t and r for each of the
above cases:
| Case | r | t | 95% critical value for r | 95% critical value for t | Significant |
| 1 | +1.00 | ∞ | 0.51 | 2.16 | Yes |
| 2 | +0.91 | +7.31 | 0.51 | 2.16 | Yes |
| 3 | +0.17 | +0.62 | 0.51 | 2.16 | No |
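The significance column can be checked with a short sketch that converts each r into a t value and compares it with the critical t. It assumes the usual form of the statistic, t = r·sqrt(n-2)/sqrt(1-r²); `t_statistic` is a hypothetical helper name. Because the tabulated r values are rounded to two decimal places, the recomputed t values need not match the table exactly.

```python
import math


def t_statistic(r, n):
    """t value for a sample correlation r from n points (n - 2 degrees of freedom)."""
    if abs(r) >= 1.0:
        # Perfect correlation: the denominator is zero, so t is unbounded.
        return math.copysign(math.inf, r)
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)


N_POINTS = 15      # sample size for the three contrived cases
T_CRITICAL = 2.16  # 95% critical t for 13 degrees of freedom, from the table

for case, r in [(1, 1.00), (2, 0.91), (3, 0.17)]:
    t = t_statistic(r, N_POINTS)
    verdict = "Yes" if abs(t) > T_CRITICAL else "No"
    print(f"Case {case}: r = {r:+.2f}, t = {t:+.2f}, significant: {verdict}")
```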
In Case 1, where there is no random element, a significance test is
irrelevant, as indicated by a t value of infinity. In Case 2, the X/Y
plot suggests there is a relationship between X and Y, and this is
confirmed by the value of t, which is greater than the 5% critical value.
Finally, in Case 3, where there is no relationship between X and Y, t is
less than the critical value.
Example
The real-world example is based on the price of flour versus its protein
content. The protein content was taken from the packet labels and the
prices from a single supermarket, which stocked a range of products wide
enough to provide a half-decent dataset.

The data is shown in tabular form below:
| x (protein content) | y (price) |
| 10.4 | 0.71 |
| 10.3 | 0.73 |
| 11.5 | 0.83 |
| 12.8 | 0.81 |
| 13.2 | 0.86 |
| 13.9 | 0.97 |
The results are presented in the same form as the contrived examples:
| Case | r | t | 5% critical value for r | 5% critical value for t | Significant |
| Real-world | 0.91 | 4.39 | 0.81 | 2.78 | Yes |
In the above table there are 6 data points (4 degrees of freedom).
As the value of r is greater than the critical value, we can say that the
relationship between the price of flour and the protein content is
significant and that it is probable that protein content is one of the
factors that influence the price of flour.
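The real-world result can be reproduced from the tabulated data with the sketch below. It assumes that x is the protein content and y the price, and that the t statistic has the usual form t = r·sqrt(n-2)/sqrt(1-r²); the variable names are illustrative only.

```python
import math

protein = [10.4, 10.3, 11.5, 12.8, 13.2, 13.9]  # x column of the data table
price = [0.71, 0.73, 0.83, 0.81, 0.86, 0.97]    # y column of the data table

n = len(protein)
mean_x = sum(protein) / n
mean_y = sum(price) / n

# Sums of squares and cross products about the means.
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(protein, price))
sxx = sum((x - mean_x) ** 2 for x in protein)
syy = sum((y - mean_y) ** 2 for y in price)

r = sxy / math.sqrt(sxx * syy)
t = r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)

R_CRITICAL = 0.81  # 5% critical value for r with 4 degrees of freedom (from the table)
T_CRITICAL = 2.78  # 5% critical value for t with 4 degrees of freedom (from the table)

print(f"r = {r:.2f}, t = {t:.2f}")
print("significant:", abs(r) > R_CRITICAL and abs(t) > T_CRITICAL)
```

Computed from the unrounded r, t comes out a little above the tabulated 4.39, which corresponds to r rounded to 0.91; the conclusion is the same either way.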
Discussion
Originally, the real-world example was selected because experience of
shopping for flour suggested that there was a simple relationship between
protein content and price. However, as is often the case, when the data
was collected there turned out to be several other factors to consider, and
the dataset had to be filtered to remove the variance they introduced.
The other factors were provenance (e.g. organic or non-organic) and brand
(e.g. independent brands or supermarket own brands).
Page updated: 29-Feb-2008