The Correlation Coefficient: Definition
Bruce Ratner, Ph.D.
The correlation coefficient, denoted by r, is a measure of the strength of the straight-line or linear relationship between two variables. The correlation coefficient takes on values ranging from -1 to +1. The following points are the accepted guidelines for interpreting the correlation coefficient:
- 0 indicates no linear relationship.
- +1 indicates a perfect positive linear relationship: as one variable increases in its values, the other variable also increases in its values via an exact linear rule.
- -1 indicates a perfect negative linear relationship: as one variable increases in its values, the other variable decreases in its values via an exact linear rule.
- Values between 0 and 0.3 (0 and -0.3) indicate a weak positive (negative) linear relationship via a shaky linear rule.
- Values between 0.3 and 0.7 (-0.3 and -0.7) indicate a moderate positive (negative) linear relationship via a fuzzy-firm linear rule.
- Values between 0.7 and 1.0 (-0.7 and -1.0) indicate a strong positive (negative) linear relationship via a firm linear rule.
- The value of r squared is typically taken as “the percent of variation in one variable explained by the other variable,” or “the percent of variation shared between the two variables.”
- Linearity Assumption. The correlation coefficient requires that the underlying relationship between the two variables under consideration is linear. If the relationship is known to be linear, or the observed pattern between the two variables appears to be linear, then the correlation coefficient provides a reliable measure of the strength of the linear relationship. If the relationship is known to be nonlinear, or the observed pattern appears to be nonlinear, then the correlation coefficient is not useful, or at least questionable.
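The linearity assumption can be illustrated with a small sketch. The data below are made up for illustration and do not come from the article; the helper name pearson_r is likewise hypothetical. It computes r from sums of deviations, which is algebraically equivalent to the standardized-score definition. An exactly linear relationship gives r = 1, while an exact but curved relationship (Y equal to X squared, with X symmetric about zero) gives r = 0, even though Y is fully determined by X.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation computed from sums of deviations from the means."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squares of the deviations.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Illustrative values only (an assumption, not data from the article).
x = [-2, -1, 0, 1, 2]
y_linear = [2 * v + 1 for v in x]   # exact linear rule
y_curved = [v ** 2 for v in x]      # exact rule, but not linear

r_linear = pearson_r(x, y_linear)   # 1.0: perfect positive linear relationship
r_curved = pearson_r(x, y_curved)   # 0.0: no linear relationship

# r squared, read as the share of variation explained by a linear rule.
r_squared_linear = r_linear ** 2
```

In the curved case r is zero because the positive and negative cross-products cancel, not because the variables are unrelated. This is why a scatterplot check for linearity is worth doing before interpreting r.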
The calculation of the correlation coefficient for two variables, say X and Y, is simple to understand. Let zX and zY be the standardized versions of X and Y, respectively. That is, zX and zY are both re-expressed to have means equal to zero, and standard deviations (std) equal to one. The re-expressions used to obtain the standardized scores are in equations (3.1) and (3.2):
zXi = [Xi - mean(X)]/std(X) (3.1)
zYi = [Yi - mean(Y)]/std(Y) (3.2)
The correlation coefficient is defined as the mean product of the paired standardized scores (zXi, zYi) as expressed in equation (3.3).
rX,Y = sum of [zXi * zYi]/(n-1), where n is the sample size (3.3)
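Equations (3.1) through (3.3) can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the function names standardize and correlation are hypothetical, the sample standard deviation (divisor n-1) is assumed for std, and the five data points are made up for illustration (they are not the values in Table 1).

```python
from statistics import mean, stdev


def standardize(values):
    """Equations (3.1)/(3.2): re-express values to mean 0 and std 1.

    stdev() is the sample standard deviation (divisor n-1), an assumption
    chosen to match the n-1 divisor in equation (3.3).
    """
    m = mean(values)
    s = stdev(values)  # a constant variable gives s == 0 and a ZeroDivisionError below
    return [(v - m) / s for v in values]


def correlation(x, y):
    """Equation (3.3): sum of paired z-score products divided by n-1."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same number of observations")
    zx = standardize(x)
    zy = standardize(y)
    return sum(a * b for a, b in zip(zx, zy)) / (len(x) - 1)


# Illustrative values only (an assumption, not the data in Table 1).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = correlation(x, y)
print(round(r, 2))  # 0.77
```

Using the same divisor (n-1) in both the standard deviation and the mean product is what keeps r within the range -1 to +1; mixing divisors would scale the result by a factor of n/(n-1) or its inverse.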
For a simple illustration of the calculation, consider the sample of five observations in Table 1. Columns zX and zY contain the standardized scores of X and Y, respectively. The last column is the product of the paired standardized scores. The sum of these products is 1.83. The mean of these products (using the adjusted divisor n-1, not n) is 0.46. Thus, rX,Y = 0.46.