37 Spearman Rank Correlation
37.1 Spearman Rank Correlation Scale Tests
The Spearman Rank Correlation Coefficient (Charles Spearman, 1904), often referred to as Spearman’s rho, is a non-parametric measure of rank correlation. It assesses how well the relationship between two variables can be described by a monotonic function. It is well suited to cases where the data do not meet the assumptions of Pearson’s correlation coefficient, for example when the variables are not normally distributed or the relationship is not linear.
37.1.1 Assumptions
The Spearman Rank Correlation test operates under the following assumptions:
- Monotonic Relationship: The relationship between the variables should be monotonic, either increasing or decreasing, but not necessarily at a constant rate.
- Ordinal Data: The test can be applied to ordinal data or to continuous data that do not meet the assumptions required for Pearson’s correlation.
37.1.2 Hypotheses
The hypotheses for the Spearman Rank Correlation test are:
- Null Hypothesis (H₀): There is no monotonic association between the two variables (\(\rho = 0\)).
- Alternative Hypothesis (H₁): There is a monotonic association between the two variables (\(\rho \neq 0\)).
37.1.3 Formula
Spearman’s rho (\(\rho\)) is calculated as follows:

$$ \rho = 1 - \frac{6 \sum d_i^2}{n(n^2 - 1)} $$

Where:
- \(d_i\) is the difference between the two ranks of the \(i\)-th observation.
- \(n\) is the number of observations.
37.1.4 Calculation Steps
- Rank each variable separately. Assign average ranks in case of ties.
- Compute the difference (\(d_i\)) between the two ranks of each pair of observations.
- Square each difference (\(d_i^2\)).
- Sum all squared differences.
- Substitute the summed value into the formula to find \(\rho\).
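The steps above can be sketched in plain Python. This is a minimal illustration rather than a reference implementation: the function names `average_ranks` and `spearman_rho` are hypothetical, and ranking from smallest to largest is an assumption (any direction works as long as both variables use the same one). Note that the shortcut formula is exact only when there are no ties; with tied values it is an approximation, and computing Pearson's correlation on the ranks is the usual alternative.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        avg = (start + end) / 2 + 1  # average of positions start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = avg
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho via the shortcut formula 1 - 6*sum(d^2) / (n*(n^2 - 1))."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n = len(x)
    if n < 2:
        raise ValueError("need at least two observations")
    rank_x = average_ranks(x)          # Step 1: rank each variable separately
    rank_y = average_ranks(y)
    d_squared = [(rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y)]  # Steps 2-3
    return 1 - 6 * sum(d_squared) / (n * (n ** 2 - 1))              # Steps 4-5


print(average_ranks([10, 20, 20, 30]))             # [1.0, 2.5, 2.5, 4.0]
print(spearman_rho([1, 2, 3, 4], [10, 20, 30, 40]))  # 1.0
print(spearman_rho([1, 2, 3, 4], [4, 3, 2, 1]))      # -1.0
```

The two toy calls at the end use made-up data chosen so that the orderings agree perfectly and disagree perfectly.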
37.1.5 Interpretation
The Spearman’s rho values range from -1 to +1:
- A \(\rho\) of +1 indicates a perfect positive association.
- A \(\rho\) of -1 indicates a perfect negative association.
- A \(\rho\) of 0 suggests no association.
The significance of \(\rho\) can be tested using tables of critical values or computationally to determine if the observed correlation is unlikely under the null hypothesis.
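One way to test significance computationally, for small samples without ties, is an exact permutation test: enumerate every possible ordering of one variable's ranks and count how often the resulting \(\rho\) is at least as extreme as the observed one. The sketch below is an illustration under those assumptions; the names `ranks_no_ties` and `exact_spearman_p` are hypothetical, and the data in the usage line are made up. It compares integer quantities equivalent to \(|\rho|\) so that floating-point rounding cannot affect the count.

```python
from itertools import permutations


def ranks_no_ties(values):
    """Rank distinct values from 1 (smallest) to n (largest)."""
    ordered = sorted(values)
    if len(set(ordered)) != len(ordered):
        raise ValueError("this sketch assumes no tied values")
    position = {v: i + 1 for i, v in enumerate(ordered)}
    return [position[v] for v in values]


def exact_spearman_p(x, y):
    """Two-sided exact p-value for Spearman's rho (small n, no ties)."""
    rank_x, rank_y = ranks_no_ties(x), ranks_no_ties(y)
    n = len(rank_x)
    scale = n * (n * n - 1)

    def extremeness(candidate):
        # |scale - 6 * sum(d^2)| equals scale * |rho|, kept as an integer.
        s = sum((a - b) ** 2 for a, b in zip(rank_x, candidate))
        return abs(scale - 6 * s)

    observed = extremeness(rank_y)
    count = total = 0
    for perm in permutations(rank_y):  # n! orderings, so keep n small
        total += 1
        if extremeness(perm) >= observed:
            count += 1
    return count / total


# Hypothetical data: the two orderings differ by one adjacent swap.
p = exact_spearman_p([1, 2, 3, 4, 5], [2, 1, 3, 4, 5])
print(round(p, 4))  # 0.0833 (10 of the 120 orderings are at least as extreme)
```

Because the number of orderings grows as \(n!\), full enumeration is only practical for roughly ten observations or fewer; for larger samples, published critical values or an approximation are the usual routes.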
37.1.6 Example Problem
Suppose a researcher wants to examine if there is a correlation between the ranks of employees based on their performance scores and peer ratings. Here are the data for 5 employees:
- Performance Scores: 90, 85, 80, 95, 70
- Peer Ratings: 88, 80, 85, 90, 75
Hypotheses:
- Null Hypothesis (H₀): There is no correlation between performance scores and peer ratings.
- Alternative Hypothesis (H₁): There is a correlation between performance scores and peer ratings.
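As a worked illustration, the calculation for these five employees can be traced step by step. The sketch below assumes ranks are assigned from lowest (1) to highest (5) and relies on there being no ties in this data set; the helper name `simple_ranks` is hypothetical.

```python
performance = [90, 85, 80, 95, 70]
peer_rating = [88, 80, 85, 90, 75]


def simple_ranks(values):
    """Rank distinct values from 1 (lowest) to n (highest)."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]


rank_perf = simple_ranks(performance)   # [4, 3, 2, 5, 1]
rank_peer = simple_ranks(peer_rating)   # [4, 2, 3, 5, 1]

d = [a - b for a, b in zip(rank_perf, rank_peer)]   # [0, 1, -1, 0, 0]
sum_d2 = sum(diff ** 2 for diff in d)                # 2

n = len(performance)
rho = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))            # 1 - 12/120

print("ranks (performance):", rank_perf)
print("ranks (peer rating):", rank_peer)
print("sum of squared differences:", sum_d2)
print("rho:", round(rho, 3))                         # rho: 0.9
```

Only two employees swap adjacent positions between the two rankings, so the sum of squared differences is 2 and \(\rho = 1 - 12/120 = 0.9\), a strong positive monotonic association in this sample.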
37.1.7 Spearman Rank Correlation using Excel
37.2 Spearman Rank Correlation using R and Python
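In Python, the coefficient and a two-sided p-value can be obtained with `scipy.stats.spearmanr`. The sketch below applies it to the five-employee example from this chapter and assumes SciPy is installed. The default p-value rests on a large-sample approximation, so with only five observations it should be read with caution.

```python
from scipy.stats import spearmanr

performance = [90, 85, 80, 95, 70]
peer_rating = [88, 80, 85, 90, 75]

# spearmanr ranks both variables internally (average ranks for ties)
# and returns the coefficient together with a two-sided p-value.
rho, p_value = spearmanr(performance, peer_rating)

print(f"rho = {rho:.3f}")
print(f"two-sided p-value = {p_value:.4f}")
```

The reported coefficient should match the hand calculation of 0.9 for these data, up to floating-point rounding.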
This test is particularly valuable in research areas where data are ordinal or do not meet the prerequisites for parametric tests, providing a robust method for correlation analysis under such conditions.
Summary
| Concept | Description |
|---|---|
| Foundations | |
| Spearman's Rho | A non-parametric measure of monotonic association between two variables, based on their ranks |
| Monotonic Relationship | Captures relationships in which one variable consistently increases or decreases with the other |
| Non-parametric Counterpart of Pearson | Used in place of Pearson's r when assumptions of linearity or bivariate normality are not met |
| Assumptions | |
| Ordinal or Continuous Data | Applicable when data are ordinal or when continuous variables fail Pearson's assumptions |
| Robustness to Outliers | Less sensitive to extreme values than Pearson's r because it operates on ranks rather than raw values |
| Interpretation | |
| Range of Rho | Always lies between minus one and plus one inclusive |
| Interpretation | Plus one is a perfect monotonic increase, minus one a perfect decrease, zero indicates no monotonic association |
| Hypotheses | |
| Null Hypothesis | States that there is no monotonic association between the two variables |
| Alternative Hypothesis | States that there is a monotonic association, with rho not equal to zero |
| Computation | |
| Rank Each Variable Separately | Each variable is ranked independently, with average ranks assigned for ties |
| Differences in Ranks | Compute the difference between paired ranks to quantify how closely the orderings agree |
| Spearman Formula | Rho equals 1 minus 6 times the sum of squared rank differences divided by n times (n squared minus 1) |
| Tied Ranks Handling | Tied values receive the average of the ranks they would otherwise have occupied |
| In R and Python | |
| R via cor(method = 'spearman') | Use cor(x, y, method = 'spearman') for rho and cor.test(..., method = 'spearman') for the test in R |
| Python via scipy.stats.spearmanr() | Use scipy.stats.spearmanr(x, y) to obtain rho and a two-sided p-value in Python |