When to use Spearman correlation
Correlation measures the strength of association between two variables. There are two methods for measuring correlation in EnCorr.
The Pearson correlation method is the most widely used. It measures the strength of the linear relationship between normally distributed variables, which is appropriate for financial returns data most of the time.
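The Pearson correlation between A and B is

$$ r_{AB} = \frac{\sum_{i=1}^{n} (A_i - \bar{A})(B_i - \bar{B})}{\sqrt{\sum_{i=1}^{n} (A_i - \bar{A})^2}\,\sqrt{\sum_{i=1}^{n} (B_i - \bar{B})^2}} $$

where A_i and B_i are the returns of the two series in period i, and \bar{A} and \bar{B} are their mean returns over the n periods.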
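As an illustration only, here is a minimal Python sketch of the Pearson calculation, assuming each series is a plain list of returns of equal length. The function name `pearson` is hypothetical and is not part of EnCorr. The sample data are the three-month returns used in the example later in this article.

```python
import math

def pearson(a, b):
    """Pearson correlation of two equal-length sequences of returns."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    # Numerator: sum of products of each series' deviations from its mean
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    # Denominator: square root of the product of the sums of squared deviations
    ss_a = sum((x - mean_a) ** 2 for x in a)
    ss_b = sum((y - mean_b) ** 2 for y in b)
    if ss_a == 0 or ss_b == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(ss_a * ss_b)

a = [5, 8, 7]   # series A monthly returns, in percent
b = [8, 11, 9]  # series B monthly returns, in percent
print(round(pearson(a, b), 3))  # 0.929
```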
When the variables are not normally distributed, or the relationship between them is not linear, it may be more appropriate to use the Spearman rank correlation method.
The Spearman rank correlation method makes no assumptions about the distribution of the data. It may therefore be more appropriate for series that are not normally distributed, or for data with large outliers that obscure meaningful relationships between series.
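The Spearman rank correlation method first ranks the returns within each series. It doesn't matter whether the ranks run from lowest return to highest or from highest to lowest, as long as both series are ranked in the same direction. For example, consider three months of returns for series A and B: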
Date  A    B
Jan   5%   8%
Feb   8%   11%
Mar   7%   9%
Ranking each series from its lowest return (rank 1) to its highest return (rank 3) gives:

Date  A    A Rank  B    B Rank
Jan   5%   1       8%   1
Feb   8%   3       11%  3
Mar   7%   2       9%   2
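The ranking step can be sketched in Python as below. This is an illustrative helper only: the name `rank` is hypothetical, not an EnCorr function. It ranks from lowest to highest, and because the example has no ties, the handling of tied values (each gets the average of the ranks it spans) is an assumption, though it is a common convention.

```python
def rank(values):
    """Rank values 1..n, lowest value = 1.

    Assumption: tied values share the average of the ranks they occupy.
    """
    sorted_vals = sorted(values)
    ranks = []
    for v in values:
        first = sorted_vals.index(v) + 1       # 1-based position of the first occurrence of v
        count = sorted_vals.count(v)           # number of values tied with v
        ranks.append(first + (count - 1) / 2)  # average of positions first .. first + count - 1
    return ranks

print(rank([5, 8, 7]))   # [1.0, 3.0, 2.0]
print(rank([8, 11, 9]))  # [1.0, 3.0, 2.0]
```

This version scans the sorted list once per value, which is fine for a few months of returns but quadratic for long series.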
The Spearman correlation is then calculated by applying the Pearson method to the ranks rather than to the returns:
Date  A Rank  B Rank
Jan   1       1
Feb   3       3
Mar   2       2
The Spearman correlation for this data is 1. By comparison, the Pearson correlation of the returns themselves is about 0.929.
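A minimal Python sketch of the two steps together: Spearman is computed as the Pearson correlation of the ranks and compared with Pearson on the raw returns. The helper names (`pearson`, `rank`, `spearman`) are hypothetical, and the average-rank treatment of ties is an assumption.

```python
import math

def pearson(a, b):
    # Sum of products of deviations divided by the root of the product of squared deviations
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return cov / math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))

def rank(values):
    # Ranks 1..n, lowest value = 1; ties share the average rank (assumed convention)
    s = sorted(values)
    return [s.index(v) + 1 + (s.count(v) - 1) / 2 for v in values]

def spearman(a, b):
    # Spearman correlation = Pearson correlation applied to the ranks
    return pearson(rank(a), rank(b))

a = [5, 8, 7]   # series A returns, %
b = [8, 11, 9]  # series B returns, %
print(spearman(a, b))           # 1.0
print(round(pearson(a, b), 3))  # 0.929
```

If SciPy is installed, `scipy.stats.spearmanr` and `scipy.stats.pearsonr` should return the same coefficients for this data; in Python 3.12 and later, `statistics.correlation(a, b, method="ranked")` also computes a Spearman coefficient.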
Because the calculation uses ranks, the magnitude of the largest and smallest values has no effect, which is why outliers have less influence on the Spearman correlation. For example, if series A's 5% return in January had instead been -50%, the Spearman correlation would still be 1, because the ranks would not change, while the Pearson correlation would fall to about 0.766.
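The outlier case can be checked with the same kind of sketch by replacing January's 5% return for series A with -50%. As before, the helper names are hypothetical and ties (none occur here) are assumed to get average ranks.

```python
import math

def pearson(a, b):
    # Pearson correlation of two equal-length sequences
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return cov / math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))

def rank(values):
    # Ranks 1..n, lowest value = 1; ties share the average rank (assumed convention)
    s = sorted(values)
    return [s.index(v) + 1 + (s.count(v) - 1) / 2 for v in values]

def spearman(a, b):
    # Pearson correlation of the ranks
    return pearson(rank(a), rank(b))

a_outlier = [-50, 8, 7]  # January return for A replaced by -50%
b = [8, 11, 9]
print(rank(a_outlier))                  # [1.0, 3.0, 2.0]  (same ranks as before)
print(spearman(a_outlier, b))           # 1.0
print(round(pearson(a_outlier, b), 3))  # 0.766
```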
If two series have identical ranks, there is clearly a very strong relationship between them: in every pair of periods, the period with the higher return in one series also has the higher return in the other. The Pearson correlation won't necessarily show this, because it measures only the strength of the linear relationship, and the relationship may be consistent in direction without being linear.
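To illustrate a relationship that always moves in the same direction but is not linear, the sketch below uses made-up data (not from the source) in which each value of B is the cube of the corresponding value of A. The helper names are hypothetical, as in the earlier sketches.

```python
import math

def pearson(a, b):
    # Pearson correlation of two equal-length sequences
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return cov / math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))

def rank(values):
    # Ranks 1..n, lowest value = 1; ties share the average rank (assumed convention)
    s = sorted(values)
    return [s.index(v) + 1 + (s.count(v) - 1) / 2 for v in values]

def spearman(a, b):
    # Pearson correlation of the ranks
    return pearson(rank(a), rank(b))

a = [1, 2, 3, 4, 5]
b = [x ** 3 for x in a]  # [1, 8, 27, 64, 125]: same ordering as a, but curved

print(spearman(a, b))           # 1.0
print(round(pearson(a, b), 3))  # 0.943
```

Because B rises whenever A rises, the ranks match exactly and the Spearman correlation is 1, while the Pearson correlation is below 1 because the points do not lie on a straight line.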
The Pearson correlation is well suited to measuring the strength of the linear relationship between two normally distributed data sets. The Spearman correlation method is better for data that is not normally distributed. However, converting returns to ranks discards some of the information in the data, so the Pearson correlation should be used as long as the data is approximately normally distributed.

Source: datalab.morningstar.com