A robust Spearman correlation coefficient permutation test

Han Yu; Alan D Hutson

doi:10.1080/03610926.2022.2121144

. Author manuscript; available in PMC: 2025 Jan 1.

Published in final edited form as: Commun Stat Theory Methods. 2022 Sep 9;53(6):2141–2153. doi: 10.1080/03610926.2022.2121144

A robust Spearman correlation coefficient permutation test

Han Yu ^1,^*, Alan D Hutson ¹

PMCID: PMC11029148 NIHMSID: NIHMS1885371 PMID: 38646087

Abstract

In this work, we show that Spearman’s correlation coefficient test about $H_{0} : ρ_{s} = 0$ found in most statistical software is theoretically incorrect and performs poorly when bivariate normality assumptions are not met or the sample size is small. There is common misconception that the tests about $ρ_{s} = 0$ are robust to deviations from bivariate normality. However, we found under certain scenarios violation of the bivariate normality assumption has severe effects on type I error control for the common tests. To address this issue, we developed a robust permutation test for testing the hypothesis $H_{0} : ρ_{s} = 0$ based on an appropriately studentized statistic. We will show that the test is asymptotically valid in general settings. This was demonstrated by a comprehensive set of simulation studies, where the proposed test exhibits robust type I error control, even when the sample size is small. We also demonstrated the application of this test in two real world examples.

Keywords: rank correlation, studentized, small sample, non-normality

1. Introduction

The concept of correlation and regression was originally conceived by Galton when studying how strongly the characteristics of one generation of living things manifested in the following generation (Stanton, 2001). The ideas prompting the development of more mathematically rigorous treatment of correlation were developed by Karl Pearson in 1896, which yielded the well-known Pearson Product Moment Correlation Coefficient (Pearson, 1896) given as

ρ (X, Y) = \frac{C o v (X, Y)}{σ_{X} σ_{Y}} = \frac{E (X - μ_{X}) (Y - μ_{Y})}{\sqrt{E (X - μ_{X})^{2} E (Y - μ_{Y})^{2}}},

(1)

where $X$ and $Y$ are two random variables from a non-degenerative joint distribution $F_{X Y}$ , $C o v (X, Y)$ denotes the covariance, and $μ_{X}$ and $μ_{Y}$ , $σ_{X}$ and $σ_{Y}$ are the population means and standard deviations, respectively. If we let $(X_{1}, Y_{1}), (X_{2}, Y_{2}), \dots, (X_{n}, Y_{n})$ denote $n$ paired i.i.d. observations, then the sample Pearson correlation coefficient is given as

r (X, Y) = \frac{\sum_{i = 1}^{n} (X_{i} - \overset{‒}{X}) (Y_{i} - \overset{‒}{Y})}{\sqrt{\sum_{i = 1}^{n} (X_{i} - \overset{‒}{X})^{2}} \sqrt{\sum_{i = 1}^{n} (Y_{i} - \overset{‒}{Y})^{2}}},

(2)

where $\overset{‒}{X}$ and $\overset{‒}{Y}$ are the sample means of $X$ and $Y$ , respectively.

Shortly after Pearson’s work was published, Spearman introduced the rank correlation coefficient in 1904. The Spearman correlation has advantages of being robust to extreme values and disparities between the marginal distributions between two variables (Spearman, 1961). However, it should be noted that K. Pearson, in his biography of Galton, says that the latter "dealt with the correlation of ranks before he even reached the correlation of variates, i.e. about 1875", but Galton apparently did not publish anything explicitly about this result (Kendall and Stuart, 1979). Mathematically, Spearman’s correlation coefficient is defined as the Pearson correlation coefficient on the ranks of $(X_{1}, Y_{1}), (X_{2}, Y_{2}), \dots, (X_{n}, Y_{n})$ and denoted as

ρ_{s} (X, Y) = ρ (a, b),

(3)

where $a_{i} = R a n k (X_{i})$ and $b_{i} = R a n k (Y_{i})$ are the ranks of $X_{i}$ and $Y_{i}$ , respectively, $i = 1, 2, \dots, n$ . In general, when discussing the Spearman correlation coefficient little attention is given to its population measure. However, if we consider that $a_{i} ∕ n$ converges to $F (X_{i})$ , then one may consider the population measure linked to Spearman’s sample correlation coefficient as,

ρ_{s} (X, Y) = \frac{E {F_{X} (X) - E [F_{X} (X)]} {F_{Y} (Y) - E [F_{Y} (Y)]}}{\sqrt{E {F_{X} (X) - E [F_{X} (X)]}^{2}} \sqrt{E {F_{Y} (Y) - E [F_{Y} (Y)]}^{2}}} = \frac{E [F_{X} (X) - \frac{1}{2}] [F_{Y} (Y) - \frac{1}{2}]}{\sqrt{E [F_{X} (X) - \frac{1}{2}]^{2}} \sqrt{E [F_{Y} (Y) - \frac{1}{2}]^{2}}},

(4)

where $F_{X}$ and $F_{Y}$ are the marginal cumulative distribution functions (CDFs) for $X$ and $Y$ , respectively. The expectation $E$ is taken over the joint distribution $F (X, Y)$ . The sample estimator of $ρ_{s}$ can be obtained by replacing the original observations with their ranks in Equation 2,

r_{s} (X, Y) = r (a, b) = \frac{\sum_{i = 1}^{n} (a_{i} - \overset{‒}{a}) (b_{i} - \overset{‒}{b})}{\sqrt{\sum_{i = 1}^{n} (a_{i} - \overset{‒}{a})^{2}} \sqrt{\sum_{i = 1}^{n} (b_{i} - \overset{‒}{b})^{2}}} = 1 - \frac{6 \sum_{i = 1}^{n} (a_{i} - b_{i})^{2}}{n (n^{2} - 1)} .

(5)

For samples from a bivariate normal population, there is also a known relation between the Spearman and Pearson correlation coefficients (Moran, 1948), which is

E (r_{s}) = \frac{6}{π (n + 1)} {\sin^{- 1} ρ + (n - 2) \sin^{- 1} \frac{1}{2} ρ} .

(6)

While Pearson’s $ρ$ measures the linear relationship between two random variables it is often described that Spearman’s $ρ_{s}$ measures the strength and direction of association between $X$ and $Y$ monotonically, thus it may be considered a more general measure of association, albeit it does measure the linear association between $F (X)$ and $F (Y)$ . Spearman’s correlation coefficient is also less sensitive to extreme values because it is rank based. Due to these advantages, it is widely used as a measure of association between two measurements. It is often of interest to test whether two random variables are correlated, i.e. $H_{0} : ρ_{s} = 0$ . The common methods include (1) a $t$ -distribution based test, (2) a test based on Fisher’s $Z$ transformation, and (3) what we term the naive permutation test.

The $t$ -distribution based test is commonly used when the sample size is large, with the $t$ -statistic defined as

t = r_{s} \sqrt{\frac{n - 2}{1 - r_{s}^{2}}} .

This statistic was first used for Pearson’s correlation coefficient. Under bivariate normality assumptions it approximately follows student’s $t$ distribution with $n - 2$ degrees of freedom under $H_{0}$ (Edgell and Noon, 1984). When used for Spearman’s $ρ_{s}$ , this test incorrectly based on the approximate bivariate normality of the ranks. For the test based on Fisher’s $Z$ transformation, the statistic is defined as

Z = \frac{1}{2} arctanh (r_{s}) = \frac{1}{2} ln \frac{1 + r_{s}}{1 - r_{s}} .

Under bivariate normality assumptions the transformed $Z$ statistic approximately follows normal distribution $N (0, \frac{1.06}{n - 3})$ under $H_{0}$ (Fieller et al., 1957).

Permutation tests have been applied in a broad range of scenarios (Pauly and Smaga, 2020). For small sample size cases, naive permutation tests are also often used for testing $ρ_{s}$ , where ${X_{i}, i = 1, \dots, n}$ and ${Y_{i}, i = 1, \dots, n}$ are randomly shuffled separately and independently to simulate the sample distribution of $r_{s}$ under $H_{0}$ , which may be an invalid test for testing $H_{0} : ρ_{s} = 0$ given $G (F (X), F (Y)) = G (F (X)) G (F (Y))$ does not imply $ρ_{s} = 0$ , where $G (\cdot, \cdot)$ denotes the joint CDF of $F (X)$ , $F (Y)$ . For example, Marozzi proposed a permutation test for Kendall $W$ coefficient, which is a rank-based measure of concordance between multiple criteria (Marozzi, 2014). When there are only two judges, the Kendall $W$ is a linear transformation of $ρ_{s}$ , so the tests on these two coefficients are equivalent. However, similar to the naive permutation test, the method did not account for the difference between independence and zero correlations.

These tests are so widely used that they are often the default options in common statistical software packages such as R (Team et al., 2013) and SAS (Institute, 2015). Both software packages (cor.test function for R and CORR procedure for SAS) by default uses the $t$ -distribution based test, as shown in their documentations. However, there is little discussion that these tests rely on the untenable assumption that the underlying sample distribution of the ranks has a bivariate normal distribution, which is in fact an impossibility. Even among those who noted this assumption, there is a misconception that the above tests are robust to such deviations because Spearman’s $ρ_{s}$ is rank based. This is exemplified in a discussion by Fieller et al. in their article (Fieller et al., 1957),

Conversely, starting from any bivariate distribution $ϕ (X, Y)$ we can always find monotonic transformations $X = f (x)$ , $Y = g (y)$ to standardized normal variates $x$ and $y$ . The resulting bivariate distribution $ψ (X, Y)$ will not necessarily be bivariate normal, but we think it likely that in practical stations it would not differ greatly from this form. This is a field in which further investigation would be of considerable interest.

However, as we will show in Section 3, all the commonly used tests about $ρ_{s} = 0$ as discussed above, including the naive permutation test, are not even asymptotically valid when the exchangeabilty assumptions are violated under $H_{0}$ . That is, even when $n \to \infty$ , the type I error cannot be controlled at the desired level. In some cases, the type I error can severely drift away from the desired level as the sample size increases! An undesirable feature that is more notable in the era of “big-data”. Another variation of this approach is the Fisher-Yates coefficient, which transforms the original $X$ and $Y$ to their corresponding normal quantiles before the testing (Fisher and Yates, 1938). Although the marginal distributions of the transformed variates take a pseudo normal form, the joint normality of these transformed values is not guaranteed.

In terms of our modified permutation test it is important to note the classic large sample result in Serfling where the “distribution free” large sample normal approximation for the sampling distribution for Pearson’s sample correlation coefficient $r$ is derived using the multivariate delta method (Kendall and Stuart, 1979). This method guarantees type I error converges to $α$ when $n \to \infty$ given finite fourth moments. A straightforward way to obtain a similar result for Spearman’s correlation is given by replacing the $(X_{i}, Y_{i})$ with $(a_{i}, b_{i})$ . The test is asymptotically valid because the ranks are asymptotically independent, as we will discuss in Section 2. Even though large sample approximations about these estimators are asymptotically valid they tend to suffer inflated type I errors in the small sample setting, e.g. $n < 50$ .

Permutation tests provide a strong alternative testing approach. The permutation approach has been applied to a variety of univariate and multivariate problems and its properties have been extensively studied (Giancristofaro and Bonnini, 2008; Arboretti Giancristofaro et al., 2009; Pesarin and Salmaso, 2010a; Basso and Salmaso, 2011; Pesarin and Salmaso, 2012, 2013; Arboretti et al., 2014; Giancristofaro, 2014; Salmaso, 2015). For an overview of permutation tests, please see Salmaso and Pesarin (2010) and Pesarin and Salmaso (2010b). Recently, DiCiccio and Romano have shown that the permutation distribution of Pearson’s correlation coefficient does not converge to the sampling distribution when two random variables are dependent but uncorrelated (DiCiccio and Romano, 2017). Therefore, a a simple permutation test ignoring possible dependency structures can lead to invalid inference about the Pearson’s and related correlation coefficients (DiCiccio and Romano, 2017; Hutson and Yu, 2021).

In this work, we show that a naive permutation test of $ρ_{s}$ suffers a similar problem. To address this issue, we propose a studentized permutation test for Spearman’s correlation $ρ_{s}$ , which extends the work or Diccicio and Romano for Pearson’s correlation coefficient $ρ$ (DiCiccio and Romano, 2017). We will show that the proposed test is asymptotically valid under general assumptions and is exact under exchangeability assumptions when $G (F (X), F (Y)) = G (F (X)) G (F (Y))$ , i.e. more simply when $X$ and $Y$ are independent. We show that our newly proposed test has robust Type I error control even when the exchangeability assumption does not hold and the sample distribution is non-normal. Importantly, even when the sample size is as small as 10, the type I error is still well controlled, which is advantageous over the tests based on large sample approximations. This will be illustrated by a set of simulation studies. Finally, we will demonstrate the application of this test in real world examples of transcriptomic data of TCGA breast cancer patients, as well as a data set of PSA levels and age.

2. Methods

2.1. Spearman’s permutation correlation test

Spearman’s coefficient $ρ_{s} (X, Y)$ is the Pearson correlation coefficient of the ranks of $X$ and $Y$ , that is $ρ (a, b)$ . When there are ties in the data, the tied ranks are typically taken as an average. Unlike Pearson’s correlation coefficient $ρ$ , which measures the linear relationship between two random variables, Spearman’s correlation coefficient $ρ_{s}$ measures a monotonic association, thus is far less restrictive. Note that Spearman’s correlation coefficient is also the linear measure between $F (X)$ and $F (Y)$ . It is also less sensitive to non-normality or extreme values.

Despite the above advantages, it is a misconception that the tests of $ρ_{s} = 0$ based on the bivariate normality assumptions underlying the original data will be robust to the deviation from this assumption. In fact, tests of $ρ_{s}$ typically suffer similar issue as for $ρ$ . We also emphasize that “normality” refers to the joint normality as opposed to marginal normality, because two random variables that are marginally normal can have a joint non-normal distribution. Therefore, the Fisher-Yates coefficient, which back transforms a variables rank through the normal quantile function does not provide what heuristically one may consider as a simple correction. In Section 3, we will empirically show that violation of the joint normality assumption will have severe effect on type I error control. In addition, it is in fact impossible for the joint distribution of the ranks to be bivariate normal.

Our approach is to replace the observations $(X_{i}, Y_{i})$ with their ranks $(a_{i}, b_{i})$ in order to develop a Spearman’s correlation permutation test analog to the Pearson’s correlation permutation test, with some subtle differences. The studentized permutation test of the Pearson’s $ρ$ proposed by DiCiccio and Romano only requires finite fourth moments and that the paired observations are i.i.d (DiCiccio and Romano, 2017), i.e. $(X_{i}, Y_{i})$ and $(X_{j}, Y_{j})$ are independent and identically distributed when $i \neq j$ . However, it should be noted the ranks $(a_{i}, b_{i})$ and $(a_{j}, b_{j})$ are no longer independent pairs. For example, suppose we have $n = 2$ , then after knowing $(a_{1}, b_{1}) = (1, 1)$ we will immediately know $(a_{2}, b_{2}) = (2, 2)$ .

In spite of this, we can show that $(a_{i}, b_{i})$ and $(a_{j}, b_{j})$ are asymptotically independent and follow identical distributions when $i \neq j$ . This result follows naturally from the convergence of empirical CDF to CDF. Without loss of generality, we start by considering a single variable $X$ , Since $X_{i}$ and $X_{j}$ are i.i.d. observations, we have $W_{i} = F (X_{i})$ , $W_{j} = F (X_{j})$ being i.i.d as well, so $F (W_{i}, W_{j}) = F (W_{i}) F (W_{j})$ . Also we have $a_{i}$ and $a_{j}$ being the ranks of $X_{i}$ and $X_{j}$ , so $a_{i} = n \hat{F} (X_{i})$ and $a_{j} = n \hat{F} (X_{j})$ , where $\hat{F}$ is the empirical CDF of $X$ . By the Strong Law of Large Numbers, we have $(\hat{F} (X_{i}), \hat{F} (X_{i})) \to (W_{i}, W_{j})$ almost surely, thus $a_{i} ∕ n$ and $a_{j} ∕ n$ converge almost surely to a pair of independent variables $W_{i}$ and $W_{j}$ , Therefore, $a_{i}$ and $a_{j}$ are asymptotically independent. Note that this result is on two observations of the same random variable, i.e. $a_{i}$ and $a_{j}$ . It does not concern the dependency between $X_{i}$ and $Y_{i}$ , or between $a_{i}$ and $b_{j}$ .

It is obvious that $a_{i}$ and $a_{j}$ follow the same distribution, and the same conclusion applies to $b_{i}$ and $b_{j}$ . Therefore, two paired observations $(a_{i}, b_{i})$ and $(a_{j}, b_{j})$ are identically, and asymptotically independently distributed when $i \neq j$ . Consequently, the exchangeability condition of i.i.d observations holds asymptotically, so the test will be asymptotically exact.

With the above results, the one-sided studentized permutation test for testing $H_{0} : ρ_{s} = 0$ versus $H_{1} : ρ_{s} > 0$ is performed by the following steps. The test is implemented in the R perk (permutation tests of correlation c(k)oefficients) package, which will be available on CRAN (The Comprehensive R Archive Network, https://cran.r-project.org/) and GitHub (https://github.com/hyu-ub/perk).

For $n$ paired i.i.d. observations $(X_{1}, Y_{1}), (X_{2}, Y_{2}), \dots, (X_{n}, Y_{n})$ , calculate their ranks within each random variable, $(a_{1}, b_{1}), (a_{2}, b_{2}), \dots, (a_{n}, b_{n})$ .
Estimate the Spearman’s $ρ_{s}$ using Equation 5 as $r_{s}$ .
Estimate the variance of sample estimates $r_{s}$ by
${\hat{τ}}_{n}^{2} = \frac{{\hat{μ}}_{22}}{{\hat{μ}}_{20} {\hat{μ}}_{02}}, {\hat{μ}}_{p q} = \frac{1}{n} \sum_{i = 1}^{n} (a_{i} - \overset{‒}{a})^{p} (b_{i} - \overset{‒}{b})^{q} .$
Calculate the studentized statistic $R_{s} = r_{s} ∕ {\hat{τ}}_{n}$ .
Randomly shuffle ${b_{1}, b_{2}, \dots, b_{n}}$ for $B$ times. For each permutation, calculate the permuted studentized statistic $R_{s}^{k}$ , $k \in {1, \dots, B}$ . Note that ${\hat{τ}}_{n}^{2}$ needs to be calculated for each permutation because ${\hat{μ}}_{22}$ is not permutation invariant. Specifically, for permutation $π$ , we have
${\hat{μ}}_{22}^{*} = \frac{1}{n} \sum_{i = 1}^{n} (a_{i} - \overset{‒}{a})^{2} (b_{π (i)} - \overset{‒}{b})^{2}, {\hat{τ}}_{n}^{2 (*)} = \frac{{\hat{μ}}_{22}^{*}}{{\hat{μ}}_{20} {\hat{μ}}_{02}} .$
Calculate the p-value by
$p = \frac{1}{B} Σ_{k = 1}^{B} I (R_{s}^{k} > R_{s}) .$
Reject $H_{0}$ if $p \leq α$ .

3. Simulations

We examined the Type I error control across all of the tests introduced above using distributions commonly found in the literature for these examinations across a wide range of settings (DiCiccio and Romano, 2017; Hutson, 2019). For our simulation study, we focused on testing $H_{0} : ρ_{s} = 0$ versus $H_{1} : ρ_{s} > 0$ , with sample sizes $n$ = 10, 25, 50, 100, 200. Each simulation utilized 10, 000 Monte Carlo replications and the number of permutations used is 1, 000. We compared the $t$ test, Fisher’s $Z$ -transformation (Fisher’s $Z$ ), Fisher-Yates method, Serfling’s large sample normal approximation (Asymp Norm), naive permutation test (Permute), and studentized permutation test (Stu Permute). The Type I error control for $α = 0.05$ was examined. The simulation scenarios 1 through 5 from DiCiccio and Romano. Two additional distributions were studied as well:

Multivariate normal (MVN) with mean zero and identity covariance.
Exponential given as $(X, Y) = r S^{T} u$ where $S = d i a g (\sqrt{2}, 1)$ , $r \sim exp (1)$ , and $u$ is uniformly distributed on the two dimensional unit circle.
$t_{4.1}$ where $X = W + Z$ and $Y = W - Z$ , where $W$ and $Z$ are iid $t_{4.1}$ random variables.
Circular given as the uniform distribution on a two dimensional unit circle.
Multivariate $t$ -distribution (MVT) with 5 degrees of freedom.
Mixture of two bivariate normal distributions given as $(X, Y) = W Z_{1} + (1 - W) Z_{2}$ where $W \sim B e r n o u l l i (0.5)$ , $Z_{1} \sim N ((\begin{matrix} 0 \\ 0 \end{matrix}), (\begin{matrix} 1 & ρ \\ ρ & 1 \end{matrix}))$ , $Z_{2} \sim N ((\begin{matrix} 0 \\ 0 \end{matrix}), (\begin{matrix} 1 & - ρ \\ - ρ & 1 \end{matrix}))$ . We select a range of $ρ$ ’s: 0.1, 0.3, 0.6 and 0.9 to simulate different degrees of dependencies between $X$ and $Y$ (MVN 1, MVN 3, MVN6, MVN 9).
Mixture of four bivariate normal distributions (MVN 45), given as $(X, Y) = Σ_{k = 1}^{4} I (W = k) Z_{k}$ where $P (W = k) = 0.25$ , $k = 1, 2, 3, 4$ . In addition, $Z_{k} \sim N (μ_{k}, I_{2})$ , where $μ_{1} = (5, 5)^{T}$ , $μ_{2} = (5, - 5)^{T}$ , $μ_{3} = (- 5, 5)^{T}$ , $μ_{4} = (- 5, - 5)^{T}$ .

The results in Table 1 and Figure 2 show that the large sample asymptotic normal approximation has inflated type I error rates for all distributions when $n \leq 50$ . The $t$ test, Fisher’s $Z$ test, Fisher-Yates, and naive permutation tests tend to be over-conservative for the exponential and circular distributions. While for the $t_{4.1}$ distribution, the type I error is consistently inflated. Note that, for these tests, such deviation cannot be corrected as sample size increases. Instead, they may converge to an arbitrary level, either lower or higher than $α$ .

Table 1:

Type I error rate of testing $H_{0} : ρ_{s} = 0$ versus $H_{0} : ρ_{s} > 0$ .

Distribution	n	t test	Fisher’s Z	Fisher-Yates	Asymp Norm	Permute	Stu Permute
MVN	10	0.0498	0.0420	0.0479	0.1223	0.0495	0.0457
	25	0.0501	0.0444	0.0499	0.0811	0.0494	0.0487
	50	0.0512	0.0473	0.0509	0.0639	0.0508	0.0525
	100	0.0529	0.0484	0.0527	0.0596	0.0530	0.0518
	200	0.0479	0.0438	0.0493	0.0525	0.0487	0.0496
Exponential	10	0.0707	0.0623	0.0813	0.1300	0.0715	0.0479
	25	0.0744	0.0689	0.0914	0.0804	0.0741	0.0492
	50	0.0795	0.0724	0.1061	0.0680	0.0798	0.0509
	100	0.0772	0.0706	0.0975	0.0561	0.0774	0.0497
	200	0.0789	0.0735	0.0998	0.0549	0.0785	0.0509
$t_{4.1}$	10	0.0625	0.0531	0.0693	0.1275	0.0606	0.0489
	25	0.0631	0.0574	0.0846	0.0768	0.0629	0.0480
	50	0.0631	0.0579	0.0910	0.0614	0.0628	0.0457
	100	0.0677	0.0611	0.1082	0.0567	0.0672	0.0491
	200	0.0728	0.0675	0.1149	0.0572	0.0731	0.0521
Circular	10	0.0124	0.0095	0.0041	0.0907	0.0124	0.0514
	25	0.0052	0.0040	0.0004	0.0620	0.0048	0.0474
	50	0.0031	0.0024	0.0001	0.0575	0.0030	0.0487
	100	0.0021	0.0015	< 0.0001	0.0537	0.0024	0.0497
	200	0.0016	0.0009	< 0.0001	0.0489	0.0020	0.0482
MVT	10	0.0582	0.0502	0.0626	0.1305	0.0584	0.0497
	25	0.0596	0.0543	0.0726	0.0825	0.0602	0.0496
	50	0.0542	0.0489	0.0700	0.0608	0.0553	0.0456
	100	0.0584	0.0535	0.0740	0.0566	0.0574	0.0496
	200	0.0590	0.0539	0.0852	0.0527	0.0590	0.0496
MVN 1	10	0.0541	0.0458	0.0499	0.1271	0.0531	0.0534
	25	0.0484	0.0451	0.0490	0.0777	0.0482	0.0498
	50	0.0481	0.0428	0.0478	0.0631	0.0484	0.0492
	100	0.0497	0.0452	0.0520	0.0564	0.0498	0.0497
	200	0.0495	0.0462	0.0523	0.0529	0.0493	0.0488
MVN 3	10	0.0547	0.0467	0.0539	0.1341	0.0544	0.0497
	25	0.0557	0.0514	0.0607	0.0814	0.0551	0.0516
	50	0.0571	0.0498	0.0607	0.0686	0.0562	0.0508
	100	0.0530	0.0473	0.0651	0.0561	0.0544	0.0497
	200	0.0529	0.0468	0.0603	0.0517	0.0508	0.0473
MVN 6	10	0.0648	0.0555	0.0740	0.1328	0.0643	0.0461
	25	0.0694	0.0624	0.0880	0.0832	0.0686	0.0526
	50	0.0658	0.0606	0.0917	0.0638	0.0650	0.0494
	100	0.0706	0.0644	0.0988	0.0590	0.0694	0.0498
	200	0.0678	0.0615	0.1034	0.0527	0.0667	0.0496
MVN 9	10	0.0944	0.0843	0.1173	0.1405	0.0924	0.0573
	25	0.0932	0.0865	0.1276	0.0829	0.0913	0.0502
	50	0.0965	0.0891	0.1414	0.0662	0.0948	0.0519
	100	0.0997	0.0928	0.1425	0.0593	0.0994	0.0498
	200	0.0950	0.0882	0.1531	0.0536	0.0953	0.0494
MVN 45	10	0.0564	0.0467	0.0487	0.1301	0.0547	0.0515
	25	0.0492	0.0444	0.0480	0.0791	0.0491	0.0482
	50	0.0481	0.0445	0.0507	0.0634	0.0482	0.0512
	100	0.0492	0.0447	0.0498	0.0557	0.0497	0.0488
	200	0.0521	0.0468	0.0492	0.0552	0.0509	0.0519

Open in a new tab

Figure 2: — Type I error rate of testing $H_{0} : ρ_{s} = 0$ versus $H_{1} : ρ_{s} > 0$ .

For MVN 1-9, we simulated a range of dependency among uncorrelated $X$ and $Y$ , where MVN 1 has the weakest and MVN 9 has the strongest dependency (Figure 1). The above four tests showed the type I error rate inflation becomes increasingly severe as the dependency increases. This demonstrates the failure in controlling type I error results from the data being dependent, which can occur when the underlying distribution is non-normal.

The MVN 45 is a case where the dependency of original data is remedied by using the ranks. In this case, the ranks will distribute as if it comes from a bivariate normal distribution, regardless of the distance between the centers of individual Gaussian subpopulations. Therefore, all four tests show well control of the type I error rate.

On the other hand, the studentized permutation test robustly control type I error for all distributions examined, even when the $n$ is as small as 10. This demonstrates a clear advantage of the proposed test over all other commonly used tests for Spearman’s correlation coefficient.

We also examined the power of different testing methods when $ρ_{s} = 0.3$ or 0.6 under the scenario of multivariate normal distributions. For other distributions, most of the tests fail to control the type I error except for the proposed test, so the comparison of power is not meaningful. Table 2 shows that the power of the proposed test is generally lower than that of the $t$ test and naive permutation test, but the difference is generally less than 3%. The exception is when $n = 10$ and $ρ_{s} = 0.6$ , where the loss of power is around 7%. However, this difference quickly diminishes when $n$ increases to 25. On the other hand, the proposed test has a comparable power to Fisher’s $Z$ test in almost all scenarios. These results demonstrate that the proposed test achieve a very robust type I error control at the cost of a small decrease in power.

Table 2:

Power of testing $H_{0} : ρ_{s} = 0$ versus $H_{0} : ρ_{s} > 0$ under bivariate normal distributions.

	n	t test	Fisher’s Z	Fisher-Yates	Asymp Norm	Permute	Stu Permute
$ρ_{s} = 0.3$	10	0.2130	0.1848	0.2076	0.3909	0.2087	0.1824
	25	0.4402	0.4202	0.4502	0.5248	0.4397	0.4143
	50	0.7036	0.6861	0.7282	0.7434	0.6996	0.6870
	100	0.9339	0.9264	0.9489	0.9406	0.9305	0.9285
	200	0.9965	0.9961	0.9983	0.9968	0.9964	0.9965
$ρ_{s} = 0.6$	10	0.5821	0.5461	0.5794	0.7599	0.5774	0.5090
	25	0.9376	0.9313	0.9502	0.9587	0.9356	0.9215
	50	0.9981	0.9977	0.9995	0.9986	0.9979	0.9973
	100	> 0.9999	> 0.9999	> 0.9999	> 0.9999	> 0.9999	> 0.9999
	200	> 0.9999	> 0.9999	> 0.9999	> 0.9999	> 0.9999	> 0.9999

Open in a new tab

All the bivariate distributions studies here are symmetric about the origin. However, because the Spearman’s correlation is invariant to monotonic transformations of $X$ and $Y$ , the results covered in our simulation studies can be readily extended to a wide range of asymmetric distributions, e.g. log-normal distributions.

4. Application

4.1. TCGA breast cancer data

As an illustration of our approach, we tested $H_{0} : ρ_{s} = 0$ versus $H_{1} : ρ_{s} > 0$ using The Cancer Genome Atlas (TCGA) breast cancer RNA sequencing (RNA-seq) data. The gene abundance was RSEM normalized (Li and Dewey, 2011). Fibroblast growth factor (FGF)2, FGF4, FGF7 and FGF20 are representative paracrine FGFs binding to heparan-sulfate proteoglycan and fibroblast growth factor receptors (FGFRs), whereas FGF19, FGF21 and FGF23 are endocrine FGFs binding to Klotho and FGFRs. FGFR1 is relatively frequently amplified and overexpressed in breast and lung cancer, and FGFE2 in gastric cancer. Moreover, FGF2 activates human dermal fibroblasts through transcriptional downregulation of the TP53 gene (Katoh, 2016). In this application, we examine whether the transcriptomic abundance of FGFR1 is correlated with that of TP53. To investigate the performance in small sample settings, we selected 18 samples from 17 mucinous carcinoma patients. The scatter plot of log-transformed TP53 and FGFR1 abundances is shown in Figure 3A. The marginal normality of data was examined by Shapiro-Wilk test and the bivariate normality was examined by Henze-Zikler test. The $p$ values of Shapiro-Wilk tests for log-transformed TP53 and FGFR1 abundances are 0.6077 and 0.0644, respectively. The $p$ value of Henze-Zikler test is 0.3478. Although there is no statistical significance, the marginal of FGFR1 abundance likely deviates from normal distribution.

Figure 3: — Scatter plot of log-transformed TP53 versus FGFR1 abundance for TCGA data (A), and age versus −log(PSA) for the PSA data (B).

The estimated Spearman’s correlation is ${\hat{ρ}}_{s} = 0.4056$ . Table 3 shows the results of hypothesis testing. Only the result of studentized permutation test is non-significant at $α = 0.05$ and suggests there is no evidence of positive correlation between TP53 and FGFR1. In fact the biology does not support a positive correlation either, since FGFR1 mediates negative regulation of TP53 by FGF2 at transcriptional level (Katoh, 2016). Indeed, if we include all samples ( $n \approx 1200$ ) from TCGA breast cancer cohort, then all tests will fail to reject $H_{1}$ with $p$ -values over 0.5 except for Fisher-Yates test. Together with the results from the simulations, the result by studentized permutation test is clearly more reliable.

Table 3:

Results of testing $H_{0} : ρ_{s} = 0$ versus $H_{1} : ρ_{s} > 0$ for TCGA breast cancer data and PSA data.

Tests	p value (TCGA)	p value (PSA)
t test	0.047	< 0.001
Fisher’s Z	< 0.001	< 0.001
Fisher-Yates	< 0.001	< 0.001
Asymp Norm	0.039	< 0.001
Permute	0.033	< 0.001
Stu Permute	0.081	< 0.001

Open in a new tab

4.2. PSA data

The testing methods were also applied to a data set of age and baseline prostate-specific antigen (PSA) levels (Sweeney et al., 2015). The data consists of age and PSA levels of 480 subjects, of which 473 have complete paired observations. The sample Spearman’s correlation coefficient between age and PSA is ${\hat{ρ}}_{s} = - 0.1622$ . Since the alternative hypothesis of proposed test is $H_{1} : ρ_{s} > 0$ , we applied a negative log transformation on PSA levels. Similar as the TCGA example, the marginal normality of data was examined by Shapiro-Wilk test, and the bivariate normality was examined by Henze-Zikler test. The $p$ values of Shapiro-Wilk tests for log-transformed age and PSA levels are 0.0208 and 0.0301, respectively. The $p$ value of Henze-Zikler test is < 0.0001. The results indicates the distribution is not bivariate normal. Figure 3B shows the scatter plot of age versus −log(PSA). Table 3 shows that all tests reject the $H_{0}$ and conclude there is a non-zero correlation between age and PSA. The is an example where all tests have consistent results when there is a true correlation. Although the normality tests are significant, such deviation may have been remedied by using the ranks in this specific example.

5. Discussion

Conventional tests of the Spearman’s correlation rely on normality assumption, including $t$ test, Fisher’s $Z$ transformation, and naive permutation test, which fails to control Type I error rates when the assumption is violated. This was illustrated in our simulations studies (Section 3). Such defect cannot be remedied by transforming the marginal distributions such as by Fisher-Yates coefficient. Notably, the deviation from bivariate normality can result in a convergence of type I error rate to an arbitrary level when $n \to \infty$ . This indicates that, under scenarios when two random variables are uncorrelated but dependent, the type I error will not be controlled at desired level no matter how large the sample size is. On the other hand, Serfling’s test based on the delta method guarantees that the type I error rate converges to $α$ as long as the fourth order moment is finite. However, it typically suffers an inflated type I error when sample size is under 50. It should be noted that although we only examined Spearman’s correlation, a similar phenomenon is expected for other rank-based correlation as well, such as Kendall’s $τ$ . The tests based on large sample theories are also anticipated to have inflated type I error when the sample size is small.

We present a robust Spearman’s correlation permutation test based on studentized statistic for testing $H_{0} : ρ_{s} = 0$ versus $H_{1} : ρ_{s} > 0$ . The proposed approach is inspired by the work by DiCiccio and Romano (DiCiccio and Romano, 2017), which was developed for Pearson’s correlation. Through extensive simulation studies and real world application, we show the proposed test controls type I error even when sample size is as small as 10 and normality assumption is violated. Therefore, the test is valid in testing monotonic correlations under more general scenarios of underlying distributions and sample sizes. In addition, the studentized statistic can also be used for bootstrapping tests, so as to test for more general point null hypotheses (Hutson, 2019). In conclusion, the proposed studentized permutation test should be used as a routine for testing non-zero Spearman’s correlation coefficient.

Acknowledgements

This work was supported by Roswell Park Cancer Institute and National Cancer Institute (NCI) grant P30CA016056, NRG Oncology Statistical and Data Management Center grant U10CA180822 and IOTN Moonshot grant U24CA232979-01. The results shown here are in part based upon data generated by the TCGA Research Network: https://www.cancer.gov/tcga. The PSA data example is based on research using information obtained from www.projectdatasphere.org, which is maintained by Project Data Sphere, LLC. Neither Project Data Sphere, LLC nor the owner(s) of any information from the website have contributed to, approved or are in any way responsible for the contents of this publication.

References

Arboretti R, Bonnini S, Corain L, and Salmaso L (2014). A permutation approach for ranking of multivariate populations. Journal of Multivariate Analysis, 132:39–57. [Google Scholar]
Arboretti Giancristofaro R, Bonnini S, and Pesarin F (2009). A permutation approach for testing heterogeneity in two-sample categorical variables. Statistics and Computing, 19(2):209–216. [Google Scholar]
Basso D and Salmaso L (2011). A permutation test for umbrella alternatives. Statistics and Computing, 21(1):45–54. [Google Scholar]
DiCiccio CJ and Romano JP. (2017). Robust permutation tests for correlation and regression coefficients. Journal of the American Statistical Association, 112(519):1211–1220. [Google Scholar]
Edgell SE and Noon SM (1984). Effect of violation of normality on the t test of the correlation coefficient. Psychological bulletin, 95(3):576. [Google Scholar]
Fieller EC, Hartley HO, and Pearson ES (1957). Tests for rank correlation coefficients. i. Biometrika, 44(3/4):470–481. [Google Scholar]
Fisher RA and Yates F (1938). Statistical tables: For biological, agricultural and medical research. Oliver and Boyd. [Google Scholar]
Giancristofaro RA (2014). Permutation solutions for multivariate ranking and testing with applications. Communications in Statistics-Theory and Methods, 43(4):891–905. [Google Scholar]
Giancristofaro RA and Bonnini S (2008). Moment-based multivariate permutation tests for ordinal categorical data. Journal of Non-parametric Statistics, 20(5):383–393. [Google Scholar]
Hutson AD (2019). A robust pearson correlation test for a general point null using a surrogate bootstrap distribution. Plos one, 14(5):e0216287. [DOI] [PMC free article] [PubMed] [Google Scholar]
Hutson AD and Yu H (2021). A robust permutation test for the concordance correlation coefficient. Pharmaceutical Statistics, 20(4):696–709. [DOI] [PMC free article] [PubMed] [Google Scholar]
Institute, S. (2015). Base SAS 9.4 procedures guide. SAS Institute. [Google Scholar]
Katoh M (2016). Fgfr inhibitors: Effects on cancer cells, tumor microenvironment and whole-body homeostasis. International journal of molecular medicine, 38(1):3–15. [DOI] [PMC free article] [PubMed] [Google Scholar]
Kendall M and Stuart A (1979). The advanced theory of statistics, 2: 240–80. London: Charles Griffin. [Google Scholar]
Li B and Dewey CN (2011). Rsem: accurate transcript quantification from rna-seq data with or without a reference genome. BMC bioinformatics, 12(1):323. [DOI] [PMC free article] [PubMed] [Google Scholar]
Marozzi M (2014). Testing for concordance between several criteria. Journal of Statistical Computation and Simulation, 84(9):1843–1850. [Google Scholar]
Moran P (1948). Rank correlation and product-moment correlation. Biometrika, 35(1/2):203–206. [PubMed] [Google Scholar]
Pauly M and Smaga L (2020). Asymptotic permutation tests for coefficients of variation and standardised means in general one-way anova models. Statistical Methods in Medical Research, pages 2733–2748. [DOI] [PubMed] [Google Scholar]
Pearson K (1896). Vii. mathematical contributions to the theory of evolution.—iii. regression, heredity, and panmixia. Philosophical Transactions of the Royal Society of London. Series A, containing papers of a mathematical or physical character, (187):253–318. [Google Scholar]
Pesarin F and Salmaso L (2010a). Finite-sample consistency of combination-based permutation tests with application to repeated measures designs. Journal of Non-parametric Statistics, 22(5):669–684. [Google Scholar]
Pesarin F and Salmaso L (2010b). The permutation testing approach: a review. Statistica, 70(4):481–509. [Google Scholar]
Pesarin F and Salmaso L (2012). A review and some new results on permutation testing for multivariate problems. Statistics and Computing, 22(2):639–646. [Google Scholar]
Pesarin F and Salmaso L (2013). On the weak consistency of permutation tests. Communications in Statistics-Simulation and Computation, 42(6):1368–1379. [Google Scholar]
Salmaso L (2015). Combination-based permutation tests: Equipower property and power behavior in presence of correlation. Communications in Statistics-Theory and Methods, 44(24):5225–5239. [Google Scholar]
Salmaso L and Pesarin F (2010). Permutation tests for complex data: theory, applications and software. John Wiley & Sons. [Google Scholar]
Spearman C (1961). The proof and measurement of association between two things. [DOI] [PubMed] [Google Scholar]
Stanton JM (2001). Galton, pearson, and the peas: A brief history of linear regression for statistics instructors. Journal of Statistics Education, 9(3). [Google Scholar]
Sweeney CJ, Chen Y-H, Carducci M, Liu G, Jarrard DF, Eisenberger M, Wong Y-N, Hahn N, Kohli M, Cooney MM, et al. (2015). Chemohormonal therapy in metastatic hormone-sensitive prostate cancer. New England Journal of Medicine, 373(8):737–746. [DOI] [PMC free article] [PubMed] [Google Scholar]
Team, R. C. et al. (2013). R: A language and environment for statistical computing. [Google Scholar]

[R1] Arboretti R, Bonnini S, Corain L, and Salmaso L (2014). A permutation approach for ranking of multivariate populations. Journal of Multivariate Analysis, 132:39–57. [Google Scholar]

[R2] Arboretti Giancristofaro R, Bonnini S, and Pesarin F (2009). A permutation approach for testing heterogeneity in two-sample categorical variables. Statistics and Computing, 19(2):209–216. [Google Scholar]

[R3] Basso D and Salmaso L (2011). A permutation test for umbrella alternatives. Statistics and Computing, 21(1):45–54. [Google Scholar]

[R4] DiCiccio CJ and Romano JP. (2017). Robust permutation tests for correlation and regression coefficients. Journal of the American Statistical Association, 112(519):1211–1220. [Google Scholar]

[R5] Edgell SE and Noon SM (1984). Effect of violation of normality on the t test of the correlation coefficient. Psychological bulletin, 95(3):576. [Google Scholar]

[R6] Fieller EC, Hartley HO, and Pearson ES (1957). Tests for rank correlation coefficients. i. Biometrika, 44(3/4):470–481. [Google Scholar]

[R7] Fisher RA and Yates F (1938). Statistical tables: For biological, agricultural and medical research. Oliver and Boyd. [Google Scholar]

[R8] Giancristofaro RA (2014). Permutation solutions for multivariate ranking and testing with applications. Communications in Statistics-Theory and Methods, 43(4):891–905. [Google Scholar]

[R9] Giancristofaro RA and Bonnini S (2008). Moment-based multivariate permutation tests for ordinal categorical data. Journal of Non-parametric Statistics, 20(5):383–393. [Google Scholar]

[R10] Hutson AD (2019). A robust pearson correlation test for a general point null using a surrogate bootstrap distribution. Plos one, 14(5):e0216287. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R11] Hutson AD and Yu H (2021). A robust permutation test for the concordance correlation coefficient. Pharmaceutical Statistics, 20(4):696–709. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R12] Institute, S. (2015). Base SAS 9.4 procedures guide. SAS Institute. [Google Scholar]

[R13] Katoh M (2016). Fgfr inhibitors: Effects on cancer cells, tumor microenvironment and whole-body homeostasis. International journal of molecular medicine, 38(1):3–15. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R14] Kendall M and Stuart A (1979). The advanced theory of statistics, 2: 240–80. London: Charles Griffin. [Google Scholar]

[R15] Li B and Dewey CN (2011). Rsem: accurate transcript quantification from rna-seq data with or without a reference genome. BMC bioinformatics, 12(1):323. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R16] Marozzi M (2014). Testing for concordance between several criteria. Journal of Statistical Computation and Simulation, 84(9):1843–1850. [Google Scholar]

[R17] Moran P (1948). Rank correlation and product-moment correlation. Biometrika, 35(1/2):203–206. [PubMed] [Google Scholar]

[R18] Pauly M and Smaga L (2020). Asymptotic permutation tests for coefficients of variation and standardised means in general one-way anova models. Statistical Methods in Medical Research, pages 2733–2748. [DOI] [PubMed] [Google Scholar]

[R19] Pearson K (1896). Vii. mathematical contributions to the theory of evolution.—iii. regression, heredity, and panmixia. Philosophical Transactions of the Royal Society of London. Series A, containing papers of a mathematical or physical character, (187):253–318. [Google Scholar]

[R20] Pesarin F and Salmaso L (2010a). Finite-sample consistency of combination-based permutation tests with application to repeated measures designs. Journal of Non-parametric Statistics, 22(5):669–684. [Google Scholar]

[R21] Pesarin F and Salmaso L (2010b). The permutation testing approach: a review. Statistica, 70(4):481–509. [Google Scholar]

[R22] Pesarin F and Salmaso L (2012). A review and some new results on permutation testing for multivariate problems. Statistics and Computing, 22(2):639–646. [Google Scholar]

[R23] Pesarin F and Salmaso L (2013). On the weak consistency of permutation tests. Communications in Statistics-Simulation and Computation, 42(6):1368–1379. [Google Scholar]

[R24] Salmaso L (2015). Combination-based permutation tests: Equipower property and power behavior in presence of correlation. Communications in Statistics-Theory and Methods, 44(24):5225–5239. [Google Scholar]

[R25] Salmaso L and Pesarin F (2010). Permutation tests for complex data: theory, applications and software. John Wiley & Sons. [Google Scholar]

[R26] Spearman C (1961). The proof and measurement of association between two things. [DOI] [PubMed] [Google Scholar]

[R27] Stanton JM (2001). Galton, pearson, and the peas: A brief history of linear regression for statistics instructors. Journal of Statistics Education, 9(3). [Google Scholar]

[R28] Sweeney CJ, Chen Y-H, Carducci M, Liu G, Jarrard DF, Eisenberger M, Wong Y-N, Hahn N, Kohli M, Cooney MM, et al. (2015). Chemohormonal therapy in metastatic hormone-sensitive prostate cancer. New England Journal of Medicine, 373(8):737–746. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R29] Team, R. C. et al. (2013). R: A language and environment for statistical computing. [Google Scholar]

PERMALINK

A robust Spearman correlation coefficient permutation test

Han Yu

Alan D Hutson

Abstract

1. Introduction

2. Methods

2.1. Spearman’s permutation correlation test