BMC Bioinformatics. Letter. 2016 Nov 15;17:462. doi: 10.1186/s12859-016-1322-0

Comments on: fold change rank ordering statistics: a new method for detecting differentially expressed genes

Doulaye Dembélé 1,2, Philippe Kastner 1,3
PMCID: PMC5111201  PMID: 27846811

Abstract

We published a new method (BMC Bioinformatics 2014, 15:14) for detecting differentially expressed genes in datasets from two biological conditions. The presentation of Theorem 1 in that paper was incomplete. We received an anonymous comment about our publication that motivates the present work. Here, we present a complementary result which is necessary from the theoretical point of view to demonstrate our theorem. We also show that this result has no negative impact on the conclusions we obtained with synthetic and experimental microarray datasets.

Keywords: Differentially expressed genes, Fold change, Average of ranks

Background

To search for differentially expressed (DE) genes in profiling studies, we presented a new method based on fold change rank ordering statistics (FCROS). For the derivation of this method, we considered microarray data from two biological conditions where $n$ probes (genes) were used with $m_1$ control and $m_2$ test samples. We performed $k$ pairwise comparisons ($k = m_1 m_2$) of the data samples and computed fold changes (FC) for each gene. The FCs obtained for each comparison were sorted in increasing order and their corresponding ranks were associated with genes. Hence, we can form a matrix of rank values $R$ with components $r_{ij}$ ($i = 1, 2, \ldots, n$; $j = 1, 2, \ldots, k$). We denote by $r_i = [r_{i1}\ r_{i2}\ \ldots\ r_{ik}]^T$ the vector of rank values associated with gene $i$, and by $\bar{r}_i$ the average of ranks (a.o.r.) value for gene $i$. The value of $\bar{r}_i$ varies between $a = \min_i\{\bar{r}_i\}$ and $b = \max_i\{\bar{r}_i\}$. This allows us to associate a unique vector of a.o.r. values with the $n$ genes: $\bar{r} = [a, (a+\delta_1), (a+\delta_1+\delta_2), \ldots, (a+\delta_1+\cdots+\delta_{n-2}), b]^T$, where the scalars $\delta_i$ are the differences between consecutive ordered a.o.r. values. Without loss of generality, we assumed that the differences $\delta_i$ have the same value, which is approximated by their mean: $\delta = \frac{b-a}{n-1}$. Using these notations, we derived a theorem showing a normal distribution for the vector $\bar{r}$ [1]. The content of this theorem was incomplete, as shown in the following lemma we received from an anonymous reader.
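The construction of the a.o.r. values described above can be illustrated with a minimal Python sketch. It is not the FCROS implementation from [1]: the function name `average_of_ranks`, the list-of-lists input layout, the assumption that expression values are already on a log scale (so that a fold change is a difference), and the tie-breaking by gene index are all choices made here for illustration only.

```python
from itertools import product

def average_of_ranks(control, test):
    """Return the average of ranks (a.o.r.) for each gene.

    control and test are lists of samples; each sample is a list of n
    expression values, assumed here to be on a log scale.
    """
    n = len(control[0])
    rank_sums = [0.0] * n
    k = 0
    # One comparison per (control sample, test sample) pair: k = m1 * m2.
    for c, t in product(control, test):
        fc = [t[i] - c[i] for i in range(n)]            # log fold change per gene
        order = sorted(range(n), key=lambda i: fc[i])   # increasing fold change
        for rank, gene in enumerate(order, start=1):    # ties broken by gene index
            rank_sums[gene] += rank
        k += 1
    return [s / k for s in rank_sums]

# Toy data: 3 genes, 2 control samples, 1 test sample (k = 2 comparisons).
aor = average_of_ranks([[1, 5, 3], [2, 5, 3]], [[4, 5, 2]])
```

In this toy example, gene 0 has the largest fold change in both comparisons and therefore the largest a.o.r., and the a.o.r. values always sum to n(n+1)/2 because each comparison assigns every rank from 1 to n exactly once.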

Lemma 1

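Consider the matrix of rank values $R$ under the assumption that the rank values in each column are all distinct. Assume a uniform random sampling without replacement model for the columns of $R$, i.e. each column of $R$ is an independent draw from the set of all permutations of $\{1,\ldots,n\}$ with uniform probability $\frac{1}{n!}$ for each permutation. Then, the asymptotic distribution of the unordered vector of averages of ranks (a.o.r.), $\bar{r} = \{\bar{r}_i = \frac{1}{k}\sum_{j=1}^{k} R_{ij},\ i \in 1 \ldots n\}$, has mean $\frac{n+1}{2}\mathbf{1}_n$ and a degenerate variance-covariance matrix $\Sigma_{(n,n)}$, $\det\Sigma = 0$:

$$\Sigma = \begin{bmatrix} \beta & \alpha & \cdots & \alpha \\ \alpha & \beta & \cdots & \alpha \\ \vdots & \vdots & \ddots & \vdots \\ \alpha & \alpha & \cdots & \beta \end{bmatrix} \tag{1}$$

with diagonal element $\beta = \frac{n^2-1}{12}$, off-diagonal element $\alpha = -\frac{\beta}{n-1}$ and $\mathbf{1}_n = [1, 1, \ldots, 1]^T$.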

Proof

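Note that for $k \to \infty$, all elements of the set $\{1,\ldots,n\}$ are equally likely to appear in each row of $R$ under the assumed sampling model; hence, by the weak law of large numbers ([2], page 235), the asymptotic mean is the constant vector $\frac{1}{n}\sum_{i=1}^{n} i\,\mathbf{1}_n = \frac{n+1}{2}\mathbf{1}_n$. Under the same observation, the asymptotic variance, $\forall \ell \in \{1,\ldots,n\}$, is equal to:

$$\mathrm{Var}\{\bar{r}_\ell\} \xrightarrow{k\to\infty} \beta = \frac{1}{n}\sum_{i=1}^{n}\left(i - \frac{n+1}{2}\right)^2 = \frac{n^2-1}{12} \tag{2}$$

The asymptotic covariance is computed as a two-index summation over the set $\{1,\ldots,n\}$ with the restriction that no two indices can be the same, since the columns are permutations by construction; hence, $\forall \ell \neq m \in \{1,\ldots,n\}$:

$$\mathrm{Cov}\{\bar{r}_\ell, \bar{r}_m\} \xrightarrow{k\to\infty} \alpha = \frac{1}{n(n-1)}\sum_{i=1}^{n}\sum_{\substack{j=1 \\ j\neq i}}^{n}\left(i - \frac{n+1}{2}\right)\left(j - \frac{n+1}{2}\right) \tag{3}$$

$$= \frac{1}{n(n-1)}\left[\left(\sum_{i=1}^{n}\left(i - \frac{n+1}{2}\right)\right)^2 - \sum_{i=1}^{n}\left(i - \frac{n+1}{2}\right)^2\right] \tag{4}$$

$$= -\frac{1}{n(n-1)}\sum_{i=1}^{n}\left(i - \frac{n+1}{2}\right)^2 = -\frac{\beta}{n-1}. \tag{5}$$

Thus, since $\Sigma\,\mathbf{1}_n = 0$, it follows that $\det\Sigma = 0$. □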
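The moments used in the lemma can be checked exactly for small n by enumerating every permutation of {1,…,n}. The sketch below is only an illustration: the helper name `rank_moments` is hypothetical, and the code computes the moments of the entries of a single uniformly drawn column (the quantities β and α of the lemma), using positions 0 and 1 as representatives of any pair of distinct rows by symmetry.

```python
from fractions import Fraction
from itertools import permutations

def rank_moments(n):
    """Exact mean, variance and covariance of the entries of a uniformly
    random permutation of {1, ..., n}, obtained by full enumeration."""
    perms = list(permutations(range(1, n + 1)))
    total = len(perms)                                   # n! permutations
    mean = Fraction(sum(p[0] for p in perms), total)
    var = Fraction(sum(p[0] ** 2 for p in perms), total) - mean ** 2
    cov = Fraction(sum(p[0] * p[1] for p in perms), total) - mean ** 2
    return mean, var, cov

n = 5
mean, beta, alpha = rank_moments(n)
# Each row of Sigma sums to beta + (n - 1) * alpha, which should be zero.
row_sum = beta + (n - 1) * alpha
```

For n = 5 this gives a mean of 3, a variance of 2 and a covariance of -1/2, so every row of the variance-covariance matrix sums to zero, which is the rank deficiency stated in the lemma. Full enumeration grows as n!, so the check is only practical for small n.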

This lemma shows that the covariance term was missing from our theorem. In the next section, we present a complete version of our theorem using the notations we adopted in [1].

Results

From our notations, we have $\bar{r} = [a, a+\delta, a+2\delta, \ldots, a+(n-1)\delta]^T$, the vector of the a.o.r. values. Each component of the vector $\bar{r}$ can be written as $R_\ell = (a + \ell\delta)$, $\ell = 0, 1, \ldots, n-1$. Theorem 1 in ([1], page 3) should read:

Theorem 1

When the number $k$ of pairwise comparisons grows, the ordered average of ranks (a.o.r.) $\bar{r}$ has a normal distribution. The mean of this distribution is $\frac{a+b}{2}\mathbf{1}_n$; its variance-covariance matrix has diagonal element $\frac{n^2-1}{12}\delta^2$ and off-diagonal element $-\frac{n+1}{12}\delta^2$, where $a$ and $b$ are the minimum and the maximum of the observed a.o.r. $\bar{r}$, respectively, and $\delta$ is the average difference between consecutive ordered a.o.r. $\bar{r}$.

Proof

From the following definitions:

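$$E\{R_\ell\} = \frac{1}{n}\sum_{\ell=1}^{n} R_\ell, \qquad \mathrm{Var}(R_\ell) = E\{R_\ell^2\} - E\{R_\ell\}^2, \qquad \mathrm{Cov}(R_\ell, R_m)_{\ell\neq m} = E\{R_\ell R_m\} - E\{R_\ell\}^2$$

and using $\delta = \frac{b-a}{n-1}$, a component of the mean of the normal distribution is:

$$E_{\ell=0}^{n-1}\{(a+\ell\delta)\} = \frac{1}{n}\sum_{\ell=0}^{n-1}(a+\ell\delta) = a + \frac{n-1}{2}\delta = \frac{b+a}{2}. \tag{6}$$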

A component of the variance (diagonal element) of the normal distribution matrix is:

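$$\begin{aligned} \mathrm{Var}(R_\ell) &= E_{\ell=0}^{n-1}\{(a+\ell\delta)^2\} - \left(a + \frac{n-1}{2}\delta\right)^2 \\ &= E_{\ell=0}^{n-1}\{a^2 + 2a\delta\ell + \delta^2\ell^2\} - \left(a + \frac{n-1}{2}\delta\right)^2 \\ &= \frac{1}{n}\left[n a^2 + 2\,\frac{n(n-1)}{2}\,a\delta + \frac{n(n-1)(2n-1)}{6}\,\delta^2\right] - \left(a + \frac{n-1}{2}\delta\right)^2 \\ &= \frac{n^2-1}{12}\delta^2. \end{aligned} \tag{7}$$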

A component of the covariance (off-diagonal element) of the normal distribution matrix is:

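$$\begin{aligned} \mathrm{Cov}(R_\ell, R_m)_{\ell\neq m} &= E_{\ell=0}^{n-1}\,{}_{\substack{m=0 \\ m\neq\ell}}^{n-1}\{(a+\ell\delta)(a+m\delta)\} - \left(a + \frac{n-1}{2}\delta\right)^2 \\ &= \frac{1}{n(n-1)}\left[\sum_{\ell=0}^{n-1}\sum_{m=0}^{n-1}\left(a^2 + a\delta\ell + a\delta m + \delta^2 m\ell\right) - \sum_{\ell=0}^{n-1}(a+\ell\delta)^2\right] - \left(a + \frac{n-1}{2}\delta\right)^2 \\ &= \frac{1}{n(n-1)}\left[n^2 a^2 + n^2(n-1)\,a\delta + \frac{n^2(n-1)^2}{4}\,\delta^2 - n a^2 - n(n-1)\,a\delta - \frac{n(n-1)(2n-1)}{6}\,\delta^2\right] - \left(a + \frac{n-1}{2}\delta\right)^2 \\ &= -\frac{n+1}{12}\delta^2. \end{aligned} \tag{8}$$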

By setting $a = \delta = 1$ and $b = n$ in Theorem 1, the mean and the variance-covariance component values are the same as in Lemma 1. These values of $a$, $b$ and $\delta$ correspond to the case we called the ideal situation ([1], page 4).
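The closed forms of Theorem 1 can be verified numerically by averaging directly over the equally spaced values a + ℓδ. The following sketch is an illustration under stated assumptions: the function name `grid_moments` is hypothetical, exact rational arithmetic is used to avoid rounding, and the covariance is taken as the average over all ordered pairs (ℓ, m) with ℓ ≠ m.

```python
from fractions import Fraction

def grid_moments(a, delta, n):
    """Mean, variance and covariance of the values a + l*delta, l = 0..n-1.

    The covariance averages the products over all ordered pairs (l, m)
    with l != m, mirroring the restriction used in the derivation.
    """
    a, delta = Fraction(a), Fraction(delta)
    vals = [a + l * delta for l in range(n)]
    total = sum(vals)
    total_sq = sum(v * v for v in vals)
    mean = total / n
    var = total_sq / n - mean ** 2
    # Sum over l != m of v_l * v_m equals (sum v)^2 - sum v^2.
    cov = (total ** 2 - total_sq) / (n * (n - 1)) - mean ** 2
    return mean, var, cov

a, delta, n = Fraction(3, 2), Fraction(1, 4), 7
b = a + (n - 1) * delta
mean, var, cov = grid_moments(a, delta, n)
```

With these arbitrary illustrative inputs, `mean` equals (a+b)/2, `var` equals (n²−1)δ²/12 and `cov` equals −(n+1)δ²/12. Calling `grid_moments(1, 1, n)` reproduces the ideal-situation values of Lemma 1.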

For the FCROS algorithm, we used the standardized rank value, i.e., each observed rank value is divided by $n$. The mean and variance-covariance components should therefore be divided by $n$ and $n^2$, respectively. This leads to a mean component $\bar{r} = \frac{1}{2} + \frac{1}{2n}$, and a variance-covariance matrix with a diagonal component $\beta = \frac{1}{12} - \frac{1}{12n^2}$ and an off-diagonal component $\alpha = -\left(\frac{1}{12} - \frac{1}{12n^2}\right)\frac{1}{n-1}$. Table 1 shows the values of $\bar{r}$, $\beta$ and $\alpha$ when $n$ increases. For large $n$, the off-diagonal components of the variance-covariance matrix vanish. Hence, when $n$ is large, good approximations for the mean and the variance components are $\frac{1}{2}$ and $\frac{1}{12}$, respectively.

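Table 1. Values of the mean, the variance and the covariance components when $n$ increases

| $n$ | 10 | 100 | 1,000 | 10,000 |
| --- | --- | --- | --- | --- |
| $\bar{r}$ | $\frac{1}{2} + 5\times10^{-2}$ | $\frac{1}{2} + 5\times10^{-3}$ | $\frac{1}{2} + 5\times10^{-4}$ | $\frac{1}{2} + 5\times10^{-5}$ |
| $\beta$ | $\frac{1}{12} - 8.33\times10^{-4}$ | $\frac{1}{12} - 8.33\times10^{-6}$ | $\frac{1}{12} - 8.33\times10^{-8}$ | $\frac{1}{12} - 8.33\times10^{-10}$ |
| $\alpha$ | $-9.17\times10^{-3}$ | $-8.4\times10^{-4}$ | $-8.34\times10^{-5}$ | $-8.33\times10^{-6}$ |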
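The entries of Table 1 follow directly from the standardized formulas for the mean, β and α. As a rough sketch (the function name `standardized_components` is an assumption made here, and floating-point arithmetic is used, so the last digits may differ slightly from the rounded table entries), they can be recomputed as follows:

```python
def standardized_components(n):
    """Mean, diagonal (beta) and off-diagonal (alpha) components for
    rank values divided by n."""
    mean = 0.5 + 1.0 / (2 * n)
    beta = 1.0 / 12 - 1.0 / (12 * n * n)
    alpha = -beta / (n - 1)
    return mean, beta, alpha

for n in (10, 100, 1000, 10000):
    mean, beta, alpha = standardized_components(n)
    # Print the offsets from 1/2 and 1/12 in the same form as Table 1.
    print(f"n={n:>6}  mean-1/2={mean - 0.5:.2e}  "
          f"1/12-beta={1 / 12 - beta:.2e}  alpha={alpha:.2e}")
```

The printed offsets show how quickly the mean approaches 1/2, β approaches 1/12 and α approaches 0 as n grows, which is the basis for the large-n approximation.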

Discussion and conclusions

As shown, the theorem we previously presented was incomplete since the covariance term was missing. The present complementary result is necessary from the theoretical point of view, and we are grateful to the anonymous reader for pointing this out. This result will be useful for small values of $n$. However, for high-throughput biological datasets, $n$ is large, often greater than 10,000 ([1], page 2). For such values of $n$, the rank-deficient variance-covariance matrix of the normal distribution associated with the a.o.r. values is close to a diagonal matrix. Hence, it is as if the a.o.r. value of each gene follows a normal distribution with parameters $\frac{1}{2}$ and $\frac{1}{12}$.

Acknowledgments

We thank the anonymous reader for drawing our attention to this result.

Funding

This work was supported by funds from CNRS, INSERM and University of Strasbourg.

Availability of data and materials

Not Applicable.

Authors’ contributions

DD drafted the paper and performed the analyses. Both authors developed the method and contributed to the manuscript. Both authors read and approved the final manuscript.

Competing interests

The authors declare that they have no competing interests.

Consent for publication

Not Applicable.

Ethics approval and consent to participate

Not Applicable.

References

  • 1. Dembélé D, Kastner P. Fold change rank ordering statistics: a new method for detecting differentially expressed genes. BMC Bioinformatics. 2014;15(1):14. doi: 10.1186/1471-2105-15-14.
  • 2. Feller W. An Introduction to Probability Theory and Its Applications. New York: John Wiley & Sons; 1971.
