Box 5 | Co-expression Measurements
Gene co-expression is widely used for functional annotation, pathway analysis, and the reconstruction of gene regulatory networks. Co-expression measures quantify the similarity between two gene expression profiles by detecting bivariate associations between them. These measures can be grouped into the following categories (Kumari et al. 2012; Allen et al. 2012; Song et al. 2012; Wang et al. 2014):
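Every measure in this box is applied the same way: to each pair of gene profiles, that is, to pairs of rows in a genes × samples expression matrix. The sketch below shows that shared structure. It makes several assumptions: the helper name `coexpression_matrix` is hypothetical, the measure is assumed to be symmetric (true for correlation and mutual information but not for Kullback-Leibler divergence), and the data are simulated.

```python
import numpy as np
from typing import Callable

# Hypothetical helper: apply a pairwise co-expression measure to every pair
# of genes (rows) in a genes x samples expression matrix.
def coexpression_matrix(
    expr: np.ndarray, measure: Callable[[np.ndarray, np.ndarray], float]
) -> np.ndarray:
    n_genes = expr.shape[0]
    result = np.empty((n_genes, n_genes))
    for i in range(n_genes):
        for j in range(i, n_genes):
            # Assumes a symmetric measure, so each pair is evaluated once.
            value = measure(expr[i], expr[j])
            result[i, j] = result[j, i] = value
    return result

def pearson(x: np.ndarray, y: np.ndarray) -> float:
    return float(np.corrcoef(x, y)[0, 1])

# Simulated toy data (not real measurements): 4 genes x 6 samples.
rng = np.random.default_rng(0)
expr = rng.normal(size=(4, 6))
coexpr = coexpression_matrix(expr, pearson)
```

Any function with the same two-vector signature, such as a rank correlation or a mutual information estimator, can be passed as `measure`. The resulting genes × genes matrix is the usual input for network reconstruction.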
Correlation
Pearson correlation is the most widely used co-expression measure because it is conceptually straightforward and computationally efficient. However, Pearson correlation captures only linear relationships between variables. Spearman correlation, a nonparametric alternative computed on ranks, instead captures monotonic associations, including non-linear monotonic ones. Other correlation-based measures include Rényi (maximal) correlation, Kendall rank correlation, and biweight midcorrelation.
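The difference between Pearson and Spearman correlation can be seen on two toy profiles: one that is non-linear but monotonic, and one that is U-shaped. The sketch below assumes Spearman correlation is computed as the Pearson correlation of ranks, with tied values given their average rank. The helper names are illustrative.

```python
import numpy as np

def average_ranks(x) -> np.ndarray:
    """Rank values 1..n, giving tied values the mean of their ranks."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="mergesort")
    sorted_x = x[order]
    ranks = np.empty(len(x))
    i = 0
    while i < len(x):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(x) and sorted_x[j + 1] == sorted_x[i]:
            j += 1
        ranks[order[i : j + 1]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def pearson(x, y) -> float:
    return float(np.corrcoef(x, y)[0, 1])

def spearman(x, y) -> float:
    # Spearman correlation = Pearson correlation of the ranks.
    return pearson(average_ranks(x), average_ranks(y))

x = np.arange(1, 11, dtype=float)
monotonic = x ** 3          # non-linear but monotonic
u_shaped = (x - 5.5) ** 2   # non-linear and non-monotonic

r_mono, rho_mono = pearson(x, monotonic), spearman(x, monotonic)
r_u, rho_u = pearson(x, u_shaped), spearman(x, u_shaped)
```

For the cubic profile, Spearman correlation is exactly 1 while Pearson correlation stays below 1. For the U-shaped profile, both are zero even though the profiles are deterministically related. That limitation motivates the information-theoretic measures discussed under mutual information.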
Partial correlation
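Partial correlation measures the direct relationship between a pair of variables after removing indirect relationships mediated by the other variables. In Gaussian graphical models, the partial correlation between two variables given all others is non-zero exactly when the corresponding entry of the precision matrix (the inverse of the covariance matrix) is non-zero, so conditional dependencies can be read off as the non-zero entries of that matrix.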
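With precision matrix Ω, the partial correlation of variables i and j given all others is −Ω_ij / sqrt(Ω_ii Ω_jj). The sketch below applies that formula to a simulated chain X → Y → Z, where X and Z are correlated only through Y. It assumes more samples than variables so that the sample covariance matrix is invertible. With typical gene expression data (more genes than samples), a regularized estimate of the precision matrix would be needed instead.

```python
import numpy as np

def partial_correlation(data: np.ndarray) -> np.ndarray:
    """data: samples x variables. Partial correlation of each pair of
    variables, conditioned on all remaining variables."""
    cov = np.cov(data, rowvar=False)
    precision = np.linalg.inv(cov)  # requires more samples than variables
    d = np.sqrt(np.diag(precision))
    pcor = -precision / np.outer(d, d)
    np.fill_diagonal(pcor, 1.0)
    return pcor

# Simulated chain X -> Y -> Z (toy data, not real expression values).
rng = np.random.default_rng(42)
n = 5000
x = rng.normal(size=n)
y = x + 0.5 * rng.normal(size=n)
z = y + 0.5 * rng.normal(size=n)
data = np.column_stack([x, y, z])

marginal = np.corrcoef(data, rowvar=False)
partial = partial_correlation(data)
```

Here the marginal correlation between X and Z is strong (about 0.8), but their partial correlation is close to zero. That matches a zero X-Z entry in the precision matrix, which encodes the absence of a direct edge.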
Mutual information
Mutual information-based methods measure general statistical dependence between two variables. Because it is grounded in information theory, mutual information does not assume linear or monotonic relationships and can therefore capture non-linear dependencies.
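The sketch below estimates mutual information (in nats) with a simple plug-in estimator, I = Σ p(x,y) log[p(x,y) / (p(x) p(y))], computed from a 2-D histogram. The equal-width binning and the choice of 10 bins are assumptions made for illustration. Binned estimates are biased and depend on the bin count, and published methods often use other estimators, such as k-nearest-neighbour ones. The data are simulated: a noisy quadratic relationship, for which Pearson correlation is near zero.

```python
import numpy as np

def binned_mutual_information(x, y, bins: int = 10) -> float:
    """Plug-in mutual information estimate (nats) from a 2-D histogram."""
    counts, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = counts / counts.sum()
    px = pxy.sum(axis=1, keepdims=True)  # marginal of x (column vector)
    py = pxy.sum(axis=0, keepdims=True)  # marginal of y (row vector)
    nonzero = pxy > 0                    # treat 0 * log 0 as 0
    expected = px @ py                   # outer product p(x) p(y)
    return float(np.sum(pxy[nonzero] * np.log(pxy[nonzero] / expected[nonzero])))

rng = np.random.default_rng(1)
n = 5000
x = rng.uniform(-1, 1, size=n)
y_dep = x ** 2 + 0.05 * rng.normal(size=n)  # non-monotonic dependence
y_indep = rng.uniform(-1, 1, size=n)        # independent of x

pearson_dep = float(np.corrcoef(x, y_dep)[0, 1])
mi_dep = binned_mutual_information(x, y_dep)
mi_indep = binned_mutual_information(x, y_indep)
```

Pearson correlation misses the quadratic dependence, while the mutual information estimate is clearly larger than for the independent pair. The small non-zero value for the independent pair reflects the estimator's finite-sample bias.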
Other measures
Euclidean distance, cosine similarity, Kullback-Leibler divergence, Hoeffding's D, distance covariance, and probabilistic measures (as used in Bayesian networks).
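Several of these measures can be computed in a few lines. The sketch below illustrates three of them on simulated data and shows how two relate to Pearson correlation r:

- For profiles z-scored with the population standard deviation, squared Euclidean distance equals 2n(1 − r).
- Cosine similarity of mean-centred profiles equals r.
- Distance correlation (here the sample V-statistic form built on double-centred distance matrices) can detect the quadratic relationship that r misses.

The helper names are illustrative.

```python
import numpy as np

def zscore(v: np.ndarray) -> np.ndarray:
    return (v - v.mean()) / v.std()  # population std (ddof=0)

def euclidean(x: np.ndarray, y: np.ndarray) -> float:
    return float(np.linalg.norm(x - y))

def cosine_similarity(x: np.ndarray, y: np.ndarray) -> float:
    return float(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))

def distance_correlation(x: np.ndarray, y: np.ndarray) -> float:
    """Sample distance correlation (V-statistic form)."""
    def centered_distances(v):
        d = np.abs(v[:, None] - v[None, :])
        # Double-centre: subtract column and row means, add grand mean.
        return d - d.mean(axis=0) - d.mean(axis=1)[:, None] + d.mean()
    a, b = centered_distances(x), centered_distances(y)
    dcov2 = (a * b).mean()
    dvar_x, dvar_y = (a * a).mean(), (b * b).mean()
    return float(np.sqrt(dcov2 / np.sqrt(dvar_x * dvar_y)))

# Simulated, linearly related profiles.
rng = np.random.default_rng(7)
x = rng.normal(size=50)
y = 0.8 * x + 0.6 * rng.normal(size=50)
n = len(x)
r = float(np.corrcoef(x, y)[0, 1])
d_z = euclidean(zscore(x), zscore(y))
cos_centered = cosine_similarity(x - x.mean(), y - y.mean())

# Deterministic non-monotonic relationship.
t = np.linspace(-1, 1, 201)
dcor_quadratic = distance_correlation(t, t ** 2)
dcor_linear = distance_correlation(t, 2 * t + 1)
```

Because Euclidean distance on raw profiles also reflects differences in expression level, profiles are commonly standardized first when the aim is to compare their shape.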