Bioinformatics. 2021 Feb 1;37(14):1982–1989. doi: 10.1093/bioinformatics/btab041

Fig. 1.

Pipeline to construct networks from gene expression data using signed distance correlation. (A) We pre-process the input matrix $M$ of raw gene expression data by quantile normalization and by setting the lowest 20% of values in each sample to the minimum value in $M$, which yields $M^*$. (B) We compute the expression distance matrices $\tilde{Y}^{(i)}$ and $\tilde{Y}^{(j)}$ for each pair of genes $i, j \in \{1, \dots, m\}$, and we double center them to obtain $Y^{(i)}$ and $Y^{(j)}$. (C) We compute the distance correlation matrix $D$, whose entry $D_{i,j}$ is the positive square root of the Pearson correlation between $Y^{(i)}$ and $Y^{(j)}$, for every pair of genes. (D) We compute the Pearson correlation between each pair of rows of $M^*$ to obtain the Pearson correlation matrix $P$. (E) To construct the signed distance correlation matrix $S$, we multiply the distance correlation $D_{i,j}$ between the expression of two genes by the sign of their Pearson correlation, $\mathrm{sign}(P_{i,j})$. (F) Using COGENT (Bozhilova et al., 2020), we find the optimal threshold $\theta^*$ that produces the most self-consistent network $A_S(\theta^*)$ from $S$. (F′) Analogously, we find the optimal threshold $\theta$ to generate the network $A_P(\theta)$ from $P$; this step is not part of the pipeline and is needed only to compare Pearson and signed distance correlation networks.
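Steps (B) to (E) of the caption can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes that the matrix has already been pre-processed as in step (A), that rows are genes and columns are samples, that the distance between two expression values is their absolute difference, and that no gene has constant expression (which would make the correlations undefined). The function names `double_centered_distances` and `signed_distance_correlation` are hypothetical. Threshold selection with COGENT, step (F), is not shown.

```python
import numpy as np


def double_centered_distances(x):
    """Step (B): pairwise distances of one gene's expression, double centered.

    Assumption: the distance between two samples is the absolute
    difference of the gene's expression values in those samples.
    """
    x = np.asarray(x, dtype=float)
    d = np.abs(x[:, None] - x[None, :])           # n x n distance matrix
    row_means = d.mean(axis=1, keepdims=True)
    col_means = d.mean(axis=0, keepdims=True)
    return d - row_means - col_means + d.mean()   # rows and columns now sum to 0


def signed_distance_correlation(M_star):
    """Steps (B)-(E) for a pre-processed genes-by-samples matrix.

    Returns (S, D, P): signed distance correlation, distance
    correlation and Pearson correlation matrices, each m x m.
    """
    M_star = np.asarray(M_star, dtype=float)

    # (B) One double-centered distance matrix per gene, flattened to a row.
    # Memory grows as m * n^2, so this is only suitable for small inputs.
    flat = np.stack([double_centered_distances(row).ravel() for row in M_star])

    # (C) Pearson correlation between the double-centered matrices, then the
    # positive square root. Clipping guards against tiny negative values
    # caused by floating-point rounding.
    D = np.sqrt(np.clip(np.corrcoef(flat), 0.0, 1.0))

    # (D) Pearson correlation between each pair of rows (genes).
    P = np.corrcoef(M_star)

    # (E) Attach the sign of the Pearson correlation.
    S = np.sign(P) * D
    return S, D, P


if __name__ == "__main__":
    # Toy data (hypothetical): gene 1 is a decreasing linear function of gene 0.
    M_star = np.array([
        [1.0, 2.0, 4.0, 7.0, 11.0],
        [-1.0, -3.0, -7.0, -13.0, -21.0],
        [3.0, 1.0, 4.0, 1.0, 5.0],
    ])
    S, D, P = signed_distance_correlation(M_star)
    print(np.round(S, 3))
```

In this sketch a network would then be obtained by keeping the entries of `S` whose value exceeds a chosen threshold; how that threshold is selected is the role of COGENT in the caption and is deliberately left out here.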