dCCA: detecting differential covariation patterns between two types of high-throughput omics data

Hwiyoung Lee; Tianzhou Ma; Hongjie Ke; Zhenyao Ye; Shuo Chen

doi:10.1093/bib/bbae288

Brief Bioinform. 2024 Jun 18;25(4):bbae288.



PMCID: PMC11184902 PMID: 38888456

Abstract

Motivation

The advent of multimodal omics data has provided an unprecedented opportunity to systematically investigate underlying biological mechanisms from distinct yet complementary angles. However, the joint analysis of multi-omics data remains challenging because it requires modeling interactions between multiple sets of high-throughput variables. Furthermore, these interaction patterns may vary across different clinical groups, reflecting disease-related biological processes.

Results

We propose a novel approach called Differential Canonical Correlation Analysis (dCCA) to capture differential covariation patterns between two multivariate vectors across clinical groups. Unlike classical Canonical Correlation Analysis, which maximizes the correlation between two multivariate vectors, dCCA aims to maximally recover differentially expressed multivariate-to-multivariate covariation patterns between groups. We have developed computational algorithms and a toolkit to sparsely select paired subsets of variables from two sets of multivariate variables while maximizing the differential covariation. Extensive simulation analyses demonstrate the superior performance of dCCA in selecting variables of interest and recovering differential correlations. We applied dCCA to the Pan-Kidney cohort from the Cancer Genome Atlas Program database and identified differentially expressed covariations between noncoding RNAs and gene expressions.

Availability and Implementation

The R package that implements dCCA is available at https://github.com/hwiyoungstat/dCCA.

Keywords: canonical correlation analysis, differential correlation, bipartite graph, multivariate-to-multivariate, multiomics, RNA gene regulation

Introduction

Multiomics data have recently gained increased attention because they capture multiple facets of the underlying biological environment. For example, in cancer research, the joint analysis of gene expression and non-coding RNAs (ncRNAs), which are not translated into proteins and include microRNAs (miRNAs), long noncoding RNAs (lncRNAs), and circular RNAs (circRNAs), has become a promising avenue for uncovering the pivotal functional roles of ncRNAs in cancer. ncRNAs may display both tumor-suppressive and oncogenic functions, and aberrant ncRNA expression can induce abnormal transcriptional regulation of critical tumor-related genes, which ultimately contributes to tumor initiation and progression. Existing studies have focused on a few specific ncRNAs and their regulatory roles in a small set of genes, without fully using the information in multiomics data generated by high-throughput technologies [1, 2]. A comprehensive picture of the association between ncRNAs and genes at the transcriptome-wide level is imperative to advance our knowledge of cancer pathogenesis. More generally, the combined analysis of two types of omics data (such as gene expression and microbiome, or metabolomics and microbiome) offers a novel approach to understanding the intricate and interactive nature of biological systems. Despite the potentially valuable findings from multiomics data, the joint analysis of two sets of high-dimensional variables raises computational challenges.

Canonical Correlation Analysis (CCA), originally introduced by [3], is widely used to assess associations between two sets of multivariate data [4, 5]. Because common covariation among variables may exist within each set, CCA identifies latent factors for both multivariate vectors that maximize the correlation between them. As a popular model for deciphering the interactions between two sets of multivariate data, CCA has been applied to a wide range of biomedical data analyses [6]. The canonical variables (i.e. factors) produced by CCA facilitate visualization and reveal associations between two distinct data blocks in a lower-dimensional space. Furthermore, they can serve as input features for downstream tasks such as classification, particularly when using the original variables is difficult because of multicollinearity and high dimensionality [7].

Conventionally, CCA is applicable only to multivariate vectors whose dimensionality is lower than the sample size, because the sample covariance matrices are otherwise singular [8]. Recent methodological advances, e.g. various sparse CCA (sCCA) methods [9, 10], alleviate this dimensionality constraint through regularization techniques that ensure algorithmic stability and promote parsimony for enhanced interpretability. However, it remains challenging for sCCA methods to identify differential multivariate-to-multivariate association patterns across clinical groups. For example, the associations between ncRNAs and gene expression can vary with factors such as cancer stage and subtype, introducing substantial heterogeneity. Neither classic CCA nor sCCA can capture such differential covariation patterns [11], which motivates the present research.
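A quick numerical check of the singularity argument: when $p > n$, the $p \times p$ sample covariance matrix has rank at most $n - 1$, so the matrix inverses required by classic CCA do not exist. The sizes below are arbitrary illustrations, not values from this study.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 200                      # illustrative sizes with p > n
X = rng.standard_normal((n, p))     # n samples, p variables
S = np.cov(X, rowvar=False)         # p x p sample covariance (columns = variables)
rank = np.linalg.matrix_rank(S)     # rank cannot exceed n - 1 = 49
print(S.shape, rank)
```

Because centering removes one degree of freedom, the rank is bounded by $n - 1$ regardless of how large $p$ is.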

To address this unmet need, we propose a new differential Canonical Correlation Analysis (dCCA) method to identify the heterogeneity in multivariate-to-multivariate associations across groups with different clinical or experimental conditions. We propose a novel objective function that maximizes the multivariate-to-multivariate correlations while recognizing inter-group discrepancy. By relaxing multiple constraints imposed on the covariance matrices, we implement the objective function using a subgradient-based algorithm. Additionally, to address the high dimensionality of both data blocks, we introduce a bipartite dense graph-based screening procedure.

The rest of this paper is organized as follows. In Method, we introduce the details of the dCCA method and conduct extensive simulation studies to assess its performance against competing methods. In Results, we apply the method to data obtained from the Cancer Genome Atlas Program (TCGA) database to explore the association between noncoding RNA and gene expression in kidney cancer. The paper concludes with a discussion in Conclusion.

Method

In this study, we consider a multivariate-multivariate dataset comprising $n$ observations. The dataset consists of two high-dimensional data blocks of dimensions $p$ and $q$, respectively, denoted by $X \in \mathbb{R}^{n \times p}$ and $Y \in \mathbb{R}^{n \times q}$. We first consider the case where $p, q < n$; for the case where $\max(p, q) \ge n$, we resort to a screening procedure (see Screening) to reduce dimensionality. Additionally, a binary group variable $G \in \{0, 1\}$ serves as a moderator, differentiating the association patterns between $X$ and $Y$. Based on $G$, we divide the complete data into two subsets, $(X_0, Y_0)$ and $(X_1, Y_1)$, where the subscripts indicate the corresponding values of $G$. Here, $n_0$ and $n_1$ represent the numbers of participants in groups $G = 0$ and $G = 1$, respectively. For example, in our data application, $X$ represents microRNA (miRNA) data, $Y$ represents gene expression data, and $G$ represents distinct subtypes of kidney cancer, where $G = 0$ corresponds to a common subtype and $G = 1$ to a rare subtype.

dCCA (Association analysis)

Our primary objective is two-fold: (i) to assess whether underlying association patterns exist between two sets of high-dimensional variables $X$ and $Y$ by maximally revealing the common patterns, and (ii) to further identify differential associations between $X$ and $Y$ for those with $G = 0$ vs. $G = 1$. To achieve the first objective, we can employ classic CCA with the objective function

$$\max_{u, v} \; \mathrm{Cor}(Xu, Yv),$$

where $u \in \mathbb{R}^p$ and $v \in \mathbb{R}^q$ are loading vectors that assign weights to the original variables in $X$ and $Y$, respectively.

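To address the second objective, we consider the difference in canonical correlations between the two subgroups defined by $G$:

$$\max_{u, v} \; \left| \mathrm{Cor}(X_0 u, Y_0 v) - \mathrm{Cor}(X_1 u, Y_1 v) \right|.$$

This criterion maximizes the discrepancy in association patterns between the two subsets, providing insight into the heterogeneity of association patterns between subgroups. To achieve both goals simultaneously, we propose dCCA with the integrated objective function

$$\max_{u, v} \; \mathrm{Cor}(Xu, Yv) + \lambda \left| \mathrm{Cor}(X_0 u, Y_0 v) - \mathrm{Cor}(X_1 u, Y_1 v) \right|.$$

Writing each correlation in terms of covariance matrices, the objective function becomes

$$\max_{u, v} \; \frac{u^\top \Sigma_{XY} v}{\sqrt{u^\top \Sigma_X u}\,\sqrt{v^\top \Sigma_Y v}} + \lambda \left| \frac{u^\top \Sigma_{X_0 Y_0} v}{\sqrt{u^\top \Sigma_{X_0} u}\,\sqrt{v^\top \Sigma_{Y_0} v}} - \frac{u^\top \Sigma_{X_1 Y_1} v}{\sqrt{u^\top \Sigma_{X_1} u}\,\sqrt{v^\top \Sigma_{Y_1} v}} \right|, \tag{1}$$

where the $p \times q$ matrices $\Sigma_{XY}$, $\Sigma_{X_0 Y_0}$, and $\Sigma_{X_1 Y_1}$ are the cross-covariance matrices of the pairs $(X, Y)$, $(X_0, Y_0)$, and $(X_1, Y_1)$, respectively. Additionally, $\Sigma_X$, $\Sigma_Y$, $\Sigma_{X_0}$, $\Sigma_{Y_0}$, $\Sigma_{X_1}$, and $\Sigma_{Y_1}$ denote the covariance matrices of the individual datasets $X$, $Y$, $X_0$, $Y_0$, $X_1$, and $Y_1$.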

In summary, both CCA and dCCA seek loading vectors $(u, v)$. CCA maximizes the correlation between $Xu$ and $Yv$ over all samples. dCCA retains this objective but also maximizes the difference in correlations between the two groups ($G = 0$ vs. $G = 1$), i.e. between $\mathrm{Cor}(X_0 u, Y_0 v)$ and $\mathrm{Cor}(X_1 u, Y_1 v)$.

The objective function (1) simultaneously identifies the underlying correlation patterns for both groups and highlights the differential correlations between groups. The two terms are linked by a tuning parameter $\lambda \ge 0$, which balances the classical CCA term against the discrepancy term: a larger $\lambda$ places stronger emphasis on the between-group discrepancy, whereas a smaller $\lambda$ yields results closer to those of classic CCA (with $\lambda = 0$ recovering classic CCA). We select $\lambda$ using the commonly used cross-validation strategy [12]. The canonical variables (i.e. $Xu$ and $Yv$) in (1) reduce the dimensions of both multivariate vectors and highlight the latent correlation patterns (see Fig. 1C).
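A minimal NumPy sketch that evaluates objective (1) for given loadings, assuming $X$ and $Y$ are sample-by-variable arrays and `g` is a 0/1 group vector. The helper names `_corr_proj` and `dcca_objective` are hypothetical and are not part of the authors' dCCA R package; the sketch only scores a candidate $(u, v)$ and does not optimize it.

```python
import numpy as np

def _corr_proj(X, Y, u, v):
    """Correlation of the canonical variables Xu and Yv, written as
    u' S_xy v / sqrt(u' S_x u * v' S_y v) with sample covariances."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    n = X.shape[0]
    S_xy = Xc.T @ Yc / (n - 1)   # p x q cross-covariance
    S_x = Xc.T @ Xc / (n - 1)    # p x p covariance of X
    S_y = Yc.T @ Yc / (n - 1)    # q x q covariance of Y
    return (u @ S_xy @ v) / np.sqrt((u @ S_x @ u) * (v @ S_y @ v))

def dcca_objective(X, Y, g, u, v, lam):
    """Pooled correlation plus lam * |group-0 corr - group-1 corr|."""
    g = np.asarray(g)
    pooled = _corr_proj(X, Y, u, v)
    r0 = _corr_proj(X[g == 0], Y[g == 0], u, v)
    r1 = _corr_proj(X[g == 1], Y[g == 1], u, v)
    return pooled + lam * abs(r0 - r1)
```

Setting `lam = 0` reduces the value to the pooled correlation of $Xu$ and $Yv$, i.e. the classic CCA criterion, and rescaling $u$ or $v$ by a positive constant leaves the value unchanged.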

Figure 1. Overview of the dCCA workflow: (A) heatmap of the marginal correlation matrix between two vectors of high-dimensional variables $X$ and $Y$; each row represents a gene expression variable, and each column represents a miRNA variable; (B) the heatmap in (A) after the screening step, applied when the dimensionality of $X$ and $Y$ is high, so that non-informative variables in $X$ and $Y$ can be excluded from further analysis; (C) enlarged heatmaps showing the differential correlation patterns between the two clinical groups; we next perform dCCA on the post-screened $X$ and $Y$ to compute $u$ and $v$; (D) contrast between the canonical variables $Xu$ and $Yv$ obtained by dCCA and by CCA within the first block, showing that dCCA better identifies differential correlation patterns between the two clinical groups. The screening step in (B) is not necessary when $p, q < n$.


Implementation. We numerically optimize objective (1) as follows. The numerators involve the cross-covariance between $X$ and $Y$ (overall and within each group), which captures the association between the two sets of variables and is the primary focus of our analysis. The denominators normalize $u$ and $v$ by the corresponding covariance matrices, so that each term is scaled relative to the variability (covariance) of the data.

Rewriting the above in the constrained form commonly used for CCA gives

$$\max_{u, v} \; u^\top \Sigma_{XY} v + \lambda \left| u^\top \Sigma_{X_0 Y_0} v - u^\top \Sigma_{X_1 Y_1} v \right| \quad \text{s.t.} \quad u^\top \Sigma_X u = u^\top \Sigma_{X_0} u = u^\top \Sigma_{X_1} u = 1, \;\; v^\top \Sigma_Y v = v^\top \Sigma_{Y_0} v = v^\top \Sigma_{Y_1} v = 1.$$

This objective still maximizes the correlation between linear combinations while incorporating a regularization term that accounts for the difference between the two groups. The constraints, which come from the denominators in (1), require the loading vectors $u$ and $v$ to have unit length under each of the distinct covariance structures associated with the data subsets ($(X, Y)$, $(X_0, Y_0)$, and $(X_1, Y_1)$). However, optimizing the objective while satisfying all of these constraints simultaneously is computationally intractable. Following a commonly used numerical optimization strategy [9, 13], we relax the constraints by replacing every covariance matrix in the constraints with an identity matrix of the same dimension. The modified problem becomes

$$\max_{u, v} \; u^\top \Sigma_{XY} v + \lambda \left| u^\top \Sigma_{X_0 Y_0} v - u^\top \Sigma_{X_1 Y_1} v \right| \quad \text{s.t.} \quad \|u\|_2 = 1, \;\; \|v\|_2 = 1. \tag{2}$$

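By reformulating the constrained optimization problem (2) with the Lagrangian function (i.e. $\mathcal{L}(u, v, \alpha, \beta) = u^\top \Sigma_{XY} v + \lambda \left| u^\top \Sigma_{X_0 Y_0} v - u^\top \Sigma_{X_1 Y_1} v \right| - \alpha(\|u\|_2^2 - 1) - \beta(\|v\|_2^2 - 1)$), we develop the optimization algorithm summarized in Algorithm 1.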
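To make the structure of (2) concrete, the following worked step (our illustration, not necessarily the update used in Algorithm 1) shows that each block of (2) has a closed-form maximizer when the other block is held fixed. For fixed $v$, write $a = \Sigma_{XY} v$ and $b = (\Sigma_{X_0 Y_0} - \Sigma_{X_1 Y_1}) v$. Since $|t| = \max_{s \in \{-1, 1\}} s\,t$ and, by Cauchy–Schwarz, $\max_{\|u\|_2 = 1} u^\top w = \|w\|_2$,

$$\max_{\|u\|_2 = 1} \; u^\top a + \lambda \left| u^\top b \right| = \max_{s \in \{-1, 1\}} \; \max_{\|u\|_2 = 1} u^\top (a + s\lambda b) = \max_{s \in \{-1, 1\}} \|a + s\lambda b\|_2,$$

$$u^\ast = \frac{a + s^\ast \lambda b}{\|a + s^\ast \lambda b\|_2}, \qquad s^\ast = \arg\max_{s \in \{-1, 1\}} \|a + s\lambda b\|_2,$$

provided $a + s^\ast \lambda b \neq 0$. The update for $v$ with $u$ fixed is identical with $c = \Sigma_{XY}^\top u$ and $d = (\Sigma_{X_0 Y_0} - \Sigma_{X_1 Y_1})^\top u$ in place of $a$ and $b$.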

Algorithm 1: Optimization algorithm for dCCA.
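One simple way to attack the relaxed problem (2) is block-coordinate ascent: with $v$ fixed, $u^\top a + \lambda|u^\top b|$ (where $a = \Sigma_{XY} v$ and $b = (\Sigma_{X_0 Y_0} - \Sigma_{X_1 Y_1}) v$) is maximized over unit vectors by normalizing whichever of $a + \lambda b$ or $a - \lambda b$ has the larger norm, and symmetrically for $v$. The sketch below assumes the sample cross-covariance matrices are precomputed; `dcca_relaxed` is a hypothetical name, and this is not the authors' subgradient-based Algorithm 1.

```python
import numpy as np

def _best_unit(a, b, lam):
    """Maximize w'a + lam*|w'b| over unit vectors w (closed form, lam >= 0)."""
    cands = [a + lam * b, a - lam * b]
    best = max(cands, key=np.linalg.norm)
    norm = np.linalg.norm(best)
    if norm == 0:
        raise ValueError("degenerate direction: a + s*lam*b is zero")
    return best / norm

def dcca_relaxed(S_xy, D, lam, n_iter=100, tol=1e-10, seed=0):
    """Block-coordinate ascent for
        max u'S_xy v + lam*|u'D v|  s.t. ||u|| = ||v|| = 1,
    where D = S_x0y0 - S_x1y1 (difference of group cross-covariances)."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(S_xy.shape[1])
    v /= np.linalg.norm(v)
    obj_old = -np.inf
    for _ in range(n_iter):
        u = _best_unit(S_xy @ v, D @ v, lam)       # exact update of u, v fixed
        v = _best_unit(S_xy.T @ u, D.T @ u, lam)   # exact update of v, u fixed
        obj = u @ S_xy @ v + lam * abs(u @ D @ v)
        if obj - obj_old < tol:                    # objective is non-decreasing
            break
        obj_old = obj
    return u, v, obj
```

Because each block update is an exact maximization, the objective never decreases; like most CCA-type solvers, however, it can stop at a local optimum, so several random starts (different `seed` values) are advisable. With `lam = 0` the iteration reduces to the power method and returns the leading singular value of `S_xy`.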

Screening

When the dimensionality of $X$ and $Y$ exceeds the sample size, or when non-informative noise is present, we can first conduct a screening step to exclude inactive pairs before optimizing objective (2). This alleviates the computational limitations of CCA and makes the analysis more efficient by narrowing the focus to a subset of variables of interest. Following [14], many screening methods have been developed across diverse contexts, each tailored to its specific objectives. For example, [15] developed a screening procedure for two high-dimensional variables. In this research, we introduce a novel graph-based screening procedure to efficiently identify active pairs of variables between high-dimensional $X$ and high-dimensional $Y$.

We represent the association between $X$ and $Y$ as a bipartite graph $\mathcal{G} = (V_X, V_Y, E)$, where $V_X$ and $V_Y$ are distinct node sets for $X$ and $Y$, respectively (i.e. $|V_X| = p$ and $|V_Y| = q$, where $|\cdot|$ denotes the cardinality of a set), and $E$ denotes the edges (i.e. $E \subseteq V_X \times V_Y$). Assuming that the associations between $X$ and $Y$ are concentrated in highly correlated pairs of nodes rather than spread across all pairs, we extract $K$ quasi-bicliques, which are subsets of pairs of nodes with dense associations, and filter out the irrelevant variables (i.e. screening).

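Let $A \in \{0, 1\}^{p \times q}$ be the biadjacency matrix with entries $a_{ij} = \mathbb{1}(w_{ij} > r)$, obtained by thresholding the weighted edge matrix $W = (w_{ij})$ of the bipartite graph (e.g. the absolute correlation matrix, where $w_{ij}$ is the absolute correlation between the $i$th variable in $X$ and the $j$th variable in $Y$) at the threshold value $r$. Then, the $k$th quasi-biclique, consisting of the node set $(V_{X,k}, V_{Y,k})$, is obtained by optimizing the following objective function: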

(3)

where $k = 1, \dots, K$ indexes the quasi-bicliques. Note that $A_k$ is the biadjacency matrix of the subgraph induced by the nodes $(V_{X,k}, V_{Y,k})$, and $\|\cdot\|_1$ is the entry-wise $\ell_1$ norm (i.e. $\|A_k\|_1 = \sum_{i,j} |a_{ij}|$ over the entries of $A_k$). To implement the screening procedure, we use a greedy algorithm [16], summarized in Algorithm 2 (see the Supplementary Material for details).

Algorithm 2: Greedy algorithm for quasi-biclique screening.
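A minimal sketch of the screening idea: threshold the absolute correlation matrix to obtain $A$, then extract one dense quasi-biclique by greedy peeling (repeatedly removing the row or column with the fewest edges and keeping the best-scoring subgraph). The score $\log\|A_k\|_1 + \gamma \log\big(\|A_k\|_1 / (|V_{X,k}|\,|V_{Y,k}|)\big)$ used here is an assumed stand-in for (3) that rewards many edges and, increasingly with $\gamma$, high density; greedy peeling is a standard dense-subgraph heuristic and is not necessarily the algorithm of [16]. Function names are hypothetical.

```python
import numpy as np

def biadjacency(X, Y, r):
    """A[i, j] = 1 if |corr(i-th column of X, j-th column of Y)| > r."""
    Xs = (X - X.mean(0)) / X.std(0)
    Ys = (Y - Y.mean(0)) / Y.std(0)
    R = Xs.T @ Ys / X.shape[0]          # Pearson correlation matrix (p x q)
    return (np.abs(R) > r).astype(int)

def greedy_biclique(A, gamma):
    """Greedy peeling with an ASSUMED score log(edges) + gamma*log(density)."""
    rows = list(range(A.shape[0]))
    cols = list(range(A.shape[1]))
    best, best_score = (rows[:], cols[:]), -np.inf
    while rows and cols:
        sub = A[np.ix_(rows, cols)]
        edges = sub.sum()
        if edges == 0:
            break
        density = edges / (len(rows) * len(cols))
        score = np.log(edges) + gamma * np.log(density)
        if score > best_score:
            best, best_score = (rows[:], cols[:]), score
        row_deg, col_deg = sub.sum(1), sub.sum(0)
        if row_deg.min() <= col_deg.min():          # peel the weakest node
            rows.pop(int(row_deg.argmin()))
        else:
            cols.pop(int(col_deg.argmin()))
    return best
```

To obtain $K$ blocks, one could remove the extracted rows and columns and rerun the peeling on the remainder.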

The tuning parameter $\gamma$ in (3) plays a crucial role in extracting the dense subsets $(V_{X,k}, V_{Y,k})$. For example, a large $\gamma$ tends to yield a more parsimonious result, characterized by smaller and denser quasi-bicliques. We select the optimal $\gamma$ in a data-driven manner using the Kullback–Leibler (KL) divergence. Specifically, consider two distinct regions: (i) the dense blocks ($\bigcup_k V_{X,k} \times V_{Y,k}$) and (ii) the complement outside the dense blocks. Within the dense blocks, $a_{ij}$ is likely to be 1, because the corresponding pair of variables is strongly associated; outside the dense blocks, it is likely to be 0 (i.e. uncorrelated). Therefore, the binarized association indicator $a_{ij}$ can be assumed to follow a mixture of Bernoulli distributions, i.e. $a_{ij} \sim \mathrm{Bernoulli}(\pi_1)$ inside the dense blocks and $a_{ij} \sim \mathrm{Bernoulli}(\pi_0)$ outside, where $\pi_1 > \pi_0$. Alternatively, one can consider a reference distribution $a_{ij} \sim \mathrm{Bernoulli}(\pi)$, which assumes that pairs exhibit no clustered pattern, in which case dense bipartite graph-based screening is not effective. Because the KL divergence quantifies the dissimilarity between the well-modeled distribution $P_\gamma$ (representing the dense pattern) and the naive distribution $P_0$ (i.e. $\mathrm{Bernoulli}(\pi)$), it serves as a suitable criterion for selecting $\gamma$. Thus, we select the tuning parameter by maximizing the following KL divergence:

$$\hat{\gamma} = \arg\max_{\gamma} \; D_{\mathrm{KL}}\left(P_\gamma \,\|\, P_0\right). \tag{4}$$

The Bernoulli parameters (i.e. $\pi_1$, $\pi_0$, and $\pi$) can be estimated by maximum likelihood; see the Supplementary Material for details.
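A minimal sketch of a KL-based tuning criterion, assuming the divergence is taken between the two independent-Bernoulli models for all entries of $A$ (rate $\pi_1$ inside the dense blocks and $\pi_0$ outside, versus a single rate $\pi$ everywhere), with each rate estimated by its maximum-likelihood value, the sample proportion. This exact form is our assumption; the article's definition of (4) may differ in detail, and the function names are hypothetical.

```python
import numpy as np

def bern_kl(a, b, eps=1e-12):
    """KL( Bernoulli(a) || Bernoulli(b) ), clipped to avoid log(0)."""
    a = np.clip(a, eps, 1 - eps)
    b = np.clip(b, eps, 1 - eps)
    return a * np.log(a / b) + (1 - a) * np.log((1 - a) / (1 - b))

def kl_criterion(A, mask):
    """mask[i, j] is True inside the extracted dense blocks."""
    pi1 = A[mask].mean()      # MLE of the rate inside the dense blocks
    pi0 = A[~mask].mean()     # MLE of the rate outside
    pi = A.mean()             # MLE under the no-cluster reference model
    n_in, n_out = mask.sum(), (~mask).sum()
    # Sum of entry-wise KL divergences between the two product models.
    return n_in * bern_kl(pi1, pi) + n_out * bern_kl(pi0, pi)
```

In a tuning loop, one would run the screening for each candidate $\gamma$, build `mask` from the extracted blocks, and keep the $\gamma$ with the largest criterion; the mask must be neither empty nor full, otherwise one of the two means is undefined.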

By filtering out non-informative signals (potential noise or weak associations), the remaining quasi-bicliques better reveal the latent (differential) correlation patterns. Therefore, the screening step can generally reduce the computational cost and improve estimation accuracy. However, when the dimensions of $X$ and $Y$ are moderate (e.g. less than the sample size, i.e. $p, q < n$) and the noise level is low, dCCA can be performed without the screening step.

Simulation

In this section, we conduct simulation studies to evaluate the performance of dCCA with the screening procedure (dCCA_+Screen) and benchmark it against comparable multivariate analysis methods, including sparse CCA (SCCA), sparse PCA (SPCA), and sparse LDA (SLDA [17], implemented in the sparseLDA package). Both SCCA and SPCA are based on the unified penalized matrix decomposition framework of [9] and are implemented in the PMA package in R. In addition, to assess the effectiveness of the screening procedure, we include the unscreened version of dCCA as a competing method. For SCCA and SPCA, we also performed stratified analyses by applying each method separately to each group; SCCA_Sep and SPCA_Sep denote these separate applications, and subscripts 0 and 1 indicate the groups corresponding to $G = 0$ and $G = 1$, respectively.

We simulate the multivariate predictors $X$ from a $p$-dimensional multivariate normal distribution. Introducing a binary group label $G$, which moderates the association between $X$ and $Y$, we generate the multivariate response $Y$ (gene expression) from two different $q$-dimensional multivariate normal regression models, $Y_0 = X_0 B_0 + E_0$ and $Y_1 = X_1 B_1 + E_1$, where the rows of the error matrices $E_0$ and $E_1$ are multivariate normal. Here, $B_0$ and $B_1$ are the $p \times q$ regression coefficient matrices for the subgroups with $G = 0$ and $G = 1$, respectively. We use equal group sizes ($n_0 = n_1$).

We fix the dimensions $p$ and $q$. Among all possible pairs of associations, we specify the active pairs within two dense blocks, each spanning a subset of the variables in $X$ and a subset of the variables in $Y$. Non-zero values are assigned to the entries of the coefficient matrix within these dense blocks, while the remaining entries are set to zero. This configuration is designed to replicate the circuitry commonly observed in RNA gene regulation networks, in which the RNA-gene pairs within a dense block exhibit concentrated interactions, while the inactive pairs outside the blocks play no role in influencing gene expression through RNA.

Under this multi-dense-block configuration, we consider two settings. In the first, the direction (sign) of the association differs depending on $G$; in the second, the association between $X$ and $Y$ exists in only one of the two groups. In both settings, the non-zero coefficients of one group's coefficient matrix are negative, whereas the block entries of the other group's coefficient matrix are positive in the first scenario and zero in the second. These settings are designed to emulate biologically plausible scenarios. In the first scenario, different clinical statuses lead to contrasting regulatory effects of RNA on gene expression. The second scenario reflects the selective, condition-dependent nature of associations in biological systems, where the association between RNA and gene expression is present only under a specific clinical status.
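The two settings can be mimicked with a small generator. Everything numeric below (dimensions, block positions, coefficient magnitude 1, unit-variance noise, 100 samples per group) is a hypothetical choice, and so is the assignment of the fixed-negative coefficients to group 1; the article's exact values are not reproduced here.

```python
import numpy as np

def simulate(n_per_group=100, p=50, q=40, setting=1, seed=0):
    """Hypothetical generator: X ~ N(0, I_p); Y_g = X_g B_g + noise, with the
    active pairs confined to two dense blocks (all sizes are assumptions)."""
    rng = np.random.default_rng(seed)
    blocks = [(slice(0, 5), slice(0, 8)), (slice(10, 14), slice(20, 26))]
    B_neg = np.zeros((p, q))      # group with negative effects in both settings
    B_var = np.zeros((p, q))      # group whose effects change with the setting
    for rs, cs in blocks:
        B_neg[rs, cs] = -1.0
        if setting == 1:
            B_var[rs, cs] = 1.0   # Setting 1: opposite sign
        # Setting 2: B_var stays zero (no association in this group)
    g = np.repeat([0, 1], n_per_group)
    X = rng.standard_normal((2 * n_per_group, p))
    B_by_group = {0: B_var, 1: B_neg}   # which group is which is an assumption
    Y = np.vstack([X[g == k] @ B_by_group[k] for k in (0, 1)])
    Y += rng.standard_normal(Y.shape)   # isotropic Gaussian noise
    return X, Y, g
```

Rows are ordered by group, so `X[g == k]` and `Y[g == k]` stay aligned.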

We assess the performance of all methods by two criteria: (i) variable selection accuracy and (ii) recovery of the differential correlation patterns between groups. Specifically, we evaluate the accuracy of $u$ and $v$ selection using precision, recall, and the $F_1$ score. To assess how precisely each method recovers the differential correlation patterns, we calculate the absolute bias $|\hat{\rho}_g - \rho_g|$ for each group $g \in \{0, 1\}$ separately, where $\hat{\rho}_g$ is the estimated canonical correlation of the $g$th subgroup and $\rho_g$ is the theoretical correlation in the noiseless case. Thus, the theoretical correlation of the group whose coefficients change between settings equals 1 in the first setting and 0 in the second, while that of the other group remains fixed in both scenarios. Moreover, we assess whether the canonical variables derived from dCCA and the comparison methods can distinguish the two groups: we fit a logistic regression with group as the outcome and the canonical variables as predictors, and compare the areas under the receiver operating characteristic (ROC) curves (AUC).
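The selection and bias criteria are straightforward to compute. This sketch assumes selected and true variables are given as index collections and that the canonical correlation for a group is the Pearson correlation of its projected data; the helper names are hypothetical.

```python
import numpy as np

def selection_metrics(selected, truth):
    """Precision, recall, and F1 for a set of selected variable indices."""
    selected, truth = set(selected), set(truth)
    tp = len(selected & truth)                      # true positives
    precision = tp / len(selected) if selected else 0.0
    recall = tp / len(truth) if truth else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall > 0 else 0.0)
    return precision, recall, f1

def abs_bias(Xg, Yg, u, v, rho_true):
    """|estimated canonical correlation - theoretical correlation| for one group."""
    rho_hat = np.corrcoef(Xg @ u, Yg @ v)[0, 1]
    return abs(rho_hat - rho_true)
```

When these quantities are averaged over replications, the mean $F_1$ need not equal the $F_1$ computed from the mean precision and recall.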

Results are summarized in Tables 1 and 2 and displayed in Fig. 2, based on 100 replications per simulation setting. Regarding variable selection, dCCA_+Screen accurately identifies the non-zero, correlated variables in both data blocks $X$ and $Y$ in both settings (see the $F_1$ scores) and outperforms the competing methods. SCCA exhibits a high rate of false positives across all settings, resulting in low precision. SPCA accurately selects variables in $Y$ but not in $X$. SLDA misses most true variables because it is designed for classification rather than variable selection.

Table 1.

Simulation results (Setting 1: the direction of the association is opposite between groups). We compare dCCA with the screening procedure (dCCA_+Screen), dCCA without the screening procedure (dCCA), and three competing methods: sparse CCA (SCCA), sparse LDA (SLDA), and sparse PCA (SPCA). SCCA_Sep and SPCA_Sep denote SCCA and SPCA applied separately to each group, with subscripts 0 and 1 denoting the groups corresponding to $G = 0$ and $G = 1$, respectively.

Variable selection in $X$ ($u$ selection)
Method	Precision	Recall	F1
dCCA_+Screen	0.8582 (0.14)	0.9920 (0.03)	0.9121 (0.11)
SCCA	0.0809 (0.04)	0.1700 (0.09)	0.1094 (0.06)
SCCA_Sep₀	0.2645 (0.03)	0.7173 (0.05)	0.3856 (0.04)
SCCA_Sep₁	0.2590 (0.03)	0.7207 (0.06)	0.3803 (0.04)
SLDA	0.0607 (0.06)	0.0607 (0.06)	0.0607 (0.06)
SPCA	0.0760 (0.05)	0.1560 (0.10)	0.1019 (0.06)
SPCA_Sep₀	0.0771 (0.05)	0.1573 (0.10)	0.1031 (0.06)
SPCA_Sep₁	0.0692 (0.04)	0.1407 (0.09)	0.0925 (0.06)
Variable selection in $Y$ ($v$ selection)
Method	Precision	Recall	F1
dCCA_+Screen	0.9878 (0.09)	1.0000 (0.00)	0.9899 (0.09)
SCCA	0.2159 (0.04)	0.7547 (0.09)	0.3352 (0.05)
SCCA_Sep₀	0.1548 (0.02)	0.8140 (0.09)	0.2600 (0.03)
SCCA_Sep₁	0.1556 (0.02)	0.8063 (0.10)	0.2607 (0.04)
SLDA	0.1093 (0.05)	0.1093 (0.05)	0.1093 (0.05)
SPCA	1.0000 (0.00)	0.6543 (0.02)	0.7909 (0.01)
SPCA_Sep₀	1.0000 (0.00)	0.6517 (0.02)	0.7889 (0.01)
SPCA_Sep₁	1.0000 (0.00)	0.6507 (0.02)	0.7882 (0.01)
Identifying correlation (absolute bias) and classification (AUC)
Method			AUC
dCCA_+Screen	0.0342 (0.02)	0.0352 (0.02)	0.9831 (0.01)
dCCA	0.0938 (0.01)	0.1138 (0.02)	0.9498 (0.01)
SCCA	0.5722 (0.11)	1.4408 (0.1)	0.5601 (0.04)
SCCA_Sep	0.0629 (0.01)	1.9353 (0.02)	0.5433 (0.03)
SLDA	1.0018 (0.07)	0.9951 (0.07)	0.8241 (0.02)
SPCA	1.0043 (0.12)	1.0113 (0.12)	0.5585 (0.04)
SPCA_Sep	1.0300 (0.12)	0.9847 (0.12)	0.5486 (0.03)


Table 2.

Simulation results (Setting 2: the association between $X$ and $Y$ exists in only one clinical group). We compare dCCA with the screening procedure (dCCA_+Screen), dCCA without the screening procedure (dCCA), and three competing methods: sparse CCA (SCCA), sparse LDA (SLDA), and sparse PCA (SPCA). SCCA_Sep and SPCA_Sep denote SCCA and SPCA applied separately to each group, with subscripts 0 and 1 denoting the groups corresponding to $G = 0$ and $G = 1$, respectively.

Variable selection in $X$ ($u$ selection)
Method	Precision	Recall	F1
dCCA_+Screen	0.8620 (0.09)	0.9680 (0.05)	0.9083 (0.06)
SCCA	0.2564 (0.04)	0.7207 (0.06)	0.3772 (0.04)
SCCA_Sep₀	0.0752 (0.04)	0.1627 (0.09)	0.1026 (0.05)
SCCA_Sep₁	0.2585 (0.03)	0.7193 (0.06)	0.3796 (0.04)
SLDA	0.0627 (0.06)	0.0627 (0.06)	0.0627 (0.06)
SPCA	0.0788 (0.05)	0.1620 (0.10)	0.1057 (0.07)
SPCA_Sep₀	0.0766 (0.05)	0.1567 (0.10)	0.1025 (0.06)
SPCA_Sep₁	0.0690 (0.04)	0.1407 (0.09)	0.0924 (0.06)
Variable selection in $Y$ ($v$ selection)
Method	Precision	Recall	F1
dCCA_+Screen	0.9994 (0.00)	0.9930 (0.02)	0.9961 (0.01)
SCCA	0.1593 (0.02)	0.8103 (0.09)	0.2661 (0.03)
SCCA_Sep₀	0.0731 (0.03)	0.1587 (0.07)	0.0999 (0.04)
SCCA_Sep₁	0.1559 (0.02)	0.8067 (0.10)	0.2612 (0.04)
SLDA	0.1063 (0.05)	0.1063 (0.05)	0.1063 (0.05)
SPCA	1.0000 (0.00)	0.6520 (0.02)	0.7892 (0.01)
SPCA_Sep₀	0.0765 (0.05)	0.0837 (0.05)	0.0795 (0.05)
SPCA_Sep₁	1.0000 (0.00)	0.6507 (0.02)	0.7882 (0.01)
Identifying correlation (absolute bias) and classification (AUC)
Method			AUC
dCCA_+Screen	0.1030 (0.06)	0.0754 (0.03)	0.8648 (0.02)
dCCA	0.5532 (0.34)	1.2009 (0.69)	0.8369 (0.05)
SCCA	0.1798 (0.07)	1.9327 (0.02)	0.8423 (0.02)
SCCA_Sep	0.8091 (0.02)	1.9354 (0.02)	0.7116 (0.02)
SLDA	0.0583 (0.04)	0.9958 (0.07)	0.8237 (0.01)
SPCA	0.0562 (0.04)	1.0188 (0.12)	0.5580 (0.04)
SPCA_Sep	0.0631 (0.05)	0.9889 (0.12)	0.5465 (0.02)


Figure 2. Results of simulation studies: (A) ROC curves comparing the classification performance of the canonical variables from each method; the middle and bottom panels show scatter plots of the projected (canonical) variables from the different methods for (B) Setting 1 and (C) Setting 2, respectively.

When capturing differential correlation patterns between groups, dCCA_+Screen generally achieves the smallest absolute bias, except for the group whose true canonical correlation is zero in Setting 2. In this case, SPCA achieves the best performance, although dCCA_+Screen performs nearly as well. The good performance of SLDA and SPCA in this specific case stems from their design, which does not aim to uncover association patterns between two multivariate data blocks. Consequently, they yield near-zero correlations in all settings, which leads to a large bias in every other case and leaves their projected variables without meaningful interpretation (see Fig. 2). Because it does not account for group heterogeneity, conventional SCCA produced nearly identical canonical correlations for both groups in Setting 1. Furthermore, SCCA cannot distinguish the direction (sign) of the association between groups and generated positively correlated canonical variables for the group whose true underlying correlation is negative (see Fig. 2B), resulting in a large bias (see the corresponding absolute bias in Tables 1 and 2). Performing SCCA and SPCA separately for each clinical group (i.e. SCCA_Sep and SPCA_Sep) misses the differential association patterns between groups, resulting in high correlation estimation bias. In contrast, dCCA_+Screen accurately recovers the underlying correlation in both groups.

In addition, the canonical variables obtained from dCCA_+Screen achieve the highest AUC in both settings, demonstrating their advantage in the classification task over those derived from other dimension-reduction techniques.

In summary, dCCA outperforms the benchmark multivariate association analysis methods in accurately selecting active pairs of variables and identifying distinct underlying association patterns for different groups. The dCCA-derived canonical variables also classify groups with improved accuracy.

Assessing robustness of dCCA: We further examine whether dCCA introduces false positive differential correlations when no cross-group differential association exists. In this setting, we simulate identical regression coefficient matrices, i.e. $B_0 = B_1$, within the same multi-block structure used in the previous simulation settings, and assess false positive findings.

The results in Table 3 show that the false positive rate (FPR) of dCCA_+Screen is below 5%. Results for correlation estimation and variable selection are provided in Table S1 of the Supplementary Material.
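The text does not specify which test of the between-group difference is used; one standard choice for comparing correlations from two independent groups is Fisher's $z$ test, sketched below under that assumption (`fisher_z_diff_test` is a hypothetical name). An empirical FPR is then the fraction of null replications whose $p$-value falls below the test level.

```python
import math

def fisher_z_diff_test(r0, n0, r1, n1):
    """Two-sided test of H0: rho_0 = rho_1 for correlations estimated from
    two independent groups, using Fisher's z transformation."""
    z0, z1 = math.atanh(r0), math.atanh(r1)
    se = math.sqrt(1.0 / (n0 - 3) + 1.0 / (n1 - 3))
    z = (z0 - z1) / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))   # equals 2 * (1 - Phi(|z|))
    return z, p_value
```

Because $u$ and $v$ are estimated from the same data, the canonical correlations are selected quantities, and nominal $p$-values from this test are likely to be optimistic unless the loadings are fit on separate data.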

Table 3.

False positive rate (FPR) of each method; the difference in canonical vectors between groups is tested at level $\alpha = 0.05$.

Method	dCCA_+Screen	dCCA	SCCA	SCCA_Sep	SLDA	SPCA	SPCA_Sep
FPR	0.04	0.16	0.07	0.37	0.04	0.02	0.23


Results

We applied our method to Pan-Kidney cohort data obtained from TCGA. This cohort offers a wide array of data, including gene expression, non-coding RNA (e.g. lncRNAs and miRNAs), and clinical information (e.g. cancer stage and subtype), enabling comprehensive research into kidney cancer. In our analysis, we examine how the association between miRNAs and gene expression is influenced by cancer subtype. Both the miRNA data (in RPM) and the gene expression data (in RPKM) were generated by RNA sequencing and downloaded from LinkedOmics [18]. For the miRNA data, we excluded miRNAs with zero expression across all samples and applied a transformation to stabilize variance and make the data more symmetrically distributed. In the gene expression data, genes with low expression levels are regarded as uninformative; we therefore applied a mean expression cutoff to filter out such genes, retaining those with robust expression levels. The processed dataset contains $n$ observations, with $p$ miRNAs and $q$ genes.
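The preprocessing steps can be sketched as below. The $\log_2(x + 1)$ transform, applying the cutoff to the raw gene-expression scale, and the default cutoff value are all assumptions for illustration; the exact transformation and cutoff used in the study are not reproduced here, and `preprocess` is a hypothetical helper.

```python
import numpy as np

def preprocess(mirna, genes, gene_cutoff=1.0):
    """mirna: samples x miRNAs (RPM); genes: samples x genes (RPKM).
    log2(x + 1) and gene_cutoff are ASSUMED choices, not the study's values."""
    keep_mirna = ~(mirna == 0).all(axis=0)          # drop all-zero miRNAs
    mirna = np.log2(mirna[:, keep_mirna] + 1.0)     # variance-stabilizing transform
    keep_gene = genes.mean(axis=0) > gene_cutoff    # mean-expression filter
    return mirna, genes[:, keep_gene], keep_mirna, keep_gene
```

Returning the boolean masks makes it easy to map the retained columns back to miRNA and gene identifiers.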

Renal cell carcinoma (RCC) is the predominant form of kidney cancer in adults and is categorized into subtypes based on histopathological characteristics. Three subtypes are identified in the Pan-Kidney cohort: clear cell renal cell carcinoma (ccRCC), papillary renal cell carcinoma (pRCC), and chromophobe renal cell carcinoma (chRCC). Each subtype exhibits distinct cancer progression patterns, genetic traits, and RNA profiles, which in turn can affect gene and RNA regulation differently. The first two subtypes (ccRCC and pRCC) are common, together accounting for 85%–95% of RCC cases, while chRCC is a rare subtype that accounts for approximately 5% of all RCC cases. In our analysis, we use these kidney cancer types as the group variable, assigning $G = 0$ to the common types (ccRCC and pRCC) and $G = 1$ to chRCC, the rare type. The number of subjects is $n_0$ for the common subtypes and $n_1$ for the rare subtype.

Since both data blocks are high-dimensional in this study, we first perform the screening step of dCCA. The screening procedure filters out non-informative pairs of miRNAs and genes and retains 77 miRNA variables and 591 gene variables. These variables comprise three ($K = 3$) bipartite blocks (see Fig. 3): Block 1: 43 miRNAs × 319 genes; Block 2: 18 miRNAs × 227 genes; and Block 3: 16 miRNAs × 45 genes. Each block shows a distinct differential association pattern between the two cancer subtypes (see Fig. 3). In Block 1 (upper left corner), miRNAs and genes are more strongly (positively) correlated in the common subtype group than in the chRCC group. Block 2 also shows stronger (negative) correlations in the common subtype group than in the chRCC group. In contrast, in Block 3, the correlations are stronger in the chRCC group than in the common subtypes. We then optimize the dCCA objective function on the filtered data to assess the differentially expressed miRNA–gene correlation patterns between groups. The resulting canonical variables (see Fig. 4) reflect the differential correlation patterns in the three blocks. In Block 1, both groups exhibit positive correlations between canonical variables, with greater strength in the common subtypes (ccRCC, pRCC) than in chRCC. In Block 2, the canonical correlation is negative for the common subtypes, whereas it is close to zero for chRCC. In Block 3, the correlation is stronger for chRCC than for the common subtypes.

Figure 3. (A) Heat map of the difference in the miRNA–gene correlation matrix between subtypes (common vs. rare) within the dense blocks. (B) Network plots. In each block, nodes on the left represent the top 10 miRNAs and nodes on the right represent the top 10 genes. These were chosen by the sum of the absolute values of the elements in each column (miRNA) and row (gene) of the corresponding block of the correlation matrix. Edge color denotes the direction (sign) of the association (positive: red; negative: blue), and edge width and transparency denote its strength.
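
The node selection described in the caption can be sketched as follows. Rows of a block are genes and columns are miRNAs. Each node is scored by the sum of the absolute values of its entries, and edge color follows the sign of each entry. The function names and toy values are hypothetical. The sketch also assumes the ranking is applied to the correlation-difference block shown in panel A, which the caption does not state explicitly.

```python
import numpy as np

def top_k_nodes(block, k=10):
    """Rank miRNAs (columns) and genes (rows) of one block by the sum of the
    absolute values of their entries; return the indices of the top k of each."""
    score = np.abs(block)
    mirna_score = score.sum(axis=0)     # one score per column (miRNA)
    gene_score = score.sum(axis=1)      # one score per row (gene)
    top_mirna = np.argsort(-mirna_score, kind="stable")[:k]
    top_gene = np.argsort(-gene_score, kind="stable")[:k]
    return top_mirna, top_gene

def network_edges(block, top_gene, top_mirna):
    """Edges (gene, miRNA, color, width) among the selected nodes: color encodes
    the sign (red positive, blue negative), width the absolute value."""
    edges = []
    for g in top_gene:
        for m in top_mirna:
            w = float(block[g, m])
            if w != 0.0:
                edges.append((int(g), int(m), "red" if w > 0 else "blue", abs(w)))
    return edges

# Toy 2-gene x 3-miRNA block (hypothetical values)
block = np.array([[0.1, -0.9, 0.2],
                  [0.5,  0.0, -0.1]])
top_mirna, top_gene = top_k_nodes(block, k=2)
print(top_mirna, top_gene)                      # [1 0] [0 1]
print(network_edges(block, top_gene, top_mirna))
```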

Figure 4. Scatter plots of the miRNA canonical variable versus the gene-expression canonical variable, obtained from CCA (upper panel) and dCCA (lower panels, orange strip). Kidney cancer subtypes are distinguished by color (red: common subtypes, ccRCC and pRCC; blue: chRCC). Pearson correlation coefficients (R) computed separately for each subtype, with their p-values (p), are shown in the same colors. The statistical significance of the difference (*Diff*) in canonical correlations between the two subtypes is also tested, and the associated p-values are shown in black.
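
The caption does not state which test is used for *Diff*. One standard option for comparing two Pearson correlations from independent groups is Fisher's z-transformation test, which is assumed in the sketch below. The function name is hypothetical, and the sketch may not match the authors' procedure. Because the canonical variables are estimated to emphasize group differences, nominal p-values from such a test on the same data may be optimistic.

```python
import math

def fisher_z_diff(r1, n1, r2, n2):
    """Two-sided test of H0: rho1 = rho2 for Pearson correlations r1, r2
    estimated in two independent groups of sizes n1, n2."""
    z1, z2 = math.atanh(r1), math.atanh(r2)        # Fisher transformation
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (z1 - z2) / se
    p = math.erfc(abs(z) / math.sqrt(2.0))          # equals 2 * (1 - Phi(|z|))
    return z, p
```

For example, `fisher_z_diff(r_common, n_common, r_rare, n_rare)` would compare the group-wise canonical correlations. The argument names here are placeholders.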

These findings align well with results from prior studies. For example, Block 1 identifies miRNA–gene pairs that are tightly connected in the ccRCC and pRCC subtypes but loosely connected in the chRCC subtype. We searched two existing databases, miRCancer [19] (http://mircancer.ecu.edu) and dbDEMC [20] (https://www.biosino.org/dbDEMC/index). Most miRNAs in Block 1 (e.g. miR-126, miR-145, miR-122) were reported as critical and differentially expressed in the ccRCC subtype but not in the chRCC subtype (see the Supplementary Material). In Block 2, miR-141, a unique miRNA signature in clear cell RCC [21], was associated with critical tumor suppressor genes such as USH1C [22] in the common RCC subtypes but not in the rare subtype. We also performed pathway analysis on the identified genes. Several genes in Block 3 (e.g. CD3D, CD3E, CD2, SIT1) were enriched in pathways related to T cell and lymphocyte activation (see the Supplementary Material). dCCA also identified miR-150, a miRNA that plays critical roles in lymphocyte development and is significantly associated with RCC survival [23, 24]. It was strongly co-expressed with these genes in the chRCC subtype but only weakly in the ccRCC and pRCC subtypes. A stronger regulatory link between miR-150 and immune genes in chRCC may explain its prognostic utility for RCC survival in chRCC compared with the ccRCC and pRCC subtypes [25]. In addition, miR-223 in Block 3, a cancer-specific survival-related biomarker [26, 27], was associated with genes in chRCC but not in the other kidney cancer subtypes.
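
The text does not name the pathway-analysis tool. A common approach, assumed here, is an over-representation test. It asks how likely it is, under a hypergeometric null, that a set of selected genes (for example, the genes in Block 3) overlaps a pathway's gene set at least as much as observed. The function name below is hypothetical. In practice, p-values across many pathways would also be adjusted for multiple testing.

```python
from math import comb

def overrepresentation_p(k, n_selected, n_pathway, n_universe):
    """One-sided hypergeometric p-value P(X >= k): the chance that a random set of
    n_selected genes from n_universe contains at least k of the n_pathway genes."""
    total = comb(n_universe, n_selected)
    # sum exact integer counts over all overlaps i >= k, then divide once
    tail = sum(comb(n_pathway, i) * comb(n_universe - n_pathway, n_selected - i)
               for i in range(k, min(n_selected, n_pathway) + 1))
    return tail / total
```

Exact integer binomial coefficients are used so the tail sum does not lose precision before the final division.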

For comparison, we also apply classic CCA [4] and sparse CCA (sCCA) [9] to this dataset. Neither method identifies the differential correlation patterns extracted by dCCA. For example, in Fig. 4 the correlations between the CCA canonical variables are almost identical in the two clinical groups. CCA therefore misses group differences that may reflect differential biological mechanisms.
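
The sketch below illustrates why a method that maximizes pooled correlation can overlook group differences. It computes the leading pair of classical (Hotelling) canonical variables [3] from an SVD of the whitened cross-covariance matrix. It then reports the correlation of those canonical variables separately within each group. The simulated data are hypothetical: both groups share a strong signal, and only one group carries a weaker additional signal. This is textbook CCA, not the sCCA of [9] or the analysis code behind Fig. 4.

```python
import numpy as np

def inv_sqrt(S, ridge=1e-6):
    """Inverse symmetric square root of a covariance matrix (small ridge for stability)."""
    w, V = np.linalg.eigh(S + ridge * np.eye(S.shape[0]))
    return (V / np.sqrt(w)) @ V.T

def first_cca_pair(X, Y):
    """Leading canonical weights (a, b) and canonical correlation of X and Y."""
    Xc, Yc = X - X.mean(axis=0), Y - Y.mean(axis=0)
    n = X.shape[0]
    Kx = inv_sqrt(Xc.T @ Xc / n)
    Ky = inv_sqrt(Yc.T @ Yc / n)
    U, s, Vt = np.linalg.svd(Kx @ (Xc.T @ Yc / n) @ Ky)
    return Kx @ U[:, 0], Ky @ Vt[0], s[0]

rng = np.random.default_rng(1)

def simulate(n, differential):
    X = rng.standard_normal((n, 5))
    Y = rng.standard_normal((n, 4))
    z = rng.standard_normal(n)                  # strong signal shared by both groups
    X[:, 0] = z + 0.3 * rng.standard_normal(n)
    Y[:, 0] = z + 0.3 * rng.standard_normal(n)
    if differential:                            # weaker signal in one group only
        w = rng.standard_normal(n)
        X[:, 1] = w + rng.standard_normal(n)
        Y[:, 1] = w + rng.standard_normal(n)
    return X, Y

X1, Y1 = simulate(500, differential=True)
X2, Y2 = simulate(500, differential=False)
a, b, rho = first_cca_pair(np.vstack([X1, X2]), np.vstack([Y1, Y2]))
r1 = np.corrcoef(X1 @ a, Y1 @ b)[0, 1]
r2 = np.corrcoef(X2 @ a, Y2 @ b)[0, 1]
print(round(rho, 2), round(r1, 2), round(r2, 2))
```

The pooled objective is dominated by the shared signal, so r1 and r2 come out nearly equal. The group-specific signal in the second columns is left unexplained by the leading pair.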

Conclusion

We have developed a new multivariate-to-multivariate analysis tool, dCCA, to decipher complex interaction patterns between two types of high-dimensional omics data. We focus on extracting differentially expressed omics-to-omics interaction patterns between clinical groups. Unlike classic CCA and sCCA, dCCA more effectively uncovers interaction patterns between two types of omics data that are related to clinical status. The differential interaction patterns identified by dCCA can therefore help pinpoint potential biomarkers that distinguish subtypes. This approach may also provide insights into the biological mechanisms that differ between groups; for example, identifying subtype-specific mechanisms may suggest targeted therapeutic strategies for the disease. However, our study is exploratory rather than confirmatory, and further studies and experiments are needed to validate the findings. For example, in vitro experiments using cell lines grown from each cancer subtype could further validate our findings.

dCCA is computationally efficient. It can handle interactions between thousands of variables in each data type because the graph-based screening procedure efficiently filters out non-informative features. For validation, we applied dCCA to an additional dataset (a breast cancer study in TCGA); see the Supplementary Material for details.
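
The screening step relies on a graph-based procedure (see the Method section and [15, 16]). The sketch below illustrates the underlying idea only and is not the dCCA implementation. It thresholds a matrix of between-group correlation differences into a bipartite miRNA–gene graph. It then extracts one dense block with Charikar's greedy peeling [16], which repeatedly removes a minimum-degree node and keeps the densest intermediate subgraph. Several choices here are assumptions: the density |E| / (|rows| + |cols|), the threshold `tau`, and the helper names. dCCA's own objective, tuning, and handling of multiple blocks may differ.

```python
import numpy as np

def threshold_graph(diff, tau):
    """Bipartite biadjacency matrix: an edge wherever |correlation difference| > tau."""
    return np.abs(diff) > tau

def greedy_densest_bipartite(A):
    """Charikar-style greedy peeling on a bipartite graph with boolean biadjacency
    matrix A (rows: one variable type, columns: the other). Returns the densest
    subgraph encountered, with density |E| / (|rows| + |cols|)."""
    A = np.asarray(A, dtype=bool)
    rows, cols = set(range(A.shape[0])), set(range(A.shape[1]))
    row_deg = A.sum(axis=1).tolist()
    col_deg = A.sum(axis=0).tolist()
    edges = int(A.sum())
    best = (edges / (len(rows) + len(cols)), set(rows), set(cols))
    while len(rows) + len(cols) > 1:
        # pick the remaining node with the smallest current degree
        r = min(rows, key=lambda i: row_deg[i]) if rows else None
        c = min(cols, key=lambda j: col_deg[j]) if cols else None
        if c is None or (r is not None and row_deg[r] <= col_deg[c]):
            rows.remove(r)
            edges -= row_deg[r]
            for j in cols:               # neighbors lose one degree each
                if A[r, j]:
                    col_deg[j] -= 1
        else:
            cols.remove(c)
            edges -= col_deg[c]
            for i in rows:
                if A[i, c]:
                    row_deg[i] -= 1
        density = edges / (len(rows) + len(cols))
        if density > best[0]:
            best = (density, set(rows), set(cols))
    return best
```

For example, `greedy_densest_bipartite(threshold_graph(D, tau))`, with D a matrix of group differences in miRNA–gene correlations, would return one candidate block. A simple quadratic-time loop is used here for clarity; Charikar's greedy algorithm can be implemented in linear time with degree buckets.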

The proposed dCCA method currently handles only datasets with a binary group variable. Extending dCCA to group variables with more than two categories, such as the four molecular subtypes of breast cancer (HER2-enriched, Luminal A, Luminal B, Basal-like), would considerably enhance its utility for analyzing more complex datasets. For example, applying the method to patients at different stages of cancer could help uncover ordinal trends and association patterns that vary dynamically as the disease progresses. We provide a potential two-step solution for handling more than two groups in the Supplementary Material.

Key Points

dCCA deciphers the complex interaction patterns between two types of high-dimensional omics data.
Specifically, dCCA extracts differentially expressed omics-to-omics interaction patterns between clinical groups, which can provide insights into the distinct biological mechanisms that differ between groups.
We propose a novel graph-based approach to efficiently identify active variable pairs across two high-dimensional variable spaces, outperforming existing methods in accurately selecting variables within both spaces.

Supplementary Material

supp_material_dCCA_bbae288

supp_material_dcca_bbae288.pdf (6.6 MB, PDF)

Author Biographies

Hwiyoung Lee is a postdoctoral research fellow at the Department of Epidemiology and Public Health, School of Medicine, University of Maryland, Baltimore.

Tianzhou Ma is an assistant professor at the Department of Epidemiology and Biostatistics, School of Public Health, University of Maryland, College Park.

Hongjie Ke is a PhD candidate at the Department of Epidemiology and Biostatistics, School of Public Health, University of Maryland, College Park.

Zhenyao Ye is a PhD candidate at the Department of Epidemiology and Public Health, School of Medicine, University of Maryland, Baltimore.

Shuo Chen is a professor at the Department of Epidemiology and Public Health, School of Medicine, University of Maryland, Baltimore.

Contributor Information

Hwiyoung Lee, Maryland Psychiatric Research Center, School of Medicine, University of Maryland, Baltimore, MD 21201, United States; The University of Maryland Institute for Health Computing (UM-IHC), North Bethesda, MD 20852, United States.

Tianzhou Ma, Department of Epidemiology and Biostatistics, University of Maryland, College Park, MD 20742, United States.

Hongjie Ke, Department of Epidemiology and Biostatistics, University of Maryland, College Park, MD 20742, United States.

Zhenyao Ye, The University of Maryland Institute for Health Computing (UM-IHC), North Bethesda, MD 20852, United States; Division of Biostatistics and Bioinformatics, Department of Epidemiology and Public Health, School of Medicine, University of Maryland, Baltimore, MD 21201, United States.

Shuo Chen, Maryland Psychiatric Research Center, School of Medicine, University of Maryland, Baltimore, MD 21201, United States; The University of Maryland Institute for Health Computing (UM-IHC), North Bethesda, MD 20852, United States; Division of Biostatistics and Bioinformatics, Department of Epidemiology and Public Health, School of Medicine, University of Maryland, Baltimore, MD 21201, United States.

Funding

This work was supported by the National Institutes of Health under Award Number: 1DP1DA048968-01.

Availability and Implementation

The software package that implements dCCA is available at https://github.com/hwiyoungstat/dCCA.

Data availability

The miRNA and gene expression data used in this study are available from the Pan-Kidney cohort of The Cancer Genome Atlas Program (TCGA) at https://www.cancer.gov/ccg/research/genome-sequencing/tcga.

References

1. Zhu S, Wu H, Wu F. et al. MicroRNA-21 targets tumor suppressor genes in invasion and metastasis. Cell Res 2008;18:350–9. 10.1038/cr.2008.24.
2. Bhan A, Soleimani M, Mandal SS. Long noncoding RNA and cancer: a new paradigm. Cancer Res 2017;77:3965–81. 10.1158/0008-5472.CAN-16-2634.
3. Hotelling H. Relations between two sets of variates. Biometrika 1936;28:321–77. 10.1093/biomet/28.3-4.321.
4. Yang X, Liu W, Liu W. et al. A survey on canonical correlation analysis. IEEE Trans Knowl Data Eng 2019;33:2349–68. 10.1109/TKDE.2019.2958342.
5. Zhuang X, Yang Z, Cordes D. A technical review of canonical correlation analysis for neuroscience applications. Hum Brain Mapp 2020;41:3807–33. 10.1002/hbm.25090.
6. Jiang M-Z, Aguet F, Ardlie K. et al. Canonical correlation analysis for multi-omics: application to cross-cohort analysis. PLoS Genet 2023;19:1–22. 10.1371/journal.pgen.1010517.
7. Rousu J, Agranoff DD, Sodeinde O. et al. Biomarker discovery by sparse canonical correlation analysis of complex clinical phenotypes of tuberculosis and malaria. PLoS Comput Biol 2013;9:1–10. 10.1371/journal.pcbi.1003018.
8. Cao KAL, González I, Déjean S. integrOmics: an R package to unravel relationships between two omics datasets. Bioinformatics 2009;25:2855–6. 10.1093/bioinformatics/btp515.
9. Witten DM, Tibshirani R, Hastie T. A penalized matrix decomposition, with applications to sparse principal components and canonical correlation analysis. Biostatistics 2009;10:515–34. 10.1093/biostatistics/kxp008.
10. Du L, Liu K, Yao X. et al. Detecting genetic associations with brain imaging phenotypes in Alzheimer’s disease via a novel structured SCCA approach. Med Image Anal 2020;61:101656. 10.1016/j.media.2020.101656.
11. Du L, Liu F, Liu K. et al. Identifying diagnosis-specific genotype-phenotype associations via joint multitask sparse canonical correlation analysis and classification. Bioinformatics 2020;36:i371–9. 10.1093/bioinformatics/btaa434.
12. Hastie T, Tibshirani R, Friedman J. The Elements of Statistical Learning, 2nd edn. New York, NY: Springer, 2009. 10.1007/978-0-387-84858-7.
13. Dudoit S, Fridlyand J, Speed TP. Comparison of discrimination methods for the classification of tumors using gene expression data. J Am Stat Assoc 2002;97:77–87. 10.1198/016214502753479248.
14. Fan J, Lv J. Sure independence screening for ultrahigh dimensional feature space. J R Stat Soc Series B Stat Methodol 2008;70:849–911. 10.1111/j.1467-9868.2008.00674.x.
15. Ke H, Ren Z, Qi J. et al. High-dimension to high-dimension screening for detecting genome-wide epigenetic and noncoding RNA regulators of gene expression. Bioinformatics 2022;38:4078–87. 10.1093/bioinformatics/btac518.
16. Charikar M. Greedy approximation algorithms for finding dense components in a graph. In: Jansen K, Khuller S (eds). Approximation Algorithms for Combinatorial Optimization. Berlin, Heidelberg: Springer, 2000, 84–95. 10.1007/3-540-44436-X_10.
17. Clemmensen L, Witten D, Hastie T. et al. Sparse discriminant analysis. Technometrics 2011;53:406–13. 10.1198/TECH.2011.08118.
18. Vasaikar SV, Straub P, Wang J. et al. LinkedOmics: analyzing multi-omics data within and across 32 cancer types. Nucleic Acids Res 2017;46:D956–63. 10.1093/nar/gkx1090.
19. Xie B, Ding Q, Han H. et al. miRCancer: a microRNA-cancer association database constructed by text mining on literature. Bioinformatics 2013;29:638–44. 10.1093/bioinformatics/btt014.
20. Feng X, Wang Y, Ling Y. et al. dbDEMC 3.0: functional exploration of differentially expressed miRNAs in cancers of human and model organisms. Genomics Proteomics Bioinformatics 2022;20:446–54.
21. Cairns P. Renal cell carcinoma. Cancer Biomark 2011;9:461–73. 10.3233/CBM-2011-0176.
22. Chen L, Liu P, Evans TC. et al. DNA damage is a pervasive cause of sequencing errors, directly confounding variant identification. Science 2017;355:752–6. 10.1126/science.aai8690.
23. Hu YZ, Li Q, Wang PF. et al. Multiple functions and regulatory network of miR-150 in B lymphocyte-related diseases. Front Oncol 2023;13:1140813. 10.3389/fonc.2023.1140813.
24. Chanudet E, Wozniak MB, Bouaoun L. et al. Large-scale genome-wide screening of circulating microRNAs in clear cell renal cell carcinoma reveals specific signatures in late-stage disease. Int J Cancer 2017;141:1730–40. 10.1002/ijc.30845.
25. Garje R, Elhag D, Yasin HA. et al. Comprehensive review of chromophobe renal cell carcinoma. Crit Rev Oncol Hematol 2021;160:103287. 10.1016/j.critrevonc.2021.103287.
26. Ghafouri-Fard S, Shirvani-Farsani Z, Branicki W. et al. MicroRNA signature in renal cell carcinoma. Front Oncol 2020;10:596359. 10.3389/fonc.2020.596359.
27. Kajdasz A, Majer W, Kluzek K. et al. Identification of RCC subtype-specific microRNAs: meta-analysis of high-throughput RCC tumor microRNA expression data. Cancers 2021;13:548. 10.3390/cancers13030548.
