Abstract
Identifying the genes responsible for driving cancer is of critical importance for directing treatment. Accordingly, multiple computational tools have been developed to facilitate this task. Because these tools employ different methods and consider different data, and because the field is evolving rapidly, selecting an appropriate tool for cancer driver discovery is not straightforward. This survey provides a comprehensive review of the computational methods for discovering cancer drivers. We categorise the methods into three groups: methods for single driver identification, methods for driver module identification, and methods for identifying personalised cancer drivers. In addition to providing a “one-stop” reference for these methods, we evaluate and compare their performance, giving readers information about the capabilities of the methods in identifying biologically significant cancer drivers. The biological relevance of the drivers identified by these tools is shown by their enrichment in GO biological processes and KEGG pathways and by our identification of a small cancer-driver cohort that is capable of stratifying patient survival.
Keywords: cancer driver, cancer driver discovery, computational method, coding gene, microRNA
1. Introduction
Identifying cancer driver genes (cancer drivers for short) is vital since these genes play a significant role in the development of cancer. Understanding cancer drivers and their regulatory mechanisms is crucial to the design of effective cancer treatments.
Classical methods of identifying cancer driver genes are based on detecting mutations in the DNA sequences of coding genes in wet-lab experiments. There are many types of mutations in the genome, such as single-nucleotide variants (SNVs), structural variants (SVs), insertions and deletions (indels), and copy number aberrations (CNAs) 1. These mutations may cause normal cells to transform into tumour cells, resulting in the development of cancer. For example, it has been confirmed that mutations in the genes VHL and MET cause kidney cancer 2 and that mutations in the genes AKT1 and BRCA1 are related to breast cancer 3. However, many mutated genes are not driver genes and may not regulate the progression of cancer, because not all mutations in the genome contribute to cancer development. Mutations which play a significant role in cancer progression are called driver mutations, while mutations which do not have any impact on cancer development are called passenger mutations 4, 5. Genes which bear driver mutations are considered cancer drivers 6. Nevertheless, some cancer drivers may not contain mutations: genes which regulate their targets to promote cancer are also considered cancer drivers even if they are not mutated. For example, the overexpression of KDM5C decreases p54 expression to enhance the proliferation and invasion of gastric cancer cells, and KDM5C is therefore considered a cancer driver 7. The relationship between cancer drivers and genes with mutations is illustrated in Figure 1.
Given the complexity of regulation by cancer drivers and the large number of genes (over twenty thousand), detecting cancer driver genes with wet-lab experiments alone is challenging, and many computational methods utilising multiple types of genomic data have been developed to reveal cancer drivers and the regulatory mechanisms behind cancer development 8-12. Cancer driver discovery methods have become increasingly popular because of the rapid development of machine learning techniques and the revolution in DNA sequencing technology. Taking advantage of these developments, numerous methods have been proposed to detect cancer driver genes. For example, MutSigCV 13 investigates the significance of mutations in genes to predict cancer drivers; OncodriveFM 14 and OncodriveCLUST 15 evaluate the functional impact and the clustering of gene mutations, respectively; and DriverNet 16, MEMo 17, and CBNA 18 examine the role of genes in gene regulatory networks. Given the large number of current computational methods for cancer driver discovery, it can take considerable effort to find a good resource describing the state-of-the-art methods, and thus a review is necessary and helpful.
Previous works 1, 19 have reviewed the computational methods for identifying cancer drivers. These reviews focus only on single cancer drivers (i.e. individual genes as cancer drivers) at the population level (i.e. cancer drivers for the whole population of patients with a given cancer type). However, it is important to gain mechanistic insight into how cancer drivers work together in driving cancer. Moreover, because cancer is a heterogeneous disease and each patient has a different genome, the disease of each patient may be driven by different cancer driver genes. Thus, we also need to consider cancer driver modules and personalised cancer drivers (i.e. cancer drivers for a specific patient). In addition, numerous new cancer driver identification methods have been developed since those reviews were published. Therefore, a more comprehensive review of the current computational methods for identifying cancer drivers is needed.
In this paper, we survey computational methods for discovering both single cancer drivers and cancer driver modules, at the population level as well as the individual level. We then analyse the advantages and disadvantages of the current methods and identify challenges in the field. To facilitate the development of new computational methods for cancer driver detection, we also survey resources which can be used in conducting cancer driver research and validating predicted cancer drivers (see Section 2 in the Supplement). In addition, we conduct a case study comparing the performance of the current methods, which we believe will be useful for researchers who are interested or working in the field when developing new methods.
The paper is structured as follows. In Section 2, we review computational methods for identifying single cancer drivers and cancer driver modules from genomic data, for both the population and individual patients. In Section 3, we carry out a case study. In Section 4, we analyse the current methods to identify their advantages and limitations, then discuss future directions and challenges in the field. Finally, we make recommendations and conclude the paper.
2. Cancer driver discovery methods
The current computational methods use a wide range of genomic data types, including mutations, gene expression, and pathways, to discover different types of cancer drivers. Thus, we divide the methods into categories and sub-categories. The diagram of the categorisation is shown in Figure 2.
In the categorisation, we differentiate single cancer drivers from modules of cancer drivers. While a single cancer driver is an individual gene which initiates cancer and drives its progression, a module of cancer drivers is a set of genes which together influence their targets to develop a certain cancer. A cancer driver module may include genes whose mutations are mutually exclusive 4, 17, 20, or cohesive genes which have a high density of interactions in a gene network 21. We distinguish the two types of cancer drivers because there is evidence that multiple genes work in concert to influence their targets in different biological processes 22; thus, the roles of single genes and of sets of genes in driving cancer may differ. Furthermore, as cancer is a heterogeneous disease, each patient may have a different morphology and clinical outcome. For instance, two patients who have the same cancer type and receive the same treatment may experience different outcomes. The reasons can lie in differences between the patients' genomes or in non-genomic events such as the infiltration of immune cells. However, for research on discovering genetic cancer drivers (which focuses on differences between patients' genomes), we believe it is reasonable to hypothesise that different patients' diseases could be driven by different driver genes, which creates a strong need to study cancer driver genes specific to individual patients. Thus, we categorise the current computational methods for cancer driver discovery into three groups: methods to identify single cancer drivers, methods to identify cancer driver modules, and methods to discover personalised cancer drivers (i.e. cancer drivers for a specific patient). In addition, based on the key techniques they use, we divide single cancer driver identification methods into two sub-groups: mutation-based methods and network-based methods.
Mutation-based methods use different characteristics of mutations, such as mutation significance, the functional impact of mutations, and the location of mutations, to discover cancer drivers, while network-based methods evaluate the role of genes in biological networks to predict cancer drivers. Most cancer driver module identification methods use the mutual exclusivity of mutations to identify modules of cancer drivers. We discuss the details of the methods in the following sections.
2.1 Methods for identifying single cancer drivers
Most current methods identify single cancer drivers at the population level. In general, they can be grouped into mutation-based methods and network-based methods. Mutation-based methods use the characteristics of mutations (e.g. the significance of mutations in genes, the functional impacts of mutations, the recurrence of mutations in genes) to identify cancer driver genes, while network-based methods use gene networks to assess the role of genes and then combine this with mutation information to predict cancer drivers. The general idea of network-based methods is illustrated in Section 1 of the Supplement. A summary of the single cancer driver identification methods is presented in Table 1.
Table 1.
Method | Description and reference | Additional information |
---|---|---|
Single cancer driver identification | ||
Mutation-based methods (using mutation significance) | ||
MutSigCV | Assesses the significance of mutations in DNA sequencing data to discover cancer driver genes 13 | The result includes false positives (i.e. genes with a high mutation frequency whose mutations are passenger mutations) |
Mutation-based methods (using functional impact of mutations) | ||
OncodriveFM | Uses the functional impact of gene mutations to detect cancer drivers, with the hypothesis that a bias in a gene towards variations with a high functional impact can be used to identify candidate driver genes 14 | It can identify driver genes with low mutation recurrence |
OncodriveFML | Uses the functional impact of gene mutations to reveal both coding and non-coding drivers 23 | It is applied to 19 cancer datasets and detects several well-known drivers |
DriverML | Uses the functional impact of mutations to unravel cancer drivers through a supervised machine learning approach 24 | It could be improved by integrating additional well-annotated datasets (e.g. CGC) into the training data |
Mutation-based methods (using structural consequences of gene mutations) | ||
ActiveDriver | Looks at the enrichment of mutations in externally defined regions to uncover cancer driver genes 25 | It only analyses missense mutations, although other mutation types, such as in-frame and frameshift deletions, are also important |
SGDriver | Uses a Bayesian inference statistical framework to incorporate somatic missense mutations into protein-ligand binding-site residues in order to determine the functional role of the mutations 26 | It could be improved by integrating more mutation types and by using molecular networks to identify the interacting partners of mutated proteins to expand the candidate pool |
AlloDriver | Maps mutations to allosteric/orthosteric sites derived from the three-dimensional protein structures to detect potentially functional genes/proteins in cancer patients 27 | It also uses only missense mutations |
OncodriveCLUST | Detects cancer genes with a large bias in clustering mutations based on the idea that gain-of-function mutations usually cluster in particular protein sections and these mutations contribute to the development of cancer cells 15 | It cannot identify cancer drivers whose mutations are distributed across the sequence |
Mutation-based methods (others: combining with gene expression, pathways, protein structures) | ||
IntOGen-mutations | Uses somatic mutations, gene expression, and tumour pathways to identify cancer drivers for various tumour types by combining OncodriveFM and OncodriveCLUST 28 | It can discover driver mutations which are distributed across the sequence and have significant functional impacts |
PathScan | Combines genomic mutations with the information of genes in known pathways to uncover cancer driver genes 29 | It can be extended to integrate other types of genetic anomalies |
Sakoparnig et al. | Introduces a computational method to detect genomic alterations with low occurrence frequencies based on mutation timing 30 | It may not discover drivers which are already present at very early cancer stages as we cannot observe a steep rise for them |
CONEXIC | Applies a score-guided search to detect combinations of modulators which reflect the expression of a gene module in a set of tumour samples, then identifies those which have the highest score in amplified or deleted regions 31 | It is mainly based on copy number aberrations |
ncDriver | Screens non-coding mutations for conservation and cancer specificity to reveal non-coding cancer drivers 32 | It tests both recurrence and distribution of mutations to identify cancer drivers |
HotSpot3D | Identifies spatial hotspots to interpret the function of mutations in the encoded protein 36 | It can detect rare cancer drivers |
3D clusters | Clusters somatic mutations in cancer to identify rare mutations based on 3D protein structures 37 | It is limited due to the lack of complete protein structure data for several genes |
Network-based methods | ||
Vinayagam et al. | Applies controllability analysis to the directed network of human protein-protein interactions to identify disease genes 38 | As it uses a general protein network (i.e. not specific to a cancer type), the uncovered drivers are not specific to any cancer type |
CBNA | Identifies coding and miRNA cancer drivers by analysing the controllability of the miRNA-TF-mRNA network and mutation data 18 | It builds the gene network for a specific cancer type, thus the results are for the cancer type of interest |
DriverNet | Uncovers cancer drivers by evaluating the influence of mutations on transcriptional networks in cancer 16 | It relies on a predetermined influence graph which is sparse and incomplete |
Methods for identifying cancer drivers based on gene mutations
Although all the mutation-based methods use mutational impact to identify cancer drivers, different methods have different hypotheses. For example, some methods (e.g. OncodriveFM, DriverML, etc.) hypothesise that mutations with functional impacts may be driver mutations while other methods (e.g. ActiveDriver, SGDriver, etc.) hypothesise that driver mutations may cluster in particular protein sections. Thus, to present the mutation-based methods in a structured way, we have grouped them by considering the mutation information used by these methods. We have divided the methods into four sub-groups, including using the significance of mutations in genes, using the functional impacts of mutations, using structural consequences of gene mutations, and others. Other methods combine the mutation information of genes with gene expression and/or tumour pathways to detect cancer drivers. The details of the methods in the four sub-groups are discussed below.
Using the significance of mutations in genes
MutSigCV 13 is a method to discover cancer drivers by assessing the significance of mutations in genes.
Cancer drivers predicted by MutSigCV are mutated more frequently than expected by chance, based on inferred background mutation processes. However, MutSigCV has a limitation: some genes have a high mutation frequency, yet their mutations are passenger mutations that do not contribute to cancer development.
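To make "more frequently than expected by chance" concrete, the sketch below computes a one-sided binomial p-value for a single gene under one uniform, hypothetical per-base background rate. It is a simplification, not MutSigCV itself: MutSigCV infers gene-specific background rates from covariates such as expression level and replication timing, which is exactly what this toy model leaves out. The function names and numbers are illustrative assumptions.

```python
from math import exp, lgamma, log, log1p


def binomial_sf(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), summed in log space to avoid overflow."""
    if k <= 0:
        return 1.0
    log_norm = lgamma(n + 1)
    total = 0.0
    for i in range(k, n + 1):
        log_pmf = log_norm - lgamma(i + 1) - lgamma(n - i + 1) + i * log(p) + (n - i) * log1p(-p)
        total += exp(log_pmf)
    return min(total, 1.0)


def mutation_significance(observed: int, gene_length: int, n_samples: int, bg_rate: float) -> float:
    """One-sided p-value that a gene carries more mutations than the background predicts.

    Hypothetical model: every coding base in every sample mutates independently
    with the same probability `bg_rate` (MutSigCV instead models gene-specific rates).
    """
    return binomial_sf(observed, gene_length * n_samples, bg_rate)


# Usage: a 1 kb gene in 100 tumours with an assumed rate of 1e-5 per base (~1 expected hit).
print(mutation_significance(1, 1000, 100, 1e-5))   # unremarkable
print(mutation_significance(10, 1000, 100, 1e-5))  # far more than expected
```

The example also shows the limitation discussed above: a long, frequently mutated passenger gene will look just as significant as a driver under such a model if its background rate is underestimated.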
Using the functional impacts of mutations
OncodriveFM 14 uses the functional impact of genomic mutations to detect cancer drivers, instead of evaluating the significance of mutations in genes as MutSigCV does. OncodriveFM hypothesises that a bias in a gene towards variations (i.e. mutations) with a high functional impact may indicate a candidate driver gene. The key point of this method is that, instead of assessing how many mutations a gene has, it evaluates how strongly the gene's mutations are biased towards a high functional impact. Thus, it can detect driver genes whose mutations have low recurrence but play a significant role in cancer development.
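The sketch below illustrates the kind of bias test described above, under simplifying assumptions: each mutation already carries a functional impact (FI) score, for example from a predictor such as SIFT or PolyPhen-2, and a gene's bias is measured by comparing the mean FI score of its mutations with the means of random same-sized draws from all mutations in the cohort. The function name, the sampling scheme, and the toy scores are illustrative; OncodriveFM's own statistics differ in detail.

```python
import random
from statistics import mean


def fm_bias_pvalue(gene_scores, background_scores, n_draws=10_000, seed=0):
    """Empirical p-value that a gene's mutations have unusually high functional impact.

    gene_scores: FI scores of the mutations observed in one gene.
    background_scores: FI scores of all mutations in the cohort (the null pool).
    """
    rng = random.Random(seed)
    observed = mean(gene_scores)
    k = len(gene_scores)
    # Count random same-sized sets whose mean impact is at least as high as the gene's.
    hits = sum(
        mean(rng.sample(background_scores, k)) >= observed for _ in range(n_draws)
    )
    return (hits + 1) / (n_draws + 1)  # add-one correction avoids reporting p = 0


background = [i / 999 for i in range(1000)]  # toy FI scores spread over [0, 1]
print(fm_bias_pvalue([0.95, 0.97, 0.99, 0.90, 0.92], background))  # biased to high impact
print(fm_bias_pvalue([0.5, 0.4, 0.6, 0.5, 0.5], background))       # typical impact
```

Note that the test never looks at how many mutations a gene has, only at their scores, which is why a gene with just a handful of high-impact mutations can still reach significance.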
Similar to OncodriveFM, OncodriveFML 23 also uses the functional impact of mutations to discover cancer drivers. However, while OncodriveFM only uses coding gene mutations, OncodriveFML is designed to analyse both coding and non-coding mutations. The OncodriveFML framework is then applied to 19 tumour datasets and uncovers well-known coding drivers like TP53, KEAP1, ARID2, and RUNX1 with high functional impacts. It also identifies non-coding drivers such as MALAT1 and MIAT. In particular, MALAT1 is a lncRNA which has been proved to be involved in lung adenocarcinomas and MIAT is a non-protein-coding transcript related to myocardial infarction.
Another method that assesses the functional impact of gene mutations to unravel cancer drivers is DriverML 24. Unlike OncodriveFM and OncodriveFML, DriverML assumes that the functional impact of mutations depends on mutation types. Thus, it detects cancer drivers by scoring the functional influence of alterations based on their mutation types. The method uses various properties to weight the impact of each mutation type, and it obtains optimised weight parameters through a supervised machine learning approach with pan-cancer training data.
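A minimal sketch of scoring genes by type-weighted mutations, in the spirit of DriverML. The weights in `EXAMPLE_WEIGHTS` are invented for illustration; DriverML learns its weights with supervised machine learning on pan-cancer training data and uses them within its own statistical framework, neither of which is reproduced here.

```python
from collections import defaultdict

# Hypothetical weights per mutation type (DriverML learns such weights from training data).
EXAMPLE_WEIGHTS = {
    "missense": 1.0,
    "nonsense": 3.0,
    "frameshift": 3.0,
    "splice_site": 2.0,
    "silent": 0.0,
}


def weighted_impact_scores(mutations, weights):
    """Sum type-specific weights over each gene's mutations.

    mutations: iterable of (gene, mutation_type) pairs; unknown types weigh 0.
    Returns (gene, score) pairs ranked from highest to lowest score.
    """
    scores = defaultdict(float)
    for gene, mutation_type in mutations:
        scores[gene] += weights.get(mutation_type, 0.0)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


cohort = [("TP53", "missense"), ("TP53", "nonsense"), ("TTN", "silent"), ("TTN", "missense")]
print(weighted_impact_scores(cohort, EXAMPLE_WEIGHTS))  # → [('TP53', 4.0), ('TTN', 1.0)]
```

Both genes carry two mutations here, so a pure count would tie them; weighting by type is what separates them.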
Using structural consequences of gene mutations
Instead of using the functional impact of mutations like OncodriveFM, OncodriveFML, and DriverML, other methods, such as ActiveDriver 25, SGDriver 26, AlloDriver 27, and OncodriveCLUST 15, identify cancer drivers based on the structural consequences of gene mutations. ActiveDriver discovers cancer driver genes by detecting the enrichment of somatic mutations in post-translationally modified sites, including phosphorylation, acetylation, and ubiquitination sites. SGDriver uses a Bayesian inference statistical framework to incorporate somatic missense mutations into protein-ligand binding-site residues in order to determine the functional role of the mutations. AlloDriver maps mutations to allosteric/orthosteric sites derived from three-dimensional protein structures to detect potentially functional genes/proteins in cancer patients.
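ActiveDriver's question, whether mutations fall into post-translationally modified sites more often than expected, can be illustrated with a simple binomial test on the fraction of the protein covered by site regions. This is a swapped-in simplification, not ActiveDriver's actual model (which fits a regression model); the ±7-residue flanking window, the function names, and the toy protein are assumptions for the example.

```python
from math import exp, lgamma, log, log1p


def site_residues(site_positions, protein_length, flank=7):
    """Residues covered by modification sites plus a symmetric flanking window (1-based)."""
    covered = set()
    for pos in site_positions:
        covered.update(range(max(1, pos - flank), min(protein_length, pos + flank) + 1))
    return covered


def site_enrichment_pvalue(mutation_positions, site_positions, protein_length, flank=7):
    """One-sided binomial p-value for an excess of mutations inside site regions."""
    covered = site_residues(site_positions, protein_length, flank)
    f = len(covered) / protein_length  # expected fraction of mutations landing in sites
    n = len(mutation_positions)
    k = sum(1 for pos in mutation_positions if pos in covered)
    if k == 0 or f >= 1.0:
        return 1.0
    return min(1.0, sum(
        exp(lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1) + i * log(f) + (n - i) * log1p(-f))
        for i in range(k, n + 1)
    ))


# Toy 500-residue protein with one phosphosite at residue 100.
hits = [98, 99, 100, 100, 101, 103, 105, 107, 250, 400]
print(site_enrichment_pvalue(hits, [100], 500))
```

Eight of ten mutations fall in a window covering only 3% of the protein, so the p-value is tiny; mutations scattered elsewhere would give 1.0.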
OncodriveCLUST is based on the observation that gain-of-function mutations usually cluster in particular protein regions and that these mutations contribute to the development of cancer cells. Thus, it detects cancer genes with a strong bias towards clustered mutations. As this method relies on mutation clustering, it cannot identify cancer drivers whose mutations are distributed across the sequence. In addition, it requires a large number of observed mutations to produce good results. Thus, this method should be used to complement the results of other methods in detecting cancer drivers.
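The clustering idea can be sketched as follows: positions hit at least `min_count` times seed clusters, seeds no more than `max_gap` residues apart are merged, and a gene's clustering score is the fraction of its mutations that fall inside clusters. Both thresholds and the function names are hypothetical, and OncodriveCLUST's own cluster score and background model are more elaborate.

```python
from collections import Counter


def find_clusters(positions, min_count=2, max_gap=5):
    """Return (start, end, n_mutations) for clusters along a protein sequence."""
    counts = Counter(positions)
    seeds = sorted(pos for pos, c in counts.items() if c >= min_count)
    groups = []
    for pos in seeds:
        if groups and pos - groups[-1][-1] <= max_gap:
            groups[-1].append(pos)  # close enough: extend the current cluster
        else:
            groups.append([pos])    # start a new cluster
    return [
        (g[0], g[-1], sum(c for pos, c in counts.items() if g[0] <= pos <= g[-1]))
        for g in groups
    ]


def clustering_score(positions, **kwargs):
    """Fraction of a gene's mutations that fall inside clusters."""
    if not positions:
        return 0.0
    in_clusters = sum(n for _, _, n in find_clusters(positions, **kwargs))
    return in_clusters / len(positions)


hotspot_gene = [12, 12, 12, 13, 14, 14, 200, 350, 351]
spread_gene = [10, 100, 200, 300]
print(find_clusters(hotspot_gene))  # → [(12, 14, 6)]
print(clustering_score(hotspot_gene), clustering_score(spread_gene))
```

The second gene scores 0.0 however important its mutations may be, which mirrors the limitation noted above for drivers whose mutations are spread along the sequence.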
Others: Combining with gene expression, pathways, protein structures, etc
The platform IntOGen-mutations 28 is built on OncodriveFM and OncodriveCLUST to discover cancer drivers for various tumour types. This platform takes somatic mutations, gene expression, and tumour pathways as inputs. It combines the advantages of methods using the functional impact of mutations and methods using the location of mutations: it applies OncodriveFM to identify driver genes which are significantly biased towards mutations with high functional impact, and OncodriveCLUST to detect driver genes whose mutations are highly concentrated in specific regions of proteins.
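When two complementary tests, such as a functional-impact bias test and a clustering test, each produce a p-value for the same gene, one standard way to merge the evidence is Fisher's method. The sketch below is only an illustration of such an integration; it is an assumption, not a description of how IntOGen-mutations actually combines OncodriveFM and OncodriveCLUST. It uses the closed-form chi-squared survival function, which exists for even degrees of freedom.

```python
from math import exp, factorial, log


def fisher_combine(pvalues):
    """Combine independent p-values (all > 0) with Fisher's method.

    X = -2 * sum(ln p_i) follows a chi-squared distribution with 2k degrees of
    freedom under the null; for even degrees of freedom its survival function is
    exp(-x/2) * sum_{i<k} (x/2)^i / i!.
    """
    k = len(pvalues)
    half_x = -sum(log(p) for p in pvalues)  # this is X / 2
    return exp(-half_x) * sum(half_x**i / factorial(i) for i in range(k))


# Hypothetical p-values from a functional-impact test and a clustering test for one gene.
print(fisher_combine([0.01, 0.02]))
```

Fisher's method assumes the tests are independent; two analyses of the same mutations are not strictly independent, so a real integration would need to account for that.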
Also using mutational information to detect cancer genes, PathScan 29 combines mutations with information on genes in known pathways. PathScan tests whether mutations in a pathway contribute to the development of a tumour. Sakoparnig et al. 30 introduce a computational method to detect genomic alterations with low occurrence frequencies based on mutation timing.
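A pathway-level test must account for the fact that samples differ in their chance of carrying a pathway mutation, so the number of samples with a mutated pathway is better compared against a Poisson-binomial than a plain binomial distribution. The sketch below illustrates this under simplifying assumptions: one total pathway length and a per-sample background rate. The rates, pathway length, and function names are hypothetical, and PathScan's own statistic, which models gene lengths explicitly, differs in detail.

```python
def poisson_binomial_sf(k, probs):
    """P(X >= k) where X is a sum of independent Bernoulli(p_i) variables."""
    dist = [1.0]  # dist[j] = P(j successes among the samples processed so far)
    for p in probs:
        new = [0.0] * (len(dist) + 1)
        for j, q in enumerate(dist):
            new[j] += q * (1 - p)
            new[j + 1] += q * p
        dist = new
    return min(1.0, sum(dist[k:]))


def pathway_pvalue(pathway_mutated, sample_rates, pathway_length):
    """pathway_mutated: one 0/1 flag per sample; sample_rates: per-base background rates."""
    # Chance that at least one of the pathway's bases is mutated in each sample.
    probs = [1 - (1 - r) ** pathway_length for r in sample_rates]
    return poisson_binomial_sf(sum(pathway_mutated), probs)


# Hypothetical cohort: 6 of 8 samples mutate a 20 kb pathway; rates differ per sample.
rates = [1e-6, 2e-6, 1e-6, 5e-6, 1e-6, 3e-6, 1e-6, 2e-6]
print(pathway_pvalue([1, 1, 1, 1, 1, 1, 0, 0], rates, 20_000))
```

The dynamic programme is exact and costs O(n²) for n samples, which is negligible at cohort sizes typical of cancer genomics.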
In particular, methods such as CONEXIC 31 and ncDriver 32 combine a wide range of data types in order to identify cancer drivers more effectively. In 31, the authors develop a computational framework, named COpy Number and EXpression in Cancer (CONEXIC), which uses CNVs and gene expression as inputs to uncover cancer drivers. It applies a score-guided search to detect combinations of modulators which reflect the expression of a gene module in a set of tumour samples, and then identifies those with the highest score in amplified or deleted chromosomal regions. The authors hypothesise that if the expression of a gene A and its copy number are related, the copy number variation likely causes the changes in the expression of A, so A is likely a driver candidate that regulates other genes. Applied to a melanoma dataset, the framework correctly recovers known melanoma drivers.
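The hypothesis stated above, that a gene whose expression tracks its own copy number is a plausible driver candidate, can be illustrated with a simple correlation filter over genes in amplified or deleted regions. This shows only that first intuition, not CONEXIC's score-guided search; the 0.5 threshold, the function names, and the toy data are assumptions.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def candidate_modulators(copy_number, expression, aberrant_genes, min_abs_r=0.5):
    """Keep genes in aberrant regions whose expression follows their copy number."""
    kept = []
    for gene in aberrant_genes:
        r = pearson(copy_number[gene], expression[gene])
        if abs(r) >= min_abs_r:
            kept.append((gene, r))
    return sorted(kept, key=lambda item: abs(item[1]), reverse=True)


# Toy data over four tumours: gene A's expression tracks its copy number, gene B's does not.
copy_number = {"A": [1, 2, 3, 4], "B": [1, 2, 3, 4]}
expression = {"A": [2.0, 4.0, 6.0, 8.0], "B": [5.0, 1.0, 4.0, 2.0]}
print(candidate_modulators(copy_number, expression, ["A", "B"]))
```

In CONEXIC proper, such candidates are then scored by how well they explain the expression of whole gene modules, which this filter does not attempt.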
In addition, ncDriver 32 identifies non-coding cancer drivers with a two-stage procedure. The first stage is a mutational recurrence test, which takes mutations (including indels and SNVs) and genomic elements as inputs to detect elements with recurrent mutations. The second stage assesses whether the mutations of each element have a significantly cancer-specific distribution and a significant bias towards highly conserved positions of the element, i.e. whether the conservation level of the mutated positions is significantly higher than the overall conservation distribution. This procedure is applied to a pan-cancer whole-genome dataset, and significant non-coding drivers identified by the method include MIR142 lncRNA and XRNU5A-1 sncRNA.
The methods above rely only on mutations with high frequency (i.e. using mutation significance) or low frequency (i.e. combining with the functional impact of mutations, gene expression, etc.). However, according to Nussinov et al. 33-35, driver genes can also carry rare mutations. Rare mutations differ from high- or low-frequency mutations in that they can be allosteric, whereas high- or low-frequency mutations are located at active or functional sites. If mutations relieve autoinhibition, they are likely to be drivers even though they are rare 34. As a result, rare drivers (i.e. drivers with rare mutations) may not be identified by the methods above. Recently, methods such as HotSpot3D 36 and 3D clusters 37 have been developed to identify rare cancer drivers. HotSpot3D is a computational tool that detects three-dimensional (3D) spatial relationships among mutations in the encoded protein (i.e. spatial hotspots) and predicts the functional consequences of mutations in the detected hotspots. Using this method, 369 rare drivers, such as TP53, PTEN, and VHL, were detected, all of which are related to hotspots with potential functional implications. The 3D clusters method uses 3D protein structures to cluster somatic mutations and then considers the recurrence of mutations in clusters of spatially close residues when identifying rare drivers.
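The core of 3D hotspot detection, grouping mutated residues that are close in the folded structure even when they are far apart in sequence, can be sketched with single-linkage clustering over residue coordinates. The 10 Å cutoff, the minimum of three mutations, the function name, and the coordinates are hypothetical; HotSpot3D and 3D clusters each use their own distance definitions and significance tests.

```python
from math import dist


def spatial_clusters(residues, cutoff=10.0, min_mutations=3):
    """Single-linkage clusters of mutated residues in 3D.

    residues: dict mapping residue number -> ((x, y, z), mutation_count).
    Returns (sorted residue numbers, total mutations) for clusters with at
    least `min_mutations` mutations, largest first.
    """
    ids = sorted(residues)
    parent = {r: r for r in ids}

    def find(r):
        while parent[r] != r:
            parent[r] = parent[parent[r]]  # path halving keeps the trees shallow
            r = parent[r]
        return r

    # Link every pair of residues whose coordinates lie within the cutoff.
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            if dist(residues[a][0], residues[b][0]) <= cutoff:
                parent[find(a)] = find(b)

    groups = {}
    for r in ids:
        groups.setdefault(find(r), []).append(r)
    clusters = [(members, sum(residues[r][1] for r in members)) for members in groups.values()]
    return sorted([c for c in clusters if c[1] >= min_mutations], key=lambda c: -c[1])


# Residues 10 and 250 are far apart in sequence but 5 Å apart in the (toy) structure.
toy = {10: ((0.0, 0.0, 0.0), 2), 250: ((3.0, 4.0, 0.0), 1), 400: ((50.0, 50.0, 50.0), 1)}
print(spatial_clusters(toy))  # → [([10, 250], 3)]
```

A sequence-based method such as the one sketched for OncodriveCLUST would treat residues 10 and 250 as unrelated; only the structure reveals that their rare mutations form a hotspot.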
Analysis
Although all the methods above are based on mutation data, each takes a different approach. MutSigCV evaluates the significance of mutations in genes to detect cancer drivers. However, some genes are mutated significantly, yet most of their mutations are passenger mutations which do not promote cancer; such genes are not cancer driver genes. To filter out passenger mutations, ActiveDriver, SGDriver, AlloDriver, and OncodriveCLUST consider the location of mutations. Although these methods can reduce false positives in predicting driver mutations, they may overlook cancer drivers whose mutations are distributed across the protein, since they only evaluate mutations concentrated in particular protein regions. Other methods use different strategies instead. For instance, OncodriveFM, OncodriveFML, and DriverML use the functional impact of genomic mutations to evaluate the importance of mutated genes. Sakoparnig et al. 30 rely on the timing of mutations, PathScan incorporates pathway data, and CONEXIC incorporates gene expression data. There are also methods which use an integrated approach, such as IntOGen-mutations, which considers both the functional impact and the clustering of mutations. Furthermore, since mutations in both coding and non-coding regions play a significant role in cancer development, cancer drivers can be coding or non-coding elements. Methods such as OncodriveFML and ncDriver have been developed to detect non-coding cancer drivers.
As these methods evaluate different aspects of mutations, they can detect several validated cancer drivers. The novel cancer drivers identified by these methods are promising candidates whose role in cancer progression can be confirmed in wet-lab experiments. However, although these methods can easily be applied to different mutation datasets, mutation databases are incomplete, which limits their application.
Methods for identifying cancer drivers based on gene networks
In general, network-based methods evaluate the role of genes in biological networks and then combine this with the mutation information of genes to predict cancer drivers. There are three methods in this group: Vinayagam et al. 38, CBNA 18, and DriverNet 16. The details of these methods are discussed below.
Details of the methods
Vinayagam et al. 38 apply controllability analysis to the directed network (i.e. the network with directed edges) of human protein-protein interactions (PPIs). In the input network, nodes are proteins and edges are interactions between proteins. The controllability analysis categorises nodes into three types, “indispensable”, “dispensable”, and “neutral”, based on their impact on the minimum driver node set (MDS), i.e. the minimum set of nodes needed to control the whole network. Indispensable nodes are nodes whose removal from the network increases the size of the MDS, while the removal of dispensable nodes decreases it. The removal of neutral nodes has no effect on the number of driver nodes. The study then analyses the controllability of a perturbed network to identify sensitive indispensable nodes, i.e. nodes that are indispensable in the original network but not in the perturbed network. These sensitive indispensable nodes are the candidate cancer drivers.
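Under structural controllability theory, which this kind of analysis builds on, the minimum number of driver nodes of a directed network with N nodes is max(N - |M*|, 1), where |M*| is the size of a maximum matching (each edge u->v may pair an "out" copy of u with an "in" copy of v). The sketch below computes that number with a simple augmenting-path matching and labels each node by how its removal changes it. The function names and toy networks are ours; the original study works with a large human PPI network and its own implementation.

```python
def max_matching_size(nodes, edges):
    """Maximum matching of a directed graph: each edge u->v may pair out-copy u with in-copy v."""
    out_adj = {u: [] for u in nodes}
    for u, v in edges:
        out_adj[u].append(v)
    matched_in = {}  # in-copy v -> out-copy u currently matched to it

    def augment(u, seen):
        for v in out_adj[u]:
            if v in seen:
                continue
            seen.add(v)
            if v not in matched_in or augment(matched_in[v], seen):
                matched_in[v] = u
                return True
        return False

    return sum(augment(u, set()) for u in nodes)


def n_driver_nodes(nodes, edges):
    """Size of a minimum driver node set: max(N - |M*|, 1), or 0 for an empty graph."""
    if not nodes:
        return 0
    return max(len(nodes) - max_matching_size(nodes, edges), 1)


def classify_nodes(nodes, edges):
    """Label nodes indispensable / dispensable / neutral by the effect of removing them."""
    base = n_driver_nodes(nodes, edges)
    labels = {}
    for x in nodes:
        rest = [n for n in nodes if n != x]
        rest_edges = [(u, v) for u, v in edges if x not in (u, v)]
        after = n_driver_nodes(rest, rest_edges)
        labels[x] = "indispensable" if after > base else "dispensable" if after < base else "neutral"
    return labels


def sensitive_indispensable(original, perturbed):
    """Nodes indispensable in the original network but not in the perturbed one."""
    return {n for n, label in original.items()
            if label == "indispensable" and perturbed.get(n) != "indispensable"}


chain = classify_nodes(["a", "b", "c"], [("a", "b"), ("b", "c")])
print(chain)  # removing "b" splits the chain and raises the driver count
```

In a hypothetical perturbed network that adds the edge a->c, node b is no longer indispensable, so `sensitive_indispensable` would report it as a candidate.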
Also inspired by network controllability, CBNA 18 analyses the controllability of a gene regulatory network to discover cancer drivers. However, the network built by CBNA is a miRNA-TF-mRNA network consisting of microRNAs (miRNAs), transcription factors (TFs), and mRNAs. Since this network is constructed from the miRNA/mRNA expression data of cancer patients and existing gene interaction databases such as PPI 39, miRTarBase 40, and TransmiR 41, it is more reliable and specific to a cancer type. In addition, unlike the method of Vinayagam et al. 38, CBNA analyses network controllability to identify critical nodes, i.e. nodes whose removal increases the size of the minimum node set controlling the whole network, and then combines them with mutation data to identify cancer drivers. As CBNA uses the miRNA-TF-mRNA network, it can identify both coding and miRNA driver genes. Furthermore, it can be used to discover drivers for a cancer type or a cancer subtype.
Instead of evaluating the controllability of a subset of nodes of a gene network like Vinayagam et al. 38 and CBNA 18, DriverNet 16 considers the influence of mutated genes on other genes in a network.
DriverNet integrates different data types, including genomic data (i.e. non-synonymous SNVs, indels, and copy number variations), an influence graph derived from biological pathway information, and gene expression. It constructs a bipartite graph linking mutated genes to genes with outlying expression in order to detect the effect of the mutated genes. The putative drivers are mutated genes which affect a large number of outlying-expression genes across several patients. The method is applied to four cancer datasets (glioblastoma, breast, triple-negative breast, and serous ovarian cancer), and it reveals various candidate cancer drivers related to transcriptional networks.
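DriverNet ranks mutated genes by greedily covering outlying-expression events in the bipartite graph. The sketch below shows that greedy step on toy data; the input encodings (sets of (patient, gene) pairs and an adjacency dict for the influence graph), the function names, and the placeholder gene GENEX are our assumptions. DriverNet also evaluates the statistical significance of the selected genes, which is omitted here.

```python
def event_cover(mutations, outliers, influence):
    """Map each mutated gene to the outlying-expression events it can explain.

    mutations: set of (patient, gene) pairs with a mutation.
    outliers: set of (patient, gene) pairs with outlying expression.
    influence: dict gene -> set of neighbouring genes in the influence graph.
    """
    covers = {}
    for patient, gene in mutations:
        for target in influence.get(gene, ()):
            if (patient, target) in outliers:
                covers.setdefault(gene, set()).add((patient, target))
    return covers


def greedy_drivers(covers):
    """Repeatedly pick the gene explaining the most still-unexplained events."""
    remaining = {gene: set(events) for gene, events in covers.items()}
    ranked = []
    while remaining:
        # Ties are broken by gene name so the result is deterministic.
        gene = max(remaining, key=lambda g: (len(remaining[g]), g))
        best = remaining.pop(gene)
        if not best:
            break
        ranked.append((gene, len(best)))
        for events in remaining.values():
            events -= best  # these events are now explained
    return ranked


mutations = {("p1", "TP53"), ("p2", "TP53"), ("p1", "GENEX")}
outliers = {("p1", "CDKN1A"), ("p1", "MDM2"), ("p2", "MDM2")}
influence = {"TP53": {"CDKN1A", "MDM2"}, "GENEX": {"MDM2"}}
print(greedy_drivers(event_cover(mutations, outliers, influence)))  # → [('TP53', 3)]
```

GENEX is mutated and linked to an outlier, but its only event is already explained by TP53, so the greedy cover drops it; this is how the approach avoids promoting redundant passenger genes.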
Analysis
The three methods above use biological networks to predict single cancer drivers; other methods using networks to discover cancer driver modules or personalised cancer drivers are discussed in Sections 2.2 and 2.3, respectively. In general, network-based methods evaluate the role of genes in the whole network to predict cancer drivers. Various techniques are used to analyse the networks, such as network controllability in Vinayagam et al. and CBNA, and gene influence in DriverNet. These methods can elucidate the molecular mechanisms of cancer development at the network level, but they need large datasets to produce reliable results. In addition, the networks used in some methods (i.e. Vinayagam et al. and DriverNet) are not specific to any cancer type, so they may miss important information specific to a cancer type. Another limitation of network-based methods such as DriverNet is that they treat genes which affect other genes' expression as cancer drivers, whereas some cancer drivers may not alter the expression of other genes, and some non-driver genes may incidentally change other genes' expression.
2.2 Methods for identifying cancer driver modules
Recently, several methods have been developed to discover cancer drivers in modules, most of which use the mutual exclusivity of mutations. Thus, we divide methods for identifying cancer driver modules into two sub-groups: those using mutual exclusivity of mutations, and others. The other methods use mutations, gene expression, gene networks, RNA sequencing, etc. to detect cancer driver modules. The details of the methods in the two sub-groups are discussed below, and a summary of the methods is presented in Table 2.
Table 2.
Method | Description and reference | Additional information |
---|---|---|
Cancer driver module identification | ||
Using mutual exclusivity of mutations | ||
CoMEt | Identifies cancer genes by using an exact statistical test for mutual exclusivity of genomic events and simultaneously analysing multiple sets of mutually exclusive alterations 4 | It has a low computational complexity |
WeSME | Discovers cancer drivers by evaluating the mutual exclusivity of mutations of gene pairs 20 | It can only detect driver gene pairs (i.e. only two driver genes in each module) |
MEMo | Analyses mutual exclusivity of mutated genes in subnetworks to identify mutual exclusivity modules in cancer 17 | It depends on the prior biological knowledge of gene interactions |
Others: using mutations, gene expression, gene network | ||
iMCMC | Uses the cancer genomic data including mutations, CNAs, and gene expression from cancer patients to identify mutated core modules in cancer 42 | It provides flexibility by using two input parameters to balance different sources of data |
NetBox | Uses biological networks to assess network modules statistically and identify core pathways in GBM 21 | It has only been applied to glioblastoma |
TieDIE | Applies network diffusion to discover the relationship of genomic events and changes in cancer subtypes 43 | It has a high computational cost |
CICERO | Uses RNA sequencing data and extensive annotation to detect driver fusions with a local assembly-based algorithm 44 | It may miss low-expressed gene fusions |
Hamilton et al. | Uses the pan-cancer dataset of TCGA and the miRNA target data of AGO-CLIP to detect a pan-cancer oncogenic miRNA superfamily with a central core seed motif 45 | It discovers a miRNA driver superfamily consisting of miR-17, miR-19, miR-130, miR-93, miR-18, miR-455 and miR-210 |
Using mutual exclusivity of mutations
CoMEt (Combinations of Mutually Exclusive Alterations) 4 uses the mutual exclusivity technique to detect cancer driver modules. Because different cancer patients have different combinations of genomic alterations driving the disease, CoMEt detects combinations of alterations (i.e. modules of mutated genes) in the same pathway that are mutually exclusive across samples. The method uses an exact statistical test for mutual exclusivity and performs simultaneous analysis of mutually exclusive alterations specific to cancer subtypes. The advantage of this method is its low computational complexity. Similarly, WeSME 20 assesses the mutual exclusivity of gene mutations to detect cancer drivers. However, instead of evaluating genes in the same pathway, WeSME only considers gene pairs, and gene pairs whose mutations are significantly mutually exclusive are considered modular candidate cancer drivers.
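To make the idea of an exact test for mutual exclusivity concrete, the sketch below tests a single gene pair with a one-sided Fisher's exact (hypergeometric) test: with each gene's number of mutated samples held fixed, it asks how likely it is to see as few co-mutated samples as observed. This is a simplified illustration only. CoMEt generalises the exact test to combinations of more than two genes, and WeSME uses its own weighted sampling scheme, so neither tool's implementation is reproduced here; the function name `exclusivity_pvalue` is hypothetical.

```python
from math import comb


def exclusivity_pvalue(a, b):
    """One-sided exact test for mutual exclusivity of two genes.

    a, b: 0/1 mutation indicators for the same samples, in the same order.
    Returns P(co-mutated samples <= observed) under the hypergeometric null
    with both genes' mutation counts held fixed (a one-sided Fisher's exact
    test). Small values suggest the two genes' mutations avoid each other.
    """
    if len(a) != len(b):
        raise ValueError("mutation vectors must cover the same samples")
    n = len(a)
    k1, k2 = sum(a), sum(b)                        # samples mutated in each gene
    observed = sum(x and y for x, y in zip(a, b))  # co-mutated samples
    lowest = max(0, k1 + k2 - n)                   # fewest co-mutations possible
    tail = sum(comb(k1, t) * comb(n - k1, k2 - t)
               for t in range(lowest, observed + 1))
    return tail / comb(n, k2)
```

For example, two genes each mutated in three of six samples with no overlap give a p-value of 1/20 = 0.05, whereas two genes mutated in exactly the same samples give 1.0.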
MEMo (Mutual Exclusivity Modules) 17 applies the mutual exclusivity technique to biological networks to identify oncogenic network modules. According to 17, although individual tumours of the same cancer type may have different genomic alterations, these alterations tend to occur in a limited number of pathways. In addition, alterations in the same pathway are unlikely to co-occur in the same patient. Based on these observations, MEMo performs correlation analysis and applies statistical tests to detect network modules that satisfy three criteria: (1) genes in a network module are altered across the samples; (2) member genes tend to participate in the same biological process; and (3) alterations in the module are mutually exclusive. When applied to the glioblastoma multiforme (GBM) dataset, the method successfully detects known network modules, i.e. groups of cancer drivers, in GBM.
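The third criterion can be illustrated with a simple permutation test on a candidate module. If the member genes' alterations avoid each other, the module covers more samples than expected for the genes' alteration frequencies. The minimal sketch below shuffles each gene's alterations independently across samples and counts how often the shuffled module covers at least as many samples as the observed one. This is an assumption-laden simplification, not MEMo's procedure: MEMo's null model preserves both per-gene and per-sample alteration frequencies, and the function names and defaults (`n_perm`, `seed`) here are hypothetical.

```python
import random


def coverage(module):
    """Number of samples altered in at least one gene of the module.

    module: list of 0/1 rows, one row per gene, one column per sample.
    """
    return sum(any(column) for column in zip(*module))


def exclusivity_permutation_pvalue(module, n_perm=1000, seed=0):
    """Permutation p-value for 'the module covers more samples than chance'.

    Each gene's row is shuffled independently, which keeps every gene's
    alteration frequency but breaks any dependence between genes. High
    coverage for fixed frequencies means the alterations rarely overlap,
    i.e. they tend to be mutually exclusive.
    """
    rng = random.Random(seed)
    observed = coverage(module)
    hits = 0
    for _ in range(n_perm):
        permuted = [rng.sample(row, len(row)) for row in module]
        if coverage(permuted) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # add-one correction avoids p = 0
```

A module of three genes, each altered in a different block of 10 out of 30 samples, covers all 30 samples and receives a very small p-value, while three genes altered in the same 10 samples receive a p-value of 1.0.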
Others: Using mutations, gene expression, gene network, RNA sequencing, etc
iMCMC (an approach to identify Mutated Core Modules in Cancer) 42 was developed to uncover groups of genes driving cancer using genomic data from cancer patients. The method uses somatic mutations, CNVs, and gene expression data to build a gene network. It then identifies coherent subnetworks (modules) in the network through an optimisation model that selects vertices and edges with high weights. Finally, the significance of the subnetworks is assessed with a randomisation test, and their mutual exclusivity is tested using a Markov chain Monte Carlo permutation strategy. The method has been applied to the GBM and ovarian carcinoma (OV) datasets from TCGA. Many of the discovered core modules are related to known pathways, and most of the identified genes are known cancer driver genes previously reported to be related to cancer pathogenesis.
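The idea of extracting a module by favouring high-weight vertices and edges can be sketched with a greedy seed-expansion heuristic. Starting from a seed gene, the sketch repeatedly adds the neighbouring gene whose own weight plus the weights of its edges into the current module is largest, and stops when no neighbour gives a positive gain or a size cap is reached. This greedy search is a stand-in chosen for illustration and is not iMCMC's optimisation model; the weight dictionaries, the `max_size` cap and the function name are all hypothetical, and in iMCMC the weights would be derived from mutation, CNV and expression data.

```python
def greedy_module(node_w, edge_w, seed, max_size=5):
    """Grow a connected module from a seed gene by greedy weight gain.

    node_w: dict gene -> weight (negative weights penalise a gene).
    edge_w: dict frozenset({gene1, gene2}) -> edge weight.
    At each step, add the neighbouring gene with the largest gain: its node
    weight plus the weights of its edges into the current module.
    """
    module = {seed}
    while len(module) < max_size:
        best, best_gain = None, 0.0
        # Candidate genes share at least one edge with the current module.
        candidates = {g for e in edge_w if e & module for g in e} - module
        for g in sorted(candidates):  # sorted for deterministic tie-breaking
            gain = node_w.get(g, 0.0) + sum(
                w for e, w in edge_w.items() if g in e and (e - {g}) <= module
            )
            if gain > best_gain:
                best, best_gain = g, gain
        if best is None:  # no neighbour improves the module score
            break
        module.add(best)
    return module
```

With a heavily penalised neighbour (for example a node weight of -2.0 against an edge weight of 0.5), the module stops growing before absorbing that gene.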
NetBox 21 uses biological networks to study drivers in GBM. It introduces a network-based method to detect oncogenic processes and cancer driver genes. The hypothesis underlying the approach is that biological networks contain multiple functional modules and that tumours target specific functional modules. The method integrates sequence mutations and CNVs with an interaction network that includes both PPIs and signalling pathways, and then identifies network modules and assesses them statistically.
Another method for identifying cancer driver modules is TieDIE (Tied Diffusion through Interacting Events) 43. TieDIE applies network diffusion to discover the relationships between genomic events and transcriptional changes in cancer subtypes. The approach collects a subnetwork of PPIs, interactions of genomic perturbations, predicted transcription factor-to-target connections, and transcriptomic states from the literature. The method has been applied to the breast invasive carcinoma (BRCA) dataset of TCGA, where it detects signalling pathways and interlinking genes corresponding to cancer signalling.
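The "tied" diffusion idea can be sketched as follows: heat placed on an upstream set of genes (e.g. mutated genes) and, separately, on a downstream set (e.g. genes with transcriptional changes) is diffused over the network with the heat kernel exp(-tL), where L is the graph Laplacian, and each gene is scored by the smaller of the two diffused heats, so that genes lying between both sets score highest. This is a simplified sketch under stated assumptions: TieDIE's actual implementation includes further steps (such as choosing the diffusion time and size of the linker set) that are not reproduced, and the diffusion time `t` and the function names are hypothetical.

```python
import numpy as np


def heat_diffusion(adj, heat, t=1.0):
    """Diffuse an initial heat vector over an undirected graph.

    Computes exp(-t * L) @ heat with the graph Laplacian L = D - A,
    using an eigendecomposition because L is symmetric.
    """
    adj = np.asarray(adj, dtype=float)
    laplacian = np.diag(adj.sum(axis=1)) - adj
    eigvals, eigvecs = np.linalg.eigh(laplacian)
    kernel = eigvecs @ np.diag(np.exp(-t * eigvals)) @ eigvecs.T
    return kernel @ np.asarray(heat, dtype=float)


def linker_scores(adj, upstream_heat, downstream_heat, t=1.0):
    """Score each node by the heat it receives from BOTH sources.

    The element-wise minimum rewards nodes that lie between the upstream
    and downstream sets rather than close to only one of them.
    """
    up = heat_diffusion(adj, upstream_heat, t)
    down = heat_diffusion(adj, downstream_heat, t)
    return np.minimum(up, down)
```

On a five-gene path with upstream heat on the first gene and downstream heat on the last, the middle gene receives the highest linker score. Because every row of the Laplacian sums to zero, the total heat is conserved by the diffusion.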
CICERO 44 takes a different approach to identifying cancer driver modules. It considers gene fusions, which result from genomic structural variations, as drivers that can initiate and develop cancer. It therefore uses RNA sequencing data and extensive annotation to detect driver fusions with a local assembly-based algorithm.
The methods above identify coding cancer driver modules. However, because non-coding RNAs (e.g. miRNAs) can modulate tumorigenesis by promoting or suppressing the expression of specific genes, and various cancer types share oncogenic pathways, there may exist a group of miRNAs that drives or suppresses tumorigenesis across different tumour types. Hamilton et al. 45 use the pan-cancer dataset of TCGA and the miRNA target data of Argonaute Crosslinking Immunoprecipitation (AGO-CLIP) 46-48 to detect pan-cancer miRNA drivers. The idea is that the set of cancer miRNA drivers modulates tumorigenesis and shares a central core seed motif. The results show that an oncogenic miRNA superfamily, comprising miR-17, miR-18, miR-19, miR-93, miR-130, miR-210, and miR-455, coregulates tumour suppressors through a GUGC core motif.
Analysis
As can be seen from the methods above, most methods use the mutual exclusivity of mutations to identify cancer driver modules. With this technique, a mutation in only one member of an identified module is sufficient to trigger cancer progression 20, 49. Thus, the identified drivers in a module may not work together to regulate their targets to drive cancer. However, as discussed above, genes can cooperate to increase their influence on target genes and drive cancer progression. Therefore, novel methods are needed to discover cancer driver groups whose members work in concert to initiate and develop cancer.
2.3 Methods for identifying personalised cancer drivers
The methods discussed in Sections 2.1 and 2.2 discover cancer drivers at the population level. Since different patients have different genomes and their diseases might be driven by different driver genes 50, it is necessary to investigate cancer drivers that are specific to an individual patient (i.e. personalised cancer drivers). There are three methods in this group: DawnRank 51, SCS 52, and PNC 53. All of them rely on gene regulatory networks to predict personalised cancer drivers. These methods are discussed below and summarised in Table 3.
Table 3.
Method | Description and reference | Additional information |
---|---|---|
Personalised cancer driver identification | ||
DawnRank | A ranking framework that applies PageRank to evaluate the impact of genes in an interaction network to detect cancer drivers 51 | It relies on the same gene network for all patients and thus may lose personalised information |
SCS | Detects the minimal set of mutated genes controlling the maximal differentially expressed genes as cancer drivers 52 | It builds a gene network for each patient; its application is limited as it requires the corresponding normal sample for each patient |
PNC | Identifies cancer drivers as the minimum gene set which covers all the edges based on a bipartite graph 53 | It also requires the corresponding normal sample for each patient |
The details of methods
A representative method for identifying personalised cancer drivers is DawnRank 51. The idea underlying the method is that mutations in genes with higher connectivity in an interaction network are more impactful. DawnRank takes gene expression data and a gene network as inputs. In particular, it is a ranking framework that applies PageRank 54, 55 to evaluate the impact of genes on the gene network. The impact is measured in terms of network connectivity and the number of differentially expressed downstream genes. The higher the rank of a gene, the more downstream genes it affects in the gene network. The ranks of genes are then combined with somatic alteration data such as copy number variations to detect driver alterations. Although DawnRank relies on the same gene regulatory network for all patients, it assesses the impact of genes in each patient using that patient's gene expression data to detect personalised cancer drivers. The algorithm has been applied to TCGA datasets and has shown effectiveness in detecting cancer drivers.
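The ranking idea can be sketched as a personalised PageRank in which score flows from target genes back to their regulators and each gene is seeded by the absolute value of its differential expression in one patient. Regulators of many strongly changed genes therefore rank highly. This is a generic PageRank-style sketch, not DawnRank's published formulation: the edge orientation, the even split of a target's score among its regulators, the damping value of 0.85 and the function name are assumptions.

```python
import numpy as np


def rank_regulators(adj, diff_expr, damping=0.85, tol=1e-10, max_iter=1000):
    """PageRank-style ranking in which score flows from targets to regulators.

    adj[i][j] = 1 if gene i regulates gene j. diff_expr holds one patient's
    differential expression per gene; its absolute value seeds the ranking.
    """
    A = np.asarray(adj, dtype=float)
    f = np.abs(np.asarray(diff_expr, dtype=float))
    f = f / f.sum()
    n_regulators = A.sum(axis=0)  # number of regulators of each target gene
    # W[i, j] = A[i, j] / n_regulators[j]: target j splits its score among its regulators.
    W = np.divide(A, n_regulators, out=np.zeros_like(A), where=n_regulators > 0)
    r = f.copy()
    for _ in range(max_iter):
        r_new = (1 - damping) * f + damping * (W @ r)
        converged = np.abs(r_new - r).sum() < tol
        r = r_new
        if converged:
            break
    return r
```

To mimic the final step described above, the ranks could then be restricted to the genes that carry somatic alterations in the same patient, for example by sorting that patient's mutated genes by their score.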
To assess the impact of genes in each patient, DawnRank uses each patient's gene expression data but relies on the same gene regulatory network for all patients. As a result, it may miss important information about gene regulation in each patient. Thus, to detect personalised cancer drivers, SCS 52 builds a gene regulatory network for each patient from the patient's gene expression data and that of its neighbour (i.e. the corresponding normal sample). SCS detects cancer driver genes as the minimal set of mutated genes that affects the maximal number of differentially expressed genes. Like SCS, PNC 53 also uses the gene expression data of a patient and its neighbour to construct personalised networks. However, PNC only selects edges that differ between the tumour and normal states. It then converts the gene regulatory network into a bipartite graph in which nodes on the top represent genes and nodes on the bottom represent edges. PNC predicts cancer driver genes as the minimum set of genes on the top of the bipartite graph that covers all the edges on the bottom.
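Finding the minimum set of genes that covers every edge of the bipartite graph amounts to a minimum vertex cover of the personalised network, which is NP-hard in general. The sketch below uses the classic greedy heuristic (repeatedly pick the gene that touches the most uncovered edges) purely for illustration. It assumes the tumour-versus-normal differential edges have already been computed, does not guarantee a minimum cover, and is not necessarily the solver used by PNC; the function name and the placeholder gene names in the example are hypothetical.

```python
from collections import Counter


def greedy_driver_cover(edges):
    """Greedy approximation of the smallest gene set covering all edges.

    edges: iterable of (gene_a, gene_b) pairs that differ between the
    tumour and normal networks. Each gene 'covers' the edges it touches,
    mirroring the gene-to-edge bipartite graph described in the text.
    """
    uncovered = {tuple(sorted(e)) for e in edges}
    drivers = []
    while uncovered:
        counts = Counter(g for edge in uncovered for g in edge)
        # Most edges covered wins; ties go to the alphabetically first gene.
        best = max(sorted(counts), key=counts.__getitem__)
        drivers.append(best)
        uncovered = {e for e in uncovered if best not in e}
    return drivers
```

For the edges (A, B), (A, C), (A, D) and (E, F), the sketch returns the genes A and E.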
Analysis
Although these methods can discover personalised cancer drivers, they still have some limitations.
DawnRank relies on the same gene network for all patients. It ignores network information specific to an individual patient, which may lead to false positives in its results. On the other hand, SCS and PNC use the genetic data of each patient to construct personalised gene networks. However, they require the genetic data of a pair of samples (i.e. a tumour and its normal neighbour), and identifying the neighbour of a tumour is challenging; such a sample is not always available. In addition, these methods only discover coding cancer drivers, while non-coding genes (e.g. miRNAs) can also be cancer drivers, as discussed above.
3. A comparative study of cancer driver discovery methods
3.1 Performance of methods in identifying cancer drivers
In this section, we present a comparative study of the performance of the methods above. As there is no ground truth against which to compare the results of methods for discovering cancer driver modules, we select only five methods for identifying single cancer drivers and three methods for identifying personalised cancer drivers for the comparison: ActiveDriver 25, DawnRank 51, DriverML 24, DriverNet 16, MutSigCV 13, OncodriveFM 14, PNC 53, and SCS 52. These methods represent different approaches to detecting cancer driver genes. ActiveDriver, DriverML, MutSigCV, and OncodriveFM are mutation-based methods, while DawnRank, DriverNet, PNC, and SCS are network-based methods. In addition, DawnRank, PNC, and SCS identify personalised cancer drivers, while the other five methods identify cancer drivers at the population level. Although DawnRank, PNC, and SCS detect cancer drivers for each patient, each of them has a way to aggregate the results for individual patients into population-level predictions, so we can compare these three methods with the others. The comparison is based on the results of the eight methods in identifying drivers for breast invasive carcinoma (BRCA), lung adenocarcinoma (LUAD), lung squamous cell carcinoma (LUSC), kidney renal clear cell carcinoma (KIRC), and head and neck squamous cell carcinoma (HNSC). We obtain the predicted cancer drivers of the eight methods for these five cancer types from 53.
Since there is no real ground truth for cancer driver discovery (as in many other areas of biological research), it is common practice to evaluate computational findings against the information in high-quality databases such as the CGC of COSMIC, although the CGC only includes driver genes that are manually curated or predicted by multiple methods. In this comparative study, we also use the CGC as the ground truth to validate the cancer drivers predicted by the methods. The performance of a method is measured using the F1Score, based on the number of discovered cancer drivers that are validated by the CGC. The F1Score indicates how well the discovered cancer drivers are enriched in the gold standard (i.e. the CGC), and it is computed from Precision P and Recall R as shown in Eq. 1. The higher a method's F1Score, the better the method performs.
$$\mathrm{F1Score} = \frac{2 \times P \times R}{P + R} \tag{1}$$
In Eq. 1, P (Precision) is the fraction of the predicted driver genes that are in the CGC, and R (Recall) is the fraction of the driver genes in the CGC that are predicted. As the F1Score is the harmonic mean of Precision P and Recall R, it reflects both a method's ability to predict cancer drivers accurately and its ability to recover many confirmed cancer drivers.
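As a worked illustration of Eq. 1, the sketch below computes Precision, Recall and F1Score for a set of predicted genes against a gold-standard set such as the CGC. The gene names in the example are placeholders, not real predictions or CGC entries, and returning 0.0 for empty sets is an assumed convention.

```python
def precision_recall_f1(predicted, gold):
    """Precision, Recall and F1Score of predicted genes against a gold standard.

    predicted: genes reported by a method; gold: validated drivers (e.g. the CGC).
    Empty inputs yield 0.0 instead of raising a division error.
    """
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    p = true_positives / len(predicted) if predicted else 0.0
    r = true_positives / len(gold) if gold else 0.0
    f1 = 2 * p * r / (p + r) if p + r > 0 else 0.0
    return p, r, f1
```

For instance, if a method predicts four genes, two of which are among eight gold-standard genes, then P = 0.5, R = 0.25 and F1Score = 2 × 0.5 × 0.25 / 0.75 ≈ 0.333. Because the gold standard is much larger than a typical prediction set, Recall, and hence the F1Score, can stay low even when Precision is reasonable.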
The comparison result is shown in Figure 3 and the details are shown in Table 4. It can be seen that with the four data sets of BRCA, LUAD, LUSC, and KIRC samples, PNC outperforms the other methods and with HNSC, ActiveDriver has the best performance.
Table 4.
No. | Method | BRCA | LUAD | LUSC | KIRC | HNSC |
---|---|---|---|---|---|---|
1 | ActiveDriver | 0.062 | 0.035 | 0.046 | 0.054 | 0.080 |
2 | DawnRank | 0.045 | 0.043 | 0.040 | 0.040 | 0.043 |
3 | DriverML | 0.077 | 0.032 | 0.019 | 0.053 | 0.006 |
4 | DriverNet | NA | NA | 0.016 | 0.030 | NA |
5 | MutSigCV | 0.066 | 0.037 | 0.016 | 0.019 | 0.040 |
6 | OncodriveFM | 0.024 | 0.030 | 0.0101 | 0.016 | 0.046 |
7 | PNC | 0.178 | 0.174 | 0.182 | 0.188 | 0.115 |
8 | SCS | NA | 0.011 | 0.005 | 0.008 | NA |
Given the complexity of cancer, it is not practical to include all cancer types in the comparison. Based on the availability of data, we have chosen five datasets so that the comparison is large enough to reveal differences in the performance of the tools, not only between the tools themselves but also between cancer types. For example, ActiveDriver outperforms the others in HNSC, whilst PNC is superior in the other four cancer types. A method applied to different cancer types is likely to yield different results, so regardless of the number of cancer types selected here, it remains incumbent on users to run the analyses they require.
Moreover, to see whether the methods detect similar cancer drivers, we compare the results of the five methods that identify cancer drivers at the population level (i.e. DriverML, ActiveDriver, DriverNet, MutSigCV, and OncodriveFM). Figure 4 shows the overlap between the validated cancer drivers discovered by each pair of methods for each of the five cancer types. There is little overlap among the results of the methods. For example, in breast cancer, only one cancer driver (TP53) is identified by all five methods, two cancer drivers (CDH1 and PIK3CA) are detected by four methods (DriverML, DriverNet, MutSigCV, and OncodriveFM), and eight cancer drivers (GATA3, NCOR1, PTEN, ARID1A, FOXA1, PIK3R1, CTCF, and ERBB2) are predicted by three methods (see the detailed overlap of the predicted driver genes in Section 3 of the Supplement). As the results of these methods are complementary, they should be used together to maximise the overall performance of cancer driver prediction. In addition, although the CGC is widely used to validate cancer drivers, it is incomplete: the database is constantly being updated as new cancer drivers come to light. Therefore, some predicted cancer drivers that cannot be validated with existing knowledge may be novel cancer drivers that merit wet-lab experiments to confirm their roles in cancer progression.
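The overlap analysis above, and the "predicted by at least two methods" selection used in the next subsection, reduce to simple set operations. The sketch below counts the genes shared by every pair of methods and lists the genes supported by at least a given number of methods. The method names and gene sets in the example are placeholders, not the study's actual predictions.

```python
from collections import Counter
from itertools import combinations


def pairwise_overlaps(predictions):
    """Number of shared genes for every pair of methods.

    predictions: dict mapping a method name to the genes it predicts.
    """
    return {(m1, m2): len(set(predictions[m1]) & set(predictions[m2]))
            for m1, m2 in combinations(sorted(predictions), 2)}


def consensus_genes(predictions, min_methods=2):
    """Genes predicted by at least `min_methods` methods, sorted by name."""
    support = Counter(g for genes in predictions.values() for g in set(genes))
    return sorted(g for g, count in support.items() if count >= min_methods)
```

With three placeholder methods predicting {A, B, C}, {B, C, D} and {C, E}, the pairwise overlaps are 2, 1 and 1, genes B and C are supported by at least two methods, and only C is supported by all three.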
3.2 Identified cancer drivers enriched significantly in GO biological processes and KEGG pathways
To examine the discovered cancer drivers in more detail, we take breast cancer as an example. Breast cancer is selected because its dataset has the largest number of samples among all the available cancer datasets. We combine all the breast cancer drivers predicted by the five population-level methods (DriverML, ActiveDriver, DriverNet, MutSigCV, and OncodriveFM), giving 509 cancer drivers in total. Among them, 63 drivers are predicted by at least two of the five methods (see the details of these 63 driver genes in Section 3 of the Supplement). We use Enrichr 56 to perform enrichment analysis on these 63 drivers. Table 5 and Table 6 show the GO biological processes and KEGG pathways in which these cancer drivers are significantly enriched (adjusted p-value less than 0.05). Among the 63 driver genes, 16 genes (25.4%) are enriched in 7 GO biological processes and 15 genes (23.8%) are enriched in 26 KEGG pathways related to breast cancer. This indicates that the predicted cancer drivers are closely associated with the biology of breast cancer and are biologically meaningful.
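Over-representation analysis of this kind is commonly based on a one-sided hypergeometric (Fisher's exact) test for each term, followed by a multiple-testing correction such as Benjamini-Hochberg. The sketch below shows that generic pipeline. It is an illustration under stated assumptions: Enrichr's exact scoring, background gene set and adjustment are not reproduced here, and the function names are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k, n_query, n_term, n_background):
    """P(X >= k) for a hypergeometric X: the chance that a random query of
    n_query genes drawn from n_background genes contains at least k of the
    n_term genes annotated to a term (one-sided Fisher's exact test)."""
    top = min(n_term, n_query)
    hits = sum(comb(n_term, i) * comb(n_background - n_term, n_query - i)
               for i in range(k, top + 1))
    return hits / comb(n_background, n_query)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, keeping the adjusted values monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

In this setting, `n_query` would be the number of candidate drivers (63 here), `n_term` the number of genes annotated to a GO process or KEGG pathway, and `n_background` the size of the chosen gene universe. Terms would then be retained when their adjusted p-value is below 0.05.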
Table 5.
Term | #Genes | p-value |
---|---|---|
GO:0045598 regulation of fat cell differentiation | 5 | 2.0e-03 |
GO:0045596 negative regulation of cell differentiation | 6 | 3.6e-03 |
GO:0045604 regulation of epidermal cell differentiation | 3 | 1.2e-02 |
GO:0042127 regulation of cell proliferation | 10 | 2.5e-02 |
GO:0045599 negative regulation of fat cell differentiation | 3 | 2.8e-02 |
GO:0045580 regulation of T cell differentiation | 3 | 2.9e-02 |
GO:2000736 regulation of stem cell differentiation | 4 | 3.1e-02 |
Table 6.
Term | #Genes | p-value |
---|---|---|
ErbB signaling pathway | 6 | 5.3e-06 |
Thyroid hormone signaling pathway | 6 | 2.8e-05 |
Sphingolipid signaling pathway | 6 | 3.1e-05 |
Neurotrophin signaling pathway | 6 | 3.0e-05 |
PI3K-Akt signaling pathway | 8 | 1.7e-04 |
AGE-RAGE signaling pathway in diabetic complications | 5 | 1.7e-04 |
HIF-1 signaling pathway | 5 | 1.7e-04 |
FoxO signaling pathway | 5 | 5.1e-04 |
Fc epsilon RI signaling pathway | 4 | 5.2e-04 |
Toll-like receptor signaling pathway | 4 | 2.2e-03 |
TNF signaling pathway | 4 | 2.7e-03 |
Relaxin signaling pathway | 4 | 4.6e-03 |
VEGF signaling pathway | 3 | 5.1e-03 |
Estrogen signaling pathway | 4 | 5.3e-03 |
mTOR signaling pathway | 4 | 7.3e-03 |
Prolactin signaling pathway | 3 | 7.4e-03 |
B cell receptor signaling pathway | 3 | 7.6e-03 |
p53 signaling pathway | 3 | 7.8e-03 |
MAPK signaling pathway | 5 | 1.2e-02 |
T cell receptor signaling pathway | 3 | 1.8e-02 |
Rap1 signaling pathway | 4 | 1.8e-02 |
C-type lectin receptor signaling pathway | 3 | 1.9e-02 |
AMPK signaling pathway | 3 | 2.6e-02 |
Apelin signaling pathway | 3 | 3.5e-02 |
Insulin signaling pathway | 3 | 3.4e-02 |
Phospholipase D signaling pathway | 3 | 4.1e-02 |
3.3 Identified cancer drivers are useful in predicting survival
Since the predicted cancer driver genes are likely to contribute to carcinogenesis, they could be used as biomarkers to classify tumours. To explore this idea, we use the predicted drivers to stratify breast cancer patients. Among the 63 predicted cancer drivers above, four genes, AKT1, PTEN, CDKN1B, and TP53, are enriched in both GO biological processes and KEGG pathways. For instance, AKT1 is enriched in two GO biological processes and 25 KEGG pathways, and PTEN is enriched in two GO biological processes and five KEGG pathways. We therefore use these four genes for this analysis. We obtain the BRCA gene expression data and clinical data from 57 and use Similarity Network Fusion (SNF) 58, 59, a popular method for discovering similarities among patients, to cluster the patients. SNF takes the expression of these four genes as input and outputs subtypes of cancer patients. We then analyse the survival outcomes of patients in the identified subtypes. Survival differs significantly among the subtypes (p-value = 0.0245), as shown in Figure 5. Furthermore, the clustering display shows the similarity of samples within each identified subtype, and the silhouette plot indicates a good clustering with a large average silhouette width (0.76).
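The silhouette width used to judge the clustering can be computed per sample as s(i) = (b - a) / max(a, b), where a is the mean distance from sample i to the other members of its own cluster and b is the smallest mean distance from i to the members of any other cluster; values near 1 indicate well-separated clusters. The sketch below implements this standard distance-based definition. It is an assumption that it matches the silhouette plot reported above, since SNF-based tools may compute silhouettes from the fused similarity matrix instead; the function name is hypothetical, and it does not reproduce the reported value of 0.76.

```python
def silhouette_widths(dist, labels):
    """Silhouette width of every sample from a distance matrix and cluster labels.

    Requires at least two clusters. Singleton clusters get a width of 0.0,
    a common convention.
    """
    n = len(labels)
    widths = []
    for i in range(n):
        own = [dist[i][j] for j in range(n) if j != i and labels[j] == labels[i]]
        if not own:
            widths.append(0.0)
            continue
        a = sum(own) / len(own)  # mean distance within the sample's own cluster
        b = min(  # smallest mean distance to another cluster
            sum(dist[i][j] for j in range(n) if labels[j] == c)
            / sum(1 for j in range(n) if labels[j] == c)
            for c in set(labels) if c != labels[i]
        )
        widths.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return widths
```

The average silhouette width is then the mean of the returned values; for two tight, well-separated groups of one-dimensional points such as {0, 1} and {10, 11}, every width is close to 0.9.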
4. Gaps and future directions
From the discussion above, we see that there is a wide range of computational methods for identifying cancer drivers from genomic data. In this paper, we categorise the methods into three groups: methods for identifying single cancer drivers (including mutation-based methods and network-based methods), methods for identifying cancer driver modules, and methods for identifying personalised cancer drivers. Although these methods have successfully detected various cancer drivers, there are still several gaps in research in the field.
Firstly, most current methods focus on coding mutations to identify coding cancer drivers, while non-coding cancer drivers have not been fully examined and the number of methods for identifying non-coding drivers is limited. However, non-coding cancer drivers are important because protein-coding regions account for only around two percent of the human genome. Most mutations lie in non-coding regions, and these mutations can regulate the expression of genes and drive cancer 60, 61. In addition to being few in number, the current methods for identifying non-coding cancer drivers focus mainly on non-coding mutations, i.e. on correlating mutations in non-coding elements with other factors such as survival 32. Nevertheless, non-coding RNAs can be cancer drivers even without mutations, because they can regulate other genes to drive cancer progression. It is therefore necessary to investigate non-coding RNAs both with and without mutations to detect non-coding cancer drivers.
Secondly, some methods have been developed to identify groups of cancer drivers 17, 42, but they mostly rely on mutations to detect mutated modules, called cancer driver modules. Since the mutation of a single member of such a module is sufficient to develop cancer, the identified drivers in a module may not actually work together to regulate their targets. However, there is evidence that some genes work in concert to regulate the expression of other genes and influence different biological processes, such as the cooperation of miRNAs in the epithelial-mesenchymal transition (EMT), the transformation of epithelial cells into mesenchymal cells 22, 62. In addition, in some biological processes the regulation by single genes might not have significant effects, and wet-lab studies have begun to investigate how groups of regulators act in biological processes. All of this highlights the importance of studying biological factors in groups, and computational methods that utilise a variety of data and techniques are needed to investigate groups of drivers.
Finally, although there are methods for detecting personalised cancer drivers 51-53, they still have some limitations. Some methods, such as DawnRank, use the gene network of the population to predict personalised cancer drivers. As a result, they may ignore information in the gene network that is specific to an individual patient and may report many false positives. Other methods, such as SCS and PNC, use personal genetic data to build personalised gene networks, but they need the genetic data of a sample pair (i.e. a cancer patient's tumour sample and its neighbour in the normal state). Such a normal sample is not always available, which limits the application of these methods. Furthermore, these methods only detect coding cancer drivers, while it is also necessary to identify non-coding cancer drivers, as discussed above. All of this indicates a strong need for novel computational methods for detecting personalised coding and non-coding cancer drivers.
5. Recommendation and conclusion
We have investigated a wide range of computational methods for identifying cancer drivers from genomic data. In addition, we have analysed the advantages and limitations of the surveyed methods and, on this basis, identified various opportunities for advancing research in the field. It is clear that research on computational approaches to cancer driver identification is still in its growth phase. Much more work needs to be done, and many opportunities exist in this area. Nevertheless, there are also challenges in advancing this research, since precisely identifying the biological factors that drive cancer is complicated. Future research needs to consider both coding and non-coding data to identify candidate cancer drivers. To improve the accuracy of novel computational methods, different types of data, such as gene expression, mutations, and clinical information, should be combined to detect cancer drivers.
We have also surveyed available resources that can be used in research on discovering cancer drivers. The existing resources are plentiful but fragmented. Thus, to utilise cancer data more effectively for research, policies that achieve better data sharing are required. In addition, another difficulty when developing computational methods for uncovering non-coding cancer drivers is validation, because most current databases cover coding cancer drivers and none exists for non-coding cancer drivers. Therefore, we make an urgent call for building databases of non-coding drivers, given their crucial role in the success of research in the field.
To evaluate the performance of current methods in detecting cancer drivers, and to provide an example of how to evaluate cancer driver discovery methods for researchers who would like to enter the field, we have conducted a comparative study. Its results show that each method uncovers different cancer drivers and that the overlaps between the results of the methods are small. Therefore, the methods are complementary, and they should be used together to maximise the effectiveness of cancer driver prediction. This also reflects the different approaches taken by the methods; to achieve significant results, novel methods should combine various resources and techniques in detecting cancer drivers.
In conclusion, although computational methods may never completely replace wet-lab experiments for validating biological findings, it is widely acknowledged that drivers predicted by computational methods can serve as candidates for further wet-lab experiments to confirm their roles in cancer development. While there are now numerous computational methods for discovering cancer drivers, various gaps and opportunities remain for advancing research in the field. However, due to the complexity of cancer initiation and development, identifying cancer drivers faces many challenges. Through this paper, we hope to help researchers who are interested in the field to establish a solid background and to motivate them to tackle the current challenges.
Supplementary Material
Acknowledgments
This research is supported by the ARC DECRA (No: 200100200) and the Australian Research Council Discovery Grant (No: DP170101306). It is also supported by the Australian Government Research Training Program (RTP) Scholarship and the Vice Chancellor & President's Scholarship offered by the University of South Australia.
Abbreviations
- AGCOH
Atlas of Genetics and Cytogenetics in Oncology and Haematology
- AGO-CLIP
Argonaute Crosslinking Immunoprecipitation
- BRCA
breast invasive carcinoma
- CCLE
Cancer Cell Line Encyclopedia
- CNA
copy number aberration
- CoMEt
Combinations of Mutually Exclusive Alterations
- CONEXIC
COpy Number and EXpression In Cancer
- DGIdb
Drug-Gene Interaction database
- GBM
glioblastoma multiforme
- GDC
Genomic Data Commons
- HNSC
head and neck squamous cell carcinoma
- iMCMC
identify Mutated Core Modules in Cancer
- indels
insertions and deletions
- KIRC
kidney renal clear cell carcinoma
- LUAD
lung adenocarcinoma
- LUSC
lung squamous cell carcinoma
- MEMo
Mutual Exclusivity Modules
- MDS
minimum driver node set
- miRNA
microRNA
- NCG
Network of Cancer Genes
- OV
ovarian carcinoma
- PPI
protein-protein interaction
- SNF
Similarity Network Fusion
- SNV
single-nucleotide variant
- SV
structural variant
- TF
transcription factor
- TieDIE
Tied Diffusion Through Interacting Events
References
- 1.Dimitrakopoulos CM, Beerenwinkel N. Computational approaches for the identification of cancer genes and pathways. Wiley Interdiscip Rev Syst Biol Med. 2017;9(1):e1364. doi: 10.1002/wsbm.1364. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 2.Linehan WM, Srinivasan R, Schmidt LS. The genetic basis of kidney cancer: a metabolic disease. Nat Rev Urol. 2010;7:277. doi: 10.1038/nrurol.2010.47. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 3.Stephens PJ, Tarpey PS, Davies H, Van Loo P, Greenman C, Wedge DC. et al. The landscape of cancer genes and mutational processes in breast cancer. Nature. 2012;486:400. doi: 10.1038/nature11017. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 4.Leiserson MD, Wu HT, Vandin F, Raphael BJ. CoMEt: a statistical approach to identify combinations of mutually exclusive alterations in cancer. Genome Biol. 2015;16(1):160. doi: 10.1186/s13059-015-0700-7. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 5.Vandin F. Computational methods for characterizing cancer mutational heterogeneity. Front Genet. 2017;8:83. doi: 10.3389/fgene.2017.00083. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 6.Tokheim CJ, Papadopoulos N, Kinzler KW, Vogelstein B, Karchin R. Evaluating the evaluation of cancer driver genes. Proc Natl Acad Sci U S A. 2016;113(50):14330–14335. doi: 10.1073/pnas.1616440113. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 7.Xu L, Wu W, Cheng G, Qian M, Hu K, Yin G. et al. Enhancement of proliferation and invasion of gastric cancer cell by KDM5C via decrease in p53 expression. Technol Cancer Res Treat. 2017;16(2):141–149. doi: 10.1177/1533034616629261. [DOI] [PMC free article] [PubMed] [Google Scholar] [Retracted]
- 8.Luigi F, Rita M, Mario A, Giulia R, Alessandro L, Pisanu ME. et al. miR-579-3p controls melanoma progression and resistance to target therapy. Proc Natl Acad Sci U S A. 2016;113(34):E5005. doi: 10.1073/pnas.1607753113. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 9. Gasparini P, Cascione L, Landi L, Carasi S, Lovat F, Tibaldi C. et al. microRNA classifiers are powerful diagnostic/prognostic tools in ALK-, EGFR-, and KRAS-driven lung cancers. Proc Natl Acad Sci U S A. 2015;112(48):14924. doi: 10.1073/pnas.1520329112.
- 10. Papaemmanuil E, Gerstung M, Bullinger L, Gaidzik VI, Paschka P, Roberts ND. et al. Genomic classification and prognosis in acute myeloid leukemia. N Engl J Med. 2016;374(23):2209–2221. doi: 10.1056/NEJMoa1516192.
- 11. Rassenti LZ, Balatti V, Ghia EM, Palamarchuk A, Tomasello L, Fadda P. et al. MicroRNA dysregulation to identify therapeutic target combinations for chronic lymphocytic leukemia. Proc Natl Acad Sci U S A. 2017;114(40):10731. doi: 10.1073/pnas.1708264114.
- 12. Pham VVH, Zhang J, Liu L, Truong B, Xu T, Nguyen TT. et al. Identifying miRNA-mRNA regulatory relationships in breast cancer with invariant causal prediction. BMC Bioinformatics. 2019;20(1):143. doi: 10.1186/s12859-019-2668-x.
- 13. Lawrence MS, Stojanov P, Polak P, Kryukov GV, Cibulskis K, Sivachenko A. et al. Mutational heterogeneity in cancer and the search for new cancer-associated genes. Nature. 2013;499(7457):214–218. doi: 10.1038/nature12213.
- 14. Gonzalez-Perez A, Lopez-Bigas N. Functional impact bias reveals cancer drivers. Nucleic Acids Res. 2012;40(21):e169. doi: 10.1093/nar/gks743.
- 15. Tamborero D, Gonzalez-Perez A, Lopez-Bigas N. OncodriveCLUST: exploiting the positional clustering of somatic mutations to identify cancer genes. Bioinformatics. 2013;29(18):2238–44. doi: 10.1093/bioinformatics/btt395.
- 16. Bashashati A, Haffari G, Ding J, Ha G, Lui K, Rosner J. et al. DriverNet: uncovering the impact of somatic driver mutations on transcriptional networks in cancer. Genome Biol. 2012;13(12):R124. doi: 10.1186/gb-2012-13-12-r124.
- 17. Ciriello G, Cerami E, Sander C, Schultz N. Mutual exclusivity analysis identifies oncogenic network modules. Genome Res. 2012;22(2):398–406. doi: 10.1101/gr.125567.111.
- 18. Pham VVH, Liu L, Bracken CP, Goodall GJ, Long Q, Li J. et al. CBNA: A control theory based method for identifying coding and non-coding cancer drivers. PLoS Comput Biol. 2019;15(12):e1007538. doi: 10.1371/journal.pcbi.1007538.
- 19. Nussinov R, Jang H, Tsai CJ, Cheng F. Review: Precision medicine and driver mutations: Computational methods, functional assays and conformational principles for interpreting cancer drivers. PLoS Comput Biol. 2019;15(3):e1006658. doi: 10.1371/journal.pcbi.1006658.
- 20. Kim YA, Madan S, Przytycka TM. WeSME: uncovering mutual exclusivity of cancer drivers and beyond. Bioinformatics. 2017;33(6):814–821. doi: 10.1093/bioinformatics/btw242.
- 21. Cerami E, Demir E, Schultz N, Taylor BS, Sander C. Automated network analysis identifies core pathways in glioblastoma. PLoS One. 2010;5(2):e8918. doi: 10.1371/journal.pone.0008918.
- 22. Cursons J, Pillman KA, Scheer K, Gregory P, Foroutan M, Zadeh S. Post-transcriptional control of EMT is coordinated through combinatorial targeting by multiple microRNAs. bioRxiv. 2017.
- 23. Mularoni L, Sabarinathan R, Deu-Pons J, Gonzalez-Perez A, López-Bigas N. OncodriveFML: a general framework to identify coding and non-coding regions with cancer driver mutations. Genome Biol. 2016;17(1):128. doi: 10.1186/s13059-016-0994-0.
- 24. Han Y, Yang J, Qian X, Cheng WC, Liu SH, Hua X. et al. DriverML: a machine learning algorithm for identifying driver genes in cancer sequencing studies. Nucleic Acids Res. 2019;47(8):e45. doi: 10.1093/nar/gkz096.
- 25. Reimand J, Bader GD. Systematic analysis of somatic mutations in phosphorylation signaling predicts novel cancer drivers. Mol Syst Biol. 2013;9:637. doi: 10.1038/msb.2012.68.
- 26. Zhao J, Cheng F, Wang Y, Arteaga CL, Zhao Z. Systematic prioritization of druggable mutations in 5000 genomes across 16 cancer types using a structural genomics-based approach. Mol Cell Proteomics. 2016;15(2):642–656. doi: 10.1074/mcp.M115.053199.
- 27. Song K, Li Q, Gao W, Lu S, Shen Q, Liu X. et al. AlloDriver: a method for the identification and analysis of cancer driver targets. Nucleic Acids Res. 2019;47(W1):W315–W321. doi: 10.1093/nar/gkz350.
- 28. Gonzalez-Perez A, Perez-Llamas C, Deu-Pons J, Tamborero D, Schroeder MP, Jene-Sanz A. et al. IntOGen-mutations identifies cancer drivers across tumor types. Nat Methods. 2013;10:1081. doi: 10.1038/nmeth.2642.
- 29. Wendl MC, Wallis JW, Lin L, Kandoth C, Mardis ER, Wilson RK. et al. PathScan: a tool for discerning mutational significance in groups of putative cancer genes. Bioinformatics. 2011;27(12):1595–602. doi: 10.1093/bioinformatics/btr193.
- 30. Sakoparnig T, Fried P, Beerenwinkel N. Identification of constrained cancer driver genes based on mutation timing. PLoS Comput Biol. 2015;11(1):e1004027. doi: 10.1371/journal.pcbi.1004027.
- 31. Akavia UD, Litvin O, Kim J, Sanchez-Garcia F, Kotliar D, Causton HC. et al. An integrated approach to uncover drivers of cancer. Cell. 2010;143(6):1005–1017. doi: 10.1016/j.cell.2010.11.013.
- 32. Hornshoj H, Nielsen MM, Sinnott-Armstrong NA, Switnicki M, Juul M, Madsen T. et al. Pan-cancer screen for mutations in non-coding elements with conservation and cancer specificity reveals correlations with expression and survival. NPJ Genom Med. 2018;3(1):1. doi: 10.1038/s41525-017-0040-5.
- 33. Nussinov R, Tsai CJ, Jang H. Why are some driver mutations rare? Trends Pharmacol Sci. 2019;40(12):919–929. doi: 10.1016/j.tips.2019.10.003.
- 34. Nussinov R, Tsai CJ, Jang H. Autoinhibition can identify rare driver mutations and advise pharmacology. The FASEB Journal. 2020;34(1):16–29. doi: 10.1096/fj.201901341R.
- 35. Nussinov R, Jang H, Tsai CJ, Cheng F. Precision medicine review: rare driver mutations and their biophysical classification. Biophys Rev. 2019;11(1):5–19. doi: 10.1007/s12551-018-0496-2.
- 36. Niu B, Scott AD, Sengupta S, Bailey MH, Batra P, Ning J. et al. Protein-structure-guided discovery of functional mutations across 19 cancer types. Nat Genet. 2016;48(8):827–37. doi: 10.1038/ng.3586.
- 37. Gao J, Chang MT, Johnsen HC, Gao SP, Sylvester BE, Sumer SO. et al. 3D clusters of somatic mutations in cancer reveal numerous rare mutations as functional targets. Genome Med. 2017;9(1):4. doi: 10.1186/s13073-016-0393-x.
- 38. Vinayagam A, Gibson TE, Lee HJ, Yilmazel B, Roesel C, Hu Y. et al. Controllability analysis of the directed human protein interaction network identifies disease genes and drug targets. Proc Natl Acad Sci U S A. 2016;113(18):4976. doi: 10.1073/pnas.1603992113.
- 39. Vinayagam A, Stelzl U, Foulle R, Plassmann S, Zenkner M, Timm J. et al. A directed protein interaction network for investigating intracellular signal transduction. Sci Signal. 2011;4(189):rs8. doi: 10.1126/scisignal.2001699.
- 40. Chou CH, Chang NW, Shrestha S, Hsu SD, Lin YL, Lee WH. et al. miRTarBase 2016: updates to the experimentally validated miRNA-target interactions database. Nucleic Acids Res. 2016;44(D1):D239–D247. doi: 10.1093/nar/gkv1258.
- 41. Wang J, Lu M, Qiu C, Cui Q. TransmiR: a transcription factor-microRNA regulation database. Nucleic Acids Res. 2010;38(Database issue):D119–22. doi: 10.1093/nar/gkp803.
- 42. Zhang J, Zhang S, Wang Y, Zhang XS. Identification of mutated core cancer modules by integrating somatic mutation, copy number variation, and gene expression data. BMC Syst Biol. 2013;7(Suppl 2):S4. doi: 10.1186/1752-0509-7-S2-S4.
- 43. Paull EO, Carlin DE, Niepel M, Sorger PK, Haussler D, Stuart JM. Discovering causal pathways linking genomic events to transcriptional states using tied diffusion through interacting Events (TieDIE). Bioinformatics. 2013;29(21):2757–64. doi: 10.1093/bioinformatics/btt471.
- 44. Tian L, Li Y, Edmonson MN, Zhou X, Newman S, McLeod C. et al. CICERO: a versatile method for detecting complex and diverse driver fusions using cancer RNA sequencing data. Genome Biol. 2020;21(1):126. doi: 10.1186/s13059-020-02043-x.
- 45. Hamilton MP, Rajapakshe K, Hartig SM, Reva B, McLellan MD, Kandoth C. et al. Identification of a pan-cancer oncogenic microRNA superfamily anchored by a central core seed motif. Nat Commun. 2013;4:2730. doi: 10.1038/ncomms3730.
- 46. Chi SW, Zang JB, Mele A, Darnell RB. Ago HITS-CLIP decodes miRNA-mRNA interaction maps. Nature. 2009;460(7254):479–486. doi: 10.1038/nature08170.
- 47. Hafner M, Landthaler M, Burger L, Khorshid M, Hausser J, Berninger P. et al. Transcriptome-wide identification of RNA-binding protein and microRNA target sites by PAR-CLIP. Cell. 2010;141(1):129–141. doi: 10.1016/j.cell.2010.03.009.
- 48. Hafner M, Lianoglou S, Tuschl T, Betel D. Genome-wide identification of miRNA targets by PAR-CLIP. Methods. 2012;58(2):94–105. doi: 10.1016/j.ymeth.2012.08.006.
- 49. Pham VVH, Liu L, Bracken CP, Goodall GJ, Li J, Le TD. DriverGroup: a novel method for identifying driver gene groups. Bioinformatics. 2020;36(Supplement 2):i583–i591. doi: 10.1093/bioinformatics/btaa797.
- 50. Pham VVH, Liu L, Bracken CP, Nguyen T, Goodall GJ, Li J. pDriver: A novel method for unravelling personalised coding and miRNA cancer drivers. bioRxiv. 2020:2020.04.23.058727.
- 51. Hou JP, Ma J. DawnRank: discovering personalized driver genes in cancer. Genome Med. 2014;6(7):56. doi: 10.1186/s13073-014-0056-8.
- 52. Guo WF, Zhang SW, Liu LL, Liu F, Shi QQ, Zhang L. et al. Discovering personalized driver mutation profiles of single samples in cancer by network control strategy. Bioinformatics. 2018;34(11):1893–1903. doi: 10.1093/bioinformatics/bty006.
- 53. Guo WF, Zhang SW, Zeng T, Li Y, Gao J, Chen L. A novel network control model for identifying personalized driver genes in cancer. PLoS Comput Biol. 2019;15(11):e1007520. doi: 10.1371/journal.pcbi.1007520.
- 54. Brin S, Page L. The anatomy of a large-scale hypertextual web search engine. Comput Netw ISDN Syst. 1998;30(1–7):107–117.
- 55. Page L, Brin S, Motwani R, Winograd T. The PageRank citation ranking: Bringing order to the web; 1998.
- 56. Kuleshov MV, Jones MR, Rouillard AD, Fernandez NF, Duan Q, Wang Z. et al. Enrichr: a comprehensive gene set enrichment analysis web server 2016 update. Nucleic Acids Res. 2016;44(W1):W90–7. doi: 10.1093/nar/gkw377.
- 57. Zhang J, Pham V, Liu L, Xu T, Truong B, Li J. et al. Identifying miRNA synergism using multiple-intervention causal inference. BMC Bioinformatics. 2019;20(23):613. doi: 10.1186/s12859-019-3215-5.
- 58. Wang B, Mezlini AM, Demir F, Fiume M, Tu Z, Brudno M. et al. Similarity network fusion for aggregating data types on a genomic scale. Nat Methods. 2014;11(3):333–337. doi: 10.1038/nmeth.2810.
- 59. Xu T, Le TD, Liu L, Su N, Wang R, Sun B. et al. CancerSubtypes: an R/Bioconductor package for molecular cancer subtype identification, validation and visualization. Bioinformatics. 2017;33(19):3131–3133. doi: 10.1093/bioinformatics/btx378.
- 60. Puente XS, Bea S, Valdes-Mas R, Villamor N, Gutierrez-Abril J, Martin-Subero JI. et al. Non-coding recurrent mutations in chronic lymphocytic leukaemia. Nature. 2015;526:519. doi: 10.1038/nature14666.
- 61. Weinhold N, Jacobsen A, Schultz N, Sander C, Lee W. Genome-wide analysis of non-coding regulatory mutations in cancer. Nat Genet. 2014;46(11):1160–1165. doi: 10.1038/ng.3101.
- 62. Lamouille S, Xu J, Derynck R. Molecular mechanisms of epithelial-mesenchymal transition. Nat Rev Mol Cell Biol. 2014;15(3):178–196. doi: 10.1038/nrm3758.