Adaptation of a mutual exclusivity framework to identify driver mutations within oncogenic pathways

Xinjun Wang; Caroline Kostrzewa; Allison Reiner; Ronglai Shen; Colin Begg

doi:10.1016/j.ajhg.2023.12.009

2024 Jan 16;111(2):227–241. doi: 10.1016/j.ajhg.2023.12.009

PMCID: PMC10870134 PMID: 38232729

Summary

Distinguishing genomic alterations in cancer-associated genes that have functional impact on tumor growth and disease progression from the ones that are passengers and confer no fitness advantage has important clinical implications. Evidence-based methods for nominating drivers are limited by existing knowledge on the oncogenic effects and therapeutic benefits of specific variants from clinical trials or experimental settings. As clinical sequencing becomes a mainstay of patient care, applying computational methods to mine the rapidly growing clinical genomic data holds promise in uncovering functional candidates beyond the existing knowledge base and expanding the patient population that could potentially benefit from genetically targeted therapies. We propose a statistical and computational method (MAGPIE) that builds on a likelihood approach leveraging the mutual exclusivity pattern within an oncogenic pathway for identifying probabilistically both the specific genes within a pathway and the individual mutations within such genes that are truly the drivers. Alterations in a cancer-associated gene are assumed to be a mixture of driver and passenger mutations, with the passenger rates modeled in relationship to tumor mutational burden. We use simulations to study the operating characteristics of the method and assess false-positive and false-negative rates in driver nomination. When applied to a large study of primary melanomas, the method accurately identifies the known driver genes within the RTK-RAS pathway and nominates several rare variants as prime candidates for functional validation. A comprehensive evaluation of MAGPIE against existing tools has also been conducted leveraging the Cancer Genome Atlas data.

Keywords: mutual exclusivity, driver mutations, oncogenic pathway, RTK-RAS

Cancer is known to arise after a cell experiences multiple driver mutations that allow it to grow uncontrollably and ultimately metastasize to distant anatomic sites. In this paper, we introduce a computational approach for identifying driver mutations by leveraging the mutual exclusivity of genes and variants within oncogenic pathways.

Introduction

It is now well known that cancer is a genetic disease that develops through the accumulation of somatic mutations. When individual tumors are subjected to mutation analysis, countless mutations are identified. A major challenge is to identify genes that carry driver mutations, the ones that are pivotal in producing uncontrolled tumor growth. Such genes are known as driver genes. A number of computational methods and tools have been developed, falling within several overarching categories. MutSigCV,¹ DriverML,² ActiveDriver,³ and OncodriveFML⁴ are frequency-based approaches that target genes exhibiting mutation rates surpassing anticipated levels; OncodriveCLUST⁵ and MSEA⁶ operate as hotspot-based methods, excelling in the detection of gain-of-function mutations within specific protein domains; DawnRank⁷ and DriverNet⁸ are network-based methods aiming to uncover clusters of driver genes through leveraging prior knowledge of pathways, proteins, or genetic interactions.

Another major technical concept that has influenced research in this field is mutual exclusivity. If somatic mutations in two (or more) genes tend not to occur together in the same tumor, then this is evidence that the disruption of the genes involved leads to similar effects. Presence of such mutual exclusivity is strong evidence that the genes are cancer-associated genes.⁹ Important findings on mutual exclusivity through existing large-scale sequencing studies include EGFR and KRAS in lung adenocarcinoma (LUAD)¹⁰,¹¹,¹² and the RTK-RAS pathway primarily involving BRAF, NRAS, and NF1 in melanoma.¹³,¹⁴ Studies of this phenomenon usually involve searching for evidence of mutual exclusivity of genes in a “pathway” of genes that are believed to possess related effects. A number of authors have studied this problem from a statistical perspective, developing several techniques. The methods called Dendrix¹⁵ and Mutex¹⁶ define criteria or score metrics based on searching for the mutually exclusive gene sets using a greedy approach. MEMo⁹ and gcMECM¹⁷ are based on a search for mutually exclusive genes using graph or network-based approaches. WeSME¹⁸ and FaME¹⁹ employ computational-oriented methods that can scale up to genome-wide analysis. CoMEt²⁰ and WExT²¹ employ a permutation-based test for mutual exclusivity. A method by Szczurek et al.,²² MEGSA,²³ DISCOVER,²⁴ TiMEx,²⁵ and MEScan²⁶ utilize probabilistic model-based tests to assess the significance of mutual exclusivity in a given gene set.

Most of these methods focus on de novo search of gene sets (from the roughly 22,000 genes in the human genome) that display mutual exclusivity. In this article, we turn our attention to leveraging the mathematical property of mutual exclusivity for identifying probabilistically both the specific genes within a pathway and the individual mutations within such genes that are truly the drivers. The evaluation of mutual exclusivity occurs within pre-defined pathways of genes, i.e., collections of genes that have been shown in previous research to share biological functions.²⁷ A major assumption is that, within any given pathway, only one of the mutations observed can be the “driver” for that tumor, though there could be additional drivers in other pathways.

We build our method around the likelihood function developed by Hua et al.²³ In their model, a mutation in a given gene can represent either a driver or a passenger mutation. Passenger mutations in this context represent random, non-consequential background somatic mutations that are a result of the genetic instability commonplace in tumor cells. We limit attention to non-synonymous mutations, except for the calculation of tumor mutational burden (TMB), a confounding factor. The global patterns of mutual exclusivity in the data allow the model to identify the proportion of tumors that possess a driver in the pathway based on the assumption that the driver mutations are completely mutually exclusive, i.e., a tumor can contain at most 1 driver in the pathway under investigation. A test of the hypothesis that this proportion is zero thus represents a test of mutual exclusivity of the set of genes under consideration. By applying this repeatedly to different subsets of genes in the pathway, one can find the most significant subset, and thus conclude that this subset of genes consists of the drivers. Hua et al. made the assumption that the relative proportions of drivers versus passengers in each gene have a common proportionality.²³ In our approach, we relax this assumption, allowing us to estimate the proportions of driver mutations for each gene and then, through Bayes’ rule, to determine probabilistically which mutations in each tumor are drivers and which ones are passengers. By mapping all of these probabilities, we can determine for which genes drivers predominate. Also, these individual probabilities allow us to shed light on which types of mutation within any given gene show up as drivers frequently. These results are driven fundamentally by the global empirical patterns of mutual exclusivity in the dataset.

In summary, we show that our method goes beyond using statistical tests for mutual exclusivity to create a framework for inferring probabilistically which genes have the strongest evidence as drivers and which mutations within these genes are specifically identified as drivers. We show through detailed analysis of the RTK-RAS pathway in a large sample of melanomas how the method confirms the prominence of a small number of recurring mutations in well-known genes in this pathway and identifies some rare individual mutations that appear to be important. We benchmark our method against seven existing tools for driver-gene nomination, leveraging the Cancer Genome Atlas (TCGA) data from 11 cancer sites for a comprehensive evaluation under real-world scenarios. The method has been implemented in a Python package named MAGPIE (mutual exclusivity analysis of cancer-associated genes and variants and their probability of being driver) and is available on GitHub at https://github.com/tarot0410/MAGPIE.

Material and methods

Our data framework involves a set of $N$ tumors, and the analysis is restricted to a set of $M$ genes in the pathway under consideration. That is, the analytic framework is pathway specific. The data could include sequencing of genes in other pathways, but analyses of these other pathways would be conducted independently. That is, any given tumor may have multiple driver genes, but a key assumption is that there can only be one driver mutation in the pathway under consideration.

MEGSA framework

We initially construct our strategy on the MEGSA (mutually exclusive gene set analysis) likelihood-based analysis introduced by Hua et al.²³ Let $x_{i} = (x_{i1}, \dots, x_{iM})$ denote the observed binary mutation status of tumor $i$ ($i = 1, \dots, N$), where $x_{ij} = 1$ if a non-synonymous alteration is observed in the $j^{\text{th}}$ gene ($j = 1, \dots, M$) and 0 otherwise. Any observed mutation must be either a driver mutation or a passenger mutation. Only one driver mutation from the pathway is possible in a given tumor. As a result, all driver mutations observed must be mutually exclusive, i.e., no two drivers can occur in a given tumor. We define $γ$ ($0 \leq γ \leq 1$) as the proportion of tumors in the study cohort that have a driver mutation in the pathway under investigation. Let $p_{j}$ denote the relative frequency of such tumors that possess a driver mutation in the $j^{\text{th}}$ gene in the pathway, with $\sum_{j=1}^{M} p_{j} = 1$ and $0 \leq p_{j} \leq 1$. Independent of driver mutations, each gene has a constant passenger mutation rate, denoted by $π_{j}$ for the $j^{\text{th}}$ gene. The log likelihood of the observed data is

$$\log L(γ, π; X) = \sum_{i=1}^{N} \log \left\{ (1 - γ) \prod_{j=1}^{M} π_{j}^{x_{ij}} (1 - π_{j})^{1 - x_{ij}} + γ \sum_{j=1}^{M} p_{j} I_{\{x_{ij} = 1\}} \prod_{k \neq j} π_{k}^{x_{ik}} (1 - π_{k})^{1 - x_{ik}} \right\}.$$

(Equation 1)

For the purpose of developing a statistical test of mutual exclusivity, Hua et al. made the assumption that the relative frequencies of driver mutations in each gene are proportional to the passenger mutation rates, i.e., $p_{j} \propto π_{j}$. Thus, the log likelihood is reduced to

$$\log L(γ, π; X) = \sum_{i=1}^{N} \log \left\{ (1 - γ) \prod_{j=1}^{M} π_{j}^{x_{ij}} (1 - π_{j})^{1 - x_{ij}} + γ \frac{1}{\sum_{k=1}^{M} π_{k}} \sum_{j=1}^{M} π_{j} I_{\{x_{ij} = 1\}} \prod_{k \neq j} π_{k}^{x_{ik}} (1 - π_{k})^{1 - x_{ik}} \right\}.$$

(Equation 2)

To test the fundamental null hypothesis that there is no mutual exclusivity in the pathway, a likelihood ratio test can be employed. This is, in effect, a test of the hypothesis $H_{0} : γ = 0$ versus the alternative ($H_{1}$) that $γ > 0$, where the test statistic has a null distribution of $0.5 χ_{0}^{2} + 0.5 χ_{1}^{2}$.²³
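As a minimal sketch of how a p value follows from this mixture null distribution: $χ_{0}^{2}$ is a point mass at zero, so for a positive statistic only the $χ_{1}^{2}$ half contributes to the tail. The function name below is hypothetical and is not taken from MEGSA or MAGPIE.

```python
import math

def mixture_chi2_pvalue(lr):
    """Tail probability of lr under 0.5*chi2_0 + 0.5*chi2_1.

    chi2_0 is a point mass at 0, so it never exceeds a positive lr.
    For chi2_1, P(chi2_1 >= x) = P(|Z| >= sqrt(x)) = erfc(sqrt(x / 2)).
    """
    if lr <= 0.0:
        return 1.0
    return 0.5 * math.erfc(math.sqrt(lr / 2.0))

# A statistic at the usual 5% chi2_1 cutoff gives half that tail area.
print(mixture_chi2_pvalue(3.841458820694124))
```

The halving reflects that $γ = 0$ lies on the boundary of the parameter space, so the test is one-sided.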

Proposed approach

MEGSA is a powerful tool to quantify the overall mutual exclusivity in the pathway or to select a subset of genes that reflect mutual exclusivity most strongly. Although the method we propose in this article is adapted from the MEGSA framework, the problem it solves is fundamentally different. Our method is designed to identify specifically which tumors possess a driver mutation and to identify the driver if more than one mutation in the pathway is present. It is further able to identify which variants within a given gene have the capacity to be driver variants.

We follow the model proposed in Equation 1. However, unlike Hua et al., we do not assume $p_{j} \propto π_{j}$, a critical assumption in the MEGSA approach. We further reformulate the likelihood into a mixture model framework. Assume that the gene membership of the driver mutation for tumor $i$ in the study cohort is denoted by $z_{i} = (z_{i0}, z_{i1}, \dots, z_{iM})$. Tumors with $z_{ik} = 1$, $k > 0$, have driver mutations in the $k^{\text{th}}$ gene. Tumors with $z_{i0} = 1$ do not possess a driver mutation. We emphasize that $z_{i}$ is unobserved and must be inferred. Let $τ = (τ_{0}, τ_{1}, \dots, τ_{M})$ denote the vector of proportions of tumors having each gene-specific driver mutation, i.e., $τ_{k} = p(z_{ik} = 1)$. Note that in this new notation $τ_{k}$ represents the proportion of all tumors in the cohort with drivers in the $k^{\text{th}}$ gene (or with no driver in the case of $k = 0$), while in the earlier notation $p_{j}$ represents the corresponding relative frequency of a driver in the $j^{\text{th}}$ gene among tumors that have drivers in the pathway. Following a standard mixture model framework, the log likelihood of the observed data is

$$\log L(τ, π; X) = \sum_{i=1}^{N} \log \left\{ \sum_{k=0}^{M} τ_{k} f_{k}(x_{i} | π) \right\}$$

(Equation 3)

where $$f_{k}(x_{i} | π) = \begin{cases} \prod_{j=1}^{M} π_{j}^{x_{ij}} (1 - π_{j})^{1 - x_{ij}}, & k = 0 \\ \frac{x_{ik}}{π_{k}^{x_{ik}} (1 - π_{k})^{1 - x_{ik}}} \prod_{j=1}^{M} π_{j}^{x_{ij}} (1 - π_{j})^{1 - x_{ij}}, & k > 0 \end{cases}$$ is the cluster-specific probability density function of $x_{i}$.

Equations 3 and 1 are in fact equivalent. To be specific, $τ_{0} = 1 - γ$ and $τ_{k} = γ p_{k}$ for $k > 0$. As before, $γ = 1 - τ_{0}$ quantifies the overall influence of a pathway, i.e., the proportion of tumors with a driver mutation in the pathway, while $τ_{k}$, $k > 0$, quantifies the relative frequency with which gene $k$ is the driver. One of the advantages of using a mixture model framework is that the $τ_{k}$'s are defined both under the null hypothesis of no driver mutations in the cohort (i.e., $τ_{0} = 1$ or $γ = 0$) and under the alternative (i.e., $τ_{0} < 1$ or $γ > 0$: there exists evidence of a mutually exclusive pattern), while for the MEGSA model the $p_{j}$'s are undefined under the null hypothesis.
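The equivalence can be checked numerically. The sketch below evaluates both log likelihoods on a small made-up dataset, using the mapping $τ_{0} = 1 - γ$ and $τ_{k} = γ p_{k}$. All function names and numeric values are illustrative assumptions, not part of the MAGPIE package.

```python
import math

def passenger_prob(x, pi):
    """Probability of mutation vector x if every mutation is a passenger."""
    prob = 1.0
    for x_j, pi_j in zip(x, pi):
        prob *= pi_j if x_j == 1 else 1.0 - pi_j
    return prob

def loglik_eq1(X, gamma, p, pi):
    """Equation 1: driver proportion gamma and conditional gene shares p."""
    total = 0.0
    for x in X:
        base = passenger_prob(x, pi)
        driver_term = 0.0
        for j, x_j in enumerate(x):
            if x_j == 1:
                # Gene j is the driver: remove its passenger factor pi_j.
                driver_term += p[j] * base / pi[j]
        total += math.log((1.0 - gamma) * base + gamma * driver_term)
    return total

def f_k(x, k, pi):
    """Component density of Equation 3; k = 0 means no driver."""
    base = passenger_prob(x, pi)
    if k == 0:
        return base
    return base / pi[k - 1] if x[k - 1] == 1 else 0.0

def loglik_eq3(X, tau, pi):
    """Equation 3: mixture over the no-driver class and M driver genes."""
    return sum(
        math.log(sum(tau[k] * f_k(x, k, pi) for k in range(len(tau))))
        for x in X
    )

# Hypothetical toy data: 5 tumors, 3 genes.
X = [(1, 0, 0), (0, 1, 0), (1, 1, 0), (0, 0, 0), (0, 0, 1)]
gamma, p, pi = 0.6, [0.5, 0.3, 0.2], [0.1, 0.2, 0.05]
tau = [1.0 - gamma] + [gamma * p_j for p_j in p]

print(loglik_eq1(X, gamma, p, pi), loglik_eq3(X, tau, pi))
```

The two printed values agree up to floating-point rounding, because each term $γ p_{j}$ in Equation 1 is exactly $τ_{j}$ in Equation 3.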

Parameters $\{τ_{k}\}$ and $\{π_{j}\}$ can be estimated using the expectation-maximization (EM) algorithm.²⁸ The complete data log likelihood is

$$\log L(τ, π, z | X) = \sum_{i=1}^{N} \sum_{k=0}^{M} z_{ik} \log \left\{ τ_{k} f_{k}(x_{i} | π) \right\}.$$

(Equation 4)

In the E step, we compute the posterior probability $w_{ik} = p(z_{ik} = 1 | x_{i})$ at the $t^{\text{th}}$ iteration:

$$w_{ik}^{(t)} = \frac{\hat{τ}_{k}^{(t)} f_{k}(x_{i} | \hat{π}^{(t)})}{\sum_{s=0}^{M} \hat{τ}_{s}^{(t)} f_{s}(x_{i} | \hat{π}^{(t)})}.$$

In the M step, Equation 4 is maximized in terms of $\{τ_{k}\}$ and $\{π_{j}\}$ with $w_{ik}$ fixed at $w_{ik}^{(t)}$:

$$\hat{τ}_{k}^{(t+1)} = \frac{W_{k}^{(t)}}{N} \quad \text{and} \quad \hat{π}_{j}^{(t+1)} = \frac{\sum_{i=1}^{N} x_{ij} (1 - w_{ij}^{(t)})}{N - W_{j}^{(t)}}$$

where $W_{k}^{(t)} = \sum_{i=1}^{N} w_{ik}^{(t)}$.

In general, given initial estimates $\{\hat{τ}_{k}^{(0)}\}$ and $\{\hat{π}_{j}^{(0)}\}$, the EM algorithm then iterates between the E step and the M step until the estimates converge.
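The E and M steps above can be sketched in plain Python as follows. This is an illustrative implementation under stated assumptions rather than the MAGPIE code: the starting values, the convergence rule, and the clipping of passenger rates away from 0 and 1 (the `eps` argument) are all choices made here for the example.

```python
def em_fit(X, max_iter=1000, tol=1e-10, eps=1e-8):
    """EM for the mixture in Equation 3. X is a list of 0/1 tuples (tumors x genes)."""
    n, m = len(X), len(X[0])
    # Assumed starting values: half the tumors carry a driver, spread evenly.
    tau = [0.5] + [0.5 / m] * m
    pi = [min(max(0.5 * sum(x[j] for x in X) / n, eps), 1 - eps) for j in range(m)]
    w = []
    for _ in range(max_iter):
        # E step: posterior weights w[i][k], k = 0 (no driver) or 1..m (driver gene).
        w = []
        for x in X:
            base = 1.0
            for j in range(m):
                base *= pi[j] if x[j] else 1.0 - pi[j]
            comp = [tau[0] * base]
            for k in range(m):
                comp.append(tau[k + 1] * base / pi[k] if x[k] else 0.0)
            s = sum(comp)
            w.append([c / s for c in comp])
        # M step: closed-form updates for tau and pi.
        w_sum = [sum(row[k] for row in w) for k in range(m + 1)]
        new_tau = [w_sum[k] / n for k in range(m + 1)]
        new_pi = []
        for j in range(m):
            num = sum(x[j] * (1.0 - row[j + 1]) for x, row in zip(X, w))
            den = n - w_sum[j + 1]
            rate = num / den if den > 0 else eps
            new_pi.append(min(max(rate, eps), 1 - eps))  # keep rates inside (0, 1)
        change = max(abs(a - b) for a, b in zip(new_tau + new_pi, tau + pi))
        tau, pi = new_tau, new_pi
        if change < tol:
            break
    # w holds the posterior weights from the last E step.
    return tau, pi, w

# Hypothetical toy cohort: 40 tumors, 3 genes.
X = ([(1, 0, 0)] * 15 + [(0, 1, 0)] * 10 + [(0, 0, 1)] * 3
     + [(1, 0, 1)] * 2 + [(0, 0, 0)] * 10)
tau_hat, pi_hat, w_hat = em_fit(X)
print([round(t, 3) for t in tau_hat], [round(p, 3) for p in pi_hat])
```

Each tumor's weights sum to one, so the updated $\hat{τ}$ always sums to one as well.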

Adjustment for tumor mutational burden

Up to now, the model has been based on the assumption that the passenger mutation rates $\{π_{j}\}$ are constant across the set of tumors. In fact, this assumption is quite unrealistic since the overall TMB is known to vary widely across tumors and is, in many oncologic settings, an influential prognostic factor.²⁹ To address this important potential confounder, we extend the method to allow adjustment for the effect of mutational burden in the model. Let $y_{i}$ denote the TMB score for tumor $i$. We have elected to compute the raw TMB score for each tumor by counting the total number of observed mutations (including both synonymous and non-synonymous mutations) among all sequenced genes and use a centered log-scaled score as the input to the model. This represents the overall propensity for mutations to occur in a specific tumor. Due to this dependency, we now identify the passenger mutation rates using $\{π_{ij}\}$ rather than $\{π_{j}\}$. In our later example, mutational burden is represented by the total number of mutations observed across all genes that are genotyped, not just those in the pathway under investigation. Since $π_{ij}$ is bounded by 0 and 1, a natural approach to adjust for mutational burden is to use

$$\operatorname{logit}(π_{ij}) = β_{0j} + β_{1} y_{i},$$

(Equation 5)

where $β_{0j}$ represents the baseline log odds of the passenger mutation rate for the $j^{\text{th}}$ gene, and $β_{1}$ measures the common influence of mutational burden on the passenger mutation rate for all genes in the pathway. Since we are primarily interested in estimating the probabilities $\{τ_{k}\}$, which represent the relative frequencies with which each gene is the driver, $\{β_{0j}\}$ and $β_{1}$ are effectively nuisance parameters in the model. The conditional data density for $x_{i} \mid y_{i}$ is

$$p(x_{i} | y_{i}) = \sum_{k=0}^{M} p(x_{i}, z_{ik} = 1 | y_{i}) = \sum_{k=0}^{M} p(x_{i} | y_{i}, z_{ik} = 1) \, p(z_{ik} = 1 | y_{i}).$$

(Equation 6)

We assume $z_{i} ⊥ y_{i}$, so that $p(z_{ik} = 1 | y_{i}) = p(z_{ik} = 1) = τ_{k}$, and as a result Equation 6 is reduced to

$$p(x_{i} | y_{i}) = \sum_{k=0}^{M} τ_{k} \, p(x_{i} | y_{i}, z_{ik} = 1).$$

(Equation 7)

The log likelihood of the observed data adjusting for mutational burden is

$$\log L(τ, β_{0}, β_{1}; X, Y) = \sum_{i=1}^{N} \log \left\{ \sum_{k=0}^{M} τ_{k} g_{k}(x_{i} | y_{i}, z_{ik} = 1) \right\}$$

(Equation 8)

where

$$g_{k}(x_{i} | y_{i}, z_{ik} = 1, β_{0}, β_{1}) = \begin{cases} \prod_{j=1}^{M} π_{ij}^{x_{ij}} (1 - π_{ij})^{1 - x_{ij}}, & k = 0 \\ \frac{x_{ik}}{π_{ik}^{x_{ik}} (1 - π_{ik})^{1 - x_{ik}}} \prod_{j=1}^{M} π_{ij}^{x_{ij}} (1 - π_{ij})^{1 - x_{ij}}, & k > 0 \end{cases}$$

is the cluster-specific probability density function of $x_{i}$ conditional on $y_{i}$, and $π_{ij} = \frac{1}{1 + e^{-(β_{0j} + β_{1} y_{i})}}$. There are no analytical solutions for $\hat{β}_{0j}^{(t+1)}$ and $\hat{β}_{1}^{(t+1)}$ in the M step if using the EM algorithm. Thus, we use limited-memory BFGS (L-BFGS)³⁰ implemented in PyTorch to minimize the negative log likelihood function and estimate $\{τ_{k}\}$, $\{β_{0j}\}$, and $β_{1}$.
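To make the objective concrete, the following sketch evaluates the negative of the Equation 8 log likelihood for given parameter values. It only computes the function that an optimizer such as L-BFGS would minimize; it does not reproduce the PyTorch implementation, and the function names and the toy inputs are assumptions made for illustration.

```python
import math

def sigmoid(t):
    """Numerically stable inverse logit."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def neg_loglik_tmb(X, y, tau, beta0, beta1):
    """Negative log likelihood of Equation 8.

    X: list of 0/1 tuples, y: centered log TMB per tumor,
    tau: [tau_0, ..., tau_M], beta0: per-gene intercepts, beta1: shared slope.
    """
    total = 0.0
    for x, y_i in zip(X, y):
        # Tumor-specific passenger rates from Equation 5.
        pi_i = [sigmoid(b0 + beta1 * y_i) for b0 in beta0]
        base = 1.0
        for x_j, p_ij in zip(x, pi_i):
            base *= p_ij if x_j else 1.0 - p_ij
        mix = tau[0] * base  # k = 0: all observed mutations are passengers
        for k, x_k in enumerate(x):
            if x_k:
                # Gene k is the driver: drop its passenger factor.
                mix += tau[k + 1] * base / pi_i[k]
        total -= math.log(mix)
    return total

# One tumor, two genes, beta1 = 0 so both passenger rates are 0.5:
# mix = 0.5 * 0.25 + 0.3 * 0.25 / 0.5 = 0.275
print(neg_loglik_tmb([(1, 0)], [0.7], [0.5, 0.3, 0.2], [0.0, 0.0], 0.0))
```

In an actual fit, $τ$ would also need to stay on the probability simplex, for example through a softmax reparameterization; whether MAGPIE does this is not stated here.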

A statistical test to establish mutual exclusivity

Our proposed methodology seeks to identify drivers from a framework of observed mutual exclusivity. However, before performing such an analysis on a chosen pathway we propose first conducting a statistical test of the null hypothesis of no mutual exclusivity, i.e., a test of the hypothesis that $γ = 0$ , or equivalently, $τ = (τ_{0}, τ_{1}, \dots, τ_{M}) = (1,0, \dots, 0)$ . In their original development of the MEGSA model, Hua et al. derived an asymptotic likelihood ratio test. Their limiting distribution depends crucially on the assumption that the relative frequencies of driver mutations in each gene are proportional to the passenger mutation rates, an assumption we dropped as indicated earlier. Consequently, we propose to compute the empirical p value by using a parametric bootstrap approach.

Let $θ = (τ, β_{0}, β_{1})$ denote the parameters in our model. We introduce the following bootstrap estimator for the restricted (under null) and unrestricted settings, respectively.

$$\tilde{θ} = \underset{θ \in Θ_{H_{0}}}{\arg\max} \log L(θ) \quad \text{and} \quad \hat{θ} = \underset{θ \in Θ}{\arg\max} \log L(θ)$$

(Equation 9)

where $Θ = Θ_{τ} \times Θ_{β_{0}} \times Θ_{β_{1}}$ is the full parameter space and $Θ_{H_{0}} = \{θ \in Θ : τ = (1, 0, \dots, 0)\}$.

We propose the bootstrap likelihood ratio statistic

$$LR = -2 \left( \log L(\tilde{θ}) - \log L(\hat{θ}) \right)$$

(Equation 10)

as the test statistic for the null hypothesis.

To construct the test, we generate $B$ bootstrap samples (the algorithm is described below), denoted by $X^{(b)}$, $b = 1, 2, \dots, B$. Denoting by $LR^{*}$ the test statistic from the observed dataset and $LR^{(b)}$ its value from the $b^{\text{th}}$ bootstrap dataset, the empirical p value is

$$p = \frac{1 + \sum_{b=1}^{B} I_{\{LR^{(b)} \geq LR^{*}\}}}{1 + B}.$$

(Equation 11)
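Equation 11 is simple to compute once the bootstrap statistics are available. A minimal sketch, with a hypothetical function name and made-up statistics:

```python
def empirical_p_value(lr_observed, lr_bootstrap):
    """Equation 11: add-one empirical p value from bootstrap LR statistics."""
    exceed = sum(1 for lr in lr_bootstrap if lr >= lr_observed)
    return (1 + exceed) / (1 + len(lr_bootstrap))

# Made-up example: two of four bootstrap statistics reach the observed value.
print(empirical_p_value(5.0, [0.0, 1.2, 5.0, 7.3]))  # (1 + 2) / (1 + 4) = 0.6
```

Because of the added one in the numerator and denominator, the p value can never be exactly zero; its smallest attainable value is $1 / (1 + B)$.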

The following data generation algorithm is employed both to create the null distribution (when $γ = 0$) and to generate datasets under positive levels of mutual exclusivity for our later simulations of model properties. (1) Generate the latent gene membership $z_{i} = (z_{i0}, z_{i1}, \dots, z_{iM})$ of the driver mutation in each tumor $i$ (note that $z_{i0} = 1, \forall i$, when we are generating a reference distribution under the null hypothesis). Specifically, $z_{i} \overset{\text{i.i.d.}}{\sim} \text{Multinomial}(1, τ)$ where $τ = (τ_{0}, τ_{1}, \dots, τ_{M})$. Tumor $i$ has a driver mutation in the $k^{\text{th}}$ gene ($k = 1, \dots, M$) if $z_{ik} = 1$. Otherwise, it does not possess a driver mutation and $z_{i0} = 1$. (2) Generate the centered log-scale mutational burden ($y_{i}$) for each tumor $i$: $y_{i} \overset{\text{i.i.d.}}{\sim} N(0, σ)$. (3) Generate the individual mutations as $x_{ij} = 1$ if $z_{ij} = 1$ and $x_{ij} \sim \text{Binomial}(1, π_{ij})$ otherwise, where $π_{ij}$ is computed using Equation 5 with given $\{β_{0j}\}$, $β_{1}$, and $y_{i}$. For simulating data replicates, $σ$, $\{β_{0j}\}$, and $β_{1}$ are pre-specified. For generating bootstrap data samples, we set $β_{1} = \hat{β}_{1}$, the estimate of $β_{1}$ obtained by fitting our model to the observed data, and then solved for $β_{0j}$ using the following equation

$$q_{j} = \frac{1}{N} \sum_{i=1}^{N} π_{ij} = \frac{1}{N} \sum_{i=1}^{N} \frac{1}{1 + e^{-(β_{0j} + β_{1} y_{i})}}$$

(Equation 12)

where $q_{j}$ denotes the overall mutation rate for the $j^{\text{th}}$ gene in the observed data. Equation 12 allows the bootstrap samples to maintain the association between $π_{ij}$ and $y_{i}$ while keeping the overall mutation rate for each gene close to its observed value. There is no closed-form solution for $β_{0j}$ in Equation 12, so we solved for $β_{0j}$ numerically using Newton’s method.
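The data generation steps and the Newton solve for Equation 12 can be sketched as below. This is an illustration under assumptions, not the MAGPIE code: the function names, the starting point of the Newton iteration, and all numeric inputs are hypothetical, and $σ$ is interpreted here as a standard deviation.

```python
import math
import random

def sigmoid(t):
    """Numerically stable inverse logit."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def solve_beta0(q_j, beta1, y, tol=1e-12, max_iter=100):
    """Newton's method for beta0_j in Equation 12: mean_i pi_ij = q_j."""
    b = math.log(q_j / (1.0 - q_j))  # assumed start: logit of the target rate
    n = len(y)
    for _ in range(max_iter):
        probs = [sigmoid(b + beta1 * y_i) for y_i in y]
        g = sum(probs) / n - q_j                      # residual of Equation 12
        dg = sum(p * (1.0 - p) for p in probs) / n    # derivative w.r.t. b
        step = g / dg
        b -= step
        if abs(step) < tol:
            break
    return b

def simulate(n, tau, beta0, beta1, sigma, rng):
    """Steps (1)-(3): latent driver gene, log TMB, then mutation indicators."""
    m = len(beta0)
    X, Z, Y = [], [], []
    for _ in range(n):
        k = rng.choices(range(m + 1), weights=tau)[0]  # (1) 0 means no driver
        y_i = rng.gauss(0.0, sigma)                    # (2) sigma taken as an SD
        x = []
        for j in range(m):                             # (3) driver or passenger
            if k == j + 1:
                x.append(1)
            else:
                pi_ij = sigmoid(beta0[j] + beta1 * y_i)
                x.append(1 if rng.random() < pi_ij else 0)
        X.append(tuple(x))
        Z.append(k)
        Y.append(y_i)
    return X, Z, Y

rng = random.Random(0)
y = [rng.gauss(0.0, 1.0) for _ in range(200)]
b0 = solve_beta0(0.2, 1.0, y)  # match a hypothetical overall rate of 0.2
X_null, Z_null, _ = simulate(50, [1.0, 0.0, 0.0], [b0, -2.0], 1.0, 1.0, rng)
```

Setting $τ = (1, 0, \dots, 0)$, as in the last line, yields a dataset from the null model in which every observed mutation is a passenger.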

Identifying driver mutation for each tumor and specific variants within a gene

For every tumor, we seek to identify the driver mutation or to determine that there is no driver mutation in the pathway. This can be inferred probabilistically from the posterior probabilities computed through Bayes’ rule. In the absence of adjustment for mutational burden, the posterior probability that the mutation in the $k^{\text{th}}$ gene ($k > 0$) in the pathway is the driver is

$$w_{ik} = p(z_{ik} = 1 | x_{i}) = \frac{\hat{τ}_{k} f_{k}(x_{i} | \hat{π})}{\sum_{s=0}^{M} \hat{τ}_{s} f_{s}(x_{i} | \hat{π})}.$$

(Equation 13)

When $k = 0$, Equation 13 provides the probability that tumor $i$ does not have any driver mutation in the pathway. If a tumor is observed to have mutations in multiple genes, the most-likely driver mutation can be determined using

$$z_{i}^{*} = \arg\max_{k} w_{ik}.$$

(Equation 14)

Similarly, the posterior probability under the scenario of adjusting for mutational burden is

$$w_{ik} = p(z_{ik} = 1 | x_{i}, y_{i}) = \frac{\hat{τ}_{k} g_{k}(x_{i} | y_{i}, \hat{β}_{0}, \hat{β}_{1})}{\sum_{s=0}^{M} \hat{τ}_{s} g_{s}(x_{i} | y_{i}, \hat{β}_{0}, \hat{β}_{1})}.$$

(Equation 15)
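A short sketch of Equations 13 to 15 for a single tumor follows. Passing gene-level rates gives Equation 13; passing that tumor's TMB-adjusted rates $π_{ij}$ gives Equation 15. The function names and all parameter values are hypothetical and are not estimates from any dataset.

```python
def posterior(x, tau, pi):
    """Posterior weights over k = 0 (no driver) and k = 1..M (driver gene).

    pi holds passenger rates: gene-level rates for Equation 13, or the
    tumor-specific rates pi_ij for Equation 15.
    """
    base = 1.0
    for x_j, pi_j in zip(x, pi):
        base *= pi_j if x_j else 1.0 - pi_j
    comp = [tau[0] * base]
    for k, x_k in enumerate(x):
        # A gene can only be the driver if it is mutated in this tumor.
        comp.append(tau[k + 1] * base / pi[k] if x_k else 0.0)
    s = sum(comp)
    return [c / s for c in comp]

def most_likely_driver(w):
    """Equation 14: index of the largest posterior weight (0 = no driver)."""
    return max(range(len(w)), key=lambda k: w[k])

# Hypothetical tumor mutated in both genes of a two-gene pathway.
w = posterior((1, 1), tau=[0.4, 0.4, 0.2], pi=[0.1, 0.01])
print([round(v, 4) for v in w], most_likely_driver(w))
```

In this toy case the second gene is selected even though its $τ$ is smaller, because its passenger rate is much lower, so an observed mutation there is harder to explain as background.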

Further, we can gauge the relative influence of individual variants within genes as drivers by averaging these posterior probabilities across the tumors in which the specific variant was observed. Let $x_{ij(l)}$ denote the mutation status (1 = yes; 0 = no) of variant $l$ from gene $j$ in tumor $i$, where variant $l$ is nested within gene $j$. The observed mutation frequency for variant $l$ (in gene $j$) is

$$N_{j(l)} = \sum_{i=1}^{N} x_{ij(l)},$$

(Equation 16)

and the average posterior probability that variant $l$ is a driver is

$$P_{j(l)} = \frac{1}{N_{j(l)}} \sum_{i=1}^{N} x_{ij(l)} w_{ij},$$

(Equation 17)

a term we refer to as the “driver frequency.”
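Equations 16 and 17 amount to averaging the gene-level posterior over the carriers of a variant. A minimal sketch with made-up inputs (the function name is hypothetical):

```python
def variant_driver_frequency(carriers, w_gene):
    """Equations 16 and 17 for one variant l of gene j.

    carriers: 0/1 indicator x_ij(l) per tumor.
    w_gene:   posterior probability w_ij that gene j is the driver, per tumor.
    Returns (N_j(l), P_j(l)); P is None if no tumor carries the variant.
    """
    n_carriers = sum(carriers)
    if n_carriers == 0:
        return 0, None
    avg = sum(x * w for x, w in zip(carriers, w_gene)) / n_carriers
    return n_carriers, avg

# Made-up example: three carriers, one of which co-occurs with a likely
# driver in another gene and so has a low posterior for this gene.
n_l, p_l = variant_driver_frequency([1, 0, 1, 1, 0], [0.99, 0.50, 0.98, 0.03, 0.0])
print(n_l, round(p_l, 3))
```

A variant seen in only one tumor simply inherits that tumor's posterior probability, so values for rare variants rest on very little data.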

Application of MAGPIE: The InterMEL study

The InterMEL study involves genomic sequencing of primary tumors from individuals with stage IIA–IIIB melanomas. Data from the InterMEL study serve as an illustrative example for demonstrating the application of MAGPIE. The InterMEL study protocol was approved by the institutional review boards (ethics committees) at each participating institution, material and data user agreements are in place, and research has been conducted according to the principles expressed in the Declaration of Helsinki. The need for informed consent was waived by the ethics committees due to the retrospective nature of the study.

Results

We apply the method to the InterMEL consortium dataset of early-stage melanoma tumors³¹,³² (Database of Genotypes and Phenotypes [dbGaP] study accession: phs003099.v1.p1) and conduct an analysis on the RTK-RAS pathway for an illustration of how MAGPIE works and what it delivers. We then apply the method to three different pathways within 11 different cancer sites from TCGA data and benchmark its performance with several other existing methods. Lastly, we explore, via simulations, the properties of the method.

Illustration of MAGPIE: Data from the InterMEL study

Our analysis is based on 495 tumor samples genotyped to date through the InterMEL study. DNA samples were sequenced at Memorial Sloan Kettering Cancer Center using the Integrated Mutation Profiling of Actionable Cancer Targets, or MSK-IMPACT, a clinically validated and US Food and Drug Administration (FDA)-approved hybridization capture-based, next-generation sequencing assay developed to guide cancer treatment.³³,³⁴ This involved sequencing of 468 cancer-associated genes.

For our illustration of the method, we focus solely on mutations in the RTK-RAS pathway, the major known pathway that influences the development of melanomas.¹⁴,³⁵ The MSK-IMPACT panel includes 38 genes from the RTK-RAS pathway, where the list of pathway genes is as defined by Sanchez-Vega et al.²⁷ It is well known that mutations occur in melanomas frequently in several genes in this pathway, most prominently the genes BRAF and NRAS. Mutations in these two genes are almost always mutually exclusive. However, mutual exclusivity has not been studied systematically for other genes in this pathway. Also, hotspot mutations occur very frequently at the 600th residue in BRAF (hereafter referred to as BRAF Val600 variant) and the 61st residue in NRAS (hereafter referred to as NRAS Gln61 variant), but the importance of mutations altering other residues is less clear.

The data reveal that 91% of the 495 tumors had a mutation in the RTK-RAS pathway. However, our estimate of $\hat{γ}$ = 0.76 (p value = 0.001) indicates that in only 76% of the tumors is one of the mutations considered to be the driver. TMB varies widely with a standard deviation of 1.1 for the log tumor burden and an estimated effect of $β_{1}$ = 1.21. Figure 1 displays data (top) and model estimates (bottom) for the 18 genes with the highest estimated values of $τ_{j}$ (proportion of tumors carrying a driver mutation in gene j). Figure S1 displays the results for all 38 genes. The top panel displays a structured waterfall plot of the observed mutations. The two most frequently occurring genes at the top of the figure, BRAF and NRAS, are almost always mutually exclusive, the exceptions being the 4 cases at the extreme left of the figure. It is further noticeable that mutations in these genes frequently occur in the absence of mutations in any of the other genes in the pathway (see the right-most columns in the BRAF and NRAS rows). This high degree of general mutual exclusivity is the key pattern in the data that influences our analysis, confirming a high probability of a driver for mutations observed in these two genes. This is reflected in the bottom panel of Figure 1 where the depth of the shade indicates the strength of evidence that the mutation in question is the driver. Quantitative details are provided in Table 1, which displays the relative frequencies for which mutations in the genes occur alongside the portion of these occurrences that are flagged as drivers by our method. This gene-specific driver frequency is the estimated $τ_{j}$ . Interestingly, the method suggests that NRAS is the driver in all tumors involving NRAS mutations, notably the 4 tumors in which BRAF and NRAS mutations occurred simultaneously. 
Moving down the gene list, the analysis suggests that KIT mutations, which occur in about 5% of tumors, are drivers about half of the time, that NF1 mutations are drivers in about ¼ of the 26% of tumors that harbor NF1 mutations, and that there is limited evidence of driver status for the low-frequency genes. Although NF1 has a relatively high mutation rate, our method downgrades its importance as a driver gene because of its strong association with high TMB, which is displayed in the bottom panel of Figure 1 where a dark line indicates that the mutational burden for that tumor is in the top 25th percentile. In general, tumors with high mutational burden are less likely to have a driver mutation identified after model adjustment, but the final posterior probability vector for each tumor computed with the estimated parameter values also depends on other factors (e.g., the proportion of tumors with mutation in a given gene that are singletons). Table S1 summarizes the results for all 38 genes.

Figure 1.

Illustration of the observed binary mutation status, estimated posterior probability of driver mutation, and the distribution of binary tumor mutational burden for the 18 genes with the highest estimated driver frequency in the RTK-RAS pathway

Table 1.

Summary of the observed mutation frequency and the estimated driver frequency for each gene among the top genes ranked by the estimated driver frequency in the RTK-RAS pathway from the InterMEL study

Gene	Mutation frequency	Driver frequency
BRAF	0.402	0.382
NRAS	0.196	0.196
NF1	0.261	0.069
KIT	0.051	0.026
FGFR4	0.091	0.010
KRAS	0.014	0.010
MAPK1	0.032	0.009
ALK	0.131	0.009
HRAS	0.026	0.008
JAK2	0.040	0.007
ERBB2	0.065	0.006
ERBB3	0.057	0.006
NTRK2	0.065	0.005
PTPN11	0.065	0.004
RIT1	0.024	0.004
MAP2K1	0.071	0.004
MAP2K2	0.046	0.003
IGF1R	0.004	0.002
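The statements in the text about how often a gene's mutations are drivers follow from dividing the two columns of Table 1. A small sketch of that arithmetic for four genes, with values copied from the table (the variable names are illustrative):

```python
# (mutation frequency, driver frequency) from Table 1
table1 = {
    "BRAF": (0.402, 0.382),
    "NRAS": (0.196, 0.196),
    "NF1": (0.261, 0.069),
    "KIT": (0.051, 0.026),
}

# Share of tumors mutated in a gene for which that gene is flagged as the driver.
driver_share = {gene: drv / mut for gene, (mut, drv) in table1.items()}

for gene, share in driver_share.items():
    print(f"{gene}: {share:.2f}")
```

The ratios are about 0.95 for BRAF, 1.00 for NRAS, 0.26 for NF1, and 0.51 for KIT, matching the descriptions of NF1 as a driver in about a quarter of its mutated tumors and KIT in about half.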


Finally, we illustrate results for individual variants within driver genes. In Table 2, we provide the frequencies and average posterior probabilities for each of the 48 distinct variants observed in BRAF. High probabilities are generally assigned when the variant occurs as a singleton, and lower probabilities are assigned when other variants in the pathway occur. Very low probabilities occur when a driver variant in a different gene is observed. Thus, the 4 variants at the bottom of this table are the variants (on the extreme left in Figure 1) that occurred alongside an NRAS mutation.

Table 2.

Variant analysis

BRAF Mutation^a	Type	Prob.	Freq.	OncoKB	Classification	Co-occurring mutations
c.1406G>C (p.Gly469Ala)	missense	>0.99	1	oncogenic	class 2	–
c.2209G>A (p.Gly737Ser)	missense	>0.99	1	–	–	BRAF c.1799T>A (p.Val600Glu)^a
c.1799T>G (p.Val600Gly)	missense	>0.99	2	likely oncogenic	class 1	–
c.2191C>T (p.Pro731Ser)	missense	>0.99	2	–	–	BRAF c.1798G>A (p.Val600Met)^a; BRAF c.1799T>G (p.Val600Gly)^a
c.1457_1471delATGTGACAGCACCTA (p.Asn486_Pro490del)	in frame	>0.99	1	likely oncogenic	class 2	–
c.1843G>A (p.Gly615Arg)	missense	>0.99	1	–	–	–
c.1756G>A (p.Glu586Lys)	missense	>0.99	2	likely oncogenic	–	BRAF c.1799T>A (p.Val600Glu)^a
c.1404_1406delTGGinsAAA (p.Phe468_Gly469delinsLeuLys)	missense	0.99	1	–	–	–
c.1997T>A (p.Ile666Asn)	missense	0.99	1	–	–	BRAF c.1799T>A (p.Val600Glu)^a
c.1798G>A (p.Val600Met)	missense	0.99	3	inconclusive	class 1	–
c.1799T>A (p.Val600Glu)	missense	0.99	116	oncogenic	class 1	–
c.1801A>G (p.Lys601Glu)	missense	0.98	4	likely oncogenic	class 2	–
c.1798_1799delGTinsAG (p.Val600Arg)	missense	0.98	8	oncogenic	class 1	–
c.1798_1799delGTinsAA (p.Val600Lys)	missense	0.98	31	oncogenic	class 1	–
c.1808G>A (p.Arg603Gln)	missense	0.97	1	–	–	–
c.1790T>G (p.Leu597Arg)	missense	0.97	1	likely oncogenic	class 2	–
c.95_100delGCGCCG (p.Gly32_Ala33del)	in frame	0.97	1	–	–	–
c.1405G>A (p.Gly469Arg)	missense	0.96	3	oncogenic	class 2	–
c.1789_1790delCTinsTC (p.Leu597Ser)	missense	0.96	2	likely oncogenic	class 2	–
c.983C>T (p.Pro328Leu)	missense	0.95	1	–	–	–
c.2212T>C (p.Phe738Leu)	missense	0.94	1	–	–	–
c.950C>T (p.Ser317Phe)	missense	0.94	1	–	–	–
c.1750C>T (p.Leu584Phe)	missense	0.93	1	inconclusive	–	–
c.1397G>A (p.Gly466Glu)	missense	0.93	1	oncogenic	class 3	–
c.952C>T (p.Pro318Ser)	missense	0.92	2	–	–	BRAF c.1798_1799delGTinsAA (p.Val600Lys)^a
c.1781A>C (p.Asp594Ala)	missense	0.92	1	oncogenic	–	–
c.990T>G (p.Ile330Met)	missense	0.92	1	–	–	–
c.1780G>A (p.Asp594Asn)	missense	0.91	2	oncogenic	class 3	–
c.421C>T (p.Pro141Ser)	missense	0.91	1	–	–	–
c.1454T>G (p.Leu485Trp)	missense	0.90	1	likely oncogenic	class 2	–
c.1781A>G (p.Asp594Gly)	missense	0.90	2	oncogenic	class 3	–
c.1796C>T (p.Thr599Ile)	missense	0.85	1	likely oncogenic	class 2	–
c.1495A>G (p.Lys499Glu)	missense	0.85	1	likely oncogenic	–	–
c.1244C>T (p.Ala415Val)	missense	0.85	1	–	–	–
c.1033C>T (p.Pro345Ser)	missense	0.85	1	–	–	–
c.1397G>C (p.Gly466Ala)	missense	0.85	1	oncogenic	class 3	–
c.1391G>A (p.Gly464Glu)	missense	0.80	1	oncogenic	class 2	–
c.2203C>T (p.Arg735Trp)	missense	0.75	1	–	–	–
c.1165C>T (p.Arg389Cys)	missense	0.74	1	–	–	–
c.755G>A (p.Arg252Gln)	missense	0.74	1	–	–	–
c.1753C>T (p.His585Tyr)	missense	0.68	1	–	–	–
c.2195C>T (p.Ser732Phe)	missense	0.66	1	–	–	–
c.980G>A (p.Gly327Glu)	missense	0.66	1	–	–	–
c.1400C>T (p.Ser467Leu)	missense	0.59	3	oncogenic	class 3	NRAS c.181C>A (p.Gln61Lys)^b
c.31G>A (p.Gly11Ser)	missense	<0.01	1	–	–	NRAS c.37G>T (p.Gly13Cys)^b
c.1352A>T (p.Glu451Val)	missense	<0.01	1	–	–	NRAS c.37G>T (p.Gly13Cys)^b
c.1501G>A (p.Glu501Lys)	missense	<0.01	1	inconclusive	–	NRAS c.35_36delGTinsAG (p.Gly12Glu)^b
c.1733A>T (p.Lys578Met)	missense	<0.01	1	–	–	NRAS c.181C>A (p.Gln61Lys)^b


^a GenBank: NM_004333.4 (BRAF).

^b GenBank: NM_002524.4 (NRAS).

BRAF variants have been studied extensively and classified into a few major classes with varying oncogenic potency based on differences in dimerization requirement and RAS dependency (class 1 mutations are Val600 mutations that activate the pathway downstream as monomers; class 2 comprises RAS-independent dimers that activate kinase activity; class 3 variants show diminished kinase activity but signal as RAS-dependent dimers).³⁶,³⁷,³⁸,³⁹ Table 2 shows that the BRAF variants with high estimated probability of being a driver (>90%) include most of the functionally potent class 1 and class 2 variants. OncoKB is a widely used evidence-based variant annotation tool that integrates such known biologic and oncogenic effects,⁴⁰ and it can provide orthogonal evidence about the classifications. Notably, the probabilities are generally higher for variants classified as oncogenic by OncoKB. Furthermore, we observed that the c.1781A>C (p.Asp594Ala) variant (GenBank: NM_004333.4), despite not being classified as a BRAF class 1–3 mutation, exhibits a high estimated driver probability of 0.92 and is categorized as oncogenic by OncoKB. Other rare BRAF variants identified with high estimated probabilities, such as c.1843G>A (p.Gly615Arg), c.1404_1406delTGGinsAAA (p.Phe468_Gly469delinsLeuLys), c.1808G>A (p.Arg603Gln), c.95_100delGCGCCG (p.Gly32_Ala33del), and c.983C>T (p.Pro328Leu), currently lack established biological and clinical evidence. These variants merit investigation for functional validation. Individual variant analyses of four additional genes are provided in Table S2. For KIT, KRAS, and STK11, there is clearly a strong correlation between the MAGPIE classification and the OncoKB reference, with truncation mutations having notably higher probabilities in the case of the tumor suppressor STK11. For NRAS, almost all of the observed variants both have a high assigned probability and are classified as either oncogenic or likely oncogenic by OncoKB.

Benchmarking MAGPIE against existing tools using TCGA data

To further assess MAGPIE’s performance in real-world scenarios, we applied it to a comprehensive set of independent analyses involving three distinct pathways (RTK-RAS, PI3K, and Wnt) across 11 cancer sites, utilizing TCGA data. Pathway genes are defined based on the framework established by Sanchez-Vega et al.²⁷ Table 3 summarizes the following descriptive statistics and estimates for individual tumor sites and/or pathways: the number of tumors from each tumor site (# of tumors); relative frequency of observed mutations (mut freq); estimated proportion of tumors possessing a driver mutation (driver freq); p value of the corresponding significance test for mutual exclusivity (p val). The results reveal that the RTK-RAS pathway exhibits significant mutual exclusivity (p value < 0.05) in eight distinct cancer sites, encompassing breast cancer (BRCA), lower-grade glioma (LGG), uterine corpus endometrial carcinoma (UCEC), LUAD, head and neck squamous cell carcinoma (HNSC), papillary thyroid carcinoma (THCA), urothelial bladder cancer (BLCA), and cutaneous melanoma (SKCM). The PI3K pathway is deemed significant in five cancer sites—BRCA, LGG, UCEC, LUAD, and SKCM—and nearly reaches the threshold for significance in HNSC (p value = 0.051). The Wnt pathway shows a significant mutually exclusive pattern in seven cancer sites, namely LGG, UCEC, LUAD, HNSC, prostate adenocarcinoma (PRAD), BLCA, and SKCM.

Table 3.

Results of independent MAGPIE analyses on individual pathways in each tumor site

Site^a	# of tumors	RTK-RAS mut freq	RTK-RAS driver freq	RTK-RAS p val	PI3K mut freq	PI3K driver freq	PI3K p val	Wnt mut freq	Wnt driver freq	Wnt p val
BRCA	987	0.217	0.090	0.017	0.475	0.400	<0.001	0.098	0.039	0.266
LGG	520	0.221	0.115	0.033	0.187	0.095	0.032	0.056	0.034	0.029
UCEC	508	0.614	0.385	<0.001	0.904	0.719	<0.001	0.575	0.309	<0.001
LUAD	503	0.809	0.632	<0.001	0.400	0.205	0.002	0.314	0.140	0.002
HNSC	489	0.409	0.177	0.024	0.317	0.246	0.051	0.225	0.093	0.011
THCA	486	0.728	0.723	<0.001	0.047	0.008	0.599	0.037	0.010	0.249
PRAD	477	0.117	0.069	0.370	0.094	0.044	0.313	0.094	0.062	0.004
LUSC	464	0.597	0.213	0.356	0.407	0.221	0.186	0.349	0.157	0.288
BLCA	399	0.684	0.398	<0.001	0.451	0.194	0.435	0.291	0.145	0.049
SKCM	365	0.942	0.895	<0.001	0.504	0.228	0.005	0.553	0.188	0.014
KIRC	353	0.235	0.080	0.317	0.195	0.054	0.909	0.079	0.049	0.129


^a TCGA disease codes and abbreviations: BRCA, breast cancer; LGG, lower-grade glioma; UCEC, uterine corpus endometrial carcinoma; LUAD, lung adenocarcinoma; HNSC, head and neck squamous cell carcinoma; THCA, papillary thyroid carcinoma; PRAD, prostate adenocarcinoma; LUSC, lung squamous cell carcinoma; BLCA, urothelial bladder cancer; SKCM, cutaneous melanoma; KIRC, clear cell kidney carcinoma.
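The significance counts quoted in the text can be re-derived from the p values in Table 3. The short sketch below does this at the 0.05 threshold; entries reported as "<0.001" are encoded as 0.001 (an upper bound chosen here for convenience, not a reported value), and the variable names are illustrative only.

```python
ALPHA = 0.05  # significance threshold used in the text

# p values from Table 3 as (RTK-RAS, PI3K, Wnt);
# "<0.001" is encoded as 0.001, which is only an upper bound.
p_values = {
    "BRCA": (0.017, 0.001, 0.266),
    "LGG": (0.033, 0.032, 0.029),
    "UCEC": (0.001, 0.001, 0.001),
    "LUAD": (0.001, 0.002, 0.002),
    "HNSC": (0.024, 0.051, 0.011),
    "THCA": (0.001, 0.599, 0.249),
    "PRAD": (0.370, 0.313, 0.004),
    "LUSC": (0.356, 0.186, 0.288),
    "BLCA": (0.001, 0.435, 0.049),
    "SKCM": (0.001, 0.005, 0.014),
    "KIRC": (0.317, 0.909, 0.129),
}
pathways = ("RTK-RAS", "PI3K", "Wnt")

# For each pathway, list the sites whose test p value is below ALPHA.
significant = {
    name: [site for site, ps in p_values.items() if ps[i] < ALPHA]
    for i, name in enumerate(pathways)
}

for name in pathways:
    print(name, len(significant[name]), significant[name])
```

The counts come out as eight, five, and seven sites for RTK-RAS, PI3K, and Wnt, with HNSC falling just outside the PI3K list at p = 0.051.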

Figure 2 provides a comprehensive overview of the driver genes identified. The size of the bubbles in the graph corresponds to the relative frequency of mutations associated with a specific gene (x axis) within the corresponding cancer site (y axis). The depth of shade of these bubbles indicates the ratio of the estimated driver mutation rate to the observed mutation rate for a given gene. This ratio effectively represents the conditional probability that an observed mutation is considered a driver alteration. Strong driver genes are expected to exhibit a substantial bubble size and a dark tone, ensuring easy visual discernment.
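As a worked illustration of this ratio, the sketch below divides an estimated driver frequency by the corresponding observed mutation frequency. The gene-level TCGA estimates are in Tables S3–S5, so the inputs here are borrowed from Table 1 (InterMEL, RTK-RAS) purely as example values; the function name `driver_ratio` is hypothetical and not part of the MAGPIE software.

```python
# gene -> (observed mutation frequency, estimated driver frequency),
# copied from Table 1 (InterMEL study, RTK-RAS pathway).
table1 = {
    "BRAF": (0.402, 0.382),
    "NRAS": (0.196, 0.196),
    "NF1": (0.261, 0.069),
    "ALK": (0.131, 0.009),
}


def driver_ratio(mut_freq: float, driver_freq: float) -> float:
    """Estimated driver rate divided by observed mutation rate."""
    if mut_freq <= 0:
        raise ValueError("mutation frequency must be positive")
    return driver_freq / mut_freq


ratios = {gene: driver_ratio(m, d) for gene, (m, d) in table1.items()}
for gene, ratio in ratios.items():
    print(f"{gene}: {ratio:.2f}")
```

With these inputs, BRAF and NRAS give ratios near 1 (0.95 and 1.00), whereas NF1 and ALK give about 0.26 and 0.07, which is the contrast the shading in a bubble plot of this kind is meant to convey.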

Bubble plots showing the likelihood of individual genes within a pathway as driver across different tumor sites

(A) RTK-RAS pathway.

(B) PI3K pathway.

(C) Wnt pathway. TCGA disease codes and abbreviations: BRCA, breast cancer; LGG, lower-grade glioma; UCEC, uterine corpus endometrial carcinoma; LUAD, lung adenocarcinoma; HNSC, head and neck squamous cell carcinoma; THCA, papillary thyroid carcinoma; PRAD, prostate adenocarcinoma; LUSC, lung squamous cell carcinoma; BLCA, urothelial bladder cancer; SKCM, cutaneous melanoma; KIRC, clear cell kidney carcinoma.

Within the RTK-RAS pathway, several potent driver genes emerge across diverse cancer sites. Specifically, BRAF is identified as a strong driver gene in THCA and SKCM, as is EGFR in LUAD. The roster of strong drivers includes FGFR2 in UCEC, FGFR3 in BLCA, KRAS in UCEC and LUAD, and NRAS in SKCM. In the context of the PI3K pathway, strong driver genes are as follows: PIK3CA in UCEC, HNSC, BRCA, and BLCA; PIK3R1 in UCEC; PPP2R1A in UCEC; and STK11 in LUAD. In the Wnt pathway, the strong drivers encompass AMER1 in SKCM and CTNNB1 in UCEC and SKCM. A number of driver genes displaying moderate or weaker significance are also identified. These genes, though challenging to discern from bubble charts, can be identified from their estimated driver frequencies in Tables S3–S5.

Next, we benchmark the performance of MAGPIE against existing methods for driver gene identification. Although MAGPIE is designed to identify driver genes and individual variants at any frequency, the identification of rare drivers is a unique strength. For the purpose of comparing it with existing frequency-based methods, we have restricted attention to genes that have both been identified as statistically significant for mutual exclusivity and have an estimated driver frequency of at least 1%. We compare MAGPIE with published results from seven existing methods. MutSigCV,¹ DriverML,² ActiveDriver,³ and OncodriveFML⁴ are each frequency-based approaches targeting genes whose mutation rates exceed anticipated levels; OncodriveCLUST⁵ focuses on hotspot detection, identifying genes that display a marked bias toward mutations clustering within regions encoding specific protein domains; DawnRank⁷ employs a sub-network framework to rank genes based on their downstream impact within interaction networks; and Dendrix¹⁵ identifies mutually exclusive gene sets through a measure that balances coverage and exclusivity. The driver genes identified by these competing methods when applied to TCGA data are documented in Han et al.²

In Figure 3, we present a concise summary of the results from four cancer sites (SKCM, LUAD, UCEC, and BLCA) that compares the driver genes identified by the different methods, facilitating a comprehensive comparative analysis. The complete results for all 11 cancer sites are summarized in Figure S3. Given that there is limited ground truth to judge if a gene is indeed a driver, we focus on the extent of agreement between MAGPIE’s selections and those of other methodologies.

Driver gene nomination by different methods (selected cancer sites)

(A) RTK-RAS pathway.

(B) PI3K pathway.

(C) Wnt pathway. TCGA disease codes and abbreviations: SKCM, cutaneous melanoma; LUAD, lung adenocarcinoma; UCEC, uterine corpus endometrial carcinoma; BLCA, urothelial bladder cancer.

The overall impression from these results is that the methods broadly target the same genes but there are wide discrepancies in the results. The major, known driver genes, such as BRAF, NRAS, EGFR, KRAS, and NF1 in the RTK-RAS pathway; PIK3CA, PTEN, and PIK3R1 in the PI3K pathway; and APC and CTNNB1 in the Wnt pathway, are all identified for their key cancer sites by multiple methods. Of note, MAGPIE is generally consistent in identifying these key genes, unlike some of the competitors. Some particular observations follow. In RTK-RAS, MAGPIE uniquely identifies KRAS in SKCM and RIT1 in LUAD, and these results are supported by previous research.⁴¹,⁴² Conversely, for PI3K, MAGPIE fails to identify PTEN as a driver for UCEC. In this site, PTEN’s omission as a driver stems from the fact that a substantial majority of PTEN mutations co-occur with mutations in PIK3CA and PIK3R1. Furthermore, mutations in PIK3CA and PIK3R1 exhibit a notable level of mutual exclusivity (Figure S2). Consequently, the algorithm favors PIK3CA and PIK3R1 as driver genes and excludes PTEN. Finally, MAGPIE nominates CTNNB1 and APC as drivers in SKCM, LUAD, and BLCA. This aligns with findings by Karachaliou et al., who observed a mutually exclusive mutation pattern of APC and CTNNB1 in TCGA-SKCM data, and further established an association between APC/CTNNB1 mutations and adverse outcomes in stage IV melanoma.⁴³ The implications of APC/CTNNB1 in LUAD and BLCA, however, warrant further comprehensive examination.

We further evaluated the performance of each method with quantitative metrics by using the driver genes curated in the Catalogue of Somatic Mutations in Cancer (COSMIC) Cancer Gene Census (CGC) as the benchmark. The current release of the CGC includes over 700 evidence-based, manually curated cancer-driver genes (release v98, May 23, 2023).⁴⁴ We consider all genes in the three pathways as the gene pool, among which the true positive genes are those collected in the CGC. Sensitivity and specificity are summarized for each method within individual tumor types along with the average Youden’s index across all tumor types (Table S6). Here, we excluded KIRC from the analysis because there is no associated driver gene found in the CGC, and we further combined LUAD and LUSC into one category named “LUNG” following the CGC’s coding convention. Overall, MAGPIE is ranked third among all eight methods according to the average Youden’s index. Finally, we want to clarify that this quantitative evaluation could be biased because the CGC classifies driver genes using a conservative approach, and thus the reference driver gene list used in the analysis is unlikely to be a complete set.
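For readers unfamiliar with the metric, Youden’s index is sensitivity plus specificity minus 1. The sketch below computes all three quantities for one method in one tumor type, given a gene pool, a reference list, and the genes a method calls. The gene labels are hypothetical toy inputs rather than results from Table S6, and the function name is our own.

```python
def youden_index(called, reference, pool):
    """Return (sensitivity, specificity, Youden's J) within a gene pool.

    called:    genes nominated as drivers by a method
    reference: benchmark driver genes (for example, a curated census)
    pool:      all genes under consideration
    """
    pool = set(pool)
    called = set(called) & pool
    positives = set(reference) & pool
    negatives = pool - positives
    if not positives or not negatives:
        # Sensitivity or specificity is undefined without both classes.
        raise ValueError("need at least one positive and one negative gene")
    sensitivity = len(called & positives) / len(positives)
    specificity = len(negatives - called) / len(negatives)
    return sensitivity, specificity, sensitivity + specificity - 1.0


# Hypothetical toy example: 10 pooled genes, 4 reference drivers, 4 calls.
pool = [f"G{i}" for i in range(1, 11)]
reference = ["G1", "G2", "G3", "G4"]
called = ["G1", "G2", "G3", "G7"]

sens, spec, j = youden_index(called, reference, pool)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} J={j:.3f}")
```

The explicit error for an empty positive set mirrors why a tumor type with no benchmark driver gene cannot contribute a sensitivity, and hence a Youden’s index.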

Operating characteristics of the method

We have conducted simulations to examine the properties of the method. There are many different features of a pathway that could potentially affect these properties. We have elected to generate data by using selected features of the RTK-RAS pathway that were estimated from the InterMEL data and then to vary some key aspects of these results to explore the influence of selected features. Specifically, we focus on a pathway with three types of genes: (1) strong driver genes that function as drivers most of the time (like BRAF and NRAS), (2) moderate driver genes that sometimes function as drivers and sometimes do not (like NF1), and (3) genes that are presumed to never be drivers. Details of their overall and driver frequencies are provided in Table 4. We consider two general configurations, denoted by $A$ and $B$. Configuration $A$ refers to the low-noise setting in which two genes are generated from each type. Configuration $B$ refers to the high-noise setting in which the number of non-driver genes is increased to 10. The probabilities in Table 4 correspond to pathways where mutual exclusivity is present. When we evaluate the test size under the null hypothesis of no exclusivity (Table S7), we use configuration $A$ under the assumption that all genes in the configuration have driver rates of 0%. As previously described in the data generation algorithm, the centered log-scale mutational burden was generated from a normal distribution with standard deviation $σ = 1$. $β_{0j}$ was set to maintain the designed overall and driver mutation frequency under each setting as summarized in Table 4. For all settings, we simulated 1,000 data replicates under our model structure, and for each test, we generated 1,000 bootstrap samples.

Table 4.

Characteristics of simulated genes

Gene type	Overall mutation frequency	Driver mutation frequency	# genes, configuration A (low noise)	# genes, configuration B (high noise)
Strong driver	20%	20%	2	2
Moderate driver	20%	10%	2	2
Non-driver	20%	0%	2	10
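To make the Table 4 design concrete, the following is a minimal simulation sketch for configuration A. It is our reading of the setup rather than the authors' data generation algorithm: we assume at most one driver per tumor drawn with the Table 4 driver frequencies, a logistic link between the passenger probability and the centered log-scale TMB, and $β_{1} = 0$ by default so that the intercept has a closed form. All function and variable names are hypothetical.

```python
import math
import random

# Configuration A from Table 4: (overall frequency, driver frequency) per gene.
GENES = [(0.20, 0.20)] * 2 + [(0.20, 0.10)] * 2 + [(0.20, 0.00)] * 2


def logit(p):
    return math.log(p / (1.0 - p))


def expit(x):
    return 1.0 / (1.0 + math.exp(-x))


def passenger_rate(overall, driver):
    # Solve overall = driver + (1 - driver) * passenger for the passenger rate.
    return (overall - driver) / (1.0 - driver)


def simulate(n, genes=GENES, beta1=0.0, seed=0):
    """Return (X, Z): binary mutation rows and the true driver index per tumor.

    Z[i] = 0 means no driver; Z[i] = j (1-based) means gene j is the driver.
    Assumption: passenger probability is expit(beta0_j + beta1 * tmb).
    With beta1 != 0 the intercepts would need numerical recalibration to
    keep the marginal frequencies of Table 4; that step is not shown here.
    """
    rng = random.Random(seed)
    driver_probs = [d for _, d in genes]
    p_none = 1.0 - sum(driver_probs)
    pass_rates = [passenger_rate(o, d) for o, d in genes]
    X, Z = [], []
    for _ in range(n):
        tmb = rng.gauss(0.0, 1.0)  # centered log-scale TMB, sigma = 1
        z = rng.choices(range(len(genes) + 1),
                        weights=[p_none] + driver_probs)[0]
        row = []
        for j, p in enumerate(pass_rates, start=1):
            if z == j:
                row.append(1)  # the driver gene is mutated by construction
            elif p <= 0.0:
                row.append(0)  # strong drivers carry no passenger mutations
            else:
                prob = expit(logit(p) + beta1 * tmb)
                row.append(1 if rng.random() < prob else 0)
        X.append(row)
        Z.append(z)
    return X, Z


X, Z = simulate(4000, seed=1)
freqs = [sum(row[j] for row in X) / len(X) for j in range(len(GENES))]
print([round(f, 3) for f in freqs])
```

Under these assumptions each gene's simulated mutation frequency should sit near 20%, and the two strong driver genes are mutated only in tumors where they are the driver.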


We first examined the properties of the initial significance test to determine the evidence that mutual exclusivity exists in the pathway, using Equation 11. We calculated the size of the test for sample sizes ranging from 500 to 5,000 under a model in which there was no effect of TMB ($β_{1} = 0$) and under a model in which the effect of TMB was in the range of that observed in the real dataset ($β_{1} = 1$). The results are summarized in Table S7, where the test size is computed as the average proportion of null hypothesis rejections among the 1,000 simulated data replicates. We observe that the test size of our proposed bootstrap-based test is, in general, close to the nominal level of 5% across different settings.
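A rough Monte Carlo calculation indicates how close to 5% an estimated size can be expected to fall. Assuming the 1,000 replicates are independent and the true size equals the nominal level $α = 0.05$ (the symbols $α$, $R$, and $\hat{α}$ are introduced here only for illustration), the standard error of the estimated size $\hat{α}$ based on $R = 1{,}000$ replicates is

$$\mathrm{SE}(\hat{α}) = \sqrt{\frac{α (1 - α)}{R}} = \sqrt{\frac{0.05 \times 0.95}{1000}} \approx 0.0069 ,$$

so estimates within about two standard errors of the nominal level, roughly 0.036 to 0.064, are compatible with a correctly sized test.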

Next, we explored the ability of the model to identify drivers in individual tumors. We used two distinct statistics for this purpose. First, we evaluated overall measures that characterize the true-positive rates (TPRs) and false-positive rates (FPRs) for identifying whether or not a tumor has a driver in the pathway. For this calculation, the overall FPR is given by

$$\text{Overall FPR} = \frac{1}{\sum_{i = 1}^{N} I_{z_{i0} = 1}} \sum_{i = 1}^{N} I_{z_{i0} = 1} \, I_{\operatorname{argmax}_{k} w_{ik} > 0} .$$

(Equation 18)

The corresponding TPR is given by

$$\text{Crude overall TPR} = \frac{1}{\sum_{i = 1}^{N} I_{z_{i0} \neq 1}} \sum_{i = 1}^{N} I_{z_{i0} \neq 1} \, I_{\operatorname{argmax}_{k} w_{ik} > 0} .$$

(Equation 19)
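The two overall rates can be computed directly from the true driver status and the posterior weights. In the sketch below we assume, for illustration, that `z[i]` stores the index of the true driver in tumor $i$ (0 meaning no driver, which corresponds to $z_{i0} = 1$) and that `w[i]` is the list of posterior probabilities $w_{ik}$ with index 0 reserved for "no driver". The toy values are hypothetical.

```python
def argmax(values):
    """Index of the largest entry (first one on ties)."""
    return max(range(len(values)), key=values.__getitem__)


def overall_rates(z, w):
    """Overall FPR and crude overall TPR as in Equations 18 and 19."""
    called = [argmax(w_i) > 0 for w_i in w]  # a driver is nominated
    n_no_driver = sum(1 for z_i in z if z_i == 0)
    n_driver = len(z) - n_no_driver
    fp = sum(1 for z_i, c in zip(z, called) if z_i == 0 and c)
    tp = sum(1 for z_i, c in zip(z, called) if z_i != 0 and c)
    fpr = fp / n_no_driver if n_no_driver else float("nan")
    tpr = tp / n_driver if n_driver else float("nan")
    return fpr, tpr


# Hypothetical toy data: six tumors, two genes.
z = [0, 0, 1, 2, 0, 1]
w = [
    [0.70, 0.20, 0.10],  # no driver, none called
    [0.30, 0.60, 0.10],  # no driver, gene 1 called (false positive)
    [0.10, 0.80, 0.10],  # driver in gene 1, gene 1 called
    [0.20, 0.50, 0.30],  # driver in gene 2, gene 1 called
    [0.90, 0.05, 0.05],  # no driver, none called
    [0.60, 0.30, 0.10],  # driver in gene 1, none called
]

fpr, tpr = overall_rates(z, w)
print(f"overall FPR={fpr:.3f}, crude overall TPR={tpr:.3f}")
```

Note that the fourth tumor counts toward the crude overall TPR even though the nominated gene differs from the true driver gene, because Equation 19 only asks whether any driver is called.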

Table 5 summarizes the results of overall accuracy, where FPR and TPR in the table are calculated as the average among the 1,000 simulated data replicates. We observe that, in general, FPR decreases and TPR increases with larger sample size. When there exists an association between passenger mutation rate and TMB (i.e., $β_{1} = 1$), our method tends to classify fewer mutations as drivers, reducing both FPR and TPR. Conversely, elevation in pathway noise tends to make our method nominate more driver mutations, increasing both FPR and TPR. However, the effect of such noise diminishes with a larger sample size.

Table 5.

Overall accuracy

Configuration	Sample size	Mutational burden	FPR	TPR
A Low noise	500	$β_{1} = 0$	0.210	0.994
	500	$β_{1} = 1$	0.156	0.968
	1,000	$β_{1} = 0$	0.211	0.999
	1,000	$β_{1} = 1$	0.154	0.969
	5,000	$β_{1} = 0$	0.212	1.000
	5,000	$β_{1} = 1$	0.153	0.971
B High noise	500	$β_{1} = 0$	0.247	0.997
	500	$β_{1} = 1$	0.187	0.973
	1,000	$β_{1} = 0$	0.213	0.999
	1,000	$β_{1} = 1$	0.166	0.973
	5,000	$β_{1} = 0$	0.212	1.000
	5,000	$β_{1} = 1$	0.155	0.972


Finally, we explored diagnostic accuracy at a more granular level, seeking to determine the accuracy of driver identification for the different individual gene configurations. For this purpose, we define the gene-specific false-positive and true-positive rates (gFPR and gTPR). That is, our FPR in this context measures, among tumors with a mutation in gene $j$ that is not a driver, what proportion are incorrectly flagged as a driver:

$$\text{gFPR}_{j} = \frac{1}{\sum_{i = 1}^{N} I_{x_{ij} = 1} I_{z_{ij} = 0}} \sum_{i = 1}^{N} I_{x_{ij} = 1} \, I_{z_{ij} = 0} \, I_{\operatorname{argmax}_{k} w_{ik} = j} .$$

(Equation 20)

The corresponding gene-specific true-positive rate measures, among tumors with a mutation in gene $j$ that is a driver, what proportion are correctly flagged as a driver:

$$\text{gTPR}_{j} = \frac{1}{\sum_{i = 1}^{N} I_{x_{ij} = 1} I_{z_{ij} = 1}} \sum_{i = 1}^{N} I_{x_{ij} = 1} \, I_{z_{ij} = 1} \, I_{\operatorname{argmax}_{k} w_{ik} = j} .$$

(Equation 21)
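The gene-specific rates in Equations 20 and 21 can be sketched in the same way. Here we assume, for illustration, that `x[i][j-1]` is the observed mutation indicator $x_{ij}$, `z[i]` is the index of the true driver gene (0 for none), and `w[i]` holds the posterior weights with index 0 for "no driver". A rate is returned as `None` when its denominator is empty. The data are hypothetical toy values.

```python
def argmax(values):
    """Index of the largest entry (first one on ties)."""
    return max(range(len(values)), key=values.__getitem__)


def gene_rates(x, z, w, j):
    """Return (gFPR_j, gTPR_j) for gene j (1-based)."""
    fp_num = fp_den = tp_num = tp_den = 0
    for x_i, z_i, w_i in zip(x, z, w):
        if x_i[j - 1] != 1:
            continue  # only tumors with a mutation in gene j contribute
        called_j = argmax(w_i) == j
        if z_i == j:  # the mutation in gene j is the driver
            tp_den += 1
            tp_num += called_j
        else:  # the mutation in gene j is a passenger
            fp_den += 1
            fp_num += called_j
    gfpr = fp_num / fp_den if fp_den else None
    gtpr = tp_num / tp_den if tp_den else None
    return gfpr, gtpr


# Hypothetical toy data: five tumors, two genes.
x = [[1, 0], [1, 1], [1, 0], [0, 1], [1, 0]]
z = [1, 2, 0, 2, 1]
w = [
    [0.1, 0.9, 0.0],
    [0.1, 0.5, 0.4],
    [0.8, 0.2, 0.0],
    [0.2, 0.0, 0.8],
    [0.6, 0.4, 0.0],
]

print(gene_rates(x, z, w, 1))  # gene 1
print(gene_rates(x, z, w, 2))  # gene 2
```

In this toy example gene 2 has no passenger mutations, so its gFPR is undefined, which parallels the situation for strong driver genes in the simulation design.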

Results are provided in Table S8, where, as before, the gFPR and gTPR values are averages across all simulated datasets. It is worth noting that the previously defined overall FPR and crude overall TPR are not simple weighted averages of gFPR or gTPR across genes, as the denominators in those formulas are not the same. We do not compute gFPR for strong driver genes because in our simplified construct there are no tumors that have non-driver mutations in these genes (i.e., no passenger mutations identified). Similarly, we do not compute gTPR for non-driver genes. For strong driver genes, we observe that the average gTPR is close to 1 in almost all scenarios (e.g., small sample size or high noise). Similarly, the method also performs well in screening out non-driver genes, evidenced by the extremely small gFPR across different scenarios. For moderate driver genes, those that can often be either a driver or a passenger, it is clearly more challenging for the method to identify drivers accurately, with gFPRs ranging from 0.340 to 0.477 across our various configurations, while the gTPRs range from 0.851 to 0.942. As was shown previously in Table 5, with greater association between passenger mutation rate and TMB, our method tends to classify fewer mutations as drivers, reducing both FPR and TPR. Conversely, the presence of increasing noise has minimal impact at an individual gene level.

Discussion

Our goals in developing this methodology were to find a strategy for identifying potential driver mutations in a tumor and assigning probabilities to the potential candidates. We built our strategy on a model that frames the selection on the presence of mutual exclusivity patterns in the data. Among the many groups that have studied mutual exclusivity in this context, we elected to build on the ideas of Hua et al.²³ since their model was firmly based on well-established statistical principles. The underlying model is structured around the assumption that there can be at most 1 driver in the pathway in any individual tumor, and this is in itself an assumption that may not be correct. Nevertheless, this assumption does provide a solid framework in which to examine mutual exclusivity. We observe in our detailed analysis of data from the InterMEL study that the method produces results that appear to be highly plausible in that they align with known evidence about the RTK-RAS pathway. That said, the RTK-RAS example represents a pathway for which the mutual exclusivity between BRAF and NRAS is especially profound, and thus it may present an easier task than pathways without highly prevalent variants that are very strongly mutually exclusive. Even so, our more comprehensive analysis of multiple pathways and cancer sites using TCGA data also demonstrates that MAGPIE generally identifies the known cancer-associated genes in addition to identifying other genes worthy of further investigation. The comparative analyses using multiple methods show wide variation in the results, with only modest levels of agreement among the methods. Without a gold standard reference, it is difficult to distinguish the methods on the basis of accuracy in identifying driver genes.

We believe that our method has strong potential for shedding light on which mutations are potentially pathogenic in a specific gene. In the melanoma BRAF example we presented, the Val600 variants identified as pathogenic are well characterized and are targets for FDA-approved therapies. However, approximately 35% of all BRAF mutations occur outside the Val600 codon.³⁸ The functional impact and therapeutic potential of non-Val600 BRAF mutations is an active research topic, yet existing knowledge in this area is limited. Our analysis of BRAF identified variants other than the common Val600 variants that may be potentially pathogenic. These represent the kinds of variants that could be prime candidates for experimental validation using modern in vitro and in vivo strategies.⁴⁵

We emphasize that our strategy is focused on a single pathway and is based on the pivotal assumption that there can be only one driver in the pathway in any given tumor. However, in any given tumor there are very likely multiple drivers, each occurring in distinct pathways. While one could perform our analysis independently for distinct pathways in order to identify a more complete set of drivers, a future research task is to expand our approach to permit a simultaneous analysis of multiple pathways.

Web resources

Mutation Data from the InterMEL study, https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs003099.v1.p1
GitHub, MAGPIE, https://github.com/tarot0410/MAGPIE
NIH National Cancer Institute, Genomic Data Commons, https://gdc.cancer.gov/about-data/publications/mc3-2017
Sanger Institute, COSMIC, https://cancer.sanger.ac.uk/cosmic/download

Acknowledgments

This study was supported by the National Institutes of Health (P01 CA206980 and R01 CA251339) and Memorial Sloan Kettering Cancer Center (P30 CA008748).

Declaration of interests

The authors declare no competing interests.

Published: January 16, 2024

Footnotes

Supplemental information can be found online at https://doi.org/10.1016/j.ajhg.2023.12.009.

Contributor Information

Xinjun Wang, Email: wangx11@mskcc.org.

Colin Begg, Email: beggc@mskcc.org.

Supplemental information

Document S1. Figures S1–S3 (mmc1.pdf, 596.1 KB)

Data S1. Tables S1–S8 (mmc2.xlsx, 49.3 KB)

Document S2. Article plus supplemental information (mmc3.pdf, 2.6 MB)

References

1. Lawrence M.S., Stojanov P., Polak P., Kryukov G.V., Cibulskis K., Sivachenko A., Carter S.L., Stewart C., Mermel C.H., Roberts S.A., et al. Mutational heterogeneity in cancer and the search for new cancer-associated genes. Nature. 2013;499:214–218. doi: 10.1038/nature12213.
2. Han Y., Yang J., Qian X., Cheng W.C., Liu S.H., Hua X., Zhou L., Yang Y., Wu Q., Liu P., Lu Y. DriverML: a machine learning algorithm for identifying driver genes in cancer sequencing studies. Nucleic Acids Res. 2019;47:e45. doi: 10.1093/nar/gkz096.
3. Reimand J., Bader G.D. Systematic analysis of somatic mutations in phosphorylation signaling predicts novel cancer drivers. Mol. Syst. Biol. 2013;9:637. doi: 10.1038/msb.2012.68.
4. Mularoni L., Sabarinathan R., Deu-Pons J., Gonzalez-Perez A., López-Bigas N. OncodriveFML: a general framework to identify coding and non-coding regions with cancer driver mutations. Genome Biol. 2016;17:128. doi: 10.1186/s13059-016-0994-0.
5. Tamborero D., Gonzalez-Perez A., Lopez-Bigas N. OncodriveCLUST: exploiting the positional clustering of somatic mutations to identify cancer genes. Bioinformatics. 2013;29:2238–2244. doi: 10.1093/bioinformatics/btt395.
6. Jia P., Wang Q., Chen Q., Hutchinson K.E., Pao W., Zhao Z. MSEA: detection and quantification of mutation hotspots through mutation set enrichment analysis. Genome Biol. 2014;15:489–516. doi: 10.1186/s13059-014-0489-9.
7. Hou J.P., Ma J. DawnRank: discovering personalized driver genes in cancer. Genome Med. 2014;6:56. doi: 10.1186/s13073-014-0056-8.
8. Bashashati A., Haffari G., Ding J., Ha G., Lui K., Rosner J., Huntsman D.G., Caldas C., Aparicio S.A., Shah S.P. DriverNet: uncovering the impact of somatic driver mutations on transcriptional networks in cancer. Genome Biol. 2012;13:R124. doi: 10.1186/gb-2012-13-12-r124.
9. Ciriello G., Cerami E., Sander C., Schultz N. Mutual exclusivity analysis identifies oncogenic network modules. Genome Res. 2012;22:398–406. doi: 10.1101/gr.125567.111.
10. Pao W., Wang T.Y., Riely G.J., Miller V.A., Pan Q., Ladanyi M., Zakowski M.F., Heelan R.T., Kris M.G., Varmus H.E. KRAS mutations and primary resistance of lung adenocarcinomas to gefitinib or erlotinib. PLoS Med. 2005;2:e17. doi: 10.1371/journal.pmed.0020017.
11. Ding L., Getz G., Wheeler D.A., Mardis E.R., McLellan M.D., Cibulskis K., Sougnez C., Greulich H., Muzny D.M., Morgan M.B., et al. Somatic mutations affect key pathways in lung adenocarcinoma. Nature. 2008;455:1069–1075. doi: 10.1038/nature07423.
12. The Cancer Genome Atlas Research Network. Comprehensive molecular profiling of lung adenocarcinoma. Nature. 2014;511:543–550. doi: 10.1038/nature13385.
13. Shoushtari A.N., Chatila W.K., Arora A., Sanchez-Vega F., Kantheti H.S., Rojas Zamalloa J.A., Krieger P., Callahan M.K., Betof Warner A., Postow M.A., et al. Therapeutic Implications of Detecting MAPK-Activating Alterations in Cutaneous and Unknown Primary Melanomas. Clin. Cancer Res. 2021;27:2226–2235. doi: 10.1158/1078-0432.CCR-20-4189.
14. Cancer Genome Atlas Network. Genomic Classification of Cutaneous Melanoma. Cell. 2015;161:1681–1696. doi: 10.1016/j.cell.2015.05.044.
15. Vandin F., Upfal E., Raphael B.J. De novo discovery of mutated driver pathways in cancer. Genome Res. 2012;22:375–385. doi: 10.1101/gr.120477.111.
16. Babur O., Gönen M., Aksoy B.A., Schultz N., Ciriello G., Sander C., Demir E. Systematic identification of cancer driving signaling pathways based on mutual exclusivity of genomic alterations. Genome Biol. 2015;16:1–10. doi: 10.1186/s13059-015-0612-6.
17. Hu Y., Yan C., Chen Q., Meerzaman D. gcMECM: graph clustering of mutual exclusivity of cancer mutations. BMC Bioinf. 2021;22:592. doi: 10.1186/s12859-021-04505-w.
18. Kim Y.A., Madan S., Przytycka T.M. WeSME: uncovering mutual exclusivity of cancer drivers and beyond. Bioinformatics. 2017;33:814–821. doi: 10.1093/bioinformatics/btw242.
19. Fedrizzi T., Ciani Y., Lorenzin F., Cantore T., Gasperini P., Demichelis F. Fast mutual exclusivity algorithm nominates potential synthetic lethal gene pairs through brute force matrix product computations. Comput. Struct. Biotechnol. J. 2021;19:4394–4403. doi: 10.1016/j.csbj.2021.08.001.
20. Leiserson M.D.M., Wu H.T., Vandin F., Raphael B.J. CoMEt: a statistical approach to identify combinations of mutually exclusive alterations in cancer. Genome Biol. 2015;16:160. doi: 10.1186/s13059-015-0700-7.
21. Leiserson M.D.M., Reyna M.A., Raphael B.J. A weighted exact test for mutually exclusive mutations in cancer. Bioinformatics. 2016;32:736–745. doi: 10.1093/bioinformatics/btw462.
22. Szczurek E., Beerenwinkel N. Modeling mutual exclusivity of cancer mutations. PLoS Comput. Biol. 2014;10 doi: 10.1371/journal.pcbi.1003503.
23. Hua X., Hyland P.L., Huang J., Song L., Zhu B., Caporaso N.E., Landi M.T., Chatterjee N., Shi J. MEGSA: A powerful and flexible framework for analyzing mutual exclusivity of tumor mutations. Am. J. Hum. Genet. 2016;98:442–455. doi: 10.1016/j.ajhg.2015.12.021.
24. Canisius S., Martens J.W.M., Wessels L.F.A. A novel independence test for somatic alterations in cancer shows that biology drives mutual exclusivity but chance explains most co-occurrence. Genome Biol. 2016;17:261–317. doi: 10.1186/s13059-016-1114-x.
25. Constantinescu S., Szczurek E., Mohammadi P., Rahnenführer J., Beerenwinkel N. TiMEx: a waiting time model for mutually exclusive cancer alterations. Bioinformatics. 2016;32:968–975. doi: 10.1093/bioinformatics/btv400.
26. Liu S., Liu J., Xie Y., Zhai T., Hinderer E.W., Stromberg A.J., Vanderford N.L., Kolesar J.M., Moseley H.N.B., Chen L., et al. MEScan: a powerful statistical framework for genome-scale mutual exclusivity analysis of cancer mutations. Bioinformatics. 2021;37:1189–1197. doi: 10.1093/bioinformatics/btaa957.
27. Sanchez-Vega F., Mina M., Armenia J., Chatila W.K., Luna A., La K.C., Dimitriadoy S., Liu D.L., Kantheti H.S., Saghafinia S., et al. Oncogenic signaling pathways in the cancer genome atlas. Cell. 2018;173:321–337.e10. doi: 10.1016/j.cell.2018.03.035.
28. Dempster A.P., Laird N.M., Rubin D.B. Maximum likelihood from incomplete data via the EM algorithm. J. Roy. Stat. Soc. B. 1977;39:1–22.
29. Paijens S.T., Vledder A., de Bruyn M., Nijman H.W. Tumor-infiltrating lymphocytes in the immunotherapy era. Cell. Mol. Immunol. 2021;18:842–859. doi: 10.1038/s41423-020-00565-9.
30. Liu D.C., Nocedal J. On the Limited Memory Bfgs Method for Large-Scale Optimization. Math. Program. 1989;45:503–528.
31. Orlow I., Sadeghi K.D., Edmiston S.N., Kenney J.M., Lezcano C., Wilmott J.S., Cust A.E., Scolyer R.A., Mann G.J., Lee T.K., et al. InterMEL: An international biorepository and clinical database to uncover predictors of survival in early-stage melanoma. PLoS One. 2023;18 doi: 10.1371/journal.pone.0269324.
32. Luo L., Shen R., Arora A., Orlow I., Busam K.J., Lezcano C., Lee T.K., Hernando E., Gorlov I., Amos C., et al. Landscape of mutations in early stage primary cutaneous melanoma: An InterMEL study. Pigment Cell Melanoma Res. 2022;35:605–612. doi: 10.1111/pcmr.13058.
33. Cheng D.T., Mitchell T.N., Zehir A., Shah R.H., Benayed R., Syed A., Chandramohan R., Liu Z.Y., Won H.H., Scott S.N., et al. Memorial Sloan Kettering-Integrated Mutation Profiling of Actionable Cancer Targets (MSK-IMPACT): A Hybridization Capture-Based Next-Generation Sequencing Clinical Assay for Solid Tumor Molecular Oncology. J. Mol. Diagn. 2015;17:251–264. doi: 10.1016/j.jmoldx.2014.12.006.
34.Zehir A., Benayed R., Shah R.H., Syed A., Middha S., Kim H.R., Srinivasan P., Gao J., Chakravarty D., Devlin S.M., et al. Mutational landscape of metastatic cancer revealed from prospective clinical sequencing of 10,000 patients. Nat. Med. 2017;23:703–713. doi: 10.1038/nm.4333. [DOI] [PMC free article] [PubMed] [Google Scholar]
35.Kostrzewa C.E., Luo L., Arora A., Seshan V.E., Ernstoff M.S., Edmiston S.N., Conway K., Gorlov I., Busam K., Orlow I., et al. Pathway Alterations in Stage II/III Primary Melanoma. JCO Precis. Oncol. 2023;7 doi: 10.1200/PO.22.00439. [DOI] [PMC free article] [PubMed] [Google Scholar]
36.Wan P.T.C., Garnett M.J., Roe S.M., Lee S., Niculescu-Duvaz D., Good V.M., Jones C.M., Marshall C.J., Springer C.J., Barford D., et al. Mechanism of activation of the RAF-ERK signaling pathway by oncogenic mutations of B-RAF. Cell. 2004;116:855–867. doi: 10.1016/s0092-8674(04)00215-6. [DOI] [PubMed] [Google Scholar]
37. Yaeger R., Corcoran R.B. Targeting Alterations in the RAF–MEK Pathway. Cancer Discov. 2019;9:329–341. doi: 10.1158/2159-8290.CD-18-1321.
38. Dankner M., Wang Y., Fazelzad R., Johnson B., Nebhan C.A., Dagogo-Jack I., Myall N.J., Richtig G., Bracht J.W.P., Gerlinger M., et al. Clinical Activity of Mitogen-Activated Protein Kinase-Targeted Therapies in Patients With Non-V600 BRAF-Mutant Tumors. JCO Precis. Oncol. 2022;6 doi: 10.1200/PO.22.00107.
39. Yao Z., Torres N.M., Tao A., Gao Y., Luo L., Li Q., de Stanchina E., Abdel-Wahab O., Solit D.B., Poulikakos P.I., Rosen N. BRAF Mutants Evade ERK-Dependent Feedback by Different Mechanisms that Determine Their Sensitivity to Pharmacologic Inhibition. Cancer Cell. 2015;28:370–383. doi: 10.1016/j.ccell.2015.08.001.
40. Chakravarty D., Gao J., Phillips S., Kundra R., Zhang H., Wang J., Rudolph J.E., Yaeger R., Soumerai T., Nissan M.H., et al. OncoKB: A Precision Oncology Knowledge Base. JCO Precis. Oncol. 2017;2017:1–16. doi: 10.1200/PO.17.00011.
41. Dietrich P., Kuphal S., Spruss T., Hellerbrand C., Bosserhoff A.K. Wild-type KRAS is a novel therapeutic target for melanoma contributing to primary and acquired resistance to BRAF inhibition. Oncogene. 2018;37:897–911. doi: 10.1038/onc.2017.391.
42. Berger A.H., Imielinski M., Duke F., Wala J., Kaplan N., Shi G.X., Andres D.A., Meyerson M. Oncogenic RIT1 mutations in lung adenocarcinoma. Oncogene. 2014;33:4418–4423. doi: 10.1038/onc.2013.581.
43. Karachaliou G.S., Alkallas R., Carroll S.B., Caressi C., Zakria D., Patel N.M., Trembath D.G., Ezzell J.A., Pegna G.J., Googe P.B., et al. The clinical significance of adenomatous polyposis coli (APC) and catenin Beta 1 (CTNNB1) genetic aberrations in patients with melanoma. BMC Cancer. 2022;22:38. doi: 10.1186/s12885-021-08908-z.
44. Sondka Z., Bamford S., Cole C.G., Ward S.A., Dunham I., Forbes S.A. The COSMIC Cancer Gene Census: describing genetic dysfunction across all human cancers. Nat. Rev. Cancer. 2018;18:696–705. doi: 10.1038/s41568-018-0060-1.
45. Hodis E., Triglia E.T., Kwon J.Y.H., Biancalani T., Zakka L.R., Parkar S., Hütter J.C., Buffoni L., Delorey T.M., Phillips D., et al. Stepwise-edited, human melanoma models reveal mutations' effect on tumor and microenvironment. Science. 2022;376:474. doi: 10.1126/science.abi8175.

Associated Data

Supplementary Materials

Document S1. Figures S1–S3 (mmc1.pdf, 596.1 KB)

Data S1. Tables S1–S8 (mmc2.xlsx, 49.3 KB)

Document S2. Article plus supplemental information (mmc3.pdf, 2.6 MB)
