Abstract
The high background tumor mutation burden in cutaneous melanoma limits the ability to identify significantly mutated genes (SMGs) that drive this cancer. To address this, we performed a mutation significance study of over 1,000 melanoma exomes, combined with a multi-omic analysis of 470 cases from The Cancer Genome Atlas. We discovered several SMGs with co-occurring loss-of-heterozygosity and loss-of-function mutations, including PBRM1, PLXNC1 and PRKAR1A, which encodes a protein kinase A holoenzyme subunit. Deconvolution of bulk tumor transcriptomes into cancer, immune and stromal components revealed a melanoma-intrinsic oxidative phosphorylation signature associated with protein kinase A pathway alterations. We also identified SMGs on the X chromosome, including the RNA helicase DDX3X, whose loss-of-function mutations were observed exclusively in males. Finally, we found that tumor mutation burden and immune infiltration provide complementary prognostic information for patients with melanoma. In summary, our multi-omic analysis provides insights into melanoma etiology and supports a contribution of specific mutations to the sex bias observed in this cancer.
INTRODUCTION
Cutaneous melanoma is the most aggressive form of skin cancer. It most frequently develops on non-acral, sun-exposed skin, where it is linked with DNA damage from ultraviolet radiation (UVR). It can also arise on acral skin, such as the soles of the feet, palms of the hands, and fingernail matrix, where UVR is thought to play a lesser role.1 Melanomas originating from sun-exposed skin display one of the highest tumor mutation burdens (TMB) among all malignancies.2–4 The majority of these mutations are UVR-induced C>T transitions occurring at dipyrimidines.5
An important yet poorly understood aspect of melanoma is that males have higher incidence and worse prognosis at all clinical stages.6,7 The mechanisms that mediate these differences remain unclear. Recently, differential expression of a gonosomal gene, PPP2R3B, between sexes in melanoma was proposed to explain some of these differences.8 However, the cumulative effect of X-inactivation-escaping genes on melanoma biology remains largely unknown.
Despite methodological advances in the identification of significantly mutated genes (SMGs)2,9–12, it remains difficult to determine which genes are under positive selection in melanoma. For instance, recent studies have reported a context-specific mutational signature characterized by extremely high mutation rates in ETS transcription factor binding sites.13–16 This phenomenon occurs at cytosines flanked by a specific sequence ([C]TTCCG)13, where transcription factor binding causes conformational changes increasing DNA vulnerability to UVR-induced damage14 and reducing repair efficiency.15,16 Classical trinucleotide mutation models do not account for this context-specific signature12,13, which can lead to spurious evidence of positive selection. Additionally, the large proportion of passenger mutations greatly reduces the statistical power to detect genes under positive selection.2 Previous estimates suggest ~1,000 melanoma exomes are needed to achieve the same sensitivity provided by 200 breast cancer cases.17 The largest integrative analysis of cutaneous melanoma from The Cancer Genome Atlas (TCGA) included 331 cases, identifying 13 SMGs4, and a more recent analysis of 437 cases identified 17 SMGs.12 Thus, a comprehensive catalogue of oncogenes and tumor suppressors is still lacking for cutaneous melanoma.
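The sequence context described above can be screened for with a plain string match. The following sketch is illustrative only: it assumes a plus-strand reference sequence held in memory and a 0-based position, and the function name `in_ets_hotspot` is hypothetical. It checks only the [C]TTCCG motif on either strand, not the broader ETS binding-site annotation used in this study.

```python
def in_ets_hotspot(seq: str, pos: int) -> bool:
    """Return True if seq[pos] is the cytosine of a [C]TTCCG motif on either strand.

    seq: plus-strand reference sequence; pos: 0-based position of the mutated base.
    """
    seq = seq.upper()
    # Plus strand: the mutated C is immediately followed by TTCCG.
    if seq[pos] == "C" and seq[pos + 1:pos + 6] == "TTCCG":
        return True
    # Minus strand: the reverse complement CGGAAG places the mutated C
    # (seen as G on the plus strand) directly after CGGAA.
    if seq[pos] == "G" and pos >= 5 and seq[pos - 5:pos] == "CGGAA":
        return True
    return False
```

For example, `in_ets_hotspot("AACTTCCGAA", 2)` and `in_ets_hotspot("TTCGGAAGTT", 7)` both return True, because the second sequence is the reverse complement of the first.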
Here, we performed a mutation significance analysis of cutaneous melanoma using whole exome somatic variants for 1,014 melanomas from five studies,2,4,18–20 and integrated copy number, transcriptomic, methylation and clinical data for the complete melanoma TCGA cohort of 470 cases. We controlled for background mutational processes by analyzing samples with different mutational signatures separately and limited the risk of false positives by accounting for ETS-binding sites and other confounding factors. For several identified SMGs, we observed independent evidence of positive selection, such as co-occurring mutations and loss-of-heterozygosity (LOH). The power gained by analyzing over 1,000 melanoma exomes, along with our integrative analysis, facilitated the identification of previously unrecognized SMGs in cutaneous melanoma, uncovered the importance of a male-specific tumor suppressor, DDX3X, and provided insights into the relationship of UVR, TMB, and immune infiltration with patient survival.
RESULTS
Summary of samples
We collected and uniformly annotated whole exome somatic variant calls for 1,014 melanomas (623 males, 390 females, and one unannotated) from four whole exome sequencing studies2,4,19,20 and one whole genome sequencing study18 (Supplementary Tables 1 and 2). The combined cohort comprised 219 primary, 663 metastatic, and 132 unannotated samples. The majority (n = 772) originated on non-acral skin, and the rest were acral (n = 51), mucosal (n = 14), or of unknown, uncertain, or unavailable origin (n = 177). We used a published curated annotation to define non-acral cutaneous melanomas in TCGA.21 Cases from the Hayward et al. study (n = 183) and the majority of cases from TCGA (n = 470) were naïve to systemic and radiation treatment at the time of tumor sample procurement (Supplementary Table 2). The other cohorts were not restricted to treatment-naïve samples.2,19,20 Only the TCGA cohort had matching gene expression, methylation and copy number data (Supplementary Table 3).
Identification of significantly mutated genes
We identified SMGs using OncodriveFML11 (OFML), an algorithm that detects positive selection by comparing the average impact score of the mutations in a gene with its expected distribution under the hypothesis of neutral evolution. While OFML uses a permutation approach that controls for variations of the mutation rate across the genome, it relies on a global estimate of the tri-nucleotide background mutation rates. Consequently, we stratified our cohort according to the dominant tri-nucleotide mutational signature in each sample using non-negative matrix factorization (NMF). The optimal NMF decomposition consisted of three mutational signatures (Extended Data Fig. 1a–d), which we compared to a set of 65 pan-cancer signatures from the COSMIC database (Extended Data Fig. 1f, g).22 Our first signature matched UVR-associated mutational signatures (SBS7a and 7b) that dominated the majority of non-acral cutaneous melanomas (Extended Data Fig. 1e). Our second signature was a mixture of an aging-associated signature (SBS1) and another signature of unknown etiology (SBS39), most prevalent in acral and mucosal melanomas (Extended Data Fig. 1d, e, g). Our third signature corresponded to an alkylating agent-associated mutational signature (SBS11) dominant in 13 samples, likely due to prior treatment with an alkylating agent (Extended Data Fig. 1d). We performed separate mutation significance analyses on UVR-high (>50% UVR-mutations, n = 824) and UVR-low samples (≤50% UVR-mutations, n = 177), excluding samples with a dominant alkylating signature (n = 13).
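The stratification rule above can be written as a small decision function. This is a hypothetical sketch: it assumes per-sample signature exposures (mutation counts attributed to each NMF signature) are already available, that the signatures are labeled `UVR`, `aging` and `alkylating`, and that a "dominant" signature is simply the one with the largest exposure.

```python
def stratify_sample(exposures, uvr="UVR", alkylating="alkylating", threshold=0.5):
    """Assign one sample to a mutation-significance stratum.

    exposures: dict mapping signature name -> number of mutations attributed to it.
    Samples dominated by the alkylating signature are excluded; the rest are split
    on the fraction of UVR-attributed mutations (> threshold means UVR-high).
    """
    total = sum(exposures.values())
    if total == 0:
        return "unclassified"
    # Hypothetical definition of "dominant": the largest exposure.
    dominant = max(exposures, key=exposures.get)
    if dominant == alkylating:
        return "excluded"
    uvr_fraction = exposures.get(uvr, 0) / total
    return "UVR-high" if uvr_fraction > threshold else "UVR-low"
```

A sample with exactly half of its mutations attributed to UVR falls into the UVR-low stratum, matching the ≤50% boundary in the text.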
OFML employs the CADD score23, which combines multiple annotations (e.g. conservation measures such as phyloP24 and protein-level scores such as SIFT25) into a single metric to reflect the relative functional impact of any single nucleotide change. It does not explicitly distinguish between gain-of-function (GoF) and loss-of-function (LoF) mutations. To improve our ability to detect tumor suppressor genes (TSGs), we used an additional score that considers high confidence LoF mutations (frameshifts, loss of translation start sites, premature stop codons, and splice site mutations).2
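The idea of testing for a functional impact bias can be illustrated with a simplified permutation test. The sketch below is not OncodriveFML's implementation: it assumes per-site impact scores (for example CADD-like values) and background mutation probabilities (for example derived from trinucleotide rates) are given, and it asks how often randomly drawn mutation sets reach the observed mean score. All names are hypothetical.

```python
import random


def impact_bias_pvalue(site_scores, site_weights, observed_sites, n_perm=10000, seed=0):
    """Simplified one-sided test for functional impact bias in a single gene.

    site_scores: impact score of each possible substitution in the gene.
    site_weights: relative probability of each substitution under the
        background (neutral) mutational model.
    observed_sites: indices into site_scores of the observed mutations.
    """
    rng = random.Random(seed)
    k = len(observed_sites)
    observed_mean = sum(site_scores[i] for i in observed_sites) / k
    indices = range(len(site_scores))
    hits = 0
    for _ in range(n_perm):
        # Draw k neutral mutations according to the background model.
        draw = rng.choices(indices, weights=site_weights, k=k)
        if sum(site_scores[i] for i in draw) / k >= observed_mean:
            hits += 1
    # Add-one correction keeps the empirical p-value strictly positive.
    return (hits + 1) / (n_perm + 1)
```

Weighting the draws by background probabilities is what makes the null hypothesis "neutral evolution under the sample's mutational processes", which is why stratifying samples by dominant signature matters for this kind of test.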
We identified 38 SMGs (false discovery rate (FDR) < 1%) in our combined OFML analyses (Supplementary Table 4 and Extended Data Fig. 2a–d). These included established melanoma oncogenes and tumor suppressors in pathways related to RTK-RAS-MAPK signaling (BRAF, NRAS, NF1, KIT, MAP2K1, RAC1), apoptosis and cell cycle (TP53, CDKN2A, RB1, CDK4), PI 3-kinase signaling (PTEN), immune evasion (B2M), epigenetic regulation (ARID2), and mRNA splicing (SF3B1) (Fig. 1a, b, Extended Data Fig. 2e). KIT and SF3B1 were significantly mutated only in the UVR-low analysis (Extended Data Fig. 2d). Comparing mutation frequencies across acral, mucosal, and UVR-high and -low non-acral cutaneous melanomas, we found that both genes were more frequently mutated in mucosal melanomas (~21% [3 of 14] for SF3B1 and ~14% [2 of 14] for KIT), as reported previously (Extended Data Fig. 2f).18 Although KIT mutations were more frequent in acral (~8% [4 of 51]) than in non-acral cutaneous melanomas (~4% [29 of 772]),26,27 the UVR-low subset of non-acral cutaneous melanomas had a KIT mutation frequency comparable to acral melanomas (~10% [8 of 82]; Extended Data Fig. 2g).26
Filtering potential false positives
While OFML and similar well-established algorithms2,9–11 have demonstrated their proficiency in the identification of cancer driver genes, their mutational models remain a simplification of a more complex and heterogeneous process. For instance, several ETS binding sites exhibit high neutral mutation rates in melanoma (Extended Data Fig. 3a). This can lead to recurrent mutations that do not confer a selective advantage,13–16 but still deviate from background mutational models. While these mutations are usually located near the transcription start sites of actively transcribed genes, they can overlap with the coding region of low or non-expressed isoforms and be mis-annotated as non-synonymous variants. We believe this to be the case for STK19, SLC27A5, and SUCO among our SMGs (Extended Data Fig. 3b, c). We also observed nine SMGs (PDE7B, KCNQ, RNF217, SLC27A5, IVL, DACH1, RUNX1T1, HS3ST4, and DSPP) that had extremely low mRNA abundance and/or high neutral mutation rates (Extended Data Fig. 3d,e), two well-known discriminative features of false positives.10 We omitted these genes from downstream analyses.
Significantly mutated genes
Our SMG analysis highlighted evidence of positive selection for the recently reported candidate oncogene CNOT9/RQCD128 (mRNA helicase), the candidate tumor suppressor SETD219 (histone lysine methyltransferase), and members of the SWI/SNF (BAF) complex family, ARID1A and BRD7.2,12,19 Here, we report significant enrichment of LoF mutations in an additional member of the SWI/SNF complexes, PBRM1, in ~4% of melanoma cases. Altogether, SWI/SNF complex subunits highlighted by our study (ARID2, ARID1A, ARID1B, PBRM1, and BRD7) exhibited LoF mutations in >12% of melanoma samples (Extended Data Fig. 4c). We also observed LoF mutations in a transmembrane receptor for semaphorins, PLXNC1, in ~5% of cases. Finally, the cAMP-protein kinase A (PKA) signaling pathway is known to play an important role in melanoma; however, driver somatic mutations affecting this pathway have remained elusive.29 We observed a significant enrichment of LoF mutations in PRKAR1A, a regulatory subunit of the cAMP-dependent PKA holoenzyme, in ~2% of samples. PRKAR1A loss is known to activate PKA signaling and is observed in Carney Complex, an autosomal dominant syndrome associated with the development of multiple neural-crest-derived tumors.30 Over 50% of mutations in most SMGs were likely acquired due to UVR mutagenesis (Fig. 1c). Our significance analysis missed several established melanoma-associated genes, including APC, CTNNB1, EZH2, IDH1, KRAS, HRAS and PPP6C (Fig. 1a, b), possibly because of their low mutation frequency or the limitations of OFML. We considered these genes false negatives and included them in downstream analyses. A saturation analysis suggests that additional low-frequency driver genes would be uncovered in larger cohorts (Fig. 1d).
To identify trending genes that did not meet our 1% FDR significance cut-off, we performed gene set enrichment analysis (GSEA) on 75 genes with an OFML FDR <10%. We found the expected enrichment of MAPK pathway genes (Extended Data Fig. 4a), including two recently reported RASopathy genes with tumor suppressor functions, SPRED1 and RASA2.19,31 We identified one member of the mixed-lineage leukemia (MLL) complex family, KMT2B, as significantly mutated, and an enrichment for other members, KMT2A, MEN1, and KANSL1, in our mutation analysis (Extended Data Fig. 4b). These MLL complex genes collectively exhibited LoF mutations in ~7% of samples (Extended Data Fig. 4c).
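GSEA is a rank-based method. A simpler stand-in for asking whether a pathway is over-represented among genes passing an FDR cut-off is a one-sided hypergeometric test; the sketch below implements that over-representation test, which is a different technique from the GSEA used in the study. Argument names are hypothetical.

```python
from math import comb


def overrepresentation_pvalue(n_universe, n_set, n_selected, n_overlap):
    """P(X >= n_overlap) for X ~ Hypergeometric(n_universe, n_set, n_selected).

    n_universe: genes tested; n_set: genes in the pathway;
    n_selected: genes passing the cut-off (e.g. FDR < 10%);
    n_overlap: selected genes that belong to the pathway.
    """
    upper = min(n_set, n_selected)
    total = comb(n_universe, n_selected)
    # Sum the probabilities of every overlap at least as large as observed.
    tail = sum(
        comb(n_set, i) * comb(n_universe - n_set, n_selected - i)
        for i in range(n_overlap, upper + 1)
    )
    return tail / total
```

Because the tail is accumulated as an exact integer before the single division, the result is free of cumulative rounding error even for large gene universes.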
Finally, three SMGs identified at <1% FDR were located on the X chromosome: DDX3X (a DEAD-box RNA helicase), CCNQ/FAM58A (the activating cyclin for CDK10), and ZFX (a C2H2 zinc finger transcription factor).3,4,12 Despite sex being one of the strongest independent prognostic factors in melanoma,6,7 sex differences in driver mutations have not yet been reported in melanoma.
DDX3X is a sex-specific tumor suppressor in cutaneous melanoma
Some tumor suppressors escape X chromosome inactivation (XCI), which has been proposed to explain the protective effect of the X chromosome against cancer.32 We compared TMB between sexes and observed lower values for autosomes in females relative to males (Fig. 2a).33 We observed no significant difference for the X chromosome, likely because females accumulate mutations on their additional copy (Fig. 2a). We compared the mutation frequency of the SMGs identified in our analysis and observed that autosomal SMGs were mutated more frequently in males than females, but these differences were not statistically significant when controlling for the difference in TMB between sexes (Fig. 2b and Supplementary Table 5). The three X-linked SMGs, DDX3X, CCNQ and ZFX, were also more frequently mutated in males (Fig. 2b). This was unexpected given the similar TMB observed between sexes for the X chromosome. DDX3X showed the only significant imbalance in our analyzed cohort (FDR < 1%; two-tailed Fisher’s exact test), with its LoF mutations (n = 19) found exclusively in males (Fig. 2c). This result remained significant when controlling for age, study, and TMB using a logistic regression approach (Extended Data Fig. 5a).
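A two-tailed Fisher's exact test of the kind named above can be computed directly from hypergeometric probabilities. In this sketch, the counts used in the example are taken from the text (19 DDX3X LoF mutations, all in males, among 623 males and 390 females) purely for illustration; the study's actual test set, denominators and multiple-testing correction may differ, and the function name is hypothetical.

```python
from math import comb


def fisher_two_tailed(a, b, c, d):
    """Two-tailed Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the probabilities of all tables with the same margins that are no
    more probable than the observed one.
    """
    r1, r2, c1 = a + b, c + d, a + c
    n = r1 + r2

    def prob(x):
        # Hypergeometric probability of x in the top-left cell.
        return comb(r1, x) * comb(r2, c1 - x) / comb(n, c1)

    p_obs = prob(a)
    lo, hi = max(0, c1 - r2), min(r1, c1)
    # Small relative tolerance so tables tied with the observed one are included.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)
```

With rows as sex (male, female) and columns as DDX3X status (LoF, no LoF), `fisher_two_tailed(19, 623 - 19, 0, 390)` gives a p-value below 0.001.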
LoF mutations in DDX3X were associated with a decrease in its mRNA expression (Fig. 2d). A comparison of mutant allele frequencies with tumor purity derived computationally by ABSOLUTE suggests that most DDX3X LoF mutations are homozygous and clonal (Fig. 2e), indicating that they likely occurred prior to clonal expansion. Our NMF mutational signature analysis attributed ~75% of DDX3X mutations to UVR (Fig. 1c).
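The reasoning behind comparing allele frequencies with purity can be made explicit with the standard relationship between tumor purity, copy number and expected variant allele fraction (VAF). The function below is a sketch of that formula, not ABSOLUTE's interface, and its parameter names are hypothetical. For an X-linked gene in a male, the normal copy number is 1, so a clonal mutation carried by every tumor copy is expected to have a VAF equal to the purity.

```python
def expected_vaf(purity, tumor_cn, multiplicity, normal_cn=2):
    """Expected variant allele fraction of a clonal somatic mutation.

    VAF = purity * multiplicity / (purity * tumor_cn + (1 - purity) * normal_cn)

    purity: fraction of cancer cells in the sample.
    tumor_cn: total copy number of the locus in cancer cells.
    multiplicity: number of those copies carrying the mutation.
    normal_cn: copy number in normal cells (1 for X-linked genes in males).
    """
    mutant_copies = purity * multiplicity
    total_copies = purity * tumor_cn + (1 - purity) * normal_cn
    return mutant_copies / total_copies
```

For instance, a hemizygous clonal mutation at 70% purity gives `expected_vaf(0.7, 1, 1, normal_cn=1)` of 0.7, whereas a heterozygous clonal autosomal mutation at 60% purity in a diploid region gives 0.3.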
We examined all X-linked genes for differential expression between sexes and identified 45 genes significantly upregulated in females (Fig. 2f), suggesting that they escape XCI. DDX3X expression was ~1.3-fold higher in melanomas from females (FDR < 1%). We observed biallelic expression of a common single nucleotide polymorphism (rs5963957) located in DDX3X in female samples (Fig. 2g). Furthermore, upregulated X-linked genes, and specifically DDX3X, had lower levels of promoter methylation (Fig. 2h). These results suggest that, unlike males, females are protected against complete loss of DDX3X in the event of a single mutation, a difference that could explain some of the observed sex bias in melanoma incidence and outcomes.
To gain insight into the biological consequences of DDX3X mutations in melanoma, we compared mRNA expression profiles of wild-type samples to those harboring LoF and missense DDX3X mutations in TCGA. We controlled for potential confounding factors, such as tumor purity, and confined our analysis to male samples. We identified 57 upregulated and 10 downregulated genes (FDR < 20%) (Fig. 3a), including DVL1, which exhibited 50% upregulation in mutant samples. DVL1 is a regulator of the WNT/β-catenin signaling axis, one of the best-characterized DDX3X-regulated pathways.34 Given the high genetic heterogeneity in these tumors, we sought additional evidence supporting these mutant DDX3X associated changes. We analysed public RNA-Seq data of DDX3X knockdown in three cell lines (K562, HepG2, and the melanoma cell line, HT144).35,36 We observed substantial concordance between expression differences in these lines and tumors (Extended Data Fig. 5b, c).
Because DDX3X is a DEAD-box protein family member with ATP-dependent RNA helicase activity,37 we used enhanced crosslinking and immunoprecipitation (eCLIP) data from the ENCODE project to examine whether DDX3X binding sites are enriched in differentially expressed genes.35,38 Given the strong positional enrichment of DDX3X peaks in 5’UTRs (Fig. 3b), we defined a set of DDX3X target genes, whose 5’UTRs overlap DDX3X binding sites. We compared these to a set of control genes, whose 5’UTRs overlap at least one binding site from a compendium of RNA binding proteins (RBPs), to account for potential biases associated with eCLIP experiments. In both cell lines and tumors, we observed enrichment of DDX3X targets in genes upregulated due to DDX3X knockdown or mutation compared to the control gene set (Fig. 3c, Extended Data Fig. 5d).
To identify pathways impacted by DDX3X mutations, we performed GSEA on DDX3X-associated gene expression differences in melanomas from TCGA. We identified 100 gene sets exhibiting differential regulation (FDR < 1%). Of these, 34 were concordantly differentially regulated in the HT144 melanoma line (p < 0.05) (Fig. 3d). Upregulated gene sets were related to metastatic processes, as well as RAS, PI3K, β-catenin and neuronal signaling pathways. Downregulated gene sets were involved in cell cycle processes and RNA metabolism. Altogether, this analysis suggests that DDX3X loss is associated with de-differentiation, invasiveness and reduced proliferation, consistent with a recent functional study.36
The DNA copy number landscape of cutaneous melanoma
The landmark melanoma TCGA study analyzed copy number data from 331 melanomas.4 To gain insight into additional genetic driver events targeted by copy number alterations, we obtained estimates of tumor purity, ploidy, and genome-wide copy number for the TCGA cohort using ABSOLUTE39 (Fig. 4). We confirmed that our copy number calls are positively correlated with mRNA expression of driver genes (Fig. 4b). Overall, the most frequent chromosome arm alterations included gain of 6p (40%), 7q (40%), 1q (35%), 7p (35%), and 8q (32%), and loss of 9p (63%), 10q (50%), 6q (45%), 10p (40%), 9q (38%), and 11q (32%) (Fig. 4c). None of the examined autosomal arms were completely lost (Fig. 4e, f). Recurrent focal homozygous loss was observed for a few genes, including CDKN2A (25%), PTEN (5%), LINC00290 (3%), and SPRED1 (1%) (Fig. 4d). Most LOH events in samples that had undergone genome duplication were copy-neutral (i.e. at loci with a copy number of 2) (Fig. 4e, f), supporting the notion that they occurred prior to genome duplication.39
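The copy-neutral LOH reasoning can be illustrated with allele-specific integer copy numbers (major, minor), such as those estimated by ABSOLUTE. In this hypothetical sketch, a segment that lost one allele before whole-genome duplication, (1, 0), becomes (2, 0) after duplication, which is the copy-neutral LOH state described above. Function and label names are illustrative assumptions.

```python
def classify_segment(major_cn, minor_cn):
    """Label an allele-specific copy number segment (integer copy numbers).

    LOH: the minor allele is absent. Copy-neutral LOH: the retained allele is
    present in exactly two copies (total copy number 2), as defined in the text.
    """
    total = major_cn + minor_cn
    if total == 0:
        return "homozygous deletion"
    if minor_cn == 0:
        return "copy-neutral LOH" if total == 2 else "LOH"
    return "heterozygous"
```

Applying it before and after a doubling event, `classify_segment(1, 0)` returns "LOH" and `classify_segment(2, 0)` returns "copy-neutral LOH".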
We compared the copy-number profiles of UVR-high and UVR-low non-acral cutaneous melanomas. We observed that chromosome arms 4p, 5p, 8q, 11q, and 22q were more frequently amplified in UVR-low cases (Fig. 4g), while chromosome arm 9q was more frequently deleted in UVR-high cases. Finally, a region of 15q overlapping SPRED1 and B2M was preferentially deleted in UVR-low melanomas.
We observed statistically significant co-occurrence between segmental LOH and LoF mutations in several tumor suppressors, including B2M, MEN1, CDKN2A, PTEN, TP53, APC, NF1, and RB1 (Fig. 5a, Supplementary Table 6). In addition, BRD7 (OR = 10.40, P = 2.57×10−3), PLXNC1 (OR = 7.36, P = 8.01×10−3), and PBRM1 (OR = 6.07, P = 1.89×10−2) also exhibited an association between LOH and LoF mutations. All PRKAR1A LoF mutations were concurrent with LOH (P = 2.75×10−4). Similarly, we observed significant co-occurrence between DNA copy number gain and recurrent amino acid substitutions in three activators of the MAPK signaling pathway: KIT, BRAF, and NRAS (Fig. 5b, Supplementary Table 6). Overall, the frequency of local copy loss of SMGs was positively correlated with their enrichment of LoF mutations (Fig. 5c, d). Finally, we used GISTIC40 to identify significantly recurrent copy number alterations (q-value < 0.01) (Supplementary Tables 7, 8). Three SMGs (CDK4, KIT, and BRAF), in addition to EZH2, overlapped significantly amplified regions, and four SMGs (BRD7, B2M, CDKN2A, and PTEN), in addition to SPRED1 and KMT2A, overlapped significantly deleted regions (Fig. 4e).
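When every LoF mutation coincides with LOH, as for PRKAR1A above, the plain sample odds ratio is infinite. One common workaround is the Haldane-Anscombe correction, sketched below; this is an assumption for illustration, since the study's reported odds ratios and P values may come from a different estimator (for example the conditional maximum-likelihood odds ratio reported alongside Fisher's exact test). The function name and example counts are hypothetical.

```python
def odds_ratio(a, b, c, d, correction=0.5):
    """Sample odds ratio for the 2x2 table [[a, b], [c, d]].

    Rows: LoF mutation present / absent; columns: LOH present / absent.
    If any cell is zero, `correction` is added to every cell
    (Haldane-Anscombe correction) so that the estimate stays finite.
    """
    if 0 in (a, b, c, d):
        a, b, c, d = (x + correction for x in (a, b, c, d))
    return (a * d) / (b * c)
```

For a hypothetical gene with 5 LoF mutations, all with LOH (b = 0), the corrected estimate is finite and well above 1, whereas a table without zero cells, such as `odds_ratio(10, 5, 20, 100)`, returns the uncorrected value 10.0.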
Deconvolution of melanoma intrinsic and extrinsic expression profiles
To gain insight into the relationship between the mutational landscape and transcriptome, we screened for associations between driver gene alterations and cancer-cell intrinsic mRNA signatures. Previous studies used unsupervised clustering of mRNA profiles to group melanomas based on their dominant gene expression signatures.4,41 Four major signatures have been found in cutaneous melanoma: immune, keratin, MITF-low, and MITF-high. Because some of these signatures can originate from stromal and immune cells, tumor purity can greatly impact transcriptomic grouping. Our analysis of tumor purity across TCGA samples revealed that melanomas vary widely in their stromal cell content (interquartile range of 15%−49%; Fig. 6a). Strong negative correlations were observed between tumor purity and expression for a large number of genes (Fig. 6b), implying that a substantial proportion of the variance in expression reflects variation in stromal cell content rather than differences in cancer cell gene expression.
To untangle cancer-cell-intrinsic and -extrinsic mRNA signatures, we applied NMF to gene expression data from 468 TCGA samples. In contrast to partitional clustering, NMF considers samples as a mix of k unknown signatures and proceeds to deconvolve each sample into its constitutive parts.42 An advantage of NMF is that it can be used to assign signature weights to samples when signatures are not discrete. This is highly relevant for immune related signatures, as the degree of infiltration is a continuous predictor of patient outcome (Extended Data Fig. 6a). The most stable NMF solution involved five signatures (Extended Data Fig. 6b–d), which we characterized using GSEA and the xCell tool.43–45
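The deconvolution idea can be illustrated with the classical Lee and Seung multiplicative updates for NMF under a squared-error (Frobenius) objective. The pure-Python sketch below is a minimal illustration, not the study's implementation: it omits the rank selection and stability analysis described above, uses a seeded random initialization, and its helper names are hypothetical. In practice an optimized library implementation, such as scikit-learn's `NMF`, would be used.

```python
import random


def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*B))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in A]


def transpose(A):
    return [list(r) for r in zip(*A)]


def nmf(V, k, n_iter=500, seed=0, eps=1e-12):
    """Approximate non-negative V (genes x samples) as W (genes x k) @ H (k x samples).

    Columns of W are signatures; column j of H holds sample j's signature weights.
    """
    rng = random.Random(seed)
    m, n = len(V), len(V[0])
    W = [[rng.random() + eps for _ in range(k)] for _ in range(m)]
    H = [[rng.random() + eps for _ in range(n)] for _ in range(k)]
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H)
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[h * x / (y + eps) for h, x, y in zip(hr, xr, yr)]
             for hr, xr, yr in zip(H, num, den)]
        # W <- W * (V H^T) / (W H H^T)
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[w * x / (y + eps) for w, x, y in zip(wr, xr, yr)]
             for wr, xr, yr in zip(W, num, den)]
    return W, H


def sample_proportions(H):
    """Scale each sample's weights to sum to 1 (a soft, non-discrete assignment)."""
    totals = [sum(col) for col in zip(*H)]
    return [[h / t if t else 0.0 for h, t in zip(row, totals)] for row in H]


def relative_error(V, W, H):
    """Frobenius norm of V - WH relative to the norm of V."""
    WH = matmul(W, H)
    num = sum((v - x) ** 2 for vr, xr in zip(V, WH) for v, x in zip(vr, xr))
    den = sum(v ** 2 for vr in V for v in vr)
    return (num / den) ** 0.5
```

Because each sample receives a weight for every signature rather than a single cluster label, a continuous quantity such as the degree of immune infiltration can be read off directly from the normalized columns of H.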
One signature showed a strong negative correlation with purity (Fig. 6c), consistent with a normal-cell origin. It was associated with an array of immune cell types (Fig. 6e) and predictive of patient survival (Fig. 6f).4,41 All samples exhibited some level of expression of this immune signature (Fig. 6d, Extended Data Fig. 6e). The second signature was characterized by high keratin expression and correlated with skin cells, such as keratinocytes and sebocytes (Fig. 6e, Extended Data Fig. 6f, g). This keratin signature was present almost exclusively in primary samples (Extended Data Fig. 6h) and likely explained by the presence of normal skin cells in those samples.
In contrast, the other three expression signatures had a positive correlation with tumor purity (Fig. 6c) and showed a pattern of mutual exclusivity (Fig. 6d, Extended Data Fig. 6e), suggesting they constitute well-defined cancer-cell intrinsic subgroups. This is further supported by the presence of highly concordant subgroups when performing classical clustering on purity-adjusted expression data (Extended Data Fig. 6i, j).
The first subgroup (n = 76) corresponded to the well-known melanoma mRNA subgroup characterized by low levels of the lineage-specific transcription factor MITF (MITF-low) (Extended Data Fig. 7a–c).41 The second subgroup (n = 72) exhibited higher expression of genes that regulate oxidative phosphorylation (OxPhos) (Extended Data Fig. 7d, e), the lowest expression of hypoxia-related genes, including HIF1A and VEGFA (Extended Data Fig. 7b, c), and the highest level of pigmentation (Fig. 6g). The third subgroup (Common) constituted the majority of melanoma samples (n = 291) and was characterized by higher expression of MITF, interferon signaling genes, and genes co-expressed with the SWI/SNF chromatin-remodeling subunit SMARCA2 (Extended Data Fig. 7a, d, e). Whereas tumors within the OxPhos mRNA subgroup exhibited gene expression patterns resembling differentiated melanocytes, the Common and MITF-low signatures resembled other neural crest-derived lineages, as determined using xCell (Fig. 6e).41
We examined the relationship between our mRNA signatures and other genomic features, including TMB and the UVR signature (expressed as the proportion of UVR-associated mutations), in non-acral cutaneous samples from TCGA. We observed no significant association between TMB or UVR and our intrinsic mRNA subgroups (Extended Data Fig. 7f, g). However, we observed a modest but robust correlation of our immune signature with TMB and the UVR signature (Extended Data Fig. 7h). We found 4 SMGs differentially mutated across our mRNA subgroups (FDR < 20%) (Fig. 7a, Supplementary Table 9). CDKN2A and TP53 were preferentially mutated and had lower expression in MITF-low and Common samples (Fig. 7a, b), whereas PRKAR1A was preferentially mutated and had lower expression in OxPhos samples. Finally, CTNNB1 and KIT had relatively more mutations and higher expression in OxPhos samples.
Correlates of immune infiltration and survival
We next asked whether mutations in individual SMGs were associated with our immune signature. Because heavily infiltrated tumors have a lower proportion of tumor-derived sequencing reads, which reduces the sensitivity of mutation detection, we controlled for purity and sequencing coverage using a partial correlation model. Only PRKAR1A mutations showed a significant negative correlation with the immune signature after multiple hypothesis correction (FDR < 5%; Supplementary Table 10).
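A partial correlation removes the linear effect of control variables (here, purity and coverage) before correlating two quantities. The sketch below uses the standard recursive formula for Pearson partial correlations; it is an illustration under the assumption of a Pearson-type model, not the study's exact model, and the function names are hypothetical. Applying the same functions to ranks gives a Spearman-type partial correlation.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def partial_corr(x, y, controls):
    """Correlation of x and y after removing linear effects of each control.

    Recursive formula, with z the last control and Z' the remaining ones:
    r_xy.Z = (r_xy.Z' - r_xz.Z' * r_yz.Z') / sqrt((1 - r_xz.Z'^2) * (1 - r_yz.Z'^2))
    """
    if not controls:
        return pearson(x, y)
    *rest, z = controls
    r_xy = partial_corr(x, y, rest)
    r_xz = partial_corr(x, z, rest)
    r_yz = partial_corr(y, z, rest)
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
```

Two variables that both track a shared confounder can show a strong raw correlation yet a partial correlation near zero once that confounder is controlled for, which is the artifact this model is meant to guard against.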
Previous studies observed that high TMB is associated with improved response to immune checkpoint inhibitors (ICIs)20,46 and longer survival in the cutaneous melanoma TCGA cohort.47 High TMB is thought to increase the likelihood that a tumor will express non-self antigens recognized by the immune system. More recently, UVR-induced DNA damage has been linked to improved survival21 and reported as a potential determinant of response to ICI.48,49 Here, we investigated the relationship of TMB, the UVR signature, and other clinical variables with melanoma post-accession survival (i.e. survival relative to time of tumor sample procurement) in patients with non-acral cutaneous melanoma in TCGA.4 We first tested an initial set of clinical, pathological, and molecular features using univariate Cox proportional-hazards models and a p-value threshold of 0.05. Statistically significant predictors were the immune signature, TMB, UVR signature, age, and tumor tissue site (Extended Data Fig. 8a). We then considered multivariable Cox proportional-hazards models for all possible subsets of predictors and compared the effect of TMB and UVR-signature inclusion on their quality using the Akaike Information Criterion (AIC). The best models included the immune signature, tumor tissue site, age at sample procurement, and either the UVR signature or TMB (Extended Data Fig. 8b). We observed that the immune signature, UVR signature and TMB were also amongst the best predictors of overall survival (i.e. survival relative to time of initial diagnosis) (Extended Data Fig. 9). These results indicate that the UVR signature and TMB provide prognostic information complementary to immune infiltration (Fig. 8a, b). Including both UVR and TMB simultaneously did not significantly improve AIC or concordance index (Extended Data Fig. 8c, d), which is unsurprising given their substantial correlation (Spearman rho of 0.73) (Fig. 8d).
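The concordance index mentioned above measures how well a model's risk scores order patients by survival time. The sketch below implements Harrell's C in its simplest form; it is an illustration with a hypothetical function name, the study's exact computation is not described here, and this version ignores pairs tied in survival time.

```python
def harrell_c_index(times, events, risk):
    """Harrell's concordance index for right-censored survival data.

    A pair (i, j) is comparable when subject i had an event and times[i] < times[j].
    It is concordant when the subject who failed earlier has the higher risk;
    ties in risk count as one half.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # censored subjects cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1
                elif risk[i] == risk[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 1.0 means perfect ordering, 0.5 is what uninformative (constant) risk scores give, and 0.0 means perfectly reversed ordering.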
Notably, when restricting our analysis solely to UVR-high samples, TMB, but not the proportion of UVR mutations, provided a significant improvement to the model (Fig. 8c, Extended Data Fig. 8e, f). This suggests that TMB provides information on melanoma patient survival not included in the UVR signature.
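Whether adding one predictor (such as TMB) improves a model can be judged by the change in AIC and, for nested models, by a likelihood-ratio test. The sketch below assumes the (partial) log-likelihoods of two fitted Cox models are already available; the function names are hypothetical, and the test only handles one added parameter, for which the chi-square survival function reduces to erfc(sqrt(stat / 2)).

```python
from math import erfc, sqrt


def aic(log_likelihood, n_params):
    """Akaike Information Criterion: lower is better."""
    return 2 * n_params - 2 * log_likelihood


def compare_nested(loglik_reduced, k_reduced, loglik_full, k_full):
    """Compare a model with one extra predictor against its reduced version.

    Returns (AIC improvement, likelihood-ratio-test p-value). A positive AIC
    improvement favors the fuller model. The LRT statistic
    2 * (lnL_full - lnL_reduced) is referred to a chi-square with 1 degree of
    freedom, whose survival function is erfc(sqrt(stat / 2)).
    """
    if k_full - k_reduced != 1:
        raise ValueError("this sketch only handles one added parameter")
    delta_aic = aic(loglik_reduced, k_reduced) - aic(loglik_full, k_full)
    stat = max(0.0, 2 * (loglik_full - loglik_reduced))
    return delta_aic, erfc(sqrt(stat / 2))
```

With hypothetical log-likelihoods of −100 (3 parameters) and −95 (4 parameters), the fuller model improves AIC by 8 and the likelihood-ratio test gives p ≈ 0.0016. AIC can also rank non-nested models, such as one including TMB versus one including the UVR signature, where the likelihood-ratio test does not apply.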
We next explored whether tumor neoantigen load is more informative than TMB regarding patient survival. The recent TCGA Pan-Cancer Atlas neoantigen study limited its melanoma analysis to ~100 primary tumors.50 We implemented a pipeline to predict neo-peptide binding to MHC class I for the complete TCGA cohort (n = 457) (Fig. 8e). To maximize the sensitivity of our analysis, we considered different levels of stringency by grouping antigenic mutations into four tiers, based on the predicted binding affinities of the mutated and wild-type peptides. As expected, we observed an extremely high correlation (Pearson > 0.99) between TMB and neoantigen load (Fig. 8f).48 Substituting neoantigen load for TMB did not improve our survival models (Extended Data Fig. 10).
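The tiering idea can be sketched as a function of the predicted binding affinities (IC50, in nM) of the mutant and wild-type peptides. The tier definitions below are hypothetical: the 50 nM and 500 nM cut-offs are common conventions for strong and weaker binders, not the thresholds used in this study, and the function names are illustrative.

```python
STRONG_NM = 50.0   # hypothetical strong-binder IC50 cut-off (nM)
BINDER_NM = 500.0  # hypothetical binder IC50 cut-off (nM)


def antigen_tier(mut_ic50_nm, wt_ic50_nm):
    """Assign an illustrative stringency tier (1 = most stringent) to a predicted
    mutant peptide/HLA pair, or None if the mutant peptide is not a predicted binder.
    """
    if mut_ic50_nm >= BINDER_NM:
        return None
    strong = mut_ic50_nm < STRONG_NM
    wt_nonbinder = wt_ic50_nm >= BINDER_NM
    if strong and wt_nonbinder:
        return 1  # strong mutant binder, wild-type peptide does not bind
    if wt_nonbinder:
        return 2  # weaker mutant binder, wild-type peptide does not bind
    if strong:
        return 3  # strong mutant binder, wild-type peptide also binds
    return 4      # weaker mutant binder, wild-type peptide also binds


def neoantigen_load(pairs, max_tier):
    """Count (mutant IC50, wild-type IC50) pairs whose tier is at most max_tier."""
    count = 0
    for mut_ic50, wt_ic50 in pairs:
        tier = antigen_tier(mut_ic50, wt_ic50)
        if tier is not None and tier <= max_tier:
            count += 1
    return count
```

Computing the load at each `max_tier` yields nested neoantigen counts of increasing sensitivity, which is one way to trade stringency against sensitivity when testing associations with survival.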
We next sought evidence of negative selection against antigenic mutations by comparing the observed number of predicted HLA-mutation pairs with the distribution obtained from 1,000 random permutations of the HLA alleles across patients. We did not observe significant depletion for any tier. These results are consistent with a prior analysis that did not detect evidence of negative selection in 99 melanoma samples, and with a recent study that estimated that ~99% of missense mutations are tolerated and escape negative selection.12
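The permutation test described above can be sketched as follows: keep each patient's mutations fixed, shuffle HLA genotypes across patients, and ask how often the shuffled data contain as few binding pairs as observed. The set of predicted binding pairs is assumed to be precomputed (for example by an MHC class I binding predictor), and all names are hypothetical.

```python
import random


def count_pairs(patient_hlas, patient_muts, binds):
    """Number of (HLA allele, mutation) pairs predicted to bind within each patient."""
    return sum(
        1
        for hlas, muts in zip(patient_hlas, patient_muts)
        for h in hlas
        for m in muts
        if (h, m) in binds
    )


def depletion_pvalue(patient_hlas, patient_muts, binds, n_perm=1000, seed=0):
    """Empirical one-sided p-value for depletion of observed HLA-mutation pairs.

    The null distribution shuffles HLA genotypes across patients while keeping
    each patient's mutations fixed.
    """
    rng = random.Random(seed)
    observed = count_pairs(patient_hlas, patient_muts, binds)
    shuffled = list(patient_hlas)
    at_most = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if count_pairs(shuffled, patient_muts, binds) <= observed:
            at_most += 1
    # Add-one correction keeps the p-value strictly positive.
    return (at_most + 1) / (n_perm + 1)
```

Under immunoediting, tumors would carry fewer mutations presentable by their own HLA alleles than by other patients' alleles, giving a small p-value; a p-value that is not small, as reported above for every tier, is consistent with the absence of detectable negative selection.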
Despite the absence of a strong immunoediting signal in the melanoma TCGA cohort, studies have shown that specific neoantigens can be exploited therapeutically.51 We looked for recurrent antigenic peptides and their associated mutations in our extended cohort. In addition to known recurrent neoantigens in BRAF and H/K/NRAS, we highlight here less appreciated recurrent neoantigens predicted for RAC1 and CDKN2A (Fig. 8g). Whether these neoantigens are therapeutically relevant for the development of personalized tumor vaccines requires further investigation.
DISCUSSION
Male specific DDX3X loss-of-function mutations
Women have lower melanoma incidence and better prognosis than men. Epidemiological studies estimate that, for a 20-year-old individual, the average risk of any given mole transforming into melanoma by the age of 80 is 3 times higher in males than in females.52 This has been attributed to behavioral factors; however, sex has been shown to be an independent prognostic factor in cutaneous melanoma, and evidence clearly points to either tumor-intrinsic or host-related biological sex differences.6,7 Here, we provided evidence that DDX3X escapes XCI and is preferentially mutated in male melanoma patients, potentially explaining some of the sex differences observed in this malignancy. We also performed an integrative analysis of multiple datasets, which supports dysregulation of the RAS, PI3K, and β-catenin pathways upon DDX3X loss.
Our findings raise many questions. First, it is unclear what role DDX3Y, the Y-linked paralog of DDX3X, plays in melanoma. We observed that males carrying DDX3X mutations had concurrent mRNA expression of DDX3Y, and we did not observe significant co-occurrence between DDX3X and DDX3Y mutations (Extended Data Fig. 5e, f). Although these paralogs share 92% amino acid identity, genetic studies have shown that DDX3Y does not compensate for loss of DDX3X.53 Specifically, germline mutations in DDX3X have been associated with intellectual disability (ID), and pedigree analyses of ID-affected families have reported cases of DDX3X mutations causing ID in males, but not in carrier females within the same family.53 This is consistent with reports indicating that although DDX3Y mRNA is found in many human tissues, DDX3Y protein is observed only in spermatocytes.54 In contrast, a CRISPR-Cas9 screening study observed that DDX3Y was essential in a DDX3X-mutant cancer cell line of male origin.55 Future studies characterizing DDX3X and DDX3Y expression and function in melanoma are required. Furthermore, meta-analyses of ICI trials are revealing emerging trends toward sex differences in overall survival.56 Whether DDX3X plays a role in modulating response to ICI requires further examination.
The cAMP-PKA signaling pathway
Recently, LoF mutations in PRKAR1A were reported in 2 of 27 spitzoid melanomas analyzed by whole-exome sequencing; however, none were reported in conventional non-acral cutaneous melanoma.57 Spitzoid melanoma is an uncommon melanocytic neoplasm composed of large atypical epithelioid or spindled cells that most frequently presents in childhood or adolescence as an unpigmented nodule.1 Here, we identified PRKAR1A as an SMG mutated in ~2% of cases. To determine whether these mutations occurred solely in spitzoid melanomas, two dermatopathologists examined the digitized tumor slides, pathology reports and clinical data for the 4 primary and 3 metastatic cases harbouring a PRKAR1A LoF mutation in the TCGA dataset. Both dermatopathologists indicated that none of these melanomas displayed spitzoid morphology or had clinical features associated with spitzoid melanoma. These results indicate that PRKAR1A loss is an infrequent but significant genetic event in conventional non-acral cutaneous melanoma.
PRKAR1A encodes the regulatory subunit type I-alpha of the cAMP-dependent PKA holoenzyme.29 The holoenzyme exists as an inactive tetramer consisting of two regulatory and two catalytic subunits (Fig. 7c). Loss of PRKAR1A function is known to activate PKA signalling, and germline LoF variants in PRKAR1A have been linked to Carney complex.30 By performing cross-platform integrative analysis, we observed that PRKAR1A LoF mutations are enriched in melanomas belonging to the OxPhos mRNA subgroup, which exhibits high expression of the PRKACA catalytic subunit (Fig. 7b). A similar OxPhos expression signature has been linked to BRAF inhibitor resistance.58 A genome-wide open-reading-frame screen identified PRKACA as the highest-scoring serine/threonine kinase promoting BRAF inhibitor resistance.59 In published sequencing studies of melanoma samples collected before and after the development of BRAF inhibitor resistance, PRKAR1A mutations were found in 2 of 45 (4.4%) post-treatment resistant cases.60 Whether PRKAR1A loss is associated with BRAF inhibitor resistance requires further investigation.
UVR and TMB in melanoma patient survival
Studies have linked high TMB with improved ICI response and survival in patients with melanoma.20,46 However, two recent reports have suggested that these results are confounded by melanoma subtype: acral, mucosal and uveal melanomas generally respond less well to ICI, but also lack a UVR mutation signature and have a lower TMB.48,49 Here, we examined the relationship of UVR, TMB, the immune signature and other clinical variables with patient survival in non-acral cutaneous melanomas from TCGA, which were predominantly procured before the widespread clinical implementation of ICI therapies. We observed that TMB provides complementary information to immune infiltration on patient survival, even when restricting our analysis to non-acral cutaneous melanomas with a high UVR signature, although this effect was weaker in the latter case. These results support the notion that TMB does not simply distinguish melanoma subtypes (non-acral versus acral, mucosal and uveal), but has an impact on patient survival. It will be interesting to see whether the association between TMB and ICI response in melanoma re-emerges in larger cohorts of patients with a more comprehensive characterization of immune infiltration.
METHODS
Variant processing
Aggregated somatic mutation files for 470 TCGA-SKCM samples were downloaded from the GDC portal.61 To mitigate sequencing errors and alignment artefacts, we only considered TCGA variants that were reported by at least three callers in at least one sample. Variants from the 183 MELA-AU whole genomes18 were downloaded from the ICGC data portal.62 Variants from the Hodis,2 Krauthammer19 and Van Allen20 cohorts were retrieved from the associated publications. Variants from hg19-based datasets were mapped to the hg38 reference using the rtracklayer R package. We discarded any variants with ambiguous coordinates (non-bijective mapping between hg19 and hg38) or a discordant reference allele. The hg19-based coordinates of TCGA variants were determined in the same way. Adjacent SNVs within each sample were identified using the GenomicRanges R package63 and merged back into MNVs. The combined set of variants from all five studies was re-annotated with snpEff v.4.3s (2017-10-25)64 using Ensembl GRCh38.86 gene models and dbSNP build 150. Common germline variants were excluded from downstream analysis.
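The merging of adjacent SNVs can be illustrated with a minimal Python sketch. It assumes variants are tuples of (sample, chromosome, position, ref, alt) and that any SNVs at consecutive positions in the same sample belong to one MNV; it does not check read-level phasing. The tuple layout and the function names are hypothetical, and the study itself used GenomicRanges in R.

```python
from itertools import groupby


def _collapse(run):
    """Join a run of adjacent SNVs into a single MNV record."""
    sample, chrom, start = run[0][0], run[0][1], run[0][2]
    ref = "".join(v[3] for v in run)
    alt = "".join(v[4] for v in run)
    return (sample, chrom, start, ref, alt)


def merge_adjacent_snvs(snvs):
    """Collapse SNVs at consecutive positions (same sample and chromosome)
    into MNVs. Each variant is (sample, chrom, pos, ref, alt)."""
    ordered = sorted(snvs, key=lambda v: (v[0], v[1], v[2]))
    merged = []
    for _, group in groupby(ordered, key=lambda v: (v[0], v[1])):
        run = []
        for v in group:
            if run and v[2] == run[-1][2] + 1:
                run.append(v)  # extends the current block of adjacent SNVs
            else:
                if run:
                    merged.append(_collapse(run))
                run = [v]
        if run:
            merged.append(_collapse(run))
    return merged
```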
Mutational signatures analysis
Mutational signatures were identified using non-negative matrix factorization (NMF) from the NNLM R package (version 0.4.2), considering a trinucleotide context model without strand specificity (96 mutation types). Thus, the mutation counts for the 1,014 melanoma samples were arranged in a 96-by-1014 matrix V, and NMF was applied to obtain a decomposition V ≈ WH, where W is a 96-by-k matrix containing k mutational signatures, and H is a k-by-1014 matrix representing the signatures’ absolute contribution to each sample. NMF was run with the Kullback-Leibler divergence loss function and a maximum of 50,000 iterations. The optimal decomposition rank k (i.e. number of mutational signatures) was determined using three repetitions of five-fold cross-validation. For each fold, one-fifth of the input matrix V was randomly masked, and the mean squared error (MSE) between the predicted and original values of the masked entries was computed. The rank with the smallest mean MSE was selected. The final NMF decomposition is provided in Supplementary Tables 12 and 13 for matrices W and H, respectively.
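The rank-selection scheme can be sketched in Python with NumPy as follows. This is an illustration under stated assumptions, not the NNLM implementation: it uses the classic Lee-Seung multiplicative updates for the Kullback-Leibler loss, with a binary mask so that held-out entries do not influence the fit, and assigns matrix entries to five random folds. The function names nmf_kl_masked and cv_select_rank are hypothetical.

```python
import numpy as np


def nmf_kl_masked(V, M, k, n_iter=500, eps=1e-10, seed=0):
    """Fit V ~ W @ H under the generalized Kullback-Leibler loss using only
    entries where the 0/1 mask M is 1 (weighted multiplicative updates)."""
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], k)) + eps
    H = rng.random((k, V.shape[1])) + eps
    for _ in range(n_iter):
        R = M * V / (W @ H + eps)  # masked ratio V / (WH)
        H *= (W.T @ R) / (W.T @ M + eps)
        R = M * V / (W @ H + eps)
        W *= (R @ H.T) / (M @ H.T + eps)
    return W, H


def cv_select_rank(V, ranks, n_folds=5, n_reps=3, n_iter=500, seed=0):
    """Hold out one fold of matrix entries at a time, fit on the rest and
    score the held-out entries by MSE; return the best rank and mean MSEs."""
    rng = np.random.default_rng(seed)
    mean_mse = {}
    for k in ranks:
        errors = []
        for rep in range(n_reps):
            # assign every entry of V to one of n_folds random folds
            folds = (rng.permutation(V.size) % n_folds).reshape(V.shape)
            for f in range(n_folds):
                train = (folds != f).astype(float)
                W, H = nmf_kl_masked(V, train, k, n_iter=n_iter, seed=rep)
                held_out = folds == f
                errors.append(np.mean((V[held_out] - (W @ H)[held_out]) ** 2))
        mean_mse[k] = float(np.mean(errors))
    best = min(mean_mse, key=mean_mse.get)
    return best, mean_mse
```

Masking inside the update rules, rather than deleting rows or columns, keeps every sample and mutation type in the model while still scoring the fit only on entries it never saw.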
To estimate the proportion of mutations attributed to a mutational signature k in each sample, we first multiplied the signature’s corresponding column of W, W_{*,k}, by its row of H, H_{k,*}, to produce a matrix, W_{*,k}H_{k,*}, that contains the estimated sample-wise trinucleotide mutation counts for the signature. We then divided the column sums of W_{*,k}H_{k,*} by the column sums of WH. A similar procedure was used to estimate gene-wise signature contributions.
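The attribution step reduces to a few array operations. The sketch below assumes W (96 × k) and H (k × samples) are NumPy arrays from a fitted decomposition; signature_proportions is a hypothetical helper name.

```python
import numpy as np


def signature_proportions(W, H):
    """Fraction of each sample's modeled mutations attributed to each
    signature: column sums of W[:, k] H[k, :] divided by column sums of WH."""
    total = (W @ H).sum(axis=0)  # modeled mutations per sample
    props = np.empty_like(H, dtype=float)
    for k in range(W.shape[1]):
        contrib = np.outer(W[:, k], H[k, :])  # mutation types x samples, signature k only
        props[k] = contrib.sum(axis=0) / total
    return props
```

Because the column sums of W[:, k] H[k, :] equal the sum of W[:, k] multiplied by H[k, :], the explicit outer product is not strictly needed; it is kept here to mirror the description.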
Significantly mutated genes
We used OncodriveFML (v2.0.3)11 to identify genes under positive selection. Analyses were done separately for UVR-high (n = 824) and UVR-low (n = 177) samples, defined as having >50% and ≤50%, respectively, of their mutations attributed to the UV signature. Samples with >50% of their mutations attributed to the alkylating signature (n = 13) were omitted from these analyses.
We ran OncodriveFML twice for each UVR group, using default CADD scores23 and custom LoF scores devised for the identification of tumor suppressor genes. LoF scores were obtained by generating all possible single-nucleotide variants across the coding genome, followed by snpEff annotation (v.4.3s, Ensembl GRCh37.75 gene models). Variants with a loss-of-function consequence on any protein-coding transcript were given a score of 1, and all other variants were given a score of 0. The following consequences were considered loss-of-function: stop_gained, start_lost, splice_acceptor and splice_donor. Since frameshift variants are treated separately by OncodriveFML, they were not explicitly included in the LoF scores.
For each OncodriveFML run, genes with fewer than 10 mutations were discarded, and p-values were adjusted for multiple hypothesis testing using the Benjamini-Hochberg procedure to control the false discovery rate (FDR). Genes that passed an FDR cut-off of <1% were labelled “significantly mutated”. Results of all OncodriveFML runs are provided in Supplementary Table 4.
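The filtering and FDR step can be sketched in plain Python. The benjamini_hochberg function reproduces the standard step-up adjustment (the same values as R's p.adjust with method "BH"); the input layout, a dict mapping each gene to its mutation count and OncodriveFML p-value, is an assumption for illustration.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def significant_genes(results, min_mutations=10, fdr=0.01):
    """results: {gene: (n_mutations, p_value)}. Genes with too few mutations
    are dropped before adjustment; returns {gene: adjusted p} below fdr."""
    kept = {g: p for g, (n, p) in results.items() if n >= min_mutations}
    genes = list(kept)
    qvalues = benjamini_hochberg([kept[g] for g in genes])
    return {g: q for g, q in zip(genes, qvalues) if q < fdr}
```

Dropping sparsely mutated genes before the adjustment reduces the number of tests, so the FDR correction is less severe for the genes that remain.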
To compute an LoF enrichment score (Fig. 5d), we estimated the expected (neutral) proportions of LoF and synonymous variants in each gene according to a pentanucleotide context model,12 and used the following formula:
The following variants were considered as LoF: stop_gained, start_lost, splice_acceptor and splice_donor. To prevent extreme values for genes with few mutations, we added three pseudo-counts to both the numerator and denominator of the plotted estimates.
Saturation analysis
To measure the influence of sample size on the number of identified SMGs, we ran OncodriveFML ten times for each sample size n = 100, 150, …, 800, using tumors randomly chosen from the UVR-high datasets. The number of genes that passed an FDR cut-off of <1% in each run was then plotted against the number of samples considered.
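The subsampling design can be sketched as below, assuming ten independent random draws for each sample size from 100 to 800 in steps of 50. The helper name and seed handling are hypothetical; each subset would then be passed to a separate OncodriveFML run, which is not shown.

```python
import random


def saturation_design(sample_ids, sizes=range(100, 801, 50), n_runs=10, seed=1):
    """Return (size, run, subset) triples with n_runs random subsets per size,
    each drawn without replacement from sample_ids."""
    rng = random.Random(seed)
    design = []
    for n in sizes:
        for run in range(n_runs):
            design.append((n, run, rng.sample(sample_ids, n)))
    return design
```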
Identification of potential false positives
We considered three criteria to identify potential false-positive SMGs: (1) a high proportion of mutations in ETS transcription factor binding sites (>10% of all mutations in the gene), (2) a high neutral mutation rate (>3 × 10⁻⁵ mutations per nucleotide per sample) and (3) absent or low gene expression in melanoma cell lines (90th percentile of RPKM < 1).
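The three criteria can be expressed as a small Python predicate. The field names are hypothetical, and the 90th percentile is computed with statistics.quantiles using the "inclusive" method, which interpolates like R's default quantile type 7; whether the study used that quantile definition is an assumption.

```python
from dataclasses import dataclass
from statistics import quantiles
from typing import List


@dataclass
class GeneStats:
    n_mutations: int             # coding mutations in the gene
    n_in_ets_sites: int          # of which overlap an ETS binding peak
    neutral_rate: float          # mutations per nucleotide per sample
    cell_line_rpkm: List[float]  # RPKM across melanoma cell lines


def false_positive_flags(s: GeneStats):
    """Return the names of the false-positive criteria that a gene meets."""
    flags = []
    if s.n_mutations and s.n_in_ets_sites / s.n_mutations > 0.10:
        flags.append("ETS binding sites")
    if s.neutral_rate > 3e-5:
        flags.append("high neutral mutation rate")
    # 90th percentile of cell-line expression (linear interpolation)
    p90 = quantiles(s.cell_line_rpkm, n=10, method="inclusive")[-1]
    if p90 < 1:
        flags.append("low expression")
    return flags
```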
ENCODE’s clustered ChIP-seq data for 161 transcription factors65 were downloaded from the UCSC Genome Browser.66 We selected peaks with an ENCODE normalized score ≥500 for the following ETS factors: ETS1, GABPA, ELF1, ELK1 and ELK4. Overlapping variants were identified using the GenomicRanges package.
To estimate neutral mutation rates, we used the mutation data from the 183 melanoma whole genomes (MELA-AU). For each gene, we considered a centered window of at least 100 kb spanning its complete set of transcripts. We then excluded any coding, evolutionarily conserved or low-mappability regions (Supplementary Table 19; neutral mutation rate estimation). The gene-level mutation rate was computed as the number of variants falling within the non-excluded regions, divided by their total size. This method was implemented in R with the rtracklayer and GenomicRanges packages.
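A minimal sketch of the window construction and rate calculation follows. It assumes half-open [start, end) coordinates, a window padded symmetrically around the gene when its transcripts span less than 100 kb, and division by the number of genomes so that the rate is expressed per sample (the unit used in the false-positive filter). All function names are hypothetical; the study used rtracklayer and GenomicRanges in R.

```python
def neutral_window(gene_start, gene_end, min_size=100_000):
    """Window covering all transcripts [gene_start, gene_end), padded
    symmetrically to at least min_size bp."""
    span = gene_end - gene_start
    if span >= min_size:
        return gene_start, gene_end
    extra = min_size - span
    left = extra // 2
    return max(0, gene_start - left), gene_end + (extra - left)


def subtract_intervals(window, excluded):
    """Parts of the window not covered by any excluded [start, end) interval."""
    start, end = window
    pieces, cursor = [], start
    for ex_start, ex_end in sorted(excluded):
        if ex_end <= cursor or ex_start >= end:
            continue  # no overlap with the remaining part of the window
        if ex_start > cursor:
            pieces.append((cursor, ex_start))
        cursor = max(cursor, ex_end)
        if cursor >= end:
            break
    if cursor < end:
        pieces.append((cursor, end))
    return pieces


def neutral_rate(window, excluded, variant_positions, n_samples):
    """Variants in retained regions / (retained bp x number of samples)."""
    pieces = subtract_intervals(window, excluded)
    size = sum(e - s for s, e in pieces)
    hits = sum(any(s <= p < e for s, e in pieces) for p in variant_positions)
    return hits / (size * n_samples) if size else float("nan")
```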
Mutated genes pathway enrichment analysis
We tested whether genes with an OncodriveFML (OFML) FDR < 10% were enriched for biological pathways or complexes from the Reactome67 “ENSEMBL to pathways” database and the EpiFactors database.68 The enrichment of each pathway or complex was tested using a one-tailed Fisher’s exact test. For each test, the “gene universe” was defined as the set of genes tested for mutational significance in any of the four OFML runs (CADD-UVR-high, CADD-UVR-low, LoF-UVR-high and LoF-UVR-low). P-values were adjusted for multiple testing using the Benjamini-Hochberg procedure, independently for Reactome and EpiFactors.
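The one-tailed Fisher’s exact test on a 2 × 2 gene-by-pathway table is equivalent to an upper-tail hypergeometric probability, which the sketch below computes with math.comb. The data layout (a set for the gene universe and a dict of pathway members) is an assumption for illustration, and Benjamini-Hochberg adjustment of the returned p-values is not shown.

```python
from math import comb


def fisher_one_tailed_greater(overlap, pathway_size, n_selected, universe_size):
    """P(X >= overlap) for X ~ Hypergeometric(universe_size, pathway_size,
    n_selected); identical to the one-tailed ('greater') Fisher's exact test."""
    upper = min(pathway_size, n_selected)
    tail = sum(comb(pathway_size, x) * comb(universe_size - pathway_size, n_selected - x)
               for x in range(overlap, upper + 1))
    return tail / comb(universe_size, n_selected)


def pathway_enrichment(selected, pathways, universe):
    """Restrict genes to the universe, then test each pathway for enrichment
    among the selected (significantly mutated) genes."""
    selected = set(selected) & universe
    results = {}
    for name, members in pathways.items():
        members = set(members) & universe
        overlap = len(members & selected)
        results[name] = fisher_one_tailed_greater(
            overlap, len(members), len(selected), len(universe))
    return results
```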
Copy number and purity analysis
ABSOLUTE
Haplotype phasing and copy-ratio segmentation were performed with HAPSEG (version 1.1.1)69 using Affymetrix SNP6 microarray data from 463 TCGA tumor-normal pairs acquired from the legacy GDC archive. Somatic tumor variants and HAPSEG segmentations were processed with ABSOLUTE39 (version 1.0.6) to obtain purity, ploidy and genome-wide allelic copy numbers (solutions were obtained for 449 samples). ABSOLUTE segments were intersected with gene coordinates (Ensembl GRCh37.75) to obtain gene-level LOH status and total copy numbers in each tumor sample. Genes overlapped by multiple segments were assigned the lowest total copy number and were considered to exhibit LOH if at least one segment supported it. The local gain or loss of a gene was determined using the ratio of its absolute copy number to the median copy number of the chromosome arm on which it resides. When this ratio was greater than or equal to three, the gene was considered amplified.
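The gene-level rules can be sketched as a small function. It assumes segments are given as (start, end, total copy number, LOH flag) tuples in half-open coordinates and that the chromosome-arm median copy number has already been computed and is positive; the names are hypothetical.

```python
def gene_copy_state(gene, segments, arm_median_cn, amp_ratio=3.0):
    """gene: (start, end); segments: (start, end, total_cn, is_loh) tuples,
    all half-open. Returns None if no segment overlaps the gene."""
    hits = [s for s in segments if s[0] < gene[1] and s[1] > gene[0]]
    if not hits:
        return None
    total_cn = min(s[2] for s in hits)   # lowest copy number among overlapping segments
    loh = any(s[3] for s in hits)        # LOH if any overlapping segment supports it
    relative = total_cn / arm_median_cn  # assumes arm median > 0
    return {"total_cn": total_cn, "loh": loh,
            "relative_cn": relative, "amplified": relative >= amp_ratio}
```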
Co-occurrence between mutations and copy gain or LOH
Co-occurrence enrichment between mutations and segmental LOH or copy gain in candidate driver genes was tested using a one-tailed Fisher’s exact test. For LOH, we considered LoF mutations only. For copy gain, we considered only missense mutations at recurrently mutated amino acids (N > 1). To ensure sufficient power, only genes having mutations and segmental events in at least three samples were tested. For any given gene locus, samples with a homozygous deletion (0 copies) were excluded from the tests.
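A sketch of the co-occurrence test for one gene follows, assuming one record per sample with boolean mutation, event (LOH or copy gain) and homozygous-deletion flags. Reading the power rule as "at least three mutated samples and at least three samples with the event" is an assumption, as is the record layout.

```python
from math import comb


def cooccurrence_test(samples, min_count=3):
    """One-tailed Fisher's exact test (enrichment) of mutation and segmental
    event co-occurrence for one gene; None if too few samples to test."""
    kept = [s for s in samples if not s["homdel"]]  # drop homozygous deletions
    n = len(kept)
    n_mut = sum(s["mutated"] for s in kept)
    n_evt = sum(s["event"] for s in kept)
    both = sum(s["mutated"] and s["event"] for s in kept)
    if n_mut < min_count or n_evt < min_count:
        return None
    # upper hypergeometric tail: P(overlap >= observed overlap)
    tail = sum(comb(n_mut, x) * comb(n - n_mut, n_evt - x)
               for x in range(both, min(n_mut, n_evt) + 1))
    return tail / comb(n, n_evt)
```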
Significantly amplified or deleted regions
We used GISTIC to identify significantly amplified or deleted regions. Segmented copy ratios (germline CNV masked) for 470 TCGA tumor samples were acquired from the GDC data portal. Segments that exceeded the telomeric- or centromeric-most array probes were truncated to be within covered genomic regions. To improve sensitivity, we applied in silico admixture removal70 to samples for which ABSOLUTE ploidy and purity estimates were available, using the following formula:
Non-positive copy ratios were set to 10⁻³. Adjusted ratios were log2-transformed, centered on their mode and passed to GISTIC. In Fig. 4e, GISTIC wide-peak boundary coordinates were converted from hg38 to hg19 using liftOver so that they could be visualized alongside ABSOLUTE copy number profiles.
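The admixture formula itself is not reproduced above. As an illustration only, the sketch below uses one common purity/ploidy mixture model, in which the observed copy ratio is R = (α·q + 2(1 − α)) / (α·τ + 2(1 − α)) for purity α, tumor ploidy τ and tumor copy number q; it solves this for q and rescales by τ. This model is an assumption and may differ from the formula used in the study, but it shows why subtracting the normal-cell contribution can yield non-positive values that need flooring before the log2 transform.

```python
import math


def remove_admixture(ratio, purity, ploidy, floor=1e-3):
    """Purity/ploidy-adjusted copy ratio under the assumed mixture model
    R = (a*q + 2(1-a)) / (a*tau + 2(1-a)); returns q / tau, with
    non-positive results set to `floor`."""
    normal = 2.0 * (1.0 - purity)  # diploid normal-cell contribution
    q = (ratio * (purity * ploidy + normal) - normal) / purity
    adjusted = q / ploidy
    return adjusted if adjusted > 0 else floor


def log2_adjusted(ratio, purity, ploidy):
    """log2 of the adjusted ratio, ready for mode-centering."""
    return math.log2(remove_admixture(ratio, purity, ploidy))
```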
Transcriptome analysis
RNA-seq processing
Raw RNA-seq read counts were downloaded from the GDC portal for the TCGA-SKCM cohort, and from the CCLE data portal for the melanoma cell lines. Counts of protein-coding genes were converted to CPM after TMM normalization using the edgeR package. RPKM/FPKM values were calculated using edgeR’s rpkm function.
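For reference, CPM and RPKM follow directly from counts, effective library sizes and gene lengths. The sketch assumes the TMM normalization factor has already been computed (for example with edgeR’s calcNormFactors) and mirrors edgeR’s use of the effective library size (library size × normalization factor); it is not the edgeR code, and the function signatures are hypothetical.

```python
def cpm(counts, lib_size, norm_factor=1.0):
    """Counts per million using the effective library size."""
    effective = lib_size * norm_factor
    return [c / effective * 1e6 for c in counts]


def rpkm(counts, gene_lengths_bp, lib_size, norm_factor=1.0):
    """Reads per kilobase of gene length per million mapped reads."""
    return [value / (length / 1e3)
            for value, length in zip(cpm(counts, lib_size, norm_factor), gene_lengths_bp)]
```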
Deconvolution of transcriptomic profiles by NMF
We used NMF (as implemented in the NMF71 R package) to deconvolve cancer and stromal transcriptomic profiles in the TCGA-SKCM cohort. The output of NMF consists of 2 matrices, W and H, whose product is the approximated matrix of observed CPM values. In this context, W is a gene-by-signature matrix containing the weights of each gene’s contribution to a signature and H is a signature-by-sample matrix containing the weight of each signature’s contribution to a sample. Here, signatures can be seen as cell-type specific gene expression modules. NMF was applied to a matrix of CPM values for 5000 genes and 470 samples, with the Brunet optimization algorithm. The genes were selected to have the largest mean absolute deviation (calculated using log-transformed CPM values) amongst autosomal protein coding genes with a mean RPKM > 1. The optimal number of signatures (i.e. decomposition rank) was determined using the proportion of ambiguous clustering (PAC).72 The final NMF decomposition is provided in Supplementary Tables 14 (matrix W) and 15 (matrix H).
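The PAC criterion can be illustrated as follows, assuming a consensus matrix whose entries give the fraction of runs in which two samples share a cluster, and the commonly used ambiguity interval (0.1, 0.9]. The study’s exact interval is not stated, so these bounds and the helper names are assumptions; the rank with the lowest PAC would be preferred.

```python
def consensus_matrix(label_runs):
    """Fraction of runs in which each pair of samples shares a cluster label."""
    n = len(label_runs[0])
    counts = [[0] * n for _ in range(n)]
    for labels in label_runs:
        for i in range(n):
            for j in range(n):
                counts[i][j] += labels[i] == labels[j]
    return [[c / len(label_runs) for c in row] for row in counts]


def proportion_ambiguous_clustering(consensus, lower=0.1, upper=0.9):
    """Share of sample pairs whose consensus value lies in (lower, upper]."""
    n = len(consensus)
    values = [consensus[i][j] for i in range(n) for j in range(i + 1, n)]
    return sum(lower < v <= upper for v in values) / len(values)
```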
Confirmation of intrinsic profiles using PCA and clustering
We recovered the three intrinsic NMF signatures using a classical clustering approach on purity-adjusted gene expression. To mitigate the effect of stromal cell contamination, we restricted our analysis to genes with at least one read in all samples whose expression (log-transformed CPM) was positively correlated with tumor purity (Pearson correlation > 0.1) and not strongly positively correlated with NMF’s keratin signature (Pearson correlation < 0.2). The log-transformed CPM values of the 5166 retained genes were linearly regressed on tumor purity. Genes were ranked by decreasing variance of the residuals, and the top 1500 were used to cluster the tumor samples with the ConsensusClusterPlus73 R package, using 1000 resampling iterations of k-means clustering with k = 3. Each sample was assigned to a transcriptomic subgroup based on its membership in one of the three clusters.
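The purity adjustment and gene ranking can be sketched with an ordinary least-squares fit per gene. The input layout (a dict of log-CPM vectors) is hypothetical, the correlation filters are assumed to have been applied beforehand, and the consensus k-means step performed with ConsensusClusterPlus is not reproduced.

```python
from statistics import fmean, pvariance


def residuals_on_purity(y, purity):
    """Residuals of a simple linear regression of expression y on purity."""
    mx, my = fmean(purity), fmean(y)
    sxx = sum((x - mx) ** 2 for x in purity)
    slope = sum((x - mx) * (v - my) for x, v in zip(purity, y)) / sxx
    intercept = my - slope * mx
    return [v - (intercept + slope * x) for x, v in zip(purity, y)]


def top_genes_by_residual_variance(expr, purity, n_top):
    """expr: {gene: log-CPM values in sample order}; returns the n_top genes
    with the largest residual variance after removing the purity trend."""
    scores = {g: pvariance(residuals_on_purity(y, purity)) for g, y in expr.items()}
    return sorted(scores, key=scores.get, reverse=True)[:n_top]
```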
Transcriptomic signatures and xCell analysis
A gene-by-sample matrix of mRNA RPKM values for 468 TCGA samples was passed to xCell to obtain cell-type enrichment scores for each sample. Spearman’s correlation between cell-type scores and NMF component weights was computed and plotted in Fig. 6e.
Transcriptomic subgroup gene set enrichment analysis
Differential gene expression analysis was performed using DESeq274, comparing samples in each transcriptomic subgroup to all other samples. For each comparison, log2 fold-differences were supplied to the GSEA tool,44,45 using default parameters with the Hallmarks and Curated (C2) gene sets.
Differential gene alteration analysis across transcriptomic subgroups
Differential alteration frequencies (coding mutations, homozygous deletions and local amplifications) of candidate driver genes across transcriptomic subgroups were assessed using a two-tailed Fisher’s exact test. For each gene, the test was performed on a two-by-three contingency table of alteration counts (gain and loss) and mRNA subgroups. P-values were adjusted for multiple testing using the Benjamini-Hochberg procedure.
X-linked analysis
Sex-biased mutation frequency
A two-tailed Fisher’s exact test was used to determine whether a given SMG is differentially mutated (missense, in-frame indel or LoF) between males and females. To control for the different neutral mutation burdens observed in males and females (see Fig. 2a), separate null hypotheses (i.e. expected odds ratios, ORs) were considered for autosomal and X-linked genes. Specifically, we set the expected OR of the Fisher’s test (i.e. the “or” parameter of R’s fisher.test() function) to the median OR observed across all non-SMGs mutated in at least 10 samples (to ensure reliable estimates), considering autosomal and X-linked genes separately. P-values were adjusted for multiple testing using the Benjamini-Hochberg procedure.
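R’s fisher.test evaluates a user-supplied null odds ratio under Fisher’s noncentral hypergeometric distribution. The sketch below re-implements that two-sided calculation in plain Python for illustration; it follows our reading of R’s algorithm (including its 1 + 10^-7 relative tolerance), assumes 0 < null OR < ∞, and the function name is hypothetical. The median OR across non-SMGs would be passed in as null_or.

```python
from math import comb, exp, log


def fisher_test_null_or(table, null_or=1.0):
    """Two-sided Fisher's exact test of H0: odds ratio == null_or for a 2x2
    table [[a, b], [c, d]], using the noncentral hypergeometric distribution."""
    (a, b), (c, d) = table
    m, n, k = a + c, b + d, a + b  # column 1, column 2 and row 1 totals
    lo, hi = max(0, k - n), min(k, m)
    # log of C(m, x) C(n, k - x) null_or^x over the support of the first cell
    logw = [log(comb(m, x)) + log(comb(n, k - x)) + x * log(null_or)
            for x in range(lo, hi + 1)]
    top = max(logw)
    w = [exp(v - top) for v in logw]  # rescale to avoid overflow
    total = sum(w)
    probs = [v / total for v in w]
    p_obs = probs[a - lo]
    # sum the probabilities of tables no more likely than the observed one
    return min(1.0, sum(p for p in probs if p <= p_obs * (1 + 1e-7)))
```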
We complemented the Fisher’s test using a logistic regression approach, whereby the mutation probability of each gene is modeled as a function of sex and additional covariates specified in Extended Data Fig. 5a. We used the number of SNVs (log-transformed) on the autosomes (or X chromosome for X-linked genes) to account for differences in mutation burden across samples. We note that this approach cannot be used when the outcome variable completely separates one or more of the predictor variables, as is the case for DDX3X LoF mutations that were found exclusively in males.
Differential gene expression between sexes
Differential expression analysis of X-linked genes between males (n = 174) and females (n = 273) was performed using DESeq2.74 Specifically, gene expression was modeled as a function of sex, tumor purity and tumor tissue site (i.e. primary, regional cutaneous or subcutaneous, regional lymph node, and distant metastasis). DESeq2 was initially run on all protein-coding genes to ensure precise estimates of dispersion. Expression fold-differences between sexes and their respective P-values were then extracted for X-linked genes and adjusted for multiple testing using the Benjamini-Hochberg procedure, independently of other genes.
Promoter methylation
Promoter methylation was calculated, in each of 180 female samples, as the mean beta value of all methylation probes within 2 kb upstream of a gene’s most 5′ transcription start site.
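Defining “upstream” requires the gene’s strand. The sketch assumes half-open coordinates, takes the smallest TSS coordinate on the plus strand and the largest on the minus strand as the most 5′ TSS, and excludes the TSS base itself from the 2-kb window; these conventions and the function names are assumptions.

```python
from statistics import fmean


def promoter_window(tss_positions, strand, upstream=2000):
    """Half-open window of `upstream` bp immediately 5' of the most 5' TSS."""
    if strand == "+":
        tss = min(tss_positions)
        return tss - upstream, tss
    tss = max(tss_positions)
    return tss + 1, tss + 1 + upstream


def promoter_methylation(probe_betas, tss_positions, strand, upstream=2000):
    """Mean beta value of probes ({position: beta}) in the promoter window;
    None if no probe falls in the window."""
    start, end = promoter_window(tss_positions, strand, upstream)
    values = [b for pos, b in probe_betas.items() if start <= pos < end]
    return fmean(values) if values else None
```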
Bi-allelic expression of DDX3X
RNA-seq BAM slices of the DDX3X locus were downloaded from GDC for all TCGA-SKCM samples, and nucleotide counts were determined at each genomic position using the Rsamtools package.75 We then looked for common SNPs (average heterozygosity ≥20%, dbSNP150) located within any DDX3X exon and covered by at least 10 reads in >50% of the samples. Only one SNP fulfilled these criteria, rs5963957 (A/C, hg38:chrX:41349057; avHet = 0.43), with a median coverage of 274 reads across samples. The distribution of nucleotide counts at this position confirmed bi-allelic expression in most female samples.
DDX3X functional analysis
Differential gene expression (DGE) analysis of mutant and WT DDX3X tumors
We applied the linear modeling framework for transformed RNA-seq read counts implemented in the limma R package76 to RNA-seq data from TCGA. Starting with a gene-by-sample matrix of read counts, we retained protein-coding genes whose CPM was at least the CPM equivalent of 10 reads in the sample with the smallest library, in at least 50 samples (14,011 genes retained). Sample-wise normalization factors were then calculated using the TMM method implemented in edgeR77 and provided, along with the read counts, to limma’s voom function to estimate observation weights. Linear models were fitted to the voom-weighted observations using limma’s lmFit function, and differential expression estimates were moderated using limma’s eBayes function. P-values corresponding to the log2 fold-changes were adjusted for multiple testing using the Benjamini-Hochberg procedure.
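The expression filter can be sketched as follows, assuming library sizes are the column totals of the count matrix (edgeR’s default) and a dict mapping genes to per-sample counts; the function name is hypothetical.

```python
def filter_expressed_genes(counts, min_reads=10, min_samples=50):
    """counts: {gene: read counts in sample order}. Keep genes whose CPM is at
    least the CPM of `min_reads` reads in the smallest library, in at least
    `min_samples` samples."""
    n_samples = len(next(iter(counts.values())))
    lib_sizes = [sum(row[j] for row in counts.values()) for j in range(n_samples)]
    cpm_cutoff = min_reads / min(lib_sizes) * 1e6
    kept = []
    for gene, row in counts.items():
        n_pass = sum(c / lib * 1e6 >= cpm_cutoff for c, lib in zip(row, lib_sizes))
        if n_pass >= min_samples:
            kept.append(gene)
    return kept
```

Tying the cut-off to the smallest library means that a gene passing the filter is supported by at least 10 reads even in the shallowest sample.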
We restricted our analysis to male samples, which harbored the vast majority of DDX3X mutations. We modeled gene expression as a function of (1) the mutation status of DDX3X (LoF, missense or wild-type), (2) the intrinsic gene expression signatures from NMF, (3) the immune signature, (4) the top three principal components of the expression (log2 CPM with a prior count of 5) of genes related to “Keratinization” and “Formation of the cornified envelope” in the Reactome database (downloaded October 6, 2019), as these captured more of the variance in keratinocyte gene expression than NMF’s keratin signature, and (5) the expression level of DDX3Y (high or low, using a cut-off of log2 CPM > 5, with a prior count of 5, chosen based on the relationship between DDX3Y expression and tumor purity). We computed the fold-change in gene expression between DDX3X mutant and wild-type samples with high DDX3Y expression (22 and 167 samples, respectively), as the majority of DDX3X mutations occurred in DDX3Y-expressing tumors.
Differential gene expression analysis of DDX3X KD and control cell lines
For each cell line (melanoma HT144 cells36, hepatocellular carcinoma HepG2 cells, and chronic myelogenous leukemia K562 cells35), we quantified mRNA expression using Kallisto (default parameters and GENCODE v22 gene annotations). We used DESeq274 with default parameters to estimate differences in gene expression between DDX3X knockdown and control conditions. P-values were adjusted for multiple testing using the Benjamini-Hochberg procedure. SRA accessions for RNA-seq data are in Supplementary Tables 16 and 17.
Positional enrichment of DDX3X eCLIP peaks
Enhanced crosslinking and immunoprecipitation (eCLIP) peaks that passed an irreproducible discovery rate (IDR) cut-off of <0.01 were acquired from ENCODE for 150 RNA-binding proteins (RBPs) (103 in HepG2 and 120 in K562 cells), including DDX3X. To determine the positional enrichment of these peaks in gene bodies, we first divided each gene into 1000 tiles, with the first tile starting at the 5′ end of the gene. Then, for each tile (1 to 1000), we computed the number of genes overlapping at least one peak for the RBP of interest at that tile, as a fraction of all genes overlapping one or more peaks for that RBP at any tile. The ENCODE IDs of eCLIP data are in Supplementary Table 18.
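The tiling procedure can be sketched in plain Python, assuming half-open gene coordinates, strand-aware numbering so that tile 0 is at the 5′ end, and peaks already assigned to genes for a single RBP. The function names and data layout are hypothetical.

```python
def tile_index(pos, gene_start, gene_end, strand, n_tiles=1000):
    """0-based tile of a position within a gene [gene_start, gene_end),
    counted from the gene's 5' end."""
    length = gene_end - gene_start
    offset = pos - gene_start if strand == "+" else gene_end - 1 - pos
    return min(n_tiles - 1, offset * n_tiles // length)


def positional_profile(genes, peaks, n_tiles=1000):
    """genes: {name: (start, end, strand)}; peaks: {name: [(start, end), ...]}
    for one RBP. Returns, per tile, the fraction of peak-bearing genes with a
    peak overlapping that tile."""
    tile_sets = {}
    for name, (start, end, strand) in genes.items():
        tiles = set()
        for p_start, p_end in peaks.get(name, []):
            s, e = max(p_start, start), min(p_end, end)  # clip peak to gene body
            if s >= e:
                continue
            a = tile_index(s, start, end, strand, n_tiles)
            b = tile_index(e - 1, start, end, strand, n_tiles)
            tiles.update(range(min(a, b), max(a, b) + 1))
        if tiles:
            tile_sets[name] = tiles
    if not tile_sets:
        return [0.0] * n_tiles
    return [sum(t in ts for ts in tile_sets.values()) / len(tile_sets)
            for t in range(n_tiles)]
```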
Enrichment of DDX3X targets in differentially regulated genes
Genes were divided into two groups based on whether their 5’UTR(s) exclusively overlap at least one IDR eCLIP peak for DDX3X or for another RNA-binding protein (RBP). For each group, 2D kernel density estimates for differential gene expression in tumors (DDX3X mutant vs. WT) and cell lines (DDX3X knockdown vs. control) were estimated using the kde2d function from the MASS R package, with the bandwidth parameter set to 0.4. The difference in densities between the two groups of genes was computed and plotted in Fig. 3c and Extended Data Fig. 5d.
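MASS::kde2d uses an axis-aligned Gaussian kernel whose standard deviation is one quarter of the bandwidth h, so h = 0.4 corresponds to a kernel standard deviation of 0.1. The NumPy sketch below mimics that behavior on a shared grid so that two densities can be subtracted; the grid size and limits, the function names, and subtracting the second group from the first are assumptions.

```python
import numpy as np


def _phi(u):
    """Standard normal density."""
    return np.exp(-0.5 * u ** 2) / np.sqrt(2 * np.pi)


def kde2d(x, y, h, grid_x, grid_y):
    """Bivariate Gaussian KDE on a grid, mimicking MASS::kde2d: each axis
    uses a normal kernel with standard deviation h / 4."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    s = h / 4.0
    kx = _phi((grid_x[:, None] - x[None, :]) / s)  # grid points x observations
    ky = _phi((grid_y[:, None] - y[None, :]) / s)
    return kx @ ky.T / (len(x) * s * s)  # density at (grid_x[i], grid_y[j])


def density_difference(group_a, group_b, h=0.4, n=25, lims=None):
    """Density of group_a minus density of group_b on a shared n x n grid.
    Each group is a pair (x values, y values)."""
    if lims is None:
        xs = np.concatenate([group_a[0], group_b[0]])
        ys = np.concatenate([group_a[1], group_b[1]])
        lims = (xs.min(), xs.max(), ys.min(), ys.max())
    gx = np.linspace(lims[0], lims[1], n)
    gy = np.linspace(lims[2], lims[3], n)
    diff = kde2d(*group_a, h, gx, gy) - kde2d(*group_b, h, gx, gy)
    return gx, gy, diff
```

Evaluating both groups on one grid is what makes the subtraction meaningful; separate default grids (as kde2d would choose from each group’s own range) would not align.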
Gene set enrichment analyses for DDX3X differential expression
We tested for enrichment of gene sets in differentially expressed genes using weighted logistic regression models. For each gene set in the Reactome database (downloaded October 6, 2019), we modeled the presence or absence of each gene in the set (i.e. number of ‘successes’), as a fraction of the total number of sets to which the gene is annotated (i.e. number of ‘trials’), using differential gene expression as an explanatory variable. In this model, each observation (fraction of successes) is weighted by the number of trials. We extracted the differential gene expression coefficients and their corresponding P-values from each model for plotting and further analysis. P-values were adjusted for multiple hypothesis testing using the Benjamini-Hochberg procedure.
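The weighted binomial model can be sketched with iteratively reweighted least squares (IRLS), the fitting algorithm used by R’s glm. The sketch assumes one covariate plus an intercept, proportions y = (1 if the gene is in the set, else 0) / (number of sets annotating the gene) with prior weights equal to that number, and Wald z-tests with the dispersion fixed at 1; genes annotated to no set would need to be excluded beforehand. The function name is hypothetical.

```python
import math

import numpy as np


def binomial_glm_proportions(y, weights, x, n_iter=50, tol=1e-10):
    """Logistic regression of proportions y on covariate x with prior weights
    (number of trials), fitted by IRLS. Returns (slope, z, two-sided p)."""
    y, w, x = (np.asarray(a, float) for a in (y, weights, x))
    X = np.column_stack([np.ones_like(x), x])
    beta = np.zeros(2)
    for _ in range(n_iter):
        eta = X @ beta
        mu = 1 / (1 + np.exp(-eta))
        var = mu * (1 - mu)
        z = eta + (y - mu) / var  # working response
        XtW = X.T * (w * var)     # X^T times working weights
        new_beta = np.linalg.solve(XtW @ X, XtW @ z)
        converged = np.max(np.abs(new_beta - beta)) < tol
        beta = new_beta
        if converged:
            break
    # Wald standard error from the working weights at the final estimate
    mu = 1 / (1 + np.exp(-(X @ beta)))
    cov = np.linalg.inv((X.T * (w * mu * (1 - mu))) @ X)
    z_stat = beta[1] / math.sqrt(cov[1, 1])
    p = math.erfc(abs(z_stat) / math.sqrt(2))  # two-sided normal p-value
    return float(beta[1]), float(z_stat), p
```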
Survival analysis
The majority of TCGA specimens analyzed were from metastases (363 of 465), some procured years after the primary melanoma diagnosis. Because the biology of a metastatic melanoma and its immune cell content may differ from those of the original primary melanoma, we focused on post-accession survival times, as in the melanoma TCGA marker publication.4 Briefly, patient vital status and the number of days from primary melanoma diagnosis to death or last follow-up (overall survival) were acquired from the GDC data portal. We also obtained the number of days between primary diagnosis and sample procurement (sample procurement times) from the Broad Institute TCGA GDAC Firehose website (https://gdac.broadinstitute.org/). We subtracted these sample procurement times from the overall survival times to obtain “post-accession survival times”.
We modeled survival time using Cox proportional hazards regression in R. Kaplan-Meier estimator plots were generated using the survfit function from the survival package (version 2.43–3) in R. In all Kaplan-Meier plots, we limited our survival analysis to patients with molecularly profiled metastatic melanoma lymph node specimens only (n = 216). P-values associated with Kaplan-Meier plots are from a log-rank or Mantel-Haenszel test performed using the survdiff R function.
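For illustration, the post-accession times and a Kaplan-Meier product-limit estimate can be computed as below. This plain-Python sketch assumes times in days and event indicators of 1 (death) or 0 (censored); it is not the survival package’s survfit, and the function names are hypothetical.

```python
def post_accession_times(os_days, procurement_days):
    """Overall survival minus time from diagnosis to sample procurement."""
    return [o - p for o, p in zip(os_days, procurement_days)]


def kaplan_meier(times, events):
    """Product-limit estimate; returns (time, survival) at each event time.
    events: 1 = death, 0 = censored."""
    data = sorted(zip(times, events))
    at_risk, surv, curve, i = len(data), 1.0, [], 0
    while i < len(data):
        t = data[i][0]
        j = i
        while j < len(data) and data[j][0] == t:  # records tied at time t
            j += 1
        deaths = sum(e for _, e in data[i:j])
        if deaths:
            surv *= 1 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= j - i  # deaths and censorings both leave the risk set
        i = j
    return curve
```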
Neoantigen analysis
HLA typing of the TCGA-SKCM samples was performed with OptiType78 using the BAM files from the normal tissue samples. MHC-I binding predictions were obtained with netMHCpan479. Variants were processed as follows. We first extracted the mutated and wild-type sequences of a 17-aa window centered on each missense mutation using the Biostrings and ensembldb R packages. These sequences were then processed with netMHCpan4 to predict their MHC-I binding affinity, using a 9-aa window. We used the default percentile rank thresholds provided by netMHCpan4 to classify peptides as strong (<0.5%) or weak (<2%) binders. Predicted antigenic mutations were grouped into four tiers of decreasing specificity as follows. Tier 1 includes mutations creating at least one peptide with a strong binding prediction whose wild-type form is not predicted to be a strong binder. Tier 2 includes any mutation with a strong binding prediction, regardless of the binding predictions of the wild-type forms. Tier 3 includes mutations creating at least one peptide with a weak binding prediction whose wild-type form is not predicted to bind. Tier 4 includes any mutation with a weak binding prediction, regardless of the binding predictions of the wild-type forms. Finally, all tiers were updated to include mutations from more specific tiers (i.e. tier k includes any mutation in tier k-1). Only variants with median expression > 1 TPM (as estimated by Kallisto)80 were considered potential neoantigens.
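The peptide windows and tier rules can be sketched as follows. The sketch assumes each mutant 9-mer is compared with the wild-type 9-mer at the same offset and that percentile ranks are already available for one HLA allele; windows shortened near protein termini are not handled. Names are hypothetical and netMHCpan itself is not called.

```python
STRONG, WEAK = 0.5, 2.0  # netMHCpan percentile-rank thresholds (%)


def mutant_9mers(window17):
    """All 9-mers of a 17-aa window (mutation at index 8) that contain the mutation."""
    if len(window17) != 17:
        raise ValueError("expected a 17-aa window centered on the mutation")
    return [window17[i:i + 9] for i in range(9)]


def assign_tiers(rank_pairs):
    """rank_pairs: (mutant_rank, wildtype_rank) for each 9-mer of one mutation.
    Returns the set of tiers (1 = most specific), made cumulative."""
    tiers = set()
    if any(m < STRONG and w >= STRONG for m, w in rank_pairs):
        tiers.add(1)
    if any(m < STRONG for m, _ in rank_pairs):
        tiers.add(2)
    if any(m < WEAK and w >= WEAK for m, w in rank_pairs):
        tiers.add(3)
    if any(m < WEAK for m, _ in rank_pairs):
        tiers.add(4)
    if tiers:
        # tier k also contains every mutation from the more specific tiers
        tiers.update(range(min(tiers), 5))
    return tiers
```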
To test for evidence of negative selection, we compared the number of predicted antigenic mutations in the TCGA-SKCM cohort with the distribution obtained over 1000 random permutations of the HLA alleles across patients. Importantly, to remove bias that could result from population structure or the HLA-typing step, we considered the sum of predicted antigenic mutations over the six HLA alleles of each patient (i.e. antigenic mutations recognised by homozygous HLA loci are counted twice). We estimated the statistical power of this approach by applying the same procedure to randomized datasets in which varying proportions of antigenic mutations were specifically removed to simulate increasing levels of negative selection. For predicted MHC-I strong-binding peptides (tiers 1 and 2), power reached 80% when 7.5% of antigenic mutations were removed.
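A minimal sketch of the HLA permutation test follows. The predicate is_antigenic is hypothetical (in practice it would look up precomputed binding tiers), and summarizing the permutation distribution as a one-sided empirical p-value with a +1 correction is an assumption.

```python
import random


def antigenic_total(mutations, hla, is_antigenic):
    """Number of (mutation, allele) pairs predicted antigenic, summed over all
    patients and their six alleles (homozygous loci count twice)."""
    return sum(is_antigenic(mut, allele)
               for pid, muts in mutations.items()
               for allele in hla[pid]
               for mut in muts)


def hla_permutation_test(mutations, hla, is_antigenic, n_perm=1000, seed=0):
    """Compare the observed total with totals after shuffling whole HLA
    genotypes across patients; a small p-value indicates fewer antigenic
    mutations than expected by chance (consistent with negative selection)."""
    rng = random.Random(seed)
    observed = antigenic_total(mutations, hla, is_antigenic)
    ids = list(hla)
    genotypes = [hla[i] for i in ids]
    null = []
    for _ in range(n_perm):
        shuffled = genotypes[:]
        rng.shuffle(shuffled)  # each patient receives another patient's genotype
        null.append(antigenic_total(mutations, dict(zip(ids, shuffled)), is_antigenic))
    p = (1 + sum(v <= observed for v in null)) / (n_perm + 1)
    return observed, null, p
```

Shuffling complete six-allele genotypes, rather than individual alleles, keeps each patient’s homozygosity pattern intact within the null distribution.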
Adjusting TMB for WES coverage and purity
For each TCGA sample, we determined the proportion of the coding genome that has sufficient read coverage to provide 80% power for mutation detection using ABSOLUTE estimates. We divided the observed TMB by this value to obtain the expected TMB if coverage was sufficient for 100% of the coding genome.
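The detection model behind the 80% power threshold is not specified above. As an assumption-laden illustration only, the sketch below treats a clonal heterozygous SNV in a diploid region as having an expected allele fraction of purity / 2, calls it detectable when at least three variant reads are observed (a hypothetical threshold), and ignores sequencing error. Coverage is summarized as a histogram mapping read depth to the number of coding bases at that depth.

```python
from math import comb


def detection_power(depth, purity, min_alt_reads=3):
    """P(>= min_alt_reads variant reads) for a clonal heterozygous SNV in a
    diploid region (expected allele fraction purity / 2); no sequencing error."""
    f = purity / 2.0
    p_below = sum(comb(depth, k) * f ** k * (1 - f) ** (depth - k)
                  for k in range(min(min_alt_reads, depth + 1)))
    return 1.0 - p_below


def coverage_adjusted_tmb(observed_tmb, depth_histogram, purity,
                          power_cutoff=0.8, min_alt_reads=3):
    """depth_histogram: {read depth: number of coding bases at that depth}.
    Divides the observed TMB by the fraction of bases with adequate power."""
    total = sum(depth_histogram.values())
    powered = sum(n for d, n in depth_histogram.items()
                  if detection_power(d, purity, min_alt_reads) >= power_cutoff)
    fraction = powered / total
    return observed_tmb / fraction if fraction > 0 else float("nan")
```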
Statistics and reproducibility
In this study, we aimed to analyze the largest possible cohort of melanoma whole exomes. No statistical method was used to predetermine sample size, as this number was dictated by the availability of published datasets.
Four TCGA patients had multiple corresponding tumor samples. Prior to our analyses, we decided to exclude the following redundant samples, arbitrarily prioritizing primaries over metastases: TCGA-ER-A19T-06A, TCGA-ER-A2NF-06A, TCGA-D3-A1Q6-07A and TCGA-D3-A1QA-07A.
Statistical analyses were performed in R (v3.3.0-v3.5.3). These included one-sided and two-sided Fisher’s exact tests, two-sided Mann–Whitney U tests, one-sided Kolmogorov–Smirnov tests and generalized linear models, as indicated. P-values were adjusted for multiple testing using the Benjamini-Hochberg procedure, as indicated. A detailed list of R packages and software programs used in this study is provided in Supplementary Table 20. The experiments were not randomized. The investigators were not blinded to allocation during experiments and outcome assessment.
DATA AVAILABILITY
Previously published melanoma somatic variants that were reanalyzed in this study are available from the associated publications:
Hodis et al. 2012 (https://doi.org/10.1016/j.cell.2012.06.024, Supplementary Table S4A),
Krauthammer et al. 2015 (https://doi.org/10.1038/ng.3361, Supplementary Data 3) and
Van Allen et al. 2015 (https://doi.org/10.1126/science.aad0095, Supplementary Table S1).
The human melanoma data generated by the TCGA Research Network (http://cancergenome.nih.gov/) can be accessed from the GDC Data Portal (https://portal.gdc.cancer.gov/), after approval for dbGap Study Accession phs000178 (https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs000178.v10.p8), due to the presence of personally identifiable information, such as a patient’s germline DNA variants.
The following MAFs were used:
TCGA.SKCM.muse.4cd49f89-d7e2-4333-9872-0bff5327c896.protected.maf
TCGA.SKCM.mutect.bd022199-d399-45db-8474-6dc1f3aad457.protected.maf
TCGA.SKCM.somaticsniper.4ff8ab0f-1a75-44f6-af48-2b30fc6d5a08.protected.maf
TCGA.SKCM.varscan.a83548c2-e6b2-45cf-a7c3-ec099daf30ce.protected.maf
The somatic variants from 183 human melanoma whole genomes (Hayward et al. 2017) can be accessed from the International Cancer Genome Consortium (ICGC) data portal (https://dcc.icgc.org/releases/release_23/Projects/MELA-AU), without restriction.
RNA-seq data from DDX3X knockdown in HT144 cell lines can be accessed from the Sequence Read Archive (https://www.ncbi.nlm.nih.gov/sra), using accession identifiers provided in Supplementary Table 14.
eCLIP data and expression data from DDX3X knockdown in K562 and HepG2 human cell lines can be downloaded from the ENCODE portal (https://www.encodeproject.org/), using accession identifiers provided in Supplementary Table 16 and 15, respectively.
Regions considered for neutral mutation rate estimation were defined using the following files available from Ensembl or the UCSC Genome Browser website:
http://hgdownload.cse.ucsc.edu/goldenpath/hg19/database/phastConsElements100way.txt.gz
http://hgdownload.cse.ucsc.edu/goldenpath/hg19/database/pseudoYale60.txt.gz
ftp://ftp.ensembl.org/pub/release-75/gtf/homo_sapiens/Homo_sapiens.GRCh37.75.gtf.gz
ENCODE’s ETS transcription factor binding sites were downloaded from the UCSC Genome Browser website:
CCLE cell lines gene expression data was obtained from:
https://portals.broadinstitute.org/ccle/data/CCLE_DepMap_18q3_RNAseq_reads_20180718.gct
Cell line annotations were obtained from DepMap:
https://depmap.org/portal/download/all/DepMap-2018q4-celllines.csv
Gene lengths used for RPKM calculations were obtained from:
ftp://ftp.ensembl.org/pub/release-86/gtf/homo_sapiens/Homo_sapiens.GRCh38.86.gtf.gz
The mutated genes pathway enrichment analysis was based on the EpiFactors database (downloaded on 2018-01-21, http://epifactors.autosome.ru/) and the Reactome database (downloaded on 2018-01-20, https://reactome.org/, ENSEMBL-to-pathways).
Pathway enrichment analysis of the mRNA subgroups was based on MSigDB (v6.2):
https://www.gsea-msigdb.org/gsea/msigdb/index.jsp
We obtained transcript-level expression (in TPM) for TCGA-SKCM from: https://osf.io/gqrz9
Gene set enrichment analyses of DDX3X differential expression were based on the Reactome database (downloaded on 2019-10-06):
https://reactome.org/
For the GISTIC2 analysis of recurrent focal copy-number alteration, we used the following reference file provided by the GDC: snp6.na35.liftoverhg38.txt.zip
(https://gdc.cancer.gov/about-data/data-harmonization-and-generation/gdc-reference-files/)
The COSMIC mutational signature definitions were downloaded from the deconstructSigs GitHub repository:
https://github.com/raerose01/deconstructSigs/blob/master/data/signatures.exome.cosmic.v3.may2019.rda
The combined set of reannotated variants, excluding those protected by the TCGA, can be accessed at our GitHub repository:
https://github.com/ianwatsonlab/multiomic_melanoma_study_2019
CODE AVAILABILITY
Code related to the main findings of the study is available at GitHub at: https://github.com/ianwatsonlab/multiomic_melanoma_study_2019
ACKNOWLEDGEMENTS
We would like to thank J. Pelletier, N. Beauchemin, A. Lissouba, and the Watson lab for their critical comments on the manuscript. We thank and acknowledge the Analysis Working Group of the SKCM TCGA project and the authors of Hodis et al. 2012, Van Allen et al. 2015, Krauthammer et al. 2015 and Hayward et al. 2017, whose past work enabled this study. We especially thank N. Hayward, M. Krauthammer and R. Halaban for answering specific questions related to these studies, and R. Marais and P. Mundra for sharing their curated list of non-acral cutaneous melanoma cases from TCGA.21
This work was supported by the V Foundation (IRW V Scholar Grant ID #: V2016-023). IRW is a Canada Research Chair II and is funded by grants from the Melanoma Research Alliance (MRA – Grant #412429), the Canadian Institutes of Health Research (CIHR – Grant #PJT-152975) and the Terry Fox Research Institute and Genome Québec (TFRI – Grant #1084). RA is a recipient of the Canderel Graduate Studentship, the Fonds de recherche du Québec – Santé (FRQS) Doctoral Training Award and the CIHR Doctoral Award - Frederick Banting and Charles Best Canada Graduate Scholarships (CGS-D).
COMPETING INTERESTS
The authors declare no competing interests.
REFERENCES
1. Bastian BC. The molecular pathology of melanoma: an integrated taxonomy of melanocytic neoplasia. Annu Rev Pathol 9, 239–71 (2014).
2. Hodis E et al. A landscape of driver mutations in melanoma. Cell 150, 251–63 (2012).
3. Krauthammer M et al. Exome sequencing identifies recurrent somatic RAC1 mutations in melanoma. Nat Genet 44, 1006–14 (2012).
4. Cancer Genome Atlas Network. Genomic Classification of Cutaneous Melanoma. Cell 161, 1681–96 (2015).
5. Brash DE. UV signature mutations. Photochem Photobiol 91, 15–26 (2015).
6. Joosse A et al. Superior outcome of women with stage I/II cutaneous melanoma: pooled analysis of four European Organisation for Research and Treatment of Cancer phase III trials. J Clin Oncol 30, 2240–7 (2012).
7. Joosse A et al. Sex is an independent prognostic indicator for survival and relapse/progression-free survival in metastasized stage III to IV melanoma: a pooled analysis of five European organisation for research and treatment of cancer randomized controlled trials. J Clin Oncol 31, 2337–46 (2013).
8. van Kempen LC et al. The protein phosphatase 2A regulatory subunit PR70 is a gonosomal melanoma tumor suppressor gene. Sci Transl Med 8, 369ra177 (2016).
9. Dees ND et al. MuSiC: identifying mutational significance in cancer genomes. Genome Res 22, 1589–98 (2012).
10. Lawrence MS et al. Mutational heterogeneity in cancer and the search for new cancer-associated genes. Nature 499, 214–218 (2013).
11. Mularoni L, Sabarinathan R, Deu-Pons J, Gonzalez-Perez A & Lopez-Bigas N. OncodriveFML: a general framework to identify coding and non-coding regions with cancer driver mutations. Genome Biol 17, 128 (2016).
12. Martincorena I et al. Universal Patterns of Selection in Cancer and Somatic Tissues. Cell 171, 1029–1041.e21 (2017).
13. Fredriksson NJ et al. Recurrent promoter mutations in melanoma are defined by an extended context-specific mutational signature. PLoS Genet 13, e1006773 (2017).
14. Mao P et al. ETS transcription factors induce a unique UV damage signature that drives recurrent mutagenesis in melanoma. Nat Commun 9, 2626 (2018).
15. Perera D et al. Differential DNA repair underlies mutation hotspots at active promoters in cancer genomes. Nature 532, 259–263 (2016).
16. Sabarinathan R, Mularoni L, Deu-Pons J, Gonzalez-Perez A & Lopez-Bigas N. Nucleotide excision repair is impaired by binding of transcription factors to DNA. Nature 532, 264–267 (2016).
17. Lawrence MS et al. Discovery and saturation analysis of cancer genes across 21 tumour types. Nature 505, 495–501 (2014).
18. Hayward NK et al. Whole-genome landscapes of major melanoma subtypes. Nature 545, 175–180 (2017).
19. Krauthammer M et al. Exome sequencing identifies recurrent mutations in NF1 and RASopathy genes in sun-exposed melanomas. Nat Genet 47, 996–1002 (2015).
20. Van Allen EM et al. Genomic correlates of response to CTLA-4 blockade in metastatic melanoma. Science 350, 207–211 (2015).
21. Trucco LD et al. Ultraviolet radiation-induced DNA damage is prognostic for outcome in melanoma (vol 25, pg 221, 2018). Nat Med 25, 350 (2019).
22. Alexandrov LB et al. The Repertoire of Mutational Signatures in Human Cancer. bioRxiv, 322859 (2019).
23. Kircher M et al. A general framework for estimating the relative pathogenicity of human genetic variants. Nat Genet 46, 310–5 (2014).
24. Pollard KS, Hubisz MJ, Rosenbloom KR & Siepel A. Detection of nonneutral substitution rates on mammalian phylogenies. Genome Res 20, 110–21 (2010).
25. Ng PC & Henikoff S. SIFT: Predicting amino acid changes that affect protein function. Nucleic Acids Res 31, 3812–4 (2003).
26. Curtin JA, Busam K, Pinkel D & Bastian BC. Somatic activation of KIT in distinct subtypes of melanoma. J Clin Oncol 24, 4340–6 (2006).
27. Newell F et al. Whole-genome landscape of mucosal melanoma reveals diverse drivers and therapeutic targets. Nat Commun 10, 3163 (2019).
28. Wong SQ et al. Whole exome sequencing identifies a recurrent RQCD1 P131L mutation in cutaneous melanoma. Oncotarget 6, 1115–27 (2015).
29. Rodriguez CI & Setaluri V. Cyclic AMP (cAMP) signaling in melanocytes and melanoma. Arch Biochem Biophys 563, 22–7 (2014).
30. Stratakis CA, Kirschner LS & Carney JA. Clinical and molecular features of the Carney complex: diagnostic criteria and recommendations for patient evaluation. J Clin Endocrinol Metab 86, 4041–6 (2001).
31. Arafeh R et al. Recurrent inactivating RASA2 mutations in melanoma. Nat Genet 47, 1408–10 (2015).
32. Dunford A et al. Tumor-suppressor genes that escape from X-inactivation contribute to cancer sex bias. Nat Genet 49, 10–16 (2017).
33. Gupta S, Artomov M, Goggins W, Daly M & Tsao H. Gender Disparity and Mutation Burden in Metastatic Melanoma. J Natl Cancer Inst 107 (2015).
34. Cruciat CM et al. RNA Helicase DDX3 Is a Regulatory Subunit of Casein Kinase 1 in Wnt-beta-Catenin Signaling. Science 339, 1436–1441 (2013).
35. ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57–74 (2012).
36. Phung B et al. The X-Linked DDX3X RNA Helicase Dictates Translation Reprogramming and Metastasis in Melanoma. Cell Rep 27, 3573–3586.e7 (2019).
37. Soto-Rifo R & Ohlmann T. The role of the DEAD-box RNA helicase DDX3 in mRNA metabolism. Wiley Interdiscip Rev RNA 4, 369–85 (2013).
38. Van Nostrand EL et al. A Large-Scale Binding and Functional Map of Human RNA Binding Proteins. bioRxiv, 179648 (2018).
39. Carter SL et al. Absolute quantification of somatic DNA alterations in human cancer. Nat Biotechnol 30, 413–21 (2012).
40. Mermel CH et al. GISTIC2.0 facilitates sensitive and confident localization of the targets of focal somatic copy-number alteration in human cancers. Genome Biol 12, R41 (2011).
41. Lauss M, Nsengimana J, Staaf J, Newton-Bishop J & Jonsson G. Consensus of Melanoma Gene Expression Subtypes Converges on Biological Entities. J Invest Dermatol 136, 2502–2505 (2016).
42. Moffitt RA et al. Virtual microdissection identifies distinct tumor- and stroma-specific subtypes of pancreatic ductal adenocarcinoma. Nat Genet 47, 1168–78 (2015).
43. Aran D, Hu Z & Butte AJ. xCell: digitally portraying the tissue cellular heterogeneity landscape. Genome Biol 18, 220 (2017).
44. Subramanian A et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci U S A 102, 15545–50 (2005).
45. Mootha VK et al. PGC-1alpha-responsive genes involved in oxidative phosphorylation are coordinately downregulated in human diabetes. Nat Genet 34, 267–73 (2003).
46. Snyder A et al. Genetic basis for clinical response to CTLA-4 blockade in melanoma. N Engl J Med 371, 2189–2199 (2014).
47. Klebanov N et al. Burden of unique and low prevalence somatic mutations correlates with cancer survival. Sci Rep 9, 4848 (2019).
48. Miao D et al. Genomic correlates of response to immune checkpoint blockade in microsatellite-stable solid tumors. Nat Genet 50, 1271–1281 (2018).
49. Liu D et al. Integrative molecular and clinical modeling of clinical outcomes to PD1 blockade in patients with metastatic melanoma. Nat Med 25, 1916–1927 (2019).
50. Thorsson V et al. The Immune Landscape of Cancer. Immunity 48, 812–830.e14 (2018).
51. Ott PA et al. An immunogenic personal neoantigen vaccine for patients with melanoma. Nature 547, 217–221 (2017).
52. Tsao H, Bevona C, Goggins W & Quinn T. The transformation rate of moles (melanocytic nevi) into cutaneous melanoma: a population-based estimate. Arch Dermatol 139, 282–288 (2003).
53. Snijders Blok L et al. Mutations in DDX3X Are a Common Cause of Unexplained Intellectual Disability with Gender-Specific Effects on Wnt Signaling. Am J Hum Genet 97, 343–52 (2015).
54. Ditton HJ, Zimmer J, Kamp C, Rajpert-De Meyts E & Vogt PH. The AZFa gene DBY (DDX3Y) is widely transcribed but the protein is limited to the male germ cells by translation control. Hum Mol Genet 13, 2333–41 (2004).
55. Wang T et al. Identification and characterization of essential genes in the human genome. Science 350, 1096–1101 (2015).
56. Conforti F et al. Cancer immunotherapy efficacy and patients’ sex: a systematic review and meta-analysis. Lancet Oncol 19, 737–746 (2018).
57. Lazova R et al. Spitz nevi and Spitzoid melanomas: exome sequencing and comparison with conventional melanocytic nevi and melanomas. Mod Pathol 30, 640–649 (2017).
58. Smith LK, Rao AD & McArthur GA. Targeting metabolic reprogramming as a potential therapeutic strategy in melanoma. Pharmacol Res 107, 42–47 (2016).
59. Johannessen CM et al. A melanocyte lineage program confers resistance to MAP kinase pathway inhibition. Nature 504, 138–42 (2013).
60. Van Allen EM et al. The genetic landscape of clinical resistance to RAF inhibition in metastatic melanoma. Cancer Discov 4, 94–109 (2014).
61. Grossman RL et al. Toward a Shared Vision for Cancer Genomic Data. N Engl J Med 375, 1109–12 (2016).
62. Zhang J et al. International Cancer Genome Consortium Data Portal: a one-stop shop for cancer genomics data. Database (Oxford) 2011, bar026 (2011).
63. Lawrence M et al. Software for computing and annotating genomic ranges. PLoS Comput Biol 9, e1003118 (2013).
64. Cingolani P et al. A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3. Fly (Austin) 6, 80–92 (2012).
65. Gerstein MB et al. Architecture of the human regulatory network derived from ENCODE data. Nature 489, 91–100 (2012).
66. Kent WJ et al. The human genome browser at UCSC. Genome Res 12, 996–1006 (2002).
67. Fabregat A et al. The Reactome Pathway Knowledgebase. Nucleic Acids Res 46, D649–D655 (2018).
68. Medvedeva YA et al. EpiFactors: a comprehensive database of human epigenetic factors and complexes. Database (Oxford) 2015, bav067 (2015).
69. Carter S, Meyerson M & Getz G. Accurate estimation of homologue-specific DNA concentration-ratios in cancer samples allows long-range haplotyping. Nat Preced, 59–87 (2011).
70. Zack TI et al. Pan-cancer patterns of somatic copy number alteration. Nat Genet 45, 1134–40 (2013).
71. Gaujoux R & Seoighe C. A flexible R package for nonnegative matrix factorization. BMC Bioinformatics 11, 367 (2010).
72. Senbabaoglu Y, Michailidis G & Li JZ. Critical limitations of consensus clustering in class discovery. Sci Rep 4, 6207 (2014).
73. Wilkerson MD & Hayes DN. ConsensusClusterPlus: a class discovery tool with confidence assessments and item tracking. Bioinformatics 26, 1572–3 (2010).
74. Love MI, Huber W & Anders S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol 15, 550 (2014).
75. Morgan M, Pages H, Obenchain V & Hayden N. Rsamtools: Binary alignment (BAM), FASTA, variant call (BCF), and tabix file import. R package version 1.28.0 (2017).
76. Ritchie ME et al. limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res 43, e47 (2015).
77. Robinson MD, McCarthy DJ & Smyth GK. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26, 139–140 (2010).
78. Szolek A et al. OptiType: precision HLA typing from next-generation sequencing data. Bioinformatics 30, 3310–6 (2014).
79. Jurtz V et al. NetMHCpan-4.0: Improved Peptide-MHC Class I Interaction Predictions Integrating Eluted Ligand and Peptide Binding Affinity Data. J Immunol 199, 3360–3368 (2017).
80. Tatlow PJ & Piccolo SR. A cloud-based workflow to quantify transcript-expression levels in public cancer compendia. Sci Rep 6, 39259 (2016).