Abstract
Malignant pleural mesothelioma (MPM) is an aggressive cancer with rising incidence and challenging clinical management. Through a large series of whole-genome sequencing data, integrated with transcriptomic and epigenomic data using multiomics factor analysis, we demonstrate that the current World Health Organization classification only accounts for up to 10% of interpatient molecular differences. Instead, the MESOMICS project paves the way for a morphomolecular classification of MPM based on four dimensions: ploidy, tumor cell morphology, adaptive immune response and CpG island methylator profile. We show that these four dimensions are complementary, capture major interpatient molecular differences and are delimited by extreme phenotypes that, in the case of the interdependent tumor cell morphology and adaptive immune response, reflect tumor specialization. These findings unearth the interplay between MPM functional biology and its genomic history, and provide insights into the variations observed in the clinical behavior of patients with MPM.
Subject terms: Mesothelioma, Genomics
Genomic, transcriptomic and epigenetic analyses of malignant pleural mesothelioma identify molecular axes and specialized tumor profiles underlying intertumoral heterogeneity.
Main
Malignant pleural mesothelioma (MPM) is a rare and aggressive disease associated with asbestos exposure1. The World Health Organization (WHO) histological classification distinguishes three major types with prognostic value: epithelioid (MME), biphasic (MMB) and sarcomatoid (MMS)2. In the past decade, genomic studies uncovered molecular profiles (clusters) related to MPM’s histopathological classification, each enriched for somatic alterations in known cancer genes (for example, BAP1 in MME and TP53 in MMS)3–5. We and others undertook unsupervised analyses of these data, revealing a molecular continuum of types that explained the prognosis of the disease more accurately than any reported discrete cluster6,7. MPM interpatient heterogeneity at the biological and clinical level is therefore generally expected to be explained by a histopathological continuum, with phenotypes ranging from MME to MMS8,9.
Nevertheless, the full extent of MPM phenotypes and the mechanisms by which they evolved are poorly understood. Histopathological features (such as architectural subtypes) and molecular features (such as aneuploidy and immune infiltration) were shown to be independent of histopathological type8,9, suggesting that there are additional sources of heterogeneity that remain unexplained. In addition, although malignant transformation and cancer development can depend on a wide range of genomic aberrations10–12, genomic events have not been fully described in MPM as previous efforts have been restricted to profiling only exomes or a reduced representation of genomes3–5,13. As a result, biological functions performed by tumor cells, and the role of genomic events in shaping these functions, remain largely unknown, hindering any meaningful progress in the diagnosis, classification and treatment of the disease8.
We designed the MESOMICS study to uncover the main sources of molecular variation explaining MPM intertumoral heterogeneity, and to identify the underlying biological functions. Using multiomic analyses combining genomic, transcriptomic and epigenomic data on a novel cohort of 120 MPM tumors (Supplementary Tables 1–3), we show that the current histopathological classification only explains a fraction of the molecular heterogeneity of the disease, while ploidy, adaptive immune response and CpG island methylation are as important. Taking advantage of a large cohort of whole-genome sequencing (WGS) data, we map the molecular landscape of 120 MPMs and elucidate the link between genotype and phenotype.
Results
Multiomic analyses uncover four axes of molecular variation
We first found that the current histopathological classification only accounts for up to 10% of the interpatient molecular differences (2–10%, depending on the molecular layer, with an average of 6%), leaving at least 90% unexplained (Fig. 1a). We then undertook an unsupervised decomposition of the interpatient molecular heterogeneity using Multi-Omics Factor Analysis (MOFA)14, integrating genomic, transcriptomic and epigenomic data. We identified four independent and reproducible latent factors individually explaining more than 10% of molecular variation in at least one molecular layer, and collectively up to 61% of interpatient differences (19–61%, depending on the molecular layer, with an average of 33%; Fig. 1a, Extended Data Figs. 1–3, Supplementary Fig. 1 and Supplementary Tables 4–7). Only latent factor 2 (LF2) was associated with the histopathological classification, the recent artificial intelligence score based on digital pathology15 and the previously proposed molecular classifications3–7 (median q value = 6.94 × 10−11; Fig. 1b). Therefore, LF1, LF3 and LF4 capture three prominent sources of biological variation overlooked by previous histopathological and genomic studies.
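To make the notion of "variation explained by the histopathological classification" concrete, the following minimal sketch computes, for one molecular layer, the average fraction of per-feature variance explained by a grouping (the R² of a one-way analysis of variance). This is an illustrative assumption about how such a percentage can be obtained, not the MOFA computation used in the study; the function names and the toy data are hypothetical.

```python
from statistics import mean


def variance_explained(values, groups):
    """Fraction of one feature's variance explained by a grouping (one-way ANOVA R^2)."""
    grand = mean(values)
    total_ss = sum((v - grand) ** 2 for v in values)
    if total_ss == 0:
        return 0.0  # constant feature: nothing to explain
    by_group = {}
    for v, g in zip(values, groups):
        by_group.setdefault(g, []).append(v)
    # Between-group sum of squares: group size times squared offset of the group mean
    between_ss = sum(len(vs) * (mean(vs) - grand) ** 2 for vs in by_group.values())
    return between_ss / total_ss


def layer_variance_explained(layer, groups):
    """Average R^2 over all features (rows) of one molecular layer."""
    return mean(variance_explained(row, groups) for row in layer)


# Hypothetical toy layer: 2 features x 6 samples, with made-up histology labels
histology = ["MME", "MME", "MME", "MMS", "MMS", "MMS"]
toy_layer = [
    [1.0, 1.0, 1.0, 3.0, 3.0, 3.0],  # perfectly separated by type
    [1.0, 2.0, 3.0, 1.0, 2.0, 3.0],  # unrelated to type
]
r2 = layer_variance_explained(toy_layer, histology)
```

In this toy layer one feature is fully determined by the type and the other is unrelated to it, so the layer-level value sits halfway between the two extremes.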
LF1 (the ploidy factor) is largely explained by tumor ploidy (r = 0.87; Fig. 1c,d). LF2 (the morphology factor) separates the main histopathological types and thus summarizes the morphological and related molecular classifications (Fig. 1a–c). LF3 (the adaptive response factor) summarizes immune infiltration with adaptive response effectors (lymphocytes) (Fig. 1c). For LF2 and LF3, enhancer methylation was the major molecular layer captured (Fig. 1a), partly explained by its implication in the tumor–immune interaction phenotype captured by LF3, and its variability in MPM samples is probably driven by cell-type heterogeneity (Supplementary Fig. 2 and Supplementary Tables 5, 6 and 8). The major feature captured by LF4 (the CpG island methylator phenotype (CIMP) factor) was methylation at gene body and promoter regions, and most of its molecular variation was strongly associated with the CIMP index (r = 0.92; Fig. 1c,e). We then identified proxies to facilitate the interpretation of the latent factors and their implementation in the clinical setting: aneuploidy for LF1; the percentage of sarcomatoid component as reported by pathologists for LF2; an adaptive versus innate immune response score (Methods) for LF3; and a five-gene CIMP index proxy (Methods) for LF4. LF1, LF3, LF4 and their proxies were statistically independent of histopathological type (that is, all histological types can be either high or low ploidy, have high or low adaptive immune responses and have a high or low CIMP index), further confirming that these latent factors represent independent sources of molecular variation (Extended Data Fig. 4a–c).
In line with our previous observations6, tumor samples did not form clusters in MOFA but rather gradients between extreme molecular profiles (Fig. 1d,e). The ploidy factor ranged between a genomic near-haploidization (GNH) and a whole-genome doubling (WGD) profile, with a gradient of intermediate ploidies due to various levels of chromosome arm and focal amplifications and deletions (Fig. 1d). In contrast with the features found associated with the GNH subtype identified in The Cancer Genome Atlas (TCGA) cohort4, the single near-haploid sample, MESO_108, had a ploidy of 1.10, almost no copy-neutral loss of heterozygosity (LOH) (<1%) and no SETDB1/TP53 mutations and did not undergo WGD. Therefore, this sample does not correspond to the GNH subtype as described by Hmeljak and colleagues4, but to another possible genomic trajectory, where genomic instability is driven by alternative pathways. Differential gene expression analyses showed that, as reported in other tumor types12, the most upregulated enriched pathway in WGD-positive (WGD+) versus WGD-negative (WGD−) cases was E2F targets (q value = 0.048; Supplementary Tables 9 and 10), although we could not replicate this result in the TCGA cohort4, possibly due to the difficulty of replicating such findings in low-sample-size series (n = 11 WGD+ samples). The CIMP factor also ranged between two extreme profiles: CIMP-low and CIMP-high (Fig. 1e). A well-known effect of the CIMP-high phenotype is epigenetic silencing of tumor suppressor genes16. In line with this, we identified five Catalogue of Somatic Mutations in Cancer (COSMIC) tumor suppressor genes17, whose expression was negatively correlated with both the CIMP index and the methylation level of their CpG island(s): CBFA2T3, FBLN2, PRF1, SLC34A2 and WT1 (median q value = 2.6 × 10−3; Supplementary Table 11).
We trained latent factor-based survival models and tested their performance over previously proposed prognostic factors to evaluate to what extent each latent factor captured variability predictive of prognosis (Methods). While individually they provided a prediction value similar to each other, when combining the four latent factors there was an increase in their area under the receiver operating characteristic curve value, suggesting that they capture molecular characteristics with independent prognostic value, being informative of MPM progression in a complementary manner (Extended Data Fig. 5, Supplementary Fig. 3 and Supplementary Tables 12–20). In line with evidence from multiple cancer types12, survival was lowest for the greatest ploidy (Fig. 1f). As expected, samples in the lower extreme of the morphology factor, enriched for sarcomatoid tumors, presented the worst prognosis. The adaptive response factor linked hot tumors (tumors with a high level of immune infiltration) with better survival, whereas CIMP-low tumors had better survival than CIMP-high tumors (Fig. 1f). The previously described proxies also demonstrated prognostic value in the MESOMICS cohort, and allowed for validation of the prognostic value of the latent factors in the validation cohorts (Extended Data Fig. 4d–g). Probably due to the limited power and a potential effect of histology, the prognostic value of the ploidy and CIMP factors was not statistically significant when analyzing MME samples only; however, their respective effect size remained similar to those identified in the entire cohort (Supplementary Fig. 3). We additionally validated the existence of the four dimensions as well as their prognostic values in previously published cohorts (Supplementary Tables 21 and 22).
Finally, combining molecular and drug response data for 59 MPM cell lines from Iorio et al.18, de Reyniès et al.5 and Blum et al.7, we were able to evaluate the therapeutic value of the ploidy, morphology and CIMP factors (the lack of microenvironment in cell culture models did not allow for replication of the adaptive response factor), by assessing the impact that cell line position along each latent factor had on the response to candidate drugs (Extended Data Fig. 6, Supplementary Fig. 4 and Supplementary Tables 23–26). Significant drug responses associated with the different factors were entirely orthogonal (Extended Data Fig. 6a), highlighting the fact that MOFA latent factors capture independent axes of heterogeneity in both tumoral mechanisms and therapeutic responses. Therefore, both survival and cell line analyses showed that these axes of variation are clinically relevant and have the potential for translation into clinical practice.
Task specialization analyses reveal diverse tumor strategies
Samples along the interdependent morphology and adaptive response factors formed a triangular shape delimited by three extremes (Fig. 2a and Supplementary Fig. 5). The well-established Pareto optimum theory19 (ParetoTI method) predicted that this pattern results from natural selection for cancer tasks, with specialist tumors close to the vertices of the triangle and generalists in the center (triangle fit P value = 0.001; Fig. 2b). Integrative gene set enrichment analysis (IGSEA) pointed to the following cancer tasks and tumor phenotypes: cell division, tumor–immune interaction and acinar phenotype (Fig. 2c and Supplementary Tables 27–30 for archetypes, IGSEA significant pathways and q values).
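The intuition behind a "triangle fit" can be illustrated with a simple geometric statistic: the share of a candidate triangle's area that is covered by the convex hull of the samples. The sketch below is a simplified stand-in written for illustration only; it is not the ParetoTI implementation or its significance test, and the function names, the candidate triangle and the toy coordinates are all hypothetical.

```python
def cross(o, a, b):
    """Z-component of the cross product of vectors o->a and o->b."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counterclockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def polygon_area(vertices):
    """Shoelace formula for the area of a simple polygon."""
    n = len(vertices)
    twice_area = sum(
        vertices[i][0] * vertices[(i + 1) % n][1]
        - vertices[(i + 1) % n][0] * vertices[i][1]
        for i in range(n)
    )
    return abs(twice_area) / 2.0


def fill_ratio(points, triangle):
    """Area of the samples' convex hull divided by the area of a candidate triangle."""
    return polygon_area(convex_hull(points)) / polygon_area(triangle)


# Hypothetical coordinates on two latent factors (tuples so they can be deduplicated)
triangle = [(0, 0), (4, 0), (0, 4)]
triangular_cloud = [(0, 0), (4, 0), (0, 4), (1, 1), (2, 1)]
square_cloud = [(0, 0), (2, 0), (2, 2), (0, 2)]
ratio_triangular = fill_ratio(triangular_cloud, triangle)
ratio_square = fill_ratio(square_cloud, triangle)
```

A cloud whose extremes coincide with the triangle's vertices fills it completely, whereas a cloud with a different shape leaves part of the triangle empty. In practice, a statistic of this kind would be compared against shuffled data to obtain an empirical P value.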
Tumors specialized in the cell division task displayed upregulation of cell division pathways, as reported by Hausser et al. in multiple tumor types20. This phenotype was enriched for nonepithelioid tumors and presented higher levels of necrosis, higher grade and a greater percentage of infiltrating innate immune response cells (neutrophils) (median q value = 0.005). Cell division specialization was supported by high expression levels of the proliferation marker MKI67 and increased genomic instability (estimated from genomic, transcriptomic and epigenomic data; median q value = 1.97 × 10−4). Tumors specialized in the tumor–immune interaction task carried upregulated immune-related pathways, high expression of immune checkpoint genes and high immune infiltration with an enrichment for adaptive response cells: B lymphocytes, CD8+ T cells and regulatory T cells (median q value = 2.73 × 10−3). The cell division and tumor–immune interaction specialists also showed high expression of hypoxia response pathways and common enrichment for pathways in the invasion and tissue remodeling universal cancer task. Indeed, we found a higher epithelial-to-mesenchymal transition (EMT) score among tumors in this area of the Pareto triangle, driven by upregulation of mesenchymal genes and hypomethylation of their associated enhancers (median q value = 1.61 × 10−6). In line with in vitro studies showing that asbestos may induce EMT in MPM21, we found a positive correlation between the expression of mesenchymal genes and asbestos exposure score (r = 0.44 and q value = 0.01) and a negative correlation between this score and enhancer methylation of mesenchymal genes (r = −0.33 and q value = 0.02). We also observed overexpression of neoangiogenesis-related genes, corroborating the ability of these tumors to remodel their environment.
The last extreme phenotype was characterized by samples with acinar morphology, presenting a very structured tissue organization with epithelial cells tightly linked into tubular structures, and correlated with the presence of monocytes and natural killer cells (innate immune response cells) (median q value = 0.022). This phenotype presented the lowest EMT score, with overexpression of epithelial markers such as cell adhesion molecules (median q value = 1.21 × 10−3), corroborating the importance of tissue organization in this phenotype, and also low levels of MKI67 expression, indicating slow growth. This phenotype showed no particular tumoral specialization in any task based on the few IGSEA upregulated pathways. In line with the better prognosis reported for this subtype8, the acinar phenotype is characterized by the highest levels of global methylation22 (q value = 5.58 × 10−10). Altogether, these data provide a biological understanding of the molecular and phenotypic heterogeneity characteristic of MPM tumors.
WGS uncovers a diverse genomic landscape
We found 97% (111/115) of MPM tumors harboring at least one large genomic event (copy number variant (CNV), amplicon, homologous recombination deficiency (HRD), chromothripsis or aneuploidy; Fig. 3a). As captured by the ploidy factor, MPM samples ranged from haploid to tetraploid (Fig. 1d). The average CNV profile was highly consistent between cohorts (Supplementary Fig. 6), with several recurrent chromosome arm-level CNVs, as well as focal alterations encompassing known cancer genes (Fig. 3b and Supplementary Tables 31–35). As previously reported23, all of the MTAP alterations co-occurred with CDKN2A/B (Fig. 3a and Supplementary Tables 36 and 37). We also found recurrent deletions of a prominent immune recognition gene, B2M (chr15q14; Fig. 3b).
A comprehensive analysis of mutational signatures, encompassing single-base substitutions, CNVs and structural variants24,25, allowed us to identify the processes leading to particular somatic alteration patterns (Extended Data Fig. 7). A total of ten active single-base substitution signatures were detected in MPM genomes (Extended Data Fig. 7b); all corresponded to known COSMIC signatures and none was associated with asbestos exposure, as was previously reported3,4. Six tumors were found to have extrachromosomal DNA (ecDNA) (Supplementary Fig. 7 and Supplementary Table 38), and in the one sample with transcriptomic data we found increased expression of the genes predicted to be present on the ecDNA, including the known oncogene BRIP1 (Fig. 3c). We observed that the aforementioned ecDNA sample co-occurred with, and may be fueled by, kataegis26 (Supplementary Fig. 8). Overall, kataegis was rarely seen in our cohort, contributing to only 2% of the MPM clustered mutations (Supplementary Tables 39 and 40). The identified complex mutational processes included a pattern compatible with chromothripsis. This was observed in 20% of the samples (Fig. 3a, Supplementary Fig. 9 and Supplementary Table 39) and also at the transcriptomic level, as fusion transcripts, in half of the positive samples (Supplementary Fig. 10a and Supplementary Tables 41–43). A signature of clustered structural variants was detected and significantly associated with a high structural variant load and chromothripsis (Supplementary Fig. 10b,c and Supplementary Tables 41 and 42). For one sample (MESO_019), the chromothripsis region overlapped with an ecDNA region, suggesting that chromothripsis may have been the source of the circular amplification (Fig. 3c). Finally, 23% of the samples showed a HRD phenotype, identified either by copy number signatures25 or structural variant pattern-based methods27 (Supplementary Fig. 11 and Supplementary Table 40). 
Among the HRD-positive samples, five harbored pathogenic germline mutations (from the ClinVar database) in one of 26 genes known to be involved in homologous recombination28, significantly more than the two mutations reported in the 77% of samples without HRD (Fisher’s exact test, P value = 0.00587).
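The enrichment test above compares mutation carriers in two groups through a 2 × 2 contingency table. A minimal sketch of a two-sided Fisher's exact test, using only the Python standard library, is given below. The group sizes in the example (26 HRD and 89 non-HRD samples) are an assumption derived from the reported 23% and 77% of 115 genomes, not counts stated in the text, so the resulting value is illustrative and need not match the reported P value.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables (with the same margins)
    that are no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Probability of x carriers in the first row given fixed margins
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance guards against floating-point ties
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))


# Assumed table: rows = HRD / non-HRD, columns = carriers / non-carriers
# (5 of 26 hypothetical HRD samples versus 2 of 89 hypothetical non-HRD samples)
p_example = fisher_exact_two_sided(5, 21, 2, 87)
```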
We detected an HRD signature in nine out of 21 MPM cell lines from Iorio et al.18, thus validating the high rate of this pattern in MPM. In addition, these cell lines showed a tendency toward higher sensitivity to the clinically approved drug olaparib in HRD samples than in non-HRD samples (Supplementary Fig. 12). This may be linked with the results of a clinical trial suggesting a highly complex relationship between the response to this drug and markers of DNA repair pathway activity29. Indeed, in contrast with the original hypothesis of that trial, patients with BAP1 mutations had poorer survival when treated with olaparib than wild-type patients. In line with this observation, the olaparib response was positively associated with the prognostic CIMP index factor (r = 0.65; Extended Data Fig. 6), meaning that CIMP-low samples were more sensitive to this poly-ADP ribose polymerase inhibitor than CIMP-high samples (which are enriched for BAP1 alterations (Fig. 5a) and associated with poorer survival (Supplementary Fig. 3)).
Despite the low mutational rate (0.98 nonsynonymous small variants per megabase; Supplementary Fig. 13a and Supplementary Tables 44–46), MPM tumors carry a particularly high number of structural variants relative to tumors with similarly low mutational burden (Fig. 4 and Supplementary Fig. 13b). The top genes altered by structural variants (≥5%) were RBFOX1, NF2, BAP1, MTAP and PCDH15 (Supplementary Fig. 14a). For RBFOX1, 13 out of 39 samples have two separate events, with most deleting part of the RNA-binding protein domain (Supplementary Fig. 14b). Many of these genomic rearrangements resulted in fusion transcripts detected at the transcriptomic level (Supplementary Figs. 10a and 15).
Combining the MESOMICS dataset with the two other large datasets from Bueno et al.3 and the TCGA4, we reached the sample size (n ≈ 300) needed to detect rare driver alterations (1%). The IntOGen pipeline30 discovered 30 MPM driver genes based on small variants (Supplementary Fig. 14c). BAP1, NF2, SETD2, TP53 and LATS2 are all known MPM driver genes. Among the other 25 genes, some were previously reported as recurrently mutated in MPM (PBRM1, KMT2D, DDX3X, PIK3CA, FBXW7, MGA, NF1, SETDB1, MYH9, PTCH1, RHOA and TRAF7)31–33 or altered by structural variants (PTPRD and LRP1B)34, two were found overexpressed in MPM cell lines (DNMT3B and EZH2)35 and, for another two, germline mutations have been discovered, suggesting genetic susceptibility (NCOR1 (ref. 36) and MYO5A (ref. 37)). The remaining seven driver genes have, to our knowledge, not been previously reported in MPM, but they are all known cancer genes, as reported in COSMIC: FAT3, NIN, ARHGAP5, HLA-A, NCOR2, SRGAP3 and WNK2. Of note, NF2 and MYH9 (IntOGen drivers) are located within the significantly deleted chr22q region, along with TTC28, a gene frequently altered by structural variants (Figs. 3a,b and 4). Beyond extending the list of putative MPM drivers, combining point mutations with structural variants allowed for refinement of the frequency of alterations in key MPM genes (Fig. 4 and Supplementary Tables 41–46).
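Why pooling cohorts matters for rare drivers can be seen with a simple binomial calculation: the chance of observing a gene altered in 1% of tumors at least a given number of times grows quickly with cohort size. The sketch below is a simplified illustration under a binomial model; it is not the power analysis behind IntOGen or the cited sample-size estimate, and the function name and the choice of thresholds are hypothetical.

```python
from math import comb


def prob_at_least(k, n, freq):
    """P(at least k altered tumors among n), assuming independent tumors (binomial model)."""
    below_k = sum(comb(n, i) * freq**i * (1 - freq) ** (n - i) for i in range(k))
    return 1.0 - below_k


# Probability of seeing a 1%-frequency alteration at least once or at least
# three times in a pooled cohort of about 300 tumors versus a single cohort of 120
p_one_pooled = prob_at_least(1, 300, 0.01)
p_three_pooled = prob_at_least(3, 300, 0.01)
p_three_single = prob_at_least(3, 120, 0.01)
```

Requiring several independent observations, as driver-discovery methods effectively do, is considerably more demanding than requiring a single one, which is where the larger pooled cohort helps most.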
Genomic alterations tune the molecular profiles of MPM
Genomic events were associated with all MOFA latent factors and the extreme profiles that they encapsulated, as well as with the phenotypic specialists captured by the morphology and adaptive response factors (Fig. 5a and Supplementary Tables 47 and 48). Associated alterations significantly tuned tumor specialization (P value = 0.003; Methods and Extended Data Fig. 8). In addition to ploidy, NCOR2 alterations and TERT amplification were associated with the ploidy factor (q values = 4.3 × 10−18 and 3.3 × 10−4, respectively; Fig. 5a). Thirty-six samples (31%) displayed TERT amplification, resulting in a significant increase in TERT expression (P value = 1.8 × 10−5; Supplementary Fig. 16a,b). TERT amplification was accompanied by an underlying amplification of chr5p in 81% of the positive cases. While no association was previously detected between TERT promoter mutations and WGD38, here we found that both TERT amplification and its increased expression were associated with WGD events (P value = 1.6 × 10−10 and 0.009, respectively; Supplementary Fig. 16c).
Genomic alterations in epigenetic regulatory genes (ERGs) have previously been shown to drive CIMP in cancer39. In line with this, we found enrichment for ERGs (P value = 3.4 × 10−3; Methods and Supplementary Fig. 17), including the mesothelioma drivers NCOR2 and EZH2, among the genes highly expressed in CIMP-high tumors, and more generally in the list of MPM drivers (q value = 2.1 × 10−5). Chr7q36.1del, encompassing EZH2, further tuned the position of the samples along the CIMP factor (q value = 5.2 × 10−3; Fig. 5a). EZH2 (enhancer of zeste homolog 2) is a histone methyltransferase that functions as part of the Polycomb repressive complex 2 (PRC2) to promote gene silencing of specific targets40. Indeed, genes whose CpG island methylation level was highest in CIMP-high tumors were enriched for PRC2 target genes (P value = 0.01; Fig. 5b). WT1, which is downregulated in CIMP-high tumors, is particularly interesting, as a vaccine against this PRC2 target is currently being assessed in clinical trials for mesothelioma41. Cancers frequently associated with a CIMP-high phenotype include colorectal cancer (CRC) and glioma42,43, with BRAF (CRC) and IDH1 (glioma) mutations also associated with this phenotype, as well as with microsatellite instability in CRC42. Microsatellite instability and BRAF/IDH1 mutations were rare or absent events in our series and unrelated to the CIMP phenotype (Supplementary Tables 7, 44 and 49), suggesting that the mutational processes linked with CIMP phenotype in MPM may differ from those of other cancers.
WGD and chromothripsis seemed to push tumors away from the tumor–immune interaction phenotype (q values = 0.042 and 0.012, respectively; Fig. 5c); indeed, both cell division and acinar phenotypes were characterized by low immune cell infiltration (cold tumors), which may be explained by the downregulation of the interferon response pathway and B2M expression seen in WGD+ MPM tumors (q value = 7.4 × 10−17; Supplementary Fig. 18a,b,e and Supplementary Tables 9 and 10). These may represent important mechanisms for WGD+ tumors to avoid the immune response12,44. Chromothripsis has also been associated with low immune infiltration as part of the chromosomal chaos that silences immune surveillance45.
CDKN2A, MTAP and NF2 alterations also converged on cold tumors (median q value = 0.003). Within this cold phenotype, TERT amplification and alterations in TTC28, involved in the mitotic cell cycle, moved tumors toward cell division specialization (q values = 1.6 × 10−4 and 7.4 × 10−4, respectively; Fig. 5c), whereas chr3p21.1del (BAP1, DNAH1 and PBRM1) and BAP1 mutations moved tumors toward the better-prognosis acinar phenotype (q values = 0.021 and 7.1 × 10−4, respectively; Fig. 5c), as expected given the previously reported association between BAP1 alterations and better survival in MPM36. A loss of BAP1 (BRCA1-associated protein-1) expression, measured by immunohistochemistry, was also associated with this phenotype (r = −0.38 and q value = 4.61 × 10−5; Supplementary Fig. 19). Interestingly, an analysis of splicing variation found that the morphology factor and acinar phenotype were significantly associated with alternative splicing events (Supplementary Fig. 20a–f). Major contributions came from events in cell adhesion genes and in the neuronal progenitor BAF, neuron-specific BAF and SWI/SNF complexes, potentially affecting the alternative splicing pattern of genes such as BCL11A and SMARCE1 (Supplementary Fig. 20g,h). The fact that these genes (just like BAP1) have important roles in chromatin remodeling suggests that disruption of chromatin remodeling pathways may molecularly define the acinar phenotype.
The specialization of tumors can be influenced by early genomic events. Estimates of the timing of WGD, TERT amplification and copy-neutral LOH in the few samples (n = 6) with such events where a subclonal deconvolution was possible showed that our samples fall well within the values observed across >2,500 tumors of the Pan-Cancer Analysis of Whole Genomes Consortium46 (empirical P values = 0.16–0.79; Fig. 5d and Supplementary Fig. 21). Thus, these genomic events may indeed have occurred more than 10 years before diagnosis. Three out of the six patients were exposed to asbestos (of the other three patients, two had no known exposure and one had unknown exposure), among whom two had well-documented periods of exposure, from 56 to 21 years before diagnosis for MESO_048 (including the estimated timing of LOH) and from 54 to 50 years before diagnosis for MESO_057, more than 50 years before the estimated timing of TERT amplification, suggesting that genomic events can occur both concomitantly with and subsequent to asbestos exposure, although conclusive evidence of the timing of these alterations will need to be investigated in hypothesis-driven studies. Using a multiregional subcohort from 13 patients, we found intratumor heterogeneity in all factors except the ploidy factor, further suggesting that genomic events are mostly early and thus do not vary much across regions (Extended Data Fig. 9, Supplementary Fig. 22 and Supplementary Tables 50–52). Finally, we detected neutral tumor evolution close to the acinar phenotype (P value = 0.0024; Supplementary Fig. 23) at extreme values of the morphology and adaptive response factor, suggesting that tumors with this profile were even less influenced by recent genomic events.
Discussion
The MESOMICS project represents a substantial advancement toward the comprehensive molecular characterization of MPM, made possible by inclusion of a large WGS dataset3,4,34 and by the depth of the multiomic integrative analyses undertaken. We demonstrated that ploidy, adaptive immune response and CpG island methylation constitute independent sources of molecular variation with quantitatively similar impacts on interpatient MPM heterogeneity as the histological classification. Despite some individual observations made in previous studies6,7,13, these three sources of molecular variation have been mostly unexplored or unknown because of the major focus that was put on refining the histological groups, and the lack of comprehensive analysis of a large multiomics dataset. In this sense, the unifying framework aspect of our research approach allowed us to capture the entire molecular landscape of MPM, summarized in four dimensions.
Aneuploidy is one of the morphology-independent features previously reported in MPM4 but poorly characterized. The ploidy factor identified tumors that underwent WGD, previously described in multiple cancer types as an early transformative event that dramatically destabilizes cell genetics and fuels tumor development47. WGD tends to be favored along the evolutionary course of low-mutational-burden tumors like MPM12 and is suspected to serve as a genetic spare tire in case of lethal alterations48. As a consequence, this event shapes the cellular phenotype associated with specific vulnerabilities12.
The CIMP has been reported in several cancer types, most notably CRC and glioblastoma, with inconsistent associations with survival49–51. Here we add to the evidence of Blum et al.7 for distinct variation in the CIMP index within mesothelioma tumors, and show that a high CIMP index is independent of morphology and predictive of poorer outcome. While a universal cause for a CIMP-high phenotype has not been established, it has been previously associated with alterations in ERGs39,52. Indeed, our data suggest that some mesothelioma tumors may acquire a CIMP-high phenotype through the activity of the ERG EZH2, which hypermethylates and silences specific target genes. Such a strategy may be warranted to promote malignant transformation in a lowly mutated tumor such as mesothelioma35.
Pareto task inference uncovered three specialized tumor profiles in the space delimited by the interdependent morphology and adaptive response factors, presumably resulting from pressures of the microenvironment, each selecting for adaptive alterations and phenotypic traits. Cell division specialists adopted a fast reproduction strategy that was expected to result from unfavorable and unpredictable environments53, with their genomic instability suggesting adaptation through evolutionary leaps54,55. Immune interaction specialists adopted an immune evasion or camouflage strategy. Both phenotypes also presented characteristics of invasion and tissue remodeling specialists20. These tumors tended to occur in intensely asbestos-exposed individuals, suggesting that chronic inflammation (promoted by asbestos exposure56) may have created the unfavorable environment responsible for selective pressure. Finally, acinar phenotype specialists adopted a structured tissue organization and slow growth strategy. This suggests an equilibrium strategy that is expected to be favorable in stable, resource-rich environments with limited predation57, in line with the lower level of asbestos exposure and limited inflammation and immune infiltration observed in these tumors. Consistent with limited environmental pressures, acinar tumors were enriched for neutral evolution and BAP1 alterations—an event that, when combined with weak asbestos exposure in mice, greatly increased mesothelioma occurrence over weak asbestos exposure alone58.
Overall, the four molecular factors are highly informative and capture specific profiles that are complementary in predicting tumor phenotype and aggressiveness. The fact that three of them are largely unrelated to the morphology factor (histology) means that disregarding them might not only jeopardize the success of any treatment but also miss opportunities to stratify patients based on their molecular profile (Fig. 6). The tightly correlated proxies that we have identified could serve as biomarkers for response to specific therapies (such as immunotherapy for LF3) and could be easily tested in a hypothesis-driven study design. Subsequently, integrating these complementary factors would help to stratify patients for preselected-cohort clinical trials59, a process that has proven to be beneficial in small-cell lung cancer, another aggressive recalcitrant cancer60–62. The results of the MESOMICS project pave the way for the establishment of a more clinically relevant morphomolecular classification of MPM tumors.
Methods
This section briefly describes the main methods (see Supplementary Information for details on the data, processing and analyses).
Ethics
All of the methods were carried out in accordance with relevant guidelines and regulations. This study is part of a larger study, the MESOMICS project, aiming to perform comprehensive molecular characterization of MPM, and was approved by the International Agency for Research on Cancer (IARC) Ethics Committee (project number 15-17). The samples used in this study belong to the virtual biorepository French MESOBANK. Written, informed consent was obtained from all participants and no participant compensation was provided.
Clinical data
Age at diagnosis (in years), sex (male or female), smoking status (nonsmoker, ex smoker or smoker), asbestos exposure (exposed or nonexposed), previous treatment with chemotherapy drugs (yes or no), treatment information (surgery, chemotherapy, radiotherapy, immunotherapy or cancer history) and survival data (calculated in months from surgery to the last day of follow-up or death) were collected for all 123 patients. The median age at diagnosis was 67.5 years and 73.3% of patients were male.
MESOMICS cohort
The MESOMICS cohort includes biological material from 123 patients with MPM (including three nonchemonaive patients who were excluded from all analyses unless explicitly mentioned) kindly provided by the French MESOBANK and annotated with detailed clinical, epidemiological and morphological data. Samples were collected from chemonaive surgically resected tumors, applying local regulations and rules at the collecting site, and included patient consent for molecular analyses, as well as the collection of de-identified data. Samples underwent an independent pathological review by the French MESOPATH reference panel, who determined that of the 120 MPM tumor samples, 79 belonged to the MME type, 26 were MMB and 15 were MMS. Of the 105 samples with an epithelioid component (79 MME and 26 MMB), solid, acinar, trabecular and tubulopapillary architectural patterns were the most frequent in the series (n = 37, 31, 16 and 14, respectively).
Discovery and intratumoral heterogeneity cohorts
Among the 123 patients with MPM, 13 had two tumor specimens collected for the study of intratumoral heterogeneity (ITH). The one with the highest tumor content, estimated by pathological review, was selected for this descriptive study and is reported in Supplementary Tables 1–3, and the other region is described in Supplementary Tables 50–52. Additionally, three patients have been reported as nonchemonaive and they were excluded from the analyses except if explicitly mentioned otherwise in the Methods.
Pathological review
For all 136 samples (123 tumors plus 13 additional regions), a hematoxylin and eosin stain from a representative formalin-fixed, paraffin embedded block was collected for pathological review. Our pathologist (F.G.-S.) performed a detailed pathological review and classified all tumors according to the 2015 WHO classification63,64. The hematoxylin and eosin stain was also used to assess the quality of the frozen material selected for molecular analyses and to confirm that all frozen samples were at least 70% tumor cells.
Artificial Intelligence analysis
Whole-slide image-based artificial intelligence prognostic scores were computed using the artificial intelligence MesoNet model based on morphological features, developed by Owkin—an artificial intelligence for medical research company15.
Statistical analyses
All analyses were performed in R version 4.1.2. All tests involving multiple comparisons were adjusted using the Benjamini–Hochberg procedure, controlling the false discovery rate using the p.adjust R function (stats package version 3.4.4). To limit false discoveries, we took a conservative q value threshold of 0.05. In addition, in line with the American Statistical Association statement on the misuse of P values65, which intends to ‘steer research into a “post P < 0.05 era”’, we report all P and q values, even those that may be closer to arbitrary thresholds such as the 5% threshold. To improve the reproducibility of our results, we summarize in Supplementary Tables 21 and 22 all P and q values reported in the text and main figures, along with details about the tests performed (hypothesis, model and sample size) and replication performed with additional cohorts.
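As an illustration of the multiple-testing adjustment described above, the following minimal Python sketch reimplements the Benjamini–Hochberg step-up adjustment. It is not the code used in the study, which relied on the p.adjust R function; the function name benjamini_hochberg, the example P values and the strict comparison with the 0.05 threshold are assumptions made for illustration.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted q values, in the input order."""
    m = len(p_values)
    # Indices of the P values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonic q values.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        q_values[idx] = running_min
    return q_values


# Hypothetical P values from five tests.
p = [0.001, 0.008, 0.039, 0.041, 0.6]
q = benjamini_hochberg(p)
# Assumption: a test is retained when q is strictly below 0.05.
significant = [qi < 0.05 for qi in q]
```

In this toy example the third and fourth tests have nominal P values below 0.05 but adjusted q values above it, which is the situation the threshold is meant to guard against.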
Survival analysis
Survival analysis was performed using Cox’s proportional hazards model, and the significance of the hazard ratio between the reference and the other levels was evaluated using Wald tests. We assessed the global significance of the model using the logrank test statistic (R package survival version 2.41-3) and drew Kaplan–Meier and forest plots using the R package survminer (version 0.4.2).
DNA extraction
Included samples were extracted using the Gentra Puregene Tissue Kit (4 g) (158667; Qiagen), following the manufacturer’s instructions. All DNA samples were quantified using the fluorometric method (Quant-iT PicoGreen dsDNA Assay; Life Technologies) and assessed for purity by NanoDrop (Thermo Scientific) 260/280 and 260/230 ratio measurements. The DNA integrity of the fresh frozen samples was checked with a TapeStation system (Agilent Biotechnologies) using Genomic DNA ScreenTape (Agilent Biotechnologies).
RNA extraction
Included samples were extracted using the AllPrep DNA/RNA extraction kit (Qiagen) following the manufacturer’s instructions. All RNA samples were treated with DNAse I for 15 min at 30 °C. The RNA integrity of the frozen samples was checked with a TapeStation system (Agilent Biotechnologies) using RNA ScreenTape (Agilent Biotechnologies).
Because of unsuccessful extraction (impacting either the quality or the quantity), we obtained different numbers of MPM samples for which WGS, DNA methylation or RNA sequencing (RNA-seq) data are available (Supplementary Tables 1–3).
DNA sequencing
Sequencing
WGS was performed by the Centre National de Recherche en Génomique Humaine (Institut de Biologie François Jacob, CEA) on 130 fresh frozen MPMs, 54 of which had matched normal tissue or blood samples. We used an Illumina TruSeq DNA PCR-Free Library Preparation Kit (20015963; Illumina) according to the manufacturer’s instructions and sequenced them on a HiSeq X Five platform (Illumina) as paired-end 150-base pair reads. Samples paired with matched normal tissue or blood had a target sequencing depth of 60× and other samples had a target depth of 30×.
Data processing
WGS reads were mapped to the reference genome GRCh38 (with ALT and decoy contigs) using our in-house workflow (https://github.com/IARCbioinfo/alignment-nf; release version 1.0)66. In summary, this workflow relies on the Nextflow domain-specific language67 version 20.10.0.5430 and consists of four steps: read mapping (software BWA68; version 0.7.15), duplicate marking (software samblaster69; version 0.1.24), read sorting (software sambamba70; version 0.6.6) and base quality score recalibration using GATK71 (version 4.0.12).
Variant calling and filtering on DNA
We performed somatic variant calling using the software Mutect2 (ref. 72) from GATK version 4.1.5.0, as implemented in our Nextflow workflow (https://github.com/IARCbioinfo/mutect-nf; release version 2.2b). Multiregion samples were processed jointly using the multisample calling mode of Mutect2. We called germline variants with Strelka2 (ref. 73) version 2.9.10-0 using our Nextflow workflow (https://github.com/IARCbioinfo/strelka2-nf; release version 1.2a). Annotation was performed with ANNOVAR74 (16 April 2018) using the GENCODE version 33 annotation, COSMIC version 90 and REVEL databases. To call somatic variants on tumor-only samples (72/115), a similar procedure was performed (Mutect2 tumor-only mode) but including further germline-filtering steps using a random forest classifier.
CNV calling
Somatic CNVs were called using the PURPLE software75 version 2.52, as implemented in our Nextflow workflow (https://github.com/IARCbioinfo/purple-nf; version 1.0). We used a total of 57 matched WGS samples of MPM (including multiregion samples) for benchmarking the tumor-only mode of PURPLE. We ran PURPLE twice for each matched sample: first using the matched WGS normal/tumor pair as input and second using only the tumor WGS sample as input.
Structural variant calling
To identify somatic structural variants, including insertions, deletions, duplications, inversions and translocations, we built a consensus structural variants call set by integrating SvABA76 version 1.1.0, Manta77 version 1.6.0 and DELLY78 version 0.8.3 calls with SURVIVOR79 version 1.0.7. Somatic structural variants (minimum structural variant size = 50 base pairs) identified by at least two callers and single-caller predictions with a minimum read support of 15 pairs (including paired-end and split-read evidence) were included in the consensus set of each matched sample.
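The inclusion rule for the consensus set can be sketched as follows. This is a hypothetical illustration in Python, not the SURVIVOR-based procedure itself: the record layout (keys size, callers and read_support) and the treatment of events without a defined size, such as translocations, are assumptions.

```python
MIN_SV_SIZE = 50                 # minimum structural variant size (base pairs)
MIN_SINGLE_CALLER_SUPPORT = 15   # read pairs (paired-end plus split-read)


def keep_in_consensus(sv):
    """Apply the two-caller or well-supported single-caller rule."""
    size = sv.get("size")
    # Assumption: events without a defined size (for example
    # translocations) are not subject to the size filter.
    if size is not None and size < MIN_SV_SIZE:
        return False
    if len(sv["callers"]) >= 2:
        return True
    return sv["read_support"] >= MIN_SINGLE_CALLER_SUPPORT


# Hypothetical merged calls for one matched sample.
calls = [
    {"id": "a", "size": 1200, "callers": {"Manta", "DELLY"}, "read_support": 6},
    {"id": "b", "size": 300, "callers": {"SvABA"}, "read_support": 22},
    {"id": "c", "size": 300, "callers": {"Manta"}, "read_support": 9},
    {"id": "d", "size": 30, "callers": {"Manta", "SvABA", "DELLY"}, "read_support": 40},
]
kept_ids = [sv["id"] for sv in calls if keep_in_consensus(sv)]
```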
RNA-seq
Sequencing
RNA-seq was performed on 126 fresh frozen MPM samples in the Cologne Center for Genomics, of which 109 MPM samples belonged to the discovery cohort (Supplementary Tables 1–3). Libraries were prepared using the Illumina TruSeq Stranded mRNA Sample Preparation Kit (20020595; Illumina) and the pool was sequenced using an Illumina NovaSeq 6000 sequencing device and a paired-end 100-nucleotide protocol.
Data processing
The 126 raw read files from the MESOMICS cohort and the 21 files from the Iorio and colleagues18 mesothelioma cohort (downloaded from the European Genome-phenome Archive (EGA) and Sequence Read Archive websites; datasets EGAS00001000828 and PRJNA523380, respectively) were processed in three steps using the RNA-seq processing workflow based on the Nextflow language and accessible at https://github.com/IARCbioinfo/RNAseq-nf (release version 2.3)66. Then, reads were realigned locally using ABRA2 (ref. 80; workflow https://github.com/IARCbioinfo/abra-nf; release version 3.0) and base quality scores were recalibrated using GATK (workflow https://github.com/IARCbioinfo/BQSR-nf; release version 1.1). Once processed, expression was quantified using StringTie software (version 2.1.2; Nextflow pipeline accessible at https://github.com/IARCbioinfo/RNAseq-transcript-nf; release version 2.2).
The raw read counts of the 59,607 genes in the expression data matrix, from the MESOMICS, TCGA and Bueno cohorts3,4, from which we removed nonchemonaive samples, were normalized using the variance-stabilizing transform (vst function from R package DESeq2 version 1.14.1); this transformation enables comparisons between samples with different library sizes and different variances in expression across genes.
DNA methylation
EPIC 850K methylation array
Epigenome analysis was performed on 119 MPMs (Extended Data Fig. 1 and Supplementary Tables 1–3), two technical replicates and three adjacent normal tissues. Epigenomic studies were performed at the IARC with the Infinium EPIC DNA methylation beadchip platform (Illumina) used for the interrogation of over 850,000 CpG sites (dinucleotides that are the main target for methylation).
Data processing
The resulting IDAT raw data files were preprocessed using the R packages minfi (version 1.34.0) and ENmix (version 1.25.1). Raw data were then normalized using functional normalization (function preprocessFunnorm; minfi), to reduce technical variation within the data, and probe removal steps were performed to ensure reliability and accuracy of the final dataset. This resulted in a normalized, filtered dataset of 781,245 probes for 139 samples. Finally, beta and M values were extracted (functions getBeta and getM; minfi). Nine probes recorded M values of −∞ for at least one sample, and these values were replaced with the next lowest M value in the dataset. The three normal tissues and one remaining technical replicate were then removed from the beta and M matrices for the subsequent analyses. This resulted in 135 samples: 122 for discovery and an additional 13 for ITH analyses.
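The replacement of infinite M values can be illustrated with a short Python sketch. M values of −∞ can arise when a logit-type transform is applied to a value of zero. The sketch assumes that ‘the next lowest M value in the dataset’ means the lowest finite value across the whole matrix; the function name replace_neg_inf and the example matrix are hypothetical.

```python
import math


def replace_neg_inf(m_matrix):
    """Replace -inf entries with the lowest finite value in the matrix."""
    finite = [v for row in m_matrix for v in row if v != -math.inf]
    # Assumption: the floor is taken over the whole dataset,
    # not per probe or per sample.
    floor = min(finite)
    return [[floor if v == -math.inf else v for v in row] for row in m_matrix]


# Hypothetical M values: rows are probes, columns are samples.
m_values = [
    [-math.inf, -3.2, 1.0],
    [0.5, -6.1, 2.4],
]
cleaned = replace_neg_inf(m_values)
```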
CIMP index
A CIMP index value was calculated for all samples as follows. The mean beta value across all probes located within CpG islands was calculated per sample, resulting in beta values for 24,891 CpG islands for the MESOMICS cohort (EPIC array) and 24,924 CpG islands for the TCGA4 cohort and the Iorio and colleagues18 cell lines (HM450K array). The CIMP index was then calculated as the proportion of these 24,891 or 24,924 islands with ≥30% methylation (beta value ≥ 0.3) per sample.
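The two-step calculation above (mean beta value per CpG island, then the proportion of islands at or above 0.3) can be sketched for a single sample as follows. This is a minimal Python illustration; the function name cimp_index, the probe identifiers and the probe-to-island mapping are hypothetical.

```python
from statistics import mean


def cimp_index(probe_betas, probe_to_island, threshold=0.3):
    """Proportion of CpG islands whose mean beta value is >= threshold."""
    by_island = {}
    for probe, beta in probe_betas.items():
        island = probe_to_island.get(probe)
        if island is not None:  # probes outside CpG islands are ignored
            by_island.setdefault(island, []).append(beta)
    island_means = [mean(betas) for betas in by_island.values()]
    return sum(b >= threshold for b in island_means) / len(island_means)


# Hypothetical beta values for one sample.
betas = {"cg01": 0.2, "cg02": 0.6, "cg03": 0.05, "cg04": 0.15,
         "cg05": 0.8, "cg06": 0.1, "cg07": 0.2, "cg08": 0.9}
islands = {"cg01": "A", "cg02": "A", "cg03": "B", "cg04": "B",
           "cg05": "C", "cg06": "D", "cg07": "D"}  # cg08 is not in an island
index = cimp_index(betas, islands)
```

Here islands A and C have mean beta values of 0.4 and 0.8, so two of the four islands pass the threshold.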
Integrative unsupervised analyses
We performed five series of analyses with different subsets of samples: (1) discovery analyses with all of our discovery cohort (MESOMICS cohort; 120 samples), for which WGS, RNA-seq and/or 850K methylation array data were available; (2) and (3) replication analyses with the already published data from Bueno3 (181 samples after exclusion of nonchemonaive samples) and Hmeljak and colleagues4 (TCGA cohort; 73 samples in the curated list), respectively; (4) combined analyses integrating the MESOMICS, Bueno and TCGA cohorts3,4 with a total of 374 samples; and (5) replication combining cell lines from the Iorio study18 (for which whole-exome sequencing, expression arrays and RNA-seq, 450K methylation arrays and drug responses in the form of half-maximum inhibitory concentration scores are available (21 samples; 265 drugs)) and the de Reyniès5 and Blum et al.7 datasets (for which expression arrays and drug responses are available (38 samples; three drugs)). In addition, some single-omic analyses are also described in this section.
Preprocessing of expression data
We used normalized read count matrices (see the section ‘RNA-seq’) for subsets (1)–(4), encompassing 59,607 genes. Among these genes, those having less than one fragment per kilobase of exon per million mapped fragments (FPKM) difference across the samples were excluded from the unsupervised analyses. Also, to mitigate sex influence on the expression profiles, we removed genes from the sex chromosomes. For each analysis, the top 5,000 most variable genes were selected. Similarly, the 5,000 most variable genes from the normalized array expression of cell lines (see the section ‘Processing of publicly available expression array processing’ in Supplementary Methods) were selected. Whenever several probes were available for the same gene, the one with the highest intensity was selected.
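The feature selection for expression data can be sketched as follows. This minimal Python illustration assumes that ‘less than one FPKM difference across the samples’ refers to the range (maximum minus minimum) of FPKM values; the function name, gene names and values are hypothetical, and the toy example keeps two genes rather than 5,000.

```python
from statistics import variance


def select_top_variable(normalized, fpkm, sex_genes, n_top=5000,
                        min_fpkm_range=1.0):
    """Filter genes, then keep the n_top genes with the highest variance."""
    candidates = []
    for gene, values in normalized.items():
        if gene in sex_genes:
            continue  # remove genes from the sex chromosomes
        # Assumption: "difference across the samples" is the FPKM range.
        if max(fpkm[gene]) - min(fpkm[gene]) < min_fpkm_range:
            continue
        candidates.append((variance(values), gene))
    candidates.sort(key=lambda item: item[0], reverse=True)
    return [gene for _, gene in candidates[:n_top]]


# Hypothetical normalized expression and FPKM values for three samples.
normalized = {"GENE_A": [1, 5, 9], "GENE_B": [2.0, 2.1, 2.2],
              "GENE_C": [4, 6, 8], "GENE_X": [0, 10, 0], "GENE_D": [3, 4, 5]}
fpkm = {"GENE_A": [0.5, 3, 8], "GENE_B": [1.0, 1.2, 1.4],
        "GENE_C": [2, 5, 9], "GENE_X": [0, 50, 0], "GENE_D": [1, 3, 6]}
selected = select_top_variable(normalized, fpkm, sex_genes={"GENE_X"}, n_top=2)
```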
Preprocessing of methylation data
DNA methylation was available for both the MESOMICS and TCGA cohorts. First, we extracted the M values of the CpGs from the MESOMICS, TCGA4, combined MESOMICS/TCGA and Iorio18 cell line cohorts81. We excluded sex chromosome CpGs, CpGs that did not pass quality control (see the section ‘DNA methylation’ in Supplementary Methods) and those having less than 0.1 beta value difference across the (1) 119, (3) 73, (4) 192 and (5) 59 samples. The remaining CpGs were then divided according to their association with promoters, enhancers or the gene body, as annotated in the EPIC 850K array manifest B5 (see the section ‘Regional methylation analysis’ in Supplementary Methods), resulting in three datasets, named MethPro, MethEnh and MethBod, respectively. For each analysis and dataset, the top 5,000 most variable CpGs (calculated from M values) were selected.
Preprocessing of copy number changes
Copy number change data were available for the MESOMICS, TCGA and MPM cell line cohorts. We assessed the global (total) and minor (minor) allele copy number states at the gene level using, respectively, the total (total) and minor (minor) copy number estimate given by PURPLE (see the section ‘CNV calling’) on the hg38 genome for the MESOMICS cohort and SNP array estimates downloaded from the Genomic Data Commons portal for the TCGA–MESO cohort4 and from the Cell Model Passports portal for the MPM cell lines.
For the three analyses, the resulting value assigned to each gene is an average of the copy number estimate of the tumor by taking into account the tumor purity (purity) estimated by PURPLE. To avoid redundancy, genes with exactly the same resulting copy number value in all samples (because of their genome location proximity) were grouped as one single feature in the dataset. Only the genes or groups of genes altered in at least three samples were selected. To ensure continuity of the data, which is technically necessary for the algorithm, the copy number estimates were centered and scaled before being integrated into the MOFA algorithm. For consistency, somatic CNVs occurring on sex chromosomes were removed and the top 5,000 most variable genes or groups of genes were selected to be integrated.
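The grouping, filtering and scaling steps described above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the code used in the study: a gene is treated as ‘altered’ in a sample when its copy number differs from a neutral value of 2, features with zero variance are returned as zeros, and all function names, gene names and values are hypothetical.

```python
from statistics import mean, stdev


def group_identical_profiles(cn_by_gene):
    """Merge genes with exactly the same copy number in all samples."""
    groups = {}
    for gene, profile in cn_by_gene.items():
        groups.setdefault(tuple(profile), []).append(gene)
    return {"|".join(genes): list(profile) for profile, genes in groups.items()}


def altered_in_at_least(features, min_samples=3, neutral=2.0):
    """Keep features altered in at least min_samples samples."""
    # Assumption: "altered" means different from the neutral copy number.
    return {name: prof for name, prof in features.items()
            if sum(v != neutral for v in prof) >= min_samples}


def center_and_scale(profile):
    """Center to mean 0 and scale to unit standard deviation."""
    mu, sd = mean(profile), stdev(profile)
    if sd == 0:
        return [0.0 for _ in profile]
    return [(v - mu) / sd for v in profile]


# Hypothetical purity-adjusted gene-level copy numbers for five samples.
cn = {"CDKN2A": [0.0, 0.0, 1.0, 2.0, 2.0], "CDKN2B": [0.0, 0.0, 1.0, 2.0, 2.0],
      "BAP1": [1.0, 1.0, 1.0, 2.0, 2.0], "GENE_X": [2.0, 2.0, 3.0, 2.0, 2.0]}
features = altered_in_at_least(group_identical_profiles(cn))
scaled = {name: center_and_scale(prof) for name, prof in features.items()}
```

In this toy example CDKN2A and CDKN2B collapse into a single feature, and GENE_X is dropped because it is altered in only one sample.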
Preprocessing of genomic alterations data
Somatic structural variants data were used only for integrative analyses (1) and (4), while somatic mutations were used in all analyses. Each gene, altered by somatic splicing, structural variants or exonic, damaging mutations (see the section ‘Damaging variants and driver detection’ in Supplementary Methods) was integrated in a common dataset. Of note, for missense mutations, we used the REVEL annotation included in ANNOVAR for predicting the pathogenicity of these variants and we used a 0.5 cut-off to restrict to the most likely damaging missense events. We also removed genes altered in fewer than three samples. For consistency, we selected genes in non-sex chromosomes, protein-coding or long noncoding RNA genes, and with expression greater than or equal to 0.01 fragment per kilobase of exon per million mapped fragments (FPKM) in at least one sample of the cohort, to be sure to include genes expressed in mesothelioma. We integrated the resulting datasets as a Boolean variable in the following analyses.
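The construction of the Boolean alteration dataset can be sketched as follows. This Python illustration is hypothetical: the event record layout, the set of event types treated as always damaging and the use of REVEL ≥ 0.5 (rather than > 0.5) are assumptions, and the filters on chromosome, gene type and expression described above are omitted for brevity.

```python
REVEL_CUTOFF = 0.5
MIN_ALTERED_SAMPLES = 3
# Assumption: these event types are always treated as damaging.
ALWAYS_DAMAGING = {"splicing", "structural_variant", "stopgain", "frameshift"}


def is_damaging(event):
    """Missense events need a REVEL score at or above the cut-off."""
    if event["type"] == "missense":
        revel = event.get("revel")
        return revel is not None and revel >= REVEL_CUTOFF
    return event["type"] in ALWAYS_DAMAGING


def boolean_alteration_matrix(events, samples):
    """Gene-by-sample Boolean matrix, dropping rarely altered genes."""
    altered = {}
    for event in events:
        if is_damaging(event):
            altered.setdefault(event["gene"], set()).add(event["sample"])
    return {gene: [s in hit for s in samples]
            for gene, hit in altered.items()
            if len(hit) >= MIN_ALTERED_SAMPLES}


# Hypothetical somatic events in four samples.
samples = ["S1", "S2", "S3", "S4"]
events = [
    {"gene": "BAP1", "sample": "S1", "type": "frameshift"},
    {"gene": "BAP1", "sample": "S2", "type": "structural_variant"},
    {"gene": "BAP1", "sample": "S3", "type": "missense", "revel": 0.8},
    {"gene": "NF2", "sample": "S1", "type": "stopgain"},
    {"gene": "NF2", "sample": "S2", "type": "missense", "revel": 0.2},
    {"gene": "NF2", "sample": "S4", "type": "splicing"},
]
matrix = boolean_alteration_matrix(events, samples)
```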
Multiomic integrative analyses
To provide an integrative low-dimensional summary of the molecular variation across the samples, we performed continuous latent factors identification using the software MOFA (R package MOFA2, version 1.7.0). Indeed, MOFA is able to integrate different molecular datasets (layers) by generating independent continuous variables, named latent factors, that explain most variation from the joint datasets. In total, we performed five analyses: (1) MOFA–MESOMICS (n = 120; Fig. 1 and Extended Data Fig. 1a); (2) MOFA–Bueno (n = 181; Extended Data Fig. 1c); (3) MOFA–TCGA (n = 73; Extended Data Fig. 1b); (4) MOFA–3 cohorts (n = 374; Extended Data Fig. 1d) and (5) MOFA–cell lines, as described above (n = 59; Supplementary Fig. 4). Additionally, we ran MOFA on our discovery cohort, including the ITH samples (MOFA–ITH; n = 134) to evaluate the ITH within MPM samples.
MOFA was performed independently for each analysis, setting the number of latent factors to ten (function runMOFA from the R package MOFA2). A summary of all of these runs is given in Extended Data Figs. 1 and 2, Fig. 1 and Supplementary Figs. 1 and 4 and coordinates and proportions of variance explained for models (1)–(4) are given in Supplementary Tables 4–8, while those for MOFA–ITH are given in Supplementary Tables 50–52 and those for the cell lines (model (5)) are given in Supplementary Tables 23–26. A comparison with other multiomic methods is provided in Extended Data Fig. 10 (see section 'Multiomic integrative analyses details' in Supplementary Methods).
Evolutionary tumor trade-off analyses
Pareto task identification
The Pareto front model was fitted to different sets of samples using the ParetoTI R package (https://github.com/vitkl/ParetoTI; release version 0.1.13), following the above-mentioned analyses (1)–(4), and additionally on two different kinds of molecular maps: using MOFA (restricting to LF1, LF2, LF3 and LF4) and using expression principal component analysis as technical validation (see the section ‘RNA-seq’). In brief, the algorithm tries to find polyhedra by testing successively 1 to n axes, adding them one after another in decreasing order of transcriptomic variance explained. For this technical reason, the MOFA latent factors were ordered as follows by decreasing transcriptomic variance explained: morphology factor (LF2), adaptive response factor (LF3), CIMP factor (LF4) and ploidy factor (LF1). For each number n of axes used, ParetoTI identifies the position of the n + 1 = k vertices (archetypes) in the molecular map defined, and we used 200 bootstraps, each taking 75% of the data to measure the variability in archetype position and infer archetype positions robust to outliers (function fit_pch_bootstrap with the parameters bootstrap = T and bootstrap_N = 200; see our code at https://github.com/IARCbioinfo/MESOMICS_data/blob/main/phenotypic_map/MESOMICS/PhenotypicMap_MESOMICS.md).
Interpretation of tumor archetypes
To further characterize the phenotype of each archetype, we used the proportion of each archetype for each sample estimated by ParetoTI. These proportions were used as continuous variables to further test the association between each archetype and clinical, epidemiological and morphological variables, as well as molecular data (Supplementary Tables 27–30).
More specifically, we inferred each archetype phenotype by performing IGSEA on the expression data. To do so, we used the ActivePathways R package (https://github.com/reimandlab/ActivePathways; release version 1.1.0), which is a tool able to integrate different sources of molecular variation to assess the enrichment of Gene Ontology terms by combining P values from different association tests between sources and gene-level data. Here we integrated these proportions as different axes of molecular variation. We restricted the Gene Ontology terms to a minimum size of 20 genes and a maximum size of 1,000 genes as the default parameters of ActivePathways. To infer the pathways specifically altered in each archetype, we integrated, for each gene from the expression matrix of 59,607 genes, the P value of the Pearson correlation between its expression and the proportion of each archetype, and we selected the pathways for which the enrichment source only corresponded to the tested archetype. We performed two kinds of analyses: one restricted to the genes positively correlated with the proportion (to obtain the upregulated pathways) and the other restricted to the negatively correlated genes (to identify the downregulated pathways).
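The split into positively and negatively correlated genes can be sketched as follows. This minimal Python illustration covers only the sign of the Pearson correlation between each gene and one archetype proportion; the P value computation and the ActivePathways enrichment are not reproduced, genes with constant expression are skipped by assumption, and all names and values are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient, or None if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None
    return sxy / sqrt(sxx * syy)


def split_by_direction(expression, proportion):
    """Separate genes by the sign of their correlation with a proportion."""
    up, down = [], []
    for gene, values in expression.items():
        r = pearson_r(values, proportion)
        if r is None:
            continue  # assumption: constant genes are not tested
        if r > 0:
            up.append(gene)
        elif r < 0:
            down.append(gene)
    return up, down


# Hypothetical archetype proportions and expression for four samples.
proportion = [0.1, 0.4, 0.7, 0.9]
expression = {"GENE_A": [1, 2, 3, 4], "GENE_B": [9, 7, 4, 2],
              "GENE_C": [5, 5, 5, 5]}
up_genes, down_genes = split_by_direction(expression, proportion)
```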
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Online content
Any methods, additional references, Nature Portfolio reporting summaries, source data, extended data, supplementary information, acknowledgements, peer review information; details of author contributions and competing interests; and statements of data and code availability are available at 10.1038/s41588-023-01321-1.
Acknowledgements
The MESOMICS project is part of the Rare Cancers Genomics initiative (www.rarecancersgenomics.com) led by the Rare Cancers Genomics team at the IARC (https://www.iarc.who.int/teams-gem-rcg/). We thank the patients for donating tumor specimens. The human biological samples and associated data were obtained from the French MESOBANK. We also thank R. Argelaguet for advice on using MOFA, H. Begueret, N. Rousseau, D. Bozonnet, E. Wasielewski, G. Clapisson, C. Bonnetaud, K. Washetine, A. Lupo Mansuet, C. Cuenin and E. Clermont for their contribution to the biorepository. We acknowledge the American Association for Cancer Research and its financial and material support in the development of the AACR Project GENIE registry, as well as members of the consortium for their commitment to data sharing. Interpretations are the responsibility of the study authors. The results published here are in part based on data generated by the TCGA Research Network (https://www.cancer.gov/tcga). We also thank the French National Mesothelioma Surveillance Program and Santé Publique France. This work has been funded by the French National Cancer Institute (PRT-K 2016-039 to L.F.-C. and M.F.) and the Ligue Nationale Contre le Cancer (LNCC 2017 and 2020 to L.F.-C. and M.F.). L.M. has a fellowship from the Ligue Nationale Contre le Cancer. This work also benefited from support from the France Génomique national infrastructure, funded as part of the Investissements d’Avenir program managed by the Agence Nationale de la Recherche (contract ANR-10-INBS-09).
Other funding was provided by the Spanish Ministry of Science and Innovation (PID2019‐105201RB‐I00 to J.P.C.), the Instituto de Salud Carlos III, co‐funded by the European Union (ERDF/ESF; Investing in Your Future), a Sara Borrell postdoctoral grant (CD19/00255 to A.I.-C.), the Spanish Ministry of Universities (predoctoral contract FPU18/02275 to R.B.-E.), the Junta de Andalucía (BIO‐0139) and the Universidad de Córdoba-FEDER (UCO-202099901918904) (to J.P.C. and A.I.-C.), a GETNE2019 Research grant to J.P.C. and the CIBER Fisiopatología de la Obesidad y Nutrición (CIBER is an initiative of the Instituto de Salud Carlos III). We finally thank the reviewers and the editor for taking the time to provide very useful and constructive feedback.
Author contributions
L.F.-C. and M.F. conceived the study idea. L.F.-C., M.F., L.M., N.A., A.D.G. and A.S.-O. developed the study methodology. L.M., N.A., A.D.G., A.S.-O., A.G.-P., A.K., E.N.B. and C.V. developed software. L.M., N.A., A.D.G. and A.S.-O. validated the results. L.M., N.A., A.D.G., A.S.-O., A.G.-P., A.K., E.N.B., C.V., M.A., C.M., P.C., A.G.-P. and F.G.-S. performed the formal analyses. L.M., N.A., A.D.G., A.S.-O., A.G.-P., A.K., E.N.B., J.K., X.L., R.B.-E., A.I.-C., J.P.C., C. Giacobi, M.A., L.S., T.M.D., A.P., C.M. and P.C. performed the investigation. N.L.S., S.B., S.T.-E., F.D., M.B., M.-C.C., S.G.-C., D.D., C. Girard, V.H., P.H., J. Mouroux, C. Cohen, S. Lacomme, J. Mazieres, V.T.d.M., C.P., G.P., N.R., I.R., C.S., A.S., F.T., J.-M.V., A.G.S.I., R.O., V.M., S. Lantuejoul and F.G.-S. provided resources. C. Cuenin performed the methylation experiments. L.M., N.A., A.D.G., A.S.-O. and C.V. curated the data. L.F.-C., M.F., L.M., N.A., A.D.G. and A.S.-O. wrote the original draft of the manuscript. L.F.-C., M.F., L.M., N.A., A.D.G., A.S.-O., A.G.-P., A.K., D.J., H.H.-V., C. Caux, N.G., N.L.-B., L.B.A. and F.G.-S. reviewed and edited the manuscript. L.M., N.A., A.D.G. and A.S.-O. created the visualizations of the results. L.F.-C., M.F. and N.A. supervised the project. L.F.-C., M.F., L.M., N.A., M.-C.M., A.B.-A., J.-F.D., J.A., P.N. and A.G. administered the project. L.F.-C., M.F. and N.A. acquired funding. L.F.-C., M.F., L.M., N.A. and A.S.-O. revised the manuscript.
Peer review
Peer review information
Nature Genetics thanks the anonymous reviewers for their contribution to the peer review of this work. Peer reviewer reports are available.
Data availability
The genome sequencing, RNA-seq and methylation data have been deposited in the EGA database, which is hosted at the European Bioinformatics Institute and Centre for Genomic Regulation under accession number EGAS00001004812. Because raw omics datasets derived from humans are at risk of re-identification when combined with information from other public sources, access must be requested from the MESOMICS data access committee, as detailed at https://ega-archive.org/studies/EGAS00001004812. Minimum datasets of processed somatic alterations for genomic, transcriptomic and epigenomic data, sufficient to reproduce, interpret and extend our main results, are publicly available at https://github.com/IARCbioinfo/MESOMICS_data/tree/main/phenotypic_map/MESOMICS. A data note manuscript detailing all of the quality controls of the dataset is available at https://www.biorxiv.org/content/10.1101/2022.07.06.499003v1 (ref. 82). TCGA whole-exome sequencing, RNA-seq and methylation array data are available from the Genomic Data Commons portal (TCGA–MESO cohort4). Whole-exome sequencing and RNA-seq data from the Bueno and colleagues cohort3 are available from the EGA under accession number EGAS00001001563. Small variant lists, RNA-seq, expression array and methylation data for the Iorio and colleagues cohort18 are available from the Gene Expression Omnibus (accession number GSE29354), EGA (accession number EGAS00001000828) and Sequence Read Archive (accession number PRJNA523380). Corresponding drug responses are available from the cancerrxgene.org website (https://www.cancerrxgene.org/downloads/drug_data?tissue=MESO; accessed July 2021). Expression array data for the de Reyniès and colleagues cohort5 are available from the ArrayExpress platform (E-MTAB-1719) and corresponding drug response data are available from the supplementary material of Blum et al.7. 
All of the other data supporting the findings of this study are available within the article and its Supplementary Information files. Further information and requests for resources should be directed to and will be fulfilled by M.F. (follm@iarc.who.int). Source data are provided with this paper.
Code availability
All bioinformatics pipelines are available at https://github.com/IARCbioinfo (see Methods for details about which pipelines and versions were used for each analysis). A detailed R notebook allowing reproduction of the MOFA and Pareto tumor task inference results for the MESOMICS cohort is available at https://github.com/IARCbioinfo/MESOMICS_data.
Competing interests
Where authors are identified as personnel of the IARC/WHO, the authors alone are responsible for the views expressed in this article and they do not necessarily represent the decisions, policy or views of the IARC/WHO. Where authors are identified as personnel of the Centre de Recherche en Cancérologie de Lyon, the authors declare no competing interests. A.S. participated in expert boards and clinical trials with AstraZeneca, Bristol-Myers Squibb, MSD and Roche. N.G. declares consultancy for and research support from Bristol-Myers Squibb, AstraZeneca, Roche and MSD. S. Lantuejoul declares research support from AstraZeneca, Sanofi, Bristol-Myers Squibb, Janssen and Eli Lilly and has participated in expert boards for MSD and Bristol-Myers Squibb. D.D. declares research support from AstraZeneca. J. Mazieres declares consultancy for and research support from Roche, AstraZeneca, Bristol-Myers Squibb, MSD and Pierre Fabre. M.B. declares consultancy for and research support from AstraZeneca, Bristol-Myers Squibb and Amgen. I.R. participated in expert boards for AstraZeneca, MSD and Bristol-Myers Squibb. L.B.A. is a compensated consultant and has equity interest in io9. His spouse is an employee of Biotheranostics. L.B.A. is also an inventor on a US patent (10,776,718) relating to source identification by non-negative matrix factorization. E.N.B. and L.B.A. declare US provisional patent applications with the serial numbers 63/289,601 and 63/269,033. L.B.A. declares US provisional patent applications with the serial numbers 63/366,392, 63/367,846 and 63/412,835. C.M. is employed by and has equity interest in Owkin. C.M., P.C. and F.G.-S. are inventors on the US patent 17185924 ‘Systems and methods for mesothelioma feature detection and enhanced prognosis or response to treatment’. All other authors declare no competing interests.
Footnotes
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
These authors contributed equally: Lise Mangiante, Nicolas Alcala, Alexandra Sexton-Oates, Alex Di Genova.
These authors jointly supervised this work: Nicolas Alcala, Matthieu Foll, Lynnette Fernandez-Cuesta.
Contributor Information
Matthieu Foll, Email: follm@iarc.who.int.
Lynnette Fernandez-Cuesta, Email: fernandezcuestal@iarc.who.int.
Extended data
is available for this paper at 10.1038/s41588-023-01321-1.
Supplementary information
The online version contains supplementary material available at 10.1038/s41588-023-01321-1.
References
- 1. Carbone M, et al. Mesothelioma: scientific clues for prevention, diagnosis, and therapy. CA Cancer J. Clin. 2019;69:402–429. doi: 10.3322/caac.21572.
- 2. WHO Classification of Tumours, Thoracic Tumours (5th edn) (International Agency for Research on Cancer, 2020).
- 3. Bueno R, et al. Comprehensive genomic analysis of malignant pleural mesothelioma identifies recurrent mutations, gene fusions and splicing alterations. Nat. Genet. 2016;48:407–416. doi: 10.1038/ng.3520.
- 4. Hmeljak J, et al. Integrative molecular characterization of malignant pleural mesothelioma. Cancer Discov. 2018;8:1548–1565. doi: 10.1158/2159-8290.CD-18-0804.
- 5. De Reyniès A, et al. Molecular classification of malignant pleural mesothelioma: identification of a poor prognosis subgroup linked to the epithelial-to-mesenchymal transition. Clin. Cancer Res. 2014;20:1323–1334. doi: 10.1158/1078-0432.CCR-13-2429.
- 6. Alcala N, et al. Redefining malignant pleural mesothelioma types as a continuum uncovers immune–vascular interactions. EBioMedicine. 2019;48:191–202. doi: 10.1016/j.ebiom.2019.09.003.
- 7. Blum Y, et al. Dissecting heterogeneity in malignant pleural mesothelioma through histo-molecular gradients for clinical applications. Nat. Commun. 2019;10:1333. doi: 10.1038/s41467-019-09307-6.
- 8. Nicholson AG, et al. EURACAN/IASLC proposals for updating the histologic classification of pleural mesothelioma: towards a more multidisciplinary approach. J. Thorac. Oncol. 2020;15:29–49. doi: 10.1016/j.jtho.2019.08.2506.
- 9. Fernandez-Cuesta L, Mangiante L, Alcala N, Foll M. Challenges in lung and thoracic pathology: molecular advances in the classification of pleural mesotheliomas. Virchows Arch. 2021;478:73–80. doi: 10.1007/s00428-020-02980-9.
- 10. Cortés-Ciriano I, et al. Comprehensive analysis of chromothripsis in 2,658 human cancers using whole-genome sequencing. Nat. Genet. 2020;52:331–341. doi: 10.1038/s41588-019-0576-7.
- 11. ICGC/TCGA Pan-Cancer Analysis of Whole Genomes Consortium. Pan-cancer analysis of whole genomes. Nature. 2020;578:82–93. doi: 10.1038/s41586-020-1969-6.
- 12. Quinton RJ, et al. Whole-genome doubling confers unique genetic vulnerabilities on tumour cells. Nature. 2021;590:492–497. doi: 10.1038/s41586-020-03133-3.
- 13. Creaney J, et al. Comprehensive genomic and tumour immune profiling reveals potential therapeutic targets in malignant pleural mesothelioma. Genome Med. 2022;14:58. doi: 10.1186/s13073-022-01060-8.
- 14. Argelaguet R, et al. MOFA+: a statistical framework for comprehensive integration of multi-modal single-cell data. Genome Biol. 2020;21:111. doi: 10.1186/s13059-020-02015-1.
- 15. Courtiol P, et al. Deep learning-based classification of mesothelioma improves prediction of patient outcome. Nat. Med. 2019;25:1519–1525. doi: 10.1038/s41591-019-0583-3.
- 16. Baylin SB, Jones PA. Epigenetic determinants of cancer. Cold Spring Harb. Perspect. Biol. 2016;8:a019505. doi: 10.1101/cshperspect.a019505.
- 17. Sondka Z, et al. The COSMIC cancer gene census: describing genetic dysfunction across all human cancers. Nat. Rev. Cancer. 2018;18:696–705. doi: 10.1038/s41568-018-0060-1.
- 18. Iorio F, et al. A landscape of pharmacogenomic interactions in cancer. Cell. 2016;166:740–754. doi: 10.1016/j.cell.2016.06.017.
- 19. Hausser J, Alon U. Tumour heterogeneity and the evolutionary trade-offs of cancer. Nat. Rev. Cancer. 2020;20:247–257. doi: 10.1038/s41568-020-0241-6.
- 20. Hausser J, et al. Tumor diversity and the trade-off between universal cancer tasks. Nat. Commun. 2019;10:5423. doi: 10.1038/s41467-019-13195-1.
- 21. Turini S, Bergandi L, Gazzano E, Prato M, Aldieri E. Epithelial to mesenchymal transition in human mesothelial cells exposed to asbestos fibers: role of TGF-β as mediator of malignant mesothelioma development or metastasis via EMT event. Int. J. Mol. Sci. 2019;20:150. doi: 10.3390/ijms20010150.
- 22. Shipony Z, et al. Dynamic and static maintenance of epigenetic memory in pluripotent and somatic cells. Nature. 2014;513:115–119. doi: 10.1038/nature13458.
- 23. Chapel DB, et al. MTAP immunohistochemistry is an accurate and reproducible surrogate for CDKN2A fluorescence in situ hybridization in diagnosis of malignant pleural mesothelioma. Mod. Pathol. 2020;33:245–254. doi: 10.1038/s41379-019-0310-0.
- 24. Alexandrov LB, et al. The repertoire of mutational signatures in human cancer. Nature. 2020;578:94–101. doi: 10.1038/s41586-020-1943-3.
- 25. Steele CD, et al. Signatures of copy number alterations in human cancer. Nature. 2022;606:984–991. doi: 10.1038/s41586-022-04738-6.
- 26. Bergstrom EN, et al. Mapping clustered mutations in cancer reveals APOBEC3 mutagenesis of ecDNA. Nature. 2022;602:510–517. doi: 10.1038/s41586-022-04398-6.
- 27. Ladan MM, van Gent DC, Jager A. Homologous recombination deficiency testing for BRCA-like tumors: the road to clinical validation. Cancers. 2021;13:1004. doi: 10.3390/cancers13051004.
- 28. Toh M, Ngeow J. Homologous recombination deficiency: cancer predispositions and treatment implications. Oncologist. 2021;26:e1526–e1537. doi: 10.1002/onco.13829.
- 29. Ghafoor A, et al. Phase 2 study of olaparib in malignant mesothelioma and correlation of efficacy with germline or somatic mutations in BAP1 gene. JTO Clin. Res Rep. 2021;2:100231. doi: 10.1016/j.jtocrr.2021.100231.
- 30. Martínez-Jiménez F, et al. A compendium of mutational cancer driver genes. Nat. Rev. Cancer. 2020;20:555–572. doi: 10.1038/s41568-020-0290-x.
- 31. De Rienzo A, et al. Gender-specific molecular and clinical features underlie malignant pleural mesothelioma. Cancer Res. 2016;76:319–328. doi: 10.1158/0008-5472.CAN-15-0751.
- 32. Kato S, et al. Genomic landscape of malignant mesotheliomas. Mol. Cancer Ther. 2016;15:2498–2507. doi: 10.1158/1535-7163.MCT-16-0229.
- 33. Shukuya T, et al. Identification of actionable mutations in malignant pleural mesothelioma. Lung Cancer. 2014;86:35–40. doi: 10.1016/j.lungcan.2014.08.004.
- 34. Mansfield AS, et al. Neoantigenic potential of complex chromosomal rearrangements in mesothelioma. J. Thorac. Oncol. 2019;14:276–287. doi: 10.1016/j.jtho.2018.10.001.
- 35. McLoughlin KC, Kaufman AS, Schrump DS. Targeting the epigenome in malignant pleural mesothelioma. Transl. Lung Cancer Res. 2017;6:350–365. doi: 10.21037/tlcr.2017.06.06.
- 36. Pastorino S, et al. A subset of mesotheliomas with improved survival occurring in carriers of BAP1 and other germline mutations. J. Clin. Oncol. 2018;36:3485–3494. doi: 10.1200/JCO.2018.79.0352.
- 37. Hylebos M, et al. Molecular analysis of an asbestos-exposed Belgian family with a high prevalence of mesothelioma. Fam. Cancer. 2018;17:569–576. doi: 10.1007/s10689-018-0095-1.
- 38. Bielski CM, et al. Genome doubling shapes the evolution and prognosis of advanced cancers. Nat. Genet. 2018;50:1189–1195. doi: 10.1038/s41588-018-0165-1.
- 39. Turcan S, et al. IDH1 mutation is sufficient to establish the glioma hypermethylator phenotype. Nature. 2012;483:479–483. doi: 10.1038/nature10866.
- 40. Margueron R, Reinberg D. The Polycomb complex PRC2 and its mark in life. Nature. 2011;469:343–349. doi: 10.1038/nature09784.
- 41. Zauderer MG, et al. A randomized phase II trial of adjuvant galinpepimut-S, WT-1 analogue peptide vaccine, after multimodality therapy for patients with malignant pleural mesothelioma. Clin. Cancer Res. 2017;23:7483–7489. doi: 10.1158/1078-0432.CCR-17-2169.
- 42. Phipps AI, et al. Association between molecular subtypes of colorectal cancer and patient survival. Gastroenterology. 2015;148:77–87.e2. doi: 10.1053/j.gastro.2014.09.038.
- 43. Malta TM, et al. Glioma CpG island methylator phenotype (G-CIMP): biological and clinical implications. Neuro. Oncol. 2018;20:608–620. doi: 10.1093/neuonc/nox183.
- 44. Sreejit G, et al. The ESAT-6 protein of Mycobacterium tuberculosis interacts with beta-2-microglobulin (β2M) affecting antigen presentation function of macrophage. PLoS Pathog. 2014;10:e1004446. doi: 10.1371/journal.ppat.1004446.
- 45. Zanetti M. Chromosomal chaos silences immune surveillance. Science. 2017;355:249–250. doi: 10.1126/science.aam5331.
- 46. Gerstung M, et al. The evolutionary history of 2,658 cancers. Nature. 2020;578:122–128. doi: 10.1038/s41586-019-1907-7.
- 47. Fujiwara T, et al. Cytokinesis failure generating tetraploids promotes tumorigenesis in p53-null cells. Nature. 2005;437:1043–1047. doi: 10.1038/nature04217.
- 48. López S, et al. Interplay between whole-genome doubling and the accumulation of deleterious alterations in cancer evolution. Nat. Genet. 2020;52:283–293. doi: 10.1038/s41588-020-0584-7.
- 49. Advani SM, et al. Clinical, pathological, and molecular characteristics of CpG island methylator phenotype in colorectal cancer: a systematic review and meta-analysis. Transl. Oncol. 2018;11:1188–1201. doi: 10.1016/j.tranon.2018.07.008.
- 50. Noushmehr H, et al. Identification of a CpG island methylator phenotype that defines a distinct subgroup of glioma. Cancer Cell. 2010;17:510–522. doi: 10.1016/j.ccr.2010.03.017.
- 51. Hughes LAE, et al. The CpG island methylator phenotype: what’s in a name? Cancer Res. 2013;73:5858–5868. doi: 10.1158/0008-5472.CAN-12-4306.
- 52. Moarii M, Reyal F, Vert J-P. Integrative DNA methylation and gene expression analysis to assess the universality of the CpG island methylator phenotype. Hum. Genomics. 2015;9:26. doi: 10.1186/s40246-015-0048-9.
- 53. Maley CC, et al. Classifying the evolutionary and ecological features of neoplasms. Nat. Rev. Cancer. 2017;17:605–619. doi: 10.1038/nrc.2017.69.
- 54. Vendramin R, Litchfield K, Swanton C. Cancer evolution: Darwin and beyond. EMBO J. 2021;40:e108389. doi: 10.15252/embj.2021108389.
- 55. Gould, S. J. & Eldredge, N. Punctuated equilibria: an alternative to phyletic gradualism. In Schopf, T.J.M. Models in Paleobiology 82–115 (Freeman Cooper, 1972).
- 56. Zolondick AA, et al. Asbestos-induced chronic inflammation in malignant pleural mesothelioma and related therapeutic approaches—a narrative review. Precis. Cancer Med. 2021;4:27–27. doi: 10.21037/pcm-21-12.
- 57. Southwood TRE, May RM, Hassell MP, Conway GR. Ecological strategies and population parameters. Am. Nat. 1974;108:791–804. doi: 10.1086/282955.
- 58. Napolitano A, et al. Minimal asbestos exposure in germline BAP1 heterozygous mice is associated with deregulated inflammatory response and increased risk of mesothelioma. Oncogene. 2016;35:1996–2002. doi: 10.1038/onc.2015.243.
- 59. Adashek JJ, Goloubev A, Kato S, Kurzrock R. Missing the target in cancer therapy. Nat. Cancer. 2021;2:369–371. doi: 10.1038/s43018-021-00204-w.
- 60. Gay CM, et al. Patterns of transcription factor programs and immune pathway activation define four major subtypes of SCLC with distinct therapeutic vulnerabilities. Cancer Cell. 2021;39:346–360.e7. doi: 10.1016/j.ccell.2020.12.014.
- 61. Dora D, et al. Neuroendocrine subtypes of small cell lung cancer differ in terms of immune microenvironment and checkpoint molecule distribution. Mol. Oncol. 2020;14:1947–1965. doi: 10.1002/1878-0261.12741.
- 62. Owonikoko TK, et al. YAP1 expression in SCLC defines a distinct subtype with T-cell-inflamed phenotype. J. Thorac. Oncol. 2021;16:464–476. doi: 10.1016/j.jtho.2020.11.006.
- 63. Galateau-Salle F, Churg A, Roggli V, Travis WD, World Health Organization Committee for Tumors of the Pleura. The 2015 World Health Organization Classification of Tumors of the Pleura: advances since the 2004 classification. J. Thorac. Oncol. 2016;11:142–154. doi: 10.1016/j.jtho.2015.11.005.
- 64. WHO Classification of Tumours of the Lung, Pleura, Thymus and Heart (4th edn) (International Agency for Research on Cancer, 2015).
- 65. Wasserstein RL, Lazar NA. The ASA statement on P-values: context, process, and purpose. Am Stat. 2016;70:129–133. doi: 10.1080/00031305.2016.1154108.
- 66. Alcala N, et al. Integrative and comparative genomic analyses identify clinically relevant pulmonary carcinoid groups and unveil the supra-carcinoids. Nat. Commun. 2019;10:3407. doi: 10.1038/s41467-019-11276-9.
- 67. Di Tommaso P, et al. Nextflow enables reproducible computational workflows. Nat. Biotechnol. 2017;35:316–319. doi: 10.1038/nbt.3820.
- 68. Li H, Durbin R. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics. 2009;25:1754–1760. doi: 10.1093/bioinformatics/btp324.
- 69. Faust GG, Hall IM. SAMBLASTER: fast duplicate marking and structural variant read extraction. Bioinformatics. 2014;30:2503–2505. doi: 10.1093/bioinformatics/btu314.
- 70. Tarasov A, Vilella AJ, Cuppen E, Nijman IJ, Prins P. Sambamba: fast processing of NGS alignment formats. Bioinformatics. 2015;31:2032–2034. doi: 10.1093/bioinformatics/btv098.
- 71. Van der Auwera, G. A. & O’Connor, B. D. Genomics in the Cloud: Using Docker, GATK, and WDL in Terra (O’Reilly Media, 2020).
- 72. Benjamin, D. et al. Calling somatic SNVs and indels with Mutect2. Preprint at bioRxiv. doi: 10.1101/861054 (2019).
- 73. Kim S, et al. Strelka2: fast and accurate calling of germline and somatic variants. Nat. Methods. 2018;15:591–594. doi: 10.1038/s41592-018-0051-x.
- 74. Wang K, Li M, Hakonarson H. ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data. Nucleic Acids Res. 2010;38:e164. doi: 10.1093/nar/gkq603.
- 75. Cameron, D. L. et al. GRIDSS, PURPLE, LINX: Unscrambling the tumor genome via integrated analysis of structural variation and copy number. Preprint at bioRxiv. doi: 10.1101/781013 (2019).
- 76. Wala JA, et al. SvABA: genome-wide detection of structural variants and indels by local assembly. Genome Res. 2018;28:581–591. doi: 10.1101/gr.221028.117.
- 77. Chen X, et al. Manta: rapid detection of structural variants and indels for germline and cancer sequencing applications. Bioinformatics. 2016;32:1220–1222. doi: 10.1093/bioinformatics/btv710.
- 78. Rausch T, et al. DELLY: structural variant discovery by integrated paired-end and split-read analysis. Bioinformatics. 2012;28:i333–i339. doi: 10.1093/bioinformatics/bts378.
- 79. Jeffares DC, et al. Transient structural variations have strong effects on quantitative traits and reproductive isolation in fission yeast. Nat. Commun. 2017;8:14061. doi: 10.1038/ncomms14061.
- 80. Mose LE, Perou CM, Parker JS. Improved indel detection in DNA and RNA via realignment with ABRA2. Bioinformatics. 2019;35:2966–2973. doi: 10.1093/bioinformatics/btz033.
- 81. Du P, et al. Comparison of Beta-value and M-value methods for quantifying methylation levels by microarray analysis. BMC Bioinformatics. 2010;11:587. doi: 10.1186/1471-2105-11-587.
- 82. Di Genova A, et al. A molecular phenotypic map of malignant pleural mesothelioma. Gigascience. 2022;12:giac128. doi: 10.1093/gigascience/giac128.
Data Availability Statement
The genome sequencing, RNA-seq and methylation data have been deposited in the European Genome-phenome Archive (EGA), which is hosted at the European Bioinformatics Institute and Centre for Genomic Regulation, under accession number EGAS00001004812. Because raw omics datasets derived from humans are at risk of re-identification when combined with information from other public sources, access must be requested from the MESOMICS data access committee, as detailed at https://ega-archive.org/studies/EGAS00001004812. Minimum datasets of processed somatic alterations for genomic, transcriptomic and epigenomic data, sufficient to reproduce, interpret and extend our main results, are publicly available at https://github.com/IARCbioinfo/MESOMICS_data/tree/main/phenotypic_map/MESOMICS. A data note manuscript detailing all of the quality controls of the dataset is available at https://www.biorxiv.org/content/10.1101/2022.07.06.499003v1 (ref. 82).
TCGA whole-exome sequencing, RNA-seq and methylation array data are available from the Genomic Data Commons portal (TCGA–MESO cohort; ref. 4). Whole-exome sequencing and RNA-seq data from the Bueno and colleagues cohort (ref. 3) are available from the EGA under accession number EGAS00001001563. Small variant lists, RNA-seq, expression array and methylation data for the Iorio and colleagues cohort (ref. 18) are available from the Gene Expression Omnibus (accession number GSE29354), EGA (accession number EGAS00001000828) and Sequence Read Archive (accession number PRJNA523380). Corresponding drug responses are available from the cancerrxgene.org website (https://www.cancerrxgene.org/downloads/drug_data?tissue=MESO; accessed July 2021). Expression array data for the de Reyniès and colleagues cohort (ref. 5) are available from the ArrayExpress platform (E-MTAB-1719) and corresponding drug response data are available from the supplementary material of Blum et al. (ref. 7).
All of the other data supporting the findings of this study are available within the article and its Supplementary Information files. Further information and requests for resources should be directed to and will be fulfilled by M.F. (follm@iarc.who.int). Source data are provided with this paper.
All bioinformatics pipelines are available at https://github.com/IARCbioinfo (see Methods for details about which pipelines and versions were used for each analysis). A detailed R notebook allowing reproduction of the MOFA and Pareto tumor task inference results for the MESOMICS cohort is available at https://github.com/IARCbioinfo/MESOMICS_data.