Abstract
The diagnosis of sinonasal tumors is challenging due to a heterogeneous spectrum of various differential diagnoses as well as poorly defined, disputed entities such as sinonasal undifferentiated carcinomas (SNUCs). In this study, we apply a machine learning algorithm based on DNA methylation patterns to classify sinonasal tumors with clinical-grade reliability. We further show that sinonasal tumors with SNUC morphology are not as undifferentiated as their current terminology suggests but can rather be reassigned to four distinct molecular classes defined by epigenetic, mutational and proteomic profiles. This includes two classes with neuroendocrine differentiation, characterized by IDH2 or SMARCA4/ARID1A mutations with an overall favorable clinical course, one class composed of highly aggressive SMARCB1-deficient carcinomas and another class with tumors that represent potentially previously misclassified adenoid cystic carcinomas. Our findings can aid in improving the diagnostic classification of sinonasal tumors and could help to change the current perception of SNUCs.
Subject terms: Head and neck cancer, DNA methylation, Proteomics, Machine learning
Sinonasal tumour diagnosis can be complicated by the heterogeneity of disease and classification systems. Here, the authors use machine learning to classify sinonasal undifferentiated carcinomas into 4 molecular classes with differences in differentiation state and clinical outcome.
Introduction
Although tumors of the sinonasal region only account for a small fraction of head and neck tumors, they encompass a diverse spectrum of epithelial, mesenchymal and neuroectodermal neoplasms1. The complexity of these tumors presents a major challenge for histopathological diagnosis, even for trained head and neck pathologists2. In fact, tumors of the sinonasal region have been reported to show the highest rate of conflicting diagnoses among all head and neck tumors3.
Sinonasal undifferentiated carcinomas (SNUC) represent an especially challenging diagnosis. SNUCs are aggressive carcinomas that lack a definite lineage-specific differentiation4. For diagnostic evaluation, a variety of other entities have to be excluded, such as poorly differentiated carcinomas or high-grade olfactory neuroblastomas. Histologically, SNUCs by definition lack squamous or glandular differentiation but may show subtle neuroendocrine features and thus may focally resemble neuroendocrine carcinomas5–7. In recent years, molecular analyses of SNUCs have revealed a high rate of IDH2 mutations or alterations of the switch/sucrose non-fermentable (SWI/SNF) complex leading to SMARCB1 or SMARCA4 deficiency8–11. These distinct molecular patterns as well as the occasional morphological and immunohistochemical resemblance to neuroendocrine carcinoma are challenging the current definition of SNUC as a single entity.
DNA methylation is an epigenetic modification of the DNA which regulates gene expression. It plays a significant role in the differentiation of different cell types and it has been shown that DNA methylation patterns are highly tissue-specific12. Although epigenetic alterations represent one of the hallmarks of cancer development, the global DNA methylation signature of tumor cells is thought to contain substantial information about the cell of origin, making DNA methylation an ideal tool for tumor classification13. From a technical perspective, methylated DNA is highly robust (in contrast to alternative molecules such as RNA), enabling the retrospective analysis of formalin-fixed and paraffin embedded (FFPE) samples, almost irrespective of sample age. Using this approach, significant cohorts of even exceedingly rare tumors can be assembled. For these reasons, DNA methylation has shown promising results in the classification of a growing number of malignancies14–18. DNA methylation profiling of olfactory neuroblastomas and a cohort of sinonasal carcinomas showed that IDH2 mutated and SMARCB1 deficient carcinomas likely represent epigenetically distinct classes10,19. Furthermore, it has been suggested that IDH2 mutated neuroendocrine carcinomas and IDH2 mutated SNUCs may represent the same entity, due to their epigenetic similarity10,20.
For this study, we collected a cohort of 395 sinonasal tumors and relevant differential diagnoses encompassing 18 different histologically defined entities as well as normal sinonasal control tissue to elucidate the epigenetic landscape of these tumors. Within this dataset we identified highly robust DNA methylation-based tumor classes. By further integrating mutational profiling and mass spectrometry-based proteomics, we provide sound evidence that tumors with SNUC morphology consist of four distinct epigenetic subclasses which are supported by distinct driver mutation landscapes and protein expression profiles. Furthermore, we provide a machine learning-based algorithm for reliable classification of diagnostic samples which may improve the histopathological diagnosis of challenging cases.
Results
Identification of DNA methylation-based sinonasal tumor classes
To test if DNA methylation-based tumor classification for SNUCs was applicable, we obtained a cohort of 429 high quality DNA methylation profiles of sinonasal tumors and normal tissue. A t-distributed stochastic neighbor embedding (t-SNE) dimensionality reduction and unsupervised clustering of the 20,000 most variable CpG sites were used to assess the optimal number of classes and the best partition. The distribution of the CpG sites selected for class separation showed a uniform chromosomal distribution and did not show any enrichment in the functionally relevant promoter regions when compared to the overall array design (Supplementary Fig. 1). The clustering algorithm identified 34 cases as noise or singularity points that did not correspond to a stable cluster, including a relatively high number of cases that were histologically classified as neuroendocrine carcinomas (11/24) or SNUCs (15/84). Noise data points were excluded, resulting in a final reference set comprising 395 samples, covering 18 tumor entities as defined in the WHO Classification of Head and Neck Tumors as well as normal sinonasal tissue4. The workflow for the compilation of the reference set is also summarized in Supplementary Fig. 2.
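The feature selection step described above can be illustrated with a minimal sketch. It assumes that "most variable" refers to the per-site variance of the beta values across samples, which the text does not specify; the function name, probe identifiers and values are hypothetical and only serve as an illustration.

```python
from statistics import pvariance


def most_variable_sites(beta, n_sites):
    """Return the n_sites CpG identifiers with the highest variance.

    beta maps a CpG identifier to its list of beta values (one per sample).
    """
    ranked = sorted(beta, key=lambda cpg: pvariance(beta[cpg]), reverse=True)
    return ranked[:n_sites]


# Toy data: hypothetical probe IDs and beta values, for illustration only.
toy_beta = {
    "cg_a": [0.10, 0.90, 0.15, 0.85],  # strongly bimodal, high variance
    "cg_b": [0.50, 0.52, 0.49, 0.51],  # nearly constant across samples
    "cg_c": [0.20, 0.60, 0.30, 0.70],  # moderate variance
}

# In the study the 20,000 most variable sites were kept; here we keep 2 of 3.
top_sites = most_variable_sites(toy_beta, 2)
```

The selected sites would then be passed to the dimensionality reduction and clustering steps, which are not reproduced here.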
The t-SNE dimensionality reduction of the final reference set is shown in Fig. 1. A total of 18 distinct and stable epigenetic classes were identified (Supplementary Data 1). We did not observe any batch effects related to possible confounding factors (Supplementary Fig. 3). Iterative random down-sampling with correlation analysis of the t-SNE coordinates indicated a high stability of the classes with a median correlation coefficient of 0.992 (Range: 0.945 to 0.999; Supplementary Fig. 4). 14 classes were equivalent to their conventional histopathological classification as defined in the WHO classification. The remaining four DNA methylation classes included 133 tumors from a spectrum of different histological entities. Notably, all 69 SNUC samples of the reference set were among these 133 tumors. These four SNUC DNA methylation classes were further molecularly characterized (see results below) and based on these findings were assigned the provisional names NEC-like IDH2, SMARCB1, ACC and NEC-like SMARCA4/ARID1A. Summary copy number plots derived from DNA methylation data for all classes are shown in Supplementary Fig. 5.
Reassessment of SNUC classes
To further evaluate tumor specimens that were assigned to the four SNUC classes defined by distinct DNA methylation profiles, we reviewed the available histopathological and molecular data on these cases.
The NEC-like IDH2 class (n = 48) contained tumors that had initially been diagnosed as either SNUCs, olfactory neuroblastomas, neuroendocrine carcinomas or adenocarcinomas (Fig. 2A). Molecular reports for these cases indicated a strong association with IDH2 mutations. Additional mutational analysis confirmed that all cases with available tissue for testing harbored IDH2 R172 hotspot mutations (Fig. 2B). Copy number profiles derived from DNA methylation data showed highly recurrent chromosomal aberrations, including gain of chromosome 1q as well as loss of chromosome 17p in combination with gain of chromosome 17q (Fig. 2C). Furthermore, tumors from this class showed a CpG island hypermethylation phenotype (Supplementary Fig. 6).
The SMARCB1 class (n = 27) consisted of histologically diagnosed SNUCs, neuroendocrine carcinomas, poorly differentiated carcinomas and atypical teratoid/rhabdoid tumors of adults in the sellar region (Fig. 2D)21. Tumors of this group were characterized by recurrent deletion of the SMARCB1 gene locus (21/27; 80%) and subsequent loss of INI1 protein expression in all cases with available tissue (16/16; 100%), including cases where the chromosomal SMARCB1 loss was not identifiable (Fig. 2E). Apart from SMARCB1 loss, we observed no additional highly recurrent chromosomal alterations (Fig. 2F). Based on these findings, we conclude that inactivation of SMARCB1 is the defining alteration for tumors from this DNA methylation subtype.
The ACC class (n = 25) mainly contained conventional adenoid cystic carcinomas (13/25; 52%), but also tumors that were initially diagnosed as adenocarcinomas, poorly differentiated carcinomas and SNUCs by means of conventional diagnostic criteria (Fig. 2G). By reevaluation of histomorphological areas, we found subtle adenoid cystic differentiation in two specimens that had initially been diagnosed as adenocarcinomas. Furthermore, FISH revealed MYB breaks prototypical for adenoid cystic carcinomas in three specimens with SNUC morphology (Fig. 2H). Based on these observations, we concluded that tumors from this class most likely represent histologically misclassified high-grade adenoid cystic carcinomas. Copy number profiling revealed few recurrent alterations, but loss of chromosome 6q was present in 56% (14/25) of samples (Fig. 2I).
The NEC-like SMARCA4/ARID1A class (n = 33) mainly consisted of tumors that had been diagnosed as SNUCs, neuroendocrine carcinomas and olfactory neuroblastomas, but also single adenocarcinomas, poorly differentiated carcinomas and squamous cell carcinomas (Fig. 2J). In our reevaluation, we observed rosette-like histological features in 50% of cases (11/22; Fig. 2K). Furthermore, 60% of these tumors (12/20) showed weak staining of at least one neuroendocrine marker (NSE, Chromogranin, Synaptophysin or CD56) in the initial diagnostic workup. Summary copy number profiles revealed recurrent gain of chromosome 8q in up to 67% (22/33; Fig. 2L). Apart from their characteristic epigenetic profile, our initial review of the available sparse molecular data did not indicate recurrent or characteristic alterations. Further molecular analyses were thus performed.
Mass spectrometry-based proteomics
To identify characteristic protein expression profiles and potential cells of origins for specimens from the four SNUC classes, we performed mass spectrometry-based proteomics. Olfactory neuroblastomas, squamous cell carcinomas and normal sinonasal tissue were used as reference classes. Samples from the SMARCB1 class could not be included in this part of the study due to insufficient quantities of available tumor tissue.
t-SNE analysis of the most variably expressed proteins showed a pattern similar to that found by DNA methylation analysis (Fig. 3A). While normal tissue, squamous cell carcinomas and cases from the ACC tumor class were mostly assigned to distinct groups, the differentiation between olfactory neuroblastomas, cases from the NEC-like IDH2 and NEC-like SMARCA4/ARID1A class was less evident. As expected, differential expression analysis in comparison to normal sinonasal tissue revealed overexpression of classical neuronal proteins in olfactory neuroblastomas (e.g., ENO2). NEC-like IDH2 and NEC-like SMARCA4/ARID1A tumors also demonstrated strong overexpression of proteins specific for neurons or cells of the diffuse neuroendocrine system such as UCHL1, CRMP1 and ENO2 (Fig. 3B), strongly indicating a neuroendocrine differentiation for both tumor classes. This pattern was not seen in tumors from the ACC class. In contrast, cytokeratin 18 (KRT18) was strongly overexpressed in both NEC-like IDH2 and NEC-like SMARCA4/ARID1A cancers but not in olfactory neuroblastoma. This predicts UCHL1 and KRT18 to be a potentially valuable marker combination for the differentiation of olfactory neuroblastomas and the SNUC classes. We performed an immunohistochemical validation of this marker combination (Fig. 3C) and observed strong staining of KRT18 in all cases of the NEC-like IDH2 and NEC-like SMARCA4/ARID1A class, variable staining intensity in the ACC class and no staining in all investigated olfactory neuroblastomas and most squamous cell carcinomas (Fig. 3D). UCHL1 expression was high in NEC-like IDH2 and NEC-like SMARCA4/ARID1A tumors as well as olfactory neuroblastomas but absent in tumors from the ACC class and squamous cell carcinomas. We thus concluded that the combination of both markers could be of diagnostic value for tumor classification.
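The staining pattern reported above can be restated as a small lookup, shown here only to make the logic of the marker combination explicit. This is a sketch under the assumption that both stains are reduced to positive or negative; the function name and return strings are hypothetical, and it is not a validated diagnostic rule.

```python
def suggest_group(krt18_positive, uchl1_positive):
    """Map a KRT18/UCHL1 staining pattern to the groups described in the text.

    Simplification (assumption): each stain is treated as a yes/no result,
    although KRT18 staining intensity was variable in the ACC class.
    """
    if uchl1_positive and krt18_positive:
        # Both markers were strongly expressed in the two NEC-like classes.
        return "NEC-like IDH2 or NEC-like SMARCA4/ARID1A"
    if uchl1_positive:
        # UCHL1 high but no KRT18 staining: pattern of olfactory neuroblastoma.
        return "olfactory neuroblastoma"
    # UCHL1 was absent in the ACC class and in squamous cell carcinomas;
    # KRT18 does not separate these two groups reliably.
    return "ACC class or squamous cell carcinoma"


example = suggest_group(krt18_positive=True, uchl1_positive=True)
```

The two NEC-like classes cannot be separated by this marker pair and would still require mutational or DNA methylation analysis.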
To identify potential cells of origin, differentially expressed proteins from all tumor classes in comparison to normal sinonasal tissue were subjected to overrepresentation analysis using cell type-specific gene sets from previously published single cell RNA sequencing data of the mucosal lining of upper and lower human airways (Fig. 3E)22,23. The neuroectodermal differentiation of specimens from the NEC-like IDH2 (FDR < 0.001) and NEC-like SMARCA4/ARID1A class (FDR < 0.001) as well as olfactory neuroblastomas (FDR < 0.001) was reflected in their similarity with pulmonary neuroendocrine cells (PNEC). Furthermore, olfactory neuroblastomas (FDR 0.004) and tumors from the NEC-like IDH2 group (FDR < 0.001) showed similarity to a class of Undefined Rare Cells, which likely represent progenitor cells of epithelial and neuroendocrine cells22. Tumors from the ACC class mostly resembled serous cells of submucosal glands (Serous; FDR 0.029). As expected, squamous cell carcinoma profiles were closely related to squamous cells (Squamous Cell 1; FDR 0.011).
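For readers unfamiliar with overrepresentation analysis, the following sketch shows one common way to obtain such FDR values: a one-sided hypergeometric test per gene set, followed by Benjamini-Hochberg adjustment. Both choices are assumptions, as the text does not name the test or the correction method, and all counts in the usage example are hypothetical.

```python
from math import comb


def hypergeom_sf(k, M, n, N):
    """P(X >= k) when drawing N items from M, of which n belong to the set."""
    upper = min(n, N)
    hits = sum(comb(n, i) * comb(M - n, N - i) for i in range(k, upper + 1))
    return hits / comb(M, N)


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest to the smallest p-value to enforce monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical numbers: 2000 quantified proteins, 100 of them differentially
# expressed, and three cell type gene sets given as (set size, overlap).
gene_sets = {"set_A": (50, 12), "set_B": (80, 5), "set_C": (40, 2)}
pvals = [hypergeom_sf(k, 2000, n, 100) for n, k in gene_sets.values()]
fdr = dict(zip(gene_sets, benjamini_hochberg(pvals)))
```

A gene set would be reported as enriched when its adjusted value falls below the chosen FDR threshold.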
Additionally, we performed a differential protein expression analysis comparing the ACC, NEC-like IDH2 and NEC-like SMARCA4/ARID1A tumor classes against each other. Protein lists were subjected to functional pathway analysis (Supplementary Fig. 7). Tumors from the NEC-like IDH2 class were enriched for several functional terms related to mitochondrial processes, including proteins related to the citric acid cycle. ACC class tumors showed evidence for alterations in MAPK-related signaling pathways while the few significant functional terms for cases from the NEC-like SMARCA4/ARID1A class were mainly associated with translational processes.
Mutational profiling of the NEC-like SMARCA4/ARID1A methylation class
As a clear driver for the NEC-like SMARCA4/ARID1A molecular tumor class was not apparent in the available retrospective data, we performed whole exome (n = 9) or NGS panel sequencing (n = 10) of 19 tumors from this group. We observed relatively low median tumor mutational burden with 3.7 mutations per megabase (Fig. 4A). High mutational rates were seen in genes involved in the formation of the SWI/SNF chromatin remodeling complex (14/19; 74%), including SMARCA4 (9/19; 47%) and ARID1A (7/19; 37%). Early clinical data suggest that patients with these alterations might benefit from treatment with PD1 inhibitors24. Notably, we observed one case of a young patient in which a SMARCA4 frameshift mutation (p.Q306Rfs*12) was detected in tumor and in adjacent normal tissue, suggesting a germline or mosaic origin for this mutation. Additional recurrent alterations in this tumor class comprised PIK3CA mutations (6/19; 32%), including classical hotspot mutations such as p.H1047 or p.E545, which are known predictive markers for treatment with PI3K pathway inhibitors in breast cancer25,26. Other known pathogenic driver mutations included CTNNB1 (3/19; 16%), TP53 (3/19; 16%) and TSC2 (2/19; 11%).
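The frequencies in this paragraph follow directly from the reported counts, as the short sketch below shows. The helper for tumor mutational burden assumes the usual definition (mutations divided by the sequenced region in megabases); its example inputs are hypothetical and were chosen only to reproduce the order of magnitude reported above.

```python
# Number of mutated cases among the 19 sequenced tumors, as reported above.
mutated_cases = {
    "SWI/SNF complex (any gene)": 14,
    "SMARCA4": 9,
    "ARID1A": 7,
    "PIK3CA": 6,
    "CTNNB1": 3,
    "TP53": 3,
    "TSC2": 2,
}
n_sequenced = 19

# Percentage of mutated cases per gene, rounded to whole percent.
percent = {g: round(100 * c / n_sequenced) for g, c in mutated_cases.items()}


def mutational_burden(n_mutations, covered_megabases):
    """Mutations per megabase of sequenced region (assumed definition)."""
    return n_mutations / covered_megabases


# Hypothetical example: 111 mutations over a 30 Mb target region.
example_tmb = mutational_burden(111, 30)
```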
Clinical implications of DNA methylation classes
To further evaluate the clinical importance of the DNA methylation-based classes, we compared disease-specific survival between the four SNUC classes (Fig. 4B). SMARCB1 class tumors were associated with significantly worse disease-specific survival compared to cases from the NEC-like IDH2 (p = 0.012), NEC-like SMARCA4/ARID1A (p < 0.001) or ACC class (p = 0.004). Best survival rates were seen in the NEC-like SMARCA4/ARID1A group, although there was no significant difference in comparison with the other two classes (vs. ACC: p = 0.163; vs. NEC-like IDH2: p = 0.168). In the ACC methylation class, we observed no significant difference in survival between tumors that were classified as adenoid cystic carcinoma or as SNUC by conventional histopathology (p = 0.5).
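As an illustration of how survival curves of this kind are typically estimated, the sketch below implements a Kaplan-Meier estimator. The use of this estimator is an assumption, since the text does not name the method; the follow-up times and event indicators are invented toy values, not study data, and the test used for the p-values is not reproduced.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each time with an event.

    times: follow-up time per patient.
    events: True if disease-specific death was observed, False if censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Collect all patients whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Both deaths and censored patients leave the risk set after t.
        at_risk -= removed
    return curve


# Toy follow-up data in months (hypothetical).
toy_times = [5, 8, 8, 12, 20]
toy_events = [True, True, False, True, False]
toy_curve = kaplan_meier(toy_times, toy_events)
```

One such curve would be computed per DNA methylation class and the curves then compared with a suitable test.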
Machine learning classifier development
As described above, we used t-SNE and hierarchical clustering to define epigenetic classes. However, t-SNE is not a reliable tool to classify new cases as the position of individual data points can change over different iterations and highly depends on selected parameters and the composition of the cohort. For convenient and rapid classification of raw DNA methylation data in a diagnostic setting, we used the data from the reference set to develop a machine learning algorithm that assigns a given sample to one of the DNA methylation classes. We also implemented a supervised outlier detection designed to recognize and prevent the classification of samples with divergent DNA methylation profiles, such as distant metastases from other organs or entities that have not been included in the development of the classifier. For this, we collected a set of 8065 tumor and normal samples, covering eight different categories (e.g., adenocarcinoma) including 197 different exact diagnoses (e.g., colorectal adenocarcinoma). A subset of 400 cases from this cohort was used for the training of an additional Unknown class. The performance of the classifiers was validated using an independent test set consisting of 52 sinonasal tumors as well as the remaining 7665 non-sinonasal tumors.
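The idea of handling outliers through an additional Unknown class can be sketched as follows. Note that this example deliberately substitutes a nearest-centroid classifier for the support vector machine and random forest models used in the study, purely to keep the sketch self-contained; the two-dimensional feature vectors are invented toy values, and a single centroid is a crude stand-in for a heterogeneous Unknown class.

```python
from math import dist


def fit_centroids(samples, labels):
    """Compute one mean vector (centroid) per class label."""
    sums, counts = {}, {}
    for x, y in zip(samples, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for j, value in enumerate(x):
            acc[j] += value
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}


def predict(centroids, x):
    """Assign x to the class with the closest centroid (Euclidean distance)."""
    return min(centroids, key=lambda y: dist(centroids[y], x))


# Toy training data (hypothetical): two sinonasal classes plus samples from
# other tumor types that are labeled as an explicit Unknown class.
train_x = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8],
           [0.5, 0.5], [0.45, 0.55]]
train_y = ["NEC-like IDH2", "NEC-like IDH2", "SMARCB1", "SMARCB1",
           "Unknown", "Unknown"]

model = fit_centroids(train_x, train_y)
in_class = predict(model, [0.82, 0.12])
outlier = predict(model, [0.5, 0.48])
```

Because Unknown is a regular output class, a divergent profile can be rejected instead of being forced into the most similar sinonasal class.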
To explore the most suitable technique for this classification task, we compared support vector machine27–29 and random forest30 machine learning algorithms, the latter being the current gold standard for DNA methylation-based tumor classification14,16,31. In a hypothetical diagnostic setting, the potential hazard of a false classification is higher than the hazard of an unsuccessful classification. While the random forest achieved higher sensitivity values, the results of the support vector machine were superior with regard to specificity and accuracy. Therefore, the support vector machine will be further described in detail. For comparison, the results of the random forest classification are shown in Supplementary Table 1.
We evaluated the performance of the classifier using three different metrics. The algorithm demonstrated a high specificity of 0.982 (7524/7665) to correctly assign non-sinonasal tumor specimens to the Unknown class. We observed some variation in the specificity in different categories of the non-sinonasal test set (Fig. 5A). The lowest values were observed in salivary gland tumors (0.763) and the highest values in brain tumors (1.0) as well as normal tissue (1.0). Of note, 107 of the 197 exact non-sinonasal diagnoses (55.2%) were exclusively present in the test and not in the reference set. The classifier achieved only slightly higher specificities for diagnoses that were included in both sets (6,402/6,492; 0.986) compared to diagnoses that were exclusive to the test set (1,122/1,173; 0.957), demonstrating its reliability to recognize unseen data types. The overall sensitivity to identify primary sinonasal tumors was 0.904 (47/52; Fig. 5B). In 39 of the 47 sinonasal tumor specimens (83%), the DNA methylation-based classification confirmed the initial histopathological diagnosis. Two SCCs were assigned to the LECA and the NUT DNA methylation class and the molecular classification was confirmed by positive EBV-encoded RNA (EBER) in-situ hybridization and positive RNA-based NUTM1 fusion analysis, respectively. Furthermore, the sinonasal tumor set also contained six SNUC specimens. Five of these were classified as NEC-like IDH2 and subsequent mutational analysis revealed the presence of an IDH2 R172 mutation in all cases. The remaining SNUC specimen was assigned to the NEC-like SMARCA4/ARID1A class and DNA sequencing confirmed a truncating SMARCA4 mutation (p.Q611*). Furthermore, all six samples showed strong expression of UCHL1. Thus, the molecular workup confirmed the DNA methylation-based diagnosis in all reclassified cases.
The classifier, therefore, achieved an accuracy of 1.0 on the sinonasal validation cohort and led to a revision or refinement of the initial diagnosis in 17% of cases (Fig. 5C, D).
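The performance figures reported above can be recomputed from the stated counts, as the following short sketch shows. It uses only numbers given in the text; the count for diagnoses exclusive to the test set is taken as 1,122 of 1,173, which is the value at which the two subsets add up to the overall 7,524 of 7,665. Variable names are illustrative.

```python
def proportion(correct, total):
    """Fraction of correctly handled cases."""
    return correct / total


# Non-sinonasal test samples correctly assigned to the Unknown class.
specificity_overall = proportion(7524, 7665)
specificity_seen = proportion(6402, 6492)     # diagnoses also in training
specificity_unseen = proportion(1122, 1173)   # diagnoses only in the test set

# Sinonasal test samples that received a class prediction.
sensitivity = proportion(47, 52)

# Classified sinonasal samples whose initial diagnosis was confirmed.
concordance = proportion(39, 47)
revised_fraction = 1 - concordance

# The two subsets of the non-sinonasal test set add up to the overall counts.
subsets_consistent = (6402 + 1122 == 7524) and (6492 + 1173 == 7665)
```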
A web platform which provides convenient access to the classification algorithm can be accessed at www.aimethylation.com.
Discussion
In this study, we provide a resource of DNA methylation profiles from a diverse cohort of sinonasal tumors and present a machine learning algorithm for a robust classification of these diagnostically challenging tumors. Using DNA methylation profiling, DNA sequencing, copy number analysis and mass spectrometry-based proteomics, we show that tumors with SNUC morphology are not as undifferentiated as their current terminology suggests, but rather consist of four molecularly distinct entities.
A cohort of the clinically relevant spectrum of sinonasal tumors and associated neoplasms was surveyed for DNA methylation classification. In line with previous studies from other fields, we were able to demonstrate that most established tumor entities show characteristic DNA methylation signatures which can be used for reliable clinical classification and differentiation14–18,32. While earlier studies only covered a fraction of the sinonasal cancer spectrum with rather limited total case numbers, the current study includes the whole spectrum of diagnostically relevant tumor classes and has clearly increased the total numbers of samples. This allowed us to identify previously unrecognized DNA methylation-based tumor classes among sinonasal tumors. During the assembly of the reference cohort, 34 specimens were excluded, as they could not be stably assigned to an epigenetic tumor class. There are several aspects that could explain this. First, some of the excluded cases encompassed diagnoses with an insufficient number of cases to form a separate and stable class (e.g., biphenotypic sinonasal sarcoma). These entities could be included in future versions of the classifier, if additional cases can be acquired. Second, slightly divergent DNA methylation profiles could also be caused by array quality or by technical variations between different analyses. Third, some of these cases could also be of non-sinonasal origin, such as advanced tumors from neighboring anatomic regions (e.g., tumors originally arising from the palate or brain) with continuous infiltration of sinonasal structures or distant metastases from an unrecognized primary site. Of note, none of these tumors were used for the development of the classifier as they did not correspond to a stable epigenetic class and were therefore excluded from further analyses.
Finally, other cases of these non-clustering samples could correspond to hitherto unrecognized, even rarer tumor classes that require additional investigation. The last two points could also explain the large proportion of neuroendocrine carcinomas and SNUCs in the noise point category. Non-sinonasal tumors would be prone to be histologically classified as SNUCs due to their unusual morphology. Furthermore, expression of neuroendocrine markers is not uncommon in advanced and potentially dedifferentiated carcinomas, making the classification as neuroendocrine carcinoma more likely. To facilitate the further characterization of potentially unrecognized classes, we provide the unprocessed DNA methylation data for these cases along with the data of the reference cohort.
Using an independent test set, we were able to show that the DNA methylation-based classification algorithm can reliably subtype samples with SNUC morphology without the need for additional molecular testing. Furthermore, the classifier correctly reclassified two samples initially diagnosed as sinonasal squamous cell carcinomas as lymphoepithelial carcinoma and NUT midline carcinoma, respectively. This reclassification is of profound clinical importance, as lymphoepithelial carcinomas show improved response to radiotherapy while NUT midline carcinomas are associated with very poor prognosis. We also describe the implementation of a supervised outlier detection for enhancing DNA methylation-based tumor classification. Machine learning algorithms typically assign anomalous profiles to the next, most similar class, potentially leading to spurious classification results. In the context of DNA methylation-based classification, entities that have not been used for the training of the algorithm such as distant metastases from other organs would thus either go unnoticed or be assigned to a wrong class. Crucially, both errors can be avoided by incorporating outlier detection in the machine learning pipeline. Although this approach may slightly compromise sensitivity, it reduces the risk of misclassifying non-sinonasal tumors, which is crucial for application in a potential diagnostic setting. Copy number profiles derived from DNA methylation data revealed different recurrent copy number alterations between the tumor classes. This information may provide additional confidence to the DNA methylation-based classification, although the sensitivity and specificity of these alterations for classificatory purposes seem limited.
A further focus of our study was to investigate different DNA methylation signatures and molecular alterations in the diagnostically highly challenging group of SNUCs. Our results indicate that what is currently summarized as SNUCs likely represent a heterogeneous group of tumors comprising at least four different molecular classes with different molecular drivers and different clinical course. A summary of the most important characteristics of the four classes is provided in Fig. 6.
Specimens from the NEC-like IDH2 class were characterized by IDH2 mutations and highly recurrent copy number alterations. Similar to acute myeloid leukemia and gliomas, IDH2 mutations induce a CpG island hypermethylation phenotype in sinonasal carcinomas, resulting in a highly distinct DNA methylation signature and significant hypermethylation of various tumor-related genes33. The value of IDH2-specific inhibitors in the treatment of patients with sinonasal tumors is currently unknown34. In line with previous reports, patients with sinonasal tumors with IDH2 mutations have a comparably favorable prognosis35.
Most cases from the NEC-like SMARCA4/ARID1A class showed mutations in SMARCA4 or ARID1A, genes that encode components of the SWI/SNF chromatin remodeling complex. This also included one case with a SMARCA4 loss of function mutation in tumor-free normal tissue, potentially representing a germline or mosaic mutation. Germline mutations of SMARCA4 have previously been described to be associated with Rhabdoid tumor predisposition syndrome 2, leading to highly aggressive and early-onset tumors such as small cell carcinoma of the ovary, hypercalcemic type36. Although our data is clearly limited in this aspect, our findings suggest that NEC-like SMARCA4/ARID1A tumors may also occur in the context of tumor predisposition syndromes. Furthermore, we also observed a remarkably high rate of activating and potentially actionable PIK3CA mutations in almost one-third of NEC-like SMARCA4/ARID1A tumors. Comparably high mutation rates of PIK3CA are observed in breast carcinomas but have so far not been detected in other types of cancer26. There have been previous studies which observed activating PIK3CA mutations in SNUCs at low frequency, which indicates that these alterations are likely enriched in NEC-like SMARCA4/ARID1A tumors8,11. Functional studies or clinical trials will be required to evaluate whether these tumors may be responsive to treatment with PI3K pathway inhibitors. Overall, the prognosis of patients from this tumor class is relatively favorable and comparable to IDH2 mutated tumors.
In mass spectrometry-based proteomics, we observed relatively similar global protein expression profiles for NEC-like IDH2 and NEC-like SMARCA4/ARID1A tumors. In both classes, we identified overexpression of several proteins that are specific to neurons or cells of the diffuse neuroendocrine system, and gene set enrichment analysis also indicated a high similarity with neuroendocrine cells. The identified markers are not routinely established in histopathological laboratories, ENO2 being a possible exception, and have therefore not been investigated in previous studies or used in routine diagnostics. Importantly, routinely used diagnostic markers such as chromogranin A or synaptophysin were not among the highly enriched markers in our proteomics analysis and may thus fail to identify the neuroendocrine differentiation of these cancers. This might explain why a substantial proportion of tumors from this class were diagnosed as SNUCs or even as adenocarcinomas or squamous cell carcinomas. It should further be mentioned that in our extensive reference cohort of sinonasal tumors no other class of neuroendocrine carcinomas could be identified. Based on our findings, we therefore propose that these tumors should be regarded as “neuroendocrine carcinoma related”, either characterized by IDH2 mutations (neuroendocrine carcinoma-like, IDH2 mutant) or recurrent SMARCA4/ARID1A alterations (neuroendocrine carcinoma-like, SMARCA4/ARID1A enriched). However, it must be noted that this concept may change the treatment of patients with tumors of SNUC morphology. Therefore, a careful clinical evaluation and confirmation in further studies is crucial before drawing any clinical conclusions from our study.
With regards to routine histopathological workup, we identified KRT18 in combination with UCHL1 as potential immunohistochemical markers to differentiate NEC-like IDH2 and NEC-like SMARCA4/ARID1A tumors from adenoid cystic carcinomas, olfactory neuroblastomas and squamous cell carcinomas. The combinational use of these markers could be of high diagnostic value when DNA methylation analysis is not available or not feasible.
For tumors from the NEC-like IDH2 class, functional analysis of proteomic data revealed alterations in mitochondrial processes, including the citric acid cycle. This is in line with the well-known oncogenic mechanism of mutated IDH1/2, disrupting the citric acid cycle, which takes place in the mitochondrial matrix, and producing the oncometabolite 2-hydroxyglutarate37. For NEC-like SMARCA4/ARID1A tumors, we observed a general association with translational processes; however, no specifically disrupted pathways were observed.
Cases from the SMARCB1 class were characterized by SMARCB1 deficiency, which has recently been identified among SNUCs10,19. Our data further substantiates that sinonasal tumors with this alteration represent a distinct entity, including a broad range of histological morphologies and should therefore be identified by molecular testing. Although we were not able to include tumors from this class in our proteomics study due to insufficient quantities of tumor tissue, we did not detect immunohistochemical expression of the neuroendocrine and neuronal markers that were upregulated in NEC-like IDH2 and NEC-like SMARCA4/ARID1A tumors. This suggests a different cell of origin for these cancers and further studies are required to clarify their origin. Interestingly, SMARCB1-deficient sinonasal carcinomas show a remarkable epigenetic similarity to adult sellar atypical teratoid/rhabdoid tumors, although they tended to aggregate slightly separate in t-SNE analysis. It remains unclear if this is due to batch effects or if these tumors actually represent two distinct tumor types sharing the same driving alteration.
Samples from the ACC class shared the molecular profile (DNA methylation, MYB rearrangement, recurrent loss of chromosome 6q38) of adenoid cystic carcinomas and most likely represent high-grade adenoid cystic carcinomas. In several such tumors, we also detected focal adenoid cystic differentiation on histological reexamination. In addition, mass spectrometry-based proteomics revealed similarities of these tumors with serous cells of submucosal glands, further supporting the reclassification. Functional analysis of proteomic data revealed evidence of MAPK-pathway activation as a key mechanism, which is in line with other reports39,40. Previous studies recognized that solid variants of adenoid cystic carcinomas can be mistaken for SNUCs and that close histomorphological investigation and adequate sampling are crucial41. A major benefit of DNA methylation-based classification is that it does not require the analysis of a tumor area with a certain differentiation or growth pattern. Therefore, classification is also possible if only high-grade tumor areas are available (e.g. in smaller biopsy specimens or partial resections).
The findings of our study come with some limitations. First, while numerous studies have shown that DNA methylation is a very reliable tool for tumor classification, the underlying biological mechanisms remain relatively unclear. In our study, the CpGs relevant for classification showed a very similar distribution over chromosomes and functional gene regions compared to the overall array design. Furthermore, most relevant CpGs were located in the gene body. The regulatory effect of DNA methylation in these regions is only poorly understood and interpretation is not straightforward.
Second, the main goal of our proteomic analysis was to identify potential diagnostic markers and cells of origin. While the selected LFQ approach was suitable to accomplish these tasks, mechanistic analyses focusing on less abundant signaling pathway molecules would profit from more sensitive approaches such as data-independent acquisition (DIA) or tandem mass tag (TMT) labeling as well as phosphoproteomic profiling. Therefore, the results from our functional analyses should be interpreted with caution and should be further investigated in future studies.
Third, we did not perform central histopathological review of the cases included in this study. Therefore, the quality of the given conventional diagnoses might differ between the providing institutions due to different expertise in the diagnosis of sinonasal tumors.
Furthermore, our outcome analysis should be interpreted with caution, as only very limited data were available on other outcome-associated clinical factors such as local tumor stage or metastatic stage.
In summary, we provide a DNA methylation-based algorithm that could serve as a valuable tool in the diagnosis of sinonasal tumors, preventing misclassifications and supporting the workup of challenging cases. In addition, we clarify the molecular heterogeneity of tumors with SNUC morphology. We demonstrate that tumors with SNUC morphology can be segregated into four distinct tumor types, including (1) sinonasal neuroendocrine carcinoma-like, IDH2 mutant, (2) sinonasal neuroendocrine carcinoma-like, SMARCA4/ARID1A enriched, (3) sinonasal carcinoma, SMARCB1 altered and (4) poorly differentiated adenoid cystic carcinoma.
Methods
Ethics statement
This research project has been approved by the ethics committee of the Charité – Universitätsmedizin Berlin. Retrospective investigation of left-over diagnostic samples for research purposes was covered by the general treatment agreement of the respective hospitals. No compensation was provided.
Statistics & Reproducibility
Statistical analysis was performed in RStudio Version 1.3.1093. For the reference cohort, minimum sample size per histopathological entity was set at 6, similarly to previously published work14. Detailed parameters for exclusion of low-quality samples and cases that did not correspond to a stable DNA methylation class are listed in the DNA methylation analysis section. The investigators were not blinded to allocation during experiments and outcome assessment, but the algorithms that assigned cases to DNA methylation classes were agnostic to the conventional histopathological diagnosis.
Sinonasal reference cohort
For the identification of DNA methylation-based tumor classes and the development of corresponding machine learning classifiers, we compiled a reference cohort of 495 samples from sinonasal tumors (Supplementary Data 2). A total of 271 formalin-fixed, paraffin-embedded (FFPE) tissue specimens were retrieved from the archives of the Institutes of Pathology or Neuropathology at the University Hospitals Basel, Berlin, Frankfurt am Main, Gießen, Göttingen, Hamburg, Heidelberg, Marburg, München, Münster, Naples, Oviedo, Lübeck, Stanford and Tübingen. The conventional histopathological diagnosis was taken from the original histology report of the providing center or from the associated metadata if the sample was derived from a previously published study. Specimens were not reviewed centrally prior to inclusion. Normal sinonasal tissue samples were retrieved from independent patients undergoing sinonasal surgery due to non-neoplastic conditions. All samples were histologically evaluated and confirmed to be free of tumor before DNA extraction. Raw IDAT files from an additional 190 samples from previously published studies were retrieved from public repositories or provided by the authors10,14,16,19,21. In total, 32 samples were excluded after quality control, including 20 samples with poor DNA methylation analysis quality metrics and 12 specimens with low tumor cell content. A total of 429 cases were used for subsequent analyses. Access to FFPE blocks to reproduce or validate the findings described in this manuscript can be obtained if sufficient material is left for further analyses.
Test set
An additional, independent cohort of 52 sinonasal tumors was compiled as a test set for the validation of the machine learning classifiers (Supplementary Data 3). The samples in this cohort were neither used in the identification of methylation classes nor in the development of the classifiers, nor for dimensionality reduction.
Cohort of non-sinonasal tumors
For the implementation of outlier detection, we compiled a cohort of 8104 tumor and normal tissue samples covering 197 different diagnoses, which we further grouped into eight categories. Raw DNA methylation data in the form of IDAT files were retrieved from publicly available repositories as well as from our own analyses in other research projects14,16. During quality control, 39 samples were excluded from further analysis. The final cohort was randomly split into two subsets, and its samples were used either for the development or for the evaluation of the classifiers. All samples included in the non-sinonasal tumor cohort are listed in Supplementary Data 4.
Immunohistochemistry
Immunohistochemical staining was performed on the BenchMark XT (Ventana) automated slide stainer according to the manufacturer’s instructions. Sections were incubated with primary antibody against UCHL1 (clone 13C4, dilution 1:1000, abcam, United Kingdom, catalog number ab8189) and KRT18 (clone DC-10, dilution 1:1000, BioGenex, USA, catalog number AM143-5M). Antibodies were validated using adequate positive controls, including human neural tissue for UCHL1 and human cancer tissue for KRT18. Expression was scored using an H-score which was calculated by multiplying the staining intensity (0: no staining; 1: weak staining; 2: moderate staining; 3: strong staining) by the respective percentage of tumor cells42.
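The H-score described above can be illustrated with a short sketch. It assumes the common convention that the products of staining intensity and percentage of tumor cells are summed over all intensity levels, giving a range of 0 to 300. The function name h_score and the input format are hypothetical and are not taken from the study's code.

```python
def h_score(percent_by_intensity):
    """Return an H-score from the percentage of tumor cells at each staining intensity.

    percent_by_intensity maps an intensity level (0, 1, 2 or 3) to the
    percentage (0-100) of tumor cells staining at that level.
    Assumption: the intensity x percentage products are summed (range 0-300).
    """
    for intensity, percent in percent_by_intensity.items():
        if intensity not in (0, 1, 2, 3):
            raise ValueError(f"unknown staining intensity: {intensity}")
        if not 0 <= percent <= 100:
            raise ValueError(f"percentage out of range: {percent}")
    if sum(percent_by_intensity.values()) > 100:
        raise ValueError("percentages must not add up to more than 100")
    return sum(i * p for i, p in percent_by_intensity.items())


# Hypothetical case: 10% negative, 20% weak, 30% moderate, 40% strong staining
example = h_score({0: 10, 1: 20, 2: 30, 3: 40})
```

In this toy case the weak, moderate and strong fractions contribute 20, 60 and 120 points, respectively.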
In-situ hybridization
Fluorescence in-situ hybridization (FISH) was performed as described previously43 using the MYB Dual Color Break Apart Probe (Zytovision). In brief, 4 µm sections were deparaffinized, dehydrated and incubated in pretreatment solution (Dako, Denmark) at 95–99 °C for 10 min. Following immersion in pepsin solution for 3–6 min at 37 °C, slides were washed, dehydrated and air dried. DNA probes were applied, and the sections were sealed and denatured in a humidified atmosphere at 82 °C for 5 min. Sections were hybridized at 45 °C overnight. After washing, slides were counterstained with 4′,6-diamidino-2-phenylindole (DAPI).
Silver-enhanced in-situ hybridization for EBV analysis was done using the BOND Epstein-Barr virus-encoded small RNA (EBER) Probe (Leica) on the BOND-MAX automated slide stainer (Leica).
DNA extraction
Representative tumor areas were identified using light microscopy of hematoxylin and eosin-stained sections. Semi-automated DNA extraction was performed on the Maxwell RSC Instrument using the Maxwell RSC FFPE Plus DNA Purification Kit (Custom, AX4920; Promega). Extracted total DNA quantities were measured using the Qubit™ HS DNA Assay (Thermo Fisher Scientific).
DNA methylation analysis
We used the Illumina Infinium HD FFPE DNA Restore Kit for DNA restoration of FFPE samples. Subsequent bisulfite conversion was performed using the EpiTect Bisulfite Kit (Qiagen). The bisulfite-converted DNA was analyzed using the Illumina Infinium HumanMethylation450 or MethylationEPIC BeadChip.
Raw DNA methylation data were processed in RStudio Version 1.3.1093 using the minfi package44. The pfilter (with perc = 5) and rmSNPandCH functions from the wateRmelon and DMRcate packages, respectively, were used to exclude low-quality samples and to filter out CpGs with low quality, reported cross-reactivity or association with SNPs or sex chromosomes45,46. The 20,000 most variable CpG sites were selected for further analysis. The combineArrays function of the minfi package was used to merge EPIC and 450k data. t-SNE was performed using the Rtsne package with a perplexity of 20 and 4000 iterations47. Density-based spatial clustering of applications with noise (DBSCAN) with the minPts parameter set to 6 was used to determine the optimal number of classes based on t-SNE coordinates and to assign individual cases to their respective class. Cases that were labeled as noise points were excluded from further analysis. The number of classes and the assignment of the non-outlier cases to these classes did not differ before and after exclusion of the outlier samples. Robustness of tumor classes derived from t-SNE analysis was tested using iterative random down-sampling to 80% of the total cohort, as described previously14. Pearson’s correlation coefficients of the x and y coordinates of all samples were calculated after 300 iterations. Tumor purity was estimated using the predict_purity_betas function48.
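The feature selection step mentioned above (keeping the CpG sites that vary most across samples) can be sketched in a few lines. This is a minimal illustration in Python and not the R code used in the study; the function name, the probe identifiers and the use of the population variance as the ranking criterion are assumptions made for the example.

```python
from statistics import pvariance


def most_variable_probes(beta_by_probe, n_top):
    """Return the IDs of the n_top probes with the highest variance across samples.

    beta_by_probe maps a probe ID to its list of beta values (one per sample).
    Ties are broken alphabetically so that the result is deterministic.
    """
    variances = {probe: pvariance(values) for probe, values in beta_by_probe.items()}
    ranked = sorted(variances, key=lambda probe: (-variances[probe], probe))
    return ranked[:n_top]


# Toy beta values for three hypothetical probes measured in four samples
toy_betas = {
    "cg_a": [0.10, 0.12, 0.11, 0.09],  # nearly constant
    "cg_b": [0.05, 0.95, 0.10, 0.90],  # strongly bimodal
    "cg_c": [0.40, 0.60, 0.45, 0.55],  # moderately variable
}
selected = most_variable_probes(toy_betas, n_top=2)
```

Ranking by variance or by standard deviation yields the same order, so either criterion selects the same set of probes.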
Classifier development
We developed two separate machine learning classifiers based on a support vector machine and a random forest model that predict the tumor class of sinonasal tumor samples from their DNA methylation profile. In addition to these classes, a single non-sinonasal class for other tumor entities was introduced to detect outliers. Outlier detection has several modes of operation, namely unsupervised (where no outlier labels are required), semi-supervised (where only a few outlier labels are available) and supervised outlier classification. The latter was used in this study to distinguish sinonasal tumors from all other non-sinonasal tumors49.
The models were developed on a training set composed of all samples from the sinonasal reference cohort (n = 395) and 5% of the samples of each category from the non-sinonasal cohort (n = 400), which were randomly selected. This resulted in a combined dataset of 795 samples.
On this combined training set, the optimal hyperparameters for both model types were then determined in a grid search by minimizing the class-balanced multinomial cross-entropy loss in a five-fold cross-validation with stratified sampling. A dimension reduction to the 20,000 most variant CpG sites was performed on each training set of the cross-validation and applied to the respective validation fold. The final models were then retrained on the full training set with the selected hyperparameters.
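To make the optimization criterion concrete, the following sketch shows one plausible reading of a class-balanced multinomial cross-entropy: the negative log-probability of the true class is averaged within each class first, and the class means are then averaged, so that small classes carry the same weight as large ones. The exact weighting used in the study is not specified in the text, so this definition, the function name and the toy data are assumptions.

```python
import math
from collections import defaultdict


def class_balanced_cross_entropy(true_labels, predicted_probs, eps=1e-15):
    """Average the per-sample cross-entropy within each class, then across classes.

    true_labels: list of class names.
    predicted_probs: list of dicts mapping class name to predicted probability.
    """
    losses_by_class = defaultdict(list)
    for label, probs in zip(true_labels, predicted_probs):
        p_true = min(max(probs[label], eps), 1.0)  # clip to avoid log(0)
        losses_by_class[label].append(-math.log(p_true))
    class_means = [sum(v) / len(v) for v in losses_by_class.values()]
    return sum(class_means) / len(class_means)


# Toy example with an imbalanced label distribution (three "A", one "B")
labels = ["A", "A", "A", "B"]
probs = [{"A": 0.5, "B": 0.5}] * 3 + [{"A": 0.75, "B": 0.25}]
loss = class_balanced_cross_entropy(labels, probs)
```

In the toy example, the single poorly predicted "B" sample contributes half of the total loss, whereas it would contribute only a quarter in an unweighted average.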
For the development of support vector machine models, we used the R package e1071. Linear and radial basis function kernels, gamma values of γ = 2^k / 20,000 with k = −3,…,3, and cost parameters of C = 2^k with k = 0,…,5 were considered as possible hyperparameters. Random forest models were trained with the R package randomForest, using the number of trees ntree = 500, 1000 and mtry = 2^k × sqrt(20,000) with k = −5,…,5 as hyperparameters. Further, both models were configured to return scores for each methylation class.
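The hyperparameter grids described above can be enumerated as follows. This is a sketch in Python, not the R code used in the study. It assumes that the exponents are integers, that gamma is only varied for the radial basis function kernel (a linear kernel does not use it), and that mtry is rounded to the nearest integer; all function names are hypothetical.

```python
import math
from itertools import product

N_FEATURES = 20_000  # number of CpG sites retained after feature selection


def svm_grid():
    """Enumerate SVM settings: cost 2^0..2^5, gamma 2^-3..2^3 divided by N_FEATURES."""
    costs = [2.0 ** k for k in range(0, 6)]
    gammas = [2.0 ** k / N_FEATURES for k in range(-3, 4)]
    grid = [{"kernel": "linear", "cost": c, "gamma": None} for c in costs]
    grid += [
        {"kernel": "radial", "cost": c, "gamma": g} for c, g in product(costs, gammas)
    ]
    return grid


def random_forest_grid():
    """Enumerate random forest settings: ntree in {500, 1000}, mtry 2^-5..2^5 x sqrt(p)."""
    mtry_values = sorted(
        {max(1, round(2.0 ** k * math.sqrt(N_FEATURES))) for k in range(-5, 6)}
    )
    return [{"ntree": n, "mtry": m} for n, m in product([500, 1000], mtry_values)]


svm_settings = svm_grid()
rf_settings = random_forest_grid()
```

Under these assumptions the search covers 48 support vector machine settings and 22 random forest settings, each of which would be scored by cross-validation.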
In order to make these scores more readily interpretable as probabilities, we developed calibration models based on ridge-penalized multinomial logistic regression, resembling previously described procedures but accounting for the challenge of a lower number of samples here31. In detail, we used the cv.glmnet function of the R package glmnet on the scores on the training set resulting from the previous cross-validation which correspond to the selected hyperparameters. For prediction of the calibrated scores, the λ parameter with minimum mean cross-validated error was chosen. The class with the highest calibrated score was then determined as the final prediction for each sample.
The resulting classification procedure was then evaluated on the sinonasal test cohort (n = 52) and the samples from the non-sinonasal cohort that had not been included in the training set before (n = 7,665). In order to assess the outlier detection, all predictions of sinonasal tumor classes were retrospectively combined in one class and sensitivity and specificity were computed for the binary differentiation of sinonasal and non-sinonasal samples (outlier detection specificity and outlier detection sensitivity). For the evaluation of the sinonasal methylation class prediction, only samples from the sinonasal test cohort without classification as Unknown were considered (sinonasal accuracy). Five repetitions of the classifier development and evaluation led to similar results as with our final classifier, confirming the stability of the procedure.
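The binary evaluation of the outlier detection can be sketched as follows. All sinonasal class predictions are collapsed into one group and compared with the non-sinonasal label. Treating the non-sinonasal (outlier) class as the positive class is an assumption, as are the function name and the label string used here.

```python
NON_SINONASAL = "non-sinonasal"  # hypothetical label for the outlier class


def outlier_detection_metrics(true_classes, predicted_classes):
    """Return sensitivity and specificity for detecting non-sinonasal samples."""
    tp = fp = tn = fn = 0
    for truth, prediction in zip(true_classes, predicted_classes):
        is_outlier = truth == NON_SINONASAL
        called_outlier = prediction == NON_SINONASAL
        if is_outlier and called_outlier:
            tp += 1
        elif is_outlier:
            fn += 1
        elif called_outlier:
            fp += 1
        else:
            tn += 1  # any sinonasal class counts, even if it is the wrong one
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return {"sensitivity": sensitivity, "specificity": specificity}


# Toy example with hypothetical predictions
truth = ["ACC", "SMARCB1", NON_SINONASAL, NON_SINONASAL, NON_SINONASAL, "NEC-like IDH2"]
calls = ["SMARCB1", NON_SINONASAL, NON_SINONASAL, NON_SINONASAL, "ACC", "NEC-like IDH2"]
metrics = outlier_detection_metrics(truth, calls)
```

Note that a sinonasal sample assigned to the wrong sinonasal class still counts as correct in this binary view; such errors are captured separately by the sinonasal accuracy.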
Copy number analysis
Genome-wide copy number profiles were generated from raw DNA methylation data using a modified version of the conumee package50,51.
Mass spectrometry-based proteomics
Sufficient tissue for mass spectrometry-based proteomics was available for 66 cases, including 59 tumor samples and seven normal sinonasal tissue specimens.
Representative 1.0 or 1.5 mm punch biopsy needle tissue cores were subjected to sonication using a Covaris LE220Rsc Focused-ultrasonicator (250 W, 50% duty cycle, 3 rounds with incubation at 80 °C for 1 h/ 95 °C for 30 min between the rounds) in a denaturing buffer containing 1.5% SDS and 2.5 mM DTT in 25 mM Tris, pH 8.0. Lysates were separated from cell debris and remaining paraffin by centrifugation at 20,000 × g and manual removal of the top layer. This was followed by determination of the protein concentrations using BCA assay and subsequent sample preparation using an automated SDS-SP3 digestion and clean-up protocol52,53. Using a Bravo Automated Liquid Handling Platform (Agilent Technologies, Santa Clara, USA), the proteins were alkylated using iodoacetamide (IAA) for 30 minutes and blocked with an excess of 1,4-dithiothreitol (DTT). They were then bound to a 1:1 mixture of hydrophilic and hydrophobic magnetic beads at a high ACN (acetonitrile) concentration (>70%) and a beads to protein ratio of 10:1 by weight (10 µg beads:1 µg protein). After washing of the beads with 70% ethanol, the proteins were digested in solution in 50 mM HEPES (pH 8) with trypsin and LysC (enzyme-to-substrate ratio 1:50) overnight at 37 °C. The eluted peptides were then acidified using 100% formic acid (final concentration 1%) and desalted using AssayMAP tips on the Bravo robot. The final peptide concentration was determined by BCA assay.
Mass spectrometric data acquisition was performed on a Q Exactive HF-X instrument coupled to an easy nanoLC 1200 system (Thermo Scientific, Bremen, Germany). One microgram of peptides was injected per run and the separation was performed using an in-house packed reverse-phase column (20 cm, 1.9 µm beads, ReproSil Pur, Dr. Maisch GmbH) with a 110 min gradient from 3% to 60% (v/v) ACN, 0.1% (v/v) formic acid in water. The Q Exactive HF-X was operated in data-dependent mode with 60 K MS1 resolution, 3 × 10^6 ion count target and maximum injection time of 10 ms, followed by 20 MS2 scans with 45 K resolution, 1 × 10^5 ion count target, and maximum injection time of 86 ms.
Raw data were processed using the MaxQuant software Version 1.6.17.0 and the human reference proteome (UP000005640, downloaded 01/2019)54. For the database searches, oxidation (M) and acetylation (N-term) were included as variable modifications; carbamidomethyl cysteine was included as a fixed modification. Peptides with a minimum length of seven amino acids were included in the search. The FDR was set to 0.01 for peptide and protein identifications. The Match-Between-Runs (MBR) feature was used for the analysis. We excluded proteins that were flagged by MaxQuant in the Reverse and Only identified by site columns as well as proteins with fewer than three peptides. Contaminants of non-human proteins were identified manually and removed from the data set. After excluding samples with fewer than 1000 detected proteins, 51 specimens remained for further analysis. Downstream analysis was performed in RStudio using base functions and the lmFit and eBayes functions from the limma package to perform differential expression analysis for groupwise comparisons of the different tumor classes with normal sinonasal tissue55. Proteins with a log2 fold change >1.5 and an FDR < 0.05 were considered differentially expressed. The list of significantly overexpressed genes in each tumor class compared to normal sinonasal tissue was subjected to overrepresentation analysis using the WebGestaltR package. All identified proteins that remained after filtering as described above were used as a reference gene list. Two gene sets of normal respiratory cell types were used to identify potential cells of origin for the respective tumor classes22,23. The WebGestaltR pipeline uses Fisher’s exact test to test for significance, and the FDR method was used to adjust for multiple testing.
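The thresholds used to call differentially expressed proteins can be illustrated with a small sketch. It assumes that the FDR values are Benjamini-Hochberg adjusted p-values and that only upregulated proteins (log2 fold change above the threshold) are selected, in line with the overexpressed gene lists described above. The function names and the toy values are hypothetical, and the moderated statistics of limma are not reproduced here.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest to the smallest p-value to enforce monotonicity
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


def overexpressed_proteins(names, log2_fold_changes, p_values,
                           min_log2_fc=1.5, max_fdr=0.05):
    """Select proteins with log2 fold change > min_log2_fc and FDR < max_fdr."""
    fdr = benjamini_hochberg(p_values)
    return [
        name
        for name, fc, q in zip(names, log2_fold_changes, fdr)
        if fc > min_log2_fc and q < max_fdr
    ]


# Toy input with hypothetical protein names
names = ["P1", "P2", "P3", "P4"]
fold_changes = [2.0, 3.0, 1.0, 2.5]
raw_p = [0.001, 0.04, 0.03, 0.5]
adjusted_p = benjamini_hochberg(raw_p)
hits = overexpressed_proteins(names, fold_changes, raw_p)
```

In the toy data, P2 has a large fold change and a raw p-value below 0.05 but is not selected, because its adjusted value exceeds the FDR threshold.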
Functional analysis of proteomic data was performed using Cytoscape Version 3.9.1 and ClueGO Version 2.5.9. Differentially expressed proteins between the three proposed SNUC subtypes ACC, NEC-like IDH2 and NEC-like SMARCA4/ARID1A were used as input markers. Functional analysis of Reactome pathway terms was performed using a hypergeometric test, followed by Bonferroni step down correction. The results were visualized as functionally grouped networks using prefuse force directed layout.
t-SNE plots were generated as described above using a perplexity of 10 and 2000 iterations.
DNA panel sequencing
The Ion AmpliSeq Library Kit 2.0 (Thermo Fisher Scientific) was used to perform library preparation of 10 ng of genomic DNA using the Ion AmpliSeq Cancer Hotspot Panel v2 (Thermo Fisher Scientific, catalog number 4475346). The final library was quantified with the Ion Library Quantitation Kit (Thermo Fisher Scientific). Samples were multiplexed and amplified on Ion Spheres Particles with Ion 540 Kit-Chef and were sequenced using Ion 540 Chip (Thermo Fisher Scientific) with an adapted standard protocol56.
Selected samples were processed using the TruSight Oncology 500 Panel (Illumina, catalog number 20028214). 100 base pairs were sequenced in paired-end mode on an Illumina NextSeq 550 machine. The raw data were demultiplexed and analyzed using the TruSight Oncology 500 v2.2 Local App Docker. Briefly, demultiplexed reads were aligned to the GRCh37 (hg19) genome using the Burrows-Wheeler Aligner, and mapped reads were collapsed, re-aligned and stitched. Pisces software was used for somatic variant calling.
DNA exome sequencing
DNA exome sequencing was performed at the CeGaT laboratory (Tübingen) using the Twist Human Core Exome Plus Kit (Twist Bioscience, catalog number 102027) on a NovaSeq 6000 sequencer (Illumina) to generate 2 × 100 bp reads. Sequence data were aligned to the GRCh37 (hg19) genome using the Burrows-Wheeler-Aligner (Version 0.7.17)57. Somatic variants were called in comparison to matched normal tissue, requiring a coverage of at least 30 in both the tumor and the matched normal sample as well as an allele frequency of at least 0.05 in the tumor specimen. Tumor mutational burden was calculated using the parameters established for the Illumina TSO500 panel58.
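The coverage and allele frequency criteria for somatic variants can be expressed as a simple filter. The sketch below only illustrates the two thresholds stated above; the data structure, the field names and the example variants are hypothetical, and the actual variant calling pipeline is not reproduced.

```python
from dataclasses import dataclass


@dataclass
class VariantCall:
    variant_id: str
    tumor_depth: int      # total reads covering the position in the tumor
    normal_depth: int     # total reads covering the position in matched normal tissue
    tumor_alt_reads: int  # reads supporting the variant allele in the tumor


def passes_filters(call, min_depth=30, min_allele_frequency=0.05):
    """Require sufficient coverage in both samples and a minimum tumor allele frequency."""
    if call.tumor_depth < min_depth or call.normal_depth < min_depth:
        return False
    return call.tumor_alt_reads / call.tumor_depth >= min_allele_frequency


# Hypothetical calls: only the first one meets both criteria
calls = [
    VariantCall("var_1", tumor_depth=100, normal_depth=40, tumor_alt_reads=5),
    VariantCall("var_2", tumor_depth=29, normal_depth=100, tumor_alt_reads=10),
    VariantCall("var_3", tumor_depth=200, normal_depth=35, tumor_alt_reads=4),
]
kept = [call.variant_id for call in calls if passes_filters(call)]
```

The second call fails on tumor coverage and the third on allele frequency (4/200 = 0.02), so only the first call is retained.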
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Acknowledgements
We gratefully acknowledge the excellent technical assistance of Peggy Wolkenstein, Ines Koch, Daniel Teichmann and Carola Geiler. We thank Thomas Cramer, Damian Rieke and Konrad Klinghammer for providing clinical information on single patients. Parts of Fig. 6 were created with BioRender.com. The results published here are in part based upon data generated by the TCGA Research Network: https://www.cancer.gov/tcga. This work was supported by the German Ministry of Education and Research (BMBF), as part of the National Research Node Mass spectrometry in Systems Medicine (MSCoresys), under grant agreement 031L0220B (to P.M.) and 031L0220A (to F.K.). Part of this study was further supported by the Deutsche Forschungsgemeinschaft (DFG – German Research Foundation) under grant agreement SFB 1449 Dynamic Hydrogels (projects C03, Z01; to P.M.). This study was in part further supported by the Berliner Krebsgesellschaft (JUFF201917; to P.J. and D.C.). Additional funding to D.C. was provided by the German Cancer Consortium (DKTK), partner site Berlin. P.J. was participant in the BIH-Charité Digital Clinician Scientist Program funded by the Charité – Universitätsmedizin Berlin, the Berlin Institute of Health (BIH) and the German Research Foundation (DFG) and is further supported by the Medical & Clinician Scientist Program (MCSP) of the LMU Munich. S.G. was awarded a medical doctoral research stipend for this project by the Berlin Institute of Health (BIH). K.R.M. gratefully acknowledges partial funding by the German Ministry for Education and Research under Grant 01IS14013A-E, Grant 01GQ1115, Grant 01GQ0850, as BIFOLD (ref. 01IS18025A and ref. 01IS18037A) and Patho234 (ref. 031LO207), the Institute of Information & Communications Technology Planning & Evaluation (IITP) grants funded by the Korea Government Grants 2017-0-00451 and 2019-0-00079.
Author contributions
P.J., S.G., P.M., F.K. and D.C. designed the study. Samples were provided by C.D., A.J., S.S., S.S., H.B., P.H., B.E., S.F., J.H., W.P., M.H., W.H., H.D., U.K., P.J., C.D., C.S., F.B., A.R., A.W., J.R.-I., S.P., C.I., L.C., R.D.M., A.M., U.S., J.L., V.J.L., M.F., M.L., S.L.-G., M.H., P.D.J., A.A., A.K., F.H., A.v.D., M.S., E.F., B.E.H., P.J. and D.C. DNA methylation analysis was performed by L.H.M. and E.P.C. DNA sequencing was performed by I.H., C.V. and A.L. Proteomic analysis was done by R.R., C.F., R.F., M.H. and P.M. Computational analysis was performed by P.J., R.R., M.L., P.K., A.T., D.H., M.B., P.S. and K.R.M. P.J. and D.C. supervised the project. Resources were provided by M.H., S.P., D.TW.J., M.S., A.v.D., F.H. and D.C. Funding was acquired by P.J., K.R.M., P.M., F.K. and D.C. P.J., R.R., M.L., F.K. and D.C. wrote the original draft of the manuscript. All authors were involved in reviewing and editing of the final manuscript.
Peer review
Peer review information
Nature Communications thanks the anonymous reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.
Funding
Open Access funding enabled and organized by Projekt DEAL.
Data availability
Raw DNA methylation data of all samples that have been collected for this study have been deposited in GEO under the accession GSE196228. Processed proteomics data are available at FigShare (10.6084/m9.figshare.17144639). Due to privacy concerns, raw DNA sequencing and raw proteomics data cannot be made publicly available. Instead, this data has been deposited under controlled access in the European Genome Phenome Archive (EGA) under the accession numbers EGAS00001006712 (proteomics data; https://ega-archive.org/studies/EGAS00001006712) and EGAS00001006713 (DNA sequencing data; https://ega-archive.org/studies/EGAS00001006713). Requests for access should be addressed to the Data Access Committee of the Institute of Pathology LMU Munich (DAC@aimethylation.com). The time for response from the authors to applications will be within one month. All requests will be reviewed by the legal and data protection department of the LMU Munich. The following restrictions apply: (1) a data sharing agreement must be signed between the corresponding author and the data processor; (2) data will only be shared for scientific, non-commercial purposes; (3) the data processor must comply with the General Data Protection Regulation (GDPR) of the European Union, alternatively, they have to establish a data privacy policy that is adequate in the sense of the GDPR which will be assessed by the data protection department of the LMU Munich; (4) the data processor must delete all shared data after the investigation; (5) data must not be shared with any third party or individuals who are not authorized to access the data. For part of the study, publicly available data was retrieved from the TCGA database (https://www.cancer.gov/tcga). All other data is provided within the Supplementary Information and Supplementary Data files.
Code availability
The code to reproduce the main analyses presented in this manuscript is available on FigShare (10.6084/m9.figshare.17144639)59.
Competing interests
D.C. and A.v.D are listed as inventors on the patent application ‘DNA-methylation based method for classifying tumor species’ (PCT/EP2016/055337) filed by Deutsches Krebsforschungszentrum Stiftung des öffentlichen Rechts and Ruprecht-Karls-Universität Heidelberg. All other authors declare no conflicts of interest.
Footnotes
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
These authors contributed equally: Frederick Klauschen, David Capper.
Supplementary information
The online version contains supplementary material available at 10.1038/s41467-022-34815-3.
References
- 1.Virk JS, et al. Sinonasal cancer: an overview of the emerging subtypes. J. Laryngol. Otol. 2020;134:191–196. doi: 10.1017/S0022215120000146. [DOI] [PubMed] [Google Scholar]
- 2.Houston GD, Gillies E. Sinonasal Undifferentiated Carcinoma. Adv. Anat. Pathol. 1999;6:317–323. doi: 10.1097/00125480-199911000-00002. [DOI] [PubMed] [Google Scholar]
- 3.Mehrad M, Chernock RD, El-Mofty SK. Diagnostic Discrepancies in Mandatory Slide Review of Extradepartmental Head and Neck Cases: Experience at a Large Academic Center. Arch. Pathol. Lab Med. 2015;139:1539–1545. doi: 10.5858/arpa.2014-0628-OA. [DOI] [PubMed] [Google Scholar]
- 4.Stelow EB, Bishop JA. Update from the 4th Edition of the World Health Organization Classification of Head and Neck Tumours: Tumors of the Nasal Cavity, Paranasal Sinuses and Skull Base. Head. Neck Pathol. 2017;11:3–15. doi: 10.1007/s12105-017-0791-4. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 5.López-Hernández A, et al. Genetic profiling of poorly differentiated sinonasal tumours. Sci. Rep. 2018;8:3998. doi: 10.1038/s41598-018-21690-6. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 6.Franchi A. An Update on Sinonasal Round Cell Undifferentiated Tumors. Head. Neck Pathol. 2016;10:75–84. doi: 10.1007/s12105-016-0695-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 7.Agaimy A, et al. Sinonasal Undifferentiated Carcinoma (SNUC): From an Entity to Morphologic Pattern and Back Again—A Historical Perspective. Adv. Anat. Pathol. 2020;27:51–60. doi: 10.1097/PAP.0000000000000258. [DOI] [PubMed] [Google Scholar]
- 8.Mito JK, et al. Immunohistochemical Detection and Molecular Characterization of IDH-mutant Sinonasal Undifferentiated Carcinomas. Am. J. Surg. Pathol. 2018;42:1067–1075. doi: 10.1097/PAS.0000000000001064. [DOI] [PubMed] [Google Scholar]
- 9.Agaimy A, Jain D, Uddin N, Rooper LM, Bishop JA. SMARCA4-deficient Sinonasal Carcinoma. Am. J. Surg. Pathol. 2020;44:703–710. doi: 10.1097/PAS.0000000000001428. [DOI] [PubMed] [Google Scholar]
- 10.Dogan S, et al. DNA methylation-based classification of sinonasal undifferentiated carcinoma. Mod. Pathol. 2019;32:1447–1459. doi: 10.1038/s41379-019-0285-x. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 11.Jo VY, Chau NG, Hornick JL, Krane JF, Sholl LM. Recurrent IDH2 R172X mutations in sinonasal undifferentiated carcinoma. Mod. Pathol. 2017;30:650–659. doi: 10.1038/modpathol.2016.239. [DOI] [PubMed] [Google Scholar]
- 12.Lokk K, et al. DNA methylome profiling of human tissues identifies global and tissue-specific methylation patterns. Genome Biol. 2014;15:3248. doi: 10.1186/gb-2014-15-4-r54. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 13.Chen Y, Breeze CE, Zhen S, Beck S, Teschendorff AE. Tissue-independent and tissue-specific patterns of DNA methylation alteration in cancer. Epigenet Chromatin. 2016;9:10. doi: 10.1186/s13072-016-0058-4. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 14.Capper D, et al. DNA methylation-based classification of central nervous system tumours. Nature. 2018;555:469–474. doi: 10.1038/nature26000. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 15.Jurmeister P, et al. Machine learning analysis of DNA methylation profiles distinguishes primary lung squamous cell carcinomas from head and neck metastases. Sci. Transl. Med. 2019;11:eaaw8513. doi: 10.1126/scitranslmed.aaw8513. [DOI] [PubMed] [Google Scholar]
- 16.Koelsche C, et al. Sarcoma classification by DNA methylation profiling. Nat. Commun. 2021;12:498. doi: 10.1038/s41467-020-20603-4. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 17.Hackeng WM, et al. Genome Methylation Accurately Predicts Neuroendocrine Tumor Origin: An Online Tool. Clin. Cancer Res. 2021 doi: 10.1158/1078-0432.ccr-20-3281. [DOI] [PubMed] [Google Scholar]
- 18.Leitheiser M, et al. Machine Learning Models Predict the Primary Sites of Head and Neck Squamous Cell Carcinoma Metastases Based on DNA Methylation. J. Pathol. 2021 doi: 10.1002/path.5845. [DOI] [PubMed] [Google Scholar]
- 19.Capper D, et al. DNA methylation-based reclassification of olfactory neuroblastoma. Acta Neuropathol. 2018;136:255–271. doi: 10.1007/s00401-018-1854-7. [DOI] [PubMed] [Google Scholar]
- 20.Glöss S, et al. IDH2 R172 Mutations Across Poorly Differentiated Sinonasal Tract Malignancies. Am. J. Surg. Pathol. 2021;45:1190–1204. doi: 10.1097/PAS.0000000000001697. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 21.Johann PD, et al. Sellar Region Atypical Teratoid/Rhabdoid Tumors (ATRT) in Adults Display DNA Methylation Profiles of the ATRT-MYC Subgroup. Am. J. Surg. Pathol. 2018;42:506–511. doi: 10.1097/PAS.0000000000001023. [DOI] [PubMed] [Google Scholar]
- 22.Deprez M, et al. A Single-Cell Atlas of the Human Healthy Airways. Am. J. Resp. Crit. Care. 2020;202:1636–1645. doi: 10.1164/rccm.201911-2199OC. [DOI] [PubMed] [Google Scholar]
- 23.Consortia, C. Z. I. S.-C. C.−19. et al. Single cell profiling of COVID-19 patients: an international data resource from multiple tissues. Medrxiv10.1101/2020.11.20.20227355.
- 24.Jiang T, Chen X, Su C, Ren S, Zhou C. Pan-cancer analysis of ARID1A Alterations as Biomarkers for Immunotherapy Outcomes. J. Cancer. 2020;11:776–780. doi: 10.7150/jca.41296.
- 25.André F, et al. Alpelisib for PIK3CA-Mutated, Hormone Receptor–Positive Advanced Breast Cancer. N. Engl. J. Med. 2019;380:1929–1940. doi: 10.1056/NEJMoa1813904.
- 26.Martínez-Sáez O, et al. Frequency and spectrum of PIK3CA somatic mutations in breast cancer. Breast Cancer Res. 2020;22:45. doi: 10.1186/s13058-020-01284-9.
- 27.Müller KR, Mika S, Rätsch G, Tsuda K, Schölkopf B. An introduction to kernel-based learning algorithms. IEEE Trans. Neural Netw. 2001;12:181–201. doi: 10.1109/72.914517.
- 28.Vapnik, V. N. The Nature of Statistical Learning Theory. 15–32 (1995). doi: 10.1007/978-1-4757-2440-0_2.
- 29.Schölkopf, B., Smola, A. & Atiya, A. F. Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. The MIT Press (2005). doi: 10.1109/tnn.2005.848998.
- 30.Friedman, J., Hastie, T. & Tibshirani, R. The elements of statistical learning. vol. 1 (Springer Series in Statistics, 2001).
- 31.Maros ME, et al. Machine learning workflows to estimate class probabilities for precision cancer diagnostics on DNA methylation microarray data. Nat. Protoc. 2020;15:479–512. doi: 10.1038/s41596-019-0251-6.
- 32.Sturm D, et al. New Brain Tumor Entities Emerge from Molecular Classification of CNS-PNETs. Cell. 2016;164:1060–1072. doi: 10.1016/j.cell.2016.01.015.
- 33.Unruh D, et al. Methylation and transcription patterns are distinct in IDH mutant gliomas compared to other IDH mutant cancers. Sci. Rep. 2019;9:8946. doi: 10.1038/s41598-019-45346-1.
- 34.Dalle IA, DiNardo CD. The role of enasidenib in the treatment of mutant IDH2 acute myeloid leukemia. Ther. Adv. Hematol. 2018;9:163–173. doi: 10.1177/2040620718777467.
- 35.Riobello C, et al. IDH2 Mutation Analysis in Undifferentiated and Poorly Differentiated Sinonasal Carcinomas for Diagnosis and Clinical Management. Am. J. Surg. Pathol. 2020;44:396–405. doi: 10.1097/PAS.0000000000001420.
- 36.Foulkes WD, et al. No small surprise – small cell carcinoma of the ovary, hypercalcaemic type, is a malignant rhabdoid tumour. J. Pathol. 2014;233:209–214. doi: 10.1002/path.4362.
- 37.Dang L, et al. Cancer-associated IDH1 mutations produce 2-hydroxyglutarate. Nature. 2009;462:739–744. doi: 10.1038/nature08617.
- 38.Persson M, et al. Clinically significant copy number alterations and complex rearrangements of MYB and NFIB in head and neck adenoid cystic carcinoma. Genes Chromosomes Cancer. 2012;51:805–817. doi: 10.1002/gcc.21965.
- 39.Andersson MK, Åman P, Stenman G. IGF2/IGF1R Signaling as a Therapeutic Target in MYB-Positive Adenoid Cystic Carcinomas and Other Fusion Gene-Driven Tumors. Cells. 2019;8:913. doi: 10.3390/cells8080913.
- 40.Gupta AK, et al. Signaling pathways in adenoid cystic cancers: Implications for treatment. Cancer Biol. Ther. 2009;8:1947–1951. doi: 10.4161/cbt.8.20.9596.
- 41.Salha IB, Bhide S, Mourtzoukou D, Fisher C, Thway K. Solid Variant of Adenoid Cystic Carcinoma. Int. J. Surg. Pathol. 2016;24:419–424. doi: 10.1177/1066896916642011.
- 42.Budwit-Novotny DA, et al. Immunohistochemical analyses of estrogen receptor in endometrial adenocarcinoma using a monoclonal antibody. Cancer Res. 1986;46:5419–5425.
- 43.Jurmeister P, et al. Parallel screening for ALK, MET and ROS1 alterations in non-small cell lung cancer with implications for daily routine testing. Lung Cancer. 2015;87:122–129. doi: 10.1016/j.lungcan.2014.11.018.
- 44.Aryee MJ, et al. Minfi: a flexible and comprehensive Bioconductor package for the analysis of Infinium DNA methylation microarrays. Bioinformatics. 2014;30:1363–1369. doi: 10.1093/bioinformatics/btu049.
- 45.Peters TJ, et al. De novo identification of differentially methylated regions in the human genome. Epigenetics Chromatin. 2015;8:6. doi: 10.1186/1756-8935-8-6.
- 46.Pidsley R, et al. A data-driven approach to preprocessing Illumina 450K methylation array data. BMC Genomics. 2013;14:293. doi: 10.1186/1471-2164-14-293.
- 47.Krijthe, J. H. Rtsne: T-Distributed Stochastic Neighbor Embedding using a Barnes-Hut Implementation, URL: https://github.com/jkrijthe/Rtsne. (Accessed 11/02/2022)
- 48.Johann PD, Jäger N, Pfister SM, Sill M. RF_Purify: a novel tool for comprehensive analysis of tumor-purity in methylation array data based on random forest regression. BMC Bioinformatics. 2019;20:428. doi: 10.1186/s12859-019-3014-z.
- 49.Ruff L, et al. A Unifying Review of Deep and Shallow Anomaly Detection. Proc. IEEE. 2021;109:756–795. doi: 10.1109/JPROC.2021.3052449.
- 50.Hovestadt, V. & Zapatka, M. conumee: Enhanced copy-number variation analysis using Illumina DNA methylation arrays. R package version 1.9.0, http://bioconductor.org/packages/conumee/.
- 51.Capper D, et al. Practical implementation of DNA methylation and copy-number-based CNS tumor diagnostics: the Heidelberg experience. Acta Neuropathol. 2018;136:181–210. doi: 10.1007/s00401-018-1879-y.
- 52.Hughes CS, et al. Quantitative Profiling of Single Formalin Fixed Tumour Sections: proteomics for translational research. Sci. Rep. 2016;6:34949. doi: 10.1038/srep34949.
- 53.Friedrich C, et al. Comprehensive micro-scaled proteome and phosphoproteome characterization of archived retrospective cancer repositories. Nat. Commun. 2021;12:3576. doi: 10.1038/s41467-021-23855-w.
- 54.Cox J, Mann M. MaxQuant enables high peptide identification rates, individualized p.p.b.-range mass accuracies and proteome-wide protein quantification. Nat. Biotechnol. 2008;26:1367–1372. doi: 10.1038/nbt.1511.
- 55.Ritchie ME, et al. limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucl. Acids Res. 2015;43:e47. doi: 10.1093/nar/gkv007.
- 56.Budczies J, Bockmayr M, Treue D, Klauschen F, Denkert C. Semiconductor sequencing: how many flows do you need? Bioinformatics. 2015;31:1199–1203. doi: 10.1093/bioinformatics/btu805.
- 57.Li H, Durbin R. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics. 2009;25:1754–1760. doi: 10.1093/bioinformatics/btp324.
- 58.Zhao C, et al. TruSight Oncology 500: Enabling Comprehensive Genomic Profiling and Biomarker Reporting with Targeted Sequencing. bioRxiv. 2020. doi: 10.1101/2020.10.21.349100.
- 59.Jurmeister P, Leitheiser M. DNA methylation-based classification of sinonasal tumors [Code and preprocessed proteomics data]. Figshare. doi: 10.6084/m9.figshare.17144639 (Accessed 11/02/2022)
Data Availability Statement
Raw DNA methylation data of all samples collected for this study have been deposited in GEO under the accession GSE196228. Processed proteomics data are available at FigShare (doi: 10.6084/m9.figshare.17144639). Due to privacy concerns, raw DNA sequencing and raw proteomics data cannot be made publicly available. Instead, these data have been deposited under controlled access in the European Genome Phenome Archive (EGA) under the accession numbers EGAS00001006712 (proteomics data; https://ega-archive.org/studies/EGAS00001006712) and EGAS00001006713 (DNA sequencing data; https://ega-archive.org/studies/EGAS00001006713). Requests for access should be addressed to the Data Access Committee of the Institute of Pathology LMU Munich (DAC@aimethylation.com). The authors will respond to applications within one month. All requests will be reviewed by the legal and data protection department of the LMU Munich. The following restrictions apply: (1) a data sharing agreement must be signed between the corresponding author and the data processor; (2) data will only be shared for scientific, non-commercial purposes; (3) the data processor must comply with the General Data Protection Regulation (GDPR) of the European Union or, alternatively, establish a data privacy policy that is adequate in the sense of the GDPR, which will be assessed by the data protection department of the LMU Munich; (4) the data processor must delete all shared data after the investigation; (5) data must not be shared with any third party or individuals who are not authorized to access the data. For part of the study, publicly available data were retrieved from the TCGA database (https://www.cancer.gov/tcga). All other data are provided within the Supplementary Information and Supplementary Data files.
The code to reproduce the main analyses presented in this manuscript is available on FigShare (10.6084/m9.figshare.17144639)59.