Sarcoma classification by DNA methylation profiling

Christian Koelsche; Daniel Schrimpf; Damian Stichel; Martin Sill; Felix Sahm; David E Reuss; Mirjam Blattner; Barbara Worst; Christoph E Heilig; Katja Beck; Peter Horak; Simon Kreutzfeldt; Elke Paff; Sebastian Stark; Pascal Johann; Florian Selt; Jonas Ecker; Dominik Sturm; Kristian W Pajtler; Annekathrin Reinhardt; Annika K Wefers; Philipp Sievers; Azadeh Ebrahimi; Abigail Suwala; Francisco Fernández-Klett; Belén Casalini; Andrey Korshunov; Volker Hovestadt; Felix K F Kommoss; Mark Kriegsmann; Matthias Schick; Melanie Bewerunge-Hudler; Till Milde; Olaf Witt; Andreas E Kulozik; Marcel Kool; Laura Romero-Pérez; Thomas G P Grünewald; Thomas Kirchner; Wolfgang Wick; Michael Platten; Andreas Unterberg; Matthias Uhl; Amir Abdollahi; Jürgen Debus; Burkhard Lehner; Christian Thomas; Martin Hasselblatt; Werner Paulus; Christian Hartmann; Ori Staszewski; Marco Prinz; Jürgen Hench; Stephan Frank; Yvonne M H Versleijen-Jonkers; Marije E Weidema; Thomas Mentzel; Klaus Griewank; Enrique de Álava; Juan Díaz Martín; Miguel A Idoate Gastearena; Kenneth Tou-En Chang; Sharon Yin Yee Low; Adrian Cuevas-Bourdier; Michel Mittelbronn; Martin Mynarek; Stefan Rutkowski; Ulrich Schüller; Viktor F Mautner; Jens Schittenhelm; Jonathan Serrano; Matija Snuderl; Reinhard Büttner; Thomas Klingebiel; Rolf Buslei; Manfred Gessler; Pieter Wesseling; Winand N M Dinjens; Sebastian Brandner; Zane Jaunmuktane; Iben Lyskjær; Peter Schirmacher; Albrecht Stenzinger; Benedikt Brors; Hanno Glimm; Christoph Heining; Oscar M Tirado; Miguel Sáinz-Jaspeado; Jaume Mora; Javier Alonso; Xavier Garcia del Muro; Sebastian Moran; Manel Esteller; Jamal K Benhamida; Marc Ladanyi; Eva Wardelmann; Cristina Antonescu; Adrienne Flanagan; Uta Dirksen; Peter Hohenberger

doi:10.1038/s41467-020-20603-4

Nat Commun. 2021 Jan 21;12:498.

Christian Koelsche ^1,2,3,#, Daniel Schrimpf ^1,2,#, Damian Stichel ^2,#, Martin Sill ^4,5,#, Felix Sahm ^1,2, David E Reuss ^1,2, Mirjam Blattner ^4,6, Barbara Worst ^4,6,7, Christoph E Heilig ^8, Katja Beck ^8,9, Peter Horak ^8, Simon Kreutzfeldt ^8, Elke Paff ^4,6,7, Sebastian Stark ^4,6,7, Pascal Johann ^4,6,7, Florian Selt ^4,7,10, Jonas Ecker ^4,7,10, Dominik Sturm ^4,6,7, Kristian W Pajtler ^4,5,7, Annekathrin Reinhardt ^1,2, Annika K Wefers ^1,2, Philipp Sievers ^1,2, Azadeh Ebrahimi ^2, Abigail Suwala ^1,2, Francisco Fernández-Klett ^1,2, Belén Casalini ^2, Andrey Korshunov ^1,2, Volker Hovestadt ^11,12, Felix K F Kommoss ^3, Mark Kriegsmann ^3, Matthias Schick ^13, Melanie Bewerunge-Hudler ^13, Till Milde ^4,7,10, Olaf Witt ^4,7,10, Andreas E Kulozik ^4,7, Marcel Kool ^4,5, Laura Romero-Pérez ^14, Thomas G P Grünewald ^14, Thomas Kirchner ^15, Wolfgang Wick ^16,17, Michael Platten ^18,19, Andreas Unterberg ^20, Matthias Uhl ^21,22, Amir Abdollahi ^21,22,23,24, Jürgen Debus ^21,22,23,24, Burkhard Lehner ^25, Christian Thomas ^26, Martin Hasselblatt ^26, Werner Paulus ^26, Christian Hartmann ^27, Ori Staszewski ^28,29, Marco Prinz ^28,30,31, Jürgen Hench ^32, Stephan Frank ^32, Yvonne M H Versleijen-Jonkers ^33, Marije E Weidema ^33, Thomas Mentzel ^34, Klaus Griewank ^35, Enrique de Álava ^36,37, Juan Díaz Martín ^36, Miguel A Idoate Gastearena ^38, Kenneth Tou-En Chang ^39, Sharon Yin Yee Low ^40, Adrian Cuevas-Bourdier ^41, Michel Mittelbronn ^41,42,43,44, Martin Mynarek ^45, Stefan Rutkowski ^45, Ulrich Schüller ^45,46,47, Viktor F Mautner ^48, Jens Schittenhelm ^49, Jonathan Serrano ^50, Matija Snuderl ^50, Reinhard Büttner ^51, Thomas Klingebiel ^52, Rolf Buslei ^53, Manfred Gessler ^54, Pieter Wesseling ^55,56, Winand N M Dinjens ^57, Sebastian Brandner ^58,59, Zane Jaunmuktane ^59,60, Iben Lyskjær ^61, Peter Schirmacher ^3, Albrecht Stenzinger ^3, Benedikt Brors ^62, Hanno Glimm ^63,64,65,66, Christoph Heining ^64,65,66, Oscar M Tirado ^67, Miguel Sáinz-Jaspeado ^67, Jaume Mora ^68, Javier Alonso ^69, Xavier Garcia del Muro ^70, Sebastian Moran ^71, Manel Esteller ^72,73,74,75, Jamal K Benhamida ^76, Marc Ladanyi ^76, Eva Wardelmann ^77, Cristina Antonescu ^76, Adrienne Flanagan ^78,79, Uta Dirksen ^80,81, Peter Hohenberger ^82, Daniel Baumhoer ^83, Wolfgang Hartmann ^84, Christian Vokuhl ^85, Uta Flucke ^86, Iver Petersen ^87,88, Gunhild Mechtersheimer ^3, David Capper ^89, David T W Jones ^4,6, Stefan Fröhling ^8, Stefan M Pfister ^4,5,7, Andreas von Deimling ^1,2,✉

¹Department of Neuropathology, Institute of Pathology, Heidelberg University Hospital, Heidelberg, Germany

²Clinical Cooperation Unit Neuropathology, German Cancer Consortium (DKTK), German Cancer Research Center (DKFZ), Heidelberg, Germany

³Department of General Pathology, Institute of Pathology, Heidelberg University Hospital, Heidelberg, Germany

⁴Hopp Children’s Cancer Center Heidelberg (KiTZ), Heidelberg, Germany

⁵Division of Pediatric Neurooncology, German Cancer Consortium (DKTK), German Cancer Research Center (DKFZ), Heidelberg, Germany

⁶Pediatric Glioma Research Group, German Cancer Consortium (DKTK), German Cancer Research Center (DKFZ), Heidelberg, Germany

⁷Department of Pediatric Oncology, Hematology and Immunology, Heidelberg University Hospital, Heidelberg, Germany

⁸Division of Translational Medical Oncology, National Center for Tumor Diseases (NCT) Heidelberg and German Cancer Research Center (DKFZ), German Cancer Consortium (DKTK), Heidelberg, Germany

⁹Heidelberg Center for Personalized Oncology (HIPO), German Cancer Research Center (DKFZ), Heidelberg, Germany

¹⁰Clinical Cooperation Unit Pediatric Oncology, German Cancer Consortium (DKTK), German Cancer Research Center (DKFZ), Heidelberg, Germany

¹¹Broad Institute of MIT and Harvard, Cambridge, MA USA

¹²Department of Pathology and Center for Cancer Research, Massachusetts General Hospital and Harvard Medical School, Boston, MA USA

¹³Genomics and Proteomics Core Facility, German Cancer Consortium (DKTK), German Cancer Research Center (DKFZ), Heidelberg, Germany

¹⁴Max-Eder Research Group for Pediatric Sarcoma Biology, Institute of Pathology, Faculty of Medicine, LMU Munich, Munich, Germany

¹⁵Institute of Pathology, Faculty of Medicine, LMU Munich, Munich, Germany

¹⁶Neurology Clinic and National Center for Tumor Diseases, University Hospital Heidelberg, Heidelberg, Germany

¹⁷Clinical Cooperation Unit Neurooncology, German Cancer Consortium (DKTK), German Cancer Research Center (DKFZ), Heidelberg, Germany

¹⁸Department of Neurology, Mannheim University Medical Center, University of Heidelberg, Mannheim, Germany

¹⁹Clinical Cooperation Unit Neuroimmunology and Brain Tumor Immunology, German Cancer Consortium (DKTK), German Cancer Research Center (DKFZ), Heidelberg, Germany

²⁰Department of Neurosurgery, Heidelberg University Hospital, Heidelberg, Germany

²¹Department of Radiation Oncology, Heidelberg University Hospital, Heidelberg, Germany

²²Heidelberg Institute of Radiation Oncology (HIRO), National Center for Radiation Research in Oncology (NCRO), Heidelberg, Germany

²³Heidelberg Ion-Beam Therapy Center (HIT), Heidelberg, Germany

²⁴Translational Radiation Oncology, German Cancer Consortium (DKTK), National Center for Tumor Diseases (NCT), German Cancer Research Center (DKFZ), Heidelberg, Germany

²⁵Department of Orthopaedics, Trauma Surgery and Paraplegiology, Heidelberg University Hospital, Heidelberg, Germany

²⁶Institute of Neuropathology, University Hospital Münster, Münster, Germany

²⁷Department of Neuropathology, Institute of Pathology, Hannover Medical School (MHH), Hannover, Germany

²⁸Institute of Neuropathology, Faculty of Medicine, University of Freiburg, Freiburg, Germany

²⁹Berta-Ottenstein-Programme for Clinician Scientists, Faculty of Medicine, University of Freiburg, Freiburg, Germany

³⁰Signalling Research Centers BIOSS and CIBSS, University of Freiburg, Freiburg, Germany

³¹Center for Basics in NeuroModulation (NeuroModulBasics), Faculty of Medicine, University of Freiburg, Freiburg, Germany

³²Department of Neuropathology, Institute of Pathology, Basel University Hospital, Basel, Switzerland

³³Department of Medical Oncology, Radboud University Medical Center, Nijmegen, The Netherlands

³⁴Dermatopathology Bodensee, Friedrichshafen, Germany

³⁵Department of Dermatology, University Hospital Essen, West German Cancer Center, University Duisburg-Essen, Essen, Germany

³⁶Department of Pathology, Institute of Biomedicine of Sevilla (IBiS), Virgen del Rocio University Hospital, CSIC/University of Sevilla/CIBERONC, Seville, Spain

³⁷Department of Normal and Pathological Cytology and Histology, School of Medicine. University of Seville, Seville, Spain

³⁸Department of Pathological Anatomy, Clínica Universidad de Navarra, University of Navarra, Pamplona, Spain

³⁹Department of Pathology and Laboratory Medicine, KK Women’s and Children’s Hospital, Singapore, Republic of Singapore

⁴⁰Department of Neurosurgery, National Neuroscience Institute, Singapore, Republic of Singapore

⁴¹National Center of Pathology (NCP), Laboratoire National de Santé (LNS), Dudelange, Luxembourg

⁴²Luxembourg Center of Neuropathology (LCNP), Luxembourg, Luxembourg

⁴³Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, Luxembourg, Luxembourg

⁴⁴Department of Oncology (DONC), Luxembourg Institute of Health (LIH), Luxembourg, Luxembourg

⁴⁵Department of Pediatric Hematology and Oncology, University Medical Center Hamburg-Eppendorf, Hamburg, Germany

⁴⁶Institute of Neuropathology, University Medical Center Hamburg-Eppendorf, Hamburg, Germany

⁴⁷Research Institute Children’s Cancer Center Hamburg, Hamburg, Germany

⁴⁸Department of Neurology, University Medical Center Hamburg-Eppendorf, Hamburg, Germany

⁴⁹Department of Neuropathology, Institute of Pathology and Neuropathology, University Hospital of Tübingen, Tübingen, Germany

⁵⁰Department of Pathology, New York University School of Medicine, New York, NY USA

⁵¹Institute of Pathology, Cologne University Hospital, Cologne, Germany

⁵²Department of Pediatric Hematology and Oncology, University Children’s Hospital, Frankfurt/Main, Germany

⁵³Institute of Pathology, Sozialstiftung Bamberg, Klinikum am Bruderwald, Bamberg, Germany

⁵⁴Theodor-Boveri-Institute/Biocenter, Developmental Biochemistry, Würzburg University, Würzburg, Germany

⁵⁵Princess Máxima Center for Pediatric Oncology, Utrecht, The Netherlands

⁵⁶Department of Pathology, Amsterdam University Medical Centers/VUmc, Amsterdam, The Netherlands

⁵⁷Department of Pathology, Erasmus MC Cancer Institute, Rotterdam, The Netherlands

⁵⁸Department of Neurodegeneration, Institute of Neurology, University College London, London, UK

⁵⁹Division of Neuropathology, The National Hospital for Neurology and Neurosurgery, University College London Hospitals NHS Foundation Trust, London, UK

⁶⁰Department of Molecular Neuroscience, UCL Queen Square Institute of Neurology, University College London, London, UK

⁶¹Research Department of Pathology, University College London, London, UK

⁶²Division of Applied Bioinformatics, German Cancer Consortium (DKTK), German Cancer Research Center (DKFZ), Heidelberg, Germany

⁶³Translational and Functional Cancer Genomics, National Center for Tumor Diseases (NCT) and German Cancer Research Center (DKFZ), Heidelberg, Germany

⁶⁴Department of Translational Medical Oncology, National Center for Tumor Diseases (NCT) Dresden and German Cancer Research Center (DKFZ), Dresden, Germany

⁶⁵Center for Personalized Oncology, National Center for Tumor Diseases (NCT) Dresden and University Hospital Carl Gustav Carus Dresden at TU Dresden, Dresden, Germany

⁶⁶German Cancer Consortium (DKTK), Dresden, Germany

⁶⁷Sarcoma Research Group, Oncobell Program, Bellvitge Biomedical Research Institute (IDIBELL), CIBERONC, Barcelona, Catalonia Spain

⁶⁸Department of Pediatric Onco-Hematology and Developmental Tumor Biology Laboratory, Hospital Sant Joan de Déu, Barcelona, Catalonia Spain

⁶⁹Pediatric Solid Tumor Laboratory, Human Genetic Department, Research Institute of Rare Diseases, Instituto de Salud Carlos III (ISCIII), Madrid, Spain

⁷⁰Medical Oncology Service, Catalan Institute of Oncology (ICO), Bellvitge Biomedical Research Institute (IDIBELL), University of Barcelona, Barcelona, Catalonia Spain

⁷¹Cancer Epigenetics and Biology Program (PEBC), Bellvitge Biomedical Research Institute (IDIBELL), Barcelona, Catalonia Spain

⁷²Josep Carreras Leukaemia Research Institute (IJC), Badalona, Barcelona, Catalonia Spain

⁷³Centro de Investigacion Biomedica en Red Cancer (CIBERONC), Madrid, Spain

⁷⁴Institucio Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Catalonia Spain

⁷⁵Physiological Sciences Department, School of Medicine and Health Sciences, University of Barcelona (UB), Barcelona, Catalonia Spain

⁷⁶Department of Pathology, Memorial Sloan Kettering Cancer Center, New York, NY USA

⁷⁷Gerhard Domagk Institute of Pathology, University Hospital Münster, Münster, Germany

⁷⁸Department of Histopathology, Royal National Orthopaedic Hospital NHS Trust, Stanmore, Middlesex UK

⁷⁹University College London Cancer Institute, London, UK

⁸⁰Pediatrics III Pediatric Hematology, Oncology, Immunology, Cardiology, Pulmonology, West German Cancer Center, University Hospital Essen, Essen, Germany

⁸¹International Ewing Sarcoma Study Group, German Cancer Consortium (DKTK), West German Cancer Center (WTZ), University Duisburg-Essen, Essen, Germany

⁸²Division of Surgical Oncology and Thoracic Surgery, Mannheim University Medical Center, University of Heidelberg, Mannheim, Germany

⁸³Bone Tumour Reference Centre at the Institute of Pathology, University Hospital Basel and University of Basel, Basel, Switzerland

⁸⁴Division of Translational Pathology, Gerhard Domagk Institute of Pathology, University Hospital Münster, Münster, Germany

⁸⁵Department of Pediatric Pathology, University Hospital of Schleswig-Holstein, Kiel, Germany

⁸⁶Department of Pathology, Radboud University Medical Center, Nijmegen, The Netherlands

⁸⁷Institute of Pathology, SRH Poliklinik Gera GmbH, Gera, Germany

⁸⁸Institute of Pathology, Jena University Hospital Jena, Germany

⁸⁹Department of Neuropathology, Charité Universitätsmedizin Berlin, Berlin, Germany

^✉ Corresponding author.

^# Contributed equally.

PMCID: PMC7819999 PMID: 33479225

Abstract

Sarcomas are malignant soft tissue and bone tumours affecting adults, adolescents and children. They represent a morphologically heterogeneous class of tumours and some entities lack defining histopathological features. Therefore, the diagnosis of sarcomas is burdened with a high inter-observer variability and misclassification rate. Here, we demonstrate classification of soft tissue and bone tumours using a machine learning classifier algorithm based on array-generated DNA methylation data. This sarcoma classifier is trained using a dataset of 1077 methylation profiles from comprehensively pre-characterized cases comprising 62 tumour methylation classes constituting a broad range of soft tissue and bone sarcoma subtypes across the entire age spectrum. The performance is validated in a cohort of 428 sarcomatous tumours, of which 322 cases were classified by the sarcoma classifier. Our results demonstrate the potential of the DNA methylation-based sarcoma classification for research and future diagnostic applications.

Subject terms: Sarcoma, Computational biology and bioinformatics, Classification and taxonomy, Databases

Sarcomas are morphologically heterogeneous tumours rendering their classification challenging. Here the authors developed a classifier using DNA methylation data from several soft tissue and bone sarcoma subtypes, which has the potential to improve classification for research and clinical purposes.

Introduction

Sarcomas are a heterogeneous group of tumours that pose challenges to pathologists. Many entities lack unequivocal morphologic or molecular hallmarks, and the overall rarity of sarcomas results in a widespread lack of experience^1,2. High inter-observer variability among pathologists is reflected in considerable discrepancy rates between primary institutions and specialized referral centres with access to comprehensive molecular testing^3,4. Pathologists often rely on the detection of tumour-specific molecular alterations, if available⁴. While the detection of characteristic molecular alterations, most often translocations that generate gene fusions, has become a diagnostic standard for many sarcoma types, approximately half of the sarcoma entities lack unequivocal molecular hallmarks¹. Even in some cases defined by specific gene fusions, it may not be possible to adequately identify the fusion by FISH or RNA-based methods because of a variety of technical and specimen-related limitations. Novel approaches are needed to fill these diagnostic gaps⁵.

DNA methylation is a key epigenetic mark and plays important roles in normal development and disease⁶. In cancer, DNA methylation patterns reflect both the cell type of origin and changes acquired during tumour formation⁷. Profiling of human brain tumours has demonstrated entity-specific methylation signatures and has led to the identification of several novel and clinically relevant subtypes^8–13. On this basis, a comprehensive brain tumour classifier has been developed^14,15. Recently, we extended the principle of methylation-based tumour profiling to small blue round cell sarcomas that evaded a definitive histological diagnosis, thereby resolving these cases into established sarcoma entities¹⁶. Further, DNA methylation-based profiling has shown diagnostic potential for soft tissue and bone sarcoma subtyping^17–22. In this work, we aimed to develop a DNA methylation-based classification tool for soft tissue and bone sarcomas covering a broad range of subtypes and age groups.

Results

DNA methylation profiling of prototypical sarcomas

We subjected prototypical cases of the most common soft tissue and bone tumours, non-mesenchymal tumours that might mimic mesenchymal differentiation (e.g. squamous cell carcinoma or melanoma), and non-neoplastic control tissue to DNA methylation profiling using the Infinium HumanMethylation450K BeadChip or EPIC array platform. Following quality control, methylation data were analysed by unsupervised hierarchical clustering and t-Distributed Stochastic Neighbour Embedding (t-SNE)²³, thereby identifying groups of tumours sharing methylation patterns (methylation classes). To minimize potential clustering artefacts, at least seven cases were required to define a methylation class, a number that empirically proved sufficient for training a classifier and allowed prediction^14,15. Unsupervised clustering, respecting this minimum of seven cases per group, led to the designation of 62 tumour methylation classes belonging altogether to 54 histological types, and three non-neoplastic control methylation classes (Fig. 1). Iterative random down-sampling confirmed the stability of these methylation classes (Supplementary Fig. 1), and potential confounding factors such as sex, patient age, type of material, type of array and tumour purity were ruled out (Supplementary Fig. 2).
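The minimum-size rule can be written as a simple filter over cluster assignments. The sketch below is illustrative only: it assumes that clustering has already produced one label per case, and the function names `define_methylation_classes` and `assign_cases` are hypothetical, not part of the published pipeline.

```python
from collections import Counter

MIN_CASES_PER_CLASS = 7  # minimum group size stated in the text


def define_methylation_classes(cluster_labels, min_cases=MIN_CASES_PER_CLASS):
    """Return the cluster labels that have enough cases to form a methylation class."""
    counts = Counter(cluster_labels)
    return {label for label, n in counts.items() if n >= min_cases}


def assign_cases(cluster_labels, min_cases=MIN_CASES_PER_CLASS):
    """Map each case to its methylation class, or to None if its cluster is too small."""
    labels = list(cluster_labels)
    valid = define_methylation_classes(labels, min_cases)
    return [label if label in valid else None for label in labels]


if __name__ == "__main__":
    # Toy clusters: "A" has 8 cases, "B" has 7, "C" only 3.
    toy = ["A"] * 8 + ["B"] * 7 + ["C"] * 3
    print(sorted(define_methylation_classes(toy)))  # ['A', 'B']
    print(assign_cases(toy).count(None))            # 3
```

Cases in clusters below the threshold are left unassigned rather than forced into a neighbouring class.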

Based on 1077 tumour cases, methylation classes were assigned to four categories relating to the WHO classification (Fig. 1a). Category 1 represents methylation classes corresponding to a single WHO entity. Category 2 represents methylation classes corresponding to a subgroup of a WHO entity. Category 3 represents methylation classes that combine several WHO entities. Category 4 represents methylation classes of novel entities that are not yet defined by the WHO classification (Fig. 1a). Forty-eight methylation classes corresponded to distinct WHO entities (category 1), comprising 45 mesenchymal tumour entities, cutaneous melanoma, cutaneous squamous cell carcinoma and Langerhans cell histiocytosis. Nine methylation classes corresponded to subsets within WHO entities (category 2): conventional chondrosarcoma, which divides into four methylation classes, rhabdomyosarcoma with MYOD1 alteration, plexiform neurofibroma, dedifferentiated chordoma, and small blue round cell tumours with either BCOR alteration or CIC alteration. Three methylation classes combined WHO entities (category 3). The methylation classes angioleiomyoma/myopericytoma and atypical fibroxanthoma/pleomorphic dermal sarcoma each combined two entities, while the methylation class undifferentiated sarcoma contained undifferentiated (pleomorphic) sarcoma, myxofibrosarcoma and a fraction of pleomorphic liposarcomas. This provides further evidence that these sarcoma subtypes probably form a morphologic continuum of a single entity, as suggested by previous genetic studies^24–26. Two methylation classes point towards novel entities not yet defined by the WHO (category 4)^13,19. The methylation class SARC (RMS-like) was identified in sarcomatous CNS tumours with various morphologic patterns not matching established tumour categories. Unifying features of cases mapping to this class are rhabdomyoblast-like cells and DICER1 mutations¹³. The methylation class SARC (MPNST-like) was reported as a subset of malignant peripheral nerve sheath tumours (MPNST)¹⁹. Cases assigned to SARC (MPNST-like) resemble MPNST but retain trimethylation of histone H3 at lysine 27 (H3K27me3). In addition, methylation classes for non-neoplastic skeletal muscle, reactive soft tissue and leukocytes were established from 28 non-neoplastic tissue specimens. Supplementary Data 1 provides basic clinical information for each individual case in these methylation classes. Supplementary Data 2 lists characteristic clinical and molecular features for each methylation class.
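The four category definitions amount to a small decision rule over how each methylation class maps onto WHO entities. A minimal sketch follows; the `MethylationClass` record and its fields are hypothetical, and whether a class covers a whole entity or only a subgroup is supplied as an input rather than inferred. The category counts given above (48, 9, 3 and 2) add up to the 62 tumour methylation classes.

```python
from dataclasses import dataclass

# Category counts reported in the text for the 62 tumour methylation classes.
CATEGORY_COUNTS = {1: 48, 2: 9, 3: 3, 4: 2}


@dataclass(frozen=True)
class MethylationClass:
    name: str
    who_entities: tuple = ()          # WHO entities represented (empty if none defined yet)
    covers_whole_entity: bool = True  # False if the class is only a subgroup of its entity


def who_category(mc: MethylationClass) -> int:
    """Return category 1-4 according to the definitions in the text."""
    if not mc.who_entities:
        return 4  # novel entity, not (yet) defined by the WHO classification
    if len(mc.who_entities) > 1:
        return 3  # class combines several WHO entities
    if not mc.covers_whole_entity:
        return 2  # class corresponds to a subgroup of one WHO entity
    return 1      # class corresponds to one WHO entity


if __name__ == "__main__":
    examples = [
        MethylationClass("Langerhans cell histiocytosis", ("Langerhans cell histiocytosis",)),
        MethylationClass("Plexiform neurofibroma", ("neurofibroma",), covers_whole_entity=False),
        MethylationClass("Angioleiomyoma/myopericytoma", ("angioleiomyoma", "myopericytoma")),
        MethylationClass("SARC (MPNST-like)"),
    ]
    print([who_category(m) for m in examples])  # [1, 2, 3, 4]
    print(sum(CATEGORY_COUNTS.values()))       # 62
```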

Development of the sarcoma classifier

We next developed a classification tool, the sarcoma classifier, using a Random Forest machine learning algorithm as previously described^14,27. Cross-validation, an internal estimate of classifier performance¹⁵, yielded an estimated error rate of 1.95% for raw scores and an area under the receiver operating characteristic curve (AUC) of 0.999. This low misclassification rate demonstrates the discriminatory power of the classifier (Fig. 2, Supplementary Data 2). The discrepancies encountered at cross-validation occurred predominantly among the four methylation classes of conventional chondrosarcoma and among three methylation classes of sarcomas associated with BCOR alterations. As for the brain tumour classifier, we introduced a methylation class family score that combines these closely related methylation classes by summing their respective prediction scores. This modification reduced the error rate at cross-validation to 0.65% for the raw scores. We then employed a calibration algorithm that transforms raw scores into calibrated scores, making them comparable across classes. This further allowed us to define a general cut-off score of 0.9 as the threshold for prediction of a specific methylation class (Supplementary Fig. 3)¹⁴.
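The family score and the cut-off can be illustrated with plain dictionaries. This sketch assumes that calibrated scores are already available per methylation class; the class and family names are hypothetical toy labels, and the calibration step itself is not reproduced.

```python
CUTOFF = 0.9  # general calibrated-score threshold stated in the text


def family_scores(class_scores, families):
    """Combine closely related classes into families by summing their scores.

    class_scores: dict of methylation class -> calibrated score.
    families: dict of family name -> list of member classes (hypothetical layout).
    Classes outside any family keep their own score.
    """
    in_family = {c for members in families.values() for c in members}
    combined = {fam: sum(class_scores.get(c, 0.0) for c in members)
                for fam, members in families.items()}
    combined.update({c: s for c, s in class_scores.items() if c not in in_family})
    return combined


def predict(class_scores, families, cutoff=CUTOFF):
    """Return (label, score) for the best class or family; label is None below the cut-off."""
    scores = family_scores(class_scores, families)
    label, score = max(scores.items(), key=lambda item: item[1])
    return (label if score >= cutoff else None), score


if __name__ == "__main__":
    scores = {"CSA_1": 0.50, "CSA_2": 0.45, "CSA_3": 0.03, "OTHER": 0.02}
    families = {"MCF_CSA": ["CSA_1", "CSA_2", "CSA_3"]}
    print(predict(scores, {}))        # (None, 0.5)
    label, score = predict(scores, families)
    print(label, round(score, 2))     # MCF_CSA 0.98
```

In the toy example no single chondrosarcoma-like class reaches 0.9, but their summed family score does, which mirrors why combining closely related classes removes errors between them.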

Fig. 2 — Heat map showing the results of a threefold cross-validation of the Random Forest classifier on n = 1077 biologically independent samples allocated to 65 methylation classes. Deviations from the bisecting line represent misclassification errors (the class with the maximum calibrated score is taken as the prediction). Methylation class families (MCF) are indicated by black squares. The colour code and abbreviations are identical to Fig. 1a. The numbers underlying this figure are summarized in Supplementary Data 4.
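Reading such a heat map amounts to tabulating (reference class, predicted class) pairs: counts on the diagonal are correct calls, and everything off the diagonal is a misclassification. The helper names below are hypothetical. As an arithmetic check, if the reported rates were computed over all 1077 reference samples, 1.95% and 0.65% would correspond to about 21 and 7 misclassified cases, respectively.

```python
from collections import Counter


def confusion_counts(true_labels, predicted_labels):
    """Tally (reference class, predicted class) pairs, i.e. the cells of the heat map."""
    return Counter(zip(true_labels, predicted_labels))


def error_rate(true_labels, predicted_labels):
    """Fraction of cases whose predicted class differs from the reference class."""
    counts = confusion_counts(true_labels, predicted_labels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no cases supplied")
    off_diagonal = sum(n for (t, p), n in counts.items() if t != p)
    return off_diagonal / total


if __name__ == "__main__":
    truth = ["A", "A", "B", "B"]
    pred = ["A", "B", "B", "B"]
    print(confusion_counts(truth, pred)[("A", "B")])  # 1
    print(error_rate(truth, pred))                    # 0.25
    # Arithmetic check against the reported cross-validation error rates.
    print(round(100 * 21 / 1077, 2), round(100 * 7 / 1077, 2))  # 1.95 0.65
```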

Classifier performance validated in a clinical cohort

Next, the performance of the sarcoma classifier was validated on 428 additional cases, mostly relapsed and refractory soft tissue and bone tumours, enrolled in the MNP2.0, PTT2.0, INFORM or NCT MASTER trials, which focus on molecular analysis (Supplementary Data 3)^28–30. The methylation class predicted by the sarcoma classifier was compared with the institutional diagnosis (Fig. 3). A calibrated score ≥0.9 was reached for 322 of 428 cases (75%). The predicted methylation class or family matched the institutional diagnosis in 263/428 cases (61%). A discrepant prediction with a calibrated score ≥0.9 was encountered in 59/428 cases (14%). In these cases, molecular data were screened for subtype-specific alterations. The initial diagnosis was revised in favour of the predicted methylation class in 29/59 cases. In 26/59 cases, the discrepancy between histological diagnosis and classifier prediction could not be resolved because entity-specific mutations were lacking. The initial diagnosis was retained against the predicted methylation class in 4/59 cases (Fig. 4). The reason for the misleading predictions in these four cases, all of which passed the quality control steps, remains unclear. The 0.9 threshold was not reached for 106 of 428 cases (25%). Subsequent t-SNE analysis showed that many of these cases were positioned at the periphery of, or outside, the methylation classes of the reference set. It is possible that some of these tumours contained a higher proportion of non-neoplastic cells than estimated by histological examination, although the mean tumour cell purity of non-classifiable cases (47.4%) was only slightly lower than that of classifiable cases (51.3%) (Fig. 5). However, because some sarcomas with low calibrated classifier scores carried unique molecular alterations, such as ONECUT1-NUTM1 or EWSR1-TFCP2 gene fusions, we favour considering these as epigenetic subsets not yet covered by the current classifier version^31,32. A heatmap of classifier performance in the validation set is shown in Supplementary Fig. 4.
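The counts reported for the validation cohort are internally consistent, which a few lines of arithmetic confirm. The constant names are ad hoc; the numbers are those given in the text.

```python
TOTAL = 428          # validation cases
CLASSIFIED = 322     # calibrated score >= 0.9
CONCORDANT = 263     # prediction matched the institutional diagnosis
REVISED, UNRESOLVED, RETAINED = 29, 26, 4  # outcomes of the discrepant cases


def percent(n, total=TOTAL):
    """Percentage of the validation cohort, rounded to a whole number."""
    return round(100 * n / total)


discrepant = REVISED + UNRESOLVED + RETAINED
unclassified = TOTAL - CLASSIFIED

if __name__ == "__main__":
    print(discrepant, unclassified)               # 59 106
    print(CONCORDANT + discrepant == CLASSIFIED)  # True
    print(percent(CLASSIFIED), percent(CONCORDANT),
          percent(discrepant), percent(unclassified))  # 75 61 14 25
```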

Fig. 3 — In total, 428 independent sarcoma samples were analysed. 75% matched an established DNA methylation class with a classifier prediction score of ≥0.9; 25% did not reach this cut-off (score <0.9). Abbreviations are identical to Fig. 1a.

Fig. 4 — Classifier validation using sarcoma cases enrolled in the MNP2.0, PTT2.0, INFORM or NCT MASTER trials. Institutional diagnosis (left) and classifier prediction (right) of the 322 cases that received a methylation class prediction ≥0.9. The institutional diagnosis of 263 cases matched the classifier prediction (concordant; grey bars). In 59 cases the classifier prediction differed from institutional diagnosis, with 29 cases reclassified in favour of the methylation class prediction (discrepant—reclassified; blue bars), 26 cases where molecular validation analysis was inconclusive (discrepant; light blue bars), and four cases with a misleading classifier result (discrepant – misleading; red bar).

Fig. 5 — a Unsupervised clustering of the combined reference (n = 1077) and diagnostic cohort (n = 428) using t-SNE dimensionality reduction. The reference set is shown in the upper left plot. The diagnostic samples are coded as classifiable (n = 318, grey dots; upper right plot), non-classifiable (n = 106, blue dots; lower left plot) and misleading (n = 4, red dots; lower right plot). The classifiable cases overlap extensively with the reference cases, whereas the non-classifiable cases frequently fall at the periphery of, or completely separate from, the reference samples. b Tumour cell purity histograms of the reference set and of the validation set, subdivided into classifiable and non-classifiable cases. The mean value is indicated by a dashed red line and given as a percentage. c Tumour cell purity plotted against calibrated score for conventional osteosarcoma cases of the validation set.

Copy number profiling of sarcomas

Independent of the methylation patterns used for classification, high-density DNA methylation arrays allow the determination of copy number alterations, the detection of which is of major diagnostic relevance for sarcomas^25,26. We generated copy number variation (CNV) plots for all sarcomas of the reference cohort as described¹⁴. Frequently encountered alterations include MDM2 amplification in well-/dedifferentiated liposarcomas, MYC amplification in radiation-induced angiosarcoma, and segmental deletions of chromosome 22q encompassing SMARCB1 in rhabdoid tumours. While these alterations are often characteristic of distinct sarcoma entities, they are usually not pathognomonic, because they occasionally also occur in other entities. However, in combination with methylation profiles, CNV plots frequently contribute to the diagnostic decision process. The frequency of chromosomal or subchromosomal numerical alterations within the methylation classes/entities is depicted in summary CNV plots (Supplementary Fig. 5). A systematic overview of frequently observed copy number alterations is provided for each methylation class (Supplementary Data 2). Molecular and clinical characteristics of the predicted methylation class are provided in a molecular classifier report (Supplementary Fig. 6).
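The principle behind array-based copy number profiling is that the combined methylated plus unmethylated signal of a probe scales with the number of DNA copies at that locus. A simplified sketch: compute per-probe log2 ratios of a tumour's combined intensity against a copy-neutral reference, centre them, average them in bins and call gains or losses. This is not the published workflow, which involves additional normalization, genomic binning and segmentation; the consecutive-probe bins and the ±0.3 thresholds are hypothetical choices for illustration.

```python
import math
from statistics import median


def log2_ratios(sample_total, reference_total):
    """Per-probe log2 ratio of combined intensity (methylated + unmethylated), sample vs reference.

    Both inputs are positive intensities in the same probe order. Ratios are centred on
    their median so that the copy-neutral bulk of the genome sits near 0.
    """
    raw = [math.log2(s / r) for s, r in zip(sample_total, reference_total)]
    centre = median(raw)
    return [x - centre for x in raw]


def bin_means(values, bin_size):
    """Average consecutive probes in fixed-size bins (a crude stand-in for genomic bins)."""
    means = []
    for start in range(0, len(values), bin_size):
        chunk = values[start:start + bin_size]
        means.append(sum(chunk) / len(chunk))
    return means


def call_bins(binned, gain=0.3, loss=-0.3):
    """Label each bin as gain, loss or neutral using hypothetical thresholds."""
    return ["gain" if b >= gain else "loss" if b <= loss else "neutral" for b in binned]


if __name__ == "__main__":
    reference = [1000.0] * 12
    # Toy tumour: 6 copy-neutral probes, 3 probes with 4-fold signal, 3 with halved signal.
    tumour = [1000.0] * 6 + [4000.0] * 3 + [500.0] * 3
    binned = bin_means(log2_ratios(tumour, reference), bin_size=3)
    print(binned)             # [0.0, 0.0, 2.0, -1.0]
    print(call_bins(binned))  # ['neutral', 'neutral', 'gain', 'loss']
```

In a real CNV plot, a focal high-level gain such as MDM2 amplification would appear as a sharp peak in the bins covering the MDM2 locus on chromosome 12q.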

Discussion

We established an open-access platform allowing categorization of sarcomas based on array-generated methylation data and algorithm-driven analysis. DNA methylation-based categorization offers highly attractive features. Analyses can be performed on DNA extracted from formalin-fixed, paraffin-embedded (FFPE) tissue, allowing integration into routine settings. This is a clear advantage over RNA expression profiling, which depends on fresh tumour tissue³³. The detection of distinct methylation patterns for sarcoma entities is of special interest for entities lacking pathognomonic gene alterations such as entity-specific gene fusions. Among the sarcomas currently recognized by the classifier, approximately one third of the entities lack such specific mutational events.

Heterogeneity at the DNA methylation level has been described between different tumours, but also within individual tumours for Ewing sarcoma³⁴. On the other hand, the same study also reported close to 100% accuracy in distinguishing Ewing sarcoma from other cell types. Nevertheless, methylation heterogeneity within individual tumours seems at odds with the high stability required of a parameter used for tumour classification. Here, we observed high stability of methylation profiles for sarcoma entities. In addition, our selection process for the CpG sites included in the classification algorithm favours those with maximal distinction between tumour entities. A practical example of the high stability of methylation profiles established by this approach has been presented for ependymoma, where primary and recurrent tumours from the same patients clustered next to each other in almost all instances upon unsupervised clustering⁹.
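The text does not detail how CpG sites were selected, so the following is only a generic illustration of what favouring sites with maximal distinction between entities can mean: rank each CpG by the ratio of between-class to within-class variance of its beta values (an ANOVA-style criterion). The probe identifiers and data are hypothetical, and this is not claimed to be the authors' selection procedure.

```python
from statistics import mean


def separation_score(values_by_class, eps=1e-6):
    """Between-class variance of class means divided by mean within-class variance.

    values_by_class: dict mapping class name -> list of beta values (0-1) for one CpG.
    eps avoids division by zero when all classes are perfectly homogeneous.
    """
    class_means = {c: mean(v) for c, v in values_by_class.items()}
    grand_mean = mean(class_means.values())
    between = mean((m - grand_mean) ** 2 for m in class_means.values())
    within = mean(
        mean((x - class_means[c]) ** 2 for x in v) for c, v in values_by_class.items()
    )
    return between / (within + eps)


def top_cpgs(data, k):
    """Return the k CpG identifiers with the highest separation score."""
    return sorted(data, key=lambda cpg: separation_score(data[cpg]), reverse=True)[:k]


if __name__ == "__main__":
    data = {
        # hypothetical probe that separates the two classes well
        "cpg_A": {"class_X": [0.90, 0.85], "class_Y": [0.10, 0.15]},
        # hypothetical probe with no consistent difference between classes
        "cpg_B": {"class_X": [0.50, 0.20], "class_Y": [0.40, 0.30]},
    }
    print(top_cpgs(data, 1))  # ['cpg_A']
```

A criterion of this kind prefers CpGs that are consistent within an entity and different between entities, which is the property that makes a methylation profile robust to moderate intratumoural heterogeneity.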

While conceptually highly attractive, the current version of the sarcoma classifier could not assign approximately 25% of the cases in the validation cohort to a DNA methylation class. Several factors may explain this. Foremost, in its current stage the sarcoma classifier has not been trained to cover the entire spectrum of sarcoma subtypes. This accounts for a portion of the 106/428 unrecognized cases with a calibrated score <0.9 (Fig. 3). Limited sample numbers for some entities do not yet allow the identification of methylation subclasses, as was possible for the chondrosarcomas, which split into four subcategories. Increasing the number of cases in the reference set will very likely enable the detection of more methylation subgroups. A similar tendency has been observed in pilocytic astrocytomas and medulloblastomas, which now separate into several methylation subgroups whose clinical impact remains unclear^7,12,35. Moreover, the DNA methylation-based approach depends on a fairly high tumour cell content in the samples. In our experience, results are best when tumour cells constitute 70% or more of all cells in a sample³⁶. Many sarcomas, however, typically contain high proportions of non-neoplastic inflammatory cells (Fig. 5). This may have contributed to classifier scores below the 0.9 cut-off, leading to these tumours being evaluated as unclassifiable. The effect of tumour cell purity on classifier performance is likely to depend on the sarcoma subtype (Fig. 5). Future studies with larger case numbers are required to elucidate the effect of tumour purity on classifier performance. One possible way to overcome this problem might be to subtract methylation patterns typical of lymphocytes, thereby accentuating the patterns of the respective sarcoma entities. Lastly, our validation cohort did not receive a centralized pathological reference review. While such centralized expert review would not affect classifier performance, it would likely reduce the number of discordant cases, as suggested by a recent study reporting a reclassification rate of 14% in sarcomas upon central review³⁷.

In summary, we introduce a tool for sarcoma classification that is based on DNA methylation data and on automated algorithmic analysis using probability measures. We developed a webpage for the scientific community listing characteristic features of the tumour methylation classes. This online platform also provides a free upload service for locally generated methylation data, which are analysed instantly; results are returned as a molecular classifier report with a prediction confidence score (Supplementary Fig. 6). While the current version of the sarcoma classifier already includes some very rare entities, we acknowledge that it does not cover the entire spectrum. Analysis of additional sarcoma samples, including uploaded data where permission is given, will further improve this tool by refining established methylation classes and adding novel ones. The sarcoma classifier can be accessed at www.molecularsarcomapathology.org.

Methods

Sample selection and quality control

All samples in the reference and validation sets are from different patients. All cases in the reference set had undergone rigorous morphological examination by pathologists specialized in sarcoma diagnosis and, whenever possible, tumour-type-specific molecular testing to identify the relevant alterations. For each specimen, we aimed at a tumour cell content of ≥70%, with the caveat that microscopic estimates of tumour cell percentage are relatively imprecise. Indeed, determining tumour cell content by random forest regression showed that this goal was not reached for many samples³⁸. Our usual approach was to identify a representative region on an H&E section and then take a 1.5 mm punch from the corresponding site in the formalin-fixed paraffin-embedded (FFPE) block. The validation set included sarcomas enrolled in the INFORM, NCT-MASTER, PTT2.0 and MNP2.0 studies^28–30. Rare sarcoma entities were not deliberately over-represented; however, inclusion was determined by availability, which resulted in an over-representation of high-grade sarcomas in the validation set.

To exclude low-quality samples from the cohort, the on-chip quality metrics of all samples were checked and compared with a set of 7,500 pairs of IDAT files. In addition, an overall noise level was computed for each sample using the R package conumee version 1.6.0. Samples whose quality values fell within the lowest 10th percentile for at least one of the sample controls (‘BC conversion I C1, C2, C3’, ‘BC conversion I C4, C5, C6’ or ‘BC conversion II 1, 2, 3, 4’) and that also showed an overall noise level greater than 3 were excluded from this study.
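The exclusion rule combines two conditions with a logical AND. The following is a minimal Python sketch, assuming that the 10th percentile is taken per control metric over the reference IDAT pairs and that per-sample noise levels have already been computed (for example with conumee); the function and argument names are hypothetical.

```python
from statistics import quantiles

# Control metric names as listed in the text; the per-sample values are assumed inputs.
CONTROLS = (
    "BC conversion I C1, C2, C3",
    "BC conversion I C4, C5, C6",
    "BC conversion II 1, 2, 3, 4",
)


def low_quality_samples(sample_metrics, reference_metrics, noise_levels, noise_cutoff=3.0):
    """Return the IDs of samples to exclude (hypothetical helper).

    sample_metrics:    {sample_id: {control_name: value}}
    reference_metrics: {control_name: [values from the reference IDAT pairs]}
    noise_levels:      {sample_id: overall noise level}
    """
    # 10th percentile of each control metric in the reference set (assumption:
    # percentiles are computed against the reference pairs, not the cohort itself).
    p10 = {c: quantiles(reference_metrics[c], n=10)[0] for c in CONTROLS}
    excluded = set()
    for sample_id, metrics in sample_metrics.items():
        # Condition 1: at least one control falls in the lowest 10%.
        low_control = any(metrics[c] <= p10[c] for c in CONTROLS)
        # Condition 2: overall noise level above the cut-off of 3.
        noisy = noise_levels[sample_id] > noise_cutoff
        if low_control and noisy:
            excluded.add(sample_id)
    return excluded
```

A sample with a poor control metric but a low noise level is kept, which is the practical consequence of requiring both conditions.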

Methylation array processing

All computational analyses were performed in R version 3.4.4 (R Development Core Team, 2019). Raw signal intensities were obtained from IDAT files using the minfi Bioconductor package version 1.24.0. Illumina EPIC and 450k samples were merged into a combined dataset by selecting the intersection of probes present on both arrays (combineArrays function, minfi). Each sample was individually normalized by background correction (shifting the 5th percentile of negative control probe intensities to 0) and dye-bias correction (scaling the mean of normalization control probe intensities to 10,000) for both colour channels. Subsequently, a correction for tissue material type (FFPE/frozen) and array type (450k/EPIC) was performed by fitting univariate linear models to the log2-transformed intensity values (removeBatchEffect function, limma package version 3.34.5). The methylated and unmethylated signals were corrected separately. Beta values were calculated from the retransformed intensities using an offset of 100 (as recommended by Illumina).
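The per-sample normalization, the batch correction and the beta-value formula can be sketched in plain Python as below. This is not the authors' code or the minfi/limma implementation: applying the background shift to the normalization controls as well, flooring corrected intensities at 1 (so that log2 stays finite) and the ±1 coding of the two batch factors are assumptions, chosen to mirror what limma's removeBatchEffect does with two batch factors and an intercept-only design. All function names are hypothetical.

```python
import math
from statistics import mean, quantiles


def normalize_channel(intensities, negative_controls, normalization_controls,
                      target=10_000.0):
    """Background and dye-bias correction for one colour channel of one sample."""
    # Background: shift so the 5th percentile of the negative controls becomes 0.
    p5 = quantiles(negative_controls, n=20)[0]

    def shift(values):
        # Flooring at 1.0 is an assumption that keeps the later log2 finite.
        return [max(v - p5, 1.0) for v in values]

    corrected = shift(intensities)
    controls = shift(normalization_controls)  # assumption: controls shifted too
    # Dye bias: scale so the mean normalization-control intensity equals 10,000.
    factor = target / mean(controls)
    return [v * factor for v in corrected]


def _solve(a, b):
    """Solve a small dense linear system (Gauss-Jordan, partial pivoting)."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def remove_batch_effects(log2_values, is_ffpe, is_epic):
    """One probe across samples: fit log2 intensity ~ material + array with
    +1/-1 (sum-to-zero) coding and subtract the fitted batch terms."""
    x_mat = [1.0 if f else -1.0 for f in is_ffpe]
    x_arr = [1.0 if e else -1.0 for e in is_epic]
    design = [[1.0, a, b] for a, b in zip(x_mat, x_arr)]
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(design, log2_values)) for i in range(3)]
    _, b_mat, b_arr = _solve(xtx, xty)
    return [y - b_mat * a - b_arr * b
            for y, a, b in zip(log2_values, x_mat, x_arr)]


def correct_probe(intensities, is_ffpe, is_epic):
    """log2-transform, remove batch effects, and transform back."""
    logged = [math.log2(v) for v in intensities]
    return [2.0 ** v for v in remove_batch_effects(logged, is_ffpe, is_epic)]


def beta_values(meth, unmeth, offset=100.0):
    """Beta = M / (M + U + offset) per probe."""
    return [m / (m + u + offset) for m, u in zip(meth, unmeth)]
```

In a full pipeline, correct_probe would be run separately on the methylated and the unmethylated signal of every probe before beta_values is applied.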

Before further analysis, the following filtering criteria were applied: removal of probes targeting the X and Y chromosomes (n = 11,551); removal of probes containing a single-nucleotide polymorphism (dbSNP132 Common) within five base pairs of and including the targeted CpG site (n = 7,998); removal of probes not mapping uniquely to the human reference genome (hg19), allowing for one mismatch (n = 3,965); and removal of 450k array probes not included on the EPIC array. In total, 428,230 probes were kept for downstream analysis.
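These filters amount to set operations on probe identifiers. A minimal sketch follows, assuming the SNP-affected and non-uniquely mapping probe lists are supplied as precomputed sets (their derivation from dbSNP132 and hg19 alignments is not shown); all names are hypothetical.

```python
def filter_probes(probe_chrom, epic_probes, snp_probes, multimapping_probes):
    """Return the probe IDs kept for downstream analysis (hypothetical helper).

    probe_chrom:         {probe_id: chromosome name, e.g. "chr1"}
    epic_probes:         set of probe IDs present on the EPIC array
    snp_probes:          probes with a common SNP within 5 bp of (and including) the CpG
    multimapping_probes: probes not mapping uniquely to hg19 (one mismatch allowed)
    """
    kept = []
    for probe_id, chrom in probe_chrom.items():
        if chrom in ("chrX", "chrY"):
            continue  # sex-chromosome probes
        if probe_id in snp_probes or probe_id in multimapping_probes:
            continue  # SNP-affected or cross-mapping probes
        if probe_id not in epic_probes:
            continue  # 450k-only probes
        kept.append(probe_id)
    return kept
```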

Unsupervised analysis

t-SNE

To perform unsupervised non-linear dimension reduction, the 10,000 most variable probes according to standard deviation were selected. The t-SNE projection was then computed with the R package Rtsne (version 0.13) using 3,000 iterations and a perplexity of 30. To assess the stability of the resulting projection, we additionally repeated the t-SNE 500 times on subsamples of 90% of the data, drawn without replacement.
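Feature selection and the 90% subsampling used to check t-SNE stability can be sketched as follows. The embedding itself was computed with Rtsne (in R, for example Rtsne(X, perplexity = 30, max_iter = 3000)); only the surrounding bookkeeping is shown here, with hypothetical function names and a fixed seed added as an assumption for reproducibility.

```python
import random
from statistics import stdev


def most_variable_probes(beta, k=10_000):
    """beta: {probe_id: [beta value per sample]} -> k probes with the largest SD."""
    return sorted(beta, key=lambda probe: stdev(beta[probe]), reverse=True)[:k]


def stability_subsamples(sample_ids, n_repeats=500, fraction=0.9, seed=0):
    """Random subsets of 90% of the samples, each drawn without replacement."""
    rng = random.Random(seed)
    size = round(fraction * len(sample_ids))
    return [rng.sample(sample_ids, size) for _ in range(n_repeats)]
```

Each subsample would be embedded separately, and the positions of the shared samples compared across runs.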

Hierarchical clustering

Unsupervised hierarchical clustering was performed on the 20,000 CpG sites with the highest variability across the dataset, ranked by median absolute deviation, using Euclidean distance and Ward’s linkage method.
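A naive sketch of this step: probes are ranked by median absolute deviation, and samples are then merged with Ward's criterion on squared Euclidean distances via the Lance-Williams update. Which R variant was used (hclust with method "ward.D" or "ward.D2") is not stated, so following the ward.D2 convention is an assumption. Unlike R's mad(), no 1.4826 scaling constant is applied, which does not change the ranking. The cubic-time loop is suitable only for small examples, and the function names are hypothetical.

```python
from statistics import median


def mad(values):
    """Median absolute deviation, without R's 1.4826 scaling constant."""
    centre = median(values)
    return median(abs(v - centre) for v in values)


def top_mad_probes(beta, k=20_000):
    """beta: {probe_id: [beta value per sample]} -> k probes with the largest MAD."""
    return sorted(beta, key=lambda probe: mad(beta[probe]), reverse=True)[:k]


def ward_clusters(points, k):
    """Agglomerate samples (vectors of beta values) with Ward's criterion until k clusters remain."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    clusters = {i: [i] for i in range(len(points))}
    # Pairwise squared Euclidean distances keyed by (smaller id, larger id).
    dist = {(i, j): sq_dist(points[i], points[j])
            for i in clusters for j in clusters if i < j}
    next_id = len(points)
    while len(clusters) > k:
        (i, j), d_ij = min(dist.items(), key=lambda item: item[1])
        n_i, n_j = len(clusters[i]), len(clusters[j])
        merged = clusters.pop(i) + clusters.pop(j)
        # Keep distances not involving i or j, then add distances to the new cluster.
        new_dist = {pair: d for pair, d in dist.items()
                    if i not in pair and j not in pair}
        for m, members in clusters.items():
            n_m = len(members)
            d_mi = dist[(min(m, i), max(m, i))]
            d_mj = dist[(min(m, j), max(m, j))]
            # Lance-Williams update for Ward's method on squared distances.
            new_dist[(m, next_id)] = ((n_i + n_m) * d_mi + (n_j + n_m) * d_mj
                                      - n_m * d_ij) / (n_i + n_j + n_m)
        clusters[next_id] = merged
        dist = new_dist
        next_id += 1
    return [sorted(c) for c in clusters.values()]
```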

Classifier development

Similar to the development of the brain tumour classifier¹⁴, the Random Forest²⁷ algorithm (R package randomForest version 4.6-12) was applied to generate 10,000 binary decision trees, incorporating genome-wide information from all 1077 reference samples of the 65 methylation classes. We used the 10,000 CpGs with the highest variable importance. To address unequal class sizes, we performed downsampling as described¹⁴. The distribution of these CpGs with respect to gene region and regulatory feature group is shown in Supplementary Fig. 7. Each binary decision tree assigns a given sample to one of the 65 classes, and the votes are aggregated into raw scores. To enable comparison of classifier results between classes, these raw scores are transformed by an L2-penalized multinomial logistic regression calibration model (R package glmnet version 2.0-18) into a probability that measures the confidence in the class assignment (the calibrated score). Cross-validation of the Random Forest classifier resulted in an estimated error rate of 1.95% for raw scores and 0.65% for calibrated scores, a multi-class area under the receiver operating characteristic curve³⁹ of 0.99 and a Brier score⁴⁰ of 0.05, indicating high discriminatory power. To be able to classify samples from biologically closely related tumour classes, we introduced methylation class families, for which the calibrated scores of the member classes were summed into a single methylation class family score¹⁴.
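Two ingredients of this step can be illustrated without a full Random Forest: drawing a class-balanced training sample for each tree, and turning the trees' votes into raw scores. The sketch assumes balancing to the size of the smallest class with replacement, similar to what randomForest's strata and sampsize arguments allow; the exact downsampling settings used by the authors are described in ref. 14 and are not reproduced here. Function names are hypothetical.

```python
import random
from collections import Counter


def downsampled_bootstrap(labels, rng):
    """Training indices for one tree: every class contributes as many draws
    (with replacement) as the smallest class has samples."""
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    n_min = min(len(members) for members in by_class.values())
    return [rng.choice(members)
            for members in by_class.values() for _ in range(n_min)]


def raw_scores(tree_votes, classes):
    """Raw score per class = fraction of trees that voted for that class."""
    counts = Counter(tree_votes)
    return {c: counts[c] / len(tree_votes) for c in classes}
```

In this picture, each tree is grown on its own downsampled index set, and the votes of all trees for a new sample are aggregated with raw_scores before calibration.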

Classifier calibration

To obtain classifier scores that are comparable between classes and that better estimate the certainty of individual predictions, we recalibrated the classification scores by mapping the original scores to more accurate class probabilities¹⁵. To find such a mapping, an L2-penalized multinomial logistic regression model was fitted with the methylation class as the response variable and the Random Forest scores as explanatory variables, using the R package glmnet⁴¹. The small ridge (L2) penalty on the likelihood prevents overfitting and stabilizes estimation in situations in which classes are perfectly separable. Fitting this model requires independent Random Forest scores, that is, scores generated by a Random Forest classifier that was not trained on the same samples; otherwise, the Random Forest scores would be systematically biased and not comparable to the scores of unseen cases. We therefore used the Random Forest scores generated by threefold cross-validation. To validate the class predictions based on the recalibrated scores, a nested threefold cross-validation loop was incorporated into the main threefold cross-validation that validates the Random Forest classifier¹⁵. Within each outer cross-validation run, the nested threefold cross-validation generates independent Random Forest scores, which are then used to train a calibration model. The Random Forest scores predicted for the one-third test data of the outer loop are then recalibrated by applying the calibration model fitted on the scores generated during the nested cross-validation.
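The calibration model can be sketched as an L2-penalized multinomial (softmax) logistic regression trained by plain gradient descent on Random Forest score vectors. This is a stand-in for glmnet, which uses coordinate descent and its own penalty scaling, so λ values are not interchangeable; the learning rate, the epoch count and leaving the intercepts unpenalized are assumptions. As the text stresses, the input scores should be out-of-fold scores from cross-validation. Function names are hypothetical.

```python
import math


def _softmax(logits):
    top = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def _logits(model, x):
    weights, bias = model
    return [b + sum(w_j * x_j for w_j, x_j in zip(w, x)) for w, b in zip(weights, bias)]


def fit_calibrator(scores, labels, n_classes, lam=0.01, lr=0.5, epochs=500):
    """Softmax regression with a ridge penalty on the weights.

    scores: raw Random Forest score vectors (ideally out-of-fold)
    labels: true class indices in range(n_classes)
    Minimizes mean cross-entropy + lam/2 * ||W||^2 by batch gradient descent.
    """
    n, p = len(scores), len(scores[0])
    weights = [[0.0] * p for _ in range(n_classes)]
    bias = [0.0] * n_classes
    for _ in range(epochs):
        grad_w = [[0.0] * p for _ in range(n_classes)]
        grad_b = [0.0] * n_classes
        for x, y in zip(scores, labels):
            probs = _softmax(_logits((weights, bias), x))
            for k in range(n_classes):
                err = probs[k] - (1.0 if k == y else 0.0)
                grad_b[k] += err / n
                for j in range(p):
                    grad_w[k][j] += err * x[j] / n
        for k in range(n_classes):
            bias[k] -= lr * grad_b[k]  # intercepts are not penalized
            for j in range(p):
                weights[k][j] -= lr * (grad_w[k][j] + lam * weights[k][j])
    return weights, bias


def calibrate(model, x):
    """Calibrated class probabilities for one raw score vector."""
    return _softmax(_logits(model, x))
```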

Calibration model parameter tuning

To determine the optimal amount of L2 penalization for a calibration model, parameter tuning is performed using a resampling approach. Each time a calibration model is fitted on raw RF scores from training data to calibrate raw RF scores from test data, 500 random data sets are generated by sampling 70% of the raw training scores without replacement. For each of these data sets, L2-penalized multinomial logistic regression models are fitted over a range of reasonable penalization parameters λ. The remaining 30% of scores are then calibrated by these models, and the maximum calibrated score over all methylation classes is used to generate class predictions. A new binary variable is then defined: predictions in agreement with the actual class are considered ‘classifiable’, and predictions not in agreement are labelled ‘non-classifiable’. This binary variable and the accompanying maximum score are analysed by receiver operating characteristic (ROC) analysis, i.e. by calculating the Youden index (sensitivity + specificity − 1) for all possible thresholds. The final λ is chosen such that the average Youden index over all resampling iterations at the prespecified cut-off of 0.9 is maximal¹⁵. Tuning the calibration model in this way regulates the amount of calibration so that the scores perform well at the prespecified common threshold of 0.9. This allows us to establish a common threshold for all forthcoming updates of the proposed classifier, which facilitates communication with clinicians. A scheme summarizing the steps of the classifier algorithm is provided in Supplementary Fig. 8.
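The λ search can be sketched as below. Here fit_and_calibrate is a hypothetical callable that fits a calibration model with a given λ on the training scores and returns calibrated probability vectors for the held-out scores. The Youden index is evaluated only at the fixed threshold of 0.9, as described; treating an undefined sensitivity or specificity (a split without classifiable or without non-classifiable cases) as 0 is an assumption, as are the function names and the fixed seed.

```python
import random
from statistics import mean


def youden_at_threshold(max_scores, classifiable, threshold=0.9):
    """Youden index (sensitivity + specificity - 1) at one fixed threshold.

    classifiable[i] is True when the top-scoring class matched the true class.
    An undefined rate (no positives or no negatives) counts as 0 (assumption).
    """
    tp = sum(1 for s, c in zip(max_scores, classifiable) if c and s >= threshold)
    pos = sum(1 for c in classifiable if c)
    tn = sum(1 for s, c in zip(max_scores, classifiable) if not c and s < threshold)
    neg = len(classifiable) - pos
    sensitivity = tp / pos if pos else 0.0
    specificity = tn / neg if neg else 0.0
    return sensitivity + specificity - 1.0


def tune_lambda(raw_scores, labels, lambdas, fit_and_calibrate,
                n_resamples=500, train_fraction=0.7, threshold=0.9, seed=0):
    """Pick the lambda with the highest mean Youden index at `threshold`."""
    rng = random.Random(seed)
    n_train = round(train_fraction * len(labels))
    youden = {lam: [] for lam in lambdas}
    for _ in range(n_resamples):
        order = rng.sample(range(len(labels)), len(labels))  # random permutation
        train, test = order[:n_train], order[n_train:]
        train_x = [raw_scores[i] for i in train]
        train_y = [labels[i] for i in train]
        test_x = [raw_scores[i] for i in test]
        for lam in lambdas:
            probs = fit_and_calibrate(train_x, train_y, test_x, lam)
            max_scores = [max(p) for p in probs]
            predicted = [p.index(max(p)) for p in probs]
            classifiable = [pred == labels[i] for pred, i in zip(predicted, test)]
            youden[lam].append(youden_at_threshold(max_scores, classifiable, threshold))
    return max(lambdas, key=lambda lam: mean(youden[lam]))
```

Because only the fixed 0.9 cut-off enters the objective, the chosen λ is the one whose calibrated scores separate correct from incorrect calls best at exactly that threshold.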

Methylation class families

Misclassification errors mainly occurred within seven groups of histologically and biologically closely related tumour methylation classes. Therefore, we defined three ‘methylation class families’ (MCF) encompassing these seven tumour groups. Calibrated MCF scores were calculated by summing the calibrated class scores within each MCF.
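A minimal sketch of how family scores could feed into a final call, assuming the rule "report the top class if its calibrated score reaches 0.9, otherwise report the top family if its summed score reaches 0.9, otherwise unclassifiable". This order of checks is an assumption, and the function, class and family names are hypothetical.

```python
def assign(calibrated, families, threshold=0.9):
    """calibrated: {class: calibrated score}; families: {family: [member classes]}."""
    best_class = max(calibrated, key=calibrated.get)
    if calibrated[best_class] >= threshold:
        return ("class", best_class)
    # MCF score = sum of the calibrated scores of the member classes.
    family_scores = {f: sum(calibrated.get(c, 0.0) for c in members)
                     for f, members in families.items()}
    if family_scores:
        best_family = max(family_scores, key=family_scores.get)
        if family_scores[best_family] >= threshold:
            return ("family", best_family)
    return ("unclassifiable", None)
```

For example, two closely related classes scoring 0.5 and 0.45 would each miss the cut-off, but their family would be reported with a summed score of 0.95.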

Estimating tumour purity from DNA methylation data

The estimated tumour purity for all reference cases was computed using the R package RF_Purify as described³⁸. For the illustrations, the predictions obtained with the method ‘ABSOLUTE’ were used.

Copy number profiling

Copy number alterations of genomic segments were inferred from the methylation array data using the R package conumee after additional baseline correction (https://github.com/dstichel/conumee). Summary copy number profiles were created by summarizing these data across the reference cases of each methylation class.
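One common way to summarize segmented copy number data per class is to report, for each genomic bin, the fraction of cases with a gain or a loss. The sketch below assumes log2-ratio profiles on shared bins and hypothetical thresholds of ±0.1; the thresholds and summary statistic actually used are not stated in the text, and the function name is hypothetical.

```python
def summarize_copy_number(profiles, gain=0.1, loss=-0.1):
    """profiles: per-case lists of log2 ratios on the same genomic bins.

    Returns (fraction of cases with a gain, fraction with a loss) per bin.
    """
    n_cases = len(profiles)
    n_bins = len(profiles[0])
    gains = [sum(1 for p in profiles if p[b] > gain) / n_cases for b in range(n_bins)]
    losses = [sum(1 for p in profiles if p[b] < loss) / n_cases for b in range(n_bins)]
    return gains, losses
```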

Validation analysis

Cases enrolled in INFORM and NCT MASTER underwent whole-exome and total RNA sequencing; cases enrolled in MNP2.0 and PTT2.0 underwent customized gene-panel next-generation sequencing (NGS)⁴² and total RNA sequencing from FFPE material⁴³, whenever necessary.

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Supplementary information

Supplementary Information (7.2MB, pdf)

Peer Review File (2MB, pdf)

41467_2020_20603_MOESM3_ESM.pdf (79.7KB, pdf)

Description of Additional Supplementary Files

Supplementary Data 1 (97.7KB, xlsx)

Supplementary Data 2 (28.5KB, xlsx)

Supplementary Data 3 (62.1KB, xlsx)

Supplementary Data 4 (11.5KB, xlsx)

Reporting Summary (197.4KB, pdf)

Acknowledgements

This work was funded by Deutsche Krebshilfe grant 70112499, the NCT Heidelberg and an Illumina Medical Research Grant. Part of this work was funded by the National Institute for Health Research (to S.B. and Z.J.) and the UCLH Biomedical Research Centre (BRC399/NS/RB/101410). Human tissues were obtained from University College London NHS Foundation Trust as part of the UK Brain Archive Information Network (BRAIN UK, Ref: 18/004), which is funded by the Medical Research Council and Brain Tumour Research UK. The methylation profiling at NYU is supported by a grant from the Friedberg Charitable Foundation (to M.Sn.). M.Mi. would like to thank the Luxembourg National Research Fund (FNR) for its support (FNR PEARL P16/BM/11192868 grant).

Author contributions

C.K. and A.v.D. conceived and supervised the project. C.K., D.Sc., D.St., M.Si., V.Ho., M.Sc., M.B.H., D.C., D.T.W.J., S.M.P. and A.v.D. performed DNA methylation data analysis and interpretation. C.K., D.Sc., D.St., M.Si., F.Sa., D.E.R., M.Bl., B.Wo., C.He., K.Be., P.Ho., S.Kr., E.Pf., S.St., P.Jo., F.Se., J.Ec., D.St., A.Re., A.K.W., P.Si., A.Eb., A.Su., F.F.K., B.Ca., A.Ko., F.K.F.K., M.Kr., M.Ko., A.St., B.B., H.G., C.Hei., W.H., D.T.W.J., S.F., S.M.P. and A.v.D. performed validation data analysis and interpretation. C.K., T.Mi., K.W.P., O.Wi., A.Ku., L.R.P., T.G.P.G., T.Ki., W.Wi., M.Pl., A.Un., M.Uh., A.Ab., J.De., B.Le., C.Th., M.Ha., W.Pa., C.Ha., O.St., M.Pr., J.He., S.Fr., Y.M.H.V., M.E.W., T.Me., K.Gr., E.d.A., J.D.M., M.A.I.G., K.T.C., S.Y.Y.L., A.Cu., M.Mi., M.My., S.Ru., U.Sc., V.F.M., J.Sc., J.Se., M.Sn., R.B., T.Kl., R.Bu., M.G., P.W., W.N.M.D., S.B., Z.J., Ly.I., P.S., O.M.T., M.Sa., J.M., J.A., X.G.M., S.M., M.E., J.K.B., M.L., E.W., C.A., A.F., U.D., P.H., D.B., C.V., G.M., U.F., I.P., S.F., S.M.P. and A.v.D. provided cases and meta data. C.K., D.Sc., D.St. and M.Si. created the figures. C.K., D.Sc., D.St., M.Si. and A.v.D. wrote the manuscript. The manuscript underwent an internal collaboration-wide review process.

Funding

Open Access funding enabled and organized by Projekt DEAL.

Data availability

Methylation data required for building the sarcoma classifier (reference set) were deposited at the public repository Gene Expression Omnibus under the accession number GSE140686. Supplementary Data 1 indicates the IDAT file names for each case. The remaining data are available within the Article, Supplementary Information or available from the authors upon request.

Competing interests

A patent for a DNA methylation-based method for classifying tumour species of the brain has been applied for by the Deutsches Krebsforschungszentrum Stiftung des öffentlichen Rechts and Ruprecht-Karls-Universität Heidelberg (EP 3067432 A1) with S.M.P., A.v.D., D.T.W.J., D.C., V.Ho., M.Si., M.B.H. and M.Sc. as inventors. The other authors declare no competing interests.

Footnotes

Peer review information Nature Communications thanks Rosandra Kaplan and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

These authors contributed equally: Christian Koelsche, Daniel Schrimpf, Damian Stichel, Martin Sill.

Supplementary information

Supplementary information is available for this paper at 10.1038/s41467-020-20603-4.

References

1. Fletcher CDM, Bridge JA, Hogendoorn PCW, Mertens F. WHO Classification of Tumours of Soft Tissue and Bone. Lyon: IARC Press; 2013.
2. Gatta G, et al. Rare cancers are not so rare: the rare cancer burden in Europe. Eur. J. Cancer. 2011;47:2493–2511. doi: 10.1016/j.ejca.2011.08.008.
3. Ray-Coquard I, et al. Sarcoma: concordance between initial diagnosis and centralized expert review in a population-based study within three European regions. Ann. Oncol. 2012;23:2442–2449. doi: 10.1093/annonc/mdr610.
4. Italiano A, et al. Clinical effect of molecular methods in sarcoma diagnosis (GENSARC): a prospective, multicentre, observational study. Lancet Oncol. 2016;17:532–538. doi: 10.1016/S1470-2045(15)00583-5.
5. Topol EJ. High-performance medicine: the convergence of human and artificial intelligence. Nat. Med. 2019;25:44–56. doi: 10.1038/s41591-018-0300-7.
6. Lokk K, et al. DNA methylome profiling of human tissues identifies global and tissue-specific methylation patterns. Genome Biol. 2014;15:r54. doi: 10.1186/gb-2014-15-4-r54.
7. Hovestadt V, et al. Decoding the regulatory landscape of medulloblastoma using DNA methylation sequencing. Nature. 2014;510:537–541. doi: 10.1038/nature13268.
8. Sturm D, et al. Hotspot mutations in H3F3A and IDH1 define distinct epigenetic and biological subgroups of glioblastoma. Cancer Cell. 2012;22:425–437. doi: 10.1016/j.ccr.2012.08.024.
9. Pajtler KW, et al. Molecular classification of ependymal tumors across All CNS compartments, histopathological grades, and age groups. Cancer Cell. 2015;27:728–743. doi: 10.1016/j.ccell.2015.04.002.
10. Sturm D, et al. New brain tumor entities emerge from molecular classification of CNS-PNETs. Cell. 2016;164:1060–1072. doi: 10.1016/j.cell.2016.01.015.
11. Sahm F, et al. DNA methylation-based classification and grading system for meningioma: a multicentre, retrospective analysis. Lancet Oncol. 2017;18:682–694. doi: 10.1016/S1470-2045(17)30155-9.
12. Reinhardt A, et al. Anaplastic astrocytoma with piloid features, a novel molecular class of IDH wildtype glioma with recurrent MAPK pathway, CDKN2A/B and ATRX alterations. Acta Neuropathol. 2018;136:273–291. doi: 10.1007/s00401-018-1837-8.
13. Koelsche C, et al. Primary intracranial spindle cell sarcoma with rhabdomyosarcoma-like features share a highly distinct methylation profile and DICER1 mutations. Acta Neuropathol. 2018;136:327–337. doi: 10.1007/s00401-018-1871-6.
14. Capper D, et al. DNA methylation-based classification of central nervous system tumours. Nature. 2018;555:469–474. doi: 10.1038/nature26000.
15. Maros ME, et al. Machine learning workflows to estimate class probabilities for precision cancer diagnostics on DNA methylation microarray data. Nat. Protoc. 2020;15:479–512. doi: 10.1038/s41596-019-0251-6.
16. Koelsche C, et al. Array-based DNA-methylation profiling in sarcomas with small blue round cell histology provides valuable diagnostic information. Mod. Pathol. 2018;31:1246–1256. doi: 10.1038/s41379-018-0045-3.
17. Renner M, et al. Integrative DNA methylation and gene expression analysis in high-grade soft tissue sarcomas. Genome Biol. 2013;14:r137. doi: 10.1186/gb-2013-14-12-r137.
18. Seki M, et al. Integrated genetic and epigenetic analysis defines novel molecular subgroups in rhabdomyosarcoma. Nat. Commun. 2015;6:7557. doi: 10.1038/ncomms8557.
19. Rohrich M, et al. Methylation-based classification of benign and malignant peripheral nerve sheath tumors. Acta Neuropathol. 2016;131:877–887. doi: 10.1007/s00401-016-1540-6.
20. Wu SP, et al. DNA methylation-based classifier for accurate molecular diagnosis of bone sarcomas. JCO Precis. Oncol. 2017. doi: 10.1200/PO.17.00031.
21. Weidema ME, et al. DNA methylation profiling identifies distinct clusters in angiosarcomas. Clin. Cancer Res. 2020;26:93–100. doi: 10.1158/1078-0432.CCR-19-2180.
22. Kommoss FKF, et al. DNA methylation-based profiling of uterine neoplasms: a novel tool to improve gynecologic cancer diagnostics. J. Cancer Res. Clin. Oncol. 2020;146:97–104. doi: 10.1007/s00432-019-03093-w.
23. van der Maaten L, Hinton G. Visualizing Data using t-SNE. J. Mach. Learn. Res. 2008;9:2579–2605.
24. Idbaih A, et al. Myxoid malignant fibrous histiocytoma and pleomorphic liposarcoma share very similar genomic imbalances. Lab. Invest. 2005;85:176–181. doi: 10.1038/labinvest.3700202.
25. Barretina J, et al. Subtype-specific genomic alterations define new targets for soft-tissue sarcoma therapy. Nat. Genet. 2010;42:715–721. doi: 10.1038/ng.619.
26. Cancer Genome Atlas Research Network. Comprehensive and integrated genomic characterization of adult soft tissue sarcomas. Cell. 2017;171:950–965.e28. doi: 10.1016/j.cell.2017.10.014.
27. Breiman L. Random forests. Mach. Learn. 2001;45:5–32. doi: 10.1023/A:1010933404324.
28. Worst BC, et al. Next-generation personalised medicine for high-risk paediatric cancer patients—The INFORM pilot study. Eur. J. Cancer. 2016;65:91–101. doi: 10.1016/j.ejca.2016.06.009.
29. Selt F, et al. Pediatric targeted therapy: clinical feasibility of personalized diagnostics in children with relapsed and progressive tumors. Brain Pathol. 2016;26:506–516. doi: 10.1111/bpa.12326.
30. Horak P, et al. Precision oncology based on omics data: The NCT Heidelberg experience. Int. J. Cancer. 2017;141:877–886. doi: 10.1002/ijc.30828.
31. Dickson BC, et al. NUTM1 Gene fusions characterize a subset of undifferentiated soft tissue and visceral tumors. Am. J. Surg. Pathol. 2018;42:636–645. doi: 10.1097/PAS.0000000000001096.
32. Watson S, et al. Transcriptomic definition of molecular subgroups of small round cell sarcomas. J. Pathol. 2018;245:29–40. doi: 10.1002/path.5053.
33. Khan J, et al. Classification and diagnostic prediction of cancers using gene expression profiling and artificial neural networks. Nat. Med. 2001;7:673–679. doi: 10.1038/89044.
34. Sheffield NC, et al. DNA methylation heterogeneity defines a disease spectrum in Ewing sarcoma. Nat. Med. 2017;23:386–395. doi: 10.1038/nm.4273.
35. Hovestadt V, et al. Robust molecular subgrouping and copy-number profiling of medulloblastoma from small amounts of archival tumour material using high-density DNA methylation arrays. Acta Neuropathol. 2013;125:913–916. doi: 10.1007/s00401-013-1126-5.
36. Capper D, et al. Practical implementation of DNA methylation and copy-number-based CNS tumor diagnostics: the Heidelberg experience. Acta Neuropathol. 2018;136:181–210. doi: 10.1007/s00401-018-1879-y.
37. Perrier L, et al. The cost-saving effect of centralized histological reviews with soft tissue and visceral sarcomas, GIST, and desmoid tumors: The experiences of the pathologists of the French Sarcoma Group. PLoS ONE. 2018;13:e0193330. doi: 10.1371/journal.pone.0193330.
38. Johann PD, Jäger N, Pfister SM. RF_Purify: a novel tool for comprehensive analysis of tumor-purity in methylation array data based on random forest regression. BMC Bioinformatics. 2019. doi: 10.1186/s12859-019-3014-z.
39. Hand DJ, Till RJ. A simple generalisation of the area under the roc curve for multiple class classification problems. Mach. Learn. 2001;45:171–186. doi: 10.1023/A:1010920819831.
40. Brier GW. Verification of forecasts expressed in terms of probability. Monthly Weather Review. 1950;78. doi: 10.1175/1520-0493(1950)078<0001:VOFEIT>2.0.CO;2.
41. Friedman J, Hastie T, Tibshirani R. Regularization paths for generalized linear models via coordinate descent. J. Stat. Softw. 2010;33:1–22. doi: 10.18637/jss.v033.i01.
42. Sahm F, et al. Next-generation sequencing in routine brain tumor diagnostics enables an integrated diagnosis and identifies actionable targets. Acta Neuropathol. 2016;131:903–910. doi: 10.1007/s00401-015-1519-8.
43. Stichel D, et al. Routine RNA sequencing of formalin-fixed paraffin-embedded specimens in neuropathology diagnostics identifies diagnostically and therapeutically relevant gene fusions. Acta Neuropathol. 2019. doi: 10.1007/s00401-019-02039-3.

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

Supplementary Information^{(7.2MB, pdf)}

Peer Review File^{(2MB, pdf)}

41467_2020_20603_MOESM3_ESM.pdf^{(79.7KB, pdf)}

Description of Additional Supplementary Files

Supplementary Data 1^{(97.7KB, xlsx)}

Supplementary Data 2^{(28.5KB, xlsx)}

Supplementary Data 3^{(62.1KB, xlsx)}

Supplementary Data 4^{(11.5KB, xlsx)}

Reporting Summary^{(197.4KB, pdf)}

Data Availability Statement

[CR1] 1.Fletcher, C. D. M., Bridge, J. A., Hogendoorn, P. C. W. & Mertens, F. WHO Classification of Tumours of Soft Tissue and Bone (IARC Press, Lyon, 2013).

[CR2] 2.Gatta G, et al. Rare cancers are not so rare: the rare cancer burden in Europe. Eur. J. Cancer. 2011;47:2493–2511. doi: 10.1016/j.ejca.2011.08.008. [DOI] [PubMed] [Google Scholar]

[CR3] 3.Ray-Coquard I, et al. Sarcoma: concordance between initial diagnosis and centralized expert review in a population-based study within three European regions. Ann. Oncol. 2012;23:2442–2449. doi: 10.1093/annonc/mdr610. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR4] 4.Italiano A, et al. Clinical effect of molecular methods in sarcoma diagnosis (GENSARC): a prospective, multicentre, observational study. Lancet Oncol. 2016;17:532–538. doi: 10.1016/S1470-2045(15)00583-5. [DOI] [PubMed] [Google Scholar]

[CR5] 5.Topol EJ. High-performance medicine: the convergence of human and artificial intelligence. Nat. Med. 2019;25:44–56. doi: 10.1038/s41591-018-0300-7. [DOI] [PubMed] [Google Scholar]

[CR6] 6.Lokk K, et al. DNA methylome profiling of human tissues identifies global and tissue-specific methylation patterns. Genome Biol. 2014;15:r54. doi: 10.1186/gb-2014-15-4-r54. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR7] 7.Hovestadt V, et al. Decoding the regulatory landscape of medulloblastoma using DNA methylation sequencing. Nature. 2014;510:537–541. doi: 10.1038/nature13268. [DOI] [PubMed] [Google Scholar]

[CR8] 8.Sturm D, et al. Hotspot mutations in H3F3A and IDH1 define distinct epigenetic and biological subgroups of glioblastoma. Cancer Cell. 2012;22:425–437. doi: 10.1016/j.ccr.2012.08.024. [DOI] [PubMed] [Google Scholar]

[CR9] 9.Pajtler KW, et al. Molecular classification of ependymal tumors across All CNS compartments, histopathological grades, and age groups. Cancer Cell. 2015;27:728–743. doi: 10.1016/j.ccell.2015.04.002. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR10] 10.Sturm D, et al. New brain tumor entities emerge from molecular classification of CNS-PNETs. Cell. 2016;164:1060–1072. doi: 10.1016/j.cell.2016.01.015. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR11] 11.Sahm F, et al. DNA methylation-based classification and grading system for meningioma: a multicentre, retrospective analysis. Lancet Oncol. 2017;18:682–694. doi: 10.1016/S1470-2045(17)30155-9. [DOI] [PubMed] [Google Scholar]

[CR12] 12.Reinhardt A, et al. Anaplastic astrocytoma with piloid features, a novel molecular class of IDH wildtype glioma with recurrent MAPK pathway, CDKN2A/B and ATRX alterations. Acta Neuropathol. 2018;136:273–291. doi: 10.1007/s00401-018-1837-8. [DOI] [PubMed] [Google Scholar]

[CR13] 13.Koelsche C, et al. Primary intracranial spindle cell sarcoma with rhabdomyosarcoma-like features share a highly distinct methylation profile and DICER1 mutations. Acta Neuropathol. 2018;136:327–337. doi: 10.1007/s00401-018-1871-6. [DOI] [PubMed] [Google Scholar]

[CR14] 14.Capper D, et al. DNA methylation-based classification of central nervous system tumours. Nature. 2018;555:469–474. doi: 10.1038/nature26000. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR15] 15.Maros ME, et al. Machine learning workflows to estimate class probabilities for precision cancer diagnostics on DNA methylation microarray data. Nat. Protoc. 2020;15:479–512. doi: 10.1038/s41596-019-0251-6. [DOI] [PubMed] [Google Scholar]

[CR16] 16.Koelsche C, et al. Array-based DNA-methylation profiling in sarcomas with small blue round cell histology provides valuable diagnostic information. Mod. Pathol. 2018;31:1246–1256. doi: 10.1038/s41379-018-0045-3. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR17] 17.Renner M, et al. Integrative DNA methylation and gene expression analysis in high-grade soft tissue sarcomas. Genome Biol. 2013;14:r137. doi: 10.1186/gb-2013-14-12-r137. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR18] 18.Seki M, et al. Integrated genetic and epigenetic analysis defines novel molecular subgroups in rhabdomyosarcoma. Nat. Commun. 2015;6:7557. doi: 10.1038/ncomms8557. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR19] 19.Rohrich M, et al. Methylation-based classification of benign and malignant peripheral nerve sheath tumors. Acta Neuropathol. 2016;131:877–887. doi: 10.1007/s00401-016-1540-6. [DOI] [PubMed] [Google Scholar]

[CR20] 20.Wu, S. P. et al. DNA methylation-based classifier for accurate molecular diagnosis of bone sarcomas. JCO Precis. Oncol. 10.1200/PO.17.00031 (2017). [DOI] [PMC free article] [PubMed]

[CR21] 21.Weidema ME, et al. DNA methylation profiling identifies distinct clusters in angiosarcomas. Clin. Cancer Res. 2020;26:93–100. doi: 10.1158/1078-0432.CCR-19-2180. [DOI] [PubMed] [Google Scholar]

[CR22] 22.Kommoss FKF, et al. DNA methylation-based profiling of uterine neoplasms: a novel tool to improve gynecologic cancer diagnostics. J. Cancer Res. Clin. Oncol. 2020;146:97–104. doi: 10.1007/s00432-019-03093-w. [DOI] [PubMed] [Google Scholar]

[CR23] 23.van der Maaten L, Hinton G. Visualizing Data using t-SNE. J. Mach. Learn. Res. 2008;9:2579–2605. [Google Scholar]

[CR24] 24.Idbaih A, et al. Myxoid malignant fibrous histiocytoma and pleomorphic liposarcoma share very similar genomic imbalances. Lab. Invest. 2005;85:176–181. doi: 10.1038/labinvest.3700202. [DOI] [PubMed] [Google Scholar]

[CR25] 25.Barretina J, et al. Subtype-specific genomic alterations define new targets for soft-tissue sarcoma therapy. Nat. Genet. 2010;42:715–721. doi: 10.1038/ng.619. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR26] 26.Cancer Genome Atlas Research Network. Electronic address: elizabeth.demicco@sinaihealthsystem.ca; Cancer Genome Atlas Research Network Comprehensive and integrated genomic characterization of adult soft tissue sarcomas. Cell. 2017;171:950–965 e928. doi: 10.1016/j.cell.2017.10.014. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR27] 27.Breiman L. Random forests. Mach. Learn. 2001;45:5–32. doi: 10.1023/A:1010933404324. [DOI] [Google Scholar]

[CR28] 28.Worst BC, et al. Next-generation personalised medicine for high-risk paediatric cancer patients—The INFORM pilot study. Eur. J. Cancer. 2016;65:91–101. doi: 10.1016/j.ejca.2016.06.009. [DOI] [PubMed] [Google Scholar]

[CR29] 29.Selt F, et al. Pediatric targeted therapy: clinical feasibility of personalized diagnostics in children with relapsed and progressive tumors. Brain Pathol. 2016;26:506–516. doi: 10.1111/bpa.12326. [DOI] [PMC free article] [PubMed] [Google Scholar]

30. Horak P, et al. Precision oncology based on omics data: the NCT Heidelberg experience. Int. J. Cancer. 2017;141:877–886. doi: 10.1002/ijc.30828.

31. Dickson BC, et al. NUTM1 gene fusions characterize a subset of undifferentiated soft tissue and visceral tumors. Am. J. Surg. Pathol. 2018;42:636–645. doi: 10.1097/PAS.0000000000001096.

32. Watson S, et al. Transcriptomic definition of molecular subgroups of small round cell sarcomas. J. Pathol. 2018;245:29–40. doi: 10.1002/path.5053.

33. Khan J, et al. Classification and diagnostic prediction of cancers using gene expression profiling and artificial neural networks. Nat. Med. 2001;7:673–679. doi: 10.1038/89044.

34. Sheffield NC, et al. DNA methylation heterogeneity defines a disease spectrum in Ewing sarcoma. Nat. Med. 2017;23:386–395. doi: 10.1038/nm.4273.

35. Hovestadt V, et al. Robust molecular subgrouping and copy-number profiling of medulloblastoma from small amounts of archival tumour material using high-density DNA methylation arrays. Acta Neuropathol. 2013;125:913–916. doi: 10.1007/s00401-013-1126-5.

36. Capper D, et al. Practical implementation of DNA methylation and copy-number-based CNS tumor diagnostics: the Heidelberg experience. Acta Neuropathol. 2018;136:181–210. doi: 10.1007/s00401-018-1879-y.

37. Perrier L, et al. The cost-saving effect of centralized histological reviews with soft tissue and visceral sarcomas, GIST, and desmoid tumors: the experiences of the pathologists of the French Sarcoma Group. PLoS ONE. 2018;13:e0193330. doi: 10.1371/journal.pone.0193330.

38. Johann PD, Jäger N, Pfister SM. RF_Purify: a novel tool for comprehensive analysis of tumor-purity in methylation array data based on random forest regression. BMC Bioinformatics. 2019. doi: 10.1186/s12859-019-3014-z.

39. Hand DJ, Till RJ. A simple generalisation of the area under the ROC curve for multiple class classification problems. Mach. Learn. 2001;45:171–186. doi: 10.1023/A:1010920819831.

40. Brier GW. Verification of forecasts expressed in terms of probability. Mon. Weather Rev. 1950;78:1–3. doi: 10.1175/1520-0493(1950)078<0001:VOFEIT>2.0.CO;2.

41. Friedman J, Hastie T, Tibshirani R. Regularization paths for generalized linear models via coordinate descent. J. Stat. Softw. 2010;33:1–22. doi: 10.18637/jss.v033.i01.

42. Sahm F, et al. Next-generation sequencing in routine brain tumor diagnostics enables an integrated diagnosis and identifies actionable targets. Acta Neuropathol. 2016;131:903–910. doi: 10.1007/s00401-015-1519-8.

43. Stichel D, et al. Routine RNA sequencing of formalin-fixed paraffin-embedded specimens in neuropathology diagnostics identifies diagnostically and therapeutically relevant gene fusions. Acta Neuropathol. 2019. doi: 10.1007/s00401-019-02039-3.
