Abstract
Purpose:
Soft tissue tumors (STT) are highly heterogeneous neoplasms with more than 100 recognized subtypes, many of which lack reliable diagnostic or prognostic markers. We aimed to evaluate the clinical utility of transcriptome sequencing [RNA sequencing (RNA-seq)] in classifying STTs and identifying prognostically relevant subgroups.
Experimental Design:
We performed RNA-seq on 704 tumors representing 56 histologic subtypes, integrating global gene expression (GGE) profiles, fusion transcript detection, genomic features, and clinicopathologic data. Unsupervised clustering and machine learning approaches were used to refine tumor classification and identify prognostic transcriptomic signatures.
Results:
We identified more than 200 pathogenic gene fusion transcripts, including 40 not previously described. Strong GGE profiles were seen for many morphologic subtypes, especially for those characterized by specific gene fusions. Sarcomas with complex genomes, such as myxofibrosarcoma, undifferentiated pleomorphic sarcoma, and leiomyosarcoma (LMS), showed less distinct GGE profiles, and for LMS, subclusters correlating with genomic profiles were seen. On the basis of GGE profiles and other data, the diagnosis was changed in 5.7% of the cases. Comparison with follow-up data identified transcriptional subclusters among sarcomas with complex genomes that correlated with metastasis-free survival.
Conclusions:
Transcriptomic profiling enhances diagnostic precision, uncovers novel oncogenic fusions, and provides prognostic information in STTs. Still, the results of the current study suggest that, due to the morphologic and clinical diversity of these tumors, multicenter collaborative studies are needed to fully explore the potential of RNA-seq in STTs.
Translational Relevance.
Soft tissue tumors (STT) are a heterogeneous group of neoplasms that remain diagnostically and prognostically challenging, particularly for sarcomas with complex genomes. In this study, transcriptome sequencing of 704 STTs across 56 histologic subtypes demonstrated that global gene expression profiling, together with fusion transcript detection, provides clinically relevant insights. We identified more than 200 pathogenic fusions, including 40 novel ones, and showed that transcriptomic clustering refines diagnostic accuracy, leading to reclassification in nearly 6% of cases. Importantly, transcriptomic subgroups within morphologically similar sarcomas correlated with patient outcome, identifying prognostic signatures beyond current histopathologic and genomic stratification. These findings highlight the utility of transcriptomic analysis in routine diagnostics to further improve both diagnostic and prognostic information. Bearing in mind the morphologic and clinical diversity of STTs, the results of the current study suggest that multicenter collaborative studies are needed to fully explore the potential of RNA sequencing in STTs.
Introduction
Soft tissue tumors (STT) constitute a heterogeneous group of neoplasms comprising more than 100 distinct entities (1). For many of them, somatic mutation profiles, more or less strongly associated with specific morphologic features, have been described. Although a few crucial single-nucleotide variants (SNV) have been identified in some subtypes, gene fusions and/or copy-number alterations (CNA) seem to be the major driver events in the development of most STTs (1–4). Hence, genetic analyses, ranging from fluorescence in situ hybridization (FISH) analysis of individual genes to massively parallel sequencing of the entire genome and transcriptome, are today often used in the diagnostic workup of STTs (5–8). However, despite rapid advances in the molecular pathology of STTs, numerous tumor entities remain inadequately investigated, and many subtypes still lack distinctive somatic mutations. Recent attempts to classify STTs on the basis of DNA methylation profiling have therefore attracted much attention and have provided promising preliminary results (9–11).
In the present study, we aimed to assess the value of global gene expression (GGE) profiles as determined by transcriptome sequencing [RNA sequencing (RNA-seq)]. In addition to allowing assessment of the expression levels of all protein-coding genes, RNA-seq is well suited for the detection of fusion transcripts, which currently constitute the most important diagnostic genetic markers in STTs (12). Although numerous studies have included transcriptomic data on various STTs, they have typically compared a single tumor type of interest with a few other entities serving as controls. Furthermore, most previous studies have focused on malignant STTs. This contrasts with the clinical spectrum at tertiary sarcoma centers, where benign lesions predominate. Here, we studied the distribution of STTs according to GGE profiles in a series of 704 cases, and we discuss how the GGE profiles correlate with gene fusions, genomic changes, morphology, and outcome.
Materials and Methods
Tumors
The present study was based on STT types that were represented by ≥2 samples and from which RNA of sufficient quality could be obtained. In total, 704 samples from 704 patients, representing 56 different tumor types, were included. The vast majority of the patients had been diagnosed and treated at the sarcoma centers in Lund or Stockholm between 1988 and 2020, but a few samples from other centers, obtained through previous collaborative projects, were also included. The analyzed samples were primary lesions unless otherwise indicated (Supplementary Table S1). The study was conducted in accordance with the Declaration of Helsinki and was approved by the Swedish ethical review authority (2023-01550-01). All ethical regulations relevant to human research participants were followed. All samples were retrieved after written informed consent from the patients.
The diagnoses were according to the World Health Organization classification (1, 13), with two exceptions: 12 tumors with spindle cell morphology lacked a specific diagnosis and were here called spindle cell tumors, and 24 tumors with varying sarcoma diagnoses but with genomic features strongly reminiscent of dedifferentiated liposarcoma were here called cryptic liposarcomas. In addition, malignant melanoma was included because it can mimic the presentation of an STT and constitutes an important differential diagnosis. Selected cases were rereviewed in this study with regard to morphologic and/or IHC features. Tumor types were also categorized with regard to assumed primary pathogenetic mechanism into those driven by gene fusions, CNAs, or SNVs, as well as into three groups of aggressiveness: benign, intermediate, and malignant (1).
Information on tumor type, site, size, and grade (14), patient outcome (with a focus on intermediate and malignant tumors), and genomic features of the tumors was retrieved from medical charts and/or previous publications. The genomic features that could be retrieved from the medical charts are presented in a simplified way. For tumor types with cytogenetic features that are highly characteristic, only the typical feature, e.g., a translocation or deletion, detected at G-banding or FISH analysis, is specified. For tumor types without such distinct genomic features, available copy-number (CN) profiles or karyotypes were dichotomized into simple or complex depending on whether ≤5 or >5 chromosomes, respectively, were affected (Supplementary Table S1). A subset of the cases was reanalyzed in this study with FISH, RT-PCR, or SNP array with regard to gene fusion or CN status. When some clinical, morphologic, or genetic feature of the tumor had been published before, the relevant reference and case numbers in the publicly available Mitelman Database of Chromosome Aberrations and Gene Fusions in Cancer (RRID: SCR_012877; MDB; ref. 4) are provided (Supplementary Table S1).
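The simple-versus-complex dichotomization can be expressed as a one-line rule. The Python sketch below makes a hypothetical assumption about the input: each tumor's aberrations have already been summarized as a list of affected chromosome labels. Only the threshold (≤5 vs. >5 chromosomes) comes from the text.

```python
from typing import Iterable

# Threshold taken from the text: a genome is "complex" if >5 chromosomes are affected.
MAX_SIMPLE_CHROMOSOMES = 5


def classify_genome(affected_chromosomes: Iterable[str]) -> str:
    """Label a tumor 'simple' or 'complex' (hypothetical helper).

    `affected_chromosomes` lists the chromosome carrying each CNA or structural
    rearrangement; a chromosome hit several times is counted once.
    """
    n_affected = len(set(affected_chromosomes))
    return "simple" if n_affected <= MAX_SIMPLE_CHROMOSOMES else "complex"


print(classify_genome(["12", "12", "1"]))                 # two chromosomes -> simple
print(classify_genome(["1", "3", "5", "7", "13", "17"]))  # six chromosomes -> complex
```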
RNA-seq
RNA extraction from samples that had been stored at −80°C, library preparation, and paired-end sequencing (150-nt reads) were performed as described previously (15). RNA-seq reads were aligned to the GRCh38 genome assembly using STAR (RRID: SCR_004463, v2.7.10b). Gene-level quantification was performed with RSEM (RRID: SCR_013027, v1.3.3) based on the Ensembl gene model (GRCh38.104). Only samples with at least 5 million uniquely mapped reads in total (range, 5.1–154 million; median, 25.4 million) were included in the study (Supplementary Table S1).
Gene fusion detection
Fusion transcripts were identified using Arriba (RRID: SCR_025854, v2.4.0) and FusionCatcher (RRID: SCR_000060, v1.33) with default settings. Candidate chimeric transcripts were then filtered: a fusion had to be identified by both callers and to have been previously reported in the MDB (4). However, so as not to exclude fusions of likely pathogenetic significance, exceptions were made for fusions that were identified by only one caller but had been reported before and/or confirmed by other methods at the RNA or DNA level. In addition, all fusion transcripts involving CSF1 or the first three exons of HMGA2 were included, even when found only by Arriba.
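The Python sketch below shows one literal reading of these filtering rules. The `FusionCall` record, its field names, and the encoding of retained HMGA2 exons are hypothetical. The two lookup sets stand in for the MDB and for fusions confirmed by orthogonal methods. Candidates that pass this step would still go through the manual curation described in the Results.

```python
from dataclasses import dataclass
from typing import Optional, Set, Tuple

Pair = Tuple[str, str]  # (5' partner, 3' partner)


@dataclass(frozen=True)
class FusionCall:
    """Hypothetical merged record of one candidate fusion transcript."""
    gene5: str
    gene3: str
    by_arriba: bool
    by_fusioncatcher: bool
    hmga2_last_exon: Optional[int] = None  # last retained HMGA2 exon when HMGA2 is the 5' partner


def keep_fusion(call: FusionCall, reported: Set[Pair], confirmed: Set[Pair]) -> bool:
    pair = (call.gene5, call.gene3)
    n_callers = int(call.by_arriba) + int(call.by_fusioncatcher)
    in_db = pair in reported

    # Main rule: called by both tools and previously reported.
    if n_callers == 2 and in_db:
        return True
    # Exception: one caller only, but reported before and/or confirmed otherwise.
    if n_callers == 1 and (in_db or pair in confirmed):
        return True
    # Arriba-only events kept regardless: CSF1 rearrangements and
    # transcripts retaining exons 1-3 of HMGA2.
    if call.by_arriba and "CSF1" in pair:
        return True
    if call.by_arriba and call.gene5 == "HMGA2" and call.hmga2_last_exon == 3:
        return True
    return False
```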
Unsupervised gene expression analysis
Gene-level counts were normalized to transcripts per million (TPM) and log2-transformed prior to further analysis. The top 25% most variably expressed protein-coding genes (n = 4,779) were selected, and t-distributed stochastic neighbor embedding (tSNE) was applied for unsupervised nonlinear dimensionality reduction. The tSNE projection was computed with the R package Rtsne (RRID: SCR_016900, version 0.16) using 3,000 iterations and a perplexity of 30. The projection was also used to evaluate potential batch effects related to sequencing year (Supplementary Fig. S1). In addition, unsupervised hierarchical clustering was performed on the same set of protein-coding genes using Euclidean distance and complete linkage. The dendrogram was generated with pvclust (RRID: SCR_021063, v2.2.0), and the stability of the observed clusters was assessed by multiscale bootstrap resampling (nboot = 1,000) in the same package. A silhouette score (S-score) was calculated for each sample based on the institutional diagnosis using scikit-learn (RRID: SCR_002577, v1.6.1). This metric measures how similar an object is to its own cluster compared with other clusters. In this context, it therefore indicates how similar each individual tumor is to the other tumors with its designated diagnosis (16). The score ranges from −1 to 1. A value close to 1 indicates that the sample is well clustered, a value close to 0 indicates that the sample lies on or very near the boundary between two clusters, and a value close to −1 indicates that the sample is closer to samples in another cluster. An enrichment score for the CINSARC gene set (17) was calculated for each tumor by gene set variation analysis (GSVA) with the GSVA package (RRID: SCR_021058, v1.42).
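The preprocessing and per-sample S-score can be sketched in Python with NumPy, SciPy, and scikit-learn. The study itself used R packages for tSNE and clustering. The sketch rests on several assumptions:

- TPM is computed here from counts and effective gene lengths, although RSEM also reports TPM directly.
- A pseudocount of 1 is added before the log2 transform.
- All helper names are hypothetical.

Here we use scikit-learn's `silhouette_samples`, which returns one silhouette value per sample for a given labeling.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from sklearn.metrics import silhouette_samples


def counts_to_log_tpm(counts: np.ndarray, lengths_kb: np.ndarray) -> np.ndarray:
    """counts: genes x samples; lengths_kb: effective length (kb) of each gene."""
    rpk = counts / lengths_kb[:, None]                # reads per kilobase
    tpm = rpk / rpk.sum(axis=0, keepdims=True) * 1e6  # every sample sums to 1e6
    return np.log2(tpm + 1.0)                         # pseudocount of 1 (assumption)


def top_variable_genes(log_expr: np.ndarray, fraction: float = 0.25) -> np.ndarray:
    """Keep the `fraction` of genes (rows) with the highest variance across samples."""
    n_keep = int(np.ceil(fraction * log_expr.shape[0]))
    order = np.argsort(log_expr.var(axis=1))[::-1]
    return log_expr[order[:n_keep]]


def per_sample_silhouette(log_expr: np.ndarray, diagnoses) -> np.ndarray:
    """Silhouette of each sample (column) relative to its assigned diagnosis."""
    return silhouette_samples(log_expr.T, diagnoses, metric="euclidean")


def complete_linkage_clusters(log_expr: np.ndarray, k: int) -> np.ndarray:
    """Hierarchical clustering of samples (Euclidean, complete linkage), cut into k clusters."""
    tree = linkage(log_expr.T, method="complete", metric="euclidean")
    return fcluster(tree, t=k, criterion="maxclust")
```

Here the diagnoses play the role of cluster labels in the silhouette calculation. A tumor that sits far from others with the same diagnosis therefore receives a low or negative score, even if it clusters tightly with a different entity.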
Outlier detection
We identified potential outlier samples based on how well they clustered with other samples of the same diagnosis. We then applied a machine learning–based prediction model to test whether such samples might fit better with an alternative diagnosis. First, tumors diagnosed as spindle cell tumor NOS, as well as diagnoses represented by <4 samples, were excluded. Then, for each diagnosis, the 75% of samples with the highest S-scores were selected as the training dataset. We trained a Random Forest classifier (RRID: SCR_015718, v4.7-1.1) using 10-fold cross-validation repeated five times. Cross-validation and hyperparameter tuning were performed with the caret package (RRID: SCR_021138, v7.0-1). The model was then used to predict the diagnosis of the remaining samples, i.e., those with the lowest S-scores. Finally, samples whose predicted diagnosis differed from their original diagnosis were flagged as potential misclassifications and retained for further genomic, morphologic, and clinical evaluation.
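The outlier workflow can be approximated in Python with scikit-learn instead of caret and randomForest. Several choices below are assumptions:

- the hyperparameter grid, using `max_features` as the counterpart of randomForest's `mtry`;
- the number of trees;
- the rounding of the 75% cut-off.

The function names are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, RepeatedStratifiedKFold


def top_fraction_mask(s_scores, diagnoses, fraction=0.75):
    """True for the `fraction` of samples with the highest S-scores within each diagnosis."""
    s_scores, diagnoses = np.asarray(s_scores), np.asarray(diagnoses)
    mask = np.zeros(len(s_scores), dtype=bool)
    for dx in np.unique(diagnoses):
        idx = np.flatnonzero(diagnoses == dx)
        n_train = int(np.ceil(fraction * len(idx)))  # rounding rule is an assumption
        mask[idx[np.argsort(s_scores[idx])[::-1][:n_train]]] = True
    return mask


def flag_outliers(X, diagnoses, s_scores, n_repeats=5, n_estimators=500, seed=0):
    """Train on high-S-score samples, predict the rest, flag discordant predictions."""
    diagnoses = np.asarray(diagnoses)
    train = top_fraction_mask(s_scores, diagnoses)
    cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=n_repeats, random_state=seed)
    search = GridSearchCV(
        RandomForestClassifier(n_estimators=n_estimators, random_state=seed),
        param_grid={"max_features": ["sqrt", 0.1]},  # analogue of randomForest's mtry
        cv=cv,
    )
    search.fit(X[train], diagnoses[train])
    test_idx = np.flatnonzero(~train)
    predicted = search.predict(X[test_idx])
    max_proba = search.predict_proba(X[test_idx]).max(axis=1)
    flagged = predicted != diagnoses[test_idx]
    return test_idx, predicted, max_proba, flagged
```

The highest class probability (`max_proba`) is returned alongside the prediction. A low maximum indicates that no diagnosis in the training set fits the sample well.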
Genomic data
Transcriptomic profiles were compared with, when available, cytogenetic data and/or global CN data retrieved from the medical charts (Supplementary Table S1). For the present study, additional SNP array analysis (Affymetrix Cytoscan HD SNP arrays; Affymetrix) was performed on selected cases, as described previously (18). Tumor Aberration Prediction Suite (RRID: SCR_000356; ref. 19) and Affymetrix Power Tools (RRID: SCR_008401, v2.11.2) were used for segmentation, CN evaluation, and visualization of the SNP array data. The segment files were further filtered to include only segments spanning ≥100 kb and supported by ≥50 probes; any segment that did not meet these criteria was combined with the nearest preceding segment and given the CN value of that segment.
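The segment filter can be expressed as a single pass over position-sorted segments. In this Python sketch, the `Segment` record is hypothetical. The text does not specify two edge cases, so both are assumptions:

- merging is restricted to segments on the same chromosome;
- a failing segment with no preceding segment on its chromosome is kept unchanged.

```python
from dataclasses import dataclass, replace
from typing import List

MIN_LENGTH_BP = 100_000  # segments must span >=100 kb ...
MIN_PROBES = 50          # ... and be supported by >=50 probes


@dataclass(frozen=True)
class Segment:
    """Hypothetical segment record (0-based, half-open coordinates assumed)."""
    chrom: str
    start: int
    end: int
    n_probes: int
    copy_number: float


def passes(seg: Segment) -> bool:
    return seg.end - seg.start >= MIN_LENGTH_BP and seg.n_probes >= MIN_PROBES


def filter_segments(segments: List[Segment]) -> List[Segment]:
    """Absorb failing segments into the nearest preceding segment on the same chromosome."""
    out: List[Segment] = []
    for seg in sorted(segments, key=lambda s: (s.chrom, s.start)):
        prev = out[-1] if out else None
        if not passes(seg) and prev is not None and prev.chrom == seg.chrom:
            # Extend the preceding segment; it keeps its own copy-number value.
            out[-1] = replace(prev, end=seg.end, n_probes=prev.n_probes + seg.n_probes)
        else:
            out.append(seg)  # passing segment, or failing one with no predecessor (assumption)
    return out
```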
The GRCh38/hg38 build was used as the human reference genome for all analyses.
Survival analysis
Metastasis-free survival (MFS) was calculated from the date of diagnosis to the date of detection of metastasis. For event-free patients, follow-up was measured from the date of diagnosis to the date of last follow-up (Supplementary Table S1). Metastasis-free probability was assessed using the Kaplan–Meier (KM) method and Cox proportional hazards models, and KM curves were compared with the log-rank test. Survival analysis was performed using the survminer package (RRID: SCR_021094, v0.4.9).
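For readers less familiar with these estimators, the sketch below computes the Kaplan–Meier product-limit estimate and the two-group log-rank statistic in plain NumPy/SciPy. It illustrates the standard formulas and is not the survminer code used in the study. Tied event times are handled by counting all events at a given time together.

```python
import numpy as np
from scipy.stats import chi2


def kaplan_meier(time, event):
    """Return event times and the KM survival estimate just after each of them."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    event_times = np.unique(time[event])
    survival, s = [], 1.0
    for t in event_times:
        n_at_risk = np.sum(time >= t)
        n_events = np.sum((time == t) & event)
        s *= 1.0 - n_events / n_at_risk
        survival.append(s)
    return event_times, np.array(survival)


def logrank_test(time, event, group):
    """Two-group log-rank test; returns (chi-square statistic, P value)."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    group = np.asarray(group)
    labels = np.unique(group)
    if len(labels) != 2:
        raise ValueError("exactly two groups are required")
    in_g1 = group == labels[0]
    observed_minus_expected, variance = 0.0, 0.0
    for t in np.unique(time[event]):
        at_risk = time >= t
        n, n1 = at_risk.sum(), (at_risk & in_g1).sum()
        d = ((time == t) & event).sum()
        d1 = ((time == t) & event & in_g1).sum()
        observed_minus_expected += d1 - d * n1 / n  # observed minus expected events in group 1
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if variance == 0:
        raise ValueError("log-rank variance is zero")
    stat = observed_minus_expected ** 2 / variance
    return stat, chi2.sf(stat, df=1)
```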
Public data
RNA-seq data in the form of fastq files from 50 leiomyosarcoma (LMS) samples were obtained from EGAD00001006628 and processed using the same pipeline as for the samples included in this study. In addition, gene-level TPM values for the 80 LMS samples included in The Cancer Genome Atlas sarcoma (TCGA-SARC) study were obtained from the UCSC Xena Hub (https://xenabrowser.net/hub/).
Results
We performed RNA-seq on fresh-frozen material from the most common types of STT. Supplementary Table S1 provides clinical information and relevant characteristic molecular features for each individual tumor.
Gene fusions
Intersection of the Arriba and FusionCatcher findings resulted in a total of 2,646 fusion transcripts with each sample harboring an average of 3.8 (range, 0–65) fusions (Supplementary Table S2). After manual curation, a total of 206 fusion transcripts from 203 cases were kept (Supplementary Table S3). Of these, 40 have not been reported before, including novel fusion partners to ACVR2B, CSF1, HMGA2, FGFR1, GLI1, NR4A2, PHF1, and PRKCG (Supplementary Table S3). Both fusion callers failed to detect at least 12 likely pathogenic gene fusions, previously identified with other callers and/or RT-PCR (Supplementary Table S4). Taking the latter data into account, fusion frequencies ranged from 0% (e.g., angiosarcoma and embryonal rhabdomyosarcoma) to 100% (e.g., dermatofibrosarcoma protuberans and synovial sarcoma) among tumor types, with no obvious differences among the different lineages of differentiation (Supplementary Table S4).
Global gene expression
Following a uniform processing pipeline, gene expression data were analyzed by unsupervised hierarchical clustering and tSNE to identify groups of tumors sharing gene expression patterns. The 704 tumors were color- and shape-coded based on their clinical diagnosis encompassing 56 different entities (Fig. 1; Supplementary Fig. S2; Supplementary Table S1).
Figure 1.
Gene expression–based analysis of STTs. Visualization of the gene expression profiles of STTs and controls using tSNE dimensionality reduction. Individual samples (n = 704) are color- and shape-coded according to their initial diagnosis (n = 56). A, Malignant tumors, color-coded. Benign and borderline tumors are displayed in gray. B, Benign and borderline tumors, color-coded. Malignant tumors are displayed in gray. ALT, atypical lipomatous tumor; Alv rhabmyosarc, rhabdomyosarcoma, alveolar; Angiofibr, angiofibroma of soft tissue; Angiolip, angiolipoma; Angiosarc, angiosarcoma; CD34PSF tum, CD34-positive superficial fibroblastic tumor; Clear cell sarc, clear cell sarcoma; Cryptic liposarc, cryptic liposarcoma; Dediff liposarc, dedifferentiated liposarcoma; Desmopl fibrobl, desmoplastic fibroblastoma; Desmopl tum, desmoplastic small round cell tumor; Emb rhabmyosarc, rhabdomyosarcoma, embryonal; Endomet stromsarc, endometrial stromal sarcoma; Epithel sarc, epithelioid sarcoma; Ewing sarc, Ewing sarcoma; Extraskel chondsarc, extraskeletal myxoid chondrosarcoma; Fibr histiocyt, benign fibrous histiocytoma; Gran cell tum, granular cell tumor; Hamartom fibr, hamartomatous fibroma; Infl rhabmyo tum, inflammatory rhabdomyoblastic tumor; LG fibmyx sarc, low-grade fibromyxoid sarcoma; Lipoma spindle, lipoma spindle cell/pleomorphic; MIFS/HFLT, myxoinflammatory fibroblastic sarcoma/hemosiderotic fibrolipomatous tumor; Myoepi tum, myoepithelial tumor; Myxfib sarc, myxofibrosarcoma; Myx lipsarc, myxoid liposarcoma; Myx pleo lipsarc, myxoid pleomorphic liposarcoma; Neurofibr, neurofibroma; Nod fasc, nodular fasciitis; OFMT, ossifying fibromyxoid tumor; Osteosarc, osteosarcoma; PHAT, pleomorphic hyalinizing angiectatic tumor; Pleo lipsarc, pleomorphic liposarcoma; Pseumyo hemang, pseudomyogenic hemangioendothelioma; Rhabmyosarc NOS, rhabdomyosarcoma, NOS; Round cell sarc, undifferentiated round cell sarcoma; Scler epi fibsarc, sclerosing epithelioid fibrosarcoma; Scler rhabmyosarc, rhabdomyosarcoma, sclerosing; SEF-like sarc, sclerosing epithelioid fibrosarcoma–like sarcoma; Sol fib tum, solitary fibrous tumor; Spindle tum NOS, spindle cell tumor NOS; Syn chondr, synovial chondromatosis; Syn sarc, synovial sarcoma; Tenosyn cell tum, tenosynovial giant cell tumor.
Although many STTs formed distinct clusters, clearly separated from other tumor types, we also observed several diffuse “clouds” of intermixed tumor types. One contained most of the benign and borderline malignant adipocytic tumors, among which three subclusters (BAT SC1–3) could be discerned (Supplementary Fig. S3). A second was largely composed of nerve sheath tumors, forming two subclusters (neurofibroma SC and schwannoma SC; Fig. 2A). A third contained a variety of tumor types, mostly sarcomas with complex genomic rearrangements. It could be further subdivided into three subclusters (complex SC1–3) with a total of 202 tumors, including 47 of 61 myxofibrosarcomas and 76 of 98 undifferentiated pleomorphic sarcomas (UPS; Supplementary Table S1; Supplementary Fig. S4). In addition, some STTs formed multiple clusters, notably myxoma and LMS. Myxomas formed two distinct clusters (Supplementary Fig. S5), referred to as myxoma class 1 and myxoma class 2. GNAS mutations, identified in five of nine samples, were not restricted to either cluster (Supplementary Table S1). A subset of LMS (n = 11) formed a distinct cluster (LMS class 1), whereas 17 of the remaining 29 LMS (LMS class 2) appeared in one of the complex subclusters SC1 to SC3 (Fig. 2B). LMS class 1 tumors displayed increased expression of muscle marker genes, whereas LMS class 2 tumors had increased expression of ARL4C (Supplementary Fig. S6). The separation between LMS class 1 and LMS class 2 was maintained when they were clustered together with an additional 130 publicly available LMS samples (Supplementary Fig. S6). CN profiles were assessed for the 10 LMS class 1 tumors and the 27 other LMS with SNP array data. The latter had a higher median ASCAT ploidy (2.61 vs. 1.91), suggesting a higher frequency of whole-genome doubling. Furthermore, LMS class 1 tumors displayed several CNAs that were absent or less common among LMS class 2. These included loss of material from chromosome arms 2p, 6p, 10q, 13q (including RB1), 16q, and 17p and gain on 17p (including MYOCD; Supplementary Fig. S6).
Figure 2.
Distinct clustering is associated with low genomic complexity. A, Original dendrogram (right) based on unsupervised hierarchical clustering (Supplementary Fig. S2) with the benign nerve sheath tumor cluster highlighted in purple. Magnified region of the dendrogram (left) corresponding to the two benign nerve sheath tumor subclusters: neurofibroma SC (gray) and schwannoma SC (no color). B, tSNE representation of the same unsupervised analysis as in Fig. 1A, with individual samples (n = 704) colored according to whether they belong to LMS class 1 (blue triangle), LMS class 2 (purple square), or any other diagnosis (gray circle). C, Boxplot of the silhouette score (S-score) of each individual sample whose diagnosis is represented by at least three samples (n = 682), grouped by diagnosis. Boxes are ordered by median value, and diagnoses are colored according to the assumed underlying pathogenetic mechanism. D, Proportions (%) of diagnoses associated with specific genomic alterations in the high and low S-score groups. Color annotations are identical to C. E, Boxplot of S-scores for samples with simple or complex genomes. ***, P < 0.001; unpaired t test. For abbreviations of tumor types, see the legend of Fig. 1.
Distinct clustering is associated with low genomic complexity
S-scores were calculated for each tumor to assess the tightness and separation of the individual diagnoses (Fig. 2C). The S-scores were in good agreement with the tSNE projection: tumor types with positive scores formed distinct clusters, whereas diagnoses with low or negative values formed less well-defined clusters. The tightness of the clusters formed by individual diagnoses correlated with their underlying genomic features. Tumor types characterized by a single driver event (a gene fusion or an SNV) had higher S-scores than tumors associated with large-scale CNAs (Fig. 2C). The tumor types were then split into a high and a low group according to whether their median S-score was positive or negative. Tumor types driven by gene fusions or SNVs were strongly enriched in the high group, in which they constituted 63% and 16% of the diagnoses, respectively. In contrast, tumor types driven by CNAs comprised 76% of the diagnoses in the low group (Fig. 2D). We also assessed the association between genomic complexity and S-score. Samples with simple genomes displayed significantly higher S-scores than samples with complex genomes (t test, P < 2.2e−16; Fig. 2E).
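The high/low split and the resulting composition percentages can be reproduced from a per-sample table with pandas. The column names below are hypothetical. A diagnosis with a median S-score of exactly zero is assigned to the low group, an edge case the text does not address.

```python
import numpy as np
import pandas as pd


def mechanism_composition(samples: pd.DataFrame) -> pd.DataFrame:
    """Percentage of diagnoses per pathogenetic mechanism in the high and low S-score groups.

    `samples` has one row per tumor with hypothetical columns
    'diagnosis', 'mechanism' (fusion / SNV / CNA), and 's_score'.
    """
    per_dx = samples.groupby("diagnosis").agg(
        median_s=("s_score", "median"),
        mechanism=("mechanism", "first"),  # one mechanism per diagnosis
    )
    per_dx["group"] = np.where(per_dx["median_s"] > 0, "high", "low")
    # Row-normalized cross-tabulation: each group's diagnoses sum to 100%.
    return pd.crosstab(per_dx["group"], per_dx["mechanism"], normalize="index") * 100
```

Percentages here count diagnoses, not samples, which matches the way the text reports the 63%, 16%, and 76% figures.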
Gene expression–based classifier for outlier detection
The S-score also adds valuable information at the single-sample level: individual tumors with values markedly lower than the median for their tumor type could indicate an erroneous diagnosis. We developed a systematic approach to identify such outlier samples (Supplementary Fig. S7) using a Random Forest classifier trained on gene expression values. Among the 190 samples evaluated, the classifier-predicted diagnosis matched the clinical diagnosis in 92 (48%), whereas a discrepant prediction was observed in 98 (52%; Supplementary Table S5; Supplementary Fig. S7). Some of these discrepant predictions were expected, such as the many myxofibrosarcomas with higher probability scores for UPS, and vice versa (20).
Reclassification of diagnoses
By combining the probability scores with S-score profiles, gene fusion status, other molecular data, clinical features, and, in selected cases, histopathologic review, the initial diagnosis was revised in 72 cases (Supplementary Fig. S7). This reclassification included all 24 tumors labeled cryptic liposarcoma in this study. All of them had originally been diagnosed as another entity (e.g., myxofibrosarcoma or UPS) but had genomic profiles typical of dedifferentiated liposarcoma (Supplementary Table S1). Transcriptomically, cryptic liposarcomas had a very low median S-score and co-clustered with dedifferentiated liposarcomas, myxofibrosarcomas, and UPSs (Fig. 1; Supplementary Fig. S2). Most of the 12 tumors first classified as spindle cell tumor NOS were also reclassified, although a specific new entity could be suggested in only two cases (desmoid tumor and GLI1-rearranged mesenchymal tumor, respectively; Supplementary Table S1). Among the remaining 668 tumors with a specific initial diagnosis, 38 (5.7%) were reclassified. Notably, several dedifferentiated liposarcomas had much higher probability scores for atypical lipomatous tumor. Most of these tumors also had genomic profiles more compatible with atypical lipomatous tumor than with dedifferentiated liposarcoma, strongly suggesting that the analyzed RNA had been extracted from a well-differentiated region of the tumor. The most frequent new non-adipocytic diagnosis was ossifying fibromyxoid tumor: one myxofibrosarcoma, two LMS, and one soft tissue osteosarcoma were more compatible with this entity (Supplementary Fig. S7; Supplementary Table S1). Finally, 16 cases were reclassified as unclear. In these cases, either the diagnosis predicted from gene expression was implausible or the probability score was <0.1 for all tumor types under study. For some of these tumors, the failure to classify them was probably due to poor sample quality resulting from preoperative treatment, or to a high fraction of normal cells. However, a high normal-cell fraction is an unlikely explanation for the majority of the cases. Among the 385 tumors with available ASCAT purity values (Supplementary Table S1), low purity (<0.3) was not significantly more frequent among reclassified outliers (8/62) than among tumors that were not reclassified (24/323; P > 0.05, χ2 test). More likely, several of the reclassified tumors belonged to rare subtypes not otherwise represented in the cohort.
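The purity comparison can be checked from the counts given in the text with SciPy's `chi2_contingency`. The text does not state whether a continuity correction was applied in the original analysis. SciPy applies Yates' correction to 2 × 2 tables by default, and the sketch computes both variants; in either case P > 0.05.

```python
from scipy.stats import chi2_contingency

# Low purity (<0.3) vs. purity >=0.3, counts from the text (385 tumors with ASCAT purity).
reclassified = [8, 62 - 8]
not_reclassified = [24, 323 - 24]
table = [reclassified, not_reclassified]

chi2_stat, p_value, dof, expected = chi2_contingency(table)  # Yates-corrected for 2x2
_, p_uncorrected, _, _ = chi2_contingency(table, correction=False)

print(f"low purity: {8 / 62:.1%} vs. {24 / 323:.1%}")  # 12.9% vs. 7.4%
print(f"P = {p_value:.2f} (Yates), P = {p_uncorrected:.2f} (uncorrected)")
```

The proportion of low-purity samples is numerically higher among reclassified outliers, but with these group sizes the difference is not statistically significant. This is why the text speaks of no significant enrichment rather than no difference.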
Outcome
We also evaluated the prognostic impact of the CINSARC gene expression signature across the spectrum of STTs. We performed single-sample GSVA and overlaid the enrichment scores on the original tSNE projection to give an overview of their distribution across the cohort (Fig. 3A). As expected, malignant tumors displayed higher enrichment scores than benign and intermediate tumors, and samples with high genomic complexity had higher scores than samples with simple genomes (Fig. 3B). Malignant and intermediate samples were then split into high, medium, and low groups based on their enrichment scores. KM analysis confirmed a significant difference in MFS between the high and low groups (log-rank test, P = 0.0001) and between the medium and low groups (log-rank test, P = 0.00096). No significant difference was observed between the medium and high groups (Fig. 3C). The same trend was observed when samples were further subdivided into those with simple or complex genomes. Strong enrichment of the CINSARC signature thus seems to be associated with shorter MFS independent of genomic complexity (Fig. 3D; Supplementary Fig. S8A). We also compared the outcome of patients with malignant or intermediate tumors belonging to complex SC1 to SC3. Those in complex SC1 had significantly shorter MFS (log-rank test, SC1 vs. SC2 P = 0.00036, SC1 vs. SC3 P = 0.018, SC2 vs. SC3 not significant). However, CINSARC enrichment scores did not differ among complex SC1 to SC3 (Fig. 3E and F; Supplementary Fig. S8B). The shorter MFS for complex SC1 remained significant in a multivariable Cox model adjusting for tumor grade (Supplementary Fig. S8C).
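GSVA computes a rank-based, Kolmogorov–Smirnov-like enrichment statistic for each sample. The Python sketch below instead uses a much simpler stand-in, the mean per-gene z-score of the signature genes, purely to illustrate the idea of a single-sample signature score. It is not GSVA. The text does not specify how samples were split into high, medium, and low groups, so tertiles are assumed here. Function names and the gene-set argument are hypothetical.

```python
import pandas as pd


def mean_z_score(log_expr: pd.DataFrame, gene_set) -> pd.Series:
    """Simplified single-sample signature score (NOT GSVA).

    `log_expr` is genes x samples (log2 expression). Each signature gene is
    z-scored across samples, and the z-scores are averaged per sample.
    Genes absent from the matrix are ignored; constant genes produce NaN
    z-scores, which the per-sample mean skips.
    """
    genes = [g for g in gene_set if g in log_expr.index]
    sub = log_expr.loc[genes]
    z = sub.sub(sub.mean(axis=1), axis=0).div(sub.std(axis=1, ddof=0), axis=0)
    return z.mean(axis=0)


def tertile_groups(scores: pd.Series) -> pd.Series:
    """Assumed grouping: split scores into low / medium / high tertiles."""
    return pd.qcut(scores, q=3, labels=["low", "medium", "high"])
```

Because the z-scores are relative to the cohort, such a score depends on which tumors are included. This is one reason dedicated methods like GSVA model each gene's expression distribution explicitly.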
Figure 3.
Prognostic value of GGE analysis. A, Original tSNE projection in which samples are colored based on their enrichment score for the CINSARC gene set. B, Boxplot of CINSARC enrichment scores in which samples are grouped based on being malignant, intermediate, or benign and further subdivided based on either simple or complex genomes. C, KM survival analysis of malignant and intermediate tumors grouped based on having high, intermediate, or low CINSARC enrichment scores. D, KM analysis with samples further subdivided into being either genomically simple or complex. E, KM analysis of the three complex subclusters, SC1, SC2, and SC3. F, Boxplot of CINSARC enrichment scores among SC1, SC2, and SC3.
Representativeness of study participants is commented upon in Supplementary Table S6.
Discussion
Similar to methylation data, GGE data reflect the cell of origin, somatic events, and the tumor microenvironment. Furthermore, it should be kept in mind that clustering based on methylation or GGE data reflects only the relationships among the tumor types that have been studied. STTs constitute a highly heterogeneous group of neoplasms, many of which are exceedingly rare. In the present study, we were able to include 53 of the currently recognized STT types (1). Together, these account for the vast majority of the cases seen at tertiary sarcoma centers (21).
As expected, several STTs displayed distinct GGE profiles that clearly separated them from the other tumor types. These included many subtypes characterized by gene fusions, e.g., desmoplastic small round cell tumor, myxoid liposarcoma, and synovial sarcoma (Fig. 1; Supplementary Table S4). Some tumor types with characteristic gene fusions were represented by too few cases to be tested for transcriptomic stability. However, three sarcomas with the YAP1::KMT2A::YAP1 fusion, currently referred to as “sclerosing epithelioid fibrosarcoma-like” sarcomas (1), clustered together and separately from sclerosing epithelioid fibrosarcoma. This is in line with our previous results and with methylation data (22, 23). Furthermore, two soft tissue angiofibromas with AHRR::NCOA2, as well as two desmoplastic fibroblastomas with translocations upregulating FOSL1, clustered next to each other in the dendrogram. This strongly suggests that these tumor types also have characteristic GGE profiles (Supplementary Table S1; Supplementary Fig. S2).
The expected gene fusions were found in all or nearly all relevant samples (Supplementary Tables S3 and S4). However, in a few samples in which the characteristic gene fusion had been identified with other techniques, neither of the two fusion callers reported it. This was particularly common in CD34-positive superficial fibroblastic tumor: the fusion callers detected the PRDM10 fusions that characterize these tumors in less than half of the cases. Failure to detect expected fusion events could have a variety of explanations, ranging from poor sampling to low expression of the fusion gene or difficulties in mapping the genes involved. Most fusion-associated tumors clustered with their respective tumor types regardless of whether the two callers detected the fusion. This indicates that the negative results were often due to insensitive callers. Indeed, several pathogenic fusion events were detected by only one of the two callers. Arriba was more likely than FusionCatcher to call events that do not result in proper chimeric genes, such as fusions of the first three exons of HMGA2 with ectopic sequences in lipomatous tumors and CSF1 rearrangements in tenosynovial giant cell tumors. FusionCatcher, in turn, detected several FUS and EWSR1 fusions that were missed by Arriba (Supplementary Table S3). Thus, fusion-negative RNA-seq results should be interpreted with caution.
There were also clusters of gene fusion–associated tumors in which several tumor types intermingled, strongly suggesting either misdiagnoses or shared routes of tumor development. One such cluster of 13 tumors and an adjacent cluster of three tumors illustrate both possibilities (Supplementary Fig. S9). Combined, they encompassed all eight ossifying fibromyxoid tumors. This tumor type often displays gene fusions, all of which seem to have a major impact on chromatin accessibility (4, 24). In addition, the clusters contained:

- two endometrial stromal sarcomas with JAZF1::SUZ12 and YWHAE::NUTM2B;
- two myoepithelial tumors (25) with PHF1::TFE3;
- a soft tissue osteosarcoma with MEAF6::SUZ12;
- two LMS with PHF1::TFE3 or EPC1::KDM2B;
- a myxofibrosarcoma with a previously reported (26) AFF3::PHF1 fusion (Supplementary Fig. S2).

Thus, all tumors in these clusters had fusions of genes involved in the epigenetic regulation of transcription, such as components of the NuA4 acetyltransferase complex or polycomb repressive complex 2. For unknown reasons, some of these fusions remain unique to one entity (such as JAZF1::SUZ12 in endometrial stromal sarcoma). Nevertheless, our findings show that the tumors carrying them have highly similar gene expression profiles. Furthermore, endometrial stromal sarcomas sometimes display ectopic ossification, a morphologic hallmark of OFMT (27), and a morphologic overlap between myoepithelioma and ossifying fibromyxoid tumor has been noted (28). The co-clustering of these three tumor types thus exemplifies that distinct tumor types can share the same pathogenetic mechanisms. The osteosarcoma, LMS, and myxofibrosarcoma cases, on the other hand, most likely represent misdiagnosed ossifying fibromyxoid tumors. None of them had the complex genomic changes expected for these tumor types (Supplementary Table S1), and osteosarcoma is a well-known differential diagnosis of OFMT.
The current study also revealed a number of novel gene fusions (Supplementary Table S4). Fusions affecting PRKCA, PRKCB, and PRKCD, members of the protein kinase C family, have been reported as recurrent in benign fibrous histiocytoma (4). In this study, we found two samples with fusions affecting another family member, PRKCG. In addition, two of the tumors diagnosed as spindle cell tumor NOS had novel GLI1 or FGFR1 fusions (Supplementary Table S4). This suggests that they represent single cases of recently recognized rare subtypes of STT (29, 30).
Strong GGE signals were also present for some tumor types that lack gene fusions but have characteristic CN profiles, e.g., inflammatory rhabdomyoblastic tumor and hibernoma. The same was true for some SNV-driven tumor types, such as desmoid tumors and gastrointestinal stromal tumor (GIST; Fig. 1; Supplementary Table S1). Apart from these tight clusters, the GGE analysis showed several less distinctly defined “clouds” of tumors (Fig. 1):

- one containing most of the benign adipocytic tumors;
- one largely composed of nerve sheath tumors;
- one with the majority of myxofibrosarcomas and UPSs, as well as a variety of other tumors with complex genomic rearrangements.

Several subtypes of benign adipocytic tumors have been recognized, each with its own characteristic genetic profile (4). With the exception of hibernoma, however, none of them formed a distinct cluster, and the classifier prediction often suggested a closer relation to another type of benign adipocytic tumor (Fig. 1; Supplementary Table S1). The expression of individual genes may be strongly associated with some of the lipomatous subtypes, e.g., high expression of MDM2 in ALT (Supplementary Fig. S3B). The inclusion of larger gene sets, however, seems to blur these differences. Still, closer scrutiny of the distribution of the benign adipocytic tumors in the dendrogram (Supplementary Fig. S3A) revealed several subclusters with a skewed distribution of tumor types. The vast majority of the tumors were included in three subclusters. One of these (BAT SC1) was enriched for atypical lipomatous tumors, and another (BAT SC3) was dominated by conventional lipomas, spindle cell lipomas, and angiolipomas; fewer than one third of the atypical lipomatous tumors belonged to BAT SC3. For unknown reasons, BAT SC1 also harbored close to one third (9/31) of the dedifferentiated liposarcomas, the majority of which instead clustered with sarcomas with complex genomes.
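The text describes subclusters with a “skewed distribution” of tumor types but does not state that a formal test was applied. One common way to quantify such skew is a one-sided hypergeometric test, which is equivalent to a one-sided Fisher's exact test. The sketch below implements that test with the standard library. The function name and the counts in the usage note are hypothetical. When many subcluster and tumor type combinations are tested, the resulting P values would also need correction for multiple testing.

```python
from math import comb


def enrichment_p(k: int, n: int, K: int, N: int) -> float:
    """One-sided hypergeometric P(X >= k).

    N: tumors in the branch of the dendrogram being examined
    K: tumors of the subtype of interest within that branch
    n: tumors in the subcluster
    k: tumors of the subtype of interest within the subcluster
    Equivalent to a one-sided Fisher's exact test on the 2x2 table
    (subtype vs. other) x (inside vs. outside the subcluster).
    """
    if not (0 <= k <= min(n, K) and n <= N and K <= N):
        raise ValueError("inconsistent counts")
    total = comb(N, n)
    # Sum the probabilities of observing k or more subtype tumors in the subcluster;
    # math.comb returns 0 when the lower argument exceeds the upper one.
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1)) / total
```

For example, `enrichment_p(12, 20, 30, 100)` uses invented numbers. It would test a subcluster of 20 tumors that contains 12 of the 30 tumors of one subtype found in a 100-tumor branch.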
Notably, five of the dedifferentiated liposarcomas in BAT SC1 were classified as low-grade (grade 1 or 2) lesions, whereas the majority (17/30) of dedifferentiated liposarcomas were high-grade malignancies. The results thus suggest that transcriptomic data may provide prognostically relevant information on lipomatous tumors.
The study included 41 nerve sheath tumors [8 neurofibromas, 13 schwannomas, and 20 malignant peripheral nerve sheath tumors (MPNST); Fig. 1]. Of these, 12 schwannomas and two MPNSTs formed the schwannoma SC, and seven neurofibromas and four MPNSTs formed the neurofibroma SC (Fig. 2A). Hence, almost all schwannomas and neurofibromas had distinct expression profiles, clearly separated from each other. The six MPNSTs clustering with the benign nerve sheath tumors were reassessed. One of the two cases with a higher prediction for schwannoma proved to be a schwannoma both morphologically and genomically (−22 as the sole change). The other, however, was an intra-abdominal grade 2 MPNST with genomic features typical of MPNST and not compatible with schwannoma. The co-clustering of some MPNSTs with neurofibromas was less surprising. There is a well-known line of progression from benign neurofibroma through atypical neurofibroma to MPNST, especially among patients with neurofibromatosis type 1 (NF1; ref. 4). Thus, the co-clustering could be due to sampling from neurofibroma parts of the tumors. Indeed, two neurofibromas from patients with NF1, as well as a sporadic case, had areas corresponding to both neurofibroma and MPNST. Still, the results indicate a reliable transcriptomic subclustering of schwannomas and neurofibromas. They also suggest that GGE analysis is useful for separating MPNSTs from benign nerve sheath tumors.
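The reassessment of the MPNSTs that clustered with benign nerve sheath tumors involved the classifier's per-class predictions, which the supplementary material reports as Random Forest probabilities. A simple way to turn such probabilities into a list of cases for re-review is to flag tumors whose top-scoring class differs from the assigned diagnosis. The sketch below assumes one dictionary of class probabilities per tumor. The `flag_discordant` helper and its 0.5 cutoff are hypothetical choices, not the criteria used in the study. As the MPNST example shows, a flag only prompts morphologic and genomic reassessment: one flagged case proved to be a schwannoma, whereas the other remained an MPNST.

```python
from typing import Dict, Optional, Tuple


def flag_discordant(assigned: str,
                    probs: Dict[str, float],
                    min_prob: float = 0.5) -> Optional[Tuple[str, float]]:
    """Return (predicted class, probability) when the top-scoring class differs
    from the assigned diagnosis and reaches min_prob (hypothetical cutoff).
    Otherwise, return None."""
    if not probs:
        raise ValueError("empty probability vector")
    # Pick the class with the highest predicted probability.
    top_class, top_p = max(probs.items(), key=lambda kv: kv[1])
    if top_class != assigned and top_p >= min_prob:
        return top_class, top_p
    return None
```

Raising `min_prob` yields fewer but more confident flags. Lowering it sends more cases back for review.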
The majority of UPSs and myxofibrosarcomas were found in complex SC1 to SC3 (Supplementary Fig. S4). These are two common sarcoma entities that are both associated with poor outcome (4). The genomic and transcriptomic overlap between these two tumor types has been pointed out before (3, 20). It was therefore not surprising that they co-clustered, or that most of the reclassifications suggested by the probability scores were from one to the other (Supplementary Table S1). Complex SC1 to SC3 also contained a variety of other sarcomas with complex genomes, including LMS. In addition, 11 LMSs formed a distinct cluster of their own (LMS class 1). This is in line with the previous notion that LMSs can be subdivided into three transcriptomic subgroups (31, 32):

- class 1 is associated with smooth muscle differentiation and overexpression of a set of marker genes (ACTG2, CFL2, LMOD1, MYLK, SLMAP, and MYOCD);
- class 2 represents a dedifferentiated form of LMS that co-clusters with UPS and overexpresses ARL4C;
- class 3 is restricted to uterine LMSs, which were not represented in our dataset.

Our LMS class 1 cluster showed significant overexpression of the reported class 1 marker genes (Supplementary Fig. S6A). Most of the remaining LMSs had transcriptomic profiles more similar to other sarcomas with complex genomes (Supplementary Fig. S2). There were no significant differences between LMS class 1 tumors and the other LMSs regarding patient age or sex or the frequency of metastases. At the genomic level, however, LMS class 1 tumors showed a distinct distribution of chromosomal deletions and less often displayed whole genome doubling (Supplementary Fig. S6C).
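Overexpression of the class 1 marker genes can be summarized for each tumor as a single score. The sketch below averages per-gene z-scores of the six markers listed above across a cohort. It assumes normalized, log-transformed expression values, with the same sample order for every gene. The `marker_score` function is a hypothetical illustration and is not the statistical analysis behind Supplementary Fig. S6A. Because the z-scores are computed within the cohort, the score is relative: a high value means higher marker expression than in the average tumor of the same dataset.

```python
from statistics import mean, pstdev
from typing import Dict, List

# Class 1 marker genes as listed in the text.
CLASS1_MARKERS = ["ACTG2", "CFL2", "LMOD1", "MYLK", "SLMAP", "MYOCD"]


def marker_score(expr: Dict[str, List[float]],
                 markers: List[str] = CLASS1_MARKERS) -> List[float]:
    """Mean per-gene z-score across the marker genes for each sample.

    expr maps gene symbol -> log-scale expression values (one per sample,
    same sample order for every gene). A missing marker raises KeyError.
    """
    n_samples = len(expr[markers[0]])
    per_gene_z = []
    for gene in markers:
        values = expr[gene]
        mu, sd = mean(values), pstdev(values)
        # A gene with no variation carries no ranking information; assign z = 0.
        per_gene_z.append([(v - mu) / sd if sd > 0 else 0.0 for v in values])
    # Average the z-scores of all markers within each sample.
    return [mean(z[i] for z in per_gene_z) for i in range(n_samples)]
```

Averaging z-scores gives each marker equal weight regardless of its absolute expression level. This is a deliberate simplification compared with a fitted classifier.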
Complex SC1 to SC3 also contained 3 of the 12 lesions that had been tentatively diagnosed as spindle cell tumor NOS. These are soft tissue lesions with spindled tumor cells but without sufficient features for a specific diagnosis. Furthermore, the probability score analysis suggested UPS or myxofibrosarcoma for five of these tumors. This group is, however, likely to include other, rarer subtypes as well. For instance, one case had a transcriptomic profile that did not match any of the tumor types included in the study. It carried a UBTF::GLI1 fusion and thus likely represents the only example in the cohort of the recently recognized “GLI1-rearranged” subtype of STT (30). Another tumor had an FBN1::FGFR1 fusion and could therefore represent a phosphaturic mesenchymal tumor.
The French Sarcoma Group has reported that a gene expression signature (CINSARC) outperformed both morphologic and genomic metastasis predictors in sarcomas with complex genomes (UPS, LMS, and dedifferentiated liposarcoma; ref. 17). The signature is based on the expression levels of 67 genes, mainly involved in chromosome integrity and mitotic control. The group later confirmed the prognostic impact of CINSARC in GIST (33), synovial sarcoma (34), and LMS (35). However, a recent prospective study of patients with high-risk soft tissue sarcomas treated with neoadjuvant chemotherapy found no correlation between CINSARC and outcome (36). In this study, we applied the CINSARC panel to the current cohort (Fig. 3A). The CINSARC values were categorized into low, intermediate, and high scores. As expected, only two benign lesions had high values (Supplementary Fig. S4B): one lipoma and one tenosynovial giant cell tumor. Both had typical morphology and genomic background. We then focused either on tumors of intermediate malignancy together with malignant tumors, or on malignant tumors alone. In both settings, a high CINSARC score correlated with shorter time to metastasis (Fig. 3C). However, outcome analysis of complex SC1 to SC3 showed that transcriptomic data harbor more information than can be gleaned from gene signatures based on a small set of genes. CINSARC values did not differ among the three SCs, yet prognostically relevant subgrouping emerged when a larger set of genes was studied (Fig. 3E and F).
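The CINSARC outcome comparisons described above rest on two steps: grouping tumors by CINSARC value and estimating metastasis-free survival per group. The sketch below assumes tertile cutoffs for the low, intermediate, and high categories; the cutoffs actually used are not stated in this section. It then implements the Kaplan-Meier product-limit estimator. The function names are hypothetical. A formal comparison between groups would additionally require a log-rank test or Cox regression.

```python
from statistics import quantiles
from typing import List, Tuple


def tertile_groups(scores: List[float]) -> List[str]:
    """Label each score low/intermediate/high using cohort tertiles (assumed cutoffs)."""
    q1, q2 = quantiles(scores, n=3)  # two cut points, default 'exclusive' method
    return ["low" if s <= q1 else "intermediate" if s <= q2 else "high" for s in scores]


def kaplan_meier(times: List[float], events: List[int]) -> List[Tuple[float, float]]:
    """Product-limit estimate of metastasis-free survival.

    times:  follow-up per patient (time to metastasis or to last follow-up)
    events: 1 if metastasis occurred at that time, 0 if censored
    Returns (time, survival) at each distinct time with at least one event.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve: List[Tuple[float, float]] = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = n_leaving = 0
        # Patients with identical times are processed together; those censored at t
        # still count as at risk at t, following the usual convention.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events:
            survival *= 1.0 - n_events / at_risk
            curve.append((t, survival))
        at_risk -= n_leaving
    return curve
```

Calling `kaplan_meier` separately on the patients in each CINSARC group gives one step curve per group. Plotting these curves together produces the familiar Kaplan-Meier comparison.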
In summary, transcriptomic analysis of STTs provides both diagnostic and prognostic information. Combining gene fusion status with GGE improves the assignment of tumors to relevant morphologic subgroups. It also reveals the intrinsic variation within many of the most common types of sarcoma. A new diagnosis was suggested for close to 6% of the tumors, but it should be kept in mind that the current study was based on retrospective material. The majority of the tumors had undergone morphologic reassessment prior to the RNA-seq analysis. Even so, some of the tumors reclassified here would likely have been correctly diagnosed using the IHC markers available today. Furthermore, surgery remains the most important treatment modality, so morphologic reclassification does not necessarily imply any change in patient management. Still, the morphologic and clinical diversity of STTs and the rapid development of novel targeted therapies must be borne in mind. Against this background, the results of the current study provide strong arguments for further assessment of the clinical potential of RNA-seq in STTs.
Supplementary Material
Batch effects on transcriptomic clustering
Unsupervised hierarchical clustering of soft tissue tumors
Clustering of benign adipocytic tumors
Clustering of sarcomas with complex genomic rearrangements
Myxoma samples form two distinct clusters
Soft tissue leiomyosarcoma can be divided into two distinct classes
Gene expression based classification of soft tissue tumors
Survival analysis with CINSARC enrichment
Overlapping fusion genes result in similar expression profiles
Clinical, morphological, and genetic data on 704 soft tissue tumors.
Fusion transcripts called by both Arriba and FusionCatcher.
Fusions kept after manual filtering of RNA-seq data analyzed with fusion callers Arriba and FusionCatcher.
Fusion transcripts of pathogenetic relevance in the different tumor types, sorted according to lineage of differentiation.
Classification probabilities from the Random Forest model.
Representativeness of Study Participants
Acknowledgments
The authors thank the Center for Translational Genomics, Lund University, and Clinical Genomics Lund, SciLifeLab, for providing RNA-seq services. This work was supported by the Swedish Childhood Cancer Foundation (PR2023-0012 to F. Mertens), the Swedish Cancer Society (23-2662 to F. Mertens), and Governmental Funding of Clinical Research within the National Health Service (ALF grant to F. Mertens).
Footnotes
Note: Supplementary data for this article are available at Clinical Cancer Research Online (http://clincancerres.aacrjournals.org/).
Data Availability
CEL files from SNP array analysis of 36 LMSs have been uploaded to ArrayExpress (RRID: SCR_002964) under accession number E-MTAB-15661. The raw RNA-seq data (fastq files) have been deposited in Federated European Genome-phenome Archive Sweden under accession number EGAD50000002120. Relevant aspects of genomic data (karyotypes, FISH results, RT-PCR findings, and SNP array profiles) were retrieved from the medical charts or prior publications, unless otherwise specified, and these data can be provided by the corresponding author upon reasonable request. Segment files from SNP array analysis of myxofibrosarcomas and UPSs have been deposited at ArrayExpress with accession number E-MTAB-13948 as part of a previous publication (20). All analysis code, source data, as well as an interactive version of Fig. 1 are available at Zenodo (RRID: SCR_004129, https://doi.org/10.5281/zenodo.17866629).
Authors’ Disclosures
F. Mertens reports grants from the Swedish Cancer Society and the Swedish Childhood Cancer Foundation during the conduct of the study. No disclosures were reported by the other authors.
Authors’ Contributions
J. Hofvander: Conceptualization, data curation, formal analysis, validation, investigation, visualization, methodology, writing–original draft, writing–review and editing. J. Köster: Data curation, validation, investigation, writing–review and editing. S. Sydow: Formal analysis, writing–review and editing. P. Piccinelli: Data curation, visualization. F. Vult von Steyern: Resources, writing–review and editing. P. Tsagkozis: Resources, writing–review and editing. A. Hesla: Resources, writing–review and editing. L. Magnusson: Formal analysis, methodology, writing–review and editing. J. Nilsson: Formal analysis, methodology, writing–review and editing. F. Mertens: Conceptualization, resources, data curation, supervision, funding acquisition, writing–original draft, writing–review and editing.
References
- 1. WHO Classification of Tumours Editorial Board. WHO classification of tumours of soft tissue and bone. 5th ed. Lyon: IARC; 2020. p. 1–336.
- 2. Mertens F, Johansson B, Fioretos T, Mitelman F. The emerging complexity of gene fusions in cancer. Nat Rev Cancer 2015;15:371–81.
- 3. Cancer Genome Atlas Research Network. Comprehensive and integrated genomic characterization of adult soft tissue sarcomas. Cell 2017;171:950–65.e28.
- 4. Mitelman F, Johansson B, Mertens F, editors. Mitelman database of chromosome aberrations and gene fusions in cancer; 2025 [cited 2025 Sep 1]. Available from: https://mitelmandatabase.isb-cgc.org/.
- 5. Köster J, Piccinelli P, Arvidsson L, Vult von Steyern F, Bedeschi Rego De Mattos C, Almquist M, et al. The diagnostic utility of DNA copy number analysis of core needle biopsies from soft tissue and bone tumors. Lab Invest 2022;102:838–45.
- 6. Baelen J, Dewaele B, Debiec-Rychter M, Sciot R, Schöffski P, Hompes D, et al. Optical genome mapping for comprehensive cytogenetic analysis of soft-tissue and bone tumors for diagnostic purposes. J Mol Diagn 2024;26:374–86.
- 7. Öfverholm I, Wallander K, Haglund C, Chellappa V, Wejde J, Gellerbring A, et al. Comprehensive genomic profiling alters clinical diagnoses in a significant fraction of tumors suspicious of sarcoma. Clin Cancer Res 2024;30:2647–58.
- 8. Suurmeijer AJH, Dickson BC, Antonescu CR. Complementary value of molecular analysis to expert review in refining classification of uncommon soft tissue tumors. Genes Chromosomes Cancer 2024;63:e23196.
- 9. Koelsche C, Schrimpf D, Stichel D, Sill M, Sahm F, Reuss DE, et al. Sarcoma classification by DNA methylation profiling. Nat Commun 2021;12:498.
- 10. Lyskjaer I, De Noon S, Tirabosco R, Rocha AM, Lindsay D, Amary F, et al. DNA methylation-based profiling of bone and soft tissue tumours: a validation study of the “DKFZ Sarcoma Classifier”. J Pathol Clin Res 2021;7:350–60.
- 11. Miettinen M, Abdullaev Z, Turakulov R, Quezado M, Luiña Contreras A, Curcio CA, et al. Assessment of the utility of the sarcoma DNA methylation classifier in surgical pathology. Am J Surg Pathol 2024;48:112–22.
- 12. Mertens F, Antonescu CR, Mitelman F. Gene fusions in soft tissue tumors: recurrent and overlapping pathogenetic themes. Genes Chromosomes Cancer 2016;55:291–310.
- 13. Fletcher CDM, Bridge JA, Hogendoorn PCW, Mertens F, editors. WHO classification of tumours of soft tissue and bone. 4th ed. Lyon: IARC; 2013. p. 468.
- 14. Trojani M, Contesso G, Coindre JM, Rouesse J, Bui NB, de Mascarel A, et al. Soft-tissue sarcomas of adults; study of pathological prognostic variables and definition of a histopathological grading system. Int J Cancer 1984;33:37–42.
- 15. Arbajian E, Puls F, Antonescu CR, Amary F, Sciot R, Debiec-Rychter M, et al. In-depth genetic analysis of sclerosing epithelioid fibrosarcoma reveals recurrent genomic alterations and potential treatment targets. Clin Cancer Res 2017;23:7426–34.
- 16. Rousseeuw PJ. Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. J Comput Appl Math 1987;20:53–65.
- 17. Chibon F, Lagarde P, Salas S, Pérot G, Brouste V, Tirode F, et al. Validated prediction of clinical outcome in sarcomas and multiple types of cancer on the basis of a gene expression signature related to genome complexity. Nat Med 2010;16:781–7.
- 18. Sydow S, Versleijen-Jonkers YMH, Hansson M, van Erp AEM, Hillebrandt-Roeffen MHS, van der Graaf WTA, et al. Genomic and transcriptomic characterization of desmoplastic small round cell tumors. Genes Chromosomes Cancer 2021;60:595–603.
- 19. Rasmussen M, Sundström M, Kultima HG, Botling J, Micke P, Birgisson H, et al. Allele-specific copy number analysis of tumor samples with aneuploidy and tumor heterogeneity. Genome Biol 2011;12:R108.
- 20. Mitra S, Farswan A, Piccinelli P, Sydow S, Hesla A, Tsagkozis P, et al. Transcriptomic profiles of myxofibrosarcoma and undifferentiated pleomorphic sarcoma correlate with clinical and genomic features. J Pathol 2024;264:293–304.
- 21. Köster J, Ghanei I, Domanski HA. Comparative cytological and histological assessment of 828 primary soft tissue and bone lesions, and proposal for a system for reporting soft tissue cytopathology. Cytopathology 2021;32:7–19.
- 22. Puls F, Agaimy A, Flucke U, Mentzel T, Sumathi VP, Ploegmakers M, et al. Recurrent fusions between YAP1 and KMT2A in morphologically distinct neoplasms within the spectrum of low-grade fibromyxoid sarcoma and sclerosing epithelioid fibrosarcoma. Am J Surg Pathol 2020;44:594–606.
- 23. Warmke LM, Ameline B, Fritchie KJ, Dehner CA, Agaimy A, Din NU, et al. YAP1::KMT2A-rearranged sarcomas harbor a unique methylation profile and are distinct from sclerosing epithelioid fibrosarcoma and low-grade fibromyxoid sarcoma. Virchows Arch 2025;486:457–77.
- 24. Hofvander J, Jo VY, Fletcher CDM, Puls F, Flucke U, Nilsson J, et al. PHF1 fusions cause distinct gene expression and chromatin accessibility profiles in ossifying fibromyxoid tumors and mesenchymal cells. Mod Pathol 2020;33:1331–40.
- 25. Hallor KH, Teixeira MR, Fletcher CDM, Bizarro S, Staaf J, Domanski HA, et al. Heterogeneous genetic profiles in soft tissue myoepitheliomas. Mod Pathol 2008;21:1311–9.
- 26. Hofvander J, Tayebwa J, Nilsson J, Magnusson L, Brosjö O, Larsson O, et al. RNA sequencing of sarcomas with simple karyotypes: identification and enrichment of fusion transcripts. Lab Invest 2015;95:603–9.
- 27. Brunetti M, Vitelli V, Naas AM, Zahl Eriksson AG, Haugland HK, Krakstad C, et al. Molecular landscape of endometrial stromal tumors. JCO Precis Oncol 2025;9:e2400779.
- 28. Fei F, Prieto GCN, Harada S, Siegal GP, Wei S. Round cell tumor with a myxoid matrix harboring a PHF1-TFE3 fusion: myoepithelial neoplasm or ossifying fibromyxoid tumor? Pathol Res Pract 2021;225:153578.
- 29. Liu X, Yin X, Li D, Li K, Zhang H, Lu J, et al. RNA sequencing reveals novel oncogenic fusions and depicts detailed fusion transcripts of FN1-FGFR1 in phosphaturic mesenchymal tumors. Mod Pathol 2023;36:100266.
- 30. Cloutier JM, Kerr DA. GLI1-altered mesenchymal tumors. Surg Pathol Clin 2024;17:13–24.
- 31. Guo X, Jo VY, Mills AM, Zhu SX, Lee CH, Espinosa I, et al. Clinically relevant molecular subtypes in leiomyosarcoma. Clin Cancer Res 2015;21:3501–11.
- 32. Anderson ND, Babichev Y, Fuligni F, Comitani F, Layeghifard M, Venier RE, et al. Lineage-defined leiomyosarcoma subtypes emerge years before diagnosis and determine patient survival. Nat Commun 2021;12:4496.
- 33. Lagarde P, Pérot G, Kauffmann A, Brulard C, Dapremont V, Hostein I, et al. Mitotic checkpoints and chromosome instability are strong predictors of clinical outcome in gastrointestinal stromal tumors. Clin Cancer Res 2012;18:826–38.
- 34. Lagarde P, Przybyl J, Brulard C, Pérot G, Pierron G, Delattre O, et al. Chromosome instability accounts for reverse metastatic outcomes of pediatric and adult synovial sarcomas. J Clin Oncol 2013;31:608–15.
- 35. Italiano A, Lagarde P, Brulard C, Terrier P, Laë M, Marques B, et al. Genetic profiling identifies two classes of soft-tissue leiomyosarcomas with distinct clinical characteristics. Clin Cancer Res 2013;19:1190–6.
- 36. Frezza AM, Stacchiotti S, Chibon F, Coindre JM, Italiano A, Romagnosa C, et al. CINSARC in high-risk soft tissue sarcoma patients treated with neoadjuvant chemotherapy: results from the ISG-STS 1001 study. Cancer Med 2023;12:1350–7.