Neuro-Oncology. 2025 May 27;27(10):2605–2616. doi: 10.1093/neuonc/noaf130

A clinically annotated transcriptomic atlas of nervous system tumors

Chi H Le 1, Ajai K Nelson 2, Adam J Berrones 3, Janki R Naidugari 4, Robert P Naftel 5, Eyas M Hattab 6, Brian J Williams 7, Akshitkumar M Mistry 8
PMCID: PMC12833534  PMID: 40423244

Abstract

Background

While DNA methylation signatures are distinct across nervous system neoplasms, it has not been comprehensively demonstrated whether transcriptomic signatures exhibit similar uniqueness. Additionally, no large-scale dataset for comparative gene expression analyses exists. This study addresses these knowledge and resource gaps.

Methods

We compiled and harmonized raw transcriptomic and clinical data for neoplastic (n = 5,402) and nonneoplastic (n = 1,973) nervous system samples from publicly available sources, all profiled on the same microarray platform. After adjusting for surrogate variable effects (“batch effects”), machine learning methods were used to visualize, cluster, and reclassify samples with uncertain diagnoses (n = 2,225).

Results

We generated the largest clinically annotated transcriptomic atlas of nervous system tumors to date. Sample clustering was primarily driven by diagnosis. We show the utility of the atlas by refining the transcriptional subtypes of pheochromocytoma and paraganglioma (PH/PG), revealing 6 robust subtypes (Neuronal, Vascular, Metabolic, Steroidal, Developmental, Indeterminate) that were independently validated using TCGA RNA-seq data and that correlated with specific mutational signatures and clinical behaviors of these tumors.

Conclusions

We demonstrate that bulk transcriptomic signatures, like bulk DNA methylation signatures, are distinct across the diagnostic spectrum of nervous system neoplasms. Our atlas covers a broad range of diagnoses, including rarely studied entities, spans all ages, and includes individuals from diverse geographical regions, which enhances its utility for comprehensive and robust comparative gene expression analyses, as exemplified by our PH/PG analyses. For access, visit http://kdph.shinyapps.io/atlas/ or https://github.com/axitamm/BrainTumorAtlas.

Keywords: brain neoplasms, gene expression, nervous system neoplasms, paraganglioma, transcriptome


Key Points.

  • We present a large-scale transcriptomic atlas of nervous system tumors and nontumor entities.

  • Sample clustering in this atlas is primarily driven by diagnosis.

  • Refinement of PH/PG transcriptional subtypes revealed clinically relevant biological programs.

Importance of the Study.

In this study, we present an atlas that combines harmonized gene expression and manually curated clinical data from 5,402 neoplastic and 1,973 nonneoplastic nervous system samples from the public domain. Currently, there is a lack of comprehensive atlases covering a broad range of nervous system neoplasm diagnoses, including rarely studied entities, and spanning different geographic regions and age groups. Such atlases enable discoveries through comprehensive and robust comparative gene expression analyses across the diagnostic spectrum of nervous system neoplasms. Additionally, we demonstrate that the diagnostic distinctiveness of bulk DNA methylation signatures also extends to gene expression across the diagnostic spectrum of nervous system neoplasms and age groups. In the process, we identified specific entities within discrete gliomas and pheochromocytomas / paragangliomas that require further diagnostic refinement. Finally, the methods of this study can be applied to integrate and harmonize raw transcriptomic data from other rare conditions, enhancing their utility.

The distinctive DNA methylation signature of neoplasms has enabled the development of various DNA methylation-based classifiers for their diagnosis.1–3 DNA methylation, an epigenetic modification, regulates gene transcription. While the bulk transcriptomic profile of nervous system neoplasms is also believed to be unique, comprehensive evidence spanning all types of nervous system neoplasms and age groups is lacking. A large-scale dataset showing this would enable robust, comparative gene expression analyses of nervous system neoplasms with statistical confidence. This study addresses these gaps in knowledge and resources by harmonizing publicly available data to create a large-scale, clinically annotated transcriptomic dataset. Our atlas includes 5,402 neoplastic samples affecting the nervous system and 1,973 nonneoplastic samples. It serves as a valuable resource for gene expression-based analyses, supporting both clinical and biological research in neuro-oncology.

Methods

Transcriptomic Platform Selection

To ensure precision in the integrated data, we used only raw data generated on a single platform, allowing for uniform reprocessing across all samples. A comprehensive search of major online genomic databases identified the Applied Biosystems (formerly Affymetrix) GeneChip Human Genome U133 Plus 2.0 Array (National Center for Biotechnology Information [NCBI] ID: GPL570) as the predominant platform used for transcriptomic profiling of neoplasms affecting the nervous system.4 The GPL570 array offers extensive coverage of the human genome5 and generates data that are comparable to RNA sequencing data, providing similar opportunities for meaningful biological insights.6

Data Collection

We retrieved raw data files for both neoplasms affecting the nervous system and nonneoplastic samples of the nervous system analyzed using the GPL570 array from multiple sources, including the Gene Expression Omnibus (NCBI, National Institutes of Health, Bethesda, MD, USA), ArrayExpress (European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, Cambridgeshire, UK) and other less commonly used repositories (last searched on July 15, 2024). In addition, we also collected any clinical data and processing information associated with the samples (metadata; see Supplementary Table 1) from the respective repositories, publications, and, at times, the corresponding authors of the publications. We discarded any metadata found to be inconsistent between the respective publication and the data repository, unless they had been reported in multiple publications or deposited more than once, in which case we used the most recent metadata.

Raw Transcriptomic Data Reprocessing

We conducted all analyses using R version 4.0 on a computing instance with 40 CPU cores and 1.5 terabytes of RAM, running Windows 10.

We read and processed all raw data (.cel files) using the affyPara R package7 (version 1.42) and a customized chip definition format with updated probe set definitions (BrainArray8 ENTREZG version 25). We performed background correction using the RMA algorithm9 and applied quantile normalization to the data. We used PM-only probes and summarized the data with the medianpolish algorithm. We annotated the probes with their respective gene Entrez identification numbers. For genes represented by multiple probes, we summarized the data using the maxmean method and the collapseRows function in the WGCNA package.10 Finally, we identified and removed duplicate samples (those with identical values for every gene), resulting in a final dataset of 7,375 samples with 20,360 genes measured in each sample.
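The multi-probe summarization step can be illustrated with a minimal Python sketch. It assumes that the maxmean rule keeps, for each gene, the probe with the highest mean expression across samples. The function name `collapse_max_mean` and the toy values are hypothetical; the study itself used collapseRows in R.

```python
from statistics import mean

def collapse_max_mean(expression, probe_to_gene):
    """Keep one probe per gene: the probe with the highest mean across samples.

    expression: dict mapping probe ID -> list of expression values (one per sample)
    probe_to_gene: dict mapping probe ID -> gene ID
    Returns a dict mapping gene ID -> expression values of the chosen probe.
    """
    best = {}  # gene ID -> (mean expression, probe ID)
    for probe, values in expression.items():
        gene = probe_to_gene[probe]
        score = mean(values)
        if gene not in best or score > best[gene][0]:
            best[gene] = (score, probe)
    return {gene: expression[probe] for gene, (_, probe) in best.items()}

# Toy example (hypothetical values): two probes map to gene "7157", one to "1956".
toy_expression = {
    "p1": [5.0, 6.0, 7.0],
    "p2": [8.0, 9.0, 7.5],
    "p3": [3.0, 3.5, 4.0],
}
toy_mapping = {"p1": "7157", "p2": "7157", "p3": "1956"}
collapsed = collapse_max_mean(toy_expression, toy_mapping)
```

In the toy data, probe p2 represents gene 7157 because its mean exceeds that of p1.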

Dimensionality Reduction

We reduced the high-dimensional dataset to two dimensions for visualization by plotting the samples on a 2D map using FFT-accelerated Interpolation-based t-distributed stochastic neighbor embedding11 (FIt-SNE version 1.2.1). We used the following recommended12 parameters: perplexity = n/100, theta = 0.5, max_iter = 5000, stop_early_exag_iter = 250, exaggeration_factor = 12.0, learning_rate = n/12, and initialization = “pca,” where n = the total number of samples, which is 7,375.
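For orientation, the two sample-size-dependent settings can be worked out for n = 7,375. The short Python sketch below only restates the parameter values quoted above; the helper name `fitsne_parameters` is hypothetical and no FIt-SNE code is called.

```python
def fitsne_parameters(n_samples):
    """Return the FIt-SNE settings quoted in the text for a given sample count."""
    return {
        "perplexity": n_samples / 100,
        "theta": 0.5,
        "max_iter": 5000,
        "stop_early_exag_iter": 250,
        "exaggeration_factor": 12.0,
        "learning_rate": n_samples / 12,
        "initialization": "pca",
    }

params = fitsne_parameters(7375)
print(params["perplexity"])               # 73.75
print(round(params["learning_rate"], 2))  # 614.58
```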

Training Dataset Development

We primarily identified clusters visually on the 2D FIt-SNE, or t-SNE, plot. To confirm and refine these visually identified clusters, we applied a combination of Ordering Points To Identify Clustering Structure (OPTICS) and the Density-Based Spatial Clustering of Applications with Noise (DBSCAN), both of which are unsupervised algorithms in the dbscan R package13 (version 1.2.0). OPTICS’ ability to detect clusters of varying densities is an advantage over DBSCAN. We used the following parameters for OPTICS: eps = 10, minPts = 3, xi = 0.03. For DBSCAN, we set the parameters to: eps = 1.75, minPts = 3, k = 2. The identified clusters were visualized using convex hulls on the t-SNE plot (Supplementary Figure 1). We annotated the samples in the clusters with their respective diagnoses from the metadata. Each cluster was annotated with its majority diagnosis, and the samples of the majority diagnosis that clustered together on the t-SNE plot were incorporated into a subset dataset that we designated as the training dataset. We used it to train classifiers to identify or reclassify the remaining samples.
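The majority-diagnosis rule for assembling the training dataset can be sketched as follows. This is an illustrative Python sketch, not the authors' code: cluster labels are assumed to have been computed already (for example by OPTICS or DBSCAN), `None` is used here as a hypothetical marker for unclustered points, and ties are broken by first occurrence.

```python
from collections import Counter, defaultdict

def build_training_set(samples):
    """Label each cluster by its majority diagnosis and keep matching samples.

    samples: list of (sample_id, cluster_id, diagnosis); cluster_id None = unclustered.
    Returns (cluster_id -> majority diagnosis, list of training sample IDs).
    """
    by_cluster = defaultdict(list)
    for sample_id, cluster_id, diagnosis in samples:
        if cluster_id is not None:
            by_cluster[cluster_id].append((sample_id, diagnosis))

    cluster_label = {}
    training_ids = []
    for cluster_id, members in by_cluster.items():
        majority, _ = Counter(d for _, d in members).most_common(1)[0]
        cluster_label[cluster_id] = majority
        # Only samples whose own diagnosis matches the cluster majority are kept.
        training_ids.extend(s for s, d in members if d == majority)
    return cluster_label, training_ids

# Toy input with hypothetical sample IDs and diagnosis abbreviations.
toy_samples = [
    ("s1", 1, "MB"), ("s2", 1, "MB"), ("s3", 1, "ATRT"),
    ("s4", 2, "PA"), ("s5", 2, "PA"), ("s6", None, "GBM"),
]
labels, training_ids = build_training_set(toy_samples)
```

Samples s3 and s6 are left out of the training set and would be passed to the classifiers for reclassification.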

Surrogate Variable Analysis

Estimating covariate-specific variance in gene expression data

We performed Principal Variance Component Analysis (PVCA; using the R package ExpressionNormalizationWorkflow, version 1.32) to assess the contribution of surrogate variables to gene expression variability. Specifically, PVCA partitions the total variance in gene expression data and quantifies the proportion of variance attributable to each covariable, allowing for the identification of sources of unwanted variation. We used a threshold of 50% of the variance to be explained by the selected principal components.
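The 50% threshold can be read as a rule for how many principal components enter the variance partitioning. The Python sketch below shows only that selection step, assuming the eigenvalues of the expression correlation matrix are already available. The function name and the toy eigenvalues are hypothetical, and the mixed-model fitting that PVCA performs afterwards is not shown.

```python
def n_components_for_threshold(eigenvalues, threshold=0.5):
    """Smallest number of leading components whose cumulative share of
    variance reaches the threshold."""
    total = sum(eigenvalues)
    cumulative = 0.0
    ordered = sorted(eigenvalues, reverse=True)
    for count, value in enumerate(ordered, start=1):
        cumulative += value
        if cumulative / total >= threshold:
            return count
    return len(ordered)

# Toy eigenvalues (hypothetical): the first explains 40%, the first two 70%.
n_selected = n_components_for_threshold([4.0, 3.0, 2.0, 1.0], threshold=0.5)
print(n_selected)  # 2
```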

Surrogate variable adjustment

Surrogate variables that explained more than 5% of the variance in the PVCA analysis were selected for adjustment to minimize their confounding effects. To correct for these surrogate variables, we employed the ComBat function14 from the R package sva (version 3.54)15 and applied sequential “batch” effect correction. For each adjustment, we created a model matrix that included the sample diagnosis as a covariate to retain the biological signatures. The gene expression data were then adjusted iteratively, with each surrogate variable treated as a separate “batch” effect, thereby removing its contribution to the observed gene expression variance.

Sample Classification

Synthetic balancing of samples per diagnosis

As expected, the distribution of samples across diagnoses in the generated training dataset was imbalanced. To improve accuracy of machine learning classifiers, we over-sampled the samples within each diagnosis to match the maximum number of samples for a neoplastic diagnosis in the training dataset (n = 850). We applied the synthetic minority over-sampling technique16 in the R package smotefamily (version 1.3.1), setting the K nearest neighbor parameter at 10, or n − 1 when the number of samples per diagnosis (n) was fewer than 10. Using the synthetically balanced core dataset (which contained 46,689 samples), we developed 2 different machine learning classifiers to predict the diagnoses of the remaining samples. The total number of classes to predict equaled 54 for this multiclass classification.
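The over-sampling step can be sketched in plain Python. This is a simplified illustration of the synthetic minority over-sampling idea (interpolating between a sample and one of its nearest same-class neighbors), not the smotefamily implementation; the function name, the seed, and the toy points are hypothetical. The neighbor count is capped at n − 1, which generalizes the rule stated above.

```python
import math
import random

def smote_oversample(points, target_size, k=10, seed=0):
    """Grow one diagnostic class to target_size by interpolating between
    each randomly chosen sample and one of its k nearest neighbors."""
    rng = random.Random(seed)
    n = len(points)
    if n < 2 or n >= target_size:
        return list(points)
    k_eff = min(k, n - 1)  # fewer neighbors when the class is small
    synthetic = []
    while n + len(synthetic) < target_size:
        base = rng.choice(points)
        others = [p for p in points if p is not base]
        neighbors = sorted(others, key=lambda p: math.dist(base, p))[:k_eff]
        neighbor = rng.choice(neighbors)
        gap = rng.random()  # position along the segment base -> neighbor
        synthetic.append(tuple(b + gap * (q - b) for b, q in zip(base, neighbor)))
    return list(points) + synthetic

# Toy class with 3 samples in 2 dimensions, grown to 8 samples.
toy_class = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
balanced = smote_oversample(toy_class, target_size=8)
```

Each synthetic point lies on a line segment between two real samples of the same class, so the class is enlarged without leaving the region its real samples span.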

Machine learning classifiers

We trained one classifier using the random forest algorithm (ranger R package,17 version 0.13.1). We trained the algorithm for accuracy by tuning the parameters and performing 5-fold cross-validation with 3 repeats using the caret R package18 (version 6.0-94). We tuned the following parameters and values: num.trees (1,000, 2,000, 3,000, 4,000), min.node.size (1, 2, 3, 4, 5, 6, 7, 8, 9), and mtry (142, 170, 175, 180, 200). Based on performance, we finalized the classifier with num.trees = 1000, min.node.size = 4, mtry = 170, splitrule = “gini,” and metric=“accuracy.”

Next, we trained another classifier using LightGBM19 (lightgbm R package, version 4.3.0), a high-performance gradient boosting framework for decision tree-based learning algorithms. We split the synthetically balanced core dataset into training (85%) and testing (15%) datasets, ensuring diagnostic class balance in the training set. We trained the algorithm to minimize the multiclass classification error rate in the test dataset by tuning the following parameters: boosting (“gbdt” or “dart”), number of iterations (100, 200, 300), number of leaves (30, 40, 50, 60), learning rate (0.1, 0.05, 0.01), and maximum bin (255, 375, 500). Based on performance, we finalized the classifier with boosting=’gbdt’, num_iterations = 100 with early stopping if 10 sequential iterations did not reduce the error rate, learning_rate = 0.1, num_leaves = 30, and max_bin = 375.

Sample proximity classifiers

We also developed 2 simple proximity-based classifiers. Mathematically calculated proximity is logically appealing, such as in a dendrogram of hierarchically arranged samples based on a measure of distance. However, in this case, clustering depends on the subjective selection of a cut-point on a dendrogram. For objectivity, we calculated the shortest Euclidean distance between one sample and another in the training dataset using the expression values of all the genes measured (n = 20,360). We set the maximum value of these shortest distances as the classification threshold. Each test sample was assigned the diagnosis of its nearest training neighbor falling within this threshold.
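The thresholded nearest-neighbor rule described above can be sketched as follows. This is a minimal Python illustration using 2 features instead of 20,360 genes; the function names and toy data are hypothetical, and returning `None` for samples beyond the threshold is an assumption about how unclassifiable samples are represented.

```python
import math

def nearest(sample, training, exclude_index=None):
    """Return (distance, diagnosis) of the closest training sample."""
    best_distance, best_label = math.inf, None
    for index, (vector, label) in enumerate(training):
        if index == exclude_index:
            continue
        distance = math.dist(sample, vector)
        if distance < best_distance:
            best_distance, best_label = distance, label
    return best_distance, best_label

def classification_threshold(training):
    """Maximum, over training samples, of the distance to the nearest other sample."""
    return max(
        nearest(vector, training, exclude_index=index)[0]
        for index, (vector, _) in enumerate(training)
    )

def classify(sample, training, threshold):
    """Assign the nearest neighbor's diagnosis if it lies within the threshold."""
    distance, label = nearest(sample, training)
    return label if distance <= threshold else None

toy_training = [((0.0, 0.0), "A"), ((1.0, 0.0), "A"),
                ((10.0, 10.0), "B"), ((10.0, 12.0), "B")]
threshold = classification_threshold(toy_training)   # 2.0
near = classify((0.5, 0.5), toy_training, threshold)  # "A"
far = classify((5.0, 5.0), toy_training, threshold)   # None
```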

Using a similar approach to the proximity-based classifier, we leveraged t-SNE-based clustering for classification. We calculated the linear distance between a test sample and all samples in the training dataset using the 2 t-SNE coordinates. The test sample received the diagnosis of the closest training sample, thereby mapping it to a cluster on the 2D t-SNE plot of the training dataset.

Results

We completed the transcriptomic atlas in 2 steps. First, we harmonized and integrated publicly available raw transcriptomic microarray data generated using the Applied Biosystems (previously Affymetrix) GeneChip Human Genome U133 Plus 2.0 Array. Most of the data were generated using fresh frozen samples from living individuals (74.8%); 19.8% of the samples used were postmortem. We simultaneously processed and summarized the raw transcriptomic microarray data using the standard RMA algorithm9 and visualized the relationships among the samples on a t-SNE plot (Supplementary Figure 2).

Accurate annotation of the samples with a diagnosis according to the current classification was the second step in creating the atlas. This was necessary because we integrated samples processed from the year 2003 to 2018 (the raw data have a date and time stamp). Many samples had histopathological diagnoses that did not match the primary diagnosis of their cluster, had visually unclear cluster membership on the t-SNE plot, or lacked a histopathological diagnosis altogether. During the time span of the samples, there have been changes in the diagnostic criteria for nervous system neoplasms. New diagnostic entities have been discovered, some have been deprecated, and the neuro-oncology community has shifted from histopathological to methylation-based diagnosis for its improved accuracy. The misdiagnosis rate of histopathology-based diagnosis can be as high as 12%.3

To address this challenge, we first removed variability in the gene expression data due to surrogate variables, which are unmeasured factors that influence data (as opposed to “batch” effects, which are data variations arising from technical or experimental differences). Using PVCA, we quantified the variance in the gene expression data associated with the following surrogate variables: the year the data were generated, the ID of the dataset containing the data, the institution depositing the dataset, and the country of that institution. We also quantified the variability in gene expression explained by the diagnosis. To estimate surrogate variable influence and complete the analysis, we temporarily assigned the diagnosis of the nearest sample on the t-SNE plot to samples with an unclear diagnosis or used the original histological diagnosis when the adjacent t-SNE sample’s diagnosis was not logical (n = 2,111 samples). As expected, the diagnosis explained the greatest amount of variance in the gene expression data (46.8%). The surrogate variables, in descending order of variance explained, were the institution (16.4%), dataset ID (9.0%), year of data generation (5.2%), and country of the institution (1.5%). About 20.9% of the variance was unexplained by these variables. After adjusting the gene expression data for the effects of the institution, dataset ID, and the year the data were generated using ComBat,14 an established method leveraging an empirical Bayesian framework, the variance explained by the institution, dataset ID, year, and country dropped to 2.6%, 1.6%, 0.5%, and 0.4%, respectively (see Supplementary Figure 3 for results according to dataset ID). Variance explained by the diagnosis (which included the temporarily assigned diagnoses) increased to 66.7% (Supplementary Figure 4).

To rigorously predict questionable diagnoses of some samples in the atlas, we created a training dataset using a subset of the samples to train classification models. The training dataset included the majority of samples within a cluster identified on the t-SNE plots of both uncorrected and batch-corrected gene expression data with a diagnosis according to the latest classification. In the absence of unequivocal “truth,” this method is acceptable and was used to validate the DNA methylation-based classifier for nervous system neoplasms.20 We used both uncorrected and batch-corrected gene expression data to increase precision. Of the 7,375 samples, 5,150 samples were selected to be part of the training dataset (Figure 1) to predict or reclassify the diagnosis of the remaining 2,225 samples. The training dataset was used to develop 4 different classifiers to reclassify the remaining samples. Multiple classifiers using different approaches were chosen to increase precision. These included the random forest, gradient boosting, t-SNE, and a Euclidean-based proximity classifier that classifies test samples based on the diagnosis of the closest sample in the training dataset according to the transcriptome. Cross-validation of these 4 classifiers using the training dataset showed high discriminating power with accuracies of 100% (95% confidence interval, CI, 99.93% to 100%) for the random forest classifier; 99.94% (95% CI, 99.83% to 99.99%) for the gradient boosting classifier; 99.22% (95% CI, 98.94% to 99.44%) for the t-SNE-based classifier; and 99.26% (95% CI, 98.99% to 99.48%) for the Euclidean-based proximity classifier (Supplementary Figure 5). See Supplementary Table 2 for the area under the curve and F1-score performance metrics of these classifiers. Using these methods, the diagnoses of the 2,225 samples were predicted.
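The interval reported for the random forest classifier can be reproduced with a short calculation. This rests on two assumptions the text does not confirm: that the interval is an exact binomial (Clopper-Pearson) interval, and that all 5,150 training samples were classified correctly. In that special case the lower bound has the closed form (α/2)^(1/n) and the upper bound is 1. The function name below is hypothetical.

```python
def exact_lower_bound_all_correct(n, confidence=0.95):
    """Clopper-Pearson lower bound when all n predictions are correct."""
    alpha = 1.0 - confidence
    return (alpha / 2.0) ** (1.0 / n)

lower = exact_lower_bound_all_correct(5150)
print(round(100 * lower, 2))  # 99.93
```

The result matches the reported lower limit of 99.93%, which is consistent with the accuracy having been evaluated on the 5,150 training samples.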

Figure 1:

t-SNE plot of 5,150 training samples, each represented as a colored dot labeled by diagnostic category.

Representation of the training dataset (5,150 samples) in the t-SNE dimensionality reduction performed on the full dataset. Individual samples (dots) are color-coded and labelled according to the diagnosis listed in the side legend. Full names of the 54 diagnostic entities are provided in Supplementary Table 1. Of note, we chose to label a specific type of supratentorial ependymomas as “RELA” instead of the most recent nomenclature “ZFTA fusion-positive” because there are non-RELA, ZFTA-fused ependymomas21 and these samples were identified as RELA-fusion positive.

The 4 classifiers predicted the diagnoses of 1,968 of the 2,225 samples (88.4%) consistently, defined by prediction of the same diagnosis by 3 or all 4 classifiers. The same diagnosis was predicted by all 4 classifiers in 1,546 samples and by 3 of the 4 classifiers in 422 samples. A further 216 samples (9.7%) were reclassified with the diagnoses predicted by 2 classifiers because their histopathological diagnosis (an expert’s assessment) was also consistent with the predicted diagnoses. If the histopathological diagnosis was outdated, consistency of the predicted diagnoses with the general histopathological diagnosis (eg medulloblastoma or atypical teratoid rhabdoid tumor without further specification, such as SHH) or the diagnostic class of the histopathological diagnosis (eg “Embryonal Tumors” or “Diffuse Gliomas”) was accepted. In 49 samples, the histological assessment supported 2 different diagnoses predicted by 2 different classifiers. In these rare cases, the diagnosis predicted by the t-SNE-based classifier was chosen to facilitate visual interpretation of the final results (Figure 2). For 41 samples (1.8%), a final diagnosis could not be confidently assigned because we could not resolve inconsistencies in the predicted diagnoses (Supplementary Figure 6).
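The decision rule described above can be summarized in a short Python sketch. It is one possible reading of the text, not the authors' code: the classifier keys ("rf", "gbm", "tsne", "euclid") are hypothetical, the compatibility of a diagnosis with histopathology is assumed to be supplied as a set, and the 49 tie cases are interpreted here as 2-versus-2 splits in which both diagnoses are supported.

```python
from collections import Counter

def final_diagnosis(predictions, histopathology_compatible):
    """Combine four classifier predictions into one diagnosis, or None.

    predictions: dict mapping classifier name -> predicted diagnosis
    histopathology_compatible: set of diagnoses consistent with the pathology report
    """
    votes = Counter(predictions.values())
    top_label, top_votes = votes.most_common(1)[0]
    if top_votes >= 3:
        # Same diagnosis from 3 or all 4 classifiers: accepted as consistent.
        return top_label
    # Diagnoses predicted by exactly 2 classifiers and supported by histopathology.
    supported = [d for d, v in votes.items()
                 if v == 2 and d in histopathology_compatible]
    if len(supported) == 1:
        return supported[0]
    if len(supported) == 2:
        # Both candidates are supported: defer to the t-SNE-based classifier.
        return predictions["tsne"]
    return None  # unresolved

unanimous = final_diagnosis(
    {"rf": "MB", "gbm": "MB", "tsne": "MB", "euclid": "MB"}, set())
two_supported = final_diagnosis(
    {"rf": "PA", "gbm": "PA", "tsne": "GG", "euclid": "PXA"}, {"PA"})
tie = final_diagnosis(
    {"rf": "PA", "gbm": "GG", "tsne": "GG", "euclid": "PA"}, {"PA", "GG"})
unresolved = final_diagnosis(
    {"rf": "PA", "gbm": "PA", "tsne": "GG", "euclid": "PXA"}, {"MB"})
```

The reported counts are internally consistent with this rule: 1,546 + 422 = 1,968 consistent predictions, 1,968 + 216 = 2,184 reclassified samples, and 2,225 − 2,184 = 41 unresolved samples.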

Figure 2:

t-SNE plot of 7,334 samples, including 5,150 training and 2,184 reclassified samples, each represented as a colored dot labeled by diagnostic category.

Representation of all samples (5,150 samples in the training dataset and 2,184 samples with an uncertain diagnosis that are reclassified) in the t-SNE dimensionality reduction performed on the full dataset. Individual samples (dots) are color-coded and labelled according to the diagnosis listed in the side legend. Full names of the 54 diagnostic entities are provided in Supplementary Table 1. Of note, we chose to label a specific type of supratentorial ependymomas as “RELA” instead of the most recent nomenclature “ZFTA fusion-positive” because there are non-RELA, ZFTA-fused ependymomas21 and samples used in the training dataset were identified as RELA-fusion positive.

Understanding how machine learning algorithms predict diagnoses can be highly beneficial. For instance, recent efforts to decipher how the random forest-based classifier,3 trained on bulk genome-wide DNA methylation profiles, predicts brain tumor diagnoses have provided valuable biological insights.22,23 Analyzing the different diagnoses frequently predicted for a given sample by the classifiers may reveal nuanced biological relationships between these diagnoses. Therefore, we examined the inconsistent predictions of diagnosis by 4 classifiers across 679 samples. Each sample received 4 predictions, resulting in 6 pairs of predictions per sample, totaling 4,074 pairs among the 679 samples. After excluding pairs with the same predicted diagnosis, 2,436 inconsistent pairs remained, representing 181 unordered unique pairs where one classifier predicted one diagnosis and another classifier predicted a different diagnosis. The frequency with which a diagnosis is predicted by a classifier can depend on the number of unique samples with that diagnosis used to train the classifier. Synthetically balancing the number of samples per diagnosis may not completely overcome this limitation. Therefore, we normalized the rate of a diagnosis predicted in the inconsistent pairs with the proportion of samples with that diagnosis in the training dataset. Figure 3A displays the normalized values, indicating how frequently a given diagnosis was predicted by a classifier beyond what would be expected based on its incidence in the training dataset, particularly when there is uncertainty in the predicted diagnosis. For example, diagnoses such as ganglioglioma, desmoplastic infantile ganglioglioma, and pleomorphic xanthoastrocytoma were predicted more frequently than expected under uncertain conditions. When one classifier predicted these diagnoses, Figure 3B shows the other diagnoses predicted by another classifier (ie the inconsistent pairs) for these same diagnoses, chosen as examples. It reveals that samples classified as ganglioglioma, desmoplastic infantile ganglioglioma, and pleomorphic xanthoastrocytoma by one classifier were most likely to be classified as pilocytic astrocytoma or IDH and H3 wildtype diffuse glioma by another. Similarly, samples classified as pleomorphic xanthoastrocytoma by one classifier were most likely to be classified as wildtype diffuse glioma or meningioma by another. Pleomorphic xanthoastrocytoma can mimic a meningioma.24,25
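The pair counting and normalization can be sketched in Python. The pair arithmetic follows the text directly (4 predictions give 6 pairs per sample). The normalization is written here as a simple ratio of a diagnosis's share among inconsistent pairs to its share of the training dataset, which is an assumed reading of the description; function names, abbreviations, and toy counts are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import comb

def inconsistent_pairs(predictions):
    """Unordered pairs of differing diagnoses among one sample's predictions."""
    return [frozenset(pair) for pair in combinations(predictions, 2)
            if pair[0] != pair[1]]

def normalized_frequency(pairs, training_counts):
    """Share of each diagnosis among inconsistent pairs divided by its
    share of the training dataset (assumed form of the normalization)."""
    appearances = Counter(d for pair in pairs for d in pair)
    total_appearances = sum(appearances.values())
    total_training = sum(training_counts.values())
    return {
        d: (count / total_appearances) / (training_counts[d] / total_training)
        for d, count in appearances.items()
    }

pairs_per_sample = comb(4, 2)         # 6
total_pairs = 679 * pairs_per_sample  # 4074

# Toy sample: two classifiers agree on GG, the other two differ.
toy_pairs = inconsistent_pairs(["GG", "GG", "PA", "PXA"])
toy_ratio = normalized_frequency(toy_pairs, {"GG": 10, "PA": 60, "PXA": 30})
```

In the toy case, 5 of the 6 pairs are inconsistent and GG has a ratio of about 4, meaning it appears in inconsistent pairs roughly four times more often than its training share would suggest.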

Figure 3:

(A) Bar plot showing frequency of diagnosis predictions in 679 discordant cases, normalized to training set proportions. (B) Chord diagram illustrating divergent predictions for samples initially classified as ganglioglioma, DIG, or PXA, with arrows highlighting notable classification overlaps.

In the 679 samples with nonunanimous predictions among the classifiers, (A) the frequency of each diagnosis prediction is normalized to its proportion in the “core” dataset used for training the classifiers. (B) When one classifier predicted a sample as ganglioglioma, desmoplastic infantile ganglioglioma, or pleomorphic xanthoastrocytoma (below the dashed line), this chord diagram illustrates the other diagnoses predicted by another classifier (above the dashed line). Specific relationships are highlighted with a black border and an arrow. Full names of the diagnostic entities are provided in Supplementary Table 1.

Last, to increase the utility of this atlas, we manually curated and harmonized the publicly available clinical metadata associated with the samples. Our atlas is broadly representative across geographical regions (Figure 4A), spans a wide age spectrum (from fetus to a maximum age of 106 years; Figure 4B), and covers various anatomical locations (Figure 4C). When available, we also extracted relevant genetic information associated with the samples (Figure 4D) and overall survival (Figure 4E-N) of patients from whom the samples were collected. These and other data are tabulated in Supplementary Table 1.

Figure 4:

(A–D) Bar plots showing sample distribution by country, age, anatomic location, and availability of genetic data in 7,375 tumor samples. (E–N) Kaplan-Meier curves showing survival data associated with samples of different diagnoses.

Manually curated metadata and clinical data associated with the 7,375 samples in the atlas generated. The number of samples per (A) country listed in the contact information of the deposited raw data, (B) age group, and (C) anatomic location. The sex of the samples is color-coded (pink/blue or gray if not available) in the latter 2 plots. (D) The number of samples with associated genetic information. (E-N) Overall survival, visualized using Kaplan-Meier curves, based on the final diagnosis. E-G survival curves are of all types of diffuse gliomas according to different subgroups. Tick marks on the curves represent censored values. Full names of the diagnostic entities are provided in Supplementary Table 1. Abbreviations: amplif = amplification; codel = co-deletion; meth = methylation; mut = mutation; NA = not available; NOS = not otherwise specified; pTERT/pMGMT = promoter of the respective gene; yrs = years.

Example Use: Insights into Transcriptional Diversity in Pheochromocytoma and Paraganglioma

Despite being relatively well-characterized biologically, pheochromocytomas and paragangliomas (PH/PG) are transcriptionally underexplored relative to tumors with larger sample sizes in the atlas. Our literature search (see Supplementary Methods) identified only 3 studies profiling ≥100 tumors on a single platform, with sample sizes of 144,26 187,27 and 188.28 The TCGA study (n = 187) proposed 4 transcriptional subtypes, and another study (n = 188) proposed 3; the third adopted TCGA’s framework. None incorporated cross-platform validation.

Our harmonized expression atlas, comprising 240 PH/PG tumors profiled on a uniform platform—the largest such dataset to date—enabled refinement of PH/PG transcriptional subtypes, which we independently confirmed through cross-platform validation. t-SNE embedding of batch-corrected data revealed discrete clusters (Figure 5A), and DBSCAN identified 6 robust clusters. Differential expression analysis defined subtype-specific genes (Figure 5B), which successfully reproduced the 6-subtype structure in TCGA-PCPG RNA-seq data via hierarchical clustering (Figure 5C). A t-SNE embedding of TCGA samples, colored by cluster assignment, further confirmed consistency of clustering between datasets (Figure 5D).

Figure 5:

(A, D) t-SNE plots of transcriptional subtypes in PH/PG tumors from atlas and TCGA datasets. (B, C) Heatmaps showing expression of top subtype-specific genes, cluster membership, and genetic alterations. (E) Word clouds of enriched GO biological processes per subtype. (F) Dot plot of shared hub transcription factors with kME values indicating network connectivity. (G–I) Box plots showing anatomical site, metastatic potential, and age at diagnosis across subtypes.

Transcriptional subtypes of pheochromocytoma and paraganglioma (PH/PG).

(A) t-SNE embedding of gene expression profiles from 240 PH/PG tumors in the atlas. Color of the dots (samples) represents transcriptional subtype. (B) Heatmap of scaled expression for top 30 differentially expressed genes (rows) across the samples (columns) clustered into the 6 subtypes in the atlas. Top 2 rows show cluster membership and genetic mutations. Names on the left designate the subtype cluster. (C) Similar heatmap generated for TCGA samples, and ten or eleven representative genes upregulated in each subtype are shown on the right. Pseudo RET, pseudo VHL, and pseudo SDHx represent designations used in the original study analyzing the samples in the atlas. SDHx represents SDH gene family (e.g., SDHA, SDHB, SDHC, SDHD). Fus. = fusion. (D) t-SNE embedding of TCGA-PCPG samples, based on the most variable genes (~20,000). Color of the dots (samples) represents transcriptional subtype. (E) Word clouds summarizing dominant Gene Ontology Biological Processes enriched in the top upregulated genes in each transcriptional subtype. (F) Shared hub transcription factors identified through weighted gene co-expression network analysis performed independently on atlas and TCGA samples. kME represents the Pearson correlation between the expression of the transcription factor across samples and the first principal component of the expression matrix of genes in a module. Higher absolute values of kME (dot size) represent greater interconnectivity between the transcription factor and genes in a module. Color of the kME values in positive or negative direction represent direct or indirect interconnectivity. (G) Distribution of tumor anatomical locations (pheochromocytoma vs paraganglioma), (H) tumor aggressiveness and metastatic rates, and (I) age at diagnosis across subtypes, showing median and interquartile range, using TCGA clinical data.

Word clouds of Gene Ontology Biological Processes enriched in each subtype (Figure 5E) using differentially expressed genes in our atlas guided our subtype annotation as Neuronal, Vascular, Metabolic, Steroidal, Developmental, and Indeterminate. Although the latter did not show enrichment in any specific transcriptional signature, it was enriched in CSDE1 mutations (adjusted P-value = 0.01).

The Neuronal subtype expresses synaptic scaffold proteins (eg SHANK2), neurotransmission regulators (eg NRGN, GABRG2), and neuroendocrine markers (eg PNMT, FEV, PPP1R1B), defining a neuronal signaling phenotype. Genes involved in synaptic adhesion and vesicle trafficking (eg NCAM2, ATP8A2, RPH3A) and the neural transcriptional regulator SALL4 further reinforce this neuronal identity. This subtype was enriched for RAS/MAPK pathway mutations (NF1, RET, BRAF, FGFR1, HRAS; adjusted P-values 0.2, 0.09, >0.99, >0.99, and 0.03, respectively; adjusted P-value < 0.001 for all combined as one 'kinase' group; Figure 5B and C).

The Vascular subtype upregulates angiogenic regulators (eg EGFL7, PTPRB) that drive vascular remodeling as well as hypoxia-inducible genes (eg EGLN3, DDIT4L, DEPP1, MIR210HG). It highly expresses genes involved in vascular permeability (eg AQP1) and glycolytic metabolism (eg HK2, NDRG1), reflecting vascular adaptation in an oxygen-deprived microenvironment, likely regulated by TFAP2C, a highly expressed transcription factor in this subtype. Mutations in VHL were specific to this subtype (adjusted P-value < 0.001). EPAS1 (encoding HIF-2α) mutations were enriched in both Vascular and Indeterminate subtypes (adjusted P-value = 0.003).

The Metabolic subtype shows high expression of neuronal transporters (eg SLC6A15, SLC7A11) and metabolic enzymes involved in one-carbon and nitrogen metabolism (eg MTHFD2, SHMT2, ARG2, CPS1). Genes related to synaptic adhesion and nutrient adaptation (eg SLITRK1, NXPH4, CNIH3, STXBP6) also support this metabolically active profile. Mutations in components of the succinate dehydrogenase complex (eg SDHA, SDHB, SDHD, collectively designated as SDHx) and in the TGDS gene, encoding TDP-Glucose 4,6-Dehydratase, are specific to the Metabolic subtype (adjusted P-value < 0.001 and 0.04, respectively).

The Steroidal subtype shows high expression of enzymes essential for steroid biosynthesis (eg CYP11A1, FDX1, FDXR, ALAS1, SULT2A1) and cholesterol storage (eg SOAT1). High expression of hormone receptors and steroid-related transporters (eg MC2R, NR0B1, NR1H4, SLC51A) is consistent with a steroidal transcriptional profile. Occasionally, samples with a kinase mutation clustered with Steroidal subtype samples due to the strong expression of these genes.

The Developmental subtype shows high expression of transcription factors involved in neural patterning (eg IRX4, GLI2, NTRK3, MDGA1, WNT4) and axon guidance (eg EPHB1). High expression of endocrine maturation markers (eg MAMLD1) reflects a developmental and neuroendocrine transitional profile. Notably, it has high expression of GHR (growth hormone receptor), SST (somatostatin), and AR (androgen receptor). MAML3 gene fusions were specific to this subtype (adjusted P-value < 0.001).

In our Atlas and TCGA, weighted gene co-expression network analyses identified shared “hub” transcription factors—highly connected regulators within subtype-specific co-expression networks that could explain transcriptional subtype identity (Figure 5F): RXRG, AR, and GLI2 (Developmental); TWIST1, NR1I2, NR3C2, CREB5, SATB1, STAT4, and SALL4 (Neuronal); ZEB1, HEY1, NFIC, FOXL2, FOXO4, HOXA5, NR5A1, NR0B1, and NR1H4 (Steroidal); and SOX11 and TFAP2C (Vascular).
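The kME values in Figure 5F are defined as the Pearson correlation between a transcription factor's expression and the first principal component of its module's expression matrix. A minimal sketch of that calculation follows, using plain Python and power iteration rather than the WGCNA package; the function names, the sign convention, and the toy expression values are assumptions made for illustration.

```python
from math import sqrt


def standardize(values):
    """Center to mean 0 and scale to unit (population) standard deviation."""
    n = len(values)
    mu = sum(values) / n
    sd = sqrt(sum((v - mu) ** 2 for v in values) / n)
    return [(v - mu) / sd for v in values]  # assumes the profile is not constant


def pearson(x, y):
    """Pearson correlation between two equal-length profiles."""
    zx, zy = standardize(x), standardize(y)
    return sum(a * b for a, b in zip(zx, zy)) / len(x)


def module_eigengene(module, n_iter=200):
    """First principal component (one score per sample) of a genes x samples module."""
    z = [standardize(gene) for gene in module]
    n_samples = len(z[0])
    # Start from a non-constant vector: a constant one is orthogonal to centered rows.
    v = z[0][:]
    for _ in range(n_iter):
        loadings = [sum(g[s] * v[s] for s in range(n_samples)) for g in z]
        v = [sum(w * g[s] for w, g in zip(loadings, z)) for s in range(n_samples)]
        norm = sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
    # Assumed sign convention: orient the eigengene along mean module expression.
    mean_profile = [sum(g[s] for g in z) / len(z) for s in range(n_samples)]
    if sum(a * b for a, b in zip(v, mean_profile)) < 0:
        v = [-x for x in v]
    return v


def kme(tf_expression, module):
    """Correlation of one transcription factor with the module eigengene."""
    return pearson(tf_expression, module_eigengene(module))


# Toy module: three genes (rows) measured in five samples (columns).
module = [
    [2.0, 4.0, 6.0, 8.0, 10.0],
    [1.0, 2.1, 2.9, 4.2, 5.0],
    [3.0, 3.9, 5.2, 6.1, 7.0],
]
activator = [0.5, 1.0, 1.4, 2.1, 2.4]   # hypothetical TF tracking the module
repressor = [9.0, 7.5, 6.1, 4.0, 2.2]   # hypothetical TF opposing the module
print(round(kme(activator, module), 3), round(kme(repressor, module), 3))
```

In this toy example the first transcription factor yields a kME near +1 and the second a kME near -1, which corresponds to what the Figure 5 legend calls direct and indirect interconnectivity.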

Clinically, the Metabolic subtype was enriched for paragangliomas, while Steroidal and Developmental subtypes were exclusive to pheochromocytomas (Figure 5G). Developmental tumors were more likely to exhibit aggressive or metastatic behavior, whereas no Steroidal tumors were clinically aggressive (Figure 5H). Age at diagnosis differed significantly, with Metabolic and Vascular subtypes presenting earlier, and Developmental and Indeterminate subtypes later (Figure 5I). There were no differences in sex among the subtypes.

Discussion

This study harmonized publicly available data to create a large-scale, clinically annotated transcriptomic atlas comprising 5,402 samples of nervous system neoplasms and 1,973 nonneoplastic nervous system samples. While there have been efforts focused on subtypes of a single diagnosis (eg medulloblastoma29), the atlas we generated is high-dimensional and nearly comprehensive in its coverage of various nervous system neoplasms. Machine learning tools used for visualization reveal that the transcriptomic signatures of neoplastic and nonneoplastic samples are diagnostically unique. We leveraged this uniqueness to reclassify samples with unknown, obsolete, or questionable diagnoses. This work demonstrates that the diagnostic distinctiveness of bulk DNA methylation signatures also extends to gene expression across a broad range of nervous system neoplasms and age groups. Additionally, the neuro-oncology community can use this atlas for comparative gene expression analysis among nervous system neoplasms, such as elucidating differences between rare and common neoplasms.

There are a few noteworthy strengths of this large-scale, harmonized, high-dimensional atlas. The first is best illustrated using a dimensionally reduced plot with the unsupervised t-SNE algorithm. It is crucial to note that the linear distance between distant clusters on the t-SNE plot does not necessarily correlate with biological differences. For instance, it is incorrect to conclude that retinoblastoma is more biologically different from neuroblastoma than medulloblastoma based solely on the linear distance between these clusters on the t-SNE plot. However, the linear distance is meaningful over closer distances or within a cluster, ie the strength of its correlation with biological difference is greater. With this understanding, the proximity of clusters revealing known biological relationships makes the atlas reliable for comparative gene expression analyses. Several relationships between nonneoplastic and neoplastic entities are highlighted: nonneoplastic supratentorial white matter and gliomas, choroid plexus and choroid plexus papillomas, pituitary and pituitary neuroendocrine tumors, peripheral nerve and neurofibromas, and retina and retinoblastomas. Nonneoplastic fetal tissues are closely associated with their primitive neuroectodermal neoplasms, such as fetal retina and retinoblastoma and fetal cerebellum and medulloblastoma (the SHH subgroup). The biological similarity of the following pairs of neoplasms is also reflected by the proximity of their clusters: meningiomas and solitary fibrous tumors, subgroups of medulloblastomas, pituitary neuroendocrine tumors, and peripheral neuroendocrine tumors such as PH/PG, and neuroblastoma and ganglioneuroma. Recently, a close biological relationship between high-grade neuroepithelial tumors with MN1 alteration and ependymal layer tumors was demonstrated,30 and this is also evident in the t-SNE plot. The maintenance and revelation of these expected biological relationships after harmonized integration of samples, despite their diverse geographic origins, acquisition times, and preparation/collection methods, is a notable strength of our methods and the generated atlas.

Second, the large sample size of rare and underexplored tumors in our atlas enables the generation of new biological insights, exemplified by our analysis of PH/PGs. The 2022 World Health Organization (WHO) classification recognizes pheochromocytomas as adrenal paragangliomas,31 and our transcriptomic atlas confirms this by demonstrating a single large cluster of PH/PGs. The TCGA study27 identified key genetic drivers that the WHO recognized as clinically relevant; however, substantial gaps remained in fully linking genetic alterations to transcriptional profiles and clinical behavior. For example, the TCGA “Wnt-altered” subtype, primarily composed of MAML3 fusion- and CSDE1 mutation-positive tumors, lacked strong evidence for Wnt pathway activation at the transcriptional level. In contrast, the subtypes revealed using our atlas, which contains the largest uniformly processed cohort of PH/PG samples to date, are robust, validated using independent TCGA data, and more thoroughly explain the relationships among genetic drivers, transcriptional programs, and clinical phenotypes. Our findings provide strong support for selecting only MAML3-fusion tumors for somatostatin receptor-based imaging,32 and uncover potential clinically actionable targets, including growth hormone and androgen receptors, which are highly expressed in the most clinically aggressive Developmental subtype. Furthermore, we identify novel subtype-specific transcriptional regulators, offering additional mechanistic insights that could inform future biological investigations and therapeutic development. A few of them (such as TWIST1 and ZEB1) are regulators of epithelial-mesenchymal transition.

Continuing to dissect extracranial nervous system tumors using our atlas, we also identified an important relationship among ganglioneuromas, ganglioneuroblastomas (GNB), and neuroblastomas. GNBs are rare tumors composed of both immature neuroblasts and mature ganglion cells and have been clinically recognized as having malignant potential intermediate between ganglioneuromas and neuroblastomas, with the capacity to transform into a neuroblastoma.33–36 In our atlas, all GNB samples (n = 19) transcriptionally clustered with either the ganglioneuroma or neuroblastoma groups; none formed a distinct or intermediate cluster. This finding suggests that GNB diagnoses may reflect spatial heterogeneity or sampling bias during histological assessment, rather than representing a biological entity distinct or intermediate from ganglioneuromas or neuroblastomas. Further investigations are necessary to confirm this hypothesis.

The atlas has a few notable limitations. First, the high accuracies of our classifiers may reflect an over-fitting phenomenon. Although we jointly used them to increase the precision of predictions, the predicted diagnoses were not consistent among the classifiers in 1.8% of the samples. The classifiers inconsistently predicted many discrete gliomas, especially ganglioglioma and desmoplastic infantile ganglioglioma, which were also likely to be predicted as pilocytic astrocytoma. In the original work that developed the DNA methylation-based classifier of central nervous system tumors, there were supratentorial samples designated as pilocytic astrocytoma/ganglioglioma (labeled PA/GG ST), which were acknowledged as a “combined,” nonequivalent diagnosis with respect to the World Health Organization classification scheme.3 In light of these data from the prior methylation-focused work,3 our gene expression-based work supports further investigations into the biological differences between pilocytic astrocytomas and gangliogliomas to refine their classification. For instance, in one case series of adult histologically diagnosed pilocytic astrocytomas (n = 79), 61% of the samples did not match the pilocytic astrocytoma methylation class, prompting the authors to call for refinement of the diagnostic criteria, too.37 Second, despite this work, we believe that a machine learning classifier using transcriptomic data generated with a microarray transcriptomic platform (such as GPL570) to diagnose nervous system neoplasms is far from being ready for clinical use. Most diagnostic tissue in clinical practice is formalin-fixed paraffin-embedded, and our atlas contains only 6 tumor samples that were formalin-fixed paraffin-embedded. Currently, the practical ease of retrieving and handling DNA makes a DNA methylation-based classifier using a microarray DNA methylation platform more suitable for clinical use.
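The first limitation hinges on how often the classifiers disagree. As an illustrative sketch only, assuming a unanimity rule (this section does not spell out how the classifiers were combined) and using hypothetical classifier names, sample IDs, and predictions, the disagreement rate could be tallied as follows:

```python
def consensus_diagnosis(predictions):
    """Return the shared label when every classifier agrees, otherwise None."""
    labels = set(predictions.values())
    return next(iter(labels)) if len(labels) == 1 else None


def disagreement_rate(samples):
    """Fraction of samples lacking a unanimous call, plus the IDs of those samples."""
    flagged = [
        sample_id
        for sample_id, predictions in samples.items()
        if consensus_diagnosis(predictions) is None
    ]
    return len(flagged) / len(samples), flagged


# Hypothetical predictions from three hypothetical classifiers (clf_a, clf_b, clf_c).
samples = {
    "S1": {"clf_a": "pilocytic astrocytoma", "clf_b": "pilocytic astrocytoma",
           "clf_c": "pilocytic astrocytoma"},
    "S2": {"clf_a": "ganglioglioma", "clf_b": "pilocytic astrocytoma",
           "clf_c": "ganglioglioma"},
    "S3": {"clf_a": "medulloblastoma", "clf_b": "medulloblastoma",
           "clf_c": "medulloblastoma"},
    "S4": {"clf_a": "meningioma", "clf_b": "meningioma",
           "clf_c": "meningioma"},
}

rate, flagged = disagreement_rate(samples)
print(f"{rate:.1%} of samples lack a unanimous diagnosis: {flagged}")
```

Requiring unanimity trades coverage for precision: samples such as the hypothetical S2 are left without a call and can be routed to manual review instead of receiving a majority-vote label.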

Conclusion

We created a transcriptomic atlas through harmonized integration of 5,402 neoplastic and 1,973 nonneoplastic publicly available samples of the nervous system. This atlas covers a wide range of diagnoses, including rarely studied entities, and includes clinical data such as age, sex, tumor location, genetic information, and survival rates for many samples, enhancing its utility. Reclassifying many samples according to the latest classification increases the value of older samples and boosts the statistical power of comparative gene expression analyses. By using publicly available data, we incorporated samples from around the world, broadening the atlas’ ethnic representation. Finally, our methodological workflow can be used to integrate and harmonize existing raw data of other rare diagnoses and conditions generated on GPL570, increasing their utility and informing future research.

Supplementary Material

noaf130_Supplementary_Figure_S1-S6
noaf130_Supplementary_Table_S1
noaf130_Supplementary_Table_S2

Acknowledgements

None.

Contributor Information

Chi H Le, School of Medicine, Vanderbilt University, Nashville, Tennessee.

Ajai K Nelson, Oberlin College, Oberlin, Ohio.

Adam J Berrones, Division of Prevention and Quality Improvement, Kentucky Department for Public Health, Frankfort, Kentucky.

Janki R Naidugari, School of Medicine, University of Louisville, Louisville, Kentucky.

Robert P Naftel, Department of Neurological Surgery, Vanderbilt University Medical Center, Nashville, Tennessee.

Eyas M Hattab, Department of Pathology and Laboratory Medicine, University of Louisville, Louisville, Kentucky.

Brian J Williams, Department of Neurological Surgery, University of Louisville, Louisville, Kentucky.

Akshitkumar M Mistry, Department of Neurological Surgery, University of Louisville, Louisville, Kentucky.

Author contributions

Conception: A.M.M.

Design of the work: A.M.M., C.L.

Acquisition of data: A.M.M., J.R.N.

Data analysis: A.M.M., C.L., A.K.N.

Data presentation: A.M.M., A.J.B.

Data interpretation: A.M.M., C.L., A.K.N., E.M.H., R.P.N., B.J.W., A.J.B.

Drafting the work: A.M.M.

Reviewing it critically for important intellectual content: A.M.M., C.L., J.R.N., A.K.N., E.M.H., R.P.N., B.J.W., A.J.B.

Final approval of the version to be published: A.M.M., C.L., J.R.N., A.K.N., E.M.H., R.P.N., B.J.W., A.J.B.

Agreement to be accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved: A.M.M.

Conflict of interest statement

A.M.M. consults for Neosoma, Inc., and received honorarium from GT Medical Technologies, Inc.

Funding

Kentucky Pediatric Cancer Research Trust Fund (to A.M.M.); Vanderbilt Institute for Clinical and Translational Research (VR52534 to A.M.M.); National Institutes of Health/National Institute of General Medical Sciences (P20GM155899 to A.M.M.’s institution); and the University of Louisville Presidential Scholar Award (to A.M.M.).

Data availability

Raw data (.cel files) are available from the primary sources listed in Supplementary Table 1. The data can be accessed at http://kdph.shinyapps.io/atlas/. Final processed transcriptomic data, in the format of an .rds file that can be loaded directly into R, can be downloaded from the link on the GitHub page https://github.com/axitamm/BrainTumorAtlas. The code that reproduces the atlas from the raw data, generates the classifiers using the “core” dataset, uses the classifiers to diagnose samples in the “test” dataset, and makes the figures in this manuscript is also available from the GitHub page.

References

  • 1. Dragomir MP, Calina TG, Perez E, et al. DNA methylation-based classifier differentiates intrahepatic pancreato-biliary tumours. EBioMedicine. 2023;93:104657.
  • 2. Jurmeister P, Gloss S, Roller R, et al. DNA methylation-based classification of sinonasal tumors. Nat Commun. 2022;13(1):7148.
  • 3. Capper D, Jones DTW, Sill M, et al. DNA methylation-based classification of central nervous system tumours. Nature. 2018;555(7697):469–474.
  • 4. Technical Note: Design and Performance of the GeneChip® Human Genome U133 Plus 2.0 and Human Genome U133A 2.0 Arrays. 2003; https://assets.thermofisher.com/TFS-Assets%2FLSG%2Fbrochures%2Fhgu133_p2_technote.pdf. Accessed July 31, 2024.
  • 5. Lakiotaki K, Vorniotakis N, Tsagris M, Georgakopoulos G, Tsamardinos I. BioDataome: a collection of uniformly preprocessed and automatically annotated datasets for data-driven biology. Database (Oxford). 2018;2018:bay011.
  • 6. Guo Y, Sheng Q, Li J, et al. Large scale comparison of gene expression levels by microarrays and RNAseq using TCGA data. PLoS One. 2013;8(8):e71462.
  • 7. Schmidberger M, Vicedo E, Mansmann U. affyPara-a bioconductor package for parallelized preprocessing algorithms of affymetrix microarray data. Bioinform Biol Insights. 2009;3:83–87.
  • 8. Dai M, Wang P, Boyd AD, et al. Evolving gene/transcript definitions significantly alter the interpretation of GeneChip data. Nucleic Acids Res. 2005;33(20):e175.
  • 9. Irizarry RA, Hobbs B, Collin F, et al. Exploration, normalization, and summaries of high density oligonucleotide array probe level data. Biostatistics. 2003;4(2):249–264.
  • 10. Miller JA, Cai C, Langfelder P, et al. Strategies for aggregating gene expression data: the collapseRows R function. BMC Bioinf. 2011;12:322.
  • 11. Linderman GC, Rachh M, Hoskins JG, Steinerberger S, Kluger Y. Fast interpolation-based t-SNE for improved visualization of single-cell RNA-seq data. Nat Methods. 2019;16(3):243–245.
  • 12. Kobak D, Berens P. The art of using t-SNE for single-cell transcriptomics. Nat Commun. 2019;10(1):5416.
  • 13. Hahsler M, Piekenbrock M, Doran D. dbscan: fast density-based clustering with R. J Stat Softw. 2019;91(1):1–30.
  • 14. Johnson WE, Li C, Rabinovic A. Adjusting batch effects in microarray expression data using empirical Bayes methods. Biostatistics. 2007;8(1):118–127.
  • 15. Leek JT, Johnson WE, Parker HS, Jaffe AE, Storey JD. The sva package for removing batch effects and other unwanted variation in high-throughput experiments. Bioinformatics. 2012;28(6):882–883.
  • 16. Chawla NV, Bowyer KW, Hall LO, Kegelmeyer WP. SMOTE: synthetic minority over-sampling technique. J Artif Intell Res. 2002;16:321–357.
  • 17. Wright MN, Ziegler A. A fast implementation of random forests for high dimensional data in C++ and R. J Stat Softw. 2017;77(1):1–17.
  • 18. Kuhn M. Building predictive models in R using the caret package. J Stat Softw. 2008;28(5):1–26.
  • 19. Ke G, Meng Q, Finley T, et al. Lightgbm: a highly efficient gradient boosting decision tree. Adv Neural Inf Process Syst. 2017;30.
  • 20. Maros ME, Capper D, Jones DTW, et al. Machine learning workflows to estimate class probabilities for precision cancer diagnostics on DNA methylation microarray data. Nat Protoc. 2020;15(2):479–512.
  • 21. Tauziede-Espariat A, Siegfried A, Nicaise Y, et al.; RENOCLIP-LOC, the BIOMECA (Biomarkers for Ependymomas in Children, Adolescents) Consortium. Supratentorial non-RELA, ZFTA-fused ependymomas: a comprehensive phenotype genotype correlation highlighting the number of zinc fingers in ZFTA-NCOA1/2 fusions. Acta Neuropathol Commun. 2021;9(1):135.
  • 22. Benfatto S, Sill M, Jones DTW, et al. Explainable artificial intelligence of DNA methylation-based brain tumor diagnostics. Nat Commun. 2025;16(1):1787.
  • 23. Benfatto S, Hovestadt V. shinyMNP. 2023; https://hovestadtlab.shinyapps.io/shinyMNP/. Accessed August 1, 2024.
  • 24. Pierallini A, Bonamini M, Di Stefano D, Siciliano P, Bozzao L. Pleomorphic xanthoastrocytoma with CT and MRI appearance of meningioma. Neuroradiology. 2002;41(1):30–34.
  • 25. Usubalieva A, Pierson CR, Kavran CA, et al. Primary meningeal pleomorphic xanthoastrocytoma with anaplastic features: A report of 2 cases, one with BRAFV600E mutation and clinical response to the BRAF inhibitor dabrafenib. J Neuropathol Exp Neurol. 2015;74(10):960–969.
  • 26. Calsina B, Pineiro-Yanez E, Martinez-Montes AM, et al. Genomic and immune landscape of metastatic pheochromocytoma and paraganglioma. Nat Commun. 2023;14(1):1122.
  • 27. Fishbein L, Leshchiner I, Walter V, et al.; Cancer Genome Atlas Research Network. Comprehensive molecular characterization of pheochromocytoma and paraganglioma. Cancer Cell. 2017;31(2):181–193.
  • 28. Loriot C, Burnichon N, Gadessaud N, et al. Epithelial to mesenchymal transition is activated in metastatic pheochromocytomas and paragangliomas caused by SDHB gene mutations. J Clin Endocrinol Metab. 2012;97(6):E954–E962.
  • 29. Rathi KS, Arif S, Koptyra M, et al. A transcriptome-based classifier to determine molecular subtypes in medulloblastoma. PLoS Comput Biol. 2020;16(10):e1008263.
  • 30. Lehman NL, Spassky N, Sak M, et al. Astroblastomas exhibit radial glia stem cell lineages and differential expression of imprinted and X-inactivation escape genes. Nat Commun. 2022;13(1):2083.
  • 31. Mete O, Asa SL, Gill AJ, et al. Overview of the 2022 WHO classification of paragangliomas and pheochromocytomas. Endocr Pathol. 2022;33(1):90–114.
  • 32. Han S, Suh CH, Woo S, Kim YJ, Lee JJ. Performance of (68)Ga-DOTA-conjugated somatostatin receptor-targeting peptide PET in detection of pheochromocytoma and paraganglioma: a systematic review and metaanalysis. J Nucl Med. 2019;60(3):369–376.
  • 33. Li W, Ou Z, Wu Z, et al. Development and validation of a prognostic nomogram for patients with ganglioneuroblastoma: a SEER-based study. Heliyon. 2024;10(9):e30891.
  • 34. He WG, Yan Y, Tang W, Cai R, Ren G. Clinical and biological features of neuroblastic tumors: a comparison of neuroblastoma and ganglioneuroblastoma. Oncotarget. 2017;8(23):37730–37739.
  • 35. Decarolis B, Simon T, Krug B, et al. Treatment and outcome of ganglioneuroma and ganglioneuroblastoma intermixed. BMC Cancer. 2016;16:542.
  • 36. Burnand KM, Neville J, Budzanowski A, et al. Management of ganglioneuroma and ganglioneuroblastoma intermixed: a United Kingdom Children’s Cancer and Leukaemia Group (UK CCLG) nationwide study report. Pediatr Blood Cancer. 2025;72(2):e31445.
  • 37. Bode H, Kresbach C, Holdhof D, et al. Molecular refinement of pilocytic astrocytoma in adult patients. Neuropathol Appl Neurobiol. 2023;50(1):e12949.



Articles from Neuro-Oncology are provided here courtesy of Society for Neuro-Oncology and Oxford University Press
