Abstract
Purpose:
Accurate assessment of invasiveness in pediatric thyroid carcinomas is essential to prevent unnecessary surgery and avoid surgery-associated complications. DNA methylation, a proven molecular biomarker for cancer classification, holds promise for stratifying thyroid cancer risk. The objectives were to determine the epigenetic hallmarks of pediatric thyroid carcinomas and to investigate whether DNA methylome profiling is a feasible approach for pre-operative risk stratification of this pediatric disease.
Experimental Design:
We interrogated genome-wide DNA methylation profiles from two separately processed cohorts of pediatric thyroid carcinoma. The reference cohort included 100 samples: 87 well-differentiated primary tumors (77 papillary and 10 follicular thyroid carcinomas) and 13 matched lymph node metastases. To predict oncogenic drivers and tumor invasiveness, defined by the presence of nodal metastasis, we trained two classifiers on the reference cohort and then evaluated their performance on a second validation cohort of 84 samples, including 83 primary tumors and one lymph node metastasis.
Results:
We identified distinct methylation patterns associated with tumor invasiveness and key driver mutations, including BRAF p.V600E, RAS-like mutations, kinase fusions, and DICER1 mutations. The differentially methylated regions reflect inflammatory stress and disrupted thyroid development and function, implicating AR, Hippo, and AP-1 signaling. Leveraging these epigenetic signatures, we developed and validated two methylation-based classifiers that accurately predict tumor invasiveness and oncogenic mutation subgroups.
Conclusions:
In pediatric thyroid carcinoma patients, DNA methylation assays accurately predict tumor invasiveness and driver mutations. Our findings highlight the clinical value of DNA methylation profiling for risk stratification and classification of pediatric thyroid cancers.
Keywords: Thyroid carcinoma, Invasiveness, DNA Methylation, Epigenetics
INTRODUCTION
Thyroid carcinoma (TC) is the most common endocrine cancer in children, with increasing incidence over the past two decades (1,2). As in adults, papillary thyroid carcinoma (PTC) represents approximately 90% of all thyroid cancer cases, with follicular thyroid carcinoma (FTC) and medullary thyroid carcinoma accounting for the remaining 10% (3). Compared to adult cases, pediatric thyroid malignancies exhibit distinct clinical, pathological, and molecular features, often presenting with more advanced disease and higher recurrence rates (4). Extracapsular extension has been reported in up to 50% of children vs. 30% of adults, regional nodal involvement in up to 80% vs. 50%, and distant metastasis in up to 30% vs. 5% (5,6).
The preoperative diagnosis of pediatric thyroid tumors relies on thyroid and neck ultrasound as well as cytopathology; both tests are highly subjective, with broad inter- and intra-observer variability (7). Current guidelines recommend total thyroidectomy in nearly all pediatric patients with PTC because of the increased risk of multifocality and the associated risk of recurrence and persistent disease (8–10). Although total thyroidectomy lowers the risk of recurrence and persistent disease, it carries higher complication rates in children than in adults, with hypoparathyroidism reported in up to 15% of pediatric patients and recurrent laryngeal nerve injury in 6% (10,11). For tumors with low invasive potential, total thyroidectomy exposes patients to an unnecessary risk of surgical complications and a lifelong need for levothyroxine replacement. Moreover, the overall mortality rate of pediatric TC is low: incidence-based mortality rates ranged from 0 to 6.35 × 10⁻⁴ per 100,000 person-years and did not vary by tumor size or extent of disease (12).
DNA methylation, an essential epigenetic modification involving the addition of a methyl group to the 5-carbon of cytosine residues in CpG dinucleotides, plays a crucial role in regulating gene expression and maintaining genomic stability (13,14). Aberrant DNA methylation patterns in thyroid cancer, including global hypomethylation and site-specific hypermethylation, have been linked to various oncogenic processes (15). The exploration of DNA methylation in thyroid carcinoma has largely focused on changes in CpG island and genic methylation and their effects on the expression of specific genes (16). Notably, The Cancer Genome Atlas (TCGA) project in 2014 provided comprehensive methylation profiles for a large cohort of PTC cases using the Illumina Infinium HM450 array, suggesting a classification of PTC subtypes based on molecular features and identifying potential biomarkers that could inform disease management (17). Machine learning methods have classified thyroid cancers into subtypes based on their methylation profiles, and a prognostic classifier based on 21 methylation sites was developed to predict recurrence in well-differentiated thyroid cancers (18,19).
Disrupted mitogen-activated protein kinase (MAPK) signaling is a hallmark of thyroid cancers (1). BRAF mutations, particularly BRAF p.V600E, are the most prevalent genetic alterations in adult PTC (20), but they are less prevalent in adolescents and rare in prepubertal pediatric PTC (21). Their associations with the extent of metastasis and with radioiodine avidity are variable (22–24). RAS mutations are the second most prevalent alterations in adult thyroid tumors (25). While RAS mutations are less common in children, certain RAS-like mutations, as well as PTEN and DICER1 mutations, are more prevalent in pediatric tumors and are associated with a lower risk of metastasis (26–29). Kinase fusions (KF) involving RET, NTRK, ALK, or BRAF (30) are more common in pediatric patients and are associated with an increased rate of regional and distant metastasis in children and adolescents (21,26,27,31,32).
Despite extensive research on adult thyroid cancers, the potential of epigenome profiles for risk stratification in pediatric thyroid cancer, which exhibits distinct oncogenic signatures, remains underexplored, highlighting the need for classification approaches tailored to pediatric patients. To better understand the potential of DNA methylation for stratifying clinical risk in pediatric thyroid carcinoma, we conducted a methylome study of a cohort of pediatric malignant thyroid lesions and identified clinically significant DNA methylation biomarkers. These markers revealed known and novel oncogenic mechanisms that contribute to thyroid cancer invasiveness. We also developed a methylation-based classifier that stratifies the risk of invasive disease behavior to guide the extent of surgery in pediatric TC, and we validated the model on a second cohort.
MATERIALS AND METHODS
Cohort and clinical data
A total of 184 samples from 169 patients were studied, split into reference and validation cohorts (Fig. 1A). The reference cohort (n = 100) consisted entirely of fresh-frozen tissue samples used for model training and primary analysis: 87 primary tumors (77 PTCs, 10 FTCs) and 13 lymph node (LN) metastases, all paired with PTCs (Fig. 1A). The reference patient cohort consisted of 69 females (79%) and 18 males (21%), reflecting the higher prevalence of thyroid cancer in females. Patient ages ranged from 6.24 to 21.43 years (mean 15.3, standard deviation 2.80) (Supplementary Tables S1–S2). The overall sex and age distributions in our pediatric PTC cohort match the US distribution (Supplementary Table S3).
Figure 1: Molecular subgroups of pediatric thyroid carcinoma defined by invasiveness and driver mutations.

A, CONSORT diagram of this study, split into the reference cohort for primary analysis and model training, and the validation cohort. B–I, t-SNE dimensionality reduction of pediatric primary tumors (n = 87) in the reference cohort, color-coded by B, clinical invasiveness; C, sex; D, age at diagnosis; E–G, TNM stage; H, genetic driver mutation group; and I, estimated leukocyte fraction.
The validation cohort (n = 84) consisted of 83 primary tumors (78 PTCs, 5 FTCs) and 1 LN metastasis (FTC). DNA was extracted from these tissue samples in a different laboratory, and the samples were then processed following the same procedures as the reference cohort. Of these, 64 (76%) were fresh-frozen tissues and 20 (24%) were formalin-fixed paraffin-embedded (FFPE) (Supplementary Fig. S1A). The patient cohort consisted of 66 females (80%) and 17 males (20%). Patient ages ranged from 3.64 to 21.00 years (mean 15.3, standard deviation 3.12). Across both cohorts, 14 matched primary-LN metastasis pairs were available for analysis: the 13 PTC LN metastases from the reference cohort were matched to 10 primary tumors in the reference cohort and 2 in the validation cohort, with case 0212 having two matched LN metastases (one axial). The FTC LN metastasis from the validation cohort was matched to its primary tumor in the same cohort (Supplementary Table S1).
The study protocol was approved by the Institutional Review Board of Children’s Hospital of Philadelphia (IRB #20–018240), and informed consent was obtained from all patients included in the cohort in accordance with the Declaration of Helsinki. Tumor stages were categorized according to the eighth edition of the American Joint Committee on Cancer (AJCC) TNM staging manual. DNA methylation data were generated at the Children’s Hospital of Philadelphia (Philadelphia, PA) from frozen tissue and FFPE samples. Illumina Infinium HumanMethylationEPICv2 (EPICv2) BeadChip microarrays (33) (RRID:SCR_010233) were used for all thyroid samples. Previous methylation studies of thyroid carcinomas used the Infinium HumanMethylation450K (HM450) and EPIC BeadChips because of their compatibility with variable DNA input (34), simple workflow (35), and fast processing time. These arrays are being replaced by their successor, the EPICv2 array, which maps over 937,000 CpG sites (33). The EPICv2 array covers promoter regions and extends to CpG island shores, regions that have shown significant differential methylation across tissues.
Primary tumors were classified by nodal involvement as low-invasive (N0, no LN metastasis) or high-invasive (N1a/N1b, regional LN metastasis). High-invasive cases with fewer than 5 positive LNs were designated “low-confidence” to acknowledge borderline clinical staging. In the reference cohort, 32 tumors were low-invasive and 55 were high-invasive (16 of low confidence). In the validation cohort, 25 were low-invasive and 58 were high-invasive (14 of low confidence). Distant metastasis (M1) was present in 11 reference cases and 9 validation cases, all with N1b staging (Supplementary Tables S1–S2).
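The nodal labeling rule above can be expressed as a small function. This is a minimal sketch, assuming each tumor record carries an AJCC N category string and a count of positive lymph nodes; the `Tumor` record, its field names, and the `invasiveness_label` helper are hypothetical and are not taken from the authors' released code.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Tumor:
    # Hypothetical record; field names are illustrative only.
    sample_id: str
    n_stage: str      # AJCC 8th edition N category, e.g. "N0", "N0a", "N1a", "N1b"
    positive_ln: int  # number of histologically positive lymph nodes


def invasiveness_label(t: Tumor, min_confident_ln: int = 5) -> Tuple[str, str]:
    """Return (invasiveness class, confidence) using the nodal rule in the text."""
    if t.n_stage.startswith("N0"):
        # No regional LN metastasis -> low-invasive
        return "Low-invasive", "standard"
    if t.n_stage.startswith("N1"):
        # Regional LN metastasis -> high-invasive; fewer than 5 positive LNs -> low confidence
        if t.positive_ln < min_confident_ln:
            return "High-invasive", "low-confidence"
        return "High-invasive", "standard"
    raise ValueError(f"unsupported N category: {t.n_stage!r}")


if __name__ == "__main__":
    for tumor in [Tumor("A", "N0", 0), Tumor("B", "N1a", 3), Tumor("C", "N1b", 12)]:
        print(tumor.sample_id, invasiveness_label(tumor))
```

Unstaged categories such as NX raise an error rather than being silently assigned to a class.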
Genetic drivers were identified using the CHOP Solid Tumor Panel (CSTP) and CHOP Cancer Fusion Panel (CCFP) or by RNA sequencing analysis. Driver alterations were determined for all samples except 10 with indeterminate drivers and were grouped into four mutational subgroups: BRAF p.V600E, kinase fusions (KF; primarily involving RET, NTRK1/3, and ALK), DICER1 (RNase IIIb hotspot mutations), and RAS-like mutations (mostly involving N-H-KRAS, BRAF non-V600E, PTEN, and TSHR). The reference cohort comprised 22 patients (25.3%) with BRAF p.V600E, 34 patients (39.1%) with KF, 20 patients (23.0%) with RAS-like mutations, 8 patients (9.2%) with DICER1 mutations, and 3 patients (3.4%) with indeterminate driver mutations. The validation cohort comprised 20 patients (24.1%) with BRAF p.V600E, 45 patients (54.2%) with KF, 9 patients (10.8%) with RAS-like mutations, 2 patients (2.4%) with DICER1 mutations, and 7 patients (8.4%) with indeterminate driver mutations.
For comparison, adult thyroid cancer data (TCGA-THCA) were downloaded using TCGAbiolinks (36) (RRID:SCR_017683), consisting of 499 PTC samples profiled on the HM450 array, including primary tumors (n = 449) and paired LN metastases (n = 50). The adult patient cohort consisted of 325 females (72%) and 124 males (28%). Patient ages ranged from 15 to 89 years (mean 47.15, standard deviation 15.59). Primary tumors were similarly classified by nodal involvement, with 225 (50.1%) low- and 224 (49.9%) high-invasive samples. The cohort was also classified by genetic driver mutations and comprised 265 patients (59.0%) with BRAF p.V600E, 60 patients (13.4%) with KF, 84 patients (18.7%) with RAS-like mutations, 2 patients (0.4%) with DICER1 mutations, and 38 patients (8.5%) with indeterminate driver mutations, reflecting the dominance of BRAF p.V600E drivers in adult PTC (20) (Supplementary Table S4).
Data preprocessing
Raw methylation data from the reference and validation cohorts were read from IDAT files and normalized, and methylation values were jointly calculated using the standard preprocessing pipeline of the R package SeSAMe (35). Samples with discordant inferred and clinical sex were excluded. After excluding sex chromosome probes (n = 24,953) and retaining only CpG-targeting probes (n = 908,400), methylation values were collapsed to probe prefixes, and 902,052 probes were retained for downstream analysis. Samples with probe success rates (PSR) <0.70 were excluded (Supplementary Table S2). The validation cohort (median: 0.892 [IQR: 0.863–0.919]) showed significantly lower PSR than the reference cohort (median: 0.948 [IQR: 0.932–0.957]; Wilcoxon test, p < 2.2 × 10⁻¹⁶), which may be attributable to its higher proportion of FFPE samples, as formalin fixation is known to damage DNA through formaldehyde cross-linking, fragmentation, and other processing artifacts (Supplementary Fig. S1B) (37). Nevertheless, the validation cohort was retained to capture the technical and biological variation representative of real-world conditions.
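The probe- and sample-level filters above can be sketched in NumPy/SciPy. This is an illustrative sketch, not the SeSAMe pipeline: it assumes a probes × samples β matrix with a matching matrix of detection p-values, defines PSR as the fraction of probes with detection p < 0.05 (an assumption modeled on SeSAMe's pOOBAH masking), and collapses EPICv2 replicate probes by averaging probes that share an ID prefix before the first underscore. All function names are hypothetical.

```python
import numpy as np
from scipy.stats import mannwhitneyu


def filter_probes(probe_ids, chroms, betas):
    """Drop sex-chromosome and non-CpG probes, then average replicates by ID prefix."""
    probe_ids = np.asarray(probe_ids).astype(str)
    chroms = np.asarray(chroms)
    keep = ~np.isin(chroms, ["chrX", "chrY"]) & np.char.startswith(probe_ids, "cg")
    ids, b = probe_ids[keep], betas[keep]
    # EPICv2 IDs look like "cg00000029_TC21"; the prefix identifies the CpG
    prefixes = np.array([p.split("_")[0] for p in ids])
    uniq = np.unique(prefixes)
    collapsed = np.vstack([np.nanmean(b[prefixes == u], axis=0) for u in uniq])
    return uniq, collapsed


def probe_success_rate(detection_p, alpha=0.05):
    """Per-sample fraction of probes passing the detection threshold (assumed definition)."""
    return (np.asarray(detection_p) < alpha).mean(axis=0)


def filter_samples(betas, detection_p, min_psr=0.70):
    """Exclude samples whose PSR falls below min_psr."""
    psr = probe_success_rate(detection_p)
    keep = psr >= min_psr
    return betas[:, keep], psr, keep


def compare_psr(psr_a, psr_b):
    """Two-sided Wilcoxon rank-sum (Mann-Whitney U) test of PSR between cohorts."""
    return mannwhitneyu(psr_a, psr_b, alternative="two-sided").pvalue
```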
For the joint analysis of pediatric and adult TC samples, the adult methylation data were preprocessed as described above, excluding sex chromosome probes (n = 11,661) and retaining only CpG-targeting probes (n = 470,865). Pediatric (EPICv2) and adult (HM450) array data were subset to overlapping probes (n = 382,015) and merged into a unified matrix of 683 samples spanning all primary and LN samples. Subsequent integrative analyses were performed on this merged dataset.
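Restricting the two platforms to their shared probes amounts to an intersection on probe IDs followed by column-wise concatenation of samples. A minimal NumPy sketch, assuming both β matrices are probes × samples with unique, already-collapsed probe IDs; `merge_platforms` is a hypothetical helper.

```python
import numpy as np


def merge_platforms(ids_a, betas_a, ids_b, betas_b):
    """Keep probes present on both arrays and stack their samples side by side."""
    shared, ia, ib = np.intersect1d(ids_a, ids_b, assume_unique=True, return_indices=True)
    # Rows follow the sorted shared IDs; columns are samples_a followed by samples_b
    merged = np.hstack([betas_a[ia], betas_b[ib]])
    return shared, merged
```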
Unsupervised analysis and consensus clustering
Principal component analysis (PCA) was performed on the 30,000 most variable CpG sites in each cohort, under the assumption that CpGs with higher variability contribute more to the clustering process. t-distributed stochastic neighbor embedding (t-SNE) was then performed on the PCA-transformed data for visualization.
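The variance filter and PCA step can be sketched with NumPy alone. This assumes a CpGs × samples β matrix with missing values already imputed; the helper name and the default of 50 retained components are assumptions, not values from the study. The resulting scores could then be passed to a t-SNE implementation such as `sklearn.manifold.TSNE` for visualization.

```python
import numpy as np


def top_variable_pca(betas, n_top=30_000, n_components=50):
    """PCA on the n_top most variable CpGs (rows) of a CpGs x samples matrix."""
    variances = betas.var(axis=1)
    n_top = min(n_top, betas.shape[0])
    top_idx = np.argsort(variances)[::-1][:n_top]
    x = betas[top_idx].T            # samples x selected CpGs
    x = x - x.mean(axis=0)          # center each CpG
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    k = min(n_components, s.size)
    scores = u[:, :k] * s[:k]       # principal component scores per sample
    explained = (s ** 2 / np.sum(s ** 2))[:k]  # fraction of variance per component
    return top_idx, scores, explained
```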
To identify stable methylation-based clusters, consensus clustering was performed on the primary tumor samples of the reference cohort using the ConsensusClusterPlus R package (RRID:SCR_016954) (38). For each candidate number of clusters (k = 2–10), hierarchical clustering was performed on resampled data and pruned into k clusters (80% of features, 1,000 iterations, Euclidean distance). Pairwise consensus values, defined as the proportion of runs in which two samples clustered together, were calculated and stored in a consensus matrix for each k. To determine the optimal number of clusters, the cumulative distribution function (CDF) of consensus values was plotted for each candidate k; for a given consensus value c, the CDF gives the fraction of sample pairs with consensus values less than or equal to c. The relative change in the area under the CDF was then plotted against each candidate k. The final consensus cluster assignments were determined by hierarchical clustering of the consensus distance matrix (1 − consensus values) and pruning to the optimal k clusters.
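The resampling scheme and delta-area criterion can be sketched with SciPy's hierarchical clustering. This is a simplified stand-in for ConsensusClusterPlus, not its implementation: the item and feature subsampling proportions, average linkage, and the small iteration count used in testing are assumptions, and all function names are hypothetical. The delta area follows the usual convention of reporting the total area for the smallest k and the relative change in area for each larger k.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def consensus_matrix(x, k, n_iter=1000, p_item=0.8, p_feature=0.8, seed=0):
    """x: samples x features. Fraction of co-sampled runs in which each pair co-clusters."""
    rng = np.random.default_rng(seed)
    n, m = x.shape
    together = np.zeros((n, n))
    sampled = np.zeros((n, n))
    for _ in range(n_iter):
        items = rng.choice(n, size=max(k, round(p_item * n)), replace=False)
        feats = rng.choice(m, size=max(1, round(p_feature * m)), replace=False)
        sub = x[np.ix_(items, feats)]
        labels = fcluster(linkage(sub, method="average", metric="euclidean"),
                          t=k, criterion="maxclust")
        sampled[np.ix_(items, items)] += 1
        together[np.ix_(items, items)] += labels[:, None] == labels[None, :]
    return np.divide(together, sampled, out=np.zeros_like(together), where=sampled > 0)


def cdf_area(cons, grid=np.linspace(0.0, 1.0, 101)):
    """Area under the empirical CDF of the upper-triangle consensus values."""
    vals = cons[np.triu_indices_from(cons, k=1)]
    cdf = np.array([(vals <= c).mean() for c in grid])
    return float(np.sum((cdf[1:] + cdf[:-1]) / 2 * np.diff(grid)))


def delta_area(areas):
    """Total area for the smallest k; relative change in area for each larger k."""
    ks = sorted(areas)
    out = {ks[0]: areas[ks[0]]}
    for prev, k in zip(ks, ks[1:]):
        out[k] = (areas[k] - areas[prev]) / areas[prev]
    return out


def final_clusters(cons, k):
    """Cut an average-linkage tree built on the consensus distance (1 - consensus)."""
    d = 1.0 - cons
    np.fill_diagonal(d, 0.0)
    z = linkage(squareform(d, checks=False), method="average")
    return fcluster(z, t=k, criterion="maxclust")
```

In use, `consensus_matrix` would be run for each k from 2 to 10, `cdf_area` and `delta_area` would guide the choice of k, and `final_clusters` would assign samples at the chosen k.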
Cell type deconvolution
Cell type deconvolution was performed on the methylation data using the EpiDISH package (39) to infer stromal composition. Solid tissue-type inference deconvolved each sample into broad cell type components (epithelial, fibroblast, and immune cell; reference matrix: centEpiFibIC.m), and the immune cell component was further deconvolved into seven immune cell subtypes (reference matrix: centBloodSub.m). The immune cell fraction was used as the estimated leukocyte fraction in subsequent analyses. Hierarchical clustering was performed on the inferred cell type fractions. MicroRNA 200c (miR-200c) expression was estimated using the CytoMethIC package.
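Reference-based deconvolution estimates, for each sample, the mixture of reference cell-type methylation profiles that best reproduces the observed β values. EpiDISH provides its own estimators (e.g., robust partial correlations); the sketch below swaps in plain non-negative least squares followed by sum-to-one normalization, purely to illustrate the idea. The reference centroids here are toy values standing in for matrices such as centEpiFibIC.m, and `deconvolve_nnls` is a hypothetical helper.

```python
import numpy as np
from scipy.optimize import nnls


def deconvolve_nnls(betas, reference):
    """betas: CpGs x samples; reference: CpGs x cell types (centroid beta values).
    Returns a samples x cell types matrix of fractions that sum to 1 per sample."""
    fractions = []
    for j in range(betas.shape[1]):
        w, _ = nnls(reference, betas[:, j])  # non-negative mixing weights
        total = w.sum()
        fractions.append(w / total if total > 0 else w)
    return np.array(fractions)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ref = rng.random((50, 3))                  # toy epithelial, fibroblast, immune centroids
    truth = np.array([[0.6, 0.3, 0.1], [0.2, 0.2, 0.6]])
    est = deconvolve_nnls(ref @ truth.T, ref)  # noise-free mixtures
    leukocyte_fraction = est[:, 2]             # immune column as the leukocyte fraction
    print(np.round(est, 3), leukocyte_fraction)
```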
Differential methylation and enrichment analysis
CpG loci associated with tumor invasiveness were identified by differential methylation analysis using the SeSAMe modeling pipeline (35). DNA methylation β values were fitted to a linear model, and the corresponding slope tests and goodness-of-fit tests (F-tests holding out each contrast variable) were performed to evaluate the significance of differences in methylation levels. Differentially methylated loci (DML) were selected based on the β coefficient (> 0.2 or < −0.2) and the Benjamini-Hochberg (BH)-adjusted p-value (< 0.05). The knowYourCG pipeline (40) was used to test the enrichment of DML across curated biological and technical databases, including chromatin states, gene associations, and transcription factor binding sites (TFBS). Fisher’s exact test was used to determine whether a set of CpGs was enriched in specific categories or features, and significantly enriched databases were plotted.
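A vectorized sketch of the per-CpG linear model, BH correction, DML thresholds, and a one-sided Fisher's exact enrichment test follows. It covers only the slope test (not the hold-out F-tests), assumes a CpGs × samples β matrix without missing values and a design matrix that includes an intercept column, and is not the SeSAMe or knowYourCG implementation; all function names are hypothetical.

```python
import numpy as np
from scipy import stats


def fit_slope_tests(betas, design, coef_index):
    """OLS per CpG: betas (CpGs x n) ~ design (n x p). Returns coefficient and two-sided p."""
    n, p = design.shape
    xtx_inv = np.linalg.inv(design.T @ design)
    coefs = betas @ design @ xtx_inv           # CpGs x p
    resid = betas - coefs @ design.T           # CpGs x n
    dof = n - p
    sigma2 = np.sum(resid ** 2, axis=1) / dof
    se = np.sqrt(sigma2 * xtx_inv[coef_index, coef_index])
    t = coefs[:, coef_index] / se
    return coefs[:, coef_index], 2 * stats.t.sf(np.abs(t), dof)


def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    ranked = p[order] * m / np.arange(1, m + 1)
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]  # enforce monotonicity
    adjusted = np.empty(m)
    adjusted[order] = np.minimum(ranked, 1.0)
    return adjusted


def call_dml(coef, padj, min_effect=0.2, alpha=0.05):
    """Return boolean masks for hypermethylated and hypomethylated DML."""
    sig = padj < alpha
    return sig & (coef > min_effect), sig & (coef < -min_effect)


def fisher_enrichment(is_dml, in_feature):
    """One-sided Fisher's exact test: are DML over-represented in a feature set?"""
    a = np.sum(is_dml & in_feature)
    b = np.sum(is_dml & ~in_feature)
    c = np.sum(~is_dml & in_feature)
    d = np.sum(~is_dml & ~in_feature)
    odds, p = stats.fisher_exact([[a, b], [c, d]], alternative="greater")
    return odds, p
```

For the invasiveness model, the design matrix would hold an intercept, an invasiveness indicator, sex, and leukocyte fraction, with `coef_index` pointing at the invasiveness column.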
Integrated DNA methylation and RNA expression analysis
RNA-seq data (log2 CPM) from primary tumors in the reference cohort (n = 84; three samples lacked RNA-seq data) were analyzed with the limma R package (RRID:SCR_010943) (41). Differential expression between high- and low-invasive samples was modeled with a linear model adjusted for sex and leukocyte fraction, followed by empirical Bayes moderation and BH correction. DML were annotated to genes (hg38) and summarized per gene by the median invasiveness-associated β coefficient. Gene-level methylation and expression results were integrated to identify activated genes (hypomethylation with increased expression) and silenced genes (hypermethylation with decreased expression). Gene set enrichment analysis (GSEA) was performed with the fgsea R package (RRID:SCR_020938) using a signed ranking metric (−log10(p-value) × log2 fold change).
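The gene-level integration and the GSEA ranking metric described above can be sketched as follows. This is a simplified illustration, not the limma or fgsea code used in the study: it assumes DML have already been annotated to gene symbols and that per-gene log2 fold changes and adjusted p-values are available from a differential expression fit. The category labels and function names are hypothetical.

```python
from collections import defaultdict

import numpy as np


def gene_median_delta(dml_genes, dml_coefs):
    """Summarize DML per gene by the median invasiveness-associated beta coefficient."""
    per_gene = defaultdict(list)
    for gene, coef in zip(dml_genes, dml_coefs):
        per_gene[gene].append(coef)
    return {gene: float(np.median(vals)) for gene, vals in per_gene.items()}


def classify_gene(meth_delta, log2fc, expr_padj, fdr=0.05):
    """Label coordinated methylation-expression changes for one gene."""
    if expr_padj >= fdr:
        return "not significant"
    if meth_delta < 0 and log2fc > 0:
        return "activated"   # hypomethylated and upregulated
    if meth_delta > 0 and log2fc < 0:
        return "silenced"    # hypermethylated and downregulated
    return "discordant"


def signed_rank_metric(pval, log2fc):
    """GSEA ranking statistic: -log10(p) x log2 fold change."""
    return -np.log10(pval) * log2fc
```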
Epigenetic clock analysis
Epigenetic clock analysis has become a valuable method for estimating biological age from DNA methylation patterns. We applied the Horvath multi-tissue age regression model from methylClock (42) to all samples. The data were processed and verified for compatibility with the pediatric (EPICv2) and adult (HM450) arrays (43). Epigenetic age acceleration (EAA) was estimated as the residual from a linear regression of epigenetic age on chronological age (age at DNA sampling), adjusted for sex and estimated leukocyte fraction. Pairwise differences in EAA between groups were assessed using Wilcoxon rank-sum tests.
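Age acceleration as defined above is the residual of a covariate-adjusted linear regression. A minimal NumPy/SciPy sketch, assuming epigenetic ages have already been predicted (e.g., by the Horvath clock) and that sex is coded numerically; the helper names are hypothetical.

```python
import numpy as np
from scipy.stats import mannwhitneyu


def age_acceleration(epi_age, chron_age, covariates=None):
    """Residuals of epi_age ~ 1 + chron_age (+ covariates, an n x q array)."""
    epi_age = np.asarray(epi_age, dtype=float)
    columns = [np.ones_like(epi_age), np.asarray(chron_age, dtype=float)]
    if covariates is not None:
        # One design column per covariate (e.g., sex, leukocyte fraction)
        columns.extend(np.asarray(covariates, dtype=float).reshape(len(epi_age), -1).T)
    X = np.column_stack(columns)
    coef, *_ = np.linalg.lstsq(X, epi_age, rcond=None)
    return epi_age - X @ coef


def compare_groups(eaa, is_high_invasive):
    """Two-sided Wilcoxon rank-sum test of EAA between high- and low-invasive tumors."""
    mask = np.asarray(is_high_invasive, dtype=bool)
    return mannwhitneyu(eaa[mask], eaa[~mask], alternative="two-sided").pvalue
```

Because the model includes an intercept, the residuals average to zero, so a positive EAA means a tumor appears epigenetically older than expected for its chronological age and covariates.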
Random Forest classifier development and evaluation
For the reference cohort, a leave-one-out cross-validation (LOOCV) approach was implemented on primary tumor samples to train classifiers predicting tumor invasiveness, as defined by nodal involvement (n = 87; classes: High-invasive, Low-invasive), and driver mutation group, excluding indeterminate samples (n = 84; classes: BRAF p.V600E, RAS-like, Kinase Fusion, DICER1). In each fold, one sample was held out for testing while the remaining samples were used for training. Feature selection was performed by training multiple Random Forest (RF) classifiers on subsets of 10,000 probes (n_trees = 500). Feature importance was determined by the permutation-based variable importance measure from the randomForest R package. Importance scores from all classifiers were aggregated, and the 3,000 most important features were used to train the final RF model on all training samples in that fold, which then predicted the label of the held-out test sample. Prediction confidence was defined as the RF prediction probability of the predicted label. Performance metrics, including error, accuracy, and area under the receiver operating characteristic curve (AUC), were calculated from the LOOCV predictions. For the multiclass driver mutation classifier, AUC was calculated for each class using a one-vs-all approach. SHapley Additive exPlanations (SHAP) values were calculated from the final model using the treeshap R package and visualized with shapviz (44). The final classifiers were then applied to the validation cohort and LN metastases, and performance metrics and predictions were recorded.
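The nested LOOCV scheme above (feature selection repeated inside every fold, so the held-out sample never influences which probes are chosen) can be sketched with scikit-learn. This is an approximation under stated assumptions, not the authors' randomForest-based code: it uses scikit-learn's impurity-based `feature_importances_` in place of randomForest's permutation importance and splits features into disjoint random subsets. The defaults mirror the study's 10,000-probe subsets, 3,000 selected features, and 500 trees, but far smaller settings are needed for a quick run. Function names are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score


def select_features(x, y, subset_size, n_top, n_trees, rng):
    """Rank features by RF importance computed on random disjoint feature subsets."""
    order = rng.permutation(x.shape[1])
    importance = np.zeros(x.shape[1])
    for start in range(0, x.shape[1], subset_size):
        cols = order[start:start + subset_size]
        rf = RandomForestClassifier(n_estimators=n_trees,
                                    random_state=int(rng.integers(2**31 - 1)))
        rf.fit(x[:, cols], y)
        importance[cols] = rf.feature_importances_  # impurity-based stand-in
    return np.argsort(importance)[::-1][:n_top]


def loocv_random_forest(x, y, subset_size=10_000, n_top=3_000, n_trees=500, seed=0):
    """Leave-one-out predictions with feature selection nested inside each fold."""
    rng = np.random.default_rng(seed)
    classes = np.unique(y)
    probs = np.zeros((len(y), classes.size))
    for i in range(len(y)):
        train = np.arange(len(y)) != i
        feats = select_features(x[train], y[train], subset_size, n_top, n_trees, rng)
        rf = RandomForestClassifier(n_estimators=n_trees, random_state=seed)
        rf.fit(x[train][:, feats], y[train])
        fold_probs = rf.predict_proba(x[i:i + 1, feats])[0]
        for cls, prob in zip(rf.classes_, fold_probs):  # align to global class order
            probs[i, np.searchsorted(classes, cls)] = prob
    predicted = classes[probs.argmax(axis=1)]
    confidence = probs.max(axis=1)  # probability of the predicted label
    return predicted, confidence, probs, classes


def one_vs_all_auc(y, probs, classes):
    """Per-class AUC treating each class as positive against all others."""
    return {cls: roc_auc_score((y == cls).astype(int), probs[:, k])
            for k, cls in enumerate(classes)}
```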
DATA AVAILABILITY:
The generated thyroid methylome profiles are available in the Gene Expression Omnibus (RRID:SCR_005012) under superset accession GSE312914. Software for array data preprocessing and functional analysis is available in the R/Bioconductor package SeSAMe (version 3.22+): https://bioconductor.org/packages/release/bioc/html/sesame.html. Trained models and training and testing scripts are available at https://github.com/jennyznli/2025_TC. Additional raw data are available upon request from the corresponding author.
RESULTS
DNA methylation distinctly encodes the invasiveness of pediatric thyroid carcinoma.
Unsupervised clustering analysis of tumor methylation profiles was performed to explore the global methylome landscape of pediatric thyroid samples (Fig. 1B–I). An exploratory t-SNE visualization of the 87 primary samples in the reference cohort identified three distinct clusters. To validate the robustness of the cluster assignments, we performed consensus clustering and determined the optimal number of clusters by examining the cumulative distribution function (CDF) plot (Supplementary Fig. S1C–F). The peak in the delta area under the CDF at k = 3 indicated diminishing returns with additional clusters (Supplementary Fig. S1D), and the pairwise consensus matrix showed strong intra-cluster and minimal cross-cluster agreement (Supplementary Fig. S1E).
The clusters were denoted (1) Low-invasive (LI): a cluster mainly comprising low-invasive samples (n = 26); (2) High-invasive (HI): a cluster mainly comprising high-invasive samples (n = 54); and (3) Leukocyte-infiltrated (LEUKO): a smaller cluster of samples with high leukocyte infiltration (n = 7) (Fig. 1B). Clusters showed little segregation by sex, as expected given that sex-chromosome CpGs were excluded from the analysis, or by age (Fig. 1C–D). While clustering generally aligned with clinical invasiveness, some samples diverged from their assigned clusters, likely because of limitations in surgical assessment, lymphocyte infiltration, and subclonality (see discussion below).
We reviewed the available histopathological and molecular data associated with the methylation-defined clusters. Pathologically, the LI cluster includes mainly T1a, T1b, T2, and T3/3a samples, with a predominance of T2 tumors. The LEUKO cluster consists of T1a, T1b, and T2 samples. The HI cluster spans all stages from T0 to T4/4a, with T4/4a and M1 cases exclusive to this group (Fig. 1E–G). Molecularly, two distinct subclusters emerged within the HI cluster, defined by the BRAF p.V600E and KF mutational groups, respectively (Fig. 1H). Similarly, the LI cluster is further divided into two subclusters, defined by RAS-like and DICER1 mutations, respectively. The LEUKO cluster is composed primarily of KF tumors and two samples with indeterminate drivers, suggesting a decoupling from mutational signatures. Furthermore, our reference cohort showed a relatively copy-number-quiet profile with few focal alterations, lacking the recurrent CDKN2A/B deletions and focal amplifications of receptor tyrosine kinase and immune-evasion genes reported in adult anaplastic thyroid cancers (45), consistent with differences in patient age and disease stage.
We named the LEUKO cluster for its high leukocyte fraction (Fig. 1I; Supplementary Fig. S2A; see discussion below), which is further supported by the estimated expression of miR-200c, a marker associated with epithelial-mesenchymal state (46) (Supplementary Fig. S1G–I). Indeed, all seven PTC cases within this cluster had chronic lymphocytic thyroiditis, characterized by marked immune cell infiltration. Four of these harbor KF alterations (0149, 0334, 0201, 0164), one carries a BRAF p.V600E mutation (0261), and the remaining two have unknown drivers: a tall-cell variant (0138) with low invasiveness and a papillary thyroid microcarcinoma (PTMC) (0170B) with high-invasive behavior.
Differential methylation links invasiveness to disrupted thyroid development and function
To define the epigenetic hallmarks of invasiveness in pediatric thyroid carcinoma, we first performed differential methylation analysis of the three clusters defined above in the reference cohort. Differentially methylated CpG loci (DMLs) associated with invasiveness were identified using the following criteria: (1) Benjamini-Hochberg (BH) FDR-corrected p-value < 0.05; and (2) β coefficient > 0.2 (hypermethylated) or < −0.2 (hypomethylated).
LEUKO cluster samples exhibit higher global methylation and display extensive differential methylation, with 83,726 DMLs relative to the HI cluster and 60,290 DMLs relative to the LI cluster (Supplementary Fig. S2A–B). These differences largely reflect leukocyte-derived methylation signatures. Accordingly, LEUKO-associated DMLs were enriched for cell-type signatures of lymphocytes, including T and B cells (Supplementary Fig. S2C–D), as well as transcription factor binding sites (TFBS) involved in lymphoid development and hematopoiesis, such as RUNX3, LYL1, LMO2, ETV6, PBX1, TLX1, ZBTB16, and LMO1 (Supplementary Fig. S2E–F). Finally, cell-type deconvolution confirmed that LEUKO samples cluster together due to substantial immune infiltration, particularly CD4 T cells and B cells (Fig. 2A). Together, these findings indicate that immune cell-driven epigenetic alterations dominate the LEUKO methylome, dwarfing methylation differences associated with tumor invasiveness observed between the HI and LI groups (19,125 DMLs; Supplementary Fig. S2A).
Figure 2: Epigenome and transcriptome show distinct immune-related signatures in high-invasive thyroid carcinoma.

A, Unsupervised hierarchical clustering of deconvoluted cell type fractions in reference cohort primary tumors (n = 87). Top: major cell type fractions (epithelial, fibroblast, and immune cell). Bottom: immune cell infiltration across seven immune subtypes. B, Heatmap of 4,856 significant differentially methylated loci (DML) comparing methylation levels (β) between low- and high-invasive primary tumors (Padj < 0.05). C, Volcano plot of DML showing hypomethylated (blue, Δβ < −0.2) and hypermethylated (red, Δβ > 0.2) loci in high-invasive tumors (Padj < 0.05). D, Transcription factor binding site (TFBS) enrichment analysis of DMLs showing hypomethylated sites (left) and hypermethylated sites (right) in high-invasive samples, ordered by significance (−log10(FDR)). Dot size indicates the size of the database overlap; color intensity indicates enrichment strength (estimate). E, Integrated methylation-expression analysis plotting median differential methylation (Δβ) against the differential expression of the corresponding TF in high-invasive samples (red: activated TF with hypomethylated binding sites; gray: not significant; FDR < 0.05). Size indicates the number of aggregated CpG probes per TFBS. F, Heatmap of expression levels for the fourteen most differentially expressed TFs in high-invasive samples, with unsupervised hierarchical clustering.
To separate invasiveness-associated changes from immune cell-driven alterations, we then performed a regression analysis with DNA methylation as the response variable and invasiveness as the primary predictor, adjusted for sex and leukocyte fraction. Low-invasive samples have higher global methylation levels than high-invasive samples (Wilcoxon test, p = 2.2 × 10⁻³). Compared to low-invasive cases, high-invasive tumors have 4,706 significantly hypomethylated and 150 hypermethylated CpGs (Fig. 2B–C). This global methylation difference is primarily driven by RAS-like tumors, as DICER1 tumors have global methylation levels comparable to those of high-invasive tumors (Supplementary Fig. S2B).
In high-invasive tumors, hypomethylated loci were enriched for binding sites of thyroid lineage-determining and oncogenic transcription factors, including NKX2-1/TTF1, the Hippo pathway effectors YAP and TEAD1, and the AP-1 components FOS and JUN (Fig. 2D, Supplementary Fig. S2E–F). The enrichment of NKX2-1 binding sites links invasive behavior to disruption of thyroid tissue development and lineage regulation, a common mechanism in tumorigenesis (47). Concurrent enrichment of YAP and TEAD1 sites implicates YAP/TEAD-driven transcription, the output of the Hippo pathway, in invasive progression, consistent with its role in promoting epithelial plasticity and growth (48,49). Similarly, hypomethylation at AP-1 binding sites suggests engagement of stress-responsive transcriptional programs that facilitate invasion (50).
In contrast, hypermethylated loci in high-invasive tumors showed minimal enrichment of transcription factor binding sites overall, with androgen receptor (AR) binding sites representing the only notable signal (Fig. 2D, Supplementary Fig. S2E–F). The enrichment at AR-associated sites is consistent with prior evidence that hormone signaling can modulate tumor behavior (51) and that preserved AR activity may constrain the invasive potential of carcinomas (52,53). Together, these invasiveness-associated epigenetic alterations reflect disrupted lineage programming and activated oncogenic transcriptional programs, alongside repression of hormone-associated regulatory elements that may otherwise restrain invasion.
Altered DNA methylation at transcription factor binding sites is often accompanied by upregulation of the transcription factor itself, as shown by integrated methylation-expression analysis (Fig. 2E) and expression heatmaps (Fig. 2F). Transcription factors specific to high-invasive PTC include inflammatory and immune-associated factors such as GATA3, STAT4, SPI1, and CD74, together with stress- and invasion-linked epithelial regulators, including TP63, RXRG, AHR, ETV4, and ELF3, again suggesting an inflammation-coupled epithelial plasticity program (Supplementary Fig. S2H), consistent with previous reports (54–56) and more recent evidence (57). In contrast, low-invasive tumors retain higher expression of differentiation-associated TFs (GLIS2, SIX2), consistent with a more lineage-constrained state; GLIS1 is broadly downregulated in adult cancers, supporting its association with less aggressive biology (58). Integrating gene-level DNA methylation and RNA expression thus revealed coordinated regulation of transcription factors linked to thyroid carcinoma invasiveness.
Pediatric and adult thyroid carcinomas exhibit distinct epigenetic aging signatures.
Pediatric and adult PTC differ in mutation spectra and likely tumorigenic mechanisms (4), but their epigenomic differences remain less understood. A joint t-SNE embedding of methylation profiles from pediatric and adult cohorts revealed that global methylation profiles separate to varying degrees by invasiveness (Fig. 3A), genetic subtype (Fig. 3B), age group (Fig. 3A–D), and leukocyte infiltration (Fig. 3C). The frequency of molecular alterations differed by age, with BRAF p.V600E mutations observed more commonly in adult thyroid cancer (Fig. 3B). While pediatric BRAF p.V600E cases are all high-invasive, adult BRAF p.V600E cases are mixed in invasiveness. RAS-like and DICER1 mutation-driven samples clustered separately, with relatively little separation between pediatric and adult groups, suggesting that the epigenome changes driven by these mutations predominate over age-related differences. In contrast, adult KF and BRAF p.V600E samples showed a more dispersed distribution, suggesting greater epigenetic heterogeneity than in pediatric cases, potentially due to stochastic epigenetic drift with age (Fig. 3B).
Figure 3: Integrative analysis of pediatric and adult thyroid carcinoma epigenomes.

A-D, t-SNE dimensionality reduction of primary samples from the pediatric reference cohort (n = 87; open circles) and adult cohort (n = 449; filled circles) colored by A, clinical invasiveness, B, genetic driver mutations, C, estimated leukocyte fraction, and D, sex. E, Correlation between inferred epigenetic age and chronological age, colored by clinical invasiveness. The diagonal line indicates where predicted age equals chronological age. F, Epigenetic age acceleration comparison between pediatric and adult cohorts, stratified by clinical invasiveness (Wilcoxon test: pediatric p = 0.039, adult p = 0.129).
The epigenome can inform estimates of both chronological and biological age, including the accelerated aging observed in cancer (59,60). Using established multi-tissue epigenetic clocks (42), we inferred the epigenetic ages of thyroid carcinomas (Fig. 3E). In the pediatric cohort, the median inferred epigenetic age was 22.9 years (IQR: 18.0–31.6), significantly higher than the actual median age at surgery of 15.3 years (IQR: 14.0–17.3). Similarly, in the adult cohort, the median inferred epigenetic age was 57.4 years (IQR: 45.3–67.9), compared with an actual median age of 46 years (IQR: 35–58) (Supplementary Table S2). These findings indicate significant epigenetic age acceleration in both groups, consistent with the extended proliferative history of cancer (60).
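To make the clock step concrete, the sketch below shows how a Horvath-style multi-tissue clock converts CpG beta values into an age estimate. It assumes, without confirmation from the text, that the cited clock follows Horvath's published form: a penalized linear model fitted on a transformed age scale that is logarithmic up to an adult age of 20 years and linear afterwards. The CpG name, weight, and intercept are hypothetical placeholders, not real clock coefficients.

```python
"""Horvath-style epigenetic clock prediction (illustrative sketch).

Assumption: the clock uses Horvath's age transform with ADULT_AGE = 20.
Weights and CpG IDs passed in are hypothetical, not published coefficients.
"""
import math
from typing import Dict

ADULT_AGE = 20.0


def transform_age(age: float) -> float:
    """F(age): logarithmic before adulthood, linear afterwards."""
    if age <= ADULT_AGE:
        return math.log(age + 1) - math.log(ADULT_AGE + 1)
    return (age - ADULT_AGE) / (ADULT_AGE + 1)


def inverse_transform_age(x: float) -> float:
    """F^-1(x): map the clock's linear predictor back to years."""
    if x <= 0:
        return (ADULT_AGE + 1) * math.exp(x) - 1
    return (ADULT_AGE + 1) * x + ADULT_AGE


def predict_epigenetic_age(
    betas: Dict[str, float], weights: Dict[str, float], intercept: float
) -> float:
    """Weighted sum of beta values on the transformed scale, then back-transform.

    Every clock CpG must be present in `betas`; missing CpGs raise KeyError
    rather than being silently imputed.
    """
    linear = intercept + sum(w * betas[cpg] for cpg, w in weights.items())
    return inverse_transform_age(linear)


# Hypothetical one-CpG "clock" whose linear predictor is 0, i.e. age 20.
age = predict_epigenetic_age({"cg_hypothetical_1": 0.5}, {"cg_hypothetical_1": 1.0}, -0.5)
```

The log-linear transform is what lets a single linear model fit both rapid childhood methylation changes and slower adult drift, which matters when pediatric and adult tumors are scored on the same clock.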
Despite their younger age at surgery, pediatric cases showed absolute epigenetic age acceleration, measured as regression residuals, similar to that of adult cases, implying disproportionately greater acceleration relative to age at diagnosis (Fig. 3E–F). Interestingly, among pediatric cases, high-invasive tumors exhibited slightly greater age acceleration than low-invasive tumors (Wilcoxon test: p = 0.039); adult thyroid tumors showed a similar trend that did not reach statistical significance (p = 0.129; Fig. 3F). These data suggest that age acceleration largely reflects intrinsic tumor biology, e.g., cell proliferation and invasiveness, relatively independent of the age at carcinoma initiation.
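The residual-based definition of age acceleration can be sketched as follows. This assumes one ordinary least-squares line of epigenetic age on chronological age, fitted across the samples passed in; whether the study fitted per cohort or on pooled data is not stated here, so that choice is left to the caller. The ages in the example are hypothetical.

```python
"""Epigenetic age acceleration as residuals from a linear fit (sketch).

Assumption: a single OLS line epi_age ~ a + b * chrono_age is fitted over
the supplied samples; the study's exact regression setup may differ.
"""
from typing import List, Sequence, Tuple


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Return (intercept, slope) of the least-squares line y ~ a + b * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def age_acceleration(chrono: Sequence[float], epi: Sequence[float]) -> List[float]:
    """Residual = observed epigenetic age minus age predicted from chronology."""
    a, b = fit_line(chrono, epi)
    return [e - (a + b * c) for c, e in zip(chrono, epi)]


# Hypothetical ages: the last sample is "older" than the trend predicts.
residuals = age_acceleration([10, 20, 30, 40], [15, 25, 35, 55])
```

A positive residual marks a tumor that is epigenetically older than expected for its chronological age. Unlike the raw difference (epigenetic minus chronological age), residuals are centered at zero within the fitted group, so they do not simply restate the overall offset seen in Fig. 3E.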
Machine-learning models to stratify high- and low-invasive thyroid carcinomas.
Having established methylome differences linked to clinical invasiveness and driver mutations, we next evaluated whether DNA methylation profiles can inform clinical stratification. We developed Random Forest (RF) classifiers to predict clinical invasiveness, as defined by LN metastasis, and driver mutation class from genome-wide CpG methylation profiles of primary thyroid carcinomas in the reference cohort, excluding samples with indeterminate drivers from the driver classifier (invasiveness: n = 87; driver: n = 84; Fig. 4A). In leave-one-out cross-validation (LOOCV), the invasiveness classifier achieved an overall accuracy of 84% (73/87; sensitivity = 90.9%), and the driver mutation classifier achieved an overall accuracy of 95% (80/84), correctly predicting 95.5% (21/22) of BRAF p.V600E, 100% (8/8) of DICER1, 94.1% (32/34) of KF, and 95% (19/20) of RAS-like cases (Fig. 4B). Applied to the validation cohort (invasiveness: n = 83; driver: n = 76, excluding indeterminate samples), the final models performed slightly worse: the invasiveness classifier reached an accuracy of 77.1% (64/83; sensitivity = 94.8%; Fig. 4C), and the driver mutation classifier reached an overall accuracy of 82.9% (63/76), correctly predicting 60% (12/20) of BRAF p.V600E, 50% (1/2) of DICER1, 86.7% (39/45) of KF, and 100% (9/9) of RAS-like cases (Fig. 4C). Receiver operating characteristic (ROC) analyses confirmed strong discriminatory power in both cohorts, with AUCs exceeding 0.81 for invasiveness and ranging from 0.92 to 1.00 for driver mutation categories (Supplementary Fig. S4A–B).
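The summary statistics above can be reproduced from per-class counts, and the AUC has a simple rank interpretation. The sketch below computes per-class recall and overall accuracy from (correct, total) counts, using the reference-cohort driver counts reported above, and computes ROC AUC as the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one, with ties counted as one half. This O(n_pos × n_neg) version is an illustration, not the code used in the study.

```python
"""Classifier summary metrics from counts and scores (illustrative sketch)."""
from typing import Dict, Sequence, Tuple

Counts = Dict[str, Tuple[int, int]]  # label -> (correct, total)


def per_class_recall(counts: Counts) -> Dict[str, float]:
    """Fraction of each true class that was predicted correctly."""
    return {label: correct / total for label, (correct, total) in counts.items()}


def overall_accuracy(counts: Counts) -> float:
    """Pooled accuracy: all correct predictions over all samples."""
    correct = sum(c for c, _ in counts.values())
    total = sum(t for _, t in counts.values())
    return correct / total


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Rank (Mann-Whitney) form of ROC AUC; labels are 1 (positive) or 0."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


# Reference-cohort LOOCV counts for the driver classifier, as reported above.
reference_driver: Counts = {
    "BRAF p.V600E": (21, 22),
    "DICER1": (8, 8),
    "KF": (32, 34),
    "RAS-like": (19, 20),
}
```

`overall_accuracy(reference_driver)` gives 80/84 (about 0.952), matching the reported 95%. Because the class sizes are unequal, per-class recall is worth reporting alongside pooled accuracy; the small DICER1 groups in particular make single-class estimates unstable.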
Figure 4: Construction of classification models for pediatric thyroid carcinoma invasiveness.

A, Schematic of the random forest (RF) classifier development and validation pipeline for predicting clinical invasiveness and driver mutation groups. Top: Leave-one-out cross-validation (LOOCV) was performed on primary samples from the pediatric reference cohort using iterative training on subsets with feature selection. Bottom: Final models were trained on the reference cohort, with SHAP (SHapley Additive exPlanations) analysis for feature interpretation and comprehensive validation. B-C, Confusion matrices showing prediction performance for clinical invasiveness (left) and driver group (right) in the B, reference cohort, and C, validation cohort. D-E, t-SNE dimensionality reduction of the reference cohort showing misclassifications for D, clinical invasiveness, and E, driver group classifiers. Point transparency indicates prediction confidence, and shapes denote prediction accuracy. F, t-SNE visualization of fourteen paired primary samples (circle) and lymph node metastases (triangle), color-coded by patient ID with dashed lines connecting paired samples. G, Transcription factor binding site enrichment analysis of the 3,000 most important probes from the clinical invasiveness (left) and driver group (right) classifiers, ranked by significance (-log10(FDR)). Dot size indicates database overlap; color intensity indicates enrichment strength (estimate).
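The key property of the LOOCV scheme in panel A is that feature selection is repeated inside every fold. The sketch below illustrates that structure under two loudly hypothetical simplifications: a nearest-centroid classifier stands in for the study's random forest, and "feature selection" is approximated by keeping the k most variable CpGs within each training split. The point is that the held-out sample never influences which CpGs are kept, which avoids optimistic bias from information leakage.

```python
"""LOOCV with per-fold feature selection (sketch; not the study's RF pipeline).

Hypothetical simplifications: nearest-centroid classification instead of a
random forest, and top-variance CpG filtering as the feature-selection step.
"""
from statistics import mean, pvariance
from typing import List, Sequence


def top_variable_features(X: Sequence[Sequence[float]], k: int) -> List[int]:
    """Indices of the k features with the highest variance in X."""
    n_features = len(X[0])
    variances = [pvariance([row[j] for row in X]) for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: variances[j], reverse=True)[:k]


def nearest_centroid_predict(X_train, y_train, x, features) -> str:
    """Assign x to the class whose centroid is closest on the chosen features."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(y_train)):
        rows = [r for r, y in zip(X_train, y_train) if y == label]
        centroid = [mean([r[j] for r in rows]) for j in features]
        dist = sum((x[j] - c) ** 2 for j, c in zip(features, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def loocv_predictions(X: List[List[float]], y: List[str], k: int) -> List[str]:
    """Hold out each sample once; select features and train on the rest."""
    preds = []
    for i in range(len(X)):
        X_train = X[:i] + X[i + 1:]
        y_train = y[:i] + y[i + 1:]
        features = top_variable_features(X_train, k)  # training split only
        preds.append(nearest_centroid_predict(X_train, y_train, X[i], features))
    return preds


# Hypothetical betas: CpG 0 separates groups, CpG 1 is uninformative.
X = [[0.90, 0.5], [0.80, 0.5], [0.85, 0.5], [0.10, 0.5], [0.20, 0.5], [0.15, 0.5]]
y = ["LI", "LI", "LI", "HI", "HI", "HI"]
preds = loocv_predictions(X, y, k=1)
```

Selecting features once on the full cohort and then running LOOCV would let every held-out sample shape the feature set, inflating accuracy; doing it per fold, as sketched, keeps the estimate honest.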
Analysis of misclassifications across both cohorts revealed biologically meaningful patterns that could improve risk stratification. The two classifiers often misclassified the same cases, and the predicted oncogenic driver status frequently aligned with genomic characterization, potentially offering clinically relevant leads for follow-up. Most invasiveness misclassifications involved clinically low-invasive tumors predicted as high-invasive, which often harbored aggressive oncogenic drivers such as BRAF p.V600E or KF. In the reference cohort, six of nine low-to-high misclassifications carried KF (n = 4) or BRAF p.V600E (n = 2) alterations (Fig. 4D–E). The validation cohort showed the same pattern, with ten of sixteen such misclassifications harboring BRAF p.V600E (n = 3) or KF (n = 7) (Supplementary Fig. S4C–D). Given the congruence between methylation-based classification and oncogenic driver status, a clinical designation of low invasiveness in these cases may reflect undetected or future metastatic potential, warranting heightened clinical surveillance. The remaining misclassifications often involved a second co-mutation. For example, the RAS-like, low-invasive case 0259 in the validation cohort was misclassified as high-invasive. This case harbored co-occurring KRAS and inactivating KEAP1 mutations, associated with upregulation of KEAP1/NRF2 target genes. Such co-mutations, known to promote tumor progression in other cancers (61), may explain its high-invasive classification despite its low-invasive clinical presentation.
Similarly, among clinically high-invasive tumors misclassified as low-invasive, three of the five in the reference cohort (0135, 0172, and 0208) carried RAS-like mutations and clustered with LI profiles. Notably, several of these misclassifications were clinically N1b but had fewer than five involved LNs in the central neck, suggesting limited spread and representing borderline cases with respect to our invasiveness threshold (reference: 0208; validation: 0372, 0868) (Fig. 4D–E, Supplementary Fig. S4C–D).
Beyond classification accuracy, prediction confidence may also be informative. For the invasiveness classifier, prediction confidence was significantly higher for correct than for incorrect predictions in both the reference and validation cohorts (one-sided Wilcoxon test: p = 0.019 and p = 0.003, respectively; Supplementary Fig. S4E), and many misclassified samples lay at cluster boundaries (Fig. 4D–E, Supplementary Fig. S4C–D), suggesting ambiguous or transitional methylation states. Furthermore, the LEUKO cluster showed lower confidence scores (median = 0.679) than the HI (median = 0.924) and LI (median = 0.928) clusters, suggesting that, as expected, leukocyte infiltration affects the performance of our classifiers.
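The one-sided comparison of confidence between correct and incorrect predictions can be sketched as a Mann-Whitney U (Wilcoxon rank-sum) test. Two assumptions are made here: "confidence" is taken to be the winning-class probability, and the p-value uses the large-sample normal approximation without tie or continuity correction. For small groups, an exact test such as `scipy.stats.mannwhitneyu` with `alternative="greater"` is preferable. The confidence values below are hypothetical.

```python
"""One-sided Mann-Whitney U test, normal approximation (sketch)."""
import math
from typing import Sequence, Tuple


def mann_whitney_greater(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Return (U, one-sided p) for H1: values in x tend to exceed values in y.

    No tie or continuity correction is applied to the variance.
    """
    n1, n2 = len(x), len(y)
    u = 0.0
    for a in x:
        for b in y:
            u += 1.0 if a > b else 0.5 if a == b else 0.0
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mu) / sigma
    p = 0.5 * math.erfc(z / math.sqrt(2))  # upper-tail standard normal probability
    return u, p


# Hypothetical winning-class probabilities for correct vs incorrect calls.
correct = [0.95, 0.92, 0.90, 0.88, 0.85, 0.81]
wrong = [0.62, 0.55, 0.71, 0.58]
u_stat, p_value = mann_whitney_greater(correct, wrong)
```

In this toy case every correct call is more confident than every incorrect one, so U reaches its maximum of 24 and the one-sided p-value is small. A rank test is a natural choice here because confidence scores are bounded and typically skewed toward 1.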
DNA methylation refines the classification of pediatric thyroid carcinoma invasiveness beyond driver mutations.
Despite the overall concordance between methylation- and mutation-based classification, methylation classifiers may refine invasiveness predictions beyond conventional driver mutation associations. For example, case 0033, harboring a RAS-like mutation typically linked to low invasiveness, was clinically characterized as high-invasive; it clustered with HI tumors and was predicted accordingly (Fig. 4D–E, Supplementary Fig. S1F). Further examination revealed that this case carried co-occurring NRAS p.Q61K and TP53 p.R273C mutations, suggesting a more aggressive profile conferred by co-mutation. Conversely, the KF case 0077A, despite carrying a TG::IGF1R fusion, was correctly predicted as low-invasive by the methylation classifier and placed in the LI cluster (Fig. 4D–E, Supplementary Fig. S1F). This discrepancy between driver-based and methylation-based classification suggests that this fusion may differ functionally from other KFs. Indeed, studies in thyroid cells indicate that IGF1 signals preferentially through the PI3K pathway rather than the MAPK pathway typically activated by other KFs (62).
Interestingly, the methylation classifier can also resolve some inconsistencies between clinical annotation and global methylation cluster assignment. For example, case 0126, carrying a subclonal BRAF p.V600E mutation with limited LN involvement (2 of 15 central neck LNs positive), was clinically labeled high-invasive (N1a) but had an overall methylome profile resembling less invasive tumors. Nevertheless, the methylation classifier correctly identified it as high-invasive, suggesting sensitivity to aggressive subclones. Another case (0171), with a BRAF p.V600E mutation, was clinically reported as low-invasive and located at the periphery of the HI cluster; it was correctly classified as low-invasive, though with borderline confidence (0.54). In the LEUKO cluster, sample 0138, harboring an indeterminate driver mutation and tall cell histology, was clinically reported as low-invasive but classified as high-invasive (Fig. 4D–E).
Together, these case studies suggest that methylation profiles can detect subtle differences in invasiveness that may be missed by driver mutation grouping or by similarity in global methylation profiles.
DNA methylation classifier robustly predicts driver mutations from lymph node metastasis.
To test the generalizability of our classifiers, we analyzed the DNA methylomes of matched LN metastases across the reference and validation cohorts. Thirteen patients had matched LN metastasis samples, with case 0212 contributing two (one axial) (Supplementary Table S1). In global methylome clustering, LN metastases generally remained close to their matched primary tumors (Fig. 4F), and each carried the same reported genetic driver as its primary tumor. The driver classifier correctly predicted the driver mutation in 13 of 14 (92.9%) LN metastasis samples from their methylome profiles. The sole misclassification occurred in case 0868, a high-invasive DICER1-mutant FTC, which was classified as RAS-like. The one exception to co-clustering was case 0212, which deviated from its primary tumor because of substantial lymphocyte infiltration (leukocyte fraction 0.57; Fig. 4F, Supplementary Fig. S2A); the classifier nonetheless predicted its driver mutation correctly.
Together, these results underscore the value of methylation-based classifiers for predicting both tumor invasiveness and oncogenic genetic alterations. Methylation profiles not only recapitulate known molecular subtypes but may also resolve discrepancies between genotype and phenotype, supporting their potential clinical utility for triaging tumor aggressiveness.
DNA methylation classifier interpretation reveals divergent epigenetic signatures.
Finally, we interpreted the features used by the machine learning models to gain biological insight. Feature importance analysis showed that the invasiveness and driver classifiers relied on largely distinct sets of CpG sites, although some overlapping features contributed to both tasks with moderate importance (Supplementary Fig. S4F). Enrichment analysis of the top 3,000 predictive CpG features, ranked by mean decrease in accuracy, identified Hippo signaling and thyroid tissue development, recapitulating the differential methylation findings (Fig. 4G). These results closely matched earlier enrichment results for hypermethylated sites from differential methylation analysis between low- and high-invasive tumors in both the adult (Supplementary Fig. S3F) and pediatric (Fig. 2D) cohorts, suggesting that these regulatory mechanisms are conserved across age groups.
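"Mean decrease in accuracy" is a permutation importance: shuffle one feature's values across samples, re-score the fitted model, and record how much accuracy drops. The sketch below implements that definition for any callable model. It uses a hypothetical one-CpG threshold rule as the model and scores on the same samples for brevity, whereas random forest implementations commonly compute this on out-of-bag samples, which is not reproduced here.

```python
"""Permutation importance ("mean decrease in accuracy") sketch.

Assumptions: `model` is any callable mapping a feature row to a label, and
importance is scored on the supplied data rather than out-of-bag samples.
"""
import random
from typing import Callable, List, Sequence

Model = Callable[[Sequence[float]], str]


def accuracy(model: Model, X: List[List[float]], y: Sequence[str]) -> float:
    return sum(model(row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance(
    model: Model, X: List[List[float]], y: Sequence[str], n_repeats: int = 20, seed: int = 0
) -> List[float]:
    """Average accuracy drop when each feature column is shuffled."""
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the labels
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(model, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


def toy_model(row: Sequence[float]) -> str:
    # Hypothetical rule: low methylation at CpG 0 -> "HI", otherwise "LI".
    return "HI" if row[0] < 0.5 else "LI"


X = [[0.1, 0.3], [0.2, 0.9], [0.3, 0.1], [0.8, 0.7], [0.9, 0.2], [0.7, 0.6]]
y = ["HI", "HI", "HI", "LI", "LI", "LI"]
importance = permutation_importance(toy_model, X, y)
```

CpG 0 receives a positive importance, while CpG 1, which the toy model ignores, scores exactly zero. Ranking CpGs by this quantity and taking the top 3,000 is the kind of feature list that would then feed the enrichment analysis.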
To provide a more interpretable, sample-level view of feature influence, we performed SHAP (SHapley Additive exPlanations) analysis to explore heterogeneity in the contribution of methylation features across clinical and mutation groups. Applied to the final models trained on all primary tumor samples from the reference cohort, SHAP analysis showed that high-invasive predictions were driven primarily by hypomethylation features, consistent with the global hypomethylation observed in aggressive cases (Supplementary Fig. S4G). In contrast, prediction patterns for driver mutations, analyzed with one-vs-rest classifiers, were more complex. Both hypo- and hypermethylation features contributed to BRAF p.V600E and DICER1 predictions, whereas KF and RAS-like predictions were more consistently driven by hypo- and hypermethylation, respectively (Supplementary Fig. S4G). Methylation alterations contributing to DICER1 predictions are more limited and gene-specific, for example in genes regulated by miRNAs, consistent with DICER1's function as a pre-miRNA processor (63). Overall, the driver classifiers exhibit more complex patterns, suggesting stronger contributions from individual features.
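The Shapley values behind SHAP can be illustrated by brute force on a tiny model. The sketch below enumerates every coalition of features and averages each feature's marginal contribution with the standard Shapley weights. It assumes that features absent from a coalition are replaced by a single baseline value, for example a cohort-mean beta. Tree SHAP algorithms, as typically applied to random forests, compute related values efficiently without enumeration, so this exponential-time version only illustrates the definition.

```python
"""Exact Shapley values by coalition enumeration (illustrative sketch).

Assumption: features outside a coalition take a single baseline value.
Feasible only for a handful of features (2^n model evaluations per feature).
"""
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence, Set


def shapley_values(
    f: Callable[[List[float]], float], x: Sequence[float], baseline: Sequence[float]
) -> List[float]:
    n = len(x)

    def value(coalition: Set[int]) -> float:
        # Features in the coalition keep the sample's value; others use baseline.
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return f(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def linear_score(z: List[float]) -> float:
    # Hypothetical score: positive weight on CpG 0, negative weight on CpG 1.
    return 2.0 * z[0] - 3.0 * z[1] + 0.5 * z[2]


x = [0.2, 0.8, 0.5]
base = [0.5, 0.5, 0.5]
phi = shapley_values(linear_score, x, base)
```

For a linear model each value reduces to weight × (x − baseline), here [-0.6, -0.9, 0.0], and the values always sum to f(x) − f(baseline). This additivity is what allows per-sample SHAP plots to attribute a prediction to hypomethylated (x below baseline) versus hypermethylated CpGs.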
DISCUSSION
We present and validate DNA methylation-based classifiers of tumor invasiveness and oncogenic driver status in pediatric thyroid carcinoma. Pediatric tumors differ fundamentally from adult PTC in their mutational profiles, clinical behavior, and treatment challenges, yet they have been largely excluded from prior methylome studies. Rather than focusing on diagnosis or histologic subtype, our models target invasiveness, a clinically actionable feature that directly informs surgical decision-making and outcomes.
Our study revealed that pediatric thyroid tumors exhibit a diverse methylation landscape. High-invasive tumors are characterized by widespread hypomethylation with focal gains in methylation, which may reflect differences in the degree of developmental disruption, proliferation, and inflammatory stress. Invasiveness-associated DMLs were enriched at TFBS for nuclear receptors (AR), thyroid-specific TFs such as TTF1/NKX2-1, and regulators of Hippo (64) and AP-1 signaling (65) (Fig. 2D). Notably, methylation gain at the TTF1/NKX2-1 locus has been linked to loss of TF binding and poor prognosis (66), and our findings are consistent with suppression of the androgen-AR axis as a methylation-mediated mechanism of aggressive PTC (67). Moreover, AR transactivation drives metastasis via the WNT pathway in prostate cancer (68), pointing to possibly shared epigenetic mechanisms across hormone receptor-associated carcinomas. The epigenetic separation by invasiveness may also reflect differences in differentiation: more invasive tumors tend to exhibit lower thyroid differentiation scores, lower BRAF-RAS scores (BRS; reflecting a more BRAF-like state), and higher ERK scores (indicating elevated MAPK pathway activity) (Supplementary Fig. S3C–E).
Driver mutations are a primary force shaping global methylation profiles, reflecting distinct tumorigenic trajectories and regulatory programs. RAS-like samples, associated with low invasiveness, exhibit significantly higher methylation levels than other mutational groups (69,70), whereas more invasive subtypes such as BRAF p.V600E and KF exhibit widespread hypomethylation (69,70). These methylation differences likely reflect downstream effects of the BRAF-like and RAS-like gene expression subtypes, driven by MAPK/ERK and PI3K/AKT signaling, respectively (17). Although DICER1 samples share lower average global methylation with high-invasive tumors, their overall methylation profiles more closely resemble those of RAS-like samples and are distinct from high-invasive tumors. This is consistent with their shared low invasiveness and with a previous report of RAS-like transcriptomic patterns in DICER1-mutant tumors (71). Notably, an FTC (0125A) harboring the DGCR8 hotspot mutation p.E518K clustered closely with DICER1 tumors by DNA methylation (Fig. 4E) (72,73), suggesting convergent disruption of miRNA regulation.
Our methylation-based invasiveness classifier achieved 84% accuracy in the reference cohort and 77% concordance with clinically reported invasiveness in the validation cohort, confirming the expected clinical-molecular correlation in most cases. Among samples whose predicted invasiveness did not match the clinically reported label, 75% were supported by their driver mutations, while the remaining 25% represented borderline cases. Furthermore, the driver classifier correctly classified 13 of 14 LN metastasis samples, supporting prior evidence that most carcinoma-related methylation changes precede metastasis (74). Collectively, these findings suggest that methylomes have high predictive value for both molecular characterization and clinical presentation at the time of surgery.
As noted above, methylation profiling resolves biologically meaningful heterogeneity among genotypically similar tumors. In both the reference and validation cohorts, RAS-like tumors predicted as high-invasive by the methylation classifier harbored cooperating alterations (e.g., NRAS with TP53 in case 0033; KRAS with KEAP1 in case 0259), consistent with higher invasiveness. These examples illustrate how the methylome captures nuanced consequences of driver mutations, such as the combined effects of co-mutations and differences among kinase fusion subclasses. One KF-to-BRAF misclassification in the validation cohort involved a BRAF fusion rather than a point mutation (0462: RAB3GAP2::BRAF), which may represent an intermediate molecular subtype (Supplementary Fig. S4D). The few discordant cases in which the driver mutation did not match the expected clinical invasiveness also showed features suggestive of intermediate states, including low variant allele fractions (e.g., case 0870 has a BRAF p.V600E variant allele fraction of 0.23).
We identified an immune-infiltrated methylation group (LEUKO cluster) enriched for transcription factors involved in hematopoiesis and vascular maintenance (Supplementary Fig. S2E–F). Because immune infiltration, such as in lymphocytic thyroiditis, can substantially alter DNA methylation profiles, samples in the LEUKO cluster showed reduced classification confidence and predictions biased toward high invasiveness, confounding tumor-intrinsic risk stratification. These findings underscore the need for classifiers that distinguish tumor-intrinsic from microenvironment-driven methylation signals, for example through cell-type deconvolution, expanded sampling of immune-rich, low-invasive tumors, or subgroup-specific modeling. Alternatively, highly leukocyte-infiltrated samples may need to be excluded using predefined thresholds.
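As a minimal illustration of reference-based deconvolution and threshold-based exclusion, the sketch below models each sample's beta vector as a two-component mixture, beta_obs ≈ f · beta_leuko + (1 − f) · beta_tumor. It solves for the least-squares leukocyte fraction f, f = Σ(l − t)(o − t) / Σ(l − t)², clipped to [0, 1]. The reference profiles, the toy fraction, and the 0.5 exclusion threshold are all hypothetical. Published reference-based methods, such as those compared in reference 39, use more cell types and constrained regression.

```python
"""Two-component leukocyte-fraction estimate and QC flag (sketch).

Assumptions: hypothetical pure-leukocyte and pure-tumor beta profiles over
the same CpGs; threshold 0.5 is an illustrative, not validated, cutoff.
"""
from typing import List, Sequence


def leukocyte_fraction(
    observed: Sequence[float], leuko_ref: Sequence[float], tumor_ref: Sequence[float]
) -> float:
    """Least-squares f in observed ~ f * leuko + (1 - f) * tumor, clipped to [0, 1]."""
    diff_ref = [l - t for l, t in zip(leuko_ref, tumor_ref)]
    diff_obs = [o - t for o, t in zip(observed, tumor_ref)]
    num = sum(d * e for d, e in zip(diff_ref, diff_obs))
    den = sum(d * d for d in diff_ref)
    return min(1.0, max(0.0, num / den))


def flag_infiltrated(fractions: Sequence[float], threshold: float = 0.5) -> List[bool]:
    """Hypothetical QC rule: flag samples above a predefined leukocyte fraction."""
    return [f > threshold for f in fractions]


# Hypothetical reference profiles and a simulated 57% leukocyte mixture.
leuko = [0.9, 0.1, 0.8]
tumor = [0.2, 0.7, 0.3]
mixed = [0.57 * l + 0.43 * t for l, t in zip(leuko, tumor)]
estimate = leukocyte_fraction(mixed, leuko, tumor)
```

The estimate recovers 0.57 for the simulated mixture. Only CpGs that differ between the two references carry information about f, which is why reference-based methods restrict estimation to cell-type-discriminating probes.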
Age-related and cancer-associated DNA methylation changes are often intertwined, and it has been proposed that age-associated epigenetic alterations, particularly at Polycomb-regulated regions, accumulate over time and predispose cells to malignant epigenetic silencing (75). Under this model, adult cancers would be expected to exhibit greater epigenetic age acceleration than pediatric tumors, reflecting a longer pre-malignant mitotic and epigenetic history. In contrast, we observed comparable levels of methylation age acceleration in both pediatric and adult thyroid carcinomas (Fig. 3E–F), indicating that cancer-associated epigenetic aging is largely independent of chronological age at diagnosis. This finding suggests that accelerated epigenetic aging primarily reflects tumor clonal expansion and oncogene-driven proliferation rather than the accumulation of pre-existing age-related epigenetic changes. Together, these results support a model in which genetically driven tumor initiation rapidly imposes a cancer-specific epigenetic aging signature, even in pediatric disease.
As one of the first efforts to apply methylome-based stratification to pediatric carcinomas, this study is limited by its single-institution design. Validation in cohorts from external institutions will be critical to establish the generalizability and robustness of our models. Technical considerations such as batch effects, platform-specific biases, and tissue preservation methods must be addressed, especially when transitioning from laboratory experiments to clinical assays. Limited representation of certain histologic and molecular subtypes in the training cohort, such as FTC, RAS-like, DICER1, and APC-mutant samples, reduced classification confidence. The association between driver mutation groups and clinical invasiveness also constrains within-group diversity, limiting driver-specific differential methylation analyses. Furthermore, cooperating oncogenic mutations remain undersampled, which complicates driver group classification. Although the classifier demonstrated high accuracy, borderline cases indicate that it is best used as a decision-support tool that complements clinical, histologic, and radiologic data. Finally, prospective studies evaluating the classifier's impact on surgical planning, patient counseling, and long-term clinical outcomes will be critical to assess real-world clinical utility; our retrospective study cannot address how the classifier would perform in pre-operative decision-making or whether it would meaningfully alter clinical management.
CONCLUSION
This study demonstrates the feasibility and clinical potential of DNA methylation profiling for stratifying tumor invasiveness and driver mutation status in pediatric thyroid carcinomas. By identifying robust epigenetic signatures associated with both invasive behavior and specific genetic alterations, we show that methylation-based classifiers can enhance existing diagnostic approaches and support risk-adapted management. Notably, methylation profiling outperformed genotype alone in resolving phenotypic heterogeneity, identifying outlier cases, and predicting metastatic potential, and it accurately predicted driver status in LN metastasis samples. These findings lay the groundwork for developing clinically actionable, epigenetic-based risk stratification tools in pediatric oncology.
Supplementary Material
Translational Relevance:
Thyroid carcinoma is the most common endocrine malignancy in children, and current guidelines recommend total thyroidectomy for nearly all pediatric cases. While effective, the procedure carries higher complication risks in children, including hypoparathyroidism and nerve injury. Improved preoperative diagnostics could reduce unnecessary surgeries and lifelong hormone dependence. Existing imaging-based approaches are subjective and variable. In this study, we demonstrate that genome-wide DNA methylation profiling robustly captures molecular features of pediatric thyroid carcinoma, including invasiveness and driver mutations. These findings support the potential of DNA methylation as a preoperative prognostic tool to inform treatment decisions and minimize surgical risk.
ACKNOWLEDGMENTS
The authors thank the Center for Applied Genomics Genotyping Core at the Children’s Hospital of Philadelphia for their help with array processing. This work was supported by the Children’s Hospital of Philadelphia’s Thyroid Center Frontier Program (A. Bauer, A. Franco, J. Filho); National Institutes of Health R35-GM146978 (W. Zhou); National Institutes of Health R01-CA214511 (A. Franco); Department of Defense: W81XWH2210655 (A. Franco).
Abbreviations List:
- AUC: Area under the curve
- BH: Benjamini-Hochberg
- DML: Differentially methylated loci
- FFPE: Formalin-fixed paraffin-embedded
- FTC: Follicular thyroid carcinoma
- GSEA: Gene set enrichment analysis
- HI: High-invasive
- KF: Kinase fusions
- LEUKO: Leukocyte-infiltrated
- LI: Low-invasive
- LN: Lymph node
- LOOCV: Leave-one-out cross-validation
- PTC: Papillary thyroid carcinoma
- PTMC: Papillary thyroid microcarcinoma
- RF: Random Forest
- ROC: Receiver operating characteristic
- SHAP: SHapley Additive exPlanations
- TC: Thyroid carcinoma
- TFBS: Transcription factor binding site
- TNM: Tumor, Node, Metastasis (staging system)
- t-SNE: t-distributed stochastic neighbor embedding
Footnotes
AUTHOR DISCLOSURES
WZ received Infinium arrays from Illumina Inc. for research.
Conflict of Interest: WZ received funding and Infinium arrays from Illumina Inc. for research.
REFERENCES
- 1.Guleria P, Srinivasan R, Rana C, Agarwal S. Molecular landscape of pediatric thyroid cancer: A review. Diagnostics (Basel). 2022;12. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 2.Bernier M-O, Withrow DR, Berrington de Gonzalez A, Lam CJK, Linet MS, Kitahara CM, et al. Trends in pediatric thyroid cancer incidence in the United States, 1998–2013. Cancer. 2019;125:2497–505. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 3.Hogan AR, Zhuge Y, Perez EA, Koniaris LG, Lew JI, Sola JE. Pediatric thyroid carcinoma: incidence and outcomes in 1753 patients. J Surg Res. 2009;156:167–72. [DOI] [PubMed] [Google Scholar]
- 4.Jarzab B, Handkiewicz-Junak D. Differentiated thyroid cancer in children and adults: same or distinct disease? Hormones (Athens). 2007;6:200–9. [PubMed] [Google Scholar]
- 5.Alzahrani AS, Alkhafaji D, Tuli M, Al-Hindi H, Sadiq BB. Comparison of differentiated thyroid cancer in children and adolescents (≤20 years) with young adults. Clin Endocrinol (Oxf). 2016;84:571–7. [DOI] [PubMed] [Google Scholar]
- 6.Dinauer CA, Breuer C, Rivkees SA. Differentiated thyroid cancer in children: diagnosis and management. Curr Opin Oncol. 2008;20:59–65. [DOI] [PubMed] [Google Scholar]
- 7.Lai S-TT, Bauer AJ. Approach to the Pediatric Patient with Thyroid Nodules. J Clin Endocrinol Metab. 2025;110:2339–2352. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 8.Handkiewicz-Junak D, Niedziela M, Lewiński A, Bosowski A, Chmielik E, Czarniecka A, et al. Diagnostics and treatment of differentiated thyroid carcinoma in children - Guidelines of the Polish National Scientific Societies, 2024 Update. Endokrynol Pol. 2024;75:565–91. [DOI] [PubMed] [Google Scholar]
- 9.Lebbink CA, Links TP, Czarniecka A, Dias RP, Elisei R, Izatt L, et al. 2022 European Thyroid Association Guidelines for the management of pediatric thyroid nodules and differentiated thyroid carcinoma. Eur Thyroid J. 2022;11. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 10.Francis GL, Waguespack SG, Bauer AJ, Angelos P, Benvenga S, Cerutti JM, et al. Management Guidelines for Children with Thyroid Nodules and Differentiated Thyroid Cancer. Thyroid. 2015;25:716–59. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 11.Stack BC, Twining C, Rastatter J, Angelos P, Baloch Z, Diercks G, et al. Consensus statement by the American Association of Clinical Endocrinology (AACE) and the American Head and Neck Society Endocrine Surgery Section (AHNS-ES) on Pediatric Benign and Malignant Thyroid Surgery. Head Neck. 2021;43:1027–42. [DOI] [PubMed] [Google Scholar]
- 12.Qian ZJ, Jin MC, Meister KD, Megwalu UC. Pediatric Thyroid Cancer Incidence and Mortality Trends in the United States, 1973–2013. JAMA Otolaryngol Head Neck Surg. 2019;145:617–23. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 13.Pfeifer GP. Defining driver DNA methylation changes in human cancer. Int J Mol Sci. 2018;19. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 14.Zingg JM, Jones PA. Genetic and epigenetic aspects of DNA methylation on genome expression, evolution, mutation and carcinogenesis. Carcinogenesis. 1997;18:869–82. [DOI] [PubMed] [Google Scholar]
- 15.MC Barros-Filho, MB Dos Reis, Beltrami CM, de Mello JBH, Marchi FA, Kuasne H, et al. DNA Methylation-Based Method to Differentiate Malignant from Benign Thyroid Lesions. Thyroid. 2019;29:1244–54. [DOI] [PubMed] [Google Scholar]
- 16.Nonaka D. A study of FoxA1 expression in thyroid tumors. Hum Pathol. 2017;65:217–24. [DOI] [PubMed] [Google Scholar]
- 17.Cancer Genome Atlas Research Network. Integrated genomic characterization of papillary thyroid carcinoma. Cell. 2014;159:676–90. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 18.Marczyk VR, Recamonde-Mendoza M, Maia AL, Goemann IM. Classification of thyroid tumors based on DNA methylation patterns. Thyroid. 2023;33:1090–9. [DOI] [PubMed] [Google Scholar]
- 19.Bisarro Dos Reis M, Barros-Filho MC, Marchi FA, Beltrami CM, Kuasne H, Pinto CAL, et al. Prognostic Classifier Based on Genome-Wide DNA Methylation Profiling in Well-Differentiated Thyroid Tumors. J Clin Endocrinol Metab. 2017;102:4089–99. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 20.Henke LE, Perkins SM, Pfeifer JD, Ma C, Chen Y, DeWees T, et al. BRAF V600E mutational status in pediatric thyroid cancer. Pediatr Blood Cancer. 2014;61:1168–72. [DOI] [PubMed] [Google Scholar]
- 21.de Sousa MSA, Nunes IN, Christiano YP, Sisdelli L, Cerutti JM. Genetic alterations landscape in paediatric thyroid tumours and/or differentiated thyroid cancer: Systematic review. Rev Endocr Metab Disord. 2024;25:35–51.
- 22.Li Y, Wang Y, Li L, Qiu X. The clinical significance of BRAFV600E mutations in pediatric papillary thyroid carcinomas. Sci Rep. 2022;12:12674.
- 23.Givens DJ, Buchmann LO, Agarwal AM, Grimmer JF, Hunt JP. BRAF V600E does not predict aggressive features of pediatric papillary thyroid carcinoma. Laryngoscope. 2014;124:E389–93.
- 24.Lupi C, Giannini R, Ugolini C, Proietti A, Berti P, Minuto M, et al. Association of BRAF V600E mutation with poor clinicopathological outcomes in 500 consecutive cases of papillary thyroid carcinoma. J Clin Endocrinol Metab. 2007;92:4085–90.
- 25.Howell GM, Hodak SP, Yip L. RAS mutations in thyroid cancer. Oncologist. 2013;18:926–32.
- 26.Yang AT, Lai S-TT, Laetsch TW, Bhatti T, Baloch Z, Surrey LF, et al. Molecular landscape and therapeutic strategies in pediatric differentiated thyroid carcinoma. Endocr Rev. 2025;46:397–417.
- 27.Franco AT, Ricarte-Filho JC, Isaza A, Jones Z, Jain N, Mostoufi-Moab S, et al. Fusion oncogenes are associated with increased metastatic capacity and persistent disease in pediatric thyroid cancers. J Clin Oncol. 2022;40:1081–90.
- 28.Suchy B, Waldmann V, Klugbauer S, Rabes HM. Absence of RAS and p53 mutations in thyroid carcinomas of children after Chernobyl in contrast to adult thyroid tumours. Br J Cancer. 1998;77:952–5.
- 29.Nikiforov YE, Nikiforova MN, Gnepp DR, Fagin JA. Prevalence of mutations of ras and p53 in benign and malignant thyroid tumors from children exposed to radiation after the Chernobyl nuclear accident. Oncogene. 1996;13:687–93.
- 30.Nikiforov YE. RET/PTC rearrangement in thyroid tumors. Endocr Pathol. 2002;13:3–16.
- 31.Nikiforov YE, Nikiforova MN. Molecular genetics and diagnosis of thyroid cancer. Nat Rev Endocrinol. 2011;7:569–80.
- 32.Nikiforov YE. Thyroid carcinoma: molecular pathways and therapeutic targets. Mod Pathol. 2008;21 Suppl 2:S37–43.
- 33.Kaur D, Lee SM, Goldberg D, Spix NJ, Hinoue T, Li H-T, et al. Comprehensive evaluation of the Infinium human MethylationEPIC v2 BeadChip. Epigenetics Commun. 2023;3.
- 34.Lee SM, Loo CE, Prasasya RD, Bartolomei MS, Kohli RM, Zhou W. Low-input and single-cell methods for Infinium DNA methylation BeadChips. Nucleic Acids Res. 2024;52:e38.
- 35.Zhou W, Triche TJ, Laird PW, Shen H. SeSAMe: reducing artifactual detection of DNA methylation by Infinium BeadChips in genomic deletions. Nucleic Acids Res. 2018;46:e123.
- 36.Colaprico A, Silva TC, Olsen C, Garofano L, Cava C, Garolini D, et al. TCGAbiolinks: an R/Bioconductor package for integrative analysis of TCGA data. Nucleic Acids Res. 2016;44:e71.
- 37.Steiert TA, Parra G, Gut M, Arnold N, Trotta J-R, Tonda R, et al. A critical spotlight on the paradigms of FFPE-DNA sequencing. Nucleic Acids Res. 2023;51:7143–62.
- 38.Wilkerson MD, Hayes DN. ConsensusClusterPlus: a class discovery tool with confidence assessments and item tracking. Bioinformatics. 2010;26:1572–3.
- 39.Teschendorff AE, Breeze CE, Zheng SC, Beck S. A comparison of reference-based algorithms for correcting cell-type heterogeneity in Epigenome-Wide Association Studies. BMC Bioinformatics. 2017;18:105.
- 40.Goldberg DC, Fu H, Atkins D, Moyer E, Lee CN, Deng Y, et al. KnowYourCG: Facilitating base-level sparse methylome interpretation. Sci Adv. 2025;11:eadw3027.
- 41.Ritchie ME, Phipson B, Wu D, Hu Y, Law CW, Shi W, et al. limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res. 2015;43:e47.
- 42.Horvath S. DNA methylation age of human tissues and cell types. Genome Biol. 2013;14:R115.
- 43.Chen BH, Zhou W. mLiftOver: harmonizing data across Infinium DNA methylation platforms. Bioinformatics. 2024;40.
- 44.Lundberg SM, Erion G, Chen H, DeGrave A, Prutkin JM, Nair B, et al. From Local Explanations to Global Understanding with Explainable AI for Trees. Nat Mach Intell. 2020;2:56–67.
- 45.Pozdeyev N, Gay LM, Sokol ES, Hartmaier R, Deaver KE, Davis S, et al. Genetic analysis of 779 advanced differentiated and anaplastic thyroid cancers. Clin Cancer Res. 2018;24:3059–68.
- 46.Li F, Liang A, Lv Y, Liu G, Jiang A, Liu P. MicroRNA-200c Inhibits Epithelial-Mesenchymal Transition by Targeting the BMI-1 Gene Through the Phospho-AKT Pathway in Endometrial Carcinoma Cells In Vitro. Med Sci Monit. 2017;23:5139–49.
- 47.Flavahan WA, Gaskell E, Bernstein BE. Epigenetic plasticity and the hallmarks of cancer. Science. 2017;357.
- 48.Cunningham R, Hansen CG. The Hippo pathway in cancer: YAP/TAZ and TEAD as therapeutic targets in cancer. Clin Sci. 2022;136:197–222.
- 49.Zhang L, Yang S, Chen X, Stauffer, Yu F, Lele SM, et al. The hippo pathway effector YAP regulates motility, invasion, and castration-resistant growth of prostate cancer cells. Mol Cell Biol. 2015;35:1350–62.
- 50.Liu X, Li H, Rajurkar M, Li Q, Cotton JL, Ou J, et al. Tead and AP1 coordinate transcription and motility. Cell Rep. 2016;14:1169–80.
- 51.Chen L-H, Xie T, Lei Q, Gu Y-R, Sun C-Z. A review of complex hormone regulation in thyroid cancer: novel insights beyond the hypothalamus-pituitary-thyroid axis. Front Endocrinol (Lausanne). 2024;15:1419913.
- 52.Hoffmann S, Maschuw K, Hassan I, Wunderlich A, Lingelbach S, Ramaswamy A, et al. Functional thyrotropin receptor attenuates malignant phenotype of follicular thyroid cancer cells. Endocrine. 2006;30:129–38.
- 53.Platet N, Cunat, Chalbos D, Rochefort H, Garcia M. Unliganded and liganded estrogen receptors protect against cancer invasion via different mechanisms. Mol Endocrinol. 2000;14:999–1009.
- 54.Hanahan D, Weinberg RA. Hallmarks of cancer: the next generation. Cell. 2011;144:646–74.
- 55.Meng F, Dai L. Transcription factors TP63 facilitates malignant progression of thyroid cancer by upregulating KRT17 expression and inducing epithelial-mesenchymal transition. Growth Factors. 2023;41:71–81.
- 56.Liu J, Lin PC, Zhou BP. Inflammation fuels tumor progress and metastasis. Curr Pharm Des. 2015;21:3032–40.
- 57.Li P, Zhang W, Wu Q, Zhang X, Zheng. Retinoid X receptor γ regulates epithelial-mesenchymal transition and tumor immune infiltration in papillary thyroid cancer tumorigenesis: an experimental and in silico study. Endocr Connect. 2025;14.
- 58.Peng Q, Xie T, Wang Y, Ho VW-S, Teoh JY-C, Chiu PK-F, et al. GLIS1, Correlated with Immune Infiltrates, Is a Potential Prognostic Biomarker in Prostate Cancer. Int J Mol Sci. 2023;25.
- 59.Zhou W, Reizel Y. On correlative and causal links of replicative epimutations. Trends Genet. 2025;41:60–75.
- 60.Johnstone SE, Gladyshev VN, Aryee MJ, Bernstein BE. Epigenetic clocks, aging, and cancer. Science. 2022;378:1276–7.
- 61.Romero R, Sayin VI, Davidson SM, Bauer MR, Singh SX, LeBoeuf SE, et al. Keap1 loss promotes Kras-driven lung cancer and results in dependence on glutaminolysis. Nat Med. 2017;23:1362–8.
- 62.Saito J, Kohn AD, Roth RA, Noguchi Y, Tatsumo I, Hirai A, et al. Regulation of FRTL-5 thyroid cell growth by phosphatidylinositol (OH) 3 kinase-dependent Akt-mediated signaling. Thyroid. 2001;11:339–51.
- 63.Bernstein E, Caudy AA, Hammond, Hannon GJ. Role for a bidentate ribonuclease in the initiation step of RNA interference. Nature. 2001;409:363–6.
- 64.Harvey KF, Tang TT. Targeting the Hippo pathway in cancer. Nat Rev Drug Discov. 2025;24:852–69.
- 65.Eferl R, Wagner EF. AP-1: a double-edged sword in tumorigenesis. Nat Rev Cancer. 2003;3:859–68.
- 66.Kondo T, Nakazawa T, Ma D, Niu, Mochizuki K, Kawasaki T, et al. Epigenetic silencing of TTF-1/NKX2-1 through DNA hypermethylation and histone H3 modulation in thyroid carcinomas. Lab Invest. 2009;89:791–9.
- 67.Chou C-K, Chi S-Y, Chou F-F, Huang S-C, Wang J-H, Chen C-C, et al. Aberrant Expression of Androgen Receptor Associated with High Cancer Risk and Extrathyroidal Extension in Papillary Thyroid Carcinoma. Cancers (Basel). 2020;12.
- 68.Parolia A, Cieslik M, Chu S-C, Xiao L, Ouchi T, Zhang Y, et al. Distinct structural classes of activating FOXA1 alterations in advanced prostate cancer. Nature. 2019;571:413–8.
- 69.Zafon C, Gil J, Pérez-González B, Jordà M. DNA methylation in thyroid cancer. Endocr Relat Cancer. 2019;26:R415–39.
- 70.Ellis RJ, Wang Y, Stevenson HS, Boufraqech M, Patel D, Nilubol N, et al. Genome-wide methylation patterns in papillary thyroid cancer are distinct based on histological subtype and tumor genotype. J Clin Endocrinol Metab. 2014;99:E329–37.
- 71.Minna E, Devecchi A, Pistore F, Paolini B, Mauro G, Penso DA, et al. Genomic and transcriptomic analyses of thyroid cancers identify DICER1 somatic mutations in adult follicular-patterned RAS-like tumors. Front Endocrinol (Lausanne). 2023;14:1267499.
- 72.Rodrigues L, Da Cruz Paula A, Soares P, Vinagre J. Unraveling the Significance of DGCR8 and miRNAs in Thyroid Carcinoma. Cells. 2024;13.
- 73.Kommoss FKF, Chong A-S, Chong A-L, Pfaff E, Jones DTW, Hiemcke-Jiwa LS, et al. Genomic characterization of DICER1-associated neoplasms uncovers molecular classes. Nat Commun. 2023;14:1677.
- 74.Konishi K, Watanabe Y, Shen L, Guo, Castoro RJ, Kondo K, et al. DNA methylation profiles of primary colorectal carcinoma and matched liver metastasis. PLoS ONE. 2011;6:e27889.
- 75.Vaidya H, Jelinek J, Issa J-PJ. DNA methylation, aging, and cancer. Epigenomes. 2025;9.
Associated Data
Data Availability Statement
The generated thyroid methylome profiles are available in the Gene Expression Omnibus (RRID:SCR_005012) under SuperSeries accession GSE312914. Software for array data preprocessing and functional analysis is available in the R/Bioconductor package SeSAMe (version 3.22+): https://bioconductor.org/packages/release/bioc/html/sesame.html. Trained models and the training and testing scripts are available at https://github.com/jennyznli/2025_TC. Additional raw data are available from the corresponding author upon request.
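For readers who want to fetch the deposited files directly, the short Python sketch below derives the NCBI GEO FTP directory for a series accession such as GSE312914. It assumes GEO's usual folder convention, in which series are grouped under a parent folder named after the accession with its last three digits replaced by "nnn" (for example, GSE312914 sits under GSE312nnn). The helper name `geo_series_ftp_dir` is hypothetical and not part of the authors' repository or of SeSAMe. Whether this SuperSeries keeps its raw files in its own `suppl/` subfolder or in those of its SubSeries is also an assumption the reader should check on the GEO page.

```python
import re


def geo_series_ftp_dir(accession: str) -> str:
    """Return the assumed NCBI GEO FTP directory URL for a GSE accession.

    Hypothetical helper: it relies on the GEO convention of grouping series
    into folders named after the accession with its last three digits
    replaced by 'nnn' (e.g. GSE312914 -> GSE312nnn).
    """
    match = re.fullmatch(r"GSE(\d+)", accession)
    if match is None:
        raise ValueError(f"not a GSE accession: {accession!r}")
    digits = match.group(1)
    # Accessions with three or fewer digits fall into the 'GSEnnn' folder,
    # because digits[:-3] is then an empty string.
    parent = "GSE" + digits[:-3] + "nnn"
    return f"https://ftp.ncbi.nlm.nih.gov/geo/series/{parent}/{accession}/"


if __name__ == "__main__":
    print(geo_series_ftp_dir("GSE312914"))
```

Running the script prints `https://ftp.ncbi.nlm.nih.gov/geo/series/GSE312nnn/GSE312914/`. Downloaded IDAT files could then be preprocessed with SeSAMe in R, as described above.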
