2023 Jan 16;14:232. doi: 10.1038/s41467-022-34828-y

Fig. 5. Cross-species analysis of DNA methylation in the human ortholog gene space identifies both conservation and divergence of promoter methylation.


a UMAP representation of DNA methylation at gene promoters based on cross-mapping of reference-free consensus reference fragments to annotated reference genomes. Samples are colored by taxonomic group, and the matched reference genomes are overlaid in black. Each sample is labeled by its sample identifier (Supplementary Data 1), which is searchable and readable when zooming into the PDF of the figure. Reference genomes are annotated by their UCSC Genome Browser identifiers (e.g., aquChr2 for the golden eagle genome, as described in the Methods section). Inset: UMAP representation of scrambled data, showing the lack of clustering in a control analysis. b ROC curves for random forest classifiers using the cross-mapped dataset to distinguish between heart and liver based on promoter methylation data for birds and mammals. The solid lines are based on the actual data, while the dashed lines are based on scrambled data (as in the inset in panel a). ROC-AUC values are given for the actual data (first) and scrambled data (second). c Boxplots showing DNA methylation levels at gene promoters for the four most predictive genes in the classification of heart versus liver, aggregated by taxonomic groups and overlaid with individual data points using the species abbreviations (Supplementary Data 2). Gene names and the predictiveness (feature importance) of their promoter methylation are indicated in the header bars. P-values were calculated using a two-sided Wilcoxon test. d ROC curves for random forest classifiers using the cross-mapped dataset to distinguish between birds and mammals based on promoter methylation data for heart and liver samples. The format is identical to panel b. e Boxplots showing DNA methylation levels at gene promoters for the four most predictive genes in the classification of mammals versus birds. The format is identical to panel c.
Boxplots are specified as follows: center line, median; box limits, upper and lower quartiles; whiskers, 1.5× interquartile range; points, outliers.
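As an illustration of the boxplot convention stated above, the following minimal Python sketch computes the summary values for one group of methylation levels. It is not the authors' plotting code. The function name `boxplot_summary` and the example values are hypothetical. Two assumptions are made: quartiles use linear interpolation (the "inclusive" method), and whiskers end at the most extreme data point lying within 1.5× the interquartile range of the box, which is a common reading of this convention.

```python
import statistics


def boxplot_summary(values):
    """Summarize one group of values under the stated boxplot convention.

    Assumptions (not taken from the figure legend): quartiles use linear
    interpolation, and whiskers stop at the most extreme data point within
    1.5 x IQR of the box. Requires at least two values.
    """
    data = sorted(values)
    # Lower quartile, median, upper quartile (box limits and center line).
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    # Points inside the fences define the whiskers; the rest are outliers.
    inliers = [v for v in data if low_fence <= v <= high_fence]
    outliers = [v for v in data if v < low_fence or v > high_fence]
    return {
        "median": median,
        "q1": q1,
        "q3": q3,
        "whisker_low": min(inliers),
        "whisker_high": max(inliers),
        "outliers": outliers,
    }


# Hypothetical promoter methylation levels (percent) for one taxonomic group.
example = [10, 12, 14, 16, 18, 20, 22, 24, 90]
summary = boxplot_summary(example)
print(summary)
```

In this made-up example the value 90 lies beyond the upper fence and would be drawn as an individual outlier point, while the upper whisker would stop at 24.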