Abstract
PlantEMS is designed to advance plant epigenetics research by providing a comprehensive repository of multi-omics and multi-modification data. This resource enables detailed investigations into the epigenetic regulatory mechanisms underlying essential plant traits and responses, potentially informing innovative strategies for crop management, monitoring, and development.
Dear Editor,
Epigenetic modifications are regulatory codes that control gene expression and can be stably inherited without alterations in the genomic sequence (Allis and Jenuwein, 2016). Numerous studies have demonstrated that epigenetic modifications regulate critical biological processes in plants, such as growth, development, response to environmental stressors, and reproduction (Zhang et al., 2023; Bulgakov, 2024).
The use of high-throughput sequencing technologies in plant research has provided insights into the epigenetic “landscape” of organisms, driving the development of numerous bioinformatics databases. However, none of the existing databases comprehensively encompass genomics, transcriptomics, and proteomics. Additionally, current epigenetic modification databases lack specialized chemical modification data tailored to multiple plant species.
Here, we introduce PlantEMS (Figure 1), a free, registration-free, specialized database for plants (http://plantems.lin-group.cn/plantems/index.jsp) that integrates multi-omics data on epigenetic modification sites, including DNA modifications, RNA modifications, and post-translational modifications (PTMs). As the first multi-omics database in plant epigenetics, PlantEMS collects 57 types of chemical modifications spanning genomics, transcriptomics, and proteomics from 54 plant species, with a total of 13 175 137 modification sites. To facilitate analysis and visualization, PlantEMS offers species information overviews, data quality browsing, feature distribution visualization, and an integrated genome browser. PlantEMS also provides a machine-learning-based web server that predicts epigenetic modification sites for multiple species and modification types, supporting large-scale annotation of multi-omics epigenetic modification data. Its integration of multiple omics levels and modification types distinguishes PlantEMS from previously developed plant-specific epigenetic modification databases such as PRMD and AraENCODE (Wang et al., 2023; Lang et al., 2024) (Supplemental Figure 1).
Figure 1.
Overview of PlantEMS
PlantEMS compiles experimentally confirmed epigenetic modification datasets from published literature and existing databases, covering 54 plant species. PlantEMS includes 13 009 568 DNA modification sites of four types from six plants, 33 337 RNA modification sites of 28 types from 33 plants, and 132 232 PTM sites of 25 types from 28 plants. Additionally, PlantEMS integrates 12 prediction tools, each specific to one combination of omics level, modification type, and species.
Data summary
The PlantEMS database encompasses epigenetic modification data from 54 plant species, derived from genomics, transcriptomics, and proteomics. Among these, Arabidopsis thaliana and Fragaria vesca have datasets spanning all three omics levels (Supplemental Figure 2A). The current version of PlantEMS includes 13 175 137 modification sites across these omics levels. Specifically, it incorporates 13 009 568 sites of four DNA modification types from six plants, 33 337 sites of 28 RNA modification types from 33 plants, and 132 232 sites of 25 PTM types from 28 plants (Supplemental Figures 2B–2D).
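The reported total can be checked against the per-level counts with a few lines of arithmetic. The sketch below uses only the figures quoted in this section; the dictionary and function names are illustrative and are not part of PlantEMS.

```python
# Site counts per omics level, as reported in the data summary.
SITE_COUNTS = {
    "DNA": 13_009_568,
    "RNA": 33_337,
    "PTM": 132_232,
}


def total_sites(counts):
    """Sum modification sites across omics levels."""
    return sum(counts.values())


def shares(counts):
    """Return the fraction of all sites contributed by each omics level."""
    total = total_sites(counts)
    return {level: n / total for level, n in counts.items()}


if __name__ == "__main__":
    print(total_sites(SITE_COUNTS))
    for level, fraction in shares(SITE_COUNTS).items():
        print(f"{level}: {fraction:.2%}")
```

The three counts sum to the stated 13 175 137 sites, and DNA modifications make up about 98.7% of them, so any analysis pooling all three levels will be dominated by DNA sites.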
Within the realm of DNA modifications, 5-methylcytosine modifications are predominant, totaling 10 169 579 instances, followed by N4-methylcytosine modifications, which account for 1 924 790 instances. The least frequent are 5-hydroxymethylcytosine modifications, with 3969 instances noted (Supplemental Figure 3A). N6-methyladenine modifications are present in six plant species (Arabidopsis thaliana, Oryza sativa ssp. Nipponbare, Oryza sativa ssp. Japonica, Rosa chinensis, Casuarina equisetifolia, and Fragaria vesca), while 5-hydroxymethylcytosine modifications are exclusive to Oryza sativa ssp. Nipponbare (Supplemental Figure 3B). Additionally, the top 10 genes with the highest number of modification sites were identified (Supplemental Figure 3C).
The RNA data comprise 33 337 modification sites of 28 types from 33 plants. N6-methyladenosine is the most abundant RNA modification, with a total of 21 129 sites (Supplemental Figure 3D). The modification classes shared by the most species are Y, i6A|t6A, and m1A|m1I|ms2i6A, found in 27, 25, and 24 plants, respectively (Supplemental Figure 3E). Notably, the RPOB gene exhibits the highest number of RNA modification sites (Supplemental Figure 3F).
At the proteomic level, phosphorylation sites are the most prevalent, numbering 41 888, closely followed by 2-hydroxyisobutyrylation, which accounts for 23 059 instances (Supplemental Figure 3G). Phosphorylation, as the most ubiquitous modification type, is observed in 14 out of the 28 plants (Supplemental Figure 3H). Among the top 10 proteins, ATCG00490.1 has 118 PTM sites (Supplemental Figure 3I).
Browse PlantEMS
We have developed three browsing modules corresponding to epigenetic modifications of the genome, transcriptome, and proteome. At the top of the website, users can switch between different omics modules (Supplemental Figure 4, top). Within each module, users can select the plant species and modification type to browse. Upon clicking the “browse” button, a corresponding information panel appears below, where users can view specific details on different modifications, including data sources, sequencing techniques, feature distributions, and data quality (Supplemental Figure 4, bottom).
On the DNA and RNA browsing pages, users can access overview, density, and genomic feature sections. In the overview section, users can click on the “PMID” hyperlink to access additional information. The density section shows the distribution density of different modification types along the chromosomes. The genomic feature section displays the number of modification sites in various genomic regions (such as exons, introns, and intergenic regions) of the queried species. The DNA page includes information on coverage and score. Coverage refers to the number of sequencing reads that cover a specific genomic position. Higher coverage provides stronger evidence supporting the methylation status of the location, thereby enhancing the accuracy and reliability of methylation detection. The score is a quality metric used to indicate the reliability of methylation events detected at specific genomic locations. A higher score reflects greater confidence in the methylation status of that site.
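To make the coverage, score, and density notions concrete, here is a minimal sketch of how a user might filter downloaded DNA sites and bin them along chromosomes. The record layout, the 0 to 1 score range, the thresholds, and the bin size are all assumptions chosen for illustration; they do not describe the actual PlantEMS file format or the website's implementation.

```python
from collections import Counter

# Hypothetical records: (chromosome, position, coverage, score).
# The score is assumed here to lie between 0 and 1.
SITES = [
    ("Chr1", 1_200, 12, 0.95),
    ("Chr1", 1_850, 3, 0.99),
    ("Chr1", 15_400, 20, 0.40),
    ("Chr1", 18_900, 25, 0.91),
    ("Chr2", 700, 9, 0.88),
]


def filter_sites(sites, min_coverage=5, min_score=0.8):
    """Keep sites supported by enough reads and a high enough score."""
    return [s for s in sites if s[2] >= min_coverage and s[3] >= min_score]


def density(sites, bin_size=10_000):
    """Count sites per (chromosome, bin index) for fixed-width bins."""
    return Counter((chrom, pos // bin_size) for chrom, pos, *_ in sites)


if __name__ == "__main__":
    kept = filter_sites(SITES)
    for (chrom, bin_index), count in sorted(density(kept).items()):
        start = bin_index * 10_000
        print(f"{chrom}:{start}-{start + 9_999}\t{count}")
```

Filtering before binning means the density profile reflects only well-supported sites; with the example thresholds, the low-coverage and low-score records are dropped.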
On the PTM browsing page, we provide overview, mass error, and peptide score sections. The mass error refers to the difference between the measured ion mass (m/z value) and the theoretical mass. This metric reflects the deviation between experimental data and theoretical or expected data, typically expressed in ppm or Da. The peptide score is a metric used to evaluate the confidence of specific peptide identification results. This score is based on the degree of match between the mass spectrometry data of peptides obtained from experiments and the theoretical peptide data in the database.
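The mass error described above follows the standard definitions: the absolute error in Da is the measured value minus the theoretical value, and the relative error in ppm divides that difference by the theoretical value and multiplies by 10^6. A small sketch follows; the function names and the 10 ppm tolerance are illustrative assumptions, not PlantEMS settings.

```python
def mass_error_da(measured, theoretical):
    """Absolute mass error in the same unit as the inputs (Da or m/z)."""
    return measured - theoretical


def mass_error_ppm(measured, theoretical):
    """Relative mass error in parts per million."""
    if theoretical <= 0:
        raise ValueError("theoretical mass must be positive")
    return (measured - theoretical) / theoretical * 1e6


def within_tolerance(measured, theoretical, tol_ppm=10.0):
    """Check whether the absolute ppm error is within a tolerance."""
    return abs(mass_error_ppm(measured, theoretical)) <= tol_ppm


if __name__ == "__main__":
    # A 0.005 deviation at m/z 1000 corresponds to roughly 5 ppm.
    print(round(mass_error_ppm(1000.005, 1000.0), 3))
    print(within_tolerance(1000.02, 1000.0))
```

Because ppm is a relative measure, the same absolute deviation in Da corresponds to a larger ppm error for lighter ions than for heavier ones.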
Search PlantEMS
To facilitate data retrieval, we have developed search modules across three omics levels. At the top of the website, users can switch between different omics modules to perform searches (Supplemental Figure 5, top). PlantEMS offers a variety of search methods to help users retrieve specific query results. For DNA and RNA queries, available search options include search by gene, search by position, search by modification type, and search by ID. For PTM queries, the provided methods are search by species, search by peptide, search by accession, search by modification type, and search by ID.
After selecting a specific search method, users can input the required parameters to perform the search (Supplemental Figure 5, middle left: using search by gene in the DNA module as an example). On the search results page, users can choose to prioritize the display of information of interest (Supplemental Figure 5, middle right). For PTM search results, we have included additional information regarding the crosstalk of sequence sites. The query parameters specified by the user are displayed above the query results (Supplemental Figure 5, bottom). For DNA and RNA search results, users can click on the “view” hyperlink to access a graphical user interface with an integrated genome browser for visualizing aligned reads and regions of interest. For PTM search results, users can click on the “accession” hyperlink to access the UniProt database for more information.
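The DNA and RNA search options (by gene, position, modification type, and ID) can be pictured as simple filters over site records. The sketch below works on a local, in-memory list; the `Site` fields, identifiers, and gene names are hypothetical placeholders and do not reflect the PlantEMS schema or any server-side API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Site:
    site_id: str   # hypothetical identifier
    gene: str      # hypothetical gene name
    chrom: str
    position: int  # 1-based coordinate
    mod_type: str


SITES = [
    Site("S1", "GENE_A", "Chr1", 1_200, "5mC"),
    Site("S2", "GENE_A", "Chr1", 1_850, "6mA"),
    Site("S3", "GENE_B", "Chr1", 15_400, "5mC"),
    Site("S4", "GENE_C", "Chr2", 700, "4mC"),
]


def search_by_gene(sites, gene):
    """Return all sites annotated to the given gene."""
    return [s for s in sites if s.gene == gene]


def search_by_position(sites, chrom, start, end):
    """Return sites on a chromosome within an inclusive coordinate range."""
    return [s for s in sites if s.chrom == chrom and start <= s.position <= end]


def search_by_type(sites, mod_type):
    """Return all sites of one modification type."""
    return [s for s in sites if s.mod_type == mod_type]


def search_by_id(sites, site_id):
    """Return the site with the given ID, or None if it is absent."""
    return next((s for s in sites if s.site_id == site_id), None)


if __name__ == "__main__":
    for hit in search_by_position(SITES, "Chr1", 1_000, 2_000):
        print(hit)
```

For large downloads, a linear scan like this would be slow; sorting by position or loading the data into an indexed table would be a more practical design.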
Prediction tools in PlantEMS
To assist with the precise and efficient computational prediction of epigenetic modification sites, we have provided machine-learning prediction tools for identifying modification sites across multiple species and modification types. At the top of the navigation bar, users can freely select site prediction tools from three different omics categories (Supplemental Figure 6, top). Upon entering the respective omics tool page, users can choose the modification type and species of interest and then either input the relevant prediction sequence directly or click the “upload” button to submit a FASTA file. Finally, they can click the “submit” button to initiate the prediction (Supplemental Figure 6, bottom right). The resulting prediction page will display the current sequence, modification sites, and prediction results. Users can click the “download” button to retrieve the prediction data (Supplemental Figure 6, bottom left). A total of 12 online predictors are available in the current version of PlantEMS. At the genomic level, we offer tools for identifying N4-methylcytosine in two species (Casuarina equisetifolia and Fragaria vesca) and N6-methyladenine in three species (Arabidopsis thaliana, Casuarina equisetifolia, and Fragaria vesca). At the transcriptomic level, a tool is available for identifying 5-methylcytidine in A. thaliana. At the proteomic level, we provide tools for identifying six different PTMs in Oryza sativa, including 2-hydroxyisobutyrylation, acetylation, crotonylation, malonylation, succinylation, and ubiquitination. Details on the model construction process and performance evaluation can be found in the supplemental information and in references Lv et al. (2020a, 2020b, 2021) and Dao et al. (2020).
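Before submitting sequences to a predictor, it can help to check the FASTA input locally. The sketch below parses FASTA text and extracts fixed-length windows centered on each candidate residue (for example, every cytosine when looking for N4-methylcytosine). Centered windows are a common input format for sequence-based site predictors, but the window length, the padding character, and the function names here are assumptions for illustration and do not describe the PlantEMS models.

```python
def parse_fasta(text):
    """Parse FASTA text into a dict mapping record name to sequence."""
    records = {}
    name = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            header = line[1:].split()
            # Use the first word of the header as the record name.
            name = header[0] if header else ""
            chunks = []
        elif name is not None:
            chunks.append(line.upper())
    if name is not None:
        records[name] = "".join(chunks)
    return records


def candidate_windows(seq, residue, flank=20, pad="N"):
    """Yield (1-based position, window) for each occurrence of residue.

    Each window has length 2 * flank + 1 with the residue at its center;
    positions near the sequence ends are padded with the pad character.
    """
    padded = pad * flank + seq + pad * flank
    for i, base in enumerate(seq):
        if base == residue:
            yield i + 1, padded[i : i + 2 * flank + 1]


if __name__ == "__main__":
    fasta = ">seq1 example\nACGT\nCA\n"
    for record_name, sequence in parse_fasta(fasta).items():
        for position, window in candidate_windows(sequence, "C", flank=2):
            print(record_name, position, window)
```

With `flank=2`, the example sequence ACGTCA yields windows at positions 2 and 5, each five characters long with the cytosine in the middle.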
Download data from PlantEMS
For data download, we have developed three separate modules for accessing modification data from the three omics levels. Users can navigate to the corresponding omics page, select a species, and click to download the relevant data. All data are available for download without restrictions.
Funding
This work was supported by the National Natural Science Foundation of China (82130112 and 62402089), the Natural Science Foundation of Sichuan Province (2025ZNSFSC1465), and the China Postdoctoral Science Foundation (2023TQ0047 and GZC20230380).
Acknowledgments
No conflict of interest is declared.
Author contributions
Conceptualization, F.D., M.J.F., H. Lv, and H. Lin; writing – original draft preparation, F.D. and X.X.; writing – review & editing, M.J.F., H. Lv, and H. Lin; methodology, F.D., H.Z., C.W., and L.N.; visualization, X.X., W.S., H. Lai, and B.L.; software, H.Z. and Z.G.; data curation, Z.G., C.W., Y.W., F.H., X.L., S.X., D.G., Y.Y., and J.H.; formal analysis, Y.Z., S.L., Y.H., and C.L.C.Y.; supervision, M.J.F., H. Lv, and H. Lin.
Published: December 20, 2024
Footnotes
Supplemental information is available at Plant Communications Online.
Contributor Information
Melissa Jane Fullwood, Email: mfullwood@ntu.edu.sg.
Hao Lin, Email: hlin@uestc.edu.cn.
Hao Lv, Email: hao.lyu@uestc.edu.cn.
References
- Allis C.D., Jenuwein T. The molecular hallmarks of epigenetic control. Nat. Rev. Genet. 2016;17:487–500. doi: 10.1038/nrg.2016.59.
- Bulgakov V.P. Chromatin modifications and memory in regulation of stress-related polyphenols: finding new ways to control flavonoid biosynthesis. Crit. Rev. Biotechnol. 2024;44:1478–1494. doi: 10.1080/07388551.2024.2336529.
- Dao F.Y., Lv H., Yang Y.H., Zulfiqar H., Gao H., Lin H. Computational identification of N6-methyladenosine sites in multiple tissues of mammals. Comput. Struct. Biotechnol. J. 2020;18:1084–1091. doi: 10.1016/j.csbj.2020.04.015.
- Lang X., Yu C., Shen M., Gu L., Qian Q., Zhou D., Tan J., Li Y., Peng X., Diao S., et al. PRMD: an integrated database for plant RNA modifications. Nucleic Acids Res. 2024;52:D1597–D1613. doi: 10.1093/nar/gkad851.
- Lv H., Zhang Z.M., Li S.H., Tan J.X., Chen W., Lin H. Evaluation of different computational methods on 5-methylcytosine sites identification. Briefings Bioinf. 2020;21:982–995. doi: 10.1093/bib/bbz048.
- Lv H., Dao F.Y., Guan Z.X., Yang H., Li Y.W., Lin H. Deep-Kcr: accurate detection of lysine crotonylation sites using deep learning method. Briefings Bioinf. 2021;22. doi: 10.1093/bib/bbaa255.
- Lv H., Dao F.Y., Zhang D., Guan Z.X., Yang H., Su W., Liu M.L., Ding H., Chen W., Lin H. iDNA-MS: An Integrated Computational Tool for Detecting DNA Modification Sites in Multiple Genomes. iScience. 2020;23. doi: 10.1016/j.isci.2020.100991.
- Wang Z., Liu M., Lai F., Fu Q., Xie L., Fang Y., Zhou Q., Li G. AraENCODE: A comprehensive epigenomic database of Arabidopsis thaliana. Mol. Plant. 2023;16:1113–1116. doi: 10.1016/j.molp.2023.06.005.
- Zhang H., Jin Z., Cui F., Zhao L., Zhang X., Chen J., Zhang J., Li Y., Li Y., Niu Y., et al. Epigenetic modifications regulate cultivar-specific root development and metabolic adaptation to nitrogen availability in wheat. Nat. Commun. 2023;14:8238. doi: 10.1038/s41467-023-44003-6.

