Abstract
Dynamic nucleosome positioning affects chromatin state, transcription and all other biological processes that occur on genomic DNA. As MNase-Seq has been widely used to map nucleosome positioning in eukaryotes in recent years, nucleosome positioning data are increasing dramatically. To facilitate the reuse of published data across studies, we developed a database named nucleosome positioning map (NucMap, http://bigd.big.ac.cn/nucmap). NucMap includes 798 experimental datasets from 477 samples across 15 species. With a series of functional modules, users can search the nucleosome positioning profile at the promoter region of each gene across all samples and perform enrichment analysis of nucleosome positioning data across genomic regions. A nucleosome browser was built to visualize nucleosome positioning profiles. Users can also visualize multiple sources of omics data in the nucleosome browser and make side-by-side comparisons. All processed data in the database are freely available. NucMap is the first comprehensive nucleosome positioning platform, and it will serve as an important resource to facilitate the understanding of chromatin regulation.
INTRODUCTION
Eukaryotic genomic DNA is tightly packaged into compact nucleosome arrays; nucleosomes are the fundamental units of chromatin structure (1). The term ‘nucleosome positioning’ is widely used to describe where nucleosomes are located along the genomic DNA sequence (2–4). In the nucleus, nucleosomes dynamically switch between depletion and de novo occupancy on genomic DNA, affecting all biological processes that occur on genomic DNA (5–7). It has further been reported that nucleosome positioning affects transcription initiation and elongation (8). The transcriptional machinery must gain access to chromatin to carry out the sequential steps of gene transcription (7), and nucleosome organization can influence gene activity by controlling the accessibility of transcription factor binding sites (9). Some studies have suggested that nucleosome positioning influences the evolution of DNA sequence (10–12), because DNA repair machinery differs in its access to linker DNA and nucleosomal DNA (13).
To date, many different methods have been developed to map nucleosomes, such as predicting nucleosome positioning from DNA sequence features (14,15), histone ChIP-Seq (16), or chromatin accessibility profiles (17). However, all these methods are limited in either resolution or genome-wide coverage. MNase-Seq is another prevalent experimental approach to nucleosome mapping. In this approach, chromatin is digested with micrococcal nuclease (MNase), which preferentially cleaves linker DNA, and the nucleosome-protected DNA fragments are then deep-sequenced (18,19). Many computational tools have been developed to support the analysis of MNase-Seq data (20). Several programs have been published to identify nucleosome peaks, such as DANPOS (21) and iNPS (22). To better understand the role of nucleosomes, it is important to compare nucleosome profiles across different conditions or cell types, and multiple tools have been developed to identify differential nucleosome regions, such as DANPOS (21), DiNuP (23) and Dimnp (24). In recent years, a large number of studies have employed MNase-Seq to map nucleosome positioning in eukaryotes ranging from yeast to human (5,18,25,26). Consequently, MNase-Seq data are rapidly accumulating across a wide variety of organisms. A platform that collects and integrates published data and makes datasets from different studies reusable and comparable is therefore urgently needed, as it would greatly help biologists understand the biology behind nucleosome positioning. However, no such database or platform has been reported. To fill this gap, we present NucMap, a database of genome-wide nucleosome positioning maps across species. Based on a large collection of raw sequence data from published studies, NucMap is dedicated to integrating, analyzing and visualizing nucleosome positioning data across species.
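To make the idea behind MNase-Seq peak calling concrete, the following is a deliberately simplified sketch; it is not the DANPOS or iNPS algorithm. It assumes paired-end fragments given as 0-based, half-open (start, end) coordinates, uses each fragment midpoint as an estimate of the nucleosome dyad, smooths the dyad counts with a triangular kernel and keeps the strongest local maxima that are at least a minimum distance apart. All function names and parameter values here are hypothetical.

```python
from collections import Counter


def fragment_midpoints(fragments):
    """Midpoint of each paired-end fragment, given as 0-based half-open (start, end)."""
    return [(start + end) // 2 for start, end in fragments]


def dyad_counts(midpoints, length):
    """Number of fragment midpoints (putative nucleosome dyads) at each position."""
    counts = Counter(midpoints)
    return [counts.get(i, 0) for i in range(length)]


def smooth_triangular(signal, half_width):
    """Convolve with a triangular kernel w(d) = half_width + 1 - |d|, |d| <= half_width."""
    n = len(signal)
    out = [0.0] * n
    for i, value in enumerate(signal):
        if value == 0:
            continue
        for d in range(-half_width, half_width + 1):
            j = i + d
            if 0 <= j < n:
                out[j] += value * (half_width + 1 - abs(d))
    return out


def call_peaks(signal, min_height, min_distance):
    """Greedy peak calling: local maxima >= min_height, strongest first,
    dropping any peak closer than min_distance bp to an accepted one."""
    candidates = [
        i for i in range(1, len(signal) - 1)
        if signal[i] >= min_height
        and signal[i] >= signal[i - 1]
        and signal[i] > signal[i + 1]
    ]
    accepted = []
    for i in sorted(candidates, key=lambda j: signal[j], reverse=True):
        if all(abs(i - a) >= min_distance for a in accepted):
            accepted.append(i)
    return sorted(accepted)


# Toy example: two well-positioned nucleosomes in a 600 bp region.
fragments = [(127, 273)] * 5 + [(130, 276)] * 3 + [(326, 474)] * 5
coverage = dyad_counts(fragment_midpoints(fragments), 600)
peaks = call_peaks(smooth_triangular(coverage, 10), min_height=10, min_distance=100)
print(peaks)  # [200, 400]
```

Published callers add steps this sketch omits, such as background correction, fuzziness scores and handling of overlapping nucleosomes.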
DATABASE IMPLEMENTATION
All raw MNase-Seq data were downloaded from GEO and ENCODE, processed with an in-house pipeline and then imported into the NucMap database. The main framework of NucMap was developed with PHP and MySQL, a popular open-source scripting language and relational database management system for web development, respectively. jQuery and Bootstrap were used to design the front-end web interface. AJAX (Asynchronous JavaScript And XML), a set of web development techniques, was used to run bioinformatics applications asynchronously in the back end. The back-end bioinformatics applications were implemented in Python and Bash. JBrowse (27) was integrated to visualize nucleosome positioning data.
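The job-handling mechanism behind the AJAX calls is not described here. As a hypothetical illustration of the general asynchronous pattern, the sketch below shows a back end that returns a job ID immediately and lets the front end poll for the result; the class name, the status fields and the toy analysis function are all assumptions, and a production service would typically use a persistent job store rather than an in-process thread pool.

```python
import uuid
from concurrent.futures import ThreadPoolExecutor


class JobQueue:
    """Minimal in-process job registry: submit() returns a job ID at once,
    and status() can be polled later (e.g. by repeated AJAX requests)."""

    def __init__(self, max_workers=2):
        self._executor = ThreadPoolExecutor(max_workers=max_workers)
        self._jobs = {}

    def submit(self, func, *args, **kwargs):
        job_id = uuid.uuid4().hex
        self._jobs[job_id] = self._executor.submit(func, *args, **kwargs)
        return job_id

    def status(self, job_id):
        future = self._jobs.get(job_id)
        if future is None:
            return {"state": "unknown"}
        if not future.done():
            return {"state": "running"}
        if future.exception() is not None:
            return {"state": "failed", "error": str(future.exception())}
        return {"state": "done", "result": future.result()}

    def shutdown(self):
        self._executor.shutdown(wait=True)


def toy_analysis(sample_ids):
    # Hypothetical stand-in for a back-end analysis script.
    return {"n_samples": len(sample_ids)}


queue = JobQueue()
job = queue.submit(toy_analysis, ["sampleA", "sampleB"])
queue.shutdown()  # waits for the job here; a web server would keep polling instead
print(queue.status(job))  # {'state': 'done', 'result': {'n_samples': 2}}
```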
DATABASE CONTENT AND USAGE
Overview of NucMap
Currently, we have collected and processed 798 experimental datasets from 477 samples across 15 species. All functionalities in NucMap are organized into four modules: search, browse, analysis and download.
Searching NucMap
We have developed two types of search modules in NucMap: sample-based and gene-based. Sample-based search mainly helps users find the samples they are interested in (Figure 1). Shortcut links retrieve all samples for a specific species. Users can also search for samples of interest by GEO or ENCODE accession number or by sample feature. Through the hyperlinked sample ID, users can access more detailed information for each sample, including the original data source, the original publication, all downloadable data for the sample and other related omics samples. On the search result page, the buttons ‘View selected samples’ and ‘Analyzed selected samples’ connect search results to other modules in NucMap. With these two buttons, users can visualize nucleosome positioning data from the selected samples in the nucleosome browser or analyze the selected samples with the online analysis module, both of which are described in later sections.
The promoter-associated nucleosome-free region (NFR) is linked to promoter-proximal pausing of RNA polymerase II, which enables precise gene regulation (28,29). Gene-based search helps users inspect nucleosome positioning at the promoter region of an individual gene. Both gene names and transcript names are supported. The number of nucleosome peaks around the transcription start site (TSS) is reported for several genomic ranges, and the positions of the +1 and −1 nucleosomes are provided so that users can check the size of the NFR at the promoter. Moreover, nucleosome density information for all samples of the same species is shown in the same table, so that users can make side-by-side comparisons (Figure 2; Table 1).
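The exact rules NucMap uses for these promoter statistics are not specified here, so the following is only a sketch under stated assumptions. Given sorted nucleosome peak centers, it takes the nearest peak downstream of the TSS (in the direction of transcription) as the +1 nucleosome and the nearest upstream peak as the −1 nucleosome, estimates the NFR width as the dyad-to-dyad distance minus an assumed 147 bp core length, and counts peaks within several hypothetical window half-widths around the TSS.

```python
from bisect import bisect_left, bisect_right

NUCLEOSOME_CORE_BP = 147  # assumed core length used to turn dyad spacing into an NFR width


def flanking_nucleosomes(peak_centers, tss, strand):
    """Return (minus1, plus1) peak centers around a TSS in the direction of transcription.
    peak_centers must be sorted; a peak exactly at the TSS is treated as +1."""
    if strand == "+":
        i = bisect_left(peak_centers, tss)
        minus1 = peak_centers[i - 1] if i > 0 else None
        plus1 = peak_centers[i] if i < len(peak_centers) else None
    else:
        # On the minus strand, downstream (+1) means smaller coordinates.
        i = bisect_right(peak_centers, tss)
        plus1 = peak_centers[i - 1] if i > 0 else None
        minus1 = peak_centers[i] if i < len(peak_centers) else None
    return minus1, plus1


def nfr_width(minus1, plus1):
    """Gap between the edges of the -1 and +1 nucleosomes, or None if either is missing."""
    if minus1 is None or plus1 is None:
        return None
    return abs(plus1 - minus1) - NUCLEOSOME_CORE_BP


def peaks_in_windows(peak_centers, tss, half_widths=(500, 1000, 2000)):
    """Count peaks whose centers lie within +/- w bp of the TSS, for each w."""
    return {
        w: bisect_right(peak_centers, tss + w) - bisect_left(peak_centers, tss - w)
        for w in half_widths
    }


peaks = [300, 850, 1120, 1300, 1480, 2600]
m1, p1 = flanking_nucleosomes(peaks, 1000, "+")
print(m1, p1, nfr_width(m1, p1))       # 850 1120 123
print(peaks_in_windows(peaks, 1000))   # {500: 4, 1000: 5, 2000: 6}
```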
Table 1. Number of experiments and samples per species in NucMap.
Species | No. of experiments | No. of samples |
---|---|---|
Arabidopsis thaliana | 19 | 12 |
Caenorhabditis elegans | 21 | 11 |
Candida albicans | 4 | 2 |
Danio rerio | 4 | 2 |
Drosophila melanogaster | 124 | 70 |
Homo sapiens | 71 | 50 |
Mus musculus | 215 | 106 |
Neurospora crassa | 10 | 6 |
Oryza sativa | 3 | 1 |
Plasmodium falciparum | 9 | 9 |
Saccharomyces cerevisiae | 284 | 186 |
Schizosaccharomyces pombe | 18 | 14 |
Trypanosoma brucei | 8 | 4 |
Xenopus laevis | 6 | 2 |
Zea mays | 2 | 2 |
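As a quick arithmetic check, the per-species rows of Table 1 sum to the totals given in the Overview (798 experimental datasets, 477 samples, 15 species):

```python
# Rows of Table 1 as (species, experiments, samples).
TABLE_1 = [
    ("Arabidopsis thaliana", 19, 12),
    ("Caenorhabditis elegans", 21, 11),
    ("Candida albicans", 4, 2),
    ("Danio rerio", 4, 2),
    ("Drosophila melanogaster", 124, 70),
    ("Homo sapiens", 71, 50),
    ("Mus musculus", 215, 106),
    ("Neurospora crassa", 10, 6),
    ("Oryza sativa", 3, 1),
    ("Plasmodium falciparum", 9, 9),
    ("Saccharomyces cerevisiae", 284, 186),
    ("Schizosaccharomyces pombe", 18, 14),
    ("Trypanosoma brucei", 8, 4),
    ("Xenopus laevis", 6, 2),
    ("Zea mays", 2, 2),
]

n_species = len(TABLE_1)
n_experiments = sum(experiments for _, experiments, _ in TABLE_1)
n_samples = sum(samples for _, _, samples in TABLE_1)
print(n_species, n_experiments, n_samples)  # 15 798 477
```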
Nucleosome browser
To facilitate browsing nucleosome profiles at single-base resolution, NucMap deploys a nucleosome browser based on the open-source program JBrowse (27). In the nucleosome browser, each species has an independent browser instance and track selector. With the track selector, users can load or unload tracks for all processed genomic data, including raw read density and nucleosome peaks identified by different algorithms (Figure 3A). With interactive buttons and interfaces, users can choose tracks of interest, zoom in or out, and highlight any region across the whole genome. This feature helps users examine nucleosome occupancy in detail for each individual gene or genomic region. Users can also load track files directly from their local computer or a third-party database into the nucleosome browser without uploading data to our server (Figure 3B). The nucleosome browser therefore helps users quickly make side-by-side comparisons across multiple relevant genomic tracks. For example, biologists can load DNA methylation or histone ChIP-Seq data into the same browser session to obtain a comprehensive overview of the chromatin state around a gene of interest.
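For users who want to prepare their own signal as a local track, one possible route, assuming the browser accepts BigWig files (a format JBrowse supports), is to write the per-base signal as bedGraph and convert it with the UCSC `bedGraphToBigWig` utility. The helper below is a hypothetical sketch that collapses a per-base coverage list into bedGraph lines (chromosome, 0-based start, exclusive end, value), merging runs of equal values and skipping zero-coverage runs.

```python
def coverage_to_bedgraph(chrom, coverage, offset=0):
    """Collapse per-base coverage into bedGraph lines (chrom, 0-based start,
    exclusive end, value), merging runs of equal values and skipping zeros.
    offset is the 0-based genomic coordinate of coverage[0]."""
    lines = []
    run_start = 0
    for i in range(1, len(coverage) + 1):
        # A run ends at the end of the list or where the value changes.
        if i == len(coverage) or coverage[i] != coverage[run_start]:
            value = coverage[run_start]
            if value != 0:
                lines.append(f"{chrom}\t{offset + run_start}\t{offset + i}\t{value}")
            run_start = i
    return lines


print("\n".join(coverage_to_bedgraph("chrI", [0, 0, 2, 2, 2, 1, 0], offset=1000)))
# chrI	1002	1005	2
# chrI	1005	1006	1
```

The resulting file, sorted by chromosome and start, can then be converted with `bedGraphToBigWig my_track.bedGraph chrom.sizes my_track.bw`, where `chrom.sizes` lists the chromosome lengths of the matching genome assembly.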
Analysis
Genome-wide enrichment analysis is a popular method for understanding global features of omics data. To help users analyze nucleosome positioning patterns globally, we have developed an online analysis module. This module characterizes nucleosome occupancy profiles across genomic regions (Figure 4). Users can also classify regions of interest into multiple groups according to the purpose of their studies and compare the enrichment curves among groups. Both normalized raw reads and nucleosome peaks are supported in the enrichment analysis, and publication-quality figures are generated. All operations are performed through the web interface, and no prior knowledge of bioinformatics tools or programming is required.
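The module's internals are not described here. A common way to compute such enrichment curves is an aggregate (meta) profile: the signal is averaged at each offset around a set of region centers, such as TSSs, with minus-strand windows reversed so that every window reads 5′ to 3′. The sketch below assumes per-base coverage for a single chromosome held in memory and regions given as hypothetical (group_label, center, strand) tuples, and returns one curve per group.

```python
def aggregate_profile(coverage, centers, strands, flank):
    """Mean signal at each offset -flank..+flank around the given centers,
    oriented 5'->3' (minus-strand windows are reversed). Windows running off
    the end of the chromosome are skipped; returns None if none remain."""
    sums = [0.0] * (2 * flank + 1)
    used = 0
    for center, strand in zip(centers, strands):
        lo, hi = center - flank, center + flank + 1
        if lo < 0 or hi > len(coverage):
            continue
        window = coverage[lo:hi]
        if strand == "-":
            window = window[::-1]
        for k, value in enumerate(window):
            sums[k] += value
        used += 1
    return [s / used for s in sums] if used else None


def grouped_profiles(coverage, regions, flank):
    """regions: iterable of (group_label, center, strand). Returns {label: profile}."""
    groups = {}
    for label, center, strand in regions:
        centers, strands = groups.setdefault(label, ([], []))
        centers.append(center)
        strands.append(strand)
    return {
        label: aggregate_profile(coverage, centers, strands, flank)
        for label, (centers, strands) in groups.items()
    }


coverage = [0] * 20
coverage[5], coverage[14] = 4, 2
regions = [("high", 7, "+"), ("high", 12, "-"), ("low", 10, "+")]
print(grouped_profiles(coverage, regions, flank=3))
# {'high': [0.0, 3.0, 0.0, 0.0, 0.0, 0.0, 0.0], 'low': [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]}
```

In this toy input, both "high" regions have signal 2 bp upstream of their centers (one on each strand), so after orientation they reinforce each other at the same offset.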
Download
All processed nucleosome positioning data are freely available. The data for each sample include three levels: (i) processed reads: bigWig tracks of aligned reads and of aligned reads after signal enhancement; (ii) nucleosome peaks: nucleosome peaks identified by iNPS and DANPOS; (iii) annotated peaks and reads: nucleosome peaks annotated to the nearest TSS, a matrix of peak counts around each TSS and a matrix of aligned read counts around each TSS, normalized to RPM (reads per million). For each species, all these data are organized in two ways on the download page: by sample and by data type. Users can also visualize the data through online links in locally installed genome browsers or other online genome browsers.
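Two of the processing steps named above can be sketched briefly. RPM scales a raw count by 10^6 divided by the total number of mapped reads, and nearest-TSS annotation assigns each peak to the closest TSS with a strand-aware signed distance. The code is an illustration under assumptions, not the NucMap pipeline: TSSs are given as a hypothetical sorted list of (position, strand, gene_id) tuples on one chromosome, and ties go to the TSS with the smaller coordinate.

```python
from bisect import bisect_left


def rpm(counts, total_mapped_reads):
    """Reads Per Million: scale raw counts by 1e6 / total mapped reads."""
    scale = 1_000_000 / total_mapped_reads
    return [c * scale for c in counts]


def annotate_nearest_tss(peaks, tss_list):
    """Assign each peak center to its nearest TSS.
    tss_list: non-empty list of (position, strand, gene_id), sorted by position.
    Returns (peak, gene_id, distance); distance > 0 means downstream of the TSS
    in the direction of transcription."""
    positions = [pos for pos, _, _ in tss_list]
    annotated = []
    for peak in peaks:
        i = bisect_left(positions, peak)
        # Only the TSSs immediately left and right of the peak can be nearest.
        candidates = [tss_list[j] for j in (i - 1, i) if 0 <= j < len(tss_list)]
        pos, strand, gene_id = min(candidates, key=lambda t: abs(t[0] - peak))
        distance = peak - pos if strand == "+" else pos - peak
        annotated.append((peak, gene_id, distance))
    return annotated


print(rpm([3, 10], total_mapped_reads=2_000_000))  # [1.5, 5.0]
tss = [(1000, "+", "geneA"), (5000, "-", "geneB")]
print(annotate_nearest_tss([1150, 4800, 3200], tss))
# [(1150, 'geneA', 150), (4800, 'geneB', 200), (3200, 'geneB', 1800)]
```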
FUTURE DIRECTIONS
MNase-Seq is an important approach for studying the role of nucleosomes in transcriptional regulation. With the increasing use of MNase-Seq in eukaryotes, nucleosome positioning data are growing rapidly. NucMap is the first open resource and platform for MNase-Seq nucleosome positioning data across species. All MNase-Seq data available in GEO and ENCODE to date are included in NucMap. As one of the key database resources in the BIG Data Center (30), NucMap will continue to collect and integrate published data.
Nowadays, biologists routinely integrate and analyze multi-scale omics data to study transcriptional regulation. Nucleosome positioning is one type of chromatin state information. To gain a deeper understanding of chromatin biology, we will make NucMap compatible with other public epigenomics databases, such as MethBank (31), Cistrome (32) and ENCODE (33). Datasets in these repositories, such as DNA methylation data and histone and transcription factor ChIP-Seq data, could then be loaded directly and compared with nucleosome positioning data in NucMap. Through comprehensive cross-omics analysis, biologists will be able to learn more about chromatin regulation.
ACKNOWLEDGEMENTS
We thank members of the BIG Data Center for reporting bugs and sending comments, and Dr Jinfeng Shao and Dr Gjalt G. Wybenga for valuable discussions and suggestions on writing the paper.
FUNDING
National Key Research Program of China [2016YFB0201702 to J.X.]; Promoting Big Data Development Project, the National Development and Reform Commission of China [2016-999999-65-01-000696-07 to J.X.]; National Natural Science Foundation of China [31771465 and 31471248 to J.X.]; International Partnership Program of the Chinese Academy of Sciences [153F11KYSB20160008]; Key Program of the Chinese Academy of Sciences [KJZD-EW-L14 to J.X.]; The 13th Five-year Informatization Plan of Chinese Academy of Sciences [XXH13505-05 to J.X.]. Funding for open access charge: National Key Research Program of China [2016YFB0201702].
Conflict of interest statement. None declared.
REFERENCES
- 1. Luger K., Mader A.W., Richmond R.K., Sargent D.F., Richmond T.J. Crystal structure of the nucleosome core particle at 2.8 Å resolution. Nature. 1997; 389:251–260.
- 2. Drew H.R., Travers A.A. DNA bending and its relation to nucleosome positioning. J. Mol. Biol. 1985; 186:773–790.
- 3. Segal E., Fondufe-Mittendorf Y., Chen L., Thastrom A., Field Y., Moore I.K., Wang J.P., Widom J. A genomic code for nucleosome positioning. Nature. 2006; 442:772–778.
- 4. Struhl K., Segal E. Determinants of nucleosome positioning. Nat. Struct. Mol. Biol. 2013; 20:267–273.
- 5. Teif V.B., Vainshtein Y., Caudron-Herger M., Mallm J.P., Marth C., Hofer T., Rippe K. Genome-wide nucleosome positioning during embryonic stem cell development. Nat. Struct. Mol. Biol. 2012; 19:1185–1192.
- 6. Jiang C., Pugh B.F. Nucleosome positioning and gene regulation: advances through genomics. Nat. Rev. Genet. 2009; 10:161–172.
- 7. Bai L., Morozov A.V. Gene regulation by nucleosome positioning. Trends Genet. 2010; 26:476–483.
- 8. Jin Y.S., Heim S., Mandahl N., Biorklund A., Wennerberg J., Mitelman F. Unrelated clonal chromosomal aberrations in carcinomas of the oral cavity. Genes Chromosomes Cancer. 1990; 1:209–215.
- 9. Voong L.N., Xi L., Sebeson A.C., Xiong B., Wang J.P., Wang X. Insights into nucleosome organization in mouse embryonic stem cells through chemical mapping. Cell. 2016; 167:1555–1570.
- 10. Dai Z., Dai X., Xiang Q. Genome-wide DNA sequence polymorphisms facilitate nucleosome positioning in yeast. Bioinformatics. 2011; 27:1758–1764.
- 11. Warnecke T., Batada N.N., Hurst L.D. The impact of the nucleosome code on protein-coding sequence evolution in yeast. PLoS Genet. 2008; 4:e1000250.
- 12. Washietl S., Machne R., Goldman N. Evolutionary footprints of nucleosome positions in yeast. Trends Genet. 2008; 24:583–587.
- 13. Shim E.Y., Hong S.J., Oum J.H., Yanez Y., Zhang Y., Lee S.E. RSC mobilizes nucleosomes to improve accessibility of repair machinery to the damaged chromatin. Mol. Cell. Biol. 2007; 27:1602–1613.
- 14. Chen K., Wang L., Yang M., Liu J., Xin C., Hu S., Yu J. Sequence signatures of nucleosome positioning in Caenorhabditis elegans. Genomics Proteomics Bioinformatics. 2010; 8:92–102.
- 15. Zhang J., Peng W., Wang L. LeNup: learning nucleosome positioning from DNA sequences with improved convolutional neural networks. Bioinformatics. 2018; 34:1705–1712.
- 16. Zhang Y., Shin H., Song J.S., Lei Y., Liu X.S. Identifying positioned nucleosomes with epigenetic marks in human from ChIP-Seq. BMC Genomics. 2008; 9:537.
- 17. Zhong J., Luo K., Winter P.S., Crawford G.E., Iversen E.S., Hartemink A.J. Mapping nucleosome positions using DNase-seq. Genome Res. 2016; 26:351–364.
- 18. Cui K., Zhao K. Genome-wide approaches to determining nucleosome occupancy in metazoans using MNase-Seq. Methods Mol. Biol. 2012; 833:413–419.
- 19. Schones D.E., Cui K., Cuddapah S., Roh T.Y., Barski A., Wang Z., Wei G., Zhao K. Dynamic regulation of nucleosome positioning in the human genome. Cell. 2008; 132:887–898.
- 20. Teif V.B. Nucleosome positioning: resources and tools online. Brief. Bioinform. 2016; 17:745–757.
- 21. Chen K., Xi Y., Pan X., Li Z., Kaestner K., Tyler J., Dent S., He X., Li W. DANPOS: dynamic analysis of nucleosome position and occupancy by sequencing. Genome Res. 2013; 23:341–351.
- 22. Chen W., Liu Y., Zhu S., Green C.D., Wei G., Han J.D. Improved nucleosome-positioning algorithm iNPS for accurate nucleosome positioning from sequencing data. Nat. Commun. 2014; 5:4909.
- 23. Fu K., Tang Q., Feng J., Liu X.S., Zhang Y. DiNuP: a systematic approach to identify regions of differential nucleosome positioning. Bioinformatics. 2012; 28:1965–1971.
- 24. Liu L., Xie J., Sun X., Luo K., Qin Z.S., Liu H. An approach of identifying differential nucleosome regions in multiple samples. BMC Genomics. 2017; 18:135.
- 25. Valouev A., Ichikawa J., Tonthat T., Stuart J., Ranade S., Peckham H., Zeng K., Malek J.A., Costa G., McKernan K. et al. A high-resolution, nucleosome position map of C. elegans reveals a lack of universal sequence-dictated positioning. Genome Res. 2008; 18:1051–1063.
- 26. Weiner A., Hughes A., Yassour M., Rando O.J., Friedman N. High-resolution nucleosome mapping reveals transcription-dependent promoter packaging. Genome Res. 2010; 20:90–100.
- 27. Skinner M.E., Uzilov A.V., Stein L.D., Mungall C.J., Holmes I.H. JBrowse: a next-generation genome browser. Genome Res. 2009; 19:1630–1638.
- 28. Gilchrist D.A., Dos Santos G., Fargo D.C., Xie B., Gao Y., Li L., Adelman K. Pausing of RNA polymerase II disrupts DNA-specified nucleosome organization to enable precise gene regulation. Cell. 2010; 143:540–551.
- 29. Jimeno-Gonzalez S., Ceballos-Chavez M., Reyes J.C. A positioned +1 nucleosome enhances promoter-proximal pausing. Nucleic Acids Res. 2015; 43:3068–3078.
- 30. BIG Data Center Members. Database resources of the BIG Data Center in 2018. Nucleic Acids Res. 2018; 46:D14–D20.
- 31. Li R., Liang F., Li M., Zou D., Sun S., Zhao Y., Zhao W., Bao Y., Xiao J., Zhang Z. MethBank 3.0: a database of DNA methylomes across a variety of species. Nucleic Acids Res. 2018; 46:D288–D295.
- 32. Liu T., Ortiz J.A., Taing L., Meyer C.A., Lee B., Zhang Y., Shin H., Wong S.S., Ma J., Lei Y. et al. Cistrome: an integrative platform for transcriptional regulation studies. Genome Biol. 2011; 12:R83.
- 33. ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature. 2012; 489:57–74.