Abstract
Summary: Nuclear receptors (NRs) are a class of transcription factors that play important roles in various biological processes. A single NR often affects numerous genes, and different NRs share overlapping target networks. Researchers need a database that brings together the binding sites of different NRs under various conditions, so that they can easily compare and visualize them and improve our understanding of NR binding mechanisms. To meet this need, we have developed NURBS, a database of experimental and predicted nuclear receptor binding sites of mouse. NURBS currently contains binding sites across the whole mouse genome for 8 NRs, identified in 40 chromatin immunoprecipitation followed by massively parallel DNA sequencing (ChIP-Seq) experiments. All datasets are processed with the same widely used procedure and the same statistical criteria, so that binding sites derived from different datasets are comparable. NURBS also provides binding sites predicted by NR-HMM, a Hidden Markov Model (HMM).
Availability: The GBrowse-based user interface of NURBS is freely accessible at http://shark.abl.ku.edu/nurbs/. NR-HMM and all results can be downloaded free of charge from the website.
Contact: jwfang@ku.edu
1 INTRODUCTION
Nuclear receptors (NRs) are a class of ligand-regulated transcription factors involved in many important biological processes, such as development and metabolism, and implicated in various human diseases, including diabetes, cardiovascular disease and cancers (Overington et al., 2006). A single NR often affects numerous genes, and different NRs may compete for target sites, resulting in overlapping target gene networks (Cotnoir-White et al., 2011). It is therefore important to analyse and compare the binding sites and target genes of various NRs and to determine the cross-talk between them. The recent development of chromatin immunoprecipitation followed by massively parallel DNA sequencing (ChIP-Seq) has greatly advanced the detection of NR binding sites (Johnson et al., 2007), and experimentally determined NR binding sites have consequently accumulated rapidly. Although there have been limited efforts to collect and archive NR-related ChIP-Seq data (Kennedy et al., 2010; Tang et al., 2011), to the best of our knowledge no comprehensive database of genome-wide NR binding sites has been developed that allows the binding sites of different NRs to be compared easily.
It is well known that the binding sites of various NRs share a similar sequence pattern consisting of two half-sites with the consensus sequence RGKTCA (or its reverse complement), separated by a variable spacer of 0 to 8 base pairs (Sandelin and Wasserman, 2005). The pattern occurs in three forms, direct repeat, inverted repeat and everted repeat, together with their reverse complements. Several methods have been used to predict NR binding sites (Bulyk, 2003; Cartharius et al., 2005; Denver and Williamson, 2009; Grau et al., 2006; Sandelin et al., 2004; Varga, 2010). One widely used method is NHR-Scan (Sandelin and Wasserman, 2005), a Hidden Markov Model (HMM) built from 107 experimentally determined NR binding sites that predicts both half-sites simultaneously. However, NHR-Scan is no longer actively maintained, and its web interface does not support large-scale prediction of binding sites. To take advantage of recently discovered binding sites and to provide the community with a standalone application for genome-scale response element prediction, we have developed NR-HMM, a Visual Basic application implementing an HMM based on the NHR-Scan algorithm. The new model was trained on 151 experimentally verified binding sites.
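The consensus pattern described above can be made concrete with a simple string scan. The sketch below is neither NR-HMM nor NHR-Scan. It only enumerates exact matches of the RGKTCA half-site consensus and its reverse complement (TGAMCY) in the three arrangements, with spacers of 0 to 8 bp. The form labels (DR, DR_rc, IR, ER) and the function names are our own, chosen for illustration.

```python
import re

# IUPAC ambiguity codes needed for RGKTCA and its reverse complement.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "[AG]", "Y": "[CT]", "K": "[GT]", "M": "[AC]"}


def to_regex(consensus: str) -> str:
    """Translate an IUPAC consensus string into a regular expression."""
    return "".join(IUPAC[base] for base in consensus)


HALF = to_regex("RGKTCA")     # forward half-site
HALF_RC = to_regex("TGAMCY")  # reverse complement of RGKTCA

# (left half, right half) for each arrangement. IR and ER are their own
# reverse complements; the reverse complement of a DR is a DR of TGAMCY.
FORMS = {
    "DR": (HALF, HALF),
    "DR_rc": (HALF_RC, HALF_RC),
    "IR": (HALF, HALF_RC),
    "ER": (HALF_RC, HALF),
}


def scan_consensus(seq: str, max_spacer: int = 8):
    """Return (start, form, spacer, site) for every consensus match in seq."""
    seq = seq.upper()
    hits = []
    for form, (left, right) in FORMS.items():
        for n in range(max_spacer + 1):
            # The lookahead lets overlapping candidate sites all be reported.
            pattern = re.compile(f"(?=({left}[ACGT]{{{n}}}{right}))")
            for m in pattern.finditer(seq):
                hits.append((m.start(), form, n, m.group(1)))
    return sorted(hits)


print(scan_consensus("AGGTCAAAGGTCA"))    # [(0, 'DR', 1, 'AGGTCAAAGGTCA')]
print(scan_consensus("AGGTCACAGTGACCT"))  # [(0, 'IR', 3, 'AGGTCACAGTGACCT')]
```

A fixed consensus is far stricter than an HMM. Any single mismatch in a half-site loses the site, which is one reason probabilistic models such as NHR-Scan are preferred for genome-scale prediction.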
We present NURBS, a database of binding sites across the whole mouse genome for 8 NRs identified in 40 ChIP-Seq experiments, together with binding sites predicted across the mouse genome by NR-HMM.
2 IMPLEMENTATION
The ChIP-Seq data used in NURBS are collected from NCBI (Kodama et al., 2012). We regularly search NCBI for each NR by its nomenclature committee name, abbreviation and common name. At the time of writing, NURBS contained 40 ChIP-Seq datasets for 8 NRs. Information on these datasets and the corresponding NRs is summarized in a table on the NURBS website.
To allow multiple datasets to be compared, we perform genome alignment and peak annotation for all collected datasets using a well-established procedure. In brief, all sequenced reads are aligned to the mm9 mouse reference genome (http://genome.ucsc.edu/) (Dreszer et al., 2012) using Bowtie (version 0.12.7) (Langmead et al., 2009). Peaks are detected using Model-based Analysis of ChIP-Seq (MACS v1.4.1) (Zhang et al., 2008), with the Poisson p-value cut-off for peak detection set to 10^-5. Overlapping peaks are split using Mali Salmon's Peak Splitter program (http://www.ebi.ac.uk/bertone/software.html). Peaks are annotated using the Bioconductor package ChIPpeakAnno (Zhu et al., 2010).
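The Poisson cut-off can be illustrated with a minimal sketch. MACS models the tag count in a candidate window as Poisson-distributed. According to Zhang et al. (2008), it replaces the genome-wide rate with a local rate, the maximum of the background and several surrounding-window estimates, to absorb local bias. The code below is a simplified stand-in, not MACS itself. The window rates are hypothetical inputs, and only the 10^-5 threshold comes from the text.

```python
import math

P_CUTOFF = 1e-5  # Poisson p-value cut-off used for peak detection in NURBS


def poisson_upper_tail(k: int, lam: float) -> float:
    """P(X >= k) for X ~ Poisson(lam), summed term by term in log space."""
    if lam <= 0:
        raise ValueError("lambda must be positive")
    if k <= 0:
        return 1.0
    total, i = 0.0, k
    while True:
        # log pmf = -lam + i*log(lam) - log(i!) avoids overflow for large i
        term = math.exp(-lam + i * math.log(lam) - math.lgamma(i + 1))
        total += term
        # past the mode the terms shrink geometrically, so stop once the
        # latest contribution no longer changes the sum
        if i > lam and term <= total * 1e-16:
            break
        i += 1
    return min(total, 1.0)


def local_lambda(genome_background: float, *window_lambdas: float) -> float:
    """Simplified MACS-style local lambda: the largest of the genome-wide
    background rate and the rates expected from surrounding windows."""
    return max((genome_background, *window_lambdas))


def passes_cutoff(tag_count: int, lam: float, cutoff: float = P_CUTOFF) -> bool:
    """True if tag_count tags are improbable enough under Poisson(lam)."""
    return poisson_upper_tail(tag_count, lam) <= cutoff


# 20 tags where 5 are expected passes the cut-off; the same 20 tags in a
# window whose local rate is 12 (a high-background region) does not.
print(passes_cutoff(20, local_lambda(5.0)))        # True
print(passes_cutoff(20, local_lambda(5.0, 12.0)))  # False
```

Taking the maximum rate is conservative by design. A local rise in background can only raise the expected count and so suppress a call. It can never create one.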
NR-HMM was built using NR binding sites collected in previous work and from JASPAR CORE (http://jaspar.cgb.ki.se/cgi-bin/jaspar_db.pl) (Sandelin and Wasserman, 2005). The sequences of the collected binding sites were manually inspected, and only those containing two half-sites were kept, giving a final dataset of 151 NR binding sites. NR-HMM is built on the framework of NHR-Scan (Sandelin and Wasserman, 2005). The HMM parameters were estimated using the Baum–Welch algorithm. The transition probability matrix was estimated from the frequency of each state in the training data, and pseudocounts following Laplace's rule were applied when estimating the emission probability matrix. NR-HMM was then used to predict NR binding sites in the mouse genome (mm9), and the predictions are available as a GBrowse track in NURBS.
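The two estimation steps mentioned above can be sketched under simplifying assumptions. Emission probabilities are computed per aligned position with Laplace's rule, which adds one pseudocount per base. Transition probabilities are taken from the frequency of observed state-to-state steps in labelled training paths. The state names (H1, S, H2) and the toy inputs are hypothetical and do not reproduce NHR-Scan's state architecture. In full Baum–Welch training, the observed counts would be replaced by expected counts from the forward-backward algorithm.

```python
from collections import Counter

BASES = "ACGT"


def emission_probs(aligned_half_sites, pseudocount=1):
    """Per-position base probabilities with Laplace's rule (add-one) pseudocounts."""
    length = len(aligned_half_sites[0])
    if any(len(s) != length for s in aligned_half_sites):
        raise ValueError("half-sites must be aligned to the same length")
    denom = len(aligned_half_sites) + pseudocount * len(BASES)
    matrix = []
    for pos in range(length):
        counts = Counter(s[pos].upper() for s in aligned_half_sites)
        # every base receives at least `pseudocount`, so no probability is zero
        matrix.append({b: (counts[b] + pseudocount) / denom for b in BASES})
    return matrix


def transition_probs(state_paths):
    """Transition probabilities from the frequency of each observed
    state -> next-state step in labelled training paths."""
    pair_counts = Counter()
    for path in state_paths:
        pair_counts.update(zip(path, path[1:]))
    row_totals = Counter()
    for (src, _dst), count in pair_counts.items():
        row_totals[src] += count
    return {pair: count / row_totals[pair[0]] for pair, count in pair_counts.items()}


# Hypothetical toy training data (not the 151 NR-HMM training sites).
emis = emission_probs(["AGGTCA", "GGTTCA", "AGGTCA"])
trans = transition_probs([["H1", "S", "S", "H2"], ["H1", "S", "H2"]])
print(round(emis[0]["A"], 3))           # 0.429, i.e. (2 + 1) / (3 + 4)
print(round(trans[("S", "H2")], 3))     # 0.667
```

With only 151 training sites, the pseudocounts matter. Without them, any base absent from a position in the training set would receive probability zero, and a single such base would veto an otherwise strong site.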
NURBS is implemented as a MySQL relational database and uses the Generic Genome Browser (GBrowse) (Donlin, 2009) as its web-based interface. GBrowse is user-friendly and highly customizable. Links to the Mouse Genome Informatics database (MGI, http://www.informatics.jax.org/) are provided in NURBS wherever possible. Because MGI also uses GBrowse as its user interface, all MGI annotation tracks can easily be downloaded and enabled in NURBS.
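How peak files reach the GBrowse database is not described here. One common route for a MySQL-backed GBrowse is to load GFF3 into a Bio::DB::SeqFeature::Store database, for example with BioPerl's bp_seqfeature_load.pl. The sketch below assumes MACS-style BED peak lines (chrom, start, end, name, optional score) and converts them to GFF3. The function name, the source label "MACS" and the feature type "binding_site" are illustrative choices, not the NURBS schema.

```python
def peaks_bed_to_gff3(bed_lines, source="MACS", feature_type="binding_site"):
    """Convert BED-style peak lines (chrom, start, end, name[, score]) into
    GFF3 lines. BED starts are 0-based and half-open; GFF3 is 1-based and
    closed, so only the start coordinate shifts by one."""
    out = ["##gff-version 3"]
    for line in bed_lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue  # skip blank, comment and UCSC header lines
        fields = line.rstrip("\n").split("\t")
        chrom, start, end, name = fields[0], int(fields[1]), int(fields[2]), fields[3]
        score = fields[4] if len(fields) > 4 else "."
        # GFF3 columns: seqid, source, type, start, end, score, strand, phase, attributes
        out.append("\t".join([
            chrom, source, feature_type, str(start + 1), str(end),
            score, ".", ".", f"ID={name};Name={name}",
        ]))
    return out


bed = ["track name=peaks", "chr1\t999\t1500\tMACS_peak_1\t85.30"]
for row in peaks_bed_to_gff3(bed):
    print(row)
```

Peak names containing GFF3 reserved characters (such as ";" or "=") would need percent-encoding before use in the attributes column. Plain MACS-style names like the one above do not.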
The main search page of NURBS is highly configurable. Detailed examples are provided in an online help file. In brief, users can click track buttons to select datasets for visualization and comparison, and the coverage option lets users choose whether to display binding-region intensities. NURBS supports searches by name and by region. A search result page displays the selected tracks of experimental data, peaks predicted by NR-HMM and annotations. Each peak is hyperlinked to additional information, such as its sequence and genomic location. Similarly, when an annotated feature such as a gene is visible on the result page, a hyperlink to the MGI database is provided so that users can access further information, such as genetic map position, mammalian homology, Gene Ontology annotations and expression, as well as links to other public databases such as the Ensembl genome browser, the University of California, Santa Cruz (UCSC) Genome Browser and NCBI.
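A region search and a side-by-side comparison of two NR tracks both reduce to interval overlap. The sketch below makes that idea concrete. It assumes 1-based inclusive coordinates and a region string of the form chrom:start..end, which we take to be the form GBrowse search boxes accept. The Peak type, the function names and the peak data are hypothetical and are not NURBS's implementation.

```python
import re
from typing import NamedTuple


class Peak(NamedTuple):
    chrom: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive
    name: str


# chrom:start..end region syntax (assumed GBrowse-style); commas are ignored
REGION = re.compile(r"^(\w+):(\d+)\.\.(\d+)$")


def parse_region(text):
    m = REGION.match(text.replace(",", "").strip())
    if m is None:
        raise ValueError(f"expected chrom:start..end, got {text!r}")
    start, end = int(m.group(2)), int(m.group(3))
    return m.group(1), min(start, end), max(start, end)


def overlaps(a, chrom, start, end):
    """True if peak `a` shares at least one base with chrom:start..end."""
    return a.chrom == chrom and a.start <= end and start <= a.end


def peaks_in_region(peaks, region):
    chrom, start, end = parse_region(region)
    return [p for p in peaks if overlaps(p, chrom, start, end)]


def shared_peaks(track_a, track_b):
    """Name pairs of overlapping peaks from two NR tracks. This naive
    O(n*m) scan is for illustration; genome-scale data need an interval index."""
    return [(a.name, b.name) for a in track_a for b in track_b
            if overlaps(a, b.chrom, b.start, b.end)]


# Hypothetical peaks for two unnamed NRs, not NURBS data.
nr_a = [Peak("chr1", 1000, 1500, "A_1"), Peak("chr2", 5000, 5400, "A_2")]
nr_b = [Peak("chr1", 1400, 1800, "B_1"), Peak("chr2", 9000, 9300, "B_2")]
print(peaks_in_region(nr_a, "chr2:4,900..5,100"))  # [Peak(chrom='chr2', start=5000, end=5400, name='A_2')]
print(shared_peaks(nr_a, nr_b))                     # [('A_1', 'B_1')]
```

In a database-backed browser, the same overlap test would normally be pushed into an indexed query rather than run in application code. The condition itself (start_a <= end_b and start_b <= end_a on the same chromosome) stays the same.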
3 CONCLUSION
We have developed NURBS, a web-based database of experimental and predicted NR binding sites in the mouse genome. Its customizable, user-friendly interface makes it easy to navigate, search and compare experimental and predicted binding sites for multiple NRs. All data and the HMM are freely available for download. We are currently incorporating transcriptome data into NURBS to support more advanced studies and expanding the database to human NRs for cross-species comparison. We intend NURBS to be open to the community and encourage users to provide feedback and to submit new data and references. We are actively populating the database and plan to update it regularly in the coming years.
NURBS is distinct from other existing NR-related databases. It is designed to give researchers a convenient way to compare experimental and predicted binding sites for various NRs, along with genomic annotations. Other NR databases are devoted to the sequences, structures and functions (Martinez et al., 1998), mutations (Van Durme et al., 2003; Vroling et al., 2012) and phylogenies (Ruau et al., 2004) of NRs, not to their binding sites. Recently, Ochsner et al. (2012) set up Transcriptomine, a database focused on the expression and function of NR-related genes. The well-developed Cistrome (Tang et al., 2011) provides information on a number of NRs, their co-factors and epigenomic features. However, it is better suited to investigating individual NRs, as it does not provide features for comparing NRs. In addition, Cistrome delegates genome visualization to the remote UCSC Genome Browser, whereas NURBS uses an integrated GBrowse, the same browser used by MGI.
Acknowledgement
The authors thank Dr Shan Gao for his assistance.
Funding: National Institutes of Health (DK092100 and CA53596 to Y.W. and DK090036 to G.G.).
Conflict of Interest: none declared.
References
- Bulyk ML. Computational prediction of transcription-factor binding site locations. Genome Biol. 2003;5:201. doi: 10.1186/gb-2003-5-1-201.
- Cartharius K, et al. MatInspector and beyond: promoter analysis based on transcription factor binding sites. Bioinformatics. 2005;21:2933–2942. doi: 10.1093/bioinformatics/bti473.
- Cotnoir-White D, et al. Evolution of the repertoire of nuclear receptor binding sites in genomes. Mol. Cell. Endocrinol. 2011;334:76–82. doi: 10.1016/j.mce.2010.10.021.
- Denver RJ, Williamson KE. Identification of a thyroid hormone response element in the mouse Kruppel-like factor 9 gene to explain its postnatal expression in the brain. Endocrinology. 2009;150:3935–3943. doi: 10.1210/en.2009-0050.
- Donlin MJ. Using the Generic Genome Browser (GBrowse). Curr. Protoc. Bioinformatics. 2009;28:9.9.1–9.9.25. doi: 10.1002/0471250953.bi0909s28.
- Dreszer TR, et al. The UCSC Genome Browser database: extensions and updates 2011. Nucleic Acids Res. 2012;40:D918–D923. doi: 10.1093/nar/gkr1055.
- Grau J, et al. VOMBAT: prediction of transcription factor binding sites using variable order Bayesian trees. Nucleic Acids Res. 2006;34:W529–W533. doi: 10.1093/nar/gkl212.
- Johnson DS, et al. Genome-wide mapping of in vivo protein-DNA interactions. Science. 2007;316:1497–1502. doi: 10.1126/science.1141319.
- Kennedy BA, et al. HRTBLDb: an informative data resource for hormone receptors target binding loci. Nucleic Acids Res. 2010;38:D676–D681. doi: 10.1093/nar/gkp734.
- Kodama Y, et al. The Sequence Read Archive: explosive growth of sequencing data. Nucleic Acids Res. 2012;40:D54–D56. doi: 10.1093/nar/gkr854.
- Langmead B, et al. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 2009;10:R25. doi: 10.1186/gb-2009-10-3-r25.
- Martinez E, et al. The Nuclear Receptor Resource: a growing family. Nucleic Acids Res. 1998;26:239–241. doi: 10.1093/nar/26.1.239.
- Ochsner SA, et al. Transcriptomine, a web resource for nuclear receptor signaling transcriptomes. Physiol. Genomics. 2012;44:853–863. doi: 10.1152/physiolgenomics.00033.2012.
- Overington JP, et al. Opinion: how many drug targets are there? Nat. Rev. Drug Discov. 2006;5:993–996. doi: 10.1038/nrd2199.
- Ruau D, et al. Update of NUREBASE: nuclear hormone receptor functional genomics. Nucleic Acids Res. 2004;32:D165–D167. doi: 10.1093/nar/gkh062.
- Sandelin A, et al. JASPAR: an open-access database for eukaryotic transcription factor binding profiles. Nucleic Acids Res. 2004;32:D91–D94. doi: 10.1093/nar/gkh012.
- Sandelin A, Wasserman WW. Prediction of nuclear hormone receptor response elements. Mol. Endocrinol. 2005;19:595–606. doi: 10.1210/me.2004-0101.
- Tang Q, et al. A comprehensive view of nuclear receptor cancer cistromes. Cancer Res. 2011;71:6940–6947. doi: 10.1158/0008-5472.CAN-11-2091.
- Van Durme JJ, et al. NRMD: Nuclear Receptor Mutation Database. Nucleic Acids Res. 2003;31:331–333. doi: 10.1093/nar/gkg122.
- Varga G. Target gene identification via nuclear receptor binding site prediction. Methods Mol. Biol. 2010;674:241–249. doi: 10.1007/978-1-60761-854-6_15.
- Vroling B, et al. NucleaRDB: information system for nuclear receptors. Nucleic Acids Res. 2012;40:D377–D380. doi: 10.1093/nar/gkr960.
- Zhang Y, et al. Model-based analysis of ChIP-Seq (MACS). Genome Biol. 2008;9:R137. doi: 10.1186/gb-2008-9-9-r137.
- Zhu LJ, et al. ChIPpeakAnno: a Bioconductor package to annotate ChIP-seq and ChIP-chip data. BMC Bioinformatics. 2010;11:237. doi: 10.1186/1471-2105-11-237.