ppdb: a plant promoter database

Yoshiharu Y Yamamoto; Junichi Obokata

doi:10.1093/nar/gkm785

Nucleic Acids Res. 2007 Oct 18;36(Database issue):D977–D981. doi: 10.1093/nar/gkm785

PMCID: PMC2238996 PMID: 17947329

Abstract

ppdb (http://www.ppdb.gene.nagoya-u.ac.jp) is a plant promoter database that provides promoter annotation of Arabidopsis and rice. The database contains information on promoter structures, transcription start sites (TSSs) that have been identified from full-length cDNA clones, and a vast amount of TSS tag data. In ppdb, the promoter structures are determined by sets of promoter elements identified by a position-sensitive extraction method called local distribution of short sequences (LDSS). By using this database, the core promoter structure, the presence of regulatory elements and the distribution of TSS clusters can be identified. Although no differentiation of promoter architecture among plant species has been reported, there is some divergence in the sequences utilized for promoter elements. Therefore, ppdb is based on species-specific sets of promoter elements, rather than on general motifs for multiple species. Each regulatory sequence is hyperlinked to literature information, a PLACE entry served by a plant cis-element database, and a list of promoters containing the regulatory sequence.

BACKGROUND

A promoter database can be generated from a combination of genome sequence, information on promoter positions and a list of cis-regulatory elements. Currently, a major restriction on the quality of a promoter database is our limited knowledge of cis-elements. There are several established genome-wide plant promoter databases available today (RARGE: (1), http://www.rarge.gsc.riken.jp/; AGRIS: (2), http://www.arabidopsis.med.ohio-state.edu/; AthaMap: (3), http://www.athamap.de/), which are based on cis-regulatory sequences from PlantCARE [(4), http://www.bioinformatics.psb.ugent.be/webtools/plantcare/html/], PLACE [(5), http://www.dna.affrc.go.jp/PLACE/] or TRANSFAC [(6), http://www.gene-regulation.com/pub/databases.html]. These three promoter databases focus on cis-regulatory elements rather than core promoter structure, aiming to reveal the regulatory machinery that gives rise to the expression profiles. Unfortunately, these databases provide information only for Arabidopsis, and there are no genome-wide plant promoter databases available for other plant species.

Local distribution of short sequences (LDSS) analysis is a method to extract promoter constituents by genome-wide statistical analysis (7,8). We have applied this method to the Arabidopsis and rice genomes, and identified 1000 octamer sequences per genome as LDSS-positive promoter elements (8). According to their distribution profiles, the identified octamers have been classified into three major promoter element groups: regulatory element group (REG), TATA box and Y Patch. REG is a direction-insensitive element that is preferentially found around −100 bp relative to the major transcription start site (TSS), and contains many established cis-regulatory sequences. Y Patch is a direction-sensitive plant core promoter element that appears around the TSS. We found that the sequences utilized by all three groups, including the TATA element, are moderately differentiated between Arabidopsis and rice, demonstrating the importance of preparing promoter elements individually for each genome.
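The position-sensitive idea behind LDSS can be illustrated with a minimal sketch. It assumes promoters have already been aligned on their TSS and cut to equal length, and it uses a toy peak-to-mean ratio as the score; the function names and the scoring rule are illustrative assumptions and do not reproduce the statistics of the published method (8).

```python
from collections import defaultdict

def positional_counts(promoters, k=8):
    """Count every k-mer at every start position across TSS-aligned promoters."""
    counts = defaultdict(lambda: defaultdict(int))
    for seq in promoters:
        seq = seq.upper()
        for pos in range(len(seq) - k + 1):
            counts[seq[pos:pos + k]][pos] += 1
    return counts

def peak_score(profile, n_positions):
    """Toy localization score: peak count divided by the mean count per position.

    A k-mer spread evenly over all positions scores close to 1; a k-mer
    concentrated at one position scores close to n_positions.
    """
    total = sum(profile.values())
    peak_pos = max(profile, key=profile.get)
    return peak_pos, profile[peak_pos] * n_positions / total
```

In this sketch an octamer with a sharply localized distribution receives a high score, which is the property that distinguishes promoter elements from sequences scattered uniformly along the promoter.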

The large collection of extracted promoter elements can be utilized as a tool to reveal precise promoter architecture. Therefore, here we present ppdb, a novel searchable database based on the LDSS-positive elements. Utilization of a genome-specific set of promoter elements and the detection of the core promoter structure are the two unique features of this database. Currently, ppdb is the only plant promoter database with information about core promoter types on a genomic scale, and the first genome-wide database for rice promoters.

PROMOTER SELECTION AND OUTPUT WINDOWS

The major function of ppdb is to detect promoter elements in the genome sequence and to summarize promoter structures. The data sources of ppdb are shown in Table 1. The database detects REG, TATA box and Y Patch. Promoters of interest can be identified by a word search (e.g. ‘photosystem’) or a gene number (e.g. At5g38410, Os01g0100700 or AK121523) on the front page (http://www.ppdb.gene.nagoya-u.ac.jp). Selection of a specific gene model gives the following information: (i) sequence data, (ii) TSS data, (iii) a summary of the core promoter structure and (iv) REG data (Figure 1).
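The three example identifiers follow recognizable formats, so a client script could decide whether a query is a gene number or a keyword before submitting it. The sketch below is only an illustration: the regular expressions are inferred from the examples in the text, and the labels and the function name `classify_query` are hypothetical rather than part of ppdb.

```python
import re

# Patterns inferred from the example identifiers in the text; they are
# assumptions, not ppdb's actual input validation.
ID_PATTERNS = [
    ("Arabidopsis locus", re.compile(r"^At[1-5CM]g\d{5}$", re.IGNORECASE)),
    ("rice locus", re.compile(r"^Os\d{2}g\d{7}$", re.IGNORECASE)),
    ("cDNA accession", re.compile(r"^AK\d{6}$", re.IGNORECASE)),
]

def classify_query(query):
    """Return the identifier type, or 'keyword' for a free-text search."""
    text = query.strip()
    for label, pattern in ID_PATTERNS:
        if pattern.match(text):
            return label
    return "keyword"
```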

Table 1. Source of ppdb

Category	Specification	Source	Size
Arabidopsis
Genome sequence and gene annotation	TAIR 6	http://www.arabidopsis.org/	
TSS information	Cap signatured CT-MPSS tags	Yamamoto, Y. Y. et al., unpublished data	158 237
	Selected RAFL cDNA	http://rarge.gsc.riken.jp/	62 108
Promoter elements	LDSS-positive octamers	(8)	659
	PLACE entries corresponding to LDSS elements	http://www.dna.affrc.go.jp/PLACE/	21 (only matched motifs)
Rice
Genome sequence and gene annotation	RGSP build 4.0	http://rapdb.lab.nig.ac.jp/	
TSS information	Selected fl cDNA	http://cdna01.dna.affrc.go.jp/cDNA/	17 286
Promoter elements	LDSS-positive octamers	(8)	600
	PLACE entries corresponding to LDSS elements	http://www.dna.affrc.go.jp/PLACE/	4 (only matched motifs)

Figure 1. Selection of a specific gene model gives the following information: (i) sequence data, (ii) TSS data, (iii) a summary of the core promoter structure and (iv) REG data. The peak TSS is highlighted. TPM means tag per million and indicates the expression level at each TSS.

In the sequence window, octamer elements identified by the LDSS analysis (8) are highlighted. There are two modes for detection, ‘Reliable’ and ‘ALL’. ‘Reliable’ is the default setting, in which only elements at appropriate positions relative to the peak TSS are detected. Promoters without any TSS information do not show any elements in this mode. In this case, selecting ‘ALL’ allows global detection without any positional restriction. The sensitive area in the Reliable mode for each element group is described on the front page.
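The difference between the two detection modes can be sketched as a positional filter. The window boundaries below are hypothetical placeholders, since the actual sensitive areas are given only on the ppdb front page, and the sketch assumes forward-strand coordinates for simplicity.

```python
# Hypothetical windows in bp relative to the peak TSS; the real sensitive
# areas are defined on the ppdb front page and are not reproduced here.
WINDOWS = {"REG": (-300, -20), "TATA": (-45, -18), "Y Patch": (-100, 50)}

def detect(elements, peak_tss=None, mode="Reliable"):
    """Filter (group, position) element hits according to the detection mode.

    'ALL' returns every hit. 'Reliable' keeps only hits inside the window of
    their group, and returns nothing when no peak TSS is known.
    """
    if mode == "ALL":
        return list(elements)
    if peak_tss is None:
        return []
    kept = []
    for group, position in elements:
        low, high = WINDOWS[group]
        if low <= position - peak_tss <= high:
            kept.append((group, position))
    return kept
```

With this structure, a promoter that lacks TSS data yields an empty result in Reliable mode and a full, position-independent list in ALL mode, matching the behaviour described above.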

The ‘TSS information’ table provides the expression strength of each TSS. Tag per million (TPM) in the window shows the relative count of TSS tags in a tag library, and this information comes from CT-MPSS analysis (Yamamoto, Y. Y. et al., unpublished data). The methods and quality assessment of the data will be described elsewhere.
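As a worked illustration of the TPM normalization, the sketch below scales a raw tag count by the library size and picks the position with the highest value as the peak TSS. The counts and library size in the usage note are made-up numbers, and the function names are hypothetical; how ppdb handles ties or tag clustering is not described in the text.

```python
def tags_per_million(tag_count, library_size):
    """Normalize a raw tag count to tags per million tags in the library."""
    if library_size <= 0:
        raise ValueError("library_size must be positive")
    return tag_count * 1_000_000 / library_size

def peak_tss(tss_counts, library_size):
    """Return the position with the highest TPM and the TPM of every position."""
    tpm = {pos: tags_per_million(n, library_size) for pos, n in tss_counts.items()}
    peak = max(tpm, key=tpm.get)
    return peak, tpm
```

For example, 5 tags in a library of 200 000 tags correspond to 25 TPM, so libraries of different depth can be compared on the same scale.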

The ‘Core promoter information’ table shows the presence or absence of TATA box and Y Patch. Currently, a search for Inr (Initiator, the consensus around the TSS) is not executed, so all promoters show ‘Not Available’ for it. We plan to add Inr information in the near future as a minor update.

The ‘REG information’ table lists the REGs of a promoter and their corresponding PPDB and PLACE motifs. For example, the table in Figure 1 shows that AtREG379 belongs to the ACGT group of PPDB and corresponds to the ACGT, ACGTG, GCCAC and ACGTGKC motifs of PLACE (5). PPDB motifs have been extracted from REG sequences with the aid of two-dimensional (2D) REG-promoter clustering (8). REG sequences, as well as PPDB and PLACE motifs, are linked to other pages for biological information.
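PLACE motifs such as ACGTGKC use IUPAC ambiguity codes (K stands for G or T), so matching them against a sequence requires expanding those codes. The following is a minimal sketch of such matching on the forward strand only; it is an illustration, not the procedure ppdb uses to relate REGs to PLACE motifs, and the function names are hypothetical.

```python
import re

# Standard IUPAC nucleotide ambiguity codes expressed as regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

def motif_to_regex(motif):
    """Translate an IUPAC motif into a compiled regular expression."""
    return re.compile("".join(IUPAC[base] for base in motif.upper()))

def find_motif(motif, sequence):
    """Return 0-based start positions of non-overlapping forward-strand matches."""
    pattern = motif_to_regex(motif)
    return [match.start() for match in pattern.finditer(sequence.upper())]
```

Because REG is described as direction-insensitive, a fuller version would also scan the reverse complement; that step is omitted here to keep the sketch short.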

REG information is shown under the category of ‘Promoter Summary’. Selection of ALL adds another category, ‘Not Reliable Promoter Summary’. This category can be used when searching for regulatory elements (REG) from wider regions or when there is no TSS information on the promoter of interest.

ADDITIONAL PAGES

Some biological information on REGs is also provided by ppdb. As shown in Figure 2, there is a page with a complete list of REGs, the ‘ALL REG List’. This list presents the relationship between REG sequences, PPDB motifs and PLACE motifs.

Selection of a specific REG sequence leads to ‘Summary of the REG’, followed by ‘Hit Gene List of the REG’. The ‘Summary of the REG’ section shows corresponding PPDB and PLACE motifs and a brief description of the motifs. Selection of each motif leads to PLACE ID and also to the original article(s) for the motif. The ‘Hit Gene List’ section gives information about promoters sharing the REG.
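Conceptually, a ‘Hit Gene List’ is an inverted index from REGs to the promoters that contain them. The sketch below shows one way to build such an index from per-promoter REG lists; the data layout and the function name are assumptions for illustration, and apart from At5g38410 and AtREG379 the identifiers in the usage note are invented.

```python
from collections import defaultdict

def build_hit_gene_index(promoter_regs):
    """Invert a {gene: [REG, ...]} mapping into {REG: [gene, ...]}."""
    index = defaultdict(set)
    for gene, regs in promoter_regs.items():
        for reg in regs:
            index[reg].add(gene)
    # Sort the gene lists so that output is stable between runs.
    return {reg: sorted(genes) for reg, genes in index.items()}
```

Given `{"At5g38410": ["AtREG379"], "GeneB": ["AtREG379"]}`, looking up `"AtREG379"` in the resulting index returns both genes, which is the kind of list of promoters sharing a REG that the page presents.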

ACKNOWLEDGEMENTS

This work was supported by Grants-in-Aid for Scientific Research on Priority Areas ‘Comparative Genomics’ (Y.Y.Y. and J.O.), Scientific Research (B) (J.O.) and Scientific Research (C) (Y.Y.Y.) from the Ministry of Education, Culture, Sports, Science and Technology of Japan. Funding to pay the Open Access publication charges for this article was provided by the Ministry of Education, Culture, Sports, Science and Technology, Japan.

Conflict of interest statement. None declared.

REFERENCES

1. Seki M, Narusaka M, Kamiya A, Ishida J, Satou M, Sakurai T, Nakajima M, Enju A, Akiyama K, et al. Functional annotation of a full-length Arabidopsis cDNA collection. Science. 2002;296:141–145. doi: 10.1126/science.1071006.
2. Davuluri RV, Sun H, Palaniswamy SK, Matthews N, Molina C, Kurtz M, Grotewold E. AGRIS: Arabidopsis gene regulatory information server, an information resource of Arabidopsis cis-regulatory elements and transcription factors. BMC Bioinformatics. 2003;4:25. doi: 10.1186/1471-2105-4-25.
3. Bülow L, Steffens NO, Galuschka C, Shindler M, Hehl R. AthaMap: from in silico data to real transcription factor binding sites. In Silico Biol. 2006;6:0023.
4. Lescot M, Déhais P, Thijs G, Marchal K, Moreau Y, Van de Peer Y, Rouzé P, Rombauts S. PlantCARE, a database of plant cis-acting regulatory elements and a portal to tools for in silico analysis of promoter sequences. Nucleic Acids Res. 2002;30:325–327. doi: 10.1093/nar/30.1.325.
5. Higo K, Ugawa Y, Iwamoto M, Korenaga T. Plant cis-acting regulatory DNA elements (PLACE) database: 1999. Nucleic Acids Res. 1999;27:297–300. doi: 10.1093/nar/27.1.297.
6. Matys V, Kel-Margoulis OV, Fricke E, Liebich I, Land S, Barre-Dirrie A, Reuter I, Chekmenev D, Krull M, et al. TRANSFAC and its module TRANSCompel: transcriptional gene regulation in eukaryotes. Nucleic Acids Res. 2006;34:D108–D110. doi: 10.1093/nar/gkj143.
7. FitzGerald PC, Shlyakhtenko A, Mir AA, Vinson C. Clustering of DNA sequences in human promoters. Genome Res. 2004;14:1562–1574. doi: 10.1101/gr.1953904.
8. Yamamoto YY, Ichida H, Matsui M, Obokata J, Sakurai T, Satou M, Seki M, Shinozaki K, Abe T. Identification of plant promoter constituents by analysis of local distribution of short sequences. BMC Genomics. 2007;8:67. doi: 10.1186/1471-2164-8-67.
