Abstract
With the ever increasing number of Expressed Sequence Tags (ESTs) from various sequencing projects, ESTs have become valuable and first-hand source of in-silico mining of simple sequence repeats (SSR) markers. We examined a total of 3419 EST sequences from three bamboo species, namely, Phyllostachys edulis, Bambusa oldhamii and Dendrocalamus sinicus for the presence of di- to hexa- microsatellites. The frequency of SSR containing ESTs varied from 5.36% in B. oldhamii to 13.05% in P. edulis. No SSRs were found in D. sinicus. Tri-nucleotide repeats (49.34%) were most frequent in P. edulis, while not much comparable difference in repeats was found in B. oldhamii. Flanking primer pairs were also designed in-silico for the sequences containing SSRs and their position on the genome hypothesized using similarity searching. SSRs located in open reading frame (ORF) were given functional annotation using Gene Ontology. Polymorphic SSRs were also detected using new pipeline- polySSR. Polymorphism level was very low (2.43%) and the position of the polymorphic SSRs was determined. The development of SSRs and the study of polymorphism will help in the further study of intra- and inter- gene flow, genetic structure, variability, linkage mapping and evolutionary relationships in bamboo
Keywords: Expressed Sequence Tags (EST), Simple Sequence Repeats (SSR), bamboo, Phyllostachys edulis, Bambusa oldhamii, Dendrocalamus sinicus
Background
A large number of plant genomes are under consideration for whole genome sequencing but for most of them, due to their large genome size, the task has been little difficult and time consuming. However, for them, as a prehand, large scale EST sequencing projects has been started as an alternative. Expressed Sequence Tags (EST) are cDNA clones, sequenced randomly in a single pass run. Based on the direction of cloning, it can be 5' EST or 3' EST. Since the ESTs are derived from cDNA, they provide direct evidence for the study of transcriptome and genome.
Microsatellites or Simple Sequence Repeats (SSR) or Short tandem repeats (STR) are 1-6 bp tandemly repeated motifs present in both coding and noncoding regions of prokaryotic and eukaryotic genome. SSR are extensively used as molecular markers because of its multiallelic nature, co-dominant inheritance and relative abundance. Since EST also represent the coding part of the genome, these serve as an important source for mining putative SSR markers and provide first hand insight into the organism's genetic diversity. Here, in this study, we present the mining of EST-SSR markers and detecting polymorphism in microsatellites in 3 bamboo species, namely, Phyllostachys edulis, Bambusa oldhamii, Dendrocalamus sinicus. Two approaches were used in this work. First, by using MISA, SSR markers were predicted and functional annotation of those SSR containing sequences was done. Another approach was to determine polymorphism using the novel pipeline PolySSR.
Methodology
ESTs for 3 different bamboo species namely, Phyllostachys edulis, Bambusa oldhamii and Dendrocalamus sinicus were downloaded from NCBI's dbEST. A total of 3087 EST sequences of P. edulis, 318 EST sequences of B. oldhamii and 14 EST sequences of D. sinicus was downloaded from dbEST as on may 29, 2009, dbEST release 052909. dbEST has redundancy in EST sequences. In order to remove the redundancy, CAP3 Assembler [1] was used for clustering.
SSR were detected using MIcro SAtellite identification tool (MISA) [2]. The search criteria were to search for a minimum of 14 bp SSR repeat. Primer pairs for the SSRs were designed using Primer3 (v 0.4.0) [3]. The web interface allows the user defined constraints; however, here the default parameters were used.
EST-SSR sequences were subjected to similarity searching against nonredundant (nr) database with constraint of ORGN: Oryza sativa. Bamboo being monocotyledon, rice was used as model organism owing to more proximity in phylogeny. It was performed using NCBI's Basic Local Alignment Search Tool (BLAST), variant BLASTX [4]. The sequence was considered homologous if the e-value was ≤ 1e-5 and score ≥ 100. Based on the position of SSR in the homology search, they were assigned whether they lie in 5' UTR, 3' UTR or ORF. Only those EST sequences were further analyzed in which SSR were predicted to be in ORF. The functionality was assigned according to Gene Ontology (GO) annotation. Rice was used as model organism for similarity searching. GO annotation of Oryza sativa was used to map functions by similarity searching in GRAMENE. Polymorphism exhibited by EST-SSR was mapped using the new pipeline PolySSR [5].
Results and Discussion
For analysis, the EST data was significantly reduced to a nonredundant set of sequences by atleast 30% using CAP3. Mono-SSR repeats were not considered since they do not serve important as molecular markers. The search criteria were kept low to maximize the SSR discovery [6]. In D. sinicus, no SSRs were detected by MISA and excluded for further study.
Among cereal crops, tri-SSRs were most frequent followed by di- and tetra-SSR [7] and in our study also, tri-SSRs were found to be most abundant (49.34%) among all SSRs in P. edulis which is in agreement to other poaceae family members like barley, maize, sorghum, rice, wheat [8, 9,10]. In general, AT motif are common in plants [11, 12]. However in bamboo, AT motif was least occurring. In P. edulis, among di-SSRs, AG/CT was most abundant. The result was consistent with the results from poaceae members [8, 9,10].. AG/CT motif can represent codons GAG, AGA, UCU and CUC in mRNA population and code for R, E, A and L respectively. Since A and L are found in increased amount in proteins, the abundance of AG/CT in the genome can be substantiated. CG repeats are least found in cereal species [13] and in our present study also, CG/GC motif was completely absent.
In plants, common motif is AAG. CCG is a specific feature of monocot genome [14] but in bamboo species, this trend was not observed. Among tri-SSRs, P. edulis showed abundance in AGG/CCT, AGC/CGT and AGT/ATC repeats. These motifs were fairly well represented in rice [15], maize [16], pearl millet [13], and barley [8]. In our study, AAT/ATT motif was found to be nil. This may be explained because TAA- based variants code for stop codons that have direct effect on protein synthesis [16]. In this study, tetra- and hexa repeats were found in lowest frequency (Table 1 see supplementary material).
In B. oldhamii, different SSR motifs very almost evenly distributed. Lack of considerable frequency difference in the occurrence of various SSR motifs in B. oldhamii and absence of SSRs in D. sinicus can be reasoned with the less number of EST sequences available and analyzed. Flanking primers pairs were designed for SSR containing sequences. For 90.57% of P. edulis EST-SSR sequences, primers were designed whereas for all SSR containing sequences of B. oldhamii primers were designed in silico. Sharma et al. (2009) [17] designed primers for 81.25% of SSR containing EST from P. edulis and B. oldhamii and were able to amplify 76.1% of those EST primers in lab. Most SSRs for which primers were designed were found to be ≫ 20 bp length, thus has a smaller chance of mutation or slipped strand mispairing over smaller sequence length. This may lead to more chance of sequence conservation.
The SSR containing sequences for which primers were designed were then analyzed for determining the relative SSR position on genome using sequence similarity search. Most of the SSRs were predicted in 5' UTR followed by 3'UTR. Very few were predicted to be in ORF. Maximum SSRs were found to be in 5'UTR in bamboo species, in accordance to another study [18]. Almost all AG motifs were found in 5' UTR where as CT motif was found maximally in 5' UTR and very few in 3' UTR. ORF contained mostly trinucleotide repeats [19]. High frequency of trinucleotide repeats can be explained as these are less affected by base mutations and hence more conserved. However, disease causing effect of change in SSR sequence in humans has been reported [20]. These SSR predicted to be in ORF were mapped against related Gene Ontology (GO) IDs. They were mapped with variable functions which are summarized giving the corresponding GO annotations for sequences in Table 2 (see Table 2).
The polySSR pipeline resulted in 287 contigs and 2127 singlets of P. edulis. The contigs were analyzed for polymorphism. In all, only 7 contigs were found to contain 7 polymorphic SSRs. The 7 contigs included 21 P. edulis EST sequences, of which 4 EST sequences (189099774, 189100527, 189100246, 189100551) were also included in our final MISA analysis. The SSR motif -GA showed high polymorphism. GA/CT motif was also most polymorphic in rice [10]. Polymorphism was previously reported in Bamboo species by Sharma et al. [17]. Primers were designed for the 7 contig sequences and position determination on genome was done. The presence of variation in repeat units of SSR in 5'UTR is said to affect gene transcription and/or translation, if in 3'UTR, they may be responsible for gene silencing or transcription slippage and if in ORF they inactivate or activate genes or truncate protein [7, 21]. These SSRs can be used as molecular markers if the SSR containing genes are identified and their polymorphism validated. The summary of polySSR result is given in Table 3 (see supplementary material). In B. oldhamii and D. sinicus, no polymorphic SSRs were found. Thus the divergent biological role of SSRs is important in the study of plant genomics and functionalities.
Supplementary material
Acknowledgments
The authors greatly acknowledge Elodie Salzemann, Wangeningen University, Netherland for using our data and providing us with the polySSR results for the study of polymorphism in EST-SSR in bamboo species. Also, the authors thank Pankaj Bhardwaj, IHBT, Himachal Pradesh for providing valuable suggestions and help
Footnotes
Citation:Oviya & Shanmughavel, Bioinformation 5(6): 240-243 (2010)
References
- 1. http://www.genome.clemson.edu/resources/online_tools/cap3.
- 2. http://pgrc.ipk-gatersleben.de/misa/misa.html.
- 3. http://fokker.wi.mit.edu/primer3/input-040.htm.
- 4. http://blast.ncbi.nlm.nih.gov/Blast.cgi.
- 5.Tang J, et al. BMC Bioinformatics. 2008;9:374. doi: 10.1186/1471-2105-9-374. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 06.Ellis JR, Burke JM. Heredity. 2007;99:125. doi: 10.1038/sj.hdy.6801001. [DOI] [PubMed] [Google Scholar]
- 7.Varshney RK, et al. TRENDS in Biotechnology. 2005;23(1):48. doi: 10.1016/j.tibtech.2004.11.005. [DOI] [PubMed] [Google Scholar]
- 8.Theil Michalek T, et al. Theor Appl Genet. 2003;106:411. doi: 10.1007/s00122-002-1031-0. [DOI] [PubMed] [Google Scholar]
- 9.Nicot N, et al. Theor Appl Genet. 2004;109:800. doi: 10.1007/s00122-004-1685-x. [DOI] [PubMed] [Google Scholar]
- 10.Kantety RV, et al. Plant Molecular Biology. 2001;48:501. doi: 10.1023/a:1014875206165. [DOI] [PubMed] [Google Scholar]
- 11.Shanker A, et al. Scientia Horticulturae. 2007;113:353. [Google Scholar]
- 12.Lagercrantz Ellergen U, et al. Nucleic Acids Res. 1993;21:1111. doi: 10.1093/nar/21.5.1111. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 13.Senthilvel S, et al. BMC Plant Biology. 2008;8:119. doi: 10.1186/1471-2229-8-119. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 14. YC Koro Li, et al. Mol. Biol. Evol. 2004;21(6):991. doi: 10.1093/molbev/msh073. [DOI] [PubMed] [Google Scholar]
- 15.Temnykh S, et al. Theor. Appl. Genet. 2000;100:697. [Google Scholar]
- 16.Chin Maize ECL, et al. Genome. 1996;39:866. doi: 10.1139/g96-109. [DOI] [PubMed] [Google Scholar]
- 17.Sharma V, et al. Genet. 2009;10:721. [Google Scholar]
- 18.Bouck A, Vision T. Molecular ecology. 2007;16:907. doi: 10.1111/j.1365-294X.2006.03195.x. [DOI] [PubMed] [Google Scholar]
- 19.Toth G, et al. Genome Res. 2000;10:1967. [Google Scholar]
- 20.Bates G, Lehrach H. Bioessays. 1994;16:277. doi: 10.1002/bies.950160411. [DOI] [PubMed] [Google Scholar]
- 21.Fujimori S, et al. FEBS Letters. 2003;554:17. doi: 10.1016/s0014-5793(03)01041-x. [DOI] [PubMed] [Google Scholar]
Associated Data
This section collects any data citations, data availability statements, or supplementary materials included in this article.
