Abstract
Resistance (R) genes are among the most important gene classes for plant breeding because they activate plant defense mechanisms. Among them, the nucleotide-binding site-leucine-rich repeat (NBS-LRR) class is the most abundant and is found in all types of plants. In silico characterization of the EST database detected 28 NBS-type R-gene sequences in Curcuma longa. All 28 sequences contained the NB-ARC domain; 21 of them had highly conserved motif characteristics and were categorized as regular NBS genes. The open reading frames ranged from 112 (CL.CON.1267) to 361 (CL.CON.3566) amino acids, with an average of 279 amino acids. Most best alignments were with monocots (67.8%), chiefly Oryza sativa and Zingiber sequences. The best alignments with dicots included Arabidopsis thaliana, Populus trichocarpa and Medicago sativa. The NBS-type R-genes detected in Curcuma longa are a valuable resource for molecular marker development, molecular mapping of R-genes, identification of resistance gene analogs, and functional and evolutionary characterization of NBS-LRR-encoding resistance genes in asexually reproducing plants.
Keywords: Curcuma longa, expressed sequence tags, NBS-LRR, R-genes, TBLASTN
Background
Pathogen attack has caused an estimated 12% loss of global crop production in the last decade, with up to 80% of losses accounted for by tropical countries [1]. The most important group of genes used by breeders for disease control is the plant resistance (R) genes. Resistance genes, which are members of a very large multigene family, are highly polymorphic and have diverse recognition specificities. As many as 70 different R genes conferring resistance to major plant pathogens have been isolated, cloned, and characterized in different plants in the last 15 years [2]. These can be classified into five categories based on their predicted protein structure [3, 4]. Of the cloned plant disease resistance (R) genes, approximately 75% encode cytoplasmic receptor-like proteins characterized by an N-terminal nucleotide-binding site (NBS), a leucine-rich repeat (LRR) domain, and a leucine zipper (LZ), Toll/interleukin-1 receptor (TIR) or coiled-coil (CC) sequence [5]. The LRR region recognizes the pathogen, the TIR and CC regions are involved in signal transduction during many cell processes [6], and the NBS usually signals for programmed cell death [7]. Many genes encode proteins of this class: I2 [8] and Sw5 [9] from tomato; RPM1 [10], RPS2 [11] and RPS4 [12] from Arabidopsis thaliana; Pib [13], Pi-ta [14] and Xa1 [15] from Oryza sativa (rice); Hero [16], R1 [17] and Rx2 [18] from potato; L [19] and P [20] from flax; and N [21] from tobacco. In fact, whole-genome sequence analysis revealed 150–175 NBS-LRR genes in the Arabidopsis genome [22] and approximately 600 NBS-LRR genes in the rice genome [23]. Curcuma longa L. (turmeric), of the family Zingiberaceae, is one of the most important crops, with great medicinal and economic significance. Turmeric rhizome is valued the world over and has been used since ancient times as a spice, food preservative and coloring agent, and in traditional systems of medicine.
India is the world's largest producer and exporter of turmeric, followed by China, Indonesia, Bangladesh and Thailand [24]. The International Trade Centre, Geneva, has estimated an annual growth rate of 10% in world demand for turmeric. Continuous domestication of preferred genotypes, coupled with exclusively vegetative propagation, seems to have eroded the genetic base of the crop. As a result, all cultivars available today are equally susceptible to major diseases such as rhizome rot caused by Pythium aphanidermatum, leaf blotch caused by Taphrina maculans and leaf spot caused by Colletotrichum capsici. Moreover, turmeric is completely sterile and is propagated exclusively by vegetative means using rhizomes. In this context, characterization of resistance-related sequences may provide a lead towards retrieving resistance specificities suitable for the improvement of this crop. Recent advances in Curcuma genomic technologies have generated a large number of expressed sequence tags (ESTs) that are available in public databases. As of July 2011, GenBank had released 12,593 EST sequences from Curcuma longa. This database can be used as starting material for the characterization of NBS-LRR class R-gene sequences in turmeric. Thus, our objective was to perform a data mining-based identification of NBS-LRR class R-genes in the Curcuma longa EST database, using well-known R-gene sequences as templates and comparing the identified sequences with known R-genes deposited in public DNA and protein databases.
Methodology
The Curcuma longa transcriptome database was searched for NBS-LRR R-gene homologues using amino acid sequences of known genes as queries. The NCBI (National Center for Biotechnology Information; http://www.ncbi.nlm.nih.gov) accession numbers of the query sequences are shown in Table 1, together with their sequence features. The queries are grouped according to the conserved domains previously described. All turmeric sequences used in this work were obtained from the Curcuma longa EST database at NCBI, which contains 12,593 expressed sequence tags. We mined all 12,593 EST sequences, which come from two tissue libraries: rhizome (6,870 sequences; DY395309-DY388440) and leaf (5,723 sequences; DY388439-DY382717). The EST sequences were screened against the NCBI UniVec database (ftp://ftp.ncbi.nih.gov/pub/UniVec/) with the program Cross_Match to detect vector and adapter sequences. The CAP3 program was used to assemble the EST sequences into contigs, creating a non-redundant dataset. The program TBLASTN [25] was used to align the query proteins against the Curcuma longa contigs translated in all six reading frames. The reading frame of each TBLASTN alignment was used to predict the open reading frame (ORF) of the matched contig. For this purpose, the ExPASy Translate tool (bo.expasy.org/tools/dna.html) was used, which translates a DNA sequence into the corresponding amino acid sequence in FASTA format. The resulting ORFs were then submitted to Reverse Position-Specific BLAST (RPS-BLAST) against the Conserved Domain Database [26] to identify patterns or motifs in the predicted products. Reciprocal alignments of the ORFs were run against the nr database using the stand-alone BLAST package from NCBI. Matched sequences were annotated for later comparison.
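The ORF prediction step can be illustrated with a minimal Python sketch. It is not the ExPASy Translate tool used in this work; it only shows, under stated assumptions, how a contig can be translated in the reading frame reported by a TBLASTN hit (+1 to +3, or -1 to -3 for the reverse strand) and how the longest stop-free stretch can be taken as the candidate ORF. The function names and the example sequence are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate_frame(seq, frame):
    """Translate seq in a BLAST-style frame: +1..+3 forward, -1..-3 reverse."""
    seq = seq.upper()
    if frame < 0:
        seq = reverse_complement(seq)
    start = abs(frame) - 1
    codons = (seq[i:i + 3] for i in range(start, len(seq) - 2, 3))
    # Codons with ambiguous bases (e.g. N) are translated as X.
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def longest_orf(protein):
    """Longest stretch without a stop codon (assumed candidate ORF)."""
    return max(protein.split("*"), key=len)


# Hypothetical toy contig, not a Curcuma longa sequence.
contig = "ATGGCCTAAATGAAATTTGGG"
protein = translate_frame(contig, 1)
print(protein, longest_orf(protein))
```

Taking the longest stop-free stretch is a simplification; a real analysis would also check that the stretch overlaps the aligned region of the TBLASTN hit.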
Results and Discussion
R-genes are quite abundant in higher plants, but most functionally defined R-genes belong to a class that encodes cytoplasmic receptor-like proteins characterized by an N-terminal nucleotide-binding site (NBS) and a leucine-rich repeat (LRR) domain. A set of 28 non-redundant NBS sequences was retrieved through TBLASTN alignment against 4035 Curcuma longa contig sequences. Each was annotated for one or more R-genes (data summarized in Table 2). Five resistance gene analogues (RGAs) had previously been isolated and characterized in Curcuma longa [27]. However, all five RGAs were of the CC-NBS-LRR class and showed no significant variation in their NBS domain characteristics. In contrast, some level of redundancy was expected here because similar genes are grouped in the same class [28]. The contigs represented six NBS-type architectures: X-NBS-LRR, 10; CC-NBS-NBS-LRR, 2; CC-NBS-LRR, 9; NBS-LRR, 2; NBS, 3; and CC-NBS, 2 (Figure 1). In 21 of the 28 NBS genes, all the motifs characteristic of the NBS domain were conserved, and these were categorized as regular NBS genes. The others were very different in structure from the majority, or were simply truncated, and were categorized as non-regular NBS genes. Two non-regular NBS genes yielded higher P values when they were hit by TBLASTN in the NBS region and had standard LRR regions, while three genes had only some of the conserved NBS motifs. Two non-regular NBS genes encoded a coiled-coil motif but were highly divergent in the NBS region and lacked LRR regions. In the N-terminal region, 10 regular NBS genes contained unknown motifs, symbolized as X. Eleven regular NBS genes encoded the CC motif (CNL and CNNL), while the rest were without the CC motif (XNL). No genes encoded TIR motifs. The TIR motif is thought to be absent from monocotyledonous plants [4], while it is present in all dicotyledonous taxa studied so far.
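The class counts reported above can be cross-checked with a short Python sketch. The counts are taken from the text; the assignment of each architecture to the regular or non-regular group is our reading of the paragraph (an assumption), not an annotation supplied with the data.

```python
# Contig counts per domain architecture, as reported in the text.
class_counts = {
    "X-NBS-LRR": 10,
    "CC-NBS-NBS-LRR": 2,
    "CC-NBS-LRR": 9,
    "NBS-LRR": 2,
    "NBS": 3,
    "CC-NBS": 2,
}

# Assumption: these three architectures are the "regular" NBS genes
# (10 with an unknown N-terminal motif X plus 11 with a CC motif).
REGULAR_CLASSES = {"X-NBS-LRR", "CC-NBS-NBS-LRR", "CC-NBS-LRR"}

total = sum(class_counts.values())
regular = sum(n for name, n in class_counts.items() if name in REGULAR_CLASSES)
non_regular = total - regular
cc_regular = class_counts["CC-NBS-LRR"] + class_counts["CC-NBS-NBS-LRR"]

print(f"total={total} regular={regular} non_regular={non_regular} "
      f"regular_with_CC={cc_regular}")
```

Under this reading the six classes sum to the 28 contigs, with 21 regular and 7 non-regular NBS genes, which matches the totals stated in the text.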
Sizes of the Curcuma longa contigs aligned to NBS-LRR R-genes ranged from 452 (CL.CON.1267) to 1256 nucleotides (CL.CON.1529). Prediction of the contig coding regions revealed ORFs in both forward and reverse reading frames, with an average length of 279 amino acids (aa). ORF sizes ranged from 112 (CL.CON.1267) to 361 amino acids (CL.CON.3566). The search for conserved domains (CD-Search) revealed conserved motifs in all the analyzed contig clusters. All 28 Curcuma longa contig clusters contained the NB-ARC domain. In the LRR region, Pfam software detected 32 LRR motifs in the 28 NBS genes. This number is higher than the number of Curcuma longa contigs with NBS-LRR R-genes because the motifs occur in tandem repeats. These LRR sequences are sometimes imperfect and may be difficult to recognize with available in silico tools, so a larger number might be identified manually. Two of the contig clusters, CL.CON.1267 and CL.CON.3620, had a poorly developed NBS motif and very short ORFs of 112 and 123 amino acids, respectively. Of the best matches to the 28 Curcuma longa NBS-LRR contigs, 9 were from dicotyledonous plants such as Arabidopsis thaliana, Populus trichocarpa, Pyrus communis, Glycine max, Cajanus cajan and Medicago sativa. Among monocots, rice (O. sativa) sequences appeared most often as best matches (9 contig clusters), followed by Zingiber officinale (3 contig clusters). A comprehensive list of the sequences that aligned with the Curcuma longa NBS-LRR contig clusters is given in Table 2. The organization of the detected Curcuma longa NBS-LRR genes was therefore compared mainly with that of rice and ginger.
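The best-match figures above can be related to the monocot share given in the abstract with a few lines of Python. The counts come from the text; treating every non-dicot best match as a monocot match is an assumption of this sketch.

```python
# Counts taken from the text.
TOTAL_CONTIGS = 28
DICOT_BEST_MATCHES = 9
RICE_BEST_MATCHES = 9
GINGER_BEST_MATCHES = 3

# Assumption: every best match that is not a dicot is a monocot.
monocot_best_matches = TOTAL_CONTIGS - DICOT_BEST_MATCHES
monocot_share = 100 * monocot_best_matches / TOTAL_CONTIGS
other_monocot_matches = (
    monocot_best_matches - RICE_BEST_MATCHES - GINGER_BEST_MATCHES
)

print(f"monocot best matches: {monocot_best_matches} ({monocot_share:.2f}%)")
print(f"monocot matches other than rice and ginger: {other_monocot_matches}")
```

Under this assumption, 19 of the 28 contigs have a monocot best match, which corresponds to the 67.8% reported in the abstract when the percentage is truncated rather than rounded.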
It has been observed that most of the R-gene information available in databases refers to herbaceous model and crop plants such as rice and Arabidopsis, perhaps because most identified and sequenced R-genes came from mapping approaches that have been performed extensively in these plants. The larger number of Oryza sativa sequences among the best alignments to Curcuma does not represent higher similarity to this species but reflects the large number of sequences from this model plant deposited in GenBank. Barbosa-da-Silva et al. [29] also found that Eucalyptus, although a woody plant, showed maximum R-gene alignment with the herbaceous Arabidopsis thaliana. Other arguments can be made as well, such as:
Curcuma belongs to the same family as ginger (Zingiberaceae), and
both Curcuma and rice are monocots and exhibit similar levels of complexity.
However, significant sequence similarity was also detected with dicot plants. This suggests that Curcuma longa might occupy a transitional position between dicots and monocots as far as resistance genes are concerned. Nevertheless, the NBS-LRR genes of turmeric must be characterized in detail before a valid conclusion can be drawn about their evolution. The number of NBS-type R-genes identified here is quite low considering the total size of the EST database. However, Curcuma longa may contain other types of R-genes that were not targeted in this study. Moreover, the ESTs were not obtained under pathogen stress. This suggests that the identified NBS sequences are expressed constitutively, and also that a larger number of R-genes may be expressed in Curcuma under other experimental conditions. Thus, generating additional ESTs, especially under pathogen infection, could make it possible to detect many new NBS genes from Curcuma longa.
Figure 1. Graphical representation of the NBS-LRR R-genes retrieved from the Curcuma longa EST database.
Conclusion
Using bioinformatics tools, it was possible to detect and characterize NBS-type R-genes from the Curcuma longa transcriptome. Twenty-eight NBS-type R-genes with a distinct NB-ARC domain were detected, 21 of which were regular NBS genes. This is the first in silico detection of NBS-LRR-type R-genes in Curcuma longa. The identified sequences will be valuable resources for the development of markers for molecular breeding and for the identification of resistance gene analogs (RGAs) in Curcuma and related species. A few of the NBS-type R-genes of Curcuma identified in this study may also be used for fluorescent in situ hybridization (FISH) on Curcuma chromosomes, which would also help in the comparison of different parental species and their respective hybrids. Further, these in silico detected NBS-type R-genes will provide further insights into the organization, function and evolution of NBS-LRR-encoding resistance genes in asexually reproducing plants.
Acknowledgments
The authors are grateful to Dr. Manoj Ranjan Nayak, President, Siksha O Anusandhan University for his encouragement and support.
Footnotes
Citation: Joshi et al., Bioinformation 6(9): 360-363 (2011)
References
- 1. www.crcpress.com/product/pest management.
- 2. J Liu, et al. J Genet Genomics. 2007;34:765.
- 3. WY Song, et al. Plant Cell. 1997;9:1279. doi: 10.1105/tpc.9.8.1279.
- 4. J Ellis, D Jones. Curr Opin Plant Biol. 1998;1:288. doi: 10.1016/1369-5266(88)80048-7.
- 5. KE Hammond-Kosack, JD Jones. Annu Rev Plant Physiol Plant Mol Biol. 1997;48:575. doi: 10.1146/annurev.arplant.48.1.575.
- 6. GB Martin, et al. Annu Rev Plant Biol. 2003;54:23. doi: 10.1146/annurev.arplant.54.031902.135035.
- 7. EA van der Biezen, JD Jones. Curr Biol. 1998;8:R226. doi: 10.1016/s0960-9822(98)70145-9.
- 8. N Ori, et al. Plant Cell. 1997;9:521.
- 9. SH Brommonschenkel, et al. Mol Plant Microbe Interact. 2000;13:1130. doi: 10.1094/MPMI.2000.13.10.1130.
- 10. MR Grant, et al. Science. 1995;269:843.
- 11. M Mindrinos, et al. Cell. 1994;78:1089.
- 12. W Gassmann, et al. Plant J. 1999;20:265.
- 13. ZX Wang, et al. Plant J. 1999;19:55.
- 14. GT Bryan, et al. Plant Cell. 2000;12:2033. doi: 10.1105/tpc.12.11.2033.
- 15. S Yoshimura, et al. Proc Natl Acad Sci U S A. 1998;95:1663.
- 16. K Ernst, et al. Plant J. 2002;31:127.
- 17. A Ballvora, et al. Plant J. 2002;30:361.
- 18. A Bendahmane, et al. Plant J. 2000;21:73.
- 19. GJ Lawrence, et al. Plant Cell. 1995;7:1195. doi: 10.1105/tpc.7.8.1195.
- 20. P Dodds, et al. Plant Cell. 2001;13:163.
- 21. S Whitham, et al. Proc Natl Acad Sci U S A. 1996;93:8776.
- 22. BC Meyers, et al. Plant Cell. 2003;15:809. doi: 10.1105/tpc.009308.
- 23. T Zhou, et al. Mol Genet Genom. 2004;271:402. doi: 10.1007/s00438-004-0990-z.
- 24. http://www.printsasia.com/book/Indian-Spices-Production-and-Utilization-H-P-Singh-K-Sivaraman-M-Tamil-Selvan.
- 25. SF Altschul, et al. Nucleic Acids Res. 1997;25:3389. doi: 10.1093/nar/25.17.3389.
- 26. A Marchler-Bauer, et al. Nucleic Acids Res. 2002;30:281. doi: 10.1093/nar/30.1.281.
- 27. RK Joshi, et al. Genet Mol Res. 2010;9:1796. doi: 10.4238/vol9-3gmr910.
- 28. BC Meyers, et al. Plant J. 1999;20:317.
- 29. A Barbosa-da-Silva, et al. Genet Mol Biol. 2005;28:562.
