Abstract
Castor (Ricinus communis L.), a major non-edible oilseed crop, has numerous industrial applications. Systematic genetic diversity analysis using DNA-based markers is a quick and reliable method for selecting diverse parents to exploit higher levels of heterosis in breeding programmes. A total of 63,852 castor EST sequences were mined from the NCBI database. One thousand one hundred and five (1105) EST–SSRs and 1652 repeat motif sequences were identified from 20,495 non-redundant unigene sequences. The repeat motifs consisted of 29.7 % mononucleotide, 24.8 % dinucleotide, 27.27 % trinucleotide and 3.94 % tetranucleotide repeats. Twenty-eight primer pairs were chosen from SSR-containing ESTs to determine genetic diversity among 27 castor accessions. Twelve EST–SSRs showed polymorphism. The number of alleles detected per locus was 2–3, with an average of 2.33. Allele sizes ranged from 150 to 400 bp. Dendrogram analysis grouped the 27 accessions into two separate clusters. The genetic similarity coefficient ranged from 0.24 to 0.83. Polymorphic information content values of 0.28–0.49 revealed a medium level of diversity in castor. The results indicate that EST–SSRs are efficient markers for genetic diversity studies. Knowledge of the level of diversity existing among castor genotypes would help breeders plan efficient hybrid breeding programmes.
Electronic supplementary material
The online version of this article (doi:10.1007/s12298-016-0367-x) contains supplementary material, which is available to authorized users.
Keywords: Castor, EST–SSRs, Polymorphism, Genetic diversity
Introduction
Castor (Ricinus communis L., 2n = 20) belongs to the family Euphorbiaceae. The crop is grown across the world for its non-edible oil. Major castor-growing countries include India, Brazil, China, Russia and Thailand (Nagesh Kumar et al. 2015). During 2014–2015, India produced 17.33 lakh tonnes from an area of 11.05 lakh ha, with a productivity of 1568 kg ha−1. Castor seeds have a high oil content of 40–55 % compared with other oilseed crops, and the oil content of kernels ranges from 64 to 71 %. Among vegetable oils, castor oil has a distinct advantage owing to its high level (>85 %) of the fatty acid ricinoleic acid, which makes it the world’s most useful and economically important natural oil. It is the best industrial raw oil owing to its highly stable nature and its minimal variation in fatty acid composition. Castor oil and its derivatives are mainly used in the industrial sector (Ogunniyi 2006), and castor cake is very useful as organic manure. The non-edible nature of castor oil is due to the presence of ricin and Ricinus communis agglutinin (RCA).
Castor, a highly cross-pollinated crop, has recorded 15–20 % heterosis in hybrids. India is credited with the release of the world’s first castor hybrid, GCH-3. Several high-yielding hybrids with resistance or tolerance to various biotic and abiotic stresses have been released and are under commercial cultivation in varied agro-climatic conditions. Released hybrids occupy most of the castor-growing area because they yield more than varieties (Gouri Shankar et al. 2013). Selection of diverse potential parents is necessary to exploit higher levels of heterosis in a hybrid breeding programme. Genetic diversity has been assessed with both conventional and molecular markers. In the recent past, a large number of DNA-based markers have been developed, such as restriction fragment length polymorphism (RFLP), random amplified polymorphic DNA (RAPD), amplified fragment length polymorphism (AFLP), simple sequence repeat (SSR), intron length polymorphism (ILP) and single nucleotide polymorphism (SNP) (Dudhe et al. 2014). Conventional diversity analysis procedures are time-consuming, labour-intensive and affected by the environment, whereas genetic diversity assessment using DNA-based markers is rapid and reliable. DNA-based markers, viz. RAPD, AFLP, RFLP and SSR, have been used extensively to study the extent of genetic diversity in castor (Anjani 2012). SSR markers are co-dominant, abundant throughout the genome, highly polymorphic and easily reproducible, and hence widely favoured by researchers. Expressed sequence tags (ESTs), which represent the expressed portion of genes, are a valuable genomic resource for mining gene-targeted SSR markers. Owing to the availability of a large number of bioinformatic tools and a huge amount of sequence-level information, mining of EST–SSRs and genomic SSRs has gained importance in the recent past (Dudhe and Sarada 2012).
The interspecific transferability of EST–SSRs and genomic SSRs has been well established (Wen et al. 2010). EST–SSR markers have been widely applied in gene tagging, linkage map construction and QTL mapping (Qiu et al. 2010). Hence, EST–SSRs have been developed for various crops and widely used in crop improvement (Qiu et al. 2010). In castor, the availability of EST–SSR markers is limited compared with other crops, viz. wheat, rice, maize, chickpea, pigeon pea, groundnut, sunflower and rapeseed-mustard. Hence, the present study was taken up to mine EST–SSR markers from ESTs available in the public domain, to study the occurrence and distribution pattern of motifs and their loci, and to validate the markers by using them in a genetic diversity study of selected castor genotypes.
Materials and methods
Twenty-seven castor accessions, including checks, were grown in a randomized block design with two replications of four plants each, at a spacing of 90 × 60 cm. The plants were grown during Rabi 2012. The pistillate and male lines used in the present study are the parental lines of existing hybrids. The other elite material is either on par with or superior to the existing parental lines or checks for oil and yield parameters. The elite lines combine genes from different existing donors for particular traits. These lines were systematically evaluated for particular traits at different AICRP castor research stations, viz. IIOR, Hyderabad; RARS, Palem (PJTSAU, Telangana); Junagadh (JAU, Gujarat) and Sardar Krishi Nagar (SDAU, Gujarat). Information on characters such as stem colour, capsule nature, bloom, pistillateness, and disease and pest tolerance of the accessions is presented in Table 1.
Table 1.
Information on morphological diversity in castor accessions
| Genotypes | Source | Information about genotypes | Stem colour | Capsule | Bloom |
|---|---|---|---|---|---|
| 1. DPC-9 | IIOR, Rajendranagar | Pistillate line | Green | Spiny | Zero |
| 2. RG-48 | IIOR, Rajendranagar | Elite line (germplasm accession) | Red | Non spiny | Double |
| 3. RG-43 | IIOR, Rajendranagar | Early flowering, resistant to reniform nematode and leaf hopper | Red | Spiny | Triple |
| 4. RG-1354 | IIOR, Rajendranagar | Fusarium wilt resistant | Green | Spiny | Double |
| 5. RG-47 | IIOR, Rajendranagar | Fusarium wilt resistant and root rot | Green | Spiny | Triple |
| 6. RG-67 | IIOR, Rajendranagar | Elite line (germplasm accession) | Green | Spiny | Double |
| 7. RG-1686 | IIOR, Rajendranagar | Male line (germplasm accession) | Green | Spiny | Double |
| 8. DCS-78 | IIOR, Rajendranagar | Male line | Green | Spiny | Double |
| 9. RG-20 | IIOR, Rajendranagar | Elite line (germplasm accession) | Green | Non spiny | Double |
| 10. PCS-293 | RARS, Palem, Telangana | Elite line | Green | Spiny | Double |
| 11. PCS-312 | RARS, Palem, Telangana | Elite line | Red | Spiny | Double |
| 12. PCS-106 | RARS, Palem, Telangana | Elite line | Green | Spiny | Double |
| 13. PCS-236 | RARS, Palem, Telangana | Elite line | Red | Spiny | Double |
| 14. PCS-252 | RARS, Palem, Telangana | Elite line | Red | Spiny | Double |
| 15. PCS-230 | RARS, Palem, Telangana | Elite line | Green | Spiny | Double |
| 16. PCS-171 | RARS, Palem, Telangana | Elite line | Green | Spiny | Triple |
| 17. PCS-265 | RARS, Palem, Telangana | Elite line | Red | Spiny | Double |
| 18. PCS-228 | RARS, Palem, Telangana | Elite line | Red | Spiny | Double |
| 19. PCS-224 | RARS, Palem, Telangana | Elite line | Green | Spiny | Double |
| 20. PCS-302 | RARS, Palem, Telangana | Elite line | Green | Spiny | Double |
| 21. JP-65 | JAU, Junagadh | Pistillate line | Green | Spiny | Zero |
| 22. SKI-215 | S.K. Nagar, Gujarat | Male line, Fusarium wilt resistant | Red | Non spiny | Double |
| 23. M-574 | IIOR, Rajendranagar | Pistillate line | Green | Spiny | Triple |
| 24. RG-1(Aruna) | IIOR, Rajendranagar | Check variety | Green | Spiny | Double |
| 25. Kiran | RARS, Palem, Telangana | Check variety | Red | Non spiny | Double |
| 26. Kranthi | RARS, Palem, Telangana | Check variety | Red | Spiny | Double |
| 27. Haritha | RARS, Palem, Telangana | Check variety | Green | Spiny | Double |
DNA extraction, amplification
Leaf material was collected at 30 days after sowing (DAS). DNA was extracted using the CTAB method (Doyle and Doyle 1990). The quality of the purified total DNA was checked with a spectrophotometer (Glasel 1995). DNA was quantified by loading 1 µl of each sample onto 0.8 % agarose gels, with uncut λ DNA loaded as a control to assess quality and quantity. DNA was amplified in a 10 µl reaction containing 3 µl of DNA template (25 ng µl−1), 1 µl of 1× PCR reaction buffer (15 mM Tris HCl), 1 U Taq DNA polymerase (Genei, Bangalore), forward primer (10 µM), reverse primer (0.5 µM), 1 µl of 0.2 mM dNTPs and 3.9 µl of sterile distilled water. PCR was carried out in a Veriti™ 96-well thermal cycler (Applied Biosystems, CA, USA). The PCR programme consisted of an initial denaturation at 94 °C for 5 min, followed by 35 cycles of denaturation at 94 °C for 45 s, annealing at the primer-specific temperature for 1 min and extension at 72 °C for 1 min, with a final extension at 72 °C for 10 min. For screening of genotypes, PCR products were separated by electrophoresis at 120 V in 3 % Lonza agarose gel in 0.5× TAE buffer containing 0.05 µg µl−1 ethidium bromide. Samples were loaded along with a 100 bp ladder (Genei, Bangalore) as reference. After separation of the bands, gels were documented using the Molecular Imager Gel Doc (Bio-Rad).
Detection of EST SSR containing sequences
In the present study, a total of 63,852 ESTs (54.4 Mb) were downloaded in FASTA format from the National Center for Biotechnology Information (NCBI) (www.ncbi.nlm.nih.gov/). The total length of the EST sequences was 45,046,231 bp, with an average length of 706 bp. EST sequences were assembled with CAP3 software to obtain unigene sequences (http://genome.cs.mtu.edu/sas.html). These sequences were used to identify microsatellites with SSR Locator (da Maia et al. 2008). SSRs with motifs of 1 to 10 nucleotides were searched to evaluate the distribution pattern of EST–SSRs. The minimum repeat numbers used were ≥20 for mono-, ≥10 for di-, ≥7 for tri-, ≥5 for tetra-, ≥4 for penta- and hexa-, and ≥3 for hepta-, octa-, nona- and deca-nucleotide motifs. Longer SSRs have a higher probability of being polymorphic; the longest sequence used for SSR identification was 1361 bp.
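The repeat-number thresholds above can be illustrated with a minimal Python sketch. This is not the SSR Locator algorithm; it is a simple regular-expression scan for perfect repeats, and the function names `find_ssrs` and `is_primitive` are hypothetical helpers introduced only for illustration.

```python
import re

# Minimum repeat counts per motif length, taken from the thresholds in the text.
MIN_REPEATS = {1: 20, 2: 10, 3: 7, 4: 5, 5: 4, 6: 4, 7: 3, 8: 3, 9: 3, 10: 3}


def is_primitive(motif):
    """True if the motif is not a repetition of a shorter unit (e.g. 'AGAG' is not)."""
    n = len(motif)
    return not any(n % k == 0 and motif == motif[:k] * (n // k) for k in range(1, n))


def find_ssrs(sequence):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs."""
    sequence = sequence.upper()
    hits = []
    for size, min_rep in MIN_REPEATS.items():
        # One motif of `size` bases followed by at least (min_rep - 1) more copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (size, min_rep - 1))
        for match in pattern.finditer(sequence):
            motif = match.group(1)
            # Skip motifs such as 'AGAG' that are already counted as 'AG' repeats.
            if is_primitive(motif):
                hits.append((match.start(), motif, len(match.group(0)) // size))
    return sorted(hits)


if __name__ == "__main__":
    # Toy sequence (an assumption, not study data): (AG)10 and (AAG)7 tracts.
    toy = "GGC" + "AG" * 10 + "TTC" + "AAG" * 7 + "C"
    for start, motif, count in find_ssrs(toy):
        print(start, motif, count)
```

The primitive-motif check is a design choice of this sketch: without it, a tract such as (AG)10 would also be reported as (AGAG)5. Imperfect and compound repeats are not handled here.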
Primer designing and in silico PCR of EST–SSRs
The 20,495 non-redundant unigene sequences were used for primer design with the standard parameters of the Primer3 software built into SSR Locator. The parameters used were a primer length of 15–25 bp with 20 bp as the optimum, a product size of 150–400 bp, an annealing temperature of 55–61 °C with 60 °C as the optimum, and a GC content of 40–60 % with 50 % as the optimum. In silico PCR, which is built into SSR Locator, was run to identify suitable primers that yielded the desired product size.
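The length and GC-content criteria can be checked for any candidate primer with a few lines of Python. This is a minimal sketch, not Primer3 itself: the helper names `gc_percent` and `meets_design_criteria` are hypothetical, and annealing temperature is deliberately left out because the melting-temperature model used by the software is not stated in the text.

```python
def gc_percent(primer):
    """Percentage of G and C bases in a primer sequence."""
    primer = primer.upper()
    return 100.0 * sum(base in "GC" for base in primer) / len(primer)


def meets_design_criteria(primer, min_len=15, max_len=25, min_gc=40.0, max_gc=60.0):
    """Check primer length (15-25 bp) and GC content (40-60 %) as described in the text."""
    return min_len <= len(primer) <= max_len and min_gc <= gc_percent(primer) <= max_gc


if __name__ == "__main__":
    # Forward primer of locus RcDES593 as listed in Table 2.
    forward = "GTTCAAGCTCTAGTTCGGTGAG"
    print(len(forward), gc_percent(forward), meets_design_criteria(forward))
```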
Data analysis
The 28 randomly selected EST–SSR markers and their expected amplicon sizes are listed in Table 2. Accessions were genotyped based on the presence (1) or absence (0) of bands for each EST–SSR. Faint or unclear bands were not scored. The polymorphic information content (PIC) value was calculated according to Anderson et al. (1993). The binary data were used to generate a similarity matrix among the samples based on the similarity coefficient (Sneath and Sokal 1973). The similarity matrix was used to draw a dendrogram by the Unweighted Pair Group Method with Arithmetic Mean (UPGMA) via the SAHN module of NTSYS-pc version 2.2j (Applied Biostatistics Inc., Setauket, NY, USA) to represent the genetic relationships among accessions.
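As an illustration of the PIC calculation, the sketch below assumes the form commonly attributed to Anderson et al. (1993), PIC = 1 − Σ p_i², where p_i is the frequency of the i-th allele at a locus. The function name `pic` and the band counts in the example are hypothetical and are not data from this study.

```python
def pic(allele_counts):
    """PIC for one locus, assuming PIC = 1 - sum(p_i ** 2) over allele frequencies.

    allele_counts: number of accessions showing each allele (band) at the locus.
    """
    total = sum(allele_counts)
    if total == 0:
        raise ValueError("at least one scored band is required")
    return 1.0 - sum((count / total) ** 2 for count in allele_counts)


if __name__ == "__main__":
    # Hypothetical locus: allele A scored in 20 accessions, allele B in 7.
    print(round(pic([20, 7]), 3))
    # A monomorphic locus carries no information.
    print(pic([27]))
```

Under this formula, a locus with two equally frequent alleles has a PIC of 0.5, which is the upper limit for a biallelic marker.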
Table 2.
List of 28 EST–SSR markers with expected amplicon size
| S. no. | Locus name | Forward and reverse primer sequences | Amplicon size (bp) |
|---|---|---|---|
| 1 | RcDES593 | F: GTTCAAGCTCTAGTTCGGTGAG R: TACACTTTCTGTCTGTCCATGC | 371 |
| 2 | RcDES 598 | F: GAAAGCAACAAGAAAAGGTCTG R: AAGCTGGTGGTTTACTTGCTAC | 346 |
| 3 | RcDES 599 | F: CTTCATTAAGCCATCAAAGAGC R: AATGTAGTGTCTGTCTGTTGCG | 256 |
| 4 | RcDES 596 | F: TCCACATTGCTAACAAGCATAG R: AAATGCACACAGCTAAGACAAG | 381 |
| 5 | RcDES 597 | F: CAGCCAAACATAAGATTCATGC R: ATCATCTGGGTGTCTCAAAATC | 379 |
| 6 | RcDES 595 | F: TCTCATCACTAACAAACCAGCC R: GGTTCTGAAAGAAGTGAAATGG | 128 |
| 7 | RcDES 592 | F: GGCTTCTCCTCTGTTTGTTATG R: ACCACATGCAACAATCACAAG | 139 |
| 8 | RcDES 594 | F: GGCTTCTAATCTTTACTTCACC R: TATTTTCATCACCGACCCTAAC | 35 |
| 9 | RcDES 600 | F: CCGCTCACTACAAGTGGTACTC R: GAGTACTGGACAGCGATGAGAC | 387 |
| 10 | RcDES 601 | F: TAACTTGTCTGTTTTGGGTGAC R: GATTGGAGTCATGGAGAGAGAG | 258 |
| 11 | RcANGRAU 1 (3) | F: ACGAGGCTCTGTCTCTGTGT R: GGCAAAATTCAACACCATTC | 125 |
| 12 | Rc ANGRAU 18 (34) | F: TTAGGGTTTGGTTCTTTGGA R: TCAAGTGCCCATTTTAGAGC | 186 |
| 13 | Rc ANGRAU 20 (39) | F: AGCAGCAACAAACTACGCTT R: GTTGCATCGATAGAGCCTGT | 279 |
| 14 | Rc ANGRAU 22 (41) | F: AAAAGGTGGGATAGAGCCTGT R: CCATCAAGAATACCCTCCCT | 400 |
| 15 | Rc ANGRAU 25 (46) | F: CTGCTTCTCTCTGCATTGTG R: CATGGGATCTTTGGTTCTTG | 276 |
| 16 | RcANGRAU 33 | F: ATTTATTGCATGTTGCAGCC R: GCAAAGTTCACCGATGAGAA | 378 |
| 17 | Rc ANGRAU 34(56) | F: AAAAGACAAAGCAGCACCAC R: GCAGCTCCGAGTTGTACG | 225 |
| 18 | Rc ANGRAU 43 (65) | F: AACGAAGCATTTTAATCCCC R: GTGAGCGAGATTATCGGAGA | 367 |
| 19 | Rc ANGRAU 46 (68) | F: TCCCATCAGTTTTTGCTGTT R: AGAATAGGATCCATCCTGCC | 259 |
| 20 | Rc ANGRAU 6 (13) | F: ATACTCTCAGTGCTGCTGCC R: AAAACGTGTAAACGGGATCA | 252 |
| 21 | GB RC 019 | F: GTATGGTACGATCTCTTTGGAC R: TACGCAGAGAAGCACTAATAGA | 277 |
| 22 | GBRC 021 | F: AAGCTCAACTTAAAAGCCTAGA R: ATTTTGGTGGTCTCTAGTTCAGTC | 211 |
| 23 | GBRC 080 | F: TTGAGGAAACAGAAGATCAAAT R: AGCACGCCAATACCTCTTGTAA | 213 |
| 24 | GB RC 105 | F: TAGATTTTTATGGATAGGTGCC R: ATGTAGACACTTGACTCACGAA | 206 |
| 25 | Rc DES 28 | F: CCCAAAAGACTAACAACAACC R: ATGCATCTGTTGGAGTTGC | 239 |
| 26 | Rc DES 45 | F: CACAAACACACATATCATGTCC R: CTCAAGTGCATCTGAAACG | 218 |
| 27 | Rc DES 46 | F: ACGAGGAGGGAGACTAAATGC R: CACTGATATACACACCACAGTGAC | 229 |
| 28 | Rc DES 100 | F: CTGGCATTGCAGATCGTATGA R: GCGCCACCACCTTGATCTT | 233 |
Results
Detection and distribution of EST–SSR motifs
From 63,852 ESTs, 20,495 (32.09 %) non-redundant unigene sequences were identified (Table 3). These unigene sequences were used for SSR mining as explained in Materials and methods. The longest sequence containing a microsatellite was 1361 bp. In total, 1105 microsatellites were identified, corresponding to 1.73 % of the 63,852 ESTs (Supplementary Table 1). Twenty-eight primer pairs were synthesized for further study, accounting for 2.53 % of the EST–SSRs identified. The numbers and percentages of the identified repeat motifs showed that mono-, di- and trinucleotide repeats were the most numerous and occurred in roughly equal proportions (Table 4).
Table 3.
Summary statistics of ESTs assembled and total number of EST–SSRs identified
| Feature | Statistics |
|---|---|
| Total number of ESTs used | 63,852 (100 %) |
| Total length of EST sequence | 45,046,231 bp (100 %) |
| Average EST length | 720 bp (0.001 %) |
| Unigene sequences identified | 20,495 (32.09 %) |
| Longest sequence containing EST–SSR | 1361 bp (0.003 %) |
| Total number of EST–SSRs identified | 1105 (1.73 %) |
| Total number of primers synthesized | 28 (2.53 %) |
Table 4.
Number and percentages of repeat motifs identified
| Mono- | Di- | Tri- | Tetra- | Penta- | Hexa- | Hepta- | Octa- | Nona- | Deca- | Total |
|---|---|---|---|---|---|---|---|---|---|---|
| 483 | 411 | 456 | 66 | 65 | 127 | 35 | 4 | 21 | 4 | 1672 |
| 28.89 % | 24.58 % | 27.27 % | 3.94 % | 3.89 % | 7.6 % | 2.09 % | 0.24 % | 1.26 % | 0.24 % | 100 % |
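The percentages in Table 4 follow directly from the motif counts. A short Python sketch of the arithmetic is given below; the function name `motif_percentages` is a hypothetical helper, while the counts are those listed in Table 4.

```python
# Repeat-motif counts as listed in Table 4.
MOTIF_COUNTS = {
    "Mono": 483, "Di": 411, "Tri": 456, "Tetra": 66, "Penta": 65,
    "Hexa": 127, "Hepta": 35, "Octa": 4, "Nona": 21, "Deca": 4,
}


def motif_percentages(counts):
    """Return the total count and each class as a percentage of that total."""
    total = sum(counts.values())
    shares = {name: round(100.0 * n / total, 2) for name, n in counts.items()}
    return total, shares


if __name__ == "__main__":
    total, shares = motif_percentages(MOTIF_COUNTS)
    print(total)
    for name, share in shares.items():
        print(name, share)
```

Rounding to two decimals gives 3.95 % for the tetra class, where the table lists 3.94 %; this may simply reflect truncation rather than rounding in the published values.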
Occurrence patterns of motifs and their loci
The numbers and percentages of repeat motifs identified are presented in Table 4. For 4-mers to 6-mers, only the frequent motifs were studied and are presented; for the other motifs, all possible arrangements were studied (Supplementary Table 2). A/T monomer repeats were found at 475 loci, of which 260 (55 %) were formed by A and 215 (45 %) by T nucleotides. C/G motifs were found at 7 loci, five (71 %) formed by C and two (29 %) by G. A/T repeats thus predominated among the monomer loci. In the present study, monomers occurred at 29.17 % of the 1672 identified loci. Among dimers, AG/CT motifs were the most abundant, with 60 and 92 loci respectively. Among trimers, AAG/CTT motifs were the most frequent, with 25 and 33 loci respectively. Among 4-mers, many different arrangements were found. The AAGA/TCTT motif was present at 10 loci, the 5-mer AGAAA/TTTCT at 7 loci and the 6-mer AGAAGG/CCTTCT at 1 locus. The remaining repeats were widely distributed, with low percentages for each arrangement. For 7-mer, 8-mer, 9-mer and 10-mer repeats, the total occurrences were 35, 4, 21 and 4 respectively.
Distribution of amino acids, number of loci and total repeats
Another aim of this investigation was to study the abundance of amino acids predicted from EST–SSRs. The distribution of amino acids is presented in Supplementary Table 3 and Fig. 1. Among these, Leu was the most abundant (92 % loci and 663 repeats), followed by Ser (91 % of loci and 589 repeats); Phe, Ile, Gln, Lys, Pro, Glu, Arg, Ala and His (50–36 % of loci and 372–212 repeats); and Thr, Term (stop), Asn, Gly, Cys and Asp (34–21 % of loci and 196–119 repeats). Met, Trp, Val and Tyr had the lowest percentage of loci (12–6) and repeats (92–30).
Fig. 1.
Distribution of amino acids: total repeats and number of loci identified
Genetic diversity
Castor is cultivated for its seed oil in tropical and subtropical regions. The castor genotypes were tested with 28 primer sets. Of these, 12 primer sets showed polymorphism, with PIC values ranging from 0.28 to 0.49 and an average of 0.339; the remaining primer sets did not show any polymorphism. The 12 polymorphic EST–SSRs detected 28 alleles, an average of 2.3 alleles per locus. Fragments ranged from 150 to 400 bp. Cluster analysis was performed to generate a dendrogram of the 27 castor genotypes (Fig. 2). Cluster I included 26 accessions (96.30 %) while cluster II accommodated a single accession (3.70 %), with a similarity coefficient of 0.83. M-574, a single genotype, formed a separate operational taxonomic unit (OTU) with a low similarity coefficient of 0.24. M-574 is a pistillate (female) line generated through mutation breeding of VP-1. VP-1 is a widely used female parent in castor hybridization programmes and is the female parent of several commercially released hybrids. However, it is susceptible to wilt. VP-1 was subjected to gamma-ray mutagenesis to develop a wilt-tolerant or wilt-resistant line, and M-574 is one of the wilt-resistant derivatives of this mutation breeding.
Fig. 2.
UPGMA dendrogram of genetic relationships among 27 castor accessions based on analysis with 28 EST–SSR markers
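The clustering step behind a dendrogram of this kind can be sketched in plain Python. The study used NTSYS-pc; the code below is only an illustrative stand-in. It assumes the Jaccard coefficient as the similarity measure (the text does not name the coefficient), and the function names `jaccard` and `upgma` as well as the toy band profiles are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity between two equal-length 0/1 band profiles."""
    shared = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return shared / either if either else 1.0


def upgma(profiles):
    """Cluster named 0/1 profiles; return merges as (cluster_a, cluster_b, similarity)."""
    # Pairwise similarity between the original accessions.
    sim = {
        frozenset(pair): jaccard(profiles[pair[0]], profiles[pair[1]])
        for pair in combinations(profiles, 2)
    }

    def average_similarity(c1, c2):
        # UPGMA: arithmetic mean over all member pairs of the two clusters.
        pairs = [(m1, m2) for m1 in c1 for m2 in c2]
        return sum(sim[frozenset(p)] for p in pairs) / len(pairs)

    active = [(name,) for name in profiles]
    merges = []
    while len(active) > 1:
        # Join the two clusters with the highest average similarity.
        best = max(combinations(active, 2), key=lambda pair: average_similarity(*pair))
        merges.append((best[0], best[1], round(average_similarity(*best), 3)))
        active = [c for c in active if c not in best] + [best[0] + best[1]]
    return merges


if __name__ == "__main__":
    # Toy band profiles (assumed values, not study data).
    toy = {"A": [1, 1, 0, 1], "B": [1, 1, 0, 0], "C": [0, 0, 1, 1]}
    for merge in upgma(toy):
        print(merge)
```

Each merge records the similarity level at which two groups join, which corresponds to the height of a node on the dendrogram's similarity axis.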
Discussion
Frequency and distribution of EST–SSRs
Owing to the availability of large-scale genomic information for the majority of oilseed crops, in silico studies have become an important area of interest for identifying novel useful genes (Dudhe et al. 2012). Genomic resources available for castor have been successfully used to develop SSR markers, but the number of markers available for castor improvement is still limited. It is well known that SSRs can be mined from genic and non-genic regions. Earlier, in castor, Sharma and Chauhan (2011) mined 73 % of SSRs from genic regions and 27 % from non-genic regions. SSRs were dense in the putative genic regions: 5′UTRs (26 %), introns (25 %), 3′UTRs (16 %) and exons (6 %). SSRs were present at very high frequency in castor (one SSR per 680 bp to 1.7 kb, or per 2.45 ESTs, or in 67 % of sequenced clones), possibly owing to its small genome size, when compared with Jatropha, Hevea and cassava (2.2–7 kb), which belong to the same family as castor, and even with rice (6 kb), Arabidopsis (6 kb) and plant genomes in general (6.4 kb) (Table 5). The high density of SSRs (excluding mononucleotide repeats) in castor genic sequences makes them liable to polymorphism (Qiu et al. 2010). Qiu et al. (2010) identified 7732 SSR sites from 5122 unigene ESTs out of a total of 18,928 ESTs (17.03 Mb). In our study, we mined 1672 SSR sites from 20,495 unigene ESTs out of a total of 63,852 ESTs (54.4 Mb). We downloaded nearly three times as many EST sequences and identified four times as many unigenes as Qiu et al. (2010), which is the main advance of this study; hence, the chances of finding novel EST–SSRs are greater in the present study.
Table 5.
Frequency of SSRs in the genome of various crop plants
| Crop | Frequency of SSRs in non redundant sequences | Reference |
|---|---|---|
| Castor | 1.7 kb or one SSR every 2.45 EST | Qiu et al. 2010 |
| Castor | 66.7 % of sequenced clones | Seo et al. 2011 |
| Castor | 1.2 kb | Pranavi et al. 2011 |
| Castor | 680 bp | Sharma and Chauhan 2011 |
| Jatropha | 5.95 kb | Yadav et al. 2011 |
| Hevea | 2.2 kb | Feng et al. 2009 |
| Cassava | 7.0 kb | Raji et al. 2009 |
| Rice | 6 kb | Varshney et al. 2002 |
| Arabidopsis | 6 kb | Cardle et al. 2000 |
| Plant genome | 6.4 kb | Gupta and Varshney 2000 |
Sharma and Chauhan (2011) stated that the castor genome shows considerable repeat/motif variation. SSRs in UTR regions comprised 80.2 % MNRs, 70.3 % DNRs, 83.2 % TTNRs and 77.1 % pentanucleotide repeats (PNRs), whereas exons contained 76.1 % TNRs and 73.2 % hexanucleotide repeats (HNRs). Among the MNRs, A/T accounted for 98.54 % in our study, indicating the predominance of these repeats (Supplementary Table 2), whereas Qiu et al. (2010) reported 96.9 %. Hence, both studies indicate that A/T is the most abundant MNR in the castor genome.
DNRs varied between 24.8 and 80.4 % in the castor genome (Table 6). AG/GA and AG/CT were the dominant motifs. DNRs were present in non-genic regions, introns, 5′UTRs and 3′UTRs. Earlier, Seo et al. (2011) observed dinucleotide repeat numbers ranging from 4 to 52 (mean 12.4). Abundance of AG/CT and AC/GT is common in plant genomes, and a higher frequency of DNR motifs is characteristic of plant genomes (Wang et al. 1994). Earlier reports show that DNR SSRs from 5′UTR regions with a repeat length of 16–30 bp exhibited higher polymorphism (Sharma and Chauhan 2011). The high frequency of GA/CT motifs has been attributed to the high occurrence of the amino acids translated from these motifs (Kantety et al. 2002). Slipped-strand mispairing during replication and mutation cause the variation in frequency and abundance of different repeat motifs and in SSR length (Levinson and Gutman 1987). In the present study, AG/CT (36.9 %) was followed by GA/TC (30.9 %), and CA/TG and AC/GT were the least frequent (Supplementary Table 2). This result confirms the finding that the AG/CT repeat motif predominates, with a frequency about twice that of AC/GT (Seo et al. 2011). The abundance of AG/CT regions may be explained by their being preferred regions for amplification. On the other hand, DNA fragments targeted by primers with AT microsatellite repeats were reported not to amplify (Vasconcelos et al. 2012).
Table 6.
Distribution and abundance of repeat motif in Castor genome expressed in percent
| DNR motif | DNR % in genome | DNR % | TNR motif | TNR % in genome | TNR % | TTNR motif | TTNR % in genome | TTNR % | Dominant repeat | Reference |
|---|---|---|---|---|---|---|---|---|---|---|
| AT | 51 | 43 | TCT | 29 | – | – | 12 | – | DNR | Sharma and Chauhan (2011) |
| AG/GA | 62.4 | 68.9 | AAG/AGA | 33.5 | 25.9 | – | 2.2 | – | DNR | Pranavi et al. (2011) |
| AG/CT | 80.4 | 51.7 | CTT/AAG | 17 | 5.2 | AAAT/AGAT | 2.6 | 0.7 | TNR | Seo et al. (2011) |
| AG/CT | 25.1 | 69.6 | AAG/CTT | 47.8 | 23.7 | AAAG/CTTT | 2.84 | 40.9 | TNR | Qiu et al. (2010) |
| AG/CT | 24.8 | 36.9 | AAG/CTT | 27.60 | 12.7 | AAGA/TCTT | 2.9 | 20.8 | TNR | Present study |
Overall, TNRs varied between 17 and 47.8 % in the castor genome. TNR repeat numbers ranged from 4 to 56 with a mean of 7.35. Trinucleotide repeats were dominant (5.2–25.9 %), followed by dinucleotide repeat motifs. Among TNRs, AAG/CTT was the most dominant motif, followed by AAG/AGA (Table 6). In cereals also, TNRs were recorded as the most frequent, followed by DNRs and TTNRs, when MNRs were excluded (Varshney et al. 2005). CCG/CGG was relatively rare in dicotyledonous plants (Morgante et al. 2002). Trinucleotide motifs were harboured in exons. Di- and tetranucleotide SSRs present in UTR regions gave higher polymorphism than trinucleotide motifs, which are abundant in exonic regions. In earlier reports, the proportions of polymorphic di-, tri- and tetranucleotide loci were 54.8, 28.8 and 47 % respectively, whereas TTNRs and PNRs were randomly distributed (Sharma and Chauhan 2011). Higher-order repeat motifs (more than three bases) and compound repeats were found to be less polymorphic than lower-order repeats (Feng et al. 2009). Such amplification and polymorphism were associated with primers for SSRs containing more than 15 repeat units (Sharma and Chauhan 2011). The present investigation showed MNRs, DNRs and TNRs in nearly equal proportions (29.7, 24.8 and 27.6 %), followed by other repeats (Table 4).
TNRs are understood to encode runs of particular amino acids. The most frequent amino acid runs reported in castor SSRs were 16.5 % serine (TCT)n, followed by 13.6 % glutamate (GAA)n, 12.3 % arginine (CGC)n and 9.7 % phenylalanine (TTC)n (Sharma and Chauhan 2011). In plant ESTs, the abundance of TNRs among SSRs may be attributed to selection against frameshift mutations within transcribed genes, which suppresses non-trinucleotide SSRs in coding regions (Metzgar et al. 2000). Among TNRs, codon repeats corresponding to small hydrophilic amino acids are more easily tolerated than those for hydrophobic and basic amino acids (Katti et al. 2001). Four thousand nine hundred and seventy-eight (4978) codons were identified in the castor EST–SSR sequence data. The most frequent codons were those of leucine with 663 repeats (13.31 %, CTT), followed by serine with 589 repeats (11.83 %, TCT), while tyrosine had the fewest with 6 repeats (0.12 %, TAT) (Supplementary Table 3). A few other essential amino acids present at more than 5 % were phenylalanine (7.47 %, TCT), isoleucine (7.03 %, ATT) and lysine (5.68 %, AAA). The number of loci identified was highest for leucine (92 loci) and lowest for tyrosine (6 loci) (Fig. 1).
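How a trinucleotide repeat translates into an amino acid run depends on the reading frame. The minimal Python sketch below uses the standard genetic code to show that the same (CTT)n tract yields a leucine, phenylalanine or serine run in the three forward frames. The helper name `translate` is hypothetical, and the sketch is not the procedure used to build Supplementary Table 3.

```python
from itertools import product

# Standard genetic code, with codons ordered by the bases T, C, A, G ('*' marks a stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(sequence, frame=0):
    """Translate a DNA sequence in the given forward frame (0, 1 or 2)."""
    sequence = sequence.upper()
    codons = (sequence[i:i + 3] for i in range(frame, len(sequence) - 2, 3))
    return "".join(CODON_TABLE[codon] for codon in codons)


if __name__ == "__main__":
    tract = "CTT" * 5  # toy (CTT)5 repeat, not a sequence from the study
    for frame in range(3):
        print(frame, translate(tract, frame))
```

The three frames read the tract as CTT (Leu), TTC (Phe) and TCT (Ser), which is one reason these three amino acids tend to appear together among the most frequent predicted runs.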
The genome size of castor has been estimated at 323 Mb (Arumuganathan and Earle 1991). In the present study, we mined 54.4 Mb of EST data to identify 1105 EST–SSRs. The high frequency and distribution of EST–SSRs in this study may be attributed to the small genome size and to the criteria used for the SSR search in SSR Locator. Variation in the frequency of SSRs in the ESTs of a particular species has been attributed to the criteria adopted for the SSR search in database mining approaches and to the quantity of data used to identify SSRs (Varshney et al. 2002; Yan et al. 2008).
Polymorphism of EST–SSRs
Twelve of the randomly chosen EST–SSRs were identified as polymorphic with products of the expected size, which amounts to 42.85 % of the primer pairs synthesised. The high rate of polymorphism may be due to the compound nature of the SSRs. Twenty-eight alleles were detected, with an average of 2.3 alleles per locus, with TNRs as the dominant motif. In previous studies, SSRs with DNRs showed a higher allele number of 2.7 per locus, followed by TNRs with 2.3 per locus (Sharma and Chauhan 2011). Qiu et al. (2010) recorded 350 alleles from 118 loci, i.e. 223 from dinucleotide loci and 107 from trinucleotide loci, an average of 2.97 alleles per locus. The proportions of di-, tri- and tetranucleotide motif loci were 54.8, 28.8 and 47 % respectively. The proportion of polymorphic primers was 41.1 % when primers with null alleles and those harbouring introns were excluded, resulting in 2.97 alleles per marker. This revealed that the polymorphic ratio of EST–SSR primers is at a medium level in castor (Qiu et al. 2010).
PIC values for the tested primers varied from a low of 0.27 (Rc DES 596) to a high of 0.49 (Rc DES 592), with an average of 0.332. Allan et al. (2008) reported 9 genomic SSR markers with a gene diversity (PIC) of 0.403 and an average of 3.01 alleles per locus. Bajay et al. (2009) developed 12 genomic SSR markers with an average gene diversity (He) of 0.416 and 3.3 alleles per locus. Qiu et al. (2010) reported 118 polymorphic SSR loci with a PIC of 0.36 and 2.97 alleles per locus, which suggested that the EST–SSR markers developed had a moderate level of polymorphism. Seo et al. (2011) developed 28 polymorphic SSR loci with a PIC value of 0.26 and an average of 3.18 alleles per locus. Gajera et al. (2010) reported 5 polymorphic ISSRs with an average PIC value of 0.88. Pecina-Quintero et al. (2013) used the proven SSRs of Bajay et al. (2009) and obtained PIC values ranging from 0.5 to 0.812 and 5.5 alleles per marker. Data indicated that genetic variation was mostly due to variation within populations, with 10.5 % attributable to variation among populations and 0.5 % to variation among regions. PIC values can range from 0 to 1: a marker with a PIC of 0 has only one allele, whereas a PIC of 1 would require an infinite number of alleles, which is a rare phenomenon. A PIC above 0.7 is considered highly informative (Hildebrand et al. 1992). PIC values are an important criterion for assessing diversity or the level of polymorphism. In the present study on castor, PIC values ranged between 0.27 and 0.49, indicating a moderate level of polymorphism or gene diversity in the castor genome, which may help future researchers select informative markers.
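As a worked illustration of these bounds, assume that PIC is computed per locus as one minus the sum of squared allele frequencies (the form commonly attributed to Anderson et al. 1993; the exact expression applied in the study is not stated in the text). For a locus with n equally frequent alleles:

```latex
\mathrm{PIC} = 1 - \sum_{i=1}^{n} p_i^{2}, \qquad
p_i = \frac{1}{n} \;\Rightarrow\;
\mathrm{PIC}_{\max}(n) = 1 - n \cdot \frac{1}{n^{2}} = 1 - \frac{1}{n}
```

Under this assumption, a locus with two alleles cannot exceed a PIC of 0.5 and a locus with three alleles cannot exceed about 0.67. Loci with 2–3 alleles therefore necessarily fall below the 0.7 threshold, and a value of 1 is approached only as n grows without bound.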
UPGMA analysis grouped the 27 accessions into two main clusters. The similarity coefficient between accessions ranged from 0.24 to 0.83 (Fig. 2). Cluster I included 26 accessions (96.30 %) while cluster II accommodated a single accession (3.70 %), with a similarity coefficient of 0.83. M-574, a single genotype, formed a separate operational taxonomic unit (OTU) with a low similarity coefficient of 0.24. M-574 is a wilt-resistant pistillate line with condensed internodes and cup-shaped leaves, developed from VP-1 through mutation breeding. The castor genotypes used in the study varied in their morphological characters (bloom, stem colour and capsule spininess). Despite this morphological variation, comparable variability was not observed at the molecular level with DNA markers, as reflected in the low PIC (gene diversity) and the clustering of genotypes into few groups. Other studies also confirm that Ricinus does not show a specific genetic pattern or markers that allow differentiation based on geographical origin, and DNA fragments do not appear to be associated with population differentiation (Pecina-Quintero et al. 2013). The first report of low polymorphism indicated a narrow genetic base (Allan et al. 2008). Milani et al. (2009) studied diversity structure based on phenotypic traits and RAPD markers and reported that accessions from different countries lacked geographically structured clusters. An earlier diversity study conducted with 200 RAPD and 21 ISSR primers revealed low variability (Gajera et al. 2010). A study with 232 high-quality single nucleotide polymorphisms showed greater molecular variance within populations than among populations, confirming a very weak geographic structure among castor populations and further confirming previous findings obtained with dominant RAPD markers (Foster et al. 2010).
Finally, we conclude that castor EST databases can be systematically searched for EST–SSR markers using the SSR Locator tool. The PIC values of the newly identified and synthesized EST–SSR markers indicate a moderate level of polymorphism or gene diversity in the castor genome. The results of the study show that EST–SSR markers have the potential to assess genetic diversity and can be used for trait–marker relationship studies. With knowledge of the genetic diversity of castor genotypes, plant breeders can design well-organized crop improvement programmes.
Acknowledgments
The authors sincerely acknowledge the financial support provided to the corresponding author and PI of the project by the University Grants Commission (UGC), New Delhi, under Major Research Project Grant No. 40-40/2011 (SR) to carry out the present work.
Abbreviations
- EST: Expressed sequence tag
- SSR: Simple sequence repeat
- PIC: Polymorphic information content
- MNR: Mono nucleotide repeats
- DNR: Di nucleotide repeats
- TNR: Tri nucleotide repeats
- TTNR: Tetra nucleotide repeats
- NCBI: National Center for Biotechnology Information
- UPGMA: Unweighted pair group method with arithmetic mean
- UTR: Untranslated region
References
- Allan G, Williams A, Rabinowicz PD, Chan AP, Ravel J, Keim P. Worldwide genotyping of castor bean germplasm (Ricinus communis L.) using AFLPs and SSRs. Genet Resour Crop Evol. 2008;55:365–378. doi: 10.1007/s10722-007-9244-3.
- Anderson JA, Churchill GA, Autrique JE, Tanksley SD, Sorrells ME. Optimizing parental selection for genetic linkage maps. Genome. 1993;36:181–186. doi: 10.1139/g93-024.
- Anjani K. Castor genetic resources: a primary gene pool for exploitation. Ind Crops Prod. 2012;35:1–14. doi: 10.1016/j.indcrop.2011.06.011.
- Arumuganathan K, Earle E. Nuclear DNA content of some important plant species. Plant Mol Biol Rep. 1991;9:208–218. doi: 10.1007/BF02672069.
- Bajay MM, Pinheiro JB, Batista CEA, Nobrega MBM, Zucchi MI. Development and characterization of microsatellite markers for castor (Ricinus communis L.), an important oleaginous species for biodiesel production. Conserv Genet Resour. 2009;1:237–239. doi: 10.1007/s12686-009-9058-z.
- Cardle L, Ramsay L, Milbourne D, Macaulay M, Marshall D, Waugh R. Computational and experimental characterization of physically clustered simple sequence repeats in plants. Genetics. 2000;156(2):847–854. doi: 10.1093/genetics/156.2.847.
- da Maia LC, Palmieri DA, de Souza VQ, Kopp MM, de Carvalho FIF, de Oliveira AC. SSR locator: tool for simple sequence repeat discovery integrated with primer design and PCR simulation. Int J Plant Genom. 2008. doi: 10.1155/2008/412696.
- Doyle JJ, Doyle JL. Isolation of plant DNA from fresh tissue. Focus. 1990;12:13–15.
- Dudhe MY, Sarada C. Comparative assessment of microsatellite identification tools available in public domain. DOR News Lett. 2012;18(2 & 3):8–9.
- Dudhe MY, Meena HP, Ranganatha ARG, Mukta N, Lavanya C. In silico-identification of conserved domains from EST database in safflower. J Oilseeds Res. 2012;29(Special issue):178–181.
- Dudhe MY, Sudhakarbabu O, Sudhakar C. Development of intron-flanking EST-specific primers from drought responsive expressed sequence tags (ESTs) in safflower. SABRAO J Breed Genet. 2014;46(1):56–66.
- Feng SP, Li WG, Huang HS, Wang JY, Wu TY. Development, characterization and cross-species/genera transferability of EST–SSR markers for rubber tree (Hevea brasiliensis). Mol Breed. 2009;23:83–97. doi: 10.1007/s11032-008-9216-0.
- Foster JT, Allan GJ, Chan AP, Rabinowicz PD, Ravel J, Jackson P, Keim P. Single nucleotide polymorphisms for assessing genetic diversity in castor bean (Ricinus communis). BMC Plant Biol. 2010. doi: 10.1186/1471-2229-10-13.
- Gajera BB, Kumar N, Singh AS, Punvar BS, Ravikiran R, Subhash N, Jadeja GC. Assessment of genetic diversity in castor (Ricinus communis L.) using RAPD and ISSR markers. Ind Crops Prod. 2010;32:491–498. doi: 10.1016/j.indcrop.2010.06.021.
- Glasel JA. Validity of nucleic acid purities monitored by 260 nm/280 nm absorbance ratios. Biotechniques. 1995;18:62–63.
- Gouri Shankar V, Venkata Ramana Rao P, Bindu Priya P, Nagesh Kumar MV, Ramanjaneyulu AV, Vishnuvardhan Reddy A. Genetic purity assessment of castor hybrids using EST–SSR markers. SABRAO J Breed Genet. 2013;45(3):504–509.
- Gupta PK, Varshney RK. The development and use of microsatellite markers for genetic analysis and plant breeding with emphasis on bread wheat. Euphytica. 2000;113:163–185. doi: 10.1023/A:1003910819967.
- Hildebrand CE, Torney DC, Wagner RP. Informativeness of polymorphic DNA markers. New Mexico: Los Alamos Science Inc; 1992. pp. 100–102.
- Kantety RV, La Rota M, Matthews DE, Sorrells ME. Data mining for simple sequence repeats in expressed sequence tags from barley, maize, rice, sorghum and wheat. Plant Mol Biol. 2002;48:501–510. doi: 10.1023/A:1014875206165.
- Katti MV, et al. Differential distribution of simple sequence repeats in eukaryotic genome sequences. Mol Biol Evol. 2001;18:1161–1167. doi: 10.1093/oxfordjournals.molbev.a003903.
- Levinson G, Gutman GA. Slipped-strand mispairing: a major mechanism for DNA sequence evolution. Mol Biol Evol. 1987;4:203–221. doi: 10.1093/oxfordjournals.molbev.a040442.
- Metzgar D, Bytof J, Wills C. Selection against frame shift mutations limits microsatellite expansion in coding DNA. Genome Res. 2000;10:72–80.
- Milani M, Dantas FV, Martins WFS. Genetic divergence among castor bean genotypes by morphologic and molecular characters. Revista Brasileira de Oleaginosas e Fibrosas. 2009;3:61–71.
- Morgante M, Hanafey M, Powell W. Microsatellites are preferentially associated with nonrepetitive DNA in plant genomes. Nat Genet. 2002;30:194–200. doi: 10.1038/ng822.
- Nagesh Kumar MV, Gouri Shankar V, Ramya V, Bindu Priya P, Ramanjaneyulu AV, Seshu G, Vishnu Vardhan Reddy D. Enhancing castor (Ricinus communis L.) productivity through genetic improvement for Fusarium wilt resistance – a review. Ind Crops Prod. 2015;67:330–335. doi: 10.1016/j.indcrop.2015.01.039.
- Ogunniyi DS. Castor oil: a vital industrial raw material. Bioresource Tech. 2006;97:1086–1091. doi: 10.1016/j.biortech.2005.03.028.
- Pecina-Quintero V, Anaya-López JL, Núñez-Colín CA, Zamarripa-Colmenero A, Montes-García N, Solís-Bonilla JL, Aguilar-Rangel MR. Assessing the genetic diversity of castor bean from Chiapas, México using SSR and AFLP markers. Ind Crops Prod. 2013;41:134–143. doi: 10.1016/j.indcrop.2012.04.033.
- Pranavi B, Sitaram G, Yamini KN, Dinesh Kumar V. Development of EST–SSR markers in castor bean (Ricinus communis L.) and their utilization for genetic purity testing of hybrids. Genome. 2011;54:684–691. doi: 10.1139/g11-033.
- Qiu L, Yang C, Tian B, Yang JB, Liu A. Exploiting EST databases for the development and characterization of EST–SSR markers in castor bean (Ricinus communis L.). BMC Plant Biol. 2010;10:1–10. doi: 10.1186/1471-2229-10-278.
- Raji AA, Anderson JV, Kolade OA, Ugwu CD, Dixon AG, Ingelbrecht IL. Gene-based microsatellites for cassava (Manihot esculenta Crantz): prevalence, polymorphisms, and cross-taxa utility. BMC Plant Biol. 2009;9:118. doi: 10.1186/1471-2229-9-118.
- Seo K-I, Lee G-A, Ma K-H, Hyun D-Y, Park Y-J, Jung J-W, Lee S-Y, Gwag J-G, Kim C-K, Lee M-C. Isolation and characterization of 28 polymorphic SSR loci from castor bean (Ricinus communis L.). J Crop Sci Biotechnol. 2011;14:97–103. doi: 10.1007/s12892-010-0107-7.
- Sharma A, Chauhan RS. Repertoire of SSRs in the castor bean genome and their utilization in genetic diversity analysis in Jatropha curcas. Comp Funct Genom. 2011;2011:1–9, Article ID 286089. doi: 10.1155/2011/286089.
- Sneath PHA, Sokal RR. Numerical Taxonomy. San Francisco: Freeman Press; 1973.
- Varshney RK, Thiel T, Stein N, Langridge P, Graner A. In silico analysis on frequency and distribution of microsatellites in ESTs of some cereal species. Cell Mol Biol Lett. 2002;7:537–546.
- Varshney RK, Graner A, Sorrells ME. Genic microsatellite markers in plants: features and applications. Trends Biotechnol. 2005;23:48–55. doi: 10.1016/j.tibtech.2004.11.005.
- Vasconcelos S, Onofre AV, Milani M. Molecular markers to access genetic diversity of castor bean. In: Abdurakhmonov I, editor. Current status and prospects for breeding purposes. Rijeka: InTech; 2012. pp. 201–217.
- Wang Z, Weber JL, Zhong G, Tanksley SD. Survey of plant short tandem DNA repeats. Theor Appl Genet. 1994;88:1–6. doi: 10.1007/BF00222386.
- Wen M, Wang H, Xia Z, Zou M, Lu C, Wang W. Development of EST–SSR and genomic-SSR markers to assess genetic diversity in Jatropha curcas L. BMC Res Notes. 2010;3:42. doi: 10.1186/1756-0500-3-42.
- Yadav HK, Ranjan A, Asif MH, Mantri S, Sawant SV, Tuli R. EST-derived SSR markers in Jatropha curcas L.: development, characterization, polymorphism, and transferability across the species/genera. Tree Genet Genom. 2011;7:207–219. doi: 10.1007/s11295-010-0326-6.
- Yan QL, Zhang YH, Li HB, Wei CH, Niu LL, Guan S, Li SG, Du LX. Identification of microsatellites in cattle unigenes. J Genet Genom. 2008;35:261–266. doi: 10.1016/S1673-8527(08)60037-5.