Abstract
Background
The isolation of microsatellite markers remains laborious and expensive. For some taxa, such as Lepidoptera, development of microsatellite markers has been particularly difficult, as many markers appear to be located in repetitive DNA and have nearly identical flanking regions. We attempted to circumvent this problem by bioinformatic mining of microsatellite sequences from a de novo-sequenced transcriptome of a butterfly (Euphydryas editha).
Principal Findings
By searching the assembled sequence data for perfect microsatellite repeats we found 10 polymorphic loci. Although, like many expressed sequence tag-derived microsatellites, our markers show strong deviations from Hardy-Weinberg equilibrium in many populations, and, in some cases, a high incidence of null alleles, we show that they nonetheless provide measures of population differentiation consistent with those obtained by amplified fragment length polymorphism analysis. Estimates of pairwise population differentiation between 23 populations were concordant between microsatellite-derived data and AFLP analysis of the same samples (r = 0.71, p<0.00001, 425 individuals from 23 populations).
Significance
De novo transcriptional sequencing appears to be a rapid and cost-effective tool for developing microsatellite markers for difficult genomes.
Introduction
Many types of genetic analysis take advantage of microsatellite markers, which are highly polymorphic loci of simple sequence repeats located throughout the genome. For example, microsatellite analysis is useful in studies of paternity, population structure and history, and in conservation decisions for the management of endangered species [1], [2].
Given the broad-scale utility of these markers, a large number of approaches have been developed for their isolation from genomic DNA [3]. These approaches typically involve some form of microsatellite enrichment, followed by time-consuming and costly brute-force sequencing. Aside from the labor and cost associated with traditional approaches, the microsatellite enrichment step sometimes fails. For example, for reasons not fully understood, isolation of microsatellites from lepidopteran genomes is extremely difficult [4]–[6]. This problem is not confined to Lepidoptera; it also affects bivalve mollusks [7], mosquitoes [8], mites [9], ticks [8], nematodes [10], [11] and birds [12], [13].
The increase in publicly available EST data for many species has made bioinformatic isolation of microsatellite markers increasingly commonplace (e.g., [14]–[17]). However, microsatellites isolated from EST libraries differ from those typically found in regions of the genome unassociated with genes. Gene-associated microsatellites are physically linked to particular alleles of a gene, and may hitchhike if the gene is under selection. Microsatellite variation in untranslated regions of transcribed DNA may affect the rates of gene expression or translation, and thus may be under selection. Indeed, EST-derived microsatellites almost universally show strong deviations from Hardy-Weinberg equilibrium. However, the relatively few studies that compare the performance of EST-derived microsatellites with that of other genotyping techniques have generally found comparable results [18]–[22]. Here we used the Roche 454 Titanium platform for transcriptional sequencing of Edith's checkerspot butterfly (Euphydryas editha), in order to rapidly isolate polymorphic microsatellite loci for a conservation genetics study. We then compared the estimates of population differentiation and biogeographic structure obtained by this approach with those from AFLP genotyping of the same set of populations [23].
Materials and Methods
Microsatellite identification
RNA was extracted from a larva, a pupa and an adult E. editha. RNA extraction, normalized library preparation, sequencing and assembly with the Roche Newbler assembler were performed by the University of Illinois W.M. Keck Center for Comparative and Functional Genomics using protocols and reagents supplied by Roche. The assembled data were then queried for microsatellites with a simple Python script that searched for all possible di-, tri- and tetra-nucleotide motifs occurring as at least eight perfect repeats. Primers for microsatellite-containing sequences were designed using Primer3 [24] and tested for amplification and polymorphism.
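The original script is not reproduced here, so the following is only a minimal sketch of how such a search could look. It assumes a single regular expression with a back-reference in place of an explicit enumeration of every motif, and the function names and the toy contig are hypothetical.

```python
import re

# Any 2-4 base unit followed by at least 7 further copies of itself,
# i.e. eight or more perfect tandem repeats (threshold taken from the text).
REPEAT_RE = re.compile(r"([ACGT]{2,4}?)\1{7,}")


def is_compound_unit(unit):
    """True if the unit is itself a repeat of a shorter unit (e.g. 'AA', 'ACAC')."""
    return any(
        len(unit) % k == 0 and unit == unit[:k] * (len(unit) // k)
        for k in range(1, len(unit))
    )


def find_microsatellites(sequence):
    """Return (start, motif, repeat_count) for each perfect repeat run."""
    hits = []
    for match in REPEAT_RE.finditer(sequence.upper()):
        motif = match.group(1)
        if is_compound_unit(motif):
            continue  # skip homopolymer runs such as AAAA...
        hits.append((match.start(), motif, len(match.group(0)) // len(motif)))
    return hits


# Hypothetical contig: a (CAG)9 run, a too-short (AC)5 run and a (TACA)8 run.
contig = "GGTC" + "CAG" * 9 + "TTGA" + "AC" * 5 + "GATT" + "TACA" * 8 + "CC"
for start, motif, count in find_microsatellites(contig):
    print(start, motif, count)
```

Because the matches are non-overlapping, a run such as (CA)n is reported once rather than again as a shifted (AC)n-1 run, which an explicit loop over all motifs would otherwise have to deduplicate.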
Microsatellite amplification and polymorphism testing
Microsatellite loci were tested for amplification and polymorphism in 10 µl PCR mixes containing 1 ng genomic DNA, 10 mg BSA, 10 pmol primers, 6.7 nmol of ChromaTide® Rhodamine Green™-5-dUTP (Molecular Probes, presently discontinued) and 5 µl AmpliTaq Gold® PCR Master Mix (Applied Biosystems). The temperature cycling conditions were as follows: 7 min at 94°C, then 35 cycles of 10 s at 94°C, 1.5 min at 60°C and 2 min at 68°C. The reaction was terminated with a final incubation of 30 min at 72°C. One microliter of each reaction was then analyzed using an ABI3100 DNA sequencer. For genotyping, each well contained 0.1 µl LIZ-labeled GeneScan 500 size standard (Applied Biosystems) and enough deionized formamide for a total volume of 10 µl. Alleles were scored using GeneMarker.
Quality control
Deviations from Hardy-Weinberg equilibrium were assessed using GenAlEx [25]. Many individuals in the present study were previously genotyped by Wee [26] using AFLP markers. Thus, we were able to assess concordance between the two studies by comparing the Fst matrices generated by the two techniques. We computed pairwise Fst distances for 23 populations (425 individuals) using Arlequin (v.3) [27], and compared them to the Fst matrix from Wee [26] using a Mantel test with 10,000 bootstrap replicates. We also screened an additional 406 individuals from 48 more populations for polymorphism analysis (Table S1).
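The matrix comparison described above can be illustrated with a small, self-contained sketch. It is not the Arlequin implementation: it assumes the conventional Mantel procedure, in which the rows and columns of one matrix are permuted together and the Pearson correlation of the off-diagonal entries is recomputed each time. The two 5 × 5 matrices are made-up values for illustration only.

```python
import random
from math import sqrt


def upper_triangle(matrix):
    """Flatten the entries above the diagonal of a square distance matrix."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def mantel(m1, m2, permutations=10000, seed=0):
    """Return (r, one-sided p) for the correlation of two distance matrices."""
    rng = random.Random(seed)
    n = len(m1)
    flat2 = upper_triangle(m2)
    observed = pearson(upper_triangle(m1), flat2)
    order = list(range(n))
    as_extreme = 1  # the observed arrangement counts as one permutation
    for _ in range(permutations):
        rng.shuffle(order)
        # Permute rows and columns of m1 together so it stays a valid matrix.
        permuted = [[m1[order[i]][order[j]] for j in range(n)] for i in range(n)]
        if pearson(upper_triangle(permuted), flat2) >= observed:
            as_extreme += 1
    return observed, as_extreme / (permutations + 1)


# Hypothetical pairwise Fst matrices for five populations (illustration only).
msat_fst = [
    [0.00, 0.05, 0.12, 0.20, 0.31],
    [0.05, 0.00, 0.08, 0.18, 0.27],
    [0.12, 0.08, 0.00, 0.10, 0.22],
    [0.20, 0.18, 0.10, 0.00, 0.09],
    [0.31, 0.27, 0.22, 0.09, 0.00],
]
aflp_fst = [
    [0.00, 0.04, 0.10, 0.22, 0.28],
    [0.04, 0.00, 0.09, 0.15, 0.25],
    [0.10, 0.09, 0.00, 0.11, 0.19],
    [0.22, 0.15, 0.11, 0.00, 0.07],
    [0.28, 0.25, 0.19, 0.07, 0.00],
]
r, p = mantel(msat_fst, aflp_fst, permutations=999)
print(f"Mantel r = {r:.2f}, p = {p:.3f}")
```

Permuting whole rows and columns, rather than individual cells, preserves the dependence among the pairwise distances that share a population, which is why an ordinary correlation test is not used here.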
Results
After quality filtering, the 454 run generated 864,056 reads, totaling 245,064,986 bases, which were assembled into 14,244 contigs with a threshold of 200 bp overlap and 95% identity. 49,937 singleton reads remained unassembled and were not included in the subsequent analysis, although they could be used for microsatellite mining if needed. The assembled contigs contained 92 microsatellite loci, 72 of which were selected for microsatellite development. Of these, 36 loci amplified successfully and appeared polymorphic (see Table S2). Following initial screening of eight individuals, we developed four multiplex PCR cocktails containing a total of 10 polymorphic loci for large-scale genotyping (Table S1). Sequences for the other loci are available from the authors upon request. The reaction conditions were as above, but without fluorescently labeled dUTPs in reactions 1 and 2, and with primer concentrations as noted in Table S2. The 10 loci are deposited in GenBank under accession numbers GU997598-GU997607.
The markers show significant deviations from Hardy-Weinberg equilibrium in many of the populations (Figure 1). The difference between observed and expected heterozygosities was positively correlated with the number of failed amplifications for each locus, suggesting that null alleles may be partly responsible for this difference (rs = 0.81, n = 10, p = 0.0042). However, estimates of pairwise population differentiation were concordant between microsatellite-derived data and an earlier AFLP analysis of the same samples by Wee [26] (r = 0.71, n = 23, p<0.00001).
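As a rough illustration of the rank correlation reported above, the sketch below computes Spearman's coefficient between the heterozygote deficit and the percentage of missing genotypes, using the rounded per-locus values in Table 1. Two assumptions are made: the deficit is taken as He − Ho, and percent missing stands in for the number of failed amplifications. Because the inputs are rounded, the result need not reproduce the reported rs = 0.81 exactly.

```python
from math import sqrt

# (locus, Ho, He, percent missing), copied from Table 1.
LOCI = [
    ("euphy 2", 0.42, 0.72, 0.60),
    ("euphy 3", 0.52, 0.83, 0.84),
    ("euphy 21", 0.18, 0.24, 1.32),
    ("euphy 69", 0.17, 0.39, 3.59),
    ("euphy 14", 0.15, 0.68, 14.0),
    ("euphy 61", 0.44, 0.87, 12.9),
    ("euphy 35", 0.33, 0.96, 13.1),
    ("euphy 50", 0.22, 0.85, 22.5),
    ("euphy 37", 0.41, 0.80, 2.28),
    ("euphy 47", 0.44, 0.87, 5.99),
]


def average_ranks(values):
    """Rank values from 1..n, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / sqrt(sxx * syy)


# Round the deficit so that equal table values are treated as ties.
deficit = [round(he - ho, 2) for _, ho, he, _ in LOCI]
missing = [pct for _, _, _, pct in LOCI]
rs = spearman(deficit, missing)
print(f"Spearman rs = {rs:.2f} (n = {len(LOCI)})")
```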
Raw microsatellite data generated in this study have been deposited in the Dryad database (www.datadryad.org) under accession number 1540.
Discussion
Microsatellite isolation from lepidopteran genomes has been difficult, possibly because microsatellite loci appear to be rare, and may have very similar flanking regions [6], which makes the design of primers problematic. We hypothesized that microsatellite loci isolated from non-translated transcripts may be less likely to exist as duplicate copies, and thus be more amenable to marker development. This has made microsatellite isolation relatively straightforward in our case. Given the decrease in next-generation sequencing costs, transcriptional re-sequencing will be a faster and cheaper way to isolate microsatellites than traditional enrichment techniques. We were able to complete microsatellite development and screening in about three months of part-time work by a single technician. Our actual cost of library construction and sequencing, about US$15,000, is comparable to that charged by private companies for microsatellite enrichment [3]. Since then, the cost of library construction and next-generation sequencing has dropped by at least 50%, and is decreasing further.
In this and several other studies, microsatellites derived from transcribed sequence data significantly depart from Hardy-Weinberg equilibrium (Figure 1) [14]–[17]. This could be due to selection on polymorphisms in untranslated gene regions where these microsatellites typically reside, or to non-neutral dynamics of the genes to which they are physically linked. In our study, percent reaction failure explained most of the variance in the differences between observed and expected heterozygosities (Table 1). Therefore, at least in our case, Hardy-Weinberg disequilibrium may be partially due to insufficient optimization of PCR conditions and allele dropout. Whether or not higher levels of null alleles are common in EST-derived microsatellites is not clear, since these data are not routinely reported with such studies. We strongly recommend further optimization of the reaction conditions for the loci presented here, especially since the manufacture of fluorescent dUTPs used in this study has been discontinued.
Table 1. Primers used for large-scale genotyping.
PCR # | Locus | Primer sequences | Primer amount (pmol) | Label | Repeat motif | Range (bp) | Allele count | Ho | He | Percent missing
1 | euphy 2 | tgatgataacgagcgggaag / cggtaccgctacgtgactact | 0.5 | 5′ TAM | CAG | 144–191 | 20 | 0.42 | 0.72 | 0.60%
1 | euphy 3 | gctgtaatttggtaaggggttg / tacgttcagtgatggacatgc | 0.5 | 5′ HEX | ATC | 121–171 | 18 | 0.52 | 0.83 | 0.84%
1 | euphy 21 | acgcaaggtgctccacttat / ttgctacgctaacagcatcg | 0.5 | 5′ HEX | CAA | 220–239 | 9 | 0.18 | 0.24 | 1.32%
1 | euphy 69 | ctcctccgcaccaacaagta / aaacgtctacgttagaaggtatgt | 1 | 5′ FAM | GTT | 72–103 | 13 | 0.17 | 0.39 | 3.59%
2 | euphy 14 | tgactgaacacacggacgat / tccatcatgctttaagtgagga | 0.5 | 5′ TAM | TACA | 99–170 | 32 | 0.15 | 0.68 | 14.0%
2 | euphy 61 | aaagcgtgcttacattacatgg / tcccgtttaacataatctgtgg | 0.5 | 5′ TAM | AC | 186–246 | 42 | 0.44 | 0.87 | 12.9%
3 | euphy 35 | atagaaataaacatgcggccata / cagatgtacaagaggctgcctta | 10 | dUTP | TG | 267–335 | 56 | 0.33 | 0.96 | 13.1%
3 | euphy 50 | atgcgatttcatgccacata / ccatcctgacatgtgaaacg | 10 | dUTP | CA, A | 135–176 | 28 | 0.22 | 0.85 | 22.5%
4 | euphy 37 | tgcaagacttgaaatatggttatca / gtccattggaaggatcagga | 10 | dUTP | C, CA | 130–182 | 21 | 0.41 | 0.80 | 2.28%
4 | euphy 47 | cacgtgagcattccagtttg / tcggcgtaacggtttaaatg | 10 | dUTP | AT | 172–335 | 34 | 0.44 | 0.87 | 5.99%
Summary statistics are based on a survey of 835 individuals from 72 populations (Table 1). Even and odd numbered reactions were pooled and analyzed together in the same sequencer run. The percentage of missing data differed significantly among the PCR mixes, and was higher in reactions 2 and 3 (F3,6 = 15.4, p = 0.0038).
In principle, deviations from Hardy-Weinberg can create substantial biases [28], limiting the utility of such markers. The extent to which these issues may affect analysis with EST-derived microsatellites is presently unclear, but should be carefully investigated by future studies. Ideally, studies isolating microsatellites from ESTs should verify their performance by comparing results with another genotyping method, as we have done with AFLPs. Likewise, it would be useful to present an analysis of null allele presence.
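One simple way to present such a null allele analysis is sketched below. It was not part of this study: it assumes Brookfield's first estimator, r = (He − Ho)/(1 + He), which attributes the whole heterozygote deficit to null alleles and ignores null homozygotes. With the missing-data rates seen at some loci that assumption may not hold, so the values are only indicative. The function name is hypothetical, and the three loci shown use the rounded Ho and He values from Table 1.

```python
def brookfield_null_frequency(ho, he):
    """Null allele frequency under Brookfield's estimator 1.

    r = (He - Ho) / (1 + He); negative estimates (heterozygote excess)
    are truncated to zero.
    """
    return max(0.0, (he - ho) / (1.0 + he))


# (Ho, He) for three loci, copied from Table 1.
example_loci = {
    "euphy 21": (0.18, 0.24),
    "euphy 2": (0.42, 0.72),
    "euphy 35": (0.33, 0.96),
}

for locus, (ho, he) in example_loci.items():
    r = brookfield_null_frequency(ho, he)
    print(f"{locus}: estimated null allele frequency {r:.2f}")
```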
Supporting Information
Acknowledgments
We are grateful to C. L. Boggs for her assistance with library construction and sequencing contracts, and to J. Strassmann and her lab for hosting this project. We thank E. Meglécz for reviewing the manuscript.
Footnotes
Competing Interests: The authors have declared that no competing interests exist.
Funding: This work has been funded by a grant from the United States Fish and Wildlife Service. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
References
- 1. Wright J, Bentzen P. Microsatellites: genetic markers for the future. Reviews in Fish Biology and Fisheries. 1994;4:384–388.
- 2. Varshney R, Graner A, Sorrells M. Genic microsatellite markers in plants: features and applications. Trends in Biotechnology. 2005;23:48–55. doi: 10.1016/j.tibtech.2004.11.005.
- 3. Zane L, Bargelloni L, Patarnello T. Strategies for microsatellite isolation: a review. Molecular Ecology. 2002;11:1–16. doi: 10.1046/j.0962-1083.2001.01418.x.
- 4. Nève G, Meglécz E. Microsatellite frequencies in different taxa. Trends in Ecology & Evolution. 2000;15:376–377. doi: 10.1016/s0169-5347(00)01921-2.
- 5. Meglécz E, Petenian F, Danchin E, D'Acier A, Rasplus J, et al. High similarity between flanking regions of different microsatellites detected within each of two species of Lepidoptera: Parnassius apollo and Euphydryas aurinia. Molecular Ecology. 2004;13:1693–1700. doi: 10.1111/j.1365-294X.2004.02163.x.
- 6. Zhang D. Lepidopteran microsatellite DNA: redundant but promising. Trends in Ecology & Evolution. 2004;19:507–509. doi: 10.1016/j.tree.2004.07.020.
- 7. Cruz F, Pérez M, Presa P. Distribution and abundance of microsatellites in the genome of bivalves. Gene. 2005;346:241–247. doi: 10.1016/j.gene.2004.11.013.
- 8. Fagerberg AJ, Fulton RE, Black WC IV. Microsatellite loci are not abundant in all arthropod genomes: analyses in the hard tick, Ixodes scapularis and the yellow fever mosquito, Aedes aegypti. Insect Molecular Biology. 2001;10:225–236. doi: 10.1046/j.1365-2583.2001.00260.x.
- 10. Grillo V, Jackson F, Gilleard J. Characterisation of Teladorsagia circumcincta microsatellites and their development as population genetic markers. Molecular and Biochemical Parasitology. 2006;148:181–189. doi: 10.1016/j.molbiopara.2006.03.014.
- 11. Johnson P, Webster L, Adam A, Buckland R, Dawson D, et al. Abundant variation in microsatellites of the parasitic nematode Trichostrongylus tenuis and linkage to a tandem repeat. Molecular and Biochemical Parasitology. 2006;148:210–218. doi: 10.1016/j.molbiopara.2006.04.011.
- 12. Primmer C, Raudsepp T, Chowdhary B, Møller A, Ellegren H. Low frequency of microsatellites in the avian genome. Genome Research. 1997;7:471. doi: 10.1101/gr.7.5.471.
- 13. Neff B, Gross M. Microsatellite evolution in vertebrates: inference from AC dinucleotide repeats. Evolution. 2001;9:1717–1733. doi: 10.1111/j.0014-3820.2001.tb00822.x.
- 14. Qiu X, Liu S, Wang X, Meng X. Eight SSR loci from oyster Crassostrea gigas EST database and cross-species amplification in C. plicatula. Conservation Genetics. 2009;10:1013–1015.
- 15. Sharma R, Bhardwaj P, Negi R, Mohapatra T, Ahuja P. Identification, characterization and utilization of unigene derived microsatellite markers in tea (Camellia sinensis, L). BMC Plant Biology. 2009;9:53. doi: 10.1186/1471-2229-9-53.
- 16. Wang S, Zhang L, Matz M. Microsatellite characterization and marker development from public EST and WGS databases in the reef-building coral Acropora millepora (Cnidaria, Anthozoa, Scleractinia). Journal of Heredity. 2009;100:329–337. doi: 10.1093/jhered/esn100.
- 17. Yang J, Yang J, Li H, Zhao Y, Yang S. Isolation and characterization of 15 microsatellite markers from wild tea plant (Camellia taliensis) using FIASCO method. Conservation Genetics. 2009;10:1621–1623.
- 18. Coulibaly I, Gharbi K, Danzmann RG, Yao J, Rexroad CE. Characterization and comparison of microsatellites derived from repeat-enriched libraries and expressed sequence tags. Animal Genetics. 2005;36:309–315. doi: 10.1111/j.1365-2052.2005.01305.x.
- 19. Garoia F, Guarniero I, Grifoni D, Marzola S, Tinti F. Comparative analysis of AFLPs and SSRs efficiency in resolving population genetic structure of Mediterranean Solea vulgaris. Molecular Ecology. 2007;16:1377–1388. doi: 10.1111/j.1365-294X.2007.03247.x.
- 20. Scariot V, De Keyser E, Handa T, Deriek J. Comparative study of the discriminating capacity and effectiveness of AFLP, STMS and EST markers in assessing genetic relationships among evergreen azaleas. Plant Breeding. 2007;126:207–212.
- 21. Varshney R, Chabane K, Hendre P, Aggarwal R, Graner A. Comparative assessment of EST-SSR, EST-SNP and AFLP markers for evaluation of genetic diversity and conservation of genetic resources using wild, cultivated and elite barleys. Plant Science. 2007;173:638–649.
- 22. Woodhead M, Russell J, Squirrell J, Hollingsworth P, Mackenzie K, et al. Comparative analysis of population genetic structure in Athyrium distentifolium (Pteridophyta) using AFLPs and SSRs from anonymous and transcribed gene regions. Molecular Ecology. 2005;14:1681–1695. doi: 10.1111/j.1365-294X.2005.02543.x.
- 23. Vos P, Hogers R, Bleeker M, Reijans M, Van de Lee T, et al. AFLP: a new technique for DNA fingerprinting. Nucleic Acids Research. 1995;23:4407–4414. doi: 10.1093/nar/23.21.4407.
- 24. Rozen S, Skaletsky H. Primer3 on the WWW for General Users and for Biologist Programmers. Bioinformatics Methods and Protocols. 1999. pp. 365–386.
- 25. Peakall R, Smouse P. GENALEX 6: genetic analysis in Excel. Population genetic software for teaching and research. Molecular Ecology Notes. 2006;6:288–295. doi: 10.1093/bioinformatics/bts460.
- 26. Wee P-P. Effects of geographic distance, landscape features and host association on genetic differentiation of checkerspot butterflies. 2004. Ph.D. Dissertation, Austin: University of Texas.
- 27. Excoffier L, Laval G, Schneider S. Arlequin (version 3.0): an integrated software package for population genetics data analysis. Evolutionary Bioinformatics Online. 2005;1:47.
- 28. Chapuis M, Estoup A. Microsatellite null alleles and estimation of population differentiation. Molecular Biology and Evolution. 2007;24:621–631. doi: 10.1093/molbev/msl191.