Abstract
The development of genomic resources in non-model taxa is essential for understanding the genetic basis of biological diversity. Although the genomes of many Drosophila species have been sequenced, most of the phenotypic diversity in this genus remains to be explored. To facilitate the genetic analysis of interspecific and intraspecific variation, we have generated new genomic resources for seven species and subspecies in the D. ananassae species subgroup. We have generated large amounts of transcriptome sequence data for D. ercepeae, D. merina, D. bipectinata, D. malerkotliana malerkotliana, D. m. pallens, D. pseudoananassae pseudoananassae, and D. p. nigrens. De novo assembly resulted in contigs covering more than half of the predicted transcriptome and matching an average of 59% of annotated genes in the complete genome of D. ananassae. Most contigs, corresponding to an average of 49% of D. ananassae genes, contain sequence polymorphisms that can be used as genetic markers. Subsets of these markers were validated by genotyping the progeny of inter- and intraspecific crosses. The ananassae subgroup is an excellent model system for examining the molecular basis of speciation and phenotypic evolution. The new genomic resources will facilitate the genetic analysis of inter- and intraspecific differences in this lineage. Transcriptome sequencing provides a simple and cost-effective way to identify molecular markers at nearly single-gene density, and is equally applicable to any non-model taxon.
Keywords: Drosophila, ananassae, transcriptome, genotyping, linkage map
Introduction
One of the primary goals of evolutionary biology is to identify and characterize the loci that contribute to phenotypes involved in adaptation, reproduction, and survival.1,2 In many systems, the biggest obstacle to understanding the molecular basis of adaptive traits is the lack of genetic resources. For this reason, many studies have focused on model organisms such as Drosophila melanogaster and Arabidopsis thaliana. While this approach has produced important insights (see refs. 3–6), extending this analysis to other groups of organisms is essential if we are to address the full diversity of phenotypes and mechanisms of evolution.
The advent of next-generation sequencing has dramatically decreased the cost, time, and labor involved in large-scale genetic analysis, extending the range of evolutionary and ecological questions that can be addressed experimentally. It is now feasible for many labs to sequence the genomes of non-model organisms in order to generate the necessary research tools. However, the relatively short read length provided by most sequencing technologies can make the assembly of genome sequences very challenging in the absence of a reference genome. Fortunately, in many cases sequencing even a small subset of the genome is sufficient to address the questions at hand. This realization has stimulated the development of “reduced representation” approaches that simplify sequence assembly at the expense of limiting the fraction of the genome that will be sequenced. The two most popular approaches are RAD-tag sequencing, which involves sequencing only the genomic fragments flanking restriction enzyme cut sites,3,4 and transcriptome sequencing. The latter approach has been used as a cost-effective way to generate dense molecular markers in a number of non-model plant and animal species5-9 (for review see ref. 14).
Transcriptome sequencing simplifies assembly in two ways. First, it reduces sequence complexity, increasing coverage per nucleotide for a given amount of data. Deeper coverage can be particularly important for de novo assembly.10,11 When the goal is to develop molecular markers, higher coverage has the added advantage of increasing the confidence in the identified polymorphisms. Second, coding regions typically contain fewer repetitive elements and often have other properties (e.g., higher GC content) that reduce the probability of errors in sequence assembly.12,13 In some designs, transcriptome sequencing can provide information about both sequence variation and gene expression in a single experiment.14,15 For these reasons, transcriptomes offer a viable alternative to whole-genome sequencing for many evolutionary studies in non-model species.
In this report, we use transcriptome sequencing to develop genomic resources for several non-model Drosophila species. Much of what is known about the genetic basis of phenotypic evolution and speciation emerged from comparative studies of Drosophila genetics and development.16-20 Part of the reason for this success is the propensity of Drosophila to form “species complexes”—groups of closely related species that have diverged enough to show substantial phenotypic differences and reproductive isolation, but not enough to preclude hybridization entirely. Although the genomes of more and more Drosophila species are being sequenced,21-23 the vast majority of the morphological, ecological, and behavioral diversity in this genus remains unexplored.
One example is the Drosophila ananassae species subgroup, which contains 22 described species and three species complexes: ananassae, ercepeae, and bipectinata (see ref. 31 for a phylogeny). D. ananassae, a member of the ananassae species complex, has been used widely as a model in evolutionary genetics.24-26 The bipectinata complex consists of four closely related species: D. bipectinata, D. parabipectinata, D. malerkotliana, and D. pseudoananassae.17,27-29 In each of the latter two species, two allopatric subspecies have been described: D. m. malerkotliana and D. m. pallens, and D. p. pseudoananassae and D. p. nigrens. The ercepeae complex includes three species occurring on islands throughout the Indian Ocean: D. ercepeae in La Reunion, D. vallismaia in the Seychelles, and D. merina in Madagascar.30,31 Species of the bipectinata and ercepeae complexes are remarkable for their inter- and intraspecific phenotypic diversity.32,33 Although most species pairs can be hybridized, the viability and fertility of their F1 progeny vary considerably.28,30 These features make the bipectinata and ercepeae complexes an excellent model for the study of speciation and phenotypic evolution. Here, we describe new genomic resources, including transcriptome assemblies and molecular markers, which will enable evolutionary and ecological studies in this lineage.
Results
Interspecific hybridization
In the bipectinata species complex, our results largely confirm earlier observations.28 Crosses between D. m. malerkotliana and D. m. pallens were as prolific as crosses within either subspecies. Crosses between D. bipectinata and D. m. malerkotliana were successful in both directions and started producing large numbers of F1 progeny 2–3 weeks after mating. Most tested strains of D. bipectinata were inter-fertile with all tested strains of D. m. malerkotliana. We only identified a single D. bipectinata strain from New Guinea (LAE329) that did not hybridize with several D. malerkotliana strains in either direction. Crosses between D. bipectinata and D. parabipectinata were prolific in both directions, while crosses between D. parabipectinata and D. m. pallens did not succeed despite repeated attempts to cross several pairs of strains in either direction. In all interspecific crosses that produced F1 progeny, F1 females were fertile while F1 males were completely sterile. Crosses between the model strains of D. p. pseudoananassae and D. p. nigrens produced fertile F1 males and females in both directions, although both F1 and F2 progeny were few in number and weak.
In the ercepeae species complex, obtaining F1 progeny proved more difficult. Only one strain of each species was available (File S1). Crosses between D. ercepeae and D. vallismaia were attempted repeatedly but produced very few F1 progeny in either direction. F1 hybrid males were completely sterile and had no motile sperm. F1 females were also completely sterile when crossed to D. vallismaia males (no crosses to D. ercepeae were attempted). Dissection of their reproductive tracts showed that none of them had been inseminated, suggesting that no mating took place. Crosses between D. ercepeae and D. merina produced small numbers of progeny after long delays, but were considerably more successful in both directions than crosses between D. ercepeae and D. vallismaia. F1 hybrid females from D. ercepeae/D. merina crosses in both directions were weakly fertile and produced small numbers of F2 progeny when crossed to either D. ercepeae or D. merina males. Surprisingly, F1 hybrid males were also weakly fertile when crossed to D. merina females (no crosses to D. ercepeae females were attempted), and produced small numbers of F2 progeny. The hybrid nature of the F2 progeny could be easily verified by their phenotypes. Dissection of a few D. ercepeae/D. merina F1 hybrid males confirmed that they had a small proportion of motile sperm.
In the ananassae species complex, the third major complex within the ananassae subgroup,31 we have attempted crosses among D. ananassae, D. ochrogaster, D. monieri, D. phaeopleura, and D. atripex in all possible combinations and in both directions. Only one strain of each species was tested. None of these crosses were successful. In the process, we found that D. ochrogaster could produce small numbers of progeny parthenogenetically; this was confirmed using virgin females maintained without males. Our overall conclusion from the crossing experiments is that genetic analysis is possible for most species pairs in the bipectinata and ercepeae species complexes, but not in the ananassae species complex (with the exception of the closest relatives of D. ananassae—see ref.35).
Chromosome arrangements and model inbred strains
Every pair of species in the bipectinata complex differs by several fixed chromosomal inversions, and each species is also polymorphic for many inversions.34 We examined the polytene chromosomes of multiple strains of D. m. malerkotliana, D. m. pallens, and D. bipectinata and their F1 hybrids in order to identify promising model strains that would differ by as few inversions as possible. We found that D. m. malerkotliana strain 14024–0391.00 was polymorphic for several chromosome arrangements, one of which was identical to one of the several orders found in D. m. pallens strain Q121. This was the only homosequential pair of chromosome arrangements identified among five strains of D. m. malerkotliana and five strains of D. m. pallens, although other homosequential arrangements exist in these subspecies.34 D. m. malerkotliana 14024–0391.00 and D. m. pallens Q121 were inbred for several generations by single-pair, full-sib crosses. The polytene chromosomes of the progeny of each single-pair cross were examined until we identified a pair of strains that did not segregate any inversions and were homosequential with each other. These strains were named D. m. malerkotliana mal0-isoC and D. m. pallens palQ-isoA, and their chromosome order is given in Supplemental file 2.
We determined that a different chromosome order segregating in the D. m. malerkotliana strain 14024–0391.00 differed by the fewest inversions from the arrangement found in the D. bipectinata strain 14024–0381.03. Comparison to a much larger data set showed that all but one of these inversions are completely fixed between the two species.34 The remaining inversion, which occupies the distal part of the chromosome arm 2L from 18A to 28D, is polymorphic within D. bipectinata.34 However, repeated attempts to cross the only available strain that lacked this inversion (D. bipectinata LAE329) to several different strains of D. malerkotliana did not succeed. The D. bipectinata strain 14024–0381.03 did not segregate any inversions. The D. m. malerkotliana strain 14024–0391.00 was inbred by single-pair crosses as described above until we identified an inversion-free strain whose chromosome order matched that of D. bipectinata 14024–0381.03 as completely as possible. This strain was named D. m. malerkotliana mal0-sc2. The two strains differ by one inversion each on XL and XR (Muller element A), two adjacent inversions on 2L (Muller E), none on 2R (Muller D), one on 3L (Muller C), and several overlapping inversions that cover almost the entire 3R (Muller B). The chromosome orders of D. m. malerkotliana mal0-sc2 and D. bipectinata 14024–0381.03 are given in Supplemental file 2.
Strains D. m. malerkotliana mal0-isoC and mal0-sc2, D. m. pallens palQ-isoA, and D. bipectinata 14024–0381.03 were inbred for 13, 12, 12, and 18 generations, respectively, by single-pair, full-sib crosses. The resulting D. bipectinata strain was named D. bipectinata bip3-isoA. These four inbred strains were used for the subsequent crosses between D. m. malerkotliana and D. m. pallens and between D. m. malerkotliana and D. bipectinata, as well as for transcriptome sequencing and SNP identification. Note that SNPs fixed between any pair of inbred strains are not guaranteed to be fixed between the species represented by these strains.
D. p. pseudoananassae strain Q117 and D. p. nigrens strain VT04–33 were kindly provided by Dr. Muneo Matsuda. These strains are homosequential,34 and were used for genetic crosses and transcriptome sequencing without inbreeding. Their chromosome order is given in Supplemental file 2. No inversions were observed in the polytene chromosomes of F1 hybrids between D. merina and D. ercepeae. As no polytene chromosome maps exist for either species, we did not attempt to characterize their chromosome arrangements. Both parental strains were used for genetic crosses and transcriptome sequencing without inbreeding.
Confirming previous reports,34 our results show that fixed chromosomal inversions will complicate genetic analysis in any interspecific cross in the bipectinata species complex. However, unobstructed high-resolution mapping is possible in any genomic region in crosses between different subspecies (D. m. malerkotliana/D. m. pallens and D. p. pseudoananassae/D. p. nigrens) and between D. merina and D. ercepeae.
Partial transcriptome assemblies
For each model strain, we generated a partial transcriptome assembly by sequencing a normalized cDNA library prepared from whole-body, mixed-sex adult RNA. Normalization reduces coverage variation by selectively removing the more abundant transcripts. Since the efficiency of de novo assembly is strongly dependent on coverage,35 normalization increases the proportion of the total transcriptome that can be assembled from a given amount of short-read sequence data. Thus, normalized cDNA libraries offer a more cost-effective approach to identifying sequence polymorphisms than unnormalized libraries.
Illumina sequencing generated 775−5,695 Mb of sequence reads per strain (Table 1). For each strain, de novo assembly resulted in 9,731−42,348 contigs of 200 bp or longer, with an N50 of 446–1068 bp (counting only the contigs of 200 bp or longer) (Table 2, files S6–11). As expected, assembly size increased with the growing amount of sequence data and increasing read lengths as the sequencing technology developed, so that more recent libraries produced more complete assemblies. These sequence data have been submitted to the Sequence Read Archive under accession numbers SRR544884–9.
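The N50 and N90 statistics reported in Table 2 can be illustrated with a minimal sketch. It assumes the conventional definition (the contig length at which the running total of contig lengths, sorted from longest to shortest, first reaches 50% or 90% of the assembly) and applies the 200 bp cutoff described above. The function name `nx` and the toy contig lengths are illustrative and are not taken from the assembly pipeline used in this study.

```python
def nx(contig_lengths, percent=50, min_length=200):
    """Return the Nx statistic (N50 by default) for contigs >= min_length bp.

    Contigs are sorted from longest to shortest; Nx is the length of the
    contig at which the running total first reaches `percent` of the
    summed length of all retained contigs.
    """
    lengths = sorted((n for n in contig_lengths if n >= min_length), reverse=True)
    if not lengths:
        raise ValueError("no contigs pass the length filter")
    threshold = sum(lengths) * percent / 100
    running = 0
    for length in lengths:
        running += length
        if running >= threshold:
            return length


# Toy example (hypothetical contig lengths; the 150 bp contig is filtered out).
toy_contigs = [1000, 800, 600, 400, 200, 150]
print("N50:", nx(toy_contigs, percent=50))
print("N90:", nx(toy_contigs, percent=90))
```

Because short contigs are excluded before the total is computed, the cutoff itself affects the reported value, which is why the text specifies that only contigs of 200 bp or longer were counted.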
Table 1. Sequence data.
Strain | Read type1 | Cycles | Number of reads | Total sequence2 |
D.m. malerkotliana mal0-isoC | SE | 85 | 14587591 | 1239945235 |
D.m. malerkotliana mal0-sc2 | PE | 85 | 67001676 | 5695142460 |
D.m. pallens palQ121-isoA | PE | 60 | 12714517 | 775585537 |
D. bipectinata bip3-isoA | PE | 85 | 66680018 | 5667801530 |
D. merina | PE | 60 | 18447100 | 1125273100 |
D. ercepeae | PE | 60 | 17218726 | 1050342286 |
D.p. pseudoananassae Q117 | PE (indexed) | 100 | 28608906 | 2717846070 |
D.p. nigrens VT04–33 | PE (indexed) | 100 | 18386421 | 1746709995 |
1 Read type refers to either paired-end (PE) or single-end (SE) sequencing. Indexed libraries included a barcoded adaptor and were pooled with other samples. 2 Total sequence is listed in base pairs.
Table 2. Transcriptome coverage.
Strain | Assembly size (bp) | k-mer1 | N50 | N90 | Number of contigs > 200 bp | Number of D. ananassae genes hit by contigs > 200 bp | Number of D. melanogaster genes hit by contigs > 200 bp2 |
D.m. malerkotliana mal0-isoC | 5523613 | 39 | 755 | 273 | 9731 | 6755 | 5750 |
D.m. malerkotliana mal0-sc2 | 21376161 | 29:63 | 620 | 248 | 42348 | 10181 | 8696 |
D.m. pallens palQ121-isoA | 10340709 | 35 | 687 | 265 | 19339 | 9908 | 8195 |
D. bipectinata bip3-isoA | 27944402 | 29:63 | 1068 | 276 | 41144 | 10598 | 9125 |
D. merina | 9974770 | 41 | 632 | 263 | 19406 | 8618 | 7200 |
D. ercepeae | 7850500 | 41 | 568 | 250 | 15570 | 7691 | 6429 |
D.p. pseudoananassae Q117 | 14346488 | 63 | 459 | 240 | 34044 | 11093 | 8803 |
D.p. nigrens VT04–33 | 12642103 | 63 | 446 | 239 | 30400 | 10642 | 8458 |
1 29:63 indicates that the assembly was performed with Trans-ABySS: assemblies were generated at every odd k-mer value within this range and subsequently merged. 2 Fewer D. melanogaster genes are hit than D. ananassae genes, as there are many genes in the D. ananassae assembly for which no D. melanogaster ortholog has been identified.
Although we generated a large amount of non-redundant transcriptome sequence for each species, these assemblies are clearly incomplete in two ways. First, it is likely that many rare transcripts were either not represented in the sequencing libraries, or were not sequenced with enough depth for confident assembly. Second, the short size of many contigs indicates that they represent only partial transcript sequences. Furthermore, the use of adult tissue in generating the libraries precludes the sequencing of genes that are expressed only at pre-adult stages. To estimate the completeness of our transcriptome assemblies, we used BLAST to compare them to the predicted transcriptome of D. ananassae and to the better annotated but more diverged genome of D. melanogaster (files S12–17). The number of unique D. ananassae genes matching our de novo contigs varies from 6,755 to 11,093 (Table 2). Taking the total number of predicted D. ananassae genes as a benchmark, de novo transcriptome assemblies generated from modest amounts of sequence data include between 43% (D. m. malerkotliana) and 69% (D. p. pseudoananassae) of all genes in the genome, with an average of 59%.
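The completeness estimate above is a simple ratio, which the following sketch makes explicit. The gene counts are copied from Table 2. The total number of predicted D. ananassae genes used as the denominator is not stated in this section, so it is left as a required argument (`total_genes`) rather than filled in; the function and variable names are illustrative.

```python
# Number of unique D. ananassae genes hit by contigs > 200 bp (Table 2).
GENES_HIT = {
    "D.m. malerkotliana mal0-isoC": 6755,
    "D.m. malerkotliana mal0-sc2": 10181,
    "D.m. pallens palQ121-isoA": 9908,
    "D. bipectinata bip3-isoA": 10598,
    "D. merina": 8618,
    "D. ercepeae": 7691,
    "D.p. pseudoananassae Q117": 11093,
    "D.p. nigrens VT04-33": 10642,
}


def percent_of_genes(genes_hit, total_genes):
    """Percentage of the benchmark gene set matched by an assembly."""
    if total_genes <= 0:
        raise ValueError("total_genes must be positive")
    return 100.0 * genes_hit / total_genes


def mean_genes_hit(table=GENES_HIT):
    """Average number of benchmark genes matched across all assemblies."""
    return sum(table.values()) / len(table)


print("Mean genes hit per assembly:", mean_genes_hit())
# With the annotated gene count supplied by the reader, for example:
#   percent_of_genes(GENES_HIT["D. merina"], total_genes=<annotated gene count>)
```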
Inter- and intraspecific sequence polymorphisms
A large number of polymorphic nucleotide positions were identified for each pair of strains (Table 3, files S18–23). A total of 155245, 343156, 216786, and 318507 single nucleotide polymorphisms (SNPs) were found in the D. m. malerkotliana / D. m. pallens, D. m. malerkotliana / D. bipectinata, D. ercepeae / D. merina, and D. p. pseudoananassae / D. p. nigrens comparisons, respectively. In these four comparisons, 71, 73, 78, and 80% of all assembled contigs harbor SNPs. The SNP-containing contigs correspond to approximately 58, 62, 66, and 75% of all genes in the genome if the number of predicted D. ananassae genes is used for comparison. An average of 46% of the SNPs are fixed differences, i.e., positions that are monomorphic for one allele in the first strain and for a different allele in the second (Table 3). This number may be an overestimate if low coverage at some positions prevented the detection of segregating alleles in one or both parental strains.
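The distinction between fixed and polymorphic positions, and the reason low coverage can inflate the fixed fraction, can be sketched as follows. This is a toy classifier, not the SNP-calling procedure used in this study: it takes the base calls observed at one position in each strain, ignores sequencing error, and uses a hypothetical `min_depth` threshold to flag positions where a segregating allele could easily have been missed.

```python
def classify_site(calls_a, calls_b, min_depth=4):
    """Classify one aligned position from base calls in strain A and strain B.

    Returns one of:
      "low_coverage" - too few reads in either strain to judge
      "invariant"    - both strains monomorphic for the same allele
      "fixed"        - each strain monomorphic, but for different alleles
      "polymorphic"  - at least one strain carries more than one allele
    """
    if len(calls_a) < min_depth or len(calls_b) < min_depth:
        return "low_coverage"
    alleles_a, alleles_b = set(calls_a), set(calls_b)
    if len(alleles_a) == 1 and len(alleles_b) == 1:
        return "invariant" if alleles_a == alleles_b else "fixed"
    return "polymorphic"


# Toy read pileups (hypothetical data).
print(classify_site("AAAAAA", "GGGGG"))   # monomorphic for different alleles
print(classify_site("AAGAAA", "GGGGG"))   # strain A segregates A and G
print(classify_site("AA", "GGGGG"))       # too shallow in strain A
```

With a lower `min_depth`, the third example would be called fixed even though a second allele could be segregating undetected in strain A, which is the overestimation effect described above.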
Table 3. Single nucleotide polymorphisms identified in transcriptome assemblies.
Comparison | Number of SNPs in contigs > 200 bp1 | Total number of contigs > 200 bp with SNPs | Number of D. ananassae genes hit by contigs with SNPs2 | Number of D. melanogaster genes hit by contigs with SNPs | % Fixed3 |
D.m. malerkotliana mal0-isoC/D.m. pallens palQ121-isoA | 155245 | 20770 | 9291 | 7667 | 40% |
D. bipectinata bip3-isoA/D. malerkotliana mal0-sc2 | 343156 | 41636 | 9985 | 8797 | 68% |
D. ercepeae/D. merina | 216786 | 27336 | 10525 | 8664 | 53% |
D.p. pseudoananassae Q117/D.p. nigrens VT04–33 | 318507 | 54363 | 12024 | 9425 | 25% |
1 All statistics in this table are for the combined results from the reciprocal mapping of each (sub)species. For example, the mappings of D.m. malerkotliana to D.m. pallens, D.m. malerkotliana to D.m. malerkotliana, D.m. pallens to D.m. malerkotliana and D.m. pallens to D.m. pallens are considered as one. 2This is the number of unique hits in both reciprocal mappings, thus, if a gene is hit in both mappings it is counted only once. It includes both fixed and polymorphic SNPs. Indels are not included in any calculations. 3The percentage of SNPs in both reciprocal mappings that are fixed between the strains, rather than polymorphic in either or both strains.
The vast majority of assembled genes contain at least one fixed difference between any two strains. Not all of these fixed differences are suitable for use as genetic markers since a number of additional criteria must be satisfied for successful genotyping by any existing technology. In particular, any genotyping assay that involves PCR and/or primer extension requires the target SNPs to be located well away from other SNPs in order to avoid annealing and amplification biases. In this respect, the high density of SNPs in Drosophila genomes is “too much of a good thing,” as it poses considerable problems for genetic analysis. Nevertheless, our results suggest that a modest amount of transcriptome sequence data are sufficient to design genetic markers at very high density, approaching single-locus resolution in many genomic regions.
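The spacing requirement for PCR-based or primer-extension assays can be expressed as a simple filter. The sketch below keeps only SNPs that have no neighboring SNP within a chosen distance on the same contig. The 50 bp default for `min_gap` is a hypothetical value chosen for illustration; the actual flanking requirement depends on the genotyping platform, and a real design step would also check the distance to the contig ends.

```python
def isolated_snps(positions, min_gap=50):
    """Return SNP positions with no other SNP closer than `min_gap` bp.

    `positions` are coordinates of all SNPs (fixed or polymorphic) on a
    single contig; neighbors of any kind can interfere with primer annealing.
    """
    ordered = sorted(positions)
    keep = []
    for i, pos in enumerate(ordered):
        left_ok = i == 0 or pos - ordered[i - 1] >= min_gap
        right_ok = i == len(ordered) - 1 or ordered[i + 1] - pos >= min_gap
        if left_ok and right_ok:
            keep.append(pos)
    return keep


# Toy contig (hypothetical coordinates): clustered SNPs are rejected.
snp_positions = [10, 20, 200, 400, 420, 900]
print(isolated_snps(snp_positions))
```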
SNP markers for genetic analysis
To test our strategy for high-throughput design of genetic markers, we selected a small number of fixed differences identified between each pair of strains for empirical validation, and genotyped these markers in the progeny of genetic crosses (files S28−36). A SNP was considered to be validated if it was included in the final linkage map after all filtering steps (see Methods). In the D. m. malerkotliana/D. m. pallens cross, 177 F2 individuals were genotyped for a panel of 96 SNPs using the GoldenGate BeadXpress assay (Illumina) (Table 4, file S28). These SNPs are located on the proximal Muller element D (chromosome arm 2R) and throughout Muller E (chromosome arm 2L). In all other crosses, genotyping was performed using the Sequenom MassARRAY, which typically allows a maximum of 32 SNPs per sample for a single assay. In the D. ercepeae / D. merina cross, 25 markers were validated in 137 F2 individuals (Table 4, files S29−30). These markers are distributed on Muller elements D, E, and A (X chromosome). The smaller number of individuals in this cross relative to others is due to the unexpected fertility of the F1 males. In the intended backcross to D. merina, some F2 individuals were homozygous for the D. ercepeae alleles at some loci, and were excluded from analysis even though their inclusion had little impact on the linkage relationships among markers. In the D. p. pseudoananassae/D. p. nigrens and D. m. malerkotliana/D. bipectinata crosses, the markers were located on all major chromosome arms (Muller elements A−E). In the D. m. malerkotliana/D. bipectinata cross, 24 SNPs (75%) were validated in a panel of 168 F2 progeny (Table 4, see ref.68). In the D. p. pseudoananassae/D. p. nigrens cross, 49 SNPs (76%) were validated in 274 individuals (Table 4, files S31−36). Thus, similar success rates are observed in different crosses and with different technologies.
Table 4. Number of genotyped markers and individuals.
Comparison | Muller A1 | Muller B | Muller C | Muller D | Muller E | Total markers | #Ind. |
D.m. malerkotliana mal0-isoC/D.m. pallens palQ121-isoA | - | - | - | 5 | 64 | 69 | 172 |
D.bipectinata bip3-isoA/D.m. malerkotliana mal0-sc2 | 9 | 4 | 5 | 5 | 6 | 29 | 351 |
D. ercepeae/D. merina | 4 | - | - | 9 | 12 | 25 | 137 |
D.p. pseudoananassae Q117/D.p. nigrens VT04–33 | 5 | 5 | 3 | 14 | 22 | 49 | 272 |
1 Muller A is equivalent to the X in D. melanogaster. Muller D and E are equivalent to 3L and 3R, respectively. Muller B and C are equivalent to 2L and 2R.
There were several common reasons for marker failure regardless of the genotyping technology. One to five percent of the SNPs failed in the initial assay for technical reasons. For some of the remaining SNPs, the detection method failed to distinguish between heterozygotes and one of the homozygotes; this appeared as segregation distortion characterized by a complete lack of one of the three genotypes. Finally, some of the eliminated SNPs appear to be quality markers that could not be placed in any linkage group due to small sample sizes.
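The second failure mode, in which an assay cannot separate heterozygotes from one homozygote class, leaves a recognizable signature: one expected genotype class is entirely absent. A minimal check for that signature is sketched below. The genotype labels ("AA", "AB", "BB"), the no-call label "NC", and the function name are illustrative assumptions, not the output format of either genotyping platform.

```python
from collections import Counter


def missing_genotype_classes(calls, expected=("AA", "AB", "BB")):
    """Return the expected genotype classes that never occur at a marker.

    For an F2 intercross all three classes are expected; for a backcross,
    pass the two expected classes, e.g. expected=("AB", "BB").
    Calls outside `expected` (such as no-calls) are ignored.
    """
    counts = Counter(call for call in calls if call in expected)
    return [genotype for genotype in expected if counts[genotype] == 0]


# Toy marker (hypothetical calls): heterozygotes were never called.
marker_calls = ["AA", "BB", "AA", "NC", "BB", "AA"]
print(missing_genotype_classes(marker_calls))
```

A marker for which this list is non-empty in a reasonably large panel is a candidate for removal before linkage analysis, since the missing class would otherwise appear as extreme segregation distortion.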
Partial linkage maps
To explore the utility of the newly designed markers for genetic mapping, we constructed partial linkage maps for all crosses (files S3−5).36,37 In all cases, the genetic maps appear to be longer than in D. ananassae. The map of Muller E (3R), where marker density was highest, spans 48.2 cM in D. ananassae,38 118 cM in D. ercepeae/D. merina, and up to 136 cM in D. p. pseudoananassae / D. p. nigrens. The length of the map is not directly related to marker density, as the D. m. malerkotliana / D. m. pallens map contains the largest number of markers but spans only 75 cM (~three markers from proximal Muller D are included in this estimate). These differences could reflect either an underlying biological reality (higher recombination rate or a larger genome in some species), or the effects of genotyping errors and uneven recombination rates on statistical calculations.39 Segregation distortion, which is common in interspecific crosses40-42 as well as in some crosses between different strains,43,44 may also have contributed to the increased genetic distances.
As expected,38,45-47 comparisons of marker locations in the bipectinata and ercepeae species complexes, D. ananassae, and D. melanogaster confirm that homologous genes have remained on homologous chromosome arms over millions of years of evolution. Consistent with previous findings,47,48 we observe differences between species in the order of genes within chromosome arms, although gene order shows moderate conservation on smaller scales. This pattern is likely due to large and small paracentric inversions that move genes within chromosome arms.43,49,50
Discussion
Next-generation sequencing can be used for genetic mapping in two general ways. In the first category of methods, sequencing and genotyping are performed in a single step. Complete or partial genomes of individual progeny are sequenced, typically to a low coverage, and the genotype of each individual at each nucleotide position is determined directly from the sequence reads. These methods include multiplexed shotgun genotyping (MSG)3 and PSIseq.51 The advantage of genotyping-by-sequencing is that it combines all data acquisition into a single experiment and yields very high marker densities. On the other hand, this approach can be difficult to scale up to hundreds or thousands of progeny due to barcoding limitations and the labor involved in library preparation. More importantly, genotyping-by-sequencing usually relies on the availability of at least one, and preferably both, parental genomes since SNP identification depends on reference mapping and parentage is difficult to assign without a reference. For this reason, it may be challenging to extend approaches such as MSG to non-model taxa. RAD-seq (restriction-site-associated DNA sequencing) can also be used as a genotyping-by-sequencing strategy. Due to its reduced representation feature, this approach is less dependent on the availability of reference genomes, but it faces similar scale limitations and is best deployed in situations where relatively modest numbers of individuals must be genotyped for a large number of markers.14,52
The second alternative is to separate the sequencing and genotyping steps: identify molecular markers by comparing the complete or partial genomes of parental strains, then genotype some subset of these markers using one of the many available technologies (such as Illumina BeadXpress or Sequenom MassARRAY). This approach is especially useful when large numbers of individuals must be genotyped for a moderate number of markers.14 Typical MassARRAY or BeadXpress marker panels tend to have tens to hundreds of markers, not many thousands as has been described for genotyping by sequencing.3 In non-model taxa where no genome sequences are available, RAD-seq and transcriptome sequencing are used most commonly for marker identification. For example, modified RAD-seq pipelines have been used for de novo assembly of reduced representation genomic libraries, by performing local alignment of reads anchored at the same restriction site.53 RAD-seq and transcriptome sequencing yield comparable numbers of contigs and molecular markers.53-55 Transcriptomes tend to produce longer contigs, while RAD tags are probably more polymorphic on average since they include a greater proportion of non-coding sequences. For any given species and experiment, the choice of technical approaches will depend on many factors including the number of individuals, the expected amount of recombination, and the level of sequence polymorphism.
This study provides extensive characterization of the transcriptomes of seven members of the bipectinata and ercepeae species complexes. Our results show that relatively small amounts of Illumina sequence data are sufficient to assemble a substantial portion of the total Drosophila transcriptome and to identify tens of thousands of SNP markers that cover the genome at nearly single-gene density. We confirm that the SNP markers identified by transcriptome sequencing and de novo assembly are suitable for high-throughput genotyping, and can be used to generate linkage maps rapidly and efficiently. The new genomic resources will enable high-resolution genetic analysis of the inter- and intraspecific differences. Species of the bipectinata and ercepeae species complexes display varying degrees of reproductive isolation and have undergone extensive phenotypic diversification, especially with respect to sex-specific morphological traits, over a relatively short time span. These features make them a good model for elucidating the genetic basis of speciation and sexual dimorphism, and the experimental strategy described here will complement other approaches in addressing this question. The same strategy can enable a forward genetic approach in any non-model organism, as it does not rely on any organism-specific features or resources. For most organisms, the cost of identifying and genotyping SNP markers with this approach will be minor compared with the logistics and expense of performing genetic crosses and phenotypic analysis.
Materials and Methods
Genetic crosses
Drosophila strains were obtained from the US Drosophila Species Stock Center or provided by Drs Y. Fuyama and M. Matsuda, and maintained on standard cornmeal media (file S1). For interspecific crosses, virgin females were crossed to virgin males in mass cultures, with at least 30 males and at least 30 females per vial. At least five separate attempts were made for each pair of species. Flies were transferred to fresh vials every few days either until larvae were observed in the media or until all adults were dead. In most interspecific crosses, no progeny were observed until at least 2 weeks after the parents were combined. In interspecific crosses where no progeny were produced at all, the last few surviving females were dissected in insect saline and their reproductive tracts were checked for the presence of sperm. No sperm was observed in any such cross, suggesting that no successful matings occurred. The fertility of F1 males and females was tested by placing them together with females or males, respectively, of each of the parental species, and transferring the cultures to fresh media either until progeny were produced or until all adults were dead.
In crosses where both male and female hybrids are fertile (D. m. malerkotliana/D. m pallens and D. p. pseudoananassae/D. p. nigrens), at least 30 virgin F1 females were crossed to at least 30 F1 males to create an F2 genotyping panel. In the D. ercepeae/D. merina and D. m. malerkotliana/D. bipectinata crosses, F1 males are largely or completely sterile. Therefore, F1 females were crossed to males of one of the parental species (D. merina and D. bipectinata, respectively). F2 males were collected and aged for 10 d, at which point they were scored for phenotypes of interest and frozen. Depending upon the intended downstream application DNA from individual flies was extracted either using the Gentra Puregene DNA extraction kit with a protocol modified for small tissue samples or a simple salt based protocol.56
Inbreeding of selected strains of each species or subspecies was performed by single-pair, full-sib crosses. To characterize chromosomal inversions, polytene chromosome spreads were prepared from the salivary glands of female 3rd instar larvae using acetic orcein staining.
Preparation of normalized cDNA libraries for Illumina sequencing
For each parental strain, total RNA was extracted from a pool of 20−60 mixed-sex adults using the standard TRIzol protocol (Invitrogen, cat# 15596-026). Quality and concentration of the RNA were measured using gel electrophoresis57 and a NanoDrop spectrophotometer (Thermo Scientific). The polyadenylated fraction of the total RNA was converted into double-stranded cDNA using the MINT cDNA synthesis kit (Evrogen, cat# SK005) and an oligo(dT) primer. cDNA libraries were then normalized using the TRIMMER kit (Evrogen), which is based on the digestion of reannealed cDNA samples with a duplex-specific nuclease.58 The normalized cDNA was sheared into random fragments by sonication on a Bioruptor (Diagenode, cat# UCD-200 TM) using 30 sec cycles of on/off sonication for 15 min. Alternatively, if the libraries were to be indexed, they were digested with DNA fragmentase (New England Biolabs, cat# M0348S) for 30 min to produce ~100–300 bp fragments. Fragmented libraries were prepared for sequencing using the Illumina library preparation kit (for non-indexed libraries, cat# PE-102-1001) or the NEBNext kit (for indexed libraries, cat# E600L). Indexed libraries included 5-base barcodes inserted into the standard Illumina adapter sequence. Barcodes were generated with the Python script Barcode_Generator.py (Comai lab, UC Davis) so that any two barcodes differed by at least two base pairs. Standard Illumina adapters were used for the non-indexed libraries. After adapter ligation, the libraries were size-selected using gel electrophoresis or Agencourt Ampure XP (Beckman Coulter, cat# A63880) for a final fragment size of 250–350 bp, and amplified with Phusion HF MasterMix (New England Biolabs, cat# M0531S). Size distribution and concentration of the libraries were confirmed on a Bioanalyzer (Agilent). Sequencing of the finished libraries was performed at the UC Davis Genome Center (Table 1).
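The barcode constraint described above (5-base barcodes, any two differing at two or more positions) can be illustrated with a minimal Python sketch. This is not the Barcode_Generator.py script used in the study; the function names, the greedy selection strategy, and the choice of 12 barcodes are assumptions made only for illustration.

```python
from itertools import product
from typing import List, Optional


def hamming(a: str, b: str) -> int:
    """Count the positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def generate_barcodes(length: int = 5, min_dist: int = 2,
                      limit: Optional[int] = None) -> List[str]:
    """Greedily collect barcodes whose pairwise Hamming distance is >= min_dist."""
    chosen: List[str] = []
    for bases in product("ACGT", repeat=length):
        candidate = "".join(bases)
        # Keep the candidate only if it is far enough from every barcode kept so far.
        if all(hamming(candidate, kept) >= min_dist for kept in chosen):
            chosen.append(candidate)
            if limit is not None and len(chosen) == limit:
                break
    return chosen


# Hypothetical usage: request 12 barcodes for 12 indexed libraries.
barcodes = generate_barcodes(length=5, min_dist=2, limit=12)
print(barcodes)
```

With a minimum distance of two, a single sequencing error in a barcode cannot turn it into another valid barcode, so such reads can be detected and discarded rather than assigned to the wrong library.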
Sequence assembly and filtering
Our cDNA library construction approach introduced adapters onto the 5′ and 3′ ends of transcripts. We trimmed the adapter sequences using Perl scripts prior to assembly. No further quality filtering was performed, as it appears to have little effect on the quality of the final assembly.59 Reads were assembled de novo, separately for each parental library, using ABySS,35 or for D. bipectinata/D. m. malerkotliana using Trans-ABySS.52 The optimal k-mer size, which depends on both coverage and read length,35,52 was chosen empirically by performing multiple assemblies at k-mer sizes ranging from approximately half the read length to three quarters of the read length. When Trans-ABySS was used, the fact that different k-mer sizes are better for different transcripts could be exploited: assembly was performed for all k-mer sizes between 29 and 63 in increments of two, and the resulting assemblies were subsequently merged (Table 1). Because we sought long contigs for genotyping, the final k-mer size was selected so as to maximize the number of assembled contigs longer than 200 bp and the N50 of these contigs. Only contigs over 200 bp were retained for subsequent analysis, including the calculation of assembly quality statistics using the Perl scripts of Haridas et al.59 Contig sequences can be found in Supplemental files 6–11.
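The selection criteria above (number of contigs longer than 200 bp, and the N50 of those contigs) can be sketched in Python as follows. The study used Perl scripts for these statistics; the function names here are hypothetical, and the way the two criteria are combined (contig count first, N50 as tie-breaker) is an assumption, since the text does not say how they were weighed against each other.

```python
from typing import Dict, List


def n50(lengths: List[int]) -> int:
    """Length L such that contigs of length >= L contain at least half of all assembled bases."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0  # empty assembly


def summarize(lengths: List[int], min_len: int = 200) -> Dict[str, int]:
    """Keep only contigs longer than min_len and report their count and N50."""
    kept = [n for n in lengths if n > min_len]
    return {"n_contigs": len(kept), "n50": n50(kept)}


def pick_k(assemblies: Dict[int, List[int]], min_len: int = 200) -> int:
    """Choose the k-mer size whose assembly ranks best (assumed ranking: count, then N50)."""
    def rank(k: int):
        stats = summarize(assemblies[k], min_len)
        return (stats["n_contigs"], stats["n50"])
    return max(assemblies, key=rank)


# Hypothetical contig lengths from two assemblies of the same library.
example = {31: [250, 300, 500], 41: [150, 600, 700]}
print({k: summarize(v) for k, v in example.items()}, pick_k(example))
```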
Blast analysis and annotation
To identify the transcriptome contigs and estimate their genomic locations, we compared our assemblies to the annotated genome and transcriptome of D. ananassae (release 1.3), the closest relative of the bipectinata and ercepeae complexes with an annotated genome (files S12–17). Each assembly was compared against the D. ananassae transcriptome using NCBI blastn with default parameters.60,61 The output was limited to the top hit for each contig and did not include the alignments themselves. For each contig, we used Perl and Python scripts to identify the orthologous gene in D. ananassae, its physical location in the D. ananassae genome, and the length and quality of the alignment. Once the ortholog in D. ananassae was identified, we were able to add any identified D. melanogaster orthologs and their locations in the genome using precomputed files obtained from FlyBase62 (files S12–17).
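As an illustration of the top-hit step, the Python sketch below keeps the highest-scoring hit per contig from BLAST tabular output. It assumes the standard 12-column tabular format (BLAST+ `-outfmt 6`), which the text does not specify; the contig and gene identifiers are hypothetical placeholders, and this is not one of the scripts used in the study.

```python
import csv
import io
from typing import Dict, IO

# Column names of the standard 12-column BLAST tabular format (assumed here).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def top_hits(handle: IO[str]) -> Dict[str, dict]:
    """Return, for each query contig, the hit with the highest bit score."""
    best: Dict[str, dict] = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        hit = dict(zip(FIELDS, row))
        hit["length"] = int(hit["length"])
        hit["evalue"] = float(hit["evalue"])
        hit["bitscore"] = float(hit["bitscore"])
        query = hit["qseqid"]
        if query not in best or hit["bitscore"] > best[query]["bitscore"]:
            best[query] = hit
    return best


# Hypothetical example: contig1 has two hits, contig2 has one.
example = (
    "contig1\tgeneA\t95.0\t300\t15\t0\t1\t300\t101\t400\t1e-120\t450\n"
    "contig1\tgeneB\t88.0\t120\t14\t0\t5\t124\t10\t129\t1e-30\t150\n"
    "contig2\tgeneC\t91.5\t210\t18\t0\t1\t210\t50\t259\t1e-70\t280\n"
)
hits = top_hits(io.StringIO(example))
print({q: (h["sseqid"], h["length"], h["evalue"]) for q, h in hits.items()})
```

The retained alignment length and e-value are the quantities needed later when candidate SNPs are screened for a sufficiently long, significant match to a D. ananassae ortholog.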
Reciprocal reference mapping
Each de novo transcriptome assembly was used as a reference to map the sequence reads from the appropriate species or subspecies using the Burrows-Wheeler Alignment Tool (BWA-0.5.4)63 or the Short Oligonucleotide Analysis Package (SOAP2).64 For paired-end libraries, forward and reverse read files were mapped separately, as paired-end mapping was not available in BWA at the time. Reads from each strain were mapped both to its own de novo assembly and to that of its closest relative; for example, D. m. malerkotliana reads were mapped to both D. m. malerkotliana and D. m. pallens references, and vice versa. Same-species mapping serves to correct errors in the de novo assembly and to identify polymorphic nucleotide positions in each parental strain, while cross-species mapping allows the identification of fixed differences between the strains.
SNP identification, filtering, and selection
The SAMtools package65 was used to process SAM alignment files, merge paired-end reads, and identify putative SNPs. Filtering of error-prone SNPs was performed for most libraries using samtools.pl and the varFilter function with default parameters. More recent libraries (D. p. pseudoananassae and D. p. nigrens) were filtered using VarScan pileup2snp with default parameters.66 Matched same-species and cross-species assemblies were merged using custom Perl scripts; e.g., the D. m. pallens → D. m. malerkotliana assembly was merged with the D. m. malerkotliana → D. m. malerkotliana assembly. Additional filtering was then performed to limit the final SNP tables to polymorphisms with a frequency of at least 10% or a count of at least two, and a depth of at least eight in each library. The aforementioned contig annotation was then merged into the SNP tables using Perl scripts (files S18–23).
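The final filtering rule can be expressed as a short Python predicate. This is a sketch under one reading of the rule, not the Perl code used in the study: it assumes that every library must reach the minimum depth, and that the variant allele must reach the frequency or count threshold in at least one library. The data layout and library labels are hypothetical.

```python
from typing import Dict, Tuple


def passes_filter(site: Dict[str, Tuple[int, int]],
                  min_freq: float = 0.10,
                  min_count: int = 2,
                  min_depth: int = 8) -> bool:
    """site maps a library name to (variant_allele_count, read_depth) at one position."""
    # Depth requirement: at least min_depth reads in each library.
    if any(depth < min_depth for _, depth in site.values()):
        return False
    # Support requirement (assumed): frequency or count threshold met in some library.
    return any(alt >= min_count or alt / depth >= min_freq
               for alt, depth in site.values())


# Hypothetical sites in a two-library comparison.
fixed_difference = {"malerkotliana": (0, 20), "pallens": (20, 20)}
shallow_site = {"malerkotliana": (0, 5), "pallens": (20, 20)}
weak_site = {"malerkotliana": (0, 20), "pallens": (1, 50)}
print(passes_filter(fixed_difference), passes_filter(shallow_site), passes_filter(weak_site))
```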
The pool of possible SNPs for verification was limited to those that were fixed between strains. These SNPs were then subjected to additional filtering to maximize the chances of successful genotyping. The following restrictions were imposed: position at least 50 bp from either end of the contig, coverage of at least 10×, and alignment with a D. ananassae ortholog of at least 100 bp with a BLAST e-value of 0.01 or lower. Genotyping assays required at least 50 bp of sequence on either side of the target SNP to be free of additional polymorphisms. The 100 bp flanking each SNP were identified using Perl scripts. These scripts also identified polymorphisms flanking each SNP in the form of IUPAC codes, which were then used to eliminate such SNPs from further consideration. To ensure that the SNP region did not span any intron/exon boundaries, the 100 bp around each candidate SNP were manually BLASTed against the D. ananassae genome. Finally, the SNPs were selected based on their estimated genomic location in D. ananassae so as to evenly cover the desired regions of chromosomes.
Primer combinations for the MassARRAY (Sequenom) genotyping platform were designed using the MassARRAY Typer software package with default parameters, except that the weight of the scores for the likelihood of primer dimers and hairpins was lowered to 0.75. Primer lengths of 15 to 30 bases were allowed. Primers were ordered from IDT, and the genotyping was performed at the UC Davis Veterinary Genetics Laboratory (files S25–28). For GoldenGate genotyping, sequences flanking the SNPs were submitted to Illumina for processing with the Illumina Assay Design Tool, which generates a score between 0 and 1 for each candidate SNP and identifies SNPs that cannot be used for genotyping. All SNP sequences with a score below 0.6 were removed. From this pool, 96 SNPs were selected based on their estimated locations (file S24). The oligonucleotide pool assay was ordered from Illumina, and the genotyping was performed by the UC Davis Genome Center DNA Core Facility.
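The restrictions listed above can be combined into a single check, sketched here in Python. The study used Perl scripts for this step; the function names, the 0-based coordinate convention, and the assumption that additional polymorphisms appear in the contig consensus as IUPAC ambiguity codes are illustrative choices only.

```python
AMBIGUITY_CODES = set("RYSWKMBDHVN")  # IUPAC codes for non-unique bases


def flanks_clean(contig: str, pos: int, flank: int = 50) -> bool:
    """True if `flank` bases exist on each side of the SNP (0-based pos) and are all A/C/G/T."""
    if pos < flank or pos + flank >= len(contig):
        return False  # SNP lies too close to a contig end
    left = contig[pos - flank:pos].upper()
    right = contig[pos + 1:pos + 1 + flank].upper()
    return not (set(left) | set(right)) & AMBIGUITY_CODES


def is_candidate(contig: str, pos: int, coverage: int, aln_len: int, evalue: float,
                 flank: int = 50, min_cov: int = 10,
                 min_aln: int = 100, max_evalue: float = 0.01) -> bool:
    """Apply the coverage, ortholog-alignment, and clean-flank requirements together."""
    return (coverage >= min_cov
            and aln_len >= min_aln
            and evalue <= max_evalue
            and flanks_clean(contig, pos, flank))


# Hypothetical contigs: the target SNP is written as the IUPAC code R at index 50.
good = "A" * 50 + "R" + "C" * 50
bad = "A" * 10 + "Y" + "A" * 39 + "R" + "C" * 50  # second polymorphism in the left flank
print(is_candidate(good, 50, coverage=25, aln_len=300, evalue=1e-40),
      is_candidate(bad, 50, coverage=25, aln_len=300, evalue=1e-40))
```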
Map construction
Linkage maps were constructed using R/qtl,37 a QTL mapping package for the statistical programming language R. The initial marker order was based upon the D. ananassae genome assembly.38 All markers and individuals with more than 25% missing genotype data were eliminated. In the GoldenGate BeadXpress assay, markers that exhibit unusual overlap between the clusters corresponding to different genotypes are not suitable for analysis, since genotypes cannot be called reliably. Furthermore, if the heterozygote cluster is unexpectedly intense, the same type of failure is likely (Illumina technical note), leading us to remove such markers from the data set as well. Similar problems are possible with the Sequenom assays. Switched alleles were investigated using the est.rf() and checkAlleles() functions in R/qtl. Segregation distortion was identified using geno.table(), which calculates P-values to test for departure from a 1:2:1 expectation (intercross) or 1:1 (backcross). Markers that showed distortion at a 5% level after a Bonferroni correction for multiple tests were eliminated from the data set in most cases. If the pattern of distortion extended across multiple linked markers, those markers were retained, as such a pattern likely reflects biological reality rather than genotyping error. Markers that could not be placed in any linkage group with a minimum LOD score of 10 were excluded from the analysis.
The wide-ranging conservation of gene content within Muller elements / chromosome arms in Drosophila is well documented.38,45-47 We therefore assumed that genes located on a particular chromosome arm in D. ananassae and D. melanogaster are likely to be located on the same chromosome arm in the bipectinata and ercepeae species complexes. Paracentric inversions are the most frequent type of genome rearrangement in Drosophila. Interspecific comparisons thus should show blocks of local synteny. Possible marker orders were therefore investigated using ripple() with an error probability of 0.01 and a window of 7. Occasional single markers not associated with a synteny block were investigated and placed using tryallpositions() and a minimum LOD score of 10. Genotypes that, in the final map order, were identified as problematic by top.errorlod() were subsequently recorded as missing. Furthermore, once the final order had been determined, individuals that were outliers with respect to the total number of recombination events were removed from the data set. Final marker orders were compared with the recently generated assembly of the D. bipectinata genome (GenBank AFFE00000000.1). Marker distances were determined using est.map() with the Kosambi67 mapping function, which assumes some crossover interference.
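The segregation distortion screen can be illustrated outside of R/qtl with a short Python sketch. It assumes a chi-square goodness-of-fit test against the expected genotype ratio, followed by a Bonferroni-corrected 5% threshold; it is not R/qtl code, and the marker names and genotype counts are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def chisq_sf(stat: float, df: int) -> float:
    """Upper-tail probability of a chi-square statistic for df = 1 or 2 (closed forms)."""
    if df == 1:
        return math.erfc(math.sqrt(stat / 2))
    if df == 2:
        return math.exp(-stat / 2)
    raise ValueError("only df = 1 (backcross) or df = 2 (intercross) are handled")


def distortion_p(counts: Sequence[int], ratio: Sequence[float]) -> float:
    """P-value for departure of observed genotype counts from the expected ratio."""
    total = sum(counts)
    if total == 0:
        return 1.0  # no genotype data, nothing to test
    expected = [total * r / sum(ratio) for r in ratio]
    stat = sum((obs - exp) ** 2 / exp for obs, exp in zip(counts, expected))
    return chisq_sf(stat, len(counts) - 1)


def distorted_markers(table: Dict[str, Sequence[int]], ratio: Sequence[float],
                      alpha: float = 0.05) -> List[str]:
    """Markers whose P-value falls below alpha divided by the number of tests (Bonferroni)."""
    threshold = alpha / len(table)
    return [m for m, counts in table.items() if distortion_p(counts, ratio) < threshold]


# Hypothetical intercross counts (AA, AB, BB) for two markers.
intercross = {"marker1": (25, 50, 25), "marker2": (60, 35, 5)}
print(distorted_markers(intercross, ratio=(1, 2, 1)))
```

Flagged markers would still need to be inspected in map context, since distortion shared by several linked markers is retained in the analysis described above.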
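For reference, the Kosambi mapping function converts a recombination fraction r into a map distance of (1/4) ln[(1 + 2r)/(1 − 2r)] Morgans. The Python sketch below shows the conversion in centimorgans and, for comparison, the Haldane function, which assumes no interference. The function names are illustrative; in the analysis itself the conversion is handled by R/qtl.

```python
import math


def kosambi_cm(r: float) -> float:
    """Kosambi map distance in cM for recombination fraction r (0 <= r < 0.5)."""
    if not 0 <= r < 0.5:
        raise ValueError("recombination fraction must satisfy 0 <= r < 0.5")
    return 25.0 * math.log((1 + 2 * r) / (1 - 2 * r))


def haldane_cm(r: float) -> float:
    """Haldane map distance in cM (no crossover interference), shown for comparison."""
    if not 0 <= r < 0.5:
        raise ValueError("recombination fraction must satisfy 0 <= r < 0.5")
    return -50.0 * math.log(1 - 2 * r)


for r in (0.01, 0.10, 0.30):
    print(f"r = {r:.2f}: Kosambi {kosambi_cm(r):.2f} cM, Haldane {haldane_cm(r):.2f} cM")
```

For small r both functions return approximately 100r cM; as r grows, the Kosambi distance stays below the Haldane distance because positive interference reduces the number of undetected double crossovers.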
Acknowledgments
We are grateful to Dr Muneo Matsuda for Drosophila strains and for help in analyzing chromosome order in the bipectinata species complex, to Dr Y. Fuyama and the San Diego Drosophila Species Stock Center for other fly strains, and to Rachael Curtis, Mary Magsombol, Romai Sebhatu, and Margaret Wittman for help with phenotypic analysis and DNA extractions. AK would like to thank Christian Schloetterer, Thomas Flatt, and Alistair McGregor for hosting him during his sabbatical, and Robert Kofler, Pablo Orozco, Eshwar Meduri, Nicola Palmieri, and Ramvinay Pandey for teaching him the basics of sequence analysis and providing many of the scripts used in this paper. Illumina sequencing was performed at the UC Davis Genome Center. Genotyping was performed at the UC Davis Veterinary Genetics Laboratory. This work was supported by the NSF grant IOS-0815141 to A.K. and Doctoral Dissertation Improvement Grant DEB-1110246 to SS.
Disclosure of Potential Conflicts of Interest
No potential conflicts of interest were disclosed.
Author Contributions
A.K.: Interspecific crosses and chromosome maps. S.S. and A.K.: Assembly and mapping of D. pallens, D. malerkotliana, D. ercepeae, D. merina, D. p. pseudoananassae and D. p. nigrens. S.S.: Library preparation, SNP identification and genotyping assays for the aforementioned species. T.S.: Assembly, SNP identification, and genotyping in D. bipectinata and D. malerkotliana mal0-sc2.
Supplemental Material
Transcriptome assemblies have been submitted to the TSA database under the following accession numbers: GAEJ00000000 Drosophila pseudoananassae pseudoananassae; GADS00000000 Drosophila merina; GADR00000000 Drosophila pseudoananassae nigrens; GADQ00000000 Drosophila malerkotliana pallens; GADP00000000 Drosophila ercepeae; GADM00000000 Drosophila malerkotliana malerkotliana.
For additional Supplemental Materials, please contact: Sarah Signor; Email: sasignor@ucdavis.edu.
Footnotes
Previously published online: www.landesbioscience.com/journals/fly/article/22353
References
- 1. Stinchcombe JR, Hoekstra HE. Combining population genomics and quantitative genetics: finding the genes underlying ecologically important traits. Heredity (Edinb). 2008;100:158–70. doi: 10.1038/sj.hdy.6800937.
- 2. Benfey PN, Mitchell-Olds T. From genotype to phenotype: systems biology meets natural variation. Science. 2008;320:495–7. doi: 10.1126/science.1153716.
- 3. Andolfatto P, Davison D, Erezyilmaz D, Hu TT, Mast J, Sunayama-Morita T, et al. Multiplexed shotgun genotyping for rapid and efficient genetic mapping. Genome Res. 2011;21:610–7. doi: 10.1101/gr.115402.110.
- 4. Baird NA, Etter PD, Atwood TS, et al. Rapid SNP discovery and genetic mapping using sequenced RAD markers. PLoS ONE. 2008;3(10):e3376.
- 5. Geraldes A, Pang J, Thiessen N, Cezard T, Moore R, Zhao Y, et al. SNP discovery in black cottonwood (Populus trichocarpa) by population transcriptome resequencing. Mol Ecol Resour. 2011;11(Suppl 1):81–92. doi: 10.1111/j.1755-0998.2010.02960.x.
- 6. Cánovas A, Rincon G, Islas-Trejo A, Wickramasinghe S, Medrano JF. SNP discovery in the bovine milk transcriptome using RNA-Seq technology. Mamm Genome. 2010;21:592–8. doi: 10.1007/s00335-010-9297-z.
- 7. Parchman TL, Geist KS, Grahnen JA, Benkman CW, Buerkle CA. Transcriptome sequencing in an ecologically important tree species: assembly, annotation, and marker discovery. BMC Genomics. 2010;11:180. doi: 10.1186/1471-2164-11-180.
- 8. Renaut S, Nolte AW, Bernatchez L. Mining transcriptome sequences towards identifying adaptive single nucleotide polymorphisms in lake whitefish species pairs (Coregonus spp. Salmonidae). Mol Ecol. 2010;19(Suppl 1):115–31. doi: 10.1111/j.1365-294X.2009.04477.x.
- 9. Russell JR, Bayer M, Booth C, Cardle L, Hackett CA, Hedley PE, et al. Identification, utilisation and mapping of novel transcriptome-based markers from blackcurrant (Ribes nigrum). BMC Plant Biol. 2011;11:147. doi: 10.1186/1471-2229-11-147.
- 10. Martin J, Bruno VM, Fang Z, Meng X, Blow M, Zhang T, et al. Rnnotator: an automated de novo transcriptome assembly pipeline from stranded RNA-Seq reads. BMC Genomics. 2010;11:663. doi: 10.1186/1471-2164-11-663.
- 11. Robertson G, Schein J, Chiu R, Corbett R, Field M, Jackman SD, et al. De novo assembly and analysis of RNA-seq data. Nat Methods. 2010;7:909–12. doi: 10.1038/nmeth.1517.
- 12. Tóth G, Gáspári Z, Jurka J. Microsatellites in different eukaryotic genomes: survey and analysis. Genome Res. 2000;10:967–81. doi: 10.1101/gr.10.7.967.
- 13. International Human Genome Sequencing Consortium. Initial sequencing and analysis of the human genome. Nature. 2001;409:860–921. doi: 10.1038/35057062.
- 14. Davey JW, Hohenlohe PA, Etter PD, Boone JQ, Catchen JM, Blaxter ML. Genome-wide genetic marker discovery and genotyping using next-generation sequencing. Nat Rev Genet. 2011;12:499–510. doi: 10.1038/nrg3012.
- 15. Wang Z, Gerstein M, Snyder M. RNA-Seq: a revolutionary tool for transcriptomics. Nat Rev Genet. 2009;10:57–63. doi: 10.1038/nrg2484.
- 16. McDermott SR, Noor MAF. Genetics of hybrid male sterility among strains and species in the Drosophila pseudoobscura species group. Evolution. 2011;65:1969–78. doi: 10.1111/j.1558-5646.2011.01256.x.
- 17. Schäfer MA, Routtu J, Vieira J, Hoikkala A, Ritchie MG, Schlötterer C. Multiple quantitative trait loci influence intra-specific variation in genital morphology between phylogenetically distinct lines of Drosophila montana. J Evol Biol. 2011;24:1879–86. doi: 10.1111/j.1420-9101.2011.02316.x.
- 18. Lee SF, Rako L, Hoffmann AA. Genetic mapping of adaptive wing size variation in Drosophila simulans. Heredity (Edinb). 2011;107:22–9. doi: 10.1038/hdy.2010.150.
- 19. Chang AS, Noor MAF. Epistasis modifies the dominance of loci causing hybrid male sterility in the Drosophila pseudoobscura species group. Evolution. 2010;64:253–60. doi: 10.1111/j.1558-5646.2009.00823.x.
- 20. Gleason JM, James RA, Wicker-Thomas C, Ritchie MG. Identification of quantitative trait loci function through analysis of multiple cuticular hydrocarbons differing between Drosophila simulans and Drosophila sechellia females. Heredity (Edinb). 2009;103:416–24. doi: 10.1038/hdy.2009.79.
- 21. Adams MD, Celniker SE, Holt RA, Evans CA, Gocayne JD, Amanatides PG, et al. The genome sequence of Drosophila melanogaster. Science. 2000;287:2185–95. doi: 10.1126/science.287.5461.2185.
- 22. Clark AG, Eisen MB, Smith DR, Bergman CM, Oliver B, Markow TA, et al.; Drosophila 12 Genomes Consortium. Evolution of genes and genomes on the Drosophila phylogeny. Nature. 2007;450:203–18. doi: 10.1038/nature06341.
- 23. Richards S, Liu Y, Bettencourt BR, Hradecky P, Letovsky S, Nielsen R, et al. Comparative genome sequencing of Drosophila pseudoobscura: chromosomal, gene, and cis-element evolution. Genome Res. 2005;15:1–18. doi: 10.1101/gr.3059305.
- 24. Schug MD, Baines JF, Killon-Atwood A, Mohanty S, Das A, Grath S, et al. Evolution of mating isolation between populations of Drosophila ananassae. Mol Ecol. 2008;17:2706–21. doi: 10.1111/j.1365-294X.2008.03770.x.
- 25. Grath S, Baines JF, Parsch J. Molecular evolution of sex-biased genes in the Drosophila ananassae subgroup. BMC Evol Biol. 2009;9:291. doi: 10.1186/1471-2148-9-291.
- 26. Nanda P, Singh BN. Effect of chromosome arrangements on mate recognition system leading to behavioral isolation in Drosophila ananassae. Genetica. 2011;139:273–9. doi: 10.1007/s10709-011-9548-2.
- 27. Bock IR. Intra- and interspecific chromosomal inversions in the Drosophila bipectinata species complex. Chromosoma. 1971;34:206–29. doi: 10.1007/BF00285187.
- 28. Bock IR. The bipectinata complex: A study in interspecific hybridization in the genus Drosophila (Insecta: Diptera). Aust J Biol Sci. 1978;31:197–208.
- 29. Singh S, Singh BN. Drosophila bipectinata species complex. Indian J Exp Biol. 2001;39:835–44.
- 30. Lemeunier F, Aulard S, Arienti M, et al. The ercepeae complex: new cases of insular speciation within the Drosophila ananassae species subgroup (melanogaster group) and descriptions of two new…. Ann Entomol Soc Am. 1997;90:28–42.
- 31. Matsuda M, Ng CS, Doi M, Kopp A, Tobari YN. Evolution in the Drosophila ananassae species subgroup. Fly (Austin). 2009;3:157–69. doi: 10.4161/fly.8395.
- 32. Kopp A, Barmina O. Evolutionary history of the Drosophila bipectinata species complex. Genet Res. 2005;85:23–46. doi: 10.1017/S0016672305007317.
- 33. Kopp A, True JR. Evolution of male sexual characters in the Oriental Drosophila melanogaster species group. Evol Dev. 2002;4:278–91. doi: 10.1046/j.1525-142x.2002.02017.x.
- 34. Tomimura Y, Matsuda M, Tobari YN. Chromosomal phylogeny and geographical divergence in the Drosophila bipectinata complex. Genome. 2005;48:487–502. doi: 10.1139/g05-012.
- 35. Simpson JT, Wong K, Jackman SD, Schein JE, Jones SJM, Birol I. ABySS: A parallel assembler for short read sequence data. Genome Res. 2009;19:1117–23. doi: 10.1101/gr.089532.108.
- 36. Voorrips RE. MapChart: software for the graphical presentation of linkage maps and QTLs. J Hered. 2002;93:77–8. doi: 10.1093/jhered/93.1.77.
- 37. Broman KW, Wu H, Sen S, Churchill GA. R/qtl: QTL mapping in experimental crosses. Bioinformatics. 2003;19:889–90. doi: 10.1093/bioinformatics/btg112.
- 38. Schaeffer SW, Bhutkar A, McAllister BF, Matsuda M, Matzkin LM, O’Grady PM, et al. Polytene chromosomal maps of 11 Drosophila species: the order of genomic scaffolds inferred from genetic and physical maps. Genetics. 2008;179:1601–55. doi: 10.1534/genetics.107.086074.
- 39. Staten R, Schully SD, Noor MAF. A microsatellite linkage map of Drosophila mojavensis. BMC Genomics. 2004;5:12–9. doi: 10.1186/1471-2164-5-12.
- 40. McDaniel SF, Willis JH, Shaw AJ. A linkage map reveals a complex basis for segregation distortion in an interpopulation cross in the moss Ceratodon purpureus. Genetics. 2007;176:2489–500. doi: 10.1534/genetics.107.075424.
- 41. Woodruff GC, Eke O, Baird SE, Félix M-A, Haag ES. Insights into species divergence and the evolution of hermaphroditism from fertile interspecies hybrids of Caenorhabditis nematodes. Genetics. 2010;186:997–1012. doi: 10.1534/genetics.110.120550.
- 42. Aparicio JM, Ortego J, Calabuig G, Cordero PJ. Evidence of subtle departures from Mendelian segregation in a wild lesser kestrel (Falco naumanni) population. Heredity (Edinb). 2010;105:213–9. doi: 10.1038/hdy.2009.173.
- 43. Stocker AJ, Rusuwa BB, Blacket MJ, Frentiu FD, Sullivan M, Foley BR, et al. Physical and linkage maps for Drosophila serrata, a model species for studies of clinal adaptation and sexual selection. G3 (Bethesda). 2012;2:287–97. doi: 10.1534/g3.111.001354.
- 44. Drury DW, Jideonwo VN, Ehmke RC, Wade MJ. An unusual barrier to gene flow: perpetually immature larvae from inter-population crosses in the flour beetle, Tribolium castaneum. J Evol Biol. 2011;24:2678–86. doi: 10.1111/j.1420-9101.2011.02394.x.
- 45. Muller HJ. Bearings of the Drosophila work on systematics. In: The New Systematics. Oxford: Clarendon Press; 1940:185–268.
- 46. Sturtevant AH, Novitski E. The homologies of the chromosome elements in the genus Drosophila. Genetics. 1941;26:517–41. doi: 10.1093/genetics/26.5.517.
- 47. Bhutkar A, Schaeffer SW, Russo SM, Xu M, Smith TF, Gelbart WM. Chromosomal rearrangement inferred from comparisons of 12 Drosophila genomes. Genetics. 2008;179:1657–80. doi: 10.1534/genetics.107.086108.
- 48. Laborda PR, Gazaffi R, Garcia AAF, De Souza AP. A molecular linkage map for Drosophila mediopunctata confirms synteny with Drosophila melanogaster and suggests a region that controls the variation in the number of abdominal spots. Insect Mol Biol. 2012;21:89–95. doi: 10.1111/j.1365-2583.2011.01117.x.
- 49. Gray YH. It takes two transposons to tango: transposable-element-mediated chromosomal rearrangements. Trends Genet. 2000;16:461–8. doi: 10.1016/S0168-9525(00)02104-1.
- 50. Lim JK, Simmons MJ. Gross chromosome rearrangements mediated by transposable elements in Drosophila melanogaster. Bioessays. 1994;16:269–75. doi: 10.1002/bies.950160410.
- 51. Earley EJ, Jones CD. Next-generation mapping of complex traits with phenotype-based selection and introgression. Genetics. 2011;189:1203–9. doi: 10.1534/genetics.111.129445.
- 52. Robertson G, Schein J, Chiu R, Corbett R, Field M, Jackman SD, et al. De novo assembly and analysis of RNA-seq data. Nat Methods. 2010;7:909–12. doi: 10.1038/nmeth.1517.
- 53. Etter PD, Preston JL, Bassham S, Cresko WA, Johnson EA. Local de novo assembly of RAD paired-end contigs using short sequencing reads. PLoS ONE. 2011;6(4):e18561.
- 54. Amores A, Catchen J, Ferrara A, Fontenot Q, Postlethwait JH. Genome evolution and meiotic maps by massively parallel DNA sequencing: spotted gar, an outgroup for the teleost genome duplication. Genetics. 2011;188:799–808. doi: 10.1534/genetics.111.127324.
- 55. Willing EM, Hoffmann M, Klein JD, Weigel D, Dreyer C. Paired-end RAD-seq for de novo assembly and marker design without available reference. Bioinformatics. 2011;27:2187–93. doi: 10.1093/bioinformatics/btr346.
- 56. Ashburner M. Protocol 47. In: Drosophila: A Laboratory Manual. Cold Spring Harbor: Cold Spring Harbor Press; 1989:106–7.
- 57. Sambrook J, Fritsch EF, Maniatis T. Molecular Cloning: A Laboratory Manual. NY: Cold Spring Harbor Press; 1989.
- 58. Zhulidov PA, Bogdanova EA, Shcheglov AS, Vagner LL, Khaspekov GL, Kozhemyako VB, et al. Simple cDNA normalization using kamchatka crab duplex-specific nuclease. Nucleic Acids Res. 2004;32:e37. doi: 10.1093/nar/gnh031.
- 59. Haridas S, Breuil C, Bohlmann J, Hsiang T. A biologist’s guide to de novo genome assembly using next-generation sequence data: A test with fungal genomes. J Microbiol Methods. 2011;86:368–75. doi: 10.1016/j.mimet.2011.06.019.
- 60. Camacho C, Coulouris G, Avagyan V, Ma N, Papadopoulos J, Bealer K, et al. BLAST+: architecture and applications. BMC Bioinformatics. 2009;10:421. doi: 10.1186/1471-2105-10-421.
- 61. Tweedie S, Ashburner M, Falls K, Leyland P, McQuilton P, Marygold S, et al. FlyBase: enhancing Drosophila Gene Ontology annotations. Nucleic Acids Res. 2009;37:D555–9. doi: 10.1093/nar/gkn788.
- 62. McQuilton P, St Pierre SE, Thurmond J; FlyBase Consortium. FlyBase 101: the basics of navigating FlyBase. Nucleic Acids Res. 2012;40(Database issue):D706–14. doi: 10.1093/nar/gkr1030.
- 63. Li H, Durbin R. Fast and accurate long-read alignment with Burrows-Wheeler transform. Bioinformatics. 2010;26:589–95. doi: 10.1093/bioinformatics/btp698.
- 64. Li R, Li Y, Kristiansen K, Wang J. SOAP: short oligonucleotide alignment program. Bioinformatics. 2008;24:713–4. doi: 10.1093/bioinformatics/btn025.
- 65. Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, et al.; 1000 Genome Project Data Processing Subgroup. The Sequence Alignment/Map format and SAMtools. Bioinformatics. 2009;25:2078–9. doi: 10.1093/bioinformatics/btp352.
- 66. Koboldt DC, Chen K, Wylie T, Larson DE, McLellan MD, Mardis ER, et al. VarScan: variant detection in massively parallel sequencing of individual and pooled samples. Bioinformatics. 2009;25:2283–5. doi: 10.1093/bioinformatics/btp373.
- 67. Kosambi DD. The estimation of map distances from recombination values. Ann Eugen. 1944:173–5.
- 68. Seher TD, Ng CS, Signor SA, Podlaha O, Barmina O, Kopp A. Genetic basis of a violation of Dollo's law: re-evolution of rotating sex combs in Drosophila bipectinata. Genetics. 2012. doi: 10.1534/genetics.112.145524.