F1000Research. 2015 Jan 29;4:31. [Version 1] doi: 10.12688/f1000research.6090.1

Strain-specific and pooled genome sequences for populations of Drosophila melanogaster from three continents.

Casey M Bergman 1,a, Penelope R Haddrill 2
PMCID: PMC4331666  PMID: 25717372

Abstract

To contribute to our general understanding of the evolutionary forces that shape variation in genome sequences in nature, we have sequenced genomes from 50 isofemale lines and six pooled samples from populations of Drosophila melanogaster on three continents. Analysis of raw and reference-mapped reads indicates the quality of these genomic sequence data is very high. Comparison of the predicted and experimentally-determined Wolbachia infection status of these samples suggests that strain or sample swaps are unlikely to have occurred in the generation of these data. Genome sequences are freely available in the European Nucleotide Archive under accession ERP009059. Isofemale lines can be obtained from the Drosophila Species Stock Center.

Keywords: Drosophila melanogaster, Wolbachia pipientis, population genomics, population genetics, pool-seq, DNA-seq

Introduction

Whole genome shotgun sequences can now be generated easily for most organisms using short-read sequencing technology. Hundreds of resequenced genomes now exist for Drosophila melanogaster that can be used for population and genomic analysis in this model insect species (Lack et al., 2014). To contribute to the worldwide sampling of population genomic data in D. melanogaster, we have sequenced genomes of multiple isofemale lines from three populations collected on different continents reported in Verspoor & Haddrill (2011): Montpellier, France (FR, n=20), Athens, Georgia, USA (GA, n=15) and Accra, Ghana (GH, n=15). Pools of these same isofemale lines were also sequenced so that results from strain-specific sequencing can be compared with those from pooled sequencing. Strains sequenced here were chosen because isofemale lines exist in the Drosophila Species Stock Center and because their infection status for the Wolbachia pipientis bacterial endosymbiont had previously been determined (Verspoor & Haddrill, 2011).

Materials and methods

Isofemale strains were selected randomly from the full population samples reported in Verspoor & Haddrill (2011). Genomic DNA for isofemale lines was prepared by snap freezing females in liquid nitrogen, then extracting DNA using a standard phenol-chloroform extraction protocol with ethanol and ammonium acetate precipitation. DNA samples were generated for each isofemale line using 50, 25, and 25 adult females for the FR, GA and GH populations, respectively.

For pooled samples, single adult females from each isofemale line were used to construct two samples for each population. The first pooled sample contains one fly from each of the same strains that were sequenced as isofemale lines (FR_pool_20, GA_pool_15, GH_pool_15). The second pooled sample contains one fly from all isofemale lines sampled for each population reported in Verspoor & Haddrill (2011) (FR_pool_39, GA_pool_30, GH_pool_32).

Short-insert libraries (500 bp) were constructed using the Illumina Paired-End Sample Prep Kit (Part # 1005063), and 90 bp paired-end reads were generated by BGI-Hong Kong on an Illumina HiSeq 2000 to an estimated coverage of ~50× per strain. Forty-one samples were each sequenced in a single lane, typically shared with two other samples, on a single run; the remaining 15 samples were sequenced using the same layout on two runs, generating 71 pairs of fastq files (41 + 2 × 15) for the 56 samples. Data were generated over a total of seven sequencing runs. Raw data were filtered by BGI to remove read pairs in which either read contained adapters or had more than 50% of bases with a quality value <= 5. No other trimming or filtering of the raw data was performed prior to submission to the European Nucleotide Archive, and the original filenames provided by BGI were retained.
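The read-pair filter described above can be illustrated with a minimal Python sketch. This is not BGI's actual pipeline, whose implementation is not described here. The function names, the Phred+33 quality encoding and the substring-based adapter check are all assumptions made for illustration.

```python
from typing import Sequence, Tuple

LOW_QUALITY_MAX = 5       # a base is "low quality" if its quality value is <= 5
MAX_LOW_FRACTION = 0.5    # a read fails if more than 50% of its bases are low quality

Read = Tuple[str, str]    # (sequence, FASTQ quality string)


def low_quality_fraction(quality: str, offset: int = 33) -> float:
    """Fraction of bases with quality <= LOW_QUALITY_MAX (Phred+33 is an assumption)."""
    if not quality:
        return 0.0
    low = sum(1 for ch in quality if ord(ch) - offset <= LOW_QUALITY_MAX)
    return low / len(quality)


def read_fails(read: Read, adapters: Sequence[str] = ()) -> bool:
    """A read fails if it contains an adapter or is mostly low-quality bases."""
    sequence, quality = read
    # Hypothetical adapter check: exact substring match only.
    if any(adapter in sequence for adapter in adapters):
        return True
    return low_quality_fraction(quality) > MAX_LOW_FRACTION


def keep_pair(read1: Read, read2: Read, adapters: Sequence[str] = ()) -> bool:
    """Keep a pair only if neither read fails the filter."""
    return not (read_fails(read1, adapters) or read_fails(read2, adapters))
```

Under this reading of the criterion, a pair is discarded as soon as one of its two reads fails, and a read with exactly 50% low-quality bases is retained.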

Dataset validation

To validate the quality of the raw sequence data, forward and reverse reads were analyzed using fastQC (version 0.11.2) (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/). Forward and reverse read files for all runs had PASS status for most fastQC statistics. Per base sequence quality gave FAIL status for forward or reverse read files for all of the GA samples (which were sequenced together on one run) because of poor quality scores in the terminal 1–5 bp of the read. These poor quality termini can be easily trimmed and do not affect mappability, as the percent of reads mapped for these runs is very high (see Dataset 1).
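For users who wish to remove the poor quality termini before analysis, a fixed-length trim could look like the following Python sketch. No trimming was applied to the submitted data, and the function name is hypothetical. The sketch also assumes that "terminal" refers to the 3' end of each read, that records use the standard four-line FASTQ layout, and that trimming 5 bases is an appropriate default.

```python
from typing import Iterable, Iterator


def trim_fastq(lines: Iterable[str], n_bases: int = 5) -> Iterator[str]:
    """Yield FASTQ lines with the last n_bases removed from each read.

    Assumes four lines per record: header, sequence, '+', quality.
    """
    if n_bases < 0:
        raise ValueError("n_bases must be non-negative")
    stripped = (line.rstrip("\n") for line in lines)
    # Consume the same iterator four times per step to group lines into records.
    for header, sequence, plus, quality in zip(*[stripped] * 4):
        keep = max(len(sequence) - n_bases, 0)
        yield header
        yield sequence[:keep]   # sequence and quality are trimmed identically
        yield plus
        yield quality[:keep]
```

With the 90 bp reads described here, the default setting would leave 85 bp per read. Forward and reverse files would need to be trimmed separately while keeping the record order unchanged so that pairing is preserved.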

Dataset 1. Descriptive statistics for validation of Drosophila melanogaster genome sequence data

The PercentMapped column is obtained from the output of samtools flagstat using BAM files of mapped reads generated by bowtie2. The WolbachiaDepth, WolbachiaBreadth and PredictedInfectionStatus columns are obtained from the output of bedtools genomecov using BAM files of mapped reads generated by bowtie2. The ExperimentalInfectionStatus column is obtained from the results of Verspoor & Haddrill (2011). All other columns are obtained from the output of fastQC on the raw, unmapped reads.

To validate that the majority of the DNA sequenced is from the focal organism(s), untrimmed reads for each sample were mapped in paired-end mode using Bowtie 2 (version 2.2.4) (Langmead & Salzberg, 2012) with default options to a “hologenome” reference generated by concatenating genome sequences for D. melanogaster (GenBank accession GCA_000001215.4) (Hoskins et al., 2015) and W. pipientis (GenBank accession AE017196) (Wu et al., 2004). Mapping to a hologenome was performed since many of these strains are known to be infected with Wolbachia (Verspoor & Haddrill, 2011). Unfiltered BAM files were used to estimate the proportion of reads in each sample that mapped to the expected target organisms using samtools flagstat (version 0.1.19-44428cd) (Li et al., 2009). Greater than 96.8% of all reads in each run were mapped to the hologenome reference, indicating low levels of contaminating DNA in these data (Dataset 1).
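As an illustration of how a PercentMapped value can be derived, the Python sketch below parses the text report printed by samtools flagstat. It is not the authors' script. The function name is hypothetical, the sample report contains made-up counts, and the sketch assumes the "N + M in total" and "N + M mapped (" line layout and uses only the QC-passed counts.

```python
import re

# Each flagstat line starts with "<QC-passed> + <QC-failed> <description>".
TOTAL_RE = re.compile(r"^(\d+) \+ (\d+) in total")
MAPPED_RE = re.compile(r"^(\d+) \+ (\d+) mapped \(")


def percent_mapped(flagstat_text: str) -> float:
    """Return the percentage of QC-passed reads that are mapped."""
    total = mapped = None
    for line in flagstat_text.splitlines():
        line = line.strip()
        total_match = TOTAL_RE.match(line)
        mapped_match = MAPPED_RE.match(line)
        if total_match:
            total = int(total_match.group(1))
        elif mapped_match:
            mapped = int(mapped_match.group(1))
    if total is None or mapped is None:
        raise ValueError("could not find total and mapped read counts")
    if total == 0:
        raise ValueError("report contains no reads")
    return 100.0 * mapped / total


# Made-up report for illustration only; not taken from Dataset 1.
EXAMPLE_REPORT = """\
1000 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 duplicates
970 + 0 mapped (97.00%:-nan%)
1000 + 0 paired in sequencing
"""
```

Calling `percent_mapped(EXAMPLE_REPORT)` on this invented report gives 97.0, which would exceed the 96.8% minimum observed across the real runs.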

Mapping to a hologenome also allowed us to check whether strain or sample swaps occurred in the process of producing these genome sequences by comparing predicted Wolbachia infection status with previously determined PCR-based infection status (Verspoor & Haddrill, 2011). Wolbachia infection status was predicted from genome sequences for each strain following a modified protocol from Richardson et al. (2012). Briefly, strains were predicted as "infected" when breadth of mapped read coverage was greater than 90% of the Wolbachia genome and mean depth of coverage was greater than one. Here, we compute breadth of coverage directly from the bedtools genomecov (version v2.22.0) (Quinlan & Hall, 2010) output rather than from a consensus sequence, as was done previously by Richardson et al. (2012). Predicted Wolbachia infection status matched experimentally determined infection status for 55/56 samples (98.2% concordance), indicating that strain or sample swaps are unlikely to have occurred during the generation of this dataset (Dataset 1). The only exception observed was line GA08 from the Georgia population, which the WGS data indicate is infected but the PCR data indicate is uninfected. This observation can be explained either by PCR amplification failure for the GA08 stock in Verspoor & Haddrill (2011) or by infection of the GA08 stock after data collection for Verspoor & Haddrill (2011). Further analysis of the Wolbachia infection status of this stock is warranted prior to use.
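The infection-status rule can be sketched in Python as follows. This is an illustration, not the authors' code. The function names are hypothetical, and the sketch assumes per-base depth output of the kind produced by `bedtools genomecov -d` (sequence name, position, depth) and a Wolbachia sequence named `AE017196` in the hologenome reference. The genomecov output mode actually used is not stated in the text.

```python
from typing import Iterable, Tuple


def wolbachia_coverage(
    genomecov_lines: Iterable[str], contig: str = "AE017196"
) -> Tuple[float, float]:
    """Return (breadth, mean depth) for one contig from per-base depth lines."""
    positions = covered = total_depth = 0
    for line in genomecov_lines:
        fields = line.split()
        # Skip malformed lines and lines for other sequences (e.g. fly chromosomes).
        if len(fields) != 3 or fields[0] != contig:
            continue
        depth = int(fields[2])
        positions += 1
        total_depth += depth
        if depth > 0:
            covered += 1
    if positions == 0:
        raise ValueError("no positions found for contig " + contig)
    return covered / positions, total_depth / positions


def predict_infection(
    breadth: float, mean_depth: float,
    min_breadth: float = 0.90, min_depth: float = 1.0,
) -> str:
    """Apply the rule: breadth > 90% of the genome and mean depth > 1."""
    if breadth > min_breadth and mean_depth > min_depth:
        return "infected"
    return "uninfected"
```

Both thresholds are strict inequalities, matching the "greater than" wording of the rule. Requiring both conditions means that deep coverage of only a small part of the Wolbachia genome does not lead to an "infected" call.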

Data availability

Raw sequence data for the 56 samples reported here can be found in the European Nucleotide Archive (http://www.ebi.ac.uk/ena) under accession ERP009059. Isofemale lines can be obtained from the Drosophila Species Stock Center (https://stockcenter.ucsd.edu) under accessions 14021-0231.139, 14021-0231.140, 14021-0231.141, 14021-0231.142, 14021-0231.143, 14021-0231.144, 14021-0231.145, 14021-0231.146, 14021-0231.147, 14021-0231.148, 14021-0231.149, 14021-0231.150, 14021-0231.151, 14021-0231.152, 14021-0231.153, 14021-0231.154, 14021-0231.155, 14021-0231.156, 14021-0231.157, 14021-0231.158, 14021-0231.183, 14021-0231.184, 14021-0231.185, 14021-0231.186, 14021-0231.187, 14021-0231.188, 14021-0231.189, 14021-0231.190, 14021-0231.191, 14021-0231.192, 14021-0231.193, 14021-0231.194, 14021-0231.195, 14021-0231.196, 14021-0231.197, 14021-0231.163, 14021-0231.164, 14021-0231.165, 14021-0231.166, 14021-0231.167, 14021-0231.168, 14021-0231.170, 14021-0231.172, 14021-0231.174, 14021-0231.176, 14021-0231.177, 14021-0231.178, 14021-0231.180, 14021-0231.181 and 14021-0231.182.

Descriptive statistics for validation of each run can be found in Dataset 1, DOI: 10.5256/f1000research.6090.d42636 (Bergman & Haddrill, 2014).

Acknowledgments

We thank BGI-Hong Kong for assistance with genome sequencing and initial data quality control analysis and Daniel Halligan for assistance with data management.

Funding Statement

This work was supported by Human Frontier Science Program Young Investigator grant RGY0093/2012 to CMB and Natural Environment Research Council grant NE/G013195/1 to PRH.

I confirm that the funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

v1; ref status: indexed

References

  1. Bergman CM, Haddrill PR: Dataset 1 in “Strain-specific and pooled genome sequences for populations of Drosophila melanogaster from three continents”. F1000Research. 2014.
  2. Hoskins RA, Carlson JW, Wan KH, et al.: The Release 6 reference sequence of the Drosophila melanogaster genome. Genome Res. 2015. 10.1101/gr.185579.114
  3. Lack J, Cardeno C, Crepeau M, et al.: The Drosophila Genome Nexus: a population genomic resource of 623 Drosophila melanogaster genomes, including 197 genomes from a single ancestral range population. Genetics. 2015. 10.1534/genetics.115.174664
  4. Langmead B, Salzberg SL: Fast gapped-read alignment with Bowtie 2. Nat Methods. 2012;9(4):357–359. 10.1038/nmeth.1923
  5. Li H, Handsaker B, Wysoker A, et al.: The Sequence Alignment/Map format and SAMtools. Bioinformatics. 2009;25(16):2078–9. 10.1093/bioinformatics/btp352
  6. Quinlan AR, Hall IM: BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics. 2010;26(6):841–2. 10.1093/bioinformatics/btq033
  7. Richardson MF, Weinert LA, Welch JJ, et al.: Population genomics of the Wolbachia endosymbiont in Drosophila melanogaster. PLoS Genet. 2012;8(12):e1003129. 10.1371/journal.pgen.1003129
  8. Verspoor RL, Haddrill PR: Genetic diversity, population structure and Wolbachia infection status in a worldwide sample of Drosophila melanogaster and D. simulans populations. PLoS One. 2011;6(10):e26318. 10.1371/journal.pone.0026318
  9. Wu M, Sun LV, Vamathevan J, et al.: Phylogenomics of the reproductive parasite Wolbachia pipientis wMel: a streamlined genome overrun by mobile genetic elements. PLoS Biol. 2004;2(3):E69. 10.1371/journal.pbio.0020069
F1000Res. 2015 Feb 18. doi: 10.5256/f1000research.6521.r7634

Referee response for version 1

Ian Dworkin 1

This article primarily summarizes the generation of a large set of resequenced Drosophila strains from three populations (Ghana, France and the US). Sequencing was done both individually for each isofemale strain and in pools for each of the three populations. While the primary goal of this research appears to be to provide the community with these additional genomic resources, the researchers were also particularly interested in examining Wolbachia infection status in the strains. Given that all raw data have been made available, it is likely that this will provide an important and useful resource for genomic analyses.

A few minor comments:

Some comparison of mapping quality for the pooled sequences (as compared to the individual isofemale strains) would have been useful.

Some explanation as to why the number of individuals used for the three different sequencing pools differed would have also been helpful to understand the provenance of the data.

I have read this submission. I believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

F1000Res. 2015 Feb 3. doi: 10.5256/f1000research.6521.r7534

Referee response for version 1

John Pool 1

The authors' data will add value to Drosophila population genomic resources. I see no technical flaws in the manuscript. If the authors see fit, they could add a bit more context to the data. For example, they could note that a mosaic of homozygous and heterozygous regions may be expected from the isofemale line genomes. Optionally, they could also briefly put these three populations in historical context (i.e. that the species originated from sub-Saharan Africa but perhaps not western Africa specifically, that it expanded out of sub-Saharan Africa with a population bottleneck, and that North American populations are thought to have both European and African ancestry). The France and Ghana samples sequenced here may prove useful for identifying population ancestry in North American and other admixed populations.

Trivial edits:

Methods paragraph 1:

“each isofemale lines” (delete final “s”)

References - from title of Lack et al. 2015, delete second “genomes”.  Update precise author information.

I have read this submission. I believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.
