Abstract
Summary
The recent compilation of over 1100 worldwide wild-derived Drosophila melanogaster genome sequences, reassembled using a standardized pipeline, provides a unique resource for population genomic studies (Drosophila Genome Nexus, DGN). A visual display of the estimated metrics describing genome-wide variation and selection patterns would give a global view of the evolutionary forces shaping genome variation.
Availability and implementation
Here, we present PopFly, a population genomics-oriented genome browser, based on JBrowse software, that contains a complete inventory of population genomic parameters estimated from DGN data. This browser is designed for the automatic analysis and display of genetic variation data within and between populations along the D. melanogaster genome. PopFly allows the visualization and retrieval of functional annotations, estimates of nucleotide diversity metrics, linkage disequilibrium statistics, recombination rates, a battery of neutrality tests, and population differentiation parameters at different window sizes across the euchromatic chromosomes. PopFly is open and freely available at http://popfly.uab.cat.
1 Introduction
High-throughput sequencing technologies are allowing the description of genome-wide variation patterns in an ever-growing number of organisms. In recent years, several studies have analyzed dozens or even hundreds of wild-derived samples of Drosophila melanogaster, the model organism for population genetic studies (reviewed by Casillas and Barbadilla, 2017). The Drosophila Genome Nexus (DGN) project has reassembled most published D. melanogaster population genomic data, creating a set of around 1100 worldwide genome sequences that are directly comparable with one another, which greatly facilitates future population genomic studies in this model species (Lack et al., 2015, 2016).
A major bioinformatics challenge when analyzing large amounts of genomic data is how to visualize and retrieve the information easily and intuitively. Genome browsers provide a platform for molecular biologists to browse, search, retrieve and analyze genomic data efficiently through a user-friendly graphical interface. PopDrowser, the Population Drosophila Browser (Ràmia et al., 2012), displays population genomic parameters estimated from a single population of D. melanogaster (the Drosophila Genetic Reference Panel, Mackay et al., 2012). However, this browser has become outdated in terms of performance and data storage. Here, we present PopFly, a population genomics-oriented genome browser that updates our previous PopDrowser. PopFly contains a complete inventory of population genomic parameters estimated from the DGN project data, along with functional annotations from the reference D. melanogaster genome sequence. Its user-friendly graphical web interface allows easy visualization and retrieval of the broadest catalog of genome-wide patterns of nucleotide variation and population genetics estimates in D. melanogaster at different resolution scales. Furthermore, the automated data processing pipeline makes the platform highly scalable, so the database can be updated continuously as new genome sequences become available.
2 Browser overview
2.1 Input data
The input data are a set of aligned D. melanogaster genome sequences from the DGN project. At present, the analyzed data comprise more than 960 genome sequences from 30 populations in 18 countries across 5 continents. The genome sequences of Drosophila yakuba and Drosophila simulans are used as outgroup species.
2.2 Software implementation
PopFly contains a set of precomputed population genomic estimates generated by combining the programs VariScan2 (Hutter et al., 2006) and LDhelmet (Chan et al., 2012) with custom scripts. Data and summary statistics are displayed graphically along the chromosome arms in a web-based user interface (Fig. 1) built on the JBrowse software (Buels et al., 2016), which considerably improves performance over the previous version of the browser. All tracks can be downloaded as bedGraph, wiggle or GFF3 files. PopFly also incorporates utilities to perform on-the-fly statistical analyses and to download sequences, and it allows users to upload custom tracks. The current browser implementation runs under Apache on a CentOS 7.2 Linux x64 server with 16 Intel Xeon 2.4 GHz processors and 32 GB of RAM.
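As an illustration of how a downloaded track might be used offline, the following minimal sketch reads bedGraph records (chromosome, 0-based start, exclusive end, value) and averages a statistic over a region of interest. The track name, the example values and the helper names `parse_bedgraph` and `mean_in_region` are hypothetical and are not part of PopFly.

```python
from typing import Iterable, Iterator, NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int    # 0-based, inclusive (bedGraph convention)
    end: int      # 0-based, exclusive
    value: float


def parse_bedgraph(lines: Iterable[str]) -> Iterator[Interval]:
    """Yield one Interval per data line, skipping header and comment lines."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith(("track", "browser", "#")):
            continue
        chrom, start, end, value = line.split()[:4]
        yield Interval(chrom, int(start), int(end), float(value))


def mean_in_region(intervals, chrom, region_start, region_end):
    """Length-weighted mean of the values overlapping [region_start, region_end)."""
    total = 0.0
    covered = 0
    for iv in intervals:
        if iv.chrom != chrom:
            continue
        overlap = min(iv.end, region_end) - max(iv.start, region_start)
        if overlap > 0:
            total += iv.value * overlap
            covered += overlap
    return total / covered if covered else float("nan")


# Hypothetical 10 kb track; the values are made up for illustration only.
example = [
    'track type=bedGraph name="pi_10kb_hypothetical"',
    "2L\t0\t10000\t0.004",
    "2L\t10000\t20000\t0.006",
    "2L\t20000\t30000\t0.005",
]

records = list(parse_bedgraph(example))
region_mean = mean_in_region(records, "2L", 5000, 20000)
print(region_mean)
```

Weighting by overlap length keeps the average meaningful when the region boundaries do not coincide with window boundaries.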
Fig. 1. PopFly snapshot with certain activated tracks and the utility to download sequences
2.3 Browser tracks
The genome browser includes, for each sampled population and metapopulation (populations aggregated by continent): summary measures of nucleotide diversity, divergence between species, linkage disequilibrium statistics, historical population-scaled recombination rate estimates, a battery of neutrality tests and population differentiation metrics (Table 1), computed in non-overlapping windows of four sizes (1, 10, 50 and 100 kb). The browser also contains the D. melanogaster genome reference sequence along with its functional annotations (version 5.57 from FlyBase), and the high-resolution reference recombination maps from Comeron et al. (2012) and Fiston-Lavier et al. (2010).
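To make the windowed estimates concrete, the sketch below computes per-site nucleotide diversity (π) and Watterson's θ in non-overlapping windows from biallelic SNP allele counts, using the standard textbook estimators. It is not the VariScan2 implementation: it assumes complete data for all n sequences and that every site in a window is callable, and the function names and input layout are hypothetical.

```python
from collections import defaultdict


def harmonic(n):
    """a_n = sum_{i=1}^{n-1} 1/i, the denominator of Watterson's estimator."""
    return sum(1.0 / i for i in range(1, n))


def windowed_diversity(snps, n, window_size, seq_length):
    """Per-site pi and Watterson's theta in non-overlapping windows.

    snps: iterable of (position, alt_allele_count), positions 0-based.
    n: number of sampled sequences (assumed identical at every site).
    Returns a list of (start, end, pi, theta_w), one tuple per window.
    """
    a_n = harmonic(n)
    pi_sum = defaultdict(float)
    seg_sites = defaultdict(int)
    for pos, k in snps:
        if 0 < k < n:  # ignore sites that are not segregating in the sample
            w = pos // window_size
            # Expected pairwise differences contributed by one biallelic site
            pi_sum[w] += 2.0 * k * (n - k) / (n * (n - 1))
            seg_sites[w] += 1
    results = []
    for start in range(0, seq_length, window_size):
        end = min(start + window_size, seq_length)
        w = start // window_size
        length = end - start
        results.append((start, end, pi_sum[w] / length, seg_sites[w] / a_n / length))
    return results


# Toy input: three SNPs in a 2 kb region sampled in 4 sequences.
toy_snps = [(100, 1), (500, 2), (1500, 3)]
for row in windowed_diversity(toy_snps, n=4, window_size=1000, seq_length=2000):
    print(row)
```

With real data, the denominator of each window would be the number of sites that pass quality and missing-data filters rather than the full window length.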
Table 1. PopFly category tracks
| Category | Annotations and main parameter estimates |
|---|---|
| Reference tracks | D. melanogaster reference genome (build 5.57) sequence and annotations |
| Frequency-based nucleotide variation | Watterson’s nucleotide diversity (θ), nucleotide diversity (π), number of 0-fold and 4-fold segregating sites (P0f, P4f), 0-fold and 4-fold nucleotide diversity (π0f, π4f) |
| Divergence-based metrics | Nucleotide divergence per bp (k) with D. yakuba and D. simulans, number of 0-fold and 4-fold divergent sites (D0f, D4f), and 0-fold and 4-fold divergence (k0f, k4f) |
| Linkage disequilibrium | LD sites, D, \|D\|, D’, \|D’\|, r², number of haplotypes (h), haplotype diversity (Hd) |
| Recombination | Recombination rate estimates from Comeron et al. (2012) and Fiston-Lavier et al. (2010), historical population-scaled recombination rate (ρA = 2Ner; ρX = 8/3 Ner) |
| Selection tests based on SFS and/or variability | Fu & Li D and F test statistics, Tajima’s D, Fu’s Fs statistic |
| Selection tests based on polymorphism and divergence | Ka/Ks ratio, neutrality index (NI), direction of selection (DoS), proportion of adaptive substitutions (α) from McDonald-Kreitman test |
| Population differentiation | FST estimates between populations |
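
The polymorphism-and-divergence statistics in Table 1 can be related to one another with a short sketch. It uses the standard definitions of the neutrality index, the direction of selection and α, taking 0-fold sites as the selected class and 4-fold sites as the neutral class. Whether PopFly applies additional corrections (for example, frequency filters on polymorphisms) is not stated here, so the code should be read as an assumption-laden illustration; the function name `mk_statistics` is hypothetical.

```python
def mk_statistics(p0f, p4f, d0f, d4f):
    """McDonald-Kreitman-derived summaries from four site counts.

    p0f, p4f: segregating sites at 0-fold and 4-fold degenerate positions.
    d0f, d4f: divergent sites at 0-fold and 4-fold degenerate positions.
    Returns a dict with NI, alpha and DoS; undefined values are NaN.
    """
    nan = float("nan")
    # Neutrality index: (P0f / P4f) / (D0f / D4f)
    ni = (p0f * d4f) / (p4f * d0f) if p4f > 0 and d0f > 0 else nan
    # Proportion of adaptive substitutions: alpha = 1 - NI (NaN propagates)
    alpha = 1.0 - ni
    # Direction of selection: D0f / (D0f + D4f) - P0f / (P0f + P4f)
    if (d0f + d4f) > 0 and (p0f + p4f) > 0:
        dos = d0f / (d0f + d4f) - p0f / (p0f + p4f)
    else:
        dos = nan
    return {"NI": ni, "alpha": alpha, "DoS": dos}


# Made-up counts for a single window or gene.
stats = mk_statistics(p0f=20, p4f=40, d0f=30, d4f=30)
print(stats)
```

NI below 1 (equivalently, positive α and positive DoS) indicates an excess of 0-fold divergence relative to polymorphism, the pattern expected under adaptive evolution; DoS remains defined in sparse windows where NI is not.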
Acknowledgements
We would like to thank Miquel Ràmia for his helpful guidance and suggestions. We also thank Josefa González and the JBrowse developer community for their valuable comments, which helped improve PopFly.
Funding
This work was supported by the Ministerio de Economía y Competitividad [BFU2013-42649-P to A.B.]; the Generalitat de Catalunya [2014-SGR-1346]; the Departament de Genètica i Microbiologia of the Universitat Autònoma de Barcelona [12a PIPF to S.H.]; the Youth Employment Initiative and European Social Fund [PEJ-2014 to E.S.]; and the National Institutes of Health [R01 GM111797 to J.E.P.].
Conflict of Interest: none declared.
References
- Buels R. et al. (2016) JBrowse: a dynamic web platform for genome visualization and analysis. Genome Biol., 17, 66.
- Casillas S., Barbadilla A. (2017) Molecular population genetics. Genetics, 205, 1003–1035.
- Chan A.H. et al. (2012) Genome-wide fine-scale recombination rate variation in Drosophila melanogaster. PLoS Genet., 8, e1003090.
- Comeron J.M. et al. (2012) The many landscapes of recombination in Drosophila melanogaster. PLoS Genet., 8, e1002905.
- Fiston-Lavier A.-S. et al. (2010) Drosophila melanogaster recombination rate calculator. Gene, 463, 18–20.
- Hutter S. et al. (2006) Genome-wide DNA polymorphism analyses using VariScan. BMC Bioinform., 7, 409.
- Lack J.B. et al. (2016) A thousand fly genomes: an expanded Drosophila Genome Nexus. Mol. Biol. Evol., 33, 3308–3313.
- Lack J.B. et al. (2015) The Drosophila Genome Nexus: a population genomic resource of 623 Drosophila melanogaster genomes, including 197 from a single ancestral range population. Genetics, 199, 1229–1241.
- Mackay T.F.C. et al. (2012) The Drosophila melanogaster Genetic Reference Panel. Nature, 482, 173–178.
- Ràmia M. et al. (2012) PopDrowser: the Population Drosophila Browser. Bioinformatics, 28, 595–596.
