Bioinformatics. 2017 May 4;33(17):2779–2780. doi: 10.1093/bioinformatics/btx301

PopFly: the Drosophila population genomics browser

Sergi Hervas, Esteve Sanz, Sònia Casillas, John E Pool, Antonio Barbadilla
Editor: Oliver Stegle
PMCID: PMC5860067  PMID: 28472360

Abstract

Summary

The recent compilation of over 1100 wild-derived Drosophila melanogaster genome sequences from around the world, reassembled with a standardized pipeline (the Drosophila Genome Nexus, DGN), provides a unique resource for population genomic studies. A visual display of the metrics describing genome-wide patterns of variation and selection would give a global view of, and help explain, the evolutionary forces shaping genome variation.

Availability and implementation

Here, we present PopFly, a population genomics-oriented genome browser based on the JBrowse software that contains a complete inventory of population genomic parameters estimated from DGN data. The browser is designed for the automatic analysis and display of genetic variation data within and between populations along the D. melanogaster genome. PopFly allows the visualization and retrieval of functional annotations, nucleotide diversity estimates, linkage disequilibrium statistics, recombination rates, a battery of neutrality tests and population differentiation parameters, at different window sizes across the euchromatic chromosome arms. PopFly is open and freely available at http://popfly.uab.cat.

1 Introduction

High-throughput sequencing technologies now make it possible to describe genome-wide variation patterns in an ever-growing number of organisms. In recent years, several studies have sequenced dozens and even hundreds of wild-derived samples of Drosophila melanogaster, the model organism for population genetic studies (reviewed by Casillas and Barbadilla, 2017). The Drosophila Genome Nexus (DGN) project has reassembled most published D. melanogaster population genomic data into a set of around 1100 worldwide genome sequences that are directly comparable with one another, which greatly facilitates future population genomic studies in this model species (Lack et al., 2015, 2016).

A major bioinformatics challenge in analyzing large amounts of genomic data is providing easy and intuitive visualization and retrieval of that information. Through their user-friendly graphical interfaces, genome browsers give molecular biologists a unique platform to browse, search, retrieve and analyze these data efficiently and conveniently. PopDrowser, the Population Drosophila Browser (Ràmia et al., 2012), displays population genomic parameters estimated from a single D. melanogaster population (the Drosophila Genetic Reference Panel; Mackay et al., 2012). However, this browser has become outdated in terms of performance and data storage. Here, we present PopFly, a population genomics-oriented web browser that supersedes our previous PopDrowser. PopFly contains a complete inventory of population genomic parameters estimated from the DGN project data, along with functional annotations from the D. melanogaster reference genome sequence. Its user-friendly graphical web interface allows easy visualization and retrieval of the broadest catalog of genome-wide patterns of nucleotide variation and population genetic estimates in D. melanogaster at different resolution scales. Furthermore, the automated data processing pipeline makes the platform highly scalable, so the database can be updated continuously as new genome sequences become available.

2 Browser overview

2.1 Input data

The input data are a set of aligned D. melanogaster genome sequences from the DGN project. At present, the analyzed data comprise more than 960 genome sequences from 30 populations in 18 countries across 5 continents. The genome sequences of Drosophila yakuba and Drosophila simulans are used as outgroups.
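Several of the statistics PopFly reports (Table 1) start from per-site allele counts in the aligned sequences, with an outgroup used to infer the ancestral state. The following is a minimal sketch of that step, not PopFly's VariScan2-based pipeline. It assumes, hypothetically, that each genome is a Python string of equal length, that missing calls are coded as "N", and that a single outgroup string supplies the ancestral base; the function name `site_allele_counts` is illustrative only.

```python
from collections import Counter


def site_allele_counts(alignment, outgroup, missing="N"):
    """Hypothetical helper: per-site allele counts polarized by an outgroup.

    alignment: list of equal-length strings, one per sampled genome.
    outgroup:  string of the same length giving the assumed ancestral base.
    Returns a list of (position, n_called, derived_count) tuples for
    biallelic sites whose ancestral state can be inferred.
    """
    length = len(outgroup)
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all sequences must have the same length")
    results = []
    for pos in range(length):
        # Ignore missing calls; n_called may therefore differ between sites.
        column = [seq[pos] for seq in alignment if seq[pos] != missing]
        counts = Counter(column)
        if len(counts) != 2:
            continue  # monomorphic or multiallelic column
        ancestral = outgroup[pos]
        if ancestral not in counts:
            continue  # outgroup base missing or matching neither allele
        derived = next(base for base in counts if base != ancestral)
        results.append((pos, len(column), counts[derived]))
    return results
```

For example, `site_allele_counts(["ACGTA", "ACGTT", "ATGTA", "ACNTA"], "ACGTA")` returns `[(1, 4, 1), (4, 4, 1)]`. With two outgroups, as here (D. yakuba and D. simulans), one possible design choice is to require both to agree on the ancestral base before keeping a site.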

2.2 Software implementation

PopFly contains a set of precomputed population genomic estimates generated with the programs VariScan2 (Hutter et al., 2006) and LDhelmet (Chan et al., 2012), together with custom ad hoc scripts. Data and summary statistics are displayed graphically along the chromosome arms in a web-based user interface (Fig. 1) built on the JBrowse software (Buels et al., 2016), which considerably improves performance over the previous version of the browser; all data can be easily downloaded as bedGraph, wiggle or GFF3 files. PopFly also includes utilities to perform on-the-fly statistical analyses and to download sequences, and allows users to upload their own custom tracks. The current browser runs under Apache on a CentOS 7.2 Linux x64 server with 16 Intel Xeon 2.4 GHz processors and 32 GB of RAM.
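Since tracks can be exported as bedGraph files, a downloaded track can be summarized offline. The sketch below is an illustration under stated assumptions, not a PopFly utility: it reads standard bedGraph lines (whitespace-separated `chrom start end value`, 0-based half-open coordinates, optional `track`/`browser` header lines) and computes a base-pair-weighted mean over a region. The function names `read_bedgraph` and `region_mean` are hypothetical.

```python
def read_bedgraph(lines):
    """Parse bedGraph lines into (chrom, start, end, value) tuples.

    bedGraph intervals are 0-based and half-open: [start, end).
    Blank lines, 'track'/'browser' header lines and '#' comments are skipped.
    """
    records = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith(("track", "browser", "#")):
            continue
        chrom, start, end, value = line.split()[:4]
        records.append((chrom, int(start), int(end), float(value)))
    return records


def region_mean(records, chrom, start, end):
    """Mean track value over [start, end) on chrom, weighted by overlap in bp.

    Returns None if no record overlaps the region.
    """
    total = 0.0
    covered = 0
    for c, s, e, v in records:
        if c != chrom:
            continue
        overlap = min(e, end) - max(s, start)
        if overlap > 0:
            total += v * overlap
            covered += overlap
    return total / covered if covered else None
```

With two hypothetical 10-kb windows on 2L holding values 0.006 and 0.008, the mean over 2L:5000-15000 is 0.007 (up to floating-point rounding), because each window contributes 5 kb of overlap. Weighting by overlap avoids biasing the mean toward windows that only clip the region's edges.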

Fig. 1. PopFly snapshot with some tracks activated and the utility to download sequences

2.3 Browser tracks

The genome browser includes, for each sampled population and metapopulation (populations aggregated by continent), summary measures of nucleotide diversity, divergence between species, linkage disequilibrium statistics, historical population-scaled recombination rate estimates, a battery of neutrality tests and population differentiation metrics (Table 1), computed in non-overlapping windows of different sizes (1 kb, 10 kb, 50 kb and 100 kb). The browser also contains the D. melanogaster reference genome sequence with its functional annotations (FlyBase version 5.57), and the high-resolution reference recombination maps of Comeron et al. (2012) and Fiston-Lavier et al. (2010).
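To make the windowing concrete, here is a minimal sketch of computing nucleotide diversity (π) per bp in non-overlapping windows from per-site allele counts. It is not PopFly's implementation: the function name `windowed_pi` and its input format `(position, n, j)` (0-based position, n called sequences, j copies of one allele) are assumptions. Per site it uses the standard unbiased estimate 2j(n − j)/(n(n − 1)), and, as a simplification, divides each window's sum by its length in bp; a real pipeline would usually divide by the number of sites that pass data-quality filters.

```python
from collections import defaultdict


def windowed_pi(sites, window_size, chrom_length):
    """Nucleotide diversity per bp in non-overlapping windows (sketch).

    sites: iterable of (position, n, j) with 0-based position, n sampled
    sequences and j copies of one allele at that site.
    Returns (start, end, pi_per_bp) for windows [k*w, (k+1)*w); the last
    window is truncated at chrom_length.
    """
    sums = defaultdict(float)
    for pos, n, j in sites:
        if n < 2:
            continue  # pairwise diversity is undefined with fewer than 2 calls
        # Probability that two distinct sampled sequences differ at this site.
        sums[pos // window_size] += 2.0 * j * (n - j) / (n * (n - 1))
    windows = []
    n_windows = (chrom_length + window_size - 1) // window_size
    for k in range(n_windows):
        start = k * window_size
        end = min(start + window_size, chrom_length)
        windows.append((start, end, sums[k] / (end - start)))
    return windows
```

Calling the same function with `window_size` set to 1_000, 10_000, 50_000 and 100_000 yields one track per resolution, mirroring the window sizes listed above. Because the per-site term is symmetric in j and n − j, it does not matter whether j counts the derived or the ancestral allele.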

Table 1.

PopFly category tracks

Category: annotations and main parameter estimates

Reference tracks: D. melanogaster reference genome (build 5.57) sequence and annotations
Frequency-based nucleotide variation: Watterson's nucleotide diversity (θ), nucleotide diversity (π), number of 0-fold and 4-fold segregating sites (P0f, P4f), 0-fold and 4-fold nucleotide diversity (π0f, π4f)
Divergence-based metrics: nucleotide divergence per bp (k) with D. yakuba and D. simulans, number of 0-fold and 4-fold divergent sites (D0f, D4f), 0-fold and 4-fold divergence (k0f, k4f)
Linkage disequilibrium: LD sites, D, |D|, D′, |D′|, r², number of haplotypes (h), haplotype diversity (Hd)
Recombination: recombination rate estimates from Comeron et al. (2012) and Fiston-Lavier et al. (2010); historical population-scaled recombination rate (ρA = 2Ne·r for autosomes; ρX = (8/3)Ne·r for the X chromosome)
Selection tests based on the SFS and/or variability: Fu and Li's D and F statistics, Tajima's D, Fu's Fs statistic
Selection tests based on polymorphism and divergence: Ka/Ks ratio, neutrality index (NI), direction of selection (DoS), proportion of adaptive substitutions (α) from the McDonald–Kreitman test
Population differentiation: FST estimates between populations
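Several of the neutrality statistics in Table 1 follow standard textbook formulas. As a minimal sketch, not the code used by PopFly or VariScan2, the functions below compute Watterson's θ, Tajima's D, and simple McDonald–Kreitman summaries from window-level counts. Assumptions: π is the mean number of pairwise differences summed over the window (not per bp); 0-fold and 4-fold counts stand in for nonsynonymous and synonymous sites, matching the P0f/P4f and D0f/D4f counts in the table; α is the simple 1 − NI estimator; and DoS = D0f/(D0f + D4f) − P0f/(P0f + P4f). All function names are hypothetical, and PopFly's exact estimators may differ.

```python
import math


def _harmonic_sums(n):
    """a1 = sum(1/i) and a2 = sum(1/i^2) for i = 1 .. n-1."""
    if n < 2:
        raise ValueError("need at least two sequences")
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    return a1, a2


def watterson_theta(S, n):
    """Watterson's estimator: segregating sites S divided by a1."""
    a1, _ = _harmonic_sums(n)
    return S / a1


def tajimas_d(pi, S, n):
    """Tajima's D from total pairwise diversity pi, S segregating sites, n sequences."""
    if S == 0:
        return float("nan")  # undefined without segregating sites
    a1, a2 = _harmonic_sums(n)
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    # Difference between the two theta estimators, scaled by its standard deviation.
    return (pi - S / a1) / math.sqrt(e1 * S + e2 * S * (S - 1))


def mk_summary(p0, p4, d0, d4):
    """NI, alpha and DoS from 0-fold/4-fold polymorphism (p) and divergence (d).

    Raises ZeroDivisionError if any denominator count is zero.
    """
    ni = (p0 / p4) / (d0 / d4)
    alpha = 1.0 - ni
    dos = d0 / (d0 + d4) - p0 / (p0 + p4)
    return ni, alpha, dos
```

For instance, with hypothetical counts P0f = 10, P4f = 20, D0f = 5 and D4f = 5, `mk_summary` gives NI = 0.5, α = 0.5 and DoS ≈ 0.167, all pointing toward an excess of 0-fold divergence. Tajima's D is positive when π exceeds θ and negative when rare variants inflate S.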

Acknowledgements

We would like to thank Miquel Ràmia for his helpful guidance and suggestions. We also thank Josefa González and the JBrowse development community for their valuable comments, which helped improve PopFly.

Funding

This work was supported by the Ministerio de Economía y Competitividad [BFU2013-42649-P to A.B.]; the Generalitat de Catalunya [2014-SGR-1346]; the Departament de Genètica i Microbiologia of the Universitat Autònoma de Barcelona [12a PIPF to S.H.]; the Youth Employment Initiative and European Social Fund [PEJ-2014 to E.S.]; and the National Institutes of Health [R01 GM111797 to J.E.P.].

Conflict of Interest: none declared.

References

  1. Buels R. et al. (2016) JBrowse: a dynamic web platform for genome visualization and analysis. Genome Biol., 17, 66.
  2. Casillas S., Barbadilla A. (2017) Molecular population genetics. Genetics, 205, 1003–1035.
  3. Chan A.H. et al. (2012) Genome-wide fine-scale recombination rate variation in Drosophila melanogaster. PLoS Genet., 8, e1003090.
  4. Comeron J.M. et al. (2012) The many landscapes of recombination in Drosophila melanogaster. PLoS Genet., 8, e1002905.
  5. Fiston-Lavier A.-S. et al. (2010) Drosophila melanogaster recombination rate calculator. Gene, 463, 18–20.
  6. Hutter S. et al. (2006) Genome-wide DNA polymorphism analyses using VariScan. BMC Bioinform., 7, 409.
  7. Lack J.B. et al. (2016) A thousand fly genomes: an expanded Drosophila Genome Nexus. Mol. Biol. Evol., 33, 3308–3313.
  8. Lack J.B. et al. (2015) The Drosophila Genome Nexus: a population genomic resource of 623 Drosophila melanogaster genomes, including 197 from a single ancestral range population. Genetics, 199, 1229–1241.
  9. Mackay T.F.C. et al. (2012) The Drosophila melanogaster Genetic Reference Panel. Nature, 482, 173–178.
  10. Ràmia M. et al. (2012) PopDrowser: the Population Drosophila Browser. Bioinformatics, 28, 595–596.

Articles from Bioinformatics are provided here courtesy of Oxford University Press
