Abstract
Adaptive challenges that humans faced as they expanded across the globe left specific molecular footprints that can be decoded in present-day genomes. Different sets of metrics are used to identify genomic regions that have undergone selection. However, there are fewer methods capable of pinpointing the allele ultimately responsible for this selection. Here, we present PopHumanVar, an interactive online application that is designed to facilitate the exploration and thorough analysis of candidate genomic regions by integrating the functional and population genomics data currently available. PopHumanVar generates useful summary reports of prioritized variants that are putatively causal of recent selective sweeps. It compiles data and graphically represents different layers of information, including natural selection statistics, as well as functional annotations and genealogical estimations of variant age, for biallelic single nucleotide variants (SNVs) of the 1000 Genomes Project phase 3. Specifically, PopHumanVar amasses SNV-based information from GEVA, SnpEFF, GWAS Catalog, ClinVar, RegulomeDB and DisGeNET databases, as well as accurate estimations of iHS, nSL and iSAFE statistics. Notably, PopHumanVar can successfully identify known causal variants of frequently reported candidate selection regions, including EDAR in East-Asians, ACKR1 (DARC) in Africans and LCT/MCM6 in Europeans. PopHumanVar is open and freely available at https://pophumanvar.uab.cat.
Graphical Abstract
PopHumanVar is an interactive online application that is designed to facilitate the exploration and thorough analysis of candidate genomic regions by integrating both functional and population genomics data currently available. PopHumanVar generates useful summary reports of prioritized variants that are putatively causal of recent selective sweeps.
INTRODUCTION
The landscape of variation in human genomes holds the record of our evolutionary history. Despite the numerous attempts to identify selection targets in diverse populations (1–5), or date the time of appearance of an adaptive mutation and trace its spread around the globe (6–11), how, where, and when our genomes underwent adaptation is a subtle issue which is far from being resolved.
One of the results of next-generation sequencing (NGS) technologies is the 1000 Genomes Project (1000GP) (12), an international research effort to generate a catalog of human genetic variation. Years after its completion, it still represents one of the largest public catalogs of human variation and genotype data. Reporting >84 million variants, with 2504 sequenced genomes from 26 populations, it is one of the main references for population genomics in the human species.
With the 1000GP came the possibility of scanning the entire genome for signatures of natural selection, which led to an accumulation of genomic regions believed to have evolved under positive selection (13–18). However, these genome-wide scans present two major constraints: the scarce agreement among studies, and the lack of in-depth characterization of candidate loci. In 2018, we tackled the first constraint by presenting PopHumanScan (19), a genome-wide catalog that brings together 2859 candidate regions under selection resulting from the combination of several metrics that capture selection in a wide range of time scales and selective regimes. Even though PopHumanScan compiled an exhaustive list of candidate regions and cross-referenced them to 268 previous publications, it did not provide tools to facilitate their validation or to perform thorough analyses at the single nucleotide variant (SNV) level.
Integrating the numerous currently available information layers on functional and population genetics metrics can help portray the genomic landscape of a putatively selected region and aid the prioritization of causal genetic variants. These sources range from functional annotations (e.g. associations with phenotypes and diseases, implication in the regulation of gene expression, or predicted functional effects), to selection statistics based on the analysis of genomics data, to genealogical estimations of variant age. As far as we know, even though several SNV-oriented public online databases cover one of the previous aspects (see e.g. snpXplorer (20)), none of them brings functional and evolutionary information together with the main focus of identifying causal variants of selective sweeps.
Here, we present PopHumanVar, an interactive online application that is designed to facilitate the exploration and thorough analysis of candidate genomic regions under selection, generating useful summary reports of prioritized variants that are putatively causal of recent selective sweeps. It compiles and graphically represents selection statistics based on linkage disequilibrium, a comprehensive set of functional annotations, and recent genealogical estimations of variant age for SNVs of the 26 populations of phase 3 of the 1000GP. Specifically, PopHumanVar gathers data either computed or compiled from the following data sources: the Integrated Haplotype Score (iHS) (21), the Number of Segregating sites by Length (nSL) (22), the Integrated Selection of Allele Favored by Evolution (iSAFE) (23), SnpEFF (24), RegulomeDB (25), ClinVar (26), GWAS Catalog (27), DisGeNET (28), and the Genealogical Estimation of Variant Age (GEVA) as obtained from the Human Genome Dating database (or Atlas of Variant Age) (29). As such, PopHumanVar is complementary to our previous genome browser, PopHuman (30), and our database of candidate selection regions, PopHumanScan (19), allowing researchers to focus on particular selective sweeps, pinpoint the corresponding causal variants, and estimate variant age. For populations and/or samples not included in the online application, PopHumanVar allows uploading and analyzing a VCF file with custom data.
The utility of PopHumanVar has been tested on frequently reported candidate genomic regions in genome-wide scans for positive selection in humans, including a region close to the gene EDAR, which is associated with hair follicle thickness and straightness and shovel-shaped incisors in East-Asians (31–34), a region in the gene ACKR1 (DARC), which is associated with resistance to malaria in Africans (35–37), as well as a region close to the genes LCT and MCM6, which is associated with lactase persistence in Europeans (38,39). In all three cases, PopHumanVar is able to identify the causal variant reported in previous studies and accurately estimate the variant age. These promising results illustrate the potential of PopHumanVar for exploring as yet uncharacterized human adaptation signatures, including those compiled in PopHumanScan or those that can be visually identified in PopHuman, as well as any other genomic region of interest.
CONTENTS OF POPHUMANVAR
PopHumanVar brings together evolutionary data, functional annotations, and age information. Evolutionary and age information have been computed on the 26 populations of phase 3 of the 1000GP (12), while functional annotations have been retrieved from publicly available databases (see below). In total, 81.70 M SNVs of the 1000GP have information for one or more of the collected data sources.
Selection statistics and favored mutation rank
All selection statistics were computed on the 26 populations of the phase 3 of the 1000GP, including non-inbred individuals as specified by Gazal et al. (40). We considered autosomal biallelic SNVs that are accessible to sequencing techniques according to the 1000GP pilot accessibility mask (12). We advise taking results for the four admixed-American populations with caution, as these populations have complex recent demographic histories that may mimic some patterns of genetic diversity that PopHumanVar uses to infer selection, and thus results from these populations may be difficult to interpret.
Integrated Haplotype Score (iHS)
Defined by Voight et al. (21), it tracks the decay of haplotype homozygosity for both ancestral and derived haplotypes. It has good power to detect selective sweeps in which the selected allele has reached a moderate frequency (50–80%) (16,21,41). iHS was computed with selscan v1.2.0a and norm v1.2.1a (42). We only considered SNVs with a Minor Allele Frequency (MAF) higher than 0.05 and allowed a maximum gap of 20 kbp between consecutive SNPs when assembling haplotypes. The recombination maps used to interpolate genetic positions (necessary to compute iHS) were the sex-averaged ones from Bhérer et al. (43). We obtained estimates for a total of 12.14 M SNVs (Figure 1). Significance was assessed from the empirical distribution of iHS values in each population separately.
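For reference, the statistic as originally defined by Voight et al. (21) can be written as below. Here iHH_A and iHH_D denote the integrated extended haplotype homozygosity (EHH) around the ancestral and derived alleles of the core SNV, and the expectation and standard deviation are taken over SNVs whose derived allele frequency p matches that of the core SNV. Sign conventions may differ between implementations, so this is a notational aid rather than a description of the exact output of the software used here.

```latex
\[
\mathrm{iHS} =
\frac{\ln\!\left(\dfrac{iHH_A}{iHH_D}\right)
      - \mathrm{E}_p\!\left[\ln\!\left(\dfrac{iHH_A}{iHH_D}\right)\right]}
     {\mathrm{SD}_p\!\left[\ln\!\left(\dfrac{iHH_A}{iHH_D}\right)\right]}
\]
```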
Figure 1.
Summary of the contents of PopHumanVar. Bars represent the number and percentage of single nucleotide variants (SNVs) with information for each dataset.
Number of Segregating sites by Length (nSL)
It is also a haplotype-based statistic. It combines information on the distribution of fragment lengths, defined by pairwise differences, with the distribution of the number of segregating sites between all pairs of chromosomes (22). It is better than iHS at capturing soft sweeps. nSL was also computed with selscan v1.2.0a and norm v1.2.1a. We only considered those SNVs having a MAF higher than 0.05 and a maximum gap of 20 kbp between consecutive SNPs when assembling haplotypes. We obtained estimates for a total of 12.66 M SNVs (Figure 1). As for iHS, significance was assessed from the empirical distribution of nSL values in each population separately.
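The empirical significance used for iHS and nSL can be sketched as follows. This is a minimal illustration and not the PopHumanVar pipeline: the function name `empirical_p_values` is hypothetical, the toy scores are made up, and ranking by absolute value (a two-sided criterion) is an assumption, since the text does not state how extremeness is measured.

```python
from bisect import bisect_left


def empirical_p_values(scores):
    """Return, for each score, the fraction of scores in the same
    population whose absolute value is at least as large.

    Assumption: extremeness is measured on |score| (two-sided).
    """
    magnitudes = sorted(abs(s) for s in scores)
    n = len(magnitudes)
    p_values = []
    for s in scores:
        # Index of the first magnitude >= |s|; every value from there
        # onwards is at least as extreme as s.
        first = bisect_left(magnitudes, abs(s))
        p_values.append((n - first) / n)
    return p_values


# Toy standardized scores for a single population (made-up values).
toy_scores = [0.1, -0.4, 2.9, -3.5, 0.8, 1.2, -0.2, 0.05, -1.1, 0.6]
toy_p = empirical_p_values(toy_scores)
```

Because the distribution is built per population, the same raw score can receive different empirical P-values in different populations.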
Integrated Selection of Allele Favored by Evolution (iSAFE)
It aims to identify the specific variant ultimately responsible for a selective sweep (23). iSAFE exploits coalescent-based signals in the surroundings of a candidate region to rank mutations according to their likelihood of having caused the selective sweep. In order to compute iSAFE genome-wide, we analyzed overlapping sliding windows of 3 Mbp, with a 1 Mbp overlap, all along the autosomal chromosomes (values suggested by the iSAFE authors in their GitHub repository at https://github.com/alek0991/iSAFE). From each window, we kept values for the 1 Mbp middle chunk and discarded values in the shoulders. In order to facilitate the genome-wide approach, we ran iSAFE with default parameters, but ignoring the gaps and increasing the maximum rank parameter up to the window size (MaxRank = window = 300) in order to retrieve values for all SNVs in the window. We obtained iSAFE values for a total of 42.14 M SNVs (Figure 1). Significance was assessed from the empirical distribution of iSAFE values in each population separately.
Functional annotations
SnpEFF
It predicts and annotates the functional effects of genetic variants (24) (e.g. stop gain, splice donor variant, missense variant, intergenic region…), which are classified into four different categories based on their impact (i.e. high, moderate, low or modifier). We ran SnpEFF v.5.0 with default parameters and obtained annotations for 79.56 M SNVs (Figure 1). Affected genes, if any, were also recorded.
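When a variant carries several predicted effects, a single representative one can be chosen by impact category. The sketch below assumes the four categories are ordered high > moderate > low > modifier; the function name `strongest_annotation` and the toy annotations are hypothetical and only illustrate the ordering.

```python
# Impact categories as listed in the text, ordered from strongest to weakest.
IMPACT_ORDER = ["HIGH", "MODERATE", "LOW", "MODIFIER"]


def strongest_annotation(annotations):
    """Pick the (effect, impact) pair with the strongest impact.

    `annotations` is a list of (effect, impact) tuples for one variant.
    Ties keep the first annotation encountered.
    """
    return min(annotations, key=lambda a: IMPACT_ORDER.index(a[1].upper()))


# Toy annotations for a single hypothetical variant.
toy_annotations = [
    ("intergenic_region", "MODIFIER"),
    ("missense_variant", "MODERATE"),
    ("synonymous_variant", "LOW"),
]
best = strongest_annotation(toy_annotations)
```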
RegulomeDB
It predicts and annotates the regulatory potential of intergenic variants (25). Evidence is compiled from GEO (44), ENCODE (45), and the published literature, and it includes known, as well as predicted, regulatory DNA elements, such as regions of DNase hypersensitivity sites, transcription factor binding sites, and promoter regions that have been biochemically characterized to regulate transcription. RegulomeDB scores the regulatory potential of intergenic SNVs based on overlapping supporting information (i.e. 15 different scores, from 1a to 7). We retrieved RegulomeDB v.2.0.3 scores for 12.76 M SNVs (Figure 1).
ClinVar
It is one of the largest catalogs of genetic variants that are clinically associated with diseases, together with supporting evidence (26). It rates variant-disease associations into different categories (e.g. pathogenic, risk factor, presenting drug response, protective, benign…). We retrieved ClinVar (updated on 2021/03/04) annotations for 630 k SNVs (Figure 1).
GWAS Catalog
It is a quality-controlled, manually-curated, literature-derived collection of all published genome-wide association studies (GWAS) assaying at least 100 000 genetic variants (27). We retrieved the number of associations in the GWAS Catalog v1.0.2, as well as the specific traits reported, for 149 k SNVs (Figure 1).
DisGeNET
It is one of the largest publicly available collections of genes and variants associated with human diseases (28). It integrates data from expert-curated repositories, homogeneously annotated with controlled vocabularies and community-driven ontologies. It provides original metrics to assist the prioritization of genotype-phenotype relationships, such as disease specificity, evidence index, or number of Pubmed identifiers. We retrieved DisGeNET v.7.0 annotations for 189 k SNVs (Figure 1).
Age estimation
Human Genome Dating (or Atlas of Variant Age)
It gathers age estimation results for more than 45 M variants in the human genome, computed using the Genealogical Estimation of Variant Age (GEVA) (29). GEVA is a method that exploits coalescent modeling to infer the time to the most recent common ancestor (TMRCA) between individual genomes based on three different clock models, considering: (i) mutation events that occur independently in each lineage and pile up as the ancestral haplotype is passed on over the generations (i.e. mutational clock); (ii) recombination events that shorten the length of the ancestral haplotype, independently in each lineage and across generations (i.e. recombination clock); or (iii) both (i.e. joint clock). We retrieved age estimates, as well as the corresponding quality scores, for all three clock models, from the Atlas of Variant Age database (downloaded on 2021/06/26) for a total of 43.23 M SNVs (Figure 1).
OVERVIEW OF THE POPHUMANVAR INTERFACE
The PopHumanVar interface is divided into four main sections: (i) Stats Visualization represents the main navigation interface and provides several interactive graphs to aid the exploration and prioritization of genomic variants in the region of interest (Figure 2); (ii) Download provides tools to customize batch downloads from the database; (iii) Upload Data allows uploading and analyzing a VCF file with custom data; and (iv) Tutorial describes the database and presents a step-by-step usage example.
Figure 2.
Simplified representation of the PopHumanVar interface. Some representative graphs of each of the five elements of the Stats Visualization section of the database are represented; from left to right, top to bottom: Summary Report, Selection, Favored Mutation, Functional Description and Age Information.
Stats visualization
Stats Visualization unfolds into five subitems, each pointing to a visualization tab with one or more interactive graphs and tables. Note that, while in Stats Visualization, an additional menu (FILTERS MENU) is added to the left-side panel of the application. It allows: (i) choosing a genomic region of interest, either by entering its coordinates (GRCh37/hg19) or by searching a variant rsID, gene symbol or Ensembl identifier; (ii) activating one or more populations for which to display selection statistics and favored mutation ranks; and (iii) setting filters and parameters specific to the different visualization tabs. Changes in the FILTERS MENU will only be applied after clicking the ‘Update’ button at the bottom of the panel.
Selection (iHS & nSL)
iHS and nSL value distributions within and across the genomic region of interest are displayed for each of the selected populations. Significant values are indicated by the golden horizontal lines and in the interactive hover panel, and can be filtered from the FILTERS MENU (default empirical P-value ≤ 0.005, customizable by the user).
Favored mutation (iSAFE)
As above, iSAFE value distributions within and across the genomic region of interest are displayed for each of the selected populations. For simplicity, only scores higher than 0.05 are represented. Significant values are indicated by the golden horizontal lines and in the interactive hover panel, and can be filtered from the FILTERS MENU (default empirical P-value ≤ 0.0001 following Akbari et al. (23), customizable by the user).
Functional description
It includes different representations of the annotations retrieved or computed from SnpEFF, RegulomeDB, ClinVar, GWAS Catalog and DisGeNET. Several filters can be applied from the FILTERS MENU.
Age information
Genealogical estimations of variant age are represented for each SNV across the genomic region of interest. Several filters and parameters can be applied from the FILTERS MENU, including the clock model and the units of age in generations or years. In the graph, size and color represent the quality score of the estimations.
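Switching between the two age units amounts to multiplying by a generation time. The sketch below is only illustrative: the text does not state which generation time the application uses, so the default of 25 years per generation and the function name `generations_to_years` are assumptions.

```python
def generations_to_years(age_generations, years_per_generation=25.0):
    """Convert a variant age from generations to years.

    The default of 25 years per generation is an illustrative assumption,
    not a value taken from the text.
    """
    if years_per_generation <= 0:
        raise ValueError("years_per_generation must be positive")
    return age_generations * years_per_generation


# A variant dated to 1000 generations under the assumed generation time.
example_years = generations_to_years(1000)
```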
Summary report
This section summarizes all the evolutionary and functional information gathered for the genomic region of interest. It comprises: (i) a JBrowse implementation showing gene annotations overlapping the region; (ii) direct links to PopHuman (30) and PopHumanScan (19); (iii) a summary graph including selection statistics and functional data; (iv) a list of the top-20 automatically prioritized, putatively causal variants; and (v) representations of EHH and haplotype furcations around any of the top-20 prioritized SNVs (by default the one with the most extreme iSAFE value; computed with rehh (46)). In the summary graph, iSAFE scores are represented for all SNVs across the genomic region of interest. Color represents the strongest SnpEFF functional effect of each variant, and size represents its combined iHS + nSL value. All the information displayed in this section refers to a specific population, which can be chosen from the right-side menu (DISPLAY OPTIONS). Changes in the DISPLAY OPTIONS menu will be applied after clicking the ‘Refresh’ button at the bottom of the panel.
Download
Download unfolds two subitems: Current Region and Batch Download. Both are used to download PopHumanVar data in tabular files. Current Region is the most customizable option and allows specifying how filters should be applied and which data should be included in the downloaded files. The right-side menu is used to set these parameters. Batch Download is used to make bulk downloads of the whole database contents. Data for a maximum of 50 Mbp (which may be split into several regions) can be retrieved at a time. Alternatively, data for whole chromosomes can be downloaded in compressed tabular files.
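Before requesting a download, the total length of a set of regions can be checked against the 50 Mbp limit. The sketch below is a hedged illustration: the helper names (`parse_region`, `total_length`, `within_batch_limit`) are hypothetical, coordinates are assumed to be 1-based and inclusive, and overlapping regions are counted twice rather than merged.

```python
import re

MAX_BATCH_BP = 50_000_000  # 50 Mbp limit stated in the text

# Accepts 'chr2:109500927-109615828' or '2:109500927-109615828'
# (a hyphen or an en dash may separate the coordinates).
REGION_RE = re.compile(r"^(chr)?(\w+):(\d+)[-–](\d+)$")


def parse_region(region):
    """Parse a region string into (chrom, start, end)."""
    match = REGION_RE.match(region.replace(",", "").strip())
    if match is None:
        raise ValueError(f"unrecognized region: {region!r}")
    chrom = match.group(2)
    start, end = int(match.group(3)), int(match.group(4))
    if start > end:
        raise ValueError(f"start is after end in {region!r}")
    return chrom, start, end


def total_length(regions):
    """Sum region lengths in bp (1-based inclusive; overlaps not merged)."""
    return sum(end - start + 1 for _, start, end in map(parse_region, regions))


def within_batch_limit(regions):
    """True if the summed length does not exceed the 50 Mbp limit."""
    return total_length(regions) <= MAX_BATCH_BP
```

For example, `within_batch_limit(["chr2:109500927-109615828", "chr1:1-1000000"])` checks two regions at once under these assumptions.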
Upload data
This section allows uploading a VCF file with custom data, which may cover up to 2 Mbp of genomic sequence for one single population. The data is processed automatically by the PopHumanVar pipeline and results are sent by email as a dynamic Shiny markdown file.
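A custom VCF can be screened locally before uploading. The sketch below is not the validation performed by PopHumanVar, which the text does not describe; it only checks the stated 2 Mbp limit and counts biallelic SNVs. The function name `check_vcf_lines`, the single-chromosome requirement, and the toy records are assumptions made for illustration.

```python
MAX_UPLOAD_BP = 2_000_000  # 2 Mbp limit stated in the text
BASES = {"A", "C", "G", "T"}


def check_vcf_lines(lines):
    """Summarize VCF records: chromosomes, span in bp, biallelic SNV count."""
    chroms, positions = set(), []
    n_records = n_biallelic_snvs = 0
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank, meta-information and header lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos, ref, alt = fields[0], int(fields[1]), fields[3], fields[4]
        chroms.add(chrom)
        positions.append(pos)
        n_records += 1
        # A biallelic SNV has one single-base REF and one single-base ALT.
        if ref.upper() in BASES and alt.upper() in BASES:
            n_biallelic_snvs += 1
    span = max(positions) - min(positions) + 1 if positions else 0
    return {
        "chromosomes": sorted(chroms),
        "span_bp": span,
        "records": n_records,
        "biallelic_snvs": n_biallelic_snvs,
        # Assumption: a valid upload covers a single chromosome.
        "fits_upload_limit": len(chroms) <= 1 and span <= MAX_UPLOAD_BP,
    }


# Toy records with made-up positions and alleles.
toy_vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "2\t100\t.\tA\tG\t.\tPASS\t.",
    "2\t1500\t.\tA\tG\t.\tPASS\t.",
    "2\t2000\t.\tC\tT,G\t.\tPASS\t.",
]
report = check_vcf_lines(toy_vcf)
```

In practice the lines would come from an open file handle, for instance `check_vcf_lines(open(path))` for an uncompressed VCF.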
Tutorial
This section documents the data used and the procedures implemented in PopHumanVar, and includes a complete tutorial introducing the usage of the database through a step-by-step worked example.
POPHUMANVAR WITH AN EXAMPLE: SELECTION AT THE EDAR LOCUS
Here, we illustrate the usage of PopHumanVar with an example. This section summarizes the main findings, while the Tutorial section of the application contains a step-by-step guide of the same example.
For this case study, we will focus on a genomic region of 115 kbp in chromosome 2 (chr2:109500927–109615828; GRCh37/hg19). The region contains the gene EDAR (Ectodysplasin A Receptor), which encodes a cell-surface receptor that, upon binding to its ligand, induces an intracellular cascade leading to the activation of the transcription factor NF-κB.
EDAR is a well-studied gene involved in the development of hair follicles, teeth, and sweat glands (31–34). It has frequently been reported in genome-wide scans for positive selection in humans (19,47–49) and is one of the candidate regions cataloged in PopHumanScan (Figure 3). PopHumanScan reports signatures of selection for haplotype-based statistics (i.e. iHS and XP-EHH) in East-Asian populations, especially in the Southern Han Chinese (CHS) population. In addition, the region shows extreme values (i.e. more than two standard deviations away from the mean value) both for the haplotype-based statistics iHS and XP-EHH, and the Site Frequency Spectrum (SFS)-based statistics Tajima's D and Fay and Wu's H, as displayed in PopHuman. Although both PopHuman and PopHumanScan bring our attention to this region, neither of them allows us to shift to the SNV level and determine which variant was selected and when.
Figure 3.
Characterization and variant prioritization in the EDAR gene region, which is associated with hair follicle thickness and straightness and shovel-shaped incisors in East-Asians. Complementary information obtained from PopHumanScan (top) and PopHumanVar (bottom) is shown. Color labels at the right of the PopHumanVar section represent, from top to bottom: iSAFE, iHS, nSL (top 0.5%), SnpEff effect, GWAS Catalog hits, and Atlas of Variant Age variant age in generations.
Instead of reporting summary statistics for a genomic region of interest, as PopHuman and PopHumanScan do, the PopHumanVar application presented here reports information at the SNV level and helps prioritize causal variants of selective sweeps. In the EDAR gene region, apart from gathering abundant functional annotations and evolutionary statistics, PopHumanVar prioritizes the protein-coding missense variant rs3827760 (A > G) as the top causal variant (Figure 3), which is the known causal variant of a well-studied selective sweep in East-Asians (34,49). The derived G allele (Val370Ala substitution) is found at high frequency in East-Asian populations (87%), as well as in Native American populations (39%) (31). It was driven to high frequency in East Asia by positive selection more than 10 000 years ago (6,31). In the GWAS Catalog, it is reported to be associated with ear, eyebrow and chin morphology, and male-pattern baldness. The prioritized variant rs3827760 shows the highest iSAFE, iHS and nSL values in the region.
Two additional case studies, that of genes ACKR1 (DARC) in Africans and LCT/MCM6 in Europeans, are shown in the Supplementary Data.
CONCLUSION
The PopHumanVar interactive application presented here was successfully tested, with confirmatory results, on the EDAR gene and two other well-known case studies, which demonstrates its exploratory potential to prioritize variants in regions holding signatures of natural selection. Unlike other SNV-oriented public online databases, PopHumanVar brings functional and evolutionary information together, including natural selection statistics, functional annotations and genealogical estimations of variant age, and goes one step further in the task of identifying and dating the emergence of variants that were putatively causal of the corresponding selective sweeps. In this way, PopHumanVar eases the description and thorough analysis of as yet uncharacterized human adaptation signatures such as those compiled in PopHumanScan or those that can be visually identified in PopHuman. Future versions of PopHumanVar will include a pre-processing module that returns uniform, adequate data from any source of human variation data, so that additional populations not in the 1000GP, or new 1000GP samples, can be easily incorporated into PopHumanVar. All in all, we think that the public release of PopHumanVar will help advance our understanding of how environmental and social challenges have shaped our genomes through the action of natural selection.
IMPLEMENTATION AND AVAILABILITY
PopHumanVar is based on the Shiny framework (50) for development of web-based applications using the R programming environment (51). Interactive plots are implemented with plotly (52). Interactive tables are generated with DT (53), an R-based interface to the JavaScript DataTables library. The genome browser integrated into the Summary Report section is implemented using the JBrowseR package (54). User queries are processed by R and sent to a MariaDB database. All scripts are available in the GitHub repository (https://github.com/ainacolovila/PopHumanVar).
PopHumanVar is served with Apache on a CentOS 7.2 Linux x64 server with 16 Intel Xeon 2.4 GHz processors and 32 GB RAM. All data, tools and support resources provided by the PopHumanVar database are open and freely available at https://pophumanvar.uab.cat. PopHumanVar is accessible and legible on computer, phone and tablet screens.
ACKNOWLEDGEMENTS
The authors would like to thank the Port d’Informació Científica (PIC) of the UAB for providing the informatics infrastructure in which most of the population genomics statistics have been computed, and help on using it. We also thank Esteve Sanz for providing some data management utilities, Laia Carrillo for evaluating the PopHumanVar data on several case regions, and members of the Genomics, Bioinformatics and Evolutionary Biology group for testing the database implementation. Finally, we thank two anonymous referees for very helpful comments on the PopHumanVar implementation and manuscript.
Notes
Present address: Marta Coronado-Zamora, Institute of Evolutionary Biology, CSIC-Universitat Pompeu Fabra, Barcelona 08003, Spain.
Contributor Information
Aina Colomer-Vilaplana, Department of Genetics and Microbiology, Universitat Autònoma de Barcelona, Bellaterra, Barcelona 08193, Spain.
Jesús Murga-Moreno, Department of Genetics and Microbiology, Universitat Autònoma de Barcelona, Bellaterra, Barcelona 08193, Spain; Institute of Biotechnology and Biomedicine, Universitat Autònoma de Barcelona, Bellaterra, Barcelona 08193, Spain.
Aleix Canalda-Baltrons, Department of Genetics and Microbiology, Universitat Autònoma de Barcelona, Bellaterra, Barcelona 08193, Spain.
Clara Inserte, Institute of Biotechnology and Biomedicine, Universitat Autònoma de Barcelona, Bellaterra, Barcelona 08193, Spain.
Daniel Soto, Department of Genetics and Microbiology, Universitat Autònoma de Barcelona, Bellaterra, Barcelona 08193, Spain.
Marta Coronado-Zamora, Department of Genetics and Microbiology, Universitat Autònoma de Barcelona, Bellaterra, Barcelona 08193, Spain; Institute of Biotechnology and Biomedicine, Universitat Autònoma de Barcelona, Bellaterra, Barcelona 08193, Spain.
Antonio Barbadilla, Department of Genetics and Microbiology, Universitat Autònoma de Barcelona, Bellaterra, Barcelona 08193, Spain; Institute of Biotechnology and Biomedicine, Universitat Autònoma de Barcelona, Bellaterra, Barcelona 08193, Spain.
Sònia Casillas, Department of Genetics and Microbiology, Universitat Autònoma de Barcelona, Bellaterra, Barcelona 08193, Spain; Institute of Biotechnology and Biomedicine, Universitat Autònoma de Barcelona, Bellaterra, Barcelona 08193, Spain.
SUPPLEMENTARY DATA
Supplementary Data are available at NAR Online.
FUNDING
Ministerio de Economía y Competitividad (Spain); ERDF funds [CGL2017-89160P to M.S., A.B.]; AGAUR (Generalitat de Catalunya) [2017SGR-1379 to A.R.]; Secretaria d’Universitats i Recerca de la Generalitat de Catalunya and the European Social Fund [2020FI_B-01045 to A.C.-V.]; Departament de Genètica i de Microbiologia (UAB) [PIF to J.M.-M.]. Funding for open access charge: Ministerio de Economía y Competitividad (Spain).
Conflict of interest statement. None declared.
REFERENCES
- 1. Nielsen R., Akey J.M., Jakobsson M., Pritchard J.K., Tishkoff S., Willerslev E.. Tracing the peopling of the world through genomics. Nature. 2017; 541:302–310. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 2. Fan S., Hansen M.E.B., Lo Y., Tishkoff S.A.. Going global by adapting local: a review of recent human adaptation. Science. 2016; 354:54–59. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 3. Hinds D.A., Stuve L.L., Nilsen G.B., Halperin E., Eskin E., Ballinger D.G., Frazer K.A., Cox D.R.. Whole-genome patterns of common DNA variation in three human populations. Science. 2005; 307:1072–1079. [DOI] [PubMed] [Google Scholar]
- 4. Altshuler D., Donnelly P.The International HapMap Consortium . A haplotype map of the human genome. Nature. 2005; 437:1299–1320. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 5. Altshuler D.M., Gibbs R.A., Peltonen L., Altshuler D.M., Gibbs R.A., Peltonen L., Dermitzakis E., Schaffner S.F., Yu F., Peltonen L.et al.. Integrating common and rare genetic variation in diverse human populations. Nature. 2010; 467:52–58. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 6. Smith J., Coop G., Stephens M., Novembre J.. Estimating time to the common ancestor for a beneficial allele. Mol. Biol. Evol. 2018; 35:1003–1017. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 7. Speidel L., Forest M., Shi S., Myers S.R.. A method for genome-wide genealogy estimation for thousands of samples. Nat. Genet. 2019; 51:1321–1329. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 8. Speidel L., Cassidy L., Davies R.W., Hellenthal G., Skoglund P., Myers S.R.. Inferring population histories for ancient genomes using genome-wide genealogies. Mol. Biol. Evol. 2021; 38:3497–3511. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 9. Bergström A., Stringer C., Hajdinjak M., Scerri E.M.L., Skoglund P.. Origins of modern human ancestry. Nature. 2021; 590:229–237. [DOI] [PubMed] [Google Scholar]
- 10. Kelleher J., Wong Y., Wohns A.W., Fadil C., Albers P.K., McVean G.. Inferring whole-genome histories in large population datasets. Nat. Genet. 2019; 51:1330–1338. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 11. Rasmussen M.D., Hubisz M.J., Gronau I., Siepel A.. Genome-wide inference of ancestral recombination graphs. PLoS Genet. 2014; 10:e1004342. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 12. Auton A., Abecasis G.R., Altshuler D.M., Durbin R.M., Abecasis G.R., Bentley D.R., Chakravarti A., Clark A.G., Donnelly P., Eichler E.E. et al. A global reference for human genetic variation. Nature. 2015; 526:68–74.
- 13. Johnson K.E., Voight B.F. Patterns of shared signatures of recent positive selection across human populations. Nat. Ecol. Evol. 2018; 2:713–720.
- 14. Sugden L.A., Atkinson E.G., Fischer A.P., Rong S., Henn B.M., Ramachandran S. Localization of adaptive variants in human genomes using averaged one-dependence estimation. Nat. Commun. 2018; 9:703.
- 15. Sabeti P.C., Schaffner S.F., Fry B., Lohmueller J., Varilly P., Shamovsky O., Palma A., Mikkelsen T.S., Altshuler D., Lander E.S. Positive natural selection in the human lineage. Science. 2006; 312:1614–1620.
- 16. Sabeti P.C., Varilly P., Fry B., Lohmueller J., Hostetter E., Cotsapas C., Xie X., Byrne E.H., McCarroll S.A., Gaudet R. et al. Genome-wide detection and characterization of positive selection in human populations. Nature. 2007; 449:913–918.
- 17. Biswas S., Akey J.M. Genomic insights into positive selection. Trends Genet. 2006; 22:437–446.
- 18. Akey J.M. Constructing genomic maps of positive selection in humans: where do we go from here? Genome Res. 2009; 19:711–722.
- 19. Murga-Moreno J., Coronado-Zamora M., Bodelón A., Barbadilla A., Casillas S. PopHumanScan: the online catalog of human genome adaptation. Nucleic Acids Res. 2019; 47:D1080–D1089.
- 20. Tesi N., van der Lee S., Hulsman M., Holstege H., Reinders M.J.T. snpXplorer: a web application to explore human SNP-associations and annotate SNP-sets. Nucleic Acids Res. 2021; 49:W603–W612.
- 21. Voight B.F., Kudaravalli S., Wen X., Pritchard J.K. A map of recent positive selection in the human genome. PLOS Biol. 2006; 4:e72.
- 22. Ferrer-Admetlla A., Liang M., Korneliussen T., Nielsen R. On detecting incomplete soft or hard selective sweeps using haplotype structure. Mol. Biol. Evol. 2014; 31:1275–1291.
- 23. Akbari A., Vitti J.J., Iranmehr A., Bakhtiari M., Sabeti P.C., Mirarab S., Bafna V. Identifying the favored mutation in a positive selective sweep. Nat. Methods. 2018; 15:279–282.
- 24. Cingolani P., Platts A., Wang L.L., Coon M., Nguyen T., Wang L., Land S.J., Lu X., Ruden D.M. A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff. Fly (Austin). 2012; 6:80–92.
- 25. Boyle A.P., Hong E.L., Hariharan M., Cheng Y., Schaub M.A., Kasowski M., Karczewski K.J., Park J., Hitz B.C., Weng S. et al. Annotation of functional variation in personal genomes using RegulomeDB. Genome Res. 2012; 22:1790–1797.
- 26. Landrum M.J., Lee J.M., Riley G.R., Jang W., Rubinstein W.S., Church D.M., Maglott D.R. ClinVar: public archive of relationships among sequence variation and human phenotype. Nucleic Acids Res. 2014; 42:D980–D985.
- 27. MacArthur J., Bowler E., Cerezo M., Gil L., Hall P., Hastings E., Junkins H., McMahon A., Milano A., Morales J. et al. The new NHGRI-EBI Catalog of published genome-wide association studies (GWAS Catalog). Nucleic Acids Res. 2017; 45:D896–D901.
- 28. Piñero J., Ramírez-Anguita J.M., Saüch-Pitarch J., Ronzano F., Centeno E., Sanz F., Furlong L.I. The DisGeNET knowledge platform for disease genomics: 2019 update. Nucleic Acids Res. 2020; 48:D845–D855.
- 29. Albers P.K., McVean G. Dating genomic variants and shared ancestry in population-scale sequencing data. PLOS Biol. 2020; 18:e3000586.
- 30. Casillas S., Mulet R., Villegas-Mirón P., Hervas S., Sanz E., Velasco D., Bertranpetit J., Laayouni H., Barbadilla A. PopHuman: the human population genomics browser. Nucleic Acids Res. 2018; 46:D1003–D1010.
- 31. Bryk J., Hardouin E., Pugach I., Hughes D., Strotmann R., Stoneking M., Myles S. Positive selection in East Asians for an EDAR allele that enhances NF-κB activation. PLoS One. 2008; 3:e2209.
- 32. Enattah N.S., Sahi T., Savilahti E., Terwilliger J.D., Peltonen L., Järvelä I. Identification of a variant associated with adult-type hypolactasia. Nat. Genet. 2002; 30:233–237.
- 33. Kamberov Y.G., Wang S., Tan J., Gerbault P., Wark A., Tan L., Yang Y., Li S., Tang K., Chen H. et al. Modeling recent human evolution in mice by expression of a selected EDAR variant. Cell. 2013; 152:691–702.
- 34. Park J.-H., Yamaguchi T., Watanabe C., Kawaguchi A., Haneji K., Takeda M., Kim Y.-I., Tomoyasu Y., Watanabe M., Oota H. et al. Effects of an Asian-specific nonsynonymous EDAR variant on multiple dental traits. J. Hum. Genet. 2012; 57:508–514.
- 35. Yin Q., Srivastava K., Gebremedhin A., Makuria A.T., Flegel W.A. Long-range haplotype analysis of the malaria parasite receptor gene ACKR1 in an East-African population. Hum. Genome Var. 2018; 5:26.
- 36. Schmid P., Ravenell K.R., Sheldon S.L., Flegel W.A. DARC alleles and Duffy phenotypes in African Americans. Transfusion (Paris). 2012; 52:1260–1267.
- 37. McManus K.F., Taravella A.M., Henn B.M., Bustamante C.D., Sikora M., Cornejo O.E. Population genetic analysis of the DARC locus (Duffy) reveals adaptation from standing variation associated with malaria resistance in humans. PLOS Genet. 2017; 13:e1006560.
- 38. Tishkoff S.A., Reed F.A., Ranciaro A., Voight B.F., Babbitt C.C., Silverman J.S., Powell K., Mortensen H.M., Hirbo J.B., Osman M. et al. Convergent adaptation of human lactase persistence in Africa and Europe. Nat. Genet. 2007; 39:31–40.
- 39. Ingram C.J.E., Mulcare C.A., Itan Y., Thomas M.G., Swallow D.M. Lactose digestion and the evolutionary genetics of lactase persistence. Hum. Genet. 2009; 124:579–591.
- 40. Gazal S., Sahbatou M., Babron M.-C., Génin E., Leutenegger A.-L. High level of inbreeding in final phase of 1000 Genomes Project. Sci. Rep. 2015; 5:17453.
- 41. Pickrell J.K., Coop G., Novembre J., Kudaravalli S., Li J.Z., Absher D., Srinivasan B.S., Barsh G.S., Myers R.M., Feldman M.W. et al. Signals of recent positive selection in a worldwide sample of human populations. Genome Res. 2009; 19:826–837.
- 42. Szpiech Z.A., Hernandez R.D. selscan: an efficient multithreaded program to perform EHH-based scans for positive selection. Mol. Biol. Evol. 2014; 31:2824–2827.
- 43. Bhérer C., Campbell C.L., Auton A. Refined genetic maps reveal sexual dimorphism in human meiotic recombination at multiple scales. Nat. Commun. 2017; 8:14994.
- 44. Davis S., Meltzer P.S. GEOquery: a bridge between the Gene Expression Omnibus (GEO) and BioConductor. Bioinformatics. 2007; 23:1846–1847.
- 45. The ENCODE Project Consortium. The ENCODE (ENCyclopedia Of DNA Elements) Project. Science. 2004; 306:636–640.
- 46. Klassmann A., Gautier M. Detecting selection using Extended Haplotype Homozygosity-based statistics on unphased or unpolarized data. 2020; Authorea doi:10.22541/au.160405572.29972398/v1, 30 October 2020, preprint: not peer reviewed.
- 47. Savolainen O., Lascoux M., Merilä J. Ecological genomics of local adaptation. Nat. Rev. Genet. 2013; 14:807–820.
- 48. Racimo F., Gokhman D., Fumagalli M., Ko A., Hansen T., Moltke I., Albrechtsen A., Carmel L., Huerta-Sánchez E., Nielsen R. Archaic adaptive introgression in TBX15/WARS2. Mol. Biol. Evol. 2017; 34:509–524.
- 49. Fujimoto A., Kimura R., Ohashi J., Omi K., Yuliwulandari R., Batubara L., Mustofa M.S., Samakkarn U., Settheetham-Ishida W., Ishida T. et al. A scan for genetic determinants of human hair morphology: EDAR is associated with Asian hair thickness. Hum. Mol. Genet. 2008; 17:835–843.
- 50. Chang W., Cheng J., Allaire J.J., Sievert C., Schloerke B., Xie Y., Allen J., McPherson J., Dipert A., Borges B. shiny: Web Application Framework for R. 2021.
- 51. R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. 2020.
- 52. Sievert C. Interactive Web-Based Data Visualization with R, plotly, and shiny. Chapman and Hall/CRC. 2020.
- 53. Xie Y., Cheng J., Tan X. DT: A Wrapper of the JavaScript Library ‘DataTables’. 2021.
- 54. Hershberg E.A., Stevens G., Diesh C., Xie P., De Jesus Martinez T., Buels R., Stein L., Holmes I. JBrowseR: an R interface to the JBrowse 2 genome browser. Bioinformatics. 2021; doi:10.1093/bioinformatics/btab459.