Skip to main content
Nucleic Acids Research logoLink to Nucleic Acids Research
. 2013 Nov 3;42(Database issue):D910–D916. doi: 10.1093/nar/gkt1052

dbPSHP: a database of recent positive selection across human populations

Mulin Jun Li 1,2, Lily Yan Wang 1,2, Zhengyuan Xia 2,3, Maria P Wong 2,4, Pak Chung Sham 5,6,7, Junwen Wang 1,2,7,*
PMCID: PMC3965004  PMID: 24194603

Abstract

The dbPSHP database (http://jjwanglab.org/dbpshp) aims to help researchers to efficiently identify, validate and visualize putative positively selected loci in human evolution and further discover the mechanism governing these natural selections. Recent evolution of human populations at the genomic level reflects the adaptations to the living environments, including climate change and availability and stability of nutrients. Many genetic regions under positive selection have been identified, which assist us to understand how natural selection has shaped population differences. Here, we manually collect recent positive selections in different human populations, consisting of 15 472 loci from 132 publications. We further compiled a database that used 15 statistical terms of different evolutionary attributes for single nucleotide variant sites from the HapMap 3 and 1000 Genomes Project to identify putative regions under positive selection. These attributes include variant allele/genotype properties, variant heterozygosity, within population diversity, long-range haplotypes, pairwise population differentiation and evolutionary conservation. We also provide interactive pages for visualization and annotation of different selective signals. The database is freely available to the public and will be frequently updated.

INTRODUCTION

Natural selection plays a crucial role in the evolution of species, where random mutations are undergoing positive, purifying or balancing selection (1) for adaptation to the living environments including climate change, availability and stability of nutrients, introduction of novel disease agents, dispersed niche, etc. Recent evolutionary adaptations in the human lineage have been reflected by many population-specific traits such as pigmentation, malaria resistance and lactose tolerance (2–4). Many genetic regions of human genome under positive selection have been successfully identified, which assist us in understanding how natural selection has shaped population differences (5). Signatures of selection can be detected by observing the underlying patterns of DNA polymorphisms in one or different populations, which will facilitate the identification of positively selected genes or loci that are associated with specific function, trait or disease (6,7).

Statistical methods and tools have been successfully developed to detect genome-wide selective signals based on genetic data of human populations. Given one population, positive or negative selection tends to skew the allele frequencies comparing with neutral model. Statistics such as Tajima’s D (8) and Fay and Wu’s H (9) can detect a locus’s departures from neutrality and underlying selection. Linkage information can also be used to infer the selection signals. Besides, strong selection signal can also be discovered by searching a long-range haplotype. Extended haplotype homozygosity (EHH) (10) and integrated haplotype score (iHS) (11) have been used to capture these loci based on the length of haplotypes associated with a given allele. Recently, several new programs, such as HaploPS and SweeD, have been developed to efficiently search the regions on the genome carrying positive selection signals with higher sensitivity and specificity (12,13). Positive selection can also be identified by tracking the increment of identity-by-descent among individuals in a population (14,15). Moreover, large allele frequency differences between populations can be measured by fixation index (FST) (16) at each single nucleotide polymorphism (SNP) locus in the genome. Researchers also developed a tool, cross-population extended haplotype homozygosity test (XP-EHH), to detect ongoing or nearly fixed selective sweeps by comparing haplotypes from two populations (17). The cross-population composite likelihood ratio test (XP-CLR) scans multi-locus allele frequency differentiation between two populations to detect selective sweeps in analogy to EHH (18). Last, rejected substitution is adopted in genomic evolutionary rate profiling (19) to assess the strength of the selected elements on single nucleotide level.

The causal mutations for population adaptation have been proved to locate in many functional loci on the human genome. For different human populations, studies have shown that environmental changes, such as diet, climate and infectious disease, have caused advantageous rapid amino acid evolutions and consequently affect protein functions (20). Analysis has also been performed to identify a number of positively selected synonymous variants affecting the translation efficiency (21). Recently, researchers revealed that local adaptations have a higher chance to affect gene expression than amino acid sequence by studying selective signals between gene expression-associated SNPs and nonsynonymous SNPs (22). Until now, over hundreds of function-associated regions/genes have been reportedly undergoing positive selection from different human populations by inferring population genetic data. However, it is a tedious and time-consuming process of curation if researchers want to retrieve the selection information of their regions of interest or traits from literature. By far, little resources are available for users to search for known selective regions and their associated function effects.

However, the selective signals detected by aforementioned statistical methods are not always consistent in terms of the degree of derived allele frequency, which is usually varied by different datasets. To accurately identify true positive selection and the causal mutation, we need to combine different statistical values. A composite of multiple signals method has been proposed to combine five selective signals with satisfactory power (23). Some resources such as SNP@Ethnos (24), Haplotter (11), SNP@Evolution (25) and dbCLINE (26) have also provided respective selection signals for some populations in early HapMap dataset. However, more supporting signals are needed for explicit elaboration, and more world-wide populations should be investigated based on larger sample size. The recent International HapMap Project and 1000 Genomes Project have produced high quality genotyping data in a large sample size of different human populations, which enable us to systematically detect natural selection signals in a genome wide scale (27,28). Therefore, a comprehensive, easy-to-use and up-to-date resource focusing on recent human positive selection is urgently required.

Here we developed a database dbPSHP, a user friendly web portal on recent positive selection across human populations. We first manually collected 15 472 recent positive selections and related information in different human populations from literature. We further compiled a database that contains 15 calculated statistical signals for SNP sites from the HapMap 3 and 1000 Genomes Projects, which focus on variant allele/genotype properties, variant heterozygosity, within population diversity, long-range haplotypes, pairwise population differentiation and evolutionary conservation. We also provided interactive pages for visualization and annotation of different selective signals.

DATABASE DESIGN AND CONTENT

dbPSHP provides a manually curated dataset of positively selected loci of human populations from literature. It also consists of a variety of important attributes associated with recent human selection for one or pairwise populations under a consistent framework. The selection signals are evaluated on several aspects including ancestral and derived allele, allele frequency, genotype frequency, Hardy–Weinberg equilibrium (HWE), heterozygosity, nucleotide diversity, Tajima’s D, iHH, iHS, derived allele frequency difference (ΔDAF), fixation index (FST), XP-EHH, XP-CLR, neutral rate, and rejected substitution (Table 1). Furthermore, dbPSHP has been designed as a knowledge base and web service that offers a rapid search and interactive interface for the users.

Table 1.

The scope and calculated scores in the dbPSHP database

Attribute Evaluation term Abbreviation
Variant genotype properties Derived allele DA
Ancestral allele AA
Allele frequency DAF
AAF
Genotype frequency GFHOM1
GFHET
GFHOM2
Hardy–Weinberg equilibrium HWE1
HWE2
Variant heterozygosity Heterozygosity HET
Nucleotide diversity PI
Within population diversity Tajima’s D TD
Long-range haplotypes Integrated extended haplotype homozygosity IHH
Integrated haplotype score UIHS
IHS
Differentiation between populations Difference of derived allele frequency DDAF
DDAF_POP1_POP2
Fixation index FST1
FST1_POP1_POP2
FST2
FST2_POP1_POP2
Cross-population extended haplotype homozygosity UXPEHH
XPEHH_POP1_POP2
Cross-population composite likelihood ratio XPCLR
XPCLR_POP1_POP2
Evolutionary conservation Neutral rate NR
Rejected substitution RS

DAF is the allele frequency for the derived allele; AAF is the allele frequency for the ancestral allele; GFHOM1 is the genotype frequency for homozygous derived allele AA; GFHET is the genotype frequency for heterozygous Aa; GFHOM2 is the genotype frequency for homozygous ancestral allele aa; HWE1 is the value of simple chi square goodness-of-fit test; HWE2 is the P-value of exact test; FST1 is the FST of Wright’s approximate formula; FST2 is the FST of Cockerham & Weir estimator; UIHS is the unstandardized integrated haplotype score; UXPEHH is the unstandardized cross-population extended haplotype homozygosity; POP1_POP2 represents the pairwise scores of two specific populations (Supplementary Methods).

We started with data collection from the publications attempting to study positively selected loci/genes related to specific functions/traits/diseases of human populations during recent human evolution. We manually searched these publications through PubMed and occasional collection of some specific reports by natural selection related keywords (details in Supplementary Methods). The current version of dbPSHP contains 15 472 manually collected loci/genes under positive selection from 132 publications. Among them, 101 publications attempt to study the specific adaptive traits, and 31 publications detect the genome-wide selective signals with different statistical methods.

We then processed the genetic data of different populations using the International HapMap phase 3 and the 1000 Genomes Project phase 1 (details in Supplementary Methods). We pre-computed statistical scores in different categories that mainly include variant allele/genotype frequency, variant heterozygosity, within population diversity, long-range haplotypes, pairwise population differentiation and evolutionary conservation (Supplementary Table S3 and Supplementary Methods).

There are different criteria to determine whether the investigated loci have been undergoing positive selection. High frequency of derived allele, deviations from HWE, reduced heterozygosity, negative Tajima's D, high FST value and relatively higher iHS more or less indicate the selective signals. To facilitate the identification of true signals, we designed a filtering function by a set of defined score cutoff, which have been frequently used as empirical estimation of positive selection in current evolution studies. We further generated a list of putatively causal mutations for each population using these hard filtering (Supplementary Methods).

EVALUATION

To evaluate the reliability and accuracy of the statistical scores in dbPSHP, we first used two well-known cases under strong positive selection in specific population. Lactose tolerance has been previously identified as the positive selection in a large fraction of individuals of European descent after domestication of cattle, which genetically caused by a mutation in the lactase gene (LCT) (4). We validated the statistical scores for all of genetic variants in the LCT gene and nearby 500 kb genetic hitchhiking region in the CEU population. We found this positively selected region is significantly supported by all critical signals of most genetic variants in both HapMap 3 and 1000 Genomes Project dataset, including highly deviated derived allele frequency (ΔDAF), distinguished iHS and high FST, XP-EHH and XP-CLR values compared with other populations (Supplementary Figures S1 and S2). Further, we used another well studied gene, SLC24A5, related to the selection of lighter pigmentation between Europeans and West Africans (29). We checked the selective scores along the SLC24A5 and neighbouring selective sweep and we found, for CEU population of both HapMap 3 and 1000 Genomes Project dataset, there are increased signals of derived allele frequency and other indicators, especially in the downstream of SLC24A5 gene (Supplementary Figures S3 and S4).

Furthermore, we measured the overall reliability of pre-calculated scores in dbPSHP by comparing the score distribution between reported selective region and background. We collected 997 CEU loci, 574 YRI loci and 516 CHB loci from our curated positive selection list. We then extracted all genetic variants within these regions from both HapMap 3 and 1000 Genomes Project dataset. We constructed background genetic variants by randomly selecting the same number of genomic regions. We performed Mann–Whitney U test, for FST, |iHS|, |XP-EHH| and XP-CLR, to examine whether the selective scores in curated regions (regarded as under positive selection) are significantly larger than those in the background. We finally observed significant differences for almost all cases in different populations and the SNP dataset (Supplementary Table S4). The experiment further confirmed the usability of dbPSHP as a useful resource in the studies of recent human evolution.

Although there are some resources, such as SNP@Ethnos, Haplotter and SNP@Evolution, that selectively calculate particular selection scores in some populations using a different version of the HapMap dataset, it can hardly satisfy the immediate requirements of human evolutionary biology and population genetics. Even for the frequently used dataset CMS (23), it only provides five statistical scores (iHS, XP-EHH, ΔiHH, ΔDAF and FST) on limited population, as well as a simple query interface. Comparing with these resources, dbPSHP systematically curates reported function-related regions/genes under recent positive selection in the human populations from literature. It also constructs a database integrating up to 15 statistical terms for positive selection by a large number of populations, latest human genetic dataset and interactive user interfaces, which allows detecting a different level of positive selections and facilitates better hypothesis generation (Supplementary Table S5).

USAGE

dbPSHP website accepts three input formats including dbSNP ID, genomic locus and RefGene name. dbSNP ID will be converted to dbSNP 137 according to the SNP track history RsMergeArch. Genomic locus can be either a site (e.g. chr2:136575199) or a region (e.g. chr5:33944721-33984780). Both gene official symbol and Refseq accession number are supported as queries. For sanity visualization, the system will extend 50 kb surrounding regions if a user inputs a signal site. Users can also select the SNP data set (HapMap3 or 1000 Genomes Project) and investigated population in the input page. Also, a user can filter and sort the selection scores under different combination of empirical cutoffs.

dbPSHP uses a series of user-friendly interfaces to display the results, which not only efficiently present the query result but also facilitate the knowledge findings. The top left panel of the result page consists of three tabs. It first provides a scatter plot drawing the distribution of selective scores in the query region. A user can switch among different attributes by changing the select box. In this function, dbPSHP only returns the loci containing the selected attribute. The chart can be clicked, zoomed and is highly interactive with summary table below (Figure 1a). Besides, dbPSHP uses Google Map KML to generate the allele frequency map for all populations of selected SNP data set on a global Google Map, which provides an intuitive view for the allele distribution worldwide. The current population will be highlighted by a red outline (Figure 1b). A user can click each pie chart to get detailed information about the population in this map. dbPSHP also customizes related tracks in the UCSC Genome Browser, and a user can check it in the last internal tab of this panel.

Figure 1.

Figure 1.

The main functional units of dbPSHP interface. (a) The interactive chart for the scatter plot of different statistical scores, which depicts the iHS distribution of genetic hitchhiking region surrounding the LCT gene in the CEU population. (b) The worldwide allele frequency map of a genetic variant rs10188066 and selected population is marked with a red outline. Derived allele frequency is marked with blue color and ancestral allele frequency is marked with red color in each pie chart. (c) The summary table of important statistical terms for selected variant. (d) The three tabs records detailed information about selected variant including variant attributes, selective scores, literature evidence, mapped gene, GWAS information, cross population selective signals and comprehensive variant annotations from the external browser. (e) The searchable table collected literature-based positive selections in the human population.

Below the abovementioned panel, dbPSHP offers a summary table that extracts some important attributes for selected variants. Each row on the table can be clicked and is interactive with the above scatter plot (Figure 1c). The right panel in the result page has three tabs that show detailed information about a selected variant. The ‘dbPSHP Information’ tab lists the important attributes related to positive selection and reports the information of a published selective region as well as previous GWAS results recorded in GWASdb (30) (Figure 1d). ‘Cross Population’ tab records cross-population scores between queried population and each of the other populations by several statistical measurements including ΔDAF, FST, XP-EHH and XP-CLR. To facilitate the identification of driver mutation in the investigated genetic hitchhiking region, a particular tab ‘Variant Annotation’ connects current variants to a comprehensive annotation browser SNVrap (31).

To benefit from efficient storage and simplify querying from the client environment, we encapsulated all selective attributes into a VCF INFO field and created an indexed VCF compressed file for each population using Tabix (32). Users can extract information by vcftools (33) for further process. dbPSHP also hosts a FTP server which contains compressed files and curation data for downloading. Because the full database is relatively large, we further provided RESTful style of Web Services for instant retrieving of interested regions by different interfaces.

dbPSHP hosts a repository with collected literature-based loci with positively selected signals as well as their effects (Figure 1e). Users can query the records by text-free description such as ‘rs16891982’, ‘Pigmentation’, ‘LCT’ and ‘chr6:148734174-149732519’. Besides, dbPSHP also accepts the submission of newly discovered positive selections, which will be added into dbPSHP after double checking.

DISSCUSSION

dbPSHP is a database that systematically collects reported function-related regions/genes under recent positive selection in the human population. Our manually curated database will be frequently updated. dbPSHP further compiles a comprehensive resource that uses 15 evolutionary/statistical terms for the world-wide populations from the HapMap 3 and 1000 Genomes Project. Users can conveniently retrieve the information in either website or client by flexible queries. A set of visualization pages provides extensive views for intuitive identification of different selective signals. We believe this resource will help researchers efficiently identify, visualize and validate putative positively selected loci, as well as the causal mutation, in human evolution, and to further discover the mechanism behind these natural selections.

The statistical scores used in the database have been widely used to efficiently identify the genetic signatures of natural selection and accelerate follow-up downstream functional study. The imprint of evolutionary selection on ENCODE regulatory elements have been substantially studied, and many positive or negative selection regions are found to be functionally relevant (34). As the genome-wide association studies (GWAS) and the emerging whole genome sequencing studies (WGS) are discovering a huge number of disease associated genetic variants, future studies will be focused on the functional validation of these genetic variants, where human evolution is an essential part. Systematic evaluation of the selection attributes of associated genetic variants detected by GWAS may facilitate the finding of true causal loci for complex traits of specific population (35). Many traits/diseases associated-SNPs (30) expressed population-specific alleles as a result of different natural selection patterns across the population by polygenic adaptation (36–38). Using the evolutionary spectrum based on SNPs data and comprehensive genomic data, researchers have successfully identified many locally adapted genes or loci under environmental selection (39–41).

In addition, tracking the natural selection between human and other species can also promote functional implications of positively selected loci. With high-coverage genome data, researchers successfully identified lots of orthologous genes under positive selection across mammalian or primate genomes (42,43). Apart from genes, many other genomic elements have also been revealed under positive evolutional selection according to inter-species investigation, which include transcription factor binding sites (44), enhancers (45), non-coding DNAs (46) and transposable element-derived fragments (47). These results can efficiently benefit the functional interpretation of shared genomic elements driven by similar adaptive forces between species. Also, it will greatly facilitate the finding of genomic loci, which are selected uniquely during recent human evolution.

It is noticeable that there are many strategies to detect the true selective outliers from the background. For example, the normal range of FST lies between 0–1, but negative values may indicate sampling error, which should be excluded in the following procedure. Traditionally, the empirical FST P-value can be obtained by fitting to genome wide empirical distributions of FST, which are generated from SNPs data. To eliminate the false positive loci from genome scans when using FST, a researcher proposed a hierarchical island model comparing with a simple island model (48). Besides, simulated DNA sequence can also be used to generate neutral distributions to test the probability of a FST without ascertainment biases (49). Another widely used approach is to identify the candidates of selection regions from iHS. It is suggested that raw iHS need to be binned by defined genetic distance first and the variant with derived allele frequency <5% should be removed. Then, a sliding window of 50 SNPs is applied to compute the percentage of SNPs with |iHS| >2. The same strategy is also usually adopted in the processing of XP-EHH. Therefore, many raw statistical values in our database should be rightly fitted to the desired context when distinguishing true signals from noises. Some factors could also influence the sensitivity and specificity of positive selection detection methods. For example, genetic drift can drive a derived allele to fixation, which should be distinguished from selection. Ratnakumar et al. proposed that genes identified as targets of positive selection had a significant tendency to exhibit the genomic signature of GC-biased gene conversion (50). We also identified that the nucleotide substitutions ratio (W->S/S->W) in recent selection dataset of three populations was significantly elevated than that in all genes (Supplementary Methods). Recently, a study showed that pervasive genetic hitchhiking drives the simultaneous emergence of mutational cohorts in yeast (51), and the loss-of-function mutations can contribute to the adaptation of bacteria by rewiring a regulatory or a metabolic network (52). These findings also pointed out new strategies to track the positive selection signals in human populations.

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online, including [53–58].

FUNDING

Research Grants Council [781511M] of Hong Kong and NSFC [91229105] of China. Funding for open access charge: NSFC [91229105] of China.

Conflict of interest statement. None declared.

REFERENCES

  • 1.Hurst LD. Fundamental concepts in genetics: genetics and the understanding of selection. Nat. Rev. Genet. 2009;10:83–93. doi: 10.1038/nrg2506. [DOI] [PubMed] [Google Scholar]
  • 2.Sturm RA. Molecular genetics of human pigmentation diversity. Hum. Mol. Genet. 2009;18:R9–R17. doi: 10.1093/hmg/ddp003. [DOI] [PubMed] [Google Scholar]
  • 3.Kwiatkowski DP. How malaria has affected the human genome and what human genetics can teach us about malaria. Am. J. Hum. Genet. 2005;77:171–192. doi: 10.1086/432519. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 4.Bersaglieri T, Sabeti PC, Patterson N, Vanderploeg T, Schaffner SF, Drake JA, Rhodes M, Reich DE, Hirschhorn JN. Genetic signatures of strong recent positive selection at the lactase gene. Am. J. Hum. Genet. 2004;74:1111–1120. doi: 10.1086/421051. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 5.Barreiro LB, Laval G, Quach H, Patin E, Quintana-Murci L. Natural selection has driven population differentiation in modern humans. Nat. Genet. 2008;40:340–345. doi: 10.1038/ng.78. [DOI] [PubMed] [Google Scholar]
  • 6.Sabeti PC, Schaffner SF, Fry B, Lohmueller J, Varilly P, Shamovsky O, Palma A, Mikkelsen TS, Altshuler D, Lander ES. Positive natural selection in the human lineage. Science. 2006;312:1614–1620. doi: 10.1126/science.1124309. [DOI] [PubMed] [Google Scholar]
  • 7.Akey JM. Constructing genomic maps of positive selection in humans: where do we go from here? Genome Res. 2009;19:711–722. doi: 10.1101/gr.086652.108. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 8.Tajima F. Statistical method for testing the neutral mutation hypothesis by DNA polymorphism. Genetics. 1989;123:585–595. doi: 10.1093/genetics/123.3.585. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 9.Fay JC, Wu CI. Hitchhiking under positive Darwinian selection. Genetics. 2000;155:1405–1413. doi: 10.1093/genetics/155.3.1405. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 10.Sabeti PC, Reich DE, Higgins JM, Levine HZ, Richter DJ, Schaffner SF, Gabriel SB, Platko JV, Patterson NJ, McDonald GJ, et al. Detecting recent positive selection in the human genome from haplotype structure. Nature. 2002;419:832–837. doi: 10.1038/nature01140. [DOI] [PubMed] [Google Scholar]
  • 11.Voight BF, Kudaravalli S, Wen X, Pritchard JK. A map of recent positive selection in the human genome. PLoS Biol. 2006;4:e72. doi: 10.1371/journal.pbio.0040072. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 12.Liu X, Ong RT, Pillai EN, Elzein AM, Small KS, Clark TG, Kwiatkowski DP, Teo YY. Detecting and characterizing genomic signatures of positive selection in global populations. Am. J. Hum. Genet. 2013;192:866–881. doi: 10.1016/j.ajhg.2013.04.021. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 13.Pavlidis P, Zivkovic D, Stamatakis A, Alachiotis N. SweeD: likelihood-based detection of selective sweeps in thousands of genomes. Mol. Biol. Evol. 2013;30:2224–2234. doi: 10.1093/molbev/mst112. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 14.Albrechtsen A, Moltke I, Nielsen R. Natural selection and the distribution of identity-by-descent in the human genome. Genetics. 2010;186:295–308. doi: 10.1534/genetics.110.113977. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 15.Han L, Abney M. Using identity by descent estimation with dense genotype data to detect positive selection. Eur. J. Hum. Genet. 2013;21:205–211. doi: 10.1038/ejhg.2012.148. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 16.Weir BS, Hill WG. Estimating F-statistics. Annu. Rev. Genet. 2002;36:721–750. doi: 10.1146/annurev.genet.36.050802.093940. [DOI] [PubMed] [Google Scholar]
  • 17.Sabeti PC, Varilly P, Fry B, Lohmueller J, Hostetter E, Cotsapas C, Xie X, Byrne EH, McCarroll SA, Gaudet R, et al. Genome-wide detection and characterization of positive selection in human populations. Nature. 2007;449:913–918. doi: 10.1038/nature06250. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 18.Chen H, Patterson N, Reich D. Population differentiation as a test for selective sweeps. Genome Res. 2010;20:393–402. doi: 10.1101/gr.100545.109. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 19.Cooper GM, Stone EA, Asimenos G, Green ED, Batzoglou S, Sidow A. Distribution and intensity of constraint in mammalian genomic sequence. Genome Res. 2005;15:901–913. doi: 10.1101/gr.3577405. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 20.Bustamante CD, Fledel-Alon A, Williamson S, Nielsen R, Hubisz MT, Glanowski S, Tanenbaum DM, White TJ, Sninsky JJ, Hernandez RD, et al. Natural selection on protein-coding genes in the human genome. Nature. 2005;437:1153–1157. doi: 10.1038/nature04240. [DOI] [PubMed] [Google Scholar]
  • 21.Waldman YY, Tuller T, Keinan A, Ruppin E. Selection for translation efficiency on synonymous polymorphisms in recent human evolution. Genome Biol. Evol. 2011;3:749–761. doi: 10.1093/gbe/evr076. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 22.Fraser HB. Gene expression drives local adaptation in humans. Genome Res. 2013;23:1089–1096. doi: 10.1101/gr.152710.112. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 23.Grossman SR, Shlyakhter I, Karlsson EK, Byrne EH, Morales S, Frieden G, Hostetter E, Angelino E, Garber M, Zuk O, et al. A composite of multiple signals distinguishes causal variants in regions of positive selection. Science. 2010;327:883–886. doi: 10.1126/science.1183863. [DOI] [PubMed] [Google Scholar]
  • 24.Park J, Hwang S, Lee YS, Kim SC, Lee D. SNP@Ethnos: a database of ethnically variant single-nucleotide polymorphisms. Nucleic Acids Res. 2007;35:D711–D715. doi: 10.1093/nar/gkl962. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 25.Cheng F, Chen W, Richards E, Deng L, Zeng C. SNP@Evolution: a hierarchical database of positive selection on the human genome. BMC Evol. Biol. 2009;9:221. doi: 10.1186/1471-2148-9-221. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 26.Hancock AM, Witonsky DB, Alkorta-Aranburu G, Beall CM, Gebremedhin A, Sukernik R, Utermann G, Pritchard JK, Coop G, Di Rienzo A. Adaptations to climate-mediated selective pressures in humans. PLoS Genet. 2011;7:e1001375. doi: 10.1371/journal.pgen.1001375. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 27.Altshuler DM, Gibbs RA, Peltonen L, Dermitzakis E, Schaffner SF, Yu FL, Bonnen PE, de Bakker PIW, Deloukas P, Gabriel SB, et al. Integrating common and rare genetic variation in diverse human populations. Nature. 2010;467:52–58. doi: 10.1038/nature09298. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 28.Abecasis GR, Auton A, Brooks LD, DePristo MA, Durbin RM, Handsaker RE, Kang HM, Marth GT, McVean GA. An integrated map of genetic variation from 1092 human genomes. Nature. 2012;491:56–65. doi: 10.1038/nature11632. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 29.Lamason RL, Mohideen MA, Mest JR, Wong AC, Norton HL, Aros MC, Jurynec MJ, Mao X, Humphreville VR, Humbert JE, et al. SLC24A5, a putative cation exchanger, affects pigmentation in zebrafish and humans. Science. 2005;310:1782–1786. doi: 10.1126/science.1116238. [DOI] [PubMed] [Google Scholar]
  • 30.Li MJ, Wang P, Liu X, Lim EL, Wang Z, Yeager M, Wong MP, Sham PC, Chanock SJ, Wang J. GWASdb: a database for human genetic variants identified by genome-wide association studies. Nucleic Acids Res. 2012;40:D1047–D1054. doi: 10.1093/nar/gkr1182. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 31.Li MJ, Sham PC, Wang J. Genetic variant representation, annotation and prioritization in the post-GWAS era. Cell Res. 2012;22:1505–1508. doi: 10.1038/cr.2012.106. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 32.Li H. Tabix: fast retrieval of sequence features from generic TAB-delimited files. Bioinformatics. 2011;27:718–719. doi: 10.1093/bioinformatics/btq671. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 33.Danecek P, Auton A, Abecasis G, Albers CA, Banks E, DePristo MA, Handsaker RE, Lunter G, Marth GT, Sherry ST, et al. The variant call format and VCFtools. Bioinformatics. 2011;27:2156–2158. doi: 10.1093/bioinformatics/btr330. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 34.Dunham I, Kundaje A, Aldred SF, Collins PJ, Davis C, Doyle F, Epstein CB, Frietze S, Harrow J, Kaul R, et al. An integrated encyclopedia of DNA elements in the human genome. Nature. 2012;489:57–74. doi: 10.1038/nature11247. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 35.Li MJ, Wang LY, Xia Z, Sham PC, Wang J. GWAS3D: detecting human regulatory variants by integrative analysis of genome-wide associations, chromosome interactions and histone modifications. Nucleic Acids Res. 2013;41:W150–W158. doi: 10.1093/nar/gkt456. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 36.Casto AM, Feldman MW. Genome-wide association study snps in the human genome diversity project populations: does selection affect unlinked snps with shared trait associations? PLoS Genet. 2011;7:e1001226. doi: 10.1371/journal.pgen.1001266. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 37.Turchin MC, Chiang CW, Palmer CD, Sankararaman S, Reich D, Hirschhorn JN. Evidence of widespread selection on standing variation in Europe at height-associated SNPs. Nat. Genet. 2012;44:1015–1019. doi: 10.1038/ng.2368. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 38.Daub JT, Hofer T, Cutivet E, Dupanloup I, Quintana-Murci L, Robinson-Rechavi M, Excoffier L. Evidence for polygenic adaptation to pathogens in the human genome. Mol. Biol. Evol. 2013;30:1544–1558. doi: 10.1093/molbev/mst080. [DOI] [PubMed] [Google Scholar]
  • 39.Li JJ, Liu Y, Xin XF, Kim TS, Cabeza EA, Ren J, Nielsen R, Wrana JL, Zhang ZL. Evidence for positive selection on a number of microrna regulatory interactions during recent human evolution. PLoS Genet. 2012;8:1002578. doi: 10.1371/journal.pgen.1002578. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 40.Grossman SR, Andersen KG, Shlyakhter I, Tabrizi S, Winnicki S, Yen A, Park DJ, Griesemer D, Karlsson EK, Wong SH, et al. Identifying recent adaptations in large-scale genomic data. Cell. 2013;152:703–713. doi: 10.1016/j.cell.2013.01.035. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 41.Raj T, Kuchroo M, Replogle JM, Raychaudhuri S, Stranger BE, De Jager PL. Common risk alleles for inflammatory diseases are targets of recent positive selection. Am. J. Hum. Genet. 2013;92:517–529. doi: 10.1016/j.ajhg.2013.03.001. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 42.Kosiol C, Vinar T, da Fonseca RR, Hubisz MJ, Bustamante CD, Nielsen R, Siepel A. Patterns of positive selection in six mammalian genomes. PLoS Genet. 2008;4:e1000144. doi: 10.1371/journal.pgen.1000144. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 43.Enard D, Depaulis F, Roest Crollius H. Human and non-human primate genomes share hotspots of positive selection. PLoS Genet. 2010;6:e1000840. doi: 10.1371/journal.pgen.1000840. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 44.Arbiza L, Gronau I, Aksoy BA, Hubisz MJ, Gulko B, Keinan A, Siepel A. Genome-wide inference of natural selection on human transcription factor binding sites. Nat. Genet. 2013;45:723–729. doi: 10.1038/ng.2658. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 45.Smith JD, McManus KF, Fraser HB. A novel test for selection on cis-regulatory elements reveals positive and negative selection acting on mammalian transcriptional enhancers. Mol Biol Evol. 2013;30:2509–2518. doi: 10.1093/molbev/mst134. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 46.Pollard KS, Salama SR, King B, Kern AD, Dreszer T, Katzman S, Siepel A, Pedersen JS, Bejerano G, Baertsch R, et al. Forces shaping the fastest evolving regions in the human genome. PLoS Genet. 2006;2:e168. doi: 10.1371/journal.pgen.0020168. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 47.Gotea V, Petrykowska HM, Elnitski L. Bidirectional promoters as important drivers for the emergence of species-specific transcripts. PLoS One. 2013;8:e57323. doi: 10.1371/journal.pone.0057323. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 48.Excoffier L, Hofer T, Foll M. Detecting loci under selection in a hierarchically structured population. Heredity. 2009;103:285–298. doi: 10.1038/hdy.2009.74. [DOI] [PubMed] [Google Scholar]
  • 49.Ramirez-Soriano A, Calafell F. FABSIM: a software for generating FST distributions with various ascertainment biases. Bioinformatics. 2008;24:2790–2791. doi: 10.1093/bioinformatics/btn531. [DOI] [PubMed] [Google Scholar]
  • 50.Ratnakumar A, Mousset S, Glemin S, Berglund J, Galtier N, Duret L, Webster MT. Detecting positive selection within genomes: the problem of biased gene conversion. Philos. Trans. R Soc. Lond. B Biol. Sci. 2010;365:2571–2580. doi: 10.1098/rstb.2010.0007. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 51.Lang GI, Rice DP, Hickman MJ, Sodergren E, Weinstock GM, Botstein D, Desai MM. Pervasive genetic hitchhiking and clonal interference in forty evolving yeast populations. Nature. 2013;500:571–574. doi: 10.1038/nature12344. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 52.Hottes AK, Freddolino PL, Khare A, Donnell ZN, Liu JC, Tavazoie S. Bacterial adaptation through loss of function. PLoS Genet. 2013;9:e1003617. doi: 10.1371/journal.pgen.1003617. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 53.Wigginton JE, Cutler DJ, Abecasis GR. A note on exact tests of Hardy-Weinberg equilibrium. Am. J. Hum. Genet. 2005;76:887–893. doi: 10.1086/429864. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 54.Nei M, Li WH. Mathematical model for studying genetic variation in terms of restriction endonucleases. Proc. Natl Acad. Sci. USA. 1979;76:5269–5273. doi: 10.1073/pnas.76.10.5269. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 55.Carlson CS, Thomas DJ, Eberle MA, Swanson JE, Livingston RJ, Rieder MJ, Nickerson DA. Genomic regions exhibiting positive selection identified from dense genotype data. Genome Res. 2005;15:1553–1565. doi: 10.1101/gr.4326505. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 56.Wright S. Genetical structure of populations. Nature. 1950;166:247–249. doi: 10.1038/166247a0. [DOI] [PubMed] [Google Scholar]
  • 57.Weir BS, Cockerham CC. Estimating F-statistics for the analysis of population-structure. Evolution. 1984;38:1358–1370. doi: 10.1111/j.1558-5646.1984.tb05657.x. [DOI] [PubMed] [Google Scholar]
  • 58.Davydov EV, Goode DL, Sirota M, Cooper GM, Sidow A, Batzoglou S. Identifying a high fraction of the human genome to be under selective constraint using gerp plus. Plos Comput. Biol. 2010;6:1001025. doi: 10.1371/journal.pcbi.1001025. [DOI] [PMC free article] [PubMed] [Google Scholar]

Articles from Nucleic Acids Research are provided here courtesy of Oxford University Press

RESOURCES