Molecular Biology and Evolution
2024 Feb 20;41(3):msae034. doi: 10.1093/molbev/msae034

Adaptive Selection of Cis-regulatory Elements in the Han Chinese

Shuai Liu, Huaxia Luo, Peng Zhang, Yanyan Li, Di Hao, Sijia Zhang, Tingrui Song, Tao Xu, Shunmin He
Editor: Weiwei Zhai
PMCID: PMC10917166  PMID: 38377343

Abstract

Cis-regulatory elements have an important role in human adaptation to the living environment. However, the lag in population genomic cohort studies and epigenomic studies hinders the adaptive analysis of cis-regulatory elements in human populations. In this study, we collected 4,013 unrelated individuals and performed a comprehensive analysis of adaptive selection of genome-wide cis-regulatory elements in the Han Chinese. In total, 12.34% of genomic regions are under the influence of adaptive selection, where 1.00% of enhancers and 2.06% of promoters are under positive selection, and 0.06% of enhancers and 0.02% of promoters are under balancing selection. Gene ontology enrichment analysis of these cis-regulatory elements under adaptive selection reveals that many positive selection signals in the Han Chinese occur in pathways involved in cell–cell adhesion processes, and many balancing selection signals are related to immune processes. Two classes of adaptive cis-regulatory elements related to cell adhesion were analyzed in depth. The first is the adaptive enhancers derived from neanderthal introgression, which lead to lower hyaluronidase levels in the skin and confer better resistance to UV radiation on the Han Chinese. The second is the cis-regulatory elements regulating wound healing, and the results suggest that positive selection inhibits coagulation and promotes angiogenesis and wound healing in the Han Chinese. Finally, we found that many pathogenic alleles, such as risk alleles for type 2 diabetes or schizophrenia, remain in the population due to the hitchhiking effect of positive selection. Our findings will help deepen our understanding of the adaptive evolution of genome regulation in the Han Chinese.

Keywords: adaptation, enhancer, promoter, archaic introgression, UV radiation, wound healing

Introduction

Adaptive selection is one of the major contributors to the development and differentiation of the genetic background affecting human phenotypes and disease susceptibility across ethnic populations (Chun and Fay 2011; Rees et al. 2020; Benton et al. 2021). For instance, the major allele of TRPM8 helps Northern Europeans adapt to cold temperatures but simultaneously increases the genetic risk of migraine (Benton et al. 2021), and adaptive alcohol metabolism in East Asians increases the risk of cancers (Chang et al. 2023). Therefore, studying adaptive selection in population genomes can help us understand the evolutionary direction of phenotypes and disease susceptibility.

Han Chinese are the largest ethnic group in the world and the most widely distributed and genetically representative group of people in East Asia. East Asia, where the Han Chinese live, has a geographical, climatic, historical, and cultural environment very different from those of Africa and Europe, which has shaped the genetic specificity of the Han Chinese. The immune-related HLA and IGH genes and the alcohol metabolism-related ADH1B and ALDH genes have been found to play important roles in the recent adaptation of the Han Chinese (Chiang et al. 2018; Cong et al. 2022; Luo et al. 2023). However, it remains unclear to what extent the genome is subject to adaptive selection. The NyuWa Genome Project, which provides whole-genome sequencing data for a sufficiently large Han Chinese cohort, gives us the opportunity to capture genome-wide selection signals as accurately as possible and to characterize the adaptive landscape of Han Chinese genomes. Furthermore, although noncoding elements are now recognized to matter for phenotypic development as much as protein-coding genes do (Gingeras 2007; Fraser 2013; Spielmann and Mundlos 2016; Chatterjee and Ahituv 2017; Gallagher and Chen-Plotkin 2018), it remains largely unknown how many cis-regulatory elements (CREs) are involved in the adaptive evolution of the Han Chinese, and how. Because genetic determinants of complex traits have small effects and the consequences of mutations in noncoding regions are uncertain (Pasaniuc and Price 2017), the roles of many noncoding variants in human adaptation remain difficult to decipher. Benefiting from well-annotated CREs based on accumulated epigenomic data (The ENCODE Project Consortium 2012; Kundaje et al. 2015; Stunnenberg et al. 2016; Boix et al. 2021), many genomic CREs can now be interpreted to elucidate their contributions to human phenotypic adaptation.

There are many approaches to detecting adaptive selection in population genomes. Positive selection and balancing selection are two distinct types of adaptive selection distinguished by their effects on allele frequencies (Wu et al. 2017). The former is directional: it increases the population frequency of favored alleles and tends to reduce genetic diversity near the favored locus through the selective sweep process (Smith and Haigh 1974). The latter maintains genetic diversity in the population, and the direction of phenotypic evolution fluctuates depending on the relative advantages of the genotypes (Charlesworth and Willis 2009). As supplementary table S1, Supplementary Material online shows, many methods can identify selective sweeps using different strategies and emphases. iHS (Voight et al. 2006), nSL (Ferrer-Admetlla et al. 2014), and XPEHH (Sabeti et al. 2007) can be used to detect very recent hard sweeps (i.e. sweeps of new mutations), while SDS (Field et al. 2016) and iHH12 (Garud et al. 2015) also capture more soft sweeps (i.e. sweeps of standing variants). For earlier selective sweeps, XPCLR (Chen et al. 2010), Tajima's D (Tajima 1989), and Fst (Meirmans and Hedrick 2011) can be applied. Finally, Tajima's D and the Beta statistic (Siewert and Voight 2017) are often used to detect balancing selection in the population.
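As a concrete illustration of one of the SFS-based statistics named above, the sketch below computes Tajima's D from a toy set of phased haplotypes using the standard Tajima (1989) normalization. It is a minimal sketch, assuming biallelic sites encoded as '0'/'1' strings with no missing data; the function name `tajimas_d` and the toy inputs are hypothetical, and this is not the pipeline used in this study.

```python
from itertools import combinations
from math import sqrt


def tajimas_d(haplotypes):
    """Tajima's D for equal-length '0'/'1' haplotype strings (hypothetical helper).

    Returns float('nan') when there are no segregating sites (D is undefined).
    """
    n = len(haplotypes)
    if n < 4:
        raise ValueError("this sketch requires at least 4 haplotypes")
    length = len(haplotypes[0])
    # S: number of segregating sites (columns with more than one allele).
    s = sum(1 for j in range(length) if len({h[j] for h in haplotypes}) > 1)
    if s == 0:
        return float("nan")
    # pi: mean number of pairwise differences between haplotypes.
    diffs = [sum(a != b for a, b in zip(h1, h2))
             for h1, h2 in combinations(haplotypes, 2)]
    pi = sum(diffs) / len(diffs)
    # Tajima's normalizing constants.
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    # D compares pi with Watterson's estimator S / a1.
    return (pi - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))
```

Under this normalization, a negative D reflects an excess of rare variants (as expected after a selective sweep or a population expansion), whereas a positive D reflects an excess of intermediate-frequency variants (as expected under balancing selection or a bottleneck).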

In this study, taking advantage of large-scale deep whole-genome sequencing data (∼28×) of Han Chinese, we scanned the Han Chinese genomes for adaptive selection and focused on the phenotypic evolution of CREs driven by adaptive selection. Using nine methods (supplementary table S1, Supplementary Material online) to identify adaptive selection, we estimated the approximate proportion of genomic regions under adaptive selection in the Han Chinese and determined which CREs, and which biological pathways, are under positive selection and balancing selection. Next, we analyzed the highly archaic-introgressed CREs in the Han Chinese and explained their potential mechanisms of adaptation. Furthermore, we deduced the potential evolutionary tendency of the wound-healing-related CREs, a prominent category among the pathways enriched for adaptive selection. Finally, we evaluated the impact of those adaptive variants on nearby disease risk alleles due to hitchhiking effects. Our work describes a fine-scale landscape of adaptive selection across Han Chinese genomes, provides additional insights into phenotypic evolution driven by adaptive selection of gene regulation, and offers an analytical framework that may also inspire the analysis and evaluation of noncoding mutations involved in adaptation.

Results

Population Genetic Signatures of the Han Chinese

Han Chinese are the largest ethnic group in the world, and their underlying population structure needs to be understood to accurately detect adaptive selection across the human genome. In this project, 4,013 unrelated Han Chinese genome samples were used for population genetic analysis; almost all of these samples are from the part of China east of the Tengchong–Heihe line (supplementary fig. S1, Supplementary Material online), which is mainly populated by Han Chinese people. Consistent with previous conclusions (Chiang et al. 2018; Zhang et al. 2021; Cong et al. 2022), our population structure analyses reveal obvious genetic differentiation between northern and southern Han Chinese (Fig. 1a and b), both in the site frequency spectrum (SFS) (supplementary fig. S2A, Supplementary Material online) and in extended haplotype homozygosity (EHH) (supplementary fig. S2B, Supplementary Material online) features. By inferring the joint demography of northern and southern Han Chinese, we found an extreme population bottleneck in ancient Han Chinese about 30,000 yr ago (Fig. 1c). This is consistent with the bottlenecks reported by other studies in East Asian populations during the Last Glacial Period (Terhorst et al. 2017; Okada et al. 2018; Cong et al. 2022; Luo et al. 2023), which resulted in East Asian populations showing the lowest genetic diversity (supplementary fig. S3, Supplementary Material online).
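The SFS referred to above is simply a histogram of how many sites carry the derived allele in 1, 2, ..., n − 1 of the n sampled chromosomes. The sketch below builds an unfolded SFS from toy haplotypes. It assumes phased biallelic sites encoded as '0'/'1' strings in which '1' marks the derived allele (correct ancestral polarization is an assumption); the helper name `unfolded_sfs` is hypothetical and is not the tool used in this study.

```python
from collections import Counter


def unfolded_sfs(haplotypes):
    """Unfolded site frequency spectrum (hypothetical helper).

    Entry i - 1 of the result counts sites where exactly i of the n
    haplotypes carry the derived allele '1', for i = 1..n-1.
    Monomorphic sites (count 0 or n) are skipped.
    """
    n = len(haplotypes)
    length = len(haplotypes[0])
    # Derived-allele count at each site.
    counts = Counter(sum(h[j] == "1" for h in haplotypes) for j in range(length))
    return [counts.get(i, 0) for i in range(1, n)]
```

Differences in the shape of this spectrum between subpopulations (for example, a larger share of rare variants in one group) are one of the signals that population structure analyses can pick up.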

Fig. 1.

Genetic structure and population history of the Han Chinese. a) The composition of the ADMIXTURE population structure of the East Asian population. The three coloured components represent the three inferred ancestral genetic components. Here, the blue area is proposed as ancestral component 1, the red area is ancestral component 2, and the gold area is ancestral component 3. “KHV”, “CDX”, “CHS”, “CHB”, and “JPT” refer to the Vietnamese, Chinese Dai, southern Han Chinese, northern Han Chinese, and Japanese in 1KGP, while “South Chinese” and “North Chinese” refer to the southern Han Chinese and the northern Han Chinese in the NyuWa cohort, respectively. “Chinese” refers to samples from other regions of China in the NyuWa cohort that are not divided into southern Han Chinese or northern Han Chinese or whose geographic location information is unknown. b) Principal component 1 (PC1) and principal component 2 (PC2) of the PCA results of genome-wide simple variants (SNPs and InDels) in the Han Chinese. The orange dots and turquoise dots in the figure represent the northern Han Chinese and the southern Han Chinese in the NyuWa cohort, respectively. c) Inference of the historical effective population size of the Han Chinese. The orange-red line and green line in the figure represent the northern Han Chinese and the southern Han Chinese in the NyuWa cohort, respectively.

Population structure is known to interfere with the identification of adaptive selection, so we evaluated the impact of sample size and population structure on detecting adaptive selection with designed experiments. In six replicate tests using iHS and Tajima's D as representatives of the EHH-based and SFS-based methods, respectively, we found that the statistical power of both the EHH-based method and the SFS-based method is, in general, negatively correlated with sample size (supplementary fig. S4, Supplementary Material online), and that population structure in the samples significantly decreases their statistical power to detect adaptive selection (supplementary fig. S5, Supplementary Material online). Therefore, it was necessary to conduct adaptive selection tests on the northern and southern Han Chinese separately.
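EHH-based methods such as iHS build on the extended haplotype homozygosity of a core allele: the probability that two randomly chosen chromosomes carrying that allele are identical over the whole stretch from the core site to a given distance. The sketch below computes EHH in one direction on toy haplotypes. It assumes phased '0'/'1' haplotype strings and measures distance in sites rather than genetic map units; the helper name `ehh` is hypothetical, and iHS itself (which integrates EHH on both sides for the ancestral and derived alleles and standardizes the log ratio) is not implemented here.

```python
from collections import Counter
from math import comb


def ehh(haplotypes, core, end, allele="1"):
    """EHH of `allele` at site `core`, extended rightward to site `end` (inclusive).

    EHH = sum over distinct extended haplotypes of C(n_h, 2) / C(n, 2),
    where n counts haplotypes carrying `allele` at `core`.
    """
    carriers = [h for h in haplotypes if h[core] == allele]
    n = len(carriers)
    if n < 2:
        raise ValueError("EHH needs at least two carriers of the core allele")
    # Group carriers by their sequence from the core site out to `end`.
    groups = Counter(h[core:end + 1] for h in carriers)
    return sum(comb(k, 2) for k in groups.values()) / comb(n, 2)
```

EHH starts at 1 at the core site and decays as recombination and mutation break up the haplotype; an allele that rose quickly in frequency keeps a high EHH over unusually long distances.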

The Genomic Region under Adaptive Selection in the Han Chinese

To identify the genomic regions that are significantly subject to adaptive selection in the Han Chinese, whether in the northern or the southern Han Chinese, we adopted eight statistics to capture the footprints of positive selection, namely SDS (Field et al. 2016), iHS (Voight et al. 2006), nSL (Ferrer-Admetlla et al. 2014), iHH12 (Garud et al. 2015), Tajima's D (Tajima 1989), XPEHH (Sabeti et al. 2007), XPCLR (Chen et al. 2010), and Fst (Meirmans and Hedrick 2011), and combined Beta (Siewert and Voight 2017) with Tajima's D to trace balancing selection, in the northern and southern Han Chinese separately (supplementary fig. S6 to S17, Supplementary Material online; see Methods for details). Because these methods detect adaptive selection from different aspects (supplementary table S1, Supplementary Material online), we concatenated the genomic regions significantly under adaptive selection in either the northern or the southern Han Chinese. In total, more than 354 megabases (Mb) of genomic regions (12.34% of the human autosomes) are under the influence of adaptive selection in the Han Chinese (Fig. 2a and b, supplementary table S2, Supplementary Material online), of which positive selection acts on 11.4% of genomic regions, close to the 10% of positively selected genomic regions previously described in modern humans (Hernandez et al. 2011). Positive selection is more common than balancing selection, and a very small proportion of the genome (0.04%, including the HLA region on chromosome 6p21.3) has been under long-term balancing selection and has also undergone selective sweeps in the recent past (Fig. 2b). Recent positive selection within the last 20,000 years (supported by iHS, nSL, or iHH12) (Grossman et al. 2013) covers around 9% of genomic regions in the Han Chinese, compared with no more than 10% in Europeans, who share 53% of the recent positive selection signals of the Han Chinese, and more than 20% in Africans, who share 64% (Fig. 2c). We found that approximately 15.9% of protein-coding regions, 15.6% of promoters, and 14.0% of enhancers lie within the genomic regions under the influence of adaptive selection in the Han Chinese (Fig. 2d). This ordering is consistent with the expectation that protein-coding genes are the most conserved, promoters are second, and enhancers are relatively more relaxed. By calculating the composite of these multiple selection signals, we found that stable long-term selection is located in protein-coding regions (supplementary fig. S17, Supplementary Material online), which suggests that adaptive selection acting on CREs is relatively weak, unstable, and easily disturbed by environmental changes.
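Concatenating the significant regions from several methods and two subpopulations amounts to taking the union of genomic intervals; element-level counts then depend on how much of each element falls inside that union (the Fig. 2 caption uses a 90% length threshold). The sketch below shows this bookkeeping for a single chromosome. It assumes half-open [start, end) coordinates; the method labels, coordinates, and helper names are hypothetical toy values, not data from this study.

```python
def merge_intervals(intervals):
    """Merge overlapping or touching half-open [start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(iv) for iv in merged]


def union_of_methods(calls_by_method):
    """Union of significant regions reported by any method (one chromosome)."""
    return merge_intervals([iv for regions in calls_by_method.values() for iv in regions])


def covered_fraction(element, merged):
    """Fraction of a half-open element [start, end) lying inside merged intervals."""
    start, end = element
    overlap = sum(max(0, min(end, e) - max(start, s)) for s, e in merged)
    return overlap / (end - start)


def is_counted(element, merged, threshold=0.9):
    """Count an element as selected if at least `threshold` of it is covered."""
    return covered_fraction(element, merged) >= threshold


# Hypothetical toy calls from three methods on one chromosome.
calls = {"iHS": [(100, 200), (500, 600)], "XPCLR": [(150, 300)], "Beta": [(590, 700)]}
union = union_of_methods(calls)
```

Here `union` is `[(100, 300), (500, 700)]`, covering 400 bp; an element spanning (280, 320) is only half covered and would not be counted under the 90% rule.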

Fig. 2.

Genomic regions under the driving force of adaptive selection in the Han Chinese. a) The genomic regions under adaptive selection in the Han Chinese: positive selection (orange), balancing selection (purple), and both balancing and positive selection (red). b) The proportion of genomic regions under the influence of different types of adaptive selection. c) The proportion of genomic regions under recent positive selection (iHS, nSL, and iHH12) in different populations. The labels on top of the bars are the percentages of genomic regions under selection. d) The percentage of enhancers, promoters, and protein-coding elements (CDSs in the GENCODE v39 database) under selection. The labels on top of the bars are the numbers of genomic elements under selection. Genomic elements with at least 90% of their length located in adaptive regions are counted.

Candidate Positively Selected CREs in the Han Chinese

To fine-map the CREs causal for the adaptation of the Han Chinese, we focused on those CREs that are supported by at least one selection signal and are conserved across primate species. Approximately 1.16% of enhancers and 1.91% of promoters are under positive selection, with positively selected promoters twice as enriched as positively selected enhancers (Fig. 3a). This indicates that promoters are more strongly associated with positive selection in the Han Chinese than enhancers are, possibly because large numbers of redundant enhancers provide phenotypic robustness to environmental pressures (Osterwalder et al. 2018). Nonetheless, there is no significant difference in singleton enrichment between promoters and enhancers under positive selection (Fig. 3b), which suggests that they have undergone similar evolutionary pressures. To identify mutations that lead to adaptive changes in these positively selected CREs, we focused on mutations that have significant selection scores and function as expression quantitative trait loci (eQTLs) or splicing quantitative trait loci (sQTLs). In total, 1,808 adaptive eQTLs and sQTLs were found in 1,745 positively selected CREs (supplementary tables S3 and S4, Supplementary Material online); most of these CREs harbor only one adaptive xQTL (eQTL or sQTL), not counting regulatory elements without an adaptive xQTL (supplementary fig. S18, Supplementary Material online).
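The enhancer-versus-promoter comparison of singleton enrichment (Fig. 3b) uses a Mann–Whitney U test, according to the Fig. 3 caption. The sketch below spells out what that test computes: U counts how often a value from one group exceeds a value from the other, and a two-sided p-value follows from the normal approximation. It is a teaching sketch, assuming no tie correction and no continuity correction (so p-values can differ slightly from library implementations such as SciPy's `mannwhitneyu`); the helper name and inputs are hypothetical.

```python
from math import erfc, sqrt


def mann_whitney_u(x, y):
    """Return (U, two-sided p) for sample x versus sample y (hypothetical helper).

    U counts pairs where x beats y, with ties counted as one half. The
    p-value uses the normal approximation without tie or continuity correction.
    """
    u = sum((xi > yi) + 0.5 * (xi == yi) for xi in x for yi in y)
    n1, n2 = len(x), len(y)
    mu = n1 * n2 / 2
    sigma = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mu) / sigma
    # Two-sided standard-normal tail probability.
    p = erfc(abs(z) / sqrt(2))
    return u, p
```

The pairwise loop is O(n1 × n2) and is written for clarity; rank-based implementations compute the same U far faster on genome-scale inputs.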

Fig. 3.

Candidate positively selected CREs in the Han Chinese. a) The number and the percentage of CREs under positive selection in the Han Chinese. The chi-square test reveals a significant difference (P-value = 3.17e−57) in genome-wide enrichment of positively selected enhancers and promoters. b) The evolutionary force (see Methods for details) differences between enhancers and promoters under positive selection. The Mann–Whitney U test is used to perform statistical tests. c, d) The proportion of positively selected CREs under different evolutionary pressure. c) Enhancers. d) Promoters.
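The chi-square test in panel a compares two proportions: the share of enhancers under positive selection versus the share of promoters under positive selection. A minimal sketch of a Pearson chi-square test on such a 2 × 2 table follows, with rows as CRE types and columns as selected versus not selected. The counts in the test are hypothetical toy values, not the study's data, and the sketch omits Yates' continuity correction, which some libraries apply to 2 × 2 tables by default.

```python
from math import erfc, sqrt


def chi2_2x2(table):
    """Pearson chi-square test on [[a, b], [c, d]] with df = 1, no Yates correction.

    Returns (statistic, p-value).
    """
    (a, b), (c, d) = table
    total = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = rows[i] * cols[j] / total
            stat += (observed - expected) ** 2 / expected
    # With one degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2)).
    return stat, erfc(sqrt(stat / 2))
```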

To systematically understand the potential phenotypes involved in adaptation to survival pressures, we performed gene ontology (GO) enrichment analysis on the positively selected CREs in the Han Chinese, tissue by tissue (see Methods for details). Many biological processes are involved in the adaptation of the Han Chinese under positive selection, including cell–cell adhesion, negative regulation of the viral life cycle, positive regulation of osteoblast differentiation, and response to retinoic acid (Table 1, supplementary table S5, Supplementary Material online). Among them, the cell–cell adhesion process involves promoters and enhancers in a wide range of tissues and is related to a number of important systemic functions, such as embryonic development and tissue reconstruction, which might help the Han Chinese adapt to environmental changes.
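GO over-representation is commonly scored with a fold enrichment and a one-sided hypergeometric tail probability (equivalent to a one-sided Fisher's exact test). Whether the authors' tool used exactly this test is an assumption here; the sketch only illustrates the arithmetic. The argument names follow the usual urn notation, and the numbers in the test are hypothetical.

```python
from math import comb


def go_enrichment(k, n, K, N):
    """Fold enrichment and one-sided hypergeometric P(X >= k) (hypothetical helper).

    k: selected genes annotated to the GO term; n: selected genes;
    K: background genes annotated to the term; N: background genes.
    """
    fold = (k / n) / (K / N)
    upper = min(K, n)
    p = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / comb(N, n)
    return fold, p
```

For example, if all 4 selected genes carry a term that only 5 of 20 background genes carry, the fold enrichment is 4.0 and P(X ≥ 4) = 5 / 4845.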

Table 1.

Significant GO biological process of CREs active in tissues under the positive selection in the Han Chinese (only GO items with tissue number not less than 10 are shown, see supplementary table S5, Supplementary Material online for full version)

GO term GO annotation CREs Tissue number
GO:2000049 Positive regulation of cell–cell adhesion mediated by cadherin Enhancer 20
GO:1903901 Negative regulation of viral life cycle Enhancer 20
GO:0060026 Convergent extension Enhancer 19
GO:0071300 Cellular response to retinoic acid Enhancer 18
GO:0032526 Response to retinoic acid Enhancer 13
GO:0035357 Peroxisome proliferator activated receptor signaling pathway Enhancer 12
GO:0019372 Lipoxygenase pathway Enhancer 10
GO:0045669 Positive regulation of osteoblast differentiation Promoter 111
GO:2000049 Positive regulation of cell–cell adhesion mediated by cadherin Promoter 87
GO:1903265 Positive regulation of tumor necrosis factor-mediated signaling pathway Promoter 87
GO:0048536 Spleen development Promoter 27
GO:0060346 Bone trabecula formation Promoter 11
GO:2000736 Regulation of stem cell differentiation Promoter 10

We further evaluated differences in mutational constraint between the promoters and enhancers under positive selection in the Han Chinese. A higher mutational constraint score indicates a lower observed mutation rate than expected, and therefore slower evolution; conversely, a lower score indicates a higher observed mutation rate than expected, and therefore faster evolution (Bergeron et al. 2023; Chen et al. 2024). The data show that approximately 18.8% of the positively selected enhancers and 19.8% of the positively selected promoters are under high mutational constraint (Fig. 3c and d), which suggests that they have been evolving slowly in the human population since the late Pleistocene. In contrast, only a small percentage of enhancers (0.4%) and promoters (0.5%) are under fast evolution (Fig. 3c and d).
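The sign convention described above (depletion of observed variants means high constraint and slow evolution) can be made concrete with a simple Poisson-style z-score, (expected − observed) / √expected. This is an assumed, simplified form chosen for illustration, not necessarily the exact model of the cited constraint scores, and the binning cutoffs below are hypothetical rather than the thresholds used for Fig. 3c and d.

```python
from math import sqrt


def constraint_z(observed, expected):
    """Poisson-style constraint score (assumed form).

    Positive when fewer variants are observed than expected (depletion,
    slower evolution); negative when more are observed (faster evolution).
    """
    return (expected - observed) / sqrt(expected)


def constraint_class(z, high=4.0, low=-4.0):
    """Bin a score; the cutoffs are hypothetical, not the paper's."""
    if z >= high:
        return "high constraint"
    if z <= low:
        return "fast evolution"
    return "intermediate"
```

For instance, observing 50 variants where 100 are expected gives z = 5 (high constraint), while observing 150 gives z = −5 (fast evolution).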

The Neanderthal-derived Enhancer under Recent Positive Selection Promote UV Radiation Resistance

To explore the phenotypic effects of archaic introgression on modern humans, we identified neanderthal and denisovan introgression in the NyuWa Han Chinese population. The neanderthal introgression rate in Han Chinese individuals ranges from 1% to 3.6%, while the denisovan introgression rate ranges from 0.03% to 0.26% (supplementary fig. S19, Supplementary Material online). The neanderthal introgressions in the Han Chinese exhibit a significantly higher mean and variance than the denisovan introgressions (supplementary fig. S19, Supplementary Material online). Although archaic introgressions account for only a very small fraction of each modern human genome, they are highly heterogeneous between individuals: collectively, neanderthal introgressions cover more than 40% of the autosomal region in the Han Chinese genomes, and denisovan introgressions cover approximately 7.8% (supplementary fig. S20, Supplementary Material online). This high heterogeneity is largely due to the variability of archaic introgression across different regions of the genome (Fig. 4a). We focused on the genomic regions that are highly introgressed with archaic segments (Fig. 4a, supplementary fig. S21, supplementary table S6, Supplementary Material online), because their high frequency in the Han Chinese could have been driven by recent positive selection. A few CREs located in the chromosome 3p21.31 region (Table 2), which are responsible for the transcriptional regulation of the HYAL genes encoding hyaluronidases (Averbeck et al. 2007; Ruszová et al. 2014), were found to be highly neanderthal-introgressed and positively selected, and they participate in biological processes such as the hyaluronan catabolic process and the cellular response to UV-B (Table 3). HYAL2 has been reported to be neanderthal-introgressed in East Asians, with latitude-dependent adaptation to the strength of UV radiation (Deng and Xu 2017).
Additionally, we discovered positively selected neanderthal-derived CREs of HYAL1 in the Han Chinese, which show markedly higher introgression rates in East Asians than in non-East-Asian populations (Fig. 4b). Four eQTLs were identified: 3:50185260T > C, 3:50241285G > C, 3:50274469T > C, and 3:50330436C > A (in the direction from the ancient allele to the derived allele). The neanderthal alleles 3:50274469T and 3:50330436C are much more frequent in East Asians than in non-East-Asian populations. This is consistent with the distribution of neanderthal introgression rates among populations (Fig. 4b) and indicates that these two ancient neanderthal alleles are favored in East Asian populations.

Fig. 4.

Adaptive CREs derived from archaic introgression in the Han Chinese. a) The neanderthal introgression rate and the denisovan introgression rate across the human genome in the Han Chinese. The upper dashed line marks the significance threshold for the neanderthal introgression rate, and the lower dashed line marks the significance threshold for the denisovan introgression rate. The red circle marks the positively selected CREs overlapping with highly introgressed archaic alleles in the Han Chinese. b) The introgression rates and allele frequencies of the four polymorphic sites located in the positively selected enhancers with high neanderthal introgression rates in 1KGP populations. Upper: neanderthal introgression rates; bottom: allele frequencies of ancestral alleles.

Table 2.

The positively selected neanderthal enhancers in the Han Chinese and their regulated genes

CHR Start End CRE types Regulated genes in EpiMap
chr3 50185062 50185320 Enhancer CYB561D2, GNAI2, NAA80, RASSF1, SEMA3F, TUSC2, UBA7
chr3 50241220 50241400 Enhancer APEH, C3orf18, CACNA2D2, CAMKV, CYB561D2, DOCK3, GNAI2, HEMK1, HYAL1, HYAL2, HYAL3, IFRD2, NAA80, NPRL2, RASSF1, RBM5, SEMA3F, TRAIP, TUSC2, UBA7
chr3 50274380 50274560 Enhancer AMIGO3, APEH, C3orf62, CAMKV, CYB561D2, GMPPB, GNAI2, HEMK1, HYAL3, IFRD2, IP6K1, LSMEM2, MAPKAPK3, NAA80, NPRL2, RASSF1, RBM5, RBM6, SEMA3B, TRAIP, TUSC2
chr3 50330227 50330833 Enhancer CACNA2D2, CISH, CYB561D2, HYAL1, HYAL2, NAA80, NPRL2, RASSF1, RBM5, UBA7, ZMYND10

Table 3.

Significant GO biological process of the positively selected neanderthal enhancers in the Han Chinese

GO biological process Fold enrichment Raw P-value FDR
Cellular response to UV-B (GO:0071493) >100 1.38e−6 4.30e−3
Hyaluronan catabolic process (GO:0030214) >100 1.99e−8 3.09e−4
Response to UV-B (GO:0010224) 94.88 6.10e−6 1.58e−2
Glycosaminoglycan catabolic process (GO:0006027) 94.88 1.46e−7 1.13e−3
Hyaluronan metabolic process (GO:0030212) 91.61 1.66e−7 8.59e−4
Aminoglycan catabolic process (GO:0006026) 78.14 2.98e−7 1.16e−3
Mucopolysaccharide metabolic process (GO:1903510) 29.19 1.21e−5 2.68e−2
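Table 3 reports both raw P-values and FDR values. A common way to obtain FDR values is the Benjamini–Hochberg step-up adjustment, sketched below; that the enrichment tool used here applied exactly this procedure is an assumption. Note that the adjustment depends on the total number of GO terms tested, so the FDR column cannot be reproduced from the seven rows shown alone.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, keeping adjusted values monotone
    # and capped at 1.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```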

To explain the adaptive mechanisms of the neanderthal-derived enhancers under positive selection, we evaluated the effects of the favored alleles by combining transcriptomic and epigenetic features. The two favored neanderthal-derived alleles of the adaptive enhancers are in strong linkage disequilibrium (r² = 0.991; D′ = 0.999) (supplementary fig. S22, Supplementary Material online) and have similar transcriptional regulatory effects on their target genes (Fig. 5a): both neanderthal alleles are associated with significantly lower HYAL1 expression in the skin, significantly higher HYAL3 expression in the thyroid, and significantly higher ZMYND10 expression in the ovary (Fig. 5a). To identify the causal eQTL for adaptation, we evaluated the influence of the mutations on enhancer activity and transcription factor (TF) binding. We found that the mutation 3:50274469C > T (in the direction from the derived allele to the neanderthal allele) greatly reduced the predicted activity of CCCTC-binding factor (CTCF) binding and caused an extreme loss of binding for CTCF, Structural Maintenance of Chromosomes 3 (SMC3), and RAD21 Cohesin Complex Component (RAD21) (Fig. 5b and c). CTCF, SMC3, and RAD21 are essential factors for long-range transcriptional regulation mediated by chromatin interactions. Therefore, we deduce that the mutation 3:50274469C > T could have a very strong effect on phenotypic fitness by cutting off the interaction between the enhancer and the HYAL1 gene. In contrast, the mutation 3:50330436A > C (in the direction from the derived allele to the neanderthal allele) affects neither the activity of the enhancer nor the affinity of TF binding (Fig. 5b and c). Therefore, we speculate that this mutation is not actually a causal eQTL but appears as one because of its strong linkage with 3:50274469C > T, or that it functions through unknown regulatory mechanisms.
The decrease in HYAL1 gene expression in the skin can be explained by the loss of function of chromatin loop-dependent TF-binding sites such as CTCF. However, this explanation does not account for the increased expression of HYAL3 in the thyroid and ZMYND10 in the ovary (Fig. 5a), and additional regulatory mechanisms may be responsible for their upregulation. We speculate that the upregulated expression of ZMYND10 in the ovary may promote the development and maturation of egg cells by facilitating meiosis, and that this may confer fitness advantages, as ZMYND10 is involved in the assembly and synthesis of ciliary dynein, and mutations in this gene can cause primary ciliary dyskinesia and poor sperm motility (Moore et al. 2013; Mali et al. 2018).
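The linkage between the two favored neanderthal alleles is summarized above by r² and D′. The sketch below computes D, r², and D′ for two biallelic sites from phased haplotypes using the standard definitions (D′ is D divided by its maximum attainable magnitude given the allele frequencies). The input encoding and helper name are hypothetical; both sites are assumed to be polymorphic.

```python
def linkage_disequilibrium(haplotypes):
    """Return (D, r2, D_prime) for two biallelic sites (hypothetical helper).

    `haplotypes` is a list of (a, b) pairs, with 1 marking the allele of
    interest at site A and site B, respectively.
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n
    p_b = sum(b for _, b in haplotypes) / n
    p_ab = sum(a * b for a, b in haplotypes) / n
    d = p_ab - p_a * p_b
    r2 = d**2 / (p_a * (1 - p_a) * p_b * (1 - p_b))
    # The largest |D| possible given the allele frequencies depends on D's sign.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    return d, r2, d / d_max
```

D′ near 1 means almost no recombinant haplotypes are seen, while r² near 1 additionally requires similar allele frequencies at the two sites; values of 0.999 and 0.991 together indicate that the two alleles almost always travel on the same haplotype.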

Fig. 5.

The consequences of the adaptive mutations in the adaptive CREs derived from neanderthal introgression. a) The relative expression levels of target genes between genotypes for the eQTLs. There are two adaptive eQTLs in the neanderthal-derived adaptive enhancers, 3:50274469T > C and 3:50330436C > A, where 3:50274469T and 3:50330436C are derived from neanderthals. The relative expression levels between genotypes and the P-values of expression differences between alleles are taken from the GTEx database. b) Changes in the activity level of the promoter characteristic, enhancer characteristic, or CTCF binding characteristic predicted by the DeepSEA Sei-framework after the mutations occur. The differences are calculated from the predicted probabilities of the alternative allele and the reference allele for a regulatory feature (p_alt − p_ref). c) The level of loss of function or gain of function of the mutations on the TF-binding sites. The scores are predicted by FABIAN-variant (Steinhaus et al. 2022). ‘Likely gain’ indicates the extent to which TF binding affinity may be gained, while ‘Likely loss’ indicates the extent to which TF binding affinity may be lost.

To estimate the allele frequency history and evolutionary direction of the adaptive enhancer, we used an approximate full-likelihood method based on the haplotype data around the adaptive mutation (Stern et al. 2019). We found that the frequency of 3:50274469T in the Han Chinese began to increase under positive selection about 200 generations (5,600 yr) ago (supplementary fig. S23, Supplementary Material online), with a selection coefficient of approximately 0.052. According to Fisher and Wright's theory, a mutation whose selection coefficient s satisfies |2Ns| < 1 (where N is the effective population size) can be considered neutral, that is, its fate is governed by genetic drift rather than by natural selection (Nei et al. 2010). Ohta's nearly neutral theory states that a mutation can be considered nearly neutral if its selection coefficient satisfies 0.2 < |2Ns| < 4 (Ohta 1973). For 3:50274469C > T, 2Ns = 2 × 20,000 × 0.052 = 2,080 >> 4, indicating that this polymorphic site has been subject to very strong positive selection, leading to genetic differentiation between Han Chinese and Europeans or Africans (Fig. 4b). Additionally, based on the phylogenetic tree of these haplotypes (supplementary fig. S24, Supplementary Material online), the haplotypes carrying the favored alleles appear to have undergone multiple expansion events in the evolutionary history of the Han Chinese.
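The neutrality criterion above reduces to comparing the scaled selection strength |2Ns| against fixed bounds. The sketch below reproduces the worked arithmetic used in this section, taking N = 20,000 from that arithmetic. It applies only Ohta's bounds as quoted in the text, so values between 0.2 and 1, which the Fisher and Wright criterion would also call neutral, are labeled nearly neutral here; the helper names and category labels are hypothetical.

```python
def two_n_s(s, n_e=20_000):
    """Scaled selection strength 2Ns for effective population size n_e."""
    return 2 * n_e * s


def selection_regime(s, n_e=20_000):
    """Classify |2Ns| with Ohta's nearly neutral bounds (0.2 and 4) as quoted in the text."""
    x = abs(two_n_s(s, n_e))
    if x < 0.2:
        return "effectively neutral"
    if x < 4:
        return "nearly neutral"
    return "selection dominates drift"


# Selection coefficients reported in this section.
for s in (0.052, 0.0014, 0.00012):
    print(s, round(two_n_s(s), 1), selection_regime(s))
```

The three coefficients give 2Ns values of 2,080, 56, and 4.8, all above the upper bound of 4, although 4.8 lies close to that boundary, which is why the corresponding selection is described as weak.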

The Positively Selected CREs Were Prone to Inhibit Coagulation and Promote Angiogenesis and Wound Healing

Wound healing is a potentially adaptive phenotype involving cell–cell adhesion processes (Khain et al. 2007). It involves a series of cellular events, including hemostasis, inflammation, proliferation, and dermal remodeling, and plays an important role in reducing the risk of infection, anemia, and shock (Wilkinson and Hardman 2020). As shown above, intercellular adhesion processes play an important role in the adaptive evolution of the Han Chinese (Table 1). Among these processes, wound healing may be promoted by the transcriptional downregulation of the HYAL genes involved in hyaluronidase production (Fig. 5), since hyaluronic acid can bind fibrin and be deposited to form a structural matrix at the wound site, and can also stimulate collagen production to promote wound healing and repair (Garg and Hales 2004; Aya and Stern 2014; Graça et al. 2020). By co-localizing the positively selected CREs with genes in wound healing-related pathways, we further found a few adaptive CREs targeting the genes VKORC1, PRSS53, METAP1, and HPSE, which play key roles in wound healing.

Recent positive selection has affected variants in CREs involved in wound healing in the Han Chinese. The VKORC1 gene is responsible for reducing inactive vitamin K 2,3-epoxide to active vitamin K, a required cofactor for blood coagulation enzymes (Rost et al. 2004). The PRSS53 gene encodes a serine protease that can degrade pro-urokinase to fibrin monomer, which is the basis for thrombus formation (Cal et al. 2006). The mutation 16:31093557G > A in the promoter acts as an eQTL and reduces the expression levels of VKORC1 and PRSS53 (Fig. 6a and b). This reduction is also reflected in the predicted decrease in promoter activity (Fig. 6c). The METAP1 gene has potential functions in angiogenesis, and its inhibitors could be applied to repress tumors (Hu et al. 2006). There are three genetically linked sQTLs in the adaptive enhancers of METAP1: 4:98994721C > T, 4:99029266A > G, and 4:99059904C > T. These sQTLs increase the intron-excision ratio of transcripts (Fig. 6d and e) by promoting the activity of splicing enhancers, a type of enhancer that recruits spliceosomes (Jobbins et al. 2018) (Fig. 6f). HPSE encodes heparanase, which is known to promote angiogenesis and wound healing (Zcharia et al. 2005; Wang et al. 2020). The eQTLs 4:83316705A > G, 4:83316723T > C, and 4:83317947C > G, located in the adaptive enhancers of HPSE, are associated with decreased HPSE expression in blood (Fig. 6g and h). Only 4:83316705A > G has consistent effects on both the predicted epigenetic activity and gene expression (Fig. 6i), making it the likely causal variant for adaptation.

Fig. 6.


Evolutionary trends of CREs related to coagulation and wound healing in the Han Chinese. a to c) The functional consequences of the adaptive mutation in promoter chr16:31093420-31093645, which regulates the genes VKORC1 and PRSS53. b) The relative expression level between genotypes for the eQTL 16:31093557G > A, retrieved from the GTEx database. c) Changes in the activity level of the promoter, enhancer, or CTCF binding characteristics predicted by the DeepSEA Sei framework after the mutation 16:31093557G > A occurs. d to f) The functional consequences of adaptive mutations in enhancers chr4:98994604-98994780, chr4:99029140-99029300, and chr4:99059710-99059937, which regulate the gene METAP1. e) The relative intron-excision ratio between genotypes for the sQTLs 4:98994721C > T, 4:99029266A > G, and 4:99059904C > T, retrieved from the GTEx database. f) Changes in the activity level of the promoter, enhancer, or CTCF binding characteristics predicted by the DeepSEA Sei framework after the mutations 4:98994721C > T, 4:99029266A > G, and 4:99059904C > T occur. g to i) The functional consequences of adaptive mutations in enhancers chr4:83316540-83316780 and chr4:83317760-83317960, which regulate the gene HPSE. h) The relative expression level of HPSE between genotypes for the eQTLs 4:83316705A > G, 4:83316723T > C, and 4:83317947C > G, retrieved from the GTEx database. i) Changes in the activity level of the promoter, enhancer, or CTCF binding characteristics predicted by the DeepSEA Sei framework after the mutations 4:83316705A > G, 4:83316723T > C, and 4:83317947C > G occur. j) The allele frequencies of 16:31093557A, 4:98994721T, and 4:83316705G in the NyuWa Han Chinese and in the 1KGP populations.

Since the evolutionary direction of human phenotypes is a fundamental question in human genetics, we further analyzed the potential evolutionary direction of these wound healing-related adaptive CREs. By comparing allele frequencies across populations worldwide (Fig. 6j) and inferring historical trajectories of allele frequencies (supplementary fig. S25A, Supplementary Material online), we determined that the derived allele 16:31093557A was favored. Under recent positive selection, 16:31093557A has gradually increased in frequency since nearly 10,000 yr ago, with a selection coefficient of 0.00012 (2 × 20,000 × 0.00012 = 4.8 > 4, indicating weak selection) (supplementary fig. S25A, Supplementary Material online). Because this adaptive variant reduces the expression level of VKORC1, we infer that positive selection on it has suppressed coagulation in East Asian populations, which is consistent with the fact that East Asians require smaller doses of antithrombotic drugs (Kaye et al. 2017). This genetic locus is also a known target (recorded in the ClinVar database as “NM_024006.6(VKORC1):c.174-136C > T”) for drugs including warfarin. Similarly, we determined that the derived allele 4:99029266G was positively selected, given its high frequency in East Asian populations and its low frequency in other populations (Fig. 6j). Haplotypes carrying 4:99029266G are inferred to have undergone a rapid increase during the recent 10,000 yr, with a selection coefficient of 0.0014 (2 × 20,000 × 0.0014 = 56 > 4) (supplementary fig. S25B, Supplementary Material online). Because inhibitors of the METAP1 gene product are effective antiangiogenic drugs (Hu et al. 2006), we infer that the increase in METAP1 expression efficiency driven by positive selection promotes angiogenesis. Although there is no significant genetic differentiation between populations for 4:83316705A and 4:83316705G (Fig. 6j), the ancestral allele 4:83316705A has a much higher frequency in the population than the derived allele, and it has been increasing with a selection coefficient of 0.00046 (2 × 20,000 × 0.00046 = 18.4 > 4) over the last 1,000 generations of the allele frequency trajectory (supplementary fig. S25C, Supplementary Material online); to some extent, we can therefore consider 4:83316705A the favored allele. Based on functional experiments on heparanase (Zcharia et al. 2005; Wang et al. 2020), we infer that the favored enhancers of HPSE in the Han Chinese promote angiogenesis and wound healing. These wound healing-related adaptive variants have shaped the distinctive genetic pattern of East Asians (Fig. 6j), with multiple potential selective sweeps during the phylogenetic process (supplementary figs. S26 to S28, Supplementary Material online).
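The scaled selection strengths quoted above can be reproduced with a few lines of Python. This is a minimal sketch that assumes Ne = 20,000 (the effective population size given to Relate in Methods) and takes the cutoff of 4 from the text; the function name `scaled_selection` is illustrative.

```python
NE_HAN = 20_000   # effective population size used with Relate (Methods)
THRESHOLD = 4.0   # cutoff quoted in the text


def scaled_selection(s, ne=NE_HAN):
    """Return the scaled selection strength 2 * Ne * s."""
    return 2 * ne * s


# Selection coefficients reported for the three wound healing-related alleles.
estimates = {
    "16:31093557A (VKORC1/PRSS53)": 0.00012,
    "4:99029266G (METAP1)": 0.0014,
    "4:83316705A (HPSE)": 0.00046,
}
for allele, s in estimates.items():
    value = scaled_selection(s)
    print(f"{allele}: 2Ns = {value:.1f}, above {THRESHOLD}: {value > THRESHOLD}")
```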

The Adaptive Evolution of the CREs under Balancing Selection

Balancing selection is a type of adaptive selection that is crucial for maintaining genetic diversity in populations. Combining the Beta and Tajima's D statistics, we found that 0.19% of enhancers and 0.09% of promoters are under balancing selection, with enhancers three times more enriched than promoters (Fig. 7a), the opposite of the enhancer–promoter difference observed under positive selection (Fig. 3a). Additionally, singleton enrichment does not differ significantly between promoters and enhancers under balancing selection (Fig. 7b), nor between CREs under balancing selection and those under positive selection (Fig. 7c). The SFS of these CREs shows that allele frequencies of loci within them are shifted toward intermediate frequencies (Fig. 7d), a typical feature of genomic regions under balancing selection. Beyond the known HLA genes responsible for antigen presentation and immunity (Radwan et al. 2020), we performed GO enrichment analysis to understand why these CREs are under balancing selection. The analysis revealed that these CREs play an important role in maintaining organismal homeostasis, including immunomodulation mediated by activation of NF-κB-inducing kinase activity (Thu and Richmond 2010), regulation of apoptosis, and responses to mechanical or chemical stimuli (Table 4, supplementary tables S7 and S8, Supplementary Material online). This suggests that the biological processes under balancing selection help individuals cope with complex and changing environmental conditions.
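To make the SFS argument concrete, the sketch below folds alternate-allele counts into a minor-allele SFS and compares mean MAF between two sets of sites. The counts are toy values chosen for illustration, not data from this study, and the function names are hypothetical.

```python
from collections import Counter


def folded_sfs(alt_counts, n_haplotypes):
    """Folded SFS: fraction of polymorphic sites in each minor-allele-count class 1..n//2."""
    # Fold each alternate-allele count onto the minor allele.
    minor = [min(c, n_haplotypes - c) for c in alt_counts]
    minor = [m for m in minor if m > 0]  # drop monomorphic sites
    counts = Counter(minor)
    return [counts.get(k, 0) / len(minor) for k in range(1, n_haplotypes // 2 + 1)]


def mean_maf(alt_counts, n_haplotypes):
    """Mean minor allele frequency over polymorphic sites."""
    mafs = [min(c, n_haplotypes - c) / n_haplotypes
            for c in alt_counts if 0 < c < n_haplotypes]
    return sum(mafs) / len(mafs)


# Toy alternate-allele counts for 10 haplotypes (not data from the study).
balanced = [4, 5, 6, 5]
neutral = [1, 1, 2, 9]
print(folded_sfs(balanced, 10))
print(folded_sfs(neutral, 10))
print(mean_maf(balanced, 10), mean_maf(neutral, 10))
```

Sites under balancing selection pile up in the intermediate classes, which is the pattern described for Fig. 7d.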

Fig. 7.


The CREs under balancing selection in the Han Chinese. a) The proportion of CREs under balancing selection. b) The difference in evolutionary force (see Methods for details) between enhancers and promoters under balancing selection; statistical significance was assessed with the Mann–Whitney U test. c) The difference in evolutionary force between CREs under balancing selection and those under positive selection; statistical significance was assessed with the Mann–Whitney U test. d) The SFS of CREs under balancing selection compared with neutral regions. The main panel compares all SNVs in CREs under balancing selection and in neutral regions, while the inset compares only common SNVs (MAF > 0.05) in the same two sets. We randomly selected 2,000 loci in the neutral regions, close to the number of loci in the CREs under balancing selection.

Table 4.

Significant GO biological processes of CREs active in tissues under balancing selection in the Han Chinese (only GO terms significant in at least 40 tissues are shown; see supplementary table S7, Supplementary Material online for the full version)

GO term | GO annotation | CRE type | Tissue number
GO:0002503 | Peptide antigen assembly with MHC class II protein complex | Enhancer | 81
GO:0002381 | Immunoglobulin production involved in immunoglobulin-mediated immune response | Enhancer | 79
GO:0050870 | Positive regulation of T cell activation | Enhancer | 70
GO:0007250 | Activation of NF-kappaB-inducing kinase activity | Enhancer | 69
GO:0036462 | TRAIL-activated apoptotic signaling pathway | Enhancer | 69
GO:0001916 | Positive regulation of T cell mediated cytotoxicity | Enhancer | 46
GO:0002483 | Antigen processing and presentation of endogenous peptide antigen | Enhancer | 46
GO:0002468 | Dendritic cell antigen processing and presentation | Enhancer | 43
GO:0019886 | Antigen processing and presentation of exogenous peptide antigen via MHC class II | Enhancer | 42
GO:0071260 | Cellular response to mechanical stimulus | Enhancer | 41
GO:0032831 | Positive regulation of CD4-positive, CD25-positive, alpha-beta regulatory T cell differentiation | Enhancer | 40

Many Pathogenic Alleles Remain in the Population due to the Hitchhiking Effect

To investigate the impact of positive selection on adaptive CREs on disease risk alleles, we conducted linkage analysis between the favored alleles and nearby potential pathogenic alleles within a 1 Mb radius. Nearly half of the likely pathogenic alleles from the ClinVar database tend to be maintained in the population owing to the hitchhiking effect caused by positive selection (supplementary fig. S29, Supplementary Material online). Similar trends were observed for disease-related loci from the GWAS Catalog (Fig. 8a to f). Furthermore, regardless of the strength of the hitchhiking effect (|ρ| > 0.8: highly linked; 0.5 < |ρ| ≤ 0.8: intermediately linked; 0.2 < |ρ| ≤ 0.5: slightly linked), complex diseases such as type 2 diabetes, schizophrenia, asthma, neuroticism, coronary artery disease, and cancer have the largest numbers of genetic risk factors affected by the hitchhiking effect (Fig. 8a to f). This directly reflects the number of risk loci for these complex diseases in the human genome, which is also the largest among all diseases (supplementary fig. S30, Supplementary Material online). These results indicate that positive selection in the Han Chinese speeds up the elimination of half of the pathogenic variants from the population while slowing down the elimination of the other half. This sheds light on the evolutionary reasons why these complex diseases are prevalent in human populations.
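A minimal sketch of how a risk allele's linkage to a favored allele could be measured and binned, assuming phased 0/1 haplotype vectors and the |ρ| cutoffs above. Reading a positive ρ as the risk allele riding up with the favored allele, and a negative ρ as the risk allele being purged faster, is our interpretation of the two directions; the function names and toy haplotypes are hypothetical.

```python
import math


def allele_correlation(hap_a, hap_b):
    """Pearson correlation (rho) between two 0/1 allele vectors over the same haplotypes."""
    n = len(hap_a)
    mean_a, mean_b = sum(hap_a) / n, sum(hap_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(hap_a, hap_b)) / n
    var_a = sum((a - mean_a) ** 2 for a in hap_a) / n
    var_b = sum((b - mean_b) ** 2 for b in hap_b) / n
    if var_a == 0 or var_b == 0:
        return float("nan")  # a monomorphic site has no defined correlation
    return cov / math.sqrt(var_a * var_b)


def linkage_class(rho):
    """Bin |rho| with the cutoffs used in the text and report the sign separately."""
    if math.isnan(rho):
        return "undefined", "none"
    r = abs(rho)
    if r > 0.8:
        strength = "highly linked"
    elif r > 0.5:
        strength = "intermediate linked"
    elif r > 0.2:
        strength = "slightly linked"
    else:
        strength = "unlinked"
    # Positive: risk allele tends to rise with the favored allele (maintained).
    # Negative: risk allele sits on other haplotypes (eliminated faster).
    sign = "positive" if rho > 0 else "negative"
    return strength, sign


favored = [1, 1, 1, 0, 0, 0]  # toy phased haplotypes
risk = [1, 1, 0, 0, 0, 0]
print(linkage_class(allele_correlation(favored, risk)))
```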

Fig. 8.


The number of disease-related alleles affected by the hitchhiking effect in the Han Chinese. Only the top 10 are shown. a to f) Disease-related alleles grouped by their correlation coefficient (ρ) with adaptive variants: a) ρ > 0.8; b) ρ < −0.8; c) 0.5 < ρ < 0.8; d) −0.8 < ρ < −0.5; e) 0.2 < ρ < 0.5; f) −0.5 < ρ < −0.2.

Discussion

Detecting CREs under adaptive selection is important for illustrating adaptive evolution at the level of transcriptional regulation. This work reveals that approximately 12% of the genome in the Han Chinese is affected by adaptive selection. Specifically, many CREs under positive selection are related to cell–cell adhesion. Notably, a neanderthal-derived enhancer of HYAL1 reduces the expression level of hyaluronidase, which may help the Han Chinese and other East Asians resist UV radiation damage and protect against skin cancer (Arnold et al. 2022). Considering the synergistic effect of skin color adaptation to UV radiation (Jablonski and Chaplin 2010), we hypothesize that, as the lighter skin color of Eurasians diminished their resistance to UV radiation, individuals in East Asian populations who inherited favorable HYAL haplotypes from neanderthals came to predominate in the population. Additionally, we analyzed the CREs involved in wound healing and discovered their adaptive function in enhancing the wound healing abilities of the Han Chinese. We infer that the Han Chinese's need for rapid wound healing stems from the dramatic increase in population density over the past 10,000 yr (Fig. 1c), which led to frequent wars and conflicts between tribes or groups. An unexpected finding is that the positively selected VKORC1 variant reduces blood clotting ability in East Asians. Given the lower incidence of thrombosis in East Asians than in other ethnic groups (Liao et al. 2014), we speculate that coagulation inhibited under positive selection is one of the genetic factors reducing thrombosis risk in East Asians, although the environmental factors driving the positive selection on coagulation ability remain unclear.

A current limitation is that most of the enhancers we used were predicted from epigenomic data, such as histone modifications and chromatin accessibility. Additionally, the enhancer–gene associations were also predicted from epigenomic and transcriptomic data (Boix et al. 2021), leading to a small but inevitable deviation in identifying the true target genes. For instance, the enhancer chr3:50274380-50274560 is not annotated as targeting HYAL1; however, we found an eQTL (3:50274469T > C) in the GTEx database that demonstrates a genuine regulatory relationship between them. Genomic data, together with supporting epigenomic, transcriptomic, and other omics data, are in great demand and will be useful for future research in human genetics, including evolutionary genetics and the molecular genetic analysis of human phenotypes. Because the direct target of natural selection is the phenotype of individuals, it is important to use omics data from individuals in specific geographical environments to better understand the molecular mechanisms behind organismal adaptation. This includes changes above the genomic level, such as mutations that alter TF-binding affinity, histone modifications, or chromatin interactions. In addition, supporting multi-omics data can minimize bias and improve the accuracy of identifying genomic interactions and gene regulatory relationships.

The trade-offs between adaptive evolution and disease susceptibility are intriguing problems to explore. Quantitative models for analyzing the interaction and co-evolution among multiple forces of natural selection are still lacking. Addressing these issues requires attention to several factors: first, accurately localizing favored alleles; second, accurately estimating selection intensity; third, precisely identifying the causal risk loci for disease; and fourth, quantitatively measuring the interaction between disease risk genes and adaptively selected genes in populations. Owing to the limitations of currently available data and the immaturity of research methods, further effort is required to deepen the study of disease epidemiology.

Materials and Methods

Genome-wide Genetic Variations Resource

In this study, millions of high-quality single nucleotide variants (SNVs) and insertions and deletions (InDels) from 4,013 unrelated Han Chinese individuals were obtained from an updated version of the earlier NyuWa Genome Resource, processed with the same pipeline of variant calling, quality control, phasing, and haplotype reconstruction (Zhang et al. 2021). For comparison, we downloaded haplotypes of 2,504 unrelated individuals from the 1000 Genomes Project (1KGP) (Byrska-Bishop et al. 2022) and haplotypes of 893 unrelated individuals from the Human Genome Diversity Project (HGDP) (Bergström et al. 2020). In addition, a reciprocal imputation strategy (Huang et al. 2015) was used to integrate the haplotype panels from NyuWa, 1KGP, and HGDP.

Population Structure and Demographic History Analysis

To avoid interference in detecting adaptive selection, it is important to consider the population structure of the genome samples. Previous studies have reported genetic differences between northern and southern Han Chinese (Chiang et al. 2018; Zhang et al. 2021; Cong et al. 2022). Therefore, we examined the genetic structure of the samples in this study. First, we used ADMIXTURE 1.3 (Alexander et al. 2009) to characterize the genetic structure of the 4,013 unrelated samples in the NyuWa cohort (northern, southern, and other Han Chinese, described in supplementary table S9, Supplementary Material online), with 504 unrelated East Asian samples from the 1KGP as references. We applied PLINK (Purcell et al. 2007) to reduce linkage disequilibrium (LD) by excluding high-LD regions (Anderson et al. 2010) and thinning the single nucleotide polymorphisms (SNPs) to at least 2 kb apart. Only SNPs with minor allele frequency (MAF) greater than 0.01 were used as input to ADMIXTURE, and the number of ancestral populations (K) was set to 3 (Zhang et al. 2021). Second, we performed principal component analysis (PCA) to deconstruct population structure, using the same SNP filtering pipeline as in the ADMIXTURE analysis.
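The SNP filtering above was done with PLINK; the sketch below only illustrates the logic of the MAF filter and the 2 kb thinning as a greedy, per-chromosome pass. PLINK's own thinning algorithm may differ, exclusion of high-LD regions is not modeled, and the input tuples are hypothetical.

```python
def thin_snps(snps, min_maf=0.01, min_spacing_bp=2000):
    """Keep SNPs with MAF > min_maf that are at least min_spacing_bp apart per chromosome.

    snps: iterable of (chrom, pos, maf). Greedy left-to-right pass on sorted positions.
    """
    kept = []
    last_kept = {}  # chromosome -> position of the last retained SNP
    for chrom, pos, maf in sorted(snps, key=lambda s: (s[0], s[1])):
        if maf <= min_maf:
            continue  # fails the MAF filter
        prev = last_kept.get(chrom)
        if prev is None or pos - prev >= min_spacing_bp:
            kept.append((chrom, pos, maf))
            last_kept[chrom] = pos
    return kept


snps = [("chr1", 1000, 0.2), ("chr1", 2500, 0.3), ("chr1", 3000, 0.005),
        ("chr1", 3100, 0.4), ("chr2", 100, 0.1)]
print(thin_snps(snps))
```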

We also applied PCA to detect batch effects between cohorts. To balance sample sizes between the Han Chinese from the NyuWa Genome Project and the populations from 1KGP and HGDP, we downsampled the NyuWa Han Chinese by randomly selecting 200 of the 4,013 individuals. The different geographical populations are well separated on PC1 and PC2 (supplementary fig. S31, Supplementary Material online), indicating that the batch effect is negligible compared with the differences between intercontinental populations. To further identify any genetic bias in genome-wide variants between northern Han Chinese in the NyuWa Genome Project and CHB in 1KGP, and between southern Han Chinese in the NyuWa Genome Project and CHS in 1KGP, we calculated the inbreeding coefficient for these group pairs. The genetic biases between the two cohorts were found to be located in ENCODE blacklist regions (supplementary fig. S32, supplementary table S10, Supplementary Material online), which were removed without affecting further analysis.

Demographic history plays a crucial role in the adaptive evolution of human populations. Therefore, we estimated the joint demography of northern and southern Han Chinese using SMC++ (Terhorst et al. 2017) to reconstruct the population size history of the Han Chinese. Two high-depth WGS samples (WGC107476D: ∼51×; WGC107477D: ∼44×) were selected as the distinguished pair for northern Han Chinese, with an additional 86 samples as supplements. Two high-depth WGS samples (17-G-300: ∼57×; 17-G-369: ∼56×) were selected as the distinguished pair for southern Han Chinese, with an additional 86 samples as supplements. We ran SMC++ assuming a mutation rate of 1.25 × 10⁻⁸ per site per generation, inferring effective population size from 38 generations (approximately 1,000 yr, assuming a generation time of 26 yr) to 40,000 generations (over 1,000,000 yr) ago.

Assessments on Statistical Power Interference of Sample Size and Population Structure

We detected positive selection mainly on the basis of two genomic signatures: haplotype extension and changes in the SFS. Tajima's D and iHS were therefore chosen to examine the impact of sample size and population structure on positive selection detection. First, to evaluate the effect of sample size on the statistical power to detect positive selection, we randomly sampled 100, 200, and 300 northern Han Chinese (a group lacking population structure) six times each as test datasets. Second, to test the effect of population structure, we designed six repeated tests with a total sample size of 400, in which the number of northern Han Chinese was 50, 100, 200, or 300 and the remaining samples were randomly drawn from the southern Han Chinese.
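The two resampling designs can be sketched as follows. We assume six replicates per sample size and per northern/southern mixing ratio; sample IDs, the random seed, and the function names are hypothetical.

```python
import random


def sample_size_replicates(northern_ids, sizes=(100, 200, 300), n_reps=6, seed=1):
    """Six random subsets of northern Han Chinese for each sample size."""
    rng = random.Random(seed)
    return {n: [rng.sample(northern_ids, n) for _ in range(n_reps)] for n in sizes}


def structure_replicates(northern_ids, southern_ids, total=400,
                         northern_sizes=(50, 100, 200, 300), n_reps=6, seed=1):
    """Fixed total size; the northern share varies and the rest is drawn from southern samples."""
    rng = random.Random(seed)
    design = {}
    for n_north in northern_sizes:
        design[n_north] = [
            rng.sample(northern_ids, n_north) + rng.sample(southern_ids, total - n_north)
            for _ in range(n_reps)
        ]
    return design


# Hypothetical sample IDs.
north = [f"N{i}" for i in range(1000)]
south = [f"S{i}" for i in range(1000)]
design = structure_replicates(north, south)
print({k: len(v) for k, v in design.items()})
```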

Regulatory Element Annotations

The well-annotated CREs (including genomic locations and enhancer–gene functional associations) were downloaded from EpiMap (Boix et al. 2021) (http://compbio.mit.edu/epimap/). We assigned a promoter to a gene when the distance between the promoter and the nearest transcript of the downstream gene was no more than 1 kb. We removed the tissues or cells of the “Cancer” or “Other” types from the 833 tissues or cells in EpiMap, and merged the remaining ones into 201 tissues or cells (supplementary table S11, Supplementary Material online) by intersecting the CREs of the same tissue or cell type. In total, 189,887 promoters with an average length of 229 bases occupied 1.47% of the human genome, and 1,356,695 enhancers with an average length of 221 bases occupied 10.00% of the genome.
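A sketch of the 1 kb promoter–gene assignment rule, assuming closed genomic intervals and a simple strand-aware notion of "downstream" (TSS at or after the promoter start on the + strand, at or before the promoter end on the − strand). Gene names, coordinates, and function names are hypothetical.

```python
def distance_to_tss(start, end, tss):
    """Distance from a closed interval [start, end] to a TSS; 0 if the TSS is inside."""
    if tss < start:
        return start - tss
    if tss > end:
        return tss - end
    return 0


def assign_promoter(promoter, transcripts, max_dist=1000):
    """Return the gene whose nearest downstream TSS lies within max_dist bp, else None.

    promoter: (chrom, start, end); transcripts: iterable of (gene, chrom, tss, strand).
    """
    chrom, start, end = promoter
    best = None
    for gene, t_chrom, tss, strand in transcripts:
        if t_chrom != chrom:
            continue
        # Keep only TSSs downstream of the promoter in the gene's transcription direction.
        if (strand == "+" and tss < start) or (strand == "-" and tss > end):
            continue
        d = distance_to_tss(start, end, tss)
        if d <= max_dist and (best is None or d < best[1]):
            best = (gene, d)
    return best[0] if best else None


transcripts = [
    ("GENE_A", "chr1", 10500, "+"),
    ("GENE_B", "chr1", 12000, "+"),
    ("GENE_C", "chr1", 9500, "-"),
]
print(assign_promoter(("chr1", 10000, 10229), transcripts))
```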

Linkage Disequilibrium and Recombination Rate Map

LD and recombination rate across the genome are important parameters in population genetic studies and reveal genomic interactions at the population scale. We used PopLDdecay (Zhang et al. 2019) to generate LD decay plots for the NyuWa Han Chinese and for the East Asian (725 samples), European (656 samples), and African (759 samples) populations from the integrated 1KGP and HGDP datasets. When the distance between sites reaches 25 kb, the LD between them drops below 0.1 (supplementary fig. S33, Supplementary Material online), indicating that interactions between sites more than 25 kb apart can be ignored. We used LDBlockShow (Dong et al. 2021) to visualize LD in genomic regions of interest. To estimate genome-wide recombination rates, we applied FastEPRR (Gao et al. 2016) with a sliding window size of 5 kb and a step size of 1 kb.
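The idea behind an LD decay curve can be illustrated by averaging pairwise r² within distance bins and reporting the first bin whose mean falls below 0.1. This brute-force sketch is quadratic in the number of sites and is not how PopLDdecay is implemented; the 5 kb bin size, the toy haplotypes, and the function names are assumptions.

```python
import math
from collections import defaultdict


def r_squared(hap_a, hap_b):
    """Squared Pearson correlation between two 0/1 allele vectors (NaN if either is monomorphic)."""
    n = len(hap_a)
    mean_a, mean_b = sum(hap_a) / n, sum(hap_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(hap_a, hap_b)) / n
    var_a = sum((a - mean_a) ** 2 for a in hap_a) / n
    var_b = sum((b - mean_b) ** 2 for b in hap_b) / n
    if var_a == 0 or var_b == 0:
        return float("nan")
    return cov * cov / (var_a * var_b)


def ld_decay(positions, haplotypes, bin_size=5000, max_dist=100_000):
    """Mean r^2 per distance bin (keyed by bin start). haplotypes[i] holds site i's alleles."""
    sums, counts = defaultdict(float), defaultdict(int)
    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            d = abs(positions[j] - positions[i])
            if d > max_dist:
                continue
            r2 = r_squared(haplotypes[i], haplotypes[j])
            if math.isnan(r2):
                continue
            b = d // bin_size
            sums[b] += r2
            counts[b] += 1
    return {b * bin_size: sums[b] / counts[b] for b in sorted(counts)}


def decay_distance(decay, cutoff=0.1):
    """Start of the first distance bin whose mean r^2 drops below the cutoff."""
    for start, mean_r2 in decay.items():
        if mean_r2 < cutoff:
            return start
    return None


positions = [0, 1000, 50_000]
haplotypes = [[1, 1, 0, 0], [1, 1, 0, 0], [1, 0, 1, 0]]
decay = ld_decay(positions, haplotypes)
print(decay, decay_distance(decay))
```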

Identification of Adaptive Selections

We performed a genome-wide scan for adaptive selection with nine statistical tests (supplementary table S1, Supplementary Material online), of which iHS (Voight et al. 2006), nSL (Ferrer-Admetlla et al. 2014), Tajima's D (Tajima 1989), Fst (Meirmans and Hedrick 2011), XPEHH (Sabeti et al. 2007), and XPCLR (Chen et al. 2010) were designed to detect recent hard sweeps. To identify more soft sweeps, SDS (Field et al. 2016) and iHH12 (Garud et al. 2015) were also considered. The normalized SDS was calculated for all 4,013 unrelated NyuWa Han Chinese individuals following the pipeline at https://github.com/yairf/SDS. To calculate Fst, XPEHH, and XPCLR, the CEU (Europe) and YRI (Africa) populations from 1KGP were selected as reference populations, and the CHB and CHS populations from 1KGP were selected as test populations. All other adaptive selection statistics were calculated separately in the northern and southern Han Chinese from the NyuWa Genome Resource. To determine ancestral alleles, the ancestral genome was downloaded from https://ftp.ensembl.org/pub/release-106/fasta/ancestral_alleles/homo_sapiens_ancestor_GRCh38.tar.gz. To focus on genomic regions under positive rather than negative selection, common loci with MAF greater than 1% were chosen as foci for the statistical tests. Normalized iHS, nSL, iHH12, and XPEHH were scanned with selscan (v1.3.0), and sites with FDR (Benjamini–Hochberg, BH) less than 0.05 (two-tailed for iHS and nSL; right-tailed for iHH12 and XPEHH) were considered potential positive selection signals. We used vcftools to calculate Tajima's D (50 kb windows; 50 kb step) and Fst (50 kb windows; 25 kb step), as well as θπ (50 kb windows; 50 kb step) to reduce false positives from the first two statistics.
Windows with extremely reduced polymorphism (for Tajima's D: Tajima's D < −2 and the lowest 1% of θπ; for Fst: the highest 1% of Fst and the lowest 1% of θπ(obj)/θπ(ref)) were considered significant positive selection signals. We used https://github.com/hardingnj/xpclr to calculate XPCLR (50 kb windows; 10 kb step), and windows with an XPCLR FDR (BH) less than 0.05 (right-tailed) were considered significant positive selection signals. To identify long-term balancing selection signals, foci with MAF greater than 0.15 were used to calculate the Beta statistic (Siewert and Voight 2017), and sites with FDR (BH) less than 0.05 (right-tailed) that simultaneously had Tajima's D greater than 2 were considered significant balancing selection signals.
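The FDR step can be illustrated with a Benjamini–Hochberg adjustment. As a simplifying assumption, normalized scores are treated as standard normal under neutrality to obtain two-tailed (iHS, nSL) or right-tailed (iHH12, XPEHH) p-values; the null distributions actually used may differ, and the scores below are hypothetical.

```python
import math


def two_tailed_p(z):
    """Two-tailed p-value of a standard-normal score (used here for iHS, nSL)."""
    return math.erfc(abs(z) / math.sqrt(2))


def right_tailed_p(z):
    """Right-tailed p-value of a standard-normal score (used here for iHH12, XPEHH)."""
    return 0.5 * math.erfc(z / math.sqrt(2))


def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values (q-values), returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity and capping at 1.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


scores = [4.2, -3.9, 1.1, 0.3]  # hypothetical normalized iHS values
q = bh_adjust([two_tailed_p(z) for z in scores])
print([z for z, qv in zip(scores, q) if qv < 0.05])
```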

Composite of Multiple Positive Selections

To locate robust, high-resolution selection signals across the genome, we further calculated the composite of multiple signals (CMS) (Grossman et al. 2010) from the positive selection signals computed above:

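$$\mathrm{CMS} = \prod_{i=1}^{n} \frac{P(S_i \mid \mathrm{Selected}) \times \pi}{P(S_i \mid \mathrm{Selected}) \times \pi + P(S_i \mid \mathrm{Neutral}) \times (1 - \pi)}$$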

where $P(S_i \mid \mathrm{Selected})$ is the probability of the score of the $i$-th selection statistic in adaptive regions, and $P(S_i \mid \mathrm{Neutral})$ is the probability of that score outside adaptive regions. $\pi$ is the prior probability of selection; in this work, it integrates the evolutionary conservation and functional genomic features of sequences:

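$$\pi = l \times \prod_{i=1}^{6} \begin{cases} 0.8, & \mathrm{state}_i = \mathrm{True} \\ 0.2, & \mathrm{state}_i = \mathrm{False} \end{cases}$$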

where $l$ is the LINSIGHT score (Huang et al. 2017), indicating sequence conservation, and the six following factors represent the assumed prior weights for six functional genomic annotations: (1) protein-coding and UTR regions defined in GENCODE (v39); (2) transcription factor binding sites (TFBSs) defined in Cistrome (Zheng et al. 2019); (3) promoters and (4) enhancers defined in EpiMap (Boix et al. 2021); (5) CTCF binding sites, defined by the overlap of CTCF binding peaks from ENCODE (Vierstra et al. 2020) with CTCF binding sites scanned by FIMO (Grant et al. 2011) using the CTCF motif downloaded from JASPAR (Fornes et al. 2020); and (6) eQTLs and sQTLs found in GTEx (The GTEx Consortium 2020). When a variant mapped to one of these six annotations, a weight of 0.8 was multiplied into the prior probability of selection for that annotation; otherwise, a weight of 0.2 was multiplied.
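The CMS score and its prior translate directly into code. In this sketch the likelihoods $P(S_i \mid \mathrm{Selected})$ and $P(S_i \mid \mathrm{Neutral})$ are passed in as plain numbers (in practice they would come from score distributions in selected and neutral regions); the variant, its annotation flags, and the function names are hypothetical.

```python
def prior_selection(linsight, states, w_true=0.8, w_false=0.2):
    """pi = l * product of six annotation weights (0.8 if the variant is in the annotation, else 0.2)."""
    if len(states) != 6:
        raise ValueError("expected six annotation states")
    pi = linsight
    for in_annotation in states:
        pi *= w_true if in_annotation else w_false
    return pi


def cms(p_selected, p_neutral, pi):
    """Product over statistics of P(S_i|Sel)*pi / (P(S_i|Sel)*pi + P(S_i|Neu)*(1 - pi))."""
    score = 1.0
    for ps, pn in zip(p_selected, p_neutral):
        score *= ps * pi / (ps * pi + pn * (1 - pi))
    return score


# Hypothetical variant: LINSIGHT 0.5, inside an enhancer (4) and a GTEx QTL (6) only.
pi = prior_selection(0.5, [False, False, False, True, False, True])
print(pi, cms([2.0, 3.0, 1.5], [1.0, 1.0, 1.0], pi))
```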

Definition of Adaptive Regions and Identification of Candidate Adaptive CREs

We designated the genomic regions affected by adaptive selection as “adaptive regions”. These regions are either under adaptive selection directly, responding to environmental changes and causing fitness changes, or passively affected by adaptive selection owing to genetic linkage to the causal adaptive variants. For window-based selection signals, we defined the adaptive regions as the windows in which selection was significant. For locus-based selection signals, we defined the adaptive regions as the 25 kb upstream and downstream of the loci at which selection was significant. When counting genomic elements within adaptive regions, an element was counted only if at least 99% of its length fell within an adaptive region.
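A single-chromosome sketch of the adaptive-region rules: ±25 kb around significant loci, merged into non-overlapping intervals, with an element counted only when at least 99% of its length is covered. Half-open coordinates, the example loci, and the function names are assumptions.

```python
def merge_intervals(intervals):
    """Merge overlapping half-open intervals on one chromosome."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def regions_from_loci(positions, flank=25_000):
    """Adaptive regions: 25 kb upstream and downstream of each significant locus."""
    return merge_intervals([(max(0, p - flank), p + flank) for p in positions])


def covered_fraction(element, regions):
    """Fraction of an element's length covered by non-overlapping regions."""
    start, end = element
    covered = sum(max(0, min(end, r_end) - max(start, r_start)) for r_start, r_end in regions)
    return covered / (end - start)


def count_elements(elements, regions, min_fraction=0.99):
    """Count elements with at least 99% of their length inside adaptive regions."""
    return sum(1 for e in elements if covered_fraction(e, regions) >= min_fraction)


regions = regions_from_loci([100_000, 130_000])  # hypothetical significant loci
elements = [(150_000, 150_220), (154_900, 155_120), (10, 230)]
print(regions, count_elements(elements, regions))
```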

We defined candidate adaptive CREs as CREs located in adaptive regions, supported by significant selection signals, and conserved in primates. First, an enhancer or promoter in an adaptive region was considered neutral and removed if its median values of the locus-based statistics (SDS, iHS, nSL, iHH12, XPEHH, and Beta) did not satisfy the significance thresholds. Second, for window-based statistics such as XPCLR, Fst, and Tajima's D, we extracted the windows meeting the significance thresholds, extended them from their flanks to a length of 500 kb, and then calculated the iSAFE statistic (Akbari et al. 2018) to search for possible positively selected variants. Common variants with an iSAFE score greater than 0.1 in these CREs were considered candidate positively selected variants. Finally, CREs in conserved regions with a LINSIGHT score greater than 0.106 (a threshold close to the 0.95 quantile of genome-wide polymorphic sites and equal to the median LINSIGHT score of conserved TFBSs; Huang et al. 2017) were determined to be positively selected CREs.
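One way the CRE-level filter could be coded is shown below. We assume a CRE is kept when any locus-based statistic has a significant median or any of its variants has iSAFE > 0.1, and its LINSIGHT score exceeds 0.106. The per-statistic rule (|median| > 2) is a placeholder, since the study used FDR-based cutoffs; the dictionary layout and names are hypothetical.

```python
from statistics import median

ISAFE_MIN = 0.1
LINSIGHT_MIN = 0.106

# Placeholder significance rules applied to a CRE's median score (hypothetical cutoffs).
EXAMPLE_THRESHOLDS = {
    "iHS": lambda m: abs(m) > 2.0,
    "nSL": lambda m: abs(m) > 2.0,
}


def is_candidate_cre(cre, thresholds=EXAMPLE_THRESHOLDS):
    """cre: {'stats': {name: [per-variant scores]}, 'isafe': [scores], 'linsight': float}."""
    locus_support = any(
        name in thresholds and len(values) > 0 and thresholds[name](median(values))
        for name, values in cre["stats"].items()
    )
    window_support = any(score > ISAFE_MIN for score in cre.get("isafe", []))
    return (locus_support or window_support) and cre["linsight"] > LINSIGHT_MIN


example = {"stats": {"iHS": [2.5, 3.1, 1.0]}, "isafe": [], "linsight": 0.2}
print(is_candidate_cre(example))
```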

Identification of Potential Adaptive Variants

We further identified potential adaptive variants by combining their sequence evolution characteristics and functional genomic characteristics. First, we selected variants located in adaptive CREs for which any locus-based selection statistic was significant, or whose iSAFE score was greater than 0.1 within significantly selected windows. Second, we retained significantly selected variants that are evolutionarily conserved, with a LINSIGHT score greater than 0.106. Next, we considered only significantly selected variants overlapping eQTLs or sQTLs in the GTEx database (The GTEx Consortium 2020) (https://gtexportal.org/home/), whose alleles significantly change gene expression or splicing. Finally, when an adaptive eQTL or sQTL fell in a TF-binding site or chromatin loop anchor, we considered the variant causal for the adaptive selection.
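The four variant-level steps can be expressed as a small decision function. The `Variant` fields are hypothetical flags assumed to be precomputed from the selection scans, LINSIGHT, GTEx, and TFBS or chromatin loop annotations; the IDs are illustrative.

```python
from dataclasses import dataclass
from typing import Optional

ISAFE_MIN = 0.1
LINSIGHT_MIN = 0.106


@dataclass
class Variant:
    vid: str                      # hypothetical identifier
    in_adaptive_cre: bool
    locus_stat_significant: bool  # any locus-based statistic significant
    isafe: Optional[float]        # iSAFE score if inside a significant window, else None
    linsight: float
    overlaps_qtl: bool            # overlaps a GTEx eQTL or sQTL
    in_tfbs_or_loop_anchor: bool


def classify_variant(v: Variant) -> str:
    # Step 1: inside an adaptive CRE and supported by a locus- or window-based signal.
    if not v.in_adaptive_cre:
        return "not selected"
    selected = v.locus_stat_significant or (v.isafe is not None and v.isafe > ISAFE_MIN)
    if not selected:
        return "not selected"
    # Step 2: evolutionary conservation.
    if v.linsight <= LINSIGHT_MIN:
        return "not conserved"
    # Step 3: functional support from GTEx QTLs.
    if not v.overlaps_qtl:
        return "no QTL support"
    # Step 4: TF-binding site or chromatin loop anchor.
    return "causal candidate" if v.in_tfbs_or_loop_anchor else "adaptive QTL"


print(classify_variant(Variant("var1", True, False, 0.3, 0.2, True, True)))
```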

Singleton Enrichment and Mutational Constraints for the Adaptive CREs

To analyze differences in evolutionary force between CRE groups, we performed a singleton enrichment analysis. Flanking regions of equal length to each CRE were selected as background regions, and we calculated the relative singleton abundance of the CRE compared with its background regions. Let $E$ be the CRE to be tested and $B$ the background ($B_u$, the upstream flanking region of $E$, or $B_d$, the downstream flanking region of $E$). Let $S_E$ and $S_B$ be the numbers of singletons falling in regions $E$ and $B$, respectively, and let $N_E$ and $N_B$ be the numbers of nonsingleton variants falling in these regions. We calculated the odds ratio $(S_E/N_E)/(S_B/N_B)$ for all promoters and enhancers. An odds ratio greater than 1 indicates that the CRE is enriched for singletons and tolerant of mutations, whereas an odds ratio less than 1 indicates that the CRE is depleted of singletons and intolerant of mutations (Telis et al. 2020). We used the Mann–Whitney U test to compare singleton enrichment between CRE groups.
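A minimal sketch of the singleton odds ratio. Whether the upstream and downstream flanks are scored separately or pooled is not specified above; here the caller passes background counts, so pooling ($S_{B_u} + S_{B_d}$, $N_{B_u} + N_{B_d}$) is one option. The counts and function names are hypothetical.

```python
def singleton_odds_ratio(s_e, n_e, s_b, n_b):
    """(S_E/N_E) / (S_B/N_B); None when a denominator would be zero."""
    if n_e == 0 or n_b == 0 or s_b == 0:
        return None
    return (s_e / n_e) / (s_b / n_b)


def interpret(odds_ratio):
    """Label a CRE by the direction of its singleton odds ratio."""
    if odds_ratio is None:
        return "undefined"
    if odds_ratio > 1:
        return "singleton-enriched (mutation tolerant)"
    if odds_ratio < 1:
        return "singleton-depleted (mutation intolerant)"
    return "no difference"


# Hypothetical counts: CRE versus pooled upstream and downstream flanks.
s_bu, n_bu, s_bd, n_bd = 3, 10, 2, 10
print(interpret(singleton_odds_ratio(10, 20, s_bu + s_bd, n_bu + n_bd)))
```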

To evaluate the mutational constraint of adaptive CREs, we downloaded genomic constraint scores for 1 kb regions from the gnomAD database (https://gnomad.broadinstitute.org/). We then estimated the constraint scores of promoters and enhancers by mapping the gnomAD regional constraint scores onto CREs with an overlap rate of 99%. Adaptive CREs with a constraint score greater than 4 are under strong selective pressure, have a significantly lower mutation rate than expected, and are evolving more slowly, whereas adaptive CREs with a constraint score less than −4 are under relaxed selective pressure, have a significantly higher mutation rate than expected, and are evolving more rapidly.
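A sketch of mapping 1 kb regional constraint scores onto CREs and applying the ±4 thresholds. We assume a CRE receives the score of the single window that covers at least 99% of its length and is left unscored otherwise; the window tuples and function names are hypothetical.

```python
def constraint_for_cre(cre, windows, min_fraction=0.99):
    """Score of the 1 kb window covering >= 99% of the CRE, or None.

    cre: (start, end), half-open; windows: iterable of (start, end, score).
    """
    start, end = cre
    length = end - start
    for w_start, w_end, score in windows:
        overlap = max(0, min(end, w_end) - max(start, w_start))
        if overlap / length >= min_fraction:
            return score
    return None


def constraint_class(score):
    """Apply the +/-4 thresholds described in the text."""
    if score is None:
        return "unscored"
    if score > 4:
        return "constrained (slower evolution)"
    if score < -4:
        return "relaxed (faster evolution)"
    return "intermediate"


windows = [(0, 1000, 5.1), (1000, 2000, -4.5)]  # hypothetical scores
for cre in [(100, 320), (1100, 1300), (900, 1120)]:
    print(cre, constraint_class(constraint_for_cre(cre, windows)))
```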

Functional Enrichment Analysis of Adaptive CREs

GO enrichment analysis was performed to identify the major pathways of promoters and enhancers that evolved during recent human adaptive evolution. GO annotations were downloaded from http://geneontology.org/. Using the enhancer–gene relationships annotated by EpiMap (Boix et al. 2021) and the GENCODE (version 39) genes adjacent to promoters, we recorded the active genes regulated by promoters or enhancers in each tissue. These genes were mapped to the 201 tissues and cells according to the regulator–gene relationships and were used as the background for the enrichment analysis. Fisher's exact test was performed for promoters and enhancers under selection in each tissue or cell against their background, and GO terms with FDR less than 0.001 were considered significant for adaptation. When tissue-based functional enrichment analysis failed to identify enriched pathways because the gene set was too small, we performed functional enrichment analysis on that gene set directly against all human genes.
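For reference, the one-sided Fisher's exact test for GO term enrichment reduces to a hypergeometric tail sum. This standard-library sketch assumes the 2×2 table compares selected genes with the remaining background genes; SciPy's `scipy.stats.fisher_exact(table, alternative='greater')` should give the same one-sided p-value. The function name is illustrative.

```python
from math import comb


def fisher_right_tail(a, b, c, d):
    """One-sided (enrichment) Fisher's exact p-value for the table [[a, b], [c, d]].

    a: selected genes with the GO term      b: selected genes without it
    c: other background genes with the term d: other background genes without it
    """
    n = a + b + c + d
    row1 = a + b  # selected genes
    col1 = a + c  # genes annotated with the term
    # P(X >= a) under the hypergeometric distribution with fixed margins.
    numerator = sum(comb(col1, x) * comb(n - col1, row1 - x)
                    for x in range(a, min(row1, col1) + 1))
    return numerator / comb(n, row1)


print(fisher_right_tail(3, 1, 1, 3))  # 17/70
```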

Identification of Archaic Introgression

We used ArchaicSeeker2.0 (Yuan et al. 2021) to identify neanderthal and denisovan introgression in modern human genomes, including the 4,013 unrelated Han Chinese individuals from the NyuWa Genome Project and nonadmixed populations from East Asia, South Asia, and Europe in the 1KGP. The latest chimpanzee assembly (GCF_002880755.1) was used as the outgroup, and hg38 was used as the reference genome. We defined the genomic regions with the highest 0.1% of archaic introgression rates as highly archaic-introgressed regions (Fig. 4a, supplementary fig. S21, Supplementary Material online).
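Selecting the top 0.1% of regions by introgression rate amounts to a quantile cutoff. The sketch assumes per-region rates have already been summarized from the ArchaicSeeker2.0 output into a dictionary; region IDs are hypothetical, and ties at the cutoff are kept.

```python
def top_fraction_regions(region_rates, fraction=0.001):
    """IDs of regions whose introgression rate is within the top `fraction` (ties kept)."""
    rates = sorted(region_rates.values(), reverse=True)
    k = max(1, round(len(rates) * fraction))  # number of regions in the top fraction
    cutoff = rates[k - 1]
    return {rid for rid, rate in region_rates.items() if rate >= cutoff}


# Hypothetical rates for 2,000 regions.
rates = {f"r{i}": i / 2000 for i in range(2000)}
print(sorted(top_fraction_regions(rates)))
```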

Functional Genomic Impact Assessment of Genomic Variants

We assessed candidate adaptive variants at three levels of molecular phenotypic impact to identify likely causal adaptive variants. First, we used the GTEx database to examine tissue-specific differences in gene expression between genotypes. Second, we applied the deep learning model Sei (Chen et al. 2022) to predict changes in regulatory activity across tissue types when the mutation is introduced into the ancestral DNA sequence. Finally, we used FABIAN-variant (Steinhaus et al. 2022) to assess the degree to which DNA variants affect binding at known human transcription factor binding sites (TFBSs).

Haploblock Definition

We defined highly linked haploblocks in local genomic regions. Biallelic variants with MAF ≥ 5% in the NyuWa Han Chinese were used for LD calculation. Based on LD scores across the human genome, haploblocks were required to meet four criteria: (1) a haploblock contains at least three polymorphic sites; (2) in large haploblocks with more than 20 polymorphic sites, two isolated sites are tolerated; (3) the average LD score within a haploblock is higher than 0.95; and (4) extension sites upstream or downstream of the core haploblock (defined by criterion 3) have an average LD score higher than 0.8 with the polymorphic sites in the core haploblock.
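The four partitioning rules can be approximated by a greedy scan over a pairwise LD matrix. This sketch assumes that "LD score" means pairwise r², computes r² from phased 0/1 haplotypes, and omits rule (2) (tolerating isolated sites in large blocks) for brevity; it illustrates the thresholds rather than reproducing the authors' implementation.

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def maf_ok(site: Sequence[int], min_maf: float = 0.05) -> bool:
    """Keep biallelic sites (alleles coded 0/1 per haplotype) with MAF >= min_maf."""
    p = sum(site) / len(site)
    return min(p, 1 - p) >= min_maf


def r2(a: Sequence[int], b: Sequence[int]) -> float:
    """Pairwise LD (r^2) between two 0/1-coded sites over the same haplotypes."""
    n = len(a)
    pa, pb = sum(a) / n, sum(b) / n
    pab = sum(x & y for x, y in zip(a, b)) / n
    d = pab - pa * pb
    denom = pa * (1 - pa) * pb * (1 - pb)
    return d * d / denom if denom > 0 else 0.0


def _mean_within(ld: List[List[float]], idx: List[int]) -> float:
    pairs = list(combinations(idx, 2))
    return sum(ld[i][j] for i, j in pairs) / len(pairs)


def _mean_to_core(ld: List[List[float]], k: int, core: List[int]) -> float:
    return sum(ld[k][c] for c in core) / len(core)


def haploblocks(ld: List[List[float]], core_min: float = 0.95, ext_min: float = 0.8,
                min_sites: int = 3) -> List[Tuple[int, int]]:
    """Greedy left-to-right partition into (first, last) site-index blocks.

    Simplification: rule (2), tolerating isolated sites in large blocks, is omitted.
    """
    m = len(ld)
    blocks: List[Tuple[int, int]] = []
    i = 0
    while i < m:
        # Rule (3): grow the core while its mean pairwise r^2 stays above core_min.
        j = i + 1
        while j < m and _mean_within(ld, list(range(i, j + 1))) > core_min:
            j += 1
        core = list(range(i, j))
        if len(core) < min_sites:  # Rule (1)
            i += 1
            continue
        # Rule (4): absorb flanking sites whose mean r^2 with the core exceeds ext_min.
        lo, hi = i, j - 1
        prev_end = blocks[-1][1] if blocks else -1
        while lo - 1 > prev_end and _mean_to_core(ld, lo - 1, core) > ext_min:
            lo -= 1
        while hi + 1 < m and _mean_to_core(ld, hi + 1, core) > ext_min:
            hi += 1
        blocks.append((lo, hi))
        i = hi + 1
    return blocks
```

In practice `ld[i][j]` would be filled with `r2(sites[i], sites[j])` after filtering sites with `maf_ok`.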

Allele Frequency Trajectories Estimation

We used an approximate full-likelihood method (Stern et al. 2019) to infer selection and allele frequency trajectories at adaptive sites. Trajectories were inferred from the haploblock containing each adaptive locus, which minimizes interference from recombination in the inference of population history. Insertions and deletions in the haploblock were excluded. To prepare input for the approximate full-likelihood calculation, we used Relate (Speidel et al. 2019) to estimate the genealogy and population history of the haploblock, with the effective population size of the Han Chinese set to 20,000, the mutation rate to 1.25e−8, the generation time to 28 years, and the number of branch-length samples to 100; other parameters were left at their defaults. Finally, allele frequency trajectories were estimated over the most recent 1,000 generations.
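With a generation time of 28 years, the 1,000-generation window covers roughly 1,000 × 28 = 28,000 years. The helper below converts a trajectory indexed by generations before present into years; the list layout is a hypothetical assumption about how a trajectory might be stored, not the output format of the method.

```python
from typing import List, Tuple

YEARS_PER_GENERATION = 28  # generation time stated in the text


def generations_to_years(generations: float,
                         years_per_generation: float = YEARS_PER_GENERATION) -> float:
    """Convert a time in generations before present into years before present."""
    return generations * years_per_generation


def label_trajectory(freqs: List[float]) -> List[Tuple[float, float]]:
    """Pair each frequency with its age in years before present.

    Hypothetical input layout: freqs[g] is the estimated allele frequency g generations ago.
    """
    return [(generations_to_years(g), f) for g, f in enumerate(freqs)]
```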

Phylogenetic Tree Reconstruction

To describe the relationships among the haplotypes of the 4,013 unrelated Han Chinese individuals from the NyuWa Genome Project, we reconstructed phylogenetic trees for the adaptive haploblocks of interest, excluding insertions and deletions. We used MEGA X (Kumar et al. 2018) to reconstruct each tree with the neighbor-joining method under the Kimura 2-parameter model, assuming uniform rates among sites. The trees were visualized with iTOL (Letunic and Bork 2021).
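MEGA X computes the Kimura 2-parameter (K2P) distance internally before building the neighbor-joining tree. For reference, the K2P distance between two aligned sequences with transition proportion P and transversion proportion Q is d = −½ ln(1 − 2P − Q) − ¼ ln(1 − 2Q). The sketch below computes it for haplotypes encoded as strings of bases at SNV sites (an assumed encoding, consistent with excluding indels) and does not reproduce the neighbor-joining step itself.

```python
import math
from typing import Dict, List

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(s1: str, s2: str) -> float:
    """Kimura 2-parameter distance between two aligned sequences.

    Non-ACGT characters (gaps, ambiguity codes) are skipped site by site.
    """
    transitions = transversions = valid = 0
    for a, b in zip(s1.upper(), s2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        valid += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1  # A<->G or C<->T
        else:
            transversions += 1
    if valid == 0:
        raise ValueError("no comparable sites")
    p, q = transitions / valid, transversions / valid
    w1, w2 = 1 - 2 * p - q, 1 - 2 * q
    if w1 <= 0 or w2 <= 0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * math.log(w1) - 0.25 * math.log(w2)


def distance_matrix(haplotypes: Dict[str, str]) -> List[List[float]]:
    """Pairwise K2P distances, in the insertion order of `haplotypes`."""
    names = list(haplotypes)
    return [[k2p_distance(haplotypes[a], haplotypes[b]) for b in names] for a in names]
```

A matrix like this is the input that a neighbor-joining implementation clusters into a tree.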

Analysis of Hitchhiking Effect of Disease-related Alleles

Because of LD in the genome, many disease-susceptibility alleles hitchhike with adaptively selected alleles; they are maintained at appreciable frequencies in the population and are difficult to eliminate (Tang et al. 2022). We used the Pearson correlation coefficient between favored alleles and likely pathogenic alleles to quantify this LD-driven hitchhiking of disease risk alleles with adaptive selection:

$$\rho(A,B) = \frac{E[(A-\mu_A)(B-\mu_B)]}{\sigma_A \sigma_B}$$

Here A and B are the allele-state vectors at the two loci across haplotypes: an entry is “1” if the haplotype carries the positively selected allele (for A) or the disease-susceptibility allele (for B), and “0” otherwise; μ and σ denote the mean and standard deviation of each vector. When the correlation test is significant (P < 0.05) and the correlation coefficient is greater than 0.2, haplotypes carrying the favored allele at A preferentially carry the disease-susceptibility allele at B; a significant coefficient less than −0.2 instead indicates that haplotypes carrying the favored allele at A preferentially carry the normal allele at B.
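For two 0/1 indicator vectors, ρ reduces to the signed LD coefficient r between the two alleles. The sketch below computes it and applies the thresholds from the text. Assumption: the P-value uses the large-sample approximation n·r² ~ χ²(1) rather than the exact t-based test of Pearson's r, and the 0.05 significance cutoff is applied to both the positive and negative cases; function names and labels are hypothetical.

```python
import math
from typing import Sequence, Tuple


def allele_correlation(a: Sequence[int], b: Sequence[int]) -> Tuple[float, float]:
    """Pearson rho between two 0/1 haplotype indicator vectors, plus a P-value.

    a[h] = 1 if haplotype h carries the favored allele; b[h] = 1 if it carries the
    disease-susceptibility allele. P-value: large-sample approximation
    n * rho^2 ~ chi2(1) (an assumption; not the exact t-test).
    """
    n = len(a)
    if n != len(b) or n < 3:
        raise ValueError("vectors must have equal length >= 3")
    mu_a, mu_b = sum(a) / n, sum(b) / n
    cov = sum((x - mu_a) * (y - mu_b) for x, y in zip(a, b)) / n
    sd_a = math.sqrt(sum((x - mu_a) ** 2 for x in a) / n)
    sd_b = math.sqrt(sum((y - mu_b) ** 2 for y in b) / n)
    if sd_a == 0 or sd_b == 0:
        return 0.0, 1.0  # monomorphic site: rho undefined, report no association
    rho = cov / (sd_a * sd_b)
    p_value = math.erfc(math.sqrt(n * rho * rho / 2))  # upper tail of chi2(1)
    return rho, p_value


def hitchhiking_call(rho: float, p_value: float, alpha: float = 0.05,
                     rho_min: float = 0.2) -> str:
    """Apply the thresholds from the text (labels are hypothetical)."""
    if p_value < alpha and rho > rho_min:
        return "carries_risk_allele"
    if p_value < alpha and rho < -rho_min:
        return "carries_normal_allele"
    return "no_association"
```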

We performed this Pearson correlation test between the adaptive variants in adaptive CREs and nearby (within 1 Mb) disease-associated sites from the GWAS catalog (screened by phenotype) and pathogenic or likely pathogenic sites from the ClinVar database (tagged “Pathogenic” or “Likely Pathogenic”). Because the reference haplotype data set comes from the NyuWa Han Chinese cohort, only disease risk loci present in the NyuWa Han Chinese were tested.
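Selecting candidate pairs for the test can be sketched as a windowed lookup on sorted positions. Assumptions: "within 1 Mb" is read as ±1 Mb around each adaptive variant, sites are `(chrom, position)` tuples, and presence in the NyuWa panel is supplied as a set; none of these names come from the study.

```python
from bisect import bisect_left, bisect_right
from typing import Dict, List, Set, Tuple

Site = Tuple[str, int]  # (chrom, position)


def candidate_pairs(adaptive: List[Site], disease: List[Site],
                    present_in_cohort: Set[Site],
                    window: int = 1_000_000) -> List[Tuple[Site, Site]]:
    """Pair each adaptive variant with disease sites within +/- `window` bp that are
    also present in the reference haplotype panel (hypothetical inputs)."""
    by_chrom: Dict[str, List[int]] = {}
    for chrom, pos in disease:
        if (chrom, pos) in present_in_cohort:
            by_chrom.setdefault(chrom, []).append(pos)
    for positions in by_chrom.values():
        positions.sort()  # sorted lists allow binary search per adaptive variant
    pairs: List[Tuple[Site, Site]] = []
    for chrom, pos in adaptive:
        positions = by_chrom.get(chrom, [])
        lo = bisect_left(positions, pos - window)
        hi = bisect_right(positions, pos + window)
        pairs.extend(((chrom, pos), (chrom, p)) for p in positions[lo:hi])
    return pairs
```

Each returned pair would then be passed to the correlation test on the NyuWa haplotypes.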

Supplementary Material

msae034_Supplementary_Data

Acknowledgments

We thank the individuals who generously contributed samples to the NyuWa dataset. Data analysis and computing resources were supported by the Center for Big Data Research in Health (http://bigdata.ibp.ac.cn), Institute of Biophysics, Chinese Academy of Sciences. The raw data were deposited at the National Genome Center, China.

Contributor Information

Shuai Liu, Key Laboratory of Epigenetic Regulation and Intervention, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China; College of Life Sciences, University of Chinese Academy of Sciences, Beijing 100049, China.

Huaxia Luo, Key Laboratory of Epigenetic Regulation and Intervention, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China.

Peng Zhang, Key Laboratory of Epigenetic Regulation and Intervention, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China.

Yanyan Li, Key Laboratory of Epigenetic Regulation and Intervention, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China.

Di Hao, Key Laboratory of Epigenetic Regulation and Intervention, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China.

Sijia Zhang, Key Laboratory of Epigenetic Regulation and Intervention, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China; College of Life Sciences, University of Chinese Academy of Sciences, Beijing 100049, China.

Tingrui Song, Key Laboratory of Epigenetic Regulation and Intervention, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China.

Tao Xu, National Laboratory of Biomacromolecules, CAS Center for Excellence in Biomacromolecules, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China; Shandong First Medical University & Shandong Academy of Medical Sciences, Jinan 250117, Shandong, China.

Shunmin He, Key Laboratory of Epigenetic Regulation and Intervention, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China; College of Life Sciences, University of Chinese Academy of Sciences, Beijing 100049, China.

Supplementary Material

Supplementary material is available at Molecular Biology and Evolution online.

Funding

This work was supported by Strategic Priority Research Program of the Chinese Academy of Sciences [XDB38040300 (S.-M.H.)]; National Key R&D Program of China [2021YFF0703701 (S.-M.H.), 2021YFF0704500 (P.Z.)]; 14th Five-year Informatization Plan of Chinese Academy of Sciences [CAS-WX2021SF-0203 (S.-M.H.)]; National Natural Science Foundation of China [91940306 (S.-M.H.), 31871294 (S.-M.H.), 31970647 (P.Z.), 32200478 (Y.-Y.L.)]; special investigation on science and technology basic resources of the MOST, China [2019FY100102 (P.Z.)]; China Postdoctoral Science Foundation [2022M713311 (Y.-Y.L.)].

Data Availability

The DNA sequencing data of the NyuWa genome samples used in this study have been deposited in the Genome Sequence Archive (GSA) in the National Genomics Data Center, China National Center for Bioinformation/Beijing Institute of Genomics, Chinese Academy of Sciences, under accession number HRA004185 (https://ngdc.cncb.ac.cn/gsa-human/). These data are available under restricted access for privacy protection and can be obtained by application on the GSA database website (https://ngdc.cncb.ac.cn/gsa-human/) following the “Request Data” guidance on this website. These data have also been deposited in the National Omics Data Encyclopedia (NODE) of the Bio-Med Big Data Center, Shanghai Institute of Nutrition and Health, Chinese Academy of Sciences, under accession number OEP002803 (http://www.biosino.org/node). Users can register, log in to this website, and follow the “Request for Restricted Data” guidance to request the data.

The reference genome GRCh38 used in this study is available at https://console.cloud.google.com/storage/browser/genomics-public-data/resources/broad/hg38/v0/. The high-LD regions used in this study were obtained from https://genome.sph.umich.edu/wiki/Regions_of_high_linkage_disequilibrium_(LD).

References

  1. Akbari A, Vitti JJ, Iranmehr A, Bakhtiari M, Sabeti PC, Mirarab S, Bafna V. Identifying the favoured mutation in a positive selective sweep. Nat Methods. 2018:15(4):279–282. 10.1038/nmeth.4606.
  2. Alexander DH, Novembre J, Lange K. Fast model-based estimation of ancestry in unrelated individuals. Genome Res. 2009:19(9):1655–1664. 10.1101/gr.094052.109.
  3. Anderson CA, Pettersson FH, Clarke GM, Cardon LR, Morris AP, Zondervan KT. Data quality control in genetic case-control association studies. Nat Protoc. 2010:5(9):1564–1573. 10.1038/nprot.2010.116.
  4. Arnold M, Singh D, Laversanne M, Vignat J, Vaccarella S, Meheus F, Cust AE, de Vries E, Whiteman DC, Bray F. Global burden of cutaneous melanoma in 2020 and projections to 2040. JAMA Dermatol. 2022:158(5):495–503. 10.1001/jamadermatol.2022.0160.
  5. Averbeck M, Gebhardt CA, Voigt S, Beilharz S, Anderegg U, Termeer CC, Sleeman JP, Simon JC. Differential regulation of hyaluronan metabolism in the epidermal and dermal compartments of human skin by UVB irradiation. J Invest Dermatol. 2007:127(3):687–697. 10.1038/sj.jid.5700614.
  6. Aya KL, Stern R. Hyaluronan in wound healing: rediscovering a major player. Wound Repair Regen. 2014:22(5):579–593. 10.1111/wrr.12214.
  7. Benton ML, Abraham A, LaBella AL, Abbot P, Rokas A, Capra JA. The influence of evolutionary history on human health and disease. Nat Rev Genet. 2021:22(5):269–283. 10.1038/s41576-020-00305-9.
  8. Bergeron LA, Besenbacher S, Zheng J, Li P, Bertelsen MF, Quintard B, Hoffman JI, Li Z, St. Leger J, Shao C, et al. Evolution of the germline mutation rate across vertebrates. Nature. 2023:615(7951):285–291. 10.1038/s41586-023-05752-y.
  9. Bergström A, McCarthy SA, Hui R, Almarri MA, Ayub Q, Danecek P, Chen Y, Felkel S, Hallast P, Kamm J, et al. Insights into human genetic variation and population history from 929 diverse genomes. Science. 2020:367(6484):eaay5012. 10.1126/science.aay5012.
  10. Boix CA, James BT, Park YP, Meuleman W, Kellis M. Regulatory genomic circuitry of human disease loci by integrative epigenomics. Nature. 2021:590(7845):300–307. 10.1038/s41586-020-03145-z.
  11. Byrska-Bishop M, Evani US, Zhao X, Basile AO, Abel HJ, Regier AA, Corvelo A, Clarke WE, Musunuri R, Nagulapalli K, et al. High-coverage whole-genome sequencing of the expanded 1000 genomes project cohort including 602 trios. Cell. 2022:185(18):3426–3440.e19. 10.1016/j.cell.2022.08.004.
  12. Cal S, Peinado JR, Llamazares M, Quesada V, Moncada-Pazos A, Garabaya C, López-Otín C. Identification and characterization of human polyserase-3, a novel protein with tandem serine-protease domains in the same polypeptide chain. BMC Biochem. 2006:7(1):9. 10.1186/1471-2091-7-9.
  13. Chang T-G, Yen T-T, Wei C-Y, Hsiao T-H, Chen I-C. Impacts of ADH1B rs1229984 and ALDH2 rs671 polymorphisms on risks of alcohol-related disorder and cancer. Cancer Med. 2023:12(1):747–759. 10.1002/cam4.4920.
  14. Charlesworth D, Willis JH. The genetics of inbreeding depression. Nat Rev Genet. 2009:10(11):783–796. 10.1038/nrg2664.
  15. Chatterjee S, Ahituv N. Gene regulatory elements, Major drivers of human disease. Annu Rev Genomics Hum Genet. 2017:18(1):45–63. 10.1146/annurev-genom-091416-035537.
  16. Chen H, Patterson N, Reich D. Population differentiation as a test for selective sweeps. Genome Res. 2010:20(3):393–402. 10.1101/gr.100545.109.
  17. Chen KM, Wong AK, Troyanskaya OG, Zhou J. A sequence-based global map of regulatory activity for deciphering human genetics. Nat Genet. 2022:54(7):940–949. 10.1038/s41588-022-01102-2.
  18. Chen S, Francioli LC, Goodrich JK, Collins RL, Kanai M, Wang Q, Alföldi J, Watts NA, Vittal C, Gauthier LD, et al. A genomic mutational constraint map using variation in 76,156 human genomes. Nature. 2024:625(7993):92–100. 10.1038/s41586-023-06045-0.
  19. Chiang CWK, Mangul S, Robles C, Sankararaman S. A comprehensive map of genetic variation in the World's largest ethnic group—Han Chinese. Mol Biol Evol. 2018:35(11):2736–2750. 10.1093/molbev/msy170.
  20. Chun S, Fay JC. Evidence for hitchhiking of deleterious mutations within the human genome. PLOS Genet. 2011:7(8):e1002240. 10.1371/journal.pgen.1002240.
  21. Cong P-K, Bai W-Y, Li J-C, Yang M-Y, Khederzadeh S, Gai S-R, Li N, Liu Y-H, Yu S-H, Zhao W-W, et al. Genomic analyses of 10,376 individuals in the westlake BioBank for Chinese (WBBC) pilot project. Nat Commun. 2022:13(1):2939. 10.1038/s41467-022-30526-x.
  22. Deng L, Xu S. Adaptation of human skin color in various populations. Hereditas. 2017:155(1):1. 10.1186/s41065-017-0036-2.
  23. Dong S-S, He W-M, Ji J-J, Zhang C, Guo Y, Yang T-L. LDBlockShow: a fast and convenient tool for visualizing linkage disequilibrium and haplotype blocks based on variant call format files. Brief Bioinform. 2021:22(4):bbaa227. 10.1093/bib/bbaa227.
  24. ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature. 2012:489(7414):57–74. 10.1038/nature11247.
  25. Ferrer-Admetlla A, Liang M, Korneliussen T, Nielsen R. On detecting incomplete soft or hard selective sweeps using haplotype structure. Mol Biol Evol. 2014:31(5):1275–1291. 10.1093/molbev/msu077.
  26. Field Y, Boyle EA, Telis N, Gao Z, Gaulton KJ, Golan D, Yengo L, Rocheleau G, Froguel P, McCarthy MI, et al. Detection of human adaptation during the past 2000 years. Science. 2016:354(6313):760–764. 10.1126/science.aag0776.
  27. Fornes O, Castro-Mondragon JA, Khan A, van der Lee R, Zhang X, Richmond PA, Modi BP, Correard S, Gheorghe M, Baranašić D, et al. JASPAR 2020: update of the open-access database of transcription factor binding profiles. Nucleic Acids Res. 2020:48(D1):D87–D92. 10.1093/nar/gkz1001.
  28. Fraser HB. Gene expression drives local adaptation in humans. Genome Res. 2013:23(7):1089–1096. 10.1101/gr.152710.112.
  29. Gallagher MD, Chen-Plotkin AS. The post-GWAS era: from association to function. Am J Hum Genet. 2018:102(5):717–730. 10.1016/j.ajhg.2018.04.002.
  30. Gao F, Ming C, Hu W, Li H. New software for the fast estimation of population recombination rates (FastEPRR) in the genomic era. G3 (Bethesda). 2016:6(6):1563–1571. 10.1534/g3.116.028233.
  31. Garg HG, Hales CA. Chemistry and biology of hyaluronan. Elsevier Science; 2004. 10.1016/B978-0-08-044382-9.X5030-4.
  32. Garud NR, Messer PW, Buzbas EO, Petrov DA. Recent selective sweeps in north American Drosophila melanogaster show signatures of soft sweeps. PLOS Genet. 2015:11(2):e1005004. 10.1371/journal.pgen.1005004.
  33. Gingeras TR. Origin of phenotypes: genes and transcripts. Genome Res. 2007:17(6):682–690. 10.1101/gr.6525007.
  34. Graça MFP, Miguel SP, Cabral CSD, Correia IJ. Hyaluronic acid-based wound dressings: a review. Carbohydr Polym. 2020:241:116364. 10.1016/j.carbpol.2020.116364.
  35. Grant CE, Bailey TL, Noble WS. FIMO: scanning for occurrences of a given motif. Bioinformatics. 2011:27(7):1017–1018. 10.1093/bioinformatics/btr064.
  36. Grossman SR, Andersen KG, Shlyakhter I, Tabrizi S, Winnicki S, Yen A, Park DJ, Griesemer D, Karlsson EK, Wong SH, et al. Identifying recent adaptations in large-scale genomic data. Cell. 2013:152(4):703–713. 10.1016/j.cell.2013.01.035.
  37. Grossman SR, Shylakhter I, Karlsson EK, Byrne EH, Morales S, Frieden G, Hostetter E, Angelino E, Garber M, Zuk O, et al. A composite of multiple signals distinguishes causal variants in regions of positive selection. Science. 2010:327(5967):883–886. 10.1126/science.1183863.
  38. GTEx Consortium. The GTEx Consortium atlas of genetic regulatory effects across human tissues. Science. 2020:369(6509):1318–1330. 10.1126/science.aaz1776.
  39. Hernandez RD, Kelley JL, Elyashiv E, Melton SC, Auton A, McVean G; 1000 GENOMES PROJECT; Sella G, Przeworski M. Classic selective sweeps were rare in recent human evolution. Science. 2011:331(6019):920–924. 10.1126/science.1198878.
  40. Hu X, Addlagatta A, Lu J, Matthews BW, Liu JO. Elucidation of the function of type 1 human methionine aminopeptidase during cell cycle progression. Proc Natl Acad Sci USA. 2006:103(48):18148–18153. 10.1073/pnas.0608389103.
  41. Huang J, Howie B, McCarthy S, Memari Y, Walter K, Min JL, Danecek P, Malerba G, Trabetti E, Zheng H-F, et al. Improved imputation of low-frequency and rare variants using the UK10K haplotype reference panel. Nat Commun. 2015:6(1):8111. 10.1038/ncomms9111.
  42. Huang Y-F, Gulko B, Siepel A. Fast, scalable prediction of deleterious noncoding variants from functional and population genomic data. Nat Genet. 2017:49(4):618–624. 10.1038/ng.3810.
  43. Jablonski NG, Chaplin G. Human skin pigmentation as an adaptation to UV radiation. Proc Natl Acad Sci U S A. 2010:107(supplement_2):8962–8968. 10.1073/pnas.0914628107.
  44. Jobbins AM, Reichenbach LF, Lucas CM, Hudson AJ, Burley GA, Eperon IC. The mechanisms of a mammalian splicing enhancer. Nucleic Acids Res. 2018:46(5):2145–2158. 10.1093/nar/gky056.
  45. Kaye JB, Schultz LE, Steiner HE, Kittles RA, Cavallari LH, Karnes JH. Warfarin pharmacogenomics in diverse populations. Pharmacother J Hum Pharmacol Drug Ther. 2017:37(9):1150–1163. 10.1002/phar.1982.
  46. Khain E, Sander LM, Schneider-Mizell CM. The role of cell-cell adhesion in wound healing. J Stat Phys. 2007:128(1-2):209–218. 10.1007/s10955-006-9194-8.
  47. Kumar S, Stecher G, Li M, Knyaz C, Tamura K. MEGA X: molecular evolutionary genetics analysis across computing platforms. Mol Biol Evol. 2018:35(6):1547–1549. 10.1093/molbev/msy096.
  48. Kundaje A, Meuleman W, Ernst J, Bilenky M, Yen A, Heravi-Moussavi A, Kheradpour P, Zhang Z, Wang J, Ziller MJ, et al. Integrative analysis of 111 reference human epigenomes. Nature. 2015:518(7539):317–330. 10.1038/nature14248.
  49. Letunic I, Bork P. Interactive tree of life (iTOL) v5: an online tool for phylogenetic tree display and annotation. Nucleic Acids Res. 2021:49(W1):W293–W296. 10.1093/nar/gkab301.
  50. Liao S, Woulfe T, Hyder S, Merriman E, Simpson D, Chunilal S. Incidence of venous thromboembolism in different ethnic groups: a regional direct comparison study. J Thromb Haemost. 2014:12(2):214–219. 10.1111/jth.12464.
  51. Luo H, Zhang P, Zhang W, Zheng Y, Hao D, Shi Y, Niu Y, Song T, Li Y, Zhao S, et al. Recent positive selection signatures reveal phenotypic evolution in the Han Chinese population. Sci Bull. 2023:68(20):2391–2404. 10.1016/j.scib.2023.08.027.
  52. Mali GR, Yeyati PL, Mizuno S, Dodd DO, Tennant PA, Keighren MA, zur Lage P, Shoemark A, Garcia-Munoz A, Shimada A, et al. ZMYND10 functions in a chaperone relay during axonemal dynein assembly. eLife. 2018:7:e34389. 10.7554/eLife.34389.
  53. Meirmans PG, Hedrick PW. Assessing population structure: FST and related measures. Mol Ecol Resour. 2011:11(1):5–18. 10.1111/j.1755-0998.2010.02927.x.
  54. Moore DJ, Onoufriadis A, Shoemark A, Simpson MA, zur Lage PI, de Castro SC, Bartoloni L, Gallone G, Petridi S, Woollard WJ, et al. Mutations in ZMYND10, a gene essential for proper axonemal assembly of inner and outer dynein arms in humans and flies, cause primary ciliary dyskinesia. Am J Hum Genet. 2013:93(2):346–356. 10.1016/j.ajhg.2013.07.009.
  55. Nei M, Suzuki Y, Nozawa M. The neutral theory of molecular evolution in the genomic era. Annu Rev Genomics Hum Genet. 2010:11(1):265–289. 10.1146/annurev-genom-082908-150129.
  56. Ohta T. Slightly deleterious mutant substitutions in evolution. Nature. 1973:246(5428):96–98. 10.1038/246096a0.
  57. Okada Y, Momozawa Y, Sakaue S, Kanai M, Ishigaki K, Akiyama M, Kishikawa T, Arai Y, Sasaki T, Kosaki K, et al. Deep whole-genome sequencing reveals recent selection signatures linked to evolution and disease risk of Japanese. Nat Commun. 2018:9(1):1631. 10.1038/s41467-018-03274-0.
  58. Osterwalder M, Barozzi I, Tissières V, Fukuda-Yuzawa Y, Mannion BJ, Afzal SY, Lee EA, Zhu Y, Plajzer-Frick I, Pickle CS, et al. Enhancer redundancy provides phenotypic robustness in mammalian development. Nature. 2018:554(7691):239–243. 10.1038/nature25461.
  59. Pasaniuc B, Price AL. Dissecting the genetics of complex traits using summary association statistics. Nat Rev Genet. 2017:18(2):117–127. 10.1038/nrg.2016.142.
  60. Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MAR, Bender D, Maller J, Sklar P, de Bakker PIW, Daly MJ, et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet. 2007:81(3):559–575. 10.1086/519795.
  61. Radwan J, Babik W, Kaufman J, Lenz TL, Winternitz J. Advances in the evolutionary understanding of MHC polymorphism. Trends Genet. 2020:36(4):298–311. 10.1016/j.tig.2020.01.008.
  62. Rees JS, Castellano S, Andrés AM. The genomics of human local adaptation. Trends Genet. 2020:36(6):415–428. 10.1016/j.tig.2020.03.006.
  63. Rost S, Fregin A, Ivaskevicius V, Conzelmann E, Hörtnagel K, Pelz H-J, Lappegard K, Seifried E, Scharrer I, Tuddenham EGD, et al. Mutations in VKORC1 cause warfarin resistance and multiple coagulation factor deficiency type 2. Nature. 2004:427(6974):537–541. 10.1038/nature02214.
  64. Ruszová E, Cheel J, Pávek S, Moravcová M, Hermannová M, Matějková I, Spilková J, Velebný V, Kubala L. Epilobium angustifolium extract demonstrates multiple effects on dermal fibroblasts in vitro and skin photo-protection in vivo. Gen Physiol Biophys. 2014:32(3):347–359. 10.4149/gpb_2013031.
  65. Sabeti PC, Varilly P, Fry B, Lohmueller J, Hostetter E, Cotsapas C, Xie X, Byrne EH, McCarroll SA, Gaudet R, et al. Genome-wide detection and characterization of positive selection in human populations. Nature. 2007:449(7164):913–918. 10.1038/nature06250.
  66. Siewert KM, Voight BF. Detecting long-term balancing selection using allele frequency correlation. Mol Biol Evol. 2017:34(11):2996–3005. 10.1093/molbev/msx209.
  67. Smith JM, Haigh J. The hitch-hiking effect of a favourable gene. Genet Res. 1974:23(1):23–35. 10.1017/S0016672300014634.
  68. Speidel L, Forest M, Shi S, Myers SR. A method for genome-wide genealogy estimation for thousands of samples. Nat Genet. 2019:51(9):1321–1329. 10.1038/s41588-019-0484-x.
  69. Spielmann M, Mundlos S. Looking beyond the genes: the role of non-coding variants in human disease. Hum Mol Genet. 2016:25(R2):R157–R165. 10.1093/hmg/ddw205.
  70. Steinhaus R, Robinson PN, Seelow D. FABIAN-variant: predicting the effects of DNA variants on transcription factor binding. Nucleic Acids Res. 2022:50(W1):W322–W329. 10.1093/nar/gkac393.
  71. Stern AJ, Wilton PR, Nielsen R. An approximate full-likelihood method for inferring selection and allele frequency trajectories from DNA sequence data. PLOS Genet. 2019:15(9):e1008384. 10.1371/journal.pgen.1008384.
  72. Stunnenberg HG, Hirst M, Abrignani S, Adams D, de Almeida M, Altucci L, Amin V, Amit I, Antonarakis SE, Aparicio S, et al. The international human epigenome consortium: a blueprint for scientific collaboration and discovery. Cell. 2016:167(5):1145–1149. 10.1016/j.cell.2016.11.007.
  73. Tajima F. Statistical method for testing the neutral mutation hypothesis by DNA polymorphism. Genetics. 1989:123(3):585–595. 10.1093/genetics/123.3.585.
  74. Tang J, Huang M, He S, Zeng J, Zhu H. Uncovering the extensive trade-off between adaptive evolution and disease susceptibility. Cell Rep. 2022:40(11):111351. 10.1016/j.celrep.2022.111351.
  75. Telis N, Aguilar R, Harris K. Selection against archaic hominin genetic variation in regulatory regions. Nat Ecol Evol. 2020:4(11):1558–1566. 10.1038/s41559-020-01284-0.
  76. Terhorst J, Kamm JA, Song YS. Robust and scalable inference of population history from hundreds of unphased whole genomes. Nat Genet. 2017:49(2):303–309. 10.1038/ng.3748.
  77. Thu YM, Richmond A. NF-κB inducing kinase: a key regulator in the immune system and in cancer. Cytokine Growth Factor Rev. 2010:21(4):213–226. 10.1016/j.cytogfr.2010.06.002.
  78. Vierstra J, Lazar J, Sandstrom R, Halow J, Lee K, Bates D, Diegel M, Dunn D, Neri F, Haugen E, et al. Global reference mapping of human transcription factor footprints. Nature. 2020:583(7818):729–736. 10.1038/s41586-020-2528-x.
  79. Voight BF, Kudaravalli S, Wen X, Pritchard JK. A map of recent positive selection in the human genome. PLOS Biol. 2006:4(3):e72. 10.1371/journal.pbio.0040072.
  80. Wang C, Wei Y, Wang G, Zhou Y, Zhang J, Xu K. Heparanase potentiates the invasion and migration of pancreatic cancer cells via epithelial-to-mesenchymal transition through the Wnt/β-catenin pathway. Oncol Rep. 2020:44(2):711–721. 10.3892/or.2020.7641.
  81. Wilkinson HN, Hardman MJ. Wound healing: cellular mechanisms and pathological outcomes. Open Biol. 2020:10(9):200223. 10.1098/rsob.200223.
  82. Wu Q, Han T-S, Chen X, Chen J-F, Zou Y-P, Li Z-W, Xu Y-C, Guo Y-L. Long-term balancing selection contributes to adaptation in Arabidopsis and its relatives. Genome Biol. 2017:18(1):217. 10.1186/s13059-017-1342-8.
  83. Yuan K, Ni X, Liu C, Pan Y, Deng L, Zhang R, Gao Y, Ge X, Liu J, Ma X, et al. Refining models of archaic admixture in Eurasia with ArchaicSeeker 2.0. Nat Commun. 2021:12(1):6232. 10.1038/s41467-021-26503-5.
  84. Zcharia E, Zilka R, Yaar A, Yacoby-Zeevi O, Zetser A, Metzger S, Sarid R, Naggi A, Casu B, Ilan N, et al. Heparanase accelerates wound angiogenesis and wound healing in mouse and rat models. FASEB J. 2005:19(2):211–221. 10.1096/fj.04-1970com.
  85. Zhang C, Dong S-S, Xu J-Y, He W-M, Yang T-L. PopLDdecay: a fast and effective tool for linkage disequilibrium decay analysis based on variant call format files. Bioinformatics. 2019:35(10):1786–1788. 10.1093/bioinformatics/bty875.
  86. Zhang P, Luo H, Li Y, Wang Y, Wang J, Zheng Y, Niu Y, Shi Y, Zhou H, Song T, et al. NyuWa genome resource: a deep whole-genome sequencing-based variation profile and reference panel for the Chinese population. Cell Rep. 2021:37(7):110017. 10.1016/j.celrep.2021.110017.
  87. Zheng R, Wan C, Mei S, Qin Q, Wu Q, Sun H, Chen C-H, Brown M, Zhang X, Meyer CA, et al. Cistrome Data Browser: expanded datasets and new tools for gene regulatory analysis. Nucleic Acids Res. 2019:47(D1):D729–D735. 10.1093/nar/gky1094.

Articles from Molecular Biology and Evolution are provided here courtesy of Oxford University Press
