Cell Genomics. 2025 Jul 24;5(9):100954. doi: 10.1016/j.xgen.2025.100954

Genomic insights into the demographic history and local adaptation of wild boars across Eurasia

Zishuai Wang 1, Zixin Li 1, Tao Huang 2, Jianhai Chen 3,4, Pan Xu 5, Ruimin Qiao 6, Hongwei Yin 1, Chengyi Song 7, Dongjie Zhang 8, Di Liu 8, Shuhong Zhao 3,9, Martien AM Groenen 10, Ole Madsen 10, Yanlin Zhang 11,∗, Lijing Bai 1,∗∗, Kui Li 1,12,∗∗∗
PMCID: PMC12534701  PMID: 40712570

Summary

Wild boars exhibit genetic and phenotypic diversity shaped by migrations and local adaptations. Their expansion across Eurasia, especially in Central Asia, remains underexplored. Here, we present newly sequenced whole-genome data of 47 wild boars from Eastern Asia, Central Asia, and Europe, combined with 49 existing genomes, creating a comprehensive dataset of 96 individuals. Our analyses show that Asian wild boars and Southeast Asian Suids split ∼3.6 million years ago (mya), with Central Asian and Southern Chinese ancestors diverging ∼1.8 mya. The split between Central Asian and European-Near East ancestors occurred ∼0.9 mya, followed by a European-Near East divergence ∼0.6 mya. We identify signatures of local adaptation in Central Asian populations, including two positively selected variants in LPIN1, associated with lipid metabolism, and a missense mutation in ALPK2, linked to meat traits. These findings provide insights into wild boar dispersal and adaptation and shed light on domestic pig breeding.

Keywords: wild boar, Eurasian expansion, local adaptation, demographic history, LPIN1, ALPK2, Central Asian


Highlights

  • Wild boars diverged into distinct Eurasian lineages between 3.6 and 0.6 mya

  • Signatures of local adaptation for metabolic, pigmentation, and cold adaptation

  • Two tightly linked adaptive enhancer variants modulate LPIN1 expression

  • ALPK2 missense mutations fixed in European pigs associate with meat production


Wang et al. analyzed whole-genome data from 96 wild boars across Eurasia to reconstruct their evolutionary history and westward expansion through Central Asia. They identified key divergence events and adaptive variants in Central Asian populations, providing new insights into wild boar evolution and informing research on pig domestication.

Introduction

Sus scrofa, commonly known as the wild boar, is a widely distributed mammal capable of adapting to a broad range of habitats across Eurasia.1 Emerging in Southeast Asia around 3–4 million years ago (mya),2,3,4 the wild boar has expanded its range across Eurasia and into North Africa, demonstrating remarkable adaptability to diverse environmental conditions.5,6,7,8,9,10 Physically robust, with powerful limbs and sharp tusks, the wild boar occupies a wide range of habitats, from dense forests to open grasslands. Its omnivorous and opportunistic diet, which includes roots, fruits, small animals, and carrion, enables it to exploit various food resources and adapt to seasonal fluctuations. Socially, wild boars form cohesive family groups that support cooperative foraging and predator defense. Environmental factors limiting its range expansion include deep snow in winter and the arid climates of the Gobi Desert and Mongolian steppe.11 This wide geographic distribution has resulted in extensive genetic diversity and ecological flexibility, making the species a valuable model for studying population genetics, evolutionary history, and adaptation.

Southeast Asia, particularly Island Southeast Asia (ISEA) and mainland Southeast Asia (MSEA), is considered the center of origin for the wild boar and a biodiversity hotspot for the genus Sus.12,13 Wild boars from MSEA, especially the "Mekong region" identified by Wu et al.,14 show high genetic diversity and encompass nearly all major East Asian mitochondrial haplotypes. Molecular phylogenetic studies have consistently indicated a significant divergence between Asian and European wild boar (EUW) populations, estimated to have occurred during the Mid-Pleistocene, ∼1.6–0.8 mya.15,16,17,18,19 Despite considerable progress in characterizing the genetic structure of wild boars across East Asia,14,20,21 little is known about the populations in Central Asia. Central Asian wild boars (CAWs), situated at the crossroads of East Asia and Europe, likely played a role in gene flow and migration between distinct wild boar lineages, providing key insights into the dispersal patterns that contributed to the spread of wild boars across these regions. CAWs exhibit unique genetic diversity characterized by a clear discordance between their maternal lineage, firmly within the Asian clade, and their paternal lineage, closely affiliated with western populations. This genetic pattern underscores Central Asia’s critical role as a region of historical gene flow and a key to understanding the evolutionary history and dispersal of wild boars across Eurasia.1,22 Whole-genome analyses are required to better understand the routes followed by wild boars during their westward expansion through Central Asia, as well as to infer the demographic history and potential local adaptations associated with this expansion.

In this study, we aim to clarify the dispersal history and genetic diversity of wild boars across their broad geographic range, focusing on underrepresented Central Asian populations. By integrating 47 newly sequenced whole genomes with 49 existing genomes, we investigate the evolutionary relationships and migration routes that have shaped wild boar populations from Southeast Asia to Europe. Our analysis further seeks to identify genomic regions associated with local adaptation in Central Asia, providing insights into the role of natural selection in shaping the genetic landscape of these populations. Previous studies, which mainly relied on mitochondrial data, could not accurately resolve divergence times or comprehensively identify genetic variants involved in local adaptations specific to Central Asian populations. Our genomic analyses improve divergence time estimates, revealing that CAWs diverged from East Asian lineages ∼1.8 mya and from European lineages ∼0.9 mya, and further identify genomic variants likely linked to environmental adaptations unique to the Central Asian region. These findings enhance our understanding of the evolutionary dynamics and adaptive processes of wild boars across diverse ecological and geographic contexts.

Results

Sample collection and genome-wide variation in wild boars

We sequenced the genomes of 47 wild boars from Eastern Asia, Central Asia (a region previously underrepresented in genetic studies), and Europe, achieving an average genomic coverage exceeding 30-fold. To establish a comprehensive view of genetic diversity among wild boars, these new data were integrated with 49 publicly available genomes (Table S1), including 7 whole-genome sequences from another wild Sus species (Sus cebifrons) to facilitate demographic history analyses. After quality control (Figure S1), the final dataset encompasses a wide geographic range, comprising 52 wild boars from Eastern Asia (32 from Southern China, 10 from Northern China, and 10 from Korea), 27 from Europe, 3 from the Near East, and 7 from Central Asia, as well as 7 Sus cebifrons individuals from Southeast Asia (Figure 1A; Table S1). Given that our Central Asian samples were collected near the boundary regions of this area, we performed an mtDNA haplotype-based analysis to compare these 7 samples with previously published Central Asian samples.1 Our analysis revealed that these 7 samples were closely related to other Central Asian samples, suggesting they are genetically representative of wild boars within this broader geographic region (Figure S2). This study generated a total of 4.82 × 10^10 reads, with an average sequencing depth of 22.82× and a genome coverage of 98.73% (Table S1). After aligning reads to the Sus scrofa 11.1 reference genome and applying stringent quality control filters, we identified 39,943,269 single-nucleotide polymorphisms (SNPs). Of these SNPs, 14.4 million are in intergenic regions, 15.7 million are in intronic regions, and 0.79 million are in exonic regions (Table S2).

Figure 1.


Geographic distribution and genetic structure of 89 wild boars

(A) Worldwide distribution of studied wild boars. New, new samples used in this study; previous, samples published in previous studies.

(B) Neighbor-joining tree constructed using autosomal SNPs. SCW, Southern Chinese wild boars; NAW, Northeast Asian wild boars; CAW, Central Asian wild boars; EUW, European wild boars; NEW, Near Eastern wild boars; OUT, outgroup (Sus cebifrons).

(C) Plots of principal components 1 and 2 for the 89 wild boars.

(D) Admixture pattern for K = 2 and K = 3 reveals additional population structures among all 89 wild boars. The geographic locations are at the bottom.

(E) Genetic diversity of the different groups.

(F) Linkage disequilibrium patterns for the different groups.

Genetic diversity and population structure

Both the phylogenetic tree (Figure 1B) and the principal-component analysis (PCA) (Figures 1C and S3) separated the wild boars into two clades, one including all East Asian wild boars (EAWs) and the other including the CAWs, EUWs, and Near Eastern wild boars (NEWs). Individual ancestry coefficients were inferred with ADMIXTURE to further assess population structure. With K = 2, which is also the optimal K-value (Figure S4), the ADMIXTURE analysis recapitulated a pattern similar to that of the PCA and phylogenetic tree. With K = 3, the third ancestral component (in blue) appears mainly in Central Asian, Near Eastern, and Greek wild boars (Figure 1D).

Based on their geographical distribution, we subdivided the EAWs into the Southern Chinese wild boars (SCWs) and the Northeast Asian wild boars (NAWs; including individuals from Northeast China and Korea). Nucleotide diversity (π) analysis showed that the SCWs had the highest levels of genetic diversity, followed by NAWs, CAWs, EUWs, and NEWs (Figures 1E and S5). This trend indicates a progressive reduction in genetic diversity moving from Southeast Asia toward Europe. The level of inbreeding measured by runs of homozygosity (ROH) was lower in EAWs than in other groups (Figure S6). Genetic distances estimated via the pairwise fixation index (FST) ranged from 0.112 (CAWs-EUWs) to 0.520 (NAWs-NEWs) (Figure S7). Additionally, linkage disequilibrium (LD) decay analysis revealed more rapid LD decay in the Southern Chinese population than in other groups, suggesting a larger effective population size and higher genetic diversity over time (Figure 1F). Overall, the observed genetic patterns reflect a clear geographic gradient of genetic variation from Southeast Asia to Northern Asia, Central Asia, and finally, Europe and the Near East, providing valuable insights into the spread of wild boars across Eurasia.
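
As a rough illustration of the two summary statistics reported above, the following sketch computes windowed nucleotide diversity (π) and a ratio-of-averages FST from per-site allele counts. It is not the pipeline used in this study: the function names and toy counts are hypothetical, and the choice of Hudson's FST estimator is an assumption made for the example.

```python
def site_pi(alt_count, n_chrom):
    """Unbiased per-site nucleotide diversity from an alternate-allele count."""
    if n_chrom < 2:
        raise ValueError("need at least two sampled chromosomes")
    p = alt_count / n_chrom
    # Expected heterozygosity with the n/(n-1) small-sample correction.
    return 2.0 * p * (1.0 - p) * n_chrom / (n_chrom - 1)


def window_pi(alt_counts, n_chrom, window_len):
    """Average pi per base pair over a window; invariant sites contribute 0."""
    return sum(site_pi(a, n_chrom) for a in alt_counts) / window_len


def hudson_fst(sites):
    """Hudson's FST as a ratio of averages over sites.

    `sites` is an iterable of (alt1, n1, alt2, n2): alternate-allele counts and
    sampled chromosome numbers in population 1 and population 2.
    """
    num = 0.0
    den = 0.0
    for a1, n1, a2, n2 in sites:
        p1, p2 = a1 / n1, a2 / n2
        num += (p1 - p2) ** 2 - p1 * (1 - p1) / (n1 - 1) - p2 * (1 - p2) / (n2 - 1)
        den += p1 * (1 - p2) + p2 * (1 - p1)
    return num / den if den > 0 else float("nan")


if __name__ == "__main__":
    # Hypothetical counts: 10 chromosomes sampled per population, three SNPs.
    toy_sites = [(10, 10, 0, 10), (6, 10, 4, 10), (9, 10, 2, 10)]
    print("window pi:", window_pi([6, 9, 10], 10, 10_000))
    print("Hudson FST:", hudson_fst(toy_sites))
```

Summing numerators and denominators across sites before dividing keeps low-frequency sites from dominating the estimate, which is why the ratio-of-averages form is used in this sketch.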

Migration tracks and demographic history of the wild boar

To investigate the migration and expansion of wild boars from Southeast Asia to Europe, we examined phylogenetic relationships and gene flow using TreeMix with Sus cebifrons as the outgroup. The relationships inferred from the TreeMix analysis (Figures 2A, S8, and S9) are consistent with the patterns observed in the phylogenetic tree (Figure 1B). In addition, TreeMix detected a migration event from IMG (Inner Mongolia) to Xinjiang. Following the divergence between Sus cebifrons and the wild boar, the earliest split within the wild boar separated the CAWs from the SCWs. Subsequent branching led to the divergence of NAWs from SCWs and EUWs from CAWs, with branch lengths reflecting the geographic distances from their origins in Southeast Asia.

Figure 2.


Demographic and migration histories for the wild boars

(A) Tree topology inferred from TreeMix when one migration event is allowed. CAW, Central Asian wild boars; SCW, Southern Chinese wild boars; NAW, Northeast Asian wild boars; EUW, European wild boars; NEW, Near East wild boars; IMG, Inner Mongolia; SCN, South China; NCN, North China; NEC, Northeast China.

(B) Demographic history of four geographical populations using MSMC. The gray shading marks the Pastonian interglacial (PI) stage (600–800 kya), the Mindel glaciation (455–300 kya), and the Würm glaciation (WG; 110–10 kya).

(C) Inferred population demographic history of wild boars using the joint site-frequency spectra.

(D) A proposed migratory history for wild boars from southern East Asia to Europe based on the evidence from our study.

To infer the demographic history of wild boars, we applied multiple sequentially Markovian coalescent (MSMC) analysis to 8 wild boar individuals, each with sequencing coverage exceeding 30×. The MSMC trajectories were largely consistent across individuals, indicating strong population cohesion throughout much of the species’ evolutionary history (Figure 2B). The demographic history of wild boars is marked by population fluctuations that correspond to glacial cycles during the Pleistocene. Around 600–800 thousand years ago (kya), all wild boar populations experienced growth during the Pastonian interglacial stage, followed by a decline during the Mindel glaciation (455–300 kya), which persisted until the end of the Würm glaciation (110–10 kya).

To investigate population divergence, we performed a site-frequency spectrum (SFS)-based analysis. First, we used SVDquartets to generate a phylogenetic tree, which was consistent with the relationships inferred from our TreeMix analysis (Figure S10). We then applied fastsimcoal2 to model demographic history, including divergence times and migration rates between geographically adjacent groups (Figure S11A), using the SVDquartets-inferred topology as input. Our analysis estimated that the divergence between Sus cebifrons and wild boars occurred ∼3.66 mya. Following this, the ancestors of CAWs/EUWs/NEWs split off from the ancestor of SCWs/NAWs ∼1.8 mya. At ∼1 mya, the ancestors of the NAWs branched off from SCWs, migrating northeast into China and Korea. The lineages leading to the NEWs and EUWs separated from CAWs ∼0.95 mya, with the divergence between NEWs and EUWs occurring around 0.68 mya (Figures 2C, 2D, and S11B).

The increased population size of EUWs ∼0.5 mya, as depicted in Figure 2B, may suggest a population expansion in EUWs following their divergence from NEWs. A significant migration rate of 10.1% was identified between SCWs and NAWs, likely attributable to the minimal geographical barriers separating these two groups (Figure S11C). Notably, no significant population size differences were detected among wild boar individuals (Figure S11D). Collectively, these findings elucidate a detailed timeline of population divergence events among Sus species and within wild boars across Eurasia.

Genomic signatures of selection on autosomes related to local adaptation

Originating in the wet and warm environments of Southeast Asia, wild boars migrated across Eurasia and expanded into high-latitude regions. This migration likely imposed selective pressures that drove adaptation to diverse environmental conditions. To explore the genetic basis of these adaptations, we conducted pairwise comparisons between CAWs and SCWs, which inhabit two different environments (Table S3). We performed window-based (10 kb) FST, θπ ratio (θπ2/θπ1), and cross-population composite likelihood ratio (XP-CLR) analyses. By merging the top 5% of high-scoring regions from each analysis, we identified 601 putative selected regions overlapping 320 genes (Figure 3A; Table S4). Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis revealed a significant over-representation of genes associated with metabolic pathways (Figure S12; Table S4). These findings suggest that historical exposure to cold environments likely imposed strong selective pressures on basal metabolic rates, an important adaptation mechanism, given that most mammals increase basal metabolic activity in response to cold.23,24
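
The window-based scan described above can be sketched as follows. This is a minimal illustration, not the authors' code: the dictionary keys, function names, and toy scores are hypothetical, and the sketch assumes that a window is retained only when it falls in the top 5% for all three statistics, after which adjacent retained windows are merged into regions. The text does not state whether the three top-5% sets were intersected or pooled, so that choice is an assumption.

```python
def top_fraction_cutoff(values, fraction=0.05):
    """Smallest value that still belongs to the top `fraction` of `values`."""
    ranked = sorted(values, reverse=True)
    k = max(1, int(len(ranked) * fraction))
    return ranked[k - 1]


def candidate_windows(windows, fraction=0.05, stats=("fst", "pi_ratio", "xpclr")):
    """Keep windows whose score reaches the top-fraction cutoff for every statistic."""
    cutoffs = {s: top_fraction_cutoff([w[s] for w in windows], fraction) for s in stats}
    return [w for w in windows if all(w[s] >= cutoffs[s] for s in stats)]


def merge_adjacent(windows):
    """Merge touching or overlapping windows on the same chromosome into regions."""
    regions = []
    for w in sorted(windows, key=lambda w: (w["chrom"], w["start"])):
        last = regions[-1] if regions else None
        if last and last[0] == w["chrom"] and w["start"] <= last[2]:
            last[2] = max(last[2], w["end"])
        else:
            regions.append([w["chrom"], w["start"], w["end"]])
    return [tuple(r) for r in regions]


if __name__ == "__main__":
    # Forty hypothetical 10-kb windows; windows 10 and 11 score high everywhere.
    toy = []
    for i in range(40):
        high = i in (10, 11)
        toy.append({
            "chrom": "1", "start": i * 10_000, "end": (i + 1) * 10_000,
            "fst": 0.8 if high else 0.1,
            "pi_ratio": 4.0 if high else 1.0,
            "xpclr": 5.0 if high else 0.5,
        })
    print(merge_adjacent(candidate_windows(toy)))
```

Using empirical quantiles rather than a parametric null makes the scan an outlier approach: it ranks windows relative to the genome-wide background without assigning formal p values.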

Figure 3.


Genome-wide annotations of selective genome signatures for environmental adaptation

(A) Distribution of the FST, log2(θπ ratio), and XP-CLR values for Central Asian wild boars (CASs) versus Southern China wild boars (SCNs). The dashed vertical and horizontal lines indicate the 5% significance thresholds for log2(θπ ratio) (1.7) and FST (0.45), respectively. XP-CLR values larger than 1.9 are colored purple. The red dots indicate the strongly selected regions, corresponding to the ADAMTS20 and PDPK1 genes, identified through the integrated analysis of these three methods.

(B) Strong selective signals on ADAMTS20 based on FST, log2(θπ ratio) and XP-CLR for CASs versus SCNs.

(C) Manhattan plot of the FST values (CASs versus SCNs). The red curve depicts the mean FST calculated over a sliding window of 100 consecutive individual FST values.

(D) Enrichment within 15 chromatin states of the 601 putative selective regions. TssA, strongly active promoters/transcripts; TssAHet, flanking active TSS without ATAC; TxFlnk, transcribed at gene; TxFlnkWk, weak transcribed at gene; TxFlnkHe, Transcribed region without ATAC; EnhA, strong active enhancer; EnhAMe, medium enhancer with ATAC; EnhAWk, weak active enhancer; EnhAHet, active enhancer without ATAC; EnhPois, poised enhancer; ATAC_Is, ATAC island; TssBiv, bivalent/poised TSS; Repr, repressed polycomb; ReprWk, weak repressed polycomb; Qui, quiescent. Values greater than 1 (dashed line) indicate significant enrichment. Each datapoint represents one of 14 different tissues.

(E) 601 selective genomic regions enrichment within tissue-specific enhancers. Values > 1 (dashed line) indicate significant enrichment. All-common (shared among all tissues), gut-common (shared among gut tissues), and brain-common (shared among brain tissues).

(F) Significantly enriched QTL terms for the 601 selective regions.

In particular, ADAMTS20, which is implicated in the regulation of melanocyte differentiation and pigmentation,25 ranked in the top 5% of genes by FST (0.77), log2(θπ ratio) (3.81), and XP-CLR (4.22) (Figure 3A, red dots) and emerged as a key target of selection (Figure 3B; Table S4). Among the top 10 genes by FST, PDPK1, a central regulator of melanocyte proliferation whose loss results in reduced pigmentation in mice,26 was also identified as a gene under selection (Figure 3C). Notably, PDPK1 has been recognized for its role in local adaptation to skin pigmentation in African human populations,27 further underscoring its evolutionary significance. The identification of ADAMTS20 and PDPK1 suggests that as wild boars expanded from lower-latitude to higher-latitude regions, these populations adapted to varying levels of ultraviolet (UV) radiation. In higher-latitude environments, where UV exposure is reduced, regulation of melanocyte differentiation and proliferation likely became crucial for maintaining adequate pigmentation, which in turn provides protection from environmental stresses. This highlights the adaptive importance of these genes in responding to environmental changes.

Our functional annotation analysis revealed that the 601 putative selective regions were predominantly enriched in open chromatin regions, with secondary enrichment in enhancer regions (Figure 3D; Table S4). This distribution suggests that adaptive evolution is more likely driven by regulatory changes in non-coding regions than by alterations in coding sequences. The significant concentration of selective sweeps in these regulatory regions underscores the critical role of gene regulation in enabling adaptive responses to environmental pressures.
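
One common way to express enrichment of this kind is a fold-enrichment ratio: the fraction of selected-region base pairs that overlap an annotation, divided by the fraction of the genome covered by that annotation. The sketch below illustrates that calculation under stated assumptions; it is not taken from this study, the function names and coordinates are hypothetical, and intervals are assumed to be half-open, non-overlapping within each list, and on a single chromosome.

```python
def overlap_bp(regions, features):
    """Base pairs shared by two lists of half-open (start, end) intervals."""
    total = 0
    for r_start, r_end in regions:
        for f_start, f_end in features:
            total += max(0, min(r_end, f_end) - max(r_start, f_start))
    return total


def fold_enrichment(regions, features, genome_len):
    """Observed overlap fraction divided by the genome-wide annotation fraction."""
    region_bp = sum(end - start for start, end in regions)
    feature_bp = sum(end - start for start, end in features)
    if region_bp == 0 or feature_bp == 0:
        raise ValueError("regions and features must each cover at least 1 bp")
    observed = overlap_bp(regions, features) / region_bp
    expected = feature_bp / genome_len
    return observed / expected


if __name__ == "__main__":
    # Hypothetical coordinates on a 1-Mb toy chromosome.
    selected = [(100_000, 110_000), (500_000, 520_000)]
    open_chromatin = [(105_000, 112_000), (300_000, 310_000), (515_000, 530_000)]
    print(fold_enrichment(selected, open_chromatin, 1_000_000))
```

A ratio above 1 means the selected regions overlap the annotation more often than its genome-wide coverage would predict; judging whether that excess is statistically significant would additionally require a permutation or comparable test.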

Furthermore, our tissue-specific enhancer enrichment analysis identified the brain, spleen, and gut as key organs for these regulatory adaptations (Figure 3E; Table S4). These findings suggest that these organs play critical roles in mediating adaptive responses: the brain, central to behavioral responses and environmental perception, adapts to new stimuli and conditions; the spleen, vital for immune function, often responds to new pathogens and immunological challenges; and the gut, essential for nutrient absorption and host-microbiome interactions, may adapt to dietary or ecological changes.

Our QTL (quantitative trait locus) enrichment analyses reveal selective sweeps that are strongly linked to traits crucial for cold adaptation. These traits include variations in rib number, body length, and subcutaneous fat thickness—features that are integral to survival in colder climates (Figure 3F; Table S4). The observed increase in rib count and body length in certain populations may indicate structural adaptations that contribute to a more robust body form. Generally, mammals in warmer regions tend to be smaller than those in colder regions, likely due to the evolutionary advantage of a higher surface area-to-volume ratio, which facilitates more efficient heat dissipation.28 Additionally, the enhanced subcutaneous fat thickness serves as a vital insulative layer, offering protection against extreme temperatures in high-latitude environments.

Naturally selected variants for local adaptation

To detect SNPs under selection associated with high-latitude adaptation, we employed the cross-population extended haplotype homozygosity (XP-EHH) test, which identifies SNPs where a selected allele has reached, or is approaching, fixation in one population while remaining polymorphic in another. This analysis identified 18,582 variants as the top 1% of loci (Table S5). Among these 18,582 variants, we identified 18 variants annotated as pig causal variants in the Online Mendelian Inheritance in Animals (OMIA) database29 (Table S6). Additionally, 534 variants were also detected by the three other methods (FST, θπ ratio, and XP-CLR). Consistent with the distribution shown in Figure 3D, 49% of these shared variants (262 out of 534) are in regions of open chromatin, 24% (129 out of 534) are in enhancer regions, and only 5% are within transcribed gene regions (Figure 4A).

Figure 4.


Two mutations affect the expression of LPIN1 in skeletal muscle through regulating the enhancer activity

(A) Regulatory elements annotation of the 534 variants detected as selective variants by four methods (FST, θπ ratio, XP-EHH, and XP-CLR).

(B) Linkage disequilibrium (LD) analysis for rs340542212 and rs324682561 using PLINK, showing the CC/TT alleles are in phase with R2 = 0.93 (a measure of the correlation between the two SNPs indicating how much of the variation in one SNP can be explained by the other) and D′ = 0.98 (a measure of LD that is less influenced by allele frequencies than R2).

(C) rs340542212 and rs324682561 overlap a muscle-specific enhancer (yellow) in the intron of LPIN1. For the bottom image, each color represents a specific chromatin state, with the corresponding relationships shown in (A).

(D) The patterns of genotypes of rs340542212 and rs324682561 among the five wild boar groups and Sus cebifrons, which were regarded as the outgroup (OUT). CAW, Central Asian wild boars; SCW, Southern Chinese wild boar; NAW, Northeast Asian wild boars; EUW, European wild boars; NEW, Near East wild boars.

(E) Allele frequencies at rs340542212 in Eurasian populations located at different regions.

(F) Expression levels of LPIN1 in different tissues. Expression levels are shown in log10(TPM).

(G) Expression levels of LPIN1 in skeletal muscle tissue for different genotypes of rs340542212 and rs324682561 in domestic pigs.

(H) Luciferase reporter assay of rs340542212 and rs324682561 in skeletal muscle cells. n = 10. "CC" refers to cells transfected with the luciferase reporter vector carrying the rs340542212-C/rs324682561-C haplotype, while "TT" refers to cells transfected with the vector carrying the rs340542212-T/rs324682561-T haplotype. Cells transfected with the pGL4.23-control vector were used as a negative control.

The Wilcoxon signed-rank tests were used in (G) and (H).

As shown in Figure 4A, there are 13 variants within strong active enhancers. Among them, two tightly linked SNPs (rs340542212 and rs324682561; Figure 4B) were found to overlap with a muscle-specific enhancer in an intron of LPIN1 (Figure 4C), suggesting a potential regulatory role in muscle-related adaptation. LPIN1 is a magnesium-dependent phosphatidic acid phosphohydrolase enzyme that plays a crucial role in triglyceride synthesis by catalyzing the dephosphorylation of phosphatidic acid to form diacylglycerol.30 This enzyme is essential for adipocyte differentiation and also functions as a nuclear transcriptional coactivator, interacting with peroxisome proliferator-activated receptors to regulate the expression of genes involved in lipid metabolism.31 In Lipin-1-deficient mice, the absence of this enzyme leads to lipodystrophy, characterized by underdeveloped adipocytes, impaired triglyceride utilization during fasting, and transient fatty liver and skeletal muscle myopathy due to disrupted glycerolipid homeostasis.32 Haplotype analysis revealed that these two variants are naturally selected in high-latitude populations, including NAWs, CAWs, and EUWs (Figure 4D). The ancestral alleles, rs340542212-C and rs324682561-C, have higher frequencies (0.85) in low-latitude regions (Southern China) compared to other populations in the high-latitude regions (Figures 4E and S13). LPIN1 is highly expressed in adipose, testis, and muscle tissues (Figure 4F), and the C allele is associated with lower expression of LPIN1 in pig tissues, as indicated by PigGTEx data33 (Figure 4G). Luciferase reporter assays conducted in pig primary myoblast cells showed that the ancestral C allele exhibits decreased enhancer activity compared to the derived T allele (Figure 4H). These findings suggest that the rs340542212 and rs324682561 mutations in the muscle-specific enhancer influence triglyceride synthesis by modulating the enhancer activity and gene expression of LPIN1.
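
For readers unfamiliar with the two LD measures reported for rs340542212 and rs324682561 (R2 and D′ in Figure 4B), the sketch below shows how both are derived from phased haplotype counts at two biallelic SNPs. The study itself used PLINK; this standalone function, its name, and the example counts are hypothetical and serve only to make the definitions concrete.

```python
def ld_stats(n_AB, n_Ab, n_aB, n_ab):
    """Return (r2, D') for two biallelic SNPs from phased haplotype counts.

    n_AB counts haplotypes carrying allele A at SNP 1 and B at SNP 2, and so on.
    """
    total = n_AB + n_Ab + n_aB + n_ab
    p_A = (n_AB + n_Ab) / total   # frequency of allele A at SNP 1
    p_B = (n_AB + n_aB) / total   # frequency of allele B at SNP 2
    if p_A in (0.0, 1.0) or p_B in (0.0, 1.0):
        raise ValueError("both SNPs must be polymorphic")
    d = n_AB / total - p_A * p_B  # raw disequilibrium coefficient D
    r2 = d * d / (p_A * (1 - p_A) * p_B * (1 - p_B))
    # D' rescales |D| by the largest magnitude possible given the allele frequencies.
    if d >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    return r2, abs(d) / d_max


if __name__ == "__main__":
    # Hypothetical haplotype counts: C-C, C-T, T-C, T-T.
    r2, d_prime = ld_stats(40, 10, 10, 40)
    print(f"r2 = {r2:.2f}, D' = {d_prime:.2f}")
```

Because D′ is normalized by its frequency-dependent maximum, it can approach 1 even when r2 is lower, which is why the two values are usually reported together.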

We also identified 26 missense mutations among the 18,582 variants, spanning 9 genes (Table S7; Figures S14–S21). Notably, five missense mutations were found within the ALPK2 gene, including one mutation, rs327271406, which has reached fixation in both CAW and EUW populations (Figures 5A and 5B). Further analysis using data from the PigGTEx project confirmed that these mutations are also fixed in European domestic pigs (Figure 5C). ALPK2 is highly expressed in heart and muscle tissues (Figure 5D), and phenome-wide association studies (phWASs)34 indicate a strong association between ALPK2 and meat production traits (Figures 5E and S22; Table S8). This finding is consistent with the established observation that European domestic pig breeds exhibit superior growth and meat production traits compared to their Asian counterparts, suggesting that the fixation of these mutations in ALPK2 may contribute to these advantageous phenotypic traits.

Figure 5.


Genetic basis of ALPK2 for the meat production trait

(A) XP-EHH values around the ALPK2 gene locus.

(B) Haplotype pattern of the five missense mutations within the ALPK2 gene among the five wild boar groups and Sus cebifrons, which was regarded as the outgroup (OUT). CAW, Central Asian wild boars; SCW, Southern Chinese wild boars; NAW, Northeast Asian wild boars; EUW, European wild boars; NEW, Near East wild boars.

(C) Allele frequency of the 5 missense mutation sites among pig populations in the PigGTEx project. ASW, Asian wild boars; ASD, Asian domestic pigs; EUW, European wild boars; EUD, European domestic pigs.

(D) Expression levels of ALPK2 in different tissues. Expression levels are shown in log10(TPM).

(E) Point plot showing the association between ALPK2 and complex traits. Data are cited from PigBiobank,34 and each point represents the association between ALPK2 and one trait in the gene-based association analysis. The correspondence between the abbreviation and the full name of each trait is provided in Table S8.

Discussion

In this study, we conducted an in-depth analysis of high-coverage whole-genome sequencing (WGS) data from 89 wild boars, covering a broad geographic range from Southeast Asia to Europe, including previously underexplored regions in Central Asia. Our findings provide a first genomic understanding of CAWs, enabling preliminary investigation into their Eurasian expansion routes and clarifying previously uncertain aspects of their dispersal history.

Earlier studies have made significant strides in tracing the migration patterns of wild boars and domestic pigs, though certain details remain unresolved. For instance, Larson et al. traced the initial dispersal from ISEA to the Indian subcontinent, followed by a gradual expansion into Eastern Asia and eventually reaching Western Europe.12,35 However, their work did not provide a detailed analysis of the specific migration pathways. Wu et al. later examined potential routes within East Asia,14 but the relationships between these East Asian lineages and their western counterparts remained unclear. Additionally, a recent phylogeographic study using mitochondrial DNA and Y chromosome data from wild boars proposed new hypotheses on the potential expansion routes across Asia, yet the historical connections between these lineages were not fully explored.22

Building on this foundation, our analysis provides a refined and more detailed picture of the evolutionary history of the wild boar. We estimate that the divergence between Sus cebifrons and wild boar occurred ∼3.6 mya, which is consistent with previous studies.3,4,36 Following the expansion from Southeast Asia into Southern China, a significant bifurcation within wild boar took place around 1.8 mya, resulting in two distinct clades: SCWs and CAWs. About 1 mya, the ancestors of NAWs diverged from the Southern Chinese lineage and expanded northeastward into northeastern China and Korea. Subsequently, around 0.95 mya, EUWs and NEWs diverged from the Central Asian populations and split at 0.68 mya, illustrating the complex dynamics of population migration and adaptation within the wild boar lineage.

By integrating high-coverage WGS data with historical genetic analyses, our study provides a more detailed and accurate reconstruction of ancient migration routes. While our findings partly align with previously proposed migration paths based on mitochondrial DNA and Y chromosome data,22 the lack of samples from the Indian subcontinent limits our ability to fully reconstruct the dispersal history and assess potential interactions among all regional populations of wild boars.

Previous studies on phenotype-associated variants in pigs have primarily focused on the effects of domestication and artificial selection on genomic variation.14,37,38,39,40 In contrast, our study identified naturally selected loci that may be involved in phenotypic and physiological adaptation to diverse environments across wild boar populations. Similar to the domestication process, we found that local adaptation was predominantly influenced by mutations in regulatory regions rather than coding regions.41,42 Among diverse environmental factors across Eurasia, cold climate likely represents a selective pressure driving local adaptations within CAW populations. Not surprisingly, gene functional enrichment analysis for selective regions in CAWs revealed that metabolic processes were the primary categories, supporting the hypothesis that efficient energy regulation and metabolism are essential for survival in colder climates.

Moreover, the functions of naturally selected genes showing significant differences in non-synonymous SNP allele frequencies between CAW and SCW populations provided further evidence for selection signatures on genes related to biological processes such as melanocyte differentiation (ADAMTS20) and pigmentation (PDPK1), lipid metabolism (LPIN1), and muscle development (ALPK2). These findings indicate that local adaptation in wild boars has shaped a variety of physiological and biochemical pathways, highlighting the intricate relationship between genetic variation and environmental pressures. However, the specific environmental pressures or selective forces driving the observed genetic differences between these populations remain uncertain and require further investigation.

In summary, our study provides a refined understanding of the wild boar’s migration and adaptation, revealing detailed expansion routes and climate-driven genetic changes. The findings emphasize the role of regulatory regions in adaptation. By integrating high-resolution genomic data, we advance the understanding of wild boar evolutionary history and set the stage for future research on local adaptation and environmental impacts on mammalian evolution.

Limitations of the study

Although our dataset includes seven Central Asian samples, it provides only a preliminary genomic perspective on CAWs, and more extensive sampling from this region is essential to thoroughly characterize population structure, adaptive evolution, and gene flow across Eurasia. Additionally, the absence of South Asian samples constrains our understanding of how geographical barriers and ecological factors in South Asia have shaped the evolutionary trajectories of wild boars. While we have identified several candidate genes potentially involved in adaptations related to melanocyte differentiation, pigmentation, or metabolic processes, functional validation remains challenging. Like many genomic studies, our findings rely primarily on associations and indirect evidence rather than direct experimental confirmation. Future integrative research combining ecological data, experimental approaches, and broader geographic sampling will be necessary to conclusively determine the adaptive significance of these genetic variants.

Resource availability

Lead contact

Requests for further information and resources should be directed to and will be fulfilled by the lead contact, Kui Li (likui@caas.cn).

Materials availability

This study did not generate new unique reagents.

Data and code availability

  • This paper analyzes existing, publicly available data. The accessions for these data can be found in Table S1.

  • All the WGS data newly generated in this study are available under CNCB GSA (https://ngdc.cncb.ac.cn/) accession PRJCA016120.

  • No other custom code/software was used for data analysis in the study. The publicly available software and algorithms used in the present study are listed in the key resources table.

Acknowledgments

This work was supported by grants from the Shenzhen Outstanding Talents Training Fund (grant no. 202102), the National Key R&D Program of China (grant nos. 2021YFD301201 and 2024YFF0728800), the National Natural Science Foundation of China (grant nos. 31972539, 31501933, and 32102513), the STI 2030—Major Projects (grant no. 2022ZD04017), the National Natural Science Foundation of Guangxi Province (grant no. 2024GXNSFAAO10105), the Basic Research Center for Livestock and Poultry Sciences (CAAS-BRC-LP-2025-01), the Science, Technology and Innovation Commission of Shenzhen Municipality (grant no. JCYJ20180306173644635), the Science Technology Innovation and Industrial Development of Shenzhen Dapeng New District (grant nos. PT20170201 and PT202101-05), and the China Scholarship Council (grant no. 202003250046).

Author contributions

This study was conceived by K.L., Y.Z., L.B., and Z.W. The analysis was performed by Z.W. and Y.Z. Z.L. performed the biological experiments. T.H., J.C., P.X., R.Q., H.Y., C.S., D.Z., D.L., and S.Z. collected DNA samples of the wild boars. M.A.M.G. and O.M. provided guidance on data analysis and interpretation. Z.W., Y.Z., K.L., L.B., M.A.M.G., and O.M. wrote the manuscript. All authors have read and approved this manuscript.

Declaration of interests

The authors declare no competing interests.

STAR★Methods

Key resources table

REAGENT or RESOURCE | SOURCE | IDENTIFIER

Critical commercial assays

Endo-free Plasmid Mini Kit II | Omega | Cat#D6950-02
Dual-Luciferase® Reporter Assay System | Promega | Cat#E1960

Deposited data

WGS data of wild boars newly generated | This paper | CNCB: PRJCA016120
WGS data of Europe and Asia wild boars | Groenen et al.16 | NCBI: PRJEB1683
WGS data of Europe wild boars | Bianco et al.43 | NCBI: PRJNA320525
WGS data of Switzerland wild boars | Letko et al.44 | NCBI: PRJEB29465
WGS data of Netherlands wild boars | Liu et al.36 | NCBI: PRJEB30129
WGS data of Sus cebifrons | Liu et al.45 | NCBI: PRJEB35180
WGS data of Sus cebifrons | Bosse et al.46 | NCBI: PRJEB9326
WGS data of Europe wild boars | Nosková et al.47 | NCBI: PRJEB39374
WGS data of Europe and Asia wild boars | Frantz et al.48 | NCBI: PRJEB9922
WGS data of Chinese wild boars | Ai et al.49 | NCBI: PRJNA213179
WGS data of Korean wild boar | Ramírez et al.50 | NCBI: PRJNA255085
WGS data of Korean wild boar | Molnár et al.51 | NCBI: PRJNA239399
WGS data of Korean wild boar | Choi et al.52 | NCBI: PRJNA260763

Software and algorithms

Trimmomatic v0.39 | Bolger et al.53 | https://github.com/timflutre/trimmomatic
BWA-MEM v0.7.5a-r405 | Li et al.54 | https://github.com/bwa-mem2/bwa-mem2
Picard v2.21.2 | Van der Auwera et al.55 | http://broadinstitute.github.io/picard/
GATK v4.1.4.1 | Van der Auwera et al.55 | https://github.com/broadinstitute/gatk/releases
Bcftools v1.9 | Li56 | https://github.com/samtools/bcftools
Samtools v1.9 | Li56 | https://github.com/samtools
Beagle v5.1 | Browning et al.57 | https://github.com/beagle-dev/beagle-lib
SnpEff v.4.3 | Cingolani et al.58 | https://pcingola.github.io/SnpEff/
PLINK v1.90 | Chang et al.59 | https://github.com/chrchang/plink-ng
GenABEL | Aulchenko et al.60 | https://github.com/cran/GenABEL/blob/master/R/GenABEL.R
Vcftools v1.2.6 | Danecek et al.61 | https://vcftools.github.io/
MEGA v12 | Kumar et al.62 | https://www.megasoftware.net/dload_win_beta
ADMIXTURE v1.3.0 | Alexander et al.63 | https://speciationgenomics.github.io/ADMIXTURE/
PopLDdecay v3.30 | Zhang et al.64 | https://github.com/BGI-shenzhen/PopLDdecay
MSMC2 | Schiffels et al.65 | https://github.com/stschiff/msmc2
Treemix v1.13-7 | Pickrell et al.66 | https://github.com/carolindahms/TreeMix
PAUP v4.0 | Swofford et al.67 | https://paup.phylosolutions.com/
fastsimcoal2 | Excoffier et al.68 | https://speciationgenomics.github.io/fastsimcoal2/
xpclr | Chen et al.69 | https://bioconda.github.io/recipes/xpclr/README.html
XPEHH | Sabeti et al.70 | https://github.com/szpiech/selscan
DAVID | Huang et al.71 | https://davidbioinformatics.nih.gov/
LOLA v1.32.0 | Sheffield et al.72 | https://github.com/nsheff/LOLA
GALLO v1.4 | Fonseca et al.73 | https://github.com/pablobio/GALLO

Experimental model details

Isolation of pig primary myoblast and cell culture

Pig primary myoblasts were isolated from a 2-day-old Duroc pig as described previously74 and cultured in Dulbecco’s Modified Eagle Medium/F12 (DMEM/F12, Gibco, USA) supplemented with 10% fetal bovine serum and 1% penicillin-streptomycin (PS, Thermo Scientific, USA) at 37°C in a 5% CO2 incubator.

Method details

Whole-genome sequencing of wild boars

Total genomic DNA was collected from samples with known geographic origin. DNA fragments were processed and sequenced on the Illumina HiSeq 2500 platform. In addition, published genomic data for 72 wild boars were downloaded from NCBI. Raw reads were filtered with the following criteria: reads in which unidentified nucleotides exceeded 10% were discarded, and reads in which low-quality bases (Phred score ≤5) made up more than 50% were discarded.
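The two read-level criteria above can be sketched in a few lines of Python. This is an illustrative sketch only, not the software used in the study; the function name keep_read and its arguments are hypothetical, and it assumes that reads sitting exactly on a threshold are retained.

```python
def keep_read(seq: str, quals: list[int],
              max_n_frac: float = 0.10,
              low_q: int = 5,
              max_low_frac: float = 0.50) -> bool:
    """Return True if a read passes both raw-read filters.

    seq   : base calls of the read
    quals : Phred scores, one per base
    A read is discarded when the fraction of N bases exceeds max_n_frac,
    or when the fraction of bases with Phred <= low_q exceeds max_low_frac.
    """
    if not seq or len(seq) != len(quals):
        return False  # empty or malformed record
    n_frac = seq.upper().count("N") / len(seq)
    low_frac = sum(q <= low_q for q in quals) / len(quals)
    return n_frac <= max_n_frac and low_frac <= max_low_frac
```

For example, a 20-bp read with three N bases (15%) would be dropped, while one with two N bases (10%) would be kept under the assumption stated above.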

Read mapping and SNP calling

We processed and analyzed the whole-genome sequence data using the following pipeline. Briefly, we trimmed the raw sequence reads with Trimmomatic (v0.39)53 using default parameters, removing both low-quality bases and adapter sequences to ensure high-quality data for downstream analysis. Clean reads were mapped to Sscrofa11.1 using BWA-MEM (v0.7.5a-r405) with default parameters.54 We marked duplicated reads with Picard (v2.21.2) (http://broadinstitute.github.io/picard/). We removed 16 samples with low read depth (<5×) and kept 108 samples for joint variant calling using the Genome Analysis Toolkit (GATK) (v4.1.4.1),55 excluding variants with QD < 2, MQ < 40, FS > 60, SOR > 3, MQRankSum < −12.5, or ReadPosRankSum < −8, which yielded ∼197 million SNPs. We removed SNPs with MAF <0.01 and/or missing call rate >0.9 using bcftools (v1.9)56 and employed Beagle (v5.1)57 to phase the filtered variants and impute sporadically missing genotypes. In total, 42,523,218 SNPs were retained. These SNPs were annotated with SnpEff v.4.3.58
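The hard-filtering thresholds can be read as a set of exclusion rules. The sketch below is a hypothetical Python illustration, not the GATK command used in the study. It assumes each threshold marks a variant for removal, and it treats QD as "exclude when below 2", the direction used in GATK's widely cited hard-filter recommendations. The names HARD_FILTERS and failed_filters are invented for this example.

```python
# Exclusion rules: a variant fails a rule when the predicate returns True.
HARD_FILTERS = {
    "QD": lambda v: v < 2.0,            # assumption: low QD is excluded
    "MQ": lambda v: v < 40.0,
    "FS": lambda v: v > 60.0,
    "SOR": lambda v: v > 3.0,
    "MQRankSum": lambda v: v < -12.5,
    "ReadPosRankSum": lambda v: v < -8.0,
}


def failed_filters(info: dict[str, float]) -> list[str]:
    """Return the names of the rules a variant fails.

    info maps annotation names to values; annotations that are absent
    are skipped rather than treated as failures (a design choice of
    this sketch).
    """
    return [name for name, fails in HARD_FILTERS.items()
            if name in info and fails(info[name])]
```

A variant would be retained only when failed_filters returns an empty list.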

Sample filtering, genetic diversity, and population structure

We obtained 2,573,646 linkage disequilibrium (LD)-independent SNPs using PLINK (v1.90) with parameters: --indep-pairwise 1000 100 0.2.59 We calculated the identity-by-state (IBS) matrix among the 108 samples using the GenABEL60 package in R. Then, 98 unrelated wild boars were selected from the 108 samples by removing samples that share a common ancestor within 3 generations. We used all high-quality SNPs to calculate nucleotide diversity (π) and population genetic differentiation (FST) in 10-kb nonoverlapping windows using vcftools (v1.2.6).61
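As a rough illustration of what a windowed π value represents, the sketch below computes nucleotide diversity for one window from alternate-allele counts at biallelic SNPs. It is a simplified, hypothetical helper (window_pi) that assumes no missing genotypes and that all non-listed sites in the window are monomorphic; vcftools may treat missing data and window boundaries differently.

```python
def window_pi(alt_counts: list[int], n_chrom: int,
              window_bp: int = 10_000) -> float:
    """Nucleotide diversity (per bp) for one window.

    alt_counts : alternate-allele count at each biallelic SNP in the window
    n_chrom    : number of sampled chromosomes (2 x diploid individuals)
    Per-site diversity is the probability that two chromosomes drawn
    without replacement differ: 2k(n-k) / (n(n-1)).
    """
    if n_chrom < 2:
        raise ValueError("need at least two chromosomes")
    pairs = n_chrom * (n_chrom - 1)
    total = sum(2.0 * k * (n_chrom - k) / pairs for k in alt_counts)
    return total / window_bp
```

With 10 chromosomes and two SNPs at allele count 5 in a 10-kb window, this gives roughly 1.1 × 10⁻⁴ per bp.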

Principal-component analysis (PCA) was conducted using PLINK (v1.90).59 A neighbor-joining phylogenetic tree was constructed using MEGA (v12)62 with default parameters. The population structure was analyzed using the expectation–maximization algorithm in ADMIXTURE (v1.3.0),63 with the number of ancestral populations K ranging from 2 to 10 and 10,000 iterations per run. LD was calculated using PopLDdecay (v3.30).64

mtDNA haplotype-based analysis for Central Asian wild boars

To validate the genetic origin of the Central Asian samples used in this study, we performed an mtDNA haplotype-based analysis. Briefly, we downloaded the mtDNA CR haplotypes for Central Asian, other Asian, and European wild boars from Markov et al.1 and extracted the corresponding mtDNA haplotypes from the BAM files of our own samples using SAMtools consensus.56 We then constructed a neighbor-joining (NJ) tree of the combined set of samples using the MEGA12 software.62 The result indicates that our Central Asian wild boars cluster within the same clade as the CA1-CA4 haplotypes, which are specific to Central Asia, as reported by Markov et al.1

Effective population size and divergence time inference

We estimated effective population sizes using the Multiple Sequentially Markovian Coalescent (MSMC2)65 with -p '4+50*1+4+6'.75 A porcine generation time (g) of 5 years and a neutral mutation rate per generation (μ) of 2.5 × 10⁻⁸ were used, based on previous reports.16
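MSMC-type output is reported in mutation-scaled units, so the generation time and mutation rate above are what turn it into years and effective population size. The sketch below shows the conversion commonly described for MSMC2 output (time divided by μ and multiplied by g; Ne as the inverse coalescence rate divided by 2μ). The function names are hypothetical, and whether the study applied exactly this post-processing is an assumption.

```python
MU = 2.5e-8   # neutral mutation rate per site per generation (from the text)
GEN = 5.0     # generation time in years (from the text)


def scaled_time_to_years(t_scaled: float, mu: float = MU,
                         gen: float = GEN) -> float:
    """Convert a mutation-scaled time boundary to years."""
    return t_scaled / mu * gen


def rate_to_ne(lam: float, mu: float = MU) -> float:
    """Convert a scaled coalescence rate (lambda) to effective size."""
    return 1.0 / (2.0 * mu * lam)
```

Under these values, a scaled time of 1 × 10⁻⁴ corresponds to about 20,000 years, and a scaled coalescence rate of 2,000 corresponds to an effective size of about 10,000.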

We used Treemix (v1.13-7)66 to test models of possible admixture among 89 wild boars, with Sus cebifrons used as the outgroup. The 2,573,646 LD-independent SNPs were used as input. Migration events from m = 0 to m = 9 were tested, with 10 replicates per m, to obtain the best models.

We used PAUP* version 4.0 to run SVDquartets67 to estimate the branching pattern among the five populations (SAW, SCW, NAW, CAW, and EUW). SNPs were filtered by bcftools with the following parameters: -e 'AC==0 || AC==AN || F_MISSING>0.2' -m2 -M2.76 A Ruby script (https://github.com/mmatschiner/tutorials/blob/master/species_tree_inference_with_snp_data/src/convert_vcf_to_nexus.rb) was used to convert the VCF file into a PAUP* input file.

The joint site frequency spectrum (SFS) approach implemented in fastsimcoal268 was used to model more recent demographic fluctuations and the respective divergence times, based on the species tree estimated by SVDquartets. To mitigate the effect of LD, we filtered out the SNPs located within the 15 chromatin state regions. The multidimensional folded SFS for all five populations was generated with easySFS. The conditional maximization algorithm (ECM) was used to maximize the likelihood of each parameter while keeping the others fixed. This ECM procedure ran through 40 cycles, in which each composite likelihood was calculated using 100,000 coalescent simulations. The best parameters were determined according to the maximum likelihood value and the Akaike information criterion.
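Model comparison by the Akaike information criterion can be sketched as follows. fastsimcoal2 reports likelihoods on a log10 scale, so the sketch converts to natural log before applying AIC = 2k − 2 ln L. The function names and the example model labels are hypothetical, and the study's actual candidate models and parameter counts are not reproduced here.

```python
import math


def aic_from_log10_lik(log10_lik: float, n_params: int) -> float:
    """AIC = 2k - 2 ln(L), with ln(L) recovered from a log10 likelihood."""
    ln_lik = log10_lik * math.log(10.0)
    return 2.0 * n_params - 2.0 * ln_lik


def best_model(models: dict[str, tuple[float, int]]) -> str:
    """Return the model name with the lowest AIC.

    models maps a name to (log10 likelihood, number of free parameters).
    """
    return min(models, key=lambda name: aic_from_log10_lik(*models[name]))
```

Because the penalty term grows with the number of parameters, a richer model is preferred only when its likelihood gain outweighs the extra parameters.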

Selective signatures analysis

To identify genomic regions that may have been subject to selection for adaptation to different environments, we performed genome-wide comparisons between Central Asian and Southern Chinese wild boars. We calculated θπ for each population and the FST between the two populations using VCFtools with a genome-wide sliding-window strategy (10 kb in length with a 10-kb step).61 The log2 value of the θπ ratio (θπSCN/θπCAS) was estimated. Selective sweeps based on multilocus allele frequency differentiation (XP-CLR) were detected using the xpclr tool69 with a 10-kb sliding window. Candidate regions under positive selection were extracted based on the empirical top 5% values of FST, log2(θπ ratio), and XP-CLR.
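One way to combine the three window-based statistics is to keep windows that fall in the empirical top 5% of each. The sketch below assumes that candidate regions are the intersection of the three outlier sets, which the text does not state explicitly; the function names and the layout of the input dictionary are hypothetical.

```python
def top_fraction_threshold(values: list[float], frac: float = 0.05) -> float:
    """Smallest value that still belongs to the top `frac` of values."""
    ranked = sorted(values, reverse=True)
    n_top = max(1, int(len(ranked) * frac))
    return ranked[n_top - 1]


def candidate_windows(stats: dict[str, list[float]],
                      frac: float = 0.05) -> list[int]:
    """Indices of windows in the top `frac` for every statistic.

    stats maps a statistic name (e.g. FST, log2 pi ratio, XP-CLR) to
    per-window values listed in the same window order.
    """
    n_windows = len(next(iter(stats.values())))
    keep = set(range(n_windows))
    for values in stats.values():
        cut = top_fraction_threshold(values, frac)
        keep &= {i for i, v in enumerate(values) if v >= cut}
    return sorted(keep)
```

Adjacent candidate windows would typically be merged into regions afterwards; that step is left out of this sketch.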

We used the cross-population extended haplotype homozygosity test (XP-EHH)70 to detect SNPs that have reached or are approaching fixation in one population while remaining polymorphic in the other. The top 1% outliers among all SNPs were selected as candidate selected variants.

Functional enrichment analysis of selective signatures

We pinpointed genes located in the candidate selective regions based on the Sscrofa11.1 reference genome assembly. Genes that overlapped with these regions were selected for further analysis. Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analysis of these genes was implemented by using the Database for Annotation, Visualization and Integrated Discovery (DAVID).71

We downloaded publicly available datasets77 describing genomic features, including promoters (TssA, TssAHet, and TssBiv), TSS-proximal transcribed regions (TxFlnk, TxFlnkWk, and TxFlnkHet), enhancers (EnhA, EnhAMe, EnhAWk, EnhAHet, and EnhPois), repressed regions (Repr and ReprWk), and quiescent regions (Qui), for 14 tissues (stomach, spleen, muscle, lung, liver, jejunum, ileum, hypothalamus, duodenum, cortex, colon, cerebellum, cecum, and adipose) in BED format. A total of 17 modules were used for tissue-specific enhancer enrichment analysis: all-common (present in all tissues), gut-common (present in all 5 intestinal tissues: stomach, jejunum, duodenum, ileum, and colon), brain-common (present in all 3 brain tissues: cortex, cerebellum, and hypothalamus), and 14 tissue-specific modules. Enrichment of candidate selective regions was tested using the R package LOLA (v1.32.0),72 with the 15 chromatin states for the 14 pig tissues as the reference annotation dataset. We used the R package GALLO (v1.4)73 for QTL enrichment analyses against the pig QTL Database, and terms with multiple-testing-adjusted p values below 0.01 were considered enriched.
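Region-set enrichment of this kind is usually framed as a 2 × 2 overlap table tested with a one-sided Fisher's exact test. The sketch below implements that test with the standard library only. It is an illustration of the general technique; that it matches the exact computation performed by LOLA in this study is an assumption, and the function name and table layout are hypothetical.

```python
from math import comb


def fisher_enrichment_p(a: int, b: int, c: int, d: int) -> float:
    """One-sided (enrichment) Fisher's exact p-value for [[a, b], [c, d]].

    a : query regions overlapping the annotation
    b : background-only regions overlapping the annotation
    c : query regions not overlapping the annotation
    d : background-only regions not overlapping the annotation
    Returns P(X >= a) under the hypergeometric null.
    """
    n_overlap = a + b          # all regions that overlap the annotation
    n_non = c + d              # all regions that do not
    n_query = a + c            # size of the query set
    total = comb(n_overlap + n_non, n_query)
    upper = min(n_overlap, n_query)
    tail = sum(comb(n_overlap, x) * comb(n_non, n_query - x)
               for x in range(a, upper + 1))
    return tail / total
```

A small p-value indicates that the query regions overlap the annotation more often than expected given the background set.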

Functional analyses of rs340542212 and rs324682561 at the LPIN1 locus

Because rs340542212 and rs324682561 lie within a skeletal muscle-specific enhancer, primary pig myoblasts were used for luciferase assays to preserve the native cis-regulatory and transcription factor environment required for accurate measurement of allele-specific enhancer activity. A 51-bp sequence containing rs340542212 and rs324682561, spanning 25 bp upstream and 25 bp downstream of rs340542212, was synthesized and cloned into the pGL4.23-basic Luciferase Reporter Vector (GeneCreate, China). Sequences carrying the candidate functional mutations of rs340542212 and rs324682561 were synthesized as well. Upon reaching approximately 60% confluency, the pig primary myoblasts were co-transfected with the luciferase vector containing the SNP site (experimental vector; 1 μg) and the control reporter vector pRL-TK (20 ng) at a 50:1 ratio using Attractene Transfection Reagent (Qiagen, Germany) according to the manufacturer’s protocol. After 24 h of transfection, the cells were collected and lysed. Luciferase activity was then measured using the Dual-Luciferase Reporter Assay System (Promega, USA), and the results were normalized to Renilla luciferase activity.
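The normalization step in a dual-luciferase assay amounts to dividing each well's firefly signal by its Renilla signal, then comparing constructs. The sketch below illustrates this arithmetic with hypothetical function names; the number of replicates and the choice of reference construct in the study are not specified here and are left to the caller.

```python
def relative_luciferase(firefly: list[float],
                        renilla: list[float]) -> list[float]:
    """Per-well firefly/Renilla ratio (Renilla is the transfection control)."""
    if len(firefly) != len(renilla):
        raise ValueError("firefly and renilla must have the same length")
    return [f / r for f, r in zip(firefly, renilla)]


def fold_change(test_ratios: list[float], ref_ratios: list[float]) -> float:
    """Mean normalized activity of a test construct relative to a reference."""
    mean_test = sum(test_ratios) / len(test_ratios)
    mean_ref = sum(ref_ratios) / len(ref_ratios)
    return mean_test / mean_ref
```

Comparing the fold change between constructs carrying alternative alleles is what indicates an allele-specific difference in enhancer activity.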

Published: July 24, 2025

Footnotes

Supplemental information can be found online at https://doi.org/10.1016/j.xgen.2025.100954.

Contributor Information

Yanlin Zhang, Email: yanlinzhang@hkust-gz.edu.cn.

Lijing Bai, Email: bailijing@caas.cn.

Kui Li, Email: likui@caas.cn.

Supplemental information

Document S1. Figures S1–S22 and Tables S2, S3, S5, and S7
mmc1.pdf (3.2MB, pdf)
Table S1. Information for datasets and samples used in this study, related to Figure 1

Tab A: detailed information of all projects we collected. Tab B: detailed information of samples used in this study.

mmc2.xlsx (26.7KB, xlsx)
Table S4. Enrichment analysis of the 601 selective genomic regions

Tab A, Selection signals of the 601 genomic regions, related to Figures 3A and 3B. Tab B, Genes overlapping the 601 selective regions, related to Figure 3C. Tab C, The enriched KEGG terms of the 601 selective genes, related to Figure S14. Tab D, Definitions and abbreviations of 15 chromatin states, related to Figure 3D. Tab E, Enrichment analysis of the 601 selective regions for 15 chromatin states, related to Figure 3D. Tab F, Enrichment analysis of the 601 selective regions for tissue-specific enhancers, related to Figure 3E. Tab G, Enrichment analysis of the 601 selective regions for pig QTLs, related to Figure 3F.

mmc3.xlsx (87.1KB, xlsx)
Table S6. Statistics on the overlap between wild boar variants and the causal variants listed for domestic pigs in OMIA, related to Figure 5
mmc4.xlsx (16.9KB, xlsx)
Table S8. Phenome-wide association study analysis result of ALPK2 gene, related to Figures 5E and S22
mmc5.xlsx (17.5KB, xlsx)
Document S2. Transparent peer review records for Wang et al.
mmc6.pdf (989.2KB, pdf)
Document S3. Article plus supplemental information
mmc7.pdf (12.4MB, pdf)

References

  • 1.Markov N.I., Bykova E.A., Esipov A.V., Nurtazin S.T., Ranyuk M.N., Matrosova V.A. Between the east and the west: genetic uniqueness of the Central-Asian wild boar (Sus scrofa) on the basis of maternal and paternal markers. Mamm. Biol. 2024;104:333–344. doi: 10.1007/s42991-024-00411-9. [DOI] [Google Scholar]
  • 2.Choi S.K., Lee J.-E., Kim Y.-J., Min M.-S., Voloshina I., Myslenkov A., Oh J.G., Kim T.-H., Markov N., Seryodkin I., et al. Genetic structure of wild boar (Sus scrofa) populations from East Asia based on microsatellite loci analyses. BMC Genet. 2014;15:85. doi: 10.1186/1471-2156-15-85. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 3.Frantz L.A.F., Schraiber J.G., Madsen O., Megens H.-J., Bosse M., Paudel Y., Semiadi G., Meijaard E., Li N., Crooijmans R.P.M.A., et al. Genome sequencing reveals fine scale diversification and reticulation history during speciation in Sus. Genome Biol. 2013;14:R107–R112. doi: 10.1186/gb-2013-14-9-r107. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 4.Frantz L., Meijaard E., Gongora J., Haile J., Groenen M.A.M., Larson G. The evolution of Suidae. Annu. Rev. Anim. Biosci. 2016;4:61–85. doi: 10.1146/annurev-animal-021815-111155. [DOI] [PubMed] [Google Scholar]
  • 5.Hongo H., Ishiguro N., Watanobe T., Shigehara N., Anezaki T., Long V.T., Binh D.V., Tien N.T., Nam N.H. Variation in mitochondrial DNA of Vietnamese pigs: relationships with Asian domestic pigs and Ryukyu wild boars. Zoolog. Sci. 2002;19:1329–1335. doi: 10.2108/zsj.19.1329. [DOI] [PubMed] [Google Scholar]
  • 6.Markov N., Pankova N., Filippov I. Wild boar (Sus scrofa L.) in the north of Western Siberia: history of expansion and modern distribution. Mamm. Res. 2019;64:99–107. [Google Scholar]
  • 7.García G., Vergara J., Lombardi R. Genetic characterization and phylogeography of the wild boar Sus scrofa introduced into Uruguay. Genet. Mol. Biol. 2011;34:329–337. doi: 10.1590/S1415-47572011005000015. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 8.Lewis J.S., Farnsworth M.L., Burdett C.L., Theobald D.M., Gray M., Miller R.S. Biotic and abiotic factors predicting the global distribution and population density of an invasive large mammal. Sci. Rep. 2017;7 doi: 10.1038/srep44152. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 9.Markov N., Economov A., Hjeljord O., Rolandsen C.M., Bergqvist G., Danilov P., Dolinin V., Kambalin V., Kondratov A., Krasnoshapka N., et al. The wild boar Sus scrofa in northern Eurasia: a review of range expansion history, current distribution, factors affecting the northern distributional limit, and management strategies. Mamm Rev. 2022;52:519–537. [Google Scholar]
  • 10.Blench R., MacDonald K. Routledge; 2006. The Origins and Development of African Livestock: Archaeology, Genetics, Linguistics and Ethnography. [Google Scholar]
  • 11.Barrios-Garcia M.N., Ballari S.A. Impact of wild boar (Sus scrofa) in its introduced and native range: a review. Biol. Invasions. 2012;14:2283–2300. [Google Scholar]
  • 12.Larson G., Cucchi T., Fujita M., Matisoo-Smith E., Robins J., Anderson A., Rolett B., Spriggs M., Dolman G., Kim T.-H., et al. Phylogeny and ancient DNA of Sus provides insights into neolithic expansion in Island Southeast Asia and Oceania. Proc. Natl. Acad. Sci. USA. 2007;104:4834–4839. doi: 10.1073/pnas.0607753104. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 13.Melletti M., Meijaard E. Ecology, Conservation and Management of Wild Pigs and Peccaries. Cambridge University Press; 2017. [DOI] [Google Scholar]
  • 14.Wu G.-S., Yao Y.-G., Qu K.-X., Ding Z.-L., Li H., Palanichamy M.G., Duan Z.-Y., Li N., Chen Y.-S., Zhang Y.-P. Population phylogenomic analysis of mitochondrial DNA in wild boars and domestic pigs revealed multiple domestication events in East Asia. Genome Biol. 2007;8 doi: 10.1186/gb-2007-8-11-r245. R245–12. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 15.Fernández A.I., Alves E., Óvilo C., Rodríguez M.C., Silió L. Divergence time estimates of East Asian and European pigs based on multiple near complete mitochondrial DNA sequences. Anim. Genet. 2011;42:86–88. doi: 10.1111/j.1365-2052.2010.02068.x. [DOI] [PubMed] [Google Scholar]
  • 16.Groenen M.A.M., Archibald A.L., Uenishi H., Tuggle C.K., Takeuchi Y., Rothschild M.F., Rogel-Gaillard C., Park C., Milan D., Megens H.-J., et al. Analyses of pig genomes provide insight into porcine demography and evolution. Nature. 2012;491:393–398. doi: 10.1038/nature11622. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 17.Vilaça S.T., Biosa D., Zachos F., Iacolina L., Kirschning J., Alves P.C., Paule L., Gortazar C., Mamuris Z., Jędrzejewska B., et al. Mitochondrial phylogeography of the European wild boar: the effect of climate on genetic diversity and spatial lineage sorting across Europe. J. Biogeogr. 2014;41:987–998. [Google Scholar]
  • 18.Giuffra E., Kijas J.M., Amarger V., Carlborg Ö., Jeon J.-T., Andersson L. The origin of the domestic pig: independent domestication and subsequent introgression. Genetics. 2000;154:1785–1791. doi: 10.1093/genetics/154.4.1785. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 19.Larson G., Dobney K., Albarella U., Fang M., Matisoo-Smith E., Robins J., Lowden S., Finlayson H., Brand T., Willerslev E., et al. Worldwide phylogeography of wild boar reveals multiple centers of pig domestication. Science. 2005;307:1618–1621. doi: 10.1126/science.1106927. [DOI] [PubMed] [Google Scholar]
  • 20.Watanobe T., Ishiguro N., Okumura N., Nakano M., Matsui A., Hongo H., Ushiro H. Ancient mitochondrial DNA reveals the origin of Sus scrofa from Rebun Island, Japan. J. Mol. Evol. 2001;52:281–289. doi: 10.1007/s002390010156. [DOI] [PubMed] [Google Scholar]
  • 21.Ramayo Y., Shemeret’Eva I., Pérez-Enciso M. Mitochondrial DNA diversity in wild boar from the Primorsky Krai Region (East Russia) Anim. Genet. 2011;42:96–99. doi: 10.1111/j.1365-2052.2010.02074.x. [DOI] [PubMed] [Google Scholar]
  • 22.Choi S.K., Kim K.S., Ranyuk M., Babaev E., Voloshina I., Bayarlkhagva D., Chong J.-R., Ishiguro N., Yu L., Min M.-S., et al. Asia-wide phylogeography of wild boar (Sus scrofa) based on mitochondrial DNA and Y-chromosome: Revising the migration routes of wild boar in Asia. PLoS One. 2020;15 doi: 10.1371/journal.pone.0238049. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 23.Rolfe D.F., Brown G.C. Cellular energy utilization and molecular origin of standard metabolic rate in mammals. Physiol. Rev. 1997;77:731–758. doi: 10.1152/physrev.1997.77.3.731. [DOI] [PubMed] [Google Scholar]
  • 24.Avaria-Llautureo J., Hernández C.E., Rodríguez-Serrano E., Venditti C. The decoupled nature of basal metabolic rate and body temperature in endotherm evolution. Nature. 2019;572:651–654. doi: 10.1038/s41586-019-1476-9. [DOI] [PubMed] [Google Scholar]
  • 25.Silver D.L., Hou L., Somerville R., Young M.E., Apte S.S., Pavan W.J. The secreted metalloprotease ADAMTS20 is required for melanoblast survival. PLoS Genet. 2008;4 doi: 10.1371/journal.pgen.1000003. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 26.Scortegagna M., Ruller C., Feng Y., Lazova R., Kluger H., Li J.-L., De S.K., Rickert R., Pellecchia M., Bosenberg M., Ronai Z.A. Genetic inactivation or pharmacological inhibition of Pdk1 delays development and inhibits metastasis of BrafV600E:: Pten–/–melanoma. Oncogene. 2014;33:4330–4339. doi: 10.1038/onc.2013.383. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 27.Fan S., Spence J.P., Feng Y., Hansen M.E.B., Terhorst J., Beltrame M.H., Ranciaro A., Hirbo J., Beggs W., Thomas N., et al. Whole-genome sequencing reveals a complex African population demographic history and signatures of local adaptation. Cell. 2023;186:923–939.e14. doi: 10.1016/j.cell.2023.01.042. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 28.Clavel J., Morlon H. Accelerated body size evolution during cold climatic periods in the Cenozoic. Proc. Natl. Acad. Sci. USA. 2017;114:4183–4188. doi: 10.1073/pnas.1606868114. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 29.Nicholas F.W. Online Mendelian Inheritance in Animals (OMIA): a comparative knowledgebase of genetic disorders and other familial traits in non-laboratory animals. Nucleic Acids Res. 2003;31:275–277. doi: 10.1093/nar/gkg074. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 30.Wiedmann S., Fischer M., Koehler M., Neureuther K., Riegger G., Doering A., Schunkert H., Hengstenberg C., Baessler A. Genetic variants within the LPIN1 gene, encoding lipin, are influencing phenotypes of the metabolic syndrome in humans. Diabetes. 2008;57:209–217. doi: 10.2337/db07-0083. [DOI] [PubMed] [Google Scholar]
  • 31.Loos R.J.F., Rankinen T., Pérusse L., Tremblay A., Després J.P., Bouchard C. Association of lipin 1 gene polymorphisms with measures of energy and glucose metabolism. Obesity. 2007;15:2723–2732. doi: 10.1038/oby.2007.324. [DOI] [PubMed] [Google Scholar]
  • 32.Mitra M.S., Chen Z., Ren H., Harris T.E., Chambers K.T., Hall A.M., Nadra K., Klein S., Chrast R., Su X., et al. Mice with an adipocyte-specific lipin 1 separation-of-function allele reveal unexpected roles for phosphatidic acid in metabolic regulation. Proc. Natl. Acad. Sci. USA. 2013;110:642–647. doi: 10.1073/pnas.1213493110. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 33.Teng J., Gao Y., Yin H., Bai Z., Liu S., Zeng H., Bai L., Cai Z., Zhao B., et al. PigGTEx Consortium A compendium of genetic regulatory effects across pig tissues. Nat. Genet. 2024;56:112–123. doi: 10.1038/s41588-023-01585-7. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 34.Zeng H., Zhang W., Lin Q., Gao Y., Teng J., Xu Z., Cai X., Zhong Z., Wu J., Liu Y., et al. PigBiobank: a valuable resource for understanding genetic and biological mechanisms of diverse complex traits in pigs. Nucleic Acids Res. 2024;52:D980–D989. doi: 10.1093/nar/gkad1080. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 35.Larson G., Fuller D.Q. The evolution of animal domestication. Annu. Rev. Ecol. Evol. Syst. 2014;45:115–136. [Google Scholar]
  • 36.Liu L., Bosse M., Megens H.-J., Frantz L.A.F., Lee Y.-L., Irving-Pease E.K., Narayan G., Groenen M.A.M., Madsen O. Genomic analysis on pygmy hog reveals extensive interbreeding during wild boar expansion. Nat. Commun. 2019;10:1992. doi: 10.1038/s41467-019-10017-2. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 37.Tchernov E., Horwitz L.K. Body size diminution under domestication: unconscious selection in primeval domesticates. J. Anthropol. Archaeol. 1991;10:54–75. [Google Scholar]
  • 38.Ramos-Onsins S.E., Burgos-Paz W., Manunza A., Amills M. Mining the pig genome to investigate the domestication process. Heredity. 2014;113:471–484. doi: 10.1038/hdy.2014.68. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 39.Ottoni C., Flink L.G., Evin A., Geörg C., De Cupere B., Van Neer W., Bartosiewicz L., Linderholm A., Barnett R., Peters J., et al. Pig domestication and human-mediated dispersal in western Eurasia revealed through ancient DNA and geometric morphometrics. Mol. Biol. Evol. 2013;30:824–832. doi: 10.1093/molbev/mss261. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 40.Rubin C.-J., Megens H.-J., Barrio A.M., Maqbool K., Sayyab S., Schwochow D., Wang C., Carlborg Ö., Jern P., Jørgensen C.B., et al. Strong signatures of selection in the domestic pig genome. Proc. Natl. Acad. Sci. USA. 2012;109:19529–19536. doi: 10.1073/pnas.1217149109. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 41.Carneiro M., Rubin C.-J., Di Palma F., Albert F.W., Alföldi J., Martinez Barrio A., Pielberg G., Rafati N., Sayyab S., Turner-Maier J., et al. Rabbit genome analysis reveals a polygenic basis for phenotypic change during domestication. Science. 2014;345:1074–1079. doi: 10.1126/science.1253714. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 42.Medugorac I., Graf A., Grohs C., Rothammer S., Zagdsuren Y., Gladyr E., Zinovieva N., Barbieri J., Seichter D., Russ I., et al. Whole-genome analysis of introgressive hybridization and characterization of the bovine legacy of Mongolian yaks. Nat. Genet. 2017;49:470–475. doi: 10.1038/ng.3775. [DOI] [PubMed] [Google Scholar]
  • 43.Bianco E., Soto H.W., Vargas L., Pérez-Enciso M. The chimerical genome of Isla del Coco feral pigs (Costa Rica), an isolated population since 1793 but with remarkable levels of diversity. Mol. Ecol. 2015;24:2364–2378. doi: 10.1111/mec.13182. [DOI] [PubMed] [Google Scholar]
  • 44.Letko A., Schauer A.M., Derks M.F.L., Grau-Roma L., Drögemüller C., Grahofer A. Phenotypic and genomic analysis of cystic hygroma in pigs. Genes. 2021;12:207. doi: 10.3390/genes12020207. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 45.Liu L., Bosse M., Megens H.J., de Visser M., AM Groenen M., Madsen O. Genetic consequences of long-term small effective population size in the critically endangered pygmy hog. Evol. Appl. 2021;14:710–720. doi: 10.1111/eva.13150. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 46.Bosse M., Megens H.-J., Madsen O., Crooijmans R.P.M.A., Ryder O.A., Austerlitz F., Groenen M.A.M., de Cara M.A.R. Using genome-wide measures of coancestry to maintain diversity and fitness in endangered and domestic pig populations. Genome Res. 2015;25:970–981. doi: 10.1101/gr.187039.114.
  • 47.Nosková A., Bhati M., Kadri N.K., Crysnanto D., Neuenschwander S., Hofer A., Pausch H. Characterization of a haplotype-reference panel for genotyping by low-pass sequencing in Swiss Large White pigs. BMC Genom. 2021;22:1–14. doi: 10.1186/s12864-021-07610-5.
  • 48.Frantz L.A.F., Schraiber J.G., Madsen O., Megens H.-J., Cagan A., Bosse M., Paudel Y., Crooijmans R.P.M.A., Larson G., Groenen M.A.M. Evidence of long-term gene flow and selection during domestication from analyses of Eurasian wild and domestic pig genomes. Nat. Genet. 2015;47:1141–1148. doi: 10.1038/ng.3394.
  • 49.Ai H., Fang X., Yang B., Huang Z., Chen H., Mao L., Zhang F., Zhang L., Cui L., He W., et al. Adaptation and possible ancient interspecies introgression in pigs identified by whole-genome sequencing. Nat. Genet. 2015;47:217–225. doi: 10.1038/ng.3199.
  • 50.Ramírez O., Burgos-Paz W., Casas E., Ballester M., Bianco E., Olalde I., Santpere G., Novella V., Gut M., Lalueza-Fox C., et al. Genome data from a sixteenth century pig illuminate modern breed relationships. Heredity. 2015;114:175–184. doi: 10.1038/hdy.2014.81.
  • 51.Molnár J., Nagy T., Stéger V., Tóth G., Marincs F., Barta E. Genome sequencing and analysis of Mangalica, a fatty local pig of Hungary. BMC Genom. 2014;15:1–12. doi: 10.1186/1471-2164-15-761.
  • 52.Choi J.-W., Chung W.-H., Lee K.-T., Cho E.-S., Lee S.-W., Choi B.-H., Lee S.-H., Lim W., Lim D., Lee Y.-G., et al. Whole-genome resequencing analyses of five pig breeds, including Korean wild and native, and three European origin breeds. DNA Res. 2015;22:259–267. doi: 10.1093/dnares/dsv011.
  • 53.Bolger A.M., Lohse M., Usadel B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics. 2014;30:2114–2120. doi: 10.1093/bioinformatics/btu170.
  • 54.Li H., Durbin R. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics. 2009;25:1754–1760. doi: 10.1093/bioinformatics/btp324.
  • 55.Van der Auwera G.A., Carneiro M.O., Hartl C., Poplin R., Del Angel G., Levy-Moonshine A., Jordan T., Shakir K., Roazen D., Thibault J., et al. From FastQ data to high-confidence variant calls: the genome analysis toolkit best practices pipeline. Curr. Protoc. Bioinformatics. 2013;43:11.10.1–11.10.33. doi: 10.1002/0471250953.bi1110s43.
  • 56.Li H. A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data. Bioinformatics. 2011;27:2987–2993. doi: 10.1093/bioinformatics/btr509.
  • 57.Browning B.L., Zhou Y., Browning S.R. A one-penny imputed genome from next-generation reference panels. Am. J. Hum. Genet. 2018;103:338–348. doi: 10.1016/j.ajhg.2018.07.015.
  • 58.Cingolani P., Platts A., Wang L.L., Coon M., Nguyen T., Wang L., Land S.J., Lu X., Ruden D.M. A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3. Fly. 2012;6:80–92. doi: 10.4161/fly.19695.
  • 59.Chang C.C., Chow C.C., Tellier L.C., Vattikuti S., Purcell S.M., Lee J.J. Second-generation PLINK: rising to the challenge of larger and richer datasets. GigaScience. 2015;4 doi: 10.1186/s13742-015-0047-8.
  • 60.Aulchenko Y.S., Ripke S., Isaacs A., Van Duijn C.M. GenABEL: an R library for genome-wide association analysis. Bioinformatics. 2007;23:1294–1296. doi: 10.1093/bioinformatics/btm108.
  • 61.Danecek P., Auton A., Abecasis G., Albers C.A., Banks E., DePristo M.A., Handsaker R.E., Lunter G., Marth G.T., Sherry S.T., et al. The variant call format and VCFtools. Bioinformatics. 2011;27:2156–2158. doi: 10.1093/bioinformatics/btr330.
  • 62.Kumar S., Stecher G., Suleski M., Sanderford M., Sharma S., Tamura K. MEGA12: Molecular Evolutionary Genetic Analysis Version 12 for Adaptive and Green Computing. Mol. Biol. Evol. 2024;41 doi: 10.1093/molbev/msae263.
  • 63.Alexander D.H., Novembre J., Lange K. Fast model-based estimation of ancestry in unrelated individuals. Genome Res. 2009;19:1655–1664. doi: 10.1101/gr.094052.109.
  • 64.Zhang C., Dong S.-S., Xu J.-Y., He W.-M., Yang T.-L. PopLDdecay: a fast and effective tool for linkage disequilibrium decay analysis based on variant call format files. Bioinformatics. 2019;35:1786–1788. doi: 10.1093/bioinformatics/bty875.
  • 65.Schiffels S., Wang K. MSMC and MSMC2: the multiple sequentially markovian coalescent. Statistical population genomics. Humana; 2020. pp. 147–165.
  • 66.Pickrell J.K., Pritchard J.K. Inference of population splits and mixtures from genome-wide allele frequency data. PLoS Genet. 2012;8:e1002967. doi: 10.1371/journal.pgen.1002967.
  • 67.Swofford D.L. PAUP∗: Phylogenetic Analysis Using Parsimony (∗and Other Methods). Sinauer Associates; 2003.
  • 68.Excoffier L., Dupanloup I., Huerta-Sánchez E., Sousa V.C., Foll M. Robust demographic inference from genomic and SNP data. PLoS Genet. 2013;9 doi: 10.1371/journal.pgen.1003905.
  • 69.Chen H., Patterson N., Reich D. Population differentiation as a test for selective sweeps. Genome Res. 2010;20:393–402. doi: 10.1101/gr.100545.109.
  • 70.Sabeti P.C., Varilly P., Fry B., Lohmueller J., Hostetter E., Cotsapas C., Xie X., Byrne E.H., McCarroll S.A., Gaudet R., et al. Genome-wide detection and characterization of positive selection in human populations. Nature. 2007;449:913–918. doi: 10.1038/nature06250.
  • 71.Huang D.W., Sherman B.T., Lempicki R.A. Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nat. Protoc. 2009;4:44–57. doi: 10.1038/nprot.2008.211.
  • 72.Sheffield N.C., Bock C. LOLA: enrichment analysis for genomic region sets and regulatory elements in R and Bioconductor. Bioinformatics. 2016;32:587–589. doi: 10.1093/bioinformatics/btv612.
  • 73.Fonseca P.A.S., Suárez-Vega A., Marras G., Cánovas Á. GALLO: An R package for genomic annotation and integration of multiple data sources in livestock for positional candidate loci. GigaScience. 2020;9 doi: 10.1093/gigascience/giaa149.
  • 74.Ding S., Wang F., Liu Y., Li S., Zhou G., Hu P. Characterization and isolation of highly purified porcine satellite cells. Cell Death Discov. 2017;3:17003–17011. doi: 10.1038/cddiscovery.2017.3.
  • 75.Zhang M., Yang Q., Ai H., Huang L. Revisiting the evolutionary history of pigs via de novo mutation rate estimation in a three-generation pedigree. Genom. Proteom. Bioinform. 2022;20:1040–1052. doi: 10.1016/j.gpb.2022.02.001.
  • 76.Narasimhan V., Danecek P., Scally A., Xue Y., Tyler-Smith C., Durbin R. BCFtools/RoH: a hidden Markov model approach for detecting autozygosity from next-generation sequencing data. Bioinformatics. 2016;32:1749–1751. doi: 10.1093/bioinformatics/btw044.
  • 77.Pan Z., Yao Y., Yin H., Cai Z., Wang Y., Bai L., Kern C., Halstead M., Chanthavixay G., Trakooljul N., et al. Pig genome functional annotation enhances the biological interpretation of complex traits and human disease. Nat. Commun. 2021;12:5848. doi: 10.1038/s41467-021-26153-7.

Associated Data

Supplementary Materials

Document S1. Figures S1–S22 and Tables S2, S3, S5, and S7
mmc1.pdf (3.2MB, pdf)
Table S1. Information for datasets and samples used in this study, related to Figure 1

Tab A: detailed information of all projects we collected. Tab B: detailed information of samples used in this study.

mmc2.xlsx (26.7KB, xlsx)
Table S4. Enrichment analysis of the 601 selective genomic regions

Tab A: selection signals of the 601 genomic regions, related to Figures 3A and 3B. Tab B: genes overlapping the 601 selective regions, related to Figure 3C. Tab C: enriched KEGG terms of the 601 selective genes, related to Figure S14. Tab D: definitions and abbreviations of the 15 chromatin states, related to Figure 3D. Tab E: enrichment analysis of the 601 selective regions for the 15 chromatin states, related to Figure 3D. Tab F: enrichment analysis of the 601 selective regions for tissue-specific enhancers, related to Figure 3E. Tab G: enrichment analysis of the 601 selective regions for pig QTLs, related to Figure 3F.

mmc3.xlsx (87.1KB, xlsx)
Table S6. Statistics on the overlap between wild boar variants and the causal variants listed for domestic pigs in OMIA, related to Figure 5
mmc4.xlsx (16.9KB, xlsx)
Table S8. Phenome-wide association study analysis result of ALPK2 gene, related to Figures 5E and S22
mmc5.xlsx (17.5KB, xlsx)
Document S2. Transparent peer review records for Wang et al.
mmc6.pdf (989.2KB, pdf)
Document S3. Article plus supplemental information
mmc7.pdf (12.4MB, pdf)

Data Availability Statement

  • This paper analyzes existing, publicly available data. The accessions for these data can be found in Table S1.

  • All the WGS data newly generated in this study are available under CNCB GSA (https://ngdc.cncb.ac.cn/) accession PRJCA016120.

  • No other custom code/software was used for data analysis in the study. The publicly available software and algorithms used in the present study are listed in the key resources table.


Articles from Cell Genomics are provided here courtesy of Elsevier
