High-quality Population-specific Haplotype-resolved Reference Panel in the Genomic and Pangenomic Eras

Qingxin Yang; Yuntao Sun; Shuhan Duan; Shengjie Nie; Chao Liu; Hong Deng; Mengge Wang; Guanglin He

doi:10.1093/gpbjnl/qzaf022

2025 Mar 10;23(6):qzaf022. doi: 10.1093/gpbjnl/qzaf022

Qingxin Yang ¹,²,³,#, Yuntao Sun ⁴,⁵,⁶, Shuhan Duan ⁷,⁸,⁹, Shengjie Nie ¹⁰, Chao Liu ¹¹, Hong Deng ¹²,✉, Mengge Wang ¹³,¹⁴,¹⁵,¹⁶,#,✉, Guanglin He ¹⁷,¹⁸,#,✉

Editor: Minxian Wang

¹ Department of Oto-Rhino-Laryngology & Institute of Rare Diseases, West China Hospital of Sichuan University, Sichuan University, Chengdu 610000, China

² Center for Archaeological Science, Sichuan University, Chengdu 610000, China

³ School of Forensic Medicine, Kunming Medical University, Kunming 650500, China

⁴ Department of Oto-Rhino-Laryngology & Institute of Rare Diseases, West China Hospital of Sichuan University, Sichuan University, Chengdu 610000, China

⁵ Center for Archaeological Science, Sichuan University, Chengdu 610000, China

⁶ West China School of Basic Science & Forensic Medicine, Sichuan University, Chengdu 610000, China

⁷ Department of Oto-Rhino-Laryngology & Institute of Rare Diseases, West China Hospital of Sichuan University, Sichuan University, Chengdu 610000, China

⁸ Center for Archaeological Science, Sichuan University, Chengdu 610000, China

⁹ School of Basic Medical Sciences, North Sichuan Medical College, Nanchong 637100, China

¹⁰ School of Forensic Medicine, Kunming Medical University, Kunming 650500, China

¹¹ Anti-Drug Technology Center of Guangdong Province, Guangzhou 510230, China

¹² School of Forensic Medicine, Kunming Medical University, Kunming 650500, China

¹³ Department of Oto-Rhino-Laryngology & Institute of Rare Diseases, West China Hospital of Sichuan University, Sichuan University, Chengdu 610000, China

¹⁴ Faculty of Forensic Medicine, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou 510275, China

¹⁵ Department of Forensic Medicine, College of Basic Medicine, Chongqing Medical University, Chongqing 400331, China

¹⁶ Human Genetics and Forensic Genomics Research Institute, College of Basic Medicine, Chongqing Medical University, Chongqing 400331, China

¹⁷ Department of Oto-Rhino-Laryngology & Institute of Rare Diseases, West China Hospital of Sichuan University, Sichuan University, Chengdu 610000, China

¹⁸ Center for Archaeological Science, Sichuan University, Chengdu 610000, China

✉ Corresponding authors: denghong@kmmu.edu.cn (Deng H), menggewang@wchscu.cn (Wang M), guanglinhe@wchscu.edu.cn (He G).

# Equal contribution.

PMCID: PMC13175255 PMID: 40059317

Abstract

Large-scale international and regional human genomic and pangenomic resources derived from population-scale biobanks and ancient DNA sequences have provided significant insights into human evolution and the genetic determinants of complex diseases and traits. Despite these advances, challenges persist in optimizing the integration of phasing tools, merging haplotype reference panels (HRPs), developing imputation algorithms, and fully exploiting the diverse applications of post-imputation data. This review comprehensively summarizes the advancements, applications, limitations, and future directions of HRPs in human genomics research. Recent progress in the reconstruction of HRPs, based on over 830,000 human whole-genome sequences, has been synthesized, highlighting the broad spectrum of human genetic diversity captured. Additionally, we recapitulate advancements in 56 HRPs for global and regional populations. The evaluation of imputation accuracy indicated that Beagle and Glimpse are the most effective tools for phasing and imputing data from genotyping arrays and low-coverage sequencing, respectively. A critical strategy for selecting an appropriate HRP involves matching the population background of target groups with HRP reference populations and considering multi-ancestry or homogeneous genetic structures. The necessity of a single, integrative, high-quality HRP that captures haplotype structures and genetic diversity across various genetic variation types from globally representative populations is emphasized to support both modern and ancient genomic research and advance human precision medicine.

Keywords: Haplotype reference panel, Imputation accuracy, Genotype imputation, Population genomics, Genomic medicine

Introduction

High-coverage whole-genome sequencing (WGS) has long been regarded as the gold standard for genotyping and identifying single-nucleotide polymorphisms (SNPs), insertions and deletions (InDels), and other structural variations (SVs) [1–6]. However, its high cost presents a significant barrier to population-scale genomic studies involving large cohorts. Consequently, much of the current research on modern humans depends on microarrays or low-depth WGS, which complicates genotype calling and restricts the ability to capture global genetic diversity [7]. Similar challenges have been observed in studies involving ancient DNA (aDNA) [8] and human leukocyte antigen (HLA) genes [9,10]. The established theoretical frameworks suggest that individuals within genetically close populations share population-specific long haplotype segments inherited from common ancestors. Accordingly, population-specific haplotype reference panels (HRPs), which consist of DNA sequences characterized by linkage disequilibrium (LD) and are tailored to document patterns of human genetic diversity, have been developed to facilitate the imputation of common, low-frequency, and rare genetic variations not directly genotyped in medical genome research or missing variants in aDNA studies.

Genetic variant information can be predicted or inferred through genotype imputation based on the documented patterns of haplotype structure in HRPs [11]. The accuracy of imputation depends mainly on the genetic architecture of the target populations (multi-ancestry heterogeneous background or genetically homogeneous structures) [12–14], algorithms such as the hidden Markov model (HMM) or deep learning employed by phasing and imputation tools [12–14], and properties of HRPs, such as marker density, the allele frequency spectrum, and the composition of reference populations [15–17]. Most imputation tools, including IMPUTE [18] and Markov Chain Haplotyping algorithm (MaCH) [19], leverage the HMM algorithm; however, updates to these tools have primarily targeted improvements in computational efficiency or memory requirements. In contrast, advances in HRPs, designed to capture the full breadth of human genetic diversity across distinct modern populations at scale, have significantly enhanced the quality of HRPs and imputation performance [20]. As a result, numerous HRPs have been constructed based on global genomic projects, such as the 1000 Genomes Project (1KGP), the Haplotype Reference Consortium (HRC) program, and Trans-Omics for Precision Medicine (TOPMed) [1–3]. More recently, population-specific HRPs, tailored to regional populations such as Han Chinese from the NyuWa Genome resource, Westlake BioBank for Chinese (WBBC), the Chinese Academy of Sciences Precision Medicine Initiative (CASPMI) project, and China Metabolic Analytics Project (ChinaMAP), Korean HRPs from the Northeast Asian Reference Database (NARD), or meta-Asian HRPs from the South and East Asian reference Database (SEAD), have improved the genetic discovery of these target populations [4,14,21–23]. 
Following imputation, low-coverage WGS or array-based databases enable the capture of more low-frequency or rare variations, increasing marker density, enhancing the statistical power of large-scale genome-wide association studies (GWASs) or meta-based work across different cohorts, and offering a cost-effective genotyping approach for downstream analyses, including polygenic risk score (PRS) estimation, genetic genealogy reconstruction, and demographic genetic history inference [24,25].
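The HMM idea that underlies tools such as IMPUTE and MaCH can be illustrated with a minimal haplotype-copying model in the style of Li and Stephens. The sketch below is not the implementation of any published tool. It assumes haploid data, a constant switch probability `theta` between adjacent markers, a single mismatch probability `eps`, untyped markers coded as `-1`, and a hypothetical function name `impute_haplotype`; production tools additionally scale switching by genetic distance and handle diploid genotypes.

```python
import numpy as np

def impute_haplotype(ref, target, theta=0.05, eps=0.01):
    """Posterior alt-allele probability at every marker of one target haplotype.

    ref    : (K, M) array of 0/1 alleles for K reference haplotypes (the HRP).
    target : length-M array with 0/1 at typed markers and -1 at untyped ones.
    theta  : assumed probability of switching template between adjacent markers.
    eps    : assumed probability that a copied allele is observed incorrectly.
    """
    ref = np.asarray(ref)
    target = np.asarray(target)
    K, M = ref.shape

    def emission(m):
        # Untyped markers carry no information about the copied template.
        if target[m] < 0:
            return np.ones(K)
        return np.where(ref[:, m] == target[m], 1.0 - eps, eps)

    def step(vec):
        # Stay on the same template with prob. 1 - theta, else jump uniformly.
        return (1.0 - theta) * vec + theta * vec.sum() / K

    # Forward pass, rescaled at each marker to avoid numerical underflow.
    fwd = np.zeros((M, K))
    fwd[0] = emission(0) / K
    fwd[0] /= fwd[0].sum()
    for m in range(1, M):
        fwd[m] = step(fwd[m - 1]) * emission(m)
        fwd[m] /= fwd[m].sum()

    # Backward pass (the transition matrix is symmetric, so step() is reused).
    bwd = np.ones((M, K))
    for m in range(M - 2, -1, -1):
        bwd[m] = step(bwd[m + 1] * emission(m + 1))
        bwd[m] /= bwd[m].sum()

    # Posterior over templates, then the expected allele under that posterior.
    post = fwd * bwd
    post /= post.sum(axis=1, keepdims=True)
    return (post * ref.T).sum(axis=1)
```

With a toy panel of two reference haplotypes (all-reference and all-alternate), a target typed as alternate at markers 0, 2, and 4 receives alt-allele probabilities close to 1 at the untyped markers 1 and 3, because the model copies from the matching template.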

The imputation process introduces systematic errors that are difficult to avoid, particularly when analyzing rare and low-frequency variants [26]. As the minor allele frequency (MAF) decreases, the imputation error rate increases [27]. Furthermore, imputation performance reliant on LD is shaped by haplotype patterns and variant spectra, which vary across genetically distinct populations [4]. Recent genetic studies have demonstrated that demographic events, including severe population bottlenecks occurring during migration out of Africa, along with biological adaptations driven by environmental extremes, pathogen exposure, dietary changes [28], and other evolutionary forces, such as mutation, migration, admixture, introgression, and recombination, significantly reshaped patterns of genetic diversity, allele frequency spectra, and LD patterns (Figure 1A). In multi-ancestry HRPs, the template-switching rate (the theta parameter) also influences imputation accuracy and the imputation quality metrics for minor ancestries [29]. Populations with closer genetic affinities tend to share longer haplotypes and similar variant spectra, consistent with coalescent theory, which facilitates the acquisition of more precise haplotype information. However, HRPs built primarily from European haplotype data have proven inadequate for genotype imputation in African, Asian, and other underrepresented indigenous populations [30,31]. This European-centric bias has led to inaccuracies in interpreting population-specific genetic foundations and conducting medical genetic studies focused on complex phenotypes in genetically diverse non-European populations [22,32,33]. Evidence indicates that imputation error rates for low-coverage WGS and low-density array data can be reduced through larger sample sizes, deeper sequencing depths, better population representation, and closer genetic matching between reference and target populations (Figure 1B) [20,34,35].

Thus, constructing an integrative HRP encompassing worldwide population-specific haplotypes with diverse patterns of genetic diversity remains critical for advancing both population and medical genetics research. The foundational concepts and evaluation of HRPs have been extensively reviewed in prior studies [36,37]. Earlier reviews provide detailed insights into the interplay between imputation advancements and GWASs, particularly regarding common and rare diseases [11,36,38]. Over the past eight years, human genomics research and computational genomics have undergone transformative and explosive advancements driven by innovations in multi-ancestry and population-specific HRPs. These developments have significantly advanced both basic research and clinical applications [1,14,25,39,40]. This review synthesizes recent advances in HRPs and genotype imputation tools, providing an overview of five key areas: (1) the progress and geographical and ethnic distribution patterns of populations in global human genome projects and corresponding HRPs, with a focus on large-scale multi-ancestry or population-specific high-quality HRPs; (2) the current status and efficiency of state-of-the-art phasing and imputation tools in the era of large-scale genomics; (3) performance assessments of publicly available HRPs and the necessity for high-quality merged HRPs (Figure 1C); (4) applications of imputed datasets in human genetics, genome science, genome medicine, and forensic science (Figure 1D); and (5) challenges in HRP applications and future directions in the context of the telomere-to-telomere (T2T)/pangenome-based paradigm shift.

Advances in worldwide human genome projects and corresponding HRPs in the past two decades

The draft human genome sequence was completed two decades ago, a pivotal advance supported by the Genome Reference Consortium’s publication of human reference genomes. Innovations in sequencing and computational techniques have accelerated the transition from single-genome studies to large-scale population genomic projects. In 2005, the International HapMap Consortium released the first haplotype map of the human genome, followed by the first HRP [20,41], marking a significant milestone (Figure 2A). Subsequent phases, including the pilot, phase 1, and phase 3 of the 1KGP, as well as the expanded 1KGP leveraging high-depth WGS technology, have provided valuable insights into the genomic diversity across major continental populations (Figure 2B) [3,42–44]. These initiatives have revealed previously uncharacterized sequences and diversity that earlier studies missed because of the limitations of array-based genotyping, and have also identified extensive human genetic variations, such as SNPs, InDels, and SVs. This progress heralded a new era of comprehensive human DNA sequencing and research. The availability of data from these genomic projects has enabled integrative analyses of human genome variations, contributing to the development of numerous large, multi-ancestry, high-quality HRPs and regional population-specific HRPs. These efforts have significantly advanced our understanding of the genomic architecture and genetic determinants of clinical diseases and complex physical traits (Figure 2C–F; Table 1, Table S1).

Table 1. Key characteristics and comparisons of all HRPs

Name	Publication year	Sample size	Ancestry	Sequencing depth	Total number of variants	Phasing algorithm	Imputation algorithm	Assessment criterion	Target population	Imputation performance measurement
HapMap2 panel	2009	443	Multi-ethnic	/	513,008	/	MaCH	r²	/	/
1KGP pilot phase	2010	179	Multi-ethnic	3.6×	16,224,519	/	IMPUTE	r²	/	/
1KGP1	2012	1092	Multi-ethnic	5.1×	28,975,367	SHAPEIT2	Beagle	R²	/	SHAPEIT2 vs. Thunder vs. Beagle
GoNL	2014	769	Netherlands	13×	21,524,538	Beagle	IMPUTE2	R²	Dutch	GoNL + 1KGP vs. GoNL + 1KGP1 European vs. GoNL vs. 1KGP1 vs. 1KGP1 European
Sardinians HRP	2015	2120	Sardinians	20×	21,131,214	/	/	r²	Sardinians	Sardinians HRP vs. 1KGP phase3 vs. 1KGP phase1
Icelanders HRP	2015	2636	Iceland	7×	45,492,035	SHAPEIT2	Beagle	R²	/	/
UK10K	2015	3781	European	4×	29,809,603	SHAPEIT2	IMPUTE2	r²	UK	UK10K + 1KGP vs. UK10K vs. 1KGP
AGVP	2015	320	African	32.4×	24,574,727	SHAPEIT2	/	r²	/	/
1KJPN	2015	1070	Japanese	7.4×	49,143,605	SHAPEIT2	IMPUTE2	R²	Japanese	1KJPN + 1KGP vs. 1KJPN vs. 1KGP vs. 1KGP JPT
1KGP3	2015	2504	Multi-ethnic	4×	17,600,000	MaCH	Minimac	R²	Multi-ethnic	1KGP phase3 vs. 1KGP phase1
HRC	2016	32,470	European	4×–8×	39,635,008	SHAPEIT3	IMPUTE2	R²	CEU	HRC vs. UK10K vs. 1KGP3
CAAPA	2016	642	African	35×	41,163,897	/	/	/	/	/
HELIC MANOLIS	2017	249	Cretan	4×	9,554,503	SHAPEIT2	IMPUTE2	r²	/	/
EGCUT	2017	2244	Estonian	30×	16,536,512	SHAPEIT2	IMPUTE2	r²/number of well-imputed SNVs	/	/
AGRP	2017	265	Amish and Mennonite	30×	1,081,253	SHAPEIT2	IMPUTE2	R²	AGRP	AGRP + 1KGP vs. HRC vs. AGRP vs. 1KGP
NIPT	2018	141,431	Chinese	0.06×	9,040,000	/	STITCH	r²/number of well-imputed SNVs	/	/
Mongolian HRP	2018	175	Mongolian	21.8×	16,526,134	SHAPEIT2	IMPUTE2	R²	Mongolian	Mongolian HRP + 1KGP3 vs. Mongolian HRP vs. 1KGP3 East Asian + Mongolian HRP vs. 1KGP3
AJ HRP	2018	738	Ashkenazi Jewish	30×	26,400,000	SHAPEIT2	IMPUTE2	R²/genotype discordance	Ashkenazi Jewish	AJ HRP + 1KGP3 vs. AJ HRP vs. HRC vs. 1KGP3 vs. UK10K + 1KGP vs. UK10K vs. 1KGP
CONVERGE	2018	11,670	Chinese	1.7×	24,114,249	/	/	/	/	/
SG10K	2019	4810	Multi-ethnic	13.7×	98,273,706	Eagle2	Beagle4	r²	Multi-ethnic	SG10K + 1KGP3 vs. SG10K vs. 1KGP3
NARD	2019	1779	Multi-ethnic	10×–20×	44,444,122	SHAPEIT3	Minimac3	R²/genotype discordance	Korean	NARD + 1KGP3 vs. NARD vs. 1KGP3 vs. HRC
GAsP	2019	1654	Asian	30×	21,494,814	SHAPEIT2	Eagle2	/	/	/
UGR	2019	1978	African	4×	46,000,000	SHAPEIT2	IMPUTE2	/	/	/
CASPMI	2019	597	Chinese	25×–35×	/	/	/	/	/	/
PGG.Han	2020	114,783	Chinese	30×–80×	8,056,973	/	/	/	/	/
Korea1K	2020	1094	Korean	31×	38,800,000	SHAPEIT2	Minimac3	R²	Korean	Korea1K + 1KGP3 vs. Korea1K vs. 1KGP3
TWB	2021	1445	Chinese	/	/	SHAPEIT2	SHAPEIT2	R²/concordance	Chinese	TWB + 1KGP East Asian vs. TWB vs. 1KGP East Asian
TR	2021	773	Turkish	34×	45,981,721	SHAPEIT2	IMPUTE2	R²	Balkan	TR + 1KGP3 vs. TR vs. 1KGP3
TOPMed	2021	97,256	Multi-ethnic	30×	308,107,085	Eagle2	Minimac4	R²	UKB	/
QGP1	2021	6218	Qatari	30×	68,107,887	Eagle2	Minimac3	R²/number of well-imputed SNVs/r²	Qatari	QGP vs. 1KGP3 vs. HRC vs. CAAPA vs. HapMap2
NyuWa	2021	2902	Chinese	26.2×	19,256,267	SHAPEIT4	Minimac4	R²	Multi-ethnic	NyuWa + 1KGP3 vs. NyuWa vs. TOPMed vs. GAsP vs. 1KGP3 vs. HRC
ChinaMAP	2021	10,588	Chinese	40.8×	59,010,860	/	/	/	/	/
AFAM	2021	2294	Sub-Saharan African	15×	52,500,000	SHAPEIT4	Minimac4	R²	African Americans	TOPMed vs. AFAM vs. 1KGP3 vs. HRC vs. CAAPA
WBBC	2022	4535	Chinese	13.9×	81,498,995	SHAPEIT2	Minimac4	r²/number of well-imputed SNVs/non-reference genotype concordance rate	Multi-ethnic	WBBC + 1KGP East Asian vs. WBBC + 1KGP vs. WBBC vs. 1KGP3 vs. CONVERGE
UKB	2022	149,960	Multi-ethnic	32.5×	643,747,446	/	IMPUTE2	/	/	/
NSCLC	2022	6004	Chinese	30×	100,565,590	SHAPEIT4	Minimac4	r²/number of well-imputed SNVs	Chinese	NSCLC vs. 1KGP3
GCAT|Panel	2022	690	European	30×	35,431,441	SHAPEIT4	IMPUTE2	r²	Multi-ethnic	/
Expanded 1KGP	2022	3202	Multi-ethnic	34×	70,768,225	SHAPEIT2	IMPUTE2	R²	Multi-ethnic	/
KRG pilot	2022	1490	Korean	29.0×	13,637,761	SHAPEIT4	Minimac4	R²/number of well-imputed SNVs	Multi-ethnic	Differences across target populations
SABE	2022	1171	Brazilians	38.6×	/	SHAPEIT2	IMPUTE2	r²	Brazilians	SABE + 1KGP3 vs. SABE vs. 1KGP3
CKB	2023	9964	Chinese	15.41×	129,743,542	Beagle5.2	Beagle5.2	R²/number of well-imputed SNVs/precision/sensitivity	Chinese	ChinaMAP vs. CKB vs. TOPMed vs. NyuWa vs. extended 1KGP
NARD2	2023	14,393	Multi-ethnic	/	/	Beagle5.0	Minimac4	R²/number of well-imputed SNVs	KOR	NARD2 vs. NARD vs. TOPMed
FinnGen	2023	3775	Finnish	30×	/	Beagle4.1	Beagle4.1	r²	/	/
1KTGP	2023	1064	Chinese	11.8×	28,200,000	SHAPEIT2	IMPUTE2	/	/	/
jMorp	2023	54,302	Japanese	/	/	/	/	/	/	/
MCPS10k	2023	9950	Mexican	30×	/	SHAPEIT4	IMPUTE5	R²	MCPS individuals	MCPS10k vs. TOPMed
Samoan panel	2023	1285	Samoan	/	/	Eagle2	Minimac4	r²/number of well-imputed SNVs	Samoans	1KGP3 + Samoan panel vs. TOPMed vs. 1KGP3
LVBMC	2023	502	Latvian	35.7×	/	Eagle2	Beagle4.1	Number of well-imputed SNVs	Latvians	/
BIGCS	2024	2245	Chinese	6.63×	47,239,473	Beagle4.0	Minimac3	R²	Chinese	BIGCS + TOPMed vs. BIGCS vs. TOPMed vs. 1KGP3 vs. GAsP vs. HRC
ADSP	2024	16,564	Multi-ethnic	/	54,000,000	SHAPEIT4	Minimac3	R²/r²	Multi-ethnic	Differences across target populations
SEA HRP	2024	2550	Southeast Asian	/	113,851,450	Beagle5 vs. SHAPEIT4 and IMPUTE5		r²/number of well-imputed SNVs/non-reference discordance rate	Orang Asli	SEA HRP vs. 1KGP3
Korea4K	2024	3614	Korean	27.75×	26,210,741	SHAPEIT2	Minimac3	R²	Korean	Korea4K vs. Korea1K
GEL	2024	78,195	European	30×	342,573,817	SHAPEIT4	IMPUTE5	r²	Multi-ethnic	Differences across target populations
SEAD	2024	11,067	Multi-ethnic	/	80,367,720	SHAPEIT2	Minimac4	r²/number of well-imputed SNVs/non-reference concordance rate	Chinese	SEAD vs. SG10K vs. WBBC vs. 1KGP vs. GAsP
INDp	2024	217	Indonesian	30×	10,144,296	SHAPEIT2	IMPUTE2	r²/non-reference concordance rate	West Javanese	INDp + 1KGP East Asian vs. INDp vs. 1KGP East Asian
GMGD	2024	476	Chinese	5.5×	16,336,982	Beagle5.2	Beagle5.2	R²	/	/

Note: R², the squared Pearson’s correlation coefficient (aggregated R² or dosage R²) between the true genotypes and imputed genotype dosages. r², the score generated by software without the true genotype (Rsq and INFO). In the “imputation performance measurement” column, we list the reference panels in order of their performance in the target population, from best to worst. The complete details can be found in Table S1. HRP, haplotype reference panel; MaCH, Markov Chain Haplotyping algorithm; HapMap, International HapMap Project; 1KGP, 1000 Genomes Project; GoNL, Genome of the Netherlands Project; UK10K, 10,000 UK Genome Sequences; AGVP, African Genome Variation Project; 1KJPN, a reference panel of 1070 Japanese individuals; HRC, Haplotype Reference Consortium (release 1.1); CAAPA, Consortium on Asthma among African-ancestry Populations in the Americas; HELIC MANOLIS, HELIC Pomak collection and the MANOLIS Cohorts; HELIC, Hellenic Isolated Cohorts; MANOLIS, Mylopotamos; EGCUT, Estonian Biobank of the Estonian Genome Center, University of Tartu; AGRP, Anabaptist Genome Reference Panel; NIPT, Non-invasive prenatal testing in China; CONVERGE, the China, Oxford and Virginia Commonwealth University Experimental Research on Genetic Epidemiology; SG10K, whole-genome-sequence 10,000 Singaporeans; NARD, Northeast Asian Reference Database; GAsP, GenomeAsia 100K Project pilot phase; UGR, Uganda Genome Resource; CASPMI, the Chinese Academy of Sciences Precision Medicine Initiative project; PGG.Han, Han Chinese Genome Initiative; Korea1K, the Korean Genome Project; TWB, Taiwan Biobank; TR, Turkish Variome; TOPMed, Trans-Omics for Precision Medicine; QGP1, Qatar Genome Program Phase 1; NyuWa, the NyuWa Genome resource; ChinaMAP, China Metabolic Analytics Project; AFAM, African Americans reference panel; WBBC, Westlake BioBank for Chinese pilot project; GCAT|Panel, GCAT|Genomes for Life Cohort; NSCLC, Non-small cell lung cancer in China; KRG, the Korean Reference Genome project; SABE, The Health, Well-being and Aging Study; CKB, China Kadoorie Biobank; UKB, UK Biobank; FinnGen, FinnGen project; 1KTGP, 1000 Tibetans Genomes Project; jMorp, The Tohoku Medical Megabank project; MCPS10k, the Mexico City Prospective Study; Samoan panel, Samoan-specific genotype reference panels; LVBMC, the Latvian population-specific reference panel; BIGCS, Born in Guangzhou Cohort Study; ADSP, the Alzheimer’s Disease Sequencing Project; SEA HRP, the Southeast Asian Specific Reference Panel; Korea4K, the Korean Genome Project; GEL, the Genomics England dataset; SEAD, the South and East Asian Reference Database reference panel; INDp, the Indonesian panel; GMGD, Guizhou Multi-Ethnic Genome Database; JPT, Japanese in Tokyo, Japan; CEU, Utah Residents with Northern and Western European Ancestry; SNV, single-nucleotide variant.

Recent advancements in high-quality HRP innovations have emerged at a rapid pace. In Europe, the HRC program and subsequent European genomic initiatives, including the Genome of the Netherlands Project (GoNL), 10,000 UK Genome Sequences (UK10K), Genomics England (GEL), and other genomic studies of Estonians, Sardinians, Cretans, Ashkenazi Jewish, Icelanders, and Latvians, have yielded high-quality HRPs and extensive genetic datasets [39,45–58]. These large-scale human genomic resources have facilitated high-quality HRP construction, fine-scale genetic analyses, and significant advancements in precision medicine across diverse European populations (Figure 2E). Genetic analysis revealed a striking European bias in human genome research, with 86% of GWASs conducted in populations of European ancestry, resulting in an overrepresentation of European populations in HRPs [28,33,59]. Efforts to develop genomic projects focused on Asian populations have aimed to address the underrepresentation and selection bias of non-European groups in human genome research and to illuminate novel genetic determinants of East Asian-specific diseases and health traits. Over 30 initiatives focused on East Asians, Southeast Asians, or single-ethnic groups such as Chinese Mongolians and Tibetans have recently been launched, significantly contributing to this effort (Figure 2C) [4,12,13,22,23,32,60–82]. Northern Americans, historically dominated by migrants from other continents with complex mixed ancestry, exhibit diverse genetic backgrounds. The peopling of the Americas is traced back to at least the late Paleolithic period [83,84]. Native American populations possess 14% to 38% ancestry related to MA-1, a 24,000-year-old individual from Mal’ta in south-central Siberia, with the remainder of their ancestry deriving from East Asians [83,84]. Latin American populations comprise both admixed and predominantly indigenous subpopulations, many of which remain genetically uncharacterized. 
Genomic initiatives in Northern America, such as the TOPMed program, the Consortium on Asthma among African-ancestry Populations in the Americas (CAAPA), the African Americans reference panel (AFAM), the Anabaptist Genome Reference Panel (AGRP), and others, have aimed to diversify genetic collections across these groups (Figure 2F) [1,40,85–90]. Africa, recognized as the cradle of modern humans, harbors immense genetic and linguistic diversity, including Afro-Asiatic, Nilo-Saharan, Khoisan, and Niger-Congo language groups. However, few cohort studies have been conducted to explore the genetic architecture and disease susceptibility in African populations, with notable exceptions including the African Genome Variation Project and Uganda Genome Resource [31,91]. For Pacific islanders, only one HRP specific to Samoan haplotype genotypes has been developed, while public reference panels lack sufficient samples from Oceania. This underrepresentation exacerbates the health disparities faced by Pacific islanders [92]. Collectively, these genomic projects and the reference panels developed in parallel underscore global efforts to advance equitable healthcare and precision medicine. Seventeen of these panels are accessible through publicly available online imputation platforms, encompassing samples from European, Asian, African, and indigenous Oceanian ancestries (Table S1).

In addition to enhancing diversity by including genetically distinct continental populations, imputation accuracy was found to improve with an increased number of haplotypes and samples in the HRPs, particularly for rare and low-frequency variants [11,25,38]. Additionally, closer genetic proximity between the target population and the populations in HRPs corresponded to higher imputation accuracy, as demonstrated in these validation studies [25,58,93,94]. For example, when the TOPMed HRP was used, the mean imputation r-squared (r²) for European variants was 0.93, compared to 0.62 for Papua New Guineans [25]. For populations with excess African ancestry, both the TOPMed and AGRP HRPs demonstrated superior imputation accuracy [95]. Meta-imputation utilizing HRPs from the TOPMed and the expanded 1KGP yielded the best results for the Pakistani population [96]. The inclusion of diverse reference panels has been shown to increase imputation accuracy for rare variants significantly [38]. Analysis of sample size and ancestry composition in high-quality HRPs (sequencing depth ≥ 30×, sample size > 1000) revealed that individuals of European ancestry accounted for 60.7%. In fine-scale regional or isolated populations, the selection bias or Han bias has hindered the equitable representation of ethnolinguistically diverse ethnic minorities within regional population genomic cohorts (Figure 3). Asian HRPs with unspecified sequencing depths were excluded from this analysis (Table S1). While large-scale, population-specific initiatives are essential for unlocking the full potential of genome sequencing, non-European populations remain underrepresented in genomic studies owing to the high cost of sequencing. Consequently, the development of high-quality, population-specific HRPs may offer a more effective strategy for advancing genomic research in developing countries. 
Given the limited integration of population genetic backgrounds in current molecular anthropological and medical studies, a combined approach is recommended. This approach would involve merging high-quality sequencing data from genetically diverse cohorts, such as the 1KGP and Human Genome Diversity Project (HGDP), or integrating population-specific reference panels with multi-ethnic resources, such as the TOPMed and SEAD reference panels. Such efforts would facilitate the creation of a comprehensive, multi-ancestry, high-quality HRP that better represents underrepresented populations or those with mixed ancestral origins, including European, American, Oceanian, Singaporean, and South African populations.

Optimal combination of phasing and imputation tools

Haplotype phasing, including methods such as family-based phasing, physical phasing, and LD-based phasing, serves as a cornerstone in statistical and population genetics. This approach facilitates the identification of genetic variant combinations on individual chromosomes and underpins the construction of reference panels. Family-based phasing utilizes kinship information, including trios or multigenerational family data, to allocate alleles to haplotypes with high accuracy based on Mendelian inheritance patterns. However, its efficacy relies on the availability of comprehensive family datasets. Physical phasing employs experimental approaches, such as long-read and single-molecule sequencing technologies (e.g., PacBio and Nanopore) and next-generation sequencing, to determine haplotypes directly, often leveraging trios or physically dissected chromosomal markers. LD-based phasing infers haplotypes via LD patterns derived from population-level data. Tools such as SHAPEIT [97] and Beagle [98] have been developed to facilitate haplotype phasing in individuals without family data. However, the accuracy of these methods can be affected by population-specific LD structures.
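The Mendelian logic behind family-based phasing can be sketched for a single biallelic site in a parent-offspring trio. This is an illustrative simplification rather than the procedure of any specific tool: genotypes are assumed to be alternate-allele counts (0, 1, or 2), and the helper names `transmittable` and `phase_child_site` are hypothetical.

```python
def transmittable(genotype):
    """Alleles (0 = ref, 1 = alt) that a parent with this genotype can pass on."""
    return {0: {0}, 1: {0, 1}, 2: {1}}[genotype]

def phase_child_site(child, father, mother):
    """Return (paternal_allele, maternal_allele) for the child at one site.

    Returns None when the phase cannot be resolved from the trio alone
    (for example, all three individuals are heterozygous) or when the
    genotypes are inconsistent with Mendelian inheritance.
    """
    options = [
        (p, m)
        for p in transmittable(father)
        for m in transmittable(mother)
        if p + m == child
    ]
    return options[0] if len(options) == 1 else None

def phase_child(child_gts, father_gts, mother_gts):
    """Phase every site of the child; unresolved sites are left as None."""
    return [
        phase_child_site(c, f, m)
        for c, f, m in zip(child_gts, father_gts, mother_gts)
    ]
```

Sites where every member of the trio is heterozygous stay unresolved under this rule, which is one reason family-based phasing is often combined with LD-based or read-based information in practice.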

Current statistical phasing strategies for array-based data and sequencing data [whole-exome sequencing (WES) and WGS] primarily rely on haplotype diversity patterns driven by LD within populations or use coalescent theory to identify shared identity-by-descent (IBD) segments among samples [99]. The most widely used phasing tools include Eagle [17], SHAPEIT [97], and Beagle [98] (Table S2). Phasing performance is typically assessed using trio samples to establish the true haplotypes of offspring, which are then compared with those inferred by phasing algorithms. Phasing accuracy is evaluated through the switch error rate (SER), defined as the fraction of consecutive heterozygous genotypes that are incorrectly phased. In an analysis of array data from 500 European trio samples, the reported SERs were as follows: SHAPEIT4 (0.117%), Beagle5 (0.125%), and Eagle2 (0.178%) [100]. Updates to SHAPEIT5 and Beagle5.4 have further improved the phasing accuracy for rare variants. In evaluations with European trio samples, both methods demonstrated low SERs (< 0.2%) when considering Axiom array loci alone. However, for rare variants in WGS data (with minor allele counts between 11 and 20 or allele frequencies less than 0.01), the SERs of SHAPEIT5 and Beagle5.4 were 4.36% and 8.76%, respectively, whereas Eagle2 did not significantly improve the phasing performance for rare variants. Enhanced phasing accuracy for rare variants has also contributed to improved genotype imputation accuracy within corresponding high-resolution panels [97]. An independent assessment by Cole et al. examined the phasing performance of SHAPEIT5.1.1 and Beagle5.4 on genetically distinct populations (Table S2). Overall, Beagle demonstrated a lower SER than SHAPEIT did, achieving extremely low SERs ranging from 0.012% to 0.028% for array data from individuals of European, Ashkenazi Jewish, African-admixed, and American-admixed ancestry.
Conversely, average SERs for East Asian and South Asian individuals were reported to be 0.36% (standard deviation 0.17%) and 0.50% (standard deviation 0.34%), respectively. In WGS data, both SHAPEIT and Beagle displayed strong performance within European samples (SER = 0.021%) but exhibited suboptimal results for East Asian samples (SER = 3.49%) [101]. Under comparable conditions, phasing with Beagle yielded superior imputation results, and phasing with a reference panel outperformed reference-free approaches [102].
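The SER definition used above can be made concrete with a short sketch. It assumes that the trio-derived (true) and statistically inferred haplotype pairs carry the same genotypes at every site and that alleles are coded 0/1; the function name `switch_error_rate` is illustrative and does not correspond to any tool's API.

```python
def switch_error_rate(true_haps, inferred_haps):
    """Fraction of consecutive heterozygous site pairs with inconsistent phase.

    true_haps, inferred_haps : pairs (hap_a, hap_b) of equal-length 0/1 lists.
    """
    true_a, true_b = true_haps
    inferred_a, _ = inferred_haps

    # Only heterozygous sites carry phase information.
    het_sites = [k for k in range(len(true_a)) if true_a[k] != true_b[k]]
    if len(het_sites) < 2:
        return 0.0

    # Orientation per het site: does inferred haplotype A follow true haplotype A?
    orientation = [inferred_a[k] == true_a[k] for k in het_sites]

    # A switch error occurs whenever the orientation flips between neighbours.
    switches = sum(1 for prev, cur in zip(orientation, orientation[1:]) if prev != cur)
    return switches / (len(het_sites) - 1)
```

For five heterozygous sites with a single phase flip after the second site, there are four consecutive pairs and one switch, giving an SER of 0.25. A globally swapped haplotype labelling produces no switches, so the measure does not depend on which haplotype is called A.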

The imputation tools, developed to optimize uncertain genotype likelihoods and bridge gaps in sparsely mapped reads, were designed to maximize the utility of HRPs. Their application significantly improved the accuracy and statistical power of low-coverage WGS data following genotype imputation [39,103]. Constantly updated HRPs, incorporating more individuals from different ancestral populations and newly reported variants, facilitate iterative improvements to these tools, enhancing the accuracy of rare variant detection while minimizing computational time and memory requirements. Various genotyping and sequencing methods, including SNP tagging approaches for genome-wide SNPs and low-coverage WGS, are commonly employed. Key tools in this domain include IMPUTE, Beagle, and Minimac (Table S2) [18,98,104]. Validation studies indicated that, under consistent conditions, the performance of these imputation tools was broadly comparable, with R² differences of no more than 0.01. Within this narrow range, Beagle performed best, followed by IMPUTE, with Minimac ranking last [102]. Additionally, QUILT and Glimpse, which utilize Gibbs sampling and HMM, were designed explicitly for low-pass WGS data [105,106]. For low-coverage aDNA and non-invasive prenatal testing data, Glimpse slightly outperformed QUILT and Beagle [107,108]. Moreover, both versions of Glimpse exhibited similar accuracy levels for ancient genomic data [109,110], with version 1 showing superior accuracy for data with 0.1× coverage and variants with a MAF > 0.02. In conclusion, Beagle is recommended for phasing and imputation of array data using a unified HRP, while Glimpse remains the preferred tool for low-coverage WGS data in modern and ancient populations.

Performance and disparities of diverse HRPs

The size and number of high-quality HRPs have increased with advancements in large-scale human genomic cohorts, necessitating the development of a comprehensive reference panel for global or regional population use. This is essential for the medical and population genetics communities. We systematically summarized the effects of multiple factors, such as the target population, reference panel composition, and computational strategies, on imputation accuracy (Figure 1). Recently, several HRPs have been validated as highly effective for genetically similar populations. However, limitations in publicly available resources and challenges in selecting appropriate panels for specific target populations have impeded their broader application. Genotype imputation accuracy is evaluated via two main metrics: r², which is estimated by the imputation tool without reference to true genotypes, and aggregated R² (also known as dosage R²), which is the squared Pearson’s correlation coefficient between imputed dosages and true genotypes [111]. Aggregated R² values are obtained by grouping markers according to MAF. In addition to these standard indices, concordance between predicted and observed genotypes, high-r² variants, and the density or coverage of high-quality imputed markers were incorporated as key metrics for evaluating the validation performance of custom panels [25]. Bai et al. designed hundreds of customized reference panels with varying haplotype sizes and diversity to investigate the relationship between imputation accuracy and panel composition [37]. Simulated reference panels with differing diversity yielded varying imputation performance in Han Chinese and European populations. For Han Chinese, imputation accuracy plateaued when haplotype diversity within the reference panels was limited. An intriguing explanation proposed in this work attributed the difference to higher “diversity acceptability” in Western Eurasians, which is consistent with contributions from three ancestral sources: European hunter-gatherers, Near East farmers, and Steppe Yamnaya herders [112]. This finding indicates that complex ancient demographic events clearly influence imputation performance. This systematic evaluation underscores the need for large-scale, high-quality reference panels representing underrepresented populations, such as Han Chinese. To address this gap, the Han Chinese-specific WBBC panel was developed and integrated with multi-ancestry sources to form the SEAD panel, enhancing imputation accuracy for Han Chinese and broader Asian populations [14,63].
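
As a minimal sketch of the aggregated R² metric described above, the following example pools true genotypes and imputed dosages from all variants falling within a MAF bin and computes the squared Pearson’s correlation per bin. Pooling across variants within a bin is one common convention and is an assumption here, as are the function names, the input layout, and the toy values.

```python
from typing import Dict, Iterable, List, Sequence, Tuple


def pearson_r2(x: Sequence[float], y: Sequence[float]) -> float:
    """Squared Pearson correlation; NaN when either input has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy * sxy / (sxx * syy)


Variant = Tuple[float, Sequence[float], Sequence[float]]  # (MAF, true genotypes, dosages)


def aggregated_r2_by_maf(
    variants: Iterable[Variant], bin_edges: Sequence[float]
) -> Dict[Tuple[float, float], float]:
    """Pool genotype/dosage pairs per half-open MAF bin [lo, hi) and return R2 per bin."""
    bins = list(zip(bin_edges, bin_edges[1:]))
    pooled: Dict[Tuple[float, float], Tuple[List[float], List[float]]] = {
        b: ([], []) for b in bins
    }
    for maf, truth, dosage in variants:
        for lo, hi in bins:
            if lo <= maf < hi:
                pooled[(lo, hi)][0].extend(truth)
                pooled[(lo, hi)][1].extend(dosage)
                break
    # Bins without any variant are omitted from the result.
    return {b: pearson_r2(t, d) for b, (t, d) in pooled.items() if t}


toy = [
    (0.003, [0, 0, 1, 0], [0.0, 0.0, 1.0, 0.0]),
    (0.02, [0, 1, 2, 0], [0.1, 0.9, 1.8, 0.2]),
]
print(aggregated_r2_by_maf(toy, [0.002, 0.005, 0.01, 0.05]))
```

Unlike the tool-reported r², this calculation requires ground-truth genotypes, which is why it can only be applied to samples with independent high-coverage data.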

Cahoon et al. evaluated the imputation performance of the large-scale, multi-ancestry, state-of-the-art TOPMed reference panel, which demonstrated higher imputation accuracy in European populations but did not yield similar improvements for underrepresented non-European populations. The r² estimates were shown to correlate with genetic distance to European populations and were overestimated in non-European populations [25]. Shi et al. further highlighted that estimated template switching rates in the meta-ancestry reference panel may contribute to inflated r² values [29]. However, this had a limited impact on evaluating the imputation performance of population-specific panels. Comparative analyses have been conducted using genomics-based reference panels from various population cohorts. Notable examples include TOPMed and AGRP in Africans [95], ChinaMAP and WBBC for Han Chinese [13,63], and NARD for Northeast Asians [22]. These findings highlight the importance of incorporating diverse genomic resources to improve imputation precision and health equity.

We addressed this issue by imputing Chinese genomic diversity using all available high-quality HRPs. Our validation work underscored the importance of using population-matched reference panels for accurate genomic imputation. The imputation performance of publicly available HRPs was assessed via WGS data from 224 East Asian samples in the HGDP as the ground truth dataset. Quality control was conducted using VCFtools with parameters including “--maf 0.002”, “--max-missing 0.95”, “--minQ 30”, and “--hwe 1e-10”. Inconsistencies in reference genome versions among different HRPs were resolved using triple-liftOver to convert genomic coordinates to hg19 [113], and allele switches or strand flips were corrected by comparing allelic data against HRC data via Will Rayner’s tools. Imputation was halted if more than 10% of allele switches were detected during quality control (Figure 4A). Seventeen HRPs were accessed via imputation websites (Table S1), with imputed data obtained from eight usable HRPs after excluding HGDP samples already represented in the HRPs (Table 2).
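
The allele-consistency check described above can be sketched as follows. This is not Will Rayner’s tool; it is a simplified illustration for biallelic SNPs in which palindromic (A/T and C/G) sites are conservatively labeled ambiguous, and counting both plain switches and flip-plus-switch cases toward the 10% threshold is an assumption. All names are hypothetical.

```python
from typing import Iterable, List, Tuple

Alleles = Tuple[str, str]  # (reference allele, alternate allele), upper-case bases

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def classify(target: Alleles, reference: Alleles) -> str:
    """Compare a target SNP's alleles with the reference panel's alleles."""
    t_ref, t_alt = target
    if COMPLEMENT.get(t_ref) == t_alt:
        return "ambiguous"  # palindromic SNP: strand cannot be resolved from alleles
    if target == reference:
        return "match"
    if (t_alt, t_ref) == reference:
        return "switch"  # ref/alt labels are swapped
    flipped = (COMPLEMENT.get(t_ref), COMPLEMENT.get(t_alt))
    if flipped == reference:
        return "flip"  # reported on the opposite strand
    if (flipped[1], flipped[0]) == reference:
        return "flip_switch"
    return "mismatch"


def switch_fraction(pairs: Iterable[Tuple[Alleles, Alleles]]) -> float:
    """Fraction of sites whose ref/alt labels are swapped relative to the reference."""
    labels: List[str] = [classify(t, r) for t, r in pairs]
    if not labels:
        return 0.0
    return sum(label in ("switch", "flip_switch") for label in labels) / len(labels)


def should_halt(pairs: Iterable[Tuple[Alleles, Alleles]], threshold: float = 0.10) -> bool:
    """Apply the stopping rule: halt when switches exceed the threshold fraction."""
    return switch_fraction(pairs) > threshold


sites = [
    (("A", "G"), ("A", "G")),
    (("G", "A"), ("A", "G")),
    (("T", "C"), ("A", "G")),
    (("C", "T"), ("A", "G")),
    (("A", "T"), ("A", "T")),
    (("A", "C"), ("A", "G")),
]
print([classify(t, r) for t, r in sites], should_halt(sites))
```

A high switch fraction usually indicates a reference-build or allele-coding mismatch rather than many independent errors, which motivates halting instead of correcting site by site.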

Table 2.

Basic information of publicly available HRPs and the number of variants obtained after imputing chromosome 22

Reference panel	Sample size	Reference genome	Ancestry	Number of imputed sites
Expanded 1KGP	3202	GRCh38	Multi-ethnic	680,128
CAAPA	642	GRCh37	African American	381,379
GAsP	1654	GRCh37	Asian	288,682
HRCr1.1	32,470	GRCh37	European	524,544
Samoan	1285	GRCh37	Oceanian	222,128
HapMap2	443	GRCh37	Multi-ethnic	33,805
WBBC	4535	GRCh38	Chinese	494,305
SEAD	11,067	GRCh38	Asian	1,228,248

Note: Expanded 1KGP, CAAPA, GAsP, HRCr1.1, Samoan, and HapMap2 were imputed at https://imputationserver.sph.umich.edu/#!pages/home, which uses Eagle2.4 and Minimac4. WBBC and SEAD were imputed at https://imputationserver.westlake.edu.cn/index.html via SHAPEIT2 and Minimac4.

The squared Pearson’s correlation coefficients (aggregated R²) between the true genotypes and imputed genotype dosages were calculated to assess imputation accuracy. The results demonstrated a positive correlation between imputation accuracy and increasing MAF. For variants with MAFs ranging from 0.002 to 0.005, the GAsP HRP exhibited the highest accuracy (aggregated R² = 0.497). For variants with MAFs between 0.01 and 0.05, panels containing larger Asian sample sizes performed better, with SEAD achieving the highest accuracy (Figure 4B). Overall, the SEAD HRP was found to be the most suitable for the East Asian population in the HGDP. When r² statistics were evaluated, the expanded 1KGP HRP showed overall advantages (Figure 4C), underscoring the potential for imputation tool metrics to overestimate accuracy and lead to erroneous conclusions [25,29]. Benchmarking should thus avoid relying solely on r² or aggregated R² values. Further evaluation of imputation performance across 18 East Asian populations indicated that the WBBC panel exhibited the highest overall performance, although imputation accuracy varied across populations (Figure 4D). The WBBC panel showed better performance for Han populations but poorer results for the Uyghur population (Figure 4E), likely due to the predominance of Han samples in the panel, emphasizing the importance of population matching in enhancing imputation accuracy. In contrast, the Uyghur population achieved better accuracy with the more diverse expanded 1KGP panel (Figure 4F), reflecting its mixed Eastern and Western ancestry, which was underrepresented in the WBBC HRP [114]. Overall, target populations that closely match those in the HRPs demonstrate superior imputation accuracy, particularly when the target population is homogeneous or its genetic diversity is well represented in large multi-ancestry reference panels.

Benefits and applications of genotype imputation

Genomic medicine and statistical genetics

Complex disorders related to Mendelian diseases and non-communicable diseases substantially contribute to the healthcare burden, primarily driven by a combination of polygenic genetic architecture and diverse environmental risk factors. GWASs have elucidated the complex relationships between genotype and phenotype [115], identifying numerous SNPs linked to common complex diseases and traits; however, these variants explain only a fraction of the observed genetic variance or heritability. Rare SNP mutations or SVs may contribute to the missing heritability. Early applications of phasing and imputation innovations focused on GWAS-based medical discoveries, aiming to dissect heritability by increasing SNP density, enhancing genomic coverage in meta-analyses, and fine-mapping causal variants [11,36,38]. Low-coverage WGS and microarray-based genotyping have proven to be cost-effective and reliable approaches when combined with genotype imputation based on population-specific HRPs.

Recent large-scale GWASs or multi-ancestry GWAS meta-analyses via genotype imputation have demonstrated that combining new genomic resources with imputation strategies significantly expands the pool of available genetic variants, enhancing association signals, facilitating the identification of causal variants, and enabling the meta-analysis of multiple cohorts [5,11,14,39,40,97,110]. Rigorous quality control practices for SNP data in GWASs have been established to ensure data accuracy and reliability [116,117]. The UK Biobank (UKB) cohort, encompassing approximately 500,000 individuals, represents a crucial resource for well-powered GWASs on various quantitative traits [118], including body mass index [119], type 2 diabetes mellitus [120], and major depressive disorder [121]. Notably, Gaynor et al. highlighted that single-variant and gene-based association analyses using WES combined with imputed array data yield signal detection rates within 1% of those achieved with WGS data [122]. Yang et al. conducted single-variant testing and variant-set analysis on East Asian-based GWAS for hip and femoral neck bone mineral density traits, identifying the SEAD-imputed rare variant rs60103302 near SNTG1 as associated with hip bone mineral density. Similarly, the GEL-imputed UKB GWAS revealed numerous rare trait-associated variants [39]. These newly identified causal variants offer valuable insights into the genetic architecture of human diseases or traits, allowing for the stratification of individuals at elevated risk for specific diseases. Polygenic risk score reconstruction can be performed to estimate genetic predispositions, incorporating individual variability in relevant quantitative traits [123]. This knowledge has contributed to improved patient outcomes through early detection, prevention, and targeted treatment strategies.
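
As a minimal illustration of how imputed genotypes feed into a polygenic risk score, the sketch below computes the standard additive score, that is, the sum over variants of the GWAS effect size multiplied by the imputed dosage of the effect allele. The variant identifiers, effect sizes, and dosages are hypothetical, and practical pipelines additionally handle allele alignment, LD adjustment, and score standardization.

```python
from typing import Dict, Tuple


def polygenic_score(
    dosages: Dict[str, float], weights: Dict[str, float]
) -> Tuple[float, int]:
    """Return the additive score and the number of weighted variants actually used.

    dosages: imputed effect-allele dosage (0.0 to 2.0) per variant ID
    weights: per-variant effect size (e.g. log odds ratio) from a GWAS
    """
    used = [v for v in weights if v in dosages]
    # Variants with a weight but no imputed dosage are skipped, not treated as zero risk.
    score = sum(weights[v] * dosages[v] for v in used)
    return score, len(used)


# Hypothetical identifiers and values, for illustration only.
weights = {"var1": 0.20, "var2": -0.10, "var3": 0.05}
dosages = {"var1": 1.8, "var2": 1.0, "var4": 2.0}
score, n_used = polygenic_score(dosages, weights)
print(round(score, 3), n_used)
```

Reporting the number of variants used alongside the score makes it visible when poor imputation coverage, rather than genotype, drives differences between individuals.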

Population genetics

The reconstruction of genetic architecture in ethnolinguistically diverse groups in population genetics has proven crucial for understanding human origins, evolution, migration, and admixture history [124]. However, limited access to resources and financial constraints have historically restricted sampling and genotyping efforts [125]. The application of HRPs and imputation techniques has mitigated these limitations, facilitating the collection of more extensive genomic diversity data. Population-specific HRP-based genotype imputation has enabled the accurate reconstruction of missing genotype data, enhancing dataset completeness and providing higher-resolution data for downstream analyses of admixture modeling, biological adaptation, archaic introgression, and medical relevance interpretation [126]. Moreover, imputation increases marker density in integrative genomic datasets, thereby enabling more statistically robust analyses of mutation, recombination, natural selection, genetic drift, and gene flow. This process transforms individual genotype data into shared haplotypes derived from multiple reference populations, reducing or correcting false positives or negatives caused by population stratification [127]. As a result, fine-scale reconstructions of population evolutionary processes are possible, elucidating demographic histories among diverse populations and establishing a foundational framework for precision medicine systems [128]. Borda et al. evaluated haplotype discrepancies between genotyping-only and imputed datasets by calculating IBD segments. Their findings demonstrated that using exclusively imputed data for IBD analysis did not introduce bias for segments exceeding four centimorgans. Leveraging imputed data, this work provided a detailed depiction of fine-scale population structure, recent gene flow, and long-distance migration across Latin America [129]. 
Overall, the availability of high-quality genomic resources has dramatically advanced the study of human evolutionary history, facilitating the use of sophisticated algorithms and tools to analyze high-density DNA sequence variation and offering deeper insights into complex evolutionary processes [130].

Pharmacogenomics

Precision medicine aims to understand individualized disease progression and treatment responses. Genetic variations may influence susceptibility to specific diseases, either increasing or reducing risk, and they also affect responses to certain medications, a field known as pharmacogenomics [131]. Pharmacogenomics explores how genetic variation impacts individual drug responses, particularly in relation to absorption, distribution, metabolism, and excretion (ADME) processes. Given the increasing costs and slow pace of new drug discovery, there has been growing interest in drug repurposing, the practice of adapting existing drugs to treat both common and rare diseases [132]. Studies have indicated that genes identified through GWAS as being associated with disease traits are more likely to encode druggable proteins than other genomic regions [133]. Genotype imputation, utilizing population-specific HRPs, has been applied to estimate missing genomic data and predict individual drug responses, such as the HLA-B*15:02 variant related to carbamazepine responses and other genes related to clopidogrel, peginterferon, and warfarin reactions [74]. This approach facilitates the development and optimization of personalized drug therapies, identifying potential drug targets and advancing the understanding of drug mechanisms [134].

Prenatal screening

The integration of ultra-low-depth sequencing data from non-invasive prenatal testing (NIPT) with genotype imputation leveraging population-specific HRPs has increased the coverage, resolution, and overall comprehensiveness of genotype data for prenatal screening [60,135]. Improvements in genotype imputation performance have been observed with increasing NIPT sequencing depth and the expansion of reference panels [107]. Additionally, WGS on a fetus has been applied to identify potential conditions that may manifest during infancy or childhood, with the objectives of prevention, treatment, or preparation for the child’s arrival [136]. Imputation has proven valuable in genetic prediction and counseling, enriching the understanding of familial genetic histories, predicting disease risk, and facilitating personalized recommendations. Moreover, more precise association studies of complex diseases have been conducted for prenatal genetic diagnosis. Sequencing of fetal DNA allows parents to assess potential health risks, thereby supporting informed decisions regarding pregnancy continuation or termination [137]. The potential transition of genomic sequencing from a specialized test to a broadly accessible healthcare resource will necessitate the collection of high-quality outcome data from large cohorts and sustained efforts in monitoring screening program effectiveness [5]. Achieving these aims in an evidence-based, equitable, and sustainable manner remains essential for safeguarding the well-being of newborns.

Paleogenomics

The analysis of aDNA extracted from fossils and ancient hominin remains has significantly reshaped our understanding of human genetic history. However, accurate genotype determination from ancient genomes remains challenging due to limited sequencing depth caused by DNA degradation and microbial contamination [138,139]. Consequently, pseudo-diploids are often employed in population genetic and medical studies, leveraging genomic variations rather than true diploids. Despite these difficulties, evidence has shown that imputation using HRPs constructed from modern humans with similar ancestry represents a reliable method for enhancing aDNA studies within the diploid-based research paradigm, even for populations with coverage depths as low as 0.5× [8]. Martiniano et al. first used imputation and haplotype-based methods in aDNA research, imputing genome-wide diploid genotypes from 14 Middle Neolithic to Middle Bronze Age individuals from Portugal, which identified close relationships between local hunter-gatherers and later Iberian Neolithic populations [140]. Eske et al. analyzed over 5000 imputed ancient genomes from western Eurasia, uncovering genetic changes driven by admixture among ancient steppe pastoralists, agriculturalists, and hunter-gatherers [141]. Their findings highlighted a heightened genetic predisposition to multiple sclerosis, introduced by steppe pastoralists [142], possibly as a selective adaptation to livestock-borne pathogens. The height disparities observed between northern and southern Europeans were linked to varying levels of steppe ancestry. Neolithic farmer ancestry was enriched with risk alleles for emotional traits, whereas Western hunter-gatherer lineages presented a higher prevalence of alleles linked to diabetes and Alzheimer’s disease. Additionally, alleles for lactase persistence emerged in Europe approximately 6000 years ago [143]. Ringbauer et al. developed the ancIBD tool, further advancing the use of imputed aDNA in ancient human history reconstruction [144]. The complete diploid ancient genome offers critical insights into the origins and spatiotemporal evolution of human diseases and traits, elucidating the influence of migration, admixture, and natural selection on disease emergence and development. These findings provide new perspectives on disease evolution and potential therapeutic strategies. With the exponential increase in sequenced ancient genomes, a greater representation of diverse ancestries and time periods is anticipated, offering a unique opportunity to expand HRPs via ancient genomes and standardize imputation methodologies.

Forensic science and forensic investigative genetic genealogy

Degraded forensic samples often hinder the generation of high-quality, high-coverage genomes required for forensic applications [145]. Genotype imputation utilizing population-specific HRPs has been widely applied in forensic genetic genealogy. In specific criminal cases, degraded DNA biomaterials may limit data availability. The imputation of missing loci enhances the comprehensiveness of genotype profiles, improving parentage testing, individual identification, forensic phenotype prediction, biogeographic ancestry inference, and phylogenetic reconstruction. This method enhances the accuracy of DNA comparisons, aiding in the identification and familial relationship determination of criminal suspects [146]. A notable example underscoring this approach’s impact is the Golden State Killer case. Despite raising privacy concerns and sparking intense debate within the academic community [147,148], forensic genealogy has demonstrated significant potential in resolving cold cases and bringing perpetrators to justice [149]. Its utility extends to identifying missing persons, locating relatives, and identifying human remains.

Population-specific, high-quality HRPs have shown tremendous value across diverse fields, including human evolutionary studies, disease genomics, pharmacogenomics, prenatal screening, paleogenomics, and forensic science. Continual updates to reference panels and imputation algorithms remain critical to maintaining the precision of imputed genotypes and advancing both research and clinical applications.

Challenges and perspectives

Inclusion of underrepresented ethnolinguistically diverse populations in the HRPs

The application of high-quality HRPs presents significant opportunities and challenges in the evolutionary genomics and pangenomics eras (Figure 5). As precision medicine, population genetics, and evolutionary biology advance, HRPs have become critical tools for improving genotype imputation, ancestry inference, and disease association studies. However, a key challenge lies in the underrepresentation of ethnolinguistically diverse populations in existing human genomic cohorts and the summarized panels. Large-scale, medical-driven human genomics data have primarily been collected from participants in metropolitan areas, often failing to capture the diversity of anthropologically informed local populations [25,39]. Consequently, current panels exhibit biases toward data from a limited number of ancestries, reducing imputation accuracy in underrepresented groups [25]. This shortfall hinders the comprehensive characterization of global genetic diversity, exacerbating health disparities and undermining equitable and personalized healthcare initiatives.

Historically, genomic studies have focused predominantly on populations of European or North American ancestry, perpetuating the so-called European bias [150,151]. Our findings demonstrate that 60.7% of high-quality HRPs are derived from individuals of European descent and inadequately represent non-European genetic variants, resulting in imprecise imputation. To address this issue, regional genome projects have been launched to construct population-specific HRPs, increasing diversity to some extent. However, the lack of internationally standardized sequencing and data-processing guidelines has introduced inconsistencies, complicating joint analyses and masking true signals. The Global Alliance for Genomics and Health has sought to address these challenges by promoting collaboration and interoperability, aiming to develop a standardized framework for capturing human genetic diversity.

High-quality genomic infrastructure and bioinformatics facilitating data sharing

Generating high-quality, population-specific HRPs requires large-scale, high-resolution sequencing data, which remains costly and resource-intensive. The need for robust data curation, quality control, and significant computational resources further complicates the development and maintenance of these panels. Key foundational elements include biorepositories and computing infrastructure, funding strategies, capacity building, cooperation among global consortia, and commitment from researchers and funders. Data sharing and equity have been longstanding points of debate in genomic research and in the merging and optimization of reference panels, as the availability of genomic data plays a pivotal role in advancing precision medicine and personalized treatments [152]. Currently, the common modes of sharing raw genomic data fall into two types: intra-federation sharing and cloud-based sharing. For example, data from the UKB have attracted global scientific interest, and their sharing through cloud-based platforms has significantly advanced genomic research in the UK. Similarly, initiatives such as TOPMed and All of Us have expanded data sharing among their members. The International Hundred Thousand Plus Cohort Consortium has brought together over 100 cohorts from 43 countries, encompassing more than 50 million participants [153].

However, unrestricted data sharing is not universally appropriate. In 2016, representatives from academia, industry, funding agencies, and scholarly publishers established the FAIR Data Principles, emphasizing data that are findable, accessible, interoperable, and reusable [154]. In 2019, the Global Indigenous Data Alliance introduced the CARE Principles, focusing on collective benefit, authority to control, responsibility, and ethics in indigenous data governance [155]. Genomic data sharing must adhere to both FAIR and CARE principles to maximize value while addressing ethical, legal, and social implications. Advanced algorithms are essential to enable secure sharing, evaluation, and selection of genomic data. For example, the Recombine and Share Haplotypes method generates synthetic HRPs by simulating hypothetical descendants of reference panel samples after a user-defined number of meiosis events [156]. Meta-imputation offers another approach, leveraging multiple imputation servers based on different reference panels to improve the accuracy of target sample imputation [114]. These strategies partially integrate the benefits of multiple HRPs while addressing privacy concerns.
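
The idea of generating synthetic haplotypes through simulated meiosis can be illustrated with a deliberately simplified sketch. It is not the published Recombine and Share Haplotypes implementation: it assumes haplotypes coded as 0/1 lists, a single uniformly placed crossover per haplotype pair per generation, and no genetic map, and every name and parameter is hypothetical.

```python
import random
from typing import List, Sequence


def recombine_panel(
    haplotypes: Sequence[Sequence[int]],
    generations: int,
    crossover_prob: float = 0.5,
    seed: int = 0,
) -> List[List[int]]:
    """Shuffle haplotype segments between random pairs for a number of generations."""
    panel = [list(h) for h in haplotypes]  # copy so the input panel is left untouched
    if not panel:
        return panel
    n_sites = len(panel[0])
    if any(len(h) != n_sites for h in panel):
        raise ValueError("all haplotypes must cover the same sites")
    rng = random.Random(seed)
    for _ in range(generations):
        order = list(range(len(panel)))
        rng.shuffle(order)
        # Pair haplotypes at random; with an odd panel size one haplotype is left unpaired.
        for a, b in zip(order[0::2], order[1::2]):
            if n_sites > 1 and rng.random() < crossover_prob:
                cut = rng.randrange(1, n_sites)
                # Exchange everything downstream of the crossover point.
                panel[a][cut:], panel[b][cut:] = panel[b][cut:], panel[a][cut:]
    return panel


original = [[0, 1, 0, 1], [1, 1, 0, 0], [0, 0, 1, 1], [1, 0, 1, 0]]
synthetic = recombine_panel(original, generations=3, seed=1)
print(synthetic)
```

Because segments are only exchanged between haplotypes, per-site allele counts are preserved exactly while individual-level haplotypes are broken up. Short-range LD is preserved only to the extent that crossovers are sparse, which is why a realistic implementation would draw crossover positions from a genetic map.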

Although numerous HRPs have been generated, only a limited number are publicly accessible (Figure 2). Enhanced data accessibility could facilitate reciprocal imputation approaches, integrating the strengths of multiple HRPs. Achieving a balance between data utility and privacy preservation requires collaboration among the technical, regulatory, and ethical communities. Experts in computer security, genetics, computer science, ethics, and privacy law must work together to establish policies that support efficient, privacy-respecting genomic data sharing [157].

T2T-level HRPs and multi-variant integration

The advent of large-scale genomic datasets [158,159], the completion of the T2T human genome sequence [160,161], and the development of human pangenome projects [162,163] have underscored the importance of comprehensively capturing human genetic diversity and elucidating the functional roles of population-specific sequences and variations (Figure 5). The T2T reference genome and pangenome-based third-generation sequencing paradigm have provided geneticists unprecedented opportunities to develop high-quality HRPs with multiple ancestries and genetic variant types. In the T2T and pangenomic eras, integrating complex SVs and rare haplotypes into HRPs introduces additional complexity, requiring sophisticated algorithms and analytical approaches. Technological advancements, particularly in long-read sequencing and phased genome assembly, have enabled significant progress in refining HRPs. Improved methodologies for resolving complex haplotypes and rare alleles are essential for enhancing panel specificity and accuracy. Collaborative efforts across global consortia and ongoing algorithmic innovations will be key to addressing existing limitations.

Empirical studies on the impact of rare variants on complex traits and diseases remain limited, largely due to challenges in genotype imputation for rare variants, compounded by European bias, small sample sizes, and inadequate sequencing depth [38]. Previous HRPs predominantly included SNPs but rarely incorporated InDels or SVs, largely due to challenges in high-quality SV calling and phasing. This absence has severely constrained downstream analyses, as complex variations undetected by SNPs likely account for a significant portion of unexplained heritability [164]. To address these limitations, the expanded 1KGP has improved HRPs by integrating high-confidence InDels and SVs [42]. However, short-read WGS methods are limited in detecting SVs in highly repetitive genomic regions compared with long-read sequencing approaches. Recent efforts, such as SNP/short tandem repeat (STR) combined panels, have enabled genome-wide STR imputation [165,166]. However, current imputation tools remain optimized solely for SNPs, highlighting the need for further improvements to accommodate other variant types.

The future objective for HRPs is to establish a pangenome HRP based on the T2T assembly. Such advancements will enable large-scale cohort studies in precision medicine, encompassing targeted prevention, treatment, and diagnosis. High-quality HRPs hold transformative potential across diverse fields, including paleogenomics, forensic science, pharmacogenomics, and clinical diagnostics. Expanding and diversifying these panels while maintaining stringent quality standards is essential to maximize their impact. Addressing these challenges will foster more equitable research outcomes, drive biomedical innovation, and deepen our understanding of human genetic diversity. Despite these challenges, haplotype-resolved reference panels offer significant opportunities to advance precision medicine, enhance the detection of disease-associated variants, and provide deeper insights into population histories. Future efforts should prioritize emerging technologies, such as long-read sequencing and artificial intelligence-driven bioinformatics tools, to overcome existing limitations. Collaboration across scientific, medical, and indigenous communities will be crucial to ensuring the quality, inclusivity, and ethical application of HRPs in the genomic and pangenomic landscapes.

The trade-off between multi-ancestry integrative HRPs and population-specific HRPs

Multi-ancestry integrative HRPs incorporate high-quality genomic data from individuals of diverse genetic backgrounds, often spanning multiple continents or genetically distinct subgroups within a continent, as exemplified by the state-of-the-art TOPMed reference panel [1]. TOPMed-based imputed datasets identify more variant sites and high-impact consequence variants than those generated from other panel-imputed datasets, highlighting the potential of multi-ancestral reference panels to provide more comprehensive genotypic information in the context of mixed and heterogeneous target populations with different ancestral sources. In contrast, population-specific panels focus on genomic data from relatively homogeneous populations with well-characterized genetic backgrounds, such as the NyuWa, ChinaMAP, and WBBC panels, which are optimized for imputation within Han Chinese populations [4,21,63].

Several key factors, including haplotype diversity, panel size, genotype imputation accuracy, preservation of the LD structure, privacy and data sharing considerations, adaptability, and hardware requirements for big data processing, influence the choice between these panel types. Multi-ancestry panels capture a broader spectrum of genetic variation, making them crucial for studies involving globally diverse populations. By integrating data from multiple populations, these panels offer a more comprehensive view of haplotype diversity. However, population-specific panels, which focus on a single group, typically achieve higher resolution and more precise genetic insights for that population. They present greater genotype imputation accuracy, especially for rare variants, as they are optimized to reflect unique genetic variants and LD patterns. In contrast, multi-ancestry panels, while accommodating a range of heterogeneous populations, may suffer from reduced accuracy for specific groups due to the need to balance genetic variation across heterogeneous datasets. These panels also face challenges in preserving LD structure, as LD patterns differ significantly between populations. Population-specific panels, by concentrating on a single group, better maintain the characteristic LD structure, thereby increasing the resolution of population-specific analyses.

From a privacy perspective, multi-ancestry panels offer enhanced protection by blending haplotypes across diverse populations, reducing the risk of directly linking genotypes to phenotypes and facilitating broader data sharing while safeguarding individual privacy. In contrast, population-specific panels may require additional privacy measures due to their targeted design. Despite this, population-specific panels provide fine-grained insights into genetic studies within a specific population, making them ideal for tailored research. Multi-ancestry panels, on the other hand, are better suited for cross-population analyses and global studies, offering broader applicability. Both multi-ancestry and population-specific HRPs therefore present distinct advantages and challenges. The selection of an appropriate panel should align with the imputation objectives, the genetic characteristics of the target population, and privacy requirements. A flexible approach, leveraging one or both panel types based on research needs, may maximize the utility of HRPs across diverse genetic studies.

CRediT author statement

Qingxin Yang: Formal analysis, Methodology, Writing – original draft, Visualization, Writing – review & editing. Yuntao Sun: Visualization. Shuhan Duan: Investigation. Shengjie Nie: Project administration. Chao Liu: Project administration, Supervision. Hong Deng: Funding acquisition, Writing – review & editing. Mengge Wang: Funding acquisition, Supervision, Writing – review & editing. Guanglin He: Conceptualization, Writing – review & editing, Resources. All authors have read and approved the final manuscript.

Competing interests

The authors have declared no competing interests.

Acknowledgments

This study was supported by the National Natural Science Foundation of China (Grant Nos. 82402203 and 82202078), the Major Project of the National Social Science Foundation of China (Grant No. 23&ZD203), the Open Project of the Key Laboratory of Forensic Genetics of the Ministry of Public Security (Grant Nos. 2022FGKFKT05 and 2024FGKFKT02), the Center for Archaeological Science of Sichuan University (Grant Nos. 23SASA01 and 24SASB03), the 1·3·5 Project for Disciplines of Excellence, West China Hospital, Sichuan University (Grant No. ZYJC20002), and the Sichuan Science and Technology Program (Grant No. 2024NSFSC1518), China. We acknowledge Grammarly (https://app.grammarly.com/) for its invaluable contribution to refining the language and enhancing the readability of this manuscript and BioRender (https://app.biorender.com/) for its assistance in creating the figures. Thanks to the China National Supercomputing Center in Chengdu for providing storage for sequencing data and computational resources.

Contributor Information

Qingxin Yang, Department of Oto-Rhino-Laryngology & Institute of Rare Diseases, West China Hospital of Sichuan University, Sichuan University, Chengdu 610000, China; Center for Archaeological Science, Sichuan University, Chengdu 610000, China; School of Forensic Medicine, Kunming Medical University, Kunming 650500, China.

Yuntao Sun, Department of Oto-Rhino-Laryngology & Institute of Rare Diseases, West China Hospital of Sichuan University, Sichuan University, Chengdu 610000, China; Center for Archaeological Science, Sichuan University, Chengdu 610000, China; West China School of Basic Science & Forensic Medicine, Sichuan University, Chengdu 610000, China.

Shuhan Duan, Department of Oto-Rhino-Laryngology & Institute of Rare Diseases, West China Hospital of Sichuan University, Sichuan University, Chengdu 610000, China; Center for Archaeological Science, Sichuan University, Chengdu 610000, China; School of Basic Medical Sciences, North Sichuan Medical College, Nanchong 637100, China.

Shengjie Nie, School of Forensic Medicine, Kunming Medical University, Kunming 650500, China.

Chao Liu, Anti-Drug Technology Center of Guangdong Province, Guangzhou 510230, China.

Hong Deng, School of Forensic Medicine, Kunming Medical University, Kunming 650500, China.

Mengge Wang, Department of Oto-Rhino-Laryngology & Institute of Rare Diseases, West China Hospital of Sichuan University, Sichuan University, Chengdu 610000, China; Faculty of Forensic Medicine, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou 510275, China; Department of Forensic Medicine, College of Basic Medicine, Chongqing Medical University, Chongqing 400331, China; Human Genetics and Forensic Genomics Research Institute, College of Basic Medicine, Chongqing Medical University, Chongqing 400331, China.

Guanglin He, Department of Oto-Rhino-Laryngology & Institute of Rare Diseases, West China Hospital of Sichuan University, Sichuan University, Chengdu 610000, China; Center for Archaeological Science, Sichuan University, Chengdu 610000, China.

Supplementary material

Supplementary material is available at Genomics, Proteomics & Bioinformatics online (https://doi.org/10.1093/gpbjnl/qzaf022).

ORCID

0009-0007-5099-6925 (Qingxin Yang)

0009-0007-8419-1420 (Yuntao Sun)

0009-0005-7214-581X (Shuhan Duan)

0000-0001-6414-6621 (Shengjie Nie)

0000-0001-5633-3929 (Chao Liu)

0009-0003-9889-6465 (Hong Deng)

0000-0002-3673-1855 (Mengge Wang)

0000-0002-6614-5267 (Guanglin He)

References

[1]. Taliun D, Harris DN, Kessler MD, Carlson J, Szpiech ZA, Torres R, et al. Sequencing of 53,831 diverse genomes from the NHLBI TOPMed Program. Nature 2021;590:290–9.
[2]. McCarthy S, Das S, Kretzschmar W, Delaneau O, Wood AR, Teumer A, et al. A reference panel of 64,976 haplotypes for genotype imputation. Nat Genet 2016;48:1279–83.
[3]. The 1000 Genomes Project Consortium, Auton A, Brooks LD, Durbin RM, Garrison EP, Kang HM, et al. A global reference for human genetic variation. Nature 2015;526:68–74.
[4]. Zhang P, Luo H, Li Y, Wang Y, Wang J, Zheng Y, et al. NyuWa Genome resource: a deep whole-genome sequencing-based variation profile and reference panel for the Chinese population. Cell Rep 2021;37:110017.
[5]. Stark Z, Scott RH. Genomic newborn screening for rare diseases. Nat Rev Genet 2023;24:755–66.
[6]. He G, Yao H, Duan S, Luo L, Sun Q, Tang R, et al. Pilot work of the 10K Chinese People Genomic Diversity Project along the Silk Road suggests a complex east-west admixture landscape and biological adaptations. Sci China Life Sci 2025;68:914–33.
[7]. Li X, Wang M, Su H, Duan S, Sun Y, Chen H, et al. Evolutionary history and biological adaptation of Han Chinese people on the Mongolian Plateau. hLife 2024;2:296–313.
[8]. Sousa da Mota B, Rubinacci S, Cruz Dávalos DI, Amorim CEG, Sikora M, Johannsen NN, et al. Imputation of ancient human genomes. Nat Commun 2023;14:3660.
[9]. Cook S, Choi W, Lim H, Luo Y, Kim K, Jia X, et al. Accurate imputation of human leukocyte antigens with CookHLA. Nat Commun 2021;12:1264.
[10]. Sakaue S, Gurajala S, Curtis M, Luo Y, Choi W, Ishigaki K, et al. Tutorial: a statistical genetics guide to identifying HLA alleles driving complex disease. Nat Protoc 2023;18:2625–41.
[11]. Marchini J, Howie B. Genotype imputation for genome-wide association studies. Nat Rev Genet 2010;11:499–511.
[12]. Choi J, Kim S, Kim J, Son HY, Yoo SK, Kim CU, et al. A whole-genome reference panel of 14,393 individuals for East Asian populations accelerates discovery of rare functional variants. Sci Adv 2023;9:eadg6319.
[13]. Cao Y, Li L, Xu M, Feng Z, Sun X, Lu J, et al. The ChinaMAP analytics of deep whole genome sequences in 10,588 individuals. Cell Res 2020;30:717–31.
[14]. Yang MY, Zhong JD, Li X, Tian G, Bai WY, Fang YH, et al. SEAD reference panel with 22,134 haplotypes boosts rare variant imputation and genome-wide association analysis in Asian populations. Nat Commun 2024;15:10839.
[15]. Loh PR, Palamara PF, Price AL. Fast and accurate long-range phasing in a UK Biobank cohort. Nat Genet 2016;48:811–6.
[16]. O’Connell J, Sharp K, Shrine N, Wain L, Hall I, Tobin M, et al. Haplotype estimation for biobank-scale data sets. Nat Genet 2016;48:817–20.
[17]. Loh PR, Danecek P, Palamara PF, Fuchsberger C, Reshef YA, Finucane HK, et al. Reference-based phasing using the Haplotype Reference Consortium panel. Nat Genet 2016;48:1443–8.
[18]. Das S, Forer L, Schönherr S, Sidore C, Locke AE, Kwong A, et al. Next-generation genotype imputation service and methods. Nat Genet 2016;48:1284–7.
[19]. Li Y, Willer CJ, Ding J, Scheet P, Abecasis GR. MaCH: using sequence and genotype data to estimate haplotypes and unobserved genotypes. Genet Epidemiol 2010;34:816–34.
[20]. Huang L, Li Y, Singleton AB, Hardy JA, Abecasis G, Rosenberg NA, et al. Genotype-imputation accuracy across worldwide human populations. Am J Hum Genet 2009;84:235–50.
[21]. Li L, Huang P, Sun X, Wang S, Xu M, Liu S, et al. The ChinaMAP reference panel for the accurate genotype imputation in Chinese populations. Cell Res 2021;31:1308–10.
[22]. Yoo SK, Kim CU, Kim HL, Kim S, Shin JY, Kim N, et al. NARD: whole-genome reference panel of 1779 Northeast Asians improves imputation accuracy of rare and low-frequency variants. Genome Med 2019;11:64.
[23]. Du Z, Ma L, Qu H, Chen W, Zhang B, Lu X, et al. Whole genome analyses of Chinese population and de novo assembly of a northern Han genome. Genomics Proteomics Bioinformatics 2019;17:229–47.
[24]. Pasaniuc B, Rohland N, McLaren PJ, Garimella K, Zaitlen N, Li H, et al. Extremely low-coverage sequencing and imputation increases power for genome-wide association studies. Nat Genet 2012;44:631–5.
[25]. Cahoon JL, Rui X, Tang E, Simons C, Langie J, Chen M, et al. Imputation accuracy across global human populations. Am J Hum Genet 2024;111:979–89.
[26]. Si Y, Vanderwerff B, Zöllner S. Why are rare variants hard to impute? Coalescent models reveal theoretical limits in existing algorithms. Genetics 2021;217:iyab011.
[27]. The International HapMap Consortium, Frazer KA, Ballinger DG, Cox DR, Hinds DA, Stuve LL, et al. A second generation human haplotype map of over 3.1 million SNPs. Nature 2007;449:851–61.
[28]. He G, Wang M, Luo L, Sun Q, Yuan H, Lv H, et al. Population genomics of Central Asian peoples unveil ancient Trans-Eurasian genetic admixture and cultural exchanges. hLife 2024;2:554–62.
[29]. Shi M, Tanikawa C, Munter HM, Akiyama M, Koyama S, Tomizuka K, et al. Genotype imputation accuracy and the quality metrics of the minor ancestry in multi-ancestry reference panels. Brief Bioinform 2023;25:bbad509.
[30]. Lin Y, Liu L, Yang S, Li Y, Lin D, Zhang X, et al. Genotype imputation for Han Chinese population using Haplotype Reference Consortium as reference. Hum Genet 2018;137:431–6.
[31]. Gurdasani D, Carstensen T, Tekola-Ayele F, Pagani L, Tachmazidou I, Hatzikotoulas K, et al. The African Genome Variation Project shapes medical genetics in Africa. Nature 2015;517:327–32.
[32]. Zheng W, He Y, Guo Y, Yue T, Zhang H, Li J, et al. Large-scale genome sequencing redefines the genetic footprints of high-altitude adaptation in Tibetans. Genome Biol 2023;24:73.
[33]. Popejoy AB, Fullerton SM. Genomics is failing on diversity. Nature 2016;538:161–4.
[34]. Jostins L, Morley KI, Barrett JC. Imputation of low-frequency variants using the HapMap3 benefits from large, diverse reference sets. Eur J Hum Genet 2011;19:662–6.
[35]. Bomba L, Walter K, Soranzo N. The impact of rare and low-frequency genetic variants in common disease. Genome Biol 2017;18:77.
[36]. Das S, Abecasis GR, Browning BL. Genotype imputation from large reference panels. Annu Rev Genomics Hum Genet 2018;19:73–96.
[37]. Bai WY, Zhu XW, Cong PK, Zhang XJ, Richards JB, Zheng HF. Genotype imputation and reference panel: a systematic evaluation on haplotype size and diversity. Brief Bioinform 2019;21:bbz108.
[38]. Hoffmann TJ, Witte JS. Strategies for imputing and analyzing rare variants in association studies. Trends Genet 2015;31:556–63.
[39]. Shi S, Rubinacci S, Hu S, Moutsianas L, Stuckey A, Need AC, et al. A Genomics England haplotype reference panel and imputation of UK Biobank. Nat Genet 2024;56:1800–3.
[40]. All of Us Research Program Genomics Investigators. Genomic data in the All of Us Research Program. Nature 2024;627:340–6.
[41]. The International HapMap Consortium. A haplotype map of the human genome. Nature 2005;437:1299–320.
[42]. Byrska-Bishop M, Evani US, Zhao X, Basile AO, Abel HJ, Regier AA, et al. High-coverage whole-genome sequencing of the expanded 1000 Genomes Project cohort including 602 trios. Cell 2022;185:3426–40.e19.
[43]. The 1000 Genomes Project Consortium, Abecasis GR, Altshuler D, Auton A, Brooks LD, Durbin RM, et al. A map of human genome variation from population-scale sequencing. Nature 2010;467:1061–73.
[44]. Delaneau O, Marchini J, The 1000 Genomes Project Consortium. Integrating sequence and array data to create an improved 1000 Genomes Project haplotype reference panel. Nat Commun 2014;5:3934.
[45]. Xue Y, Mezzavilla M, Haber M, McCarthy S, Chen Y, Narasimhan V, et al. Enrichment of low-frequency functional variants revealed by whole-genome sequencing of multiple isolated European populations. Nat Commun 2017;8:15927.
[46]. Genome of the Netherlands Consortium. Whole-genome sequence variation, population structure and demographic history of the Dutch population. Nat Genet 2014;46:818–25.
[47]. UK10K Consortium, Walter K, Min JL, Huang J, Crooks L, Memari Y, et al. The UK10K project identifies rare variants in health and disease. Nature 2015;526:82–90.
[48]. Sidore C, Busonero F, Maschio A, Porcu E, Naitza S, Zoledziewska M, et al. Genome sequencing elucidates Sardinian genetic architecture and augments association analyses for lipid and blood inflammatory markers. Nat Genet 2015;47:1272–81.
[49]. Gudbjartsson DF, Helgason H, Gudjonsson SA, Zink F, Oddson A, Gylfason A, et al. Large-scale whole-genome sequencing of the Icelandic population. Nat Genet 2015;47:435–44.
[50]. Pagani L, Lawson DJ, Jagoda E, Mörseburg A, Eriksson A, Mitt M, et al. Genomic analyses inform on migration events during the peopling of Eurasia. Nature 2016;538:238–42.
[51]. Southam L, Gilly A, Süveges D, Farmaki AE, Schwartzentruber J, Tachmazidou I, et al. Whole genome sequencing and imputation in isolated populations identify genetic associations with medically-relevant complex traits. Nat Commun 2017;8:15606.
[52]. Mitt M, Kals M, Pärn K, Gabriel SB, Lander ES, Palotie A, et al. Improved imputation accuracy of rare and low-frequency variants using population-specific high-coverage WGS-based imputation reference panel. Eur J Hum Genet 2017;25:869–76.
[53]. Lencz T, Yu J, Palmer C, Carmi S, Ben-Avraham D, Barzilai N, et al. High-depth whole genome sequencing of an Ashkenazi Jewish reference panel: enhancing sensitivity, accuracy, and imputation. Hum Genet 2018;137:343–55.
[54]. Valls-Margarit J, Galván-Femenía I, Matías-Sánchez D, Blay N, Puiggròs M, Carreras A, et al. GCAT|Panel, a comprehensive structural variant haplotype map of the Iberian population from high-coverage whole-genome sequencing. Nucleic Acids Res 2022;50:2464–79.
[55]. Halldorsson BV, Eggertsson HP, Moore KHS, Hauswedell H, Eiriksson O, Ulfarsson MO, et al. The sequences of 150,119 genomes in the UK Biobank. Nature 2022;607:732–40.
[56]. Banasik K, Møller PL, Techlo TR, Holm PC, Walters GB, Ingason A, et al. DanMAC5: a browser of aggregated sequence variants from 8,671 whole genome sequenced Danish individuals. BMC Genom Data 2023;24:30.
[57]. Kurki MI, Karjalainen J, Palta P, Sipilä TP, Kristiansson K, Donner KM, et al. FinnGen provides genetic insights from a well-phenotyped isolated population. Nature 2023;613:508–18.
[58]. Reščenko R, Brīvība M, Atava I, Rovīte V, Pečulis R, Silamiķelis I, et al. Whole-genome sequencing of 502 individuals from Latvia: the first step towards a population-specific reference of genetic variation. Int J Mol Sci 2023;24:15345.
[59]. Sirugo G, Williams SM, Tishkoff SA. The missing diversity in human genetic studies. Cell 2019;177:26–31.
[60]. Liu S, Huang S, Chen F, Zhao L, Yuan Y, Francis SS, et al. Genomic analyses from non-invasive prenatal testing reveal genetic associations, patterns of viral infections, and Chinese population history. Cell 2018;175:347–59.e14.
[61]. Iglesias AI, Mishra A, Vitart V, Bykhovskaya Y, Höhn R, Springelkamp H, et al. Cross-ancestry genome-wide association analysis of corneal thickness strengthens link between complex and Mendelian eye diseases. Nat Commun 2018;9:1864.
[62]. Wei CY, Yang JH, Yeh EC, Tsai MF, Kao HJ, Lo CZ, et al. Genetic profiles of 103,106 individuals in the Taiwan Biobank provide insights into the health and history of Han Chinese. NPJ Genom Med 2021;6:10.
[63]. Cong PK, Bai WY, Li JC, Yang MY, Khederzadeh S, Gai SR, et al. Genomic analyses of 10,376 individuals in the Westlake BioBank for Chinese (WBBC) pilot project. Nat Commun 2022;13:2939.
[64]. Wang C, Dai J, Qin N, Fan J, Ma H, Chen C, et al. Analyses of rare predisposing variants of lung cancer in 6,004 whole genomes in Chinese. Cancer Cell 2022;40:1223–39.e6.
[65]. Gao Y, Zhang C, Yuan L, Ling Y, Wang X, Liu C, et al. PGG.Han: the Han Chinese genome database and analysis platform. Nucleic Acids Res 2020;48:D971–6.
[66]. Jeon S, Bhak Y, Choi Y, Jeon Y, Kim S, Jang J, et al. Korean Genome Project: 1094 Korean personal genomes with clinical information. Sci Adv 2020;6:eaaz7835.
[67]. Kars ME, Başak AN, Onat OE, Bilguvar K, Choi J, Itan Y, et al. The genetic structure of the Turkish population reveals high levels of variation and admixture. Proc Natl Acad Sci U S A 2021;118:e2026076118.
[68]. Razali RM, Rodriguez-Flores J, Ghorbani M, Naeem H, Aamer W, Aliyev E, et al. Thousands of Qatari genomes inform human migration history and improve imputation of Arab haplotypes. Nat Commun 2021;12:5929.
[69]. Jain A, Bhoyar RC, Pandhare K, Mishra A, Sharma D, Imran M, et al. IndiGenomes: a comprehensive resource of genetic variants from over 1000 Indian genomes. Nucleic Acids Res 2021;49:D1225–32.
[70]. Nagasaki M, Yasuda J, Katsuoka F, Nariai N, Kojima K, Kawai Y, et al. Rare variant discovery by deep whole-genome sequencing of 1,070 Japanese individuals. Nat Commun 2015;6:8018.
[71]. Okada Y, Momozawa Y, Sakaue S, Kanai M, Ishigaki K, Akiyama M, et al. Deep whole-genome sequencing reveals recent selection signatures linked to evolution and disease risk of Japanese. Nat Commun 2018;9:1631.
[72]. Tadaka S, Katsuoka F, Ueki M, Kojima K, Makino S, Saito S, et al. 3.5KJPNv2: an allele frequency panel of 3552 Japanese individuals including the X chromosome. Hum Genome Var 2019;6:28.
[73]. Wu D, Dou J, Chai X, Bellis C, Wilm A, Shih CC, et al. Large-scale whole-genome sequencing of three diverse Asian populations in Singapore. Cell 2019;179:736–49.e15.
[74]. GenomeAsia 100K Consortium. The GenomeAsia 100K Project enables genetic discoveries across Asia. Nature 2019;576:106–11.
[75]. Cengnata A, Deng L, Yap WS, Lim LR, Leong CO, Xu S, et al. A genotype imputation reference panel specific for native Southeast Asian populations. NPJ Genom Med 2024;9:47.
[76]. Hwang MY, Choi NH, Won HH, Kim BJ, Kim YJ. Analyzing the Korean reference genome with meta-imputation increased the imputation accuracy and spectrum of rare variants in the Korean population. Front Genet 2022;13:1008646.
[77]. Jeon S, Choi H, Jeon Y, Choi WH, Choi H, An K, et al. Korea4K: whole genome sequences of 4,157 Koreans with 107 phenotypes derived from extensive health check-ups. Gigascience 2024;13:giae014.
[78]. Tadaka S, Kawashima J, Hishinuma E, Saito S, Okamura Y, Otsuki A, et al. jMorp: Japanese Multi-Omics Reference Panel update report 2023. Nucleic Acids Res 2024;52:D622–32.
[79]. Ardiansyah E, Riza AL, Dian S, Ganiem AR, Alisjahbana B, Setiabudiawan TP, et al. Sequencing whole genomes of the West Javanese population in Indonesia reveals novel variants and improves imputation accuracy. Front Genet 2025;15:1492602.
[80]. He Y, Lei C, Wan C, Zeng S, Zhang T, Luo F, et al. A comprehensive whole genome database of ethnic minority populations. Sci Rep 2024;14:13954.
[81]. Yu C, Lan X, Tao Y, Guo Y, Sun D, Qian P, et al. A high-resolution haplotype-resolved reference panel constructed from the China Kadoorie Biobank study. Nucleic Acids Res 2023;51:11770–82.
[82]. Huang S, Liu S, Huang M, He JR, Wang C, Wang T, et al. The Born in Guangzhou Cohort Study enables generational genetic discoveries. Nature 2024;626:565–73.
[83]. Skoglund P, Mallick S, Bortolini MC, Chennagiri N, Hünemeier T, Petzl-Erler ML, et al. Genetic evidence for two founding populations of the Americas. Nature 2015;525:104–8.
[84]. Raghavan M, Skoglund P, Graf KE, Metspalu M, Albrechtsen A, Moltke I, et al. Upper Palaeolithic Siberian genome reveals dual ancestry of Native Americans. Nature 2014;505:87–91.
[85]. Mathias RA, Taub MA, Gignoux CR, Fu W, Musharoff S, O’Connor TD, et al. A continuum of admixture in the Western Hemisphere revealed by the African Diaspora genome. Nat Commun 2016;7:12522.
[86]. O’Connell J, Yun T, Moreno M, Li H, Litterman N, Kolesnikov A, et al. A population-specific reference panel for improved genotype imputation in African Americans. Commun Biol 2021;4:1269.
[87]. Hou L, Kember RL, Roach JC, O’Connell JR, Craig DW, Bucan M, et al. A population-specific reference panel empowers genetic studies of Anabaptist populations. Sci Rep 2017;7:6079.
[88]. Cheng PL, Wang H, Dombroski BA, Farrell JJ, Horng I, Chung T, et al. A specialized reference panel with structural variants integration for improving genotype imputation in Alzheimer disease and related dementias. HGG Adv 2025;6:100487.
[89]. Naslavsky MS, Scliar MO, Yamamoto GL, Wang JYT, Zverinova S, Karp T, et al. Whole-genome sequencing of 1,171 elderly admixed individuals from Brazil. Nat Commun 2022;13:1004.
[90]. Ziyatdinov A, Torres J, Alegre-Díaz J, Backman J, Mbatchou J, Turner M, et al. Genotyping, sequencing and analysis of 140,000 adults from Mexico City. Nature 2023;622:784–93.
[91]. Gurdasani D, Carstensen T, Fatumo S, Chen G, Franklin CS, Prado-Martinez J, et al. Uganda genome resource enables insights into population history and genomic discovery in Africa. Cell 2019;179:984–1002.e36.
[92]. Carlson JC, Krishnan M, Liu S, Anderson KJ, Zhang JZ, Yapp TJ, et al. Improving imputation quality in Samoans through the integration of population-specific sequences into existing reference panels. medRxiv 2023;23297835.
[93]. Herzig AF, Velo-Suarez L, FrEx Consortium, FranceGenRef Consortium, Dina C, Redon R, et al. How local reference panels improve imputation in French populations. Sci Rep 2024;14:370.
[94]. Deng T, Zhang P, Garrick D, Gao H, Wang L, Zhao F. Comparison of genotype imputation for SNP array and low-coverage whole-genome sequencing data. Front Genet 2022;12:704118.
[95]. Sengupta D, Botha G, Meintjes A, Mbiyavanga M, AWI-Gen Study, H3Africa Consortium, et al. Performance and accuracy evaluation of reference panels for genotype imputation in sub-Saharan African populations. Cell Genom 2023;3:100332.
[96]. Xu J, Liu D, Hassan A, Genovese G, Cote AC, Fennessy B, et al. Evaluation of imputation performance of multiple reference panels in a Pakistani population. HGG Adv 2025;6:100395.
[97]. Hofmeister RJ, Ribeiro DM, Rubinacci S, Delaneau O. Accurate rare variant phasing of whole-genome and whole-exome sequencing data in the UK Biobank. Nat Genet 2023;55:1243–9.
[98]. Browning BL, Tian X, Zhou Y, Browning SR. Fast two-stage phasing of large-scale sequence data. Am J Hum Genet 2021;108:1880–90.
[99]. Browning SR, Browning BL. Haplotype phasing: existing methods and new developments. Nat Rev Genet 2011;12:703–14.
[100]. Delaneau O, Zagury JF, Robinson MR, Marchini JL, Dermitzakis ET. Accurate, scalable and integrative haplotype estimation. Nat Commun 2019;10:5436.
[101]. Williams CM, O’Connell J, Jewett E, Freyman WA, 23andMe Research Team, Gignoux CR, et al. Phasing millions of samples achieves near perfect accuracy, enabling parent-of-origin analyses. HGG Adv 2026;7:100526.
[102]. De Marino A, Mahmoud AA, Bose M, Bircan KO, Terpolovsky A, Bamunusinghe V, et al. A comparative analysis of current phasing and imputation software. PLoS One 2022;17:e0260177.
[103]. Huang J, Howie B, McCarthy S, Memari Y, Walter K, Min JL, et al. Improved imputation of low-frequency and rare variants using the UK10K haplotype reference panel. Nat Commun 2015;6:8111.
[104]. Rubinacci S, Delaneau O, Marchini J. Genotype imputation using the Positional Burrows Wheeler Transform. PLoS Genet 2020;16:e1009049.
[105]. Davies RW, Kucka M, Su D, Shi S, Flanagan M, Cunniff CM, et al. Rapid genotype imputation from sequence with reference panels. Nat Genet 2021;53:1104–11.
[106]. Rubinacci S, Ribeiro DM, Hofmeister RJ, Delaneau O. Efficient phasing and imputation of low-coverage sequencing data using large reference panels. Nat Genet 2021;53:120–6.
[107]. Liu S, Liu Y, Gu Y, Lin X, Zhu H, Liu H, et al. Utilizing non-invasive prenatal test sequencing data for human genetic investigation. Cell Genom 2024;4:100669.
[108]. Cox SL, Moots HM, Stock JT, Shbat A, Bitarello BD, Nicklisch N, et al. Predicting skeletal stature using ancient DNA. Am J Biol Anthropol 2022;177:162–74.
[109]. Çubukcu H, Kilinç GM. Evaluation of genotype imputation using Glimpse tools on low coverage ancient DNA. Mamm Genome 2024;35:461–73.
[110]. Rubinacci S, Hofmeister RJ, Sousa da Mota B, Delaneau O. Imputation of low-coverage sequencing data from 150,119 UK Biobank genomes. Nat Genet 2023;55:1088–90.
[111]. Yu K, Das S, LeFaive J, Kwong A, Pleiness J, Forer L, et al. Meta-imputation: an efficient method to combine genotype data after imputation with multiple reference panels. Am J Hum Genet 2022;109:1007–15.
[112]. Lazaridis I, Patterson N, Mittnik A, Renaud G, Mallick S, Kirsanow K, et al. Ancient human genomes suggest three ancestral populations for present-day Europeans. Nature 2014;513:409–13.
[113]. Sheng X, Xia L, Cahoon JL, Conti DV, Haiman CA, Kachuri L, et al. Inverted genomic regions between reference genome builds in humans impact imputation accuracy and decrease the power of association testing. HGG Adv 2023;4:100159.
[114]. Pan Y, Zhang C, Lu Y, Ning Z, Lu D, Gao Y, et al. Genomic diversity and post-admixture adaptation in the Uyghurs. Natl Sci Rev 2022;9:nwab124.
[115]. Tam V, Patel N, Turcotte M, Bossé Y, Paré G, Meyre D. Benefits and limitations of genome-wide association studies. Nat Rev Genet 2019;20:467–84.
[116]. Anderson CA, Pettersson FH, Clarke GM, Cardon LR, Morris AP, Zondervan KT. Data quality control in genetic case-control association studies. Nat Protoc 2010;5:1564–73.
[117]. Winkler TW, Day FR, Croteau-Chonka DC, Wood AR, Locke AE, Mägi R, et al. Quality control and conduct of genome-wide association meta-analyses. Nat Protoc 2014;9:1192–212.
[118]. Bycroft C, Freeman C, Petkova D, Band G, Elliott LT, Sharp K, et al. The UK Biobank resource with deep phenotyping and genomic data. Nature 2018;562:203–9.
[119]. Yengo L, Sidorenko J, Kemper KE, Zheng Z, Wood AR, Weedon MN, et al. Meta-analysis of genome-wide association studies for height and body mass index in ∼700,000 individuals of European ancestry. Hum Mol Genet 2018;27:3641–9.
[120]. Zhao W, Rasheed A, Tikkanen E, Lee JJ, Butterworth AS, Howson JMM, et al. Identification of new susceptibility loci for type 2 diabetes and shared etiological pathways with coronary heart disease. Nat Genet 2017;49:1450–7.
[121]. Hyde CL, Nagle MW, Tian C, Chen X, Paciga SA, Wendland JR, et al. Identification of 15 genetic loci associated with risk of major depression in individuals of European descent. Nat Genet 2016;48:1031–6.
[122]. Gaynor SM, Joseph T, Bai X, Zou Y, Boutkov B, Maxwell EK, et al. Yield of genetic association signals from genomes, exomes and imputation in the UK Biobank. Nat Genet 2024;56:2345–51.
[123]. Kachuri L, Chatterjee N, Hirbo J, Schaid DJ, Martin I, Kullo IJ, et al. Principles and methods for transferring polygenic risk scores across global populations. Nat Rev Genet 2024;25:8–25.
[124]. Lewontin RC. Population genetics. Annu Rev Genet 1973;7:1–17.
[125]. Bradburd GS, Ralph PL. Spatial population genetics: it’s about time. Annu Rev Ecol Evol Syst 2019;50:427–49.
[126]. Nagar SD, Conley AB, Jordan IK. Population structure and pharmacogenomic risk stratification in the United States. BMC Biol 2020;18:140.
[127]. Almarri MA, Bergström A, Prado-Martinez J, Yang F, Fu B, Dunham AS, et al. Population structure, stratification, and introgression of human structural variation. Cell 2020;182:189–99.e15. [DOI] [PMC free article] [PubMed] [Google Scholar]
[128]. Aneli S, Birolo G, Matullo G. Twenty years of the Human Genome Diversity Project. Hum Popul Genet Genom 2022;2:0005.
[129]. Borda V, Loesch DP, Guo B, Laboulaye R, Veliz-Otani D, French JN, et al. Genetics of Latin American Diversity Project: insights into population genetics and association studies in admixed groups in the Americas. Cell Genom 2024;4:100692.
[130]. Schraiber JG, Akey JM. Methods and models for unravelling human evolutionary history. Nat Rev Genet 2015;16:727–40.
[131]. Barbarino JM, Whirl-Carrillo M, Altman RB, Klein TE. PharmGKB: a worldwide resource for pharmacogenomic information. Wiley Interdiscip Rev Syst Biol Med 2018;10:e1417.
[132]. Reay WR, Cairns MJ. Advancing the use of genome-wide association studies for drug repurposing. Nat Rev Genet 2021;22:658–71.
[133]. Sanseau P, Agarwal P, Barnes MR, Pastinen T, Richards JB, Cardon LR, et al. Use of genome-wide association studies for drug repositioning. Nat Biotechnol 2012;30:317–20.
[134]. Zhou K, Pedersen HK, Dawed AY, Pearson ER. Pharmacogenomics in diabetes mellitus: insights into drug action and drug discovery. Nat Rev Endocrinol 2016;12:337–46.
[135]. Norton ME. Noninvasive prenatal testing to analyze the fetal genome. Proc Natl Acad Sci U S A 2016;113:14173–5.
[136]. Haidar H, Le Clerc-Blain J, Vanstone M, Laberge AM, Bibeau G, Ghulmiyyah L, et al. A qualitative study of women and partners from Lebanon and Quebec regarding an expanded scope of noninvasive prenatal testing. BMC Pregnancy Childbirth 2021;21:54.
[137]. Chan KCA, Jiang P, Sun K, Cheng YKY, Tong YK, Cheng SH, et al. Second generation noninvasive fetal genome analysis reveals de novo mutations, single-base parental inheritance, and preferred DNA ends. Proc Natl Acad Sci U S A 2016;113:E8159–68.
[138]. Hui R, D’Atanasio E, Cassidy LM, Scheib CL, Kivisild T. Evaluating genotype imputation pipeline for ultra-low coverage ancient genomes. Sci Rep 2020;10:18542.
[139]. Ausmees K, Sanchez-Quinto F, Jakobsson M, Nettelblad C. An empirical evaluation of genotype imputation of ancient DNA. G3 (Bethesda) 2022;12:jkac089.
[140]. Martiniano R, Cassidy LM, Ó’Maoldúin R, McLaughlin R, Silva NM, Manco L, et al. The population genomics of archaeological transition in west Iberia: investigation of ancient substructure using imputation and haplotype-based methods. PLoS Genet 2017;13:e1006852.
[141]. Allentoft ME, Sikora M, Refoyo-Martínez A, Irving-Pease EK, Fischer A, Barrie W, et al. Population genomics of post-glacial western Eurasia. Nature 2024;625:301–11.
[142]. Barrie W, Yang Y, Irving-Pease EK, Attfield KE, Scorrano G, Jensen LT, et al. Elevated genetic risk for multiple sclerosis emerged in steppe pastoralist populations. Nature 2024;625:321–8.
[143]. Irving-Pease EK, Refoyo-Martínez A, Barrie W, Ingason A, Pearson A, Fischer A, et al. The selection landscape and genetic legacy of ancient Eurasians. Nature 2024;625:312–20.
[144]. Ringbauer H, Huang Y, Akbari A, Mallick S, Olalde I, Patterson N, et al. Accurate detection of identity-by-descent segments in human ancient DNA. Nat Genet 2024;56:143–51.
[145]. Wang M, Chen H, Luo L, Huang Y, Duan S, Yuan H, et al. Forensic investigative genetic genealogy: expanding pedigree tracing and genetic inquiry in the genomic era. J Genet Genomics 2025;52:460–72.
[146]. Erlich Y, Shor T, Pe’er I, Carmi S. Identity inference of genomic data using long-range familial searches. Science 2018;362:690–4.
[147]. Ram N, Murphy EE, Suter SM. Regulating forensic genetic genealogy. Science 2021;373:1444–6.
[148]. May T. Sociogenetic risks — ancestry DNA testing, third-party identity, and protection of privacy. N Engl J Med 2018;379:410–2.
[149]. Dowdeswell TL. Forensic genetic genealogy: a profile of cases solved. Forensic Sci Int Genet 2022;58:102679.
[150]. Bentley AR, Callier S, Rotimi CN. Diversity and inclusion in genomic research: why the uneven progress? J Community Genet 2017;8:255–66.
[151]. Need AC, Goldstein DB. Next generation disparities in human genomics: concerns and remedies. Trends Genet 2009;25:489–94.
[152]. Jones KM, Cook-Deegan R, Rotimi CN, Callier SL, Bentley AR, Stevens H, et al. Complicated legacies: the human genome at 20. Science 2021;371:564–9.
[153]. Manolio TA, Goodhand P, Ginsburg G. The International Hundred Thousand Plus Cohort Consortium: integrating large-scale cohorts to address global scientific challenges. Lancet Digit Health 2020;2:e567–8.
[154]. Wilkinson MD, Dumontier M, Aalbersberg IJ, Appleton G, Axton M, Baak A, et al. The FAIR Guiding Principles for scientific data management and stewardship. Sci Data 2016;3:160018.
[155]. Carroll SR, Herczog E, Hudson M, Russell K, Stall S. Operationalizing the CARE and FAIR principles for indigenous data futures. Sci Data 2021;8:108.
[156]. Cavinato T, Rubinacci S, Malaspinas AS, Delaneau O. A resampling-based approach to share reference panels. Nat Comput Sci 2024;4:360–6.
[157]. Wang S, Jiang X, Singh S, Marmor R, Bonomi L, Fox D, et al. Genome privacy: challenges, technical approaches to mitigate risk, and ethical considerations in the United States. Ann N Y Acad Sci 2017;1387:73–83.
[158]. Ouzhuluobu, He Y, Lou H, Cui C, Deng L, Gao Y, et al. De novo assembly of a Tibetan genome and identification of novel structural variants associated with high-altitude adaptation. Natl Sci Rev 2020;7:391–402.
[159]. Lou H, Gao Y, Xie B, Wang Y, Zhang H, Shi M, et al. Haplotype-resolved de novo assembly of a Tujia genome suggests the necessity for high-quality population-specific genome references. Cell Syst 2022;13:321–33.e6.
[160]. Nurk S, Koren S, Rhie A, Rautiainen M, Bzikadze AV, Mikheenko A, et al. The complete sequence of a human genome. Science 2022;376:44–53.
[161]. Chao KH, Zimin AV, Pertea M, Salzberg SL. The first gapless, reference-quality, fully annotated genome from a Southern Han Chinese individual. G3 (Bethesda) 2023;13:jkac321.
[162]. Wang T, Antonacci-Fulton L, Howe K, Lawson HA, Lucas JK, Phillippy AM, et al. The Human Pangenome Project: a global resource to map genomic diversity. Nature 2022;604:437–46.
[163]. Gao Y, Yang X, Chen H, Tan X, Yang Z, Deng L, et al. A pangenome reference of 36 Chinese populations. Nature 2023;619:112–21.
[164]. Hannan AJ. Tandem repeats mediating genetic plasticity in health and disease. Nat Rev Genet 2018;19:286–98.
[165]. Saini S, Mitra I, Mousavi N, Fotsing SF, Gymrek M. A reference haplotype panel for genome-wide imputation of short tandem repeats. Nat Commun 2018;9:4397.
[166]. Ziaei Jam H, Li Y, DeVito R, Mousavi N, Ma N, Lujumba I, et al. A deep population reference panel of tandem repeat variation. Nat Commun 2023;14:6711.

Associated Data

Supplementary Materials

qzaf022_Supplementary_Data

qzaf022_supplementary_data.zip (47.6 KB, zip)

[qzaf022-B1] [1]. Taliun D, Harris DN, Kessler MD, Carlson J, Szpiech ZA, Torres R, et al. Sequencing of 53,831 diverse genomes from the NHLBI TOPMed Program. Nature 2021;590:290–9. [DOI] [PMC free article] [PubMed] [Google Scholar]

[qzaf022-B2] [2]. McCarthy S, Das S, Kretzschmar W, Delaneau O, Wood AR, Teumer A, et al. A reference panel of 64,976 haplotypes for genotype imputation. Nat Genet 2016;48:1279–83. [DOI] [PMC free article] [PubMed] [Google Scholar]

[qzaf022-B3] [3]. The 1000 Genomes Project Consortium, Auton A, Brooks LD, Durbin RM, Garrison EP, Kang HM, et al. A global reference for human genetic variation. Nature 2015;526:68–74. [DOI] [PMC free article] [PubMed] [Google Scholar]

[qzaf022-B4] [4]. Zhang P, Luo H, Li Y, Wang Y, Wang J, Zheng Y, et al. NyuWa Genome resource: a deep whole-genome sequencing-based variation profile and reference panel for the Chinese population. Cell Rep 2021;37:110017. [DOI] [PubMed] [Google Scholar]

[qzaf022-B5] [5]. Stark Z, Scott RH. Genomic newborn screening for rare diseases. Nat Rev Genet 2023;24:755–66. [DOI] [PubMed] [Google Scholar]

[qzaf022-B6] [6]. He G, Yao H, Duan S, Luo L, Sun Q, Tang R, et al. Pilot work of the 10K Chinese People Genomic Diversity Project along the Silk Road suggests a complex east-west admixture landscape and biological adaptations. Sci China Life Sci 2025;68:914–33. [DOI] [PubMed] [Google Scholar]

[qzaf022-B7] [7]. Li X, Wang M, Su H, Duan S, Sun Y, Chen H, et al. Evolutionary history and biological adaptation of Han Chinese people on the Mongolian Plateau. hLife 2024;2:296–313. [Google Scholar]

[qzaf022-B8] [8]. Sousa da Mota B, Rubinacci S, Cruz Dávalos DI, Amorim CEG, Sikora M, Johannsen NN, et al. Imputation of ancient human genomes. Nat Commun 2023;14:3660. [DOI] [PMC free article] [PubMed] [Google Scholar]

[qzaf022-B9] [9]. Cook S, Choi W, Lim H, Luo Y, Kim K, Jia X, et al. Accurate imputation of human leukocyte antigens with CookHLA. Nat Commun 2021;12:1264. [DOI] [PMC free article] [PubMed] [Google Scholar]

[qzaf022-B10] [10]. Sakaue S, Gurajala S, Curtis M, Luo Y, Choi W, Ishigaki K, et al. Tutorial: a statistical genetics guide to identifying HLA alleles driving complex disease. Nat Protoc 2023;18:2625–41. [DOI] [PMC free article] [PubMed] [Google Scholar]

[qzaf022-B11] [11]. Marchini J, Howie B. Genotype imputation for genome-wide association studies. Nat Rev Genet 2010;11:499–511. [DOI] [PubMed] [Google Scholar]

[qzaf022-B12] [12]. Choi J, Kim S, Kim J, Son HY, Yoo SK, Kim CU, et al. A whole-genome reference panel of 14,393 individuals for East Asian populations accelerates discovery of rare functional variants. Sci Adv 2023;9:eadg6319. [DOI] [PMC free article] [PubMed] [Google Scholar]

[qzaf022-B13] [13]. Cao Y, Li L, Xu M, Feng Z, Sun X, Lu J, et al. The ChinaMAP analytics of deep whole genome sequences in 10,588 individuals. Cell Res 2020;30:717–31. [DOI] [PMC free article] [PubMed] [Google Scholar]

[qzaf022-B14] [14]. Yang MY, Zhong JD, Li X, Tian G, Bai WY, Fang YH, et al. SEAD reference panel with 22,134 haplotypes boosts rare variant imputation and genome-wide association analysis in Asian populations. Nat Commun 2024;15:10839. [DOI] [PMC free article] [PubMed] [Google Scholar]

[qzaf022-B15] [15]. Loh PR, Palamara PF, Price AL. Fast and accurate long-range phasing in a UK Biobank cohort. Nat Genet 2016;48:811–6. [DOI] [PMC free article] [PubMed] [Google Scholar]

[qzaf022-B16] [16]. O’Connell J, Sharp K, Shrine N, Wain L, Hall I, Tobin M, et al. Haplotype estimation for biobank-scale data sets. Nat Genet 2016;48:817–20. [DOI] [PMC free article] [PubMed] [Google Scholar]

[qzaf022-B17] [17]. Loh PR, Danecek P, Palamara PF, Fuchsberger C, Reshef YA, Finucane HK, et al. Reference-based phasing using the Haplotype Reference Consortium panel. Nat Genet 2016;48:1443–8. [DOI] [PMC free article] [PubMed] [Google Scholar]

[qzaf022-B18] [18]. Das S, Forer L, Schönherr S, Sidore C, Locke AE, Kwong A, et al. Next-generation genotype imputation service and methods. Nat Genet 2016;48:1284–7. [DOI] [PMC free article] [PubMed] [Google Scholar]

[qzaf022-B19] [19]. Li Y, Willer CJ, Ding J, Scheet P, Abecasis GR. MaCH: using sequence and genotype data to estimate haplotypes and unobserved genotypes. Genet Epidemiol 2010;34:816–34. [DOI] [PMC free article] [PubMed] [Google Scholar]

[qzaf022-B20] [20]. Huang L, Li Y, Singleton AB, Hardy JA, Abecasis G, Rosenberg NA, et al. Genotype-imputation accuracy across worldwide human populations. Am J Hum Genet 2009;84:235–50. [DOI] [PMC free article] [PubMed] [Google Scholar]

[qzaf022-B21] [21]. Li L, Huang P, Sun X, Wang S, Xu M, Liu S, et al. The ChinaMAP reference panel for the accurate genotype imputation in Chinese populations. Cell Res 2021;31:1308–10. [DOI] [PMC free article] [PubMed] [Google Scholar]

[qzaf022-B22] [22]. Yoo SK, Kim CU, Kim HL, Kim S, Shin JY, Kim N, et al. NARD: whole-genome reference panel of 1779 Northeast Asians improves imputation accuracy of rare and low-frequency variants. Genome Med 2019;11:64. [DOI] [PMC free article] [PubMed] [Google Scholar]

[qzaf022-B23] [23]. Du Z, Ma L, Qu H, Chen W, Zhang B, Lu X, et al. Whole genome analyses of Chinese population and de novo assembly of a northern Han genome. Genomics Proteomics Bioinformatics 2019;17:229–47. [DOI] [PMC free article] [PubMed] [Google Scholar]

[qzaf022-B24] [24]. Pasaniuc B, Rohland N, McLaren PJ, Garimella K, Zaitlen N, Li H, et al. Extremely low-coverage sequencing and imputation increases power for genome-wide association studies. Nat Genet 2012;44:631–5. [DOI] [PMC free article] [PubMed] [Google Scholar]

[qzaf022-B25] [25]. Cahoon JL, Rui X, Tang E, Simons C, Langie J, Chen M, et al. Imputation accuracy across global human populations. Am J Hum Genet 2024;111:979–89. [DOI] [PMC free article] [PubMed] [Google Scholar]

[qzaf022-B26] [26]. Si Y, Vanderwerff B, Zöllner S. Why are rare variants hard to impute? Coalescent models reveal theoretical limits in existing algorithms. Genetics 2021;217:iyab011. [DOI] [PMC free article] [PubMed] [Google Scholar]

[qzaf022-B27] [27]. The International HapMap Consortium, Frazer KA, Ballinger DG, Cox DR, Hinds DA, Stuve LL, et al. A second generation human haplotype map of over 3.1 million SNPs. Nature 2007;449:851–61. [DOI] [PMC free article] [PubMed] [Google Scholar]

[qzaf022-B28] [28]. He G, Wang M, Luo L, Sun Q, Yuan H, Lv H, et al. Population genomics of Central Asian peoples unveil ancient Trans-Eurasian genetic admixture and cultural exchanges. hLife 2024;2:554–62. [Google Scholar]

[qzaf022-B29] [29]. Shi M, Tanikawa C, Munter HM, Akiyama M, Koyama S, Tomizuka K, et al. Genotype imputation accuracy and the quality metrics of the minor ancestry in multi-ancestry reference panels. Brief Bioinform 2023;25:bbad509. [DOI] [PMC free article] [PubMed] [Google Scholar]

[qzaf022-B30] [30]. Lin Y, Liu L, Yang S, Li Y, Lin D, Zhang X, et al. Genotype imputation for Han Chinese population using Haplotype Reference Consortium as reference. Hum Genet 2018;137:431–6. [DOI] [PubMed] [Google Scholar]

[qzaf022-B31] [31]. Gurdasani D, Carstensen T, Tekola-Ayele F, Pagani L, Tachmazidou I, Hatzikotoulas K, et al. The African Genome Variation Project shapes medical genetics in Africa. Nature 2015;517:327–32. [DOI] [PMC free article] [PubMed] [Google Scholar]

[qzaf022-B32] [32]. Zheng W, He Y, Guo Y, Yue T, Zhang H, Li J, et al. Large-scale genome sequencing redefines the genetic footprints of high-altitude adaptation in Tibetans. Genome Biol 2023;24:73. [DOI] [PMC free article] [PubMed] [Google Scholar]

[qzaf022-B33] [33]. Popejoy AB, Fullerton SM. Genomics is failing on diversity. Nature 2016;538:161–4. [DOI] [PMC free article] [PubMed] [Google Scholar]

[qzaf022-B34] [34]. Jostins L, Morley KI, Barrett JC. Imputation of low-frequency variants using the HapMap3 benefits from large, diverse reference sets. Eur J Hum Genet 2011;19:662–6. [DOI] [PMC free article] [PubMed] [Google Scholar]

[qzaf022-B35] [35]. Bomba L, Walter K, Soranzo N. The impact of rare and low-frequency genetic variants in common disease. Genome Biol 2017;18:77. [DOI] [PMC free article] [PubMed] [Google Scholar]

[qzaf022-B36] [36]. Das S, Abecasis GR, Browning BL. Genotype imputation from large reference panels. Annu Rev Genomics Hum Genet 2018;19:73–96. [DOI] [PubMed] [Google Scholar]

[qzaf022-B37] [37]. Bai WY, Zhu XW, Cong PK, Zhang XJ, Richards JB, Zheng HF. Genotype imputation and reference panel: a systematic evaluation on haplotype size and diversity. Brief Bioinform 2019;21:bbz108. [DOI] [PubMed] [Google Scholar]

[qzaf022-B38] [38]. Hoffmann TJ, Witte JS. Strategies for imputing and analyzing rare variants in association studies. Trends Genet 2015;31:556–63. [DOI] [PMC free article] [PubMed] [Google Scholar]

[qzaf022-B39] [39]. Shi S, Rubinacci S, Hu S, Moutsianas L, Stuckey A, Need AC, et al. A Genomics England haplotype reference panel and imputation of UK Biobank. Nat Genet 2024;56:1800–3. [DOI] [PMC free article] [PubMed] [Google Scholar]

[qzaf022-B40] [40]. All of Us Research Program Genomics Investigators. Genomic data in the All of Us Research Program. Nature 2024;627:340–6. [DOI] [PMC free article] [PubMed] [Google Scholar]

[qzaf022-B41] [41]. The International HapMap Consortium. A haplotype map of the human genome. Nature 2005;437:1299–320. [DOI] [PMC free article] [PubMed] [Google Scholar]

[qzaf022-B42] [42]. Byrska-Bishop M, Evani US, Zhao X, Basile AO, Abel HJ, Regier AA, et al. High-coverage whole-genome sequencing of the expanded 1000 Genomes Project cohort including 602 trios. Cell 2022;185:3426–40.e19. [DOI] [PMC free article] [PubMed] [Google Scholar]

[qzaf022-B43] [43]. The 1000 Genomes Project Consortium, Abecasis GR, Altshuler D, Auton A, Brooks LD, Durbin RM, et al. A map of human genome variation from population-scale sequencing. Nature 2010;467:1061–73. [DOI] [PMC free article] [PubMed] [Google Scholar]

[qzaf022-B44] [44]. Delaneau O, Marchini J, The 1000 Genomes Project Consortium. Integrating sequence and array data to create an improved 1000 Genomes Project haplotype reference panel. Nat Commun 2014;5:3934. [DOI] [PMC free article] [PubMed] [Google Scholar]

[qzaf022-B45] [45]. Xue Y, Mezzavilla M, Haber M, McCarthy S, Chen Y, Narasimhan V, et al. Enrichment of low-frequency functional variants revealed by whole-genome sequencing of multiple isolated European populations. Nat Commun 2017;8:15927. [DOI] [PMC free article] [PubMed] [Google Scholar]

[qzaf022-B46] [46]. Genome of the Netherlands Consortium. Whole-genome sequence variation, population structure and demographic history of the Dutch population. Nat Genet 2014;46:818–25. [DOI] [PubMed] [Google Scholar]

[qzaf022-B47] [47]. UK10K Consortium, Walter K, Min JL, Huang J, Crooks L, Memari Y, et al. The UK10K project identifies rare variants in health and disease. Nature 2015;526:82–90. [DOI] [PMC free article] [PubMed] [Google Scholar]

[qzaf022-B48] [48]. Sidore C, Busonero F, Maschio A, Porcu E, Naitza S, Zoledziewska M, et al. Genome sequencing elucidates Sardinian genetic architecture and augments association analyses for lipid and blood inflammatory markers. Nat Genet 2015;47:1272–81. [DOI] [PMC free article] [PubMed] [Google Scholar]

[qzaf022-B49] [49]. Gudbjartsson DF, Helgason H, Gudjonsson SA, Zink F, Oddson A, Gylfason A, et al. Large-scale whole-genome sequencing of the Icelandic population. Nat Genet 2015;47:435–44. [DOI] [PubMed] [Google Scholar]

[qzaf022-B50] [50]. Pagani L, Lawson DJ, Jagoda E, Mörseburg A, Eriksson A, Mitt M, et al. Genomic analyses inform on migration events during the peopling of Eurasia. Nature 2016;538:238–42. [DOI] [PMC free article] [PubMed] [Google Scholar]

[qzaf022-B51] [51]. Southam L, Gilly A, Süveges D, Farmaki AE, Schwartzentruber J, Tachmazidou I, et al. Whole genome sequencing and imputation in isolated populations identify genetic associations with medically-relevant complex traits. Nat Commun 2017;8:15606. [DOI] [PMC free article] [PubMed] [Google Scholar]

[qzaf022-B52] [52]. Mitt M, Kals M, Pärn K, Gabriel SB, Lander ES, Palotie A, et al. Improved imputation accuracy of rare and low-frequency variants using population-specific high-coverage WGS-based imputation reference panel. Eur J Hum Genet 2017;25:869–76. [DOI] [PMC free article] [PubMed] [Google Scholar]

[qzaf022-B53] [53]. Lencz T, Yu J, Palmer C, Carmi S, Ben-Avraham D, Barzilai N, et al. High-depth whole genome sequencing of an Ashkenazi Jewish reference panel: enhancing sensitivity, accuracy, and imputation. Hum Genet 2018;137:343–55. [DOI] [PMC free article] [PubMed] [Google Scholar]

[qzaf022-B54] [54]. Valls-Margarit J, Galván-Femenía I, Matías-Sánchez D, Blay N, Puiggròs M, Carreras A, et al. GCAT|Panel, a comprehensive structural variant haplotype map of the Iberian population from high-coverage whole-genome sequencing. Nucleic Acids Res 2022;50:2464–79. [DOI] [PMC free article] [PubMed] [Google Scholar]

[qzaf022-B55] [55]. Halldorsson BV, Eggertsson HP, Moore KHS, Hauswedell H, Eiriksson O, Ulfarsson MO, et al. The sequences of 150,119 genomes in the UK Biobank. Nature 2022;607:732–40. [DOI] [PMC free article] [PubMed] [Google Scholar]

[qzaf022-B56] [56]. Banasik K, Møller PL, Techlo TR, Holm PC, Walters GB, Ingason A, et al. DanMAC5: a browser of aggregated sequence variants from 8,671 whole genome sequenced Danish individuals. BMC Genom Data 2023;24:30. [DOI] [PMC free article] [PubMed] [Google Scholar]

[qzaf022-B57] [57]. Kurki MI, Karjalainen J, Palta P, Sipilä TP, Kristiansson K, Donner KM, et al. FinnGen provides genetic insights from a well-phenotyped isolated population. Nature 2023;613:508–18. [DOI] [PMC free article] [PubMed] [Google Scholar]

[qzaf022-B58] [58]. Reščenko R, Brīvība M, Atava I, Rovīte V, Pečulis R, Silamiķelis I, et al. Whole-genome sequencing of 502 individuals from Latvia: the first step towards a population-specific reference of genetic variation. Int J Mol Sci 2023;24:15345. [DOI] [PMC free article] [PubMed] [Google Scholar]

[qzaf022-B59] [59]. Sirugo G, Williams SM, Tishkoff SA. The missing diversity in human genetic studies. Cell 2019;177:26–31. [DOI] [PMC free article] [PubMed] [Google Scholar]

[qzaf022-B60] [60]. Liu S, Huang S, Chen F, Zhao L, Yuan Y, Francis SS, et al. Genomic analyses from non-invasive prenatal testing reveal genetic associations, patterns of viral infections, and Chinese population history. Cell 2018;175:347–59.e14. [DOI] [PubMed] [Google Scholar]

[qzaf022-B61] [61]. Iglesias AI, Mishra A, Vitart V, Bykhovskaya Y, Höhn R, Springelkamp H, et al. Cross-ancestry genome-wide association analysis of corneal thickness strengthens link between complex and Mendelian eye diseases. Nat Commun 2018;9:1864. [DOI] [PMC free article] [PubMed] [Google Scholar]

[qzaf022-B62] [62]. Wei CY, Yang JH, Yeh EC, Tsai MF, Kao HJ, Lo CZ, et al. Genetic profiles of 103,106 individuals in the Taiwan Biobank provide insights into the health and history of Han Chinese. NPJ Genom Med 2021;6:10. [DOI] [PMC free article] [PubMed] [Google Scholar]

[qzaf022-B63] [63]. Cong PK, Bai WY, Li JC, Yang MY, Khederzadeh S, Gai SR, et al. Genomic analyses of 10,376 individuals in the Westlake BioBank for Chinese (WBBC) pilot project. Nat Commun 2022;13:2939. [DOI] [PMC free article] [PubMed] [Google Scholar]

[qzaf022-B64] [64]. Wang C, Dai J, Qin N, Fan J, Ma H, Chen C, et al. Analyses of rare predisposing variants of lung cancer in 6,004 whole genomes in Chinese. Cancer Cell 2022;40:1223–39.e6. [DOI] [PubMed] [Google Scholar]

[qzaf022-B65] [65]. Gao Y, Zhang C, Yuan L, Ling Y, Wang X, Liu C, et al. PGG.Han: the Han Chinese genome database and analysis platform. Nucleic Acids Res 2020;48:D971–6. [DOI] [PMC free article] [PubMed] [Google Scholar]

[qzaf022-B66] [66]. Jeon S, Bhak Y, Choi Y, Jeon Y, Kim S, Jang J, et al. Korean Genome Project: 1094 Korean personal genomes with clinical information. Sci Adv 2020;6:eaaz7835. [DOI] [PMC free article] [PubMed] [Google Scholar]

[qzaf022-B67] [67]. Kars ME, Başak AN, Onat OE, Bilguvar K, Choi J, Itan Y, et al. The genetic structure of the Turkish population reveals high levels of variation and admixture. Proc Natl Acad Sci U S A 2021;118:e2026076118. [DOI] [PMC free article] [PubMed] [Google Scholar]

[qzaf022-B68] [68]. Razali RM, Rodriguez-Flores J, Ghorbani M, Naeem H, Aamer W, Aliyev E, et al. Thousands of Qatari genomes inform human migration history and improve imputation of Arab haplotypes. Nat Commun 2021;12:5929. [DOI] [PMC free article] [PubMed] [Google Scholar]

[qzaf022-B69] [69]. Jain A, Bhoyar RC, Pandhare K, Mishra A, Sharma D, Imran M, et al. IndiGenomes: a comprehensive resource of genetic variants from over 1000 Indian genomes. Nucleic Acids Res 2021;49:D1225–32. [DOI] [PMC free article] [PubMed] [Google Scholar]

[qzaf022-B70] [70]. Nagasaki M, Yasuda J, Katsuoka F, Nariai N, Kojima K, Kawai Y, et al. Rare variant discovery by deep whole-genome sequencing of 1,070 Japanese individuals. Nat Commun 2015;6:8018. [DOI] [PMC free article] [PubMed] [Google Scholar]

[qzaf022-B71] [71]. Okada Y, Momozawa Y, Sakaue S, Kanai M, Ishigaki K, Akiyama M, et al. Deep whole-genome sequencing reveals recent selection signatures linked to evolution and disease risk of Japanese. Nat Commun 2018;9:1631. [DOI] [PMC free article] [PubMed] [Google Scholar]

[qzaf022-B72] [72]. Tadaka S, Katsuoka F, Ueki M, Kojima K, Makino S, Saito S, et al. 3.5KJPNv2: an allele frequency panel of 3552 Japanese individuals including the X chromosome. Hum Genome Var 2019;6:28. [DOI] [PMC free article] [PubMed] [Google Scholar]

[qzaf022-B73] [73]. Wu D, Dou J, Chai X, Bellis C, Wilm A, Shih CC, et al. Large-scale whole-genome sequencing of three diverse Asian populations in Singapore. Cell 2019;179:736–49.e15. [DOI] [PubMed] [Google Scholar]

[qzaf022-B74] [74]. GenomeAsia 100K Consortium. The GenomeAsia 100K Project enables genetic discoveries across Asia. Nature 2019;576:106–11. [DOI] [PMC free article] [PubMed] [Google Scholar]

[qzaf022-B75] [75]. Cengnata A, Deng L, Yap WS, Lim LR, Leong CO, Xu S, et al. A genotype imputation reference panel specific for native Southeast Asian populations. NPJ Genom Med 2024;9:47. [DOI] [PMC free article] [PubMed] [Google Scholar]

[qzaf022-B76] [76]. Hwang MY, Choi NH, Won HH, Kim BJ, Kim YJ. Analyzing the Korean reference genome with meta-imputation increased the imputation accuracy and spectrum of rare variants in the Korean population. Front Genet 2022;13:1008646. [DOI] [PMC free article] [PubMed] [Google Scholar]

[qzaf022-B77] [77]. Jeon S, Choi H, Jeon Y, Choi WH, Choi H, An K, et al. Korea4K: whole genome sequences of 4,157 Koreans with 107 phenotypes derived from extensive health check-ups. Gigascience 2024;13:giae014. [DOI] [PMC free article] [PubMed] [Google Scholar]

[qzaf022-B78] [78]. Tadaka S, Kawashima J, Hishinuma E, Saito S, Okamura Y, Otsuki A, et al. jMorp: Japanese Multi-Omics Reference Panel update report 2023. Nucleic Acids Res 2024;52:D622–32. [DOI] [PMC free article] [PubMed] [Google Scholar]

[qzaf022-B79] [79]. Ardiansyah E, Riza AL, Dian S, Ganiem AR, Alisjahbana B, Setiabudiawan TP, et al. Sequencing whole genomes of the West Javanese population in Indonesia reveals novel variants and improves imputation accuracy. Front Genet 2025:15:1492602. [DOI] [PMC free article] [PubMed] [Google Scholar]

[80]. He Y, Lei C, Wan C, Zeng S, Zhang T, Luo F, et al. A comprehensive whole genome database of ethnic minority populations. Sci Rep 2024;14:13954.

[81]. Yu C, Lan X, Tao Y, Guo Y, Sun D, Qian P, et al. A high-resolution haplotype-resolved reference panel constructed from the China Kadoorie Biobank study. Nucleic Acids Res 2023;51:11770–82.

[82]. Huang S, Liu S, Huang M, He JR, Wang C, Wang T, et al. The Born in Guangzhou Cohort Study enables generational genetic discoveries. Nature 2024;626:565–73.

[83]. Skoglund P, Mallick S, Bortolini MC, Chennagiri N, Hünemeier T, Petzl-Erler ML, et al. Genetic evidence for two founding populations of the Americas. Nature 2015;525:104–8.

[84]. Raghavan M, Skoglund P, Graf KE, Metspalu M, Albrechtsen A, Moltke I, et al. Upper Palaeolithic Siberian genome reveals dual ancestry of Native Americans. Nature 2014;505:87–91.

[85]. Mathias RA, Taub MA, Gignoux CR, Fu W, Musharoff S, O’Connor TD, et al. A continuum of admixture in the Western Hemisphere revealed by the African Diaspora genome. Nat Commun 2016;7:12522.

[86]. O’Connell J, Yun T, Moreno M, Li H, Litterman N, Kolesnikov A, et al. A population-specific reference panel for improved genotype imputation in African Americans. Commun Biol 2021;4:1269.

[87]. Hou L, Kember RL, Roach JC, O’Connell JR, Craig DW, Bucan M, et al. A population-specific reference panel empowers genetic studies of Anabaptist populations. Sci Rep 2017;7:6079.

[88]. Cheng PL, Wang H, Dombroski BA, Farrell JJ, Horng I, Chung T, et al. A specialized reference panel with structural variants integration for improving genotype imputation in Alzheimer disease and related dementias. HGG Adv 2025;6:100487.

[89]. Naslavsky MS, Scliar MO, Yamamoto GL, Wang JYT, Zverinova S, Karp T, et al. Whole-genome sequencing of 1,171 elderly admixed individuals from Brazil. Nat Commun 2022;13:1004.

[90]. Ziyatdinov A, Torres J, Alegre-Díaz J, Backman J, Mbatchou J, Turner M, et al. Genotyping, sequencing and analysis of 140,000 adults from Mexico City. Nature 2023;622:784–93.

[91]. Gurdasani D, Carstensen T, Fatumo S, Chen G, Franklin CS, Prado-Martinez J, et al. Uganda genome resource enables insights into population history and genomic discovery in Africa. Cell 2019;179:984–1002.e36.

[92]. Carlson JC, Krishnan M, Liu S, Anderson KJ, Zhang JZ, Yapp TJ, et al. Improving imputation quality in Samoans through the integration of population-specific sequences into existing reference panels. medRxiv 2023;23297835.

[93]. Herzig AF, Velo-Suarez L, FrEx Consortium, FranceGenRef Consortium, Dina C, Redon R, et al. How local reference panels improve imputation in French populations. Sci Rep 2024;14:370.

[94]. Deng T, Zhang P, Garrick D, Gao H, Wang L, Zhao F. Comparison of genotype imputation for SNP array and low-coverage whole-genome sequencing data. Front Genet 2022;12:704118.

[95]. Sengupta D, Botha G, Meintjes A, Mbiyavanga M, AWI-Gen Study, H3Africa Consortium, et al. Performance and accuracy evaluation of reference panels for genotype imputation in sub-Saharan African populations. Cell Genom 2023;3:100332.

[96]. Xu J, Liu D, Hassan A, Genovese G, Cote AC, Fennessy B, et al. Evaluation of imputation performance of multiple reference panels in a Pakistani population. HGG Adv 2025;6:100395.

[97]. Hofmeister RJ, Ribeiro DM, Rubinacci S, Delaneau O. Accurate rare variant phasing of whole-genome and whole-exome sequencing data in the UK Biobank. Nat Genet 2023;55:1243–9.

[98]. Browning BL, Tian X, Zhou Y, Browning SR. Fast two-stage phasing of large-scale sequence data. Am J Hum Genet 2021;108:1880–90.

[99]. Browning SR, Browning BL. Haplotype phasing: existing methods and new developments. Nat Rev Genet 2011;12:703–14.

[100]. Delaneau O, Zagury JF, Robinson MR, Marchini JL, Dermitzakis ET. Accurate, scalable and integrative haplotype estimation. Nat Commun 2019;10:5436.

[101]. Williams CM, O’Connell J, Jewett E, Freyman WA, 23andMe Research Team, Gignoux CR, et al. Phasing millions of samples achieves near perfect accuracy, enabling parent-of-origin analyses. HGG Adv 2026;7:100526.

[102]. De Marino A, Mahmoud AA, Bose M, Bircan KO, Terpolovsky A, Bamunusinghe V, et al. A comparative analysis of current phasing and imputation software. PLoS One 2022;17:e0260177.

[103]. Huang J, Howie B, McCarthy S, Memari Y, Walter K, Min JL, et al. Improved imputation of low-frequency and rare variants using the UK10K haplotype reference panel. Nat Commun 2015;6:8111.

[104]. Rubinacci S, Delaneau O, Marchini J. Genotype imputation using the Positional Burrows Wheeler Transform. PLoS Genet 2020;16:e1009049.

[105]. Davies RW, Kucka M, Su D, Shi S, Flanagan M, Cunniff CM, et al. Rapid genotype imputation from sequence with reference panels. Nat Genet 2021;53:1104–11.

[106]. Rubinacci S, Ribeiro DM, Hofmeister RJ, Delaneau O. Efficient phasing and imputation of low-coverage sequencing data using large reference panels. Nat Genet 2021;53:120–6.

[107]. Liu S, Liu Y, Gu Y, Lin X, Zhu H, Liu H, et al. Utilizing non-invasive prenatal test sequencing data for human genetic investigation. Cell Genom 2024;4:100669.

[108]. Cox SL, Moots HM, Stock JT, Shbat A, Bitarello BD, Nicklisch N, et al. Predicting skeletal stature using ancient DNA. Am J Biol Anthropol 2022;177:162–74.

[109]. Çubukcu H, Kilinç GM. Evaluation of genotype imputation using Glimpse tools on low coverage ancient DNA. Mamm Genome 2024;35:461–73.

[110]. Rubinacci S, Hofmeister RJ, Sousa da Mota B, Delaneau O. Imputation of low-coverage sequencing data from 150,119 UK Biobank genomes. Nat Genet 2023;55:1088–90.

[111]. Yu K, Das S, LeFaive J, Kwong A, Pleiness J, Forer L, et al. Meta-imputation: an efficient method to combine genotype data after imputation with multiple reference panels. Am J Hum Genet 2022;109:1007–15.

[112]. Lazaridis I, Patterson N, Mittnik A, Renaud G, Mallick S, Kirsanow K, et al. Ancient human genomes suggest three ancestral populations for present-day Europeans. Nature 2014;513:409–13.

[113]. Sheng X, Xia L, Cahoon JL, Conti DV, Haiman CA, Kachuri L, et al. Inverted genomic regions between reference genome builds in humans impact imputation accuracy and decrease the power of association testing. HGG Adv 2023;4:100159.

[114]. Pan Y, Zhang C, Lu Y, Ning Z, Lu D, Gao Y, et al. Genomic diversity and post-admixture adaptation in the Uyghurs. Natl Sci Rev 2022;9:nwab124.

[115]. Tam V, Patel N, Turcotte M, Bossé Y, Paré G, Meyre D. Benefits and limitations of genome-wide association studies. Nat Rev Genet 2019;20:467–84.

[116]. Anderson CA, Pettersson FH, Clarke GM, Cardon LR, Morris AP, Zondervan KT. Data quality control in genetic case-control association studies. Nat Protoc 2010;5:1564–73.

[117]. Winkler TW, Day FR, Croteau-Chonka DC, Wood AR, Locke AE, Mägi R, et al. Quality control and conduct of genome-wide association meta-analyses. Nat Protoc 2014;9:1192–212.

[118]. Bycroft C, Freeman C, Petkova D, Band G, Elliott LT, Sharp K, et al. The UK Biobank resource with deep phenotyping and genomic data. Nature 2018;562:203–9.

[119]. Yengo L, Sidorenko J, Kemper KE, Zheng Z, Wood AR, Weedon MN, et al. Meta-analysis of genome-wide association studies for height and body mass index in ∼700,000 individuals of European ancestry. Hum Mol Genet 2018;27:3641–9.

[120]. Zhao W, Rasheed A, Tikkanen E, Lee JJ, Butterworth AS, Howson JMM, et al. Identification of new susceptibility loci for type 2 diabetes and shared etiological pathways with coronary heart disease. Nat Genet 2017;49:1450–7.

[121]. Hyde CL, Nagle MW, Tian C, Chen X, Paciga SA, Wendland JR, et al. Identification of 15 genetic loci associated with risk of major depression in individuals of European descent. Nat Genet 2016;48:1031–6.

[122]. Gaynor SM, Joseph T, Bai X, Zou Y, Boutkov B, Maxwell EK, et al. Yield of genetic association signals from genomes, exomes and imputation in the UK Biobank. Nat Genet 2024;56:2345–51.

[123]. Kachuri L, Chatterjee N, Hirbo J, Schaid DJ, Martin I, Kullo IJ, et al. Principles and methods for transferring polygenic risk scores across global populations. Nat Rev Genet 2024;25:8–25.

[124]. Lewontin RC. Population genetics. Annu Rev Genet 1973;7:1–17.

[125]. Bradburd GS, Ralph PL. Spatial population genetics: it’s about time. Annu Rev Ecol Evol Syst 2019;50:427–49.

[126]. Nagar SD, Conley AB, Jordan IK. Population structure and pharmacogenomic risk stratification in the United States. BMC Biol 2020;18:140.

[127]. Almarri MA, Bergström A, Prado-Martinez J, Yang F, Fu B, Dunham AS, et al. Population structure, stratification, and introgression of human structural variation. Cell 2020;182:189–99.e15.

[128]. Aneli S, Birolo G, Matullo G. Twenty years of the Human Genome Diversity Project. Hum Popul Genet Genom 2022;2:0005.

[129]. Borda V, Loesch DP, Guo B, Laboulaye R, Veliz-Otani D, French JN, et al. Genetics of Latin American Diversity Project: insights into population genetics and association studies in admixed groups in the Americas. Cell Genom 2024;4:100692.

[130]. Schraiber JG, Akey JM. Methods and models for unravelling human evolutionary history. Nat Rev Genet 2015;16:727–40.

[131]. Barbarino JM, Whirl-Carrillo M, Altman RB, Klein TE. PharmGKB: a worldwide resource for pharmacogenomic information. Wiley Interdiscip Rev Syst Biol Med 2018;10:e1417.

[132]. Reay WR, Cairns MJ. Advancing the use of genome-wide association studies for drug repurposing. Nat Rev Genet 2021;22:658–71.

[133]. Sanseau P, Agarwal P, Barnes MR, Pastinen T, Richards JB, Cardon LR, et al. Use of genome-wide association studies for drug repositioning. Nat Biotechnol 2012;30:317–20.

[134]. Zhou K, Pedersen HK, Dawed AY, Pearson ER. Pharmacogenomics in diabetes mellitus: insights into drug action and drug discovery. Nat Rev Endocrinol 2016;12:337–46.

[135]. Norton ME. Noninvasive prenatal testing to analyze the fetal genome. Proc Natl Acad Sci U S A 2016;113:14173–5.

[136]. Haidar H, Le Clerc-Blain J, Vanstone M, Laberge AM, Bibeau G, Ghulmiyyah L, et al. A qualitative study of women and partners from Lebanon and Quebec regarding an expanded scope of noninvasive prenatal testing. BMC Pregnancy Childbirth 2021;21:54.

[137]. Chan KCA, Jiang P, Sun K, Cheng YKY, Tong YK, Cheng SH, et al. Second generation noninvasive fetal genome analysis reveals de novo mutations, single-base parental inheritance, and preferred DNA ends. Proc Natl Acad Sci U S A 2016;113:E8159–68.

[138]. Hui R, D’Atanasio E, Cassidy LM, Scheib CL, Kivisild T. Evaluating genotype imputation pipeline for ultra-low coverage ancient genomes. Sci Rep 2020;10:18542.

[139]. Ausmees K, Sanchez-Quinto F, Jakobsson M, Nettelblad C. An empirical evaluation of genotype imputation of ancient DNA. G3 (Bethesda) 2022;12:jkac089.

[140]. Martiniano R, Cassidy LM, Ó’Maoldúin R, McLaughlin R, Silva NM, Manco L, et al. The population genomics of archaeological transition in west Iberia: investigation of ancient substructure using imputation and haplotype-based methods. PLoS Genet 2017;13:e1006852.

[141]. Allentoft ME, Sikora M, Refoyo-Martínez A, Irving-Pease EK, Fischer A, Barrie W, et al. Population genomics of post-glacial western Eurasia. Nature 2024;625:301–11.

[142]. Barrie W, Yang Y, Irving-Pease EK, Attfield KE, Scorrano G, Jensen LT, et al. Elevated genetic risk for multiple sclerosis emerged in steppe pastoralist populations. Nature 2024;625:321–8.

[143]. Irving-Pease EK, Refoyo-Martínez A, Barrie W, Ingason A, Pearson A, Fischer A, et al. The selection landscape and genetic legacy of ancient Eurasians. Nature 2024;625:312–20.

[144]. Ringbauer H, Huang Y, Akbari A, Mallick S, Olalde I, Patterson N, et al. Accurate detection of identity-by-descent segments in human ancient DNA. Nat Genet 2024;56:143–51.

[145]. Wang M, Chen H, Luo L, Huang Y, Duan S, Yuan H, et al. Forensic investigative genetic genealogy: expanding pedigree tracing and genetic inquiry in the genomic era. J Genet Genomics 2025;52:460–72.

[146]. Erlich Y, Shor T, Pe’er I, Carmi S. Identity inference of genomic data using long-range familial searches. Science 2018;362:690–4.

[147]. Ram N, Murphy EE, Suter SM. Regulating forensic genetic genealogy. Science 2021;373:1444–6.

[148]. May T. Sociogenetic risks — ancestry DNA testing, third-party identity, and protection of privacy. N Engl J Med 2018;379:410–2.

[149]. Dowdeswell TL. Forensic genetic genealogy: a profile of cases solved. Forensic Sci Int Genet 2022;58:102679.

[150]. Bentley AR, Callier S, Rotimi CN. Diversity and inclusion in genomic research: why the uneven progress? J Community Genet 2017;8:255–66.

[151]. Need AC, Goldstein DB. Next generation disparities in human genomics: concerns and remedies. Trends Genet 2009;25:489–94.

[152]. Jones KM, Cook-Deegan R, Rotimi CN, Callier SL, Bentley AR, Stevens H, et al. Complicated legacies: the human genome at 20. Science 2021;371:564–9.

[153]. Manolio TA, Goodhand P, Ginsburg G. The International Hundred Thousand Plus Cohort Consortium: integrating large-scale cohorts to address global scientific challenges. Lancet Digit Health 2020;2:e567–8.

[154]. Wilkinson MD, Dumontier M, Aalbersberg IJ, Appleton G, Axton M, Baak A, et al. The FAIR Guiding Principles for scientific data management and stewardship. Sci Data 2016;3:160018.

[155]. Carroll SR, Herczog E, Hudson M, Russell K, Stall S. Operationalizing the CARE and FAIR principles for indigenous data futures. Sci Data 2021;8:108.

[156]. Cavinato T, Rubinacci S, Malaspinas AS, Delaneau O. A resampling-based approach to share reference panels. Nat Comput Sci 2024;4:360–6.

[157]. Wang S, Jiang X, Singh S, Marmor R, Bonomi L, Fox D, et al. Genome privacy: challenges, technical approaches to mitigate risk, and ethical considerations in the United States. Ann N Y Acad Sci 2017;1387:73–83.

[158]. Ouzhuluobu, He Y, Lou H, Cui C, Deng L, Gao Y, et al. De novo assembly of a Tibetan genome and identification of novel structural variants associated with high-altitude adaptation. Natl Sci Rev 2020;7:391–402.

[159]. Lou H, Gao Y, Xie B, Wang Y, Zhang H, Shi M, et al. Haplotype-resolved de novo assembly of a Tujia genome suggests the necessity for high-quality population-specific genome references. Cell Syst 2022;13:321–33.e6.

[160]. Nurk S, Koren S, Rhie A, Rautiainen M, Bzikadze AV, Mikheenko A, et al. The complete sequence of a human genome. Science 2022;376:44–53.

[161]. Chao KH, Zimin AV, Pertea M, Salzberg SL. The first gapless, reference-quality, fully annotated genome from a Southern Han Chinese individual. G3 (Bethesda) 2023;13:jkac321.

[162]. Wang T, Antonacci-Fulton L, Howe K, Lawson HA, Lucas JK, Phillippy AM, et al. The Human Pangenome Project: a global resource to map genomic diversity. Nature 2022;604:437–46.

[163]. Gao Y, Yang X, Chen H, Tan X, Yang Z, Deng L, et al. A pangenome reference of 36 Chinese populations. Nature 2023;619:112–21.

[164]. Hannan AJ. Tandem repeats mediating genetic plasticity in health and disease. Nat Rev Genet 2018;19:286–98.

[165]. Saini S, Mitra I, Mousavi N, Fotsing SF, Gymrek M. A reference haplotype panel for genome-wide imputation of short tandem repeats. Nat Commun 2018;9:4397.

[166]. Ziaei Jam H, Li Y, DeVito R, Mousavi N, Ma N, Lujumba I, et al. A deep population reference panel of tandem repeat variation. Nat Commun 2023;14:6711.


