2025 Aug 12;25(8):e70024. doi: 10.1111/1755-0998.70024

High Imputation Accuracy Can Be Achieved Using a Small Reference Panel in a Natural Population With Low Genetic Diversity

Hui Zhen Tan 1,2, Katarina C Stuart 1, Tram Vi 1,2, Annabel Whibley 1,3, Sarah Bailey 1, Patricia Brekke 4, Anna W Santure 1,2
PMCID: PMC12550488  PMID: 40797301

ABSTRACT

Genotype imputation, the inference of missing genotypes using a reference set of population haplotypes, is a cost‐effective tool for improving the quality and quantity of genetic datasets. Imputation is usually applied to large and well‐characterised datasets of humans and livestock, even though it could also benefit smaller natural populations. This study aims to understand the best practices and effectiveness of imputation with a small reference panel for species with low genetic diversity, using a case study of a population of the hihi/stitchbird (Notiomystis cincta). We used a leave‐one‐out method to test imputation on 30 high‐coverage hihi individuals, in which SNPs were masked before being imputed with Beagle v5.4. Imputation accuracy was measured using r², the squared correlation between imputed and ground truth genotype dosages. We tested combinations of five imputation parameters, the inclusion of two linkage maps, reference panels of different sizes and compositions, and targets with various SNP densities and levels of sporadic missingness. We achieved mean r² exceeding 0.95 in most tests using a small reference panel of high‐fecundity individuals. Imputation accuracy was not improved by including a linkage map and decreased at very low SNP densities. Imputed SNPs were filtered by r² to assess the effects on downstream heterozygosity calculations, the site frequency spectrum (SFS) and inference of runs of homozygosity (ROHs). We found that filtering and SNP density greatly affected heterozygosity and SFS at low SNP densities but that ROH inference was relatively robust to both. We provide a template for testing and optimising imputation in other wild populations.

Keywords: Beagle, conservation genomics, genotype imputation, leave‐one‐out, linkage map, reference panel

1. Introduction

Genotype imputation refers to the inference of genotypes missing in a dataset (Marchini and Howie 2010). Genotype imputation operates on the principle that within a population, genetic variation is limited to a small proportion of the genome (Daetwyler et al. 2014) while genetic linkage results in extensive haplotype segments shared among individuals (Yun et al. 2009). There are typically two parts to imputation. First, the diversity of haplotypes within the population is identified from a reference panel of individuals sequenced at high coverage or genotyped at high SNP density (Marchini and Howie 2010). Second, the corresponding haplotypes of a target (the sample to be imputed) are identified based on its typed genotypes and used to fill in the untyped genotypes, resulting in imputation (Yun et al. 2009). In the absence of a reference panel, the diversity of haplotypes can also be identified from sequencing reads (Davies et al. 2016).

Genotype imputation has various applications and can increase the quality and quantity of genetic datasets (Phocas 2022). Imputation increases genotype density across the genome, which can benefit downstream analyses, such as by improving the power and/or resolution of association studies in identifying causal SNPs for traits of interest (S. R. Browning 2008; Hayward et al. 2019; Naj 2019). Imputation also allows the application of bioinformatic methods that cannot accommodate missingness in the input, such as redundancy analysis (Chambers et al. 2023). Imputation can drastically lower per‐sample sequencing costs, since samples can be sequenced at low coverage and their missing genotypes then filled in by imputation. Lower per‐sample costs allow larger population datasets, comprising more individuals, to be generated for the same sequencing budget. The cost of sequencing many samples at high coverage, despite having decreased over the years, remains prohibitive, especially for species with large genomes (Watowich et al. 2023). Imputation can also facilitate the transition of datasets from reduced‐representation sequencing to whole‐genome coverage or from lower density to higher density SNP array genotypes, allowing sample continuity in long‐term datasets (Jiang et al. 2022).

Imputation is routinely used in large and well‐characterised datasets such as in humans or livestock to leverage its benefits and cost‐effectiveness (Georges et al. 2019; Phocas 2022), and a number of studies have assessed imputation best practices in these systems (e.g., Lloret‐Villas et al. 2023; Pook et al. 2020; Shi et al. 2019). However, researchers studying wild and natural populations are also prominent end users of genomic tools and could likewise benefit from imputation. Genomics is routinely applied to natural populations to advance knowledge in ecology and evolutionary biology (Luikart et al. 2018), such as through understanding speciation dynamics (Nosil and Feder 2012) and characterising population structure (Lou et al. 2021). Genomics also has practical applications in the management of natural populations, such as in understanding the transmission of wildlife diseases and in invasive species where genotype–environment associations can be used to predict invasive potential (Blanchong et al. 2016; McGaughran et al. 2024). In threatened species, genomics is applied to characterise reasons for population decline, delineate management units, determine extinction risk, assess inbreeding depression, minimise inbreeding and loss of genetic diversity, and model conservation outcomes such as population recovery (Bergner et al. 2014, 2016; Dussex et al. 2021; Frankham 2010; Guhlin et al. 2023; Hohenlohe et al. 2021; White et al. 2015). Datasets of wild, natural populations could benefit from the potential cost savings and increase in quality that imputation can provide (Watowich et al. 2023).

Datasets from wild, natural populations typically comprise relatively small numbers of samples, and species differ in their effective population size and hence in their genetic diversity. Since imputation algorithms are commonly optimised for large datasets such as those of livestock and humans, much remains to be learned about their utility in natural populations. Watowich et al. (2023) demonstrated that imputation can be performed to high accuracy in a natural population of the Gelada monkey (Theropithecus gelada). Using 68 individuals in the reference panel, the imputed sequences reliably estimated relatedness and population structure. However, it remains undetermined how accurate imputation is when using default imputation parameters and smaller reference panels, and how useful imputed sequences are in downstream analyses, for example, to calculate basic population statistics and high‐resolution inbreeding estimates (including using imputed genotype probabilities rather than genotype calls). Conservation geneticists must often balance the number of samples and sequencing coverage against available funding when performing sequencing. Hence, we want to test imputation under multiple scenarios to aid decision‐making.

This study aims to understand imputation best practices and effectiveness by focusing on a case study of a small, natural population of an Aotearoa New Zealand endemic, the hihi/stitchbird (Notiomystis cincta). The hihi underwent a drastic population decline in the 1800s and currently exists in a single remnant island population and seven reintroduced populations (Miskelly and Powlesland 2013; Taylor et al. 2005). The hihi population on the island of Tiritiri Matangi (36.6012° S, 174.8894° E) is the largest reintroduced population and is intensively monitored; past studies have revealed low genetic diversity and signatures of inbreeding depression in this population (Brekke et al. 2010, 2011; Duntsch et al. 2023). We used a fast and user‐friendly imputation programme to determine the best parameters and input data structures for achieving high imputation accuracy and assessed whether including a linkage map improves accuracy. We also investigated post‐imputation filtering thresholds by comparing population heterozygosity estimates, site frequency spectra and inbreeding estimates between raw and filtered imputed genotypes. Our analyses therefore provide a template for studies of other wild populations to test and optimise imputation efficacy, and support the use of small reference panels for species with low genetic diversity.

2. Materials and Methods

2.1. Sampling and Sequencing

As described in Stuart et al. (2024), we selected 31 hihi individuals for sequencing. Of the 31 individuals, 27 were from the reintroduced population on Tiritiri Matangi. The remaining four were from the remnant island population of Te Hauturu‐o‐Toi/Little Barrier Island (36.1946° S, 175.0753° E). All individuals were selected based on their high lifetime total number of offspring, except two samples (one female and one male) from Te Hauturu‐o‐Toi, which were selected because our reference genomes were constructed from them (Bailey et al. 2025). DNA was extracted from blood samples of all individuals using DNeasy Blood & Tissue Kits (Qiagen). Whole‐genome sequence (WGS) libraries were prepared by AgResearch, New Zealand, using both the Illumina DNA Prep kit and the Nextera XT kit and later sequenced on an Illumina NovaSeq 6000 platform with read lengths of 2 × 150 bp. These samples were sequenced at high coverage (~20×) to obtain ground truth for our imputation tests and to constitute our reference panel for potential downstream imputation with more samples.

2.2. Raw Read Processing and SNP Calling and Filtering

As detailed in Stuart et al. (2024), we trimmed adaptors from raw reads using TrimGalore v0.6.7 (Krueger et al. 2021), then merged reads across lanes for each individual. Reads were then aligned to our high‐quality female reference genome (Aotearoa Genomic Data Repository, Project Code NZ‐00034 and Dataset ID AGDR00034 https://doi.org/10.57748/ZD00‐D451) using BWA‐MEM (Bailey et al. 2025; Li and Durbin 2009). We used MarkDuplicates in Picard v2.26.10 (Broad Institute 2019) to mark duplicate reads for subsequent removal during SNP calling. We then called SNPs using BCFtools v1.13 mpileup (Li 2011), which identified 4,911,121 SNPs. We excluded one Tiritiri Matangi sample from downstream analyses because it had excess heterozygosity, suggesting potential sample contamination. Our final dataset for downstream analyses consisted of 30 samples, including 26 from Tiritiri Matangi and four from Te Hauturu‐o‐Toi. Sixteen of the samples were related in three close family groups, which contained two, three and 11 individuals, with one, two and eight parent‐offspring pairs, respectively (Stuart et al. 2024).

In preparation for imputation, SNPs were filtered using BCFtools v1.19 (Li 2011) to retain only SNPs on the 33 autosomal chromosomes, spanning 915 Mb, that were present in the available hihi high‐density linkage map (Tan et al. 2024). Excluding autosomes absent from the linkage map ensured that the genetic distances of all SNPs used in downstream imputation could be informed by our available linkage maps. Non‐variant sites, singletons and multi‐allelic SNPs were removed using BCFtools v1.19 (Li 2011). For imputation, we retained only SNPs with a minimum genotype depth of 5 and no missingness, resulting in a dataset of 2,629,352 autosomal SNPs (mean depth = 24.3×).
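The site‐level filters above can be illustrated with a minimal Python sketch. It assumes each site has already been parsed into a chromosome name, a list of ALT alleles, per‐sample genotypes as allele‐index pairs (None if missing) and per‐sample read depths; the function `keep_site` and this record layout are hypothetical, and the actual filtering was done with BCFtools. Treating a singleton as a minor allele observed exactly once is also our assumption.

```python
from typing import Optional, Sequence, Set, Tuple

Genotype = Optional[Tuple[int, int]]  # (allele, allele) indices; None = missing


def keep_site(
    chrom: str,
    alt_alleles: Sequence[str],
    genotypes: Sequence[Genotype],
    depths: Sequence[int],
    mapped_chroms: Set[str],
    min_dp: int = 5,
) -> bool:
    """Hypothetical re-implementation of the site filters described in the text."""
    # Keep only autosomes that are present in the linkage map.
    if chrom not in mapped_chroms:
        return False
    # Drop non-variant records (no ALT) and multi-allelic records (>1 ALT).
    if len(alt_alleles) != 1:
        return False
    # No missing genotypes are allowed.
    if any(g is None for g in genotypes):
        return False
    # Every genotype must be supported by at least `min_dp` reads.
    if any(dp < min_dp for dp in depths):
        return False
    # With a single ALT allele, allele indices are 0/1, so a + b is the ALT count.
    alt_count = sum(a + b for a, b in genotypes)
    ref_count = 2 * len(genotypes) - alt_count
    # Minor allele count 0 = monomorphic in the sample, 1 = singleton.
    return min(alt_count, ref_count) >= 2
```

For example, `keep_site("chr1", ["T"], [(0, 1), (0, 0), (0, 0)], [10, 10, 10], {"chr1"})` rejects the site because the ALT allele appears only once.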

2.3. Imputation Tests on High‐Coverage Individuals

We used Beagle 5.4 to perform imputation; Beagle employs a hidden Markov model and implements a sliding window for fast and memory‐efficient computation (Browning et al. 2018, 2021). In both the reference panel and the target, haplotype phasing is first performed to distinguish between sequences of variants inherited from each parent, using a combination of two‐stage and progressive phasing algorithms (Browning et al. 2021). Using the updated haplotype phase of each reference and target sample, each missing allele in the target is then imputed as the allele with the highest posterior probability.
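To make the haplotype‐copying idea concrete, the sketch below implements a deliberately simplified, haploid Li‐Stephens‐style hidden Markov model with a forward‐backward pass. It is not Beagle's implementation (Beagle adds phasing, haplotype clustering, state selection and sliding windows). The switch probability 1 − exp(−4·Ne·d/K), with d the genetic distance in Morgans and K the number of reference haplotypes, is the standard Li‐Stephens form and is assumed here for illustration; the function names, the error rate and the default `ne` are hypothetical toy choices.

```python
import math
from typing import List, Optional, Sequence


def switch_prob(ne: float, dist_morgans: float, n_haps: int) -> float:
    """Li-Stephens-style probability of jumping to another reference haplotype."""
    return 1.0 - math.exp(-4.0 * ne * dist_morgans / n_haps)


def impute_haplotype(
    ref_haps: Sequence[Sequence[int]],
    target: Sequence[Optional[int]],
    cm_pos: Sequence[float],
    ne: float = 100.0,
    err: float = 1e-4,
) -> List[float]:
    """Posterior probability of the ALT allele at every marker of one target haplotype.

    ref_haps: K reference haplotypes of 0/1 alleles.
    target:   the target haplotype; None marks an untyped (masked) marker.
    cm_pos:   genetic position of each marker in cM.
    """
    k, m = len(ref_haps), len(target)

    def emit(j: int, h: int) -> float:
        if target[j] is None:  # untyped marker: carries no information
            return 1.0
        return 1.0 - err if ref_haps[h][j] == target[j] else err

    def normalise(row: List[float]) -> List[float]:
        total = sum(row)
        return [x / total for x in row]

    # jump[j] is the switch probability between markers j - 1 and j (cM -> Morgans).
    jump = [0.0] + [switch_prob(ne, (cm_pos[j] - cm_pos[j - 1]) / 100.0, k)
                    for j in range(1, m)]

    # Forward pass: stay on the same haplotype, or jump to a uniformly chosen one.
    # Rows are normalised, so the total jump mass arriving at each state is p / k.
    fwd = [normalise([emit(0, h) / k for h in range(k)])]
    for j in range(1, m):
        prev, p = fwd[-1], jump[j]
        fwd.append(normalise([emit(j, h) * ((1 - p) * prev[h] + p / k)
                              for h in range(k)]))

    # Backward pass, rescaled at each marker to avoid underflow.
    bwd = [[1.0] * k for _ in range(m)]
    for j in range(m - 2, -1, -1):
        p = jump[j + 1]
        weighted = [emit(j + 1, h) * bwd[j + 1][h] for h in range(k)]
        total = sum(weighted)
        bwd[j] = normalise([(1 - p) * w + p * total / k for w in weighted])

    # Posterior over reference haplotypes -> probability of the ALT allele.
    alt_prob = []
    for j in range(m):
        post = normalise([fwd[j][h] * bwd[j][h] for h in range(k)])
        alt_prob.append(sum(post[h] for h in range(k) if ref_haps[h][j] == 1))
    return alt_prob
```

With reference haplotypes `[0,0,0,0,0]`, `[1,1,1,1,1]` and `[0,1,0,1,0]`, a target `[1, None, 1, None, 1]` copies the second haplotype and both untyped markers are imputed as ALT; a target `[0, None, 0, None, 0]` matches two haplotypes equally well at the typed markers, so the untyped markers stay near 0.5, which is the kind of uncertainty that dosages carry. Summing the two haplotypes' ALT probabilities gives a diploid dosage.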

We used a leave‐one‐out approach to perform our imputation tests (Ramnarine et al. 2015; Watowich et al. 2023; Figure 1). One sample was selected as the target each time, while the remaining 29 (n − 1) samples were retained in the reference panel. This was repeated across all samples, with a different sample as the target each time (Figure 1). We generated the test datasets using the leave‐one‐out approach before performing phasing and imputation on each test dataset. To create the target files, we masked a portion of randomly distributed SNPs (usually 5%, 131,469 SNPs) using the ‘vcfrandomsample’ tool in vcflib (Garrison et al. 2022). The same set of SNPs was masked in all target samples.
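The leave‐one‐out design can be sketched in Python as follows, assuming genotypes are held in memory as one list of ALT dosages per sample; the function names are hypothetical. One random set of SNP indices is drawn once and masked in every target. Unlike vcfrandomsample, which we understand keeps each record with a given probability (so the masked count varies slightly), this sketch masks an exact number of SNPs.

```python
import random
from typing import Dict, Iterator, List, Optional, Sequence, Set, Tuple


def choose_masked_snps(n_snps: int, frac: float = 0.05, seed: int = 1) -> Set[int]:
    """Pick one random set of SNP indices to mask, shared by every target."""
    rng = random.Random(seed)
    return set(rng.sample(range(n_snps), round(n_snps * frac)))


def leave_one_out(
    samples: Sequence[str],
    genotypes: Dict[str, List[int]],
    masked: Set[int],
) -> Iterator[Tuple[str, List[str], List[Optional[int]]]]:
    """Yield (target, reference panel, masked target genotypes) for each fold."""
    for target in samples:
        # The reference panel is every other sample (n - 1 samples).
        reference = [s for s in samples if s != target]
        # Masked SNPs become None, i.e. untyped in the target.
        target_gt = [None if i in masked else g
                     for i, g in enumerate(genotypes[target])]
        yield target, reference, target_gt
```

With 30 samples this yields 30 folds, each with a 29‐sample reference panel, matching the design in Figure 1.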

FIGURE 1.


A schematic of the leave‐one‐out approach where one sample is selected for imputation tests. The selected sample has a portion of its genotypes masked to form the target (the sample to be imputed). The masked genotypes are then imputed and evaluated against the original genotypes (ground truth) to assess imputation accuracy. Each time, a different sample is selected, and the process is repeated until all samples have been used as the target once.

The leave‐one‐out approach allows us to assess imputation accuracy against a ground truth, which is the original file before genotypes were masked. This further allows us to evaluate accuracy using metrics that directly compare the imputed and true genotypes (Ramnarine et al. 2015). Here, we opted to use r², the squared correlation of dosage values between the imputed and true genotypes (Ramnarine et al. 2015), which we calculated using a Python script available at https://github.com/TorkamaniLab/imputation_accuracy_calculator (Chen et al. 2020). To obtain per‐variant r² values, the imputed genotypes of all samples within the same test were combined using the BCFtools v1.19 merge command (Li 2011) after leave‐one‐out imputation and before calculating r².
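The r² metric itself is simple to compute. The minimal Python sketch below is not the TorkamaniLab script, and returning None when a vector has zero variance (for example, a SNP that is monomorphic in the ground truth) is our assumption about how to handle an undefined correlation. For a per‐variant r², the vectors hold one dosage per sample at a single SNP, which is why imputed genotypes from all leave‐one‐out runs must first be merged; applied to one sample's dosages across all masked SNPs, the same function gives a per‐sample r².

```python
from typing import Optional, Sequence


def dosage_r2(imputed: Sequence[float], truth: Sequence[float]) -> Optional[float]:
    """Squared Pearson correlation between imputed and true ALT dosages (0-2).

    Returns None when either vector has zero variance, where the
    correlation is undefined.
    """
    n = len(truth)
    mean_x = sum(imputed) / n
    mean_y = sum(truth) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(imputed, truth))
    sxx = sum((x - mean_x) ** 2 for x in imputed)
    syy = sum((y - mean_y) ** 2 for y in truth)
    if sxx == 0 or syy == 0:
        return None
    return sxy * sxy / (sxx * syy)
```

For instance, imputed dosages `[0, 1, 1]` against true genotypes `[0, 1, 2]` give r² = 0.75.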

2.3.1. Tests of Imputation Parameters

First, we wanted to determine the best Beagle parameters for imputation of our whole‐genome resequencing data given the relatively small reference panel. The parameters that we tested were effective population size (Ne), window, iterations, imp‐states and imp‐segment (Table 1; B. Browning 2022). These parameters have previously been shown to be important in improving imputation accuracy in small populations (B. Browning 2022; Jiang et al. 2022; Pook et al. 2020). We tested Ne on its own but tested all combinations of the remaining four parameters (window, iterations, imp‐states, imp‐segment). The Ne parameter reflects effective population size and affects the probability of transitioning from one reference haplotype to another (Browning et al. 2018). It can be automatically estimated by Beagle 5.4 if input genotypes are unphased (B. Browning 2022). We tested Ne = default (automatically estimated), 10, 100, 1000, 10,000, 100,000 and 1,000,000 to capture a range of values as informed by past hihi studies (Brekke et al. 2011). The window parameter determines the cM length of sliding windows in the analysis (B. Browning 2022). We tested window = 40 (default), 60 and 80 to allow a longer stretch of the genome to be considered in each window during imputation, to account for high linkage disequilibrium in the hihi (Lee et al. 2022). The iterations parameter sets the number of iterations used to estimate genotype phase (B. Browning 2022). We tested iterations = 12 (default) and 24 for potential gains in overall imputation accuracy (Pook et al. 2020). The imp‐states parameter is the number of model states used to impute ungenotyped markers (B. Browning 2022). We tested imp‐states = 500, 1000 and 1600 (default) to allow fewer model states owing to low genetic diversity in the hihi (Pook et al. 2020). The imp‐segment parameter specifies the minimum cM length of haplotype segments that will be considered for a target haplotype (B. Browning 2022).
We tested imp‐segment = 6 (default), 20 and 50 to account for high linkage disequilibrium in the hihi (Lee et al. 2022). We also tested whether the inclusion of linkage maps improved imputation by informing phasing and haplotype clustering (Table 1). We tested two linkage maps: a low‐density version of the linkage map constructed using Lep‐MAP3 (Rastas 2017), comprising 957 SNPs (Tan et al. 2024), and a linkage map constructed using CRI‐MAP v2.4 (Green et al. 1990), comprising 1663 SNPs (Bailey et al. 2025; Scherer 2017). To construct the low‐density Lep‐MAP3 map, map positions were inferred every 1 Mb using the map position of the nearest SNP on the original high‐density map (33,890 autosomal SNPs). We did not supply the high‐density map directly for Beagle's linear interpolation of genetic distances for our WGS SNPs, because many SNPs on the linkage map share the same genetic position; interpolating between them produces regions with an artefactual recombination rate of zero, which would have reduced imputation accuracy (Figure S1). The two maps were tested using default Beagle settings and with parameters tweaked to avoid increased error rates (Pook et al. 2020). We used increased iterations (= 24) and window size (= 60), as these have been shown to improve imputation accuracy (see results for parameter tests; Table 1; Pook et al. 2020). As the genome‐average recombination rate in our hihi linkage map (1.99 cM/Mb; Tan et al. 2024) is roughly double Beagle's default assumption of 1 cM/Mb (B. Browning 2022), haplotype segments will be longer (in cM) when our hihi linkage map is applied. Therefore, we also increased imp‐segment (= 20) to prevent smaller haplotype segments from being jointly considered (Table 1; Pook et al. 2020).
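The construction of the low‐density map can be sketched in Python, assuming the high‐density map for one chromosome is available as sorted (bp, cM) pairs. The sketch places a point every 1 Mb, takes the cM position of the nearest high‐density marker, and formats the result as PLINK‐style map lines (chromosome, identifier, cM, bp), the format Beagle's `map=` option reads. A helper also counts adjacent intervals with no change in cM, i.e. intervals that linear interpolation would treat as having zero recombination. Function names, the start position of the first point and the "." identifier are hypothetical choices.

```python
import bisect
from typing import List, Sequence, Tuple


def thin_map(markers: Sequence[Tuple[int, float]],
             step_bp: int = 1_000_000) -> List[Tuple[int, float]]:
    """One map point every `step_bp` bases, using the cM of the nearest marker.

    markers: (bp, cM) pairs for one chromosome, sorted by bp.
    """
    bps = [bp for bp, _ in markers]
    thinned = []
    for pos in range(step_bp, bps[-1] + 1, step_bp):
        i = bisect.bisect_left(bps, pos)
        # Candidate neighbours on either side of `pos`; keep the closer one.
        candidates = [c for c in (i - 1, i) if 0 <= c < len(bps)]
        nearest = min(candidates, key=lambda c: abs(bps[c] - pos))
        thinned.append((pos, markers[nearest][1]))
    return thinned


def plink_map_lines(chrom: str, points: Sequence[Tuple[int, float]]) -> List[str]:
    """PLINK-style map lines: chromosome, identifier, cM, bp."""
    return [f"{chrom}\t.\t{cm}\t{bp}" for bp, cm in points]


def zero_rate_fraction(markers: Sequence[Tuple[int, float]]) -> float:
    """Fraction of adjacent-marker intervals with no cM change."""
    gaps = [b[1] - a[1] for a, b in zip(markers, markers[1:])]
    return sum(g == 0 for g in gaps) / len(gaps)
```

Applied to the original high‐density map, `zero_rate_fraction` gives a quick check of how much of a chromosome interpolation would treat as non‐recombining.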

TABLE 1.

Summary of all imputation tests performed to understand the best parameters and input data structure for imputation with a small reference panel using Beagle 5.4.

| Test type | Description | Tests | Number of reference individuals | Number of reference SNPs | Number of target SNPs |
|---|---|---|---|---|---|
| Parameter test | Ne: effective population size | Automatically estimated (default), 10, 100, 1,000, 10,000, 100,000, 1,000,000 | 29 | 2,629,352 | 2,497,883 |
| | Iterations: number of iterations used to estimate genotype phase | 12 (default), 24 | | | |
| | Window: cM length of sliding window | 40 (default), 60, 80 | | | |
| | imp‐states: number of model states used to impute ungenotyped markers | 500, 1000, 1600 (default) | | | |
| | imp‐segment: minimum cM length of haplotype segments that will be considered for a target haplotype | 6 (default), 20, 50 | | | |
| | Recombination map: genetic position of markers | Default parameters with Lep‐MAP3 low‐density map | | | |
| | | With Lep‐MAP3 low‐density map and iterations = 24, window = 60, imp‐segment = 20 | | | |
| | | Default parameters with CRI‐MAP map | | | |
| | | With CRI‐MAP map and iterations = 24, window = 60, imp‐segment = 20 | | | |
| Input data structure test | Reference panel composition and size | With distant samples | 29 | 2,629,352 | 2,497,883 |
| | | Without distant samples | 24 | | |
| | | Downsample randomly (10 repeats) | 25, 20, 15, 10, 5 | | |
| | Family group | Family 1, family 2, family 3, not in family group | 29 | 2,629,352 | 2,497,883 |
| | SNP density (SNPs/Mb) | 2728 | 29 | 2,629,352 | 2,497,883 |
| | | 2101 | | | 1,923,957 |
| | | 1577 | | | 1,443,558 |
| | | 1088 | | | 996,326 |
| | | 197 | | | 180,407 |
| | | 49 (SNP‐chip) | | | 44,727 |
| | | 20 (RADseq) | | | 17,978 |
| | SNP sporadic missingness | 40% | 30 | 2,629,352 | 69,637 |
| | | 50% | | | 788,629 |

For tests of imputation parameters, we ran imputation once for each test dataset (i.e., a unique sample as the target each time) per unique combination of parameters tested. In all tests of imputation parameters, we masked 5% of SNPs (131,469 SNPs) to create a target with a SNP density of 2728 SNPs/Mb (2,497,883 SNPs across 915 Mb) for imputation.
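The bookkeeping for the parameter tests can be sketched in Python. The sketch enumerates all combinations of the four jointly tested parameters and builds Beagle command lines; `ref=`, `gt=`, `out=`, `ne=`, `window=`, `iterations=`, `imp-states=` and `imp-segment=` are Beagle 5.4 command‐line parameters, while the jar name, file names and helper functions are placeholders. Passing None for Ne leaves `ne=` unset so that Beagle applies its default behaviour.

```python
from itertools import product
from typing import Dict, List, Optional

# Values tested in the text (Table 1); Beagle defaults are window=40,
# iterations=12, imp-states=1600 and imp-segment=6.
GRID = {
    "window": [40, 60, 80],
    "iterations": [12, 24],
    "imp-states": [500, 1000, 1600],
    "imp-segment": [6, 20, 50],
}
NE_VALUES: List[Optional[int]] = [None, 10, 100, 1000, 10_000, 100_000, 1_000_000]


def parameter_grid(grid: Dict[str, List[int]] = GRID) -> List[Dict[str, int]]:
    """All combinations of the four jointly tested parameters."""
    keys = list(grid)
    return [dict(zip(keys, values)) for values in product(*(grid[k] for k in keys))]


def ne_params(ne: Optional[int]) -> Dict[str, int]:
    """None means 'leave ne unset' so Beagle uses its default behaviour."""
    return {} if ne is None else {"ne": ne}


def beagle_command(ref: str, gt: str, out: str, params: Dict[str, int],
                   jar: str = "beagle.jar") -> List[str]:
    """Build one Beagle command line; the jar and file names are placeholders."""
    args = ["java", "-jar", jar, f"ref={ref}", f"gt={gt}", f"out={out}"]
    args += [f"{key}={value}" for key, value in params.items()]
    return args
```

The grid has 3 × 2 × 3 × 3 = 54 combinations, so with one run per leave‐one‐out target the joint parameter test amounts to 54 × 30 = 1,620 Beagle runs, plus 7 × 30 = 210 runs for the Ne values.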

2.3.2. Tests of Input Data Structure

Second, we wanted to determine the optimal input data structure. For the reference panel, an optimal composition would capture the genetic diversity within the population without introducing haplotypes from external populations, which can reduce imputation accuracy (Deng et al. 2022; Mitt et al. 2017; Pook et al. 2020). We tested the exclusion of samples with the highest average genetic distance from all other samples, which we calculated using plink2 (‐‐make‐king‐table) (Chang et al. 2015; Manichaikul et al. 2010; Purcell and Chang 2021). Using this approach, we excluded five samples: all four samples from the remnant population of Te Hauturu‐o‐Toi and one sample from Tiritiri Matangi. To disentangle the effect of the removal of distant samples on imputation accuracy from the effect of a reduced reference panel size, we also tested imputation using smaller reference panels of 25, 20, 15, 10 and 5 individuals (Table 1). Individuals were randomly sampled from all 30 high‐coverage samples, with 10 repeats per reference panel size. As the outcomes of these tests are population specific, we only used individuals from Tiritiri Matangi as the target in these tests. We also compared the imputation accuracy between the 16 samples within the three family groups observed in our reference panel and the 10 Tiritiri Matangi samples not in these family groups to determine the effect of familial relationships on accuracy. For tests of reference panel composition and family group, we ran imputation once for each test dataset (i.e., a unique sample as the target each time). For tests of reference panel size, we ran imputation 10 times for each test dataset due to the 10 sampling iterations per reference panel size.
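Two steps of the reference‐panel tests can be sketched in Python: ranking samples by how distant they are from all others, and drawing repeated random panels of a given size. We assume pairwise kinship coefficients (as reported by plink2's KING table) are held in a dictionary keyed by sample pairs, and treat "most distant" as "lowest mean kinship"; that equivalence, the function names, and excluding the current target from each random panel are our assumptions.

```python
import random
from statistics import mean
from typing import Dict, FrozenSet, List, Sequence


def most_distant(kinship: Dict[FrozenSet[str], float], n_exclude: int = 5) -> List[str]:
    """Samples with the lowest mean pairwise kinship, i.e. the most distant."""
    samples = sorted({s for pair in kinship for s in pair})
    avg = {s: mean(v for pair, v in kinship.items() if s in pair) for s in samples}
    return sorted(samples, key=lambda s: avg[s])[:n_exclude]


def random_panels(samples: Sequence[str], target: str, size: int,
                  n_repeats: int = 10, seed: int = 42) -> List[List[str]]:
    """Draw `n_repeats` random reference panels of `size` samples, excluding the target."""
    rng = random.Random(seed)
    pool = [s for s in samples if s != target]
    return [rng.sample(pool, size) for _ in range(n_repeats)]
```

Comparing accuracy with the five most distant samples removed against random panels of the same size separates the effect of composition from the effect of panel size.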

We also wanted to validate the effectiveness of imputation from lower SNP densities, as would arise when fewer SNPs are genotyped. In addition to our tests above with a SNP density of 2728 SNPs/Mb (masking of 5% of SNPs), we also down‐sampled our dataset to densities of 2101, 1577, 1088, 197, 49 and 20 SNPs/Mb (the masking of 27%, 45%, 62%, 93%, 98% and 99% of SNPs, respectively; total length of chromosomes tested is 915 Mb). The number of target SNPs in our tests ranges from 17,978 to 2,497,883 (see Table 1 for the number of target SNPs per SNP density tested). The first four additional densities correspond to the 75th percentile, median, 25th percentile and minimum individual SNP density from our trials of hihi low‐coverage whole‐genome resequencing. The last two values refer to densities from our hihi SNP‐chip and RADseq datasets (de Villemereuil et al. 2019; Duntsch et al. 2021; Lee et al. 2022). SNPs in datasets of lower SNP density are subsets of those in datasets with higher SNP density. For tests of SNP density, we ran imputation once for each test dataset (i.e., a unique sample as the target each time) per SNP density tested. Lastly, we wanted to validate the effectiveness of imputation from targets with higher sporadic missingness. Sporadic missingness occurs when genotypes are missing in some (but not all) samples at positions across the genome. We investigated sporadic missingness because such missing genotypes are imputed during haplotype phasing, whereas ungenotyped markers (missing across all samples) are imputed after haplotype phasing. Sporadic missingness is prominent in low‐coverage whole‐genome resequencing datasets, and filters on SNP missingness in our dataset substantially affect the number of SNPs retained for downstream analyses. We wanted to understand whether a target with lower sporadic missingness but fewer SNPs or a target allowing higher sporadic missingness and with more SNPs would result in higher imputation accuracy.
The leave‐one‐out approach was not used for this test, because sporadic missingness cannot be tested with only one sample in the target: any missing genotype would be classified as an ungenotyped marker, since it is missing across ‘all’ samples. Instead, we used all 30 high‐coverage samples as both the target and the reference, appending ‘dup’ to sample names in the target so that Beagle would not report an error for samples duplicated between the target and reference. In this design, the imputation accuracies achieved are upper bounds, since the correct genotypes are present in the reference. Sporadic missingness was simulated in the target file using custom R code. We compared imputation between a dataset at lower sporadic missingness but with fewer SNPs (69,638 SNPs, ≤ 40% missingness per SNP) and another at higher sporadic missingness but with more SNPs (788,629 SNPs, ≤ 50% missingness per SNP; Table 1). These test values were chosen based on missingness in our trials of hihi low‐coverage whole‐genome resequencing. For tests of sporadic missingness, we ran imputation once for each test dataset (40% and 50% sporadic missingness).
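The target datasets for the density and missingness tests can be sketched in Python (the original simulation used custom R code). Nested densities come from prefixes of a single random ordering of SNPs, so every lower‐density set is a subset of every higher‐density one. For sporadic missingness, the sketch draws each SNP's missing fraction uniformly from 0 up to the cap and blanks that many randomly chosen samples; the uniform distribution, the seeds and the function names are hypothetical choices. The renaming helper mirrors the ‘dup’ suffix described above.

```python
import random
from typing import Dict, Iterable, List, Optional, Sequence


def nested_subsets(snp_ids: Iterable[int], keep_counts: Sequence[int],
                   seed: int = 7) -> Dict[int, List[int]]:
    """Retain n SNPs for each n in keep_counts; smaller sets nest inside larger ones."""
    order = list(snp_ids)
    random.Random(seed).shuffle(order)
    return {n: sorted(order[:n]) for n in keep_counts}


def add_sporadic_missingness(rows: List[List[int]], max_missing: float,
                             seed: int = 11) -> List[List[Optional[int]]]:
    """Blank a random subset of samples at each SNP (one row per SNP).

    The per-SNP missing fraction is drawn uniformly from [0, max_missing].
    """
    rng = random.Random(seed)
    out = []
    for row in rows:
        n_missing = int(rng.uniform(0, max_missing) * len(row))
        missing = set(rng.sample(range(len(row)), n_missing))
        out.append([None if i in missing else g for i, g in enumerate(row)])
    return out


def rename_for_target(samples: Sequence[str]) -> List[str]:
    """Append 'dup' so target names do not clash with the same reference samples."""
    return [f"{s}dup" for s in samples]
```

Because the retained SNP sets are nested, a drop in accuracy between two densities reflects the SNPs removed rather than a different random draw.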

All tests performed are summarised in Table 1.

2.4. Sensitivity Analyses of Basic Population Statistics and Detection of Runs of Homozygosity

Beyond looking at imputation accuracy in terms of r², the squared correlation of dosage values between the imputed and true genotypes (Ramnarine et al. 2015), we also wanted to understand the utility of imputed sequences in downstream analyses. We compared the ground truth dataset with the imputed files covering the range of target SNP densities described above. As sensitivity analyses, we also applied commonly used imputation accuracy thresholds (r² ≥ 0.6 and r² ≥ 0.8) to these imputed files to see how filtering affects downstream analyses.
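The threshold filter can be sketched in Python as set operations on SNP identifiers. Two assumptions are labelled here: typed (unmasked) SNPs are kept unconditionally, and imputed SNPs with an undefined r² (monomorphic in the ground truth) are dropped. The function name is hypothetical. In datasets without a ground truth, Beagle's estimated dosage r² (the DR2 INFO field) would play the equivalent role.

```python
from typing import Dict, Iterable, Optional, Set


def filtered_snp_set(typed: Iterable[str], imputed_r2: Dict[str, Optional[float]],
                     threshold: float) -> Set[str]:
    """Keep typed SNPs as-is and imputed SNPs whose r2 is defined and >= threshold."""
    passing = {snp for snp, r2 in imputed_r2.items()
               if r2 is not None and r2 >= threshold}
    return set(typed) | passing
```

Any SNP retained at r² ≥ 0.8 is also retained at r² ≥ 0.6, so the stricter dataset is always a subset of the more lenient one.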

We calculated observed individual heterozygosity and the site frequency spectrum (SFS), and detected runs of homozygosity, from each of the SNP datasets. Observed individual heterozygosity refers to the proportion of heterozygotes among all genotypes in a sample. We calculated observed heterozygosity using VCFtools v0.1.15 ‐‐het. We derived the number of observed heterozygous sites from the output (N_SITES − O(HOM)) and divided it by the total number of SNPs in each dataset to obtain a value for individual heterozygosity. Allele frequencies were calculated using VCFtools v0.1.15 ‐‐freq (Danecek et al. 2011). We visualised the frequency of the least common allele in a folded SFS, in which ancestral and derived alleles are unknown and not distinguished. The folded SFS was visualised using ggplot2 (Wickham 2016).
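Both statistics reduce to simple counting. The Python sketch below reproduces the heterozygosity arithmetic from the VCFtools ‐‐het columns and builds a folded SFS from ALT dosages (0, 1 or 2) with no missing data; the function names are hypothetical and the SFS is indexed by minor allele count rather than frequency.

```python
from typing import List, Sequence


def het_from_vcftools(o_hom: int, n_sites: int, total_snps: int) -> float:
    """Observed heterozygosity: (N_SITES - O(HOM)) heterozygous sites / total SNPs."""
    return (n_sites - o_hom) / total_snps


def folded_sfs(rows: Sequence[Sequence[int]]) -> List[int]:
    """Folded SFS from per-SNP ALT dosages, one row per SNP, no missing data.

    Index i counts SNPs whose minor allele is seen i times, from 0
    (monomorphic in the sample) to n, half of the 2n allele copies.
    """
    n_samples = len(rows[0])
    n_alleles = 2 * n_samples
    sfs = [0] * (n_samples + 1)
    for row in rows:
        alt = sum(row)
        sfs[min(alt, n_alleles - alt)] += 1
    return sfs
```

With 30 diploid individuals, minor allele frequencies are i/60 for i = 0 to 30, so consecutive values differ by about 0.0167; this is why no SNP can fall in the 0.02 to 0.03 MAF bin in Figure 2b.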

The level of inbreeding in our datasets was inferred from runs of homozygosity identified by RZooRoH (Bertrand et al. 2019). We used a 13‐class model (K = 13) with the rates 10, 20, 30, 40, 50, 100, 200, 500, 600, 700, 1000, 2000 and 2000, which represent different categories of homozygous‐by‐descent (HBD) segment lengths, with the final class representing the non‐HBD class. The 13‐class model was used because it has been shown to be the best‐fit model for the hihi and can capture the smallest segments detectable in the hihi dataset (Duntsch et al. 2021). In contrast to our earlier hihi ROH analyses (Duntsch et al. 2021, 2023), genotype probabilities generated by Beagle were used as input for RZooRoH in place of genotype calls, to capture genotype uncertainty arising from variation in coverage in whole‐genome resequencing datasets (Lou et al. 2021). Realised autozygosity per HBD class per individual was visualised using ggplot2 (Wickham 2016), and the distribution of HBD segments per individual was summarised. RZooRoH was also run on target datasets (pre‐imputation) across the range of SNP densities to determine whether imputation was effective in overcoming biases in the inference of runs of homozygosity from low‐density datasets (i.e., both failure to detect short runs and incorrect merging of shorter runs into longer runs; Duntsch et al. 2021; Lavanchy and Goudet 2023).
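To give a feel for what the class rates mean, the Python sketch below converts each rate into an expected HBD segment length. It assumes, as in descriptions of the ZooRoH model, that segment lengths in a class with rate R are exponentially distributed with a mean of 1/R Morgans, and that R is roughly twice the number of generations to the common ancestor; both are stated as assumptions rather than taken from the text. Converting to megabases needs a recombination rate, for which the 1.99 cM/Mb genome average reported above could serve as an illustrative value. The function names are hypothetical.

```python
from typing import List

# Class rates used in the text; the last entry is the non-HBD class.
RATES: List[int] = [10, 20, 30, 40, 50, 100, 200, 500, 600, 700, 1000, 2000, 2000]


def expected_length_cm(rate: float) -> float:
    """Mean HBD segment length in cM, assuming lengths ~ Exponential(rate) per Morgan."""
    return 100.0 / rate


def expected_length_mb(rate: float, cm_per_mb: float) -> float:
    """Expected length in Mb under an assumed genome-wide recombination rate."""
    return expected_length_cm(rate) / cm_per_mb


def approx_generations(rate: float) -> float:
    """Rough age of the common ancestor, assuming rate ~ 2 x generations."""
    return rate / 2.0
```

Under these assumptions, the rate‐10 class corresponds to segments averaging 10 cM from ancestors about five generations back, while the rate‐2000 class corresponds to segments averaging 0.05 cM, which is why detecting the shortest classes depends on SNP density.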

3. Results

3.1. Imputation Tests on High‐Coverage Individuals

By comparing imputed to ground truth genotypes, we find that, overall, imputation using Beagle v5.4 on our high‐coverage whole‐genome resequencing dataset produced high accuracies, with mean r² exceeding 0.95 in most tests (Figures 2, 3, 4). Rare and low‐frequency SNPs (MAF < 0.05) showed greater variation in r² values and had lower accuracies than SNPs at higher frequencies (Figure 2). Imputation of the 26 Tiritiri Matangi individuals consistently achieved higher accuracies than imputation of the four Te Hauturu‐o‐Toi individuals (Figures 3 and 4c).

FIGURE 2.


Summary of imputation accuracy (r²) for (a) all SNPs and (b) low‐frequency SNPs (MAF 0–0.05) in an imputed dataset with 30 high‐coverage individuals and a target SNP density of 1577 SNPs/Mb (1,443,558 target SNPs; the remaining 1,185,794 SNPs, 45%, were imputed). The distribution of r² values is stratified by minor allele frequencies (MAF) calculated from the ground truth dataset. In (a), a histogram showing the number of SNPs per r² bin is shown to the right. In (b), the range 0.02–0.03 is skipped because, with 30 diploid individuals (60 allele copies), allele frequencies occur in steps of 1/60 (≈ 0.0167), so no SNP can have a MAF in that range.

FIGURE 3.


Imputation accuracy, measured by genotype correlation r², of tests of imputation parameters including (a) effective population size Ne values from 10 to 1,000,000, and (b) the use of linkage maps constructed using Lep‐MAP3 or CRI‐MAP, with the default parameters window (w) = 40, iterations (i) = 12 and imp‐segment (sg) = 6, or with parameters tweaked to w = 60, i = 24 and sg = 20. All tests were conducted at a target SNP density of 2728 SNPs/Mb (2,497,883 SNPs). Points represent the results for each sample and different colours represent different populations. The grey boxplots represent the distribution of r² values across samples, and the numbers above represent the mean r² value across individuals. In both tests, all 30 samples were used once as the target, with the other 29 samples (30 − 1) comprising the reference panel following the leave‐one‐out approach. Unless otherwise mentioned, imputation was performed using default parameters.

FIGURE 4.


Imputation accuracy, measured by genotype correlation r², of tests of input data structure, including (a) reference panel composition with and without the five most distant samples, (b) whether a sample is part of any family group and (c) a range of target SNP densities (SNPs/Mb). Tests in panels (a) and (b) were conducted at a target SNP density of 2728 SNPs/Mb (2,497,883 SNPs). Points represent the results for each sample and different colours represent different populations. The grey boxplots represent the distribution of r² values across samples, and the numbers above represent the mean r² value across individuals. In (a) and (b), only samples from the main study population Tiritiri Matangi (n = 26) were tested as the target. In (c), all 30 samples were used once as the target, with the other 29 samples (30 − 1) comprising the reference panel. For (a), the number of individuals in the reference panel is stated in the x‐axis tick labels. For (b), there were 29 samples (30 − 1) in the reference panel. In panel (c), the range of SNP densities results from the masking of between 5% and 99% of SNPs. Imputation was performed using the default parameters window = 40, iterations = 12 and imp‐segment = 6.

3.1.1. Imputation Parameters

Effective population size (Ne) settings of default (automatically estimated) and 100,000 produced the highest imputation accuracies (Figure 3a). Automatic estimates of Ne were typically between 10,000 and 20,000. When the other four parameters were tested in all combinations, imputation accuracy was very similar across combinations and to that achieved with default settings, with only 0.002 separating the highest and lowest mean r² values (Figure S2). When accuracy results were averaged across all tests with one parameter setting fixed, no parameter other than Ne was particularly crucial for imputation accuracy (Figure 3a; Figure S3). The parameter combination that resulted in the highest overall imputation accuracy (mean r² = 0.97) used the default settings for Ne (automatically estimated), imp‐states (= 1600) and imp‐segment (= 6), with non‐default settings for window (= 60) and iterations (= 24). Whether including a linkage map affected imputation accuracy depended on the map used. Using the map made with CRI‐MAP produced accuracies very similar to those achieved with Beagle default settings and no map, while using the Lep‐MAP3 map decreased accuracy (Figure 3b). This is despite the overall recombination rate of the Lep‐MAP3 map (1.99 cM/Mb, vs. 2.56 cM/Mb for CRI‐MAP) being closer to the constant recombination rate of 1 cM per Mb assumed by Beagle (Scherer 2017; Tan et al. 2024). Using more iterations, a larger sliding window size and a higher minimum haplotype segment length improved accuracy marginally for both maps (Figure 3b).
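The averaging used to judge each parameter's importance, taking the mean r² over all runs that share one parameter setting, can be sketched in Python. The function names are hypothetical, and the values in the example below are invented toy numbers, not results from this study.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple

Result = Tuple[Dict[str, int], float]  # (parameter combination, mean r2 across targets)


def marginal_means(results: List[Result]) -> Dict[Tuple[str, int], float]:
    """Average mean r2 over all runs sharing one parameter setting."""
    buckets: Dict[Tuple[str, int], List[float]] = defaultdict(list)
    for params, r2 in results:
        for key, value in params.items():
            buckets[(key, value)].append(r2)
    return {setting: mean(values) for setting, values in buckets.items()}


def spread(results: List[Result]) -> float:
    """Difference between the best and worst mean r2 across combinations."""
    scores = [r2 for _, r2 in results]
    return max(scores) - min(scores)
```

For toy results `[({"window": 40, "iterations": 12}, 0.96), ({"window": 40, "iterations": 24}, 0.97), ({"window": 60, "iterations": 12}, 0.965), ({"window": 60, "iterations": 24}, 0.97)]`, the marginal mean for window = 40 is 0.965 and the spread is 0.01. A small spread, as reported above, indicates that no single setting matters much.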

3.1.2. Input Data Structure

For reference panel composition, a smaller reference panel led to only small reductions in mean accuracy, even when the panel was halved in size, but reducing it further caused more pronounced reductions in accuracy and greater variation among samples (Figure 4a). Accuracy after removing the most distant samples from the reference panel was similar to that obtained with a similar‐sized panel of randomly selected individuals (Figure 4a), suggesting that the most distant samples were still informative for imputation. Individuals not belonging to any of the three family groups had lower mean accuracy, although their values fell within the variation seen in one of the family groups (family 2) (Figure 4b). Imputation accuracy remained high (r² ≥ 0.95) across target SNP densities down to 1088 SNPs/Mb (62% of SNPs masked), decreased slightly at 197 SNPs/Mb (93% masked) and decreased sharply at SNP densities ≤ 49 SNPs/Mb (≥ 98% masked), which correspond to the simulated reduced‐representation SNP‐chip and RADseq datasets (Figure 4c). For sporadic missingness, a target with more SNPs (788,326) but higher sporadic missingness (≤ 50%) achieved a much higher r² (0.933) than a target with fewer SNPs (69,638) and lower sporadic missingness (≤ 40%; r² = 0.738).

3.1.3. Basic Population Statistics and Sensitivity Analyses Using Imputed Sequences

Across all downstream analyses, results from target SNP densities of 2728 SNPs/Mb (5% masked) down to 1088 SNPs/Mb (62% masked) remained relatively similar to the ground truth for heterozygosity, site frequency spectra and runs of homozygosity. Greater deviations from the ground truth were observed at lower SNP densities (≤ 197 SNPs/Mb; ≥ 93% masked). The effect of filtering for imputation quality depended on the analysis performed.

3.1.3.1. Site Frequency Spectrum

Raw imputed files show SFS profiles similar to the ground truth down to a target SNP density of 1088 SNPs/Mb. Below a SNP density of 197 SNPs/Mb, the proportion of SNPs imputed with an allele frequency of 0 (i.e., imputed as fixed) increases sharply, producing a disproportionately high proportion of low‐frequency SNPs. As expected, the more stringent filtering threshold (r² ≥ 0.8) discards more SNPs than the less stringent threshold (r² ≥ 0.6), although the total numbers of remaining SNPs were similar between the two thresholds, particularly at moderate SNP densities (Figure 5; Table 2). The high proportion of SNPs shared between datasets filtered at the two thresholds likely explains the similar results across all analyses at higher target SNP densities. Both r² thresholds predominantly remove low‐frequency SNPs, as seen from their reduced counts relative to the raw imputed SNPs (Figure 5). At both thresholds, the number of imputed SNPs retained increases as target SNP density decreases (i.e., as more SNPs are imputed) down to a target SNP density of 197 SNPs/Mb and decreases below that, suggesting that very low SNP densities result in lower r² values overall (Table 2; Figure 4c). Both filtering thresholds produced datasets with very similar SFS profiles.
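
The folded SFS, and the way an r² threshold reshapes it, can be illustrated with a minimal Python sketch. It assumes biallelic SNPs with genotypes already called as alternate‐allele counts (0, 1 or 2) per individual and a per‐SNP r² value for imputed sites; the function names, input layout and toy values are hypothetical and are not taken from the authors' pipeline. The toy values are chosen so that, as observed in this study, the low‐r² SNPs are the rarest ones.

```python
from collections import Counter
from typing import Optional, Sequence


def minor_allele_count(genotypes: Sequence[int]) -> int:
    """Folded (minor) allele count for one biallelic SNP.

    genotypes: alternate-allele counts (0, 1 or 2) for each diploid individual.
    """
    alt = sum(genotypes)
    total = 2 * len(genotypes)
    return min(alt, total - alt)


def folded_sfs(
    genotype_matrix: Sequence[Sequence[int]],
    r2: Optional[Sequence[float]] = None,
    min_r2: float = 0.0,
) -> Counter:
    """Count SNPs in each folded minor-allele-count class.

    genotype_matrix: one row per SNP, one column per individual.
    r2: optional per-SNP imputation r^2; SNPs with r2 < min_r2 are dropped.
    Class 0 holds SNPs imputed as fixed (no copies of the minor allele).
    """
    sfs = Counter()
    for i, row in enumerate(genotype_matrix):
        if r2 is not None and r2[i] < min_r2:
            continue  # discard SNPs below the accuracy threshold
        sfs[minor_allele_count(row)] += 1
    return sfs


# Hypothetical toy data: 4 SNPs x 3 individuals.
genos = [
    [0, 0, 0],  # imputed as fixed -> class 0
    [0, 1, 0],  # singleton -> class 1
    [1, 1, 2],  # 4 alt copies of 6 -> minor count 2
    [0, 0, 1],  # singleton -> class 1
]
r2_values = [0.55, 0.62, 0.95, 0.81]

print(dict(folded_sfs(genos)))                  # {0: 1, 1: 2, 2: 1}
print(dict(folded_sfs(genos, r2_values, 0.8)))  # {2: 1, 1: 1}
```

In this toy case the r² ≥ 0.8 filter removes the fixed class and one singleton, the same direction of shift described for Figure 5.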

FIGURE 5.

Site frequency spectra of the ground truth (green graphs in the first column) and of imputed SNPs with no accuracy filter (raw imputed; blue graphs, first row) and with two levels of accuracy filter: r² ≥ 0.6 (blue graphs, second row) and r² ≥ 0.8 (blue graphs, last row). The ground truth data are duplicated in the first column of each row for ease of reference. Raw imputed files show profiles similar to the ground truth except at very low target SNP density (≤ 49 SNPs/Mb), where the proportion of low‐frequency alleles increases. Both r² accuracy filters (0.6 and 0.8) performed similarly and predominantly removed low‐frequency alleles. Black areas on the histograms indicate alleles with a folded allele frequency of 0, i.e., alleles imputed as fixed despite being variable in the ground truth. Note the different y‐axis scales for the raw imputed values (top row), which accommodate the large number of fixed alleles in the low SNP density datasets.

TABLE 2.

Number of SNPs in the target files, raw imputed files and filtered imputed files, for targets of different SNP densities (SNPs/Mb) and for imputed files filtered at two thresholds (r² ≥ 0.6 or 0.8). Final SNPs are the target SNPs plus the imputed SNPs retained after filtering; the proportion imputed is the percentage of final SNPs that were imputed.

Target SNP density (SNPs/Mb) | Target SNP number | Imputed SNPs | Imputed SNPs (r² ≥ 0.6) | Final SNPs (r² ≥ 0.6) | Proportion imputed in final SNPs, % (r² ≥ 0.6) | Imputed SNPs (r² ≥ 0.8) | Final SNPs (r² ≥ 0.8) | Proportion imputed in final SNPs, % (r² ≥ 0.8) | Difference in SNPs between r² ≥ 0.6 and r² ≥ 0.8
2872 (ground truth) | 2,629,352 | 0 | NA | NA | NA | NA | NA | NA | NA
2728 | 2,497,883 | 131,469 | 114,358 | 2,612,241 | 4.38 | 110,921 | 2,608,804 | 4.25 | 3,437
2101 | 1,923,957 | 705,395 | 613,289 | 2,537,246 | 24.17 | 594,932 | 2,518,889 | 23.62 | 18,357
1577 | 1,443,558 | 1,185,794 | 1,029,065 | 2,472,623 | 41.62 | 999,035 | 2,442,593 | 40.90 | 30,030
1088 | 996,326 | 1,633,026 | 1,414,027 | 2,410,353 | 58.66 | 1,372,862 | 2,369,188 | 57.95 | 41,165
197 | 180,407 | 2,448,945 | 2,051,889 | 2,232,296 | 91.92 | 1,948,362 | 2,128,769 | 91.53 | 103,527
49 (SNP‐chip) | 44,727 | 2,584,625 | 1,745,262 | 1,789,989 | 97.50 | 1,489,990 | 1,534,717 | 97.09 | 255,272
20 (RADseq) | 17,978 | 2,611,374 | 1,238,970 | 1,256,948 | 98.57 | 964,890 | 982,868 | 98.17 | 274,080
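
The derived columns of Table 2 follow from simple arithmetic on the counts: final SNPs are the target SNPs plus the retained imputed SNPs, and the proportion imputed is the retained imputed SNPs as a percentage of final SNPs. The short Python sketch below recomputes them for the 2728 SNPs/Mb row as a consistency check; the function and key names are illustrative only.

```python
def table2_row(target_snps: int, kept_r2_06: int, kept_r2_08: int) -> dict:
    """Recompute the derived Table 2 columns from raw counts (illustrative helper)."""
    final_06 = target_snps + kept_r2_06  # target SNPs + imputed SNPs with r^2 >= 0.6
    final_08 = target_snps + kept_r2_08  # target SNPs + imputed SNPs with r^2 >= 0.8
    return {
        "final_r2_0.6": final_06,
        "pct_imputed_r2_0.6": round(100 * kept_r2_06 / final_06, 2),
        "final_r2_0.8": final_08,
        "pct_imputed_r2_0.8": round(100 * kept_r2_08 / final_08, 2),
        "difference_0.6_vs_0.8": kept_r2_06 - kept_r2_08,
    }


# Counts for the 2728 SNPs/Mb target, taken from Table 2.
row = table2_row(target_snps=2_497_883, kept_r2_06=114_358, kept_r2_08=110_921)
print(row)
```

The same helper reproduces the other rows, for example 91.92% and 91.53% imputed for the 197 SNPs/Mb target.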
3.1.3.2. Individual Heterozygosity

Individual heterozygosity calculated from our ground truth had a median of 0.321. All raw imputed files had lower heterozygosity than the ground truth, and heterozygosity generally decreased with decreasing target SNP density (i.e., as more SNPs were imputed). However, the deviation was minimal down to a SNP density of 1088 SNPs/Mb and increased drastically at the lowest SNP densities of 49 and 20 SNPs/Mb (Figure 6). When SNPs were filtered for imputation accuracy, heterozygosity values were inflated in proportion to the stringency of the filter applied. At the lowest SNP densities of 49 and 20 SNPs/Mb, filtering reduced the extent of deviation from the ground truth.
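
Why filtering inflates heterozygosity can be illustrated with a small Python sketch. It computes observed individual heterozygosity as the proportion of heterozygous calls among an individual's non‐missing genotypes; this is a common definition but an assumption here, since the exact estimator is not restated in this section. The genotypes and r² values are hypothetical, with the low‐r² SNPs being rare variants at which this individual is homozygous for the common allele.

```python
from typing import List, Optional, Sequence


def observed_heterozygosity(calls: Sequence[Optional[int]]) -> float:
    """Proportion of heterozygous calls (alt-allele count 1) among non-missing calls.

    calls: alternate-allele counts (0, 1, 2) for one individual; None = missing.
    """
    called = [g for g in calls if g is not None]
    if not called:
        raise ValueError("no non-missing genotypes")
    return sum(1 for g in called if g == 1) / len(called)


def filter_by_r2(calls: Sequence[Optional[int]], r2: Sequence[float],
                 min_r2: float) -> List[Optional[int]]:
    """Keep only the calls at SNPs whose imputation r^2 meets the threshold."""
    return [g for g, r in zip(calls, r2) if r >= min_r2]


# Hypothetical individual genotyped at 10 SNPs.
calls = [0, 1, 2, 1, 0, 0, 0, 1, 2, 0]
r2 = [0.9, 0.9, 0.9, 0.9, 0.5, 0.4, 0.7, 0.9, 0.9, 0.3]

print(observed_heterozygosity(calls))                         # 0.3
print(observed_heterozygosity(filter_by_r2(calls, r2, 0.6)))  # 3/7
print(observed_heterozygosity(filter_by_r2(calls, r2, 0.8)))  # 0.5
```

Because the discarded SNPs are mostly homozygous calls, the ratio rises with each stricter threshold, matching the direction of the shift in Figure 6.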

FIGURE 6.

Individual heterozygosity from the ground truth (green bars) and imputed datasets (blue bars). Shades of blue represent the three levels of accuracy filtering (r², the correlation between imputed and ground truth genotype dosages) applied to the imputed datasets: no filter (raw imputed file), retaining only SNPs with r² ≥ 0.6, and retaining only SNPs with r² ≥ 0.8. In imputed files, median heterozygosity across all samples decreased moderately with decreasing target SNP density down to 197 SNPs/Mb and decreased drastically at SNP densities of 49 SNPs/Mb and below. Filtering increases median heterozygosity in proportion to filtering stringency.

3.1.3.3. Runs of Homozygosity

RZooRoH results for our ground truth dataset revealed that about half of the hihi genome is homozygous‐by‐descent across all individuals (0.437–0.523) (Figure 7). Most of this is contributed by inbreeding further in the past (> 100 generations ago), with the remainder contributed by inbreeding roughly 50–100 generations ago. The total inbreeding coefficient increases with decreasing target SNP density, especially at very low SNP densities (≤ 49 SNPs/Mb). At low SNP density, the proportion of inbreeding attributed to very recent ancestors (5–15 generations ago) also increases. ROHs detected in the ground truth file have a median length of 39.8 Mb, and imputed files with a target SNP density of 1088 SNPs/Mb or higher gave similar results (Table S1). At lower target SNP densities, longer ROHs are detected, which reflect contributions from more recent inbreeding events. Both r² filters had minimal effect on RZooRoH results. Compared to RZooRoH results from the pre‐imputation target files, imputed files were more similar to the ground truth, captured shorter ROHs and gave more similar total inbreeding coefficients (Figure 7). Further, imputed data better preserved the ranking of individuals from most to least inbred, especially at low SNP density (Table S2).
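
The link between ROH length and the age of inbreeding can be made concrete with a rough calculation. In the RZooRoH model (Bertrand et al. 2019), each homozygous‐by‐descent class k has a rate R_k; segment lengths in that class are approximately exponential with mean 1/R_k Morgans, and R_k is roughly twice the number of generations back to the common ancestor. The Python sketch below converts generations into an expected segment length in Mb under an assumed constant recombination rate. It is an approximation that ignores local variation in recombination, and the rates used (1 cM/Mb, Beagle's no‐map assumption, and the overall rates of the two hihi linkage maps given in the Results) are illustrative.

```python
def expected_hbd_length_mb(generations: float, cm_per_mb: float) -> float:
    """Approximate mean HBD segment length (Mb) for a given ancestor age.

    Assumes R_k ~= 2 * generations and an exponential segment-length
    distribution with mean 1 / R_k Morgans, converted to Mb with a
    constant recombination rate (cM/Mb).
    """
    if generations <= 0 or cm_per_mb <= 0:
        raise ValueError("generations and cm_per_mb must be positive")
    rate = 2.0 * generations       # approximate R_k, per Morgan
    mean_length_cm = 100.0 / rate  # 1 / R_k Morgans, expressed in cM
    return mean_length_cm / cm_per_mb


# Illustrative rates: 1 cM/Mb (no map), 1.99 (Lep-MAP3), 2.56 (CRI-MAP).
for g in (5, 15, 50, 100):
    print(g, [round(expected_hbd_length_mb(g, c), 2) for c in (1.0, 1.99, 2.56)])
```

Under these assumptions, inbreeding 5 to 15 generations ago yields segments roughly 1 to 10 Mb long, whereas inbreeding more than 100 generations ago yields segments of about 0.5 Mb or less, which is why longer ROHs are attributed to more recent inbreeding.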

FIGURE 7.

RZooRoH results for the ground truth from whole‐genome resequenced SNPs, for imputed SNPs without filtering and with filtering at r² thresholds of 0.6 or 0.8, and for targets pre‐imputation, at varying levels of SNP density. Each bar represents an individual, and individuals are presented in the same order across plots. The y‐axis represents the proportion of the genome in runs of homozygosity (ROHs) and the whole‐genome inbreeding coefficient per individual. Different colours represent the contribution of inbreeding over different timescales.

4. Discussion

4.1. Beagle 5.4 Can Impute Datasets to High Accuracy Despite a Small Reference Panel

Beagle v5.4 is typically used to impute from SNP arrays (Browning et al. 2018) but is increasingly applied to targets genotyped using low‐coverage whole‐genome resequencing (Bell et al. 2023; Hui et al. 2020; Wang et al. 2024; Watowich et al. 2023). Its application to low‐coverage whole‐genome resequencing has yielded high accuracies (Deng et al. 2022), and its run time is favourable for datasets with many markers (De Marino et al. 2022). We found that Beagle v5.4 performed robustly regardless of the parameters used, in agreement with extensive tests of the earlier Beagle versions 5.0 and 5.1 (Pook et al. 2020). Interestingly, Beagle achieved high accuracies despite our small reference panel, for several possible reasons. First, our main study population is relatively small and inbred, with low genetic diversity and high linkage disequilibrium (de Villemereuil et al. 2019; Duntsch et al. 2023; Lee et al. 2022). Despite using only 29 (30 − 1) high‐coverage individuals in our reference panel in the leave‐one‐out tests, one of the smaller reference panels in the literature, we achieved higher mean r² imputation accuracies than other studies with both similar‐sized and larger reference datasets (Friedenberg and Meurs 2016; Reich et al. 2022; Ye et al. 2018). Further, halving our reference panel to 15 individuals caused only a marginal drop in imputation accuracy. In datasets with low genetic diversity, a modest reference panel is likely sufficient to capture the diversity of haplotype clusters present and to be informative for imputation regardless of imputation parameters (Friedenberg and Meurs 2016; Phocas 2022).

Additionally, selecting high‐fecundity individuals for our reference panel likely improved the representation of haplotype diversity in our dataset, allowing imputation at high accuracy. A similar selection strategy in Large White pigs (Sus domesticus) and white layer chickens (Gallus gallus domesticus) showed that including common sires (i.e., sires used in a large number of matings) in the reference panel improved accuracy, especially for rare SNPs (Heidaritabar et al. 2015; Wang et al. 2024). When subsampling to a smaller reference panel, accuracy decreased by a similarly small amount whether the most genetically distant individuals or random individuals were removed, suggesting that each reference individual contributed unique haplotypes, likely owing to the shared population history. Similarly, the poorer imputation accuracy of the four Te Hauturu‐o‐Toi individuals likely reflects a suboptimal reference panel for those samples: the panel consists mostly of individuals from Tiritiri Matangi, a bottlenecked population that is unlikely to capture the haplotype diversity present in the Te Hauturu‐o‐Toi population.

4.2. Imputation Parameters and Input Data Structure

4.2.1. Leave‐One‐Out Tests Allow Direct Assessment of Imputation Accuracy

The leave‐one‐out test provided a ground truth for directly assessing imputation accuracy. Direct assessment gives a more accurate evaluation of imputation quality, particularly for rare variants, and so helps prevent biases in downstream applications (Ramnarine et al. 2015). Because each leave‐one‐out run has only one target sample, the per‐variant dosage correlation metric r² cannot be calculated accurately, as there are only two data points (two alleles) per variant (Dorji et al. 2024; B. Browning, personal communication, 22 October 2024). However, imputed files can be merged across samples before evaluation against the ground truth to calculate per‐variant r² values.
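
The difference between per‐sample and per‐variant r² can be sketched in Python. Dosage r² is taken here as the squared Pearson correlation between imputed dosages (for example, Beagle's DS values, ranging from 0 to 2) and true alternate‐allele counts. The matrix layout (rows are masked SNPs, columns are samples, with results from separate leave‐one‐out runs merged column‐wise), the function names and the toy values are assumptions for illustration. The last line shows why a single run is uninformative per variant: with only two data points, any defined correlation is perfect.

```python
import math
from typing import List, Sequence


def dosage_r2(imputed: Sequence[float], truth: Sequence[float]) -> float:
    """Squared Pearson correlation between imputed dosages and true allele counts.

    Returns NaN when either vector has zero variance, where r^2 is undefined.
    """
    if len(imputed) != len(truth) or len(imputed) < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    n = len(imputed)
    mean_x = sum(imputed) / n
    mean_y = sum(truth) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(imputed, truth))
    sxx = sum((x - mean_x) ** 2 for x in imputed)
    syy = sum((y - mean_y) ** 2 for y in truth)
    if sxx == 0 or syy == 0:
        return math.nan
    return sxy * sxy / (sxx * syy)


Matrix = Sequence[Sequence[float]]  # rows = SNPs, columns = samples


def per_sample_r2(imp: Matrix, truth_m: Matrix) -> List[float]:
    """One r^2 per sample (column), across that sample's masked SNPs."""
    n_samples = len(imp[0])
    return [dosage_r2([row[j] for row in imp], [row[j] for row in truth_m])
            for j in range(n_samples)]


def per_variant_r2(imp: Matrix, truth_m: Matrix) -> List[float]:
    """One r^2 per SNP (row), across samples merged from all leave-one-out runs."""
    return [dosage_r2(i_row, t_row) for i_row, t_row in zip(imp, truth_m)]


# Hypothetical merged results: 3 masked SNPs x 4 samples.
imputed_ds = [[0.1, 1.0, 1.9, 0.0],
              [1.0, 0.9, 0.2, 2.0],
              [0.0, 0.1, 1.1, 1.0]]
truth_gt = [[0, 1, 2, 0],
            [1, 1, 0, 2],
            [0, 0, 1, 1]]
print([round(v, 3) for v in per_sample_r2(imputed_ds, truth_gt)])
print([round(v, 3) for v in per_variant_r2(imputed_ds, truth_gt)])

# One run, one variant: two haplotype values give r^2 of 1 (up to rounding).
print(dosage_r2([0.2, 0.9], [0, 1]))
```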

4.2.2. Preliminary Testing Could Reveal the Suitability of Default Parameters to a Dataset

We also found that the default parameters in Beagle v5.4 were well suited to our dataset. For Ne, imputation was more accurate with the default setting, which estimates Ne automatically, than with a lower Ne value based on past studies (Brekke et al. 2011). We note that in Beagle v5.2 and older versions, the default Ne is fixed at 1,000,000, as optimised for large, outbred populations, so lowering Ne would be recommended for datasets with lower genetic diversity. The marginal improvements from tweaking other parameters suggest that the default parameters in Beagle v5.4 worked well and could be applied to our dataset. However, the effectiveness of default settings is dataset dependent, and some parameter testing is advisable for best outcomes, especially since the imputed dataset will likely be used in downstream analyses. Nonetheless, the performance of the default settings is encouraging for studies where extensive testing is not possible, such as those using imputation as an intermediate quality control step (Pook et al. 2020) or those where detailed information on genome characteristics, such as recombination rates, is challenging to obtain.
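
A preliminary parameter test can be organised as a simple grid. The Python sketch below only builds Beagle command lines for every combination of a few settings; it does not run them. The argument names used (gt, ref, out, ne, window, iterations, imp‐segment) are standard Beagle 5.x arguments, but the jar path, file names and output prefixes are hypothetical, and a value of None means the argument is omitted so that Beagle's own default applies. The grid values are illustrative and mirror settings discussed in this study.

```python
import itertools
import shlex
from typing import Dict, List, Optional


def beagle_commands(
    grid: Dict[str, List[Optional[str]]],
    target_vcf: str,
    ref_vcf: str,
    jar: str = "beagle.jar",  # hypothetical path to the Beagle 5.4 jar
) -> List[str]:
    """Build one Beagle command line per combination of parameter values.

    A value of None means 'omit the argument and use Beagle's default'.
    """
    names = sorted(grid)
    commands = []
    for values in itertools.product(*(grid[n] for n in names)):
        args = ["java", "-jar", jar, f"gt={target_vcf}", f"ref={ref_vcf}"]
        label = []
        for name, value in zip(names, values):
            if value is not None:
                args.append(f"{name}={value}")
                label.append(f"{name}{value}")
        # Hypothetical output naming: one prefix per parameter combination.
        args.append("out=imputed_" + ("_".join(label) or "defaults"))
        commands.append(shlex.join(args))
    return commands


grid = {
    "ne": [None, "100000"],      # None = default (automatically estimated in v5.4, per this study)
    "window": ["40", "60"],      # cM; 40 is the default
    "iterations": ["12", "24"],  # 12 is the default
    "imp-segment": ["6"],        # cM; default
}
cmds = beagle_commands(grid, "target.vcf.gz", "reference.vcf.gz")
print(len(cmds))  # 8
print(cmds[0])
```

Keeping one output prefix per combination makes it straightforward to evaluate every run against the same ground truth afterwards.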

4.2.3. Sufficient SNP Densities Are Crucial for Imputation Accuracy

Overall, sufficient SNP density in the target, ideally ≥ 1088 SNPs/Mb for our dataset, was crucial for imputation accuracy. Imputation from reduced‐representation datasets may require a larger reference panel or two‐step imputation to improve accuracy (Ye et al. 2018). For low‐coverage whole‐genome resequencing, we found that targets with more SNPs but slightly more sporadic missingness gave better imputation accuracy (Chen et al. 2022). This is likely due to the much higher SNP density in the target, which is more informative for phasing and imputation across genomic regions. In our study, lower SNP densities were simulated by randomly masking SNPs. Future studies could assess imputation accuracy in specific genomic regions that show clustered and higher rates of missingness.
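
The two kinds of missingness compared here, whole SNPs removed to lower the target density versus individual genotypes set to missing at otherwise retained SNPs, can be contrasted in a short Python sketch. It is a hypothetical illustration, not the study's masking code: the 10 Mb region, SNP spacing, rates and seeds are made up.

```python
import random
from typing import List, Optional, Sequence


def mask_snps(positions: Sequence[int], fraction_masked: float,
              seed: int = 1) -> List[int]:
    """Randomly drop a fraction of SNP positions (whole-SNP masking), keeping order."""
    if not 0 <= fraction_masked < 1:
        raise ValueError("fraction_masked must be in [0, 1)")
    rng = random.Random(seed)
    n_keep = round(len(positions) * (1 - fraction_masked))
    keep = set(rng.sample(range(len(positions)), n_keep))
    return [p for i, p in enumerate(positions) if i in keep]


def snp_density_per_mb(n_snps: int, region_length_bp: int) -> float:
    """SNP density in SNPs per megabase."""
    return n_snps / (region_length_bp / 1_000_000)


def add_sporadic_missingness(genotypes: Sequence[int], rate: float,
                             seed: int = 1) -> List[Optional[int]]:
    """Set each genotype to missing (None) independently with probability `rate`."""
    rng = random.Random(seed)
    return [None if rng.random() < rate else g for g in genotypes]


# Hypothetical 10 Mb region with 10,000 evenly spaced SNPs.
positions = list(range(500, 10_000_000, 1_000))
kept = mask_snps(positions, fraction_masked=0.62)
print(len(kept), snp_density_per_mb(len(kept), 10_000_000))  # 3800 380.0

# Sporadic missingness at retained SNPs for one hypothetical individual.
genos = add_sporadic_missingness([0, 1, 2, 1, 0] * 20, rate=0.4, seed=2)
print(sum(g is None for g in genos), "of", len(genos), "genotypes set to missing")
```

Random masking spreads the retained SNPs evenly on average; assessing clustered missingness, as suggested above, would require masking contiguous blocks instead.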

4.2.4. The Addition of a Linkage Map Might Require Optimisation of Imputation Parameters

We found that providing a linkage map did not improve imputation accuracy, regardless of whether Beagle parameters were tweaked. Our tests using different maps suggest that fine‐scale recombination rates are likely more important than overall recombination rates, and that providing a suitable map is crucial, since changing parameters from their defaults gave only marginal improvements in imputation accuracy. Pook et al. (2020) suggest that including a linkage map can increase error rates by changing the length and number of haplotypes considered. With a linkage map that has higher recombination rates, more haplotypes will meet the minimum cM length threshold (imp‐segment) and be included in the haplotype cluster, which could introduce uncertainty and errors. Further studies are needed to investigate the determinants of accuracy when a linkage map is used for imputation.
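
How a map's recombination rate interacts with the cM‐based imp‐segment threshold can be illustrated with a small Python sketch. It assumes that genetic positions between map markers are obtained by linear interpolation and that, without a map, a constant 1 cM/Mb applies; the interpolation helper and the three‐marker map are hypothetical. The sketch converts a 6 cM threshold into the physical length it represents at different average rates.

```python
from bisect import bisect_right
from typing import Sequence


def interpolate_cm(bp: int, map_bp: Sequence[int], map_cm: Sequence[float]) -> float:
    """Linearly interpolate a genetic position (cM) from a sorted physical map.

    Positions outside the map are extrapolated from the nearest interval.
    """
    if len(map_bp) < 2 or len(map_bp) != len(map_cm):
        raise ValueError("need at least two map points with matching lengths")
    i = bisect_right(map_bp, bp)
    i = min(max(i, 1), len(map_bp) - 1)  # clamp to a valid interval
    x0, x1 = map_bp[i - 1], map_bp[i]
    y0, y1 = map_cm[i - 1], map_cm[i]
    return y0 + (y1 - y0) * (bp - x0) / (x1 - x0)


def mb_spanned_by(cm_length: float, cm_per_mb: float) -> float:
    """Physical length (Mb) covered by a genetic length under a constant rate."""
    return cm_length / cm_per_mb


# Physical span of a 6 cM imp-segment threshold at different average rates:
# 1 cM/Mb (no map), 1.99 cM/Mb (Lep-MAP3 map), 2.56 cM/Mb (CRI-MAP map).
for rate in (1.0, 1.99, 2.56):
    print(rate, round(mb_spanned_by(6.0, rate), 2))

# Hypothetical three-marker map: 0 cM at 1 Mb, 2 cM at 2 Mb, 7 cM at 3 Mb.
print(interpolate_cm(2_500_000, [1_000_000, 2_000_000, 3_000_000], [0.0, 2.0, 7.0]))  # 4.5
```

A fixed 6 cM threshold thus corresponds to about 6 Mb under the no‐map assumption but only about 2.3 to 3.0 Mb with the hihi maps, so physically shorter shared segments qualify; the hypothetical map also shows how uneven fine‐scale rates shift genetic positions within a region.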

4.3. Sensitivity of Analyses to Filtering

Interestingly, unfiltered datasets with densities down to 1088 SNPs/Mb (62% of SNPs masked) were highly consistent with the ground truth in their site frequency spectra, heterozygosity values and inbreeding estimates. When SNPs were filtered by accuracy, the site frequency spectra shifted towards a paucity of rare alleles and overall heterozygosity increased. For SNP densities below 1088 SNPs/Mb (more than 62% masked), larger deviations from the ground truth were apparent in the allele frequency, heterozygosity and inbreeding metrics. In allele frequency and heterozygosity analyses, which used called genotypes, filtering reduced deviations mainly by removing excess homozygous genotypes. This is likely owing to the lower imputation accuracy of rare alleles, a feature common to other imputation studies (Gilly et al. 2019; Ramnarine et al. 2015). In inbreeding metrics, which used genotype likelihoods, filtering SNPs by accuracy exacerbated deviations from the ground truth in some cases, perhaps because of the lower number of input SNPs (Duntsch et al. 2021). Imputation genotype biases also make heterozygous regions more likely to be wrongly imputed (Wragg et al. 2024). The choice of filtering threshold should, therefore, carefully consider factors such as the number of SNPs retained post‐filtering, their allele frequency spectrum and the sensitivity of analyses to the presence of fixed homozygous alleles. We note, however, that RZooRoH inference of runs of homozygosity was relatively robust to even low SNP densities and strict filtering, despite the post‐imputation shifts in heterozygosity and rare allele frequencies. This suggests that running RZooRoH on genotype likelihoods, rather than genotype calls, likely accounted for these heterozygosity and allele frequency shifts.
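
The contrast between analysing genotype calls and genotype probabilities can be shown with a short Python sketch. It assumes per‐site genotype probabilities ordered as (homozygous reference, heterozygous, homozygous alternate), such as those Beagle can write to the GP field when run with gp=true; the probability values are hypothetical. A hard call keeps only the most likely genotype, discarding uncertainty that a likelihood‐based method such as RZooRoH can retain.

```python
from typing import Sequence, Tuple


def hard_call(gp: Sequence[float]) -> int:
    """Most likely genotype (alt-allele count 0, 1 or 2) from genotype probabilities."""
    return max(range(3), key=lambda k: gp[k])


def expected_dosage(gp: Sequence[float]) -> float:
    """Expected alt-allele count: P(het) + 2 * P(hom-alt)."""
    return gp[1] + 2 * gp[2]


# Hypothetical genotype probabilities at three imputed sites for one individual.
sites: Sequence[Tuple[float, float, float]] = [
    (0.98, 0.02, 0.00),  # confident hom-ref
    (0.55, 0.40, 0.05),  # uncertain: substantial het probability
    (0.02, 0.96, 0.02),  # confident het
]
for gp in sites:
    print(gp, hard_call(gp), round(expected_dosage(gp), 2))

# Heterozygous calls vs. expected number of heterozygous sites.
print(sum(hard_call(gp) == 1 for gp in sites), round(sum(gp[1] for gp in sites), 2))
```

At the uncertain site the hard call is homozygous even though the heterozygote has probability 0.40, so analyses on called genotypes accumulate excess homozygosity, whereas probability‐based input keeps that uncertainty.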

4.4. Imputation for Natural Populations

Conservation genomics increasingly relies on association studies and landscape genomics to understand adaptive genetic variation and to inform management strategies (Hohenlohe et al. 2021). Because these analyses benefit from higher marker density and lower missingness, our finding that imputation is feasible in smaller, often more threatened, natural populations is highly encouraging. Long‐term studies with a comprehensive understanding of population dynamics and baseline genetic information (Sheldon et al. 2022), such as genetic diversity, inbreeding and reference genome availability, are especially good candidates for imputation. These studies can leverage existing information to improve imputation accuracy through the selection of reference panel individuals and the physical mapping of genomic markers (Munyengwa et al. 2021). Reference panel size should scale with the genetic diversity and linkage disequilibrium expected in the population, and small reference panels could return high accuracies in populations with low genetic diversity. If pedigree information is available, it can be used to select high‐fecundity individuals for the reference panel or to inform phasing within trios (Galla et al. 2022). This is especially important for structured populations, where population ancestry should be considered alongside genetic distance to maximise imputation accuracy (Dekeyser et al. 2023). Studies should maximise SNP density where possible, as this produces more accurate downstream results and is more effective than post‐imputation filtering. These considerations are important for imputation from low‐coverage whole‐genome resequencing data, which can suffer from high phasing error rates and inaccurate genotypes at low sequencing depth (Wragg et al. 2024). Our work provides a workflow and recommendations for improving and assessing imputation accuracy in wild population resequencing datasets to leverage the benefits of imputation.
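
Selecting high‐fecundity individuals from a pedigree can be as simple as counting recorded offspring per parent and taking the top candidates among individuals that have high‐coverage data. The Python sketch below assumes a three‐column pedigree (individual, dam, sire) with unknown parents recorded as None; the IDs are hypothetical, and the ranking ignores the population‐ancestry considerations noted above, which would need to be added for structured populations.

```python
from collections import Counter
from typing import Iterable, List, Optional, Set, Tuple

Pedigree = Iterable[Tuple[str, Optional[str], Optional[str]]]  # (individual, dam, sire)


def offspring_counts(pedigree: Pedigree) -> Counter:
    """Number of recorded offspring per parent (dam or sire)."""
    counts = Counter()
    for _individual, dam, sire in pedigree:
        for parent in (dam, sire):
            if parent is not None:
                counts[parent] += 1
    return counts


def select_reference_panel(pedigree: Pedigree, sequenced: Set[str], k: int) -> List[str]:
    """Top-k sequenced individuals by offspring count (ties broken by ID)."""
    counts = offspring_counts(pedigree)
    candidates = sorted(sequenced, key=lambda ind: (-counts[ind], ind))
    return candidates[:k]


# Hypothetical pedigree and set of individuals with high-coverage sequence data.
pedigree = [
    ("c1", "d1", "s1"), ("c2", "d1", "s1"), ("c3", "d1", "s2"),
    ("c4", "d2", "s1"), ("c5", "d2", None),
]
sequenced = {"d1", "d2", "s1", "s2", "c1"}
print(select_reference_panel(pedigree, sequenced, k=3))  # ['d1', 's1', 'd2']
```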

Author Contributions

Hui Zhen Tan and Anna W. Santure designed the study. Katarina C. Stuart mapped resequencing reads and called SNPs against the reference genome assembled by Sarah Bailey and Annabel Whibley. Hui Zhen Tan designed and conducted all analyses with advice from Anna W. Santure and Tram Vi. Hui Zhen Tan produced the figures and wrote the paper with input from Anna W. Santure. Patricia Brekke contributed to hihi sampling coordination. All authors commented on and approved the final manuscript.

Disclosure

Benefit‐Sharing Statement: Benefits Generated—We described the contributions of all individuals and organisations, including an Indigenous group, public service officers and conservation rangers, in our acknowledgements section. Benefits from this research accrue from the sharing of our research findings and pipelines on public databases as described above.

Conflicts of Interest

The authors declare no conflicts of interest.

Supporting information

Appendix S1: men70024‐sup‐0001‐AppendixS1.pdf.

Acknowledgements

We acknowledge Ngāti Manuhiri as Mana Whenua and Kaitiaki of Te Hauturu‐o‐Toi and its taonga, including hihi. We extend many thanks to the volunteers, past students and Department of Conservation staff who have contributed to hihi recovery. Our thanks to John Ewen, co‐chair of the Hihi Recovery Group, for long‐term coordination of hihi sampling, his enthusiastic support to conduct this research and feedback on the manuscript. We acknowledge Selina Patel for conducting and optimising all DNA extractions. We also thank the AgResearch Animal Genomics team, particularly Tracey Van Stijn, Rudiger Brauning and Shannon Clarke, for hihi Illumina sequencing. All work was undertaken in accordance with Department of Conservation permits 53614‐FAU and 66751‐FAU. A Strategic Science Investment Fund in Data Science from the Ministry of Business, Innovation and Employment supports Anna W. Santure and Hui Zhen Tan. The High Quality Genomes, and High Quality Genomes and Population Genomics (HQG + PG) projects I and II of Genomics Aotearoa supported Anna W. Santure, Annabel Whibley and Katarina C. Stuart and funded the whole‐genome resequencing data and sequencing of the reference genome utilised in this study. Funding from the George Mason Center for the Natural Environment and the Little Barrier Island (Hauturu) Supporters Trust supported fieldwork collection from Te Hauturu‐o‐Toi. Tram Vi is supported by a University of Auckland Faculty of Science Research Development Fund awarded to Anna W. Santure. Patricia Brekke is supported by Research England. The authors wish to acknowledge the use of New Zealand eScience Infrastructure (NeSI) high performance computing facilities, consulting support and training services as part of this research. New Zealand’s national facilities are provided by NeSI and funded jointly by NeSI's collaborator institutions and through the Ministry of Business, Innovation & Employment's Research Infrastructure programme. URL https://www.nesi.org.nz. 
We are very grateful to the reviewer and editor who provided helpful suggestions for the manuscript. Open access publishing facilitated by The University of Auckland, as part of the Wiley ‐ The University of Auckland agreement via the Council of Australian University Librarians.

Handling Editor: Jason Bragg

Funding: This work was supported by Genomics Aotearoa, Little Barrier Island (Hauturu) Supporters Trust, Ministry of Business, Innovation and Employment, George Mason Centre for the Natural Environment, University of Auckland, Research England, University of Auckland Faculty of Science Research Development Fund.

Contributor Information

Hui Zhen Tan, Email: htan626@aucklanduni.ac.nz.

Anna W. Santure, Email: a.santure@auckland.ac.nz.

Data Availability Statement

Hihi are of cultural significance to the Indigenous People of Aotearoa New Zealand, the Māori, and are considered a taonga (treasured) species whose whakapapa (genealogy) is intricately tied to that of Māori. The sequencing and variant data are archived on the Aotearoa Genomic Data Repository (AGDR) and can be viewed using the Project Code NZ‐00068 and the Dataset ID AGDR00068 (https://doi.org/10.57748/a39d‐gx62). These data will be made available by request on the recommendation of Ngāti Manuhiri, the iwi (extended kinship group) that affiliates as kaitiaki (guardians) for hihi. Pipelines and analysis codes are available at https://github.com/tanhuizhen/Hihi_Imputation, and the linkage maps (marker names and locations) are available at https://github.com/tanhuizhen/Hihi_Linkage‐mapping/tree/main/Results_hihi_linkage_map.

References

  1. Bailey, S., Guhlin J., Senanayake D. S., et al. 2025. “Assembly of Female and Male Hihi Genomes (Stitchbird; Notiomystis cincta) Enables Characterization of the W Chromosome and Resources for Conservation Genomics.” Molecular Ecology Resources 25, no. 5: e13823. 10.1111/1755-0998.13823.
  2. Bell, S. M., Evans J. M., Greif E. A., Tsai K. L., Friedenberg S. G., and Clark L. A. 2023. “GWAS Using Low‐Pass Whole Genome Sequence Reveals a Novel Locus in Canine Congenital Idiopathic Megaesophagus.” Mammalian Genome 34, no. 3: 464–472. 10.1007/s00335-023-09991-2.
  3. Bergner, L. M., Dussex N., Jamieson I. G., and Robertson B. C. 2016. “European Colonization, Not Polynesian Arrival, Impacted Population Size and Genetic Diversity in the Critically Endangered New Zealand Kakapo.” Journal of Heredity 107, no. 7: 593–602. 10.1093/jhered/esw065.
  4. Bergner, L. M., Jamieson I. G., and Robertson B. C. 2014. “Combining Genetic Data to Identify Relatedness Among Founders in a Genetically Depauperate Parrot, the Kakapo (Strigops habroptilus).” Conservation Genetics 15, no. 5: 1013–1020. 10.1007/s10592-014-0595-y.
  5. Bertrand, A. R., Kadri N. K., Flori L., Gautier M., and Druet T. 2019. “RZooRoH: An R Package to Characterize Individual Genomic Autozygosity and Identify Homozygous‐by‐Descent Segments.” Methods in Ecology and Evolution 10, no. 6: 860–866. 10.1111/2041-210X.13167.
  6. Blanchong, J. A., Robinson S. J., Samuel M. D., and Foster J. T. 2016. “Application of Genetics and Genomics to Wildlife Epidemiology.” Journal of Wildlife Management 80, no. 4: 593–608. 10.1002/jwmg.1064.
  7. Brekke, P., Bennett P. M., Santure A. W., and Ewen J. G. 2011. “High Genetic Diversity in the Remnant Island Population of Hihi and the Genetic Consequences of Re‐Introduction.” Molecular Ecology 20, no. 1: 29–45. 10.1111/j.1365-294X.2010.04923.x.
  8. Brekke, P., Bennett P. M., Wang J., Pettorelli N., and Ewen J. G. 2010. “Sensitive Males: Inbreeding Depression in an Endangered Bird.” Proceedings of the Royal Society B: Biological Sciences 277, no. 1700: 3677–3684. 10.1098/rspb.2010.1144.
  9. Broad Institute. 2019. “Picard Toolkit.” https://broadinstitute.github.io/picard/.
  10. Browning, B. 2022. “Beagle 5.4 Documentation.”
  11. Browning, B. L., Tian X., Zhou Y., and Browning S. R. 2021. “Fast Two‐Stage Phasing of Large‐Scale Sequence Data.” American Journal of Human Genetics 108, no. 10: 1880–1890. 10.1016/j.ajhg.2021.08.005.
  12. Browning, B. L., Zhou Y., and Browning S. R. 2018. “A One‐Penny Imputed Genome From Next‐Generation Reference Panels.” American Journal of Human Genetics 103, no. 3: 338–348. 10.1016/j.ajhg.2018.07.015.
  13. Browning, S. R. 2008. “Missing Data Imputation and Haplotype Phase Inference for Genome‐Wide Association Studies.” Human Genetics 124, no. 5: 439–450. 10.1007/s00439-008-0568-7.
  14. Chambers, E. A., Bishop A. P., and Wang I. J. 2023. “Individual‐Based Landscape Genomics for Conservation: An Analysis Pipeline.” Molecular Ecology Resources 25: e13884. 10.1111/1755-0998.13884.
  15. Chang, C. C., Chow C. C., Tellier L. C., Vattikuti S., Purcell S. M., and Lee J. J. 2015. “Second‐Generation PLINK: Rising to the Challenge of Larger and Richer Datasets.” GigaScience 4, no. 1: 7. 10.1186/s13742-015-0047-8.
  16. Chen, L., Yang S., Araya S., et al. 2022. “Genotype Imputation for Soybean Nested Association Mapping Population to Improve Precision of QTL Detection.” Theoretical and Applied Genetics 135, no. 5: 1797–1810. 10.1007/s00122-022-04070-7.
  17. Chen, S. F., Dias R., Evans D., et al. 2020. “Genotype Imputation and Variability in Polygenic Risk Score Estimation.” Genome Medicine 12, no. 1: 1–13. 10.1186/s13073-020-00801-x.
  18. Daetwyler, H. D., Capitan A., Pausch H., et al. 2014. “Whole‐Genome Sequencing of 234 Bulls Facilitates Mapping of Monogenic and Complex Traits in Cattle.” Nature Genetics 46, no. 8: 858–865. 10.1038/ng.3034.
  19. Danecek, P., Auton A., Abecasis G., et al. 2011. “The Variant Call Format and VCFtools.” Bioinformatics 27, no. 15: 2156–2158. 10.1093/bioinformatics/btr330.
  20. Davies, R. W., Flint J., Myers S., and Mott R. 2016. “Rapid Genotype Imputation From Sequence Without Reference Panels.” Nature Genetics 48, no. 8: 965–969. 10.1038/ng.3594.
  21. De Marino, A., Amr Mahmoud A., Bose M., et al. 2022. “A Comparative Analysis of Current Phasing and Imputation Software.” PLoS One 17: e0260177. 10.1371/journal.pone.0260177.
  22. de Villemereuil, P., Rutschmann A., Lee K. D., Ewen J. G., Brekke P., and Santure A. W. 2019. “Little Adaptive Potential in a Threatened Passerine Bird.” Current Biology 29, no. 5: 889–894. 10.1016/j.cub.2019.01.072.
  23. Dekeyser, T., Génin E., and Herzig A. F. 2023. “Opening the Black Box of Imputation Software to Study the Impact of Reference Panel Composition on Performance.” Genes 14, no. 2: 410. 10.3390/genes14020410.
  24. Deng, T., Zhang P., Garrick D., Gao H., Wang L., and Zhao F. 2022. “Comparison of Genotype Imputation for SNP Array and Low‐Coverage Whole‐Genome Sequencing Data.” Frontiers in Genetics 12: 704118. 10.3389/fgene.2021.704118.
  25. Dorji, J., Chamberlain A. J., Reich C. M., et al. 2024. “Mitochondrial Sequence Variants: Testing Imputation Accuracy and Their Association With Dairy Cattle Milk Traits.” Genetics Selection Evolution 56, no. 1: 1–9. 10.1186/s12711-024-00931-5.
  26. Duntsch, L., Whibley A., Brekke P., Ewen J. G., and Santure A. W. 2021. “Genomic Data of Different Resolutions Reveal Consistent Inbreeding Estimates but Contrasting Homozygosity Landscapes for the Threatened Aotearoa New Zealand Hihi.” Molecular Ecology 30, no. 23: 6006–6020. 10.1111/mec.16068.
  27. Duntsch, L., Whibley A., de Villemereuil P., et al. 2023. “Genomic Signatures of Inbreeding Depression for a Threatened Aotearoa New Zealand Passerine.” Molecular Ecology 32, no. 8: 1893–1907. 10.1111/mec.16855.
  28. Dussex, N., van der Valk T., Morales H. E., et al. 2021. “Population Genomics of the Critically Endangered Kākāpō.” Cell Genomics 1, no. 1: 100002. 10.1016/j.xgen.2021.100002.
  29. Frankham, R. 2010. “Challenges and Opportunities of Genetic Approaches to Biological Conservation.” Biological Conservation 143, no. 9: 1919–1927. 10.1016/j.biocon.2010.05.011.
  30. Friedenberg, S. G., and Meurs K. M. 2016. “Genotype Imputation in the Domestic Dog.” Mammalian Genome 27, no. 9–10: 485–494. 10.1007/s00335-016-9636-9.
  31. Galla, S. J., Brown L., Couch‐Lewis (Ngāi Tahu Te Hapū O Ngāti Wheke Ngāti Waewae) Y., et al. 2022. “The Relevance of Pedigrees in the Conservation Genomics Era.” Molecular Ecology 31, no. 1: 41–54. 10.1111/mec.16192.
  32. Garrison, E., Kronenberg Z. N., Dawson E. T., Pedersen B. S., and Prins P. 2022. “A Spectrum of Free Software Tools for Processing the VCF Variant Call Format: vcflib, bio‐vcf, cyvcf2, hts‐nim and slivar.” PLoS Computational Biology 18: e1009123. 10.1371/journal.pcbi.1009123.
  33. Georges, M., Charlier C., and Hayes B. 2019. “Harnessing Genomic Information for Livestock Improvement.” Nature Reviews Genetics 20, no. 3: 135–156. 10.1038/s41576-018-0082-2.
  34. Gilly, A., Southam L., Suveges D., et al. 2019. “Very Low‐Depth Whole‐Genome Sequencing in Complex Trait Association Studies.” Bioinformatics 35, no. 15: 2555–2561. 10.1093/bioinformatics/bty1032.
  35. Green, P., Falls K., and Crooks S. 1990. “Documentation for CRI‐MAP, Version 2.4 (3/26/90).” https://www.animalgenome.org/tools/share/crimap/.
  36. Guhlin, J., Le Lec M. F., Wold J., et al. 2023. “Species‐Wide Genomics of Kākāpō Provides Tools to Accelerate Recovery.” Nature Ecology & Evolution 7, no. 10: 1693–1705. 10.1038/s41559-023-02165-y.
  37. Hayward, J. J., White M. E., Boyle M., et al. 2019. “Imputation of Canine Genotype Array Data Using 365 Whole‐Genome Sequences Improves Power of Genome‐Wide Association Studies.” PLoS Genetics 15, no. 9: 1–21. 10.1371/journal.pgen.1008003.
  38. Heidaritabar, M. , Calus M. P. L., Vereijken A., Groenen M. A. M., and Bastiaansen J. W. M.. 2015. “Accuracy of Imputation Using the Most Common Sires as Reference Population in Layer Chickens.” BMC Genetics 16, no. 1: 1–14. 10.1186/s12863-015-0253-5. [DOI] [PMC free article] [PubMed] [Google Scholar]
  39. Hohenlohe, P. A., Funk W. C., and Rajora O. P. 2021. “Population Genomics for Wildlife Conservation and Management.” Molecular Ecology 30, no. 1: 62–82. 10.1111/mec.15720.
  40. Hui, R., D'Atanasio E., Cassidy L. M., Scheib C. L., and Kivisild T. 2020. “Evaluating Genotype Imputation Pipeline for Ultra‐Low Coverage Ancient Genomes.” Scientific Reports 10, no. 1: 1–8. 10.1038/s41598-020-75387-w.
  41. Jiang, Y., Song H., Gao H., Zhang Q., and Ding X. 2022. “Exploring the Optimal Strategy of Imputation From SNP Array to Whole‐Genome Sequencing Data in Farm Animals.” Frontiers in Genetics 13: 963654. 10.3389/fgene.2022.963654.
  42. Krueger, F., James F., Ewels P., Afyounian E., and Schuster‐Boeckler B. 2021. “TrimGalore.” 10.5281/zenodo.5127899.
  43. Lavanchy, E., and Goudet J. 2023. “Effect of Reduced Genomic Representation on Using Runs of Homozygosity for Inbreeding Characterization.” Molecular Ecology Resources 23, no. 4: 787–802. 10.1111/1755-0998.13755.
  44. Lee, K. D., Millar C. D., Brekke P., et al. 2022. “The Design and Application of a 50 K SNP Chip for a Threatened Aotearoa New Zealand Passerine, the Hihi.” Molecular Ecology Resources 22, no. 1: 415–429. 10.1111/1755-0998.13480.
  45. Li, H. 2011. “A Statistical Framework for SNP Calling, Mutation Discovery, Association Mapping and Population Genetical Parameter Estimation From Sequencing Data.” Bioinformatics 27, no. 21: 2987–2993. 10.1093/bioinformatics/btr509.
  46. Li, H., and Durbin R. 2009. “Fast and Accurate Short Read Alignment With Burrows‐Wheeler Transform.” Bioinformatics 25, no. 14: 1754–1760. 10.1093/bioinformatics/btp324.
  47. Lloret‐Villas, A., Pausch H., and Leonard A. S. 2023. “The Size and Composition of Haplotype Reference Panels Impact the Accuracy of Imputation From Low‐Pass Sequencing in Cattle.” Genetics Selection Evolution 55, no. 1: 1–11. 10.1186/s12711-023-00809-y.
  48. Lou, R. N., Jacobs A., Wilder A. P., and Therkildsen N. O. 2021. “A Beginner's Guide to Low‐Coverage Whole Genome Sequencing for Population Genomics.” Molecular Ecology 30, no. 23: 5966–5993. 10.1111/mec.16077.
  49. Luikart, G., Kardos M., Hand B. K., Rajora O. P., Aitken S. N., and Hohenlohe P. A. 2018. “Population Genomics: Advancing Understanding of Nature.” In Population Genomics, edited by Rajora O. P., 3–79. Springer. 10.1007/13836_2018_60.
  50. Manichaikul, A., Mychaleckyj J. C., Rich S. S., Daly K., Sale M., and Chen W. M. 2010. “Robust Relationship Inference in Genome‐Wide Association Studies.” Bioinformatics 26, no. 22: 2867–2873. 10.1093/bioinformatics/btq559.
  51. Marchini, J., and Howie B. 2010. “Genotype Imputation for Genome‐Wide Association Studies.” Nature Reviews Genetics 11, no. 7: 499–511. 10.1038/nrg2796.
  52. McGaughran, A., Dhami M. K., Parvizi E., et al. 2024. “Genomic Tools in Biological Invasions: Current State and Future Frontiers.” Genome Biology and Evolution 16, no. 1: evad230. 10.1093/gbe/evad230.
  53. Miskelly, C. M., and Powlesland R. G. 2013. “Conservation Translocations of New Zealand Birds, 1863‐2012.” Notornis 60: 3–28.
  54. Mitt, M., Kals M., Pärn K., et al. 2017. “Improved Imputation Accuracy of Rare and Low‐Frequency Variants Using Population‐Specific High‐Coverage WGS‐Based Imputation Reference Panel.” European Journal of Human Genetics 25, no. 7: 869–876. 10.1038/ejhg.2017.51.
  55. Munyengwa, N., Le Guen V., Bille H. N., et al. 2021. “Optimizing Imputation of Marker Data From Genotyping‐By‐Sequencing (GBS) for Genomic Selection in Non‐Model Species: Rubber Tree (Hevea brasiliensis) as a Case Study.” Genomics 113, no. 2: 655–668. 10.1016/j.ygeno.2021.01.012.
  56. Naj, A. C. 2019. “Genotype Imputation in Genome‐Wide Association Studies.” Current Protocols in Human Genetics 102, no. 1: 1–15. 10.1002/cphg.84.
  57. Nosil, P., and Feder J. L. 2012. “Genomic Divergence During Speciation: Causes and Consequences.” Philosophical Transactions of the Royal Society, B: Biological Sciences 367, no. 1587: 332–342. 10.1098/rstb.2011.0263.
  58. Phocas, F. 2022. “Genotyping, the Usefulness of Imputation to Increase SNP Density, and Imputation Methods and Tools.” In Genomic Prediction of Complex Traits. Methods in Molecular Biology, Vol. 2467, edited by Ahmadi N. and Bartholomé J., 113–138. Humana Press Inc. 10.1007/978-1-0716-2205-6_4.
  59. Pook, T., Mayer M., Geibel J., et al. 2020. “Improving Imputation Quality in Beagle for Crop and Livestock Data.” G3: Genes, Genomes, Genetics 10, no. 1: 177–188. 10.1534/g3.119.400798.
  60. Purcell, S. M., and Chang C. C. 2021. “PLINK 2.0.” www.cog-genomics.org/plink/2.0/.
  61. Ramnarine, S., Zhang J., Chen L. S., et al. 2015. “When Does Choice of Accuracy Measure Alter Imputation Accuracy Assessments?” PLoS One 10, no. 10: 1–18. 10.1371/journal.pone.0137601.
  62. Rastas, P. 2017. “Lep‐MAP3: Robust Linkage Mapping Even for Low‐Coverage Whole Genome Sequencing Data.” Bioinformatics 33, no. 23: 3726–3732. 10.1093/bioinformatics/btx494.
  63. Reich, P., Falker‐Gieske C., Pook T., and Tetens J. 2022. “Development and Validation of a Horse Reference Panel for Genotype Imputation.” Genetics Selection Evolution 54, no. 1: 1–13. 10.1186/s12711-022-00740-8.
  64. Scherer, P. 2017. “A High‐Density Genetic Linkage Map Provides Insights Into the Sex‐Specific Recombination Landscape of the Hihi (Notiomystis cincta).” Unpublished Honours thesis, University of Auckland.
  65. Sheldon, B. C., Kruuk L. E. B., and Alberts S. C. 2022. “The Expanding Value of Long‐Term Studies of Individuals in the Wild.” Nature Ecology & Evolution 6, no. 12: 1799–1801. 10.1038/s41559-022-01940-7.
  66. Shi, S., Yuan N., Yang M., et al. 2019. “Comprehensive Assessment of Genotype Imputation Performance.” Human Heredity 83, no. 3: 107–116. 10.1159/000489758.
  67. Stuart, K. C., Tan H. Z., Whibley A., et al. 2024. “Both Structural Variant and Single Nucleotide Polymorphism Load Impact Lifetime Fitness in a Threatened Bird Species.” Molecular Ecology e17631. 10.1111/mec.17631.
  68. Tan, H. Z., Scherer P., Stuart K. C., et al. 2024. “A High‐Density Linkage Map Reveals Broad‐ and Fine‐Scale Sex Differences in Recombination in the Hihi (Stitchbird; Notiomystis cincta).” Heredity 133, no. 4: 1–275. 10.1038/s41437-024-00711-3.
  69. Taylor, S., Castro I., and Griffiths R. 2005. “Hihi/Stitchbird (Notiomystis cincta) Recovery Plan 2004–09.” Threatened Species Recovery Plan 54.
  70. Wang, X. Q., Wang L. G., Shi L. Y., et al. 2024. “Imputation Strategies for Low‐Coverage Whole‐Genome Sequencing Data and Their Effects on Genomic Prediction and Genome‐Wide Association Studies in Pigs.” Animal 18, no. 9: 101258. 10.1016/j.animal.2024.101258.
  71. Watowich, M. M., Chiou K. L., Graves B., et al. 2023. “Best Practices for Genotype Imputation From Low‐Coverage Sequencing Data in Natural Populations.” Molecular Ecology Resources 10, no. 11: e13854. 10.1111/1755-0998.13854.
  72. White, K. L., Eason D. K., Jamieson I. G., and Robertson B. C. 2015. “Evidence of Inbreeding Depression in the Critically Endangered Parrot, the Kakapo.” Animal Conservation 18, no. 4: 341–347. 10.1111/acv.12177.
  73. Wickham, H. 2016. ggplot2: Elegant Graphics for Data Analysis. Springer‐Verlag New York. 10.1007/978-3-319-24277-4.
  74. Wragg, D., Zhang W., Peterson S., et al. 2024. “A Cautionary Tale of Low‐Pass Sequencing and Imputation With Respect to Haplotype Accuracy.” Genetics Selection Evolution 56, no. 1: 1–19. 10.1186/s12711-024-00875-w.
  75. Ye, S., Yuan X., Lin X., et al. 2018. “Imputation From SNP Chip to Sequence: A Case Study in a Chinese Indigenous Chicken Population.” Journal of Animal Science and Biotechnology 9, no. 1: 1–12. 10.1186/s40104-018-0241-5.
  76. Yun, L., Willer C., Sanna S., and Abecasis G. 2009. “Genotype Imputation.” Annual Review of Genomics and Human Genetics 10: 387–406. 10.1146/annurev.genom.9.081307.164242.

Supplementary Materials

Appendix S1: men70024‐sup‐0001‐AppendixS1.pdf.

Data Availability Statement

Hihi are of cultural significance to the Indigenous People of Aotearoa New Zealand, the Māori, and are considered a taonga (treasured) species whose whakapapa (genealogy) is intricately tied to that of Māori. The sequencing and variant data are archived on the Aotearoa Genomic Data Repository (AGDR) and can be viewed using the Project Code NZ‐00068 and the Dataset ID AGDR00068 (https://doi.org/10.57748/a39d-gx62). These data will be made available by request on the recommendation of Ngāti Manuhiri, the iwi (extended kinship group) that affiliates as kaitiaki (guardians) for hihi. Pipelines and analysis codes are available at https://github.com/tanhuizhen/Hihi_Imputation, and the linkage maps (marker names and locations) are available at https://github.com/tanhuizhen/Hihi_Linkage-mapping/tree/main/Results_hihi_linkage_map.


Articles from Molecular Ecology Resources are provided here courtesy of Wiley
