A genome-structure adaptive framework for ROH-based inbreeding estimation in Penaeus vannamei

Xiang Zou; Hao Zhou; Mianyu Liu; Guangfeng Qiang; Ping Dai; Juan Sui; Jie Kong; Kun Luo; Xianhong Meng; Qun Xing; Qiang Fu; Sheng Luan

doi:10.1038/s41598-026-37622-8

2026 Jan 30;16:6769.


Xiang Zou^1,2,4, Hao Zhou^1,2, Mianyu Liu^1,2, Guangfeng Qiang^1,2, Ping Dai^1,2, Juan Sui^1,2, Jie Kong^1,2, Kun Luo^1,2, Xianhong Meng^1,2, Qun Xing^3, Qiang Fu^1,2,✉, Sheng Luan^1,2,✉

PMCID: PMC12913633 PMID: 41617790

Abstract

Reliable genomic inbreeding estimation in aquaculture species is challenged by genome fragmentation, high heterozygosity and unoptimized parameters. This study developed a genome-structure adaptive framework for optimizing Runs of Homozygosity (ROH) detection parameters, using Penaeus vannamei as a model. Taking advantage of whole-genome resequencing of five inbred families and two reference genomes with distinct continuity, we established an empirical set of non-overlapping genomic windows in PLINK to capture genomic features like SNP density. Genome coverage served as a key metric for evaluating the robustness of parameters in ROH analysis, while LD half-decay separated LD-driven homozygosity from true IBD signals. This approach identified three genome-sensitive parameters as critical to detection accuracy: SNP density, maximum inter-marker gap and minimum ROH length. Despite divergent optimal thresholds (high-contiguity: 0.5–0.8 SNP/kb, 80–100 kb gap, 30 kb minimum length; fragmented: 4 SNP/kb, 20 kb gap, 10 kb minimum length), F_ROH values remained consistent (r = 0.953) and matched pedigree expectations (F_PED = 0.25), demonstrating scalability and reproducibility. This represents the first ROH parameter optimization method tailored to crustacean genomes, providing a foundation for accurate F_ROH calculation and enhanced cross-study comparability in shrimp and other aquaculture species.

Supplementary Information

The online version contains supplementary material available at 10.1038/s41598-026-37622-8.

Keywords: Penaeus vannamei, Runs of homozygosity (ROH), Genome structure, Inbreeding estimation, Parameter optimization

Subject terms: Biotechnology, Computational biology and bioinformatics, Genetics

Introduction

Inbreeding, defined as the production of offspring from the mating of closely related individuals, increases homozygosity and reduces genetic diversity. This process raises the risk that recessive deleterious alleles become homozygous, leading to inbreeding depression¹. Inbreeding depression is the primary detrimental effect of inbreeding and is mainly characterized by reduced fitness and reproductive performance², together with frequent declines in growth traits in livestock and aquaculture species^3,4. Despite these potential disadvantages, controlled inbreeding combined with precise estimation and planned mating strategies can also benefit breeding programs by purging deleterious alleles, enhancing uniformity and enabling fine mapping of target traits^5,6. Therefore, precise inbreeding assessment is critical for balancing genetic improvement with the maintenance of biodiversity^7,8.

The degree of inbreeding is quantified by the inbreeding coefficient (F). F has traditionally been estimated through pedigree-based path analysis⁹, but such estimates accumulate errors across generations and are sensitive to pedigree recording mistakes^10,11. Genomic approaches improve accuracy: F_HOM quantifies excess homozygosity¹², while F_GRM uses the genomic relationship matrix¹³. However, neither method can distinguish identity by descent (IBD) from identity by state (IBS), which can lead to overestimation of the true inbreeding coefficient¹⁴. Runs of homozygosity (ROH) analysis¹⁵ (F_ROH) overcomes this limitation by directly identifying contiguous homozygous segments caused by inbreeding, and is now considered the most accurate genomic estimator of F¹⁶.

Over the past decade, ROH analysis has become a standard tool in livestock genomics. It has been widely applied in cattle, pigs, sheep, horses and chickens to quantify genomic inbreeding, identify loci associated with inbreeding depression, and investigate selection signatures^17–21. These studies demonstrate that ROH provides robust and reliable insights for breeding programs and population genetics research.

In contrast, the application of ROH in marine aquaculture species remains limited and faces methodological challenges. First, existing ROH detection methods rely on sliding-window scans in tools such as PLINK^22,23, but their results are highly sensitive to parameter settings. Default parameters optimized for livestock SNP arrays are frequently used in aquatic studies²⁴, even though they are poorly suited to shrimp genomic features such as high heterozygosity, abundant repetitive sequences, and uneven marker distribution^25,26. Second, in penaeid shrimp and other crustaceans, reference genome assemblies often remain highly fragmented, with short contig N50 values²⁷ (the contig length such that 50% of the assembled genome is contained in contigs of at least this length) and numerous assembly gaps. Such limitations may disrupt ROH continuity and introduce false signals, yet the effects of genome fragmentation on ROH inference have rarely been evaluated systematically. Third, without species-specific parameter optimization, ROH analyses may yield inconsistent F_ROH values, reducing reproducibility^28,29 and limiting their application in breeding decisions. Collectively, these limitations underscore the urgent need for optimized ROH detection strategies tailored to shrimp and other crustacean species.

In this study, we established full-sib inbred families of whiteleg shrimp (Penaeus vannamei) and performed whole-genome resequencing to systematically evaluate PLINK parameters for ROH detection. To explicitly assess how genome assembly continuity influences ROH identification and parameter sensitivity, and to verify methodological accuracy, we conducted all analyses on two reference genomes with contrasting assembly qualities. Based on this comparison, we identified optimal thresholds for eight parameters, with a particular focus on three critical parameters: SNP density, inter-marker gap, and minimum ROH length. To our knowledge, this is the first systematic assessment of ROH parameter optimization in penaeid shrimp, and the proposed framework provides a foundation for accurate estimation of genomic inbreeding coefficients as well as methodological guidance for other aquaculture species.

Materials and methods

Inbred family construction, sequencing and genotyping

The inbred families of P. vannamei were established from a commercial breeding population maintained by BLUP Aquabreed Co., Ltd. (Weifang, Shandong Province, China). The base population was established in 2018 using several batches of domestically sourced broodstock and has been maintained through a family-based selective breeding program. In June 2023, thirteen inbred families (expected F_PED = 0.25) of the G6 generation were established through artificial insemination of full-sib pairs. Non-inbred parental stocks (F_PED ≈ 0) served as controls. When the inbred progeny reached ~ 20 g body weight, five families were randomly selected for analysis. Ten individuals per family were sampled; muscle tissues were dissected, frozen in liquid nitrogen, and stored at − 80 °C. Whole-genome resequencing of the 50 inbred progeny and 10 parental samples was conducted by Novogene Co., Ltd.

To evaluate the impact of genome assembly continuity on SNP genotyping and ROH parameter optimization, clean reads were aligned to two reference genomes with contrasting assembly quality using the GTX hardware acceleration platform (Genetalks, China), which integrates FPGA-based implementations of the Smith–Waterman algorithm for sequence alignment and of GATK HaplotypeCaller for variant calling³⁰. SNP genotyping followed the GATK multi-sample best practices. To ensure data quality, SNPs were removed if they met any of the following criteria: QD (Quality by Depth) < 2.0, ReadPosRankSum < −8.0, FS (FisherStrand) > 60.0, or MQ (Mapping Quality) < 40.0. No MAF or LD filtering was applied, as such procedures may reduce the sensitivity of ROH detection²⁸.
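As an illustration of the hard-filtering rule above, the sketch below flags a SNP for removal when any one of the four annotation cut-offs is violated. It assumes the annotations have already been parsed from a VCF INFO string into floats; the function names are hypothetical, and treating a missing annotation as "not failing" is an assumption modelled on GATK VariantFiltration's default behaviour, not a detail reported in the text.

```python
from typing import Dict, Mapping, Optional

# Hard-filter cut-offs quoted in the text; failing ANY one removes the SNP.
QD_MIN = 2.0
READPOSRANKSUM_MIN = -8.0
FS_MAX = 60.0
MQ_MIN = 40.0


def parse_info(info_field: str) -> Dict[str, float]:
    """Parse a VCF INFO string such as 'QD=1.5;FS=3.2;DB' into numeric values."""
    out: Dict[str, float] = {}
    for item in info_field.split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            try:
                out[key] = float(value)
            except ValueError:
                pass  # non-numeric annotations are ignored in this sketch
    return out


def fails_hard_filter(info: Mapping[str, Optional[float]]) -> bool:
    """True if the SNP violates any cut-off; absent annotations do not fail (assumption)."""
    qd = info.get("QD")
    rprs = info.get("ReadPosRankSum")
    fs = info.get("FS")
    mq = info.get("MQ")
    return (
        (qd is not None and qd < QD_MIN)
        or (rprs is not None and rprs < READPOSRANKSUM_MIN)
        or (fs is not None and fs > FS_MAX)
        or (mq is not None and mq < MQ_MIN)
    )
```

For example, `fails_hard_filter(parse_info("QD=1.5;FS=3.2;MQ=60.0"))` returns `True` because QD falls below 2.0, while a record passing all four checks is kept.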

The two reference genomes used in this study are summarized as follows: Reference genome A was the first publicly available genome²⁵ (NCBI: ASM378908V1). Reference genome B was a high-contiguity assembly recently generated by our research team, with substantial improvements in contiguity and completeness (Supplementary Table S1).

Minimum SNP number in sliding window and ROH

ROH segments were identified using PLINK’s sliding-window approach, in which a SNP is assigned to a candidate homozygous segment when the proportion of homozygous scanning windows overlapping it reaches a predefined threshold. Final ROHs were defined by applying additional filters on the other parameters: (a) maximum number of heterozygous or missing loci permitted per sliding window (--homozyg-window-het, --homozyg-window-missing); (b) maximum allowed gap between consecutive SNPs (--homozyg-gap); (c) minimum SNP density (--homozyg-density); (d) minimum segment length (--homozyg-kb). The framework for optimizing all eight ROH detection parameters is presented in Fig. 1.
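The scanning logic can be sketched in simplified form for one individual on one chromosome. This is not PLINK's implementation: the genotype codes, function names and boundary handling are assumptions, and the density filter is omitted for brevity. Windows of `window_snp` SNPs slide one SNP at a time; a window counts as homozygous if it holds no more than the allowed heterozygous and missing calls; a SNP becomes a candidate when the fraction of homozygous windows covering it reaches the threshold; candidate runs are then split at large gaps and kept only if they pass the minimum SNP count and minimum length.

```python
from typing import List, Tuple

# Hypothetical genotype codes for one individual's calls along a chromosome.
HOM, HET, MISS = 0, 1, 2


def candidate_snps(calls: List[int], window_snp: int, window_het: int,
                   window_missing: int, threshold: float) -> List[bool]:
    """Flag SNPs whose share of overlapping homozygous windows reaches threshold."""
    n = len(calls)
    hits = [0] * n    # homozygous windows covering each SNP
    covers = [0] * n  # all windows covering each SNP
    for start in range(n - window_snp + 1):  # window slides one SNP at a time
        win = calls[start:start + window_snp]
        homozygous = (win.count(HET) <= window_het
                      and win.count(MISS) <= window_missing)
        for i in range(start, start + window_snp):
            covers[i] += 1
            hits[i] += int(homozygous)
    return [c > 0 and h / c >= threshold for h, c in zip(hits, covers)]


def call_segments(pos_bp: List[int], flags: List[bool], min_snp: int,
                  min_kb: float, max_gap_kb: float) -> List[Tuple[int, int]]:
    """Join consecutive candidate SNPs into runs, split runs at gaps larger than
    max_gap_kb, and keep runs with >= min_snp SNPs spanning >= min_kb."""
    runs: List[List[int]] = []
    current: List[int] = []
    for i, flag in enumerate(flags):
        gap_ok = not current or (pos_bp[i] - pos_bp[current[-1]]) / 1000 <= max_gap_kb
        if flag and gap_ok:
            current.append(i)
        else:
            runs.append(current)
            current = [i] if flag else []
    runs.append(current)
    return [(pos_bp[r[0]], pos_bp[r[-1]]) for r in runs
            if len(r) >= min_snp and (pos_bp[r[-1]] - pos_bp[r[0]]) / 1000 >= min_kb]
```

With 20 homozygous calls followed by 10 heterozygous calls at 1 kb spacing, a 5-SNP window and a 0.05 threshold, the first 20 SNPs are flagged and returned as a single 19 kb segment; inserting a 100 kb gap in the middle splits it in two.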

Fig. 1 — Optimization framework for the eight parameters in ROH detection.

The minimum number of SNPs per ROH (--homozyg-snp) was calculated using the formula proposed by Lencz, et al.³¹ and Purfield, et al.³²:

$$L = \frac{\ln\left(\dfrac{\alpha}{n_s \cdot n_i}\right)}{\ln\left(1 - \mathrm{het}\right)} \qquad (1)$$

where L is the minimum number of SNPs defining an ROH segment; n_s is the number of SNPs per individual; n_i is the total number of samples; α is the acceptable false-positive ROH rate (set to 0.05); and het is the mean heterozygosity across all SNPs.
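The formula of Lencz et al. and Purfield et al., L = ln(α / (n_s · n_i)) / ln(1 − het), is easy to evaluate directly. The sketch below is a minimal version; the inputs in the example are illustrative only, not the study's SNP count or heterozygosity, and rounding the result up to an integer for use with --homozyg-snp is an assumption, since the text does not state the rounding rule.

```python
import math


def min_snps_per_roh(n_snps: int, n_individuals: int, het: float,
                     alpha: float = 0.05) -> float:
    """L = ln(alpha / (n_s * n_i)) / ln(1 - het)  (Lencz et al.; Purfield et al.)."""
    if not 0.0 < het < 1.0:
        raise ValueError("mean heterozygosity must lie strictly between 0 and 1")
    return math.log(alpha / (n_snps * n_individuals)) / math.log(1.0 - het)


# Illustrative inputs only, not the values used in this study.
l_min = min_snps_per_roh(n_snps=1_000, n_individuals=10, het=0.5)
print(round(l_min, 2), math.ceil(l_min))  # 17.61 18
```

Because both logarithms are negative, more SNPs, more samples or lower mean heterozygosity all increase L.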

The minimum SNP number per sliding window (--homozyg-window-snp) was equated to the minimum SNP requirement in ROH segment (--homozyg-snp), following the methodological framework of Meyermans, et al.²⁸.

Threshold, heterozygotes and missing SNPs of sliding window

The sliding window threshold (--homozyg-window-threshold) was calculated according to the formula proposed by Meyermans, et al.²⁸:

$$t = \operatorname{floor}\left(\frac{N_{out} + 1}{N_{window}},\ 3\right) \qquad (2)$$

where N_out is the number of outer SNPs on either side of a homozygous segment that should not be included in the final ROH, and N_window is the sliding window size (in SNPs). The “+1” allows the first tolerated SNP within the ROH boundary, and “,3” indicates flooring the result to three decimals. Considering the characteristics of our resequencing data, we adopted N_out = 1; with N_window = 110 this gives (1 + 1)/110 ≈ 0.0182 and thus a threshold of 0.018, meaning that one outer SNP on each side of the homozygous segment is discarded.
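The threshold formula of Meyermans et al., t = floor((N_out + 1) / N_window, 3), can be checked numerically. This minimal sketch assumes N_window equals the 110-SNP sliding window used here; the function name is hypothetical. Exact rational arithmetic is used so that the flooring step is not disturbed by binary floating-point error.

```python
import math
from fractions import Fraction


def window_threshold(n_out: int, window_snp: int) -> float:
    """t = floor((N_out + 1) / N_window, 3): the ratio floored to three decimals."""
    exact = Fraction(n_out + 1, window_snp)   # exact ratio, no rounding yet
    return math.floor(exact * 1000) / 1000    # floor to three decimal places


print(window_threshold(1, 110))  # 0.018
```

With N_out = 1 the function returns 0.018, the value adopted in this study, compared with PLINK's default of 0.05.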

The maximum number of heterozygous genotypes allowed per sliding window (--homozyg-window-het) was set to 3, reflecting the higher genotyping error rates typically observed in whole-genome resequencing relative to SNP arrays³³. For the missing genotype threshold (--homozyg-window-missing), we estimated an empirical value from the distribution of missing loci across scanning windows: genome-wide SNPs were partitioned into consecutive, non-overlapping windows, with window size set to the minimum SNP requirement for ROH segments (--homozyg-snp). Within this empirical non-overlapping window set, the average number of missing loci per window was calculated, and the threshold was set above this average to balance ROH detection sensitivity against overall genomic coverage.
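A minimal sketch of the empirical non-overlapping window set and the missing-genotype rule follows. It assumes that windows are built separately for each chromosome or scaffold, that each input is a 0/1 missing-call indicator per SNP for one individual, and that a trailing block shorter than the window size is dropped; none of these details are stated in the text, and all names are hypothetical. The rule "smallest integer above the largest average" is likewise an assumed reading of "set above this average value".

```python
import math
from statistics import mean
from typing import List, Sequence


def non_overlapping_windows(n_snps: int, window_snp: int) -> List[range]:
    """Consecutive, non-overlapping blocks of window_snp SNP indices.

    A trailing block shorter than window_snp is dropped (assumption).
    """
    return [range(s, s + window_snp)
            for s in range(0, n_snps - window_snp + 1, window_snp)]


def mean_missing_per_window(is_missing: Sequence[int], window_snp: int) -> float:
    """Average number of missing calls per window for one SNP series (0/1 flags)."""
    windows = non_overlapping_windows(len(is_missing), window_snp)
    return float(mean(sum(is_missing[i] for i in w) for w in windows))


def missing_threshold(window_means: Sequence[float]) -> int:
    """Smallest integer strictly above the largest observed average (assumed rule)."""
    return math.floor(max(window_means)) + 1


print(missing_threshold([4.214, 3.886]))  # 5, matching --homozyg-window-missing
```

Applied to the averages reported for the two references (4.214 and 3.886), this rule yields the value of 5 used in this study.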

Minimum SNP density for ROH

In penaeid shrimp and other crustaceans, estimating the minimum SNP density parameter (--homozyg-density) from scaffold- or contig-level averages is unreliable because their genome assemblies are highly fragmented, and averages computed over large genomic fragments fail to capture the local marker spacing relevant for ROH detection. To address this, we again used the empirical non-overlapping window set described in the section “Threshold, heterozygotes and missing SNPs of sliding window”. The optimal minimum SNP density parameter was determined as follows. First, all eight parameters were set to their most permissive thresholds (Table 1) to eliminate constraints on ROH identification. These thresholds corresponded to 100% genomic coverage, defined as the proportion of scanning windows in the empirical non-overlapping window set that meet the criteria. Subsequently, the minimum SNP density threshold was increased stepwise to evaluate its effects on genomic coverage of the empirical window set and on F_ROH estimates in both the inbred and control groups. The optimal parameter was selected according to the following criteria: (a) > 99% genomic coverage; (b) no significant reduction in mean F_ROH in either group; (c) no significant reduction in mean ROH length, because such a reduction could indicate inappropriate thresholds that artificially fragment continuous ROH segments.
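Criterion (a) can be sketched as a coverage sweep over the empirical window set. This is a minimal sketch under stated assumptions: each window is taken to span from its first to its last SNP, the function names are hypothetical, and criteria (b) and (c) are not reproduced because they require F_ROH values from actual PLINK runs. The second element of each row is the inverse density, since PLINK's --homozyg-density flag is expressed in kb per SNP.

```python
from typing import List, Sequence, Tuple


def window_density(start_bp: Sequence[int], end_bp: Sequence[int],
                   window_snp: int) -> List[float]:
    """SNPs per kb for each window of the empirical non-overlapping set."""
    return [window_snp / max((e - s) / 1000, 1e-9)  # guard zero-length windows
            for s, e in zip(start_bp, end_bp)]


def coverage(densities: Sequence[float], min_snp_per_kb: float) -> float:
    """Percentage of windows whose SNP density meets the candidate threshold."""
    return 100.0 * sum(d >= min_snp_per_kb for d in densities) / len(densities)


def density_sweep(densities: Sequence[float], candidates: Sequence[float],
                  min_coverage: float = 99.0) -> List[Tuple[float, float, float, bool]]:
    """Rows of (threshold SNPs/kb, PLINK kb/SNP value, coverage %, meets criterion a)."""
    rows = []
    for d in sorted(candidates):
        cov = coverage(densities, d)
        rows.append((d, 1.0 / d, cov, cov > min_coverage))
    return rows
```

In practice the sweep would be run over the candidate densities in Table 2, and only thresholds that also pass criteria (b) and (c) would be retained.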

Table 1.

The calculated permissive parameters for two reference genomes.

Parameter	Permissive value (Reference A)	Permissive value (Reference B)
--homozyg-snp	110	110
--homozyg-density (kb/SNP)	20	50
--homozyg-gap (kb)	70	700
--homozyg-kb (kb)	8	4
--homozyg-window-snp	110	110
--homozyg-window-het	3	3
--homozyg-window-missing	5	5
--homozyg-window-threshold	0.018	0.018


Note: Permissive parameters are defined as sets where, during the evaluation of any single parameter, all others are maintained at thresholds ensuring 100% genomic coverage, thereby preventing additional constraints on ROH detection.

Maximum SNP gap for ROH

To define the permissible maximum gap between consecutive SNPs (--homozyg-gap) within ROH segments, we built two marker interval databases from the genotype data of the two reference genomes. The three largest gaps were excluded to minimize bias from extreme values. The optimal gap threshold was determined with the same approach used for the density parameter (see the section “Minimum SNP density for ROH”), starting from an initial threshold that ensured 100% genomic coverage while all other seven parameters were kept at full coverage. The gap threshold was then decreased stepwise, and the optimal value was selected using the same optimization criteria.
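A marker interval database of this kind can be sketched as a sorted list of gaps between consecutive SNPs, computed only within each chromosome or scaffold so that no gap spans two sequences (an assumption consistent with, but not stated in, the text). A gap threshold's coverage is then the share of scanning windows whose largest internal gap does not exceed it. Function names are hypothetical.

```python
from typing import Dict, List, Sequence


def inter_marker_gaps_kb(positions: Dict[str, Sequence[int]],
                         n_exclude: int = 3) -> List[float]:
    """Sorted gaps (kb) between consecutive SNPs on the same sequence, with the
    n_exclude largest gaps dropped as extreme values."""
    gaps: List[float] = []
    for pos in positions.values():
        p = sorted(pos)
        gaps.extend((b - a) / 1000 for a, b in zip(p, p[1:]))
    gaps.sort()
    return gaps[:max(0, len(gaps) - n_exclude)] if n_exclude > 0 else gaps


def gap_coverage(window_max_gap_kb: Sequence[float], max_gap_kb: float) -> float:
    """Percentage of windows whose largest internal gap is within max_gap_kb."""
    ok = sum(g <= max_gap_kb for g in window_max_gap_kb)
    return 100.0 * ok / len(window_max_gap_kb)
```

The stepwise search then lowers `max_gap_kb` from the fully permissive value (70 kb for Reference A, 700 kb for Reference B) and tracks coverage alongside F_ROH.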

Minimum ROH length

Two complementary criteria were used to define the minimum ROH length (--homozyg-kb). (1) Scanning window length distribution: we calculated all window lengths in the empirical non-overlapping window set, excluding the ten largest windows to minimize bias from extreme values. The optimal length parameter was determined by analyzing how F_ROH varied across candidate values within an empirically bounded interval. The lower bound corresponded to the smallest observed window length in the set, while the upper bound was derived from the fixed SNP count (--homozyg-snp 110) and the minimum density threshold established in the section “Minimum SNP density for ROH”. (2) Linkage disequilibrium (LD) half-decay distance, defined as the genomic distance (in base pairs) at which the average r² decays to half its maximum value. This distance served as a benchmark to prevent false-positive short ROHs arising from stochastic LD fluctuations. Published LD half-decay distances for P. vannamei breeding populations provide reference values ranging from approximately 18 kb to 30 kb^34–36.
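The study takes its LD half-decay distances from published work. For readers who wish to derive such a value from their own data, the sketch below shows one common approach under stated assumptions: the input is binned mean r² sorted by distance (how the bins are produced is not specified here), and the crossing point at half the maximum r² is found by linear interpolation between the two bracketing bins. The function name is hypothetical.

```python
from typing import Sequence


def ld_half_decay_bp(dist_bp: Sequence[float], mean_r2: Sequence[float]) -> float:
    """Distance at which binned mean r^2 first falls to half its maximum.

    Bins must be sorted by distance; the crossing point is found by linear
    interpolation between the two bins that bracket r2_max / 2.
    """
    r2 = list(mean_r2)
    r2_max = max(r2)
    target = r2_max / 2
    for i in range(r2.index(r2_max), len(r2) - 1):
        r_a, r_b = r2[i], r2[i + 1]
        if r_b <= target <= r_a:
            if r_a == r_b:  # flat segment exactly at the target value
                return float(dist_bp[i])
            d_a, d_b = dist_bp[i], dist_bp[i + 1]
            return d_a + (r_a - target) * (d_b - d_a) / (r_a - r_b)
    raise ValueError("mean r^2 never falls to half its maximum in these bins")
```

For bins at 0, 10, 20 and 30 kb with mean r² of 0.4, 0.3, 0.1 and 0.05, the half value 0.2 is crossed midway between 10 and 20 kb, giving 15 kb.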

F_ROH calculation and statistical analysis

ROH segments were identified on both reference genomes using a full set of optimized parameters. Individual inbreeding coefficients (F_ROH) were calculated according to the formula proposed by McQuillan, et al.¹⁵:

$$F_{ROH} = \frac{\sum L_{ROH}}{\sum L_{AUTO}} \qquad (3)$$

where ∑ L_ROH is the total length of an individual’s ROH segments, and ∑ L_AUTO is the total chromosome length. Statistical comparisons of F_ROH estimates obtained under different parameter settings were performed using one-way ANOVA with Tukey’s HSD post hoc tests (α = 0.05) in Python’s statsmodels (v0.14.4). Pearson correlation coefficients were computed with SciPy (v1.13.1), and visualizations were generated with matplotlib (v3.9.3).
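A dependency-free sketch of the F_ROH calculation and the cross-reference correlation is shown below. It assumes a PLINK 1.9 `.hom` output table with a header row containing `IID` and `KB` columns and a user-supplied total chromosome length in kb; function names are hypothetical. The study itself used SciPy for Pearson correlations and statsmodels for ANOVA and Tukey's HSD; here a hand-written Pearson coefficient stands in for SciPy so the example runs with the standard library alone.

```python
import math
from collections import defaultdict
from typing import Dict, Sequence


def froh_from_hom(hom_text: str, total_chrom_kb: float) -> Dict[str, float]:
    """F_ROH = sum(L_ROH) / sum(L_AUTO) per individual from a PLINK .hom table.

    Only the IID and KB columns are used, located by header name. Individuals
    without any ROH do not appear in a .hom file and would need F_ROH = 0.
    """
    rows = [line.split() for line in hom_text.strip().splitlines()]
    header, records = rows[0], rows[1:]
    iid_col, kb_col = header.index("IID"), header.index("KB")
    roh_kb: Dict[str, float] = defaultdict(float)
    for rec in records:
        roh_kb[rec[iid_col]] += float(rec[kb_col])
    return {iid: kb / total_chrom_kb for iid, kb in roh_kb.items()}


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation, e.g. between per-individual F_ROH from two references."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)
```

Given F_ROH dictionaries for Reference A and Reference B, the correlation would be computed over the individuals present in both, ordered identically.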

Results

Descriptive statistics of the empirical non-overlapping window set

Following quality control, 32,240,946 and 33,226,287 high-quality SNPs were retained for Reference A and Reference B, respectively, across 50 inbred progeny and 10 parents. Based on formula (1), the minimum number of SNPs required for an ROH segment and for a sliding window was 110 for the study population. A total of 293,119 and 302,080 scanning windows were generated for the two references. Marker density and window length differed substantially between references. In the non-overlapping window set of Reference A, SNP density ranged from 0.583 to 582.011 SNPs/kb (mean 56.355 SNPs/kb) and the average window length was 4.104 kb, whereas Reference B showed lower marker density (mean 30.595 SNPs/kb; range 0.026–1486.486 SNPs/kb) but a longer average window length (8.083 kb). These results confirm substantial differences in the genomic architecture of the two references, suggesting the need for reference-specific ROH detection parameters.

The average number of missing genotypes per scanning window was 4.214 and 3.886 in the two references. Accordingly, we set the maximum allowed number of missing genotypes (--homozyg-window-missing) to 5, slightly above these observed averages.

Effects of density thresholds on ROH identification

Based on SNP density profiles from the empirical non-overlapping window set, initial density thresholds were set to 0.05 SNPs/kb (Reference A) and 0.02 SNPs/kb (Reference B), both achieving complete genomic coverage. Changes in F_ROH and average ROH length as thresholds increased are shown in Table 2.

Table 2.

F_ROH and mean ROH length across density thresholds for two references.

Reference genome	Density threshold (SNPs/kb)	Genomic coverage (%)	F_ROH, inbred	F_ROH, parental	Average length¹, inbred (kb)	Average length¹, parental (kb)
A	0.5	100.000	0.262 ± 0.037^a	0.074 ± 0.009^a	32.111 ± 2.439^a	18.497 ± 1.659^a
	1	99.999	0.262 ± 0.037^a	0.074 ± 0.009^a	32.110 ± 2.439^a	18.497 ± 1.659^a
	2.5	99.874	0.260 ± 0.037^a	0.073 ± 0.010^a	31.979 ± 2.411^a	18.301 ± 1.655^a
	4	99.396	0.256 ± 0.037^a	0.070 ± 0.009^a	31.663 ± 2.446^a	17.840 ± 1.653^a
	6	98.128	0.244 ± 0.035^b	0.065 ± 0.005^b	31.367 ± 2.506^a	17.210 ± 1.722^a
	8	96.312	0.233 ± 0.035^b	0.060 ± 0.008^b	31.264 ± 2.557^a	16.738 ± 1.741^a
	20	78.420	0.149 ± 0.024^c	0.034 ± 0.007^c	31.090 ± 2.557^a	15.892 ± 2.342^b
B	0.02	100.000	0.262 ± 0.048^a	0.059 ± 0.013^a	118.050 ± 58.200^a	69.842 ± 14.341^a
	0.1	99.998	0.262 ± 0.048^a	0.059 ± 0.013^a	118.041 ± 58.199^a	69.842 ± 14.346^a
	0.2	99.985	0.261 ± 0.048^a	0.059 ± 0.013^a	117.954 ± 58.200^a	69.750 ± 14.411^a
	0.5	99.849	0.256 ± 0.046^a	0.058 ± 0.013^a	116.218 ± 57.914^a	68.534 ± 14.251^a
	0.8	99.619	0.246 ± 0.044^a	0.055 ± 0.012^a	113.322 ± 57.785^a	65.653 ± 13.254^a
	1	99.456	0.240 ± 0.041^b	0.050 ± 0.012^a	111.677 ± 57.777^a	63.828 ± 12.700^a
	2	98.728	0.223 ± 0.039^b	0.047 ± 0.010^a	107.603 ± 58.435^a	59.185 ± 11.573^a
	10	85.503	0.177 ± 0.033^c	0.033 ± 0.008^b	108.419 ± 79.175^a	58.164 ± 13.804^a


¹ Average ROH segment length in the inbred and parental groups; within a column, values sharing the same superscript letter are not significantly different (Tukey’s HSD test, P > 0.05), whereas different superscript letters denote statistically significant differences.

For Reference A, increasing the density threshold to 4 SNPs/kb had a negligible effect on F_ROH in the inbred group, whereas increasing it to 6 SNPs/kb caused a significant 6.87% reduction in F_ROH (P < 0.05). A more stringent setting of 20 SNPs/kb (approximately 78% genomic coverage) reduced F_ROH by 43.13%, although mean ROH length remained statistically unchanged. In the parental group, F_ROH decreased significantly, by 12.162%, at 6 SNPs/kb. Consequently, a density threshold of 4 SNPs/kb (--homozyg-density 0.25, because PLINK expresses this parameter in kb per SNP) is recommended for Reference A.

For Reference B, F_ROH in the inbred group began to decline at 0.5 SNPs/kb, and the decline became significant at 1 SNP/kb. The parental group showed similar trends, although significant differences emerged only at 10 SNPs/kb. A more stringent threshold of 10 SNPs/kb (approximately 85% genomic coverage) substantially reduced total ROH length in both groups. Therefore, a density threshold of 0.5–0.8 SNPs/kb (1.25–2 kb per SNP in PLINK units) is recommended for Reference B.
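The percentages quoted above are relative reductions against the permissive baseline in Table 2, and the recommended densities map onto PLINK's --homozyg-density flag, which PLINK 1.9 expresses as an inverse density in kb per SNP. A small sketch of both conversions follows; the helper names are hypothetical.

```python
def percent_reduction(baseline: float, value: float) -> float:
    """Relative decrease (%) of an estimate versus the permissive baseline."""
    return 100.0 * (baseline - value) / baseline


def plink_density(snp_per_kb: float) -> float:
    """Convert SNPs/kb into PLINK's --homozyg-density unit (kb per SNP)."""
    return 1.0 / snp_per_kb


# Reference A, inbred group (Table 2): permissive 0.262 versus 0.244 at 6 SNPs/kb.
print(round(percent_reduction(0.262, 0.244), 2))  # 6.87
print(plink_density(4.0), plink_density(0.5))     # 0.25 2.0
```

The converted values, 0.25 kb/SNP for 4 SNPs/kb and 2 kb/SNP for 0.5 SNPs/kb, are the density settings listed in Table 5.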

Effects of gap thresholds on ROH identification

Comparative analysis revealed distinct gap characteristics between the reference genomes. Reference A contained 32,240,909 gaps (range: 0.001–69.498 kb; mean: 38 bp), whereas Reference B contained more gaps (33,226,240) with a substantially longer maximum (687.56 kb) and mean (74 bp) gap length. The two references also responded differently to gap thresholds (Table 3). For Reference A, a 10 kb gap threshold reduced F_ROH by 4.580% in the inbred group, accompanied by a significant 5.512% decrease in mean ROH length, while the parental group became sensitive at a 5 kb threshold. These results support a threshold range of 10–20 kb. For Reference B, the inbred group displayed significant F_ROH reductions at 50 kb, and an 80 kb gap threshold is therefore recommended.

Table 3.

F_ROH and mean ROH length across gap thresholds for two references.

Reference genome	Maximum gap (kb)	Genomic coverage (%)	F_ROH, inbred	F_ROH, parental	Average length¹, inbred (kb)	Average length¹, parental (kb)
A	70	100.000	0.262 ± 0.037^a	0.074 ± 0.009^a	32.110 ± 2.439^a	18.497 ± 1.659^a
	50	99.999	0.262 ± 0.037^a	0.074 ± 0.009^a	32.109 ± 2.438^a	18.497 ± 1.659^a
	30	99.999	0.262 ± 0.037^a	0.073 ± 0.010^a	32.079 ± 2.438^a	18.466 ± 1.667^a
	20	99.999	0.261 ± 0.037^a	0.073 ± 0.010^a	31.978 ± 2.433^a	18.402 ± 1.666^a
	10	99.990	0.250 ± 0.036^a	0.069 ± 0.009^a	30.340 ± 2.261^b	17.598 ± 1.586^a
	5	99.947	0.218 ± 0.032^b	0.059 ± 0.008^b	24.557 ± 1.522^c	15.194 ± 1.323^b
	1	99.327	0.082 ± 0.013^c	0.019 ± 0.004^c	8.189 ± 0.082^d	7.568 ± 0.186^c
B	700	100.000	0.262 ± 0.048^a	0.059 ± 0.013^a	118.105 ± 58.190^a	69.872 ± 14.330^a
	500	99.999	0.262 ± 0.048^a	0.059 ± 0.013^a	118.041 ± 58.199^a	69.840 ± 14.346^a
	300	99.999	0.261 ± 0.048^a	0.059 ± 0.013^a	117.916 ± 58.138^a	69.763 ± 14.324^a
	100	99.998	0.256 ± 0.046^a	0.058 ± 0.013^a	115.174 ± 55.867^a	67.907 ± 13.861^a
	80	99.997	0.254 ± 0.046^a	0.057 ± 0.013^a	114.133 ± 54.742^a	67.205 ± 13.614^a
	50	99.995	0.244 ± 0.043^b	0.052 ± 0.012^a	110.870 ± 49.769^a	64.944 ± 12.698^a
	20	99.981	0.231 ± 0.041^b	0.050 ± 0.011^a	99.783 ± 39.999^a	58.483 ± 10.293^a
	10	99.955	0.215 ± 0.037^b	0.045 ± 0.010^a	83.527 ± 24.920^b	51.223 ± 8.105^b
	2	99.451	0.102 ± 0.019^c	0.021 ± 0.005^a	21.309 ± 0.586^c	17.812 ± 0.521^c


Effects of minimum length on ROH identification

Analysis of the empirical non-overlapping window set identified candidate length threshold ranges covering 99% of scanning windows: 0.5–70 kb for Reference A and 1–160 kb for Reference B. Table 4 shows the impact of length thresholds on F_ROH. In both genomes, F_ROH decreased approximately linearly with increasing thresholds, without clear inflection points to guide optimization. The length distributions of ROH segments under different thresholds are detailed in Supplementary Table S2. Because F_ROH trends could not guide parameter selection, we adopted the empirically established LD half-decay distance (reported as 18–30 kb in P. vannamei). For Reference B, a 30 kb threshold yielded an average F_ROH of 0.247 and, compared with the permissive parameter, reduced the number of segments shorter than 100 kb by 29.231% in the inbred group and by 52.381% in the parental group (average F_ROH = 0.0516). For Reference A, however, a 30 kb threshold produced a lower F_ROH (0.192) in the inbred group, likely because assembly fragmentation splits long ROHs into shorter pieces that fall below the threshold. We therefore recommend a more permissive 10 kb threshold for Reference A, as this value corresponds to the first inflection point of the F_ROH decline with increasing thresholds and produces F_ROH values equivalent to those obtained with Reference B’s LD-based threshold.

Table 4.

F_ROH of different length thresholds for two references.

Reference genome	Minimum length (kb)	F_ROH, inbred	F_ROH, parental
A	0.5	0.267 ± 0.037^a	0.079 ± 0.009^a
	5	0.260 ± 0.037^a	0.071 ± 0.010^a
	10	0.247 ± 0.037^b	0.061 ± 0.010^b
	20	0.220 ± 0.036^c	0.047 ± 0.009^c
	30	0.192 ± 0.033^d	0.037 ± 0.008^c
	50	0.141 ± 0.027^e	0.024 ± 0.006^d
	70	0.103 ± 0.021^f	0.016 ± 0.004^d
B	1	0.265 ± 0.047^a	0.062 ± 0.013^a
	20	0.253 ± 0.048^a	0.054 ± 0.013^a
	30	0.247 ± 0.048^a	0.051 ± 0.013^a
	60	0.226 ± 0.050^b	0.047 ± 0.013^a
	100	0.200 ± 0.052^b	0.041 ± 0.013^b
	140	0.178 ± 0.055^c	0.037 ± 0.013^b
	160	0.159 ± 0.057^c	0.035 ± 0.013^b


ROH estimates based on the two sets of optimal parameters

The two sets of optimal parameters derived from genomes with distinct assembly characteristics are presented in Table 5. Differences between the assemblies led to distinct optimal density, gap, and length thresholds for ROH detection. With Reference B’s optimal parameters, the average F_ROH was 0.237 in the inbred group and 0.049 in the control (parental) group. With Reference A’s parameters, the inbred group showed an average F_ROH of 0.238, nearly identical to the Reference B result. Individual F_ROH values for each inbred progeny are shown in Fig. 2. Despite the shared inbreeding background (F_PED = 0.25), F_ROH varied considerably among individuals, ranging from 0.155 to 0.368 with a coefficient of variation (CV) of 0.195. F_ROH estimates obtained with the two references were strongly correlated (r = 0.953; Fig. 3A). F_ROH was also highly correlated with F_HOM (r = 0.944; Fig. 3B), although F_HOM values were systematically higher. While cumulative ROH length in the inbred group remained largely consistent between references, both average ROH length and ROH number differed significantly (Fig. 4; parental data in Supplementary Figure S1). With Reference B, the longest ROH segment was 13.793 Mb, the average length was 174.970 kb, and 29.233% of segments exceeded 500 kb (Supplementary Table S3). In contrast, Reference A produced a maximum ROH segment of 1.072 Mb, an average length of 41.580 kb, and only 0.420% of segments longer than 500 kb. The distributions of ROH segments across linkage groups in the inbred group are shown in Fig. 5. ROH segments were distributed relatively evenly along the chromosomes, but those identified using Reference A were noticeably more fragmented, although the linkage groups of the two genomes have not yet been aligned with each other. Parental group distribution results are shown in Supplementary Figure S2.
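The two optimal parameter sets in Table 5 can be expressed as PLINK 1.9 command lines. The sketch below only assembles the argument list; the input prefix `refB_snps`, the output naming, the explicit `--homozyg` flag and the `--allow-extra-chr` flag (needed when chromosome or scaffold names fall outside PLINK's default chromosome set) are assumptions for illustration, not settings reported in the study.

```python
from typing import Dict, List, Union

Number = Union[int, float]

# Reference-specific settings transcribed from Table 5 (density in PLINK's kb/SNP).
REFERENCE_SPECIFIC: Dict[str, Dict[str, Number]] = {
    "A": {"density": 0.25, "gap": 20, "kb": 10},
    "B": {"density": 2, "gap": 80, "kb": 30},
}
# Settings shared by both references (Table 5).
SHARED: Dict[str, Number] = {
    "snp": 110, "window-snp": 110, "window-het": 3,
    "window-missing": 5, "window-threshold": 0.018,
}


def plink_roh_command(bfile: str, reference: str) -> List[str]:
    """Assemble a PLINK 1.9 --homozyg command for one reference's parameter set."""
    cmd = ["plink", "--bfile", bfile, "--allow-extra-chr", "--homozyg"]
    for name, value in {**REFERENCE_SPECIFIC[reference], **SHARED}.items():
        cmd += [f"--homozyg-{name}", str(value)]
    return cmd + ["--out", f"{bfile}_roh"]


print(" ".join(plink_roh_command("refB_snps", "B")))
```

Keeping the reference-specific and shared parameters in separate dictionaries mirrors the finding that only density, gap and minimum length depend on assembly structure.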

Table 5.

Two sets of optimal parameters and the calculated F_ROH.

Reference genome	--homozyg-density (kb/SNP)	--homozyg-gap (kb)	--homozyg-kb	--homozyg-snp	--homozyg-window-snp	--homozyg-window-het	--homozyg-window-missing	--homozyg-window-threshold	F_ROH, inbred	F_ROH, parental
A	0.25	20	10	110	110	3	5	0.018	0.238	0.056
B	2	80	30	110	110	3	5	0.018	0.237	0.049


Fig. 2 — The distribution of F_ROH of all inbred progenies with different reference genomes. The violin plot shows the kernel density distribution of all samples, with a box plot overlaid on top (the middle line represents the median, the box represents the interquartile range). Black dots represent the F_ROH values of each individual (with slight jitter applied to prevent overlap).

Fig. 3 — Correlation analysis of individual ROH estimates. A: Comparison between results using two reference genomes; B: Comparison between F_ROH and F_HOM.

Fig. 4 — Number and mean length of ROH segments in inbred progeny across reference genomes. A: Number of ROH segments; B: Mean ROH length.

Fig. 5 — Distribution of ROH segments across linkage groups in inbred progenies. A: Reference A; B: Reference B.

Discussion

Traditional pedigree-based inbreeding coefficients are limited by recording errors and cannot capture individual variability. ROH analysis has been widely applied in livestock for genomic inbreeding estimation, yet its application in shrimp has remained scarce, largely because parameters originally designed for livestock SNP arrays are poorly suited to the fragmented genomes and variable SNP distributions of crustaceans. In this study, we systematically optimized PLINK parameters for ROH detection in P. vannamei using two reference genomes of differing assembly quality. Our results demonstrate that assembly quality strongly affects parameter choice, and that default thresholds established for model organisms or array data are not suitable for shrimp. While appropriate parameter thresholds enabled accurate estimation of total ROH length with both references, the incomplete assembly tended to yield larger numbers of shorter ROH fragments. Notably, we observed significant variation in individual F_ROH among inbred progenies, underscoring the need for individual-level analysis. The standardized ROH analysis framework with optimized parameters established in this study provides a reliable technical foundation for precise inbreeding evaluation in shrimp and other crustaceans.

Medium- to high-density SNP arrays have been the standard tool for ROH detection in livestock^14,37,38. However, growing evidence indicates that data from these arrays underestimate inbreeding levels by missing short IBD segments^32,39. For example, reducing the cattle SNP array density to 50 K compromised the detection of 0.5–1 Mb ROHs but overestimated segments larger than 5 Mb^32,39. Kardos, et al.⁴⁰ therefore emphasized that whole-genome resequencing (WGR) is necessary for accurate identification of all IBD fragments. Our study represents the first systematic evaluation of ROH accuracy in shrimp, leveraging 10× resequencing data from inbred progeny to generate over 32 million SNPs across two assemblies, thereby providing a robust basis for parameter optimization.

To evaluate whether parameter settings are reasonable, Meyermans, et al.²⁸ proposed the genomic coverage of sliding windows as a benchmark. The permissive parameter sets we established achieved complete coverage, allowing systematic evaluation of threshold sensitivity. Among the eight PLINK parameters, three core thresholds (SNP density, maximum SNP gap, and minimum ROH length) showed 3- to 10-fold differences between reference genomes, while the remaining five parameters were stable across assemblies. This suggests that density, gap, and length thresholds are primarily influenced by genome architecture, whereas heterozygote, missing, and sliding-window thresholds are determined more by sequencing quality and species-specific patterns. The minimum SNP number per sliding window and per ROH segment was calculated with a formula based on species and population characteristics. Our results also confirmed that a reduced sliding-window threshold (0.018, N_out = 1) excluded spurious short ROHs better than the commonly used default (0.05), underscoring the need for species-specific adjustment.

PLINK defaults for SNP density (1 SNP/50 kb) and maximum gap (1,000 kb) were less effective for P. vannamei WGS data, in which marker distributions are highly heterogeneous. These defaults were originally designed for SNP arrays with relatively uniform spacing and failed to filter false ROHs in sparsely covered regions. For instance, the maximum inter-SNP intervals observed in Reference A (69.5 kb) and Reference B (687.6 kb) were substantially smaller than the 1,000 kb default, so the default gap filter would never split a segment. While Meyermans, et al.⁴¹ proposed chromosome-wide simulations for medium-density arrays to optimize density and gap thresholds, such methods assume a uniform marker distribution and therefore underestimate variability at local scales. When applied to our study, the suggested contig-level density threshold (17.39 SNPs/kb) covered all 44 linkage groups but only 84.25% of scanning windows in our empirical window set. Given the high variability in SNP density and gap distributions across genomic regions and potential ROH segments in WGS data, we developed a window-based empirical strategy that identifies optimal parameters by combining a non-overlapping window set with coverage analysis. As shown by our validation results (Tables 2 and 3), this approach achieved high genome coverage while effectively filtering false-positive ROH segments.

The minimum length parameter is critical because it determines whether short homozygous segments are attributed to inbreeding or to linkage disequilibrium (LD). ROH detection sensitivity has been reported to depend on LD patterns, which are largely driven by recombination rate; recombination rate, in turn, is a crucial determinant of ROH segment continuity and informs the choice of optimal length thresholds^33,42,43. For both reference genomes, we observed a strong inverse relationship between F_ROH and the length threshold (Table 4). To distinguish inbreeding-derived ROHs from homozygous segments that merely reflect LD, we applied a minimum length threshold corresponding to the LD half-decay distance in P. vannamei (18–30 kb)^34–36, following the rationale of Stoffel et al.³⁸ that such a distance effectively separates these two types of segments. Applying a 30 kb threshold in Reference B removed ~ 29% of ROHs < 100 kb with only a 6.8% reduction in F_ROH, indicating efficient removal of LD-induced segments. However, excessively stringent thresholds risk discarding genuine long ROHs that have been fragmented into shorter segments, as shown in sheep. Stoffel et al.³⁸ found that about 4% of detected ROH segments could not be attributed to IBD at a 1,200 kb threshold, but this proportion fell to just 1% with a lower 400 kb threshold (below their 600 kb LD half-decay distance). This supports the view that many short segments represent fragments of long ROHs rather than false positives. This fragmentation effect was even more pronounced in shrimp. The prevalence of short ROH segments across the shrimp genome suggests exceptionally high recombination rates in specific genomic regions^16,44. Our LD decay analysis supports this inference, revealing an extremely rapid decay pattern in the study population: the squared correlation coefficient (R²) dropped from 0.311 to 0.156 over only 25 bp in Reference A, and from 0.317 to 0.158 within 28 bp in Reference B.
Such a precipitous decline indicates both short LD spans and intense local recombination activity⁴⁵. Notably, our genome-wide analysis based on Reference B indicates that recombination rates in P. vannamei are an order of magnitude higher (10–23 times) than in model organisms such as Drosophila melanogaster (~ 2.5 cM/Mb)⁴⁶.
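The LD half-decay distance used above as a length threshold can be estimated from pairwise r² values. The sketch below assumes one common definition, the distance at which binned mean r² falls to half of its peak, and uses a hypothetical bin width and made-up r² values; dedicated LD software may bin, smooth or fit the decay curve differently, and the function name is illustrative.

```python
from collections import defaultdict


def ld_half_decay(pairs, bin_bp=10):
    """Estimate the distance (bp) at which mean r^2 falls to half of its peak.

    pairs: iterable of (distance_bp, r2) for SNP pairs. Pairs are averaged in
    distance bins of width bin_bp (the bin midpoint is used as its distance),
    and the half-peak crossing is linearly interpolated between adjacent bins.
    Returns None if the binned curve never drops to half of its peak.
    """
    sums, counts = defaultdict(float), defaultdict(int)
    for dist, r2 in pairs:
        b = int(dist // bin_bp)
        sums[b] += r2
        counts[b] += 1
    curve = [((b + 0.5) * bin_bp, sums[b] / counts[b]) for b in sorted(sums)]
    if not curve:
        return None
    peak_idx = max(range(len(curve)), key=lambda i: curve[i][1])
    half = curve[peak_idx][1] / 2.0
    if half <= 0:
        return None
    # Walk outward from the peak until mean r^2 first drops to half the peak.
    for (d0, r0), (d1, r1) in zip(curve[peak_idx:], curve[peak_idx + 1:]):
        if r1 <= half:
            return d0 + (r0 - half) * (d1 - d0) / (r0 - r1)
    return None


# Hypothetical r^2 values for SNP pairs at short distances (not study data).
pairs = [(5, 0.30), (5, 0.32), (15, 0.25), (25, 0.10), (35, 0.08)]
print(round(ld_half_decay(pairs, bin_bp=10), 1))  # 21.3
```

With a peak mean r² of 0.31, the half-decay point is where the curve crosses 0.155, which mirrors how the R² values reported above (0.311 to 0.156) are read.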

Incomplete genome assemblies can also produce short ROH segments. In the high-quality Reference B, ROHs ranged from 30 kb to 14 Mb, with mid-to-long segments (> 500 kb) accounting for ~ 29% of the total and mean F_ROH deviating by only 5% from the pedigree expectation. In contrast, the fragmented Reference A produced predominantly short ROHs (< 500 kb, 99.5% of the total) and underestimated F_ROH by 24% with the same 30 kb length threshold. Scaffold breaks, assembly gaps and misassemblies disrupt continuous homozygous regions, mimicking recombination breakpoints and causing systematic underestimation of long ROHs. Because of limited assembly continuity, ROH segments in Reference A were more likely to be interrupted by assembly gaps than by recombination events, so LD patterns could not reliably guide parameter selection for this genome. Adjusting thresholds can partially mitigate this bias: lowering the minimum length threshold to 10 kb (approximately one-half to one-third of the LD half-decay distance) improved F_ROH estimates in Reference A. The 10 kb threshold was chosen for Reference A for two reasons: (a) it yielded F_ROH values equivalent to those obtained with the LD-based 30 kb threshold in Reference B (F_ROH = 0.247); and (b) it corresponds to the first inflection point in the decline of F_ROH with increasing minimum length thresholds, as supported by the significance analysis in Table 4A. This indicates that even incomplete assemblies can yield reliable F_ROH estimates when parameters are carefully optimized. Future studies should investigate whether this framework can be applied to medium- or high-density SNP arrays to improve their estimation precision.
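F_ROH is commonly computed as the summed length of an individual's ROHs divided by the length of the genome covered by markers. The exact denominator used in this study is defined in its Methods; here it is assumed to be a single genome length in kb. The sketch below shows why raising the minimum length threshold lowers F_ROH and how the resulting curve can be tabulated when looking for an inflection point. The ROH lengths, genome size and function names are hypothetical, and the significance testing behind Table 4A is not reproduced.

```python
def f_roh(roh_lengths_kb, genome_length_kb, min_length_kb=0.0):
    """F_ROH = (sum of ROH lengths >= min_length_kb) / genome length, both in kb."""
    if genome_length_kb <= 0:
        raise ValueError("genome length must be positive")
    kept = [length for length in roh_lengths_kb if length >= min_length_kb]
    return sum(kept) / genome_length_kb


def f_roh_curve(roh_lengths_kb, genome_length_kb, thresholds_kb):
    """F_ROH at each candidate minimum-length threshold, in the order given."""
    return {t: f_roh(roh_lengths_kb, genome_length_kb, t) for t in thresholds_kb}


# Hypothetical individual: ROH lengths (kb) on a 100,000 kb assembly.
rohs = [5, 8, 12, 15, 25, 40, 120, 600, 2_000, 14_000]
curve = f_roh_curve(rohs, 100_000, [0, 10, 30, 100])
for t, f in curve.items():
    print(f"min length {t:>3} kb: F_ROH = {f:.4f}")
```

In this toy example, dropping the six ROHs shorter than 100 kb lowers F_ROH by less than 1%, because short segments contribute little total length; conversely, an assembly that breaks long ROHs into short pieces loses F_ROH quickly as the threshold rises.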

Although the optimal parameter sets derived from the two references differed significantly, both also deviated substantially (3- to 20-fold) from the default parameters commonly used for terrestrial species. For example, the density threshold ranged from 0.5 to 4 SNP/kb in P. vannamei, compared with 0.01 SNP/kb in goats²¹. The gap threshold of 20–120 kb was much lower than the 250–1,000 kb typically applied in livestock species^21,47. Similarly, the minimum ROH length was considerably shorter than the 500–1,000 kb thresholds used in pigs and horses^19,48. These discrepancies are likely attributable to distinct genomic characteristics among species. This not only underscores the necessity of species-specific parameter tuning but also illustrates the broad applicability of our framework for addressing the lack of standardized ROH parameters across species.
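To make the comparison with terrestrial defaults concrete, the sketch below turns a parameter set into a PLINK 1.9 command line without running it. PLINK's `--homozyg-density` takes the maximum kb per SNP (default 50), so 0.5–4 SNP/kb corresponds to 2–0.25 kb/SNP; this assumes the PLINK build in use accepts fractional values for that flag. Only the density, gap, length and 0.018 window-threshold values come from the text; the SNP-count, heterozygote and missing-call values in the usage line are placeholders, and the `RohParams` and `plink_homozyg_args` names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RohParams:
    """Hypothetical container for the eight PLINK 1.9 --homozyg thresholds."""
    min_density_snp_per_kb: float  # converted to kb/SNP for --homozyg-density
    max_gap_kb: float              # --homozyg-gap
    min_length_kb: float           # --homozyg-kb
    min_snps: int                  # --homozyg-snp
    window_snp: int                # --homozyg-window-snp
    window_het: int                # --homozyg-window-het
    window_missing: int            # --homozyg-window-missing
    window_threshold: float        # --homozyg-window-threshold


def plink_homozyg_args(p: RohParams, bfile: str, out: str) -> List[str]:
    """Build (but do not run) a PLINK 1.9 ROH command as an argument list."""
    if p.min_density_snp_per_kb <= 0:
        raise ValueError("density must be positive")
    kb_per_snp = 1.0 / p.min_density_snp_per_kb  # PLINK expects max kb per SNP
    return [
        "plink", "--bfile", bfile,
        "--allow-extra-chr",  # permit non-human chromosome/contig codes
        "--homozyg",
        "--homozyg-density", f"{kb_per_snp:g}",
        "--homozyg-gap", f"{p.max_gap_kb:g}",
        "--homozyg-kb", f"{p.min_length_kb:g}",
        "--homozyg-snp", str(p.min_snps),
        "--homozyg-window-snp", str(p.window_snp),
        "--homozyg-window-het", str(p.window_het),
        "--homozyg-window-missing", str(p.window_missing),
        "--homozyg-window-threshold", f"{p.window_threshold:g}",
        "--out", out,
    ]


# Reference A-style density, gap, length and window threshold from the text;
# min_snps, window_snp, window_het and window_missing are placeholders only.
ref_a = RohParams(4, 20, 10, 50, 50, 1, 5, 0.018)
print(" ".join(plink_homozyg_args(ref_a, "shrimp_refA", "roh_refA")))
```

Keeping the arguments as a list lets the same parameter set be passed to `subprocess.run` once a PLINK binary is available, and makes the kb/SNP conversion explicit rather than something each user has to remember.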

Accurate ROH detection in aquatic species such as shrimp directly improves the estimation of inbreeding levels. It reduces biases arising from inaccurate pedigree data and enables precise monitoring of genetic diversity dynamics in breeding programs, thereby offering guidance for the conservation of endangered varieties and the management of nucleus breeding populations. Further applications of ROH in aquaculture breeding include: (i) constructing ROH-derived genomic relationship matrices (G_ROH) to optimize genomic mating schemes and enhance genetic gain⁴⁹; (ii) reconstructing population history, selection pressures and genetic structure, in particular distinguishing recent from historical inbreeding events on the basis of ROH segment length^14,20; and (iii) identifying genomic regions under selection or potential deleterious variants via high-frequency ROH regions (i.e., ROH islands)^20,29,37. Collectively, these applications underpin the long-term genetic progress of breeding programs.

Conclusions

This study establishes the first parameter-optimized ROH framework for penaeid shrimp, defining critical thresholds for resequencing data by integrating genome assembly characteristics. The results underscore the necessity of species- and assembly-specific optimization for accurate inbreeding assessment, with three parameters being particularly important: SNP density, maximum inter-marker gap, and minimum ROH length. Despite considerable differences in ROH length distributions between the two reference genomes, F_ROH estimates remained highly consistent. We further observed substantial F_ROH variation among full-sibling individuals, highlighting the limitations of pedigree-based inbreeding coefficients and emphasizing the need for individual-level genomic assessment. The optimized PLINK parameters and methodology not only enable more precise F_ROH calculation in shrimp but also enhance methodological standardization and cross-study comparability for other aquaculture species.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary Material 1 (419 KB, PDF)

Acknowledgements

We thank BLUP Aquabreed Co., Ltd. for providing sample support during this experiment.

Author contributions

X.Z., Q.F., J.K. and S.L. are the principal investigators and project managers of this research; H.Z., P.D., J.S., J.K., K.L., X.M., Q.X., and S.L. provided the resources and cultivated experimental seedlings in the early stage; M.L., G.Q., and P.D. conducted data curation and investigation; J.K., X.M., Q.F. and S.L. supervised the project and handled project administration; Q.F. and S.L. were involved in conceptualization, methodology, writing-review & editing, and funding acquisition; X.Z. and Q.F. performed the formal analysis and were involved in writing the original draft; all authors contributed to the manuscript review and approval. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by Key R&D Program of Shandong Province, China (Competitive Innovation Platform, No. 2024CXPT071-2; No. 2024LZGC038); The National Natural Science Foundation of China (No. 32573496); The Natural Science Foundation of Shandong Province (ZR2025MS316); Independent Research Projects of the State Key Laboratory of Mariculture Biobreeding and Sustainable Goods (BRESG-JB202407); Central Public-interest Scientific Institution Basal Research Fund, CAFS (2025XT0701); Hainan Seed Industry Laboratory (B25H1JC03); Central Public-interest Scientific Institution Basal Research Fund, CAFS (No. 2020TD26); Agriculture Research System of China (CARS-48); Taishan Scholars Program.

Data availability

Sequencing data generated for this project have been deposited in the Genome Sequence Archive (GSA) at the China National Center for Bioinformation (CNCB) under accession number CRA032813.

Declarations

Competing interests

The authors declare no competing interests.

Institutional review board statement

The experiments conducted in this study involved P. vannamei, which is classified as a lower invertebrate. According to the relevant national and institutional regulations, experiments involving lower invertebrates, such as P. vannamei, do not require ethical approval, as they are not classified under vertebrates or higher invertebrates that typically necessitate such oversight.

Footnotes

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Contributor Information

Qiang Fu, Email: fuqiang@ysfri.ac.cn.

Sheng Luan, Email: luansheng@ysfri.ac.cn.

References

1. Charlesworth, D. & Willis, J. H. The genetics of inbreeding depression. Nat. Rev. Genet. 10, 783–796 (2009).
2. Frankham, R. et al. Inbreeding and loss of genetic diversity increase extinction risk. A Practical Guide Genetic Manage. Fragmented Anim. Plant. Populations, 31–48 (2019).
3. Doekes, H., Bijma, P. & Windig, J. 184. Inbreeding depression in livestock: comparing trait groups and inbreeding measures. Proc. 12th World Congr. Genet. Appl. to Livest. Prod, 790–793 (2022).
4. Keys, S. J. et al. Comparative growth and survival of inbred and outbred Penaeus (marsupenaeus) japonicus, reared under controlled environment conditions: indications of inbreeding depression. Aquaculture 241, 151–168 (2004).
5. Samantara, K., Reyes, V. P., Agrawal, N., Mohapatra, S. R. & Jena, K. K. Advances and trends on the utilization of multi-parent advanced generation intercross (MAGIC) for crop improvement. Euphytica 217, 189 (2021).
6. Bijma, P., Woolliams, J. & Van Arendonk, J. Genetic gain of pure line selection and combined crossbred purebred selection with constrained inbreeding. Anim. Sci. 72, 225–232 (2001).
7. Wan, Q. H., Wu, H., Fujihara, T. & Fang, S. G. Which genetic marker for which conservation genetics issue? Electrophoresis 25, 2165–2176 (2004).
8. Ouborg, N., Vergeer, P. & Mix, C. The rough edges of the conservation genetics paradigm for plants. J. Ecol. 94, 1233–1248 (2006).
9. Wright, S. Coefficients of inbreeding and relationship. Am. Nat. 56, 330–338 (1922).
10. Ron, M., Blanc, Y., Band, M., Ezra, E. & Weller, J. Misidentification rate in the Israeli dairy cattle population and its implications for genetic improvement. J. Dairy Sci. 79, 676–681 (1996).
11. Oikawa, T., Kunieda, T. & Sato, K. Approximated variance of inbreeding coefficient due to multiple paths in a pedigree. Nihon Chikusan Gakkaiho 71, 348–352 (2000).
12. Yang, J., Lee, S. H., Goddard, M. E. & Visscher, P. M. GCTA: a tool for genome-wide complex trait analysis. Am. J. Hum. Genet. 88, 76–82 (2011).
13. VanRaden, P. M. Efficient methods to compute genomic predictions. J. Dairy Sci. 91, 4414–4423 (2008).
14. Zhan, H. et al. Genome-wide patterns of homozygosity and relevant characterizations on the population structure in Piétrain pigs. Genes 11, 577 (2020).
15. McQuillan, R. et al. Runs of homozygosity in European populations. Am. J. Hum. Genet. 83, 359–372 (2008).
16. Peripolli, E. et al. Runs of homozygosity: current knowledge and applications in livestock. Anim. Genet. 48, 255–271 (2017).
17. Kim, E. S., Sonstegard, T. S., Van Tassell, C. P., Wiggans, G. & Rothschild, M. F. The relationship between runs of homozygosity and inbreeding in Jersey cattle under selection. PloS One 10, e0129967 (2015).
18. Adams, S. M. et al. Investigating inbreeding in the Turkey (Meleagris gallopavo) genome. Poult. Sci. 100, 101366 (2021).
19. Polak, G. et al. Suitability of pedigree information and genomic methods for analyzing inbreeding of Polish cold-blooded horses covered by conservation programs. Genes 12, 429 (2021).
20. Wu, X. et al. Genome-wide scan for runs of homozygosity identifies candidate genes in Wannan black pigs. Anim. Bioscience 34, 1895 (2021).
21. Luigi-Sierra, M. G. et al. Genomic patterns of homozygosity and inbreeding depression in Murciano-Granadina goats. J. Anim. Sci. Biotechnol. 13, 35 (2022).
22. Purcell, S. et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am. J. Hum. Genet. 81, 559–575 (2007).
23. Chang, C. C. et al. Second-generation PLINK: rising to the challenge of larger and richer datasets. Gigascience 4, s13742–s13015 (2015).
24. D’ambrosio, J. et al. Genome-wide estimates of genetic diversity, inbreeding and effective size of experimental and commercial rainbow trout lines undergoing selective breeding. Genet. Selection Evol. 51, 1–15 (2019).
25. Zhang, X. et al. Penaeid shrimp genome provides insights into benthic adaptation and frequent molting. Nat. Commun. 10, 356 (2019).
26. Yuan, J., Zhang, X., Li, F. & Xiang, J. Genome sequencing and assembly strategies and a comparative analysis of the genomic characteristics in Penaeid shrimp species. Front. Genet. 12, 658619 (2021).
27. Miller, J. R., Koren, S. & Sutton, G. Assembly algorithms for next-generation sequencing data. Genomics 95, 315–327 (2010).
28. Meyermans, R., Gorssen, W., Buys, N. & Janssens, S. How to study runs of homozygosity using PLINK? A guide for analyzing medium density SNP data in livestock and pet species. BMC Genom. 21, 1–14 (2020).
29. Macciotta, N. P. et al. The distribution of runs of homozygosity in the genome of river and swamp buffaloes reveals a history of adaptation, migration and crossbred events. Genet. Selection Evol. 53, 1–21 (2021).
30. McKenna, A. et al. The genome analysis toolkit: a mapreduce framework for analyzing next-generation DNA sequencing data. Genome Res. 20, 1297–1303 (2010).
31. Lencz, T. et al. Runs of homozygosity reveal highly penetrant recessive loci in schizophrenia. Proc. Natl. Acad. Sci. 104, 19942–19947 (2007).
32. Purfield, D. C., Berry, D. P., McParland, S. & Bradley, D. G. Runs of homozygosity and population history in cattle. BMC Genet. 13, 1–11 (2012).
33. Ceballos, F. C., Hazelhurst, S. & Ramsay, M. Assessing runs of homozygosity: A comparison of SNP array and whole genome sequence low coverage data. BMC Genom. 19, 106. 10.1186/s12864-018-4489-0 (2018).
34. Wang, Q. et al. A novel candidate gene associated with body weight in the Pacific white shrimp Litopenaeus vannamei. Front. Genet. 10, 520 (2019).
35. Wang, H. et al. Selection signatures of Pacific white shrimp Litopenaeus vannamei revealed by whole-genome resequencing analysis. Front. Mar. Sci. 9, 844597 (2022).
36. Garcia, B. F., Bonaguro, Á., Araya, C., Carvalheiro, R. & Yáñez, J. M. Application of a novel 50K SNP genotyping array to assess the genetic diversity and linkage disequilibrium in a farmed Pacific white shrimp (Litopenaeus vannamei) population. Aquaculture Rep. 20, 100691 (2021).
37. Shi, L. et al. Estimation of inbreeding and identification of regions under heavy selection based on runs of homozygosity in a large white pig population. J. Anim. Sci. Biotechnol. 11, 1–10 (2020).
38. Stoffel, M., Johnston, S., Pilkington, J. & Pemberton, J. M. Genetic architecture and lifetime dynamics of inbreeding depression in a wild mammal. Nat. Commun. 12, 2972 (2021).
39. Zhang, Q., Calus, M. P., Guldbrandtsen, B., Lund, M. S. & Sahana, G. Estimation of inbreeding using pedigree, 50k SNP chip genotypes and full sequence data in three cattle breeds. BMC Genet. 16, 88 (2015).
40. Kardos, M., Taylor, H. R., Ellegren, H., Luikart, G. & Allendorf, F. W. Genomics advances the study of inbreeding depression in the wild. Evol. Appl. 9, 1205–1218. 10.1111/eva.12414 (2016).
41. Meyermans, R. et al. Unraveling the genetic diversity of Belgian milk sheep using medium-density SNP genotypes. Anim. Genet. 51, 258–265 (2020).
42. Hewett, A. M., Stoffel, M. A., Peters, L., Johnston, S. E. & Pemberton, J. M. Selection, recombination and population history effects on runs of homozygosity (ROH) in wild red deer (Cervus elaphus). Heredity (Edinb) 130, 242–250. 10.1038/s41437-023-00602-z (2023).
43. Howrigan, D. P., Simonson, M. A. & Keller, M. C. Detecting autozygosity through runs of homozygosity: A comparison of three autozygosity detection algorithms. BMC Genom. 12, 460. 10.1186/1471-2164-12-460 (2011).
44. Kirin, M. et al. Genomic runs of homozygosity record population history and consanguinity. PloS One 5, e13996 (2010).
45. Baudat, F., Imai, Y. & De Massy, B. Meiotic recombination in mammals: localization and regulation. Nat. Rev. Genet. 14, 794–806 (2013).
46. Comeron, J. M., Ratnappan, R. & Bailin, S. The many landscapes of recombination in Drosophila melanogaster. (2012).
47. Herrero-Medrano, J. M. et al. Whole-genome sequence analysis reveals differences in population management and selection of European low-input pig breeds. BMC Genom. 15, 601 (2014).
48. Ai, H., Huang, L. & Ren, J. Genetic diversity, linkage disequilibrium and selection signatures in Chinese and Western pigs revealed by genome-wide SNP markers. PloS One 8, e56001 (2013).
49. Zhao, F. et al. Genetic gain and inbreeding from simulation of different genomic mating schemes for pig improvement. J. Anim. Sci. Biotechnol. 14, 87 (2023).