American Journal of Human Genetics. 2021 Jul 1;108(8):1488–1501. doi: 10.1016/j.ajhg.2021.06.005

Genomic partitioning of inbreeding depression in humans

Loic Yengo 1, Jian Yang 1,2,3, Matthew C Keller 4,5, Michael E Goddard 6,7, Naomi R Wray 1,8, Peter M Visscher 1
PMCID: PMC8387293  PMID: 34214457

Summary

Across species, offspring of related individuals often exhibit significant reduction in fitness-related traits, known as inbreeding depression (ID), yet the genetic and molecular basis for ID remains elusive. Here, we develop a method to quantify enrichment of ID within specific genomic annotations and apply it to human data. We analyzed the phenomes and genomes of ∼350,000 unrelated participants of the UK Biobank and found, on average over 11 traits, significant enrichment of ID within genomic regions with high recombination rates (>21-fold; p < $10^{-5}$), with conserved function across species (>19-fold; p < $10^{-4}$), and within regulatory elements such as DNase I hypersensitive sites (∼5-fold; p = $8.9 \times 10^{-7}$). We also quantified enrichment of ID within trait-associated regions and found suggestive evidence that genomic regions contributing to additive genetic variance in the population are enriched for ID signal. We find strong correlations between functional enrichment of SNP-based heritability and that of ID (r = 0.8, standard error: 0.1). These findings provide empirical evidence that ID is most likely due to many partially recessive deleterious alleles in low linkage disequilibrium regions of the genome. Our study suggests that functional characterization of ID may further elucidate the genetic architectures and biological mechanisms underlying complex traits and diseases.

Keywords: inbreeding depression, functional annotation, genomic partitioning, genome-wide association studies

Introduction

Mating between genetically related individuals, i.e., inbreeding, has detrimental phenotypic consequences in resulting offspring.1, 2, 3, 4, 5, 6, 7, 8 This phenomenon, known as inbreeding depression (ID), has been reported for multiple human traits, such as stature, intelligence, lung function, and fertility.5, 6, 7,9 Inbreeding results in increased homozygosity across the genome, and ID can be explained by the increased homozygosity of (partially) recessive deleterious alleles, which exposes their detrimental effects on fitness and fitness-related traits. Detecting ID and quantifying its strength in human populations is particularly challenging because inbreeding is less common and less extreme and the effective population size is larger in humans than in some other animal species.10,11 As a consequence, sample sizes of hundreds of thousands of participants or more are required to obtain reliable estimates of ID.12 Over recent years, the advent of large-scale biobanks such as the UK Biobank (UKB) has allowed detection and quantification of ID in a wide variety of traits, including hip-to-waist ratio, heart rate, facial aging, or hemoglobin levels, which previously were not known to be associated with inbreeding.7

Despite a growing catalog of traits associated with inbreeding, the genetic basis of ID in humans remains elusive. Notably, genes, biological pathways, or functions involved in ID are still largely unknown. Previous studies have attempted to identify genes involved in ID by testing the association between runs of homozygosity (ROHs)13 and traits or diseases.14 Although the latter studies have not robustly identified such genes in humans,15,16 the approach has been more powerful in other animal species with smaller effective population sizes, such as cattle.17,18 In addition, studies in Drosophila19, 20, 21 have tested the association between inbreeding coefficients (F) and gene expression and have thereby identified genes implicated in or affected by inbreeding. Overall, contrasting findings from human and non-human studies highlight the limited statistical power to dissect ID in humans and thus call for larger sample sizes and new analytical methods to overcome this challenge.

Here, we develop a method to detect and quantify ID at a finer scale by breaking its effects down to genomic regions with specific annotations. An enrichment of ID within a given genomic region means that homozygosity in that region (or regions sharing the same annotation) has a disproportional effect on the mean value of a trait as compared to the effect of homozygosity at any equally sized, randomly selected genomic region. We chose genomic annotations as a higher-level unit (as opposed to lower-level units, such as genes, for example) in order to measure the functional impact on traits of homozygosity at particular types of variants. Using this method, we analyzed genetic and phenotypic data from a large sample of ∼350,000 unrelated participants of the UKB. Our method utilizes (but is not restricted to) functional annotations introduced in Finucane et al.,22 Gazal et al.,23 and Hujoel et al.,24 covering diverse properties of the human genome. We specifically quantify the degree to which variants located within these functional genomic regions contribute to ID across 11 traits. These 11 traits were selected on the basis of prior evidence of ID11 and are height, hip-to-waist ratio, handgrip strength (average of left and right hand), lung function measured as the peak expiratory flow, visual acuity, auditory acuity, number of years of education, fluid intelligence score, cognitive function measured as the mean time to correctly identify matches, fertility measured as the number of offspring, and overall health measured as the number of diseases diagnosed in an individual. We use our method to quantify the enrichment of ID within trait-associated genomic regions identified through genome-wide association studies (GWASs) and finally compare functional enrichments of heritability with that of ID.

Material and methods

Quantification of the enrichment of ID via individual-level data

Model definition

Let y denote a quantitative trait subject to ID. We assume an infinitesimal model where all SNPs contribute to ID. We can therefore write y as

$$y = \sum_{j=1}^{M} \beta_j F_j + e \qquad \text{(Equation 1)}$$

where $F_j$ denotes an estimator of the inbreeding coefficient based on genotypes at SNP $j$ ($j = 1, \ldots, M$), $\beta_j$ the contribution of SNP $j$ to ID, and $e$ a residual term capturing all other effects, including additive genetic and environmental effects.

Estimation of ID from inbreeding measures based on genome-wide average homozygosity (as opposed to runs of homozygosity [ROHs]) relies on the assumption that $\beta_j$ is random and has a constant expectation $E[\beta_j] = b/M$, where $b$ denotes the genome-wide ID. Other inbreeding measures, such as the excess of homozygosity measure, rely on the assumption that $E[\beta_j]$ is proportional to mean heterozygosity $h_j = 2p_j(1 - p_j)$, where $p_j$ is the minor allele frequency (MAF) at SNP $j$ in the population. We previously introduced a flexible MAF and linkage disequilibrium (LD)-stratified method to estimate ID, which we showed to yield unbiased estimation of ID even when the relationship between $E[\beta_j]$ and $h_j$ is misspecified.9 Here, we further extend this approach by allowing SNPs within different genomic annotations to have a specific contribution to ID. We propose the following model:

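$$E[\beta_j] = -\frac{b_0}{M} + \sum_{k=1}^{K}\left( z_{jk}\left[\frac{b_k}{m_k}\right] + (1 - z_{jk})\left[\frac{b - b_k}{M - m_k}\right] \right) \qquad \text{(Equation 2)}$$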

where $b$ denotes the genome-wide ID parameter, $b_0 = (K - 1)b$, $z_{jk}$ the indicator of membership of SNP $j$ to annotation $k$ ($k = 1, \ldots, K$), $b_k$ the contribution of annotation $k$ to ID, and $m_k$ ($1 \le m_k \le M$) the number of SNPs in annotation $k$. We hereafter denote $\pi_k = m_k/M$ as the proportion of SNPs in annotation $k$. Under the null hypothesis that each SNP contributes equally to ID, each genomic annotation is expected to contribute proportionally to the number of SNPs it contains. Therefore, we can define the enrichment of ID in annotation $k$, hereafter denoted $\delta_k$, as the ratio of the contribution of annotation $k$ to ID, $b_k$, to its expected contribution, $\pi_k b$, i.e., $\delta_k = b_k/(\pi_k b)$.

Combining Equation 1 and Equation 2 leads to (Appendix A)

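$$y = b F_g + \sum_{k=1}^{K}\left(\frac{b_k}{\pi_k} - b\right)\left[\left(\frac{\pi_k}{1 - \pi_k}\right)\left(\bar{F}_k - F_g\right)\right] + e \qquad \text{(Equation 3)}$$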

where $\bar{F}_k$ denotes the average inbreeding coefficient across SNPs in annotation $k$, $F_g$ denotes the average inbreeding coefficient across all $M$ SNPs, and $e$ the residual term from Equation 1.
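The step from Equations 1 and 2 to Equation 3 can be outlined as follows. This is only a sketch, not the derivation of Appendix A: it assumes $F_g = \frac{1}{M}\sum_j F_j$ and $\bar{F}_k = \frac{1}{m_k}\sum_j z_{jk} F_j$, replaces each $\beta_j$ by its expectation, and lets deviations from that expectation be absorbed into the residual.

```latex
\begin{aligned}
\sum_{j=1}^{M} E[\beta_j]\, F_j
 &= -(K-1)\, b\, F_g
    + \sum_{k=1}^{K}\left[ b_k \bar{F}_k
    + \frac{b - b_k}{M - m_k}\left(M F_g - m_k \bar{F}_k\right)\right] \\
 &= -(K-1)\, b\, F_g
    + \sum_{k=1}^{K}\left[ b\, F_g
    + \frac{b_k - \pi_k b}{1 - \pi_k}\left(\bar{F}_k - F_g\right)\right] \\
 &= b\, F_g
    + \sum_{k=1}^{K}\left(\frac{b_k}{\pi_k} - b\right)
      \frac{\pi_k}{1 - \pi_k}\left(\bar{F}_k - F_g\right).
\end{aligned}
```

The term $-(K-1)\,b\,F_g$ comes from $b_0 = (K-1)b$ and cancels all but one of the $K$ copies of $b\,F_g$ produced by the sum over annotations.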

Estimation of model parameters

Equation 3 implies that $\tau_k = b_k/\pi_k - b$ can be directly estimated by performing a multiple regression of $y$ onto $F_g$ and the $\Delta F_k = \pi_k(\bar{F}_k - F_g)/(1 - \pi_k)$ terms. Therefore, enrichment of ID in annotation $k$ can be detected with standard regression p values quantifying the statistical significance of $\tau_k$. We use the latter approach in all analyses.
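The regression described above can be illustrated on simulated data. The sketch below is a toy under stated assumptions, not the analysis code used in the study: the sample size, the proportions of SNPs per annotation, the genome-wide ID value, the enrichments, and the noise level are all made-up values, chosen so that the estimates are visibly close to the simulated truth.

```python
import numpy as np

rng = np.random.default_rng(0)
n, K = 20000, 2                      # individuals and annotations (toy sizes)
pi = np.array([0.10, 0.30])          # assumed proportion of SNPs per annotation
b = -2.0                             # assumed genome-wide ID
delta_true = np.array([3.0, 1.0])    # assumed enrichments (1.0 = no enrichment)
tau_true = b * (delta_true - 1.0)    # tau_k = b_k/pi_k - b = b * (delta_k - 1)

# Toy inbreeding measures: genome-wide F_g and annotation-specific averages
F_g = rng.normal(0.0, 0.01, n)
F_bar = F_g[:, None] + rng.normal(0.0, 0.01, (n, K))

# Regressors of Equation 3: Delta F_k = pi_k (F_bar_k - F_g) / (1 - pi_k)
dF = pi * (F_bar - F_g[:, None]) / (1.0 - pi)
y = b * F_g + dF @ tau_true + rng.normal(0.0, 0.01, n)

# Ordinary least squares of y on an intercept, F_g, and the Delta F_k terms
X = np.column_stack([np.ones(n), F_g, dF])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
b_hat, tau_hat = coef[1], coef[2:]

# Enrichment on the multiplicative scale: delta_k = 1 + tau_k / b
delta_hat = 1.0 + tau_hat / b_hat
print(b_hat, tau_hat, delta_hat)
```

In real data the residual variance is far larger relative to the signal than in this toy, which is why very large samples are needed.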

It is important to note, as with methods for partitioning heritability,22 that certain combinations of annotations cannot be fitted jointly with Equation 3. Indeed, in some cases, overlap between annotations can induce a strong collinearity between inbreeding measures, which can make estimates unstable. For example, when the annotation is defined by chromosome number, fitting Fg as well as all chromosome-specific inbreeding measures will lead to identifiability issues. In the latter case, a simple solution is to choose one class of the annotation as a reference (e.g., chromosome 1) and therefore remove it from the model.

One of the challenges of our model is to provide estimates that can be interpreted in terms of enrichment (or depletion) of ID. However, enrichment is often intuitively conceptualized on a multiplicative scale, whereas the inference in our model is done on an additive scale (linear regression). Parameters $\tau_k$ and $\delta_k = 1 + \tau_k/b$ measure the same information, but $\tau_k$ is on an additive scale and is directly estimated, while $\delta_k$ is on a multiplicative scale (the scale used for interpretation) and is derived from estimated parameters. Previous studies aiming at partitioning additive genetic variance (e.g., Finucane et al.22) faced similar challenges and addressed them in the same way by using two scales for their model parameters. However, the parameterization chosen in our study leads to a much simpler relationship between $\tau_k$ and $\delta_k$, thereby allowing the p value of $\tau_k$, which is readily available from fitting the model, to be used to test the significance of $\delta_k$.

Average ID enrichment across traits

The significance of the average ID enrichment across traits (i.e., $\bar{\delta}_k$) is determined by that of the average estimate of the $\tau_k$’s across traits, hereafter denoted $\bar{\tau}_k$. Given that $\bar{\tau}_k$ is a linear combination of ordinary-least-squares (OLS) estimators, its asymptotic distribution is also Gaussian and its sampling variance is defined as

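$$\mathrm{var}(\bar{\tau}_k) = \frac{1}{T^2}\sum_{t=1}^{T}\sum_{t'=1}^{T}\rho_{t,t'}\left[\mathrm{var}(\tau_{k,t})\,\mathrm{var}(\tau_{k,t'})\right]^{1/2} \qquad \text{(Equation 4)}$$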

where $T$ is the number of traits, $\mathrm{var}(\tau_{k,t})$ is the sampling variance of the OLS estimator of $\tau_k$ for trait $t$, and $\rho_{t,t'}$ is the correlation between the OLS estimators of $\tau_{k,t}$ and $\tau_{k,t'}$. In practice, we propose to approximate $\rho_{t,t'}$ with the sample correlation between traits $t$ and $t'$, which is a valid approximation given that OLS estimators are linear transformations of the phenotypes. Therefore, we test the significance of $\bar{\tau}_k$ by using the test statistic $\bar{\tau}_k^2/\mathrm{var}(\bar{\tau}_k)$, which follows a chi-square distribution with 1 degree of freedom under the null. Finally, we consistently define the average ID enrichment across traits as $\bar{\delta}_k = 1 + \bar{\tau}_k/\bar{b}$, where $\bar{b}$ is the average ID across traits.
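A minimal sketch of the cross-trait test based on Equation 4 is given below. The function name `average_tau_test` and the toy inputs are hypothetical; the chi-square p value uses the identity that, for 1 degree of freedom, the upper tail probability equals erfc(sqrt(x/2)).

```python
import math
import numpy as np

def average_tau_test(tau, se, trait_corr):
    """Average tau_k across T traits and test it against zero (Equation 4)."""
    tau = np.asarray(tau, dtype=float)
    se = np.asarray(se, dtype=float)              # standard errors of tau_{k,t}
    rho = np.asarray(trait_corr, dtype=float)     # T x T trait correlation matrix
    T = tau.size
    tau_bar = tau.mean()
    # var(tau_bar) = (1/T^2) * sum_t sum_t' rho_{t,t'} * se_t * se_t'
    var_bar = float(se @ rho @ se) / T**2
    chi2 = tau_bar**2 / var_bar
    # Upper tail of a chi-square distribution with 1 degree of freedom
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return tau_bar, var_bar, chi2, p_value

# Toy example with two traits correlated at 0.5 (made-up numbers)
tau_bar, var_bar, chi2, p_value = average_tau_test(
    tau=[-4.0, -2.0], se=[1.0, 1.0], trait_corr=[[1.0, 0.5], [0.5, 1.0]]
)
print(tau_bar, var_bar, chi2, p_value)
```

Positive correlation between traits inflates the variance of the average relative to the independent case, so ignoring it would make the test anti-conservative.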

ID enrichment within continuous annotations

We analyzed continuous genomic annotations defined by non-negative values (e.g., recombination rate or posterior causal probability). We scaled these annotations by the largest value observed across the genome such that each SNP is assigned a value $\tilde{z}_{jk}$ between 0 and 1. We then defined $\tilde{\pi}_k$ as the mean of the $\tilde{z}_{jk}$’s across the genome (which is equivalent to the proportion of annotated SNPs when genomic annotations are binary variables) and used that value as our reference to calculate enrichments of ID for continuous annotations. Note that the use of another statistic from the distribution of the $\tilde{z}_{jk}$’s (e.g., the median) would not change statistical significance but would influence the magnitude of the estimated ID enrichment.
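The scaling of a continuous annotation can be sketched as follows. The helper names and the toy values are hypothetical, and normalizing the weighted sum by the total weight is an assumption about how an annotation-specific inbreeding measure would be formed from per-SNP estimates.

```python
import numpy as np

def scale_annotation(values):
    """Scale a non-negative annotation by its largest genome-wide value."""
    values = np.asarray(values, dtype=float)
    return values / values.max()

def annotation_weighted_F(per_snp_F, z_tilde):
    """Weighted mean of per-SNP inbreeding estimates (rows = individuals)."""
    return (per_snp_F @ z_tilde) / z_tilde.sum()

recomb_rate = np.array([0.0, 0.5, 1.0, 4.0])    # toy values for four SNPs
z_tilde = scale_annotation(recomb_rate)          # values in [0, 1]
pi_tilde = z_tilde.mean()                        # reference proportion

per_snp_F = np.array([[0.1, -0.2, 0.0, 0.3]])   # one individual, four SNPs
F_bar_k = annotation_weighted_F(per_snp_F, z_tilde)
print(z_tilde, pi_tilde, F_bar_k)
```

For a binary annotation, `z_tilde` is the 0/1 membership indicator and `pi_tilde` reduces to the proportion of annotated SNPs.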

When reporting results for continuous annotations, we used terms such as “high” or “low” (e.g., “high recombination rates” or “low nucleotide diversity”) only to indicate the direction of the effect but not to imply that the annotation was discretized in any way.

Depletion of ID enrichment

In some cases, we observed that estimates of $\tau_k$ can lead to negative values of the enrichment statistic $\delta_k$, which are difficult to interpret. We refer to this situation as “depletion of ID signal” and report instead the estimate $\breve{\delta}_k = 1 - \pi_k(\delta_k - 1)/(1 - \pi_k)$ corresponding to a transformed annotation defined by $(1 - z_{jk})$ or $(1 - \tilde{z}_{jk})$.
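The transformation to the complementary annotation follows from the two contributions summing to the genome-wide ID, i.e., $\pi_k\delta_k + (1 - \pi_k)\breve{\delta}_k = 1$. A short numerical check, with a hypothetical function name and made-up inputs:

```python
def complement_enrichment(delta_k, pi_k):
    """Enrichment of ID in the complementary annotation (1 - z_jk)."""
    return 1.0 - pi_k * (delta_k - 1.0) / (1.0 - pi_k)

# Toy case: an annotation holding 20% of SNPs with a negative enrichment of -1
pi_k, delta_k = 0.2, -1.0
delta_breve = complement_enrichment(delta_k, pi_k)

# The two contributions, weighted by SNP proportions, add up to 1
total = pi_k * delta_k + (1.0 - pi_k) * delta_breve
print(delta_breve, total)
```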

Inbreeding measures

To maximize power to detect ID enrichment across a wide range of causal allele frequencies, we use two inbreeding measures in our analyses. The first is the correlation between uniting gametes1,2 (FUNI) and the second is the proportion, FROH, of SNPs contained in >1.5 Mb-long ROHs. Note that our definition of FROH is slightly different from the widely used definition of FROH, which is the proportion of an individual's genome that is covered by their ROHs. We chose this alternative definition because it offers more flexibility for partitioning analyses while still retaining the same amount of information (correlation > 0.99; Figure S1). FUNI was previously shown to be more powerful to detect ID caused by well-tagged causal variants (e.g., SNPs with a minor allele frequency [MAF] > 1%),9,25 while FROH can perform better when alleles causing ID are rarer and therefore poorly imputed via current imputation reference panels and methods.12,26 We re-assessed the relative fraction of ID captured with either inbreeding measure and show that common and rare alleles both contribute to ID and that their relative contribution is trait specific (Figure 1).
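For orientation, a per-SNP form of the uniting-gametes estimator can be sketched as below. The text above does not spell out the formula; the expression used here is the one commonly given for this estimator in the GCTA literature and is therefore an assumption in this context, as are the function names and toy genotypes. Here `x` is the count (0, 1, or 2) of a reference allele with population frequency `p`.

```python
import numpy as np

def f_uni_per_snp(x, p):
    """Per-SNP estimate: (x^2 - (1 + 2p) x + 2 p^2) / (2 p (1 - p))."""
    x = np.asarray(x, dtype=float)
    p = np.asarray(p, dtype=float)
    return (x**2 - (1.0 + 2.0 * p) * x + 2.0 * p**2) / (2.0 * p * (1.0 - p))

def f_uni(genotypes, freqs):
    """Average the per-SNP estimates across SNPs (rows = individuals)."""
    return f_uni_per_snp(genotypes, freqs).mean(axis=1)

# Values for the three genotypes at a SNP with allele frequency 0.25
per_genotype = f_uni_per_snp([0, 1, 2], 0.25)

# Under Hardy-Weinberg proportions the expected value is zero
p = 0.25
hwe_mean = ((1 - p)**2 * per_genotype[0]
            + 2 * p * (1 - p) * per_genotype[1]
            + p**2 * per_genotype[2])
print(per_genotype, hwe_mean)
```

Heterozygotes always score −1, while homozygotes for the rarer allele receive the largest positive values, which is what makes the measure sensitive to homozygosity at well-tagged variants.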

Figure 1. Partitioning of inbreeding depression (ID) between FROH and FUNI

Left shows marginal analyses where ID is estimated with either FROH or FUNI. Right shows estimates of ID obtained from fitting FROH and FUNI jointly. Error bars represent 95% confidence intervals. Highlighted in red font: two traits (number of children and reaction time) for which ID is mostly captured by FROH. SD, standard deviation.

For ROH-based analyses, the null hypothesis is defined such that each SNP has an equal probability of being covered by identical-by-descent genomic segments, which we approximate in this study by using long ROHs. We show in the supplemental methods section that violation of this assumption because of non-uniform genomic distribution of ROHs has little impact on our results.

We used PLINK 1.9 with the --ibc command to estimate FUNI for each UKB participant. For continuous annotations, we used a custom C++ program (see data and code availability and web resources) that calculates weighted sums of per-SNP estimates of FUNI with weights proportional to the value of the annotation. For binary annotations, we calculated FUNI by only using SNPs assigned to those annotations. ROHs were called with the following PLINK command: --maf 0.05 --homozyg --homozyg-density 50 --homozyg-gap 1000 --homozyg-kb 1500 --homozyg-snp 50 --homozyg-window-het 1 --homozyg-window-missing 5 --homozyg-window-snp 50. We then used 19,476,620 imputed SNPs with MAF > 0.1%, which were annotated to 187 functional categories (web resources), and defined FROH as the proportion of these SNPs contained in a ROH. Similarly, annotation-specific inbreeding measures were calculated as the (weighted) proportion of SNPs assigned to a given annotation that is contained in a ROH.
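The SNP-based definition of FROH can be sketched as follows for one individual and one chromosome. The function name, the positions, and the ROH segment are hypothetical; in practice the segments would come from the ROH calls described above, and chromosomes would need to be handled separately.

```python
import numpy as np

def froh_snp_proportion(snp_pos, roh_segments, weights=None):
    """(Weighted) proportion of SNPs that fall inside any ROH segment."""
    snp_pos = np.asarray(snp_pos)
    in_roh = np.zeros(snp_pos.size, dtype=bool)
    for start, end in roh_segments:
        in_roh |= (snp_pos >= start) & (snp_pos <= end)
    if weights is None:
        weights = np.ones(snp_pos.size)       # genome-wide FROH
    weights = np.asarray(weights, dtype=float)
    return float(weights[in_roh].sum() / weights.sum())

snp_pos = [1_000_000, 2_000_000, 3_000_000, 4_000_000, 5_000_000]
roh = [(1_500_000, 3_200_000)]                # one toy ROH longer than 1.5 Mb

froh_all = froh_snp_proportion(snp_pos, roh)
# Annotation-specific measure: weights are the annotation values (binary here)
froh_annot = froh_snp_proportion(snp_pos, roh, weights=[0, 1, 1, 0, 0])
print(froh_all, froh_annot)
```

Passing continuous weights between 0 and 1 in place of the binary indicator gives the weighted version used for continuous annotations.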

SNP genotyping

We used allele counts at 44,741,804 SNPs genotyped and imputed in 487,409 participants of the UK Biobank27,28 (UKB). Although an extensive description of our dataset was given previously,29 we briefly summarize the main steps of data preparation. We identified 456,414 UKB participants of European ancestry by using projected principal components based on sequenced participants of the 1000 Genomes Project with known ancestry.29 We then restricted our analyses to a subset of these participants that contains 348,501 conventionally unrelated participants, i.e., whose estimated pairwise SNP-based genomic relationships are <0.05. Genomic relationships were estimated with 1,124,803 common (MAF ≥ 1%) HapMap330 SNPs via GCTA (v.1.9).31 Imputed SNPs included in our FUNI-based analyses were selected on the basis of the following criteria: MAF > 1%, p value from the Hardy-Weinberg equilibrium test > $10^{-6}$, and imputation quality statistic $r^2_{\mathrm{INFO}}$ > 0.3. After quality control, 9,326,198 imputed SNPs were included in our analyses. Genotyped SNPs used to call ROH were selected, as previously described,11 via the following criteria: missingness rate < 1%, MAF > 5%, and Hardy-Weinberg equilibrium test p value > 0.0001. Quality control was performed with PLINK 1.932 (v.1.90b6.13 64 bit from November 30, 2019).

Association between levels of inbreeding and phenotypes measured in the UKB

We tested the association between FUNI or FROH and traits by using linear regression adjusted for age at recruitment (UKB field 21022-0.0), sex, assessment center (UKB field 54-0.0), genotyping chip and batch, year of birth (UKB field 34-0.0), year of birth squared, and the top ten genetic principal components calculated via PLINK 2.0. For each trait, we excluded phenotypic values larger than 4 standard deviations and then pre-adjusted trait values for the covariates listed above. Residuals obtained from these pre-adjustment analyses were then inverse normally transformed and used as focal traits. UKB identifiers for our 11 focal traits are height (UKB field 50-0.0), hip-to-waist ratio (ratio of UKB field 49-0.0 over UKB field 48-0.0), handgrip strength (average of UKB fields 46-0.0 and 47-0.0), lung function measured as the peak expiratory flow (UKB field 3064-0.0), visual acuity measured on the logMAR scale (average of UKB field 5201-0.0 and UKB field 5208-0.0), auditory acuity measured as the speech reception threshold (average of UKB field 20019-0.0 and UKB field 20021-0.0), number of years of education (EA), fluid intelligence score (UKB field 20016-0.0), cognitive function measured as the mean time to correctly identify picture card matches (UKB field 20023-0.0), fertility measured as the number of children (for males UKB field 2405-0.0 and for females UKB field 2734-0.0), and number of diseases diagnosed, estimated as the number of International Classification of Diseases, Tenth Revision (ICD10) codes reported for UKB participants (release prior to December 2020). The North West Multi-Centre Research Ethics Committee (MREC) approved the study, and all participants in the UKB study analyzed here provided written informed consent.
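The three pre-processing steps (outlier exclusion, covariate pre-adjustment, inverse normal transformation) can be sketched as below. Several details are assumptions not stated in the text: outliers are taken to be values more than 4 SD away from the mean in either direction, the transformation is rank based with a 0.5 offset, and the function name, covariates, and simulated data are hypothetical.

```python
import numpy as np
from statistics import NormalDist

def preprocess_trait(y, covariates, max_sd=4.0):
    """Exclude outliers, residualize on covariates, inverse-normal transform."""
    y = np.asarray(y, dtype=float)
    X = np.asarray(covariates, dtype=float)
    # 1. Exclude values more than max_sd standard deviations from the mean
    keep = np.abs(y - y.mean()) <= max_sd * y.std(ddof=1)
    y, X = y[keep], X[keep]
    # 2. Pre-adjust for covariates with ordinary least squares
    X1 = np.column_stack([np.ones(y.size), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    # 3. Rank-based inverse normal transformation (0.5 offset is an assumption)
    ranks = resid.argsort().argsort() + 1
    normal = NormalDist()
    z = np.array([normal.inv_cdf((r - 0.5) / resid.size) for r in ranks])
    return z, keep

rng = np.random.default_rng(1)
n = 1000
age = rng.normal(57.0, 8.0, n)
sex = rng.integers(0, 2, n)
trait = 0.02 * age + 0.5 * sex + rng.normal(0.0, 1.0, n)
trait[0] = 50.0                                  # one gross outlier

z, keep = preprocess_trait(trait, np.column_stack([age, sex]))
print(keep.sum(), z.mean(), z.std())
```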

Genomic annotations used in individual-level analyses

We used genomic annotations previously compiled and processed by Finucane et al.,22 Gazal et al.,23 and Hujoel et al.24 The 44 genomic annotations analyzed in our study were derived from 187 annotations downloaded from the LD score regression repository (baselineLF_v2.2.UKB model; web resources).

Binary annotations include putative evolutionarily old enhancers and promoters (referred to in Hujoel et al.24 as ancient sequence age: × 2 annotations); flanking bivalent transcription start sites or enhancers from the Roadmap Epigenomics Project;33 coding; intronic; promoter; 3′ UTR and 5′ UTR genomic regions from the UCSC Genome Browser34 (post-processed by Gusev et al.35); synonymous and non-synonymous; conserved genomic regions across mammals, primates, and vertebrates (phastCons 46-way24); CTCF (annotation from Hoffman et al.36); digital genomic footprint (DGF data from ENCODE37 and post-processed by Gusev et al.35); DNase-I hypersensitive sites (ENCODE and Roadmap Epigenomics data post-processed by Trynka et al.38 and also merged with fetal DHS annotation); enhancers (merged × 2 annotations: from Andersson et al.39 and Hoffman et al.36); histone marks (merged H3K27ac annotations from Hnisz et al.40 and Kundaje et al.33 post-processed by PGC2,41 H3K4me1,38 H3K4me3,38 and H3K9ac38); enhancers and promoters from Villar et al.;42 promoters of loss-of-function-intolerant genes from ExAC (annotation from Hujoel et al.24); promoter flanking;36 repressed genomic regions;36 super enhancers;40 transcription factor binding sites;37 transcribed genomic regions;36 transcription start sites;36 weak enhancers;36 and >4 rejected substitutions from the GERP++ score.43

Continuous annotations include maximum posterior probability from fine-mapping of molecular QTL44 (expression, methylation, H3K4me1, and H3K27ac), background selection statistic (McVicker B statistic),45 predicted allele age,46 time to most recent common ancestor,47 nucleotide diversity within 10 kb, recombination rate (within 10 kb) based on the Oxford recombination map,48 CpG content within 50 kb, GERP++ score (number of substitutions; referred to as GERP NS), and number of species sharing a given putative enhancer.24 We also analyzed LD score within 1 Mb, chi-square association test statistic with a given trait, and level of heterozygosity (i.e., MAF × (1 − MAF)) as continuous annotations.

Enrichment of heritability and ID from GWAS summary statistics

Enrichment of ID in trait-associated genomic regions

We first performed a standard GWAS of each of the 11 traits included in our analyses. We tested association between SNPs and traits by using an additive model (hereafter referred to as additive GWAS) where allele counts are directly correlated with traits. We then used resulting chi-square association test statistics as a continuous annotation and estimated its corresponding ID enrichment by using the method described above.

Estimation of heritability and ID enrichment via stratified LD score regression

We quantified the enrichment of SNP-based heritability in the 44 genomic annotations described above by using stratified LD score regression22 (SLDSC). In brief, the SLDSC method, as implemented in this study, was based upon the following regression model:

$$E[\chi_j^2] = 1 + N a_{h^2} + N\sum_{k=1}^{K}\theta_{k,h^2}\left(\frac{\ell_{jk}}{M_k}\right) \qquad \text{(Equation 5)}$$

where $a_{h^2}$ measures the level of confounding (e.g., due to uncorrected population stratification) in the GWAS, $\chi_j^2$ the chi-square association statistic of SNP $j$, $N$ the GWAS sample size, $\theta_{k,h^2}$ the effect size of annotation $k$ on trait heritability ($h^2$), and $\ell_{jk}$ (the annotation-weighted LD score) and $M_k$ are defined as

$$\ell_{jk} = \sum_{i=1}^{M} z_{ik}\, r_{ij}^2 \quad \text{and} \quad M_k = \sum_{i=1}^{M} z_{ik} \qquad \text{(Equation 6)}$$

where $r_{ij}^2$ is the squared correlation of allele counts between SNP $i$ and SNP $j$.
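Equation 6 can be illustrated with a small matrix computation. This is a toy under stated assumptions: the genotypes are simulated, all SNP pairs are used rather than a local window, and no correction for the sampling bias of the squared correlations is applied, whereas the analyses described here rely on precomputed public LD scores.

```python
import numpy as np

def annotation_ld_scores(genotypes, z):
    """Return l_jk = sum_i z_ik r_ij^2 (M x K) and M_k = sum_i z_ik (K)."""
    r = np.corrcoef(genotypes, rowvar=False)   # M x M correlation of allele counts
    r2 = r**2
    ell = r2 @ z                               # row j, column k: sum_i r_ij^2 z_ik
    M_k = z.sum(axis=0)
    return ell, M_k

rng = np.random.default_rng(2)
genotypes = rng.binomial(2, 0.3, size=(500, 6)).astype(float)  # 500 people, 6 SNPs

# Column 0 contains every SNP (standard LD score); column 1 is a toy annotation
z = np.array([[1, 1], [1, 0], [1, 0], [1, 1], [1, 0], [1, 1]], dtype=float)
ell, M_k = annotation_ld_scores(genotypes, z)
print(ell.round(3), M_k)
```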

We used publicly available annotation-weighted LD scores of 1,190,321 sequenced HapMap 3 SNPs from Finucane et al.,22 Gazal et al.,23 and Hujoel et al.24 (web resources), which were calculated against a larger set of 9,997,231 sequenced SNPs with MAF > 0.5% in a European ancestry sample from the 1000 Genomes Project. LD scores of duplicated annotations were averaged. We analyzed each annotation independently and included the standard LD score ($\ell_{j0} = \sum_{i=1}^{M} r_{ij}^2$) and ten MAF-class-specific LD scores (i.e., LD score with SNPs within a given MAF class only) as covariates.23 Parameters were estimated via weighted least-squares with weights proportional to $1/\ell_{j0}$.

Next, we re-tested the association between SNPs and traits by using a model fitting both allele counts and the indicator of heterozygosity, which estimates dominance deviation at each SNP. We refer to the latter analysis as additive-dominance GWAS. We show in Appendix B that the LD score regression methodology can be extended to estimate ID from summary statistics of an additive-dominance GWAS and further extend it below to quantify enrichment of ID in genomic annotations by using the following regression model:

$$E[Z_{d,j}] = a_b + N\sum_{k=1}^{K}\theta_{k,b}\left(\frac{\ell_{jk}}{M_k}\right) \qquad \text{(Equation 7)}$$

where $Z_{d,j}$ is the Z score of the estimate of dominance deviation at SNP $j$, $a_b$ the intercept term measuring confounding effects, and $\theta_{k,b}$ the effect size of annotation $k$ on ID. Standard errors for $\theta_{k,h^2}$ and $\theta_{k,b}$, as well as their correlation across multiple annotations, were estimated with a block-jackknife (leave one block out) procedure based on 240 ∼10 Mb-long genomic segments used as blocks.
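A leave-one-block-out jackknife for a regression slope can be sketched as follows. This is a generic illustration rather than the implementation used in the study: it uses an unweighted simple regression, contiguous equally sized blocks of SNP indices in place of ∼10 Mb genomic segments, simulated data, and a hypothetical function name.

```python
import numpy as np

def block_jackknife_slope(x, y, n_blocks):
    """Slope of y on x with a delete-one-block jackknife standard error."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    idx = np.arange(x.size)

    def slope(keep):
        X = np.column_stack([np.ones(keep.size), x[keep]])
        return np.linalg.lstsq(X, y[keep], rcond=None)[0][1]

    full = slope(idx)
    blocks = np.array_split(idx, n_blocks)     # contiguous blocks of SNPs
    loo = np.array([slope(np.setdiff1d(idx, blk)) for blk in blocks])
    # Delete-one-block jackknife variance (blocks of equal size assumed)
    var = (n_blocks - 1) / n_blocks * np.sum((loo - loo.mean()) ** 2)
    return full, float(np.sqrt(var))

rng = np.random.default_rng(3)
x = rng.normal(size=2400)                      # stand-in for an LD score term
y = 0.5 * x + rng.normal(size=2400)            # stand-in for a test statistic
theta_hat, theta_se = block_jackknife_slope(x, y, n_blocks=240)
print(theta_hat, theta_se)
```

Deleting whole blocks rather than single SNPs keeps correlated neighboring SNPs together, so the standard error is not understated by local LD.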

Results

The relative contribution of rare variants to ID varies across human traits

Clark et al.16 previously claimed that ID is predominantly caused by rare, recessive variants made homozygous in long ROH and not by homozygosity at variants that are in LD with common SNPs. To reach that conclusion, their study compared the magnitude of conditional estimates of ID from a bivariate linear regression model fitting FUNI (referred to as FGRM in their study) and FROH jointly. Clark et al. restricted their calculation of FUNI to SNPs with an MAF > 5% across diverse studies in which participants were genotyped with different arrays. We show that the same analysis performed in a homogeneous and large sample like the UKB leads to a different and trait-specific conclusion regarding the contribution of rare variants to ID. As described in Clark et al., we calculated both FUNI and FROH by using 301,412 quality-controlled genotyped SNPs (as described in Yengo et al.11) with MAF > 5% and also used the same definition of ROH. For this analysis, FROH was defined as in Clark et al., i.e., the cumulated length of ROH in Mb divided by 3,000.

We found that the relative fraction of ID captured by each inbreeding measure varies between traits (Figure 1). For example, the estimated effects of FUNI on the number of children and on the mean time to identify matches (reaction time) both became not statistically significant (p > 0.05) once conditioned on FROH. This observation is consistent with Clark et al.’s claims and suggests a larger relative contribution of rare variants to ID in these traits. However, we find that ID in hip-to-waist ratio as well as in fluid intelligence is mostly captured by FUNI. Finally, for height and peak expiratory flow, the conditional effects of FROH and FUNI are of similar magnitudes, suggesting that both low frequency variants and more common partially recessive alleles are causal of ID in these traits. On average across traits, we found that the conditional effects of FUNI and FROH were not statistically different from each other. Altogether, our results imply that alleles causing ID span a large spectrum of frequencies (from rare to common, e.g., >5%) and that the relative contribution of rare variants is trait specific.

Enrichment of ID within functional genomic annotations

We analyzed 44 genomic annotations (including 12 continuous annotations; material and methods), which cover ∼98% of the autosome (Figure S2). In total, our analysis involves 11 traits, 44 annotations, and 2 inbreeding measures (i.e., 11 × 44 × 2 = 968 trait-annotation-measure triplets). Therefore, statistical significance accounting for multiple testing (Bonferroni correction) was set to 0.05/968 ≈ $5.2 \times 10^{-5}$. We also quantified the average ID enrichment across traits, in which case statistical significance was set to 0.05/(44 × 2) ≈ $5.7 \times 10^{-4}$. Note that Bonferroni correction is most likely too conservative because the overlap between genomic annotations (Figure S3) induces a positive correlation across statistical tests. Therefore, we also report enrichments detected at a false discovery rate (FDR) threshold of 5%.
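The two Bonferroni thresholds and an FDR screen can be reproduced in a few lines. The text does not state which FDR procedure was used; the Benjamini-Hochberg step-up rule below is an assumption, and the p values passed to it are made up.

```python
import numpy as np

n_traits, n_annotations, n_measures = 11, 44, 2
alpha = 0.05
threshold_per_trait = alpha / (n_traits * n_annotations * n_measures)  # 0.05/968
threshold_average = alpha / (n_annotations * n_measures)               # 0.05/88

def benjamini_hochberg(pvals, fdr=0.05):
    """Boolean mask of tests declared significant by the BH step-up rule."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    passed = p[order] <= fdr * np.arange(1, m + 1) / m
    mask = np.zeros(m, dtype=bool)
    if passed.any():
        last = np.max(np.nonzero(passed)[0])   # largest rank meeting its bound
        mask[order[:last + 1]] = True
    return mask

discoveries = benjamini_hochberg([0.001, 0.02, 0.03, 0.5])
print(threshold_per_trait, threshold_average, discoveries)
```

With these four toy p values, a Bonferroni cutoff of 0.05/4 would retain only the first test, whereas the step-up rule retains the first three.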

On average across traits, we detected significant enrichment of ID within 8 annotations (Figure 2; Table S1). Importantly, the largest enrichment was detected with both inbreeding measures within genomic regions with high recombination rates ($\bar{\delta}_{F_{\mathrm{ROH}}} \sim 33.5$, p = $3.2 \times 10^{-4}$; and $\bar{\delta}_{F_{\mathrm{UNI}}} \sim 21.9$, p = $1.5 \times 10^{-6}$). Other significant enrichments were detected within conserved genomic regions across mammals ($\bar{\delta}_{F_{\mathrm{UNI}}} \sim 20.9$, p = $7.8 \times 10^{-5}$) and primates ($\bar{\delta}_{F_{\mathrm{UNI}}} \sim 19.1$, p = $1.9 \times 10^{-4}$); DNase I hypersensitive regions (DHS: $\bar{\delta}_{F_{\mathrm{UNI}}} \sim 5.7$, p = $2.3 \times 10^{-7}$); chromatin accessible regions identified through digital genomic footprinting (DGF: $\bar{\delta}_{F_{\mathrm{UNI}}} \sim 5.3$, p = $2.1 \times 10^{-5}$); regions with large GERP++ score,43 which measures the strength of purifying selection at a locus ($\bar{\delta}_{F_{\mathrm{UNI}}} \sim 3.9$, p = $4.5 \times 10^{-6}$); genomic regions with low nucleotide diversity ($\bar{\delta}_{F_{\mathrm{ROH}}} \sim 3.0$, p = $1.8 \times 10^{-4}$); and H3K4me1 histone marks ($\bar{\delta}_{F_{\mathrm{UNI}}} \sim 1.9$, p = $4.3 \times 10^{-5}$).

Figure 2. Significant inbreeding depression (ID) enrichment within eight genomic annotations

(A and B) Estimates of ID enrichment on average across 11 traits obtained via FUNI and FROH. Statistical significance was set at p < 0.05/(44 × 2) ≈ $5.7 \times 10^{-4}$. Recombination rate and nucleotide diversity were analyzed as continuous annotations. “High recombination rate” denotes that recombination rate is positively correlated with ID, and “low nucleotide diversity” denotes that nucleotide diversity is negatively correlated with ID. Data underlying this figure are reported in Table S1. DHS, DNase I hypersensitive sites; DGF, chromatin accessible regions from digital genomic footprint; GERP (NS), GERP++ score number of substitutions; H3K4me1, histone mark (binary annotation). Error bars represent standard errors (SEs).

Previous studies have shown that nucleotide diversity and linkage disequilibrium (LD) can influence ROH detection.49,50 Therefore, errors in ROH calling due to these two annotations could potentially confound our results. Using data from the UKB and from an independent sample from the UK10K Project,51 we show in the supplemental methods section (Figures S4–S6) that ROH genomic density as well as errors in ROH calling cannot explain the enrichment of ID in low nucleotide diversity regions nor that in high recombination rate regions. Importantly, ID enrichment in high recombination rate regions is also detected via FUNI, which minimizes the likelihood of this finding’s being caused by errors in ROH calling. However, we found that exclusion of the major histocompatibility complex (MHC) locus (hg19: chr6: 25,000,000–35,000,000) from our analyses has a strong impact on the significance of δ¯FROH in low nucleotide diversity regions (without MHC: δ¯FROH∼2.0, p = 0.29; Table S8). Given that ROH genomic density alone cannot explain that observation, we further investigated whether population stratification of rare variants may confound our results. The latter hypothesis is justified by the fact that enrichment in low nucleotide diversity is detected with FROH, but not with FUNI, which suggests a signal (or a confounding effect) coming from low frequency variants. To test this hypothesis, we used birth coordinates as proxies for geographically localized rare variants and fitted those as covariates in our analyses. To account for non-linear effects of rare variants stratification, we used a k-means clustering approach to create 100 geographical clusters from participants’ birth coordinates and fitted those as categorical covariates. 
Although these additional analyses were based on a slightly reduced number of participants who reported their birth location in the UK, we still found a significant enrichment of ID within low nucleotide diversity regions (adjusted for 20 PCs + 100 birth coordinates clusters and including MHC: $\bar{\delta}_{F_{\mathrm{ROH}}} \sim 3.3$, p = $1.4 \times 10^{-4}$), consistent with our main analysis. Altogether, our results show that the MHC locus disproportionately contributes to the enrichment of ID in low nucleotide diversity regions, which is explained neither by higher ROH frequency at this locus nor by uncorrected population stratification.

Recognizing that the Bonferroni correction is conservative, we also applied an FDR < 5% threshold and detected 16 additional annotations, which include other regulatory elements (e.g., transcription factor binding sites: $\bar{\delta}_{F_{\mathrm{UNI}}} \sim 4.0$) and conservation annotations (e.g., promoters containing evolutionarily old DNA sequences: $\bar{\delta}_{F_{\mathrm{UNI}}} \sim 27.0$), as shown in Table S1.

Next, we assessed the independence between significantly enriched annotations by fitting them jointly. On average across traits, only ID enrichment within genomic regions with low nucleotide diversity remained significant at our initial Bonferroni threshold ($\bar{\delta}_{F_{\mathrm{ROH}}} \sim 3.1$, p = $3.0 \times 10^{-4}$; Table S2).

Moreover, we detected significant enrichment of ID in height within transcribed genomic regions (δFROH∼8.0, p = 4.9 × 10−5) as well as in genomic regions with a low nucleotide diversity (δFROH∼7.6, p = 4.8 × 10−9) and those with a shorter time to most recent common ancestor (TMRCA: δFROH∼5.8, p = 7.6 × 10−6). Note that alleles with a shorter TMRCA are more likely to have arisen recently in the population. Therefore, the latter result suggests a larger contribution of recent alleles to ID, consistent with previous findings.7,52 Importantly, these three genomic annotations were detected with FROH but did not reach statistical significance when quantified via FUNI (p > 0.27), which suggests that rare variants within (or with large values of) these annotations could be the main drivers of our results for height. Interestingly, we also detected a significant depletion of ID signal (material and methods) for height within transcription start sites (enrichment outside TSS: δFROH∼2.0, p = 3.2 × 10−5; Table S3). Furthermore, we detected suggestive ID enrichment at FDR < 5% for the majority of traits except for fertility and visual and auditory acuity (Table S3). Among these additional associations, we highlight the strong enrichment of ID in height and cognitive function within promoters of loss-of-function-intolerant genes from the Exome Aggregation Consortium53 (δ > 84 and p < 0.001 with both inbreeding measures; Table S3). We also highlight suggestive enrichment of ID in peak expiratory flow within DGF (δFUNI∼7.2, p = 5.5 × 10−5), where ∼13% of common SNPs contribute ∼90% of the genome-wide ID for that trait.

ID enrichment within GWAS-associated genomic regions

Our analyses based on FUNI show interesting similarities between functional enrichment of ID and that of additive genetic variance as reported previously.22 In both cases, we see a strong enrichment of ID and SNP-based heritability within genomic regions with conserved functions across species as well as within regulatory elements such as promoters, enhancers, and DHSs and a much smaller enrichment within transcriptionally repressed or transcribed regions.

To further explore these similarities, we sought to directly quantify ID enrichment within trait-associated loci identified through GWASs. Therefore, we performed GWASs of all 11 traits in our sample of ∼350,000 unrelated UKB participants (material and methods) and used chi-square association statistics of each SNP to define a new continuous annotation. Thus, we calculated 11 trait-specific inbreeding measures derived from FUNI such that each SNP is weighted proportionally to the strength of its association with the target trait. We then tested the ID enrichment of these 11 GWAS-derived annotations. Given that stronger association is expected at SNPs with large LD scores,54 each analysis was also conditioned on another continuous annotation defined by the LD score of each SNP. On average across traits, we found marginally significant evidence that genomic regions with large chi-square association statistics, independently of LD, are enriched for ID signal (δ¯FUNI∼10.6, p = 0.03; Table S4).
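The construction of a GWAS-weighted inbreeding measure can be illustrated with a small sketch. It uses the per-SNP FUNI formula from Appendix B and weights each SNP by its chi-square statistic. The normalization by the sum of weights, the function names, and the toy genotypes, frequencies, and chi-square values are assumptions made for illustration and are not taken from the authors' pipeline.

```python
def f_uni(x, p):
    """Per-SNP F_UNI for minor allele count x (0, 1 or 2) and allele frequency p."""
    return (x * x - (1 + 2 * p) * x + 2 * p * p) / (2 * p * (1 - p))


def weighted_inbreeding(genotypes, freqs, weights):
    """Weighted mean of per-SNP F_UNI values for one individual."""
    total = sum(weights)
    return sum(w * f_uni(x, p) for x, p, w in zip(genotypes, freqs, weights)) / total


# Hypothetical data for one individual at four SNPs.
genotypes = [0, 1, 2, 2]
freqs = [0.1, 0.3, 0.5, 0.2]
chisq = [1.0, 4.0, 25.0, 0.5]  # GWAS chi-square statistics used as weights

f_genome = weighted_inbreeding(genotypes, freqs, [1.0] * 4)  # equal weights
f_gwas = weighted_inbreeding(genotypes, freqs, chisq)        # chi-square weights
```

With equal weights the function reduces to the usual genome-wide average; with chi-square weights, homozygosity at strongly associated SNPs dominates the measure.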

In addition, we sought to estimate the correlation between functional enrichment of heritability and that of ID across the 44 genomic annotations analyzed previously. To ensure a fair comparison, we used stratified LD score regression (SLDSC) as a unified framework to assess enrichment of SNP-based heritability and that of ID. We denote θk,h2 and θk,b as the SLDSC effect size of annotation k on heritability and ID, respectively. We provide a theoretical justification for using LD score regression to estimate ID by highlighting its direct connection with FUNI (Appendix B). Consistently, the mean correlation between Z scores of τk (i.e., our individual-level data-based statistic to quantify enrichment of ID: τk = b(δk − 1); material and methods) and θk,b is ∼0.7 (range: 0.5 to 0.8) across traits, which shows that both enrichment measures largely capture the same information (Figure S7).

On average across traits, we find a significant positive correlation between θk,h2 and θk,b of ∼0.77, with a block-jackknife standard error (b.s.e.) of ∼0.13 (Figure 3). We also detect a significant (p < 0.05/11) correlation between θk,h2 and θk,b for specific traits (Figure S8; Table S6), such as handgrip strength (r = 0.91; b.s.e. = 0.07), cognitive function (r = 0.82; b.s.e. = 0.14), and peak expiratory flow (r = 0.93; b.s.e. = 0.05). We show in the supplemental methods section and Figure S9, through theory and simulations, that a large positive correlation between enrichments of heritability and ID in fitness could be expected if the distribution of fitness effects of mutations causing ID is moderately skewed, i.e., if the proportion of mutations with large fitness effect is small. However, given that this distribution is largely unknown, our theoretical results only provide a set of sufficient, but not necessary, conditions that can explain our observations. Furthermore, we performed additional simulations to show that the correlation between enrichments of ID and heritability is not due to an artifact in our methods, which would systematically induce such a correlation (supplemental methods, Figure S10).
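A block-jackknife standard error for such a correlation can be sketched as follows. The sketch assumes that the annotation effect sizes have been re-estimated with each block of the genome removed in turn, so that one correlation is available per deleted block. How blocks were defined in the actual analysis is not restated here, and the toy numbers below are hypothetical.

```python
import math


def pearson(u, v):
    """Pearson correlation between two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (c - mv) for a, c in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((c - mv) ** 2 for c in v)
    return suv / math.sqrt(suu * svv)


def jackknife_se(leave_one_out):
    """Delete-one jackknife SE from statistics recomputed with each block removed."""
    n = len(leave_one_out)
    mean = sum(leave_one_out) / n
    return math.sqrt((n - 1) / n * sum((t - mean) ** 2 for t in leave_one_out))


# Hypothetical effect sizes for 4 annotations, re-estimated with each of 3 blocks removed.
loo_h2 = [[1.0, 2.0, 3.0, 4.0], [1.1, 2.0, 2.9, 4.2], [0.9, 2.1, 3.0, 3.9]]
loo_id = [[0.5, 1.5, 2.0, 4.5], [0.6, 1.4, 2.2, 4.4], [0.4, 1.6, 2.1, 4.6]]

loo_r = [pearson(h, d) for h, d in zip(loo_h2, loo_id)]
se_r = jackknife_se(loo_r)
```

The factor (n − 1)/n inflates the spread of the leave-one-out estimates, which vary much less than independent replicates would.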

Figure 3.

Correlation between functional enrichment of SNP-based heritability and inbreeding depression (ID) estimated from GWAS summary statistics

Enrichment statistics for heritability (x axis: θk,h2) and ID (y axis: θk,b) were estimated with stratified LD score regression (SLDSC) as described in the material and methods section. Enrichment statistics were averaged over 11 traits. Each dot represents a genomic annotation, and error bars represent standard errors (SEs) calculated with a block-jackknife procedure. Annotations highlighted in coral font show a statistically significant ID enrichment at p < 0.05/88. DHS, DNase I hypersensitive sites; GERP (NS), GERP++ score number of substitutions; LoF, loss of function. Data underlying this figure are reported in Table S5.

Despite an overall large correlation between enrichments of heritability and ID, we find that the effects of recombination rate on ID and on heritability are in opposite directions (Figure 3). To further explore what could explain this observation, we performed extensive forward-time evolutionary simulations by using SLiM3 (reference 55; supplemental methods) to quantify the effect of recombination rate on both heritability and ID enrichments. In these simulations, we considered various combinations of selection (s) and dominance (h) coefficients for mutations causing ID and additive genetic variance in fitness (material and methods). Using 1,000 simulation replicates for each scenario, we find that recombination rate affects the enrichment of heritability and ID in a non-linear fashion that is modulated by the strength of both selection and dominance (Figures S11 and S12). More specifically, our simulations show a strong enrichment of ID in high recombination rate regions (>2 cM/Mb) only when ID is caused by partially recessive (h = 0.1) near-neutral mutations, i.e., when s is between 1/(2Ne) and 2/Ne, where Ne denotes the effective population size. Under the same scenarios, heritability was either slightly enriched or even depleted in high recombination regions. However, for large selection coefficients (s > 2/Ne) and large dominance coefficients (h > 0.3), enrichments of heritability and ID behave consistently, and both monotonically decrease with recombination rate.

In summary, our findings imply that functional enrichments of ID and additive genetic variance in the population are most likely caused by similar mechanisms, such as variable recombination rates between genomic regions and disruption of evolutionarily old regulatory elements (e.g., older than the split between marsupial and placental mammals ∼160 million years ago).56 However, although co-localized, variants causing these two phenomena may only partially overlap, and variants causing ID may tend to be more recessive than those causing additive genetic variance.

Discussion

In this study, we introduce a method to detect and quantify the enrichment of ID within a wide range of binary and continuous genomic annotations and apply it to analyze 11 traits with prior evidence of ID. We show that ID is independently enriched within regions with high recombination rates, consistent with prior evidence in potatoes,57 and also, to a lesser extent, within regions with a low nucleotide diversity, a marker of high functional importance of DNA sequences.58,59 Interestingly, we find that the MHC locus contributes disproportionately to the enrichment of ID in low nucleotide diversity regions, even when analyses are adjusted for population stratification (genetic principal components + geographical birth coordinates). An enrichment of ID in low nucleotide diversity regions and in high recombination rate regions may seem counterintuitive given the negative correlation between these two annotations (r ∼ −0.2; Figure S3), which reflects the fact that higher recombination rates are associated with higher nucleotide diversity. However, high recombination rates also help minimize the strength of background selection60 (i.e., purging of neutral and near-neutral alleles in LD with deleterious mutations) and selective sweeps, thereby increasing the diversity of slightly deleterious alleles.61 Consistent with reduced selection efficiency, recombination rate and strength of background selection annotations show a larger negative correlation (r ∼ −0.3; Figure S3). Another potential explanation for the enrichment of ID within genomic regions with high recombination rates is that the frequency of deleterious de novo mutations is increased in those regions.62,63 However, the relatively small variance in de novo mutations explained by recombination rate (R² ∼ 5%, in Kessler et al.62) implies that variation in mutation rates most likely contributes a second-order effect to our observations.

Using estimated coalescence times, we confirm that recent alleles contribute more to ID and also show that homozygosity in DNA sequences that are conserved across mammal and primate species contributes ∼20-fold more ID than any random equally sized portion of the genome. In humans, inbreeding between close relatives often manifests as severe Mendelian syndromes caused by homozygous loss-of-function mutations.8 Consistent with this observation, our study shows a large ID enrichment (δ¯ > 30; Table S1) within promoters of loss-of-function-intolerant genes. It is noteworthy that ID enrichment within coding regions was comparatively smaller (δ¯ between 1 and 7.1, p > 0.15; Table S1) and did not survive any of our corrections for multiple testing. The latter observation emphasizes that impairment of essential genes, as opposed to genes in general, is a key contributor to ID. Finally, our study shows a strong ID enrichment within multiple regulatory elements, which opens interesting avenues to quantify the effect of homozygosity on gene and protein expression in humans.

The genetic basis of ID is predicated on two hypotheses.1 The first one, the “dominance hypothesis,” stipulates that ID is caused by directional dominance effects of partially recessive deleterious alleles. On the other hand, the “overdominance hypothesis” assumes that the genetic architecture of traits subjected to ID is such that heterozygotes express better phenotypes than both homozygotes, for example due to balancing selection. Although major progress has been made in model species,1 disentangling these two explanations remains a challenge in natural populations. One possibility to address it could be by leveraging the distinct evolutionary consequences of these two hypotheses. The “dominance hypothesis” predicts a larger contribution to ID from loci where mutation rate is higher than average, while the “overdominance hypothesis” predicts a larger ID from loci where balancing selection is stronger.64 Our study shows patterns of ID enrichment that are consistent with both hypotheses. However, we found a much larger magnitude of ID enrichment from annotations that are correlated with mutation rates (e.g., recombination rates62,63 or CpG content;23 Table S1) as compared to those correlated with strength of selection (e.g., GERP++ score or background selection; Table S1). Altogether, our results imply that both hypotheses probably contribute to explain ID in human traits, but overdominance does so to a lesser extent.

We also revisited previous claims7 regarding the contribution of common variants to ID. We provide new evidence that variants with an MAF > 1% substantially contribute to ID in height and educational attainment (Figure 1), while their contribution to ID in fertility remains limited. While some of the traits analyzed in this study have been shown to drive assortative mating in the population, we do not think that assortment based on polygenic traits (e.g., height) would implicitly create an enrichment of ID within any specific part of the genome, as reported here. In fact, the increased homozygosity at trait-associated loci that is induced by assortative mating is inversely proportional to the number of causal variants and therefore negligible for highly polygenic traits.65,66 Finally, our results also contribute to illuminating the direction of causality between homozygosity and traits. Indeed, differential contribution of genomic regions to ID would not be expected if traits of parents (e.g., intelligence or socio-economic status) were causally associated with the phenotypes in their children and only incidentally associated with their relatedness, i.e., with offspring homozygosity. Note that the latter configuration has been shown to induce biases in estimates of ID, as suggested previously.67

Our study has a number of limitations. First, we lacked statistical power to detect more trait-specific patterns of ID enrichment beyond height. Similarly, we also lacked power to estimate conditional enrichment across multiple functional annotations. Therefore, estimates of ID enrichment reported here may reflect overlap between annotations (e.g., the correlation between DHS and DGF is ∼0.5; Figure S3), including ones not observed in our study. A second limitation of our study is that estimates of ID obtained via our extension of the LD score regression method have standard errors that are too large for them to be used efficiently in partitioning analyses (Table S7; Figure S13). However, we emphasize that such methods remain valid and can still be used when sample size is sufficiently large (e.g., n ∼ 1,000,000, which is becoming more common in human studies).

A third limitation of our study is that our method assumes a linear relationship between ID and continuous annotations (Equation 2), while this relationship may be more complex in reality. For example, our forward-time evolutionary simulations showed a non-linear relationship between the recombination rate and enrichment of ID (and that of additive genetic variance), which is modulated by the distribution of selection coefficients of deleterious alleles, as well as their dominance coefficients (Figures S11 and S12). Note that such a non-linear relationship has also been reported in previous studies23,68 that investigated the functional enrichment of trait heritability, although these studies did not emphasize that observation. As an alternative to assuming a linear relationship between continuous annotations and ID, we recommend discretizing the annotations into quantiles or testing the significance of the conditional effects of the annotation and its squared value. We show in Figure S14 that the empirical relationship between ID enrichment and recombination rate is largely monotonic, which suggests that the linear assumption made here would have a limited impact on our conclusions regarding this annotation.
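The recommended discretization can be sketched as follows: a continuous annotation is cut into equally sized quantile bins, and each bin becomes a binary annotation that can be analyzed like any other. The function names and the toy recombination rates are illustrative assumptions. Ties are broken by input order, which is one of several reasonable choices.

```python
def quantile_bins(values, n_bins):
    """Assign each value to one of n_bins equally sized rank-based bins (0 = lowest)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * n_bins // len(values)
    return bins


def to_indicators(bins, n_bins):
    """Turn bin labels into one binary annotation (list of 0/1) per bin."""
    return [[1 if lab == k else 0 for lab in bins] for k in range(n_bins)]


# Hypothetical recombination rates (cM/Mb) for eight SNPs, split into quartiles.
rates = [0.1, 5.0, 2.0, 3.0, 0.4, 1.2, 0.8, 9.5]
bins = quantile_bins(rates, 4)
annotations = to_indicators(bins, 4)
```

Enrichment estimated separately in each bin then traces the shape of the relationship without imposing linearity.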

In conclusion, we have proposed in this study a refined characterization of the functional effects of variants contributing to ID. Beyond conceptual parallels between estimation of ID and that of heritability established previously,9 our study demonstrates that functional mechanisms involved in ID are also relevant for characterizing genetic variance and genetic risk in the population. We foresee that the application of the different methods developed here to large collections of whole-genome or whole-exome sequencing data available in the near future will lead to major discoveries regarding the genetic basis of ID.

Declaration of interests

The authors declare no competing interests.

Acknowledgments

This research was supported by the Australian Research Council (DE200100425, FT180100186, and FL180100072), the Australian National Health and Medical Research Council (1173790 and 1113400), and the National Institutes of Health (MH100141). Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the funding bodies. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. This research has been conducted with the UK Biobank Resource under project 12505. The authors are grateful to Deborah Charlesworth and Brian Charlesworth for helpful discussions and comments on the manuscript.

Published: July 1, 2021

Footnotes

Supplemental information can be found online at https://doi.org/10.1016/j.ajhg.2021.06.005.

Appendix A: Derivation of Equation 3

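Equation 3 is derived by combining Equation 1 and Equation 2, recalled below:

$$y = \sum_{j=1}^{M} \beta_j F_j + e, \tag{Equation A1}$$

$$E[\beta_j] = \frac{b_0}{M} + \sum_{k=1}^{K} \left( z_{jk}\left[\frac{b_k}{m_k}\right] + (1 - z_{jk})\left[\frac{b - b_k}{M - m_k}\right] \right). \tag{Equation A2}$$

Note that Equation 1 is the same as Equation A1 and that Equation 2 is the same as Equation A2.

We write $\beta_j = E[\beta_j] + \varepsilon_j$ and then replace $E[\beta_j]$ with its expression from Equation 2. We get

$$y = \sum_{j=1}^{M} \beta_j F_j + e = \sum_{j=1}^{M} E[\beta_j] F_j + e^{*}, \tag{Equation A3}$$

where $e^{*} = e + \sum_{j=1}^{M} \varepsilon_j F_j$, such that $E[e^{*}] = 0$.

In addition, writing $b_0 = b(1 - K)$ (which ensures that $\sum_{j=1}^{M} E[\beta_j] = b$), we can rewrite Equation 2 from the main text (methods section) as

$$
\begin{aligned}
E[\beta_j] &= \frac{b(1-K)}{M} + \sum_{k=1}^{K} \frac{b - b_k}{M - m_k} + \sum_{k=1}^{K} z_{jk}\left[\frac{b_k}{m_k} - \frac{b - b_k}{M - m_k}\right], \\
E[\beta_j] &= \frac{1}{M}\left[b(1-K) + \sum_{k=1}^{K} \frac{M(b - b_k)}{M - m_k}\right] + \sum_{k=1}^{K} \frac{z_{jk}}{m_k}\left[b_k - \frac{m_k}{M - m_k}(b - b_k)\right], \\
E[\beta_j] &= \frac{1}{M}\left[b(1-K) + \sum_{k=1}^{K} \frac{b - b_k}{1 - \pi_k}\right] + \sum_{k=1}^{K} \frac{z_{jk}}{m_k}\left[b_k - \frac{\pi_k}{1 - \pi_k}(b - b_k)\right], \\
E[\beta_j] &= \frac{1}{M}\left[b(1-K) + \sum_{k=1}^{K} \frac{b - b_k}{1 - \pi_k}\right] + \sum_{k=1}^{K} \frac{z_{jk}}{m_k}\left[\frac{b_k - \pi_k b}{1 - \pi_k}\right],
\end{aligned}
\tag{Equation A4}
$$

where $\pi_k = m_k/M$. If we denote $\bar{F}_k$ and $F_g$ as

$$\bar{F}_k = \frac{1}{m_k}\sum_{j=1}^{M} z_{jk} F_j \quad \text{and} \quad F_g = \frac{1}{M}\sum_{j=1}^{M} F_j,$$

then combining Equation A3 and Equation A4 leads to

$$
\begin{aligned}
y &= \left[b(1-K) + \sum_{k=1}^{K} \frac{b - b_k}{1 - \pi_k}\right] F_g + \sum_{k=1}^{K} \left[\frac{b_k - \pi_k b}{1 - \pi_k}\right] \bar{F}_k + e^{*}, \\
&= \left[b(1-K) + \sum_{k=1}^{K} \frac{b - b_k}{1 - \pi_k}\right] F_g + \sum_{k=1}^{K} \left[\frac{b_k - \pi_k b}{1 - \pi_k}\right] \left(\bar{F}_k - F_g + F_g\right) + e^{*}, \\
&= \left[b(1-K) + \sum_{k=1}^{K} \frac{b - b_k}{1 - \pi_k} + \sum_{k=1}^{K} \frac{b_k - \pi_k b}{1 - \pi_k}\right] F_g + \sum_{k=1}^{K} \left(\frac{b_k}{\pi_k} - b\right)\left[\left(\frac{\pi_k}{1 - \pi_k}\right)\left(\bar{F}_k - F_g\right)\right] + e^{*}, \\
&= b F_g + \sum_{k=1}^{K} \left(\frac{b_k}{\pi_k} - b\right)\left[\left(\frac{\pi_k}{1 - \pi_k}\right)\left(\bar{F}_k - F_g\right)\right] + e^{*}.
\end{aligned}
\tag{Equation A5}
$$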

QED.
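The identity in Equation A5 can also be checked numerically. The sketch below draws random binary annotations and per-SNP inbreeding values, computes the expected effects from Equation A2 with the constant term set to b(1 − K)/M (as in the first line of Equation A4), and compares the sum of E[βj]Fj with the final expression, bFg plus the annotation-specific terms. All numerical values and names are arbitrary choices made for illustration.

```python
import random


def expected_beta(z_row, b, b_k, m_k, M):
    """E[beta_j] from Equation A2, with constant term b(1 - K)/M."""
    K = len(b_k)
    value = b * (1 - K) / M
    for k in range(K):
        inside = b_k[k] / m_k[k]               # SNP j lies within annotation k
        outside = (b - b_k[k]) / (M - m_k[k])  # SNP j lies outside annotation k
        value += z_row[k] * inside + (1 - z_row[k]) * outside
    return value


random.seed(1)
M, K = 200, 3
z = [[1 if random.random() < 0.3 else 0 for _ in range(K)] for _ in range(M)]
F = [random.gauss(0.0, 1.0) for _ in range(M)]  # per-SNP inbreeding values
b = -2.0                                        # genome-wide ID
b_k = [-1.2, -0.3, -0.9]                        # ID contributed by each annotation

m_k = [sum(z[j][k] for j in range(M)) for k in range(K)]
pi_k = [m / M for m in m_k]
F_g = sum(F) / M
F_bar = [sum(z[j][k] * F[j] for j in range(M)) / m_k[k] for k in range(K)]

betas = [expected_beta(z[j], b, b_k, m_k, M) for j in range(M)]
lhs = sum(beta * f for beta, f in zip(betas, F))
rhs = b * F_g + sum(
    (b_k[k] / pi_k[k] - b) * (pi_k[k] / (1 - pi_k[k])) * (F_bar[k] - F_g)
    for k in range(K)
)
```

The two sides agree up to floating-point error, and the expected effects sum to b, even though the annotations overlap and the b_k values do not sum to b.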

Appendix B: Enrichment of ID from additive-dominance GWAS summary statistics

We assume the following model

$$y = \sum_{j=1}^{M} \left\{ \left[\frac{x_j - 2p_j}{\sqrt{2p_j(1-p_j)}}\right]\beta_j^{(a)} + \left[\frac{-x_j^2 + (1+2p_j)x_j - 2p_j^2}{2p_j(1-p_j)}\right]\beta_j^{(d)} \right\} + e, \tag{Equation B1}$$

where $x_j$ denotes the minor allele count ($x_j = 0$, 1, or 2) at SNP j, $p_j$ the minor allele frequency in the population, $\beta_j^{(a)}$ and $\beta_j^{(d)}$ the additive and dominance effects at SNP j, respectively, and $e$ a residual term. This model is the same as in Zhu et al.69 (Equation 3) and Yengo et al. (Equations S6 and S7).9 Zhu et al. further assumed that $E[\beta_j^{(a)}] = E[\beta_j^{(d)}] = 0$ and that $\mathrm{var}[\beta_j^{(a)}] = h_a^2/M$, $\mathrm{var}[\beta_j^{(d)}] = h_d^2/M$, and $\mathrm{var}[e] = 1 - h_a^2 - h_d^2$. However, in order to account for ID, Equation B1 must be generalized to model the directionality of dominance effects. For that, we assume $E[\beta_j^{(d)}] = -b/M$, where $b$ is the genome-wide ID.

Now we denote $z_j = (x_j - 2p_j)/\sqrt{2p_j(1-p_j)}$ and $F_j = [x_j^2 - (1+2p_j)x_j + 2p_j^2]/[2p_j(1-p_j)]$. We note that $F_j$ is exactly FUNI at SNP j. Therefore, Equation B1 can be rewritten as

$$y = \sum_{j=1}^{M} \left[ z_j \beta_j^{(a)} - F_j \beta_j^{(d)} \right] + e = b\left(\frac{1}{M}\sum_{j=1}^{M} F_j\right) + g_a + g_d + e, \tag{Equation B2}$$

where $E[g_a] = E[g_d] = 0$.
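The scaling of the two covariates in Equation B1 can be checked with a short sketch. Under Hardy-Weinberg proportions, the additive covariate and the dominance covariate (the latter being minus the per-SNP FUNI, consistent with the sign in Equation B2) both have mean 0 and variance 1 and are uncorrelated with each other. The function names and the allele frequency used below are arbitrary illustrative choices.

```python
import math


def additive_covariate(x, p):
    """Standardized allele count: (x - 2p) / sqrt(2p(1 - p))."""
    return (x - 2 * p) / math.sqrt(2 * p * (1 - p))


def dominance_covariate(x, p):
    """Dominance covariate of Equation B1; equals minus the per-SNP F_UNI."""
    return (-x * x + (1 + 2 * p) * x - 2 * p * p) / (2 * p * (1 - p))


def hwe_expectation(f, p):
    """Expectation of f(x) over genotypes 0, 1, 2 in Hardy-Weinberg proportions."""
    probs = [(1 - p) ** 2, 2 * p * (1 - p), p ** 2]
    return sum(w * f(x) for x, w in zip((0, 1, 2), probs))


p = 0.2
mean_a = hwe_expectation(lambda x: additive_covariate(x, p), p)
mean_d = hwe_expectation(lambda x: dominance_covariate(x, p), p)
var_a = hwe_expectation(lambda x: additive_covariate(x, p) ** 2, p)
var_d = hwe_expectation(lambda x: dominance_covariate(x, p) ** 2, p)
cov_ad = hwe_expectation(
    lambda x: additive_covariate(x, p) * dominance_covariate(x, p), p
)
```

Because the two covariates are orthogonal and standardized, the variances of the additive and dominance effects translate directly into the additive and dominance components of trait variance.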

In an additive-dominance GWAS, the association between SNP i and y is tested by fitting simultaneously both additive ($a_i$) and dominance ($d_i$) effects, i.e., by fitting the following model

$$y = a_i x_i + d_i H_i + \text{residual}, \tag{Equation B3}$$

where $H_i = x_i(2 - x_i)$ is the indicator of heterozygosity at SNP i. Under the model above, we define $\hat{d}_i$ as the ordinary least-squares estimator of $d_i$ and $\mathrm{SE}(\hat{d}_i)$ as its standard error. We then denote the Z score of $\hat{d}_i$ as $Z_{d,i} = \hat{d}_i/\mathrm{SE}(\hat{d}_i)$ and show that it satisfies (proof is given below)

$$E\left[-Z_{d,i}/\sqrt{N}\right] = E[\hat{B}_i] = b\,\mathrm{cov}\left(F_i, \frac{1}{M}\sum_{j=1}^{M} F_j\right) \approx \left(\frac{b}{M}\right)\ell_i, \tag{Equation B4}$$

where $\ell_i$ is the LD score of SNP i and N is the GWAS sample size. Note that Equation B4 is analogous to the main result underlying the LD score regression methodology, which we recall below in Equation B5:

$$E\left[(\chi_i^2 - 1)/N\right] \approx \left(\frac{h^2}{M}\right)\ell_i. \tag{Equation B5}$$

Equation B4 and Equation B5 imply that the mean of the $\hat{B}_i$’s divided by the mean LD score over a given set of SNPs is in fact a measure of the “per-SNP” ID, just like the mean $\chi^2$ (minus 1) divided by the mean LD score is a measure of the per-SNP heritability. Therefore, $E_k(h^2)$ and $E_k(b)$ quantify the relative per-SNP heritability and ID for SNPs within annotation k as compared with the rest of the genome.

In principle, Equation B4 can be further extended to estimate enrichment of ID as done with the stratified LD score regression methodology. However, we found the standard errors of estimates of ID based on Equation B4 to be ∼1.8-fold larger than those of estimates obtained from individual-level data (Table S7), which substantially reduces statistical power to detect enrichments. Note that estimates from LD score regression with an intercept constrained to 0 have slightly lower standard errors but also suffer from large biases, as shown in Table S7.
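The way Equation B4 yields an estimate of the genome-wide ID can be illustrated with a toy regression. If the per-SNP statistics are proportional to LD scores with slope b/M, then regressing them on LD scores and multiplying the slope by M recovers b. The sketch below uses noise-free, hypothetical inputs and an unweighted ordinary least-squares fit; an actual LD score regression analysis would use regression weights and a block jackknife, which are omitted here.

```python
def ols(x, y):
    """Ordinary least-squares slope and intercept of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (c - my) for a, c in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


M = 1000        # number of SNPs (hypothetical)
b_true = -3.0   # genome-wide ID used to generate the toy statistics
ld_scores = [1.0, 5.0, 12.0, 30.0, 80.0, 150.0]

# Noise-free per-SNP statistics that follow the right-hand side of Equation B4.
b_hat = [b_true / M * ell for ell in ld_scores]

slope, intercept = ols(ld_scores, b_hat)
b_est = slope * M
```

With real data the statistics are noisy, which is why the slope has a larger standard error than estimates from individual-level data.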

Proof of Equation B4

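The proof of Equation B4 relies on two arguments. The first one is to note that under classical linear regression theory, the ordinary least-squares estimate $(\hat{a}_i, \hat{d}_i)$ of $(a_i, d_i)$ in Equation B3 satisfies

$$E\begin{pmatrix} \hat{a}_i \\ \hat{d}_i \end{pmatrix} = \begin{pmatrix} \mathrm{var}(x_i) & \mathrm{cov}(x_i, H_i) \\ \mathrm{cov}(x_i, H_i) & \mathrm{var}(H_i) \end{pmatrix}^{-1} \begin{pmatrix} \mathrm{cov}(x_i, y) \\ \mathrm{cov}(H_i, y) \end{pmatrix}.$$

In particular for $\hat{d}_i$, we can write that

$$E[\hat{d}_i] = \frac{\mathrm{var}(x_i)\,\mathrm{cov}(H_i, y) - \mathrm{cov}(x_i, H_i)\,\mathrm{cov}(x_i, y)}{\mathrm{var}(x_i)\,\mathrm{var}(H_i) - \mathrm{cov}(x_i, H_i)^2} = \mathrm{cov}\left(\frac{\mathrm{var}(x_i) H_i - \mathrm{cov}(x_i, H_i)\, x_i}{\mathrm{var}(x_i)\,\mathrm{var}(H_i) - \mathrm{cov}(x_i, H_i)^2},\; y\right).$$

Similarly, we derive the standard error of $\hat{d}_i$ (under the assumption that the phenotypic variance is equal to 1 and that each SNP explains a negligible part of the trait variance) as

$$\mathrm{SE}[\hat{d}_i] \approx \frac{\mathrm{SD}(y)}{h_i\sqrt{N}} = \frac{1}{h_i\sqrt{N}},$$

where SD(y) denotes the standard deviation of y.

If we denote $p_i$ as the minor allele frequency, then under Hardy-Weinberg equilibrium, we can show that $\mathrm{var}(x_i) = h_i = 2p_i(1-p_i)$, $\mathrm{var}(H_i) = h_i(1-h_i)$, and $\mathrm{cov}(x_i, H_i) = h_i(1-2p_i)$. Using these relationships, we can further show that $\mathrm{var}(x_i)\,\mathrm{var}(H_i) - \mathrm{cov}(x_i, H_i)^2 = h_i^3$ and that $\mathrm{var}(x_i) H_i - \mathrm{cov}(x_i, H_i)\, x_i = -h_i^2 F_i$ up to an additive constant ($2p_i^2 h_i$), which does not affect covariances. Therefore,

$$E[\hat{d}_i] = \mathrm{cov}\left(-\frac{F_i}{h_i},\; y\right)$$

or, equivalently, $E[(-\hat{d}_i/\mathrm{SE}[\hat{d}_i])/\sqrt{N}] = E[-Z_{d,i}/\sqrt{N}] \approx \mathrm{cov}(F_i, y)$. Now using Equation B2, we can subsequently write that

$$E\left[\frac{-Z_{d,i}}{\sqrt{N}}\right] \approx \mathrm{cov}\left[F_i,\; b\left(\frac{1}{M}\sum_{j=1}^{M} F_j\right) + g_a + g_d + e\right] = \frac{b}{M}\sum_{j=1}^{M} \mathrm{cov}(F_i, F_j) + \mathrm{cov}(F_i,\; g_a + g_d + e).$$

Under the assumption that SNP effects are independent of genotypes and that e (environment) is independent of $F_i$, we have that $\mathrm{cov}(F_i, g_a + g_d + e) = 0$. Moreover, Yengo et al.9 previously showed that $\mathrm{cov}(F_i, F_j) \approx r_{ij}^2$, where $r_{ij}^2$ is the squared LD correlation between SNPs i and j. Hence,

$$E\left[\frac{-Z_{d,i}}{\sqrt{N}}\right] \approx \left(\frac{b}{M}\right)\sum_{j=1}^{M} r_{ij}^2 = \left(\frac{b}{M}\right)\ell_i.$$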
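The Hardy-Weinberg identities used in the proof can be verified exactly with rational arithmetic. The sketch below computes the moments of the allele count x and the heterozygosity indicator H = x(2 − x) for one allele frequency, and checks that var(x)H − cov(x, H)x equals −h²F plus a constant that does not depend on the genotype. The allele frequency 3/10 and the function names are arbitrary illustrative choices.

```python
from fractions import Fraction


def hwe_moments(p):
    """Exact var(x), var(H), cov(x, H) under Hardy-Weinberg, with H = x(2 - x)."""
    probs = {0: (1 - p) ** 2, 1: 2 * p * (1 - p), 2: p ** 2}

    def mean(f):
        return sum(w * f(x) for x, w in probs.items())

    def het(x):
        return x * (2 - x)

    ex, eh = mean(lambda x: x), mean(het)
    var_x = mean(lambda x: (x - ex) ** 2)
    var_h = mean(lambda x: (het(x) - eh) ** 2)
    cov_xh = mean(lambda x: (x - ex) * (het(x) - eh))
    return var_x, var_h, cov_xh


def f_uni(x, p):
    """Per-SNP F_UNI for allele count x and allele frequency p."""
    return (x * x - (1 + 2 * p) * x + 2 * p * p) / (2 * p * (1 - p))


p = Fraction(3, 10)
h = 2 * p * (1 - p)
var_x, var_h, cov_xh = hwe_moments(p)

# var(x) * H - cov(x, H) * x + h^2 * F should be the same for every genotype.
offsets = {
    x: var_x * x * (2 - x) - cov_xh * x + h ** 2 * f_uni(x, p) for x in (0, 1, 2)
}
```

Because the offset is identical for the three genotypes, it drops out of any covariance with y, which is why only the −h²F term matters in the proof.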

Data and code availability

This study makes use of genotype and phenotype data from the UK Biobank data under project 12505. UKB data can be accessed upon request once a research project has been submitted and approved by the UKB committee. Data sources underlying all figures are provided as supplemental tables. Example R scripts from our pipeline to simulate and estimate enrichment of ID and a C++ code describing the algorithm utilized to calculate annotation-specific inbreeding measures are provided at the following URL: https://github.com/loic-yengo/Code_for_Genomic_Partitioning_of_Inbreeding_Depression.

Web resources

Supplemental information

Document S1. Figures S1–S14 and supplemental methods
mmc1.pdf (2.3MB, pdf)
Table S1. Tables S1–S8
mmc2.xlsx (70.4KB, xlsx)
Document S2. Article plus supplemental information
mmc3.pdf (3.2MB, pdf)

References

  • 1.Charlesworth D., Willis J.H. The genetics of inbreeding depression. Nat. Rev. Genet. 2009;10:783–796. doi: 10.1038/nrg2664. [DOI] [PubMed] [Google Scholar]
  • 2.Huang X., Yang S., Gong J., Zhao Y., Feng Q., Gong H., Li W., Zhan Q., Cheng B., Xia J. Genomic analysis of hybrid rice varieties reveals numerous superior alleles that contribute to heterosis. Nat. Commun. 2015;6:6258. doi: 10.1038/ncomms7258. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 3.Huisman J., Kruuk L.E.B., Ellis P.A., Clutton-Brock T., Pemberton J.M. Inbreeding depression across the lifespan in a wild mammal population. Proc. Natl. Acad. Sci. USA. 2016;113:3585–3590. doi: 10.1073/pnas.1518046113. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 4.Pemberton J.M., Ellis P.E., Pilkington J.G., Bérénos C. Inbreeding depression by environment interactions in a free-living mammal population. Heredity. 2017;118:64–77. doi: 10.1038/hdy.2016.100. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 5.McQuillan R., Eklund N., Pirastu N., Kuningas M., McEvoy B.P., Esko T., Corre T., Davies G., Kaakinen M., Lyytikäinen L.P., ROHgen Consortium Evidence of inbreeding depression on human height. PLoS Genet. 2012;8:e1002655. doi: 10.1371/journal.pgen.1002655. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 6.Joshi P.K., Esko T., Mattsson H., Eklund N., Gandin I., Nutile T., Jackson A.U., Schurmann C., Smith A.V., Zhang W. Directional dominance on stature and cognition in diverse human populations. Nature. 2015;523:459–462. doi: 10.1038/nature14618. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 7.Clark D.W., Okada Y., Moore K.H.S., Mason D., Pirastu N., Gandin I., Mattsson H., Barnes C.L.K., Lin K., Zhao J.H. Associations of autozygosity with a broad range of human phenotypes. Nat. Commun. 2019;10:4957. doi: 10.1038/s41467-019-12283-6. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 8.Bittles A.H., Neel J.V. The costs of human inbreeding and their implications for variations at the DNA level. Nat. Genet. 1994;8:117–121. doi: 10.1038/ng1094-117. [DOI] [PubMed] [Google Scholar]
  • 9.Yengo L., Zhu Z., Wray N.R., Weir B.S., Yang J., Robinson M.R., Visscher P.M. Detection and quantification of inbreeding depression for complex traits from SNP data. Proc. Natl. Acad. Sci. USA. 2017;114:8602–8607. doi: 10.1073/pnas.1621096114. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 10.Marshall T.C., Coltman D.W., Pemberton J.M., Slate J., Spalton J.A., Guinness F.E., Smith J.A., Pilkington J.G., Clutton-Brock T.H. Estimating the prevalence of inbreeding from incomplete pedigrees. Proc. Biol. Sci. 2002;269:1533–1539. doi: 10.1098/rspb.2002.2035. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 11.Yengo L., Wray N.R., Visscher P.M. Extreme inbreeding in a European ancestry sample from the contemporary UK population. Nat. Commun. 2019;10:3719. doi: 10.1038/s41467-019-11724-6. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 12.Keller M.C., Visscher P.M., Goddard M.E. Quantification of inbreeding due to distant ancestors and its detection using dense single nucleotide polymorphism data. Genetics. 2011;189:237–249. doi: 10.1534/genetics.111.130922. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 13.Ceballos F.C., Joshi P.K., Clark D.W., Ramsay M., Wilson J.F. Runs of homozygosity: windows into population history and trait architecture. Nat. Rev. Genet. 2018;19:220–234. doi: 10.1038/nrg.2017.109. [DOI] [PubMed] [Google Scholar]
  • 14.Keller M.C., Simonson M.A., Ripke S., Neale B.M., Gejman P.V., Howrigan D.P., Lee S.H., Lencz T., Levinson D.F., Sullivan P.F., Schizophrenia Psychiatric Genome-Wide Association Study Consortium Runs of homozygosity implicate autozygosity as a schizophrenia risk factor. PLoS Genet. 2012;8:e1002656. doi: 10.1371/journal.pgen.1002656. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 15.Christofidou P., Nelson C.P., Nikpay M., Qu L., Li M., Loley C., Debiec R., Braund P.S., Denniff M., Charchar F.J. Runs of Homozygosity: Association with Coronary Artery Disease and Gene Expression in Monocytes and Macrophages. Am. J. Hum. Genet. 2015;97:228–237. doi: 10.1016/j.ajhg.2015.06.001. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 16.Johnson E.C., Bjelland D.W., Howrigan D.P., Abdellaoui A., Breen G., Borglum A., Cichon S., Degenhardt F., Forstner A.J., Frank J., Schizophrenia Working Group of the Psychiatric Genomics Consortium No Reliable Association between Runs of Homozygosity and Schizophrenia in a Well-Powered Replication Study. PLoS Genet. 2016;12:e1006343. doi: 10.1371/journal.pgen.1006343. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 17.Pryce J.E., Haile-Mariam M., Goddard M.E., Hayes B.J. Identification of genomic regions associated with inbreeding depression in Holstein and Jersey dairy cattle. Genet. Sel. Evol. 2014;46:71. doi: 10.1186/s12711-014-0071-7. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 18.Ferenčaković M., Sölkner J., Kapš M., Curik I. Genome-wide mapping and estimation of inbreeding depression of semen quality traits in a cattle population. J. Dairy Sci. 2017;100:4721–4730. doi: 10.3168/jds.2016-12164. [DOI] [PubMed] [Google Scholar]
  • 19.Ayroles J.F., Hughes K.A., Rowe K.C., Reedy M.M., Rodriguez-Zas S.L., Drnevich J.M., Cáceres C.E., Paige K.N. A genomewide assessment of inbreeding depression: gene number, function, and mode of action. Conserv. Biol. 2009;23:920–930. doi: 10.1111/j.1523-1739.2009.01186.x. [DOI] [PubMed] [Google Scholar]
  • 20.García C., Avila V., Quesada H., Caballero A. Gene-expression changes caused by inbreeding protect against inbreeding depression in Drosophila. Genetics. 2012;192:161–172. doi: 10.1534/genetics.112.142687. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 21.García C., Ávila V., Quesada H., Caballero A. Are transcriptional responses to inbreeding a functional response to alleviate inbreeding depression? Fly (Austin) 2013;7:8–12. doi: 10.4161/fly.22559. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 22.Finucane H.K., Bulik-Sullivan B., Gusev A., Trynka G., Reshef Y., Loh P.R., Anttila V., Xu H., Zang C., Farh K. Partitioning heritability by functional annotation using genome-wide association summary statistics. Nat. Genet. 2015;47:1228–1235. doi: 10.1038/ng.3404. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 23.Gazal S., Finucane H.K., Furlotte N.A., Loh P.R., Palamara P.F., Liu X., Schoech A., Bulik-Sullivan B., Neale B.M., Gusev A., Price A.L. Linkage disequilibrium-dependent architecture of human complex traits shows action of negative selection. Nat. Genet. 2017;49:1421–1427. doi: 10.1038/ng.3954. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 24.Hujoel M.L.A., Gazal S., Hormozdiari F., van de Geijn B., Price A.L. Disease Heritability Enrichment of Regulatory Elements Is Concentrated in Elements with Ancient Sequence Age and Conserved Function across Species. Am. J. Hum. Genet. 2019;104:611–624. doi: 10.1016/j.ajhg.2019.02.008. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 25.Yengo L., Zhu Z., Wray N.R., Weir B.S., Yang J., Robinson M.R., Visscher P.M. Reply to Kardos et al.: Estimation of inbreeding depression from SNP data. Proc. Natl. Acad. Sci. USA. 2018;115:E2494–E2495. doi: 10.1073/pnas.1718598115. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 26.Marchini J., Howie B. Genotype imputation for genome-wide association studies. Nat. Rev. Genet. 2010;11:499–511. doi: 10.1038/nrg2796. [DOI] [PubMed] [Google Scholar]
  • 27.Allen N., Sudlow C., Downey P., Peakman T., Danesh J., Elliott P., Gallacher J., Green J., Matthews P., Pell J. UK Biobank: Current status and what it means for epidemiology. Health Policy Technol. 2012;1:123–126. [Google Scholar]
  • 28.Bycroft C., Freeman C., Petkova D., Band G., Elliott L.T., Sharp K., Motyer A., Vukcevic D., Delaneau O., O’Connell J. The UK Biobank resource with deep phenotyping and genomic data. Nature. 2018;562:203–209. doi: 10.1038/s41586-018-0579-z. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 29.Yengo L., Sidorenko J., Kemper K.E., Zheng Z., Wood A.R., Weedon M.N., Frayling T.M., Hirschhorn J., Yang J., Visscher P.M., GIANT Consortium Meta-analysis of genome-wide association studies for height and body mass index in ∼700000 individuals of European ancestry. Hum. Mol. Genet. 2018;27:3641–3649. doi: 10.1093/hmg/ddy271. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 30. Altshuler D.M., Gibbs R.A., Peltonen L., Dermitzakis E., Schaffner S.F., Yu F., International HapMap 3 Consortium. Integrating common and rare genetic variation in diverse human populations. Nature. 2010;467:52–58. doi: 10.1038/nature09298.
  • 31. Yang J., Lee S.H., Goddard M.E., Visscher P.M. GCTA: a tool for genome-wide complex trait analysis. Am. J. Hum. Genet. 2011;88:76–82. doi: 10.1016/j.ajhg.2010.11.011.
  • 32. Chang C.C., Chow C.C., Tellier L.C., Vattikuti S., Purcell S.M., Lee J.J. Second-generation PLINK: rising to the challenge of larger and richer datasets. Gigascience. 2015;4:7. doi: 10.1186/s13742-015-0047-8.
  • 33. Kundaje A., Meuleman W., Ernst J., Bilenky M., Yen A., Heravi-Moussavi A., Kheradpour P., Zhang Z., Wang J., Ziller M.J., Roadmap Epigenomics Consortium. Integrative analysis of 111 reference human epigenomes. Nature. 2015;518:317–330. doi: 10.1038/nature14248.
  • 34. Kent W.J., Sugnet C.W., Furey T.S., Roskin K.M., Pringle T.H., Zahler A.M., Haussler D. The human genome browser at UCSC. Genome Res. 2002;12:996–1006. doi: 10.1101/gr.229102.
  • 35. Gusev A., Lee S.H., Trynka G., Finucane H., Vilhjálmsson B.J., Xu H., Zang C., Ripke S., Bulik-Sullivan B., Stahl E. Partitioning heritability of regulatory and cell-type-specific variants across 11 common diseases. Am. J. Hum. Genet. 2014;95:535–552. doi: 10.1016/j.ajhg.2014.10.004.
  • 36. Hoffman M.M., Ernst J., Wilder S.P., Kundaje A., Harris R.S., Libbrecht M., Giardine B., Ellenbogen P.M., Bilmes J.A., Birney E. Integrative annotation of chromatin elements from ENCODE data. Nucleic Acids Res. 2013;41:827–841. doi: 10.1093/nar/gks1284.
  • 37. ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature. 2012;489:57–74. doi: 10.1038/nature11247.
  • 38. Trynka G., Sandor C., Han B., Xu H., Stranger B.E., Liu X.S., Raychaudhuri S. Chromatin marks identify critical cell types for fine mapping complex trait variants. Nat. Genet. 2013;45:124–130. doi: 10.1038/ng.2504.
  • 39. Andersson R., Gebhard C., Miguel-Escalada I., Hoof I., Bornholdt J., Boyd M., Chen Y., Zhao X., Schmidl C., Suzuki T. An atlas of active enhancers across human cell types and tissues. Nature. 2014;507:455–461. doi: 10.1038/nature12787.
  • 40. Hnisz D., Abraham B.J., Lee T.I., Lau A., Saint-André V., Sigova A.A., Hoke H.A., Young R.A. Super-enhancers in the control of cell identity and disease. Cell. 2013;155:934–947. doi: 10.1016/j.cell.2013.09.053.
  • 41. Schizophrenia Working Group of the Psychiatric Genomics Consortium. Biological insights from 108 schizophrenia-associated genetic loci. Nature. 2014;511:421–427. doi: 10.1038/nature13595.
  • 42. Villar D., Berthelot C., Aldridge S., Rayner T.F., Lukk M., Pignatelli M., Park T.J., Deaville R., Erichsen J.T., Jasinska A.J. Enhancer evolution across 20 mammalian species. Cell. 2015;160:554–566. doi: 10.1016/j.cell.2015.01.006.
  • 43. Davydov E.V., Goode D.L., Sirota M., Cooper G.M., Sidow A., Batzoglou S. Identifying a high fraction of the human genome to be under selective constraint using GERP++. PLoS Comput. Biol. 2010;6:e1001025. doi: 10.1371/journal.pcbi.1001025.
  • 44. Hormozdiari F., Gazal S., van de Geijn B., Finucane H.K., Ju C.J., Loh P.R., Schoech A., Reshef Y., Liu X., O’Connor L. Leveraging molecular quantitative trait loci to understand the genetic architecture of diseases and complex traits. Nat. Genet. 2018;50:1041–1047. doi: 10.1038/s41588-018-0148-2.
  • 45. McVicker G., Gordon D., Davis C., Green P. Widespread genomic signatures of natural selection in hominid evolution. PLoS Genet. 2009;5:e1000471. doi: 10.1371/journal.pgen.1000471.
  • 46. Rasmussen M.D., Hubisz M.J., Gronau I., Siepel A. Genome-wide inference of ancestral recombination graphs. PLoS Genet. 2014;10:e1004342. doi: 10.1371/journal.pgen.1004342.
  • 47. Palamara P.F., Terhorst J., Song Y.S., Price A.L. High-throughput inference of pairwise coalescence times identifies signals of selection and enriched disease heritability. Nat. Genet. 2018;50:1311–1317. doi: 10.1038/s41588-018-0177-x.
  • 48. Myers S., Bottolo L., Freeman C., McVean G., Donnelly P. A fine-scale map of recombination rates and hotspots across the human genome. Science. 2005;310:321–324. doi: 10.1126/science.1117196.
  • 49. Howrigan D.P., Simonson M.A., Keller M.C. Detecting autozygosity through runs of homozygosity: a comparison of three autozygosity detection algorithms. BMC Genomics. 2011;12:460. doi: 10.1186/1471-2164-12-460.
  • 50. Gazal S., Sahbatou M., Perdry H., Letort S., Génin E., Leutenegger A.L. Inbreeding coefficient estimation with dense SNP data: comparison of strategies and application to HapMap III. Hum. Hered. 2014;77:49–62. doi: 10.1159/000358224.
  • 51. Walter K., Min J.L., Huang J., Crooks L., Memari Y., McCarthy S., Perry J.R., Xu C., Futema M., Lawson D., UK10K Consortium. The UK10K project identifies rare variants in health and disease. Nature. 2015;526:82–90. doi: 10.1038/nature14962.
  • 52. Szpiech Z.A., Xu J., Pemberton T.J., Peng W., Zöllner S., Rosenberg N.A., Li J.Z. Long runs of homozygosity are enriched for deleterious variation. Am. J. Hum. Genet. 2013;93:90–102. doi: 10.1016/j.ajhg.2013.05.003.
  • 53. Lek M., Karczewski K.J., Minikel E.V., Samocha K.E., Banks E., Fennell T., O’Donnell-Luria A.H., Ware J.S., Hill A.J., Cummings B.B., Exome Aggregation Consortium. Analysis of protein-coding genetic variation in 60,706 humans. Nature. 2016;536:285–291. doi: 10.1038/nature19057.
  • 54. Bulik-Sullivan B.K., Loh P.R., Finucane H.K., Ripke S., Yang J., Patterson N., Daly M.J., Price A.L., Neale B.M., Schizophrenia Working Group of the Psychiatric Genomics Consortium. LD Score regression distinguishes confounding from polygenicity in genome-wide association studies. Nat. Genet. 2015;47:291–295. doi: 10.1038/ng.3211.
  • 55. Haller B.C., Messer P.W. SLiM 3: Forward Genetic Simulations Beyond the Wright-Fisher Model. Mol. Biol. Evol. 2019;36:632–637. doi: 10.1093/molbev/msy228.
  • 56. Luo Z.-X., Yuan C.-X., Meng Q.-J., Ji Q. A Jurassic eutherian mammal and divergence of marsupials and placentals. Nature. 2011;476:442–445. doi: 10.1038/nature10291.
  • 57. Zhang C., Wang P., Tang D., Yang Z., Lu F., Qi J., Tawari N.R., Shang Y., Li C., Huang S. The genetic basis of inbreeding depression in potato. Nat. Genet. 2019;51:374–378. doi: 10.1038/s41588-018-0319-1.
  • 58. Booker T.R., Keightley P.D. Understanding the Factors That Shape Patterns of Nucleotide Diversity in the House Mouse Genome. Mol. Biol. Evol. 2018;35:2971–2988. doi: 10.1093/molbev/msy188.
  • 59. Tatarinova T.V., Chekalin E., Nikolsky Y., Bruskin S., Chebotarov D., McNally K.L., Alexandrov N. Nucleotide diversity analysis highlights functionally important genomic regions. Sci. Rep. 2016;6:35730. doi: 10.1038/srep35730.
  • 60. Pouyet F., Aeschbacher S., Thiéry A., Excoffier L. Background selection and biased gene conversion affect more than 95% of the human genome and bias demographic inferences. eLife. 2018;7:e36317. doi: 10.7554/eLife.36317.
  • 61. Nordborg M., Charlesworth B., Charlesworth D. The effect of recombination on background selection. Genet. Res. 1996;67:159–174. doi: 10.1017/s0016672300033619.
  • 62. Kessler M.D., Loesch D.P., Perry J.A., Heard-Costa N.L., Taliun D., Cade B.E., Wang H., Daya M., Ziniti J., Datta S. De novo mutations across 1,465 diverse genomes reveal mutational insights and reductions in the Amish founder population. Proc. Natl. Acad. Sci. USA. 2020;117:2560–2569. doi: 10.1073/pnas.1902766117.
  • 63. Halldorsson B.V., Palsson G., Stefansson O.A., Jonsson H., Hardarson M.T., Eggertsson H.P., Gunnarsson B., Oddsson A., Halldorsson G.H., Zink F. Characterizing mutagenic effects of recombination through a sequence-level genetic map. Science. 2019;363:eaau1043. doi: 10.1126/science.aau1043.
  • 64. Lynch M., Walsh B. Genetics and Analysis of Quantitative Traits. Sinauer; 1998.
  • 65. Crow J.F., Felsenstein J. The effect of assortative mating on the genetic composition of a population. Soc. Biol. 1982;29:22–35.
  • 66. Crow J.F., Kimura M. An Introduction to Population Genetics Theory. Blackburn Press; 2009.
  • 67. Johnson E.C., Evans L.M., Keller M.C. Relationships between estimated autozygosity and complex traits in the UK Biobank. PLoS Genet. 2018;14:e1007556. doi: 10.1371/journal.pgen.1007556.
  • 68. Gazal S., Loh P.R., Finucane H.K., Ganna A., Schoech A., Sunyaev S., Price A.L. Functional architecture of low-frequency variants highlights strength of negative selection across coding and non-coding annotations. Nat. Genet. 2018;50:1600–1607. doi: 10.1038/s41588-018-0231-8.
  • 69. Zhu Z., Bakshi A., Vinkhuyzen A.A., Hemani G., Lee S.H., Nolte I.M., van Vliet-Ostaptchouk J.V., Snieder H., Esko T., Milani L., LifeLines Cohort Study. Dominance genetic variation contributes little to the missing heritability for human complex traits. Am. J. Hum. Genet. 2015;96:377–385. doi: 10.1016/j.ajhg.2015.01.001.

Associated Data


Supplementary Materials

Document S1. Figures S1–S14 and supplemental methods (mmc1.pdf)
Table S1. Tables S1–S8 (mmc2.xlsx)
Document S2. Article plus supplemental information (mmc3.pdf)

Data Availability Statement

This study uses genotype and phenotype data from the UK Biobank under project 12505. UKB data can be accessed upon request once a research project has been submitted to and approved by the UKB committee. Data sources underlying all figures are provided as supplemental tables. Example R scripts from our pipeline to simulate and estimate enrichment of ID, together with C++ code describing the algorithm used to calculate annotation-specific inbreeding measures, are provided at the following URL: https://github.com/loic-yengo/Code_for_Genomic_Partitioning_of_Inbreeding_Depression.


Articles from American Journal of Human Genetics are provided here courtesy of American Society of Human Genetics
