Summary
Large biobank samples provide an opportunity to integrate broad phenotyping, familial records, and molecular genetics data to study complex traits and diseases. We introduce Pearson-Aitken Family Genetic Risk Scores (PA-FGRS), a method for estimating disease liability from patterns of diagnoses in extended, age-censored genealogical records. We then apply the method to study a paradigmatic complex disorder, major depressive disorder (MDD), using the iPSYCH2015 case-cohort study of 30,949 MDD cases, 39,655 random population controls, and more than 2 million relatives. We show that combining PA-FGRS liabilities estimated from family records with molecular genotypes of probands improves three lines of inquiry. Incorporating PA-FGRS liabilities improves classification of MDD over and above polygenic scores, identifies robust genetic contributions to clinical heterogeneity in MDD associated with comorbidity, recurrence, and severity and can improve the power of genome-wide association studies. Our method is flexible and easy to use, and our study approaches are generalizable to other datasets and other complex traits and diseases.
Keywords: family genetic risk scores, depression, pedigree, genealogy, genetic liability, genetic risk profiles
Our method, the Pearson-Aitken Family Genetic Risk Score, estimates genetic liability in a proband from patterns of disease outcomes in their relatives. The method is flexible, and the resulting scores can be used for genetic classification, describing genetic etiology, and improving power for gene mapping.
Introduction
The analysis of large biobanks (e.g., BioBank Japan,1 deCODE genetics,2 iPSYCH,3,4 UK Biobank,5 etc.) is omnipresent in complex disorder genetics research. These resources provide opportunities to combine large samples, molecular data, diverse phenotypes, and familial phenotypes. Leveraging familial phenotypes to estimate disease liability in large biobanks has applications for improving power of genome-wide association studies (GWASs),6,7 making classifications and predictions,8,9,10 and offering better descriptions of underlying causes of disease and heterogeneity.11,12 Combining familial and molecular data for these questions may be especially relevant for paradigmatic complex disorders, such as major depressive disorder (MDD), a leading cause of disability worldwide. Such disorders are marked by complex, multifactorial, highly polygenic etiologies that limit the power of molecular genetic investigations,13,14 meaning improved approaches are needed. However, it is not clear how best to combine familial phenotypes and genotype data. Existing methods cannot fully accommodate all biobanks, including the largest for psychiatric genetics, the iPSYCH2015 case-cohort study, due to complex, age-censored, extended genealogies. Previous applications have focused on one use case (e.g., GWAS or prediction) limiting the picture of generalizability to other questions. Here, we set out to develop a method that is applicable to any biobank and demonstrate, by studying the genetic basis of MDD, that it can improve multiple approaches applied in molecular genetic studies of complex disorders.
Currently, methods that transform patterns of diagnoses in genealogies to continuous liability scores15,16 in each relative are limited. Two related resampling approaches estimate posterior mean genetic liabilities assuming a liability threshold model, conditional on case-control status and family history (LT-FH)7 and additionally conditional on age at onset and sex (LT-FH++),17 but both consider only first degree relatives. This excludes information from more distant relatives and could confound estimates more strongly with familial environment. Both were applied only in the context of improving GWASs. So et al.18 developed a method based on the Pearson-Aitken (PA) selection formula,19 which is an analytical procedure for calculating liability from phenotypes in arbitrarily structured genealogies but assumes each relative has been followed for their entire life (i.e., is fully observed). A flexible, resampling-based extension of this model was proposed but is computationally prohibitive at scale.20 These approaches have had a focus on trait predictions. Family genetic risk scores (FGRSs)21 are kinship weighted sums of diagnoses of relatives with corrections for familial environment, censoring, and other covariates. FGRS accommodates extended genealogies and censored records but is not based on a well-described model and does not account for kinship among relatives of probands. FGRS has been applied to describe genetic differences within and across disorders. Current methods estimating individual liability from genealogies are limited and have been applied narrowly.
We introduce a method, Pearson-Aitken Family Genetic Risk Scores (PA-FGRS), validate it under simulations, and apply it to study MDD in the iPSYCH2015 case-cohort study. We demonstrate that combining PA-FGRS with genotypes improves three lines of inquiry: (1) classification of MDD in the context of polygenic score (PGS), (2) identifying robust genetic contributions to clinical heterogeneity of MDD, and (3) improving power in large, single-cohort GWASs of MDD. Our applications confirm, add context to, and extend recent methodological advances and their applications in similar data. The PA-FGRS framework is extensible, powerful, and well-calibrated and could be applied to large biobanks or smaller family studies to pursue similar aims with other complex disorders.
Methods
iPSYCH2015 case-cohort study
The Lundbeck Foundation initiative for Integrative Psychiatric Research (iPSYCH)3,4 is a case-cohort study of all singleton births between 1981 and 2008 to mothers legally residing in Denmark and who were alive and residing in Denmark on their first birthday (N = 1,657,449). The iPSYCH2015 case-cohort study comprises two enrollments from this base population. The iPSYCH2012 case-cohort study enrolled 86,189 individuals (30,000 random population controls; 57,377 psychiatric cases).3 The iPSYCH2015i case-cohort study expanded enrollment by an additional 56,233 individuals (19,982 random population controls; 36,741 psychiatric cases).3,4 DNA was extracted from dried blood spots stored in the Danish Neonatal Screening Biobank22 and genotyping was performed on the Infinium PsychChip v1.0 array (2012) or the Global Screening Array v2 (2015i). Psychiatric diagnoses were obtained from the Danish Psychiatric Central Research Register (PCRR)23 and the Danish National Patient Register.24 Diagnoses in these registers are made by licensed psychiatrists during in- or out-patient specialty care, but diagnoses or treatments assigned in primary care are not included. Linkage across population registers to parents where known and to the neonatal biobank is possible via unique citizen identifiers of the Danish Civil Registration System.25 The use of this data follows standards of the Danish Scientific Ethics Committee, the Danish Health Data Authority, the Danish Data Protection Agency, and the Danish Neonatal Screening Biobank Steering Committee. Data access was via secure portals in accordance with Danish data protection guidelines set by the Danish Data Protection Agency, the Danish Health Data Authority, and Statistics Denmark. There are restrictions to the availability of the individual-level data used for this work, as the consent structure of iPSYCH and Danish law prevent individual genotype and phenotype data from being shared publicly.
Genotyping and quality control
Genotype phasing, imputation, and quality control were performed in parallel in the 2012 and 2015i cohorts according to custom, mirrored protocols. Briefly, phasing and imputation were conducted using BEAGLEv5.1,26,27 both steps including reference haplotypes from the Haplotype Reference Consortium (HRC) v1.1.28 Quality control was applied prior to and following imputation to correct for missing data across SNPs and individuals, SNPs showing deviations from Hardy-Weinberg equilibrium in cases or controls, abnormal heterozygosity of SNPs and samples, genotype-phenotype sex discordance, minor allele frequency (MAF), batch artifacts, and imputation quality. Kinship was detected within and across 2012 and 2015i cohorts using KING,29 censoring to ensure that no relatives of second degree or closer remained. Ancestry was examined using the smartpca module of EIGENSOFT,30 and multivariate PCA outliers from the set of iPSYCH individuals with both parents and four grandparents born in Denmark were excluded. In total, 7,649,999 imputed allele dosages were retained for analysis.
iPSYCH2015 case-cohort genealogies
All recorded relatives of probands in this iPSYCH2015 case-cohort were obtained from the Danish Civil Registry25 using mother-father-offspring linkages. From the 141,265 probands,31 we identified 2,066,657 unique relatives, assembling all relationships into a population graph using the kinship232 and FamAgg31 packages where edges denoted membership in a recorded trio. The relatedness coefficient for each pair was calculated as a weighted sum of unique ancestral paths through the population graph (i.e., not including the same individual, except for the common ancestor). Each path in the sum was weighted by 0.5 raised to the power of the number of edges in the path.33 The Danish Civil Registry does not contain information on zygosity for same-sex twins, but following analysis of the SNP-kinship of children of same-sex twins (Figure S1), we assigned same-sex twins a relatedness coefficient of 0.75. Similarly, guided by analysis of siblings with missing paternal records (Figure S2), we assigned maternal siblings with missing paternal records a relatedness coefficient of 0.25. Among individuals with putative European ancestry and both genotyped on the same array, SNP-based relatives were identified using KING29 (--degree 4 option). Additionally, 24,773 pairs of relatives from the population genealogy included two probands genotyped on the same genotype array. We used Pearson’s correlation of the graph-inferred kinship and SNP-inferred kinship from KING29 as an estimate of concordance and quality of inferred relationships.
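The path-counting rule for relatedness can be illustrated with a minimal Python sketch. The toy pedigree, the `toy_parents` dictionary layout, and the function names are hypothetical and are not taken from kinship2 or FamAgg; the sketch assumes an acyclic pedigree and ignores inbreeding of common ancestors.

```python
from itertools import product

# Hypothetical toy pedigree: child -> tuple of recorded parents
toy_parents = {
    "sib_a": ("grandmother", "grandfather"),
    "sib_b": ("grandmother", "grandfather"),
    "cousin_a": ("sib_a", "spouse_a"),
    "cousin_b": ("sib_b", "spouse_b"),
    "half_sib": ("sib_a",),  # father not recorded
}

def ancestor_paths(person, parents):
    """All upward paths starting at `person`, including the trivial path (person,)."""
    paths = [(person,)]
    stack = [(person,)]
    while stack:
        path = stack.pop()
        for parent in parents.get(path[-1], ()):
            longer = path + (parent,)
            paths.append(longer)
            stack.append(longer)
    return paths

def relatedness(i, j, parents):
    """Sum 0.5 ** (number of edges) over paths joining i and j through a common ancestor."""
    total = 0.0
    for path_i, path_j in product(ancestor_paths(i, parents), ancestor_paths(j, parents)):
        if path_i[-1] != path_j[-1]:
            continue  # the two half-paths must end at the same common ancestor
        if set(path_i[:-1]) & set(path_j[:-1]):
            continue  # no individual other than the common ancestor may appear twice
        edges = (len(path_i) - 1) + (len(path_j) - 1)
        total += 0.5 ** edges
    return total

sib_r = relatedness("sib_a", "sib_b", toy_parents)
cousin_r = relatedness("cousin_a", "cousin_b", toy_parents)
half_sib_r = relatedness("cousin_a", "half_sib", toy_parents)
```

In this toy pedigree the full siblings share two paths of two edges each, the first cousins share two paths of four edges each, and the half-siblings share a single path of two edges.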
PA-FGRS
PA-FGRS estimates a liability for disease carried by a proband from the observed disease status in a pedigree and under the assumption of a liability threshold model for the disease.34 The method first estimates an initial liability for each relative and then uses the PA selection formula to sequentially update the expected liability in the proband conditional on each relative.34,35
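We begin by assuming a disease, Di = 1, arises when an individual, i, carries a latent liability, Li, that surpasses some threshold, t. Liability, Li, can arise from additive effects (βj) of genetic factors (Xij) or environmental deviations (ei), and genetic contributions follow classic polygenic theory.34,35 We can write a generative model:

$$L_i = \sum_j \beta_j X_{ij} + e_i = G_i + e_i, \qquad D_i = \begin{cases} 1 & \text{if } L_i \ge t \\ 0 & \text{otherwise,} \end{cases}$$

where the threshold, t, is the standard normal quantile with an upper-tail probability of Kpop, the lifetime prevalence of the disorder, i.e., $t = \Phi^{-1}(1 - K_{pop})$. Further, we assume that the vector consisting of the genetic liability of the proband, $G_0$, and the total liabilities of n genetic relatives, $L_1, \ldots, L_n$, follows a multivariate normal distribution with mean zero and covariance matrix:

$$\Sigma = \begin{pmatrix} h^2 & r_{01}h^2 & \cdots & r_{0n}h^2 \\ r_{01}h^2 & 1 & \cdots & r_{1n}h^2 \\ \vdots & \vdots & \ddots & \vdots \\ r_{0n}h^2 & r_{1n}h^2 & \cdots & 1 \end{pmatrix},$$

where $h^2$ is the narrow-sense heritability and $r_{jk}$ is the relatedness coefficient between individuals j and k. Under this model, the expected value of Li, conditional on the true value for Di, is according to truncated normal distribution theory36:

$$E[L_i \mid D_i] = \begin{cases} \dfrac{\varphi(t)}{1 - \Phi(t)} & \text{if } D_i = 1 \\[2ex] -\dfrac{\varphi(t)}{\Phi(t)} & \text{if } D_i = 0, \end{cases}$$

where $\varphi$ and $\Phi$ are the standard normal density and distribution functions.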
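As a minimal numerical sketch of these truncated-normal expectations, the following Python snippet computes the threshold and the expected liability of a fully observed case and control. The function names are hypothetical, and the liability is assumed to be standard normal in the population.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # population liability assumed N(0, 1)

def liability_threshold(k_pop):
    """Threshold t such that P(L >= t) equals the lifetime prevalence."""
    return STD_NORMAL.inv_cdf(1.0 - k_pop)

def expected_liability(is_case, k_pop):
    """Mean of a standard normal truncated below (cases) or above (controls) at t."""
    t = liability_threshold(k_pop)
    if is_case:
        return STD_NORMAL.pdf(t) / k_pop           # 1 - Phi(t) = k_pop
    return -STD_NORMAL.pdf(t) / (1.0 - k_pop)      # Phi(t) = 1 - k_pop

e_case = expected_liability(True, 0.2)
e_control = expected_liability(False, 0.2)
```

Weighting the two expectations by the prevalence and its complement recovers the population mean of zero, which is a quick check that the two branches are mutually consistent.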
A critical assumption of this model is that each individual is fully observed (i.e., no age censoring), meaning there is an equivalence between their diagnostic and disorder status. This assumption rarely holds in practice, and the variable follow-up of relatives by the Danish register system makes it extremely tenuous. We instead propose a model (Note S1) where the disease status Yi in those who surpass the threshold is observed with a probability corresponding to the ratio between the (possibly stratified) age-specific prevalence (Ki) and the lifetime prevalence (Kpop), i.e., P(Yi = 1 | Di = 1) = Ki/Kpop.
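To get the expected liabilities under this model, we use a mixture of an upper and a lower truncated Gaussian, both with mean and variance corresponding to their conditional expectations and with the mixture proportion ($w_i$) corresponding to the conditional probability of being a case. Let $TN(\mu, \sigma^2, a, b)$ denote a truncated Gaussian with mean $\mu$, variance $\sigma^2$, lower truncation at $a$, and upper truncation at $b$. Then the distribution of $L_i$ conditional on observations 1 to $i$ is

$$L_i \mid Y_1, \ldots, Y_i \sim w_i \, TN(\mu_i, \sigma_i^2, t, \infty) + (1 - w_i) \, TN(\mu_i, \sigma_i^2, -\infty, t),$$

where $\mu_i$ and $\sigma_i^2$ are the mean and variance of $L_i$ conditional on observations 1 to $i-1$, with $w_i = 1$ for $Y_i = 1$, while for $Y_i = 0$

$$w_i = \frac{p_i \left(1 - K_i / K_{pop}\right)}{1 - p_i K_i / K_{pop}}, \qquad p_i = 1 - \Phi\!\left(\frac{t - \mu_i}{\sigma_i}\right).$$

This we approximate as a Gaussian with the mean and variance of the mixture (Note S1).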
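To make the role of the mixture proportion concrete, the sketch below computes the probability that a relative with no recorded diagnosis will nevertheless become a case. It applies Bayes' rule to the observation model stated above (an eventual case is already diagnosed with probability Ki/Kpop, and a non-case is never diagnosed); it is not taken from the PA-FGRS source code. The function name is hypothetical, and capping the ratio at 1 is an added assumption.

```python
def prob_case_given_unaffected(p_case, k_i, k_pop):
    """P(D = 1 | Y = 0) by Bayes' rule.

    p_case: prior probability that liability exceeds the threshold
    k_i:    age-specific (cumulative) prevalence for this relative
    k_pop:  lifetime prevalence
    """
    observed_fraction = min(k_i / k_pop, 1.0)  # assumption: ratio capped at 1
    return p_case * (1.0 - observed_fraction) / (1.0 - p_case * observed_fraction)

# Unconditioned prior (p_case = k_pop = 0.2) at three stages of follow-up
w_no_follow_up = prob_case_given_unaffected(0.2, 0.0, 0.2)
w_half_follow_up = prob_case_given_unaffected(0.2, 0.1, 0.2)
w_full_follow_up = prob_case_given_unaffected(0.2, 0.2, 0.2)
```

With no follow-up the unaffected status is uninformative and the weight equals the prior, and with complete follow-up the weight is zero, which recovers the fully observed control.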
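Following adaptations18,37 of the PA selection formula,19 the conditional mean and variance of expected liability for a proband are estimated given their pedigree, initial liabilities, and population parameters.37 Let $\mu_i^* - \mu_i$ be the effect that conditioning on $Y_i$ and $K_i$ has on the expected liability of relative $i$; then the vector of conditional mean liabilities, $\mu^*$, is

$$\mu^* = \mu + \Sigma_{\cdot i} \, \Sigma_{ii}^{-1} \left(\mu_i^* - \mu_i\right) \qquad \text{(Equation 1)}$$

Similarly, if conditioning changes $\Sigma_{ii}$ to $\Sigma_{ii}^*$, the conditional covariance matrix of liabilities, $\Sigma^*$, is estimated as

$$\Sigma^* = \Sigma - \Sigma_{\cdot i} \left(\Sigma_{ii}^{-1} - \Sigma_{ii}^{-1} \Sigma_{ii}^* \Sigma_{ii}^{-1}\right) \Sigma_{i \cdot},$$

where $\Sigma_{\cdot i}$ and $\Sigma_{i \cdot}$ denote the i-th column and row of $\Sigma$.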
Previous work has found PA selection to be an efficient estimator of genetic liabilities of binary traits given family history.7,18,37 In practice, we start by setting the liability vector to a zero vector. We then iteratively condition on the observed disease status of each relative, using the expected mean and variance of a mixture of truncated Gaussians in combination with the PA selection formula, to obtain the expected genetic liability of the index individual. Alternatively, the censoring can be modeled using an age-dependent threshold (ADT) model17 in which the liability threshold is assumed to be defined by the cumulative incidence proportion (Ki = P(D|agei)), such that $t_i = \Phi^{-1}(1 - K_i)$. Under this model, the expected liability of a case is the threshold, $t_i$, while for a control it is the mean of a Gaussian truncated above at $t_i$ (Note S2).
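The iterative procedure can be sketched in a few lines of Python. This is an illustrative reading of the description above under stated assumptions, not the released R implementation: the function names and argument layout are hypothetical, the mixture weight is derived from the observation model by Bayes' rule, and the mixture is replaced by a Gaussian with matching mean and variance before each PA update.

```python
from statistics import NormalDist

N01 = NormalDist()

def mixture_moments(mu, var, t, is_case, k_i, k_pop):
    """Mean and variance of a relative's liability after conditioning on their status."""
    sd = var ** 0.5
    a = (t - mu) / sd
    p_case = 1.0 - N01.cdf(a)                      # prior P(L >= t)
    ratio_up, ratio_lo = N01.pdf(a) / p_case, N01.pdf(a) / (1.0 - p_case)
    m_up, v_up = mu + sd * ratio_up, var * (1.0 + a * ratio_up - ratio_up ** 2)
    m_lo, v_lo = mu - sd * ratio_lo, var * (1.0 - a * ratio_lo - ratio_lo ** 2)
    if is_case:
        w = 1.0
    else:
        s = min(k_i / k_pop, 1.0)                  # assumption: ratio capped at 1
        w = p_case * (1.0 - s) / (1.0 - p_case * s)
    mean = w * m_up + (1.0 - w) * m_lo
    second_moment = w * (v_up + m_up ** 2) + (1.0 - w) * (v_lo + m_lo ** 2)
    return mean, second_moment - mean ** 2

def pa_fgrs(kinship, status, k_i, h2, k_pop):
    """Expected genetic liability of the proband (row 0 of `kinship`) and its variance.

    kinship: (n+1) x (n+1) relatedness matrix with ones on the diagonal
    status, k_i: diagnosis (bool) and age-specific prevalence of the n relatives
    """
    n = len(status)
    t = N01.inv_cdf(1.0 - k_pop)
    # Covariance of (G_0, L_1, ..., L_n): h2 * relatedness, unit variance for relatives
    cov = [[h2 * kinship[j][k] for k in range(n + 1)] for j in range(n + 1)]
    for j in range(1, n + 1):
        cov[j][j] = 1.0
    mean = [0.0] * (n + 1)
    for i in range(1, n + 1):
        mu_i, var_i = mean[i], cov[i][i]
        m_new, v_new = mixture_moments(mu_i, var_i, t, status[i - 1], k_i[i - 1], k_pop)
        col = [cov[j][i] for j in range(n + 1)]
        shift = (m_new - mu_i) / var_i             # PA update of the mean vector
        shrink = (var_i - v_new) / var_i ** 2      # PA update of the covariance matrix
        mean = [mean[j] + col[j] * shift for j in range(n + 1)]
        cov = [[cov[j][k] - col[j] * col[k] * shrink for k in range(n + 1)]
               for j in range(n + 1)]
    return mean[0], cov[0][0]

parent_offspring = [[1.0, 0.5], [0.5, 1.0]]
score_affected_parent, var_affected_parent = pa_fgrs(parent_offspring, [True], [0.2], 0.5, 0.2)
score_unaffected_parent, _ = pa_fgrs(parent_offspring, [False], [0.2], 0.5, 0.2)
```

In this two-person example a fully observed affected parent raises the proband's expected genetic liability and an unaffected parent lowers it, while a parent with no follow-up (Ki = 0) leaves both the mean and the variance unchanged.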
Software
PA-FGRS is freely available as R code: https://github.com/BioPsyk/PAFGRS.
Simulations
We first simulated 50,000 four-generational proband pedigrees with varying numbers (0–18) and kinds of relatives with variable age censoring. The heritability was set to 0.50 and lifetime prevalence was set to 0.2. This was done assuming random age of onset, under the ADT model, and under a model with a correlation between liability and age of onset of 0.50. We assessed the correlation between the estimated liabilities obtained from eight different liability estimation methods: PA-FGRS, PA-FGRSadt, FGRS,21 PA,18 LT-PA,7 LT-FH++,17 gibbsF90,38 and a Gibbs sampling-based approach.20 Next, we repeated the simulations under each generative model generating 2,500 pedigrees, 100 times for each of seven different lifetime prevalences.
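For orientation, a much-reduced version of such a simulation is sketched below in Python. It is a simplification under stated assumptions rather than the simulation code used in the study: it generates mother-father-proband trios instead of four-generational pedigrees, assumes a random age of onset (so eventual cases among parents are already diagnosed with probability `k_parent / k_pop`), and treats the proband as fully observed. The function name, the `k_parent` value, and the seed are hypothetical.

```python
import random
from statistics import NormalDist

def simulate_trios(n, h2=0.5, k_pop=0.2, k_parent=0.15, seed=1):
    """Simulate n trios under a liability threshold model with age-censored parents."""
    rng = random.Random(seed)
    t = NormalDist().inv_cdf(1.0 - k_pop)
    sd_g, sd_e = h2 ** 0.5, (1.0 - h2) ** 0.5
    trios = []
    for _ in range(n):
        g_mother = rng.gauss(0.0, sd_g)
        g_father = rng.gauss(0.0, sd_g)
        # Offspring genetic value: mid-parent value plus Mendelian segregation variance h2/2
        g_child = 0.5 * (g_mother + g_father) + rng.gauss(0.0, (h2 / 2.0) ** 0.5)
        liability = [g + rng.gauss(0.0, sd_e) for g in (g_child, g_mother, g_father)]
        disease = [int(l >= t) for l in liability]
        # Censoring: an eventual case among the parents is observed with prob. k_parent / k_pop
        observed = [disease[0]] + [
            int(d == 1 and rng.random() < k_parent / k_pop) for d in disease[1:]
        ]
        trios.append({"g_child": g_child, "liability": liability,
                      "disease": disease, "observed": observed})
    return trios

trios = simulate_trios(20000)
proband_prevalence = sum(tr["disease"][0] for tr in trios) / len(trios)
parent_observed_prevalence = sum(sum(tr["observed"][1:]) for tr in trios) / (2 * len(trios))
```

Under these settings the proband prevalence should be close to the lifetime prevalence, whereas the observed prevalence in parents should be close to the lower, age-specific value.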
To assess the impact of shared environment (c2), we considered a generative model with an additional factor that determined the similarity between parents, offspring, and siblings (Figure S3). We estimated the correlation of FGRS and the true genetic and environmental liability. For FGRS,21 we considered two versions: (1) a c2-adjusted FGRS and (2) an unadjusted FGRS. For PA-FGRS, we considered two versions: (1) using all available relatives (PA-FGRS) or (2) estimating liability without parents, siblings, or children (PA-FGRSnoFDR), proposing the latter as a correction for shared environment. For each of the four (PA-)FGRS, we computed the correlation between true and estimated liability in simulations.
Psychiatric phenotypes
Our primary outcome, MDD, was defined as having a registration with a depressive episode (F32) or recurrent depression (F33) before January 1, 2017, according to the Danish Psychiatric Central Research Register (PCRR).23 Diagnostic codes used for the construction of PA-FGRS scores are found in Table S1. For relatives diagnosed between 1968 and 1994, records are limited to in-patient contacts and ICD-8 codes.
Population parameters used for computing PA-FGRS in iPSYCH
The sex-specific lifetime prevalence of each disorder (Table S1) was obtained from published estimates based on Danish registers.39 Narrow-sense heritability was set to 0.8 for attention deficit/hyperactivity disorder (ADHD), autism spectrum disorder (ASD), bipolar disorder (BPD), and schizophrenia (SCZ) and 0.4 for MDD (Table S1). We chose to estimate Ki using sex and birth-year-specific cumulative incidence computed using all members of the iPSYCH2015 random sample genealogies (N = 979,582; Figure S4).
PGSs
PGSs for MDD, SCZ, and BPD were computed based on published, external summary statistics (Table S2) that had no overlap with iPSYCH. PGS for ASD and ADHD were based on GWAS performed in the complementary half of the iPSYCH2015 case-cohort study (i.e., iPSYCH2012 for iPSYCH2015i and vice versa; Figure S5). We used SBayesR40 to estimate allelic effects for SNPs in the intersection of all GWASs, iPSYCH, and the reference linkage disequilibrium (LD) panel. Palindromic SNPs (A/T, C/G), those not mapping uniquely to hg19 positions, and without a unique rsID in dbSNP v.151 were excluded via our summary statistics quality control (QC) pipeline (https://github.com/BioPsyk/cleansumstats).
Classification analysis
In the European subset of the iPSYCH2015 MDD case-cohort study (Figure S5), we used logistic regression with MDD as an outcome and each or varying combinations of PA-FGRS and PGS as predictors (Tables S1 and S2). For this analysis, PA-FGRS were computed excluding proband status. The classification accuracy was reported in an out-of-sample test. We trained the logistic classifier in iPSYCH2012 (or iPSYCH2015i) and report the area under the receiver operating characteristic curve (AUC) achieved in the independent, complementary iPSYCH2015i (or iPSYCH2012).
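The out-of-sample AUC can be computed directly from predicted scores in the held-out cohort. The Python sketch below uses the rank-based (Mann-Whitney) definition of the AUC; the function name and the toy scores are hypothetical, and the scores are assumed to come from a classifier trained in the complementary cohort.

```python
def auc(scores, labels):
    """Probability that a random case scores higher than a random control (ties count 0.5)."""
    case_scores = [s for s, y in zip(scores, labels) if y == 1]
    control_scores = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((c > k) + 0.5 * (c == k) for c in case_scores for k in control_scores)
    return wins / (len(case_scores) * len(control_scores))

# Hypothetical held-out predictions: higher score means higher predicted risk
held_out_scores = [0.9, 0.8, 0.3, 0.1]
held_out_labels = [1, 1, 0, 0]
held_out_auc = auc(held_out_scores, held_out_labels)
```

The pairwise comparison is quadratic in sample size, which is acceptable for illustration but would be replaced by a rank-sum computation for cohorts of the size analyzed here.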
Empirical performance of each liability score was assessed by fitting a weighted probit regression with the liability score, age, and sex as explanatory variables and reporting the incremental liability-scale variance explained (R2l) gained relative to a model with only age and sex. 95% confidence intervals for the R2l and for differences in R2l between estimators were estimated by 500 bootstraps. The same procedure was used to compare the performance of PA-FGRS under nine different settings of the h2 parameter and to compare PA-FGRS, PA-FGRSadt, and PA across other psychiatric phenotypes.
Comparing polygenic profiles
Putative subgroup-defining features were obtained from the PCRR23 and the Danish Civil Registry.25 We divided individuals diagnosed with MDD on the basis of a diagnosis of BPD (ICD10: F30–F31), comorbid anxiety (F40.0–40.2, F41.0–41.1, or F42), sex (as registered at birth), recurrence (ICD10: F32 or F33), severity (ICD10: F32/33.0, F32/33.1, F32/33.2, or F32/33.3), age at first recorded diagnosis, and mode of treatment (in-patient, casualty ward, or out-patient). We computed a composite estimate of genetic liability for each of the five mental disorders as a weighted sum of the PGS and PA-FGRS with weights corresponding to the betas from a logistic regression of their natural outcome in a calibration sample (Figure S5). For each subgroup-defining feature, a multiple multinomial logistic regression was fitted to sequentially estimate the effects of each of the composite genetic risk estimates, with age, sex, and 10 genetic principal components (PCs) as covariates, using the R package nnet.41 We report a normalized partial effect size for each PGS and PA-FGRS, βMLR/βLR. The effect is the ratio of the effect of the PA-FGRS on MDD outcomes (βMLR) over its effect on the natural outcomes (βLR; e.g., ASD for PA-FGRS for ASD). Each βLR was estimated separately in outcome-specific case-cohort samples (e.g., ASD case cohort; Figure S5). This effect size can then be given context. For example, a ratio of βMLR/βLR ∼1 means that the effect of BPD genetic liability on being diagnosed with BPD given a prior diagnosis of MDD is the same as the effect of BPD genetic liability on being diagnosed with BPD in the general population. These analyses were conducted separately for iPSYCH2012 and iPSYCH2015i samples and meta-analyzed. Subgroup-level effect estimates were meta-analyzed using inverse variance weighting, while heterogeneity test p values were combined using Fisher’s method. In total, we report 35 p values, declaring those less than 0.05/35 = 0.0014 strictly significant.
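The two meta-analytic steps and the multiple-testing threshold can be sketched as follows. This is a generic Python illustration with hypothetical function names and toy inputs, not the analysis code; Fisher's method is evaluated with the closed-form chi-square tail probability for even degrees of freedom.

```python
import math

def inverse_variance_meta(betas, ses):
    """Fixed-effect inverse-variance weighted estimate and its standard error."""
    weights = [1.0 / se ** 2 for se in ses]
    beta = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    return beta, math.sqrt(1.0 / sum(weights))

def fisher_combined_p(pvalues):
    """Fisher's method: -2 * sum(log p) follows a chi-square with 2k degrees of freedom."""
    half_stat = -sum(math.log(p) for p in pvalues)   # chi-square statistic divided by 2
    term, total = 1.0, 1.0
    for i in range(1, len(pvalues)):
        term *= half_stat / i
        total += term
    return math.exp(-half_stat) * total              # upper tail for even degrees of freedom

BONFERRONI_THRESHOLD = 0.05 / 35                      # 35 reported p values

# Toy inputs standing in for the iPSYCH2012 and iPSYCH2015i estimates
meta_beta, meta_se = inverse_variance_meta([1.0, 3.0], [1.0, 1.0])
combined_p = fisher_combined_p([0.05, 0.05])
is_strictly_significant = combined_p < BONFERRONI_THRESHOLD
```

With equal standard errors the pooled estimate is the simple average of the two cohort estimates, and two nominal p values of 0.05 combine to a value that does not pass the strict threshold.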
GWASs
GWASs were performed within two proband groups, the iPSYCH2012 MDD case-cohort and the iPSYCH2015i MDD case-cohort, on imputed allelic dosage data using PLINK2.42 For binary MDD diagnosis, logistic regression was applied, and for continuous valued PA-FGRS, we used linear regression, both including sex, age, and 10 principal components of genetic ancestry as covariates. Inverse-variance weighted meta-analysis of the two constituent samples was performed using METAL.43 SNPs with association p values less than 5 × 10−8 were declared significant, while variants with a false discovery rate of 0.05 were considered suggestive. Independent loci were defined as >1 Mb apart. Observed-scale SNP heritability ($h^2_{SNP}$) and genetic correlations to nine published GWASs (Table S3) were estimated using LD score regression.44,45 The difference in $h^2_{SNP}$ between GWASs was computed together with its standard error. Genome-wide significant index SNPs were defined from a large external GWAS of MDD, modified to exclude 23andMe and iPSYCH, by clumping overlapping SNP lists. A paired t test comparing squared test statistics was used to assess significance of improvement. Polygenic scores for within iPSYCH classification were computed using SNPs with MAF >0.01 and INFO >0.8, clumped and thresholded with PLINK 1.90b6.27,42 using parameters --clump-kb 625 --clump-p1 0.1 --clump-p2 0.1 --clump-r2 0.8. Improvements in predictions were assessed using the difference in AUC test in the pROC package.
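One way to operationalize the locus definition is sketched below in Python. The text states only that independent loci are more than 1 Mb apart; measuring the gap between adjacent significant SNPs on the same chromosome is an assumption of this sketch, and the function name and toy positions are hypothetical.

```python
def group_into_loci(significant_snps, min_gap=1_000_000):
    """Group (chromosome, position) hits into loci separated by more than `min_gap` bp."""
    loci = []
    for chrom, pos in sorted(significant_snps):
        same_locus = loci and loci[-1][0] == chrom and pos - loci[-1][2] <= min_gap
        if same_locus:
            loci[-1][2] = pos              # extend the current locus to this SNP
        else:
            loci.append([chrom, pos, pos]) # start a new locus: [chrom, start, end]
    return [tuple(locus) for locus in loci]

# Toy genome-wide significant hits as (chromosome, base-pair position)
toy_hits = [(1, 100), (1, 500_000), (1, 2_000_000), (2, 100)]
toy_loci = group_into_loci(toy_hits)
```

Here the first two hits on chromosome 1 fall within 1 Mb of each other and form one locus, while the third hit and the hit on chromosome 2 each form their own.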
Results
The iPSYCH2015 MDD case-cohort genealogies are complex and contain a wealth of information
The iPSYCH2015 case-cohort study ascertained 141,265 probands from the population born in Denmark between May 1, 1981, and December 31, 2008 (N = 1,657,449), by cross linking the Danish Civil Registration System25 (CPR) and the Danish Neonatal Screening Biobank.22 The CPR includes all individuals who have legally resided in Denmark since its establishment in 1968, and each proband is associated with parental identifiers, where known. We used mother-father-proband connections46 to reconstruct extended genealogies (see methods) of 141,265 iPSYCH2015 probands, identifying 2,066,657 unique relatives spanning up to nine generations (birth years range from the 1870s to 2016; Figures 1A and S6). Among the 120,269 and 73,052 European ancestry samples genotyped in iPSYCH2012 and iPSYCH2015i, we used KING29 to identify 41,476 first to fourth degree relative pairs. Of these, most first, second, and third degree relatives (99%, 67%, and 68%) had concordant relatedness in the genealogy, with more pairs missing from the genealogy as relationships became more distant (0.4%, 17%, and 25%). In particular, 82% of putative fourth degree relatives identified using genotypes were not recorded in the genealogy. The correlation of genotype-based and genealogy-based kinship for relative pairs identified using genotypes was 0.94 (Figures 1B and S7). This correlation for the 24,773 of the 20,071,410 relative pairs identified in the genealogy that included two iPSYCH probands genotyped on the same array was 0.97 (Figure S8). Siblings sharing one recorded parent (with the other missing) tended to be half-siblings (Figure S2), and approximately 45% of same-sex twins were monozygotic (Figure S1). The genealogies of 141,265 probands included 99.5% of parents, 82.0% of grandparents, and 7% of great-grandparents, with the number of relatives identified per proband varying considerably (Figure 1C). Clinical diagnoses are aggregated for all relatives during periods of legal residence within Denmark from 1968.
In-patient psychiatric contacts were recorded from 1969 to 1994 using ICD-8 and from 1994 onwards using ICD-10. Since 1995, both in- and out-patient contacts are recorded using ICD-10 (Figures 1D and 1E). There is a wealth of high-quality psychiatric familial phenotypes for each genotyped proband (Figure 1), but relatives are neither completely nor consistently observed.
Figure 1.
The iPSYCH2015 MDD case-cohort genealogies are complex and contain a wealth of information
(A) Each of the 141,265 probands (white box) in iPSYCH2015 can be connected to a number of different types of relatives, here reported as a total across all probands and average per proband.
(B) SNP-based relatedness generally confirms relatedness inferred from genealogies and suggests that most first, second, and third-degree relatives are captured.
(C) The number of relatives linked to each proband varies considerably.
(D) The proportion of total person years of follow up is distributed differently across probands and their relatives, showing variability by relative type (y axis), year of observation (x axis), and register era (color).
(E) The proportion of total person years of follow up for MDD cases similarly varies.
P, parents; S, siblings; Ch, children; 1GP, grandparents; Pib(lings), aunts and uncles; Nib(lings), nieces and nephews; iCjR i-th cousin, j times removed; H-, half; Other, relative types not in the figure; MDD, major depressive disorder.
PA-FGRS is a flexible, powerful framework for estimating individual liability scores
PA-FGRS estimates the expected genetic liability carried by a proband from an arbitrary set of relatives, assuming the outcome results from a thresholded latent Gaussian liability (Figure 2). As input PA-FGRS takes a kinship matrix, diagnostic status and age (at censoring, diagnosis, or end of follow up) for each relative, disorder heritability, and individual morbid risks, which may be estimated from lifetime sex by birth-year-specific cumulative incidence. In the first step, each pedigree member is assigned an initial liability of 0 with variance 1. Then, we consecutively condition on observations of other relatives, 1, … , n, updating all expected liabilities based on each relative. We first update the expected liability of a selected relative, r(i), estimating their expected liability given their prior liability distribution, disease status, age, and the lifetime incidence estimate. Then we update the liabilities of all remaining relatives, ri+1, … , rn, according to the PA-selection formula19 and a modified kinship matrix (Figure S9). An optional final step updates the proband liability on their own diagnostic status and age. PA-FGRS produces a continuous score that summarizes the genetic liability from the proband’s pedigree.
Figure 2.
PA-FGRS estimates a continuous liability score for a proband from diagnoses in relatives and specific population parameters
PA-FGRS estimates latent disease liability in a proband from patterns of diagnoses in arbitrarily structured pedigrees where relative phenotypes may be age censored. Input data for a proband can be a simple, fully observed pedigree (i.e., no censoring; yellow proband), an extended pedigree with fully observed phenotypes (green proband), or an arbitrarily structured pedigree where many relative phenotypes may be age censored (blue proband). PA-FGRS combines (1) an assumed form for covariance in liabilities among relatives with (2) estimates of individual morbid risk from, e.g., covariate stratified cumulative incidence curves, in (3) an extension of the PA selection formulas that models age-censored controls as a mixture of cases and controls. Estimated genetic liabilities are assigned to each proband and determined by the unique configuration of their pedigree, population parameters, and the morbid risk of each relative. Proband liabilities (colored) are shown against the population distribution of genetic liability (gray) with E(G|case) and E(G|control), indicating the expected population (i.e., unconditioned) mean liability of a case and control, respectively.
cum. inc., cumulative incidence; h2, heritability; rxy, relatedness between relative x and y, which can take values i for index individual, s for sibling, f for father, or m for mother; T, threshold.
Other methods have approached this problem, but with limitations critical to our intended use case. Binary outcomes were incorporated in the BLUPF90 family of software38 (i.e., gibbsF90) and in prior implementations18,37 of the PA selection formula.19 These models, however, assumed no age censoring, which we address by either modeling individuals as a mixture of truncated Gaussians, with mixture proportions reflecting individual morbid risks (PA-FGRS; methods), or by assuming an ADT model as introduced by Pedersen et al.17 (PA-FGRSadt; see Note S2). FGRS21 followed this concept, but PA-FGRS takes a more formal approach that incorporates kinship relationships among relatives as well as between relatives and proband, producing better calibrated scores and estimates of conditional liability variance (see methods). LT-FH++17 used an ADT model (see methods), which, similar to a Cox model, assigns higher liabilities to early-onset cases (Figure S10), but the LT-FH++ paper only considered first-degree relatives. For more details on all comparator methods, see Note S4.
Simulations demonstrate the advantages of PA-FGRS over other methods
We simulated 1,900,000 four-generational pedigrees with an average of nine relatives per proband (range 0–18), generating phenotypes from a liability threshold model (methods). We found that the eight methods considered, PA-FGRS, PA-FGRSadt, FGRS,21 PA,18 LT-PA,7 LT-FH++,17 gibbsF90,38 and a fully specified Gibbs sampling-based approach,20 gave estimates that were highly correlated (Figures 3A and 3B; r > 0.8), suggesting that they target similar latent constructs. Methods incorporating more similar information were more highly concordant, e.g., extended relatives (Figures 3A and 3B; r > 0.89) or extended relatives and censoring (Figure 3A; r > 0.95). The Gibbs sampling approach20 produced nearly identical estimates to PA-FGRS (r = 0.999; Figures 3A and 3B), suggesting PA-FGRS behaves near optimally. When simulating under a model where age of onset is independent of liability, PA-FGRS consistently produced the highest correlations with true liability across a range of simulated prevalences (Figure 3C), and when simulating under the ADT model, the PA-FGRSadt performed best (Figures 3D and S11–S14). The largest relative gains were when prevalence and censoring were high. PA-FGRS was also the only method tested that was well calibrated in the presence of censored data (Figures 3E and S12). In simulations without censoring, PA and gibbsF90 were highly correlated (Figure 3B), and both attained similar performance to PA-FGRS (Figure S11). The three resampling-based methods were much more computationally demanding than the analytical methods (Figure S15); in particular, gibbs20 was computationally demanding for larger pedigrees, and none of the three gave a more accurate liability estimate than PA-FGRS (Figures 3C and 3D).
Figure 3.
Simulations demonstrate the advantages of PA-FGRS over other methods
(A and B) PA-FGRS liabilities are correlated with those from other methods in simulations when (A) all relatives are fully observed or (B) when younger relatives are age-censored.
(C and D) (C) PA-FGRS shows the largest correlation with true genetic liability in simulations under a mixture model (i.e., a random age of onset [AOO]) across a range of trait prevalences, but (D) when simulations are conducted under an age-dependent-threshold (ADT) model, PA-FGRSadt performs the best.
(E) Linear regression of estimated liability on true liability shows PA-FGRS estimates are, uniquely, calibrated in the presence of age-censored records.
(F) The presence of shared environmental effects in generative models of familial resemblance creates correlation between PA-FGRS and this environmental component (C + E) of liability. This can be reduced at the cost of power (i.e., reduced correlation with genetic components, A) by excluding confounded (i.e., first degree) relatives.
(C–E) show mean and 95% confidence interval across simulations, while (F) shows median, range, and interquartile range across simulations.
Models: LTPA, liability threshold Pearson-Aitken7; LT-FH++, liability threshold family history plus plus17; gibbsF9038; FGRS, family genetic risk score21; PA-FGRS, Pearson-Aitken family genetic risk score; PA-FGRSADT, Pearson-Aitken family genetic risk score with an age-dependent threshold; gibbs20; PA, Pearson-Aitken.18 R2l, Liability scale variance explained; cor, correlation; deg., degree; c2, shared familial environment.
One limitation of methods that consider only first-degree relatives7,17 is that estimated genetic liabilities may be unduly influenced by effects of familial environment. This may be desirable if the goal is to optimize prediction9,18 only but less so if the goal is to make etiological inferences.21 We repeated our simulations including a common environment component of variance shared among first-degree relatives (Figures 3F and S3)—a typical quantitative genetics model.47 Here, PA-FGRS (and all other approaches) produce liability estimates that are correlated with environmental liability (Figure 3F). With extended genealogies, we can omit close relatives as a sensitivity test for undue influence. Liabilities estimated after excluding first-degree relatives remained good estimators of genetic liability and were uncorrelated with environmental liability (Figure 3F). The flexibility of PA-FGRS can add important context to estimated liabilities that may be especially important when interpreting, e.g., profiles of liability scores21,48 or if shared environment is a concern.
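A back-of-the-envelope calculation can illustrate why omitting first-degree relatives removes the correlation with familial environment at the cost of genetic signal. The sketch below is not the simulation reported here. It assumes a single relative whose liability is observed directly (with unit variance) and a common environment component shared only among first-degree relatives; the function name, parameter names, and the h2 and c2 values are hypothetical.

```python
from math import sqrt


def single_relative_correlations(relatedness, h2, c2, shares_environment):
    """Correlation of one relative's liability with the proband's A and C components.

    Assumes unit-variance liabilities, a directly observed relative liability,
    and a C component that is shared only when shares_environment is True.
    """
    cov_a = relatedness * h2                    # cov(L_relative, A_proband)
    cov_c = c2 if shares_environment else 0.0   # cov(L_relative, C_proband)
    cor_a = cov_a / sqrt(h2) if h2 > 0 else 0.0
    cor_c = cov_c / sqrt(c2) if c2 > 0 else 0.0
    return cor_a, cor_c


# Illustrative values only; h2 and c2 are assumptions, not estimates from this study.
first_degree = single_relative_correlations(0.5, h2=0.4, c2=0.1, shares_environment=True)
second_degree = single_relative_correlations(0.25, h2=0.4, c2=0.1, shares_environment=False)
print("first degree  (cor with A, cor with C):", first_degree)
print("second degree (cor with A, cor with C):", second_degree)
```

Under these assumptions, the second-degree relative carries half the genetic correlation of the first-degree relative but none of the environmental correlation, which is the trade-off described above.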
PA-FGRS requires external estimates of specific population parameters, namely lifetime prevalence and heritability. Providing inaccurate estimates leads to miscalibrated liabilities but has modest impact on the correlation between estimated and true liability in simulations (Figure S16).
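The simplest case, one fully observed relative, gives some intuition for why a misspecified heritability affects calibration more than correlation. The sketch below is not the PA-FGRS implementation or the API of the PAFGRS package. It is the textbook liability threshold expectation for a single relative with no censoring, and the function and parameter names are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal liability scale


def expected_genetic_liability(affected, prevalence, h2, relatedness):
    """E[proband genetic liability | status of one fully observed relative].

    Assumes a liability threshold model with unit-variance liability,
    additive genetic covariance relatedness * h2, and no age censoring.
    """
    threshold = STD_NORMAL.inv_cdf(1.0 - prevalence)
    density = STD_NORMAL.pdf(threshold)
    if affected:
        mean_relative = density / prevalence           # E[L | L > threshold]
    else:
        mean_relative = -density / (1.0 - prevalence)  # E[L | L < threshold]
    return relatedness * h2 * mean_relative


# Illustrative inputs (assumed values): an affected parent, 15% lifetime prevalence.
for h2 in (0.2, 0.4, 0.8):
    print(h2, expected_genetic_liability(True, prevalence=0.15, h2=h2, relatedness=0.5))
```

In this one-relative case the assumed h2 only rescales every estimate by the same factor, so the ranking of probands is unchanged while the scale is wrong. With many relatives the dependence is no longer exactly linear.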
PA-FGRS and PA-FGRSadt explain more variance in liability to MDD than other methods
We compared, in our two cohorts, the variance in liability to MDD explained by eight different methods that use diagnoses in relatives to estimate liability scores of probands (PA-FGRS, PA-FGRSadt, FGRS, PA, LT-PA, gibbsF90, LT-FH++, and PA-FGRS using only first-degree relatives). As in our simulation results, these estimates were highly correlated (Figure S17), and the best performing methods were PA-FGRS and PA-FGRSadt (iPSYCH2012: , ; iPSYCH2015i: , ; Figure S18). PA-FGRSadt explained slightly more variance in liability to MDD than PA-FGRS (; ) but this was not consistent across four other psychiatric disorders (Figure S19). Across the 10 total comparisons, PA-FGRS was best for five, PA-FGRSadt for three, and PA for two. Also consistent with simulation results (Figure S16), we found that varying the h2 parameter has negligible impact on empirical variance explained in MDD (Figure S20) but has a substantial impact on calibration (Figure S21).
PA-FGRS contribute to classification models of MDD over and above PGS
Both family history and PGS explain liability for MDD. Using a 2-fold split of iPSYCH (Figure S5), we trained a model to classify MDD from combinations of PA-FGRS and PGS in iPSYCH2012 (or iPSYCH2015i) and evaluated classification accuracy in the complement, iPSYCH2015i (or iPSYCH2012; methods; Figures 4A and 4B; Table S6). Both genetic instruments, fit alone, significantly classify MDD cases from controls in both cohorts: iPSYCH2012 (AUCPGS = 0.588 [0.583–0.594], p = ; AUCPA-FGRS = 0.598 [0.592–0.603], p = ) and iPSYCH2015i (AUCPGS = 0.573 [0.565–0.580], p = ; AUCPA-FGRS = 0.576 [0.569–0.583], p = ). When combined in a multivariate model, each genetic instrument contributes independent information to classification, with combined effects of PA-FGRS and PGS larger than individual effects (iPSYCH2012: AUCPGS+FGRS = 0.630 [0.625–0.638] and iPSYCH2015i: AUCPGS+FGRS = 0.608 [0.601–0.615]).
Figure 4.
PA-FGRS contribute to classification models of MDD over and above PGS
Combining PA-FGRSMDD and PGSMDD improves classification of MDD. (A) The iPSYCH2012 (Ncases = 20,632, Nctrl = 23,870) and (B) iPSYCH2015i (Ncases = 10,317, Nctrl = 15,785) case-cohorts. Using PGSs for five disorders improves prediction of MDD over only PGSMDD in (C) iPSYCH2012 and (D) iPSYCH2015i. Using PA-FGRS for five disorders improves prediction of MDD over only PA-FGRSMDD in (E) iPSYCH2012 and (F) iPSYCH2015i. Combining PA-FGRS for five disorders with PGS for five disorders improves prediction of MDD over only PA-FGRSMDD and PGSMDD in (G) iPSYCH2012 and (H) iPSYCH2015i. Intervals are 95% confidence intervals.
AUC, area under the receiver operating characteristic curve; MDD, major depressive disorder; SCZ, schizophrenia; BP, bipolar disorder; ASD, autism spectrum disorder; ADHD, attention-deficit/hyperactivity disorder; PGS, polygenic score.
Including PGS for four other psychiatric disorders, SCZ, BPD, ASD, and ADHD, improved the classification of MDD relative to models with MDD PGS only (iPSYCH2012: AUC5-PGS = 0.599 [0.594–0.604]; iPSYCH2015i: AUC5-PGS = 0.589 [0.582–0.596]; Figures 4C and 4D; Table S6). Similarly, incorporating PA-FGRS for the four other psychiatric disorders improved the classification of MDD relative to models with MDD PA-FGRS only (iPSYCH2012: AUC5-PA-FGRS = 0.620 [0.614–0.625]; iPSYCH2015i: AUC5-PA-FGRS = 0.596 [0.589–0.603]; Figures 4E and 4F). Combining all 10 predictors resulted in the best out of sample classification (iPSYCH2012: AUC5-PGS+5-PA-FGRS = 0.648 [0.643–0.653]; iPSYCH2015i: AUC5-PGS+5-PA-FGRS = 0.626 [0.619–0.632]; Figures 4G and 4H). These results demonstrate that combining genetic instruments that leverage different sources of genetic information improves classification of MDD.
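A toy simulation can illustrate why two noisy instruments that index the same genetic liability classify better together than alone. This is a sketch under stated assumptions, not the analysis pipeline used here. It draws a population sample rather than a case-cohort, uses an unweighted sum in place of a fitted logistic model, and computes a rank-based AUC. All names and parameter values are hypothetical.

```python
import random
from statistics import NormalDist


def rank_auc(scores, labels):
    """AUC as the probability that a case outranks a control (ties not handled)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    n_case = sum(1 for y in labels if y)
    n_ctrl = len(labels) - n_case
    rank_sum = sum(rank for rank, i in enumerate(order, start=1) if labels[i])
    return (rank_sum - n_case * (n_case + 1) / 2) / (n_case * n_ctrl)


def simulate_aucs(n=20000, prevalence=0.15, h2=0.5, seed=1):
    """Simulate a liability threshold trait and two independent noisy genetic scores."""
    rng = random.Random(seed)
    threshold = NormalDist().inv_cdf(1.0 - prevalence)
    score_a, score_b, combined, labels = [], [], [], []
    for _ in range(n):
        genetic = rng.gauss(0.0, h2 ** 0.5)
        liability = genetic + rng.gauss(0.0, (1.0 - h2) ** 0.5)
        a = genetic + rng.gauss(0.0, 1.0)  # stand-in for a family-based score
        b = genetic + rng.gauss(0.0, 1.0)  # stand-in for a polygenic score
        score_a.append(a)
        score_b.append(b)
        combined.append(a + b)             # unweighted sum, not a fitted model
        labels.append(liability > threshold)
    return {
        "a": rank_auc(score_a, labels),
        "b": rank_auc(score_b, labels),
        "combined": rank_auc(combined, labels),
    }


print(simulate_aucs())
```

Because the noise in the two scores is independent while the genetic signal is shared, summing them raises the signal-to-noise ratio, and the combined AUC exceeds either single-score AUC in this setup.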
Composite genetic profiles identify robust genetic liability differences among subgroups in MDD
Individuals diagnosed with MDD demonstrate extensive clinical heterogeneity that may reflect etiologic heterogeneity. We used multinomial logistic regression to associate differences in clinical presentations of individuals diagnosed with MDD to genetic liability profiles (methods; Figure 5). We leverage the complementarity of PGS and PA-FGRS, above, by defining composite genetic liability scores (e.g., BPD score = βPGS∗PGSBPD+ βPA-FGRS∗PA-FGRSBPD, where βPGS and βPA-FGRS are the estimated effect of the PGS and PA-FGRS on their natural outcome in a case-control logistic regression). Each composite liability score was significantly larger in individuals diagnosed with MDD than in the control group across all subgroups (Figure 5; p < 0.05). The liability scores for BPD, SCZ, ASD, and ADHD tended to have smaller effects on MDD subgroups than on their natural outcome (i.e., βMLR/βLR < 1; the colored bars below dashed line in Figure 5; methods), except for BPD liability on conversion to a BPD diagnosis (βMLR/βLR = 0.97 [0.90–1.04]; Figure 5A).
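For reference, the two quantities defined in this paragraph can be written out explicitly. The symbols $C_d$, $\rho_{k,d}$, and the subscripts are introduced here only for illustration and are an assumption, not notation taken from the methods.

```latex
% Composite liability score for disorder d (e.g., d = BPD)
C_d = \hat{\beta}_{\mathrm{PGS},d}\,\mathrm{PGS}_d
    + \hat{\beta}_{\mathrm{PA\text{-}FGRS},d}\,\mathrm{PA\text{-}FGRS}_d

% Calibrated effect of score d on MDD subgroup k
\rho_{k,d} = \frac{\beta^{\mathrm{MLR}}_{k,d}}{\beta^{\mathrm{LR}}_{d}}
```

Here $\beta^{\mathrm{MLR}}_{k,d}$ is the multinomial regression coefficient of score $d$ for subgroup $k$, and $\beta^{\mathrm{LR}}_{d}$ is the coefficient of the same score when predicting its natural outcome against controls in a logistic regression. A ratio below 1 means the score separates the subgroup from controls less strongly than it separates its own disorder from controls.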
Figure 5.
Composite genetic profiles identify robust liability differences between subgroups in MDD
We predicted MDD subgroup membership from composite genetic liability scores that integrate PGSs and PA-FGRS, together in multinomial logistic regression with controls as a reference group.
(A) Higher MDD, SCZ, and BPD genetic liability were associated with conversion from MDD to BPD.
(B) Higher MDD and SCZ genetic liability were associated with a comorbid anxiety diagnosis.
(C) Higher MDD genetic liability was associated with recurrent MDD.
(D) Lower MDD and BPD genetic liability were associated with out-patient treatment.
(E–G) No differences were observed (E) between females and males diagnosed with MDD, (F) between individuals with a first MDD diagnosis before or after age 23, or (G) among mild, moderate, severe, or psychotic depression. (H and I) PGS-only and PA-FGRS-only effects are highly consistent both (H) when using all relatives and (I) when excluding first-degree relatives.
(A–G) Effect sizes are presented on a calibrated scale, where the regression coefficient describing the effect of a genetic liability score on the subgroup is divided by the coefficient of the same score when predicting its natural outcome (i.e., BPD score predicting BPD) in a simple logistic regression. This places the magnitude of subgroup effects on a scale that is relative to the effect of the score in its distinguishing natural outcome from controls, which can account for differences in the sensitivity of the individual scores.
(A–F) Models are meta-analyzed across iPSYCH2012 (Ncases ≤ 20,632, Nctrl ≤ 23,870) and iPSYCH2015i (Ncases ≤ 10,317, Nctrl ≤ 15,785); (G) is only available in iPSYCH2012. Significance at p < 0.05/35 is depicted in bold.
Detailed sample sizes for the individual analyses are provided in Table S3. Error bars indicate 95% confidence intervals. MDD, major depressive disorder; SCZ, schizophrenia; BPD, bipolar disorder; ASD, autism spectrum disorder; ADHD, attention-deficit/hyperactivity disorder; p, unadjusted p value from meta-analyzed multinomial regressions. p values greater than 0.05 after Bonferroni adjustment displayed in gray text.
Among 30,949 individuals diagnosed with MDD, those also diagnosed with BPD (N = 1,477) had significantly (p < 1.4 × 10−3, adjusting for 35 tests) higher genetic liability for MDD (), BPD (), and SCZ (; Figure 5A). Among the 29,472 individuals diagnosed with MDD (excluding BPD), the 7,205 also diagnosed with an anxiety disorder had higher genetic liability to MDD () and SCZ (; Figure 5B). Individuals with recurrent depression (N = 9,903) had higher liability to MDD (; Figure 5C) than those with single-episode depression (N = 19,569). Individuals treated for MDD in-patient (NHospitalized = 5,815) had higher liability to MDD () and BPD () than those treated out-patient (NOut-patient = 12,432; Figure 5D). We did not observe any significant differences () in the genetic liability score profiles of males vs. females (NFemale = 19,906, NMale = 9,566; Figure 5E), based on age at first diagnosis (Figure 5F), or based on diagnostic codes for severity (NMild = 3,004, NModerate = 8,742, NSevere = 2,391, NPsychotic = 856; Figure 5G).
Each analysis was repeated using PGS- or PA-FGRS-only profiles (Figures S22 and S23). PGS-only and PA-FGRS-only results were highly similar (r = 0.95 [0.93–0.97]; Figure 5H), and PA-FGRS or PGS scores alone were less powerful than composite scores (PA-FGRS-only mean −log10(p) = 2.90; PGS-only mean −log10(p) = 2.47; composite mean −log10(p) = 4.24). PGS and PA-FGRS appear to capture similar constructs, and by combining the two, we can increase power to detect differences in genetic liability between groups. Finally, to test for large effects of the familial environment, we constructed PA-FGRS excluding nuclear family members (i.e., parents, siblings, half-siblings, and children). The overall trends were highly consistent with the full analysis (Figure 5I), albeit with reduced significance (Figure S24). Repeating analyses using PA-FGRSadt had no impact on the profile results (correlation of effects estimated with PA-FGRS and PA-FGRSadt = 0.9998; Figures S25–S27). In summary, genetic liability score profiles are associated with differences in the clinical presentation of MDD, involve contributions from non-MDD liability scores, show parallel trends in PGS or PA-FGRS alone, and do not seem strongly influenced by familial environment.
GWAS on PA-FGRS liability values can add power to single-cohort MDD GWAS
Studying genetic liability of threshold traits is expected to boost power in GWAS (Figure S28). We performed meta-analytic GWAS across the iPSYCH2012 (Ncases = 17,518, Nctrl = 23,341) and iPSYCH2015i (Ncases = 8,323, Nctrl = 15,204; Figure S5) cohorts and compared logistic regression GWASs of binary diagnoses to linear regression GWASs of PA-FGRS in the same individuals (methods; Figure 6). GWAS of PA-FGRS identified three independent loci (Figure 6A; index SNPs: rs16827974, β = 0.014, p = ; rs1040574, β = −0.011, p = ; rs112585366, β = 0.026, p = ; Table S7). These three variants and 24 of the 29 suggestive loci (false discovery rate < 0.05) showed consistent sign in an independent MDD GWAS from Howard et al.49 (excluding iPSYCH; Tables S2 and S3). GWASs of binary diagnoses identified two of these loci (Figure 6B; index SNPs: rs6780942, 8.5 kb from rs16827974, β = 0.085, p = ; rs3777421, 36.3 kb from rs1040574, β = −0.073, p = ; Table S8). These two and 24 of the 35 suggestive loci (false discovery rate < 0.05) showed consistent sign in Howard et al.49 (excluding iPSYCH; Tables S2 and S3). The 28 independent, genome-wide significant index SNPs reported in Howard et al.49 (excluding iPSYCH) have slightly, but significantly, larger test statistics in the GWAS on PA-FGRS (PA-FGRS mean χ2 = 4.55; case-control mean χ2 = 3.80; paired t test p = 0.018; Figure 6C).
Figure 6.
PA-FGRS liabilities improve power for GWAS of MDD
(A and B) Genome-wide association studies (GWASs) of 25,841 cases and 38,545 controls using (A) PA-FGRS liability finds three independent genome-wide significant loci, while (B) logistic regression (case/control) finds two.
(C) PA-FGRS GWAS test statistics are more extreme (i.e., more significant) than case-control GWAS at index SNPs of 28 loci reported in a previous GWAS of MDD.
(D and E) PGSs trained using PA-FGRS GWASs achieve higher classification accuracy than those trained on case-control GWAS in (D) iPSYCH2012 and (E) iPSYCH2015i, two independent evaluation cohorts.
(F) SNP-heritability estimated by LD-score regression analyses is slightly, but not significantly, larger for PA-FGRS GWAS, while estimated intercepts are equivalent.
(G) PA-FGRS and case-control GWAS show similar genetic correlations with external GWAS of MDD and related traits. Error bars indicate 95% confidence interval.
h2o, observed scale SNP heritability; int, LD score regression intercept; rG, SNP-based genetic correlation. External GWAS citations: UKB+PGC,49 UKB (GPpsy),11 FinnGen (ICD),50 UKB (Imputed),51 UKB (Lifetime MDD),11 SCZ,52 BPD,53 ADHD,54 educational attainment.55
Next, we trained PGSs in each subcohort (iPSYCH2012 or iPSYCH2015i) using GWASs performed in the other (iPSYCH2015i or iPSYCH2012). In both cohorts, PGS trained with PA-FGRS GWASs were modestly, but significantly, better at classifying MDD vs. controls (2012: AUCcase-control PGS = 0.537 [0.531–0.542], AUCPA-FGRS PGS = 0.544 [0.538–0.550], test of differences: p = ; 2015i: AUCcase-control PGS = 0.548 [0.540–0.556], AUCPA-FGRS PGS = 0.556 [0.548–0.563], test of differences: p = ; Figures 6D and 6E). Observed scale SNP-h2 was larger in the PA-FGRS GWAS, but this difference was not significant (h2obs,PA-FGRS-h2obs,PA-case/ctrl = 0.015 [−0.013–0.043]; Figure 6F), and genetic correlations with external studies of MDD and other psychiatric disorders were similar (Figure 6G). Taken together, the confluence of trends suggests GWASs on PA-FGRS do provide a small increase in power per genotyped individual, which is consistent with previous work17 and our simulations (Figure S28). Replacing PA-FGRS with PA-FGRSadt liabilities reduced the gains across most GWAS tests (Figures S29–S33).
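A gain of modest size is what a liability threshold model would lead one to expect in a case-enriched sample, because much of the liability variance is already captured by case-control status. The sketch below is not the power calculation behind Figure S28. It splits the variance of total (not genetic) liability into between-group and within-group parts using truncated-normal moments. The 15% and 1% prevalences are illustrative assumptions, the 40% case fraction approximates 25,841 cases among 64,386 genotyped individuals, and all names are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def liability_variance_split(prevalence, case_fraction):
    """Between- and within-group variance of liability in a case-control sample.

    prevalence: population lifetime prevalence, which sets the threshold.
    case_fraction: proportion of cases in the analyzed sample.
    """
    t = STD_NORMAL.inv_cdf(1.0 - prevalence)
    phi = STD_NORMAL.pdf(t)
    mean_case = phi / prevalence               # E[L | L > t]
    mean_ctrl = -phi / (1.0 - prevalence)      # E[L | L < t]
    var_case = 1.0 + t * mean_case - mean_case ** 2
    var_ctrl = 1.0 + t * mean_ctrl - mean_ctrl ** 2
    between = case_fraction * (1.0 - case_fraction) * (mean_case - mean_ctrl) ** 2
    within = case_fraction * var_case + (1.0 - case_fraction) * var_ctrl
    return between, within


def between_share(prevalence, case_fraction):
    """Fraction of sample liability variance that lies between cases and controls."""
    between, within = liability_variance_split(prevalence, case_fraction)
    return between / (between + within)


scenarios = [
    ("population sample, rare disorder", 0.01, 0.01),
    ("population sample, common disorder", 0.15, 0.15),
    ("case-enriched sample, common disorder", 0.15, 0.40),
]
for label, prevalence, case_fraction in scenarios:
    print(label, round(between_share(prevalence, case_fraction), 3))
```

In a population sample of a rare disorder almost all of the variance sits within controls, so a continuous liability has much to add. With 40% cases the between-group share is the larger part, leaving less for a continuous measure to recover.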
Discussion
We have developed a method for estimating genetic liability, PA-FGRS, that is more generalizable across datasets and research questions and outperforms existing methods in complex genealogical data. We show that PA-FGRS complements genotype-based inferences into MDD in three ways: (1) PA-FGRS liabilities improve classification models when fit together with state-of-the-field PGS, (2) combining PA-FGRS and PGS better describes the etiology underlying clinical heterogeneity associated with comorbidity, recurrence, and severity in MDD, and (3) GWASs performed on PA-FGRS scores have slightly more power than GWASs on case-control status. Our method is flexible, easy to use, and could be applied to ask similar questions about other complex diseases. While large-scale family data are not ubiquitous, Nordic and other national registers have extensive genealogical records. These data sources are much more representative of underlying populations than typical volunteer cohorts but are not typically genotyped at large scale. PA-FGRS can enable important genetic epidemiological investigations of these resources that overcome some limits of PGS studies in ascertained biobanks. Our data-first approach (describing the unique characteristics of a powerful resource and tailoring a method to accommodate its peculiarities) allows us to leverage, rather than discard or censor, inconvenient data. This is a complementary approach to lowest common denominator cross-cohort studies and may be especially relevant as larger, deeper, and necessarily more peculiar data emerge.
PA-FGRS is model based, incorporates distant relatives, and handles age-censored phenotypes of relatives. Incorporating distant relatives allows us to manipulate our liability calculation to exclude close relatives as a sensitivity test for undue impact of familial environment. This, and using morbid risk to define a mixture model, makes PA-FGRS most similar in concept to FGRS,21 but the formal model underlying PA-FGRS (PA-selection theory) gains efficiency and improves calibration of estimated liabilities. The modeling of censoring in PA-FGRS is different from the ADT model, e.g., LT-FH++17 or the age-dependent liability threshold model (ADuLT),56 which we implemented as PA-FGRSadt. PA-FGRS assigns the same liability to rare cases as average cases (i.e., uses one population threshold), using covariate-stratified cumulative incidence to define mixture proportions for controls. LT-FH++ and PA-FGRSadt assign increasingly larger genetic liability to increasingly rarer cases observed in empirical cumulative incidence curves (i.e., use per-individual thresholds). For example, in our MDD data, males diagnosed at a young age many decades back (a rare event in empirical cumulative incidence) would make much larger contributions to liability estimates in the LT-FH++/ADuLT model than in the PA-FGRS model. While simulations confirmed that PA-FGRSadt is better when simulating under the ADT model, among tested methods, only PA-FGRS remained calibrated under model misspecification, suggesting that PA-FGRS is a robust choice. We did see a small improvement in prediction of MDD by the ADT model but not in GWASs nor in the subgroup liability profiling. The underlying model of PA-FGRS is amenable to analysis and extension, representing an advantage for future work that could extend the model to include non-additive covariance, multiple traits, or more complicated etiological models of heterogeneity and comorbidity.
Combining family-based liabilities and genotype-based PGS from multiple disorders significantly improved classification accuracy. In cancer57 or coronary artery disease,58 risk models incorporate multiple measures—health states, health traits, family history, and PGS. In psychiatry, this has been pursued in more limited contexts (e.g., Agerbo et al.8). Previous studies have found that combining parental history information and PGS improves the prediction accuracy;8 however, these studies only considered risk associated with parental MDD and did not leverage diagnoses in other relatives. Integrative models that combine multiple sources of genetic information, such as family history, estimated liability, and PGS, along with exposure data have the potential to advance the clinical utility of risk assessment in psychiatry but will require large population data and integrative models.
Our composite profile analysis replicates, extends, and adds context to previous work considering genetic liability profile differences between subgroups in MDD. First, we replicate previous results in similar data showing statistically significant associations of genetic liability to MDD with recurrence and of genetic liability to MDD and BPD with treatment location.59,60 Our models calibrate effect sizes differently to accommodate noisy instruments, such as PGS, leading to different framing of effect sizes, which we see as moderate to substantial rather than minimal. We also replicate associations between BPD and SCZ liability and conversion from MDD to BPD61 but interpret BPD genetic liability to be significantly more important than SCZ genetic liability for conversion. Second, studies in Swedish registers have shown differences in FGRSs associated with progression to BPD,21 comorbid anxiety,48 recurrence,21 treatment setting,62 and age at onset21 of MDD. We confirm higher genetic liability to MDD among cases with recurrent depression using composite, PGS-only, and PA-FGRS-only liability scores, which suggests that genetic liability to MDD itself plays an important role in recurrence, although we cannot rule out that a larger fraction of people with single-episode depression could be misdiagnosed. We also replicated a higher liability to MDD and SCZ among MDD cases with comorbid anxiety using our composite and PGS-only scores and saw the same trend of higher liability to MDD and BPD among hospitalized cases using our composite and PGS-only scores. These findings appear to indicate that liabilities to psychiatric diseases other than MDD also contribute to clinical heterogeneity within MDD, although, again, we cannot rule out that individuals who truly have these other disorders (e.g., SCZ) have been misclassified into these categories (e.g., MDD + anxiety). The previous finding of higher BPD liability in male MDD cases62 was nominally significant in our study.
We did not observe associations with age at onset; however, iPSYCH has a reduced range for onset (15 to 35 vs. <22 to >69) and includes only secondary-care treated (i.e., more severe) MDD. Our study replicates and extends previous results by providing more interpretable effect sizes using an alternative model-based approach for family liability scores and by showing consistency between familial and molecular scores.
We observed a small, significant improvement in power when performing GWAS for MDD on PA-FGRS liabilities. A previous study incorporating family history in GWASs of MDD using iPSYCH did not observe gains17 but only considered first-degree relatives and weighted them differently. Consistent with simulations, the relative increase in power observed in highly ascertained case-control data is smaller than what has been reported for population-based studies.7,17 In population studies, especially for rarer disorders, most of the variance in liability is hidden within controls, whereas for highly ascertained data, most of the variance in liability remains between cases and controls. In this latter context, little is gained by moving from binary to continuous measures. Although we observe small gains in power for GWASs, consistent with other studies, the most impactful applications of PA-FGRS may lie in classification and descriptions of etiology. Our study should be interpreted in light of a few important limitations. Certain modeling choices could affect the reliability of PA-FGRS. First, pedigree size varies substantially among individuals. Probands with few relatives have scores regressed more toward the mean liability, which can introduce bias. Second, modeling the censoring process requires external information about age-of-onset curves for the disease of interest (as do the other methods), and these may change across calendar time and birth cohorts. While reliable age-of-onset curves are available for the present register coverage, estimating age-of-onset curves for past decades with different diagnostic systems and different register coverage is challenging. Third, our model assumes that the true liability of cases with different age and calendar year of onset is the same, while others have proposed that true liability should vary according to these covariates.17,56 Both approaches are based on heuristics and could be better compared, integrated, and optimized to improve performance.
Fourth, in our framework, genotypes are only used to generate PGSs based on external effect estimates, but as the number of genotyped samples within cohorts grows, adding within cohort estimates either as a meta-PGS63 or a genomic BLUP64 could likely improve accuracy.
Here, we have taken a data-first approach to studying the genetic architecture of MDD by tailoring both our study aims and method development to the particular strengths and challenges of a unique data resource. Doing so resulted in a methodological increment with broad applicability and highlights the utility of integrating multiple sources of genetic data when considering trait predictions, etiological descriptions, and gene mapping.
Data and code availability
The code generated during this study is available through public GitHub repositories: https://github.com/BioPsyk/PAFGRS and https://github.com/MortenKrebs/PA-FGRS-simulations.
Acknowledgments
This work was supported by the Lundbeck Foundation (R335-2019-2318 to A.J.S.; R230-2016-3565 to M.D.K.; R335-2019-2339 to B.J.V.; and R208-2015-3951 to S.L.), the National Institute of Mental Health (R01MH130581 to K.S.K. and J.F.), the Danish Independent Research Fund (2034-00241B to B.J.V.), The Research Fund of the Mental Health Services – Capital Region of Denmark (R4A92 to S.L.), and Fonden for Faglig Udvikling af Speciallægepraksis (38850/16 to S.L.). We also acknowledge the Lundbeck Foundation Initiative for Integrative Psychiatric Research (iPSYCH) (R102-A9118, R155-2014-1724, and R248-2017-2003) and high-performance computer capacity for handling and statistical analysis of iPSYCH data on the GenomeDK HPC facility provided by the Center for Genomics and Personalized Medicine and the Centre for Integrative Sequencing, iSEQ, Aarhus University, Denmark (grant to Prof. Anders Børglum).
Author contributions
Conceptualization, M.D.K. and A.J.S.; data curation, M.D.K., K.-L.G.H., M.L., V.A., X.C., and A.J.S.; formal analysis/investigation, M.D.K., K.-L.G.H., and M.L.; funding acquisition, M.D.K., J.F., K.S.K., and A.J.S.; methodology, M.D.K., H.O., K.S.K., and A.J.S.; project administration, M.D.K. and A.J.S.; software, M.D.K.; resources, J.J.M., iPSYCH Study Consortium, A.I., A.B., and T.W.; supervision, R.B., B.J.V., J.F., S.-A.B., N.C., A.D., N.Z., K.S.K., and A.J.S.; validation, M.D.K., E.P., J.S., and A.J.S.; visualization, M.D.K., M.L., and A.J.S.; writing – original draft, M.D.K. and A.J.S.; writing – review & editing, all authors.
Declaration of interests
B.J.V. is a member of Allelica’s scientific advisory board.
Published: October 28, 2024
Footnotes
Supplemental information can be found online at https://doi.org/10.1016/j.ajhg.2024.09.009.
Contributor Information
Morten Dybdahl Krebs, Email: morten.dybdahl.krebs@regionh.dk.
Andrew J. Schork, Email: andrew.joseph.schork@regionh.dk.
Web resources
BLUPF90 documentation, http://nce.ads.uga.edu/html/projects/programs/docs/blupf90_all8.pdf
Summary statistics QC pipeline, https://github.com/BioPsyk/cleansumstats
PA-FGRS R package, https://github.com/BioPsyk/PAFGRS
PA-FGRS Simulation code, https://github.com/MortenKrebs/PA-FGRS-simulations
Supplemental information
Tables S1–S8
References
- 1.Nagai A., Hirata M., Kamatani Y., Muto K., Matsuda K., Kiyohara Y., Ninomiya T., Tamakoshi A., Yamagata Z., Mushiroda T., et al. Overview of the BioBank Japan Project: Study design and profile. J. Epidemiol. 2017;27:S2–S8. doi: 10.1016/j.je.2016.12.005. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 2.Jónsson H., Sulem P., Kehr B., Kristmundsdottir S., Zink F., Hjartarson E., Hardarson M.T., Hjorleifsson K.E., Eggertsson H.P., Gudjonsson S.A., et al. Whole genome characterization of sequence diversity of 15,220 Icelanders. Sci. Data. 2017;4 doi: 10.1038/sdata.2017.115. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 3.Pedersen C.B., Bybjerg-Grauholm J., Pedersen M.G., Grove J., Agerbo E., Bækvad-Hansen M., Poulsen J.B., Hansen C.S., McGrath J.J., Als T.D., et al. The iPSYCH2012 case-cohort sample: new directions for unravelling genetic and environmental architectures of severe mental disorders. Mol. Psychiatry. 2018;23:6–14. doi: 10.1038/mp.2017.196. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 4.Bybjerg-Grauholm J., Bøcker Pedersen C., Bækvad-Hansen M., Giørtz Pedersen M., Adamsen D., Søholm Hansen C., Agerbo E., Grove J., Als T.D., Schork A.J., et al. The iPSYCH2015 Case-Cohort sample: updated directions for unravelling genetic and environmental architectures of severe mental disorders. medRxiv. 2020 doi: 10.1101/2020.11.30.20237768. Preprint at. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 5.Bycroft C., Freeman C., Petkova D., Band G., Elliott L.T., Sharp K., Motyer A., Vukcevic D., Delaneau O., O’Connell J., et al. The UK Biobank resource with deep phenotyping and genomic data. Nature. 2018;562:203–209. doi: 10.1038/s41586-018-0579-z. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 6.Liu J.Z., Erlich Y., Pickrell J.K. Case-control association mapping by proxy using family history of disease. Nat. Genet. 2017;49:325–331. doi: 10.1038/ng.3766. [DOI] [PubMed] [Google Scholar]
- 7.Hujoel M.L.A., Gazal S., Loh P.-R., Patterson N., Price A.L. Liability threshold modeling of case–control status and family history of disease increases association power. Nat. Genet. 2020;52:541–547. doi: 10.1038/s41588-020-0613-6. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 8.Agerbo E., Trabjerg B.B., Børglum A.D., Schork A.J., Vilhjálmsson B.J., Pedersen C.B., Hakulinen C., Albiñana C., Hougaard D.M., Grove J., et al. Risk of Early-Onset Depression Associated With Polygenic Liability, Parental Psychiatric History, and Socioeconomic Status. JAMA Psychiatr. 2021;78:387–397. doi: 10.1001/jamapsychiatry.2020.4172. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 9.Hujoel M.L.A., Loh P.-R., Neale B.M., Price A.L. Incorporating family history of disease improves polygenic risk scores in diverse populations. Cell Genom. 2022;2 doi: 10.1016/j.xgen.2022.100152. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 10.Mars N., Lindbohm J.V., Briotta Parolo P.D., Widén E., Kaprio J., Palotie A., Ripatti S., FinnGen Systematic comparison of family history and polygenic risk across 24 common diseases. Am. J. Hum. Genet. 2022;109:2152–2162. doi: 10.1016/j.ajhg.2022.10.009. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 11.Cai N., Revez J.A., Adams M.J., Andlauer T.F.M., Breen G., Byrne E.M., Clarke T.-K., Forstner A.J., Grabe H.J., Hamilton S.P., et al. Minimal phenotyping yields genome-wide association signals of low specificity for major depression. Nat. Genet. 2020;52:437–447. doi: 10.1038/s41588-020-0594-5.
- 12.LaBianca S., Brikell I., Helenius D., Loughnan R., Mefford J., Palmer C.E., Walker R., Gådin J.R., Krebs M., Appadurai V., et al. Polygenic profiles define aspects of clinical heterogeneity in attention deficit hyperactivity disorder. Nat. Genet. 2024;56:234–244. doi: 10.1038/s41588-023-01593-7.
- 13.Yang J., Wray N.R., Visscher P.M. Comparing apples and oranges: equating the power of case-control and quantitative trait association studies. Genet. Epidemiol. 2010;34:254–257. doi: 10.1002/gepi.20456.
- 14.Wray N.R., Pergadia M.L., Blackwood D.H.R., Penninx B.W.J.H., Gordon S.D., Nyholt D.R., Ripke S., MacIntyre D.J., McGhee K.A., Maclean A.W., et al. Genome-wide association study of major depressive disorder: new results, meta-analysis, and lessons learned. Mol. Psychiatry. 2012;17:36–48. doi: 10.1038/mp.2010.109.
- 15.Wray N.R., Visscher P.M. Quantitative genetics of disease traits. J. Anim. Breed. Genet. 2015;132:198–203. doi: 10.1111/jbg.12153.
- 16.Wright S. An Analysis of Variability in Number of Digits in an Inbred Strain of Guinea Pigs. Genetics. 1934;19:506–536. doi: 10.1093/genetics/19.6.506.
- 17.Pedersen E.M., Agerbo E., Plana-Ripoll O., Grove J., Dreier J.W., Musliner K.L., Bækvad-Hansen M., Athanasiadis G., Schork A., Bybjerg-Grauholm J., et al. Accounting for age of onset and family history improves power in genome-wide association studies. Am. J. Hum. Genet. 2022;109:417–432. doi: 10.1016/j.ajhg.2022.01.009.
- 18.So H.-C., Kwan J.S.H., Cherny S.S., Sham P.C. Risk prediction of complex diseases from family history and known susceptibility loci, with applications for cancer screening. Am. J. Hum. Genet. 2011;88:548–565. doi: 10.1016/j.ajhg.2011.04.001.
- 19.Aitken A.C. Note on selection from a multivariate normal population. Proc. Edinb. Math. Soc. 1935;4:106–110.
- 20.Campbell D.D., Li Y., Sham P.C. Multifactorial disease risk calculator: Risk prediction for multifactorial disease pedigrees. Genet. Epidemiol. 2018;42:130–133. doi: 10.1002/gepi.22101.
- 21.Kendler K.S., Ohlsson H., Sundquist J., Sundquist K. Family Genetic Risk Scores and the Genetic Architecture of Major Affective and Psychotic Disorders in a Swedish National Sample. JAMA Psychiatr. 2021;78:735–743. doi: 10.1001/jamapsychiatry.2021.0336.
- 22.Nørgaard-Pedersen B., Hougaard D.M. Storage policies and use of the Danish Newborn Screening Biobank. J. Inherit. Metab. Dis. 2007;30:530–536. doi: 10.1007/s10545-007-0631-x.
- 23.Mors O., Perto G.P., Mortensen P.B. The Danish Psychiatric Central Research Register. Scand. J. Public Health. 2011;39:54–57. doi: 10.1177/1403494810395825.
- 24.Lynge E., Sandegaard J.L., Rebolj M. The Danish National Patient Register. Scand. J. Public Health. 2011;39:30–33. doi: 10.1177/1403494811401482.
- 25.Pedersen C.B. The Danish Civil Registration System. Scand. J. Public Health. 2011;39:22–25. doi: 10.1177/1403494810387965.
- 26.Browning B.L., Zhou Y., Browning S.R. A One-Penny Imputed Genome from Next-Generation Reference Panels. Am. J. Hum. Genet. 2018;103:338–348. doi: 10.1016/j.ajhg.2018.07.015.
- 27.Browning S.R., Browning B.L. Rapid and accurate haplotype phasing and missing-data inference for whole-genome association studies by use of localized haplotype clustering. Am. J. Hum. Genet. 2007;81:1084–1097. doi: 10.1086/521987.
- 28.McCarthy S., Das S., Kretzschmar W., Delaneau O., Wood A.R., Teumer A., Kang H.M., Fuchsberger C., Danecek P., Sharp K., et al. A reference panel of 64,976 haplotypes for genotype imputation. Nat. Genet. 2016;48:1279–1283. doi: 10.1038/ng.3643.
- 29.Manichaikul A., Mychaleckyj J.C., Rich S.S., Daly K., Sale M., Chen W.-M. Robust relationship inference in genome-wide association studies. Bioinformatics. 2010;26:2867–2873. doi: 10.1093/bioinformatics/btq559.
- 30.Price A.L., Patterson N.J., Plenge R.M., Weinblatt M.E., Shadick N.A., Reich D. Principal components analysis corrects for stratification in genome-wide association studies. Nat. Genet. 2006;38:904–909. doi: 10.1038/ng1847.
- 31.Rainer J., Taliun D., D’Elia Y., Pattaro C., Domingues F.S., Weichenberger C.X. FamAgg: an R package to evaluate familial aggregation of traits in large pedigrees. Bioinformatics. 2016;32:1583–1585. doi: 10.1093/bioinformatics/btw019.
- 32.Sinnwell J.P., Therneau T.M., Schaid D.J. The kinship2 R package for pedigree data. Hum. Hered. 2014;78:91–93. doi: 10.1159/000363105.
- 33.Wright S. Coefficients of Inbreeding and Relationship. Am. Nat. 1922;56:330–338.
- 34.Falconer D.S. The inheritance of liability to certain diseases, estimated from the incidence among relatives. Ann. Hum. Genet. 1965;29:51–76.
- 35.Fisher R.A. XV.—The Correlation between Relatives on the Supposition of Mendelian Inheritance. Trans. R. Soc. Edinb. 1919;52:399–433.
- 36.Johnson N.L., Kotz S., Balakrishnan N. Continuous Univariate Distributions, Volume 2. John Wiley & Sons; 1995.
- 37.Mendell N.R., Elston R.C. Multifactorial qualitative traits: genetic analysis and prediction of recurrence risks. Biometrics. 1974;30:41–57.
- 38.Misztal I. Complex models, more data: simpler programming. Proceedings of the Computational Cattle Breeding ’99 Workshop. Interbull Bul.; 1999. pp. 33–42.
- 39.Pedersen C.B., Mors O., Bertelsen A., Waltoft B.L., Agerbo E., McGrath J.J., Mortensen P.B., Eaton W.W. A comprehensive nationwide study of the incidence rate and lifetime risk for treated mental disorders. JAMA Psychiatr. 2014;71:573–581. doi: 10.1001/jamapsychiatry.2014.16.
- 40.Lloyd-Jones L.R., Zeng J., Sidorenko J., Yengo L., Moser G., Kemper K.E., Wang H., Zheng Z., Magi R., Esko T., et al. Improved polygenic prediction by Bayesian multiple regression on summary statistics. Nat. Commun. 2019;10:5086. doi: 10.1038/s41467-019-12653-0.
- 41.Venables W.N., Ripley B.D. Modern Applied Statistics with S. Springer Science & Business Media; 2003.
- 42.Chang C.C., Chow C.C., Tellier L.C., Vattikuti S., Purcell S.M., Lee J.J. Second-generation PLINK: rising to the challenge of larger and richer datasets. GigaScience. 2015;4:7. doi: 10.1186/s13742-015-0047-8.
- 43.Willer C.J., Li Y., Abecasis G.R. METAL: fast and efficient meta-analysis of genomewide association scans. Bioinformatics. 2010;26:2190–2191. doi: 10.1093/bioinformatics/btq340.
- 44.Bulik-Sullivan B.K., Loh P.-R., Finucane H.K., Ripke S., Yang J., Schizophrenia Working Group of the Psychiatric Genomics Consortium. Patterson N., Daly M.J., Price A.L., Neale B.M. LD Score regression distinguishes confounding from polygenicity in genome-wide association studies. Nat. Genet. 2015;47:291–295. doi: 10.1038/ng.3211.
- 45.Bulik-Sullivan B., Finucane H.K., Anttila V., Gusev A., Day F.R., Loh P.R., ReproGen Consortium. Psychiatric Genomics Consortium. Genetic Consortium for Anorexia Nervosa of the Wellcome Trust Case Control Consortium 3. Duncan L., et al. An atlas of genetic correlations across human diseases and traits. Nat. Genet. 2015;47:1236–1241. doi: 10.1038/ng.3406.
- 46.Athanasiadis G., Meijsen J.J., Helenius D., Schork A.J., Ingason A., Thompson W.K., Geschwind D.H., Werge T., Buil A. A comprehensive map of genetic relationships among diagnostic categories based on 48.6 million relative pairs from the Danish genealogy. Proc. Natl. Acad. Sci. USA. 2022;119 doi: 10.1073/pnas.2118688119.
- 47.Lynch M., Walsh B. Genetics and Analysis of Quantitative Traits. Sinauer; 1998.
- 48.Kendler K.S., Ohlsson H., Sundquist J., Sundquist K. Impact of comorbidity on family genetic risk profiles for psychiatric and substance use disorders: a descriptive analysis. Psychol. Med. 2021;53:2389–2398. doi: 10.1017/S0033291721004268.
- 49.Howard D.M., Adams M.J., Clarke T.-K., Hafferty J.D., Gibson J., Shirali M., Coleman J.R.I., Hagenaars S.P., Ward J., Wigmore E.M., et al. Genome-wide meta-analysis of depression identifies 102 independent variants and highlights the importance of the prefrontal brain regions. Nat. Neurosci. 2019;22:343–352. doi: 10.1038/s41593-018-0326-7.
- 50.Kurki M.I., Karjalainen J., Palta P., Sipilä T.P., Kristiansson K., Donner K.M., Reeve M.P., Laivuori H., Aavikko M., Kaunisto M.A., et al. FinnGen provides genetic insights from a well-phenotyped isolated population. Nature. 2023;613:508–518. doi: 10.1038/s41586-022-05473-8.
- 51.Dahl A., Thompson M., An U., Krebs M., Appadurai V., Border R., Bacanu S.-A., Werge T., Flint J., Schork A.J., et al. Phenotype integration improves power and preserves specificity in biobank-based genetic studies of major depressive disorder. Nat. Genet. 2023;55:2082–2093. doi: 10.1038/s41588-023-01559-9.
- 52.Trubetskoy V., Pardiñas A.F., Qi T., Panagiotaropoulou G., Awasthi S., Bigdeli T.B., Bryois J., Chen C.-Y., Dennison C.A., Hall L.S., et al. Mapping genomic loci implicates genes and synaptic biology in schizophrenia. Nature. 2022;604:502–508. doi: 10.1038/s41586-022-04434-5.
- 53.Mullins N., Forstner A.J., O’Connell K.S., Coombes B., Coleman J.R.I., Qiao Z., Als T.D., Bigdeli T.B., Børte S., Bryois J., et al. Genome-wide association study of more than 40,000 bipolar disorder cases provides new insights into the underlying biology. Nat. Genet. 2021;53:817–829. doi: 10.1038/s41588-021-00857-4.
- 54.Demontis D., Walters R.K., Martin J., Mattheisen M., Als T.D., Agerbo E., Baldursson G., Belliveau R., Bybjerg-Grauholm J., Bækvad-Hansen M., et al. Discovery of the first genome-wide significant risk loci for attention deficit/hyperactivity disorder. Nat. Genet. 2019;51:63–75. doi: 10.1038/s41588-018-0269-7.
- 55.Okbay A., Beauchamp J.P., Fontana M.A., Lee J.J., Pers T.H., Rietveld C.A., Turley P., Chen G.-B., Emilsson V., Meddens S.F.W., et al. Genome-wide association study identifies 74 loci associated with educational attainment. Nature. 2016;533:539–542. doi: 10.1038/nature17671.
- 56.Pedersen E.M., Agerbo E., Plana-Ripoll O., Steinbach J., Krebs M.D., Hougaard D.M., Werge T., Nordentoft M., Børglum A.D., Musliner K.L., et al. 2022. ADuLT: An efficient and robust time-to-event GWAS.
- 57.Lee A., Mavaddat N., Wilcox A.N., Cunningham A.P., Carver T., Hartley S., Babb de Villiers C., Izquierdo A., Simard J., Schmidt M.K., et al. BOADICEA: a comprehensive breast cancer risk prediction model incorporating genetic and nongenetic risk factors. Genet. Med. 2019;21:1708–1718. doi: 10.1038/s41436-018-0406-9.
- 58.Goff D.C., Jr., Lloyd-Jones D.M., Bennett G., Coady S., D’Agostino R.B., Gibbons R., Greenland P., Lackland D.T., Levy D., O’Donnell C.J., et al. 2013 ACC/AHA guideline on the assessment of cardiovascular risk: a report of the American College of Cardiology/American Heart Association Task Force on Practice Guidelines. Circulation. 2014;129:S49–S73. doi: 10.1161/01.cir.0000437741.48606.98.
- 59.Musliner K.L., Mortensen P.B., McGrath J.J., Suppli N.P., Hougaard D.M., Bybjerg-Grauholm J., Bækvad-Hansen M., Andreassen O., Pedersen C.B., Pedersen M.G., et al. Association of Polygenic Liabilities for Major Depression, Bipolar Disorder, and Schizophrenia With Risk for Depression in the Danish Population. JAMA Psychiatr. 2019;76:516–525. doi: 10.1001/jamapsychiatry.2018.4166.
- 60.Musliner K.L., Agerbo E., Vilhjálmsson B.J., Albiñana C., Als T.D., Østergaard S.D., Mortensen P.B. Polygenic Liability and Recurrence of Depression in Patients With First-Onset Depression Treated in Hospital-Based Settings. JAMA Psychiatr. 2021;78:792–795. doi: 10.1001/jamapsychiatry.2021.0701.
- 61.Musliner K.L., Krebs M.D., Albiñana C., Vilhjalmsson B., Agerbo E., Zandi P.P., Hougaard D.M., Nordentoft M., Børglum A.D., Werge T., et al. Polygenic Risk and Progression to Bipolar or Psychotic Disorders Among Individuals Diagnosed With Unipolar Depression in Early Life. Am. J. Psychiatry. 2020;177:936–943. doi: 10.1176/appi.ajp.2020.19111195.
- 62.Kendler K.S., Ohlsson H., Bacanu S., Sundquist J., Sundquist K. Differences in genetic risk score profiles for drug use disorder, major depression, and ADHD as a function of sex, age at onset, recurrence, mode of ascertainment, and treatment. Psychol. Med. 2023;53:3448–3460. doi: 10.1017/S0033291721005535.
- 63.Albiñana C., Grove J., McGrath J.J., Agerbo E., Wray N.R., Bulik C.M., Nordentoft M., Hougaard D.M., Werge T., Børglum A.D., et al. Leveraging both individual-level genetic data and GWAS summary statistics increases polygenic prediction. Am. J. Hum. Genet. 2021;108:1001–1011. doi: 10.1016/j.ajhg.2021.04.014.
- 64.Aguilar I., Misztal I., Johnson D.L., Legarra A., Tsuruta S., Lawlor T.J. A unified approach to utilize phenotypic, full pedigree, and genomic information for genetic evaluation of Holstein final score. J. Dairy Sci. 2010;93:743–752. doi: 10.3168/jds.2009-2730.
Associated Data
Supplementary Materials
Tables S1–S8
Data Availability Statement
The code generated during this study is available through public GitHub repositories: https://github.com/BioPsyk/PAFGRS and https://github.com/MortenKrebs/PA-FGRS-simulations.