PLOS ONE. 2024 Feb 15;19(2):e0282212. doi: 10.1371/journal.pone.0282212

Interpreting polygenic score effects in sibling analysis

Jason Fletcher 1,*, Yuchang Wu 2, Tianchang Li 2, Qiongshi Lu 3
Editor: Heming Wang 4
PMCID: PMC10868763  PMID: 38358994

Abstract

Researchers often claim that sibling analysis can be used to separate causal genetic effects from the assortment of biases that contaminate most downstream genetic studies (e.g., polygenic score predictors). Indeed, typical results from sibling analysis show large (>50%) attenuations in the associations between polygenic scores and phenotypes compared to non-sibling analysis, consistent with researchers’ expectations about bias reduction. This paper explores these expectations by using family (quad) data and simulations that include indirect genetic effect processes, and evaluates the ability of sibling analysis to uncover direct genetic effects of polygenic scores. We find that sibling analysis, in general, fails to uncover direct genetic effects; indeed, these models have both upward and downward biases that are difficult to sign in typical data. When genetic nurture effects exist, sibling analysis creates “measurement error” that attenuates associations between polygenic scores and phenotypes. As the correlation between direct and indirect effects changes, this bias can increase or decrease. Our findings suggest that results from sibling analysis aimed at uncovering direct genetic effects should be interpreted with caution.

Introduction

Due to the high correlation in genetic measurement between offspring and parents, it is difficult to separate “direct” genetic effects of offspring genotype on phenotype from “indirect” genetic effects of parental genotype on offspring phenotype. This issue was demonstrated empirically by Kong et al. [1], who showed that associations between non-transmitted parental alleles and offspring phenotype explained approximately 30% of the r-squared of the offspring polygenic scores (PGS) on offspring phenotype in the case of educational attainment.

Researchers have since conjectured that sibling models can solve this and other problems, since biological siblings share the same parents and therefore may share the same indirect genetic effects. The more general claim about the usefulness of siblings and the associated “genetic lottery” predates the findings on indirect genetic effects (see Fletcher and Lehrer [2, 3]). Indeed, many subsequent analyses have demonstrated important reductions in the estimated associations between PGS and phenotypes after controlling for sibling fixed effects, often on the order of 50% (Trejo and Domingue [4], Selzam et al. [5]). This theory and evidence have increased researchers’ confidence in interpreting sibling models as producing causal (direct) genetic effects, both in upstream GWAS analysis (Howe et al. [6]) and in downstream PGS analysis (Belsky et al. [7]). For example, in a recent review article, Harden and Koellinger [8] state:

“Because genotypes are assigned randomly with respect to all other variables, an association between sibling differences in PGS and sibling differences in phenotype is powerful evidence that the PGS is tapping genetic variants with a causal influence on the phenotype.”

While intuitive, these interpretations have not been subject to theoretical and empirical scrutiny. We use family data (quads) combined with simulation evidence to show that this intuition relies on very simple models of genotype-to-phenotype associations, where direct and indirect genetic effects are separable (and thus differenced out in sibling models). While researchers have suggested that using a sibling model would purge the indirect effects, we show that, even in our simplified cases, researchers have failed to recognize the dual role of indirect genetic effects as a confounder that produces both positive and negative bias in the estimated effects of offspring PGS. We build the intuition for the negative bias by considering a scenario where all SNP effects are causal and there are no indirect genetic effects, and show that sibling analysis leads to attenuated estimates of PGS effects. Essentially, the attenuation stems from sibling analysis eliminating both confounding and true genetic effects. We then extend this scenario to allow correlation between indirect and direct genetic effects and show that the combined bias can in general be negative or positive and depends on the strength of indirect effects, the correlation between indirect and direct genetic effects, and the extent to which siblings’ environments are correlated. In summary, even our simple data generating process shows that sibling analysis is unlikely to produce accurate estimates of direct effects and, worse, does not suggest clear correctives or bounding exercises that are effective.

Results

The detailed simulation procedures are described in the Materials and methods section. Briefly, we drew the direct and indirect effect sizes for each SNP, as well as the environmental effect for each individual, following a specific set of parameters in each scenario. We then regressed the phenotype, constructed by summing these components, on the theoretical PGS that would be obtained from the output of a genome-wide association study (GWAS). To imitate real PGS analysis, we compared the regression coefficient or $R^2$ from between-family and within-family regressions, and investigated how changes to the parameters of the data generating process affect the performance of sibling PGS analysis. Table 1 summarizes the inputs and analysis output across the scenarios below.

Table 1. Summary of calculations for regression estimates.

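| Study design | Regression model | Regression coefficient | $R^2$ |
| --- | --- | --- | --- |
| Between family | $Y \sim PGS_{mix}$ | $1$ | $\dfrac{\sigma_{dir}^2+\sigma_{ind}^2+2\rho_g}{\sigma_{dir}^2+2\sigma_{ind}^2+2\rho_g+\sigma_e^2}$ |
| Sibling difference | $\Delta Y \sim \Delta PGS_{mix}$ | $\dfrac{\sigma_{dir}^2+\rho_g}{\sigma_{dir}^2+\sigma_{ind}^2+2\rho_g}$ | $\dfrac{(\sigma_{dir}^2+\rho_g)^2}{(\sigma_{dir}^2+\sigma_{ind}^2+2\rho_g)\,[\sigma_{dir}^2+2\sigma_e^2(1-r_e)]}$ |
| Sibling difference | $\Delta Y \sim \Delta PGS_{dir}$ | $1$ | $\dfrac{\sigma_{dir}^2}{\sigma_{dir}^2+2\sigma_e^2(1-r_e)}$ |

PGS regression coefficients and $R^2$ in different study designs. Different study designs can be used for PGS regression analysis. In a “between family” analysis, phenotype Y is regressed on PGS. In a sibling difference design, the phenotype difference between siblings is regressed on their PGS difference. PGS can be calculated using either a mixture of direct and indirect SNP effects ($PGS_{mix}$) or using only the direct effect ($PGS_{dir}$). In this table, $\rho_g$ occupies the position of the covariance term $\rho_{di}\sigma_d\sigma_i$ in Formulas (9)–(11), $\sigma_{dir}^2$ and $\sigma_{ind}^2$ correspond to $\sigma_d^2$ and $\sigma_i^2$, and $r_e$ corresponds to $\rho_e$ in Formula (3).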

Special case of no indirect genetic effects: The influence of the correlation of environmental effects between siblings

We performed simulations under the simple scenario where the phenotype has no indirect genetic effect contribution ($\sigma_i = 0$); one example phenotype for this scenario could be height, which has been shown to have minimal indirect genetic effects (Kong et al. [1]). In this case, Formulas (10, 11) in the Materials and methods section show that sibling PGS models are not guaranteed to estimate the variance component of direct effects even when genetic nurture is absent. This is because the elimination of family effects that occurs in sibling models also eliminates true direct genetic effects. Our formulas show that the elimination of these true genetic effects is offset only in the (unlikely) case where siblings have a correlation of 0.5 in their environmental effects ($\rho_e = 0.5$); in this case, the correlations in environmental and genetic effects are the same. In general, when siblings have a correlation in their environmental effects higher than 0.5, the $R^2$ of the sibling PGS model is overestimated; when this correlation is below 0.5, the $R^2$ of the sibling PGS model is underestimated (Fig 1). Since the environmental effect is assumed to be independent of genetic components in this study, this component does not interact with other potential factors. Therefore, to focus on the influence of other parameters (and not $\rho_e$), we assume that siblings have a correlation of 0.5 in their environmental effects throughout the rest of the simulations.
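As a minimal numerical sketch of this special case (not the authors' code), the snippet below evaluates the two closed-form quantities from the Materials and methods section with no indirect effects, using the Fig 1 values of 1 and 3 for the direct and environmental variances. The function names are illustrative assumptions.

```python
def population_h2_dir(var_dir, var_env):
    """Population share of phenotypic variance due to direct effects
    when indirect effects are absent."""
    return var_dir / (var_dir + var_env)


def sibling_r2_no_indirect(var_dir, var_env, rho_e):
    """Sibling-difference R^2 without indirect effects.

    Uses Var(e1 - e2) = 2 * var_env * (1 - rho_e) and assumes a sibling
    genotype correlation of 0.5, as in the paper.
    """
    var_env_diff = 2.0 * var_env * (1.0 - rho_e)
    return var_dir / (var_dir + var_env_diff)


def r2_ratio(var_dir, var_env, rho_e):
    """Ratio of the sibling R^2 to the population direct-effect share."""
    return (sibling_r2_no_indirect(var_dir, var_env, rho_e)
            / population_h2_dir(var_dir, var_env))


if __name__ == "__main__":
    for rho_e in (0.0, 0.5, 1.0):
        print(rho_e, round(r2_ratio(1.0, 3.0, rho_e), 3))
```

With these inputs the ratio is 4/7 at an environmental correlation of 0, exactly 1 at 0.5, and 4 at 1; the last value agrees with the explanation given in the Fig 1 legend.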

Fig 1. The effect of environmental correlation $r_e$ between siblings on the $R^2$ of PGS regression analyses from different study designs.


The y-axis shows the ratio of $R^2$ from the between-family design (red) or sibling difference design (blue) to the proportion of phenotypic variance due to the direct effect in the population. Each boxplot shows the simulation results of 200 repeats. In each repeat, we simulated the true SNP effect sizes ($\beta_{dir}$ and $\beta_{ind}$) from a bivariate normal distribution, and the environmental effect sizes for siblings from a bivariate normal distribution. We then calculated the phenotype and PGS and ran the linear regression analyses. In this figure, the PGS was calculated using ($\beta_{dir}+\beta_{ind}$). The variance of the direct effect size and the variance of the environmental effect size were fixed at 1 and 3, respectively. The indirect effect size and the correlation $r_g$ between direct and indirect effects were both set to 0. When $r_e = 1$, the environmental terms for the two siblings become identical, so their phenotypic difference becomes $\Delta PGS_{dir}$, and their PGS difference is also $\Delta PGS_{dir}$ since here we set $\beta_{ind} = 0$. Thus, the $R^2$ is always 1 in each repeat, whereas the proportion of the phenotypic variance explained by the direct effect is 1/4. Therefore, the ratio is always 4 for the last setting, as shown in the figure.

The influence of indirect genetic effects

In order to focus attention on the impacts of non-zero indirect genetic effects, we performed simulations with the following settings: the variance of direct genetic effects normalized to 1, the variance of the environmental effect set to three times the variance of the direct genetic effect, and no correlation between direct and indirect genetic effects. As we increased the contribution of indirect genetic effects, we found that sibling PGS models produce estimates of the direct effect variance component that are attenuated compared to the estimates from population (i.e., non-sibling) PGS models. As we showed in Fig 1, with environmental correlations of 0.5 ($\rho_e = 0.5$) and no indirect genetic effects, the results in the first column of Fig 2 are estimated accurately (see Formulas (10, 11)). However, as indirect genetic effects are introduced in Columns 2 and 3, the downward bias of the sibling model appears and becomes larger. This is consistent with empirical results from other work cited above. However, since the ratio of these estimated $R^2$ values to the expected direct effect variance component is smaller than 1, our results demonstrate that the sibling PGS model does not accurately recover the direct genetic effects. The figure also shows that the population (i.e., non-sibling) model cannot recover direct effects, and that the estimated effects are biased upward (as is commonly understood).

Fig 2. The effect of indirect effect size variance on the $R^2$ of PGS regression analyses from different study designs.


The y-axis shows the ratio of $R^2$ from the between-family design (red) or sibling difference design (blue) to the proportion of phenotypic variance due to the direct effect in the population. The yellow box shows the ratio of $R^2$ from the sibling difference design based on $PGS_{mix}$ to that from a sibling difference design based on $PGS_{dir}$. Each boxplot shows the simulation results of 200 repeats. In this figure, the variance of the direct effect size and the variance of the environmental effect size were fixed at 1 and 3, respectively. The correlation $r_g$ between direct and indirect effect sizes and the correlation $r_e$ between the two siblings’ environments were fixed at 0 and 0.5, respectively. When the indirect effect is 0 (both $\sigma_{ind}^2$ and $\beta_{ind}$ are 0), the sibling difference analyses become identical regardless of whether the PGS is computed based on ($\beta_{dir}+\beta_{ind}$) or $\beta_{dir}$; thus the yellow box is fixed at 1 when $\sigma_{ind}^2 = 0$ in this figure.

We now further consider the regression that the sibling PGS model (i.e., sibling difference model) performs: $\Delta Y_j \sim \Delta\widehat{PGS}_j$. When taking the difference in sibling phenotypes, the genetic nurture effect cancels out since full siblings share the same parents (formula (6)). In the estimated PGS, by contrast, the transmitted genetic nurture component still differs between siblings (formula (7)).

To better understand the impact of indirect genetic effects on sibling PGS analysis, we plot the ratio of the sibling model $R^2$ estimated with $\Delta\widehat{PGS}_j$ to the $R^2$ estimated with the sibling difference in the direct genetic effect component (formula (12)). This compares the regular sibling model estimated $R^2$ with the $R^2$ of true direct effects. Note that in the third box in each group, we see results below 1. Thus, the indirect genetic effect reduces the regression $R^2$ to a larger extent as its contribution to the phenotype increases. We might then speculate that the reduction in regression $R^2$ would be smaller for a phenotype with moderate indirect genetic effects (e.g., asthma) than for a phenotype with likely larger indirect genetic effects (e.g., education), but both estimates would be affected.
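One way to see the size of this attenuation is to divide Formula (10) by Formula (13), in which case the environmental term cancels. The worked values below assume $\rho_{di} = 0$ and $\sigma_d^2 = 1$, as in the Fig 2 settings; they are a sketch derived from those formulas, not numbers read off the figure.

```latex
\frac{R^2_{\Delta}}{h^2_{dir,\Delta}}
  = \frac{(\sigma_d^2 + \rho_{di}\sigma_d\sigma_i)^2}
         {\sigma_d^2\,(\sigma_d^2 + \sigma_i^2 + 2\rho_{di}\sigma_d\sigma_i)}
  \;\overset{\rho_{di}=0}{=}\;
  \frac{\sigma_d^2}{\sigma_d^2 + \sigma_i^2},
\qquad
\sigma_i^2 = 0,\ 0.5,\ 1
  \;\Longrightarrow\;
  1,\ \tfrac{2}{3},\ \tfrac{1}{2}.
```

At zero correlation the ratio has the same form as the reliability ratio in classical errors-in-variables regression, which is in line with the “measurement error” reading of the indirect component.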

The influence of correlation between direct and indirect effects

To add an additional element to our analysis, we considered the case of non-zero correlation between indirect genetic effects and direct genetic effects. As above, we constructed phenotypes with the variance of the direct effect normalized to 1, the variance of the indirect genetic effect set to either half of or equal to the variance of the direct genetic effect, and the variance of the environmental effect set to three times the variance of the direct genetic effect. However, we now relax the assumption from above that the correlation between indirect and direct genetic effects is zero and instead vary this correlation between -1 and 1. We found that sibling PGS models, in general, produce estimates of direct genetic effects that are smaller than estimates from population (i.e., non-sibling) PGS models (Fig 3). We note that previous literature viewed reductions in the estimated direct effect contribution using sibling models as evidence that these models were eliminating confounds (such as indirect genetic effects), whereas our results show that sibling models actually underestimate direct genetic effects in many scenarios.

Fig 3. The effect of correlation $r_g$ between the direct and indirect effect sizes on the $R^2$ of PGS regression analyses from different study designs.


The y-axis shows the ratio of $R^2$ from the between-family design (red) or sibling difference design (blue) to the proportion of phenotypic variance due to the direct effect in the population. The yellow box shows the ratio of $R^2$ from the sibling difference design based on $PGS_{mix}$ to that from a sibling difference design based on $PGS_{dir}$. Each boxplot shows the simulation results of 200 repeats. In this figure, the variance of the direct effect size and the variance of the environmental effect size were fixed at 1 and 3, respectively. The correlation between the two siblings’ environments was fixed at 0.5. The two panels correspond to the results when the variance of the indirect effect size is 0.5 and 1, respectively.

We show results for two settings that (as above) fix the variance of direct genetic effects at 1 and the variance of indirect genetic effects at 0.5 or 1. We find that increasing the correlation between direct and indirect effects moves the ratio of the sibling PGS estimate to the direct effect variance component from below 1 (underestimate) to above 1 (overestimate). $R^2$ can be viewed as a function of the correlation between direct and indirect effects given specific values of other parameters, such as the variance of indirect genetic effects. Thus, the ratio of the sibling PGS estimate to the direct effect variance component (defined on the population) does not change linearly (formula (10), Appendix 3 in S1 File). Whether we allow indirect genetic effects to be modest (0.5) or large (1), and whatever correlation we allow between the direct and indirect genetic effects, we find that sibling analysis rarely estimates direct genetic effects accurately (i.e., rarely falls on the horizontal line at 1, where the estimated $R^2$ is equal to the expected $R^2$). We note that we find large variations in the bias: the results suggest that the estimated $R^2$ can be up to twice the size of the true $R^2$ or less than half the size of the true $R^2$, depending on the data generating process. We also note again that we have assumed in these analyses that the environmental effects are correlated at 0.5 between siblings. Otherwise, these results would “shift down”, as we show in Fig 1.
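The dependence on the correlation can be explored directly from Formulas (10) and (11). The sketch below is an illustration under the stated settings (direct variance 1, environmental variance 3, sibling environmental correlation 0.5), not the authors' simulation code; the function name and argument names are assumptions.

```python
import math


def sibling_to_population_ratio(var_dir, var_ind, var_env, rho_di, rho_e=0.5):
    """Sibling-difference R^2 (Formula 10) divided by the population
    direct-effect variance share (Formula 11)."""
    cov = rho_di * math.sqrt(var_dir * var_ind)
    var_pgs_diff = var_dir + var_ind + 2.0 * cov
    if var_pgs_diff <= 1e-12:
        # Direct and indirect weights cancel, so the PGS has no variance.
        raise ValueError("PGS difference has zero variance; regression undefined")
    var_env_diff = 2.0 * var_env * (1.0 - rho_e)
    r2_sibling = (var_dir + cov) ** 2 / (var_pgs_diff * (var_dir + var_env_diff))
    h2_population = var_dir / (var_dir + 2.0 * var_ind + 2.0 * cov + var_env)
    return r2_sibling / h2_population


if __name__ == "__main__":
    for var_ind in (0.5, 1.0):
        for rho_di in (-0.8, -0.4, 0.0, 0.4, 0.8, 1.0):
            ratio = sibling_to_population_ratio(1.0, var_ind, 3.0, rho_di)
            print(f"var_ind={var_ind} rho_di={rho_di:+.1f} ratio={ratio:.3f}")
```

For an indirect variance of 1, the formulas give a ratio of 0.75 at zero correlation and 2 at a correlation of 1, which is in line with the range of biases described above.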

The performance of PGS regression coefficients

Regression coefficients, another important measure of PGS model performance, have also drawn attention in past research. One study showed that between-family PGS regression would yield coefficients that are biased upwards and within-family PGS regression would yield coefficients that are biased downwards (Trejo & Domingue [4]). To verify this in our framework, we estimated regression coefficients over a wider range of values of $r_g$ (i.e., the correlation between indirect and direct genetic effects). We also noted a divergence between our framework and previous results in terms of the standardization of PGS. Our framework does not standardize the PGS variable in the estimation, so the between-family regression will estimate a coefficient of 1, which equals the theoretical value when the direct effect variance component is accurately recovered (Fig 4). For the sibling PGS model, with unstandardized PGS estimates, regression coefficients could be over- or under-estimated depending on the ratio of indirect to direct effect variance and the value of their correlation. Overall, the larger the ratio is, the more the sibling regression coefficients converge to a downwardly biased value. The larger the correlation is, the more the sibling regression coefficients are biased downwards (Fig 4). We also confirmed these patterns with our derivation and rewrote Trejo and Domingue’s derivation without standardization of PGS (Appendix 5 in S1 File).
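The expected sibling coefficient in Formula (9) is simple enough to evaluate by hand or in a few lines. The sketch below is illustrative only (the function name is an assumption) and uses the unstandardized PGS convention described above.

```python
import math


def sibling_coefficient(var_dir, var_ind, rho_di):
    """Expected sibling-difference coefficient on an unstandardized PGS
    built from direct-plus-indirect weights (Formula 9)."""
    cov = rho_di * math.sqrt(var_dir * var_ind)
    denom = var_dir + var_ind + 2.0 * cov
    if denom <= 1e-12:
        # Weights cancel exactly, so the PGS is constant.
        raise ValueError("PGS difference has zero variance; regression undefined")
    return (var_dir + cov) / denom


if __name__ == "__main__":
    for var_ind in (0.5, 1.0):
        for rho_di in (-0.8, 0.0, 0.8):
            print(var_ind, rho_di, round(sibling_coefficient(1.0, var_ind, rho_di), 3))
```

With equal direct and indirect variances the expected coefficient is 0.5 for any correlation above -1, whereas with an indirect variance of 0.5 it moves from about 3.41 at a correlation of -1 to about 0.59 at a correlation of 1, so both over- and under-estimation relative to 1 are possible.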

Fig 4. The effect of correlation $r_g$ between the direct and indirect effect sizes on PGS regression coefficients from different study designs.


The y-axis shows the ratio of the PGS regression coefficients from the between-family design (red) or sibling difference design (blue) to 1, which is the effect size of $PGS_{dir}$ (see Table 1). We used the same simulation settings as those used in Fig 3. Since we set the variance of the direct effect size $\sigma_{dir}^2 = 1$, when the variance of the indirect effect size is also 1 and their correlation is -1, the direct and indirect effect sizes are exactly opposite to each other; therefore $PGS_{mix} = 0$ and the linear regression cannot be run under this scenario. We therefore do not include results when $\sigma_{ind}^2 = 1$ and $\rho_g = -1$.

Discussion

To examine the performance of the estimated $R^2$ of sibling PGS analysis in recovering the direct genetic effect variance component from a data generating process that includes both direct and indirect genetic effects, we performed our analysis on simulated phenotypes based on genotypic data of quads in the SPARK cohort. Our analytical results demonstrated that sibling PGS analysis generally does not yield an $R^2$ that accurately reflects the direct effect variance component.

In our simplified scenario where a phenotype is not impacted by indirect genetic effects, sibling PGS analysis can yield $R^2$ estimates that are biased either upward or downward, depending on the environmental correlation between siblings. In this study, we assumed a genotypic correlation of 0.5 between siblings, which sets the scale for the variance of the difference between their estimated PGS. More importantly, when taking the difference in sibling phenotypes, the indirect genetic effect that is shared between siblings is eliminated, leaving only the difference in direct genetic effects and the difference in environmental effects. However, PGS constructed from GWAS estimates will, in most cases, contain both direct and indirect effects. When taking a difference between siblings, the direct genetic effects and transmitted genetic nurture effects remain in the beta weights that are used to construct the PGS in downstream analysis. Given these issues, three aspects of the sibling PGS model are found to generate biases.

First, based on the composition of the sibling phenotype difference, sibling PGS regressions retain only the proportion explained by direct genetic effects and environmental effect differences. Essentially, when the phenotype is affected by indirect genetic effects, the total variance of the phenotype is reduced when using sibling analysis compared to the variance defined in population-based PGS regression. Even with an accurate direct effect PGS, sibling PGS analysis still fails to fully recover the direct effect variance component for the population.

In order to examine the impact of other factors, we turned to comparing the sibling $R^2$ estimates with the theoretical sibling $R^2$ when regressing the phenotype difference on the true direct PGS difference. We found that, as the contribution of indirect genetic effects increases from 0, the ratio of the estimated sibling $R^2$ to the theoretical sibling $R^2$ of direct genetic effects continues to decline from its target (unbiased) value of 1. This means the indirect genetic effect component is similar to “measurement error” in this case, attenuating the direct effect estimates. Additionally, when the contributions of direct, indirect, and environmental effects are held fixed, changes in the correlation between direct and indirect genetic effects will also lead to bias in the sibling $R^2$. As the correlation decreases from 1 to -1, the estimated $R^2$ is increasingly biased downwards. We label this as an “LD-like” relationship between direct and indirect genetic components, which survives sibling differencing or sibling fixed effects analysis. When compared with the direct effect variance component defined at the population level, the correlations between indirect and direct genetic effects can lead to either downward or upward bias. Similarly, we make slight adjustments to the previous results from Trejo and Domingue that focus on bias in regression coefficients (rather than $R^2$) obtained from sibling PGS analysis. Our conclusion is that sibling analysis continues to be biased upwards or downwards in a way that depends on a combination of the variances of direct genetic effects, indirect genetic effects, and their correlation.

It is important to note that our results are from a “simple” data generating process, where we assumed no assortative mating, no gene-environment interaction or correlation, and sibling genetic correlations of exactly 0.5. Adding these other elements to the framework will further complicate evaluating the performance of the sibling analysis, but, we suspect, will lead to additional biases rather than fewer. Thus, our view of the results from a relatively simple framework is that sibling analysis, coupled with conventional PGS, can rarely uncover a key target—direct genetic effects. Solving this issue will rely on dissecting each individual variant’s direct and indirect effects and calculating respective PGS for direct and indirect genetic components, possibly through sibling GWAS and multi-generational analysis (Howe et al., [6]; Wu et al., [9]).

Materials and methods

Data

We leverage family-based genetic data from quads (2 parents, 2 children) in the SPARK (Simons Foundation Powering Autism Research for Knowledge) study (Feliciano et al. [10]) in order to seed the model with realistic genetic information. Specifically, we obtained 7,026,791 SNPs from 1813 families with two parents and two full siblings. Following previous work (Huang et al. [11]), we filtered out single nucleotide polymorphisms (SNPs) with a minor allele frequency less than 1%, with an imputation quality score less than 0.8, or that are duplicated or strand-ambiguous. We then pruned the SNPs in linkage disequilibrium (LD) with a pairwise $r^2$ higher than 0.1. From the remaining 127,310 SNPs, we randomly picked 10,000 SNPs as causal variants and simulated phenotypes based on them.
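The per-SNP filters in this paragraph can be sketched as follows. This is a hypothetical illustration, not the pipeline used in the study: the record fields (`id`, `maf`, `info`, `a1`, `a2`) are assumed names, duplicated SNPs are assumed to be dropped entirely rather than deduplicated, and LD pruning is not shown because it requires genotype-level data.

```python
# Allele pairs that read the same on both DNA strands.
STRAND_AMBIGUOUS = {("A", "T"), ("T", "A"), ("C", "G"), ("G", "C")}


def passes_qc(snp, min_maf=0.01, min_info=0.8):
    """True if a SNP record survives the MAF, imputation-quality,
    and strand-ambiguity filters."""
    if snp["maf"] < min_maf:
        return False
    if snp["info"] < min_info:
        return False
    return (snp["a1"].upper(), snp["a2"].upper()) not in STRAND_AMBIGUOUS


def filter_snps(snps):
    """Drop SNPs that fail QC and (assumption) every copy of a duplicated ID."""
    counts = {}
    for snp in snps:
        counts[snp["id"]] = counts.get(snp["id"], 0) + 1
    return [s for s in snps if counts[s["id"]] == 1 and passes_qc(s)]


# Made-up records for illustration only.
EXAMPLE_SNPS = [
    {"id": "rs1", "maf": 0.20, "info": 0.95, "a1": "A", "a2": "G"},   # kept
    {"id": "rs2", "maf": 0.005, "info": 0.95, "a1": "A", "a2": "G"},  # rare
    {"id": "rs3", "maf": 0.30, "info": 0.50, "a1": "C", "a2": "T"},   # poorly imputed
    {"id": "rs4", "maf": 0.30, "info": 0.90, "a1": "A", "a2": "T"},   # strand-ambiguous
    {"id": "rs5", "maf": 0.30, "info": 0.90, "a1": "C", "a2": "T"},   # duplicated
    {"id": "rs5", "maf": 0.30, "info": 0.90, "a1": "C", "a2": "T"},   # duplicated
]

if __name__ == "__main__":
    print([s["id"] for s in filter_snps(EXAMPLE_SNPS)])
```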

Model specifications

We assume a data generating process for the phenotype $Y_{ij}$ that includes both direct and indirect genetic effects. Our model follows Kong et al. [1] by assuming additively separable direct ($\beta_{dir,k}$) and indirect ($\beta_{ind,k}$) genetic effects that are drawn from a bivariate normal distribution with a correlation parameter. The indirect genetic effect behaves as a family fixed effect that is shared between siblings. We denote the genotype of the ith child/sibling in the jth family as $G_{ij}$, and the genotypes of the mother and father in the jth family as $G_{m,j}$ and $G_{p,j}$, respectively. We can write the model as

$$Y_{ij} = \sum_{k=1}^{M} G_{ijk}\,\beta_{dir,k} + \sum_{k=1}^{M} (G_{m,jk} + G_{p,jk})\,\beta_{ind,k} + e_{ij}, \tag{1}$$

where $e_{ij}$ is the environmental residual. We assume equal indirect paternal and indirect maternal effects, both being $\beta_{ind,k}$. We note that this parameter can also be viewed as the average indirect parental effect if maternal and paternal effects are in fact unequal (Wu et al. [9]). We also assume that the direct and indirect effect sizes of the kth SNP on the phenotype follow

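$$\begin{pmatrix} \beta_{dir,k} \\ \beta_{ind,k} \end{pmatrix} \sim MVN\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix},\ \frac{1}{M} \begin{pmatrix} \sigma_d^2 & \rho_{di}\sigma_d\sigma_i \\ \rho_{di}\sigma_d\sigma_i & \sigma_i^2 \end{pmatrix} \right), \tag{2}$$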

where $\sigma_d^2$ represents the variance of the direct effect component of the phenotype; $\sigma_i^2$ represents the variance of the indirect effect component of the phenotype; $\rho_{di}$ represents the correlation between direct and indirect genetic effect sizes; and M denotes the number of causal SNPs with direct effect size $\beta_{dir,k}$ and indirect effect size $\beta_{ind,k}$. For families whose genotypes are used in the simulation, we assume that the environmental effects for the 2 children in the jth family also follow a bivariate normal distribution

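$$\begin{pmatrix} e_{1j} \\ e_{2j} \end{pmatrix} \sim MVN\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix},\ \begin{pmatrix} \sigma_e^2 & \rho_e\sigma_e^2 \\ \rho_e\sigma_e^2 & \sigma_e^2 \end{pmatrix} \right), \tag{3}$$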

where $\sigma_e^2$ represents the variance of environmental effects shared between 2 siblings, and $\rho_e$ represents the environmental correlation between 2 siblings. We further assume all genotypes involved are standardized.
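The two distributional assumptions in Formulas (2) and (3) translate directly into random draws. The sketch below uses NumPy and is only an illustration of those two formulas; the function names and the seed are assumptions, not part of the original analysis.

```python
import numpy as np


def draw_effect_sizes(n_snps, var_dir, var_ind, rho_di, rng):
    """Draw (beta_dir, beta_ind) per SNP from the bivariate normal in Formula (2)."""
    cov = rho_di * np.sqrt(var_dir * var_ind)
    # The 1/M scaling makes the summed squared effects total var_dir and var_ind.
    sigma = np.array([[var_dir, cov], [cov, var_ind]]) / n_snps
    betas = rng.multivariate_normal(np.zeros(2), sigma, size=n_snps)
    return betas[:, 0], betas[:, 1]


def draw_sibling_environments(n_families, var_env, rho_e, rng):
    """Draw (e_1j, e_2j) per family from the bivariate normal in Formula (3)."""
    sigma = var_env * np.array([[1.0, rho_e], [rho_e, 1.0]])
    env = rng.multivariate_normal(np.zeros(2), sigma, size=n_families)
    return env[:, 0], env[:, 1]


if __name__ == "__main__":
    rng = np.random.default_rng(2024)  # arbitrary seed for reproducibility
    beta_dir, beta_ind = draw_effect_sizes(10_000, 1.0, 0.5, 0.4, rng)
    e1, e2 = draw_sibling_environments(1_813, 3.0, 0.5, rng)
    print(np.sum(beta_dir ** 2), np.corrcoef(beta_dir, beta_ind)[0, 1])
    print(np.var(e1), np.corrcoef(e1, e2)[0, 1])
```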

Formula (1) can be rearranged to separate the effects of transmitted ($G_{ij}$) and non-transmitted ($N_{ij}$) alleles as

$$Y_{ij} = \sum_{k=1}^{M} G_{ijk}\,(\beta_{dir,k} + \beta_{ind,k}) + \sum_{k=1}^{M} N_{ijk}\,\beta_{ind,k} + e_{ij}. \tag{4}$$

We assumed transmitted and non-transmitted alleles to be independent (supported by genotypic data, Appendix 1 in S1 File). The rearrangement makes clear that a GWAS on the phenotype will capture both the true direct and indirect genetic effects. Following the conventions in the literature, we constructed the downstream PGS with the theoretical GWAS-estimated allelic weights $\hat{\beta}_k = \beta_{dir,k} + \beta_{ind,k}$, assuming all causal SNPs are accurately estimated, which we denote as $\widehat{PGS}_{ij} = \sum_{k=1}^{M} G_{ijk}\hat{\beta}_k$ (Lee et al. [12]). We obtained between-family PGS regression coefficients $\gamma_{OLS}$ and r-squared $R^2_{OLS}$ by regressing the phenotype of one sibling from each family on their estimated PGS as

$$Y_{1j} = \gamma_{OLS}\,\widehat{PGS}_{1j} + e_{OLS,j}. \tag{5}$$

Derivations of the theoretical regression coefficient and r-squared for between-family analysis are included in Appendix 2 in S1 File. For the sibling analysis, we took the difference in the phenotype between two siblings in a family as the within-family outcome

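$$\begin{aligned} \Delta Y_j = Y_{1j} - Y_{2j} &= \left( \sum_{k=1}^{M} \left[ G_{1jk}\beta_{dir,k} + (G_{m,jk} + G_{p,jk})\beta_{ind,k} \right] + e_{1j} \right) - \left( \sum_{k=1}^{M} \left[ G_{2jk}\beta_{dir,k} + (G_{m,jk} + G_{p,jk})\beta_{ind,k} \right] + e_{2j} \right) \\ &= \sum_{k=1}^{M} (G_{1jk} - G_{2jk})\,\beta_{dir,k} + (e_{1j} - e_{2j}). \end{aligned} \tag{6}$$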

The shared indirect genetic effect is eliminated between siblings. We took the difference in the estimated PGS between two siblings in a family as the within-family predictor

$$\Delta\widehat{PGS}_j = \widehat{PGS}_{1j} - \widehat{PGS}_{2j} = \sum_{k=1}^{M} (G_{1jk} - G_{2jk})\,(\beta_{dir,k} + \beta_{ind,k}). \tag{7}$$

Then we obtained within-family PGS regression coefficients $\gamma_\Delta$ and r-squared $R^2_\Delta$ by regressing the difference in the phenotype on the difference in the estimated PGS as

$$\Delta Y_j = \gamma_\Delta\,\Delta\widehat{PGS}_j + e_{\Delta,j}. \tag{8}$$

When we assume siblings from the same families have a correlation of 0.5 in their genotypes, $G_{1jk}$ and $G_{2jk}$ (also supported by the genotypic data we use for simulation, details in Appendix 1 in S1 File), the within-family regression coefficients and r-squared can be derived as

$$\gamma_\Delta = \frac{\sigma_d^2 + \rho_{di}\sigma_d\sigma_i}{\sigma_d^2 + \sigma_i^2 + 2\rho_{di}\sigma_d\sigma_i} \tag{9}$$

and

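$$R^2_\Delta = \frac{(\sigma_d^2 + \rho_{di}\sigma_d\sigma_i)^2}{(\sigma_d^2 + \sigma_i^2 + 2\rho_{di}\sigma_d\sigma_i)\,\left(\sigma_d^2 + Var(e_{1j} - e_{2j})\right)}. \tag{10}$$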
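A short sketch of where Formulas (9) and (10) come from may help; the full derivation is in S1 File and may proceed differently. The steps below assume mutually independent standardized SNPs, genotype differences that are independent of environmental differences, and sums over SNPs replaced by their expectations under Formula (2).

```latex
\begin{aligned}
Var(G_{1jk} - G_{2jk}) &= 1 + 1 - 2(0.5) = 1, \\
Cov(\Delta Y_j, \Delta\widehat{PGS}_j)
  &= \sum_{k=1}^{M} \beta_{dir,k}(\beta_{dir,k} + \beta_{ind,k})
   \approx \sigma_d^2 + \rho_{di}\sigma_d\sigma_i, \\
Var(\Delta\widehat{PGS}_j)
  &= \sum_{k=1}^{M} (\beta_{dir,k} + \beta_{ind,k})^2
   \approx \sigma_d^2 + \sigma_i^2 + 2\rho_{di}\sigma_d\sigma_i, \\
Var(\Delta Y_j)
  &\approx \sigma_d^2 + Var(e_{1j} - e_{2j}), \\
\gamma_\Delta
  &= \frac{Cov(\Delta Y_j, \Delta\widehat{PGS}_j)}{Var(\Delta\widehat{PGS}_j)},
\qquad
R^2_\Delta
  = \frac{Cov(\Delta Y_j, \Delta\widehat{PGS}_j)^2}
         {Var(\Delta\widehat{PGS}_j)\,Var(\Delta Y_j)}.
\end{aligned}
```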

To quantify the performance of both analyses in recovering the direct effect component, we compared the estimated regression coefficient with 1 (as the direct effect component in model (2) takes a coefficient of 1) and the r-squared with the proportion of variance due to the direct effect defined at the population level

$$h^2_{dir,OLS} = \frac{\sigma_d^2}{\sigma_d^2 + 2\sigma_i^2 + 2\rho_{di}\sigma_d\sigma_i + \sigma_e^2}. \tag{11}$$

We also compared the outcome r-squared of sibling analysis with the proportion of direct effect variance component defined on sibling differences which allowed us to better understand the impact of the change of parameters on sibling analysis alone. That is, we performed regression

$$\Delta Y_j = \gamma_{dir,\Delta} \left[ \sum_{k=1}^{M} (G_{1jk} - G_{2jk})\,\beta_{dir,k} \right] + e_{dir,\Delta,j} \tag{12}$$

and obtained

$$h^2_{dir,\Delta} = \frac{\sigma_d^2}{\sigma_d^2 + Var(e_{1j} - e_{2j})}. \tag{13}$$

(Appendix 4 in S1 File)

Simulation

We generated direct effect and indirect effect allelic weights for each offspring from a normal distribution under each combination of parameters and applied them to the offspring’s standardized genotypes. We also generated environmental effects for each offspring from a normal distribution following the parameters in each setting. By adding these components up for each offspring, we obtained their phenotypes.
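The procedure can be sketched end to end as below. This is a simplified stand-in, not the authors' code: the SPARK genotypes are replaced by simulated quads with independent SNPs, a common allele frequency, and random mating (all assumptions), and the function names, sample sizes, and seed are illustrative.

```python
import numpy as np


def simulate_quads(n_families, n_snps, rng, maf=0.3):
    """Standardized genotypes for mother, father, and two full siblings.

    Assumption: independent SNPs with one shared allele frequency.
    """
    shape = (2, 2, n_families, n_snps)  # (parent, haplotype, family, SNP)
    haps = (rng.random(shape) < maf).astype(np.int8)

    def standardize(genotype):
        return (genotype - 2.0 * maf) / np.sqrt(2.0 * maf * (1.0 - maf))

    def child():
        # Each parent transmits one of its two alleles at random per SNP.
        pick = rng.integers(0, 2, size=(2, n_families, n_snps), dtype=np.int8)
        from_mother = np.where(pick[0] == 0, haps[0, 0], haps[0, 1])
        from_father = np.where(pick[1] == 0, haps[1, 0], haps[1, 1])
        return standardize(from_mother + from_father)

    mother = standardize(haps[0].sum(axis=0))
    father = standardize(haps[1].sum(axis=0))
    return mother, father, child(), child()


def slope_and_r2(x, y):
    """Slope and R^2 of a simple linear regression of y on x with intercept."""
    xc, yc = x - x.mean(), y - y.mean()
    sxy, sxx, syy = xc @ yc, xc @ xc, yc @ yc
    return sxy / sxx, sxy ** 2 / (sxx * syy)


def run_once(var_dir, var_ind, var_env, rho_di, rho_e,
             n_families=4000, n_snps=500, seed=0):
    """One repeat: returns (slope, R^2) for between-family and sibling designs."""
    rng = np.random.default_rng(seed)
    mother, father, sib1, sib2 = simulate_quads(n_families, n_snps, rng)

    # Effect sizes as in Formula (2); environments as in Formula (3).
    cov = rho_di * np.sqrt(var_dir * var_ind)
    beta_cov = np.array([[var_dir, cov], [cov, var_ind]]) / n_snps
    betas = rng.multivariate_normal(np.zeros(2), beta_cov, size=n_snps)
    b_dir, b_ind = betas[:, 0], betas[:, 1]
    env_cov = var_env * np.array([[1.0, rho_e], [rho_e, 1.0]])
    env = rng.multivariate_normal(np.zeros(2), env_cov, size=n_families)

    # Phenotypes as in Formula (1); the nurture term is shared by siblings.
    nurture = (mother + father) @ b_ind
    y1 = sib1 @ b_dir + nurture + env[:, 0]
    y2 = sib2 @ b_dir + nurture + env[:, 1]

    # PGS built from direct-plus-indirect weights, as a GWAS would estimate.
    pgs1 = sib1 @ (b_dir + b_ind)
    pgs2 = sib2 @ (b_dir + b_ind)

    between = slope_and_r2(pgs1, y1)              # Formula (5)
    within = slope_and_r2(pgs1 - pgs2, y1 - y2)   # Formula (8)
    return between, within


if __name__ == "__main__":
    between, within = run_once(1.0, 1.0, 3.0, rho_di=0.0, rho_e=0.5)
    print("between-family slope, R2:", between)
    print("sibling-difference slope, R2:", within)
```

For the inputs in the usage lines, Table 1 gives expected slopes of 1 and 0.5. A single run will scatter around those values because both the effect sizes and the families are finite samples.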

Setting 1. From the derivations of our estimates above, we found that even in the simplest scenarios with unbiased GWAS effect sizes and genetic nurture absent, sibling analysis does not accurately estimate the variance component of direct effect. As a special case of Formula (11), here we have the population-based direct effect variance component defined as

$$h^2_{dir,OLS} = \frac{\sigma_d^2}{\sigma_d^2 + \sigma_e^2}.$$

However, the $R^2$ from sibling analysis is expected to be

$$R^2_\Delta = \frac{\sigma_d^2}{\sigma_d^2 + Var(e_{1j} - e_{2j})} = \frac{\sigma_d^2}{\sigma_d^2 + 2\sigma_e^2 - 2\rho_e\sigma_e^2}.$$

Comparing these two formulas, one can see that the difference between them depends on the correlation between the environmental effects of siblings, $\rho_e$. Only when $\rho_e = 0.5$ are these two quantities equal. Therefore, we designed a setting where we kept the variance of the direct genetic effect constant and set the indirect genetic effect to 0, so that the correlation between direct and indirect effects is also 0. We also held the variance of the environmental residual fixed and set the correlation between siblings’ environmental residuals, $\rho_e$, to 0, 0.5, or 1. Thus, a total of 3 scenarios were examined in setting 1.

Setting 2. We kept the variance of the direct genetic effect constant (normalized to 1) and varied the indirect genetic effect and the correlation between direct and indirect genetic effects to evaluate the influence of each factor on the sibling analysis $R^2$ when the other was held constant. Specifically, we set the variance of the indirect effect to be 0, 0.5, or 1. For each variance of the indirect effect, we varied the correlation between direct and indirect effects from -1 to 1 in steps of 0.2. We also set the variance of the environmental residual to 3. A total of 33 scenarios (3 variances of the indirect effect × 11 correlations between direct and indirect effects) were examined in setting 2.
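The 33-scenario grid for setting 2 can be enumerated as follows. This is a small illustrative sketch; the dictionary keys are assumed names, and the sibling environmental correlation of 0.5 follows the assumption stated in the Results.

```python
from itertools import product

VAR_IND_VALUES = (0.0, 0.5, 1.0)
# -1.0, -0.8, ..., 1.0; rounding avoids floating-point drift in the steps.
RHO_DI_VALUES = tuple(round(-1.0 + 0.2 * step, 1) for step in range(11))


def setting2_scenarios(var_dir=1.0, var_env=3.0, rho_e=0.5):
    """All combinations of indirect variance and direct-indirect correlation."""
    return [
        {"var_dir": var_dir, "var_ind": var_ind, "var_env": var_env,
         "rho_di": rho_di, "rho_e": rho_e}
        for var_ind, rho_di in product(VAR_IND_VALUES, RHO_DI_VALUES)
    ]


if __name__ == "__main__":
    scenarios = setting2_scenarios()
    print(len(scenarios))
    print(scenarios[0])
```

Some grid points are degenerate: when the indirect variance is 0 the correlation has no effect, and the Fig 4 legend notes that the regression cannot be run when the indirect variance is 1 and the correlation is -1.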

Supporting information

S1 File

(DOCX)

Acknowledgments

We thank members of the Social Genomics Working Group at University of Wisconsin for helpful comments. We are grateful to all the families participating in the Simons Foundation Powering Autism Research for Knowledge (SPARK) study.

Data Availability

The data underlying the results presented in the study are available from SPARK (Simons Foundation Powering Autism Research for Knowledge) study https://www.sfari.org/resource/spark/

Funding Statement

The authors gratefully acknowledge use of the facilities of the Center for Demography of Health and Aging at the University of Wisconsin-Madison, funded by NIA Center Grant P30 AG017266.

References

  • 1. Kong A., Thorleifsson G., Frigge M.L., Vilhjalmsson B.J., Young A.I., Thorgeirsson T.E., et al., 2018. The nature of nurture: Effects of parental genotypes. Science 359(6374), pp.424–428. doi: 10.1126/science.aan6877
  • 2. Fletcher Jason M., and Lehrer Steven F. "The effects of adolescent health on educational outcomes: Causal evidence using genetic lotteries between siblings." In Forum for Health Economics & Policy, vol. 12, no. 2. De Gruyter, 2009.
  • 3. Fletcher Jason M., and Lehrer Steven F. "Genetic lotteries within families." Journal of health economics 30, no. 4 (2011): 647–659. doi: 10.1016/j.jhealeco.2011.04.005
  • 4. Trejo Sam, Domingue Benjamin W. “Genetic nature or genetic nurture? Introducing social genetic parameters to quantify bias in polygenic score analyses.” Biodemography and Social Biology, 64:3–4, 187–215.
  • 5. Selzam Saskia, Ritchie Stuart J., Pingault Jean-Baptiste, Reynolds Chandra A., O’Reilly Paul F., and Robert Plomin. "Comparing within-and between-family polygenic score prediction." The American Journal of Human Genetics 105, no. 2 (2019): 351–363. doi: 10.1016/j.ajhg.2019.06.006
  • 6. Howe L.J., Nivard M.G., Morris T.T., Hansen A.F., Rasheed H., Cho Y., et al., 2021. “Within-sibship GWAS improve estimates of direct genetic effects”. BioRxiv.
  • 7. Belsky Daniel W., Domingue Benjamin W., Wedow Robbee, Arseneault Louise, Boardman Jason D., Caspi Avshalom, et al. "Genetic analysis of social-class mobility in five longitudinal studies." Proceedings of the National Academy of Sciences 115, no. 31 (2018): E7275–E7284.
  • 8. Harden K.P. and Koellinger P.D., 2020. Using genetics for social science. Nature human behaviour, 4(6), pp.567–576. doi: 10.1038/s41562-020-0862-5
  • 9. Wu Yuchang, Zhong Xiaoyuan, Lin Yunong, Zhao Zijie, Chen Jiawen, Zheng Boyan, et al. “Estimating genetic nurture with summary statistics of multigenerational genome-wide association studies.” Proceedings of the National Academy of Sciences 118, no. 25 (2021). doi: 10.1073/pnas.2023184118
  • 10. Feliciano P., Daniels A.M., Snyder L.G., Beaumont A., Camba A., Esler A., et al., 2018. SPARK: a US cohort of 50,000 families to accelerate autism research. Neuron, 97(3), pp.488–493. doi: 10.1016/j.neuron.2018.01.015
  • 11. Huang K, Wu Y, Shin J, Zheng Y, Siahpirani AF, et al. “Transcriptome-wide transmission disequilibrium analysis identifies novel risk genes for autism spectrum disorder.” PLOS Genetics 17, no.2 (2021). doi: 10.1371/journal.pgen.1009309
  • 12. Lee James J., Wedow Robbee, Okbay Aysu, Kong Edward, Maghzian Omeed, Zacher Meghan, et al. "Gene discovery and polygenic prediction from a genome-wide association study of educational attainment in 1.1 million individuals." Nature genetics 50, no. 8 (2018): 1112–1121. doi: 10.1038/s41588-018-0147-3

Decision Letter 0

Heming Wang

5 May 2022

PONE-D-22-07328

Interpreting Polygenic Score Effects in Sibling Analysis

PLOS ONE

Dear Dr. Fletcher,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by Jun 18 2022 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

Heming Wang, PhD

Academic Editor

PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at 

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. Thank you for stating the following financial disclosure: 

"The author(s) received no specific funding for this work."

At this time, please address the following queries:

a) Please clarify the sources of funding (financial or material support) for your study. List the grants or organizations that supported your study, including funding received from your institution. 

b) State what role the funders took in the study. If the funders had no role in your study, please state: “The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.”

c) If any authors received a salary from any of your funders, please state which authors and which funders.

d) If you did not receive any funding for this study, please state: “The authors received no specific funding for this work.”

Please include your amended statements within your cover letter; we will change the online submission form on your behalf.

3. Thank you for stating the following in the Acknowledgments Section of your manuscript: 

"The authors gratefully acknowledge use of the facilities of the Center for Demography of Health and Aging at the University of Wisconsin-Madison, funded by NIA Center Grant P30 AG017266.  We thank members of the Social Genomics Working Group at University of Wisconsin for helpful comments. We are grateful to all the families participating in the Simons Foundation Powering Autism Research for Knowledge (SPARK) study."

We note that you have provided funding information that is not currently declared in your Funding Statement. However, funding information should not appear in the Acknowledgments section or other areas of your manuscript. We will only publish funding information present in the Funding Statement section of the online submission form. 

Please remove any funding-related text from the manuscript and let us know how you would like to update your Funding Statement. Currently, your Funding Statement reads as follows: 

"The author(s) received no specific funding for this work."

Please include your amended statements within your cover letter; we will change the online submission form on your behalf.

4. Please review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript. If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice.

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Yes

********** 

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

********** 

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: No

Reviewer #2: Yes

********** 

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

********** 

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: In this manuscript, the authors explored these expectations by using family (quad) data and simulations that include indirect genetic effect processes and evaluated the ability of sibling analysis to uncover direct genetic effects of polygenic scores. They pointed out that interpreting results from sibling analysis aimed at uncovering direct genetic effects should be treated with caution.

Overall, it is a solid manuscript. The authors conducted a variety of simulation scenarios to demonstrate the bias. The methodology and procedures were proposed and clearly described. The results and conclusions were clear and logical. The limitation is that the results are from a “simple” data generating process, in which they assumed no assortative mating, no gene-environment interaction or correlation, and sibling genetic correlations of exactly 0.5. But I understand that it is much more complicated in the real world.

My comments are:

1. The authors did 15 simulation repetitions; to me, 15 is kind of low. I would expect the results to be generated from at least a few hundred repetitions.

2. In each of the boxplots, we can see that some groups have larger variations than others. Is there any explanation?

Minor comments:

1. The reference style is not consistent. For example, Introduction, second paragraph.

2. The authors should fix all the math symbols and letters. They are not shown in the PDF file.

Reviewer #2: The authors use simulated and real-world data (from parent-children quads) to determine whether within-family analyses using polygenic scores can truly separate direct and indirect genetic effects. This analysis seems especially important given that this is a claim that is often made, but not previously tested. The authors find that even under the ideal scenario (all direct, no indirect effects), within-family analyses tend to underestimate the true effect. As indirect and direct genetic effects become more correlated, estimates from within-family models become more biased in an unpredictable manner. This is an important investigation of a common assumption: that within-family models are robust estimates of true direct genetic effects. It would be a welcome addition to the literature. My comments are mostly minor in hopes of clarifying some of the information provided in this manuscript.

Major points

Is this really a limitation of genetic analyses only? Could this be a broader issue of family fixed effects models in social and behavioral sciences more broadly? Is there any reason to think that the issues raised in these simulations do not also apply to phenotypic or environmental analyses within family?

Minor points:

Moving the data and methods into the main text would be helpful. I know that it helped orient me to the analyses.

The axis labels in the figures could be a bit clearer, maybe “estimated/expected”? Something that helps the reader quickly distinguish what the ratio is comparing.

Additionally, I think the figure labels could better differentiate the comparison. For example, rather than “between/between” and “within/within”, why not “between/indirect” and “within/direct”? This may better orient the reader to the fact that it is a comparison of an estimate to the true population parameter.

It might help to provide some specific examples of when we might expect different scenarios of different combinations of direct and indirect effects (e.g., height vs educational attainment).

I hope the authors take these comments in the constructive manner in which they are intended. This is a valuable analysis and important to those in the field using polygenic scores.

********** 

6. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: Yes: Xiaoyin Li

Reviewer #2: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

Decision Letter 1

Heming Wang

12 Sep 2022

PONE-D-22-07328R1

Interpreting Polygenic Score Effects in Sibling Analysis

PLOS ONE

Dear Dr. Fletcher,

Thank you for submitting your manuscript to PLOS ONE. We are happy to tell you your revised manuscript is provisionally accepted. However, in order to publish your paper, please revise the format of your math equations so that they will be displayed correctly.

Please submit your revised manuscript by Oct 27 2022 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

We look forward to receiving your revised manuscript.

Kind regards,

Heming Wang, PhD

Academic Editor

PLOS ONE

Journal Requirements:

Please review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript. If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice.

Additional Editor Comments (if provided):

Please see the note from Reviewer 1 and double-check the display of the math formulas in your final version.

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: (No Response)

Reviewer #2: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

Reviewer #1: Yes

Reviewer #2: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

Reviewer #1: No

Reviewer #2: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

Reviewer #1: Yes

Reviewer #2: Yes

**********

6. Review Comments to the Author

Reviewer #1: The authors addressed most of my comments. Some of the math formulas still did not display correctly.

Reviewer #2: (No Response)

**********

7. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

**********

PLoS One. 2024 Feb 15;19(2):e0282212. doi: 10.1371/journal.pone.0282212.r004

Author response to Decision Letter 1


3 Feb 2023

The only issue raised was the errors in the PDF rendering of some of the equations. The manuscript file has been replaced with a PDF file by journal staff to address the errors.

Attachment

Submitted filename: Response to Reviewers 8.4.22.docx

Decision Letter 2

Heming Wang

10 Feb 2023

Interpreting Polygenic Score Effects in Sibling Analysis

PONE-D-22-07328R2

Dear Dr. Fletcher,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Heming Wang, PhD

Academic Editor

PLOS ONE

Additional Editor Comments (optional):

Reviewers' comments:

Acceptance letter

Heming Wang

16 Feb 2023

PONE-D-22-07328R2

Interpreting Polygenic Score Effects in Sibling Analysis

Dear Dr. Fletcher:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

If we can help with anything else, please email us at plosone@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Heming Wang

Academic Editor

PLOS ONE

    Articles from PLOS ONE are provided here courtesy of PLOS
