American Journal of Human Genetics. Letter. 2023 Jul 6;110(7):1221–1223. doi: 10.1016/j.ajhg.2023.05.015

Implications of family history and polygenic risk scores for causation

Shuai Li, John L Hopper
PMCID: PMC10357467  PMID: 37419094

To the Editor: Mars and colleagues1 investigated the risk associations of having a family history, and of a genome-wide polygenic risk score (PRS), for 24 common diseases using the extraordinary FinnGen resource. They first estimated the marginal associations and then the conditional associations by fitting both family history and PRS together. Creatively, they did this separately for family history defined by first- and by second-degree relatives, and the results differed. We believe that their analyses have important implications for understanding both the causes of familial aggregation and the causes of the diseases themselves, specifically: (1) there is evidence for non-genetic causes of familial aggregation, (2) the PRS appears to capture a substantial causal effect, and (3) some of the non-genetic familial causes are confounded with the PRS.

The observed familial risk ratio (FRR) for a given set of relatives, estimated as an odds ratio in this study, is a measure of the familial aggregation in a disease due to all familial cause(s). Under a multiplicative normal risk model, the familial variance in log(incidence) due to those causes is given by σ² = log(FRR)/r, where r is the correlation between those relatives in the familial causes.2 Note that the formula indicates that familial variance would be the same, regardless of the type of relatives, if the underlying model was correct.
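The relationship between the FRR and the familial variance can be sketched in a few lines of Python. This is a minimal illustration of the formula σ² = log(FRR)/r only; the function names and the value σ² = 1.0 are assumptions chosen for the example, not quantities from Mars and colleagues.

```python
import math


def familial_variance(frr: float, r: float) -> float:
    """Familial variance in log(incidence) implied by an FRR: log(FRR) / r."""
    if frr <= 0:
        raise ValueError("FRR must be positive")
    if not 0 < r <= 1:
        raise ValueError("r must be in (0, 1]")
    return math.log(frr) / r


def expected_frr(sigma2: float, r: float) -> float:
    """FRR expected for relatives with correlation r, given familial variance sigma2."""
    return math.exp(sigma2 * r)


# Hypothetical familial variance, used only to illustrate the invariance property.
sigma2 = 1.0
frr_first = expected_frr(sigma2, 0.5)    # first-degree relatives, r = 0.5
frr_second = expected_frr(sigma2, 0.25)  # second-degree relatives, r = 0.25

# If the genes-only model holds, both relative types recover the same variance,
# and log(FRR) for second-degree relatives is half that for first-degree relatives.
print(familial_variance(frr_first, 0.5), familial_variance(frr_second, 0.25))
print(math.log(frr_second) / math.log(frr_first))
```

Applying `familial_variance` to FRRs from two types of relatives and finding different values is the kind of discrepancy the letter uses as evidence against a genes-only model.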

Assuming a model in which all of the familial variance is due to genetic factors, the FRRs for first-degree relatives (in which case r = 0.5) predict the familial variances of the investigated diseases to range from 0.31 (stroke) to 2.36 (chronic kidney disease) with a mean of 1.4. However, the FRRs for second-degree relatives (in which case r = 0.25) predict the familial variances to range from 0.27 (stroke) to 2.46 (colorectal cancer) with a mean of 1.0, different from those predicted by the FRRs for first-degree relatives (p = 10⁻⁴ from a paired t test). The authors also noted that the FRRs on the log scale (“effect sizes”) for second-degree relatives were on average about one-third of those for first-degree relatives, not half as predicted under the genes-only hypothesis. For breast cancer specifically, the familial variance would be 1.7 if based on first-degree relatives but only 1.5 if based on second-degree relatives. Such a difference is unlikely to be explained by a difference in the accuracy of the family history data between first- and second-degree relatives, given that the study measured family history using genetic relatedness and disease diagnosis data from healthcare registries for both types of relatives.
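As a rough check of the breast cancer figures, and assuming the variance formula log(FRR) = r × (familial variance) given earlier in the letter, the log FRRs implied by the two quoted variances can be back-calculated. The subscripts 1 and 2 (first- and second-degree relatives) are notation introduced here, and the values are derived from the rounded variances rather than taken from Mars and colleagues.

```latex
\begin{align*}
\log(\mathrm{FRR}_{1}) &= r_{1}\,\sigma^{2}_{1} = 0.5 \times 1.7 = 0.85,\\
\log(\mathrm{FRR}_{2}) &= r_{2}\,\sigma^{2}_{2} = 0.25 \times 1.5 = 0.375,\\
\frac{\log(\mathrm{FRR}_{2})}{\log(\mathrm{FRR}_{1})} &= \frac{0.375}{0.85} \approx 0.44 < 0.5 .
\end{align*}
```

Under the genes-only model the ratio would be exactly 0.5, so a ratio below 0.5 is in the same direction as the average pattern the authors reported.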

Therefore, given that second-degree relatives are unlikely to share non-genetic factors as much as first-degree relatives, for those diseases like breast cancer where the familial variances estimated from first-degree relatives are greater, not all familial factors are genetic; see (1) above. This is consistent with findings for breast cancer from the Nordic Twin Study where, under the “genes only” assumption, the variance predicted by the FRR for monozygotic twin pairs was larger than that predicted by the FRR for dizygotic twin pairs across all ages, especially at a younger age.3

Family designs can be used to assess evidence for and against causation between an exposure and an outcome, i.e., causes per se. One example is the ICE FALCON method that uses data from related individuals, such as twin and sibling pairs.4 The method estimates the marginal associations of a person’s outcome (Yself) with their own exposure (Xself) and their relative’s exposure (Xrel), and the conditional associations from fitting the two exposures together, and then considers the changes in the pairs of association estimates. Observing that a person’s exposure association does not decrease after conditioning on their relative’s exposure, but that their relative’s exposure association decreases after conditioning on the person’s exposure, is consistent with the exposure having a causal effect on the outcome. That is, asymmetry in the changes in regression coefficients is indicative of causation, even in a setting where there is also familial confounding. Similar to mediation analysis, ICE FALCON examines the changes between marginal and conditional associations, but unlike mediation analysis, the purpose of ICE FALCON is to investigate whether there is any causation underlying the association between Xself (mediator) and Yself (dependent variable).
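The asymmetry described above can be illustrated with simulated data. The following is a minimal sketch, not the authors' implementation: it assumes a continuous outcome fitted by ordinary least squares (disease outcomes would typically call for logistic regression), a single shared familial factor that affects the exposure only, and a hypothetical causal effect of 0.5. All variable names and parameter values are assumptions made for the example.

```python
import numpy as np


def ols_slopes(y, *xs):
    """Least-squares slopes of y on the given predictors (intercept dropped)."""
    design = np.column_stack([np.ones_like(y), *xs])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1:]


rng = np.random.default_rng(0)
n_pairs = 20_000
true_effect = 0.5  # hypothetical causal effect of X_self on Y_self

# A shared familial factor makes the two relatives' exposures correlated.
familial = rng.normal(size=n_pairs)
x_self = familial + rng.normal(size=n_pairs)
x_rel = familial + rng.normal(size=n_pairs)

# The outcome depends only on the person's own exposure (pure causation,
# no familial confounding of the exposure-outcome association).
y_self = true_effect * x_self + rng.normal(size=n_pairs)

(beta_self_marginal,) = ols_slopes(y_self, x_self)
(beta_rel_marginal,) = ols_slopes(y_self, x_rel)
beta_self_cond, beta_rel_cond = ols_slopes(y_self, x_self, x_rel)

print(f"X_self: marginal {beta_self_marginal:.3f}, conditional {beta_self_cond:.3f}")
print(f"X_rel:  marginal {beta_rel_marginal:.3f}, conditional {beta_rel_cond:.3f}")
```

In this scenario the coefficient for the person's own exposure is essentially unchanged by conditioning, whereas the coefficient for the relative's exposure falls towards zero. Adding a direct path from the familial factor to the outcome would introduce familial confounding and change both conditional estimates.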

ICE FALCON has some similarities to Mendelian randomization (MR), except that (1) it uses the relative’s measured exposure (not a person’s measured genotype) as a proxy for instrumental variables that are not observed (i.e., familial causes specific to the exposure), and (2) it makes inference from considering changes in pairs of associations, not from fitting a marginal association alone.4

Mars and colleagues essentially applied the ICE FALCON concept with two extensions: (1) they used data for unrelated rather than related individuals, and (2) the relative’s PRS (Xrel) was not measured, but proxied by the relative’s disease status (Yrel), a component of the person’s family history. That is, they fitted Yself as a function of one or both of Xself and Yrel.

Their observations that, after fitting a PRS association, the family history association decreased are consistent with the PRS having a causal effect on the investigated diseases, especially when the decrease was much more than vice versa; see (2) above. This causality might be due to the genetic variants involved in the PRS being in linkage disequilibrium with causal variants, the causal variants being directly included in the PRS, or both. Causality is further supported by the observation that the family history association decreased by a smaller magnitude after conditioning on the PRS based on fewer variants (Figure 6), which would be expected to capture the effects of fewer causal variants.

On the other hand, the observation that for some diseases the PRS association attenuated after adjusting for first-degree family history is consistent with the existence of some factors shared within nuclear families confounding the PRS association; see (3) above. The familial confounding could be due to factors such as dynastic effects, assortative mating, population stratification,5 and familial exposures.

The causation inference approach above, using family history as a proxy instrumental variable and examining the changes between marginal and conditional associations, extends ICE FALCON to use data for unrelated individuals, so it is more widely applicable given that only a small proportion of studies collect data from relatives, whereas most collect family history data. We call this general approach ICE CRISTAL (inference about causation from examining changes in regression coefficients and innovative statistical analyses). ICE CRISTAL previously provided evidence consistent with mammographic density having a causal effect on breast cancer risk.6

Both ICE CRISTAL and ICE FALCON are more powerful and flexible than MR; they can still make inference about causation even if the instrumental variable (family history or relative’s exposure) is associated with the person’s outcome through pathways not involving the person’s exposure, as in the study of Mars and colleagues, in which family history was still associated with the disease after conditioning on the PRS. Not limited to measured and “weak” genetic variants as instrumental variables, ICE CRISTAL and ICE FALCON can investigate more exposures than MR. In particular, MR cannot be used to investigate the causal effect of PRS, as there are no genetic instrumental variables for a PRS other than itself.

Familial variance, especially if estimated from monozygotic twin pairs, indicates the maximum risk discrimination that can be achieved using genetic factors. Mars and colleagues’ study implies that genetic risk discrimination differs across the investigated diseases. Based on the FRRs for first-degree relatives and assuming all familial causes are genetic, the maximum AUC by knowing all genetic factors ranged from 0.65 to 0.86, with stroke (0.65), hypertension (0.74), and coronary artery disease (0.75) having the lowest AUCs and breast cancer (0.82), colorectal cancer (0.83), and prostate cancer (0.85) having some of the highest AUCs.
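The letter does not state how these AUCs were computed. One standard result, assumed here, is that if log risk is normally distributed with variance σ² and the disease is rare, cases and controls differ in mean log risk by σ², which gives AUC = Φ(σ/√2), where Φ is the standard normal distribution function. The sketch below applies that assumed formula to three familial variances quoted in the letter; the function name is hypothetical.

```python
import math


def max_auc(sigma2: float) -> float:
    """AUC implied by a normally distributed log risk with variance sigma2.

    Assumes a rare disease, so that log risk in cases is shifted by sigma2
    relative to controls: AUC = Phi(sigma / sqrt(2)) = 0.5 * (1 + erf(sigma / 2)).
    """
    if sigma2 < 0:
        raise ValueError("variance must be non-negative")
    sigma = math.sqrt(sigma2)
    return 0.5 * (1.0 + math.erf(sigma / 2.0))


# Familial variances quoted in the letter (first-degree relatives, genes-only model).
quoted_variances = {
    "stroke": 0.31,
    "breast cancer": 1.7,
    "chronic kidney disease": 2.36,
}

for disease, sigma2 in quoted_variances.items():
    print(f"{disease}: sigma^2 = {sigma2}, implied AUC = {max_auc(sigma2):.2f}")
```

Under these assumptions the implied AUCs round to the letter's values of 0.65 for stroke, 0.82 for breast cancer, and 0.86 for the upper end of the range, which suggests, but does not confirm, that a formula of this kind was used.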

Further work to address limitations and incorporate additional analyses could yield fresh insights. First, no statistical analyses of the changes between marginal and conditional associations are presented. It is therefore not possible to assess whether the decrease in the family history association after fitting PRS, or the causal effect of PRS suggested by the ICE CRISTAL analysis, is due to chance. It would be of interest to apply the methods of Freedman et al.,7 as in Martin et al.6 Second, for most diseases the absolute familial variance decreases with age. The proportion of familial variance explained by PRS, however, appears to increase with age for breast cancer,8 yet is relatively constant for colorectal cancer.9,10 It would be of interest to estimate both the familial and PRS variances by age and test if their associations, together and on their own, differ by age. Third, lack of validation of the inferred causation from ICE CRISTAL could be addressed by applying ICE FALCON to the nearly 90,000 pairs of first-degree relatives in FinnGen with PRS and disease diagnosis data.
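One way to attach uncertainty to a change between a marginal and a conditional coefficient is a nonparametric bootstrap. The sketch below is not the method of Freedman et al.; it is a generic substitute shown for illustration, using simulated data, a continuous outcome, and least-squares fits. The data-generating values and all names are assumptions.

```python
import numpy as np


def slope_of_first(y, x, *covariates):
    """Least-squares coefficient of x in a regression of y on x and covariates."""
    design = np.column_stack([np.ones_like(y), x, *covariates])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1]


def attenuation(y, fh, prs):
    """Marginal minus conditional family history coefficient."""
    return slope_of_first(y, fh) - slope_of_first(y, fh, prs)


rng = np.random.default_rng(1)
n = 5_000

# Simulated (hypothetical) data: family history is correlated with the PRS,
# and both contribute to the outcome.
prs = rng.normal(size=n)
fh = (0.5 * prs + rng.normal(size=n) > 1.0).astype(float)
y = 0.4 * prs + 0.2 * fh + rng.normal(size=n)

observed = attenuation(y, fh, prs)

# Resample individuals with replacement and recompute the change each time.
boot = np.empty(500)
for b in range(boot.size):
    idx = rng.integers(0, n, size=n)
    boot[b] = attenuation(y[idx], fh[idx], prs[idx])

ci_low, ci_high = np.percentile(boot, [2.5, 97.5])
print(f"change = {observed:.3f}, 95% bootstrap interval ({ci_low:.3f}, {ci_high:.3f})")
```

An interval that excludes zero indicates that the decrease in the family history coefficient after fitting the PRS is unlikely to be due to chance in these simulated data. The same resampling scheme could in principle be applied to logistic models fitted to real data.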

In summary, we believe that the data of Mars and colleagues’ study, especially with the additional analyses suggested above, have important implications for understanding disease etiology and risk discrimination.

Acknowledgments

This work is supported by the National Health and Medical Research Council (NHMRC; grant no. GNT2017373). S.L. is an NHMRC Emerging Leadership Fellow. J.L.H. is an NHMRC Senior Principal Research Fellow.

References

  • 1. Mars N., Lindbohm J.V., Della Briotta Parolo P., Widén E., Kaprio J., Palotie A., FinnGen, Ripatti S. Systematic comparison of family history and polygenic risk across 24 common diseases. Am. J. Hum. Genet. 2022;109:2152–2162. doi: 10.1016/j.ajhg.2022.10.009.
  • 2. Clayton D.G. Prediction and interaction in complex disease genetics: experience in type 1 diabetes. PLoS Genet. 2009;5. doi: 10.1371/journal.pgen.1000540.
  • 3. Möller S., Mucci L.A., Harris J.R., Scheike T., Holst K., Halekoh U., Adami H.O., Czene K., Christensen K., Holm N.V., et al. The heritability of breast cancer among women in the Nordic Twin Study of Cancer. Cancer Epidemiol. Biomarkers Prev. 2016;25:145–150. doi: 10.1158/1055-9965.EPI-15-0913.
  • 4. Li S., Bui M., Hopper J.L. Inference about causation from examination of familial confounding (ICE FALCON): a model for assessing causation analogous to Mendelian randomization. Int. J. Epidemiol. 2020;49:1259–1269. doi: 10.1093/ije/dyaa065.
  • 5. Brumpton B., Sanderson E., Heilbron K., Hartwig F.P., Harrison S., Vie G.Å., Cho Y., Howe L.D., Hughes A., Boomsma D.I., et al. Avoiding dynastic, assortative mating, and population stratification biases in Mendelian randomization through within-family analyses. Nat. Commun. 2020;11:3519. doi: 10.1038/s41467-020-17117-4.
  • 6. Martin L.J., Melnichouk O., Guo H., Chiarelli A.M., Hislop T.G., Yaffe M.J., Minkin S., Hopper J.L., Boyd N.F. Family history, mammographic density, and risk of breast cancer. Cancer Epidemiol. Biomarkers Prev. 2010;19:456–463. doi: 10.1158/1055-9965.EPI-09-0881.
  • 7. Freedman L.S., Graubard B.I., Schatzkin A. Statistical validation of intermediate endpoints for chronic diseases. Stat. Med. 1992;11:167–178. doi: 10.1002/sim.4780110204.
  • 8. Li S., MacInnis R.J., Lee A., Nguyen-Dumont T., Dorling L., Carvalho S., Dite G.S., Shah M., Luccarini C., Wang Q., et al. Segregation analysis of 17,425 population-based breast cancer families: Evidence for genetic susceptibility and risk prediction. Am. J. Hum. Genet. 2022;109:1777–1788. doi: 10.1016/j.ajhg.2022.09.006.
  • 9. Li S. Negative age-dependence of the polygenic risk score gradient for colorectal cancer. Gastroenterology. 2021;160:2214–2215. doi: 10.1053/j.gastro.2020.09.064.
  • 10. Li S., Hopper J.L. Age dependency of the polygenic risk score for colorectal cancer. Am. J. Hum. Genet. 2021;108:525–526. doi: 10.1016/j.ajhg.2021.02.002.

Articles from American Journal of Human Genetics are provided here courtesy of American Society of Human Genetics
