Abstract
Polygenic indices (PGI)—the new recommended label for polygenic scores (PGS) in social science—are genetic summary scales often used to represent an individual’s liability for a disease, trait, or behavior based on the additive effects of measured genetic variants. Enthusiasm for linking genetic data with social outcomes and the inclusion of premade PGIs in social science datasets have facilitated increased uptake of PGIs in social science research—a trend that will likely continue. Yet, most social scientists lack the expertise to interpret and evaluate PGIs in social science research. Here, we provide a primer on PGIs for social scientists focusing on key concepts, unique statistical genetic considerations, and best practices in calculation, estimation, reporting, and interpretation. We summarize our recommended best practices as a checklist to aid social scientists in evaluating and interpreting studies with PGIs. We conclude by discussing the similarities between PGIs and standard social science scales and unique interpretative considerations.
Keywords: Polygenic scores, polygenic index, methods, sociogenomics, genetics, social science
1. Introduction
Polygenic indices (PGI), until recently in social science labeled and more widely known as polygenic scores (PGS)1, are increasingly permeating social science (e.g., Becker et al., 2021; Boardman & Fletcher, 2021; Mills & Tropf, 2020). PGIs are genetic summary scores typically calculated as the sum of many (often millions of) genetic variants across the genome weighted by their association with a given outcome estimated by genome-wide association studies (GWAS; discussed herein). PGIs are estimates of the additive contribution of an individual’s genetics to a trait value based on measured genetic variants.
Since their conception and initial application roughly 15 years ago (Purcell et al., 2009; Wray et al., 2007), the use of PGIs has greatly expanded with the increased availability of genotype data2 in human studies and the growth of GWASs. Millions of individuals have been genotyped, with data available for research in both publicly funded databanks, such as the U.K. Biobank and the U.S. All of Us Study, and privately funded ones, like 23andMe. There is now a veritable firehose of genetic data (Conley & Fletcher, 2017). Increasingly, social scientists are drinking from this firehose by incorporating PGIs into their research.
Facilitating the uptake of PGIs in social science, several rich, laudably accessible social science datasets, like the Health and Retirement Study (HRS), Wisconsin Longitudinal Study (WLS), and Add Health Study, include pre-made PGIs. Outcomes for which PGIs are available range from biomedical conditions, such as coronary artery disease and plasma cortisol, to various social traits and behaviors, such as educational attainment, smoking behavior, going to pubs/social clubs weekly, lifetime number of sexual partners, ever used cannabis, and age at first birth (Becker et al., 2021; Braudt & Harris, 2020; Ware et al., 2021). With the inclusion of pre-made PGIs, social scientists no longer must work with high-dimension genotype data or utilize complex statistical genetic analyses to construct these measures. In short, social scientists increasingly have access to pre-made PGIs, which they can include in their research with relative ease, and are encouraged to do so (e.g., Mills & Tropf, 2020; Harden & Koellinger, 2021).
However, the ready availability and ease of incorporating PGIs into social science research risk potential pitfalls, including misuse and misinterpretation (Boardman & Fletcher, 2021; Burt, 2023b; Coop & Przeworski, 2022a). To be sure, the potential for misuse or misinterpretation of statistical methodologies is always a hazard in scientific research, including standard3 social science studies (see Burt 2023d, for a recent example). However, the risk is amplified in social science studies using PGIs (hereafter, ‘PGI studies’) for several reasons. First, social scientists generally lack expertise in genomics and statistical genetics. Although increased opportunities for training exist, genetics expertise remains rare among social scientists (Boardman & Fletcher, 2021), and with it, the knowledge necessary for understanding PGI methods, including their assumptions and limitations.
In addition to a lack of expertise, social scientists’ ability to understand the details and complexities of PGI studies is hindered by heterogeneity in reporting, likely influenced by several intertwined factors. For one, no established standards exist for constructing or reporting on PGIs (Wand et al., 2021). Additionally, with the ongoing development of new methods, there is a greater diversity in relevant information that could be presented (e.g., Choi et al., 2020; Wand et al., 2021). Moreover, the GWASs on which PGIs are based are extraordinarily complicated and increasingly so. The methodological details for a GWAS usually do not fit in the original articles themselves and are typically relegated to extensive supplements (Burt & Munafò, 2021). Hence, a social science study using a pre-made PGS in a new and different set of analyses is tasked with a difficult challenge—providing sufficient information given the extensive analyses and specifications involved in creating these measures. Not surprisingly, the methodological details provided in some PGS studies are varied, even inadequate, which unfortunately impedes understanding, evaluation, and replicability. Indeed, a recent review of more than 200 studies developing new PGSs found that approximately 40% “did not include adequate variant information” for replication; the authors concluded: “Reproducibility has been hampered by underreporting of key PGS information” (Lambert et al., 2021, p.416).
Furthermore, due to various complexities discussed here, PGIs are simply challenging to interpret (Boardman & Fletcher, 2021; Pingault et al., 2022). As Boardman and Fletcher (2021) noted: “The unsolved difficulty of separating causal genetic effects from confounding … are only partially understood and rarely sufficiently discussed and addressed by most researchers currently integrating genetics and social science” (p. 8-9, emphasis added; also, Burt, 2023b; Keller, 2023).
To date, several authoritative review articles and books for social scientists have extensively discussed the promises of sociogenomics, in general, and potential opportunities for using PGIs in social science, in particular (e.g., Conley, 2016; Harden & Koellinger, 2020; Mills & Tropf, 2020). Additionally, several articles address issues related to complexities around genetic populations and “genetic ancestry” and the potential misuse of sociogenomic scholarship in support of racist agendas (e.g., Herd et al., 2021; Mathieson & Scally, 2020; Panofsky & Bliss, 2017). Tutorials directed to social science audiences for calculating these scores, including technical guidance for handling genotype data and using specialized computational software to construct PGIs, also exist (e.g., Mills et al., 2020). What is missing is an interpretive and evaluative methodological guide to PGIs for social scientists consuming these studies, with a specific focus on best practices and potential pitfalls, including interpretation.
This paper aims to remedy this deficit by providing a primer on PGI studies accessible to social scientists who lack sociogenomics expertise. This article’s goals are introducing sociologists to the language and methods of PGI studies, fostering greater understanding of challenges and issues, and stimulating further discussion. Readers should note that I focus on studies using pre-made PGIs, given that these will typically be employed in social scientific studies.
With these aims, this article proceeds in several parts. First, I cover the genetic basics necessary to understand PGIs and inform interpretation. Then, I introduce readers to PGIs and related statistical genetics concepts and analyses, including GWASs. I explain the fundamentals of these measures, including what is measured, what is not measured, and why. Next, I address sources of complexities and confounding, which preclude the naïve interpretation of GWASs and PGSs for complex social traits as only capturing causal genetic effects or ‘genetic (versus) environmental influences’. I focus on sources of environmental and genetic confounding along with existing methods to mitigate confounding. Then, I focus on PGIs. I discuss key features of PGI construction, focusing on assumptions, potential biases, and important considerations and adjustments. Additionally, I propose a set of guidelines for reporting PGI methods and best practices, which are analogous to those proposed in the clinical arena (e.g., Wand et al., 2021) and are consistent with standard reporting practices in social science. Equally important, I provide guidance for interpreting PGSs. I offer these as well-founded recommendations, not as one-size-fits-all authoritarian requirements. Before concluding, I argue that, rather than being unique, most of the challenges with PGIs are shared with (or mirror those of) standard social science scales.
Given the complexities of PGIs, I necessarily cover considerable ground and introduce and employ esoteric genomic concepts. I provide a glossary for key concepts, italicized on the first usage in the main text, in Appendix A for reference. Although my coverage is not for the faint of heart, I aim to provide a guide that can be referenced when, for example, reading or reviewing a PGI study.
Before entering the field of sociogenomics, I first define and briefly discuss a conceptual distinction, often used but rarely described adequately, between “complex social traits” and “biological traits”.4 Although most of what I discuss is relevant to PGIs for any trait, my focus is on “complex social traits” versus “[however complex] biological traits”.
I use social traits to refer to what Searle (1995) calls “social kinds” or what Haslanger (1995) refers to as “constitutive social constructions”. On this conception, social traits, which are invariably complex, are defined, at least in part, by social forces. Unlike natural kinds and biological traits, social traits do not exist in nature apart from human creations awaiting human discovery; rather, they depend on social forces and human classifications (e.g., Hacking, 1999; Searle, 1995). For example, humans create complex social traits like “educational attainment”, typically defined as years of educational attainment, which is the most well-studied social trait in GWAS and widely used social PGI. The trait (years of) educational attainment is based on social distinctions inevitably layered on top of other social forces that exist irreducibly in a social matrix. Social traits are “not simply a matter of brute physics and biology” and do not correspond to a biological state or supervene upon biology (Searle, 1995, p.7).
By contrast, biological traits exist whether we know about them or not and are defined by a biological state (even as the scales we use to measure them and the names we give are often social creations) (Searle, 1995). Regardless of what we call these traits and whether we know about them, at a given time, humans have a weight or mass, height, blood pressure, blood glucose level, and the like. To be sure, recognizing a trait is biological does not mean underlying social forces do not significantly influence said trait. Similarly, recognizing that something is a social trait does not imply that such a trait is not shaped by bio-genetic factors or partially reflected in biological differences or that it is not “real” or cannot be studied scientifically. Rather, to say something is a social trait means that it is defined in part by human social creations and, as such, cannot be reduced to a biological or physio-material state (see Burt, 2022; Dupre, 2001, 2012; Searle, 1995, for more lengthy discussions and examples).
Moreover, the social trait—biological trait distinction is not always clear-cut and might be usefully conceived as a continuum with some traits being “more social” than others. For example, many biomedical traits exist more in the intermediate range of this continuum due to the role of social forces in creating typological distinctions where they do not exist naturally or somewhat arbitrary thresholds (e.g., “high versus low blood pressures”; ADHD diagnoses, even autism spectrum disorder) (Burt, 2015; Press, 2006). Importantly, recognizing that some traits are more intermediate and that whether a trait is social or biological is not always clear does not undermine the value of the distinction. After all, to use a familiar adage, there is no clear-cut distinction between night and day, but the difference between night and day is often both clear and useful.
Having charted a roadmap and clarified what I mean by a complex social trait, I now turn to our primer on basic genetics underlying PGIs. Scholars familiar with basic genetics and statistical genetic terminology may skip this section.
2. Fundamentals Underlying PGSs: Genetic Basics & Relevant Genetic Concepts
The human genome comprises roughly 3 billion base pairs (bp) per haploid set of 23 chromosomes. All going well, humans inherit two copies of each (non-sex5) chromosome, one maternal and one paternal, for 46 chromosomes in total. Each chromosome is a single, double-stranded molecule of DNA arranged in the famous double helix. At a given position on our genome, we have one of four bases (or nucleotides): adenine (A), thymine (T), guanine (G), or cytosine (C). The order of these bases on our chromosomes is our genetic code.
Several basic genetic concepts are key to understanding PGSs and are often employed in research using these scores. Genomics uses the term locus (pl. loci) for a position on the genome; this can refer to the position of a single base or a broader region encompassing several million base pairs (Mbp). A genetic variant refers to a difference in the bases at a particular position on the genome or locus. The term genome refers to the DNA in an organism (or cell), i.e., the entire DNA sequence of an organism. Genomics, then, is the study of the structure and function of the entire genome. Genes are specific segments of DNA sequences that code for functional (protein or RNA) products. For example, the well-known BRCA1 gene is the DNA sequence (roughly 81,000 bases) encoding the ‘breast cancer type 1 susceptibility protein’. To put variants in a genetic context, approximately 34,500 variants have been identified in the BRCA1 gene; of these, ~7500 have been validated by experts, and roughly 3700 variants have been classified as disease-associated (BRCA Exchange, 2022).6 Altogether, humans have about 20,000 (protein-coding) genes, which constitute about 1.2% of our genome.
As is often repeated, any two given humans are >99.5% genetically similar. Specifically, if we compare two random, conventionally unrelated7 people to each other (or one person to the reference genome), we will observe 3.5 to 4.5 million variant differences between them. In addition to providing information about population demographic histories, genetic variants are studied because functional variants influence differences in traits and disease susceptibility by altering what is produced or how much. Although some rare deleterious variants can have large, debilitating effects, most variants in our genomes are benign and persist because they do not have major effects on fitness (Taliun et al., 2021; Telenti et al., 2016). Evidence suggests that, on average, individuals have ~12,000 coding variants (i.e., variants located in protein-coding genes) that alter the amino acid sequence and hence protein product (Taliun et al., 2021).
Most genetic variants are differences at a single base (e.g., an A or C) at a given position on a chromosome. These single-base variants, known as single nucleotide variants, account for approximately 87% of the variants in an individual’s genome (Strachan & Read, 2018). Alleles refer to alternative versions of a given variant (e.g., allele A or allele C). Most single nucleotide variants are biallelic (i.e., two forms), but some are multiallelic (e.g., A, C, or T).8 Biallelic variants, which are the focus of GWASs and PGIs and thus our focus here, are conventionally described by the frequency of the less common (minor) allele, the minor allele frequency (MAF), which by definition ranges from just above 0% to 50%. Polymorphism refers to a “common” variant in a population; common is typically defined as having a MAF of >1%.
The alleles one possesses for a given genetic variant are called one’s genotype.9 Given that we carry two copies of each (non-sex) chromosome, one’s genotype for a variant is defined by the two alleles one carries. For example, for a biallelic SNP with alleles A or C, one’s genotype can be AA, AC, or CC. In statistical genetics analyses, including GWASs and PGSs, the categorical measure of genotype for a biallelic variant is converted to a quantitative measure of minor allele dosage calculated as the number of minor alleles one carries (0, 1, or 2).
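The conversion from genotype to minor allele dosage can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the six genotypes are made-up toy data, the function names `minor_allele` and `dosage` are hypothetical, and the minor allele is identified from this tiny sample rather than from a reference panel, as would be done in practice.

```python
from collections import Counter


def minor_allele(genotypes):
    """Return (minor allele, minor allele frequency) for a biallelic variant.

    `genotypes` is a list of two-letter strings such as "AC" (toy input format).
    """
    counts = Counter(allele for g in genotypes for allele in g)
    if len(counts) != 2:
        raise ValueError("expected a biallelic variant with both alleles observed")
    allele, n = min(counts.items(), key=lambda item: item[1])
    return allele, n / sum(counts.values())


def dosage(genotype, minor):
    """Number of copies of the minor allele carried (0, 1, or 2)."""
    return genotype.count(minor)


# Hypothetical genotypes for six people at one biallelic SNP (alleles A and C).
genotypes = ["AA", "AC", "CC", "AA", "AC", "AA"]
minor, maf = minor_allele(genotypes)
dosages = [dosage(g, minor) for g in genotypes]

print(minor, round(maf, 3), dosages)
```

In this toy sample, C is the minor allele (4 of 12 alleles), so the categorical genotypes AA, AC, and CC become the quantitative values 0, 1, and 2.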
In addition to single nucleotide variants, some variants involve changes in the number or sequence of bases or multiple bases. These can be roughly categorized into two broad classes based on size. Although the terminology is inconsistent, one class includes smaller insertion-deletions (indels or delins) and copy number variants (CNVs), such as a sequence TTACTGC repeated three to six times. On average, smaller indels and CNVs constitute ~13% of variants in each human genome. The other broad class known as structural variants (SVs) are larger DNA rearrangements, which can involve millions of base pairs. Although relatively uncommon (accounting for only ~0.15% of variants or an average of 7500 per human genome), structural variants account for more overall base differences between people given their size (Collins et al., 2020; Sudmant et al., 2015) and have a disproportionate role in shaping human differences compared to other variants (Sudmant et al., 2015; Takumi & Tamada, 2018). For example, structural duplications can involve multiple copies of genes and thus influence the overproduction of a given protein; structural deletions can involve the deletion of an entire gene and the underproduction of the functional protein.
PGIs are based on biallelic single nucleotide polymorphisms (SNPs), with few exceptions. SNPs are single nucleotide variants that are polymorphisms (i.e., are common). Depending on how common is defined, the human population has between 10 and 20 million SNPs. Although most single nucleotide variants in an individual’s genome are SNPs (i.e., are common), most single nucleotide variants in the human population are not common (i.e., are not SNPs). Fathoming this seeming paradox requires understanding the sources of genetic variation. On average, each person inherits between 45 and 80 de novo mutations (i.e., variants not present in either of their parents), which arise during gametogenesis (Acuna-Hidalgo et al., 2016). Over time, evolutionary forces positively select advantageous mutations and negatively select deleterious mutations, and many non-functional, neutral variants come along for the ride. Thus, most of the ~4 million variants in an individual’s genome are relatively ancient and common, having accumulated and spread over hundreds to thousands of generations.10 Most SNPs predate the Out-of-Africa dispersal11 of humans some 50,000 to 100,000 years ago (Henn et al., 2012; McEvoy et al., 2011; Skoglund & Mathieson, 2018) and are observed in all human populations, having persisted across generations because they have no deleterious effects.
On the other hand, most variants in the population are rare and recent due to the explosion of the human population in the past 2,000 years, and especially the past 200 years (Fu et al., 2013; Tennessen et al., 2012). More than 450 million validated variants have been identified in the human genome, translating to one variant every seven bases, on average (Taliun et al., 2021). Population geneticists surmise that every single nucleotide variant compatible with life exists in some living human (McClellan & King, 2010). Importantly, recent, rare variants are much more likely to be functional and deleterious as they have yet to be eliminated by negative or purifying selection (Fu et al., 2013). Awareness of the different distribution and effects of common and rare variants is helpful for understanding the limitations of what PGIs measure, as I discuss below.
3. Introduction to PGIs and Genome-Wide Association Studies (GWASs)
3.1. Polygenic Indices (PGI) (a.k.a., Polygenic Scores (PGS))
Inherited susceptibility to conditions such as cystic fibrosis and sickle cell anemia can be caused by rare, pathogenic variants at a single locus (“monogenic” disorders). In contrast, complex traits are multifactorial in etiology (shaped by both environmental and genetic influences) and polygenic (i.e., associated with numerous genetic variants, most of which explain only a minuscule proportion of trait variance; r² < 0.005%) (Visscher et al., 2017; Wray et al., 2021).12 The massive polygenicity of complex traits with relatively weak average effect sizes and largely unknown biology means that the alleles one has at any given locus are largely uninformative about mechanisms and individual risks.
PGIs were developed to aggregate these weak and typically causally non-specific genetic associations between SNPs and phenotypes into a single measure or summary genetic ‘predictor’ (Dudbridge, 2013; Wray et al., 2007). PGIs are now a standard downstream application of GWASs and are increasingly employed in the biomedical and social sciences (Wray et al., 2019). A variety of terms have been used for PGI, including most commonly polygenic scores (PGSs), polygenic risk scores (PRS), genome-wide polygenic scores (GPS), and genetic risk scores (GRS), with the ‘risk’ label usually limited to biomedical research. Recently, the label polygenic index (PGI) was proposed for social science scholarship to avoid the term ‘score’, which may imply a value judgment. Given our social science focus, I use PGI (see note 1). Whatever the label, PGIs can be formally defined as individual genetic summary scores representing the additive genetic associations with some outcome based on an individual’s genotype data, typically SNPs. Becker et al. (2021) aptly describe a PGI as a “noisy measure of a latent variable [they] call the ‘additive SNP factor’” (p.1744).
Turning to construction, PGIs are typically created as the sum of the weighted SNP allele dosages, where weights are based on effect sizes estimated from a GWAS. Thus, understanding the basics of a GWAS is necessary for understanding a PGI.
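As a rough illustration of this construction, a PGI for person i can be written as the sum over SNPs j of w_j × x_ij, where x_ij is the minor allele dosage and w_j is the GWAS-derived weight. The sketch below assumes three SNPs with made-up weights and dosages; the name `polygenic_index` is hypothetical, and real pipelines involve many more SNPs and further adjustments to the weights that are omitted here.

```python
def polygenic_index(dosages, weights):
    """Weighted sum of allele dosages: sum_j weights[j] * dosages[j]."""
    if len(dosages) != len(weights):
        raise ValueError("need exactly one weight per SNP")
    return sum(w * x for w, x in zip(weights, dosages))


# Hypothetical GWAS effect-size estimates for three SNPs (illustrative only).
weights = [0.02, -0.01, 0.03]

# Hypothetical minor allele dosages (0, 1, or 2) for two people at those SNPs.
people = {
    "person_a": [0, 1, 2],
    "person_b": [2, 0, 1],
}

scores = {name: polygenic_index(d, weights) for name, d in people.items()}
for name, score in scores.items():
    print(name, round(score, 4))
```

Because the index is a simple weighted sum, any bias in the GWAS weights carries straight through to the PGI.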
3.2. Genome-Wide Association Studies (GWAS)
3.2.1. Conceptual Approach
A GWAS is a (directional) hypothesis-free regression-based data mining technique to ‘scan the genome’ for trait-associated loci (sometimes called ‘risk loci’ or quantitative trait loci (QTLs)) (Balding, 2006; Klein et al., 2005). Typically, GWASs involve analyzing a preselected set of several hundred thousand to more than 10 million SNPs (although other variant forms (e.g., CNVs) can and have on occasion been examined). In most cases, SNPs are not selected because they have some biological effect, as most do not. Instead, as discussed below, SNPs are selected to ‘tag’ regions of common genetic variation. Thus, the set of SNPs examined for GWASs (and used in PGSs) is typically not based on any putative relevance for a given outcome. In practice, this means that usually the same set of SNPs is used in GWASs (and PGSs) for all traits.
Although a thorough discussion is out of scope, understanding the GWAS tag SNP methodology, and thus what PGSs aggregate, requires a brief foray into facets of meiosis and inheritance (see Marees et al., 2018; Mills et al., 2020; Uffelmann et al., 2021, for more detail). Each chromosome is a composite of segments of our parents’ matching chromosomes (one of each they inherited from their parents). On average, 1.5 segments per chromosome are exchanged in meiosis (called meiotic recombination or “crossing over”), which creates unique chromosomal combinations (except in the case of identical twins). Over time, these chromosomal segments are passed down mostly unbroken over generations, creating blocklike segments of the human genome, known as haplotype blocks, composed of correlated SNPs. The correlated SNPs that tend to be inherited together on a haplotype are said to be in linkage disequilibrium (LD); that is, they are not inherited independently of one another (independent inheritance is termed linkage equilibrium).
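One way to make the idea of correlated SNPs concrete is to compute the squared correlation (r²) between the allele dosages of two SNPs. The sketch below is an approximation under stated assumptions: LD is formally defined on haplotypes, whereas this uses unphased dosages, and the three dosage vectors are invented toy data for eight people.

```python
from statistics import mean


def r_squared(x, y):
    """Squared Pearson correlation between two dosage vectors."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)


# Hypothetical minor allele dosages for eight people at three SNPs.
snp1 = [0, 1, 2, 0, 1, 2, 0, 1]
snp2 = [0, 1, 2, 0, 1, 2, 0, 2]  # differs from snp1 in one person: strong LD
snp3 = [1, 1, 0, 2, 1, 1, 0, 2]  # largely unrelated to snp1: weak LD

print(round(r_squared(snp1, snp2), 3))
print(round(r_squared(snp1, snp3), 3))
```

In this toy example, snp1 and snp2 would sit on the same haplotype block, so either one could serve as a proxy for the other; snp3 carries little information about snp1.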
Although varying, haplotype blocks average ~50kbp in size and contain an average of 50 SNPs (The International HapMap Consortium, 2005). With 50 SNPs, the number of potential combinations of SNPs on a haplotype block is, of course, 2^50; however, most variation in most haplotype blocks is characterized by five or fewer haplotypes (i.e., combinations of SNPs) (The International HapMap Consortium, 2005; Strachan & Read, 2018). A particular combination of SNPs on a block is known as a haplotype. Notably, haplotypes are defined by common variants. Newer, rarer variation exists as heterogeneity around the common, ancient SNPs that define haplotype blocks.
The GWAS methodology relies on the correlated haplotype structure of our genome. Knowledge of the patterns of LD throughout the genome has informed the design of efficient genotyping arrays or “SNP chips” used to measure preselected variants. Rather than examining every nucleotide across the genome, which is impracticable, GWASs employ SNPs selected to capture or proxy a region of common variation—hence, the name ‘tag SNPs’ (Carlson et al., 2004). In other words, GWASs aim to identify trait-associated loci defined by common variants using tag SNPs.
3.2.2. Statistical Model
GWASs employ standard econometric models (at scale) to regress a phenotype (a measurable disease, trait, or behavior, such as T1 diabetes or years of educational attainment) on each SNP one at a time, typically net of controls for age, sex, and genetic ancestry/similarity13 (discussed more below). GWAS regressions provide slope estimates or effect sizes for each SNP, which indicate whether there are systematic differences in allele frequencies between groups stratified by trait values or case-control status (e.g., years of education, disease cases versus controls). Because GWASs analyze SNPs one at a time, the effect size estimate for each SNP represents the effect of that SNP on the trait along with the effects of all the variants correlated with and thus tagged by the SNP (as a function of the strength of their correlation).
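The one-SNP-at-a-time logic, and the way a tag SNP absorbs the effect of a correlated causal variant, can be illustrated with a deliberately simplified sketch. It assumes a noiseless toy phenotype driven entirely by one hypothetical causal SNP, omits the usual covariates (age, sex, ancestry), and uses plain least-squares slopes rather than any particular GWAS software; all names and numbers are made up.

```python
from statistics import mean


def ols_slope(x, y):
    """Slope from a simple linear regression of y on x."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sxx


# Hypothetical minor allele dosages for eight people.
snps = {
    "causal": [0, 1, 2, 0, 1, 2, 0, 1],
    "tag": [0, 1, 2, 0, 1, 2, 0, 2],       # in strong LD with the causal SNP
    "unlinked": [0, 2, 0, 1, 1, 2, 2, 0],  # uncorrelated with the causal SNP
}

# Toy phenotype: each copy of the causal minor allele adds 0.5 units (no noise).
phenotype = [10 + 0.5 * d for d in snps["causal"]]

# GWAS-style scan: regress the phenotype on each SNP separately.
betas = {name: ols_slope(d, phenotype) for name, d in snps.items()}
for name, beta in betas.items():
    print(name, round(beta, 3))
```

The tag SNP has no effect of its own in this setup, yet it receives a sizable slope (about 0.417) purely because it is correlated with the causal SNP, while the unlinked SNP receives a slope of zero.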
As noted, SNP associations with traits are almost always relatively weak, necessitating enormous sample sizes for statistical power with adjustments for multiple testing. Typically, the genome-wide significance level for GWASs is p < 5 × 10⁻⁸, corresponding to a Bonferroni adjustment of α = 0.05 for one million independent hypothesis tests. Most GWASs, called standard GWASs (a.k.a., population-based GWASs), are conducted on a sample of conventionally unrelated persons (contrasted with within-family GWASs discussed later). Most social science GWASs are sample-size weighted meta-analyses. For example, the recent prominent EA4 GWAS included ~10 million SNPs and >3 million individuals of “European genetic ancestries” from 71 cohorts/samples that were (meta14) meta-analyzed (Okbay et al., 2022).
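The arithmetic behind the genome-wide threshold, and one common form of sample-size weighting, can be sketched as follows. The weighting shown (combining cohort z-scores with weights proportional to the square root of each cohort's sample size) is an assumption about what "sample-size weighted" means here, not a description of any cited study's pipeline; the cohort z-scores and sample sizes are invented.

```python
import math

# Bonferroni adjustment: alpha of 0.05 spread over one million independent tests.
GENOME_WIDE_ALPHA = 0.05 / 1_000_000


def meta_z(z_scores, sample_sizes):
    """Combine per-cohort z-scores using square-root-of-sample-size weights."""
    numerator = sum(math.sqrt(n) * z for z, n in zip(z_scores, sample_sizes))
    return numerator / math.sqrt(sum(sample_sizes))


def two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical results for one SNP in three cohorts.
cohort_z = [3.0, 4.0, 3.5]
cohort_n = [10_000, 40_000, 22_500]

z = meta_z(cohort_z, cohort_n)
p = two_sided_p(z)
print(f"meta z = {z:.3f}, p = {p:.2e}, genome-wide significant: {p < GENOME_WIDE_ALPHA}")
```

In this toy case, no single cohort reaches the genome-wide threshold on its own, but the pooled evidence does, which is one reason GWASs of weakly associated SNPs rely on meta-analysis across many cohorts.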
The focal results from a GWAS are the SNP beta coefficients (the effect size estimate for each SNP) and their associated p-values. These estimates are compiled by advanced computational software into results known as summary statistics, which contain no individual information and thus can be freely shared. These summary statistics are then used for downstream analyses, including PGIs.
3.2.3. Interpretation and Challenges
As is hopefully clear, GWASs do not measure genes and typically do not identify causal variants. Rather, GWASs employ tag SNPs—most of which have no known or expected biological function—to identify trait-associated loci (QTLs) where the causal variant(s) is presumed to lie.
Although GWASs were initially motivated by the aim of moving from a SNP or QTL to a causal variant acting on a gene with a defined function, this (functional annotation) is typically a fraught endeavor for several reasons. For one, the QTLs identified by SNPs can be large (>1Mbp) and contain dozens of genes, hundreds of SNPs, and several hundred thousand variants. Various fine-mapping and functional follow-up analyses can be employed to try to identify causal variants and genes; however, these methods are complicated and based on incomplete knowledge of gene function. Thus, fine-mapping results are often ambiguous, especially for biologically distal complex social traits (Nicholls et al., 2020; Schaid et al., 2018).
In short, GWASs provide only crude information about SNP-trait associations and trait-associated loci rather than information on causal variants, genes, or biological systems in which they act (see also, Burt 2023b). Moreover, SVs and rare variants are not well tagged by SNPs (Backman et al., 2021; Tam et al., 2019); thus, these disproportionately functional and deleterious variants exist as additional genetic variation, which is not adequately covered.
Rather than identifying specific causal mechanisms, a major goal of GWASs is estimating the direct genetic effects of variants at a given tagged locus, defined as the causal path from an individual’s genotype to their trait value through some usually unknown biological mechanisms (#1 and #2 in Box 1). Direct genetic effects can be understood as variant substitution effects:
“as the (counterfactual) change in an individual’s phenotype as a result of changing that individual’s genotype from conception (holding all else constant). … The mechanism that cascades from a variant substitution may be entirely molecular, for example, altering gene expression that leads to disease, or it may be more complex and external, for example, influencing behavior that leads to environmental changes that, in turn, influence the phenotype. In both cases, there is a causal path from an individual’s genotype to their phenotype that reflects a counterfactual model”
(Morris et al., 2020, p.1; also, Freese, 2008).
Box 1. Sources of SNP-Trait Associations.
(1) Direct Genetic Effect: The SNP itself is causally (functionally) associated with the trait (uncommon; most SNPs are non-functional).
(2) Direct Genetic Effect Proxy: The SNP tags the effects of a nearby causal variant(s) with which it is correlated (in LD). This is the effect GWASs are designed to detect.
(3) Genetic Confounding: The SNP tags the effects of causal variants across the genome (not in proximity) with which it is correlated (long-range LD) due to population stratification or assortative mating (AM). This is genetic confounding of GWAS estimates.
(4) Environmental Confounding: The SNP captures or reflects different trait-associated social-environmental exposures experienced by population subgroups (substructure) that are more strongly related (i.e., population stratification). This is environmental confounding of GWAS estimates.
(5) ‘Social Genetic Effect’: The SNP captures the genetic effects of a given variant/QTL in relatives who influence us socially (i.e., social ‘genetic’ effects).
Crucially, the interpretation of standard GWAS estimates as direct genetic effects of variants at a given locus requires the randomization of variants across genetic and (trait-relevant) environmental backgrounds (Veller & Coop, 2023; Weir, 2008). Of course, genotypes are not randomized across genetic or environmental backgrounds. Instead, people who are more genetically similar (or closely related, even distantly) tend to develop in more similar sociopolitical, cultural, and physical environments. Consequently, standard GWAS estimates (and thus PGIs) are often environmentally confounded by associative effects from parents, siblings, and even peers; biases from assortative mating of parents; and sociocultural and physical environmental influences that are correlated with genetic background (Abdellaoui et al., 2022; Lawson et al., 2020; Veller & Coop, 2023; Young et al., 2019). The potential for substantially confounded estimates is particularly acute for complex social traits. The different sources of genotype-phenotype associations are summarized in Box 1.
In the next section, I introduce the population genetic phenomena that can distort (disproportionately inflate) estimates of direct genetic effects from standard GWASs (and PGIs derived from them), which foregrounds our subsequent discussion of necessary adjustments when the aim is to estimate genetic (versus environmental) effects as is common. Our discussion is necessarily condensed; however, the multiple complex and interrelated sources of confounding—population (sub)structure, assortative mating, ‘social genetic effects’, including familial/dynastic and cultural transmission—have been and continue to be extensively discussed (see e.g., Abdellaoui et al., 2022; Burt, 2023b; Coop & Przeworski, 2022b; Feldman & Ramachandran, 2018; Morris et al., 2020; Richardson & Jones, 2019; Rosenberg et al., 2019; Veller & Coop, 2023; Young et al., 2019).
Importantly, not all utilizations of PGSs are based on their capturing only (direct) genetic effects or genetic (versus environmental) effects. Thus, the extent to which the environmental confounding of PGSs is a limitation depends on study aims (e.g., Boardman & Fletcher, 2021; Fletcher, 2023; Plomin & von Stumm, 2022). Currently, PGSs are typically employed in social science research to capture genetic (versus environmental) influences, which is why understanding sources of confounding and adjustments is important (Boardman & Fletcher, 2021; Burt, 2023b).
4. Sources of Confounding of GWAS Estimates (& PGIs)
4.1. Population Stratification
Perhaps the most widely discussed source of bias in genetic association studies is population stratification, which refers to false-positive or inflated genotype-phenotype associations due to (uncorrected) population substructure (i.e., population structure + phenotype stratification) (Astle & Balding, 2009; Cardon & Palmer, 2003; Lander & Schork, 1994). Allele frequencies differ not only between broad population groups (differences that are the basis of popular genetic ancestry tests) but also within ostensibly homogenous population subgroups due to non-random mating (i.e., the tendency to affiliate and mate with people sharing similar sociocultural backgrounds and in the same geographic region) in concert with random genetic drift and/or selection. These resulting systematic allele frequency differences between population subgroups, which reflect greater genetic similarity within subgroups, are known as population (genetic sub)structure. All populations have a structure, which often manifests as geographical structure (e.g., Abdellaoui et al., 2022; Haworth et al., 2019).
Uncontrolled population structure induces genetic confounding of GWAS associations when causal variants are structured. In such cases, non-causal structured SNPs will capture (and thus be biased by) the effect of structured causal variants spread across the genome, even when these variants are not in LD within a population (see Veller & Coop, 2023). Consequently, the effects of causal variants will be inflated (because they are counted multiple times), and spurious SNP-trait associations (and QTLs) will be induced by structured causal variants elsewhere in the genome (#3 in Box 1) (Astle & Balding, 2009).
Conversely, uncontrolled population structure generates environmental confounding when trait-associated environmental exposures (sociocultural, political, or physical/geographical) vary by subgroup membership. In such cases, structured variants will capture the effects of subgroup differential exposures to socio-environmental factors inducing spurious or inflated genetic associations (#4 in Box 1) (Morris et al., 2020; Young et al., 2019). The classic example used to illustrate population structure confounding is a GWAS of chopstick-eating skills (Hamer, 2000; Lander & Schork, 1994). “While there surely are genetic variants affecting our ability to handle chopsticks, most of the variation for this trait across the globe is due to environmental differences (cultural background), and a GWAS would mostly identify variants that had nothing to do with chopstick skills, but simply happened to differ in frequency between East Asia and the rest of the world” (Barton et al., 2019, p.2). For complex social traits of interest to social scientists, which are indelibly shaped by social structural and cultural processes, the specter of environmental confounding of GWAS and PGI estimates is significant.
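The logic of the chopstick example can be illustrated with a small simulation. The sketch below is a toy illustration, not an analysis from any cited study: it assumes two hypothetical subpopulations with different allele frequencies (0.2 and 0.8) at a non-causal SNP and a trait that depends only on a subgroup-level environmental exposure. All names and parameter values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000  # individuals per subpopulation (illustrative)

# Two subpopulations that differ in allele frequency at a non-causal SNP.
group = np.repeat([0, 1], n)
freq = np.where(group == 0, 0.2, 0.8)
genotype = rng.binomial(2, freq)  # allele counts 0, 1, or 2

# The trait depends only on a group-level environmental exposure
# (e.g., cultural background), never on the genotype itself.
trait = 1.0 * group + rng.normal(0.0, 1.0, size=2 * n)

def ols_slope(x, y, covariate=None):
    """Return the OLS coefficient on x, optionally adjusting for one covariate."""
    cols = [np.ones_like(y), x.astype(float)]
    if covariate is not None:
        cols.append(covariate.astype(float))
    design = np.column_stack(cols)
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1]

naive_slope = ols_slope(genotype, trait)            # structure ignored
adjusted_slope = ols_slope(genotype, trait, group)  # subgroup controlled

print(f"naive: {naive_slope:.3f}, adjusted: {adjusted_slope:.3f}")
```

In this toy setting the naive estimate is clearly positive even though the SNP has no effect, and it shrinks toward zero once subgroup membership is controlled. In real data, subgroup membership is not observed this cleanly, which is why the adjustments discussed later only partially remove such confounding.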
4.2. Social Genetic Effects/Associative Effects
A second source of environmental confounding is social genetic effects or associative effects (#5 in Box 1). Such ‘indirect genetic effects’15 occur when the phenotype of an individual is influenced by the (genetically influenced) phenotype of others with whom they are more strongly related (Griffing, 1967; Kemper et al., 2021; Lynch & Walsh, 1998). As we all know, (non-adopted) children inherit not only ½ of their parents’ genotypes but also their environments, which include parenting practices and interests, economic resources, culture, and the like. Associative effects include familial effects, which are associations between inherited SNPs and offspring phenotypes due to the variants’ effects on the parents’ phenotype (e.g., Kong et al., 2018; Young et al., 2019). Assume, for illustration, that variants in gene ADH1B associated with alcohol metabolism affect alcohol use. ADH1B variants in the parental generation may influence parental phenotypes (alcohol use) and alcohol-use-associated rearing environments (norms and customs around, even the availability of, alcohol), thereby influencing offspring alcohol use. Standard genetic associations will reflect both direct influences of ADH1B variants and social genetic effects from the parents’ phenotypes.
Importantly, associative effects are not limited to parenting practices and family environments but may also include the effects of intergenerational transmission of (dis)advantages, which are sometimes referred to as ‘dynastic effects’, as well as broader influences such as neighborhoods, social networks, and other forces (Abdellaoui et al., 2022; Brumpton et al., 2020; Veller & Coop, 2023; Young et al., 2019). In short, standard GWAS estimates, when interpreted as direct genetic effects, are usually confounded (typically inflated for social traits) by associative effects. Moreover, these (additive) biases may be exacerbated by gene x (correlated) environment interactions (Veller & Coop, 2023).
4.3. Assortative Mating
Another phenomenon that can bias genotype-phenotype associations is assortative mating—which occurs when individuals selectively mate based on phenotypic similarity and social homogamy (Morris et al., 2020). Humans mate selectively on both physical (height, BMI) and social (socioeconomic status, education) characteristics (Domingue et al., 2014; Robinson et al., 2017; Yengo et al., 2018). When selected phenotypes have a genetic component—as most do—then assortative mating induces increased genetic similarity, as a bundling of trait-associated alleles between mates that is greater than what would be observed under random mating (Morris et al., 2020; Veller & Coop, 2023).
Like population stratification, assortative mating can induce correlations between variants across the genome (long-range LD), which can bias estimates of the effect of genotype on phenotypes (#3 in Box 1) (Hartwig et al., 2018; Lynch & Walsh, 2018; Yengo et al., 2018). Additionally, assortative mating can exacerbate biases from population stratification, when it is subpopulation specific, and familial/dynastic effects, via disproportionate inheritance of the environment (see Morris et al., 2020). In short, assortative mating can inflate estimates of heritability and genetic correlations and, most importantly for our purposes, bias GWAS associations and PGIs (Brumpton et al., 2020; Howe et al., 2022; Kong et al., 2018; Yengo et al., 2018).
In sum, genetic associations only reflect direct genetic effects to the extent that there is no unmeasured (trait-relevant) population stratification, associative effects, or assortative mating. When these phenomena exist and are insufficiently controlled, as is invariably the case in standard GWASs of social traits, “estimates of genetic associations will be biased due to hidden correlations in the data and incorrectly attributed to genetic effects” (Morris et al., 2020, p.3).
5. Adjustments to Mitigate Confounding of GWAS Estimates
The potential for confounding of genetic associations has long been recognized, and thus a variety of methods have been developed to ameliorate confounding and isolate direct genetic effects (e.g., Cardon & Palmer, 2003; Kong et al., 2018; Lander & Schork, 1994; Price et al., 2006; Pritchard & Rosenberg, 1999).
5.1. Standard GWAS Adjustments
The most basic adjustment for relatedness confounding (as population structure) is analyzing a single broad (crude) ‘ancestral’ group. Although necessary in most cases, this adjustment is invariably insufficient because, as I have noted, fine-scale (recent) population substructure exists even within relatively homogenous samples from a single location, such as individuals of ‘European genetic ancestries’ from Britain, Finland, or Iceland (Cook et al., 2020; Haworth et al., 2019; Helgason et al., 2005; Kerminen et al., 2017). Importantly, the human population is not composed of distinct population subgroups, even as groups that are more genetically similar relative to other groups can invariably be identified (see note 12; also Barton et al., 2019; Coop & Przeworski, 2022). Thus, additional adjustments for population substructure are necessary.
Various ‘within-population’ adjustments to standard GWASs have been proposed over the past few decades, most of which are highly statistically sophisticated and computationally intensive. A thorough discussion of these options is decidedly out of scope (but see Appendix C for a brief introduction to PC adjustment, given their use in PGI studies). Briefly, common adjustments for confounding include preventative approaches, which adjust estimates by controlling for estimates of genetic ‘ancestries’ by including the top 10 to 20 principal components of ‘genetic ancestry’/similarity (Price et al., 2006) or mixed-linear modeling using the genetic relatedness matrix (Yang et al., 2014; Yu et al., 2006). Other downstream approaches, which aim to detect and correct for relatedness confounding after the fact (e.g., by adjusting/inflating standard errors), include methods such as linkage disequilibrium score regression (LDSC) (Bulik-Sullivan et al., 2015) and LDAK (Speed & Balding, 2019; Speed et al., 2012).
Key for our purposes, evidence suggests that these statistical techniques reduce but do not eliminate relatedness confounding (e.g., Berg et al., 2019; Dandine-Roulland et al., 2016; Haworth et al., 2019; Lawson et al., 2020; Mostafavi et al., 2020; Sohail et al., 2019; Veller & Coop, 2023; Zaidi & Mathieson, 2020). Specifically, these various adjustments partially control for some sources of confounding but not all forms and/or rely on assumptions that are likely to be violated in practice (e.g., Border et al., 2022; Veller & Coop, 2023). For example, one of the more common adjustment methods, LDSC, which was used in the recent EA3 and EA4 GWASs, aims to separate population stratification biases from causal genetic effects based on assumptions (e.g., no long-range LD) likely violated by population stratification (or assortative mating). Moreover, LDSC does not adjust for social genetic effects or assortative mating biases (Border et al., 2022; Bulik-Sullivan et al., 2015).
Thus, even with these adjustments, standard GWAS estimates of SNP effect sizes remain biased (disproportionately inflated) to some degree by genetic and environmental confounding. This persistence of relatedness confounding has spurred renewed emphasis on within-family methods.
5.2. Within-family GWASs
Standard GWASs, discussed above, analyze a sample of conventionally unrelated individuals to estimate population-based effect sizes. In contrast, within-family GWASs estimate allelic effects within families, controlling for shared genetic and environmental backgrounds (population stratification and parental/dynastic effects) via family fixed effects (see Veller & Coop, 2023; Young et al., 2019). Within-family GWASs come in different forms, but the most common is a sibling-difference (or sib-ship) GWAS, which takes advantage of the random allocation of genotypes to offspring within families. Sibship GWASs estimate sibling differences in trait values based on sibling differences in genotypes (as deviations from the mean sibling genotype), controlling for the mean sibling genotype (e.g., Howe et al., 2022; Lee et al., 2018).
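The contrast between a population-based estimate and a sibling-difference estimate can be sketched with simulated data. The example below is a toy illustration under stated assumptions, not the procedure of any cited study: a single SNP with a hypothetical true direct effect (`beta = 0.10`), two siblings per family, and a shared family environment that is correlated with parental genotypes. With exactly two siblings, regressing the sibling difference in the trait on the sibling difference in genotype is equivalent to a family fixed-effects model.

```python
import numpy as np

rng = np.random.default_rng(1)
n_fam = 20000   # number of sibling pairs (illustrative)
p = 0.5         # allele frequency (illustrative)
beta = 0.10     # hypothetical true direct genetic effect

# Each parent carries two alleles (coded 0/1) at the SNP.
mother = rng.binomial(1, p, size=(n_fam, 2))
father = rng.binomial(1, p, size=(n_fam, 2))

def transmit(parent):
    """Pick one of each parent's two alleles at random (Mendelian segregation)."""
    pick = rng.integers(0, 2, size=parent.shape[0])
    return parent[np.arange(parent.shape[0]), pick]

sib1 = transmit(mother) + transmit(father)
sib2 = transmit(mother) + transmit(father)

# Shared family environment that is correlated with parental genotypes
# (a stand-in for parental/dynastic effects and stratification).
parent_sum = mother.sum(axis=1) + father.sum(axis=1)
family_env = 0.15 * parent_sum + rng.normal(0.0, 1.0, n_fam)

y1 = beta * sib1 + family_env + rng.normal(0.0, 1.0, n_fam)
y2 = beta * sib2 + family_env + rng.normal(0.0, 1.0, n_fam)

def slope(x, y):
    """Simple regression slope of y on x."""
    x = x - x.mean()
    y = y - y.mean()
    return (x @ y) / (x @ x)

# Population-based estimate: pools individuals and ignores family membership.
pop_est = slope(np.concatenate([sib1, sib2]), np.concatenate([y1, y2]))
# Sibling-difference estimate: the shared family environment cancels out.
sib_est = slope(sib1 - sib2, y1 - y2)

print(f"population: {pop_est:.3f}, within-family: {sib_est:.3f}, true: {beta}")
```

In this simulation the population-based estimate absorbs the family environment and overstates the direct effect, while the sibling-difference estimate recovers a value close to `beta` at the cost of a larger standard error.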
Consistent with evidence from other methods, comparisons of sibship estimates to those from standard GWASs suggest both that confounding of standard GWAS estimates of direct genetic effects persists even with adjustments and that this confounding can be substantially mitigated by the estimation of allelic effects within families (Choi et al., 2020; Veller & Coop, 2023; Young et al., 2019). Available evidence suggests that bias from residual confounding is particularly acute for several complex social traits, including, inter alia, educational attainment, number of children, depressive symptoms, and age at first birth (Howe et al., 2022). Thus, for many complex social traits, within-family GWAS estimates tend to be substantially attenuated, by as much as one-half to two-thirds (Brumpton et al., 2020; Howe et al., 2022; Lee et al., 2018; Mostafavi et al., 2020). However, for some traits, particularly more proximal biological or clinical/biomedical traits, within-family GWAS estimates tend to more closely resemble standard GWAS estimates. For illustration, Howe et al. (2022) found that the effect size reduction for more proximal biological traits like height or BMI was moderate (<15%) and minimal to non-existent for ‘clinical phenotypes’ like C-reactive protein, cholesterol (HDL and LDL), glomerular filtration rate (eGFR), and similar.
To be sure, within-family GWASs are not a panacea. Although they largely reduce confounding, within-family GWASs do not eliminate environmental confounding due to, for example, sibling indirect effects or genetic confounding, such as intra-chromosomal long-range LD from non-random mating (see Fletcher & Boardman, 2021; Veller & Coop, 2023). Moreover, genotype-genotype (G x G) and genotype-environment (G x E) interactions further complicate matters (Veller & Coop, 2023). What is more, within-family studies have conceptual/generalizability issues. That is to say, the question “what causes differences between siblings in the same family?” is not the same as “what causes differences between unrelated people in the population?” (see Burt, 2023a; 2023b; Coop & Przeworski, 2022; Curtis, 2023). In sum, as Veller and Coop (2023) conclude, family-based studies are “a clear step forward towards quantifying genetic effects…[but] these designs come with their own set of caveats” (p.29).
Importantly for our purposes, almost all PGIs are based on standard GWASs. As more effort is made to genotype families in concert with the development of novel methods (e.g., Young et al., 2022), this may begin to change. For now, readers should note that residual confounding in GWAS estimates for (most to all) complex social traits persists. Moreover, with the increased scale and statistical power of GWASs, both the potential for and degree of bias may be exacerbated, as even subtle confounding may be picked up in increasingly enormous GWASs (Morris et al., 2020). Furthermore, the potential for bias is heightened in PGIs. As Berg et al. (2019) explain:
“…while the bias in detection and effect sizes at any individual locus is small, the systematic nature of biases across many loci compound to significant errors at the level of polygenic scores. This error substantially inflates the proportion of the variance in polygenic scores that is among populations. Individual level prediction efforts therefore suffer dramatically from stratification bias, as even small differences in ancestry will be inadvertently translated into large differences in predicted phenotype” (p.14; also, Haworth et al., 2019).
5.3. Summary
For GWAS associations and PGIs to capture unconfounded direct genetic effects, there must be no uncontrolled biases from population stratification, assortative mating, or social genetic effects. For pre-made PGIs, standard GWASs are typically used, and available evidence suggests that substantial residual biases persist for complex social phenotypes. Thus, standard PGIs are not appropriately interpreted as reflecting only direct genetic effects or “genetic versus environmental effects”. As Coop and Przeworski (2022) note: “PGS for traits based on standard GWAS are not estimates of direct genetic effects alone, and their predictive power derives from all these effects [population stratification, assortative mating, or social genetic effects] combined”.
In the preceding discussions, I have sought to introduce readers to genomics terminology, conceptual and statistical models, and challenges in estimating genetic associations aggregated in PGIs. In what follows, I link these discussions to constructing and interpreting PGIs for social science, focusing on best practices and potential pitfalls.
6. Guidance and Recommendations for PGI Studies
In this section, I discuss options and propose a set of best practices for PGI studies for social scientists. Basically, I translate standard best practices for social science research to PGI studies. Notably, these recommended best practices do not reflect a consensus among scholars, even as these are well-founded and echo existing recommendations from adjacent fields (e.g., Wand et al., 2021). There is never a one-size-fits-all set of standards for quantitative studies, and PGI studies are no exception. I offer this guide as a helpful starting point for evaluation and interpretation. These recommendations can and should be revised and expanded as knowledge develops and new methods emerge.
A summary of these best practices can be found in Table 1 as a checklist separated into three rough categories: descriptive, technical, and interpretive. Notably, I present the checklist in the order that will be useful for reading and evaluating PGI studies, but I discuss these issues in a different order. Specifically, I discuss technical issues before descriptive ones because explaining the technical adjustments is necessary to understand the rationale for the descriptive recommendations.
Table 1. Summary Checklist for Evaluating Research with PGIs.
Note: Gray font denotes information that need not be repeated in every study (e.g., can reference technical reports).
This coverage is not intended to be exhaustive. Specifically, I do not cover standard quality control (QC) practices involved prior to the analysis of genetic data. Many of these practices are generally performed by statistical geneticists before the genetic data are made available to social scientists and are quite complex, making adequate coverage impracticable here.16 I also do not cover issues related to conventional measurement error or an approach for dealing with PGI measurement error (see, e.g., Becker et al., 2021; Pingault et al., 2022).17 I also reiterate that this is not intended as a “how to do PGI studies” guide (see Choi et al., 2021; Mills et al., 2020); instead, this is a guide to aid social scientists in interpreting and evaluating these studies.
In what follows, I highlight descriptive and technical information for studies using pre-made PGIs. In some cases, existing articles or reports provide certain details (identified with gray font in Table 1), which need not be repeated in every study using these PGIs. In these cases, these reports can be referenced in lieu of repeating information insofar as these sources contain sufficient information, remain up to date, and are accessible.18
6.1. Reporting and Technical Best Practices
Recall, PGIs are individual summary scores created as the sum of the count of minor alleles (0, 1, or 2) for each SNP weighted by its strength of association with the outcome estimated from GWASs.
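As a concrete illustration of this definition, the sketch below computes an unadjusted PGI as a weighted sum of allele counts, that is, the sum over SNPs of (allele count × GWAS weight) for each individual. The genotype matrix and weights are hypothetical toy values, and standardizing the score to mean 0 and SD 1 is shown only because it is a common convention, not a required step.

```python
import numpy as np

# Hypothetical toy data: 4 individuals x 5 SNPs, allele counts coded 0, 1, or 2.
genotypes = np.array([
    [0, 1, 2, 0, 1],
    [1, 1, 0, 2, 0],
    [2, 0, 1, 1, 1],
    [0, 2, 2, 0, 2],
])

# Hypothetical GWAS weights (per-allele effect estimates), one per SNP.
weights = np.array([0.02, -0.01, 0.03, 0.00, 0.015])

# PGI for each individual: sum over SNPs of allele count x weight.
raw_pgi = genotypes @ weights

# Standardize within the analytic sample (mean 0, SD 1).
standardized_pgi = (raw_pgi - raw_pgi.mean()) / raw_pgi.std()

print(raw_pgi)
```

Everything that follows in this section concerns how the weights are estimated and adjusted before this sum is taken, and how the resulting score is analyzed.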
As with other statistical analyses, analyses with PGIs rely upon several methodological assumptions, involve technical adjustments, and can be constructed with various methods and parameter specifications. A key assumption of PGIs is that genetic influences proxied by SNPs act additively and can be analyzed as a (weighted) linear composite. The technical specifications can be classified across three stages: GWAS estimation, PGI calculation, and study analyses with PGSs as a variable. These technical features are summarized in section B of Table 1.
6.1.1. GWAS Adjustments/Technical Specifications
When PGIs are used to reflect genetic (versus environmental) influences, as is common, how the GWAS adjusted for population stratification should be specified (see B.1 in Table 1).19 As noted, common options for mitigation include controlling for 10 to 20 genetic ‘ancestry’ PCs, linear mixed modeling, LD-Score Regression (LDSC), and, in a few cases, PC controls for common and rare variants or IBD segments, which more adequately capture recent, fine-scale structure (Bulik-Sullivan et al., 2015; Price et al., 2006; Zaidi & Mathieson, 2020).
Within-family GWASs are another option. As noted, this approach corrects for the bulk of confounding and social genetic effects. However, within-family GWASs remain limited in size. Thus, reduced bias comes at the cost of lower statistical power and precision. Given this, some scholars have suggested a two-stage process in which a traditional unrelated GWAS is estimated (with standard PS controls) to determine p-values, with effect size estimates adjusted downward based on the results from the sibling-based GWAS (e.g., Choi et al., 2020). This approach, which reduces but does not eliminate stratification biases (Zaidi & Mathieson, 2020), is not common in practice, even in GWAS studies that laudably replicate standard GWAS with a sibship GWAS to estimate confounding (and thereby creating the summary statistics to create such PGSs (e.g., Lee et al., 2018)). In short, given the different approaches to adjusting for confounding and distinct strengths and weaknesses, the adjustment method(s) used to create the weights on which a PGI is based is key information that should be specified (better) or easily accessible (satisfactory).
6.1.2. PGS Calculation (Adjusting for Uncertainty and LD in PGS)
PGS construction should involve adjustments to account for uncertainty in estimates due to sample-specific features (Janssens, 2019). As GWASs are data-mining techniques estimated on specific samples drawn from finite subsets of the human population, like all inferential statistics estimates, “the resulting SNP effect size estimates are some combination of true effect and stochastic variation” (Choi et al., 2020). This produces sample-specific error, including what is known as ‘winner’s curse’ among the strongest associations with (inflated) estimated effects that may not generalize well to the population (Shi et al., 2016; Zöllner & Pritchard, 2007). As Choi et al. (2020) note: “Given that SNP effects are estimated with uncertainty, and since not all SNPs [included in a GWAS] affect the trait under study, the use of unadjusted effect size estimates of all SNPs could generate poorly estimated [PGIs] with high standard error[s].” For the typical goal of creating non-sample-specific genetic predictors, this uncertainty should be taken into account (see Becker et al., 2021).
The second complication is the non-independence of (i.e., LD between) SNPs. Although GWASs examine each SNP one at a time, as discussed, SNPs are not inherited independently but tend to be inherited together on chromosomal blocks (haplotype blocks), producing strong correlations between SNPs in proximity (i.e., high LD). The result is a cluster of SNPs in a region associated with the outcome (creating the ‘skyscraper’ look in the Manhattan plots used to depict GWAS findings; see Appendix B Figure 1). Simply aggregating these SNP effects would double count the same effect(s) driving the observed association. Using a social analogy, this would be akin to creating a predictive measure for street crime, which summed together the weighted effect sizes from separate regressions of a variety of items on street crime, including the following: Are you aged 12-25? Are you aged 14-28? Are you aged 13-18? Are you aged 10-21?
Thus, PGS calculations should involve adjusting GWAS effect sizes for uncertainty/winner’s curse and LD (see Table 1, B2). Surprisingly, some pre-made PGIs, such as those in the HRS and Add Health, do not account for these issues and simply compute what might be called an ‘unadjusted PGI’ (Braudt & Harris, 2020; Ware et al., 2021).
There are two basic approaches for these adjustments, which are automated in various software programs (PRSice, lassosum, LDpred) based on parameter specifications. In what follows, I explain the process and parameter specifications for these adjustments. Readers uninterested in these details might utilize the summary in Table 1.B2 and skip to section 6.1.3. Conversely, I refer readers interested in even more detail to Pain et al. (2021).20
6.1.2.1. Traditional Approach: Clumping/Pruning and Thresholding.
The traditional method of adjusting for these complications in PGI generation is known as (LD) clumping/pruning and (p-value) thresholding (P&T or C+T).21 Clumping or Pruning is an iterative process aiming to thin (remove) correlated SNPs in proximity to avoid double counting the same effect. Clumping requires the specification of two parameters: (a) a threshold of LD for non-independence and (b) the distance around a given SNP defining proximity. As with various standard social science thresholds, there is no agreed-upon standard for either of these two parameters, which often vary across studies.22
The pruning process involves aggregating SNPs into LD clumps (based on specified thresholds); these clumps are then thinned by retaining SNPs with the lowest p-values. The result is that among the SNPs in a clump, only the most strongly associated with the outcome is retained, and retained SNPs are “approximately uncorrelated” or “approximately independent” of each other and are referred to as lead SNPs. For illustration, in the EA4 GWAS, out of the ~10 million SNPs analyzed, 225,933 SNPs were genome-wide significant (p < 5×10⁻⁸); after pruning, 3,952 ‘approximately independent’ lead SNPs remained (Okbay et al., 2022).23
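The clumping logic described above can be sketched as a simple greedy procedure. This is a minimal illustration, not the implementation used by any particular software package: the function name `clump`, the default LD threshold (r² ≥ 0.1), and the default window (250 kb) are assumptions chosen for the example, and, as noted, there is no agreed-upon standard for either parameter.

```python
import numpy as np

def clump(pvalues, positions, r2, r2_threshold=0.1, window_bp=250_000):
    """Greedy LD clumping; returns the sorted indices of retained lead SNPs.

    pvalues:   (m,) GWAS p-values
    positions: (m,) base-pair positions on a single chromosome
    r2:        (m, m) squared correlations (LD) between SNPs
    """
    order = np.argsort(pvalues)                 # most significant SNP first
    removed = np.zeros(len(pvalues), dtype=bool)
    leads = []
    for i in order:
        if removed[i]:
            continue                            # already absorbed into a clump
        leads.append(int(i))
        near = np.abs(positions - positions[i]) <= window_bp
        in_ld = r2[i] >= r2_threshold
        removed |= near & in_ld                 # drop the clump (marks i as handled)
    return sorted(leads)

# Hypothetical toy input: SNPs 0-2 form one LD block, SNPs 3-4 another.
pvalues = np.array([1e-6, 1e-9, 1e-4, 0.03, 0.2])
positions = np.array([1_000, 2_000, 3_000, 900_000, 901_000])
r2 = np.zeros((5, 5))
r2[:3, :3] = 0.8
r2[3:, 3:] = 0.9
np.fill_diagonal(r2, 1.0)

lead_snps = clump(pvalues, positions, r2)
print(lead_snps)
```

In this toy input, one lead SNP is retained per LD block: the SNP with the smallest p-value in each block.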
Most PGIs are created using summary statistics (rather than raw genotype data), which lack information on LD. In these cases, LD is estimated using an external (reference) panel of similar ancestries. This can create variation in estimates and, in some cases, bias if the samples are not a good match (Bulik-Sullivan et al., 2015). Given this, information on what LD reference panel(s) is used and any prominent sample differences should be noted in technical descriptions on PGI construction.
Following pruning, p-value thresholding involves excluding SNPs that do not meet some p-value threshold. These thresholds adjust for uncertainty by reducing the effects of excluded SNPs to zero. In practice, p-value thresholds are not determined a priori but are selected based on results (highest incremental R²) after testing several p-value thresholds, from genome-wide significant (p < 5×10⁻⁸) to p < 1 (i.e., all SNPs are included). Almost always, the p < 1 (no p-value threshold) PGI is chosen. Notably, these ‘all-SNP’ PGIs are not corrected for uncertainty and tend to contain more bias and sample-specific noise than those created with p-value thresholds (Berg et al., 2019; Burt, 2023b; Kerminen et al., 2017; Mostafavi et al., 2020; Zaidi & Mathieson, 2020).
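The thresholding step and the incremental R² criterion can be sketched as follows. This is a toy illustration with simulated data: the function names, effect sizes, p-values, and the single covariate (age) are all hypothetical, and incremental R² is computed here as the gain in R² from adding the PGI to a covariates-only linear model. For simplicity, SNPs are retained when their p-value is at or below the threshold.

```python
import numpy as np

def threshold_weights(betas, pvalues, p_threshold):
    """Set the weight of every SNP with p-value above the threshold to zero."""
    return np.where(pvalues <= p_threshold, betas, 0.0)

def r_squared(predictors, y):
    """R-squared from an OLS regression of y on the predictors plus an intercept."""
    design = np.column_stack([np.ones(len(y)), predictors])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    return 1.0 - resid.var() / y.var()

def incremental_r2(y, covariates, pgi):
    """Gain in R-squared from adding the PGI to a covariates-only model."""
    base = r_squared(covariates, y)
    full = r_squared(np.column_stack([covariates, pgi]), y)
    return full - base

# Hypothetical target sample: 4 lead SNPs, two of which truly affect the trait.
rng = np.random.default_rng(2)
n = 2000
genotypes = rng.binomial(2, 0.3, size=(n, 4))
age = rng.normal(50.0, 10.0, size=n)
y = genotypes @ np.array([0.5, 0.0, -0.3, 0.0]) + 0.02 * age + rng.normal(0.0, 1.0, n)

# Hypothetical GWAS estimates and p-values from an external discovery sample.
betas = np.array([0.5, 0.1, -0.3, 0.05])
pvalues = np.array([1e-9, 0.04, 1e-5, 0.6])

results = {}
for p_threshold in (5e-8, 1e-3, 1.0):
    pgi = genotypes @ threshold_weights(betas, pvalues, p_threshold)
    results[p_threshold] = incremental_r2(y, age, pgi)

for p_threshold, value in results.items():
    print(f"p <= {p_threshold:g}: incremental R2 = {value:.4f}")
```

Because the threshold is chosen by comparing these values after the fact, the selected PGI is tuned to the sample in which the comparison is made, which is one reason the choice of threshold should be reported.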
6.1.2.2. Alternative/More Sophisticated Approaches.
In recent years, new adjustment approaches have been developed. These include statistical regularization techniques, such as LASSO or ridge regression (Mak et al., 2017)—which are not as commonly used and thus are not discussed here—and Bayesian approaches that perform shrinkage based on various assumptions about the genetic architecture of the trait (see Choi et al., 2020). Among the most common contemporary methods for adjusting PGS construction is the Bayesian approach, LDpred (Vilhjálmsson et al., 2015), used in the recent EA3 and EA4 studies (see, e.g., Lee et al., 2018; Okbay et al., 2022).
Like other methods, LDpred estimation requires the specification of several parameters: the LD radius (i.e., the number of SNPs adjusted for around a given SNP) and an assumption about the genetic architecture of the trait (i.e., the fraction of SNPs that are causal and the distribution of effect sizes). In practice, most applications of LDpred apply an infinitesimal prior with a Gaussian distribution. In other words, most PGSs constructed with LDpred assume that all SNPs have causal effects, and the effect sizes are normally distributed.24 In these cases, estimates are adjusted by a uniform Bayesian shrink factor, which is a function of the heritability, the number of SNPs, and the sample size.25 The default LD radius in LDpred, selected because it “works well in practice”, is roughly 2-Mb and is calculated based on the number of SNPs as follows: M/3,000, where M is the total number of SNPs (Vilhjálmsson et al., 2015, p. 579). Under the (likely violated in the presence of assortative mating or population stratification) assumption that SNPs outside of the LD radius are unlinked and using an external reference panel, LDpred adjusts SNP estimates for non-independence based on the estimated amount of genetic variation tagged by a given SNP, known as the LD Score (see Bulik-Sullivan et al., 2015; Vilhjálmsson et al., 2015).
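For readers who find a formula helpful, the uniform shrinkage can be written out for the simplified case of unlinked SNPs with standardized genotypes and phenotypes. The notation is an assumption made for illustration (beta-hat for the GWAS estimate of SNP j, h² for the SNP heritability, M for the number of SNPs, N for the GWAS sample size); the first expression follows the infinitesimal-model result reported by Vilhjálmsson et al. (2015), and the numerical values in the worked lines are hypothetical.

```latex
% Posterior mean effect under the infinitesimal (Gaussian) prior, unlinked SNPs
\[
  \mathbb{E}\left[\beta_j \mid \hat{\beta}_j\right]
    = \frac{h^2}{h^2 + M/N}\,\hat{\beta}_j
\]
% Worked example with hypothetical values: h^2 = 0.2, M = 10^6, N = 10^6
\[
  \frac{0.2}{0.2 + 10^6/10^6} = \frac{0.2}{1.2} \approx 0.17
\]
% Default LD radius (in SNPs) for a hypothetical M = 1.2 \times 10^6
\[
  \frac{M}{3000} = \frac{1.2 \times 10^6}{3000} = 400
\]
```

Every estimate is multiplied by the same factor in this simplified case, so shrinkage is stronger when the number of SNPs is large relative to the sample size or when heritability is low.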
6.1.3. Analyses with PGIs
Whereas the prior section dealt with adjustments to the GWAS effect size estimates to be employed when creating a PGI (always in an external sample), this section deals with adjustments to consider when incorporating a pre-made PGI in a study. Specifically, I outline several basic PGI technical issues that should be specified and, where choices are made, justified. These guidelines, summarized in Table 1 B.3, are consistent with standards in non-genetic social science.
6.1.3.1. Controlling for technical artifacts.
Various quality control adjustments for technical artifacts are recommended for PGI studies insofar as there is variation within studies.26 These adjustments, summarized in B.3(a) of Table 1, include controls for genotyping center, genotyping platform, SNP chip, and batch effects. These technical characteristics are invariably considered in GWAS quality control (QC) procedures and ideally in GWAS studies, including meta-analyses (Laurie et al., 2010; Truong et al., 2022). However, since PGIs are constructed in a different sample, these sample-specific features should be controlled in PGI studies when variable. In short, when variable, I encourage researchers to include controls for batch and/or study-site effects along with the SNP chip used in genotyping. When differences exist and are not controlled, I encourage scholars to recognize this as a potential limitation and consider whether it is relevant to the study findings.
6.1.3.2. Adjusting for relatedness confounding.
As with GWASs, PGI analyses based on PGIs from standard GWASs need to be adjusted for relatedness confounding. As discussed, among the most significant challenges for estimates of complex social traits are uncontrolled population structure and social genetic effects.
GWASs are usually conducted on a more ‘ancestrally’ homogenous population (i.e., individuals who are more genetically similar), which consists of individuals of “European genetic ancestries” nearly 80% of the time (Martin et al., 2019). As has been extensively discussed elsewhere, PGIs have limited portability or generalizability, such that their predictive power is weaker, sometimes markedly attenuated, in samples of individuals who are less genetically similar to the GWAS sample (or whose genetic ancestries differ substantially) (e.g., Martin et al., 2017; Martin et al., 2019). For example, the EA4 LDpred PGS explained 15.8% and 12% of the variation in individuals of “European genetic ancestries” from the Add Health and HRS samples, respectively. However, among individuals of “African genetic ancestries” from the same samples, the variance explained was only 2.3% and 1.3% (Okbay et al., 2022; also, Curtis, 2018; Kim et al., 2018; Ware et al., 2017).
This limited “portability” of PGIs is due to several factors, including different allele frequencies and LD patterns, arising due to divergent demographic histories and recombination events (Rosenberg et al., 2019). Populations may also have different frequencies of causal variants and, in some cases, different causal variants. Moreover, the effects of causal variants can also be influenced by different genetic background and environmental factors (Adhikari et al., 2019; Mostafavi et al., 2020). For these reasons, PGIs are typically examined separately in ostensibly homogenous populations of similar ‘genetic ancestries’; however, it bears repeating that these ‘ancestry’ groups are crude demarcations of an underlying continuum of relatedness rather than distinct groups. As Coop and Przeworski (2022) noted, “there is no bright line demarcating comparisons ‘within’ versus ‘between’ ancestries: there is a giant family tree of humanity, and people who share more ancestral paths through it than others, and more similar environments than others” (p.8).
6.1.3.2.1. Standard PGS Adjustments.
As with GWASs, additional controls are needed due to population substructure within ostensibly homogenous groups. Most PGI studies include controls for the top genetic ‘ancestry’ PCs (usually 5-20), often based on the SNP genetic relatedness matrix27 (Price et al., 2006). Typically, PCs are pre-constructed, available alongside pre-made PGIs, and recommended for use in datasets like the HRS, WLS, and Add Health. As with GWASs, including genetic ‘ancestry’ PCs based on common SNPs tends to reduce but not eliminate population structure confounding. The persistence of residual population stratification can manifest in the geographic clustering of PGIs, even after including PCs (e.g., Haworth et al., 2019; Kerminen et al., 2017; Lawson et al., 2020).
PGI studies often include minimal controls, e.g., only for age, sex, and ‘genetic ancestry’ PCs. Given that it is possible, even likely, that a PGI estimate is substantially biased for all the reasons discussed, I suggest the consideration of additional control variables, following others (Kerminen et al., 2017; Zaidi & Mathieson, 2020). Specifically, I recommend that studies present the adjusted (for genetic PCs in the model) correlations between the PGI and other common sociodemographic variables to assess potential confounding, especially birth region or coordinates of birth (or current region if birthplace is unavailable). When the focus is on PGIs as genetic effects and PGIs are associated with region/coordinates of birth, these geographic controls should be included in the analytic model(s).
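The PC-adjusted correlation recommended above can be sketched as a partial correlation. The following Python illustration is a minimal sketch, not a prescribed implementation: the function names (pearson, partial_corr), the toy values, and the use of a single PC are all assumptions made for the example. It relies on the standard recursive partial-correlation formula, which is practical only for a handful of controls; with the 5-20 PCs typical of PGI studies, residualizing both variables on the PCs by regression would be the more usual route.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def partial_corr(x, y, controls):
    """Correlation of x and y after linearly adjusting both for `controls`.

    Applies the recursive partial-correlation formula, removing one control
    at a time. The number of recursive calls grows quickly with the number
    of controls, so this is only suitable for a few of them.
    """
    if not controls:
        return pearson(x, y)
    z, rest = controls[-1], controls[:-1]
    r_xy = partial_corr(x, y, rest)
    r_xz = partial_corr(x, z, rest)
    r_yz = partial_corr(y, z, rest)
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Hypothetical toy data: one ancestry PC contributes to both the PGI and
# birth latitude; the remaining variation in each is unrelated to the other.
pc1       = [1, 1, -1, -1, 1, 1, -1, -1]
pgi       = [2, 0, 0, -2, 2, 0, 0, -2]
birth_lat = [2, 2, 0, 0, 0, 0, -2, -2]

raw = pearson(pgi, birth_lat)                    # unadjusted correlation
adjusted = partial_corr(pgi, birth_lat, [pc1])   # correlation net of PC1
print(f"raw r = {raw:.3f}; PC-adjusted r = {abs(adjusted):.3f}")
```

In this constructed example the raw correlation is entirely attributable to the shared PC, so the adjusted value is essentially zero. A PGI that remained correlated with birthplace after PC adjustment would be the pattern that signals residual stratification.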
When PGIs are incorporated as genetic influences, I recommend examining and presenting PGI correlations with other trait-relevant sociodemographic variables, including SES or income, parental education, or similar, even as I expect correlations with these variables for most social traits. To be clear, a PGI correlation with these variables does not automatically indicate confounding, but it can be informative. Where sample sizes are sufficiently large, scholars might also test whether age, sex, and SES interact with the PGI in predicting complex social outcomes, given that we would expect—and evidence suggests—that PGIs are differently predictive based on individual characteristics (see, e.g., Mostafavi et al., 2020), with p-value adjustments for multiple testing.
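The text above does not specify which multiple-testing adjustment to use. As one hedged illustration, the sketch below applies the Holm step-down procedure, which is my choice for the example rather than a method named in the source, to hypothetical p-values for three PGI interaction terms. The function name holm_adjust and the p-values are assumptions.

```python
def holm_adjust(p_values):
    """Holm step-down adjusted p-values, returned in the input order."""
    m = len(p_values)
    # Indices of the p-values from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The k-th smallest p-value (rank = k - 1) is multiplied by m - k + 1.
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Adjusted values may not decrease as the raw p-values increase.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted

# Hypothetical p-values for three interaction terms.
interaction_p = {"PGI x age": 0.01, "PGI x sex": 0.04, "PGI x SES": 0.03}
adjusted_p = holm_adjust(list(interaction_p.values()))
for name, p_adj in zip(interaction_p, adjusted_p):
    print(f"{name}: Holm-adjusted p = {p_adj:.3f}")
```

Holm controls the family-wise error rate and is never less powerful than a plain Bonferroni correction. Other adjustments (e.g., false discovery rate procedures) may be preferable depending on the number of tests and the study's goals.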
6.1.3.2.2. Within-family PGS Models.
Another approach to mitigating residual relatedness confounding of standard PGIs is the within-family approach (with its randomization of parental genotypes to offspring within families; Kong et al., 2018; Young et al., 2019). Like within-family GWASs, within-family PGI models come in different forms. The parent-offspring design involves the construction and inclusion of a parental non-transmitted PGI (i.e., a PGI composed of the non-transmitted alleles of one or, better, both parents) to adjust for parental genetics that are not passed on to their children (Kong et al., 2018; Young et al., 2018; Young et al., 2022). In educational attainment studies, controlling for parental non-transmitted alleles tends to reduce the estimated child PGS effect size by ~50% (e.g., R2 = 5% to 2.5%; Kong et al., 2018), which is about the same amount as that observed in adoption studies (e.g., R2 = .074% to .034%; Cheesman et al., 2020), suggesting that this method mitigates relatedness confounding.
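To make the construction of a non-transmitted PGI concrete, the sketch below assumes a fully genotyped mother-father-child trio with allele dosages coded 0, 1, or 2. Under that assumption, the number of parental effect alleles not passed on at each SNP is simply the parents' combined dosage minus the child's dosage. The function names, weights, and genotypes are hypothetical, and real pipelines (which typically work with phased or imputed data) are considerably more involved.

```python
def pgi(dosages, weights):
    """Weighted sum of effect-allele dosages (0, 1, or 2 per SNP)."""
    return sum(w * g for w, g in zip(weights, dosages))

def non_transmitted_dosages(mother, father, child):
    """Per-SNP count of parental effect alleles NOT passed to the child.

    The parents carry mother + father effect alleles in total, and the
    child inherited `child` of them, so the remainder was not transmitted.
    """
    nt = [m + f - c for m, f, c in zip(mother, father, child)]
    # Necessary (not sufficient) sanity check: each parent withholds exactly
    # one allele per SNP, so the non-transmitted count must lie in 0..2.
    if any(d < 0 or d > 2 for d in nt):
        raise ValueError("genotypes are inconsistent with a parent-offspring trio")
    return nt

# Hypothetical GWAS weights and trio genotypes for three SNPs.
weights = [0.10, -0.20, 0.05]
mother  = [1, 2, 0]
father  = [0, 1, 1]
child   = [1, 1, 1]

child_pgi = pgi(child, weights)
nt_pgi = pgi(non_transmitted_dosages(mother, father, child), weights)
print(f"child PGI = {child_pgi:.2f}; parental non-transmitted PGI = {nt_pgi:.2f}")
```

In the parent-offspring design described above, both scores would then be entered together as predictors of the child's outcome, so that the coefficient on the child's PGI is estimated net of the non-transmitted parental PGI.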
A second within-family method is a sibling-difference PGI study. Here, sibling differences in PGIs (i.e., deviations from the mean sibship PGI) are used to predict sibling differences in the outcome, controlling for the mean sibling PGI as a family fixed effect, and thereby controlling for shared genetic and environmental backgrounds, like a sibling-difference GWAS (Lee et al., 2018; Selzam et al., 2019).
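The sibling-difference logic can be sketched in a few lines. The example below is an illustration under simplifying assumptions (a single predictor, no covariates, no noise); the function names and the toy data are hypothetical. Deviations from each sibship mean are computed for both the PGI and the outcome, which is equivalent to including family fixed effects, and the slope is then estimated from those deviations alone.

```python
from collections import defaultdict

def pooled_slope(x, y):
    """Ordinary least-squares slope of y on x, ignoring family structure."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def within_family_slope(family_ids, x, y):
    """Slope of sibling deviations in y on sibling deviations in x,
    where deviations are taken from each sibship's own mean."""
    groups = defaultdict(list)
    for fam, a, b in zip(family_ids, x, y):
        groups[fam].append((a, b))
    sxy = sxx = 0.0
    for members in groups.values():
        if len(members) < 2:
            continue  # singletons carry no within-family information
        mean_x = sum(a for a, _ in members) / len(members)
        mean_y = sum(b for _, b in members) / len(members)
        for a, b in members:
            sxy += (a - mean_x) * (b - mean_y)
            sxx += (a - mean_x) ** 2
    return sxy / sxx

# Hypothetical data: outcome = 0.5 * PGI + a family-level environment
# (-2, 0, +2) that happens to track the family's average PGI.
family  = ["A", "A", "B", "B", "C", "C"]
pgi     = [-1.5, -0.5, -0.5, 0.5, 0.5, 1.5]
outcome = [-2.75, -2.25, -0.25, 0.25, 2.25, 2.75]

between = pooled_slope(pgi, outcome)
within = within_family_slope(family, pgi, outcome)
print(f"pooled slope = {between:.2f}; within-family slope = {within:.2f}")
```

In this constructed example the pooled slope is inflated by the family-level environment, whereas the within-family slope recovers the 0.5 used to generate the data. As the following paragraphs stress, this clean recovery depends on the toy setup and does not carry over automatically to real data, where the weights themselves come from a standard GWAS and sibling indirect effects may be present.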
As with within-family GWASs, results from within-family PGI studies are often interpreted as direct (causal) genetic effects. Although valuable for mitigating confounding, within-family PGI models do not eliminate all confounding. The most significant limitation is the use of allelic weights from standard GWASs. Some effects of population stratification, assortative mating, and associative effects are already built into the PGIs due to using weights from a standard GWAS (Veller & Coop, 2023). Furthermore, and analogous to within-family GWASs, conceptual/generalizability issues (focus on sibling differences rather than differences between unrelated people) and G x G and G x E interactions are additional complicating factors. As Veller and Coop (2023) explain:
“The reason [for this additional complication] is that, even if we were to know the causal alleles for a trait of interest, what we estimate by measuring their associations with phenotypic differences within families is not analogous to the counterfactual effects of experimentally substituting alleles in random individuals. Instead, we are necessarily restricting our focus to the effect of their transmission from heterozygous parents…Thus, although the ongoing shift towards family-based studies is motivated by concerns about confounding, with different alleles experiencing different environmental and genetic backgrounds, family-based studies can be influenced by conceptually similar issues of confounding in the presence of G x E and G x G interactions”
(p.28; also Boardman & Fletcher, 2021; Fletcher et al., 2021; Zaidi & Mathieson, 2020).
Additionally, sibling indirect effects can bias estimates of family effects differently from standard estimates—in some cases inflating and in other cases deflating estimates (see Fletcher et al., 2021; Veller & Coop, 2023). Fletcher et al. (2021) “[found] that sibling models, in general, fail to uncover direct genetic effects; indeed, these models have both upward and downward biases that are difficult to sign in typical data” (p.1). Recent work suggests that some of these issues (e.g., sibling indirect effects) can be addressed with more complete (via measurement or imputation) family genotype data (see Young et al., 2022).
In sum, within-family PGI studies are a useful approach for substantially mitigating confounding and isolating direct genetic effects when standard GWAS estimates are used. Even so, within-family PGI estimates are not appropriately interpreted as estimates of direct genetic effects in a population (see Fletcher et al., 2021; Veller & Coop, 2023).
6.1.3.3. Presentation of Results.
The final technical specifications I suggest (listed in Table 1 B.3(e)) apply to all quantitative social science studies. First, given that PGIs are standardized, I recommend the presentation of all coefficients on a standardized metric (excluding categorical or dummy variables like sex), preferably alongside unstandardized coefficients when these exist on an intuitive scale (e.g., years of age).28 Second, I suggest the full presentation of all results in a table in the main text. Several published PGI studies only present results in bar charts without providing coefficients or standard errors for the focal regression results in the main text. Bar charts and other graphs can, of course, be extremely useful in facilitating interpretation. However, bar charts can unintentionally mislead due to the partial information provided along with differences in perception (related to even arbitrary choices such as color (hue and luminance) and style (width and size); see Bryan, 1995; Cairo, 2019; Weissgerber et al., 2015). I thus recommend that PGI studies follow standard practices in presentation and present full findings in a table in the main text.
Finally, I discourage decile comparisons, following Turkheimer (2019a).29 If made, however, I suggest also providing decile comparisons for other predictor variables to add context and meaning, along with confidence intervals around the individual predictions (for an excellent critique of decile analysis and discussion of the standard error in prediction, see Turkheimer, 2019a).
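The recommendation in this subsection to report standardized coefficients alongside unstandardized ones on an intuitive scale can be illustrated with a small sketch. The conversion shown is the familiar rescaling of a slope by the ratio of the predictor's and outcome's standard deviations. The function names and the toy values (age in years, an arbitrary outcome) are assumptions for illustration only.

```python
from statistics import stdev

def ols_slope(x, y):
    """Unstandardized least-squares slope of y on a single predictor x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def standardized_slope(x, y):
    """Slope expressed in SD units: b * SD(x) / SD(y)."""
    return ols_slope(x, y) * stdev(x) / stdev(y)

# Hypothetical data: age in years and an outcome on an arbitrary scale.
age     = [20, 30, 40, 50]
outcome = [1, 3, 2, 4]

b = ols_slope(age, outcome)              # outcome units per year of age
beta = standardized_slope(age, outcome)  # outcome SDs per SD of age
print(f"unstandardized b = {b:.3f} per year; standardized beta = {beta:.3f}")
```

With a single predictor the standardized slope equals the Pearson correlation. In a multivariable model the same rescaling would be applied to each partial coefficient, which places continuous predictors on the same footing as the (already standardized) PGI.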
6.2. Descriptive Best Practices
Having discussed technical details and best practices, I now turn to descriptive details (summarized in Table 1A). In PGI studies, several sample and SNP details should be provided in the text or easily accessible to readers elsewhere.
First, PGI studies should, and usually do, specify which PGI is used and why. Just as there is no single measure of self-control, such that it does not suffice to say one is using a/the self-control scale, there is no one PGI for a given phenotype. Datasets often include multiple pre-made PGIs based on different specifications and/or GWASs for a given trait. For example, the prominent educational attainment PGIs correspond to the series of EA GWASs (EA1 – EA4), which involve increases in sample size, different sampling frames, subtle variations in phenotype measurement, and different methods, and which are now periodically updated in existing datasets (e.g., Becker et al., 2021; Lee et al., 2018; Okbay et al., 2022). Moreover, as I have discussed, there are multiple ways to create PGSs, with different assumptions and goals (e.g., a PGI created with LDPred versus P&T with p-value thresholding or an ‘unadjusted PGI’). Thus, specifying what PGI is used is necessary in the same way it is necessary to specify what measure of self-control one is using and how one is combining items. This involves specifying the GWAS on which the PGI weights are based and the calculation methods.
Second, general descriptive details on the SNPs included in a PGI should be provided. Specifically, studies should clearly describe the number of SNPs in the PGI and, ideally, how they were selected (e.g., sample matching and P&T). Recall that most PGIs include between several hundred thousand and several million SNPs. In cases where a study uses a unique PGI that departs from standard practices in SNP selection, the reasons for any differences from commonly used measures should be noted and justified. For example, researchers might employ a PGI only with genotyped SNPs (excluding imputed SNPs) or with more (or less) stringent p-value thresholds. Understanding how a PGI is measured is important for all the same reasons that understanding how standard social science scales are measured is important: for interpretation, comparison to other studies, and replication. As the number and diversity of PGIs expand, this detail will be even more important (see Wand et al., 2021).
Additional descriptive details on PGIs need not be repeated in every study but should be easily accessible. Specifically, readers should be able to facilely ascertain the particular platform(s) and SNP array(s) or ‘SNP chip(s)’ used in the study. Additionally, details on the number/proportion of SNPs that were imputed rather than measured should be accessible along with information on the specific imputation reference panel(s) used (e.g., HRC, TOPMED, 1KG30) and the imputation method. At present, there is no standard threshold for SNP inclusion based on such criteria as missingness, MAF, or imputation quality, and the use of different thresholds could contribute to different study findings.31 Thus, to facilitate replicability and comparability, such PGI-specific information should be easily accessible for readers. Crucially, most pre-made PGIs are accompanied by technical reports containing this information (see, e.g., Ware et al., 2021).
To be clear, basic details such as the number of SNPs, the GWAS on which the PGI is based, the construction method, and any adjustments should be mentioned in the text of all PGI studies. Further technical details about genotyping, imputation, and PGS QC protocols need not be repeated in a study using a pre-made PGI insofar as this information is clearly articulated in existing technical reports or articles and up to date (see Table 1-A.2).
Additionally, given the context- and condition-dependency of genetic associations, PGI studies should note the general characteristics of the GWAS study sample and any biases or differences from the PGI study sample (Barton et al., 2019). As noted, individuals in most large, prominent GWAS studies are not a random subset of the population from which they are drawn (Fry et al., 2017; Pirastu et al., 2021; Tyrrell et al., 2021). Although the EA4 study, for example, included more than 3 million individuals in 72 samples, two samples—23&Me (75%) and the UK Biobank (14.5%)—account for more than 89% of the total individuals. Both samples are known to be healthier, wealthier, and more highly educated than the general US and UK populations due to participation biases. The participation rate in the UK Biobank was less than 5.5% of the roughly 9 million people invited (Fry et al., 2017), whereas 23&Me is a private sample composed disproportionately of wealthier individuals who pay for genotyping (Burt & Munafò, 2021).
Relevant GWAS sample characteristics include the extent to which the samples are genetically similar and have analogous contexts of development shaped by socio-historical, geographical, and individual factors (age, sex, SES) (Mostafavi et al., 2020). Thus, for example, studies using an EA4 PGI should note that the GWAS discovery sample was not representative (healthier, wealthier, and more highly educated), compare the GWAS sample with the study sample, and report how sample-specific features and/or differences might affect the PGI study findings.
Finally, as noted above, sociological PGI studies should report the association between relevant sociodemographic variables and PGIs, in the same way we expect this standard information for non-genetic measures. As is common in social science, studies should consider whether the PGI associations with sociodemographic variables are as expected and note any unusual patterns.
6.3. Interpretive Best Practices
Last but not least, I turn to interpretation. Although the interpretation of a PGS estimate in a study depends on various factors, including study purposes, adjustments, and controls, several general guidelines can be suggested (summarized in Table 1C). As with the interpretation of non-genetic latent constructs, care must be taken to accurately convey what is measured. As Fletcher (2023) recently noted: “The ambiguous nature of a PGS’s interpretation has led far too many investigators to over-interpret and narrowly label a PGS as ‘genetic,’ often to elevate the perceived importance of ‘genetics’ in contributing to social science outcomes” (also, Boardman & Fletcher, 2021; Burt, 2023b; Keller, 2023).
6.3.1. Noisy, Partial, Proxy Genetic Measurement.
A PGI should be interpreted in a manner consistent with what is measured. At present, some depictions of PGIs in social science articles are misleading. For example, although increasingly rare, some studies still depict PGIs as measuring ‘genes for’ some outcome (e.g., having more ‘education-related genes’). This should be avoided. While ‘genes for’ is perhaps an easy shorthand, it is incorrect, misleading, and implies measurement specificity, which is lacking (Burt, 2023a).
Additionally, PGI interpretations should avoid implying that causal variants are measured in PGIs or that we know what these variants do (e.g., how they work). PGIs alone provide little to no insight into genes, causal variants, or biological mechanisms. As prominent behavior geneticist Turkheimer (2019b) averred, “polygenic scores achieve their predictive power by abdicating any claim to biological meaning” (p.46).
Finally, interpretations should be consistent with the tag SNP methodology being a (crude) proxy methodology. What is often overlooked and is obscured by phrases like “education-related genes” or even “education-associated variants” is the fact that for all outcomes—from T1 diabetes, plasma cortisol, and Alzheimer’s disease to ADHD, educational attainment, and income—GWASs and PGIs typically use the same set of SNPs, albeit differently weighted by the GWAS results. This use of the same indicators is one of the unique characteristics of PGIs not shared with standard social science scales, which are invariably composed of different items. (Although out of scope, this characteristic of PGIs should be considered in studies examining multiple PGIs and for genetic correlations.32)
PGI interpretations should also be consistent with the fact that PGIs are not only crude but partial measures of genetic heterogeneity. Although not without strengths (e.g., efficiently tagging a region of common variation), as I have discussed, the tag SNP methodology does not comprehensively capture genetic variation between individuals. Important variation, such as rare single nucleotide variants and structural variants, both of which are more likely to be functional and deleterious, is often poorly tagged, if tagged at all. For interpretation, this means that PGIs only partially control for genetic heterogeneity. Contrary to some interpretations, including PGIs does not allow for distilling the effects of “nurture net of nature” or the effect of a specific social variable net of genetic variation. PGI studies should not present results as demonstrating environmental influences ‘net of genetic differences’ (see also, Boardman & Fletcher, 2021; Burt, 2023b; Fletcher, 2023).
6.3.2. Residual Environmental Confounding.
As I have reiterated, standard PGIs for complex social traits are likely substantially confounded when the interest is, as is usually the case, genetic influences operating within the focal individual (see Box 1). Given this, interpretations of standard PGIs should be consistent with the fact that these measures do not capture “genetic (versus environmental) influences” or “nurture (versus nature)”. In practice, this means that PGI studies should steer clear of interpreting results as indicating “genetic (versus environmental) effects” or as demonstrating genetic influences net of environmental influences since standard PGIs capture both (see also, Boardman & Fletcher, 2021; Burt, 2023b; Fletcher, 2022).
Using a within-family model substantially reduces population stratification and familial/dynastic confounding and supports stronger interpretations of the PGI as capturing genetic influences. However, as noted, most within-family PGIs are based on standard GWASs, where biases are, to some extent, already baked in via the GWAS weights and come with their own set of caveats (Boardman & Fletcher, 2021; Veller & Coop, 2023). Thus, caution is still required, even as stronger interpretations are supported.
6.3.3. Context- and Population-Specificity.
Like other social science studies, PGI study findings are only appropriately generalized to the sampled population in the context in which they are situated. Thus, interpretative care is required around the context- and population-specificity of PGIs. This means that PGIs are appropriately interpreted as context-specific effects for individuals with similar genetic ancestries (or greater genetic similarity). This specificity also disqualifies PGI comparisons across contexts, populations, or individual characteristics to assess differences in “how much genetics matters” or to suggest “constrained or suppressed genetic influences” (see also, Burt 2023b, 2023c). In addition to differential effects across population subgroups, Mostafavi et al. (2020) demonstrated that the effect sizes of PGSs vary within genetically similar subgroups based on individual socio-demographic characteristics such as age, sex, and SES.
6.3.4. Avoiding Propensity Terminology.
In my view and that of others, PGIs are not properly interpreted as representing “genetic propensity for” or “genetic endowment for” complex social traits like educational attainment (e.g., Boardman & Fletcher, 2021; Burt, 2023b; Coop & Przeworski, 2022; Fletcher, 2022). Propensity is typically understood as a natural tendency or proneness to something. PGIs do not capture genetic propensities, not only because of the abovementioned statistical issues (environmental confounding, partial genetic measures) but also because, even if PGIs did fully and perfectly capture additive causal genetic influences for a given complex social trait in a given sample, these effects can vary in degree or even direction based on context, population, and individual characteristics. Human complex social traits emerge not from an additive, static model of genetic plus environmental inputs but from an extraordinarily complex interaction of sociocultural and physical environmental influences, whose effects can vary based on developmental timing, context, and interactions with genetic background (see discussions in Burt 2022, 2023b). This is, in part, why PGIs for complex social traits tend to be relatively poor individual predictors.
What is more, the counterfactual model underlying GWASs and PGIs does not distinguish between upward (genetic) causation and downward (social) causation, i.e., genotype-phenotype associations which are socially forged (see Burt, 2023a; Merchant, 2023). For a simple example, people who are taller and have lighter skin tones tend to be positively privileged in education and employment. Due to such social privileging, variants linked to lighter skin pigmentation and taller height would be causally associated with higher education and income in a counterfactual model. However, in my view, the positive privilege bestowed upon those with these variants and phenotypes is not properly interpreted as reflecting a “genetic propensity” because the causal force driving this relationship is society, not biology (see Burt 2023b, 2023c; see also the well-known red hair example of Jencks et al. (1972)). Given that PGSs for complex social traits likely reflect downward causation to some unknown degree, depicting the resulting associations as reflecting an ‘inner propensity’, even in within-family models, is misguided. Even in the best cases, PGIs do not reflect a genetic propensity for complex social traits in the same way that SES does not reflect a “social propensity” for educational attainment.
6.3.5. Weak Individual Prediction.
As alluded to above, at present and at least into the near future, PGIs do not fare well at predicting individual outcomes (e.g., Harden & Koellinger, 2020; Wray et al., 2021). PGI interpretations, particularly discussions of the practical utility of PGIs, should be—and usually are—consistent with the limited ability to predict complex social outcomes. Available evidence suggests that it is highly unlikely that PGIs will ever be accurate individual predictors of complex social traits. Discussions in PGI studies should avoid exaggerating the potential efficacy of these measures.
6.3.6. Clearly Acknowledging Limitations.
Although I recommend careful description and interpretation throughout, I also believe it is essential to acknowledge the study-relevant limitations of a given PGI in a manner directed to a non-expert audience of social scientists. Invariably, PGI studies report the “portability” problem of PGIs as a limitation, but other limitations are not always clearly described (also Burt, 2023b; Fletcher & Boardman, 2021). PGI studies should clearly describe to the reader the implications of the limitations of a given study’s PGI methods. For example, I recommend that studies clearly note the limitations of the tag SNP methodology (including that PGIs do not measure rare variants or other variant forms); that standard PGIs are environmentally and genetically confounded to some degree; and that PGIs only reflect additive genetic effects in a given context and population. Moreover, when PGIs are created without standard adjustments, e.g., without adjustments for LD, this should be noted and justified, and the implications should be discussed.
In sum, interpreting PGIs in studies requires great care; PGIs do not lend themselves to facile interpretation as reflecting “genetic influences” on or “genetic propensity for” complex social traits. Given the lack of expertise and potential for misunderstanding, I encourage using a box or other guidelines to draw readers’ attention to these limitations and challenges at the outset. (I think such interpretive boxes would be useful for all studies, but that is not my focus here.) Box 2 provides an example that might be used in a PGI study using a standard PGI.
Importantly, however, I do not wish to exaggerate the challenges with PGIs relative to standard social science methods. Indeed, in my view, the uniqueness of PGIs is primarily their genetic basis and the potential for misinterpretation due to a lack of social science expertise in genetics along with the view among some that genetic measures are superior (free of arbitrary decisions or subjectivity, immune from ‘reverse’ (downward) causation, and the like). Most of the challenges facing PGIs are shared with or mirror those of standard social science scales, albeit with some unique genetic characteristics. Thus, before closing, I wish to briefly recognize the similarity in challenges between PGIs and standard social science scales.
PGIs as Genetic Summary Scales
I have highlighted that PGIs are crude, noisy, and partial measures of the underlying latent construct (‘additive genetic risk’); however, crude, noisy, and partial measurement is the rule, not the exception, in social science research. All summary scales of latent constructs, created using a strategy of indicator variables (Cohen, 1999; Becker et al., 2021), are imperfect: partial, containing error, and often confounded. Standard social science summary scales, from neighborhood disadvantage and socioeconomic status to parental warmth and child impulsivity, are also crude, noisy, and partial. While I recommend that scholars recognize that PGIs are noisy and partial measures of genetic heterogeneity and present them as such, I think the same is true for social scales. For illustration, a study using a measure of yearly household income as a proxy for SES should not depict findings as being ‘net of SES’ but ‘net of household income’. Moreover, like PGI estimates, standard social science estimates are not appropriately generalized beyond the population and context represented in the sample. PGIs are not unique in these limitations.
Similarly, while I briefly highlighted the somewhat arbitrary boundaries involved in the creation of PGIs (e.g., LD threshold, LD radius), many standard social science scales are saturated with arbitrariness, from physical boundaries (e.g., those involved in the definition of “neighborhood” (census tract, block, ego-based measures)), to social groupings and identities, to even the scale of measurement (e.g., a composite of Likert scale scores based on assigning values to the response options “always”, “often”, “sometimes”, “rarely”, “never”).
Finally, I have extensively discussed the environmental confounding of genetic associations and PGSs. Rather than being unique, this represents the flip side of the genetic confounding of putatively environmental influences. To be clear, the environmental confounding of PGIs, like the genetic confounding of social science measures, does not make these measures useless—social or genetic. It makes them associations, almost always sociogenic associations, reflecting average differences among individuals (Turkheimer, 2011).
In short, social science research is dominated by invariably partial measures of key concepts from SES (measured by income and/or parental education), to crime (as arrests or street crime), and various perceptual measures of environments as controls, even focal study variables. My impetus for writing this paper is not that PGIs are uniquely flawed but that these limitations are poorly understood. Moreover, social science measures are typically driven by theory and existing knowledge of phenomena and empirical relations, and social scientists have extensive training in the analytical techniques and best practices for conducting research with those methods. Unlike most operationalizations of concepts, PGIs are, like some other big data measures in this brave new digital and genetic world, atheoretical (see also, Boardman & Fletcher 2021; Burt 2023b). Most of us can cognize the limitations of using household income as a measure of SES; most social scientists do not understand what is and is not measured in PGIs. I hope this manuscript provides guidance to help bridge this gap in interpretation.
How PGIs can be used to illuminate our understanding of individual differences in social traits and behaviors remains to be seen. However, recognizing that they have many challenges does not imply that they are useless. If perfect measurement were required, social science would grind to a halt.
Conclusion
Recognizing that PGIs are increasingly accessible and utilized in social science research, I aimed to provide social scientists with a guide for understanding and interpreting PGIs and a set of recommended best practices. A key theme of this article is the importance of accurate, careful presentation and interpretation of PGIs, given their esoteric challenges and the sophisticated methodologies underlying these individual scores. Facile but inaccurate description of PGIs as ‘genes for’ some social trait should be replaced with the more accurate but slightly more cumbersome language “additive genetic associations with” a trait. Moreover, recognition of the limitations of the measures and methods should be clearly articulated. As Boardman and Fletcher (2021) recently noted: “The initial claims that genetic measures are fixed, determined at birth, and unresponsive to the environment have been overextended in interpreting PGS ‘effects’ in social science applications. Social scientists need to be conversant enough in genetics to recognize when important (and often implicit) assumptions are being made about our ability to measure ‘genetic effects’ that can be distinguished from the environment and structural factors in which social scientists are most interested” (p. 11).
I provided a set of guidelines for using PGIs; however, readers should note that my recommended best practices should be viewed as proposals that are grounded in evidence and informed by existing standards in other fields (e.g., Choi et al., 2020; Wand et al., 2021), not as inflexible authoritarian requirements. I offer these recommendations and the accompanying checklist as a starting point, recognizing that these will need to be updated as new methods and measures are developed in a rapidly evolving and growing field. I hope that by illuminating the basic features of PGIs and providing a checklist of best practices in methods, reporting, and interpretation, I make these studies more approachable for non-practitioners, easier to evaluate, and replicable in a manner that can facilitate the advancement of robust knowledge on a range of social outcomes.
Supplementary Material
Box 2. About Polygenic Indices (PGIs).
Definition:
PGIs are individual genetic summary scores representing the additive associations between marker variants (SNPs) and traits, weighted by the results of a GWAS. PGIs were developed to aggregate individually weak genetic associations between SNPs and phenotypes into a single measure or summary genetic ‘predictor’.
Interpretation:
PGIs capture genetic associations with traits in a given context and population and do not reflect "genetic versus environmental influences". PGIs do not capture all of the genetic heterogeneity between individuals.
Standard or population-based PGIs also capture social/environmental influences (i.e., are confounded to some degree), including cultural/structural and geographic forces, familial and dynastic forces, interactional social forces (including peer and neighborhood effects), and the like.
Acknowledgments:
I am grateful for comments and suggestions from Kara Hannula, Brea Perry, Caitlin Dorsch, and three anonymous reviewers on earlier versions of this manuscript. The content is solely the responsibility of the author and does not represent the views of the National Institutes of Health or those who provided feedback.
Funding Statement:
Support for this research was provided by a K01 from the Eunice Kennedy Shriver National Institute of Child Health and Human Development (5K01HD094999). Partial support for this research came from a Eunice Kennedy Shriver National Institute of Child Health and Human Development research infrastructure grant, P2C HD042828, to the Center for Studies in Demography & Ecology at the University of Washington.
Appendix B. Manhattan Plot Example: Okbay et al. (2022) Standard GWAS of Educational Attainment
Fig. 1 ∣.
Manhattan plots for the standard GWAS of educational attainment, reproduced from Okbay et al. (2022). Estimates are from a meta-analysis of 3 studies/cohorts containing a total of N = 3,037,499 individuals. The x axis is the chromosomal position and the y axis is the p-value on a −log10 scale. The dashed line marks the genome-wide significance threshold (p < 5x10^-8).
Appendix C. Background on PCA Adjustments for Population Structure ∣ Genetic ‘Ancestry’ PCs
Population substructure confounding (i.e., population stratification) remains a significant concern in genetic association studies, even among ostensibly homogenous samples. There are a variety of methodological tools for use in mitigating confounding in GWASs. Some approaches aim to prevent or reduce confounding—PCA and linear mixed models—whereas others adjust for confounding after estimation (e.g., genomic control and LDSC).
A standard approach used in GWASs and PGSs is controlling for 5–20 principal components representing genetic relatedness (also called ‘ancestry’) based on principal components analysis (PCA). PCA is a linear dimension reduction technique not infrequently used in scale creation in standard social science. In statistical genetics, PCA is often used to detect and control for population substructure. PCA is conducted on either a normalized SNP allele dosage matrix (a ‘standardized’ genotype matrix) or a SNP-based genetic relatedness matrix33 (GRM; Yang et al., 2011). The SNPs included vary across studies: all available SNPs, genotyped SNPs only, a reduced set of independent (i.e., LD-pruned) SNPs, or a smaller set of ‘ancestry-informative markers’ (AIMs); the latter is typically not recommended (see, e.g., Price et al., 2010).
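As a rough illustration of the two PCA inputs just described, the following sketch standardizes a toy allele dosage matrix and builds a SNP-based GRM from it. It assumes the common convention of centering each SNP by twice its sample allele frequency (p) and scaling by sqrt(2p(1 − p)); the toy data and function names are hypothetical, and actual studies rely on dedicated software applied to many thousands of SNPs.

```python
import math

# Toy allele dosages (0, 1, or 2 copies): rows are individuals, columns are SNPs.
# These values are made up for illustration only.
dosages = [
    [0, 1, 2, 0],
    [1, 1, 0, 0],
    [2, 0, 1, 1],
    [1, 2, 1, 0],
    [0, 2, 2, 1],
]


def standardize(dosage_rows):
    """Center each SNP by 2p and scale by sqrt(2p(1 - p)), with p the sample allele frequency."""
    n = len(dosage_rows)
    m = len(dosage_rows[0])
    z = [[0.0] * m for _ in range(n)]
    for j in range(m):
        p = sum(row[j] for row in dosage_rows) / (2 * n)
        # Monomorphic SNPs (p = 0 or 1) would give sd = 0 and must be dropped beforehand.
        sd = math.sqrt(2 * p * (1 - p))
        for i in range(n):
            z[i][j] = (dosage_rows[i][j] - 2 * p) / sd
    return z


def grm(z):
    """Average product of standardized dosages across SNPs for each pair of individuals."""
    n, m = len(z), len(z[0])
    return [[sum(z[a][j] * z[b][j] for j in range(m)) / m for b in range(n)]
            for a in range(n)]


Z = standardize(dosages)  # 'standardized' genotype matrix
A = grm(Z)                # individuals-by-individuals relatedness matrix
for row in A:
    print(" ".join(f"{value:6.2f}" for value in row))
```

Under these assumptions, the PCs would then be obtained from a PCA of Z or, equivalently for the individual-level scores, from an eigendecomposition of A.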
For illustration, Ware et al. (2021) note that PCA on the Health and Retirement Study (HRS) was estimated on SNPs:
“selected by linkage disequilibrium (LD) pruning of all autosomal SNPs with a missing call rate < 5% and a MAF > 5%, and excluding any SNPs with a discordance between HapMap controls genotyped along with the study samples and those in the external HapMap data set. In addition, the 2q21 (LCT), HLA, 8p23, and 17q21.31 regions were excluded from the initial pool. Genetic ancestry in HRS was identified through PC analysis on genome-wide SNPs calculated across all participants using the aforementioned filtering criteria.”
Although Ware et al. (2021) do not report the exact number of SNPs used in the PCA, which might be useful, the PCA ‘sample’ information provided is significantly more detailed than that found in the Add Health PGS analytic report, which noted: “To identify respondents in these four genetic ancestry groups we use principal component analysis on all unrelated members of the full Add Health genotyped sample and project those estimates onto the small remainder of related individuals” (Braudt & Harris, 2020). Conversely, details on the results of the PCA analysis were more extensive in the Add Health report than in the HRS report.
PCA is used to identify continuous, orthogonal axes of genetic ‘ancestry’ variation known as principal components (PCs; or eigenvectors) that explain the most variance in the data (Patterson et al., 2006; Price et al., 2006). Not surprisingly, the top axes often have a geographic interpretation. For example, in a now-famous study, Novembre et al. (2008) plotted a two-dimensional PCA summary of the genetic variation of a European sample, which bore a striking similarity to a geographic map of Europe. Notably, PCs do not always reflect population structure, per se; they may also reflect assay artifacts, for example (Clayton et al., 2005; Price et al., 2010).
For genetic association studies, PCs can be used in quality control to identify issues with the data (e.g., assay artifacts) and to identify and remove ‘genetic ancestry’ outliers (typically using reference samples). GWASs and PGSs incorporate these PC controls for population substructure—patterns of genetic similarity—by including them in the regression models (or by computing and analyzing residuals of linear regressions on the PCs34).
Typically, the top 10 to 20 PCs are included as controls, although sometimes fewer than ten are included. The choice of the number to include is somewhat arbitrary. Available evidence suggests that for some samples and some traits, controlling for 50 PCs is insufficient (see Dandine-Roulland et al., 2016). As an alternative to including an arbitrary number based on rules of thumb, one could select the number of PCs based on a threshold of variance explained or include only PCs associated with the phenotype.
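One of the alternatives mentioned above, selecting the number of PCs by a variance-explained threshold, can be sketched as follows. The eigenvalues and the 75% threshold are hypothetical and chosen only to make the arithmetic easy to follow; they are not drawn from any dataset, and in genetic data the top PCs typically explain a far smaller share of variance.

```python
def n_pcs_for_variance(eigenvalues, threshold):
    """Smallest number of top PCs whose cumulative share of variance reaches the threshold."""
    total = sum(eigenvalues)
    cumulative = 0.0
    for k, eigenvalue in enumerate(sorted(eigenvalues, reverse=True), start=1):
        cumulative += eigenvalue
        if cumulative / total >= threshold:
            return k
    return len(eigenvalues)


# Hypothetical eigenvalues (variance along each PC), for illustration only.
eigenvalues = [5.0, 3.0, 1.0, 0.5, 0.5]
k = n_pcs_for_variance(eigenvalues, threshold=0.75)
print(k)  # 2: the first two PCs account for 8.0 / 10.0 = 80% of the variance
```

Any such threshold is itself a judgment call, so this replaces one rule of thumb with another that is at least explicit and reportable.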
Reports or instructions accompanying pre-made PGIs typically (if not always) include recommendations for incorporating PCs in PGS analysis. For example, Ware et al. (2021) recommend the inclusion of PCs with PGIs in the HRS as follows:
“Ancestry specific PCs 1-10 are included for each group. PCs 1-5 and PCs 6-10 are randomly labeled within each PC set to help reduce identifiability. To control for confounding from population stratification, or to account for any ancestry differences in genetic structures within populations that could bias estimates, we highly recommend that users perform analyses separately by ancestral group and, at the very least, adjust for PCs 1- 5. The PCs control for any genetic aspects of common ancestry that could be spuriously correlated with the PGS and the outcome of interest (Price et al., 2006)”
(p.5, bold font in original).
Similarly, Braudt and Harris (2020) suggest including PCs in PGI studies:
“To help account for potential bias due to population stratification and/or differences in genetic structure within ancestry groups we include the first ten ancestry-specific principal components of the genetic data with the PGSs. It is strongly recommended that researchers perform sensitivity analyses separately by ancestral groups and/or include at least the first five ancestry-specific principal components as covariates in analyses using these PGSs (Price et al. 2006)”
(p.11, bold font in original).
As noted in the main text, PC controls for population substructure are standard, and evidence suggests that using PC controls for genetic relatedness effectively reduces biases from population substructure in GWASs and PGIs. However, available evidence also suggests that standard PC controls are inadequate for fully correcting for population substructure, particularly in scenarios relevant to PGI studies.
First, typically PCs are constructed from SNPs (e.g., SNPs with MAF > 5% in the HRS), excluding rarer variants. In most cases, capturing recent, fine-scale population structure, particularly with sharp local effects, requires rare variants (e.g., rare variant PCA; Zaidi & Mathieson, 2020, also O’Connor et al., 2015). As Zaidi and Mathieson (2020, p.2) explain: “When population structure is recent, smooth environmental effects lead to an inflation in common, but not rare, variants and this inflation can only be corrected with rare- but not common-PCs. This is a consequence of the fact that rare variants carry more information about recent structure than common variants in GWA meta-analysis”. What is more, they note: “Local environmental effects largely impact rare variants only and the inflation due to local effects cannot be fully corrected using either common- or rare-PCs. This is because local environmental effects cannot be represented by a linear combination of the first hundred principal components” (Zaidi & Mathieson, 2020, p.3, parentheticals omitted for clarity).
Thus, PC controls based on common variants will often not adequately detect and correct for population substructure of recent origin. In such cases, PCA based on rare variants is recommended; however, this requires sequence data (thus drastically lowering available sample sizes). Alternatively, studies may use imputed rare variants, but to be effective as controls, this would require high imputation accuracy, which is not assured and depends on the reference sample, the study sample, and the imputation method (Zaidi & Mathieson, 2020). As an alternative to rare variant analysis, PCA has been conducted on haplotype sharing (Lawson et al., 2012) or identity-by-descent (IBD) segments, which like rare variant PCA, are more informative about recent demographic history (Zaidi & Mathieson, 2020). Again, however, in some cases of sharply distributed effects, due to batch effects or local environmental effects, biases “may not be corrected with any method” (Zaidi & Mathieson, 2020, p.10).
In addition to the limitations of focusing on common variants, PCA is less effective at detecting and correcting for structure in smaller samples. Because GWASs commonly meta-analyze many such samples, there is a heightened risk of uncorrected structure biasing results (Berg et al., 2019).
The implications of residual confounding due to recent structure and the use of common variant (SNP) GRMs for PCA differ for GWASs and PGSs. As Zaidi and Mathieson (2020) explain:
“Even imperfect correction for population structure is probably sufficient to limit the number of genome-wide false positive associations in GWAS. But when information is aggregated across a large number of marginally associated variants, even small overestimates in effect sizes can lead to substantial bias in polygenic scores. Essentially some of the predictive power of polygenic scores will derive from predicting environmental structure rather than genetic effects”
(p.10).
Moreover, PGSs will be even more prone to stratification when constructed using all SNPs (i.e., p <1) as is common. “At such loci, the causal effects are likely to be small relative to the effect of stratification, leading to false identification of more structured variants” (Zaidi & Mathieson, 2020, p.7).
In sum, including PC controls for population substructure is generally necessary in GWASs and PGS studies and effectively reduces population stratification confounding. However, PC controls based on SNPs are insufficient for controlling for fine-scale, more recent structure. As such, PGI studies that include genetic ‘ancestry’ PCs are not appropriately depicted as having “corrected for population substructure”.
Footnotes
Conflict of Interest: None
To be clear, PGIs are PGSs. In earlier versions of this manuscript, I used the term polygenic score (PGS) instead of polygenic index (PGI), while noting these were different terms for the same measure. Some reviewers of my submission thoughtfully, albeit strongly, encouraged me to use the recently proposed term in social science, polygenic index. For background, in late 2021, a distinguished group of sociogenomic scholars published an article using the term PGI instead of PGS, which they justified in Box 1 of their article as follows: “In this paper, we use the term ‘polygenic index’ instead of the commonly used terms ‘polygenic score’ and ‘polygenic risk score.’ Most of us prefer the term polygenic index because we are persuaded by the argument that it is less likely to give the impression of a value judgment where one is not intended. The term polygenic index was first proposed by Martha Minow [a Harvard legal scholar] at a meeting of the Trustees of the Russell Sage Foundation” (Becker et al., 2021, p.1745). Several subsequent articles have used the term PGI, including a few articles in sociology. While I appreciate the motivation for the change in terminology and believe that the goal of reducing the impression of a (negative) value judgment is a good one, I am not persuaded that this change in terminology will have the intended effect, and it may not be without cost. In particular, the polyonomy may cause confusion in an already esoteric subfield of social science and create the impression that PGSs and PGIs are distinct, thereby disconnecting scholarship using different terms for the same measure. Even so, given that PGI may come to predominate in social science, that reviewers expressed strong preferences that I use this term, and that this is not a debate that can be settled here, I use the recommended PGI term.
However, I wish to note that I do not think the “score” label is a/the problem, and I believe the shift to “index” potentially obscures the fact that these are “rankings” (i.e., positions on a scale) of genetic associations with socially valued outcomes.
Genotyping is the measurement of a preselected set of genetic variants; most contemporary genotyping is ‘genome-wide’, measuring variants (usually between 200k and 5 million) spread across the genome.
Following others and for readability, I use ‘standard social science methodologies’ to refer to non-genetic methodologies. I do not mean to ‘other’ sociogenomics research but use this as shorthand for recognizing that training in genomics remains atypical (i.e., not standard) in social science. Notably, this shorthand is not to be confused with the model that Tooby and Cosmides (1992) labeled the “standard social science model” (which assumes cultural determinism, relativism, and blank slatism), a model that is, in my view, decidedly not standard in social science.
I am grateful to reviewers who suggested the value of discussing this conceptual distinction.
The exception is sex chromosomes for males who—all going well—carry an X chromosome from their mother and a Y chromosome from their father.
Not surprisingly, given these complexities, discussions of inherited genetic susceptibilities are often phrased incorrectly. For example, discussions often revolve around ‘inheriting a gene’: an inherited susceptibility to breast cancer involving the BRCA1 gene might be described as “having the BRCA1 gene”. Yet ‘having [two functional copies of] the BRCA1 gene’ is desirable. What speakers mean to say is that they have a functional, disease-associated variant(s) in their BRCA1 gene(s), which alters the resulting protein product and thus disrupts the protein’s functioning (including tumor-suppressing effects), thereby increasing cancer susceptibility.
We say conventionally unrelated because relatedness is, of course, a matter of degree. We are all related to one another because we share common (variably distant) ancestors. In statistical genetics, conventionally unrelated is typically defined as being less related than a second-degree relative.
Although the 1000 Genomes Project (1000G) Phase 3 indicated that roughly 2.3% of SNVs were multiallelic, some have argued that the frequency of multiallelic sites may be much higher, with some estimates upwards of 5% (see, e.g., Campbell et al., 2016).
Unfortunately, as with some other terms in genomics, the term genotype is used to refer to different things. Genotype is used to refer to one’s alleles at a single locus, at several loci, or an individual’s unique (excepting twins) genome.
Novel mutations continue arising throughout life in somatic and germ cells; however, only germ cell mutations are transmitted to the next generation (Campbell & Eichler, 2013).
The Out-of-Africa model of humans’ geographic origin and migration, supported by existing paleo-anthropological and population genetic research, holds that H. sapiens originated in Africa and dispersed around the world in several migrations. The most ‘significant’ and ‘recent’ dispersal out of Africa is estimated to have occurred some 50-70k years ago (Posth et al., 2016; Skoglund & Mathieson, 2018).
In the recent educational attainment GWAS (EA4), for example, the median effect size (per unit increase in allele dosage) for the ‘lead SNPs’ (the ~3000 variants of the ~10 million with the largest independent associations with EA) was “1.4 weeks more schooling: the effects at the 5th and 95th percentiles (in absolute value) [were] 0.9 and 3.5 weeks, respectively” (Okbay et al., 2022, p.439).
Various important critiques of and caveats about genetic ancestry labels have been published (e.g., Coop, 2022; Fujimura & Rajagopalan, 2011; Lewis et al., 2022; Mathieson & Scally, 2020; Panofsky & Bliss, 2017; Royal et al., 2010). Although I do not have space to cover these here, I wish to acknowledge several important points. Most importantly, so-called genetic ancestry groups are crude abstractions from an underlying continuum of human relatedness based on genetic similarity, relative to reference populations, and over a specific time frame (Coop, 2022). As Mathieson and Scally (2020) note: “most statements about ancestry are really statements about genetic similarity, which has a complex relationship with ancestry, and can only be related to it by making assumptions about human demography whose validity is uncertain and difficult to test” (p.1). Typically, references to genetic ancestry groups are “‘ancestry-like’ relationships” (Mathieson & Scally, 2020, p.4). Some scholars have argued that incautious use of genetic ancestry labels can bias our thinking by reifying arbitrary abstractions and obscuring underlying heterogeneity within ‘ancestry groups’ (e.g., Coop, 2022). Here I employ descriptive terms referring to genetic ancestry groups based on genetic similarity out of necessity; however, I often describe ancestries as plural, following suggestions (Coop, 2022), and, where appropriate, place these terms in quotes to signal that these are ‘ancestry-like relationships’.
Technically, EA4 is a meta-analysis of the summary statistics (GWAS results) from 2 samples and 1 meta-analysis of 69 samples (hence the meta-meta-analysis label).
Although often called ‘indirect genetic effects’, we eschew this label here because the term ‘indirect effects’ is already common in social science and has a distinct meaning from how it is used in sociogenomics.
I refer readers interested in more biological or technical detail, including QC protocols, elsewhere (see, e.g., Choi et al., 2020; Marees et al., 2018).
One reviewer expressed the firm belief “that any such checklist should include correcting for attenuation bias due to measurement error in PGIs (Becker et al., 2021)”. I do not include this in the recommended checklist at present for several reasons. First, I have concerns about this approach, which makes assumptions known to be violated in all human populations. As Becker et al. (2021) explain, the method they outline assumes that PGI weights are unbiased estimates of the true PGI weights. Due to assortative mating, population stratification, and familial confounding, all available evidence suggests that for all complex social traits, if not all traits, PGI weights are not unbiased. In short, this assumption is definitively violated, and in some cases substantially so. Additionally, this correction rests fundamentally on the assumption of random sampling, which most datasets depart from considerably due to non-random sampling, non-random refusals, and sample attrition. Finally, the issue of measurement error attenuation is not specific to PGIs but affects all variables in social science research to some extent. Thus, correcting PGIs but not other variables would be uneven, and recommending measurement error correction for all variables, given dubious underlying assumptions, is out of scope.
GWAS and PGS details are sometimes provided in lengthy, dense supplements or esoteric notes. Given this, we suggest that to the extent that descriptive information is referenced elsewhere, authors should point to the specific page/table in which the information is provided. Readers and reviewers should not have to go to extraordinary lengths to find basic descriptive information on the focal variables in a study—whether a PGI study or any study. For an excellent example of accessible technical details for a pre-made PGI, see Ware et al. (2021).
As we have noted earlier, insofar as the aim of a given study is to create a genetic predictor—e.g., as a control for genetic and background socioenvironmental factors—then such confounding need not be corrected (e.g., Fletcher, 2023). However, most current uses of PGIs and touted potential uses in social science involve using PGIs to measure genetic influences (see, e.g., Harden & Koellinger, 2020). Moreover, for social science purposes, some of the variables most confounded with PGIs—geographic region, culture, socioeconomic status, and networks—are among the forces of most interest to social scientists (Curtis, 2023).
Additional complications for sociogenomics studies not discussed here include changes over time in the human genome reference, in the various reference panels, and in software. At present, LDpred has been supplemented by LDpred2 and PRSice by PRSice-2; many, but not all, of the revisions are focused on computational optimization.
Some restrict the label pruning to ‘uninformed pruning,’ which involves the random thinning of SNPs in a region, an approach I have not seen in practice. By contrast, results-informed-pruning (also known as clumping), which I discuss here, preferentially selects SNPs most strongly associated with the trait under study and retains multiple SNPs in the same region if both have ‘independent’ effects (see Choi et al., 2020). Following others, I use the terms clumping and pruning interchangeably to refer to informed pruning.
For example, I have seen studies where SNP LD thresholds range from r2<.1 (more common) to r2<.5; the distances around SNPs (LD radius) range from 250kbp to being on the same chromosome (i.e., at least 47-Mb).
The number of genome-wide significant SNPs was gleaned from a personal communication (email) with A. Young, based on calculations by A. Okbay (March 2023).
Notably, as the creators of LDpred acknowledge: “a non-infinitesimal model, where only a fraction of the genetic variants are truly causal and affect the trait, is more likely to describe the underlying genetic architecture” (Vilhjálmsson et al., 2015, p.585; Burt, 2023c). However, the non-infinitesimal version of LDpred “is a Gibbs sampler and is particularly sensitive to model misspecification when applied to summary statistics with large sample sizes,” as is common. Adjustments have been made in LDpred2 that address this issue, but we have yet to see this non-infinitesimal model used in social science research (Privé et al., 2020).
According to Vilhjálmsson et al. (2015), in the case of an infinitesimal prior, the Bayesian shrink factor is 1/(1+[M/(h2N)]), where M is the number of SNPs, h2 is the heritability, and N is the sample size. Thus, for example, with 2 million SNPs, h2 =.2, and N = 10,000, M/(h2N) = 1,000, so each SNP weight would be multiplied by a shrink factor of 1/1,001 ≈ 0.000999.
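The arithmetic in this example can be checked with a few lines of code. The function name is hypothetical; the formula and inputs are those given in the note above.

```python
def infinitesimal_shrink_factor(m_snps, h2, n):
    """Shrink factor 1 / (1 + M / (h2 * N)) under the infinitesimal prior."""
    return 1.0 / (1.0 + m_snps / (h2 * n))


# Values from the worked example: M = 2 million SNPs, h2 = .2, N = 10,000.
factor = infinitesimal_shrink_factor(m_snps=2_000_000, h2=0.2, n=10_000)
print(f"{factor:.6f}")  # 0.000999
```

Because N sits in the denominator of M/(h2N), the same formula implies less shrinkage (a factor closer to 1) as the GWAS sample size grows.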
For example, the Add Health youth sample was genotyped with different SNP chips. Approximately 80% were genotyped on the Illumina Omni1-Quad BeadChip and ~20% on the Illumina Omni2.5-Quad BeadChip (Braudt & Harris, 2020).
As discussed at greater length in Appendix C, scholars have proposed incorporating PCs based on rare variants or IBD segments, which are more effective at controlling for local population structure or recent population history than PCs based on common SNPs (Zaidi & Mathieson, 2020). Rare variants tend to be recent mutations, which cluster geographically around the location at which the mutation first arose; thus, they can be particularly useful indicators of recent population structure (Novembre et al., 2008; Slatkin, 1985).
One reviewer strongly objected to this recommendation to standardize coefficients, arguing that “there is no compelling reason to standardize all covariates in a model if the researcher is not comparing the effect sizes of the PGI with any other variable”. I wish to acknowledge their objection. However, even if authors are not interested in relative effect sizes, many readers will be or want to situate the effect size of a PGI relative to some standard deviation change in a more familiar variable (e.g., maternal education in years; household income). I again note that this is a recommendation, not a requirement.
Turkheimer (2021) pithily noted, “decile predictions in the absence of prediction error is a QRP [questionable research practice], part of an unintentional but systematic program of sweeping the biggest problem of human behavioral genomics-- tiny effect sizes--under the methodological rug. Smart researchers should cut it out.”
Where HRC refers to the Haplotype Reference Consortium (The Haplotype Reference Consortium, 2016), TOPMED refers to the Trans-Omics for Precision Medicine imputation panel (Kowalski et al., 2019), and 1KG refers to the 1000 Genomes project reference panel (1000 Genomes Project Consortium, 2012).
Choi et al. (2020) propose the following standards for SNP inclusion: genotyping rate >.99, sample missingness <.02, MAF >.01 (if target data are <1000, MAF >.05), and imputation INFO score of imputation quality >.8.
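For readers who find it easier to read such thresholds as code, the following is a minimal sketch of the filters listed above. The class, field, and function names are hypothetical, and treating sample missingness as a separate per-individual filter is my assumption rather than something stated in the note.

```python
from dataclasses import dataclass


@dataclass
class SnpStats:
    snp_id: str
    genotyping_rate: float  # share of individuals with a called genotype at this SNP
    maf: float              # minor allele frequency
    info: float             # imputation INFO score


def passes_snp_qc(snp, target_n):
    """Apply the SNP-level thresholds; the MAF cutoff is stricter when the target sample is < 1000."""
    maf_min = 0.05 if target_n < 1000 else 0.01
    return snp.genotyping_rate > 0.99 and snp.maf > maf_min and snp.info > 0.8


def passes_sample_qc(missingness):
    """Apply the individual-level threshold (share of missing genotypes < .02)."""
    return missingness < 0.02


# Made-up SNP summaries for illustration only.
snps = [
    SnpStats("snp_a", 0.995, 0.20, 0.95),
    SnpStats("snp_b", 0.995, 0.03, 0.95),  # passes only under the MAF > .01 rule
    SnpStats("snp_c", 0.980, 0.20, 0.95),  # fails on genotyping rate
]
kept_large = [s.snp_id for s in snps if passes_snp_qc(s, target_n=5000)]
kept_small = [s.snp_id for s in snps if passes_snp_qc(s, target_n=800)]
print(kept_large)  # ['snp_a', 'snp_b']
print(kept_small)  # ['snp_a']
```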
Genetic correlations are estimated using GWAS summary statistics (e.g., educational attainment and number of children) and treating the SNP effect size estimates from the different GWASs as the observations. Thus, like other correlations, genetic correlations simply give us insight into whether the people with higher education also tend to be the same people with fewer children—i.e., these tell us nothing about cause or shared genetic etiology as is sometimes implied. Unlike other correlations, however, these estimates of similarity are based on the same measures or observations (SNP allele dosages). Thus, if two phenotypes (with sufficient variation) are correlated in a sample, there will necessarily be a genetic correlation.
Genetic relatedness is typically calculated as the average correlation of individuals across SNPs—known as the coefficient of relatedness.
By the Frisch-Waugh-Lovell theorem, this is equivalent: one regresses the phenotype on the PCs, collecting residuals A; then one regresses the tag SNP on the PCs, collecting residuals B; and then regresses the residuals from the phenotype regression (A) on the residuals from the SNP regression (B) (Lovell, 2008).
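The equivalence can be verified numerically. The sketch below uses simulated data with a single PC for simplicity (the theorem holds for any number of PCs); all variable names, effect sizes, and the simulation design are hypothetical.

```python
import random


def mean(values):
    return sum(values) / len(values)


def slope(x, y):
    """OLS slope from regressing y on x with an intercept."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def residuals(x, y):
    """Residuals from regressing y on x with an intercept."""
    b = slope(x, y)
    a = mean(y) - b * mean(x)
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]


def slope_controlling(x, z, y):
    """Coefficient on x from regressing y on x and z jointly (with an intercept)."""
    mx, mz, my = mean(x), mean(z), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    szz = sum((c - mz) ** 2 for c in z)
    sxz = sum((a - mx) * (c - mz) for a, c in zip(x, z))
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    szy = sum((c - mz) * (b - my) for c, b in zip(z, y))
    return (sxy * szz - sxz * szy) / (sxx * szz - sxz ** 2)


random.seed(1)
n = 500
pc = [random.gauss(0, 1) for _ in range(n)]
# Simulated allele dosage (0/1/2) whose allele frequency shifts with the PC (stratification).
snp = [sum(random.random() < 0.3 + 0.1 * (p > 0) for _ in range(2)) for p in pc]
pheno = [0.2 * g + 0.5 * p + random.gauss(0, 1) for g, p in zip(snp, pc)]

resid_a = residuals(pc, pheno)  # A: phenotype residualized on the PC
resid_b = residuals(pc, snp)    # B: SNP residualized on the PC
b_fwl = slope(resid_b, resid_a)              # regress A on B
b_joint = slope_controlling(snp, pc, pheno)  # SNP coefficient with the PC as a covariate
print(round(b_fwl, 6), round(b_joint, 6))
```

The two printed coefficients agree up to floating-point error, whereas the unadjusted slope from `slope(snp, pheno)` will generally differ because the simulated SNP is correlated with the PC.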
References
- 1000 Genomes Project Consortium. (2012). An integrated map of genetic variation from 1,092 human genomes. Nature, 491(7422), 56. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Abdellaoui A, Dolan CV, Verweij KJH, & Nivard MG (2022). Gene–environment correlations across geographic regions affect genome-wide association studies. Nature Genetics, 54(9), 1345–1354. doi: 10.1038/s41588-022-01158-0 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Acuna-Hidalgo R, Veltman JA, & Hoischen A (2016). New insights into the generation and role of de novo mutations in health and disease. Genome Biology, 17(1). doi: 10.1186/s13059-016-1110-1 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Adhikari K, Mendoza-Revilla J, Sohail A, Fuentes-Guajardo M, Lampert J, Chacón-Duque JC, … Acuña-Alonzo V (2019). A GWAS in Latin Americans highlights the convergent evolution of lighter skin pigmentation in Eurasia. Nature communications, 10(1), 358. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Astle W, & Balding DJ (2009). Population structure and cryptic relatedness in genetic association studies. [Google Scholar]
- Backman JD, Li AH, Marcketta A, Sun D, Mbatchou J, Kessler MD, … Balasubramanian S (2021). Exome sequencing and analysis of 454,787 UK Biobank participants. Nature, 599(7886), 628–634. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Balding DJ (2006). A tutorial on statistical methods for population association studies. Nature Reviews Genetics, 7(10), 781–791. [DOI] [PubMed] [Google Scholar]
- Barton N, Hermisson J, & Nordborg M (2019). Population genetics: Why structure matters. eLife, 8, e45380. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Becker J, Burik CA, Goldman G, Wang N, Jayashankar H, Bennett M, … Kleinman A (2021). Resource profile and user guide of the Polygenic Index Repository. Nature Human Behaviour, 5(12), 1744–1758. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Berg JJ, Harpak A, Sinnott-Armstrong N, Joergensen AM, Mostafavi H, Field Y, … Coop G (2019). Reduced signal for polygenic adaptation of height in UK Biobank. eLife, 8. doi: 10.7554/elife.39725 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Boardman JD, & Fletcher JM (2015). To cause or not to cause? That is the question, but identical twins might not have all of the answers. Social science & medicine (1982), 127, 198. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Boardman JD, & Fletcher JM (2021). Evaluating the Continued Integration of Genetics into Medical Sociology. Journal of Health and Social Behavior, 62(3), 404–418. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Border R, Athanasiadis G, Buil A, Schork AJ, Cai N, Young AI, … Sankararaman S (2022). Cross-trait assortative mating is widespread and inflates genetic correlation estimates. Science, 378(6621), 754–761. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Braudt D, & Harris KM (2020). Polygenic scores (pgss) in the national longitudinal study of adolescent to adult health (add health)–release 2. [Google Scholar]
- BRCA Exchange. (2022). The BRCA Exchange Web Portal. Retrieved from https://brcaexchange.org/
- Brumpton B, Sanderson E, Heilbron K, Hartwig FP, Harrison S, Vie GÅ, … Boomsma DI (2020). Avoiding dynastic, assortative mating, and population stratification biases in Mendelian randomization through within-family analyses. Nature communications, 11(1), 1–13. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Bryan J. (1995). Seven Types of Distortion: A Taxonomy of Manipulative Techniques used in Charts and Graphs. Journal of Technical Writing and Communication, 25(2), 127–179. doi: 10.2190/pxqq-ae0k-eqcj-06f0 [DOI] [Google Scholar]
- Bulik-Sullivan BK, Loh P-R, Finucane HK, Ripke S, Yang J, Patterson N, … Neale BM (2015). LD Score regression distinguishes confounding from polygenicity in genome-wide association studies. Nature Genetics, 47(3), 291–295. doi: 10.1038/ng.3211 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Burt CH (2015). Heritability studies: methodological flaws, invalidated dogmas, and changing paradigms. Advances in Medical Sociology (Genetics, Health and Society), 16, 1–44. [Google Scholar]
- Burt CH (2022). Irreducibly social: Why biocriminology’s ontoepistemology is incompatible with the social reality of crime. Theoretical criminology, 13624806211073695. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Burt CH (2023a). All that Glisters is Not Gold: Genetics for Social Science. Behavioral and Brain Sciences, 46. doi: 10.1017/S0140525X22002217
- Burt CH (2023b). Challenging the Utility of Polygenic Scores for Social Science: Environmental Confounding, Downward Causation, and Unknown Biology. Behavioral and Brain Sciences, 46, 1–36. doi: 10.1017/S0140525X22001145
- Burt CH (2023c). Polygenic Scores for Social Science: Clarification, Consensus, and Controversy. Behavioral and Brain Sciences, 46. doi: 10.1017/S0140525X23000845
- Burt CH (2023d). Feminist Lesbians as Anti-Trans Villains: A Comment on Worthen and Elaboration. Sexuality & Culture, 27(1), 161–190.
- Burt CH, & Munafò M (2021). Has GWAS lost its status as a paragon of open science? PLoS Biology, 19(5), e3001242.
- Cairo A. (2019). How charts lie: Getting smarter about visual information: WW Norton & Company.
- Campbell CD, & Eichler EE (2013). Properties and rates of germline mutations in humans. Trends in Genetics, 29(10), 575–584.
- Campbell IM, Gambin T, Jhangiani S, Grove ML, Veeraraghavan N, Muzny DM, … Lupski JR (2016). Multiallelic Positions in the Human Genome: Challenges for Genetic Analyses. Human Mutation, 37(3), 231–234. doi: 10.1002/humu.22944
- Cardon LR, & Palmer LJ (2003). Population stratification and spurious allelic association. The Lancet, 361(9357), 598–604. doi: 10.1016/s0140-6736(03)12520-2
- Carlson CS, Eberle MA, Rieder MJ, Yi Q, Kruglyak L, & Nickerson DA (2004). Selecting a maximally informative set of single-nucleotide polymorphisms for association analyses using linkage disequilibrium. The American Journal of Human Genetics, 74(1), 106–120.
- Cheesman R, Hunjan A, Coleman JRI, Ahmadzadeh Y, Plomin R, McAdams TA, … Breen G (2020). Comparison of Adopted and Nonadopted Individuals Reveals Gene–Environment Interplay for Education in the UK Biobank. Psychological Science, 31(5), 582–591. doi: 10.1177/0956797620904450
- Choi SW, Mak TS-H, & O’Reilly PF (2020). Tutorial: a guide to performing polygenic risk score analyses. Nature Protocols, 15(9), 2759–2772.
- Clayton DG, Walker NM, Smyth DJ, Pask R, Cooper JD, Maier LM, … Stevens HE (2005). Population structure, differential bias and genomic control in a large-scale, case-control association study. Nature Genetics, 37(11), 1243–1246.
- Cohen BP (1999). Developing sociological knowledge: Theory and method (Second ed. Vol. 1123): Wadsworth Publishing Company.
- Collins RL, Brand H, Karczewski KJ, Zhao X, Alföldi J, Francioli LC, … Wang H (2020). A structural variation reference for medical and population genetics. Nature, 581(7809), 444–451.
- Conley D. (2016). Socio-genomic research using genome-wide molecular data. Annual Review of Sociology, 42, 275–299.
- Conley D, & Fletcher J (2017). The genome factor: Princeton University Press.
- Cook JP, Mahajan A, & Morris AP (2020). Fine-scale population structure in the UK Biobank: implications for genome-wide association studies. Human Molecular Genetics, 29(16), 2803–2811.
- Coop G. (2022). Genetic similarity and genetic ancestry groups. arXiv preprint arXiv:2207.11595.
- Coop G, & Przeworski M (2022a). Lottery, luck, or legacy? Evolution.
- Coop G, & Przeworski M (2022b). Luck, lottery, or legacy? The problem of confounding. A reply to Harden. Evolution, 76(10), 2464.
- Curtis D. (2018). Polygenic risk score for schizophrenia is more strongly associated with ancestry than with schizophrenia. Psychiatric Genetics, 28(5), 85–89.
- Dandine-Roulland C, Bellenguez C, Debette S, Amouyel P, Génin E, & Perdry H (2016). Accuracy of heritability estimations in presence of hidden population stratification. Scientific Reports, 6, 26471.
- Domingue BW, Fletcher J, Conley D, & Boardman JD (2014). Genetic and educational assortative mating among US adults. Proceedings of the National Academy of Sciences, 111(22), 7996–8000.
- Dudbridge F. (2013). Power and predictive accuracy of polygenic risk scores. PLoS Genetics, 9(3), e1003348.
- Dupré J. (2001). Human nature and the limits of science: Clarendon Press.
- Dupré J. (2012). Processes of life: Essays in the philosophy of biology: Oxford University Press.
- Feldman MW, & Ramachandran S (2018). Missing compared to what? Revisiting heritability, genes and culture. Philosophical Transactions of the Royal Society B: Biological Sciences, 373(1743), 20170064.
- Fletcher J. (2023). Often Wrong, Sometimes Useful: Including Polygenic Scores in Social Science Research. Behavioral and Brain Sciences.
- Fletcher J, Wu Y, Li T, & Lu Q (2021). Interpreting Polygenic Score Effects in Sibling Analysis. Cold Spring Harbor Laboratory. doi: 10.1101/2021.07.16.452740
- Freese J. (2008). Genetics and the social science explanation of individual outcomes. American Journal of Sociology, 114(S1), S1–S35.
- Fry A, Littlejohns TJ, Sudlow C, Doherty N, Adamska L, Sprosen T, … Allen NE (2017). Comparison of sociodemographic and health-related characteristics of UK Biobank participants with those of the general population. American Journal of Epidemiology, 186(9), 1026–1034.
- Fu W, O’Connor TD, Jun G, Kang HM, Abecasis G, Leal SM, … Akey JM (2013). Analysis of 6,515 exomes reveals the recent origin of most human protein-coding variants. Nature, 493(7431), 216–220. doi: 10.1038/nature11690
- Fujimura JH, & Rajagopalan R (2011). Different differences: The use of ‘genetic ancestry’ versus race in biomedical human genetic research. Social Studies of Science, 41(1), 5–30.
- Griffing B. (1967). Selection in reference to biological groups I. Individual and group selection applied to populations of unordered groups. Australian Journal of Biological Sciences, 20(1), 127–140.
- Hacking I, & Hacking J (1999). The social construction of what?: Harvard University Press.
- Hamer DH (2000). Beware the chopsticks gene. Molecular Psychiatry, 5(1), 11–13.
- Harden KP, & Koellinger PD (2020). Using genetics for social science. Nature Human Behaviour, 4(6), 567–576.
- Haslanger S. (1995). Ontology and Social Construction. Philosophical Topics, 23(2), 95–125.
- Haworth S, Mitchell R, Corbin L, Wade KH, Dudding T, Budu-Aggrey A, … Smith GD (2019). Apparent latent structure within the UK Biobank sample has implications for epidemiological analysis. Nature Communications, 10(1), 1–9.
- Helgason A, Yngvadottir B, Hrafnkelsson B, Gulcher J, & Stefánsson K (2005). An Icelandic example of the impact of population structure on association studies. Nature Genetics, 37(1), 90–95.
- Henn BM, Cavalli-Sforza LL, & Feldman MW (2012). The great human expansion. Proceedings of the National Academy of Sciences, 109(44), 17758–17764.
- Herd P, Mills MC, & Dowd JB (2021). Reconstructing Sociogenomics Research: Dismantling Biological Race and Genetic Essentialism Narratives. Journal of Health and Social Behavior, 00221465211018682.
- Howe LJ, Nivard MG, Morris TT, Hansen AF, Rasheed H, Cho Y, … Palviainen T (2022). Within-sibship genome-wide association analyses decrease bias in estimates of direct genetic effects. Nature Genetics, 54(5), 581–592.
- International HapMap Consortium. (2005). A haplotype map of the human genome. Nature, 437(7063), 1299.
- Janssens ACJ (2019). Validity of polygenic risk scores: are we measuring what we think we are? Human Molecular Genetics, 28(R2), R143–R150.
- Jencks C, Smith M, Acland H, Bane MJ, Cohen D, Gintis H, … Michelson S (1972). Inequality: A reassessment of the effect of family and schooling in America. In (pp. 517–523). New York: Basic Books.
- Kemper KE, Yengo L, Zheng Z, Abdellaoui A, Keller MC, Goddard ME, … Visscher PM (2021). Phenotypic covariance across the entire spectrum of relatedness for 86 billion pairs of individuals. Nature Communications, 12(1), 1050.
- Kerminen S, Havulinna AS, Hellenthal G, Martin AR, Sarin A-P, Perola M, … Ripatti S (2017). Fine-scale genetic structure in Finland. G3: Genes, Genomes, Genetics, 7(10), 3459–3468.
- Kim MS, Patel KP, Teng AK, Berens AJ, & Lachance J (2018). Genetic disease risks can be misestimated across global populations. Genome Biology, 19, 1–14.
- Klein RJ, Zeiss C, Chew EY, Tsai J-Y, Sackler RS, Haynes C, … Mayne ST (2005). Complement factor H polymorphism in age-related macular degeneration. Science, 308(5720), 385–389.
- Kong A, Thorleifsson G, Frigge ML, Vilhjalmsson BJ, Young AI, Thorgeirsson TE, … Masson G (2018). The nature of nurture: Effects of parental genotypes. Science, 359(6374), 424–428.
- Kowalski MH, Qian H, Hou Z, Rosen JD, Tapia AL, Shan Y, … Avery C (2019). Use of >100,000 NHLBI Trans-Omics for Precision Medicine (TOPMed) Consortium whole genome sequences improves imputation quality and detection of rare variant associations in admixed African and Hispanic/Latino populations. PLoS Genetics, 15(12), e1008500.
- Lambert SA, Gil L, Jupp S, Ritchie SC, Xu Y, Buniello A, … Parkinson H (2021). The Polygenic Score Catalog as an open database for reproducibility and systematic evaluation. Nature Genetics, 53(4), 420–425.
- Lander ES, & Schork NJ (1994). Genetic dissection of complex traits. Science, 265(5181), 2037–2048.
- Laurie CC, Doheny KF, Mirel DB, Pugh EW, Bierut LJ, Bhangale T, … Weir BS (2010). Quality control and quality assurance in genotypic data for genome-wide association studies. Genetic Epidemiology, 34(6), 591–602. doi: 10.1002/gepi.20516
- Lawson DJ, Davies NM, Haworth S, Ashraf B, Howe L, Crawford A, … Timpson NJ (2020). Is population structure in the genetic biobank era irrelevant, a challenge, or an opportunity? Human Genetics, 139(1), 23–41. doi: 10.1007/s00439-019-02014-8
- Lawson DJ, Hellenthal G, Myers S, & Falush D (2012). Inference of population structure using dense haplotype data. PLoS Genetics, 8(1), e1002453.
- Lee JJ, Wedow R, Okbay A, Kong E, Maghzian O, Zacher M, … Linnér RK (2018). Gene discovery and polygenic prediction from a genome-wide association study of educational attainment in 1.1 million individuals. Nature Genetics, 50(8), 1112–1121.
- Lewis AC, Molina SJ, Appelbaum PS, Dauda B, Di Rienzo A, Fuentes A, … Hammonds EM (2022). Getting genetic ancestry right for science and society. Science, 376(6590), 250–252.
- Lovell MC (2008). A simple proof of the FWL theorem. The Journal of Economic Education, 39(1), 88–91.
- Lynch M, & Walsh B (1998). Genetics and analysis of quantitative traits (Vol. 1): Sinauer; Sunderland, MA.
- Mak TSH, Porsch RM, Choi SW, Zhou X, & Sham PC (2017). Polygenic scores via penalized regression on summary statistics. Genetic Epidemiology, 41(6), 469–480.
- Marees AT, De Kluiver H, Stringer S, Vorspan F, Curis E, Marie-Claire C, & Derks EM (2018). A tutorial on conducting genome-wide association studies: Quality control and statistical analysis. International Journal of Methods in Psychiatric Research, 27(2), e1608. doi: 10.1002/mpr.1608
- Martin AR, Gignoux CR, Walters RK, Wojcik GL, Neale BM, Gravel S, … Kenny EE (2017). Human demographic history impacts genetic risk prediction across diverse populations. The American Journal of Human Genetics, 100(4), 635–649.
- Martin AR, Kanai M, Kamatani Y, Okada Y, Neale BM, & Daly MJ (2019). Clinical use of current polygenic risk scores may exacerbate health disparities. Nature Genetics, 51(4), 584–591. doi: 10.1038/s41588-019-0379-x
- Mathieson I, & Scally A (2020). What is ancestry? PLoS Genetics, 16(3), e1008624.
- McClellan J, & King M-C (2010). Genetic heterogeneity in human disease. Cell, 141(2), 210–217.
- McEvoy BP, Powell JE, Goddard ME, & Visscher PM (2011). Human population dispersal "Out of Africa" estimated from linkage disequilibrium and allele frequencies of SNPs. Genome Research, 21(6), 821–829. doi: 10.1101/gr.119636.110
- Mills MC, Barban N, & Tropf FC (2020). An introduction to statistical genetic data analysis: MIT Press.
- Mills MC, & Tropf FC (2020). Sociology, Genetics, and the Coming of Age of Sociogenomics. Annual Review of Sociology, 46.
- Morris TT, Davies NM, Hemani G, & Smith GD (2020). Population phenomena inflate genetic associations of complex social traits. Science Advances, 6(16), eaay0328.
- Mostafavi H, Harpak A, Agarwal I, Conley D, Pritchard JK, & Przeworski M (2020). Variable prediction accuracy of polygenic scores within an ancestry group. eLife, 9. doi: 10.7554/elife.48376
- Nicholls HL, John CR, Watson DS, Munroe PB, Barnes MR, & Cabrera CP (2020). Reaching the End-Game for GWAS: Machine Learning Approaches for the Prioritization of Complex Disease Loci. Frontiers in Genetics, 11. doi: 10.3389/fgene.2020.00350
- Novembre J, Johnson T, Bryc K, Kutalik Z, Boyko AR, Auton A, … Bustamante CD (2008). Genes mirror geography within Europe. Nature, 456(7218), 98–101. doi: 10.1038/nature07331
- O’Connor TD, Fu W, Mychaleckyj JC, Logsdon B, Auer P, Carlson CS, … Akey JM (2015). Rare Variation Facilitates Inferences of Fine-Scale Population Structure in Humans. Molecular Biology and Evolution, 32(3), 653–660. doi: 10.1093/molbev/msu326
- Okbay A, Wu Y, Wang N, Jayashankar H, Bennett M, Nehzati SM, … Gjorgjieva T (2022). Polygenic prediction of educational attainment within and between families from genome-wide association analyses in 3 million individuals. Nature Genetics, 54(4), 437–449.
- Pain O, Glanville KP, Hagenaars SP, Selzam S, Fürtjes AE, Gaspar HA, … Plomin R (2021). Evaluation of polygenic prediction methodology within a reference-standardized framework. PLoS Genetics, 17(5), e1009021.
- Panofsky A, & Bliss C (2017). Ambiguity and scientific authority: Population classification in genomic science. American Sociological Review, 82(1), 59–87.
- Patterson N, Price AL, & Reich D (2006). Population Structure and Eigenanalysis. PLoS Genetics, 2(12), e190. doi: 10.1371/journal.pgen.0020190
- Pingault JB, Allegrini AG, Odigie T, Frach L, Baldwin JR, Rijsdijk F, & Dudbridge F (2022). Research Review: How to interpret associations between polygenic scores, environmental risks, and phenotypes. Journal of Child Psychology and Psychiatry, 63(10), 1125–1139.
- Pirastu N, Cordioli M, Nandakumar P, Mignogna G, Abdellaoui A, Hollis B, … Baya N (2021). Genetic analyses identify widespread sex-differential participation bias. Nature Genetics, 53(5), 663–671.
- Plomin R, & Von Stumm S (2022). Polygenic scores: prediction versus explanation. Molecular Psychiatry, 27(1), 49–52.
- Posth C, Renaud G, Mittnik A, Drucker DG, Rougier H, Cupillard C, … Krause J (2016). Pleistocene Mitochondrial Genomes Suggest a Single Major Dispersal of Non-Africans and a Late Glacial Population Turnover in Europe. Current Biology, 26(6), 827–833. doi: 10.1016/j.cub.2016.01.037
- Press N. (2006). Social Construction and Medicalization: Behavioral Genetics in Context. In Parens E, Chapman AR, & Press N (Eds.), Wrestling with Behavioral Genetics: Science, Ethics, and Public Conversation (pp. 131–149). Baltimore, MD: The Johns Hopkins University Press.
- Price AL, Patterson NJ, Plenge RM, Weinblatt ME, Shadick NA, & Reich D (2006). Principal components analysis corrects for stratification in genome-wide association studies. Nature Genetics, 38(8), 904–909. doi: 10.1038/ng1847
- Price AL, Zaitlen NA, Reich D, & Patterson N (2010). New approaches to population stratification in genome-wide association studies. Nature Reviews Genetics, 11(7), 459–463.
- Pritchard JK, & Rosenberg NA (1999). Use of unlinked genetic markers to detect population stratification in association studies. The American Journal of Human Genetics, 65(1), 220–228.
- Privé F, Arbel J, & Vilhjálmsson BJ (2020). LDpred2: better, faster, stronger. Bioinformatics, 36(22-23), 5424–5431.
- Purcell SM, Wray NR, Stone JL, Visscher PM, O'Donovan MC, Sullivan PF, … The International Schizophrenia Consortium. (2009). Common polygenic variation contributes to risk of schizophrenia and bipolar disorder. Nature, 460(7256), 748–752. doi: 10.1038/nature08185
- Richardson K, & Jones MC (2019). Why genome-wide associations with cognitive ability measures are probably spurious. New Ideas in Psychology, 55, 35–41. doi: 10.1016/j.newideapsych.2019.04.005
- Robinson MR, Kleinman A, Graff M, Vinkhuyzen AA, Couper D, Miller MB, … Nolte IM (2017). Genetic evidence of assortative mating in humans. Nature Human Behaviour, 1(1), 0016.
- Rosenberg NA, Edge MD, Pritchard JK, & Feldman MW (2019). Interpreting polygenic scores, polygenic adaptation, and human phenotypic differences. Evolution, Medicine, and Public Health, 2019(1), 26–34.
- Royal CD, Novembre J, Fullerton SM, Goldstein DB, Long JC, Bamshad MJ, & Clark AG (2010). Inferring genetic ancestry: opportunities, challenges, and implications. The American Journal of Human Genetics, 86(5), 661–673.
- Schaid DJ, Chen W, & Larson NB (2018). From genome-wide associations to candidate causal variants by statistical fine-mapping. Nature Reviews Genetics, 19(8), 491–504. doi: 10.1038/s41576-018-0016-z
- Searle JR (1995). The construction of social reality: Simon and Schuster.
- Selzam S, Ritchie SJ, Pingault J-B, Reynolds CA, O’Reilly PF, & Plomin R (2019). Comparing Within- and Between-Family Polygenic Score Prediction. The American Journal of Human Genetics, 105(2), 351–363. doi: 10.1016/j.ajhg.2019.06.006
- Shi J, Park J-H, Duan J, Berndt ST, Moy W, Yu K, … Silverman D (2016). Winner's curse correction and variable thresholding improve performance of polygenic risk modeling based on genome-wide association study summary-level data. PLoS Genetics, 12(12), e1006493.
- Skoglund P, & Mathieson I (2018). Ancient genomics of modern humans: the first decade. Annual Review of Genomics and Human Genetics, 19, 381–404.
- Slatkin M. (1985). Rare alleles as indicators of gene flow. Evolution, 39(1), 53–65.
- Sohail M, Maier RM, Ganna A, Bloemendal A, Martin AR, Turchin MC, … Sunyaev SR (2019). Polygenic adaptation on height is overestimated due to uncorrected stratification in genome-wide association studies. eLife, 8. doi: 10.7554/elife.39702
- Speed D, & Balding DJ (2019). SumHer better estimates the SNP heritability of complex traits from summary statistics. Nature Genetics, 51(2), 277–284.
- Speed D, Hemani G, Johnson MR, & Balding DJ (2012). Improved heritability estimation from genome-wide SNPs. The American Journal of Human Genetics, 91(6), 1011–1021.
- Strachan T, & Read AP (2018). Human molecular genetics.
- Sudmant PH, Rausch T, Gardner EJ, Handsaker RE, Abyzov A, Huddleston J, … Korbel JO (2015). An integrated map of structural variation in 2,504 human genomes. Nature, 526(7571), 75–81. doi: 10.1038/nature15394
- Takumi T, & Tamada K (2018). CNV biology in neurodevelopmental disorders. Current Opinion in Neurobiology, 48, 183–192.
- Taliun D, Harris DN, Kessler MD, Carlson J, Szpiech ZA, Torres R, … Abecasis GR (2021). Sequencing of 53,831 diverse genomes from the NHLBI TOPMed Program. Nature, 590(7845), 290–299. doi: 10.1038/s41586-021-03205-y
- Tam V, Patel N, Turcotte M, Bossé Y, Paré G, & Meyre D (2019). Benefits and limitations of genome-wide association studies. Nature Reviews Genetics, 20(8), 467–484. doi: 10.1038/s41576-019-0127-1
- Telenti A, Pierce LCT, Biggs WH, Di Iulio J, Wong EHM, Fabani MM, … Venter JC (2016). Deep sequencing of 10,000 human genomes. Proceedings of the National Academy of Sciences, 113(42), 11901–11906. doi: 10.1073/pnas.1613365113
- Tennessen JA, Bigham AW, O'Connor TD, Fu W, Kenny EE, Gravel S, … Akey JM (2012). Evolution and functional impact of rare coding variation from deep sequencing of human exomes. Science, 337(6090), 64–69. doi: 10.1126/science.1219240
- The Haplotype Reference Consortium. (2016). A reference panel of 64,976 haplotypes for genotype imputation. Nature Genetics, 48(10), 1279–1283.
- The International HapMap Consortium. (2005). A haplotype map of the human genome. Nature, 437(7063), 1299–1320. doi: 10.1038/nature04226
- Tooby J, & Cosmides L (1992). The psychological foundations of culture. The adapted mind: Evolutionary psychology and the generation of culture, 19, 19–136.
- Truong VQ, Woerner JA, Cherlin TA, Bradford Y, Lucas AM, Okeh CC, … Pividori M (2022). Quality Control Procedures for Genome-Wide Association Studies. Current Protocols, 2(11), e603.
- Turkheimer E. (2011). Genome wide association studies of behavior are social science. In Philosophy of behavioral biology (pp. 43–64): Springer.
- Turkheimer E. (2019a). Against Decile Analysis. Retrieved from https://www.geneticshumanagency.org/gha/against-decile-analysis/
- Turkheimer E. (2019b). The social science blues. Hastings Center Report, 49(3), 45–47.
- Tyrrell J, Zheng J, Beaumont R, Hinton K, Richardson TG, Wood AR, … Tilling K (2021). Genetic predictors of participation in optional components of UK Biobank. Nature Communications, 12(1), 886.
- Uffelmann E, Huang QQ, Munung NS, de Vries J, Okada Y, Martin AR, … Posthuma D (2021). Genome-wide association studies. Nature Reviews Methods Primers, 1(1), 1–21.
- Veller C, & Coop G (2023). Interpreting population and family-based genome-wide association studies in the presence of confounding. bioRxiv, 2023.02.26.530052. doi: 10.1101/2023.02.26.530052
- Vilhjálmsson BJ, Yang J, Finucane HK, Gusev A, Lindström S, Ripke S, … Do R (2015). Modeling linkage disequilibrium increases accuracy of polygenic risk scores. The American Journal of Human Genetics, 97(4), 576–592.
- Visscher PM, Wray NR, Zhang Q, Sklar P, McCarthy MI, Brown MA, & Yang J (2017). 10 years of GWAS discovery: biology, function, and translation. The American Journal of Human Genetics, 101(1), 5–22.
- Wand H, Lambert SA, Tamburro C, Iacocca MA, O’Sullivan JW, Sillari C, … Brockman D (2021). Improving reporting standards for polygenic scores in risk prediction studies. Nature, 591(7849), 211–219.
- Ware EB, Gard A, Schmitz L, & Faul J (2021). HRS Polygenic Scores – Release 4.3 2006-2012 Genetic data. Retrieved from Ann Arbor, Michigan:
- Ware EB, Schmitz LL, Faul J, Gard A, Mitchell C, Smith JA, … Kardia SL (2017). Heterogeneity in polygenic scores for common human traits. bioRxiv, 106062.
- Weir B. (2008). Linkage disequilibrium and association mapping. Annual Review of Genomics and Human Genetics, 9, 129–142.
- Weissgerber TL, Milic NM, Winham SJ, & Garovic VD (2015). Beyond bar and line graphs: time for a new data presentation paradigm. PLoS Biology, 13(4), e1002128.
- Wray NR, Goddard ME, & Visscher PM (2007). Prediction of individual genetic risk to disease from genome-wide association studies. Genome Research, 17(10), 1520–1528.
- Wray NR, Kemper KE, Hayes BJ, Goddard ME, & Visscher PM (2019). Complex trait prediction from genome data: contrasting EBV in livestock to PRS in humans: genomic prediction. Genetics, 211(4), 1131–1141. Retrieved from https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6456317/pdf/1131.pdf
- Wray NR, Lin T, Austin J, McGrath JJ, Hickie IB, Murray GK, & Visscher PM (2021). From basic science to clinical application of polygenic risk scores: a primer. JAMA Psychiatry, 78(1), 101–109.
- Yang J, Lee SH, Goddard ME, & Visscher PM (2011). GCTA: A Tool for Genome-wide Complex Trait Analysis. The American Journal of Human Genetics, 88(1), 76–82. doi: 10.1016/j.ajhg.2010.11.011
- Yang J, Zaitlen NA, Goddard ME, Visscher PM, & Price AL (2014). Advantages and pitfalls in the application of mixed-model association methods. Nature Genetics, 46(2), 100–106.
- Yengo L, Robinson MR, Keller MC, Kemper KE, Yang Y, Trzaskowski M, … Benjamin DJ (2018). Imprint of assortative mating on the human genome. Nature Human Behaviour, 2(12), 948–954.
- Young AI, Benonisdottir S, Przeworski M, & Kong A (2019). Deconstructing the sources of genotype-phenotype associations in humans. Science, 365(6460), 1396–1400. doi: 10.1126/science.aax3710
- Young AI, Frigge ML, Gudbjartsson DF, Thorleifsson G, Bjornsdottir G, Sulem P, … Kong A (2018). Relatedness disequilibrium regression estimates heritability without environmental bias. Nature Genetics, 50(9), 1304–1310.
- Young AI, Nehzati SM, Benonisdottir S, Okbay A, Jayashankar H, Lee C, … Kong A (2022). Mendelian imputation of parental genotypes improves estimates of direct genetic effects. Nature Genetics, 54(6), 897–905.
- Yu J, Pressoir G, Briggs WH, Vroh Bi I, Yamasaki M, Doebley JF, … Holland JB (2006). A unified mixed-model method for association mapping that accounts for multiple levels of relatedness. Nature Genetics, 38(2), 203–208.
- Zaidi AA, & Mathieson I (2020). Demographic history mediates the effect of stratification on polygenic scores. eLife, 9. doi: 10.7554/elife.61548
- Zöllner S, & Pritchard JK (2007). Overcoming the winner’s curse: estimating penetrance parameters from case-control data. The American Journal of Human Genetics, 80(4), 605–615.