Abstract
Human genetics has emerged as one of the most dynamic areas of biology, with a broadening societal impact. In this review, we discuss recent achievements, ongoing efforts and future challenges in the field. Advances in technology, statistical methods and the growing scale of research efforts have all provided many insights into the processes that have given rise to the current patterns of genetic variation. Vast maps of genetic associations with human traits and diseases have allowed characterization of their genetic architecture. Finally, studies of molecular and cellular effects of genetic variants have provided insights into biological processes underlying disease. Many outstanding questions remain, but the field is well poised for groundbreaking discoveries as it increases the use of genetic data both to understand the history of our species and to improve human health.
Introduction
In the publication describing the initial draft of the human genome 1,2, the progress during the 20th century in understanding the structure and content of genetic information was divided into four phases. Each of them spanned about a quarter of the century: discovery of chromosomes; defining the molecular structure of DNA; discovery of the molecular machinery of gene function; and finally determining the sequence of entire genes, scaffolds, and genomes. These achievements propelled the entire field of genetics into the genomic era in the early 21st century.
As the first quarter of this century soon draws to a close, we can reflect on the crowning achievements of genomics during this period: the characterization of genetic variation in human populations and the discovery of its contribution to phenotypic variation. Since the publication of the draft sequence of the human genome, human genetics has experienced dramatic growth in both the diversity and quality of genetic data, alongside an understanding of how genetic variation is linked to a wide variety of phenotypes (Figure 1a). These developments have demonstrated the fundamental role that genetics has in characterizing human biology, ranging from molecular to physiological levels, as well as the evolutionary history of our species and the evolution of complex traits, as we discuss in this Review.
Figure 1. Data sets and motivation for human genetics research.
A) Growth in human genetics data sets as exemplified by properties of selected landmark studies, plotted by the comprehensiveness of the genome analysis (x-axis) with the technologies indicated on the top, and the number of donors (y-axis). The type and quantity of phenotype data available is indicated by the dots. Underlined project names include a relatively balanced representation of individuals from diverse ancestries. The projects shown are Human Genome Project (HGP); HapMap; Wellcome Trust Case Control Consortium (WTCCC); 1000 Genomes (1KG); UK Biobank (UKBB); Pangenome project; Genotype Tissue Expression (GTEx); and Trans-Omics for Precision Medicine (TOPMed). WGS = whole genome sequencing. B-C) Illustration of the two complementary approaches by which human genetics contributes to human health. B illustrates how well-powered GWAS can allow building polygenic risk scores that can be used for personalized disease risk prediction. C illustrates how understanding the functional mechanisms of GWAS loci can allow targeting these mechanisms with drugs and other interventions to prevent or treat disease.
These investments have also been motivated by the potential of human genetic research to enhance human health. This can unfold through two synergistic routes. Accurate prediction of genetic effects on disease risk can improve diagnosis, prognosis and treatment selection. While genomic medicine has already had a transformative clinical impact in rare disease3,4, analogous applications in complex diseases are only now emerging from polygenic risk scores, as discussed further below5 (Figure 1b). Beyond prediction, human genetics also empowers the development of new drugs and interventions via identification of causal genes and molecular mechanisms involved in disease6 (Figure 1c). This paradigm is well supported by the higher success rates for drug targets backed by genetic evidence7,8. This requires characterization of functional mechanisms of genetic disease associations, which remains a considerable challenge, with current insights and ways forward discussed further below. A foundation of these goals of genetic prediction and mechanistic understanding is population genetics, which describes the processes that have given rise to and maintain human genetic variation.
Origins and contemporary patterns of genetic variation in human populations
Population genetics, which originated just over a century ago alongside modern statistics, is the study of the origin and evolution of genetic variation within groups of individuals. Classically, “population” is used in the biological sense of groups of randomly mating individuals. When applied to our own species, however, the term is often used to demarcate groups of humans and thereby implies discrete units of human genetic variation, which stands in stark contrast with the incredible amount of shared variation among all humans 9,11. Although we will use “population” throughout this review in the technical biological sense, there is an increasing disciplinary call to shift from labeling human groups as discrete separate units, particularly when social systems have influenced those unit labels 10,12. In fact, one of the ten “bold predictions for human genomics by 2030” from the US National Human Genome Research Institute’s strategic plan is that “Research in human genomics will have moved beyond population descriptors based on historic social constructs such as race.”13.
Quoting Hubby and Lewontin 14, who were the first to use gel electrophoresis to demonstrate variation at the genetic level in natural populations, “a description of the genetic variation in a population is the fundamental datum of evolutionary studies.” Thus population genetic studies which focus on the genetic variation in a population are crucial to understanding the genetic basis of complex traits, and many future opportunities for research lie at the intersection of human population genetics and statistical genetics, as we discuss later.
Genomic era datasets for population genetics
Many insights into recent human population histories have been enabled by early projects pursued in tandem with the Human Genome Project, in particular the Human Genome Diversity Panel (HGDP) 15; and the International HapMap Project (hereafter “HapMap”) 16. The HGDP consisted of lymphoblastoid cell lines from 1064 individuals from globally distributed populations, collected for characterization of human population genetic variation17. The HapMap was an international collaboration that began in 2002 and focused on the development of a haplotype map of the human genome, with a specific motivation to advance genetic association studies. The HapMap ultimately led to the 2009 release of over 1.6 million single nucleotide variants (SNVs) from 1301 samples from 11 populations16. The 1000 Genomes Project was initiated in 2008 as a continuation of the HapMap project to catalog the variants in the human genome that have a frequency of at least 1% in the populations studied. This was done by expanding focus from only SNVs to include other types of genetic variants, and using both low- and high-coverage whole genome and exome sequencing, ushering in a new phase of population genetics focusing on analyzing whole genome sequences18,19. The 1000 Genomes Project’s data includes sequences from over 2,500 individuals from 26 populations.
Alongside the projects focused on characterizing genetic variation of the general population, with accessible data resources but no link to donor phenotypes, large case-control data sets created for genetic mapping of complex traits (see below) were also used for population genetic inference. Today, this trend continues with an increasing emphasis on biobank datasets that combine genetic data with comprehensive phenotyping from up to hundreds of thousands of individuals, often leveraging healthcare systems and registries. A major benefit of biobanks for population genetic research is that they offer high-resolution insight into gene flow, assortative mating and population structure over the last few hundred years, enabled by their scale and the presence of distant and close genetic relatives. For example, the UK Biobank contains over 40,000 first and second degree relative pairs20, and FinnGen contains over 30,000 first and second degree relative pairs21.
While the early disease-focused case-control data sets had sparse data from non-European ancestries24, during the biobank era some progress has been made – at least in absolute numbers of non-European donors, such as in BioBank Japan and All of Us. However, European ancestries still dominate most biobanks, such as UK Biobank, FinnGen, deCODE, and the Estonian biobank. Unfortunately, African populations remain understudied and underrepresented in genetic data sets even though they hold particular value in understanding the origins of human genetic variation22,23,25,26. Furthermore, research on large-scale and biobank datasets often still ignores data from minoritized groups27,28, underscoring the importance not just of diverse data for analyses but of methods for handling imbalanced datasets and varying levels of linkage disequilibrium (LD, the correlation of alleles at different genetic variants) in human genetic studies. The need for data and methods is coupled with ethical, legal and social issues in diversifying genetic research. Many population genetic studies, beginning with the HGDP, have grappled with and continue to grapple with ethical considerations regarding the collection and use of genetic data from participating individuals and their communities. For biobank projects and other genetic studies, informed consent practices are paramount, together with the need for community engagement and release of results to stakeholders.
Insights into human population history from genetic studies
Population genetics methods applied to data of genetic variation in natural populations enable inferences about the past based on four fundamental processes: mutation, recombination, drift (caused by finite population size), and selection. A fifth process of migration (gene flow) is increasingly appreciated as a force shaping human genetic variation at multiple timescales. Altogether, analysis of these processes has provided valuable insights into human population history and its contribution to the contemporary patterns of genetic variation in humans.
One of the major focus areas of population genetic research has been characterization of human migrations across vast time scales. New models for human origins have recently highlighted the complexity of deep population structure in Africa, which in turn offers paths to expand the focus of studies of archaic introgression into modern humans beyond patterns out of Africa 29,30. In many geographic regions, local migrations over millennia have produced a high correlation between genetic distance and geographic distance 31,32. However, historical events such as colonization and chattel slavery further led to the founding and persistence of admixed populations: populations descended from gene flow between two or more previously separated source populations, whose descendant individuals derive ancestry in differing proportions over time from the source populations33,34. The advances in sequencing technologies in the genomic era and introduction of large-scale datasets for medical studies have enabled insight into very recent and much more localized gene flow, for example during The Great Migration (1910–1970) of African Americans out of the US South35. While some migration events are relatively well known from archeological and historical records, genetic data that captures biological ancestry has provided unique insights into population movements during human history.
Population founding events that have characterized much of human history lead to dramatic reductions in genetic variation. This, together with the relatively recent origin of our species in Africa, has resulted in a pattern where genetic differences among human individuals’ genomes are very small, and smaller than in many other species; common variants are often shared across populations; and most human genetic variation is quite rare and confined to single continental ancestries. While single nucleotide variants occur in 3.1% of sites in the genome36, the vast majority of all the cataloged variants are vanishingly rare, and thus any two individuals differ by an average of a few million single nucleotide variants, representing less than 0.1% of the genome. Due to serial founder effects, the amount of genetic variation decreases with population distance from Africa: recent efforts to harmonize the HGDP and 1000 Genomes high-quality whole-genome sequence data15,19 counted an average of 6.1M SNVs per African individual and 5.3M SNVs in others37, with similar patterns for structural variants. In pairwise comparisons, two Yoruba individuals had 4,897,091 pairwise differences at sequenced single-nucleotide variants, over 38% more than two French individuals or two East Asian individuals36. LD is also lowest in African ancestries, followed by European, Asian and American ancestries.
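As an illustration of the LD measure referred to above (the correlation of alleles at different genetic variants), the following minimal sketch computes the squared allelic correlation r² between two biallelic variants from phased haplotypes. The function name `ld_r2` and the toy haplotypes are hypothetical and chosen only for illustration; real analyses would typically rely on dedicated software and on far larger samples.

```python
import numpy as np


def ld_r2(hap_a, hap_b):
    """Squared allelic correlation (r^2) between two biallelic variants.

    hap_a, hap_b: sequences of 0/1 allele codes observed on the same
    haplotypes, listed in the same order.
    """
    a = np.asarray(hap_a, dtype=float)
    b = np.asarray(hap_b, dtype=float)
    if a.shape != b.shape:
        raise ValueError("both variants must be typed on the same haplotypes")
    p_a, p_b = a.mean(), b.mean()  # allele frequencies at each variant
    if min(p_a, p_b) <= 0.0 or max(p_a, p_b) >= 1.0:
        raise ValueError("r^2 is undefined when a variant is monomorphic")
    # D: haplotype frequency of the 1-1 combination minus its expectation
    # if the two variants were independent.
    d = (a * b).mean() - p_a * p_b
    return float(d ** 2 / (p_a * (1 - p_a) * p_b * (1 - p_b)))


# Hypothetical toy haplotypes (8 chromosomes), not real data.
variant_1 = [0, 0, 1, 1, 0, 1, 0, 1]
variant_2 = [0, 0, 1, 1, 0, 1, 0, 1]  # always co-occurs with variant_1
variant_3 = [0, 1, 1, 0, 0, 1, 1, 0]  # independent of variant_1 in this sample

print(ld_r2(variant_1, variant_2))
print(ld_r2(variant_1, variant_3))
```

In this toy sample the first pair is in complete LD (r² = 1) and the second pair shows no LD (r² = 0). Lower LD, as described for African ancestries, corresponds to r² values that decay faster with distance between variants.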
Genetic architecture of human diseases and traits
Connecting genetic variation to phenotypes and understanding the underlying biological mechanisms has been a fundamental goal of human genetics, but the means to achieving this goal have changed dramatically over the past decades. Initial efforts focused on genotyping individuals with severe or highly familial conditions to identify the causal pathogenic mutations they shared, under the assumption that these mutations would be highly penetrant and few in number. In parallel, linkage studies collected families with affected and unaffected individuals and traced the genetic segments that were overrepresented in cases, sometimes implicating large haplotypes with many genes. As the cost of genotyping decreased, the study of common traits shifted towards association studies, wherein large cohorts of unrelated cases and controls were genotyped and individual variants tested for correlation with the trait of interest. An initial period of candidate gene association studies, where only predefined regions were genotyped and tested, led to contradictory findings38, with many questioning the contribution of common variants to common disease39. However, both theory and practical application of genome-wide association studies (GWAS), together with rigorous multiple test correction, began to yield robust associations that replicated across independent studies40. Even these early associations were often surprisingly weak, indicative of either a small contribution of common genetic variation to phenotype, or a highly polygenic contribution involving many variants41. As GWAS sample sizes grew, evidence for the polygenicity of common traits accumulated42, implying that very large studies are necessary to identify the full spectrum of causal genetic variants. 
This has motivated the rise of large-scale biobanks and propelled the number of genome-wide significant associations into the hundreds of thousands, enabling highly precise estimates of disease architecture: the number, frequencies, genomic distribution and disease contributions of causal variants across the genome, discussed in detail below.
Recently, disease genetics has come full circle with large-scale sibling and family-based GWAS, which mirror the early linkage studies but at massive scale43. Family-based studies enable the partitioning of disease architecture into so-called direct and indirect effects; the former associated with variants within an individual and the latter associated with variants shared by their relatives (presumably acting through shared environments). While still relatively small, these studies have demonstrated that many apparent genetic associations in fact reflect correlation with environmental influences on traits rather than causal genetic effects, potentially spanning generations and communities44. While the study of common traits has primarily been driven by GWAS of common variants, enabled by inexpensive genotyping arrays, the contribution of rare variants is now being quantified through large-scale exome- and genome-sequencing studies that can capture the full spectrum of genetic variation45,46. Direct genetic association studies are often underpowered for rare variants, leading to the use of burden tests that “collapse” all variation in a tested gene. Analogous approaches, albeit with different implementations and standards that often include features of linkage analysis, are applied in Mendelian disease genetics.
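The idea of a burden test that “collapses” variation in a gene can be sketched as follows. This is a simplified illustration under stated assumptions, not the procedure of any cited study: the function names, the allele-frequency cutoff `max_af`, and the toy data are all hypothetical, and published burden tests additionally handle covariates, variant annotation, weighting, and significance testing.

```python
import numpy as np


def burden_scores(genotypes, max_af=0.01):
    """Collapse rare variants in one gene into a per-individual burden score.

    genotypes: (n_individuals, n_variants) array of 0/1/2 alternate-allele
    counts for the variants in a single gene.
    max_af: hypothetical frequency cutoff below which a variant counts as rare.
    """
    g = np.asarray(genotypes, dtype=float)
    allele_freq = g.mean(axis=0) / 2.0
    rare = (allele_freq > 0) & (allele_freq <= max_af)
    # Burden = number of rare alternate alleles carried in this gene.
    return g[:, rare].sum(axis=1)


def burden_effect(genotypes, phenotype, max_af=0.01):
    """Least-squares slope of a quantitative phenotype on the burden score."""
    burden = burden_scores(genotypes, max_af)
    y = np.asarray(phenotype, dtype=float)
    x = burden - burden.mean()
    denom = (x ** 2).sum()
    if denom == 0:
        raise ValueError("no variation in burden score; effect is undefined")
    return float((x * (y - y.mean())).sum() / denom)


# Hypothetical toy data: 6 individuals, 3 variants in one gene.
# The cutoff is raised to 0.1 only because the toy sample is tiny.
toy_genotypes = np.array([
    [1, 0, 1],
    [0, 0, 0],
    [0, 1, 1],
    [0, 0, 1],
    [0, 0, 0],
    [0, 0, 1],
])
toy_phenotype = [2.0, 1.0, 2.0, 1.0, 1.0, 1.0]

print(burden_scores(toy_genotypes, max_af=0.1))
print(burden_effect(toy_genotypes, toy_phenotype, max_af=0.1))
```

Here the first two variants are each carried by a single individual and would be untestable on their own, but their combined burden score separates carriers from non-carriers; the third, more common variant is excluded by the cutoff.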
Ubiquitous common variant heritability
Decades before genotyping, the total contribution of genetics to a trait, i.e. the trait heritability, could be estimated through the use of twin and family-based studies. Under certain strict assumptions47, the difference in phenotypic correlation between monozygotic and dizygotic twin pairs, or across family relationships, can be decomposed into genetic and environmental components. Large-scale genotyping enabled the application of similar principles to putatively unrelated individuals, by contrasting subtle patterns of genetic similarity with phenotypic similarity to estimate the so-called genotype-, SNP-, or chip-heritability. The resulting parameter quantifies the variance in phenotype explained by all genotyped variants and any untyped variation they are correlated with48. Multiple methodologies have been devised for estimating SNP-heritability, either using individual-level data42, polygenic scores49, or only summary-level data50, but all of these approaches have converged on the general finding that most common traits have a significant SNP-heritability. For example, in the UK Biobank, 551 common phenotypes had a mean SNP-heritability of 10.9% and 15.6% across all illness and non-illness traits, respectively51; in BioBank Japan, the mean SNP-heritability was estimated to be 8.6% across 58 continuous traits52. Indeed, nearly every common biobank phenotype has some correlation with genetics, with 91% of traits in the FinnGen biobank21 exhibiting at least one genome-wide significant association (for traits with >10,000 cases). The identification of some genetic variants influencing any common trait should thus be the expectation rather than the exception.
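The classical twin-based decomposition mentioned above can be illustrated with Falconer's formula, which is a standard textbook estimator and not necessarily the method used in the studies cited here. It assumes purely additive genetic effects, equal shared environments for both twin types, and random mating. The twin correlations below are hypothetical values chosen for illustration.

```python
def ace_from_twin_correlations(r_mz, r_dz):
    """Falconer-style decomposition of phenotypic variance from twin data.

    r_mz, r_dz: phenotypic correlations in monozygotic and dizygotic pairs.
    Returns (A, C, E): additive genetic (heritability), shared environment,
    and unique environment proportions of variance.
    Assumes additive effects, equal environments, and random mating.
    """
    a = 2.0 * (r_mz - r_dz)   # MZ pairs share ~100% of segregating alleles, DZ ~50%
    c = 2.0 * r_dz - r_mz     # similarity not explained by additive genetics
    e = 1.0 - r_mz            # whatever makes identical twins differ
    return a, c, e


# Hypothetical twin correlations, not taken from any cited study.
h2, shared_env, unique_env = ace_from_twin_correlations(r_mz=0.8, r_dz=0.5)
print(round(h2, 3), round(shared_env, 3), round(unique_env, 3))
```

With these illustrative inputs the three components sum to one by construction. When the assumptions are violated, for example under assortative mating or non-additive effects, the estimate of A can be biased, which is one reason twin-based and SNP-based estimates need not agree.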
Extreme polygenicity of common traits
In addition to SNP heritability, another key parameter driving genetic discoveries is the trait polygenicity: the total number of causal variants influencing the trait and the distribution of their effect sizes. Highly polygenic traits involve many weak causal variants and require large sample sizes to characterize. Because most causal variants are still unknown, various quantifications of trait polygenicity have been proposed, such as the number of non-null effects on a trait53, the effective number of independent variants54, or the minimum number of causal variants explaining a given fraction of heritability55. Regardless of the statistical model, polygenicity has been consistently estimated to be very high, ranging from thousands of causal variants for some estimators53,54 to millions of variants for others55. These staggering estimates would imply that, for some traits, many causal variants are acting through nearly every gene in the genome on average and implicate more than half of all common polymorphisms55. In general, cellular and pigmentation traits exhibit the lowest polygenicity (hundreds of causal variants53,55) whereas anthropometric and cognitive/behavioral traits exhibit some of the highest estimates (>10,000 effective variants54). While traits with similar heritabilities often exhibited different levels of polygenicity53, the variance in polygenicity across traits was generally lower than expected, suggesting that selection, one of the factors driving polygenicity, may be acting pleiotropically across traits rather than on any one measured phenotype56,57. Recently, a GWAS of height in 5.4 million participants demonstrated that 12,111 jointly significant variants explained 40% of the phenotypic variation (compared to total SNP-heritability of 45%), lending the first direct evidence for high trait polygenicity58.
The evolutionary causes of high polygenicity continue to be actively investigated56, but the implications are clear: understanding human traits will require distilling the function of tens of thousands of variants59,60.
Functional partitioning of disease polygenicity
Similar to partitioned SNP-heritability, polygenicity can also be partitioned to quantify whether a given functional annotation contains variants with strong or weak effects on disease. Strikingly, estimates of partitioned polygenicity exhibit very high correlation with partitioned heritability (r2=0.88)54. SNPs in conserved regions, for example, are enriched 13x for heritability and likewise enriched 14x for polygenicity relative to other SNPs, implying that their outsized contribution to heritability may be due in part to an increase in the number of causal variants rather than an increase in the absolute effect sizes. This model, referred to as a “flattening” of heritability, posits that natural selection has distributed (or “flattened”) causal variation in functionally important regions to be more polygenic. Because higher polygenicity also leads to decreased GWAS power, the most significant GWAS associations (and those identified in smaller GWAS studies) may thus not reside in the most functionally “important” regions. The “flattening” of genetic effects may also explain why many complex traits appear to be “omnigenic”59: governed by a small number of “core” genes with direct effects on the trait, which are in turn disproportionately dampened by negative selection, thus increasing the relative contribution of “peripheral” genes with no direct connection to the trait61. One interpretation of both the flattening and omnigenic models is that mapping “core” genes from top GWAS hits alone may be difficult, with large variant effect sizes not implicating the most biologically relevant genes or drug targets. Intriguingly, recent analyses have shown that approved drug target genes are enriched for GWAS association evidence, regardless of the effect size, allele frequency, or year of GWAS7. Larger, better powered GWAS may thus continue to yield important insights into disease mechanisms and therapeutics or even increase in relevance.
Rare variant heritability
The emergence of large whole genome and whole exome sequencing studies has started to enable the characterization of rare and low frequency variant disease architectures. Initially, studies relied on genotype imputation from reference data to explore the heritability of low-frequency variants (0.5–5%). For example, across 40 UK Biobank traits, coding variants explained a greater proportion of low-frequency SNP heritability (17%, 38x enriched) than of common SNP heritability (2%, 7.7x enriched), consistent with the action of negative selection in keeping large effect (typically coding) variants at lower frequencies62. Nevertheless, all coding and UTR variants (i.e. those captured by exome-sequencing) still explained only 26.8% of low-frequency SNP heritability, indicating that whole-genome sequencing would be necessary to identify most low-frequency effects.
Recently, whole-genome sequencing data from 25,465 unrelated individuals was leveraged to estimate total SNP heritability, including rare variants63. These total heritability estimates were 68% for height and 30% for BMI, contrasted with a common SNP heritability of 48% and 24% respectively. Rare variants may thus increase the explained trait variance by 1.25–1.4x relative to common variants alone. A major contribution to the heritability of height came from very rare variants in low LD, which are particularly difficult to impute from reference panels. While a fundamental advance, the study had limitations: the use of complex heritability partitioning to account for allele frequency and LD biases, and the use of conventional common variant approaches to account for population structure (which can fail for rare variation). More data, more traits, and novel methodological approaches will continue to shed light on the question of whole-genome heritability. Intriguingly, both estimates were significantly lower than prior estimates from twin studies, implying either the existence of additional untyped genetic variation (for example, due to structural variants) or systematic biases in the twin cohort analyses.
Emerging methods such as Burden Heritability Regression64 have expanded the estimation of genome-wide partitioned heritability to rare variation. Under the assumption that rare alleles are likely to have consistent effects within a gene, this approach quantifies the total variance in a trait that can be explained by gene burdens across all genes. When applied to 6.9 million coding variants across 22 common traits in the UK Biobank, the average burden heritability was estimated to be 1.3% (for loss of function and missense variants below 0.001 frequency) and significantly non-zero for each trait. Notably, genes that were individually significant in an independent analysis of the same cohort often explained a large fraction of the burden heritability: for example, APOB alone explained 39% of the burden heritability for LDL cholesterol, and 172 known tumor suppressor genes explained 48% of the burden heritability for a composite cancer phenotype. If accurate, these estimates would imply that the rare variant trait architecture is much less polygenic than the extreme polygenicity often observed for common variants. We caution that the characterization of rare variant disease architecture is still in its infancy; larger cohorts and orthogonal methods are needed to understand these parameters and to move beyond relatively simple burden models.
Several large-scale exome-wide association studies have now been conducted and have yielded novel rare variant associations. Exome sequencing data from ~450,000 individuals in the UK Biobank were tested for association with ~4,000 traits, identifying 8,865 significant associations across 564 genes46. Multiple insights into disease architecture were observed. First, rare coding associations were significantly enriched near common GWAS loci, with an enrichment of 59.3x for the nearest gene to a GWAS association, decreasing to 11.4x for genes within one megabase of a GWAS association. These findings show a striking convergence of rare and common variant effects on common diseases. Second, FDA approved drug target genes were 3.6x more likely to exhibit an association, consistent with prior findings that drug targets with human genetics evidence are more likely to be approved. Third, 77% of associations could only be identified using a burden analysis and not single variant associations, underscoring burden heritability as a major driver of discoveries at this sample size. Fourth, while disease lowering associations are potentially the most attractive drug targets, only five such associations were identified and all were previously known, indicative of low power to detect protective effects in unascertained cohorts. While most rare variant associations have been deleterious, an exome study of smoking behavior in n=749,459 individuals, one of the largest to date, identified rare variants in CHRNB2 associated with a 35% decreased odds for smoking heavily65. This finding highlights the growing opportunities for discovering new levers into the treatment of common phenotypes.
Trans-ancestry genetic architecture
While most of the above analyses focused on genetic architecture within presumptively homogenous populations, progress is being made towards understanding genetic architecture across different populations. Theoretical and data-driven studies demonstrated that individual variant associations and aggregates of associations in polygenic scores are likely to translate poorly to genetically distant populations even if the underlying causal variants are shared66,67. This lack of transferability can be driven by a mixture of differences in causal variant allele frequency, linkage disequilibrium (LD) patterns to non-causal variants, and the true underlying effects (modulated, for example, by gene-gene or gene-environment interactions). A strikingly linear relationship between genetic distance and polygenic risk score predictive accuracy was recently demonstrated in a large, admixed biobank across 84 traits (mean Pearson correlation of −0.95 between genetic distance and accuracy)68. Importantly, while the mean risk score value also correlated significantly with genetic distance, the strength and direction of the correlation varied substantially across traits and populations, highlighting the challenges of correcting trans-ancestry score estimates. Beyond demonstrating lack of portability, the contributions of frequency, LD, and effect size are also now being quantified. A recent study of admixed individuals used local ancestry to quantify the correlation in causal effect sizes between African and European ancestry segments69. The estimate was remarkably high, with a mean causal effect-size correlation of 0.95 ± 0.02 across 38 traits and three very different biobanks. This high genetic correlation was also consistent with prior work showing that poor polygenic score portability may be largely explained by frequency and LD differences between populations, rather than different causal variants70.
Intriguingly, the genetic correlation was significantly lower (0.50 ± 0.07) when estimated across non-admixed individuals from different populations in the same study, with prior studies also showing trans-ancestry correlations ranging from 0.46 to 0.85 and generally well below 1.0 71,72. Given the dearth of existing multi-ancestry cohorts27, these findings and open questions further emphasize the importance of designing association studies to maximize population-level and individual-level genetic diversity. Indeed, large multi-ancestry biobanks have already demonstrated increased ability to identify and refine causal variants73,74.
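To make the role of allele frequency in score portability concrete, the sketch below computes a polygenic score as a weighted sum of allele dosages and then compares the score variance expected in two populations that share identical effect sizes but differ in allele frequency. It is an illustrative simplification: the function names, weights, and frequencies are hypothetical, and the variance formula assumes Hardy-Weinberg proportions and independent variants (no LD), which real data do not satisfy.

```python
import numpy as np


def polygenic_score(dosages, weights):
    """Weighted sum of allele dosages (0/1/2) per individual.

    dosages: (n_individuals, n_variants); weights: per-allele effect estimates.
    """
    return np.asarray(dosages, dtype=float) @ np.asarray(weights, dtype=float)


def expected_score_variance(allele_freqs, weights):
    """Expected variance of the score: sum over variants of 2p(1-p) * beta^2.

    Assumes Hardy-Weinberg proportions and no LD between the scored variants.
    """
    p = np.asarray(allele_freqs, dtype=float)
    beta = np.asarray(weights, dtype=float)
    return float((2.0 * p * (1.0 - p) * beta ** 2).sum())


# Hypothetical effect sizes shared by both populations.
weights = [0.2, -0.1, 0.3]

# Hypothetical allele frequencies; the third variant is nearly absent in B.
freqs_pop_a = [0.5, 0.3, 0.20]
freqs_pop_b = [0.5, 0.3, 0.01]

print(polygenic_score([[0, 1, 2], [2, 0, 0]], weights))
print(expected_score_variance(freqs_pop_a, weights))
print(expected_score_variance(freqs_pop_b, weights))
```

Even with identical causal effects, the score captures roughly half as much variance in the second hypothetical population because the largest-effect variant is rare there. Differences in LD between a tagging variant and the causal variant would add a further loss that this sketch does not model.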
Multiple approaches are emerging for maximizing the utility of genetic data across populations and ancestries. Many studies have shown that both polygenic prediction and variant fine-mapping can be improved by incorporating functional annotations 75,76. Notably, the trans-ancestry gains in polygenic prediction accuracy are often substantially larger than the within-population gains, suggesting that better identification of causal variants can mitigate some of the heterogeneity due to frequency or LD differences across populations. Furthermore, power can be increased by aggregating (potentially heterogeneous) variant-level effects into sets such as genes, and then combining the effects of these sets across populations28,77,78. Such variant sets could in principle be aggregated at various biological scales (genes, pathways) and their effects further propagated through biological networks. These approaches highlight how deeper understanding of the causal biological network across traits and populations can be incorporated back into multi-ancestry analyses to further improve power.
Opportunities at the interface of human population genetics and statistical genetics
As the Human Genome Project outlined from its inception, research in human genomics is motivated by gaining understanding of the genetic basis of human disease and complex traits. Moving forward, such research must draw on population genetic models and data from the full diversity of our species. Here we outline a series of opportunities for research at the intersection of statistical and population genetics.
As discussed above, trans-ancestry transferability of genetic associations and polygenic scores remains a key challenge in the field, exacerbated by the ongoing lack of well-powered data sets for many non-European ancestries. While initial studies in admixed populations suggest that causal effect sizes may be largely shared across populations, the full extent of gene-gene and gene-environment interactions, as well as their population or trait specificity, remains to be quantified. Understanding these parameters will further inform the optimal design of accurate predictive scores across the full range of human diversity. Population genetic summaries and models of ancestry are needed to increase transferability of association results, especially for individuals of mixed ancestry for whom local ancestry-aware score construction may improve predictive accuracy79,80. Multiple studies have shown that increased genetic diversity improves the resolution of statistical fine-mapping, which in turn increases the accuracy of polygenic scores. Thus, greater diversity of data from underrepresented individuals will yield immediate benefits for all individuals.
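To make the notion of a local ancestry-aware score concrete, the following minimal sketch weights each effect allele by an effect size specific to the ancestry of the haplotype segment that carries it. The function name, the ancestry labels, the weights and the genotypes are all hypothetical, and published approaches differ in how they estimate and combine ancestry-specific weights.

```python
def local_ancestry_pgs(haplotypes, local_ancestry, weights):
    """Sum effect alleles, each weighted by the effect size estimated for
    the local ancestry of the haplotype segment carrying that allele."""
    score = 0.0
    for hap_alleles, hap_ancestry in zip(haplotypes, local_ancestry):
        for variant, (allele, ancestry) in enumerate(zip(hap_alleles, hap_ancestry)):
            score += allele * weights[ancestry][variant]
    return score


# Hypothetical ancestry-specific effect sizes for three variants.
weights = {
    "anc1": [0.20, -0.10, 0.05],
    "anc2": [0.10, -0.30, 0.05],
}

# One individual: two haplotypes, effect-allele indicators per variant.
haplotypes = [[1, 0, 1], [1, 1, 0]]

# Local ancestry label of each haplotype at each variant.
local_ancestry = [["anc1", "anc1", "anc2"], ["anc2", "anc2", "anc2"]]

score = local_ancestry_pgs(haplotypes, local_ancestry, weights)
```

A standard score would apply a single weight vector to the total dosage; here the same allele contributes differently depending on the ancestral background on which it sits.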
Additionally, environmental heterogeneity pervades human genetic studies and confounds our understanding of the genetic basis of human disease and complex traits81, leading to poor prediction of traits from genetic data alone. Models of assortative mating and consanguinity highlight how violations of standard population genetic assumptions of random mating and equilibrium population dynamics can inflate observed correlations between human traits97. Recent clustering83 and contrastive learning approaches84 highlight confounding factors that bias the downstream estimation of genetic effects. While family-based studies offer the ability to estimate direct and indirect effects of genetic variation on a given trait43,85, family-based estimates of direct effects can be biased by genetic confounding86, in ways that are compounded when estimating susceptibility using genome-wide association results. The scale of biobanks, increasing detail of metadata and environmental covariates, and the development of longitudinal followup efforts will enable greater awareness of, and better control for, environmental confounders, as well as allow for leveraging distant genetic relatives for estimating genetic effects and understanding recent population genetic processes such as pedigree collapse. Modeling the full complexity of human relationships, environmental correlations, and interactions will increase the causal validity and generalizability of genetic discoveries.
In order to prioritize traits for risk prediction and genetic studies, evolutionary models for complex trait architecture are key. Recent work on stabilizing selection suggests that trait architectures for many well-studied complex quantitative traits are similar in their polygenicity87 and that this mode of selection may lead to less cross-population transferability of association results88. Additionally, traits with smaller effects on fitness produce less transferable associations, with weak negative selection producing more population-specific trait architectures 89. Quantifying the extent of polygenic selection and adaptation on complex traits remains a great challenge, in part due to the complexities of disentangling subtle population stratification90,91. Improved methods to detect fine-scale population structure92, larger within-family analyses44, and comprehensive models of selection will enable a more complete understanding of genome-wide evolutionary processes. Beyond answering fundamental questions in human evolution, these findings will also have practical implications: how to mine the thousands of trait-associated loci for the most disease relevant genes and drug targets, and how to integrate the findings from rare and common variation.
Beyond understanding the mechanisms of known associations, there are many opportunities to incorporate novel or difficult to collect variation. Large-scale biobanks have highlighted the important role of structural variation, including copy number variants and tandem repeats with some of the largest effect sizes on traits seen to date93. Structural variants are typically not directly genotyped and often not imputed, leaving a gap in our knowledge of disease mechanisms beyond single variants. Methods for identification of complex structural changes directly from data94, as well as improvements in genome assembly95, may reveal entirely new classes of disease relevant variation. Similarly, while the role of additive and dominance variation has been well characterized through large-scale biobank and heritability analyses96, the influence of epistatic effects remains largely a mystery. While genetic interactions and hotspots are widespread in model organisms97 they have been challenging to characterize in humans due to the breadth of the search space and statistical limitations98. This is especially true for more complex relationships beyond simple pairwise interactions, which may be impossible to even enumerate in human populations. Integration of population genetic modeling99 and functional studies100 may push through the statistical limitations and expand our understanding of trait effects into higher orders.
Continued research at the nexus of population and statistical genetics, as well as the increased ability to study traits in biobanks using family-based and genealogical approaches, will help make gains towards improved variant discovery and risk prediction, while identifying traits with large environmental influences for which additional studies, data, and approaches will be needed for risk assessment, treatment and prevention.
Molecular and cellular effects of genetic variation
Genetic variants affecting complex physiological traits and diseases must have proximal effects on molecular functions, which then impact subsequent molecular processes at the cellular level. Deciphering these molecular and cellular mediators of genetic associations has emerged as a central focus of contemporary human genetics, as it can offer insights into molecular understanding of causal processes of disease. The significance of this extends beyond fundamental biology since these processes serve as potential intervention targets7,8 (Figure 1c). Furthermore, while many molecular effects of variants have no impact on physiological phenotypes (Figure 2), they represent a natural experiment of variations in the genome sequence, which can contribute to understanding the biology of genome function.
Figure 2. An illustration of genetic effects on functions at different levels.

There are large numbers of variants affecting molecular functions of the genome and the cell, many of which have no or smaller effects downstream. Variants affecting physiological, anatomical and disease traits can be under direct natural selection. The purple graph indicates the success in discovery of genetic associations for molecular traits (captured by molQTL mapping) and for physiological and disease traits (captured by classical GWAS), with a gap in our knowledge of genetic associations for cellular and tissue-level traits.
Methods for analyzing the functional effects of genetic variants
While the analysis of molecular effects of variants has been part of molecular genetics from its inception, it was not until the advent of DNA hybridization array technology in the 2000s that genome-wide analysis became feasible. This technological advancement led to expression quantitative trait locus (eQTL) mapping to identify variants associated with gene expression levels, an approach first applied to humans about 15 years ago (Figure 3a). Since then, this method has evolved to cover molecular phenotypes from epigenomic measurements to splicing and protein levels, collectively often referred to as molecular QTLs (molQTLs). Large-scale projects have constructed molQTL resources for various tissues and cells, including under in vitro stimuli, with an increasing use of single cell technologies101. Most molQTLs robustly identified to date are in cis, i.e. affecting a nearby target gene via cis-regulatory mechanisms, because trans-QTLs between variants and genes across the genome can be reliably identified only with large sample sizes and careful control of confounders102. MolQTL methodology is reviewed in detail elsewhere103.
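At its core, a single cis-eQTL test can be thought of as a regression of a gene's expression on genotype dosage at a nearby variant. The sketch below shows that core step on simulated data; the function name, the simulated effect size and the sample size are assumptions for illustration only, and real pipelines additionally normalize expression, adjust for covariates and hidden confounders, and correct for multiple testing.

```python
import math
import random


def eqtl_regression(dosages, expression):
    """Ordinary least squares of expression on genotype dosage (0, 1, 2).

    Returns the slope, its standard error, and the t statistic.
    """
    n = len(dosages)
    mean_g = sum(dosages) / n
    mean_e = sum(expression) / n
    sxx = sum((g - mean_g) ** 2 for g in dosages)
    sxy = sum((g - mean_g) * (e - mean_e) for g, e in zip(dosages, expression))
    slope = sxy / sxx
    intercept = mean_e - slope * mean_g
    rss = sum((e - intercept - slope * g) ** 2 for g, e in zip(dosages, expression))
    se = math.sqrt(rss / (n - 2) / sxx)
    return slope, se, slope / se


# Simulated (hypothetical) data: 500 individuals, true per-allele effect 0.5.
rng = random.Random(0)
true_effect = 0.5
dosages = [rng.choice([0, 1, 1, 2]) for _ in range(500)]
expression = [true_effect * g + rng.gauss(0.0, 1.0) for g in dosages]

slope, se, t_stat = eqtl_regression(dosages, expression)
```

Repeating this test for every variant near every gene yields a cis-eQTL map; the much larger number of variant-gene pairs in a genome-wide trans search is one reason why trans-QTLs demand larger samples.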
Figure 3. Approaches for understanding molecular effects of genetic variants at scale.

a) molecular QTL (molQTL) mapping, b) engineered perturbations of the genome, c) inference from multi-layered functional omics data.
Experimental genome perturbations in in vitro cellular systems have rapidly become popular tools for scalable mapping of molecular effects of genetic variants (Figure 3b). These approaches include episomal assays such as MPRA (massively parallel reporter assay), and perturbations of the genome using the CRISPR toolkit, coupled with diverse readouts of molecular effects. Ongoing efforts such as IGVF (Impact of Genomic Variation on Function) and AVE (Atlas of Variant Effects) pursue more systematic application of these tools towards both noncoding and coding variation. A key prerequisite for most of these approaches is high-quality fine-mapping to target the likely causal variant(s) at associated loci, and the improving methods and resources from the GWAS community will thus greatly enhance these experimental efforts.
Furthermore, the vast functional genomics data sets from projects such as ENCODE provide a powerful foundation for predictive inference of genetic variant effects even when genome variation is not directly assayed (e.g., 104,105; Figure 3c). Development of these methods is a highly active area of research, with progress particularly for predicting the effects of coding and splice-affecting variation104,106,107, while predicting the effects of transcriptional regulatory variants105 has proven to be challenging108.
Molecular architecture of complex trait loci
There are likely dozens of different molecular mechanisms by which genetic variation can impact organismal phenotypes. Among these mechanisms, perhaps the most easily interpretable is that of coding variants, which directly impact protein coding sequence and function. However, unlike early mapping of Mendelian disorder variants that found causal SNPs to nearly always affect coding sequences, GWASs revealed early on that genetic variants underlying complex trait associations are often noncoding and impact gene expression109,110. These discoveries motivated an explosion of interest in understanding how genetic variants impact gene regulation, and particularly how they impact gene expression levels.
Over the last decade, several statistical methods have been developed and deployed on different datasets to identify functional enrichments of GWAS loci. The most compelling GWAS functional enrichments identified thus far have been in regions with high chromatin accessibility or in regions marked by histone modifications associated with enhancers and promoters110–113. In fact, the majority of SNP-heritability for a variety of common traits can be localized to regulatory rather than coding regions, with estimates of up to 79% of SNP-heritability residing in DNase I hypersensitive sites (spanning 16% of variants, a 4.9x enrichment) across 11 diseases114, or 15% of SNP-heritability residing in enhancer elements (spanning 0.4% of variants, a 37.5x enrichment) across 17 common traits112. Additionally, regions conserved in mammals were estimated to harbor 35% of SNP-heritability (spanning 2.6% of variants, a 13.5x enrichment)112, consistent with the expected role of evolutionary constrained elements in shaping disease architecture.
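The fold enrichments quoted in this paragraph follow from dividing the share of SNP-heritability attributed to an annotation by the share of variants it spans. The short sketch below reproduces that arithmetic for two of the figures given in the text; the function name is an assumption, and published estimates come from model-based partitioning of heritability rather than from this simple ratio of rounded percentages.

```python
def fold_enrichment(heritability_fraction, variant_fraction):
    """Share of SNP-heritability divided by share of variants in an annotation."""
    return heritability_fraction / variant_fraction


# DNase I hypersensitive sites: 79% of SNP-heritability in 16% of variants.
dhs_enrichment = fold_enrichment(0.79, 0.16)

# Enhancer elements: 15% of SNP-heritability in 0.4% of variants.
enhancer_enrichment = fold_enrichment(0.15, 0.004)

print(f"DHS: {dhs_enrichment:.1f}x, enhancers: {enhancer_enrichment:.1f}x")
```

A value of 1 would mean the annotation carries exactly as much heritability as expected from its size, so small annotations can be strongly enriched while still explaining a modest share of total heritability.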
GWAS loci have also been reported to be enriched among multiple types of regulatory variants. As expected, these include genetic variants that impact gene expression level regulation, e.g., by affecting DNA methylation115, histone modification levels116, and chromatin accessibility117. The enrichment of trait heritability among eQTL fine-mapped SNPs is similar to that in enhancer regions (about 5X for non-specific eQTLs/enhancers, and 20X for eQTLs or enhancers specifically identified in trait-relevant cell-types)112,118. However, the total trait heritability estimated to be explained by common variants overlapping eQTL SNPs (averaging 11%119 or 14%118 across traits estimated by mediation or enrichment analysis, respectively) tends to be much smaller than that overlapping enhancer and promoter regions (23.9–79.2%112). Although the 11–14% and 80% estimates likely represent conservative and optimistic estimates for heritability explained by eQTLs and variants in enhancers or promoters, respectively, these observations suggest that the quality of our maps of eQTLs lags far behind that of enhancer and promoter regions, and that more work is needed to understand how regulatory variants impact gene expression.
In addition to variants that impact gene expression levels, GWAS loci are also enriched in many other types of molecular QTLs such as those with effects on mRNA splicing120,121 and other effects on transcript structure122 and posttranscriptional modifications123. The estimated enrichment of GWAS signals in these molecular QTLs is highly variable and may be trait-dependent. For example, several studies have reported a higher enrichment of neuropsychiatric GWAS loci among variants that impact RNA splicing (sQTLs) than among eQTLs124; and a recent study found higher enrichment of autoimmune GWAS signal in RNA editing QTLs than in both eQTLs and sQTLs123. To date, the total heritability explained by different posttranscriptional molQTLs pales in comparison to that explained by eQTLs or variants in enhancer and promoter regions. However, this may simply reflect the fact that eQTLs and enhancers have been studied at much larger scales and in a larger number of cell-types compared to other regulatory mechanisms.
In addition to functional enrichments that indicate cis-regulatory mechanisms of immediate molecular drivers of GWAS loci, GWAS offers a unique causality anchor to identify cell types and cell states where the causal molecular processes contributing to traits are taking place. Understanding the cell type specificity of these can inform disease biology as well as potential targets of interventions that minimize off-target side effects. For most complex traits, it is far from trivial to infer causal cell-types from clinical characteristics, as symptoms of a disease can pinpoint different tissues, cell-types or developmental stages than where processes that are causal to disease take place. The most fruitful approach to address this challenge has been to analyze GWAS heritability enrichment in genes and regulatory elements that are active in specific tissues and cell types (e.g., 113,125). Notably, a major finding has been that the enrichment of GWAS loci in enhancer/promoter regions is highest in cell-types or tissue-types that make intuitive sense. For example, autoimmune disease loci are most enriched in enhancers active in immune cell-types (e.g. T-cells, and B-cells), while neuropsychiatric disease loci are most enriched in neuronal cell-types. Still, these enrichments often only give us a coarse-grained idea of which cell-types contribute to a trait or disease, and more research is needed to find out whether the genetic signal is strong enough to identify more precise causal cell-types. In fact, previous work observed that although genes or enhancers with cell-type-specific patterns of activity were highly enriched in trait heritability, the bulk of the heritability was found in genes or enhancers that were broadly active in many or most cell-types59. These observations suggest that most of the genetic signal will be in enhancers with broad, rather than cell-type-specific activity. Thus, it is possible that the pleiotropic nature of functional enhancers limits our ability to use genetic signals to ‘fine-map’ causal cell-types.
Interpreting complex trait loci using molQTLs
MolQTL mapping is fundamentally a genetic method, while the other approaches showcased in Figure 3 are rooted in molecular and computational biology. Thus, we will discuss molQTL mapping in more detail below, with further discussion of history and methodology provided e.g. in103,126.
In the early 2010s, straightforward analyses that overlap significant GWAS and eQTL SNPs were used to identify variants associated with traits that also had a functional impact on gene expression levels, helping to identify potential causal genes. However, with the increasing number of GWAS and eQTL signals, it became evident that new statistical methods were needed, in particular to address the scenario where a large proportion of variants are associated with some molecular phenotypes due to LD127. As a result, several advanced statistical methods were developed to assess colocalization: whether the same variant(s) were likely causal drivers for both a GWAS signal and a molQTL signal in a specific genetic locus128,129. An alternative approach is the estimation of “molecular association” such as in Transcriptome-Wide Association Studies (TWAS), which test for an association between a predicted molecular phenotype (e.g. expression) and the trait130, and can similarly be applied to summary level data131,132. This approach relaxes the requirement of colocalization that causal variants are shared between the molecular and disease traits – they merely need to be correlated – while increasing sensitivity through the use of multivariable predictive models133,134. While neither approach can guarantee that the particular gene’s expression is causally related to disease etiology, colocalization removes spurious overlaps due to LD and molecular association enables sensitive quantification of the correlated effect and direction.
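To show what a colocalization test computes, the sketch below implements one widely used formulation based on approximate Bayes factors, under the assumption of at most one causal variant per trait in the locus. It is not necessarily the method of the studies cited above: the function names, prior probabilities (`p1`, `p2`, `p12`), prior effect-size standard deviation and the three-variant toy locus are illustrative assumptions. Both inputs must list the same variants in the same order, and at least two variants are required.

```python
import math


def log_abf(beta, se, prior_sd):
    """Wakefield's approximate log Bayes factor for a single variant."""
    z = beta / se
    r = prior_sd ** 2 / (prior_sd ** 2 + se ** 2)
    return 0.5 * (math.log(1.0 - r) + r * z ** 2)


def log_sum_exp(values):
    """Numerically stable log of a sum of exponentials."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def coloc_posteriors(gwas, molqtl, p1=1e-4, p2=1e-4, p12=1e-5, prior_sd=0.15):
    """Posterior probabilities of five hypotheses for one locus.

    H0: no association; H1: GWAS only; H2: molQTL only;
    H3: both, distinct causal variants; H4: both, shared causal variant.
    gwas and molqtl are lists of (beta, se) for the same ordered variants.
    """
    l1 = [log_abf(b, s, prior_sd) for b, s in gwas]
    l2 = [log_abf(b, s, prior_sd) for b, s in molqtl]
    n = len(l1)
    # H4 sums over variants causal for both traits.
    shared = log_sum_exp([a + b for a, b in zip(l1, l2)])
    # H3 sums over pairs of different variants (explicit, for stability).
    distinct = log_sum_exp([l1[i] + l2[j] for i in range(n) for j in range(n) if i != j])
    log_h = {
        "H0": 0.0,
        "H1": math.log(p1) + log_sum_exp(l1),
        "H2": math.log(p2) + log_sum_exp(l2),
        "H3": math.log(p1) + math.log(p2) + distinct,
        "H4": math.log(p12) + shared,
    }
    norm = log_sum_exp(list(log_h.values()))
    return {h: math.exp(v - norm) for h, v in log_h.items()}


# Toy locus with three variants; the GWAS signal peaks at the second variant.
gwas = [(0.02, 0.02), (0.20, 0.02), (0.01, 0.02)]
molqtl_same_peak = [(0.03, 0.03), (0.30, 0.03), (0.00, 0.03)]
molqtl_other_peak = [(0.30, 0.03), (0.03, 0.03), (0.00, 0.03)]

shared_result = coloc_posteriors(gwas, molqtl_same_peak)
distinct_result = coloc_posteriors(gwas, molqtl_other_peak)
```

With these inputs, the posterior mass concentrates on H4 when both signals peak at the same variant and on H3 when they peak at different variants, which is the distinction that naive overlap of significant SNPs cannot make. In real loci, LD between neighboring variants makes the two hypotheses much harder to separate than in this idealized example.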
A notable observation from employing these methods is the low fraction of GWAS loci that colocalize with eQTLs135 or can be explained by molecular associations119. This is evident even for many immune or blood-related traits where the available eQTL data from relevant cell types is assumed to be comprehensive. For example, only about 25% of autoimmune trait GWAS loci colocalize with an eQTL from different immune cell-types136. Adding other molecular QTLs such as splicing QTLs can increase the colocalization rate, but still leaves the majority of autoimmune-associated loci without a colocalization137. More generally across complex traits, genetic effects mediated by cis-eQTLs account for an average of just 11% of trait heritability119. An additional complication is that regulatory elements and variants can regulate multiple genes and the one picked up by a colocalizing eQTL is not necessarily the truly causal disease gene in the locus138. Several reasons may account for the relatively modest rate of GWAS colocalization. First, genetic variants that affect gene regulation independent of gene expression level may play a larger role than we previously anticipated. While most research has concentrated on how genetic variants regulate gene expression levels, genetic variants can influence cellular biology through various other regulatory mechanisms, as previously discussed. Nevertheless, because the rate of colocalization between GWAS and expression QTLs is the highest across nearly all complex traits and among all molQTLs, the prevailing opinion is that the majority of trait variants operate by affecting gene expression levels. Supporting this, several recent studies found that genetic variants that impact chromatin activity (histone mark QTLs, or hQTLs) or accessibility (chromatin accessibility QTLs, or caQTLs) colocalize at much higher rates (sometimes ~50% more) than eQTLs from the same cell- or tissue-types173.
These observations imply that trait-associated variants often regulate gene expression levels by modulating enhancer or promoter activity. Yet, the ability to statistically detect their impact on gene expression might be weaker than on chromatin-level phenotypes141. This aligns with the idea that enhancer activity has a simpler genetic architecture than gene expression level, as steady-state mRNA expression levels are affected by co- and post-transcriptional mRNA processing mechanisms in addition to mechanisms that impact transcription initiation. Another explanation for the higher chromatin QTL colocalizations compared to eQTLs is a difference in the discovery thresholds: for chromatin QTLs to be detected, the enhancer must be active in a given cell- or tissue type142, whereas for the same variant to be an eQTL, the enhancer must both be active and also drive gene transcription. For example, “primed” enhancers have been found to harbor caQTLs in multiple types of naive immune cells, but they appear to be eQTLs only in cells that were stimulated by cytokines or pathogens143.
Several theories have been proposed to explain the limited overlap between molecular QTLs and GWAS hits. Genes involved in complex traits may have redundant enhancers that buffer the impact of genetic variants on gene expression levels, making those eQTLs that are relevant to complex traits more difficult to identify144. Along similar lines, features of eQTL mapping may favor discovery of loci and genes with lower selective constraint, regulatory complexity, and functional importance, thus biasing the results away from identification of genes that underlie trait variation145. Another possible explanation is that for most traits, we have not studied gene expression in the cell-types or cell-states that are most relevant for disease135. Indeed, despite substantial sharing of molecular QTLs across tissues, it is possible that many dynamic QTLs dependent on temporal context of cellular state can only be identified in some as yet unexamined rare cell-type or developmental trajectory146.
A pessimistic interpretation may be that the value of eQTL studies in interpreting complex trait-associated variants is and will continue to be modest in the future. However, it may be wise to recall that early discoveries from GWASs with limited sample sizes were also very modest39. As the sample size of GWAS grew into the tens and hundreds of thousands, transformational insights emerged, many of which now shape our understanding of human traits and biology. Analogous scaling up of sample sizes in molQTL studies to the hundreds of thousands has not been done, and the largest studies102 come from bulk tissue samples, which limits the power, resolution and interpretability of detecting regulatory effects that may operate and drive disease in specific cell types and cell states. While even larger and context-specific molQTL maps covering all relevant cell types are unlikely to provide a singular complete solution to molecular interpretation of GWAS loci, molQTL mapping remains the only approach that allows interrogation of genetic variant effects in diverse primary cell types. Thus, we foresee that expanding molQTL studies will continue to have value in the future, alongside other approaches that use in vitro perturbations and computational inference from epigenomic data (Figure 3).
Cellular programs and physiological effects of disease-associated loci
During the past 10 years, the main focus in functional interpretation of GWAS has been on identifying the causal driver genes in the locus. While the toolkit for this inference is still incomplete, in hundreds if not thousands of GWAS loci the causal gene in cis has been identified with a reasonable confidence147. However, these studies have so far provided limited information about the cellular programs and downstream physiological mechanisms that underlie a disease (Figure 2). This is due to two major gaps in our knowledge: functional annotation of human genes is very incomplete, and understanding of cellular programs and regulatory networks that tie individual genes into broader cellular behaviors is even more incomplete. Furthermore, genes and variants can have pleiotropic effects across cells and tissues, making it difficult to distinguish disease-causing effects.
Thus, our advancing interpretation of cis-regulatory mechanisms must be coupled with vigorous efforts to link variants and genes to cellular programs and further to physiological mechanisms that underlie traits and diseases. There are numerous approaches to pursue this goal, typically extending the concepts outlined in Figure 3 to cellular phenotyping that is informative of functional effects beyond the cis-regulatory space. Well-powered GWASs with a large number of loci have allowed enrichment analyses of the implicated genes in annotations of cellular networks and pathways, pinpointing likely trait-relevant functions (e.g., 148,149). Furthermore, GWAS variants can be directly associated with molecular traits across the genome, in particular via trans-eQTL mapping. This requires very large sample sizes of thousands of individuals and careful analysis to avoid confounding factors102. Future single-cell analyses have the potential to further increase the informativeness of this approach. GWAS for measured or inferred cellular traits such as transcription factor activity or cell morphology can further link molecular changes to cellular programs, but phenotyping in adequately large sample sizes has been a challenge (Figure 2). GWAS for large-scale measurements of tissue-level phenotypes provide interesting examples of mechanistically more interpretable traits that lie between molecular traits measured directly from cells, and highly complex physiological phenotypes captured by classical GWAS. An emerging approach for characterization of genetic effects for cellular traits is “cell villages”, where cells from multiple donors are grown together and phenotyped by e.g. cell sorting, and enrichment of genetic variants in cellular phenotype groups indicates an association with that trait193.
Conclusions and future prospects
Human genetics continues to thrive as a very dynamic field. As discussed above, the expanding and diversifying data sets that include not only genetic variation but also measured or inferred phenotype data, environmental factors and family relationships provide ample opportunity for understanding the population genetic processes that have given rise to the current spectrum of genetic variation in humans. Genetic association studies are finally starting to cover the full spectrum of different types of variants across the frequency spectrum. The integration of population genetics and statistical genetics provides a rich opportunity for improved mapping of genetic architecture of complex traits. As the number of robustly identified genetic associations has exploded, the challenge of their functional interpretation has become a central question in the field, now tackled with a combination of tools expanding beyond genetics to molecular and computational biology.
These advances have also shed light on persistent challenges and open questions in the field, some of which are highlighted in Table 1. Beyond generating more data, advances are likely to come from the synthesis of insights across the disciplines of quantitative, molecular, population, and epidemiological genetics. Understanding the causes and consequences of disease architecture (widespread polygenicity and pleiotropy, in particular) will require advances in quantitative genetics to incorporate parameters of natural selection coupled with advances in genetic epidemiology to understand the relevant environmental contexts and risk factors shared across traits. The latter, in turn, will likely benefit from the partitioning of environmental and genetic variance enabled by advances in family-based study designs, which are also beginning to capitalize on fundamental theories from causal inference and counterfactual reasoning to aid interpretation151. Understanding the language and grammar of regulatory variation will require integration of population-scale quantitative genetics, which can quantify the effects of standing variation in vivo and in a disease context, with experimental molecular genetics, which can probe the effects of unobserved perturbations and validate novel predictions in vitro and in a non-disease or synthetic disease context. Both approaches can benefit from emerging tools in statistical genetics and machine learning for prediction, prioritization, and feature interpretation to more efficiently identify the most relevant disease genes and their broader disease network effects. Finally, all of these inquiries can benefit from diverse, multi-ancestry cohorts and advances in population genetics to understand the complex genetic genealogy of contemporary populations using modern day and ancient genomic data.
Table 1.
Outstanding challenges for human genetics research to tackle within the next 5–10 years, with a focus on population genetics, common complex traits, and basic research.
| Challenge | Goal | Ways forward |
|---|---|---|
| Completing genetic variation maps | Comprehensive characterization of all types of genomic variation across the global population | Long-read sequencing of tens of thousands of individuals across the global population |
| Mechanisms of human adaptation | Identification of genetic variants and genes underlying human adaptation | Data from diverse populations; statistical models; functional follow-up |
| Map of selective constraint | Annotation of genomic elements and molecular processes under selective constraint, a key metric of functional relevance | Massive genetic data sets |
| Gene-environment interactions and correlations | Quantifying and controlling for environmental heterogeneity in biobank datasets, identifying important environmental confounders | Harmonized metadata across biobanks with longitudinal followup, and geographic mapping; diverse family-based study designs |
| Causal GWAS genes in cis | A robust and reasonably accurate toolkit for in silico annotation of likely causal driver genes for any GWAS locus | Integration of different tools (molQTL also from single cells, enhancer maps, CRISPR) with gold-standard annotations |
| Regulatory code of the genome | Prediction of context-specific cis-regulatory effects of genetic variants | Functional genomics data and validation data sets combined with deep learning and AI methods |
| Cellular programs underlying complex disease | Identifying cellular processes that mediate GWAS associations and the cell states where they take place | Integration of human genetics with large-scale in vitro experiments and molecular cell biology |
| Organ and physiological processes underlying complex disease | Identifying changes in tissue and organ functions and other physiological phenotypes that mediate GWAS associations | Measurement or inference of these lower-level traits for GWAS; organoids and model organism research |
| pheWAS interpretation | Inference of interpretable, causal relationships between traits from pheWAS data | Advanced statistical models; comprehensive and interpretable phenotypes and metadata |
| Translatable and interpretable polygenic scores | Genetic predictors that incorporate common and rare variants and environmental risk factors and are translatable between different groups | Advanced statistical models; more diverse GWAS data sets; better fine-mapping |
Given the rapid progress during this millennium, human genetics is now well poised to provide a deeper understanding of human biology and to improve human health. Successful description of genetic variation, mapping of genetic associations, and identification of their functional effects provide the foundation for mechanistic understanding of these processes. This opens the door to successful prediction in diagnostic settings and identification of interventions to affect processes that contribute to disease. The ultimate goal is the integration of rare and common genetic risk factors with environmental risks, as well as pharmacogenetic advancements in tailoring treatment selection5. Equally important is to map the limits of genetics, and understand how the broad patterns of heritability arise via complex interrelated processes of genetics and diverse environmental factors throughout an individual’s life. Here, the interactions between human genetics and neighboring fields, such as epidemiology and molecular biology, are critical.
Beyond these challenges and ways forward, continued success and global justification of human genetics necessitate scrutiny of the field as a professional community, as well as of its relationship with the surrounding society. Like other scientific domains, human genetics has a problematic history and ongoing issues linked to exploitation and exclusion of indigenous communities and minorities, both within the professional community and as research participants. Confronting and addressing these issues is essential to pave the way for a more inclusive and responsible future152,153. Genetics research is increasingly incorporated into the social sciences, and effective communication across these disciplines is needed to ensure that the limits of genetic inference are fully understood.
Yet, human genetics provides some of the most compelling empirical evidence of our collective origins and the intertwined biological makeup of all humans, as well as of the complexity and nondeterministic nature of human traits and diseases. This narrative could be disseminated more extensively. Genetics already has a major role in public understanding of our personal family history and ancestry, and it is assuming an increasingly prominent role in healthcare. It is thus imperative to foster a society that is well versed in the appropriate use and limitations of genetic data. This requires moving away from the reductive and deterministic language often employed in public communication of genetics. Embracing a more inclusive, transparent, and ethically aware approach is not just a moral imperative but also crucial for the sustained progress and credibility of the field.
Acknowledgements
T.L. is funded by NIH grants R01AG057422, R01MH106842, U24HG012090, ERC grant 101043238, and the Göran Gustafsson Foundation. Y.I.L. is funded by grants R01GM130738, R01HG011067, a GREGoR Consortium Grant, and the W.M. Keck Foundation. S.R. is funded by NIH grant R35 GM139628. A.G. is supported by grants R01HG006399, R01HG012133, R01MH125252, and R01CA262577.
Footnotes
Declaration of interests
T.L. is an advisor to Variant Bio with equity in Variant Bio. T.L. is a member of the Advisory Board of Cell.
This Review discusses recent achievements, ongoing efforts and future challenges in human genetics, focusing on the processes underlying genetic variation and its functional effects, as well as genetic architecture of human traits.
References
- 1. Lander ES, Linton LM, Birren B, Nusbaum C, Zody MC, Baldwin J, Devon K, Dewar K, Doyle M, FitzHugh W, et al. (2001). Initial sequencing and analysis of the human genome. Nature 409, 860–921. 10.1038/35057062.
- 2. Venter JC, Adams MD, Myers EW, Li PW, Mural RJ, Sutton GG, Smith HO, Yandell M, Evans CA, Holt RA, et al. (2001). The sequence of the human genome. Science 291, 1304–1351. 10.1126/science.1058040.
- 3. 100,000 Genomes Project Pilot Investigators, Smedley D, Smith KR, Martin A, Thomas EA, McDonagh EM, Cipriani V, Ellingford JM, Arno G, Tucci A, et al. (2021). 100,000 Genomes Pilot on Rare-Disease Diagnosis in Health Care - Preliminary Report. N. Engl. J. Med. 385, 1868–1880. 10.1056/NEJMoa2035790.
- 4. Wright CF, Campbell P, Eberhardt RY, Aitken S, Perrett D, Brent S, Danecek P, Gardner EJ, Chundru VK, Lindsay SJ, et al. (2023). Genomic Diagnosis of Rare Pediatric Disease in the United Kingdom and Ireland. N. Engl. J. Med. 388, 1559–1571. 10.1056/NEJMoa2209046.
- 5. Linder JE, Allworth A, Bland HT, Caraballo PJ, Chisholm RL, Clayton EW, Crosslin DR, Dikilitas O, DiVietro A, Esplin ED, et al. (2023). Returning integrated genomic risk and clinical recommendations: The eMERGE study. Genet. Med. Off. J. Am. Coll. Med. Genet. 25, 100006. 10.1016/j.gim.2023.100006.
- 6. Trajanoska K, Bhérer C, Taliun D, Zhou S, Richards JB, and Mooser V (2023). From target discovery to clinical drug development with human genetics. Nature 620, 737–745. 10.1038/s41586-023-06388-8.
- 7. Minikel EV, Painter JL, Dong CC, and Nelson MR (2023). Refining the impact of genetic evidence on clinical success (Pharmacology and Therapeutics) 10.1101/2023.06.23.23291765.
- 8. Nelson MR, Tipney H, Painter JL, Shen J, Nicoletti P, Shen Y, Floratos A, Sham PC, Li MJ, Wang J, et al. (2015). The support of human genetic evidence for approved drug indications. Nat. Genet. 47, 856–860. 10.1038/ng.3314.
- 9. Lewontin RC (1972). The Apportionment of Human Diversity. In Evolutionary Biology, Dobzhansky T, Hecht MK, and Steere WC, eds. (Springer US), pp. 381–398. 10.1007/978-1-4684-9063-3_14.
- 10. Lewis ACF, Molina SJ, Appelbaum PS, Dauda B, Di Rienzo A, Fuentes A, Fullerton SM, Garrison NA, Ghosh N, Hammonds EM, et al. (2022). Getting genetic ancestry right for science and society. Science 376, 250–252. 10.1126/science.abm7530.
- 11. Coop G (2022). Genetic similarity versus genetic ancestry groups as sample descriptors in human genetics. 10.48550/ARXIV.2207.11595.
- 12. Committee on the Use of Race, Ethnicity, and Ancestry as Population Descriptors in Genomics Research, Board on Health Sciences Policy, Committee on Population, Health and Medicine Division, Division of Behavioral and Social Sciences and Education, and National Academies of Sciences, Engineering, and Medicine (2023). Using Population Descriptors in Genetics and Genomics Research: A New Framework for an Evolving Field (National Academies Press). 10.17226/26902.
- 13. Green ED, Gunter C, Biesecker LG, Di Francesco V, Easter CL, Feingold EA, Felsenfeld AL, Kaufman DJ, Ostrander EA, Pavan WJ, et al. (2020). Strategic vision for improving human health at The Forefront of Genomics. Nature 586, 683–692. 10.1038/s41586-020-2817-4.
- 14. Hubby JL, and Lewontin RC (1966). A molecular approach to the study of genic heterozygosity in natural populations. I. The number of alleles at different loci in Drosophila pseudoobscura. Genetics 54, 577–594. 10.1093/genetics/54.2.577.
- 15. Bergström A, McCarthy SA, Hui R, Almarri MA, Ayub Q, Danecek P, Chen Y, Felkel S, Hallast P, Kamm J, et al. (2020). Insights into human genetic variation and population history from 929 diverse genomes. Science 367, eaay5012. 10.1126/science.aay5012.
- 16. International HapMap Consortium, Frazer KA, Ballinger DG, Cox DR, Hinds DA, Stuve LL, Gibbs RA, Belmont JW, Boudreau A, Hardenbol P, et al. (2007). A second generation human haplotype map of over 3.1 million SNPs. Nature 449, 851–861. 10.1038/nature06258.
- 17. Greely HT (2001). Human genome diversity: what about the other human genome project? Nat. Rev. Genet. 2, 222–227. 10.1038/35056071.
- 18. 1000 Genomes Project Consortium, Auton A, Brooks LD, Durbin RM, Garrison EP, Kang HM, Korbel JO, Marchini JL, McCarthy S, McVean GA, et al. (2015). A global reference for human genetic variation. Nature 526, 68–74. 10.1038/nature15393.
- 19. Byrska-Bishop M, Evani US, Zhao X, Basile AO, Abel HJ, Regier AA, Corvelo A, Clarke WE, Musunuri R, Nagulapalli K, et al. (2022). High-coverage whole-genome sequencing of the expanded 1000 Genomes Project cohort including 602 trios. Cell 185, 3426–3440.e19. 10.1016/j.cell.2022.08.004.
- 20. Bycroft C, Freeman C, Petkova D, Band G, Elliott LT, Sharp K, Motyer A, Vukcevic D, Delaneau O, O’Connell J, et al. (2018). The UK Biobank resource with deep phenotyping and genomic data. Nature 562, 203–209. 10.1038/s41586-018-0579-z.
- 21. Kurki MI, Karjalainen J, Palta P, Sipilä TP, Kristiansson K, Donner KM, Reeve MP, Laivuori H, Aavikko M, Kaunisto MA, et al. (2023). FinnGen provides genetic insights from a well-phenotyped isolated population. Nature 613, 508–518. 10.1038/s41586-022-05473-8.
- 22. Henn BM, Gignoux CR, Jobin M, Granka JM, Macpherson JM, Kidd JM, Rodríguez-Botigué L, Ramachandran S, Hon L, Brisbin A, et al. (2011). Hunter-gatherer genomic diversity suggests a southern African origin for modern humans. Proc. Natl. Acad. Sci. 108, 5154–5162. 10.1073/pnas.1017511108.
- 23. Sengupta D, Choudhury A, Fortes-Lima C, Aron S, Whitelaw G, Bostoen K, Gunnink H, Chousou-Polydouri N, Delius P, Tollman S, et al. (2021). Genetic substructure and complex demographic history of South African Bantu speakers. Nat. Commun. 12, 2080. 10.1038/s41467-021-22207-y.
- 24. Martin AR, Kanai M, Kamatani Y, Okada Y, Neale BM, and Daly MJ (2019). Clinical use of current polygenic risk scores may exacerbate health disparities. Nat. Genet. 51, 584–591. 10.1038/s41588-019-0379-x.
- 25. Fan S, Spence JP, Feng Y, Hansen MEB, Terhorst J, Beltrame MH, Ranciaro A, Hirbo J, Beggs W, Thomas N, et al. (2023). Whole-genome sequencing reveals a complex African population demographic history and signatures of local adaptation. Cell 186, 923–939.e14. 10.1016/j.cell.2023.01.042.
- 26. Atkinson EG, Dalvie S, Pichkar Y, Kalungi A, Majara L, Stevenson A, Abebe T, Akena D, Alemayehu M, Ashaba FK, et al. (2022). Genetic structure correlates with ethnolinguistic diversity in eastern and southern Africa. Am. J. Hum. Genet. 109, 1667–1679. 10.1016/j.ajhg.2022.07.013.
- 27. Ben-Eghan C, Sun R, Hleap JS, Diaz-Papkovich A, Munter HM, Grant AV, Dupras C, and Gravel S (2020). Don’t ignore genetic data from minority populations. Nature 585, 184–186. 10.1038/d41586-020-02547-3.
- 28. Smith SP, Shahamatdar S, Cheng W, Zhang S, Paik J, Graff M, Haiman C, Matise TC, North KE, Peters U, et al. (2022). Enrichment analyses identify shared associations for 25 quantitative traits in over 600,000 individuals from seven diverse ancestries. Am. J. Hum. Genet. 109, 871–884. 10.1016/j.ajhg.2022.03.005.
- 29. aDNAauthor Ancient DNA review.
- 30. Ragsdale AP, Weaver TD, Atkinson EG, Hoal EG, Möller M, Henn BM, and Gravel S (2023). A weakly structured stem for human origins in Africa. Nature 617, 755–763. 10.1038/s41586-023-06055-y.
- 31. Novembre J, Johnson T, Bryc K, Kutalik Z, Boyko AR, Auton A, Indap A, King KS, Bergmann S, Nelson MR, et al. (2008). Genes mirror geography within Europe. Nature 456, 98–101. 10.1038/nature07331.
- 32. Wang C, Zöllner S, and Rosenberg NA (2012). A quantitative comparison of the similarity between genes and geography in worldwide human populations. PLoS Genet. 8, e1002886. 10.1371/journal.pgen.1002886.
- 33. Moreno-Estrada A, Gravel S, Zakharia F, McCauley JL, Byrnes JK, Gignoux CR, Ortiz-Tello PA, Martínez RJ, Hedges DJ, Morris RW, et al. (2013). Reconstructing the population genetic history of the Caribbean. PLoS Genet. 9, e1003925. 10.1371/journal.pgen.1003925.
- 34. Goldberg A, Rastogi A, and Rosenberg NA (2020). Assortative mating by population of origin in a mechanistic model of admixture. Theor. Popul. Biol. 134, 129–146. 10.1016/j.tpb.2020.02.004.
- 35. Baharian S, Barakatt M, Gignoux CR, Shringarpure S, Errington J, Blot WJ, Bustamante CD, Kenny EE, Williams SM, Aldrich MC, et al. (2016). The Great Migration and African-American Genomic Diversity. PLoS Genet. 12, e1006059. 10.1371/journal.pgen.1006059.
- 36. Biddanda A, Rice DP, and Novembre J (2020). A variant-centric perspective on geographic patterns of human allele frequency variation. eLife 9, e60107. 10.7554/eLife.60107.
- 37. Koenig Z, Yohannes MT, Nkambule LL, Goodrich JK, Kim HA, Zhao X, Wilson MW, Tiao G, Hao SP, Sahakian N, et al. (2023). A harmonized public resource of deeply sequenced diverse human genomes. BioRxiv Prepr. Serv. Biol, 2023.01.23.525248. 10.1101/2023.01.23.525248.
- 38. Duncan LE, Ostacher M, and Ballon J (2019). How genome-wide association studies (GWAS) made traditional candidate gene studies obsolete. Neuropsychopharmacol. Off. Publ. Am. Coll. Neuropsychopharmacol. 44, 1518–1523. 10.1038/s41386-019-0389-5.
- 39. Eichler EE, Flint J, Gibson G, Kong A, Leal SM, Moore JH, and Nadeau JH (2010). Missing heritability and strategies for finding the underlying causes of complex disease. Nat. Rev. Genet. 11, 446–450. 10.1038/nrg2809.
- 40.Visscher PM, Wray NR, Zhang Q, Sklar P, McCarthy MI, Brown MA, and Yang J (2017). 10 Years of GWAS Discovery: Biology, Function, and Translation. Am. J. Hum. Genet. 101, 5–22. 10.1016/j.ajhg.2017.06.005. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 41.McClellan J, and King M-C (2010). Genetic heterogeneity in human disease. Cell 141, 210–217. 10.1016/j.cell.2010.03.032. [DOI] [PubMed] [Google Scholar]
- 42.Yang J, Benyamin B, McEvoy BP, Gordon S, Henders AK, Nyholt DR, Madden PA, Heath AC, Martin NG, Montgomery GW, et al. (2010). Common SNPs explain a large proportion of the heritability for human height. Nat. Genet. 42, 565–569. 10.1038/ng.608. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 43.Young AI, Benonisdottir S, Przeworski M, and Kong A (2019). Deconstructing the sources of genotype-phenotype associations in humans. Science 365, 1396–1400. 10.1126/science.aax3710. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 44.Howe LJ, Nivard MG, Morris TT, Hansen AF, Rasheed H, Cho Y, Chittoor G, Ahlskog R, Lind PA, Palviainen T, et al. (2022). Within-sibship genome-wide association analyses decrease bias in estimates of direct genetic effects. Nat. Genet. 54, 581–592. 10.1038/s41588-022-01062-7. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 45.Van Hout CV, Tachmazidou I, Backman JD, Hoffman JD, Liu D, Pandey AK, Gonzaga-Jauregui C, Khalid S, Ye B, Banerjee N, et al. (2020). Exome sequencing and characterization of 49,960 individuals in the UK Biobank. Nature 586, 749–756. 10.1038/s41586-020-2853-0. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 46.Backman JD, Li AH, Marcketta A, Sun D, Mbatchou J, Kessler MD, Benner C, Liu D, Locke AE, Balasubramanian S, et al. (2021). Exome sequencing and analysis of 454,787 UK Biobank participants. Nature 599, 628–634. 10.1038/s41586-021-04103-z. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 47.Purcell S (2002). Variance components models for gene-environment interaction in twin analysis. Twin Res. Off. J. Int. Soc. Twin Stud. 5, 554–571. 10.1375/136905202762342026. [DOI] [PubMed] [Google Scholar]
- 48.Tenesa A, and Haley CS (2013). The heritability of human disease: estimation, uses and abuses. Nat. Rev. Genet. 14, 139–149. 10.1038/nrg3377. [DOI] [PubMed] [Google Scholar]
- 49.Dudbridge F (2013). Power and predictive accuracy of polygenic risk scores. PLoS Genet. 9, e1003348. 10.1371/journal.pgen.1003348. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 50.Bulik-Sullivan BK, Loh P-R, Finucane HK, Ripke S, Yang J, Schizophrenia Working Group of the Psychiatric Genomics Consortium, Patterson N, Daly MJ, Price AL, and Neale BM (2015). LD Score regression distinguishes confounding from polygenicity in genome-wide association studies. Nat Genet 47, 291–295. 10.1038/ng.3211. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 51.Ge T, Chen C-Y, Neale BM, Sabuncu MR, and Smoller JW (2017). Phenome-wide heritability analysis of the UK Biobank. PLoS Genet. 13, e1006711. 10.1371/journal.pgen.1006711. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 52.Kanai M, Akiyama M, Takahashi A, Matoba N, Momozawa Y, Ikeda M, Iwata N, Ikegawa S, Hirata M, Matsuda K, et al. (2018). Genetic analysis of quantitative traits in the Japanese population links cell types to complex human diseases. Nat. Genet. 50, 390–400. 10.1038/s41588-018-0047-6. [DOI] [PubMed] [Google Scholar]
- 53.Zhang Y, Qi G, Park J-H, and Chatterjee N (2018). Estimation of complex effect-size distributions using summary-level statistics from genome-wide association studies across 32 complex traits. Nat. Genet. 50, 1318–1326. 10.1038/s41588-018-0193-x. [DOI] [PubMed] [Google Scholar]
- 54.O’Connor LJ, Schoech AP, Hormozdiari F, Gazal S, Patterson N, and Price AL (2019). Extreme Polygenicity of Complex Traits Is Explained by Negative Selection. Am. J. Hum. Genet. 105, 456–476. 10.1016/j.ajhg.2019.07.003. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 55.Weissbrod O, Hormozdiari F, Benner C, Cui R, Ulirsch J, Gazal S, Schoech AP, van de Geijn B, Reshef Y, Márquez-Luna C, et al. (2020). Functionally informed fine-mapping and polygenic localization of complex trait heritability. Nat. Genet. 52, 1355–1363. 10.1038/s41588-020-00735-5. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 56.Simons YB, Bullaughey K, Hudson RR, and Sella G (2018). A population genetic interpretation of GWAS findings for human quantitative traits. PLoS Biol. 16, e2002985. 10.1371/journal.pbio.2002985. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 57.Schoech AP, Jordan DM, Loh P-R, Gazal S, O’Connor LJ, Balick DJ, Palamara PF, Finucane HK, Sunyaev SR, and Price AL (2019). Quantification of frequency-dependent genetic architectures in 25 UK Biobank traits reveals action of negative selection. Nat. Commun. 10, 790. 10.1038/s41467-019-08424-6. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 58.Yengo L, Vedantam S, Marouli E, Sidorenko J, Bartell E, Sakaue S, Graff M, Eliasen AU, Jiang Y, Raghavan S, et al. (2022). A saturated map of common genetic variants associated with human height. Nature 610, 704–712. 10.1038/s41586-022-05275-y. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 59.Boyle EA, Li YI, and Pritchard JK (2017). An Expanded View of Complex Traits: From Polygenic to Omnigenic. Cell 169, 1177–1186. 10.1016/j.cell.2017.05.038. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 60.Wray NR, Wijmenga C, Sullivan PF, Yang J, and Visscher PM (2018). Common Disease Is More Complex Than Implied by the Core Gene Omnigenic Model. Cell 173, 1573–1580. 10.1016/j.cell.2018.05.051. [DOI] [PubMed] [Google Scholar]
- 61.Liu X, Li YI, and Pritchard JK (2019). Trans Effects on Gene Expression Can Drive Omnigenic Inheritance. Cell 177, 1022–1034.e6. 10.1016/j.cell.2019.04.014. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 62.Gazal S, Loh P-R, Finucane HK, Ganna A, Schoech A, Sunyaev S, and Price AL (2018). Functional architecture of low-frequency variants highlights strength of negative selection across coding and non-coding annotations. Nat. Genet. 50, 1600–1607. 10.1038/s41588-018-0231-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 63.Wainschtein P, Jain D, Zheng Z, TOPMed Anthropometry Working Group, NHLBI Trans-Omics for Precision Medicine (TOPMed) Consortium, Cupples LA, Shadyab AH, McKnight B, Shoemaker BM, Mitchell BD, et al. (2022). Assessing the contribution of rare variants to complex trait heritability from whole-genome sequence data. Nat. Genet. 54, 263–273. 10.1038/s41588-021-00997-7. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 64.Weiner DJ, Nadig A, Jagadeesh KA, Dey KK, Neale BM, Robinson EB, Karczewski KJ, and O’Connor LJ (2023). Polygenic architecture of rare coding variation across 394,783 exomes. Nature 614, 492–499. 10.1038/s41586-022-05684-z. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 65.Rajagopal VM, Watanabe K, Mbatchou J, Ayer A, Quon P, Sharma D, Kessler MD, Praveen K, Gelfman S, Parikshak N, et al. (2023). Rare coding variants in CHRNB2 reduce the likelihood of smoking. Nat. Genet. 55, 1138–1148. 10.1038/s41588-023-01417-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 66.Martin AR, Gignoux CR, Walters RK, Wojcik GL, Neale BM, Gravel S, Daly MJ, Bustamante CD, and Kenny EE (2017). Human Demographic History Impacts Genetic Risk Prediction across Diverse Populations. Am. J. Hum. Genet. 100, 635–649. 10.1016/j.ajhg.2017.03.004. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 67.Carlson CS, Matise TC, North KE, Haiman CA, Fesinmeyer MD, Buyske S, Schumacher FR, Peters U, Franceschini N, Ritchie MD, et al. (2013). Generalization and dilution of association results from European GWAS in populations of non-European ancestry: the PAGE study. PLoS Biol. 11, e1001661. 10.1371/journal.pbio.1001661. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 68.Ding Y, Hou K, Xu Z, Pimplaskar A, Petter E, Boulier K, Privé F, Vilhjálmsson BJ, Olde Loohuis LM, and Pasaniuc B (2023). Polygenic scoring accuracy varies across the genetic ancestry continuum. Nature 618, 774–781. 10.1038/s41586-023-06079-4. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 69.Hou K, Ding Y, Xu Z, Wu Y, Bhattacharya A, Mester R, Belbin GM, Buyske S, Conti DV, Darst BF, et al. (2023). Causal effects on complex traits are similar for common variants across segments of different continental ancestries within admixed individuals. Nat. Genet. 55, 549–558. 10.1038/s41588-023-01338-6. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 70.Wang Y, Guo J, Ni G, Yang J, Visscher PM, and Yengo L (2020). Theoretical and empirical quantification of the accuracy of polygenic scores in ancestry divergent populations. Nat. Commun. 11, 3865. 10.1038/s41467-020-17719-y. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 71.Patel RA, Musharoff SA, Spence JP, Pimentel H, Tcheandjieu C, Mostafavi H, Sinnott-Armstrong N, Clarke SL, Smith CJ, Million Veteran Program VA, et al. (2022). Genetic interactions drive heterogeneity in causal variant effect sizes for gene expression and complex traits. Am. J. Hum. Genet. 109, 1286–1297. 10.1016/j.ajhg.2022.05.014. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 72.Brown BC, Asian Genetic Epidemiology Network Type 2 Diabetes Consortium, Ye CJ, Price AL, and Zaitlen N (2016). Transethnic Genetic-Correlation Estimates from Summary Statistics. Am. J. Hum. Genet. 99, 76–88. 10.1016/j.ajhg.2016.05.001. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 73.Wojcik GL, Graff M, Nishimura KK, Tao R, Haessler J, Gignoux CR, Highland HM, Patel YM, Sorokin EP, Avery CL, et al. (2019). Genetic analyses of diverse populations improves discovery for complex traits. Nature 570, 514–518. 10.1038/s41586-019-1310-4. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 74.Johnson R, Ding Y, Venkateswaran V, Bhattacharya A, Boulier K, Chiu A, Knyazev S, Schwarz T, Freund M, Zhan L, et al. (2022). Leveraging genomic diversity for discovery in an electronic health record linked biobank: the UCLA ATLAS Community Health Initiative. Genome Med. 14, 104. 10.1186/s13073-022-01106-x. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 75.Amariuta T, Ishigaki K, Sugishita H, Ohta T, Koido M, Dey KK, Matsuda K, Murakami Y, Price AL, Kawakami E, et al. (2020). Improving the trans-ancestry portability of polygenic risk scores by prioritizing variants in predicted cell-type-specific regulatory elements. Nat. Genet. 52, 1346–1354. 10.1038/s41588-020-00740-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 76.Weissbrod O, Kanai M, Shi H, Gazal S, Peyrot WJ, Khera AV, Okada Y, Biobank Japan Project, Martin AR, Finucane HK, et al. (2022). Leveraging fine-mapping and multipopulation training data to improve cross-population polygenic risk scores. Nat. Genet. 54, 450–458. 10.1038/s41588-022-01036-9. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 77.Chen F, Wang X, Jang S-K, Quach BC, Weissenkampen JD, Khunsriraksakul C, Yang L, Sauteraud R, Albert CM, Allred NDD, et al. (2023). Multi-ancestry transcriptome-wide association analyses yield insights into tobacco use biology and drug repurposing. Nat. Genet. 55, 291–300. 10.1038/s41588-022-01282-x. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 78.Lu Z, Gopalan S, Yuan D, Conti DV, Pasaniuc B, Gusev A, and Mancuso N (2022). Multi-ancestry fine-mapping improves precision to identify causal genes in transcriptome-wide association studies. Am. J. Hum. Genet. 109, 1388–1404. 10.1016/j.ajhg.2022.07.002. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 79.Bitarello BD, and Mathieson I (2020). Polygenic Scores for Height in Admixed Populations. G3 Bethesda Md 10, 4027–4036. 10.1534/g3.120.401658. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 80.Marnetto D, Pärna K, Läll K, Molinaro L, Montinaro F, Haller T, Metspalu M, Mägi R, Fischer K, and Pagani L (2020). Ancestry deconvolution and partial polygenic score can improve susceptibility predictions in recently admixed individuals. Nat. Commun. 11, 1628. 10.1038/s41467-020-15464-w. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 81.Mostafavi H, Harpak A, Agarwal I, Conley D, Pritchard JK, and Przeworski M (2020). Variable prediction accuracy of polygenic scores within an ancestry group. eLife 9, e48376. 10.7554/eLife.48376. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 82.Border R, Athanasiadis G, Buil A, Schork AJ, Cai N, Young AI, Werge T, Flint J, Kendler KS, Sankararaman S, et al. (2022). Cross-trait assortative mating is widespread and inflates genetic correlation estimates. Science 378, 754–761. 10.1126/science.abo2059. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 83.Diaz-Papkovich A, Zabad S, Ben-Eghan C, Anderson-Trocmé L, Femerling G, Nathan V, Patel J, and Gravel S (2023). Topological stratification of continuous genetic variation in large biobanks (Genomics) 10.1101/2023.07.06.548007. [DOI] [Google Scholar]
- 84.Gorla A, Sankararaman S, Burchard E, Flint J, Zaitlen N, and Rahmani E (2023). Phenotypic subtyping via contrastive learning (Genomics) 10.1101/2023.01.05.522921. [DOI] [Google Scholar]
- 85.Kong A, Thorleifsson G, Frigge ML, Vilhjalmsson BJ, Young AI, Thorgeirsson TE, Benonisdottir S, Oddsson A, Halldorsson BV, Masson G, et al. (2018). The nature of nurture: Effects of parental genotypes. Science 359, 424–428. 10.1126/science.aan6877. [DOI] [PubMed] [Google Scholar]
- 86.Veller C, and Coop G (2023). Interpreting population and family-based genome-wide association studies in the presence of confounding. BioRxiv Prepr. Serv. Biol, 2023.02.26.530052. 10.1101/2023.02.26.530052. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 87.Simons YB, Mostafavi H, Smith CJ, Pritchard JK, and Sella G (2022). Simple scaling laws control the genetic architectures of human complex traits (Genetics) 10.1101/2022.10.04.509926. [DOI] [Google Scholar]
- 88.Yair S, and Coop G (2022). Population differentiation of polygenic score predictions under stabilizing selection. Philos. Trans. R. Soc. Lond. B. Biol. Sci. 377, 20200416. 10.1098/rstb.2020.0416. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 89.Durvasula A, and Lohmueller KE (2021). Negative selection on complex traits limits phenotype prediction accuracy between populations. Am. J. Hum. Genet. 108, 620–631. 10.1016/j.ajhg.2021.02.013. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 90.Berg JJ, Harpak A, Sinnott-Armstrong N, Joergensen AM, Mostafavi H, Field Y, Boyle EA, Zhang X, Racimo F, Pritchard JK, et al. (2019). Reduced signal for polygenic adaptation of height in UK Biobank. eLife 8, e39725. 10.7554/eLife.39725. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 91.Sohail M, Maier RM, Ganna A, Bloemendal A, Martin AR, Turchin MC, Chiang CW, Hirschhorn J, Daly MJ, Patterson N, et al. (2019). Polygenic adaptation on height is overestimated due to uncorrected stratification in genome-wide association studies. eLife 8, e39702. 10.7554/eLife.39702. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 92.Zhang BC, Biddanda A, Gunnarsson ÁF, Cooper F, and Palamara PF (2023). Biobank-scale inference of ancestral recombination graphs enables genealogical analysis of complex traits. Nat. Genet. 55, 768–776. 10.1038/s41588-023-01379-x. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 93.Hujoel MLA, Sherman MA, Barton AR, Mukamel RE, Sankaran VG, Terao C, and Loh P-R (2022). Influences of rare copy-number variation on human complex traits. Cell 185, 4233–4248.e27. 10.1016/j.cell.2022.09.028. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 94.Popic V, Rohlicek C, Cunial F, Hajirasouliha I, Meleshko D, Garimella K, and Maheshwari A (2023). Cue: a deep-learning framework for structural variant discovery and genotyping. Nat. Methods 20, 559–568. 10.1038/s41592-023-01799-x. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 95.Nurk S, Koren S, Rhie A, Rautiainen M, Bzikadze AV, Mikheenko A, Vollger MR, Altemose N, Uralsky L, Gershman A, et al. (2022). The complete sequence of a human genome. Science 376, 44–53. 10.1126/science.abj6987. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 96.Heyne HO, Karjalainen J, Karczewski KJ, Lemmelä SM, Zhou W, FinnGen, Havulinna AS, Kurki M, Rehm HL, Palotie A, et al. (2023). Mono- and biallelic variant effects on disease at biobank scale. Nature 613, 519–525. 10.1038/s41586-022-05420-7. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 97.Albert FW, Bloom JS, Siegel J, Day L, and Kruglyak L (2018). Genetics of trans-regulatory variation in gene expression. eLife 7, e35471. 10.7554/eLife.35471. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 98.Hemani G, Shakhbazov K, Westra H-J, Esko T, Henders AK, McRae AF, Yang J, Gibson G, Martin NG, Metspalu A, et al. (2021). Retraction Note: Detection and replication of epistasis influencing transcription in humans. Nature 596, 306. 10.1038/s41586-021-03766-y. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 99.Smith SP, Darnell G, Udwin D, Harpak A, Ramachandran S, and Crawford L (2022). Accounting for statistical non-additive interactions enables the recovery of missing heritability from GWAS summary statistics (Genomics) 10.1101/2022.07.21.501001. [DOI] [Google Scholar]
- 100.Norman TM, Horlbeck MA, Replogle JM, Ge AY, Xu A, Jost M, Gilbert LA, and Weissman JS (2019). Exploring genetic interaction manifolds constructed from rich single-cell phenotypes. Science 365, 786–793. 10.1126/science.aax4438. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 101.van der Wijst M, de Vries DH, Groot HE, Trynka G, Hon CC, Bonder MJ, Stegle O, Nawijn MC, Idaghdour Y, van der Harst P, et al. (2020). The single-cell eQTLGen consortium. eLife 9, e52155. 10.7554/eLife.52155. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 102.Võsa U, Claringbould A, Westra H-J, Bonder MJ, Deelen P, Zeng B, Kirsten H, Saha A, Kreuzhuber R, Yazar S, et al. (2021). Large-scale cis- and trans-eQTL analyses identify thousands of genetic loci and polygenic scores that regulate blood gene expression. Nat. Genet. 53, 1300–1310. 10.1038/s41588-021-00913-z. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 103. Aguet F, Alasoo K, Li YI, Battle A, Im HK, Montgomery SB, and Lappalainen T (2023). Molecular quantitative trait loci. Nat. Rev. Methods Primers 3, 4. 10.1038/s43586-022-00188-6.
- 104. Jaganathan K, Kyriazopoulou Panagiotopoulou S, McRae JF, Darbandi SF, Knowles D, Li YI, Kosmicki JA, Arbelaez J, Cui W, Schwartz GB, et al. (2019). Predicting Splicing from Primary Sequence with Deep Learning. Cell 176, 535–548.e24. 10.1016/j.cell.2018.12.015.
- 105. Avsec Ž, Agarwal V, Visentin D, Ledsam JR, Grabska-Barwinska A, Taylor KR, Assael Y, Jumper J, Kohli P, and Kelley DR (2021). Effective gene expression prediction from sequence by integrating long-range interactions. Nat. Methods 18, 1196–1203. 10.1038/s41592-021-01252-x.
- 106. Frazer J, Notin P, Dias M, Gomez A, Min JK, Brock K, Gal Y, and Marks DS (2021). Disease variant prediction with deep generative models of evolutionary data. Nature 599, 91–95. 10.1038/s41586-021-04043-8.
- 107. Zeng T, and Li YI (2022). Predicting RNA splicing from DNA sequence using Pangolin. Genome Biol. 23, 103. 10.1186/s13059-022-02664-4.
- 108. Sasse A, Ng B, Spiro A, Tasaki S, Bennett D, Gaiteri C, De Jager PL, Chikina M, and Mostafavi S (2023). Benchmarking of deep neural networks for predicting personal gene expression from DNA sequence highlights shortcomings (Bioinformatics) 10.1101/2023.03.16.532969.
- 109. Nicolae DL, Gamazon E, Zhang W, Duan S, Dolan ME, and Cox NJ (2010). Trait-associated SNPs are more likely to be eQTLs: annotation to enhance discovery from GWAS. PLoS Genet. 6, e1000888. 10.1371/journal.pgen.1000888.
- 110. Maurano MT, Humbert R, Rynes E, Thurman RE, Haugen E, Wang H, Reynolds AP, Sandstrom R, Qu H, Brody J, et al. (2012). Systematic localization of common disease-associated variation in regulatory DNA. Science 337, 1190–1195. 10.1126/science.1222794.
- 111. Iotchkova V, Ritchie GRS, Geihs M, Morganella S, Min JL, Walter K, Timpson NJ, UK10K Consortium, Dunham I, Birney E, et al. (2019). GARFIELD classifies disease-relevant genomic features through integration of functional annotations with association signals. Nat. Genet. 51, 343–353. 10.1038/s41588-018-0322-6.
- 112. Finucane HK, Bulik-Sullivan B, Gusev A, Trynka G, Reshef Y, Loh P-R, Anttila V, Xu H, Zang C, Farh K, et al. (2015). Partitioning heritability by functional annotation using genome-wide association summary statistics. Nat. Genet. 47, 1228–1235. 10.1038/ng.3404.
- 113. Farh KK, Marson A, Zhu J, Kleinewietfeld M, Housley WJ, Beik S, Shoresh N, Whitton H, Ryan RJ, Shishkin AA, et al. (2015). Genetic and epigenetic fine mapping of causal autoimmune disease variants. Nature 518, 337–343. 10.1038/nature13835.
- 114. Gusev A, Lee SH, Trynka G, Finucane H, Vilhjálmsson BJ, Xu H, Zang C, Ripke S, Bulik-Sullivan B, Stahl E, et al. (2014). Partitioning heritability of regulatory and cell-type-specific variants across 11 common diseases. Am. J. Hum. Genet. 95, 535–552. 10.1016/j.ajhg.2014.10.004.
- 115. Banovich NE, Lan X, McVicker G, van de Geijn B, Degner JF, Blischak JD, Roux J, Pritchard JK, and Gilad Y (2014). Methylation QTLs are associated with coordinated changes in transcription factor binding, histone modifications, and gene expression levels. PLoS Genet. 10, e1004663. 10.1371/journal.pgen.1004663.
- 116. Chen L, Ge B, Casale FP, Vasquez L, Kwan T, Garrido-Martin D, Watt S, Yan Y, Kundu K, Ecker S, et al. (2016). Genetic Drivers of Epigenetic and Transcriptional Variation in Human Immune Cells. Cell 167, 1398–1414.e24. 10.1016/j.cell.2016.10.026.
- 117. Kumasaka N, Knights AJ, and Gaffney DJ (2015). Fine-mapping cellular QTLs with RASQUAL and ATAC-seq. Nat. Genet. 10.1038/ng.3467.
- 118. Hormozdiari F, Gazal S, van de Geijn B, Finucane HK, Ju CJ-T, Loh P-R, Schoech A, Reshef Y, Liu X, O’Connor L, et al. (2018). Leveraging molecular quantitative trait loci to understand the genetic architecture of diseases and complex traits. Nat. Genet. 50, 1041–1047. 10.1038/s41588-018-0148-2.
- 119. Yao DW, O’Connor LJ, Price AL, and Gusev A (2020). Quantifying genetic effects on disease mediated by assayed gene expression levels. Nat. Genet. 52, 626–633. 10.1038/s41588-020-0625-2.
- 120. Lappalainen T, Sammeth M, Friedlander MR, ’t Hoen PA, Monlong J, Rivas MA, Gonzalez-Porta M, Kurbatova N, Griebel T, Ferreira PG, et al. (2013). Transcriptome and genome sequencing uncovers functional variation in humans. Nature 501, 506–511. 10.1038/nature12531.
- 121. Li YI, van de Geijn B, Raj A, Knowles DA, Petti AA, Golan D, Gilad Y, and Pritchard JK (2016). RNA splicing is a primary link between genetic variation and disease. Science 352, 600–604. 10.1126/science.aad9417.
- 122. Alasoo K, Rodrigues J, Danesh J, Freitag DF, Paul DS, and Gaffney DJ (2019). Genetic effects on promoter usage are highly context-specific and contribute to complex traits. eLife 8, e41673. 10.7554/eLife.41673.
- 123. Li Q, Gloudemans MJ, Geisinger JM, Fan B, Aguet F, Sun T, Ramaswami G, Li YI, Ma J-B, Pritchard JK, et al. (2022). RNA editing underlies genetic risk of common inflammatory diseases. Nature 608, 569–577. 10.1038/s41586-022-05052-x.
- 124. Qi T, Wu Y, Fang H, Zhang F, Liu S, Zeng J, and Yang J (2022). Genetic control of RNA splicing and its distinct role in complex trait variation. Nat. Genet. 54, 1355–1363. 10.1038/s41588-022-01154-4.
- 125. Finucane HK, Reshef YA, Anttila V, Slowikowski K, Gusev A, Byrnes A, Gazal S, Loh P-R, Lareau C, Shoresh N, et al. (2018). Heritability enrichment of specifically expressed genes identifies disease-relevant tissues and cell types. Nat. Genet. 50, 621–629. 10.1038/s41588-018-0081-4.
- 126. Albert FW, and Kruglyak L (2015). The role of regulatory variation in complex traits and disease. Nat. Rev. Genet. 16, 197–212. 10.1038/nrg3891.
- 127. GTEx Consortium, Laboratory, Data Analysis & Coordinating Center (LDACC)—Analysis Working Group, Statistical Methods groups—Analysis Working Group, Enhancing GTEx (eGTEx) groups, NIH Common Fund, NIH/NCI, NIH/NHGRI, NIH/NIMH, NIH/NIDA, Biospecimen Collection Source Site—NDRI, et al. (2017). Genetic effects on gene expression across human tissues. Nature 550, 204–213. 10.1038/nature24277.
- 128. Giambartolomei C, Vukcevic D, Schadt EE, Franke L, Hingorani AD, Wallace C, and Plagnol V (2014). Bayesian test for colocalisation between pairs of genetic association studies using summary statistics. PLoS Genet. 10, e1004383. 10.1371/journal.pgen.1004383.
- 129. Wallace C (2021). A more accurate method for colocalisation analysis allowing for multiple causal variants. PLoS Genet. 17, e1009440. 10.1371/journal.pgen.1009440.
- 130. Gamazon ER, Wheeler HE, Shah KP, Mozaffari SV, Aquino-Michaels K, Carroll RJ, Eyler AE, Denny JC, GTEx Consortium, Nicolae DL, et al. (2015). A gene-based association method for mapping traits using reference transcriptome data. Nat. Genet. 47, 1091–1098. 10.1038/ng.3367.
- 131. Gusev A, Ko A, Shi H, Bhatia G, Chung W, Penninx BWJH, Jansen R, de Geus EJC, Boomsma DI, Wright FA, et al. (2016). Integrative approaches for large-scale transcriptome-wide association studies. Nat. Genet. 48, 245–252. 10.1038/ng.3506.
- 132. Barbeira AN, Dickinson SP, Bonazzola R, Zheng J, Wheeler HE, Torres JM, Torstenson ES, Shah KP, Garcia T, Edwards TL, et al. (2018). Exploring the phenotypic consequences of tissue specific gene expression variation inferred from GWAS summary statistics. Nat. Commun. 9, 1825. 10.1038/s41467-018-03621-1.
- 133. Hukku A, Sampson MG, Luca F, Pique-Regi R, and Wen X (2022). Analyzing and reconciling colocalization and transcriptome-wide association studies from the perspective of inferential reproducibility. Am. J. Hum. Genet. 109, 825–837. 10.1016/j.ajhg.2022.04.005.
- 134. Wainberg M, Sinnott-Armstrong N, Mancuso N, Barbeira AN, Knowles DA, Golan D, Ermel R, Ruusalepp A, Quertermous T, Hao K, et al. (2019). Opportunities and challenges for transcriptome-wide association studies. Nat. Genet. 51, 592–599. 10.1038/s41588-019-0385-z.
- 135. Umans BD, Battle A, and Gilad Y (2021). Where Are the Disease-Associated eQTLs? Trends Genet. 37, 109–124. 10.1016/j.tig.2020.08.009.
- 136. Chun S, Casparino A, Patsopoulos NA, Croteau-Chonka DC, Raby BA, De Jager PL, Sunyaev SR, and Cotsapas C (2017). Limited statistical evidence for shared genetic effects of eQTLs and autoimmune-disease-associated loci in three major immune-cell types. Nat. Genet. 49, 600–605. 10.1038/ng.3795.
- 137. Mu Z, Wei W, Fair B, Miao J, Zhu P, and Li YI (2021). The impact of cell type and context-dependent regulatory variants on human immune traits. Genome Biol. 22, 122. 10.1186/s13059-021-02334-x.
- 138. Nasser J, Bergman DT, Fulco CP, Guckelberger P, Doughty BR, Patwardhan TA, Jones TR, Nguyen TH, Ulirsch JC, Lekschas F, et al. (2021). Genome-wide enhancer maps link risk variants to disease genes. Nature 593, 238–243. 10.1038/s41586-021-03446-x.
- 139. Baca SC, Singler C, Zacharia S, Seo J-H, Morova T, Hach F, Ding Y, Schwarz T, Huang C-CF, Anderson J, et al. (2022). Genetic determinants of chromatin reveal prostate cancer risk mediated by context-dependent gene regulation. Nat. Genet. 54, 1364–1375. 10.1038/s41588-022-01168-y.
- 140. Aracena KA, Lin Y-L, Luo K, Pacis A, Gona S, Mu Z, Yotova V, Sindeaux R, Pramatarova A, Simon M-M, et al. (2022). Epigenetic variation impacts ancestry-associated differences in the transcriptional response to influenza infection (Genetics) 10.1101/2022.05.10.491413.
- 141. Gusev A, Mancuso N, Won H, Kousi M, Finucane HK, Reshef Y, Song L, Safi A, Schizophrenia Working Group of the Psychiatric Genomics Consortium, McCarroll S, et al. (2018). Transcriptome-wide association study of schizophrenia and chromatin activity yields mechanistic disease insights. Nat. Genet. 50, 538–548. 10.1038/s41588-018-0092-1.
- 142. Banovich NE, Li YI, Raj A, Ward MC, Greenside P, Calderon D, Tung PY, Burnett JE, Myrthil M, Thomas SM, et al. (2018). Impact of regulatory variation across human iPSCs and differentiated cells. Genome Res. 28, 122–131. 10.1101/gr.224436.117.
- 143. Alasoo K, Rodrigues J, Mukhopadhyay S, Knights AJ, Mann AL, Kundu K, HIPSCI Consortium, Hale C, Dougan G, and Gaffney DJ (2018). Shared genetic effects on chromatin and gene expression indicate a role for enhancer priming in immune response. Nat. Genet. 50, 424–431. 10.1038/s41588-018-0046-7.
- 144. Wang X, and Goldstein DB (2020). Enhancer Domains Predict Gene Pathogenicity and Inform Gene Discovery in Complex Disease. Am. J. Hum. Genet. 106, 215–233. 10.1016/j.ajhg.2020.01.012.
- 145. Mostafavi H, Spence JP, Naqvi S, and Pritchard JK (2022). Limited overlap of eQTLs and GWAS hits due to systematic differences in discovery (Genomics) 10.1101/2022.05.07.491045.
- 146. Strober BJ, Elorbany R, Rhodes K, Krishnan N, Tayeb K, Battle A, and Gilad Y (2019). Dynamic genetic regulation of gene expression during cellular differentiation. Science 364, 1287–1290. 10.1126/science.aaw0040.
- 147. Mountjoy E, Schmidt EM, Carmona M, Schwartzentruber J, Peat G, Miranda A, Fumis L, Hayhurst J, Buniello A, Karim MA, et al. (2021). An open approach to systematically prioritize causal variants and genes at all published human GWAS trait-associated loci. Nat. Genet. 53, 1527–1533. 10.1038/s41588-021-00945-5.
- 148. Hsu Y-HH, Pintacuda G, Liu R, Nacu E, Kim A, Tsafou K, Petrossian N, Crotty W, Suh JM, Riseman J, et al. (2023). Using brain cell-type-specific protein interactomes to interpret neurodevelopmental genetic signals in schizophrenia. iScience 26, 106701. 10.1016/j.isci.2023.106701.
- 149. Morris JA, Caragine C, Daniloski Z, Domingo J, Barry T, Lu L, Davis K, Ziosi M, Glinos DA, Hao S, et al. (2023). Discovery of target genes and pathways at GWAS loci by pooled single-cell CRISPR screens. Science, eadh7699. 10.1126/science.adh7699.
- 150. Mitchell JM, Nemesh J, Ghosh S, Handsaker RE, Mello CJ, Meyer D, Raghunathan K, De Rivera H, Tegtmeyer M, Hawes D, et al. (2020). Mapping genetic effects on cellular phenotypes with “cell villages” (Genetics) 10.1101/2020.06.29.174383.
- 151. Veller C, Przeworski M, and Coop G (2023). Causal interpretations of family GWAS in the presence of heterogeneous effects (Genetics) 10.1101/2023.11.13.566950.
- 152. American Society of Human Genetics (2023). Facing our History—Building an Equitable Future.
- 153. Carlson J, Henn BM, Al-Hindi DR, and Ramachandran S (2022). Counter the weaponization of genetics research by extremists. Nature 610, 444–447. 10.1038/d41586-022-03252-z.

