Summary
Genome-wide association studies (GWASs) are often performed on ratios composed of a numerator trait divided by a denominator trait. Examples include body mass index (BMI) and the waist-to-hip ratio, among many others. Explicitly or implicitly, the goal of forming the ratio is typically to adjust for an association between the numerator and denominator. While forming ratios may be clinically expedient, there are several important issues with performing GWAS on ratios. Forming a ratio does not “adjust” for the denominator in the sense of conditioning on it, and it is unclear whether associations with ratios are attributable to the numerator, the denominator, or both. Here we demonstrate that associations arising in ratio GWAS can be entirely denominator driven, implying that at least some associations uncovered by ratio GWAS may be due solely to a putative adjustment variable. In a survey of 10 common ratio traits, we find that the ratio model disagrees with the adjusted model (performing GWAS on the numerator while conditioning on the denominator) at around 1/3 of loci. Using BMI as an example, we show that variants detected by only the ratio model are more strongly associated with the denominator (height), while variants detected by only the adjusted model are more strongly associated with the numerator (weight). Although the adjusted model provides effect sizes with a clearer interpretation, it is susceptible to collider bias. We propose and validate a simple method of correcting for the genetic component of collider bias via leave-one-chromosome-out polygenic scoring.
Keywords: genome-wide association study, genetic discovery, polygenic scores, proportion, ratio, quotient, body mass index, collider bias, heritable covariate bias, covariate adjustment
Any variant associated with either numerator or denominator will associate with the ratio in GWAS of sufficient size. Associations with ratio traits are thus not specific to the ratio. Instead, we recommend using a covariate-adjusted or multivariate model, both of which provide clearer interpretations.
Introduction
Ratio traits, those composed of a numerator trait divided by a denominator trait, are widely used in clinical practice, where they provide convenient scalar indicators of health and disease.1 Examples include body mass index (BMI) in obesity (MIM: 601665)2; forced expiratory volume (FEV1) to forced vital capacity (FVC) in chronic obstructive pulmonary disease (COPD) (MIM: 606963)3 and asthma (MIM: 600807)4; phenylalanine to tyrosine concentrations in phenylalanine hydroxylase deficiency (MIM: 261600)5; left ventricular ejection fraction (LVEF) in heart failure6 and dilated cardiomyopathy (MIM: 115200)7; triglycerides (TG) to high-density lipoprotein cholesterol (HDL-C) for insulin resistance (MIM: 125853)8; serum aspartate (AST) to alanine aminotransferase (ALT) in hepatic cirrhosis9; and vertical cup-to-disc ratio (VCDR) in glaucoma (MIM: 137750).10 As biomarkers of health, ratio traits have been used as targets for genome-wide association studies (GWASs) since the first applications of the method (e.g., BMI,11 serum metabolite ratios,12 waist-to-hip ratio [WHR]13), a practice that continues to the present (e.g., adipose tissue volumes,14 TG/HDL-C ratio,15,16 AST/ALT ratio,17 blood cell proportions,18 BMI,19 LVEF,20 metabolite ratios,21,22 protein ratios,23 seated to standing height,24,25 skeletal proportions,26 urinary albumin-creatinine ratio,27 VCDR,28,29 WHR30). While simple ratios may serve a critical role in patient care decisions, this does not imply that ratios are the ideal, or even an appropriate, vehicle for understanding complex trait genetics.
The use of ratio variables in regression models has several well-known statistical issues.31 Explicitly or implicitly, the goal of forming a ratio is generally to control for an association between the numerator and denominator. For example, it is because weight increases with height that weight alone provides a poor proxy for adiposity, and an index that quantifies weight relative to height, such as BMI, is preferred in the clinical context.32,33 On the other hand, GWAS is performed by means of multiple regression, where the association between a genetic variant and the phenotype is sought, often adjusting for multiple factors that may affect the phenotype. When a ratio, such as BMI, is selected as the phenotype, genetic associations become difficult to interpret because the association may be due to the numerator (weight), the denominator (height), or both. When two variables are combined to form a ratio, causal effects on the individual components cannot be distinguished.34 Moreover, multiple regression provides a preferred mechanism of adjusting for the association between height and weight: by conditioning on height as a covariate.31 We refer to this approach as the adjusted model. The adjusted model estimates the expected effect of a covariate on weight while holding height constant. This is the same concept that motivates BMI, namely the idea that greater weight at a given height suggests adiposity. The drawback to the adjusted model is the potential for conditioning on a heritable covariate to introduce collider bias.35,36
Here we demonstrate several pitfalls that can arise when performing GWAS of ratio traits. As a running example, we consider BMI, the ratio of weight in kilograms to height$^2$ in square meters (Figure S1). Through analytical derivations, simulations, and analyses of real data in which the numerator (i.e., weight) is permuted, we demonstrate that associations with ratio traits can be entirely denominator driven, and that the probability of detecting such associations increases with denominator heritability. Thus, at least some associations with BMI are likely attributable to height (also see Mooldijk et al.34). In a survey of 10 common ratio traits, we find that the ratio and adjusted models disagree at roughly 1/3 of loci, and that associations detected by only the ratio model are enriched for denominator-driven signals. Variants identified by the ratio model only tend to be more strongly associated with height (the denominator), while variants identified by the adjusted model only tend to be more strongly associated with weight (the numerator). We scrutinize the practice of correlating effect sizes from two different models (e.g., ratio and adjusted) fit to the same dataset, and show that doing so tends to overstate the agreement of those models. Finally, we address the issue of heritable covariate bias and propose a method of correcting for the genetic component of collider bias.
Subjects and methods
Statement on ethics
This research was conducted using data from consenting participants in the UK Biobank Resource, under approved application no. 51766.
GWAS Catalog analysis
We downloaded all studies in the NHGRI-EBI GWAS Catalog37 (accessed April 7, 2023) and searched for the following terms in the “MAPPED_TRAIT” field of the studies table: concentration, fraction, index, percent, percentile, proportion, rate, ratio, percentage; BMI, chronic kidney disease, COPD, and WHR. This returned a collection of 667 traits, which we manually reviewed for evidence of GWAS having been performed on a ratio outcome with a heritable denominator, examining the title, abstract, and study as necessary. The list of 362 ratio traits is provided as supplemental data.
GWAS
GWAS among unrelated White-British subjects from the UK Biobank38,39 were performed using PLINK (v.1.9).40 We filtered to imputed genotypes with minor allele frequency , INFO score , and Hardy-Weinberg equilibrium . Samples were filtered to those used in the genetic PCA calculation39; self-reported “White-British,” “White,” or “Irish” ancestry; and without sex chromosome aneuploidy. For GWAS, samples with phenotype values below the 0.01st percentile or above the 99.99th percentile were removed. To facilitate direct comparison of effect sizes, all models included a standard set of covariates, namely age at recruitment (UKB field: 21022), genetic sex (22001), genotyping array (Axiom versus BiLEVE; 22000), and the top 20 genetic principal components (22009). Genome-wide significance was declared at the standard threshold of .41
Clumping
Loci independent at and genome-wide significant (GWS) at loci were identified by applying PLINK’s clumping functionality to genome-wide summary statistics with a window size of 250 kb. Any reference to independent, GWS loci, or clumping refers to clumping with these parameters, unless otherwise noted.
Effect size correlations
To obtain unbiased estimates of effect size correlation, the full UKB cohort (K) was randomly split into two independent subsets of size K, labeled A and B. Subset A was arbitrarily designated the discovery cohort and subset B the validation cohort. GWAS was performed in both cohorts, and correlations were estimated across cohorts, restricting to variants that were independent and GWS in the discovery cohort. As sensitivity analyses, we explored taking B as the discovery cohort and A as the validation cohort, as well as generating additional random splits. Results from these analyses were qualitatively similar to the results presented.
Variant group definition
The full set of loci detected by the ratio and adjusted models was partitioned into three sets (ratio-only, adjusted-only, or both) based on whether or not the lead variant from each ratio locus was within 250 kb of the lead variant for an adjusted locus, and vice versa. Variants having at least a suggestive association with height () were excluded from downstream analysis. Summary statistics from GWAS on obesity-related traits, excluding the UKB cohort where possible, were downloaded from GWAS Catalog, dbGaP, or directly from consortia websites. The summary statistics were lifted over to GRCh38 where needed, and independent, GWS loci were identified by clumping .
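The partition described above can be sketched in a few lines. This is a minimal illustration, not the authors' pipeline: the function names (`build_index`, `has_neighbor`, `partition_loci`) and the toy lead-variant coordinates are hypothetical, and loci are represented simply as (chromosome, base-pair position) pairs for their lead variants.

```python
from bisect import bisect_left

WINDOW_BP = 250_000  # window from the text: lead variants within 250 kb

def build_index(leads):
    """Map chromosome -> sorted list of lead-variant positions."""
    index = {}
    for chrom, pos in leads:
        index.setdefault(chrom, []).append(pos)
    for positions in index.values():
        positions.sort()
    return index

def has_neighbor(chrom, pos, index, window=WINDOW_BP):
    """True if any indexed lead variant on `chrom` lies within `window` bp of `pos`."""
    positions = index.get(chrom, [])
    i = bisect_left(positions, pos - window)  # first position >= pos - window
    return i < len(positions) and positions[i] <= pos + window

def partition_loci(ratio_leads, adjusted_leads, window=WINDOW_BP):
    """Assign every lead variant to 'ratio-only', 'adjusted-only', or 'both'."""
    ratio_index = build_index(ratio_leads)
    adjusted_index = build_index(adjusted_leads)
    groups = {"ratio-only": [], "adjusted-only": [], "both": []}
    for chrom, pos in ratio_leads:
        shared = has_neighbor(chrom, pos, adjusted_index, window)
        groups["both" if shared else "ratio-only"].append(("ratio", chrom, pos))
    for chrom, pos in adjusted_leads:
        shared = has_neighbor(chrom, pos, ratio_index, window)
        groups["both" if shared else "adjusted-only"].append(("adjusted", chrom, pos))
    return groups

# Hypothetical lead variants, for illustration only.
ratio_leads = [(1, 1_000_000), (1, 5_000_000), (2, 300_000)]
adjusted_leads = [(1, 1_200_000), (2, 900_000)]
groups = partition_loci(ratio_leads, adjusted_leads)
for name, loci in groups.items():
    print(name, loci)
```

In this sketch the "both" group lists the lead variants from each model separately; counting shared loci once would require an additional merging step that the text does not specify.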
Leave-one-chromosome-out polygenic scores
Leave-one-chromosome-out (LOCO) polygenic scoring was performed within the split-sample framework. First, GWAS for height was performed in each of subsets A and B. Using weights from the independent, GWS variants for height from subset B, a whole-genome polygenic score (PGS) S was calculated for each subject in subset A and vice versa for B. Next, for each chromosome k, a LOCO-PGS was formed by adding the PGS contributions from all chromosomes except chromosome k. This resulted in 22 LOCO-PGSs per subject, one for each autosome (our GWAS were confined to autosomes). Finally, GWAS was performed, separately for each chromosome, using the association model:

$$Y = G\beta_G + X\beta_X + S_{-k}\beta_S + \varepsilon,$$

where Y is the phenotype (weight), G is genotype, X is a set of covariates including height, and $S_{-k}$ is the LOCO-PGS for the chromosome k on which G resides. For the non-heritable component analysis, height was regressed on the LOCO-PGS to obtain residuals $R_{-k}$, then GWAS was performed, separately for each chromosome, according to the model:

$$Y = G\beta_G + \tilde{X}\beta_{\tilde{X}} + R_{-k}\beta_R + \varepsilon.$$

Here, the covariate set $\tilde{X}$ now excludes height, and $R_{-k}$ is formed using the $S_{-k}$ for the chromosome on which G resides.
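The LOCO-PGS construction and the residualization step can be sketched with NumPy. This is an illustrative sketch under stated assumptions, not the authors' code: the function names (`loco_scores`, `residualize`), the toy dosage matrix, and the variant weights are all hypothetical. In the split-sample framework described above, the weights would come from the height GWAS in the other subset.

```python
import numpy as np

def loco_scores(dosages, weights, chrom):
    """Return {chromosome: LOCO score}, where each score sums the PGS
    contributions of every chromosome except the one named by the key."""
    dosages = np.asarray(dosages, dtype=float)
    weights = np.asarray(weights, dtype=float)
    chrom = np.asarray(chrom)
    per_chrom = {int(k): dosages[:, chrom == k] @ weights[chrom == k]
                 for k in np.unique(chrom)}
    total = sum(per_chrom.values())  # whole-genome score
    return {k: total - contrib for k, contrib in per_chrom.items()}

def residualize(x, score):
    """Residual of x after least-squares regression on an intercept and score."""
    design = np.column_stack([np.ones_like(score), score])
    coef, *_ = np.linalg.lstsq(design, x, rcond=None)
    return x - design @ coef

# Toy data (hypothetical): 4 subjects, 4 variants on chromosomes 1, 1, 2, 3.
dosages = np.array([[0, 1, 2, 0],
                    [1, 1, 0, 2],
                    [2, 0, 1, 1],
                    [0, 2, 1, 0]])
weights = np.array([0.5, -0.2, 0.3, 0.1])
chrom = np.array([1, 1, 2, 3])

loco = loco_scores(dosages, weights, chrom)
height = np.array([1.62, 1.75, 1.80, 1.68])
# "Non-heritable component" of height for testing variants on chromosome 1.
resid_chr1 = residualize(height, loco[1])
print(loco[1])
```

Computing the whole-genome score once and subtracting each chromosome's contribution avoids re-summing the remaining chromosomes for every LOCO score.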
Results
Ratio model associations can be entirely denominator driven
Figure 1A presents a simple schematic of how a genetic variant can influence a ratio via an effect on the denominator only. To empirically demonstrate that associations obtained from ratio GWAS can be entirely denominator driven, we conducted analyses of a null phenotype, permuted body weight, within the UK Biobank (UKB; K).38,39 Permutation keeps the mean and standard deviation (SD) of weight unchanged but abolishes any association with genotype or the denominator (height). Figure 1C compares Manhattan plots for GWAS of weight (unpermuted), height, permuted weight adjusted for height and height$^2$ (the “adjusted model”), and the ratio of permuted weight to height$^2$ (the “ratio model”). As desired, the adjusted model detects no GWS loci ().41 By contrast, the ratio model detects 142 independent GWS loci, and exhibits clear similarity to the marginal Manhattan plot for height. Of these 142 GWS loci, 140 (98.6%) were GWS for height, and the remaining 2 were suggestively significant (; also see Figure S2). The height loci detected by the ratio model tended to be those with the largest effect sizes (Figure S3). The mean absolute effect on height for loci detected by the ratio model was 0.387 (95% CI, 0.351–0.423), compared with 0.205 (95% CI, 0.202–0.209) for height loci not detected by the ratio model (89% increase; Wilcoxon ). Transformation of the ratio, via the logarithm or the rank-based inverse normal transform,42 does not prevent the detection of denominator-driven associations (Figure S4). Because the numerator was permuted, and consequently had no association with genotype, the signals detected by the ratio model in this experiment must be entirely attributable to the denominator.
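The logic of this permutation experiment can be reproduced on synthetic data. The following is a minimal sketch, not the UKB analysis: the sample size, allele frequency, per-allele effect on height, and trait distributions are all assumed values chosen for illustration, and `ols_t` is a hypothetical helper. The threshold `Z_GWS` assumes the conventional genome-wide significance level of two-sided p = 5e-8.

```python
import numpy as np

def ols_t(y, X):
    """OLS fit; returns coefficients and their t statistics."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (X.shape[0] - X.shape[1])
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(X.T @ X)))
    return beta, beta / se

rng = np.random.default_rng(0)
n = 50_000                                            # assumed sample size
g = rng.binomial(2, 0.3, size=n).astype(float)        # allele count at one variant
height = 1.70 + 0.02 * g + rng.normal(0.0, 0.07, n)   # meters; variant raises height
weight = rng.permutation(rng.normal(78.0, 15.0, n))   # kg; unrelated to g and height

ones = np.ones(n)
# Standardizing height is only for numerical stability: with an intercept,
# (hz, hz^2) spans the same space as (height, height^2).
hz = (height - height.mean()) / height.std()

# Ratio model: weight / height^2 regressed on genotype.
_, t_ratio = ols_t(weight / height**2, np.column_stack([ones, g]))
# Adjusted model: weight regressed on genotype, height, and height^2.
_, t_adj = ols_t(weight, np.column_stack([ones, g, hz, hz**2]))

Z_GWS = 5.45  # |z| corresponding roughly to two-sided p = 5e-8 (assumed threshold)
print(f"ratio model    t = {t_ratio[1]:.2f}")
print(f"adjusted model t = {t_adj[1]:.2f}")
```

Because the numerator carries no genetic signal in this setup, any genotype association in the ratio model comes from the denominator alone, whereas the adjusted model tests the numerator conditional on the denominator.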
Figure 1.
Simulated and empirical evidence that associations detected in ratio GWAS can be entirely denominator driven
(A) Example data-generating process in which a variant G affects a ratio phenotype by influencing the denominator X only.
(B) Simulation study (K) of rejection probability as a function of denominator heritability. The numerator was permuted weight, and the denominator was simulated to control the heritability while matching the mean and variance to the observed values for height$^2$. Each dot represents the mean of simulations.
(C) Manhattan plots for GWAS conducted in the UK Biobank (K). Each point represents a genetic variant.
Propensity to detect denominator-driven associations increases with denominator heritability
To study the effect of denominator heritability on the probability of observing an association with a ratio phenotype, a random sample of 10K subjects was drawn from the UKB. The numerator was permuted body weight. To enable control of the heritability, the denominator was simulated from an infinitesimal model in such a way that the mean and SD matched the observed values for height$^2$ in UKB (Methods S1). Figure 1B tracks the rejection probability for the ratio and adjusted models. The probability that the adjusted model detects an association remains flat at the type I error rate of 5%, which is expected given that the numerator is unassociated with genotype. By contrast, the probability that the ratio model detects an association grows in concert with the heritability of the denominator (representing height$^2$). The solid red line is a theoretical calculation of the expected rejection probability as a function of denominator heritability (Methods S1). At the empirical heritability for height, estimated by linkage disequilibrium (LD) score regression43 in the full cohort, the ratio model detected an association with 55% probability. This experiment demonstrates that the risk of detecting denominator-driven associations increases with the denominator heritability. The interpretation of associations detected by the ratio model is further complicated by the fact that the number of associations detected depends on the mean of the numerator, even when the numerator has no association with genotype (Figures S5 and S6). Additional simulations reported in the supplemental material, sections 1.5–1.6 (Figures S7–S10), examine the effects of environmental correlation and pleiotropy on the operating characteristics of the ratio and adjusted models. The simulation in section 1.7 of the supplemental material (Figure S11) compares the ratio model with the multivariate association model described in Methods S1.
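A first-order approximation gives some intuition for why the ratio picks up denominator effects and why the numerator mean matters. This is a heuristic sketch only, not the derivation in Methods S1. The symbols $N$, $D$, $\mu_N$, $\mu_D$, $\beta_D$, and $\varepsilon_D$ are introduced here for illustration, and the approximation assumes that the numerator is independent of both genotype and denominator and that the denominator varies little relative to its mean.

```latex
\begin{align*}
Y &= \frac{N}{D}, \qquad
D = \mu_D + \beta_D\,\bigl(G - \mathbb{E}[G]\bigr) + \varepsilon_D, \qquad
N \perp (G, D), \quad \mathbb{E}[N] = \mu_N, \\[4pt]
\mathbb{E}[Y \mid G]
  &= \mu_N\,\mathbb{E}\!\left[\tfrac{1}{D} \,\middle|\, G\right]
  \;\approx\; \frac{\mu_N}{\mu_D + \beta_D\,(G - \mathbb{E}[G])}
  \;\approx\; \frac{\mu_N}{\mu_D} \;-\; \frac{\mu_N\,\beta_D}{\mu_D^{2}}\,\bigl(G - \mathbb{E}[G]\bigr).
\end{align*}
```

Under these assumptions the per-allele effect on the ratio is approximately $-\mu_N \beta_D / \mu_D^2$. It is nonzero whenever the variant affects the denominator, and its magnitude scales with the numerator mean even though the numerator has no genetic association.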
Ratio and adjusted models can reach discrepant conclusions
We surveyed the extent to which ratio traits are used in the literature by querying the NHGRI-EBI GWAS Catalog37 for trait names including concentration, fraction, index, percent(age), percentile, proportion, rate, and ratio, as well as specific traits known to be ratios (e.g., BMI, WHR). We manually reviewed the 667 traits identified for evidence of having a heritable denominator, excluding, for example, per-volume concentrations or per-time rates. Using this approach, we ascertained that 362 traits, representing 3.2% of all traits and 7.8% of all reported associations in the GWAS Catalog, involved ratios with a heritable denominator (supplemental material).
To understand the extent to which the ratio and adjusted models differ in real data, we conducted two GWAS within the UKB on each of 10 phenotypes analyzed as ratios: one with the ratio as the outcome and the other with the numerator as the outcome, adjusting for the denominator (the “adjusted” analysis). The heritability of each trait is shown in Table S1, and the numbers of GWS loci identified by the ratio and adjusted models are presented in Table 1. The union of loci detected by the ratio and adjusted models was partitioned into three sets (ratio-only, adjusted-only, or both) based on whether or not the lead variant from each ratio locus was within 250 kb of the lead variant for any adjusted locus, and vice versa. On average the ratio model detected 386 loci while the adjusted model detected 292 loci. That the ratio model tends to identify more loci is expected given that a variant may associate with the ratio through either the numerator or the denominator. On average, 67.6% of all loci detected were detected by both models, 22.8% were unique to the ratio model, and 9.6% were unique to the adjusted model (also see Table S2). Thus, although the ratio and adjusted models are generally in agreement, for 32.4% of loci the two analyses may reach differing conclusions. We also assessed the percentage of signals from the ratio and adjusted models that were within 250 kb of a lead variant for the denominator trait, but not within 250 kb of a lead variant for the numerator trait (“% denominator driven”). On average, 48% of ratio-only signals were shared with the denominator trait, while only 8% of adjusted-only signals were shared with the denominator trait. Thus loci detected by only the ratio model are enriched for denominator-driven associations.
Table 1.
Survey of discrepancies between the ratio and adjusted models across traits analyzed as ratios
Numerator | Denominator | N | GWS loci: Ratio | GWS loci: Adjusted | GWS loci: Ratio-only (% denominator driven) | GWS loci: Adjusted-only (% denominator driven) |
---|---|---|---|---|---|---|
Albumin (blood) | Creatinine (blood) | 311,122 | 944 | 556 | 727 (71%) | 314 (0%) |
Albumin (urine) | Creatinine (urine) | 105,501 | 6 | 4 | 6 (0%) | 4 (0%) |
Weight | Height$^2$ | 355,627 | 844 | 830 | 89 (34%) | 70 (4%) |
FEV1 | FVC | 325,220 | 538 | 474 | 86 (13%) | 49 (2%) |
Stroke volume | End diastolic volume | 29,171 | 6 | 0 | 6 (0%) | 0 (.) |
Phenylalanine | Tyrosine | 200,159 | 251 | 174 | 54 (56%) | 10 (0%) |
Platelets | Lymphocytes | 52,133 | 2 | 0 | 2 (0%) | 0 (.) |
Seated height | Standing height | 355,373 | 440 | 115 | 330 (45%) | 39 (8%) |
Trunk fat mass | Whole body fat mass | 349,533 | 359 | 240 | 132 (0%) | 46 (0%) |
Waist circumference | Hip circumference | 356,112 | 465 | 529 | 116 (1%) | 115 (37%) |
Average | | 243,995 | 385.5 | 292.2 | 154.8 (48%) | 64.7 (8%) |
Number of independent genome-wide significant loci detected by the ratio and adjusted models, categorized by whether the signal was detected by the ratio model only or the adjusted model only. Loci genome-wide significant for the denominator, but not the numerator, were considered “denominator driven.” Loci were obtained by clumping full-genome summary statistics at and .
Effect size correlation overstates concordance of ratio and adjusted models
Several studies of ratio phenotypes have performed sensitivity analyses in which the effect sizes from a ratio model are compared with those from an unadjusted model11,13,14,20 or an adjusted model.26 Comparisons among effect sizes from different models fit to the same data should be interpreted with caution. As demonstrated analytically in Methods S1, such estimates are inherently correlated due to having been obtained from the same subjects. To enable an unbiased comparison of effect sizes across models, we split the full UKB cohort (K) into two equally sized subsets, A and B (K each). Subset B serves as the validation cohort for associations discovered in subset A and vice versa. Importantly, effect sizes estimated from models fit in A can fairly be correlated with effect sizes estimated in B.
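The point that estimates from the same subjects are inherently correlated can be illustrated with a small null simulation. This is a sketch under assumed settings, not the analytical argument of Methods S1: there are no true genetic effects, the two phenotypes are correlated only through shared noise (`rho` is an assumed value), and the helper names are hypothetical. It isolates the sampling component of the effect size correlation.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, rho = 2_000, 500, 0.8   # subjects per half, null variants, phenotype correlation

def simulate_half():
    """Genotypes plus two correlated phenotypes with no genetic effects."""
    g = rng.binomial(2, 0.3, size=(n, m)).astype(float)
    e = rng.multivariate_normal([0.0, 0.0], [[1.0, rho], [rho, 1.0]], size=n)
    return g, e[:, 0], e[:, 1]

def marginal_betas(g, y):
    """Per-variant simple-regression slopes of y on each genotype column."""
    gc = g - g.mean(axis=0)
    return gc.T @ (y - y.mean()) / (gc ** 2).sum(axis=0)

gA, y1A, y2A = simulate_half()   # subset A
gB, y1B, y2B = simulate_half()   # subset B, independent subjects

# Both phenotypes estimated in A versus one in A and the other in B.
same = np.corrcoef(marginal_betas(gA, y1A), marginal_betas(gA, y2A))[0, 1]
split = np.corrcoef(marginal_betas(gA, y1A), marginal_betas(gB, y2B))[0, 1]
print(f"same-sample correlation:  {same:.2f}")
print(f"split-sample correlation: {split:.2f}")
```

With no true effects, the same-sample correlation tracks the phenotypic correlation while the split-sample correlation is centered on zero, which is why only the latter can fairly measure agreement between models.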
Figure S12 presents the effect size correlations between BMI and a comparator GWAS where the effect sizes were estimated either in the same sample or in different (split) samples. The comparators include marginal GWAS of height and weight, as well as GWAS of weight adjusted for height (and height$^2$). Although correlation remains, the magnitude of correlation declines appreciably when the effect sizes are measured in independent samples. For example, in the case of weight adjusted for height, the correlation with BMI drops from 97.8% to 81.1%.
Ratio associations are enriched for denominator signal
Comparing the ratio and adjusted models enables variants to be partitioned into three classes: variants detected by the ratio model only (“ratio-only”), variants detected by the adjusted model only (“adjusted-only”), and variants detected by both (“both”). To understand the biological significance of these three groups of variants, in Figure S13 we explore the enrichment of signal from marginal associations, calculated in the full UKB cohort, for the numerator (weight) and denominator (height) traits at these variants. Consistent with the hypothesis that variants detected by the ratio model only are more likely to be denominator driven (and that variants detected by the adjusted model only are more likely to be numerator driven), the mean of height was larger at ratio-only loci (two-sided t test, ; ), whereas the mean for weight was larger at adjusted-only loci (; ).
We also compared the power of the ratio and adjusted models at GWS loci for obesity-related traits compiled from external sources (excluding the UKB cohort where possible): body fat percentage,44 chronically elevated alanine aminotransferase,45 coronary artery disease,46 hip circumference,47 low-density lipoprotein (LDL) cholesterol,48 and type 2 diabetes (MIM: 125853).49 The adjusted model had larger statistics at variants defined by several obesity-relevant traits, including body fat percentage (paired two-sided t test, ; ), hip circumference (; ), LDL cholesterol (; ), HDL cholesterol (; ), and total cholesterol (; , Table S3). Conversely, in no case did the ratio model have larger statistics. Together, these observations suggest that GWAS of weight adjusted for height may provide a more direct and powerful route to understanding the genetic basis of adverse adiposity than GWAS of BMI.
LOCO polygenic scoring can partially correct for heritable covariate bias
The denominators of the ratios considered in Table 1 are themselves heritable (Table S1). Important work by Aschard et al.35 and Day et al.36 underscores that adjusting for a heritable covariate can introduce collider bias. Consider the data-generating processes represented in Figure 2A. Here, Y is the phenotype, G is genotype, X is a covariate, and B represents background common causes of X and Y, including both genetic (e.g., pleiotropic variants) and environmental (e.g., diet) components. If B has an effect on both X and Y, and G has an effect on X, then conditioning on X opens a backdoor path50 between G and Y, namely $G \rightarrow X \leftarrow B \rightarrow Y$. The presence of this backdoor path biases estimation of the direct effect of G on Y, the genetic effect of interest. The practical consequence of collider bias is that variants that affect X but not Y will falsely appear to have an effect on Y in models that adjust for X. In cases where G has no effect on X, X is no longer a collider and the bias vanishes. Supposing G does affect X, another way to remove collider bias is to condition on B. In general, B will have both genetic and environmental components. We will focus on the former, which can be accounted for using the observed genetic data. In principle, the genetic component of collider bias can be removed by performing conditional GWAS for Y adjusting for all variants that affect X. In practice, adjusting for all variants affecting X may be impractical, either because there are too many or because the complete set of variants affecting X is unknown. Instead, we pursue the strategy of removing the genetic component of collider bias by conditioning on a PGS for X.
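The collider mechanism can be checked numerically. The sketch below is an illustration with assumed effect sizes, not one of the paper's simulation settings: G affects X with coefficient 0.5, B affects both X and Y with coefficient 1, all noise terms have unit variance, and G has no effect on Y. Under these assumptions the genotype coefficient in the X-adjusted model is approximately -0.25 in expectation, while conditioning on B restores a coefficient near zero. `genotype_coef` is a hypothetical helper.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000
g = rng.binomial(2, 0.3, size=n).astype(float)  # genotype G
b = rng.normal(size=n)                          # background common cause B
x = 0.5 * g + 1.0 * b + rng.normal(size=n)      # covariate: G -> X <- B
y = 1.0 * b + rng.normal(size=n)                # phenotype: B -> Y, no effect of G

def genotype_coef(outcome, *covariates):
    """OLS coefficient on the first covariate (genotype), with an intercept."""
    design = np.column_stack([np.ones(len(outcome)), *covariates])
    coef, *_ = np.linalg.lstsq(design, outcome, rcond=None)
    return coef[1]

marginal = genotype_coef(y, g)           # no adjustment: no path from G to Y
adjusted = genotype_coef(y, g, x)        # conditioning on the collider X
conditional = genotype_coef(y, g, x, b)  # additionally conditioning on B
print(f"marginal    {marginal:+.3f}")
print(f"adjusted    {adjusted:+.3f}")
print(f"conditional {conditional:+.3f}")
```

When B is purely genetic, a polygenic score for X plays the role of `b` in the last model, which is the intuition behind conditioning on a PGS rather than on every variant affecting X.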
Figure 2.
Leave-one-chromosome-out polygenic scoring corrects for collider bias due to the genetic background
(A) Data-generating processes considered in the simulation study. In each case Y is the phenotype, G is genotype, X is a heritable covariate, and B denotes background common causes of X and Y. For the simulation studies, we focus on the case where B is due to the genetic background.
(B) Bias in estimating the genetic effect of interest as a function of the correlation between X and Y due to B. The sample size is and each violin presents the distribution of simulation replicates. The lower, middle, and upper bars of the boxplots demarcate the 25th, 50th, and 75th percentiles, and the whiskers extend to 1.5 times the interquartile range. Adjusted and ratio are susceptible to bias; conditional is unbiased and theoretically optimal; LOCO-PGS and non-heritable component are the proposed bias-correction methods.
Drawing on ideas from BOLT-LMM51 and Regenie,52 we propose constructing a LOCO PGS for the heritable covariate X, then performing GWAS adjusting for both X and its LOCO-PGS. Figure S14 provides an overview of the LOCO-PGS workflow. A related strategy is to regress the LOCO-PGS out of the heritable covariate then adjust for the residual “non-heritable” component (in practice, the residual component may retain some heritability if the PGS does not capture all genetic contributions to the heritable covariate53). In either case, the PGS serves to correct for genetic collider bias by breaking the association between the genetic background and the heritable covariate. In Methods S1, we demonstrate analytically that excluding variants in LD with the focal variant is necessary (i.e., adjusting for a global rather than a LOCO PGS would introduce bias), and that LOCO-PGS adjustment successfully removes collider bias due to the genetic background. Aschard et al.35 demonstrated that the magnitude of heritable covariate bias is proportional to , where denotes the covariance. In connection with their result, the LOCO PGS attenuates the component of that is due to genetic background. When the covariance between Y and X is solely the result of genetic factors, the LOCO PGS has the potential to fully eliminate the heritable covariate bias. On the other hand, LOCO PGS cannot remove the component of collider bias due to environmental common causes of X and Y.
For each data-generating process in Figure 2A, Figure 2B empirically assesses the bias of the adjusted, ratio, and conditional models, alongside the proposed methods of correcting for genetic background. Adjusted GWAS is unbiased when the heritable covariate X fully mediates the effects of genotype G and genetic background B on Y, or when G has no effect on X, but incurs bias in the collider settings. Conditional GWAS is the optimal strategy for controlling collider bias, but requires knowledge of B in order to condition on it. Both LOCO-PGS and non-heritable covariate adjustment remove bias due to the genetic background, but LOCO-PGS is more efficient (i.e., produces smaller standard errors). Ratio GWAS is often biased, and inefficient, because the association model is misspecified. Specifically, genotype affects the numerator and/or denominator individually, not as a ratio.
Figure 3 compares validation effect sizes from several models with discovery effect sizes from the adjusted model using real data from the UKB. The slope of 0.92 indicates that the validation effect sizes were generally smaller in magnitude than the discovery effect sizes, which is likely a manifestation of winner’s curse.54 The similarity of the effect sizes from the adjusted, LOCO-PGS, and non-heritable component models suggests either that most variants detected by the adjusted model are not subject to substantial collider bias, or that the collider bias is due primarily to factors other than genetic background, for which the LOCO-PGS cannot account. Qualitatively similar results were found in the example of waist circumference adjusted for hip circumference (Figure S15).
Figure 3.
Empirically, effect sizes from the adjusted and LOCO-PGS analyses are similar
Validation effect sizes from several models are compared with the discovery effect size from the adjusted model, at independent () genome-wide significant () loci for the latter. The discovery and validation effect sizes were estimated in two independent subsets of the UK Biobank (K each). The adjusted model (left) performs GWAS of weight adjusting for height and height$^2$. The LOCO-PGS model (center) performs GWAS of weight adjusting for height and a leave-one-chromosome-out polygenic score (LOCO-PGS) for height. The non-heritable component model (right) performs GWAS of weight adjusting for the residual after regressing height on its LOCO-PGS. The solid gray line is the identity, and the dotted line is the least-squares regression line.
Discussion
Ratio traits are useful heuristics in clinical practice, but introduce statistical challenges when used in regression models.31 The fundamental problem with performing GWAS on ratios is that any variant associated with either numerator or denominator will associate with the ratio in samples of sufficient size. Associations with ratio traits are thus not specific to the ratio per se, and indiscriminate use of ratios will simply tend to uncover a mixture of signals associated with the ratio's components.
For researchers considering GWAS on a ratio trait, we make the following specific recommendations, diagrammed in Figure 4A. First, consider whether the goal of forming the ratio is to account for the denominator. If so, then consider an adjusted analysis, with possible inclusion of a LOCO-PGS for the denominator. If the goal is to identify variants associated with the numerator and/or the denominator, consider a multivariate analysis. Finally, if identifying variants associated with the ratio is genuinely of interest, and it is immaterial whether those variants affect the ratio through the numerator or denominator only, a ratio analysis may be appropriate. Regardless of the analysis selected in step 1, we recommend always placing the results in context by examining the effect sizes and p values from marginal analyses of the numerator and denominator traits. If ratio analysis was performed initially, this reveals whether significant associations are likely attributable to the numerator, denominator, or both. If a multivariate analysis was performed, this can clarify whether the associations with numerator and denominator are comparable in magnitude and in the same or different directions. Finally, if an adjusted analysis was performed, the marginal analyses can disentangle whether the association is attributable to collider bias (Figure 4B). Specifically, if the variant is unassociated with the denominator and has similar effect sizes in the adjusted and unadjusted models, then the association is likely not due to collider bias. Conversely, if the variant is associated with the denominator but not marginally associated with the numerator, the association is likely due to collider bias (intermediate cases are also possible). When collider bias is present, comparing the effect sizes from the adjusted model with and without a LOCO-PGS of the denominator can assess whether the collider bias is primarily genetic versus environmental.
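The interpretation rules for an adjusted-model association can be written as a small decision function. This is only a sketch of the logic described in the text, not a reproduction of Figure 4B: the function name, the significance threshold `alpha`, and the tolerance `rel_tol` used to call two effect sizes "similar" are all hypothetical choices that an analyst would need to set.

```python
def interpret_adjusted_hit(p_denominator, p_numerator_marginal,
                           beta_adjusted, beta_unadjusted,
                           alpha=5e-8, rel_tol=0.2):
    """Classify an adjusted-model association using marginal analyses.

    alpha and rel_tol are illustrative assumptions, not values from the paper.
    """
    denominator_associated = p_denominator < alpha
    numerator_associated = p_numerator_marginal < alpha
    similar_effects = (abs(beta_adjusted - beta_unadjusted)
                       <= rel_tol * abs(beta_unadjusted))
    if not denominator_associated and similar_effects:
        return "likely not collider bias"
    if denominator_associated and not numerator_associated:
        return "likely collider bias"
    return "intermediate"

# Hypothetical summary statistics for three variants.
print(interpret_adjusted_hit(0.30, 1e-12, 0.050, 0.048))
print(interpret_adjusted_hit(1e-20, 0.40, 0.030, 0.001))
print(interpret_adjusted_hit(1e-20, 1e-15, 0.030, 0.060))
```

Variants that fall in the intermediate class are the ones for which comparing the adjusted model with and without a LOCO-PGS for the denominator is most informative.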
Figure 4.
Decision trees for choosing the appropriate analysis of a ratio phenotype, and for interpreting the results of an adjusted analysis
(A) Describes how to select the appropriate analysis based on the variants of interest. A ratio analysis should only be conducted after considering the adjusted and multivariate alternatives, and the results of marginal analyses should always be considered to aid in interpretation.
(B) Describes how to determine whether or not an association detected by the adjusted model is likely driven by collider bias.
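The interpretive rules for an adjusted-model association, as described in the text, can be sketched as a small triage function. This is a minimal illustration and not the authors' code: the function name, the boolean inputs, and the returned labels are hypothetical, and how "associated" and "similar effect sizes" are operationalized (for example, via a significance threshold) is left to the analyst.

```python
def assess_collider_bias(assoc_denominator: bool,
                         assoc_numerator_marginal: bool,
                         similar_adjusted_unadjusted: bool) -> str:
    """Triage an association detected by the adjusted model.

    assoc_denominator: variant is associated with the denominator trait.
    assoc_numerator_marginal: variant is associated with the numerator
        in the marginal (unadjusted) analysis.
    similar_adjusted_unadjusted: effect sizes on the numerator are similar
        with and without adjustment for the denominator.
    """
    if not assoc_denominator and similar_adjusted_unadjusted:
        return "likely not collider bias"
    if assoc_denominator and not assoc_numerator_marginal:
        return "likely collider bias"
    # Everything else falls into the intermediate cases noted in the text.
    return "intermediate"


if __name__ == "__main__":
    print(assess_collider_bias(True, False, False))
```

Associations flagged as likely collider bias are the natural candidates for the follow-up comparison with and without a LOCO-PGS of the denominator.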
Pleiotropic effects on the numerators and denominators of ratio traits appear to be widespread. Pleiotropy can arise either in the vertical sense, as when the effect of a variant on the numerator is mediated by the denominator or vice versa; or in the horizontal sense, as when the variant affects the numerator and denominator through distinct pathways. In the survey of discrepancies between the ratio and adjusted models, 44% of loci identified in the ratio model were detected by the adjusted model and 68% of the loci identified in the adjusted model were detected by the ratio model (Table S2). Moreover, the genetic correlation between the numerator and denominator traits was often substantial (Table S4). Thus, it is not the case that all associations identified by ratio GWAS are erroneous. Instead, the conclusion is that at least some associations identified by ratio GWAS may be entirely due to the denominator or, in general, due to only one of the components. If the goal of a ratio GWAS is to detect variants associated with the numerator while accounting for the denominator, then an adjusted analysis may be more appropriate. If the goal is to detect pleiotropic variants, i.e., those affecting both the numerator and the denominator, then a multivariate GWAS may outperform either taking the intersection of individual GWAS or performing ratio GWAS.55,56,57,58 Methods S1 describes a multivariate framework as an alternative to ratio analysis, and the simulation study in Figure S11 confirms that multivariate analysis is as or more powerful than ratio analysis, in addition to being more transparent about the null hypothesis being tested. A future direction is to develop software for performing this multivariate test for pleiotropy at genome scale.
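To make concrete how an association with a ratio can be entirely denominator driven, the following toy simulation gives a variant an effect on height only and then regresses both weight and BMI on genotype. It is a sketch under stated assumptions, not a reproduction of the simulation studies in this work: the function names, sample size, allele frequency, and the deliberately exaggerated per-allele effect on height are all hypothetical, and weight is drawn independently of both genotype and height for simplicity.

```python
import random


def ols_slope(x, y):
    """Slope from a simple linear regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def simulate_denominator_only(n=20000, maf=0.3, effect_on_height=0.05, seed=1):
    """Variant affects height (meters) only; weight (kg) ignores genotype."""
    rng = random.Random(seed)
    genotype, weight, bmi = [], [], []
    for _ in range(n):
        g = int(rng.random() < maf) + int(rng.random() < maf)  # 0, 1, or 2
        h = 1.70 + effect_on_height * g + rng.gauss(0.0, 0.07)
        w = 75.0 + rng.gauss(0.0, 12.0)  # no genetic effect on the numerator
        genotype.append(g)
        weight.append(w)
        bmi.append(w / h ** 2)
    return ols_slope(genotype, weight), ols_slope(genotype, bmi)


if __name__ == "__main__":
    slope_weight, slope_bmi = simulate_denominator_only()
    print(f"weight ~ G slope: {slope_weight:.3f}")
    print(f"BMI ~ G slope:    {slope_bmi:.3f}")
```

In this setup the slope for weight should hover near zero while the slope for BMI is clearly negative, even though the variant has no effect on the numerator.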
Among the traits considered in the survey of discrepancies between the ratio and adjusted models, WHR is an outlier. Whereas for other traits, the ratio model detected more associations (as expected), for WHR the adjusted model detected more associations. Moreover, for WHR only, more of the adjusted-only associations than the ratio-only associations were putatively denominator driven. Possible explanations for this include extensive pleiotropy between waist and hip circumference (using our summary statistics, the genetic correlation was estimated at 0.88 by LDSC; Table S4), and the potential for strong environmental effects on both waist and hip circumference (e.g., diet). While unraveling the distinctive behavior of WHR is not the focus of the present work, this does provide an exemplar trait where the adjusted model is potentially inappropriate and a multivariate analysis would be preferred.
Recognizing that the purpose of BMI is to provide an index of adiposity that is independent of height, but that the ratio of weight to height$^2$ is not always optimal, Stergiakouli et al. considered performing GWAS on BMI$_x$ = weight/height$^x$, where the exponent $x$ was selected in an age-dependent manner to minimize the phenotypic correlation between BMI$_x$ and height.59 Although this is a step in the right direction, this strategy is unlikely to fully resolve the issues with performing GWAS on ratios. Selecting $x$ to minimize the phenotypic correlation of BMI$_x$ with height does not guarantee that the final correlation will be exactly zero, and even if zero correlation could be achieved, this does not imply that BMI$_x$ and height will be free of higher-order dependencies. Absent the ability to select $x$ such that BMI$_x$ and height are completely independent, it remains possible for variants to associate with BMI$_x$ via height.
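The idea of choosing the height exponent to minimize phenotypic correlation can be sketched with a simple grid search. This is an illustrative sketch only and not the procedure of Stergiakouli et al., which was age dependent: the function names, the grid, and the simulated data (weight generated as proportional to height raised to the power 2.5, with multiplicative noise) are assumptions made for the example.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def select_exponent(weight, height, grid):
    """Return the grid value x minimizing |corr(weight / height**x, height)|."""
    def abs_corr(x):
        index = [w / h ** x for w, h in zip(weight, height)]
        return abs(pearson(index, height))
    return min(grid, key=abs_corr)


def simulate(n=5000, true_power=2.5, seed=7):
    """Toy data: weight scales as height**true_power with log-normal noise."""
    rng = random.Random(seed)
    height = [rng.gauss(1.70, 0.09) for _ in range(n)]
    weight = [13.0 * h ** true_power * math.exp(rng.gauss(0.0, 0.15))
              for h in height]
    return weight, height


if __name__ == "__main__":
    weight, height = simulate()
    grid = [1.5 + 0.1 * i for i in range(21)]  # candidate exponents 1.5..3.5
    print(select_exponent(weight, height, grid))
```

Even when the selected exponent drives the sample correlation close to zero, the resulting index is only uncorrelated with height, not independent of it, which is the limitation discussed above.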
When the denominator is heritable, adjusting for the denominator as a covariate can introduce collider bias.35,36 It is worth noting that adjusting for a heritable covariate does not automatically result in collider bias. As shown here and discussed previously,35 there are causal architectures that do not incur collider bias, such as when the effect of background is fully mediated through the heritable covariate. For settings where collider bias is expected, we proposed additionally adjusting for a LOCO-PGS for the heritable covariate. To further control for effects of genetic background on numerator and denominator, this approach could be extended by including a random effect with covariance proportional to the genetic relatedness matrix.51,60,61 When performing GWAS of weight adjusted for height, we found that addition of a LOCO-PGS had little effect on the estimated effect sizes, suggesting that if collider bias is present, it is primarily due to factors other than genetic background. A limitation of the LOCO-PGS approach is that it cannot remove collider bias introduced by non-genetic common causes, such as the environment. For a method that addresses collider bias due to potentially unobserved environmental factors, see SlopeHunter.62 Another limitation is that LOCO-PGS does not, by default, account for collider bias due to variants on the same chromosome. While the bias introduced by variants on the same chromosome will often be minimal, in cases where it is substantial, this bias can be removed by combining LOCO-PGS with conditional GWAS. Specifically, the association model would adjust for the heritable covariate, its LOCO-PGS, and any variants on the same chromosome with strong effects on the covariate. An important future direction is to compare adjusted GWAS with and without inclusion of the LOCO-PGS more broadly, in order to isolate examples of when LOCO-PGS does and does not have a significant impact on the estimated effect size.
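The leave-one-chromosome-out construction can be sketched as follows. This is a minimal illustration under stated assumptions rather than the pipeline used in this work: the function name and data layout are hypothetical, and the per-variant weights for the heritable covariate are taken as given (how they are estimated is outside the sketch).

```python
from collections import defaultdict


def loco_pgs(dosages, weights, chromosomes):
    """Leave-one-chromosome-out polygenic scores.

    dosages: list of per-individual lists of allele dosages (one per variant).
    weights: per-variant effect-size weights for the heritable covariate.
    chromosomes: per-variant chromosome labels.

    Returns {chrom: [score per individual]}, where each score sums
    weight * dosage over all variants NOT on `chrom`.
    """
    n = len(dosages)
    # Accumulate each chromosome's contribution to the score separately.
    partial = defaultdict(lambda: [0.0] * n)
    for j, (w, c) in enumerate(zip(weights, chromosomes)):
        column = partial[c]
        for i in range(n):
            column[i] += w * dosages[i][j]
    # The full score minus one chromosome's contribution gives the LOCO score.
    total = [sum(partial[c][i] for c in partial) for i in range(n)]
    return {c: [total[i] - partial[c][i] for i in range(n)] for c in partial}


if __name__ == "__main__":
    dosages = [[0, 1, 2], [2, 0, 1]]   # two individuals, three variants
    weights = [0.5, -1.0, 0.25]
    chromosomes = [1, 1, 2]
    print(loco_pgs(dosages, weights, chromosomes))
```

When testing a variant on chromosome c, the adjusted model would then include the heritable covariate together with the score that omits chromosome c, so that the score never contains the variant under test or its neighbors.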
With increasingly high-dimensional phenotypic data, such as that provided by metabolomics and proteomics panels, there is a temptation to conduct hypothesis-free association testing on the ratios of all possible pairs of variables.63,64 In the metabolic context, a rationale for considering ratios is that the ratio of product to substrate may capture information about binding affinity or reaction rate.12,21,64,65 While GWAS of ratios may be justified in some situations, we suggest that the scientific hypothesis be carefully elaborated before a ratio phenotype is selected. The fact that the denominator is likely related to the numerator is not sufficient justification for using a ratio phenotype, as the relationship between numerator and denominator can be modeled in other ways. If potential collider bias precludes adjusting for the denominator as a covariate, then a multivariate analysis that separately estimates the effect of genotype on the numerator and denominator will typically provide greater power and interpretability. Notably, the hypotheses that genotype is associated with both the numerator and denominator, or with one or the other, can all be evaluated within a multivariate framework (Methods S1). Performing GWAS on ratios is generally admissible when the denominator is a non-heritable normalizing factor, such as volume in the case of concentration measurements (e.g., HbA1c) or time in the case of certain rates (e.g., 24-h urinary albumin excretion rate). However, even here it should be considered whether treating the denominator as an offset or covariate is not more appropriate given the other covariates in the model.31
On the basis of improved power, a recent paper advocated for performing GWAS on log-ratios.23 Section 3.6 of the supplemental material contains an extended discussion of performing GWAS on such traits, in which we demonstrate analytically that association with either the numerator or denominator is sufficient for association with the log-ratio. As such, the set of variants targeted by the log-ratio model is the union of the sets of variants affecting the component traits. Direct comparison of power between the log-ratio model and marginal GWAS of the component traits is therefore difficult to justify, because these models target different sets of variants. The log-ratio model, which targets a larger set of variants, will often identify more GWS associations; however, the interpretation of which trait these variants affect will be unclear.
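The reason an effect on either component carries over to the log-ratio can be seen in a simple linear sketch. This is an illustrative simplification and not the derivation given in the supplemental material; the symbols $Y$ (numerator), $Z$ (denominator), $G$ (genotype), and the coefficients are notation assumed for this example.

```latex
\begin{align*}
\log Y &= \alpha_Y + \beta_Y G + \varepsilon_Y, \\
\log Z &= \alpha_Z + \beta_Z G + \varepsilon_Z, \\
\log(Y/Z) &= \log Y - \log Z
  = (\alpha_Y - \alpha_Z) + (\beta_Y - \beta_Z)\,G + (\varepsilon_Y - \varepsilon_Z).
\end{align*}
```

Under these assumed models, the coefficient of $G$ in the log-ratio is $\beta_Y - \beta_Z$, which is nonzero whenever exactly one of $\beta_Y$ and $\beta_Z$ is nonzero, and vanishes only in the special case $\beta_Y = \beta_Z$. A significant log-ratio association therefore does not by itself indicate which component the variant affects.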
Proportions are ratios in which the numerator is contained within the denominator, as in the case of body fat percentage.66 As ratios, proportions are not exempt from the potential for denominator-driven associations. Taking body fat percentage as an example, suppose total body mass is partitioned into fat mass and lean mass. Body fat percentage is the ratio of fat mass to total body mass. A genetic variant that increases lean mass but has no direct effect on fat mass will exhibit a negative association with body fat percentage because it increases the denominator (total body mass) via its effect on lean mass.
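The body fat percentage example can be worked through numerically. The masses below are hypothetical values chosen only to make the arithmetic transparent; the function name is likewise an assumption of this sketch.

```python
def body_fat_percentage(fat_mass: float, lean_mass: float) -> float:
    """Fat mass as a percentage of total body mass (fat + lean)."""
    return 100.0 * fat_mass / (fat_mass + lean_mass)


# Hypothetical non-carrier: 20 kg fat mass, 60 kg lean mass.
baseline = body_fat_percentage(20.0, 60.0)

# Hypothetical carrier of a variant that adds 2 kg of lean mass
# and leaves fat mass unchanged.
carrier = body_fat_percentage(20.0, 62.0)

if __name__ == "__main__":
    print(f"baseline: {baseline:.2f}%  carrier: {carrier:.2f}%")
```

Fat mass is identical in the two cases, yet body fat percentage is lower for the carrier because only the denominator has changed.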
Although we have scrutinized the practice of using a ratio as the outcome in a regression model, ratios can also enter GWAS as covariates. For example, GWAS of WHR are often adjusted for BMI.47,67,68 When including a ratio as a covariate, it is important to recognize that a ratio is implicitly an interaction; in the case of BMI, between weight and height$^{-2}$.31 Including an interaction without the main effects makes it unclear whether an association is genuinely attributable to the interaction versus one of its components; including the main effects in addition to the interaction disentangles these effects.69 Therefore, in models where adjusting for a ratio is indicated, we follow Kronmal31 in suggesting that the components of the ratio also be included as main effects.
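The contrast between adjusting for the ratio alone and adjusting for the ratio together with its components can be written out as two regression models. This is a sketch in notation assumed for the example ($W$ for weight, $H$ for height, $G$ for genotype), taking the main effects to be the two factors of the interaction, namely $W$ and $H^{-2}$.

```latex
\begin{align*}
\text{Ratio only:}\quad
  \mathrm{WHR} &= \beta_0 + \beta_G G + \gamma_3\,(W \cdot H^{-2}) + \varepsilon, \\
\text{With main effects:}\quad
  \mathrm{WHR} &= \beta_0 + \beta_G G + \gamma_1 W + \gamma_2 H^{-2}
    + \gamma_3\,(W \cdot H^{-2}) + \varepsilon.
\end{align*}
```

The first model is the second with $\gamma_1 = \gamma_2 = 0$ imposed, so any contribution of weight or height alone is forced into the interaction term; the second model lets the data separate these contributions.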
In conclusion, GWAS of ratio traits can identify associations that are entirely denominator driven, and when the rationale for forming the ratio was to adjust for the denominator, such associations may be considered false positives. We recommend reanalyzing ratio GWAS in order to discern whether the associations are attributable to the numerator, the denominator, or both. The practice of considering associations with ratios (e.g., BMI) as conceptually distinct from associations with the components (e.g., height and weight) requires critical re-examination, as variants can associate with the ratio through only one of the components. Downstream analyses based on summary statistics from ratio GWAS (e.g., predictions based on PGSs, causal effect estimates based on Mendelian randomization) may also require reconsideration.
Data and code availability
This work used genotypes and phenotypes from the UK Biobank, which are available upon application to the UK Biobank Access Management System. Publicly available summary statistics were obtained from the NHGRI-EBI GWAS Catalog.37 GWAS and clumping were performed using PLINK (v.1.9).40 Genetic correlations were estimated using LD Score Regression (v.1.0.1).43 Simulation studies were conducted in R.
Consortia
Contributions of insitro Research Team members, listed in alphabetical order.
Downloading, preprocessing, and curation of UK Biobank data
- Francesco Paolo Casale, Eilon Sharon, Thomas W. Soare, James Warren.
Software engineering
- Statistical genetics pipeline: Magdalena Borecka, Francesco Paolo Casale, Anna Merkoulovitch, Colm O’Dushlaine, Thomas W. Soare, Paul Sud, Baris Ungun.
- redun development: Robin Betz, Edward Chee, Patrick R. Conrad, Kevin Ford, Christoph Klein, Donald Naegely, Matthew Rasmussen.
Acknowledgments
The authors are thankful to Emily Fox, who provided helpful feedback during the revision process, and to the participants of the UK Biobank, whose data were used with permission.
Author contributions
Z.R.M., H.S., S.M., and T.W.S. conceived of the project. R.D. performed theoretical derivations. Z.R.M., H.S., D.A., S.M., and T.W.S. performed analyses. K.S. reviewed the list of ratio traits from the GWAS Catalog. All authors provided scientific input. D.K., G.D.S., D.M., and C.O. provided early feedback and direction on the manuscript. Z.R.M. and T.W.S. wrote the first draft of the manuscript. All authors contributed to critical revision of the final manuscript.
Declaration of interests
Z.R.M., R.D., H.S., D.A., S.M., K.S., members of the insitro Research Team, T.K., D.K., C.O., and T.W.S. are current or former employees and shareholders of Insitro. G.D.S. and D.M. are advisors to Insitro.
Footnotes
Supplemental information can be found online at https://doi.org/10.1016/j.xhgg.2025.100406.
Contributor Information
Zachary R. McCaw, Email: zmccaw@alumni.harvard.edu.
Thomas W. Soare, Email: tsoare@insitro.com.
Web resources
GWAS Catalog, https://www.ebi.ac.uk/gwas/
OMIM, http://www.omim.org.
UK Biobank Data, https://www.ukbiobank.ac.uk.
References
- 1. Garrow J.S., Webster J. Quetelet’s index (w/h2) as a measure of fatness. Int. J. Obes. 1985;9:147–153.
- 2. Orzano A.J., Scott J.G. Diagnosis and treatment of obesity in adults: An applied evidence-based review. J. Am. Board Fam. Pract. 2004;17:359–369. doi: 10.3122/jabfm.17.5.359.
- 3. Vestbo J., Hurd S.S., Agustí A.G., Jones P.W., Vogelmeier C., Anzueto A., Barnes P.J., Fabbri L.M., Martinez F.J., Nishimura M., et al. Global strategy for the diagnosis, management, and prevention of chronic obstructive pulmonary disease: GOLD executive summary. Am. J. Respir. Crit. Care Med. 2013;187:347–365. doi: 10.1164/rccm.201204-0596PP.
- 4. Global Initiative for Asthma. Global strategy for asthma management and prevention.
- 5. Vockley J., Andersson H.C., Antshel K.M., Braverman N.E., Burton B.K., Frazier D.M., Mitchell J., Smith W.E., Thompson B.H., Berry S.A., American College of Medical Genetics and Genomics Therapeutics Committee. Phenylalanine hydroxylase deficiency: diagnosis and management guideline. Genet. Med. 2014;16:188–200. doi: 10.1038/gim.2013.157.
- 6. Writing Committee Members, Yancy C.W., Jessup M., Bozkurt B., Butler J., Casey D.E., Jr., Drazner M.H., Fonarow G.C., Geraci S.A., Horwich T., et al. 2013 ACCF/AHA guideline for the management of heart failure: a report of the American College of Cardiology Foundation/American Heart Association Task Force on Practice Guidelines. Circulation. 2013;128:e240–e327. doi: 10.1161/CIR.0b013e31829e8776.
- 7. McNally E.M., Mestroni L. Dilated cardiomyopathy: genetic determinants and mechanisms. Circ. Res. 2017;121:731–748. doi: 10.1161/CIRCRESAHA.116.309396.
- 8. McLaughlin T., Abbasi F., Cheal K., Chu J., Lamendola C., Reaven G. Use of metabolic markers to identify overweight individuals who are insulin resistant. Ann. Intern. Med. 2003;139:802–809. doi: 10.7326/0003-4819-139-10-200311180-00007.
- 9. Williams A.L., Hoofnagle J.H. Ratio of serum aspartate to alanine aminotransferase in chronic hepatitis relationship to cirrhosis. Gastroenterology. 1988;95:734–739. doi: 10.1016/s0016-5085(88)80022-2.
- 10. Garway-Heath D.F., Ruben S.T., Viswanathan A., Hitchings R.A. Vertical cup/disc ratio in relation to optic disc size: its value in the assessment of the glaucoma suspect. Br. J. Ophthalmol. 1998;82:1118–1124. doi: 10.1136/bjo.82.10.1118.
- 11. Loos R.J.F., Lindgren C.M., Li S., Wheeler E., Zhao J.H., Prokopenko I., Inouye M., Freathy R.M., Attwood A.P., Beckmann J.S., et al. Common variants near MC4R are associated with fat mass, weight and risk of obesity. Nat. Genet. 2008;40:768–775. doi: 10.1038/ng.140.
- 12. Illig T., Gieger C., Zhai G., Römisch-Margl W., Wang-Sattler R., Prehn C., Altmaier E., Kastenmüller G., Kato B.S., Mewes H.W., et al. A genome-wide perspective of genetic variation in human metabolism. Nat. Genet. 2010;42:137–141. doi: 10.1038/ng.507.
- 13. Lindgren C.M., Heid I.M., Randall J.C., Lamina C., Steinthorsdottir V., Qi L., Speliotes E.K., Thorleifsson G., Willer C.J., Herrera B.M., et al. Genome-wide association scan meta-analysis identifies three loci influencing adiposity and fat distribution. PLoS Genet. 2009;5 doi: 10.1371/journal.pgen.1000508.
- 14. Agrawal S., Wang M., Klarqvist M.D.R., Smith K., Shin J., Dashti H., Diamant N., Choi S.H., Jurgens S.J., Ellinor P.T., et al. Inherited basis of visceral, abdominal subcutaneous and gluteofemoral fat depots. Nat. Commun. 2022;13:3771. doi: 10.1038/s41467-022-30931-2.
- 15. Oliveri A., Rebernick R.J., Kuppa A., Pant A., Chen Y., Du X., Cushing K.C., Bell H.N., Raut C., Prabhu P., et al. Comprehensive genetic study of the insulin resistance marker TG:HDL-C in the UK Biobank. Nat. Genet. 2024;56:212–221. doi: 10.1038/s41588-023-01625-2.
- 16. DeForest N., Wang Y., Zhu Z., Dron J.S., Koesterer R., Natarajan P., Flannick J., Amariuta T., Peloso G.M., Majithia A.R. Genome-wide discovery and integrative genomic characterization of insulin resistance loci using serum triglycerides to HDL-cholesterol ratio as a proxy. Nat. Commun. 2024;15:8068. doi: 10.1038/s41467-024-52105-y.
- 17. Sinnott-Armstrong N., Tanigawa Y., Amar D., Mars N., Benner C., Aguirre M., Venkataraman G.R., Wainberg M., Ollila H.M., Kiiskinen T., et al. Genetics of 35 blood and urine biomarkers in the UK Biobank. Nat. Genet. 2021;53:185–194. doi: 10.1038/s41588-020-00757-z.
- 18. Kachuri L., Jeon S., DeWan A.T., Metayer C., Ma X., Witte J.S., Chiang C.W., Wiemels J.L., de Smith A.J. Genetic determinants of blood-cell traits influence susceptibility to childhood acute lymphoblastic leukemia. Am. J. Hum. Genet. 2021;108:1823–1835. doi: 10.1016/j.ajhg.2021.08.004.
- 19. Yengo L., Sidorenko J., Kemper K.E., Zheng Z., Wood A.R., Weedon M.N., Frayling T.M., Hirschhorn J., Yang J., Visscher P.M., GIANT Consortium. Meta-analysis of genome-wide association studies for height and body mass index in 700000 individuals of European ancestry. Hum. Mol. Genet. 2018;27:3641–3649. doi: 10.1093/hmg/ddy271.
- 20. Pirruccello J.P., Bick A., Wang M., Chaffin M., Friedman S., Yao J., Guo X., Venkatesh B.A., Taylor K.D., Post W.S., et al. Analysis of cardiac magnetic resonance imaging in 36,000 individuals yields genetic insights into dilated cardiomyopathy. Nat. Commun. 2020;11:2254. doi: 10.1038/s41467-020-15823-7.
- 21. Cheng Y., Schlosser P., Hertel J., Sekula P., Oefner P.J., Spiekerkoetter U., Mielke J., Freitag D.F., Schmidts M., Kronenberg F., et al. Rare genetic variants affecting urine metabolite levels link population variation to inborn errors of metabolism. Nat. Commun. 2021;12:964. doi: 10.1038/s41467-020-20877-8.
- 22. Chen Y., Lu T., Pettersson-Kymmer U., Stewart I.D., Butler-Laporte G., Nakanishi T., Cerani A., Liang K.Y.H., Yoshiji S., Willett J.D.S., et al. Genomic atlas of the plasma metabolome prioritizes metabolites implicated in human diseases. Nat. Genet. 2023;55:44–53. doi: 10.1038/s41588-022-01270-1.
- 23. Suhre K. Genetic associations with ratios between protein levels detect new pQTLs and reveal protein-protein interactions. Cell Genom. 2024;4 doi: 10.1016/j.xgen.2024.100506.
- 24. Chan Y., Salem R.M., Hsu Y.H.H., McMahon G., Pers T.H., Vedantam S., Esko T., Guo M.H., Lim E.T., Franke L., et al. Genome-wide analysis of body proportion classifies height-associated variants by mechanism of action and implicates genes important for skeletal development. Am. J. Hum. Genet. 2015;96:695–708. doi: 10.1016/j.ajhg.2015.02.018.
- 25. Bartell E., Lin K., Tsuo K., Gan W., Vedantam S., Cole J.B., Baronas J.M., Yengo L., Marouli E., Amariuta T., et al. Genetics of skeletal proportions in two different populations. Preprint at bioRxiv. 2023. doi: 10.1101/2023.05.22.541772.
- 26. Kun E., Javan E.M., Smith O., Gulamali F., de la Fuente J., Flynn B.I., Vajrala K., Trutner Z., Jayakumar P., Tucker-Drob E.M., et al. The genetic architecture and evolution of the human skeletal form. Science. 2023;381 doi: 10.1126/science.adf8009.
- 27. Teumer A., Li Y., Ghasemi S., Prins B.P., Wuttke M., Hermle T., Giri A., Sieber K.B., Qiu C., Kirsten H., et al. Genome-wide association meta-analyses and fine-mapping elucidate pathways influencing albuminuria. Nat. Commun. 2019;10:4130. doi: 10.1038/s41467-019-11576-0.
- 28. Alipanahi B., Hormozdiari F., Behsaz B., Cosentino J., McCaw Z.R., Schorsch E., Sculley D., Dorfman E.H., Foster P.J., Peng L.H., et al. Large-scale machine-learning-based phenotyping significantly improves genomic discovery for optic nerve head morphology. Am. J. Hum. Genet. 2021;108:1217–1230. doi: 10.1016/j.ajhg.2021.05.004.
- 29. Han X., Gharahkhani P., Hamel A.R., Ong J.S., Rentería M.E., Mehta P., Dong X., Pasutto F., Hammond C., Young T.L., et al. Large-scale multitrait genome-wide association analyses identify hundreds of glaucoma risk loci. Nat. Genet. 2023;55:1116–1125. doi: 10.1038/s41588-023-01428-5.
- 30. Hansen G.T., Sobreira D.R., Weber Z.T., Thornburg A.G., Aneas I., Zhang L., Sakabe N.J., Joslin A.C., Haddad G.A., Strobel S.M., et al. Genetics of sexually dimorphic adipose distribution in humans. Nat. Genet. 2023;55:461–470. doi: 10.1038/s41588-023-01306-0.
- 31. Kronmal R.A. Spurious correlation and the fallacy of the ratio standard revisited. JRSSA. 1993;156:379–392.
- 32. Khosla T., Lowe C.R. Indices of obesity derived from body weight and height. Br. J. Prev. Soc. Med. 1967;21:122–128. doi: 10.1136/jech.21.3.122.
- 33. Shah N.R., Braverman E.R. Measuring adiposity in patients: the utility of body mass index (BMI), percent body fat, and leptin. PLoS One. 2012;7 doi: 10.1371/journal.pone.0033308.
- 34. Mooldijk S.S., Labrecque J.A., Ikram M.A., Ikram M.K. Ratios in regression analyses with causal questions. Am. J. Epidemiol. 2024;26:kwae162–kwae166. doi: 10.1093/aje/kwae162.
- 35. Aschard H., Vilhjálmsson B.J., Joshi A.D., Price A.L., Kraft P. Adjusting for heritable covariates can bias effect estimates in genome-wide association studies. Am. J. Hum. Genet. 2015;96:329–339. doi: 10.1016/j.ajhg.2014.12.021.
- 36. Day F.R., Loh P.R., Scott R.A., Ong K.K., Perry J.R.B. A robust example of collider bias in a genetic association study. Am. J. Hum. Genet. 2016;98:392–393. doi: 10.1016/j.ajhg.2015.12.019.
- 37. Sollis E., Mosaku A., Abid A., Buniello A., Cerezo M., Gil L., Groza T., Güneş O., Hall P., Hayhurst J., et al. The NHGRI-EBI GWAS Catalog: knowledgebase and deposition resource. Nucleic Acids Res. 2023;51:D977–D985. doi: 10.1093/nar/gkac1010.
- 38. Sudlow C., Gallacher J., Allen N., Beral V., Burton P., Danesh J., Downey P., Elliott P., Green J., Landray M., et al. UK Biobank: an open access resource for identifying the causes of a wide range of complex diseases of middle and old age. PLoS Med. 2015;12 doi: 10.1371/journal.pmed.1001779.
- 39. Bycroft C., Freeman C., Petkova D., Band G., Elliott L.T., Sharp K., Motyer A., Vukcevic D., Delaneau O., O'Connell J., et al. The UK Biobank resource with deep phenotyping and genomic data. Nature. 2018;562:203–209. doi: 10.1038/s41586-018-0579-z.
- 40. Purcell S., Neale B., Todd-Brown K., Thomas L., Ferreira M.A.R., Bender D., Maller J., Sklar P., de Bakker P.I.W., Daly M.J., Sham P.C. PLINK: A tool set for whole-genome association and population-based linkage analyses. Am. J. Hum. Genet. 2007;81:559–575. doi: 10.1086/519795.
- 41. Pe’er I., Yelensky R., Altshuler D., Daly M.J. Estimation of the multiple testing burden for genomewide association studies of nearly all common variants. Genet. Epidemiol. 2008;32:381–385. doi: 10.1002/gepi.20303.
- 42. McCaw Z.R., Lane J.M., Saxena R., Redline S., Lin X. Operating characteristics of the rank-based inverse normal transformation for quantitative trait analysis in genome-wide association studies. Biometrics. 2020;76:1262–1272. doi: 10.1111/biom.13214.
- 43. Bulik-Sullivan B.K., Loh P.R., Finucane H.K., Ripke S., Yang J., Schizophrenia Working Group of the Psychiatric Genomics Consortium, Patterson N., Daly M.J., Price A.L., Neale B.M. LD score regression distinguishes confounding from polygenicity in genome-wide association studies. Nat. Genet. 2015;47:291–295. doi: 10.1038/ng.3211.
- 44. Lu Y., Day F.R., Gustafsson S., Buchkovich M.L., Na J., Bataille V., Cousminer D.L., Dastani Z., Drong A.W., Esko T., et al. New loci for body fat percentage reveal link between adiposity and cardiometabolic disease risk. Nat. Commun. 2016;7 doi: 10.1038/ncomms10495.
- 45. Vujkovic M., Ramdas S., Lorenz K.M., Guo X., Darlay R., Cordell H.J., He J., Gindin Y., Chung C., Myers R.P., et al. A multiancestry genome-wide association study of unexplained chronic ALT elevation as a proxy for nonalcoholic fatty liver disease with histological and radiological validation. Nat. Genet. 2022;54:761–771. doi: 10.1038/s41588-022-01078-z.
- 46. Nelson C.P., Goel A., Butterworth A.S., Kanoni S., Webb T.R., Marouli E., Zeng L., Ntalla I., Lai F.Y., Hopewell J.C., et al. Association analyses based on false discovery rate implicate new loci for coronary artery disease. Nat. Genet. 2017;49:1385–1391. doi: 10.1038/ng.3913.
- 47. Shungin D., Winkler T.W., Croteau-Chonka D.C., Ferreira T., Locke A.E., Mägi R., Strawbridge R.J., Pers T.H., Fischer K., Justice A.E., et al. New genetic loci link adipose and insulin biology to body fat distribution. Nature. 2015;518:187–196. doi: 10.1038/nature14132.
- 48. Graham S.E., Clarke S.L., Wu K.H.H., Kanoni S., Zajac G.J.M., Ramdas S., Surakka I., Ntalla I., Vedantam S., Winkler T.W., et al. The power of genetic diversity in genome-wide association studies of lipids. Nature. 2021;600:675–679. doi: 10.1038/s41586-021-04064-3.
- 49. Mahajan A., Spracklen C.N., Zhang W., Ng M.C.Y., Petty L.E., Kitajima H., Yu G.Z., Rüeger S., Speidel L., Kim Y.J., et al. Multi-ancestry genetic study of type 2 diabetes highlights the power of diverse populations for discovery and translation. Nat. Genet. 2022;54:560–572. doi: 10.1038/s41588-022-01058-3.
- 50. Pearl J. Causal diagrams for empirical research. Biometrika. 1995;82:669–688.
- 51. Loh P.R., Tucker G., Bulik-Sullivan B.K., Vilhjálmsson B.J., Finucane H.K., Salem R.M., Chasman D.I., Ridker P.M., Neale B.M., Berger B., et al. Efficient Bayesian mixed-model analysis increases association power in large cohorts. Nat. Genet. 2015;47:284–290. doi: 10.1038/ng.3190.
- 52. Mbatchou J., Barnard L., Backman J., Marcketta A., Kosmicki J.A., Ziyatdinov A., Benner C., O'Dushlaine C., Barber M., Boutkov B., et al. Computationally efficient whole-genome regression for quantitative and binary traits. Nat. Genet. 2021;53:1097–1103. doi: 10.1038/s41588-021-00870-7.
- 53. Silventoinen K., Lahtinen H., Davey Smith G., Morris T.T., Martikainen P. Height, social position and coronary heart disease incidence: the contribution of genetic and environmental factors. J. Epidemiol. Community Health. 2023;77:384–390. doi: 10.1136/jech-2022-219907.
- 54. Xiao R., Boehnke M. Quantifying and correcting for the winner’s curse in genetic association studies. Genet. Epidemiol. 2009;33:453–462. doi: 10.1002/gepi.20398.
- 55. Ferreira M.A.R., Purcell S.M. A multivariate test of association. Bioinformatics. 2009;25:132–133. doi: 10.1093/bioinformatics/btn563.
- 56. O’Reilly P.F., Hoggart C.J., Pomyen Y., Calboli F.C.F., Elliott P., Jarvelin M.R., Coin L.J.M. MultiPhen: joint model of multiple phenotypes can increase discovery in GWAS. PLoS One. 2012;7 doi: 10.1371/journal.pone.0034861.
- 57. Zhou X., Stephens M. Efficient algorithms for multivariate linear mixed models in genome-wide association studies. Nat. Methods. 2014;11:407–409. doi: 10.1038/nmeth.2848.
- 58. Grotzinger A.D., Rhemtulla M., de Vlaming R., Ritchie S.J., Mallard T.T., Hill W.D., Ip H.F., Marioni R.E., McIntosh A.M., Deary I.J., et al. Genomic structural equation modelling provides insights into the multivariate genetic architecture of complex traits. Nat. Hum. Behav. 2019;3:513–525. doi: 10.1038/s41562-019-0566-x.
- 59. Stergiakouli E., Gaillard R., Tavaré J.M., Balthasar N., Loos R.J., Taal H.R., Evans D.M., Rivadeneira F., St Pourcain B., Uitterlinden A.G., et al. Genome-wide association study of height-adjusted BMI in childhood identifies functional variant in ADCY3. Obesity. 2014;22:2252–2259. doi: 10.1002/oby.20840.
- 60. Yang J., Zaitlen N.A., Goddard M.E., Visscher P.M., Price A.L. Advantages and pitfalls in the application of mixed-model association methods. Nat. Genet. 2014;46:100–106. doi: 10.1038/ng.2876.
- 61. Chen H., Wang C., Conomos M.P., Stilp A.M., Li Z., Sofer T., Szpiro A.A., Chen W., Brehm J.M., Celedón J.C., et al. Control for population structure and relatedness for binary traits in genetic association studies via logistic mixed models. Am. J. Hum. Genet. 2016;98:653–666. doi: 10.1016/j.ajhg.2016.02.012.
- 62. Mahmoud O., Dudbridge F., Davey Smith G., Munafo M., Tilling K. A robust method for collider bias correction in conditional genome-wide association studies. Nat. Commun. 2022;13:619. doi: 10.1038/s41467-022-28119-9.
- 63. Petersen A.K., Krumsiek J., Wägele B., Theis F.J., Wichmann H.E., Gieger C., Suhre K. On the hypothesis-free testing of metabolite ratios in genome-wide and metabolome-wide association studies. BMC Bioinf. 2012;13:120. doi: 10.1186/1471-2105-13-120.
- 64. Suhre K., Gieger C. Genetic variation in metabolic phenotypes: study designs and applications. Nat. Rev. Genet. 2012;13:759–769. doi: 10.1038/nrg3314.
- 65. Suhre K., McCarthy M.I., Schwenk J.M. Genetics meets proteomics: perspectives for large population-based studies. Nat. Rev. Genet. 2021;22:19–37. doi: 10.1038/s41576-020-0268-2.
- 66. Rask-Andersen M., Karlsson T., Ek W.E., Johansson Å. Genome-wide association study of body fat distribution identifies adiposity loci and sex-specific genetic effects. Nat. Commun. 2019;10:339. doi: 10.1038/s41467-018-08000-4.
- 67. Pulit S.L., Stoneman C., Morris A.P., Wood A.R., Glastonbury C.A., Tyrrell J., Yengo L., Ferreira T., Marouli E., Ji Y., et al. Meta-analysis of genome-wide association studies for body fat distribution in 694 649 individuals of European ancestry. Hum. Mol. Genet. 2019;28:166–174. doi: 10.1093/hmg/ddy327.
- 68. Deaton A.M., Dubey A., Ward L.D., Dornbos P., Flannick J., AMP-T2D-GENES Consortium, Yee E., Ticau S., Noetzli L., Parker M.M., et al. Rare loss of function variants in the hepatokine gene INHBE protect from abdominal obesity. Nat. Commun. 2022;13:4319. doi: 10.1038/s41467-022-31757-8.
- 69. Clayton D.G. Prediction and interaction in complex disease genetics: experience in type 1 diabetes. PLoS Genet. 2009;5 doi: 10.1371/journal.pgen.1000540.