Nature Communications. 2024 Jul 10;15:5801. doi: 10.1038/s41467-024-49998-0

Characterising the genetic architecture of changes in adiposity during adulthood using electronic health records

Samvida S Venkatesh 1,2, Habib Ganjgahi 2,3, Duncan S Palmer 2,4, Kayesha Coley 5, Gregorio V Linchangco Jr 6,7, Qin Hui 6,7, Peter Wilson 7,8, Yuk-Lam Ho 9, Kelly Cho 9,10, Kadri Arumäe 11; Million Veteran Program; Estonian Biobank Research Team, Laura B L Wittemans 12,13, Christoffer Nellåker 2,13, Uku Vainik 11,14,15, Yan V Sun 6,7, Chris Holmes 3,16,17, Cecilia M Lindgren 1,2,13,18, George Nicholson 3
PMCID: PMC11237142  PMID: 38987242

Abstract

Obesity is a heritable disease, characterised by excess adiposity that is measured by body mass index (BMI). While over 1,000 genetic loci are associated with BMI, less is known about the genetic contribution to adiposity trajectories over adulthood. We derive adiposity-change phenotypes from 24.5 million primary-care health records in over 740,000 individuals in the UK Biobank, Million Veteran Program USA, and Estonian Biobank, to discover and validate the genetic architecture of adiposity trajectories. Using multiple BMI measurements over time increases power to identify genetic factors affecting baseline BMI by 14%. In the largest reported genome-wide study of adiposity-change in adulthood, we identify novel associations with BMI-change at six independent loci, including rs429358 (APOE missense variant). The SNP-based heritability of BMI-change (1.98%) is 9-fold lower than that of BMI. The modest genetic correlation between BMI-change and BMI (45.2%) indicates that genetic studies of longitudinal trajectories could uncover novel biology of quantitative traits in adulthood.

Subject terms: Genetics research, Genome-wide association studies, Obesity


Here, the authors identify six genetic variants associated with adult weight-change, by leveraging 24.5 million weight records from the UK, USA, and Estonia. These variants influence weight change independently of baseline weight.

Introduction

Obesity, the accumulation of excess body fat1, which is associated with increased disease burden2,3, has a strong genetic component4. The heritability of body mass index (BMI) is estimated to be 40–70%4–6, and genome-wide association studies (GWASs) have implicated over 1000 independent loci associated with a range of obesity traits4. The dynamic process of change in weight over time is also thought to have a genetic component7,8. Recent studies that detect age-specific transient effects through age-stratified GWASs reveal the shifting genetic landscape of infant, childhood, and adolescent BMI9–11. Adult twin studies12–14 and an electronic health record (EHR)-based population study15 indicate that long-term patterns of change in adiposity are heritable and have a distinct genetic component to baseline obesity levels. However, less is known about the specific variants and genes that contribute to patterns of adulthood adiposity change. This paucity of GWASs of long-term trajectories of weight change can be partially attributed to the challenges in building and maintaining large-scale genetics cohorts that follow participants over their lifetime16.

Longitudinal data are a key feature of EHRs, whose increased adoption in the clinic and integration into biobanks has powered cost-efficient and scalable genetics research17,18. Despite biases in EHR data, including sparsity, non-random missingness, data inaccuracies, and informed presence, EHR-based genetics studies reliably replicate results from purpose-built cohorts19–21. Recent advances in the extraction of phenotypes from longitudinal EHRs at scale show that, as expected22,23, the mean of repeat quantitative measurements can outperform cross-sectional phenotypes for genetic discovery24,25. Repeat measurements further allow for the estimation of longitudinal metrics of trait change, such as trajectory-based clusters26, linear slope27, and within-individual variability over time28, all of which may provide additional information to uncover the genetic underpinnings of disease.

A variety of approaches are available for harnessing the longitudinal component of trajectories in EHR data. Simple models target the gradient of a linear fit over time, such as in a longitudinal linear mixed-effects model framework28–30. More complex regression modelling approaches are employed to investigate non-linear changes over time. For example, semi-parametric regression models31 generate flexible longitudinal patterns from combinations of basis functions, such as B-splines, regularised to induce a suitable degree of temporal smoothness32–35. Subgroups of individuals with similar non-linear trajectories are often identified through clustering approaches, with subgroup membership then tested for association with clinical outcomes or genetic variation36–41. Although it is possible to fit full joint models that incorporate both genetic data and longitudinal trajectories simultaneously28, two-stage approaches wherein summary metrics from models of longitudinal EHRs are taken forward for genetic association analyses are popular for their computational efficiency27.
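As an illustration of the basis-function idea described above, the following minimal sketch evaluates a cubic B-spline basis with the Cox-de Boor recursion and forms a trajectory as a weighted sum of basis functions. The knot positions, function names, and coefficients are assumptions chosen for illustration; they are not the basis used in this study, and the regularisation step is not shown.

```python
def bspline_basis(i, degree, knots, x):
    """Value at x of the i-th B-spline basis function (Cox-de Boor recursion)."""
    if degree == 0:
        # Half-open intervals: the right-most knot itself evaluates to 0.
        return 1.0 if knots[i] <= x < knots[i + 1] else 0.0
    value = 0.0
    left_width = knots[i + degree] - knots[i]
    if left_width > 0:  # skip terms over zero-width (repeated) knot spans
        value += (x - knots[i]) / left_width * bspline_basis(i, degree - 1, knots, x)
    right_width = knots[i + degree + 1] - knots[i + 1]
    if right_width > 0:
        value += ((knots[i + degree + 1] - x) / right_width
                  * bspline_basis(i + 1, degree - 1, knots, x))
    return value


def design_row(x, knots, degree=3):
    """All basis function values at time x (one row of a design matrix)."""
    n_basis = len(knots) - degree - 1
    return [bspline_basis(i, degree, knots, x) for i in range(n_basis)]


def trajectory(x, coefs, knots, degree=3):
    """Fitted curve at x: a coefficient-weighted sum of basis functions."""
    return sum(c * b for c, b in zip(coefs, design_row(x, knots, degree)))


# Hypothetical clamped cubic knots over 0 to 10 years of follow-up.
KNOTS = [0, 0, 0, 0, 2.5, 5, 7.5, 10, 10, 10, 10]
row = design_row(4.0, KNOTS)
```

Inside the knot range the basis functions are non-negative and sum to one, so equal coefficients reproduce a flat trajectory; unequal coefficients bend the curve locally.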

In this study, we leveraged longitudinal EHRs linked to the UK Biobank (UKBB)42, Million Veteran Program (MVP)43,44, and Estonian Biobank (EstBB)45 to study the genetic architecture of change in adiposity over adulthood. We developed a two-stage analytical pipeline, utilising statistical methods with a history of application in the EHR data context, to derive linear and non-linear trajectories of BMI and weight over time, and to identify clusters of individuals with similar adiposity trajectories. In the second stage, we carried forward the latent phenotypes from these models, which capture both baseline obesity trait levels and change in obesity traits over time, to perform the largest reported genome-wide association analyses for adiposity change in adulthood. Our results demonstrate the added value of EHR-derived longitudinal phenotypes for genetic discovery.

Results

Longitudinal data help identify novel genetic signals for obesity

We obtained BMI and weight records for up to 177,098 individuals of white–British ancestry with up to 1.48 million measurements in UKBB longitudinal records from general practitioner (GP) and UKBB assessment centre measurements (Table 1 and Supplementary Fig. 3). For each individual, we estimated linear change in BMI or weight over time using a linear mixed-effects (LME) model with random intercepts and random longitudinal gradients (Fig. 1A) within six strata—defined as the pair-wise combinations of two adiposity traits (BMI, weight) with three sex subsets (women-only, men-only, combined sexes). We sought replication of genetic findings in two external cohorts with longitudinal EHR data—MVP (N = 437,703) and EstBB (N = 127,769)—whose demographic and obesity trait characteristics are distinct from UKBB. Individuals in MVP are predominantly male (92.4%) and on average 3.5 units of BMI heavier than male participants in the UKBB; on the other hand, participants in EstBB are of similar BMI to those in the UKBB, but are on average 6–8 years younger than their UKBB counterparts (Supplementary Data 23).
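To make the idea of a per-individual baseline and linear gradient concrete, the sketch below fits an ordinary least-squares line to each person's measurements. This is a deliberately simplified stand-in, not the LME model used here: an LME with random intercepts and gradients additionally shrinks each individual's estimates towards the population average. The record layout, identifiers, and values are hypothetical.

```python
from collections import defaultdict


def per_individual_fits(records):
    """Least-squares intercept and slope per person.

    records: iterable of (person_id, years_since_first_obs, trait_value).
    Returns {person_id: (intercept, slope)}; people with fewer than two
    distinct time points are skipped.
    """
    by_person = defaultdict(list)
    for pid, t, y in records:
        by_person[pid].append((t, y))

    fits = {}
    for pid, obs in by_person.items():
        n = len(obs)
        mean_t = sum(t for t, _ in obs) / n
        mean_y = sum(y for _, y in obs) / n
        sxx = sum((t - mean_t) ** 2 for t, _ in obs)
        if sxx == 0:  # a single time point cannot define a slope
            continue
        slope = sum((t - mean_t) * (y - mean_y) for t, y in obs) / sxx
        fits[pid] = (mean_y - slope * mean_t, slope)
    return fits


# Hypothetical BMI records: (person, years from first measurement, BMI).
toy_records = [
    ("a", 0, 25.0), ("a", 5, 26.0), ("a", 10, 27.0),
    ("b", 0, 30.0), ("b", 4, 29.0), ("b", 8, 28.0),
    ("c", 0, 22.0),
]
fits = per_individual_fits(toy_records)
```

In a two-stage analysis, the intercepts and slopes from a first stage like this would be carried forward as phenotypes for association testing.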

Table 1.

Characterisation of obesity trait data in longitudinal records curated from UK Biobank assessment centre visits and linked general practitioner (GP) records

Trait | Sex | Number of individuals | Number of obs. | Mean number of repeat obs. (SD) | Mean length of follow-up, years (SD) | Mean age at first obs., years (SD) | Median trait value at first obs. (IQR)
BMI, kg/m² | F | 88,243 (54.4%) | 696,984 | 7.90 (7.34) | 13.7 (6.63) | 48.6 (9.68) | 24.6 (22.2, 27.9)
BMI, kg/m² | M | 73,965 (45.6%) | 581,161 | 7.86 (7.12) | 12.8 (6.55) | 50.1 (9.59) | 26.1 (24.0, 28.7)
Weight, kg | F | 96,625 (54.6%) | 816,885 | 8.45 (8.33) | 13.9 (6.62) | 48.3 (9.63) | 65.0 (59.0, 74.0)
Weight, kg | M | 80,473 (45.4%) | 666,258 | 8.28 (7.82) | 12.9 (6.57) | 50.0 (9.57) | 81.6 (73.8, 90.0)

BMI: body mass index; obs.: observation; SD: standard deviation; IQR: inter-quartile range.

Fig. 1. Modelling of longitudinal obesity trait trajectories.


A Weight trajectories over time, measured as years from the first measurement, in a random sample of 12 individuals in the sex-combined stratum. Black points display observed weight records, with blue and pink lines representing predicted fits from linear mixed-effects models and regularised high-dimensional spline models respectively. B Trajectories of cluster centroids, plotted as standardised (std.) and covariate-adjusted (adj.) weight over time (years from first measurement), for the four clusters determined via partitioning-around-medoids (PAM) clustering with a customised distance matrix (see Methods) constructed from the high-dimensional B-spline coefficients estimated in A. C Weight trajectories over time for a random sample of individuals in the 99th percentile probability of belonging to each cluster, as determined by parametric bootstrap. The lines display predicted fits and ribbons represent 95% confidence intervals around the mean fit.

We first investigated whether the individual-level random-intercept terms output by the longitudinal LME model, by sharing information across multiple BMI measurements, provided higher statistical power for GWAS than a single, cross-sectional BMI measurement per individual. Despite our GWAS being 4-fold smaller than the largest published analyses46, we identify 14 novel loci and refine 53 previously described signals for obesity traits among the 374 unique fine-mapped lead single-nucleotide polymorphisms (SNPs) (P < 5 × 10−8) across all strata (Fig. 2A, Supplementary Fig. 13, and Supplementary Data 2; see Methods for the conditional analysis used to classify novel, refined, and reported SNPs47). The 53 refined SNPs are conditionally independent of and represent stronger associations (P < 0.05) than published SNPs in this population. Together, the refined and novel SNPs explain 0.33% of variance in baseline BMI (in addition to the 2.7% explained by previously published SNPs), and 0.83% of variance in baseline weight (in addition to the 4.7% explained by previously reported SNPs) (Fig. 2B). We further quantified the power gained from estimating baseline BMI over repeat longitudinal measurements per individual by comparing genome-wide significant (GWS) SNPs from our baseline BMI GWAS to the largest published BMI meta-analysis to date46. We observe an increase in median chi-squared statistics of GWS SNPs from either study of between 13.4% (females) and 14.8% (males) in our GWAS over what would be expected from a cross-sectional GWAS of equivalent sample size.

Fig. 2. Genome-wide novel and refined SNP associations with baseline obesity estimated over the measurement window for each individual.


A Combined Manhattan plot displaying genome-wide SNP associations estimated using linear mixed-model tests in BOLT-LMM84 with obesity trait (BMI or weight) across female, male, and sex-combined analysis strata. Each point represents an SNP, with genome-wide significant (GWS) SNPs (P < 5 × 10−8) coloured in green for previously published obesity associations, blue for SNPs in LD (r2 > 0.1) with published associations, yellow for refined SNPs that represent conditionally independent (Pconditional < 0.05) and stronger associations with baseline obesity than published SNPs in the region, and pink for novel associations (see Methods47). Novel SNPs are annotated to their nearest gene. B Proportion of variance in baseline BMI and weight that can be explained by the fine-mapped independent lead SNPs in each stratum. In green is the proportion of variance explained by previously published obesity-associated variants (and those in LD with these variants), while that explained by novel and refined variants is in pink. The numbers represent the number of lead SNPs in each of these categories (published/refined and novel).

Nine of the 14 novel SNPs replicate at P < 3.6 × 10−3 (family-wise error rate (FWER) controlled at 5% across 14 tests using the Bonferroni method) in at least one of (1) baseline obesity estimated with LME model intercepts in up to 437,703 individuals in the MVP cohort, (2) baseline obesity estimated with LME model intercepts in up to 125,209 individuals in the EstBB cohort, or (3) UKBB assessment centre measurements of cross-sectional obesity in up to 230,861 individuals not included in the discovery GWAS (Supplementary Data 3). These include rs6769383, whose nearest gene EDEM1 is involved in carbohydrate metabolism48, rs2861761, whose nearest gene TENM2 is enriched in white adipocytes49, rs11156978, whose nearest gene CHD8 is associated with impaired glucose tolerance in mouse knockouts50, and rs7962636, whose nearest gene MED13L is a transcriptional regulator of white adipocyte differentiation51. We also replicate in MVP the male-specific BMI association of rs79586444, whose nearest gene, DUSP26, is associated with decreased high-density lipoprotein (HDL) cholesterol in mouse knockouts52.
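The Bonferroni thresholds quoted in this section follow from dividing the 5% family-wise error rate by the number of tests. A minimal sketch of that arithmetic is given below; the function name is illustrative, and the test counts are those stated in the text (14 novel SNPs; six variants by six traits; 45 biomarkers; 138 lifespan variants).

```python
def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test P value threshold that controls the FWER at alpha."""
    return alpha / n_tests


thresholds = {
    "14 novel SNPs": bonferroni_threshold(14),
    "6 variants x 6 traits": bonferroni_threshold(6 * 6),
    "45 biomarkers": bonferroni_threshold(45),
    "138 lifespan variants": bonferroni_threshold(138),
}

for label, value in thresholds.items():
    print(f"{label}: P < {value:.2e}")
```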

Intra-individual variance is another longitudinal metric of interest; however, we (Supplementary Fig. 15) and others28 find no genetic variants associated with intra-individual variance in weight over time. While the intra-individual mean and baseline trait modelled from LME are highly correlated both phenotypically (R2 > 0.95) and genetically (R2 > 0.99) (Supplementary Fig. 17), the LME intercept appears better powered for genetic association testing than the average trait, as we discover up to 1.2× more GWS variants associated with the former (Supplementary Data 20).

Ascertainment bias in our discovery cohort could arise from the over-representation of heavier participants in EHR data (Supplementary Data 4)53. On average, women with ten or more weight measurements are 8.3 kg (3.7 units of BMI) heavier than their counterparts with 1–3 measurements; for men, this is an 8.2 kg (3.1 units of BMI) difference. However, the BMI-intercept metric from our longitudinal data is genetically perfectly correlated with the un-ascertained cross-sectional BMI in the Genetic Investigation of ANthropometric Traits (GIANT) 2019 meta-analysis46 (rG = 1 and P < 1 × 10−16 in all strata), and 96% of the GWS associations (P < 5 × 10−8) identified in our GWAS have either been reported, or are correlated with reported obesity-associated SNPs in the GWAS Catalog54 (Supplementary Data 1).

APOE variant associated with weight loss over time, independent of baseline obesity

To identify genetic variants that affect change in adiposity over time, we performed GWASs for patterns of BMI and weight change adjusted for baseline measurements, defined in two ways. First, we created a linear phenotype from subject-specific random gradients, estimated within the LME model framework. Second, to capture non-linear patterns of temporal change, we modelled longitudinal variation in obesity traits using a regularised high-dimensional B-spline basis31 (Fig. 1). Within each of the six strata, we identified four clusters of individuals using k-medoids clustering55,56, representing high gain (k1), moderate gain (k2), stable (k3), and loss (k4) trajectories, and estimated each individual’s probability of belonging to a cluster based on their posterior non-linear obesity trait trajectory (Fig. 1 and Supplementary Fig. 5). We performed GWASs on the linear slope-change phenotype and on individuals’ logit-transformed posterior probabilities of membership in the high gain cluster (k1), high and moderate gain clusters (k1 and k2), or all but the loss cluster (k1, k2, and k3). All analyses were adjusted for baseline obesity trait and confounders, including length of follow-up and number of follow-up measures, to mitigate survivor bias.
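The clustering step can be illustrated with a toy sketch. The code below is a simplified alternating k-medoids (assign each point to its nearest medoid, then re-pick each cluster's medoid), not the full PAM swap search, and it uses the absolute difference between hypothetical per-person slope values as the distance rather than the customised distance matrix over B-spline coefficients described in Methods. All names, starting medoids, and data are illustrative assumptions.

```python
def k_medoids(dist, init_medoids, max_iter=100):
    """Alternating k-medoids on a precomputed distance matrix.

    dist: square list-of-lists of pairwise distances.
    init_medoids: starting medoid indices, one per cluster.
    Returns (medoid indices, cluster label for each point).
    """
    n = len(dist)
    medoids = list(init_medoids)

    def assign(meds):
        # Each point joins the cluster of its nearest medoid.
        return [min(range(len(meds)), key=lambda c: dist[i][meds[c]])
                for i in range(n)]

    for _ in range(max_iter):
        labels = assign(medoids)
        new_medoids = []
        for c, old in enumerate(medoids):
            members = [i for i in range(n) if labels[i] == c]
            if not members:  # keep the old medoid if a cluster empties
                new_medoids.append(old)
                continue
            # New medoid: the member with the smallest total distance
            # to the other members of its cluster.
            new_medoids.append(
                min(members, key=lambda m: sum(dist[m][j] for j in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, assign(medoids)


# Hypothetical slopes loosely mimicking high gain, moderate gain,
# stable, and loss trajectories (two people per group).
slopes = [0.9, 1.0, 0.4, 0.5, 0.0, 0.05, -0.6, -0.5]
dist = [[abs(a - b) for b in slopes] for a in slopes]
medoids, labels = k_medoids(dist, init_medoids=[0, 2, 4, 6])
```

Working from a distance matrix rather than raw coordinates is what allows a custom trajectory distance to be substituted without changing the clustering routine.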

A common missense variant in APOE (rs429358) is associated with decrease in both BMI and weight over time, and lower posterior probabilities of gain-cluster membership in all analysis strata (Table 2). Each copy of the minor C allele of rs429358 (minor allele frequency (MAF) = 0.16) is associated with 0.060 standard deviation (SD) decrease (95% confidence interval (CI) = 0.050–0.069, P = 8.6 × 10−35) in expected BMI slope over time and 0.063 SD decrease (0.054–0.072, P = 6.0 × 10−42) in expected weight slope over time (Fig. 3A). Independent of baseline obesity, carriers of the minor C allele of rs429358 are at lower odds of membership in the high-gain BMI and weight clusters (odds ratio (OR) = 0.976, 95% CI = 0.97–0.98, P < 4.9 × 10−19), lowering the membership posterior probability from 40% to 39% on average (Fig. 3B). Although the minor allele of rs429358 is also associated with lower baseline BMI (β = 0.015 SD lower BMI-intercept, 95% CI = 0.0054–0.024) and weight (β = 0.011 SD lower weight intercept, 95% CI = 0.0029–0.020), these associations do not reach genome-wide significance (P > 0.002).
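The shift from 40% to 39% can be checked with a short calculation: convert the probability to log-odds, add the log of the odds ratio, and convert back. The sketch below does only this arithmetic; the function names are illustrative, and it treats the reported OR as acting multiplicatively on the odds of high-gain cluster membership.

```python
import math


def logit(p):
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def inv_logit(x):
    """Probability corresponding to a log-odds value."""
    return 1.0 / (1.0 + math.exp(-x))


def shift_probability(p, odds_ratio):
    """Probability after multiplying the odds of p by odds_ratio."""
    return inv_logit(logit(p) + math.log(odds_ratio))


# Reported OR for rs429358 and the stated average membership probability.
p_after = shift_probability(0.40, 0.976)
print(f"{p_after:.3f}")
```

Starting from 0.40, an OR of 0.976 gives a probability of about 0.394, which rounds to the 39% quoted above.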

Table 2.

Lead SNPs identified from genome-wide association studies (GWAS) of posterior probability of membership in an adiposity-change cluster (high gain k1, high/moderate gain k1/k2, or high/moderate gain and steady k1/k2/k3), independent of baseline obesity

Trait | SNP | chr:pos (hg37) | MAF | Nearest TSS | Adiposity-change cluster | Sex-combined OR (95% CI) | Sex-combined P value | Female OR (95% CI) | Female P value | Male OR (95% CI) | Male P value | Sex-heterogeneity P value
Weight | rs9467663 | 6:26021456 | 0.418 | H4C1 | k1 | 1.01 (1.01–1.01) | 1.6e-09 | 1.01 (1.01–1.02) | 4e-06 | 1.01 (1–1.01) | 0.00081 | 0.785
BMI | chr6:26076446 | 6:26076446 | 0.437 | HFE | k1 | 1.01 (1.01–1.02) | 2.1e-09 | 1.01 (1.01–1.02) | 7.1e-06 | 1.01 (1.01–1.02) | 2.7e-05 | 0.478
BMI | rs11778922* | 8:20493349 | 0.379 | LZTS1 | k1 | 0.99 (0.987–0.994) | 2.1e-06 | 0.984 (0.979–0.99) | 1.3e-08 | 0.998 (0.992–1) | 0.41 | 0.000578
BMI | rs61955499* | 13:112565161 | 0.012 | SOX1 | k1, k2, or k3 | 0.978 (0.961–0.995) | 0.012 | 0.935 (0.913–0.958) | 3.4e-08 | 1 (0.978–1.03) | 0.78 | 4.71e-05
Weight | rs12953815* | 18:2586907 | 0.496 | NDC80 | k1, k2, or k3 | 1 (1–1.01) | 0.02 | 1 (0.995–1) | 0.97 | 1.01 (1.01–1.02) | 1.7e-08 | 2.03e-05
BMI | rs429358 | 19:45411941 | 0.156 | APOE | k1 | 1.02 (1.02–1.03) | 4.9e-19 | 1.03 (1.02–1.03) | 3e-12 | 1.02 (1.01–1.03) | 5.2e-08 | 0.764
BMI | rs429358 | 19:45411941 | 0.156 | APOE | k1 or k2 | 1.02 (1.02–1.03) | 1.1e-17 | 1.02 (1.02–1.03) | 2.6e-11 | 1.02 (1.01–1.03) | 5.4e-08 | 0.689
BMI | rs429358 | 19:45411941 | 0.156 | APOE | k1, k2, or k3 | 1.02 (1.02–1.03) | 2.5e-16 | 1.02 (1.02–1.03) | 9.3e-11 | 1.02 (1.01–1.03) | 2.5e-07 | 0.714
Weight | rs429358 | 19:45411941 | 0.156 | APOE | k1 | 1.02 (1.02–1.03) | 1.8e-20 | 1.02 (1.02–1.03) | 1.5e-11 | 1.02 (1.01–1.03) | 8.9e-08 | 0.807
Weight | rs429358 | 19:45411941 | 0.156 | APOE | k1 or k2 | 1.02 (1.02–1.03) | 2.7e-19 | 1.02 (1.02–1.03) | 6.6e-12 | 1.02 (1.01–1.02) | 6.8e-07 | 0.886
Weight | rs429358 | 19:45411941 | 0.156 | APOE | k1, k2, or k3 | 1.02 (1.02–1.03) | 1.8e-17 | 1.02 (1.02–1.03) | 1e-10 | 1.02 (1.01–1.02) | 4.3e-06 | 0.875

Variants marked with * have 5 × 10−8 > P value > 1 × 10−8. MAF: minor allele frequency (European-ancestry); TSS: transcription start site; SE: standard error; OR: odds ratio; CI: confidence interval.

Fig. 3. Association of minor C allele of rs429358, missense variant in APOE, with various longitudinal phenotypes.


A Mean effect size (beta) and 95% CI for associations of rs429358 with BMI and weight intercepts or linear slope change over time estimated from GWAS in all analysis strata (BMI N = 87,908 females and 73,656 males; weight N = 96,264 females and 80,144 males). B Left: mean OR and 95% CI estimated from GWAS for association of rs429358 with posterior probability of membership in the BMI and weight high-gain clusters (k1). BMI N = 87,908 females and 73,656 males; weight N = 96,264 females and 80,144 males. Right: modelled trajectories of standardised (std.) covariate-adjusted (adj.) BMI in carriers of the different rs429358 genotypes. C Proportion of individuals who self-report weight gain, weight loss, or no change in weight over the past year for carriers of each rs429358 genotype. D Mean effect size and 95% CI for associations of rs429358 with slopes over time of waist circumference (WC) (N = 22,680 females and 21,474 males), WC adjusted for BMI (WCadjBMI) (N = 22,591 females and 21,379 males), waist-to-hip ratio (WHR) (N = 22,677 females and 21,474 males), and WHRadjBMI (N = 22,589 females and 21,379 males), estimated from linear mixed-effects models in individuals held-out of discovery analyses (see Supplementary Data 6 for effect estimates and P values). E Mean effect size and 95% CI for associations of rs429358 with linear slope change in quantitative biomarkers over time, estimated from linear mixed-effects models (N between 52,462–146,098 for different biomarkers, see Supplementary Data 8 for details). Across all panels, estimates of trait change are adjusted for baseline trait values, and P values for significance are controlled at 5% across number of tests performed via the Bonferroni method. n.s. non-significant.

The association of rs429358 with adiposity-change phenotypes was replicated at P < 1.39 × 10−3 (FWER controlled at 5% across six variants and six traits tested) in: (1) up to 437,703 individuals in the MVP cohort, (2) up to 125,209 individuals in the EstBB, and (3) up to 17,035 individuals in UKBB with multiple measurements of weight and BMI at repeat assessment centre visits who were excluded from the discovery analyses (Fig. 4 and Supplementary Data 5). Further, based on 301,943 UKBB participants who were not included in the discovery GWASs, and who reported weight change in the last year as “gain”, “about the same”, or “loss”, we found that each additional copy of the minor C allele of rs429358 is associated with lower odds of being in a higher ordinal weight-gain category (OR = 0.956, 95% CI = 0.94–0.97), independent of BMI (Fig. 3C and Supplementary Data 6). We observe consistent effect direction of the rs429358 association with both estimated and self-reported weight loss over time in individuals who self-identify as Asian (maximum N = 8324 individuals), Black (6796), mixed (2681), white not in the white–British ancestry subset (47,174), and other (3994) ethnicities (see Methods for ancestral group definitions, Supplementary Fig. 1 and Supplementary Data 7).

Fig. 4. Effect sizes of rs429358 on BMI-change phenotypes in discovery (UK Biobank (UKBB)) and replication (Million Veteran Program (MVP) and Estonian Biobank (EstBB)) datasets.


A Mean effect size (beta) and 95% CI for associations of rs429358 with BMI linear slope change over time estimated from linear mixed-effects models (u1) GWAS in all analysis strata (see Supplementary Data 5 for effect estimates and P values). B Mean OR and 95% CI for association of all obesity-change lead variants with posterior probability of membership in the BMI high-gain cluster (k1), high or moderate gain clusters (k1 + k2), or all but loss clusters (k1 + k2 + k3). Across all panels, UKBB N = 162,208, MVP N = 437,703, EstBB N = 127,760; see Supplementary Data 23 for sex-stratified sample sizes. All estimates of trait change are adjusted for baseline trait values, and P values for significance are controlled at 5% across number of tests performed via the Bonferroni method. n.s. non-significant.

Finally, we tested for the effect of rs429358 on change in abdominal adiposity in up to 44,154 individuals of white–British ancestry in UKBB who were not in the discovery set, with repeated assessment centre measurements of waist circumference (WC) and waist-to-hip ratio (WHR). Each copy of the C allele is associated with 0.040 SD decrease (95% CI = 0.021–0.049, P = 2.3 × 10−5) in expected WC slope over time and 0.031 SD decrease (0.012–0.050, P = 1.1 × 10−3) in expected WHR slope over time, independent of baseline values (Fig. 3D and Supplementary Data 6). While the effect direction remains consistent, these associations are no longer significant upon adjustment for BMI (all P > 0.1), suggesting that the observed loss in abdominal adiposity over time may represent a reduction in overall adiposity.

We additionally performed a longitudinal phenome-wide scan to test for the association of rs429358 with changes in 45 quantitative biomarkers obtained from the UKBB-linked primary care records. Each copy of the C allele is associated with an increase in expected slope change over time of total cholesterol (β = 0.030 SD increase, P = 6.4 × 10−12), C-reactive protein (CRP) (β = 0.026, P = 9.6 × 10−7), and HDL cholesterol (β = 0.022, P = 1.0 × 10−5), but a decrease in expected slope change over time of triglycerides (β = −0.027, P = 2.7 × 10−7), potassium (β = −0.023, P = 3.9 × 10−6), lymphocytes (β = −0.020, P = 4.0 × 10−5), and haemoglobin concentration (β = −0.016, P = 1.0 × 10−3) (FWER controlled at 5% across 45 tests via the Bonferroni method) (Fig. 3E and Supplementary Data 8).

The APOE locus is a highly pleiotropic region that is associated with lipid levels57,58, Alzheimer’s disease59,60, and lifespan61,62, among other traits63, both in the UKBB (Supplementary Fig. 14) and elsewhere. Excluding the 242 individuals with diagnoses of dementia or Alzheimer’s disease in our replication datasets did not alter associations of rs429358 with any of the longitudinal obesity traits (Supplementary Fig. 2), indicating that they are unlikely to be driven solely by weight loss that accompanies dementia. Despite the association of rs429358 with lifespan, we found no association between this variant and follow-up metrics in our study (Supplementary Data 22); we also found no significant difference in the effect of this variant on adiposity change from two sets of models: (1) without including age and related covariates, i.e., follow-up metrics and year of birth, and (2) with these covariates (heterogeneity P value Phet > 0.05) (Supplementary Fig. 16). Finally, we observe no associations between 135 of 138 published lifespan-associated genetic variants and our adiposity-change phenotypes at P < 3.6 × 10−4 (FWER controlled at 5% across 138 tests via the Bonferroni method). Of the three SNPs associated with both weight change and lifespan, two (rs429358 and rs7412) are variants in the APOE gene, and rs1085251 is a known obesity association in the FTO locus (Supplementary Data 16).

Genome-wide architecture of change in adiposity over time is distinct from baseline adiposity

We identify six independent genetic loci associated with distinct longitudinal trajectories of obesity traits (Table 2). These include the APOE locus above and five signals in intergenic regions. rs9467663 (OR = 1.011 for membership in the high-gain weight cluster, P = 1.6 × 10−9) and chr6:26076446 (OR = 1.012 for membership in the high-gain BMI cluster, P = 2.1 × 10−9) are reported associations with haematological traits64. We identify two SNPs, rs11778922 and rs61955499, with female-specific effects on BMI change. rs11778922 (OR = 0.984 for membership in the high-gain BMI cluster, P = 1.3 × 10−8, sex-heterogeneity Psexhet = 5.8 × 10−4, see Methods) has previously been nominally associated with BMI in females46, and rs61955499 (OR = 1.070 for membership in the BMI loss cluster, P = 3.4 × 10−8, Psexhet = 4.7 × 10−5) has previously been nominally associated with low-density lipoprotein (LDL) cholesterol levels65. Finally, rs12953815 is associated with male-specific weight change (OR = 1.012 for membership in the weight loss cluster, P = 1.7 × 10−8, Psexhet = 2.0 × 10−5) and has been previously nominally associated with lung function66.

Other than rs429358, none of the lead variants for adiposity change replicated in either MVP or EstBB at P < 1.39 × 10−3 (FWER controlled at 5% across six variants and six traits tested via the Bonferroni method) (Supplementary Data 5). However, we were only sufficiently powered to replicate the effects of three of these in MVP (rs9467663, chr6:26076446, and the male-specific variant rs12953815), and none in EstBB, as replication at 80% power required sample sizes of between 116,000 and 234,000 individuals with repeat measurements of BMI (Supplementary Data 25).

While all lead variants in the discovery GWASs remain significant at P < 5 × 10−7 in GWASs that are not adjusted for follow-up metrics, we discover three variants in the FTO locus that are associated with BMI or weight gain only in analyses that are unadjusted for follow-up metrics (Supplementary Data 21). These associations may reflect genetic contributions to baseline weight rather than weight change, as FTO is among the strongest known loci for obesity, and follow-up metrics are strongly positively correlated with baseline obesity (Supplementary Data 4).

The smaller number of independent GWS associations with adiposity change (six, compared to 374 unique lead SNPs associated with baseline obesity traits) is expected given the 7- to 9-fold lower heritability of adiposity change. The heritability explained by genotyped SNPs (hG2)67 of the posterior probability of belonging to an adiposity-gain cluster is between 1.38% (standard error (SE) = 0.53) in men and 2.82% (0.59) in women, while the hG2 of baseline obesity traits varies between 21.6% (1.09) and 29.0% (1.72) across strata (Fig. 5). Furthermore, we observe that the heritability of BMI and weight trajectories is higher in women than in men (2.89% (0.56) vs 1.05% (0.59) for BMI slopes, Psexhet = 0.012; and 3.42% (0.53) vs 1.69% (0.52) for weight slopes, Psexhet = 9.9 × 10−3). Similarly, we estimate the heritability of BMI slopes in the EstBB to be higher in women (2.15% (0.56) in women vs 1.80% (0.98) in men); however, these values are low and must be interpreted with caution. We do not observe a corresponding difference in the hG2 of baseline BMI or weight between the sexes (Psexhet > 0.1). Finally, baseline and change in obesity traits are genetically correlated, with rG ranging from 0.35 (95% CI = 0.24–0.45) for weight in women to 0.91 (0.59–1.23) for BMI in men (Fig. 5). As expected given their positive correlation, we observe inflation of the χ2 statistics for adiposity-change slope associations amongst lead variants for baseline adiposity (Supplementary Fig. 19). While the genetic correlation between baseline adiposity and adiposity change appears to be higher in men as compared to women, these estimates have wide CIs (overlapping 1) and Psexhet > 0.05 for both BMI and weight.
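A simple way to compare two sex-specific estimates is a z-test on their difference, treating the estimates as independent. The sketch below applies this to the heritability estimates and standard errors quoted above. It is an assumption for illustration only: the test actually used for Psexhet is defined in Methods and is not reproduced here. Under this assumption, the upper-tail probability comes out close to the reported values (about 0.012 for BMI slopes and 0.0099 for weight slopes). Function names are illustrative.

```python
import math


def heterogeneity_z(est_a, se_a, est_b, se_b):
    """z statistic for the difference between two independent estimates."""
    return (est_a - est_b) / math.sqrt(se_a ** 2 + se_b ** 2)


def upper_tail_p(z):
    """One-sided P value: probability that a standard normal exceeds z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


# Heritability (%) and SE in women vs men, as quoted in the text.
z_bmi = heterogeneity_z(2.89, 0.56, 1.05, 0.59)
z_weight = heterogeneity_z(3.42, 0.53, 1.69, 0.52)

p_bmi = upper_tail_p(z_bmi)
p_weight = upper_tail_p(z_weight)
print(f"BMI slopes: z = {z_bmi:.2f}, P = {p_bmi:.4f}")
print(f"Weight slopes: z = {z_weight:.2f}, P = {p_weight:.4f}")
```

A two-sided version of the same test would double these P values, so the choice of tail matters when comparing against a reported threshold.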

Fig. 5. Genotyped SNP-based heritability of, and genetic correlation between, baseline obesity trait and obesity-change phenotypes.


Left column: heritability (hG2) estimate means and 95% CI, calculated using the LDSC software67 on a subset of 1 million HapMap3 SNPs133 for the following traits: baseline BMI and weight, estimated from intercepts of linear mixed-effects models of obesity traits over time (u0), linear slope change in obesity traits over time, adjusted for intercepts (u1 adj. u0), and posterior probability of membership in a high-gain BMI or weight cluster, adjusted for baseline trait value (prob(k1) adj. u0). Right column: genetic correlation (rG) means and 95% CI between obesity-change and baseline obesity phenotypes. In all panels, summary statistics for correlations and heritability are derived from discovery studies with sample sizes for: BMI = 87,908 females and 73,656 males; weight = 96,264 females and 80,144 males. Circles represent BMI, triangles represent weight; points are coloured by analysis strata (pink: female-specific, green: male-specific, grey: sex-combined). P values display the level of significance of heterogeneity between the female- and male-specific estimates in each panel.

Throughout this study, we evaluate both BMI and weight as obesity traits, and expect these to track closely in adults as height does not change significantly over time. In the 161,891 individuals in our discovery strata with multiple measurements of both BMI and weight, there is a strong correlation between the slopes for weight and BMI change (r2 = 0.88) and between the posterior probabilities of membership in the BMI-gain and weight-gain clusters (r2 = 0.73) (Supplementary Data 9, all P < 1 × 10−16). Moreover, the genetic correlation between change in BMI and weight is nearly perfect (rG for slope terms = 0.98, rG for posterior probability of membership in gain cluster = 0.95, all P < 1 × 10−16), indicating that the genetic architecture highlighted here is robust to the metric of adiposity used to define trajectories.

Discussion

In this large-scale EHR- and genetics-based study of longitudinal trajectories of obesity traits, we demonstrate that modelling multiple observations across time increases power to identify genome-wide signals for baseline BMI and weight and enables the discovery of genetic variants associated with changes in adiposity, which are less heritable than and only partially shared with baseline adiposity. Modelling ~1.5 million observations of BMI and weight from >170,000 individuals in the UKBB, enabled us to identify 14 novel, biologically plausible, genetic signals associated with obesity traits. The discovery of these novel loci highlights that repeat measurements can contribute to narrowing the “missing heritability” gap. Leveraging the bespoke longitudinal adiposity phenotypes developed here, we find six genetic loci associated with changes in BMI and weight over time, including a missense variant in APOE that replicates in two external cohorts in the United States and Estonia. While previous studies have investigated the associations of cross-sectional BMI SNPs or obesity polygenic scores with adiposity trajectories15,68, to the best of our knowledge, this study reports the first genome-wide scan of variants associated with obesity trait trajectories over adulthood.

Accounting for the influence of genetic variation on adiposity change may provide opportunities to personalise obesity prevention and treatment69,70. While several studies have investigated the association between BMI-related genetic variants and weight loss guided by medical70, surgical71,72, dietary73, or behavioural70,74–76 interventions, results are inconsistent across studies, intervention types, and genes assessed. Given our evidence that the genetic basis of adiposity change is distinct from baseline levels, we hypothesise that genetic variants associated with longitudinal weight trajectories may be better predictors of long-term weight change following treatment or lifestyle interventions than variants associated with baseline BMI. Moreover, incorporating information on the genetic signals associated with adiposity trajectories will complement current genetics-based strategies to identify genes for pharmaceutical targets77 for obesity treatment.

Previous studies have estimated continuity in the genetic correlation of BMI measured at different ages78, which is theorised to emerge by two possible mechanisms79: (1) common genetic (or environmental) factors are associated with the rates of change in BMI over time, which we test in this study, and (2) that these correlations are induced by time-specific genetic (or environmental) factors in an autoregressive manner, i.e., BMI genetics at time-point t−1 causally affect BMI at time t. Studies testing the latter hypothesis have arrived at opposing conclusions: Gillespie et al.80 find that on a genome-wide scale, age-specific genetic effects in an autoregressive framework do not explain differences in BMI heritability across ages 40–73 years, while Winkler et al.79 did identify 15 genetic loci with differential effects on BMI in younger adults (age <50 years) and older adults (age >50 years). Both studies were pseudo-longitudinal, i.e., the same individuals were not monitored over a period of time, but rather cross-sectional individual data was grouped into age bins. Our work tests a distinct hypothesis and is also, to our knowledge, the first to perform a truly longitudinal genetic study with repeated measures in this age group.

Longitudinal metrics derived from EHR for genetic discovery may be affected by various biases described earlier81. We attempted to mitigate these biases in three ways: (1) While EHR data over-represent sick patients and individuals with higher BMI, UKBB participants are, on average, healthier and have lower BMIs than the population of the UK82. Therefore, our UKBB-linked EHR discovery cohort is more overweight than a random sampling of UKBB, but in contrast, UKBB as a whole is ascertained towards lower BMI individuals than a random sampling of the UK. (2) Appending the more accurate UKBB assessment centre measurements to the EHR data improves overall data quality. (3) Stringent quality control at both the population and individual levels increases the signal-to-noise ratio by filtering out a subset of inaccurate data entries. Although we were powered to replicate four of the six UKBB-identified variants for adiposity-change in the MVP cohort, only one replicated; the lack of signal for other variants may imply these are false positive results. However, it is also important to consider the differences in the demographic and obesity-related characteristics between these cohorts, as participants in the MVP are much more likely to have cardiovascular disease and be overweight44 compared to those in UKBB; and assigning individuals in the former cohort to adiposity trajectory clusters from the latter may distort the phenotypes. Nevertheless, a majority of the baseline adiposity variants in our discovery GWASs as well as the rs429358 variant for adiposity-change replicate across the UKBB, MVP, and EstBB, suggesting that linking EHRs with biobank data may provide a robust framework for genetic discovery.

The two-stage nature of our approach to associate genetic variants with longitudinal trajectories of obesity traits is highly advantageous because of its computational efficiency and convenience. In particular, our method is composable, as the longitudinal analysis of raw data can first be performed separately using a choice of popular, efficient implementations of models; the first-stage outputs can then be taken forward to a GWAS performed in its own bespoke, highly optimised software. The two-stage method approximates the fitting of a full joint model incorporating raw measurement data and genome-wide SNP data. While a full joint model would propagate posterior uncertainty from the longitudinal sub-model through to the GWAS, the approximation here takes forward a single point estimate, i.e. a best linear unbiased predictor (BLUP) or posterior probability of cluster membership, to GWAS. However, in EHR datasets, the number of measurements, and hence estimation precision, can vary across individuals. The propagation of uncertainty between model components, in a similar vein to Markov melding83, has the potential to further improve the quality of genetic discovery. An interesting area for future research will be to allow for the principled propagation of posterior uncertainty in traits through the highly optimised, multi-locus, mixed-model GWAS methods to perform genetic association in the presence of relatedness and population stratification84.

It is also important that the choice of trajectory metric utilised in genetic analysis is phenotype-aware. While the variance within an individual’s trait value over time may capture meaningful biology for biomarkers such as blood pressure or triglycerides, whose fluctuations are associated with disease development and progression85,86, weight is a more stable trait that shows a steady pattern of change over many years87,88. Our adiposity-change metrics, derived from regression models incorporating linear and non-linear temporal trends, are better suited to identify the genetic component of BMI and weight trajectories, and are robust to the manner in which this is defined. For example, despite self-report being an imprecise metric89, lead SNPs from our obesity-change GWASs are also associated with self-reported weight change. However, our results indicate the relative difficulty of identifying genetic associations with longitudinal changes in obesity traits, compared with identifying loci associated with cross-sectional BMI. Variants associated with cross-sectional BMI must have had a causal impact on expected longitudinal BMI at some periods in individuals’ lifespans; i.e. a cross-sectional BMI phenotype captures the cumulative longitudinal effects of each BMI-associated genotype up to the age at which the individual is measured. In contrast, our derived measures of longitudinal change target the rate of change of BMI over a shorter average time period, and the magnitude of the genetic signal thus tends to be smaller in the longitudinal analysis compared to the cross-sectional one. This means that the weaker longitudinal genetic signal can be obscured by the non-genetic contribution from individuals’ short- and long-term environment, whilst the stronger cross-sectional genetic signal may be detected with higher power as the signal-to-noise ratio is larger.
More broadly, several factors affect the relative power to detect longitudinal effects: sample sizes are typically smaller in longitudinal studies; longer and more frequent follow-up increases power; and the particular statistical methods used to estimate cross-sectional versus longitudinal traits affect the accuracy and precision of estimates, and hence the strength of the genetic signal detected.

The SNP rs429358 (missense variant in APOE) is robustly associated with loss in BMI and weight, independent of baseline obesity, across men and women, across three global cohorts of European ancestry. APOE codes for apolipoprotein E, which is a core component of plasma lipoproteins that is essential for cholesterol transport and homoeostasis in several tissues across the body, including the central nervous system, muscle, heart, liver, and adipose tissue90,91. The precise pathway by which this variant affects weight change is difficult to pinpoint, as APOE is a highly pleiotropic locus associated with hundreds of biomarkers and diseases63. Here as well, we find associations between rs429358 and 11 biomarker trajectories. Obesity is cross-sectionally associated with several of these, including levels of triglycerides and cholesterol92,93, markers of chronic inflammation94, and haematological traits95. Some of the effects of rs429358 are discordant with previously reported phenotypic correlations between obesity and these biomarkers, however, the causal longitudinal and pleiotropic nature of these associations remain to be established. As rs429358 is also the strongest genetic risk factor for Alzheimer’s disease59,60, which is preceded by weight loss96, we ensured that our findings were robust to the exclusion of individuals with dementia. As longevity may confound the APOE-weight loss association61,62, we adjusted analyses for the length of follow-up in EHR to mitigate against survivor bias; however, we also present age-unadjusted analyses and demonstrate that other lifespan-associated variants are not associated with adiposity change in our GWASs. We thus hypothesise that the APOE effect on weight loss may act through cholesterol- and lipid-metabolism pathways that partly determine response to dietary and environmental factors, as seen in mouse models97,98. 
Indeed, it has recently been suggested that APOE-mediated cholesterol dysregulation in the brain may influence the onset and severity of Alzheimer’s disease99, suggesting that ageing-associated systemic aberrations in cholesterol homoeostasis could have far-ranging consequences, from weight loss to cognitive decline.

Patterns of weight change in mid-to-late adulthood have been observed to be sex-specific, particularly as women undergo significant changes in weight and body fat distribution around menopause100. Here, we find that the heritability of changes in obesity traits is higher in women than in men, supporting a previous finding that obesity polygenic scores are more strongly associated with weight change trajectories in women than in men68. This is in contrast to baseline obesity, which is equally heritable in men and women, both in our study and as previously reported46. The lower genetic correlation between baseline obesity and obesity-change in women as compared to men, while not statistically significant, may nevertheless indicate sex-differential genome-wide contributions to these phenotypes. We hypothesise that sex hormones could explain some of this sex-specificity, particularly through their role in altering overall obesity and fat distribution around menopause101,102. We were underpowered to study the genome-wide architecture of change in adult WC and WHR (10-fold fewer observations than BMI and weight), whose cross-sectional levels are genetically sex-specific with higher heritability in women46, so more work is needed to disentangle the genetic contribution to changes in adult body fat distribution over time.

While the EHR-linked UKBB cohort has driven genetic discovery for a vast array of human traits in populations of European ancestry103, sample sizes remain under-powered to detect genome-wide associations in other ancestral groups. We were thus limited to replicating European-ancestry associations in other populations, without the ability to discover ancestry-specific variants associated with adult adiposity trajectories. Furthermore, despite the inclusion of >200,000 individuals in the UKBB EHR data, sample sizes remain low to analyse the genetics of longitudinal trajectory metrics, which have lower heritability than the averaged trait value15,104 (~7–9x lower in our study) and are thus more challenging to characterise genetically without corresponding increases in sample size. Another limitation of our study was the exclusion of time-varying covariates, such as medication use, smoking status, and other dietary and environmental covariates from models of adiposity change. It is challenging to extract time-dependent values of these variables from EHRs and difficult to ascertain the direction of causality by which these covariates may be associated with weight change. For example, the use of statins to lower blood pressure may be connected to weight gain, mediated indirectly by change in appetite105, but high blood pressure may itself be a consequence of weight gain106. Inappropriate adjustments along this causal pathway may lead to unexpected collider biases107. In general, despite their longitudinal nature, it is challenging to assign causality to the associations between weight change and covariates or disease diagnoses from EHR observations alone, as there is no prospective study design to follow108. Advances in emulating randomised control trials from longitudinal EHR are beginning to overcome these challenges109,110, and in the future, it will be critical to incorporate information on genetic risk into these simulated studies.

To the best of our knowledge, this is the largest study to date that characterises the genome-wide architecture of adult adiposity trajectories, and the first to identify specific variants that alter BMI and weight in mid- to late-adulthood. We add evidence to support the growing utility of EHRs in genetics research, and particularly highlight opportunities for incorporating longitudinal information to boost power and identify novel associations. In particular, the APOE-associated weight loss identified here contributes to a growing body of evidence on the ageing-associated effects of cholesterol dysregulation. Heterogeneity between men and women in the genome-wide architecture of obesity-change and genetic correlation with baseline obesity highlights the importance of distinguishing between the genetic contributions to mean and lifetime trajectories of phenotypes in sex-specific analyses. In the future, the growing integration of EHR with genetic data in large biobanks will allow us to assess the time-varying associations of rare variants with outsize effects on quantitative traits, as well as to establish genetic and phenotypic relationships among the trajectories of multiple correlated biomarkers across adulthood.

Methods

Identification and quality control of longitudinal obesity records

UK Biobank

This study was conducted using the UKBB resource, which is a prospective UK-based cohort study with approximately 500,000 participants aged 40–69 years at recruitment, on whom a range of medical, environmental, and genetic information has been collected42. Here, we included 409,595 individuals in the white–British ancestry subset identified by Bycroft et al.111 who passed genotype quality control (QC) (see below).

Repeat obesity trait measurements

Obesity-associated traits including BMI and weight were recorded at initial baseline assessment (between 2006 and 2010), as well as at repeat assessments of 20,345 participants (between 2012 and 2013), and at imaging assessments of 52,596 participants (in 2014 and later). We curated a longitudinal research resource by integrating these repeat UKBB assessment centre measurements with the interim release of primary care records provided by GPs for approximately 45% of the UKBB cohort (~230,000 participants, randomly selected)112 (Supplementary Fig. 3). Each individual with at least one BMI record (coded as Clinical Practice Research Datalink (CPRD) code 22K.) or weight record (coded as CPRD code 22A) in the GP data had their respective UKBB assessment centre measurements appended. Following phenotype and genotype QC, this resulted in 162,666 participants of white–British ancestry with multiple BMI measurements and 177,472 participants with multiple weight measurements (Supplementary Fig. 3).

Quality control

We performed both population-level and individual-level longitudinal QC. Participants with codes for history of bariatric surgery (Supplementary Data 10, as identified by Kuan et al.113) were excluded entirely unless the date of surgery could be determined, in which case BMI and weight observations up to that date were retained. Only those measures recorded in adulthood (ages 20–80 years) were retained. We excluded implausible observations, defined as lying more than 10% beyond the UKBB assessment centre minimum or maximum values (BMI <10.9 kg/m2 or >82.1 kg/m2 and weight <27 kg or >217 kg). We further removed any extreme values >5 SDs away from the population mean to exclude possible technical errors. At the individual level, we excluded multiple observations on the same day, which are likely to be recording errors, by only retaining the observation closest to the individual’s median value of the trait across all time points. Finally, we excluded any extreme measurements at the individual level. For individual $i$ with $J_i$ data points represented as (measurement, age) pairs $(y_{i,j}, t_{i,j})$ for $j = 1, \ldots, J_i$, ordered chronologically, i.e., $t_{i,1} < \cdots < t_{i,J_i}$, a “jump” $P_{i,j}$ for $j = 1, \ldots, J_i - 1$ was defined as:

$$P_{i,j} = \log_2\left(\frac{\left|y_{i,j+1} - y_{i,j}\right| / y_{i,j}}{t_{i,j+1} - t_{i,j}}\right) \tag{1}$$

We removed data points associated with extreme jumps (>3 SDs away from the population mean jump, to exclude possible technical errors) by excluding the observation farther from the individual’s median value of the trait across all time points.
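
The individual-level jump filter can be illustrated with a short Python sketch. This is a minimal illustration under stated assumptions, not the pipeline used in the study: it assumes the jump in equation (1) is the log2 of the absolute proportional change per unit time, treats the population mean and SD of jumps as given inputs (the values below are hypothetical), and uses hypothetical helper names (`jumps`, `observations_to_drop`). Same-day duplicates are assumed to have been removed already, so time differences are non-zero.

```python
import math
from statistics import median

def jumps(values, ages):
    """Log2 proportional change per unit time between consecutive observations."""
    out = []
    for j in range(len(values) - 1):
        rel_change = abs(values[j + 1] - values[j]) / values[j]
        dt = ages[j + 1] - ages[j]
        # A zero change has no finite log2; mark it as None so it is never flagged.
        out.append(math.log2(rel_change / dt) if rel_change > 0 else None)
    return out

def observations_to_drop(values, ages, pop_mean, pop_sd, n_sd=3.0):
    """Indices of observations linked to extreme jumps (hypothetical helper)."""
    centre = median(values)
    drop = set()
    for j, p in enumerate(jumps(values, ages)):
        if p is not None and abs(p - pop_mean) > n_sd * pop_sd:
            # Of the two observations forming the jump, drop the one farther from the median.
            farther = j if abs(values[j] - centre) > abs(values[j + 1] - centre) else j + 1
            drop.add(farther)
    return sorted(drop)

weights = [70.0, 71.0, 140.0, 72.0]   # kg; toy data, the third value looks like an entry error
ages = [50.0, 51.0, 51.5, 52.0]       # years
dropped = observations_to_drop(weights, ages, pop_mean=-6.0, pop_sd=1.5)
```

In this toy example both jumps involving the 140 kg record are extreme, and in each case that record is the one farther from the individual's median, so it is the only observation removed.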

BMI and weight validation data

Participants with BMI and weight observations in UKBB assessment centre measurements who were not included in the interim release of the GP data were held out of discovery analyses (Supplementary Fig. 3). This resulted in 245,447 individuals with at least one BMI observation and 230,861 individuals with at least one weight observation for replication of cross-sectional results. For the replication of longitudinal results, a subset of individuals was used comprising 17,006 individuals with multiple observations of BMI, and 17,035 individuals with multiple observations of weight, from repeat assessment centre visits.

Self-reported weight change data

At each UKBB assessment centre visit, participants were asked the question: “Compared with one year ago, has your weight changed?”, reported as “No—weigh about the same”, “Yes—gained weight”, “Yes—lost weight”, “Do not know”, or “Prefer not to answer”. We coded the 1-yr self-reported weight change response at the first assessment centre visit as an ordinal categorical variable with three levels: “loss”, “no change”, and “gain”, excluding individuals who did not respond or responded with “Do not know” or “Prefer not to answer”. We retained 301,943 individuals of white–British ancestry who were not included in any of the discovery analyses.

Abdominal adiposity data

Similar to the BMI and weight validation datasets, we retained the 44,154 participants with multiple WC and hip circumference (HC) records across repeat assessment centre visits who were not included in the interim release of the GP data, and hence held out of discovery analyses. WHR was calculated at each visit by taking the ratio of WC to HC. We further calculated WC adjusted for BMI (WCadjBMI) and WHR adjusted for BMI (WHRadjBMI) values at each visit for which WC, HC, and BMI were recorded simultaneously by taking the residual of WC and WHR in linear regression models with BMI as the sole predictor.
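
The BMI adjustment described above amounts to taking residuals from a one-predictor linear regression. A minimal sketch follows, assuming an intercept is included in the regression; the measurements are invented toy values and `residualise` is a hypothetical helper name.

```python
from statistics import mean

def residualise(y, x):
    """Residuals of y from a simple linear regression on x (with intercept)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    intercept = my - slope * mx
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

# Toy values for one assessment visit (cm for WC and HC, kg/m2 for BMI).
wc = [80.0, 95.0, 102.0, 88.0]
hc = [98.0, 104.0, 108.0, 100.0]
bmi = [22.0, 27.5, 31.0, 24.0]

whr = [w / h for w, h in zip(wc, hc)]   # waist-to-hip ratio
wc_adj_bmi = residualise(wc, bmi)       # WCadjBMI
whr_adj_bmi = residualise(whr, bmi)     # WHRadjBMI
```

By construction the residuals sum to zero and are uncorrelated with BMI, which is what makes them a BMI-independent measure of abdominal adiposity.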

Models to define baseline adiposity and adiposity change traits

Individual $i$ has $J_i$ data points represented as (measurement, age) pairs $(y_{i,j}, t_{i,j})$ for $j = 1, \ldots, J_i$, ordered chronologically, i.e. $t_{i,1} < \cdots < t_{i,J_i}$. The following models are all fitted separately in three strata: female-specific, male-specific, and sex-combined.

Intercept and slope traits for GWAS

We implement a two-stage algorithm to estimate and preprocess local intercept and slopes of obesity traits to be taken forward to GWAS in both discovery and validation datasets.

  1. Fit a random-slope, random-intercept mixed model by maximum likelihood using the lme4114 package in R115. We target two quantities: the baseline value of each individual’s clinical trait (the $\beta_0 + u_{i,0}$ below); and the linearly approximated rate of change in the trait during each individual’s measurement window (the $\beta_1 + u_{i,1}$ below):
    $$y_{i,j} = x_i^T \gamma + (\beta_0 + u_{i,0}) + (\beta_1 + u_{i,1})(t_{i,j} - t_{i,1}) + \varepsilon_{i,j}, \quad u_{i,k} \sim N(0, \sigma^2_{u,k}),\ k = 0, 1, \quad \varepsilon_{i,j} \sim N(0, \sigma^2_\varepsilon), \tag{2}$$
    where individual-specific covariates $x_i$ comprise: baseline age, (baseline age)², data provider, year of birth, and sex. Variance parameters $\sigma^2_{u,k}$ and $\sigma^2_\varepsilon$ are estimated. Fitting model (2) outputs fixed effect model estimates $\hat{\gamma}$, $\hat{\beta}_0$, $\hat{\beta}_1$ and BLUPs of the random effects $\hat{u}_{i,0}$ and $\hat{u}_{i,1}$.
  2. Linearly adjust and transform the outputted BLUPs. We fit and subtract the linear predictor in each of the linear models:
    $$\hat{u}_{i,0} = x_{i,0}^T \gamma_0 + \varepsilon_{i,0} \tag{3}$$
    $$\hat{u}_{i,1} = x_{i,1}^T \gamma_1 + \varepsilon_{i,1} \tag{4}$$
    where the vector of intercept-adjusting covariates $x_{i,0}$ in (3) comprises: baseline age, (baseline age)², sex, year of birth, assessment centre, number of follow-ups, and total length of follow-up (in years). The vector of slope-adjusting covariates $x_{i,1}$ in (4) comprises the same as $x_{i,0}$ but additionally includes the intercept BLUP $\hat{u}_{i,0}$. The coefficient vectors $\gamma_0$ and $\gamma_1$ in (3) and (4) are estimated by least squares and are distinct from the previously estimated $\gamma$ in (2). We finally apply a deterministic rank-based inverse-normal transformation116 to the residuals from fitting models (3) and (4). For example, the intercept trait for individual $i$ taken forward to GWAS is
    $$\tilde{u}_{i,0} = \Phi^{-1}\left(\frac{r(\hat{u}_{i,0} - x_{i,0}^T \hat{\gamma}_0) - c}{N - 2c + 1}\right) \tag{5}$$
    where $r(\hat{u}_{i,0} - x_{i,0}^T \hat{\gamma}_0)$ is the rank of the $i$th residual among all $N$ residuals, the offset $c$ is 0.5, and $\Phi(\cdot)$ is the cumulative distribution function (CDF) of the standard Gaussian distribution.
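
The rank-based inverse-normal transformation in (5) can be sketched in a few lines of Python. This is an illustration only (the study's analyses were run in R); the function name is hypothetical, and ties are broken by input order here, which is an assumption because the text does not say how ties are handled.

```python
from statistics import NormalDist

def inverse_normal_transform(residuals, c=0.5):
    """Map residuals to standard-normal quantiles via their ranks, as in equation (5)."""
    n = len(residuals)
    order = sorted(range(n), key=lambda i: residuals[i])
    ranks = [0] * n
    for rank, idx in enumerate(order, start=1):   # ranks run from 1 to N
        ranks[idx] = rank
    std_normal = NormalDist()                     # mean 0, SD 1
    return [std_normal.inv_cdf((r - c) / (n - 2 * c + 1)) for r in ranks]

transformed = inverse_normal_transform([0.3, -1.2, 2.5])
```

With the offset c = 0.5 the argument of the inverse CDF reduces to (rank − 0.5) / N, so the transformed values are symmetric around zero and never reach ±infinity.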

The distributions of residuals and BLUPs from the LME models are heavy-tailed relative to a Gaussian (Supplementary Figs. 10–12). Such model misspecification could potentially lead to miscalibration of CIs and hypothesis tests based on the standard linear mixed model, although this is likely to be mitigated by the large sample size owing to the central limit theorem. We therefore take forward covariate-adjusted and inverse-normal transformed BLUPs, as described in (5), for genome-wide association testing.

Modelling non-linear trajectories with regularised splines

We model non-linear changes in obesity traits using a regularised B-spline basis of degree 3 (i.e., a cubic spline model) with $n_{df} = 100$ degrees of freedom, incorporating $n_{df} - 4$ (i.e., $n_{df}$ − 3 [degree] − 1 [intercept]) knots that are spaced evenly across each individual’s first $T = 7500$ post-baseline days ≈ 20.5 years. It is common practice in semi-parametric regression to use regularised splines with a relatively large number of knots, thereby allowing functional expressiveness without overfitting31,117. Conditional on the spline coefficients, $b_i$, the likelihood for measurements $y_i$ (individual $i$’s $J_i$-vector of measurements taken at days $t_{i,1}, \ldots, t_{i,J_i}$) is

$$p(y_i \mid b_i, \sigma^2) = \mathrm{MVN}(y_i \mid Z_i X_B b_i, I\sigma^2) \tag{6}$$

where: the $n_{df}$-vector $b_i$ contains the $i$th individual’s spline basis coefficients; $X_B$ is the $(T + 1) \times n_{df}$ matrix of spline basis functions evaluated at days $0, \ldots, T$ post-baseline; and $Z_i$ is a $J_i \times (T + 1)$ matrix whose $j$th row extracts day $t_{i,j} - t_{i,1}$ post-baseline, i.e.,

$$[Z_i]_{j,k} = \begin{cases} 1 & \text{if } k = t_{i,j} - t_{i,1} + 1 \\ 0 & \text{otherwise.} \end{cases}$$

We specify an order-1 autoregressive (AR(1)) model as a smoothing prior on spline coefficients, $b_i$, which vary smoothly around an individual-specific mean value, $\mu_i$. On $\mu_i$ we specify a non-informative prior: $N(\mu_i \mid 0, \sigma_\mu^2)$ with large SD $\sigma_\mu$. The resulting $\mu_i$-marginalised prior for $b_i$ is

$$p(b_i) = \mathrm{MVN}(b_i \mid 0, \Sigma_B), \quad \Sigma_B := \Sigma_{AR(1)} + \sigma_\mu^2 \mathbf{1}, \quad [\Sigma_{AR(1)}]_{k,k'} := \sigma^2_{AR(1)} \phi^{|k - k'|}, \tag{7}$$

where: $\Sigma_{AR(1)}$ is the $n_{df} \times n_{df}$ autocovariance matrix implied by an AR(1) model with lag-1 autocorrelation $\phi \in (0, 1)$ and scale parameter $\sigma^2_{AR(1)} > 0$; and $\mathbf{1}$ is an $n_{df} \times n_{df}$ matrix of ones.

The prior at (7) and likelihood at (6) are a specific case of the Bayes linear model118, for which the posterior is available in closed form:

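$$p(b_i \mid y_i, \Sigma_B, \sigma^2) = \mathrm{MVN}(b_i \mid m_i, \sigma^2 V_i), \quad V_i := \left(X_B^T Z_i^T Z_i X_B + \Sigma_B^{-1}\right)^{-1}, \quad m_i := V_i X_B^T Z_i^T y_i. \tag{8}$$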
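
The prior in (7) and the closed-form posterior in (8) can be illustrated numerically. The sketch below uses NumPy (an assumption; the study itself used R) and follows the expressions exactly as written in (7) and (8). To stay short it replaces the product of the row-selection matrix and the B-spline basis with a small random toy design matrix, so `design`, `y`, and the dimensions are hypothetical stand-ins rather than real spline inputs.

```python
import numpy as np

def ar1_prior_cov(n_df, sigma2_ar1=2.5, phi=0.99, sigma_mu=100.0):
    """Prior covariance: AR(1) autocovariance plus sigma_mu^2 times a matrix of ones."""
    k = np.arange(n_df)
    sigma_ar1 = sigma2_ar1 * phi ** np.abs(k[:, None] - k[None, :])
    return sigma_ar1 + sigma_mu**2 * np.ones((n_df, n_df))

def posterior(y, design, prior_cov):
    """Posterior mean m and scaled covariance V, with design standing in for Z_i X_B."""
    precision = design.T @ design + np.linalg.inv(prior_cov)
    v = np.linalg.inv(precision)
    m = v @ design.T @ y
    return m, v

rng = np.random.default_rng(0)
n_df, n_obs = 5, 8                      # toy sizes; the study uses n_df = 100
design = rng.random((n_obs, n_df))      # toy stand-in for Z_i X_B
y = rng.normal(size=n_obs)              # toy standardised measurements
prior = ar1_prior_cov(n_df)
m, v = posterior(y, design, prior)
```

Because each individual's posterior depends only on that individual's own design matrix and measurements, a function like `posterior` could be mapped over individuals in parallel, which matches the conditional independence noted in the text.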

The posterior at (8) can be evaluated separately and in parallel across individuals because the $(y_i, b_i)$ are conditionally independent across individuals $i$ given the hyperparameters $\sigma^2_{AR(1)}$, $\phi$, $\sigma_\mu$ and $\sigma^2$. Values of hyperparameters in the smoothing prior are chosen subjectively, via visualisation of randomly selected samples of individual data trajectories, to reflect empirical levels of smoothness: $\sigma^2_{AR(1)} := 2.5$, $\phi := 0.99$, $\sigma_\mu := 100$ (Supplementary Fig. 4). We additionally compared cluster allocations for 5000 randomly selected individuals across the following settings of hyperparameters: ($\sigma^2_{AR(1)} := 0.5$, $\phi := 0.9$, $\sigma_\mu := 10$), ($\sigma^2_{AR(1)} := 2.5$, $\phi := 0.99$, $\sigma_\mu := 100$), and ($\sigma^2_{AR(1)} := 10$, $\phi := 0.999$, $\sigma_\mu := 500$) (Supplementary Fig. 8).

For each trait separately, we set $\sigma^2$ to the median of its individual-specific maximum likelihood estimates (MLEs), i.e., $\sigma^2 := \mathrm{median}\left\{\frac{1}{J_i}\lVert y_i - Z_i X_B m_i \rVert_2^2 : i = 1, \ldots, n\right\}$, where each MLE is calculated from (6) after substituting for $b_i$ its maximum a posteriori estimate, $m_i$ from (8) (Supplementary Data 12).

The measurements $y_i$ inputted into the likelihood for the regularised spline model at (6) are pre-processed by taking the standardised residual from the linear model with the following covariates: baseline age, (baseline age)², data provider, year of birth, and sex, i.e. from the model $y_{i,j} = x_i^T \gamma + \varepsilon_{i,j}$ fitted across all $i = 1, \ldots, N$ individuals and $j = 1, \ldots, J_i$ time points. Standardisation of residuals then proceeds by subtracting the mean and dividing by the SD of residuals across all individuals and time points.

We focus on individual i’s posterior change from baseline, i.e. on

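$$\tilde{b}_i := (0,\ b_{i,2} - b_{i,1},\ b_{i,3} - b_{i,1},\ \ldots)^T \tag{9}$$
$$\equiv D b_i \tag{10}$$

where $b_{i,k}$ denotes the $k$th entry of $b_i$, the $j$th row of $D$ is $(e_j - e_1)^T$, and $e_k$ is the $k$th basis vector, i.e. a column $n_{df}$-vector with zeroes everywhere except the $k$th entry, which is one. To calculate the posterior for $\tilde{b}_i$ we linearly transform the posterior at (8) so that

$$p(\tilde{b}_i \mid y_i, \Sigma_B, \sigma^2) = \mathrm{MVN}(\tilde{b}_i \mid D m_i, \sigma^2 D V_i D^T) \tag{11}$$

with $m_i$ and $V_i$ defined at (8).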
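
Because each row of D subtracts the first coefficient, the transformed mean and covariance in (11) can be written out entry by entry without forming D explicitly. The following sketch uses plain Python lists and a hypothetical function name; the inputs are toy numbers, and the multiplication by the residual variance is left to the caller.

```python
def baseline_posterior(m, v):
    """Return D m and D V D^T, where row j of D is (e_j - e_1)^T."""
    n = len(m)
    mean = [m[j] - m[0] for j in range(n)]
    # (e_j - e_1)^T V (e_k - e_1) = V[j][k] - V[j][0] - V[0][k] + V[0][0]
    cov = [[v[j][k] - v[j][0] - v[0][k] + v[0][0] for k in range(n)]
           for j in range(n)]
    return mean, cov

m_toy = [1.0, 1.5, 0.5]
v_toy = [[2.0, 1.0, 0.0],
         [1.0, 2.0, 1.0],
         [0.0, 1.0, 2.0]]
mean_b, cov_b = baseline_posterior(m_toy, v_toy)
```

The first entry of the baselined mean is zero with zero variance, which reflects that every trajectory is expressed as change relative to its own baseline.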

Soft clustering of individuals by non-linear adiposity trajectory patterns

See Supplementary Fig. 5 for an overview of the clustering protocol.

Any two individuals typically have quite distinct measurement profiles, with different numbers of measurements taken at ages which may be quite disparate. Therefore the precision with which we can estimate any particular spline coefficient varies across individuals. To incorporate this heteroscedasticity into our clustering framework, we define the following scaled Euclidean distance between each pair of individuals $(i, i')$ in the space of baselined spline basis coefficients:

$$d(i, i') = \sqrt{\sum_{k=1}^{n_{df}} \frac{\left([D m_i]_k - [D m_{i'}]_k\right)^2}{\left([D V_i D^T]_{k,k} + [D V_{i'} D^T]_{k,k}\right)\sigma^2}} \tag{12}$$

where $m_i$ and $\sigma^2 V_i$ are the posterior mean and covariance of individual $i$’s spline coefficients $b_i$ taken from (8). For each spline coefficient $k$ in (12), the squared difference between the mean coefficients of individuals $i$ and $i'$ is standardised by the sum of the corresponding variances.
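
A direct transcription of the scaled distance in (12) is sketched below. It is illustrative only: the function name and inputs are hypothetical, the square root is an assumption based on the description of the measure as a Euclidean distance, and coefficients whose pooled variance is zero (such as the baseline coefficient, which is zero for everyone) are skipped to avoid dividing zero by zero.

```python
import math

def scaled_distance(mean_a, var_a, mean_b, var_b, sigma2):
    """Variance-scaled distance between two individuals' baselined coefficients."""
    total = 0.0
    for ma, va, mb, vb in zip(mean_a, var_a, mean_b, var_b):
        pooled = (va + vb) * sigma2
        if pooled > 0:   # skip the baseline coefficient (zero difference, zero variance)
            total += (ma - mb) ** 2 / pooled
    return math.sqrt(total)

# Toy inputs: entries of D m_i and diagonal entries of D V_i D^T for two individuals.
d_ab = scaled_distance([0.0, 1.0, 3.0], [0.0, 0.5, 1.0],
                       [0.0, 2.0, 1.0], [0.0, 0.5, 1.0], sigma2=2.0)
```

Differences in coefficients that are estimated imprecisely for either individual contribute less to the distance, so sparsely measured individuals are not pushed apart by noise alone.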

We perform k-medoids clustering using the partitioning around medoids (PAM) algorithm55,56 as implemented in the pam function in the cluster package119 in R115. We train cluster centroids on a randomly selected subset of 80% of individuals in each analysis stratum. We filter individuals in the training set to retain only those with at least $L = 2$ observations. For a fixed number of clusters, $K = 4$, we initialise cluster membership according to bins $B_{1:K}$ demarcated by the $0, \frac{1}{K}, \frac{2}{K}, \ldots, 1$ empirical quantiles of the estimated fold change in obesity trait between baseline and year $M = 2$:

$$B_k := \left[\hat{F}^{-1}\left(\tfrac{k-1}{K}\right),\ \hat{F}^{-1}\left(\tfrac{k}{K}\right)\right], \quad k = 1, \ldots, K,$$
$$\hat{F}(\cdot) := \text{empirical CDF of } \left\{\frac{[X_B D m_i]_{M+1}}{[X_B D m_i]_1} : i = 1, \ldots, N\right\},$$
$$\text{individual } i \text{ in bin } k \iff \frac{[X_B D m_i]_{M+1}}{[X_B D m_i]_1} \in B_k. \tag{13}$$
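
The quantile-based initialisation can be sketched as a rank-based binning. The sketch assumes that each individual's estimated change between baseline and year M has already been computed and is passed in as a plain list; the function name and the toy values are hypothetical.

```python
def quantile_bins(changes, n_clusters=4):
    """Assign each individual to one of K equal-sized bins (1..K) by rank of change."""
    n = len(changes)
    order = sorted(range(n), key=lambda i: changes[i])
    labels = [0] * n
    for rank, idx in enumerate(order):
        # rank * K // n maps ranks 0..n-1 onto bins 0..K-1 of near-equal size
        labels[idx] = rank * n_clusters // n + 1
    return labels

toy_changes = [0.9, 1.3, 1.0, 1.1, 0.8, 1.2, 1.05, 0.95]
initial_bins = quantile_bins(toy_changes)
```

Starting the partitioning from bins that are already ordered by early change gives each run a comparable starting point, rather than relying on random initial medoids.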

To ensure robustness, we run the clustering algorithm S = 10 times, each on a random sub-sample of size 5000 (without replacement). For each clustering output s = 1, …, S, we calculate the point-wise mean of each cluster’s constituent individuals:

$$c_{k,s} := \frac{1}{|C_k^{(s)}|} \sum_{i \in C_k^{(s)}} D m_i \tag{14}$$

For each clustering $s$, we observe all trajectories $c_{1:K,s}$ to be monotonic and non-overlapping (Supplementary Fig. 6). We can therefore define ordered cluster means $c_{(k),s}$,

$$k < k' \implies [c_{(k),s}]_j > [c_{(k'),s}]_j \quad \forall\, j = 1, \ldots, n_{df}, \tag{15}$$

and average the $k$th ordered mean across the $S$ clusterings, where the highest-weight cluster mean is given by $c_{(1)}$ and the lowest by $c_{(K)}$:

$$c_{(k)} := \frac{1}{S} \sum_{s=1}^{S} c_{(k),s}, \tag{16}$$
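
The ordering and averaging steps in (15) and (16) can be sketched as follows. Because the cluster means are reported to be monotonic and non-overlapping, any consistent summary orders them identically; this sketch orders by the sum of coefficients, which is an assumption made for brevity. The function name and toy inputs are hypothetical.

```python
def ordered_mean_trajectories(runs):
    """Order each run's cluster means from highest to lowest, then average across runs.

    runs[s][k] is the mean coefficient vector of cluster k in clustering run s.
    """
    ordered = [sorted(run, key=sum, reverse=True) for run in runs]
    n_runs, n_clusters, n_coef = len(runs), len(runs[0]), len(runs[0][0])
    return [[sum(ordered[s][k][j] for s in range(n_runs)) / n_runs
             for j in range(n_coef)]
            for k in range(n_clusters)]

toy_runs = [
    [[0.0, 1.0, 2.0], [0.0, -1.0, -2.0]],   # run 1: gain cluster listed first
    [[0.0, -1.0, -4.0], [0.0, 3.0, 2.0]],   # run 2: gain cluster listed second
]
centroids = ordered_mean_trajectories(toy_runs)
```

Ordering before averaging matters because cluster labels are arbitrary within each run; without it, a gain cluster from one run could be averaged with a loss cluster from another.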

with corresponding point-wise SEs. We investigate the sensitivity of the resulting clusters to number of clusters K, filter parameter L (minimum number of measurements), and the cluster initialisation parameter M appearing in (13) via silhouette values120, which evaluate the similarity between cluster members (cohesion) vs others (separation) (Supplementary Fig. 6). We test values of K from 2, …, 8, filtering parameter L ∈ (2, 5, 10), and initialisation parameter M ∈ (1, 2, 5, 10) or random initialisation to choose a combination of parameters that produces dense and separable clusters, i.e. K = 4, L = 2, M = 2. We also qualitatively evaluate cluster centroids across all parameter settings (Supplementary Fig. 7). Finally, we compared cluster allocations over each of the 10 random trains for a set of 5000 randomly sampled individuals held out of the training splits (Supplementary Fig. 9).

Once cluster centroids have been calculated, we define individual i’s soft cluster membership probability of belonging to cluster k as the posterior probability of being closest in Euclidean distance to cluster k’s centroid:

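$$\pi_{i,(k)} := \int \mathbb{I}\left[k = \operatorname*{argmin}_{k'} \lVert \tilde{b}_i - c_{(k')} \rVert_2\right] \mathrm{MVN}(\tilde{b}_i \mid D m_i, \sigma^2 D V_i D^T)\, d\tilde{b}_i \tag{17}$$

where the second term in the integrand is the baselined posterior from (11), obtained by transforming (8), and we approximate the integral in (17) using 100 Monte Carlo samples from the posterior.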
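
The Monte Carlo approximation of (17) can be sketched with NumPy (an assumption; the study used R). The function name, the toy posterior mean and covariance, and the toy centroids are all hypothetical; the covariance passed in is assumed to already include the residual variance factor.

```python
import numpy as np

def soft_membership(mean, cov, centroids, n_samples=100, seed=0):
    """Share of posterior draws whose nearest centroid (Euclidean) is each cluster."""
    rng = np.random.default_rng(seed)
    draws = rng.multivariate_normal(mean, cov, size=n_samples)         # (S, n_df)
    dists = np.linalg.norm(draws[:, None, :] - centroids[None, :, :], axis=2)
    nearest = dists.argmin(axis=1)                                     # (S,)
    return np.bincount(nearest, minlength=len(centroids)) / n_samples

toy_mean = np.array([0.0, 1.0, 2.0])
toy_cov = np.array([[0.0, 0.0, 0.0],       # baseline coefficient has no variance
                    [0.0, 0.04, 0.02],
                    [0.0, 0.02, 0.09]])
toy_centroids = np.array([[0.0, 2.0, 4.0],
                          [0.0, 1.0, 2.0],
                          [0.0, 0.0, 0.0],
                          [0.0, -1.0, -2.0]])
probs = soft_membership(toy_mean, toy_cov, toy_centroids)
```

With 100 draws the estimated probabilities are multiples of 0.01, so values of exactly 0 or 1 are common for individuals whose posterior sits well inside one cluster.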

Finally, we validate the clustering by comparing cluster properties of the randomly selected 80% training set used to define cluster centroids, with the held-out 20% validation set. We assign each individual to the cluster for which they have highest membership probability and compare the proportion of individuals assigned to each cluster, as well as distributions of sex, baseline age, number of follow-up measures, and total length of follow-up of individuals assigned to each cluster. These metrics are similar across training and validation sets in all strata (Supplementary Data 13).

Finally, we take forward bounded logit-transformed cumulative cluster probabilities to GWAS. These outputs are defined as bounded logit(πi,(1)), bounded logit(πi,(1) + πi,(2)), and bounded logit(πi,(1) + πi,(2) + πi,(3)), i.e., the bounded log odds of being in the highest (k1), highest two (k1 or k2), and highest three (k1, k2, or k3) weight clusters respectively. To prevent infinite log odds at π ∈ {0, 1} we defined the following bounded logit transform121:

$$\text{bounded logit}(\pi) \equiv \operatorname{logit}\left(\frac{(S-1)\,\pi + 0.5}{S}\right), \quad \pi \in [0, 1], \tag{18}$$

where S = 100, the number of Monte Carlo samples from the posterior in approximating (17).
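As a small illustration of (18) and of the cumulative traits described above, the sketch below computes bounded log odds from a vector of soft membership probabilities. The function names are hypothetical and the input probabilities are made up.

```python
import math

def bounded_logit(p, s=100):
    """Bounded logit of (18): shrink p towards 0.5 before taking log odds,
    so that p = 0 and p = 1 map to finite values."""
    shrunk = ((s - 1) * p + 0.5) / s
    return math.log(shrunk / (1.0 - shrunk))

def cumulative_cluster_traits(pi, s=100):
    """Bounded log odds of belonging to the first 1, 2, ..., K-1 clusters."""
    traits, running = [], 0.0
    for p in pi[:-1]:
        running = min(running + p, 1.0)  # guard against floating-point overshoot
        traits.append(bounded_logit(running, s))
    return traits

# Made-up membership probabilities for K = 4 clusters.
print(cumulative_cluster_traits([0.0, 0.62, 0.38, 0.0]))
```

With S = 100, the transform maps π = 0 to logit(0.005) and π = 1 to logit(0.995), so the three traits taken forward to GWAS are always finite.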

Genome-wide association studies

QC of UK Biobank genotyped and imputed data

Genotyping, initial genotype QC, and imputation on genome build hg19 were performed by UKBB111. We performed post-imputation QC to retain only bi-allelic SNPs with MAF > 0.01, info score > 0.8, missing call rate < 5%, and Hardy-Weinberg equilibrium (HWE) exact test $P > 1 \times 10^{-6}$. We additionally performed sample QC to exclude individuals with sex chromosome aneuploidies, whose self-reported sex did not match inferred genetic sex, with an excess of third-degree relatives in UKBB, identified as heterozygosity or missingness outliers, excluded from autosome phasing or kinship inference, and any other UKBB recommended exclusions111.
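The variant-level thresholds above amount to a simple conjunction of filters. The sketch below encodes them as a predicate; the dictionary keys are hypothetical field names chosen for illustration and do not correspond to any particular file format.

```python
def passes_variant_qc(variant):
    """Return True if a variant meets the post-imputation QC thresholds.
    `variant` is a dict with hypothetical keys describing one variant."""
    return (
        variant["is_snp"]
        and variant["n_alleles"] == 2          # bi-allelic only
        and variant["maf"] > 0.01              # minor allele frequency
        and variant["info"] > 0.8              # imputation info score
        and variant["missing_rate"] < 0.05     # missing call rate
        and variant["hwe_p"] > 1e-6            # HWE exact test P-value
    )

# Made-up example variants.
good = {"is_snp": True, "n_alleles": 2, "maf": 0.12, "info": 0.97,
        "missing_rate": 0.001, "hwe_p": 0.4}
rare = dict(good, maf=0.004)
print(passes_variant_qc(good), passes_variant_qc(rare))
```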

Linear mixed model association analyses for quantitative traits

An overview of the traits carried forward for GWAS is provided in Supplementary Fig. 18. The following association analyses are all performed separately in three strata: female-specific, male-specific, and sex-combined. The intercept and slope traits for GWAS, i.e., $\tilde{u}_{i,0}$ and $\tilde{u}_{i,1}$, were tested for association with genetic variants, adjusted for the first 21 genetic principal components (PCs) and genotyping array, using the BOLT-LMM software84. We also performed GWAS for the inverse-normal transformed within-individual mean adiposity trait, adjusting for the same covariates described for $\tilde{u}_{i,0}$. A similar protocol was followed for the logit-transformed soft clustering probability traits, i.e. $\pi_{i,1}$, $\pi_{i,2}$, and $\pi_{i,3}$, with additional adjustments for baseline trait, baseline age, (baseline age)$^2$, sex, year of birth, assessment centre, number of follow-ups, and total length of follow-up (in years).

Fine-mapping SNP associations

We identified putative causal variants at all GWS loci (defined by merging windows of 1.5 Mb around SNPs with $P < 5 \times 10^{-8}$), using FINEMAP122 to select variants (lead SNPs) with a posterior inclusion probability >95%. Lead SNPs were annotated to the nearest gene transcription start site.
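The locus definition can be illustrated with a standard interval-merging routine. This is a sketch under one stated assumption: the 1.5 Mb window is taken to be centred on each significant SNP (750 kb either side), which the text does not specify. The function name and input layout are hypothetical.

```python
def merge_gws_windows(snps, window_bp=1_500_000, p_threshold=5e-8):
    """Merge overlapping windows around genome-wide significant SNPs.
    `snps` is an iterable of (chromosome, position, p_value) tuples.
    Assumption: the window is centred on the SNP."""
    half = window_bp // 2
    hits = sorted((chrom, pos) for chrom, pos, p in snps if p < p_threshold)
    loci = []
    for chrom, pos in hits:
        start, end = max(0, pos - half), pos + half
        if loci and loci[-1][0] == chrom and start <= loci[-1][2]:
            # Overlaps the previous window on the same chromosome: extend it.
            loci[-1][2] = max(loci[-1][2], end)
        else:
            loci.append([chrom, start, end])
    return [tuple(locus) for locus in loci]

# Made-up summary statistics.
example = [
    (1, 1_000_000, 1e-9), (1, 2_000_000, 1e-10), (1, 10_000_000, 1e-12),
    (2, 1_000_000, 1e-9), (2, 5_000_000, 0.01),
]
print(merge_gws_windows(example))
```

Each merged interval would then be passed to a fine-mapping tool, and variants with posterior inclusion probability above 95% retained as lead SNPs.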

Classifying baseline BMI and weight SNPs as reported, refined, or novel obesity associations

We curated a list of SNPs associated with any of 44 obesity-related traits in the GWAS Catalog54 accessed on 02 Nov 2021, henceforth referred to as published obesity-associated variants (Supplementary Data 1). We then conducted conditional analysis using GCTA-COJO123 for each lead SNP in our GWAS and published obesity-associated variants within 500 kb, classifying variants as reported, refined, or novel based on previously recommended criteria47. Reported SNPs in our study are those whose effects are fully accounted for by published obesity-associated variants within 500 kb. Refined SNPs fulfil all of the following criteria: (1) the refined SNP is correlated (linkage disequilibrium (LD) $r^2 \geq 0.1$) with at least one published obesity-associated variant within 500 kb, (2) the refined SNP has a significantly stronger effect (P < 0.05 in a two-sample t test for difference in mean effect sizes) on the BMI- or weight-intercept trait than published obesity-associated SNPs and also accounts for the effect of published obesity-associated SNPs in conditional analysis (conditional P > 0.05), and (3) published obesity-associated SNPs cannot fully account for the effect of the refined SNP in conditional analysis (conditional P < 0.05). Finally, a SNP in our study was declared novel if it was not in LD with ($r^2 < 0.1$), and conditionally independent of (conditional P < 0.05), all published obesity-associated variants within 500 kb.
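The classification rules can be written as a short decision function. This sketch assumes the LD and conditional-analysis results have already been computed (for example with GCTA-COJO) and are passed in as plain numbers; the argument names, the order in which the rules are checked, and the "unclassified" fallback are illustrative choices, not part of the published criteria.

```python
def classify_lead_snp(max_r2, p_lead_given_published, p_published_given_lead,
                      stronger_effect, p_effect_diff, alpha=0.05, r2_cut=0.1):
    """Classify a lead SNP relative to published variants within 500 kb.

    max_r2                 : highest LD r^2 with any published variant
    p_lead_given_published : conditional P of the lead SNP given published variants
    p_published_given_lead : conditional P of published variants given the lead SNP
    stronger_effect        : True if the lead SNP has the larger effect size
    p_effect_diff          : P from the two-sample t test on effect sizes
    """
    if p_lead_given_published >= alpha:
        return "reported"  # effect fully accounted for by published variants
    if max_r2 < r2_cut:
        return "novel"     # not in LD and conditionally independent
    if stronger_effect and p_effect_diff < alpha and p_published_given_lead > alpha:
        return "refined"   # in LD, stronger, and explains the published signal
    return "unclassified"  # hypothetical fallback for cases the rules do not cover

print(classify_lead_snp(0.02, 1e-9, 1e-4, False, 0.5))   # novel
print(classify_lead_snp(0.60, 1e-6, 0.40, True, 0.01))   # refined
print(classify_lead_snp(0.85, 0.30, 0.20, False, 0.7))   # reported
```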

Replication of GWS associations in UK Biobank hold-out sets

BMI and weight intercept-trait genetic associations

We created cross-sectional obesity phenotypes for the 245,447 individuals in the hold-out set for BMI and 230,861 individuals in the hold-out set for weight (Supplementary Fig. 3) by retaining the observed trait value closest to the individual’s median trait value (if multiple observations present). Deterministic rank-based inverse-normal transformation116 was applied to the residual of the obesity trait adjusted for age, age$^2$, year of birth, data provider, and sex. We then tested this trait for association with genetic variants, adjusted for the first 21 genetic PCs and genotyping array, using the BOLT-LMM software84.

BMI and weight slope-trait genetic associations

We created adiposity slope phenotypes for the 17,006 individuals with multiple observations of BMI and 17,035 individuals with multiple observations of weight from repeat assessment centre visits (Supplementary Fig. 3 and Supplementary Data 19) with BLUPs from LME models as described in the slope-trait modelling section above. We tested for association of this slope-trait with GWS variants associated with adiposity change in our discovery analyses, adjusted for the first 21 genetic PCs and genotyping array, via the linear regression framework implemented in PLINK124. As PLINK does not account for family structure, we compared each pair of second-degree or closer related individuals (kinship coefficient >0.0884)111 and excluded the individual in the pair having higher genotyping missingness. We repeated the same protocol within each self-identified ethnic group of individuals not of white–British ancestry (Supplementary Data 11).
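The relatedness pruning step can be sketched as a single pass over related pairs. The input layout and function name are hypothetical; skipping pairs in which one member has already been excluded, and excluding the second individual when missingness is tied, are illustrative choices that the text does not specify.

```python
def prune_related(pairs, missingness, kinship_cutoff=0.0884):
    """Return the set of individuals to exclude.
    `pairs` holds (id_a, id_b, kinship) tuples; `missingness` maps each id
    to its genotyping missingness rate."""
    excluded = set()
    for id_a, id_b, kinship in pairs:
        if kinship <= kinship_cutoff:
            continue  # more distant than second-degree: keep both
        if id_a in excluded or id_b in excluded:
            continue  # this pair is already broken up (assumption)
        worse = id_a if missingness[id_a] > missingness[id_b] else id_b
        excluded.add(worse)
    return excluded

# Made-up kinship pairs and missingness rates.
pairs = [("A", "B", 0.25), ("B", "C", 0.12), ("D", "E", 0.05)]
missingness = {"A": 0.010, "B": 0.002, "C": 0.004, "D": 0.020, "E": 0.001}
print(sorted(prune_related(pairs, missingness)))
```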

Genetic associations with BMI and weight cluster probabilities

We fit regularised splines as detailed above to the 17,006 individuals with multiple observations of BMI and 17,035 individuals with multiple observations of weight from repeat assessment centre visits (Supplementary Fig. 3). Soft cluster membership probabilities for these individuals were calculated, and the three logit-transformed πi traits were carried forward for association testing with GWS variants associated with adiposity change in our discovery analyses. As above, we pruned out second-degree or closer related individuals and performed association analysis, adjusted for baseline trait, baseline age, (baseline age)2, assessment centre, first 21 genetic PCs and genotyping array, via the linear regression framework implemented in PLINK124. We repeated the same protocol within each self-identified ethnic group of individuals not of white–British ancestry.

Genetic associations with self-reported weight change

We fit proportional odds logistic regression models implemented in the MASS package125 in R115 to estimate the additive effect of lead SNPs on self-reported one-year weight change coded as an ordinal categorical variable with three levels: “loss”, “no change”, and “gain” in 301,943 individuals (described in the data section above). All models were adjusted for BMI, age, sex, year of birth, data provider, assessment centre, first 21 genetic PCs and genotyping array. We repeated the same protocol within each self-identified ethnic group of individuals not of white–British ancestry.

Replication of GWS associations in external cohorts

Quality control, modelling of adiposity change, and GWAS in external cohorts were all performed exactly as in the UKBB discovery analyses, with any exceptions noted below.

Million Veteran Program

The MVP mega-biobank, with ~950,000 participants enrolled to date, is actively recruiting participants from the 6.9 million eligible individuals who make use of the services provided by the Veterans Health Administration (VHA) from around 50 Veterans Affairs (VA) facilities across the United States of America (USA)43. Eligible candidates are registered VHA users who are at least 18 years of age, possess a valid mailing address, and have the ability to provide informed consent. The VA Central Institutional Review Board (IRB) 10-02 protocol gained approval from the VA Central IRB in 2010, and the enrolment of study participants commenced in early 2011. Genetic data for this study were obtained from the custom-genotyped dataset with imputation to the 1000 Genomes project on genome build hg19, and filtered to markers with imputation information score >0.30 and minor allele count >30126. Full characteristics of the MVP cohort43 and associated genetic data126 have been described previously.

Weight, height, and other covariate records were compiled from the MVP Baseline Survey, which collected information on demographics, health status, lifestyle habits, military experience, and physical traits, and supplemented with EHRs. A survey cleaning algorithm was used to process self-reported data, ensuring quality through expert-defined rules, full details of which have been described previously44. Following population-level and individual-level QC of repeat BMI measurements as described above, we retained 404,503 male European-ancestry participants with 20.6 million observations of BMI and 33,200 female European-ancestry participants with 1.94 million observations of BMI.

For each participant, we calculated linear rates of change in BMI over time with the LME models described in (2); we also calculated each individual’s soft cluster membership probability of belonging to clusters whose centroids were defined in the UKBB discovery data (Supplementary Data 24). All analyses were performed in sex-specific and sex-combined strata. Genetic association analysis was performed using REGENIE v2.2.4, software for whole genome regression modelling of large GWASs that accounts for relatedness and population stratification127. All GWASs were adjusted for baseline age, (baseline age)2, the first 10 genetic PCs, and sex (in sex-combined analyses).

Estonian biobank

EstBB is a volunteer-based sample of Estonian residents comprising ~20% of the Estonian adult population (N > 210,000), recruited by medical personnel and through media campaigns. Various health and demographic data have been collected from the participants, both by medical workers and via self-reports, since 2002. The cohort has been described in detail by Leitsalu et al.45. Genetic data for this study was obtained from genotyping with the Illumina global screening array (GSA) microchip, with imputation using a customised reference panel aligned to the hg19 genome, as described previously128.

BMI was available for 193,490 participants. BMI measurements were collected by doctors (through measurements of height and weight) from 2001 to 2023. Population-level and individual-level QC of repeat BMI measurements were performed as described for the UKBB discovery cohort; we additionally excluded individuals with records of use of GLP-1 receptor agonists such as semaglutide (blood glucose-lowering drugs that typically also result in weight loss, drug codes A10BJ*). In total, 82,034 female participants with 281,438 measurements of BMI and 45,735 male participants with 164,166 measurements of BMI were retained. Of these, 125,209 passed genotyping QC.

For each participant, we calculated linear rates of change in BMI over time with the LME model described in (2); we also calculated each individual’s soft cluster membership probability of belonging to clusters whose centroids were defined in the UKBB discovery data (Supplementary Data 24). All analyses were performed in sex-specific and sex-combined strata. Genetic association analysis was performed using REGENIE v3.2 software for whole genome regression modelling127. All GWASs were adjusted for baseline age, (baseline age)2, the first 20 genetic PCs, and sex (in sex-combined analyses).

Power calculations for replication sample sizes

We corrected the observed effect sizes from discovery GWASs for winner’s curse through an implementation first described by Palmer et al.129. Briefly, we solve for the bias using the following maximum likelihood model,

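$$\beta_{\text{obs}} = \beta_{\text{true}} + s\,\frac{\phi\left(\frac{\beta_{\text{true}}}{s} - c\right) - \phi\left(-\frac{\beta_{\text{true}}}{s} - c\right)}{\psi\left(\frac{\beta_{\text{true}}}{s} - c\right) + \psi\left(-\frac{\beta_{\text{true}}}{s} - c\right)} \tag{19}$$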

where $\beta_{\text{obs}}$ is the effect size in the discovery GWAS, $\beta_{\text{true}}$ is the (assumed true) effect size in the source population, and c = 5.33 is the test statistic corresponding to a discovery $\alpha = 5 \times 10^{-8}$. The sample size required to replicate the (assumed true) unbiased effect size is then calculated for nominally significant α = 0.05 and Bonferroni-adjusted for the number of independent variants tested, $M_{\text{var}}$ ($\alpha = 0.05 / M_{\text{var}}$), as follows:

$$\text{power}(\alpha, \text{ncp}) = 1 - \chi^2_1\left((\chi^2_1)^{-1}(1 - \alpha),\ \text{ncp}\right) \tag{20}$$

under the alternative distribution, which is non-central $\chi^2_1$ with non-centrality parameter per variant (ncp) estimated for a normalised trait with variance 1 as:

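$$\text{ncp} \approx N\,\frac{2\beta_{\text{obs}}^2\,\text{AF}(1 - \text{AF})}{1 - 2\beta_{\text{obs}}^2\,\text{AF}(1 - \text{AF})} \tag{21}$$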

where AF is the variant allele frequency.
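The calculations in (19) to (21) can be sketched with the Python standard library alone. Several things here are assumptions rather than details given in the text: s is taken to be the standard error of the discovery estimate, ϕ and ψ the standard normal density and distribution function, (19) is solved for the true effect by bisection, the sample-size search uses a target power of 80%, and the effect size passed to the power functions is whichever estimate the analyst chooses. For one degree of freedom, the non-central χ² tail in (20) is evaluated through the equivalent statement about a shifted standard normal.

```python
import math
from statistics import NormalDist

def _pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def _cdf(x):
    """Standard normal CDF, computed with erfc for accuracy in the lower tail."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def expected_observed_beta(beta_true, se, c=5.33):
    """Right-hand side of (19): expected estimate given that it passed threshold c."""
    z = beta_true / se
    bias = (_pdf(z - c) - _pdf(-z - c)) / (_cdf(z - c) + _cdf(-z - c))
    return beta_true + se * bias

def correct_winners_curse(beta_obs, se, c=5.33, tol=1e-12):
    """Solve (19) for beta_true by bisection on [0, |beta_obs|]."""
    sign = 1.0 if beta_obs >= 0 else -1.0
    target = abs(beta_obs)
    lo, hi = 0.0, target
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if expected_observed_beta(mid, se, c) < target:
            lo = mid
        else:
            hi = mid
    return sign * 0.5 * (lo + hi)

def ncp_per_variant(n, beta, af):
    """Non-centrality parameter of (21) for a trait with variance 1."""
    q = 2.0 * beta ** 2 * af * (1.0 - af)  # variance explained by the variant
    return n * q / (1.0 - q)

def replication_power(alpha, ncp):
    """Power of (20) for a 1-df test: P(|Z + sqrt(ncp)| > z_crit)."""
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # square root of the chi-square quantile
    shift = math.sqrt(ncp)
    return _cdf(shift - z_crit) + _cdf(-shift - z_crit)

def required_sample_size(beta, af, alpha, target_power=0.8):
    """Smallest N reaching the target power (target_power is an assumption)."""
    def power_at(n):
        return replication_power(alpha, ncp_per_variant(n, beta, af))
    hi = 1
    while power_at(hi) < target_power:
        hi *= 2
    lo = hi // 2
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if power_at(mid) >= target_power:
            hi = mid
        else:
            lo = mid
    return hi

# Made-up discovery result: beta = 0.07, SE = 0.01, allele frequency 0.3, 6 variants tested.
beta_corrected = correct_winners_curse(0.07, 0.01)
print(beta_corrected, required_sample_size(beta_corrected, 0.3, alpha=0.05 / 6))
```

Because the corrected effect is smaller than the observed one, the required replication sample size is larger than a naive calculation from the discovery estimate would suggest.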

Power comparison to GIANT 2019 meta-analysis of BMI

We accessed publicly available summary statistics from the GIANT consortium’s meta-analysis of BMI across UKBB and previous GIANT releases in female-specific (max N = 434,793), male-specific (max N = 374,755), and sex-combined strata (max N = 806,834)46. SNPs included in both the GIANT 2019 meta-analysis and our in-house BMI-intercept GWAS that reached GWS in either study were carried forward for power comparisons, resulting in 26,812 (female-specific strata), 22,123 (male-specific strata), and 82,559 (sex-combined strata) SNPs. Per variant, we calculated the $\chi^2$ statistic (as $\beta^2 / \text{SE}^2$) and obtained the ratio of $\chi^2_{\text{in-house}}$ to $\chi^2_{\text{GIANT}}$. The median $\chi^2_{\text{in-house}} / \chi^2_{\text{GIANT}}$ across all GWS SNPs was then compared to the median ratio of sample sizes, i.e. $N_{\text{in-house}} / N_{\text{GIANT}}$, to determine the boost in power over that expected from the sample size difference between the two studies.
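The comparison can be sketched as follows. The dictionary keys and the function name are hypothetical, the input values are made up, and expressing the result as the quotient of the two medians is one illustrative way to summarise the comparison.

```python
from statistics import median

def chi2_stat(beta, se):
    """Per-variant chi-square statistic, beta^2 / SE^2."""
    return (beta / se) ** 2

def power_boost(variants):
    """Median chi-square ratio, median sample-size ratio, and their quotient.
    A quotient above 1 indicates more power than sample size alone predicts."""
    chi2_ratios = [
        chi2_stat(v["beta_inhouse"], v["se_inhouse"])
        / chi2_stat(v["beta_giant"], v["se_giant"])
        for v in variants
    ]
    n_ratios = [v["n_inhouse"] / v["n_giant"] for v in variants]
    med_chi2, med_n = median(chi2_ratios), median(n_ratios)
    return med_chi2, med_n, med_chi2 / med_n

# Made-up summary statistics for three variants.
toy = [
    {"beta_inhouse": 0.020, "se_inhouse": 0.004, "beta_giant": 0.020,
     "se_giant": 0.002, "n_inhouse": 200_000, "n_giant": 800_000},
    {"beta_inhouse": 0.030, "se_inhouse": 0.004, "beta_giant": 0.030,
     "se_giant": 0.0025, "n_inhouse": 200_000, "n_giant": 800_000},
    {"beta_inhouse": 0.015, "se_inhouse": 0.004, "beta_giant": 0.015,
     "se_giant": 0.0015, "n_inhouse": 190_000, "n_giant": 790_000},
]
print(power_boost(toy))
```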

Single-variant analyses

The following analyses were all conducted in female-specific, male-specific, and sex-combined strata.

Abdominal adiposity change traits

Slope changes in WC, WHR, WCadjBMI, and WHRadjBMI for up to 44,154 individuals with repeat observations were calculated using LME models, adjusted and rank-based inverse-normal transformed116 for genetic association testing as described in the slope modelling section above. We estimated the additive association of number of copies of each lead variant minor allele (0, 1, or 2) with slope traits adjusted for the first 21 genetic PCs and genotyping array via linear regression (Supplementary Data 17).

Longitudinal phenome-wide association

We curated a longitudinal research resource for 45 additional quantitative phenotypes in up to 146,099 individuals of white–British ancestry (Supplementary Data 14, as identified by Kuan et al.130) by integrating UKBB assessment centre measurements with the interim release of primary care records provided by GPs, with QC performed as described above for obesity traits. Slope changes in each of these phenotypes were calculated using the LME models described in (2). A deterministic rank-based inverse-normal transformation116, as described in (5), was applied to the slope BLUP $\hat{u}_{i,1}$. The transformed slope-trait was tested for additive association with number of copies of each lead variant minor allele (0, 1, or 2), adjusted for the intercept BLUP $\hat{u}_{i,0}$, baseline age, (baseline age)$^2$, sex, year of birth, number of follow-ups, total length of follow-up (in years), assessment centre, first 21 genetic PCs and genotyping array (Supplementary Data 18).

Identification of individuals with Alzheimer’s or dementia diagnoses

We identified participants with codes for history or diagnosis of dementia in either primary care or hospital in-patient records (Supplementary Data 15, as identified by Kuan et al.113). We performed sensitivity analyses for the replication of rs429358 associations with all obesity-change phenotypes after excluding up to 242 individuals of white–British ancestry with recorded history or diagnosis of dementia.

Identification of lifespan-associated variants

We curated a list of 138 independent variants associated with longevity in the GWAS Catalog54, accessed on 27 March 2023 (Supplementary Data 16). We identified independent SNPs that passed genotyping and imputation QC filters in UKBB by pair-wise pruning variants in LD ($r^2 > 0.1$) within a 1 Mb window. One of the lead variants identified in this study, i.e., rs429358 in the APOE locus, was pruned out in favour of rs4420638, which is 11 kb away from the lead variant and in LD with rs429358 with $r^2 = 0.69$. We looked up the effects of these variants in the various adiposity-change GWAS summary statistics and established significance at $P = 3.60 \times 10^{-4}$ (Bonferroni-corrected at 5% across 138 tests).

SNP heritability and genetic correlations

We estimated the heritability explained by genotyped SNPs ($h^2_G$) and genetic correlations ($r_G$) between obesity-intercept and obesity-change traits, from summary statistics, using LD score regression implemented in the LDSC software67,131, with pre-computed LD-scores based on European-ancestry samples of the 1000 Genomes Project132 restricted to HapMap3 SNPs133. The same protocol was followed to determine $r_G$ between BMI-intercept in our in-house study and BMI in the GIANT 2019 meta-analysis.

Joint modelling of intra-individual mean and variance

Analyses were performed using the TrajGWAS package28 in Julia134, for 177,472 unrelated individuals of white–British ancestry with multiple measurements of weight included in the discovery analyses. Briefly, TrajGWAS analysis is conducted in two stages to test for genetic effects on longitudinal trajectory mean, intra-individual variance, and a joint effect on either mean or variance in an LME model framework28. In the first stage, we fit a null model for weight with fixed effects for the intercept, age, age$^2$, sex, and 21 genetic PCs; we included random effects for the intercept and linear slope of age. In the second stage, we performed score testing with the saddle-point approximation under the full model, i.e. including genome-wide effects for all variants with MAF >1% in the genotyped and imputed UKBB data that passed QC.

Sex-heterogeneity testing

We tested for sex-heterogeneity in the effects of adiposity-change lead SNPs by calculating Z-statistics and corresponding P-values for the difference in female-specific and male-specific effects as:

$$Z_{\text{sex-het}} = \frac{\hat{\beta}_{(F)} - \hat{\beta}_{(M)}}{\sqrt{\text{SE}_{(F)}^2 + \text{SE}_{(M)}^2}} \tag{22}$$

A similar statistic and test were used to determine heterogeneity between $h^2_G$ of all traits in males and females, and $r_G$ between obesity-intercepts and obesity-change traits in males and females.
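The test in (22) is straightforward to compute from sex-specific summary statistics. In this sketch the P-value is taken to be two-sided under a standard normal null, which is an assumption; the function name and the example numbers are made up.

```python
import math

def sex_heterogeneity(beta_f, se_f, beta_m, se_m):
    """Z-statistic of (22) and its two-sided normal P-value (assumed two-sided)."""
    z = (beta_f - beta_m) / math.sqrt(se_f ** 2 + se_m ** 2)
    p = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return z, p

# Made-up female- and male-specific estimates for one SNP.
z, p = sex_heterogeneity(beta_f=0.05, se_f=0.01, beta_m=0.01, se_m=0.01)
print(round(z, 3), p)
```

The same function applies to heritability or genetic correlation estimates by passing those estimates and their standard errors in place of the SNP effects.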

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Acknowledgements

S.S.V. was supported by the Rhodes Scholarships, Clarendon Fund, and the Medical Sciences Doctoral Training Centre at the University of Oxford. K.C. was supported by the University of Leicester (College of Life Sciences) and Health Data Research UK. K.A. was supported by the Estonian Research Council’s Personal Starting Grant PSG759. L.B.L.W. was supported by the Wellcome Trust. U.V. was supported by the Estonian Research Council’s Personal Starting Grant PSG759. C.H. wishes to acknowledge support from the Alan Turing Institute, the EPSRC grant Bayes4Health, Novartis, and Novo Nordisk. C.M.L. is supported by the Li Ka Shing Foundation, NIHR Oxford Biomedical Research Centre, Oxford, NIH (1P50HD104224-01), Gates Foundation (INV-024200), and a Wellcome Trust Investigator Award (221782/Z/20/Z). G.N. acknowledges funding from the NIHR Biomedical Research Centre, Oxford (grant no. NIHR203311). This research has been conducted using the UK Biobank Resource under Application Number 11867. This research was partially supported by the Wellcome Trust Core Award Grant Number 203141/Z/16/Z with additional support from the NIHR Oxford BRC. The views expressed are those of the authors and not necessarily those of the NHS, the NIHR, or the Department of Health. This research is partially supported by funding from the Department of Veterans Affairs Office of Research and Development, Million Veteran Program Grant I01-BX003340 and I01-BX004821. This publication does not represent the views of the Department of Veterans Affairs or the United States Government. This study was partially funded by the European Union through the European Regional Development Fund Project No. 2014-2020.4.01.15-0012 GENTRANSMED. Data analysis was carried out in part in the High-Performance Computing Centre of the University of Tartu. The activities of the EstBB are regulated by the Human Genes Research Act, which was adopted in 2000 specifically for the operations of the EstBB. 
Individual-level data analysis in the EstBB was carried out under ethical approvals of 1.1-12/1409 and 1.1-12/2161 from the Estonian Committee on Bioethics and Human Research (Estonian Ministry of Social Affairs), using data according to release application 6-7/GI/31993 from the EstBB.

Author contributions

S.S.V., G.N., and C.M.L. conceptualised the study. Data curation and formal analyses were conducted by S.S.V., Kayesha C., G.V.L., Q.H., K.A., U.V., and G.N. S.S.V., H.G., and G.N. developed methodology and software. Data collection was performed by P.W., Y.H., and Kelly C. Funding was acquired by U.V., Y.V.S, C.H., and C.M.L. C.H., G.N., and C.M.L. were responsible for supervision. S.S.V. and G.N. wrote the original draft. S.S.V, H.G., D.S.P., L.B.L.W, C.N., C.H., C.M.L., and G.N. edited the draft.

Peer review

Peer review information

Nature Communications thanks Andrea Ganna, Zoltán Kutalik and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. A peer review file is available.

Data availability

The GWAS summary statistics generated in this study have been deposited in the GWAS Catalog54. They can be downloaded from the parent directory: ftp://ftp.ebi.ac.uk/pub/databases/gwas/summary_statistics/GCST90429001-GCST90430000/ using the accession numbers provided in Supplementary Data 26 (ranging from GCST90429765 to GCST90429794).

Code availability

All code required to reproduce analyses is publicly available at: https://github.com/lindgrengroup/longitudinal_primarycare/tree/main/adiposity/scripts/manuscript (ref. 135).

Competing interests

L.B.L.W. is currently employed by Novo Nordisk Research Centre Oxford but, while she conducted the research described in this manuscript, was only affiliated with the University of Oxford. C.H. reports grants from Novo Nordisk and Novartis; C.M.L. reports grants from Bayer AG and Novo Nordisk and has a partner who works at Vertex. The other authors declare no competing interests.

Footnotes

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Lists of authors and their affiliations appear at the end of the paper.

Contributor Information

Samvida S. Venkatesh, Email: samvida@well.ox.ac.uk

Cecilia M. Lindgren, Email: cecilia.lindgren@bdi.ox.ac.uk

George Nicholson, Email: george.nicholson@stats.ox.ac.uk.

Estonian Biobank Research Team:

Andres Metspalu, Lili Milani, Tõnu Esko, Reedik Mägi, Mari Nelis, and Georgi Hudjashov

Supplementary information

The online version contains supplementary material available at 10.1038/s41467-024-49998-0.

References

  • 1.Bluher M. Obesity: global epidemiology and pathogenesis. Nat. Rev. Endocrinol. 2019;15:288–298. doi: 10.1038/s41574-019-0176-8.
  • 2.Collaborators GBDO, et al. Health effects of overweight and obesity in 195 countries over 25 years. N. Engl. J. Med. 2017;377:13–27. doi: 10.1056/NEJMoa1614362.
  • 3.Must A, et al. The disease burden associated with overweight and obesity. JAMA. 1999;282:1523–1529. doi: 10.1001/jama.282.16.1523.
  • 4.Loos RJF, Yeo GSH. The genetics of obesity: from discovery to biology. Nat. Rev. Genet. 2022;23:120–133. doi: 10.1038/s41576-021-00414-z.
  • 5.Maes HH, Neale MC, Eaves LJ. Genetic and environmental factors in relative body weight and human adiposity. Behav. Genet. 1997;27:325–351. doi: 10.1023/A:1025635913927.
  • 6.Elks CE, et al. Variability in the heritability of body mass index: a systematic review and meta-regression. Front. Endocrinol. (Lausanne) 2012;3:29. doi: 10.3389/fendo.2012.00029.
  • 7.Khera AV, et al. Polygenic prediction of weight and obesity trajectories from birth to adulthood. Cell. 2019;177:587–596.e9. doi: 10.1016/j.cell.2019.03.028.
  • 8.Hardy R, et al. Life course variations in the associations between fto and mc4r gene variants and body size. Hum. Mol. Genet. 2010;19:545–552. doi: 10.1093/hmg/ddp504.
  • 9.Silventoinen K, et al. Changing genetic architecture of body mass index from infancy to early adulthood: an individual based pooled analysis of 25 twin cohorts. Int. J. Obes. (Lond.) 2022;46:1901–1909. doi: 10.1038/s41366-022-01202-3.
  • 10.Helgeland O, et al. Characterization of the genetic architecture of infant and early childhood body mass index. Nat. Metab. 2022;4:344–358. doi: 10.1038/s42255-022-00549-1.
  • 11.Couto Alves A, et al. Gwas on longitudinal growth traits reveals different genetic factors influencing infant, child, and adult BMI. Sci. Adv. 2019;5:eaaw3095. doi: 10.1126/sciadv.aaw3095.
  • 12.Hjelmborg J, et al. Genetic influences on growth traits of bmi: a longitudinal study of adult twins. Obesity. 2008;16:847–852. doi: 10.1038/oby.2007.135.
  • 13.Fabsitz RR, Sholinsky P, Carmelli D. Genetic influences on adult weight gain and maximum body mass index in male twins. Am. J. Epidemiol. 1994;140:711–720. doi: 10.1093/oxfordjournals.aje.a117319.
  • 14.Austin MA, et al. Genetic influences on changes in body mass index: a longitudinal analysis of women twins. Obes. Res. 1997;5:326–331. doi: 10.1002/j.1550-8528.1997.tb00559.x.
  • 15.Xu J, et al. Exploring the clinical and genetic associations of adult weight trajectories using electronic health records in a racially diverse biobank: a phenome-wide and polygenic risk study. Lancet Digit Health. 2022;4:e604–e614. doi: 10.1016/S2589-7500(22)00099-1.
  • 16.Shilo S, Rossman H, Segal E. Axes of a revolution: challenges and promises of big data in healthcare. Nat. Med. 2020;26:29–38. doi: 10.1038/s41591-019-0727-5.
  • 17.Wolford BN, Willer CJ, Surakka I. Electronic health records: the next wave of complex disease genetics. Hum. Mol. Genet. 2018;27:R14–R21. doi: 10.1093/hmg/ddy081.
  • 18.Wei WQ, Denny JC. Extracting research-quality phenotypes from electronic health records to support precision medicine. Genome Med. 2015;7:41. doi: 10.1186/s13073-015-0166-y.
  • 19.Gottesman O, et al. The electronic medical records and genomics (emerge) network: past, present, and future. Genet. Med. 2013;15:761–771. doi: 10.1038/gim.2013.72.
  • 20.Monda KL, et al. A meta-analysis identifies new loci associated with body mass index in individuals of african ancestry. Nat. Genet. 2013;45:690–696. doi: 10.1038/ng.2608.
  • 21.Postmus I, et al. Pharmacogenetic meta-analysis of genome-wide association studies of ldl cholesterol response to statins. Nat. Commun. 2014;5:5068. doi: 10.1038/ncomms6068.
  • 22.Chiu YF, Justice AE, Melton PE. Longitudinal analytical approaches to genetic data. BMC Genet. 2016;2:4. doi: 10.1186/s12863-015-0312-y.
  • 23.Fan R, et al. Longitudinal association analysis of quantitative traits. Genet. Epidemiol. 2012;36:856–869. doi: 10.1002/gepi.21673.
  • 24.Furlotte NA, Eskin E, Eyheramendy S. Genome-wide association mapping with longitudinal data. Genet. Epidemiol. 2012;36:463–471. doi: 10.1002/gepi.21640.
  • 25.Goldstein JA, et al. Labwas: novel findings and study design recommendations from a meta-analysis of clinical labs in two independent biobanks. PLoS Genet. 2020;16:e1009077. doi: 10.1371/journal.pgen.1009077.
  • 26.Justice AE, et al. Genome-wide association of trajectories of systolic blood pressure change. BMC Proc. 2016;10:321–327. doi: 10.1186/s12919-016-0050-9.
  • 27.Gauderman WJ, et al. Longitudinal data analysis in pedigree studies. Genet. Epidemiol. 2003;1:S18–28. doi: 10.1002/gepi.10280.
  • 28.Ko S, et al. Gwas of longitudinal trajectories at biobank scale. Am. J. Hum. Genet. 2022;109:433–445. doi: 10.1016/j.ajhg.2022.01.018.
  • 29.Laird NM, Ware JH. Random-effects models for longitudinal data. Biometrics. 1982;38:963–974. doi: 10.2307/2529876.
  • 30.Xu H, et al. High-throughput and efficient multilocus genome-wide association study on longitudinal outcomes. Bioinformatics. 2020;36:3004–3010. doi: 10.1093/bioinformatics/btaa120.
  • 31.Ruppert, D., Wand, M. P. & Carroll, R. J. Semiparametric regression. Cambridge Series in Statistical and Probabilistic Mathematics. https://www.cambridge.org/core/books/semiparametric-regression/02FC9A9435232CA67532B4D31874412C (Cambridge University Press, Cambridge, 2003).
  • 32.Das K, et al. A dynamic model for genome-wide association studies. Hum. Genet. 2011;129:629–639. doi: 10.1007/s00439-011-0960-6.
  • 33.Das K, et al. Dynamic semiparametric Bayesian models for genetic mapping of complex trait with irregular longitudinal data. Stat. Med. 2013;32:509–523. doi: 10.1002/sim.5535.
  • 34.Li Z, Sillanpää MJ. A bayesian nonparametric approach for mapping dynamic quantitative traits. Genetics. 2013;194:997–1016. doi: 10.1534/genetics.113.152736.
  • 35.Li J, Wang Z, Li R, Wu R. Bayesian group lasso for nonparametric varying-coefficient models with application to functional genome-wide association studies. Ann. Appl. Stat. 2015;9:640–664. doi: 10.1214/15-AOAS808.
  • 36.Anh Luong, D. T. & Chandola, V. A K-means approach to clustering disease progressions. In: 2017 IEEE International Conference on Healthcare Informatics (ICHI), 268–274 (2017).
  • 37.Hedman AK, et al. Identification of novel pheno-groups in heart failure with preserved ejection fraction using machine learning. Heart. 2020;106:342–349. doi: 10.1136/heartjnl-2019-315481.
  • 38.Lee, C. & Schaar, M. V. D. Temporal phenotyping using deep predictive clustering of disease progression. In: Proceedings of the 37th International Conference on Machine Learning, 5767–5777 (PMLR, 2020). https://proceedings.mlr.press/v119/lee20h.html. ISSN: 2640-3498.
  • 39.Mullin S, et al. Longitudinal K-means approaches to clustering and analyzing EHR opioid use trajectories for clinical subtypes. J. Biomed. Inform. 2021;122:103889. doi: 10.1016/j.jbi.2021.103889.
  • 40.Lee C, Rashbass J, van der Schaar M. Outcome-oriented deep temporal phenotyping of disease progression. IEEE Trans. Biomed. Eng. 2021;68:2423–2434. doi: 10.1109/TBME.2020.3041815.
  • 41.Carr, O., Javer, A., Rockenschaub, P., Parsons, O. & Durichen, R. Longitudinal patient stratification of electronic health records with flexible adjustment for clinical outcomes. In: Proceedings of Machine Learning for Health, 220–238 (PMLR, 2021). https://proceedings.mlr.press/v158/carr21a.html.
  • 42.Sudlow C, et al. UK Biobank: an open access resource for identifying the causes of a wide range of complex diseases of middle and old age. PLoS Med. 2015;12:e1001779. doi: 10.1371/journal.pmed.1001779.
  • 43.Gaziano JM, et al. Million Veteran Program: a mega-biobank to study genetic influences on health and disease. J. Clin. Epidemiol. 2016;70:214–223. doi: 10.1016/j.jclinepi.2015.09.016.
  • 44.Nguyen XT, et al. Baseline characterization and annual trends of body mass index for a mega-biobank cohort of US veterans 2011-2017. J. Health Res. Rev. Dev. Ctries. 2018;5:98–107. doi: 10.4103/jhrr.jhrr_10_18.
  • 45.Leitsalu L, et al. Cohort profile: Estonian Biobank of the Estonian Genome Center, University of Tartu. Int. J. Epidemiol. 2015;44:1137–1147. doi: 10.1093/ije/dyt268.
  • 46.Pulit SL, et al. Meta-analysis of genome-wide association studies for body fat distribution in 694 649 individuals of European ancestry. Hum. Mol. Genet. 2019;28:166–174. doi: 10.1093/hmg/ddy327.
  • 47.Benonisdottir S, et al. Epigenetic and genetic components of height regulation. Nat. Commun. 2016;7:13490. doi: 10.1038/ncomms13490.
  • 48.Shenkman M, et al. Mannosidase activity of EDEM1 and EDEM2 depends on an unfolded state of their glycoprotein substrates. Commun. Biol. 2018;1:172. doi: 10.1038/s42003-018-0174-8.
  • 49.Tews D, et al. Teneurin-2 (TENM2) deficiency induces UCP1 expression in differentiating human fat cells. Mol. Cell. Endocrinol. 2017;443:106–113. doi: 10.1016/j.mce.2017.01.015.
  • 50.Jung H, et al. Sexually dimorphic behavior, neuronal activity, and gene expression in Chd8-mutant mice. Nat. Neurosci. 2018;21:1218–1228. doi: 10.1038/s41593-018-0208-z.
  • 51.Mo D, et al. Transcriptome landscape of porcine intramuscular adipocytes during differentiation. J. Agric. Food Chem. 2017;65:6317–6328. doi: 10.1021/acs.jafc.7b02039.
  • 52.Groza T, et al. The International Mouse Phenotyping Consortium: comprehensive knockout phenotyping underpinning the study of human disease. Nucleic Acids Res. 2023;51:D1038–D1045. doi: 10.1093/nar/gkac972.
  • 53.Pirastu N, et al. Genetic analyses identify widespread sex-differential participation bias. Nat. Genet. 2021;53:663–671. doi: 10.1038/s41588-021-00846-7.
  • 54.Welter D, et al. The NHGRI GWAS Catalog, a curated resource of SNP-trait associations. Nucleic Acids Res. 2014;42:D1001–D1006. doi: 10.1093/nar/gkt1229.
  • 55.Reynolds AP, Richards G, de la Iglesia B, Rayward-Smith VJ. Clustering rules: a comparison of partitioning and hierarchical clustering algorithms. J. Math. Model. Algorithms. 2006;5:475–504. doi: 10.1007/s10852-005-9022-1.
  • 56.Schubert, E. & Rousseeuw, P. J. Faster k-medoids clustering: improving the PAM, CLARA, and CLARANS algorithms. In: Amato, G., Gennaro, C., Oria, V. & Radovanović, M. (eds.) Similarity Search and Applications, Lecture Notes in Computer Science, 171–187 (Springer International Publishing, Cham, 2019).
  • 57.Surakka I, et al. The impact of low-frequency and rare variants on lipid levels. Nat. Genet. 2015;47:589–597. doi: 10.1038/ng.3300.
  • 58.Hoffmann TJ, et al. A large electronic-health-record-based genome-wide study of serum lipids. Nat. Genet. 2018;50:401–413. doi: 10.1038/s41588-018-0064-5.
  • 59.Shen L, et al. Whole genome association study of brain-wide imaging phenotypes for identifying quantitative trait loci in MCI and AD: a study of the ADNI cohort. Neuroimage. 2010;53:1051–1063. doi: 10.1016/j.neuroimage.2010.01.042.
  • 60.Nazarian A, Yashin AI, Kulminski AM. Genome-wide analysis of genetic predisposition to Alzheimer’s disease and related sex disparities. Alzheimers Res. Ther. 2019;11:5. doi: 10.1186/s13195-018-0458-8.
  • 61.Joshi PK, et al. Variants near CHRNA3/5 and APOE have age- and sex-related effects on human lifespan. Nat. Commun. 2016;7:11174. doi: 10.1038/ncomms11174.
  • 62.Pilling LC, et al. Human longevity: 25 genetic loci associated in 389,166 UK Biobank participants. Aging (Albany NY). 2017;9:2504–2520. doi: 10.18632/aging.101334.
  • 63.Lumsden AL, Mulugeta A, Zhou A, Hypponen E. Apolipoprotein E (APOE) genotype-associated disease risks: a phenome-wide, registry-based, case-control study utilising the UK Biobank. EBioMedicine. 2020;59:102954. doi: 10.1016/j.ebiom.2020.102954.
  • 64.Astle WJ, et al. The allelic landscape of human blood cell trait variation and links to common complex disease. Cell. 2016;167:1415–1429.e19. doi: 10.1016/j.cell.2016.10.042.
  • 65.Kettunen J, et al. Genome-wide study for circulating metabolites identifies 62 loci and reveals novel systemic effects of LPA. Nat. Commun. 2016;7:11122. doi: 10.1038/ncomms11122.
  • 66.Shrine N, et al. New genetic signals for lung function highlight pathways and chronic obstructive pulmonary disease associations across multiple ancestries. Nat. Genet. 2019;51:481–493. doi: 10.1038/s41588-018-0321-7.
  • 67.Bulik-Sullivan BK, et al. LD Score regression distinguishes confounding from polygenicity in genome-wide association studies. Nat. Genet. 2015;47:291–295. doi: 10.1038/ng.3211.
  • 68.Song M, et al. Associations between genetic variants associated with body mass index and trajectories of body fatness across the life course: a longitudinal analysis. Int. J. Epidemiol. 2018;47:506–515. doi: 10.1093/ije/dyx255.
  • 69.Bray MS, et al. NIH working group report-using genomic information to guide weight management: from universal to precision treatment. Obes. (Silver Spring). 2016;24:14–22. doi: 10.1002/oby.21381.
  • 70.Delahanty LM, et al. Genetic predictors of weight loss and weight regain after intensive lifestyle modification, metformin treatment, or standard care in the Diabetes Prevention Program. Diabetes Care. 2012;35:363–366. doi: 10.2337/dc11-1328.
  • 71.Liou TH, et al. ESR1, FTO, and UCP2 genes interact with bariatric surgery affecting weight loss and glycemic control in severely obese patients. Obes. Surg. 2011;21:1758–1765. doi: 10.1007/s11695-011-0457-3.
  • 72.Sarzynski MA, et al. Associations of markers in 11 obesity candidate genes with maximal weight loss and weight regain in the SOS bariatric surgery cases. Int. J. Obes. 2011;35:676–683. doi: 10.1038/ijo.2010.166.
  • 73.Zhang X, et al. FTO genotype and 2-year change in body composition and fat distribution in response to weight-loss diets: the POUNDS LOST trial. Diabetes. 2012;61:3005–3011. doi: 10.2337/db11-1799.
  • 74.Papandonatos GD, et al. Genetic predisposition to weight loss and regain with lifestyle intervention: analyses from the Diabetes Prevention Program and the Look AHEAD randomized controlled trials. Diabetes. 2015;64:4312–4321. doi: 10.2337/db15-0441.
  • 75.McCaffery JM, et al. Genetic predictors of change in waist circumference and waist-to-hip ratio with lifestyle intervention: the Trans-NIH Consortium for Genetics of Weight Loss Response to Lifestyle Intervention. Diabetes. 2022;71:669–676. doi: 10.2337/db21-0741.
  • 76.Holzapfel, C. et al. Association between single nucleotide polymorphisms and weight reduction in behavioural interventions-a pooled analysis. Nutrients 13, 819 (2021).
  • 77.Nelson MR, et al. The support of human genetic evidence for approved drug indications. Nat. Genet. 2015;47:856–860. doi: 10.1038/ng.3314.
  • 78.Silventoinen K, Kaprio J. Genetics of tracking of body mass index from birth to late middle age: evidence from twin and family studies. Obes. Facts. 2009;2:196–202. doi: 10.1159/000219675.
  • 79.Winkler TW, et al. The influence of age and sex on genetic associations with adult body size and shape: a large-scale genome-wide interaction study. PLOS Genet. 2015;11:e1005378. doi: 10.1371/journal.pgen.1005378.
  • 80.Gillespie NA, et al. Determining the stability of genome-wide factors in BMI between ages 40 to 69 years. PLOS Genet. 2022;18:e1010303. doi: 10.1371/journal.pgen.1010303.
  • 81.Beesley, L. J., Fritsche, L. G. & Mukherjee, B. A modeling framework for exploring sampling and observation process biases in genome and phenome-wide association studies using electronic health records. bioRxiv. https://www.biorxiv.org/content/early/2019/05/14/499392 (2019).
  • 82.Fry A, et al. Comparison of sociodemographic and health-related characteristics of UK Biobank participants with those of the general population. Am. J. Epidemiol. 2017;186:1026–1034. doi: 10.1093/aje/kwx246.
  • 83.Goudie RJB, Presanis AM, Lunn D, Angelis DD, Wernisch L. Joining and splitting models with Markov melding. Bayesian Anal. 2019;14:81–109. doi: 10.1214/18-BA1104.
  • 84.Loh PR, et al. Efficient Bayesian mixed-model analysis increases association power in large cohorts. Nat. Genet. 2015;47:284–290. doi: 10.1038/ng.3190.
  • 85.Li H, et al. Triglyceride-glucose index variability and incident cardiovascular disease: a prospective cohort study. Cardiovasc. Diabetol. 2022;21:105. doi: 10.1186/s12933-022-01541-5.
  • 86.Nuyujukian DS, et al. Blood pressure variability and risk of heart failure in ACCORD and the VADT. Diabetes Care. 2020;43:1471–1478. doi: 10.2337/dc19-2540.
  • 87.Speakman JR, et al. Set points, settling points and some alternative models: theoretical options to understand how genes and environments combine to regulate body adiposity. Dis. Model. Mech. 2011;4:733–745. doi: 10.1242/dmm.008698.
  • 88.Muller, M. J., Geisler, C., Heymsfield, S. B. & Bosy-Westphal, A. Recent advances in understanding body weight homeostasis in humans. F1000Res 7, F1000 (2018).
  • 89.Nawaz H, Chan W, Abdulrahman M, Larson D, Katz DL. Self-reported weight and height: implications for obesity research. Am. J. Prev. Med. 2001;20:294–298. doi: 10.1016/S0749-3797(01)00293-8.
  • 90.Kowal RC, Herz J, Goldstein JL, Esser V, Brown MS. Low density lipoprotein receptor-related protein mediates uptake of cholesteryl esters derived from apoprotein E-enriched lipoproteins. Proc. Natl. Acad. Sci. USA. 1989;86:5810–5814. doi: 10.1073/pnas.86.15.5810.
  • 91.Kockx M, Traini M, Kritharides L. Cell-specific production, secretion, and function of apolipoprotein E. J. Mol. Med. 2018;96:361–371. doi: 10.1007/s00109-018-1632-y.
  • 92.Garrison RJ, et al. Obesity and lipoprotein cholesterol in the Framingham Offspring Study. Metabolism. 1980;29:1053–1060. doi: 10.1016/0026-0495(80)90216-4.
  • 93.Albrink MJ, et al. Intercorrelations among plasma high density lipoprotein, obesity and triglycerides in a normal population. Lipids. 1980;15:668–676. doi: 10.1007/BF02534017.
  • 94.Panagiotakos DB, Pitsavos C, Yannakoulia M, Chrysohoou C, Stefanadis C. The implication of obesity and central fat on markers of chronic inflammation: the Attica study. Atherosclerosis. 2005;183:308–315. doi: 10.1016/j.atherosclerosis.2005.03.010.
  • 95.Purdy JC, Shatzel JJ. The hematologic consequences of obesity. Eur. J. Haematol. 2021;106:306–319. doi: 10.1111/ejh.13560.
  • 96.Gillette Guyonnet S, et al. IANA (International Academy on Nutrition and Aging) Expert Group: weight loss and Alzheimer’s disease. J. Nutr. Health Aging. 2007;11:38–48.
  • 97.von Hardenberg S, Gnewuch C, Schmitz G, Borlak J. ApoE is a major determinant of hepatic bile acid homeostasis in mice. J. Nutr. Biochem. 2018;52:82–91. doi: 10.1016/j.jnutbio.2017.09.008.
  • 98.Wang J, et al. ApoE and the role of very low density lipoproteins in adipose tissue inflammation. Atherosclerosis. 2012;223:342–349. doi: 10.1016/j.atherosclerosis.2012.06.003.
  • 99.Blanchard JW, et al. APOE4 impairs myelination via cholesterol dysregulation in oligodendrocytes. Nature. 2022;611:769–779. doi: 10.1038/s41586-022-05439-w.
  • 100.Greendale, G. A. et al. Changes in body composition and weight during the menopause transition. JCI Insight 4, e124865 (2019).
  • 101.Davies KM, Heaney RP, Recker RR, Barger-Lux MJ, Lappe JM. Hormones, weight change and menopause. Int. J. Obes. Relat. Metab. Disord. 2001;25:874–879. doi: 10.1038/sj.ijo.0801593.
  • 102.Chen YW, Hang D, Kvaerner AS, Giovannucci E, Song M. Associations between body shape across the life course and adulthood concentrations of sex hormones in men and pre- and postmenopausal women: a multicohort study. Br. J. Nutr. 2022;127:1000–1009. doi: 10.1017/S0007114521001732.
  • 103.Conroy M, et al. The advantages of UK Biobank’s open-access strategy for health research. J. Intern. Med. 2019;286:389–397. doi: 10.1111/joim.12955.
  • 104.Coady SA, et al. Genetic variability of adult body mass index: a longitudinal assessment in Framingham families. Obes. Res. 2002;10:675–681. doi: 10.1038/oby.2002.91.
  • 105.Singh, P. et al. Statins decrease leptin expression in human white adipocytes. Physiol. Rep. 6, e13566 (2018).
  • 106.McCarron DA, Reusser ME. Body weight and blood pressure regulation. Am. J. Clin. Nutr. 1996;63:423S–425S. doi: 10.1093/ajcn/63.3.423.
  • 107.Hernan MA, Hernandez-Diaz S, Robins JM. A structural approach to selection bias. Epidemiology. 2004;15:615–625. doi: 10.1097/01.ede.0000135174.63482.43.
  • 108.Beesley LJ, et al. The emerging landscape of health research based on biobanks linked to electronic health records: Existing resources, statistical challenges, and potential opportunities. Stat. Med. 2020;39:773–800. doi: 10.1002/sim.8445.
  • 109.Kutcher SA, Brophy JM, Banack HR, Kaufman JS, Samuel M. Emulating a randomised controlled trial with observational data: an introduction to the target trial framework. Can. J. Cardiol. 2021;37:1365–1377. doi: 10.1016/j.cjca.2021.05.012.
  • 110.Shortreed SM, Rutter CM, Cook AJ, Simon GE. Improving pragmatic clinical trial design using real-world data. Clin. Trials. 2019;16:273–282. doi: 10.1177/1740774519833679.
  • 111.Bycroft C, et al. The UK Biobank resource with deep phenotyping and genomic data. Nature. 2018;562:203–209. doi: 10.1038/s41586-018-0579-z.
  • 112.UK Biobank Team. UK Biobank Primary Care Linked Data, version 1.0 edn. https://biobank.ndph.ox.ac.uk/showcase/showcase/docs/primary_care_data.pdf (2019).
  • 113.Kuan V, et al. A chronological map of 308 physical and mental health conditions from 4 million individuals in the English National Health Service. Lancet Digit. Health. 2019;1:e63–e77. doi: 10.1016/S2589-7500(19)30012-3.
  • 114.Bates D, Machler M, Bolker B, Walker S. Fitting linear mixed-effects models using lme4. J. Stat. Softw. 2015;67:1–48. doi: 10.18637/jss.v067.i01.
  • 115.R Core Team. R: A Language and Environment for Statistical Computing (R Foundation for Statistical Computing, Vienna, Austria, 2021). https://www.R-project.org/.
  • 116.Beasley TM, Erickson S, Allison DB. Rank-based inverse normal transformations are increasingly used, but are they merited? Behav. Genet. 2009;39:580–595. doi: 10.1007/s10519-009-9281-0.
  • 117.Eilers PHC, Marx BD. Flexible smoothing with B-splines and penalties. Stat. Sci. 1996;11:89–121. doi: 10.1214/ss/1038425655.
  • 118.O’Hagan, A. & Kendall, M. G. Kendall’s Advanced Theory of Statistics: Bayesian Inference. Volume 2B (Edward Arnold, 1994).
  • 119.Maechler, M., Rousseeuw, P., Struyf, A., Hubert, M. & Hornik, K. cluster: Cluster Analysis Basics and Extensions. R package version 2.1.4. https://CRAN.R-project.org/package=cluster (2022).
  • 120.Rousseeuw PJ. Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. J. Comput. Appl. Math. 1987;20:53–65. doi: 10.1016/0377-0427(87)90125-7.
  • 121.Smithson M, Verkuilen J. A better lemon squeezer? Maximum-likelihood regression with beta-distributed dependent variables. Psychol. Methods. 2006;11:54–71. doi: 10.1037/1082-989X.11.1.54.
  • 122.Benner C, et al. FINEMAP: efficient variable selection using summary data from genome-wide association studies. Bioinformatics. 2016;32:1493–1501. doi: 10.1093/bioinformatics/btw018.
  • 123.Yang J, Lee SH, Goddard ME, Visscher PM. GCTA: a tool for genome-wide complex trait analysis. Am. J. Hum. Genet. 2011;88:76–82. doi: 10.1016/j.ajhg.2010.11.011.
  • 124.Chang CC, et al. Second-generation PLINK: rising to the challenge of larger and richer datasets. Gigascience. 2015;4:7. doi: 10.1186/s13742-015-0047-8.
  • 125.Venables, W. N. & Ripley, B. D. Modern Applied Statistics with S, fourth edn. (Springer, New York, 2002). https://www.stats.ox.ac.uk/pub/MASS4/.
  • 126.Hunter-Zinck H, et al. Genotyping array design and data quality control in the Million Veteran Program. Am. J. Hum. Genet. 2020;106:535–548. doi: 10.1016/j.ajhg.2020.03.004.
  • 127.Mbatchou J, et al. Computationally efficient whole-genome regression for quantitative and binary traits. Nat. Genet. 2021;53:1097–1103. doi: 10.1038/s41588-021-00870-7.
  • 128.Mitt M, et al. Improved imputation accuracy of rare and low-frequency variants using population-specific high-coverage WGS-based imputation reference panel. Eur. J. Hum. Genet. 2017;25:869–876. doi: 10.1038/ejhg.2017.51.
  • 129.Palmer C, Pe’er I. Statistical correction of the winner’s curse explains replication variability in quantitative trait genome-wide association studies. PLoS Genet. 2017;13:e1006916. doi: 10.1371/journal.pgen.1006916.
  • 130.Denaxas S, et al. A semi-supervised approach for rapidly creating clinical biomarker phenotypes in the UK Biobank using different primary care EHR and clinical terminology systems. JAMIA Open. 2020;3:545–556. doi: 10.1093/jamiaopen/ooaa047.
  • 131.Bulik-Sullivan B, et al. An atlas of genetic correlations across human diseases and traits. Nat. Genet. 2015;47:1236–1241. doi: 10.1038/ng.3406.
  • 132.1000 Genomes Project Consortium, et al. A global reference for human genetic variation. Nature. 2015;526:68–74. doi: 10.1038/nature15393.
  • 133.International HapMap 3 Consortium, et al. Integrating common and rare genetic variation in diverse human populations. Nature. 2010;467:52–58. doi: 10.1038/nature09298.
  • 134.Bezanson J, Edelman A, Karpinski S, Shah VB. Julia: a fresh approach to numerical computing. SIAM Rev. 2017;59:65–98. doi: 10.1137/141000671.
  • 135.Venkatesh, S. S. & Nicholson, G. The genetic architecture of changes in adiposity during adulthood. GitHub repository. doi: 10.5281/zenodo.11108733 (2024).

Associated Data


Supplementary Materials

Peer Review File (1,009.3KB, pdf)
41467_2024_49998_MOESM3_ESM.pdf (110.3KB, pdf)

Description of Additional Supplementary Files

Reporting Summary (2MB, pdf)

Data Availability Statement

The GWAS summary statistics generated in this study have been deposited in the GWAS Catalog (ref. 54). They can be downloaded from the parent directory ftp://ftp.ebi.ac.uk/pub/databases/gwas/summary_statistics/GCST90429001-GCST90430000/ using the accession numbers provided in Supplementary Data 26 (ranging from GCST90429765 to GCST90429794).
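
As a rough illustration of how the accession range quoted in this statement maps onto download locations, the following Python sketch enumerates the accession IDs and builds one candidate URL per accession. The function names are hypothetical, and the assumption that each accession sits in its own sub-directory of the parent directory is not stated in the article; Supplementary Data 26 remains the authoritative mapping from accession numbers to traits and strata.

```python
# Parent directory quoted in the Data Availability Statement.
PARENT_DIR = (
    "ftp://ftp.ebi.ac.uk/pub/databases/gwas/summary_statistics/"
    "GCST90429001-GCST90430000/"
)

# Accession range quoted in the statement (inclusive at both ends).
FIRST_ACCESSION = 90429765
LAST_ACCESSION = 90429794


def accession_ids(first=FIRST_ACCESSION, last=LAST_ACCESSION):
    """Return every accession ID in the inclusive range, e.g. 'GCST90429765'."""
    return [f"GCST{number}" for number in range(first, last + 1)]


def accession_urls(parent_dir=PARENT_DIR):
    """Map each accession ID to a candidate directory URL.

    ASSUMPTION: one sub-directory per accession, named after the accession ID.
    The article gives only the parent directory, so check the listing first.
    """
    return {acc: f"{parent_dir}{acc}/" for acc in accession_ids()}


if __name__ == "__main__":
    for accession, url in accession_urls().items():
        print(accession, url)
```

The sketch only builds strings and performs no network access, so the candidate URLs should be checked against the actual directory listing before any bulk download.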

All code required to reproduce analyses is publicly available at: https://github.com/lindgrengroup/longitudinal_primarycare/tree/main/adiposity/scripts/manuscript (ref. 135).


Articles from Nature Communications are provided here courtesy of Nature Publishing Group
