Abstract
Type 2 diabetes (T2D) is a heterogeneous disease that develops through diverse pathophysiological processes. To characterise the genetic contribution to these processes across ancestry groups, we aggregate genome-wide association study (GWAS) data from 2,535,601 individuals (39.7% non-European ancestry), including 428,452 T2D cases. We identify 1,289 independent association signals at genome-wide significance (P<5×10−8) that map to 611 loci, of which 145 loci are previously unreported. We define eight non-overlapping clusters of T2D signals characterised by distinct profiles of cardiometabolic trait associations. These clusters are differentially enriched for cell-type specific regions of open chromatin, including pancreatic islets, adipocytes, endothelial, and enteroendocrine cells. We build cluster-specific partitioned genetic risk scores (GRS) in an additional 137,559 individuals of diverse ancestry, including 10,159 T2D cases, and test their association with T2D-related vascular outcomes. Cluster-specific partitioned GRS are more strongly associated with coronary artery disease and end-stage diabetic nephropathy than an overall T2D GRS across ancestry groups, highlighting the importance of obesity-related processes in the development of vascular outcomes. Our findings demonstrate the value of integrating multi-ancestry GWAS with single-cell epigenomics to disentangle the aetiological heterogeneity driving the development and progression of T2D, which may offer a route to optimise global access to genetically-informed diabetes care.
Diabetes mellitus is a huge public health burden, with an estimated prevalence of 537 million adults worldwide in 2021, of whom >90% are affected by type 2 diabetes (T2D)1. The biological processes through which T2D and its primary disabling complications of vascular and renal disease develop are diverse, and include insulin secretion and insulin resistance. This aetiological heterogeneity leads to substantial variability in patient phenotypes, including age of disease onset, manifestation of disease complications, and response to management strategies2,3. Whilst environment and lifestyle are well-established risk factors for T2D, the disease heritability has been estimated to be 69% amongst individuals in the age range of 35 to 60 years4. The largest previous genome-wide association studies (GWAS) of T2D have identified >500 risk loci5,6, which demonstrated variable patterns of association with clinical features mediated via effector genes acting through distinct molecular mechanisms that are often cell-type specific7,8. Through the newly-established Type 2 Diabetes Global Genomics Initiative, we present findings from the largest T2D GWAS meta-analysis to date, comprising >2.5 million individuals of diverse ancestry and a near three-fold increase in effective sample size over previous efforts5,6. We leverage the power afforded by this increased sample size with emerging single-cell functional genomics data derived from disease-relevant tissues to disentangle the aetiological heterogeneity of T2D. For the first time across multiple ancestry groups, we construct partitioned genetic risk scores (GRS)9 and assess their association with T2D-related macrovascular outcomes and progression to microvascular complications in individuals with T2D. Taken together, our findings showcase the value of analyses conducted in multiple ancestries to understand the pathophysiological processes driving development of T2D and risk of vascular complications, and advance global opportunities for clinical translation of T2D GWAS findings.
Study overview.
We assembled GWAS including 428,452 T2D cases and 2,107,149 controls (effective sample size of 1,246,658) (Supplementary Tables 1 and 2). We organised these GWAS into five groups representing individuals with similar genetic background and refer to these as “ancestry groups”. Specifically, we considered: a European ancestry group (EUR, 60.3% of the effective sample size); an East Asian ancestry group (EAS, 19.8%); an African ancestry group, including admixed African Americans (AFR, 10.7%); a Hispanic group, who share genetic similarity with populations from the Americas, Africa, and Europe (HIS, 5.9%); and a South Asian ancestry group (SAS, 3.3%). The GWAS in each ancestry group were imputed to reference panels from the Trans-Omics for Precision Medicine Program10 (42.1% of the effective sample size), Haplotype Reference Consortium11 (14.7%), 1000 Genomes Project12,13 (11.8%), or whole-genome sequence data specific for and genetically similar to individuals in the study (31.4%). Subsequent association analyses accounted for study-level population stratification and relatedness, and adjusted for age and sex, where appropriate, and additional study-specific covariates (Supplementary Table 3, Methods). We analysed bi-allelic autosomal single nucleotide variants (SNVs) with minor allele frequency (MAF) ≥0.5% in at least one of the five ancestry groups from the 1000 Genomes Project13.
Discovery of 145 previously-unreported T2D loci.
We aggregated association summary statistics across GWAS via multi-ancestry meta-regression, implemented in MR-MEGA14, which allows for allelic effect heterogeneity that is correlated with ancestry. MR-MEGA is not restricted to broad continental ancestry categories but represents ancestry as multidimensional and continuous axes of genetic variation, which better reflect the continuum of human genetic diversity and demographic history15. We included three axes of genetic variation as covariates that separated GWAS from different ancestry groups, but which also revealed finer-scale population differences within ancestry groups (Supplementary Figure 1, Methods). Accounting for ancestry-correlated allelic effect heterogeneity among GWAS in the meta-regression provided better control for residual structure than a fixed-effects meta-analysis, as measured by the genomic control inflation factor (λGC=1.120 and λGC=1.396, respectively). At the conventional genome-wide significance threshold (P<5×10−8), we identified 1,289 distinct T2D association signals that were represented by independent index SNVs that were not in linkage disequilibrium (LD), defined as r2<0.05 in all ancestry groups from the 1000 Genomes Project13 (Supplementary Figure 2, Supplementary Table 4, Methods). The 1,289 association signals mapped to 611 loci, of which 145 loci (23.7%) have not been previously reported in T2D GWAS. We observed multiple distinct association signals at 235 (38.5%) loci, of which 34 loci were represented by more than five index SNVs. At association signals mapping to loci not previously reported for T2D, index SNVs were predominantly common (MAF >5% in at least one ancestry group) with odds-ratios (ORs) <1.1 (Supplementary Figure 3).
T2D association signals are enriched for ancestry-correlated heterogeneity.
Previous multi-ancestry GWAS have demonstrated widespread heterogeneity in allelic effects at T2D association signals across ancestry groups6,16. Such heterogeneity is not driven by variation in allele frequency between ancestry groups, but may reflect differences in LD structure, or interactions with environment or polygenic background. We took advantage of the meta-regression model to partition heterogeneity into an ancestry-correlated component explained by three axes of genetic variation, and a residual component reflecting differences in environmental exposures (that are not correlated with ancestry) and/or study design (Supplementary Table 5). We observed 127 (9.9%) independent T2D association signals with significant evidence for ancestry-correlated heterogeneity (PHET<3.9×10−5, Bonferroni correction for 1,289 signals). We would expect <1 signal to meet this threshold of significance, highlighting that ancestry-correlated heterogeneity is strongly enriched at T2D associations (one-sided binomial test P<2.2×10−16). In contrast, we observed significant evidence of residual heterogeneity at only 4 (0.3%) association signals (one-side binomial test P=0.031). These results therefore suggest that differences in allelic effects at index SNVs are more likely correlated with genetic ancestry than due to other factors that vary between GWAS.
We next sought to understand better the drivers of differences in allelic effects between GWAS at the 127 association signals with significant evidence of ancestry-correlated heterogeneity. To do this, we assessed the contribution of each of the three axes of genetic variation to heterogeneity (Methods). For 118 (92.9%) signals, allelic effect sizes were most strongly associated with the first two axes of genetic variation, which separate AFR, EAS, and EUR GWAS (Supplementary Figure 1, Supplementary Table 6). This may simply reflect power to detect heterogeneity because these three ancestry groups make the largest contributions to the effective sample size of the multi-ancestry meta-analysis. The magnitude and direction of the association of index SNVs with these two axes reflected differences in allelic effect size between AFR and EAS GWAS on the AFR-EAS axis, and AFR and EUR GWAS on the AFR-EUR axis (Supplementary Figure 4). The most significant evidence of ancestry-correlated heterogeneity was observed for the T2D association signal at the HNF1A locus indexed by rs1169299 (PHET=4.8×10−35). This index SNV was negatively associated with the AFR-EAS axis (PHET=2.7×10−11), and positively associated with the AFR-EUR axis (PHET=4.6×10−9), corresponding to an AFR allelic effect (OR=1.02) that was intermediate between the EAS and EUR allelic effects (OR=0.95 and OR=1.05, respectively). In contrast, the association signal indexed by rs2237884, at the locus encompassing INS, IGF2, and KCNQ1, was not associated with either the AFR-EAS axis (PHET=0.61) or AFR-EUR axis (PHET=0.56), indicating no difference in allelic effects between AFR, EAS, and EUR GWAS (OR=1.03 for all three ancestry groups). Instead, the heterogeneity for this signal was driven by association with the third axis of genetic variation (PHET=2.8×10−8), which separates HIS and SAS GWAS (OR=1.01 and OR=0.97, respectively).
Ancestry-correlated heterogeneity can occur because of interaction between index SNVs and environmental/lifestyle factors that vary between ancestry groups, if not accounted for in the association analysis17. We therefore investigated whether ancestry-correlated heterogeneity could be explained by differences in the distribution of body mass index (BMI) across ancestry groups (Supplementary Figure 5). To do this, we extended the MR-MEGA meta-regression model to allow for allelic effect heterogeneity at index SNVs due to mean BMI in controls, in addition to variation by ancestry (Methods). Across all index SNVs, there was a negative correlation between T2D risk and BMI in controls (Supplementary Table 7), where the mean reduction in log-OR per unit of BMI was 0.0016 (P=9.6×10−53). After adjustment for BMI, the number of independent T2D associations with significant evidence for ancestry-correlated heterogeneity decreased from 127 (9.9%) to 49 (3.8%). These results indicate that ancestry-correlated heterogeneity at T2D index SNVs can be partly but not fully explained by differences in the distribution of BMI between ancestry groups.
The contribution of under-represented populations to evidence for locus discovery.
Despite the global burden of common complex diseases, including T2D, GWAS are biased towards EUR and EAS ancestry groups18. Failure to detect T2D association signals that contribute to disease risk outside of these ancestry groups limits opportunities for clinical translation of GWAS into under-represented ancestries and has the potential to exacerbate health disparities19. This multi-ancestry meta-analysis includes 98,060 T2D cases and 198,020 controls from AFR, HIS, and SAS GWAS (19.9% of the effective sample size), which substantially increases sample sizes in these under-represented groups over previous efforts5,6,20. To assess their contribution to the discovery of T2D association signals, we repeated the meta-regression using only EAS and EUR GWAS, including just one axis of genetic variation needed to separate these two ancestry groups (Supplementary Figure 1, Methods). Of the 1,289 association signals identified in the multi-ancestry meta-analysis, 589 did not retain genome-wide significance after exclusion of AFR, HIS, and SAS GWAS (Supplementary Figure 6, Supplementary Table 8), emphasizing the power of population diversity for locus discovery.
Eight mechanistic clusters of T2D index SNVs with distinct cardiometabolic profiles.
To understand the genetic contribution to phenotypic heterogeneity in T2D, we classified the 1,289 index SNVs according to their profile of associations (aligned to the T2D risk allele) with 37 cardiometabolic phenotypes from the largest available GWAS. These included glycaemic traits, anthropometric measures, body fat and adipose tissue volume, blood pressure, circulating plasma lipid levels, and biomarkers of liver function and lipid metabolism21–29 (Supplementary Table 9). We applied an unsupervised “hard clustering” approach with imputation of missing phenotype associations, which defines non-overlapping but exhaustive subsets of index SNVs with similar cardiometabolic profiles (Methods). According to the majority rule across 27 indices of cluster performance30, eight clusters of index SNVs was considered “optimal” (Figure 1, Table 1, Supplementary Figure 7, Supplementary Tables 10 and 11). Despite differences in the methods and phenotypes used for clustering compared to previous efforts7,8,31 (Methods), there is clear overlap in the cardiometabolic features and loci of five of the identified clusters with those previously reported, representing beta-cell dysfunction with positive or negative association with proinsulin (PI), and insulin resistance mediated via obesity, lipodystrophy, and liver/lipid metabolism. However, by increasing the number of index SNVs in the clustering by nearly four-fold over previous efforts, we provide a more granular view of the biological processes through which T2D associations impact disease, and highlight previously-unreported clusters of signals with cardiometabolic profiles that are representative of the metabolic syndrome, body fat, and residual glycaemic effects. We observed significant differences in allelic effects on T2D between the eight clusters (P<2.2×10−16), with weaker associations observed at index SNVs assigned to the three previously-unreported residual glycaemic, body fat, and metabolic syndrome clusters (Supplementary Figure 8, Supplementary Table 12).
Table 1.
Mechanistic cluster | Cardiometabolic profile | Number of T2D associations | Exemplar loci | Physiological impact | |
---|---|---|---|---|---|
Insulin secretion | Insulin sensitivity | ||||
Beta-cell +PI | +FG*, +2hG*, +HbAlc, +PI* | 91 | TCF7L2, KCNQ1, CDKAL1, CDKN2A-CDKN2B, SLC30A8 | − | + |
Beta-cell −PI | +FG*, +2hG*, +HbAlc, −PI* | 89 | CDC123-CAMK1D, HNFIB, KCNJ11-ABCC8, HNF4A, HNFIA | − | + |
Residual glycaemic | +FG*, +HbAlc | 389 | GCC1-PAX4-LEP, ANKRD55, GCKR, UBE2E2 | − | − |
Body fat | +body fat, +ASAT* | 273 | ZMIZ1, HMGA2, CTBP1 | + | − |
Metabolic syndrome | +FG*, +FI*, +WHR, +VAT*, −GFAT*, +TG, −HDL, +BP | 166 | IGF2BP2, CCND2, HHEX- IDE, JAZF1, GPSM1 | + | − |
Obesity | +BMI, +WHR, +body fat, +BMR, +TG, −HDL | 233 | FTO, MC4R, MACF1, TMEM18 | + | − |
Lipodystrophy | +FI*, +WHR, −body fat, −GFAT*, +TG, −HDL, +BP | 45 | IRS1, GRB14-COBBL1, PPARG | + | − |
Liver/lipid metabolism | −LDL, −TC, +liver fat, +liver biomarkers | 3 | TOMM40-APOE-GIPR, TM6SF2, PNPLA3 | − | − |
+/−: T2D risk alleles associated with increased/decreased phenotype values.
FG: fasting glucose. FI: fasting insulin. 2hG: two-hour glucose. HbA1c: glycated haemoglobin. PI: proinsulin. BMI: body mass index. WHR: waist-hip ratio. VAT: visceral adipose tissue volume. ASAT: abdominal subcutaneous adipose tissue volume. GFAT: gluteofemoral adipose tissue volume. LDL: low-density lipoprotein cholesterol. HDL: high-density lipoprotein cholesterol. TC: total cholesterol. TG: triglycerides. BMR: basal metabolic rate. BP: blood pressure.
adjusted for BMI.
T2D risk alleles at index SNVs in the two beta-cell dysfunction clusters are associated with increased fasting glucose (FG), two-hour glucose, and glycated haemoglobin (HbA1c), and with decreased fasting insulin (FI). Index SNVs in both clusters are also associated with PI, but with opposite directions of effect for the T2D risk allele. These findings are consistent with observed differences in directions of effect of PI-associated SNVs with FG21, which differentiate signals that reflect defects in the insulin synthesis pathway that occur upstream (decreased PI) or downstream (increased PI) of the first enzymatic step of the conversion of PI to insulin and C-peptide32. The beta-cell +PI cluster, where the T2D risk allele is associated with increased PI, comprises 91 index SNVs, which include T2D associations that map to loci encompassing genes with established roles in beta-cell function (such as TCF7L2 and CDKAL1). The beta-cell -PI cluster, where the T2D risk allele is associated with decreased PI, comprises 89 index SNVs, which include T2D associations that map to monogenic diabetes genes that play a role in beta-cell development (such as HNF1A, HNF4A, HNF1B, KCNJ11, and ABCC8).
The clusters reflecting mechanisms of insulin resistance mediated via obesity, lipodystrophy, and liver/lipid metabolism, include index SNVs that are associated with anthropometric measures and circulating plasma lipid levels. T2D risk alleles at the 233 index SNVs in the obesity cluster are associated with increased BMI, waist-hip ratio (WHR), body fat percentage and basal metabolic rate, and with decreased high-density lipoprotein (HDL) cholesterol. The index SNVs assigned to this cluster include T2D associations that map to loci previously reported for BMI (such as FTO and MC4R). The lipodystrophy cluster comprises 45 index SNVs for which T2D risk alleles are associated with increased FI, WHR, blood pressure, and triglycerides, and with decreased body fat percentage, gluteofemoral adipose tissue (GFAT) volume, and HDL cholesterol. Index SNVs in this cluster include T2D associations that map to loci previously reported for regional adiposity (such as IRS1, GRB14, and PPARG). T2D risk alleles at the three index SNVs assigned to the liver/lipid metabolism cluster are associated with increased liver fat and liver-related biomarkers, and with decreased low-density lipoprotein cholesterol and total cholesterol. These index SNVs map to loci that have been previously implicated in non-alcoholic fatty liver disease (such as TM6SF2 and PNPLA3).
The cardiometabolic profiles of the three remaining clusters are less well-defined, likely reflecting the weaker T2D associations of index SNVs assigned to them (Supplementary Figure 8). T2D risk alleles at the 166 index SNVs assigned to the metabolic syndrome cluster are associated with increased FG, WHR, triglycerides, and blood pressure, and with decreased HDL cholesterol, which together are used to define the syndrome. T2D risk alleles in this cluster are also associated with increased FI, and with accumulations of unhealthy fat depots: increased visceral adipose tissue (VAT) volume and liver fat, and with decreased GFAT volume. Observationally, individuals with the metabolic syndrome are at increased risk of T2D33, although Mendelian randomisation studies indicate that a causal effect is driven by increased waist circumference and increased FG34. T2D risk alleles at the 273 index SNVs assigned to the body fat cluster are associated with increased abdominal subcutaneous adipose tissue volume, VAT volume, and body fat percentage. Whilst the profile of associations with cardiometabolic phenotypes share these features in common with obesity-mediated insulin resistance, index SNVs in the body fat cluster are not strongly associated with BMI, lipid levels, or basal metabolic rate. Previous investigations have highlighted that body fat percentage is predictive of abnormal blood glucose in individuals with a healthy BMI35. Finally, T2D risk alleles at the 389 index SNVs assigned to the residual glycaemic cluster are most strongly associated with increased FG and HbA1c, but unlike the two beta-cell clusters, are not associated with PI or decreased FI.
Clusters are differentially associated with insulin-related endophenotypes.
The clustering provides a framework to better understand the diverse physiological processes through which T2D develops. We assessed the association of index SNVs with insulin-related endophenotypes that were not used for clustering and derived from hyperinsulinemic-euglycemic clamp assessments and oral glucose tolerance tests (OGTT)36, and homeostatic model assessment measures of beta-cell function (HOMA-B) and insulin resistance (HOMA-IR)37 (Methods). We observed significant heterogeneity in the effects of T2D risk alleles at index SNVs between clusters on HOMA-B (PHET<2.2×10−16), HOMA-IR (PHET=4.1×10−15), insulin secretion (OGTT-derived area under the curve for insulin normalised for glucose from baseline to 30 minutes, PHET=0.0026), and insulin sensitivity (clamp-derived glucose infusion rate, PHET=0.026). T2D risk alleles at index SNVs showed a gradient of effects on these correlated measures across clusters (Supplementary Figure 9, Supplementary Tables 13 and 14), representing a cline from insulin production and processing in the two beta-cell clusters (increased insulin sensitivity; decreased insulin secretion, HOMA-B, and HOMA-IR) through to insulin resistance (decreased insulin sensitivity; increased insulin secretion, HOMA-B, and HOMA-IR) that was most extreme in the lipodystrophy cluster.
Clusters are differentially associated with insulin resistance-related disorders.
T2D is genetically correlated with other insulin resistance-related disorders, including gestational diabetes mellitus (GDM) and polycystic ovary syndrome (PCOS). To understand the shared biological pathways driving these genetic correlations, we extracted association summary statistics for each T2D index SNV from the largest available published GWAS for both disorders38,39 (Methods). We observed significant heterogeneity in the effects of T2D risk alleles at index SNVs between clusters for both disorders (Supplementary Figure 10, Supplementary Table 15): GDM (PHET=7.0×10−16) and PCOS (PHET=0.00022). Index SNVs in the beta-cell +PI cluster demonstrated the strongest associations with GDM. This cluster includes T2D index SNVs that overlap with association signals previously reported for GDM, mapping to/near MTNR1B, CDKAL1, TCF7L2, and CDKN2A-CDKN2B, consistent with hyperglycaemia due to beta-cell dysfunction on a background of pregnancy-induced physiologic insulin resistance40. In contrast, PCOS is most strongly associated with index SNVs in the obesity cluster, consistent with previous reports that the genetic correlation with T2D is primarily driven by higher BMI41.
Clusters are differentially enriched for cell-type specific regions of open chromatin.
To gain insight into tissue-specific regulatory processes underpinning mechanistic clusters, we integrated T2D association signals with Assay for Transposase-Accessible Chromatin using sequencing (ATAC-seq) peaks for 222 cell types derived from adult and foetal tissues via single-cell atlases of chromatin accessibility (CATLAS and DESCARTES)42,43. This expands the range of cell types considered over previous efforts to understand cluster-specific enrichment for regulatory processes that were conducted only in bulk tissue or single-cell data in pancreatic islets7,8,31. Index SNVs assigned to six of the eight clusters were significantly enriched (P<0.00023, Bonferroni correction for 222 cell types) for regions of open chromatin that were specific to 20 cell types when compared to “null” SNVs that were not in LD with index SNVs, defined as r2<0.05, in any ancestry group (Figure 2, Supplementary Table 16, Methods).
We observed significant enrichment for regions of open chromatin in foetal islets and adult neuroendocrine cells in pancreatic islets (alpha, beta, gamma, and delta) in the beta-cell +PI, beta-cell −PI, and residual glycaemic clusters. The beta-cell −PI cluster was also enriched in adult enterochromaffin cells, a type of enteroendocrine cell that plays an essential role in regulating intestinal motility and secretion in the gastrointestinal tract44. Enterochromaffin cells are a major target for GLP-1 and highly express GLP-1 receptor, whose agonists are widely used as medications for T2D45. The residual glycaemic cluster was also enriched in foetal and adult pancreatic ductal cells.
Some cell types important in the pathogenesis of obesity, such as the hypothalamus in the central nervous system, are not sufficiently covered by CATLAS/DESCARTES, and so were not available for this analysis. However, the obesity cluster was enriched for regions of open chromatin in adult pancreatic islets, although not as strongly as the beta-cell clusters. Enrichment was observed only for alpha, gamma, and delta cells, suggesting potential alternative pathways through which islets impact the development of T2D than through insulin secretion from beta cells. The obesity cluster was further enriched in foetal adrenal gland (chromaffin cells and adrenal neurons), foetal heart (ventricular cardiomyocytes), and foetal kidney (metanephric cells). Previous studies have reported enrichment of BMI loci/heritability for epigenomic annotations in pancreatic islets and adrenal gland46,47, consistent with our findings.
The remaining four clusters (lipodystrophy, metabolic syndrome, body fat, and liver/lipid metabolism) were not significantly enriched for regions of open chromatin in pancreatic islets. The lipodystrophy cluster was enriched only in adult adipocytes, which confirms previous reports in bulk adipose tissue8,31. Consistent with these results, association signals for WHR, triglycerides, and HDL cholesterol, which are strongly impacted by index SNVs in the lipodystrophy cluster, have been shown to be enriched in candidate cis-regulatory elements in adipocytes43. The metabolic syndrome cluster was enriched in cells that reside in the walls of blood vessels (adult pericytes and foetal endothelial cells), foetal kidney (mesangial cells), and foetal fibroblasts. Association signals for systolic and diastolic blood pressure, a key component of the metabolic syndrome, have been shown to be enriched in candidate cis-regulatory elements in these cell types43. Endothelial dysfunction is not only a consequence of insulin resistance, but impairs insulin signalling to further reduce insulin sensitivity, thereby providing a pathophysiological mechanism that links metabolic and cardiovascular components of metabolic syndrome48. We observed no significant enrichments in the body fat cluster or liver/lipid metabolism cluster.
Ancestry-correlated heterogeneity differs between mechanistic clusters.
We next sought to investigate whether the observed ancestry-correlated differences in allelic effects on T2D between ancestry groups varied across mechanistic clusters. To do this, we compared the magnitude and direction of association of index SNVs in each cluster with the first two axes of genetic variation that drive ancestry-correlated heterogeneity at T2D association signals (Methods). We observed significant differences in mean Z-scores for association between clusters for both the AFR-EAS axis (P=5.0×10−6) and the AFR-EUR axis (P=1.5×10−6). Index SNVs in the two beta-cell clusters were most positively associated with the AFR-EAS axis, indicating allelic effects on T2D that were greater in EAS than in AFR and EUR GWAS (Supplementary Figure 11, Supplementary Table 17). In contrast, index SNVs in the lipodystrophy and obesity clusters were most positively associated with the AFR-EUR axis, indicating allelic effects on T2D that were greater in EUR than in EAS or AFR GWAS. These results indicate that ancestry-correlated heterogeneity varies between mechanistic clusters, which may have clinical implications for T2D management across ancestry groups.
Cluster-specific partitioned GRS are associated with vascular outcomes across ancestry groups.
The major complications in individuals with T2D are macrovascular outcomes including coronary artery disease (CAD), ischemic stroke, and peripheral artery disease, and progression to severe microvascular complications, including end-stage diabetic nephropathy (ESDN) and proliferative diabetic retinopathy. We tested for association of a cluster-specific partitioned GRS with these clinical outcomes in up to 137,559 individuals (including 10,159 T2D cases) across four ancestry groups (AFR, EAS, EUR, and HIS) from the All of Us Research Program and Biobank Japan (Methods). These individuals were not included in the multi-ancestry meta-analysis, and so allowed us to avoid potential inflated type I error rates due to overlap between the discovery and testing datasets. To maximise sample size, we tested macrovascular outcomes in all individuals, adjusted for T2D status, and we tested microvascular complications only in individuals with T2D (Methods, Supplementary Table 18). We first conducted GRS association analyses within ancestry groups, and subsequently aggregated results across ancestry groups via fixed-effects meta-analysis (Methods, Supplementary Table 19). We also tested the association of an overall GRS, agnostic to cluster membership, to assess the evidence that mechanistic clusters in the partitioned GRS are differentially associated with T2D-related clinical outcomes. Figure 3 provides an overview of the associations of the overall GRS and each cluster-specific component of the partitioned GRS with the five T2D-related clinical outcomes across ancestry groups.
We observed significant positive association (P<0.0063, Bonferroni correction for eight clusters) of two components of the partitioned GRS with CAD: the lipodystrophy cluster (OR=1.04 per standard deviation of GRS, P=0.00020) and the obesity cluster (OR=1.03, P=0.0036). There was no evidence of heterogeneity in the effects of these two clusters on CAD across ancestry groups (Supplementary Figure 12, Supplementary Table 19). Importantly, after adjustment for a CAD GRS derived from a recently-published multi-ancestry meta-analysis of CAD GWAS49, the positive CAD association with both components of the GRS remained nominally significant (Supplementary Figure 13, Supplementary Table 20): lipodystrophy cluster (OR=1.03, P=0.029) and obesity cluster (OR=1.04, P=0.011). These results demonstrate that additional information is provided by the partitioned GRS for association with CAD in individuals with T2D beyond a more general CAD GRS. CAD association signals have been reported to be enriched for annotations (from transcriptomics and epigenomics) in bulk tissues including aorta/arteries, heart, adrenal gland, and adipose49–51. This is in line with the observed enrichment of T2D index SNVs assigned to the obesity and lipodystrophy clusters in regions of open chromatin in foetal ventricular cardiomyocytes, foetal adrenal neuron and adult chromaffin cells in the adrenal gland, and adult adipocytes, providing a clear link to shared biological mechanisms driving development of both diseases. Whilst no components of the partitioned GRS were significantly associated with the other macrovascular outcomes, the obesity cluster had the strongest effect on both ischemic stroke (OR=1.04, P=0.022) and peripheral artery disease (OR=1.05, P=0.014). T2D risk alleles at index SNVs in the obesity cluster were strongly associated with increased C-Reactive Protein, indicating a potentially more pronounced role of inflammation in the development of ischemic stroke and peripheral artery disease than in CAD. We observed no significant evidence for association of the overall GRS with CAD (P=0.12), ischemic stroke (P=0.16), or peripheral artery disease (P=0.93). These results therefore highlight the advantages of the partitioned GRS over an overall GRS for detecting association with T2D-related macrovascular outcomes and providing insight into the biological processes that lead to these disease complications.
We observed significant positive association of the obesity cluster from the partitioned GRS with ESDN (OR=1.25, P=0.0022). We also observed a significant negative association of the beta-cell +PI cluster with ESDN (OR=0.82, P=0.0048). There was no evidence of heterogeneity in the effects of these two clusters on ESDN across ancestry groups, (Supplementary Figure 14, Supplementary Table 19), and the overall GRS was not associated with ESDN (P=0.97). T2D index SNVs assigned to the obesity cluster are enriched for regions of open chromatin in foetal metanephric cells, which is consistent with the observation that association signals for estimated glomerular filtration rate, a measure of renal function used to define chronic kidney disease, are enriched for kidney tissue-specific regulatory annotations52. The obesity cluster from the partitioned GRS was also significantly positively associated with proliferative diabetic retinopathy (OR=1.19, P=0.0041), with no evidence of heterogeneity in effects across ancestry groups (Supplementary Figure 15, Supplementary Table 19). However, in contrast to ESDN, the overall GRS was more significantly associated with proliferative diabetic nephropathy (OR=1.32, P=4.3×10−6) than any component of the partitioned GRS. Taken together, these results suggest that ESDN is disproportionately associated with obesity and confirm previous reports that proliferative diabetic retinopathy is driven by hyperglycaemia53 and therefore more strongly associated with the overall burden of T2D risk variants.
DISCUSSION
T2D is a heterogeneous polygenic disease, which is typically diagnosed based solely on elevated glucose levels and treated without regard to the diverse pathophysiological processes that lead to the development and progression of hyperglycaemia. Instead, management strategies involve trial and error following treatment algorithms that are not informed by personalised pathways to disease. The complications of T2D have substantial morbidity and mortality and thus pose a huge public health burden. A more detailed understanding of the biological mechanisms through which the disease develops and progresses to T2D complications are therefore urgently needed and eagerly anticipated.
To better understand the aetiological heterogeneity of T2D across diverse populations, we assembled the largest collection of T2D GWAS for five ancestry groups through the Type 2 Diabetes Global Genomics Initiative. By increasing the effective sample size by almost three-fold compared to previous efforts, we identified a total of 611 loci attaining genome-wide association significance, 145 (23.7%) of which have not been previously reported. Of the 1,289 signals mapping to these loci, 589 (45.7%) would not have been identified without inclusion of GWAS from under-represented ancestry groups. We observed a highly significant enrichment of T2D association signals with evidence of ancestry-correlated heterogeneity. BMI could not fully explain this enrichment, indicating that interactions with other environmental/lifestyle factors that vary with ancestry are also likely to contribute to the observed differences in allelic effects between GWAS.
Within the landscape of the genetic architecture of T2D, we identified eight clusters of index SNVs with distinct profiles of associations with 37 cardiometabolic phenotypes, which defined pathophysiologic-relevant groupings. The addition of previously-unreported T2D signals identified through the multi-ancestry meta-analysis helped define three clusters that were not detected in previous clustering efforts7,8,31. These three clusters have cardiometabolic profiles that are consistent with T2D-related biology: residual glycaemic effects, accumulations of body fat, and the metabolic syndrome. Integration of cluster-specific T2D association signals with single-cell chromatin accessibility data across diverse cell types provided new insights into distinct tissue-specific regulatory processes and biological pathways through which these mechanistic clusters are likely to lead to disease. Finally, we leveraged the information captured by these mechanistic clusters to develop a partitioned GRS that was more strongly associated with T2D-related macrovascular outcomes and ESDN than an overall T2D GRS. Whilst the effect sizes of the cluster-specific components of the partitioned GRS were small, they enhance understanding of key biological processes driving heterogeneity in these clinical features of T2D phenotypes, and motivate future work to strengthen these effects through identification of additional, statistically-compelling T2D associations in larger sample sizes. Previous efforts have highlighted associations of cluster-specific components of a partitioned GRS with CAD and renal function in EUR populations54. For the first time across multiple ancestry groups, we demonstrated more significant associations of clinical outcomes with cluster-specific components of the partitioned GRS than with the overall GRS, suggesting that development of complications may be associated with genetic burden in particular biological pathways that is consistent between populations.
Our findings demonstrate the value of integrating multi-ancestry GWAS of T2D and cardiometabolic traits with single-cell epigenomics across diverse tissues to disentangle the aetiological heterogeneity driving the development and progression of T2D across population groups. Improved understanding of the varied pathophysiological processes that link T2D to vascular outcomes may offer a route to genetically-informed diabetes care and global opportunities for the clinical translation of T2D GWAS findings.
METHODS
Ethics statement.
Study-level ethics statements are provided in the Supplementary Note.
Study-level analyses.
Within each study, we assigned individuals to one of five ancestry groups using self-report and genetic background (Supplementary Tables 1 and 2). Any individuals that could not be assigned to a single ancestry group were excluded as population outliers. Within each ancestry group-specific GWAS, we conducted quality control of genotype data. Subsequently, we pre-phased individuals and imputed up to reference panels from the Trans-Omics for Precision Medicine Program10, Haplotype Reference Consortium11, 1000 Genomes Project (phase 1, March 2012 release; phase 3, October 2014 release)12,13, or population-specific whole-genome sequencing55–61 (Supplementary Table 3). Studies imputed to reference panels mapped to GRCh38/hg38 were lifted back to hg19 using the UCSC liftOver tool (https://genome.ucsc.edu/cgi-bin/hgLiftOver). We excluded SNVs with poor imputation quality and/or minor allele count (MAC) <5 from downstream association analyses (Supplementary Table 3).
Within each ancestry group-specific GWAS, we tested for association of each SNV with T2D via generalised linear regression, under an additive model in the dosage of the minor allele, with adjustment for age and sex (where appropriate), and additional study-specific covariates (Supplementary Table 3). We employed different strategies to account for population stratification and/or kinship: (i) exclude closely related individuals and adjust for principal components derived from a genetic relatedness matrix (GRM) as additional covariates in the regression model; or (ii) incorporate a random effect for the GRM in a mixed model (Supplementary Table 3). Allelic effects and corresponding standard errors that were estimated from a linear mixed model were converted to the log-odds scale62. We corrected study-level association summary statistics for residual structure by the LD-score regression intercept63 (Supplementary Table 3), which was calculated using an LD reference that we derived from ancestry-matched haplotypes from the 1000 Genomes Project (phase 3, October 2014 release)13.
Multi-ancestry meta-analyses.
We analysed autosomal bi-allelic SNVs that overlap reference panels from the 1000 Genomes Project reference panel (phase 3, October 2014 release)13 and the Haplotype Reference Consortium11. We considered only those SNVs with MAF > 0.5% in haplotypes in at least one of the five major ancestry groups in the 1000 Genomes Project (phase 3, October 2014 release)13. We excluded SNVs that differed in allele frequency by >20% when comparing reference panels in the same subsets of haplotypes.
We used meta-regression, implemented in MR-MEGA14, to aggregate association summary statistics across GWAS. MR-MEGA models allelic effect heterogeneity that is correlated with genetic ancestry by including axes of genetic variation as covariates in the meta-regression model to capture population diversity between GWAS. We used SNVs that were reported in all studies to construct a distance matrix of mean effect allele frequency differences between each pair of GWAS. We implemented multi-dimensional scaling of the distance matrix to obtain three principal components that represent axes of genetic variation to separate GWAS from the five major ancestry groups (Supplementary Figure 1).
For each SNV, we aggregated inverse-variance weighted allelic effects across GWAS via linear regression, including three axes of genetic variation as covariates. We tested for: (i) association with T2D allowing for ancestry-correlated allelic effect heterogeneity between GWAS; (ii) ancestry-correlated allelic effect heterogeneity between GWAS (defined by the axes of genetic variation); and (iii) residual allelic effect heterogeneity between GWAS (due to differences in environmental exposures and/or study design). We also aggregated association summary statistics across GWAS via fixed-effects meta-analysis (inverse-variance weighting of allelic effects) using METAL64. To assess the extent of residual structure between GWAS, we calculated the genomic control inflation factor65 for the multi-ancestry meta-regression and fixed-effects meta-analysis.
Defining T2D signals and loci.
We began by identifying all SNVs attaining the traditional genome-wide significance threshold (P<5×10−8) for association with T2D from the multi-ancestry meta-regression. Clumps were formed around index variants, which were selected using a greedy algorithm in PLINKv1.966, after ranking SNVs by ascending P-value. SNVs <5Mb from an index SNV were assigned to the clump if r2>0.05 in at least one of the five ancestry groups using haplotypes from the 1000 Genomes Project (phase 3, October 2014 release)13. Index SNVs separated by <1Mb were assigned to the same locus. Each locus was then defined as mapping 500kb up- and down-stream of index SNVs contained within it. We considered the locus to have been previously reported if it contained variants discovered in published large-scale T2D GWAS at genome-wide significance.
Ancestry group-specific meta-analyses.
We aggregated association summary statistics across GWAS from the same ancestry group via fixed-effects meta-analysis based on inverse-variance weighting of allelic log-OR to obtain effect size estimates using METAL64. We estimated the mean effect allele frequency across GWAS from each ancestry group, weighted by the effective sample size of the study. We generated forest plots of association summary statistics of index SNVs across ancestry groups using the R package meta (https://cran.r-project.org/package=meta/).
Contribution of each of axes of genetic variation to ancestry-correlated heterogeneity.
For each index SNV, we calculated a Z-score (beta/SE) for association with each axis of variation by aligning the effect from the meta-regression model to the T2D-risk allele. For each index SNV, we identified the axis of genetic variation with the strongest association (greatest magnitude Z-score).
Impact of BMI on ancestry-correlated heterogeneity in allelic effects.
For each index SNV, we aggregated inverse-variance weighted allelic effects across GWAS via linear regression, implemented in MR-MEGA14, including as covariates: (i) three axes of genetic variation; and (ii) mean BMI in controls. We tested for: (i) ancestry-correlated allelic effect heterogeneity between GWAS, after adjustment for BMI; and (ii) allelic effect heterogeneity due to BMI, after adjustment for ancestry. For the jth index SNV, we estimated the meta-regression coefficient and corresponding variance, per unit of BMI, denoted by γj and νj, respectively. Across all index SNVs, the mean effect on log-OR per unit of BMI was given by ∑j γj/νj, with corresponding variance [∑j 1⁄νj]−1.
European and East Asian ancestry group meta-regression.
For each SNV, we aggregated inverse-variance weighted allelic effects across EAS and EUR GWAS via linear regression, including only the first axis of genetic variation as a covariate, since this was sufficient to separate the two ancestry groups (Supplementary Figure 1). We tested for association with T2D allowing for ancestry-correlated allelic effect heterogeneity between GWAS under this model.
Defining clusters of T2D index SNVs with distinct cardiometabolic profiles.
We considered cardiometabolic-related quantitative phenotypes that are used to define T2D status and/or are associated with risk of T2D or complications. We excluded phenotypes for which GWAS summary statistics were available only after imputation to reference panels from the International HapMap Project67 because they did not provide sufficient coverage of SNVs included in the multi-ancestry meta-analysis. We considered the largest available GWAS meta-analysis (ancestry-specific or multi-ancestry) that provided the following association summary statistics for each SNV: effect allele, other allele, effect estimate aligned to the effect allele, and standard error of the effect estimate. A summary of phenotypes, GWAS resources, and references are available in Supplementary Table 9. We re-aligned the effect estimate to the T2D risk allele from the fixed-effects multi-ancestry meta-analysis, denoted βij for the jth index SNV and the ith phenotype. We then calculated a sample size corrected Z-score, given by , where sij is the standard error of the effect estimate of the jth index SNV and the ith phenotype, and Ni is the maximum sample size reported for the ith phenotype. Standardisation by sample size enabled a more uniform weighting of cardiometabolic phenotypes in the subsequent clustering of index SNVs. Where association summary statistics were not reported for the jth index SNV and the ith phenotype, the Z-score was set as “missing”.
Previous efforts to construct partitioned GRS for T2D have implemented “soft clustering” approaches, such as Bayesian non-negative matrix factorisation, that generate weights for cluster membership for each index SNV8. Assignment of index SNVs to clusters is then determined given a threshold weight for cluster membership, allowing for the possibility that a T2D association signal impacts on disease through multiple pathophysiological pathways. In practice, however, the choice of threshold for cluster membership is subjective, and some index SNVs remain unassigned. Bayesian non-negative matrix factorisation also considers positive and negative associations with the same phenotype as independent variables in the clustering, and cannot allow for missing phenotype associations, relying instead on the use of proxy SNVs, which can be challenging to identify in a multi-ancestry context.
To address these potential limitations, we conducted K-means clustering of index SNVs with imputation of missing Z-scores using the R package ClustImpute (https://cran.r-project.org/package=ClustImpute). For a pre-defined number of clusters, ClustImpute replaces missing Z-scores at random from the marginal distribution for the phenotype in the first iteration and performs K-means clustering. In subsequent iterations, missing Z-scores are updated, conditional on the current cluster assignment, so that correlations between phenotypes are considered. At each iteration, penalizing weights are imposed on imputed values and successively decreased (to zero) as the missing data imputation improves. In this “hard clustering” approach, each index SNV is assigned to exactly one cluster. Finally, we determined the “optimal” number of clusters according to the majority rule across 27 indices of cluster performance30, implemented in the R package NbClust (https://cran.r-project.org/package=NbClust).
We tested for association of the ith phenotype with index SNVs across clusters in a linear regression model, given by E(Zij) = ∑k γikCjk, where Cjk is an indicator variable that takes the value “1” if the jth index SNV was assigned to the kth cluster and “0” otherwise. The strength/direction of the association of each phenotype with each cluster was then presented in a heatmap, where the “temperature” was defined by the direction of the regression coefficient γik and the corresponding −log10 P-value. Regression models were fitted using the glm function in R.
Cluster-specific associations of index SNVs with T2D.
We tested for association of T2D with index SNVs across clusters in a linear regression model, given by E(βj) = ∑k γkCjk, where Cjk is an indicator variable that takes the value “1” if the jth index SNV was assigned to the kth cluster and “0” otherwise, and weighted by the inverse of the variance of the allelic effect. We tested for heterogeneity in cluster effects on each endophenotype by comparing the deviance of this model with that of E(βj) = γ0, again weighted by the inverse of the variance of the allelic effect. Regression models were fitted using the glm function in R.
Cluster-specific associations of index SNVs with insulin-related endophenotypes and insulin resistance-related disorders.
We extracted association summary statistics for measures of glucose homeostasis derived from hyperinsulinemic-euglycemic clamp assessments and oral glucose tolerance tests (OGTT) performed by the GUARDIAN Consortium36, which were obtained from GWAS undertaken in up to 1,316 Mexican American participants without diabetes from the Mexican American Coronary Artery Disease (MACAD) study68 and the Hypertension and Insulin Resistance (HTN-IR) study69. The measures used were: insulin sensitivity (clamp-derived glucose infusion rate in 1,316 participants from MACAD and HTN-IR); insulin clearance (clamp-derived metabolic clearance rate of insulin in 1,261 participants from MACAD and HTN-IR); and insulin secretion (OGTT-derived area under the curve for insulin normalised for glucose from baseline to 30 minutes in 513 participants from MACAD). We also extracted association summary statistics for homeostatic model assessment measures of beta-cell function (HOMA-B) and insulin resistance (HOMA-IR) from published GWAS meta-analyses of up to 36,466 non-diabetic individuals of European ancestry from MAGIC37. Note that we did not include HOMA measures as part of the cardiometabolic phenotype clustering because they were imputed only up to reference panels from the International HapMap Consortium67, and thus included a high proportion of missing Z-scores across index SNVs that we anticipated would introduce too much noise with ClustImpute. We also extracted association summary statistics for insulin resistance-related disorders from published GWAS meta-analyses of: (i) 5,485 GDM cases and 347,856 female controls of diverse ancestry from the GenDIP Consortium39; and (ii) 10,074 PCOS cases and 103,164 female controls of European ancestry38.
For each endophenotype/disorder, we aligned the effect estimate to the T2D risk allele from the fixed-effects multi-ancestry meta-analysis, denoted βj for the jth index SNV. We then calculated the Z-score, given by Zj = βj/sj, where sj is the standard error of the effect estimate of the jth index SNV. We tested for association of each endophenotype with index SNVs across clusters in a linear regression model, given by E(Zj) = ∑k γkCjk, where Cjk is an indicator variable that takes the value “1” if the jth index SNV was assigned to the kth cluster and “0” otherwise. We tested for heterogeneity in cluster effects on each endophenotype by comparing the deviance of this model with that of E(Zj) = γ0. Regression models were fitted using the glm function in R.
Enrichment of T2D associations for cell type-specific regions of open chromatin within clusters.
For each T2D association signal, we defined “null” SNVs that mapped within 50kb of the index SNV and were not in LD (r2>0.05) with the index SNV in any of the five major ancestry groups using haplotypes from the 1000 Genomes Project (phase 3, October 2014 release)13. We defined an indicator variable, Yj, taking the value “1” if the jth SNV is an index SNV and “0” if the jth SNV is a null SNV. We mapped index SNVs and null SNVs to genic regions, as defined by the Ensembl Project (release 104)70, including protein-coding exons, and 3’ UTRs and 5’ UTRs. We defined indicator variables, , , and , that each take the value “1” if the jth SNV mapped to the respective genic annotation and “0” otherwise. We also mapped index SNVs and null SNVs to ATAC-seq peaks for 222 cell types derived from adult and foetal tissues via single-cell atlases of chromatin accessibility (CATLAS and DESCARTES)42,43. We defined an indicator variable, Xij, that takes the value “1” if the jth SNV mapped to an ATAC-seq peak for the ith cell type and “0” otherwise.
Within each cluster, we modelled enrichment of T2D associations for ATAC-seq peaks in the ith cell type, after accounting for genic annotations, in a Firth bias-reduced logistic regression, given by
where f is the logit link function. In this expression, α0 is an intercept, αEXON, α3UTR, and α5UTR are log-fold enrichments of genic annotations, and θi is the log-fold enrichment of ATAC-seq peaks in the ith cell type. We conducted a test of enrichment of the ith cell type by comparing the deviances of models in which θi = 0 and θi is unconstrained. We identified cell types with significant evidence of enrichment (P<0.00023, Bonferroni correction for 222 cell types). All models were fitted using the R package logistf (https://cran.r-project.org/package=logistf).
Difference in ancestry-correlated heterogeneity between mechanistic clusters.
We tested for differences in Z-scores (beta/SE) for association of index SNVs in each cluster with the ith axis of genetic variation by comparing two linear models via ANOVA: (i) f−1(Zij) = τ0i; and (ii) f−1(Zij) = ∑k τkiCjk. In these expressions: f is the identity link function; Zij is Z-score for the jth index SNV; Cjk is an indicator variable that takes the value “1” if the jth index SNV was assigned to the kth cluster and “0” otherwise; and τ0i and τki are regression coefficients. Regression models were fitted using the glm function in R.
Cluster-specific partitioned GRS analyses.
We tested for association of a cluster-specific partitioned GRS and an overall GRS with T2D-related macrovascular outcomes (CAD, ischemic stroke, and peripheral artery disease) and microvascular complications (ESDN and proliferative diabetic retinopathy) in participants from the All of Us Research Program (AoURP, multiple ancestry groups) and Biobank Japan (BBJ, East Asian ancestry).
AoURP cohort description, sequencing, quality control, and phenotype derivation.
We considered 78,260 participants with whole-genome sequencing (WGS) and electronic health record (EHR) data from the AoURP Controlled Tier Dataset v571,72. Details of the generation and quality control of the genomic data can be found in the AoURP Genomic Quality Report release C2021Q3R5 (https://support.researchallofus.org/hc/en-us/article_attachments/7002034709652/All_Of_Us_Beta_Release_Genomic_Quality_Report__4_.pdf). We used genetic ancestries defined in the AoURP Genomic Quality Report release C2021Q3R5. We removed related individuals in the maximal independent set (kinship score >0.1) as described in the AoURP Genomic Quality Report release C2021Q3R5. To reduce the computational burden of the WGS dataset, we considered only bi-allelic SNVs with MAF ≥0.1% across all participants. To correct for population structure, within each ancestry, we derived principal components using Hail v0.2.107 (https://hail.is/docs/0.2/index.html). In these calculations, we excluded SNVs that were not present in the 1000 Genomes Project (phase 3, October 2014 release) reference panel13. We also excluded SNVs with MAF <1% and deviation from Hardy-Weinberg equilibrium (P<10−6), and subsequently extracted autosomal LD-pruned SNVs (r2<0.05). Cases of T2D, T2D-related macrovascular outcomes, and microvascular complications were derived from the combination of diagnosis codes (ICD-9-CM and ICD-10-CM), drug exposures, and LOINC codes for laboratory test results, extracted from EHR data. Full details are provided in the Supplementary Methods.
BBJ cohort description, genotyping, quality control, and phenotype derivation.
BBJ is a multi-institutional hospital-based registry that comprises DNA and medical records from individuals of Japanese ancestry73,74. The first BBJ cohort comprises approximately 200,000 participants with at least one of 47 common diseases collected between 2003 and 2007. The second BBJ cohort comprises approximately 67,000 participants with at least one of 38 common diseases collected between 2013 and 2017. Physicians of 66 cooperating hospitals determined the eligibility of cases. Only those individuals who were not included in the multi-ancestry meta-analysis were considered for testing of the partitioned GRS.
Genomic DNA was prepared following standard protocols from peripheral blood samples and genotyped using the Illumina Asian Screening Array, following the manufacturer’s instructions. We excluded individuals with call rate <98% and outliers from the cluster of East Asian populations based on principal component analysis with reference individuals from Phase II HapMap67. We excluded SNVs with call rate <99%, MAC <5, exact Hardy-Weinberg equilibrium P<10−10, and >5% difference in MAF when compared with Japanese whole-genome sequence data58,75 and the Tohoku Medical Megabank Project76. After quality control, we performed pre-phasing using SHAPEIT477. Phased haplotypes were imputed to the combined reference panel of 1000 Genomes Project Phase 3 and Japanese whole-genome sequencing data from 1,037 individuals58,75 using Minimac478. We subsequently excluded individuals with a mismatch between inferred genetic sex and sex registered in clinical information, who were not in a set of unrelated individuals defined by using PLINK with king-cutoff <0.09375, or were outliers of heterozygosity rates (more than 5 SD from the mean). To correct for population structure, we derived principal components using PLINKv2.079, calculated from a set of autosomal LD-pruned SNVs (r2<0.1) with MAF ≥0.5% after excluding the major histocompatibility complex region.
We selected participants of at least 18 years of age for GRS analyses. We defined T2D cases as participants with a diagnosis of T2D, made by physicians at participating hospitals, but not type 1 diabetes, mitochondrial diabetes, maturity-onset diabetes of the young, or any other type of diabetes80. We extracted cases of microvascular complications from medical records in which diagnosis was made by physicians at participating hospitals. We defined controls for microvascular complications as T2D cases without any diagnosis of diabetic nephropathy or diabetic retinopathy. We defined CAD as a composite of stable angina, unstable angina, and myocardial infarction. These conditions, in addition to ischemic stroke and peripheral artery disease, were diagnosed by physicians at collaborating hospitals based on general medical practices following relevant guidelines.
GRS association analyses in AoURP and BBJ.
We conducted analyses separately for each ancestry group in AoURP (African, American, and European) and BBJ (East Asian). For each ancestry, we performed GRS analyses for T2D-related macrovascular outcomes using all individuals, irrespective of T2D status, and for microvascular complications in individuals with T2D only. For each analysis, we calculated overall and cluster-specific GRS for each individual, with each index SNV weighted by the allelic log-OR from the ancestry-specific meta-analyses. We did not include index SNVs with MAF <1% in the GRS. We also excluded index SNVs with poor imputation quality (r2<0.7) in BBJ, and those with extreme deviation from Hardy-Weinberg equilibrium (P<10−6) in AoURP. We standardised overall and cluster-specific GRS to have mean zero and unit variance. We tested for association with each outcome/complication via generalised linear regression, under a joint model including all eight cluster-specific GRS, and a model including only the overall GRS. We adjusted association analyses with macrovascular outcomes, conducted in all individuals, for T2D status. We adjusted association analyses with microvascular complications for duration of T2D. In AoURP and BBJ, we adjusted analyses for age, sex, and the first 20 principal components. In AoURP, we defined age as age at last hospital visit. In BBJ, we defined age as age at first record. In BBJ, we also adjusted for recruitment phase and status of the registered common diseases (other than T2D) to account for ascertainment. For CAD, we also conducted sensitivity analyses by including, as an additional covariate, a CAD GRS from the largest published multi-ancestry CAD GWAS49. The GRS was constructed from index SNVs for 241 conditionally independent CAD associations, weighted by the multi-ancestry allelic log-OR (ancestry-specific effects were not available), and standardised to have mean zero and unit variance. All association analyses were conducted using the glm function in R.
GRS multi-ancestry meta-analyses.
For each outcome, we aggregated association summary statistics from the partitioned GRS model and the overall GRS model across ancestries via multivariate fixed-effects meta-analysis (weighting by the inverse covariance matrix)81. We also tested for heterogeneity using the multivariate Cochran Q statistic81. All meta-analyses were conducted using the R package mvmeta (https://cran.r-project.org/package=mvmeta).
Supplementary Material
ACKNOWLEDGEMENTS
Central analyses were supported by Japan Agency for Medical Research and Development (JP21km0405213, JP20km0405202, and JP21tm0424218), NHGRI (HG011723), American Diabetes Association Innovative and Clinical Translational Award (1-19-ICTS-068), American Heart Association Postdoctoral Fellowship (15POST24470131 and 17POST33650016), American Diabetes Association (11-22-JDFPM-06), Corporal Michael J Crescenz VA Medical Center Research, NIDDK (DK126194 and DK105535), Versus Arthritis (21754), NIHR Manchester Biomedical Research Centre (NIHR203308). A complete list of acknowledgments and funding appears in the Supplementary Note.
We thank the International Consortium of Blood Pressure (ICBP) and the Meta-Analyses of Glucose and Insulin-related traits Consortium (MAGIC) for providing pre-publication access to GWAS summary statistics for blood pressure, proinsulin, and post-challenge insulin resistance measures.
The views expressed in this article are those of the authors and do not necessarily represent those of: the UK National Health Service, the UK National Institute for Health Research, or the UK Department of Health and Social Care; the US National Heart, Lung, and Blood Institute, the US National Institute of Neurological Disorders and Stroke, the US National Institute on Aging, the US National Institutes of Health, the US Department of Health and Human Services, the US Department of Veterans Affairs, the US Food and Drug Administration, or the US Government.
COMPETING INTERESTS
R.A.S. is now an employee of GlaxoSmithKline. G.T. is an employee of deCODE genetics/Amgen Inc. A.S.B. reports institutional grants from AstraZeneca, Bayer, Biogen, BioMarin, Bioverativ, Novartis, Regeneron and Sanofi. J.Danesh serves on scientific advisory boards for AstraZeneca, Novartis, and UK Biobank, and has received multiple grants from academic, charitable and industry sources outside of the submitted work. L.S.E. is now an employee of Bristol Myers Squibb. J.S.F. has consulted for Shionogi Inc. T.M.F. has consulted for Sanofi, Boehringer Ingelheim, and received funding from GlaxoSmithKline. H.C.G. holds the McMaster-Sanofi Population Health Institute Chair in Diabetes Research and Care; reports research grants from Eli Lilly, AstraZeneca, Merck, Novo Nordisk and Sanofi; honoraria for speaking from AstraZeneca, Boehringer Ingelheim, Eli Lilly, Novo Nordisk, DKSH, Zuellig, Roche, and Sanofi; and consulting fees from Abbott, AstraZeneca, Boehringer Ingelheim, Eli Lilly, Merck, Novo Nordisk, Pfizer, Sanofi, Kowa and Hanmi.lth Institute Chair in Diabetes Research and Care; reports research grants from Eli Lilly, AstraZeneca, Merck, Novo Nordisk and Sanofi; honoraria for speaking from AstraZeneca, Boehringer Ingelheim, Eli Lilly, Novo Nordisk, and Sanofi; and consulting fees from Abbott, AstraZeneca, Boehringer Ingelheim, Eli Lilly, Merck, Novo Nordisk, Janssen, Sanofi, and Kowa. M.Ingelsson is a paid consultant to BioArctic AB. R.L.-G. is a part-time consultant of Metabolon Inc. A.E.L. is now an employee of Regeneron Genetics Center LLC and holds shares in Regeneron Pharmaceuticals. M.A.N. currently serves on the scientific advisory board for Clover Therapeutics and is an advisor to Neuron23 Inc. S.R.P. has received grant funding from Bayer Pharmaceuticals, Philips Respironics and Respicardia. N.Sattar has consulted for or been on speakers bureau for Abbott, Amgen, Astrazeneca, Boehringer Ingelheim, Eli Lilly, Hanmi, Novartis, Novo Nordisk, Sanofi and Pfizer and has received grant funding from Astrazeneca, Boehringer Ingelheim, Novartis and Roche Diagnostics. V.S. is now an employee of deCODE genetics/Amgen Inc. A.M.S. receives funding from Seven Bridges Genomics to develop tools for the NHLBI BioData Catalyst consortium. U.T. is an employee of deCODE genetics/Amgen Inc. E.Ingelsson is now an employee of GlaxoSmithKline. B.M.P. serves on the Steering Committee of the Yale Open Data Access Project funded by Johnson & Johnson. R.C.W.M. reports research funding from AstraZeneca, Bayer, Novo Nordisk, Pfizer, Tricida Inc. and Sanofi, and has consulted for or received speakers fees from AstraZeneca, Bayer, Boehringer Ingelheim, all of which have been donated to the Chinese University of Hong Kong to support diabetes research. D.O.M.-K. is a part-time clinical research consultant for Metabolon Inc. S.Liu reports consulting payments and honoraria or promises of the same for scientific presentations or reviews at numerous venues, including but not limited to Barilla, by-Health Inc, Ausa Pharmed Co.LTD, Fred Hutchinson Cancer Center, Harvard University, University of Buffalo, Guangdong General Hospital and Academy of Medical Sciences, Consulting member for Novo Nordisk, Inc; member of the Data Safety and Monitoring Board for a trial of pulmonary hypertension in diabetes patients at Massachusetts General Hospital; receives royalties from UpToDate; receives an honorarium from the American Society for Nutrition for his duties as Associate Editor. K.Stefansson is an employee of deCODE genetics/Amgen Inc. M.I.M. has served on advisory panels for Pfizer, NovoNordisk and Zoe Global, has received honoraria from Merck, Pfizer, Novo Nordisk and Eli Lilly, and research funding from Abbvie, Astra Zeneca, Boehringer Ingelheim, Eli Lilly, Janssen, Merck, NovoNordisk, Pfizer, Roche, Sanofi Aventis, Servier, and Takeda; and is now an employee of Genentech and a holder of Roche stock. J.B.M. is an Academic Associate for Quest Diagnostics R&D. A.Mahajan is an employee of Genentech, and a holder of Roche stock.
CONSORTIA
VA Million Veteran Program. Kim M. Lorenz10,11,12, Kyung Min Lee155,156, Julie A. Lynch155,156, Philip S. Tsao234,286,287, Kyong-Mi Chang10,288, Marijana Vujkovic10,288,292, and Benjamin F. Voight10,11,12,277. Contributors to the VA Million Veteran Program are listed in the Supplementary Note.
AMED GRIFIN Diabetes Initiative Japan. Ken Suzuki1,2,3, Kyuto Sonehara3,16,17,18, Shinichi Namba3, Momoko Horikoshi128, Shiro Maeda128,176,177, Koichi Matsuda179, Yukinori Okada3,16,18,188, Nobuhiro Shojima2, Kenichi Yamamoto3,188,225, Toshimasa Yamauchi2, and Takashi Kadowaki2,283. Contributors to AMED GRIFIN Diabetes Initiative Japan are listed in the Supplementary Note.
Biobank Japan Project. Koichi Matsuda179. Contributors to the Biobank Japan Project are listed in the Supplementary Note.
Penn Medicine BioBank. Lindsay A. Guare124, Daniel J. Rader12,276,277,278, Marylyn D. Ritchie12,279,280, Marijana Vujkovic10,288,292, and Benjamin F. Voight10,11,12,277. Contributors to the Penn Medicine BioBank are listed in the Supplementary Note.
Regeneron Genetics Center. Contributors to the Regeneron Genetics Center are listed in the Supplementary Note.
eMERGE Consortium. Yoonjung Yoonie Joo62,63,64, Abel N. Kho148,149, and M. Geoffrey Hayes63,246,247. Contributors to the eMERGE Consortium are listed in the Supplementary Note.
International Consortium of Blood Pressure (ICBP). Zoha Kamali78,79. Contributors to ICBP are listed in the Supplementary Note.
Meta-Analyses of Glucose and Insulin-Related Traits Consortium (MAGIC). K Alaine Broadaway66, Alice Williamson34,77, Karen L. Mohlke66, and Claudia Langenberg34,272,273. Contributors to MAGIC are listed in the Supplementary Note.
Footnotes
Code availability. Analyses were conducted using publicly available software: UCSC liftOver tool (https://genome.ucsc.edu/cgi-bin/hgLiftOver), MR-MEGA (https://genomics.ut.ee/en/tools), METAL (https://genome.sph.umich.edu/wiki/METAL), PLINKv1.9 (https://www.cog-genomics.org/plink/1.9/), Hail v0.2.107 (https://hail.is/docs/0.2/index.html), SHAPEIT4 (https://odelaneau.github.io/shapeit4/), Minimac4 (https://genome.sph.umich.edu/wiki/Minimac4), PLINKv2.0 (https://www.cog-genomics.org/plink/2.0/). Analyses were also conducted using the following R packages: meta (https://cran.r-project.org/package=meta), ClustImpute (https://cran.r-project.org/package=ClustImpute), NbClust (https://cran.r-project.org/package=NbClust), logistf (https://cran.r-project.org/package=logistf), and mvmeta (https://cran.r-project.org/package=mvmeta).
Data availability.
Genome-wide association summary statistics from the multi-ancestry meta-analysis reported in this study will be made available through the DIAGRAM Consortium website (http://www.diagram-consortium.org/downloads.html) an acceptance of the manuscript.
REFERENCES
- 1.International Diabetes Federation. IDF Diabetes Atlas. 10th edn., Brussels, Belgium: (2021). https://diabetesatlas.org/. [Google Scholar]
- 2.McCarthy M. I. Painting a new picture of personalised medicine for diabetes. Diabetologia 60, 793–799 (2017). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 3.Pearson E. R. Type 2 diabetes: a multifaceted disease. Diabetologia 62, 1107–1112 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 4.Almgren P. et al. Heritability and familiality of type 2 diabetes and related quantitative traits in the Botnia Study. Diabetologia 54, 2811–2819 (2011). [DOI] [PubMed] [Google Scholar]
- 5.Vujkovic M. et al. Discovery of 318 new risk loci for type 2 diabetes and related vascular outcomes among 1.4 million participants in a multi-ancestry meta-analysis. Nat Genet 52, 680–691 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 6.Mahajan A. et al. Multi-ancestry genetic study of type 2 diabetes highlights the power of diverse populations for discovery and translation. Nat Genet 54, 560–572 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 7.Mahajan A. et al. Refining the accuracy of validated target identification through coding variant fine-mapping in type 2 diabetes. Nat Genet 50, 559–571 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 8.Udler M. S. et al. Type 2 diabetes genetic loci informed by multi-trait associations point to disease mechanisms and subtypes: A soft clustering analysis. PLoS Med 15, e1002654 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 9.Udler M. S., McCarthy M. I., Florez J. C. & Mahajan A. Genetic risk scores for diabetes diagnosis and precision medicine. Endocr Rev 40, 1500–1520 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 10.Taliun D. et al. Sequencing of 53,831 diverse genomes from the NHLBI TOPMed Program. Nature 590, 290–299 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 11.McCarthy S. et al. A reference panel of 64,976 haplotypes for genotype imputation. Nat Genet 48, 1279–1283 (2016). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 12.1000 Genomes Project Consortium et al. An integrated map of genetic variation from 1,092 human genomes. Nature 491, 56–65 (2012). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 13.1000 Genomes Project Consortium et al. A global reference for human genetic variation. Nature 526, 68–74 (2015). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 14.Mägi R. et al. Trans-ethnic meta-regression of genome-wide association studies accounting for ancestry increases power for discovery and improves fine-mapping resolution. Hum Mol Genet 26, 3639–3650 (2017). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 15.Lewis A. C. F. et al. Getting genetic ancestry right for science and society. Science 376, 250–252 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 16.Wojcik G. L. et al. Genetic analyses of diverse populations improves discovery for complex traits. Nature 570, 514–518 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 17.Rosenberg N. A. et al. Genome-wide association studies in diverse populations. Nat Rev Genet 11, 356–366 (2010). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 18.Sirugo G., Williams S. M. & Tishkoff S. A. The missing diversity in human genetic studies. Cell 177, 26–31 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 19.Uitterlinden A. G. Diversity in human genetics studies accelerates discovery and improves health care. Nat Rev Cardiol 19, 289–290 (2022). [DOI] [PubMed] [Google Scholar]
- 20.Loh M. et al. Identification of genetic effects underlying type 2 diabetes in South Asian and European populations. Commun Biol 5, 329 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 21.Warren H., Edwards T, Vaez A. & et al. Genome-wide analysis in over 1 million individuals reveals over 2,000 independent genetic signals for blood pressure. Preprint at: https://www.researchsquare.com/article/rs-1409164/v1 (2022).
- 22.Warrington N. M. et al. Maternal and fetal genetic effects on birth weight and their relevance to cardio-metabolic risk factors. Nat Genet 51, 804–814 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 23.Chen J. et al. The trans-ancestral genomic architecture of glycemic traits. Nat Genet 53, 840–860 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 24.Liu Y. et al. Genetic architecture of 11 organ traits derived from abdominal MRI using deep learning. Elife 10 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 25.Sulc J. et al. Composite trait Mendelian randomization reveals distinct metabolic and lifestyle consequences of differences in body shape. Commun Biol 4, 1064 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 26.Sakaue S. et al. A cross-population atlas of genetic associations for 220 human phenotypes. Nat Genet 53, 1415–1424 (2021). [DOI] [PubMed] [Google Scholar]
- 27.Graham S. E. et al. The power of genetic diversity in genome-wide association studies of lipids. Nature 600, 675–679 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 28.Agrawal S. et al. Inherited basis of visceral, abdominal subcutaneous and gluteofemoral fat depots. Nat Commun 13, 3771 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 29.Broadaway K. A. et al. Loci for insulin processing and secretion provide insight into type 2 diabetes risk. Am J Hum Genet 110, 284–299 (2023). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 30.Charrad M., Ghazzali N., Boiteau V. & Niknafs A. NbClust: an R package for determining the relevant number of clusters in a data set. J Stat Softw 61, 1–36 (2014). [Google Scholar]
- 31.Kim H. et al. High-throughput genetic clustering of type 2 diabetes loci reveals heterogeneous mechanistic pathways of metabolic disease. Diabetologia 66, 495–507 (2023). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 32.Strawbridge R. J. et al. Genome-wide association identifies nine common variants associated with fasting proinsulin levels and provides new insights into the pathophysiology of type 2 diabetes. Diabetes 60, 2624–2634 (2011). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 33.Ford E. S., Li C. & Sattar N. Metabolic syndrome and incident diabetes: current state of the evidence. Diabetes Care 31, 1898–1904 (2008). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 34.Marott S. C., Nordestgaard B. G., Tybjærg-Hansen A. & Benn M. Components of the metabolic syndrome and risk of type 2 diabetes. J Clin Endocrinol Metab 101, 3212–3221 (2016). [DOI] [PubMed] [Google Scholar]
- 35.Jo A. & Mainous A. G. 3rd. Informational value of percent body fat with body mass index for the risk of abnormal blood glucose: a nationally representative cross-sectional study. BMJ Open 8, e019200 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 36.Palmer N. D. et al. Genetic variants associated with quantitative glucose homeostasis traits translate to type 2 diabetes in Mexican Americans: The GUARDIAN (Genetics Underlying Diabetes in Hispanics) Consortium. Diabetes 64, 1853–1866 (2015). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 37.Dupuis J. et al. New genetic loci implicated in fasting glucose homeostasis and their impact on type 2 diabetes risk. Nat Genet 42, 105–116 (2010). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 38.Day F. et al. Large-scale genome-wide meta-analysis of polycystic ovary syndrome suggests shared genetic architecture for different diagnosis criteria. PLoS Genet 14, e1007813 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 39.Pervjakova N. et al. Multi-ancestry genome-wide association study of gestational diabetes mellitus highlights genetic links with type 2 diabetes. Hum Mol Genet 31, 3377–3391 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 40.Plows J. F., Stanley J. L., Baker P. N., Reynolds C. M. & Vickers M. H. The pathophysiology of gestational Diabetes Mellitus. Int J Mol Sci 19 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 41.Liu Q. et al. A genome-wide cross-trait analysis identifies shared loci and causal relationships of type 2 diabetes and glycaemic traits with polycystic ovary syndrome. Diabetologia 65, 1483–1494 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 42.Domcke S. et al. A human cell atlas of fetal chromatin accessibility. Science 370 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 43.Zhang K. et al. A single-cell atlas of chromatin accessibility in the human genome. Cell 184, 5985–6001.e5919 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 44.Bertrand P. P. & Bertrand R. L. Serotonin release and uptake in the gastrointestinal tract. Auton Neurosci 153, 47–57 (2010). [DOI] [PubMed] [Google Scholar]
- 45.Lund M. L. et al. Enterochromaffin 5-HT cells - A major target for GLP-1 and gut microbial metabolites. Mol Metab 11, 70–83 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 46.Finucane H. K. et al. Partitioning heritability by functional annotation using genome-wide association summary statistics. Nat Genet 47, 1228–1235 (2015). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 47.Akiyama M. et al. Genome-wide association study identifies 112 new loci for body mass index in the Japanese population. Nat Genet 49, 1458–1467 (2017). [DOI] [PubMed] [Google Scholar]
- 48.Kim J. A., Montagnani M., Koh K. K. & Quon M. J. Reciprocal relationships between insulin resistance and endothelial dysfunction: molecular and pathophysiological mechanisms. Circulation 113, 1888–1904 (2006). [DOI] [PubMed] [Google Scholar]
- 49.Aragam K. G. et al. Discovery and systematic characterization of risk variants and genes for coronary artery disease in over a million participants. Nat Genet 54, 1803–1815 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 50.Matsunaga H. et al. Transethnic meta-analysis of genome-wide association studies identifies three new loci and characterizes population-specific differences for coronary artery disease. Circ Genom Precis Med 13, e002670 (2020). [DOI] [PubMed] [Google Scholar]
- 51.Koyama S. et al. Population-specific and trans-ancestry genome-wide analyses identify distinct and shared genetic risk loci for coronary artery disease. Nat Genet 52, 1169–1177 (2020). [DOI] [PubMed] [Google Scholar]
- 52.Morris A. P. et al. Trans-ethnic kidney function association study reveals putative causal genes and effects on kidney-specific disease aetiologies. Nat Commun 10, 29 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 53.UK Prospective Diabetes Study (UKPDS) Group. Intensive blood-glucose control with sulphonylureas or insulin compared with conventional treatment and risk of complications in patients with type 2 diabetes (UKPDS 33). UK Prospective Diabetes Study (UKPDS) Group. Lancet 352, 837–853 (1998). [PubMed] [Google Scholar]
- 54.DiCorpo D. et al. Type 2 diabetes partitioned polygenic scores associate with disease outcomes in 454,193 individuals across 13 cohorts. Diabetes Care 45, 674–683 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 55.Mitt M. et al. Improved imputation accuracy of rare and low-frequency variants using population-specific high-coverage WGS-based imputation reference panel. Eur J Hum Genet 25, 869–876 (2017). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 56.Jónsson H. et al. Whole genome characterization of sequence diversity of 15,220 Icelanders. Sci Data 4, 170115 (2017). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 57.Moon S. et al. The Korea Biobank Array: Design and identification of coding variants associated with blood biochemical traits. Sci Rep 9, 1382 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 58.Akiyama M. et al. Characterizing rare and low-frequency height-associated variants in the Japanese population. Nat Commun 10, 4393 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 59.Gurdasani D. et al. The African Genome Variation Project shapes medical genetics in Africa. Nature 517, 327–332 (2015). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 60.Gurdasani D. et al. Uganda genome resource enables insights into population history and genomic discovery in Africa. Cell 179, 984–1002.e1036 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 61.Kurki M. I. et al. FinnGen: Unique genetic insights from combining isolated population and national health register data. Preprint at: https://www.medrxiv.org/content/medrxiv/early/2022/03/06/2022.03.03.22271360.full.pdf (2022).
- 62.Cook J. P., Mahajan A. & Morris A. P. Guidance for the utility of linear models in meta-analysis of genetic association studies of binary phenotypes. Eur J Hum Genet 25, 240–245 (2017). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 63.Bulik-Sullivan B. K. et al. LD Score regression distinguishes confounding from polygenicity in genome-wide association studies. Nat Genet 47, 291–295 (2015). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 64.Willer C. J., Li Y. & Abecasis G. R. METAL: fast and efficient meta-analysis of genomewide association scans. Bioinformatics 26, 2190–2191 (2010). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 65.Devlin B. & Roeder K. Genomic control for association studies. Biometrics 55, 997–1004 (1999). [DOI] [PubMed] [Google Scholar]
- 66.Purcell S. et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet 81, 559–575 (2007). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 67.International HapMap Consortium. The International HapMap Project. Nature 426, 789–796 (2003). [DOI] [PubMed] [Google Scholar]
- 68.Goodarzi M. O. et al. Determination and use of haplotypes: ethnic comparison and association of the lipoprotein lipase gene and coronary artery disease in Mexican-Americans. Genet Med 5, 322–327 (2003). [DOI] [PubMed] [Google Scholar]
- 69.Xiang A. H. et al. Evidence for joint genetic control of insulin sensitivity and systolic blood pressure in hispanic families with a hypertensive proband. Circulation 103, 78–83 (2001). [DOI] [PubMed] [Google Scholar]
- 70.Cunningham F. et al. Ensembl 2022. Nucleic Acids Res 50, D988–d995 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 71.All of Us Research Program Investigators et al. The “All of Us” Research Program. N Engl J Med 381, 668–676 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 72.Ramirez A. H. et al. The All of Us Research Program: Data quality, utility, and diversity. Patterns (N Y) 3, 100570 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 73.Nagai A. et al. Overview of the BioBank Japan Project: Study design and profile. J Epidemiol 27, S2–s8 (2017). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 74.Hirata M. et al. Cross-sectional analysis of BioBank Japan clinical data: A large cohort of 200,000 patients with 47 common diseases. J Epidemiol 27, S9–s21 (2017). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 75.Okada Y. et al. Deep whole-genome sequencing reveals recent selection signatures linked to evolution and disease risk of Japanese. Nat Commun 9, 1631 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 76.Tadaka S. et al. 3.5KJPNv2: an allele frequency panel of 3552 Japanese individuals including the X chromosome. Hum Genome Var 6, 28 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 77.Delaneau O., Zagury J. F., Robinson M. R., Marchini J. L. & Dermitzakis E. T. Accurate, scalable and integrative haplotype estimation. Nat Commun 10, 5436 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 78.Das S. et al. Next-generation genotype imputation service and methods. Nat Genet 48, 1284–1287 (2016). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 79.Chang C. C. et al. Second-generation PLINK: rising to the challenge of larger and richer datasets. Gigascience 4, 7 (2015). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 80.Committee of the Japan Diabetes Society on the Diagnostic Criteria of Diabetes Mellitus et al. Report of the committee on the classification and diagnostic criteria of diabetes mellitus. J Diabetes Investig 1, 212–228 (2010). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 81.Gasparrini A., Armstrong B. & Kenward M. G. Multivariate meta-analysis for non-linear and other multi-parameter associations. Stat Med 31, 3821–3839 (2012). [DOI] [PMC free article] [PubMed] [Google Scholar]
Associated Data
This section collects any data citations, data availability statements, or supplementary materials included in this article.
Supplementary Materials
Data Availability Statement
Genome-wide association summary statistics from the multi-ancestry meta-analysis reported in this study will be made available through the DIAGRAM Consortium website (http://www.diagram-consortium.org/downloads.html) an acceptance of the manuscript.