Abstract
OBJECTIVE
Type 2 diabetes (T2D) and its associated complications develop heterogeneously over decades, but few studies span the progression from prediabetes to clinical events. We investigated whether long-term metabolic trajectories beginning in prediabetes delineate subgroups with differential complication risk.
RESEARCH DESIGN AND METHODS
Clinical data from 1,732 Diabetes Prevention Program/Outcomes Study participants (follow-up 19 years) were analyzed across 12 phenotypes. Tensor decomposition was used to capture longitudinal patterns, and Gaussian mixture modeling was used to define longitudinal clusters. Cluster-specific complications were quantified with Cox and logistic regression.
RESULTS
Four clusters emerged. Clusters 1 and 2 (73% of participants) maintained stable glycemia, blood pressure, and lipids. Although 49% and 71%, respectively, developed T2D, cumulative micro- and macrovascular events remained low. Cluster 3 (12%) showed the steepest rise in insulin resistance and hyperglycemia, with 92% of the subgroup progressing to T2D and a markedly higher rate of retinopathy (odds ratio [OR] 8.8, 95% CI 3.9–20.1) and neuropathy (OR 3.4, 95% CI 2.1–5.5). Cluster 4 (15%) presented with baseline microalbuminuria often prior to the development of T2D (73%). It was distinguished by progressive estimated glomerular filtration rate decline and a doubling of cardiovascular events (hazard ratio 2.0, 95% CI 1.4–3.0), despite serum lipids comparable with other groups.
CONCLUSIONS
Two-thirds of individuals with prediabetes follow metabolically resilient trajectories, whereas distinct insulin-resistant or renal-dysfunction trajectories precede micro- or macrovascular complications, respectively. The optimal window for preventing macrovascular complications in individuals with prediabetes and microalbuminuria may precede progression to T2D.
Introduction
One in three Americans has prediabetes (1), but progression to type 2 diabetes (T2D) and its complications is highly variable (2). This variability is influenced by a complex interplay of genetic, environmental, and metabolic factors leading to diverse clinical outcomes (3). Some individuals may develop microvascular complications, such as retinopathy, nephropathy, and neuropathy, while others are more prone to macrovascular complications, such as cardiovascular disease, even without progression to T2D (4). Current standards of care partially address this clinical heterogeneity by leveraging additional biomarkers, such as the urine albumin-to-creatinine ratio (uACR), to personalize risk prediction for renal complications (5). However, up to 40% of individuals diagnosed with T2D who develop renal dysfunction do not have antecedent microalbuminuria (6). Further progress toward personalized care depends on our ability to identify unique risk profiles and disease trajectories (i.e., “subtypes”) in individuals who appear similar at baseline (7).
Current investigation into the subclassification of individuals with prediabetes and T2D supports the existence of subtypes with differential clinical risk profiles but is limited by reliance on cross-sectional “snapshots” of data. In a seminal study, Ahlqvist et al. (8) defined five subgroups of individuals with differing risks of future clinical complications by clustering clinical variables at the time of T2D diagnosis. This work has been replicated and augmented with additional clinical variables and outcomes, such as treatment response (9,10). Similarly, Wagner et al. (11) defined six subgroups of individuals with prediabetes with differing outcome risks by using a combination of clinical variables, MRI, and genetic information. These groups overlapped at the time of progression to diabetes with the clusters defined by Ahlqvist et al. (8), but only partially, highlighting the challenges of aligning cross-sectional data-defined subtypes across a multidecade trajectory of glycemic deterioration. This challenge is further corroborated by studies showing that subtype assignments based on a temporal “snapshot” are not stable over time even in the same individuals (10,12).
To address whether data-driven subtypes truly emerge and persist over time, we turned to the Diabetes Prevention Program and Outcomes Study (DPP/DPPOS), one of the few studies that prospectively followed individuals with prediabetes over multiple decades (13), providing the longitudinal depth needed to trace metabolic trajectories from prediabetes through incident T2D and its complications. We adapted tensor decomposition (14) to analyze multiple time-varying clinical phenotypes across 1,732 DPP/DPPOS participants to distinguish clinical trajectories over two decades and identify clusters of individuals with differential disease susceptibility and complication risk. Our findings highlight that approximately two-thirds of individuals are protected from T2D-related complications even if they develop T2D. The remaining one-third are differentially susceptible to micro- and macrovascular complications that are preceded by distinct trajectories of increasing insulin resistance and decreasing renal function, respectively.
Research Design and Methods
The DPP and DPPOS
Detailed information on the landmark DPP clinical trial and the DPPOS long-term follow-up study has been published previously (13,15). In brief, participants with prediabetes, defined by a 95–125 mg/dL fasting glucose level and a 140–199 mg/dL 2-h glucose after a 75-g oral load, were enrolled and randomized into lifestyle, metformin, or placebo interventions. Follow-up visits occurred every 6 months, and clinical phenotypes were measured yearly or twice-yearly for up to 25 years. These analyses were conducted under an approved DPP ancillary study (AS20-47: “Validating blood, urine, and phenotypic biomarkers of tissue specific insulin resistance and its clinical sequelae”).
DPP/DPPOS Data Preprocessing
To enable longitudinal comparison, clinical measurements from 3,003 DPP/DPPOS participants were aligned by time since randomization in the original DPP study (13). The following clinical phenotypes were extracted for analysis: fasting glucose, fasting insulin, hemoglobin A1c (A1C), HDL, LDL, total cholesterol, systolic blood pressure (SBP), diastolic blood pressure (DBP), estimated glomerular filtration rate (eGFR), uACR, BMI, and waist circumference. To be included in the subsequent analysis, participants had to have at least one measurement of each clinical phenotype at or after 19 years of follow-up (i.e., visit 38 onward). Participants with baseline clinical phenotype values >5 SDs from the mean were removed in keeping with prior conventions (8). These filtering and data sufficiency criteria resulted in a final study population of 1,732 individuals. On a per-individual basis, the resulting clinical phenotype data were smoothed and interpolated using Gaussian smoothing with a 1-visit (6 months) bandwidth (16). The uACR values were log-transformed to normalize their distribution. HDL values, for which higher levels are known to be metabolically beneficial, were inverted to facilitate downstream interpretation (17,18). eGFR values, for which higher levels likewise indicate better function, were inverted in the same way. Each of the clinical phenotypes was scaled by the Euclidean norm of all values available for that phenotype, according to previously established best practices (14).
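As an illustration of the smoothing and scaling steps, the following minimal sketch applies a Gaussian kernel-weighted average to one participant's measurements and then scales a set of values by their Euclidean norm. It assumes that "Gaussian smoothing with a 1-visit bandwidth" means a kernel-weighted mean with a standard deviation of one visit; the function names, the fallback for visits far from any measurement, and the example values are hypothetical and are not taken from the study code. The log transformation and the inversion of HDL and eGFR are omitted because their exact form is not specified here.

```python
import math

def gaussian_smooth(visits, values, grid, bandwidth=1.0):
    """Kernel-weighted average of measured values at each visit in `grid`.

    visits: visit indices at which the phenotype was measured
    values: measured values at those visits
    grid: visit indices at which a smoothed value is wanted
    bandwidth: kernel standard deviation in visits (1 visit = 6 months)
    """
    smoothed = []
    for t in grid:
        weights = [math.exp(-0.5 * ((t - v) / bandwidth) ** 2) for v in visits]
        total = sum(weights)
        if total == 0.0:
            # Assumption: if every weight underflows, use the nearest measurement.
            nearest = min(range(len(visits)), key=lambda i: abs(t - visits[i]))
            smoothed.append(values[nearest])
        else:
            smoothed.append(sum(w * x for w, x in zip(weights, values)) / total)
    return smoothed

def scale_by_euclidean_norm(values):
    """Divide every value by the Euclidean norm of the full list."""
    norm = math.sqrt(sum(x * x for x in values))
    return [x / norm for x in values]

# Hypothetical participant: fasting glucose measured yearly (visits 0, 2, 4).
measured_visits = [0, 2, 4]
measured_glucose = [100.0, 104.0, 108.0]
grid = list(range(5))  # estimate every 6-month visit from 0 to 4

smoothed = gaussian_smooth(measured_visits, measured_glucose, grid)
scaled = scale_by_euclidean_norm(smoothed)
```

In the study the norm is taken over all values available for a phenotype, so in practice `scale_by_euclidean_norm` would be applied to the pooled values across participants and visits rather than to one participant's series as in this toy example.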
Tensor Decomposition
We analyzed a three-dimensional data set consisting of 1,732 individuals, 12 clinical phenotypes, and 38 time points. Since many clustering methods are optimized for two-dimensional data, we applied dimensionality reduction to facilitate cluster analysis while preserving key features. Specifically, we used nonnegative Parallel Factor (PARAFAC) decomposition, a form of tensor factorization similar to principal component analysis for matrices (14,19,20). We selected the smallest number of factors that kept the reconstruction error <10%, resulting in a rank, R, of 12 (Supplementary Fig. 1A and Supplementary Methods). Each factor represents a set of one-dimensional vectors corresponding to individuals, phenotypes, and time points, allowing us to identify any meaningful patterns in the data. The decomposition was performed using nonnegative PARAFAC with hierarchical alternating least squares from TensorLy 0.8.1, with random initialization and up to 1,000 iterations. Further methodological details are provided in the Supplementary Methods.
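The decomposition and the rank-selection rule can be written compactly as follows. The symbols a, b, and c for the individual, phenotype, and time factors are introduced here for illustration only, and the relative Frobenius-norm form of the reconstruction error is an assumption; the definition actually used is given in the Supplementary Methods.

```latex
\[
\mathcal{X}_{ijk} \;\approx\; \hat{\mathcal{X}}^{(R)}_{ijk}
  \;=\; \sum_{r=1}^{R} a_{ir}\, b_{jr}\, c_{kr},
\qquad a_{ir},\; b_{jr},\; c_{kr} \ge 0,
\]
\[
i = 1,\dots,1732 \ (\text{individuals}), \quad
j = 1,\dots,12 \ (\text{phenotypes}), \quad
k = 1,\dots,38 \ (\text{time points}),
\]
\[
\varepsilon(R) \;=\;
  \frac{\lVert \mathcal{X} - \hat{\mathcal{X}}^{(R)} \rVert_F}{\lVert \mathcal{X} \rVert_F},
\qquad
R^{*} \;=\; \min\{\, R : \varepsilon(R) < 0.10 \,\} \;=\; 12 .
\]
```

Under this notation, the matrix passed to clustering is the 1,732 × 12 array of individual loadings a.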
Clustering
Patient factors from the above tensor decomposition were concatenated into a matrix with individual × rank dimensions. Clustering on this matrix was performed using Gaussian mixture modeling (GMM) (scikit-learn 1.3.2). To identify the optimal cluster number (k), we assessed cluster stability using the adjusted Rand index (ARI). The optimal cluster number (k = 4) was chosen as the largest value of k with median ARI >0.75 over 200 bootstrap samples (Supplementary Fig. 1B), following the standards in other studies (9,21). Each individual was assigned a cluster label that corresponded to their most probable cluster, which was used for visualization and tabulation. Cluster probabilities are displayed in Fig. 2B. To evaluate cluster association with the incidence of metabolic complications, the full set of cluster probabilities for each individual was used in time-to-event and logistic regression analysis. Bar plots comparing cluster differences of clinical phenotypes used z-score normalized data from baseline.
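The stability criterion can be illustrated with a small, dependency-free sketch: one function computes the adjusted Rand index between two labelings, and another applies the rule "largest k with median ARI above 0.75". The function names and the toy ARI values below are hypothetical and are not results from the study; the bootstrap resampling and GMM fitting themselves are not shown.

```python
from collections import Counter
from math import comb
from statistics import median

def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two labelings of the same items."""
    n = len(labels_a)
    if n < 2 or n != len(labels_b):
        raise ValueError("need two labelings of the same length (>= 2)")
    # Pairs placed together in both labelings, in A only, and in B only.
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = 0.5 * (sum_a + sum_b)
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both labelings put everything together
    return (sum_cells - expected) / (max_index - expected)

def select_k(ari_by_k, threshold=0.75):
    """Largest k whose median bootstrap ARI exceeds the threshold, else None."""
    stable = [k for k, aris in ari_by_k.items() if median(aris) > threshold]
    return max(stable) if stable else None

# Toy ARI values per k, for illustration only (not taken from the study).
toy_ari_by_k = {
    2: [0.90, 0.95, 0.92],
    3: [0.80, 0.85, 0.83],
    4: [0.76, 0.80, 0.78],
    5: [0.50, 0.60, 0.55],
}
chosen_k = select_k(toy_ari_by_k)
```

The ARI is invariant to relabeling of clusters, which is why it suits bootstrap comparisons in which the numbering of mixture components can change from one fit to the next.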
Figure 2.
Clustering of longitudinal metabolic trajectories and temporal trends by cluster. A: Individual loading vectors produced by the tensor decomposition were visualized in two dimensions using the Uniform Manifold Approximation and Projection (UMAP) algorithm with hyperparameters set to a minimum distance = 0.5 and neighbors = 10. Each point representing 1 of the 1,732 individuals was colored by their associated cluster assignment (n = 544, n = 714, n = 215, and n = 259). B: Visualization of the cluster probabilities per individual grouped by assigned cluster. Each subplot represents a cluster, with title indicating cluster label. Each stacked vertical bar on the subplot represents an individual’s probabilities for belonging to each of the four clusters. Individuals are assigned to the subplot (i.e., cluster) for which they have the highest probability. Most individuals have a high probability for a single cluster, but some individuals (indicated by multicolored bars) could belong to multiple clusters. C: Each subplot shows clinical phenotype trajectories by visit, where each visit indicates a 6-month interval. The average temporal trajectories computed from the smoothed and interpolated data for each phenotype are shown colored by cluster. Some phenotypes show stable trends over time (e.g., BMI, waist circumference), whereas others vary temporally in specific clusters (e.g., increasing fasting insulin in cluster 3). Some phenotypes vary at baseline and differences persist over time (e.g., uACR), whereas other phenotypes are similar at baseline but exhibit differing trajectories over time (e.g., fasting glucose, A1C). DBP, diastolic blood pressure; glucose, fasting glucose; insulin, fasting insulin; SBP, systolic blood pressure; TC, total cholesterol; waist C, waist circumference.
Outcomes Analysis
Outcome incidence differences between clusters were compared for T2D diagnosis, renal dysfunction, retinopathy, neuropathy, and extended major adverse cardiovascular events (eMACE), which included cardiovascular death, nonfatal stroke, nonfatal myocardial infarction, hospitalized congestive heart failure, hospitalized unstable angina, coronary revascularization, peripheral revascularization, coronary heart disease by angiography, and silent myocardial infarction by electrocardiogram (22). Diabetes diagnosis and eMACE were obtained from adjudicated DPP/DPPOS records. Renal dysfunction was defined as two occurrences of eGFR <60 mL/min/1.73 m² within 1.5 years. Time to event was tabulated as time since randomization and visualized using a Kaplan-Meier estimator (scikit 1.3.2). Retinopathy was defined as a score >20 on the Early Treatment Diabetic Retinopathy Study (ETDRS) scale, which corresponds to mild nonproliferative diabetic retinopathy (23). Neuropathy, defined by signs and symptoms (full definition in the Supplementary Methods), was ascertained only once, during DPPOS visit 17A (∼23 years after randomization) (24), and thus was analyzed by multivariate logistic regression.
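The renal dysfunction definition can be made concrete with a short sketch. It assumes that the event time is the time of the second (confirming) low measurement and that intervening normal values do not reset the window; neither detail is stated in the text, and the function name and example values are hypothetical.

```python
def renal_dysfunction_time(times_years, egfr_values, threshold=60.0, window_years=1.5):
    """Return the time of the confirming low eGFR measurement, or None.

    An event requires two measurements below `threshold` (mL/min/1.73 m^2)
    that are no more than `window_years` apart.
    """
    low_times = [t for t, e in sorted(zip(times_years, egfr_values)) if e < threshold]
    # If any two low measurements fall within the window, so does some
    # consecutive pair, so checking neighbors is sufficient.
    for earlier, later in zip(low_times, low_times[1:]):
        if later - earlier <= window_years:
            return later  # assumption: event dated at the confirming measurement
    return None

# Hypothetical participant with yearly eGFR values.
event_time = renal_dysfunction_time(
    [0, 1, 2, 3, 4],
    [90.0, 58.0, 75.0, 59.0, 57.0],
)
# Lows at years 1, 3, and 4: years 1 and 3 are 2 years apart (no event),
# years 3 and 4 are 1 year apart, so the event is dated at year 4.
```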
Formal statistical testing was not performed for diabetes incidence or renal dysfunction to avoid circularity, as these diagnoses are based on serum glucose and eGFR, both of which were used in cluster identification. For the withheld outcomes, Cox regression was performed for eMACE, and logistic regression was performed for retinopathy and neuropathy. Both methods were adjusted for age, sex, treatment at randomization, smoking status (defined as ever having reported smoking), and medication indicators for lipid-lowering medication, antihypertensive medication, and glucose-lowering medication (all medication indicators are defined as ever reported taking medication). Statistical comparisons were performed using cluster 1 as the baseline, and the significance threshold was set at P < 0.017 (P < 0.05/3), corresponding to a Bonferroni correction of the number of pairwise comparisons performed (each of the three clusters compared with cluster 1). For the neuropathy analysis, 143 individuals without neuropathy measurements were excluded; these individuals did not segregate disproportionately into any cluster (P = 0.12 by χ²).
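The Bonferroni arithmetic and the significance tiers used in the Fig. 3 legend can be expressed as a brief sketch. The function names are hypothetical, and the example P values passed to the labeling function are arbitrary illustrations rather than study results.

```python
def bonferroni_threshold(alpha, n_comparisons):
    """Per-comparison significance threshold under Bonferroni correction."""
    return alpha / n_comparisons

def significance_label(p_value, threshold):
    """Map a P value to the tiers in the Fig. 3 legend."""
    if p_value < 1e-6:
        return "***"
    if p_value < 0.001:
        return "**"
    if p_value < threshold:
        return "*"
    return "ns"

# Three pairwise comparisons (clusters 2, 3, and 4, each versus cluster 1).
threshold = bonferroni_threshold(0.05, 3)  # 0.0166..., reported as 0.017
```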
Comparison With Cross-sectional Clustering
The longitudinal clusters obtained were compared with two cross-sections: baseline (composed of the values at the first visit) and end (composed of the values at 19 years after randomization). For the baseline and end cross-sections, clustering with GMM was performed with k = 4 to match the number of longitudinal clusters. The cluster similarity between the four longitudinal clusters, baseline, and end clustering was measured using ARI and the Jaccard Index (25) and was reported in Sankey plots generated using the online webpage SankeyMATIC (sankeymatic.com).
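For readers unfamiliar with the Jaccard Index as a clustering concordance measure, the sketch below shows one common variant, computed over pairs of participants. Whether the analysis used this pair-counting form or a cluster-wise form is not stated here, so the choice is an assumption; the function name and toy labelings are hypothetical.

```python
from itertools import combinations

def pair_jaccard(labels_a, labels_b):
    """Pair-counting Jaccard index between two clusterings.

    Counts pairs of items placed in the same cluster by both clusterings,
    divided by pairs placed together by at least one of them.
    """
    both = 0
    either = 0
    for i, j in combinations(range(len(labels_a)), 2):
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        both += int(same_a and same_b)
        either += int(same_a or same_b)
    # If no pair is ever grouped together, the clusterings agree trivially.
    return both / either if either else 1.0

# Toy labelings of four participants (illustration only).
partial_overlap = pair_jaccard([0, 0, 1, 1], [0, 0, 1, 2])
no_overlap = pair_jaccard([0, 0, 1, 1], [0, 1, 0, 1])
```

Unlike the ARI, this index is not corrected for chance agreement, which is one reason the two metrics are usefully reported side by side.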
Data and Resource Availability
In accordance with the National Institutes of Health Public Access Policy, we continue to provide all manuscripts to PubMed Central, including this article. For DPP/DPPOS, protocols and lifestyle and medication intervention manuals have been provided to the public through the public website www.dppos.org. The DPPOS abides by the National Institute of Diabetes and Digestive and Kidney Diseases data-sharing policy and implementation guidance as required by the National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health (https://repository.niddk.nih.gov/studies/dppos/).
Results
To analyze longitudinal metabolic trajectories, we aggregated and interpolated data for 12 clinical phenotypes encompassing glycemic parameters, serum lipids, renal function, and body weight collected biannually (38 visits over 19 years) (Fig. 1A). HDL cholesterol and eGFR values were inverted to harmonize directions-of-effect with other phenotypes (e.g., increased glucose, insulin, and LDL represent poorer metabolic health) to facilitate longitudinal comparison and interpretability in downstream analysis. We applied tensor decomposition to the resulting three-dimensional matrix consisting of 789,792 individual-phenotype-time data points to reduce the dimensionality of the data (Fig. 1B). Profiling tensor decompositions over different numbers of factors (rank) identified that a rank of 12 was optimal to recreate the full data (with reconstruction error <10%) (Supplementary Fig. 1A). The dimensionality reduction resulted in a representation of the data with 12 loading vectors corresponding to each axis of data: individual participants, clinical phenotypes, and time.
Figure 1.
Dimensionality reduction and clustering of longitudinal metabolic trajectories of DPP/DPPOS participants over 19 years of follow-up. A: The DPP/DPPOS data were interpolated using Gaussian smoothing and restructured into a three-dimensional tensor with 1,732 participants × 19 years (38 time points) × 12 phenotypes, resulting in 789,792 data points to be analyzed. DBP, diastolic blood pressure; glucose, fasting glucose; insulin, fasting insulin; SBP, systolic blood pressure; TC, total cholesterol; waist C, waist circumference. B: Tensor decomposition using PARAFAC was performed to reduce the dimensionality of the data. The reduced dimensional representation consists of a series of loading vectors corresponding to each of the original dimensions: individuals, clinical phenotypes, and time. Twelve sets of loading vectors (i.e., rank) optimally reconstruct the original data. A matrix composed of the individuals’ loading vectors over all 12 ranks was extracted for downstream clustering. C: Participant loading vectors were clustered using GMM. UMAP, Uniform Manifold Approximation and Projection. D: Participant clusters from C were used to compute cluster-specific phenotype trajectories from A. E: Participant clusters from C were used to analyze diabetes incidence and the occurrence of clinical complications. Validation outcomes include diabetes diagnosis and renal dysfunction. Withheld outcomes include eMACE, retinopathy, and neuropathy.
The loading vectors for individuals were aggregated into a two-dimensional matrix (12 ranks × 1,732 individuals) and clustered using GMM (Fig. 1C), a soft clustering technique that assigns each individual a probability of belonging to each cluster. Resampling analyses showed that four clusters maintained optimal stability (ARI >0.75) (Supplementary Fig. 1B). Initial inspection of the four clusters showed that they partitioned the analysis cohort (n = 1,732) asymmetrically, with the largest cluster containing more than three times as many participants as the smallest cluster (n = 714 vs. n = 215) (Table 1 and Fig. 2A). Analysis of the full range of soft clustering probabilities per individual indicated that most individuals were well placed (Fig. 2B). Thus, for discussion and visualization, we assigned each individual to the cluster with maximum probability. None of the clusters differed remarkably in baseline enrollment characteristics with regard to sex, intervention arm assignment, or smoking status (Table 1 and Supplementary Table 1). These clusters were further analyzed for patterns of co-occurring clinical phenotypes and susceptibility to micro- and macrovascular complication events over the 19-year follow-up period (Fig. 1D and E).
Table 1.
Characteristics of DPP/DPPOS participants analyzed by cluster
| Characteristic | Total (N = 1,732) | Cluster 1 (n = 544) | Cluster 2 (n = 714) | Cluster 3 (n = 215) | Cluster 4 (n = 259) |
|---|---|---|---|---|---|
| Baseline demographics | |||||
| Age (years) | 49.9 (9.1) | 53.0 (9.0) | 47.2 (8.0) | 46.1 (7.6) | 53.9 (9.7) |
| Women, n (%) | 1,186 (68) | 384 (71) | 494 (69) | 144 (67) | 164 (63) |
| White, n (%) | 949 (55) | 322 (59) | 397 (56) | 105 (49) | 125 (48) |
| Baseline clinical phenotypes | |||||
| Fasting glucose (mg/dL) | 106.6 (8.1) | 105.2 (7.9) | 106.7 (8.0) | 108.1 (8.8) | 107.8 (7.7) |
| Fasting insulin (µU/mL) | 25.8 (13.9) | 17.2 (7.8) | 28.3 (12.3) | 38.9 (17.2) | 26.3 (13.5) |
| A1C (%) | 5.9 (0.5) | 5.9 (0.5) | 5.8 (0.5) | 5.9 (0.5) | 6.0 (0.5) |
| A1C (mmol/mol) | 40.9 (5.4) | 40.8 (5.3) | 40.3 (5.1) | 41.3 (5.9) | 42.3 (5.6) |
| HDL (mg/dL) | 46.0 (11.9) | 51.3 (11.9) | 43.4 (10.2) | 40.7 (9.7) | 46.5 (13.4) |
| LDL (mg/dL) | 125.7 (32.1) | 128.3 (31.9) | 125.2 (31.0) | 121.2 (34.2) | 125.1 (33.5) |
| Total cholesterol (mg/dL) | 204.4 (35.0) | 208.7 (34.7) | 203.0 (34.3) | 200.1 (34.7) | 202.8 (36.7) |
| SBP (mmHg) | 122.7 (14.1) | 121.0 (13.3) | 120.8 (13.3) | 125.0 (14.3) | 129.9 (15.4) |
| DBP (mmHg) | 78.3 (9.4) | 76.6 (8.8) | 78.1 (9.1) | 79.7 (10.1) | 80.8 (10.0) |
| eGFR (mL/min/1.73 m²) | 98.9 (14.5) | 96.6 (13.7) | 101.7 (13.0) | 102.2 (15.2) | 93.4 (16.5) |
| uACR (mg/g) | 12.7 (32.2) | 7.1 (11.1) | 6.4 (5.4) | 16.2 (36.7) | 38.9 (68.1) |
| BMI (kg/m²) | 33.6 (6.3) | 30.0 (4.2) | 34.9 (6.0) | 37.3 (7.0) | 34.4 (6.3) |
| Waist circumference (cm) | 103.8 (13.9) | 95.7 (10.3) | 106.6 (12.9) | 111.5 (15.8) | 107.0 (13.3) |
| Longitudinal outcomes | |||||
| Diabetes diagnosis, n (%) | 1,161 (67) | 266 (49) | 509 (71) | 198 (92) | 188 (73) |
| Renal dysfunction, n (%) | 262 (15) | 78 (14) | 25 (4) | 43 (20) | 116 (45) |
| eMACE, n (%) | 239 (14) | 65 (12) | 76 (11) | 24 (11) | 74 (29) |
| Retinopathy, n (%) | 93 (5) | 12 (2) | 22 (3) | 40 (19) | 19 (7) |
| Neuropathy, n (%) | 348 (20) | 91 (17) | 126 (18) | 68 (32) | 63 (24) |
Data are presented as mean (SD) for continuous variables or n (%) for categorical variables. DBP, diastolic blood pressure. Age represents age at randomization. Clinical phenotype values are computed for baseline measurements at DPP initiation. Outcomes are tabulated across the entire follow-up period. Renal dysfunction is defined as two occurrences of eGFR <60 mL/min/1.73 m² within 1.5 years. Retinopathy is defined as >20 on the ETDRS scale. Percentages are rounded.
Cluster 1 (n = 544) at baseline was notable for having the lowest initial BMI (mean 30.0 kg/m²), waist circumference (mean 95.7 cm), and fasting insulin (mean 17.2 µU/mL) and the highest baseline HDL cholesterol (mean 51.3 mg/dL) and total cholesterol (mean 208.7 mg/dL) (Table 1 and Supplementary Fig. 2). These trends persisted over the 19-year follow-up period (Fig. 2C) and can be confirmed in the raw clinical data (lacking smoothing and interpolation) (Supplementary Fig. 3). There was substantial variability at the individual level, underscoring the importance of multiphenotype analysis (Supplementary Fig. 4A). Unsurprisingly, cluster 1 had the lowest cumulative incidence of T2D among the four clusters (49% compared with 71%, 92%, and 73%). With regard to complications, this cluster had a lower aggregate cumulative occurrence of microvascular complications (renal dysfunction, retinopathy, and neuropathy) and macrovascular complications (eMACE) than the cohort average (Table 1 and Fig. 3).
Figure 3.
Progression to T2D and complications by cluster. A–C: Kaplan-Meier curves for incidence of diabetes and complications. Each cluster is represented by a different colored line. Each visit indicates a 6-month interval. Values below indicate counts (dropout). A: Probability of diabetes diagnosis. B: Probability of renal dysfunction defined as two occurrences of eGFR <60 mL/min/1.73 m² within 1.5 years. C: Probability of eMACE. Cluster 1 vs. cluster 4 show significant differences in the eMACE hazard ratio based on Cox regression. D: Bar chart of the proportion of retinopathy by cluster. Significant differences in the odds ratio were observed in cluster 1 vs. 3 and cluster 1 vs. 4 by logistic regression. E: Bar chart of the proportion of neuropathy by cluster. Significant differences in the odds ratio were observed in cluster 1 vs. 3 by logistic regression. ***P < 10⁻⁶; **P < 0.001; *P < 0.017; ns, not significant. All statistical tests included the following covariate adjustments: age, sex, treatment, smoking status, lipid-lowering medication, antihypertensive medication, and glucose-lowering medication.
Cluster 2 (n = 714) was unremarkable at baseline, with baseline clinical phenotypes indistinguishable from the entire cohort. These phenotypes (e.g., BMI) remained stable over the 19-year follow-up period (Fig. 2C). BMI was elevated compared with cluster 1. Despite a relatively high incidence of T2D (71%), participants in this cluster had lower than average occurrences of both micro- and macrovascular complications (Table 1 and Fig. 3).
Cluster 3 (n = 215) was notable for having a baseline elevated BMI (cluster mean 37.3 kg/m²), waist circumference (mean 111.5 cm), fasting insulin (mean 38.9 µU/mL), and the lowest baseline HDL cholesterol (mean 40.7 mg/dL); these are features that classically meet clinical criteria for metabolic syndrome (26). The temporal trends for this cluster showed a progressive increase in fasting insulin and glycemia (Fig. 2C). As expected, cluster 3 had the highest cumulative incidence of T2D (92%) and higher rates of microvascular complications. This included large increases in the risk for mild or greater (ETDRS >20) retinopathy (cluster 3: odds ratio 8.84, 95% CI 3.88–20.13, P < 10⁻⁶) (Fig. 3D) and neuropathy (cluster 3: odds ratio 3.42, 95% CI 2.14–5.49, P < 10⁻⁶) (Fig. 3E), as well as a higher occurrence of renal dysfunction (20% vs. 15% cohort average) (Table 1). Remarkably, cluster 3 did not have elevated macrovascular risk compared with the other clusters (eMACE 11%) (Fig. 3C).
Cluster 4 (n = 259) was notable for having baseline microalbuminuria (uACR mean 38.9 mg/g), lower eGFR (mean 93.4 mL/min/1.73 m²), and increased blood pressure (SBP 129.9 mmHg). Over the follow-up period, albuminuria increased and renal function declined more rapidly compared with other clusters (Fig. 2C). SBP trajectories varied over time but remained the highest in cluster 4. The cumulative incidence of T2D (73%) and glycemic control trajectories were similar to cluster 2. The cumulative incidence of renal dysfunction was the highest among all the clusters (45% Stage 3a; eGFR <60 mL/min/1.73 m²), an expected result given the other observed temporal patterns. With regard to other microvascular complications, cluster 4 had a slightly increased risk for retinopathy (cluster 4: odds ratio 3.05, 95% CI 1.29–7.19, P < 0.017) (Fig. 3D). Notably, cluster 4 had the highest risk for incident macrovascular complications (eMACE cluster 4: hazard ratio 2.02, 95% CI 1.37–2.97, P < 0.001) (Fig. 3C), with almost one-third of individuals in this cluster experiencing a cardiovascular disease (CVD) event during the follow-up period. Despite having the highest risk of CVD, cluster 4 did not have elevated LDL (mean 125.1 mg/dL) or total cholesterol (mean 202.8 mg/dL), particularly when compared with other clusters (Table 1).
As longitudinal data are challenging to collect and more technically demanding to analyze, we compared our longitudinal clustering in the DPP/DPPOS with cross-sectional clustering to evaluate whether these clusters could have been identified with cross-sectional data. Using the same participants and clinical phenotypes at two time points, baseline (before randomization to the DPP study interventions) and the end of the study (19 years after randomization), we clustered data into four clusters and conducted pairwise comparisons between baseline, year 19, and longitudinal clusterings (Supplementary Fig. 5) using two concordance metrics: the ARI (9,21) and Jaccard Index (8). Minimal cluster similarity was observed between the longitudinal and baseline clusterings (ARI = 0.11, Jaccard Index = 0.24) (Supplementary Fig. 5A) and between the year 19 and longitudinal clusterings (ARI = 0.18, Jaccard Index = 0.21) (Supplementary Fig. 5C). The lowest concordance was observed between the two cross-sectional clusterings taken 19 years apart on the same participants (ARI = 0.04, Jaccard Index = 0.19) (Supplementary Fig. 5B). The observed low concordance between clustering strategies indicated that longitudinal trends could not be captured by selective cross-sectional clustering.
Conclusions
In this study, we leveraged longitudinal clinical data and unsupervised clustering approaches to identify distinct subgroups of individuals with prediabetes with differential risks of progression to T2D and related complications. We find that two-thirds of individuals with prediabetes (clusters 1 and 2) are relatively protected from diabetes-associated complications and exhibit metabolic resilience over two decades. The remaining one-third segregate into groups with increased susceptibility to micro- (cluster 3) and macrovascular (cluster 4) complications. These micro- and macrovascular complication-susceptible clusters were preceded by worsening trajectories of insulin resistance/hyperglycemia and microalbuminuria, respectively. Our study and the clusters we report extend previous investigations using cross-sectional data in prediabetes (11) or at the time of diabetes diagnosis (8) in a multiethnic, prospectively ascertained cohort. Our longitudinal approach resolves the methodological incompatibility of aligning clusters defined from cross-sectional data enabling a comprehensive view of patterns of biomarker change across metabolic disease progression. These temporal patterns reveal both metabolic resilience over decades in individuals despite obesity and, for complication-susceptible subgroups, a window for preventative intervention prior to the diagnosis of T2D.
A remarkable finding of our study is that clusters 1 and 2, representing over two-thirds of randomized individuals, demonstrated protection from metabolic complications over time, even though half or more of the individuals in those clusters progressed to T2D (Table 1). Individuals in these clusters showed metabolic resilience, maintaining stable blood pressure, lipids, and glycemic control for over two decades (Fig. 2C). Cluster 1 had increased mean age and lower insulin levels than cluster 2, and cluster 2 had a higher average BMI, prompting the descriptive labels “metabolically stable, insulin sensitive” (cluster 1) and “mild obesity related” (cluster 2). These clusters, especially cluster 2, seem to capture individuals with metabolically healthy obesity, a phenotype where excess adiposity coexists with preserved cardiometabolic health (27). Concordantly, their cumulative incidence of kidney disease and CVD was similar or lower than the prevalence of chronic kidney disease (28) and cardiovascular disease (29) in middle-aged U.S. adults. Clinically, this supports standard preventive approaches rather than aggressive interventions.
Conversely, individuals in clusters 3 and 4 exhibited significant differences in their susceptibility to micro- and macrovascular complications. At baseline, cluster 3 showed indistinguishable A1C and at most a 3 mg/dL increase in fasting glucose compared with the other clusters (Table 1). Over time, however, cluster 3 exhibited a trajectory of increasing hyperglycemia and insulin resistance, and ultimately had the highest rates of retinopathy and neuropathy. Microalbuminuria indicative of diabetic nephropathy was also observed but occurred later, almost 10 years after randomization, after most cluster 3 participants had already been diagnosed with T2D (Fig. 2C and Fig. 3B). Thus, we labeled cluster 3 “progressive insulin resistant hyperglycemia.” Clinically, this pattern of phenotypes, marked by the highest BMI and insulin resistance of all the clusters, suggests that individuals in this group would be ideal candidates for insulin- and weight-sparing glucose-lowering regimens, such as sodium–glucose cotransporter 2 inhibitors and glucagon-like peptide 1 analogs. The persistent glycemic deterioration prior to classical microvascular complications underscores the importance of long-term glycemic control.
In contrast to cluster 3, cluster 4 already exhibited microalbuminuria at baseline that worsened over two decades and was accompanied by the sharpest decrease in eGFR observed among all clusters (Fig. 2C). In addition to the highest rate of renal dysfunction, cluster 4 had the highest susceptibility to macrovascular complications, with nearly one-third of individuals experiencing a CVD event by the end of the follow-up period (Table 1). Thus, cluster 4 was labeled “prediabetic cardio-renal risk” because its trajectory, worsening renal function followed by CVD, mirrors prior evidence that impaired kidney function is an independent, causal driver of CVD (30). Among conventional CVD risk factors, cluster 4 exhibited elevated average blood pressure but had serum lipids indistinguishable from the other clusters. Blood pressure and serum lipids both remained stable or decreased over the follow-up period (Fig. 2C). Cluster 4 was also very similar to cluster 2 (mild obesity related) with regard to body morphometry and glycemic trajectories (Fig. 2C). Our data suggest that microalbuminuria and renal dysfunction were the main observable clinical parameters driving CVD risk in cluster 4, a risk that diverges prior to the conversion to T2D. Clinically, these findings imply that early and aggressive treatment during prediabetes with renal protective agents, such as ACE and sodium–glucose cotransporter 2 inhibitors, could prevent CVD events in this subgroup.
Among the studies that have proposed T2D classification subtypes from cross-sectional data analysis (8–10,31) (systematically reviewed in Misra et al. [10]), we compared our clusters with those of Ahlqvist et al. (8), who performed clustering at the time of diabetes diagnosis in a Swedish diabetes registry, and Wagner et al. (11), who performed clustering in a German cohort with prediabetes. Despite differences in methodology, cohort composition, and ancestry, all three studies support the existence of certain physiological archetypes: 1) insulin sensitive with mild diabetes (DPP: cluster 1; Ahlqvist: mild age-related diabetes; Wagner: “low risk 1 & 2”), 2) metabolically healthy obesity (DPP: cluster 2; Ahlqvist: mild obesity-related diabetes; Wagner: “low risk obese”), and 3) severe insulin resistance (DPP: cluster 3; Ahlqvist: severe insulin-resistant diabetes/SIRD; Wagner: “high-risk insulin resistant 5 and 6”). A notable difference between our clustering and that of both Ahlqvist et al. and Wagner et al. is that the incorporation of temporal trajectories allows us to separate microalbuminuria following T2D, which accompanies other microvascular complications (cluster 3), from microalbuminuria preceding T2D, which segregates most strongly with CVD (cluster 4). While neither Ahlqvist et al. (8) nor Wagner et al. (11) distinguished preexisting from postdiabetes renal dysfunction, the Wagner et al. subgroup with prediabetes and the highest risk of CVD (“high-risk insulin resistant 5”) also had the highest baseline uACR, supporting our finding that prediabetic renal dysfunction may be clinically and physiologically distinct from post-T2D diabetic nephropathy.
While our study is temporally comprehensive and multiethnic, we note several limitations. First, our analysis did not identify the known physiological archetype of β-cell failure/autoimmune diabetes. This is likely due to the DPP/DPPOS study being depleted of individuals with autoimmunity and severe β-cell dysfunction, as previously shown in post hoc analysis (32). Second, the need for longitudinal data limits direct clinical application, as the length and frequency of follow-up are rarely as extensive in real-world settings. Third, the challenge of obtaining such longitudinal data also limits our ability to externally validate our findings, which will be needed to ensure generalizability beyond the DPP. Finally, while our approach identifies associations between clusters and complications, the causal mechanisms underlying these associations remain to be fully elucidated. To this end, incorporating genetic and environmental risk factors in future models could offer deeper insights into the mechanisms driving heterogeneity and improve risk stratification. Future studies integrating multiomic data and mechanistic investigations will be essential to understand the pathophysiological underpinnings of these clusters.
In summary, our longitudinal clustering reveals that most individuals with prediabetes remain metabolically resilient, while those susceptible to micro- and macrovascular complications are driven by divergent trajectories of insulin resistance, hyperglycemia, and renal dysfunction. These clinical trajectories suggest temporal windows for intervention such as early renal and CVD protective therapies in individuals with prediabetic microalbuminuria, and weight-sparing glucose-lowering agents in severely insulin-resistant individuals. Future research should focus on refining these clusters molecularly and physiologically, identifying novel biomarkers to assign cluster membership at baseline, and validating their applicability to diverse patient populations.
This article contains supplementary material online at https://doi.org/10.2337/figshare.29647037.
Article Information
Acknowledgments. The authors thank Alexandra M. Blee, Department of Medicine, University of California San Diego, for assistance with editing the manuscript and the DPP Publications and Presentation Committee for their suggestions. The analyses in the study were conducted under an approved DPP ancillary study (AS20-47: “Validating blood, urine, and phenotypic biomarkers of tissue specific insulin resistance and its clinical sequelae”). The authors gratefully acknowledge the contributions of the DPP and DPPOS investigators and participants. During the preparation of this work, the authors used ChatGPT to generate alternative phrasing for portions of the methods to simplify them for a nontechnical audience. After using this tool/service, the authors formally reviewed the content for its accuracy and edited it as necessary. The authors take full responsibility for all the content of this publication.
The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH. The opinions expressed are those of the study group and do not necessarily reflect the views of the funding agencies.
Duality of Interest. A.R.M. is a consultant for Terns Pharmaceuticals. P.R. is a consultant for Simula Research Laboratories in Oslo, Norway, and receives income. The terms of these arrangements have been reviewed and approved by the University of California, San Diego in accordance with its conflict of interest policies. No other potential conflicts of interest relevant to this article were reported.
Author Contributions. E.K. and A.R.M. conceived this study. E.K., N.J.L.-S., N.C., P.R., and A.R.M. designed and conducted the analysis. E.K. and A.R.M. wrote the draft of the paper. All authors contributed to data interpretation and read and approved the final report. A.R.M. is the guarantor of this work and, as such, had full access to all the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.
Prior Presentation. Parts of this study were presented as an oral presentation at the 84th Scientific Sessions of the American Diabetes Association, Orlando, FL, 21–24 June 2024.
Handling Editors. The journal editors responsible for overseeing the review of the manuscript were Elizabeth Selvin and Alka M. Kanaya.
Funding Statement
This research was funded in whole or in part by National Institutes of Health (NIH), National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK) grants R01DK129840 and R01DK123422, and by the U.S. Department of Veterans Affairs, Office of Research and Development grant 1I01BX006293 to A.R.M. N.L.S. is supported by the National Institute of Biomedical Imaging and Bioengineering (T32EB9380) and a University of California, San Diego Sloan Scholar Fellowship from the Alfred P. Sloan Foundation. Research reported in this publication was supported by the NIDDK of the NIH under award numbers U01 DK048489, U01 DK048339, U01 DK048377, U01 DK048349, U01 DK048381, U01 DK048468, U01 DK048434, U01 DK048485, U01 DK048375, U01 DK048514, U01 DK048437, U01 DK048413, U01 DK048411, U01 DK048406, U01 DK048380, U01 DK048397, U01 DK048412, U01 DK048404, U01 DK048387, U01 DK048407, U01 DK048443, and U01 DK048400, by providing funding during DPP and DPPOS to the clinical centers and the Coordinating Center for the design and conduct of the study, and collection, management, analysis, and interpretation of the data. Funding was also provided by the Eunice Kennedy Shriver National Institute of Child Health and Human Development, the National Institute on Aging, the National Eye Institute, the National Heart Lung and Blood Institute, the National Cancer Institute, the Office of Research on Women’s Health, the National Institute on Minority Health and Health Disparities, the Centers for Disease Control and Prevention, and the American Diabetes Association. The Southwestern American Indian Centers were supported directly by the NIDDK, including its Intramural Research Program, and the Indian Health Service. The General Clinical Research Center Program, National Center for Research Resources, and the Department of Veterans Affairs supported data collection at many of the clinical centers. Merck KGaA provided medication for DPPOS. 
DPP/DPPOS have also received donated materials, equipment, or medicines for concomitant conditions from Bristol-Myers Squibb, Parke-Davis, and LifeScan Inc., Health O Meter, Hoechst Marion Roussel, Inc., Merck-Medco Managed Care, Inc., Merck and Co., Nike Sports Marketing, Slim Fast Foods Co., and Quaker Oats Co. McKesson BioServices Corp., Matthews Media Group, Inc., and the Henry M. Jackson Foundation provided support services under subcontract with the Coordinating Center.
Footnotes
A complete list of the Diabetes Prevention Program Research Group can be found in the supplementary material online.
This article is part of a special article collection available at https://diabetesjournals.org/collection/2292/DPP-and-DPPOS-Article-Collection.
Contributor Information
Amit R. Majithia, Email: amajithia@health.ucsd.edu.
Diabetes Prevention Program Research Group:
George A. Bray, Kishore M. Gadde, Iris W. Culbert, Jennifer Arceneaux, Annie Chatellier, Amber Dragg, Catherine M. Champagne, Crystal Duncan, Barbara Eberhardt, Frank Greenway, Fonda G. Guillory, April A. Herbert, Michael L. Jeffirs, Betty M. Kennedy, Erma Levy, Monica Lockett, Jennifer C. Lovejoy, Laura H. Morris, Lee E. Melancon, Donna H. Ryan, Deborah A. Sanford, Kenneth G. Smith, Lisa L. Smith, Julia A. St. Amant, Richard T. Tulley, Paula C. Vicknair, Donald Williamson, Jeffery J. Zachwieja, Kenneth S. Polonsky, Janet Tobian, David A. Ehrmann, Margaret J. Matulik, Karla A. Temple, Bart Clark, Kirsten Czech, Catherine DeSandre, Brittnie Dotson, Ruthanne Hilbrich, Wylie McNabb, Ann R. Semenske, Celeste C. Thomas, Jose F. Caro, Kevin Furlong, Barry J. Goldstein, Pamela G. Watson, Kellie A. Smith, Jewel Mendoza, Marsha Simmons, Wendi Wildman, Renee Liberoni, John Spandorfer, Constance Pepe, Richard P. Donahue, Ronald B. Goldberg, Ronald Prineas, Jeanette Calles, Anna Giannella, Patricia Rowe, Juliet Sanguily, Paul Cassanova-Romero, Sumaya Castillo-Florez, Hermes J. Florez, Rajesh Garg, Lascelles Kirby, Olga Lara, Carmen Larreal, Valerie McLymont, Jadell Mendez, Arlette Perry, Patrice Saab, Bertha Veciana, Steven M. Haffner, Helen P. Hazuda, Maria G. Montez, Kathy Hattaway, Juan Isaac, Carlos Lorenzo, Arlene Martinez, Monica Salazar, Tatiana Walker, Dana Dabelea, Richard F. Hamman, Patricia V. Nash, Sheila C. Steinke, Lisa Testaverde, Jennifer Truong, Denise R. Anderson, Larry B. Ballonoff, Alexis Bouffard, Brian Bucca, B. Ned Calonge, Lynne Delve, Martha Farago, James O. Hill, Shelley R. Hoyer, Tonya Jenkins, Bonnie T. Jortberg, Dione Lenz, Marsha Miller, Thomas Nilan, Leigh Perreault, David W. Price, Judith G. Regensteiner, Emily B. Schroeder, Helen Seagle, Carissa M. Smith, Brent VanDorsten, Edward S. Horton, Medha Munshi, Kathleen E. Lawton, Sharon D. Jackson, Catherine S. Poirier, Kati Swift, Ronald A. Arky, Marybeth Bryant, Jacqueline P. 
Burke, Enrique Caballero, Karen M. Callaphan, Barbara Fargnoli, Therese Franklin, Om P. Ganda, Ashley Guidi, Mathew Guido, Alan M. Jacobsen, Lyn M. Kula, Margaret Kocal, Lori Lambert, Kathleen E. Lawton, Sarah Ledbury, Maureen A. Malloy, Roeland J.W. Middelbeek, Maryanne Nicosia, Cathryn F. Oldmixon, Jocelyn Pan, Marizel Quitingon, Riley Rainville, Stacy Rubtchinsky, Ellen W. Seely, Jessica Sansoucy, Dana Schweizer, Donald Simonson, Fannie Smith, Caren G. Solomon, Jeanne Spellman, James Warram, Steven E. Kahn, Brenda K. Montgomery, Basma Fattaleh, Celeste Colegrove, Wilfred Fujimoto, Robert H. Knopp, Edward W. Lipkin, Michelle Marr, Ivy Morgan-Taggart, Anne Murillo, Kayla O’Neal, Dace Trence, Lonnese Taylor, April Thomas, Elaine C. Tsai, Samuel Dagogo-Jack, Abbas E. Kitabchi, Mary E. Murphy, Laura Taylor, Jennifer Dolgoff, William B. Applegate, Michael Bryer-Ash, Debra Clark, Sandra L. Frieson, Uzoma Ibebuogu, Raed Imseis, Helen Lambeth, Lynne C. Lichtermann, Hooman Oktaei, Harriet Ricks, Lily M.K. Rutledge, Amy R. Sherman, Clara M. Smith, Judith E. Soberman, Beverly Williams-Cleaves, Avnisha Patel, Ebenezer A. Nyenwe, Ethel Faye Hampton, Boyd E. Metzger, Mark E. Molitch, Amisha Wallia, Mariana K. Johnson, Daphne T. Adelman, Catherine Behrends, Michelle Cook, Marian Fitzgibbon, Mimi M. Giles, Deloris Heard, Cheryl K.H. Johnson, Diane Larsen, Anne Lowe, Megan Lyman, David McPherson, Samsam C. Penn, Thomas Pitts, Renee Reinhart, Susan Roston, Pamela A. Schinleber, Matthew O’Brien, Monica Hartmuller, David M. Nathan, Charles McKitrick, Heather Turgeon, Mary Larkin, Marielle Mugford, Kathy Abbott, Ellen Anderson, Laurie Bissett, Kristy Bondi, Enrico Cagliero, Jose C. Florez, Linda Delahanty, Valerie Goldman, Elaine Grassa, Lindsery Gurry, Kali D’Anna, Fernelle Leandre, Peter Lou, Alexandra Poulos, Elyse Raymond, Valerie Ripley, Christine Stevens, Beverly Tseng, Kathy Chu, Nopporn Thangthaeng, Jerrold M. 
Olefsky, Elizabeth Barrett-Connor, Sunder Mudaliar, Maria Rosario Araneta, Mary Lou Carrion-Petersen, Karen Vejvoda, Sarah Bassiouni, Madeline Beltran, Lauren N. Claravall, Jonalle M. Dowden, Steven V. Edelman, Pranav Garimella, Robert R. Henry, Javiva Horne, Marycie Lamkin, Simona Szerdi Janesch, Diana Leos, William Polonsky, Rosa Ruiz, Jean Smith, Jennifer Torio-Hurley, F. Xavier Pi-Sunyer, Blandine Laferrere, Jane E. Lee, Susan Hagamen, David B. Allison, Nnenna Agharanya, Nancy J. Aronoff, Maria Baldo, Jill P. Crandall, Sandra T. Foo, Kim Kelly-Dinham, Jose A. Luchsinger, Carmen Pal, Kathy Parkes, Mary Beth Pena, Ellen S. Rooney, Gretchen E.H. Van Wye, Kristine A. Viscovich, Mary de Groot, David G. Marrero, Kieren J. Mather, Melvin J. Prince, Susie M. Kelly, Marcia A. Jackson, Gina McAtee, Paula Putenney, Ronald T. Ackermann, Carolyn M. Cantrell, Yolanda F. Dotson, Edwin S. Fineberg, Megan Fultz, John C. Guare, Angela Hadden, James M. Ignaut, Marion S. Kirkman, Erin O’Kelly Phillips, Kisha L. Pinner, Beverly D. Porter, Paris J. Roach, Nancy D. Rowland, Madelyn L. Wheeler, Vanita Aroda, Michelle Magee, Robert E. Ratner, Michelle Magee, Gretchen Youssef, Sue Shapiro, Natalie Andon, Catherine Bavido-Arrage, Geraldine Boggs, Marjorie Bronsord, Ernestine Brown, Holly Love Burkott, Wayman W. Cheatham, Susan Cola, Cindy Evans, Peggy Gibbs, Tracy Kellum, Lilia Leon, Milvia Lagarda, Claresa Levatan, Milajurine Lindsay, Asha K. Nair, Jean Park, Maureen Passaro, Angela Silverman, Gabriel Uwaifo, Debra Wells-Thayer, Renee Wiggins, Mohammed F. Saad, Karol Watson, Christine Darwin, Preethi Srikanthan, Tamara Horwich, Adrian Casillas, Arleen Brown, Maria Budget, Sujata Jinagouda, Medhat Botrous, Anthony Sosa, Sameh Tadros, Khan Akbar, Claudia Conzues, Perpetua Magpuri, Carmen Muro, Noemi Neira, Kathy Ngo, Michelle Chan, Veronica Villarreal, Amer Rassam, Debra Waters, Kathy Xapthalamous, Julio V. Santiago, Samuel Dagogo-Jack, Neil H. White, Angela L. 
Brown, Samia Das, Prajakta Khare-Ranade, Tamara Stich, Ana Santiago, Edwin Fisher, Emma Hurt, Tracy Jones, Michelle Kerr, Lucy Ryder, Cormarie Wernimont, Sherita Hill Golden, Christopher D. Saudek, Vanessa Bradley, Emily Sullivan, Tracy Whittington, Caroline Abbas, Adrienne Allen, Frederick L. Brancati, Sharon Cappelli, Jeanne M. Clark, Jeanne B. Charleston, Janice Freel, Katherine Horak, Alicia Greene, Dawn Jiggetts, Deloris Johnson, Hope Joseph, Kimberly Loman, Nestoras Mathioudakis, Henry Mosley, John Reusing, Richard R. Rubin, Alafia Samuels, Thomas Shields, Shawne Stephens, Kerry J. Stewart, LeeLana Thomas, Evonne Utsey, Paula Williamson, David S. Schade, Karwyn S. Adams, Janene L. Canady, Carolyn Johannes, Claire Hemphill, Penny Hyde, Leslie F. Atler, Patrick J. Boyle, Mark R. Burge, Lisa Chai, Kathleen Colleran, Ateka Fondino, Ysela Gonzales, Doris A. Hernandez-McGinnis, Patricia Katz, Carolyn King, Julia Middendorf, Amer Rassam, Sofya Rubinchik, Willette Senter, Debra Waters, Jill Crandall, Harry Shamoon, Janet O. Brown, Gilda Trandafirescu, Danielle Powell, Norica Tomuta, Elsie Adorno, Liane Cox, Helena Duffy, Samuel Engel, Allison Friedler, Angela Goldstein, Crystal J. Howard-Century, Jennifer Lukin, Stacey Kloiber, Nadege Longchamp, Helen Martinez, Dorothy Pompi, Jonathan Scheindlin, Elissa Violino, Elizabeth A. Walker, Judith Wylie-Rosett, Elise Zimmerman, Joel Zonszein, Trevor Orchard, Elizabeth Venditti, Rena R. Wing, Susan Jeffries, Gaye Koenning, M. Kaye Kramer, Marie Smith, Susan Barr, Catherine Benchoff, Miriam Boraz, Lisa Clifford, Rebecca Culyba, Marlene Frazier, Ryan Gilligan, Stephanie Guimond, Susan Harrier, Louann Harris, Andrea Kriska, Qurashia Manjoo, Monica Mullen, Alicia Noel, Amy Otto, Jessica Pettigrew, Bonny Rockette-Wagner, Debra Rubinstein, Linda Semler, Cheryl F. Smith, Valarie Weinzierl, Katherine V. Williams, Tara Wilson, Bonnie Gillis, Marjorie K. Mau, Narleen K. Baker-Ladao, John S. Melish, Richard F. Arakaki, Renee W. 
Latimer, Mae K. Isonaga, Ralph Beddow, Nina E. Bermudez, Lorna Dias, Jillian Inouye, Kathy Mikami, Pharis Mohideen, Sharon K. Odom, Raynette U. Perry, Robin E. Yamamoto, William C. Knowler, Robert L. Hanson, Harelda Anderson, Norman Cooeyate, Charlotte Dodge, Mary A. Hoskin, Carol A. Percy, Alvera Enote, Camille Natewa, Kelly J. Acton, Vickie L. Andre, Rosalyn Barber, Shandiin Begay, Peter H. Bennett, Mary Beth Benson, Evelyn C. Bird, Brenda A. Broussard, Brian C. Bucca, Marcella Chavez, Sherron Cook, Jeff Curtis, Tara Dacawyma, Matthew S. Doughty, Roberta Duncan, Cyndy Edgerton, Jacqueline M. Ghahate, Justin Glass, Martia Glass, Dorothy Gohdes, Wendy Grant, Ellie Horse, Louise E. Ingraham, Merry Jackson, Priscilla Jay, Roylen S. Kaskalla, Karen Kavena, David Kessler, Kathleen M. Kobus, Jonathan Krakoff, Jason Kurland, Catherine Manus, Cherie McCabe, Sara Michaels, Tina Morgan, Yolanda Nashboo, Julie A. Nelson, Steven Poirier, Evette Polczynski, Christopher Piromalli, Mike Reidy, Jeanine Roumain, Debra Rowse, Robert J. Roy, Sandra Sangster, Janet Sewenemewa, Miranda Smart, Chelsea Spencer, Darryl Tonemah, Rachel Williams, Charlton Wilson, Michelle Yazzie, Raymond Bain, Sarah Fowler, Marinella Temprosa, Michael D. Larsen, Kathleen Jablonski, Tina Brenneman, Sharon L. Edelstein, Solome Abebe, Julie Bamdad, Melanie Barkalow, Joel Bethepu, Tsedenia Bezabeh, Anna Bowers, Nicole Butler, Jackie Callaghan, Caitlin E. Carter, Costas Christophi, Gregory M. Dwyer, Mary Foulkes, Yuping Gao, Robert Gooding, Adrienne Gottlieb, Kristina L. Grimes, Nisha Grover-Fairchild, Lori Haffner, Heather Hoffman, Steve Jones, Tara L. Jones, Richard Katz, Preethy Kolinjivadi, John M. Lachin, Yong Ma, Pamela Mucik, Robert Orlosky, Qing Pan, Susan Reamer, James Rochon, Alla Sapozhnikova, Hanna Sherif, Charlotte Stimpson, Ashley Hogan Tjaden, Fredricka Walker-Murray, Lindsay Doherty, Audrey McMaster, Rhea Mundra, Hannah Rapoport, Nolan Kuenster, Elizabeth M. Venditti, Andrea M. 
Kriska, Linda Semler, Valerie Weinzierl, Santica Marcovina, F. Alan Aldrich, Jessica Harting, John Albers, Greg Strylewicz, Robert Janicek, Anthony Killeen, Deanna Gabrielson, R. Eastman, Judith Fradkin, Sanford Garfield, Christine Lee, Edward Gregg, Ping Zhang, Dan O’Leary, Gregory Evans, Matthew Budoff, Chris Dailing, Elizabeth Stamm, Ann Schwartz, Caroline Navy, Lisa Palermo, Pentti Rautaharju, Ronald J. Prineas, Teresa Alexander, Charles Campbell, Sharon Hall, Yabing Li, Margaret Mills, Nancy Pemberton, Farida Rautaharju, Zhuming Zhang, Elsayed Z. Soliman, Julie Hu, Susan Hensley, Lisa Keasler, Tonya Taylor, Barbara Blodi, Ronald Danis, Matthew Davis, Larry Hubbard, Ryan Endres, Deborah Elsas, Samantha Johnson, Dawn Myers, Nancy Barrett, Heather Baumhauer, Wendy Benz, Holly Cohn, Ellie Corkery, Kristi Dohm, Amitha Domalpally, Vonnie Gama, Anne Goulding, Andy Ewen, Cynthia Hurtenbach, Daniel Lawrence, Kyle McDaniel, Jeong Pak, James Reimers, Ruth Shaw, Maria Swift, Pamela Vargo, Sheila Watson, Jose A. Luchsinger, Jennifer Manly, Elizabeth Mayer-Davis, Robert R. Moran, Ted Ganiats, Kristin David, Andrew J. Sarkin, Erik Groessl, Naomi Katzir, Helen Chong, William H. Herman, Michael Brändle, Morton B. Brown, Jose C. Florez, David Altshuler, Liana K. Billings, Ling Chen, Maegan Harden, Robert L. Hanson, William C. Knowler, Toni I. Pollin, Alan R. Shuldiner, Kathleen Jablonski, Paul W. Franks, and Marie-France Hivert
References
1. National Diabetes Statistics Report. 2022. Accessed 21 August 2022. Available from https://www.cdc.gov/diabetes/data/statistics-report/index.html
2. Redondo MJ, Hagopian WA, Oram R, et al. The clinical consequences of heterogeneity within and between different diabetes types. Diabetologia 2020;63:2040–2048
3. Silvia P, Simona Z, Ernesto M, Raffaella B. “H” for heterogeneity in the algorithm for type 2 diabetes management. Curr Diab Rep 2020;20:14
4. Zheng Y, Ley SH, Hu FB. Global aetiology and epidemiology of type 2 diabetes mellitus and its complications. Nat Rev Endocrinol 2018;14:88–98
5. American Diabetes Association Professional Practice Committee. 4. Comprehensive medical evaluation and assessment of comorbidities: Standards of Care in Diabetes—2025. Diabetes Care 2025;48(Suppl. 1):S59–S85
6. Yamanouchi M, Furuichi K, Hoshino J, Ubara Y, Wada T. Nonproteinuric diabetic kidney disease. Clin Exp Nephrol 2020;24:573–581
7. Faerch K, Hulman A, Solomon TPJ. Heterogeneity of pre-diabetes and type 2 diabetes: implications for prediction, prevention and treatment responsiveness. Curr Diabetes Rev 2015;12:30–41
8. Ahlqvist E, Storm P, Käräjämäki A, et al. Novel subgroups of adult-onset diabetes and their association with outcomes: a data-driven cluster analysis of six variables. Lancet Diabetes Endocrinol 2018;6:361–369
9. Nair ATN, Wesolowska-Andersen A, Brorsson C, et al. Heterogeneity in phenotype, disease progression and drug response in type 2 diabetes. Nat Med 2022;28:982–988
10. Misra S, Wagner R, Ozkan B, et al.; ADA/EASD PMDI. Precision subclassification of type 2 diabetes: a systematic review. Commun Med 2023;3:138
11. Wagner R, Heni M, Tabák AG, et al. Pathophysiology-based subphenotyping of individuals at elevated risk for type 2 diabetes. Nat Med 2021;27:49–57
12. Zaharia OP, Strassburger K, Strom A, et al. Risk of diabetes-associated diseases in subgroups of patients with recent-onset diabetes: a 5-year follow-up study. Lancet Diabetes Endocrinol 2019;7:684–694
13. Knowler WC, Barrett-Connor E, Fowler SE, et al. Reduction in the incidence of type 2 diabetes with lifestyle intervention or metformin. N Engl J Med 2002;346:393–403
14. Bro R. PARAFAC. Tutorial and applications. Chemometr Intell Lab Syst 1997;38:149–171
15. Diabetes Prevention Program Research Group. Long-term effects of lifestyle intervention or metformin on diabetes development and microvascular complications over 15-year follow-up: the Diabetes Prevention Program Outcomes Study. Lancet Diabetes Endocrinol 2015;3:866–875
16. Hastie T, Tibshirani R, Friedman J. The Elements of Statistical Learning: Data Mining, Inference, and Prediction (Springer Series in Statistics). 2nd ed. New York, Springer, 2009, p. 745
17. Rader DJ, Hovingh GK. HDL and cardiovascular disease. Lancet 2014;384:618–625
18. Wang M, Briggs MR. HDL: the metabolism, function, and therapeutic importance. Chem Rev 2004;104:119–138
19. Gillis N, Glineur F. Accelerated multiplicative updates and hierarchical ALS algorithms for nonnegative matrix factorization. Neural Comput 2012;24:1085–1105
20. Zhao J, Zhang Y, Schlueter DJ, et al. Detecting time-evolving phenotypic topics via tensor factorization on electronic health records: cardiovascular disease case study. J Biomed Inform 2019;98:103270
21. Wesolowska-Andersen A, Brorsson CA, Bizzotto R, et al. Four groups of type 2 diabetes contribute to the etiological and clinical heterogeneity in newly diagnosed individuals: an IMI DIRECT study. Cell Rep Med 2022;3:100477
22. Goldberg RB, Orchard TJ, Crandall JP, et al.; on behalf of the Diabetes Prevention Program Research Group. Effects of long-term metformin and lifestyle interventions on cardiovascular events in the Diabetes Prevention Program and its outcome study. Circulation 2022;145:1632–1641
23. Davis M, Fisher M, Gangnon R, et al. Risk factors for high-risk proliferative diabetic retinopathy and severe visual loss: Early Treatment Diabetic Retinopathy Study Report #18. Invest Ophthalmol Vis Sci 1998;39:233–252
24. Lee CG, Ciarleglio A, Edelstein SL, et al.; Diabetes Prevention Program Research Group. Prevalence of distal symmetrical polyneuropathy by Diabetes Prevention Program treatment group, diabetes status, duration of diabetes, and cumulative glycemic exposure. Diabetes Care 2024;47:810–817
25. Hennig C. Cluster-wise assessment of cluster stability. Comput Stat Data Anal 2007;52:258–271
26. Samson SL, Garber AJ. Metabolic syndrome. Endocrinol Metab Clin North Am 2014;43:1–23
27. Blüher M. Metabolically healthy obesity. Endocr Rev 2020;41:405–420
28. Coresh J, Selvin E, Stevens LA, et al. Prevalence of chronic kidney disease in the United States. JAMA 2007;298:2038–2047
29. Martin SS, Aday AW, Allen NB, et al. Heart disease and stroke statistics: a report of US and global data from the American Heart Association. Circulation 2025;151:e41–e660
30. Go AS, Chertow GM, Fan D, McCulloch CE, Hsu C-y. Chronic kidney disease and the risks of death, cardiovascular events, and hospitalization. N Engl J Med 2004;351:1296–1305
31. Li L, Cheng W-Y, Glicksberg BS, et al. Identification of type 2 diabetes subgroups through topological analysis of patient similarity. Sci Transl Med 2015;7:311ra174
32. Dabelea D, Ma Y, Knowler WC, et al. Diabetes autoantibodies do not predict progression to diabetes in adults: the Diabetes Prevention Program. Diabet Med 2014;31:1064–1068