PLOS Medicine. 2020 Aug 7;17(8):e1003232. doi: 10.1371/journal.pmed.1003232

Development and validation of a model for predicting incident type 2 diabetes using quantitative clinical data and a Bayesian logistic model: A nationwide cohort and modeling study

Lua Wilkinson 1,2,*, Nengjun Yi 3, Tapan Mehta 4, Suzanne Judd 3, W Timothy Garvey 1,5
Editor: Karine Clément6
PMCID: PMC7413417  PMID: 32764746

Abstract

Background

Obesity is closely related to the development of insulin resistance and type 2 diabetes (T2D). The prevention of T2D has become imperative to stem the rising rates of this disease. Weight loss is highly effective in preventing T2D; however, the at-risk pool is large, and a clinically meaningful metric for risk stratification to guide interventions remains a challenge. The objective of this study is to predict T2D risk using full-information continuous analysis of nationally sampled data from white and black American adults age ≥45 years.

Methods and findings

A sample of 12,043 black (33%) and white individuals from a population-based cohort, REasons for Geographic And Racial Differences in Stroke (REGARDS) (enrolled 2003–2007), was observed through 2013–2016. The mean participant age was 63.12 ± 8.62 years, and 43.7% were male. Mean BMI was 28.55 ± 5.61 kg/m2. Risk factors for T2D regularly recorded in the primary care setting were used to evaluate future T2D risk using Bayesian logistic regression. External validation was performed using 9,710 participants (19% black) from Atherosclerotic Risk in Communities (ARIC) (enrolled 1987–1989), observed through 1996–1998. The mean participant age in this cohort was 53.86 ± 5.65 years, and 44.6% were male. Mean BMI was 27.15 ± 4.92 kg/m2. Predictive performance was assessed using the receiver operating characteristic (ROC) curves and area under the curve (AUC) statistics. The primary outcome was incident T2D. By 2016 in REGARDS, there were 1,602 incident cases of T2D. Risk factors used to predict T2D progression included age, sex, race, BMI, triglycerides, high-density lipoprotein, blood pressure, and blood glucose. The Bayesian logistic model (AUC = 0.79) outperformed the Framingham risk score (AUC = 0.76), the American Diabetes Association risk score (AUC = 0.64), and a cardiometabolic disease system (using Adult Treatment Panel III criteria) (AUC = 0.75). Validation in ARIC was robust (AUC = 0.85). Main limitations include the limited generalizability of the REGARDS sample to black and white, older Americans, and no time to diagnosis for T2D.

Conclusions

Our results show that a Bayesian logistic model using full-information continuous predictors has high predictive discrimination, and can be used to quantify race- and sex-specific T2D risk, providing a new, powerful predictive tool. This tool can be used for T2D prevention efforts including weight loss therapy by allowing clinicians to target high-risk individuals in a manner that could be used to optimize outcomes.


In a modelling study, Lua Wilkinson and colleagues quantify race- and sex-specific risk of type 2 diabetes in a large US-based cohort.

Author summary

Why was this study done?

  • Obesity affects approximately 42% of the US population and causes significant morbidity, including a marked increase in insulin resistance and type 2 diabetes (T2D), and varies by sex and race.

  • Weight loss is effective in preventing T2D, but the at-risk pool is large and weight loss interventions are time-consuming and costly.

  • A simple tool to identify those at risk for developing T2D is needed.

What did the researchers do and find?

  • We performed a Bayesian logistic regression study using data from the national REGARDS (2003–2016) and ARIC (1987–1998) cohorts in the United States for risks associated with T2D in black and white American adults.

  • We investigated 8 demographic and metabolic syndrome risk factors for T2D and incorporated Bayesian hierarchical techniques into the development of a risk prediction calculation.

  • These 8 simple traits showed improved ability to predict progression to T2D compared to other commonly used paradigms, and can be used in clinical settings to target those at high risk for developing T2D.

What do these findings mean?

  • Using a different methodology, with simple, objective traits regularly measured in a clinical setting (by tests that can be performed by non-specialists), we showed that metabolic traits related to insulin resistance can be used to predict T2D in black and white American adults.

  • Rational strategies such as this can be used by clinicians to quantitatively assess T2D risk among those with obesity at high risk for the disease.

Introduction

The prevalence of type 2 diabetes (T2D) continues to rise, creating a greater burden on patients and adverse impacts on public health [1]. The rising prevalence of T2D is linked to escalating rates of obesity, and both T2D and obesity disproportionately affect certain populations, often along social, demographic, or economic lines. For example, non-Hispanic black Americans are affected by T2D with almost double the prevalence (13.4%) of non-Hispanic whites (7.3%), and also exhibit higher rates of obesity, particularly when comparing black and white women (13.2% versus 6.8%, respectively) [1].

Strategies for effective T2D prevention have become critically important to reduce the impact of this disease. A robust body of evidence shows that weight loss is highly effective in preventing T2D, regardless of whether weight loss is achieved through lifestyle therapy [2], anti-obesity medications [3], or bariatric surgery [4]. However, the remaining challenge is 2-fold: first, sustained weight loss using the current tools of obesity management is labor intensive for both the healthcare team and the patient, and, second, the at-risk pool of patients for T2D is quite large. By way of illustration, the National Health and Nutrition Examination Survey (NHANES) demonstrated that, in 2013–2014, 70.7% of US adults had overweight or obesity and 34.2% had metabolic syndrome, and all these individuals are at high risk of developing T2D [1].

Clearly, risk stratification approaches are needed to identify those at highest risk of T2D, and to optimize the benefit/risk ratio and cost-effectiveness of the application of weight loss therapy in the prevention of T2D. The majority of risk assessment strategies use binary predictors for risk factors, including those employed by National Cholesterol Education Program Adult Treatment Panel III (ATP III) [5]. Discretizing continuous predictors can result in the loss of valuable information and reduce the clinical usefulness of the predictive model [6]. For example, the risk conferred by metabolic syndrome traits exists over a spectrum of values, and binary responses do not adequately classify T2D risk over the quantitative range of risk factors [7]. Finally, the predictive value of various risk factors and risk scores may not be generalizable from one population to another. In particular, African Americans have been understudied with respect to risk models, score development, replication, and validation [8].

Guo et al. [9] earlier developed a cardiometabolic disease staging (CMDS) system based on binary predictors, using data from the Coronary Artery Risk Development in Young Adults (CARDIA) [10] and Atherosclerotic Risk in Communities (ARIC) [11] cohorts, to predict incident T2D with specificity for sex and race. CMDS was developed using quantitative measures of metabolic syndrome traits (i.e., ATP III criteria) [12], with the limitation that these cohorts were not designed to be nationally representative. Additionally, a binary prediction approach such as this does not fully take into account the risk conferred by cardiometabolic disease manifestations due to pathophysiological processes of adipocyte dysfunction, systemic inflammation, and oxidative stress [13]. There have been attempts to observe an association between metabolic syndrome z-scores and risk of future T2D using a continuous metabolic severity score [14]. However, these analyses fitted separate logistic models for each metabolic syndrome trait and did not consider possible interactions, such as between high-density lipoprotein (HDL) and triglycerides [15].

Our current objective was to create a highly predictive score that rigorously captures race and sex differences in T2D risk. This was done using a large national cohort of black and white Americans from the REasons for Geographic And Racial Differences in Stroke (REGARDS) study. We compared the predictive ability of a CMDS T2D prediction model using individual laboratory and anthropometric measurements as continuous functions with Bayesian logistic regression. We also compared the predictive accuracy of enhanced CMDS with other existing T2D prediction scores by looking at receiver operating characteristic (ROC) curves and area under the curve (AUC) statistics. The purpose of this analysis is to create a tool using quantitative predictors available in real-world clinical practice that identifies individuals who are most likely to benefit from therapies to prevent T2D. The application of CMDS allows clinicians treating those with overweight/obesity to target effective weight loss strategies in those at highest risk of T2D, in order to optimize the benefit/risk ratio and cost-effectiveness of interventions.

Methods

The institutional review board of the University of Alabama at Birmingham designated this analysis as not human subjects research and waived the need for approval. The analyses were prespecified and approved by the REGARDS committee. This study is reported as per the Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis guideline (S1 TRIPOD Checklist).

Study populations

The enhanced CMDS model was developed in REGARDS and externally validated in ARIC. REGARDS was chosen as it is one of the largest and most recent surveys of black and white adults that collected information relevant to T2D risk. ARIC was chosen for external validation as it is a recent longitudinal scientific sample of black and white Americans. These analyses used only de-identified data.

REGARDS

The REGARDS study is an ongoing longitudinal survey designed to look at stroke mortality of black and white Americans. The design has been reported elsewhere [16]. In this scientific sample from the US, a total of 30,239 black and white men and women age 45 years and older from 48 states and the District of Columbia were enrolled between 2003 and 2007. Participants were interviewed by telephone, followed by an in-home visit for physiological measures and obtaining biosamples, at baseline, and then observed for a median follow-up duration of 9.5 ± 0.9 years (second in-home visit, 2013–2016). Follow-up time is rounded to 10 years for reporting. Information on incident T2D was collected at baseline and follow-up. We restricted the analysis to those without T2D at baseline who had completed the second in-home visit. Between the first and second visit, 5,713 individuals died, and 8,532 withdrew from further follow-up, leaving a population of 15,938 with follow-up data available. Individuals with T2D at baseline (n = 3,260) and those missing relevant covariate information at baseline (n = 635) were excluded, leaving a final study population of 12,043 individuals. Site institutional review boards approved the protocol, and informed consent was obtained.

Collection of blood specimens, physical measurements, and urine was performed using standardized methods. Participants were asked to fast for 10–12 hours before the visit (n = 9,332). Those who did not fast (n = 1,440) or had no information on fasting (n = 1,271) were included as non-fasters. T2D was defined by fasting blood glucose level ≥ 7.0 mmol/l, non-fasting blood glucose ≥ 11.1 mmol/l, self-reported T2D, or being on diabetes medication. Race was defined by self-report as black or white. Standardized blood pressure was taken twice in-home and calculated as the average of the 2 measurements. Lipids were assayed using either the fasting or non-fasting sample.
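As an illustration, the case definition above can be written as a short function. This is a minimal sketch rather than the study's own code: the function name, parameter names, and the use of `None` for missing fasting information are assumptions made for the example. Only the glucose thresholds and the handling of participants without fasting information come from the text.

```python
from typing import Optional

FASTING_THRESHOLD_MMOL_L = 7.0       # fasting glucose cut-off stated in the text
NON_FASTING_THRESHOLD_MMOL_L = 11.1  # non-fasting glucose cut-off stated in the text


def has_t2d(glucose_mmol_l: float,
            fasted: Optional[bool],
            self_reported: bool = False,
            on_diabetes_medication: bool = False) -> bool:
    """Return True if any of the four criteria described in the text is met.

    `fasted=None` (no fasting information) is treated like non-fasting,
    mirroring how such participants were grouped with non-fasters.
    """
    if self_reported or on_diabetes_medication:
        return True
    threshold = FASTING_THRESHOLD_MMOL_L if fasted else NON_FASTING_THRESHOLD_MMOL_L
    return glucose_mmol_l >= threshold
```

For example, a fasting glucose of 8.0 mmol/l meets the definition, whereas the same value in a non-fasting sample does not unless self-report or medication use is also present.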

ARIC

The ARIC study is a longitudinal, ongoing prospective study initiated in 1987 [17]. ARIC includes 15,792 black and white men and women age 45–64 years at baseline from 4 US communities: Jackson, Mississippi; Forsyth County, North Carolina; Minneapolis, Minnesota; and Washington County, Maryland. Individuals were interviewed at 4 distinct follow-up time points between 1990 and 2013. We restricted this analysis to 2 time points (1987–1989 and 1996–1998), matching the length of follow-up in REGARDS. Information on T2D was collected at both time points—those with T2D at baseline and/or missing relevant covariate information were excluded, along with those lost to follow-up or death; the final population included 9,710 individuals.

Analysis of fasting and plasma specimens was performed at central laboratories. For incident T2D, we included those with self-report of T2D or being on T2D medication, as described by the ARIC protocol [18]. Site institutional review boards approved the study at each site, and informed consent was obtained.

Predictors used to determine T2D risk

In order to predict future T2D we relied on objective, quantitative traits commonly available in clinical care venues, particularly in patients presenting with obesity or metabolic syndrome: blood glucose, BMI and waist circumference, systolic blood pressure (SBP) and diastolic blood pressure (DBP), HDL cholesterol, and triglycerides [19]. We assessed these traits as continuous predictors. Additionally, in order to improve clinical and general applications of the score, we examined, using correlation matrices and AUC, whether substituting BMI for waist circumference changed predictive ability.

Statistical methods

We used Bayesian logistic regression models to analyze our data by jointly fitting prespecified predictors and/or their interactions. Following Gelman et al. [20], we used weakly informative Cauchy priors, which have the advantage of providing minimal prior information to constrain the coefficients in a reasonable range, stabilizing the model fitting, and improving the model prediction performance [20,21]. We fitted the Bayesian logistic regression models with Cauchy priors by incorporating an approximate expectation-maximization algorithm into the usual iteratively weighted least squares used in classical logistic regression. For large datasets and only a few predictors, conventional logistic regression may perform similarly to Bayesian logistic regression. However, Bayesian models with weakly informative priors can provide more reliable results if there are problems of correlation and overfitting.
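To make the role of the prior concrete, the sketch below finds a posterior mode for a logistic model with independent Cauchy priors on the slopes. It is not the algorithm used in this study: plain gradient ascent is swapped in for the approximate expectation-maximization step within iteratively weighted least squares. The prior scale of 2.5, the unpenalized intercept, the learning rate, and the toy data are all assumptions chosen for illustration.

```python
import math


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_map_logistic(X, y, scale=2.5, lr=0.05, n_iter=5000):
    """Gradient ascent on the log-posterior of a logistic model.

    X: list of feature rows (no intercept column); y: list of 0/1 labels.
    Each slope gets an independent Cauchy(0, scale) prior; the intercept
    is left unpenalized in this sketch (an assumption).
    """
    n, p = len(X), len(X[0])
    b0, b = 0.0, [0.0] * p
    for _ in range(n_iter):
        g0, g = 0.0, [0.0] * p
        for xi, yi in zip(X, y):
            resid = yi - sigmoid(b0 + sum(bj * xj for bj, xj in zip(b, xi)))
            g0 += resid
            for j in range(p):
                g[j] += resid * xi[j]
        # Gradient of log Cauchy(b | 0, scale) is -2b / (scale^2 + b^2).
        for j in range(p):
            g[j] -= 2.0 * b[j] / (scale ** 2 + b[j] ** 2)
        b0 += lr * g0 / n
        b = [bj + lr * gj / n for bj, gj in zip(b, g)]
    return b0, b


# Toy, perfectly separated data: the maximum-likelihood slope would diverge,
# whereas the weakly informative prior keeps the estimate finite.
X_toy = [[-2.0], [-1.0], [1.0], [2.0]]
y_toy = [0, 0, 1, 1]
intercept, slopes = fit_map_logistic(X_toy, y_toy)
```

The toy data are deliberately separable, which is the situation in which a weakly informative prior stabilizes the fit; with a large sample and few predictors, as noted above, the prior would be expected to matter much less.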

We built a Bayesian logistic model using REGARDS and evaluated its predictive values in ARIC. We used several measures to assess the predictive performance, including AUC, mean squared error, and misclassification [22,23]. We compared the main-effect model, with only the main effects of the predictors mentioned above, with the interacting model, which included all the main effects and also multiple interactions, including sex × race, SBP × DBP, HDL × triglycerides, waist circumference × BMI, BMI × HDL, and BMI × triglycerides. We also tested if using mean arterial pressure conferred any benefit over SBP and DBP. The model fitting and predictive evaluation were implemented using R package BhGLM (Bayesian hierarchical generalized linear models) (https://github.com/nyiuab/BhGLM) [24].
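The three performance measures named above can be sketched in a few lines. This is an illustrative implementation, not the BhGLM code: the function names are hypothetical, the AUC uses a simple pairwise comparison that is quadratic in sample size, and the 0.5 classification threshold is an assumption because the threshold is not stated here.

```python
def auc(y_true, y_prob):
    """Probability that a random case is ranked above a random non-case.

    Ties between a case and a non-case count as one half.
    """
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def mean_squared_error(y_true, y_prob):
    # Average squared difference between observed outcome and predicted probability.
    return sum((y - p) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def misclassification(y_true, y_prob, threshold=0.5):
    # Share of observations whose thresholded prediction differs from the outcome.
    wrong = sum(int(p >= threshold) != y for y, p in zip(y_true, y_prob))
    return wrong / len(y_true)


# Toy outcomes and predicted probabilities, for illustration only.
y_obs = [0, 0, 1, 1]
p_hat = [0.1, 0.4, 0.35, 0.8]
```

On the toy vectors, three of the four case/non-case pairs are ordered correctly, so `auc(y_obs, p_hat)` is 0.75, and one of four observations is misclassified at the assumed threshold.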

We also compared our method with several other predictive modeling methods, including lasso, generalized additive modeling, random forests, and support vector machine learning (S1 Text). We found that our Bayesian logistic model outperformed these alternative approaches (S1 Table).

To create a useable, interactive instrument, we calculated the predictive risk probabilities based on the fitted Bayesian logistic model, allowing one to simply input an individual’s actual data into a computer program and receive a risk probability based on his/her personal anthropometric, demographic, and laboratory values. The formula for calculating the predictive risk probabilities of incident T2D can be found in S2 Text.
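For a logistic model, such a calculation generally takes the form of an inverse-logit transform of a linear predictor. The sketch below shows that general form only. The intercept and coefficients are hypothetical placeholders, not the fitted values, which are given in S2 Text, and the function and variable names are likewise assumptions.

```python
import math


def predicted_risk(intercept, coefficients, values):
    """Logistic-model probability: 1 / (1 + exp(-(intercept + sum(b_k * x_k))))."""
    linear = intercept + sum(coefficients[k] * values[k] for k in coefficients)
    return 1.0 / (1.0 + math.exp(-linear))


# HYPOTHETICAL coefficients for illustration only; not the study's estimates.
toy_intercept = -4.0
toy_coefficients = {"bmi": 0.05, "sbp": 0.005}
toy_patient = {"bmi": 30.0, "sbp": 130.0}
risk = predicted_risk(toy_intercept, toy_coefficients, toy_patient)
```

Keeping the coefficients in a dictionary keyed by predictor name makes it harder to pair a value with the wrong coefficient when an individual's data are entered by hand.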

Comparisons to other risk scores

We compared the AUC from the Bayesian logistic model with the CMDS model [19] developed using discontinuous traits conforming with ATP III criteria. We also report the differences in AUC between the current Bayesian logistic model and the Framingham [25] and American Diabetes Association [26] risk scores. We recalculated the AUC for these scores using logistic regression methods and available REGARDS data; we were unable to include family history, as it is a somewhat subjective and nonquantitative variable that is unavailable for REGARDS participants.

Results

Baseline characteristics of study participants are reported in Table 1. In REGARDS, 12,043 eligible participants without T2D at baseline completed the follow-up examination and had complete data on relevant covariates (mean age 63.1 years, range 45–92 years; 33% black). During a follow-up time ranging from 7.4 to 13.4 years (median 9.5 ± 0.9 years), there were 1,602 cases of new T2D (13.3%). Approximately 75% of participants were overweight or had obesity. Ranges of all variables included are reported in S2 Table.

Table 1. Baseline characteristics of included participants.

| Characteristic | REGARDS: Total | REGARDS: Black women | REGARDS: Black men | REGARDS: White women | REGARDS: White men | ARIC: Total |
|---|---|---|---|---|---|---|
| Population, n | 12,043 | 2,578 | 1,394 | 4,204 | 3,867 | 9,710 |
| White¹, n (%) | 8,071 (67) | | | | | 7,906 (81.4) |
| Male¹, n (%) | 5,261 (43.7) | | | | | 4,326 (44.6) |
| Age (years) | 63.12 (8.62) | 62.19 (8.69) | 62.27 (8.24) | 63.18 (8.81) | 63.98 (8.43) | 53.86 (5.65) |
| Body mass index (kg/m²) | 28.55 (5.61) | 31.16 (6.67) | 28.63 (5.07) | 27.56 (5.65) | 27.87 (4.27) | 27.15 (4.92) |
| Waist circumference (cm) | 93.24 (14.3) | 93.94 (14.32) | 97.71 (12.80) | 86.50 (13.97) | 98.48 (12.02) | 95.57 (13.12) |
| Systolic blood pressure (mm Hg) | 125.00 (15.49) | 127.51 (16.43) | 129.21 (15.37) | 121.65 (15.35) | 125.45 (14.27) | 118.66 (16.99) |
| Diastolic blood pressure (mm Hg) | 76.33 (9.18) | 77.99 (9.34) | 79.15 (9.52) | 74.14 (8.79) | 76.60 (8.81) | 72.88 (10.58) |
| Blood glucose (mmol/l) | 5.16 (0.68) | 5.21 (0.74) | 5.27 (0.74) | 5.05 (0.62) | 5.19 (0.67) | 5.47 (0.51) |
| HDL cholesterol (mmol/l) | 1.38 (0.42) | 1.52 (0.41) | 1.26 (0.37) | 1.54 (0.42) | 1.17 (0.34) | 1.36 (0.44) |
| Triglycerides (mmol/l) | 1.42 (0.89) | 1.15 (0.61) | 1.31 (1.21) | 1.48 (0.78) | 1.56 (0.98) | 1.40 (0.85) |
| ATP III², n (%) | | | | | | |
| 0 risk factors | 2,113 (17.5) | 254 (9.9) | 199 (14.3) | 976 (23.2) | 684 (17.7) | 1,666 (17.2) |
| 1 risk factor | 3,340 (27.7) | 603 (23.4) | 462 (33.1) | 1,143 (27.2) | 1,132 (29.3) | 2,521 (26.0) |
| 2 risk factors | 3,244 (26.9) | 881 (34.2) | 415 (29.8) | 984 (23.4) | 964 (24.9) | 2,336 (24.1) |
| 3 or more risk factors | 3,346 (27.8) | 840 (32.6) | 318 (22.8) | 1,101 (26.2) | 1,087 (28.1) | 3,189 (32.8) |
| Diabetes incidence³ at second in-home visit, n (%) | 1,602 (13.3) | 482 (18.7) | 257 (18.4) | 386 (9.2) | 477 (12.3) | 927 (9.5) |

Data are mean (SD) unless otherwise indicated.

¹ Race and sex were self-reported.

² Risk factors defined as follows: fasting blood glucose > 5.55 mmol/l; waist circumference > 102 cm in men, > 88 cm in women; systolic blood pressure > 130 mm Hg or diastolic blood pressure > 85 mm Hg or on antihypertensive medication; HDL cholesterol < 1.03 mmol/l in men, < 1.29 mmol/l in women; and fasting triglycerides > 1.69 mmol/l or on lipid-lowering medication.

³ Incident diabetes is defined as fasting glucose ≥ 7.0 mmol/l, non-fasting glucose ≥ 11.1 mmol/l, currently on medication for diabetes, or self-report of diabetes diagnosis.

ARIC, Atherosclerotic Risk in Communities; ATP III, Adult Treatment Panel III; HDL, high-density lipoprotein; REGARDS, REasons for Geographic And Racial Differences in Stroke.
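The ATP III risk factor counts in Table 1 follow from the thresholds given in the table footnote. A minimal sketch of that count is shown below; the function name, argument names, and the "M"/"F" sex coding are assumptions for illustration, and glucose, HDL cholesterol, and triglycerides are taken to be in mmol/l as in the table.

```python
def atp3_risk_factor_count(sex, glucose, waist_cm, sbp, dbp, hdl, triglycerides,
                           on_bp_medication=False, on_lipid_medication=False):
    """Count ATP III risk factors (0 to 5) using the Table 1 footnote thresholds.

    glucose, hdl, triglycerides: mmol/l; sbp, dbp: mm Hg; sex: "M" or "F".
    """
    male = sex == "M"
    flags = [
        glucose > 5.55,                              # fasting blood glucose
        waist_cm > (102 if male else 88),            # sex-specific waist circumference
        sbp > 130 or dbp > 85 or on_bp_medication,   # blood pressure
        hdl < (1.03 if male else 1.29),              # sex-specific HDL cholesterol
        triglycerides > 1.69 or on_lipid_medication, # triglycerides
    ]
    return sum(flags)
```

Because each trait contributes either 0 or 1, a value just above a threshold and a value far above it are counted identically, which is the loss of information that motivates the continuous predictors used in the Bayesian model.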

For external validation using ARIC, 9,710 participants completed the follow-up examination (mean age 53.9 years, range 45–66 years; 19% black) and had complete data on relevant covariates. During a follow-up time of 10 years, there were 927 cases of new T2D (9.5%). Almost 65% of participants had overweight or obesity.

Black females had the highest incidence of T2D in REGARDS (18.7%) and ARIC (16.3%); white females had the lowest (9.2% and 6.4%). Black females had the highest prevalence of obesity in both surveys, using both BMI (51% and 43%) and elevated waist circumference (64% and 74%). In terms of cardiometabolic risk profile, 34% of people in REGARDS presented with metabolic syndrome, 33% in ARIC.

The fitted models and their predictive values

Results from the fitted Bayesian logistic model with main effects are presented in Table 2. The model fitted in REGARDS had an AUC of 0.79 (95% CI 0.78–0.80). External validation using the model generated in REGARDS was conducted in the ARIC cohort, for which the AUC was 0.85 (95% CI 0.83–0.86). This model included the variables of SBP, DBP, blood glucose, BMI, HDL, triglycerides, age (45–92 years), sex (male or female), and race (black or white). Importantly, this model incorporated risk conferred over the continuum of values for each risk factor as well as the effect that age, race, and sex have on the contributions of the factors to overall T2D risk.

Table 2. Predictive power, validation, and interactions.

| Model | AUC | MSE¹ | Misclassification |
|---|---|---|---|
| REGARDS: Development² | 0.789 | 0.099 | 0.131 |
| ARIC: External validation | 0.846 | 0.074 | 0.090 |
| Interactions³ | | | |
| Sex and race with main effects⁴ | 0.794 | 0.098 | 0.130 |
| SBP × DBP⁵ | 0.788 | 0.099 | 0.131 |
| MAP⁶ | 0.789 | 0.099 | 0.131 |
| HDL × triglycerides⁷ | 0.779 | 0.100 | 0.133 |
| Waist circumference × BMI⁸ | 0.780 | 0.100 | 0.132 |
| BMI × HDL⁹ | 0.787 | 0.099 | 0.131 |
| BMI × triglycerides¹⁰ | 0.785 | 0.100 | 0.132 |

¹ MSE calculated as the average squared difference between the observed and predicted values.

² Analyzed using Bayesian logistic regression. Diabetes incidence ~ age + sex + race + BMI + triglycerides + HDL cholesterol + SBP + DBP + blood glucose.

³ Analyzed using the REGARDS dataset by Bayesian logistic regression.

⁴ Diabetes incidence ~ (age + BMI + triglycerides + HDL cholesterol + SBP + DBP + blood glucose) × (sex:race).

⁵ Diabetes incidence ~ age + sex + race + BMI + triglycerides + HDL cholesterol + SBP:DBP + blood glucose.

⁶ MAP calculated as [(2 × DBP) + SBP]/3. Diabetes incidence ~ age + sex + race + BMI + triglycerides + HDL cholesterol + MAP + blood glucose.

⁷ Diabetes incidence ~ age + sex + race + BMI + triglycerides:HDL cholesterol + SBP + DBP + blood glucose.

⁸ Diabetes incidence ~ age + sex + race + BMI:waist circumference + triglycerides + HDL cholesterol + SBP + DBP + blood glucose.

⁹ Diabetes incidence ~ age + sex + race + BMI:HDL cholesterol + triglycerides + SBP + DBP + blood glucose.

¹⁰ Diabetes incidence ~ age + sex + race + BMI:triglycerides + HDL cholesterol + SBP + DBP + blood glucose.

ARIC, Atherosclerotic Risk in Communities; AUC, area under the curve; DBP, diastolic blood pressure; HDL, high-density lipoprotein; MAP, mean arterial pressure; MSE, mean squared error; REGARDS, REasons for Geographic And Racial Differences in Stroke; SBP, systolic blood pressure.
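The mean arterial pressure formula in the Table 2 footnotes, [(2 × DBP) + SBP]/3, is simple enough to check by hand. The short sketch below applies it to the REGARDS mean blood pressures from Table 1 purely as a worked example; the function name is hypothetical, and a MAP computed from cohort means is illustrative rather than a statistic reported in the study.

```python
def mean_arterial_pressure(sbp: float, dbp: float) -> float:
    """MAP in mm Hg, computed as [(2 x DBP) + SBP] / 3."""
    return (2 * dbp + sbp) / 3


# Worked example using the REGARDS mean SBP and DBP from Table 1.
map_at_cohort_means = mean_arterial_pressure(sbp=125.00, dbp=76.33)
```

Here (2 × 76.33 + 125.00)/3 gives roughly 92.6 mm Hg. Because MAP is a fixed linear combination of SBP and DBP, replacing both pressures with MAP cannot add information, which is consistent with the unchanged AUC in Table 2.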

We did not observe significant improvements in AUC and other measures when including interactions in the predictive model, and, in fact, only observed a mild improvement when interacting sex and race with main effects, where the AUC went from 0.789 (main effects with sex and race included as main effects, no interaction) to 0.794 (interaction). Inclusion of interactions involving SBP and DBP, HDL and triglycerides, and waist circumference and BMI did not enhance the predictive accuracy, nor did the substitution of mean arterial pressure for DBP and SBP.

Fig 1 shows the estimated odds ratios (ORs) of incident T2D for individual risk factors in REGARDS based on the main-effect model. All factors used to construct the fitted model, except for sex and DBP, significantly impact the risk of T2D. While these other factors significantly provide additional information about T2D risk when added to the model, the risk factors associated with the greatest impact on odds of future T2D were blood glucose (OR 1.06, 95% CI 1.06 to 1.07) and race (white, OR 0.63, 95% CI 0.56 to 0.71).

Fig 1. Odds ratios of incident type 2 diabetes for individual risk factors used to construct the fitted main-effect logistic model.


The points and lines present the estimated values and 95% CIs, respectively. Odds ratios are as follows: systolic blood pressure (SBP), 1.006 (95% CI 1.001 to 1.011); diastolic blood pressure (DBP), 1.003 (95% CI 0.995 to 1.012); blood glucose (BG), 1.064 (95% CI 1.059 to 1.069); BMI, 1.055 (95% CI 1.044 to 1.066); high-density lipoprotein (HDL) cholesterol, 0.982 (95% CI 0.979 to 0.987); triglycerides (TG), 1.001 (95% CI 1.001 to 1.002); age, 0.987 (95% CI 0.979 to 0.994); white race (raceW), 0.628 (95% CI 0.556 to 0.709); male sex (sexM), 0.919 (95% CI 0.808 to 1.046). The references for the binary predictors race and sex are black and female, respectively.
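Per-unit odds ratios such as these can look small even when the implied effect over a clinically meaningful range is large. The sketch below rescales an odds ratio to a multi-unit change and flips the reference category of a binary predictor. It assumes that each reported odds ratio refers to a one-unit increase in the predictor as entered in the model and that the model is linear on the log-odds scale; the function names are hypothetical.

```python
import math


def odds_ratio_for_change(or_per_unit: float, delta: float) -> float:
    """Odds ratio implied by a `delta`-unit change, given a per-unit odds ratio."""
    return math.exp(delta * math.log(or_per_unit))


def invert_reference(or_value: float) -> float:
    """Odds ratio with the reference and comparison categories swapped."""
    return 1.0 / or_value


# BMI odds ratio of 1.055 per unit, rescaled to a 5-unit difference.
bmi_plus_5 = odds_ratio_for_change(1.055, 5)
# White versus black odds ratio of 0.628, re-expressed as black versus white.
black_vs_white = invert_reference(0.628)
```

Under these assumptions, a 5-unit higher BMI corresponds to an odds ratio of about 1.31, and the race estimate corresponds to about 1.59 when expressed as black versus white.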

Correlation of parameters: Waist circumference and BMI

Waist circumference and BMI are similarly correlated with T2D risk (waist circumference correlation coefficient [CC] 0.19; BMI CC 0.19). Additionally, waist circumference and BMI are correlated with each other (CC 0.74). Although waist circumference is not routinely assessed in many clinical venues, we alternatively analyzed its predictive power, and found that the AUC for the model with waist circumference was 0.791, compared to 0.789 for the model with BMI. While waist circumference did appear to confer a minimal improvement to the AUC, given the clinical application of the Bayesian logistic model, BMI is an appropriate substitute for waist circumference without substantial loss of predictability. In addition, we tested the correlation between all other parameters: SBP and DBP showed a CC of 0.62; all other pairs of parameters had CC < 0.2.
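A correlation coefficient of the kind reported here can be computed as sketched below. The type of coefficient is not stated in this section, so the Pearson coefficient is an assumption, and the function name and the toy measurements are invented for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Toy BMI (kg/m2) and waist circumference (cm) values, for illustration only.
toy_bmi = [22.0, 25.0, 28.0, 31.0, 35.0]
toy_waist = [80.0, 88.0, 95.0, 99.0, 112.0]
r_bmi_waist = pearson(toy_bmi, toy_waist)
```

When two predictors are as strongly correlated as waist circumference and BMI, including both adds little independent information, which is in line with the near-identical AUC values for the two models.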

Comparisons to previous models

Fig 2 presents ROC curves comparing the Bayesian logistic model using continuous variables to our previous score model using binary ATP III criteria and age predictors (AUC 0.75) [19], as well as the Framingham (AUC 0.76) and American Diabetes Association (AUC 0.64) scoring systems using logistic regression methods with variables available in REGARDS. The AUC for the Bayesian logistic model, using continuous variables from REGARDS, was 0.79.

Fig 2. Receiver operating characteristic curves for the Bayesian logistic model (BhGLM), the CMDS score based on discontinuous ATP III criteria, the Framingham risk score, and the American Diabetes Association risk score in the REGARDS cohort.


The Bayesian score included the following risk factors: age, sex, race, BMI, triglycerides, HDL cholesterol, blood pressure, and blood glucose. The CMDS score using ATP III criteria thresholds included sex, race, BMI, triglycerides, HDL cholesterol, blood pressure, and blood glucose using binary ATP III criteria. The Framingham risk score was a simple clinical score using fasting glucose, BMI, HDL cholesterol, triglycerides, and blood pressure. The American Diabetes Association risk score included age, sex, blood pressure, BMI, and physical activity. ATP III, Adult Treatment Panel III; CMDS, cardiometabolic disease staging; HDL, high-density lipoprotein; REGARDS, REasons for Geographic And Racial Differences in Stroke.

Predictive risk probabilities

Based on the fitted main-effect logistic model, we obtained a formula for calculating the probability of T2D for any individual given the values of the risk factors (S2 Text). Results for the probability of T2D as predicted by each individual risk factor included in the final Bayesian logistic model are displayed in Fig 3 over the continuum of values, stratified by sex and race. There are several salient observations to be made. First, the data show that DBP and SBP confer a higher probability of T2D over the entire range of values in black individuals compared to white individuals and in females compared to males. Second, for any given level of HDL or triglycerides, black individuals have a higher probability of T2D than white individuals; however, probabilities tend to equalize at the extremes of very high HDL and very low triglyceride values. Third, probabilities appear nearly indistinguishable over the range of blood glucose, BMI, HDL, and triglyceride values when males are compared with females. When black males and females are compared with their white counterparts, the data for HDL and triglycerides also visually appear indistinguishable. Finally, the probability of incident T2D declines as a function of age; however, probabilities were higher at any given age in black individuals than white individuals and, to a lesser extent, in females than males.

Fig 3. Predicted probabilities for each predictor associated with type 2 diabetes by sex and race.


(A) By sex. (B) By race. B, black; F, female; HDL, high-density lipoprotein; M, male; W, white.

Finally, Fig 4 illustrates the distribution of risk among individuals in the population as a function of race. The distribution of predicted probabilities is right-shifted towards higher risk among black individuals (mean 0.19, median 0.14) compared with white individuals (mean 0.11, median 0.07). Furthermore, in both races, the validity of these predictions based on observed frequencies is quite robust over the full range of predicted probability.

Fig 4. Validity of predictions of incident type 2 diabetes in the development sample for white and black populations.


The distribution of predicted probabilities is shown at the bottom of the graphs. The mean and median of the predicted probability are also shown. The triangles indicate the observed frequencies by deciles of predicted probability.
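The comparison of observed frequencies with predicted probabilities by decile can be sketched as follows. This illustrates the general calibration check and is not the code used to produce Fig 4; the function name and the rule for splitting observations into equal-sized groups are assumptions.

```python
def calibration_by_decile(y_true, y_prob, n_bins=10):
    """Return (mean predicted probability, observed frequency) for each bin.

    Observations are sorted by predicted probability and split into
    `n_bins` groups of near-equal size; empty groups are skipped.
    """
    pairs = sorted(zip(y_prob, y_true))
    n = len(pairs)
    rows = []
    for b in range(n_bins):
        chunk = pairs[b * n // n_bins:(b + 1) * n // n_bins]
        if not chunk:
            continue
        mean_pred = sum(p for p, _ in chunk) / len(chunk)
        observed = sum(y for _, y in chunk) / len(chunk)
        rows.append((mean_pred, observed))
    return rows
```

A well-calibrated model yields pairs in which the observed frequency is close to the mean predicted probability in every group, so the points lie near the diagonal when plotted.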

Discussion

In the current study, we present novel findings: (1) a practical and robust T2D risk calculation for incident T2D based on metabolic syndrome criteria; (2) a tool with improved capability for predicting progression to T2D compared with other commonly used paradigms (i.e., models developed in the Framingham Heart Study and by the American Diabetes Association), and generated using only quantitative data readily available to the clinician; (3) the first risk prediction tool, to our knowledge, for individuals of African descent derived from a large scientific US sample; (4) development and validation of the risk calculation model across 2 national cohorts in both black and white men and women; and (5) a unique T2D risk model that incorporates Bayesian hierarchical techniques into its risk prediction calculation. Metabolic syndrome traits constitute the basis of the prediction model, and the high AUC values highlight insulin resistance as the central pathophysiological process giving rise to these traits in the pathogenesis of T2D.

Quantitative and qualitative difference from other scores

This study substantially advances our previous work in smaller, regional cohorts, which demonstrated that metabolic syndrome traits can be used to predict progression to T2D in individuals with overweight or obesity. An earlier iteration of the idea for a tool that used metabolic syndrome trait thresholds assigned patients to 5 discrete risk strata and assumed that each trait contributed equally to T2D risk [19]. To enhance the predictive value, the binary predictors (i.e., values above and below threshold values) were differentially weighted based on their ability to confer risk for T2D and used to generate a numerical risk score. The relative proportion of risk attributable to each trait did vary as a function of race; however, the cohorts were smaller than the REGARDS cohort and did not represent a national sample of black and white individuals. The current version is a tool created using the Bayesian logistic regression approach (implemented in BhGLM) that effectively weights the contribution to overall risk for each factor over the continuum of values and incorporates effects of race and sex. Indeed, compared with the previous iteration, for which the AUC for the ROC was 0.72 [19], the AUC was improved to 0.79 when the model was fitted using REGARDS data. Also, by using the largest black American cohort currently available for these types of studies, this tool now provides a uniquely robust quantitative risk assessment in black Americans.

Clinical implications

The current Bayesian logistic model quantifies the 10-year risk for developing T2D. Weight loss medications and structured lifestyle interventions designed to achieve weight loss have been demonstrated to be highly effective in preventing T2D among patients with overweight or obesity [27]. Obesity, however, is highly prevalent, and weight loss interventions are laborious and entail clinical costs. Risk assessment can be used to identify those individuals at highest risk of T2D in whom weight loss interventions will have a higher benefit/risk ratio and be most cost-effective. More research is justified to assess the potential of our predictive model for individualizing care and selecting interventions to prevent cardiometabolic disease. For example, in a pooled study of 3,286 individuals who were overweight or had obesity participating in a clinical trial employing a weight loss medication (phentermine/topiramate extended release), the earlier iteration of the T2D risk model [19] effectively stratified T2D risk, and demonstrated that numbers needed to treat to prevent 1 case of T2D were markedly reduced in participants with higher risk scores at baseline [3]. Therefore, the current model offers healthcare professionals a more robust tool to assess T2D risk using quantitative clinical data that would be available based on clinical practice guidelines for patients with obesity [28].
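The link between baseline risk and number needed to treat (NNT) can be made concrete with a short worked sketch. It assumes, purely for illustration, that an intervention lowers T2D risk by the same relative amount in every risk stratum; the baseline risks and the relative risk reduction are hypothetical values, not results from the trials cited above.

```python
def number_needed_to_treat(baseline_risk, relative_risk_reduction):
    """NNT = 1 / absolute risk reduction, where the absolute risk reduction
    is baseline_risk * relative_risk_reduction (both expressed as fractions)."""
    absolute_risk_reduction = baseline_risk * relative_risk_reduction
    if absolute_risk_reduction <= 0:
        raise ValueError("absolute risk reduction must be positive")
    return 1.0 / absolute_risk_reduction


# Hypothetical 10-year baseline risks for a low-risk and a high-risk stratum,
# with an assumed constant relative risk reduction of 50%.
ASSUMED_RRR = 0.5
nnt_low = number_needed_to_treat(0.05, ASSUMED_RRR)
nnt_high = number_needed_to_treat(0.40, ASSUMED_RRR)
print(nnt_low, nnt_high)
```

Under this assumption, the NNT falls in inverse proportion to baseline risk, which is why directing interventions to individuals with higher predicted risk would be expected to prevent more cases per patient treated.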

To enhance the clinical utility of this tool, we additionally examined whether BMI could be substituted for waist circumference since waist circumference is not routinely measured in clinical venues. We found that the substitution of BMI for waist circumference did result in a minimal decrease in AUC; however, risk prediction remained robust such that BMI can be substituted for waist circumference in risk prediction.
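A comparison of this kind rests on computing the AUC for each candidate predictor set. The sketch below computes the AUC as the proportion of case/non-case pairs in which the case receives the higher predicted risk (ties count one half). The scores and labels are made-up toy values used only to show the calculation; they are not study data, and the variable names are hypothetical.

```python
def auc(scores, labels):
    """AUC as the fraction of (case, non-case) pairs ranked correctly."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    non_cases = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not non_cases:
        raise ValueError("need at least one case and one non-case")
    concordant = 0.0
    for c in cases:
        for n in non_cases:
            if c > n:
                concordant += 1.0
            elif c == n:
                concordant += 0.5  # ties count as half a concordant pair
    return concordant / (len(cases) * len(non_cases))


# Toy outcomes (1 = incident T2D) and toy predicted risks from two
# hypothetical model variants.
labels = [0, 0, 0, 1, 1, 1]
risk_with_waist = [0.10, 0.20, 0.30, 0.40, 0.60, 0.80]
risk_with_bmi = [0.10, 0.20, 0.45, 0.40, 0.60, 0.80]

auc_waist = auc(risk_with_waist, labels)
auc_bmi = auc(risk_with_bmi, labels)
print(auc_waist, auc_bmi)
```

In this toy example, one misordered pair lowers the AUC only slightly, which mirrors the kind of minimal decrease described above when one predictor is exchanged for another.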

Strengths and limitations

The main strength of this study is the use of a large, nationally sampled, biracial cohort with validation in a second cohort. The participants are well characterized, and only reproducible quantitative data (e.g., as opposed to less reliable or subjective data such as family history or reported physical activity) are used in generating the risk score. This allowed us to create a more meaningful, interactive system using readily available clinical data, which can be applied to quantify T2D risk in individual patients. Thus, this approach has clinical utility for identifying those most likely to benefit from therapeutic interventions to prevent T2D.

A limitation of this study is that we only have 2 time points from which to assess 10-year risk of T2D; therefore, no time-to-event models were applied. Between the first and second REGARDS survey, 8,532 participants withdrew from further follow-up; upon inspecting demographic and metabolic profile (including the 8 traits examined) differences between these 8,532 participants and those who remained, only baseline BMI and DBP showed no significant difference. While the reasons for withdrawing from follow-up are unknown, previous work in this population shows that missing data do not change exposure–outcome relationships in a study such as REGARDS [29]. Participants were only non-Hispanic white or non-Hispanic black, so generalizability to other populations will require caution, and future studies that address this issue would extend the racial/ethnic reach of risk assessment using our model. The mean age of the REGARDS participants at baseline was 63.12 years, so generalizability to younger populations is not advised. We did not have physical activity or family history information, so we were unable to input these when comparing this tool to the Framingham and American Diabetes Association tools.

Conclusion

The tool presented here, using nationally sampled data from black and white Americans, has high model discrimination using readily available quantitative clinical information. The predictive value is enhanced by adding race (black or white) data. This study also quantified the differential contribution of metabolic syndrome traits to T2D risk among black and white men and women, and established the first robust tool to our knowledge for predicting T2D among individuals of African descent. Weight loss achieved by structured lifestyle interventions and anti-obesity medications is highly effective in preventing progression to T2D [30–32]. This tool can be used by clinicians and researchers to quantitatively assess T2D risk among patients with overweight/obesity. Hopefully, rational strategies for the medical care of patients with obesity based on risk will lead to greater access to evidence-based therapies.

Supporting information

S1 TRIPOD Checklist. Prediction model development.

(DOCX)

S1 Table. External evaluations for Bayesian logistic model and 4 alternative methods.

(DOCX)

S2 Table. Ranges of variables used to calculate T2D probabilities.

(DOCX)

S1 Text. Comparison with alternative methods.

(DOCX)

S2 Text. Calculating the predictive risk probabilities of incident T2D.

(DOCX)

Acknowledgments

The authors would like to thank the other investigators, the staff, and the participants of the REGARDS study for their valuable contributions.

Abbreviations

ARIC

Atherosclerotic Risk in Communities

ATP III

Adult Treatment Panel III

AUC

area under the curve

CC

correlation coefficient

CMDS

cardiometabolic disease staging

DBP

diastolic blood pressure

HDL

high-density lipoprotein

OR

odds ratio

REGARDS

REasons for Geographic And Racial Differences in Stroke

ROC

receiver operating characteristic

SBP

systolic blood pressure

T2D

type 2 diabetes

Data Availability

This study uses data from the Reasons for Geographic and Racial Differences in Stroke (REGARDS) cohort. In order to abide by its obligations with NIH/NINDS and the Institutional Review Board of the University of Alabama at Birmingham, REGARDS facilitates data sharing through formal data use agreements. Any investigator is welcome to access the REGARDS data through this process. Requests for data access may be sent to regardsadmin@uab.edu. For R codes, readers can contact Nengjun Yi, nyi@uab.edu.

Funding Statement

This research project is supported by cooperative agreement U01 NS041588 co-funded by the National Institute of Neurological Disorders and Stroke (NINDS) and the National Institute on Aging (NIA), National Institutes of Health, Department of Health and Human Services. The content is solely the responsibility of the authors and does not necessarily represent the official views of the NINDS or the NIA. Representatives of the NINDS were involved in the review of the manuscript but not directly involved in the collection, management, analysis or interpretation of the data. The authors thank the other investigators, the staff, and the participants of the REGARDS study for their valuable contributions. A full list of participating REGARDS investigators and institutions can be found at http://www.regardsstudy.org. Additionally, the authors acknowledge support from the UAB Obesity Training Program (T32 DK062710); the American Heart Association Strategically Focused Obesity Research Network center at the University of Alabama at Birmingham (17SFRN33610070); the Merit Review program of the Department of Veterans Affairs (I01CX000432); the UAB Diabetes Research Center (P30 DK079626); and the UAB Nutrition Obesity Research Center (DK056336). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

  • 1. Centers for Disease Control and Prevention. National diabetes statistics report 2017: estimates of diabetes and its burden in the United States. Atlanta: Centers for Disease Control and Prevention; 2017.
  • 2. Diabetes Prevention Program Research Group. The Diabetes Prevention Program (DPP): description of lifestyle intervention. Diabetes Care. 2002;25:2165–71. doi: 10.2337/diacare.25.12.2165
  • 3. Guo F, Garvey WT. Cardiometabolic disease staging predicts effectiveness of weight loss therapy to prevent type 2 diabetes: pooled results from phase III clinical trials assessing phentermine/topiramate extended release. Diabetes Care. 2017;40(7):856–62. doi: 10.2337/dc17-0088
  • 4. Booth H, Khan O, Prevost T, Reddy M, Dregan A, Charlton J, et al. Incidence of type 2 diabetes after bariatric surgery: population-based matched cohort study. Lancet Diabetes Endocrinol. 2014;2(12):963–8. doi: 10.1016/S2213-8587(14)70214-1
  • 5. Grundy SM, Cleeman JI, Daniels SR, Donato KA, Eckel RH, Franklin BA, et al. Diagnosis and management of the metabolic syndrome: an American Heart Association/National Heart, Lung, and Blood Institute scientific statement. Circulation. 2005;112(17):2735–52. doi: 10.1161/CIRCULATIONAHA.105.169404
  • 6. Eisenmann JC. On the use of a continuous metabolic syndrome score in pediatric research. Cardiovasc Diabetol. 2008;7(1):17.
  • 7. DeBoer MD, Gurka MJ. Clinical utility of metabolic syndrome severity scores: considerations for practitioners. Diabetes Metab Syndr Obes. 2017;10:65–72. doi: 10.2147/DMSO.S101624
  • 8. Noble D, Mathur R, Dent T, Meads C, Greenhalgh T. Risk models and scores for type 2 diabetes: systematic review. BMJ. 2011;343:d7163. doi: 10.1136/bmj.d7163
  • 9. Guo F, Moellering DR, Garvey WT. The progression of cardiometabolic disease: validation of a new cardiometabolic disease staging system applicable to obesity. Obesity (Silver Spring). 2014;22(1):110–8.
  • 10. Friedman GD, Cutter GR, Donahue RP, Hughes GH, Hulley SB, Jacobs DR, et al. CARDIA: study design, recruitment, and some characteristics of the examined subjects. J Clin Epidemiol. 1988;41(11):1105–16. doi: 10.1016/0895-4356(88)90080-7
  • 11. The Atherosclerosis Risk in Communities (ARIC) study: design and objectives. The ARIC Investigators. Am J Epidemiol. 1989;129(4):687–702.
  • 12. Ford ES, Li C, Sattar N. Metabolic syndrome and incident diabetes: current state of the evidence. Diabetes Care. 2008;31(9):1898–904. doi: 10.2337/dc08-0423
  • 13. Kahn R, Buse J, Ferrannini E, Stern M. The metabolic syndrome: time for a critical appraisal—joint statement from the American Diabetes Association and the European Association for the Study of Diabetes. Diabetologia. 2005;28(9):2289–304.
  • 14. Gurka MJ, Golden SH, Musani SK, Sims M, Vishnu A, Guo Y, et al. Independent associations between a metabolic syndrome severity score and future diabetes by sex and race: the Atherosclerosis Risk In Communities Study and Jackson Heart Study. Diabetologia. 2017;60(7):1261–70. doi: 10.1007/s00125-017-4267-6
  • 15. Abbasi A, Peelen LM, Corpeleijn E, van der Schouw YT, Stolk RP, Spijkerman AMW, et al. Prediction models for risk of developing type 2 diabetes: systematic literature search and independent external validation study. BMJ. 2012;345:e5900. doi: 10.1136/bmj.e5900
  • 16. Howard VJ, Cushman M, Pulley L, Gomez CR, Go RC, Prineas RJ, et al. The reasons for geographic and racial differences in stroke study: objectives and design. Neuroepidemiology. 2005;25(3):135–43. doi: 10.1159/000086678
  • 17. Schmidt MI, Duncan BB, Bang H, Pankow JS, Ballantyne CM, Golden SH, et al. Identifying individuals at high risk for diabetes: the Atherosclerosis Risk in Communities study. Diabetes Care. 2005;28(8):2013–8. doi: 10.2337/diacare.28.8.2013
  • 18. Duncan BB, Schmidt MI, Pankow JS, Ballantyne CM, Couper D, Vigo A, et al. Low-grade systemic inflammation and the development of type 2 diabetes: the Atherosclerosis Risk in Communities study. Diabetes. 2003;52(7):1799–805. doi: 10.2337/diabetes.52.7.1799
  • 19. Guo F, Garvey WT. Development of a weighted cardiometabolic disease staging (CMDS) system for the prediction of future diabetes. J Clin Endocrinol Metab. 2015;100(10):3871–7. doi: 10.1210/jc.2015-2691
  • 20. Gelman A, Jakulin A, Pittau MG, Su YS. A weakly informative default prior distribution for logistic and other regression models. Ann Appl Stat. 2008;2(4):1360–83.
  • 21. Gelman A, Carlin JB, Stern HS, Dunson DB, Vehtari A, Rubin DB. Bayesian data analysis. 3rd edition. Boca Raton: Chapman and Hall/CRC Press; 2014.
  • 22. Ivanescu AE, Li P, George B, Brown AW, Keith SW, Raju D, et al. The importance of prediction model validation and assessment in obesity and nutrition research. Int J Obes. 2016;40(6):887–94.
  • 23. Alba AC, Agoritsas T, Walsh M, Hanna S, Iorio A, Devereaux PJ, et al. Discrimination and calibration of clinical prediction models. JAMA. 2017;318(14):1377. doi: 10.1001/jama.2017.12126
  • 24. Yi N, Tang Z, Zhang X, Guo B. BhGLM: Bayesian hierarchical GLMs and survival models, with applications to genomics and epidemiology. Bioinformatics. 2019;35(8):1419–21. doi: 10.1093/bioinformatics/bty803
  • 25. Wilson PWF, Meigs JB, Sullivan L, Fox CS, Nathan DM, D’Agostino RB. Prediction of incident diabetes mellitus in middle-aged adults: the Framingham Offspring Study. Arch Intern Med. 2007;167(10):1068–74. doi: 10.1001/archinte.167.10.1068
  • 26. Bang H, Edwards AM, Bomback AS, Ballantyne CM, Brillon D, Callahan MA, et al. Development and validation of a patient self-assessment score for diabetes risk. Ann Intern Med. 2009;151(11):775–83. doi: 10.7326/0003-4819-151-11-200912010-00005
  • 27. Jensen MD, Ryan DH, Apovian CM, Ard JD, Comuzzie AG, Donato KA, et al. 2013 AHA/ACC/TOS guideline for the management of overweight and obesity in adults: a report of the American College of Cardiology/American Heart Association Task Force on Practice Guidelines and The Obesity Society. J Am Coll Cardiol. 2014;63(25 Pt B):2985–3023.
  • 28. Garvey WT, Mechanick JI, Brett EM, Garber AJ, Hurley DL, Jastreboff AM, et al. American Association of Clinical Endocrinologists and American College of Endocrinology comprehensive clinical practice guidelines for medical care of patients with obesity. Endocr Pract. 2016;22(Suppl 3):1–203.
  • 29. Long DL, Howard G, Long DM, Judd S, Manly JJ, McClure LA, et al. An investigation of selection bias in estimating racial disparity in stroke risk factors. Am J Epidemiol. 2019;188(3):587–97. doi: 10.1093/aje/kwy253
  • 30. Knowler WC, Barrett-Connor E, Fowler SE, Hamman RF, Lachin JM, Walker EA, et al. Reduction in the incidence of type 2 diabetes with lifestyle intervention or metformin. N Engl J Med. 2002;346(6):393–403. doi: 10.1056/NEJMoa012512
  • 31. Garvey WT, Ryan DH, Henry R, Bohannon NJV, Toplak H, Schwiers M, et al. Prevention of type 2 diabetes in subjects with prediabetes and metabolic syndrome treated with phentermine and topiramate extended release. Diabetes Care. 2014;37(4):912–21. doi: 10.2337/dc13-1518
  • 32. le Roux CW, Astrup A, Fujioka K, Greenway F, Lau DCW, Van Gaal L, et al. 3 years of liraglutide versus placebo for type 2 diabetes risk reduction and weight management in individuals with prediabetes: a randomised, double-blind trial. Lancet. 2017;389(10077):1399–409. doi: 10.1016/S0140-6736(17)30069-7

Decision Letter 0

Adya Misra

6 Feb 2020

Dear Dr Wilkinson,

Thank you for submitting your manuscript entitled "Robust Prediction of Incident Diabetes Using Quantitative Clinical Data and a Bayesian Logistic Model for Cardiometabolic Disease Staging" for consideration by PLOS Medicine.

Your manuscript has now been evaluated by the PLOS Medicine editorial staff [as well as by an academic editor with relevant expertise] and I am writing to let you know that we would like to send your submission out for external peer review.

However, before we can send your manuscript to reviewers, we need you to complete your submission by providing the metadata that is required for full assessment. To this end, please login to Editorial Manager where you will find the paper in the 'Submissions Needing Revisions' folder on your homepage. Please click 'Revise Submission' from the Action Links and complete all additional questions in the submission questionnaire.

Please re-submit your manuscript within two working days, i.e. by .

Login to Editorial Manager here: https://www.editorialmanager.com/pmedicine

Once your full submission is complete, your paper will undergo a series of checks in preparation for peer review. Once your manuscript has passed all checks it will be sent out for review.

Feel free to email us at plosmedicine@plos.org if you have any queries relating to your submission.

Kind regards,

Adya Misra, PhD,

Senior Editor

PLOS Medicine

Decision Letter 1

Adya Misra

18 May 2020

Dear Dr. Wilkinson,

Thank you very much for submitting your manuscript "Robust Prediction of Incident Diabetes Using Quantitative Clinical Data and a Bayesian Logistic Model for Cardiometabolic Disease Staging" (PMEDICINE-D-20-00156R1) for consideration at PLOS Medicine.

Your paper was evaluated by a senior editor and discussed among all the editors here. It was also discussed with an academic editor with relevant expertise, and sent to independent reviewers, including a statistical reviewer. The reviews are appended at the bottom of this email and any accompanying reviewer attachments can be seen via the link below:

[LINK]

In light of these reviews, I am afraid that we will not be able to accept the manuscript for publication in the journal in its current form, but we would like to consider a revised version that addresses the reviewers' and editors' comments. Obviously we cannot make any decision about publication until we have seen the revised manuscript and your response, and we plan to seek re-review by one or more of the reviewers.

In revising the manuscript for further consideration, your revisions should address the specific points made by each reviewer and the editors. Please also check the guidelines for revised papers at http://journals.plos.org/plosmedicine/s/revising-your-manuscript for any that apply to your paper. In your rebuttal letter you should indicate your response to the reviewers' and editors' comments, the changes you have made in the manuscript, and include either an excerpt of the revised text or the location (eg: page and line number) where each change can be found. Please submit a clean version of the paper as the main article file; a version with changes marked should be uploaded as a marked up manuscript.

In addition, we request that you upload any figures associated with your paper as individual TIF or EPS files with 300dpi resolution at resubmission; please read our figure guidelines for more information on our requirements: http://journals.plos.org/plosmedicine/s/figures. While revising your submission, please upload your figure files to the PACE digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at PLOSMedicine@plos.org.

We expect to receive your revised manuscript by May 29 2020 11:59PM. Please email us (plosmedicine@plos.org) if you have any questions or concerns.

***Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.***

We ask every co-author listed on the manuscript to fill in a contributing author statement, making sure to declare all competing interests. If any of the co-authors have not filled in the statement, we will remind them to do so when the paper is revised. If all statements are not completed in a timely fashion this could hold up the re-review process. If new competing interests are declared later in the revision process, this may also hold up the submission. Should there be a problem getting one of your co-authors to fill in a statement we will be in contact. YOU MUST NOT ADD OR REMOVE AUTHORS UNLESS YOU HAVE ALERTED THE EDITOR HANDLING THE MANUSCRIPT TO THE CHANGE AND THEY SPECIFICALLY HAVE AGREED TO IT. You can see our competing interests policy here: http://journals.plos.org/plosmedicine/s/competing-interests.

Please use the following link to submit the revised manuscript:

https://www.editorialmanager.com/pmedicine/

Your article can be found in the "Submissions Needing Revision" folder.

To enhance the reproducibility of your results, we recommend that you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. For instructions see http://journals.plos.org/plosmedicine/s/submission-guidelines#loc-methods.

Please ensure that the paper adheres to the PLOS Data Availability Policy (see http://journals.plos.org/plosmedicine/s/data-availability), which requires that all data underlying the study's findings be provided in a repository or as Supporting Information. For data residing with a third party, authors are required to provide instructions with contact information for obtaining the data. PLOS journals do not allow statements supported by "data not shown" or "unpublished results." For such statements, authors must provide supporting data or cite public sources that include it.

We look forward to receiving your revised manuscript.

Sincerely,

Adya Misra, PhD

Senior Editor

PLOS Medicine

plosmedicine.org

-----------------------------------------------------------

Requests from the editors:

Please revise your title according to PLOS Medicine's style. Your title must be nondeclarative and not a question. It should begin with main concept if possible. "Effect of" should be used only if causality can be inferred, i.e., for an RCT. Please place the study design ("A randomized controlled trial," "A retrospective study," "A modelling study," etc.) in the subtitle (ie, after a colon).

Abstract

Please provide brief participant demographics from both cohorts

If you meant T2D, please do replace instances of “diabetes” with Type 2 diabetes or T2D.

Last sentence of the methods and findings section should be a limitation of your study design/methodology

Conclusions

Please start this section with “our results show” or similar

Please avoid overreaching conclusions and use of adjectives such as “superior” to describe the predictive model

Author Summary

At this stage, we ask that you include a short, non-technical Author Summary of your research to make findings accessible to a wide audience that includes both scientists and non-scientists. The Author Summary should immediately follow the Abstract in your revised manuscript. This text is subject to editorial change and should be distinct from the scientific abstract. Please see our author guidelines for more information: https://journals.plos.org/plosmedicine/s/revising-your-manuscript#loc-author-summary

Prospective analysis plan

Did your study have a prospective protocol or analysis plan? Please state this (either way) early in the Methods section.

a) If a prospective analysis plan (from your funding proposal, IRB or other ethics committee submission, study protocol, or other planning document written before analyzing the data) was used in designing the study, please include the relevant prospectively written document with your revised manuscript as a Supporting Information file to be published alongside your study, and cite it in the Methods section. A legend for this file should be included at the end of your manuscript.

b) If no such document exists, please make sure that the Methods section transparently describes when analyses were planned, and when/why any data-driven changes to analyses took place.

c) In either case, changes in the analysis-- including those made in response to peer review comments-- should be identified as such in the Methods section of the paper, with rationale.

Please ensure that the study is reported according to TRIPOD guideline, and include the completed checklist as Supporting Information. When completing the checklist, please use section and paragraph numbers, rather than page numbers. Please add the following statement, or similar, to the Methods: "This study is reported as per the Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis guideline (S1 Checklist)."

Please report your study according to the relevant guideline, which can be found here: http://www.equator-network.org/

Introduction

Could you add a reference at lines 67-68?

Could you please add a space between the text and reference brackets throughout, followed by a full stop.

Please introduce NHANES on first view

Line 105- please introduce HDL

Methods

Line 130- please can you add United States here?

Please provide the full name of the ethics committee that approved the protocol(s)

Line 142- it would be useful to have a bit more information regarding what analyses/measurements were carried out even if brief. You may wish to add a citation in addition, if this has been previously published

Please format your bibliography to Vancouver style

Comments from the reviewers:

Reviewer #1: The stated purpose of this analysis is to create a tool using quantitative predictors available in real-world clinical practice that identifies individuals who are most likely to benefit from therapies to prevent diabetes.

Comments:

Is the REGARDS dataset (2003-2007), collected for stroke patients, representative of the wider population in order to analyse T2DM incidence?

How do the baseline population characteristics compare to the wider population, in order to extrapolate results?

"External validation was performed using 9,710 participants from Atherosclerotic Risk in Communities (ARIC) (1987-1989), observed through 1996-1998."

Is this population an unbiased sampling frame for T2DM?

"...Atherosclerotic Risk in Communities (ARIC)[11] cohorts to predict incident diabetes with specificity for sex and race. CMDS was developed using quantitative measures of metabolic syndrome traits (i.e., ATP-III criteria)[12], with the limitation that these cohorts were not designed as nationally representative."

Are the authors referring to the same cohort that they are using for their external validation?

How did the authors cope with any potential changes in definition or testing methods of T2DM over time between baseline and follow up?

The authors use a Bayesian logistic model using full-information continuous predictors. The novelty of this research piece is in its application of continuous variables and interactions.

The method of Bayesian logistic modelling, and the means of measuring and comparing performance, seem appropriate given the context of this research question.

"Between the first and second visit, 5,713 individuals died, and 8,532 withdrew from further follow-up, leaving a population of 15,938 with follow-up data available."

What was cause of death (i.e. were any T2DM related?)? Were reasons for withdrawing from follow up provided (in order to understand if this missing data can be considered to be missing at random)?

"Diabetes was defined by having a fasting blood glucose level ≥126mg/dL, a non-fasting blood glucose ≥ 200mg/dL, self-reported diabetes or on diabetes medication"

Is there a risk of misclassification for cases of self diagnosed T2DM? This is part of the published ARIC protocol, but what is the potential impact in this setting?

There are some grammatical errors in the text, for example in the statistical methods section: "Cauchy priors, which has advantage of providing"

Did the authors assess correlation between parameters in the model, and the effect this might have on the model outcome and interpretation?

The authors provide a clear description of findings in the results section, aligned with informative tables and figures.

"additional reason for greater predictive value is that the previous models did not include age.[19] "

Can the authors compare their model with previous models updated to include age (albeit dichotomously), in order to compare how much uplift in the model performance is down to the inclusion of this variable, and how much is due to the novel application of Bayesian modelling?

Reviewer #2: This is a well-conducted study that developed a Bayesian logistic model using full-information continuous predictors to predict T2DM risk. This tool can be used for diabetes prevention efforts including weight loss therapy by allowing clinicians to target high risk individuals in a manner that could be used to optimize outcomes.

The authors developed the model in REGARDS and validated it in ARIC, two large national cohorts including both White and Black men and women. They compared the Bayesian method with several other predictive modeling methods, including lasso, generalized additive model, random forests, and support vector machine learning, and found that the Bayesian logistic model outperformed these alternative approaches.

They further compared the AUC from the Bayesian logistic model to the CMDS model, the Framingham and American Diabetes Association risk scores. The AUC for the Bayesian logistic model was superior at AUC 0.79 to other models or scores, such as the CMDS model (AUC 0.75), the Framingham scoring system (AUC 0.76) and American Diabetes Association scoring system (AUC 0.64).

Reviewer #3: This manuscript applies a Bayesian logistic model, using full-information continuous analysis of nationally sampled data from white and black American adults, to predict T2DM risk. The manuscript is well written and provides a powerful predictive tool to be used for diabetes prevention. There were some issues related to this manuscript.

1. As the authors indicated, 8,532 participants withdrew from further follow-up in REGARDS. Since REGARDS is a nationally sampled longitudinal survey, are there any differences between these 8,532 participants and the participants who remained for further analysis in the present study?

2. Line 230, age (45-92) should be age (45-92 y).

3. In Table A2, the minimum values of BMI, waist circumference, blood glucose, and HDL are extremely low. It would be better to exclude participants with these outlying values. Also, the units (kg/m2 for BMI and cm for waist circumference) should be added to this table.

4. The mean age of the REGARDS participants is 63.12 years. Therefore, generalizability to younger populations also requires caution.

Any attachments provided with reviews can be seen via the following link:

[LINK]

Decision Letter 2

Adya Misra

10 Jun 2020

Dear Dr. Wilkinson,

Thank you very much for re-submitting your manuscript "Development and Validation of a Model for Predicting Incident Type-2 Diabetes Using Quantitative Clinical Data and a Bayesian Logistic Model: A Nationwide Cohort and Modelling Study" (PMEDICINE-D-20-00156R2) for review by PLOS Medicine.

I have discussed the paper with my colleagues and the academic editor and it was also seen again by reviewers. I am pleased to say that provided the remaining editorial and production issues are dealt with we are planning to accept the paper for publication in the journal.

The remaining issues that need to be addressed are listed at the end of this email. Any accompanying reviewer attachments can be seen via the link below. Please take these into account before resubmitting your manuscript:

[LINK]

Our publications team (plosmedicine@plos.org) will be in touch shortly about the production requirements for your paper, and the link and deadline for resubmission. DO NOT RESUBMIT BEFORE YOU'VE RECEIVED THE PRODUCTION REQUIREMENTS.

***Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.***

In revising the manuscript for further consideration here, please ensure you address the specific points made by each reviewer and the editors. In your rebuttal letter you should indicate your response to the reviewers' and editors' comments and the changes you have made in the manuscript. Please submit a clean version of the paper as the main article file. A version with changes marked must also be uploaded as a marked up manuscript file.

Please also check the guidelines for revised papers at http://journals.plos.org/plosmedicine/s/revising-your-manuscript for any that apply to your paper. If you haven't already, we ask that you provide a short, non-technical Author Summary of your research to make findings accessible to a wide audience that includes both scientists and non-scientists. The Author Summary should immediately follow the Abstract in your revised manuscript. This text is subject to editorial change and should be distinct from the scientific abstract.

We ask every co-author listed on the manuscript to fill in a contributing author statement. If any of the co-authors have not filled in the statement, we will remind them to do so when the paper is revised. If all statements are not completed in a timely fashion this could hold up the re-review process. Should there be a problem getting one of your co-authors to fill in a statement we will be in contact. YOU MUST NOT ADD OR REMOVE AUTHORS UNLESS YOU HAVE ALERTED THE EDITOR HANDLING THE MANUSCRIPT TO THE CHANGE AND THEY SPECIFICALLY HAVE AGREED TO IT.

Please ensure that the paper adheres to the PLOS Data Availability Policy (see http://journals.plos.org/plosmedicine/s/data-availability), which requires that all data underlying the study's findings be provided in a repository or as Supporting Information. For data residing with a third party, authors are required to provide instructions with contact information for obtaining the data. PLOS journals do not allow statements supported by "data not shown" or "unpublished results." For such statements, authors must provide supporting data or cite public sources that include it.

If you have any questions in the meantime, please contact me or the journal staff on plosmedicine@plos.org.

We look forward to receiving the revised manuscript by Jun 15 2020 11:59PM.

Sincerely,

Adya Misra, PhD

Senior Editor

PLOS Medicine

plosmedicine.org

------------------------------------------------------------

Requests from Editors:

COI – can you please say who is a paid employee of Novo Nordisk? I think this needs clarity.

Abstract – could you add some more demographic information, like mean age and maybe BMI ranges?

Line 237 “We found that our Bayesian logistic model outperformed these alternative approaches.” Please provide a call out to Table A1, where the comparison is.

Reference call outs should be in square brackets, please.

Line 101 – suggest rephrasing “along ethnic, social, or economic lines”

Throughout – please take care to avoid saying “participants were obese” and instead say “participants suffered from obesity” or similar to avoid the use of stigmatising language. For example, Line 258.

Please provide exact p-values (for example, at line 304) unless the p-value is <0.001, and check that p-values are provided throughout, where appropriate.

Line 337 should say “with improved capability”

Please temper the assertions of primacy by adding “to our knowledge” in the discussion

Please add a sentence in the methods to note the analyses were prespecified and that the analysis plans are provided as SI files

Please remove all iterations of "[Internet]" from the reference list.

I think Table 1 is not visible in the manuscript PDF and may need adjusting

Please remove page numbers from the TRIPOD checklist as these are likely to change. Instead please use paragraphs and sections

Decision Letter 3

Adya Misra

13 Jul 2020

Dear Dr Wilkinson,

On behalf of my colleagues and the academic editor, Dr. Karine Clément, I am delighted to inform you that your manuscript entitled "Development and Validation of a Model for Predicting Incident Type-2 Diabetes Using Quantitative Clinical Data and a Bayesian Logistic Model: A Nationwide Cohort and Modelling Study" (PMEDICINE-D-20-00156R3) has been accepted for publication in PLOS Medicine.

PRODUCTION PROCESS

Before publication you will see the copyedited Word document (in around 1-2 weeks from now) and a PDF galley proof shortly after that. The copyeditor will be in touch shortly before sending you the copyedited Word document. We will make some revisions at the copyediting stage to conform to our general style, and for clarification. When you receive this version you should check and revise it very carefully, including figures, tables, references, and supporting information, because corrections at the next stage (proofs) will be strictly limited to (1) errors in author names or affiliations, (2) errors of scientific fact that would cause misunderstandings to readers, and (3) printer's (introduced) errors.

If you are likely to be away when either this document or the proof is sent, please ensure we have contact information of a second person, as we will need you to respond quickly at each point.

PRESS

A selection of our articles each week are press released by the journal. You will be contacted nearer the time if we are press releasing your article in order to approve the content and check the contact information for journalists is correct. If your institution or institutions have a press office, please notify them about your upcoming paper at this point, to enable them to help maximize its impact.

PROFILE INFORMATION

Now that your manuscript has been accepted, please log into EM and update your profile. Go to https://www.editorialmanager.com/pmedicine, log in, and click on the "Update My Information" link at the top of the page. Please update your user information to ensure an efficient production and billing process.

Thank you again for submitting the manuscript to PLOS Medicine. We look forward to publishing it.

Best wishes,

Adya Misra, PhD

Senior Editor

PLOS Medicine

plosmedicine.org

Associated Data

    Supplementary Materials

    S1 TRIPOD Checklist. Prediction model development.

    (DOCX)

    S1 Table. External evaluations for Bayesian logistic model and 4 alternative methods.

    (DOCX)

    S2 Table. Ranges of variables used to calculate T2D probabilities.

    (DOCX)

    S1 Text. Comparison with alternative methods.

    (DOCX)

    S2 Text. Calculating the predictive risk probabilities of incident T2D.

    (DOCX)

    Attachment

    Submitted filename: Response to Reviewers.docx

    Attachment

    Submitted filename: FInal_revisions_RESPONSE.docx

    Data Availability Statement

    This study uses data from the Reasons for Geographic and Racial Differences in Stroke (REGARDS) cohort. In order to abide by its obligations with NIH/NINDS and the Institutional Review Board of the University of Alabama at Birmingham, REGARDS facilitates data sharing through formal data use agreements. Any investigator is welcome to access the REGARDS data through this process. Requests for data access may be sent to regardsadmin@uab.edu. For the R code, readers can contact Nengjun Yi, nyi@uab.edu.


    Articles from PLoS Medicine are provided here courtesy of PLOS