The Journal of Clinical Endocrinology and Metabolism. 2022 Aug 17;107(11):3120–3127. doi: 10.1210/clinem/dgac487

Metabolic and Genetic Markers Improve Prediction of Incident Type 2 Diabetes: A Nested Case-Control Study in Chinese

Jia Liu, Lu Wang, Yun Qian, Qian Shen, Man Yang, Yunqiu Dong, Hai Chen, Zhijie Yang, Yaqi Liu, Xuan Cui, Hongxia Ma, Guangfu Jin
PMCID: PMC9681609  PMID: 35977051

Abstract

Context

Better tools are needed to predict the risk of type 2 diabetes (T2D).

Objective

We aimed to identify novel metabolic markers of future T2D in Chinese individuals of Han ethnicity and to determine whether combining metabolic and genetic markers improves the accuracy of prediction models based on clinical factors.

Methods

A nested case-control study of 220 patients with incident T2D and 220 age- and sex-matched controls, all normoglycemic Chinese individuals of Han ethnicity at baseline, was conducted within the Wuxi Non-Communicable Disease cohort with 12 years of follow-up. Untargeted metabolic profiling was performed by high-performance liquid chromatography–mass spectrometry (HPLC-MS), and 20 single nucleotide polymorphisms (SNPs) associated with T2D were genotyped on the iPLEX Sequenom MassARRAY platform. Machine learning methods were used to identify metabolites associated with future T2D risk.

Results

We found that abnormal levels of 5 metabolites were associated with an increased risk of future T2D: riboflavin, cnidioside A, 2-methoxy-5-(1H-1,2,4-triazol-5-yl)-4-(trifluoromethyl)pyridine, 7-methylxanthine, and mestranol. The genetic risk score (GRS) based on 20 SNPs was significantly associated with T2D risk (OR = 1.35; 95% CI, 1.08-1.70 per SD). The area under the receiver operating characteristic curve (AUC) was greater for the model containing metabolites, GRS, and clinical traits than for the model containing clinical traits only (0.960 vs 0.798, P = 7.91 × 10^-16).

Conclusion

In individuals with normal fasting glucose levels, abnormal levels of 5 metabolites were associated with future T2D. The combination of newly discovered metabolic markers and genetic markers could improve the prediction of incident T2D.

Keywords: metabolite, genetic variants, nested case-control study, type 2 diabetes


Type 2 diabetes (T2D) is a disorder characterized by hyperglycemia, insulin resistance, and deficient pancreatic beta-cell function; this subtype accounts for approximately 90% of all diabetes cases (1). According to the International Diabetes Federation (IDF), an estimated 463 million adults worldwide were affected by diabetes in 2019, and 10% of global health expenditure (760 billion USD) was spent on diabetes (2). Globally, the age-standardized mortality rate for diabetes has shown an unfavorable trend, increasing by 3% from 2000 to 2019 (3). In China, the prevalence of T2D in adults aged ≥ 18 years rose rapidly from less than 1% in 1980 to 11.2% in 2017 (4). Moreover, the prevalence of T2D in young people (young-onset T2D) has increased in almost every region in recent years (5, 6). Diabetes has become an urgent public health problem worldwide, and identifying high-risk individuals is a critical first step toward precision preventive interventions (7).

Many studies have shown that medical history, clinical measurements such as body mass index (BMI), and laboratory tests such as fasting plasma glucose (FPG) can help predict future diabetes risk (8). However, these predictors become abnormal only after years of subclinical metabolic dysfunction and provide little additional insight into the underlying pathophysiology (9). Novel markers are therefore needed for early prediction of T2D risk.

Metabolomics is the high-throughput quantification of small-molecule metabolites in biological samples (10). Several nested case-control and cohort studies in Caucasian populations have sought metabolic markers of incident T2D (11-14). In particular, Qiu et al used a targeted metabolomic approach in a nested case-control study within 2 prospective cohorts of Chinese adults and found 4 metabolites associated with an increased risk of T2D (15). Genetic factors also contribute to the pathogenesis of T2D (1), whose heritability is estimated at 30% to 70% (16). In our previous study, we validated the association between single nucleotide polymorphisms (SNPs) and T2D risk among Chinese Han individuals (17). Several studies have shown that adding SNPs to clinical prediction models slightly improves the prediction of T2D (18-21), indicating that genetic information adds to the predictive accuracy of traditional risk factors, especially in younger populations. A study in the Framingham offspring cohort indicated that metabolic and genetic traits provide complementary information for predicting future T2D (22). However, in the Chinese population, no untargeted metabolomic study has searched for markers of future T2D, and no study has reported combining genetic and metabolic information to predict T2D risk.

In addition, machine learning is a rapidly growing field in which algorithms are built to predict an outcome by extracting informative variables from large datasets (23). Machine learning approaches have recently been applied increasingly to the clinical epidemiology of diabetes, including risk stratification and personalized treatment decisions. Common machine learning methods include neural networks, gradient boosting machines, and random forests. Tree-based methods often achieve high sensitivity and specificity in guiding risk prediction and clinical interventions, and neural networks can be applied to medical imaging data (24). Notably, machine learning has been applied to several -omics data types (genomic, metabolomic, and gut microbiome) together with traditional electronic medical records in diabetes research (25-28). For example, plasma N-glycans were identified as biomarkers of future T2D risk using a machine learning approach in the EPIC-Potsdam cohort study (25). Several risk prediction models have been established for T2D (29, 30). However, no study has used machine learning to predict T2D from metabolic information in Chinese individuals of Han ethnicity.

Therefore, we hypothesized that combining metabolic and genetic traits would improve T2D prediction in the Chinese population compared with traditional risk factors alone. We designed a nested case-control study in the Wuxi Non-Communicable Diseases cohort (Wuxi NCDs cohort), used an untargeted metabolomic strategy and machine learning methods to identify metabolites related to future T2D, and evaluated whether combining genetic and metabolic information improves predictive ability.

Methods

Study Subjects

The Wuxi Non-Communicable Diseases cohort (Wuxi NCDs cohort) was established in 2007; participants were recruited by random cluster sampling from residents of 2 communities in the city of Wuxi, Jiangsu, China. All participants were at least 30 years old, had lived at their current residence for more than 5 years, and had no intention of moving within the next 5 years. In total, 10 858 participants completed the baseline survey, which included a face-to-face questionnaire, a physical examination, and blood biochemical tests. Overnight fasting blood samples were collected from every participant, processed immediately, and stored at −80 °C until assayed. Annual information system–based follow-ups were conducted to determine each participant's vital status and diabetes status. When a participant was first diagnosed with T2D, demographic and diagnostic information was retrieved from the Wuxi chronic disease reporting system, the health record management system, and the hospital information system. Trained staff reviewed medical records and collected the clinical information needed to validate the diagnosis. T2D was defined as FPG ≥ 7.0 mmol/L or a history of T2D. After case review, 595 participants had developed T2D as of December 2019, of whom 220 were randomly selected for genotyping and plasma metabolomics. Controls were randomly selected from participants who were free of cancer, diabetes, and cardiovascular disease at baseline and remained free of T2D throughout follow-up, and were matched 1:1 to the 220 incident cases by age (± 5 years) and sex.
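
The 1:1 matching rule can be illustrated with a short sketch. It is hypothetical and is not the cohort's actual selection code: the `Person` record, the case-by-case (greedy) processing order, and the fixed random seed are assumptions. The source specifies only random selection, a 1:1 ratio, sex matching, and an age window of ± 5 years. The eligibility criteria (free of cancer, diabetes, and cardiovascular disease at baseline and of T2D during follow-up) are assumed to have been applied to `pool` beforehand.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    pid: str   # participant identifier (hypothetical field)
    age: int   # age in years at baseline
    sex: str   # "M" or "F"


def match_controls(cases, pool, caliper=5, seed=2007):
    """Draw one control per case: same sex and |age difference| <= caliper years.

    Controls are sampled at random without replacement, processing cases in
    the order given. Cases with no eligible control are left unmatched.
    Returns a dict mapping case pid -> control pid.
    """
    rng = random.Random(seed)  # fixed seed (assumption) so the draw is reproducible
    used = set()
    matches = {}
    for case in cases:
        eligible = [
            c for c in pool
            if c.pid not in used
            and c.sex == case.sex
            and abs(c.age - case.age) <= caliper
        ]
        if eligible:
            chosen = rng.choice(eligible)
            used.add(chosen.pid)
            matches[case.pid] = chosen.pid
    return matches


cases = [Person("case1", 53, "F"), Person("case2", 60, "M")]
pool = [Person("ctl1", 50, "F"), Person("ctl2", 58, "M"), Person("ctl3", 70, "M")]
print(match_controls(cases, pool))  # {'case1': 'ctl1', 'case2': 'ctl2'}
```

Greedy matching of this kind can leave later cases without an eligible control when suitable controls are scarce; optimal matching methods avoid this at extra computational cost.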

At baseline, after signing informed consent forms, subjects were interviewed in person by trained interviewers to collect personal information and demographic data, including name, ID number, address, occupation, history of diabetes, hypertension, stroke, and cancer, and smoking and drinking status. Height, weight, waist circumference (WC), and blood pressure were measured at the same visit. Approximately 5 mL of venous blood was collected from each subject after fasting for more than 8 hours. FPG and lipid levels (triglycerides [TG], total cholesterol [TC], and high-density lipoprotein cholesterol [HDL-C]) were measured by standard methods on an automated biochemistry analyzer (Olympus C2734-Au640). This study was approved by the Institutional Review Board of Wuxi Center for Disease Control and Prevention.

Metabolic Profiling

Metabolic profiling was performed by high-performance liquid chromatography–mass spectrometry (HPLC-MS) (ThermoFisher, Germany). In brief, each 100-μL plasma sample was extracted with 400 μL methanol and centrifuged (10 minutes, 15 000 rpm, 4 °C). The supernatant was diluted in 53% methanol and centrifuged again (10 minutes, 15 000 rpm, 4 °C), and the resulting supernatant was injected into the HPLC-MS system, with 0.1% formic acid as mobile phase A and methanol as mobile phase B. All 440 samples were pooled to serve as a quality control (QC) sample, and a blank control (53% methanol) was included to remove background signal. Metabolites were identified on the basis of retention time and mass-to-charge ratio and quantified by peak area. Internal standard peak areas were monitored for QC, and samples whose peak areas deviated from the mean by more than 2 SD during a single day's analysis were reanalyzed. QC procedures were otherwise the same as previously reported (31). In brief, all samples were run in random order, blinded to the technicians, and a pooled QC sample was placed after every 10 study samples (44 QC samples for the 440 study samples) to monitor the stability of the HPLC-MS system. Raw data were processed using TraceFinder 3.1 (Thermo Fisher Scientific), and the peak area of each metabolite was normalized to the pooled QC samples using a LOESS correction curve before further analysis.
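
The source states only that peak areas were normalized to the pooled QC samples with a LOESS correction curve. The following is a minimal plain-Python sketch of one common form of that step, not the authors' pipeline: for each metabolite, a LOESS curve of QC peak area against injection order is fitted, and every run is divided by the fitted value and rescaled to the median QC peak area. The span (`frac=0.75`), the local linear fit without robustness iterations, the multiplicative rescaling, and the function names are all assumptions.

```python
import math
from statistics import median


def loess_at(x0, xs, ys, frac=0.75):
    """Local linear LOESS estimate at x0 using tricube weights.

    The bandwidth is the distance from x0 to its k-th nearest point,
    with k = ceil(frac * n) (at least 3).
    """
    n = len(xs)
    if n < 3:
        raise ValueError("need at least 3 QC points")
    k = min(n, max(3, math.ceil(frac * n)))
    h = sorted(abs(x - x0) for x in xs)[k - 1]
    h = max(h, 1e-12) * (1 + 1e-9)  # widen slightly so the k nearest points keep positive weight
    sw = swx = swy = swxx = swxy = 0.0
    for x, y in zip(xs, ys):
        u = abs(x - x0) / h
        if u >= 1:
            continue  # tricube weight is zero outside the window
        w = (1 - u ** 3) ** 3
        sw += w
        swx += w * x
        swy += w * y
        swxx += w * x * x
        swxy += w * x * y
    denom = sw * swxx - swx ** 2
    if abs(denom) < 1e-12:
        return swy / sw  # no spread in x: fall back to the weighted mean
    slope = (sw * swxy - swx * swy) / denom
    intercept = (swy - slope * swx) / sw
    return intercept + slope * x0


def qc_loess_correct(order, values, is_qc, frac=0.75):
    """Divide out injection-order drift estimated from pooled-QC injections.

    order: injection order of every run; values: one metabolite's peak areas;
    is_qc: True for QC injections. Corrected values are rescaled to the
    median QC peak area so they stay on the original scale.
    """
    qc_x = [o for o, q in zip(order, is_qc) if q]
    qc_y = [v for v, q in zip(values, is_qc) if q]
    target = median(qc_y)
    corrected = []
    for o, v in zip(order, values):
        fitted = loess_at(o, qc_x, qc_y, frac)
        if fitted <= 0:
            raise ValueError(f"non-positive LOESS fit at injection {o}")
        corrected.append(v * target / fitted)
    return corrected


order = list(range(16))
is_qc = [o % 3 == 0 for o in order]                   # QC at 0, 3, 6, 9, 12, 15
drift = [100 + 2 * o for o in order]                  # simulated linear signal drift
values = [d if q else 0.5 * d for d, q in zip(drift, is_qc)]
out = qc_loess_correct(order, values, is_qc)
print(round(out[4], 6))  # 57.5 (0.5 x the median QC value of 115)
```

In this toy run the drift is removed exactly because a local linear fit reproduces a straight line; real drift curves are not linear, which is why a local rather than global fit is used.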

SNP Selection

We previously identified 9 SNPs in 8 genes independently associated with T2D risk in a 2-stage case-control study (17), and these 9 SNPs (rs13266634, rs10811661, rs2237897, rs1552224, rs7756992, rs9472138, rs1111875, rs7923837, rs8050136) were included in the current study. We also included 12 additional SNPs that had been validated as T2D-associated in Chinese individuals of Han ethnicity (32-40). Because of assay design failure, rs7756992 was removed, leaving 20 T2D-associated SNPs in the current study. More details are provided in Supplementary Table 1 in an online repository (41).

DNA Isolation and Genotyping

Genomic DNA was isolated from the leukocyte pellets of venous blood by proteinase K digestion, followed by phenol-chloroform extraction and ethanol precipitation. The quality of all DNA samples was checked by electrophoresis. SNPs were genotyped on the iPLEX Sequenom MassARRAY platform (Sequenom, Inc., San Diego, CA, USA), and each plate included 2 no-template controls for QC.

Statistical Analysis

Continuous variables (age, FPG, blood lipids, systolic blood pressure [SBP], diastolic blood pressure [DBP], BMI, and WC) are described as mean ± SD, and categorical variables (sex, smoking and drinking status, and family history of diabetes) as frequencies (percentages). Each SNP was coded as the number of risk alleles carried (0, 1, or 2), and the codes for the 20 SNPs were summed for each individual to obtain the genetic risk score (GRS). Metabolite data were natural-log-transformed before further analysis.
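
The coding described above can be sketched as follows; the helper names and genotypes are hypothetical. The sketch assumes an unweighted score (each risk allele counts equally), which is what summing 0/1/2 codes implies, and assumes that the "per SD" odds ratio reported for the GRS corresponds to standardizing the score to z-scores before regression. Handling of missing genotypes is not described in the source and is not addressed here.

```python
import math
from statistics import mean, stdev


def genetic_risk_score(risk_allele_counts):
    """Unweighted GRS: sum of risk-allele counts (0, 1, or 2) over SNPs."""
    for count in risk_allele_counts:
        if count not in (0, 1, 2):
            raise ValueError(f"risk-allele count must be 0, 1, or 2, got {count!r}")
    return sum(risk_allele_counts)


def standardize(values):
    """Convert to z-scores so that a regression coefficient is expressed per SD."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def ln_transform(peak_areas):
    """Natural-log transform of positive metabolite peak areas."""
    return [math.log(a) for a in peak_areas]


# Hypothetical genotypes for 3 participants (20 SNPs each).
genotypes = [
    [1, 0, 2, 1, 1, 0, 1, 2, 0, 1, 1, 1, 0, 2, 1, 0, 1, 1, 2, 0],
    [2, 1, 2, 1, 1, 1, 1, 2, 1, 1, 2, 1, 0, 2, 1, 1, 1, 1, 2, 1],
    [0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0],
]
grs = [genetic_risk_score(g) for g in genotypes]
print(grs)  # [18, 25, 8]
print(standardize(grs))
```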

Metabolites were selected in 2 steps. First, logistic regression was used to identify metabolites associated with T2D risk after Bonferroni correction. Second, the metabolites identified by logistic regression were entered as predictors into a random forest, an algorithm proposed by Leo Breiman. The number of trees in the ensemble (ntree) was set to 500, and the importance of each metabolite was measured by the increase in node purity (IncNodePurity).
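
The two filters can be expressed as a short sketch. It assumes the per-metabolite P values and IncNodePurity scores have already been computed (in the study, by logistic regression and by R's randomForest package, which is not reimplemented here). The importance cutoff of IncNodePurity > 10 is taken from the Results; the function name and tuple layout are hypothetical.

```python
def two_stage_selection(results, n_tests, alpha=0.05, min_importance=10.0):
    """Keep metabolites that pass Bonferroni correction and an importance cutoff.

    results: iterable of (name, p_value, inc_node_purity) tuples.
    Returns the surviving names sorted by decreasing importance.
    """
    threshold = alpha / n_tests  # Bonferroni-corrected significance level
    kept = [(name, imp) for name, p, imp in results
            if p < threshold and imp > min_importance]
    return [name for name, _ in sorted(kept, key=lambda t: t[1], reverse=True)]


# Three rows from Table 2, plus one hypothetical row that fails Bonferroni.
rows = [
    ("Mestranol", 3.21e-23, 11.80),
    ("Riboflavin", 1.99e-20, 18.02),
    ("S-(Methyl)Glutathione", 1.33e-19, 0.93),
    ("Hypothetical metabolite", 1.0e-3, 25.0),
]
print(0.05 / 1579)                               # Bonferroni threshold, about 3.17e-05
print(two_stage_selection(rows, n_tests=1579))   # ['Riboflavin', 'Mestranol']
```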

For the metabolites identified by random forest, a metabolite score was created for each participant by summing the quartile ranks of the metabolites: $\text{Metabolite score} = \sum_{i=1}^{5} \text{Quartile rank}(i)$, where $i$ indexes the 5 identified metabolites. The AUC of each model was calculated with the pROC package in R developed by Robin et al (42). Briefly, the area under the receiver operating characteristic (ROC) curve is computed with the trapezoidal rule, and its 95% CI is estimated from 2000 bootstrap replicates. To assess the stability of the AUC, we also performed 10-fold cross-validation and averaged the AUC across the 10 test sets for each model. Models were compared with a bootstrap test as follows: 1) draw n = 2000 bootstrap replicates; 2) for each replicate, compute the AUCs of the 2 ROC curves and store their difference; 3) compute $D = (\mathrm{AUC}_1 - \mathrm{AUC}_2)/s$, where $\mathrm{AUC}_1$ and $\mathrm{AUC}_2$ are the AUCs of the 2 original ROC curves and $s$ is the SD of the bootstrap differences; 4) compare $D$ with the standard normal distribution to obtain the P value for the comparison of the 2 models. All statistical analyses were performed in R (version 3.5.1; The R Foundation for Statistical Computing) using the epicalc package, the randomForest package for machine learning, and the pROC package for AUC analysis.
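
pROC performs these computations internally in R. The following is an independent Python sketch of the same ideas, not pROC's implementation. The AUC is computed with the rank-sum (Mann-Whitney) identity, which gives the same value as the trapezoidal rule on the empirical ROC curve, and the model comparison follows the 4 bootstrap steps listed above. Resampling cases and controls separately (stratified) and the fixed seed are assumptions of this sketch.

```python
import random
from statistics import NormalDist, stdev


def auc(scores, labels):
    """Empirical ROC AUC via the rank-sum (Mann-Whitney) identity.

    Tied scores receive their average rank, so a tie counts as half a win.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # average 1-based rank of the tie block
        i = j + 1
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def bootstrap_auc_test(scores1, scores2, labels, n_boot=2000, seed=42):
    """Paired bootstrap comparison of two correlated ROC curves.

    In each replicate, the same resampled participants are scored by both
    models. Returns (D, two-sided P value).
    """
    rng = random.Random(seed)
    cases = [i for i, y in enumerate(labels) if y == 1]
    controls = [i for i, y in enumerate(labels) if y == 0]
    diffs = []
    for _ in range(n_boot):
        idx = [rng.choice(cases) for _ in cases] + [rng.choice(controls) for _ in controls]
        lab = [labels[i] for i in idx]
        diffs.append(auc([scores1[i] for i in idx], lab) - auc([scores2[i] for i in idx], lab))
    s = stdev(diffs)  # SD of the bootstrap AUC differences
    if s == 0:
        raise ValueError("bootstrap differences have zero spread; D is undefined")
    d = (auc(scores1, labels) - auc(scores2, labels)) / s
    p = 2 * (1 - NormalDist().cdf(abs(d)))
    return d, p


print(auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))  # 0.75
```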

Results

The demographic and clinical characteristics of incident T2D cases and healthy controls at baseline are summarized in Table 1. Cases and controls did not differ significantly in age, sex, smoking, drinking, family history of diabetes, or total cholesterol (TC) level (P > 0.05). At baseline, incident T2D cases had significantly higher FPG levels than healthy controls (5.21 ± 0.79 vs 4.58 ± 0.57 mmol/L), and TG, HDL-C, SBP, DBP, BMI, and WC also differed significantly between the 2 groups. The GRS calculated from 20 SNPs was significantly associated with T2D risk (OR = 1.35; 95% CI, 1.08-1.70 per SD).
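
To illustrate what an odds ratio "per SD" means, the sketch below standardizes a predictor and fits a one-predictor logistic regression by Newton-Raphson, reporting exp(β) with a Wald 95% CI. It is unadjusted and unconditional; the source does not state which covariates, if any, entered the GRS model, or whether a conditional logistic model was used for the matched pairs. The function names and the data in the usage example are hypothetical.

```python
import math


def _newton_logit(z, y, max_iter=50, tol=1e-10):
    """Newton-Raphson for logit P(y=1) = b0 + b1*z; returns (b0, b1, var_b1)."""
    b0 = b1 = 0.0
    for _ in range(max_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for zi, yi in zip(z, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * zi)))
            w = p * (1.0 - p)
            g0 += yi - p            # score (gradient of the log-likelihood)
            g1 += (yi - p) * zi
            h00 += w                # Fisher information matrix entries
            h01 += w * zi
            h11 += w * zi * zi
        det = h00 * h11 - h01 * h01
        step0 = (h11 * g0 - h01 * g1) / det   # step = I^-1 * score
        step1 = (h00 * g1 - h01 * g0) / det
        if max(abs(step0), abs(step1)) < tol:
            return b0 + step0, b1 + step1, h00 / det  # var(b1) = [I^-1]_11
        b0 += step0
        b1 += step1
    raise RuntimeError("Newton-Raphson did not converge (possible separation)")


def or_per_sd(x, y):
    """Odds ratio and Wald 95% CI per 1-SD increase in x."""
    n = len(x)
    m = sum(x) / n
    sd = math.sqrt(sum((v - m) ** 2 for v in x) / (n - 1))
    z = [(v - m) / sd for v in x]
    _, b1, var_b1 = _newton_logit(z, y)
    se = math.sqrt(var_b1)
    return math.exp(b1), math.exp(b1 - 1.96 * se), math.exp(b1 + 1.96 * se)


# Hypothetical scores: controls first, then cases (overlapping, so no separation).
scores = [1, 2, 3, 4, 5, 6, 2, 3, 4, 5, 6, 7]
t2d = [0] * 6 + [1] * 6
odds_ratio, lo, hi = or_per_sd(scores, t2d)
print(f"OR per SD = {odds_ratio:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```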

Table 1.

Demographic and clinical characteristics of study subjects at baseline

Characteristic | Case (N = 220) | Control (N = 220) | P
Age (years) | 53.35 ± 6.71 | 53.35 ± 6.71 | -
Male (N, %) | 92 (41.82%) | 92 (41.82%) | -
Smoking (N, %) | 59 (26.82%) | 53 (24.09%) | 0.668
Drinking (N, %) | 22 (10.00%) | 21 (9.55%) | 0.593
Having diabetes family history (N, %) | 33 (15.00%) | 25 (11.36%) | 0.468
Fasting plasma glucose (mmol/L) | 5.21 ± 0.79 | 4.58 ± 0.57 | <0.001
Total cholesterol (mmol/L) | 5.03 ± 0.93 | 4.93 ± 1.03 | 0.267
Triglycerides (mmol/L) | 3.38 ± 2.73 | 2.47 ± 1.47 | <0.001
High-density lipoprotein cholesterol (mmol/L) | 1.22 ± 0.29 | 1.32 ± 0.43 | 0.004
Systolic blood pressure (mmHg) | 133.77 ± 19.10 | 122.10 ± 14.62 | <0.001
Diastolic blood pressure (mmHg) | 85.79 ± 10.60 | 79.58 ± 8.78 | <0.001
Body mass index (kg/m2) | 25.38 ± 3.26 | 23.53 ± 2.74 | <0.001
Waist circumference (cm) | 88.37 ± 9.48 | 84.09 ± 10.01 | <0.001

A total of 1579 metabolites were detected on our metabolomics platform, and the metabolite data were natural-log-transformed before analysis. Logistic regression adjusted for FPG, TG, HDL-C, SBP, DBP, BMI, and WC was used to evaluate the association between each metabolite and future T2D risk; 79 metabolites remained associated with future T2D risk after Bonferroni correction (P < 0.05/1579). To identify robust associations, random forest analysis was then applied to these metabolites, and 5 metabolites (riboflavin, cnidioside A, 2-methoxy-5-(1H-1,2,4-triazol-5-yl)-4-(trifluoromethyl)pyridine, 7-methylxanthine, and mestranol) with IncNodePurity > 10 were selected for further analysis (Table 2). The metabolite score was created by summing the quartile ranks of these 5 metabolites for each participant. The number of participants and the percentage of diabetes cases at each metabolite score are summarized in Supplementary Table 2 in the online repository (41); the percentage of incident T2D cases increased steadily with increasing metabolite score. Boxplots of the 5 selected metabolites are shown in Fig. 1, and the correlations among them (Supplementary Table 3) are available in the online repository (41).
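
A minimal sketch of the quartile-rank score follows. It assumes that quartile cut points are computed in the pooled sample of cases and controls, that ranks run from 1 to 4 (so 5 metabolites give a score from 5 to 20), and that Python's default `statistics.quantiles` method defines the cut points; none of these choices is specified in the source. Because riboflavin was inversely associated with T2D (OR = 0.05), a hypothetical `reverse` option is included for flipping a metabolite's ranks, but the paper does not say whether this was done.

```python
from statistics import quantiles


def quartile_ranks(values):
    """Assign each value its quartile (1-4) from the sample quartile cut points."""
    q1, q2, q3 = quantiles(values, n=4)  # default 'exclusive' method
    return [1 if v <= q1 else 2 if v <= q2 else 3 if v <= q3 else 4 for v in values]


def metabolite_score(levels, reverse=()):
    """Sum of quartile ranks across metabolites for each participant.

    levels: dict mapping metabolite name -> list of (log) levels, with all
    lists in the same participant order. Metabolites named in `reverse`
    have their ranks flipped (5 - rank) so that a higher rank means higher risk.
    """
    n = len(next(iter(levels.values())))
    score = [0] * n
    for name, values in levels.items():
        ranks = quartile_ranks(values)
        if name in reverse:
            ranks = [5 - r for r in ranks]
        score = [s + r for s, r in zip(score, ranks)]
    return score


levels = {
    "metabolite_A": [1, 2, 3, 4, 5, 6, 7, 8],
    "metabolite_B": [8, 7, 6, 5, 4, 3, 2, 1],
}
print(quartile_ranks(levels["metabolite_A"]))               # [1, 1, 2, 2, 3, 3, 4, 4]
print(metabolite_score(levels))                             # [5, 5, 5, 5, 5, 5, 5, 5]
print(metabolite_score(levels, reverse={"metabolite_B"}))   # [2, 2, 4, 4, 6, 6, 8, 8]
```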

Table 2.

Logistic regression and random forest results for metabolites associated with T2D risk

Metabolite | OR | 95% CI | P | IncNodePurity
Mestranol | 12.03 | 7.34-19.73 | 3.21 × 10^-23 | 11.80
Cnidioside A | 10.72 | 8.66-12.86 | 1.86 × 10^-20 | 13.43
Riboflavin | 0.05 | 0.03-0.09 | 1.99 × 10^-20 | 18.02
S-(Methyl)Glutathione | 8.09 | 5.15-12.70 | 1.33 × 10^-19 | 0.93
D-Saccharic acid | 4.71 | 3.25-6.83 | 7.26 × 10^-19 | 2.33
2-methoxy-5-(1H-1,2,4-triazol-5-yl)-4-(trifluoromethyl)pyridine | 14.68 | 8.15-26.44 | 2.56 × 10^-18 | 13.05
(2R,3S,4S,5R,6R)-2-(hydroxymethyl)-6-(propan-2-yloxy)oxane-3,4,5-triol | 7.01 | 4.46-11.03 | 1.17 × 10^-17 | 2.20
Furoic acid | 5.07 | 4.85-5.34 | 1.24 × 10^-17 | 0.67
7-Methylxanthine | 5.41 | 3.68-7.98 | 1.35 × 10^-17 | 11.97
D-Glucarate | 4.11 | 3.88-4.39 | 1.04 × 10^-16 | 1.42
Cer-AP (t18:1/16:0) | 4.18 | 3.94-4.48 | 2.01 × 10^-16 | 0.44
3-amino-2-phenyl-2H-pyrazolo[4,3-c]pyridine-4,6-diol | 3.94 | 2.81-5.54 | 9.54 × 10^-15 | 1.67
2,5-Furandicarboxylic acid | 2.63 | 1.99-3.48 | 2.53 × 10^-14 | 0.37
D-threo-Isocitric acid | 2.53 | 1.92-3.35 | 7.09 × 10^-14 | 0.35
Oxaceprol | 3.50 | 1.29-9.50 | 3.85 × 10^-12 | 3.76
Alpha ketoglutaric acid | 2.22 | 1.71-2.90 | 1.55 × 10^-11 | 0.15
Tangeritin | 3.32 | 2.34-4.71 | 4.58 × 10^-11 | 0.11
2,3-Dinor-TXB2 | 3.30 | 3.04-3.63 | 1.22 × 10^-10 | 0.10
4-acetyl-4-(ethoxycarbonyl)heptanedioic acid | 2.72 | 1.98-3.74 | 1.41 × 10^-10 | 0.27
2-methyl-1,2-dihydrophthalazin-1-one | 2.59 | 1.95-3.43 | 1.52 × 10^-10 | 0.62
L-Fucose | 0.34 | 0.25-0.47 | 3.75 × 10^-10 | 0.50
1,5-Anhydro-D-glucitol | 0.41 | 0.24-0.67 | 4.30 × 10^-10 | 0.34
UR-144 N-Heptyl analog | 0.43 | 0.33-0.56 | 7.38 × 10^-10 | 4.03
PC (20:4/22:6) | 0.44 | 0.33-0.58 | 1.71 × 10^-9 | 0.31
2-Methylbutyl beta-D-glucopyranoside | 2.56 | 1.86-3.53 | 3.00 × 10^-9 | 0.08
2-Hydroxy-2-methylbutanoic acid | 0.55 | 0.38-0.77 | 5.89 × 10^-9 | 0.39
PC (16:2e/4:0) | 0.46 | 0.35-0.60 | 1.48 × 10^-8 | 0.20
N-[5-(tert-butyl)-3-isoxazolyl]-N’-[2-(trifluoromethoxy)phenyl]urea | 2.12 | 1.65-2.73 | 1.64 × 10^-8 | 1.88
N-[2,5-bis(2,2,2-trifluoroethoxy)benzoyl]-N’-(4-methoxyphenyl)urea | 0.49 | 0.39-0.64 | 1.66 × 10^-8 | 1.83
WQH | 2.29 | 1.74-3.01 | 1.99 × 10^-8 | 0.92
1,3-Dimethyluric acid | 1.85 | 1.45-2.38 | 2.23 × 10^-8 | 0.51
PE (18:0/20:4) | 1.79 | 1.57-2.07 | 4.41 × 10^-8 | 0.14
3-(1-cyano-1,2-dihydroisoquinolin-2-yl)-3-oxopropyl propionate | 0.49 | 0.31-0.71 | 5.76 × 10^-8 | 0.53
Theophylline | 1.92 | 1.48-2.49 | 7.23 × 10^-8 | 0.27
Bilirubin | 0.49 | 0.38-0.62 | 7.61 × 10^-8 | 0.36
β-Cortolone | 2.17 | 1.67-2.83 | 7.74 × 10^-8 | 0.12
13,14-dihydro-15-keto-tetranor Prostaglandin D1 | 2.02 | 1.82-2.27 | 1.16 × 10^-7 | 0.10
D-Mannitol 1-phosphate | 0.68 | 0.50-0.91 | 1.25 × 10^-7 | 0.11
Tetrahydroaldosterone | 2.08 | 1.61-2.69 | 1.59 × 10^-7 | 0.53
Oleamide | 0.48 | 0.37-0.62 | 1.61 × 10^-7 | 0.14
Arachidic acid | 1.97 | 1.73-2.27 | 2.13 × 10^-7 | 0.13
Bialaphos | 2.25 | 1.60-3.15 | 2.55 × 10^-7 | 0.09
N,N-Dimethyldecylamine N-oxide | 1.87 | 1.45-2.41 | 5.37 × 10^-7 | 0.09
DL-α-Tocopherol | 1.82 | 1.59-2.10 | 5.52 × 10^-7 | 0.12
2-benzylideneindan-1-one | 2.00 | 1.51-2.66 | 6.35 × 10^-7 | 0.07
Artesute | 0.56 | 0.44-0.71 | 6.87 × 10^-7 | 0.16
2-Methoxyestradiol | 0.55 | 0.43-0.70 | 8.60 × 10^-7 | 0.11
4-Methylaminoantipyrine | 2.00 | 1.36-2.78 | 8.68 × 10^-7 | 0.73
N1-{5-[(4-chlorophenyl)thio]-4-fluoro-2-nitrophenyl}acetamide | 1.85 | 1.64-2.11 | 1.36 × 10^-6 | 0.13
5-Fluoro-2-[(3S)-1-(2-methylbenzyl)-3-pyrrolidinyl]-1H-benzimidazole | 1.81 | 1.43-2.29 | 1.40 × 10^-6 | 2.24
Desoxycortone | 2.04 | 1.56-2.66 | 1.70 × 10^-6 | 0.20
1-(4-hydroxyphenyl)propane-1,2-diol | 0.54 | 0.42-0.69 | 2.09 × 10^-6 | 0.26
OxPC (16:0-18:3 + 1O) | 0.61 | 0.42-0.84 | 2.48 × 10^-6 | 0.18
Acetylcholine | 0.51 | 0.39-0.66 | 2.56 × 10^-6 | 0.15
RKK | 1.86 | 1.43-2.43 | 2.67 × 10^-6 | 0.41
Uridine 5’-Diphospho-N-acetylgalactosamine | 1.82 | 1.62-2.08 | 3.49 × 10^-6 | 0.14
Heptadecanoic Acid | 1.92 | 1.47-2.50 | 3.83 × 10^-6 | 0.09
2,3,4,9-Tetrahydro-1H-β-carboline-3-carboxylic acid | 1.84 | 1.43-2.35 | 3.87 × 10^-6 | 0.25
4-Hydroxyphenylpyruvic acid | 0.66 | 0.48-0.90 | 5.35 × 10^-6 | 0.23
4-Hydroxyisoleucine | 0.66 | 0.49-0.88 | 6.00 × 10^-6 | 0.34
GM3 d36:1; [M-H]- | 0.69 | 0.52-0.80 | 7.21 × 10^-6 | 0.75
6-fluoro-2-methyl-4-quinolyl 5-methyl-3-phenylisoxazole-4-carboxylate | 1.74 | 1.37-2.21 | 8.25 × 10^-6 | 0.54
OxPC (18:1-20:3 + 2O(1Cyc)) | 0.65 | 0.46-0.89 | 1.12 × 10^-5 | 0.13
Phylloquinone | 1.79 | 1.58-2.04 | 1.17 × 10^-5 | 0.40
PC (18:1e/20:4) | 0.68 | 0.50-0.90 | 1.23 × 10^-5 | 0.34
N-Acetyl-DL-glutamic acid | 1.73 | 1.51-2.00 | 1.31 × 10^-5 | 0.08
DL-Lanthionine | 0.71 | 0.53-0.93 | 1.65 × 10^-5 | 0.21
Pseudouridine | 0.68 | 0.51-0.90 | 1.73 × 10^-5 | 0.13
6α-Naltrexol | 0.64 | 0.51-0.81 | 1.83 × 10^-5 | 0.39
6-[(3-pyridylcarbonyl)oxy]hexa-2,4-diynyl nicotinate | 1.77 | 1.38-2.27 | 1.90 × 10^-5 | 0.15
Cytidine | 0.69 | 0.56-0.86 | 1.91 × 10^-5 | 0.28
2-Arachidonoyl glycerol | 1.68 | 1.31-2.15 | 2.04 × 10^-5 | 0.12
Metronidazole-OH | 3.79 | 2.08-6.90 | 2.22 × 10^-5 | 1.55
PC (20:0/16:0) | 0.61 | 0.47-0.78 | 2.29 × 10^-5 | 0.25
5-Hydroxytryptophan | 0.67 | 0.54-0.83 | 2.59 × 10^-5 | 0.16
Erucic acid | 0.67 | 0.55-0.84 | 2.81 × 10^-5 | 0.09
Adrenic acid | 1.78 | 1.34-2.37 | 3.08 × 10^-5 | 0.08
1-Methylguanine | 0.67 | 0.54-0.83 | 3.08 × 10^-5 | 0.13
Linolelaidic acid (C18:2N6T) | 1.67 | 1.30-2.14 | 3.14 × 10^-5 | 0.12

Figure 1.

Boxplots of the levels of the 5 identified metabolites. (A) 7-Methylxanthine; (B) Riboflavin; (C) Mestranol; (D) Cnidioside A; (E) 2-methoxy-5-(1H-1,2,4-triazol-5-yl)-4-(trifluoromethyl)pyridine.

We evaluated how well genetic variants and metabolites predict incident T2D. The basic prediction model, containing clinical characteristics (BMI, WC, SBP, DBP, TG, HDL-C, and FPG), had an AUC of 0.798. Adding the metabolite score and GRS improved the model's ability to classify T2D risk (AUC = 0.960 vs 0.798, P = 7.91 × 10^-16) (Fig. 2). In 10-fold cross-validation, the model including the metabolite score and GRS performed robustly (mean AUC = 0.958) and outperformed the traditional model (mean AUC = 0.798) (Supplementary Table 4 in the online repository (41)). The sensitivity, specificity, and Youden index at the cutoff value are shown in Supplementary Table 5 (41).
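
The source reports the Youden index at the cutoff but does not describe how the cutoff was chosen. One common approach, sketched here under that assumption, is to pick the threshold that maximizes Youden's J = sensitivity + specificity − 1 over the observed scores. pROC's own threshold handling may differ (for example, in how candidate cutoffs between observed scores are defined), and the function name is hypothetical.

```python
def youden_cutoff(scores, labels):
    """Cutoff maximizing Youden's J = sensitivity + specificity - 1.

    A participant is classified as a predicted case when score >= cutoff.
    Candidate cutoffs are the observed scores; ties in J keep the lowest cutoff.
    Returns (cutoff, sensitivity, specificity, J).
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    best = None
    for c in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= c)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < c)
        sens, spec = tp / n_pos, tn / n_neg
        j = sens + spec - 1
        if best is None or j > best[3]:
            best = (c, sens, spec, j)
    return best


print(youden_cutoff([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))  # (0.35, 1.0, 0.5, 0.5)
```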

Figure 2.

ROC curves of the basic model, genetic model, and combined model. Error bars indicate the 95% CI of the AUC.

In addition, we compared the predictive ability of a model containing the top 5 metabolites identified by logistic regression (mestranol, cnidioside A, riboflavin, S-(methyl)glutathione, and D-saccharic acid) with that of a model containing the 5 metabolites selected by random forest. The random forest-selected metabolites yielded a slightly higher AUC than those selected by logistic regression, although the difference was not statistically significant (AUC: 0.923 vs 0.908, P = 0.068).

Discussion

We conducted a prospective nested case-control study to investigate the combined contribution of genetic and metabolic biomarkers to models predicting incident T2D. Of 440 individuals with normal fasting glucose at baseline, 220 progressed to T2D, and 220 were age- and sex-matched controls who maintained normal FPG during 12 years of follow-up. Abnormal levels of 5 metabolites (riboflavin, cnidioside A, 2-methoxy-5-(1H-1,2,4-triazol-5-yl)-4-(trifluoromethyl)pyridine, 7-methylxanthine, and mestranol) were associated with incident T2D. Together, metabolic and genetic biomarkers contributed distinct information that significantly improved the prediction of future T2D beyond clinical risk factors. Our findings may also provide new insights into the pathogenesis of T2D.

Studies of the association between metabolites and T2D risk have recently emerged. Wang et al (11) conducted a nested case-control study in the Framingham Study to evaluate whether metabolite profiles could predict the development of diabetes, and identified 5 branched-chain and aromatic amino acids (isoleucine, leucine, valine, tyrosine, and phenylalanine) that could be used to predict future diabetes. Similarly, Qiu et al (15) conducted an association analysis with targeted metabolomics in 2 prospective Chinese cohorts and identified 4 metabolites (alanine, phenylalanine, tyrosine, and palmitoyl carnitine) associated with an increased risk of developing T2D. In the current study, we identified 5 metabolites associated with future T2D risk. Riboflavin, also known as vitamin B2, is an essential component of cellular biochemistry, and in our data higher riboflavin levels were associated with a lower risk of future T2D. An experimental study suggested that dairy riboflavin supplementation might help reduce diabetic complications by reducing inflammation caused by oxidative stress (43). In addition, a population-based cross-sectional study in Korean women showed that low riboflavin intake was associated with increased diabetes risk (OR = 1.493; 95% CI, 1.137-1.959; P = 0.004) (44). Mestranol is a synthetic estrogen used in oral contraceptives, and elevated blood glucose levels have been reported in many women using oral contraceptives containing mestranol (45-47). In an untargeted metabolomics study, Rebholz et al found that 7-methylxanthine levels were lower after 8 weeks of a DASH (Dietary Approaches to Stop Hypertension) diet than after a control diet (48). Epidemiologic studies have shown that higher adherence to the DASH diet is associated with many favorable health outcomes, including a lower risk of diabetes (49-51). Of interest, 7-methylxanthine levels were positively associated with dessert intake, a well-known risk factor for T2D (52, 53). These studies may partly explain our metabolite findings. However, the mechanisms linking the other 2 metabolites (cnidioside A and 2-methoxy-5-(1H-1,2,4-triazol-5-yl)-4-(trifluoromethyl)pyridine) to T2D remain unclear.

T2D has a substantial hereditary component. Our previous studies showed that adding genetic factors to clinical factors slightly improved the prediction of T2D (17). In this nested case-control study, we selected 20 SNPs verified in Chinese Han individuals to assess their predictive value for future T2D. Adding metabolic and genetic markers to clinical factors significantly increased predictive ability (AUC = 0.960 vs 0.798 for clinical factors only). Together, this evidence indicates that metabolic and genetic markers may help identify individuals at high risk of T2D.

Our study has several strengths. First, it used a nested case-control design within a noncommunicable disease cohort with 12 years of follow-up, and all metabolic biomarkers and clinical risk factors were measured at baseline. A risk model built on this information may help identify individuals at high risk of T2D for precision intervention. To the best of our knowledge, no previous study has combined metabolic and genetic information in a Chinese cohort. Second, an untargeted metabolomics platform covering more than 1500 metabolites, together with a machine learning method, allowed a relatively comprehensive analysis of the metabolic profile of the study population. Our study also has limitations. First, participants were mainly middle-aged Chinese individuals of Han ethnicity, so caution is needed when extending our results to other age groups or ethnic populations. Second, because the analysis was based on a nested case-control design, our results are preliminary, and an independent cohort study is warranted to validate the prediction model. Third, although we identified metabolic and genetic markers associated with T2D risk, the underlying mechanisms remain unclear. Further studies are warranted to investigate whether the genetic and metabolomic data signal different biological etiologies of T2D.

In summary, our results suggested that metabolic and genetic markers may be helpful for predicting future T2D risk. We provided novel insights into the pathophysiology of T2D, and further studies are urgently warranted to elucidate the underlying mechanisms.

Acknowledgments

The authors wish to thank all of the study participants, research staff, and students who participated in this work.

Abbreviations

AUC

area under the curve

BMI

body mass index

DBP

diastolic blood pressure

FPG

fasting plasma glucose

HDL-C

high-density lipoprotein cholesterol

HPLC-MS

high-performance liquid chromatography–mass spectrometry

QC

quality control

ROC

receiver operating characteristic curve

SBP

systolic blood pressure

SNP

single nucleotide polymorphism

T2D

type 2 diabetes

TC

total cholesterol

TG

triglycerides

Contributor Information

Jia Liu, Department of Health Promotion & Chronic Non-Communicable Disease Control, Wuxi Center for Disease Control and Prevention (The Affiliated Wuxi Center for Disease Control and Prevention of Nanjing Medical University), Wuxi 214023, Jiangsu, China.

Lu Wang, Department of Health Promotion & Chronic Non-Communicable Disease Control, Wuxi Center for Disease Control and Prevention (The Affiliated Wuxi Center for Disease Control and Prevention of Nanjing Medical University), Wuxi 214023, Jiangsu, China.

Yun Qian, Department of Health Promotion & Chronic Non-Communicable Disease Control, Wuxi Center for Disease Control and Prevention (The Affiliated Wuxi Center for Disease Control and Prevention of Nanjing Medical University), Wuxi 214023, Jiangsu, China.

Qian Shen, Department of Health Promotion & Chronic Non-Communicable Disease Control, Wuxi Center for Disease Control and Prevention (The Affiliated Wuxi Center for Disease Control and Prevention of Nanjing Medical University), Wuxi 214023, Jiangsu, China.

Man Yang, Department of Health Promotion & Chronic Non-Communicable Disease Control, Wuxi Center for Disease Control and Prevention (The Affiliated Wuxi Center for Disease Control and Prevention of Nanjing Medical University), Wuxi 214023, Jiangsu, China.

Yunqiu Dong, Department of Health Promotion & Chronic Non-Communicable Disease Control, Wuxi Center for Disease Control and Prevention (The Affiliated Wuxi Center for Disease Control and Prevention of Nanjing Medical University), Wuxi 214023, Jiangsu, China.

Hai Chen, Department of Health Promotion & Chronic Non-Communicable Disease Control, Wuxi Center for Disease Control and Prevention (The Affiliated Wuxi Center for Disease Control and Prevention of Nanjing Medical University), Wuxi 214023, Jiangsu, China.

Zhijie Yang, Department of Health Promotion & Chronic Non-Communicable Disease Control, Wuxi Center for Disease Control and Prevention (The Affiliated Wuxi Center for Disease Control and Prevention of Nanjing Medical University), Wuxi 214023, Jiangsu, China.

Yaqi Liu, Department of Health Promotion & Chronic Non-Communicable Disease Control, Wuxi Center for Disease Control and Prevention (The Affiliated Wuxi Center for Disease Control and Prevention of Nanjing Medical University), Wuxi 214023, Jiangsu, China.

Xuan Cui, Department of Epidemiology and Biostatistics, School of Public Health, Nanjing Medical University, Nanjing 211166, Jiangsu, China.

Hongxia Ma, Department of Epidemiology and Biostatistics, School of Public Health, Nanjing Medical University, Nanjing 211166, Jiangsu, China.

Guangfu Jin, Department of Epidemiology and Biostatistics, School of Public Health, Nanjing Medical University, Nanjing 211166, Jiangsu, China.

Financial Support

The research was supported by the Medical Key Discipline Program of Wuxi Health Commission (LCZX2021006); the Jiangsu Province for High-level Health Talents Research Fund (LGY2018014); the Wuxi Key Projects of Precision Medicine (J202006); the Top Talent Support Program for Advanced Talents; the Top Talent Support Program for Young and Middle-aged People (BJ2020096); the Project of National Natural Science Foundation of China (81072379); the Wuxi Science and Technology Development Fund Project (WX18IIAN038); and the Wuxi Health Committee Key Program (SW001).

Disclosures

The authors have nothing to disclose.

Data Availability

Some or all datasets generated during and/or analyzed during the current study are not publicly available but are available from the corresponding author on reasonable request.

References

  • 1. Stumvoll M, Goldstein BJ, van Haeften TW. Type 2 diabetes: principles of pathogenesis and therapy. Lancet. 2005;365(9467):1333–1346.
  • 2. International Diabetes Federation. Accessed March 8, 2022. http://www.diabetesatlas.org/
  • 3. World Health Organization. World Health Statistics 2021. Accessed March 8, 2022. http://www.who.int/
  • 4. Li Y, Teng D, Shi X, et al. Prevalence of diabetes recorded in mainland China using 2018 diagnostic criteria from the American Diabetes Association: national cross sectional study. BMJ. 2020;369:m997. doi:10.1136/bmj.m997
  • 5. Magliano DJ, Sacre JW, Harding JL, Gregg EW, Zimmet PZ, Shaw JE. Young-onset type 2 diabetes mellitus - implications for morbidity and mortality. Nat Rev Endocrinol. 2020;16(6):321–331.
  • 6. Candler TP, Mahmoud O, Lynn RM, Majbar AA, Barrett TG, Shield JPH. Continuing rise of Type 2 diabetes incidence in children and young people in the UK. Diabet Med. 2018;35(6):737–744.
  • 7. Knowler WC, Barrett-Connor E, Fowler SE, et al.; Diabetes Prevention Program Research Group. Reduction in the incidence of type 2 diabetes with lifestyle intervention or metformin. N Engl J Med. 2002;346(6):393–403.
  • 8. Wilson PW, Meigs JB, Sullivan L, Fox CS, Nathan DM, D’Agostino RB Sr. Prediction of incident diabetes mellitus in middle-aged adults: the Framingham Offspring Study. Arch Intern Med. 2007;167(10):1068–1074.
  • 9. Hulman A, Simmons RK, Brunner EJ, et al. Trajectories of glycaemia, insulin sensitivity and insulin secretion in South Asian and white individuals before diagnosis of type 2 diabetes: a longitudinal analysis from the Whitehall II cohort study. Diabetologia. 2017;60(7):1252–1260.
  • 10. Wishart DS, Jewison T, Guo AC, et al. HMDB 3.0--The Human Metabolome Database in 2013. Nucleic Acids Res. 2013;41(Database issue):D801–D807. doi:10.1093/nar/gks1065
  • 11. Wang TJ, Larson MG, Vasan RS, et al. Metabolite profiles and the risk of developing diabetes. Nat Med. 2011;17(4):448–453.
  • 12. Floegel A, Stefan N, Yu Z, et al. Identification of serum metabolites associated with risk of type 2 diabetes using a targeted metabolomic approach. Diabetes. 2013;62(2):639–648.
  • 13. Rhee EP, Cheng S, Larson MG, et al. Lipid profiling identifies a triacylglycerol signature of insulin resistance and improves diabetes prediction in humans. J Clin Invest. 2011;121(4):1402–1411.
  • 14. Drogan D, Dunn WB, Lin W, et al. Untargeted metabolic profiling identifies altered serum metabolites of type 2 diabetes mellitus in a prospective, nested case control study. Clin Chem. 2015;61(3):487–497.
  • 15. Qiu G, Zheng Y, Wang H, et al. Plasma metabolomics identified novel metabolites associated with risk of type 2 diabetes in two prospective cohorts of Chinese adults. Int J Epidemiol. 2016;45(5):1507–1516.
  • 16. Almgren P, Lehtovirta M, Isomaa B, et al. Heritability and familiality of type 2 diabetes and related quantitative traits in the Botnia Study. Diabetologia. 2011;54(11):2811–2819.
  • 17. Qian Y, Lu F, Dong M, et al. Cumulative effect and predictive value of genetic variants associated with type 2 diabetes in Han Chinese: a case-control study. PLoS One. 2015;10(1):e0116537.
  • 18. Meigs JB, Shrader P, Sullivan LM, et al. Genotype score in addition to common risk factors for prediction of type 2 diabetes. N Engl J Med. 2008;359(21):2208–2219.
  • 19. Lyssenko V, Jonsson A, Almgren P, et al. Clinical risk factors, DNA variants, and the development of type 2 diabetes. N Engl J Med. 2008;359(21):2220–2232.
  • 20. Vassy JL, Durant NH, Kabagambe EK, et al. A genotype risk score predicts type 2 diabetes from young adulthood: the CARDIA study. Diabetologia. 2012;55(10):2604–2612.
  • 21. Imamura M, Shigemizu D, Tsunoda T, et al. Assessing the clinical utility of a genetic risk score constructed using 49 susceptibility alleles for type 2 diabetes in a Japanese population. J Clin Endocrinol Metab. 2013;98(10):E1667–E1673.
  • 22. Walford GA, Porneala BC, Dauriz M, et al. Metabolite traits and genetic risk provide complementary information for the prediction of future type 2 diabetes. Diabetes Care. 2014;37(9):2508–2514.
  • 23. Doupe P, Faghmous J, Basu S. Machine Learning for Health Services Researchers. Value Health. 2019;22(7):808–815.
  • 24. Basu S, Johnson KT, Berkowitz SA. Use of Machine Learning Approaches in Clinical Epidemiological Research of Diabetes. Curr Diab Rep. 2020;20(12):80.
  • 25. Wittenbecher C, Stambuk T, Kuxhaus O, et al. Plasma N-Glycans as Emerging Biomarkers of Cardiometabolic Risk: A Prospective Investigation in the EPIC-Potsdam Cohort Study. Diabetes Care. 2020;43(3):661–668.
  • 26. Dong SS, Guo Y, Yao S, et al. Integrating regulatory features data for prediction of functional disease-associated SNPs. Brief Bioinform. 2019;20(1):26–32.
  • 27. Gou W, Ling CW, He Y, et al. Interpretable Machine Learning Framework Reveals Robust Gut Microbiome Features Associated With Type 2 Diabetes. Diabetes Care. 2021;44(2):358–366.
  • 28. Cardozo G, Pintarelli GB, Andreis GR, Lopes ACW, Marques JLB. Use of Machine Learning and Routine Laboratory Tests for Diabetes Mellitus Screening. Biomed Res Int. 2022;2022:8114049.
  • 29. Li J, Chen Q, Hu X, et al. Establishment of noninvasive diabetes risk prediction model based on tongue features and machine learning techniques. Int J Med Inform. 2021;149:104429.
  • 30. Deberneh HM, Kim I. Prediction of Type 2 Diabetes Based on Machine Learning Algorithm. Int J Environ Res Public Health. 2021;18(6):3317. doi:10.3390/ijerph18063317
  • 31. Wang C, Qin N, Zhu M, et al. Metabolome-wide association study identified the association between a circulating polyunsaturated fatty acids variant rs174548 and lung cancer. Carcinogenesis. 2017;38(11):1147–1154.
  • 32. Lu F, Qian Y, Li H, et al. Genetic variants on chromosome 6p21.1 and 6p22.3 are associated with type 2 diabetes risk: a case-control study in Han Chinese. J Hum Genet. 2012;57(5):320–325.
  • 33. Dou H, Ma E, Yin L, Jin Y, Wang H. The association between gene polymorphism of TCF7L2 and type 2 diabetes in Chinese Han population: a meta-analysis. PLoS One. 2013;8(3):e59495.
  • 34. Yang L, Zhou X, Luo Y, et al. Association between KCNJ11 gene polymorphisms and risk of type 2 diabetes mellitus in East Asian populations: a meta-analysis in 42,573 individuals. Mol Biol Rep. 2012;39(1):645–659.
  • 35. Shu XO, Long J, Cai Q, et al. Identification of new genetic risk variants for type 2 diabetes. PLoS Genet. 2010;6(9):e1001127.
  • 36. Deng X, Liu H, Nalima, Qiqiger A, Zhu J. Association of polymorphisms rs290487, rs864745, rs4430796 and rs23136 with type 2 diabetes in the Uyghur population in China. Int J Clin Exp Pathol. 2017;10(8):8813–8819.
  • 37. Zhang SM, Xiao JZ, Ren Q, et al. Replication of association study between type 2 diabetes mellitus and IGF2BP2 in Han Chinese population. Chin Med J (Engl). 2013;126(21):4013–4018.
  • 38. Chang YC, Chiu YF, Liu PH, et al. Replication of genome-wide association signals of type 2 diabetes in Han Chinese in a prospective cohort. Clin Endocrinol (Oxf). 2012;76(3):365–372.
  • 39. Lu S, Xie Y, Lin K, et al. Genome-wide association studies-derived susceptibility loci in type 2 diabetes: confirmation in a Chinese population. Clin Invest Med. 2012;35(5):E327.
  • 40. Wang F, Han XY, Ren Q, et al. Effect of genetic variants in KCNJ11, ABCC8, PPARG and HNF4A loci on the susceptibility of type 2 diabetes in Chinese Han population. Chin Med J (Engl). 2009;122(20):2477–2482.
  • 41. Liu J, Wang L, Qian Y, et al. Metabolic and genetic markers improve the prediction of incident type 2 diabetes: A nested case-control study in Chinese. figshare. Posted July 20, 2022. doi:10.6084/m9.figshare.19783459.v4
  • 42. Robin X, Turck N, Hainard A, et al. pROC: an open-source package for R and S+ to analyze and compare ROC curves. BMC Bioinformatics. 2011;12:77. doi:10.1186/1471-2105-12-77
  • 43. Alam MM, Iqbal S, Naseem I. Ameliorative effect of riboflavin on hyperglycemia, oxidative stress and DNA damage in type-2 diabetic mice: Mechanistic and therapeutic strategies. Arch Biochem Biophys. 2015;584:10–19.
  • 44. Shin WY, Kim JH. Low riboflavin intake is associated with cardiometabolic risks in Korean women. Asia Pac J Clin Nutr. 2019;28(2):285–299.
  • 45. Kalkhoff RK. Effects of oral contraceptive agents on carbohydrate metabolism. J Steroid Biochem. 1975;6(6):949–956.
  • 46. Spellacy WN. Carbohydrate metabolism in male infertility and female fertility-control patients. Fertil Steril. 1976;27(10):1132–1141.
  • 47. Lederer J. [Diabetogenic effect of contraceptives]. Louv Med. 1973;92(3):143–150.
  • 48. Rebholz CM, Lichtenstein AH, Zheng Z, Appel LJ, Coresh J. Serum untargeted metabolomic profile of the Dietary Approaches to Stop Hypertension (DASH) dietary pattern. Am J Clin Nutr. 2018;108(2):243–255.
  • 49. Razavi Zade M, Telkabadi MH, Bahmani F, Salehi B, Farshbaf S, Asemi Z. The effects of DASH diet on weight loss and metabolic status in adults with non-alcoholic fatty liver disease: a randomized clinical trial. Liver Int. 2016;36(4):563–571.
  • 50. Blumenthal JA, Babyak MA, Sherwood A, et al. Effects of the dietary approaches to stop hypertension diet alone and in combination with exercise and caloric restriction on insulin sensitivity and lipids. Hypertension. 2010;55(5):1199–1205.
  • 51. Hinderliter AL, Babyak MA, Sherwood A, Blumenthal JA. The DASH diet and insulin sensitivity. Curr Hypertens Rep. 2011;13(1):67–73.
  • 52. Playdon MC, Sampson JN, Cross AJ, et al. Comparing metabolite profiles of habitual diet in serum and urine. Am J Clin Nutr. 2016;104(3):776–789.
  • 53. Olsson K, Ramne S, González-Padilla E, Ericson U, Sonestedt E. Associations of carbohydrates and carbohydrate-rich foods with incidence of type 2 diabetes. Br J Nutr. 2021;126(7):1065–1075.


Articles from The Journal of Clinical Endocrinology and Metabolism are provided here courtesy of The Endocrine Society
