Short abstract
Objective
To develop a model to predict gestational diabetes mellitus that incorporates classical risk factors and a novel risk factor, visceral fat mass.
Methods
Three hundred and two obese non-diabetic pregnant women underwent body composition analysis by bioimpedance at booking. Of this cohort, 72 (24%) developed gestational diabetes mellitus (GDM). Principal component analysis was initially performed to identify possible clustering of the GDM and non-GDM groups. Machine learning was then applied to develop a GDM predictive model using random forest and decision tree modelling.
Results
The predictive model was trained on 227 samples and validated on an independent testing subset of 75 samples, where it achieved a mean validation prediction accuracy of 77.53%. According to the decision tree developed, visceral fat mass emerged as the most important variable in determining the risk of gestational diabetes mellitus.
Conclusions
We present a model incorporating visceral fat mass, which is a novel risk factor in predicting gestational diabetes mellitus in obese pregnant women.
Keywords: Gestational diabetes, obesity, visceral fat mass, predictive model, principal component analysis, machine learning
Introduction
The rising prevalence of gestational diabetes is concerning because of the risk of pregnancy complications such as macrosomia, shoulder dystocia, caesarean section and neonatal hypoglycaemia and also because of the risk to the mother and offspring of diabetes and cardiovascular disease in later life.1–3 Changes in the diagnostic criteria for gestational diabetes mellitus (GDM), the obesity epidemic, increasing maternal age and unhealthy lifestyles have all been implicated in the increasing prevalence of GDM.4,5
Identifying women at greatest risk of GDM early in their pregnancy would allow lifestyle modification interventions and possibly drug treatments to be implemented in order to reduce the risk of complications.6 Metformin, for example, can be used to reduce the risk of GDM in women with polycystic ovaries.7
Various strategies are adopted to detect overt or gestational diabetes in pregnancy depending on the local prevalence of diabetes. Some centres in the UK have adopted the IADPSG strategy, which recommends universal testing, though our local policy was to continue using WHO criteria.4 Our current GDM screening policy is selective: women are tested if they are at high risk of GDM on the basis of (i) maternal age, (ii) body mass index (BMI), (iii) history of polycystic ovarian syndrome as defined by the Rotterdam criteria,8 (iv) family history of diabetes, (v) previous GDM, (vi) ethnicity and (vii) previous macrosomia. Selective screening using the risk factors above has low sensitivity (50–69%) and specificity (58–68%), and in one study 39% of women with GDM would have been missed if only selective risk factor testing had been used.9 Better selection processes for selective screening may reduce the need for oral glucose tolerance testing in women at low risk, with resulting savings in costs and in burdensome diagnostic testing.
Obesity is a strong predictor of GDM: compared with normal-weight women, the odds ratio is about 3 for women with Class I obesity10 and 5–8 for those with Class II and III obesity.11 Nevertheless, only 24% of Class I obese12 or Class II and III obese13 women developed GDM in the control arms of two recent prospective trials investigating the possible beneficial effects of metformin in these women. Abdominal obesity may be a better predictor both for GDM and for the future development of diabetes outside pregnancy.10,14
In a prospective study of 302 obese pregnant women, we found that central obesity as assessed by early pregnancy waist-hip ratio (WHR) and visceral fat mass (VFM) measured by bioimpedance was an independent predictor of GDM in addition to classical risk factors.15
The aim of this study was to develop a mathematical model to accurately predict GDM in obese pregnant women in early pregnancy. We used principal component analysis (PCA) initially, but since the PCA showed no clear clustering of the GDM and non-GDM groups, machine learning with decision tree and random forest algorithms was used.
Patients and methods
The London–Surrey Borders Research Ethics committee advised us that ethical approval was not required for the study, as all women would only undergo routine clinical investigations and management. No study-specific procedure was undertaken on any of the participants.
Details of the study methods have been previously published.15 In brief, we enrolled 302 obese pregnant women with no established diabetes attending the weight management clinic at St Helier Hospital, Carshalton, Surrey, UK in 2010–2011. The median age of these women was 31 years (range 26–34 years), the median BMI was 38.2 kg/m² (range 36.1–41.4 kg/m²) and the median VFM was 182.8 units (range 164.3–207.7 units). Of the women, 74.5% were Caucasian. All women underwent a 75 g oral glucose tolerance test between 24 and 28 weeks of gestation. GDM was defined by the 1999 WHO criteria.16 Seventy-two of the 302 enrolled women (23.8%) subsequently developed GDM and were medically managed in the joint antenatal obstetric and diabetes clinic according to a standard protocol. All women underwent body composition analysis at booking (median gestation 15 weeks (14–17)) by the Direct Segmental Multi-Frequency Bioelectrical Impedance Analysis method (DSM-BIA method) using an InBody 720 machine. This method is based on the difference in electrical resistance between fat and other body components.17 The device measures body mass index, WHR, lean body mass, total percentage body fat (PBF) and visceral fat area. The InBody 720 has been validated and correlates well with intra-abdominal fat area assessed by CT scan18 and DEXA.19 It has also been shown to be safe in the second and third trimesters of pregnancy and has been validated against deuterium and hydro-densitometry techniques for body composition analysis.20,21
Data mining and analysis
The dataset consisted of the following variables: maternal age, weight, body mass index, percentage body fat, visceral fat mass, lean body mass, history of polycystic ovarian syndrome, family history of diabetes, history of hypertension and previous macrosomia. PCA was performed on this dataset. PCA is a multivariate technique that projects the input data onto the directions of greatest variance, so that any grouping of the samples can be inspected. PCA showed no clear clustering of the GDM and non-GDM groups. We then applied decision tree and random forest algorithms, first feeding the computer programme a training dataset from which it learned to recognise the presence or absence of gestational diabetes. This process is termed supervised machine learning.22–24
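The projection step of PCA can be illustrated with a minimal sketch. It is not the analysis code used in this study: it uses only the Python standard library, finds only the first principal component by power iteration, and centres the variables without standardising them (with variables on different scales, standardising first would normally be preferable). The function name and the toy data are assumptions made for illustration.

```python
import math
import random


def first_principal_component(rows, iterations=200, seed=0):
    """Return (component, scores) for the first principal component of rows."""
    n, d = len(rows), len(rows[0])
    # Centre each variable on its mean.
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centred = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(c[i] * c[j] for c in centred) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Power iteration: repeated multiplication by the covariance matrix
    # converges to the direction of greatest variance.
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in range(d)]
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:  # no variance at all; keep the current vector
            break
        v = [x / norm for x in w]
    # Score of each sample = its centred values projected onto the component.
    scores = [sum(c[j] * v[j] for j in range(d)) for c in centred]
    return v, scores


# Toy data (not study data): the second variable is twice the first.
toy_rows = [[0.0, 0.0], [1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
component, scores = first_principal_component(toy_rows)
```

Plotting the scores of the first two components, coloured by outcome, is the usual way to look for the kind of clustering described above.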
A decision tree algorithm classifies data items by asking a series of questions about the features associated with the items. Each question is contained in a node, and every internal node points to one child node for each possible answer to its question. There is a hierarchy in the questioning, encoded as a tree. In its simplest form, yes-or-no questions are asked, and each internal node has a ‘yes’ child and a ‘no’ child. An item is sorted into a class as it passes down from the topmost node, the root, to a node without children, a leaf, depending on the answers. The item is then assigned to the class that has been associated with the leaf it reaches. If trained on high-quality data, decision trees can make very accurate predictions.23
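The traversal described above can be sketched in a few lines of Python. This is an illustration only and not the fitted model: the root question (VFM < 210) is the first split reported in the Results, whereas the second split on BMI, its threshold and all leaf labels are invented here to make the example complete.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    label: str  # class assigned to any item that reaches this leaf


@dataclass
class Node:
    feature: str                # feature the question is asked about
    threshold: float            # question: is item[feature] < threshold?
    yes: Union["Node", Leaf]    # child followed when the answer is yes
    no: Union["Node", Leaf]     # child followed when the answer is no


def classify(tree, item):
    """Pass an item from the root down to a leaf and return the leaf's class."""
    node = tree
    while isinstance(node, Node):
        node = node.yes if item[node.feature] < node.threshold else node.no
    return node.label


# Hypothetical tree: only the root split is taken from the text.
example_tree = Node(
    feature="vfm", threshold=210.0,
    yes=Leaf("no GDM"),
    no=Node(feature="bmi", threshold=40.0,
            yes=Leaf("no GDM"),
            no=Leaf("GDM")),
)

low_risk = classify(example_tree, {"vfm": 180.0, "bmi": 38.0})
high_risk = classify(example_tree, {"vfm": 220.0, "bmi": 42.0})
```

Each item follows exactly one path, so the class it receives depends only on the answers along that path.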
Random forest (RF) is an ensemble algorithm in which many decision trees are aggregated. The method constructs multiple versions of the training data by sampling with replacement (bootstrapping), trains a decision tree on each version and combines the trees' predictions to make the final prediction.24
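Bootstrapping and aggregation can be sketched as follows. This is a simplified illustration, not the randomForest implementation: each "tree" is reduced to a single threshold on one feature (a hypothetical stand-in learner), and the random selection of candidate features at each split, which a full random forest also performs, is omitted. All names and the toy data are assumptions.

```python
import random
from collections import Counter


def bootstrap_sample(rows, rng):
    """Draw len(rows) items from rows with replacement."""
    return [rng.choice(rows) for _ in rows]


def fit_stump(sample):
    """Stand-in learner: the threshold on x with the fewest training errors.

    The stump predicts class 1 when x >= threshold, otherwise class 0.
    """
    best_threshold, best_errors = None, None
    for threshold in sorted({x for x, _ in sample}):
        errors = sum((x >= threshold) != bool(y) for x, y in sample)
        if best_errors is None or errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return best_threshold


def fit_bagged_stumps(rows, n_trees, seed=0):
    """Train one stump per bootstrap sample of the training data."""
    rng = random.Random(seed)
    return [fit_stump(bootstrap_sample(rows, rng)) for _ in range(n_trees)]


def predict(thresholds, x):
    """Combine the individual predictions by majority vote."""
    votes = [int(x >= t) for t in thresholds]
    return Counter(votes).most_common(1)[0][0]


# Toy data (not study data): (feature value, class label).
toy_rows = [(1, 0), (2, 0), (3, 0), (4, 0), (6, 1), (7, 1), (8, 1), (9, 1)]
ensemble = fit_bagged_stumps(toy_rows, n_trees=25)
```

An odd number of trees is used in the example so that a two-class vote cannot tie.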
RF was implemented with 200 trees using the ‘randomForest’ function from the ‘randomForest’ package in R.25 The performance of the developed model was validated using the Monte Carlo cross-validation method26 with K = 100 splits. In each split, the input dataset (n = 302) was randomly divided into a training dataset of 227 samples (about 75%) and a testing dataset of the remaining 75 samples. The training dataset was used to build the model and the testing dataset to measure its performance. The overall performance was then calculated as the average over the 100 models. Because it is the mean of 100 individually trained and optimised models, the estimate is less likely to suffer from optimistic prediction accuracy and/or over-fitting.
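The validation loop can be sketched as below. This is a generic outline of Monte Carlo cross-validation under stated assumptions, not the R code used in the study: the learner (`fit_threshold`) and the scoring function (`accuracy`) are hypothetical placeholders, and the data are synthetic. Only the sizes (302 samples, 227 for training, 100 splits) follow the text.

```python
import random


def monte_carlo_cv(rows, n_train, fit, score, n_splits=100, seed=0):
    """Repeat random train/test splits and average the test performance."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_splits):
        shuffled = rows[:]
        rng.shuffle(shuffled)
        train, test = shuffled[:n_train], shuffled[n_train:]
        model = fit(train)                 # build the model on the training part
        scores.append(score(model, test))  # evaluate on the held-out part only
    return sum(scores) / len(scores), scores


def fit_threshold(train):
    """Placeholder learner: midpoint between the two classes on x."""
    highest_negative = max(x for x, y in train if y == 0)
    lowest_positive = min(x for x, y in train if y == 1)
    return (highest_negative + lowest_positive) / 2


def accuracy(threshold, test):
    """Fraction of test items whose predicted class matches the true class."""
    correct = sum((x >= threshold) == bool(y) for x, y in test)
    return correct / len(test)


# Synthetic data (not study data): 302 items, class 1 when x >= 151.
toy_rows = [(x, int(x >= 151)) for x in range(302)]
mean_accuracy, split_scores = monte_carlo_cv(
    toy_rows, n_train=227, fit=fit_threshold, score=accuracy)
```

Because every split draws a fresh random test set, no single favourable partition determines the reported figure.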
Results
Mathematical modelling
The optimisation confusion matrix (Figure 1) indicates that the model achieved 100% classification accuracy on the training data, with all 227 training samples correctly classified. Model validation achieved an initial prediction accuracy of 81.33%, with 61 of 75 samples correctly predicted and 14 misclassified (Figure 1). Over a series of 200 iterations, in which samples were randomly reshuffled between the training and testing subsets, the model stabilised after 20 iterations, as shown by the cumulative mean of the performance, and achieved a mean performance of 77.53%.
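The cumulative mean used to judge stability can be computed as in the sketch below. The stability rule shown (every later cumulative mean stays within a tolerance of the final value) is an assumption made for illustration; the text does not state the criterion that was applied, and the input values are toy numbers.

```python
from itertools import accumulate


def cumulative_mean(values):
    """Mean of the first k values, for k = 1 .. len(values)."""
    return [total / count
            for count, total in enumerate(accumulate(values), start=1)]


def first_stable_index(means, tolerance):
    """Earliest index from which all cumulative means stay within
    tolerance of the final cumulative mean (hypothetical criterion)."""
    final = means[-1]
    index = len(means) - 1
    while index > 0 and abs(means[index - 1] - final) <= tolerance:
        index -= 1
    return index


# Toy per-iteration accuracies (not study results).
toy_means = cumulative_mean([1.0, 0.0, 1.0, 0.0])
```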
Visceral fat mass emerged as the most important variable for predicting GDM by the RF method, as shown in Figures 2 and 3. This was followed by BMI, weight, PBF and WHR. The less important variables were family history of diabetes, hypertension, previous macrosomia and history of polycystic ovarian syndrome. The decision tree used VFM < 210 as its first split.
Discussion
In this analysis, VFM emerged as the most important variable in determining the risk of GDM, followed by BMI, weight, PBF and WHR. Traditional predictors such as previous GDM, history of polycystic ovarian syndrome, family history of diabetes and previous macrosomia were less important. These results add to the growing evidence of the importance of central obesity, and in particular visceral fat mass, in the development of GDM.
The model correctly classified all 227 training samples and achieved a mean validation performance of 77.53%, thereby providing good prediction accuracy. However, although 97% of the women without GDM were classified correctly, only one third of those with GDM were. Since only 24% of patients developed GDM in the original training dataset, the samples were unevenly distributed between the two classes, resulting in a slight bias in the model prediction towards the no-GDM class. A larger training database, with consequently more GDM-positive cases, would be required to train the model better and so improve its predictive performance.
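The effect of class imbalance on overall accuracy can be made concrete with a short calculation. The counts below are hypothetical, chosen only to reproduce the proportions quoted above (97% of negatives and one third of positives correct); they are not taken from Figure 1 or from the study data.

```python
def summarise(tp, fn, tn, fp):
    """Accuracy, sensitivity and specificity from confusion-matrix counts."""
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,   # all correct / all samples
        "sensitivity": tp / (tp + fn),   # positives correctly identified
        "specificity": tn / (tn + fp),   # negatives correctly identified
    }


# Hypothetical counts: 30 positives (10 detected), 100 negatives (97 correct).
example = summarise(tp=10, fn=20, tn=97, fp=3)
```

With these counts the overall accuracy stays above 80% even though two thirds of the positive cases are missed, which is why accuracy alone can flatter a model trained on unbalanced classes.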
To our knowledge, this is the first attempt to create a mathematical model to predict GDM incorporating VFM. Traditional predictors based on maternal history are easy to measure and widely applicable. The importance of central obesity and features of the metabolic syndrome in the development of GDM has long been recognised.24
A strong association between measures of abdominal obesity (waist circumference, WHR and CT-assessed intra-abdominal fat area) and the development of type 2 diabetes is also well established.14 Measuring VFM by bioimpedance is simple and can easily be done in the clinical setting. In our experience, midwives very quickly learn how to perform this measurement and the test takes less than 5 min. We have previously reported that VFM but not PBF correlates with fasting glucose and HbA1c particularly in women developing GDM.15 This finding emphasises the importance of metabolically active visceral fat.
The clinical significance of this study is the potential for early and personalised risk stratification for GDM allowing low-risk women to avoid unnecessary diagnostic testing, repeated clinic visits and additional growth scans. Conversely, those at high risk can start lifestyle interventions early to reduce the risk of complications.
The strength of this study is that we measured a range of clinically relevant and novel predictors of GDM simultaneously rather than a single novel measure in isolation. As such, the model created has greater validity. We also acknowledge limitations. The sample size was relatively small, and a larger dataset will be needed to further train the model and improve its accuracy. In addition, our dataset was predominantly Caucasian and hence we were unable to include ethnicity in the model.
In summary, existing prognostic models for GDM lack strong predictive value, are not commonly used in routine clinical care and are not recommended by current clinical guidelines. Adding VFM measured in early pregnancy to the predictive model helps discriminate between high- and low-risk pregnancies, but this needs to be confirmed in larger studies with diverse populations.
Acknowledgement
We acknowledge the help of the midwives and diabetes specialist nurses, who greatly assisted this project. This study forms part of Dr Balani’s forthcoming PhD dissertation.
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
Ethical approval
The London–Surrey Borders Research Ethics committee advised us that ethical approval was not required for the study, as all women would only undergo routine clinical investigations and management.
Guarantor
JB
Contributorship
JB: collecting and analysing data, development of predictive model, initial and final draft manuscript. SH: supervision, final manuscript. HS: final manuscript. FM: predictive model, final manuscript.
References
1. Galtier-Dereure F, Boegner C and Bringer J. Obesity and pregnancy: Complications and cost. Am J Clin Nutr 2000; 71: 1242.
2. Consortium FR. Obesity, obstetric complications and caesarean delivery rate–a population-based screening study. Am J Obstet Gynecol 2004; 190: 1091–1097.
3. Catalano PM and Ehrenberg HM. The short- and long-term implications of maternal obesity on the mother and her offspring. BJOG 2006; 113: 1126–1133.
4. International Association of Diabetes & Pregnancy Study Groups (IADPSG) Consensus Panel Writing Group and the Hyperglycemia & Adverse Pregnancy Outcome (HAPO) Study Steering Committee. The diagnosis of gestational diabetes mellitus: New paradigms or status quo? J Matern Neonatal Med 2012; 25: 2564–2569.
5. Metzger BE, Buchanan TA, Coustan DR, et al. Summary and recommendations of the Fifth International Workshop-Conference on gestational diabetes mellitus. Diabetes Care 2007; 30: S251.
6. Briley AL, Barr S, Badger S, et al. A complex intervention to improve pregnancy outcome in obese women: The UPBEAT randomised controlled trial. BMC Pregnancy Childbirth 2014; 14: 74.
7. Glueck CJ, Goldenberg N, Wang P, et al. Metformin during pregnancy reduces insulin, insulin resistance, insulin secretion, weight, testosterone and development of gestational diabetes: Prospective longitudinal assessment of women with polycystic ovary syndrome from preconception throughout preg. Hum Reprod 2004; 19: 510–521.
8. Roe AH and Dokras A. The diagnosis of polycystic ovary. Rev Obstet Gynecol 2011; 4: 45–51.
9. National Institute for Health and Care Excellence. Diabetes in pregnancy. NIHCE: London, 2015.
10. Neeland IJ, Turer AT, Ayers CR, et al. Dysfunctional adiposity and the risk of prediabetes and type 2 diabetes in obese adults. J Am Med Assoc 2012; 308: 1150–1159.
11. Chu SY, Callaghan WM, Kim SY, et al. Maternal obesity and risk of gestational diabetes mellitus. Diabetes Care 2007; 30: 2070–2076.
12. Chiswick CA, Reynolds RM, Denison FC, et al. Efficacy of metformin in pregnant obese women: A randomised controlled trial. BMJ Open 2015; 5: e006854.
13. Syngelaki A, Nicolaides KH, Balani J, et al. Metformin versus placebo in obese pregnant women without diabetes mellitus. N Engl J Med 2016; 374: 434–443.
14. Freemantle N, Holmes J and Hockey A. How strong is the association between abdominal obesity and the incidence of type 2 diabetes? Int J Clin Pract 2008; 62: 1391–1396.
15. Balani J, Hyer S and Johnson A. The importance of visceral fat mass in obese pregnant women and relation with pregnancy outcomes. Obstet Med 2014; 7: 22.
16. World Health Organization. Department of noncommunicable disease surveillance. Definition, diagnosis and classification of diabetes mellitus and its complications. Report of a WHO consultation. Geneva: WHO, 1999.
17. Ryo M, Maeda K, Onda T, et al. A new simple method for the measurement of visceral fat accumulation by bioelectrical impedance. Diabetes Care 2005; 28: 451–453.
18. Ogawa H, Fujitani K, Tsujinaka T, et al. InBody 720 as a new method of evaluating visceral obesity. Hepatogastroenterology 2011; 58: 42–44.
19. Malavolti M, Mussi C, Poli M, et al. Cross-calibration of eight-polar bioelectrical impedance analysis versus dual-energy X-ray absorptiometry for the assessment of total and appendicular body composition in healthy subjects aged 21-82 years. Ann Hum Biol 2003; 30: 380–391.
20. Van Loan MD, Kopp LE, King JC, et al. Fluid changes during pregnancy: Use of bioimpedance spectroscopy. J Appl Physiol (1985) 1995; 78: 1037–1042.
21. McCarthy EA, Strauss BJ, Walker SP, et al. Determination of maternal body composition in pregnancy and its relevance to perinatal outcomes. Obstet Gynecol Surv 2004; 59: 731–736.
22. Kantardzic M. Data mining: Concepts, models, methods, and algorithms. J Comput Inf Sci Eng 2005; 5: 394–395.
23. Kingsford C and Salzberg S. What are decision trees? Nat Biotechnol 2008; 26: 1011–1013.
24. Breiman L. Random forest. Machine Learn 2001; 45: 1–33.
25. Liaw A and Wiener M. Classification and regression by random forest. R News 2002; 2: 18–22.
26. Xu Q-S and Liang Y-Z. Monte Carlo cross validation. Chemom Intell Lab Syst 2001; 56: 1–11.