Abstract
Introduction:
Annual eye examinations are recommended for diabetic patients in order to detect diabetic retinopathy and other eye conditions that arise from diabetes. Medically underserved urban communities in the US have annual screening rates that are much lower than the national average and could benefit from informatics approaches to identify unscreened patients most at risk of developing retinopathy.
Methods:
Using clinical data from urban safety net clinics as well as public health data from the CDC’s National Health and Nutrition Examination Survey, we examined different machine learning approaches for predicting retinopathy from clinical or public health data. All datasets utilized exhibited a class imbalance.
Results:
Classifiers learned on the clinical data were modestly predictive of retinopathy with the best model having an AUC of 0.72, sensitivity of 69.2% and specificity of 55.9%. Classifiers learned on public health data were not predictive of retinopathy.
Discussion:
Successful approaches to detecting latent retinopathy using machine learning could help safety net and other clinics identify unscreened patients who are most at risk of developing retinopathy. The use of ensemble classifiers on clinical data shows promise for this purpose.
Introduction
Diabetic retinopathy arises when excess glucose in the bloodstream resulting from diabetes mellitus damages the blood vessels of the retina. Among US adults between the ages of 20 and 74 years, diabetic retinopathy is the leading cause of blindness.1, 2 Diabetes affects an estimated 29.1 million people in the United States.3 In a Centers for Disease Control and Prevention (CDC) assessment covering 2005 to 2008, 4.2 million people with diabetes aged 40 years or older (28.5%) had diabetic retinopathy.3
Annual eye examinations for diabetic patients are recommended in order to detect and treat diabetic retinopathy in a timely manner, since blindness from this condition is preventable with early detection and the use of laser photocoagulation therapy. While the US national annual eye screening average for diabetic patients is 60%,4–8 some studies of the urban safety net setting have shown annual screening rates for inner-city diabetic patients to be lower than 25%.9–11 We previously examined issues underlying this disparity between national and urban safety net screening rates, highlighting the feasibility and challenges of implementing teleretinal screening for diabetic retinopathy in an urban safety net setting facing eyecare specialist shortages.12–15 We also examined the potential for developing predictive models for detecting diabetic retinopathy from safety net clinic data. Here, we build on knowledge gained from that earlier predictive modeling work,16 and we also examine the utility of National Health and Nutrition Examination Survey (NHANES) public health data collected by the CDC for model development, inspired by a study from South Korea that used similar public health data from that country to create predictive models for diabetic retinopathy.17 Our long-term goal is to develop predictive models for diabetic retinopathy that are appropriate for the safety net setting.
Risk factors for diabetic retinopathy cited in the literature include duration of diabetes,18–20 high blood glucose/poor blood sugar control,18–21 high blood pressure,18–21 dyslipidemia,18 high cholesterol,19 pregnancy,18 nephropathy,20 and obesity.18 Other known risk factors for diabetic retinopathy include inflammation,18 puberty (in Type 1 diabetes),18 ethnicity,18 insulin treatment (for Type 2 diabetes,20 related to poor blood glucose control), tumor necrosis factor receptors,22 and smoking23 (in one study, pack-years of smoking were found to be borderline significant in predicting retinopathy in younger adults;24 however, other studies have found the relationship between retinopathy and smoking to be inconsistent25, 26). Risk factors such as high blood glucose, duration of diabetes, and high blood pressure are considered to be stronger predictors of retinopathy than other risk factors.18 Models that incorporate some or all of these known risk factors, as well as other factors that may not have been considered in the literature, could be useful for prediction. Aside from tumor necrosis factor receptors, most of the risk factors listed above are routinely collected and stored in electronic health records in the course of care for a diabetic patient and thus could be used to create predictive models to screen for latent retinopathy in diabetic patients who have not received an annual eye examination as recommended in clinical practice guidelines.
There are different stages of diabetic retinopathy, which in order of increasing severity are: mild non-proliferative diabetic retinopathy (NPDR), moderate NPDR, severe NPDR, and proliferative diabetic retinopathy. As retinopathy progresses, visible changes develop in the retina that aid clinicians in distinguishing one stage from another on eye examination. These include microaneurysms, intra-retinal hemorrhages, retinal ischemia (cotton-wool spots), venous beading, and finally, the proliferation or growth of fragile new blood vessels that can bleed on the retina’s inner surface.27 For the present study, our goal is to develop methods to identify patients at high risk of diabetic retinopathy in order to have them come in to a care site for a teleretinal eye screening or an in-person eye examination. This means that the methods developed predict the presence or absence of retinopathy generally; retinopathy staging is not attempted (accurate staging would require data from digital retinal images).
For the CDC NHANES dataset utilized in the present study, which contains merged data from 2005–2006 and from 2007–2008, we found that only 13% (158 instances) involved patients who had diabetic retinopathy, while 87% (1081 instances) did not. In our earlier predictive modeling work, the dataset collected from urban safety net clinics corresponded to 513 diabetic patients, with approximately 25% of instances (130 instances) having an outcome of diabetic retinopathy and roughly 75% of the instances (383 instances) having an outcome of no diabetic retinopathy. The diabetic retinopathy rate of 25% in that clinical dataset is roughly in line with the projected US rate of 28.5% between 2005 and 2008 referenced above. Using standard classifiers, the best classification result on the clinical data from 513 patients was achieved with a Bayesian network that had a sensitivity of 26.2%, a specificity of 94.5%, an Area Under the ROC Curve (AUC) of 0.71, a negative predictive value of 79%, and a positive predictive value (precision) of 61.8%.16
One of the difficulties with predicting diabetic retinopathy solely from clinical or public health data for screening outreach purposes (without the benefit of digital retinal imaging features that are clearly associated only with retinopathy) is that there is some overlap between the classes corresponding to “retinopathy” and “no retinopathy,” which makes the two classes difficult to separate. Additionally, as described above, datasets based on clinical or public health records may be heavily skewed towards “no retinopathy.” A class imbalance is said to occur when one of the outcomes of a learning problem, especially the outcome of interest, is underrepresented in the dataset from which predictive models are to be learned.28 Because each dataset used for the present study exhibits a class imbalance and our previous results on the clinical dataset with standard classifiers were modest, in this study we turn to machine learning approaches designed to deal with the class imbalance problem. Galar et al identify four key approaches to dealing with the class imbalance problem: (1) techniques that modify existing algorithms for a standard classifier in order to emphasize the significance of satisfactorily classifying the minority class; (2) data preprocessing methods, such as undersampling the majority class or oversampling the minority class; (3) cost-sensitive approaches that combine both algorithm and data preprocessing methods; and (4) ensembles of weak learners (classifiers) that take advantage of data preprocessing and other methods.28 For this study, we focused on the use of ensembles of weak learners that take advantage of data preprocessing methods, as illustrated in the sketch below.
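To make the data-level option concrete, the brief MATLAB sketch below shows random undersampling of the majority (“no retinopathy”) class, the preprocessing step that RUSBoost applies internally at each boosting iteration. It is an illustration only, assuming a hypothetical feature matrix X and 0/1 label vector y rather than the study’s actual variables.

```matlab
% Illustrative random undersampling of the majority class, assuming X is an
% n-by-p feature matrix and y an n-by-1 vector of 0/1 labels (1 = retinopathy).
rng(1);                                   % fix the random seed for repeatability
minorityIdx = find(y == 1);
majorityIdx = find(y == 0);

% Keep only as many randomly chosen majority cases as there are minority cases,
% producing a balanced training subset.
keep        = majorityIdx(randperm(numel(majorityIdx), numel(minorityIdx)));
balancedIdx = [minorityIdx; keep];

Xbal = X(balancedIdx, :);
ybal = y(balancedIdx);
```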
Methods
Approval to use clinical data for the study was obtained from the Charles Drew University Institutional Review Board.
Data sources
Clinical data for the study were previously obtained from six federally qualified health centers that serve as primary care clinics for un- and under-insured patients in South Los Angeles. This was done through a retrospective review of medical records for 513 patients with type 2 diabetes who received an eye examination for diabetic retinopathy from ophthalmologists via teleretinal screening and who obtained care at the clinical sites in 2011.
Public health data for the study were obtained from the US NHANES cross-sectional dataset. The National Health and Nutrition Examination Survey is conducted by the National Center for Health Statistics, using a stratified multistage probability design to obtain a representative sample of the total civilian, non-institutionalized US population. Since 1999, the NHANES has released data at 2-year intervals. The NHANES collects questionnaire data during face-to-face home interviews and includes a physical examination as well as the collection of laboratory data. We used data from 2005–2006 and 2007–2008, the only survey cycles whose datasets matched 95% of the variables utilized in the previously mentioned Korean study.17
Sample from NHANES
There were a total of 20,497 participants in NHANES 2005 to 2008. For this analysis, we included only persons with diabetes (N=2,874). Of those with diabetes, participants who had complete data on the presence or absence of diabetic retinopathy were included, giving an analytical sample of 1,239 people with diabetes who had been told whether or not they had retinopathy. Participants were identified as being diabetic if they met at least one of the following criteria: plasma fasting glucose ≥ 126 mg/dL; serum glucose ≥ 200 mg/dL; glycohemoglobin ≥ 6.5%; responding “Yes” to the question “Other than during pregnancy, have you ever been told by a doctor or health professional that you have diabetes or sugar diabetes?”; responding “Yes” to the question “Are you now taking insulin?”; or responding “Yes” to the question “Are you now taking diabetic pills to lower your blood sugar? These are sometimes called oral agents or oral hypoglycemic agents.”
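As an illustration of how these criteria translate into an inclusion filter (a sketch under stated assumptions, not the authors’ actual extraction code), the MATLAB fragment below assumes the merged 2005–2008 NHANES records have been loaded into a table T with hypothetical, descriptively named columns and that questionnaire answers are coded 1 for “Yes.”

```matlab
% Sketch of the diabetes inclusion criteria above; the column names are
% hypothetical placeholders, not actual NHANES variable names.
isDiabetic = ...
    T.FastingPlasmaGlucose >= 126 | ...   % mg/dL
    T.SerumGlucose         >= 200 | ...   % mg/dL
    T.Glycohemoglobin      >= 6.5 | ...   % percent (HbA1c)
    T.ToldHasDiabetes      == 1   | ...   % "ever told you have diabetes"
    T.TakingInsulin        == 1   | ...
    T.TakingDiabetesPills  == 1;

diabetics = T(isDiabetic, :);             % N = 2,874 in the present study
```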
Classification Methods
Variables with 50% or more of their values missing were excluded from the datasets used for machine learning. Missing data for the remaining variables were handled using imputation techniques. For the datasets analyzed, we performed feature subset selection using the Lasso.29 Since our previously published study used standard (single) classifiers, for this study we use ensembles, which combine multiple classifiers and may perform better than a single-classifier approach. We learned ensemble classifiers based on decision-tree weak learners using methods designed to handle class imbalances, such as RUSBoost,30 which utilizes majority-class undersampling. For contrast, we also learned ensemble classifiers using AdaBoost.M1,31 which uses adaptive boosting to combine the weighted output of several weak learners to produce a boosted classification output; on its own, AdaBoost.M1 has no special accommodation for class imbalances. The ensemble classifiers were learned on the full feature set as well as the feature subsets obtained. We reserved 20% of each dataset for testing and then performed 10-fold cross validation on the remaining 80%, selecting the best classifier from the cross-validation process for use on the reserved test set. Analyses were performed using MATLAB32 and Weka.33
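The MATLAB sketch below outlines this pipeline (hold-out split, Lasso-based feature subset selection, and RUSBoost versus AdaBoost.M1 ensembles of tree learners). It is a simplified reconstruction under stated assumptions, not the study’s actual code: X is assumed to be a numeric, already-imputed feature matrix and y a 0/1 retinopathy label, and the ensemble sizes are placeholders.

```matlab
% Sketch of the modeling pipeline described above (illustrative only).
% Assumes X (numeric, imputed) and y (0/1, 1 = retinopathy) already exist.
rng(1);

% 1. Reserve 20% of the data for testing (stratified hold-out).
cv     = cvpartition(y, 'HoldOut', 0.2);
Xtrain = X(training(cv), :);   ytrain = y(training(cv));
Xtest  = X(test(cv), :);       ytest  = y(test(cv));

% 2. Feature subset selection with the Lasso (logistic formulation).
[B, FitInfo] = lassoglm(Xtrain, ytrain, 'binomial', 'CV', 10);
selected     = find(B(:, FitInfo.Index1SE) ~= 0);   % nonzero coefficients

% 3. Ensembles of decision-tree weak learners, with and without
%    majority-class undersampling (RUSBoost vs. plain AdaBoost.M1).
rus = fitensemble(Xtrain(:, selected), ytrain, 'RUSBoost',   200, 'Tree');
ada = fitensemble(Xtrain(:, selected), ytrain, 'AdaBoostM1', 200, 'Tree');

% 4. 10-fold cross validation on the 80% training portion.
cvRus = crossval(rus, 'KFold', 10);
fprintf('RUSBoost 10-fold CV error: %.3f\n', kfoldLoss(cvRus));

% 5. Evaluate the chosen ensemble on the 20% set-aside test set.
[pred, score]  = predict(rus, Xtest(:, selected));
[~, ~, ~, auc] = perfcurve(ytest, score(:, 2), 1);
fprintf('Test-set AUC: %.2f\n', auc);
```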
For each classifier, we measured sensitivity, or the true positive rate (the number of cases correctly classified as having diabetic retinopathy divided by the total number of cases actually involving retinopathy); specificity, or the true negative rate; the AUC, which represents the trade-off between the true positive rate (sensitivity) and the false positive rate (1 – specificity); and accuracy (the total number of correctly classified cases divided by the total number of cases).
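For reference, these measures follow directly from a 2×2 confusion matrix; the short sketch below assumes ytest and pred are the true and predicted 0/1 labels from a classifier such as those in the previous sketch.

```matlab
% Computing the reported measures from a 2x2 confusion matrix, assuming
% ytest and pred are 0/1 vectors (1 = retinopathy). Illustrative only.
C  = confusionmat(ytest, pred);      % rows = true class, columns = predicted
tn = C(1,1); fp = C(1,2);
fn = C(2,1); tp = C(2,2);

sensitivity = tp / (tp + fn);        % true positive rate
specificity = tn / (tn + fp);        % true negative rate
accuracy    = (tp + tn) / sum(C(:)); % fraction of correctly classified cases

% The AUC summarizes the sensitivity vs. (1 - specificity) trade-off across
% all score thresholds; see the perfcurve call in the previous sketch.
```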
Results
The complete set of predictors/features utilized for the clinical data is presented in Table 1. Table 2 shows the feature subset obtained after applying the Lasso to the dataset in Table 1. Table 3 shows the results of 10-fold cross validation on the 80% of the clinical dataset set aside for this purpose, first using the entire set of features and next, using the feature subset. It also shows the results of applying the best ensemble model obtained from 10-fold cross-validation to the 20% of cases set aside solely for testing.
Table 1:
Variables gathered from the clinics that potentially impact development of diabetic retinopathy
| Clinical variables that might impact diabetic retinopathy risk | |
|---|---|
| Age | Gender |
| Ethnicity/race | Marital Status |
| Education | Household income |
| Insulin dependence | Insurance |
| Number of years patient has had diabetes | Body mass index |
| Hemoglobin A1C value | Primary language |
| Co-morbid conditions | |
| Peripheral vascular disease | Cerebrovascular accident/Stroke |
| Hypertension | Other heart-related diagnosis |
| Nephropathy | Neuropathy |
| Depression | Erectile dysfunction |
| Dyslipidemia | Obesity |
| Other (hypothyroidism, etc.) | Previous diagnosis & treatment of retinopathy |
Table 2:
Subset of clinical variables following feature-subset selection using the Lasso
| Clinical variables selected from feature subset selection using the Lasso | |
|---|---|
| Age | Gender |
| Marital Status | Ethnicity/race |
| Hemoglobin A1C value | Education |
| Number of years patient has had diabetes | Insurance |
| Co-morbid conditions | |
| Hypertension | Neuropathy |
| Previous diagnosis & treatment of retinopathy | |
Table 3:
Results for ensemble classifiers on clinical data
Averaged results for ensemble classifiers following 10-fold cross validation using all features

| | RUSBoost Ensemble Average | AdaBoost.M1 Ensemble Average |
|---|---|---|
| Accuracy | 66.7% | 70.3% |
| Sensitivity | 51.9% | 23.1% |
| Specificity | 53.5% | 64.5% |
| AUC | 0.62 | 0.55 |

Results for best ensemble on 20% set-aside test set using all features

| | Best RUSBoost Ensemble | Best AdaBoost.M1 Ensemble |
|---|---|---|
| Accuracy | 73.5% | 75.5% |
| Sensitivity | 65.4% | 26.9% |
| Specificity | 56.9% | 68.6% |
| AUC | 0.71 | 0.6 |

Averaged results for ensembles following 10-fold cross validation using feature subset

| | RUSBoost Ensemble Average | AdaBoost.M1 Ensemble Average |
|---|---|---|
| Accuracy | 63.5% | 66.9% |
| Sensitivity | 49.0% | 26% |
| Specificity | 55.9% | 60.3% |
| AUC | 0.59 | 0.53 |

Results for best ensemble on 20% set-aside test set using feature subset

| | Best RUSBoost Ensemble | Best AdaBoost.M1 Ensemble |
|---|---|---|
| Accuracy | 73.5% | 72.5% |
| Sensitivity | 69.2% | 34.6% |
| Specificity | 55.9% | 63.7% |
| AUC | 0.72 | 0.6 |
The complete set of predictors/features utilized for the public health data is presented in Table 4. Table 5 shows the feature subset obtained after applying the Lasso to the dataset in Table 4. Table 6 shows the results of 10-fold cross validation on the 80% of the public health dataset set aside for this purpose, first using the entire set of features and next, using the feature subset. Table 6 also shows the results of applying the best ensemble model obtained from 10-fold cross-validation to the 20% of cases set aside solely for testing.
Table 4:
Variables gathered from the NHANES 2005–2008 dataset
| Public health (NHANES) variables that might impact diabetic retinopathy risk | ||
|---|---|---|
| Age | Marital status | Fasting plasma glucose |
| Sex | Education | Alanine aminotransferase test – blood |
| Smoking status | Race/ethnicity | Aspartate aminotransferase test – blood |
| Alcohol consumption | Insurance status | Cholesterol |
| Insulin therapy | Poverty income ratio | Serum creatinine |
| On diabetes pills | Bilirubin – urine | On non-drug diabetes interventions |
| Systolic blood pressure | Diastolic blood pressure | Hemoglobin – blood |
| Diagnosed hypertension | On hypertension pills | Urine albumin |
| Diagnosed diabetic nephropathy | Microalbuminuria | Urine creatinine |
| Body mass index | Waist circumference | Low density lipoprotein |
| Hemoglobin A1C | Triglycerides | High density lipoprotein |
Table 5:
Subset of NHANES variables following feature-subset selection using the Lasso
| NHANES variables selected from feature subset selection using the Lasso | |
|---|---|
| Age | Poverty income ratio |
| Hemoglobin | On diabetes pills |
| On non-drug diabetes interventions | On hypertension pills |
| Systolic blood pressure | Diastolic blood pressure |
| Education | Marital status |
Table 6:
Results for ensemble classifiers on public health (NHANES) data
Averaged results for ensemble classifiers following 10-fold cross validation using all features

| | RUSBoost Ensemble Average | AdaBoost.M1 Ensemble Average |
|---|---|---|
| Accuracy | 62.6% | 86.4% |
| Sensitivity | 31.5% | 1% |
| Specificity | 58.6% | 86.4% |
| AUC | 0.49 (0.51) | 0.5 |

Results for best ensemble on 20% set-aside test set using all features

| | Best RUSBoost Ensemble | Best AdaBoost.M1 Ensemble |
|---|---|---|
| Accuracy | 61.1% | 87.4% |
| Sensitivity | 48.4% | 0% |
| Specificity | 55.1% | 87.4% |
| AUC | 0.56 | 0.5 |

Averaged results for ensembles following 10-fold cross validation using feature subset

| | RUSBoost Ensemble Average | AdaBoost.M1 Ensemble Average |
|---|---|---|
| Accuracy | 65.8% | 85.5% |
| Sensitivity | 37.8% | 2% |
| Specificity | 61% | 85.3% |
| AUC | 0.54 | 0.5 |

Results for best ensemble on 20% set-aside test set using feature subset

| | Best RUSBoost Ensemble | Best AdaBoost.M1 Ensemble |
|---|---|---|
| Accuracy | 65.6% | 86.2% |
| Sensitivity | 25.8% | 3.2% |
| Specificity | 62.3% | 85.8% |
| AUC | 0.48 (0.51) | 0.51 |
Discussion
The results show that the clinical dataset was moderately predictive of diabetic retinopathy, with the best RUSBoost ensemble having an accuracy of 73.5%, sensitivity of 69.2%, specificity of 55.9%, and AUC of 0.72 on previously unseen instances (the set-aside test data). As comparison with the AdaBoost ensemble results shows, the majority-class undersampling in RUSBoost trades a decrease in specificity for an increase in sensitivity. For the purpose of identifying individuals who have not yet received an annual eye examination but may have latent retinopathy, the increase in sensitivity provided by RUSBoost is important. In previous work on the same dataset, the best standard classifiers achieved excellent specificity (>90%) but poor sensitivity (<28%), which made them less suitable for screening purposes. The best AdaBoost ensemble achieved better sensitivity than a standard Bayesian network classifier (34.6% versus 26.2%) but much worse specificity (63.7% versus 94.5%). A greater improvement in sensitivity, with a larger reduction in specificity, was seen for the best RUSBoost ensemble (sensitivity of 69.2% versus 26.2% for the Bayesian network classifier; specificity of 55.9% versus 94.5%). The improved sensitivity results on the clinical data with RUSBoost are encouraging, since the data include features that are routinely collected by and available to clinics treating diabetic patients and could provide the basis for targeted outreach to noncompliant patients.
The classifier ensembles developed on the public health data were not useful, with AUCs indicating that they were no better than a coin flip at discriminating between cases of retinopathy and cases of no retinopathy. While the accuracy results for AdaBoost were high on the public health dataset, it is clear from the best AdaBoost sensitivity (3.2%) and specificity (85.8%) results that these ensembles predicted “no retinopathy” in almost every situation and were not as adaptive in adjusting to misclassifications as might have been expected.
The NHANES dataset utilized relies on physical examinations as well as patient questionnaires for some key variables; poor responses to the questionnaires may have led to poorer quality data for learning. For example, duration of diabetes, which is known to be a major risk factor for retinopathy, had to be derived from questionnaire responses about the age at which a patient was diagnosed with diabetes (subtracting the age at diagnosis from the current age). Some cases yielded negative durations, and over 70% of values were missing for this key variable; these problems meant that we were not able to include duration of diabetes in the public health data analyses for the present study. It is likely that a public health dataset with more retinopathy-related variables measured clinically and fewer missing data would produce better results. However, a recently published study that looked only at a subset of 266 people over the age of forty from NHANES who were newly identified by the paper’s authors as having diabetes had more promising results with regard to predicting retinopathy, with a best AUC of 0.74, a low positive predictive value (precision) of 22%, and a high negative predictive value of 99%.34
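The derivation and data-quality check described here amount to a one-line subtraction plus a validity filter; the brief sketch below illustrates them, with hypothetical column names assumed for current age and age at diagnosis.

```matlab
% Deriving duration of diabetes from questionnaire responses; AgeYears and
% AgeDiagnosedDiabetes are hypothetical column names in table T.
duration = T.AgeYears - T.AgeDiagnosedDiabetes;

% Flag missing or impossible (negative) durations rather than using them.
invalid = isnan(duration) | duration < 0;
fprintf('Unusable duration values: %.1f%%\n', 100 * mean(invalid));

duration(invalid) = NaN;   % values like these led to excluding the variable here
```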
Future work includes collecting additional clinic data, applying novel methods for learning with a class imbalance to the data, and developing software based on the resulting predictive models. The software is intended for clinics to use for targeted outreach to patients who have not received an annual eye examination (in addition to their usual outreach to diabetic patients). Predictive models that can accurately identify individuals with diabetes who may have developed retinopathy but are not yet aware of it would be invaluable for outreach aimed at preventing avoidable blindness from diabetic retinopathy.
Acknowledgments
This project was supported by the NIH under grant numbers U54 MD007598, U54 RR026138-01S2, and S21MD000103.
References
- 1. Klein R, Klein BE. Vision disorders in diabetes. In: Diabetes in America. 2nd ed. National Diabetes Data Group, National Institutes of Health, National Institute of Diabetes and Digestive and Kidney Diseases; 1995.
- 2. Zhang X, Saaddine JB, Chou CF, Cotch MF, Cheng YJ, Geiss LS, et al. Prevalence of diabetic retinopathy in the United States, 2005–2008. JAMA. 2010 Aug 11;304(6):649–56. doi: 10.1001/jama.2010.1111.
- 3. Centers for Disease Control and Prevention. National diabetes fact sheet: national estimates and general information on diabetes and prediabetes in the United States, 2011. Atlanta, GA: US Department of Health and Human Services, Centers for Disease Control and Prevention; 2011.
- 4. Brechner RJ, Cowie CC, Howie LJ, Herman WH, Will JC, Harris MI. Ophthalmic examination among adults with diagnosed diabetes mellitus. JAMA. 1993 Oct 13;270(14):1714–8.
- 5. Cavallerano AA, Conlin PR. Teleretinal imaging to screen for diabetic retinopathy in the Veterans Health Administration. J Diabetes Sci Technol. 2008 Jan;2(1):33–9. doi: 10.1177/193229680800200106.
- 6. Moss SE, Klein R, Klein BE. Factors associated with having eye examinations in persons with diabetes. Arch Fam Med. 1995 Jun;4(6):529–34. doi: 10.1001/archfami.4.6.529.
- 7. Orr P, Barron Y, Schein OD, Rubin GS, West SK. Eye care utilization by older Americans: the SEE Project. Salisbury Eye Evaluation. Ophthalmology. 1999 May;106(5):904–9. doi: 10.1016/s0161-6420(99)00508-4.
- 8. Schoenfeld ER, Greene JM, Wu SY, Leske MC. Patterns of adherence to diabetes vision care guidelines: baseline findings from the Diabetic Retinopathy Awareness Program. Ophthalmology. 2001 Mar;108(3):563–71. doi: 10.1016/s0161-6420(00)00600-x.
- 9. Deeb LC, Pettijohn FP, Shirah JK, Freeman G. Interventions among primary-care practitioners to improve care for preventable complications of diabetes. Diabetes Care. 1988 Mar;11(3):275–80. doi: 10.2337/diacare.11.3.275.
- 10. Payne TH, Gabella BA, Michael SL, Young WF, Pickard J, Hofeldt FD, et al. Preventive care in diabetes mellitus. Current practice in urban health-care system. Diabetes Care. 1989 Nov-Dec;12(10):745–7. doi: 10.2337/diacare.12.10.745.
- 11. Wylie-Rosett J, Basch C, Walker EA, Zybert P, Shamoon H, Engel S, et al. Ophthalmic referral rates for patients with diabetes in primary-care clinics located in disadvantaged urban communities. J Diabetes Complications. 1995 Jan-Mar;9(1):49–54. doi: 10.1016/1056-8727(94)00005-9.
- 12. Ogunyemi O, George S, Patty L, Teklehaimanot S, Baker R. Teleretinal screening for diabetic retinopathy in six Los Angeles urban safety-net clinics: final study results. AMIA Annu Symp Proc. 2013:1082–8.
- 13. Ogunyemi O, Moran E, Patty Daskivich L, George S, Teklehaimanot S, Ilapakurthi R, et al. Autonomy versus automation: perceptions of nonmydriatic camera choice for teleretinal screening in an urban safety net clinic. Telemed J E Health. 2013 Aug;19(8):591–6. doi: 10.1089/tmj.2012.0191.
- 14. Ogunyemi O, Terrien E, Eccles A, Patty L, George S, Fish A, et al. Teleretinal screening for diabetic retinopathy in six Los Angeles urban safety-net clinics: initial findings. AMIA Annu Symp Proc. 2011:1027–35.
- 15. Fish A, George S, Terrien E, Eccles A, Baker R, Ogunyemi O. Workflow concerns and workarounds of readers in an urban safety net teleretinal screening study. AMIA Annu Symp Proc. 2011:417–26.
- 16. Ogunyemi O, Teklehaimanot S, Patty L, Moran E, George S. Evaluating predictive modeling’s potential to improve teleretinal screening participation in urban safety net clinics. Stud Health Technol Inform. 2013;192:162–5.
- 17. Oh E, Yoo TK, Park EC. Diabetic retinopathy risk prediction for fundus examination using sparse learning: a cross-sectional study. BMC Med Inform Decis Mak. 2013;13:106. doi: 10.1186/1472-6947-13-106.
- 18. Ding J, Wong TY. Current epidemiology of diabetic retinopathy and diabetic macular edema. Curr Diab Rep. 2012 Aug;12(4):346–54. doi: 10.1007/s11892-012-0283-6.
- 19. Yau JW, Rogers SL, Kawasaki R, Lamoureux EL, Kowalski JW, Bek T, et al. Global prevalence and major risk factors of diabetic retinopathy. Diabetes Care. 2012 Mar;35(3):556–64. doi: 10.2337/dc11-1909.
- 20. Pedro RA, Ramon SA, Marc BB, Juan FB, Isabel MM. Prevalence and relationship between diabetic retinopathy and nephropathy, and its risk factors in the North-East of Spain, a population-based study. Ophthalmic Epidemiol. 2010 Aug;17(4):251–65. doi: 10.3109/09286586.2010.498661.
- 21. Klein R, Klein BE, Moss SE. Epidemiology of proliferative diabetic retinopathy. Diabetes Care. 1992 Dec;15(12):1875–91. doi: 10.2337/diacare.15.12.1875.
- 22. Kuo JZ, Guo X, Klein R, Klein BE, Cui J, Rotter JI, et al. Systemic soluble tumor necrosis factor receptors 1 and 2 are associated with severity of diabetic retinopathy in Hispanics. Ophthalmology. 2012 May;119(5):1041–6. doi: 10.1016/j.ophtha.2011.10.040.
- 23. Sawicki PT, Didjurgeit U, Muhlhauser I, Bender R, Heinemann L, Berger M. Smoking is associated with progression of diabetic nephropathy. Diabetes Care. 1994 Feb;17(2):126–31. doi: 10.2337/diacare.17.2.126.
- 24. Moss SE, Klein R, Klein BE. Association of cigarette smoking with diabetic retinopathy. Diabetes Care. 1991 Feb;14(2):119–26. doi: 10.2337/diacare.14.2.119.
- 25. Moss SE, Klein R, Klein BE. Cigarette smoking and ten-year progression of diabetic retinopathy. Ophthalmology. 1996 Sep;103(9):1438–42. doi: 10.1016/s0161-6420(96)30486-7.
- 26. Muhlhauser I. Cigarette smoking and diabetes: an update. Diabet Med. 1994 May;11(4):336–43. doi: 10.1111/j.1464-5491.1994.tb00283.x.
- 27. Garg S, Davis RM. Diabetic retinopathy screening update. Clinical Diabetes. 2009;27(4):140–5.
- 28. Galar M, Fernandez A, Barrenechea E, Bustince H, Herrera F. A review on ensembles for the class imbalance problem: bagging-, boosting-, and hybrid-based approaches. IEEE Transactions on Systems, Man, and Cybernetics - Part C: Applications and Reviews. 2012;42(4):463–84.
- 29. Tibshirani R. Regression shrinkage and selection via the Lasso. Journal of the Royal Statistical Society, Series B (Methodological). 1996;58(1):267–88.
- 30. Seiffert C, Khoshgoftaar TM, Van Hulse J, Napolitano A. RUSBoost: a hybrid approach to alleviating class imbalance. IEEE Transactions on Systems, Man, and Cybernetics - Part A: Systems and Humans. 2010;40(1):185–97.
- 31. Freund Y, Schapire RE. A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences. 1997;55(1):119–39.
- 32. MATLAB and Statistics Toolbox Release 2014b. Natick, Massachusetts, United States: The MathWorks, Inc.
- 33. Hall M, Frank E, Holmes G, Pfahringer B, Reutemann P, Witten I. The WEKA data mining software: an update. SIGKDD Explorations. 2009;11(1):10–18.
- 34. Cichosz SL, Johansen MD, Knudsen ST, Hansen TK, Hejlesen O. A classification model for predicting eye disease in newly diagnosed people with type 2 diabetes. Diabetes Research and Clinical Practice. 2015 May;108(2):210–5. doi: 10.1016/j.diabres.2015.02.020.
