Abstract
Objective
Compare machine learning (ML)-based predictive analytics methods to traditional logistic regression in classification of olfactory dysfunction in chronic rhinosinusitis (CRS-OD) and identify predictors within a large multi-institutional cohort of refractory CRS patients.
Methods
Adult CRS patients enrolled in a prospective, multi-institutional, observational cohort study were assessed for baseline CRS-OD using a smell identification test (SIT) or brief SIT (bSIT). Four different ML methods were compared to traditional logistic regression for classification of CRS normosmics versus CRS-OD.
Results
Data were collected for 611 study participants who met inclusion criteria between April 2011 and July 2015. Thirty-four percent of enrolled patients demonstrated olfactory loss on psychophysical testing. Differences between CRS normosmics and those with smell loss included objective disease measures (CT and endoscopy scores), age, sex, prior surgeries, socioeconomic status, steroid use, polyp presence, asthma, and aspirin sensitivity. Most ML methods performed favorably in terms of predictive ability. Top predictors include factors previously reported in the literature, as well as several socioeconomic factors.
Conclusion
Olfactory dysfunction is a variable phenomenon in CRS patients. ML methods perform well compared to traditional logistic regression in classification of normosmia versus smell loss in CRS, and are able to include numerous risk factors into prediction models. Several actionable features were identified as risk factors for CRS-OD. These results suggest that ML methods may be useful for current understanding and future study of hyposmia secondary to sinonasal disease, the most common cause of persistent olfactory loss in the general population.
Keywords: sinusitis, smell, olfaction, chronic disease, outcome assessment (health care), artificial intelligence, AI/ML, predictive analytics
Introduction
Olfactory loss is common, affecting up to 25% of the adult population, with chronic rhinosinusitis (CRS) being a major cause of persistent olfactory loss (Hummel et al. 2016). Smell loss is a hallmark of CRS, particularly in those with nasal polyps, and is one of the cardinal diagnostic symptoms for the disease (Fokkens et al. 2012; Rosenfeld et al. 2015; Orlandi et al. 2016). Curiously, this symptom does not affect all CRS patients. Just as CRS is a heterogeneous disease (Akdis et al. 2013), previous literature has estimated a 20–80% prevalence of olfactory loss in CRS cohorts, depending on the sensitivity of the olfactory assessment used and the proportion of enrolled subjects with nasal polyps (CRSwNP). A recent study using well-defined CRS diagnostic criteria and Sniffin’ Sticks objective testing showed a 44% prevalence of smell loss in CRS without nasal polyps (CRSsNP) subjects and 58% in CRSwNP (Soler et al. 2016a). Risk factors associated with olfactory dysfunction in chronic rhinosinusitis (CRS-OD) identified in this study included disease severity score on computed tomography (CT) and the presence of comorbid conditions such as diabetes, allergy, and asthma. Adding a healthy control group for comparison, Schlosser et al. noted that CRS-OD affects CRSwNP patients more than CRSsNP patients, and identified age as an additional risk factor (Schlosser et al. 2020). Given that most existing studies consist of single-institution cohorts, with subjects undergoing variable treatment regimens and different assessments of smell function, a systematic review and meta-analysis was performed by Kohli et al. to clarify the presence of OD in CRS and identify consistent risk factors (Kohli et al. 2017). The authors included 47 studies in the analysis, and identified age, CT score, and presence of nasal polyps as the three most important risk factors for CRS-OD in the published literature.
Similarly, olfactory response to CRS treatment is highly variable, with resolution of olfactory disturbance after standard therapies occurring in approximately 40% of refractory CRS patients (Deconde et al. 2014).
Olfactory dysfunction in CRS is potentially complex and the interplay among possible mechanisms is poorly understood. The simple model of conductive loss due to impaired odorant transport to the olfactory cleft in CRS may explain a portion of the disease for some patients. Local and systemic inflammation likely contribute to varying degrees in particular individuals, with underlying factors such as age and presence of comorbid diabetes influencing the capacity for olfactory sensory neuron regeneration. Given the intricate and multifactorial nature of CRS-OD, compounded by inherent challenges of human cohort studies, it is no surprise that putative predictors and their relative importance have been inconsistently characterized across studies. Existing single-institution studies carry a significant risk of sampling bias, and their use of univariate regression models is insufficient to identify predictor variables for CRS-OD. These analyses cannot account for all possible predictor variables or interactions among variables, and therefore may offer limited insights. Additional attempts at categorizing outcomes with multivariable regression approaches that were neither developed for predictive performance nor assessed as such cannot be relied upon (Katotomichelakis et al. 2014).
Traditional statistical approaches can model how system variables relate to one another and generate corresponding mathematical metrics of statistical significance. However, novel data analytics approaches including machine learning (ML) methods have shown improved classification accuracy and can be utilized to predict outcomes, rather than focusing on individual components within a system. ML approaches are particularly useful when there is a role for including complex interactions that may otherwise be ignored or dismissed as noise when using traditional statistics (Bzdok et al. 2018), such as in the case of CRS-OD.
Understanding the heterogeneity of CRS-OD and interpreting multidimensional risk factors for outcomes prediction is foundational for future clinical care and research. The objective of this study was to test different ML-based predictive analytics approaches against traditional regression modeling for classification accuracy of OD in CRS patients. A secondary goal was to explore top predictors in high-performing models to identify common predictor variables of importance that may help explain the heterogeneity of OD found among CRS patients.
Materials and methods
Study population
Adult patients (>18 years old) were recruited into a prospective, multicenter, observational cohort study, for which multiple reports have been previously published (Smith et al. 2013; Deconde et al. 2015; Soler et al. 2016b; Ramakrishnan et al. 2017). All study patients were diagnosed with medically refractory CRS as defined by the American Academy of Otolaryngology-Head and Neck Surgery 2007 guidelines (Rosenfeld et al. 2015). Performance sites consisted of tertiary care sinus centers located within academic hospital systems in North America including: Oregon Health & Science University (OHSU; Portland, OR; eIRB#7198), Stanford University (Palo Alto, CA; IRB#4947), and the Medical University of South Carolina (Charleston, SC; IRB#12409), with central study coordination at OHSU. The Institutional Review Board (IRB) at each performance site governed all study protocols, informed consent documentation, and data safety monitoring, in accordance with the Declaration of Helsinki for Medical Research Involving Human Subjects.
At original study enrollment, participants were screened for demographics, social and medical history (Table 1). The diagnoses of asthma, allergy, and aspirin-exacerbated respiratory disease (AERD) were made by evaluation of the medical record, presence of classic symptoms, and self-report of prior testing. Subjects with comorbid cystic fibrosis, ciliary dyskinesia, or autoimmune disease were included in the analysis. Patient-reported outcome measures (PROMs) were collected during study enrollment, but were not used in the predictive analytics given the overlap between particular symptom ratings and objective smell loss (Senior et al. 2001; Piccirillo et al. 2002; Hopkins et al. 2009; Simopoulos et al. 2012). Given the exploratory, retrospective nature of this study, all available subjects were included and sample size calculations were not conducted.
Table 1.
Characteristic | Overall, n = 611 | n | Olfactory dysfunction: yes, n = 203 | Olfactory dysfunction: no, n = 408 | P-value | Q-value |
---|---|---|---|---|---|---|
CRS with polyps | | 611 | | | <0.001 | <0.001 |
No | 399 (65%) | | 86 (42%) | 313 (77%) | | |
Yes | 212 (35%) | | 117 (58%) | 95 (23%) | | |
Total endoscopy score | 6.0 (4.0, 8.5) | 607 | 8.0 (4.0, 11.0) | 4.0 (2.0, 7.0) | <0.001 | <0.001 |
Total CT score | 12.0 (7.0, 16.0) | 582 | 15.0 (10.0, 19.0) | 11.0 (6.0, 14.0) | <0.001 | <0.001 |
AERD | | 611 | | | <0.001 | <0.001 |
No | 567 (93%) | | 173 (85%) | 394 (97%) | | |
Yes | 44 (7.2%) | | 30 (15%) | 14 (3.4%) | | |
Age | 51.0 (38.3, 61.7) | 611 | 56.6 (44.6, 65.4) | 49.1 (36.9, 59.9) | <0.001 | <0.001 |
Prior surgery | | 610 | | | <0.001 | <0.001 |
0 | 278 (46%) | | 66 (33%) | 212 (52%) | | |
1 | 153 (25%) | | 56 (28%) | 97 (24%) | | |
2 | 90 (15%) | | 34 (17%) | 56 (14%) | | |
3 | 40 (6.6%) | | 24 (12%) | 16 (3.9%) | | |
4+ | 49 (8.0%) | | 23 (11%) | 26 (6.4%) | | |
Insurance | | 610 | | | <0.001 | <0.001 |
Employer provided | 373 (61%) | | 112 (55%) | 261 (64%) | | |
Medicaid | 24 (3.9%) | | 4 (2.0%) | 20 (4.9%) | | |
Medicare | 129 (21%) | | 63 (31%) | 66 (16%) | | |
None | 6 (1.0%) | | 5 (2.5%) | 1 (0.2%) | | |
Private | 62 (10%) | | 13 (6.4%) | 49 (12%) | | |
State assisted | 8 (1.3%) | | 4 (2.0%) | 4 (1.0%) | | |
VA benefits | 3 (0.5%) | | 0 (0%) | 3 (0.7%) | | |
VA benefits/tricare | 5 (0.8%) | | 2 (1.0%) | 3 (0.7%) | | |
Asthma | | 611 | | | <0.001 | <0.001 |
No | 397 (65%) | | 110 (54%) | 287 (70%) | | |
Yes | 214 (35%) | | 93 (46%) | 121 (30%) | | |
Septal deviation | | 611 | | | <0.001 | 0.001 |
No | 408 (67%) | | 155 (76%) | 253 (62%) | | |
Yes | 203 (33%) | | 48 (24%) | 155 (38%) | | |
Recurrent acute sinusitis | | 611 | | | <0.001 | 0.003 |
No | 571 (93%) | | 199 (98%) | 372 (91%) | | |
Yes | 40 (6.5%) | | 4 (2.0%) | 36 (8.8%) | | |
Steroid dependence | | 611 | | | 0.003 | 0.009 |
No | 555 (91%) | | 174 (86%) | 381 (93%) | | |
Yes | 56 (9.2%) | | 29 (14%) | 27 (6.6%) | | |
Sex | | 611 | | | 0.013 | 0.037 |
Female | 324 (53%) | | 93 (46%) | 231 (57%) | | |
Male | 287 (47%) | | 110 (54%) | 177 (43%) | | |
AFRS | | 611 | | | 0.019 | 0.049 |
No | 593 (97%) | | 192 (95%) | 401 (98%) | | |
Yes | 18 (2.9%) | | 11 (5.4%) | 7 (1.7%) | | |
COPD | | 611 | | | 0.020 | 0.049 |
No | 579 (95%) | | 186 (92%) | 393 (96%) | | |
Yes | 32 (5.2%) | | 17 (8.4%) | 15 (3.7%) | | |
Household income | | 595 | | | 0.027 | 0.064 |
0–25,000 | 96 (16%) | | 34 (17%) | 62 (16%) | | |
26,000–50,000 | 102 (17%) | | 47 (24%) | 55 (14%) | | |
51,000–75,000 | 120 (20%) | | 37 (19%) | 83 (21%) | | |
76,000–100,000 | 107 (18%) | | 28 (14%) | 79 (20%) | | |
100,000+ | 170 (29%) | | 51 (26%) | 119 (30%) | | |
Inferior turbinate hypertrophy | | 611 | | | 0.091 | 0.200 |
No | 520 (85%) | | 180 (89%) | 340 (83%) | | |
Yes | 91 (15%) | | 23 (11%) | 68 (17%) | | |
CF or ciliary dysfunction | | 611 | | | 0.117 | 0.241 |
No | 588 (96%) | | 199 (98%) | 389 (95%) | | |
Yes | 23 (3.8%) | | 4 (2.0%) | 19 (4.7%) | | |
Allergy by history | | 611 | | | 0.143 | 0.277 |
No | 480 (79%) | | 152 (75%) | 328 (80%) | | |
Yes | 131 (21%) | | 51 (25%) | 80 (20%) | | |
Site | | 611 | | | 0.189 | 0.349 |
#1 | 222 (36%) | | 84 (41%) | 138 (34%) | | |
#2 | 253 (41%) | | 77 (38%) | 176 (43%) | | |
#3 | 136 (22%) | | 42 (21%) | 94 (23%) | | |
OSA by history | | 611 | | | 0.215 | 0.376 |
No | 584 (96%) | | 191 (94%) | 393 (96%) | | |
Yes | 27 (4.4%) | | 12 (5.9%) | 15 (3.7%) | | |
Education (years) | 16.0 (13.0, 17.0) | 608 | 15.0 (13.0, 16.0) | 16.0 (13.0, 17.0) | 0.235 | 0.392 |
Race | | 610 | | | 0.247 | 0.393 |
African American | 49 (8.0%) | | 18 (8.9%) | 31 (7.6%) | | |
American Indian/Alaska Native | 5 (0.8%) | | 2 (1.0%) | 3 (0.7%) | | |
Asian | 24 (3.9%) | | 10 (4.9%) | 14 (3.4%) | | |
Native Hawaiian/Pacific Islander | 1 (0.2%) | | 1 (0.5%) | 0 (0%) | | |
Other | 25 (4.1%) | | 12 (5.9%) | 13 (3.2%) | | |
White | 506 (83%) | | 160 (79%) | 346 (85%) | | |
GERD | | 611 | | | 0.303 | 0.460 |
No | 570 (93%) | | 186 (92%) | 384 (94%) | | |
Yes | 41 (6.7%) | | 17 (8.4%) | 24 (5.9%) | | |
Ethnicity | | 611 | | | 0.352 | 0.493 |
Hispanic/Latino | 21 (3.4%) | | 9 (4.4%) | 12 (2.9%) | | |
Non-Hispanic/Latino | 590 (97%) | | 194 (96%) | 396 (97%) | | |
Alcohol use | | 608 | | | 0.344 | 0.493 |
No | 313 (51%) | | 109 (54%) | 204 (50%) | | |
Yes | 295 (49%) | | 92 (46%) | 203 (50%) | | |
Allergy by testing | | 611 | | | 0.382 | 0.510 |
No | 362 (59%) | | 115 (57%) | 247 (61%) | | |
Yes | 249 (41%) | | 88 (43%) | 161 (39%) | | |
Depression | | 611 | | | 0.403 | 0.510 |
No | 518 (85%) | | 176 (87%) | 342 (84%) | | |
Yes | 93 (15%) | | 27 (13%) | 66 (16%) | | |
Fibromyalgia | | 611 | | | 0.408 | 0.510 |
No | 584 (96%) | | 192 (95%) | 392 (96%) | | |
Yes | 27 (4.4%) | | 11 (5.4%) | 16 (3.9%) | | |
OSA by testing | | 611 | | | 0.450 | 0.543 |
No | 557 (91%) | | 188 (93%) | 369 (90%) | | |
Yes | 54 (8.8%) | | 15 (7.4%) | 39 (9.6%) | | |
Alcohol use (#) | 0.0 (0.0, 36.0) | 608 | 0.0 (0.0, 24.0) | 0.0 (0.0, 36.0) | 0.473 | 0.552 |
Diabetes | | 609 | | | 0.536 | 0.605 |
No | 558 (92%) | | 183 (91%) | 375 (92%) | | |
Yes | 51 (8.4%) | | 19 (9.4%) | 32 (7.9%) | | |
Autoimmune disease | | 610 | | | 0.589 | 0.645 |
No | 573 (94%) | | 188 (93%) | 385 (94%) | | |
Yes | 37 (6.1%) | | 14 (6.9%) | 23 (5.6%) | | |
Immunodeficiency | | 611 | | | 0.626 | 0.664 |
No | 592 (97%) | | 198 (98%) | 394 (97%) | | |
Yes | 19 (3.1%) | | 5 (2.5%) | 14 (3.4%) | | |
Smoker (current or former) | | 607 | | | 1.000 | 1.000 |
No | 574 (95%) | | 190 (95%) | 384 (95%) | | |
Yes | 33 (5.4%) | | 11 (5.5%) | 22 (5.4%) | | |
Objective measures of disease severity
Diagnostic measures of disease severity were collected as standard of care, and were also used for investigational purposes. These well-established objective measures have been recommended for clinical study (Hopkins et al. 2018), and include Lund-Kennedy nasal endoscopy scoring (Lund and Kennedy 1995) and high-resolution CT imaging graded using the Lund-Mackay scoring system (Lund and Mackay 1992).
The main outcome of interest in the current study is olfactory dysfunction as measured by psychophysical testing. Olfactory function was initially assessed using the brief Smell Identification Test (bSIT) in the years 2011-2013, and subsequently using the 40-item Smell Identification Test (SIT) in the years 2013-2015 (Sensonics Inc., Haddon Heights, NJ). Higher SIT total scores reflect a better sense of smell, and categorical ratings of “smell loss” (anosmia + hyposmia) and “normosmia” were assigned based on established scales (Doty 1995; Doty 2001; El Rassi et al. 2016).
Data management and statistical analyses
Protected health information was previously removed, and study data were safeguarded using a unique study identification number assignment for each participant. Study data were securely transferred from a HIPAA compliant, relational database (Access, Microsoft Corp, Redmond, WA) to the University of Colorado Department of Otolaryngology-Head and Neck Surgery password-protected research server, according to specifications of a Data Use Agreement between OHSU and the University of Colorado.
Study data were evaluated descriptively using count (%) for categorical variables and median and interquartile range (IQR; 25th quantile, 75th quantile) for numeric variables. To compare variables between normosmics and those with smell loss, Fisher’s exact test was used for categorical variables and the Wilcoxon rank-sum test was used for numeric variables. Both the original P-values and Benjamini-Hochberg multiple testing adjusted P-values (“Q-values”) are reported (Benjamini and Hochberg 1995). Statistical comparisons with Q-values < 0.05 were regarded as statistically significant. All statistical and data analyses were completed using the R software program (R Core Team 2019). Machine learning models were compared to a simpler logistic regression model with backwards stepwise variable selection using the Akaike information criterion (AIC). The stepwise variable selection was re-performed during each iteration of the cross-validation procedure (details later), in order to prevent biased estimates of classification accuracy (Hastie et al. 2009a).
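As an illustration of the multiple-testing adjustment described above, the following minimal Python sketch computes Benjamini-Hochberg Q-values from a list of P-values. It is not the study's code (the analyses were run in R); the function name `benjamini_hochberg` and the example P-values are hypothetical and are not taken from the study data.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P-values (Q-values) in input order."""
    m = len(p_values)
    # Visit P-values from largest to smallest so a running minimum can
    # enforce that Q-values never increase as P-values decrease.
    order = sorted(range(m), key=lambda i: p_values[i], reverse=True)
    q_values = [0.0] * m
    running_min = 1.0
    for position, idx in enumerate(order):
        rank = m - position  # ascending rank: 1 = smallest P-value
        running_min = min(running_min, p_values[idx] * m / rank)
        q_values[idx] = running_min
    return q_values


# Hypothetical P-values, for illustration only.
example_p = [0.01, 0.04, 0.03, 0.005]
example_q = benjamini_hochberg(example_p)
significant = [q < 0.05 for q in example_q]  # threshold used in the text
```

Each Q-value is the smallest false discovery rate at which the corresponding comparison would be declared significant, which is why a comparison with P < 0.05 can still have Q ≥ 0.05.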
Machine learning approaches to classification
Four different machine learning predictive analytics models were applied, in order to compare classification accuracy to logistic regression, and code was made publicly available at https://github.com/vijayramakrishnan/chemsenses-CRS-olfactory-prediction.git. All classifiers were fit using the caret R package (Kuhn 2008), which provides a unified framework for tuning and evaluating classification accuracy. When including a categorical predictor in a model, K-1 dummy variables were used, where K is the total number of categories. Random forests (RF) was chosen as it allows for nonlinear and interaction effects, and can handle a large number of potential predictors (Boulesteix et al. 2012). Although the original random forest algorithm is biased towards favoring features that have more possible cut-points (e.g. categorical variables with more categories), importantly, we used a special type of random forests for unbiased feature selection: an ensemble of “conditional inference trees” with permutation-based variable importance scores (Hothorn et al. 2006; Strobl et al. 2007; Strobl et al. 2009). Similar to traditional logistic regression, least absolute shrinkage and selection operator (LASSO) assumes only linear and additive effects (without interaction) where regression coefficients are shrunk toward zero to perform variable selection (Tibshirani 1996; Friedman et al. 2010). In this model, unimportant variables are given coefficients equal to zero, effectively removing them from the model. Along these lines, multivariate adaptive regression splines (MARS) was also applied in a similar fashion to traditional logistic regression, but it uses stepwise methods for variable selection and allows for nonlinear and interaction effects (Friedman 1991). Lastly, the Support Vector Machine (SVM) approach was applied with a radial basis kernel, which allows for nonlinear and interaction effects (Cristianini and Shawe-Taylor 2000). 
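The K-1 dummy coding mentioned above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the authors' implementation: the helper name `dummy_code` is hypothetical, and taking the alphabetically first category as the reference level is an assumption made here for simplicity. The category labels are insurance types listed in Table 1.

```python
def dummy_code(values, reference=None):
    """Encode a categorical variable with K categories as K-1 indicator columns.

    Returns the retained category names and one row of 0/1 indicators per value.
    The reference category is represented by a row of all zeros.
    """
    categories = sorted(set(values))
    if reference is None:
        reference = categories[0]  # assumption: first sorted category is the reference
    kept = [c for c in categories if c != reference]
    rows = [[1 if v == c else 0 for c in kept] for v in values]
    return kept, rows


# Three categories (K = 3) yield two indicator columns (K - 1 = 2).
columns, encoded = dummy_code(["Medicare", "Private", "Medicaid", "Medicare"])
```

Dropping one category avoids perfect collinearity with the intercept in regression-type models; the dropped category becomes the baseline against which the other coefficients are interpreted.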
Although many machine learning methods exist, these four methods were selected since RF and SVM with a Gaussian/radial basis kernel have been shown to perform well across a wide variety of empirical applications, while LASSO and MARS were chosen as simpler, easier to interpret methods (Hastie et al. 2009b; Fernández-Delgado et al. 2014; Lötsch et al. 2019). Given that the primary goal is to compare how well different models classify a binary olfactory dysfunction outcome, unsupervised machine learning methods (which do not use an outcome) were not considered appropriate for this study.
Repeated 10-fold cross-validation (repeated 5 times (Kim 2009)) was used to tune and internally validate the classification accuracy of each model. The following measures of accuracy were considered: area under the receiver operating characteristic curve (AUC), sensitivity (proportion of subjects with olfactory dysfunction that were correctly classified), and specificity (proportion of subjects without olfactory dysfunction that were correctly classified). To deal with the fact that the two classes are unbalanced (dysosmia vs. normosmia), under-sampling was used within the cross-validation procedure, as implemented in the caret R package (He and Garcia 2009; R Core Team 2019). Over-sampling was also considered, but produced nearly equal classification accuracy in terms of AUC (results not shown). The method of “surrogate splits” (Strobl et al. 2009) was used to handle missing values in RF. The missForest R package (Stekhoven and Bühlmann 2012) was used to impute missing values for all other models, while excluding the outcome in order to prevent over-fitting. Cross-validated AUC was compared between models using a corrected resampled t-test (Nadeau and Bengio 2003), which corrects for the overlap that occurs during the resampling process, and the Benjamini-Hochberg method was used to adjust for multiple testing.
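To make the three accuracy measures defined above concrete, the following minimal Python sketch computes sensitivity, specificity, and AUC for a binary outcome. It is an illustration only and not the caret-based code used in the study; the function names, the coding of labels (1 = olfactory dysfunction, 0 = normosmia), and the example values are assumptions. AUC is computed here by its rank interpretation: the probability that a randomly chosen subject with dysfunction receives a higher predicted score than a randomly chosen normosmic subject, with ties counted as one half.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for labels coded 1 = dysfunction, 0 = normosmia."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison of positive and negative scores."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical labels and predicted scores, for illustration only.
labels = [1, 1, 0, 0, 0]
scores = [0.9, 0.4, 0.5, 0.3, 0.2]
predictions = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 cut-off
sens, spec = sensitivity_specificity(labels, predictions)
area = auc(labels, scores)
```

Sensitivity and specificity depend on the chosen cut-off, whereas AUC summarizes discrimination across all cut-offs, which is one reason two models with similar AUC can differ markedly in sensitivity.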
To identify the important predictor variables included in each classification algorithm, permutation-based variable importance scores were used to rank variables from most to least important in RF, while the absolute value of the standardized regression coefficient was used for LASSO. The IML R package (Molnar et al. 2018) was used to calculate variable importance scores for SVM using a permutation method. All variable importance scores were scaled from 0 to 100, where the most important variable for the model has a score of 100. Of note, for a categorical variable with more than two categories, RF and LASSO calculate separate variable importance scores for each category, whereas the approach used for SVM only calculates an overall score.
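The permutation idea behind these importance scores, and the rescaling to a 0 to 100 range, can be sketched in Python as follows. This is a simplified illustration under stated assumptions rather than the method implemented in the R packages cited above: importance is taken here as the drop in classification accuracy after shuffling one predictor, `predict` is a hypothetical stand-in for any fitted classifier, and min-max rescaling is assumed because the text does not specify the exact rescaling formula.

```python
import random


def permutation_importance(predict, X, y, seed=0):
    """Drop in accuracy when each predictor column is shuffled in turn.

    predict: hypothetical function mapping one row (list) to a class label.
    X: list of rows (lists of predictor values); y: list of true labels.
    """
    rng = random.Random(seed)

    def accuracy(rows):
        return sum(1 for r, t in zip(rows, y) if predict(r) == t) / len(y)

    baseline = accuracy(X)
    drops = []
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        rng.shuffle(column)  # break the link between predictor j and the outcome
        permuted = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
        drops.append(baseline - accuracy(permuted))
    return drops


def scale_importance(raw):
    """Min-max rescale a dict of raw scores so the largest becomes 100 (assumed scheme)."""
    lo, hi = min(raw.values()), max(raw.values())
    if hi == lo:
        return {name: 100.0 for name in raw}
    return {name: (value - lo) / (hi - lo) * 100 for name, value in raw.items()}
```

A predictor the model ignores produces no change in accuracy when shuffled, so its raw score is zero; this is the sense in which low-ranked variables carry minimal weight in a classification.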
Results
Final cohort characteristics
A total of 611 study participants met all inclusion criteria and were prospectively enrolled between April 2011 and July 2015. Demographics and comorbid disease characteristics, examined by olfactory classification, are described in Table 1. The overall median age for the final cohort was 51 years (IQR = 38.3, 61.7) with a slight female preponderance (53% female). Approximately one-third of the cohort exhibited nasal polyps (35%).
Study participants were stratified by olfactory testing category. Statistically significant differences (q<0.05) were observed between the normosmic and hyposmic cohorts in terms of objective disease measures (CT and endoscopy scores), age, sex, previous surgery, socioeconomic status, corticosteroid dependence, asthma/COPD, and AERD. Sinonasal conditions including nasal polyps, allergic fungal rhinosinusitis, and deviated septum were also significant factors.
Smell loss vs. normosmia classification accuracy
Figure 1 illustrates a comparison of the different models based on AUC, sensitivity, and specificity. A logistic regression model for classification of baseline olfactory dysfunction demonstrated acceptable accuracy in the context of CRS (AUC 0.707 ± 0.07). Three of the four machine learning models performed favorably compared with the traditional logistic regression model. The SVM approach was the most accurate, with a relatively good classification accuracy (AUC 0.754 ± 0.05) and much higher sensitivity than the other methods. LASSO and Random Forest also outperformed logistic regression, whereas MARS had the lowest AUC. The corrected resampled t-test with Benjamini-Hochberg adjustment (Fig. 1c) indicated that SVM had a significantly higher mean AUC than MARS (Q = 0.02), but was not significantly different from Random Forest, LASSO, or logistic regression. LASSO had a significantly higher mean AUC than MARS (Q = 0.012) and logistic regression (Q = 0.012).
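For readers unfamiliar with the corrected resampled t-test behind these comparisons, the statistic of Nadeau and Bengio (2003) can be sketched in Python as follows. This is a minimal illustration, not the study's code: the function name and the example AUC differences are hypothetical. With 10-fold cross-validation repeated 5 times there would be 50 paired differences and a test-to-training size ratio of 1/9; the example below uses only five differences to stay short.

```python
import math
from statistics import mean, variance


def corrected_resampled_t(diffs, test_train_ratio):
    """Corrected resampled t statistic for paired per-resample differences.

    diffs: difference in AUC between two models on each resample.
    test_train_ratio: n_test / n_train within each resample (1/9 for 10-fold CV).
    Returns (t statistic, degrees of freedom).
    """
    n = len(diffs)
    d_bar = mean(diffs)
    s2 = variance(diffs)  # sample variance (n - 1 denominator)
    # The extra test_train_ratio term inflates the variance to account for
    # overlapping training sets; the uncorrected test would use 1/n alone.
    t_stat = d_bar / math.sqrt((1.0 / n + test_train_ratio) * s2)
    return t_stat, n - 1


# Hypothetical AUC differences between two models, for illustration only.
t_stat, dof = corrected_resampled_t([0.02, 0.04, 0.03, 0.05, 0.01], 1.0 / 9.0)
```

The statistic is referred to a t distribution with the returned degrees of freedom to obtain a P-value. Because the correction enlarges the variance term, the corrected statistic is always smaller in magnitude than the naive paired t statistic on the same differences, making the comparison more conservative.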
Machine learning predictions: top variables
ML algorithms are adept at recognizing patterns within complex data, but the lack of an a priori framework results in the clinical challenge of understanding why the algorithm made its decisions. The ML models assigned different relative importance to the predictor variables. SVM, the model with the largest AUC, considered many variables to be influential, with 20 variables having variable importance scores > 25 (Fig. 2). In Random Forest and LASSO models, fewer variables were important and the most weight was placed on the presence of nasal polyps and the nasal endoscopy severity score (Fig. 3 and Supplementary Fig. 1).
Top predictors across the approaches include corticosteroid dependence, prior surgery, nasal polyps, AERD, age, sex, and smoking history. Inflammatory disease severity, as measured by CT and nasal endoscopy scores, was also a strong predictor of CRS-OD.
Socioeconomic factors such as insurance status, household income, and race are among the top predictors across the three models displayed. Age was a less important variable in all models. Comorbid medical conditions including asthma, diabetes, septal deviation, alcohol use, and sleep apnea are also of moderate importance. Although nasal polyps, asthma, and AERD are significantly associated with CRS-OD, the presence of allergic rhinitis by history or by skin testing appears to carry relatively minimal weight in the CRS-OD classifications.
Discussion
CRS is a heterogeneous process in terms of presentation, subtypes, natural history, and response to treatments (Orlandi et al. 2016). Current data from a multi-institutional CRS cohort studying olfactory loss in CRS with a sensitive olfactory test (Sniffin’ Sticks) demonstrate that nearly 70% of the CRS population has significant olfactory loss, with 20% exhibiting complete anosmia (Schlosser et al. 2020). The pathophysiology of OD in CRS could result from both sensorineural and conductive effects of local tissue inflammation, on top of the myriad factors that can affect olfaction in the general population (Hummel et al. 2016; Rombaux et al. 2016). In previous studies, factors associated with worse baseline olfactory function in CRS patients included nasal polyposis, asthma, age, smoking, and eosinophilia (Litvack et al. 2008). Other work has shown a CRS-OD response to treatment in ~40% of patients, with factors such as olfactory dysfunction severity, nasal polyps, female gender, high socioeconomic status, and nonsmoking being associated with better quality-of-life results. In multivariate regression, only nasal polyps and degree of baseline olfactory dysfunction maintained statistical significance, highlighting the need for robust sample size and improved analytical methods to glean useful information from human study (Katotomichelakis et al. 2014).
Here, we utilized ML-based data science approaches in addition to logistic regression in predictive modeling of CRS-OD classification. Top predictors across the approaches contain many variables associated with olfactory function in previous studies, including nasal polyps, prior surgery, AERD, age, sex, corticosteroid dependence, and smoking history. Inflammatory disease severity, as measured by CT and nasal endoscopy scores, also appears to be an important predictor of CRS-OD. Socioeconomic factors such as insurance status, household income, and race were among the important variables across the three top-performing models. Since age was less important in all models, insurance status appears to be an independent predictor (i.e., Medicare insurance is not simply a proxy for advanced age). Comorbid medical conditions including asthma, diabetes, septal deviation, alcohol use, and sleep apnea are also of moderate importance. Notably, these factors can be influenced by medical care and lifestyle modification, suggesting that other approaches to improve general health and well-being may be considered as part of a holistic approach to CRS-OD management. Although nasal polyps, asthma, and AERD are significantly associated with CRS-OD here and in prior literature, the presence of allergic rhinitis by history or by skin testing appeared to carry relatively minimal weight in the CRS-OD classifications. This observation suggests that evaluation and treatment for allergic rhinitis may not necessarily be expected to benefit olfactory function in CRS.
We observed favorable prediction accuracy with most ML approaches compared to traditional regression modeling. ML is defined as the use of algorithms to break down data, learn from it, and then make a determination or prediction about some aspect. Whereas traditional statistical methods focus on modeling how system variables relate to one another and what statistical inference (e.g. significance in P-values) can be made, the goal of ML is not interpretation of individual components but prediction of future outcomes. In doing so, ML provides a novel approach to uncover previously unrecognized patterns among CRS-OD patients and thereby offers numerous advantages over regression analyses, which have been traditionally employed in studies of CRS disease severity and outcomes (Michie et al. 1994). As more complex and multidimensional data are added in “deep phenotyping” approaches to CRS, ML approaches may be well-suited to include the expansive amounts of added data (e.g., multiomics). Further external validation using appropriate cohorts will certainly be of value.
In this study, four different supervised ML algorithms were used to map features of interest to the outcome or “label” of olfactory loss, and three outperformed traditional regression. In particular, the SVM model had improved accuracy that was driven largely by improved sensitivity. The classification accuracy may be attributed to the ability of these ML models to handle high-dimensional data in which more features exist than observations, and to include nonlinear and interaction effects. In contrast, traditional statistical modeling with a large number of potential features requires some form of dimension reduction or variable selection, and exploration of interactions and nonlinear effects is challenging with more than a small number of predictors. As a result, we believe that ML and artificial intelligence data analytics are well-suited to prime research and eventually for the application of precision medicine in olfactory disorders, given the idiosyncrasies, nuances, and numerous possible predictor variables.
There are some limitations of the current study that should be addressed with regard to ML. Besides the four ML models considered in this study, many other ML methods exist and could be evaluated in future studies. Achieving adequate enrollment by sex and race is a common sampling issue in healthcare ML (Rajkomar et al. 2019; Cutillo et al. 2020). Details of this cohort have been extensively published and include even distributions by age and sex. Racial/ethnic minorities and low socioeconomic status groups are underrepresented, perhaps as a function of geography and referral patterns of the enrollment sites, but also as a result of the general lack of accessible and affordable healthcare and racial disparities inherent within the US healthcare system (Soler et al. 2012; Agency for Healthcare Research and Quality 2018). A related concern is that models created from our cohort data may “overfit” when new or unseen clinical data are applied. Certainly, these approaches should be further validated in independent studies, but they nonetheless demonstrate the utility of data science approaches to uncover patterns within expansive human data in a noisy disease process. To incorporate a large enough patient dataset across three sites over several years of prospective data collection, psychophysical testing included classification of smell loss by either SIT or bSIT. Both tests have been extensively validated in the literature; although the SIT may be more sensitive, our prior analysis in a CRS cohort has demonstrated excellent diagnostic accuracy of the bSIT (AUC = 0.873; 95% CI: 0.819, 0.927) (El Rassi et al. 2016). Additionally, there may be important data inputs (e.g., molecular endotyping data) that are not yet clinically available. Serum markers, such as peripheral eosinophilia and serum IgE, have been proposed for CRS in the past but have generally been disappointing and do not appear to correlate with tissue levels (Settipane 1996; Gitomer et al. 2016).
Some of the clinical data input fields may associate phenotypes with likely molecular endotypes, but such overlap is not universal (Akdis et al. 2013). Future goals include testing whether additional accuracy is gained with the addition of molecular-based biomarkers to the input data.
Conclusion
Here we demonstrate the utility of novel predictive analytics approaches in the study of clinical olfactory disorders. Founded in data science principles rather than traditional statistics, ML models performed favorably compared to logistic regression. Consistently observed predictors across models include a weighted interest in socioeconomic status, and other potentially modifiable conditions such as asthma/AERD. Further studies will be important to reproduce and validate these findings, and might explore why certain novel predictor variables appear to be important. Such approaches may have value for both clinical counseling and guidance for future basic science and translational research.
Supplementary Material
Acknowledgments
The authors would like to thank Drs. Miranda Kroehl, PhD, John Rice, PhD, and Debashis Ghosh, PhD, for biostatistical expertise in discussions regarding validation of machine learning and prediction models. We would also like to thank Drs. Peter Hwang, MD, Rod Schlosser, MD, and Luke Rudmik, MD, for their contributions in original study enrollment and ongoing collaboration.
Funding
This study was supported in part by a grant from the Ludeman Family Center for Women’s Health Research at the University of Colorado Anschutz Medical Campus (V.R.R.). V.R.R., J.C.M., Z.M.S., T.L.S., and S.S.S. are supported by grants for this investigation from the National Institute on Deafness and Other Communication Disorders (NIDCD) and the National Institute of Allergy and Infectious Diseases (NIAID) of the National Institutes of Health, Bethesda, MD [R01 DC005805 (T.L.S. and Z.M.S.), K23 DC014747 (V.R.R.), 1P01AI145818-01 (S.S.S.)]. Public clinical trial registration (www.clinicaltrials.gov) ID# NCT01332136. Contents are the authors’ sole responsibility and do not necessarily represent official NIH views.
Conflict of interest
None related to this study.
References
- Agency for Healthcare Research and Quality. 2018. National healthcare quality and disparities report. https://www.ahrq.gov/research/findings/nhqrdr/nhqdr18/index.html. Content last reviewed April 2020.
- Akdis CA, Bachert C, Cingi C, Dykewicz MS, Hellings PW, Naclerio RM, Schleimer RP, Ledford D. 2013. Endotypes and phenotypes of chronic rhinosinusitis: a PRACTALL document of the European Academy of Allergy and Clinical Immunology and the American Academy of Allergy, Asthma & Immunology. J Allergy Clin Immunol. 131(6):1479–1490.
- Benjamini Y, Hochberg Y. 1995. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J Royal Stat Soc B (Methodol). 57(1):289–300.
- Boulesteix AL, Janitza S, Kruppa J, König IR. 2012. Overview of random forest methodology and practical guidance with emphasis on computational biology and bioinformatics. Data Min Knowl Disc. 2(6):493–507.
- Bzdok D, Altman N, Krzywinski M. 2018. Statistics versus machine learning. Nat Methods. 15(4):233–234.
- Cristianini N, Shawe-Taylor J. 2000. An introduction to support vector machines and other kernel-based learning methods. Cambridge (UK): Cambridge University Press. doi: 10.1017/CBO9780511801389.
- Cutillo CM, Sharma KR, Foschini L, Kundu S, Mackintosh M, Mandl KD; MI in Healthcare Workshop Working Group. 2020. Machine intelligence in healthcare-perspectives on trustworthiness, explainability, usability, and transparency. NPJ Digit Med. 3:47.
- DeConde AS, Mace JC, Alt JA, Rudmik L, Soler ZM, Smith TL. 2015. Longitudinal improvement and stability of the SNOT-22 survey in the evaluation of surgical management for chronic rhinosinusitis. Int Forum Allergy Rhinol. 5(3):233–239.
- DeConde AS, Mace JC, Alt JA, Schlosser RJ, Smith TL, Soler ZM. 2014. Comparative effectiveness of medical and surgical therapy on olfaction in chronic rhinosinusitis: a prospective, multi-institutional study. Int Forum Allergy Rhinol. 4(9):725–733.
- Doty RL. 1995. The smell identification test administration manual. 3rd ed. Haddon Heights (NJ): Sensonics, Inc.
- Doty RL. 2001. The brief smell identification test administration manual. Haddon Heights (NJ): Sensonics, Inc.
- El Rassi E, Mace JC, Steele TO, Alt JA, Soler ZM, Fu R, Smith TL. 2016. Sensitivity analysis and diagnostic accuracy of the Brief Smell Identification Test in patients with chronic rhinosinusitis. Int Forum Allergy Rhinol. 6(3):287–292.
- Fernández-Delgado M, Cernadas E, Barro S, Amorim D. 2014. Do we need hundreds of classifiers to solve real world classification problems? J Mach Learn Res. 15(1):3133–3181.
- Fokkens WJ, Lund VJ, Mullol J, Bachert C, Alobid I, Baroody F, Cohen N, Cervin A, Douglas R, Gevaert P, et al. 2012. European position paper on rhinosinusitis and nasal polyps 2012. Rhinology. 23:1–298.
- Friedman JH. 1991. Multivariate adaptive regression splines. Ann Stat. 19:1–67.
- Friedman J, Hastie T, Tibshirani R. 2010. Regularization paths for generalized linear models via coordinate descent. J Stat Softw. 33(1):1–22.
- Gitomer SA, Fountain CR, Kingdom TT, Getz AE, Sillau SH, Katial RK, Ramakrishnan VR. 2016. Clinical examination of tissue eosinophilia in patients with chronic rhinosinusitis and nasal polyposis. Otolaryngol Head Neck Surg. 155(1):173–178.
- Hastie T, Tibshirani R, Friedman J. 2009a. The wrong and right way to do cross-validation. In: Elements of statistical learning, data mining, inference, prediction. New York (NY): Springer. p. 245–247.
- Hastie T, Tibshirani R, Friedman J. 2009b. The elements of statistical learning: data mining, inference, and prediction. New York (NY): Springer Series in Statistics.
- He H, Garcia EA. 2009. Learning from imbalanced data. IEEE Transact Knowl Data Eng. 21(9):1263–1284.
- Hopkins C, Gillett S, Slack R, Lund VJ, Browne JP. 2009. Psychometric validity of the 22-item Sinonasal Outcome Test. Clin Otolaryngol. 34(5):447–454.
- Hopkins C, Hettige R, Soni-Jaiswal A, Lakhani R, Carrie S, Cervin A, Douglas R, Fokkens WJ, Harvey R, Hellings PW, et al. 2018. CHronic Rhinosinusitis Outcome MEasures (CHROME), developing a core outcome set for trials of interventions in chronic rhinosinusitis. Rhinology. 56(1):22–32.
- Hothorn T, Hornik K, Zeileis A. 2006. Unbiased recursive partitioning: a conditional inference framework. J Computat Graph Stat. 15(3):651–674.
- Hummel T, Whitcroft KL, Andrews P, Altundag A, Cinghi C, Costanzo RM, Damm M, Frasnelli J, Gudziol H, Gupta N, et al. 2016. Position paper on olfactory dysfunction. Rhinology. 56(1):1–30.
- Katotomichelakis M, Simopoulos E, Tripsianis G, Balatsouras D, Danielides G, Kourousis C, Livaditis M, Danielides V. 2014. Predictors of quality of life outcomes in chronic rhinosinusitis after sinus surgery. Eur Arch Otorhinolaryngol. 271(4):733–741.
- Kim JH. 2009. Estimating classification error rate: repeated cross-validation, repeated hold-out and bootstrap. Computat Stat Data Anal. 53(11):3735–3745.
- Kohli P, Naik AN, Harruff EE, Nguyen SA, Schlosser RJ, Soler ZM. 2017. The prevalence of olfactory dysfunction in chronic rhinosinusitis. Laryngoscope. 127(2):309–320.
- Kuhn M. 2008. Building predictive models in R using the caret package. J Stat Softw. 28(5):1–26.
- Litvack JR, Fong K, Mace J, James KE, Smith TL. 2008. Predictors of olfactory dysfunction in patients with chronic rhinosinusitis. Laryngoscope. 118(12):2225–2230.
- Lötsch J, Kringel D, Hummel T. 2019. Machine learning in human olfactory research. Chem Senses. 44(1):11–22.
- Lund VJ, Kennedy DW. 1995. Quantification for staging sinusitis. The Staging and Therapy Group. Ann Otol Rhinol Laryngol Suppl. 167:17–21.
- Lund VJ, Mackay IS. 1992. Staging in rhinosinusitus. Rhinology. 31(4):183–184.
- Michie D, Spiegelhalter D, Taylor C. 1994. Machine learning, neural, and statistical classification. New York (NY): Ellis Horwood.
- Molnar C, Casalicchio G, Bischl B. 2018. iml: an R package for interpretable machine learning. J Open Source Softw. 3(26):786.
- Nadeau C, Bengio Y. 2003. Inference for the generalization error. Mach Learn. 52:239–281.
- Orlandi RR, Kingdom TT, Hwang PH, Smith TL, Alt JA, Baroody FM, Batra PS, Bernal-Sprekelsen M, Bhattacharyya N, Chandra RK, et al. 2016. International consensus statement on allergy and rhinology: rhinosinusitis. Int Forum Allergy Rhinol. 6 Suppl 1:S22–209.
- Piccirillo JF, Merritt MG Jr, Richards ML. 2002. Psychometric and clinimetric validity of the 20-Item Sino-Nasal Outcome Test (SNOT-20). Otolaryngol Head Neck Surg. 126(1):41–47.
- R Core Team. 2019. R: a language and environment for statistical computing. Vienna (Austria): R Foundation for Statistical Computing.
- Rajkomar A, Dean J, Kohane I. 2019. Machine learning in medicine. N Engl J Med. 380(14):1347–1358.
- Ramakrishnan VR, Mace JC, Soler ZM, Smith TL. 2017. Examination of high-antibiotic users in a multi-institutional cohort of chronic rhinosinusitis patients. Int Forum Allergy Rhinol. 7(4):343–351.
- Rombaux P, Huart C, Levie P, Cingi C, Hummel T. 2016. Olfaction in chronic rhinosinusitis. Curr Allergy Asthma Rep. 16(5):41.
- Rosenfeld RM, Piccirillo JF, Chandrasekhar SS, Brook I, Ashok Kumar K, Kramper M, Orlandi RR, Palmer JN, Patel ZM, Peters A, et al. 2015. Clinical practice guideline (update): adult sinusitis. Otolaryngol Head Neck Surg. 152(2 Suppl):S1–S39.
- Schlosser RJ, Smith TL, Mace JC, Alt J, Beswick DM, Mattos JL, Payne S, Ramakrishnan VR, Soler ZM. 2020. Factors driving olfactory loss in patients with chronic rhinosinusitis: a case control study. Int Forum Allergy Rhinol. 10(1):7–14.
- Senior BA, Glaze C, Benninger MS. 2001. Use of the Rhinosinusitis Disability Index (RSDI) in rhinologic disease. Am J Rhinol. 15(1):15–20.
- Settipane GA. 1996. Nasal polyps and immunoglobulin E (IgE). Allergy Asthma Proc. 17(5):269–273.
- Simopoulos E, Katotomichelakis M, Gouveris H, Tripsianis G, Livaditis M, Danielides V. 2012. Olfaction-associated quality of life in chronic rhinosinusitis: adaptation and validation of an olfaction-specific questionnaire. Laryngoscope. 122(7):1450–1454.
- Smith TL, Kern R, Palmer JN, Schlosser R, Chandra RK, Chiu AG, Conley D, Mace JC, Fu RF, Stankiewicz J. 2013. Medical therapy vs surgery for chronic rhinosinusitis: a prospective, multi-institutional study with 1-year follow-up. Int Forum Allergy Rhinol. 3(1):4–9.
- Soler ZM, Kohli P, Storck KA, Schlosser RJ. 2016a. Olfactory impairment in chronic rhinosinusitis using threshold, discrimination, and identification scores. Chem Senses. 41(9):713–719.
- Soler ZM, Mace JC, Litvack JR, Smith TL. 2012. Chronic rhinosinusitis, race, and ethnicity. Am J Rhinol Allergy. 26(2):110–116.
- Soler ZM, Smith TL, Alt JA, Ramakrishnan VR, Mace JC, Schlosser RJ. 2016b. Olfactory-specific quality of life outcomes after endoscopic sinus surgery. Int Forum Allergy Rhinol. 6(4):407–413.
- Stekhoven DJ, Bühlmann P. 2012. MissForest–non-parametric missing value imputation for mixed-type data. Bioinformatics. 28(1):112–118.
- Strobl C, Boulesteix AL, Zeileis A, Hothorn T. 2007. Bias in random forest variable importance measures: illustrations, sources and a solution. BMC Bioinformatics. 8:25.
- Strobl C, Malley J, Tutz G. 2009. An introduction to recursive partitioning: rationale, application, and characteristics of classification and regression trees, bagging, and random forests. Psychol Methods. 14(4):323–348.
- Tibshirani R. 1996. Regression shrinkage and selection via the lasso. J Royal Stat Soc B (Methodol). 58(1):267–288.