Abstract
Objective
We sought to identify an abbreviated test of impaired olfaction suitable for use in busy clinical environments in prodromal (isolated REM sleep behavior disorder [iRBD]) and manifest Parkinson disease (PD).
Methods
Eight hundred ninety individuals with PD and 313 controls in the Discovery cohort study underwent Sniffin’ Stick odor identification assessment. Random forests were initially trained to distinguish individuals with poor (functional anosmia/hyposmia) and good (normosmia/super-smeller) smell ability using all 16 Sniffin’ Sticks. Models were retrained using the top 3 sticks ranked by order of predictor importance. One randomly selected 3-stick model was tested in a second independent PD dataset (n = 452) and in 2 iRBD datasets (Discovery n = 241, Marburg n = 37) before being compared to previously described abbreviated Sniffin’ Stick combinations.
Results
In differentiating poor from good smell ability, the overall area under the curve (AUC) value associated with the top 3 sticks (anise/licorice/banana) was 0.95 in the Development dataset (sensitivity 90%, specificity 92%, positive predictive value 92%, negative predictive value 90%). Internal and external validation confirmed AUCs ≥0.90. The combination of poor smell as determined by the 3-stick model and an RBD screening questionnaire score of ≥5 separated those with iRBD from controls with a sensitivity, specificity, positive predictive value, and negative predictive value of 65%, 100%, 100%, and 30%, respectively.
Conclusions
Our 3-Sniffin’-Stick model holds potential utility as a brief screening test in the stratification of individuals with PD and iRBD according to olfactory dysfunction.
Classification of Evidence
This study provides Class III evidence that a 3-Sniffin’-Stick model distinguishes individuals with poor and good smell ability and can be used to screen for individuals with iRBD.
Olfactory dysfunction is evident in up to 90% of individuals with early Parkinson disease (PD)1,2 and demonstrates concordance with dopaminergic deficit.3,4 PD stratification according to baseline poor sense of smell predicts individuals at greater risk of accelerated cognitive decline and dementia.5–10
The tendency of hyposmia to predate motoric PD by up to 20 years has led to its proposition as a prodromal marker.11 Among nonmotor markers, abnormal olfaction (hazard ratio 2.62) most strongly predicts disease conversion in isolated REM sleep behavior disorder (iRBD)12; an additional age cutoff of ≥55 years is suggested to further improve selection for future neuroprotective trials.13
Rates of subjectively reported and objectively tested hyposmia can differ by upward of 20%.14 The Sniffin’ Sticks test15 and the University of Pennsylvania Smell Identification Test16 are 2 popular tests of olfaction. The former benefits from relative cost-effectiveness due to multiuser presentation of felt-tip style pens; the latter involves single-use scratch cards. The Sniffin’ Sticks test has been studied extensively and validated across different populations.17–20
The existence of a PD-specific pattern of smell loss remains contentious,21,22 yet the need for an abbreviated smell test is well acknowledged.23–28 Here, our aim was to derive an abbreviated Sniffin' Stick test to identify individuals with a poor sense of smell, rather than using Sniffin’ Stick answers to distinguish disease groups as in previous studies.21,23,24,26,28–32 Our test is validated using 1 independent PD and 2 independent iRBD datasets, and its combination with the REM Sleep Behavior Disorder Screening Questionnaire (RBDSQ)33 in screening for iRBD is explored.
Methods
Primary Research Questions
Can an abbreviated Sniffin’ Stick test identify individuals with a poor sense of smell (Class III evidence)? Can an abbreviated Sniffin’ Stick test be used to screen for individuals with iRBD (Class III evidence)?
Participants
Development Dataset
The Development dataset comprised data collected from 890 participants with idiopathic PD and 313 controls (age matched, without a personal or family history of PD or a related condition; the spouses and friends of participants with PD) who were enrolled in the longitudinal Oxford Discovery Cohort Study as previously described.2,34
Mutually Exclusive, Independent Validation Datasets
Three mutually exclusive, independent validation datasets comprised (1) 241 participants with iRBD enrolled in the Oxford Discovery Cohort Study, (2) 452 participants with PD in the Tracking cohort study, and (3) 37 participants with iRBD recruited by the Department of Neurology, University of Marburg, Germany.
Longitudinal Control Dataset
Data from 40 participants collected longitudinally were treated as independent of baseline data contributed by the same participants to the development dataset.
Both the Oxford Discovery and Tracking PD cohorts are based within the United Kingdom and recruited nonoverlapping participants with PD who fulfilled the United Kingdom PD Brain Bank criteria for probable PD within 3.5 years of diagnosis.35,36 The 2 cohorts share many similarities and have been used to validate each other's findings.36,37 Inclusion of individuals with PD was contingent on trained researchers attributing a probability of PD of at least 90% at the latest clinic visit. All participants with iRBD had had a polysomnogram confirming their clinically suspected diagnosis of iRBD, in line with International Classification of Sleep Disorders criteria.38
Standard Protocol Approvals, Registrations, and Patient Consents
Studies were prospectively approved by the local research ethics committee, and all participants provided written informed consent before any study-related procedures.
Assessments
All participants were seen face to face in clinic at baseline. At each in-person clinic visit, smell was assessed with the 16 Sniffin’ Sticks odor identification test (Burghart Instruments, Wedel, Germany) in which 16 felt-tip pens, each containing an odor, were presented in turn by being held 2 cm centered in front of both nostrils, with participants choosing from 1 of 4 options provided.15,39 Sniffin’ Sticks were stored at room temperature out of direct sunlight, in accordance with the manufacturer’s instructions, and their replacement was directed by the best-before date displayed on each stick.
Definitions
Normative data from 9,139 healthy individuals were used to define age- and sex-specific percentiles for the total number of Sniffin’ Sticks correctly identified.39 With the application of previously described classification criteria, functional anosmia, a somewhat discordant entity, was the label attached to a score of ≤8 sticks and represented the limit that 90% of patients with anosmia would correctly identify.40 Individuals not already classified as having functional anosmia but with total Sniffin’ Stick scores below the 10th percentile for their age and sex were classified as having hyposmia. Similarly, those with scores above the 90th percentile were classified as having supersmell, and the remaining individuals were classified as normosmic.39
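The classification rules above can be sketched in a few lines of Python. This is an illustrative sketch only: the function names and the `p10_cutoff`/`p90_cutoff` arguments are hypothetical, and the age- and sex-specific percentile scores must be looked up in the normative tables, which are not reproduced here.

```python
def classify_smell(score, p10_cutoff, p90_cutoff):
    """Classify a 16-item Sniffin' Sticks identification score.

    p10_cutoff and p90_cutoff are hypothetical inputs: the scores at the
    10th and 90th percentile for the participant's age and sex.
    """
    if score <= 8:
        # The absolute cutoff is applied first, regardless of percentile.
        return "functional anosmia"
    if score < p10_cutoff:
        return "hyposmia"
    if score > p90_cutoff:
        return "supersmell"
    return "normosmia"


def is_poor_smell(category):
    """Poor smell groups functional anosmia with hyposmia."""
    return category in ("functional anosmia", "hyposmia")
```

For example, with invented cutoffs of 11 and 15, a score of 10 would be labeled hyposmia and a score of 12 normosmia.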
Analyses
Analyses were performed with MATLAB software (R2018a; MathWorks, Natick, MA). Data analyzed from the Oxford Discovery Study were collected between September 27, 2010, and May 28, 2019. Only complete sets of data were analyzed; incomplete data pertaining to 7 of 320 controls (2%), 14 of 255 individuals with RBD (5%), and 39 of 929 individuals with PD (4%) were excluded from analysis (figure 1).
Figure 1. Flowcharts Demonstrating the Data Used to Train Poor Smell/Good Smell Models.
PD = Parkinson disease.
Developing the PD Poor Smell Model Using the Development Dataset
With the aim of developing a model capable of identifying individuals with a poor sense of smell, independent of disease etiology, data from individuals at extremes were used (see Objective 6 for the effect of different data combinations). A PD poor smell group was formed of participants with PD and hyposmia or functional anosmia (n = 721), and a control good smell group was formed of controls with normosmia or supersmell (n = 267). Only baseline data were used to ensure that each participant contributed a single set of data, thus ensuring the equal importance of each participant during modeling.
There was a discrepancy between the number of individuals in the PD poor smell group and the control good smell group, resulting in an imbalanced dataset and the potential for the majority class to be given higher priority than the minority class during model training.41 Before training machine learning algorithms (MLAs), the data were thus balanced by random undersampling of the majority class.42 Specifically, a number of participants equal to that in the control good smell group were randomly selected from the PD poor smell group. Once a balanced group with a 1:1 ratio of PD poor smell to control good smell had been formed, leave-one-participant-out cross-validation was performed to assess the generalizability of the trained model to previously unseen data. Random forests, a commonly used MLA,43 were trained using all of the data in the balanced group with the exception of the data from 1 individual, on whom the accuracy of the trained model was tested. The process was repeated for each individual within the balanced group, resulting in a total of 534 predictions, that is, 1 prediction (PD poor smell or control good smell) for each participant when held out as validation data. The above process was repeated for 10 different randomly formed balanced groups, resulting in a total of 5,340 models, which were then used to calculate the area under the curve (AUC) values. Further methodologic details are available from Dryad (additional Methods, doi.org/10.5061/dryad.x3ffbg7gx).
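The balancing and cross-validation bookkeeping described above can be sketched as follows. This is not the authors' MATLAB code; it only reproduces the counting, with placeholder participant identifiers, a fixed random seed chosen here for reproducibility, and no model fitting.

```python
import random


def undersample(majority, minority, rng):
    """Randomly keep as many majority-class items as there are minority items."""
    return rng.sample(majority, len(minority)), list(minority)


def leave_one_out_splits(items):
    """Yield (training_items, held_out_item) pairs, one per item."""
    for i, held_out in enumerate(items):
        yield items[:i] + items[i + 1:], held_out


rng = random.Random(0)  # seed is an assumption, not taken from the study
pd_poor = [("PD", i) for i in range(721)]         # placeholder IDs
ctrl_good = [("control", i) for i in range(267)]  # placeholder IDs

n_models = 0
for _ in range(10):  # 10 randomly formed balanced groups
    kept_pd, kept_ctrl = undersample(pd_poor, ctrl_good, rng)
    balanced = kept_pd + kept_ctrl
    # One model would be trained per split and tested on the held-out participant.
    n_models += sum(1 for _ in leave_one_out_splits(balanced))

print(len(balanced), n_models)  # 534 5340
```

Each balanced group holds 2 × 267 = 534 participants, so 10 groups give 10 × 534 = 5,340 trained models.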
For improved clarity, we detail below our key research objectives and the methodologic approaches used to address each objective. Results are subsequently reported with the same numbering system, that is, Result 1 (Results section) refers to Objective 1 (Methods section).
Objective 1: To Determine the Relative Importance of Individual Sniffin’ Sticks in the Identification of Poor Smell in the Development Dataset
Using the individual answers to all 16 sticks, each trained random forest classifier comprised 500 trees. The predictor importance of each stick (as derived from the random forest algorithm using the Gini diversity index criteria for binary splitting) was averaged across all the trained models to rank the sticks in descending order of importance.
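For readers unfamiliar with the Gini diversity index, a minimal sketch of the standard definition is given below. It illustrates how the impurity reduction from a single binary split is measured; the exact way the authors' software aggregates such reductions into a per-stick importance value is not described here and may differ. The function names are hypothetical.

```python
def gini(labels):
    """Gini diversity index of a node: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def impurity_decrease(parent, left, right):
    """Reduction in Gini index when a parent node is split into two children."""
    n = len(parent)
    weighted_children = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(parent) - weighted_children


# A split that separates the two classes perfectly removes all impurity.
parent = ["poor", "poor", "good", "good"]
print(impurity_decrease(parent, ["poor", "poor"], ["good", "good"]))  # 0.5
```

A stick whose answers produce large impurity reductions across many trees would be expected to rank highly.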
Objective 2: To Assess the Classification Accuracy Associated With Different Numbers of Sniffin’ Sticks Used in the Development Dataset
Five thousand three hundred forty models were further trained across another 10 different randomly formed balanced groups for each incremental number of Sniffin’ Sticks, cumulated in order of descending MLA-identified predictor importance. AUC values were calculated using only the validation sample to evaluate overall model accuracy.
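As a reminder of what the AUC measures, the sketch below computes it directly as the probability that a randomly chosen poor smell participant receives a higher predicted score than a randomly chosen good smell participant, with ties counted as one half. The function name and the example scores are invented for illustration; this is not the routine used in the study.

```python
def auc(scores_pos, scores_neg):
    """Pairwise (Mann-Whitney) estimate of the area under the ROC curve."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Invented predicted probabilities of poor smell for held-out participants.
poor = [0.9, 0.8, 0.4]
good = [0.7, 0.3, 0.2]
print(auc(poor, good))  # 8 of 9 pairs are ordered correctly
```

The pairwise form is quadratic in sample size, which is acceptable for a sketch; rank-based implementations are preferable for large datasets.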
In creating an abbreviated smell test, it was necessary to reach a compromise between the number of sticks administered as part of the test and the associated AUC value. A minimal number of 3 sticks was selected to make up the abbreviated test on the basis that the improvement in AUC from the addition of 2 sticks to that of a single stick was >3 times the improvement in AUC associated with the cumulative addition of all other 13 sticks. A 3-stick model also permitted direct comparison with other previously described 3-stick abbreviated smell tests.
Objective 3: To Validate the Use of a 3-Sniffin’-Stick Model for the Detection of Poor Smell Using 3 Independent PD and iRBD Validation Datasets
One 3-stick PD poor smell/control good smell model, henceforth referred to as the 3-Sniffin’-Stick model, was randomly selected from the 5,340 models trained using the same top 3 sticks and the Development dataset. Providing internal validation, its accuracy in identifying poor smell/good smell was assessed with the use of baseline olfactory testing data from participants with iRBD in the Oxford Discovery cohort. The accuracy of the 3-Sniffin’-Stick model was furthermore evaluated externally in the independent Tracking PD cohort and in a second independent iRBD cohort (Marburg).
Objective 4: To Compare the Accuracy of the 3-Sniffin’-Stick Model in the Detection of Poor Smell With Other Previously Published Stick Combinations in the Development Dataset
The accuracy of the 3-Sniffin’-Stick model, using the 3 sticks identified through random forests as having the highest predictor importance for the detection of poor smell, was compared with (1) the 3 sticks previously proposed to constitute the Q stick test,27 (2) the Brief Sniffin' Stick test previously investigated as a screening test for poor smell,25 and (3) stick combinations suggested to distinguish individuals with PD from controls24,26,29 (data available from Dryad [table e-1, doi.org/10.5061/dryad.x3ffbg7gx]).
Objective 5: To Compare the Accuracy of the 3-Sniffin’-Stick Model in the Detection of Poor Smell With All Other Possible 3-Stick Combinations Using a Composite Validation Dataset Comprising Discovery iRBD, Marburg iRBD, and Tracking PD Datasets
There are a total of 560 unique 3-stick combinations of the 16 Sniffin’ Sticks. To compare their accuracy at detecting poor smell, with the use of baseline data from the Development dataset, a single balanced group was formed comprising an equal number of randomly selected individuals from the PD poor smell group and the control good smell group. One model was trained using data from the balanced group for each 3-stick combination in turn, and the accuracy of each model in distinguishing poor smell from good smell was assessed with an independent composite validation set formed by combining the Discovery iRBD, Marburg iRBD, and Tracking PD datasets. The process of model training and validation was undertaken 10 times with randomly balanced training groups resulting in a total of 5,600 trained models. The AUC values (computed on the test dataset) associated with our 3-Sniffin’-Stick model were compared individually against those associated with all other possible 3-stick combinations with pairwise t tests.
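The combination counts quoted above can be checked with a short enumeration. The sticks are represented by the placeholder numbers 1 to 16 rather than by odor names.

```python
from itertools import combinations
from math import comb

sticks = range(1, 17)  # placeholder labels for the 16 Sniffin' Sticks
triples = list(combinations(sticks, 3))

print(len(triples), comb(16, 3))  # 560 560
print(10 * len(triples))          # 5600 models over 10 balanced training groups
print(len(triples) - 1)           # 559 alternatives to any one chosen triple
```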
Objective 6: To Evaluate the Effect of Training Dataset Composition on 3-Sniffin’-Stick Model Accuracy
The aforementioned models were trained using data from individuals at extremes: individuals with PD and poor smell and controls with good smell. To compare the effects of training dataset composition, single randomly balanced groups of (1) individuals with PD with poor smell/controls with good smell, (2) controls with poor smell/good smell, and (3) individuals with PD with poor smell/good smell were used to train models, which were then tested according to their ability to predict poor smell in individuals with iRBD. Kolmogorov-Smirnov tests were used to compare the resultant AUC distributions.
Objective 7: To Investigate a Staged Screening Model to Distinguish Individuals With iRBD From Controls
Using baseline iRBD data and longitudinal control data, we assessed the sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) of age ≥55 years, RBDSQ score ≥5, and PD poor smell as predicted by the 3-stick model, in isolation and combination, in differentiating between individuals with iRBD and controls.
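A minimal sketch of how a 2-step rule and its accuracy measures could be computed is shown below. The function names and the toy participants are invented for illustration and do not reproduce study data; the rule simply requires both an RBDSQ score ≥5 and model-predicted poor smell.

```python
def two_step_positive(rbdsq_score, predicted_poor_smell):
    """Screen positive only if both criteria are met."""
    return rbdsq_score >= 5 and predicted_poor_smell


def screening_metrics(records):
    """records: iterable of (has_irbd, screen_positive) pairs of booleans."""
    tp = fp = tn = fn = 0
    for has_irbd, positive in records:
        if has_irbd and positive:
            tp += 1
        elif has_irbd:
            fn += 1
        elif positive:
            fp += 1
        else:
            tn += 1

    def ratio(num, den):
        # Undefined ratios (empty denominator) are reported as NaN.
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


# Invented toy participants: (has_irbd, rbdsq_score, predicted_poor_smell)
toy = [(True, 7, True), (True, 6, False), (True, 3, True),
       (False, 2, False), (False, 6, False)]
metrics = screening_metrics(
    (has_irbd, two_step_positive(score, poor)) for has_irbd, score, poor in toy
)
print(metrics)
```

Requiring both criteria can only reduce the number of screen positives relative to either criterion alone, which is why such a rule tends to trade sensitivity for specificity.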
Data Availability
Deidentified participant data relating to the Oxford Discovery Cohort may be requested by means of a formal application to the Oxford Parkinson's Disease Centre Data Access Committee by any qualified investigator. The application form, protocol, and terms and conditions may be found at the website opdc.medsci.ox.ac.uk/external-collaborations.
Results
Demographics
Baseline characteristics, including smell status, are shown in table 1. No difference in sex was observed within disease groups. However, participants with PD in the Tracking cohort were on average older and had a longer disease duration at the time of their smell assessment. While participants in Tracking had lower Movement Disorder Society Unified Parkinson's Disease Rating Scale part 3 scores, motor assessments predated smell assessments by up to 6 months and were absent in 10%. Marburg participants were similar in age and subjective smell status to those in Discovery but had a longer disease duration at the time of their smell assessment. Self-reported poor smell concurred with poor smell on objective testing in 31% of controls, 68% of individuals with iRBD, and 66% of those with PD. Within disease groups, there was no significant difference in smell status on objective testing between the Discovery, Tracking, and Marburg cohorts. Across Discovery groups, individuals with PD and iRBD were more likely to have poor smell than controls (p < 0.001), but there was no difference between PD and iRBD disease groups (p = 0.67). We found no evidence of a difference in the total Sniffin’ Stick score of individuals with PD tested at 0 to 3 months after the start of the study and those tested at 9 to 12 months (p = 0.26) or 15 to 18 months (p = 0.10) to suggest a change in Sniffin’ Stick performance over time due to a gradual decline in odor intensity.
Table 1.
Baseline Demographics of Participants in the Discovery Study Alongside the Independent Tracking and Marburg Datasets
Result 1: Anise, Licorice, and Banana Were the 3 Sticks With the Greatest Importance in Predicting Poor Smell
Independent of disease group, Sniffin’ Sticks differed in their rates of correct identification (figure 2 and data available from Dryad [table e-2, doi.org/10.5061/dryad.x3ffbg7gx]). Independent of smell status, orange, peppermint, and fish were most frequently (poor smell ≥55%, good smell ≥90%, overall ≥70%) identified correctly. Overall, lemon, turpentine, and apple had the highest rates of misidentification. Within individuals of the same smell status (either poor or good sense of smell), there were significant (p < 0.01) differences in the rates of correct Sniffin’ Stick identification between those with PD and controls (data available from Dryad [table e-2]). Anise, licorice, and banana demonstrated the greatest differences in rates of correct identification (p < 0.001) (data available from Dryad [table e-2]) and were also the top 3 MLA-identified sticks across 5,340 trained models (figure 3A).
Figure 2. Spider Web Plot Demonstrating the Proportion (Percent) of Individuals in Each Group Who Correctly Identified Each Sniffin’ Stick.
Each Sniffin’ Stick is represented by a line radiating out from the center of the plot, with points at the maximal radius indicating 100% correct identification. PD = Parkinson disease; RBD = REM sleep behavior disorder.
Figure 3. Individual Sniffin' Stick Importance and the Cumulative Effect on Prediction Accuracy.
(A) The average predictor importance of each Sniffin' Stick across 5,340 PD poor smell/control good smell models. Predictor importance is derived from the random forest algorithm using the Gini diversity index criteria for binary splitting. (B) Area under the curve (AUC) associated with incremental Sniffin’ Stick combinations compared to those previously described. Blue line with unfilled dots indicates Discovery trained PD poor smell/control good smell models; pink filled-in dot, Q stick Hummel et al.27 (3-stick) combination; green asterisk, Casjens et al.26 (3-stick) combination; orange square, Boesveldt et al.29 (3-stick) combination; purple pentagon, Mueller et al.25 Brief Smell Identification Test (5-stick) combination; and magenta cross, Mahlknecht et al.24 S8 (8-stick) combination.
Result 2: Classification Accuracy Improved With Incremental Stick Number
The effect on classification accuracy of incremental cumulative Sniffin’ Stick number is shown in figure 3B. The improvement in AUC from the addition of 2 sticks to that of a single stick was >3 times the improvement in AUC associated with the cumulative addition of all other 13 sticks.
Result 3: An Overall AUC ≥0.90 in Distinguishing Poor Smell From Good Smell Using the Top 3 Sniffin’ Sticks Was Replicated in a Total of 3 Different Independent PD and iRBD Cohorts
The AUC of 0.95 in the Development dataset equated to a sensitivity, specificity, PPV, and NPV of 90%, 92%, 92%, and 90%, respectively, assuming a probability threshold set at 0.5 (table 2). A single randomly selected MLA-identified top 3 Sniffin’ Stick model distinguished poor smell from good smell in baseline iRBD data from individuals within the Discovery study with an AUC of 0.90. Separately, the AUC was 0.90 in the independent Tracking (PD) cohort and 0.95 in the Marburg (iRBD) cohort (table 2).
Table 2.
Comparison of AUC Values for MLA-Trained 3-Sniffin'-Stick Models in the Detection of Hyposmia or Functional Anosmia
Result 4: The Top 3 MLA-Identified Sticks (Anise, Licorice, and Banana) Outperformed Previously Described Stick Combinations
For the detection of poor smell, the Q stick combination (cloves, coffee, rose)27 was outperformed whether an absolute cutoff of ≤2 correctly identified sticks was used or trained MLAs that used the individual answer (of 4 possible options) for each stick were applied (figure 4). Other stick combinations described by Casjens et al.,26 Boesveldt et al.,29 and Mueller et al.25 were similarly outperformed (figure 3B).
Figure 4. Comparative Accuracies Associated With Q Stick / MLA Identified 3 Stick Combinations.
Petal plots demonstrating the sensitivity, specificity, PPV, and NPV associated with using an absolute score of ≤2 of 3 sticks (A, B, E, F) or a trained MLA that takes into account each individual option for each stick (C, D, G, H) with sticks being chosen empirically (A, C, E, G) or by MLA predictor importance ranking (B, D, F, H). The larger the petal size, the greater the accuracy, with the tip of each petal indicating the percentage accuracy for each summary measure. Accuracy values displayed for the trained machine learning algorithm (MLA) relate to a cutoff probability threshold of 0.5. NPV = negative predictive value; PD = Parkinson disease; PPV = positive predictive value; RBD = REM sleep behavior disorder.
Result 5: No Other 3 Stick Combination Statistically Outperformed the Top 3 MLA-Identified Sticks
When validated against the composite validation dataset (combining Discovery iRBD, Marburg iRBD, and Tracking PD datasets), the AUC associated with our top 3-stick combination was 0.90 (0.89–0.91) compared to 0.68 (0.67–0.70) for the worst predicted 3-stick combination (turpentine, garlic, and apple). No other 3-stick combination was statistically better than our 3-stick (anise, licorice, and banana) combination in detecting poor smell (p > 0.05). There was a significant difference (p < 0.001) between the 3-stick combinations that included at least 1 of anise, licorice, or banana compared to those that did not include any of the 3 aforementioned sticks.
Result 6: Models Trained Using a Combination of PD Poor Smell/Control Good Smell Were Better at Identifying Poor Smell in Individuals With iRBD Than Those Trained Using PD Poor Smell/PD Good Smell or Control Poor Smell/Control Good Smell Data
The overall AUC value (for distinguishing poor smell from good smell) across 5,340 trained models using the same top 3 MLA-identified sticks and data from individuals with PD and poor smell and controls with good smell was 0.95, which is greater than that associated with models trained using control poor smell/control good smell data (0.81) or models trained using PD poor smell/PD good smell data (0.86). When models were trained using all 560 possible 3-stick combinations, higher AUC values were again obtained when data from individuals with PD poor smell/control good smell were used as opposed to control poor smell/control good smell or PD poor smell/PD good smell (p < 0.0001) (data available from Dryad [figure e-1, doi.org/10.5061/dryad.x3ffbg7gx]).
Result 7: A 2-Step Screening Test Comprising an RBDSQ Score ≥5 and Poor Smell as Predicted With the 3-Sniffin'-Stick Model Distinguished Individuals With iRBD From Controls With a Sensitivity of 65%, Specificity of 100%, PPV of 100%, and NPV of 30%
While the sensitivity of the 2-step screening test was lower than that associated with using age ≥55 years, an RBDSQ score ≥5, or 3-Sniffin’-Stick model–predicted poor smell, alone or in combination, the 2-step screening test was associated with a higher specificity and PPV; its accuracy values did not benefit further from the addition of age ≥55 years (table 3). However, the PPV is a function of the background prevalence of iRBD, which is unrealistically high in our sample due to its design. If one takes the prevalence of 1.06% from a community survey in Switzerland44 and factors in the uncertainty around our specificity (lower 95% confidence limit of 90%, with 99% substituted for the upper confidence limit of 100% because a specificity of 100% will always yield a PPV of 100%), then the PPV could range from 7% to 41% as the specificity ranges from 90% to 99%, compared to a PPV range of 3%–12% when using the RBDSQ alone.
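The dependence of PPV on prevalence follows from Bayes' theorem and can be sketched as below. The calculation assumes the 2-step sensitivity point estimate of 65% together with the stated prevalence and specificity bounds; the function name is hypothetical and the authors' exact computation may have differed, although the rounded results agree with the range quoted above.

```python
def ppv(sensitivity, specificity, prevalence):
    """Positive predictive value from sensitivity, specificity, and prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


prevalence = 0.0106  # community prevalence of 1.06%
for specificity in (0.90, 0.99):
    print(specificity, round(100 * ppv(0.65, specificity, prevalence)))
# 0.9 7
# 0.99 41
```

At such low prevalence, false positives from the large unaffected population dominate, so small changes in specificity move the PPV substantially.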
Table 3.
RBD/Control Detection Accuracies Associated With Age ≥55 Years,a RBDSQ Score ≥5,b and MLA-Identified 3-Sniffin’-Stick–Predicted Hyposmia/Functional Anosmia,c Alone and in Combination
Discussion
We describe the application of a 3-Sniffin’-Stick test to a total of 1,933 individuals with iRBD, individuals with PD, and controls; to the best of our knowledge, this is the largest study of its kind. Poor sense of smell on objective testing was evident in 80% of individuals with PD, 72% of individuals with iRBD, and 15% of controls, surpassing rates of self-reported poor sense of smell of 61%, 58%, and 11%, respectively. The 3 sticks with the highest (orange, peppermint, and fish)18 and lowest (lemon, turpentine, and apple)24 rates of correct identification across groups matched those reported previously.
Our aim was to develop an abbreviated Sniffin’ Stick test for the detection of impaired olfaction in prodromal and manifest PD, with the assumption that individuals with iRBD, prodromal to PD, lie on a continuum between controls and those with PD. We did not seek to distinguish individuals with PD from controls on the basis of their sense of smell alone; rather to identify those who had a poor sense of smell based on normative data specific to age and sex. Models were therefore trained with smell data at either extreme: data from controls with good smell (normosmia and supersmellers) and data from individuals with PD and poor smell (hyposmia and functional anosmia). When tested on independent data from individuals with iRBD, models trained using PD poor smell/control good smell data were better at identifying poor smell in individuals with iRBD than those trained using control good smell/control poor smell data or PD good smell/PD poor smell data, suggesting a pattern of smell loss that is also recapitulated in individuals with iRBD who have prodromal parkinsonism (data available from Dryad [figure e-1, doi.org/10.5061/dryad.x3ffbg7gx]).
These findings are consistent with previous studies in which we and others have shown that individuals with iRBD have a nonmotor profile comparable to those with PD, with similarities in cognition, depression, anxiety, apathy, impulsive compulsive behaviors, sleep, and autonomic dysfunction.2,12,34,45–48 Furthermore, individuals with iRBD show subtle motor dysfunction compared to age- and sex-matched controls, which is less than that required to meet the diagnosis of PD.34 This emerging wealth of phenotypic data suggests that the vast majority of individuals with iRBD already manifest motor and nonmotor features of prodromal parkinsonism, sitting at 1 end of a disease spectrum, on a continuum with established PD.
The cutoff points used to categorize olfactory performance were based on normative data stratified by age and sex, so our models were not adjusted for either of the aforementioned factors. Instead, we used the raw answers provided for each stick by each participant, which may in part explain the less-than-perfect 16-stick AUC of 0.99 calculated with validation data excluded from the training process. Nonetheless, despite their exclusion as model input variables, the subgroup performance of the top 3-stick trained models by age and sex was largely equivalent (data available from Dryad [table e-3, doi.org/10.5061/dryad.x3ffbg7gx]); the exception was for the 31- to 40-year age group, for which the associated dataset was small.
When sticks were ranked in descending order, the sharp reduction in predictor importance seen after the top 3 sticks (anise, licorice, and banana) translated into a relative reduction in AUC improvement with models using ≥4 sticks (figure 3). Cumulative incorporation of Sniffin’ Sticks into models (figure 3B) according to their data-driven ranking resulted in AUCs that outperformed previously reported 3- and 5-Sniffin’-Stick combinations.25–27,29 The Casjens et al.26 and Boesveldt et al.29 3-stick combinations yielded AUCs that were slightly lower than those associated with our top 3 MLA-identified sticks, which is perhaps unsurprising given that anise (the stick with the highest MLA-identified predictor importance) was common to all 3 top 3-stick combinations and the Boesveldt et al. combination additionally had licorice in common (data available from Dryad [table e-1, doi.org/10.5061/dryad.x3ffbg7gx]). Indeed, the S8 (8-stick) combination described by Mahlknecht et al.,24 which had 5 of 8 sticks in common, including the top 3 sticks, was associated with an AUC virtually identical to that of the top 8 MLA-identified sticks. Furthermore, in a comparison of the top 3 MLA-identified sticks to all other 559 possible 3-stick combinations (including those previously described) through the application of trained models to the independent composite validation dataset, no other 3-stick combination was statistically better than our 3-stick combination in detecting poor smell (p > 0.05).
Having derived a top 3 MLA-identified stick test of poor smell, we assessed its value both alone and in combination with age ≥55 years and RBDSQ score ≥5 in screening for individuals with iRBD. In keeping with the main body of scientific literature, as a single test, the RBDSQ was associated with excellent sensitivity.49 However, the combination of RBDSQ score ≥5 and poor smell identified from 3-stick testing yielded 100% specificity, although the lower bound of the 95% confidence interval was 90%. Depending on the true specificity and community prevalence of iRBD, the PPV could range widely from 7% to 41% (compared to 3%–12% when RBDSQ score alone was used), but these figures, if replicated, suggest a potential 2-pronged screening test for community cases of subclinical iRBD that could be used to facilitate large-scale screening of individuals as part of clinical trials for prodromal PD. The addition of age ≥55 years did not result in any further improvement in accuracy values. Individuals with polysomnographically confirmed iRBD who did not fulfill both criteria (RBDSQ score ≥5 and poor smell on 3-Sniffin’-Stick test) were younger (mean [SD] age 61.7 [10.4] years vs 66.3 [7.5] years, p < 0.001) and had a lower Movement Disorder Society Unified Parkinson's Disease Rating Scale Part III score (mean [SD] 5.5 [5.3] vs 3.8 [3.2], p = 0.01) compared to those who met both criteria. There was no difference in sex (p = 0.10) between the 2 subgroups. Although not created to detect disease per se, the 3-stick model trained to detect individuals with poor smell, when used in isolation, distinguished participants with iRBD from controls with a sensitivity of 67% and a specificity of 80%, comparing favorably to respective values of 56% and 89% reported in the work by Huang et al.,23 wherein the abbreviated 5-Sniffin’-Stick test was created with the specific intent of distinguishing disease groups.
A main weakness of this study, particularly in evaluating the 3-stick test as a screen for iRBD, was the absence of an independent control cohort. Accuracy values were therefore calculated from longitudinal control data that were treated as independent of the baseline control data. In addition, in keeping with iRBD studies worldwide, individuals with iRBD who had been recruited into the Discovery and Marburg studies were those who had originally presented to their clinician for evaluation of their sleep disturbance; thus, they may be expected to present with a more severe phenotype compared to individuals detected on population screening. Answers provided on the RBDSQ may also have been affected by participants' foreknowledge of their iRBD diagnosis. Future work will apply our methods to derive a population of individuals with community-ascertained iRBD through the application of the 3-stick/RBDSQ test, acknowledging that the community prevalence of iRBD will affect the associated PPV and that regional variations may limit the applicability of the largely German-derived normative data from which olfactory performance was categorized. Although the findings of our study are consistent across 3 large independent cohorts, future work will also explore the test's application to other international cohorts in which cultural and regional variations in exposure to anise and licorice, both similar in smell, may affect the generalizability of our results beyond the relatively homogeneous European cohorts described. A further avenue of exploration is the evaluation of longitudinal changes in 3-stick test performance and their relation to clinical outcomes of interest.
We demonstrate that a 3-stick model comprising the Sniffin’ Sticks anise, licorice, and banana detects olfactory dysfunction with high levels of accuracy in individuals with PD and iRBD. Its ease of administration and relative cost-effectiveness support a role in screening for iRBD and in clinical phenotyping, where prognostication may be facilitated through standardized assessment of olfactory dysfunction.
Acknowledgment
The authors are grateful to participants in the Discovery, Tracking, and Marburg studies, without whom none of this work would have been possible. In addition, they thank members of the research teams at the various sites for their faithful work in clinical phenotyping.
Glossary
- AUC: area under the curve
- iRBD: isolated REM sleep behavior disorder
- MLA: machine learning algorithm
- NPV: negative predictive value
- PD: Parkinson disease
- PPV: positive predictive value
- RBDSQ: REM Sleep Behavior Disorder Screening Questionnaire
Footnotes
Class of Evidence: NPub.org/coe
Study Funding
This study was funded by the Monument Trust Discovery Award from Parkinson's UK (J-1403) and supported by the Oxford National Institute for Health Research (NIHR) Biomedical Research Center (BRC). The views expressed are those of the authors and not necessarily those of the National Health Service, the NIHR, or the Department of Health.
Disclosure
Dr. Lo was funded by the Oxford NIHR BRC. Dr. Lo and Dr. Arora are coapplicants on a patent application related to smartphone predictions in PD (PCT/GB2019/052522) pending. Dr. Ben-Shlomo is Higher Education Funding Council for England funded and receives research support from the NIHR, MRC, and Gatsby Foundation. Dr. Barber was funded by a Wellcome Trust Doctoral Training Fellowship. Mr. Lawton was funded by Parkinson's UK through the Oxford Parkinson's Disease Centre. Dr. Klein acknowledges support from the NIHR Oxford Health Clinical Research Facility. Sofia Kanavou, Dr. Janzen, and Elisabeth Sittig report no disclosures. Dr. Oertel has served as a consultant on advisory boards or as a speaker at scientific symposia for Adamas, Bial Pharma, Desitin, Eisai, Roche, and UCB Pharma. He is supported by the Charitable Hertie Foundation and has received funding from the Deutsche Forschungsgemeinschaft, the Michael J. Fox Foundation, and the Parkinson Fonds Deutschland. Dr. Grosset has received honoraria from UCB Pharma, Acorda, and BIAL Pharma. Dr. Grosset serves as a consultant for GE Healthcare and Merz Pharma and reports funding from Michael's Movers, Parkinson's France, and Parkinson's UK. Dr. Hu has received funding and support from Parkinson's UK, Oxford NIHR BRC, University of Oxford, NIHR, Lab10X, Michael J. Fox Foundation, H2020 European Union, GE Healthcare, and the PSP Association. She also received payment for Advisory Board attendance/consultancy for Biogen, Roche, CuraSen Therapeutics, Evidera, and Manus Neurodynamica. Dr. Hu is a coapplicant on a patent application related to smartphone predictions in PD (PCT/GB2019/052522) pending. Go to Neurology.org/N for full disclosures.
References
- 1. Doty RL. Olfactory dysfunction in Parkinson disease. Nat Rev Neurol 2012;8:329–339.
- 2. Baig F, Lawton M, Rolinski M, et al. Delineating nonmotor symptoms in early Parkinson's disease and first-degree relatives. Mov Disord 2015;30:1759–1766.
- 3. Morley JF, Cheng G, Dubroff JG, Wood S, Wilkinson JR, Duda JE. Olfactory impairment predicts underlying dopaminergic deficit in presumed drug-induced parkinsonism. Mov Disord Clin Pract 2017;4:603–606.
- 4. Yang HJ, Kim YE, Yun JY, Ehm G, Kim HJ, Jeon BS. Comparison of sleep and other non-motor symptoms between SWEDDs patients and de novo Parkinson's disease patients. Parkinsonism Relat Disord 2014;20:1419–1422.
- 5. Domellof ME, Lundin KF, Edstrom M, Forsgren L. Olfactory dysfunction and dementia in newly diagnosed patients with Parkinson's disease. Parkinsonism Relat Disord 2017;38:41–47.
- 6. Fullard ME, Tran B, Xie SX, et al. Olfactory impairment predicts cognitive decline in early Parkinson's disease. Parkinsonism Relat Disord 2016;25:45–51.
- 7. Schrag A, Siddiqui UF, Anastasiou Z, Weintraub D, Schott JM. Clinical variables and biomarkers in prediction of cognitive impairment in patients with newly diagnosed Parkinson's disease: a cohort study. Lancet Neurol 2017;16:66–75.
- 8. Ham JH, Lee JJ, Sunwoo MK, Hong JY, Sohn YH, Lee PH. Effect of olfactory impairment and white matter hyperintensities on cognition in Parkinson's disease. Parkinsonism Relat Disord 2016;24:95–99.
- 9. Gjerde KV, Muller B, Skeie GO, Assmus J, Alves G, Tysnes OB. Hyposmia in a simple smell test is associated with accelerated cognitive decline in early Parkinson's disease. Acta Neurol Scand 2018;138:508–514.
- 10. Kang SH, Lee HM, Seo WK, Kim JH, Koh SB. The combined effect of REM sleep behavior disorder and hyposmia on cognition and motor phenotype in Parkinson's disease. J Neurol Sci 2016;368:374–378.
- 11. Heinzel S, Berg D, Gasser T, Chen H, Yao C, Postuma RB. Update of the MDS research criteria for prodromal Parkinson's disease. Mov Disord 2019;34:1464–1470.
- 12. Postuma RB, Iranzo A, Hu M, et al. Risk and predictors of dementia and parkinsonism in idiopathic REM sleep behaviour disorder: a multicentre study. Brain 2019;142:744–759.
- 13. Postuma RB, Gagnon JF, Bertrand JA, Genier Marchand D, Montplaisir JY. Parkinson risk in idiopathic REM sleep behavior disorder: preparing for neuroprotective trials. Neurology 2015;84:1104–1113.
- 14. Shill HA, Hentz JG, Caviness JN, et al. Unawareness of hyposmia in elderly people with and without Parkinson's disease. Mov Disord Clin Pract 2016;3:43–47.
- 15. Hummel T, Sekinger B, Wolf SR, Pauli E, Kobal G. ‘Sniffin' Sticks': olfactory performance assessed by the combined testing of odor identification, odor discrimination and olfactory threshold. Chem Senses 1997;22:39–52.
- 16. Doty RL, Shaman P, Kimmelman CP, Dann MS. University of Pennsylvania Smell Identification Test: a rapid quantitative olfactory function test for the clinic. Laryngoscope 1984;94:176–178.
- 17. Haehner A, Mayer AM, Landis BN, et al. High test-retest reliability of the extended version of the “Sniffin' Sticks” test. Chem Senses 2009;34:705–711.
- 18. Hummel T, Konnerth CG, Rosenheim K, Kobal G. Screening of olfactory function with a four-minute odor identification test: reliability, normative data, and investigations in patients with olfactory loss. Ann Otol Rhinol Laryngol 2001;110:976–981.
- 19. Lawton M, Hu MT, Baig F, et al. Equating scores of the University of Pennsylvania Smell Identification Test and Sniffin' Sticks test in patients with Parkinson's disease. Parkinsonism Relat Disord 2016;33:96–101.
- 20. Boesveldt S, Verbaan D, Knol DL, van Hilten JJ, Berendse HW. Odour identification and discrimination in Dutch adults over 45 years. Rhinology 2008;46:131–136.
- 21. Morley JF, Cohen A, Silveira-Moriyama L, et al. Optimizing olfactory testing for the diagnosis of Parkinson's disease: item analysis of the University of Pennsylvania Smell Identification Test. NPJ Parkinson's Dis 2018;4:2.
- 22. Crespo Cuevas AM, Ispierto L, Vilas D, et al. Distinctive olfactory pattern in Parkinson's disease and non-neurodegenerative causes of hyposmia. Neurodegener Dis 2018;18:143–149.
- 23. Huang SF, Chen K, Wu JJ, et al. Odor identification test in idiopathic REM-behavior disorder and Parkinson's disease in China. PLoS One 2016;11:e0160199.
- 24. Mahlknecht P, Pechlaner R, Boesveldt S, et al. Optimizing odor identification testing as quick and accurate diagnostic tool for Parkinson's disease. Mov Disord 2016;31:1408–1413.
- 25. Mueller C, Renner B. A new procedure for the short screening of olfactory function using five items from the “Sniffin' Sticks” identification test kit. Am J Rhinol 2006;20:113–116.
- 26. Casjens S, Eckert A, Woitalla D, et al. Diagnostic value of the impairment of olfaction in Parkinson's disease. PLoS One 2013;8:e64735.
- 27. Hummel T, Pfetzing U, Lotsch J. A short olfactory test based on the identification of three odors. J Neurol 2010;257:1316–1321.
- 28. Campabadal A, Segura B, Junque C, et al. Comparing the accuracy and neuroanatomical correlates of the UPSIT-40 and the Sniffin' Sticks test in REM sleep behavior disorder. Parkinsonism Relat Disord 2019;65:197–202.
- 29. Boesveldt S, Verbaan D, Knol DL, et al. A comparative study of odor identification and odor discrimination deficits in Parkinson's disease. Mov Disord 2008;23:1984–1990.
- 30. Miyamoto T, Miyamoto M, Iwanami M, Hirata K. Olfactory dysfunction in Japanese patients with idiopathic REM sleep behavior disorder: comparison of data using the University of Pennsylvania Smell Identification Test and Odor Stick Identification Test for Japanese. Mov Disord 2010;25:1524–1526.
- 31. Miyamoto T, Miyamoto M, Iwanami M, et al. Olfactory dysfunction in idiopathic REM sleep behavior disorder. Sleep Med 2010;11:458–461.
- 32. Krismer F, Pinter B, Mueller C, et al. Sniffing the diagnosis: olfactory testing in neurodegenerative parkinsonism. Parkinsonism Relat Disord 2017;35:36–41.
- 33. Stiasny-Kolster K, Mayer G, Schafer S, Moller JC, Heinzel-Gutenbrunner M, Oertel WH. The REM Sleep Behavior Disorder Screening Questionnaire: a new diagnostic instrument. Mov Disord 2007;22:2386–2393.
- 34. Barber TR, Lawton M, Rolinski M, et al. Prodromal parkinsonism and neurodegenerative risk stratification in REM sleep behaviour disorder. Sleep 2017.
- 35. Hughes AJ, Daniel SE, Kilford L, Lees AJ. Accuracy of clinical diagnosis of idiopathic Parkinson's disease: a clinico-pathological study of 100 cases. J Neurol Neurosurg Psychiatry 1992;55:181–184.
- 36. Lawton M, Ben-Shlomo Y, May MT, et al. Developing and validating Parkinson's disease subtypes and their motor and cognitive progression. J Neurol Neurosurg Psychiatry 2018;89:1279–1287.
- 37. Swallow DM, Lawton MA, Grosset KA, et al. Statins are underused in recent-onset Parkinson's disease with increased vascular risk: findings from the UK Tracking Parkinson's and Oxford Parkinson's Disease Centre (OPDC) discovery cohorts. J Neurol Neurosurg Psychiatry 2016;87:1183–1190.
- 38. International Classification of Sleep Disorders. Darien, IL: American Academy of Sleep Medicine; 2014.
- 39. Oleszkiewicz A, Schriever VA, Croy I, Hahner A, Hummel T. Updated Sniffin' Sticks normative data based on an extended sample of 9139 subjects. Eur Arch Otorhinolaryngol 2019;276:719–728.
- 40. Kobal G, Klimek L, Wolfensberger M, et al. Multicenter investigation of 1,036 subjects using a standardized method for the assessment of olfactory function combining tests of odor identification, odor discrimination, and olfactory thresholds. Eur Arch Otorhinolaryngol 2000;257:205–211.
- 41. Chawla NV, Japkowicz N, Kotcz A. Editorial: special issue on learning from imbalanced data sets. SIGKDD Explor Newsl 2004;6:1–6.
- 42. Drummond C, Holte R. C4.5, class imbalance, and cost sensitivity: why under-sampling beats oversampling. In: Proceedings of the ICML'03 Workshop on Learning from Imbalanced Datasets; 2003.
- 43. Breiman L. Random forests. Mach Learn 2001;45:5–32.
- 44. Haba-Rubio J, Frauscher B, Marques-Vidal P, et al. Prevalence and determinants of REM sleep behavior disorder in the general population. Sleep 2018;41:zsx197.
- 45. Baig F, Kelly MJ, Lawton MA, et al. Impulse control disorders in Parkinson disease and RBD: a longitudinal study of severity. Neurology 2019;93:e675–e687.
- 46. Baig F, Lawton MA, Rolinski M, et al. Personality and addictive behaviours in early Parkinson's disease and REM sleep behaviour disorder. Parkinsonism Relat Disord 2017;37:72–78.
- 47. Rolinski M, Zokaei N, Baig F, et al. Visual short-term memory deficits in REM sleep behaviour disorder mirror those in Parkinson's disease. Brain 2016;139:47–53.
- 48. Szewczyk-Krolikowski K, Tomlinson P, Nithi K, et al. The influence of age and gender on motor and non-motor features of early Parkinson's disease: initial findings from the Oxford Parkinson Disease Center (OPDC) discovery cohort. Parkinsonism Relat Disord 2014;20:99–105.
- 49. Li K, Li SH, Su W, Chen HB. Diagnostic accuracy of REM sleep behaviour disorder screening questionnaire: a meta-analysis. Neurol Sci 2017;38:1039–1046.
- 50. Goetz CG, Stebbins GT, Tilley BC. Calibration of Unified Parkinson's Disease Rating Scale scores to Movement Disorder Society-Unified Parkinson's Disease Rating Scale scores. Mov Disord 2012;27:1239–1242.
Data Availability Statement
Deidentified participant data relating to the Oxford Discovery Cohort may be requested by means of a formal application to the Oxford Parkinson's Disease Centre Data Access Committee by any qualified investigator. The application form, protocol, and terms and conditions may be found at the website opdc.medsci.ox.ac.uk/external-collaborations.