Journal of Clinical Sleep Medicine : JCSM : Official Publication of the American Academy of Sleep Medicine. 2012 Dec 15;8(6):701–711. doi: 10.5664/jcsm.2270

Evaluating Sleepiness-Related Daytime Function by Querying Wakefulness Inability and Fatigue: Sleepiness-Wakefulness Inability and Fatigue Test (SWIFT)

R Bart Sangal
PMCID: PMC3501668  PMID: 23243405

Abstract

Study Objectives:

Routine assessment of daytime function in Sleep Medicine has focused on “tendency to fall asleep” in soporific circumstances, to the exclusion of “wakefulness inability” or inability to maintain wakefulness, and fatigue/tiredness/lack of energy. The objective was to establish reliability and discriminant validity of a test for wakefulness inability and fatigue, and to test its superiority against the criterion standard for evaluation of sleepiness—the Epworth Sleepiness Scale (ESS).

Methods:

A 12-item self-administered instrument, the Sleepiness-Wakefulness Inability and Fatigue Test (SWIFT), was developed and administered, with ESS, to 256 adults ≥ 18 years of age (44 retook the tests a month later); consecutive patients with symptoms of sleep disorders including 286 with obstructive sleep apnea ([OSA], apnea-hypopnea index ≥ 5/h sleep on polysomnography [PSG]), 49 evaluated with PSG and multiple sleep latency test for narcolepsy and 137 OSA patients treated with continuous positive airway pressure (CPAP).

Results:

SWIFT had internal consistency 0.87 and retest intraclass coefficient 0.82. Factor analysis revealed 2 factors—general wakefulness inability and fatigue (GWIF) and driving wakefulness inability and fatigue (DWIF). Normal subjects differed from patients in ESS, SWIFT, GWIF, and DWIF. SWIFT and GWIF (but not DWIF) had higher area under ROC curve, Youden's index, and better positive and negative likelihood ratios than ESS. ESS, SWIFT, GWIF, and DWIF improved with CPAP. Improvements in SWIFT, GWIF, and DWIF (but not ESS) were significantly correlated with CPAP compliance.

Conclusions:

SWIFT is reliable and valid. SWIFT and its factor GWIF have a discriminant ability superior to that of the ESS.

Citation:

Sangal RB. Evaluating sleepiness-related daytime function by querying wakefulness inability and fatigue: Sleepiness-Wakefulness Inability and Fatigue Test (SWIFT). J Clin Sleep Med 2012;8(6):701-711.

Keywords: Sleepiness, wakefulness inability, fatigue, obstructive sleep apnea, Epworth Sleepiness Scale


Sleepiness is a commonly reported public health problem. In a 2002 National Sleep Foundation (NSF) poll,1 7% reported sleepiness almost every day and another 9% a few days a week, for a total of 16%. Seventeen percent reported having dozed off while at the wheel of a vehicle, and 1% reported having an accident because they dozed off or were too tired (emphasis added). In other words, accidents were attributed to either dozing off or being tired. According to the National Highway Traffic Safety Administration (NHTSA),2 “NHTSA data indicate in recent years there have been about 56,000 crashes annually in which driver drowsiness/fatigue (emphasis added) was cited by police. Annual averages of roughly 40,000 nonfatal injuries and 1,550 fatalities result from these crashes.” Sleep apnea patients report not just daytime sleepiness, but also being tired, fatigued, or having a lack of energy, and these complaints may be more frequent than sleepiness in sleep apnea.3 Sleepiness (and its adverse effect of auto accidents) may be multifactorial, with elements of inability maintaining wakefulness when necessary or desired (wakefulness inability), tendency to doze off in soporific circumstances, and fatigue/tiredness/lack of energy. However, the specialty of Sleep Medicine has tended to focus rather exclusively on routine evaluation of daytime function by measuring sleepiness defined as tendency to fall asleep in soporific circumstances (using the multiple sleep latency test [MSLT]4 and the Epworth Sleepiness Scale [ESS]5,6), to the exclusion of wakefulness inability (difficulty maintaining wakefulness) and fatigue, in sleep disordered patients.

BRIEF SUMMARY

Current Knowledge/Study Rationale: There is no questionnaire instrument to measure wakefulness inability or difficulty staying awake in situations where staying awake is desirable, or one that simultaneously addresses symptoms related to pathological sleepiness and to fatigue/tiredness/lack of energy in sleep disorders patients. Conceivably, what could be better than being able to fall asleep when one wants to (low MSLT [Multiple Sleep Latency Test] and even high ESS [Epworth Sleepiness Scale]), but be able to stay awake when one wants to and feel refreshed (not tired) during the day?

Study Impact: A reliable and valid self-rating instrument (Sleepiness-Wakefulness Inability and Fatigue Test or SWIFT) was created and shown to be superior to the criterion standard for sleepiness (ESS) with regard to specificity, sensitivity, and discriminant ability. It should be added to the ESS in evaluating daytime consequences of sleep disorders.

The MSLT defines sleepiness as the ability to fall asleep in a dark room when asked to do so. According to a review paper to establish Standards of Practice for the clinical use of the MSLT and MWT (Maintenance of Wakefulness Test), “the wide range in MSL makes it difficult to establish a specific threshold value for excessive sleepiness or to discriminate patients with sleep disorders from non-patients.”7 Further, “the MSL change between pre- and post-treatment for an individual is probably meaningful, although comparison of these data with the normative values is not helpful.”7 This may be related to the use of a behavior (the ability to fall asleep quickly when lying down in a dark room) that may be a desirable and adaptive trait rather than an abnormal state. The MWT8,9 sought to correct this by asking subjects to try to stay awake in a dimly lit room. However, staying awake sitting in a dimly lit room doing nothing is not particularly advantageous (as opposed to staying awake when sitting in a dimly lit car and driving). The MSLT and MWT measure different abilities,10,11 and treatment may improve “wakefulness inability” (the MWT) more than “sleep tendency” (the MSLT).12 Both the MSLT and MWT require a large investment in time and resources.

There are self-rating questionnaire instruments that aim to measure sleepiness, the most commonly used being the ESS. The ESS queries for tendency to fall asleep in a variety of circumstances, often soporific. Sanford et al.13 have reported the distribution of the ESS. In their sample of normal subjects, median ESS was 7–8, and 30.7% of normal subjects without insomnia reported an ESS score ≥ 10, the widely used cutoff for abnormal excessive sleepiness. It has been shown that the ESS does not measure the same ability as the MWT.14,15 In patients who are severely sleepy on the MWT, the ESS was insensitive to the level of sleepiness as measured by the MWT. The ESS may16 or may not17 be correlated with the MSLT. A sample of 10,000 subjects with 71% response rate showed no correlation between the ESS and the adverse consequence of automobile accidents, although there was a correlation between dozing off while stopped in traffic (item 8 on ESS) and automobile accidents.18

Sleep disordered patients report fatigue, lack of energy, and tiredness in addition to sleepiness.3 Fatigue, tiredness and lack of energy are largely interchangeable terms, as suggested by Merriam-Webster Dictionary's19 definitions of fatigue as “weariness or exhaustion from labor, exertion, or stress”, and tired as “drained of strength and energy: fatigued often to the point of exhaustion.” Although these symptoms may be separable from sleepiness/wakefulness inability (wakeful being defined by Merriam-Webster Dictionary as “not sleeping or able to sleep”), it is not clear that sleep disordered patients, the general public and the NSF,1 or the police and the NHTSA,2 can separate these symptoms clearly.

Conceivably, what could be better than being able to fall asleep when one wants to (low MSLT and even high ESS), but be able to stay awake when one wants to and feel refreshed (not fatigued) during the day? Is the ability to fall asleep easily a pathological problem or an adaptive ability?20 This leads to the question of whether we should be querying instead the ability to stay awake when desired, along with fatigue.

There does not seem to be a questionnaire instrument to measure wakefulness inability or difficulty staying awake in situations where staying awake is desirable. Although fatigue inventories such as the 83-item Multidimensional Fatigue Symptom Inventory (MFSI)21 and its 30-item short form (MFSI-sf)22 exist, there is no single short questionnaire that simultaneously addresses symptoms related to pathological sleepiness and to fatigue/tiredness/lack of energy in sleep disordered patients. Thus, these other domains of sleep disorder complaints are not routinely queried or measured.

The hypothesis was that a self-rating instrument for assessing wakefulness inability and fatigue can be created that is reliable (with good internal consistency and test-retest reliability in normal subjects) and valid (with good ability to discriminate between normal individuals and sleep disordered patients, and to show improvement with treatment of sleep disorders such as obstructive sleep apnea [OSA]), and that such a test incorporating wakefulness inability and fatigue is superior to the criterion standard for sleepiness (ESS) with regard to specificity, sensitivity, and discriminant ability.

METHODS

A 12-item questionnaire (Sleepiness-Wakefulness Inability and Fatigue Test, or SWIFT) was developed. Subscale A has 6 questions related to difficulty staying awake/wakefulness inability in different situations that might affect performance or cause adverse consequences; subscale B has 6 questions related to fatigue, tiredness or lack of energy in different situations that might affect performance or cause adverse consequences, all answered on a 4-level (scored 0–3) Likert scale. Items were prepared by the author based on apparent face validity, with the inclusion of more than one item related to driving. The SWIFT is shown in Table 1.

Table 1.

Sleepiness-Wakefulness Inability and Fatigue Test (SWIFT)

Normal Subjects

After obtaining approval from the Wayne State University Human Investigations Committee, adult subjects (age ≥ 18 years) were recruited over a period of 10 weeks by means of a group e-mail to medical students at Wayne State University as well as by personal solicitation of subjects in public places such as malls and parks. After reading an informational sheet, they were asked to fill out questionnaires seeking their gender, age, educational level, occupation, race, height, weight, medical/psychiatric problems, medicines taken, sleep habits, and presence or absence of sleep symptoms including snoring, observed or perceived apneic episodes in sleep, insomnia, fatigue, and sleepiness. They were also asked to complete the SWIFT and the ESS. Subjects willing to be contacted again in a month to retake the questionnaire were asked for contact information, and were contacted after a month to again complete the questionnaire.

A total of 403 subjects filled out the questionnaire. Subjects with incomplete questionnaires (49) were excluded. In order to examine the normal range of sleepiness, wakefulness inability, and fatigue, it was decided to exclude subjects with issues known to affect sleepiness, wakefulness inability, and fatigue, such as CNS-active or psychotropic medicines (53), CNS disorders (3), untreated depression (19), and history of observed/perceived apneic episodes in sleep (23). This left 256 normal subjects (87 male, 169 female; age range 18–92 years; 190 White, 26 Black, 5 Hispanic, 11 Asian American, 6 South Asian American, 18 Other; 4 with less than high school education, 23 high school graduates, 52 with some college, 101 with college degrees, and 76 with graduate degrees; National Statistics socioeconomic classification: 10 higher professional or managerial, 50 lower professional or managerial, 26 intermediate occupations, 3 small employers and own account workers, 12 lower supervisory and technical, 4 routine occupations, 36 retired or unemployed, and 115 students). Forty-four of them retook the SWIFT and ESS a month later.

Determining Reliability

To determine internal consistency, Cronbach α was calculated for the SWIFT (and the ESS) using data from the normal subjects. To determine test-retest reliability, intraclass coefficients were calculated using the normal subjects with test and retest data. If these tests showed good reliability (Cronbach α and intraclass coefficient > 0.8), factor analysis with varimax rotation was performed using SWIFT data.
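As an illustration of the internal consistency calculation, the following is a minimal sketch of Cronbach α using only the Python standard library. The function name `cronbach_alpha` and the toy responses are hypothetical and are not study data; the sketch uses the usual formula α = k/(k−1) × (1 − Σ item variances / variance of total score) and does not reproduce the statistical software used for this report.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of per-subject item-score lists.

    rows[s][i] is the score of subject s on item i; every subject must
    have answered the same number of items (k >= 2).
    """
    k = len(rows[0])
    if k < 2 or any(len(r) != k for r in rows):
        raise ValueError("need at least 2 items and equal-length rows")
    # Sample variance of each item across subjects.
    item_variances = [variance([r[i] for r in rows]) for i in range(k)]
    # Sample variance of the total (summed) score across subjects.
    total_variance = variance([sum(r) for r in rows])
    if total_variance == 0:
        raise ValueError("total score has zero variance")
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


if __name__ == "__main__":
    # Hypothetical responses: 5 subjects x 4 items, each scored 0-3.
    toy = [[0, 1, 0, 1], [1, 1, 2, 1], [2, 2, 2, 3], [3, 2, 3, 3], [1, 0, 1, 1]]
    print(round(cronbach_alpha(toy), 2))
```

Using sample rather than population variance does not change the result, since the same denominator appears in both the item and total variances.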

Additional Analyses in Normal Subjects

Correlation between ESS and SWIFT was calculated. Males and females were compared with regard to age, BMI, time in bed, SWIFT and ESS, as were subjects who completed the questionnaire again and those who did not. Correlations were calculated between SWIFT and ESS on the one hand, and age, time in bed and BMI on the other. If there was a significant correlation between SWIFT and age, normal subjects were divided into two age groups to determine if the correlation persisted. If it did not, factor analysis and correlations were performed again by age group.

Correction for Multiple Statistical Analyses

In order to correct for multiple statistical analyses, the false discovery rate method was applied to primary but not to conditional analyses (analyses performed only as a result of another statistically significant analysis).23 This method rank orders the p values of the analyses from smallest to largest. For k analyses, the smallest p value was tested against a threshold of 0.05/k, the next smallest against a threshold of 0.05/(k−1), and so on.
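The sequential thresholds described above can be sketched as follows. This is an illustration only: the function name `sequential_significance` is hypothetical, and stopping at the first p value that fails its threshold is an assumption not stated in the text (it is how step-down procedures with thresholds 0.05/k, 0.05/(k−1), ... are commonly applied).

```python
def sequential_significance(p_values, alpha=0.05):
    """Flag which p values pass the rank-ordered thresholds.

    The smallest p value is compared with alpha/k, the next smallest with
    alpha/(k-1), and so on. ASSUMPTION: testing stops at the first p value
    that exceeds its threshold; all larger p values are not significant.
    """
    k = len(p_values)
    # Indices of the p values, ordered from smallest to largest p.
    order = sorted(range(k), key=lambda i: p_values[i])
    significant = [False] * k
    for rank, idx in enumerate(order):
        threshold = alpha / (k - rank)
        if p_values[idx] <= threshold:
            significant[idx] = True
        else:
            break
    return significant


if __name__ == "__main__":
    # Hypothetical p values from four primary analyses.
    print(sequential_significance([0.001, 0.04, 0.03, 0.012]))
```

The result is returned in the original order of the analyses, so each flag can be matched back to its test.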

Determining Validity

After establishing good reliability for the SWIFT, validity was determined for SWIFT and its factors using data from normal subjects and sleep disordered patients, and SWIFT and its factors were compared with ESS to establish superiority.

Sleep Disordered Patients

All new patients presenting with sleep disorder symptoms to an AASM accredited Sleep Disorders Center during a 15-month period were administered the SWIFT and ESS at the time of the initial evaluation. If appropriate, they underwent polysomnography (PSG) to evaluate for OSA using American Academy of Sleep Medicine (AASM) scoring criteria,24 or a PSG with MSLT to evaluate for narcolepsy. New patients who had previously been evaluated/treated for OSA anywhere were excluded from analysis. All patients with significant OSA (apnea-hypopnea index [AHI] ≥ 15/h sleep, or ≥ 5/h sleep with comorbid sleepiness, hypertension, or cardiovascular disease), evaluated as new patients over the 15-month period, were offered standard treatment. Patients opting for continuous positive airway pressure (CPAP) were administered a PSG with CPAP titration, and were prescribed CPAP at the optimum determined pressure. At a follow-up office visit between 1 and 3 months after CPAP prescription, they were again administered the SWIFT and ESS, and CPAP compliance data were downloaded if available.

Data were available for 286 adult subjects (age ≥ 18 years, 192 males, 94 females) who presented with sleep disorder symptoms and had documented OSA (AHI ≥ 5/h sleep). After excluding subjects with AHI ≥ 5/h sleep on the PSG preceding the MSLT (who were counted among the adult OSA subjects), and patients who were administered the MSLT on CPAP, data were available for 49 adult subjects (17 males, 32 females) who were administered PSG with MSLT for suspicion of narcolepsy (because of unexplained sleepiness with no clinical evidence of OSA, off CNS-active medicines, including psychotropic medicines, for five half-lives). These 49 subjects were independent of and not a subset of the 286 OSA patients. Repeat ESS and SWIFT and compliance data from follow-up visits after CPAP initiation were available for 137 adult OSA patients (98 males, 39 females).

Determining Discriminant Validity

To determine discriminant validity, SWIFT (and SWIFT factor) and ESS scores were compared between the normal subjects and the OSA patients, as well as between normal subjects and patients evaluated for suspicion of narcolepsy. To further determine discriminant validity, SWIFT (and SWIFT factor) and ESS scores were compared in OSA patients before and after CPAP treatment, and the number of patients with abnormal SWIFT (and SWIFT factor) and ESS scores before and after CPAP treatment were compared. Correlations were calculated between compliance and improvement in ESS, SWIFT, and SWIFT factors identified by factor analysis.

Statistics of Diagnostic Tests

A diagnostic test identifies 2 groups: those with the disorder and those without the disorder. The sensitivity (also called true-positive rate) of a test is the probability of a positive test in the disordered population, whereas the specificity is the probability of a negative test in a disorder-free population; the value (1-specificity) is also called the false-positive rate. The positive predictive value is the probability of a subject with a positive test having the disorder. The negative predictive value is the probability that a subject with a negative test does not have the disorder. Sensitivity and specificity are not affected by prevalence of the disorder, whereas positive and negative predictive values are affected. This means sensitivity and specificity can be accurately calculated when using a normal sample and a sample of disordered subjects, but predictive values cannot (they require a population sample for accurate calculations). A test with higher sensitivity and specificity than another is the superior test. However, a test may have higher sensitivity but lower specificity than another, or vice versa. Therefore, comparing 2 tests requires combining specificity and sensitivity. The likelihood ratio of a positive test or positive likelihood ratio (ρ+) is the ratio of the probability of a positive test in a disordered subject (true-positive rate) to the probability of a positive test in a normal subject (false-positive rate), and is calculated as [sensitivity/(1-specificity)]. The likelihood ratio of a negative test or negative likelihood ratio (ρ−) is the ratio of the probability of a negative test in a disordered subject to the probability of a negative test in a normal subject, calculated as [(1-sensitivity)/specificity]. Both likelihood ratios may range from 0 to ∞. A positive likelihood ratio < 1 indicates a useless test, as does a negative likelihood ratio > 1.

With a diagnostic test based on a continuously measured variable, a decision or cutoff threshold allows sensitivity and specificity to be combined into the Youden's index γ, which is the true-positive rate minus the false-positive rate, calculated as [sensitivity-(1-specificity)] (also written as [sensitivity + specificity −1]). A perfect test (with sensitivity and specificity of 1) results in a Youden's index of 1, whereas a useless test has a Youden's index of 0. When the cutoff threshold is increased, the proportions of both true positives (sensitivity) and false positives (1-specificity) will decrease. The receiver operating characteristic (ROC) is a graph of sensitivity against (1-specificity). A perfect test has an area under the ROC (AUC) of 1; a useless test has an AUC of 0.5. Bewick et al. have written an excellent but concise and simple discussion of these tests.25 Since neither specificity nor sensitivity is affected by prevalence of the disorder, positive and negative likelihood ratios, Youden's index, and AUC are also not affected by prevalence when they are applied to population-based samples.
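The indices defined above follow directly from the four cells of a 2 × 2 table. The sketch below is illustrative only: the names `DiagnosticIndices` and `diagnostic_indices` and the counts in the example are hypothetical and are not taken from the study tables.

```python
from dataclasses import dataclass


@dataclass
class DiagnosticIndices:
    sensitivity: float
    specificity: float
    lr_positive: float
    lr_negative: float
    youden: float


def diagnostic_indices(tp, fn, tn, fp):
    """Compute the indices from counts of true/false positives/negatives.

    tp, fn: disordered subjects with positive / negative tests.
    tn, fp: normal subjects with negative / positive tests.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    # Likelihood ratios are unbounded above, so guard the zero denominators.
    lr_positive = sensitivity / (1 - specificity) if specificity < 1 else float("inf")
    lr_negative = (1 - sensitivity) / specificity if specificity > 0 else float("inf")
    youden = sensitivity + specificity - 1
    return DiagnosticIndices(sensitivity, specificity, lr_positive, lr_negative, youden)


if __name__ == "__main__":
    # Hypothetical counts: 100 patients and 100 normal subjects.
    print(diagnostic_indices(tp=60, fn=40, tn=90, fp=10))
```

Because every quantity is computed within the patient group or within the normal group, none of them depends on the ratio of patients to normal subjects in the sample.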

Determining Test Superiority

To determine which test is superior in discriminating normal subjects from sleep disordered patients, the AUC for the 2 tests can be compared. Visually, if the ROC for one test is entirely within the ROC of another test, then the second test seems certainly superior. Confidence intervals can be obtained for the AUC and statistical comparisons performed between the AUC for 2 tests using various nonparametric and binormal methods.26,27 A nonparametric distribution and correlated ROCs were assumed for this report. Different methods for calculating confidence intervals of Youden's index and likelihood ratios also exist, and a general method based on constant χ2 boundaries was used for this analysis.28 However, since the AUC is not dependent on a cutoff threshold, and diagnostic decisions are based on cutoff thresholds, a test may have a smaller AUC yet be more suitable than another, and the AUC is not a suitable measure of diagnostic excellence. Youden's index is a more suitable measure of diagnostic superiority.29 Although Youden's index is a good single summary measure of comparison between two tests, positive and negative likelihood ratios are an even better test of superiority.30 If test A has positive likelihood ratio greater than that for test B, and negative likelihood ratio lesser than that for test B, then test A is superior overall to test B. AUC, Youden's index, and positive and negative likelihood ratios (along with confidence intervals) were calculated for the ESS, SWIFT and its factors, using data from the normal subjects and OSA patients, as well as data from the normal subjects and patients evaluated for suspicion of narcolepsy. The AUC for SWIFT and ESS were compared. If there was a significant difference in favor of SWIFT, then the AUC for its factors were also compared with the AUC for ESS.
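A minimal sketch of two of the comparisons described above is given here. It assumes that higher scores indicate abnormality, and computes the nonparametric AUC as the proportion of patient/normal pairs in which the patient scores higher (ties counted as one half), which is the standard rank-based equivalent of the area under the empirical ROC. The function names are hypothetical, and the confidence intervals and the statistical comparison of correlated AUCs used in this report are not reproduced.

```python
def nonparametric_auc(normal_scores, patient_scores):
    """Area under the empirical ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen patient scores higher
    than a randomly chosen normal subject, counting ties as 0.5.
    """
    wins = 0.0
    for p in patient_scores:
        for n in normal_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(normal_scores) * len(patient_scores))


def superior_by_likelihood_ratios(lr_pos_a, lr_neg_a, lr_pos_b, lr_neg_b):
    """True if test A has the higher positive AND the lower negative likelihood ratio."""
    return lr_pos_a > lr_pos_b and lr_neg_a < lr_neg_b


if __name__ == "__main__":
    # Hypothetical questionnaire scores, not study data.
    normals = [2, 3, 5, 6, 8]
    patients = [6, 9, 11, 12, 15]
    print(nonparametric_auc(normals, patients))
    print(superior_by_likelihood_ratios(4.0, 0.5, 2.5, 0.7))
```

The pairwise loop is quadratic in sample size, which is adequate for samples of a few hundred subjects.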

To determine which test is superior in showing improvement with treatment, effect sizes may be used. However, it is more important clinically to have a cutoff score (such as mean + 1 SD), above which the test is considered high and below which it is considered normal, and to show superiority in conversion of patients from abnormal scores before treatment to normal scores after treatment, therefore χ2 analyses were also performed.
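The cutoff and conversion counts described above can be sketched as follows. The function names and the example scores are hypothetical; the use of the sample standard deviation is an assumption, and the χ2 analysis itself is not reproduced here.

```python
from statistics import mean, stdev


def cutoff_mean_plus_sd(normal_scores, n_sd=1.0):
    """Cutoff at mean + n_sd * SD of the normal sample (sample SD assumed)."""
    return mean(normal_scores) + n_sd * stdev(normal_scores)


def conversion_counts(before, after, cutoff):
    """Cross-tabulate abnormal (score > cutoff) status before vs. on treatment.

    before[i] and after[i] must belong to the same patient.
    """
    if len(before) != len(after):
        raise ValueError("before and after must be paired")
    counts = {
        "abnormal->normal": 0,
        "abnormal->abnormal": 0,
        "normal->normal": 0,
        "normal->abnormal": 0,
    }
    for b, a in zip(before, after):
        start = "abnormal" if b > cutoff else "normal"
        end = "abnormal" if a > cutoff else "normal"
        counts[f"{start}->{end}"] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical scores, not study data.
    normals = [2, 4, 4, 4, 5, 5, 7, 9]
    cutoff = cutoff_mean_plus_sd(normals)
    print(conversion_counts([10, 8, 3, 12], [4, 9, 2, 5], cutoff))
```

The resulting counts form the 2 × 2 table on which a test of conversion from abnormal to normal scores would be based.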

Additional Analyses in Sleep Disordered Patients

Correlation coefficients were calculated between ESS and SWIFT and its factors on the one hand, and sleep efficiency, arousal index, periodic limb movement arousal index (PLMAI), AHI and lowest oxygen saturation (for OSA patients), and mean sleep latency (MSL) and sleep onset REM periods [SOREMPs] (for patients evaluated for narcolepsy) on the other, with corrections for false discovery rate for multiple tests.

RESULTS

Reliability

Cronbach α using data from the 256 normal subjects was 0.87 for SWIFT and 0.80 for ESS. Upon retest, the intraclass correlation coefficient for SWIFT was 0.82 (p < 0.001), and for ESS 0.91 (p < 0.001).

Factor Analysis

Factor analysis of SWIFT with varimax rotation revealed 2 factors: Factor 1 (36% of variance) included 9 items (A1, A4, A5, A6, B1, B2, B3, B4, B5), and was called general wakefulness inability and fatigue (GWIF) based on the generality of the items. Factor 2 (20% of variance) included 3 items (A2, A3, B6), and was called driving wakefulness inability and fatigue (DWIF) based on these items being related to driving. Table 2 gives the factor loadings.
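For illustration, the factor scores can be derived from item responses as in the sketch below. It assumes that the SWIFT total and each factor score are simple sums of the 0–3 item scores, which is consistent with the cutoffs reported here but is an assumption of this sketch; the function name `score_swift` and the dictionary layout are hypothetical.

```python
# Item membership of the two factors, as listed in the factor analysis above.
GWIF_ITEMS = ("A1", "A4", "A5", "A6", "B1", "B2", "B3", "B4", "B5")
DWIF_ITEMS = ("A2", "A3", "B6")


def score_swift(responses):
    """Return SWIFT, GWIF, and DWIF scores from a dict of item responses.

    responses maps each of the 12 item labels (A1-A6, B1-B6) to 0, 1, 2, or 3.
    ASSUMPTION: scores are unweighted sums of the item responses.
    """
    expected = set(GWIF_ITEMS) | set(DWIF_ITEMS)
    if set(responses) != expected:
        raise ValueError("responses must contain exactly items A1-A6 and B1-B6")
    if any(value not in (0, 1, 2, 3) for value in responses.values()):
        raise ValueError("each item must be scored 0, 1, 2, or 3")
    gwif = sum(responses[item] for item in GWIF_ITEMS)
    dwif = sum(responses[item] for item in DWIF_ITEMS)
    return {"SWIFT": gwif + dwif, "GWIF": gwif, "DWIF": dwif}


if __name__ == "__main__":
    # Hypothetical respondent who answers 1 on every item.
    answers = {f"{scale}{n}": 1 for scale in "AB" for n in range(1, 7)}
    print(score_swift(answers))
```

Under this assumption the SWIFT total ranges from 0 to 36, GWIF from 0 to 27, and DWIF from 0 to 9.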

Table 2.

Factor analysis matrix of SWIFT

Additional Analysis of Normal Subjects

ESS was correlated with SWIFT (r = 0.64, p < 0.001). There was no difference between males and females in age, BMI, hours in bed, SWIFT, or ESS. Those who completed the questionnaires again had a lower BMI (24.1 vs. 26.6, equal variances not assumed, t = 2.9, df = 68.6, p = 0.005) than those who did not, but there were no other significant differences. After the false discovery rate correction, there were significant negative correlations between age and SWIFT (r = −0.25, p < 0.001) as well as ESS (r = −0.14, p = 0.024). There were no other significant corrected correlations.

Upon dividing the subject group into young adults (ages 18–45, n = 188) and middle-aged to older adults (age > 45, n = 68), ESS and SWIFT were no longer correlated with age in either group. Table 3 gives the measures of central tendency and dispersion for age, hours in bed, BMI, SWIFT, ESS, and the GWIF and DWIF factors for the 188 young adults and 68 middle-aged to older adults; the 85th percentile generally corresponds very closely to mean + 1 SD, and 95th percentile to mean + 2 SD. Upon performing factor analysis separately for each age group, there were the same 2 factors for young adults, accounting for the same 36% and 20% of variance. For middle-aged to older adults, Factor 2 remained the same (A2, A3, B6) and accounted for 19% of variance. Factor 1 separated into 3 factors. The new Factor 1 (A1, A4, B1, B2, B3, B5) accounted for 27% of variance, while A5 and B4 (17% of variance), and A6 (10% of variance) became new separate factors, suggesting that wakefulness inability/fatigue while reading or studying may separate from general wakefulness inability/fatigue in middle-aged to older adults.

Table 3.

Normal subjects: mean, SD, medians and percentiles

OSA Patients

Of the 286 patients with AHI ≥ 5, 86 were young adults (ages 18–45 years) and 200 were middle-aged to older adults (age > 45 years). The 188 normal young adults differed significantly from the 86 young adults with AHI ≥ 5 in age, SWIFT, GWIF, DWIF, and ESS. Table 4 gives the means and standard deviations. Table 5 gives the AUC and, using cutoffs at greater than mean + 1 SD (> 10 for ESS, > 12 for SWIFT, > 11 for GWIF, and > 1 for DWIF), the sensitivity, specificity, positive likelihood ratio, negative likelihood ratio, and Youden's index. SWIFT and GWIF but not DWIF had better AUC, positive and negative likelihood ratios and Youden's index than ESS. Figure 1 shows that the ROC for ESS was entirely within the ROC for SWIFT and GWIF. However, there was no significant difference between AUC for ESS and SWIFT.

Table 4.

Normal subjects vs. OSA patients: means and SD

Table 5.

Normal subjects vs. OSA patients: indices of test superiority

Figure 1. ROC curves for ESS, SWIFT, GWIF, and DWIF for normal subjects vs. OSA patients in age group 18–45 years.

ESS, Epworth Sleepiness Scale; SWIFT, Sleepiness-Wakefulness Inability and Fatigue Test; GWIF, general wakefulness inability and fatigue factor; DWIF, driving wakefulness inability and fatigue factor.

The 68 normal middle-aged to older adults differed significantly from the 200 middle-aged to older adults with AHI ≥ 5 in age, SWIFT, GWIF, DWIF, and ESS. Table 4 gives the means and standard deviations. Table 5 gives the AUC and, using cutoffs at greater than mean + 1 SD (> 9 for ESS, > 9 for SWIFT, > 8 for GWIF, and > 1 for DWIF), the sensitivity, specificity, positive likelihood ratio, negative likelihood ratio, and Youden's index. SWIFT, GWIF, and DWIF had better AUC, positive and negative likelihood ratios, and Youden's index than ESS. Figure 2 shows the ROC for ESS was entirely within the ROC for SWIFT and GWIF. The AUC was significantly higher for SWIFT (z = 2.36, p = 0.018) than for ESS, and the AUC was also significantly higher for GWIF than for ESS (z = 2.35, p = 0.019), but not for DWIF.

Figure 2. ROC curves for ESS, SWIFT, GWIF, and DWIF for normal subjects vs. OSA patients in age group > 45 years.

ESS, Epworth Sleepiness Scale; SWIFT, Sleepiness-Wakefulness Inability and Fatigue Test; GWIF, general wakefulness inability and fatigue factor; DWIF, driving wakefulness inability and fatigue factor.

There were no significant correlations found in the OSA patients between ESS, SWIFT, GWIF, or DWIF on the one hand, and sleep efficiency, PLMAI, or lowest oxygen saturation on the other. SWIFT (r = 0.16, p = 0.006), GWIF (r = 0.15, p = 0.009), and DWIF (r = 0.14, p = 0.023), but not ESS, were significantly correlated with arousal index. ESS (r = 0.14, p = 0.018) and GWIF (r = 0.14, p = 0.022), but not SWIFT or DWIF, were significantly correlated with AHI.

CPAP Treatment

ESS, SWIFT, GWIF, and DWIF improved significantly in patients on CPAP in both age groups (36 young adults: t = 7.1, df = 35, p < 0.001 for ESS, t = 7.0, df = 35, p < 0.001 for SWIFT, t = 7.4, df = 35, p < 0.001 for GWIF, t = 3.4, df = 35, p = 0.002 for DWIF; 101 middle-aged to older adults: t = 9.7, df = 100, p < 0.001 for ESS, t = 12.2, df = 100, p < 0.001 for SWIFT, t = 11.5, df = 100, p < 0.001 for GWIF, t = 7.7, df = 100, p < 0.001 for DWIF). Effect sizes and 95% confidence intervals for the 137 subjects were as follows: ESS 0.96 (0.07, 1.63), SWIFT 1.07 (−0.20, 1.98), GWIF 1.04 (−0.03, 1.82), DWIF 0.75 (0.43, 0.93). One hundred fourteen of 137 (83.2%) subjects were compliant (use ≥ 4 h/night for ≥ 70% of nights). Compliance was significantly correlated with improvement in SWIFT (r = 0.21, p = 0.015), GWIF (r = 0.18, p = 0.034), and DWIF (r = 0.18, p = 0.032), but not ESS (r = 0.11, p = 0.216). Improvement in SWIFT (r = 0.22, p = 0.011) and GWIF (r = 0.24, p = 0.004) was also significantly correlated with AHI, but improvement in DWIF or ESS was not. Table 6 gives by age group the pre- and post-treatment data, as well as numbers above and below the cutoffs before and after treatment, effect sizes, and χ2 statistics. SWIFT, GWIF, DWIF, and ESS were all valuable in demonstrating conversion from abnormal to normal values with CPAP use.

Table 6.

Before and on CPAP treatment

Patients Evaluated for Narcolepsy

Of the 49 patients evaluated with PSG and MSLT for narcolepsy, 37 were young adults (ages 18–45 years), and 12 were middle-aged to older adults (age > 45 years). Ten of the young adults and none of the middle-aged to older adults met MSLT criteria for diagnosis of narcolepsy, a 20% positive diagnostic rate that is comparable to the 20% (170 of 832) positive diagnostic rate for the MSLT reported earlier in sleepy patients without OSA.31 The young adults with narcolepsy were significantly younger than the young adults without narcolepsy (24.1, SD 5.3 vs. 38.4, SD 13.6), but did not significantly differ from them in ESS, SWIFT, GWIF, or DWIF. The 188 normal young adults differed significantly from the 37 young adults evaluated for suspicion of narcolepsy in SWIFT, GWIF, DWIF, and ESS, but not in age. Table 7 gives the means and standard deviations. Table 8 gives the AUC and, using cutoffs at greater than mean + 1 SD (> 10 for ESS, > 12 for SWIFT, > 11 for GWIF, and > 1 for DWIF), the sensitivity, specificity, positive likelihood ratio, negative likelihood ratio, and Youden's index. SWIFT, GWIF, and DWIF had better AUC, positive and negative likelihood ratios, and Youden's index than ESS. Figure 3 shows the ROC for ESS was entirely within the ROC for SWIFT and GWIF. The AUC was significantly higher for SWIFT (z = 2.29, p = 0.022) than for ESS. The AUC was also significantly higher for GWIF than for ESS (z = 2.07, p = 0.038), but not for DWIF.

Table 7.

Normal subjects vs. MSLT patients: means and SD

Table 8.

Normal subjects vs. MSLT patients: indices of test superiority

Figure 3. ROC curves for ESS, SWIFT, GWIF, and DWIF for normal subjects vs. patients evaluated for narcolepsy in age group 18–45 years.

ESS, Epworth Sleepiness Scale; SWIFT, Sleepiness-Wakefulness Inability and Fatigue Test; GWIF, general wakefulness inability and fatigue factor; DWIF, driving wakefulness inability and fatigue factor.

The 68 normal middle-aged to older adults differed significantly from the 12 middle-aged to older adults evaluated for suspicion of narcolepsy in age, SWIFT, GWIF, DWIF, and ESS. Table 7 gives the means and standard deviations. Table 8 gives the AUC and, using cutoffs at greater than mean + 1 SD (> 9 for ESS, > 9 for SWIFT, > 8 for GWIF, and > 1 for DWIF), the sensitivity, specificity, positive likelihood ratio, negative likelihood ratio, and Youden's index. SWIFT, GWIF and DWIF had better AUC, positive and negative likelihood ratios, and Youden's index than ESS. Figure 4 shows the ROC for ESS was entirely within the ROC for SWIFT and GWIF. However, there was no significant difference between AUC for ESS and SWIFT.
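The indices listed for each cutoff all follow from a 2 × 2 classification in which a score above the cutoff counts as a positive test. A minimal sketch of those calculations follows; the scores are hypothetical and the function name is illustrative, not part of the study's analysis.

```python
def indices_at_cutoff(normal_scores, patient_scores, cutoff):
    """Treat score > cutoff as a positive test and return the usual indices."""
    tp = sum(s > cutoff for s in patient_scores)   # patients correctly flagged
    fn = len(patient_scores) - tp                  # patients missed
    fp = sum(s > cutoff for s in normal_scores)    # normal subjects flagged
    tn = len(normal_scores) - fp                   # normal subjects cleared
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        # LR+ = sensitivity / (1 - specificity); undefined if specificity is 1
        "lr_positive": sensitivity / (1 - specificity) if specificity < 1 else float("inf"),
        # LR- = (1 - sensitivity) / specificity; undefined if specificity is 0
        "lr_negative": (1 - sensitivity) / specificity if specificity > 0 else float("inf"),
        # Youden's index = sensitivity + specificity - 1
        "youden": sensitivity + specificity - 1,
    }

# Hypothetical scores, for illustration only (not study data).
normal = [2, 4, 5, 7, 8, 10, 3, 6, 9, 11]
patients = [6, 9, 12, 15, 10, 8, 14, 13]
for name, value in indices_at_cutoff(normal, patients, cutoff=9).items():
    print(f"{name}: {value:.3f}")
```

A better test has a higher positive likelihood ratio, a lower negative likelihood ratio, and a higher Youden's index at the chosen cutoff.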

Figure 4. ROC curves for ESS, SWIFT, GWIF, and DWIF for normal subjects vs. patients evaluated for narcolepsy in age group > 45 years.


ESS, Epworth Sleepiness Scale; SWIFT, Sleepiness-Wakefulness Inability and Fatigue Test; GWIF, general wakefulness inability and fatigue factor; DWIF, driving wakefulness inability and fatigue factor.

There were no significant correlations found in the patients evaluated for narcolepsy between ESS, SWIFT, GWIF, or DWIF on the one hand and sleep efficiency, AHI, PLMAI, arousal index, or number of SOREMPs on the MSLT, on the other. ESS but not SWIFT, GWIF, or DWIF was significantly negatively correlated with mean sleep latency on the MSLT (r = −0.408, p = 0.004).
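The relation between a Pearson correlation coefficient and its significance can be made explicit through the t statistic, t = r·sqrt((n − 2)/(1 − r²)), with n − 2 degrees of freedom. The sketch below applies it to the reported r; it assumes that all 49 patients evaluated for narcolepsy contributed to the correlation, which the text does not state.

```python
import math

def t_from_r(r: float, n: int) -> float:
    """t statistic for testing a Pearson correlation r against zero (df = n - 2)."""
    return r * math.sqrt((n - 2) / (1 - r * r))

# r as reported in the text; n = 49 is an assumption (all patients evaluated).
n = 49
t = t_from_r(-0.408, n)
print(f"t = {t:.2f} on {n - 2} degrees of freedom")
```

Under that assumption the statistic is about −3.06 on 47 degrees of freedom, which is in line with the reported two-sided p = 0.004.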

DISCUSSION

The SWIFT has high internal consistency as shown by high Cronbach α, and high test-retest reliability shown by high intraclass coefficient. Thus, the SWIFT is a reliable test.
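For readers less familiar with the statistic, Cronbach α for a k-item scale is α = k/(k − 1) × (1 − Σ item variances / variance of the total score). The following is a minimal sketch of that formula using hypothetical responses; it is not the study's data or software, and the function name is illustrative.

```python
from statistics import variance

def cronbach_alpha(item_scores):
    """Cronbach alpha.

    item_scores: list of k lists, one per item, each holding one score per
    respondent (all lists the same length, at least two respondents).
    """
    k = len(item_scores)
    # Total score per respondent, summing across items.
    totals = [sum(responses) for responses in zip(*item_scores)]
    item_variance_sum = sum(variance(item) for item in item_scores)
    return (k / (k - 1)) * (1 - item_variance_sum / variance(totals))

# Hypothetical responses: 3 items answered by 5 respondents.
items = [
    [0, 1, 2, 3, 3],
    [1, 1, 2, 2, 3],
    [0, 2, 1, 3, 2],
]
print(round(cronbach_alpha(items), 2))
```

Values closer to 1 indicate that the items vary together, which is what a high internal consistency reflects.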

The twelve test items of the SWIFT load on to two different factors. Factor 1 seems to be a measure of general wakefulness inability and fatigue, while Factor 2 seems to measure driving wakefulness inability and fatigue, indicating that it may be possible to measure separately general and driving related concepts/symptoms.

The ability of SWIFT, GWIF, and DWIF to discriminate between normal subjects and patients with OSA, as well as patients presenting with symptoms suggesting narcolepsy, shows that the SWIFT is a valid test, as does the ability to show significant improvement with CPAP treatment of OSA.

The SWIFT and GWIF are superior to the ESS (the criterion standard) in discriminating between normal subjects and patients with OSA in both age groups, with regard to sensitivity/specificity/discriminant validity, as shown by AUC (statistically significantly so for middle-aged and older adults), Youden's index, as well as the positive and negative likelihood ratios. The SWIFT and GWIF are also superior to the ESS in discriminating between normal subjects and patients evaluated for narcolepsy in both age groups, with regard to sensitivity/specificity/discriminant validity, as shown by AUC (statistically significantly so for young adults), Youden's index, and positive and negative likelihood ratios. All the ROCs for ESS (young and middle-aged to older adults, patients with OSA, and patients evaluated for narcolepsy) were entirely within the ROCs for SWIFT and GWIF. Given the rarity of narcolepsy, comparisons were made using patients evaluated for narcolepsy rather than patients diagnosed with narcolepsy. However, patients with OSA were excluded from this group; patients not positive for narcolepsy were as sleepy, fatigued, and unable to maintain wakefulness as the patients with narcolepsy. Thus, SWIFT and GWIF may be more useful than ESS in terms of clinical utility in discriminating between normal and sleep disordered subjects.

Effect sizes were similar for improvement in ESS, SWIFT, and GWIF (but lower for DWIF) with CPAP treatment in young adults. In middle-aged to older adults, effect sizes for SWIFT and GWIF were higher than those for ESS and DWIF. Comparisons of the number of patients with high ESS, SWIFT, GWIF, and DWIF before and after CPAP treatment revealed significant differences in both age groups. Improvement in SWIFT, GWIF, and DWIF, but not ESS, was significantly correlated with compliance. This compliance-response relationship lends more confidence in the use of the SWIFT or GWIF rather than the ESS in assessing treatment response with CPAP, despite similar effect sizes for SWIFT, GWIF, and ESS. The finding that only ESS, but not SWIFT, GWIF, or DWIF, is correlated with MSLT suggests that subjects were able to separate the concept of tendency to fall asleep (as measured by the MSLT and the ESS) from wakefulness inability and fatigue. The finding that SWIFT, GWIF, and DWIF, but not ESS, are correlated with arousal index suggests that they are a better measure of lack of sleep quality than the ESS. The finding that wakefulness inability and fatigue did not load on to separate factors on factor analysis suggests that subjects may have a hard time separating these two concepts.

The separation of data into two groups by age, necessitated by a correlation between age and SWIFT as well as ESS in the combined group, provides a built-in replication. Similar findings in the two independent age groups (though more robust in the middle-aged to older adults in the case of OSA and in young adults in the case of patients evaluated for narcolepsy) lend increased confidence to the results.

Mills et al. have reported that predictors of fatigue in OSA include BMI, depression scores, and soluble tumor necrosis factor receptor I (sTNF-RI), but not the severity of OSA as measured by AHI or mean oxygen saturation.33 Tumor necrosis factor-α (TNF-α) and interleukin-6 (IL-6) are increased in OSA and narcolepsy.32 Adding measures of fatigue to the measurement scale for daytime functioning, and changing the measurement scale to measure wakefulness inability rather than tendency to fall asleep, may improve the measurement of the daytime consequences of sleep disorders.

Masa et al.34 reported habitual sleepiness affecting 3.6% of drivers, with an odds ratio of 13.7 for highway automobile accidents, and with considerable ESS overlap between these subjects and controls; half of the habitually sleepy drivers had ESS < 9. This suggests that propensity to fall asleep in other circumstances (as measured by ESS) is neither necessary nor sufficient to cause increased risk for auto accidents. Although a sample of 10,000 subjects with 71% response rate showed no correlation between the ESS and the adverse consequence of automobile accidents, there was a correlation with dozing off while stopped in traffic.18 Increased risk for auto accidents may be the result of a complex mix of wakefulness inability, fatigue, and inattention/cognitive impairment, all of which may occur in sleep disordered or sleep deprived subjects. Measurement of increased risk for auto accidents may require questions directly related to wakefulness inability and fatigue while driving, as in the DWIF factor of the SWIFT. Whether DWIF is predictive of risk for auto accidents needs to be elucidated in further research.

This study was designed to determine if the SWIFT is a reliable and valid instrument, and if it is superior to the criterion standard, the ESS, in terms of specificity/sensitivity/discriminant ability, and therefore, possibly, clinical utility. We have shown that the SWIFT is reliable and has discriminant validity, that it has two factors (GWIF and DWIF), and that the SWIFT and its GWIF factor are superior to the ESS in discriminating between normal subjects and sleep disordered patients. These tests measure sleepiness/wakefulness inability and assist in screening/diagnosis. However, they are not meant to discriminate between different causes of wakefulness inability or sleepiness. SWIFT should be added to ESS in evaluating daytime consequences of sleep disorders. The two tests together comprise 20 questions and can form a quick questionnaire for use in the office to screen for sleepiness, wakefulness inability, and fatigue, with cutoffs of > 10 for ESS, > 12 for SWIFT, > 11 for GWIF, > 1 for DWIF in young adults (ages 18–45 years), and with cutoffs of > 9 for ESS, > 9 for SWIFT, > 8 for GWIF, > 1 for DWIF in middle-aged to older adults (age > 45 years).
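The age-specific cutoffs above could be applied in an office setting along the following lines. This is a sketch only: the cutoff values are those stated in the text, while the function name, the dictionary layout, and the example scores are illustrative assumptions, and scoring of the individual questionnaire items is not shown.

```python
# Cutoffs from the text: a score strictly greater than the value is "high".
CUTOFFS = {
    "young": {"ESS": 10, "SWIFT": 12, "GWIF": 11, "DWIF": 1},  # ages 18-45 years
    "older": {"ESS": 9, "SWIFT": 9, "GWIF": 8, "DWIF": 1},     # age > 45 years
}

def flag_high_scores(age, scores):
    """Return, for each scale in `scores`, whether it exceeds the age-specific cutoff."""
    if age < 18:
        raise ValueError("cutoffs were derived in adults aged 18 years or older")
    group = "young" if age <= 45 else "older"
    limits = CUTOFFS[group]
    return {scale: value > limits[scale] for scale, value in scores.items()}

# Hypothetical 52-year-old patient.
print(flag_high_scores(52, {"ESS": 9, "SWIFT": 10, "GWIF": 9, "DWIF": 1}))
# -> {'ESS': False, 'SWIFT': True, 'GWIF': True, 'DWIF': False}
```

In this hypothetical case the ESS sits at, but not above, its cutoff, while the SWIFT and GWIF are flagged as high.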

A limitation of this study is that item selection was based on face validity rather than qualitative evaluation using patient focus groups. Another limitation is that the control group was recruited by means of a group e-mail to medical students and by personal solicitation in public places, and it is not clear whether this cohort of normal subjects generalizes to the population and whether it is comparable to the patient groups presented. Further, although the SWIFT and its factors are a better measure for differentiating between normal subjects and sleep disordered patients than the ESS, the areas under the curve still leave a lot to be desired. However, though there may eventually be a simple blood test to measure sleepiness, wakefulness inability, and fatigue, for now we are left with questionnaires as possibly the best proxies, though objective measures such as the psychomotor vigilance test or the divided attention driving test are other candidates.35 This study was a clinical rather than an experimental study. Since the MWT is not routinely performed clinically, this study did not compare the SWIFT with the MWT. Future directions might include a study of the SWIFT using the MWT.

DISCLOSURE STATEMENT

This was not an industry supported study. The author has indicated no financial conflicts of interest.

REFERENCES

  • 1. 2002 “Sleep in America” Poll. National Sleep Foundation website. [Accessed February 11, 2011]. http://www.sleepfoundation.org/sites/default/files/2002SleepInAmericaPoll.pdf Published 2002.
  • 2. Drowsy Driving and Automobile Crashes. I: Introduction. National Highway Traffic Safety Administration website. [Accessed May 22, 2012]. http://www.nhtsa.gov/people/injury/drowsy_driving1/drowsy.html
  • 3. Chervin RD. Sleepiness, fatigue, tiredness, and lack of energy in obstructive sleep apnea. Chest. 2000;118:372–9. doi: 10.1378/chest.118.2.372.
  • 4. Carskadon MA, Dement WC, Mitler MM, Roth T, Westbrook PR, Keenan S. Guidelines for the multiple sleep latency test (MSLT): a standard measure of sleepiness. Sleep. 1986;9:519–24. doi: 10.1093/sleep/9.4.519.
  • 5. Johns MW. A new method for measuring daytime sleepiness: the Epworth Sleepiness Scale. Sleep. 1991;14:540–5. doi: 10.1093/sleep/14.6.540.
  • 6. Johns MW. Reliability and factor analysis of the Epworth Sleepiness Scale. Sleep. 1992;15:376–81. doi: 10.1093/sleep/15.4.376.
  • 7. Arand D, Bonnet M, Hurwitz T, Mitler M, Rosa R, Sangal RB. The clinical use of the MSLT and MWT. Sleep. 2005;28:123–44. doi: 10.1093/sleep/28.1.123.
  • 8. Mitler MM, Gujavarty KS, Browman CP. Maintenance of wakefulness test: a polysomnographic technique for evaluating treatment in patients with excessive sleepiness. Electroencephalogr Clin Neurophysiol. 1982;53:658–61. doi: 10.1016/0013-4694(82)90142-0.
  • 9. Doghramji K, Mitler MM, Sangal RB, et al. A normative study of the maintenance of wakefulness test (MWT). Electroencephalogr Clin Neurophysiol. 1997;103:554–62. doi: 10.1016/s0013-4694(97)00010-2.
  • 10. Sangal RB, Thomas L, Mitler MM. The maintenance of wakefulness test (MWT) and the multiple sleep latency test (MSLT) measure different abilities in patients with sleep disorders. Chest. 1992;101:898–902. doi: 10.1378/chest.101.4.898.
  • 11. Browman CP, Gujavarty KS, Sampson MG, Mitler MM. REM sleep episodes during the maintenance of wakefulness test in patients with sleep apnea syndrome and patients with narcolepsy. Sleep. 1983;6:23–8. doi: 10.1093/sleep/6.1.23.
  • 12. Sangal RB, Thomas L, Mitler MM. Disorders of excessive sleepiness: treatment improves ability to stay awake but does not reduce sleepiness. Chest. 1992;102:699–703. doi: 10.1378/chest.102.3.699.
  • 13. Sanford SD, Lichstein KL, Durrence HH, Riedel BW, Taylor DJ, Bush AJ. The influence of age, gender, ethnicity, and insomnia on Epworth Sleepiness Scores: a normative U.S. population. Sleep Med. 2006;7:319–26. doi: 10.1016/j.sleep.2006.01.010.
  • 14. Sangal RB, Sangal JM, Belisle C. Subjective and objective indices of sleepiness (ESS and MWT) are not equally useful in patients with sleep apnea. Clin Electroencephalogr. 1999;30:73–5. doi: 10.1177/155005949903000208.
  • 15. Sangal RB, Mitler MM, Sangal JM. Subjective sleepiness ratings (Epworth Sleepiness Scale) do not reflect the same parameter of sleepiness as objective sleepiness (maintenance of wakefulness test) in patients with narcolepsy. Clin Neurophysiol. 1999;110:2131–5. doi: 10.1016/s1388-2457(99)00167-4.
  • 16. Johns MW. Sleepiness in different situations measured by the Epworth Sleepiness Scale. Sleep. 1994;17:703–10. doi: 10.1093/sleep/17.8.703.
  • 17. Benbadis SR, Mascha E, Perry MC, et al. Association between the Epworth sleepiness scale and the multiple sleep latency test in a clinical population. Ann Intern Med. 1999;130:289–92. doi: 10.7326/0003-4819-130-4-199902160-00014.
  • 18. Gander PH, Marshall NS, Harris RB, Reid P. Sleep, sleepiness and motor vehicle accidents: a national survey. Aust N Z J Public Health. 2005;29:16–21. doi: 10.1111/j.1467-842x.2005.tb00742.x.
  • 19. Merriam-Webster Dictionary. [Accessed January 20, 2012]. http://www.merriam-webster.com.
  • 20. Sangal RB. When is sleepiness a disease? How do we measure it? Sleep Med. 2006;7:310–1. doi: 10.1016/j.sleep.2006.02.002.
  • 21. Stein KD, Martin SC, Hann DM, Jacobsen PB. A multidimensional measure of fatigue for use with cancer patients. Cancer Pract. 1998;6:143–52. doi: 10.1046/j.1523-5394.1998.006003143.x.
  • 22. Stein KD, Jacobsen PB, Blanchard CM, Thors CT. Further validation of the Multidimensional Fatigue Symptom Inventory-Short Form (MFSI-SF). J Pain Symptom Manage. 2004;27:14–23. doi: 10.1016/j.jpainsymman.2003.06.003.
  • 23. Curran-Everett D. Multiple comparisons: philosophies and illustrations. Am J Physiol Regul Integr Comp Physiol. 2000;279:R1–R8. doi: 10.1152/ajpregu.2000.279.1.R1.
  • 24. Iber C, Ancoli-Israel S, Chesson AL, Quan SF. The AASM manual for the scoring of sleep and associated events: rules, terminology and technical specifications. Westchester, IL: American Academy of Sleep Medicine; 2007.
  • 25. Bewick V, Cheek L, Ball J. Statistics review 13: Receiver operating characteristic curves. Crit Care. 2004;8:508–12. doi: 10.1186/cc3000.
  • 26. DeLong ER, DeLong DM, Clarke-Pearson DL. Comparing the area under two or more correlated receiver operating characteristic curves: a nonparametric approach. Biometrics. 1988;44:837–45.
  • 27. Hanley JA, McNeil BJ. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology. 1982;143:29–36. doi: 10.1148/radiology.143.1.7063747.
  • 28. Fleiss JL. Statistical methods for rates and proportions. New York: John Wiley & Sons; 1981. sec 5.6.
  • 29. Hilden J, Glasziou P. Regret graphs, diagnostic uncertainty and Youden's Index. Stat Med. 1996;15:969–86. doi: 10.1002/(SICI)1097-0258(19960530)15:10<969::AID-SIM211>3.0.CO;2-9.
  • 30. Biggerstaff BJ. Comparing diagnostic tests: a simple graphic using likelihood ratios. Stat Med. 2000;19:649–63. doi: 10.1002/(sici)1097-0258(20000315)19:5<649::aid-sim371>3.0.co;2-h.
  • 31. Aldrich MS, Chervin RD, Malow BA. Value of the multiple sleep latency test (MSLT) for the diagnosis of narcolepsy. Sleep. 1997;20:620–9.
  • 32. Vgontzas AN, Papanicolaou DA, Bixler EO, Kales A, Tyson K, Chrousos GP. Elevation of plasma cytokines in disorders of excessive daytime sleepiness: role of sleep disturbance and obesity. J Clin Endocrinol Metab. 1997;82:1313–6. doi: 10.1210/jcem.82.5.3950.
  • 33. Mills PJ, Kim J-H, Bardwell W, Hong S, Dimsdale JE. Predictors of fatigue in obstructive sleep apnea. Sleep Breath. 2008;12:397–9. doi: 10.1007/s11325-008-0192-8.
  • 34. Masa JF, Rubio M, Findley LJ. Habitually sleepy drivers have a higher frequency of automobile crashes associated with respiratory disorders during sleep. Am J Respir Crit Care Med. 2000;162:1407–12. doi: 10.1164/ajrccm.162.4.9907019.
  • 35. Sunwoo BY, Jackson N, Maislin G, Gurubhagavatula I, George CF, Pack AI. Reliability of a single objective measure in assessing sleepiness. Sleep. 2012;35:149–58. doi: 10.5665/sleep.1606.

