Abstract
Current case definitions of Myalgic Encephalomyelitis (ME) and chronic fatigue syndrome (CFS) have been based on consensus methods, but empirical methods could be used to identify core symptoms and thereby improve reliability. In the present study, several methods (i.e., continuous symptom scores, and theoretically and empirically derived symptom cutoff scores) were used to identify the core symptoms that best differentiate patients from controls. In addition, data mining with decision trees was conducted. Our study found a small number of core symptoms with good sensitivity and specificity: fatigue, post-exertional malaise, a neurocognitive symptom, and unrefreshing sleep. Outcomes from these analyses suggest that using empirically selected symptoms can help guide the creation of a more reliable case definition.
Keywords: Myalgic Encephalomyelitis, chronic fatigue syndrome, biomarkers, case definitions
Considerable controversy surrounds the illnesses known as chronic fatigue syndrome (CFS), Myalgic Encephalomyelitis (ME), and Myalgic Encephalomyelitis/chronic fatigue syndrome (ME/CFS). The terms CFS and ME were introduced to describe outbreaks of illness based upon their symptoms, and the degree to which they overlap or are the same is currently under debate. Patients experience debilitating fatigue in addition to other physical and cognitive symptoms, and substantial recovery from CFS or ME occurs in less than 10% of cases (Cairns & Hotopf, 2005). The estimated annual direct and indirect costs of this illness to society exceed 18 billion dollars (Jason, Benton, Johnson, & Valentine, 2008). Community-based CFS prevalence estimates range from .42% (Jason et al., 1999) to 2.54% (Reeves et al., 2007), and these discrepancies might be due to criteria variance.
The most widely used consensus-based CFS case definition is based on the Fukuda et al. (1994) criteria, which require four of eight core symptoms. Because of this polythetic method of selecting four out of eight symptoms, it is possible that some individuals who meet these criteria do not have core symptoms of the illness, such as post-exertional malaise, memory and concentration problems, and unrefreshing sleep (Jason, Brown, Evans, Sunnquist, & Newton, 2013). In contrast, Carruthers et al. (2003) developed the Canadian ME/CFS consensus-based clinical criteria, which require seven core symptoms. Carruthers et al. (2011) more recently developed another consensus-based case definition, the Myalgic Encephalomyelitis International Consensus Criteria (ME-ICC), which further increased the number of required symptoms to eight. We use the terms ME, CFS, and ME/CFS and suggest that these syndromes may be different from each other; each has a case definition with a different set of criteria. These terms have been used to describe multi-symptom outbreaks of syndromes in disparate geographic areas, and whether they are the same or different remains an argument that continues to this day. As for the term ME/CFS, it was first proposed by patient advocate groups who wished to have ME precede the name of their illness (CFS) to counter the stigma that has become associated with CFS as a fatiguing illness that might be more in one's mind than in one's abnormal physiology. The term ME/CFS was granted legitimacy when it was recommended by the Chronic Fatigue Syndrome Advisory Committee, and later when Dennis Mangan, the NIH moderator of the NIH Chronic Fatigue Syndrome State of Knowledge Workshop, stated that, out of deference to the patients, the NIH would henceforth refer to the illness as ME/CFS.
The Canadian ME/CFS consensus criteria (Carruthers et al., 2003) and the ME-ICC criteria (Carruthers et al., 2011) do identify a smaller subset of patients with more severe symptoms and physical functioning impairment (Jason, Brown, et al., 2013), but both are consensus-based rather than empirical. In addition, case definitions that require higher numbers of core symptoms can contribute to higher rates of psychiatric comorbidity (Brown, Jason, Evans, & Flores, 2013; Katon & Russo, 1992). In a recent systematic review, Brurberg, Fønhus, Larun, Flottorp, and Malterud (2014) identified 20 case definitions. While the Fukuda et al. (1994) criteria were the most frequently used, validation of the case definitions has been inconsistent, and no studies have rigorously assessed their reliability in accurately capturing people with the illness.
Sources of diagnostic unreliability include subject, occasion, and information variance; however, criteria variance (differences in the formal inclusion and exclusion criteria used by clinicians to classify patients' data into diagnostic categories) accounts for the largest share (Jason & Choi, 2008). Criteria variance is most likely to occur when there are varying criteria across contrasting case definitions. Because the validity of a diagnostic category is inherently limited by its reliability, categories that lack reliability also lack accuracy. Problems of criteria variance have plagued case definitions involving CFS, ME/CFS, and ME.
Advanced statistical methods could be used to evaluate these consensus criteria as well as suggest a more empirically-based case definition, which could deal with the problem of criteria variance. For example, factor analytic studies have explored latent factors (Arroll & Senior, 2009; Brown & Jason, 2014; Friedberg, Dechene, McKenzie, & Fontanetta, 2000; Hickie et al., 2009; Jason, Corradi & Torres-Harding, 2007), and the domains of neurocognitive impairments and post-exertional malaise are common, whereas fewer studies identify pain, autonomic, immune and neuroendocrine factors.
Other statistical selection techniques can also help reveal which symptoms are the most useful in distinguishing between patients and healthy individuals and, hence, which symptoms are most characteristic of the illness. For example, using data mining, Jason, Skendrovic et al. (2012) found that core features of the illness, including the inability to concentrate, post-exertional malaise, and unrefreshing sleep, best discriminated patients from non-patients. However, that data set was limited in size, and efforts were not directed toward developing an empirical case definition. In another data set, Jason, Sunnquist, Brown, Evans, Vernon, et al. (2014) also found that items involving fatigue, post-exertional malaise, and neurocognitive problems differentiated patients from controls. However, that study had several limitations. The way the authors determined whether the frequency and severity of symptoms were severe enough to meet criteria was not derived empirically, so it is unclear whether similar findings would occur when using more rigorous methods. In addition, only one data mining analysis was conducted; to avoid any one sample from affecting the results, it is important to create multiple sets. The sample sizes for patients and controls were also not equivalent, which poses additional problems for data mining. Finally, individuals with the identified core symptoms had not been compared with those without the core symptoms on any symptom or disability measures, so it was unclear whether these core symptoms identified a more impaired group of patients.
The present study attempted to overcome these limitations by empirically developing symptom and frequency cutoff points, creating multiple data sets, using equivalent samples of patients and controls, and examining whether patients identified with core symptoms would evidence more impairment than those without those core symptoms. However, the intent of this study was to gain clarity of case definition for research purposes as opposed to clinical purposes. We hypothesized that symptoms assessing post-exertional malaise, neurocognitive problems and unrefreshing sleep would best differentiate the patients from controls. Theoretical support for these symptoms includes the fact that each of the various case definitions does list these foundational symptoms, but some employ choice through a polythetic method (Fukuda et al., 1994) whereas others require them (Carruthers et al., 2003, 2011). Empirical support for these symptoms derives from factor analytic studies (e.g., Brown & Jason, 2014) as well as predictors differentiating CFS from Major Depressive Disorder (Hawk, Jason, & Torres-Harding, 2006). We also hypothesized that empirical methods could identify case definition criteria involving fewer core symptoms than more consensus based approaches.
Method
Participants
SolveCFS BioBank sample
Data from the SolveCFS BioBank were de-identified and shared with the DePaul research team by the Solve ME/CFS Initiative (SMCI). The SolveCFS BioBank has clinical information and blood samples from individuals who were diagnosed by a licensed physician using either the Fukuda et al. (1994) CFS criteria or the Carruthers et al. (2003) Canadian ME/CFS criteria. Individuals with medical or psychiatric reasons for their fatigue were excluded, as this is a requirement of current case definitions. Some patients who were receiving care, and whose treatment may have had a beneficial effect, were included; however, all still met case definition criteria for either CFS or ME/CFS. All individuals included in the present study were over 18 years of age. Participants were recruited by the SMCI through physician clinics. All participants who met eligibility criteria completed a written informed consent process. Control participants were recruited who were in generally good physical and mental health and did not have a substance use disorder or any disorder that could cause immunosuppression. Controls could not have any medical condition or mental health disorder that caused fatigue. Participants completed the study measures electronically or by hard copy.
Measures
The DePaul Symptom Questionnaire
All participants completed the DePaul Symptom Questionnaire (DSQ) (Jason, Evans et al., 2010), a self-report measure of symptomatology, demographics, and medical, occupational, and social history. Participants were asked to rate the frequency and severity of 54 symptoms on a 5-point Likert scale. Symptom frequency was rated: 0=none of the time, 1=a little of the time, 2=about half the time, 3=most of the time, and 4=all of the time. Likewise, severity was rated: 0=symptom not present, 1=mild, 2=moderate, 3=severe, and 4=very severe. The DSQ has evidenced good test-retest reliability among both patient and control groups (Jason, So, et al., in press). A factor analysis by Brown and Jason (2014) found a three-factor solution, with factors evidencing good internal consistency. The DSQ is available at REDCap's shared library: https://redcap.is.depaul.edu/surveys/?s=tRxytSPVVw
RAND 36-Item Health Survey (Version 1.0)
RAND-36 is a self-report questionnaire that measures the impact of physical and mental health on functioning (Ware & Sherbourne, 1992). Low scores indicate that an individual's health is affecting his or her functioning; higher scores indicate less of an impact. Test construction studies have shown adequate internal consistency, significant discriminant validity among subscales, and substantial differences between patient and non-patient populations (McHorney, Ware, Lu, & Sherbourne, 1994).
Statistics
Methods for replacing missing values
In examining the frequency and severity ratings of the 54 DSQ symptoms, participants missing responses to 10% or more items were removed. Of the participants who remained (233 individuals with CFS and 80 controls), there were 137 instances of missing values (about 0.4% of the total data). If there is a high rate of missing data, this could inflate Type I error. However, as Fidell and Tabachnick (2003) have argued, if less than 5% is missing, then there is not a problem with imputing data. In our study, the percentage of values that were missing was very low. The approach we used in this paper was used by Watson et al. (2014). These missing values were replaced using the following method: For the cases that had a score of 0 for either frequency or severity of a symptom and were missing the other field, the missing value was set to 0; the rationale was that a symptom should occur “none of the time” (frequency=0) if and only if the symptom is “not present” (severity=0). Otherwise, if a subject was missing data in only one of the two fields (frequency or severity) for a symptom, then the missing value was replaced with the mode value from the cases that had the same score for the non-missing field. When both fields were missing for a symptom, the values were replaced with the overall medians in those fields for that symptom.
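The replacement rules above can be sketched as follows. This is an illustrative implementation with hypothetical column names, not the authors' actual code:

```python
import numpy as np
import pandas as pd

def impute_symptom(df, freq_col, sev_col):
    """Impute missing frequency/severity values for one symptom,
    following the three rules described in the text (a sketch)."""
    f, s = df[freq_col].copy(), df[sev_col].copy()

    # Rule 1: if one field is 0 and the other is missing, set the missing field to 0
    # (a symptom occurs "none of the time" iff it is "not present").
    df.loc[f.isna() & (s == 0), freq_col] = 0
    df.loc[s.isna() & (f == 0), sev_col] = 0

    # Rule 2: if only one field is missing, replace it with the mode of that field
    # among cases sharing the same value on the non-missing field.
    for col, other in ((freq_col, sev_col), (sev_col, freq_col)):
        missing = df[col].isna() & df[other].notna()
        for val in df.loc[missing, other].unique():
            mode = df.loc[df[other] == val, col].mode()
            if not mode.empty:
                df.loc[missing & (df[other] == val), col] = mode.iloc[0]

    # Rule 3: if both fields are missing, use each field's overall median.
    both = df[freq_col].isna() & df[sev_col].isna()
    df.loc[both, freq_col] = df[freq_col].median()
    df.loc[both, sev_col] = df[sev_col].median()
    return df
```

Applied symptom by symptom, this reproduces the decision order described above: the zero-pairing rule first, then the conditional mode, then the overall medians.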
Receiver Operating Characteristic (ROC) curve analysis
For this analysis, a composite score for each symptom was created by averaging its frequency and severity scores and multiplying the result by 25; thus, possible scores ranged from 0 to 100.
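As an illustration, the composite score and a symptom's discriminative accuracy can be computed as follows; the ratings are hypothetical, and scikit-learn's `roc_auc_score` stands in for whatever ROC software was actually used:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Hypothetical frequency and severity ratings (0-4) for one symptom,
# plus case status (1 = patient, 0 = control).
freq   = np.array([4, 3, 4, 2, 1, 0, 1, 0])
sev    = np.array([3, 3, 4, 2, 0, 0, 1, 1])
status = np.array([1, 1, 1, 1, 0, 0, 0, 0])

# Composite score: average of frequency and severity, scaled to 0-100.
composite = (freq + sev) / 2 * 25

# Area under the ROC curve for this single continuous predictor.
auc = roc_auc_score(status, composite)
```

Because every patient in this toy sample scores above every control, the AUC here is 1.0; real symptoms, as Table 1 shows, fall somewhat below that.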
Classification accuracy of individual symptoms
The symptoms listed in the DSQ were converted into binary variables for use in the next analysis, as we wanted to develop a method for determining when a symptom met a threshold that indicated it was a significant problem for the patient. In other words, binary variables for each symptom indicated whether or not the participant reported frequency and severity levels that met a minimum threshold. Initially, a threshold was applied that was defined in a prior study (Jason, Sunnquist, Brown, Evans, Vernon, et al., 2014): a symptom's frequency and severity scores needed to be greater than or equal to 2 (symptoms of at least moderate severity that occur at least half of the time). We assumed that this would be a reasonable threshold for discriminating somewhat serious symptoms from those that were relatively mild and not impairing (below we provide a more empirical way of determining this threshold). The resulting binary symptoms derived from this 2,2 threshold were used to test the predictive accuracy of each symptom in discriminating between patients and healthy controls. A benefit of this threshold is that it has face validity and is theoretically appealing; furthermore, it is easier to interpret than a continuous score.
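The 2,2 binarization and the resulting single-symptom classification accuracy can be sketched with hypothetical ratings:

```python
import numpy as np

# Hypothetical frequency/severity ratings (0-4) for one symptom
# and case status (1 = patient, 0 = control).
freq   = np.array([4, 3, 2, 1, 2, 0, 1, 2])
sev    = np.array([3, 2, 2, 2, 0, 0, 1, 1])
status = np.array([1, 1, 1, 1, 0, 0, 0, 0])

# 2,2 threshold: a symptom is "present" only when BOTH
# frequency and severity are greater than or equal to 2.
present = (freq >= 2) & (sev >= 2)

# Accuracy of this single binary symptom as a classifier:
# proportion of participants whose symptom status matches case status.
accuracy = np.mean(present == status.astype(bool))
```

In this toy sample one patient falls below the threshold, so the symptom alone classifies 7 of 8 participants correctly (accuracy = 0.875), mirroring how the percentages in Table 2 were obtained.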
Next, as an alternative to applying a static, 2,2 threshold for all symptoms, empirical methods were used to determine the frequency and severity scores that best discriminate patients and controls for each individual symptom. The threshold was dynamically adjusted for each symptom based on observed frequency and severity scores, similar to Watson et al.'s (2014) use of unsupervised learning. Supervised machine learning techniques, as used in Hanson, Gause, and Natelson (2001), are only valid insofar as they reflect the initial diagnosis criteria. In order to develop an empirical definition, it is imperative to minimize any reliance on pre-existing case definitions so that results do not simply mirror selection biases of the prior definitions. In the current study, a k-means clustering approach was used. Generally speaking, the k-means algorithm iteratively divides coordinate points into a predetermined number of clusters based on which cluster center the point lies closest to. In this case, the k-means clustering algorithm was set to find two clusters, based on the underlying assumption that the data consisted of patient and control groups. Frequency and severity scores for each symptom were treated as coordinate pairs for the purpose of cluster assignment, and the Euclidean distance was used to measure closeness to the cluster centers. After equilibrium was reached, the perpendicular bisector of the line between cluster centers was found. This bisecting line was used as the threshold; frequency-severity pairs above the threshold line were considered “symptom present,” whereas scores below the line were considered “symptom not present.”
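The k-means step above can be sketched as follows, on hypothetical frequency/severity pairs; scikit-learn's `KMeans` is assumed here as the clustering implementation:

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical (frequency, severity) pairs for one symptom:
# controls cluster near the origin, patients toward the upper right.
pairs = np.array([[0, 0], [1, 0], [0, 1], [1, 1],
                  [3, 3], [4, 3], [3, 4], [4, 4]], dtype=float)

# Two clusters, on the assumption that the data mix patients and controls.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(pairs)
c0, c1 = km.cluster_centers_

# The threshold is the perpendicular bisector of the segment joining
# the two cluster centres; its normal vector is the centre-to-centre line.
midpoint = (c0 + c1) / 2
direction = c1 - c0
if direction.sum() < 0:          # orient the normal toward high scores
    direction = -direction

def symptom_present(point):
    """True when the pair lies on the high-scoring side of the bisector."""
    return float(np.dot(point - midpoint, direction)) > 0
```

With these points the centres land near (0.5, 0.5) and (3.5, 3.5), so the bisector separates the low and high groups, and each symptom gets its own empirically placed cut line rather than a fixed 2,2 rule.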
Data mining
All symptoms were placed into the analyses, rather than one symptom at a time. In the current study, decision trees were used to determine which symptoms were most effective at accurately classifying participants as either a patient or control. Decision trees consist of a series of successive binary choices (branch points) that result in an accurate classification of participants.
SPSS Statistics software was used to build our decision tree models. To build the models, a Classification and Regression Tree (CART) algorithm was applied to a training set consisting of 66% of the cases, stratified to reflect the distribution of patient and control groups. The value of the model was measured by evaluating its classification performance on cases reserved for testing (34% of the data), allowing its ability to generalize to new data to be assessed. Data mining in general, and decision trees specifically, are biased when label sets are not of equal size. We therefore took a random subsample of 80 patients along with all 80 controls, omitting the other 153 patients. To avoid any one training or testing subsample from affecting the result of the analysis, we created 100 such sets (a random subsample of 80 patients and all 80 controls) for analysis. For most analyses, only three to five variables were needed to classify participants.
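The repeated balanced-subsample procedure can be sketched as below. The data here are synthetic, and scikit-learn's CART implementation (`DecisionTreeClassifier`) stands in for SPSS; the sample sizes mirror those reported above:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic binary symptom data: rows are participants, columns are
# 2,2-thresholded symptoms; patients endorse symptoms far more often.
n_symptoms = 10
X_patients = (rng.random((233, n_symptoms)) < 0.8).astype(int)
X_controls = (rng.random((80, n_symptoms)) < 0.1).astype(int)

used_symptoms = np.zeros(n_symptoms)
for trial in range(100):
    # Balance the classes: 80 randomly chosen patients vs. all 80 controls.
    idx = rng.choice(len(X_patients), size=80, replace=False)
    X = np.vstack([X_patients[idx], X_controls])
    y = np.array([1] * 80 + [0] * 80)

    # Stratified 66/34 train/test split, then a CART tree.
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.34, stratify=y, random_state=trial)
    tree = DecisionTreeClassifier(max_depth=5, random_state=0).fit(X_tr, y_tr)

    # Tally which symptoms this tree actually used as split nodes
    # (leaf nodes are coded with negative feature indices).
    splits = tree.tree_.feature[tree.tree_.feature >= 0]
    used_symptoms += np.bincount(splits, minlength=n_symptoms)

# Symptoms ranked by how often they appeared across the 100 trees.
core = np.argsort(used_symptoms)[::-1]
```

Counting how often each symptom appears as a split node across the 100 trees is what yields a ranking like Table 3, where a few symptoms dominate the node positions.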
Comparison of groups
To further explore the results of the decision tree analyses, three groups of participants were compared: healthy controls, participants diagnosed with CFS or ME/CFS who met the 2,2 frequency and severity criteria for the symptoms identified in the decision tree analyses, and participants diagnosed with CFS or ME/CFS who did not meet these 2,2 criteria. As we had unequal sample sizes and unequal variances, we selected statistical tests which would accommodate these data problems. Welch's F tests and Games-Howell post-hoc tests were conducted to compare the RAND-36 subscale scores and 100-point symptom scores of these groups. Additionally, a total symptom score was computed by summing each participant's frequency and severity scores for the 54 DSQ symptoms; a Welch's F test and Games-Howell post hoc test were conducted to compare the total symptom scores of the three groups.
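As context for the group comparisons, Welch's F statistic can be computed directly from the standard formula; this is a sketch of that formula, not the authors' code, with SciPy supplying the F-distribution p-value:

```python
import numpy as np
from scipy.stats import f as f_dist

def welch_anova(*groups):
    """Welch's F test for k groups with unequal variances and sample sizes."""
    k = len(groups)
    n = np.array([len(g) for g in groups], dtype=float)
    m = np.array([np.mean(g) for g in groups])
    v = np.array([np.var(g, ddof=1) for g in groups])

    w = n / v                              # precision weights
    grand = np.sum(w * m) / np.sum(w)      # weighted grand mean
    a = np.sum(w * (m - grand) ** 2) / (k - 1)
    b = np.sum((1 - w / np.sum(w)) ** 2 / (n - 1))

    F = a / (1 + 2 * (k - 2) * b / (k ** 2 - 1))
    df1 = k - 1
    df2 = (k ** 2 - 1) / (3 * b)           # Welch's adjusted denominator df
    p = f_dist.sf(F, df1, df2)
    return F, df1, df2, p
```

Note that the denominator degrees of freedom are non-integer, which is why values such as F(2, 179.3) appear in the Results.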
Results
Demographics
Jason, Sunnquist, Brown, Evans, Vernon, et al. (2014) report demographic characteristics of this sample. About three quarters of the sample were female and 98-99% were White; these demographics are comparable to much of the published literature. Significant differences existed in work status between the control group and those that met the Fukuda et al. (1994) CFS criteria [p < .001, two-tailed Fisher's exact test], the Carruthers et al. (2003) Canadian ME/CFS criteria [p < .001, two-tailed Fisher's exact test], and the Carruthers et al. (2011) ME-ICC criteria [p < .001, two-tailed Fisher's exact test]. Most of the individuals in the control group were working, while about 70% of the patient groups were on disability. Additionally, a significant difference was found when comparing the marital status of those meeting the Fukuda et al. CFS criteria and the control group [p = .03, two-tailed Fisher's exact test], as a larger proportion of the Fukuda et al. CFS group were single.
Receiver Operating Characteristic (ROC) Curve Analysis
Table 1 shows the AUCs for the ten most accurate symptoms using continuous DSQ scores. Accuracy was determined based on each individual symptom's ability to correctly predict CFS or healthy control status, and it was used to generate the ROC curves (an area under the curve of .90 or better is considered very good). Fatigue, post-exertional malaise, and neurocognitive symptoms were among the most accurate items, and unrefreshing sleep was also among the top ten. These findings provided evidence that our hypothesized symptoms are among the most accurate predictors.
Table 1. Area Under the Curve (AUC) for the Top 10 Symptoms.
| Individual Symptoms | AUC |
|---|---|
| Fatigue / Extreme tiredness | 0.978 |
| Physically drained / sick after mild activity | 0.971 |
| Minimum exercise makes you physically tired | 0.968 |
| Next-day soreness after non-strenuous activities | 0.952 |
| Dead, heavy feeling after starting to exercise | 0.951 |
| Unrefreshing sleep | 0.946 |
| Mental tiredness after slightest effort | 0.945 |
| Slowness of thought | 0.943 |
| Muscle weakness | 0.940 |
| Difficulty finding the right word to say or expressing thoughts | 0.931 |
Classification Accuracy of Individual Symptoms
Table 2 provides the most accurate symptoms when the 2,2 threshold was applied; among the most accurate were fatigue, post-exertional malaise, neurocognitive, and sleep symptoms. Using the unsupervised learning approach, in which the threshold was dynamically adjusted for each symptom based on observed frequency and severity scores, we found results comparable to the 2,2 criteria analysis (Table 2), confirming the usefulness of the simpler 2,2 criteria.
Table 2. Accuracy Using Multiple Thresholds for Top 10 Symptoms.
| 2,2 Threshold Individual Symptoms | Accuracy |
|---|---|
| Fatigue / Extreme tiredness | 94.9% |
| Minimum exercise makes you physically tired | 89.8% |
| Unrefreshing sleep | 89.5% |
| Physically drained / sick after mild activity | 87.5% |
| Next-day soreness after non-strenuous activities | 86.6% |
| Dead, heavy feeling after starting to exercise | 85.9% |
| Problems remembering things | 83.4% |
| Difficulty finding the right word to say or expressing thoughts | 79.2% |
| Muscle weakness | 78.9% |
| Can only focus on one thing at a time | 78.0% |

| Dynamic Threshold Individual Symptoms | Accuracy |
|---|---|
| Fatigue / Extreme tiredness | 95.5% |
| Dead, heavy feeling after starting to exercise | 88.2% |
| Minimum exercise makes you physically tired | 88.2% |
| Physically drained / sick after mild activity | 87.9% |
| Next-day soreness after non-strenuous activities | 87.5% |
| Unrefreshing sleep | 87.2% |
| Problems remembering things | 83.7% |
| Muscle weakness | 81.8% |
| Difficulty finding the right word to say or expressing thoughts | 80.5% |
| Can only focus on one thing at a time | 78.9% |
Data Mining
In Table 3, the data mining analyses suggested the selection of four symptoms (using the 2,2 criteria): fatigue or extreme tiredness, difficulty finding the right word to say or expressing thoughts, physically drained/sick after mild activity, and unrefreshing sleep. In particular, these were all symptoms that appeared in a majority of the 100 classification trees. Figure 1 shows that 62% of patients referred by medical specialists had these four symptoms, and these criteria are referred to as the four-symptom criteria.
Table 3. Decision Tree Analyses Across 100 Repeated Samples of Individual Symptoms (2,2 Threshold). Node columns give the number of times each symptom group was used at that node.

| Selected Symptom (Symptom Group) | As Node 1 | As Node 2 | As Node 3 | As Node 4 | As Node 5 | Total Use |
|---|---|---|---|---|---|---|
| Fatigue / Extreme tiredness | 94 | 2 | 2 | 0 | 0 | 98 |
| Difficulty finding the right word to say or expressing thoughts | 1 | 1 | 74 | 0 | 0 | 76 |
| Physically drained / sick after mild activity | 4 | 64 | 1 | 1 | 0 | 70 |
| Unrefreshing sleep | 0 | 1 | 6 | 20 | 26 | 53 |
| Joint pain | 0 | 2 | 3 | 25 | 2 | 32 |
| Sensitivity to noise | 0 | 7 | 2 | 11 | 0 | 20 |
| Muscle pain | 0 | 0 | 1 | 6 | 12 | 19 |
| Dead, heavy feeling after starting to exercise | 1 | 8 | 0 | 0 | 0 | 9 |
| Problems remembering things | 0 | 8 | 0 | 0 | 0 | 8 |
| Sensitivity to bright lights | 0 | 4 | 2 | 1 | 0 | 7 |
| Muscle weakness | 0 | 0 | 1 | 4 | 0 | 5 |
| Can only focus on one thing at a time | 0 | 1 | 0 | 4 | 0 | 5 |
Average performance metrics across 100 trials: sensitivity = 95.2%, specificity = 91.4%, accuracy = 93.3%.
Figure 1. Individuals referred by medical specialists in CFS and ME/CFS.

Comparison of Groups
Table 4 displays the mean RAND-36 subscale scores of healthy controls, individuals diagnosed with CFS or ME/CFS who did not meet the four-symptom criteria, and individuals diagnosed with CFS or ME/CFS who met the four-symptom criteria. Welch's F-tests indicated that these groups were significantly different on all eight subscales: Physical Functioning [F(2, 179.3) = 535.54, p < .001], Role Physical [F(2, 148.5) = 639.57, p < .001], Bodily Pain [F(2, 182.7) = 172.53, p < .001], General Health [F(2, 172.8) = 462.87, p < .001], Social Functioning [F(2, 179.9) = 452.26, p < .001], Mental Health [F(2, 192.1) = 18.67, p < .001], Role Emotional [F(2, 190.1) = 18.17, p < .001], and Vitality [F(2, 154.2) = 355.23, p < .001]. Games-Howell post hoc tests revealed significant differences between the control group and both patient groups on all eight subscales. The group that met the four-symptom criteria showed significantly worse Physical Functioning, Bodily Pain, General Health, Social Functioning, and Vitality scores than the patient group that did not meet criteria.
Table 4. Comparison of RAND-36 and Symptom Scores.
| | Control | Did Not Meet Criteria¹ | Met Criteria¹ | Sig. |
|---|---|---|---|---|
| | (n = 80) | (n = 88) | (n = 145) | |
| | M (SD) | M (SD) | M (SD) | |
| RAND-36 Subscale² | | | | |
| Physical Functioning | 94.5 (9.3)ab | 46.6 (24.6)ac | 32.0 (20.9)bc | *** |
| Role Physical | 93.4 (20.7)ab | 7.5 (18.3)a | 2.8 (12.5)b | *** |
| Bodily Pain | 86.0 (15.2)ab | 55.4 (24.7)ac | 40.5 (21.5)bc | *** |
| General Health | 82.2 (13.9)ab | 30.1 (19.5)ac | 23.0 (14.9)bc | *** |
| Social Functioning | 93.2 (14.3)ab | 41.2 (28.5)ac | 22.4 (21.5)bc | *** |
| Mental Health Functioning | 80.1 (15.0)ab | 69.9 (14.5)a | 65.9 (20.3)b | *** |
| Role Emotional | 92.0 (20.8)ab | 78.2 (37.3)a | 66.7 (43.3)b | *** |
| Vitality | 71.3 (17.7)ab | 20.2 (16.3)ac | 11.8 (12.1)bc | *** |
| Symptom Scores³ | | | | |
| Fatigue / Extreme tiredness | 17.6 (19.3)ab | 71.6 (19.7)ac | 84.6 (13.3)bc | *** |
| Physically drained / sick after mild activity | 2.8 (10.8)ab | 56.6 (31.7)ac | 80.0 (17.3)bc | *** |
| Difficulty finding the right word to say or expressing thoughts | 15.0 (15.1)ab | 39.8 (21.9)ac | 76.0 (17.0)bc | *** |
| Unrefreshing sleep | 23.9 (22.6)ab | 69.8 (25.8)ac | 84.6 (15.9)bc | *** |
| Total symptom score | 37.9 (37.7)ab | 151.1 (49.3)ac | 221.8 (55.4)bc | *** |
*** p < .001
¹ “Criteria” indicate frequency and severity scores of 2 or greater for the four symptoms identified in the decision tree analyses.
² Higher scores indicate better functioning.
³ Higher scores indicate worse symptoms.
Table 4 also displays the three groups’ mean scores for the symptoms included in the four-symptom criteria as well as the total symptom score. As expected, Welch's F-tests evidenced significant differences among groups for all four symptoms: fatigue / extreme tiredness [F(2, 154.4) = 381.37, p < .001], physically drained / sick after mild activity [F(2, 173.8) = 858.18, p < .001], difficulty finding the right word to say or expressing thoughts [F(2, 173.7) = 394.24, p < .001], and unrefreshing sleep [F(2, 151.5) = 225.23, p < .001], as well as the total symptom score [F(2, 192.3) = 443.97, p < .001]. In order to control for Type I error, Games-Howell post hoc tests were used to show whether each group was significantly different from all other groups. The patient group that met the four-symptom criteria had significantly worse scores than the patient group that did not meet criteria, and both groups had significantly worse scores than controls.
Conclusion
The findings of this study suggest that the core symptoms of this illness are fatigue, post-exertional malaise, a neurocognitive symptom, and unrefreshing sleep. These findings were consistent when using continuous scores, theoretically and empirically derived cut-off scores, and data mining analyses. These results are theoretically compatible with other studies, such as Hawk, Jason, and Torres-Harding's (2006) investigation, which found that these domains successfully differentiated patients with CFS from those with Major Depressive Disorder. Factor analytic studies also suggest these are among the most common domains found for this illness (Brown & Jason, 2014). Other symptoms, such as pain, autonomic, immune, and neuroendocrine symptoms, are less prevalent but still important; scores on these domains could be specified as secondary areas of assessment. The present study suggests that empirical methods can be used to help determine which symptoms to include in the case definition.
For the Canadian ME/CFS case definition (Carruthers et al., 2003), seven symptoms need to be present for a patient to meet criteria, whereas eight are required for the ME-ICC (Carruthers et al., 2011). However, using empirical data mining methods, only four symptoms were required to differentiate patients from controls. These have the advantage of referring to specific core symptoms rather than using the polythetic method of four out of eight symptoms of the Fukuda et al. (1994) criteria. Using the same data set as the present study, Jason, Brown, Evans, Sunnquist, and Newton (2013) found that the Fukuda et al. (1994) criteria identified 93% of the referred sample, whereas the Canadian ME/CFS clinical criteria (Carruthers et al., 2003) identified 73% of the sample. In addition, our best estimate for the ME-ICC criteria (Carruthers et al., 2011) from two other patient data sets (Jason, Sunnquist, Brown, Evans, & Newton, 2014) indicated that approximately 58% of cases would be identified. Figure 1 graphically portrays how the four-symptom criteria identified in the present study classified 62% as meeting the new empirical criteria. In other words, these four-symptom criteria identified fewer patients than the Fukuda et al. (1994) CFS criteria or the Canadian ME/CFS criteria (Carruthers et al., 2003), and slightly more than the ME-ICC criteria (Carruthers et al., 2011).
Table 4 indicates that participants who met the four-symptom criteria showed significantly more impairment than healthy controls and individuals with CFS or ME/CFS who did not meet these criteria. Of interest, those who met the four-symptom criteria did not show worse role emotional or mental health functioning than those who did not meet the four criteria. Furthermore, mean RAND-36 scores of individuals who met these criteria were similar to those who met the Canadian ME/CFS criteria (Jason, Brown, et al., 2013). The Canadian ME/CFS criteria require information on 54 symptoms in order to determine whether individuals have symptoms from the seven required domains. The results from the current study indicate that individuals identified using fewer, but empirically selected, symptoms can evidence comparable disability to those who meet other case definitions that require more symptoms.
The present study was methodologically stronger than the prior one by Jason, Sunnquist, Brown, Evans, Vernon, et al. (2014), as the current study first identified core symptoms by using continuous scores with ROC curve analyses, and then compared the unsupervised learning approach for determining thresholds with the more theoretically driven 2,2 criteria. The current study found that the 2,2 threshold was comparable to the continuous method as well as the empirically defined threshold, thus providing support for the 2,2 criteria. In addition, the current study differed from the prior Jason, Sunnquist et al. study by conducting data mining with equal sample sizes for patients and controls, reporting on 100 sets of data mining analyses as opposed to just one, and comparing those who met the four-symptom criteria with those who did not on disability measures. Findings identified four-symptom criteria that seem to differentiate patients with this illness.
Although the criteria identified in this paper resulted in a group of participants with statistically significantly worse scores on impairment measures, some of the differences between patients who met the four-symptom criteria and those who did not were small. For example, those who met the four-symptom criteria had a mean score of 2.8 on the role physical subscale, those who did not had 7.5, and the controls had 93.4. It would seem that those who did not meet criteria were still experiencing clinically significant impairment on that subscale, fairly comparable to those who did meet criteria. However, Table 4 shows only three areas without a significant difference, and, in all cases, those who met criteria had worse scores. It is possible that we have identified two groups of patients, and future work can focus on better understanding their differential characteristics.
It is possible that even what we have shown to be primary symptoms are not present at all stages of the illness. In addition, some patients were receiving care, and that management or treatment might have reduced the severity of symptoms as assessed in the current work. It is also possible that some of these patients did not have post-exertional malaise as defined by our questions, in which case they would have been misclassified. Finally, it is important to note that whether one uses the four-symptom criteria or other criteria such as those of Fukuda et al. (1994), those with other causes for CFS or ME symptoms (e.g., cancer, medications) need to be excluded, and those exclusionary criteria also need to be developed using empirical rather than solely consensus methods.
Future studies might also be directed toward determining whether the symptoms we have classified as secondary contribute less to the illness burden. For example, pain is a major contributor to incapacity and self-reported symptom burden; thus, some might challenge its relegation to a secondary role. Others might expect that autonomic dysfunction would occur in a larger proportion of patients if measures beyond those examining orthostatic intolerance were used, as autonomic dysfunction can affect many organs and may lead to a wide variety of symptoms, including gastrointestinal, genitourinary, temperature-dysregulation, and ocular symptoms.
In addition, there is a need to include biological indices rather than just self-report data to confirm differences in diagnostic classifications. For example, Brenu et al. (2013) found natural killer cell activity significantly decreased for both the Fukuda et al. (1994) and the ME-ICC (Carruthers et al., 2011) case definitions, but only those diagnosed with the ME-ICC had significant correlations between physical status and some immune parameters. Finally, the results of the current study need to be replicated.
This study has implications for assessment science and practice. Criteria variance is most likely to occur when operationally explicit criteria do not exist for diagnostic categories, or when varying criteria exist for contrasting case definitions. If the current CFS, ME/CFS, and ME diagnostic categories lack reliability and accuracy, their validity is inherently limited. Considerable debate is ongoing within the scientific community regarding how to deal with this criteria-variance problem (Jason, Najar, Porter, & Reh, 2009), and the research presented in this study suggests that empirical strategies have many advantages over consensus-based approaches in addressing this issue. Improving the case definition is critical for enabling investigators to better understand the etiology, epidemiology, pathophysiology, and treatment of ME, ME/CFS, and CFS.
Acknowledgments
Funding was provided by the Eunice Kennedy Shriver National Institute of Child Health and Human Development (Grant Number R01HD072208) and the National Institute of Allergy and Infectious Diseases (Grant Number AI105781). The authors appreciate the Solve ME/CFS Initiative (formerly the CFIDS Association of America), which approved the use of de-identified SolveCFS BioBank registry data in this analysis.
Contributor Information
Leonard A. Jason, DePaul University.
Bobby Kot, DePaul University.
Madison Sunnquist, DePaul University.
Abigail Brown, DePaul University.
Meredyth Evans, DePaul University.
Rachel Jantke, DePaul University.
Yolonda Williams, DePaul University.
Jacob Furst, DePaul University.
Suzanne D. Vernon, Solve ME/CFS Initiative.
References
- Arroll MA, Senior V. Symptom typology and sub-grouping in chronic fatigue syndrome. Bulletin of the IACFS/ME. 2009;17(2):39–52. [Google Scholar]
- Brenu EW, Johnston S, Hardcastle SL, Huth TK, Fuller K, Ramos SB, Marshall-Gradisnik SM. Immune abnormalities in patients meeting new diagnostic criteria for chronic fatigue syndrome/myalgic encephalomyelitis. Molecular Biomarkers & Diagnosis. 2013;4:3. [Google Scholar]
- Brown A, Jason LA. Validating a measure of Myalgic Encephalomyelitis/chronic fatigue syndrome symptomatology. Fatigue: Biomedicine, Health & Behavior. 2014 doi: 10.1080/21641846.2014.928014. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Brown AA, Jason LA, Evans MA, Flores S. Contrasting case definitions: The ME International Consensus Criteria vs. the Fukuda et al. CFS Criteria. North American Journal of Psychology. 2013;15(1):103–120. [PMC free article] [PubMed] [Google Scholar]
- Brurberg KG, Fønhus MS, Larun L, Flottorp S, Malterud K. Case definitions for chronic fatigue syndrome/myalgic encephalomyelitis (CFS/ME): a systematic review. BMJ Open. 2014;4:e003973. doi: 10.1136/bmjopen-2013-003973. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Cairns R, Hotopf M. A systematic review describing the prognosis of chronic fatigue syndrome. Occupational Medicine. 2005;55(1):20–31. doi: 10.1093/occmed/kqi013. [DOI] [PubMed] [Google Scholar]
- Carruthers BM, Jain AK, De Meirleir KL, Peterson DL, Klimas NG, Lerner AM, van de Sande MI. Myalgic Encephalomyelitis/chronic fatigue syndrome: Clinical working case definition, diagnostic and treatments protocols. Journal of Chronic Fatigue Syndrome. 2003;11:7–115. [Google Scholar]
- Carruthers BM, van de Sande MI, De Meirleir KL, Klimas NG, Broderick G, Mitchell T, Stevens S. Myalgic Encephalomyelitis: International Consensus Criteria. Journal of Internal Medicine. 2011 (published online 20 July 2011). doi: 10.1111/j.1365-2796.2011.02428.x. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Fidell LS, Tabachnick BG. Preparatory data analysis. In: Schinka JA, Velicer WF, editors. Handbook of psychology: Vol. 2. Research methods in psychology. Hoboken, NJ: John Wiley & Sons; 2003. pp. 115–140. [Google Scholar]
- Friedberg F, Dechene L, McKenzie MJ, II, Fontanetta R. Symptom patterns in long-duration chronic fatigue syndrome. Journal of Psychosomatic Research. 2000;48(1):59–68. doi: 10.1016/s0022-3999(99)00077-x. [DOI] [PubMed] [Google Scholar]
- Fukuda K, Straus SE, Hickie I, Sharpe MC, Dobbins JG, Komaroff A. The chronic fatigue syndrome: A comprehensive approach to its definition and study. Annals of Internal Medicine. 1994;121:953–959. doi: 10.7326/0003-4819-121-12-199412150-00009. [DOI] [PubMed] [Google Scholar]
- Hanson SJ, Gause W, Natelson B. Detection of immunologically significant factors for chronic fatigue syndrome using neural-network classifiers. Clinical and Diagnostic Laboratory Immunology. 2001;8:658–662. doi: 10.1128/CDLI.8.3.658-662.2001. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Hawk C, Jason LA, Torres-Harding S. Differential diagnosis of chronic fatigue syndrome and major depressive disorder. International Journal of Behavioral Medicine. 2006;13:244–251. doi: 10.1207/s15327558ijbm1303_8. [DOI] [PubMed] [Google Scholar]
- Hickie I, Davenport T, Vernon SD, Nisenbaum R, Reeves WC, Hadzi-Pavlovic D, Lloyd A. Are chronic fatigue and chronic fatigue syndrome valid clinical entities across countries and health-care settings? Australian and New Zealand Journal of Psychiatry. 2009;43:25–35. doi: 10.1080/00048670802534432. [DOI] [PubMed] [Google Scholar]
- Holmes GP, Kaplan JE, Gantz NM, Komaroff AL, Schonberger LB, Strauss SS, Brus I. Chronic Fatigue Syndrome: A working case definition. Annals of Internal Medicine. 1988;108:387–389. doi: 10.7326/0003-4819-108-3-387. [DOI] [PubMed] [Google Scholar]
- Jason LA, Benton M, Johnson A, Valentine L. The economic impact of ME/CFS: Individual and societal level costs. Dynamic Medicine. 2008;7:6. doi: 10.1186/1476-5918-7-6. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Jason LA, Brown AA, Clyne E, Bartgis L, Evans M, Brown M. Contrasting case definitions for chronic fatigue syndrome, myalgic encephalomyelitis/chronic fatigue syndrome, and myalgic encephalomyelitis. Evaluation and the Health Professions. 2012;35:280–304. doi: 10.1177/0163278711424281. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Jason LA, Brown M, Evans M, Anderson V, Lerch A, Brown A, Porter N. Measuring substantial reduction in functioning in patients with chronic fatigue syndrome. Disability & Rehabilitation. 2011;33(7):589–98. doi: 10.3109/09638288.2010.503256. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Jason LA, Brown A, Evans M, Sunnquist M, Newton JL. Contrasting Chronic Fatigue Syndrome versus Myalgic Encephalomyelitis/Chronic Fatigue Syndrome. Fatigue: Biomedicine, Health & Behavior. 2013;1:168–183. doi: 10.1080/21641846.2013.774556. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Jason LA, Choi M. Dimensions and assessment of fatigue. In: Watanabe Y, Evengard B, Natelson BH, Jason LA, Kuratsune H, editors. Fatigue Science for Human Health. Tokyo: Springer; 2008. pp. 1–16. [Google Scholar]
- Jason LA, Damrongvachiraphan D, Hunnell J, Bartgis L, Brown A, Evans M, Brown M. Myalgic Encephalomyelitis: Case definitions. Autonomic Control of Physiological State and Function. 2012;1:1–14. [Google Scholar]
- Jason LA, Evans M, Brown M, Porter N, Brown A, Hunnell J, Lerch A. Fatigue scales and chronic fatigue syndrome: Issues of sensitivity and specificity. Disability Studies Quarterly. 2011;31 [PMC free article] [PubMed] [Google Scholar]
- Jason LA, Najar N, Porter N, Reh C. Evaluating the Centers for Disease Control's empirical chronic fatigue syndrome case definition. Journal of Disability Policy Studies. 2009;20:93–100. [Google Scholar]
- Jason LA, Richman JA, Rademaker AW, Jordan KM, Plioplys AV, Taylor RR, Plioplys S. A community-based study of chronic fatigue syndrome. Archives of Internal Medicine. 1999;159:2129–2137. doi: 10.1001/archinte.159.18.2129. [DOI] [PubMed] [Google Scholar]
- Jason LA, Skendrovic B, Furst J, Brown A, Weng A, Bronikowski C. Data mining: Comparing the empiric CFS to the Canadian ME/CFS case definition. Journal of Clinical Psychology. 2012;68:41–49. doi: 10.1002/jclp.20827. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Jason LA, So S, Brown A, Sunnquist M, Evans M. Test-retest reliability of the DePaul Symptom Questionnaire. Fatigue: Biomedicine, Health, and Behavior. doi: 10.1080/21641846.2014.978110. in press. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Jason LA, Sunnquist M, Brown A, Evans M, Newton JL. Are Myalgic Encephalomyelitis and chronic fatigue syndrome different illnesses? Journal of Health Psychology. 2014 doi: 10.1177/1359105313520335. Advance online publication. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Jason LA, Sunnquist M, Brown A, Evans M, Vernon SD, Furst J, Simonis V. Examining case definition criteria for chronic fatigue syndrome and Myalgic Encephalomyelitis. Fatigue: Biomedicine, Health, and Behavior. 2014;2(1):40–56. doi: 10.1080/21641846.2013.862993. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Johnston S, Brenu EW, Staines DR, Marshall-Gradisnik S. The adoption of chronic fatigue syndrome/myalgic encephalomyelitis case definitions to assess prevalence: A systematic review. Annals of Epidemiology. 2013a;23(6):371–6. doi: 10.1016/j.annepidem.2013.04.003. [DOI] [PubMed] [Google Scholar]
- Johnston S, Brenu EW, Staines D, Marshall-Gradisnik S. The prevalence of chronic fatigue syndrome/myalgic encephalomyelitis: a meta-analysis. Clinical Epidemiology. 2013b;5:105–110. doi: 10.2147/CLEP.S39876. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Katon W, Russo J. Chronic fatigue syndrome criteria. A critique of the requirement for multiple physical complaints. Archives of Internal Medicine. 1992;152:1604–1609. doi: 10.1300/J092v13n02_01. [DOI] [PubMed] [Google Scholar]
- McHorney CA, Ware JE, Lu RL, Sherbourne D. The MOS 36-item Short-Form Health Survey (SF-36): III. Tests of data quality, scaling assumptions, and reliability across diverse patient groups. Medical Care. 1994;32:40–66. doi: 10.1097/00005650-199401000-00004. [DOI] [PubMed] [Google Scholar]
- Reeves WC, Jones JF, Maloney E, Heim C, Hoaglin DC, Boneva RS, Devlin R. Prevalence of chronic fatigue syndrome in metropolitan, urban, and rural Georgia. Population Health Metrics. 2007;5(5):1–10. doi: 10.1186/1478-7954-5-5. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Vinik AI, Maser RE, Mitchel BD, Freeman R. Diabetic autonomic neuropathy. Diabetes Care. 2003;26:1553–1579. doi: 10.2337/diacare.26.5.1553. [DOI] [PubMed] [Google Scholar]
- Ware JE, Sherbourne CD. The MOS 36-item Short-Form health survey (SF-36): Conceptual framework and item selection. Medical Care. 1992;30:473–483. [PubMed] [Google Scholar]
- Watson S, Ruskin A, Simonis V, Jason L, Sunnquist M, Furst J. Identifying defining aspects of chronic fatigue syndrome via unsupervised machine learning and feature selection. International Journal of Machine Learning and Computing. 2014;4:133–138. [Google Scholar]
