Abstract
Background: Psychotic symptoms have been under-investigated in Huntington's disease (HD) and research is needed in order to elucidate the characteristics linked to the unique phenotype of HD patients presenting with psychosis.
Objective: To evaluate the frequency and factors associated with psychosis in HD.
Methods: Cross-sectional study including manifest individuals with HD from the Enroll-HD database. Both conventional statistical analysis (Stepwise Binary Logistic Regression) and five machine learning algorithms [Least Absolute Shrinkage and Selection Operator (LASSO); Elastic Net; Support Vector Machines (SVM); Random Forest; and class-weighted SVM] were used to describe factors associated with psychosis in manifest HD patients.
Results: Approximately 11% of patients with HD presented history of psychosis. Logistic regression analysis indicated that younger age at HD clinical diagnosis, lower number of CAG repeats, history of [alcohol use disorders, depression, violent/aggressive behavior and perseverative/obsessive behavior], lower total functional capacity score, and longer time to complete trail making test-B were associated with psychosis. All machine learning algorithms were significant (chi-square p < 0.05) and capable of distinguishing individual HD patients with history of psychosis from those without a history of psychosis with prediction accuracy around 71–73%. The most relevant variables were similar to those found in the conventional analyses.
Conclusions: Psychiatric and behavioral symptoms as well as poorer cognitive performance were related to psychosis in HD. In addition, psychosis was associated with lower number of CAG repeats and younger age at clinical diagnosis of HD, suggesting that these patients may represent a unique phenotype in the HD spectrum.
Keywords: Huntington's disease, psychosis, behavior, Enroll-HD, machine learning
Introduction
Huntington's disease (HD) is traditionally classified as a movement disorder as its formal diagnosis is based on the unequivocal presence of otherwise unexplained extrapyramidal motor symptoms, for example chorea, dystonia, bradykinesia and rigidity (1, 2). Chorea is the most prominent symptom in the early stages of adult- or late-onset HD. Incoordination, bradykinesia and rigidity tend to predominate in early-onset HD and in the late stages of the more common adult-onset HD (1). While the presence of motor symptoms is required for the clinical diagnosis of HD, cognitive impairment is also a core characteristic of the disease, and can emerge years before the diagnosis (3). Along with motor and cognitive changes, psychiatric issues complete the triad of signs and symptoms that characterize HD. Psychiatric symptoms can be present across all stages of HD, even preceding the onset of motor impairment. Although not universal, they are common and may be a cause of significant distress in HD. Psychiatric manifestations in HD include depression, irritability, apathy, obsessions, and occasionally psychosis (4).
Psychosis is defined by the presence of delusions and/or hallucinations (5). The prevalence of psychotic symptoms in HD patients is variable, ranging from 3 to 11% (6). Psychosis can be very distressing for both individuals with HD and their caregivers (7). Interestingly, the presence of psychotic symptoms has defined a specific phenotype in some Huntington pedigrees. In these cases, psychosis was the most prominent symptom and predated motor and cognitive changes in most affected members across generations (8).
Psychotic symptoms have been under-investigated in HD and research is needed in order to elucidate the characteristics potentially linked to the unique phenotype of HD patients presenting with psychosis. Therefore, the current study was carried out to evaluate the factors associated with psychosis in a large database of people with HD. Taking advantage of the Enroll-HD database (9), we used both conventional statistical analyses and multivariate machine learning methods to describe the factors associated with psychosis in HD. The results of this study might contribute to the understanding of psychosis in HD and, ultimately, to improving the management of these patients.
Methods
Study design and participants
This study was based on information provided by the Enroll-HD database (9). Enroll-HD is a worldwide longitudinal observational study whose sites are located in North America, Latin America, Europe, Australia and New Zealand. Among other goals, Enroll-HD was designed to provide information about the dynamic phenotypic spectrum of HD and to promote the acquisition of knowledge about standards of care to inform clinical decisions, improving the health outcomes for the participant/family unit (9).
We used the periodic dataset containing Enroll-HD participants who met the criteria for inclusion into the dataset as of November 1, 2015 (PDS2, Wave 1 sample). Cross-sectional data from baseline visits were obtained, with the data set being composed of 4,146 participants. From these, we excluded 971 participants whose number of CAG repeats was <36 (455 genotype negative and 516 family controls). The participants with a genetic diagnosis of HD (i.e., a CAG repeat length on the larger allele ≥36) were divided into premanifest and manifest subjects. According to the Enroll-HD data handling manual, participants were classified as premanifest if they had the gene expansion for HD (larger CAG allele ≥36) but no clinical diagnosis of HD [i.e., diagnostic confidence level (DCL) <4 on question 17 of the Unified Huntington's Disease Rating Scale (UHDRS)]. The manifest group was composed of participants with the gene expansion for HD (larger CAG allele ≥36) and a clinical diagnosis of HD based on the presence of unequivocal motor signs, i.e., DCL from the UHDRS = 4. A comparison between premanifest (N = 861) and manifest (N = 2,314) subjects revealed that while only 1.3% of premanifest patients presented a history of psychosis, this number was much higher when considering manifest subjects (10.8%, p < 0.001, Fisher's exact test). Hence, we decided to run the analyses considering only manifest subjects. Figure 1A summarizes the exclusion process we applied in this study.
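The grouping rule described above can be written down compactly. The following is a minimal sketch, not the Enroll-HD data handling code: the function name, argument names and returned labels are hypothetical, and only the thresholds (larger CAG allele ≥36, DCL = 4) and the sample sizes come from the text.

```python
def classify_participant(cag_larger_allele: int, dcl: int) -> str:
    """Return 'control', 'premanifest' or 'manifest' (hypothetical labels)."""
    if cag_larger_allele < 36:
        return "control"        # no HD gene expansion: excluded from the analysis
    if dcl == 4:
        return "manifest"       # expansion plus unequivocal motor signs
    return "premanifest"        # expansion carrier with DCL < 4


# Sample sizes reported in the text
total, excluded = 4146, 971
premanifest, manifest = 861, 2314
remaining = total - excluded    # participants with a larger CAG allele >= 36
```

The reported counts are internally consistent: the 3,175 participants remaining after exclusion equal the sum of the premanifest and manifest groups.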
Searching for factors associated with psychosis: conventional statistical analysis
First, we performed univariate analyses in order to investigate differences between HD patients with (N = 248) and without (N = 2,055) history of psychosis. Associations between dichotomous variables were assessed with Fisher's exact test. All continuous variables were tested to assess whether they followed a Gaussian distribution using the Shapiro-Wilk normality test. The two groups (history of psychosis vs. no history of psychosis) were compared using the Mann–Whitney U-test since the data did not follow a normal distribution. Then, a binary logistic regression was performed to determine which variables (among general clinical characteristics, medical history of substance abuse and psychiatric and behavioral problems, motor and functional capacity and cognitive performance) were significantly associated with history of psychosis. A backward stepwise regression was used and all variables described in Table 1 were included in the initial model, except those exhibiting some degree of interdependency: (i) mother OR father affected by HD; and (ii) variables representing history of abuse of specific drugs (marijuana, heroin, cocaine, etc.), since they are dependent on the variable “history of drugs abuse.” Therefore, the following variables were included in the initial model: age, sex, age at motor symptoms onset, age at HD clinical diagnosis (based on the presence of unequivocal motor signs, DCL = 4), whether the mother was affected, number of CAG repeats, medical history of (alcohol use disorders, smoking, drugs abuse, depression, irritability, violent/aggressive behavior, apathy, perseverative/obsessive behavior, cognitive impairment, suicidal ideation), total motor score (TMS), total functional capacity (TFC) score, symbol digit modalities test (SDMT) (number of correct responses), verbal fluency test (number of correct responses in 1 min), Stroop interference test (number of correct responses), trail making test (TMT) parts A and B (time to complete and number of correct responses) and mini-mental state examination (MMSE) score. The backward stepwise selection was automatically performed using the SPSS software version 25.0 (SPSS Inc., Chicago, IL, USA) and the removal testing was based on the probability of the likelihood-ratio statistic based on conditional parameter estimates. The goodness of fit of the logistic regression model was assessed by the Hosmer-Lemeshow test as well as a Receiver Operating Characteristic (ROC) curve.
Table 1.
Variable | Psychosis history: No (N = 2,055) | Psychosis history: Yes (N = 248) | P-value |
---|---|---|---|
Age in years [mean ± SD (median)] | 52.4 ± 11.7 (53) | 53.6 ± 12.6 (53.5) | 0.18b |
Sex (% female) | 50.2 | 52.0 | 0.32a |
Age at motor symptoms onset [mean ± SD (median)] | 45.6 ± 11.6 (46) | 44.1 ± 12.0 (45) | 0.15b |
Age of clinical HD diagnosis [mean ± SD (median)] | 48.3 ± 12.1 (48) | 47.01 ± 12.5 (47) | 0.21b |
Mother affected (%) | 47.3 | 41.8 | 0.06a |
Father affected (%) | 46.7 | 52.3 | 0.06a |
CAG repeats [mean ± SD (median)] | 44.0 ± 3.7 (43) | 43.9 ± 3.9 (43) | 0.47b |
Medical history of: | |||
Alcohol use disorders | 9.3 | 17.3 | <0.0001a |
Smoking | 49.0 | 51.8 | 0.22a |
Drugs abuse | 9.7 | 13.7 | 0.04a |
Marijuana | 86.0 | 88.2 | 0.49a |
Heroin | 5.5 | 11.8 | 0.16a |
Cocaine | 29.0 | 41.2 | 0.11a |
Club drugs (ecstasy, GHB, roofies) | 19.0 | 29.4 | 0.13a |
Amphetamines | 17.5 | 26.5 | 0.16a |
Ritalin | 1.5 | 0 | 0.62a |
Hallucinogens | 19.0 | 17.6 | 0.53a |
Inhalants | 1.0 | 5.9 | 0.10a |
Opium | 2.0 | 0 | 0.53a |
Painkillers | 7.0 | 5.9 | 0.58a |
Barbiturates/sedatives | 2.5 | 5.9 | 0.27a |
Tranquilizers | 1.5 | 2.9 | 0.47a |
Depression (%) | 63.8 | 87.0 | <0.0001a |
Irritability (%) | 60.5 | 83.1 | <0.0001a |
Violent/aggressive behavior (%) | 27.2 | 59.3 | <0.0001a |
Perseverative/obsessive behavior (%) | 39.4 | 73.8 | <0.0001a |
Apathy (%) | 52.4 | 76.2 | <0.0001a |
Cognitive impairment (%) | 57.9 | 77.7 | <0.0001a |
Previous suicidal ideation (%) | 23.2 | 42.5 | <0.0001a |
Total motor score [mean ± SD (median)] | 38.1 ± 20.9 (35) | 50.1 ± 25.0 (47) | <0.0001b |
Total functional capacity [mean ± SD (median)] | 8.2 ± 3.5 (9) | 5.3 ± 3.5 (5) | <0.0001b |
SDMT (total correct) [mean ± SD (median)] | 23.5 ± 13.0 (22) | 16.6 ± 13.5 (15) | <0.0001b |
Verbal fluency test (category) – number of correct responses in 1 min [mean ± SD (median)] | 12.1 ± 5.7 (12) | 9.4 ± 6.1 (9) | <0.0001b |
Stroop Interference Test – number of correct responses [mean ± SD (median)] | 24.2 ± 11.7 (24) | 18.6 ± 12.6 (17) | <0.0001b |
TMT-A time to complete [mean ± SD (median)] | 71.8 ± 52.9 (55) | 104.1 ± 70.4 (83) | <0.0001b |
TMT-A number of correct responses [mean ± SD (median)] | 24.1 ± 4.4 (25) | 22.4 ± 7.1 (25) | <0.0001b |
TMT-B time to complete [mean ± SD (median)] | 151.2 ± 71.8 (141) | 187.4 ± 66.0 (239) | <0.0001b |
TMT-B number of correct responses [mean ± SD (median)] | 21.2 ± 9.5 (25) | 17.1 ± 9.6 (24) | <0.0001b |
MMSE [mean ± SD (median)] | 25.3 ± 4.1 (26) | 22.2 ± 6.6 (24) | <0.0001b |
Only CAG ≥ 36 and manifest HD subjects (N = 2,303) from Enroll-HD who met the criteria for inclusion into the dataset as of November 1, 2015.
a, Fisher's exact test;
b, Mann-Whitney test.
SD, Standard deviation; SDMT, Symbol Digit Modalities Test; TMT, Trail Making Test; MMSE, Mini–Mental State Examination.
In order to validate our findings, the same variables included in the logistic regression model described above were tested in an independent cohort of patients with HD (Wave 2). Specifically, the Wave 2 data were composed of new Enroll-HD participants (N = 4,752 manifest subjects) whose information was released by the Enroll-HD as of October 31, 2016 (PDS3, Figure 1B).
Lastly, we evaluated the percentage of patients that presented pre-morbid psychosis (i.e., psychosis antedating the clinical diagnosis of HD) and the current presence of psychotic symptoms [defined as a score ≥1 in the psychosis subscale of the Problem Behaviors Assessment–short version (PBA-s)]. The PBA-s psychosis subscale is calculated as the sum of delusions / paranoid thinking score + hallucinations score (questions 9 and 10 of the PBA-s, respectively).
Searching for variables capable of distinguishing individuals with and without psychosis: an individualized approach using multivariate machine learning
Machine learning (also known as pattern recognition) is a branch of computer science that involves developing algorithms that can learn from patterns in data and subsequently make predictions on previously “unseen” observations (10). These algorithms are able to identify patterns of interactions among multiple variables and facilitate predictions and stratification of individual subjects' clinical outcomes (11). Machine learning has recently gained traction in biomedical studies due to its ability to analyze data from multiple observations and varied sources, also known as “big data” (11, 12). Machine learning approaches can result in highly accurate predictive models that support important clinical decisions such as selection of treatment options, preventive strategies, and prognosis orientations.
Machine learning algorithms are typically implemented in three stages: (1) algorithm training, (2) algorithm testing; and (3) validation. First, the data are divided into two groups (i.e., training and testing sets). Stage 1: the “training” set is used to train the algorithm and identify a set of optimal algorithm parameters. Stage 2: the “testing” set is used to examine whether the algorithm is able to generalize from the training set and calculate algorithm's prediction performance using accuracy, sensitivity and specificity metrics. Notably, it is a common practice to use a k-fold (i.e., k = 5 or 10) cross-validation method to separate algorithm training and testing data sets. Stage 3: once the algorithm has gone through the “training” and “testing” phase, it is evaluated using a “novel” evaluation data set—which was not included in the algorithm training or testing stages (13). In this study, we examined five machine learning algorithms: (i) Least Absolute Shrinkage and Selection Operator (LASSO) (14); (ii) Elastic Net (15); (iii) Support Vector Machines (SVM) (16); (iv) Random Forest (17); and (v) class-weighted SVM (18).
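The k-fold splitting mentioned above can be illustrated with a few lines of plain Python. This is a minimal sketch under stated assumptions: the function name `k_fold_indices` and the seed are hypothetical, the split is not stratified by class, and it is not the Scikit-learn routine used in the study.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs; every sample is tested exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]      # k nearly equal subsets
    for i, test_idx in enumerate(folds):
        # Training data: all folds except the one held out for testing
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


# Wave 1 sample size taken from the text (N = 2,303)
splits = list(k_fold_indices(n_samples=2303, k=10))
tested = sorted(j for _, test_idx in splits for j in test_idx)
```

Each of the 10 iterations trains on nine subsets and tests on the remaining one, so that across iterations every participant appears in a test set once.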
The same variables included in the Binary Logistic Regression (described in section Searching for Factors Associated With Psychosis: Conventional Statistical Analysis) were included in the machine learning approach. The main motivation here was to examine which demographic and clinical characteristics can individually distinguish between HD patients with and without psychosis. The machine learning algorithms were implemented using the Python programming language (19) through the Scikit-learn machine learning package (20). Missing data were imputed by replacing missing predictor variables with the mean. Predictor variables (Table 1) were normalized between zero and unity and together with corresponding categorical labels (0 – no history of psychosis; 1 – history of psychosis) used as an “input-target” pair for machine learning analyses.
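The two preprocessing steps named above, mean imputation and rescaling to the interval from zero to one, can be sketched as follows. The function name and the example values are hypothetical, and the code is a plain-Python illustration rather than the pipeline used by the authors.

```python
def impute_and_scale(column):
    """Replace missing values (None) with the column mean, then rescale to [0, 1]."""
    observed = [v for v in column if v is not None]
    mean = sum(observed) / len(observed)
    filled = [mean if v is None else v for v in column]
    lo, hi = min(filled), max(filled)
    if hi == lo:                       # constant column: avoid division by zero
        return [0.0 for _ in filled]
    return [(v - lo) / (hi - lo) for v in filled]


# Hypothetical TFC scores with one missing value
tfc = [13, 9, None, 5, 1]
scaled = impute_and_scale(tfc)
```

In a cross-validated setting, the mean, minimum and maximum would usually be estimated on the training portion only and then applied to the test portion; the text does not state how this was handled.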
Machine learning algorithms used in this study can be divided into three broad categories: (1) penalized linear regression (LASSO and Elastic Net), (2) Kernel-based (SVMs) and (3) Ensemble decision trees (random forests). Penalized linear regression methods use a classical linear regression approach albeit with additional penalty parameters to facilitate selection of most relevant variables or remove those that are redundant (21, 22). The penalized linear regression algorithms employed in this study use a logistic function to model probability and categorical outcomes (14, 15). On the other hand, Kernel-based methods use linear and non-linear kernel functions (e.g., polynomial, linear, and Gaussian) to “map” training data onto a higher dimensional space where a decision function able to separate both clinical groups is derived. Lastly, ensemble decision tree methods begin by constructing multiple decision trees which are subsequently combined by taking a majority (i.e., mode) of the predicted scores (17, 23).
The class imbalance problem
Class imbalance is a common problem in machine learning classification studies which happens when the number of observations in one class (e.g., no psychosis, N = 2,055) greatly exceeds that in the other class (e.g., psychosis, N = 248) (10, 23, 24). The class imbalance problem results in the machine learning algorithm being unable to generalize to previously unseen observations by largely assigning them to the majority class (24). Although there are multiple solutions that can mitigate the class imbalance problem (25), we used the majority class random under-sampling method (24, 25) and in the case of SVMs, a weighted SVM algorithm was also implemented.
Briefly, the majority class under-sampling method involves “under-sampling” the majority class (i.e., individuals without psychosis), which is followed by training a machine learning algorithm using a balanced sample. In this study, this process was repeated for 5,000 iterations and the predictions were aggregated as shown in Figure 2A. On the other hand, a weighted SVM algorithm mitigates the class imbalance problem by weighting the SVM penalty parameter with the corresponding ratio of observations in each class. For example, in this study the SVM penalty (also known as the regularization parameter) was weighted using a ratio of psychosis vs. non-psychosis (i.e., 1:8). We used the weighted SVM algorithm as implemented in the Scikit-learn package (20) and explored in detail elsewhere (26). Notably, the weighted SVM method did not require under-sampling of the majority class and, therefore, used all data during training, which is a major strength as compared to the majority class random under-sampling method mentioned above (26).
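Both strategies for handling class imbalance can be sketched in plain Python. The helper name `undersample_majority` and the seed are hypothetical, and the inverse-frequency weighting shown here is a common heuristic assumed for illustration; the text does not state the exact formula behind the 1:8 ratio.

```python
import random


def undersample_majority(labels, rng):
    """Return indices of a balanced sample: all minority cases plus an
    equally sized random draw from the majority class."""
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    return minority + rng.sample(majority, len(minority))


# Class sizes from Table 1: 2,055 without and 248 with a history of psychosis
labels = [0] * 2055 + [1] * 248
rng = random.Random(0)
balanced_idx = undersample_majority(labels, rng)

# Assumed heuristic: class weights proportional to inverse class frequency
n = len(labels)
weights = {c: n / (2 * labels.count(c)) for c in (0, 1)}
weight_ratio = weights[1] / weights[0]    # about 8.3, i.e., roughly 1:8
```

In the full procedure one such balanced draw would be made per iteration, a model trained on it, and the predictions aggregated over all iterations. The weighted alternative keeps all observations and instead penalizes errors on the minority class more heavily.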
Machine learning algorithm training, testing and validation
The training and testing of all machine learning algorithms were performed using a 10-fold cross-validation approach (27) which entailed subdividing the Wave 1 sample into 10 subsets. In each iteration, 9 subsets were used for training the algorithm while the remaining subset was used for testing. This process was repeated until every subset had been used for testing once (Figure 2B). Notably, all algorithm parameters were selected using a nested 10-fold cross-validation which excluded the test sample to avoid circularity or double-dipping (28, 29). The algorithms' ability to identify novel or previously “unseen” subjects as belonging to either psychosis or non-psychosis was quantified using prediction accuracy, sensitivity, specificity, positive predictive value (PPV) and negative predictive value (NPV). ROCs and the corresponding AUCs were computed. Chi-square statistical tests between actual and machine learning predicted labels were also calculated and considered significant when p < 0.05. Permutation-based p-values were calculated using the Scikit-learn package (20, 30) and significance set at p < 0.05.
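The logic of a permutation-based p-value can be illustrated with a toy example. This is a sketch under several assumptions: the one-dimensional threshold classifier, the data and all function names are hypothetical, the model is scored on the same data it was fitted to (the study used cross-validated predictions), and the p-value uses a commonly used empirical estimate, (C + 1) / (n + 1), where C counts permutations scoring at least as well as the original labels.

```python
import random


def fit_threshold(xs, ys):
    """Toy 1-D classifier: midpoint between the two class means."""
    m0 = sum(x for x, y in zip(xs, ys) if y == 0) / ys.count(0)
    m1 = sum(x for x, y in zip(xs, ys) if y == 1) / ys.count(1)
    return (m0 + m1) / 2, m1 >= m0          # threshold, direction


def accuracy(xs, ys, model):
    threshold, high_is_one = model
    preds = [int((x > threshold) == high_is_one) for x in xs]
    return sum(p == y for p, y in zip(preds, ys)) / len(ys)


def permutation_p_value(xs, ys, n_permutations=200, seed=0):
    rng = random.Random(seed)
    observed = accuracy(xs, ys, fit_threshold(xs, ys))
    count = 0
    for _ in range(n_permutations):
        shuffled = ys[:]
        rng.shuffle(shuffled)               # break the feature-label link
        if accuracy(xs, shuffled, fit_threshold(xs, shuffled)) >= observed:
            count += 1
    return (count + 1) / (n_permutations + 1)


# Hypothetical, clearly separated data: 10 cases per class
xs = [1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 5.5,
      8.0, 8.5, 9.0, 9.5, 10.0, 10.5, 11.0, 11.5, 12.0, 12.5]
ys = [0] * 10 + [1] * 10
p_value = permutation_p_value(xs, ys)
```

With this estimate the smallest attainable p-value is 1 / (n + 1), so the number of permutations limits how small a reported p-value can be.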
Lastly, at the validation stage, the machine learning algorithms were trained using Wave 1 data and validated using a unique or “novel” Wave 2 dataset (Figure 2C). The validation step entailed training the algorithms using the Wave 1 sample by selecting parameters using a 10-fold cross-validation. Subsequently, the algorithms were tested using the Wave 2 sample. As above, prediction accuracy, specificity, sensitivity, PPV, NPV, ROC curve, and AUC were also computed.
Results
Conventional statistics results
Nearly 11% of motor manifest individuals with HD presented with a history of psychosis (N = 248 out of the 2,303). Demographics and clinical characteristics of manifest patients with HD with and without history of psychosis are shown in Table 1. The CAG trinucleotide repeat length was similar in both groups. Patients with a history of psychosis exhibited a higher frequency of behavioral problems and worse motor, functional capacity and cognitive scores than patients without a history of psychosis (Table 1).
Regarding the multivariate analysis, younger age at HD clinical diagnosis, lower number of CAG repeats, clinical history of [alcohol use disorders, depression, violent/aggressive behavior and perseverative/obsessive behavior], lower TFC score, and longer time to complete TMT-B (meaning worse cognitive performance) remained as significant factors associated with the history of psychosis in the final model (step 19). The results are presented in Table 2. The logistic regression model showed an adequate fit [Hosmer-Lemeshow goodness of fit test (step 19): Chi-square = 11.4; p = 0.2, i.e., no significant lack of fit] and its predicted probabilities yielded an area under the curve (AUC) of 0.793 in the ROC analysis (Supplementary Figure 1A). The logistic regression results were corroborated by an external validation, since the relevant factors associated with psychosis in HD described above were very similar in the Wave 2 sample analysis [Supplementary Table 1; Hosmer-Lemeshow goodness of fit test (step 18): Chi-square = 7.2; p = 0.5; AUC = 0.816 in the ROC analysis (Supplementary Figure 1B)].
Table 2.
Variable | B | SE | Wald | df | p-value | Odds ratio | 95% CI for odds ratio (lower) | 95% CI for odds ratio (upper) |
---|---|---|---|---|---|---|---|---|
Age at clinical diagnosis | −0.05 | 0.016 | 9.749 | 1 | 0.002 | 0.951 | 0.921 | 0.981 |
Number of CAG repeats | −0.149 | 0.056 | 7.051 | 1 | 0.008 | 0.862 | 0.772 | 0.962 |
History of alcohol use disorders | 0.568 | 0.323 | 3.084 | 1 | 0.079 | 1.764 | 0.936 | 3.324 |
History of depression | 1.235 | 0.372 | 11.053 | 1 | 0.001 | 3.440 | 1.660 | 7.126 |
History of violent/aggressive behavior | 0.711 | 0.246 | 8.358 | 1 | 0.004 | 2.036 | 1.257 | 3.297 |
History of perseverative/obsessive behavior | 1.374 | 0.276 | 24.866 | 1 | 0.000 | 3.952 | 2.303 | 6.674 |
TFC score | −0.074 | 0.043 | 3.025 | 1 | 0.082 | 0.929 | 0.854 | 1.009 |
TMT-B (time to complete) | 0.006 | 0.002 | 7.207 | 1 | 0.007 | 1.006 | 1.001 | 1.010 |
The analysis considered only CAG ≥ 36 and manifest HD subjects from Enroll-HD which meet the criteria for inclusion into the dataset as of November 1, 2015 (Wave 1 sample; N = 2,303).
CI, confidence interval; df, degrees of freedom; TFC, total functional capacity; TMT, trail making test; SE, standard error.
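The columns of Table 2 are linked by standard relationships: the odds ratio is exp(B), the Wald statistic is (B/SE)², and the 95% confidence limits are exp(B ± 1.96 × SE). The sketch below checks this for one row; the function name is hypothetical and the code only illustrates the arithmetic, not the SPSS output itself.

```python
import math


def odds_ratio_summary(b, se, z=1.96):
    """Odds ratio, Wald statistic and 95% CI from a logistic coefficient."""
    return {
        "odds_ratio": math.exp(b),
        "wald": (b / se) ** 2,
        "ci_lower": math.exp(b - z * se),
        "ci_upper": math.exp(b + z * se),
    }


# "Number of CAG repeats" row of Table 2: B = -0.149, SE = 0.056
cag = odds_ratio_summary(-0.149, 0.056)
```

For this row the recomputed odds ratio and confidence limits agree with the tabulated 0.862 (0.772 to 0.962), meaning that each additional CAG repeat multiplies the odds of a history of psychosis by about 0.86. The recomputed Wald statistic differs slightly from the tabulated value because B and SE are rounded in the table.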
Our additional analyses revealed that among HD patients with a history of psychosis, 31.6% currently had psychotic symptoms and in 55.3% the psychosis was pre-morbid (i.e., the age at onset of psychotic symptoms preceded the age at clinical diagnosis of HD). Among these patients, the psychotic symptoms started on average 4.11 (±6.26) years before the clinical diagnosis of HD. In addition, we observed that patients with HD who had a history of psychosis had higher scores in all the behavioral subscales of the PBA-s in comparison with patients with no history of psychosis (Supplementary Table 2).
Machine learning results
All algorithms were capable of distinguishing individual HD patients with history of psychosis from those without a history of psychosis with prediction accuracy ranging from 71 to 73%. These results were established using a 10-fold cross-validation on the Wave 1 dataset only. All models were significant (chi-square p < 0.05, Table 3). A confusion matrix, which represents predicted labels against true labels (0 – no history of psychosis, 1 – history of psychosis), and a ROC curve for the weighted SVM algorithm are shown in Figures 3A,B, respectively.
Table 3.
Algorithm | Balanced accuracy (specificity + sensitivity)/2 | Classical accuracy | AUC | Specificity | Sensitivity | PPV | NPV | Chi-square/permutations p-value |
---|---|---|---|---|---|---|---|---|
LASSO | 0.72 | 0.70 | 0.79 | 0.69 | 0.75 | 0.71 | 0.73 | P < 0.05 |
Elastic net | 0.70 | 0.68 | 0.78 | 0.67 | 0.73 | 0.69 | 0.71 | P < 0.05 |
SVM | 0.72 | 0.70 | 0.72 | 0.70 | 0.74 | 0.71 | 0.73 | P < 0.05 |
Random forest | 0.71 | 0.71 | 0.77 | 0.71 | 0.72 | 0.71 | 0.71 | P < 0.05 |
Weighted SVM | 0.71 | 0.72 | 0.78 | 0.72 | 0.71 | 0.71 | 0.71 | p < 0.05, p = 0.0002 |
AUC, Area under the curve; LASSO, Least Absolute Shrinkage and Selection Operator; NPV, negative predictive value; PPV, positive predictive value; SVM, Support Vector Machines. Sensitivity and specificity represented correctly predicted history of psychosis (true positive rate) and correctly predicted no history of psychosis (true negative rate), respectively. The “classical” prediction accuracy was provided by taking the sum of true positives and true negatives divided by the total sample. Due to the class imbalance problem, a “balanced” accuracy was also calculated as the average of predicted sensitivity and specificity. PPV was calculated as the proportion of individuals predicted as having history of psychosis and who actually presented history of psychosis. NPV was calculated as the proportion of individuals who were predicted as not having history of psychosis and who actually did not present history of psychosis. Permutation-based p-values were calculated for the weighted SVM using the permutations tests method presented by Ojala et al. (30) and implemented in Scikit-learn.
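The metric definitions given above translate directly into code. The following sketch uses a hypothetical function name and, as an illustration of why balanced accuracy is reported, evaluates a degenerate model that assigns everyone to the majority class; only the class sizes (2,055 vs. 248) come from the text.

```python
def classification_metrics(tp, fp, tn, fn):
    """Metrics of Tables 3 and 4, computed from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)            # true positive rate
    specificity = tn / (tn + fp)            # true negative rate
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "balanced_accuracy": (sensitivity + specificity) / 2,
        # PPV/NPV are undefined when no case is predicted in that class
        "ppv": tp / (tp + fp) if tp + fp else float("nan"),
        "npv": tn / (tn + fn) if tn + fn else float("nan"),
    }


# A degenerate model that labels everyone "no history of psychosis"
majority_only = classification_metrics(tp=0, fp=0, tn=2055, fn=248)
```

Such a model reaches a classical accuracy of about 0.89 while its balanced accuracy is only 0.5, which is why the classical figure alone would be misleading for imbalanced classes.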
Figure 3C shows a bar graph representing weighting factors assigned to each variable by a weighted SVM based on their relevance in distinguishing individuals with and without history of psychosis. The most relevant variables in identifying individuals with and without history of psychosis were the TFC score, worse performance at cognitive tests (longer time to complete the TMT-B and lower scores at SDMT), as well as history of perseverative/obsessive behavior. In addition, younger age at clinical HD diagnosis was found to be relevant in distinguishing individuals with and without history of psychosis.
To further validate our results, the machine learning algorithms developed using the Wave 1 sample were tested on “novel” or previously unseen observations from the Wave 2 validation sample. Results obtained by the machine learning models in identifying subjects with and without history of psychosis from Wave 2 sample are shown in Table 4. All validation models were significant (chi-square p < 0.05, Table 4). In particular, the weighted SVM reported the best accuracy on the Wave 2 validation sample.
Table 4.
Algorithm | Balanced accuracy (specificity + sensitivity)/2 | Classical accuracy | AUC | Specificity | Sensitivity | PPV | NPV | Chi-square/permutation test p-value |
---|---|---|---|---|---|---|---|---|
LASSO | 0.71 | 0.68 | 0.79 | 0.67 | 0.75 | 0.69 | 0.73 | P < 0.05 |
Elastic net | 0.70 | 0.67 | 0.77 | 0.66 | 0.74 | 0.69 | 0.72 | P < 0.05 |
SVM | 0.72 | 0.69 | 0.79 | 0.68 | 0.75 | 0.70 | 0.73 | P < 0.05 |
Random forest | 0.71 | 0.68 | 0.78 | 0.69 | 0.73 | 0.70 | 0.72 | P < 0.05 |
Weighted SVM | 0.72 | 0.70 | 0.79 | 0.69 | 0.75 | 0.71 | 0.73 | p < 0.05, p = 0.0004 |
AUC, Area under the curve; LASSO, Least Absolute Shrinkage and Selection Operator; NPV, negative predictive value; PPV, positive predictive value; SVM, Support Vector Machines.
Discussion
To our knowledge, this is the most comprehensive study evaluating factors associated with psychosis in HD. Using conventional statistical analyses, a history of depression, violent/aggressive behavior, perseverative/obsessive behavior and excessive alcohol use were, among the behavior-related variables, significantly associated with psychosis history in HD. In addition, younger age at clinical HD diagnosis, lower number of CAG repeats, worse functional capacity and poorer cognitive performance were significantly associated with psychosis in HD. An external validation with an independent cohort of patients with HD (Wave 2) and the machine learning approach corroborated these results.
We found that almost 11% of patients from the Enroll-HD database had a history of psychosis. The prevalence of psychotic symptoms in HD patients varies between 3 and 11% (6). The great variability in psychosis prevalence among different studies probably results from participant selection criteria. For instance, the selection of an outpatient population reduces the likelihood of observing behavioral symptoms severe enough to require hospitalization and therefore may underestimate the prevalence of psychosis (31). Indeed, institutionalized patients with HD are more likely to have delusions and auditory hallucinations than outpatients (32), as psychotic symptoms are often the cause for institutionalization.
The percentage of patients with HD that have psychotic symptoms may vary depending on the disease stage. It has been reported that delusions and hallucinations are more prevalent in the middle stages of the disease (33, 34). However, our analyses revealed that among HD patients with a history of psychosis, psychotic symptoms preceded the clinical diagnosis of HD in the majority of patients (55.3%). A few studies have pointed out that psychotic symptoms in HD may occur before the clinical diagnosis (35, 36). Early psychotic symptoms and inappropriate behaviors have been described in juvenile HD (37, 38). Supporting this latter observation, our results showed that younger age at clinical HD diagnosis was associated with history of psychosis. Only 18 out of the 2,303 patients from the Enroll-HD dataset (PDS2) had juvenile HD (defined as a clinical diagnosis before 20 years old). Of these, 4 had a history of psychosis, resulting in a percentage (29%) well above that of the general HD population. A recent study analyzed data from 230 Spanish patients from the REGISTRY database and found a prevalence of 4% of psychosis in both premanifest and early symptomatic patients with HD (39). Older studies have reported a higher prevalence of schizophrenia-like psychosis in HD in comparison with the general population. Some case reports described patients that were first diagnosed with schizophrenia-like psychosis and only later with HD (8, 40–42). Taken together, these data confirm that psychosis may precede motor symptoms in HD.
There are a few studies evaluating clinical correlates of psychosis in HD. A study involving a large cohort of HD mutation carriers (1,993 participants from the observational REGISTRY study) investigating current psychosis, found low percentages of participants scoring mild (2.9%) and moderate to severe (1.2%) psychosis. HD mutation carriers with psychosis had a significantly longer duration of disease, a higher TMS, a lower TFC score, more often a positive psychiatric history for depression, obsessive compulsive behaviors and psychosis, and more often used benzodiazepines and antipsychotics. The only independent correlates of current psychosis were a history of psychosis and use of benzodiazepines (34). Overall these results corroborate our findings.
We also found that a lower number of CAG repeats was associated with history of psychosis. The age of clinical onset of symptoms in HD is strongly influenced by the length of the CAG trinucleotide expansion within the HTT gene (1). It is possible that, since the clinical onset of HD is defined mainly by motor symptoms, patients with fewer CAG repeats may present with psychiatric problems, including psychosis, before the development of the significant motor symptoms necessary for clinical diagnosis.
There are specific HD phenotypes that represent the heterogeneity in clinical presentation and rates of progression (1). The pathophysiological process underlying HD may contribute to the development of psychotic symptoms in a subset of patients. However, because not all HD patients or families are susceptible to developing psychosis, other predisposing factors may also exist (43). Herein, we described clinical factors associated with psychosis in HD, such as psychiatric and behavioral problems and poorer cognitive performance. These factors might predispose psychosis in HD. However, genetic/biological factors might also contribute to the development of psychosis in HD. Future studies are needed in this regard. The hypothesis that the presence of psychotic symptoms might be part of a unique HD phenotype is substantiated by some studies with HD families. In some HD pedigrees, psychotic symptoms have defined the disease phenotype across generations. In these families, psychosis was the most prominent symptom and preceded the motor or cognitive changes in almost all affected members (8). Another study found that probands with psychotic symptoms were much more likely to have a first-degree relative with psychosis than were the nonpsychotic comparison probands. In addition, the age at onset of psychosis was lower in probands with a higher number of CAG repeats (43).
We are aware of the limitations of our study. First, we cannot make assumptions about the temporal relationship between psychosis and the factors associated with it. A longitudinal study is needed in this regard. Second, some variables we described as significantly associated with psychosis are difficult to define (e.g., drugs of abuse), but this limitation is intrinsic to the information available in the Enroll-HD dataset. Lastly, our analyses were not controlled for medication use, and some drugs might mask psychotic symptoms [e.g., antipsychotics, which are commonly used for the management of chorea in HD (44)]. The majority of the patients (N = 1,272, 55%) had a history of antipsychotic use. A substantial proportion (N = 830, 36%) were using antipsychotics at the time of the baseline interview (the source of information for the current study). The most commonly used antipsychotics were olanzapine (N = 228), followed by tiapride (N = 160), risperidone (N = 128), quetiapine (N = 104) and haloperidol (N = 69). On the other hand, the large sample size, the use of both conventional statistics and machine learning methods with convergent results, and the external validation in an independent cohort of patients with HD can be regarded as strengths of our study. In addition, the multivariate machine learning approach is able to consider interactions between all variables and to distinguish between HD patients with and without psychosis with significant accuracy (specificity and sensitivity). Notably, our results were validated/replicated in an independent cohort (i.e., wave 2 analyses).
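To make the terms accuracy, sensitivity and specificity concrete, the following minimal sketch computes them from observed and predicted labels. It is not the analysis code used in this study: the function name `binary_metrics`, the 100-patient toy cohort (about 11% with psychosis) and both sets of predictions are hypothetical and chosen only for illustration.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Return accuracy, sensitivity, specificity and balanced accuracy for 0/1 labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # psychosis, predicted psychosis
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # no psychosis, predicted none
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # no psychosis, predicted psychosis
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # psychosis, predicted none
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "balanced_accuracy": (sensitivity + specificity) / 2,
    }


# Hypothetical toy cohort: 11 patients with a history of psychosis, 89 without.
y_true = [1] * 11 + [0] * 89

# A trivial rule that never predicts psychosis.
y_naive = [0] * 100

# A hypothetical classifier: detects 8 of 11 cases, correctly rejects 64 of 89 non-cases.
y_model = [1] * 8 + [0] * 3 + [0] * 64 + [1] * 25

metrics_naive = binary_metrics(y_true, y_naive)
metrics_model = binary_metrics(y_true, y_model)

for name, metrics in (("naive", metrics_naive), ("model", metrics_model)):
    print(name, {key: round(value, 3) for key, value in metrics.items()})
```

In this imbalanced toy sample, the rule that never predicts psychosis reaches 89% accuracy with zero sensitivity, which illustrates why sensitivity and specificity are informative alongside overall accuracy.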
Psychiatric and behavioral problems and poorer cognitive performance were significantly associated with psychosis in HD. Accordingly, psychosis seems to occur in patients with HD who also carry a burden of other non-motor symptoms. Further analysis of the larger database with longitudinal assessments will allow the analysis to be refined and predictors of psychosis in HD to be determined. As psychosis was associated with a lower number of CAG repeats and younger age at clinical diagnosis of HD, our study also suggests that patients presenting with a history of psychosis may represent a unique phenotype in the HD spectrum. A well-defined subtype of HD patients may allow the identification of genetic modifiers associated with new pathogenic mechanisms and lead to novel therapeutic targets.
Ethics statement
The study protocol was approved by the UTHealth Committee for the Protection of Human Subjects (IRB number: HSC-MS-15-0881). A waiver of consent was granted because we worked with a consolidated dataset containing only de-identified data.
Author contributions
EF and AT worked on the design and conceptualization of the study. NR, BM, and CG worked on the data analysis and interpretation. NR and BM wrote the first draft of the manuscript. EF, CS, and AT revised the manuscript for intellectual content. All authors read and approved the manuscript.
Conflict of interest statement
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Acknowledgments
We would like to thank the CHDI Foundation for granting us access to the Enroll-HD periodic datasets. We would also like to thank the members of the Independent Statistical Standing Committee (ISSC) – CHDI Foundation, especially Prof. Martin Lindquist and Prof. Glyn Lewis, for their critical comments and skilled assistance regarding the statistical analyses. NR is a Huntington's Disease Society of America (HDSA) fellowship recipient. Enroll-HD is a CHDI Foundation project.
Supplementary material
The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fneur.2018.00930/full#supplementary-material
References
- 1. Ross CA, Aylward EH, Wild EJ, Langbehn DR, Long JD, Warner JH, et al. Huntington disease: natural history, biomarkers and prospects for therapeutics. Nat Rev Neurol. (2014) 10:204–16. doi: 10.1038/nrneurol.2014.24
- 2. Huntington Study Group. Unified Huntington's disease rating scale: reliability and consistency. Mov Disord. (1996) 11:136–42. doi: 10.1002/mds.870110204
- 3. Duff K, Paulsen JS, Beglinger LJ, Langbehn DR, Wang C, Stout JC, et al. “Frontal” behaviors before the diagnosis of Huntington's disease and their relationship to markers of disease progression: evidence of early lack of awareness. J Neuropsychiatry Clin Neurosci. (2010) 22:196–207. doi: 10.1176/jnp.2010.22.2.196
- 4. Epping EA, Kim JI, Craufurd D, Brashers-Krug TM, Anderson KE, McCusker E, et al. Longitudinal psychiatric symptoms in prodromal Huntington's disease: a decade of data. Am J Psychiatry (2016) 173:184–92. doi: 10.1176/appi.ajp.2015.14121551
- 5. NIoMH. What is Psychosis? National Institute of Mental Health. Available online at: https://www.nimh.nih.gov/health/topics/schizophrenia/raise/what-is-psychosis.shtml (2017).
- 6. van Duijn E, Kingma EM, van der Mast RC. Psychopathology in verified Huntington's disease gene carriers. J Neuropsychiatry Clin Neurosci. (2007) 19:441–8. doi: 10.1176/jnp.2007.19.4.441
- 7. Goncalves-Pereira M, Xavier M, van Wijngaarden B, Papoila AL, Schene AH, Caldas-de-Almeida JM. Impact of psychosis on Portuguese caregivers: a cross-cultural exploration of burden, distress, positive aspects and clinical-functional correlates. Soc Psychiatry Psychiatr Epidemiol. (2013) 48:325–35. doi: 10.1007/s00127-012-0516-7
- 8. Correa BB, Xavier M, Guimaraes J. Association of Huntington's disease and schizophrenia-like psychosis in a Huntington's disease pedigree. Clin Pract Epidemiol Ment Health (2006) 2:1. doi: 10.1186/1745-0179-2-1
- 9. Enroll-HD. A User Guide to the Clinical Datasets and Biosamples Available From Enroll-HD. CHDI Foundation, Version 2016-10-R1. Available online at: http://www.enroll-hd.org/enrollhd_documents/2016-10-R1/Enroll-HD-Plus-User-Guide-2016-10-R1.pdf (2016).
- 10. Libbrecht MW, Noble WS. Machine learning applications in genetics and genomics. Nat Rev Genet. (2015) 16:321–32. doi: 10.1038/nrg3920
- 11. Kuffner R, Zach N, Norel R, Hawe J, Schoenfeld D, Wang L, et al. Crowdsourced analysis of clinical trial data to predict amyotrophic lateral sclerosis progression. Nat Biotechnol. (2015) 33:51–7. doi: 10.1038/nbt.3051
- 12. Passos IC, Mwangi B, Kapczinski F. Big data analytics and machine learning: 2015 and beyond. Lancet Psychiatry (2016) 3:13–5. doi: 10.1016/S2215-0366(15)00549-0
- 13. Bishop C. Pattern Recognition and Machine Learning. New York, NY: Springer-Verlag (2006).
- 14. Tibshirani R. Regression shrinkage and selection via the lasso: a retrospective. J R Stat Soc Series B Stat Methodol. (2011) 73:273–82. doi: 10.1111/j.1467-9868.2011.00771.x
- 15. Zou H, Hastie T. Regularization and variable selection via the elastic net. J R Stat Soc Series B Stat Methodol. (2005) 67:301–20. doi: 10.1111/j.1467-9868.2005.00503.x
- 16. Vapnik VN. Statistical Learning Theory. New York, NY: Wiley-Interscience (1998).
- 17. Breiman L. Random forests. Mach Learn. (2001) 45:5–32.
- 18. He H, Ma Y. Imbalanced Learning: Foundations, Algorithms, and Applications. Hoboken, NJ: Wiley-IEEE Press (2013).
- 19. Sanner MF. Python: a programming language for software integration and development. J Mol Graph Model (1999) 17:57–61.
- 20. Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, et al. Scikit-learn: machine learning in Python. J Mach Learn Res. (2011) 12:2825–30.
- 21. Mwangi B, Tian TS, Soares JC. A review of feature reduction techniques in neuroimaging. Neuroinformatics (2014) 12:229–44. doi: 10.1007/s12021-013-9204-3
- 22. Zhang W, Wan YW, Allen GI, Pang K, Anderson ML, Liu Z. Molecular pathway identification using biological network-regularized logistic models. BMC Genomics (2013) 14(Suppl. 8):S7. doi: 10.1186/1471-2164-14-S8-S7
- 23. Murphy KP. Machine Learning: A Probabilistic Perspective. Cambridge, MA: MIT Press (2012).
- 24. Passos IC, Mwangi B, Cao B, Hamilton JE, Wu MJ, Zhang XY, et al. Identifying a clinical signature of suicidality among patients with mood disorders: a pilot study using a machine learning approach. J Affect Disord. (2016) 193:109–16. doi: 10.1016/j.jad.2015.12.066
- 25. Nanni L, Fantozzi C, Lazzarini N. Coupling different methods for overcoming the class imbalance problem. Neurocomputing (2015) 158:48–61. doi: 10.1016/j.neucom.2015.01.068
- 26. Bobra MG, Couvidat S. Solar flare prediction using SDO/HMI vector magnetic field data with a machine-learning algorithm. Astrophys J. (2015) 798:135. doi: 10.1088/0004-637X/798/2/135
- 27. Kohavi R. A study of cross-validation and bootstrap for accuracy estimation and model selection. IJCAI (1995) 14:1137–43.
- 28. Kriegeskorte N, Simmons WK, Bellgowan PS, Baker CI. Circular analysis in systems neuroscience: the dangers of double dipping. Nat Neurosci. (2009) 12:535–40. doi: 10.1038/nn.2303
- 29. Poldrack RA, Gorgolewski KJ. Making big data open: data sharing in neuroimaging. Nat Neurosci. (2014) 17:1510–7. doi: 10.1038/nn.3818
- 30. Ojala M, Garriga GC. Permutation tests for studying classifier performance. J Mach Learn Res. (2010) 11:1833–63.
- 31. Craufurd D, Thompson JC, Snowden JS. Behavioral changes in Huntington disease. Neuropsychiatry Neuropsychol Behav Neurol. (2001) 14:219–26.
- 32. Wheelock VL, Tempkin T, Marder K, Nance M, Myers RH, Zhao H, et al. Predictors of nursing home placement in Huntington disease. Neurology (2003) 60:998–1001. doi: 10.1212/01.WNL.0000052992.58107.67
- 33. Kirkwood SC, Su JL, Conneally P, Foroud T. Progression of symptoms in the early and middle stages of Huntington disease. Arch Neurol. (2001) 58:273–8. doi: 10.1001/archneur.58.2.273
- 34. van Duijn E, Craufurd D, Hubers AA, Giltay EJ, Bonelli R, Rickards H, et al. Neuropsychiatric symptoms in a European Huntington's disease cohort (REGISTRY). J Neurol Neurosurg Psychiatry (2014) 85:1411–8. doi: 10.1136/jnnp-2013-307343
- 35. Nagel M, Rumpf HJ, Kasten M. Acute psychosis in a verified Huntington disease gene carrier with subtle motor signs: psychiatric criteria should be considered for the diagnosis. Gen Hosp Psychiatry (2014) 36:361 e3–4. doi: 10.1016/j.genhosppsych.2014.01.008
- 36. Petit AC, Hozer F, Youssov K, Lavaud P, Hardy P, Mouaffak F. Differential response to ECT of psychotic and affective symptoms in Huntington's disease: a case report. J Neuropsychiatry Clin Neurosci. (2016) 28:e3–5. doi: 10.1176/appi.neuropsych.15040084
- 37. Chuo YP, Hou PH, Chan CH, Lin CC, Liao YC. Juvenile Huntington's disease presenting as difficult-to-treat seizure and the first episode of psychosis. Gen Hosp Psychiatry (2012) 34:436 e9–11. doi: 10.1016/j.genhosppsych.2012.02.008
- 38. Jardri R, Medjkane F, Cuisset JM, Vallee L, Delion P, Goeb JL. Huntington's disease presenting as a depressive disorder with psychotic features. J Am Acad Child Adolesc Psychiatry (2007) 46:307–8. doi: 10.1097/chi.0b013e31802ed8c9
- 39. Martinez-Horta S, Perez-Perez J, van Duijn E, Fernandez-Bobadilla R, Carceller M, Pagonabarraga J, et al. Neuropsychiatric symptoms are very common in premanifest and early stage Huntington's Disease. Parkinsonism Relat Disord. (2016) 25:58–64. doi: 10.1016/j.parkreldis.2016.02.008
- 40. Gama Marques J, Carnot MJ. Huntington's disease in a patient with 15-year history of Capgras delusion misdiagnosed as paranoid schizophrenia. Gen Hosp Psychiatry (2016) 39:97–8. doi: 10.1016/j.genhosppsych.2015.11.007
- 41. Radanovic-Grguric L, Petek A, Laufer D, Koic O, Radanovic B, Filakovic P. Pharmacologic side effects and/or neurologic disorder: case report. Psychiatr Danub (2009) 21:575–8.
- 42. Xu C, Yogaratnam J, Tan N, Sim K. Psychosis, treatment emergent extrapyramidal events, and subsequent onset of Huntington's disease: a case report and review of the literature. Clin Psychopharmacol Neurosci. (2016) 14:302–4. doi: 10.9758/cpn.2016.14.3.302
- 43. Tsuang D, Almqvist EW, Lipe H, Strgar F, DiGiacomo L, Hoff D, et al. Familial aggregation of psychotic symptoms in Huntington's disease. Am J Psychiatry (2000) 157:1955–9. doi: 10.1176/appi.ajp.157.12.1955
- 44. Unti E, Mazzucchi S, Palermo G, Bonuccelli U, Ceravolo R. Antipsychotic drugs in Huntington's disease. Expert Rev Neurother. (2017) 17:227–37. doi: 10.1080/14737175.2016.1226134