Abstract
The COVID-19 pandemic continues to be widespread, and little is known about mental health impacts from dealing with the disease itself. This retrospective study used a deidentified health information exchange (HIE) dataset of electronic health record data from the state of Rhode Island and characterized different subgroups of the positive COVID-19 population. Three different clustering methods were explored to identify patterns of condition groupings in this population. Increased incidence of mental health conditions was seen post-COVID-19 diagnosis, and these individuals exhibited higher prevalence of comorbidities compared to the negative control group. A self-organizing map cluster analysis showed patterns of mental health conditions in half of the clusters. One mental health cluster revealed a higher comorbidity index and higher severity of COVID-19 disease. The clinical features identified in this study motivate the need for more in-depth analysis to predict and identify individuals at high risk for developing mental illness post-COVID-19 diagnosis.
Introduction
The Coronavirus Disease 2019 (COVID-19) pandemic that began in the United States on March 11, 2020 continues to impact most of the population and has been identified as a public and mental health crisis.1–6 Individuals diagnosed with COVID-19 have shown an increased risk for mental health disorders such as post-traumatic stress disorder (PTSD) and depression, among others.7–16 Previous studies such as Taquet et al. and Xie et al. used cohort analyses to describe increased risk for neurological and psychiatric morbidity across patients diagnosed after COVID-19.17,18 These studies called for further validation and research prioritization for mental health disorders following SARS-CoV-2 infection. Factors such as comorbidities (or co-occurring conditions either related or unrelated to a COVID-19 diagnosis), demographics, and social features have an impact on the risk for developing mental health related conditions.19 Few studies have used machine learning to identify clinical characteristics and patterns associated with developing mental illness after COVID-19 illness. Understanding how prevalence of comorbid physical disease affects the risk for developing a mental health condition is an important area of focus.20 Better understanding of mental health disorder prevalence, a patient’s clinical profile, and the patterns seen within the population after COVID-19 illness is needed to identify risk factors and aid in the prioritization of resources and research planning.21,22
This study aimed to test the hypothesis that mental health diagnoses increase after a COVID-19 diagnosis and to identify important clinical features and patterns observed in these patients. The overall goal was to confirm groupings or clusters of mental health patients among those who have been COVID-19 positive and to further characterize the patient profiles seen within these clusters. Data mining methods can provide an understanding of the broader community and population and identify outcomes of COVID-19 disease survivors, including related mental health outcomes. Machine learning, and unsupervised clustering techniques in particular, has been used to investigate patterns of comorbidities associated with patients in a particular population.22–27 We examined the phenotypic patient profiles and comorbidity burden that could contribute to higher risk for mental health concerns. Incidence rates of mental health disorders were examined, and notable comorbidities, demographics, and procedures were described for different subgroups of individuals. Clinical features were chosen based on previous literature to describe the comorbidity burden and characteristics strongly associated with COVID-19 severity and risk.17,18,21,28–32 Three clustering methods were compared to determine if clusters of individuals with mental health disorders post-COVID-19 could be identified and further explained. The important features of co-occurring conditions, social factors, and demographics as they relate to mental health were described. The results of this study provide evidence of an increased incidence of mental health conditions and comorbidities compared to a control population.
Methods
As depicted in Figure 1, we sought to group and characterize COVID-19 positive patients with a mental health disorder post-diagnosis using statewide deidentified electronic health data and unsupervised clustering techniques. Statistical and machine learning analyses were performed using Julia (v.1.7.1) and R-statistical software (v.3.6.1).33,34
Figure 1.
Overview of Study Approach
Dataset
The Rhode Island Quality Institute (RIQI) operates Rhode Island’s Health Information Exchange (HIE) “CurrentCare” and is the state-designated Regional Health Information Organization (RHIO). CurrentCare contains Electronic Health Record (EHR) data from all acute care hospitals in Rhode Island and from many ambulatory and laboratory facilities across the state for over 540,000 individuals. De-identified data from CurrentCare were provided by RIQI for the study period January 2018 – December 2021. The de-identification strategy included randomly shifting dates, consistently per patient. This dataset contained lab-confirmed, suspected, and possible cases of COVID-19 and adopted the Observational Medical Outcomes Partnership (OMOP) Common Data Model format, leveraged by the Observational Health Data Sciences and Informatics (OHDSI) program. These cases were demographically matched (on age group, sex, race, and ethnicity) to controls who tested negative or equivocal for COVID-19, at a ratio of 1:2 (cases to controls) as defined by the National COVID Cohort Collaborative (N3C) Phenotype.35
Data Pre-Processing & Transformation
Patients positive for COVID-19 were identified based on lab values or the ICD-10-CM COVID-19 diagnosis (U07.1, J12.82, M35.89) according to the N3C phenotyping schema (version 3.3).35,36 COVID-19 lab tests (LOINC codes corresponding to OMOP Concept IDs) for a positive/negative (presence/not detected) test were identified using the N3C phenotype. Demographics were extracted from the dataset including sex, ethnicity, and age. Age ranges were calculated based on year of birth that was reported in increments of five years. Age groupings were divided into approximately fifteen-year groupings to be close to the five-year ranges linked to age groupings used by Geifman et al. as meaningful MeSH-based adult age groupings for disease: young adult (21-35), adult (36-50), middle aged adult (51-65), and aged/older adult (66-80+).
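The age grouping described above can be sketched as follows. This is a minimal illustration rather than the study's code: the function names are hypothetical, and the use of the bin midpoint to approximate age from a five-year birth-year increment is an assumption, since the text does not state how ages were derived.

```python
def age_group(age):
    """Map an age in years to the four adult groupings listed in the text."""
    if age < 21:
        return None  # below the adult groupings described above
    if age <= 35:
        return "young adult"
    if age <= 50:
        return "adult"
    if age <= 65:
        return "middle aged adult"
    return "aged/older adult"  # 66-80+


def approximate_age(birth_year_bin_start, reference_year, bin_width=5):
    """Estimate age from a birth year reported in five-year increments.

    Assumption: the midpoint of the birth-year bin is used.
    """
    midpoint = birth_year_bin_start + (bin_width - 1) / 2
    return reference_year - midpoint
```

For example, a patient whose birth year falls in a hypothetical 1970-1974 bin would be assigned an approximate age of 48 in 2020 and fall in the "adult" group.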
Conditions and other features of interest were identified using a combination of OMOP Concept IDs and source condition values of ICD-9/10-CM and SNOMED CT codes.37,38 These conditions were aggregated into broader groupings using Phenome Wide Association Studies (PheWAS) categories (“PheCodes”) and Clinical Classifications Software Refined (CCSR) single-level categories.39,40 SNOMED CT codes were mapped to PheCodes by first mapping to ICD-10-CM codes using the SNOMED CT to ICD-10-CM Map from the National Library of Medicine.41 PheCode groupings of mental health disorders were selected for this study based on previous literature.17,18 These included depression, anxiety, post-traumatic stress disorder, acute stress reaction, psychotic disorders, and others listed in Table 2. One mental health grouping example can be seen in Table 1 with corresponding PheCode and CCSR mapping. In addition to mental health groupings, PheCode groupings of other COVID-19-related conditions were extracted. These included co-occurring conditions such as hypertension, cardiac-related conditions, diabetes, cancers, respiratory conditions, and others. Other features extracted from the dataset included: smoking status, severity of disease based on hospital admission, Intermittent Intubation Ventilation (IMV), supplemental oxygen, and respiratory failure. Charlson Comorbidity Index (CCI) scores were also calculated to further quantify the differing levels of comorbidity burden for the clusters and subgroups identified in this study. CCI can be used as a method for categorizing ICD diagnosis codes and calculating weighted risk of mortality or resource use.42–44
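To illustrate how a weighted comorbidity score of this kind is computed, the sketch below sums weights over the condition categories present for a patient. It is not the study's implementation: the category keys are hypothetical labels, only a subset of Charlson categories is shown, and the weights are those of the original Charlson index, whereas the source does not state which weighting variant or ICD mapping was used.

```python
# Illustrative SUBSET of categories with the original Charlson weights.
# Keys are hypothetical labels, not codes from the study's dataset.
CHARLSON_WEIGHTS = {
    "myocardial_infarction": 1,
    "congestive_heart_failure": 1,
    "dementia": 1,
    "chronic_pulmonary_disease": 1,
    "diabetes_without_complications": 1,
    "diabetes_with_complications": 2,
    "renal_disease": 2,
    "metastatic_solid_tumor": 6,
}


def charlson_score(conditions):
    """Sum the weights of the distinct categories present for one patient."""
    present = set(conditions)
    # Count only the more severe diabetes category when both are recorded.
    if "diabetes_with_complications" in present:
        present.discard("diabetes_without_complications")
    # Categories outside the table contribute nothing to the score.
    return sum(CHARLSON_WEIGHTS.get(c, 0) for c in present)
```

In practice each category would first be derived from ICD diagnosis codes, a step omitted here.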
Table 2.
Characteristics for all patients, COVID-19 positive subset, COVID-19 positive hospitalized patients, COVID-19 positive IMV patients, negative control group with a mental illness, and mental illness post-COVID-19
Table 1.
Example of condition groupings for PheCode and CCSR
The population was characterized based on an aggregate statistical analysis of the selected features for the following five groups: (1) all patient set, (2) COVID-19 positive subset, (3) COVID-19 positive with a mental health diagnosis present post-illness subset, (4) hospitalization subset, and (5) IMV subset. The mental health post-COVID-19 subset was identified by extracting patients who had been diagnosed with a mental illness after their first COVID-19 positive test. Patients were excluded from the subset if they had a mental health diagnosis prior to a positive COVID-19 indicator. Patients hospitalized with COVID-19 were identified based on date of COVID-19 test and reason for hospitalization. If patients were hospitalized within 30 days of a COVID-19 positive test or they had a COVID-19 diagnosis with a visit type of “Inpatient,” they were considered hospitalized for COVID-19. IMV patients were identified among patients who had a COVID-19-related hospitalization within a 30-day time frame. The comorbidities of the mental health post-COVID-19 subset were statistically compared to the control group of mental health patients in the negative COVID-19 group using the chi-square (χ²) test, and the p-values were reported.45,46
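As an illustration of the group comparison, the following sketch computes a Pearson chi-square test for a single 2x2 table (comorbidity present or absent, by group). It is a simplified stand-in for the statistical software used in the study: the function name is hypothetical, no continuity correction is applied, and the source does not say whether one was used.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for a 2x2 table.

    Layout:        condition present   condition absent
        group 1            a                   b
        group 2            c                   d
    Returns (statistic, p_value) with one degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column must have a non-zero total")
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With one degree of freedom, the chi-square upper tail reduces to erfc.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Hypothetical counts for illustration only (not taken from Table 2).
stat, p_value = chi_square_2x2(20, 80, 10, 90)
```

Each comorbidity would be tested with its own table of counts from the two groups.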
The diagnosis codes were mapped to CCSR single-level categories in the dataset and a high dimensional binary matrix in wide format was created for unsupervised cluster analysis of the COVID-19 positive subset (Figure 3a). The matrix consisted of unique patients and their associated CCSR categories. Each column corresponded to a diagnosis category and was populated with ‘1’ for presence of the condition and ‘0’ for not present. CCSR categories were excluded if prevalence was less than 1%. Out of the 517 CCSR single-level categories, 303 were kept for analysis. The Manhattan distance (a dissimilarity measure) was used to obtain a distance matrix of the binary categorical features.24
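The construction of the binary matrix, the prevalence filter, and the Manhattan distance can be sketched as below. This is a toy illustration with hypothetical function names and category labels, assuming the input is a mapping from patient identifier to the set of CCSR categories recorded for that patient.

```python
def build_binary_matrix(patient_categories, min_prevalence=0.01):
    """Build a patients-by-categories 0/1 matrix, dropping rare categories.

    patient_categories: dict mapping patient id -> set of category labels.
    Returns (patient_ids, kept_categories, matrix).
    """
    n_patients = len(patient_categories)
    counts = {}
    for cats in patient_categories.values():
        for cat in cats:
            counts[cat] = counts.get(cat, 0) + 1
    # Keep categories whose prevalence is at least the threshold.
    kept = sorted(c for c, k in counts.items() if k / n_patients >= min_prevalence)
    patients = sorted(patient_categories)
    matrix = [[1 if c in patient_categories[p] else 0 for c in kept]
              for p in patients]
    return patients, kept, matrix


def manhattan(u, v):
    """Manhattan distance; for 0/1 vectors this counts differing columns."""
    return sum(abs(x - y) for x, y in zip(u, v))


def distance_matrix(rows):
    """Pairwise Manhattan distances between all rows."""
    return [[manhattan(u, v) for v in rows] for u in rows]


# Toy data with hypothetical labels; a 50% threshold is used only so that
# the filter has a visible effect on four patients.
toy = {"p1": {"A", "B"}, "p2": {"A"}, "p3": {"A", "C"}, "p4": {"B"}}
patients, kept, matrix = build_binary_matrix(toy, min_prevalence=0.5)
dist = distance_matrix(matrix)
```

In the toy data, category "C" occurs in one of four patients and is dropped, leaving a two-column matrix.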
Figure 3.
Two-Level SOM Clustering Approach: (1) original data projected onto a SOM ordered grid and (2) proto-clusters grouped to form the final clusters
Hierarchical & K-Modes Clustering
Hierarchical clustering is an agglomerative, bottom-up unsupervised learning approach that can be used to identify clinically significant subgroups based on the presence of comorbidities.47 The algorithm starts with each observation as a separate cluster. At each step, the two most similar clusters are identified and merged. This process continues until all the clusters are merged. The resulting dendrogram can then be cut at the optimum number of clusters. K-modes is a categorical variation of k-means that aims to group similar data together based on a specified distance metric. Where k-means represents each of the k clusters by its mean (centroid), k-modes represents each cluster by its mode.48
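A minimal k-modes sketch is given below to make the difference from k-means concrete. It is illustrative only: the function names are hypothetical, the initial modes are supplied by the caller rather than chosen by an initialization scheme, and simple matching (Hamming) dissimilarity is assumed.

```python
from collections import Counter


def hamming(u, v):
    """Simple matching dissimilarity: number of positions that differ."""
    return sum(x != y for x, y in zip(u, v))


def column_modes(rows):
    """Most frequent value in each column of a list of rows."""
    return [Counter(col).most_common(1)[0][0] for col in zip(*rows)]


def k_modes(rows, initial_modes, max_iter=20):
    """Alternate between assigning rows to the nearest mode and updating modes."""
    modes = [list(m) for m in initial_modes]
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(len(modes)), key=lambda j: hamming(r, modes[j]))
                      for r in rows]
        if new_labels == labels:
            break  # assignments are stable, so the modes will not change
        labels = new_labels
        for j in range(len(modes)):
            members = [r for r, lab in zip(rows, labels) if lab == j]
            if members:  # keep the previous mode if a cluster is empty
                modes[j] = column_modes(members)
    return labels, modes


# Toy binary data: two patients share early columns, two share late columns.
rows = [[1, 1, 0, 0], [1, 1, 1, 0], [0, 0, 1, 1], [0, 0, 0, 1]]
labels, modes = k_modes(rows, initial_modes=[rows[0], rows[2]])
```

With very sparse binary rows most columns have a mode of 0, which is one way such data can collapse many individuals into a single cluster.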
Self-Organizing Map
The self-organizing map (SOM) proposed by Kohonen is an unsupervised artificial neural network (ANN) that converts and visualizes high dimensional input onto a lower-dimensional map.49,50 SOM keeps the topological order of the original data, with the two-dimensional space consisting of nodes that cluster the most related data. The input is connected to a chosen node lattice, where the dataset is distributed across the nodes. The SOM process begins by initializing the lattice of nodes with a random sample of the dataset. The distances are then iteratively compared for each node and individuals are assigned to nodes (Figure 2). A two-level self-organizing clustering approach (Figure 3) was used to determine if mental health conditions would group together. The first stage provided topological coordinates of the prototypes (Figure 3b) such that they could be subsequently clustered by a classic clustering method in the second stage (e.g., K-means or hierarchical) (Figure 3c).51–53 The two-level SOM clustering approach first initialized a lattice of 60x60 (about five observations per neuron), and the SOM algorithm was run as a first level of abstraction to classify characteristics about the data into “proto-clusters.” The SOM was trained using several iterations of different parameters. Initially the map was trained with a large learning rate (alpha) and a small width of the initial neighborhood function. Parameters were adjusted and iterations were increased from 100 to 5,000. Observing the learning rate and visualizing the results of the initial maps indicated that the best SOM was achieved with fewer iterations and a smaller learning rate. This reduced data convergence. The next step of the two-level SOM clustering was to section the map using a clustering algorithm to define subgroups of the COVID-19 subset.
Both hierarchical and k-means clustering were performed to acquire connected clusters by multiplying the positional distance matrix on the grid by the distance matrix resulting from the variables in the dataset.53 To find a plausible number of clusters, the within-cluster distances for different numbers of clusters (k) were evaluated. R packages were used to visualize the total within-cluster sum of squares (WSS).54
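The combination of the two distance matrices can be sketched as follows. This assumes an element-wise product of the prototype (feature) distances and the grid (positional) distances, which is one reading of the description above and is not confirmed by the source; the function names and the use of Euclidean distance for both matrices are likewise assumptions.

```python
import math


def grid_positions(n_rows, n_cols):
    """(row, column) coordinates of each node on a rectangular lattice."""
    return [(r, c) for r in range(n_rows) for c in range(n_cols)]


def euclidean(u, v):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def combined_distance(codebook, positions):
    """Element-wise product of feature distance and grid distance per node pair."""
    n = len(codebook)
    return [[euclidean(codebook[i], codebook[j])
             * euclidean(positions[i], positions[j])
             for j in range(n)]
            for i in range(n)]


# Toy 1x3 map with one-dimensional prototypes (hypothetical values).
positions = grid_positions(1, 3)
codebook = [[0.0], [1.0], [1.0]]
combined = combined_distance(codebook, positions)
```

In the toy map, nodes 1 and 2 are equally dissimilar to node 0 in feature space, but node 2 lies farther away on the grid, so its combined distance to node 0 is larger. Weighting by grid distance in this way favors clusters made of adjacent map units.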
Figure 2.
Self-organizing map (SOM) artificial neural network (ANN) algorithm produces a two-dimensional grid from the higher-dimensional input matrix.
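A stripped-down version of SOM training is sketched below to show the mechanics of the first stage: finding the best matching unit for a sampled observation and pulling neighboring prototypes toward it. It is a didactic sketch under stated assumptions (online updates, a Gaussian neighborhood, linear decay of the learning rate and neighborhood width, and default parameter values chosen arbitrarily) and is not the R or Julia implementation used in the study.

```python
import math
import random


def best_matching_unit(x, codebook):
    """Index of the prototype closest to x (squared Euclidean distance)."""
    return min(range(len(codebook)),
               key=lambda k: sum((a - b) ** 2 for a, b in zip(x, codebook[k])))


def train_som(data, n_rows, n_cols, n_iter=200, alpha0=0.5, sigma0=1.0, seed=0):
    """Train a small rectangular SOM with online updates."""
    rng = random.Random(seed)
    positions = [(r, c) for r in range(n_rows) for c in range(n_cols)]
    # Initialize every node's prototype with a randomly sampled observation.
    codebook = [list(rng.choice(data)) for _ in positions]
    for t in range(n_iter):
        frac = t / n_iter
        alpha = alpha0 * (1 - frac)             # decaying learning rate
        sigma = max(sigma0 * (1 - frac), 0.1)   # shrinking neighborhood width
        x = rng.choice(data)
        bmu_row, bmu_col = positions[best_matching_unit(x, codebook)]
        for k, (row, col) in enumerate(positions):
            grid_d2 = (row - bmu_row) ** 2 + (col - bmu_col) ** 2
            h = math.exp(-grid_d2 / (2 * sigma ** 2))  # Gaussian neighborhood
            codebook[k] = [w + alpha * h * (xi - w)
                           for w, xi in zip(codebook[k], x)]
    return positions, codebook


# Toy binary input (hypothetical), mapped onto a 2x2 lattice.
data = [[1, 1, 0, 0], [1, 0, 0, 0], [0, 0, 1, 1], [0, 0, 0, 1]]
positions, codebook = train_som(data, n_rows=2, n_cols=2)
```

After training, each observation is assigned to the node returned by `best_matching_unit`, and the resulting prototypes serve as the "proto-clusters" passed to the second-stage clustering.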
Results
This study used the OMOP visit, measurement, condition, procedure, and observation tables from the HIE dataset. Using the N3C phenotype definition, 116,732 patients were extracted (36,281 positive for COVID-19) (Table 2).
There were 3,034 (8.4%) identified patients with a mental health condition that appeared after COVID-19 illness. A total of 1,939 (5.3%) patients were hospitalized related to COVID-19 and 383 (1.1%) patients required IMV. The COVID-19 positive cases were 62.2% female and 37.7% male, reflecting the known gender proportion in the HIE. The control group (COVID-19 negative patients) was similarly distributed based on demographic features. The mental health subset also stayed consistent with proportions expected in the HIE overall. Age was generally evenly distributed with a slightly higher Middle Aged Adult group (29.1%). Consistent with the prevalence of COVID-19 across ethnicities in Rhode Island, there was a higher prevalence of Hispanic/Latinx individuals (23% of COVID-19 positive vs 12% of COVID-19 negative patients) (Table 2).
Compared to the control group, the COVID-19 positive group had a similar prevalence of the comorbid conditions chosen for this study. By contrast, the mental health (post-COVID-19) subgroup had a higher prevalence. Notable differences were an inpatient rate of 14.6% vs 5.3% among COVID-19 positive individuals overall and an IMV prevalence of 2.5% vs 1.1%. These individuals also showed higher rates of, and statistically significant differences in, respiratory failure, encephalitis, smoking, diabetes, cardiovascular diseases, hypertension, and chronic lung disease (Table 2). These differences were confirmed with chi-square tests. Figure 4 shows the age distribution and common conditions observed in the COVID-19 patient population. Mental illness is distributed evenly across age groups, but the overall morbidity burden increases with age.
Figure 4.
Counts for selected diagnoses in COVID-19 positive population show high rates of mental illness across all ages and increased morbidity with age.
Hierarchical & K-Modes Clustering
On their own, neither agglomerative hierarchical nor K-modes clustering resulted in connected clusters or groupings of mental health conditions within the COVID-19 positive group. The sparsity of the input vectors (binary variables for each feature and CCSR grouping present for the patients) resulted in the dendrogram clustering many of the individuals into one cluster. The optimum number of clusters calculated was 4-5 clusters. Based on five clusters, most patients were in two of the clusters. The clusters were examined based on the features listed in Table 2. There were minimal differences in the distribution of conditions and no significant groupings or high prevalence within each of the clusters.
Self-Organizing Map
Several different metrics were compared to determine optimal number of clusters. Figure 5 shows the mean within clusters distance and the inflection point suggests a meaningful value for k. Both clustering methods were performed to cluster the map units into classes. Hierarchical clustering led to better connected clusters and was the chosen method for the SOM clustering. Six clusters were chosen as the optimal number of clusters based on the elbow plot and visualization of the connected clusters and feature similarities.
Figure 5.
Elbow plot of the total within-cluster sum of squares (WSS) for various numbers of clusters.
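The quantity plotted in an elbow plot can be computed as in the sketch below, which sums squared distances from each point to its cluster centroid. The function name and toy points are hypothetical; the study produced this plot with R packages, and the cluster labels here are supplied by hand rather than by a clustering algorithm.

```python
def within_cluster_ss(points, labels):
    """Total within-cluster sum of squared distances to each cluster centroid."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    total = 0.0
    for members in clusters.values():
        centroid = [sum(col) / len(members) for col in zip(*members)]
        total += sum(sum((x - c) ** 2 for x, c in zip(p, centroid))
                     for p in members)
    return total


# Toy points forming two well-separated pairs (hypothetical values).
points = [[0, 0], [0, 2], [10, 0], [10, 2]]
wss_k1 = within_cluster_ss(points, [0, 0, 0, 0])
wss_k2 = within_cluster_ss(points, [0, 0, 1, 1])
wss_k4 = within_cluster_ss(points, [0, 1, 2, 3])
```

The WSS drops sharply from one cluster to two and only slightly thereafter; the value of k at which the decrease levels off is the "elbow."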
Figure 6a shows the final output for the SOM and final six clusters. Figure 6b plots a heat map of the post-COVID-19 mental health diagnoses and confirms that most individuals can be seen in the clusters of interest (3, 4, and 6). The final SOM lattice can be visualized in Figure 7. Figure 7a depicts the counts of observations (individuals) in each node. The patients are evenly distributed throughout the map with few nodes that have 0 individuals in them, which is the goal. A larger number of individuals are seen on the outer edges of the lattice, which is the larger cluster (Cluster 1; described below). Figure 7b plots the neighbor distance or the “U-Matrix,” which is the visualized distance between each node and its immediate neighbors. This plot shows there are some distinct groupings or clusters with a high similarity (low neighbor distance) indicated by the green and red coloring. This was also where a large majority of individuals with mental health disorders appear. Figures 7c-l are property plots of the SOM that show heatmaps of feature groupings. Figure 7c is an example of a CCSR grouping that has a large distribution throughout each of the six clusters. This is included to compare to other conditions of interest that group in specific areas of the map. Figures 7d and 7e show where IMV and hospitalized individuals group on the map, respectively, and there is overlap with mental health conditions. Figures 7g-l indicate the areas of the SOM where mental health conditions group on the map. A high density of individuals with mental health conditions group in Clusters 3, 4, and 6, with the majority in Cluster 6. In Figure 7f smokers are largely observed in Clusters 4 and 6.
Figure 6.
(a) Final SOM clusters; (b) heat map of mental health post-COVID-19 highlighted on the SOM grid.
Figure 7.
Final SOM and associated cluster boundaries. (a) count of individuals in each node; (b) neighbor distance plot with high similarity in cluster 6; (c-l) property heat maps of different CCSR categories that show increased density on specific areas of the SOM
Cluster Characterization
Based on the clustering of the 36,281 COVID-19 positive cases, six clusters revealed distinct patterns. The demographic features and average Charlson comorbidity index (CCI) score for each cluster are described in Table 3. Each cluster was examined by analyzing the prevalence of CCSR categories for each cluster in addition to comparing prevalence of main features and comorbidities of focus from Table 2. Figure 8 includes a radar plot for each of the six clusters and some of the important attributes identified to compare the differences in prevalence (%). The farther the polygon extends along the axis for a given condition, the higher the prevalence of that condition within the cluster. Cluster 6 had the largest polygon, highest average CCI score, and contained a high incidence of individuals with mental health conditions post-COVID-19. A descriptive summary for each cluster is below.
Table 3.
Characteristics of the six clusters (COVID-19 positive)
Figure 8.
Radar plots for all six clusters with select demographics and conditions. Note that the larger polygons indicate higher prevalence around the spider plot for each feature within an individual cluster.
Cluster 1: Cluster 1 was the largest cluster, which can also be visualized in the SOM counts plot (Figure 7a). There was a slightly higher prevalence of females compared to the overall population (67.6% vs 62.2%) and the second largest prevalence of Hispanic/Latinx individuals (35.9%). This cluster had the lowest average age of ~46±5 years old and had a small prevalence of comorbidities. This group contained significantly lower prevalence of all CCSR categories compared to any other clusters and the lowest average CCI. The highest prevalence of CCSR categories included respiratory signs and symptoms (16%), upper respiratory infections (12.7%), and viral infection (11.2%).
Cluster 2: Cluster 2 had an even distribution of gender, with an average age of ~59±5 years old. This cluster contained common comorbidities at a high prevalence such as disorders of lipid metabolism (high cholesterol or hyperlipidemia) (80.5%). Over half the individuals had hypertension (57.8%) and diabetes (50.1%).
Cluster 3: Cluster 3 was largely female (78.2%), with an average age of ~49±5 and was the largest population of Hispanic/Latinx individuals (41.2%). This cluster had the second largest incidence of mental illness (60.3%), which included a large representation of anxiety (36.1%) and mood disorders (30%). Other common comorbidities were diabetes (48%), hypertension (42.4%), obesity (39.5%), nonspecific chest pain (42.3%), and asthma (25.7%).
Cluster 4: Cluster 4 was 65.8% female and 34.2% male (similar to the prevalence of the entire cohort), with an average age of ~49±5 years old. This cluster had noticeable groupings of post-COVID-19 mental health patients and the third largest prevalence of mental health conditions (41.2%) with 27% of individuals having anxiety and 18.3% mood disorders. This was also one of the two clusters that contained a little less than half of the smokers in the COVID-19 positive cohort (42.2%). The highest prevalence CCSR category within this cluster was nutritional deficiencies (26.5%). This group had a low CCI score with no other notable comorbidities with a prevalence greater than 15%.
Cluster 5: Cluster 5 had the highest male prevalence (50.2%) and had a greater average age of ~63±5 years old. This cluster had the highest prevalence of cardiac-related comorbidities such as ischemic heart disease (27.9%), heart failure, nonspecific chest pain (29.1%), and cardiac dysrhythmias (20.7%), as well as a higher prevalence of cardiomyopathies (6.9%). Other comorbidities of high prevalence included hypertension (70%), disorders of lipid metabolism (49.8%), diabetes (42.1%) and obesity (25%).
Cluster 6: Cluster 6 had an even gender distribution and the highest average age of ~65±5 years old. This cluster represented the largest number of mental health individuals (77.8%) and many individuals with a higher severity of COVID-19 disease. Of the 383 IMV individuals, 44.4% were identified in this cluster. Of the 3,983 in this cluster, 23% were hospitalized due to COVID-19 and 32.6% were diagnosed with respiratory failure. Mental health disorders observed in this cluster included anxiety (49.7%) and mood disorders (48.7%). As seen in Figures 7i-l, most of the other mental health conditions of interest were exclusively represented in this cluster. These included PTSD or Trauma and stressor related disorders (13.5%), suicidal ideation/attempt (9.8%), substance use disorder (22.1%), alcohol-related disorders (12%), sleep disorders (17.1%), psychotic disorders (10.1%), and dementia (24.9%). The individuals in this cluster had a high comorbidity burden with the highest average CCI score of 3.14. This cluster contained the largest number of smokers, and 26.5% of the individuals in this cluster had a diagnosis of chronic lung disease. Other common comorbidities included hypertension (77.7%), diabetes (55.1%), ischemic heart diseases (41.8%), heart failure (29.6%), aplastic anemia (41.5%), obesity (35.2%), and the highest prevalence of stroke (18.5%). As seen in Figure 8, most of the renal disease diagnoses (>75% of the COVID-19 positive cohort) were observed in this cluster with CCSR categories of chronic kidney disease (35.2%) and acute and unspecified renal failure (34.7%).
Discussion
Little is known about mental health sequelae due to the COVID-19 pandemic. This retrospective study used deidentified EHR data from a statewide HIE to characterize comorbidities associated with the COVID-19 population. This group was further divided into individuals who had post-COVID-19 mental health diagnoses, were hospitalized, or required IMV. Incidence rates of mental health conditions and patient profiles of demographics, social factors, and comorbidities were reported and compared. A SOM cluster analysis was done to further define different subgroups of the COVID-19 positive population, where the comorbidities (CCSR categories) were used as the features for the cluster analysis. Connected clusters occurred within the population and patterns specific to individuals with mental health conditions in the population were identified in three of the clusters. This study aimed to describe the COVID-19 positive group and associated mental health outcomes using unsupervised clustering techniques. We found a statistically significant increase in the incidence of mental health conditions, alongside an increased morbidity burden, among those diagnosed with COVID-19. The SOM cluster analysis revealed patterns of mental health conditions with different subgroups. Three of the clusters did not show significant mental health groupings. The remaining three clusters depicted mental health patterns in unique ways. One cluster (Cluster 4) contained younger individuals with mental health disorders who had a lesser morbidity burden. Another cluster (Cluster 3) was highly female and had a higher prevalence of Hispanic/Latinx individuals. This cluster also showed a higher incidence of other common comorbidities than Cluster 4, including diabetes, hypertension, asthma, and obesity.
The final subgroup (Cluster 6) of individuals with mental health conditions had the highest average age and higher severity of COVID-19 disease based on representation of hospitalization and individuals who required IMV. While Clusters 3 and 4 included mostly depression and anxiety, Cluster 6 represented other mental health concerns such as suicidality and PTSD. The patterns and characteristics identified in this cluster, such as high morbidity and severity of COVID-19 disease, are a high priority due to safety concerns and resource needs. Stratifying risk for developing these mental health conditions based on the patterns and features described in the mental health clusters will be an important area to focus on in future research. People who have severe symptoms of COVID-19 need inpatient hospital care, with some needing care in the ICU. The results from this analysis show patterns indicating that these experiences, among others associated with COVID-19 illness, can put a person at higher risk for diagnosis of mental health disorders.
The HIE dataset used for this study contained data from January 2018-December 2021. These few years of data should be sufficient to identify individuals with or without active mental health disorders; however, they do not necessarily reflect the full clinical profile of all individuals in the HIE. Nonetheless, it is still significant to note these mental illness diagnoses appearing in an individual’s profile of the HIE post COVID-19 illness after several years where there were no prior noted mental health concerns. To account for prior mental health conditions, future studies may include a more comprehensive longitudinal view of individuals in addition to studying other pandemic related factors that could cause increased mental health prevalence. This study also demonstrated how SOMs can identify meaningful clusters of individuals with mental health conditions post-COVID-19 diagnosis. Future work will focus on using a combination of metrics to determine the optimal number of clusters in addition to further validation of the clusters to provide more comprehensive results. Future studies may expand this work outside of Rhode Island to the entire N3C dataset that, as of July 2022, contained over 5.8 million positive COVID-19 patients. Additionally, supervised learning methods may be explored to design predictive models of mental health outcomes for clinical use, based on the important features identified in this initial study.
Conclusion
This study was an exploratory analysis that confirmed groupings of mental health diagnoses seen post-COVID-19 diagnosis. The results motivate the need for a more in-depth analysis on how these patterns are unique to individuals diagnosed with mental health conditions after a COVID-19 diagnosis. The use of SOMs enabled a characterization of COVID-19 positive individuals and mental health outcomes, which was not possible with other unsupervised clustering techniques. The results indicate that individuals with mental health conditions post-COVID-19 illness also have a high morbidity burden and more severe COVID-19 complications.
Acknowledgments
De-identified data for this analysis were provided by the Rhode Island Quality Institute (RIQI), which operates CurrentCare, the Rhode Island statewide Health Information Exchange (HIE). Technical support was provided by the following RIQI staff: Sarah Eltinge. This project was supported by National Institutes of Health grants U54GM115677 and U54GM104942. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health. The work was conducted using computational resources and services provided by the Center for Computation and Visualization at Brown University.
References
- 1. Dong L, Bouey J. Public Mental Health Crisis during COVID-19 Pandemic, China. Emerg Infect Dis. 2020;26(7):1616–8. doi: 10.3201/eid2607.200407.
- 2. Cullen W, Gulati G, Kelly BD. Mental health in the COVID-19 pandemic. QJM. 2020;113(5):311–2. doi: 10.1093/qjmed/hcaa110.
- 3. Talevi D, Socci V, Carai M, et al. Mental health outcomes of the CoViD-19 pandemic. Riv Psichiatr. 2020;55(3):137–44. doi: 10.1708/3382.33569.
- 4. Khan KS, Mamun MA, Griffiths MD, Ullah I. The Mental Health Impact of the COVID-19 Pandemic Across Different Cohorts. Int J Ment Health Addict. 2020. pp. 1–7.
- 5. Bryant-Genevier J, Rao CY, Lopes-Cardozo B, et al. Symptoms of Depression, Anxiety, Post-Traumatic Stress Disorder, and Suicidal Ideation Among State, Tribal, Local, and Territorial Public Health Workers During the COVID-19 Pandemic - United States, March-April 2021. MMWR Morb Mortal Wkly Rep. 2021;70(48):1680–5. doi: 10.15585/mmwr.mm7048a6.
- 6. Czeisler MÉ, Lane RI, Petrosky E, et al. Mental Health, Substance Use, and Suicidal Ideation During the COVID-19 Pandemic - United States, June 24-30, 2020. MMWR Morb Mortal Wkly Rep. 2020;69(32):1049–57. doi: 10.15585/mmwr.mm6932a1.
- 7. Mazza MG, De Lorenzo R, Conte C, et al. Anxiety and depression in COVID-19 survivors: Role of inflammatory and clinical predictors. Brain Behav Immun. 2020;89:594–600. doi: 10.1016/j.bbi.2020.07.037.
- 8. Chen Y, Zhu Z, Lei F, Lei S, Chen J. Prevalence and Risk Factors of Post-traumatic Stress Disorder Symptoms in Students Aged 8-18 in Wuhan, China 6 Months After the Control of COVID-19. Front Psychol. 2021;12:740575. doi: 10.3389/fpsyg.2021.740575.
- 9. Musheyev B, Borg L, Janowicz R, et al. Functional status of mechanically ventilated COVID-19 survivors at ICU and hospital discharge. J Intensive Care Med. 2021;9(1):31. doi: 10.1186/s40560-021-00542-y.
- 10. Sher L. Are COVID-19 survivors at increased risk for suicide? Acta Neuropsychiatr. 2020;32(5):270. doi: 10.1017/neu.2020.21.
- 11. Janiri D, Carfì A, Kotzalidis GD, et al. Posttraumatic Stress Disorder in Patients After Severe COVID-19 Infection. JAMA Psychiatry. 2021;78(5):567–9. doi: 10.1001/jamapsychiatry.2021.0109.
- 12. Xiao S, Luo D, Xiao Y. Survivors of COVID-19 are at high risk of posttraumatic stress disorder. Glob Health Res Policy. 2020;5:29. doi: 10.1186/s41256-020-00155-2.
- 13. Tarsitani L, Vassalini P, Koukopoulos A, et al. Post-traumatic Stress Disorder Among COVID-19 Survivors at 3-Month Follow-up After Hospital Discharge. J Gen Intern Med. 2021;36(6):1702–7. doi: 10.1007/s11606-021-06731-7.
- 14. Chamberlain SR, Grant JE, Trender W, Hellyer P, Hampshire A. Post-traumatic stress disorder symptoms in COVID-19 survivors: online population survey. BJPsych Open. 2021;7(2):e47. doi: 10.1192/bjo.2021.3.
- 15. Sekowski M, Gambin M, Hansen K, et al. Risk of Developing Post-traumatic Stress Disorder in Severe COVID-19 Survivors, Their Families and Frontline Healthcare Workers: What Should Mental Health Specialists Prepare For? Front Psychiatry. 2021;12:562899. doi: 10.3389/fpsyt.2021.562899.
- 16. Liu D, Baumeister RF, Veilleux JC, et al. Risk factors associated with mental illness in hospital discharged patients infected with COVID-19 in Wuhan, China. Psychiatry Res. 2020;292:113297. doi: 10.1016/j.psychres.2020.113297.
- 17. Taquet M, Geddes JR, Husain M, Luciano S, Harrison PJ. 6-month neurological and psychiatric outcomes in 236 379 survivors of COVID-19: a retrospective cohort study using electronic health records. Lancet Psychiatry. 2021;8(5):416–27. doi: 10.1016/S2215-0366(21)00084-5.
- 18. Xie Y, Xu E, Al-Aly Z. Risks of mental health outcomes in people with covid-19: cohort study. BMJ. 2022;376:e068993. doi: 10.1136/bmj-2021-068993.
- 19. Kim J, Kim H. Demographic and Environmental Factors Associated with Mental Health: A Cross-Sectional Study. Int J Environ Res Public Health. 2017;14(4). doi: 10.3390/ijerph14040431.
- 20. Sartorius N. Comorbidity of mental and physical diseases: a main challenge for medicine of the 21st century. Shanghai Arch Psychiatry. 2013;25(2):68–9. doi: 10.3969/j.issn.1002-0829.2013.02.002.
- 21. Xie Q, Liu X-B, Xu Y-M, Zhong B-L. Understanding the psychiatric symptoms of COVID-19: a meta-analysis of studies assessing psychiatric symptoms in Chinese patients with and survivors of COVID-19 and SARS by using the Symptom Checklist-90-Revised. Transl Psychiatry. 2021;11(1):290. doi: 10.1038/s41398-021-01416-5.
- 22. Carmona-Pírez J, Gimeno-Miguel A, Bliek-Bueno K, et al. Identifying multimorbidity profiles associated with COVID-19 severity in chronic patients using network analysis in the PRECOVID Study. Sci Rep. 2022;12(1):2831. doi: 10.1038/s41598-022-06838-9.
- 23. Doshi-Velez F, Ge Y, Kohane I. Comorbidity clusters in autism spectrum disorders: an electronic health record time-series analysis. Pediatrics. 2014;133(1):e54–63. doi: 10.1542/peds.2013-0819.
- 24. Chang J, Sarkar IN. Using Unsupervised Clustering to Identify Pregnancy Co-Morbidities. AMIA Jt Summits Transl Sci Proc. 2019;2019:305–14.
- 25. Singh SP, Karkare S, Baswan SM, Singh VP. Agglomerative Hierarchical Clustering Analysis of co/multi-morbidities [Internet]. arXiv [q-bio.QM]; 2018. Available from: http://arxiv.org/abs/1807.04325.
- 26. Ferreira-Santos D, Rodrigues PP. Obstructive sleep apnea: A categorical cluster analysis and visualization. Pulmonology. 2021.
- 27. Wartelle A, Mourad-Chehade F, Yalaoui F, Chrusciel J, Laplanche D, Sanchez S. Clustering of a Health Dataset Using Diagnosis Co-Occurrences. Appl Sci. 2021;11(5):2373.
- 28. Cho SI, Yoon S, Lee H-J. Impact of comorbidity burden on mortality in patients with COVID-19 using the Korean health insurance database. Sci Rep. 2021;11(1):6375. doi: 10.1038/s41598-021-85813-2.
- 29. Kessler RC, Demler O, Frank RG, et al. Prevalence and treatment of mental disorders, 1990 to 2003. N Engl J Med. 2005;352(24):2515–23. doi: 10.1056/NEJMsa043266.
- 30. Egede LE. Major depression in individuals with chronic medical disorders: prevalence, correlates and association with health resource utilization, lost productivity and functional disability. Gen Hosp Psychiatry. 2007;29(5):409–16. doi: 10.1016/j.genhosppsych.2007.06.002.
- 31. Nowels MA, VanderWielen LM. Comorbidity indices: a call for the integration of physical and mental health. Prim Health Care Res Dev. 2018;19(1):96–8. doi: 10.1017/S146342361700041X.
- 32. Huang Y-Q, Gou R, Diao Y-S, et al. Charlson comorbidity index helps predict the risk of mortality for patients with type 2 diabetic nephropathy. J Zhejiang Univ Sci B. 2014;15(1):58–66. doi: 10.1631/jzus.B1300109.
- 33. Bezanson J, Edelman A, Karpinski S, Shah VB. Julia: A fresh approach to numerical computing. SIAM Rev. 2017;59(1):65–98.
- 34. R Core Team. R: A Language and Environment for Statistical Computing [Internet]. 2021. Available from: https://www.R-project.org/
- 35. National-COVID-Cohort-Collaborative (N3C) Phenotype Data Acquisition [Internet]. Github; [cited 2022 Feb 22]. Available from: https://github.com/National-COVID-Cohort-Collaborative/Phenotype_Data_Acquisition.
- 36. Haendel MA, Chute CG, Bennett TD, et al. The National COVID Cohort Collaborative (N3C): Rationale, design, infrastructure, and deployment. J Am Med Inform Assoc. 2021;28(3):427–43. doi: 10.1093/jamia/ocaa196.
- 37. ICD - ICD-10-CM - International Classification of Diseases, Tenth Revision, Clinical Modification [Internet]. 2022 [cited 2022 Mar 9]. Available from: https://www.cdc.gov/nchs/icd/icd10cm.htm.
- 38. SNOMED International. SNOMED CT. 2016 [cited 2022 Mar 9]. Available from: https://www.snomed.org/
- 39. PheWAS - Phenome Wide Association Studies [Internet]. [cited 2019 May 11]. Available from: https://phewascatalog.org/phecodes.
- 40. Clinical Classifications Software Refined (CCSR) for ICD-10-CM Diagnoses [Internet]. [cited 2022 Feb 22]. Available from: https://www.hcup-us.ahrq.gov/toolssoftware/ccsr/dxccsr.jsp.
- 41. National Library of Medicine (NLM), Unified Medical Language System® (UMLS®). SNOMED CT to ICD-10-CM Map. 2012 [cited 2022 Mar 9]. Available from: https://www.nlm.nih.gov/research/umls/mapping_projects/snomedct_to_icd10cm.html.
- 42. Charlson ME, Pompei P, Ales KL, MacKenzie CR. A new method of classifying prognostic comorbidity in longitudinal studies: development and validation. J Chronic Dis. 1987;40(5):373–83. doi: 10.1016/0021-9681(87)90171-8.
- 43. Quan H, Li B, Couris CM, et al. Updating and validating the Charlson comorbidity index and score for risk adjustment in hospital discharge abstracts using data from 6 countries. Am J Epidemiol. 2011;173(6):676–82. doi: 10.1093/aje/kwq433.
- 44. Sundararajan V, Henderson T, Perry C, Muggivan A, Quan H, Ghali WA. New ICD-10 version of the Charlson comorbidity index predicted in-hospital mortality. J Clin Epidemiol. 2004;57(12):1288–94. doi: 10.1016/j.jclinepi.2004.03.012.
- 45. Bellolio MF, Serrano LA, Stead LG. Understanding statistical tests in the medical literature: which test should I use? Int J Emerg Med. 2008;1(3):197–9. doi: 10.1007/s12245-008-0061-z.
- 46. Chen ES, Hripcsak G, Xu H, Markatou M, Friedman C. Automated acquisition of disease drug knowledge from biomedical and clinical documents: an initial study. J Am Med Inform Assoc. 2008;15(1):87–98. doi: 10.1197/jamia.M2401.
- 47. Johnson SC. Hierarchical clustering schemes. Psychometrika. 1967;32(3):241–54. doi: 10.1007/BF02289588.
- 48. Huang Z, Ng MK. A fuzzy k-modes algorithm for clustering categorical data. IEEE Trans Fuzzy Syst. 1999;7(4):446–52.
- 49. Kohonen T. The self-organizing map. Proc IEEE. 1990;78(9):1464–80.
- 50. Kohonen T. Essentials of the self-organizing map. Neural Netw. 2013;37:52–65. doi: 10.1016/j.neunet.2012.09.018.
- 51. Alippi C, Polycarpou M, Panayiotou C, Ellinas G. Artificial Neural Networks - ICANN 2009: 19th International Conference, Limassol, Cyprus, September 14-17, 2009, Proceedings. Springer Science & Business Media; 2009.
- 52. Souza JR, Ludermir TB, Almeida LM. A Two Stage Clustering Method Combining Self-Organizing Maps and Ant K-Means. In: Artificial Neural Networks - ICANN 2009. Springer Berlin Heidelberg; 2009. pp. 485–94.
- 53. Yang L, Ouyang Z, Shi Y. A Modified Clustering Method Based on Self-Organizing Maps and Its Applications. Procedia Comput Sci. 2012;9:1371–9.
- 54. Kassambara A. Practical Guide to Cluster Analysis in R: Unsupervised Machine Learning. STHDA. 2017.











