Skip to main content
JMIR mHealth and uHealth logoLink to JMIR mHealth and uHealth
. 2019 Dec 5;7(12):e14473. doi: 10.2196/14473

Clustering Insomnia Patterns by Data From Wearable Devices: Algorithm Development and Validation Study

Sungkyu Park 1,#, Sang Won Lee 2,#, Sungwon Han 3, Meeyoung Cha 3,4,
Editor: Gunther Eysenbach
Reviewed by: Bumseok Jeong
PMCID: PMC6923760  PMID: 31804187

Abstract

Background

As societies become more complex, larger populations suffer from insomnia. In 2014, the US Centers for Disease Control and Prevention declared that sleep disorders should be dealt with as a public health epidemic. However, it is hard to provide adequate treatment for each insomnia sufferer, since various behavioral characteristics influence symptoms of insomnia collectively.

Objective

We aim to develop a neural-net based unsupervised user clustering method towards insomnia sufferers in order to clarify the unique traits for each derived groups. Unlike the current diagnosis of insomnia that requires qualitative analysis from interview results, the classification of individuals with insomnia by using various information modalities from smart bands and neural-nets can provide better insight into insomnia treatments.

Methods

This study, as part of the precision psychiatry initiative, is based on a smart band experiment conducted over 6 weeks on individuals with insomnia. During the experiment period, a total of 42 participants (19 male; average age 22.00 [SD 2.79]) from a large university wore smart bands 24/7, and 3 modalities were collected and examined: sleep patterns, daily activities, and personal demographics. We considered the consecutive daily information as a form of images, learned the latent variables of the images via a convolutional autoencoder (CAE), and clustered and labeled the input images based on the derived features. We then converted consecutive daily information into a sequence of the labels for each subject and finally clustered the people with insomnia based on their predominant labels.

Results

Our method identified 5 new insomnia-activity clusters of participants that conventional methods have not recognized, and significant differences in sleep and behavioral characteristics were shown among groups (analysis of variance on rank: F4,37=2.36, P=.07 for the sleep_min feature; F4,37=9.05, P<.001 for sleep_efficiency; F4,37=8.16, P<.001 for active_calorie; F4,37=6.53, P<.001 for walks; and F4,37=3.51, P=.02 for stairs). Analyzing the consecutive data through a CAE and clustering could reveal intricate connections between insomnia and various everyday activity markers.

Conclusions

Our research suggests that unsupervised learning allows health practitioners to devise precise and tailored interventions at the level of data-guided user clusters (ie, precision psychiatry), which could be a novel solution to treating insomnia and other mental disorders.

Keywords: insomnia, precision psychiatry, cluster analysis, time-series data, unsupervised learning, convolutional autoencoder

Introduction

Approximately 30% of contemporary people have one or more symptoms of insomnia, and insomnia sufferers encounter difficulty falling or staying asleep [1,2]. Work schedule; sleep irregularity; naps; and nicotine, alcohol, and caffeine consumption can have significant effects on insomnia symptoms [3,4]. While an individual’s intermixed behavioral characteristics might affect sleep behaviors, these are not considered in current diagnostic systems such as the Diagnostic and Statistical Manual of Mental Disorders, Fifth Edition or the International Classification of Diseases.

With concepts of precision psychiatry emerging, individual characteristics including genetic or neuroimaging, behavioral characteristics, and individual symptoms of illness are being used to make better decisions for diagnosis or treatment. Advances in machine learning and deep learning techniques can make precision psychiatry possible in clinical situations. For instance, unsupervised learning has been applied to distinguish traits of patients with various psychiatric disorders from those of healthy subjects [5,6]. When considering the heterogeneous characteristics of insomnia patients, an approach using precision psychiatry concepts can help develop better treatment methods for insomnia.

There has been considerable research using unsupervised machine learning methodologies in medical sciences not limited to psychiatry. One study reviewed the literature on detecting various diseases via computer-aided diagnosis and identified the best machine learning methodology for each disease [7]. Specific to using images as inputs, researchers have developed a single-layer sparse autoencoder to automatically classify tissue types from dynamic contrast-enhanced magnetic resonance imaging [8]. Another study discovered ground-truth networks in the brain through unsupervised learning of functional magnetic resonance imaging data [9]. As these research projects demonstrate, unsupervised learning has shown its potential to produce highly accurate models in the real world.

Several approaches have been proposed to handle the unstructured data type common in medical research. The support vector machine algorithm has been used on brain images for detecting Alzheimer disease in patients as well as for finding related brain parts with signs of Alzheimer disease [10]. One study built a convolutional neural network model to learn relevant features from unstructured raw data automatically and then made a convolutional neural network–based multimodal disease risk prediction model [11]. Another work collected data from an online social network (Twitter), constructed symptom weighting vectors by exploiting sentiment analysis, and tried to detect latent infectious diseases [12]. Some researchers have adopted wearable devices as inputs and built a neural network based on a multilayered perceptron to detect cardiovascular disease [13].

Despite the potential of a precision approach to insomnia treatment, little effort has been made to classify patients with insomnia using sleep and behavioral patterns. We hypothesized that certain symptom clusters representing specific sleep and/or behavioral patterns exist among individuals with insomnia. Sleep and behavioral characteristics that can affect insomnia symptoms are easily collected through smart bands. In this study, we tried to find certain clusters representing different sleep or behavioral characteristics using smart band data. Our unsupervised learning classification approach can be a cost-effective alternative that can bring positive insights for developing everyday interventions to assist in insomnia recovery (see Multimedia Appendix 1 for more information about implementation details including codes and datasets).

Methods

Recruitment

We applied unsupervised learning to gathered time-series data from an experiment. Participants were recruited using an online board of a large university community. To recruit subjects with insomnia symptoms, we used the Insomnia Severity Index (ISI) [14], Korean version [15]. The ISI is calculated based on responses to 7 questions that ask about sleep problems individuals have experienced in the most recent 2 weeks. For example, the questionnaire asks “How satisfied/dissatisfied are you with your current sleep pattern?” and the respondents answer using a 5-point Likert scale (1=very satisfied, 2=satisfied, 3=moderately satisfied, 4=dissatisfied, 5=very dissatisfied). This index is used as a standard metric of treatment response in clinical research.

A total of 50 participants whose ISI scores were 15 or above (ie, an indication of a mild to severe level of insomnia) participated (26 male; average age 22.63 [SD 3.02] years). Height and weight information were also gathered to calculate body mass index (BMI). In the study, we recruited young, healthy participants. Applicants being treated for medical or psychiatric illnesses were excluded from the experiment. The experiment was approved by the Korea Advanced Institute of Science and Technology institutional review board (approval number KH2018-40). Subjects’ behavioral and sleep characteristics were collected via a smart band, the Charge 2 (Fitbit Inc) [16].

Intervention

The experiment was conducted for 6 weeks from April 23 to June 3, 2018, and the subjects’ data were sent to a server. The first author gave weekly reminders to encourage subjects to wear the device continuously. Eight subjects were excluded from the study since they failed to wear the device for longer than 2 consecutive days. The remaining 42 participants finished our experiment, and we analyzed data from the subjects (19 male; average age 22.00 [SD 2.79]). The data features gathered and analyzed in our study are shown in Textbox 1. All values were normalized before analysis (see Figure 1 for the normalization method).

List of data gathered from the smart band–wearing experiment.

Modality 1: Sleep pattern (source: Fitbit)

  • sleep_start_time: time when a user goes to bed in units of seconds

  • sleep_end_time: time when a user gets out of bed

  • onbed_min: total time duration for staying in bed

  • sleep_min: total time duration for actual sleeping

  • sleep_efficiency: sleep min/onbed min

  • awaken_min: total time duration of waking while sleeping

  • awaken_moments: total frequency of waking up while sleeping

Modality 2: Daily activity (Fitbit)

  • calories_consumed: total number of calories consumed per day

  • active_calories: total number of calories consumed for activities per day

  • Walks: total frequency of steps per day

  • distance: total distance a user moves per day

  • Stairs: total frequency at which a user climbs stairs per day

  • active_ratio: daily moving time over the total time wearing the device

Modality 3: Personal demographic (Survey)

  • Age: age of the participant

  • Gender: gender of the participant

  • Body mass index: kg/m2

  • Insomnia Severity Index: result from the survey

Figure 1.

Figure 1

Min-max scaler for normalizing each feature, where all values for each feature are normalized between 0 and 1.

Fitbit provides a rich set of information about the wearer’s sleep. For example, it tracks the total time that a person is in bed as well as the predicted total sleep time. Its autodetection and prediction are based on various behavioral or biological patterns such as information on heart rate and movement. There is one paper confirming the validity of the data from the Charge 2 [17]; it reports that sleep sensitivity (ie, predicting falling asleep) of Fitbit is remarkable, about 0.96, and the measured wake after sleep onset is similar to that of the existing polysomnography (PSG) method. Another study also demonstrated that the retrieved data from actigraphy corresponded acceptably well to that of PSG [18]. However, at the same time, we need to be cautious of using the data retrieved from wearables, since there might be a specificity (relatively low precision) issue with the wearables [19]. While it is possible for smart bands to give false sleep reports, we chose smart bands over PSG because they are easier to use and can obtain sleep measures regularly not in the clinical setting but in the wild. In addition to the sleep log, the Fitbit also reports a variety of information about the subjects’ activity, including the calories consumed, steps walked, distance moved, and stairs climbed.

Data Analysis

Among the Fitbit-provided features in Textbox 1, we considered the reliability of information about sleep stages to be relatively low; therefore, 3 sleep features (rapid eye movement sleep, narrow sleep, deep sleep) were excluded from the analysis. Instead, sleep efficiency was added, which is calculated as the fraction of time dedicated to actual sleep out of the time spent lying in bed. Sleep efficiency is an essential factor in determining sleep quality [20]. Additionally, we introduced a new variable, the active ratio, which examines the total fraction of time spent on any activity.

Due to the nature of sleep, we needed to examine time-series data and consider the entire sequence of activities leading to each sleep event [21]. Today’s sleep can be affected collectively by today’s activity, yesterday’s sleep, and so on. Therefore, it may be acceptable to jointly consider logs like this across multiple sleep and behavioral features. Our goal was to automatically identify clusters of sleep and behavioral features in hopes of identifying meaningful groups of individuals with insomnia who exhibit similar sleep-related dysfunctions.

Results

Approach Overview of Cluster Analysis

Cluster analysis has been a popular research topic in computer science [22]. In this study, we have tried two different approaches to clustering: synchronic and diachronic. In the synchronic approach, we would consider the dataset as one whole snapshot and cluster features within a snapshot all together. There were two issues with the synchronic approach (see Multimedia Appendix 1 for more detailed research procedures and results of this approach): (1) the clustering performance was not good enough in general and (2) this approach does not consider the consecutive patterns within and between features. Thus, we developed a novel diachronic approach, presented in the following section, in which we cluster the chunk consecutive daily logs first and then classify users based on their dominant clusters of chunk logs. This method represents insomnia-related patterns as a sequence of images.

Diachronic Unsupervised Learning of Insomnia Patterns

Supervised learning of clinical data may develop an overfitting problem because data are unstructured, sparse, noisy, or irregular [23]. In such conditions, unsupervised learning can be a good alternative. In particular, a convolutional autoencoder (CAE) that learns the abstract latent representations of data is known to be appropriate for the task [24]. We implemented a CAE and prepared a collage of daily features as an input image. This approach allowed us to capture the entire set of features simultaneously within the assigned time window.

When clustering high-dimensional data, it is crucial to reduce dimensions to avoid expensive computational costs and memory loss [25]. In handling high-dimensional data, the target clusters often lie in subspaces of the full space [26]. Hence, reducing dimensions via conventional methods such as singular value decomposition or principal component analysis may not yield correct clustering results, mainly when the target clusters are not in the same subspace. To avoid this issue, we implemented a CAE to effectively reduce the dimensions in the data since the latent variables of the CAE may include subspace information via filters. We also chose to implement CAE before clustering because a previous study argued that subsequence time-series clustering based on a sliding window might not be meaningful [27]. By treating the data with CAE first and then applying to cluster, the outcome patterns of the clusters are affected not by the subsequence time-series data but by the derived latent variables per image. Moreover, we can also expect that by presuming the data into images, the latent variables of CAE are more robust to the noisy daily changes of the Fitbit logs.

Clustering Steps

The neural net–based clustering approach operates in 5 steps.

Step 1: Preprocessing of Time-Series Data

We combine modality 1 (sleep pattern) and modality 2 (daily activity) in Textbox 1 to construct a multiplex vector of 12 dimensions (cf, modality 3, personal demographics, was not included since its data did not frequently change over the 6-week period, so it was solely employed in the comprehensive qualitative analysis process). In this step, with respect to modality 1, the onbed_min feature was excluded from the source dataset because it showed high correlations with other features, similar to the synchronic approach (see Multimedia Appendix 2 for the cross-correlations among 12 features of modalities 1 and 2). Let sleep vector vs={x1, x2, ..., x8} and activity vector va={y1, y2, ..., y6}; then, the daily multiplex vector of subject α vα={x1α, ..., x8α, y1α, ..., y6α}.

Step 2: Composing Sequential Images From Data

Sequentially connected multiplex vectors can be treated as an image chunk. We then composed the consecutive multiplex matrix to learn latent variables from images that represent individuals’ daily log series. We modeled the daily sleep patterns as a Markov decision process because, according to previous sleep research, sleep deprivation (ie, sleep debt) has cumulative effects on waking function, and tonight’s sleep debt will mostly be affected by recent sleep patterns (ie, sleep 1 to 2 days before) [28]. As a result, we connected each daily multiplex vector with a sliding window over 8 days (ie, the total 8-day logs are combined as one window always to include a weekend, where a window slides to the next vector, which is the next day). We applied a discount factor (γ) such that on the first window, the consecutive multiplex vector of the subject α per day can be formulated as in Figure 2.

Figure 2.

Figure 2

Consecutive multiplex vector of the subject α per day on the 1st image chunk, where t1 is defined as the 9th of vα (ie, the 9th day from the starting date) and m as the corresponding closeness rank number (mth of vα) to t1 within the image chunk (1≤m≤8).

We then finalized the sequential multiplex matrix, as shown in Figure 3. Fixing 8 days as a window size leaves a total of 42 days worth of data (ie, 42 data points). This result means that each user is represented by 35 sets of images (42–8+1=35), which is the batch size. The total number of input image chunks for the CAE is 1470 (42 users×35 images=1470). Let the sequential multiplex matrix of the first image chunk for the subject α be defined as in Figure 4. As a consequence, the sequential multiplex matrix of all subjects for all 35 images can be framed as in Figure 5 (see Multimedia Appendix 1 for more information about the mathematical notations).

Figure 3.

Figure 3

Converting the consecutive sleep and activity data to images: sliding windows of 8-day chunks (# of Features=12, # of Days [ie, win-size]=8).

Figure 4.

Figure 4

Sequential multiplex matrix of the subject α on the 1st image chunk, where t1=9.

Figure 5.

Figure 5

Sequential multiplex matrix of all subjects for all 35 images, where β indicates the 2nd participant, tn≤43, and tn+1=tn+1.

Step 3: Learning Representations With Reducing Dimensions via Convolutional Autoencoder

In the proposed model depicted in Figure 6 for the designed CAE, the pooling layer has been omitted due to the small size of the input images (8×12). Additionally, we set up only one convolutional layer due to the relatively small amount of data (141,120 pixels=1470×8×12). Discount factor γ was set between 0.60 and 0.99. Then we iterated the model until finding the optimal learning rate, optimal γ value, optimal vertical size (cf, the horizontal size was fixed to 12, ie, the number of features), number of convolutional filters, and number of latent variables (ie, encoded_size in Figure 6).

Figure 6.

Figure 6

Convolutional autoencoder to find the latent variables of insomnia-related patterns per 8 days.

We tested all combinations among 5 hyperparameters and found that the optimal values were 1e–4 with the AdamOptimizer (optimal learning rate); 0.75 (optimal γ value); 3 (optimal vertical size); 30 (number of convolutional filters); and 15 (number of latent variables), for each of these hyperparameters with the lowest L2-norm regularized reconstruction loss value of 1.09 (see Multimedia Appendix 1 for more information about the optimization and overfitting issue). To confirm the learning results, Figure 7 depicts the 35 input images and 35 reconstructed outputs of one random participant (UserId 26). This visual coherence ensures that the CAE efficiently reduced the input size dimensions (ie, 8×12→15) by learning the vital latent representations of data.

Figure 7.

Figure 7

Convolutional autoencoder (CAE) reconstruction results of the learned CAE for one random subject.

Step 4: Clustering Images via Latent Variables

We clustered the latent variables as follows. As an initial step, we used t-SNE (ie, stochastic neighbor embedding with t-distribution) to reduce the number of dimensions from 15 to 2 additionally (ie, major hyperparameters for t-SNE were perplexity=30.0 [29,30], see Multimedia Appendix 1 for more information related to t-SNE; metric=euclidean; method=barnes_hut; the clustering result from the largest value of average silhouettes [AS] was chosen out of 100 trials). We then applied k-means and hierarchical clustering algorithms to form clusters of the 8-day chunk images. The results are displayed in Figure 8; both clustering results indicate the same cluster number of 6, and they were found reasonable by the AS values. However, the results of k-means clustering showed better results following the AS as well as the sum of squared errors (SSE). Therefore, we chose k-means to be the choice of the final user clustering step.

Figure 8.

Figure 8

Clusters of 8-day chunk images based on sequential sleep/activity patterns. D: dimensions; C: clusters; AS: average silhouettes; SSE: sum of squared errors.

Step 5: Clustering Individuals With Insomnia

Each cluster of the 8-day images was labeled as cluster A, B, C, D, E, and F. We then composed a sequence of codes for each user (eg, the best-matched cluster code for each of the 35 images representing the user’s logs, such as “A-A-D-C-C-E-B-...”). Finally, to determine one dominant cluster type per user, the PageRank algorithm was implemented and used among the stochastic relations between those 6 extracted labels. This intuition is feasible since participants’ sequential codes, which are the series of the latent space representations per 8-day image for both sleep pattern and daily activity modalities, became the subjects’ series of the comprehensive insomnia-related patterns; therefore, the codes can be modeled as a hidden Markov model, meaning that one’s next state is dependent on the current state [28]. We then set the largest stochastic value of the labels per participant to be the dominant cluster, which is identical to finding a cluster with the highest PageRank value (ie, the probability of arriving at the cluster after many steps).

Evaluation Outcomes

Table 1 shows the clustering evaluation results among the two approaches concerning the diachronic clustering of daily vectors (v). The first approach is without CAE, meaning the original dataset is converted into chunk images (one image is composed of the 8 consecutive daily vectors with 96 dimensions) and then filtered via t-SNE (96→2 dimensions) and clustered by 3 clustering methods. The second approach is our model, and as introduced above, the original dataset is converted into chunk images and then inputted into CAE, and the latent variables of CAE are clustered by 3 clustering methods after the variables are filtered through t-SNE (96→15→2 dimensions).

Table 1.

Evaluation of user clustering results for two diachronic approaches.

Approach NCa ASb SSEc NEOd Remark
First with DBSCANe 1 0.2075 119.4467 14 Chunk images used without a CAEf
First with hierarchical clustering 4 0.3394 35.0190 g Chunk images used without a CAE
First with k-means clustering 5 0.3941 22.6103 Chunk images used without a CAE
Ours with DBSCAN 4 0.1275 64.7179 30 Chunk images used with a CAE
Ours with hierarchical clustering 6 0.4485 24.3988 Chunk images used with a CAE
Ours with k-means clustering 6 0.4653 21.5532 Chunk images used with a CAE

aNC: number of clusters.

bAS: average silhouettes.

cSSE: sum of squared errors.

dNEO: number of excluded outliers.

eDBSCAN: density-based spatial clustering of applications with noise.

fCAE: convolutional autoencoder.

gNot applicable.

We calculated two metrics to measure the clustering performance: the AS and the SSE. With AS, if the value is greater or equal to 0.4, then the clustering result is guaranteed to be significant [31]. By contemplating consecutive pattern information together via composing 35 images from the weekly based consecutive v per user, we could confirm that k-means clustering with our process showed the best performance (ie, the highest value of AS and the lowest of SSE) in clustering v. With respect to the first approach, the clustering performances were worse, and the possible reason for this result is that although the number of dimensions becomes large due to the chunked v, the approach could not successfully extract the latent features of the chunks while dramatically reducing the number of dimensions from 96 to 2.

Table 2 presents how the 42 subjects were eventually divided across the identified clusters of v by using our process with k-means clustering. Cluster F did not contain any members. The table shows the PageRank value, which represents the strength of the association between subjects and clusters. Cluster IDs are sorted by this PageRank value. Both cluster A and cluster D have the largest subject count of 12 people. The average BMI and ISI values are also given for each of the clusters. The BMI and ISI values were similar across clusters.

Table 2.

Diachronic clustering results of 42 participants suffering from insomnia.

Cluster ID A B C D E F
Subjects, n (%) 12 (29) 9 (21) 4 (10) 12 (29) 5 (12) 0 (0)
PageRank 0.57 0.54 0.45 0.42 0.36 a
BMIb (kg/m2), mean (SD) 22.8 (3.3) 20.8 (2.3) 26.0 (2.7) 20.2 (3.0) 22.9 (2.3)
ISIc, mean (SD) 19.1 (1.5) 19.1 (2.4) 18.2 (1.6) 16.9 (1.8) 18.8 (2.4)

aNot applicable.

bBMI: body mass index.

cISI: Insomnia Severity Index.

A psychiatrist conducted both qualitative and quantitative analyses of the 5 clusters. The number of subjects for each cluster is small and does not meet normality, and therefore, we performed an analysis of variance (ANOVA) on rank and Kruskal-Wallis test. There were several features that showed statistically significant differences among clusters (ANOVA on rank: F4,37=2.36, P=.07 for sleep_min; F4,37=9.05, P<.001 for sleep_efficiency; F4,37=8.16, P<.001 for active_calorie; F4,37=6.53, P<.001 for walks; F4,37=3.51, P=.02 for stairs). Detailed results were described in Figure 9 as well as Tables MA3-a and MA3-b in Multimedia Appendix 3 (we also performed the ANOVA and post hoc Tukey honest significant difference test and included the results in the same supplementary material for reference).

Figure 9.

Figure 9

Bar plots of the major smartband features per cluster (P<.10, *P<.05, **P<.01, ***P<.001 in analysis of variance on rank and Kruskal-Wallis test results for 5 groups, also find Tables MA3-a and MA3-b in Multimedia Appendix 3).

Discussion

Principal Findings

This research conducted a 6-week experiment to collect smart band data and analyzed it to identify relationships between insomnia and daily activities. Our analysis finds that participants who seemingly face similar levels of insomnia based on the ISI score belonged to different clusters based on unsupervised learning. This finding means that our neural net–based clustering method could identify, beyond conventional diagnosis, new meaningful sleep-activity relationships that could be used to devise tailored interventions. Our method could determine which cluster an individual belongs to via data indicators of sleep and behavior acquired in the study.

Among the derived clusters of people with insomnia (see Figure 9), we found clusters B, C, and E to be more similar to one another and clusters A and D to contain similar sleep-activity profiles. Within the former group, subjects in cluster B were relatively long sleepers (even with insomnia) with low sleep efficiency. These subjects were in bed long hours, yet they could not easily fall asleep. Sleep restriction was a potentially useful intervention for these subjects [32]. In contrast, the subjects in cluster E exhibited a short sleep time with high sleep efficiency. These subjects also burned more calories per day. Chronic restriction of sleep may have affected symptoms of insomnia in this group [33]. While subjects in cluster C seemed to be morning persons, as indicated by the awake time patterns, their activity levels were relatively low, as shown in the active calorie, walk, and stair data. These subjects were treated with different interventions from the other similar clusters. In the latter group, clusters A and D had similar sleep patterns, such as total sleep time and sleep efficiency. However, subjects in cluster A indicated higher activity levels, such as walks and stairs, compared with cluster D. Cluster D also showed the lowest ranges for both BMI and ISI compared with other clusters, where a smaller BMI value represents nonobese status and a smaller ISI value represents a milder insomnia level. Such subtle differences would have been difficult to notice with conventional methods or supervised learning. These results suggest that cluster-based intervention for individuals with insomnia could be an accurate method for alleviating symptoms of insomnia.

This study demonstrates that unsupervised learning of insomnia activity data gathered from wearable fitness devices could identify meaningful clusters. The 5 derived clusters each had distinctive characteristics, meaning that they can be used to derive different therapies and diagnoses. In accordance with our finding, one study also classified thousands of participants who answered online surveys by using latent class analysis so that they could find insomnia disorder subtypes [34], yet they were more focused not on past behavioral and sleeping patterns but on individual personalities as well as stressful surroundings in revealing the clusters. One more difference is that they solely used a conventional statistical method. Our study highlights that merely looking at sleep logs can reveal a limited aspect of insomnia; one needs to look at the full spectrum of both daily activities and sleep logs. Finally, our findings align with those of a previous study that revealed the average sleep quality is better related to many behavioral aspects of the person than the average sleep quantity [35].

Limitations

This work has several limitations. First, participants were from the same university, which may induce sampling biases. A larger scale study can draw conclusions that apply to the general public. In the future, we plan to conduct experiments targeting generic patients who visit hospitals regularly due to insomnia to ensure the generality of the clustering result. Second, a small sample size can reduce the statistical power, and it can affect the reducibility for future study. Third, various behaviors, such as consuming alcohol, tobacco, or caffeine, can affect patterns of sleep and activity. However, these factors were not considered in the current clustering analyses, mainly due to the small sample size and limited information on these behaviors (eg, consuming frequency per day or week). In that sense, a future study is needed to verify our preliminary insights. Fourth, when constructing the CAE, we prepared the input and output images irrespective of subjects. As a consequence, the learned latent variables cannot capture subtle dissimilarities that may exist across participants. To reduce this potential bias, a learning model may diminish the latent factors in composing images among participants while training. Last, we used t-SNE when reducing the dimensionality of the latent features of CAE images, and there could be type 1 errors when clustering dimension-reduced data because t-SNE forcibly fits original data into a t-distribution. We further plan to adopt and test other dimensionality reduction methods toward the extracted latent variables of CAE images to relax the possible type 1 error issue.

Conclusion

This study, although preliminary, gives new insights for future studies in the field of mobile health. Motivated by the results of this study, in the future we hope to develop tailored intervention strategies that can be matched to each cluster for relieving insomnia symptoms, which will be a meaningful step in precision psychiatry. In the meantime, we plan to consider qualitative factors such as the level of sleep as well as quantitative aspects of sleep-related factors.

Acknowledgments

The authors would like to thank Donghyun Ahn for his contribution on managing the experiment and collecting data. This research was supported by the Basic Science Research Program through the National Research Foundation of Korea funded by the Ministry of Science and Information and Communications Technology (#NRF-2017R1E1A1A01076400).

Abbreviations

ANOVA

analysis of variance

AS

average silhouettes

BMI

body mass index

CAE

convolutional autoencoder

ISI

Insomnia Severity Index

PSG

polysomnography

SSE

sum of squared errors

t-SNE

t-distributed stochastic neighbor embedding

Multimedia Appendix 1

Description of the datasets and the codes utilized at the current research and detailed procedure of user clustering based on the synchronic approach and its evaluation outcomes with additional descriptions on the clustering steps of the diachronic unsupervised learning approach.

Multimedia Appendix 2

Cross rank correlation matrix among 12 features of modality 1 and 2 used in clustering, regardless of the derived cluster ID (N=42) and with regard to each derived cluster ID (# of clusters=5).

Multimedia Appendix 3

Test results with respect to sleep pattern-related (modality 1) and daily activity-related (modality 2) features among 5 derived clusters of insomnia sufferers: (1) analysis of variance on rank and Kruskal-Wallis and (2) analysis of variance and post hoc Turkey honest significant difference.

Footnotes

Conflicts of Interest: None declared.

References

  • 1.Ohayon MM. Epidemiology of insomnia: what we know and what we still need to learn. Sleep Med Rev. 2002 Apr;6(2):97–111. doi: 10.1053/smrv.2002.0186.S1087079202901863 [DOI] [PubMed] [Google Scholar]
  • 2.Morphy H, Dunn K, Lewis M, Boardman H, Croft P. Epidemiology of insomnia: a longitudinal study in a UK population. Sleep. 2007 Mar;30(3):274–280. [PubMed] [Google Scholar]
  • 3.Yang Y, Shin JC, Li D, An R. Sedentary behavior and sleep problems: a systematic review and meta-analysis. Int J Behav Med. 2017 Aug;24(4):481–492. doi: 10.1007/s12529-016-9609-0.10.1007/s12529-016-9609-0 [DOI] [PubMed] [Google Scholar]
  • 4.National Sleep Foundation. [2019-04-23]. What causes insomnia? https://www.sleepfoundation.org/insomnia/what-causes-insomnia .
  • 5.Silver H, Shmoish M. Analysis of cognitive performance in schizophrenia patients and healthy individuals with unsupervised clustering models. Psychiatry Res. 2008 May 30;159(1-2):167–179. doi: 10.1016/j.psychres.2007.06.009.S0165-1781(07)00193-X [DOI] [PubMed] [Google Scholar]
  • 6.Zeng L, Shen H, Liu L, Hu D. Unsupervised classification of major depression using functional connectivity MRI. Hum Brain Mapp. 2014 Apr;35(4):1630–1641. doi: 10.1002/hbm.22278. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 7.Fatima M, Pasha M. Survey of machine learning algorithms for disease diagnostic. JILSA. 2017;09(01):1–16. doi: 10.4236/jilsa.2017.91001. [DOI] [Google Scholar]
  • 8.Shin H, Orton M, Collins D, Doran S, Leach M. Autoencoder in time-series analysis for unsupervised tissues characterisation in a large unlabelled medical image dataset. Proceedings of the ICMLA; 2011; Honolulu. 2011. [DOI] [Google Scholar]
  • 9.Bai Z, Walker P, Tschiffely A, Wang F, Davidson I. Unsupervised network discovery for brain imaging data. Proceedings of the SIGKDD; 2017; Halifax. 2017. [DOI] [Google Scholar]
  • 10.Zhang Y, Dong Z, Phillips P, Wang S, Ji G, Yang J, Yuan T. Detection of subjects and brain regions related to Alzheimer's disease using 3D MRI scans based on eigenbrain and machine learning. Front Comput Neurosci. 2015;9:66. doi: 10.3389/fncom.2015.00066. doi: 10.3389/fncom.2015.00066. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 11.Chen M, Hao Y, Hwang K, Wang L, Wang L. Disease prediction by machine learning over big data from healthcare communities. IEEE Access. 2017;5:8869–8879. doi: 10.1109/access.2017.2694446. [DOI] [Google Scholar]
  • 12.Lim S, Tucker CS, Kumara S. An unsupervised machine learning model for discovering latent infectious diseases using social media data. J Biomed Inform. 2017 Feb;66:82–94. doi: 10.1016/j.jbi.2016.12.007. https://linkinghub.elsevier.com/retrieve/pii/S1532-0464(16)30181-2 .S1532-0464(16)30181-2 [DOI] [PubMed] [Google Scholar]
  • 13.Oresko JJ, Duschl H, Cheng AC. A wearable smartphone-based platform for real-time cardiovascular disease detection via electrocardiogram processing. IEEE Trans Inf Technol Biomed. 2010 May;14(3):734–740. doi: 10.1109/TITB.2010.2047865. [DOI] [PubMed] [Google Scholar]
  • 14.Bastien C, Vallières A, Morin C. Validation of the Insomnia Severity Index as an outcome measure for insomnia research. Sleep Med. 2001 Jul;2(4):297–307. doi: 10.1016/s1389-9457(00)00065-4.S1389945700000654 [DOI] [PubMed] [Google Scholar]
  • 15.Cho YW, Song ML, Morin CM. Validation of a Korean version of the insomnia severity index. J Clin Neurol. 2014 Jul;10(3):210–215. doi: 10.3988/jcn.2014.10.3.210. https://thejcn.com/DOIx.php?id=10.3988/jcn.2014.10.3.210 . [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 16.Fitbit. [2019-04-23]. Fitbit Charge 2 https://www.fitbit.com/uk/charge2 .
  • 17.de Zambotti M, Goldstone A, Claudatos S, Colrain IM, Baker FC. A validation study of Fitbit Charge 2™ compared with polysomnography in adults. Chronobiol Int. 2018 Apr;35(4):465–476. doi: 10.1080/07420528.2017.1413578. [DOI] [PubMed] [Google Scholar]
  • 18.Blackwell T, Redline S, Ancoli-Israel S, Schneider J, Surovec S, Johnson N, Cauley JA, Stone KL, Study of Osteoporotic Fractures Research Group Comparison of sleep parameters from actigraphy and polysomnography in older women: the SOF study. Sleep. 2008 Feb;31(2):283–291. doi: 10.1093/sleep/31.2.283. http://europepmc.org/abstract/MED/18274276 . [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 19.Marino M, Li Y, Rueschman MN, Winkelman JW, Ellenbogen JM, Solet JM, Dulin H, Berkman LF, Buxton OM. Measuring sleep: accuracy, sensitivity, and specificity of wrist actigraphy compared to polysomnography. Sleep. 2013 Nov 01;36(11):1747–1755. doi: 10.5665/sleep.3142. http://europepmc.org/abstract/MED/24179309 . [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 20.Buysse DJ, Reynolds CF, Monk TH, Berman SR, Kupfer DJ. The Pittsburgh Sleep Quality Index: a new instrument for psychiatric practice and research. Psychiatry Res. 1989 May;28(2):193–213. doi: 10.1016/0165-1781(89)90047-4.0165-1781(89)90047-4 [DOI] [PubMed] [Google Scholar]
  • 21.Ding H, Trajcevski G, Scheuermann P, Wang X, Keogh E. Querying and mining of time series data. Proceedings of the VLDB; 2008; Auckland. 2008. Aug 01, pp. 1542–1552. [DOI] [Google Scholar]
  • 22.Saxena A, Prasad M, Gupta A, Bharill N, Patel OP, Tiwari A, Er MJ, Ding W, Lin C. A review of clustering techniques and developments. Neurocomputing. 2017 Dec;267:664–681. doi: 10.1016/j.neucom.2017.06.053. [DOI] [Google Scholar]
  • 23.Lasko TA, Denny JC, Levy MA. Computational phenotype discovery using unsupervised feature learning over noisy, sparse, and irregular clinical data. PLoS One. 2013;8(6):e66341. doi: 10.1371/journal.pone.0066341. http://dx.plos.org/10.1371/journal.pone.0066341 .PONE-D-12-40629 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 24.Masci J, Meier U, Cirean D, Schmidhuber J. Stacked convolutional auto-encoders for hierarchical feature extraction. Proceedings of the International Conference on Artificial Neural Networks; 2011; Espoo. 2011. [DOI] [Google Scholar]
  • 25.Aghabozorgi S, Seyed Shirkhorshidi A, Ying Wah T. Time-series clustering: a decade review. Inf Syst. 2015 Oct;53:16–38. doi: 10.1016/j.is.2015.04.007. [DOI] [Google Scholar]
  • 26.Steinbach M, Ertöz L, Kumar V. The challenges of clustering high dimensional data. New Directions Statistical Physics. 2004;273:309. doi: 10.1007/978-3-662-08968-2_16. [DOI] [Google Scholar]
  • 27.Keogh E, Lin J. Clustering of time-series subsequences is meaningless: implications for previous and future research. Knowl Inf Syst. 2005 Aug 1;8(2):154–177. doi: 10.1007/s10115-004-0172-7. [DOI] [Google Scholar]
  • 28.Dinges D, Pack F, Williams K, Gillen K, Powell J, Ott G, Aptowicz C, Pack AI. Cumulative sleepiness, mood disturbance, and psychomotor vigilance performance decrements during a week of sleep restricted to 4-5 hours per night. Sleep. 1997 Apr;20(4):267–277. [PubMed] [Google Scholar]
  • 29.Maaten L, Hinton G. Visualizing data using t-SNE. J Mach Learn Res. 2008 Nov;:2579–2605. [Google Scholar]
  • 30.Wattenberg M, Viégas F, Johnson I. How to use t-SNE effectively. Distill. 2016 Oct;1(10) doi: 10.23915/distill.00002. [DOI] [Google Scholar]
  • 31.Rousseeuw PJ. Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. J Computat Appl Math. 1987 Nov;20:53–65. doi: 10.1016/0377-0427(87)90125-7. [DOI] [Google Scholar]
  • 32.Falloon K, Elley CR, Fernando A, Lee AC, Arroll B. Simplified sleep restriction for insomnia in general practice: a randomised controlled trial. Br J Gen Pract. 2015 Aug;65(637):e508–e515. doi: 10.3399/bjgp15X686137. https://bjgp.org/cgi/pmidlookup?view=long&pmid=26212846 .65/637/e508 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 33.Levine B, Lumley M, Roehrs T, Zorick F, Roth T. The effects of acute sleep restriction and extension on sleep efficiency. Int J Neurosci. 1988 Dec;43(3-4):139–143. doi: 10.3109/00207458808986162. [DOI] [PubMed] [Google Scholar]
  • 34.Blanken TF, Benjamins JS, Borsboom D, Vermunt JK, Paquola C, Ramautar J, Dekker K, Stoffers D, Wassing R, Wei Y, Van Someren EJW. Insomnia disorder subtypes derived from life history and traits of affect and personality. Lancet Psychiatry. 2019 Feb;6(2):151–163. doi: 10.1016/s2215-0366(18)30464-4. [DOI] [PubMed] [Google Scholar]
  • 35.Pilcher JJ, Ginter DR, Sadowsky B. Sleep quality versus sleep quantity: relationships between sleep and measures of health, well-being and sleepiness in college students. J Psychosom Res. 1997 Jun;42(6):583–596. doi: 10.1016/s0022-3999(97)00004-4.S0022399997000044 [DOI] [PubMed] [Google Scholar]

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

Multimedia Appendix 1

Description of the datasets and the codes utilized at the current research and detailed procedure of user clustering based on the synchronic approach and its evaluation outcomes with additional descriptions on the clustering steps of the diachronic unsupervised learning approach.

Multimedia Appendix 2

Cross rank correlation matrix among 12 features of modality 1 and 2 used in clustering, regardless of the derived cluster ID (N=42) and with regard to each derived cluster ID (# of clusters=5).

Multimedia Appendix 3

Test results with respect to sleep pattern-related (modality 1) and daily activity-related (modality 2) features among 5 derived clusters of insomnia sufferers: (1) analysis of variance on rank and Kruskal-Wallis and (2) analysis of variance and post hoc Turkey honest significant difference.


Articles from JMIR mHealth and uHealth are provided here courtesy of JMIR Publications Inc.

RESOURCES