Assessing the performance of a commercial multisensory sleep tracker

Nanna J Mouritzen; Lisbeth H Larsen; Maja H Lauritzen; Troels W Kjær

doi:10.1371/journal.pone.0243214

PLoS One. 2020 Dec 11;15(12):e0243214.

Editor: Raffaele Ferri

PMCID: PMC7732119 PMID: 33306678

Abstract

Wearable sleep technology allows for less intrusive sleep assessment than polysomnography (PSG), especially in long-term sleep monitoring. Though such devices are less accurate than PSG, sleep trackers may still provide valuable information. This study aimed to validate a commercial sleep tracker, Garmin Vivosmart 4 (GV4), against PSG and to evaluate intra-device reliability (GV4 vs. GV4). Eighteen able-bodied adults (13 females, M = 56.1 ± 12.0 years) with no self-reported sleep disorders were monitored simultaneously by GV4 and PSG for one night, while intra-device reliability was monitored in one participant for 23 consecutive nights. Intra-device agreement was considered sufficient (observed agreement = 0.85 ± 0.13, Cohen’s kappa = 0.68 ± 0.24). GV4 detected sleep with high accuracy (0.90) and sensitivity (0.98) but low specificity (0.28). Cohen’s kappa was calculated for sleep/wake detection (0.33) and sleep stage detection (0.20). GV4 significantly underestimated time awake (p = 0.001), including wake after sleep onset (WASO) (p = 0.001), and overestimated light sleep (p = 0.045) and total sleep time (TST) (p = 0.001) (paired t-test). Sleep onset and sleep end did not differ significantly from PSG values. Our results suggest that GV4 is not able to reliably describe sleep architecture but may allow for detection of changes in sleep onset, sleep end, and TST (ICC ≥ 0.825) in longitudinally followed groups. Still, generalizations are difficult due to our sample limitations.

1. Introduction

Insufficient sleep constitutes a large health problem. In 2017, 46% of the Danish population reported sleep problems, and the prevalence has been increasing [1]. Sleep diaries are a common way to assess sleep. Though sleep diaries are convenient, non-intrusive, and allow for subjective sleep assessment, they do not describe sleep architecture, and erroneous sleep estimates may be reported [2]. In long-term monitoring, sleep diaries may be further encumbered by compliance problems, resulting in missing data.

Objective sleep assessment is ideally obtained with polysomnography (PSG), which is considered the gold standard. During sleep, specific physiological signals are recorded, allowing for differentiation between sleep stages and calculation of sleep parameters. A central problem with PSG is that it is time-consuming and resource-demanding, requiring expensive equipment and specialized personnel. Thus, PSG is not suitable for long-term sleep monitoring. The need for a practical sleep monitoring device is reflected in the rapid development of commercial activity trackers that enable self-monitoring of various biomarkers, including sleep parameters [3]. In 2016, almost 30% of US respondents reported self-monitoring one or more health parameters with an application, band, clip, or smartwatch [4]. Wrist-borne consumer sleep trackers are primarily accelerometer-based and assess sleep similarly to actigraphy, where movement denotes wake and lack of movement denotes sleep.

The widespread adoption and rapid development of sleep trackers are accompanied by a growing demand for validation, so that the sleep information they provide can be put to use. The majority of sleep trackers detect sleep with high sensitivity but low specificity [5, 6]. Mostly, they overestimate total sleep time (TST) and sleep efficiency (SE), and underestimate wake, including sleep onset latency (SOL) and wake after sleep onset (WASO) [7–9]. These estimation biases are also known from standard actigraphs [10, 11]. As with actigraphy, more pronounced discrepancies have been suggested in older and sleep-disrupted populations [9, 12, 13]. Estimation biases of sleep stage durations show no clear tendency [14–16], but current sleep trackers describe sleep architecture poorly. Though often overestimated, TST is repeatedly emphasized as the most reliable parameter calculated by sleep trackers [9, 14, 15, 17].

Autonomic changes accompany transitions between sleep stages. Measures of such physiological changes have been employed to aid accelerometer-based sleep scoring (the multisensory approach). When falling asleep, parasympathetic influence gradually increases through non-rapid eye movement sleep (NREM) while sympathetic influence decreases. This results in reduced cardiac activity. When progressing into rapid eye movement sleep (REM), and specifically into phasic REM, sympathetic influence increases together with cardiac activity [18, 19]. Optical photoplethysmography (PPG) is a technique that detects changes in blood volume by illuminating the skin and registering changes in the reflected light, and thereby measures heart rate (HR) and HR variability (HRV). HRV frequencies can help indicate shifts between sleep stages. The newer generation of sleep trackers applies both accelerometer and PPG data in sleep scoring, but the benefit of the additional PPG information is unclear.

New sleep tracking technology must result in improved performance, and for that reason the validity of potentially beneficial devices should be explored. This study aims to evaluate the validity of sleep/wake and sleep stage detection provided by a wrist-borne, accelerometer- and PPG-based device, Garmin Vivosmart 4 (GV4). As PSG is considered the gold standard, GV4 is compared against PSG (inter-device reliability). The applied PPG feature is expected to improve GV4’s ability to distinguish between sleep stages and to detect wake periods. GV4 vs. GV4 (intra-device) reliability is also investigated because sufficient reliability between identical GV4 devices is considered a prerequisite for validation of GV4 against the gold standard.

2. Methods

2.1 Participants and study design

This is a validation study with measurements of sleep parameters from one night of simultaneous PSG and GV4 sleep monitoring for each participant. This study is a sub-study of an ongoing longitudinal cross-over study from which participants (N = 18) were invited to additional sleep testing. Participants were required to be ≥ 18 years old, able-bodied and without any self-reported previous history of clinically diagnosed sleep disorders. The trial protocol was approved by the Ethics Committee of Region Zealand (protocol number: SJ-780).

2.2 Materials

2.2.1 Polysomnography

Overnight home PSG (AASM Type II study) was obtained by use of TrackIT™ Mk3 Sleep Click-on 2 (Lifelines, Neuro). Sleep was unattended, without monitoring by a technician or by video. Participants were equipped with eight scalp electrodes for EEG recording (providing F3/M2, F4/M1, C3/M2, C4/M1, O1/M2 and O2/M1 derivations), two electrodes at the chin recording EMG, two electrodes near the eyes recording EOG, and two electrodes on the chest recording a lead II ECG. All electrodes were positioned in accordance with the American Academy of Sleep Medicine (AASM) guidelines. Recordings of airflow and respiratory effort were left out since the study focused only on sleep staging and its associated measures. Participants were asked to press an event marker button on the PSG device when going to bed (identified as “lights off”). This mark represents the time when the participant intends to sleep but is not yet asleep. Recordings were initiated before the participants went home and terminated when they came in the next day to deliver the detached PSG equipment. Afterwards, an experienced technician (MHL) scored the PSG files in 30 s epochs according to the AASM Scoring Manual Version 2.2, using the Nicolet One 5.95 software. Lastly, all PSG recordings were reviewed by a board-certified neurophysiologist (TWK).

2.2.2 Garmin Vivosmart 4

GV4 is a commercially available, off-the-shelf activity watch. Besides measuring the daily number of steps, HR, oxygen saturation, etc., it claims to have a sleep tracking function. This function is based on limb movements, detected via an embedded triaxial accelerometer, and PPG signals. On this basis, an algorithm attempts to detect sleep stages and wake depending on threshold values [5, 20]. The collected sleep information includes sleep onset, end, duration, stages (light, deep, REM, and wake), and level of movement during sleep [21]. Due to the preset device settings, data were collected in epoch lengths of 1 min. PSG and GV4 data were collected from December 2019 until March 2020.

2.3 Procedure

Intra-device reliability was examined by equipping one participant with two identical GV4 devices for 23 nights. In order to perform optimally, GV4 needs to adjust to the user, which takes ~14 days (personal communication). The devices were connected to two separate and newly created Garmin Connect™ accounts, eliminating potential influence from earlier logged data. The software on each device was updated, and the watches were set up identically and synchronized with their accounts. Both devices were worn side-by-side on the non-dominant wrist, and device position (proximal, distal) was switched each night.

As part of the larger study, written informed consent was obtained from all participants. Prior to sleep testing, participants were measured (height, weight) and filled out a questionnaire that included questions about their sleep. Participants were prepared for simultaneous GV4 and PSG sleep monitoring as described earlier. The next morning, sleep data from GV4 were extracted via the participant’s online Garmin Connect™ account.

2.4 Data analysis

Our analyses focused on intra-device and inter-device reliability. For both of them, descriptive statistics were calculated using Microsoft Excel (2019) and MATLAB (R2019a).

GV4 and PSG data were checked for normality, and paired t-tests were performed to identify significant differences in sleep estimates. We used 95% confidence intervals and p-values < 0.05 were considered statistically significant. To evaluate sleep stage agreement, epoch-by-epoch (EBE) analysis was performed in both intra-device and inter-device comparisons. Observed agreement (P_O) and Cohen’s kappa (κ) were calculated based on the confusion matrices (Tables 2 and 4).

Table 2. Confusion matrix (intra-device agreement).

GV4 1 (rows) / GV4 2 (columns)	Wake	Light sleep	Deep sleep	REM sleep	Subtotal
Wake	98	95	60	12	265
Light sleep	119	7266	504	107	7996
Deep sleep	41	701	2650	13	3405
REM sleep	4	141	16	166	327
Subtotal	262	8203	3230	298	11993

Values are based on 1 min epochs from 23 nights and presented as frequencies.

Table 4. Confusion matrices (inter-device agreement).

(A) Four sleep stages
PSG (rows) / GV4 (columns)	Wake	Light sleep	Deep sleep	REM sleep	Subtotal
Wake	501	851	187	270	1809
Light sleep	297	4878	1696	1166	8037
Deep sleep	8	1709	1526	130	3373
REM sleep	21	1896	430	1150	3497
Subtotal	827	9334	3839	2716	16716

(B) Sleep/wake
PSG (rows) / GV4 (columns)	Wake	Sleep	Subtotal
Wake	501	1308	1809
Sleep	326	14581	14907
Subtotal	827	15889	16716

Values are presented as frequencies. A) is based on four sleep stages and B) is based on sleep/wake.
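
As an illustration of how P_O and Cohen’s kappa follow from a confusion matrix, the sketch below recomputes both from the pooled frequencies in Tables 2 and 4. It is not the authors’ MATLAB code; the function and variable names are chosen here for illustration, and the pooled values are not expected to equal the per-night or per-participant means reported elsewhere exactly.

```python
import numpy as np

def observed_agreement(cm):
    """P_O: share of epochs on the diagonal of the confusion matrix."""
    cm = np.asarray(cm, dtype=float)
    return np.trace(cm) / cm.sum()

def cohens_kappa(cm):
    """Cohen's kappa: agreement beyond what the marginal totals predict by chance."""
    cm = np.asarray(cm, dtype=float)
    n = cm.sum()
    p_o = np.trace(cm) / n
    # Chance-expected agreement from row and column subtotals.
    p_e = (cm.sum(axis=1) * cm.sum(axis=0)).sum() / n ** 2
    return (p_o - p_e) / (1.0 - p_e)

# Rows and columns are ordered wake, light, deep, REM (frequencies from the tables).
TABLE_2 = [[98, 95, 60, 12],
           [119, 7266, 504, 107],
           [41, 701, 2650, 13],
           [4, 141, 16, 166]]
TABLE_4A = [[501, 851, 187, 270],
            [297, 4878, 1696, 1166],
            [8, 1709, 1526, 130],
            [21, 1896, 430, 1150]]
# Rows and columns are ordered wake, sleep.
TABLE_4B = [[501, 1308],
            [326, 14581]]

for name, cm in [("Table 2", TABLE_2), ("Table 4A", TABLE_4A), ("Table 4B", TABLE_4B)]:
    print(f"{name}: P_O = {observed_agreement(cm):.2f}, kappa = {cohens_kappa(cm):.2f}")
```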

2.4.1 Intra-device reliability

EBE analysis was performed by aligning and comparing data files from both GV4 devices in 1 min epochs (default). Sleep scorings were compared from the time when the first watch detected sleep until the time when both watches detected one wake epoch the next morning. Comparisons of sleep measures were based on these time intervals as were comparisons of HR (bpm) and accelerometer data (“level of movement”) to identify the source of a potential disagreement.

2.4.2 Inter-device reliability

To compare the GV4 data files with the corresponding PSG data files, GV4 epochs were doubled to match the 30 s PSG epoch resolution. Thus, a 1 min GV4 epoch, e.g. staged as light sleep, was extended to two epochs of 30 s, both staged as light sleep. This was done in accordance with former comparison studies [9, 22]. Both GV4 and PSG were manually synchronized using the internet clock time before each PSG study. GV4 and PSG data were compared from “lights off”, registered by the participant, until the first wake epoch registered by both GV4 and PSG (scored by MHL). If GV4 sleep onset fell before the PSG event mark, we initiated the comparison from GV4 sleep onset.
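
The epoch doubling described above can be sketched in a few lines. This is a minimal illustration, not the authors’ script; the function name and the string stage labels are assumptions.

```python
import numpy as np

def upsample_epochs(stages_1min, factor=2):
    """Repeat each 1 min GV4 epoch `factor` times to obtain 30 s epochs."""
    return np.repeat(np.asarray(stages_1min), factor)

# Hypothetical three-minute GV4 hypnogram.
gv4_1min = ["wake", "light", "deep"]
gv4_30s = upsample_epochs(gv4_1min)
print(gv4_30s.tolist())
```

Each 1 min label appears twice in the result, so the GV4 series has the same number of epochs as a PSG recording of equal duration.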

Because some participants reported earlier sleep onset detected by GV4 compared to their own experience (confirmed by preliminary PSG results), we attempted to by-pass a potential misalignment by comparing data differently. This was approached using three different methods: The first comparison was based on conventional time alignment and the second on “sleep alignment”, comparing data from the first registered sleep epoch in both methods (independent of time). The third was based on cross correlation, displacing GV4 data 40 epochs forward and backward, relative to PSG, from the time aligned point (P_O was evaluated for each 30 s displacement).
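
The third approach can be pictured as a search over time shifts, with P_O evaluated at each displacement. The sketch below is one possible reading of that procedure, assuming both hypnograms are already in 30 s epochs and have equal length; the function names and the synthetic data are hypothetical.

```python
import numpy as np

def agreement_by_shift(psg, gv4, max_shift=40):
    """Return {shift: P_O}, where GV4 epoch i is compared with PSG epoch i + shift."""
    psg = np.asarray(psg)
    gv4 = np.asarray(gv4)
    if psg.shape != gv4.shape:
        raise ValueError("both hypnograms must have the same number of 30 s epochs")
    n = len(psg)
    scores = {}
    for shift in range(-max_shift, max_shift + 1):
        if abs(shift) >= n:
            continue  # no overlapping epochs left at this displacement
        if shift >= 0:
            a, b = psg[shift:], gv4[:n - shift]
        else:
            a, b = psg[:n + shift], gv4[-shift:]
        scores[shift] = float(np.mean(a == b))
    return scores

def best_shift(scores):
    """Displacement (in epochs) with the highest observed agreement."""
    return max(scores, key=scores.get)

# Synthetic check: a random hypnogram and a copy displaced by three epochs.
rng = np.random.default_rng(0)
psg_demo = rng.integers(0, 4, size=200)
gv4_demo = np.roll(psg_demo, -3)  # gv4_demo[i] == psg_demo[i + 3] for i < 197
demo_scores = agreement_by_shift(psg_demo, gv4_demo)
print(best_shift(demo_scores), demo_scores[best_shift(demo_scores)])
```

Only the overlapping epochs are compared at each displacement, so large shifts are evaluated on slightly fewer epochs than the time-aligned comparison.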

Accuracy (proportion of all epochs that GV4 classified correctly as sleep or wake), sensitivity (proportion of PSG sleep epochs that GV4 also scored as sleep), specificity (proportion of PSG wake epochs that GV4 also scored as wake), P_O (proportion of epochs assigned the same sleep stage by both methods), and Cohen’s kappa were calculated as mean values from the EBE analyses performed in each participant. Those values were also calculated for each type of alignment. We have reported the highest values of Cohen’s kappa, achieved from one overall matrix containing all epochs (Table 4). Confusion matrices were used to identify misclassifications. To quantify the influence of the artificial extension of the original GV4 1 min epochs into 2 x 30 s epochs, performance values were additionally calculated only when two equal PSG epochs followed each other (i.e. GV4 epochs were only evaluated when PSG showed a data resolution equivalent to 2 x 30 s).
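
For concreteness, the sleep/wake measures can be computed from the pooled frequencies in Table 4B, treating sleep as the positive class. This is an illustrative sketch with names chosen here; pooled values can differ slightly from the means of per-participant values.

```python
def sleep_wake_metrics(tp, fn, fp, tn):
    """Sensitivity, specificity and accuracy with sleep as the positive class."""
    sensitivity = tp / (tp + fn)   # PSG sleep epochs also scored as sleep by GV4
    specificity = tn / (tn + fp)   # PSG wake epochs also scored as wake by GV4
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    return sensitivity, specificity, accuracy

# Table 4B: PSG sleep/GV4 sleep, PSG sleep/GV4 wake, PSG wake/GV4 sleep, PSG wake/GV4 wake.
sens, spec, acc = sleep_wake_metrics(tp=14581, fn=326, fp=1308, tn=501)
print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}, accuracy = {acc:.2f}")
```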

Sleep estimates measured by single modalities were compared (Table 3). GV4 “light sleep” was compared to PSG N1 + N2 sleep, and GV4 “deep sleep” to PSG N3 sleep. Presented GV4 measures of SOL, REM latency, TIB, and SE were artificially constructed subsequent to data extraction (GV4 does not automatically generate these measures). SOL was calculated as the time from the PSG event mark until GV4 sleep onset, REM latency as the time from GV4 sleep onset until the first GV4 detected epoch of REM, TIB as the sum of SOL, WASO and TST, and SE as TST divided by TIB. These measures were calculated for clinical interest. “Wake” included wake periods before sleep onset (from the first compared epoch) and after sleep onset until both methods agreed on sleep end with 1 wake epoch.
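
The constructed GV4 measures are simple arithmetic on SOL, WASO and TST. The sketch below applies the definitions above to the GV4 group means from Table 3 and to one made-up night with a negative SOL; the function name and the second set of numbers are hypothetical.

```python
def derived_gv4_measures(sol_min, waso_min, tst_min):
    """TIB as SOL + WASO + TST, and SE as TST / TIB expressed in percent."""
    tib_min = sol_min + waso_min + tst_min
    se_pct = 100.0 * tst_min / tib_min
    return tib_min, se_pct

# GV4 group means from Table 3 (SOL 3.8, WASO 6.6, TST 442.8 min).
tib_mean, se_mean = derived_gv4_measures(3.8, 6.6, 442.8)

# Hypothetical night where GV4 scored sleep before "lights off" was marked.
tib_neg, se_neg = derived_gv4_measures(-5.0, 3.0, 450.0)
print(round(tib_mean, 1), round(se_mean, 1), round(tib_neg, 1), round(se_neg, 1))
```

SE computed from group means need not equal the mean of per-participant SE values, and a negative SOL shortens TIB below TST so that SE exceeds 100%.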

Table 3. Single comparisons of sleep parameters between PSG and GV4.

Sleep parameter	PSG	GV4	Mean difference, GV4 − PSG (95% CI)	p	R²	ICC
TST, min	414.9 ± 69.4	442.8 ± 59.7	27.8 ± 29.5 (14.2 to 41.5)	0.001^*	0.822	0.825
TIB, min	455.0 ± 65.3	453.2 ± 63.5	-1.8 ± 26.2 (-14.0 to 10.3)	0.771	0.842	0.921
Wake, min	50.3 ± 28.7	23.0 ± 18.6	-27.3 ± 28.9 (-40.7 to -13.9)	0.001^*	0.098	0.179
WASO, min	29.6 ± 24.0	6.6 ± 6.7	-23.0 ± 23.2 (-33.7 to -12.3)	0.001^*	0.067	0.074
Light sleep (N1+N2), min	223.3 ± 42.9	259.8 ± 79.5	36.5 ± 71.7 (3.4 to 69.6)	0.045^*	0.196	0.328
Deep sleep (N3), min	93.7 ± 47.1	107.1 ± 91.9	13.4 ± 98.1 (-31.9 to 58.8)	0.570	0.014	0.100
REM sleep, min	97.9 ± 33.0	75.8 ± 49.3	-22.1 ± 54.7 (-47.4 to 3.2)	0.105	0.026	0.136
SOL, min	10.4 ± 10.8	3.8 ± 24.3	-6.7 ± 24.6 (-18.0 to 4.7)	0.265	0.039	0.144
REM latency, min	73.5 ± 14.8	89.6 ± 34.7	16.1 ± 40.3 (-5.1 to 37.2)	0.160	0.039	-0.129
SE, %	90.6 ± 6.8	97.9 ± 5.2	7.3 ± 7.9 (3.6 to 10.9)	0.001^*	0.020	0.081
Sleep onset, min	1405.6 ± 73.7	1399.1 ± 73.7	-6.5 ± 24.6 (-17.8 to 4.9)	0.280	0.892	0.944
Sleep end, min	409.0 ± 66.8	409.4 ± 66.9	0.4 ± 18.3 (-8.0 to 8.8)	0.927	0.927	0.965

Data are expressed as mean ± sd. TST, total sleep time; TIB, time in bed; WASO, wake after sleep onset; SOL, sleep onset latency; SE, sleep efficiency; CI, confidence interval; ICC, intraclass correlation.

*p < 0.05.
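
The text does not state how the confidence intervals in Table 3 were derived. As a hedged illustration, a normal-approximation interval (mean difference ± 1.96 · sd / √n, with n = 18) appears to reproduce the tabulated limits for TST and light sleep to within rounding; the function below is a sketch under that assumption, not the authors’ procedure.

```python
import math

def mean_diff_ci(mean_diff, sd_diff, n, z=1.96):
    """Normal-approximation 95% CI for a mean paired difference (assumed method)."""
    half_width = z * sd_diff / math.sqrt(n)
    return mean_diff - half_width, mean_diff + half_width

tst_ci = mean_diff_ci(27.8, 29.5, 18)     # TST row of Table 3
light_ci = mean_diff_ci(36.5, 71.7, 18)   # light sleep row of Table 3
print([round(v, 1) for v in tst_ci], [round(v, 1) for v in light_ci])
```

With only 18 participants, an interval based on the t distribution would be somewhat wider than this approximation.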

Linear regression was performed on each relationship between GV4 and PSG estimates providing R² values. Intraclass correlations (ICC) for sleep parameters were calculated using IBM SPSS Statistics (version 27) and were based on a single-rater, absolute-agreement, two-way random model.
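
A single-rater, absolute-agreement, two-way random ICC can be computed from the mean squares of a two-way ANOVA without replication. The sketch below is an independent NumPy illustration of that standard formula, not a reproduction of the SPSS procedure; the function name and the example values are hypothetical and are not data from the study.

```python
import numpy as np

def icc_a1(x):
    """Single-rater, absolute-agreement, two-way random ICC.

    x has shape (n_subjects, k_methods), e.g. one column for PSG and one for GV4.
    """
    x = np.asarray(x, dtype=float)
    n, k = x.shape
    grand = x.mean()
    ss_rows = k * ((x.mean(axis=1) - grand) ** 2).sum()   # between subjects
    ss_cols = n * ((x.mean(axis=0) - grand) ** 2).sum()   # between methods
    ss_err = ((x - grand) ** 2).sum() - ss_rows - ss_cols
    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

# Hypothetical TST values (min) for five nights; NOT data from the study.
psg_tst = [400.0, 420.0, 380.0, 450.0, 410.0]
gv4_tst = [430.0, 445.0, 410.0, 470.0, 445.0]
icc_demo = icc_a1(np.column_stack([psg_tst, gv4_tst]))
print(round(icc_demo, 3))
```

Because the model measures absolute agreement, a systematic offset between methods lowers the ICC even when the two series are strongly correlated, which is why a parameter can show a high R² and a lower ICC.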

Bland-Altman plots were used to visualize the agreement between PSG and GV4 of sleep parameters and to identify potential patterns in biases. Limits of agreement (± 1.96 sd) were additionally calculated and applied.
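
The quantities behind a Bland-Altman plot are the per-participant differences, their mean (the bias), and the limits of agreement. A minimal sketch follows; the sign convention (first argument minus second) and the function names are choices made here, and only summary statistics from Table 3 are used as input.

```python
import numpy as np

def bland_altman(a, b):
    """Bias and limits of agreement for paired measurements, using a - b."""
    diff = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    bias = diff.mean()
    sd = diff.std(ddof=1)          # sample standard deviation of the differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd

def limits_from_summary(mean_diff, sd_diff):
    """Limits of agreement when only the mean and sd of the differences are known."""
    return mean_diff - 1.96 * sd_diff, mean_diff + 1.96 * sd_diff

# TST difference from Table 3: 27.8 ± 29.5 min.
tst_lower, tst_upper = limits_from_summary(27.8, 29.5)
print(round(tst_lower, 1), round(tst_upper, 1))
```

Swapping the two arguments of `bland_altman` flips the sign of the bias and mirrors the limits, so the convention should match the one used in the plot being read.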

3. Results

3.1 Missing data

Compliance was very high at 99.51%, calculated as the number of labeled epochs divided by the total number of epochs. Of the 0.49% of epochs with missing data, GV4 accounted for 0.19 percentage points and PSG for 0.30 percentage points. The intra-device reliability data set had no missing data.

3.2 Demographics

Demographic characteristics of the study population are presented in Table 1. Of the participants, 61.1% were moderately to severely overweight (BMI ≥ 25 kg/m²) and 16.7% reported having experienced mild sleep problems within the preceding 14 days. None reported any severe sleep problems.

Table 1. Demographic characteristics of participants.

Variable (N = 18)	Mean ± sd or n (%)
Age, years	56.1 ± 12.0
Sex, female	13 (72.2%)
Body mass index, kg/m²	27.6 ± 5.0
Sleep problems within last 14 days	3 (16.7%)

3.3 Intra-device reliability

We calculated the observed agreement, P_O = 0.85 ± 0.13 (0.80 ≤ P_O ≤ 0.90), and Cohen’s kappa, κ = 0.68 ± 0.24 (0.58 ≤ κ ≤ 0.77), from a confusion matrix (Table 2) based on epochs from 23 nights and two GV4 watches.

Two successive nights showed exceptionally low levels of agreement (κ = -0.01 and 0.02). Agreement was ≥ 85% for HR and ≥ 96% for movement during these nights (proportion of agreement ± 1 unit). When reliability measures of the two nights were excluded, observed agreement and Cohen’s kappa reached 0.89 ± 0.05 and 0.75 ± 0.11, respectively. No significant intra-device differences were noticed in the durations of TST, wake, light, deep or REM sleep (p > 0.23 of all mentioned sleep outcomes). Device position did not affect agreement between devices (p > 0.85 for either P_O or κ). We assumed an identical calibration process during the first 14 days, due to identical device exposures, and therefore, we included all 23 nights in our calculations.

3.4 Inter-device reliability

3.4.1 Paired comparisons

Sleep parameters measured by PSG and GV4 and differences between the single modalities are shown in Table 3. For four participants, REM latency could not be calculated because GV4 (N = 3) or PSG (N = 1) did not detect any REM sleep. Therefore, REM latency is based on N = 14. Also, in four participants GV4 initiated sleep detection before the PSG event marker button was pressed, resulting in a negative SOL and an SE > 100%, which lowered the mean SOL and raised the mean SE. Sleep onset and sleep end are expressed in minutes from 00:00:00 (HH:MM:SS).

Significant differences included overestimations of TST by 27.8 min, SE by 7.3% and light sleep by 36.5 min, and underestimations of wake by 27.3 min and WASO by 23.0 min.

3.4.2 Bland-Altman analysis

Fig 1 shows Bland-Altman plots for TST and WASO with lower and upper limits of agreement (± 1.96 sd). Positive mean differences denote GV4 underestimation of sleep parameters and vice versa, as in previous validation studies [12, 14, 23, 24]. The plots confirm the results of the paired t-tests and visualize the biases. They illustrate a general overestimation of TST and an increasing discrepancy between methods in participants with higher amounts of WASO. For both parameters, measurements from three participants fell outside the limits of agreement; two of these participants were the same in both plots.

3.4.3 Linear regression analysis

Fig 2 illustrates the relationships between PSG and GV4 measured sleep onset and sleep end. The coefficients of determination, R², indicate linearity in both cases.

3.4.4 Epoch-by-epoch analysis

Based on the confusion matrices, sensitivity was 0.98 ± 0.03, specificity 0.30 ± 0.17, accuracy 0.90 ± 0.06, and observed agreement 0.48 ± 0.10. Cohen’s kappa was 0.20 ± 0.11 for all sleep stages and 0.33 ± 0.18 for sleep/wake. Sensitivity for light, deep, and REM sleep was 0.60 ± 0.19, 0.45 ± 0.26, and 0.34 ± 0.26, respectively. Specificity for light, deep, and REM sleep was 0.49 ± 0.13, 0.83 ± 0.20, and 0.88 ± 0.08, respectively.

No remarkable differences in performance values arose between the three alignment approaches and for that reason, the conventional time alignment was chosen to calculate the final results from the EBE analysis.

Since we extended GV4’s default 1 min epochs into 2 x 30 s epochs, performance could be falsely diluted. When we examined this potentially negative effect, calculating performance values based on comparison only when two equal PSG epochs followed each other, we found no noteworthy difference.

4. Discussion

4.1 Main findings

Evaluation of GV4 validity demonstrated sufficient intra-device agreement, and specific inter-device measures were reliably estimated. Sleep onset and sleep end were highly correlated with PSG measures and did not differ significantly from them. TST was significantly overestimated but also highly correlated with TST measured by PSG. Though sleep architecture was poorly described by GV4, changes in sleep onset, sleep end and TST may be reliably detected in group settings over time.

4.2 Intra-device reliability

Values for intra-device reliability were lower than expected but comparable to the overall sleep stage agreement of 0.83 found between human scorers with different experience levels [25] and to the inter-scorer reliability found between Chinese and US doctors (κ = 0.75 ± 0.01) [26]. According to Landis and Koch’s arbitrary divisions [27], our kappa statistic indicates substantial agreement across GV4 devices. The relatively low intra-device agreement may be due to differences in software (data processing) rather than hardware (sensors), since high agreement was found between devices in the detection of both HR and movement.

One study found high agreement between Fitbit devices (96.5–99.1%) [22], evaluating only three nights while we evaluated 23. Another study found no significant differences between Fitbit devices in SE and TST estimations agreeing with our findings [17].

4.3 Estimation accuracy

Poor sleep increases the risk of concentration problems, traffic accidents, overweight, lifestyle diseases, impaired immune function, and even mortality [28]. Therefore, high estimation accuracy is important to detect poor sleep and changes in sleep over time.

GV4 significantly underestimated wake, including WASO, and significantly overestimated TST, SE, and light sleep (Table 3). This is in accordance with previous results in other commercial sleep trackers [5]. Although TST was significantly overestimated, primarily due to poor wake detection, it is strongly correlated to TST measured by PSG. Of note, a study comparing various sleep trackers with PSG also found this strong correlation between TST measurements (r ≥ 0.84) [14].

In the present study, the most frequent misclassifications were deep as light sleep (50.7%) and vice versa (21.1%), wake as light sleep (47.0%), and REM as light sleep (54.2%), matching earlier findings [16, 29] (Table 4A). This is consistent with the overestimation of light sleep by GV4. Interestingly, misclassifications between light and deep sleep, and between wake and light sleep, correspond to typical misclassifications found between experienced human sleep scorers [26].
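
Misclassification rates of this kind are row percentages of the confusion matrix: for each PSG stage, the share of its epochs that GV4 assigned to each stage. The sketch below derives them from the frequencies in Table 4A; the helper names are chosen here for illustration.

```python
import numpy as np

STAGES = ["wake", "light", "deep", "rem"]

# Table 4A: rows are PSG stages, columns are GV4 stages, in the order of STAGES.
TABLE_4A_COUNTS = np.array([[501, 851, 187, 270],
                            [297, 4878, 1696, 1166],
                            [8, 1709, 1526, 130],
                            [21, 1896, 430, 1150]])

def row_percentages(cm):
    """Percentage of each PSG stage (row) assigned to each GV4 stage (column)."""
    cm = np.asarray(cm, dtype=float)
    return 100.0 * cm / cm.sum(axis=1, keepdims=True)

def scored_as(pct, psg_stage, gv4_stage):
    """Percentage of PSG `psg_stage` epochs that GV4 scored as `gv4_stage`."""
    return pct[STAGES.index(psg_stage), STAGES.index(gv4_stage)]

pct = row_percentages(TABLE_4A_COUNTS)
for psg_stage, gv4_stage in [("deep", "light"), ("light", "deep"),
                             ("wake", "light"), ("rem", "light")]:
    print(f"PSG {psg_stage} scored as GV4 {gv4_stage}: "
          f"{scored_as(pct, psg_stage, gv4_stage):.1f}%")
```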

We used GV4 rather than a standard accelerometer since PPG signals are included in the sleep scoring algorithm [30], which is supposed to improve sleep scoring. The low kappa value indicates poor sleep stage differentiation in spite of the included PPG signals. However, the benefit of PPG signals in accelerometer-based sleep tracking remains unclear, since we do not know how these signals are processed and applied.

The uneven distribution of epochs in the marginal totals, with a symmetrically high number of sleep epochs and a low number of wake epochs (Table 4), raises the chance-expected agreement. This translates into a lower kappa value even though observed agreement is high, which Feinstein and Cicchetti described as “the first paradox of kappa” [31].
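
The effect can be made concrete with the pooled sleep/wake frequencies in Table 4B. The worked calculation below is an illustration added here; P_e denotes the chance-expected agreement computed from the marginal subtotals.

```latex
P_O = \frac{501 + 14581}{16716} \approx 0.902, \qquad
P_e = \frac{1809 \cdot 827 + 14907 \cdot 15889}{16716^{2}} \approx 0.853, \qquad
\kappa = \frac{P_O - P_e}{1 - P_e} \approx \frac{0.902 - 0.853}{1 - 0.853} \approx 0.33
```

Because both methods score the large majority of epochs as sleep, chance alone already accounts for about 85% agreement, leaving little room for agreement beyond chance.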

Peculiarly, a preliminary validation study of Garmin Vivosmart 3 (N = 55) has presented a Cohen’s kappa as high as 0.54 ± 0.12 [20] using a three-channel EEG system as reference method. Unfortunately, sparse information is available, obstructing comparison with our study. Nevertheless, a study performed in “good sleepers” (N = 17) [9], comparing Fitbit against PSG, calculated a Cohen’s kappa of 0.38, close to our kappa value.

4.4 Sleep/wake detection

GV4’s sleep/wake detection favors scoring sleep, which yields high sensitivity at the cost of low specificity. This agrees with previous studies [9, 16, 32]. Classification of sleep/wake in GV4 is based on movement and HRV. Lack of movement does not always indicate sleep, and motionless wake has been acknowledged as a major factor hampering wake detection [5].

Though GV4 SOL was artificially constructed, the range of the difference between PSG and GV4 included negative values, indicating a risk of underestimation (Table 3). In four participants, the event marker button was pushed after GV4 initiated sleep detection, underlining its insufficient wake identification. Furthermore, the discrepancy between PSG and GV4 increased almost steadily in participants with more WASO (Fig 1B). WASO measured by PSG had a notably larger sd than WASO measured by GV4 (Table 3). This underlines a general tendency toward underestimation of WASO across participants. GV4 may therefore measure WASO with less error in individuals with little WASO.

The high average age of our participants as well as the “first-night effect” may produce a more fragmented sleep pattern in the sample, lowering specificity [5, 12, 33]. We performed home PSG, which is known to reduce the first-night effect [34]. Only 16.7% of our participants had PSG SE < 85% (SE ≥ 85% is considered “normal” according to the Pittsburgh Sleep Quality Index [35]). Thus, we do not believe poor sleep caused poorer device performance.

GV4 requires registration of habitual bedtimes and awakening times. Imprecise adjustment can supposedly affect performance, as can too short a calibration period. However, poorer performance was not observed in participants with inaccurate adjustments, and all participants had worn the GV4 for more than 14 days, ensuring sufficient calibration.

4.5 Limitations

Generalization of our results is limited by the small sample size with an unequal distribution of gender and age.

No consensus exists on how well new sleep technology should perform, and methods and performance measures are not used in a standardized way, which makes sleep tracker comparisons challenging. Formulation of device standards would aid the evaluation and comparison of future sleep trackers.

The proprietary nature of commercial sleep tracker algorithms encumbers their validation. Furthermore, the use of commercial sleep trackers carries a risk that software updates change the sleep scoring algorithms without warning.

4.6 Future perspectives

Interpretation of the provided sleep information by GV4 should be done with caution most importantly because GV4 poorly describes sleep architecture. Thus, use of GV4 should be avoided when exact estimations of sleep parameters are essential. However, when PSG is unsuitable, GV4 may be an interesting sleep assessment tool. In both the intra-device and inter-device comparisons, agreement was high for most nights with only a few outliers. Performance is thus considered acceptable when use is intended in groups over longer periods, rather than in individuals over shorter periods.

In summary, our results confirmed that GV4 is not able to reliably describe sleep architecture but may accurately detect changes in sleep onset, sleep end, and TST, though generalizations are difficult due to our sample limitations. Thus, our findings should be confirmed by further studies. Nevertheless, this information contributes to the field of sleep monitoring, especially for research purposes. GV4 can provide objective sleep information where sleep reporting methods such as sleep diaries are known to cause compliance problems in longitudinal study designs.

Acknowledgments

The authors thank Sirin W. Gangstad for providing MATLAB scripts, facilitating our data analyses.

Data Availability

According to Danish law, data cannot be shared publicly. However, data are available upon request from the Research Unit, Center of Neurophysiology, Zealand University Hospital (email: suh-neu-kfe@regionsjaelland.dk) for researchers who meet the criteria for access to confidential data.

Funding Statement

The project was funded by the EU Interreg Deutschland-Danmark (087-1.1-18) and a pre-graduate research scholarship was awarded by the Lundbeck Foundation to NJM.

References

1. Den Nationale Sundhedsprofil 2017 [cited 2020 January 15]. Available from: https://www.sst.dk/-/media/Udgivelser/2018/Den-Nationale-Sundhedsprofil-2017.ashx?la=da&hash=421C482AEDC718D3B4846FC5E2B0EED2725AF517.
2. Matthews K.A., et al., Similarities and differences in estimates of sleep duration by polysomnography, actigraphy, diary, and self-reported habitual sleep in a community sample. Sleep Health, 2018. 4(1): p. 96–103. doi:10.1016/j.sleh.2017.10.011
3. Sim I., Mobile Devices and Health. N Engl J Med, 2019. 381(10): p. 956–968. doi:10.1056/NEJMra1806949
4. Do you currently monitor or track your health or fitness using an online or mobile application through a fitness band, clip or smartwatch? (by gender) [cited 2016 September 30]. Available from: https://www.statista.com/statistics/668250/usage-of-health-and-fitness-monitoring-devices-in-us/.
5. de Zambotti M., et al., Wearable Sleep Technology in Clinical and Research Settings. Med Sci Sports Exerc, 2019. 51(7): p. 1538–1557. doi:10.1249/MSS.0000000000001947
6. Evenson K.R., Goto M.M., and Furberg R.D., Systematic review of the validity and reliability of consumer-wearable activity trackers. Int J Behav Nutr Phys Act, 2015. 12: p. 159. doi:10.1186/s12966-015-0314-1
7. Gruwez A., Bruyneel A.V., and Bruyneel M., The validity of two commercially-available sleep trackers and actigraphy for assessment of sleep parameters in obstructive sleep apnea patients. PLoS One, 2019. 14(1): p. e0210569. doi:10.1371/journal.pone.0210569
8. Cook J.D., et al., Ability of the Fitbit Alta HR to quantify and classify sleep in patients with suspected central disorders of hypersomnolence: A comparison against polysomnography. J Sleep Res, 2019. 28(4): p. e12789. doi:10.1111/jsr.12789
9. Kang S.G., et al., Validity of a commercial wearable sleep tracker in adult insomnia disorder patients and good sleepers. J Psychosom Res, 2017. 97: p. 38–44. doi:10.1016/j.jpsychores.2017.03.009
10. Van de Water A.T., Holmes A., and Hurley D.A., Objective measurements of sleep for non-laboratory settings as alternatives to polysomnography - a systematic review. J Sleep Res, 2011. 20(1 Pt 2): p. 183–200. doi:10.1111/j.1365-2869.2009.00814.x
11. Paquet J., Kawinska A., and Carrier J., Wake detection capacity of actigraphy during sleep. Sleep, 2007. 30(10): p. 1362–9. doi:10.1093/sleep/30.10.1362
12. de Zambotti M., et al., Evaluation of a consumer fitness-tracking device to assess sleep in adults. Chronobiol Int, 2015. 32(7): p. 1024–8. doi:10.3109/07420528.2015.1054395
13. Mansukhani M.P. and Kolla B.P., Apps and fitness trackers that measure sleep: Are they useful? Cleve Clin J Med, 2017. 84(6): p. 451–456. doi:10.3949/ccjm.84a.15173
14. Mantua J., Gravel N., and Spencer R.M., Reliability of Sleep Measures from Four Personal Health Monitoring Devices Compared to Research-Based Actigraphy and Polysomnography. Sensors (Basel), 2016. 16(5). doi:10.3390/s16050646
15. Gruwez A., et al., Reliability of commercially available sleep and activity trackers with manual switch-to-sleep mode activation in free-living healthy individuals. Int J Med Inform, 2017. 102: p. 87–92. doi:10.1016/j.ijmedinf.2017.03.008
16. de Zambotti M., et al., A validation study of Fitbit Charge 2 compared with polysomnography in adults. Chronobiol Int, 2018. 35(4): p. 465–476. doi:10.1080/07420528.2017.1413578
17. Meltzer L.J., et al., Comparison of a Commercial Accelerometer with Polysomnography and Actigraphy in Children and Adolescents. Sleep, 2015. 38(8): p. 1323–30. doi:10.5665/sleep.4918
18. Miglis M.G., Chapter 12 - Sleep and the Autonomic Nervous System, in Sleep and Neurologic Disease, Miglis M.G., Editor. 2017, Academic Press: San Diego. p. 227–244.
19. de Zambotti M., et al., Dynamic coupling between the central and autonomic nervous systems during sleep: A review. Neuroscience & Biobehavioral Reviews, 2018. 90: p. 84–103. doi:10.1016/j.neubiorev.2018.03.027
20. Stevens S. and Siengsukon C., Commercially-available wearable provides valid estimate of sleep stages (P3.6–042). Neurology, 2019. 92(15 Supplement): p. P3.6–042.
21.Vivosmart 4 Owner’s Manual. 2018 [cited 2020 June 12]; Available from: https://www8.garmin.com/manuals/webhelp/vivosmart4/EN-US/vivosmart_4_OM_EN-US.pdf.
22.Montgomery-Downs H.E., Insana S.P., and Bond J.A., Movement toward a novel activity monitoring device. Sleep Breath, 2012. 16(3): p. 913–7. 10.1007/s11325-011-0585-y [DOI] [PubMed] [Google Scholar]
23.de Zambotti M., Baker F.C., and Colrain I.M., Validation of Sleep-Tracking Technology Compared with Polysomnography in Adolescents. Sleep, 2015. 38(9): p. 1461–1468. 10.5665/sleep.4990 [DOI] [PMC free article] [PubMed] [Google Scholar]
24.Toon E., et al. , Comparison of Commercial Wrist-Based and Smartphone Accelerometers, Actigraphy, and PSG in a Clinical Cohort of Children and Adolescents. J Clin Sleep Med, 2016. 12(3): p. 343–50. 10.5664/jcsm.5580 [DOI] [PMC free article] [PubMed] [Google Scholar]
25.Rosenberg R.S. and Van Hout S., The American Academy of Sleep Medicine inter-scorer reliability program: sleep stage scoring. J Clin Sleep Med, 2013. 9(1): p. 81–7. 10.5664/jcsm.2350 [DOI] [PMC free article] [PubMed] [Google Scholar]
26.Deng S., et al. , Interrater agreement between American and Chinese sleep centers according to the 2014 AASM standard. Sleep Breath, 2019. 23(2): p. 719–728. 10.1007/s11325-019-01801-x [DOI] [PubMed] [Google Scholar]
27.Landis J.R. and Koch G.G., The Measurement of Observer Agreement for Categorical Data. Biometrics, 1977. 33(1): p. 159–174. [PubMed] [Google Scholar]
28.Søvn og Sundhed 2015 [cited 2020 January 15]; Available from: http://www.vidensraad.dk/sites/default/files/vidensraad_soevn-og-sundhed_digital.pdf
29.Beattie Z., et al. , Estimation of sleep stages in a healthy adult population from optical plethysmography and accelerometer signals. Physiol Meas, 2017. 38(11): p. 1968–1979. 10.1088/1361-6579/aa9047 [DOI] [PubMed] [Google Scholar]
30.What is Heart-Rate Variability (HRV)? [cited 2020 May 20]; Available from: https://support.garmin.com/en-US/?faq=04pnPSBTYSAYL9FylZoUl5#end.
31.Feinstein A.R. and Cicchetti D.V., High agreement but low kappa: I. The problems of two paradoxes. J Clin Epidemiol, 1990. 43(6): p. 543–9. 10.1016/0895-4356(90)90158-l [DOI] [PubMed] [Google Scholar]
32.Kahawage P., et al. , Validity, potential clinical utility, and comparison of consumer and research-grade activity trackers in Insomnia Disorder I: In-lab validation against polysomnography. J Sleep Res, 2020. 29(1): p. e12931 10.1111/jsr.12931 [DOI] [PubMed] [Google Scholar]
33.Agnew H.W. Jr., Webb W.B., and Williams R.L., The first night effect: an EEG study of sleep. Psychophysiology, 1966. 2(3): p. 263–6. 10.1111/j.1469-8986.1966.tb02650.x [DOI] [PubMed] [Google Scholar]
34.Edinger J.D., et al. , Sleep in the Laboratory and Sleep at Home: Comparisons of Older Insomniacs and Normal Sleepers. Sleep, 1997. 20(12): p. 1119–1126. 10.1093/sleep/20.12.1119 [DOI] [PubMed] [Google Scholar]
35.Buysse D.J., et al. , The Pittsburgh sleep quality index: A new instrument for psychiatric practice and research. Psychiatry Research, 1989. 28(2): p. 193–213. 10.1016/0165-1781(89)90047-4 [DOI] [PubMed] [Google Scholar]

PLoS One. doi: 10.1371/journal.pone.0243214.r001

Decision Letter 0

Raffaele Ferri

4 Aug 2020

PONE-D-20-21216

Validation of a Commercial Multisensory Sleep Tracker

PLOS ONE

Dear Dr. Mouritzen,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please note that both reviewers expressed a series of concerns that must all be addressed.

Please submit your revised manuscript by Sep 18 2020 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.
A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.
An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols

We look forward to receiving your revised manuscript.

Kind regards,

Raffaele Ferri, MD

Academic Editor

PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. Thank you for including your ethics statement: 'The trial protocol was approved by the Regional Health Research Ethics Committee (SJ-780).'

a. Please amend your current ethics statement to include the full name of the ethics committee/institutional review board(s) that approved your specific study.

b. Once you have amended this/these statement(s) in the Methods section of the manuscript, please add the same text to the “Ethics Statement” field of the submission form (via “Edit Submission”).

For additional information about PLOS ONE ethical requirements for human subjects research, please refer to http://journals.plos.org/plosone/s/submission-guidelines#loc-human-subjects-research

3. Thank you for including your competing interests statement; "The study was based on Garmin Vivosmart 4 sleep tracking because of its multisensory technology and user-friendly design. Garmin kindly borrowed us Garmin watches to perform the experiment. This had no influence on the study design, data collection or analysis, decision to publish, or preparation of the manuscript."

Please confirm that this does not alter your adherence to all PLOS ONE policies on sharing data and materials, by including the following statement: "This does not alter our adherence to PLOS ONE policies on sharing data and materials.” (as detailed online in our guide for authors http://journals.plos.org/plosone/s/competing-interests). If there are restrictions on sharing of data and/or materials, please state these. Please note that we cannot proceed with consideration of your article until this information has been declared.

Please include your updated Competing Interests statement in your cover letter; we will change the online submission form on your behalf.

Please know it is PLOS ONE policy for corresponding authors to declare, on behalf of all authors, all potential competing interests for the purposes of transparency. PLOS defines a competing interest as anything that interferes with, or could reasonably be perceived as interfering with, the full and objective presentation, peer review, editorial decision-making, or publication of research or non-research articles submitted to one of the journals. Competing interests can be financial or non-financial, professional, or personal. Competing interests can arise in relationship to an organization or another person. Please follow this link to our website for more details on competing interests: http://journals.plos.org/plosone/s/competing-interests

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Partly

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: N/A

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: No

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: This is a validation study for a commercial sleep tracker from Garmin. The study was carried out in parallel with polysomnography, which is the reference standard for sleep recording. A small number of subjects was investigated: 18 healthy persons. One person was recorded for 23 nights, but these were not with polysomnography. The results show that the sleep tracker is good for sleep onset and end of sleep, but not good for sleep stages. This appears to be similar to actigraphy which has not been tested here.

As written in the ‘Polysomnography’ section, I assume that this is not an AASM type I study with supervised polysomnography, but an AASM type II study with polysomnography at home. If this is the case, please state this clearly, because this setting is different from most settings. In type II studies, there is no video recording and no monitoring by an experienced sleep technician. This needs to be noted.

For the smartwatch, you say that it considers not only actigraphy, but PPG as well. But how is this done? Since the results show that these are comparable to simple actigraphy, maybe PPG is not really evaluated? If we consider the WatchPAT device from Itamar as a smartwatch as well, then that device does a better job in evaluating the PPG signal. Please compare critically.

I am especially interested in how the Garmin evaluates REM sleep. That it evaluates REM sleep is only mentioned in brackets in the methods and details are only given when one reads the results. I think this capability of Garmin to detect REM sleep needs to be described in the methods section.

I think it is very good to check for 23 consecutive nights and it is very good to have two devices simultaneously.

As you performed an epoch by epoch comparison, it is of interest, how you arranged for a good synchronization between the polysomnography and the Garmin device. How much was the synchronization error then, less than a second?

The critical summary is appreciated.

Reviewer #2: Authors aimed at evaluating the performance of GV4 against PSG in measuring several sleep parameters in a small sample of 18 “healthy” adults in their home environment. I agree with the authors that the validation of such devices is an important step for adoption. The manuscript is well written.

Comments:

- Please consider the guidelines outlined by Depner et al., 2020: Wearable Technologies for Developing Sleep and Circadian Biomarkers: A Summary of Workshop Discussions.

- I would use the term “assessing the performance” instead of “validating”.

- “without sleep disorder”. How were sleep disorders evaluated? Self-report?

- Not sure about having the intra-device reliability as an aim, which has been evaluated on a single participant. What is the generalizability of such results? You may consider this as “exploratory”

- Correlation analysis and interpretation of correlation outputs is misleading. Please remove it.

- “Our results suggest inaccurate sleep stage detection by GV4, but sleep onset and sleep end were accurately detected with few outliers.” The absence of a significant bias between PSG and device does not specifically inform about accuracy/inaccuracy.

- I would not consider low- vs high-frequency HRV as sympathetic vs vagal influence. I would quickly describe the main autonomic changes occurring across stages of sleep.

- It is unclear how the lights-off and lights-on were identified. Participants were asked to press an event-marker button when going to bed. How did the authors determine the next morning's wake time?

- What was the rate of concordance between the 2 scorers (MHL and TWK)?

- Please provide the data collection time windows (e.g., from Aug 2018 to …)

- Did the authors control for normality of data distribution in the analyses?

- The SD of WASO for GV4 is much lower than the SD of WASO for PSG. Please comment on that.

- Please provide BA plots for all the sleep parameters analyzed

- In the BA plots please use the PSG values on the x axis, and plot the bias or the proportional bias

- EBE should be performed on an individuals’ level. Sensitivity and specificity should be calculated for each participant (night) separately.

- Sensitivity and specificity should be provided for all sleep stages (separately)

- Can you please clarify this statement? “The high agreement in HR and movement detection between devices indicates differences in software rather than hardware”?

- The advantages of incorporating HR and HRV indices by these devices should be due to the sleep stage dependent shifting in autonomic control. Not sure what authors imply with this statement “HRV is influenced by several other conditions than sleep for instance stress level”.

- I would avoid any interpretation about GV4 performance due to the limited generalizability of the results (limitation of the sample used) and the lack of standard in evaluating what could be considered acceptable or not.

- Please check refs. 4, 6, 20 for accuracy
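
The requests above for per-night epoch-by-epoch (EBE) analysis and for Bland-Altman statistics can be illustrated with a minimal sketch. It assumes that each night is a pair of time-aligned, equal-length hypnograms coded 1 for sleep and 0 for wake, with PSG as the reference. The function names `ebe_metrics` and `bland_altman` and all example values are hypothetical and are not taken from the manuscript or from the authors' analysis.

```python
from statistics import mean, stdev


def ebe_metrics(psg, device):
    """Sensitivity, specificity and Cohen's kappa for ONE night.

    psg, device: equal-length sequences of epochs (1 = sleep, 0 = wake).
    PSG is treated as the reference (assumption of this sketch).
    """
    if not psg or len(psg) != len(device):
        raise ValueError("hypnograms must be non-empty and of equal length")
    pairs = list(zip(psg, device))
    tp = sum(1 for p, d in pairs if p == 1 and d == 1)  # sleep scored as sleep
    tn = sum(1 for p, d in pairs if p == 0 and d == 0)  # wake scored as wake
    fp = sum(1 for p, d in pairs if p == 0 and d == 1)  # wake scored as sleep
    fn = sum(1 for p, d in pairs if p == 1 and d == 0)  # sleep scored as wake
    n = len(pairs)
    nan = float("nan")
    sensitivity = tp / (tp + fn) if (tp + fn) else nan
    specificity = tn / (tn + fp) if (tn + fp) else nan
    observed = (tp + tn) / n
    # Agreement expected by chance, from the marginal totals of both raters.
    expected = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / (n * n)
    kappa = (observed - expected) / (1 - expected) if expected < 1 else nan
    return {"sensitivity": sensitivity, "specificity": specificity, "kappa": kappa}


def bland_altman(psg_values, device_values):
    """Bias (device minus PSG) and 95% limits of agreement across nights."""
    diffs = [d - p for p, d in zip(psg_values, device_values)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample standard deviation of the differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


if __name__ == "__main__":
    # Toy night: the device misses two wake epochs at the edges of sleep.
    psg_night = [0, 0, 1, 1, 1, 1, 1, 1, 0, 0]
    dev_night = [0, 1, 1, 1, 1, 1, 1, 1, 1, 0]
    print(ebe_metrics(psg_night, dev_night))
    # Toy total sleep time (minutes) for three nights.
    print(bland_altman([400, 420, 380], [410, 440, 390]))
```

Calling `ebe_metrics` once per participant and then summarizing the per-night values keeps one long or atypical night from dominating a pooled estimate. For plotting, the differences would be placed on the y axis against the PSG values on the x axis, as the reviewer suggests.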

**********

6. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2020 Dec 11;15(12):e0243214. doi: 10.1371/journal.pone.0243214.r002

Author response to Decision Letter 0

18 Sep 2020

Responses to reviewer and editor comments are fully provided in the attached "Response to Reviewers" document (rebuttal letter).

Attachment

Submitted filename: Response_to_Reviewers.pdf (133KB, pdf)

PLoS One. doi: 10.1371/journal.pone.0243214.r003

Decision Letter 1

Raffaele Ferri

8 Oct 2020

PONE-D-20-21216R1

Assessing the performance of a commercial multisensory sleep tracker

PLOS ONE

Dear Dr. Mouritzen,

As you can see from the attached comments, one of the reviewers has some remaining concerns that need to be addressed.

Please submit your revised manuscript by Nov 22 2020 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.
A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.
An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

We look forward to receiving your revised manuscript.

Kind regards,

Raffaele Ferri, MD

Academic Editor

PLOS ONE

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #2: (No Response)

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

Reviewer #2: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #2: N/A

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

Reviewer #2: No

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

Reviewer #2: Yes

**********

6. Review Comments to the Author

Reviewer #2: The authors did a nice job in reviewing the paper and replying to the Reviewers’ concerns!

A few additional comments:

- I would avoid statements like “Our results suggest inaccurate sleep stage detection by GV4”, etc. There is no standard on what would be considered accurate/inaccurate, and the picture is extremely complex (there are biases, proportional biases, clinical significance other than statistical significance to consider, etc.). Thus, I believe that statements like that give the readers a false interpretation of the device performance.

- In the intro, you may want to mention that PPG-derived HR and HRV have been “validated” against ECG-derived HR and HRV.

- Given the nature of the study you may want to specify which EEG derivations were used.

- I have trouble understanding the sleep onset and sleep end analysis. I usually consider SO the time in which the person falls asleep (the time of the first epoch of sleep after lights-off). Is that the case? What is the PSG event mark?
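
The definition of sleep onset given in this comment (the first epoch of sleep after lights-off) can be written as a short sketch. The stage labels ("W" for wake, anything else for sleep), the 30-second epoch length, and the function names are assumptions made for illustration only; they do not describe how the authors or the GV4 compute sleep onset.

```python
EPOCH_SECONDS = 30  # assumed scoring epoch length


def sleep_onset_epoch(hypnogram, lights_off_epoch=0):
    """Index of the first epoch scored as sleep at or after lights-off.

    hypnogram: sequence of stage labels, "W" = wake, any other label = sleep.
    Returns None if no sleep epoch follows lights-off.
    """
    for i in range(lights_off_epoch, len(hypnogram)):
        if hypnogram[i] != "W":
            return i
    return None


def sleep_onset_latency_minutes(hypnogram, lights_off_epoch=0):
    """Minutes from lights-off to the first sleep epoch, or None."""
    onset = sleep_onset_epoch(hypnogram, lights_off_epoch)
    if onset is None:
        return None
    return (onset - lights_off_epoch) * EPOCH_SECONDS / 60


if __name__ == "__main__":
    night = ["W", "W", "W", "N1", "N2", "W", "N2"]
    # Lights-off marked at epoch 1 (hypothetical event marker).
    print(sleep_onset_epoch(night, lights_off_epoch=1))
    print(sleep_onset_latency_minutes(night, lights_off_epoch=1))
```

Under this definition the result depends directly on where the lights-off marker is placed, which is why the reviewer asks what the PSG event mark represents.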

**********

7. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #2: No

PLoS One. 2020 Dec 11;15(12):e0243214. doi: 10.1371/journal.pone.0243214.r004

Author response to Decision Letter 1

16 Nov 2020

Our responses to specific reviewer and editor comments are provided in the attached file "Response_to_Reviewers_V1.2".

Attachment

Submitted filename: Response_to_Reviewers_V1.2.docx (24.1KB, docx)

PLoS One. doi: 10.1371/journal.pone.0243214.r005

Decision Letter 2

Raffaele Ferri

18 Nov 2020

Assessing the performance of a commercial multisensory sleep tracker

PONE-D-20-21216R2

Dear Dr. Mouritzen,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Raffaele Ferri, MD

Academic Editor

PLOS ONE

Additional Editor Comments (optional):

Reviewers' comments:

PLoS One. doi: 10.1371/journal.pone.0243214.r006

Acceptance letter

Raffaele Ferri

26 Nov 2020

PONE-D-20-21216R2

Assessing the performance of a commercial multisensory sleep tracker

Dear Dr. Mouritzen:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

If we can help with anything else, please email us at plosone@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Raffaele Ferri

Academic Editor

PLOS ONE

Data Availability Statement

[pone.0243214.ref001] 1.Den Nationale Sundhedsprofil 2017 [cited 2020 January 15]; Available from: https://www.sst.dk/-/media/Udgivelser/2018/Den-Nationale-Sundhedsprofil-2017.ashx?la=da&hash=421C482AEDC718D3B4846FC5E2B0EED2725AF517.

[pone.0243214.ref002] 2.Matthews K.A., et al. , Similarities and differences in estimates of sleep duration by polysomnography, actigraphy, diary, and self-reported habitual sleep in a community sample. Sleep Health, 2018. 4(1): p. 96–103. 10.1016/j.sleh.2017.10.011 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pone.0243214.ref003] 3.Sim I., Mobile Devices and Health. N Engl J Med, 2019. 381(10): p. 956–968. 10.1056/NEJMra1806949 [DOI] [PubMed] [Google Scholar]

[pone.0243214.ref004] 4.Do you currently monitor or track your health or fitness using an online or mobile application through a fitness band, clip or smartwatch? (by gender) [cited 2016 September 30]; Available from: https://www.statista.com/statistics/668250/usage-of-health-and-fitness-monitoring-devices-in-us/.

[pone.0243214.ref005] 5.de Zambotti M., et al. , Wearable Sleep Technology in Clinical and Research Settings. Med Sci Sports Exerc, 2019. 51(7): p. 1538–1557. 10.1249/MSS.0000000000001947 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pone.0243214.ref006] 6.Evenson K.R., Goto M.M., and Furberg R.D., Systematic review of the validity and reliability of consumer-wearable activity trackers. Int J Behav Nutr Phys Act, 2015. 12: p. 159 10.1186/s12966-015-0314-1 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pone.0243214.ref007] 7.Gruwez A., Bruyneel A.V., and Bruyneel M., The validity of two commercially-available sleep trackers and actigraphy for assessment of sleep parameters in obstructive sleep apnea patients. PLoS One, 2019. 14(1): p. e0210569 10.1371/journal.pone.0210569 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pone.0243214.ref008] 8.Cook J.D., et al. , Ability of the Fitbit Alta HR to quantify and classify sleep in patients with suspected central disorders of hypersomnolence: A comparison against polysomnography. J Sleep Res, 2019. 28(4): p. e12789 10.1111/jsr.12789 [DOI] [PubMed] [Google Scholar]

[pone.0243214.ref009] 9.Kang S.G., et al. , Validity of a commercial wearable sleep tracker in adult insomnia disorder patients and good sleepers. J Psychosom Res, 2017. 97: p. 38–44. 10.1016/j.jpsychores.2017.03.009 [DOI] [PubMed] [Google Scholar]

[pone.0243214.ref010] 10.Van de Water A.T., Holmes A., and Hurley D.A., Objective measurements of sleep for non-laboratory settings as alternatives to polysomnography—a systematic review. J Sleep Res, 2011. 20(1 Pt 2): p. 183–200. 10.1111/j.1365-2869.2009.00814.x [DOI] [PubMed] [Google Scholar]

[pone.0243214.ref011] 11.Paquet J., Kawinska A., and Carrier J., Wake detection capacity of actigraphy during sleep. Sleep, 2007. 30(10): p. 1362–9. 10.1093/sleep/30.10.1362 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pone.0243214.ref012] 12.de Zambotti M., et al. , Evaluation of a consumer fitness-tracking device to assess sleep in adults. Chronobiol Int, 2015. 32(7): p. 1024–8. 10.3109/07420528.2015.1054395 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pone.0243214.ref013] 13.Mansukhani M.P. and Kolla B.P., Apps and fitness trackers that measure sleep: Are they useful? Cleve Clin J Med, 2017. 84(6): p. 451–456. 10.3949/ccjm.84a.15173 [DOI] [PubMed] [Google Scholar]

[pone.0243214.ref014] 14.Mantua J., Gravel N., and Spencer R.M., Reliability of Sleep Measures from Four Personal Health Monitoring Devices Compared to Research-Based Actigraphy and Polysomnography. Sensors (Basel), 2016. 16(5). 10.3390/s16050646 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pone.0243214.ref015] 15.Gruwez A., et al. , Reliability of commercially available sleep and activity trackers with manual switch-to-sleep mode activation in free-living healthy individuals. Int J Med Inform, 2017. 102: p. 87–92. 10.1016/j.ijmedinf.2017.03.008 [DOI] [PubMed] [Google Scholar]

[pone.0243214.ref016] 16.de Zambotti M., et al. , A validation study of Fitbit Charge 2 compared with polysomnography in adults. Chronobiol Int, 2018. 35(4): p. 465–476. 10.1080/07420528.2017.1413578 [DOI] [PubMed] [Google Scholar]

[pone.0243214.ref017] 17.Meltzer L.J., et al. , Comparison of a Commercial Accelerometer with Polysomnography and Actigraphy in Children and Adolescents. Sleep, 2015. 38(8): p. 1323–30. 10.5665/sleep.4918 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pone.0243214.ref018] 18.Miglis M.G., hapter 12—Sleep and the Autonomic Nervous System, in Sleep and Neurologic Disease, Miglis M.G., Editor. 2017, Academic Press: San Diego: p. 227–244. [Google Scholar]

[pone.0243214.ref019] 19.de Zambotti M., et al. , Dynamic coupling between the central and autonomic nervous systems during sleep: A review. Neuroscience & Biobehavioral Reviews, 2018. 90: p. 84–103. 10.1016/j.neubiorev.2018.03.027 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pone.0243214.ref020] 20.Stevens S. and Siengsukon C., Commercially-available wearable provides valid estimate of sleep stages (P3.6–042). Neurology, 2019. 92(15 Supplement): p. P3.6–042. [Google Scholar]

[pone.0243214.ref021] 21.Vivosmart 4 Owner’s Manual. 2018 [cited 2020 June 12]; Available from: https://www8.garmin.com/manuals/webhelp/vivosmart4/EN-US/vivosmart_4_OM_EN-US.pdf.

22. Montgomery-Downs H.E., Insana S.P., and Bond J.A., Movement toward a novel activity monitoring device. Sleep Breath, 2012. 16(3): p. 913–7. doi:10.1007/s11325-011-0585-y

23. de Zambotti M., Baker F.C., and Colrain I.M., Validation of Sleep-Tracking Technology Compared with Polysomnography in Adolescents. Sleep, 2015. 38(9): p. 1461–1468. doi:10.5665/sleep.4990

24. Toon E., et al., Comparison of Commercial Wrist-Based and Smartphone Accelerometers, Actigraphy, and PSG in a Clinical Cohort of Children and Adolescents. J Clin Sleep Med, 2016. 12(3): p. 343–50. doi:10.5664/jcsm.5580

25. Rosenberg R.S. and Van Hout S., The American Academy of Sleep Medicine inter-scorer reliability program: sleep stage scoring. J Clin Sleep Med, 2013. 9(1): p. 81–7. doi:10.5664/jcsm.2350

26. Deng S., et al., Interrater agreement between American and Chinese sleep centers according to the 2014 AASM standard. Sleep Breath, 2019. 23(2): p. 719–728. doi:10.1007/s11325-019-01801-x

27. Landis J.R. and Koch G.G., The Measurement of Observer Agreement for Categorical Data. Biometrics, 1977. 33(1): p. 159–174.

28. Søvn og Sundhed 2015 [cited 2020 January 15]; Available from: http://www.vidensraad.dk/sites/default/files/vidensraad_soevn-og-sundhed_digital.pdf

29. Beattie Z., et al., Estimation of sleep stages in a healthy adult population from optical plethysmography and accelerometer signals. Physiol Meas, 2017. 38(11): p. 1968–1979. doi:10.1088/1361-6579/aa9047

30. What is Heart-Rate Variability (HRV)? [cited 2020 May 20]; Available from: https://support.garmin.com/en-US/?faq=04pnPSBTYSAYL9FylZoUl5#end.

31. Feinstein A.R. and Cicchetti D.V., High agreement but low kappa: I. The problems of two paradoxes. J Clin Epidemiol, 1990. 43(6): p. 543–9. doi:10.1016/0895-4356(90)90158-l

32. Kahawage P., et al., Validity, potential clinical utility, and comparison of consumer and research-grade activity trackers in Insomnia Disorder I: In-lab validation against polysomnography. J Sleep Res, 2020. 29(1): p. e12931. doi:10.1111/jsr.12931

33. Agnew H.W. Jr., Webb W.B., and Williams R.L., The first night effect: an EEG study of sleep. Psychophysiology, 1966. 2(3): p. 263–6. doi:10.1111/j.1469-8986.1966.tb02650.x

34. Edinger J.D., et al., Sleep in the Laboratory and Sleep at Home: Comparisons of Older Insomniacs and Normal Sleepers. Sleep, 1997. 20(12): p. 1119–1126. doi:10.1093/sleep/20.12.1119

35. Buysse D.J., et al., The Pittsburgh sleep quality index: A new instrument for psychiatric practice and research. Psychiatry Research, 1989. 28(2): p. 193–213. doi:10.1016/0165-1781(89)90047-4
