PLOS ONE. 2019 Oct 29;14(10):e0224521. doi: 10.1371/journal.pone.0224521

On the development of sleep states in the first weeks of life

Tomasz Wielek 1,2,*, Renata Del Giudice 3, Adelheid Lang 1,2, Malgorzata Wislowska 1,2, Peter Ott 4, Manuel Schabus 1,2,*
Editor: Daniele Marinazzo
PMCID: PMC6818777  PMID: 31661522

Abstract

Human newborns spend up to 18 hours a day sleeping. The organization of their sleep differs immensely from adult sleep, and its quick maturation and fundamental changes correspond to the rapid cortical development at this age. Manual sleep classification is especially challenging in this population given major body movements and frequent shifts between vigilance states; in addition, various staging criteria co-exist. In the present study we utilized a machine learning approach and investigated how EEG complexity and sleep stages evolve during the very first weeks of life. We analyzed 42 full-term infants who were recorded twice (at weeks two and five after birth) with full polysomnography. For sleep classification, EEG signal complexity was estimated using multi-scale permutation entropy and fed into a machine learning classifier. Interestingly, brain signal complexity (and spectral power) revealed developmental changes in sleep over the first 5 weeks of life; these changes were restricted to NREM (“quiet”) and REM (“active”) sleep, with little to no change during wakefulness. Data demonstrate that our classifier performs well above chance (i.e., >33% for 3-class classification) and approaches human scoring accuracy (60% at week-2, 73% at week-5). Altogether, these results demonstrate that characteristics of newborn sleep develop rapidly in the first weeks of life and can be efficiently identified by means of machine learning techniques.

Introduction

Sleep of newborns differs greatly from the sleep of children or adults. Adult-like classification into classical sleep stages is possible only from the age of 2–3 months onwards, since only then do typical NREM patterns, like sleep spindles, K-complexes, or slow waves, emerge (AASM; [1]). Until then the EEG landscape is dominated by low-voltage-irregular (REM/Wake), high-voltage slow (NREM/REM), mixed (Wake/NREM/REM) and tracé alternant (NREM) patterns. Even during wakefulness, oscillatory activity in newborns is dominated by slow oscillations of very high amplitude, up to 100 μV [2]. Another hallmark of early brain activity is bursting activity (also known as spontaneous “activity transients” or “delta brushes”), characterized by slow delta-like waves with superimposed fast, beta-range activity [3, 4]. Hence, in the first weeks of life the only possible discrimination of vigilance states is between wake, active sleep (REM) and quiet sleep (NREM) [5, 6].

Importantly, neonatal sleep-wake state organization also impacts later development [7–9]. For instance, sleep characteristics during the first postnatal days are related to cognitive development at the age of 6 months [10]. Being able to reliably characterize sleep in newborns has been recognized as crucial for both research and pediatric practice, but remains, even today, an inherently difficult endeavor.

Traditionally, infant sleep is staged based on visual inspection of polysomnographic (PSG) recordings, often supplemented by simultaneous observation of overt behavior and respiratory activity. However, manual sleep staging is time consuming, costly, requires considerable expertise, and can be quite variable owing to differing staging criteria and/or noisy data. Crowell and colleagues [11], for example, reported moderate inter-scorer reliability for infant sleep staging, with kappa coefficients below 0.6, when staging followed the modified Anders manual [5]. Reliability can be improved by refining criteria for a specific age group individually, as done for example by Satomaa and colleagues [12], who reached a kappa score of 0.73, indicating substantial agreement, for one-month-old infants. From a practical point of view, however, such fine-tuning of staging criteria may yield low reproducibility, as the results cannot generalize to other age groups. Here we address this issue with a more data-driven and objective analysis of the EEG using machine learning.

In the context of brain oscillations, frequency-resolved information is typically extracted by means of spectral methods such as the Fast Fourier Transform (FFT). More recently, however, a stronger focus on the irregular dynamics of brain signals has given rise to entropy-based features (for a review see [13]). Entropy quantifies the extent of irregularity in the EEG time signal, where a repetitive, predictable signal yields low entropy, while an irregular, unpredictable signal yields high entropy. In contrast to the power spectrum, which captures only linear properties of the brain signal, entropy-based features also emphasize additional characteristics of the EEG related to non-linear dynamics [14]. Non-linear behavior of the human EEG has, for example, been detected during adult sleep, especially in stage N2 [15, 16]. In contrast to FFT-based measures, symbolic measures such as permutation entropy operate on the rank order of values rather than on the absolute values of a time series. This is a major practical advantage when a signal is highly non-stationary and corrupted by noise [17], as is the case for newborn data. For instance, noise due to high electrode impedance is less likely to affect symbolic measures such as permutation entropy [14].

Mounting evidence suggests that fluctuations in EEG entropy reflect both transient, state-like changes in human brain activity (e.g. wake and sleep states), as well as slower, longer-lasting dynamics across the day or even over the course of brain maturation. Sleep studies in healthy adults report an overall trend of entropy decrease from WAKE, across transitional (N1) and light (N2) to deep (N3) sleep, with a relative increase during REM sleep [18–20]. A similar pattern was reported in newborns, with higher entropy levels in active/REM sleep as compared to quiet/NREM sleep [21]. Diurnal changes in EEG entropy between daytime and nighttime periods, although diminished in size relative to healthy individuals, were also found in patients following severe brain injury [22].

Across development from childhood to adulthood, a steady increase of EEG entropy has been observed [23]. Also in adults (19–74 years) an age-related increase of EEG entropy was reported [24]. Interestingly, during the first weeks of infancy, this pattern is much less consistent, and may even be accompanied by transient declines in EEG entropy. For instance, Zhang and colleagues [25] reported EEG entropy during sleep to increase across the first month of life. Thereafter the pattern changes, such that entropy remains constant during quiet sleep but decreases during active sleep. This transient change in entropy evolution is in good agreement with previous results showing a general decrease in high-frequency (beta band) power occurring within the first month of life ([26, 27]; for sleep-stage-specific findings see also [28]).

To delve deeper into the dynamically changing landscape of early brain activity, we analyzed newborn sleep EEG data (PSG and hdEEG) of 42 participants at weeks 2 and 5 after birth. It has to be mentioned that the reported data were not recorded during a continuous night-time sleep period and may thus differ from natural newborn sleep. First, we evaluated visually scored sleep in terms of both entropy and oscillatory power. In contrast to Zhang and colleagues, who used a temporally unspecific entropy measure, we quantify entropy over multiple temporal scales, aiming to add information that can be related to classic, ‘frequency-resolved’ findings. Second, we tested whether sleep staging can be automated by feeding the extracted entropy measures into a machine learning approach. Such an approach could ultimately complement or even replace visual staging and thus make sleep scoring in newborns more objective and replicable. Last but not least, we performed cross-session classification to assess whether the algorithm generalizes between age groups and thereby reveals the similarity, or “dissimilarity”, of sleep organization this early in life.

Participants and methods

Participants and EEG recording

Mothers of 42 full-term infants (15 female) were recruited for a study on prenatal learning. Polysomnography (PSG) was recorded from all but one newborn during two separate sessions: first at 2 weeks (14.8±4.3 days) and again at 5 weeks (36.7±4.3 days) after birth. Recordings took place in the home environment of mother and infant. EEG was recorded with an ambulatory, high-density (128-channel cap) system using a Geodesic Sensor Net (Geodesic EEG System 400, Electrical Geodesics, Inc., Eugene, OR, USA). The signal was recorded continuously at a sampling rate of 500 Hz over 35 min (n = 11) or 27 min (n = 31). Recording times were determined by the experimental protocol, which comprised nine alternating 3-min or 5-min periods of rest and auditory stimulation (simple pre-recorded nursery rhymes). For the current study we disregarded this experimental stimulation and focused solely on the changes in behavioral states over the full recording time. The study was approved by the ethics committee of the University of Salzburg (EK-GZ 12/2013) and all parents provided written informed consent before participation.

Data preprocessing

Preprocessing was done in Brain Vision Analyzer (Brain Products GmbH, Gilching, Germany, version 2.0) and MNE-Python (version 0.16.1). Data were down-sampled to 125 Hz and re-referenced to the average of all 128 EEG channels. The EEG was band-pass filtered (FIR filter with Hamming window) between 0.5 and 35 Hz. Electrode (impedance-check) artifacts, characterized by a 20 Hz component, were deleted semi-automatically by first visually inspecting individual recordings in the time-frequency domain and then iterating over segments; a 95th-percentile threshold was used to exclude bad segments, resulting in the exclusion of 4.5% of all segments. Note that this step was performed in addition to the exclusion of segments staged as movement or transitional sleep by the human scorer. The high-density EEG montage was then subsampled to the equivalent of a habitual sleep montage with only 6 EEG channels (F3, F4, C3, C4, O1, O2) and 5 peripheral channels: bipolar ECG, bipolar EMG, bipolar VEOG, as well as HEOG left and HEOG right, both re-referenced to the right ear, as recommended in [29]. For the subsequent automated sleep scoring we used the same reduced PSG setup as for visual sleep scoring. This allows a fair comparison between the two scoring approaches on the one hand, and increases the applicability of our classifier to other infant PSG datasets on the other.
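A minimal MNE-Python sketch of these preprocessing steps is given below. It assumes an already loaded mne.io.Raw object named raw; the channel names of the reduced montage are placeholders (the Geodesic net's E-numbered labels would first have to be mapped), and the exact pipeline used in the study may differ.

```python
# Sketch of the preprocessing steps described above (assumes `raw` is an
# mne.io.Raw object containing the 128-channel recording).
import mne

def preprocess(raw: mne.io.Raw) -> mne.io.Raw:
    raw = raw.copy()
    raw.resample(125)                                   # down-sample to 125 Hz
    raw.set_eeg_reference("average", projection=False)  # average reference over all EEG channels
    raw.filter(l_freq=0.5, h_freq=35.0,
               fir_window="hamming", fir_design="firwin")  # 0.5-35 Hz band-pass FIR filter
    # reduce to a habitual sleep montage (channel names are placeholders)
    raw.pick_channels(["F3", "F4", "C3", "C4", "O1", "O2",
                       "ECG", "EMG", "VEOG", "HEOG-L", "HEOG-R"])
    return raw
```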

Visual sleep staging

Eighty-four recordings were visually sleep-scored by an expert sleep scorer (Scholle) according to established scoring criteria [30], based on 30-second PSG segments. Each segment was assigned to one of five classes: quiet sleep (NREM), active sleep (REM), wake (WAKE), movement time, and transitional sleep. To account for possible differences in the amount of movement between week-2 and week-5, all epochs scored as movement (as well as transitional sleep) were excluded from further analysis (7.7% of segments). Ten recordings were considered “unscorable” by our expert and removed from further analysis. In the next step we validated the manual scorings against the simultaneously recorded videos, following the established Prechtl criteria [31]. Whenever we detected a mismatch between the PSG-based scoring and the video recording, for example an infant’s open eyes in epochs staged as sleep, we sought a consensus score. Due to technical issues (corrupted EEG signal) we excluded one recording session of one participant. In total, 72 recordings (34 at 2 weeks of age; 38 at 5 weeks of age) entered the final analyses.

Entropy measure

Entropy quantifies the irregularity or complexity of signal fluctuations, where a repetitive, highly predictable signal yields low entropy, while an irregular signal yields high entropy. We used permutation entropy (PE), a robust entropy measure that first converts the EEG time series into a sequence of ordinal patterns (each pattern describing the order relations between neighboring EEG voltages) and then quantifies the distribution of these patterns using the Shannon entropy equation (cf. Fig 1a). The highest PE (maximal information) is attained when all patterns have equal probability. A further generalization of this method, multi-scale permutation entropy (MSPE) [32], applies coarse-graining to the original broadband signal by averaging data within non-overlapping windows [33]. Like low-pass filtering, coarse-graining eliminates fast fluctuations from the signal, biasing the complexity estimates towards increasingly slower time scales (cf. Fig 1b).

Fig 1. Multi-scale permutation entropy as useful feature for neonatal sleep stage classification.


(a) Six possible ordinal patterns are identified (pi); their distribution is formed and the Shannon entropy is computed. (b) Simulated time series before (left) and after (right) the coarse-graining procedure are shown. Note that the granulation removes the fast-varying changes, which allows estimating the entropy on a slower temporal scale. (c) PSDs of individual epochs for a single recording (week-5) and the corresponding visual/manual sleep staging; log-transformed power is shown for better visibility. (d) Corresponding MSPE values for the same recording, depicted separately for the original signal (left) and after coarse-graining (right). Note that the three classes or stages are distinguishable at both scales, yet different patterns are observed in the original and coarse-grained computations.

MSPE was calculated for non-overlapping 30-s segments, for each PSG channel separately, and for 5 different levels of coarse-graining (scales). To maximize the predictive power of the classifier we used all 5 temporal scales as input. This resulted in 55-dimensional (11 channels x 5 scales) feature vectors per segment for each subject; the corresponding classifier is referred to below as MSPE-based. For the univariate analyses reported below we used MSPE computed at scale 1 (original signal, a mixture of fast and slow temporal scales) and at scale 5 (fast temporal scales eliminated), referred to as the fast and slow time scales, respectively. Analyses for intermediate scales are included in the supplementary materials for completeness (cf. S6 Fig).
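A minimal sketch of this feature extraction for a single 30-s epoch is given below; the ordinal-pattern order of 3 and the helper names are illustrative assumptions, not parameters taken from the paper.

```python
# Sketch of multi-scale permutation entropy (MSPE) for one 30-s epoch:
# ordinal patterns -> Shannon entropy, repeated for 5 coarse-graining scales
# and 11 PSG channels (-> 55 features per epoch).
import numpy as np

def permutation_entropy(x: np.ndarray, order: int = 3, delay: int = 1) -> float:
    """Shannon entropy (in nats) of the ordinal-pattern distribution."""
    n_patterns = len(x) - (order - 1) * delay
    # map each window of `order` samples to the permutation that sorts it
    patterns = np.array([tuple(np.argsort(x[i:i + order * delay:delay]))
                         for i in range(n_patterns)])
    _, counts = np.unique(patterns, axis=0, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log(p)).sum())

def coarse_grain(x: np.ndarray, scale: int) -> np.ndarray:
    """Average non-overlapping windows of length `scale`."""
    n = (len(x) // scale) * scale
    return x[:n].reshape(-1, scale).mean(axis=1)

def mspe_features(epoch: np.ndarray, scales=range(1, 6)) -> np.ndarray:
    """epoch: (n_channels, n_samples) array for one 30-s segment."""
    feats = [permutation_entropy(coarse_grain(ch, s))
             for ch in epoch for s in scales]
    return np.asarray(feats)            # 11 channels x 5 scales = 55 values

# example with random data: 11 channels, 30 s at 125 Hz
epoch = np.random.randn(11, 30 * 125)
print(mspe_features(epoch).shape)       # (55,)
```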

Spectral measure

We also calculated power spectral density (PSD) for the same 30-s segments in 1 Hz steps for frequencies between 1 and 30 Hz, using Welch's method with overlapping Hamming windows. As in the MSPE analysis, we used all frequency bins as input for the classifier (referred to as PSD-based), providing 330-dimensional (11 channels x 30 bins) feature vectors per segment for each subject.
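A minimal sketch of this spectral feature extraction using SciPy; the one-second Welch window (yielding 1-Hz bins) is an assumption consistent with the described resolution, not a parameter reported in the paper.

```python
# Sketch of the PSD feature extraction: Welch PSD per channel in 1-Hz bins
# between 1 and 30 Hz for one 30-s epoch.
import numpy as np
from scipy.signal import welch

def psd_features(epoch: np.ndarray, sfreq: float = 125.0) -> np.ndarray:
    """epoch: (n_channels, n_samples); returns flattened channel x frequency PSD."""
    freqs, psd = welch(epoch, fs=sfreq, window="hamming",
                       nperseg=int(sfreq), noverlap=int(sfreq) // 2, axis=-1)
    keep = (freqs >= 1) & (freqs <= 30)     # 1-30 Hz in 1-Hz steps
    return psd[:, keep].reshape(-1)         # 11 channels x 30 bins = 330 features

epoch = np.random.randn(11, 30 * 125)
print(psd_features(epoch).shape)            # (330,)
```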

Statistical analysis

Univariate statistical analyses were performed for both the entropy measures and the PSD estimates. To compare entropy data, linear mixed models were used as they are better suited to unbalanced datasets than repeated-measures ANOVA. This is of particular benefit as the number of available segments was limited (for example, NREM at week 2; cf. Table 1). To provide an equal number of observations for each subject and each sleep stage, a bootstrap sample of 10 MSPE values was repeatedly (1000 times) drawn and then averaged. The matrix of MSPE values (sleep stage x participant x session x location) entered the model as the dependent variable. The two sessions (week-2 and week-5), three sleep stages (NREM, REM, WAKE) and three locations (frontal, central, occipital) served as fixed effects, and participant as a random effect. We also included a random slope to account for inter-individual differences in the change of complexity from week-2 to week-5. To select the model with optimal fit, the Akaike information criterion was applied. All model parameters were estimated using restricted maximum likelihood estimation. Wald chi-squared tests were used to assess the significance of the model terms. Two independent analyses were performed, with MSPE at scale 1 and at scale 5 as the dependent variable. Additionally, we report whenever the exclusion of statistical outliers (identified by the interquartile-range rule) substantially changed the results. The linear mixed model analysis was performed with the lme4 package [34] using the statistical software R 3.4.0 [35]. Post-hoc multiple comparisons were performed with Tukey tests using the glht method of the multcomp package [36].
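A rough Python analogue of this analysis is sketched below (the original analysis was run in R with lme4 and multcomp). The bootstrap settings mirror the text, while the column names (subject, session, stage, location, mspe) and the use of statsmodels instead of lme4 are assumptions made for illustration; post-hoc Tukey tests are omitted.

```python
# Sketch: bootstrap-average 10 MSPE values per cell, then fit a linear mixed
# model with a random intercept and a random session slope per subject.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

def bootstrap_cell_mean(values: np.ndarray, n_draw: int = 10, n_boot: int = 1000,
                        rng=np.random.default_rng(0)) -> float:
    """Repeatedly draw n_draw values with replacement and average the means."""
    draws = rng.choice(values, size=(n_boot, n_draw), replace=True)
    return float(draws.mean())

def fit_mixed_model(df: pd.DataFrame):
    """df: one row per 30-s epoch with columns subject, session, stage, location, mspe."""
    cell_means = (df.groupby(["subject", "session", "stage", "location"])["mspe"]
                    .apply(lambda v: bootstrap_cell_mean(v.to_numpy()))
                    .reset_index())
    model = smf.mixedlm("mspe ~ session * stage * location",
                        data=cell_means,
                        groups=cell_means["subject"],
                        re_formula="~session")        # random slope for session
    return model.fit(reml=True)                       # restricted maximum likelihood
```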

Table 1. Sample descriptives.

Session | Age (days), Mean | Age (days), SD | Sleep stage | n (30-s epochs) | Subjects (%) | MSPE fast, Mean | MSPE fast, SD | MSPE slow, Mean | MSPE slow, SD
week-2 | 13.9 | 3.7 | NREM | 213 | 35 | 1.356 | 0.05 | 1.609 | 0.031
week-2 | | | REM | 1004 | 97 | 1.435 | 0.048 | 1.615 | 0.032
week-2 | | | WAKE | 431 | 62 | 1.451 | 0.053 | 1.585 | 0.045
week-5 | 36.2 | 3.9 | NREM | 387 | 66 | 1.315 | 0.044 | 1.58 | 0.051
week-5 | | | REM | 1056 | 95 | 1.392 | 0.042 | 1.606 | 0.028
week-5 | | | WAKE | 392 | 66 | 1.44 | 0.056 | 1.569 | 0.036

The analyzed data set consisted of two recordings from a group of healthy newborns, with three sleep-stage classes identified by visual scoring. EEG signal complexity was estimated using multi-scale permutation entropy (MSPE). The Subjects (%) column reflects the percentage of participants who actually showed a given sleep stage in the respective recording session (week-2 or week-5). MSPE values refer to the frontal channels (F3 and F4).

To compare power spectra between the two sessions we first applied the same bootstrapping as for MSPE and then ran cluster-based permutation tests as implemented in MNE-Python (version 0.16.1).
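A minimal sketch of such a cluster-based comparison with MNE-Python is shown below; the array shapes, permutation count and default thresholding are illustrative assumptions, not the exact analysis settings.

```python
# Sketch: cluster-based permutation comparison of PSD spectra between sessions.
import numpy as np
from mne.stats import permutation_cluster_test

# psd_week2 / psd_week5: (n_subjects, n_freq_bins) bootstrap-averaged spectra
psd_week2 = np.random.rand(34, 30)
psd_week5 = np.random.rand(38, 30)

t_obs, clusters, cluster_pvals, _ = permutation_cluster_test(
    [psd_week2, psd_week5], n_permutations=1000, tail=0, seed=0)
significant = [c for c, p in zip(clusters, cluster_pvals) if p < 0.05]
```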

Machine learning

The principle behind supervised machine learning (ML) is to train a predictive model by automatically extracting information from a previously labeled dataset (in our case, visually sleep-staged PSG epochs). We performed an epoch-by-epoch classification into one of three distinct sleep classes: NREM, REM and WAKE. A random forest (RF) classifier was used as the main component of the classification pipeline. We employed repeated (10 repetitions) two-fold cross-validation by indexing half of the subjects as testing and the other half as training subjects. It has been shown that this procedure has smaller variance than the typical leave-one-subject-out cross-validation [37]. Training and testing sets were created by concatenating the data of all subjects assigned to the respective group. Random under-sampling of epochs was performed to equalize the number of epochs across sleep classes, in both the training and the testing set. Additional cross-validation within the training set was performed to find the optimal configuration of the classifier parameters (hyper-parameters). Finally, we used two types of two-fold cross-validation: (1) within sessions, such that training and testing subjects were of the same age, and (2) across sessions, such that training and testing subjects were of different ages (cf. S2 Fig). Importantly, both procedures were run as between-subject classifications, such that each participant was assigned to either the training or the testing set (i.e., the two sessions of the same subject were never split between training and testing). Each cross-validation was repeated 20 times, and the median scores (accuracy, F1 score) are reported. The chance level of the accuracy scores was estimated by running each cross-validation on shuffled data (100 repetitions). The machine learning analysis was performed in Python using the scikit-learn package [38]. To evaluate classifier performance, standard metrics were used: class-wise performance (for NREM, REM and WAKE separately) and overall performance were assessed using the F1 score and the accuracy score, respectively (cf. Fig 2).
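The following condensed scikit-learn sketch illustrates the pipeline described above (subject-wise two-fold splits, random under-sampling, a random forest, and accuracy/F1 evaluation). The hyper-parameter values, the omission of the inner hyper-parameter search, and the use of GroupShuffleSplit to realize the subject-wise 50/50 splits are simplifying assumptions.

```python
# Sketch of the classification pipeline: subject-wise splits, class balancing,
# random forest, accuracy and per-class F1 scores.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GroupShuffleSplit
from sklearn.metrics import accuracy_score, f1_score

def undersample(X, y, rng=np.random.default_rng(0)):
    """Keep an equal number of epochs per class."""
    n_min = min(np.sum(y == c) for c in np.unique(y))
    idx = np.concatenate([rng.choice(np.where(y == c)[0], n_min, replace=False)
                          for c in np.unique(y)])
    return X[idx], y[idx]

def repeated_two_fold_cv(X, y, subjects, n_repeats=20):
    """X: (n_epochs, 55) MSPE features, y: stage labels, subjects: subject ids."""
    accs, f1s = [], []
    splitter = GroupShuffleSplit(n_splits=n_repeats, test_size=0.5, random_state=0)
    for train_idx, test_idx in splitter.split(X, y, groups=subjects):
        X_tr, y_tr = undersample(X[train_idx], y[train_idx])
        X_te, y_te = undersample(X[test_idx], y[test_idx])
        clf = RandomForestClassifier(n_estimators=500, random_state=0)
        clf.fit(X_tr, y_tr)
        pred = clf.predict(X_te)
        accs.append(accuracy_score(y_te, pred))
        f1s.append(f1_score(y_te, pred, average=None))   # per-class F1
    return np.median(accs), np.median(np.vstack(f1s), axis=0)
```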

Fig 2. Evaluation of the performance.


(a) The confusion matrix relates the actual (visual) scoring to the classifier predictions. (b) Values from the confusion matrix were used to calculate the overall accuracy and the class-specific F1 scores. Note that the diagonal corresponds to agreement between the visual sleep staging and the MSPE-based automatic classification. Whereas accuracy (top panel) reflects the relative amount of agreement between predicted and actual scores, F1 accounts for precision (middle panel) and recall (bottom panel). Precision (also known as positive predictive value) takes into account false positives and is defined as the ratio of epochs classified by both the classifier and the human scorer as a given sleep stage to all epochs that the classifier assigned to that sleep stage (b, middle panel). Recall (also known as sensitivity) in turn takes into account false negatives and is defined as the ratio of epochs classified by both the classifier and the human scorer as a given sleep stage to all epochs that the human scorer assigned to that sleep stage (b, lower panel). (c) The F1 score combines the two measures into a single metric.
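For illustration, the metrics defined in Fig 2 can be computed directly from a confusion matrix in which rows are the human (actual) labels and columns the predicted labels; the counts below are made up.

```python
# Accuracy, per-class precision/recall and F1 from a 3x3 confusion matrix.
import numpy as np

C = np.array([[50,  8,  2],     # NREM scored as NREM / REM / WAKE
              [ 6, 70,  4],     # REM
              [ 3,  5, 52]])    # WAKE

accuracy  = np.trace(C) / C.sum()
precision = np.diag(C) / C.sum(axis=0)          # per class: TP / (TP + FP)
recall    = np.diag(C) / C.sum(axis=1)          # per class: TP / (TP + FN)
f1 = 2 * precision * recall / (precision + recall)
```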

To compare classifiers based on the different feature sets, both the MSPE- and the PSD-based classification were repeated 20 times on the full dataset (i.e., the week-2 and week-5 data merged). The corresponding accuracy scores of the MSPE- and PSD-based classifiers were compared statistically with the Mann-Whitney U test. Since the MSPE-based classifier performed significantly better than the PSD-based classifier (cf. S1 Fig), we restricted all subsequent classification analyses to the MSPE-based classifier.
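A minimal sketch of this comparison with SciPy; the accuracy scores below are placeholders, not the study results.

```python
# Compare the 20 accuracy scores of the MSPE- and PSD-based classifiers.
import numpy as np
from scipy.stats import mannwhitneyu

acc_mspe = np.random.uniform(0.60, 0.75, size=20)   # placeholder accuracies
acc_psd  = np.random.uniform(0.50, 0.65, size=20)
u_stat, p_value = mannwhitneyu(acc_mspe, acc_psd, alternative="two-sided")
```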

Results

To assess newborn sleep, EEG recordings from a sample of 42 participants were used. The final analysis included 72 visually sleep-scored PSG recordings divided into two age groups: 2 weeks old (N = 34) and 5 weeks old (N = 38). We evaluated changes in sleep stage distribution (including NREM, REM and WAKE) from the age of 2 weeks to the age of 5 weeks. On average, 5-week-old newborns spent a higher percentage of total time (19%) in NREM sleep than 2-week-old babies (11.5%). In contrast, relative REM duration decreased from 60.6% at week-2 to 57.2% at week-5, and WAKE decreased from 27.9% at week-2 to 23.8% at week-5. A significantly larger proportion of participants showed NREM during week-5 (66%) as compared to week-2 (35%) (χ2(1) = 6.81, p < .05). Using paired-samples Wilcoxon tests (including only those subjects who actually showed a given state in both sessions), we found no significant differences in the median duration of the classes from week-2 to week-5 (Wilcoxon signed-rank tests; NREM: Z = 10.5, p = .15; REM: Z = 191.0, p = .56; WAKE: Z = 32.5, p = .21). Subsequently, EEG signal complexity was investigated across different sleep stages and recording sessions. Please see Table 1 for a detailed overview of age, sleep-stage distribution and the complexity measure (multi-scale permutation entropy).
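For illustration, the two tests reported above can be reproduced with SciPy as sketched below; the counts and durations are placeholders, not the study data.

```python
# Sketch: chi-square test on the number of subjects showing NREM per session,
# and a paired Wilcoxon test on per-subject stage durations.
import numpy as np
from scipy.stats import chi2_contingency, wilcoxon

# subjects with / without NREM at week-2 and week-5 (illustrative counts)
table = np.array([[12, 22],
                  [25, 13]])
chi2, p_chi, dof, _ = chi2_contingency(table)

# paired comparison of NREM duration for subjects with NREM in both sessions
nrem_week2 = np.array([10.0, 12.5, 8.0, 15.0])      # placeholder minutes
nrem_week5 = np.array([14.0, 13.0, 11.0, 16.5])
w_stat, p_wilcoxon = wilcoxon(nrem_week2, nrem_week5)
```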

Entropy and spectral measures

We observed that multi-scale permutation entropy (MSPE) at the fast scale (i.e., no coarse-graining, including a mixture of slow and fast frequencies) as well as at the slow scale (i.e., fast temporal scales eliminated) differed significantly between sleep stages, recording sessions and channel locations. Table 1 and Fig 3 illustrate the results for MSPE averaged over the F3 and F4 channels (for central and occipital sites see S4 Fig).

Fig 3. Entropy at both fast (left) and slow (right) time scale across sleep/wake states and the two recording sessions.


MSPE values were averaged over frontal electrodes. Note that at the fast scale all three stages are distinguishable at week 5, but not yet at week 2. At both scales there is a clear difference in signal complexity between week 2 and week 5, with week 2 being generally higher in entropy.

At the fast time scale (Fig 3, left) there was a main effect of session (week-2 vs week-5: Wald chi-square (1, 73) = 27.65, p < .001), indicating higher permutation entropy during week-2 as compared to week-5. Furthermore, we observed a main effect of sleep stage (NREM vs REM vs WAKE: Wald chi-square (2, 73) = 237.64, p < .001) and a main effect of channel location (frontal vs central vs occipital: Wald chi-square (2, 73) = 45.71, p < .001). There were significant interactions between session and sleep stage (Wald chi-square (2, 73) = 7.79, p = .02) as well as between sleep stage and location. A post-hoc Tukey test for the factor sleep stage confirmed higher entropy during WAKE compared to both REM (mean(SE) = .021(.005); Z = 4.09, p < .001) and NREM (mean(SE) = .086(.006); Z = 13.93, p < .001). In addition, there was a significant difference between REM and NREM (mean(SE) = .065(.005); Z = 12.03, p < .001). A post-hoc Tukey test for the factor location yielded lower entropy over frontal channels as compared to the central location (mean(SE) = -.024(.006); Z = 4.18, p < .001), as well as lower entropy over frontal as compared to occipital sites (mean(SE) = -.029(.006); Z = 5.0, p < .001). A post-hoc Tukey test on the interaction between session and sleep stage revealed higher entropy during WAKE compared to REM only at week-5 (mean(SE) = .031(.007); Z = 4, p < .001). A post-hoc Tukey test on the interaction between sleep stage and location yielded higher entropy during WAKE compared to REM only over the frontal location (mean(SE) = .032(.008); Z = 4.18, p < .001). Note that the session x sleep stage interaction was reduced to a trend after outliers (based on the interquartile rule) were excluded (Wald chi-square (2, 73) = 4.67, p = .09).

At the coarse time scale (Fig 3, right) we found a similar main effect of session (week-2 vs week-5: Wald chi-square (1, 73) = 6.96, p = .008), indicating an overall decrease in EEG complexity from week-2 to week-5. Additionally, we observed a significant main effect of sleep stage (NREM, REM, WAKE) (Wald chi-square (2, 73) = 87.4, p < .001). A post-hoc Tukey test for the factor sleep stage revealed higher entropy during REM compared to NREM (mean(SE) = .02(.004); Z = 5.47, p < .001), higher entropy during REM compared to WAKE (mean(SE) = .03(.004); Z = 8.81, p < .001), as well as higher entropy during NREM compared to WAKE (mean(SE) = .01(.004); Z = 2.42, p = .04). Moreover, we observed that the pattern of relative entropy levels (across sleep stages) reverses at the coarse temporal scale (Fig 3, right) compared to the fine temporal scale (Fig 3, left), such that WAKE shows the lowest entropy level at the coarse temporal scale. The interaction between session and sleep stage was marginally significant (Wald chi-square (2, 73) = 6.0, p = .051). A post-hoc Tukey test showed that NREM was distinguishable from REM only during week-5 (mean(SE) = .03(.005); Z = 5.79, p < .001).

Similarly, we evaluated changes in power spectral density (PSD) across sleep stages and recording sessions. We found an increase in spectral power from week-2 to week-5, but only during NREM (Fig 4, left). This pattern was also observed for central electrodes and, to a limited degree (2–4 Hz frequency bins), for occipital ones (cf. S5 Fig). For Spearman’s rho correlations between spectral and entropy features please see S9 Fig.

Fig 4. Average log-log-scale PSD spectra per sleep stage over frontal electrodes.


The shaded area highlights statistical differences between the week-2 and week-5 recordings. The dashed lines represent the standard error of the mean. Note that only NREM shows differences in the PSD spectra between age groups, and that the developing 9–14 Hz peak is observed exclusively at week-5.

Machine learning classification

Influence of the feature extraction on the classification performance: PSD vs MSPE

Multi-scale permutation entropy improves the overall classification accuracy as compared to PSD (cf. S1 Fig). The per-class evaluation shows that MSPE improves the discrimination of all classes, in particular WAKE (73% instead of 59% median accuracy), which is why we focus on MSPE in all subsequent classification analyses.

Separate classification for week-2 and week-5—Within age-group

To automate sleep staging we used machine learning on the previously extracted entropy features. We first performed within age-group classification and evaluated the results of the two independent, age-specific classifiers. That is, we computed the performance scores of a classifier that was both trained and tested on data from either the (1) week-2 or (2) week-5 recordings. Accuracy scores across all 3 sleep stages were significantly higher than expected by chance (randomization tests across the 3 classes put chance level at ~33%), for both the week-2 and the week-5 classification (cf. Fig 5, left upper panel). Moreover, the classifier performed better on the week-5 (Mdn = 72.7%) than on the week-2 (Mdn = 60.1%) babies (U = 1, Z = 6.5, p < 0.001). The per-class evaluation (cf. Fig 5, left lower panel) shows higher F1 scores at week-5 than at week-2 for REM (Mdn = 63.1% vs. Mdn = 53.2%; U = 24, Z = 5.2, p < 0.001) as well as for WAKE (Mdn = 83.6% vs. Mdn = 58.5%; U = 1, Z = 6.6, p < 0.001).

Fig 5. Classification of sleep states in the week-2 and week-5 old newborn (left panel) and its generalization across age (right panel).


Sleep in older (5-week-old) babies can be classified more accurately than in 2-week-old babies (left panel). The right panel shows that the “generalization”, or classification across age groups, leads to lower classification accuracy specifically for detecting stage NREM (week-5 training, week-2 test), but also stage REM (week-2 training, week-5 test). Vertical histograms represent the null distribution, with the empirically estimated chance levels in both cases being close to 33% (red dashed line). 95% confidence intervals (for both accuracy and F1 scores) are displayed on the basis of a bootstrap analysis.

The confusion matrix (cf. S3 Fig), as well as visual inspection of single-subject results (cf. S7 Fig for an exemplary subject), reveals that only a very limited proportion of NREM epochs is falsely classified as WAKE (on average 3%) or vice versa (8% of epochs). At week-2 the classifier is worst at distinguishing REM from WAKE, while at week-5 classification accuracy generally increases and NREM and REM are occasionally confused.

Cross classification between week-2 and week-5—Across age-groups

In a second and final step we tested whether the classifier generalizes across age groups. That is, we trained the classifier on one age group and evaluated its classification accuracy on the other age group (cf. Fig 5, right).

Compared to the within-session classification (week-2 to week-2 and week-5 to week-5), the cross-session (week-5 to week-2 and week-2 to week-5) classification accuracy decreased by 7.7% and 9.9%, respectively. Classification, however, remained well above chance level (~33%) and was better when the classifier was trained on week-2 and tested on week-5 data (Mdn = 62.8%) than vice versa (Mdn = 53.1%) (cf. Fig 5, right). Interestingly, the per-class evaluation shows that the detection of NREM deteriorates when trained on week-5 (and tested on week-2) data, whereas REM classification deteriorates when trained on week-2 (and tested on week-5) data. The classification of state WAKE remains relatively stable, indicating that the way this stage presents itself in the EEG remains largely constant from week-2 to week-5 in the newborn.

Uncovering the black box of classification – feature importance and decision boundaries

Evaluation of channel importance from a trained random forest classifier revealed that primarily the horizontal EOGs, frontal brain channels (F4, F3) as well as the ECG contribute to good classification accuracy (S8 Fig). Visualization of the decision boundaries of a trained random forest classifier additionally confirms these results and reveals more compact and distinct sleep/wake classes at week-5 than at week-2 (see S8 Fig for details).

Discussion

The main focus of this study was the development of an automatic sleep staging technique in order to stage newborn sleep as early as 2 to 5 weeks after birth. As babies at this age sleep a significant proportion of the time (irrespective of environmental stimulation such as noise), we also had the opportunity to study sleep and the associated brain dynamics in this early age group. However, please note that we do not claim that our newborn sleep data necessarily represent natural sleep at that early age, as auditory stimulation was ongoing half of the time in our study protocol.

Generally, newborns are known to sleep up to 16–17 hours a day. We found that the most dominant behavioral state in our study was indeed active sleep or state REM (week-2: 60.6%, week-5: 57.2% of total time). Several earlier studies have shown that newborns spend more than half of their time in REM [2]. Its proportion, however, gradually decreases within the first 12 weeks of life [39], which is interpreted as an ongoing adaptation of the sensory system to the environment [2]. In contrast, the mean percentage of time in NREM increased from week-2 (11.5%) to week-5 (19%). This proportional increase in NREM is believed to reflect a gradual shift towards the adult pattern known as slow wave sleep (SWS) [2], which is known to be important for recovery as well as brain plasticity and learning. However, it is worth mentioning that, when testing longitudinally within subjects, we found no significant change in the percentage of NREM. This was likely due to the limited sensitivity of this measure, compounded by the small number of subjects showing NREM at week 2.

Over the first weeks of life the human brain grows at a rapid rate, establishing a complex network that includes trillions of synaptic connections [40]. Accordingly, a continuous increase in brain signal complexity could be expected. Instead, we observed a clear decrease in EEG complexity from week-2 to week-5 in the present data. Although similar findings have already been reported for both entropy [25] and spectral [26–28, 41] EEG measures, the understanding of what might account for this effect remains limited. Most of the aforementioned authors point to a decline in bursting activity (also known as spontaneous “activity transients” or “delta brushes”), which is abundant in premature infants but remains detectable until about the end of the first month of life [42].

In contrast to prior approaches, in which temporally more unspecific entropy measures were used [25], we used multi-scale entropy, which provides more detail about the temporal scales or frequency bands that may contribute to the effect. In our data we found that EEG entropy decreases with age (especially during NREM and REM). This change was observed not only at the fast temporal scale (a mixture of low and high frequencies) but also at the slow temporal scale (slow frequencies only), suggesting that a specific bandwidth or temporal scale is involved. Indeed, we also observed a significant increase in EEG spectral power at 1–15 Hz (during NREM). The observed entropy decrease and corresponding power increase are likely linked to the emergence of sleep spindles taking place between week-2 and week-8 after birth [41]. The fact that the frontal channels were identified as those with the lowest entropy (fast temporal scale) could be related to infant (1.5–6 months) sleep spindles, which are prominent over the fronto-central area [43]. Nevertheless, this result needs to be interpreted with caution. It is widely acknowledged that frontal EEG channels are affected most by eye blinks and muscle artifacts. Despite using a robust entropy measure, we cannot exclude the possibility that part of the effect is driven by non-neural sources.

Most of the aforementioned studies focus exclusively on sleep periods (as sleep constitutes about 70% of the time in newborns from birth to 2 months) and leave out periods of wakefulness. It was therefore rather unclear whether primarily “quiet” NREM and “active” REM sleep show these extensive changes early in development, or whether similar changes in the “brain signatures” are found in wakefulness during the first weeks of life. Indeed, our data indicate that the drop in entropy at the fast temporal scale is exclusive to the NREM and REM sleep states, with no significant changes during wakefulness. Likewise, we observed an increase in EEG spectral power only during NREM. These findings support the first line of interpretation since, as in adults, newborn sleep spindles are observed during NREM sleep [2, 41]. Interestingly, the age-related decrease in entropy at the slow temporal scale is also observed only for NREM, which supports the idea of a reorganization of brain oscillations and the formation of more stable sleep-wake cycles with dominant slow waves during NREM.

Finally, in adults the relative entropy level across sleep stages is strongly time-scale dependent. At the fast temporal scale EEG entropy follows a pattern of WAKE > REM > N2 > N3, whereas at the slow temporal scale this pattern reverses (N3 > N2 > N1 > REM > WAKE), which is interpreted as a reflection of mainly local information processing during WAKE and increasingly global synchronization during NREM (reaching its culmination in N3) [15]. Our results show that in neonates this pattern reverses only partially. Both NREM and REM sleep show higher entropy than WAKE; however, there is no significant difference between NREM and REM at the slow temporal scale, which suggests that global connections may not yet be fully functional, likely due to incomplete myelination of the newborn brain [44]. In relation to the PSD data, we observed that high-frequency spectral components correspond to higher entropy (when compared across sleep stages), which agrees with previous studies suggesting a link between fast-scale entropy and the PSD slope [19, 24].

In addition to this descriptive analysis of baby EEG data in the first weeks of life, we developed a neonatal sleep classifier by employing machine learning on entropy-based features. We show that our classifier performs well above chance (60.1% at week-2, 72.7% at week-5) and comes close to human scoring performance achieved with adapted scoring rules (inter-scorer agreement of 80.6%, kappa of 0.73 [12]). Crowell and colleagues [11], for example, reported only moderate inter-scorer reliability for infant sleep staging, with kappa coefficients below 0.6.

However, it has to be noted that visual sleep scoring of neonatal recordings is particularly difficult, even for experienced sleep experts. Reaching an acceptable inter-scorer agreement requires intensive training and careful attention to scoring specifications (which even vary in the literature) [11], which undermines the practical applicability of this approach. Note that even in adults the inter-rater agreement rarely exceeds 80% [45]. As a matter of fact, 12 out of 72 recordings in our sample were classified as “difficult” by our scoring expert (Scholle S.; [30]). Considering this uncertainty in the ‘ground truth’, the obtained results are notably high. In fact, after excluding the 12 “difficult” recordings the overall classification accuracy rises by about 6%, which confirms the presence of mislabeling in the dataset. The high classification accuracy of our automatic classifier also demonstrates that complexity measures are able to capture and quantify essential characteristics of vigilance states in newborns. Recently, an independent group of researchers used a similar approach to study sleep in preterm infants, finding a significant correlation between EEG complexity and infant age (ranging from 27 to 42 weeks) [46]. In contrast to that study (where sample entropy was used), our approach was based on the robust permutation entropy. A limitation of our current approach is certainly that we did not yet include respiration (mainly for practical reasons, e.g., to ensure a rapid recording start in the newborns) and that our “ground truth” lacks a second, independent scoring. Given that a pioneer of infant sleep staging (S. Scholle) carefully reviewed all our baby PSG recordings (including the simultaneous video recordings and Prechtl vigilance scorings), we believe that our manual scoring is as good as currently possible. Future studies should nevertheless add respiration and independent second scorers, which could further improve classification accuracy.

The significantly higher overall classification performance for week-5 as compared to week-2 recordings indicates that the neonatal sleep/wake states become more distinguishable within a very short time, spanning only about 3 weeks of early human development. For the across-age-group classification we observed an overall decrease in accuracy compared to within-group classification, which indicates a marked "dissimilarity" in sleep organization between the two sessions. It is worth noting that, although fully consistent with our univariate entropy results, the classification approach models all temporal scales and all channels simultaneously (including EMG, ECG and EOGs), which provides a more complete picture of the entropy changes across time. Hence, to account for the rapid developmental changes in newborns and thus further improve automated sleep classification, we suggest using a ‘transfer learning’ procedure, known from recent applications in deep learning. In this scenario a classifier would be pre-trained on a larger dataset and then fine-tuned on a specific age group.

Interestingly, testing on week-5 (when trained on week-2) outperforms testing on week-2 (when trained on week-5). This seems surprising, as for cross-session (or cross-condition) classification it is generally more efficient to train a classifier on data with clear class boundaries (high signal-to-noise; in our case week-5) and test it on data with less definite class boundaries (low signal-to-noise; in our case week-2) than the other way around [47]. In our case, however, this difference in signal-to-noise ratio was counterbalanced by the sleep-specific entropy decrease from week-2 to week-5, which manifests as decreased performance in classifying both NREM (week-5 training, week-2 test) and REM (week-2 training, week-5 test), with a 24% and 14% accuracy drop, respectively. Interestingly, there is no change for state WAKE in cross-session classification. This suggests that developmental changes in signal complexity are more pronounced for NREM and REM, whereas state WAKE may already be largely developed.

Last but not least, we found that the horizontal EOGs, frontal brain channels and the ECG contribute the most to the classification. This suggests that the development of sleep patterns is associated not only with neural but also with physiological changes (e.g., eye movements), and that these features may be crucial markers of the manually labeled sleep stages. In summary, we observed massive developmental changes at the brain level in the first 5 weeks of life in human newborns. These changes were limited to “quiet” NREM and “active” REM sleep and showed an unexpected drop of signal complexity from week-2 to week-5. In addition, our classifier demonstrated that sleep states can be classified well above chance and close to human scorers using multi-scale permutation entropy (with just 6 EEG and 5 physiological channels). Altogether, these results highlight the need to perform electrophysiological studies during the first weeks of life, where rapid changes in neuronal development and related brain activity can be observed.

Supporting information

S1 Fig. Influence of the feature extraction procedure on the classification performance.

Multi-scale permutation entropy as compared to PSD boosts discrimination of sleep stages. MSPE improves the classification especially of WAKE, also of REM, and slightly of NREM (note the diagonals of the lower panels).

(TIF)

S2 Fig. Cross-validation schemes.

Splitting into training (in blue) and testing (in green) sets was performed within sessions (upper row) or across sessions (lower row). Note, that half of the subjects are used to train and half to test (two-fold cross validation), with both sessions of a single subject (week 2 and 5) always being in separate sets.

(TIF)

S3 Fig. Confusion matrices summarizing the classification results for week-2 and week-5 (MSPE-based classifier).

Note the off-diagonals showing the limited proportion of NREM falsely classified as WAKE (on average 3%) and, similarly, of WAKE falsely classified as NREM (on average 8%).

(TIF)

S4 Fig. Comparison of MSPE at fast time scale between sleep stages and the two recording sessions (week-2 vs week-5 data)—Central and occipital channels.

Note that overall entropy at the fast temporal scale is lower over frontal channels (see Fig 3, main text) as compared to both central and occipital channels (left panels).

(TIF)

S5 Fig. Average log-log-scale PSD spectra for the individual sleep stages for central and occipital electrodes.

The shaded area shows statistical difference between week-2 and week-5. Note that similarly to frontal channels (main text), there is a clear difference in PSD also for central channels during NREM.

(TIF)

S6 Fig. Comparison of multi-scale permutation entropy values across different scales between sleep stages and sessions.

Points represent the averages and error bars show the 95% bootstrap confidence intervals. Note that the results for scale = 1 and scale = 5 (X-axis) correspond to those presented in the main text. Maximal entropy in NREM and WAKE (also REM) is attained at different temporal scales (X-axis). Also, similarly to the main text, there is no statistical difference in WAKE between sessions (week-2 vs week-5) across all temporal scales.

(TIF)

S7 Fig. Exemplary single-subject classification (week-5).

Note the limited proportion of NREM epochs being falsely classified as WAKE and similarly WAKE classified as NREM.

(TIF)

S8 Fig. Channel importance extracted from a trained random forest classifier and the decision boundaries of the trained classifier.

Horizontal EOGs, frontal channels as well as ECG contribute most to the sleep classification (upper panel). For visualization purposes multidimensional scaling was used to reduce the dimensionality of the MSPE data to two (lower panel, X and Y axes). Points represent epochs (N = 100 for each class), colors (red, green and blue) represent true class labels and shading (pink, light blue and light grey) shows the decision boundary. Note that in week-5 (right panel) there is more apparent overlap between the true class labels (points) and the predictions (shading) as compared to week-2, which agrees with the higher classification accuracies for week-5.

(TIF)

S9 Fig. Correlations between spectral features and entropy at fast temporal scale (both sessions were merged).

Spectral features correspond to the average power values within three frequency ranges (rows). Solid red line indicates significant results. Note negative correlation between entropy and delta-theta band power during NREM.

(TIF)

Acknowledgments

We are especially grateful to S. Scholle for manual sleep staging and A. Lang for manual verification of all sleep scorings, with Prechtl scores and simultaneous video files.

Data Availability

Data are available at a public repository: https://figshare.com/articles/On_the_development_of_sleep_states_in_the_first_weeks_of_life_-_REVISED/9767567.

Funding Statement

The study was supported by a grant from the Austrian Science Fund FWF (Y-777). TW, AL, and MW were additionally supported by the Doctoral College “Imaging the Mind” (FWF; W1233-G17). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

  • 1.Iber C, Ancoli-Israel S, Chesson A, Quan S. American Academy of Sleep Medicine. The AASM Manual for the Scoring of Sleep and Associated Events: Rules, Terminology and Technical Specifications. Westchester: American Academy of Sleep Medicine; 2007. [Google Scholar]
  • 2.Grigg-Damberger MM. The Visual Scoring of Sleep in Infants 0 to 2 Months of Age. J Clin Sleep Med. 2016;12(3):429–45. 10.5664/jcsm.5600 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 3.Ellingson RJ. Electroencephalograms of normal, full-term newborns immediately after birth with observations on arousal and visual evoked responses. Electroencephalogr Clin Neurophysiol. 1958;10(1):31–50. 10.1016/0013-4694(58)90101-9 [DOI] [PubMed] [Google Scholar]
  • 4.Whitehead K, Pressler R, Fabrizi L. Characteristics and clinical significance of delta brushes in the EEG of premature infants. Clin Neurophysiol Pract. 2017;2:12–8. 10.1016/j.cnp.2016.11.002 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 5.Anders TF, Emde T, Parmelee A. A manual of standardized terminology, techniques and criteria for scoring states of sleep and wakefulness in newborn infants. Los Angeles, CA: UCLA Brain Information Service, NINDS Neurological information Network; 1971. [Google Scholar]
  • 6.Scholle S, Schäfer T. Atlas of states of sleep and wakefulness in infants and children. Somnologie–Schlafforschung und Schlafmedizin. 1999;3(4):163–241. [Google Scholar]
  • 7.Anders TF, Keener MA, Kraemer H. Sleep-wake state organization, neonatal assessment and development in premature infants during the first year of life. II. Sleep. 1985;8(3):193–206. 10.1093/sleep/8.3.193 [DOI] [PubMed] [Google Scholar]
  • 8.Borghese IF, Minard KL, Thoman EB. Sleep rhythmicity in premature infants: implications for development status. Sleep. 1995;18(7):523–30. 10.1093/sleep/18.7.523 [DOI] [PubMed] [Google Scholar]
  • 9.Gertner S, Greenbaum CW, Sadeh A, Dolfin Z, Sirota L, Ben-Nun Y. Sleep-wake patterns in preterm infants and 6 month’s home environment: implications for early cognitive development. Early Hum Dev. 2002;68(2):93–102. 10.1016/s0378-3782(02)00018-x [DOI] [PubMed] [Google Scholar]
  • 10.Freudigman KA, Thoman EB. Infant sleep during the first postnatal day: an opportunity for assessment of vulnerability. Pediatrics. 1993;92(3):373–9. [PubMed] [Google Scholar]
  • 11.Crowell DH, Brooks LJ, Colton T, Corwin MJ, Hoppenbrouwers TT, Hunt CE, et al. Infant polysomnography: reliability. Collaborative Home Infant Monitoring Evaluation (CHIME) Steering Committee. Sleep. 1997;20(7):553–60. [PubMed] [Google Scholar]
  • 12.Satomaa AL, Saarenpaa-Heikkila O, Paavonen EJ, Himanen SL. The adapted American Academy of Sleep Medicine sleep scoring criteria in one month old infants: A means to improve comparability? Clin Neurophysiol. 2016;127(2):1410–8. 10.1016/j.clinph.2015.08.013 [DOI] [PubMed] [Google Scholar]
  • 13.Ma Y, Shi W, Peng CK, Yang AC. Nonlinear dynamical analysis of sleep electroencephalography using fractal and entropy approaches. Sleep Med Rev. 2018;37:85–93. 10.1016/j.smrv.2017.01.003 [DOI] [PubMed] [Google Scholar]
  • 14.Jordan D, Stockmanns G, Kochs EF, Pilge S, Schneider G. Electroencephalographic order pattern analysis for the separation of consciousness and unconsciousness: an analysis of approximate entropy, permutation entropy, recurrence rate, and phase coupling of order recurrence plots. Anesthesiology. 2008;109(6):1014–22. 10.1097/ALN.0b013e31818d6c55 [DOI] [PubMed] [Google Scholar]
  • 15.Miskovic V, MacDonald KJ, Rhodes LJ, Cote KA. Changes in EEG multiscale entropy and power-law frequency scaling during the human sleep cycle. Hum Brain Mapp. 2018. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 16.Shen Y, Olbrich E, Achermann P, Meier PF. Dimensional complexity and spectral properties of the human sleep EEG. Electroencephalograms. Clin Neurophysiol. 2003;114(2):199–209. 10.1016/s1388-2457(02)00338-3 [DOI] [PubMed] [Google Scholar]
  • 17.Olofsen E, Sleigh JW, Dahan A. Permutation entropy of the electroencephalogram: a measure of anaesthetic drug effect. Br J Anaesth. 2008;101(6):810–21. 10.1093/bja/aen290 [DOI] [PubMed] [Google Scholar]
  • 18.Burioka N, Miyata M, Cornelissen G, Halberg F, Takeshima T, Kaplan DT, et al. Approximate entropy in the electroencephalogram during wake and sleep. Clin EEG Neurosci. 2005;36(1):21–4. 10.1177/155005940503600106 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 19.Bruce EN, Bruce MC, Vennelaganti S. Sample entropy tracks changes in electroencephalogram power spectrum with sleep state and aging. J Clin Neurophysiol. 2009;26(4):257–66. 10.1097/WNP.0b013e3181b2f1e3 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 20.Nicolaou N, Georgiou J. The use of permutation entropy to characterize sleep electroencephalograms. Clin EEG Neurosci. 2011;42(1):24–8. 10.1177/155005941104200107 [DOI] [PubMed] [Google Scholar]
  • 21. Janjarasjitt S, Scher MS, Loparo KA. Nonlinear dynamical analysis of the neonatal EEG time series: the relationship between sleep state and complexity. Clin Neurophysiol. 2008;119(8):1812–23. doi: 10.1016/j.clinph.2008.03.024
  • 22. Wislowska M, Giudice RD, Lechinger J, Wielek T, Heib DP, Pitiot A, et al. Night and day variations of sleep in patients with disorders of consciousness. Sci Rep. 2017;7(1):266. doi: 10.1038/s41598-017-00323-4
  • 23. McIntosh AR, Kovacevic N, Itier RJ. Increased brain signal variability accompanies lower behavioral variability in development. PLoS Comput Biol. 2008;4(7):e1000106. doi: 10.1371/journal.pcbi.1000106
  • 24. Waschke L, Wostmann M, Obleser J. States and traits of neural irregularity in the age-varying human brain. Sci Rep. 2017;7(1):17381. doi: 10.1038/s41598-017-17766-4
  • 25. Zhang D, Ding H, Liu Y, Zhou C, Ding H, Ye D. Neurodevelopment in newborns: a sample entropy analysis of electroencephalogram. Physiol Meas. 2009;30(5):491–504. doi: 10.1088/0967-3334/30/5/006
  • 26. Sterman MB, Harper RM, Havens B, Hoppenbrouwers T, McGinty DJ, Hodgman JE. Quantitative analysis of infant EEG development during quiet sleep. Electroencephalogr Clin Neurophysiol. 1977;43(3):371–85. doi: 10.1016/0013-4694(77)90260-7
  • 27. Scher MS. Neurophysiological assessment of brain function and maturation. II. A measure of brain dysmaturity in healthy preterm neonates. Pediatr Neurol. 1997;16(4):287–95. doi: 10.1016/s0887-8994(96)00009-4
  • 28. Myers MM, Grieve PG, Izraelit A, Fifer WP, Isler JR, Darnall RA, et al. Developmental profiles of infant EEG: overlap with transient cortical circuits. Clin Neurophysiol. 2012;123(8):1502–11. doi: 10.1016/j.clinph.2011.11.264
  • 29. Scholle S, Zwacka G, Scholle HC. Sleep spindle evolution from infancy to adolescence. Clin Neurophysiol. 2007;118(7):1525–31. doi: 10.1016/j.clinph.2007.03.007
  • 30. Scholle S, Feldmann-Ulrich E. Polysomnographic atlas of sleep-wake states during development from infancy to adolescence. Landsberg, Germany: Ecomed Medizin; 2012.
  • 31. Einspieler C, Prechtl HF. Prechtl’s assessment of general movements: a diagnostic tool for the functional assessment of the young nervous system. Ment Retard Dev Disabil Res Rev. 2005;11(1):61–7. doi: 10.1002/mrdd.20051
  • 32. Li D, Li X, Liang Z, Voss LJ, Sleigh JW. Multiscale permutation entropy analysis of EEG recordings during sevoflurane anesthesia. J Neural Eng. 2010;7(4):046010. doi: 10.1088/1741-2560/7/4/046010
  • 33. Costa M, Goldberger AL, Peng CK. Multiscale entropy analysis of biological signals. Phys Rev E Stat Nonlin Soft Matter Phys. 2005;71(2 Pt 1):021906. doi: 10.1103/PhysRevE.71.021906
  • 34. Bates D, Maechler M, Bolker B, Walker S. Fitting Linear Mixed-Effects Models Using lme4. Journal of Statistical Software. 2015;67(1):1–48.
  • 35. Team RC. R: A language and environment for statistical computing. R Foundation for Statistical Computing; Vienna, Austria: 2013.
  • 36. Hothorn T, Bretz F, Westfall P. Simultaneous inference in general parametric models. Biom J. 2008;50(3):346–63. doi: 10.1002/bimj.200810425
  • 37. Jamalabadi H, Alizadeh S, Schonauer M, Leibold C, Gais S. Classification based hypothesis testing in neuroscience: Below-chance level classification rates and overlooked statistical properties of linear parametric classifiers. Hum Brain Mapp. 2016;37(5):1842–55. doi: 10.1002/hbm.23140
  • 38. Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, et al. Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research. 2011;12:2825–30.
  • 39. Ellingson RJ, Peters JF. Development of EEG and daytime sleep patterns in normal full-term infant during the first 3 months of life: longitudinal observations. Electroencephalogr Clin Neurophysiol. 1980;49(1–2):112–24. doi: 10.1016/0013-4694(80)90357-0
  • 40. Khazipov R, Luhmann HJ. Early patterns of electrical activity in the developing cerebral cortex of humans and rodents. Trends Neurosci. 2006;29(7):414–8. doi: 10.1016/j.tins.2006.05.007
  • 41. Jenni OG, Borbely AA, Achermann P. Development of the nocturnal sleep electroencephalogram in human infants. Am J Physiol Regul Integr Comp Physiol. 2004;286(3):R528–38. doi: 10.1152/ajpregu.00503.2003
  • 42. Vanhatalo S, Palva JM, Andersson S, Rivera C, Voipio J, Kaila K. Slow endogenous activity transients and developmental expression of K+-Cl- cotransporter 2 in the immature human cortex. Eur J Neurosci. 2005;22(11):2799–804. doi: 10.1111/j.1460-9568.2005.04459.x
  • 43. Louis J, Zhang JX, Revol M, Debilly G, Challamel MJ. Ontogenesis of nocturnal organization of sleep spindles: a longitudinal study during the first 6 months of life. Electroencephalogr Clin Neurophysiol. 1992;83(5):289–96. doi: 10.1016/0013-4694(92)90088-y
  • 44. McArdle CB, Richardson CJ, Nicholas DA, Mirfakhraee M, Hayden CK, Amparo EG. Developmental features of the neonatal brain: MR imaging. Part I. Gray-white matter differentiation and myelination. Radiology. 1987;162(1 Pt 1):223–9. doi: 10.1148/radiology.162.1.3786767
  • 45. Magalang UJ, Chen NH, Cistulli PA, Fedson AC, Gislason T, Hillman D, et al. Agreement in the scoring of respiratory events and sleep among international sleep centers. Sleep. 2013;36(4):591–6. doi: 10.5665/sleep.2552
  • 46. Wel OD, Lavanga M, Dorado A, Jansen K, Dereymaeker A, Naulaers G, et al. Complexity Analysis of Neonatal EEG Using Multiscale Entropy: Applications in Brain Maturation and Sleep Stage Classification. Entropy. 2017;19(10):516.
  • 47. King JR, Dehaene S. Characterizing the dynamics of mental representations: the temporal generalization method. Trends Cogn Sci. 2014;18(4):203–10. doi: 10.1016/j.tics.2014.01.002

Decision Letter 0

Daniele Marinazzo

Transfer Alert

This paper was transferred from another journal. As a result, its full editorial history (including decision letters, peer reviews and author responses) may not be present.

16 Jul 2019

PONE-D-19-18265

On the development of sleep states in the first weeks of life

PLOS ONE

Dear Dr. Schabus,

Thank you for submitting your manuscript to PLOS ONE. Your paper was overall well received and definitely has merit. Still, some important issues need to be addressed. Also please note that one reviewer had already submitted their review before receiving your data. As I wrote you per email, please make an extra effort in annotating the data, and in providing instructions for the complete reproducibility of your results. We would appreciate receiving your revised manuscript by Aug 30 2019 11:59PM. When you are ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter.

To enhance the reproducibility of your results, we recommend that if applicable you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). This letter should be uploaded as separate file and labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. This file should be uploaded as separate file and labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. This file should be uploaded as separate file and labeled 'Manuscript'.

Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.

We look forward to receiving your revised manuscript.

Kind regards,

Daniele Marinazzo

Academic Editor

PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at

http://www.journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and http://www.journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

1. Please amend the subsection category “[FOR JOURNAL STAFF USE ONLY]” for your manuscript. Unfortunately, this is not a valid category. At this time, please choose one or more subsections that best represent the topic(s) of your study.

2. We note that you have stated that you will provide repository information for your data at acceptance. Should your manuscript be accepted for publication, we will hold it until you provide the relevant accession numbers or DOIs necessary to access your data. If you wish to make changes to your Data Availability statement, please describe these changes in your cover letter and we will update your Data Availability statement to reflect the information you provide.


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Yes

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: I Don't Know

Reviewer #2: Yes

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: No

Reviewer #2: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: Overall the paper is well organised, clear, straight to the point and easy to understand.

I only have few minor comments that the authors may consider to keep into account for a revised version of the ms.

I would avoid the term "well-sized sample" when defining the number of newborns included in the study, at least if the authors did not perform any analysis to define the sample size.

The authors report that the MSPE was calculated for non-overlapping 30s segments. Did the authors investigate the possible time-window effect? How much may the reported results depend on this "arbitrary" choice?

The authors state that "To provide equal number of observation for each subject and for each sleep stage MSPE (or PSD) values were averaged first across EEG channels, and next across a random sample of 10 epochs". If I understand correctly, the authors started with a 128 channels EEG and ended with features extracted from the "average" of 6 EEG channels. Isn't this a bit odd? May the authors comment on this?

I would avoid defining an interaction with p = .086 as "marginally significant". At least, not unless the authors would also consider p-values of .02 as "marginally non-significant"...

Probably the only residual main concern (though fully disclosed by the authors in the limitations section) relates to the lack of a double (manual) scoring.

Reviewer #2: Wielek et al. (1) describe the power and multiscale permutation entropy features of babies’ wake, REM and non-REM states as a function of early development and (2) automatically classify those stages with such features. They validate the automatic classification using manual labeling techniques. Characterizing developmental changes in the spectral and complex signatures that differentiate various arousal/sleep states is highly relevant both to developmental science as well as cognitive neuroscience, as is knowledge regarding the practical utility of automatically predicting sleep stages on the basis of these features. The use of spectral and entropy features is well motivated and the associated references are well chosen. Wielek et al. find that across all channels, fast permutation entropy increased from NREM to REM and wake, whereas the opposite pattern was observed at slow scales. Furthermore, entropy during sleep decreased with early development, whereas no power or entropy changes were observed during wake. Changes in entropy were not equally observed in different power bins and entropy (mostly from frontal and physiological channels) better recovered the manual sleep stage labels than spectral power. NREM and wake states could be recovered with little confusion, aligning with the observed entropy gradient across arousal states.

Most of the manuscript is clearly framed and motivated, while results are presented in a clear fashion. Nonetheless, I found some of the results difficult to interpret due to methodological and conceptual limitations, which I will describe in more detail below. Given the relevance of the topic, I would recommend the acceptance of this manuscript conditional on changes made referring to the points below.

Major points

1. One general limitation of the approach by Wielek et al. is that it relies on an accurate manual scoring, which the authors sufficiently note in the text. Due to the centrality of accurate scoring, I was surprised that the paper did not describe changes in sleep stages between sessions in more detail and discuss whether and how their results deviate from previous work looking at early development and sleep. Under the assumption that manual sleep scoring is the gold standard, manually labeled sleep stages should already offer insight into development, although I could surprisingly not find results pertaining to e.g. the relative duration of classes.

2. The dual aims of this manuscript (1: describing entropy & PSD changes with arousal states and age; 2: decoding such states) are not perfectly integrated. For example, decoding of sleep stages appears to be maximal from frontal channels, but the description of entropy & PSD features is based on the average across all channels. Therefore, both parts of the manuscript appear rather segmented.

Scoring is described as difficult due to artifacts. This would also affect the features used for manual labeling. Manual sleep stage scoring appears to be driven primarily by physiological channels as well as frontal channels that are highly prone to muscular contributions, suggesting that automatic sleep stage scoring may be driven by these same physiological and muscle-prone frontal channels. Thus, decoding accuracy may derive to a large part also from non-neural sources, which fundamentally constrains arguments related to ‘cortical development’. This should be more clearly stated and discussed in the manuscript, as it challenges the interpretation as stemming from neural sources.

3. More generally, the mechanistic interpretation of the observed entropy effects is difficult here, due to the potential nonlinear contributions to the measure. In contrast to the observed differences in entropy and limited modulation of rhythmic features, much of the discussion focusses on rhythmic signatures. This may relate at least in part to the way these binned values are calculated. A global power normalization is uncommon and produces effects that are very difficult to interpret. Due to the 1/f distribution of the power spectrum, even high frequency power will always be normalized to the predominant low-frequency power. This is readily observed also from the plots (see Figure 4, 1-3Hz has relative power around 90%, whereas beta power is in the range of 5%.) This is especially problematic, as the normalization is applied regardless of sleep stage, which presumably consists of different ratios of low-to-high-frequency content. Unequal distributions of sleep stages as observed in the present data may therefore create unequal baselines. It’s further not clear how this normalization would ‘facilitate comparison between session’ (l. 196 f.) as session differences could be expressed in different baselines. Concerns regarding normalization choices could be alleviated by presenting average PSD spectra for the individual sleep stages as has been done in Figure S6 for entropy.

4. Linked to the relevance of considering the entire spectrum rather than narrow bands/bins, preliminary evidence suggests that sleep stages may be differentiated by the slope of the arrhythmic 1/f spectrum (Lendner et al., 2019, bioRxiv). Notably, fast scale entropy is often directly related to 1/f slopes (Bruce et al., 2009, Waschke et al., 2017, Vakorin & McIntosh, 2012), suggesting that a similar link may also exist in the present data. [On a side note, this link of a single scale to a multiscale property such as the 1/f spectrum questions to some extent the notion that prior approaches used ‘temporally more unspecific entropy measures’ (l. 443).] High convergence between these measures would provide more information about the interpretation of fine-scale permutation entropy here.

5. A big advantage of this dataset appears to be the longitudinal (vs. cross-sectional) nature of the design, which was surprisingly not overtly noted. On the negative side, there does not appear to be a habituation session. The session effect could thus at least in part also reflect a retest effect due to habituation effects, which should be noted.

Minor points

-L. 30: ‘the baby’s brain signal complexity (and spectral power) revealed huge developmental changes in sleep in the first 5 weeks of life’. As no effect size measures were provided and effects visually are rather constrained, I would refrain from using the word ‘huge’ here. The same goes for statements such as ‘massive drop’ (l. 458).

-L. 100ff.: ‘a big practical advantage is that entropy-based features, such as permutation entropy, are typically more robust against common EEG artifacts as compared to spectral measures (Bandt et al., 2002)’. This statement and reference are questionable. The reference merely shows that nonlinear features can be robustly identified even in the presence of noise. But no comparison to spectral features is provided. Furthermore, strong noise would also impact permutation entropy, especially if it is strong enough to limit manual labeling as suggested by the authors.

-The section on Participants and EEG should add a description of the criteria for starting and stopping the approx. 30 min recordings.

-L151f.: “Segments with artifacts were rejected based on simple power spectral density (PSD) thresholds”

o No information is available regarding what these thresholds were.

-L93: “such as Fast Fourier”. The full title of ‘Fast Fourier Transform’ would be necessary here.

-I found particular aspects of the statistical procedure questionable. First, all data are averaged across EEG channels and then 10 random samples were selected to equate epoch amounts. Here, some sort of random re-sampling (e.g. bootstrapping) should be considered. Regarding the averaging procedure, Figure S4 provides an interesting contrast of frontal and occipital channels. Such a contrast makes sense given that the decoding analysis suggests a stronger representation of sleep stage information at frontal channels. However, no statistics appear to be used, especially no statistics support the claim that the ‘difference between REM and WAKE at week-5 is more pronounced at the frontal channels’ (lines 690ff.). In Figure S1, inference on which stage decoding is improved between features appears to lack statistics.

-Regarding the power results, effects are observed in the beta and delta band. These bands are infamous for muscle artifact contamination in EEG recordings and some strong outliers are present in the data. The possible influence of artifacts should at least be discussed, although it would be even better to supplement such discussion with power spectra of the data.

-The presentation of individual data points is appreciated. However, this reveals that some conditions include clear outliers (e.g. Slow entropy NREM; especially relevant as much of the discussion focusses on the potential relevance of delta oscillations). These should be controlled for in the statistical analysis.

-Figure 1D is supposed to schematically display that permutation entropy at different levels of coarse-graining can differentiate between different sleep stages. It is unclear how these three exemplary traces were chosen, what the error associated with them is etc. Even for a schematic example, this is misleading for inference purposes. Why not plot MSPE values in addition to power values as depicted in plot C? Or accumulate across all time points within sleep stages to get estimates with an indication of the associated error.

-The bars in the upper plots of Figure 5 are missing labels.

-The provided Figures were of a noticeably low resolution. High-quality vector images would be more appealing for final publication.

-‘This confirms that not all changes in EEG complexity are reflected by changes in power’ (l. 449 ff.). This is a very strong statement that cannot be backed up by the data. Differences in the power spectrum (e.g., 1/f slopes) other than the bins tested could also covary with age and sleep stage.

-‘In our case however, this difference in signal-to-noise ratio has been counterbalanced by the sleep specific entropy decrease from week-2 to week-5, which manifests itself as decreased performance in classifying NREM (week-5 training, week-2 test).’ (l. 520 ff.). It is unclear to me why these two observations are claimed to be related. Fast scale entropy appears to decrease during sleep with age both for NREM and REM sleep. Why would this exclusively affect the classification of NREM?

-‘This confirms that developmental changes’ (l. 524). Very strong, and in my view unsupported strength of the conclusion. Better go with ‘suggest’ etc.

-‘whereas stage wake may be (oscillatory-wise) already widely developed’ (l. 525 f.). To the extent that the power measures here can be interpreted as ‘oscillatory’, the results suggest that also the sleep stages are already widely developed as no pairwise session effects were observed for any frequency bin in any sleep stage.

-The current manuscript still contains grammatical errors. Please consider carefully assessing the manuscript again and correcting those in a revision.

o might render rather low reproducibility (89); cf. yield

o during quite sleep (116); cf. quiet

o The classifier is later called as MSPE-based (189).

o we restrict our (260); cf. restricted

o at fast scale (311); cf. a fast scale / fast scales

o …

**********

6. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files to be viewed.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2019 Oct 29;14(10):e0224521. doi: 10.1371/journal.pone.0224521.r002

Author response to Decision Letter 0


10 Sep 2019

Reviewer #1

• I would avoid the term "well-sized sample" when defining the number of newborns included in the study, at least if the authors did not perform any analysis to define the sample size.

Reply: We agree, corrected as suggested. LINES 125-127

• The authors report that the MSPE was calculated for non-overlapping 30s segments. Did the authors investigate the possible time-window effect? How much may the reported results depend on this "arbitrary" choice?

Reply: We computed MSPE for 30s segments (3750 data points) to match the analysis with standard sleep scoring procedure. We did not check the effect of window length empirically. Based on the literature however, we believe that the window-length is sufficient for the analysis. According to the original paper by Bandt on the permutation entropy, window length ‘should be considerably larger’ than the number of possible ordinal patterns [1]. Similarly Cao and colleagues found that windows of 512, 1024 and 2048 data points (EEG segments) give very similar results, and concluded that within this range, the exact choice of the window length is not critical [2]. In our analysis (multiscale permutation entropy) the effective window size depends on the coarse-graining applied (scale) such that the length of the original signal (3750 data points) gets reduced by the scale factor. The number of possible patterns is always 6. Thus for all degrees of coarse-graining (scales) applied, the above requirements are fulfilled. In this exemplary calculation we calculate window length (W) for the largest time scale used (scale=5):

- scale=5

- W = 3750/5 = 750

- W >> 6 and 512 < W < 2048

Hypothetically considered, if the window length were too short, the computed signal entropy would be underestimated (due to inaccurate estimation of the distribution of patterns). In our analysis such a bias would affect especially slow time scale MSPE (scale=5) rather than fast time scale MSPE (scale=1) as the window length is more strongly reduced in the former case (3750/5 compared to 3750/1).

1. Bandt C, Pompe B. Permutation entropy: a natural complexity measure for time series. Phys Rev Lett. 2002;88(17):174102.2.

2. Cao Y, Tung WW, Gao JB, Protopopescu VA, Hively LM. Detecting dynamical changes in time series using the permutation entropy. Phys Rev E Stat Nonlin Soft Matter Phys. 2004;70(4 Pt 2):046217.
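As an illustration of the window-length argument above, the following minimal sketch (our illustration, not the authors' pipeline; order 3, hence 3! = 6 possible ordinal patterns) coarse-grains a 3750-sample epoch and prints the effective window length and the resulting permutation entropy at each scale:

```python
import numpy as np
from itertools import permutations

def coarse_grain(x, scale):
    """Average consecutive, non-overlapping blocks of length `scale`."""
    n = len(x) // scale
    return x[:n * scale].reshape(n, scale).mean(axis=1)

def permutation_entropy(x, order=3, delay=1):
    """Normalized permutation entropy (0..1) based on ordinal patterns."""
    patterns = list(permutations(range(order)))
    counts = np.zeros(len(patterns))
    for i in range(len(x) - (order - 1) * delay):
        pattern = tuple(int(k) for k in np.argsort(x[i:i + order * delay:delay]))
        counts[patterns.index(pattern)] += 1
    p = counts[counts > 0] / counts.sum()
    return float(-np.sum(p * np.log2(p)) / np.log2(len(patterns)))

rng = np.random.default_rng(0)
epoch = rng.standard_normal(3750)      # one 30 s epoch of 3750 samples

for scale in range(1, 6):
    cg = coarse_grain(epoch, scale)    # effective window: 3750 // scale samples
    print(scale, len(cg), round(permutation_entropy(cg), 3))
```

Even at the largest scale the effective window (750 samples) remains far above the 6 possible patterns, consistent with the requirement quoted above.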

• The authors state that "To provide equal number of observations for each subject and for each sleep stage MSPE (or PSD) values were averaged first across EEG channels, and next across a random sample of 10 epochs". If I understand correctly, the authors started with a 128 channels EEG and ended with features extracted from the "average" of 6 EEG channels. Isn't this a bit odd? May the authors comment on this?

Reply:

We thank the reviewer for this comment. The reduction in the number of channels (128 to 6 channels) was performed in order to mimic the recordings usually available at this age and in clinical settings in general. In addition, we wanted to compare our automated sleep staging to manual sleep scoring, which always relies on a small set of 2-6 EEG electrodes (C3 + C4 in the old Rechtschaffen and Kales standard; 2 frontal, 2 central and 2 occipital in the newer AASM standard). We agree with the reviewer that running statistics on the average of 6 EEG channels while using 11 channels for the classification was inconsistent. In the manuscript we no longer average across channels but instead added the channel locations as an additional independent variable to the statistical model (LINES 234-235). Also, we updated Table 1 and Fig. 3 in order to illustrate the results for frontal channels (F3 and F4); central and occipital sites can be seen in the supplementary material (S4 Fig. and S5 Fig.).

Topo-plots based on the full 128-channel setup are now additionally presented in the supplements and show that the selected 6 channels sufficiently capture the spatial patterns (S9 Fig.). However, given the exact cap placement and movement of babies at this age, we would warn against over-interpreting these results.

• I would avoid to define "marginally significant" an interaction with p=.086. At least, if they do not consider "marginally no significant" the p-values equal to 0.02...

Reply:

We corrected as suggested. LINE 333

Reviewer #2

MAJOR POINTS

• One general limitation of the approach by Wielek et al. is that it relies on an accurate manual scoring, which the authors sufficiently note in the text. Due to the centrality of accurate scoring, I was surprised that the paper did not describe changes in sleep stages between sessions in more detail and discuss whether and how their results deviate from previous work looking at early development and sleep. Under the assumption that manual sleep scoring is the gold standard, manually labeled sleep stages should already offer insight into development, although I could surprisingly not find results pertaining to e.g. the relative duration of classes.

Reply:

We fully agree with the reviewer that this part was not sufficiently addressed in our earlier version. We have now added further details (LINES 306-313).

The revised manuscript reads:

On average, 5-week-old newborns spend a higher percentage of total time (19%) in NREM sleep as compared with 2-week-old babies (11.5%). In contrast, relative REM duration decreases from 60.6% at week-2 to 57.2% at week-5, and WAKE decreases from 27.9% at week-2 to 23.8% at week-5. Statistically, a significantly larger proportion of participants showed NREM during week-5 (66%) as compared to week-2 (35%) (χ²(1) = 6.81, p < .05). Using paired samples (i.e., including only those subjects that actually showed a given sleep state in both sessions), we found no significant differences in the median duration of classes from week-2 to week-5.

The corresponding discussion part has been altered on lines 500-507.

The revised discussion part reads:

Generally newborns sleep up to 16-17 hours a day. We found that the most dominant sleep stage is REM (week-2: 60.6%, week-5: 57.2% of total time). Indeed, several studies show that newborns spend more than half their time in REM [2]. Its proportion, however, gradually decreases within the first 12 weeks of life [38], which is interpreted as an ongoing adaptation of the sensory system to the environment [2]. In contrast the mean percentage of time for NREM increases from week-2 (11.5%) to week-5 (19%). This proportional increase in NREM reflects a gradual shift towards the adult pattern known as slow wave sleep (SWS) predominance [2].

• The dual aims of this manuscript (1: describing entropy & PSD changes with arousal states and age; 2: decoding such states) are not perfectly intersected. For example, decoding of sleep stages appears to be maximal from frontal channels, but the description of entropy & PSD features is done on the average across all channels. Therefore, both parts of the manuscript appear rather segmented.

Reply: We thank the reviewer for this remark and agree that the two sections were a bit inconsistent. Now, instead of averaging across EEG channels, we add brain location (frontal, central, occipital) as an additional independent variable into the statistical model (LINES 234-235). Also, Table 1 and Fig. 3 have been modified in order to illustrate the results for frontal channels (F3 and F4), whereas central and occipital sites have been included in the supplements (S4 Fig. and S5 Fig.).

• Scoring is described as difficult due to artifacts. This would also affect the features used for manual labeling. Manual sleep stage scoring appears to be driven primarily by physiological channels as well as frontal channels that are highly prone to muscular contributions, suggesting that automatic sleep stage scoring may be driven by these same physiological and muscle-prone frontal channels. Thus, decoding accuracy may derive to a large part also from non-neural sources, which fundamentally constrains arguments related to ‘cortical development’. This should be more clearly stated and discussed in the manuscript, as it challenges the interpretation as stemming from neural sources.

Reply: In general we agree with the reviewer. We have provided further discussion on possible confounding factors. LINES 504-509. But please also note that epochs with major artifacts were not scored by the manual scorer (not classified) and consequently not included in the analysis.

The revised manuscript reads:

The fact that frontal channels have been identified as the ones with the lowest entropy (fast temporal scale) could be related to infant (1.5 - 6 months) sleep spindles, which are prominent over the fronto-central area [44]. Nevertheless, this result needs to be interpreted with caution. It is widely acknowledged that frontal EEG channels are impacted most by eye blinks and muscle artifacts. Despite using a robust entropy measure, we cannot exclude the possibility that some part of the effect may be driven by non-neural sources.

• More generally, the mechanistic interpretation of the observed entropy effects is difficult here, due to the potential nonlinear contributions to the measure. In contrast to the observed differences in entropy and limited modulation of rhythmic features, much of the discussion focusses on rhythmic signatures. This may relate at least in part to the way these binned values are calculated. A global power normalization is uncommon and produces effects that are very difficult to interpret. Due to the 1/f distribution of the power spectrum, even high frequency power will always be normalized to the predominant low-frequency power. This is readily observed also from the plots (see Figure 4, 1-3Hz has relative power around 90%, whereas beta power is in the range of 5%.) This is especially problematic, as the normalization is applied regardless of sleep stage, which presumably consists of different ratios of low-to-high-frequency content. Unequal distributions of sleep stages as observed in the present data may therefore create unequal baselines. It’s further not clear how this normalization would ‘facilitate comparison between session’ (l. 196 f.) as session differences could be expressed in different baselines. Concerns regarding normalization choices could be alleviated by presenting average PSD spectra for the individual sleep stages as has been done in Figure S6 for entropy.

Reply: We thank the reviewer for the thoughtful comment. As suggested, we present average PSD spectra instead of the normalized frequency bins. LINES 207-213.

Admittedly, the implemented corrections made the interpretation easier. We modified the discussion section accordingly. LINES 498-506.

The revised manuscript reads:

This change was observed not only at the fast temporal scale (mixture of low and high frequencies), but also at the slow temporal scale (slow frequencies only), suggesting that a specific bandwidth or temporal scale is involved. Indeed, we also observed a significant increase in EEG spectral power at 1-15 Hz (during NREM). The observed entropy decrease and corresponding power increase are likely linked to the emergence of sleep spindles taking place between week-2 and week-8 after birth [3]. The fact that frontal channels have been identified as the ones with the lowest entropy (fast temporal scale) could be related to infant (1.5 - 6 months) sleep spindles, which are prominent over the fronto-central area [4].

• Linked to the relevance of considering the entire spectrum rather than narrow bands/bins, preliminary evidence suggests that sleep stages may be differentiated by the slope of the arrhythmic 1/f spectrum (Lendner et al., 2019, bioRxiv). Notably, fast scale entropy is often directly related to 1/f slopes (Bruce et al., 2009, Waschke et al., 2017, Vakorin & McIntosh, 2012), suggesting that a similar link may also exist in the present data. [On a side note, this link of a single scale to a multiscale property such as the 1/f spectrum questions to some extent the notion that prior approaches used ‘temporally more unspecific entropy measures’ (l. 443).] High convergence between these measures would provide more information about the interpretation of fine-scale permutation entropy here.

Reply: We thank the reviewer for this comment and added a sentence to the discussion (LINES 541-543):

In relation to PSD data, we observed that high frequency spectral components correspond to higher entropy (when compared across sleep stages), which agrees with previous studies suggesting a link between fast scale entropy and PSD slope [20, 25]
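For readers who wish to probe this link in their own data, a minimal sketch of a simple 1/f-slope estimate is given below (our illustration; the Welch settings, the 2-40 Hz fitting range and the 125 Hz example sampling rate are assumptions, not choices taken from the study). The per-epoch slope could then be correlated with fast-scale permutation entropy.

```python
import numpy as np
from scipy.signal import welch

def spectral_slope(x, fs, fmin=2.0, fmax=40.0):
    """Least-squares slope of log10(power) vs log10(frequency),
    a crude estimate of the aperiodic 1/f exponent."""
    f, pxx = welch(x, fs=fs, nperseg=min(len(x), 2 * int(fs)))
    sel = (f >= fmin) & (f <= fmax)
    slope, _ = np.polyfit(np.log10(f[sel]), np.log10(pxx[sel]), 1)
    return float(slope)

# Example: slope of one 30 s epoch of simulated (hypothetical) data
rng = np.random.default_rng(1)
print(spectral_slope(rng.standard_normal(3750), fs=125.0))
```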

• 5. A big advantage of this dataset appears to be the longitudinal (vs. cross-sectional) nature of the design, which was surprisingly not overtly noted. On the negative side, there does not appear to be a habituation session. The session effect could thus at least in part also reflect a retest effect due to habituation effects, which should be noted.

Reply: We include additional information on data sample in the Participants and methods section (LINES 150-153):

Recording times were determined by the experimental protocol, which included nine 3-min or 5-min periods of alternating rest and auditory stimulation (with simple nursery rhymes). For the current study we disregard this experimental stimulation and merely focus on the changes in behavioral states over the full recording time.

Second, we put an additional clarification at the beginning of the discussion (LINES 471-477). The text reads:

Please note that the main focus of this study was the development of an automatic technique for staging newborn sleep as early as 2 to 5 weeks after birth. As babies at this age sleep a significant proportion of the time (irrespective of environmental stimulation such as noise), we also had the opportunity to study sleep and the associated brain dynamics in this early age group. However, please note that we do not claim that our newborn sleep data necessarily represent natural sleep at that early age, as auditory stimulation is ongoing half of the time in our study protocol.

MINOR POINTS

• -‘the baby’s brain signal complexity (and spectral power) revealed huge developmental changes in sleep in the first 5 weeks of life’. As no effect size measures were provided and effects visually are rather constrained, I would refrain from using the word ‘huge’ here. The same goes for statements such as ‘massive drop’ (l. 458).

Reply: We corrected as suggested. LINE 30 and LINE 523

• ‘a big practical advantage is that entropy-based features, such as permutation entropy, are typically more robust against common EEG artifacts as compared to spectral measures (Bandt et al., 2002)’. This statement and reference are questionable. The reference merely shows that nonlinear features can be robustly identified even in the presence of noise. But no comparison to spectral features is provided. Furthermore, strong noise would also impact permutation entropy, especially if it is strong enough to limit manual labeling as suggested by the authors.

Reply: We agree with the reviewer’s assessment that a direct comparison is not possible here. We corrected the reference and rephrased the sentence by emphasizing the robustness owed to the symbolic/ordinal nature of permutation entropy. LINES 100-104.

It read as follows:

In contrast to FFT-based measures, symbolic measures such as permutation entropy are operating on the order of values rather than on the absolute values of a time series. This has a big practical advantage if a signal is highly non-stationary and corrupted by noise [5], as is the case with the data of newborns. For instance, noise due to high electrode impedance is less likely to affect symbolic measures such as permutation entropy [6].
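A tiny toy example (ours, not from the manuscript) of what operating on the order of values means in practice: scaling the signal or adding an offset, as caused for instance by gain or impedance differences, leaves the ordinal patterns, and hence permutation entropy, unchanged.

```python
import numpy as np

def ordinal_patterns(x, order=3):
    """Ordinal (rank-order) pattern of every window of length `order`."""
    return [tuple(int(k) for k in np.argsort(x[i:i + order]))
            for i in range(len(x) - order + 1)]

x = np.array([2.0, 5.0, 3.0, 9.0, 1.0, 4.0])
y = 0.3 * x + 100.0   # amplitude scaling plus DC offset

print(ordinal_patterns(x) == ordinal_patterns(y))   # True: identical patterns
```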

• The section on Participants and EEG should add a description of the criteria for starting and stopping the approx. 30 min recordings were.

Reply: We corrected as suggested. LINES 149-153. The revised manuscript reads:

The signal was recorded continuously with a sampling rate of 500 Hz over 35 min (n=11) or 27 min (n=31). Recording times were determined by the experimental protocol, which included nine 3-min or 5-min periods of alternating rest and auditory stimulation with simple nursery rhymes. For the current study we disregard this experimental stimulation and merely focus on the changes in behavioral states over the full recording time.

• “Segments with artifacts were rejected based on simple power spectral density (PSD) thresholds”

No information is available regarding what these thresholds were.

Reply: We added information as suggested; L160-164. In the manuscript it read as follows:

Electrode (impedance check) artifacts characterized by a 20Hz component were deleted semi-automatically by first visually inspecting individual recordings in the time-frequency domain and next iterating over segments. Percentile thresholding was used to exclude bad segments which resulted in an exclusion of 4.5% of total segments.
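A hedged sketch of this kind of percentile-based rejection of segments dominated by a 20 Hz component is shown below (the band limits, the 95th-percentile cut-off and the Welch settings are our assumptions, not the values used in the study):

```python
import numpy as np
from scipy.signal import welch

def flag_20hz_segments(segments, fs, band=(19.0, 21.0), pct=95):
    """Return a boolean mask of segments whose mean power in `band`
    exceeds the `pct`-th percentile across all segments."""
    band_power = []
    for seg in segments:
        f, pxx = welch(seg, fs=fs, nperseg=min(len(seg), 2 * int(fs)))
        sel = (f >= band[0]) & (f <= band[1])
        band_power.append(pxx[sel].mean())
    band_power = np.asarray(band_power)
    return band_power > np.percentile(band_power, pct)

# `segments`: iterable of 1-D 30 s EEG epochs; `fs`: sampling rate in Hz
```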

• “such as Fast Fourier”. The full title of ‘Fast Fourier Transform’ would be necessary here.

Reply: Corrected as suggested. LINE 93.

• I found particular aspects of the statistical procedure questionable. First, all data are averaged across EEG channels and then 10 random samples were selected to equate epoch amounts. Here, some sort of random re-sampling (e.g. bootstrapping) should be considered. Regarding the averaging procedure, Figure S4 provides an interesting contrast of frontal and occipital channels. Such a contrast makes sense given that the decoding analysis suggests a stronger representation of sleep stage information at frontal channels. However, no statistics appear to be used, especially no statistics support the claim that the ‘difference between REM and WAKE at week-5 is more pronounced at the frontal channels’ (lines 690ff.). In Figure S1, inference on which stage decoding is improved between features appears to lack statistics.

Reply: We thank the reviewer again for the very constructive comments. We implemented random re-sampling as suggested. Its implementation (python function named mybootstraper) can be found in the two scripts: 4_run_plot_mspe.py and 7_run_plot_psd.py. Both statistics and figure have been re-computed using the new sampling scheme. With regards to channels, we have included locations as an additional factor in the statistical model (please also see our reply to the major point 2). The sentence ‘difference between REM and WAKE at week-5 is more pronounced at the frontal channels’, has been deleted as the 3-way interaction was not significant. Regarding Figure S1, we now provide additional statistics as suggested.

The revised manuscript reads:

LINES 232-235: To provide an equal number of observations for each subject and for each sleep stage a bootstrap sample of 10 MSPE values was repeatedly (1000 times) drawn and eventually averaged.

LINES 774-776: Note that overall entropy at the fast temporal scale is lower over frontal (see Fig. 3, main text) as compared to both central and occipital channels (left panels).
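A minimal sketch of the epoch-level bootstrap described in the quoted passage (hypothetical function and variable names; the authors' actual implementation is the mybootstraper function in the shared scripts 4_run_plot_mspe.py and 7_run_plot_psd.py):

```python
import numpy as np

def bootstrap_epoch_mean(values, sample_size=10, n_boot=1000, seed=0):
    """Repeatedly draw `sample_size` epochs with replacement and
    average the resampled means (equates observations per condition)."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    means = [rng.choice(values, size=sample_size, replace=True).mean()
             for _ in range(n_boot)]
    return float(np.mean(means))

# e.g. per-epoch fast-scale MSPE values of one subject in one sleep stage
epoch_mspe = np.array([0.81, 0.79, 0.83, 0.80, 0.78, 0.82])
print(bootstrap_epoch_mean(epoch_mspe))
```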

• Regarding the power results, effects are observed in the beta and delta band. These bands are infamous for muscle artifact contamination in EEG recordings and some strong outliers are present in the data. The possible influence of artifacts should at least be discussed, although it would be even better to supplement such discussion with power spectra of the data.

Reply: We thank the reviewer for the comment and as suggested included average power spectra (Fig. 4 for frontal and S5 Fig. for both central and occipital channels). Also, as described above, we discuss possible influence of frontal artifacts.

• The presentation of individual data points is appreciated. However, this reveals that some conditions include clear outliers (e.g. Slow entropy NREM; especially relevant as much of the discussion focusses on the potential relevance of delta oscillations). These should be controlled for in the statistical analysis.

Reply: We agree with the reviewer. We drop outliers (as identified by the interquartile range rule), repeat the statistical analysis and, if the inferences with and without outliers differ, report both results.

The revised manuscript reads:

LINES 242-245: Additionally, we carefully report any differences if the exclusion of statistical outliers (as identified by the interquartile range rule) changed results significantly.

LINES 349-352: Note that session x sleep stage interaction changed to a trend when excluding outliers (based on the interquartile rule) (Wald chi-square (2, 73) = 4.67, p=.09).
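For illustration, the interquartile range rule mentioned here can be implemented as follows (a generic sketch using the conventional factor of 1.5, which is an assumption on our part):

```python
import numpy as np

def iqr_inliers(x, k=1.5):
    """Boolean mask of values inside [Q1 - k*IQR, Q3 + k*IQR]."""
    x = np.asarray(x, dtype=float)
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    return (x >= q1 - k * iqr) & (x <= q3 + k * iqr)

values = np.array([0.62, 0.64, 0.61, 0.63, 0.95])   # 0.95 would be excluded
print(values[iqr_inliers(values)])
```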

• Figure 1D is supposed to schematically display that permutation entropy at different levels of coarse-graining can differentiate between different sleep stages. It is unclear how these three exemplary traces were chosen, what the error associated with them is etc. Even for a schematic example, this is misleading for inference purposes. Why not plot MSPE values in addition to power values as depicted in plot C? Or accumulate across all time points within sleep stages to get estimates with an indication of the associated error.

Reply: We thank the reviewer for this suggestion. As proposed, instead of presenting single epochs, we accumulate across time. Also color-coding of sleep classes was used for a better visibility.

• The bars in the upper plots of Figure 5 are missing labels.

Reply: We corrected as suggested; the revised manuscript reads (LINE 443):

95% confidence intervals (for both accuracy and F1 scores) are displayed on the basis of a bootstrap analysis.
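A generic percentile-bootstrap sketch for such interval estimates over epochs (illustrative only; not necessarily the authors' exact procedure):

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score

def bootstrap_ci(y_true, y_pred, metric, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval of a classification metric."""
    rng = np.random.default_rng(seed)
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    scores = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(y_true), len(y_true))  # resample epochs
        scores.append(metric(y_true[idx], y_pred[idx]))
    return np.percentile(scores, [100 * alpha / 2, 100 * (1 - alpha / 2)])

# y_true, y_pred: manual vs. predicted stage labels (any consistent coding)
# bootstrap_ci(y_true, y_pred, accuracy_score)
# bootstrap_ci(y_true, y_pred, lambda t, p: f1_score(t, p, average='macro'))
```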

• The provided Figures were of a noticeably low resolution. High-quality vector images would be more appealing for final publication.

Reply: We corrected as suggested.

• ‘This confirms that not all changes in EEG complexity are reflected by changes in power’ (l. 449 ff.). This is a very strong statement that cannot be backed up by the data. Differences in the power spectrum (e.g., 1/f slopes) other than the bins tested could also covary with age and sleep stage.

Reply: We agree. The discussion section has been reorganized accordingly, LINES 495-509. The aforementioned statement was deleted.

• ‘In our case however, this difference in signal-to-noise ratio has been counterbalanced by the sleep specific entropy decrease from week-2 to week-5, which manifests itself as decreased performance in classifying NREM (week-5 training, week-2 test).’ (l. 520 ff.). It is unclear to me why these two observations are claimed to be related? Fast scale entropy appears to decrease during sleep with age both for NREM and REM sleep. Why would this exclusively affect the classification of NREM?

Reply: We agree that the statement was imprecise as accuracy for REM decreases as well. The sentence has been changed and now reads (LINES 588-592):

In our case, however, this difference in signal-to-noise ratio has been counterbalanced by the sleep-specific entropy decrease from week-2 to week-5, which manifests itself as decreased performance in classifying both NREM (week-5 training, week-2 test) and REM (week-2 training, week-5 test), with a 24% and 14% accuracy drop, respectively.

• ‘This confirms that developmental changes’ (l. 524). Very strong, and in my view unsupported strength of the conclusion. Better go with ‘suggest’ etc.

Reply: Corrected as suggested. LINE 593

• ‘whereas stage wake may be (oscillatory-wise) already widely developed’ (l. 525 f.). To the extent that the power measures here can be interpreted as ‘oscillatory’, the results suggest that also the sleep stages are already widely developed as no pairwise session effects were observed for any frequency bin in any sleep stage.

Reply: We agree. As the sentence refers to ‘signal complexity’, mentioning oscillations was misleading. Corrected as suggested. Please note, however, that after the suggested change in the PSD analysis, we observe a session effect for NREM. LINE 594

• The current manuscript still contains grammatical errors. Please consider carefully assessing the manuscript again and correcting those in a revision.

omight render rather low reproducibility (89); cf. yield

Reply: Corrected as suggested. LINE 89

• oduring quite sleep (116); cf. quiet

Reply: Corrected as suggested. LINE 113 and LINE 121

• The classifier is later called as MSPE-based (189).

• we restrict our (260); cf. restricted

Reply: Corrected as suggested. LINE 280

• at fast scale (311); cf. a fast scale / fast scales

Reply: Corrected as suggested. LINE 351

Attachment

Submitted filename: Response_to_Reviewers.docx

Decision Letter 1

Daniele Marinazzo

25 Sep 2019

PONE-D-19-18265R1

On the development of sleep states in the first weeks of life

PLOS ONE

Dear Dr. Schabus,

Thank you for submitting your revised manuscript to PLOS ONE.

The way you addressed the issues raised upon the first submission was very much appreciated. A few points, detailed in the reviews below, remain to be clarified before we can accept this paper in PLOS ONE.

We would appreciate receiving your revised manuscript by Nov 09 2019 11:59PM. When you are ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter.

To enhance the reproducibility of your results, we recommend that if applicable you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). This letter should be uploaded as separate file and labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. This file should be uploaded as separate file and labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. This file should be uploaded as separate file and labeled 'Manuscript'.

Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.

We look forward to receiving your revised manuscript.

Kind regards,

Daniele Marinazzo

Academic Editor

PLOS ONE


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: All comments have been addressed

Reviewer #2: (No Response)

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Partly

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: I have not further comments. All my previous comments have been addressed. The manuscript is technically sound and clearly presented.

Reviewer #2: Many thanks to the authors for the revision of the manuscript. The changes addressed many of my previous concerns and the current manuscript presents the data and their limitations more clearly. In addition, the available data documentation appears clear, although I did not run any of the scripts. There are a few points that I would still like to see addressed, including some changes introduced in the current revision. With such minor revisions, I would recommend publication in PLOS ONE.

- A more thorough discussion of decoding sleep stages from physiological channels is still missing. While the potential influence of physiological noise on the interpretation of frontal EEG recordings has been adequately noted in multiple places, the decoding results warrant a specific discussion. I recommend noting that development of sleep patterns is not only associated with neural, but also with physiological changes (e.g., eye movement output) and that these features may be crucial markers for the manually labeled sleep stages. With the provided data, readers should not infer that neural differences alone are driving the classification results, and this should be made explicit.

- “On average, 5-week-old newborns spend a higher percentage of total time (19%) in NREM sleep as compared with 2-week-old babies (11.5%). […] A significantly larger proportion of participants showed NREM during week-5 (66%) as compared to week-2 (35%) […] Using paired samples (i.e., including only those subjects that actually showed a given sleep state in both sessions), we found no significant differences in the median duration of classes from week-2 to week-5”. Were pair-wise statistics performed on the initial (overall) data? If these differences are significant, wouldn’t the inference be that cross-sectionally, NREM sleep increases, but not longitudinally within person? If so, this should be noted as much of the discussion ascribes the observed neural effects to an increase in NREM sleep and an associated increase in slow waves.

- Thank you for adding information on when the recordings were taken. I am still a bit unclear about the exact recording setup. Were mothers asked to perform nursery rhymes as acoustic stimulation every approx. 5 min? More information on this would be appreciated. The fact that the data were not recorded during continuous night-time sleep should also be repeated in the Introduction and Discussion section to avoid misunderstandings.

- Figure S9 is unclear to me. Why are the PSD curves repeated for each bin when they are identical? I am also not sure what the Figure is telling me about the presented bins as no statistical comparison is made here. This is complicated by the fact that value ranges are largely unreadable.

- The discussion of sleep spindle development and anticorrelated entropy is much clearer now. Given this discussion, it would be interesting to know whether spectral differences between sessions were inter-individually anticorrelated with differences in entropy, i.e., whether spectral features relate to observed entropy effects.

- 163: ‘Percentile thresholding’. 95% percentile?

- 224: ‘individual amount’. I’d recommend ‘inter-individual differences’.

- 286: ‘paired-samples’. Specify that these are ‘paired-samples t-tests’.

- 347: “Note that only NREM shows differences in the PSD spectra between age groups and the developing 9–14 Hz peak exclusively observed at week-5 of age.” Differences in the alpha peak are hard to see and are more obvious at occipital channels (Figure S5). Log-log scaling together with error bars may aid visualization.

- 359: “reveled” vs. ‘revealed’

- 376: “We found an increase in oscillatory activity from week-2 to week-5”. I would recommend changing this to ‘spectral power’, as this does not appear to be a narrowband, i.e. oscillatory, effect. Also, the addition of error bars to Figure 4 would be beneficial.

- 471: Starting the discussion with “Please note that” appears unnecessarily defensive and should be dropped.

- 745: “Given the rapid cap placement and movement of babies at this early age we would refrain from over-interpreting these results.” This is a puzzling statement to me, at least in its current phrasing. Are the authors concerned that cap positioning strongly varied across time points and babies? Then all of the other results in the manuscript would be equally questionable as the inferences assume that the selected channels are in comparable locations.

**********

7. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files to be viewed.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2019 Oct 29;14(10):e0224521. doi: 10.1371/journal.pone.0224521.r004

Author response to Decision Letter 1


14 Oct 2019

1. A more thorough discussion of decoding sleep stages from physiological channels is still missing. While the potential influence of physiological noise on the interpretation of frontal EEG recordings has been adequately noted in multiple places, the decoding results warrant a specific discussion. I recommend noting that the development of sleep patterns is not only associated with neural, but also with physiological changes (e.g., eye movement output) and that these features may be crucial markers for the manually labeled sleep stages. Based on the provided data, readers should not infer that neural differences alone are driving the classification results, and this should be made explicit.

Reply: We agree with the reviewer and thank them for the suggestion. The revised manuscript (LINE 555-558) now reads:

“Finally, yet importantly, we found that horizontal EOGs, frontal brain channels as well as ECG contribute the most to the classification. This suggests that the development of sleep patterns is not only associated with neural, but also with physiological changes (e.g., eye movements) and that these features may be crucial markers for the manually labeled sleep stages.”

2. “On average 5-weeks old newborns spend a higher percentage of total time (19%) in NREM sleep as compared with 2-weeks old babies (11.5%). […] A significantly larger proportion of participants showed NREM during week-5 (66%) as compared to week-2 (35%) […] Using paired-samples (by including only those subjects that actually show given sleep-state in both sessions), we found no significant effects differences in the median duration of classes from week-2 to week-5”. Were pair-wise statistics performed on the initial (overall) data? If these differences are significant, wouldn’t the inference be that cross-sectionally, NREM sleep increases, but not longitudinally within person? If so, this should be noted as much of the discussion ascribes the observed neural effects to an increase in NREM sleep and an associated increase in slow waves.

Reply: To compare the time spent in a given sleep stage across sessions, we used ‘standard’ pair-wise statistics, which limit the sample to only those subjects that actually show the given sleep stage in both sessions (and in consequence reduce sensitivity/statistical power). In contrast, for testing entropy changes we used more sophisticated mixed models, which better preserve statistical power in the presence of missing data. We agree, however, that an additional comment would be useful here and have added the following sentence in the Discussion (LINE 454-457):

“However, it is worth mentioning that when testing longitudinally within subjects, we found no significant changes in the percentage of NREM. This was likely due to limited sensitivity in this measure, magnified by the small number of available subjects showing NREM during week 2.”
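As a concrete illustration of the two approaches contrasted in the reply above, the following sketch compares a complete-case paired test with a linear mixed model that retains subjects with data from only one session. This is not the authors' analysis code; the variable names and dummy data are hypothetical.

```python
# Minimal sketch (not the authors' code): a complete-case paired test versus a
# mixed model that keeps partially observed subjects. All names/data are dummy.
import numpy as np
import pandas as pd
from scipy.stats import wilcoxon
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
df = pd.DataFrame({"subject": np.repeat(np.arange(20), 2),
                   "session": np.tile(["week2", "week5"], 20),
                   "entropy": rng.normal(0.8, 0.05, size=40)})
df.loc[rng.choice(df.index, 8, replace=False), "entropy"] = np.nan  # missing states

# Paired Wilcoxon test: only subjects with valid values in BOTH sessions survive
wide = df.pivot(index="subject", columns="session", values="entropy").dropna()
stat, p_paired = wilcoxon(wide["week2"], wide["week5"])

# Linear mixed model: uses every available observation, subject as random intercept
result = smf.mixedlm("entropy ~ session", data=df.dropna(), groups="subject").fit()
print(p_paired)
print(result.pvalues)
```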

3. Thank you for adding information on when the recordings were taken. I am still a bit unclear about the exact recording setup. Were mothers asked to perform nursery rhymes as acoustic stimulation every approx. 5 min? More information on this would be appreciated. The fact that the data were not recorded during continuous night-time sleep should also be repeated in the Introduction and Discussion section to avoid misunderstandings.

Reply: The presented auditory stimuli were pre-recorded, so mothers were not performing any task during the PSG recording of their babies. The 3-min or 5-min periods mentioned in the text were achieved by repetitive presentation of a nursery rhyme lasting approx. 41 s each. We have modified the Methods section (LINE 148). An additional clarification has also been added in the Introduction (LINE 123-125):

“It has to be mentioned that the reported data were not recorded during a continuous night-time sleep period and thus may differ from natural sleep in newborns.”

With regard to the Discussion, we believe that its first paragraph already provides sufficient clarification in this respect; thus, no further changes were implemented. It reads as follows:

“The main focus of this study was the development of an automatic sleep staging technique in order to stage newborn sleep as early as 2 to 5 weeks after birth. As babies at this age sleep a significant proportion of the time (irrespective of environmental stimulation such as noise), we also had the opportunity to study sleep and associated brain dynamics in this early age group. However, please note that we do not claim that our newborn sleep data necessarily represent natural sleep at that early age, as auditory stimulation is ongoing half of the time in our study protocol.”

4. Figure S9 is unclear to me. Why are the PSD curves repeated for each bin when they are identical? I am also not sure what the Figure is telling me about the presented bins as no statistical comparison is made here. This is complicated by the fact that value ranges are largely unreadable.

Reply: We agree that Figure S9, presenting topo-plots and repeated PSD curves, was largely redundant. We decided to replace it with a new Figure S9 that shows correlations between sigma power and entropy values (please see the response to the point below).

5. The discussion of sleep spindle development and anticorrelated entropy is much clearer now. Given this discussion, it would be interesting to know whether spectral differences between sessions were inter-individually anticorrelated with differences in entropy, i.e., whether spectral features relate to observed entropy effects.

Reply: We thank the reviewer for this comment. As suggested, we now show correlations between spectral features and entropy at the fast temporal scale (samples from the two sessions were merged). The suggested analysis for relative values (differences between sessions) was inconclusive due to the small number of available subjects showing NREM during week 2 (LINE 761-764).

6. ‘Percentile thresholding’. 95th percentile?

Reply: Yes, that is correct; we have specified this (LINE 158).
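For readers unfamiliar with percentile thresholding, the sketch below shows one common form of it: rejecting epochs whose peak-to-peak amplitude exceeds the 95th percentile. It is only a minimal illustration with dummy data and assumed details, not the pipeline used in the manuscript.

```python
# Minimal sketch (assumed details): flag epochs whose peak-to-peak amplitude
# exceeds the 95th percentile computed across all epochs.
import numpy as np

def percentile_reject(epochs, q=95):
    """epochs: array of shape (n_epochs, n_samples); returns a boolean keep-mask."""
    ptp = epochs.max(axis=1) - epochs.min(axis=1)   # peak-to-peak per epoch
    threshold = np.percentile(ptp, q)
    return ptp <= threshold

rng = np.random.default_rng(0)
demo = rng.normal(size=(200, 750))                  # 200 epochs of dummy EEG
keep = percentile_reject(demo)
print(f"kept {keep.sum()} of {len(keep)} epochs")
```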

7. ‘individual amount’. I’d recommend ‘inter-individual differences’.

Reply: We thank the reviewer for this comment and corrected as suggested (LINE 226).

8. ‘paired-samples’. Specify that those are ‘paired-samples t-tests’.

Reply: We specified this as suggested: ‘Using paired-samples Wilcoxon test’ (LINE 288).

9. “Note that only NREM shows differences in the PSD spectra between age groups and the developing 9–14 Hz peak exclusively observed at week-5 of age.” Differences in the alpha peak are hard to see and are more obvious at occipital channels (Figure S5). Log-log scaling, together with error bars, may aid visualization.

Reply: We thank the reviewer for this suggestion. We modified both Fig 4 and S5 Fig by adding log-log scaling.

10. “reveled” vs. ‘revealed’

Reply: We corrected this as suggested (LINE 329).

11. “We found an increase in oscillatory activity from week-2 to week-5”. I would recommend changing this to spectral power, as this does not appear to be a narrowband, i.e., oscillatory, effect. Also, the addition of error bars to Figure 4 would be beneficial.

Reply: We corrected this as suggested (LINE 343). We also added the standard error of the mean to Figure 4.
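A minimal sketch of the kind of figure the two points above refer to: subject-averaged Welch PSDs drawn on log-log axes with a shaded ±SEM band. Dummy data and an assumed sampling rate are used; this is not the authors' plotting code.

```python
# Minimal sketch (illustrative): subject-averaged Welch PSDs on log-log axes
# with a +/- SEM shading, as suggested for Fig 4 / S5 Fig.
import numpy as np
import matplotlib.pyplot as plt
from scipy.signal import welch

fs = 250                                            # assumed sampling rate
rng = np.random.default_rng(1)
eeg = rng.normal(size=(20, 30 * fs))                # 20 "subjects", 30 s each (dummy)

freqs, psd = welch(eeg, fs=fs, nperseg=4 * fs)      # one PSD per subject
mean_psd = psd.mean(axis=0)
sem_psd = psd.std(axis=0, ddof=1) / np.sqrt(psd.shape[0])

plt.loglog(freqs, mean_psd, label="mean PSD")
plt.fill_between(freqs, mean_psd - sem_psd, mean_psd + sem_psd, alpha=0.3)
plt.xlabel("Frequency (Hz)"); plt.ylabel("Power (a.u.)"); plt.legend()
plt.show()
```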

12. Starting the discussion with “Please note that” appears unnecessarily defensive and should be dropped.

Reply: We thank the reviewer for this comment and corrected as suggested (LINE 417).

13. “Given the rapid cap placement and movement of babies at this early age we would refrain from over-interpreting these results.” This is a puzzling statement to me, at least in its current phrasing. Are the authors concerned that cap positioning strongly varied across time points and babies? Then all of the other results in the manuscript would be equally questionable as the inferences assume that the selected channels are in comparable locations.

Reply: We agree that the wording was confusing. There was no variation in cap positioning. For the reasons mentioned above, we decided not to include the previous version of Figure S9 in the already lengthy supplementary materials.

Attachment

Submitted filename: Response_to_Reviewers.docx

Decision Letter 2

Daniele Marinazzo

16 Oct 2019

On the development of sleep states in the first weeks of life

PONE-D-19-18265R2

Dear Dr. Schabus,

We are pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it complies with all outstanding technical requirements.

Within one week, you will receive an e-mail containing information on the amendments required prior to publication. When all required modifications have been addressed, you will receive a formal acceptance letter and your manuscript will proceed to our production department and be scheduled for publication.

Shortly after the formal acceptance letter is sent, an invoice for payment will follow. To ensure an efficient production and billing process, please log into Editorial Manager at https://www.editorialmanager.com/pone/, click the "Update My Information" link at the top of the page, and update your user information. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to enable them to help maximize its impact. If they will be preparing press materials for this manuscript, you must inform our press team as soon as possible and no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

With kind regards,

Daniele Marinazzo

Academic Editor

PLOS ONE

Additional Editor Comments (optional):

Reviewers' comments:

Acceptance letter

Daniele Marinazzo

21 Oct 2019

PONE-D-19-18265R2

On the development of sleep states in the first weeks of life

Dear Dr. Schabus:

I am pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please notify them about your upcoming paper at this point, to enable them to help maximize its impact. If they will be preparing press materials for this manuscript, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

For any other questions or concerns, please email plosone@plos.org.

Thank you for submitting your work to PLOS ONE.

With kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Daniele Marinazzo

Academic Editor

PLOS ONE

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    S1 Fig. Influence of the feature extraction procedure on the classification performance.

    Multiscale permutation entropy, as compared to PSD, boosts discrimination of sleep stages. MSPE improves classification especially of the WAKE class, also of REM, and slightly of the NREM class (note the diagonals of the lower panels).

    (TIF)
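    As an illustration of the multi-scale permutation entropy feature referenced above, the sketch below coarse-grains the signal and computes the normalized entropy of ordinal patterns. The embedding order and scales are assumptions chosen for illustration, not necessarily the settings used in the study.

```python
# Minimal sketch of multi-scale permutation entropy (MSPE); order and scales
# are illustrative assumptions, and the input below is dummy data.
import numpy as np
from math import factorial

def coarse_grain(x, scale):
    # Average consecutive non-overlapping windows of length `scale`
    n = len(x) // scale
    return x[:n * scale].reshape(n, scale).mean(axis=1)

def permutation_entropy(x, order=3, normalize=True):
    # Count ordinal patterns of `order` consecutive samples, then Shannon entropy
    windows = np.lib.stride_tricks.sliding_window_view(x, order)
    patterns = np.argsort(windows, axis=1)
    codes = (patterns * order ** np.arange(order)).sum(axis=1)
    _, counts = np.unique(codes, return_counts=True)
    p = counts / counts.sum()
    h = -(p * np.log2(p)).sum()
    return h / np.log2(factorial(order)) if normalize else h

def mspe(x, order=3, scales=(1, 5)):
    return [permutation_entropy(coarse_grain(x, s), order) for s in scales]

print(mspe(np.random.default_rng(2).normal(size=7500)))
```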

    S2 Fig. Cross-validation schemes.

    Splitting into training (in blue) and testing (in green) sets was performed within sessions (upper row) or across sessions (lower row). Note that half of the subjects are used to train and half to test (two-fold cross-validation), with both sessions of a single subject (week 2 and 5) always being in separate sets.

    (TIF)

    S3 Fig. Confusion matrices summarizing the classification results for week-2 and week-5 (MSPE-based classifier).

    Note the off-diagonals showing the limited proportion of NREM falsely classified as WAKE (on average 3%) and, similarly, of WAKE falsely classified as NREM (on average 8%).

    (TIF)
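    A minimal sketch of how a row-normalized confusion matrix like the one in S3 Fig can be computed; the labels and predictions below are dummy values.

```python
# Minimal sketch (dummy labels): a row-normalized confusion matrix, so the
# off-diagonal cells can be read directly as percentages.
import numpy as np
from sklearn.metrics import confusion_matrix

classes = ["WAKE", "REM", "NREM"]
y_true = np.array(["WAKE", "REM", "NREM", "NREM", "WAKE", "REM"])
y_pred = np.array(["WAKE", "REM", "WAKE", "NREM", "WAKE", "NREM"])

cm = confusion_matrix(y_true, y_pred, labels=classes, normalize="true")
print(np.round(100 * cm, 1))                        # rows: true class, in percent
```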

    S4 Fig. Comparison of MSPE at fast time scale between sleep stages and the two recording sessions (week-2 vs week-5 data)—Central and occipital channels.

    Note that overall entropy at the fast temporal scale is lower over frontal channels (see Fig 3, main text) as compared to both central and occipital channels (left panels).

    (TIF)

    S5 Fig. Average log-log-scale PSD spectra for the individual sleep stages for central and occipital electrodes.

    The shaded area indicates statistical differences between week-2 and week-5. Note that, similarly to the frontal channels (main text), there is a clear difference in PSD also for central channels during NREM.

    (TIF)

    S6 Fig. Comparison of multi-scale permutation entropy values across different scales between sleep stages and sessions.

    Points represent the averages and error bars show the 95% bootstrap confidence intervals. Note that results for scale = 1 and scale = 5 (X-axis) correspond to the results presented in the main text. Maximal entropy in NREM and WAKE (also REM) is attained at different temporal scales (X-axis). Also, in line with the main text, there is no statistical difference in WAKE between sessions (week-2 vs week-5) across all temporal scales.

    (TIF)
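    A minimal sketch of a percentile bootstrap 95% confidence interval of the mean, of the kind that could underlie the error bars in S6 Fig; the resampling details and dummy data are assumptions.

```python
# Minimal sketch (assumed details): percentile bootstrap CI of the mean.
import numpy as np

def bootstrap_ci(values, n_boot=10000, alpha=0.05, seed=0):
    rng = np.random.default_rng(seed)
    boots = rng.choice(values, size=(n_boot, len(values)), replace=True).mean(axis=1)
    return np.percentile(boots, [100 * alpha / 2, 100 * (1 - alpha / 2)])

entropy_values = np.random.default_rng(3).normal(0.8, 0.05, size=40)  # dummy data
print(bootstrap_ci(entropy_values))
```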

    S7 Fig. Exemplary single-subject classification (week-5).

    Note the limited proportion of NREM epochs being falsely classified as WAKE and similarly WAKE classified as NREM.

    (TIF)

    S8 Fig. Channel importance and decision boundaries extracted from a trained random forest classifier.

    Horizontal EOGs, frontal channels as well as ECG contribute most to the sleep classification (upper panel). For visualization purposes, multidimensional scaling was used to reduce the dimensionality of the MSPE data to two (lower panel, X and Y axes). Points represent epochs (N = 100 for each class), colors (red, green and blue) represent true class labels, and shading (pink, light blue and light grey) shows the decision boundary. Note that in week-5 (right panel) there is a more apparent overlap between the true class labels (points) and the predictions (shading) as compared to week-2, which agrees with the higher classification accuracies for week-5.

    (TIF)
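    A minimal sketch of the two ingredients of S8 Fig: feature (channel) importances from a random forest and a 2-D multidimensional scaling embedding for visualization. Dummy data are used, and the hyperparameters are illustrative assumptions.

```python
# Minimal sketch (dummy data): random forest channel importances plus a 2-D
# MDS embedding for visualizing class structure.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.manifold import MDS

rng = np.random.default_rng(4)
X = rng.normal(size=(300, 10))                      # epochs x MSPE features/channels
y = rng.integers(0, 3, size=300)                    # 0=WAKE, 1=REM, 2=NREM (dummy)

clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
print(clf.feature_importances_)                     # per-feature contribution

embedding = MDS(n_components=2, random_state=0).fit_transform(X)  # for scatter plot
print(embedding.shape)
```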

    S9 Fig. Correlations between spectral features and entropy at fast temporal scale (both sessions were merged).

    Spectral features correspond to the average power values within three frequency ranges (rows). Solid red lines indicate significant results. Note the negative correlation between entropy and delta-theta band power during NREM.

    (TIF)
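    A minimal sketch of correlating average band power with fast-scale entropy across epochs, as summarized in S9 Fig; the band limits, sampling rate and data are illustrative assumptions.

```python
# Minimal sketch (assumed band and dummy data): Spearman correlation between
# average band power and fast-scale entropy across epochs.
import numpy as np
from scipy.signal import welch
from scipy.stats import spearmanr

fs = 250                                            # assumed sampling rate
rng = np.random.default_rng(5)
epochs = rng.normal(size=(100, 30 * fs))            # dummy 30-s EEG epochs
entropy = rng.uniform(0.6, 1.0, size=100)           # dummy fast-scale entropy values

freqs, psd = welch(epochs, fs=fs, nperseg=4 * fs)
delta_theta = psd[:, (freqs >= 1) & (freqs <= 8)].mean(axis=1)   # example band

rho, p = spearmanr(delta_theta, entropy)
print(f"rho = {rho:.2f}, p = {p:.3f}")
```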

    Attachment

    Submitted filename: Response_to_Reviewers.docx

    Attachment

    Submitted filename: Response_to_Reviewers.docx

    Data Availability Statement

    Data are available at a public repository: https://figshare.com/articles/On_the_development_of_sleep_states_in_the_first_weeks_of_life_-_REVISED/9767567.

