Summary
Study of emerging sleep–wake patterns in neonates is important for promptly identifying and treating abnormal sleep behaviours to ensure healthy infant development and neurobehavioral outcomes. Current methods to assess sleep are costly, labour intensive, and particularly difficult to implement in fragile, hospitalised infants requiring intensive medical care. The aim of the present study was to assess the validity of actigraphy as a tool for detecting sleep in preterm infants, using polysomnography (PSG) as the “gold standard”. A total of 10 neonates (mean [SD] 35.8 [1.2] weeks post-menstrual age; five female) hospitalised since birth for prematurity each participated in one 8–10 hr session during which PSG and actigraphy were recorded simultaneously. Inter-feed minute-by-minute PSG Sleep–Wake scores were compared to concurrent actigraph epochs categorised as either “Sleep” or “Wake” using three separate movement-per-minute thresholds (≤20, ≤40, ≤80). Tool validity was assessed using five metrics. A key finding was that for each of the movement thresholds there was high agreement rate, sensitivity, and predictive value of sleep (85.2%–97.2%), whereas specificity and predictive value of wake remained low (12%–46%). Receiver operating characteristic curve analysis also revealed low discriminatory power of actigraphy for estimating sleep (area under the curve = 0.636; Youden’s Index J = 0.2173). Lack of sufficient minutes of autonomous wake periods among infants was identified as a key limitation in actigraphy. Findings from the present study suggest actigraphy cannot be validated for Sleep/Wake discrimination in preterm infants and that proper validation requires sufficient data from periods of both Sleep and Wake.
Keywords: actigraphy, newborns, polysomnography, preterm infant, receiver operating characteristic, sleep-wake patterns
1 |. INTRODUCTION
Sleep is critical for the healthy development of newborns (Holditch-Davis et al., 2004; Mirmiran et al., 2003). It plays a key role in the physical maturation of the central nervous system (CNS), including brain development and synaptogenesis, and significantly affects neurodevelopment and long-term behavioural and cognitive outcomes (Gertner et al., 2002; Park, 2020; Tham et al., 2017). Typically, full-term newborns sleep upwards of 70% of the day (Anders et al., 1985; Mirmiran et al., 2003), whereas premature infants sleep ~90% of the day (Ardura et al., 1995). The amount of sleep and patterns of sleep–wake states, which reflect CNS maturation, impact infant development and neurobehavioural outcomes (Borghese et al., 1995; Gertner et al., 2002; Graven, 2006; Mirmiran et al., 2003). Study of emerging sleep patterns in premature infants is important in order to promptly identify abnormal sleep behaviours and treat them as early as possible (Anders et al., 1985).
Sleep disturbances among hospitalised preterm infants are often exacerbated by the environment of the neonatal intensive care unit (NICU), including round-the-clock medical interventions, frequent nursing cares, loud noises from bedside medical-device alarms, and bright light needed to implement medical cares, all of which fragment sleep and disrupt development of integrated and coordinated sleep patterns (Barbeau & Weiss, 2017; Levy et al., 2017; Mirmiran et al., 2003). While steps are taken to mitigate environmental factors that disrupt infant sleep and development in the NICU, it remains unclear how best to monitor and measure sleep in vulnerable, fragile premature infants (Ednick et al., 2009). Current methods used to delineate sleep states such as polysomnography (PSG) and behavioural observation are costly, labour intensive, and require complex setups, skilled technologists to conduct the studies, and trained scorers to code the records (Gertner et al., 2002; Sadeh et al., 1995). Such techniques are especially difficult to implement in tiny infants, particularly those who are sick or irritable and receiving round-the-clock care in the NICU. A more suitable tool for sleep characterisation in premature infants that does not rely on sleep-study personnel at the bedside, is minimally invasive, and is capable of uninterrupted recording over several days or weeks without imposing on bedside cares could prove extremely useful in sleep monitoring and timely identification of disrupted sleep states among preterm infants.
Actigraphy, which employs objective, small-scale movement detection algorithms to assess activity levels, has been used to delineate Sleep and Wake states in adults, children, and infants (Cole et al., 1992; Sadeh et al., 1995; So et al., 2005). The actigraph is a lightweight, wearable sensor with a built-in accelerometer that can record frequency of movement (typically in contiguous 1-min intervals) over several weeks using solid-state drive memory that allows for fast, high-volume data storage, making it ideal for long-term study without observer-based analysis. The actigraph is typically placed on the subject’s ankle or wrist and measures acceleration along multiple axes, converting the magnitude and frequency of the acceleration into a time-stamped gross movement count (John & Freedson, 2012). A set movement rate threshold is then used to differentiate between Sleep and Wake (So et al., 2005; Sung et al., 2009; Yang et al., 2014).
Several studies have been conducted to validate actigraphy as a method for measuring sleep, but studies of actigraphy in premature infants are lacking and agreement on tool validity for use in neonates is mixed. This is due in part to different sleep assessment methods used to compare actigraphy, various thresholds of movement activity to categorise Sleep and Wake, and metrics employed to validate actigraphy and interpretation of these metrics (Ancoli-Israel et al., 2003; Rioualen et al., 2015; Sadeh et al., 1995; So et al., 2005; Sung et al., 2009). The present study tested actigraphy using PSG determination of sleep state (Anders et al., 1971) in a cohort of premature infants. The two primary aims were to: (1) Assess actigraphy at three movement activity thresholds for defining Sleep (≤20, ≤40, and ≤80 movements/min [mpm]) using five metrics for validation: agreement rate (AR), sensitivity, specificity, predictive value of sleep (PVS), and predictive value of wake (PVW); and (2) Explore methods for optimising actigraphy analysis for assessing sleep in neonates.
2 |. METHODS
2.1 |. Subjects and setting
This study used data from 10 premature infants who were part of a larger, prospective clinical trial (NCT03881553) in which PSG and actigraphy were utilised to evaluate the effectiveness of an interventional device for reducing noise levels in hospitalised infants; the present study did not assess the effect of the intervention. Infants participated in a single-session study in the NICU at UMass Memorial HealthCare. The trial was approved by the University of Massachusetts Medical School Institutional Review Board. Written informed consent was obtained from the biological mother of each infant.
Data used were from a cohort of 10 infants (five female) delivered between 24 and 37-week gestational age (mean [SD] 34.1 [1.34] weeks) and treated in the NICU for prematurity. Participation required that infants be ≥30 weeks post-menstrual age at the time of study (mean [SD] 35.83 [1.21] weeks), on gavage (seven infants) or oral (three infants)
2.2 |. Procedures and measurements
Infants were studied between approximately 6:00 a.m. and 6:00 p.m. The session was separated into three ‘inter-feed intervals’ constituting the time between feeding periods and nursing cares (~2.5–3.5 hr intervals) during which the infant was studied with or without the study-intervention device. All infants received the same interventional device pattern across the three inter-feed intervals (respectively, intervention OFF-ON-OFF). The device did not inhibit recording of PSG or actigraphy.
Infants wore an actigraph sensor (Actiwatch II, Philips Respironics, Bend, OR, USA) around their lower leg, between their ankle and knee, using a foam bracelet (Posey Co, Arcadia, CA, USA) throughout the study session. The sensor employed a solid state piezo-electric accelerometer with a sample rate of 32 Hz and 2 Mbits non-volatile memory. Movement activity was recorded in 1-min epochs (Actiware Software, Philips Respironics) and stored for offline analysis. The actigraph time, set prior to placement on the infant, was synchronised with the PSG recording device using a common mobile phone (cellphone).
The PSG data for sleep staging were collected with disposable surface sensors using standard recording placements for neonates (Anders et al., 1971). Electroencephalography (EEG) measured brain activity at the scalp (FZ, CZ, PZ, C3, C4, P3, P4, O1, O2). Eye movements (electrooculographic [EOG]) were measured from the upper and lower canthus of opposite eyes. Muscle tone was measured from two sensors placed under the chin (electromyographic [EMG]). A three-lead configuration over the chest was used to measure electrocardiographic (ECG) data. Respiratory inductance plethysmography (RIP) was used to measure abdominal respiratory movements. A pulse oximeter (Masimo, Irvine, CA, USA) attached to the infant’s foot recorded transcutaneous arterial blood oxygen levels; the quality of the plethysmographic signal was used to identify periods of movement (Zuzarte et al., 2017). The digitised EEG, EMG, and EOG were sampled at 200 Hz, ECG at 2,000 Hz, RIP at 50 Hz, and pulse-oximeter at 10 Hz. Signals were recorded continuously throughout the study session along with synchronised time-stamped comments regarding feeds, nursing cares, and interventions, which were all stored on hard disk for offline analysis (Embla N7000; Embla Systems, Inc., Broomfield, CO, USA).
2.3 |. Analysis
Data were combined across the three inter-feed intervals, regardless of intervention. Periods of feeding and nursing care were excluded from analysis as this external manipulation of the infant is known to artificially increase the actigraph movement counts (Tsai & Thomas, 2010) and create artefact in the physiological signals (Bloch-Salisbury et al., 2009). Additional nursing and technical interventions during the inter-feed periods noted during the study session were also removed from the analyses.
For each infant, the time-stamped minute-by-minute movement activity data were exported for the entire study session using proprietary software (Actigraph Actiware 6, Philips Respironics). Epochs of actigraph data with movement activity ≥400 mpm were considered artefactual and excluded from analyses. Sleep state was then classified by movement rate within each 1-min actigraphy epoch. Epochs with movement activity lower than or equal to a given threshold were defined as “Sleep” (i.e. Thresh20: ≤20 mpm; Thresh40: ≤40 mpm; Thresh80: ≤80 mpm) and epochs with movement activity higher than the threshold were defined as “Wake” (i.e., respectively, Thresh20: >20 to ≤400 mpm; Thresh40: >40 to ≤400 mpm; Thresh80: >80 to ≤400 mpm).
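The thresholding rule described above can be sketched in a few lines of Python. This is only an illustration, not the Actiware implementation: the function name, the constant name, and the example counts are hypothetical, and the sketch follows the stated exclusion rule (≥400 mpm treated as artefact).

```python
from typing import Optional

ARTEFACT_MPM = 400  # assumption: counts at or above this value are dropped as artefact


def classify_epoch(count_mpm: int, threshold_mpm: int) -> Optional[str]:
    """Label one 1-min actigraphy epoch as "Sleep", "Wake", or None (artefact)."""
    if count_mpm >= ARTEFACT_MPM:
        return None  # excluded from analysis
    return "Sleep" if count_mpm <= threshold_mpm else "Wake"


# Hypothetical movement counts (movements per minute) for seven epochs.
counts = [0, 15, 20, 21, 60, 95, 410]
for name, threshold in (("Thresh20", 20), ("Thresh40", 40), ("Thresh80", 80)):
    print(name, [classify_epoch(c, threshold) for c in counts])
```

Raising the threshold can only move epochs from "Wake" to "Sleep", which is why sensitivity rises and specificity falls from Thresh20 to Thresh80.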
Sleep staging based on scoring rules for neonates (Anders et al., 1971) was performed by a trained sleep and EEG technologist (DC) from the multi-channel PSG activity, technical notes depicting eyes open or closed, and offline automated analyses of the plethysmographic signal that identified movement periods (Zuzarte et al., 2017); the PSG scorer did not use video in order to be masked to the intervention device for the comprehensive clinical trial. Contiguous 30-s epochs were marked as Quiet Sleep, Active Sleep, Indeterminate Sleep, and Wake for the entire study session (Anders et al., 1971; Curzi-Dascalova & Mirmiran, 1996). The Quiet Sleep, Active Sleep, and Indeterminate Sleep scores were combined into a single Sleep category so that analysis could be conducted on a binary Sleep/Wake dataset consistent with actigraph threshold scoring. The 30-s epochs were then paired into 1-min epochs to match the actigraphy epoch length. For minutes in which the two 30-s epochs had different sleep scores, the sleep state of the first epoch was used to represent the minute.
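The conversion from four-state 30-s PSG scores to binary 1-min scores can be illustrated as follows. This is a minimal sketch under stated assumptions: the stage labels ("QS", "AS", "IS", "Wake") and function names are hypothetical, and a trailing unpaired 30-s epoch is simply dropped, which the text does not specify.

```python
def to_binary(stage: str) -> str:
    """Collapse Quiet, Active and Indeterminate Sleep into a single Sleep category."""
    return "Wake" if stage == "Wake" else "Sleep"


def collapse_to_minutes(epochs_30s):
    """Pair consecutive 30-s epochs; the first epoch of each pair labels the minute.

    Returns the 1-min labels and the number of minutes whose two halves disagreed.
    """
    binary = [to_binary(stage) for stage in epochs_30s]
    minutes, discordant = [], 0
    for i in range(0, len(binary) - 1, 2):  # an unpaired final epoch is ignored
        first, second = binary[i], binary[i + 1]
        minutes.append(first)
        if first != second:
            discordant += 1
    return minutes, discordant


# Hypothetical three minutes of PSG scoring.
scores = ["QS", "QS", "AS", "Wake", "Wake", "IS"]
print(collapse_to_minutes(scores))
```

Counting the discordant minutes, as done here, corresponds to the check reported in the Results on how many 1-min epochs had different scores in their two 30-s halves.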
In the primary analysis, three levels of actigraphy (Thresh20, Thresh40, and Thresh80) were used to compare the minute-by-minute movement activity level and respective PSG Sleep and Wake scores. These thresholds are commonly used in commercially available actigraph software for automatic sleep staging (e.g. Actigraph Actiware 6, Philips Respironics) and reported in actigraphy validation studies (Ancoli-Israel et al., 2003; Hunt et al., 2008; Rioualen et al., 2015; Sadeh, 2011).
Table 1 provides the formulas used to calculate five metrics: AR, sensitivity, specificity, PVS, and PVW. State categorisation (Sleep or Wake) for each study minute was compared between the PSG (True) and actigraphy (Test) score: (a) Sleep PSG correctly identified as Sleep by actigraph; (b) Wake PSG incorrectly identified as Sleep by actigraph; (c) Sleep PSG was incorrectly identified as Wake by actigraph; or (d) Wake PSG was correctly identified as Wake by actigraph (see Table 1).
TABLE 1.
Metrics to compare polysomnography and actigraphy Sleep–Wake states
| Metric | Formula × 100 | Description |
|---|---|---|
| Agreement rate (AR) | (a+d)/(a+b+c+d) | Overall accuracy of actigraph Sleep/Wake classification |
| Sensitivity | a/(a+c) | Proportion of polysomnography (PSG) True Sleep minutes correctly classified as Sleep by actigraphy |
| Specificity | d/(b+d) | Proportion of PSG True Wake minutes correctly classified as Wake by actigraphy |
| Predictive value of sleep (PVS) | a/(a+b) | Proportion of minutes classified as Sleep by actigraphy that were confirmed by PSG |
| Predictive value of wake (PVW) | d/(c+d) | Proportion of minutes classified as Wake by actigraphy that were confirmed by PSG |
| Formula definitions | | |
| | True (gold standard) sleep–wake state | |
| Test sleep–wake state | PSG (True) Sleep (+) | PSG (True) Wake (−) |
| Actigraph (Test) Sleep (+) | a (True positive; TP) | b (False positive; FP) |
| Actigraph (Test) Wake (−) | c (False negative; FN) | d (True negative; TN) |
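As an illustration of the Table 1 metrics, the sketch below computes all five from the four cells of the contingency table. PVS is taken as a/(a+b), that is, true positives over all minutes the actigraph classified as Sleep, which matches the description in Table 1. The function name and the example counts are hypothetical and are not study data.

```python
def validation_metrics(a: int, b: int, c: int, d: int) -> dict:
    """Five validation metrics (in %) from the 2x2 table.

    a = true positives  (PSG Sleep, actigraph Sleep)
    b = false positives (PSG Wake,  actigraph Sleep)
    c = false negatives (PSG Sleep, actigraph Wake)
    d = true negatives  (PSG Wake,  actigraph Wake)
    """
    total = a + b + c + d
    return {
        "AR": 100 * (a + d) / total,
        "sensitivity": 100 * a / (a + c),
        "specificity": 100 * d / (b + d),
        "PVS": 100 * a / (a + b),
        "PVW": 100 * d / (c + d),
    }


# Hypothetical counts for 100 one-minute epochs.
for name, value in validation_metrics(a=80, b=10, c=5, d=5).items():
    print(f"{name}: {value:.2f}")
```

Note that sensitivity and specificity are computed within PSG columns, whereas PVS and PVW are computed within actigraph rows; the two pairs answer different questions.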
In the secondary analysis, AR, sensitivity, specificity, PVS, and PVW were calculated for all possible thresholds between 0 and 400 mpm. A receiver operating characteristic (ROC) curve was constructed (Carter et al., 2016). The area under the ROC curve (AUC) was calculated to assess the strength of the ROC curve, and Youden’s index (J), calculated as J = sensitivity + (specificity − 1), was used to determine the optimal Sleep threshold from the ROC (Carter et al., 2016). Calculations and ROC construction were performed using commercially available software (SPSS version 26, IBM Corp., Armonk, NY, USA; Matlab Mathworks, Natick, MA, USA).
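The threshold sweep, ROC construction, and Youden's index can be sketched in plain Python. This is not the SPSS or Matlab procedure used in the study; it is a minimal illustration that assumes an epoch is called Sleep when its count is at or below the threshold and that estimates the AUC with the trapezoidal rule. All names and the example data are hypothetical.

```python
def roc_sweep(counts, is_sleep, max_threshold=400):
    """Sensitivity, specificity and Youden's J at every integer threshold."""
    n_sleep = sum(is_sleep)
    n_wake = len(is_sleep) - n_sleep
    rows = []
    for t in range(max_threshold + 1):
        # True/false positives: PSG Sleep/Wake epochs that the threshold calls Sleep.
        tp = sum(1 for c, s in zip(counts, is_sleep) if s and c <= t)
        fp = sum(1 for c, s in zip(counts, is_sleep) if not s and c <= t)
        sens = tp / n_sleep
        fpr = fp / n_wake  # false positive rate = 1 - specificity
        rows.append({"threshold": t, "sensitivity": sens,
                     "specificity": 1 - fpr, "fpr": fpr,
                     "J": sens + (1 - fpr) - 1})
    return rows


def auc_trapezoid(rows):
    """Area under the ROC curve; the (0, 0) and (1, 1) corners are added explicitly."""
    pts = sorted({(0.0, 0.0), (1.0, 1.0)} | {(r["fpr"], r["sensitivity"]) for r in rows})
    return sum((x2 - x1) * (y1 + y2) / 2 for (x1, y1), (x2, y2) in zip(pts, pts[1:]))


# Hypothetical data: five PSG Sleep epochs followed by five PSG Wake epochs.
counts = [0, 0, 3, 10, 45, 0, 5, 30, 90, 150]
is_sleep = [True] * 5 + [False] * 5
rows = roc_sweep(counts, is_sleep)
best = max(rows, key=lambda r: r["J"])
print("AUC:", round(auc_trapezoid(rows), 3))
print("max J:", round(best["J"], 3), "at threshold", best["threshold"])
```

When several thresholds share the maximum J, `max` returns the lowest of them; a real analysis would need an explicit rule for such ties.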
3 |. RESULTS
3.1 |. General
Two infants were studied in their bedside isolette and eight infants were studied in their bedside open-air crib; all infants were swaddled and lying supine during all study periods. The total study time collapsed across the three inter-feed periods among the 10 premature infants was 3,750 min; 192 min (5.1%) were excluded from analysis due to nursing and technical interventions or artefact. Accordingly, 3,558 min of data were used in the analyses. The range of individual study lengths was 243–459 min with a mean (SD) study time of 356 (73.3) min. This wide range of study-session duration was due to some infants having short inter-feed intervals of sleep (minimum 77 min) and some having longer intervals (maximum 190 min); notably, all infants participated in all three inter-feed periods.
There were 10,046 30-s epochs of scored PSG data: 3,362 epochs were scored as Quiet Sleep, 3,812 epochs were scored as Active Sleep, 326 epochs were scored as Indeterminate Sleep, and 2,546 epochs were scored as Wake. After converting all sleep states to a single Sleep score and adjusting to 1-min epochs, of the 3,558 total analysed minutes (i.e. excluding intervention and feed periods) there were 2,986 min (84%) identified as Sleep and 572 min (16%) identified as Wake by PSG. Of these, 67 of the 1-min epochs (1.8%) did not have the same score in the first and second 30-s interval; these were equally represented among the 1-min epochs, 34 Sleep and 33 Wake.
3.2 |. Literature-based (proprietary) movement thresholds
Figure 1 Panel a illustrates the minute-by-minute movement activity for one infant for three inter-feed study intervals (approximately 7:00–8:30 a.m., 9:45–11:30 a.m., and 12:30–2:30 p.m.). Figure 1 Panel b displays only the minutes that were scored as Sleep by PSG (True Sleep) and Panel c displays only the minutes that were scored as Wake by PSG (True Wake). The selected subject was chosen because this infant had the greatest percentage of True Wake data (32.75%). Note that both low and high levels of actigraph movement activity are observed in both panels showing PSG Sleep and PSG Wake.
FIGURE 1.
Example of minute-by-minute movement activity (actigraphy) for one infant and the associated polysomnography (PSG) score. Panel a, all epochs of actigraphy minutes recorded as Sleep and Wake by PSG; Panel b, actigraphy epochs scored as Sleep by PSG; Panel c, actigraphy epochs scored as Wake by PSG
Table 2 shows the number of epochs that were scored Sleep and Wake using PSG compared to the number of epochs that were categorised as Sleep and Wake using actigraphy for each of the three movement thresholds among all subjects.
TABLE 2.
Polysomnography (PSG) scored Sleep/Wake compared to actigraph categorised Sleep/Wake for three thresholds of movement activity
| | PSG Sleep (True +) | | | PSG Wake (True −) | | | Total | | |
|---|---|---|---|---|---|---|---|---|---|
| | Thresh20 | Thresh40 | Thresh80 | Thresh20 | Thresh40 | Thresh80 | Thresh20 | Thresh40 | Thresh80 |
| Actigraph Sleep (Test +) | 2,556 (a) | 2,752 (a) | 2,903 (a) | 399 (b) | 454 (b) | 503 (b) | 2,955 | 3,206 | 3,406 |
| Actigraph Wake (Test −) | 430 (c) | 234 (c) | 83 (c) | 173 (d) | 118 (d) | 69 (d) | 603 | 352 | 152 |
| Total | 2,986 | 2,986 | 2,986 | 572 | 572 | 572 | 3,558 | 3,558 | 3,558 |
Data are number of 1-min epochs. PSG Sleep (True +), gold standard polysomnography scored Sleep; PSG Wake (True −), gold standard polysomnography scored Wake; Actigraph Sleep (Test +), actigraphy categorised Sleep; Actigraph Wake (Test −), actigraphy categorised Wake; Thresh20, actigraphy ≤20 mpm; Thresh40, actigraphy ≤40 mpm; Thresh80, actigraphy ≤80 mpm.
(a) True positive: PSG scored Sleep and actigraph categorised Sleep.
(b) False positive: PSG scored Wake and actigraph categorised Sleep.
(c) False negative: PSG scored Sleep and actigraph categorised Wake.
(d) True negative: PSG scored Wake and actigraph categorised Wake.
3.3 |. Metrics and optimal thresholds
The calculated value for each metric is displayed in Table 3 for each of the movement thresholds. Overall, sensitivity and PVS were high for all three thresholds, ranging between 85.2% and 97.2%. In contrast, specificity and PVW were low across all three thresholds (ranging between 12% and 46%). AR ranged between 77% and 84% among the three thresholds. The highest AR and highest sensitivity occurred at the highest threshold (Thresh80) wherein the lowest specificity and PVS were also observed.
TABLE 3.
Five metrics assessing actigraphy for categorising Sleep/Wake at three movement thresholds
| Metric | Thresh20 (%) | Thresh40 (%) | Thresh80 (%) |
|---|---|---|---|
| Agreement rate | 76.70 | 80.66 | 83.53 |
| Sensitivity | 85.60 | 92.16 | 97.22 |
| Specificity | 30.24 | 20.63 | 12.06 |
| Predictive value of Sleep | 86.50 | 85.84 | 85.23 |
| Predictive value of Wake | 28.69 | 33.52 | 45.39 |
Thresh20, actigraphy ≤20 mpm; Thresh40, actigraphy ≤40 mpm; Thresh80, actigraphy ≤80 mpm.
Figure 2 Panel a illustrates the five metrics (AR, sensitivity, specificity, PVS and PVW) at each movement activity threshold between 0 and 400 mpm among all infants. The highest AR achieved was 84.12%, which occurred at three movement ranges: 102–104, 108–110, and 179–184 mpm (each of these thresholds produced the exact same AR). At the 102 mpm threshold, the lowest movement activity threshold for which the maximum AR was achieved, sensitivity was 98.77%, specificity was only 8.86%, PVS was 79.93%, and PVW was 53.40%. Figure 2 Panel b zooms in on the lowest spectrum of actigraphy (0–5 mpm), where specificity and sensitivity diverge. Note the rapid increase in sensitivity and rapid decrease in specificity as the threshold for distinguishing Sleep via actigraphy increases from 0 to 5 mpm. Within this window of low movement activity thresholds, PVS remained >85% whereas PVW fell to <25%. These metrics illustrate the relative performance rate of actigraphy for determining Sleep and Wake States.
FIGURE 2.
Characterisation of actigraphy to accurately assess sleep. Panel a, all movement activity; Panel b, magnification of the lower spectrum of movement activity. Blue line, sensitivity; red line, specificity; yellow line, agreement rate; purple line, predictive value of sleep; green line, predictive value of wake. Note the high sensitivity and low specificity across nearly all movement periods
Figure 3 shows the ROC curve illustrating the low effectiveness of actigraphy for determining Sleep. The AUC was 0.636 and Youden’s Index calculated for sleep peaked at 0 and 1 mpm (J = 0.217), further suggesting low discriminatory power of actigraphy for estimating sleep.
FIGURE 3.
Receiver operating characteristic (ROC) curve for estimating sleep. AUC, area under the curve; J, Youden’s index
4 |. DISCUSSION
The present study compared minute-by-minute “gold-standard” PSG scores and actigraphy, and applied five metrics (AR, sensitivity, specificity, PVS, and PVW) to assess effectiveness of actigraphy at three commonly used threshold levels of movement (Thresh20, Thresh40, Thresh80; Ancoli-Israel et al., 2003; Hunt et al., 2008; Sadeh, 2011) for depicting sleep in premature infants. A key finding was that high AR, high sensitivity, and high PVS were achieved at each of the thresholds, but specificity and PVW remained low regardless of the set threshold, suggesting that actigraphy is not a reliable tool for ascertaining sleep in neonates.
The ROC curve in Figure 3 illustrates the poor effectiveness of actigraphy, depicted by the low specificity and low PVW across the spectrum of movement activity. Notably, the AUC was <0.7, which is considered poor, while Youden’s Index was close to 0, the value expected from a test that distinguishes the two states no better than chance (Carter et al., 2016). These values suggest that actigraphy-based movement is not a good predictor of Sleep or Wake, at least in our small cohort of premature infants.
Findings from our present study corroborate other validity studies in which diagnostic metrics were used to assess actigraphy (Ancoli-Israel et al., 1997, 2003; Gnidovec et al., 2002; Rioualen et al., 2015; Sadeh, 2011; Sadeh et al., 1995; Sung et al., 2009). For example, Sung et al. (2009) evaluated actigraphy in preterm newborns based on behavioural sleep observations and found AR of 61.9%–89.1%, sensitivity of 66.0%–96.8%, specificity of 31.5%–61.0%, PVS of 91.3%–96.5%, and PVW of 31.1%–53.7%, which is consistent with our present findings (Table 3). In their study, Sung et al. (2009) noted that actigraphy was reliable at predicting Sleep but not reliable at predicting Wake, ultimately concluding that actigraphy was a valid tool for sleep assessment. However, their conclusion was based largely on the AR, sensitivity, and PVS, while little consideration was given to the shared role of low specificity and PVW in validating actigraphy as a sleep-assessment tool. As pointed out by Sadeh (2011), many actigraphy studies have similarly concluded validity despite demonstrating low specificity, a parameter that he posited as necessary to consider in validation. In line with this reasoning, the results of our present study support that actigraphy should not be considered a valid tool for depicting sleep in studies when specificity is low. Understanding this claim requires an understanding of the clinical relevance of these metrics, particularly sensitivity and specificity, and the limitations of using them to describe a non-diagnostic tool such as actigraphy.
4.1 |. Understanding sensitivity and specificity contextually
In diagnostic tests such as disease surveillance, there is a positive and a negative condition: the presence or absence of a disease (Hazra & Gogtay, 2017). In this model, it is generally understood that sensitivity represents the ability of the diagnostic test to correctly identify positive cases within a population, while specificity represents the ability to correctly identify negative cases. When this model is applied to actigraphy, the “population” is not a set of individuals, but the set of 1-min epochs during which the subject is either asleep or awake compared to a gold-standard scoring tool (e.g. PSG). Following this diagnostic model, Sleep is conventionally designated the “positive” condition and Wake the “negative” condition, so that each minute can be “diagnosed” with one of these two conditions. This is why several validation studies describe sensitivity as ‘the ability to predict sleep’ and specificity as ‘the ability to predict wake’ (Hyde et al., 2007). While this is partially true, this description only considers half of the equation for each metric. Sensitivity is not just the number of true positives but is the number of true positives (actigraph correctly scored Sleep) out of all positive (total PSG Sleep) cases. Likewise, specificity is not just the number of true negatives, but is the number of true negatives (actigraph correctly scored Wake) out of all negative (total PSG Wake) cases (Table 1). Otherwise put, high sensitivity reflects not only a high true positive count, but also a low false negative count, and high specificity reflects not only a high true negative count, but also a low false positive count. This means that sensitivity and specificity are not only related to Sleep and Wake respectively but are each related to both the positive and the negative condition: sensitivity provides information about both Sleep and Wake identification, and so does specificity.
Thus, actigraphy cannot truly be a strong indicator of sleep state without achieving both high sensitivity and high specificity.
4.2 |. Why specificity matters clinically
While Sleep and Wake are translated into “positive” and “negative” to simplify the analysis, actigraphy cannot be evaluated with the same allowances. The goal of sleep assessment using actigraphy is not to diagnose each individual minute as “Sleep” or “Wake”, but to assess and diagnose the individual based on the entire dataset. This means that Sleep and Wake must both be identified accurately in order to correctly “diagnose” the individual. Consider a newborn who is expected to sleep 70%–90% of their day. Actigraphy-based sleep scoring that is highly sensitive but non-specific for Sleep will result in overestimating the amount of sleep the infant is getting, which could mask harmful sleep disturbances or disorders and result in misguided diagnosis and treatment. In fact, the consequences of low specificity are reversed from the disease surveillance model: here it is low specificity rather than low sensitivity that leads to underdiagnosis. This means that using actigraphy for sleep characterisation despite low specificity could lead to harmful clinical outcomes that should not be ignored.
As shown in our present study, although high sensitivity was achieved for each movement threshold, even at the lowest movement level for defining Sleep (Thresh20) specificity reached only ~30%; i.e. actigraphy erroneously categorised Wake as Sleep ~70% of the time. Specificity was even worse at the higher sleep-threshold settings. This suggests that while actigraphy appears to be a good indicator of Sleep, its low specificity will almost certainly result in an overestimation of infant sleep.
4.3 |. The relationship between sensitivity, specificity, and agreement rate
It is worth noting that for many actigraph validation studies (including the present one), AR remained high even though specificity was low (Hyde et al., 2007; Rioualen et al., 2015; So et al., 2005; Sung et al., 2009). Given AR represents the successful classification of both Sleep and Wake, one would expect that low specificity, or poor classification of Wake, would be reflected in a lower AR. For example, at Thresh20 in our present study, AR was 77% even though specificity was only 30%. In this case, AR better reflected the sensitivity rate (85%). However, this relatively high AR is not a reflection that actigraphy is a good indicator of Sleep. Rather, it was in large part due to a paucity of Wake data in the current data set. In our present study, the infants collectively were asleep for 84% of the total study time and awake for only 16% (as per gold-standard PSG). Because AR represents an average of sensitivity and specificity weighted by the proportions of Sleep and Wake minutes, this uneven distribution caused AR to closely resemble the sensitivity, rather than serving as an accurate reflection of overall classification strength (Alberg et al., 2004). Had our present study produced the same sensitivity (85%) and specificity (30%) with equal weight given to Sleep and Wake, the AR would have been only ~57% at the Thresh20 movement level. This was demonstrated by Paquet et al. (2007), who found that AR for actigraphy was lower in subjects with more Wake data. In his review, Sadeh (2011) noted that low specificity can be due to the inability of an actigraph device to detect movement activity, the algorithm or thresholds used to define sleep states, or to the relative lack of Wake periods observed in a study population. In the present study, we demonstrated this latter phenomenon and the impact that lack of Wake periods has on the strength of assessment metrics, undermining tool validity.
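The dependence of AR on the Sleep/Wake mix can be checked numerically. AR equals sensitivity weighted by the fraction of PSG Sleep minutes plus specificity weighted by the fraction of PSG Wake minutes. The sketch below uses the Thresh20 values from Table 3 and the PSG totals from Section 3.1 (2,986 Sleep minutes out of 3,558); the function name is hypothetical.

```python
def agreement_rate(sensitivity: float, specificity: float, sleep_fraction: float) -> float:
    """AR as the prevalence-weighted average of sensitivity and specificity (in %)."""
    return sensitivity * sleep_fraction + specificity * (1 - sleep_fraction)


sens, spec = 85.60, 30.24  # Thresh20 values from Table 3

observed = agreement_rate(sens, spec, sleep_fraction=2986 / 3558)  # ~84% Sleep
balanced = agreement_rate(sens, spec, sleep_fraction=0.5)          # equal Sleep and Wake

print(f"AR with the observed Sleep/Wake mix: {observed:.2f}%")
print(f"AR with a 50/50 Sleep/Wake mix:      {balanced:.2f}%")
```

With the observed mix the result reproduces the Thresh20 AR in Table 3; with equal weights it falls to just under 58% (57.5% if the rounded values of 85% and 30% are used), even though sensitivity and specificity are unchanged.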
4.4 |. Limitations
The present study demonstrates why specificity is critical in assessing the validity of actigraphy. Robust validation cannot be achieved without adequate representation of Sleep and Wake data. It is important to recognise, however, that achieving this may not be feasible in many populations, particularly neonates. Newborns spend up to 90% of their time sleeping (Ardura et al., 1995), with little Wake time. Moreover, during those periods when newborns are awake, they are typically subject to outside interventions from caregivers and parents (i.e. feeds, nursing care, medical attention). As demonstrated by Tsai and Thomas (2010), external disturbance of the infant artificially inflates actigraphy values and thus would impede any movement assessment and confound related metrics. Actigraphy can be validated in older infants, children, and adults, as subjects in these populations can be studied during long periods of autonomous wakefulness, but premature and newborn infants lack prolonged periods of independent wakefulness that enable accurate measurement of actigraph validity (Ancoli-Israel et al., 1997; Sadeh, 2011; Usui et al., 1999).
Another key limitation of actigraphy use in neonates is that it cannot reliably distinguish between Active Sleep and Wake (Rioualen et al., 2015; Sung et al., 2009). This is notable because during Active Sleep, infants engage in a considerable number of body movements, which could falsely be interpreted as wakefulness by actigraphy (Anders et al., 1985; Barbeau & Weiss, 2017; Rioualen et al., 2015; Sung et al., 2009). PSG Active Sleep incorrectly interpreted as Wake by actigraphy would reduce sensitivity, PVW and AR, but would not impact specificity.
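The direction of these effects follows from the Table 1 formulas and can be verified with a small sketch. Misreading Active Sleep as Wake moves epochs from cell a (true positive) to cell c (false negative) and leaves b and d untouched. The counts below are hypothetical and chosen only for illustration.

```python
def metrics(a: int, b: int, c: int, d: int) -> dict:
    """Sensitivity, specificity, PVW and AR (in %) from the 2x2 table cells."""
    return {
        "sensitivity": 100 * a / (a + c),
        "specificity": 100 * d / (b + d),
        "PVW": 100 * d / (c + d),
        "AR": 100 * (a + d) / (a + b + c + d),
    }


# Hypothetical baseline counts, then 50 Active Sleep minutes misread as Wake.
a, b, c, d = 800, 60, 100, 40
shifted = 50

before = metrics(a, b, c, d)
after = metrics(a - shifted, b, c + shifted, d)

for name in before:
    print(f"{name}: {before[name]:.1f}% -> {after[name]:.1f}%")
```

Specificity is unchanged because its formula, d/(b+d), involves only PSG Wake minutes.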
This was a limited single-session study in a small, homogeneous cohort of premature infants all of whom received an interventional device within the study session. Only 10 infants were studied, and aside from differences in gestational age at birth, all infants were studied at similar post-natal age, free of any significant medical concerns with the primary unresolved issues being feeding and weight gain. It is unlikely that a larger or less homogeneous population of premature infants would have yielded different results given that any population of premature infants is likely to have the same underlying lack of available Wake data. Moreover, as shown by Paquet et al. (2007), having more Wake data does not necessarily increase specificity, but may contribute to lower AR.
While the metrics applied in the present study show that actigraphy does not accurately distinguish Sleep/Wake states in premature infants, the findings do support the unique utility of PSG and actigraphy for providing independent assessments in neonatal research. For example, PSG serves as a gold standard for depicting Sleep/Wake state, whereas actigraphy provides a measure of movement frequency that may serve as an objective index of infant “irritability” (or “comfort”). Such analyses, which were outside the scope of the present paper, are currently underway to assess the effect of the interventional device on PSG and actigraphy measures.
Actigraphy may also be a useful tool to help quantify irritability in newborns withdrawing from opioids or other substances. Assessment of withdrawal severity currently relies largely on caregiver observations of several symptoms, including subjective estimates of wake and cry behaviours (e.g., Finnegan et al., 1975; Timpson et al., 2018). Nearly two decades ago, O’Brien et al. (2004) demonstrated that movement activity obtained by actigraphy provided an objective measure of withdrawal severity among a cohort of newborns with and without opioid exposure. They found that movement activity was significantly higher for opioid-exposed newborns than for healthy controls and varied as a function of withdrawal severity and treatment. Preliminary data from our group also support the use of actigraphy as an index of withdrawal severity in opioid-exposed newborns (Bruch et al., 2020; NCT02868844). A larger dual-site randomised clinical trial that employs actigraphy throughout hospitalisation among a cohort of >200 opioid-exposed newborns is currently underway (Bloch-Salisbury et al., 2021; NCT02801331) and will help validate this use of actigraphy. Although further study is needed, it may prove more useful to apply actigraphy as an objective measure of infant irritability (or comfort), rather than as a tool for assessing neonatal sleep.
5 |. CONCLUSION
The present study assessed the validity of actigraphy for determining sleep state in premature neonates using five metrics. PSG served as the gold standard for identifying Sleep and Wake. Although actigraphy was a good indicator of Sleep across a wide range of movement activity levels, specificity (i.e. the ability of actigraphy to accurately assess Wake) remained low. Lack of sufficient minutes of autonomous Wake periods among infants was identified as a key limitation. Findings from the present study suggest actigraphy cannot be validated for Sleep/Wake discrimination in preterm infants due to lack of Wake periods, and that proper validation requires sufficient data from periods of both Sleep and Wake.
ACKNOWLEDGMENTS
We thank Mr Nicolas Rodriguez for assisting on data collection. We thank Ms Tory Bruch and Mrs Barbara Glidden for thoughtful discussions regarding the use of actigraphy in neonates. We thank the UMass Memorial Healthcare NICU nurses assigned to infant care during the study sessions for their assistance with implementation of the study protocol into routine care of the infants. We gratefully acknowledge the infants and their families for participating in this study.
Funding information
This work was supported by the National Institutes of Health (NIH), National Heart, Lung, and Blood Institute (NHLBI) and the National Center for Complementary and Integrative Health (NCCIH) Grant 1U54HL143541 and in part by the National Institute on Drug Abuse (NIDA) Grant R01DA042074 (EBS). The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH.
Footnotes
CONFLICT OF INTEREST
No authors have any conflicts of interest to report.
DATA AVAILABILITY STATEMENT
The data that support the findings of this study are available from the corresponding author upon reasonable request.
REFERENCES
- Alberg AJ, Park JW, Hager BW, Brock MV, & Diener-West M. (2004). The use of “overall accuracy” to evaluate the validity of screening or diagnostic tests. Journal of General Internal Medicine, 19(5), 460–465. 10.1111/j.1525-1497.2004.30091.x
- Ancoli-Israel S, Clopton P, Klauber MR, Fell R, & Mason W. (1997). Use of wrist activity for monitoring sleep/wake in demented nursing-home patients. Sleep, 20(1), 24–27. 10.1093/sleep/20.1.24
- Ancoli-Israel S, Cole R, Alessi C, Chambers M, Moorcroft W, & Pollak CP (2003). The role of actigraphy in the study of sleep and circadian rhythms. American Academy of Sleep Medicine Review Paper. Sleep, 26(3), 342–392. 10.1093/sleep/26.3.342
- Anders T, Emde R, & Parmelee A. (1971). A Manual of Standardized Terminology, Techniques and Criteria for Scoring of States of Sleep and Wakefulness in Newborn Infants. UCLA Brain Information Service/BRI Publications Office, NINDS Neurological Information Network.
- Anders TF, Keener MA, & Kraemer H. (1985). Sleep-wake state organization, neonatal assessment and development in premature infants during the first year of life. II. Sleep, 8(3), 193–206. 10.1093/sleep/8.3.193
- Ardura J, Andrés J, Aldana J, & Revilla M. (1995). Development of sleep–wakefulness rhythm in premature babies. Acta Pædiatrica, 84(5), 484–489. 10.1111/j.1651-2227.1995.tb13679.x
- Barbeau DY, & Weiss MD (2017). Sleep disturbances in newborns. Children, 4(10), 90. 10.3390/children4100090
- Bloch-Salisbury E, Bogen D, Vining M, Netherton D, Rodriguez N, Bruch T, Burns C, Erceg E, Glidden B, Ayturk D, Aurora S, Yanowitz T, Barton B, & Beers S. (2021). Study design and rationale for a randomized controlled trial to assess effectiveness of stochastic vibrotactile mattress stimulation versus standard non-oscillating crib mattress for treating hospitalized opioid-exposed newborns. Contemporary Clinical Trials Communications, 21, 100737. 10.1016/j.conctc.2021.100737
- Bloch-Salisbury E, Indic P, Bednarek F, & Paydarfar D. (2009). Stabilizing immature breathing patterns of preterm infants using stochastic mechanosensory stimulation. Journal of Applied Physiology, 107(4), 1017–1027. 10.1152/japplphysiol.00058.2009
- Borghese IF, Minard KL, & Thoman EB (1995). Sleep rhythmicity in premature infants: Implications for developmental status. Pediatrics and Sleep, 18(7), 523–530. 10.1093/sleep/18.7.523
- Bruch T, Rodriguez N, McKenna L, Ta B, Coffman B, & Bloch-Salisbury E. (2020). Actigraphy as an objective measure of irritability in Neonatal Abstinence Syndrome [Conference peer-reviewed poster session]. Pediatric Academic Societies Meeting, Philadelphia, PA, May 1–May 5, 2020; Conference cancelled due to Covid-19, 2020: Disseminated in Meeting Program Guide 04/03/20, E-PAS2020:2836.611.
- Carter JV, Pan J, Rai SN, & Galandiuk S. (2016). ROC-ing along: Evaluation and interpretation of receiver operating characteristic curves. Surgery (United States), 159(6), 1638–1645. 10.1016/j.surg.2015.12.029
- Cole RJ, Kripke DF, Gruen W, Mullaney DJ, & Gillin JC (1992). Automatic sleep/wake identification from wrist activity. Sleep, 15(5), 461–469. 10.1093/sleep/15.5.461
- Curzi-Dascalova L, & Mirmiran M. (1996). Manual of Methods for Recording and Analyzing Sleep-Wakefulness States in Preterm and Full-Term Infant.
- Ednick M, Cohen AP, McPhail GL, Beebe D, Simakajornboon N, & Amin RS (2009). A review of the effects of sleep during the first year of life on cognitive, psychomotor, and temperament development. Sleep, 32(11), 1449–1458. 10.1093/sleep/32.11.1449
- Finnegan LP, Connaughton JF, Kron RE, & Emich JP (1975). Neonatal abstinence syndrome: assessment and management. Addictive Diseases, 2(1–2), 141–158.
- Gertner S, Greenbaum CW, Sadeh A, Dolfin Z, Sirota L, & Ben-Nun Y. (2002). Sleep-wake patterns in preterm infants and 6 month’s home environment: Implications for early cognitive development. Early Human Development, 68(2), 93–102. 10.1016/S0378-3782(02)00018-X
- Gnidovec B, Neubauer D, & Zidar J. (2002). Actigraphic assessment of sleep-wake rhythm during the first 6 months of life. Clinical Neurophysiology, 113(11), 1815–1821. 10.1016/S1388-2457(02)00287-0
- Graven S. (2006). Sleep and brain development. Clinics in Perinatology, 33(3), 693–706. 10.1016/j.clp.2006.06.009
- Hazra A, & Gogtay N. (2017). Biostatistics series module 7: The statistics of diagnostic tests. Indian Journal of Dermatology, 62(1), 18–24. 10.4103/0019-5154.198047
- Holditch-Davis D, Scher M, Schwartz T, & Hudson-Barr D. (2004). Sleeping and waking state development in preterm infants. Early Human Development, 80(1), 43–64. 10.1016/j.earlhumdev.2004.05.006
- Hunt RW, Tzioumi D, Collins E, & Jeffery HE (2008). Adverse neurodevelopmental outcome of infants exposed to opiate in-utero. Early Human Development, 84(1), 29–35. 10.1016/j.earlhumdev.2007.01.013
- Hyde M, O’Driscoll DM, Binette S, Galang C, Tan SK, Verginis N, Davey MJ, & Horne RSC. (2007). Validation of actigraphy for determining sleep and wake in children with sleep disordered breathing. Journal of Sleep Research, 16(2), 213–216. 10.1111/j.1365-2869.2007.00588.x
- John D, & Freedson P. (2012). ActiGraph and Actical physical activity monitors: a peek under the hood. Medicine and Science in Sports and Exercise, 44(1 Suppl 1), S86–S89. 10.1249/MSS.0b013e3182399f5e.ACTIGRAPH
- Levy J, Hassan F, Plegue MA, Sokoloff MD, Kushwaha JS, Chervin RD, Barks JDE, & Shellhaas RA. (2017). Impact of hands-on care on infant sleep in the neonatal intensive care unit. Pediatric Pulmonology, 52(1), 84–90. 10.1002/ppul.23513
- Mirmiran M, Maas YGH, & Ariagno RL (2003). Development of fetal and neonatal sleep and circadian rhythms. Sleep Medicine Reviews, 7(4), 321–334. 10.1053/smrv.2002.0243
- O’Brien C, Hunt R, & Jeffery HE (2004). Measurement of movement is an objective method to assist in assessment of opiate withdrawal in newborns. Archives of Disease in Childhood: Fetal and Neonatal Edition, 89(4), 305–310. 10.1136/adc.2002.025270
- Paquet J, Kawinska A, & Carrier J. (2007). Wake detection capacity of actigraphy during sleep. Sleep, 30(10), 14–17. 10.1093/sleep/30.10.1362
- Park J. (2020). Sleep promotion for preterm infants in the NICU. Nursing for Women’s Health, 24(1), 24–35. 10.1016/j.nwh.2019.11.004
- Rioualen S, Roué JM, Lefranc J, Gouillou M, Nowak E, Alavi Z, Dubourg M, & Sizun J. (2015). Actigraphy is not a reliable method for measuring sleep patterns in neonates. Acta Paediatrica, International Journal of Paediatrics, 104(11), e478–e482. 10.1111/apa.13088
- Sadeh A. (2011). The role and validity of actigraphy in sleep medicine: An update. Sleep Medicine Reviews, 15(4), 259–267. 10.1016/j.smrv.2010.10.001
- Sadeh A, Acebo C, Seifer R, Aytur S, & Carskadon MA (1995). Activity-based assessment of sleep-wake patterns during the 1st year of life. Infant Behavior and Development, 18(3), 329–337. 10.1016/0163-6383(95)90021-7
- So K, Buckley P, Adamson TM, & Horne RSC (2005). Actigraphy correctly predicts sleep behavior in infants who are younger than six months, when compared with polysomnography. Pediatric Research, 58(4), 761–765. 10.1203/01.PDR.0000180568.97221.56
- Sung M, Adamson TM, & Horne RSC (2009). Validation of actigraphy for determining sleep and wake in preterm infants. Acta Paediatrica, International Journal of Paediatrics, 98(1), 52–57. 10.1111/j.1651-2227.2008.01002.x
- Tham EKH, Schneider N, & Broekman BFP (2017). Infant sleep and its relation with cognition and growth: A narrative review. Nature and Science of Sleep, 9, 135–149. 10.2147/NSS.S125992
- Timpson W, Killoran C, Maranda L, Picarillo A, & Bloch-Salisbury E. (2018). A quality improvement initiative to increase scoring consistency and accuracy of the Finnegan tool. Advances in Neonatal Care, 18(1), 70–78. 10.1097/ANC.0000000000000441
- Tsai SY, & Thomas KA (2010). Actigraphy as a measure of activity and sleep for infants: A methodologic study. Archives of Pediatrics and Adolescent Medicine, 164(11), 1071–1072. 10.1001/archpediatrics.2010.208
- Usui A, Ishizuka Y, Obinata I, Okado T, Fukuzawa H, & Kanba S. (1999). Validity of sleep log compared with actigraphic sleep-wake state II. Psychiatry and Clinical Neurosciences, 53(2), 183–184. 10.1046/j.1440-1819.1999.00529.x
- Yang SC, Yang A, & Chang YJ (2014). Validation of actiwatch for assessment of sleep-wake states in preterm infants. Asian Nursing Research, 8(3), 201–206. 10.1016/j.anr.2014.06.002
- Zuzarte I, Indic P, Barton B, Paydarfar D, Bednarek F, & Bloch-Salisbury E. (2017). Vibrotactile stimulation: A nonpharmacological intervention for opioid exposed newborns. PLoS ONE, 12(4), 1–15. 10.1371/journal.pone.0175981