Scientific Reports. 2024 Feb 10;14:3420. doi: 10.1038/s41598-024-53174-1

Estimation bias and agreement limits between two common self-report methods of habitual sleep duration in epidemiological surveys

Maria Korman 1, Daria Zarina 1, Vadim Tkachev 1, Ilona Merikanto 2,3, Bjørn Bjorvatn 4,5, Adrijana Koscec Bjelajac 6, Thomas Penzel 7, Anne-Marie Landtblom 8,9, Christian Benedict 10, Ngan Yin Chan 11, Yun Kwok Wing 11, Yves Dauvilliers 12, Charles M Morin 13, Kentaro Matsui 14, Michael Nadorff 15, Courtney J Bolstad 15,16, Frances Chung 17, Sérgio Mota-Rolim 18, Luigi De Gennaro 19,20, Giuseppe Plazzi 21,22, Juliana Yordanova 23, Brigitte Holzinger 24, Markku Partinen 25,26, Cátia Reis 27,28
PMCID: PMC10858912  PMID: 38341476

Abstract

Accurate measurement of habitual sleep duration (HSD) is crucial for understanding the relationship between sleep and health. This study aimed to assess the bias and agreement limits between two commonly used short HSD self-report methods, considering sleep quality (SQ) and social jetlag (SJL) as potential predictors of bias. Data from 10,268 participants in the International COVID Sleep Study-II (ICOSS-II) were used. Method-Self and Method-MCTQ were compared. Method-Self involved a single question about average nightly sleep duration (HSDself), while Method-MCTQ estimated HSD from reported sleep times on workdays (HSDMCTQwork) and free days (HSDMCTQfree). Sleep quality was evaluated using a Likert scale and the Insomnia Severity Index (ISI) to explore its influence on estimation bias. HSDself was on average 42.41 ± 67.42 min lower than HSDMCTQweek, with an agreement range within ± 133 min. The bias and agreement range between methods increased with poorer SQ. HSDMCTQwork showed less bias and better agreement with HSDself compared to HSDMCTQfree. Sleep duration irregularity was − 43.35 ± 78.26 min on average. Subjective sleep quality predicted a significant proportion of variance in HSDself and estimation bias. The two methods showed very poor agreement and a significant systematic bias, both worsening with poorer SQ. Method-MCTQ considered sleep intervals without adjusting for SQ issues such as wakefulness after sleep onset but accounted for sleep irregularity and sleeping in on free days, while Method-Self reflected respondents’ interpretation of their sleep, focusing on their sleep on workdays. Including an SQ-related question in surveys may help bidirectionally adjust the possible bias and enhance the accuracy of sleep-health studies.

Subject terms: Neuroscience, Health care, Health occupations

Introduction

Habitual Sleep Duration (HSD) is a widely investigated parameter because of its many highly reproducible associations with physical and psychological health outcomes1,2. Health outcomes of interest commonly deteriorate as self-reported HSD deviates from the reference sleep norm interval3–7. Choosing the right tools to estimate HSD is challenging in epidemiological sleep research. The best method to self-report HSD is a sleep diary8, but it is generally not feasible in surveys. Most of the sleep questionnaires validated against polysomnography (PSG), which are routinely used in clinical evaluation to reliably distinguish between individuals with and without sleep disorders, are relatively long9. To ensure good compliance and high response rates, tools with a minimal number of items are therefore prioritized in epidemiological surveys10.

Assessment of HSD in epidemiological surveys can include single questions such as “How many hours do you usually sleep at night?” (e.g., Pittsburgh Sleep Quality Index, PSQI; Self-Assessment of Sleep Survey, SASS)11,12, which assume that adults provide an accurate global and retrospective approximation of their sleep length. Other HSD estimation methods use two questions about sleep onset and offset times to estimate the sleep interval (e.g., Karolinska Sleep Questionnaire, KSQ; Basic Nordic Sleep Questionnaire, BNSQ; Munich Chronotype Questionnaire, MCTQ); these questions are asked separately for work and free days13–15. This method also estimates sleep timing and crucial sleep metrics such as social jetlag (SJL) and irregular sleep16. For example, inconsistent sleep timing is an important risk factor for metabolic abnormalities, even more significant than sleep duration17.

Various studies found weak-to-moderate correlations between single-item HSD and objectively measured sleep; however, the agreement between different methods is poor, with limits ranging between 2.0 and 3.5 h above and below the mean difference1,18–22. Also, sleep diaries and single-question HSDs displayed either non-significant or weak associations1. Self-assessment and time-in-bed duration calculated from habitual bedtime and wake time (rather than sleep onset and offset times) were recently reported to disagree with actigraphy-based sleep duration. Specifically, the single question significantly underestimated HSD, while the bed-wake interval agreed well with Time-in-Bed (TIB) but overestimated Total Sleep Time (TST)18. These biases and disagreements pose a significant challenge to the accurate assessment of the contribution of HSD to physical and psychological health in survey research. Further, a recent methodological review showed that variability in the questions relating to sleep, such as event definitions (e.g., “go to bed” vs. “fall asleep”), context (e.g., “habitual” vs. “work/free days”) and timeframe (“typical night” vs. “recently”), leads to discrepancies in HSD estimation by different self-report methods23. Additionally, perceived sleep quality, insomnia symptoms and social schedules are important factors that can affect self-reported HSD19, but the extent of these effects has not been systematically quantified in large cohorts.

Sleep quality refers to the subjective experience of sleep, reflecting a number of quantifiable components of physiological sleep, such as depth of sleep (i.e., amount of slow-wave sleep), sleep continuity (i.e., wake after sleep onset, percentage of time awake, and number of awakenings) and additional internal or external factors (i.e., circadian profile, pain, stress)24. Poor sleep quality can lead to overestimation or underestimation of sleep duration25. A single question on overall sleep quality using a Likert scale is common in both experimental and epidemiological studies, with a verbal scale providing more stable estimation than a numerical scale10,12. The Insomnia Severity Index (ISI) is sometimes also used as a proxy for sleep quality26,27. Social time pressure refers to the demands and constraints of social obligations that may limit sleep duration28. In industrialized societies, people often experience high social time pressure on workdays and a large mismatch between internal biological time and social time. This mismatch, called Social Jet Lag (SJL)29, can be quantified by the difference between the mid-sleep points on free days and workdays and reflects irregularity of sleep timing. Because self-report questions always encompass more than physiological sleep duration alone, it is important to evaluate the differences between common self-report methods used to assess HSD in surveys, focusing on the potential predictors of the bias. The first objective of this study was to evaluate within-subjects estimation bias and the limits of agreement between two short self-report methods used to assess HSD in a large, global, heterogeneous sample of the International COVID Sleep Study II (ICOSS-II) project30. The second objective was to address the contribution of subjective Sleep Quality and Social Time Pressure to the HSD estimation bias. The contribution of Sleep Quality was validated against the Insomnia Severity Index (ISI), one of the most widely used tools to assess sleep problems in clinical and community samples27.

Results

The sample consisted of 10,268 participants with a mean age of 43.16 ± 16.80 years (mean ± standard deviation); 68.3% were female. Demographic characteristics are presented in Table 1.

Table 1.

Socio-demographic characteristics and sleep measures of the sample. Mean ± SD or frequency (% of group total).

Variables Sample total (n = 10,268)
Age, years 43.2 ± 16.8
 18–34 3636 (35.4%)
 35–49 2771 (27.0%)
 50–64 2522 (24.6%)
 65–99 1339 (13.0%)
Gender, female 7012 (68.3%)
Country
 Austria 527 (5.1%)
 Brazil 197 (1.9%)
 Bulgaria 341 (3.3%)
 Canada 464 (4.5%)
 Croatia 477 (4.6%)
 France 305 (3.0%)
 Germany 445 (4.3%)
 Finland 1181 (11.5%)
 Hong Kong 243 (2.4%)
 Israel 352 (3.4%)
 Italy 786 (7.7%)
 Japan 2581 (25.1%)
 Norway 491 (4.8%)
 Portugal 408 (4.0%)
 Sweden 688 (6.7%)
 USA 744 (7.2%)
 Other 38 (0.4%)
Ethnicity
 White (Caucasian) 6626 (64.9%)
 Asian 2713 (26.6%)
 African 153 (1.5%)
 Hispanic 212 (2.1%)
 Other 503 (4.9%)
Marital status
 Single 3351 (32.6%)
 Married/relationship 6026 (58.7%)
 Divorce/separated 707 (6.9%)
 Widowed 179 (1.7%)
Education
 Primary/elementary/lower secondary school 295 (2.9%)
 Secondary/high/vocational school 3184 (31.9%)
 University/college or above 6512 (65.2%)
Present work
 Student 1903 (18.5%)
 Regular day work 5119 (49.9%)
 Irregular day work/freelancer/artist/research 989 (9.6%)
 Unemployed 356 (3.5%)
 Retired 1205 (11.7%)
 At home (no salary) 605 (5.9%)
 Temporary laid off 91 (0.9%)
Financial burden
 Not at all 4794 (47.5%)
 A little/somewhat 3825 (37.9%)
 Much/very much 1467 (14.5%)
Body Mass Index (BMI) 25.0 ± 6.3
Insomnia Severity Index (ISI) 8.5 ± 6.1
 0–7; no clinical insomnia 5136 (50.3%)
 8–14; subthreshold insomnia 3249 (31.8%)
 15–21; moderate insomnia 1502 (14.7%)
 22–28; severe insomnia 320 (3.1%)
Sleep quality
 Well 2059 (20.1%)
 Rather well 2994 (29.2%)
 Neither well nor badly 2658 (25.9%)
 Rather badly 1958 (19.1%)
 Badly 599 (5.8%)
Habitual sleep duration self-report (HSDself), min 418.9 ± 77.2
Habitual sleep duration MCTQweek (HSDMCTQweek), min 461.4 ± 75.1
Habitual sleep duration MCTQwork (HSDMCTQwork), min 449.0 ± 81.1
Habitual sleep duration MCTQfree (HSDMCTQfree), min 492.3 ± 87.7
Social jetlag (SJL), min 56.5 ± 62.2

Estimation of habitual sleep duration bias and the agreement between methods

Distributions of HSDs from both methods are shown in Fig. 1a, with mean HSDself being shorter (418.9 ± 77.2) than HSDMCTQweek (461.4 ± 75.1). A paired t-test was used to quantify the within-subject difference between methods. A systematic HSD estimation bias was observed (t =  − 63.07, df = 10,267, p < 0.001). The mean bias was − 42.41 ± 67.42 min (95% CI of the difference: − 43.72 to − 41.11) and had a normal distribution (Fig. 1b), though HSDself and HSDMCTQweek were significantly positively correlated (rho = 0.604, p < 0.001, weighted by age).
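
The 95% confidence interval of the mean bias can be approximated from the summary values alone. The following is a minimal Python sketch, assuming an unweighted normal approximation; the function name `mean_difference_ci` is illustrative and is not taken from the study, whose analyses were run in SPSS and R.

```python
import math

def mean_difference_ci(mean_diff, sd_diff, n, z=1.96):
    """Normal-approximation confidence interval for a paired mean difference."""
    se = sd_diff / math.sqrt(n)  # standard error of the mean difference
    return mean_diff - z * se, mean_diff + z * se

# Summary values reported in the text (minutes, n = 10,268).
ci_low, ci_high = mean_difference_ci(-42.41, 67.42, 10268)
print(f"{ci_low:.2f} to {ci_high:.2f}")
```

With these inputs the interval falls within rounding of the reported − 43.72 to − 41.11 min.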

Figure 1.

Habitual sleep duration (HSD) by Method-Self and Method-MCTQweek. (a) Upper panel: HSD distributions, percent of group total by method (blue line: HSDself; black line: HSDMCTQweek), 1-h bin. Lower panel: boxplots of individual HSD by method. Whiskers: max and min values; box borders: 75th and 25th percentiles; line through the box: median. (b) Upper panel: distribution of HSD estimation bias values, percent of group total, 30-min bin. Lower panel: boxplots of individual HSD estimation bias values. (c) Bland–Altman plot comparing Method-Self and Method-MCTQweek. The blue line indicates that the Method-Self sleep duration estimates are on average 42 min shorter than Method-MCTQ estimates. The green lines indicate the 95% limits of agreement (± 1.96 SDs). The linear regression line (red) shows that the HSD estimation bias is stable through the whole range of values. The two methods only agree to within ± 2.2 h.

The level of agreement between the two HSD assessment methods is visualized using the Bland–Altman plot in Fig. 1c. As neither of the two methods is a “reference”, the bias was plotted against the mean of the HSDself and HSDMCTQweek values. To assess whether the bias (represented by the gap between the X axis and the blue mean line) is stable through the whole range of values, a linear regression line (red) was fit to the HSD data points. A Pearson test demonstrated a statistically significant but negligible slope (k = 0.034, Beta = 0.02, p = 0.03). Finally, the limits of agreement between methods were calculated from the mean difference $\bar{d}$ and the standard deviation of the differences $s$: lower limit $\bar{d} - 1.96s = -42.41 - 1.96 \times 67.42 \approx -175$ min; upper limit $\bar{d} + 1.96s = -42.41 + 1.96 \times 67.42 \approx 90$ min. Altogether, the two methods only agreed within ± 133 min; in other words, HSDself may be 90 min above or 175 min below HSDMCTQweek.
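
The Bland–Altman quantities described above can be sketched in a few lines of Python. This is an illustrative sketch only: the function names are hypothetical, the toy data are invented, and the calculation is unweighted.

```python
import statistics

def limits_from_summary(bias, sd, k=1.96):
    """Lower and upper 95% limits of agreement from a mean difference and its SD."""
    return bias - k * sd, bias + k * sd

def bland_altman(method_a, method_b):
    """Return (bias, lower_limit, upper_limit) for paired measurements."""
    if len(method_a) != len(method_b):
        raise ValueError("paired measurements must have equal length")
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample SD of the within-subject differences
    lower, upper = limits_from_summary(bias, sd)
    return bias, lower, upper

# Summary values reported in the text (minutes).
lower, upper = limits_from_summary(-42.41, 67.42)
print(round(lower), round(upper))  # -175 90

# Invented toy data: HSDself vs HSDMCTQweek for three respondents (minutes).
toy_bias, toy_lower, toy_upper = bland_altman([420, 400, 450], [450, 460, 470])
```

The half-width of the interval, 1.96 × 67.42 ≈ 132 min, corresponds to the agreement range the text reports as ± 133 min.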

A simple regression model using weighted joint distribution of gender and age by country showed that age was not a significant predictor of the HSD bias (F(1, 10,256) = 2.77, p = 0.096, Beta = 0.016). However, women had significantly larger HSD bias than men (t = 4.55, p < 0.001, mean difference = 6.6 min), but with a negligibly small effect size (Cohen’s d = 0.097).

Sleeping well? The HSD estimation bias and the agreement of the methods depend on subjective sleep quality

HSD estimated by both methods negatively correlated with participants’ subjective Sleep Quality, with sleep quality demonstrating a stronger relation to HSDself (Pearson correlations weighted by age: rho =  − 0.334, p < 0.01, rho =  − 0.134, p < 0.01; HSDself and HSDMCTQweek, respectively). Although the two methods are presumably estimating the same construct, using the Fisher r-to-z transformation we found that the two correlation coefficients were also significantly different (z =  − 15.71, p < 0.01). The correlation between HSD estimation bias and subjective Sleep Quality was also significant (rho =  − 0.207, p < 0.01).
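
For readers unfamiliar with the Fisher r-to-z comparison, a minimal Python sketch follows. It assumes the simplest variant, which treats the two correlations as independent; the study does not state which variant was used and its correlations were age-weighted, so this sketch is not expected to reproduce the reported z exactly. The function name is illustrative.

```python
import math

def fisher_z_difference(r1, n1, r2, n2):
    """z statistic for the difference between two correlation coefficients.

    Assumption: the correlations are treated as independent, which is a
    simplification when both are computed on the same participants.
    """
    z1, z2 = math.atanh(r1), math.atanh(r2)  # Fisher r-to-z transform
    se = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    return (z1 - z2) / se

# Correlations of Sleep Quality with HSDself and HSDMCTQweek, n = 10,268.
z_stat = fisher_z_difference(-0.334, 10268, -0.134, 10268)
print(round(z_stat, 2))
```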

To quantify how the agreement between the two methods depends on subjective sleep quality, and given the large sample size of the ICOSS-II study, the HSD bias was analyzed separately for each of the five Sleep Quality groups. One-way ANOVA showed that the estimation bias became more negative as the sleep quality decreased (F(4, 10,256) = 105.16, p < 0.001). The results are summarized in Fig. 2. The minimal HSD estimation bias value (− 26.69 ± 58.10 min) and the narrowest range of agreement between methods (± 114 min) were in the group sleeping “well”. The estimation bias and range of agreement became progressively larger with poorer sleep quality. HSD estimation bias in the group sleeping “badly” reached a maximum value of − 79.97 ± 97.29 min, with a range of agreement of ± 191 min. Post-hoc pairwise comparisons with Bonferroni corrections demonstrated significant differences between each of the five sleep quality groups (see supplementary information SI-Table S.1), suggesting that underestimation of HSDself relative to HSDMCTQweek increases incrementally.

Figure 2.

HSD estimation bias by Sleep Quality. (a) Bland–Altman plots comparing Method-Self and Method-MCTQweek in five Sleep Quality groups. The blue lines (mean per Sleep Quality group) indicate that underestimation of HSDself relative to HSDMCTQweek increased incrementally as the Sleep Quality worsened: from − 27 min in the “well” sleeping group to − 80 min in the “badly” sleeping group. The 95% limits of agreement (± 1.96 SDs, green lines) also become progressively further apart. (b) Statistics of the Bland–Altman plots. (c) Boxplots of HSD estimation bias by Sleep Quality. Notations as in Fig. 1c.

Workdays or free days? The HSD estimation bias and the agreement of the methods depend on social time pressure (workdays/free days)

Most participants reported irregular sleep durations across the week. The mean difference between HSDMCTQwork and HSDMCTQfree was − 43.35 ± 78.26 min (449.0 ± 81.1 and 492.3 ± 87.7 min, respectively; paired t-test, t(10,267) =  − 56.13, p < 0.001). Accordingly, in the distribution of the difference between HSDMCTQwork and HSDMCTQfree, the majority of respondents reported longer sleep duration on free days (percentiles in minutes: 25th = 0, 50th = 30, 75th = 75).

Next, we tested the hypothesis that HSDMCTQwork would demonstrate a smaller estimation bias and better agreement with HSDself as compared to HSDMCTQfree. The mean estimation bias for the HSDMCTQwork was smaller than the HSDMCTQfree (− 30 min, and − 73 min, respectively, Fig. 3a). Further, the agreement limits with the HSDself were similar to the limits of the HSDMCTQweek but better than in HSDMCTQfree (± 140 min vs. ± 169 min, respectively, Fig. 3b,c). The observation that Sleep Quality groups were significantly different from each other was replicated also in HSDself–HSDMCTQwork and HSDself–HSDMCTQfree comparisons (SI-Tables S.2, S.3).

Figure 3.

Estimation bias differences between Method-MCTQwork and Method-MCTQfree. (a) Habitual sleep duration estimation bias values distribution for workdays and free days, percent from group total. Dotted line—HSDMCTQfree, dashed line—HSDMCTQwork. (b) Bland–Altman plot comparing Method-Self and Method-MCTQwork. Notations as in Fig. 1c. The two methods agree within ± 2.3 h. (c) Bland–Altman plot comparing Method-Self and Method-MCTQfree. The two methods agree within ± 2.8 h. Notations as in Fig. 1c.

The mean SJL of the sample was 56.5 ± 62.2 min (SJL percentiles, in minutes: 25th = 15, 50th = 45, 75th = 90). There were no significant differences in SJL between the Sleep Quality groups (One-way ANOVA p = 0.205).

The combined contribution of sleep quality and social time pressure on HSD estimation bias

Having established the effects of Sleep Quality and Social Time Pressure on HSD estimation bias, we presumed that their combination may demonstrate conditions under which the bias is minimal and the agreement between the methods is most reliable. One-way ANOVAs showed that the estimation bias became more negative in both methods as the sleep quality decreased (F(4, 10,263) = 84.312, p < 0.001; F(4, 10,263) = 79.65, p < 0.001; Method-MCTQwork and Method-MCTQfree, respectively). Post-hoc pairwise comparisons with Bonferroni corrections for HSDMCTQwork showed that “well” and “rather well” Sleep Quality groups did not differ, while all other groups showed significant differences (SI-Table S.4). In contrast, for HSDMCTQfree, “rather badly” and “badly” Sleep Quality groups were not significantly different from each other, while all other groups showed significant differences (SI-Table S.5). The “well” and “rather well” sleeping groups during workdays showed the best parameters: the mean HSD estimation bias was only − 15.81 ± 62.77 min and the two methods agreed within ± 114 min (Fig. 4a,b).

Figure 4.

HSD estimation bias as a function of Sleep Quality by (a) Method-MCTQwork versus (b) Method-MCTQfree. HSD estimation bias values are smaller (closer to the zero line) in Method-MCTQwork than in Method-MCTQfree in all Sleep Quality groups. Green areas around the means: the 95% limits of agreement (± 1.96 SDs). Note that Method-MCTQwork has narrower agreement ranges than Method-MCTQfree in all Sleep Quality groups.

Weighted least squares stepwise regressions were conducted to examine the extent to which Sleep Quality and Social Time Pressure (represented by SJL) explained the variance in different HSDs and the HSD estimation bias itself. The main model had 5 predictors: Sleep Quality, SJL, age, gender, and BMI. The gender and age by country distribution was used for weighting. The model explained 13.7% of the HSDself variance, 4.2% of the HSDMCTQweek variance, 3.6% of the HSDMCTQwork variance, 10.8% of the HSDMCTQfree variance and 6.9% of the variance in the HSD estimation bias. The leading predictor in all models except HSDMCTQfree was Sleep Quality, with HSDself demonstrating the largest dependence (12.5%, 2.1%, 2.1% and 6.2% for HSDself, HSDMCTQweek, HSDMCTQwork and HSD estimation bias, respectively). The leading predictor of HSDMCTQfree was SJL (7.4%). Age and gender were significant predictors in most models but explained less than 1% of the variance in each (statistical details in supplementary information SI-Table S.6).

Comparison between the contributions of sleep quality and ISI score to HSD estimation bias

The contribution of subjective Sleep Quality to the models was validated against the ISI score, a clinical index of insomnia symptom severity. Weighted least squares stepwise regressions were re-run with the ISI score in place of Sleep Quality and the same four other predictors as in the original model. The variance in HSDself, HSDMCTQweek and HSDMCTQwork was primarily explained by the ISI score, but the models were less robust (8.4%, 1.4% and 1.5%, respectively). See SI-Table S.7 for full statistical details and SI-Fig. S.1 for the distribution of the HSD estimation bias values by ISI categories. Finally, a model including both Sleep Quality and the continuous ISI score as predictors (together with SJL, gender, age, and BMI) explained 6.9% of the variance in HSD estimation bias. Note that the ISI score was the least robust contributor, accounting for only 0.1% of the variance (SI-Table S.8), demonstrating that the ISI score was practically redundant as a predictor of the HSD estimation bias.

Discussion

It is not clear which self-report method of measuring sleep duration can be recommended with confidence for large online surveys, since large discrepancies are systematically observed between different methods. Our findings in a large international sample of 10,268 participants also showed a poor agreement range (± 133 min) and a systematic, high estimation bias (42.41 ± 67.42 min) between a single question and HSD derived from sleep onset and offset. Thus, for a given person, self-reported sleep duration (HSDself) will usually be lower than the self-reported sleep interval (HSDMCTQweek). For example, if somebody says they sleep 7.5 h a night, they would on average estimate their sleep interval as ~ 8 h 12 min (+ 42 min), but the accuracy of this estimation will be very low (± 133 min).

While inaccuracy and problems with face validity of different methods are well recognized in the literature, differences in the dimensionality of the self-report methods, and the factors that contribute to the poor agreement between them and at least partially explain the bias, have been less studied18,19,23. If HSD is systematically under- or overestimated depending on the question, the associations of health outcomes with sleep duration will also be systematically inflated or flattened31. Our findings showed that subjective sleep quality was a strong driver of the estimation bias: the bias almost tripled from the best to the worst Sleep Quality group (from − 26.69 ± 58.10 to − 79.97 ± 97.29 min). Furthermore, estimation bias changed incrementally with decreasing sleep quality. We also showed that a single question addressing sleep quality contributed more to the model explaining the HSD estimation bias than a multi-item insomnia symptom severity score. Moreover, having both Sleep Quality and ISI scores as predictors of HSD estimation bias was, in fact, redundant. Sleep quality was also the leading predictor of HSDself, HSDMCTQweek and HSDMCTQwork, while SJL was the leading predictor of HSDMCTQfree. The quantitative estimate of the bias between methods can be used bi-directionally to convert HSD from one method to the other, if a subjective sleep quality parameter is available.

Our findings therefore indicate that assessing HSD with a single question, or from sleep onset and offset, may capture distinct aspects of sleep duration. HSDMCTQweek was only subtly influenced by sleep quality, while HSDself and the estimation bias were profoundly sensitive to it. Conversely, the single-question method accounts for poor sleep but lacks sensitivity to sleep rebound on free days. This may happen because people tend to report the most representative days of the week (i.e., workdays), and sleep satisfaction is lower during workdays. This makes the single-question method more susceptible to sleep misperception. Sleep misperception has been found to vary widely in people from the general population and in patients with insomnia32, hypersomnia33 and obstructive sleep apnea34. These results are in agreement with previous findings, in which single questions about sleep duration and sleep quality using the PSQI tool were shown to represent workdays, whereas when the same PSQI questions were asked separately for workdays and free days, participants from the general population35, as well as from clinical populations, had better sleep during free days, and this difference was mediated by SJL36. Women had a slightly higher HSD estimation bias than men (~ 6 min), and this finding may be explained by the fact that women tend to report lower sleep quality37. Interestingly, although sleep duration changes through life38, age had no effect on the HSD estimation bias, suggesting that underestimation of HSDself relative to HSDMCTQweek is a stable phenomenon across ages that is related to sleep quality.

Several limitations should be considered when interpreting our results. The study used a convenience sample collected during the COVID-19 pandemic, included participants with the novel health profile of long COVID, and had a clear overrepresentation of women (68.3%). In particular, the data collection period was associated with many changes in the social and personal lives of people across participating countries, although data were not collected during confinement. Sleep–wake habits during the pandemic were adaptively changing worldwide, with many people working and studying from home39–41. Additionally, this study was designed to engage participants who may have had COVID-19 and suffer from symptoms of long COVID25,30. Indeed, 9.1% of the sample reported symptoms of long COVID when enrolled in the ICOSS-II study. However, the sensitivity analyses in a sub-group of participants with long-COVID symptoms and in a subgroup of older adults supported the conclusion that HSD bias between methods is a stable trait primarily related to Sleep Quality (see details in the “Methods” and Supplementary Materials sections). Altogether, the generalizability of the web-based survey is limited, but this may be partially offset by the large sample size and uniform data acquisition period.

Concerns about self-reported sleep duration accuracy in surveys are longstanding19,42,43, even prompting suggestions to exclude it from epidemiological studies44. Nevertheless, in large-scale field sleep studies the use of self-report tools is often the only possible option, as in the case of the COVID-19 pandemic28,30. In recent years, many studies have shown associations of self-report measures with chronic diseases and mental health5–7,45, identifying risk factors, screening for sleep disorders, monitoring changes in population habits, and clarifying the broader public health implications. We believe that researchers using measures of sleep duration based on self-reports should be aware of the meanings and limitations associated with each method, and of the disagreement between methods, without assuming that all of them reflect physiological sleep to the same extent. They should also strive to add objective measurements of sleep duration or a sleep diary when possible.

To conclude, the two methods showed very poor agreement and a significant systematic bias, both worsening with poorer subjective sleep quality. The method using self-reported sleep onset and offset times provides a “raw” calculation of the sleep intervals for work and free days and accounts for irregularities in sleep duration and timing, but is inherently insensitive to the frequency and length of awakenings46,47. The accuracy of sleep interval estimates would benefit from inclusion of a wakefulness after sleep onset item, as in Evanger et al.48. The single-question sleep duration assessment was found to be associated with sleep quality, and thus may reflect in part how respondents perceive their sleep. However, this method is inherently insensitive to the sleep rebound that occurs on days off31,49. We suggest that assessing sleep duration and subjective sleep quality separately for workdays and free days may improve the design of future studies35,36. This can be done using either a single-question or a two-question approach, in accordance with the specific objectives of the study, and should be complemented by objective measures of sleep when possible. Future studies should evaluate whether including items assessing sleep quality (e.g., a single question) and wakefulness after sleep onset may facilitate adjustments that account for potential biases between HSD estimation methods.

Methods

Data collection

This study used data from the International COVID Sleep Study II (ICOSS-II)30, an international collaboration between sleep and circadian rhythm experts. Using a web-based anonymous survey, ICOSS-II took place between May and December 2021 in parallel across the following 16 countries, using translations to local languages: Austria, Brazil, Bulgaria, Canada, Hong Kong/China, Croatia, Finland, France, Germany, Israel, Italy, Japan, Norway, Portugal, Sweden, USA. The survey used the Qualtrics and REDCap platforms. The study conforms to the standards of the Declaration of Helsinki. After a brief explanation of the study, the survey became available to participants once they had given informed consent. All investigators obtained local ethical committee (REB) approval when applicable (detailed list in supplementary material Table S.8). Due to the anonymous nature of the survey, the study was exempt from REB approval in some countries.

A total of 16,899 participants opened the link to the ICOSS questionnaire, and 15,859 had valid data. For this study we excluded shift/night workers and subjects reporting severe health conditions (atrial fibrillation, heart failure, stroke, other heart conditions, chronic obstructive pulmonary disease, kidney failure, cancer, immunosuppressive treatment, ongoing Covid-19). For quality control reasons, we excluded participants with HSD < 2.5 h or > 16 h (in either HSDself or HSDMCTQfree), with a discrepancy in sleep duration estimation of more than 400 min between the two methods, or with missing data in sleep duration and sleep quality parameters. The final sample comprised 10,268 individuals.
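
The quality-control exclusions can be expressed as a simple record filter. The Python sketch below is hypothetical: the function and argument names are invented, durations are assumed to be in minutes, and the between-method discrepancy is assumed to be taken between HSDself and HSDMCTQweek, which the text does not state explicitly.

```python
def passes_quality_control(hsd_self, hsd_mctq_free, hsd_mctq_week, sleep_quality):
    """Return True if a record survives the quality-control exclusions.

    Durations are in minutes; None marks a missing value.
    """
    values = (hsd_self, hsd_mctq_free, hsd_mctq_week, sleep_quality)
    if any(value is None for value in values):
        return False  # missing sleep duration or sleep quality data
    for hsd in (hsd_self, hsd_mctq_free):
        if hsd < 150 or hsd > 960:  # shorter than 2.5 h or longer than 16 h
            return False
    # Assumption: discrepancy is measured between HSDself and HSDMCTQweek.
    return abs(hsd_self - hsd_mctq_week) <= 400

# Invented example records.
kept = passes_quality_control(420, 480, 450, 2)
dropped_short = passes_quality_control(120, 480, 450, 2)
dropped_gap = passes_quality_control(420, 480, 900, 2)
```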

Sleep assessment items and measures

HSD times were assessed twice for each participant using two methods: Method-Self assessment was based on a single-question (i.e., “How many hours per night you have been sleeping on average CURRENTLY?”) in the format hh:mm (HSDself). The Method-MCTQ used an adapted version of the Munich Chronotype Questionnaire (µMCTQ). The questions were referring to sleep onset and offset timings (reported in 24 h local time format) (i.e., “At what time do you usually fall asleep at work/free days CURRENTLY?”, “At what time do you usually wake up at work/free days CURRENTLY?”). Separate reports were obtained for workdays and free days, enabling calculation of HSD during workdays and free days (HSDMCTQwork, HSDMCTQfree) and a weighted weekly average HSD, assuming 5 workdays (HSDMCTQweek)50. The resolution of the answers was 15 min. Sleep mid-points (between reported sleep onset and offset times) on work- and free days were used to calculate SJL (absolute difference between sleep mid-points on free and workdays)29.
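
The derived MCTQ measures can be illustrated with a short Python sketch. The helper names and the example respondent are hypothetical; wrapping the mid-sleep difference around the 24-h clock is an added assumption beyond the plain absolute difference described above.

```python
def parse_hhmm(text):
    """Convert an 'hh:mm' clock time to minutes after midnight."""
    hours, minutes = text.split(":")
    return int(hours) * 60 + int(minutes)

def sleep_duration(onset, offset):
    """Minutes between sleep onset and wake-up, wrapping past midnight."""
    return (parse_hhmm(offset) - parse_hhmm(onset)) % 1440

def mid_sleep(onset, offset):
    """Sleep mid-point as minutes after midnight."""
    return (parse_hhmm(onset) + sleep_duration(onset, offset) / 2) % 1440

def weekly_hsd(work_minutes, free_minutes, workdays=5):
    """Weighted weekly average, assuming `workdays` workdays out of 7."""
    return (workdays * work_minutes + (7 - workdays) * free_minutes) / 7

def social_jetlag(onset_work, offset_work, onset_free, offset_free):
    """Absolute difference between mid-sleep on free days and workdays."""
    diff = abs(mid_sleep(onset_free, offset_free) - mid_sleep(onset_work, offset_work))
    return min(diff, 1440 - diff)  # assumption: shortest distance on the 24-h clock

# Hypothetical respondent: asleep 23:30-06:30 on workdays, 00:30-09:00 on free days.
hsd_work = sleep_duration("23:30", "06:30")
hsd_free = sleep_duration("00:30", "09:00")
hsd_week = weekly_hsd(hsd_work, hsd_free)
sjl = social_jetlag("23:30", "06:30", "00:30", "09:00")
```

Because the weekly average is linear, applying `weekly_hsd` to the sample means in Table 1 (449.0 and 492.3 min) gives about 461.4 min, which matches the reported mean HSDMCTQweek.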

Subjective Sleep Quality was reported by participants on a 5-point Likert scale (well, rather well, neither well nor badly, rather badly, and badly), as in the BNSQ, in response to the question “How well have you been sleeping CURRENTLY?”. We used these categories to stratify the sample by Sleep Quality groups. Symptoms of insomnia were assessed using the Insomnia Severity Index (ISI), a 7-item questionnaire assessing the nature, severity, and impact of insomnia during “the last month”. A 5-point Likert scale was used to rate each item (0 = no problem to 4 = very severe problem), giving a total score ranging from 0 to 28. The total score was interpreted as follows: absence of insomnia (0–7); sub-threshold insomnia (8–14); moderate insomnia (15–21); and severe insomnia (22–28)27.
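
The ISI scoring rule can be sketched as follows. The function names are illustrative, and the category labels follow the cut-offs given above.

```python
ISI_BANDS = [
    (7, "no clinical insomnia"),
    (14, "subthreshold insomnia"),
    (21, "moderate insomnia"),
    (28, "severe insomnia"),
]

def isi_total(item_scores):
    """Sum the seven ISI items, each rated 0 (no problem) to 4 (very severe)."""
    if len(item_scores) != 7:
        raise ValueError("the ISI has exactly 7 items")
    if any(score not in (0, 1, 2, 3, 4) for score in item_scores):
        raise ValueError("each item must be an integer from 0 to 4")
    return sum(item_scores)

def isi_category(total):
    """Map an ISI total score (0-28) to its severity category."""
    if not 0 <= total <= 28:
        raise ValueError("ISI total must be between 0 and 28")
    for upper_bound, label in ISI_BANDS:
        if total <= upper_bound:
            return label

# Invented example: a respondent scoring 9 in total.
example_total = isi_total([1, 2, 1, 2, 1, 1, 1])
example_label = isi_category(example_total)
```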

Statistical analysis

Data are reported as mean ± SD or frequency (% of group total). The agreement between the two methods for assessment of HSD (Method-Self and Method-MCTQ) was analyzed using the approach proposed by Bland and Altman51. Mean differences between the methods [HSDself–HSDMCTQweek], [HSDself–HSDMCTQwork], or [HSDself–HSDMCTQfree] were taken as a measure of systematic bias and tested using paired t-tests. The upper and lower limits of agreement were defined as mean difference ± 1.96 × standard deviation, with corresponding 95% confidence intervals (95% CI). The difference between the limits of agreement represents the range within which the differences between the two methods fall for ~ 95% of individuals, and serves as a measure of precision. Sleep Quality groups were compared using Mann–Whitney or t-tests for continuous variables, according to variable type and distribution. A simple regression model with weighted joint distribution of gender and age by country was used to estimate the contribution of these demographics to the HSD bias. Multiple regressions were run to evaluate the extent to which Sleep Quality and social time pressure (given by SJL) explained the variance in the different HSDs and in the HSD estimation bias itself. The main model included a set of 5 predictors: Sleep Quality, SJL, and potential demographic confounders previously linked to HSD, namely age, gender, and Body Mass Index (BMI). In the validation analysis, ISI score was also used as a predictor. Collinearity tests showed no multicollinearity concerns with the predictors.
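
The bias and limits of agreement defined above can be computed as in the following sketch, which uses only the Python standard library. It is an illustration under stated assumptions, not the authors' SPSS/R code: the function name and the returned dictionary keys are hypothetical, the sample standard deviation (n − 1 denominator) is assumed, and the confidence intervals of the limits and the p-value of the paired t-test are left out.

```python
import math
from statistics import mean, stdev
from typing import Dict, Sequence


def bland_altman(method_a: Sequence[float],
                 method_b: Sequence[float]) -> Dict[str, float]:
    """Bias and 95% limits of agreement for paired measurements (a - b)."""
    if len(method_a) != len(method_b) or len(method_a) < 2:
        raise ValueError("Need two paired samples of equal length (n >= 2)")
    diffs = [a - b for a, b in zip(method_a, method_b)]
    n = len(diffs)
    bias = mean(diffs)      # systematic bias: mean of the paired differences
    sd = stdev(diffs)       # sample SD of the differences (n - 1 denominator)
    # Paired t statistic for H0: mean difference equals zero
    # (infinite when all differences are identical and non-zero).
    if sd > 0:
        t_stat = bias / (sd / math.sqrt(n))
    else:
        t_stat = math.copysign(math.inf, bias) if bias != 0 else 0.0
    return {
        "bias": bias,
        "sd": sd,
        "loa_lower": bias - 1.96 * sd,
        "loa_upper": bias + 1.96 * sd,
        "t_statistic": t_stat,
    }


# Toy data in minutes (hypothetical, not from the study).
hsd_self = [420, 390, 450, 360]
hsd_mctq_week = [450, 420, 480, 420]
result = bland_altman(hsd_self, hsd_mctq_week)
```

With the toy data the differences are −30, −30, −30 and −60 min, giving a bias of −37.5 min, an SD of 15 min, and limits of agreement of −66.9 and −8.1 min.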

Sensitivity analyses exploring plausible sources of bias were performed in a subgroup of participants with long-COVID symptoms (SI-Table S.8) and in a subgroup of older adults (> 65 years old, the majority retired, SI-Table S.9): (1) As the ICOSS-II data were collected 15–21 months after the onset of the COVID-19 pandemic, the first subgroup for sensitivity analysis included 934 individuals (9.1% of the total) who met the WHO criteria52 for long COVID-19. COVID-19 is a recent disorder that impacts sleep and may change how sleep duration is perceived under the two estimation methods. We therefore performed a sensitivity analysis focusing on HSD estimation and agreement between Method-Self and Method-MCTQ to investigate potential bias in this sub-sample of participants with symptoms of long COVID. (2) Since age and retirement play a major role in sleep habits, sleep quality and social time pressure, the second subgroup for sensitivity analysis included 1187 participants (11.5% of the total). The mean age of this group was 71.22 ± 3.68 years. The data were analyzed using SPSS 29.0 (IBM Corp., Armonk, NY, USA) and R (version 4.0.5).

Supplementary Information

Supplementary Information. (355.2KB, docx)

Acknowledgements

We acknowledge Ying Huang (Germany), Harald Hrubos-Strøm (Norway), Colin A. Espie (Great Britain) and Yuichi Inoue (Japan) for being instrumental in giving inputs for study design or providing data to this study. This material is partially the result of work supported with resources at the South Texas Veterans Health Care System in San Antonio, TX, USA. The contents of this publication do not represent the views of the U.S. Department of Veterans Affairs or the United States Government.

Author contributions

Conceptualization: M.K., C.R.; Data curation: M.K. (Israel), I.M., M.P. (Finland), B.B. (Norway), A.K.B. (Croatia), T.P. (Germany), A.L., C.B. (Sweden), N.Y.C., Y.K.W. (Hong Kong), Y.D. (France), C.M.M. (Canada), K.M. (Japan), M.N., F.C. (US), S.M.R. (Brazil), L.G., G.P. (Italy), J.Y. (Bulgaria), B.H. (Austria), C.R. (Portugal); Formal analysis: M.K., D.Z., V.T.; Analysis discussion: M.K., C.R.; Methodology: M.K., C.R.; Project administration: M.P., I.M.; Writing—original draft: M.K., D.Z., V.T., C.R. All authors reviewed, edited and approved the final version.

Data availability

We included all the data needed for the evaluation of the conclusions in the “Results” section or in the Supplementary Information file. Additional data related to this article may be requested from the authors.

Competing interests

The authors declare no competing interests.

Footnotes

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Contributor Information

Maria Korman, Email: maria.korman@ariel.ac.il.

Cátia Reis, Email: catia.reis@medicina.ulisboa.pt.

Supplementary Information

The online version contains supplementary material available at 10.1038/s41598-024-53174-1.

References

  • 1. Benz F, et al. How many hours do you sleep? A comparison of subjective and objective sleep duration measures in a sample of insomnia patients and good sleepers. J. Sleep Res. 2023;32:e13802. doi: 10.1111/jsr.13802.
  • 2. Chaput J-P, et al. Sleep duration and health in adults: An overview of systematic reviews. Appl. Physiol. Nutr. Metab. 2020;45:S218–S231. doi: 10.1139/apnm-2020-0034.
  • 3. Zhu G, et al. Exploration of sleep as a specific risk factor for poor metabolic and mental health: A UK biobank study of 84,404 participants. Nat. Sci. Sleep. 2021;13:1903–1912. doi: 10.2147/NSS.S323160.
  • 4. Kósa K, Vincze S, Veres-Balajti I, Bába ÉB. The pendulum swings both ways: Evidence for U-shaped association between sleep duration and mental health outcomes. Int. J. Environ. Res. Public Health. 2023;20:5650. doi: 10.3390/ijerph20095650.
  • 5. Cappuccio FP, Cooper D, D’Elia L, Strazzullo P, Miller MA. Sleep duration predicts cardiovascular outcomes: A systematic review and meta-analysis of prospective studies. Eur. Heart J. 2011 doi: 10.1093/eurheartj/ehr007.
  • 6. Cappuccio FP, D’Elia L, Strazzullo P, Miller MA. Sleep duration and all-cause mortality: A systematic review and meta-analysis of prospective studies. Sleep. 2010;33:585–592. doi: 10.1093/sleep/33.5.585.
  • 7. Reis C, et al. Sleep duration, lifestyles and chronic diseases: A cross-sectional population-based study. Sleep Sci. 2018;11:217–230. doi: 10.5935/1984-0063.20180036.
  • 8. Carney CE, et al. The consensus sleep diary: Standardizing prospective sleep self-monitoring. Sleep. 2012;35:287–302. doi: 10.5665/sleep.1642.
  • 9. Shahid A, Wilkinson K, Marcu S, Shapiro CM. STOP, THAT and One Hundred Other Sleep Scales. Springer; 2012.
  • 10. Croy I, Smith MG, Gidlöf-Gunnarsson A, Persson-Waye K. Optimal questions for sleep in epidemiological studies: Comparisons of subjective and objective measures in laboratory and field studies. Behav. Sleep Med. 2017;15:466–482. doi: 10.1080/15402002.2016.1163700.
  • 11. Buysse DJ, Reynolds CF, Monk TH, Berman SR, Kupfer DJ. The Pittsburgh sleep quality index: A new instrument for psychiatric practice and research. Psychiatry Res. 1989;28:193–213. doi: 10.1016/0165-1781(89)90047-4.
  • 12. Dietch JR, Sethi K, Slavish DC, Taylor DJ. Validity of two retrospective questionnaire versions of the Consensus Sleep Diary: The whole week and split week Self-Assessment of Sleep Surveys. Sleep Med. 2019;63:127–136. doi: 10.1016/j.sleep.2019.05.015.
  • 13. Nordin M, Åkerstedt T, Nordin S. Psychometric evaluation and normative data for the karolinska sleep questionnaire. Sleep Biol. Rhythms. 2013;11:216–226. doi: 10.1111/sbr.12024.
  • 14. Roenneberg T, Wirz-Justice A, Merrow M. Life between clocks: Daily temporal patterns of human chronotypes. J. Biol. Rhythms. 2003;18:80–90. doi: 10.1177/0748730402239679.
  • 15. Partinen M, Gislason T. Basic Nordic Sleep Questionnaire (BNSQ): A quantitated measure of subjective sleep complaints. J. Sleep Res. 1995;4:150–155. doi: 10.1111/j.1365-2869.1995.tb00205.x.
  • 16. Monk TH, et al. Measuring sleep habits without using a diary: The sleep timing questionnaire. Sleep. 2003;26:208–212. doi: 10.1093/sleep/26.2.208.
  • 17. Huang T, Redline S. Cross-sectional and prospective associations of actigraphy-assessed sleep regularity with metabolic abnormalities: The multi-ethnic study of atherosclerosis. Diabetes Care. 2019;42:1422–1429. doi: 10.2337/dc19-0596.
  • 18. Lauderdale DS. Commentary on “Agreement between simple questions about sleep duration and sleep diaries in a large online survey”. Sleep Health. 2015;1:138–139. doi: 10.1016/j.sleh.2015.03.004.
  • 19. Miller CB, et al. Agreement between simple questions about sleep duration and sleep diaries in a large online survey. Sleep Health. 2015;1:133–137. doi: 10.1016/j.sleh.2015.02.007.
  • 20. Silva GE, et al. Relationship between reported and measured sleep times. J. Clin. Sleep Med. 2007;03:622–630. doi: 10.5664/jcsm.26974.
  • 21. Matthews KA, et al. Similarities and differences in estimates of sleep duration by polysomnography, actigraphy, diary, and self-reported habitual sleep in a community sample. Sleep Health. 2018;4:96–103. doi: 10.1016/j.sleh.2017.10.011.
  • 22. Lee PH. Validation of the National Health And Nutritional Survey (NHANES) single-item self-reported sleep duration against wrist-worn accelerometer. Sleep Breath. 2022;26:2069–2075. doi: 10.1007/s11325-021-02542-6.
  • 23. Robbins R, et al. Self-reported sleep duration and timing: A methodological review of event definitions, context, and timeframe of related questions. Sleep Epidemiol. 2021;1:100016. doi: 10.1016/j.sleepe.2021.100016.
  • 24. McCarter SJ, et al. Physiological markers of sleep quality: A scoping review. Sleep Med. Rev. 2022;64:101657. doi: 10.1016/j.smrv.2022.101657.
  • 25. Santos RB, et al. Prevalence and predictors of under or overestimation sleep duration in adults: The ELSA-Brasil study. Sleep Epidemiol. 2021;1:100013. doi: 10.1016/j.sleepe.2021.100013.
  • 26. Muzni K, Groeger JA, Dijk D, Lazar AS. Self-reported sleep quality is more closely associated with mental and physical health than chronotype and sleep duration in young adults: A multi-instrument analysis. J. Sleep Res. 2021;30:e13152. doi: 10.1111/jsr.13152.
  • 27. Morin CM, Belleville G, Bélanger L, Ivers H. The insomnia severity index: Psychometric indicators to detect insomnia cases and evaluate treatment response. Sleep. 2011;34:601–608. doi: 10.1093/sleep/34.5.601.
  • 28. Korman M, et al. COVID-19-mandated social restrictions unveil the impact of social time pressure on sleep and body clock. Sci. Rep. 2020;10:22225. doi: 10.1038/s41598-020-79299-7.
  • 29. Roenneberg T, Pilz LK, Zerbini G, Winnebeck EC. Chronotype and social jetlag: A (self-) critical review. Biology (Basel) 2019;8:54. doi: 10.3390/biology8030054.
  • 30. Merikanto I, et al. Disturbances in sleep, circadian rhythms and daytime functioning in relation to coronavirus infection and Long-COVID—A multinational ICOSS study. J. Sleep Res. 2022;31:e13542. doi: 10.1111/jsr.13542.
  • 31. Jackson CL, et al. 0694 Concordance between self-reported and objectively-assessed sleep duration among African–American adults: Findings from the Jackson Heart Sleep Study. Sleep. 2019;42:A278–A278. doi: 10.1093/sleep/zsz067.692.
  • 32. Fernandez-Mendoza J, et al. Sleep misperception and chronic insomnia in the general population: Role of objective sleep duration and psychological profiles. Psychosom. Med. 2011;73:88–97. doi: 10.1097/PSY.0b013e3181fe365a.
  • 33. Evangelista E, et al. Characteristics associated with hypersomnia and excessive daytime sleepiness identified by extended polysomnography recording. Sleep. 2021;44:zsaa264. doi: 10.1093/sleep/zsaa264.
  • 34. Choi SJ, Suh S, Ong J, Joo EY. Sleep misperception in chronic insomnia patients with obstructive sleep apnea syndrome: Implications for clinical assessment. J. Clin. Sleep Med. 2016;12:1517–1525. doi: 10.5664/jcsm.6280.
  • 35. Pilz L, Keller L, Lenssen D, Roenneberg T. Time to rethink sleep quality: PSQI scores reflect sleep quality on workdays. Sleep. 2018;2:zsy029. doi: 10.1093/sleep/zsy029.
  • 36. Reis C, Pilz LK, Keller LK, Paiva T, Roenneberg T. Social timing influences sleep quality in patients with sleep disorders. Sleep Med. 2020;71:8–17. doi: 10.1016/j.sleep.2020.02.019.
  • 37. Fatima Y, Doi SAR, Najman JM, Al Mamun A. Exploring gender difference in sleep quality of young adults: Findings from a large population study. Clin. Med. Res. 2016;14:138–144. doi: 10.3121/cmr.2016.1338.
  • 38. Hirshkowitz M, et al. National Sleep Foundation’s updated sleep duration recommendations: Final report. Sleep Health. 2015;1:233–243. doi: 10.1016/j.sleh.2015.10.004.
  • 39. Leone MJ, Sigman M. Effects of lockdown on human sleep and chronotype during the COVID-19 pandemic. Curr. Biol. 2020;30:R905–R931. doi: 10.1016/j.cub.2020.07.015.
  • 40. Scarpelli S, et al. Subjective sleep alterations in healthy subjects worldwide during COVID-19 pandemic: A systematic review, meta-analysis and meta-regression. Sleep Med. 2022;100:89–102. doi: 10.1016/j.sleep.2022.07.012.
  • 41. Brandão LEM, et al. Social jetlag changes during the COVID-19 pandemic as a predictor of insomnia—A multi-national survey study. Nat. Sci. Sleep. 2021;13:1711–1722. doi: 10.2147/NSS.S327365.
  • 42. Bliwise DL, Young TB. The parable of parabola: What the U-shaped curve can and cannot tell us about sleep. Sleep. 2007;30:1614–1615. doi: 10.1093/sleep/30.12.1614.
  • 43. Lauderdale DS, Knutson KL, Yan LL, Liu K, Rathouz PJ. Self-reported and measured sleep duration. Epidemiology. 2008;19:838–845. doi: 10.1097/EDE.0b013e318187a7b0.
  • 44. Bianchi MT, Thomas RJ, Westover MB. An open request to epidemiologists: Please stop querying self-reported sleep duration. Sleep Med. 2017;35:92–93. doi: 10.1016/j.sleep.2017.02.001.
  • 45. Schurhoff N, Toborek M. Circadian rhythms in the blood–brain barrier: Impact on neurological disorders and stress responses. Mol. Brain. 2023;16:5. doi: 10.1186/s13041-023-00997-0.
  • 46. Zavada A, Gordijn MCM, Beersma DGM, Daan S, Roenneberg T. Comparison of the Munich Chronotype Questionnaire with the Horne–Östberg’s morningness–eveningness score. Chronobiol. Int. 2005;22:267–278. doi: 10.1081/CBI-200053536.
  • 47. Roenneberg T, Daan S, Merrow M. The art of entrainment. J. Biol. Rhythms. 2003;18:183–194. doi: 10.1177/0748730403018003001.
  • 48. Evanger LN, et al. Later school start time is associated with longer school day sleep duration and less social jetlag among Norwegian high school students: Results from a large-scale, cross-sectional study. J. Sleep Res. 2023;32:e13840. doi: 10.1111/jsr.13840.
  • 49. St-Onge M-P, et al. Information on bedtimes and wake times improves the relation between self-reported and objective assessments of sleep in adults. J. Clin. Sleep Med. 2019;15:1031–1036. doi: 10.5664/jcsm.7888.
  • 50. Ghotbi N, et al. The µMCTQ: An ultra-short version of the Munich ChronoType Questionnaire. J. Biol. Rhythms. 2019 doi: 10.1177/0748730419886986.
  • 51. Bland JM, Altman DG. Comparing methods of measurement: Why plotting difference against standard method is misleading. Lancet. 1995;346:1085–1087. doi: 10.1016/S0140-6736(95)91748-9.
  • 52. Soriano JB, Murthy S, Marshall JC, Relan P, Diaz JV. A clinical case definition of post-COVID-19 condition by a Delphi consensus. Lancet Infect. Dis. 2022;22:e102–e107. doi: 10.1016/S1473-3099(21)00703-9.

