Sleep. 2010 Nov 1;33(11):1539–1549.

Reliability and Validity of the Brief Insomnia Questionnaire in the America Insomnia Survey

Ronald C Kessler 1, Catherine Coulouvrat 2, Goeran Hajak 2, Matthew D Lakoma 1, Thomas Roth 3, Nancy Sampson 1, Victoria Shahly 1, Alicia Shillington 4, Judith J Stephenson 5, James K Walsh 6, Gary K Zammit 7
PMCID: PMC2954704  PMID: 21102996

Abstract

Study Objectives:

To evaluate the reliability and validity of the Brief Insomnia Questionnaire (BIQ), a fully structured questionnaire developed to diagnose insomnia according to hierarchy-free Diagnostic and Statistical Manual, Fourth Edition, Text Revision (DSM-IV-TR), International Classification of Diseases-10 (ICD-10), and research diagnostic criteria/International Classification of Sleep Disorders-2 (RDC/ICSD-2) general criteria without organic exclusions in the America Insomnia Survey (AIS).

Design:

Probability subsamples of AIS respondents, oversampling BIQ positives, completed short-term test-retest interviews (n = 59) or clinical reappraisal interviews (n = 203) to assess BIQ reliability and validity.

Setting:

The AIS is a large (n = 10,094) epidemiologic survey of the prevalence and correlates of insomnia.

Participants:

Adult subscribers to a national managed healthcare plan.

Intervention:

None

Measurements and Results:

BIQ test-retest correlations were 0.47-0.94 for nature of the sleep problems (initiation, maintenance, nonrestorative sleep [NRS]), 0.72-0.95 for problem frequency, 0.66-0.88 for daytime impairment/distress, and 0.62 for duration of sleep. Good individual-level concordance was found between BIQ diagnoses and diagnoses based on expert interviews for meeting hierarchy-free inclusion criteria for diagnoses in any of the diagnostic systems, with area under the receiver operating characteristic curve (AUC, a measure of classification accuracy insensitive to disorder prevalence) of 0.86 for dichotomous classifications. The AUC increased to 0.94 when symptom-level data were added to generate continuous predicted-probability of diagnosis measures. The AUC was lower for dichotomous classifications based on RDC/ICSD-2 (0.68) and ICD-10 (0.70) than for DSM-IV-TR (0.83) criteria but increased consistently when symptom-level data were added to generate continuous predicted-probability measures of RDC/ICSD-2, ICD-10, and DSM-IV-TR diagnoses (0.92-0.95).

Conclusions:

These results show that the BIQ generates accurate estimates of the prevalence and correlates of hierarchy-free insomnia in the America Insomnia Survey.

Citation:

Kessler RC; Coulouvrat C; Hajak G; Lakoma MD; Roth T; Sampson N; Shahly V; Shillington A; Stephenson JJ; Walsh JK; Zammit GK. Reliability and validity of the Brief Insomnia Questionnaire in the America Insomnia Survey. SLEEP 2010;33(11):1539-1549.

Keywords: Insomnia, reliability, validity, epidemiology, DSM-IV, ICD-10, ICSD-2, RDC


This paper presents data on the reliability and validity of a new fully structured measure of insomnia that was developed for use in the America Insomnia Survey (AIS), a large (n = 10,094) epidemiologic survey of the prevalence and correlates of insomnia in a sample of subscribers to a large, national, managed healthcare plan in the United States. This Brief Insomnia Questionnaire (BIQ) was designed to be administered by an interviewer. It generates insomnia diagnoses according to the inclusion criteria of the Diagnostic and Statistical Manual, Fourth Edition, Text Revision (DSM-IV-TR),1 International Classification of Diseases-10 (ICD-10),2 research diagnostic criteria (RDC),3 and International Classification of Sleep Disorders-2 (ICSD-2)4 systems, but without operationalizing the diagnostic hierarchy rules or organic exclusion rules in these systems. One important aim of the AIS is to compare the prevalence and correlates of insomnia across these diagnostic systems to determine the implications of using one system rather than another. Although a number of self-report scales or fully structured interviews have shown good concordance with clinical diagnoses either in patient samples5–9 or community samples comparing insomniacs with good sleepers,10–12 none of the available scales or interviews generates insomnia diagnoses according to the definitions and general criteria of all the systems of interest to the AIS: DSM-IV-TR, ICD-10, and RDC/ICSD-2. Rather than use multiple existing measures for this purpose, we developed the BIQ to assess insomnia according to all of these diagnostic systems. This was done by developing a set of questions similar to those in other brief screening measures that were judged by an advisory group of experts in sleep medicine to have good face validity in operationalizing each of the relevant criteria in each of the 3 diagnostic systems.

Analysis of the psychometric properties of the BIQ proceeded in 4 steps. First, we investigated short-term test-retest reliability of each questionnaire component in a small (n = 59) probability subsample of AIS respondents, oversampling BIQ positives, who were readministered the BIQ 2 days after the AIS interview. The retest interval was purposefully kept short to guarantee that there was very little change in the recall period (the 30 days before interview) and that any lack of concordance in BIQ reports in the test and retest interviews was due to unreliability rather than to true change. Second, we examined the aggregate consistency of prevalence estimates based on the BIQ with those based on independent, semistructured, clinical reappraisal interviews in a probability subsample of AIS respondents (n = 203). As in the test-retest sample, BIQ positives were oversampled in the clinical reappraisal sample. Third, we examined individual-level concordance of diagnoses based on the BIQ and clinical interviews. Fourth, we explored the possibility of improving concordance by developing prediction equations where BIQ item-level data were added to the diagnostic classifications to predict clinical diagnoses.

METHODS

The Main AIS Sample

The AIS was carried out between October 2008 and July 2009 in a stratified probability sample of 10,094 adult (ages 18 years and older) members of a large (more than 34 million members) national commercial health plan in the United States. The sample was restricted to fully insured members enrolled for at least 12 months to allow medical and pharmacy claims data to be used in substantive analyses. Sample eligibility was also limited to members who provided the plan with a telephone number, could speak English, and had no impairment that limited their ability to be interviewed over the telephone. The sample was selected with stratification to match the United States census population distribution on the cross-classification of age (18-34, 35-49, 50-64, 65-74 and 75+), sex, urbanicity (Census Standard Metropolitan Statistical Areas [SMSA], non-SMSA urbanized areas, and rural areas), and census region (Northeast, South, Midwest, and West).

The AIS was designed to obtain 10,000 completed interviews. Target respondents were sampled in random replicates of 50, which were constructed to be representative of the population on the sociodemographic and geographic selection criteria. Information about previous diagnoses or treatment of sleep problems was ignored in selecting the sample so as to make it representative of all plan subscribers. An introductory letter was sent to target respondents a few days before attempts to make telephone contact began. The letter explained that the survey was being carried out “to better understand how health and health problems affect the daily lives of people,” that respondents were selected randomly, that participation was voluntary, that responses were completely confidential, that participation would not affect healthcare benefits in any way, and that a $20 incentive was being offered for participation. A toll-free number was included for respondent questions and opting out. Once respondents were contacted, verbal informed consent was obtained before beginning interviews. The Human Subjects Committee of the New England Institutional Review Board approved these recruitment, consent, and field procedures.

Up to 20 call attempts were made to reach each target respondent. Disconnected numbers were traced. Forwarded numbers were followed. The interview, which used computer-assisted telephone administration, averaged 39 minutes (with a standard deviation of 6 minutes). Additional replicates were released as earlier ones were completed. All cases in all released replicates were resolved (i.e., an interview, a refusal, or a termination due to reaching the 20-attempt limit), resulting in a total of 10,094 completed interviews. The cooperation rate (the rate of survey completion among target respondents with known working telephone numbers, including respondents who were never reached) was 65.0%. The 10,094 interviews were weighted for residual discrepancies between the joint distribution in the sample and the census population on the cross-classification of the sociodemographic and geographic selection criteria, although these variables were only weakly related to the probability of participation.

The Test-Retest Sample

A probability subsample of 59 AIS respondents was administered a second brief telephone retest interview 2 days after their AIS interview to establish the test-retest reliability of the BIQ symptom questions. This test-retest sample oversampled BIQ positives. The target-sample size (n = 50), which we somewhat exceeded, was selected to yield 0.90 power to detect a Pearson correlation of 0.40. The final retest sample included 23 respondents classified by the initial survey as having insomnia, 12 classified as subthreshold cases (i.e., respondents who reported meeting Criterion A: at least 1 sleep problem, 3 or more days a week, 30 minutes a night, 1 month or longer, but who did not meet full criteria), and 24 randomly selected respondents representing all others in the AIS sample (i.e., those who did not meet Criterion A). Respondents in the third of these sampling strata are usually referred to as good sleepers. As noted below, comparisons of respondents screened positive for insomnia only with good sleepers (i.e., excluding respondents in the rather substantial intermediate group of people who have at least some symptoms but do not meet full criteria on the insomnia-screening scale) lead to biased estimates of the accuracy of insomnia screening scales. It is consequently noteworthy that our comparison group is not limited to good sleepers but also includes this intermediate stratum of respondents with some insomnia symptoms. The sample was weighted so that the weighted proportion of respondents in each of the 3 sampling strata was identical to that in the total weighted AIS sample and summed to 100%. This weighting took into consideration the poststratification weight used in the main survey so that significance tests could be made using appropriate design-based methods.
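The power statement for the retest target sample size (n = 50, correlation of 0.40) can be checked approximately. The sketch below uses the Fisher z approximation, which is an assumption made here for illustration; the text does not say which power formula or which sidedness was used. The function name `power_pearson` is hypothetical.

```python
from math import atanh, sqrt
from statistics import NormalDist

def power_pearson(r, n, alpha=0.05, two_sided=True):
    """Approximate power to detect a true correlation r with n pairs.

    Uses the Fisher z transformation (an assumption of this sketch):
    atanh(sample r) is treated as normal with standard error 1/sqrt(n - 3).
    """
    z = NormalDist()
    effect = atanh(r) * sqrt(n - 3)  # expected test statistic under the alternative
    crit = z.inv_cdf(1 - alpha / 2) if two_sided else z.inv_cdf(1 - alpha)
    # The rejection region in the opposite tail is ignored; it is negligible here.
    return 1 - z.cdf(crit - effect)

power_one_sided = power_pearson(0.40, 50, two_sided=False)
power_two_sided = power_pearson(0.40, 50, two_sided=True)
print(round(power_one_sided, 2), round(power_two_sided, 2))
```

Under this approximation the one-sided figure comes out close to the stated 0.90, and the two-sided figure is somewhat lower.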

Because the recall period for the assessment of insomnia in both the AIS and the retest survey was the past 30 days, the retest interviews were carried out 2 days after the main survey interviews to guarantee that no meaningful true change in insomnia status occurred in the retest interval, compared with the recall interval. As is typical in short-term test-retest studies, retest respondents were informed that we were testing the instrument and not testing the accuracy of respondent reports and were instructed not to try to remember their responses in the initial interview but, rather, to respond in the way they currently thought provided the most accurate answers to the questions. Respondents targeted for retest interviews received verbal notice of selection at the end of their AIS interview. Verbal informed consent was then obtained before scheduling the retest interview for within the next 2 days. Respondents were paid $20 for participation in the retest interview. The Human Subjects Committee of the New England Institutional Review Board approved these recruitment, consent, and field procedures.

The Clinical Reappraisal Sample

A separate (from the test-retest sample) probability subsample of 203 AIS respondents was administered a telephone, clinical reappraisal interview (described below) within 2 weeks of their AIS interview to assess the accuracy of the insomnia diagnoses in the main interview. The target sample size (n = 200) was selected to yield a standard error of 0.10 in estimating Cohen's κ,13 a standard measure of concordance in clinical reappraisal studies. We slightly exceeded the target sample size. The clinical reappraisal sample oversampled BIQ positives and used the same 3-stratum sampling scheme as in the test-retest sample. The sample included 83 respondents classified in the main survey as having insomnia, 63 subthreshold cases, and 57 respondents representing all others in the AIS sample (i.e., those who did not meet BIQ criteria for either insomnia or subthreshold insomnia).

As in the test-retest survey, respondents targeted for the clinical reappraisal survey received verbal notice of selection at the end of their AIS interview and provided verbal informed consent before scheduling their clinical reappraisal interviews. Each respondent was paid $20 for participation in the clinical reappraisal survey. The Human Subjects Committee of the New England Institutional Review Board approved these recruitment, consent, and field procedures. The completed clinical reappraisal interviews were weighted to adjust for the oversampling of BIQ positives. As in the test-retest sample, this weighting took into consideration the poststratification weight used in the main survey so that significance tests could be made using appropriate design-based methods.

The Assessment of Insomnia

The BIQ

The BIQ was designed to assess current insomnia by operationalizing DSM-IV-TR inclusion Criteria A (predominant complaint of difficulty initiating or maintaining sleep or NRS for at least 1 month) and B (the sleep disturbance or associated daytime fatigue causes clinically significant distress or impairment) for a diagnosis of primary insomnia; ICD-10 inclusion Criteria A (complaint of difficulty falling asleep or maintaining sleep or poor quality sleep), B (at least 3 times per week for at least 1 month), C (preoccupation with sleeplessness and excessive concern over consequences), and D (marked distress or interference with activities of daily living) for a diagnosis of non-organic insomnia; and RDC/ICSD-2 inclusion Criteria A (difficulty initiating or maintaining sleep or waking up too early or chronically NRS), B (difficulty occurs despite adequate opportunity and circumstances for sleep), and C (daytime impairment related to the nighttime sleep difficulty) for a diagnosis of insomnia disorder. (The full text of the BIQ and coding rules for diagnoses are available at www.hcp.med.harvard.edu/wmh/affiliated_studies.php. The instrument is in the public domain and can be used by other investigators without restriction.) It should be recalled that RDC and ICSD-2 general criteria for insomnia were developed to be identical, except that the former are intended for research applications and the latter for clinical use.14 That is why we refer to these as defining RDC/ICSD-2 insomnia.

Due to difficulties associated with distinguishing primary insomnia from insomnia due to physical, mental, or other sleep disorders or due to substance or alcohol use,15 no attempt was made to operationalize the diagnostic hierarchy rules in DSM-IV-TR Criteria C and D (the sleep disturbance does not occur exclusively during the course of narcolepsy, breathing-related sleep disorder, circadian rhythm sleep disorder, parasomnia, or another mental disorder) or the organic exclusion rules in DSM-IV Criterion E (the sleep disturbance is not due to the direct physiological effects of a substance or general medical condition) or to specify DSM-IV-TR or RDC/ICSD-2 primary insomnia or ICD-10 non-organic insomnia in terms of their respective subtypes. It is noteworthy in this regard that 1 major revision under consideration for DSM-V is to eliminate the current DSM-IV distinction between primary insomnia and sleep disorders due to another mental disorder or a general medical condition in favor of a unitary diagnosis of insomnia disorder with concurrent specification of clinically comorbid conditions.16 This revision would avoid causal attributions for comorbid conditions, a move consistent with the recommendations of the 2005 National Institutes of Health State of the Science position on the classification of insomnia disorders.17 Importantly, the AIS included fully structured assessments of many of the physical and mental disorders known to be associated with insomnia so that comorbidity could be studied empirically. In addition, medical and pharmacy claims data for the 12 months before the interview were obtained from the health plan for all AIS respondents to allow an investigation of patterns of comorbidity of insomnia with diagnosed and treated comorbid conditions.

The BIQ question series began by asking respondents how many nights out of 7 in a typical week they have problems falling asleep, how many nights they have problems staying asleep throughout the night, how many mornings out of 7 they typically wake up before they want to, and how many mornings they wake up still feeling tired or unrested. Positive responses were followed with questions about how long it usually takes to fall asleep on nights with a problem falling asleep, how much time they usually spend awake at night on nights they have trouble sleeping, how many times per night they usually wake up during those nights, how long it usually takes them to get back to sleep once they wake up at night, and how much earlier than they wished they awaken in the morning when they awaken early. Respondents who reported NRS were asked to rate the severity of their problem waking up feeling tired or unrested using the response options mild, moderate, severe, and very severe.

Respondents with sleep problems were then asked how many weeks, months or years these problems had been going on in order to operationalize the 1-month duration requirement in DSM-IV-TR and ICD-10. They were also asked 2 questions about adequate opportunity to sleep prefaced with the preamble “(t)he next questions are about how much your sleep problems are caused by the place you sleep being too light, too noisy, too hot or cold, or uncomfortable.” The first question was: “How much do you think your sleep problems are caused by problems with the place you sleep—would you say not at all, a little, some, a lot, or totally?” The second question was: “Some people have sleep problems because they either have to get up very early, stay up late, or get up in the night because of their job or because of having a baby or a sick person who needs their help. How much do you think your sleep problems are caused by these kinds of demands on your time—would you say not at all, a little, some, a lot, or totally?”

Respondents with sleep problems were then asked 16 questions about daytime distress and impairment. The first 8 of these questions asked how much difficulty respondents had because of their sleep problems over the past 30 days in each of the following areas: reduced motivation; performance at work, school, or social activities; making errors or having accidents; irritability, nerves, or mood disturbance; daytime attention, concentration, or memory problems; daytime fatigue; daytime sleepiness; or tension, headaches, or digestive problems. The response options were none, mild, moderate, or severe difficulty. The next 4 distress-impairment questions were a modified version of the Sheehan Disability Scales18 that asked respondents to rate the extent to which their sleep problems interfered with their daily activities during the past 4 weeks using a 0-to-10 scale, where 0 means no interference and 10 means very severe interference. The 4 areas of role functioning were: “your home management, like cleaning, shopping, and taking care of your home; your ability to work; your social life; and your close personal relationships.” Respondents were reminded of the anchors before answering each question and were also instructed that they could use any number between 0 and 10 to answer.

The next 2 distress-impairment questions asked about days out of role due to sleep problems: “About how many days out of 30 in the past month were you totally unable to work or carry out your other usual daily activities because of problems with your sleep? About how many days out of 365 in the past year were you totally unable to work or carry out your other usual daily activities because of problems with your sleep?” The final 2 distress-impairment questions asked respondents how much concern or worry they had about their sleep (response options: none, mild, moderate, and severe) and how worried or distressed they were about their sleep problems (response options: not at all, a little, some, much, and very much).

A variety of coding schemes were used to combine BIQ question responses to generate diagnoses. The final scheme defined DSM-IV-TR Criterion A, ICD-10 Criteria A and B, and RDC/ICSD-2 Criterion A as requiring at least 30 days of problems initiating sleep 3 or more nights a week, with an average of 30 or more minutes to fall asleep at night; problems staying asleep 3 or more nights a week, with an average of 30 minutes of being awake; waking 3 or more times a night 3 or more nights a week; waking too early 3 or more nights a week, with an average of 30 or more minutes too early; or NRS with at least moderate severity 3 or more nights a week. RDC/ICSD-2 Criterion B was defined as requiring the respondent to not report that their sleep problems were caused a lot or totally by problems with the place they sleep and that the problems were not caused a lot or totally by demands on their time that required them to sleep irregularly. DSM-IV-TR Criterion B, ICD-10 Criterion D, and RDC/ICSD-2 Criterion C were defined as requiring endorsement of at least 2 (1 in the case of RDC/ICSD-2) of the distress-impairment questions with responses of at least moderate severity to the first 8 questions, 7 to 10 on items of the Sheehan Disability Scales, and either at least moderate concern or worry about sleep or much or very much worry or distress about sleep. The latter 2 items were also used to define ICD-10 Criterion C.
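The sleep-problem part of this coding scheme (the frequency, minutes, and duration thresholds) can be sketched as a single predicate. This is an illustration only, not the BIQ coding rules themselves: every field name is hypothetical, the numeric severity coding for NRS is assumed, the 30-day duration requirement is assumed to apply to all five problem types, and the count of awakenings is assumed to be tied to the nights with problems staying asleep.

```python
from dataclasses import dataclass

@dataclass
class SleepReport:
    # All field names are hypothetical; the BIQ's own variable names are not given in the text.
    duration_days: float        # how long the sleep problems have lasted
    nights_falling_asleep: int  # nights per week with problems falling asleep
    latency_min: float          # usual minutes to fall asleep on those nights
    nights_staying_asleep: int  # nights per week with problems staying asleep
    awake_min: float            # usual minutes awake at night on those nights
    awakenings: float           # usual number of awakenings on those nights
    nights_early_waking: int    # mornings per week waking too early
    early_min: float            # usual minutes earlier than wished
    nights_unrested: int        # mornings per week waking tired or unrested
    nrs_severity: int           # assumed coding: 0 none, 1 mild, 2 moderate, 3 severe, 4 very severe

MIN_NIGHTS = 3    # "3 or more nights a week"
MIN_MINUTES = 30  # "30 or more minutes"
MIN_DAYS = 30     # "at least 30 days"
MODERATE = 2      # "at least moderate severity" under the assumed coding

def meets_sleep_problem_criterion(r: SleepReport) -> bool:
    """True if any of the five qualifying sleep problems is present for 30+ days."""
    initiation = r.nights_falling_asleep >= MIN_NIGHTS and r.latency_min >= MIN_MINUTES
    time_awake = r.nights_staying_asleep >= MIN_NIGHTS and r.awake_min >= MIN_MINUTES
    many_wakings = r.nights_staying_asleep >= MIN_NIGHTS and r.awakenings >= 3
    early_waking = r.nights_early_waking >= MIN_NIGHTS and r.early_min >= MIN_MINUTES
    nonrestorative = r.nights_unrested >= MIN_NIGHTS and r.nrs_severity >= MODERATE
    any_problem = initiation or time_awake or many_wakings or early_waking or nonrestorative
    return r.duration_days >= MIN_DAYS and any_problem

# Hypothetical respondents for illustration.
initiation_case = SleepReport(90, 4, 45, 0, 0, 0, 0, 0, 0, 0)
short_duration = SleepReport(14, 4, 45, 0, 0, 0, 0, 0, 0, 0)
mild_nrs_only = SleepReport(90, 1, 10, 0, 0, 0, 0, 0, 5, 1)
```

The adequate-opportunity and daytime-impairment criteria would be separate predicates combined with this one according to each diagnostic system.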

The semistructured assessment in the clinical reappraisal survey

As noted above, the clinical reappraisal interviews were administered over the telephone. Telephone administration is now widely accepted in clinical reappraisal studies of psychiatric disorders based on evidence of validity comparable with in-person administration.19–21 The semistructured research diagnostic interview schedule was developed specifically for this project. It included DSM, ICD, and RDC/ICSD-2 symptom checklists, scripted initial questions, suggested probes, spaces to record notes, and interviewer-based rating categories of definite, probable, possible, and no for each rated symptom. Clinical interviewers were blinded to respondents' reports on the AIS when conducting clinical reappraisal interviews.

The 7 clinical interviewers were all PhD-level licensed psychologists with a specialization in the diagnosis and treatment of insomnia and between 6 and 20+ years of clinical experience diagnosing and treating sleep disorders. Five of the 7 had a Certificate in Behavioral Sleep Medicine from the American Academy of Sleep Medicine. One of the remaining 2 was both a PhD-level clinical psychologist and a registered polysomnographic technologist. The final interviewer was a PhD-level clinical psychologist who specialized in cognitive behavioral therapy for sleep disorders.

Interviewer training took place over 2 sessions that focused on review of the data-collection protocol, discussion of diagnostic criteria, and practice in administration and scoring of the semistructured clinical interview. The supervisor was a PhD-level clinical psychologist with extensive experience administering and supervising research diagnostic interviews of sleep disorders. The supervisor also provided 1-on-1 feedback to interviewers in practice and production interviews based on review of digitally recorded clinical interviews and hard-copy interviews. The supervisor independently scored 15% of these interviews. Symptom-level supervisor-interviewer interrater agreement less than κ = 0.85 prompted immediate contact of the interviewer to resolve ratings differences. Aggregate diagnosis-level supervisor-interviewer interrater agreement (κ) before reconciliation across all the dually rated interviews in this random 15% sample of clinical interviews was 0.95 for diagnoses based on DSM, 1.0 for ICD, 0.86 for RDC/ICSD-2, and 0.95 for any criteria.

Initial inspection of associations between AIS and clinical interview reports found that a number of respondents denied having any sleep problems in the clinical reappraisal interviews despite reporting sleep problems in the AIS. To address this issue of respondent-reporting inconsistency, reconciliation interviews were carried out by the clinical interviewer supervisor with 16 respondents classified as meeting criteria for insomnia according to at least 1 of the diagnostic systems in the AIS and either not meeting criteria (n = 9) or continuing to meet criteria (n = 7) in the clinical reappraisal survey. Respondents were reminded of their main survey responses at the beginning of these reconciliation interviews. Clinical probes were then used to judge whether or not these symptoms met criteria for insomnia in each of the diagnostic systems, with the supervisor blinded as to the results of the first clinical reappraisal interview. Final diagnoses of these discrepant cases were based on LEAD standard procedures22 that used the information in both of the 2 clinical interviews. The method of multiple imputation23 was used to impute final clinical diagnoses based on the results of the reconciliation interviews to discrepant cases when reconciliation interviews could not be carried out. Multiple imputation adjusts standard errors of descriptive statistics to account for the increases in measurement error introduced by imputed missing values.

Analysis Methods

After weighting the test-retest sample data to adjust for the oversampling of BIQ positives, we investigated the stability of the responses at both the aggregate and individual levels. At the aggregate level, we compared test and retest interview means. At the individual level, we calculated Pearson correlations between reports made in the 2 interviews.
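An individual-level test-retest correlation on weighted data can be sketched as a weighted Pearson correlation. The function name, the example responses, and the weights below are all hypothetical; the text does not give the exact estimator, so this is one common form rather than the authors' procedure.

```python
from math import sqrt

def weighted_pearson(x, y, w):
    """Pearson correlation of x and y with nonnegative case weights w."""
    total = sum(w)
    mean_x = sum(wi * xi for wi, xi in zip(w, x)) / total
    mean_y = sum(wi * yi for wi, yi in zip(w, y)) / total
    # Weighted cross-product and sums of squares about the weighted means.
    cov = sum(wi * (xi - mean_x) * (yi - mean_y) for wi, xi, yi in zip(w, x, y))
    var_x = sum(wi * (xi - mean_x) ** 2 for wi, xi in zip(w, x))
    var_y = sum(wi * (yi - mean_y) ** 2 for wi, yi in zip(w, y))
    return cov / sqrt(var_x * var_y)

# Hypothetical nights-per-week reports at test and retest, with hypothetical weights
# that down-weight the oversampled respondents.
test_nights = [0, 1, 3, 5, 7, 2]
retest_nights = [0, 2, 3, 4, 7, 1]
weights = [2.5, 2.5, 1.0, 0.4, 0.4, 1.0]
r = weighted_pearson(test_nights, retest_nights, weights)
```

Weighted test and retest means for the aggregate comparison are the `mean_x` and `mean_y` quantities computed inside the function.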

After weighting the clinical reappraisal sample data to adjust for the oversampling of BIQ positives, we investigated concordance between diagnoses based on the BIQ and the clinical interviews at both the aggregate and individual levels. At the aggregate level, we compared prevalence estimates based on the 2 methods using McNemar χ² tests. Individual-level diagnostic concordance was then evaluated using 2 different descriptive measures, the area under the receiver operating characteristic curve (AUC)24 and κ. Although κ is the most widely used measure of concordance in validity studies of psychiatric disorders, it has been criticized because it is dependent on prevalence and, consequently, is often low in situations in which there appears to be high agreement between low-prevalence measures.25 An important implication is that κ varies across populations that differ in prevalence even when the populations do not differ in sensitivity (SN; the percentage of true cases correctly classified by the AIS) or specificity (SP; the percentage of true noncases correctly classified). Because SN and SP are considered to be fundamental parameters, this means that the comparison of κ across different populations cannot be used to evaluate the cross-population performance of a test.
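The dependence of κ on prevalence at fixed SN and SP can be seen in a small calculation. The SN, SP, and prevalence values below are illustrative choices, not results from the AIS, and the function name is hypothetical.

```python
def kappa_from_sn_sp(sn, sp, prevalence):
    """Cohen's kappa for a 2x2 table implied by SN, SP, and true prevalence."""
    true_pos = prevalence * sn
    true_neg = (1 - prevalence) * sp
    false_pos = (1 - prevalence) * (1 - sp)
    observed = true_pos + true_neg            # observed agreement
    test_pos = true_pos + false_pos           # proportion classified positive
    # Agreement expected by chance from the two sets of marginals.
    expected = test_pos * prevalence + (1 - test_pos) * (1 - prevalence)
    return (observed - expected) / (1 - expected)

# Same SN and SP (illustrative values), three different prevalences.
kappas = {p: kappa_from_sn_sp(0.8, 0.9, p) for p in (0.05, 0.20, 0.50)}
for p, k in kappas.items():
    print(f"prevalence={p:.2f}  kappa={k:.2f}")
```

With SN and SP held constant, κ is lowest at the lowest prevalence, which is the pattern the critics of κ point to.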

Critics of κ prefer to assess concordance with measures that are a function of SN and SP. The odds ratio (OR) meets this requirement, as OR is equal to [SN × SP]/[(1 − SN) × (1 − SP)].26 However, the upper end of the OR is unbounded, making it difficult to interpret the OR as a measure of concordance. Yule's Q has been proposed as an alternative measure to resolve this problem,27 as Q is a bounded transformation of OR [Q = (OR − 1)/(OR + 1)] that ranges between −1 and +1. Q can be interpreted as the difference in the probabilities of a randomly selected clinical case and a randomly selected clinical noncase that differ in their classification in the clinical interview being correctly versus incorrectly classified by the BIQ. The difficulty with Q is that “tied pairs” (i.e., clinical cases and noncases that have the same BIQ classification) are excluded, which means that Q does not tell us about actual prediction accuracy.

AUC is a measure that resolves this problem, as AUC can be interpreted as the probability that a randomly selected clinical case will score higher on the AIS than will a randomly selected noncase. Although originally developed to study the association between a continuous predictor and a dichotomous outcome, AUC can be used in the special case in which the predictor is a dichotomy, in which case AUC equals (SN + SP)/2. As a result of this useful interpretation, we focus on AUC in our evaluation of concordance between diagnoses based on the AIS and the clinical interviews. We also report SN and SP, the key components of AUC in the dichotomous case, as well as positive predictive value (PPV; the proportion of BIQ cases confirmed by the clinical interviews), negative predictive value (NPV; the proportion of BIQ noncases confirmed as noncases by the clinical interviews), and κ.
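The three SN/SP-based measures discussed above (OR, Yule's Q, and the dichotomous-case AUC) follow directly from the formulas in the text. The sketch below uses illustrative SN and SP values rather than AIS results, and the function name is hypothetical.

```python
def concordance_from_sn_sp(sn, sp):
    """Return (odds ratio, Yule's Q, AUC) for a dichotomous classifier.

    Requires sn < 1 and sp < 1; otherwise the odds ratio is undefined (division by zero).
    """
    odds_ratio = (sn * sp) / ((1 - sn) * (1 - sp))
    yules_q = (odds_ratio - 1) / (odds_ratio + 1)  # bounded between -1 and +1
    auc = (sn + sp) / 2                            # dichotomous-predictor special case
    return odds_ratio, yules_q, auc

odds_ratio, yules_q, auc = concordance_from_sn_sp(0.8, 0.9)
print(f"OR={odds_ratio:.1f}  Q={yules_q:.3f}  AUC={auc:.2f}")
```

The example shows how a large, hard-to-interpret OR maps to a Q near its upper bound, while AUC stays on the more interpretable probability scale.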

We next estimated a series of stepwise logistic-regression equations, in which clinical diagnoses were treated as dichotomous outcomes and BIQ symptom variables were included along with BIQ diagnoses as predictors, to determine whether BIQ symptom-level data improve the prediction of clinical diagnoses, compared with prediction based on BIQ diagnoses. Significant improvement of this sort can be used to generate predicted probabilities of clinical diagnoses for each survey respondent not in the clinical reappraisal sample. As discussed in more detail elsewhere,28 diagnostic imputations based on these predicted probabilities can then be used to make estimates of the prevalence and correlates of clinical diagnoses in the full sample so as to incorporate the analysis of validity into substantive investigations. For example, it is possible in this way to carry out parallel analyses of the extent to which the correlates of predicted clinical diagnoses differ from the correlates of BIQ diagnoses.

The AUC was the statistic used to describe these improvements. As noted above, the AUC is typically used with a dimensional predictor and a dichotomous outcome. As a result, it is a simple matter to think of the AUC as the association between a predicted probability of a dichotomous outcome, in our case based on a prediction either from the dichotomous BIQ case classification or from a logistic-regression equation containing both BIQ diagnoses and symptom measures as predictors, and the observed classification on the outcome. This makes it possible to evaluate the extent to which AUC increases as more complex predictors are added to an equation over and above the initial BIQ dichotomous diagnostic classification.
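The AUC for either kind of predictor can be computed by comparing every clinical case with every clinical noncase. The sketch below counts tied pairs as one half, which is the usual convention and the one under which the dichotomous result equals (SN + SP)/2. The scores are hypothetical, and this is an unweighted illustration, not the design-based estimate used in the paper.

```python
def auc_pairwise(case_scores, noncase_scores):
    """Share of case/noncase pairs in which the case scores higher; ties count 1/2."""
    wins = 0.0
    for c in case_scores:
        for n in noncase_scores:
            if c > n:
                wins += 1.0
            elif c == n:
                wins += 0.5
    return wins / (len(case_scores) * len(noncase_scores))

# Dichotomous predictor: 8 of 10 cases positive (SN = 0.8), 9 of 10 noncases negative (SP = 0.9).
cases_binary = [1] * 8 + [0] * 2
noncases_binary = [1] * 1 + [0] * 9
auc_dichotomous = auc_pairwise(cases_binary, noncases_binary)

# Continuous predictor: hypothetical predicted probabilities from a prediction equation.
cases_prob = [0.9, 0.7, 0.6]
noncases_prob = [0.8, 0.3, 0.2, 0.1]
auc_continuous = auc_pairwise(cases_prob, noncases_prob)
```

A continuous predicted probability breaks many of the ties that a dichotomous classification leaves, which is how adding symptom-level predictors can raise the AUC.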

Statistical significance was consistently evaluated in all of the above analyses using 0.05-level 2-sided tests. The Taylor series linearization method29 was used to adjust estimates of statistical significance for the effects of weighting. All the standard errors reported here are design-based estimates that were obtained using the Taylor series method.

RESULTS

Test-Retest Reliability

Good short-term (2 days) test-retest reliability was found for most core BIQ items. Test-retest Pearson correlations are in the range 0.72 to 0.95 for reports of the number of nights in a typical week that respondents have problems falling asleep, staying asleep, waking too early, and feeling unrested after a full night's sleep (Table 1). Correlations are also quite high for reports of how long it takes to fall asleep (0.75), amount of time awake at night (0.80), and average number of nighttime awakenings (0.94). Correlations for assessments of daytime impairment and distress due to sleep problems are in the range 0.66 to 0.88.
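For readers who want to see how a weighted test-retest Pearson correlation of this kind can be computed, here is a minimal sketch. The function name, the nights-per-week values, and the weights are hypothetical; the sketch does not reproduce the design-based estimation used for the reported results.

```python
import math


def weighted_pearson_r(x, y, w):
    """Pearson correlation between x and y using case weights w."""
    total_w = sum(w)
    mean_x = sum(wi * xi for wi, xi in zip(w, x)) / total_w
    mean_y = sum(wi * yi for wi, yi in zip(w, y)) / total_w
    sxy = sum(wi * (xi - mean_x) * (yi - mean_y) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - mean_x) ** 2 for wi, xi in zip(w, x))
    syy = sum(wi * (yi - mean_y) ** 2 for wi, yi in zip(w, y))
    return sxy / math.sqrt(sxx * syy)


# Hypothetical nights per week with a sleep problem at test and retest.
time1 = [0, 1, 3, 7, 2, 0, 5]
time2 = [0, 2, 3, 6, 1, 0, 5]
weights = [2.0, 2.0, 0.5, 0.5, 1.0, 2.0, 0.5]  # made-up sampling weights
r = weighted_pearson_r(time1, time2, weights)
print(round(r, 2))
```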

Table 1.

Short-term (2 days) test-retest reliability of continuous Brief Insomnia Questionnaire component measures of sleep problems in the weighted America Insomnia Survey test-retest sample of 59 adultsa

| Measure | Time 1, mean (SD) | Time 2, mean (SD) | Pearson r |
| --- | --- | --- | --- |
| I. Number of nights per week | | | |
| Problems falling asleep | 0.9 (1.6) | 1.0 (1.8) | 0.87c |
| Waking in the night | 1.3 (2.3) | 1.2 (2.2) | 0.95c |
| Waking too early in the morning | 2.4 (2.5) | 2.2 (2.3) | 0.72c |
| Nonrestorative sleep | 3.3b (2.2) | 2.8 (2.4) | 0.80c |
| II. Severity on nights of occurrence | | | |
| Sleep latency, min | 26.0 (43.6) | 27.1 (43.7) | 0.75c |
| Amount of time awake at night, min | 29.0 (63.2) | 19.1 (41.5) | 0.80b |
| Number of awakenings per night | 0.8b (1.6) | 0.7 (1.3) | 0.94c |
| Earliness of waking in morning, min | 26.9 (46.2) | 18.8 (26.6) | 0.51b |
| Severity of nonrestorative sleep | 1.1 (0.6) | 1.0 (0.6) | 0.47c |
| III. Daytime impairment/distress | | | |
| Difficulties caused by sleep problemsd | 1.5b (0.5) | 1.4 (0.4) | 0.87c |
| Sheehan Disability Scalese | 0.9b (1.3) | 0.6 (1.0) | 0.88c |
| Days out of rolef | 0.2 (1.0) | 0.1 (0.6) | 0.66c |
| Concern/worry/distress about sleepg | −0.1 (0.8) | −0.2 (0.9) | 0.71c |
| IV. Duration of sleep problems | | | |
| Number of months | 53.8 (89.6) | 42.5 (57.8) | 0.62c |
a The data were weighted to adjust for the oversampling of respondents classified in the Brief Insomnia Questionnaire as meeting criteria for insomnia or subthreshold insomnia.

b Significant within-individual difference in scores in the test and retest interviews based on a 0.05-level, 2-sided test.

c Significant correlation between test and retest scores based on a 0.05-level, 2-sided test.

d Mean responses to 8 questions about how much difficulty respondents had because of their sleep problems over the past 30 days in reduced motivation; performance at work, school, or social activities; making errors or having accidents; irritability, nerves, or mood disturbance; daytime attention, concentration, or memory problems; daytime fatigue; daytime sleepiness; tension headaches; or digestive problems. Response options were 1-4 (none, mild, moderate, severe).

e Mean responses to 4 questions about how much sleep problems interfered with daily activities during the past 4 weeks in home management, ability to work, social life, and close personal relationships. Response options were 0-10 (none to very severe interference).

f Days out of role (0-30) due to sleep problems in the past month.

g Standardized (to mean 0 and standard deviation 1.0) responses to 2 questions about concern/worry (none, mild, moderate, and severe) and worry/distress (not at all, a little, some, much, and very much) about sleep problems.

Four of the BIQ symptom reports have test-retest correlations lower than 0.70, which is generally considered the minimum acceptable level.30 One of these 4, severity of NRS (0.47), is unstable because only 7 retest respondents reported NRS. Two others, amount of time waking too early in the morning (0.51) and number of days out of role due to sleep problems in the past month (0.66), are influenced by wide observed response ranges. When responses are dichotomized (0-29 vs 30+ minutes waking too early and 0 vs 1+ days out of role), the test-retest tetrachoric correlations become much higher (0.86 for waking too early; 0.84 for days out of role). The low correlation for the final symptom report, number of months with sleep problems (0.62), is due to instability in the reports of respondents with subthreshold sleep problems. The correlation is much higher (0.81) among respondents who reported in the AIS that their sleep problems occur at least 3 nights a week.
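The dichotomization step can be sketched as follows. The estimation method behind the reported tetrachoric correlations is not specified in this section, so the sketch swaps in the well-known cosine-pi approximation to the tetrachoric correlation, which is based on the odds ratio of the 2 × 2 table. The function names and the minutes values are hypothetical.

```python
import math


def dichotomize(values, cutoff):
    """Code each value as 1 if it is at or above the cutoff, else 0."""
    return [1 if v >= cutoff else 0 for v in values]


def tetrachoric_cos_pi(x, y):
    """Cosine-pi approximation to the tetrachoric correlation of two
    binary variables (an approximation, not a maximum-likelihood estimate)."""
    a = sum(1 for i, j in zip(x, y) if i == 1 and j == 1)
    b = sum(1 for i, j in zip(x, y) if i == 1 and j == 0)
    c = sum(1 for i, j in zip(x, y) if i == 0 and j == 1)
    d = sum(1 for i, j in zip(x, y) if i == 0 and j == 0)
    if b * c == 0:
        # Infinite odds ratio; undefined if the concordant product is also 0.
        return 1.0 if a * d > 0 else float("nan")
    if a * d == 0:
        return -1.0
    odds_ratio = (a * d) / (b * c)
    return math.cos(math.pi / (1.0 + math.sqrt(odds_ratio)))


# Hypothetical minutes of waking too early at test and retest.
minutes_t1 = [45, 60, 30, 90, 40, 0, 10, 5, 20, 15]
minutes_t2 = [50, 35, 30, 60, 10, 0, 5, 40, 0, 20]
early_t1 = dichotomize(minutes_t1, 30)  # 0-29 vs 30+ minutes
early_t2 = dichotomize(minutes_t2, 30)
r_tet = tetrachoric_cos_pi(early_t1, early_t2)
print(round(r_tet, 2))
```

Dichotomizing removes the influence of a few extreme values at the upper end of a wide response range, which is why agreement on the coarser classification can be much higher than the correlation of the raw reports.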

Mean values of BIQ symptom reports are generally similar for the test and retest interviews, although the within-person change scores are statistically significant for 4 items (number of nights of NRS, number of awakenings per night, difficulties caused by sleep problems, and the Sheehan Disability Scales), all with higher means in the test than in the retest. This indicates that the test-retest correlations are likely to be downwardly biased by effects of repeat administration on reduced reports.

Concordance of Diagnoses Based on the BIQ and Clinical Interviews

Prevalence estimates based on the BIQ were compared with those based on clinical interviews in the clinical reappraisal sample for each of the diagnostic systems. McNemar tests show that the prevalence estimates differ significantly for DSM-IV-TR, RDC/ICSD-2, and any diagnosis but not for ICD-10 diagnoses (Table 2). Prevalence estimates based on the BIQ were lower than those based on the clinical interviews, resulting in PPV consistently being higher than SN.
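The logic of this comparison can be sketched with an unweighted McNemar test on paired classifications. The counts below are hypothetical, and the simple formula will not reproduce the design-based, weighted test statistics reported in Table 2; the sketch only shows why a screening measure that identifies fewer cases than the clinical interview has PPV higher than SN.

```python
def mcnemar_chi2(screen, clinical):
    """Unweighted McNemar chi-square (1 df, no continuity correction)
    comparing the marginal prevalence of two paired classifications."""
    screen_only = sum(1 for s, c in zip(screen, clinical) if s == 1 and c == 0)
    clinical_only = sum(1 for s, c in zip(screen, clinical) if s == 0 and c == 1)
    if screen_only + clinical_only == 0:
        return 0.0
    return (screen_only - clinical_only) ** 2 / (screen_only + clinical_only)


# Hypothetical paired data built from 2 x 2 counts.
tp, fp, fn, tn = 40, 5, 20, 135
screen = [1] * tp + [1] * fp + [0] * fn + [0] * tn
clinical = [1] * tp + [0] * fp + [1] * fn + [0] * tn

chi2 = mcnemar_chi2(screen, clinical)
ppv = tp / (tp + fp)  # proportion of screened positives who are clinical cases
sn = tp / (tp + fn)   # proportion of clinical cases who are screened positive
print(chi2, round(ppv, 3), round(sn, 3))
```

Because PPV and SN share the same numerator (true positives), the one with the smaller denominator is larger: when screened positives are fewer than clinical cases, PPV exceeds SN.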

Table 2.

Consistency of diagnoses based on the Brief Insomnia Questionnaire with diagnoses based on clinical interviews in the weighted America Insomnia Survey clinical reappraisal sample of 203 adultsa

| Statistic | DSM-IV-TR | ICD-10 | RDC/ICSD-2 | Any |
| --- | --- | --- | --- | --- |
| McNemar χ2 testa | 760.2b | 0.0 | 560.0b | 661.4b |
| κ | 0.72 (0.06) | 0.40 (0.12) | 0.42 (0.06) | 0.77 (0.06) |
| AUC | 0.83 (0.05) | 0.70 (0.13) | 0.68 (0.05) | 0.86 (0.05) |
| OR (95% CI) | 141.6 (114.7-174.8) | 31.0 (24.4-39.4) | 11.7 (10.0-13.7) | 229.1 (181.1-289.7) |
| SN | 67.6 (5.3) | 42.5 (13.6) | 41.7 (5.9) | 72.6 (5.1) |
| SP | 98.5 (0.8) | 97.7 (0.9) | 94.4 (1.9) | 98.9 (0.7) |
| PPV | 95.5 (2.2) | 42.7 (11.9) | 70.9 (6.2) | 96.7 (1.9) |
| NPV | 86.9 (3.8) | 97.7 (1.1) | 83.1 (4.4) | 88.7 (3.5) |

Data are shown as mean (SEM), except odds ratio (OR) and 95% confidence interval of the OR (95% CI).

DSM-IV-TR refers to Diagnostic and Statistical Manual, Fourth Edition, Text Revision; ICD-10, International Classification of Diseases-10; RDC/ICSD-2, research diagnostic criteria and International Classification of Sleep Disorders-2; AUC, area under the receiver operating characteristic curve; SN, sensitivity; SP, specificity; PPV, positive predictive value; NPV, negative predictive value.

a The data were weighted to adjust for the oversampling of respondents classified in the fully structured America Insomnia Survey interviews as meeting criteria for insomnia or subthreshold insomnia.

b Significant at the 0.05 level, 2-sided test.

Using established descriptors for the values of κ,31 individual-level concordance between diagnoses based on the BIQ and the clinical interviews is substantial (κ in the range 0.61-0.80; AUC in the range 0.81-0.90) for diagnoses based on DSM-IV-TR and any criteria, fair to moderate (κ in the range 0.41-0.60; AUC in the range 0.61-0.70) for diagnoses based on RDC/ICSD-2 criteria, and fair (κ in the range 0.20-0.40; AUC in the range 0.61-0.70) for diagnoses based on ICD-10 criteria. The majority of clinical cases are detected by the BIQ (SN) for DSM-IV-TR and any criteria (67.6%-72.6%) but not for RDC/ICSD-2 or ICD-10 criteria (41.7%-42.5%). The vast majority of BIQ cases are confirmed by the clinical interviews (PPV) using DSM-IV-TR and any criteria (95.5%-96.7%), compared with smaller proportions using RDC/ICSD-2 (70.9%) and ICD-10 (42.7%) criteria. As might be expected given the BIQ underestimation of prevalence, the vast majority of clinical noncases are classified accurately by the BIQ (SP) for all diagnostic systems (94.4%-98.9%). In a similar way, the proportions of BIQ noncases confirmed by the clinical interviews (NPV) are consistently high (83.1%-97.7%).
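The concordance statistics discussed above all derive from the same 2 × 2 cross-classification of screening and clinical diagnoses. The following minimal sketch shows the arithmetic with hypothetical counts; the function name is illustrative, and the reported values in Table 2 are weighted, design-based estimates that this unweighted sketch does not reproduce.

```python
def concordance_stats(tp, fp, fn, tn):
    """Agreement statistics for a screening diagnosis against a clinical
    diagnosis, from the four cells of a 2 x 2 table."""
    n = tp + fp + fn + tn
    observed = (tp + tn) / n
    # Chance agreement expected from the marginal proportions.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    sn = tp / (tp + fn)
    sp = tn / (tn + fp)
    return {
        "kappa": (observed - expected) / (1 - expected),
        "SN": sn,
        "SP": sp,
        "PPV": tp / (tp + fp),
        "NPV": tn / (tn + fn),
        "OR": (tp * tn) / (fp * fn),
        # For a dichotomous classification, AUC is the mean of SN and SP.
        "AUC": (sn + sp) / 2,
    }


# Hypothetical counts, not AIS data.
stats = concordance_stats(tp=40, fp=5, fn=20, tn=135)
for name, value in stats.items():
    print(name, round(value, 3))
```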

Bias Introduced by Using Super-Normal Controls

As noted in the introduction, a number of previously developed self-report insomnia scales were validated in community samples by comparing the scale scores of confirmed insomniacs with those of good sleepers.10–12 This approach excludes people who have some evidence of sleep problems but do not meet full criteria for insomnia in a clinical interview. For example, the validity of a short insomnia questionnaire known as the Sleep Disorders Questionnaire (SDQ)32 was evaluated in a sample of college students who were recruited based on SDQ scores obtained in a classroom screening exercise.12 Respondents were divided into 3 mutually exclusive and collectively exhaustive subgroups based on these screening scores: those who did not complain of any sleep problems (A), those who reported sleep problems but did not meet diagnostic criteria of the SDQ for insomnia (B), and those who met diagnostic criteria for insomnia in the SDQ (C). An accredited expert in sleep medicine then carried out clinical interviews to evaluate the accuracy of diagnoses based on the SDQ among respondents in subgroups A and C. Importantly, respondents in subgroup B were excluded. Control groups of this sort, which exclude people with subthreshold evidence of the syndrome under study, are referred to in the methodologic literature as super-normal controls.33

Super-normal control groups are often favored in studies designed to detect risk factors that have subtle effects specific to particular disorders.34,35 In validity studies, however, this design usually introduces upward bias in the estimate of AUC. All respondents in the omitted subgroup B are screened negatives, so omitting the true positives among them removes false negatives and raises SN, whereas omitting the true negatives among them removes correctly classified noncases and lowers SP. Because the proportion of all true positives who are in subgroup B is usually greater than the proportion of all true negatives who are in subgroup B, the increase in SN is typically greater than the decrease in SP. Because the AUC for a dichotomous classification is the average of SN and SP, the AUC increases.36
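This argument can be checked with a small worked example. The stratum counts below are hypothetical (they are not the AIS or SDQ data), and the function name is illustrative. Subgroups A and B are screened negative and subgroup C is screened positive; the sketch compares the full sample with a super-normal design that drops subgroup B.

```python
# Hypothetical counts of clinical cases and noncases in each screening subgroup.
strata = {
    "A": {"screen_positive": False, "cases": 2, "noncases": 100},   # no sleep problems
    "B": {"screen_positive": False, "cases": 18, "noncases": 40},   # subthreshold
    "C": {"screen_positive": True, "cases": 40, "noncases": 5},     # screened cases
}


def accuracy(strata, included):
    """SN, SP, PPV, and AUC of the screen using only the included subgroups."""
    tp = sum(strata[s]["cases"] for s in included if strata[s]["screen_positive"])
    fn = sum(strata[s]["cases"] for s in included if not strata[s]["screen_positive"])
    fp = sum(strata[s]["noncases"] for s in included if strata[s]["screen_positive"])
    tn = sum(strata[s]["noncases"] for s in included if not strata[s]["screen_positive"])
    sn = tp / (tp + fn)
    sp = tn / (tn + fp)
    # AUC of a dichotomous classification is the average of SN and SP.
    return {"SN": sn, "SP": sp, "PPV": tp / (tp + fp), "AUC": (sn + sp) / 2}


full_sample = accuracy(strata, ["A", "B", "C"])
super_normal = accuracy(strata, ["A", "C"])
print({k: round(v, 3) for k, v in full_sample.items()})
print({k: round(v, 3) for k, v in super_normal.items()})
```

In this example, dropping subgroup B raises SN by far more than it lowers SP, so the AUC rises, while PPV is unchanged because no screened positives are excluded.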

The sampling strategy used in the AIS clinical reappraisal study differs from the super-normal control approach in an important way: all AIS respondents had a nonzero probability of selection into the clinical reappraisal sample. As in the super-normal control design, we began with 3 sampling strata: respondents who reported no sleep problems (A), respondents who reported some sleep problems who did not meet the caseness criteria (B), and BIQ cases (C). Unlike the situation with super-normal sampling, in which only respondents in strata A and C are sampled, we included respondents in all 3 strata in the clinical reappraisal sample. Even though respondents in strata B and C were oversampled, the data were weighted to adjust for this oversampling prior to carrying out analysis. The weighted data had the same sample proportions in the 3 strata as in the full AIS sample.
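A minimal sketch of this kind of weighting is shown below. The stratum shares and the split of the 203 reappraisal respondents across strata are hypothetical, as is the function name; the sketch only illustrates how stratum weights restore the full-sample stratum proportions after oversampling.

```python
def stratum_weights(sample_counts, population_shares):
    """Weight for each stratum: population share divided by sample share."""
    n = sum(sample_counts.values())
    return {s: population_shares[s] / (sample_counts[s] / n) for s in sample_counts}


# Hypothetical numbers: strata B and C are oversampled relative to A.
sample_counts = {"A": 40, "B": 70, "C": 93}
population_shares = {"A": 0.60, "B": 0.25, "C": 0.15}

weights = stratum_weights(sample_counts, population_shares)
n = sum(sample_counts.values())
# After weighting, each stratum's share of the sample matches the population.
weighted_shares = {s: weights[s] * sample_counts[s] / n for s in sample_counts}
print({s: round(w, 3) for s, w in weights.items()})
print({s: round(p, 3) for s, p in weighted_shares.items()})
```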

The results reported here for concordance of BIQ diagnoses with clinical diagnoses will inevitably be compared with the results of published validations of other insomnia measures in community samples even though the latter studies usually used super-normal controls. Based on this realization, we examined the effect of using super-normal controls on estimates of BIQ validity by replicating the validity analyses reported in the last section after excluding from the clinical reappraisal sample respondents who reported some sleep problems (2 or more nights a week for 1 month or longer) but failed to meet full criteria for insomnia in the BIQ. Results show clearly that concordance estimates are substantially inflated in this way (Table 3): κ increases from 0.40-0.77 in the full sample to 0.53-0.91, and AUC increases from 0.68-0.86 in the full sample to 0.84-0.95. These increases are due to consistently larger increases in SN (from 41.7-72.6 to 93.7-100.0) than decreases in SP (from 94.4-98.9 to 73.6-93.8) when noncases with sleep problems are excluded from consideration.

Table 3.

The effect of using super-normal controls versus normal controls in the evaluation of consistency of diagnoses based on the Brief Insomnia Questionnaire with diagnoses based on clinical interviews in the weighted America Insomnia Survey clinical reappraisal samplea

| Statistic and control group | DSM-IV-TR | ICD-10 | RDC/ICSD-2 | Any |
| --- | --- | --- | --- | --- |
| Cohen's κ | | | | |
| SNC | 0.89 (0.04) | 0.53 (0.16) | 0.64 (0.08) | 0.91 (0.04) |
| NC | 0.72 (0.06) | 0.40 (0.12) | 0.42 (0.06) | 0.77 (0.06) |
| Area under the receiver operating characteristic curve | | | | |
| SNC | 0.95 (0.11) | 0.92 (0.23) | 0.84 (0.07) | 0.95 (0.13) |
| NC | 0.83 (0.05) | 0.70 (0.13) | 0.68 (0.05) | 0.86 (0.05) |
| Sensitivity | | | | |
| SNC | 96.8 (2.2) | 100.0 (0.0) | 93.7 (4.4) | 97.0 (2.1) |
| NC | 67.6 (5.3) | 42.5 (13.6) | 41.7 (5.9) | 72.6 (5.1) |
| Specificity | | | | |
| SNC | 92.2 (3.9) | 84.9 (4.8) | 73.6 (6.1) | 93.8 (3.6) |
| NC | 98.5 (0.8) | 97.7 (0.9) | 94.4 (1.9) | 98.9 (0.7) |
| Positive predictive value | | | | |
| SNC | 95.5 (2.2) | 42.7 (12.0) | 70.9 (6.2) | 96.7 (1.9) |
| NC | 95.5 (2.2) | 42.7 (11.9) | 70.9 (6.2) | 96.7 (1.9) |
| Negative predictive value | | | | |
| SNC | 94.4 (3.9) | 100.0 (0.0) | 94.4 (3.9) | 94.4 (3.9) |
| NC | 86.9 (3.8) | 97.7 (1.1) | 83.1 (4.4) | 88.7 (3.5) |
| (n)b | (113) | (52) | (91) | (117) |

DSM-IV-TR refers to Diagnostic and Statistical Manual, Fourth Edition, Text Revision; ICD-10, International Classification of Diseases-10; RDC/ICSD-2, research diagnostic criteria and International Classification of Sleep Disorders-2; SNC, results based on the analysis of Brief Insomnia Questionnaire (BIQ) cases compared to super-normal controls; NC, results based on the full sample, which includes normal controls.

a Data are displayed as mean (SEM). The data were weighted in exactly the same way as in Table 2, but the sample excluded respondents who reported sleep problems at least 2 nights a week for at least 1 month in the BIQ who did not meet diagnostic criteria for insomnia in the BIQ according to the diagnostic system specified in the column heading. This exclusion resulted in the BIQ noncases consisting exclusively of BIQ SNC.

b The sample sizes are those for the analyses based on the SNC data. The sample size for analyses based on the NC data is consistently n = 203 across outcomes. Sample size varies across outcomes for the SNC data because the excluded cases were confined to respondents who did not meet criteria for insomnia according to the diagnostic system specified in the specific column heading.

Concordance Based on Continuous Classifications Using BIQ Symptom Data

Stepwise logistic-regression analysis was used to select BIQ symptom questions that significantly predict clinical diagnoses after controlling for dichotomous BIQ diagnoses. Each respondent in the clinical reappraisal sample was then assigned a predicted probability of each clinical diagnosis based on the resulting logistic-regression equations (Table 4). The AUCs for these predicted-probability measures were substantially higher than the AUCs for the dichotomous BIQ diagnostic classifications in predicting clinical diagnoses based on RDC/ICSD-2 (0.92 vs 0.68) and ICD-10 (0.95 vs 0.70) criteria and somewhat higher in predicting diagnoses based on DSM-IV-TR (0.92 vs 0.83) and any (0.94 vs 0.86) criteria.

Table 4.

Comparisons of AUC in predicting clinical diagnoses based on the dichotomous Brief Insomnia Questionnaire classification and of continuous predicted probabilities of clinical diagnoses based on Brief Insomnia Questionnaire item-level data in the clinical reappraisal sample of 203 adults

| Criteria | AUCa, dichotomous | AUCa, continuous |
| --- | --- | --- |
| DSM-IV-TR | 0.83 (0.05) | 0.92 (0.06) |
| ICD-10 | 0.70 (0.13) | 0.95 (0.19) |
| RDC/ICSD-2 | 0.68 (0.05) | 0.92 (0.06) |
| Any | 0.86 (0.05) | 0.94 (0.06) |

DSM-IV-TR refers to Diagnostic and Statistical Manual, Fourth Edition, Text Revision; ICD-10, International Classification of Diseases-10; RDC/ICSD-2, research diagnostic criteria and International Classification of Sleep Disorders-2.

a Dichotomous area under the receiver operating characteristic curve (AUC) values are for the dichotomous Brief Insomnia Questionnaire (BIQ) diagnostic classifications; continuous AUC values are for the continuous predicted probabilities of clinical diagnoses derived from logistic-regression equations using BIQ diagnoses and BIQ item-level data to predict clinical diagnoses. Data are displayed as mean (SEM).

The improved prediction of diagnoses using continuous rather than dichotomous classifications arises because dichotomous classifications do not take into consideration that some respondents are closer to the diagnostic threshold than others. This information is taken into account in calculating predicted probabilities of diagnoses, resulting in ROC curves with good discrimination between cases and controls throughout the range of the distributions (Figure 1). In addition to the values of AUC being higher, bias in prevalence estimates is removed when predicted probabilities of clinical diagnoses are used instead of diagnoses based on dichotomous BIQ disorder classifications.
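The predicted-probability approach can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the data are made up, there is a single hypothetical symptom score rather than a stepwise-selected set of BIQ items, weights are ignored, and the logistic regression is fitted by plain gradient ascent. The sketch shows one reason prevalence bias disappears: in a fitted logistic regression with an intercept, the mean predicted probability equals the observed prevalence in the fitting sample.

```python
import math


def predict(coefs, x):
    """Predicted probability from an intercept plus linear terms."""
    z = coefs[0] + sum(c * v for c, v in zip(coefs[1:], x))
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(rows, outcomes, steps=20000, rate=1.0):
    """Fit a logistic regression by gradient ascent on the mean log-likelihood."""
    n = len(rows)
    coefs = [0.0] * (len(rows[0]) + 1)
    for _ in range(steps):
        grad = [0.0] * len(coefs)
        for x, y in zip(rows, outcomes):
            err = y - predict(coefs, x)
            grad[0] += err
            for j, v in enumerate(x):
                grad[j + 1] += err * v
        coefs = [c + rate * g / n for c, g in zip(coefs, grad)]
    return coefs


# Hypothetical data: clinical diagnosis, screening diagnosis, symptom score (0-1).
clinical = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
screen_dx = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0]
symptoms = [0.9, 0.8, 0.6, 0.7, 0.4, 0.1, 0.2, 0.3, 0.5, 0.6, 0.5, 0.2]

rows = list(zip(screen_dx, symptoms))
coefs = fit_logistic(rows, clinical)
pred_prob = [predict(coefs, x) for x in rows]

clinical_prevalence = sum(clinical) / len(clinical)
screen_prevalence = sum(screen_dx) / len(screen_dx)   # underestimates here
predicted_prevalence = sum(pred_prob) / len(pred_prob)
print(round(clinical_prevalence, 3), round(screen_prevalence, 3),
      round(predicted_prevalence, 3))
```

In this toy sample the dichotomous screening classification understates clinical prevalence, while the average of the predicted probabilities matches it.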

Figure 1.

Receiver operating characteristic curves for America Insomnia Survey-based predicted probabilities of insomnia based on different diagnostic systems in the clinical reappraisal sample of 203 adults. DSM-IV-TR refers to Diagnostic and Statistical Manual, Fourth Edition, Text Revision; ICD-10, International Classification of Diseases-10; RDC/ICSD-2, research diagnostic criteria and International Classification of Sleep Disorders-2.

DISCUSSION

Unlike other sleep conditions, such as breathing-related and circadian rhythm disorders, that are confirmable using relatively objective methods, chronic or transient insomnia is primarily diagnosed by self-reported sleep history obtained during clinical interview.37 Indeed, American Academy of Sleep Medicine best-practice guidelines caution against diagnosing insomnia using objective indices such as polysomnography or actigraphy because of substantial night-to-night variability in sleep symptoms and “first-night” phenomena. The Academy recommends instead that routine insomnia diagnoses be grounded in self-reported sleep history.38 This inherently subjective nature of insomnia presumably contributes to the fact that test-retest reliability is generally found to be lower for insomnia than other sleep complaints in psychometric studies,6 as well as for the fact that reports of more qualitative symptoms of insomnia, such as the restorative value of sleep and daytime functioning, are usually found to be less stable in test-retest studies than are more quantitative reports of sleep-problem frequency and duration.5,39 Our results regarding the short-term test-retest reliability of BIQ symptoms are not entirely consistent with the latter results, though, as the test-retest correlations for BIQ reports of daytime impairment (0.87-0.88) are as high as those for reports of more quantitative symptoms. Indeed, the majority of BIQ items have good to excellent test-retest reliability. The few cases in which test-retest reliability is unacceptably low, as noted in the results section, are due either to sparse data, sensitivity at the upper end of a wide response range, or response inconsistency among respondents with subthreshold symptoms. We also found that mean values of some items were significantly lower on retest, which suggests that the test-retest paradigm underestimates true BIQ reliability.

As noted in the introduction, only limited psychometric information is available on the validity of other fully structured insomnia scales. Much of this information is based on patient samples,5–9 which cannot be compared with the results in the current report because the severity of insomnia is almost certainly greater in clinical than in community samples. Several other studies have evaluated the validity of various fully structured insomnia scales in community samples, compared with diagnoses based on clinical interviews.10–12 Estimates of AUC in those studies were in the range 0.81 to 1.0. However, all of those studies used super-normal control groups. As shown in our comparison of validity statistics based on the full BIQ clinical reappraisal sample and the subsample that excluded BIQ noncases who reported sleep problems, the use of super-normal controls inflates estimates of diagnostic concordance. The exact magnitude of this inflation in previous studies cannot be determined because it depends on how extreme the exclusion criteria were in defining super-normal subjects. As a result, there is no principled way to compare the validity estimates in earlier studies with those in the current report. Nor is it legitimate to compare the estimates in earlier reports with the BIQ estimates based on comparisons with super-normal control subjects because we have no way of knowing if the exclusion rules used in our sampling scheme were comparable with those used in earlier studies.

While these issues of noncomparability make it impossible to evaluate the comparative performance of the BIQ versus other fully structured insomnia measures such as the Pittsburgh Sleep Quality Index5 or SDQ,32 our results show clearly that diagnoses based on dichotomous BIQ classifications have substantial concordance with clinical diagnoses of insomnia based on DSM-IV-TR and any criteria, fair to moderate concordance with clinical diagnoses based on RDC/ICSD-2 criteria, and fair concordance with diagnoses based on ICD-10 criteria. In addition, we found that probability-of-diagnosis measures based on BIQ item-level data have consistently excellent AUCs (0.92-0.95) across all diagnostic systems. This result is important because, unlike the situation in clinical practice, there is no need in epidemiologic studies to classify individual respondents dichotomously as either definite cases or definite noncases.

A limitation of the current study is that the survey cooperation rate was only 65.0%. This low rate might indicate the existence of bias in the sample composition, which could distort estimates of instrument validity. Another limitation is that the clinical reappraisal interviews were carried out over the telephone. Although telephone administration has been shown to be a valid mode of clinical assessment for other DSM disorders,19–21 it is conceivable that the accuracy of clinical diagnoses would have been improved by in-person assessment. If so, concordance between diagnoses based on the BIQ and those based on clinical interviews would presumably have been higher than reported here. A related limitation is that the accuracy of clinical diagnoses might have been improved by including sleep diaries, polysomnography, or other additional information to supplement clinical interviews. This limitation highlights that, even though we used the word validation to characterize the current investigation, the clinical diagnoses cannot be taken as perfectly valid, which means that the estimates of concordance reported here should be considered lower-bound estimates of the validity of the BIQ. Even if accepted at face value, rather than as lower-bound estimates, though, the results indicate that the BIQ yields accurate estimates of the prevalence and correlates of insomnia in the AIS.

DISCLOSURE STATEMENT

Sanofi-Aventis provided support for this study. Dr. Kessler has been a consultant or a member of an advisory board for Eli Lilly, GlaxoSmithKline, Kaiser Permanente, Pfizer, Sanofi-Aventis, Shire Pharmaceuticals, and Wyeth-Ayerst and has had research support for his epidemiologic studies from Bristol-Myers Squibb, Eli Lilly, GlaxoSmithKline, Johnson & Johnson Pharmaceuticals, Ortho-McNeil Pharmaceuticals, Pfizer, and Sanofi-Aventis. Dr. Coulouvrat is an employee of Sanofi-Aventis. Dr. Hajak has been a consultant or a member of an advisory board for Actelion, Affectis, Astellas, Astra-Zeneca, Bayer Vital, Bristol-Meyers Squibb, Boehringer Ingelheim, Cephalon, Essex, Gerson Lerman Group Council of Healthcare Advisors, GlaxoSmithKline, Janssen-Cilag, Lundbeck, McKinsey, MedaCorp, Merck, Merz, Mundipharm, Network of Advisors, Neurim, Neurocrine, Novartis, Organon, Orphan, Pfizer, Pharmacia, Proctor & Gamble, Purdue, Sanofi-Aventis, Schering-Plough, Sepracor, Servier, Takeda, Transcept, and Wyeth; has participated in speaking engagements for Actelion, Astra-Zeneca, Bayer Vital, Bristol-Meyers Squibb, Boehringer Ingelheim, Cephalon, EuMeCom, Essex, GlaxoSmithKline, Janssen-Cilag, Eli Lilly, Lundbeck, Merck, Merz, Neurim, Novartis, Organon, Pfizer, Pharmacia, Sanofi-Aventis, Schering-Plough, Servier, Takeda, Transcept, and Wyeth; and has received research support from Actelion, Affectis, Astra-Zeneca, BrainLab, Daimler Benz, Essex, GlaxoSmithKline, Lundbeck, Neurim, NeuroBiotec, Neurocrine, Novartis, Organon, Sanofi-Aventis, Schwarz, Sepracor, Takeda, UCB, Volkswagen, Weinmann, and Wyeth. Mr. Lakoma is an employee of the Department of Health Care Policy at Harvard Medical School. His group has received research funding from Pfizer, Sanofi Aventis, Shire Development, and Janssen Pharmceutica, N.V. Lakoma has no financial interest in these organizations. Dr. Roth has served as a consultant for Abbott, Accadia, Acogolix, Acorda, Actelion, Addrenex, Alchemers, Alza, Ancel, Arena, AstraZeneca, Aventis, AVER, Bayer, BMS, BTG, Cephalon, Cypress, Dove, Eisai, Elan, Eli Lilly, Evotec, Forest, GlaxoSmithKline, Hypnion, Impax, Intec, Intra-Cellular, Jazz, Johnson and Johnson, King, Lundbeck, McNeil, MediciNova, Merck, Neurim, Neurocrine, Neurogen, Novartis, Orexo, Organon, Otsuka, Prestwick, Proctor and Gamble, Pfizer, Purdue, Resteva, Roche, Sanofi, Schoering-Plough, Sepracor, Servier, Shire, Somaxon, Syrex, Takeda, TransOral, Vanda, Vivometrics, Wyeth, Yamanuchi, and Xenoport; has participated in speaking engagements for Cephalon, Sanofi, and Sepracor; and has received research support from Aventis, Cephalon, GlaxoSmithKline, Merck, Neurocrine, Pfizer, Sanofi, Schoering-Plough, Sepracor, Somaxon, Syrex, Takeda, TransOral, Wyeth, and Xenoport. Ms. Sampson is an employee of the Department of Health Care Policy at Harvard Medical School. Her group has received research funding from Pfizer, Sanofi-Aventis, Shire Development, Inc., and Janssen Pharmceutica, N.V. Sampson has no financial interest in these organizations. Dr. Shillington is an employee of a company (Epi-Q) that has received funding from Astra-Zeneca, Pfizer, Cephalon, Daichii Samkyo, Takeda, Biogen, Sanofi-Aventis, Abbot Laboratories, Merck, Novartis, Shire, Affymax, and Adolor. Her compensation is limited to her salary. She owns stock in Epi-Q. Ms. Stephenson is an employee of HealthCore, Inc., a research and consulting organization. All of her research activities are industry sponsored. Dr. Walsh has consulted for Pfizer, Sanofi-Aventis, Cephalon, Schering-Plough/Organon, Neurocrine, Takeda America, Actelion, Sepracor, Jazz, Respironics, Transcept, Neurogen, GlaxoSmithKline, Somaxon, Eli Lilly, Evotec, Merck, Kingsdown, Vanda, Ventus, and Somnus. Research support has been provided to his institution by Pfizer, Merck, Somaxon, Evotec, Actelion, Vanda, Neurogen, Sanofi-Aventis, Ventus, Respironics, and Jazz Pharmaceuticals. Dr. Zammit has received research support from Actelion, Ancile, Arena, Aventis, Boehringer-Ingelheim, Cephalon, Elan, Epix, Evotec, Forest, GlaxoSmithKline, H. Lundbeck A/S, King, Merck, National Institute of Health, Neurim, Neurocrine Biosciences, Neurogen, Organon, Orphan Medical, Pfizer, Respironics, Sanofi-Aventis, Schering-Plough, Sepracor, Somaxon, Takeda, Targacept, Transcept, UCB Pharma, Predix, Vanda, and Wyeth. He has been a consultant for Actelion, Alexza, Arena, Aventis, Biovail, Boehringer Ingelheim, Cephalon, Elan, Eli Lilly, Evotec, Forest, GlaxoSmithKline, Jazz, King Pharmaceuticals, Ligand, McNeil, Merck, Neurocrine Biosciences, Organon, Pfizer, Renovis, Sanofi-Aventis, Select Comfort, Sepracor, Shire, Somnus, Takeda, Vela, and Wyeth. He has ownership or directorship status in Clinilabs, Inc., Clinilabs IPA, Inc., and Clinilabs Physician Services, PC. Dr. Shahly reports no conflict of interest.

ACKNOWLEDGMENTS

The America Insomnia Survey (AIS) was conceived of and funded by Sanofi-Aventis (SA). The study was designed and supervised by a 4-member Executive Committee of academic experts in insomnia (Goran Hajak, Thomas Roth, James K. Walsh) and psychiatric epidemiology (Ronald C. Kessler). The Executive Committee developed the study protocol and survey instrument, supervised data collection, and is responsible for planning data analyses, interpreting results, and publishing study reports. An AIS Steering Committee made up of both academics and representatives from SA provides consultation to the Executive Committee. Steering Committee members include experts in sleep (Diego Garcia, Damien Leger, Charles Morin, Gary Zammit), psychiatric epidemiology (Bruno Falissard), and health services research (Alicia Shillington, Judith Stephenson). SA representatives on the Steering Committee include Catherine Coulouvrat, Gilles Perdriset, Christophe Candelas, Françoise Dellatolas, Lewis Warrington, Adam Winseck, and Brian Seal. The main AIS survey was carried out by DataStat, Inc. The AIS clinical reappraisal study was carried out by Clinilabs, Inc. A Publications Committee made up of the Executive Committee and health services research and SA representatives from the Steering Committee is responsible for overseeing AIS publication plans. Data analysis for the current report was carried out at Harvard Medical School under the supervision of Ronald Kessler. The first draft of the manuscript was prepared by Kessler and the other Harvard Medical School coauthors. The remaining coauthors collaborated in designing the data-analysis plan, interpreting results, providing critical comments on the first draft, and making revisions. Authors are fully responsible for all content and editorial decisions. Although a draft of the manuscript was submitted to SA for review and comment prior to submission, this was with the understanding that comments would be no more than advisory. 
SA played no role in data collection or management other than in posing the initial research question, providing operational and financial support, and facilitating communications among collaborators. SA played no role in data analysis, interpretation of results, or preparation of the manuscript. The authors thank Marcus Wilson and his staff at HealthCore, Inc., for recruiting the AIS sample and for the use of the HealthCore research environment; Marielle Weindorf and her staff at DataStat, Inc., for AIS fieldwork; and Jon Freeman at Clinilabs, Inc., and his panel of interviewers, Drs. Melanie Means, Angela Randazzo, Rebecca Scott, Stephanie Silberman, Elaine Wilson, and Rochelle Zozula, for carrying out the clinical reappraisal study. The AIS interview schedule and a complete list of AIS publications can be found at http://www.hcp.med.harvard.edu/wmh/affiliated_studies.php.

SUPPLEMENTAL MATERIAL

Brief Insomnia Questionnaire
aasm.33.11.1539s1.pdf (248.8KB, pdf)
Scoring Algorithms
aasm.33.11.1539s2.pdf (169.1KB, pdf)
