Abstract
Alcohol consumption is an important predictor of a variety of negative outcomes. An extensive literature examines differences in the estimated level of alcohol consumption across types of assessment (e.g., quantity–frequency [QF] questionnaires, daily diaries). However, it is typically assumed that all QF-based measures are nearly identical in their assessment of the volume of alcohol consumed in a population. Using timeline follow-back data to construct responses to common QF consumption measures, we examined how survey instruments for assessing alcohol consumption and heavy drinking differ. Using three data sets, ranging from clinical to community samples, we demonstrate how scale-specific item characteristics (i.e., the number of response options and the range of consumption assessed by each option) can substantially affect the estimated mean level of consumption and the estimated prevalence of binge drinking. Our analyses suggest that these problems can be mitigated by employing higher-resolution measures of quantity and frequency in consumption questionnaires.
Keywords: alcohol consumption, quantity–frequency, measurement, timeline follow-back
Research on self-reported alcohol consumption has had lasting effects on our understanding of the causes and correlates of the dependence syndrome (Babor, Brown, & Del Boca, 1990). Increased alcohol consumption has been linked to physical health risks (Courtenay, 2000; Rimm et al., 1991), social problems (DeHart, Tennen, Armeli, Todd, & Mohr, 2009), and psychological correlates (Hasin, Keyes, Hatzenbuehler, Aharonovich, & Alderson, 2007; Rossow, 2000), among other outcomes. In a meta-analysis, Reynolds et al. (2003) found that the risk of stroke increases with heavy alcohol consumption, but also found evidence that light to moderate consumption may have a protective effect against stroke. Because consumption is central to studying the harms of alcohol, accurate assessment is essential if results are to be interpreted correctly across varying levels of consumption.
When direct observation of alcohol consumption is infeasible, the most common method of identifying levels of consumption is to administer questionnaire items assessing the quantity and frequency of typical drinking, the frequency of heavy drinking, and the maximum number of drinks consumed on a single occasion, collectively referred to as a quantity–frequency (QF) questionnaire. Assessing maximum consumption during a time period is important for differentiating consumption levels, because past research has found that drinkers who infrequently drink large amounts per occasion are at greater risk of negative outcomes than drinkers who consume an equivalent volume of alcohol in smaller, more frequent episodes (Midanik, Tam, Greenfield, & Caetano, 1996; Wechsler & Nelson, 2001). If time allows, a graduated frequency measure (Greenfield, 2000), in which the frequency of drinking at different levels of consumption is assessed, might represent a useful compromise between the brevity of typical QF measures and the burden of highly resolved timeline follow-back (TLFB) measures. Existing QF questionnaires differ from each other in several ways, including the number of response options and the range of alcohol consumption assessed. Alcohol consumption is also assessed by a variety of other strategies, including daily diaries (e.g., Boynton & Richman, 2014; Heeb & Gmel, 2005), dietary history (Koppes, Twisk, Snel, & Kemper, 2002), TLFB interviews/surveys (Sobell, Cellucci, Nirenberg, & Sobell, 1982; Sobell et al., 2003), ecological momentary assessment (Piasecki, 2019; Shiffman, 2016), or a combination of these (Dulin, Alvarado, Fitterling, & Gonzalez, 2017; Feunekes, van’t Veer, van Staveren, & Kok, 1999). These other approaches, although they provide valuable additional data, are more time consuming and impose a greater burden on participants.
Consequently, QF questionnaires continue to be employed in a wide range of survey studies with little attention given to the differences in their structure.
Assessment of Alcohol Consumption Using Quantity–Frequency Questionnaires
QF questionnaires are used for two related purposes: (1) to estimate the typical alcohol consumption of an individual (or of the sample more broadly) and (2) to determine whether individuals exceed thresholds of hazardous drinking (or the proportion of individuals in the sample who drink at a hazardous level). The volume of alcohol calculated from these questions is typically referred to as the “QF” (Straus & Bacon, 1953) and is the product of the number of drinks consumed on a typical drinking occasion and the frequency of these occasions. More sophisticated graduated frequency measures (e.g., Midanik, 1994) ask about the frequency of drinking at various levels, rather than about overall drinking frequency and a single “typical quantity,” but they are used much less often than traditional QF questionnaires, presumably because of their greater length.
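As a minimal illustration of why the typical-quantity form of QF can miss occasional heavy drinking, the Python sketch below scores a hypothetical week of drinking. The drinking pattern and the function name `qf_volume` are invented for illustration, and using the median as the “typical” quantity is an assumption of this sketch, not a feature of any particular questionnaire.

```python
from statistics import median


def qf_volume(typical_quantity: float, occasions_per_week: float) -> float:
    """Quantity-frequency volume: typical drinks per occasion x occasions per week."""
    return typical_quantity * occasions_per_week


# Hypothetical week: drinks on each of five drinking days (four light, one heavy)
week = [2, 2, 2, 2, 10]

typical = median(week)               # "typical" occasion: 2 drinks
qf = qf_volume(typical, len(week))   # 2 drinks x 5 occasions = 10 drinks/week
actual = sum(week)                   # 18 drinks actually consumed

print(qf, actual)  # 10 18
```

The single heavy day contributes 8 of the 18 drinks but leaves the typical quantity unchanged, so the QF product understates the week's volume.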
Although more ecologically valid techniques for estimating alcohol consumption have gained popularity with the development of software and other technology that simplify gathering such data on a large scale (e.g., ecological momentary assessment or other ambulatory assessments), these methods are costly and time intensive to implement and are therefore not easily integrated into many research contexts. Thus, studies use the QF as a quick and efficient measure of alcohol consumption that can be integrated into diagnostic interviews (Vinson, MacLure, Reidinger, & Smith, 2003), screening tools (Aalto, Alho, Halme, & Seppa, 2009), or epidemiological surveys (Heeb & Gmel, 2005; Stockwell et al., 2004). The QF is also used to assess average consumption in a sample and to compare groups of interest (de Goeij et al., 2015; Wiers, Hoogeveen, Sergeant, & Gunning, 1997). Although QF questionnaires are widely used, their response ranges vary considerably from survey to survey.
Consider, for example, the quantity and frequency items in the widely used Alcohol Use Disorders Identification Test (AUDIT; Saunders et al., 1993). The AUDIT’s quantity item is worded as follows: “How many standard drinks do you have on a typical day when you are drinking?” with possible response options “10 or more, 7–9, 5–6, 3–4, 1–2.” The AUDIT measures drinking frequency with the question “How often do you have a drink containing alcohol?”, which has possible responses “4+ times a week, 2–3 times a week, 2–4 times a month, monthly or less, never.” These items differ substantially from, for example, the items recommended for assessing alcohol consumption by the National Institute on Alcohol Abuse and Alcoholism (NIAAA)1, which have possible quantity responses “25 or more, 19–24, 16–18, 12–15, 9–11, 7–8, 5–6, 3–4, 2, 1” and frequency responses “Every day, 5–6 times a week, 3–4 times a week, twice a week, once a week, 2–3 times a month, once a month, 3–11 times in the past year, 1–2 times in the past year.” The NIAAA-recommended items include more response options than the AUDIT items and extend to higher values (i.e., higher numbers of drinks and finer gradations of frequency). For some individuals, especially those who typically drink large amounts of alcohol, the consumption estimated from the AUDIT and from the NIAAA-recommended items will differ substantially.
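To make the difference concrete, the sketch below assigns the same typical quantity to a response option on each of the two quantity items quoted above. The bin lists transcribe those response options; the helper name `response_for` is hypothetical, and treating the bins as contiguous (so that a non-integer value such as 2.5 falls into the bin whose lower bound it exceeds) is an assumption of this sketch.

```python
from bisect import bisect_right

# Quantity response options as (lower bound, label); bins are treated as contiguous,
# so each option covers values from its lower bound up to the next option's bound.
AUDIT_QUANTITY = [(1, "1-2"), (3, "3-4"), (5, "5-6"), (7, "7-9"), (10, "10 or more")]
NIAAA_QUANTITY = [
    (1, "1"), (2, "2"), (3, "3-4"), (5, "5-6"), (7, "7-8"), (9, "9-11"),
    (12, "12-15"), (16, "16-18"), (19, "19-24"), (25, "25 or more"),
]


def response_for(drinks: float, bins: list[tuple[int, str]]) -> str:
    """Return the response option whose range contains `drinks`."""
    lows = [low for low, _ in bins]
    i = bisect_right(lows, drinks) - 1
    if i < 0:
        raise ValueError(f"{drinks} is below the lowest response option")
    return bins[i][1]


for drinks in (4, 10, 20):
    print(drinks, response_for(drinks, AUDIT_QUANTITY), response_for(drinks, NIAAA_QUANTITY))
```

A respondent who typically drinks 20 drinks selects the same AUDIT option (“10 or more”) as one who drinks 10, whereas the NIAAA item places them in “19–24” and “9–11,” respectively.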
In addition to the quantity and frequency items, surveys also assess consumption that exceeds various thresholds. These items can be used to assess the prevalence of drinking patterns that are associated with negative health outcomes, which we refer to broadly as hazardous drinking. The thresholds used in these analyses are defined by the NIAAA and the Substance Abuse and Mental Health Services Administration (SAMHSA): (1) binge drinking, or consuming 4 or more drinks on a single occasion for women and 5 or more drinks on a single occasion for men; (2) risky drinking, or exceeding the binge threshold on any one occasion or exceeding weekly maximums of more than 7 drinks in a week for women and more than 14 drinks in a week for men; and (3) heavy drinking, or consuming 5 or more drinks on a single occasion on 5 or more days a month, regardless of sex.2 These thresholds are important for predicting additional risky behaviors (Miller, Tonigan, & Longabaugh, 1995) and interpersonal and health consequences (U.S. Department of Health and Human Services, 2007; SAMHSA’s Center for the Application of Prevention Technologies, Northeast Resource Team, 2011). Measures that assess alcohol consumption often also include items measuring excessive consumption. These items also bin responses (i.e., separate a continuous scale into categories with bounded ranges of responses), sometimes around what are considered critical boundaries separating safe from hazardous consumption patterns (discussed further in the following section). This binning can affect the estimated proportion of individuals in the sample who drink at hazardous levels.
For example, the National Alcohol Survey (NAS3; Rehm, Greenfield, & Rogers, 2001) assesses the frequency of drinking 5 or more drinks on a single occasion with response options “Every day or nearly every day,” “3–4 times a week,” “1–2 times a week,” “1–3 times a month,” “less than once a month,” “1 in 12 months,” and “never.” The NIAAA and SAMHSA’s Center for the Application of Prevention Technologies, Northeast Resource Team (2011) define a “heavy” drinking pattern as drinking 5 or more drinks on 5 or more days a month. The NAS response options, however, group together everyone who drinks 5+ drinks on a single occasion “1–2 times a week,” that is, roughly 4 to 8 times a month. A respondent who does so once a week (about 4 times a month, below the threshold) therefore selects the same option as one who does so twice a week (about 8 times a month, above it). Consequently, false positives are likely when this item is used to estimate the prevalence of heavy drinking as defined above. However, it is unknown whether binning categories in this way has a meaningful impact on estimated consumption and heavy drinking rates, and if so, to what extent.
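The sketch below works through this case numerically, scoring the NAS response at the midpoint of its range (the midpoint convention described in the Method). The conversion of 365.25 / 12 / 7 (about 4.35) weeks per month and the names used are assumptions for illustration.

```python
# Assumed average number of weeks in a month (365.25 days / 12 months / 7 days)
WEEKS_PER_MONTH = 365.25 / 12 / 7
HEAVY_DAYS_PER_MONTH = 5  # heavy drinking: 5+ drinks on 5 or more days a month


def is_heavy(days_per_week: float) -> bool:
    """Classify heavy drinking from a weekly rate of 5+ drink days."""
    return days_per_week * WEEKS_PER_MONTH >= HEAVY_DAYS_PER_MONTH


true_rate = 1.0              # respondent drinks 5+ drinks exactly once a week
nas_midpoint = (1 + 2) / 2   # NAS "1-2 times a week" scored at its midpoint

print(is_heavy(true_rate))     # False: about 4.35 days a month
print(is_heavy(nas_midpoint))  # True: about 6.52 days a month
# Weekly rate at which the monthly threshold is actually crossed (about 1.15)
print(round(HEAVY_DAYS_PER_MONTH / WEEKS_PER_MONTH, 2))
```

Under these assumptions, anyone in the “1–2 times a week” option whose true rate lies between 1 and about 1.15 times a week is misclassified as a heavy drinker.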
Timeline Follow-Back
Another method of assessing alcohol consumption in a single session is the TLFB interview. In a TLFB interview, the respondent recalls the number of drinks he or she consumed on each of the T days preceding the interview. Typically, when an individual cannot remember the number of drinks for a specific day, they report their characteristic drinking pattern. Although all self-report measures of alcohol consumption are prone to inaccuracies, TLFB included, several studies have supported the precision of TLFB relative to other assessment techniques (O’Hare, 1991; Sobell & Sobell, 1992). Although studies have indicated that individual differences in daily consumption reported by TLFB are less reliable than those from daily diaries (Carney, Tennen, Affleck, Del Boca, & Kranzler, 1998; Searles, Helzer, & Walter, 2000), TLFB has been found to capture the overall aggregate level of drinking accurately when compared with both daily diaries and real-time electronic interviews (Carney et al., 1998). Searles, Helzer, Rose, and Badger (2002) examined reported consumption over 30, 90, and 366 days. Although aggregated daily consumption was underreported on the TLFB (a trend found consistently in the literature), the underreporting was stable across all time periods assessed. These findings indicate that although underreporting may be an issue, it does not depend on the length of the recall period.
TLFB studies use calendar records to produce reliable and valid estimates of consumption (e.g., Pedersen, Grow, Duncan, Neighbors, & Larimer, 2012; Robinson, Sobell, Sobell, & Leo, 2014). Using aided-recall techniques, TLFB studies have the advantage of recording drinking episodes over a predetermined span of time, which yields typical estimates of consumption while also capturing atypical drinking events (e.g., the maximum number of drinks over a time period and the frequency of binge drinking). This article uses the TLFB to compare several items from QF questionnaires, determining the degree to which the same drinking pattern can produce different estimates of QF consumption and of heavy drinking episodes in the sample.
Sobell et al. (2003) assessed the consumption of alcohol abusers at two points in time (2.5 weeks apart) using both the TLFB and a Quick Drinking Screener (a quantity–frequency questionnaire), with both instruments covering the same time interval. This study found “remarkably similar aggregate data” across the two forms of assessment. It should be noted, however, that the “quick screener” they used was open-ended, in contrast to many QF questionnaires, which use a fixed response format with binned ranges of quantity per occasion and number of occasions. Thus, it is unknown how more commonly used QF measures would compare with TLFB. Although TLFB has been found to be valid and reliable, the validity of QF questionnaires has been much debated. By comparing the formats of differing QF questionnaires, we can determine how far the Sobell et al. (2003) findings generalize.
The items selected for these analyses were chosen from measures that are widely used in the literature and were developed in conjunction with U.S. government agencies and international organizations: the NIAAA (an institute of the National Institutes of Health [NIH]), the World Health Organization (WHO), and the U.S. Census Bureau. The items of interest are the typical quantity consumed (discussed further in the next section and in the appendix), the typical frequency of drinking occasions, the frequency of binge drinking, and the maximum number of drinks on any occasion (if included). The following surveys/interviews are the focus of this comparison: (1) interviews from the 2012 wave of the National Longitudinal Survey of Youth (NLSY; Bureau of Labor Statistics, 2012); (2) interviews from the 2009 to 2010 wave of the National Alcohol Survey (NAS; see Zemore, Karriker-Jaffe, & Mulia, 2013); (3) questions recommended by the NIAAA for researchers assessing alcohol consumption (NIAAA, 2003); (4) the Alcohol Use Disorders and Associated Disabilities Interview Schedule (AUDADIS-IV; Grant et al., 2003); (5) interviews from the in-home portion of the second wave of the National Longitudinal Study of Adolescent to Adult Health (Add Health; Harris et al., 2009); and (6) the AUDIT (Saunders et al., 1993). The questions and response options of these quantity and frequency items are given in Table 1.
Table 1.
Quantity Frequency Items From Chosen Interviews/Questionnaires.
| Interview | Assessment | Wording | Response options | Time |
|---|---|---|---|---|
| NIAAA Recommended Questions | Quantity | During the past 12 months, how many alcoholic drinks did you have on a typical day when you drank alcohol? | 25 or more, 19–24, 16–18, 12–15, 9–11, 7–8, 5–6, 3–4, 2, 1 | 12 months |
| | Frequency | During the past 12 months, how often did you usually have any kind of drink containing alcohol? | Every day, 5–6 times a week, 3–4 times a week, twice a week, once a week, 2–3 times a month, once a month, 3–11 times in the past year, 1–2 times in the past year | 12 months |
| | Binge | During the past 12 months, how often did you have 5 or more (males) or 4 or more (females) drinks containing any kind of alcohol within a 2-hour period? | Every day, 5–6 days a week, 3–4 days a week, 2 days a week, 1 day a week, 2–3 days a month, 1 day a month, 3–11 days in the past year, 1 or 2 days in the past year | 12 months |
| | Max drinks | During the past 12 months, what is the largest number of drinks containing alcohol that you drank within a 24-hour period? | 36 drinks or more, 24–35 drinks, 18–23 drinks, 12–17 drinks, 8–11 drinks, 5–7 drinks, 4 drinks, 3 drinks, 2 drinks, 1 drink | 12 months |
| AUDIT | Quantity | How many standard drinks do you have on a typical day when you are drinking? | 10 or more, 7–9, 5–6, 3–4, 1–2 | NA |
| | Frequency | How often do you have a drink containing alcohol? | 4+ times a week, 2–3 times a week, 2–4 times a month, monthly or less, never | NA |
| | Binge | How often do you have 5 or more drinks on one occasion? | Never, less than monthly, monthly, weekly, daily or almost daily | NA |
| NAS | Quantity | On those days when you drink, how many drinks do you usually have? | Number entered (max 96) | NA |
| | Frequency | How often do you usually have any kind of beverage containing alcohol? | More than once a day, once a day, nearly every day, 3–4 times a week, 1–2 times a week, 2–3 times a month, once a month, less than once a month but at least once a year, less than once a year | NA |
| | Binge 1 | During the past 12 months, how often did you have either three drinks or four drinks but no more than 4 drinks of any kind of alcoholic beverage? | Every day or nearly every day, 3–4 times a week, 1–2 times a week, 1–3 times a month, less than once a month, 1 in 12 months, never | 12 months |
| | Binge 2 | During the past 12 months, how often did you have 5, 6, or 7, but no more than 7 drinks of any kind of alcoholic beverage? | Every day or nearly every day, 3–4 times a week, 1–2 times a week, 1–3 times a month, less than once a month, 1 in 12 months, never | 12 months |
| | Max drinks | During the past 12 months, what is the largest number of drinks you had on any single day? | 24 or more, 12–23, 8–11, 5–7, 4, 3, 2, 1, don’t know | 12 months |
| Add Health | Quantity | Think of all the times you have had a drink during the past 12 months. How many drinks did you usually have each time? | Number entered (max 18) | 12 months |
| | Frequency | During the past 12 months, on how many days did you drink alcohol? | Every day or almost every day, 3–5 days a week, 1–2 days a week, 2–3 days a month, once a month or less, 1–2 days in the past 12 months | 12 months |
| | Binge 1 | During the past 2 weeks, how many times did you have 4 or more drinks on a single occasion, for example, in the same evening? | 0 through 14 days | 2 weeks |
| | Binge 2 | Over the past 12 months, on how many days did you drink five or more drinks in a row? | Every day or almost every day, 3–5 days a week, 1–2 days a week, 2–3 days a month, once a month or less, 1–2 days in the past 12 months, never | 12 months |
| AUDADIS | Quantity | What is the number of drinks of any alcohol usually consumed on days when you drank alcohol in the past 12 months? | Number entered (max 98) | 12 months |
| | Frequency | How often did you drink any amount of alcohol in the last twelve months? | Every day, nearly every day, 3–4 times a week, 2 times a week, once a week, 7–11 times in the past year, 3–6 times in the past year, 1 or 2 times in the past year | 12 months |
| | Binge 1 | During the past 12 months, about how often did you drink 4 or more drinks in a single day? | Every day, nearly every day, 3–4 times a week, 2 times a week, 1 time a week, 2–3 times a month, once a month, 7–11 times a year, 3–6 times a year, 1–2 times a year, never | 12 months |
| | Binge 2 | During the past 12 months, about how often did you drink 5 or more drinks in a single day? | Every day, nearly every day, 3–4 times a week, 2 times a week, once a week, 2–3 times a month, once a month, 7–11 times a year, 3–6 times a year, 1–2 times a year, never | 12 months |
| | Max drinks | During the past 12 months, what was the LARGEST number of drinks that you drank in a single day? | Any number | 12 months |
| NLSY | Quantity | In the past 30 days, on the days you drank alcohol, about how many drinks did you usually have? | Number entered (max 99) | 30 days |
| | Frequency | During the past 30 days, on how many days did you have one or more drinks of an alcoholic beverage? | Number entered (max 30) | 30 days |
| | Binge | On how many days did you have 5 or more drinks on the same occasion during the past 30 days? | 0, 1–4, 5–9, 10–14, 15–19, 20–24, 25–29, 30 | 30 days |
Note. NLSY = National Longitudinal Survey of Youth (Bureau of Labor Statistics, 2012); NIAAA = National Institute on Alcohol Abuse and Alcoholism http://www.niaaa.nih.gov/research/guidelines-and-resources/recommended-alcohol-questions/; AUDIT = Alcohol Use Disorders Identification Test (World Health Organization, 1990); NAS = National Alcohol Survey http://arg.org/center/national-alcohol-surveys/; Add Health = The National Longitudinal Study of Adolescent to Adult Health (Harris et al., 2009); AUDADIS = The Alcohol Use Disorder and Associated Disabilities Interview Schedule–IV (Grant et al., 2003). NA = not applicable.
Contribution of Current Article
Most methodological examinations of the validity of alcohol consumption measures have focused on issues related to recall and on comparisons between forms of alcohol consumption estimation (e.g., Sobell & Sobell, 1995; Stockwell et al., 2004). This article examines the quantity–frequency items from six commonly used measures, with survey responses constructed from an identical set of drinking histories (collected using TLFB methods), to determine how the estimates differ depending on the survey used. Comparisons are conducted using data from three samples with varied base rates of heavy drinking: one sample of alcohol-dependent individuals from a randomized controlled trial, and case and control samples from a study of individuals who presented to the emergency room (ER) with an acute injury. The measures are compared using consumption data from studies using a TLFB interview (Sobell & Sobell, 1992). TLFB provides a high degree of detail about an individual’s drinking over a period of time, enough to permit us to convert these records into responses that conform to the information requested by various questionnaire instruments. A common set of drinking histories then allows head-to-head comparisons of the performance of various assessment schemes. Based on past findings (Sobell et al., 2003), we expect that QF items with open-ended response options will align more closely with a TLFB interview, without the time and cost burdens of conducting one. Conversely, we expect binned responses to produce more biased estimates of consumption, with wider bins and bins with more restrictive ranges at the high end producing the greatest bias. We would also expect the variability in the estimated prevalence of drinking statuses to differ between clinical and control samples (e.g., depending on whether the sample's mean consumption is close to a threshold).
Method
To evaluate how well each of these measures estimates alcohol consumption, we use TLFB data from three samples in two studies: (1) the COMBINE study, a multisite randomized controlled trial of combined pharmacotherapy and behavioral intervention for alcohol-dependent individuals, and (2) the ER study, a case-control/case-crossover study of factors influencing risk for injury, conducted with (a) medical patients presenting at an ER for acute injury and (b) community controls contacted by phone. Using different types of samples allows us to examine whether some of these measures are better at assessing populations in which very heavy use is rare (i.e., the ER samples) or prevalent (i.e., the COMBINE sample). Using multiple samples also allows us to assess whether the patterns observed within each sample, where different levels of consumption are expected, depend on the QF measure used. For example, in a treatment-seeking sample, assessing higher levels of consumption may matter more than in a general sample, where consumption levels are expected to be more representative of the general population.
Samples
The COMBINE study was conducted between 2001 and 2004 and assessed the effects of naltrexone, acamprosate, and combined behavioral intervention in alcohol-dependent patients (Anton et al., 2006). COMBINE administered a TLFB of alcohol consumption for the 90 days prior to randomization into a treatment condition. A calendar prompt and a number of calendar aids facilitated recording the number of drinks consumed on each day. A standard drink was defined as a 10-ounce beer, 4 ounces of wine, or 1.2 ounces of hard liquor. When the participant could not recall their drinking on a specific day, the individual’s typical pattern of alcohol consumption was imputed. Inclusion criteria for randomization included being at least 18 years of age, meeting Diagnostic and Statistical Manual of Mental Disorders–Fourth Edition criteria for alcohol dependence, and not meeting criteria for any other substance use disorder (excluding nicotine, caffeine, or marijuana). The study also required that individuals have 4 days of abstinence before randomization. Thus, the most recent 6 days of data were removed to attenuate the effects of cessation of drinking, resulting in N = 1,383 individuals assessed over T = 84 days. In summary, the TLFB data within COMBINE provide a continuous measure of the number of alcoholic drinks consumed on each of 84 days, allowing QF to be calculated as assessed in the six commonly used consumption measures described in the following section.
The ER study (see the primary report, Vinson et al., 2003) was designed to assess factors that increase risk for injury, including alcohol consumption. These data were collected between 1998 and 2000. The study recruited individuals presenting at the ER with an injury (N = 2,517), individuals presenting at an ER with a noninjury illness (N = 2,103), and community controls who were not recruited from an ER but contacted by phone (N = 1,856). The researchers assessed the past month of alcohol consumption for the individuals presenting at the ER with an acute injury (referred to as the ER cases) and for the community controls (referred to as the ER controls). Community controls were matched to ER cases by age, gender, residence, and the day of the week of the injury event. Similar to COMBINE,4 the researchers presented calendar aids and assessed the 29 days of drinking history preceding contact (and, when specific days could not be recalled, the typical drinking pattern). A drink was defined as 12 ounces of beer, 5 ounces of wine, or 1.5 ounces of distilled spirits. Seven outliers who reported more than 500 drinks per week were removed. All other individuals who completed any portion of the TLFB were included in the following analyses, resulting in N = 1,461 ER cases and N = 748 ER controls over T = 29 days.
Calculation of Consumption
To determine what differences can result from binning the same drinking history in different ways, the TLFB data are converted to represent (1) the typical amount consumed on a drinking occasion, (2) the frequency of drinking occasions, (3) the frequency of 4+ drinking, (4) the frequency of 5+ drinking, and (5) the maximum drinks consumed on a drinking occasion. The response options of each measure are then used to place the individual’s drinking pattern into a bin for each of the items included in that measure.
To determine drinking quantity, the TLFB data were converted to represent the typical amount consumed on days when a respondent reported a nonzero number of drinks. There is evidence that individuals differ in what they consider their “usual” number of drinks (Stahre, Naimi, Brewer, & Holt, 2006), so to derive item responses from the TLFB data that most closely resemble responses to actual QF items, the median number of drinks per drinking day was used for the Q term in QF calculations. This choice was empirically supported by preliminary analyses that are described in the supplementary appendix (available online). This value is placed into a bin for each of the six quantity items presented in Table 1, and all values falling into a given bin are scaled to the midpoint of that bin (Corrao, Bagnardi, Zambon, & Arico, 1999). The open-ended highest bin of each quantity item was set equal to the average quantity of the individuals in that bin. That is, for the NIAAA-recommended questions, individuals who typically consume 25 or more drinks are assigned a scaled value equal to the average quantity of all individuals who typically consume 25 or more drinks. Note that this scaling procedure takes advantage of information that a typical analysis would not have, making it potentially more accurate than in practice.
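A minimal sketch of this quantity scoring for the NIAAA quantity item is given below, assuming each TLFB history is a list of daily drink counts containing at least one drinking day. The function names and sample histories are hypothetical. As in the text, bins are scored at their midpoints and the open top bin is scored at the mean of the individuals who fall into it; treating the bins as contiguous (so that a median of 1.5 falls into the “1” bin) is an assumption, since the text does not say how fractional medians were binned.

```python
from bisect import bisect_right
from statistics import mean, median

# NIAAA quantity bins (Table 1) as (low, high); None marks the open top bin
NIAAA_QUANTITY = [(1, 1), (2, 2), (3, 4), (5, 6), (7, 8), (9, 11),
                  (12, 15), (16, 18), (19, 24), (25, None)]


def bin_index(value, bins):
    """Index of the bin with the largest lower bound <= value (contiguous bins assumed)."""
    i = bisect_right([low for low, _ in bins], value) - 1
    if i < 0:
        raise ValueError(f"{value} is below the lowest bin")
    return i


def scaled_quantities(histories, bins):
    """Score each history's median drinks per drinking day on a binned quantity item."""
    # Q term: median of the nonzero days (each history needs at least one drinking day)
    q = [median([d for d in days if d > 0]) for days in histories]
    idx = [bin_index(v, bins) for v in q]
    top = len(bins) - 1
    # Open top bin: scored at the mean quantity of the individuals who fall into it
    top_members = [v for v, i in zip(q, idx) if i == top]
    top_value = mean(top_members) if top_members else None
    scored = []
    for i in idx:
        low, high = bins[i]
        scored.append(top_value if high is None else (low + high) / 2)
    return scored


# Hypothetical daily TLFB records (drinks per day)
histories = [
    [0, 3, 4, 0, 5],   # median 4   -> "3-4" bin, scored 3.5
    [30, 0, 26, 28],   # median 28  -> "25 or more" bin
    [0, 25, 25, 0],    # median 25  -> "25 or more" bin
    [1, 0, 2],         # median 1.5 -> "1" bin, scored 1.0
]
print(scaled_quantities(histories, NIAAA_QUANTITY))  # [3.5, 26.5, 26.5, 1.0]
```

The top-bin value (26.5) depends on who else is in the sample, which is the extra information the text notes a typical analysis would not have.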
To assess drinking frequency, the TLFB history of each individual was examined and the number of days on which the individual consumed any alcohol was recorded. As with the quantity item, the number of drinking days was placed into the designated bin for each of the six frequency measures. Several frequency items in Table 1 address drinking over the course of a year. The drinking pattern evidenced in the TLFB interview is assumed to be characteristic of drinking throughout the year (this is the motivation for truncating the final days of the COMBINE TLFB, as individuals were necessarily abstinent over these days). Therefore, if an individual reported drinking on 15 days in the ER study TLFB interview (51.72% of assessment days), we assume that they consume alcohol on 51.72% of all days in a year. After an individual is placed into a frequency bin, the value is scaled to the midpoint of that bin and expressed as drinking occasions per week (e.g., 1–2 days a week is scaled to 1.5). The binned and scaled quantity and frequency values are then multiplied to create six QF estimates of the alcohol volume consumed per week. Generally, the QF statistic is calculated by

$$\text{QF} = \text{Quantity} \times \text{Frequency},$$

where Quantity and Frequency are the midpoints of the bins into which the typical number of drinks per drinking day and the weekly frequency of drinking occasions fall, respectively. An illustrative example is provided in the appendix.
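The frequency step and the final product can be sketched as follows, using the 15-of-29-days example from the text. Only the weekly response options of the NIAAA frequency item are included, the bins are treated as contiguous, and the quantity value of 3.5 (a “3–4” response scored at its midpoint) is a hypothetical input; the function names are likewise illustrative.

```python
from bisect import bisect_right

DAYS_PER_WEEK = 7

# Weekly response options of the NIAAA frequency item (Table 1) as (low, high)
# drinking days per week; the monthly and yearly options are omitted in this sketch.
NIAAA_WEEKLY = [(1, 1), (2, 2), (3, 4), (5, 6), (7, 7)]


def days_per_week(drinking_days: int, window_days: int) -> float:
    """Assume the TLFB window is typical of the year and rescale to a weekly rate."""
    return drinking_days / window_days * DAYS_PER_WEEK


def frequency_midpoint(rate: float, bins=NIAAA_WEEKLY) -> float:
    """Midpoint of the (contiguous) weekly bin containing `rate`."""
    i = bisect_right([low for low, _ in bins], rate) - 1
    if i < 0:
        raise ValueError("rate is below the weekly options kept in this sketch")
    low, high = bins[i]
    return (low + high) / 2


rate = days_per_week(15, 29)     # 15 of 29 days (51.72%) -> about 3.62 days/week
freq = frequency_midpoint(rate)  # "3-4 times a week" -> 3.5
quantity = 3.5                   # hypothetical "3-4" quantity response, midpoint
print(round(rate, 2), freq, quantity * freq)  # 3.62 3.5 12.25 (QF, drinks/week)
```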
The measures were also used to assess how idiosyncratic binning schemes affect the classification of individuals as exceeding hazardous drinking thresholds. To assess binge drinking, the TLFB data are used to create a variable reflecting the number of days on which the individual consumed 4 or more drinks (for women) or 5 or more drinks (for men). An individual is considered a binge drinker by the TLFB if this has occurred at least once.5 Heavy drinking is assessed by the proportion of days on which 5 or more drinks were consumed; if this proportion corresponds to 5 or more days a month, the individual is classified as a heavy drinker. Individuals are classified as risky drinkers if they ever had a drinking binge (4+/5+) or if, in any consecutive 7-day period of the TLFB, they consumed more than 7 drinks (for a woman) or more than 14 drinks (for a man), as recommended by the NIH (NIAAA, 2003).
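A sketch of these three TLFB classifications is given below, assuming a record of one drink count per day and sex coded as "F" or "M". Rescaling the proportion of 5+ days to an average month of 365.25 / 12 days is an assumption, since the text does not state the month length used; the function name and example records are hypothetical.

```python
BINGE_DRINKS = {"F": 4, "M": 5}    # drinks on one day that count as a binge
WEEKLY_LIMIT = {"F": 7, "M": 14}   # risky if MORE than this in any 7-day window
HEAVY_DRINKS = 5                   # heavy-drinking day: 5+ drinks, regardless of sex
HEAVY_DAYS_PER_MONTH = 5
DAYS_PER_MONTH = 365.25 / 12       # assumed average month length


def classify_tlfb(daily_drinks: list[int], sex: str) -> dict[str, bool]:
    """Binge, heavy, and risky status from a daily TLFB record."""
    binge = any(d >= BINGE_DRINKS[sex] for d in daily_drinks)

    # Heavy: share of 5+ days, rescaled to days per average month
    share_5plus = sum(d >= HEAVY_DRINKS for d in daily_drinks) / len(daily_drinks)
    heavy = share_5plus * DAYS_PER_MONTH >= HEAVY_DAYS_PER_MONTH

    # Risky: any binge, or more than the weekly limit in a consecutive 7-day window
    window_totals = (sum(daily_drinks[i:i + 7]) for i in range(len(daily_drinks) - 6))
    over_limit = any(total > WEEKLY_LIMIT[sex] for total in window_totals)
    return {"binge": binge, "heavy": heavy, "risky": binge or over_limit}


# Hypothetical 29-day records
woman = [3, 3, 3] + [0] * 26   # 9 drinks in one week, never 4+ in a day
man = [0] * 29
for day in (0, 6, 12, 18, 24):
    man[day] = 6               # five 6-drink days, at most two in any 7-day window

print(classify_tlfb(woman, "F"))  # {'binge': False, 'heavy': False, 'risky': True}
print(classify_tlfb(man, "M"))    # {'binge': True, 'heavy': True, 'risky': True}
```

Five 5+ days in 29 scale to about 5.25 days a month, just over the heavy-drinking threshold; four such days would scale to about 4.2 and would not qualify.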
As with the quantity and frequency estimates, the TLFB data are scored according to the additional items in each QF questionnaire. However, not all measures include the additional items required to assess all of these behaviors. For example, the NLSY includes only one item beyond the typical quantity and frequency items: “On how many days did you have five or more drinks in the past 30 days?” with possible responses “0, 1–4, 5–9, 10–14, 15–19, 20–24, 25–29, and 30.” Therefore, when binge drinking is assessed using the NLSY, individuals are considered binge drinkers if they have consumed 5+ drinks on any occasion. Note that measures that include only a 5+ drinking item would misclassify women who have never consumed 5+ drinks but have had 4+ on at least one occasion as non–binge drinkers. All examined measures except the NLSY and the AUDIT include a 4+ drinking item or use sex-specific thresholds to assess binge drinking. Heavy drinking is assessed similarly: if an individual’s frequency of drinking 5+ drinks (regardless of sex) is 5 or more times a month, they are classified as a heavy drinker.
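The NLSY scoring can be sketched as below, assuming the count of 5+ days has already been expressed over the item’s 30-day window. The names are hypothetical. Because the “5–9” option begins exactly at 5 days, the heavy-drinking classification matches the threshold directly, whereas the binge classification cannot detect women whose heaviest days involved exactly 4 drinks.

```python
# NLSY 5+ drinking item (past 30 days): response options from Table 1 as (low, high)
NLSY_5PLUS = [(0, 0), (1, 4), (5, 9), (10, 14), (15, 19), (20, 24), (25, 29), (30, 30)]


def nlsy_response(days_5plus: int) -> tuple[int, int]:
    """Response option selected by someone with this many 5+ days in 30 days."""
    for low, high in NLSY_5PLUS:
        if low <= days_5plus <= high:
            return (low, high)
    raise ValueError("expected an integer number of days from 0 to 30")


def nlsy_status(days_5plus: int) -> dict[str, bool]:
    low, _ = nlsy_response(days_5plus)
    return {
        "binge": low >= 1,  # any 5+ day; 4-drink days by women are not captured
        "heavy": low >= 5,  # the 5-9 option starts exactly at the 5-days threshold
    }


# A woman with several 4-drink days but no 5+ days is scored as a non-binge drinker
print(nlsy_status(0))  # {'binge': False, 'heavy': False}
print(nlsy_status(4))  # {'binge': True, 'heavy': False}
print(nlsy_status(5))  # {'binge': True, 'heavy': True}
```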
The maximum number of drinks item is incorporated in the assessment of risky drinking. An individual is considered a risky drinker if they have consumed 4+/5+ drinks on any occasion, if their QF statistic exceeds 7/14 drinks per week, or if their maximum drinks response is 4+/5+. In the case of the NLSY, only the quantity response, the QF statistic, and the 5+ drinking frequency are used to determine risky drinking, whereas for the NIAAA items, the quantity item, the QF statistic, the binge drinking item, and the maximum drinks item are all used (i.e., all possible items are used).
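The rule can be sketched as a function that uses whichever scored items a measure provides, with an absent item passed as `None`. The function name and example inputs are hypothetical, the inputs are assumed to be already scored (bin midpoints, QF in drinks per week), and flagging a typical quantity at or above the binge threshold is our reading of how the quantity response enters the rule. The example shows how the same woman can be classified differently depending on whether a maximum-drinks item is available.

```python
from typing import Optional

BINGE_DRINKS = {"F": 4, "M": 5}
WEEKLY_LIMIT = {"F": 7, "M": 14}


def risky_from_items(sex: str, quantity: float, qf_per_week: float,
                     binge_days: Optional[float] = None,
                     max_drinks: Optional[float] = None) -> bool:
    """Risky drinking from the scored items a measure includes (None = item absent).

    binge_days is the scored frequency from the measure's own binge item,
    whatever drink threshold that item uses.
    """
    threshold = BINGE_DRINKS[sex]
    checks = [quantity >= threshold, qf_per_week > WEEKLY_LIMIT[sex]]
    if binge_days is not None:
        checks.append(binge_days > 0)
    if max_drinks is not None:
        checks.append(max_drinks >= threshold)
    return any(checks)


# Woman: typical 2 drinks, QF of 6 drinks/week, never 5+, but a maximum of 4 drinks
print(risky_from_items("F", 2, 6, binge_days=0, max_drinks=4))  # True (max item present)
print(risky_from_items("F", 2, 6, binge_days=0))                # False (5+ item only)
```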
In the example presented in the appendix, the hypothetical individual consumed an identical number of drinks on each drinking occasion. This is unlikely to be the case in real data, as individuals’ actual drinking levels fluctuate over time (e.g., weekends versus weekdays, holidays). The example simply demonstrates how the QF statistic is calculated and how estimated consumption can differ greatly depending on the measure used.
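The QF calculation itself multiplies a scaled quantity by a scaled frequency. The sketch below illustrates it under explicit assumptions: the response options are hypothetical (not the wording of any instrument examined here), closed options are scaled to their midpoints, an open top option needs an externally supplied scaling value, and frequency is reported per 30-day month. The helpers `scale` and `qf_drinks_per_week` are illustrative names only.

```python
from typing import List, Optional, Tuple

# A response option as (lowest value, highest value); a highest value of None marks
# an open-ended top option such as "10 or more".
Option = Tuple[float, Optional[float]]


def scale(value: float, options: List[Option], top_value: Optional[float] = None) -> float:
    """Return the scaling value of the option that `value` falls into."""
    for low, high in options:
        if high is None and value >= low:
            if top_value is None:
                raise ValueError("an open top option needs an explicit scaling value")
            return top_value
        if high is not None and low <= value <= high:
            return (low + high) / 2  # closed option: use its midpoint
    raise ValueError(f"{value} does not fall into any response option")


def qf_drinks_per_week(quantity: float, days_per_month: float,
                       q_options: Optional[List[Option]] = None,
                       f_options: Optional[List[Option]] = None,
                       q_top: Optional[float] = None,
                       f_top: Optional[float] = None) -> float:
    """Quantity x frequency per 30-day month, converted to drinks per week.

    With no options supplied, the answers are used as given (continuous items).
    """
    q = scale(quantity, q_options, q_top) if q_options else quantity
    f = scale(days_per_month, f_options, f_top) if f_options else days_per_month
    return q * f * 7 / 30


# Hypothetical drinker: 4 drinks on each of 6 days in a 30-day month.
coarse_q: List[Option] = [(1, 2), (3, 4), (5, 6), (7, 9), (10, None)]
coarse_f: List[Option] = [(0, 0), (1, 3), (4, 12), (13, 30)]
print(qf_drinks_per_week(4, 6))                      # continuous items
print(qf_drinks_per_week(4, 6, coarse_q, coarse_f))  # binned items
```

With continuous items the result is 5.6 drinks per week, matching the 24 drinks actually consumed in 30 days. With the hypothetical binned items the quantity is scaled to 3.5 and the frequency to 8 days, giving about 6.53 drinks per week for the same drinker.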
Results
Table 2 presents (1) the mean and standard deviation of the absolute difference between individuals’ estimated drinks per week using each QF and the drinks per week estimated by the TLFB, (2) the mean and standard deviation of estimated consumption in the population, (3) the product–moment (Pearson’s r) and rank (Kendall’s τ) correlations6 between consumption estimated by the TLFB and the QF estimates, and (4) the base rates of binge, heavy, and risky drinking as defined in the previous section. Table 3 summarizes this information by presenting the rank ordering of QF measures on these statistics. Significant differences between the QF estimates and the TLFB estimates were also examined for the estimated consumption mean and the base rates of hazardous drinking. Significance was determined using Dunnett’s test for multiple comparisons against a common control, with p values corrected to control the Type I error rate; corrected p values less than .05 were considered significant.
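For readers who want to reproduce the two correlation coefficients, the sketch below implements Pearson's r and the tie-adjusted Kendall's τ (τ-b), in which the denominator counts only pairs not tied on each variable. It is a plain O(n²) illustration with hypothetical function names, not the software used for the published analyses; it assumes neither variable is constant.

```python
import math
from itertools import combinations
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Product-moment correlation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def kendall_tau_b(x: Sequence[float], y: Sequence[float]) -> float:
    """(concordant - discordant) / sqrt(N1 * N2), where N1 and N2 count the
    pairs not tied in x and not tied in y, respectively."""
    concordant = discordant = n1 = n2 = 0
    for (xi, yi), (xj, yj) in combinations(zip(x, y), 2):
        dx, dy = xi - xj, yi - yj
        if dx != 0:
            n1 += 1
        if dy != 0:
            n2 += 1
        if dx * dy > 0:
            concordant += 1
        elif dx * dy < 0:
            discordant += 1
    return (concordant - discordant) / math.sqrt(n1 * n2)


# Ties of the kind produced by binned QF responses: x = TLFB volume, y = binned QF.
print(kendall_tau_b([1, 2, 2, 3], [1, 1, 2, 3]))  # 0.8
```

In practice, `scipy.stats.kendalltau` computes the τ-b variant by default and scales to large samples; the loop above is only meant to make the role of tied pairs explicit.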
Table 2.
Comparisons Between TLFB- and QF-Estimated Consumption.
Data set | TLFB | NLSY | NIAAA | AUDADIS | NAS | AH | AUDIT |
---|---|---|---|---|---|---|---|
(a) Average absolute difference between TLFB- and QF-estimated consumption | | | | | | | |
COMBINE | – | 7.81 (10.06) | 11.78 (15.55) | 10.03 (11.66) | 10.28 (11.63) | 13.97 (21.66) | 21.09 (26.14) |
ER-Cont | – | 0.74 (1.73) | 1.04 (1.99) | 0.94 (1.81) | 1.15 (2.01) | 1.21 (2.22) | 1.71 (3.28) |
ER-Cases | – | 0.96 (2.26) | 1.45 (2.83) | 1.29 (2.54) | 1.52 (2.78) | 1.65 (3.20) | 2.19 (4.01) |
(b) Estimated mean (SD) of consumption | | | | | | | |
COMBINE | 66.12 (43.24) | 63.51 (43.65) | 63.28 (43.04) | 62.69 (43.17) | 62.34 (43.34) | 59.68 (32.68)* | 60.21 (29.85)* |
ER-Cont | 4.88 (7.67) | 4.51 (7.56) | 4.41 (7.46) | 4.40 (7.32) | 4.50 (7.14) | 4.60 (7.51) | 4.51 (8.45) |
ER-Cases | 6.74 (10.89) | 6.28 (10.36) | 6.31 (10.88) | 6.29 (10.68) | 6.37 (10.49) | 6.44 (10.45) | 6.02 (10.12) |
(c) Pearson (r) and Kendall (τ) correlations between TLFB- and QF-estimated consumption | | | | | | | |
COMBINE r | – | 0.95 | 0.92 | 0.93 | 0.93 | 0.90 | 0.80 |
COMBINE τ | – | 0.83 | 0.78 | 0.80 | 0.79 | 0.75 | 0.65 |
ER-Cont r | – | 0.99 | 0.97 | 0.97 | 0.97 | 0.97 | 0.93 |
ER-Cont τ | – | 0.93 | 0.89 | 0.90 | 0.88 | 0.88 | 0.82 |
ER-Cases r | – | 0.99 | 0.97 | 0.98 | 0.97 | 0.97 | 0.95 |
ER-Cases τ | – | 0.93 | 0.89 | 0.90 | 0.89 | 0.89 | 0.84 |
(d) Estimated base rates of binge, risky, and heavy episodic drinking | | | | | | | |
COMBINE Binge | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
COMBINE Heavy | 0.99 | 0.97* | 0.97* | 0.95* | 0.98* | 0.98* | 0.98 |
COMBINE Risky | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
ER-Cont Binge | 0.37 | 0.37 | 0.37 | 0.37 | 0.37 | 0.37 | 0.37 |
ER-Cont Heavy | 0.08 | 0.08 | 0.08 | 0.05* | 0.12 | 0.12 | 0.12 |
ER-Cont Risky | 0.38 | 0.37 | 0.37 | 0.37 | 0.37 | 0.37 | 0.38 |
ER-Cases Binge | 0.51 | 0.48 | 0.48 | 0.48 | 0.40* | 0.51 | 0.48 |
ER-Cases Heavy | 0.16 | 0.16 | 0.16 | 0.10* | 0.19 | 0.19 | 0.19 |
ER-Cases Risky | 0.53 | 0.50 | 0.50 | 0.52 | 0.52 | 0.52 | 0.51 |
Note. COMBINE = sample of alcohol-dependent individuals from the COMBINE study; ER-Cont = sample of community controls from the ER study; ER-Cases = sample of individuals presenting at the ER for acute injury from the ER study; TLFB = Timeline Follow Back; QF = quantity–frequency; NLSY = National Longitudinal Survey of Youth; NAS = National Alcohol Survey; NIAAA = National Institute on Alcohol Abuse and Alcoholism; AUDADIS = Alcohol Use Disorders and Associated Disabilities Interview Schedule; AH = Add Health; AUDIT = Alcohol Use Disorders Identification Test; ER = emergency room.
(a) Mean (SD) of the absolute differences between each individual’s QF-estimated drinks per week and the drinks per week estimated using the TLFB.
(b) Mean (SD) of the estimated drinks per week.
(c) Pearson (r) and Kendall (τ) correlations between the QF- and TLFB-estimated drinks per week.
(d) Base rate estimates of binge (4 or more drinks on any day for a woman, 5 or more for a man), heavy (5 or more drinks, 5 or more times a month), and risky (7 or more drinks in any week for a woman, 14 or more for a man, OR 4+/5+ on any one day) drinking.
* An asterisk indicates that the estimate differs significantly from the TLFB estimate (Dunnett-corrected p < .05).
Table 3.
Ranking of Performance of Six Commonly Used Measures of Alcohol Consumption.
Statistic | NLSY | NIAAA | AUDADIS | NAS | AH | AUDIT |
---|---|---|---|---|---|---|
Overall average | 1.71 | 2.88 | 2.21 | 3.08 | 3.83 | 4.04 |
COMBINE | | | | | | |
Rank (Abs.Diff) | 1 | 4 | 2 | 3 | 5 | 6 |
Rank (M) | 1 | 2 | 3 | 4 | 6 | 5 |
Rank (SD) | 1 | 4 | 3 | 2 | 5 | 6 |
Rank (r) | 1 | 4 | 2 | 3 | 5 | 6 |
Rank (τ) | 1 | 4 | 2 | 3 | 5 | 6 |
Rank (BRBinge) | 1 | 1 | 1 | 1 | 1 | 1 |
Rank (BRHeavy) | 2 | 2 | 1 | 4 | 4 | 4 |
Rank (BRRisky) | 1 | 1 | 1 | 1 | 1 | 1 |
Average rank | 1.13 | 2.75 | 1.88 | 2.63 | 4.00 | 4.38 |
ER-Cases | | | | | | |
Rank (Abs.Diff) | 1 | 3 | 2 | 4 | 5 | 6 |
Rank (M) | 1 | 3 | 2 | 4 | 5 | 6 |
Rank (SD) | 5 | 1 | 2 | 3 | 4 | 6 |
Rank (r) | 1 | 3 | 2 | 4 | 5 | 6 |
Rank (τ) | 1 | 3 | 2 | 4 | 5 | 6 |
Rank (BRBinge) | 2 | 2 | 2 | 1 | 6 | 2 |
Rank (BRHeavy) | 5 | 5 | 1 | 2 | 2 | 2 |
Rank (BRRisky) | 1 | 1 | 4 | 4 | 4 | 3 |
Average rank | 2.13 | 2.63 | 2.13 | 3.25 | 4.50 | 4.63 |
ER-Cont | | | | | | |
Rank (Abs.Diff) | 1 | 3 | 2 | 4 | 5 | 6 |
Rank (M) | 2 | 5 | 6 | 4 | 1 | 2 |
Rank (SD) | 2 | 4 | 5 | 6 | 3 | 1 |
Rank (r) | 1 | 3 | 2 | 4 | 5 | 6 |
Rank (τ) | 1 | 3 | 2 | 4 | 5 | 6 |
Rank (BRBinge) | 1 | 1 | 1 | 1 | 1 | 1 |
Rank (BRHeavy) | 5 | 5 | 1 | 2 | 2 | 2 |
Rank (BRRisky) | 2 | 2 | 2 | 2 | 2 | 1 |
Average rank | 1.88 | 3.25 | 2.63 | 3.38 | 3.00 | 3.13 |
Note. COMBINE = sample of alcohol-dependent individuals from the COMBINE study; ER-Cont = sample of community controls from the ER study; ER-Cases = sample of individuals presenting at the ER for acute injury from the ER study; QF = quantity–frequency; TLFB = Timeline Follow Back; NLSY = National Longitudinal Survey of Youth; NAS = National Alcohol Survey; NIAAA = National Institute on Alcohol Abuse and Alcoholism; AUDADIS = Alcohol Use Disorders and Associated Disabilities Interview Schedule; AH = Add Health; AUDIT = Alcohol Use Disorders Identification Test. Each cell is the rank of each QF’s performance in estimating TLFB consumption. Abs.Diff = average absolute difference between each individual’s QF-estimated consumption and TLFB-estimated consumption; M = mean; SD = standard deviation; BR = base rate; r = Pearson correlation between QF and TLFB volume; τ = Kendall correlation between QF and TLFB volume. Binge: 4+/5+ in a day; Heavy: 5+, 5 or more times a month; Risky: 7+/14+ a week OR 4+/5+ in a day. Mean, SD, and BR rankings are based on the absolute difference between the QF estimate and the TLFB estimate. Measures were ranked on Abs.Diff from the smallest value, and on r and τ from the highest correlation.
Examining the average absolute differences, it is clear that, regardless of the sample, the NLSY QF has the smallest difference from the TLFB volume (recall that this measure uses two continuous scales of quantity and frequency). In the COMBINE data, the average absolute difference varies dramatically, from 7.81 drinks per week (NLSY) to 21.09 (AUDIT). The estimates from the ER study vary far less (between 0.74 and 1.71 drinks for the Control sample and between 0.96 and 2.19 drinks for the Case sample), with the largest difference from the TLFB estimate coming from the AUDIT items in all samples. The standard deviations of these differences are large, exceeding the mean in every case.
The estimated means and standard deviations of consumption in the second panel of Table 2 indicate that all QF measures underestimate the mean level of consumption in each sample. However, only two mean estimates differ significantly from the TLFB-estimated consumption: those from the Add Health items and the AUDIT items in the COMBINE sample.
The distribution of consumption is presented in histograms of each QF and the TLFB for each data set in Figure 1 (see supplementary appendix available online). These plots show that the Add Health and AUDIT items decrease the estimated mean consumption by collapsing all individuals in the upper tail of the distribution into the uppermost bin. This has a marked effect on the COMBINE sample, where high consumption is prevalent, but a smaller effect on the ER Control and ER Case samples, where heavy drinking is less prevalent.
The next panel of Table 2 presents the product–moment (Pearson) and rank (Kendall) correlations between the TLFB-estimated consumption per week and the QF estimates. As expected from the properties of each coefficient, Kendall’s τ takes smaller values than Pearson’s r; however, both measures of correlation demonstrate moderate to high levels of association, with the lowest Pearson correlation resulting from the AUDIT in the COMBINE sample (.80) and the lowest Kendall correlations also coming from the AUDIT (.65 in the COMBINE sample and .82 in the ER Control sample). The NLSY, in addition to having the lowest absolute difference from the TLFB, also has the highest correlations in every sample, with no Pearson correlation lower than .95 and no Kendall correlation lower than .83. The correlations are visualized in plot matrices in Figure 2 (see supplementary appendix available online), which display pairwise scatter plots of each measure (below the diagonal), density plots (on the diagonal), and contour plots (above the diagonal). Rank correlations can be visualized by examining the first column of the plot matrices, which contains the scatter plots of the QF measures against the TLFB. Points that lie along a horizontal line show that the same TLFB volume results in different QF estimates. Points that lie along a vertical line have the same QF estimate but different TLFB estimates. Density and contour plots are smoothed representations of the univariate and bivariate distributions of the measures, respectively, where more lines in the contour plot indicate a denser group of points. Contour plots are helpful when many points are tightly clustered, showing the density where a scatter plot may be obscured. These plots show the “ceiling effects” in the Add Health and AUDIT measures that were seen in the histograms: the upper levels of consumption are truncated, and there is no resolution at these levels. Again, this has a much more dramatic effect on the COMBINE sample than on the ER Controls or ER Cases.
The final measure of performance is the estimation of the base rates of hazardous drinking: (1) binge drinking (4+/5+ on any day), (2) heavy drinking (5+ drinks on 5 or more days in a month), and (3) risky drinking (4+/5+ on any day or 7+/14+ in any week). The fourth panel of Table 2 presents the base rates and marks with an asterisk the estimates that differ significantly from the TLFB-calculated base rate (again using Dunnett-corrected p values and a significance threshold of .05). The COMBINE sample, composed entirely of alcohol-dependent individuals, has a base rate of nearly 1 for each type of hazardous drinking. The measures perform very well in this case, with the only significant differences occurring in the estimation of heavy drinking. Here, the QF measures estimate base rates lower than the TLFB-indicated base rate of 0.99 (significantly so for all measures except the AUDIT). However, the lowest base rate (from the AUDADIS) is 0.95, so the differences, although statistically significant, may not be of practical significance. The ER Controls and ER Cases have fewer significant differences, but the differences are much larger than in the COMBINE study. In the ER Cases, for example, the NAS estimates a binge drinking base rate of only 0.40, whereas the TLFB estimates it to be 0.51. The ER Controls and ER Cases also show instances in which the items overestimate the base rate of heavy drinking. Because the NAS, Add Health, and AUDIT surveys combine heavy drinking 1 to 2 times a week into a single response (scaled to 1.5 times a week, or 6.42 days a month, which is above the 5-days-a-month threshold), they estimate a heavy drinking base rate of 0.12 in the ER Controls, whereas the TLFB finds a base rate of only 0.08 (0.19 versus 0.16 in the ER Cases).
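The overestimation of heavy drinking follows from simple arithmetic on the midpoint of a "1 to 2 times a week" option. The sketch below assumes a 30-day month, under which the midpoint of 1.5 times a week works out to about 6.43 days a month, close to the 6.42 given above; the exact figure depends on the weeks-per-month convention. The helper names are hypothetical.

```python
DAYS_PER_MONTH = 30.0  # assumption: a 30-day month
DAYS_PER_WEEK = 7.0


def weekly_to_monthly(times_per_week: float) -> float:
    """Convert a weekly frequency to days per 30-day month."""
    return times_per_week * DAYS_PER_MONTH / DAYS_PER_WEEK


def heavy_from_weekly_option(low: float, high: float, threshold: float = 5.0) -> bool:
    """Heavy drinking (5+ days a month) scored from a 'low-high times a week'
    option scaled to its midpoint."""
    return weekly_to_monthly((low + high) / 2) >= threshold


true_monthly = weekly_to_monthly(1.0)         # exactly one 5+ day a week
scored_heavy = heavy_from_weekly_option(1, 2)  # ...answered as "1 to 2 times a week"
print(round(true_monthly, 2), true_monthly >= 5, scored_heavy)  # 4.29 False True
```

A respondent with one 5+ day a week falls below the threshold (about 4.29 days a month) but is classified as a heavy drinker once the option is scaled to its midpoint.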
To bring together all of these results, the ranking in performance of the measures is calculated on each of: (1) smallest average absolute difference with TLFB, (2) smallest difference with TLFB in estimated mean drinks per week, (3) smallest difference with TLFB in estimated standard deviation of drinks per week, (4) highest product–moment (r) correlation, (5) highest rank (τ) correlation, (6) smallest difference in base rate estimate of binge drinking, (7) smallest difference in base rate estimate of heavy drinking, and (8) smallest difference in base rate estimate of risky drinking. The measures are ranked individually for each dataset and the average rank is also provided in Table 3.
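The ranking step can be sketched as follows. The sketch assumes that tied values share the best (lowest) rank, which is consistent with the all-1 BRBinge rows for COMBINE in Table 3; the function name is hypothetical. Applied to the rounded COMBINE values in Table 2, it reproduces the Abs.Diff and M rows of Table 3, although ties in rounded values elsewhere may be broken differently than in the published ranks, which were presumably computed from unrounded estimates.

```python
from typing import List, Sequence


def competition_rank(scores: Sequence[float], smaller_is_better: bool = True) -> List[int]:
    """Rank scores 1..k; tied scores share the best rank (standard competition ranking)."""
    ordered = sorted(scores, reverse=not smaller_is_better)
    return [ordered.index(s) + 1 for s in scores]


measures = ["NLSY", "NIAAA", "AUDADIS", "NAS", "AH", "AUDIT"]

# COMBINE, Table 2: average absolute difference from the TLFB (smaller is better).
abs_diff = [7.81, 11.78, 10.03, 10.28, 13.97, 21.09]
# COMBINE, Table 2: estimated means, ranked by absolute distance from the TLFB mean.
tlfb_mean = 66.12
qf_means = [63.51, 63.28, 62.69, 62.34, 59.68, 60.21]

print(dict(zip(measures, competition_rank(abs_diff))))
print(dict(zip(measures, competition_rank([abs(m - tlfb_mean) for m in qf_means]))))

# Average rank for the NLSY over the eight COMBINE statistics in Table 3.
nlsy_combine = [1, 1, 1, 1, 1, 1, 2, 1]
print(sum(nlsy_combine) / len(nlsy_combine))  # 1.125, reported as 1.13
```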
Unsurprisingly, the best overall average rank is achieved by the NLSY (1.71). The worst rank that the NLSY receives is 5, which occurs three times: twice in the estimation of the base rate of heavy drinking (in the ER Controls and the ER Cases) and once for the standard deviation in the ER Cases. In all other cases, the NLSY’s rank is either 1 or 2. The measure receiving the most ranks of 6 and the worst overall average rank is the AUDIT (4.04). In five cases, however, the AUDIT received a rank of 1, and when only the three hazardous drinking base rates are considered, its average rank is second only to that of the AUDADIS.
Discussion
Considering that the TLFB requires lengthy administration and that even graduated frequency scales are often viewed as too lengthy, it seems unlikely that QF measures will be supplanted by more time-intensive survey approaches anytime soon. Not surprisingly, our results demonstrate that all QF item-based measures of alcohol consumption lose, to varying degrees, some of the fine-grained drinking history captured in a TLFB interview.
Of particular note, the estimated mean level of drinking is consistently underestimated (this replicates previous comparisons of the two forms of consumption assessment; Grant, Tonigan, & Miller, 1995; O’Hare, 1991), the rank order of total amount consumed is not strictly maintained, and the estimation of hazardous drinking rates and total volume of alcohol consumed can be inaccurate. Moreover, there is considerable variability among existing instruments and the choice of specific questions to administer in a survey or interview can have a substantial impact on the accuracy of consumption estimates when using the highly resolved TLFB data as the criterion. Of the six alternative measures chosen for this examination, the NLSY items (two continuous measures of quantity and frequency) corresponded most closely with the TLFB in all three samples in estimating the amount of alcohol consumed in a week. However, when comparing measures in how well they reproduce rates of exceeding thresholds for binge, heavy, and risky drinking, the AUDADIS items perform the best (see Table 1 for the wording of these items).
The results presented indicate that the consumption of heavy drinking samples will not be fully captured by measures such as the ADDH or AUDIT. Because TLFB assessments are themselves known to underestimate the overall quantity and frequency of use, researchers should, when possible, avoid compounding this underestimation with measures that fail to capture use fully. Large differences emerge in the estimation of consumption under different binnings of quantity and frequency. Depending on which measure is used, the estimated rate of heavy drinking can differ roughly twofold in both the control ER sample (e.g., the AUDADIS estimates the rate of heavy drinking at 5%, whereas both the NAS and ADDH estimate 12%) and the case ER sample (e.g., the AUDADIS estimates 10%, whereas both the NAS and ADDH estimate 19%). In both ER samples, moreover, the AUDADIS significantly underestimates heavy drinking compared with the “true” TLFB scores (see Table 2). Research on what is sometimes referred to as high-intensity drinking has found that many drinking episodes far exceed the current binge thresholds (4+ for women/5+ for men); it is not unusual for individuals to drink double or triple these amounts (10+/15+ drinks; Patrick & Azar, 2018). Scales whose highest response is only 10+ drinks are unlikely to resolve these drinking patterns.
The AUDIT, likely the most commonly used of the six measures presented here, is typically used to screen respondents for harmful drinking and alcohol-related problems. This measure was the least accurate for estimating the volume of alcohol consumed per week, especially in the clinical sample, because the highest response on the quantity item is only 10 or more drinks. While the AUDIT showed adequate performance in estimating the base rates of hazardous drinking (consistent with Aalto et al., 2009; Sobell et al., 1982) and has considerable validity for screening purposes, it is less satisfactory for estimating QF. Consequently, if researchers would like to use the AUDIT for screening purposes AND obtain a reasonably accurate assessment of QF, we would recommend supplementing the assessment with more finely resolved QF measures (e.g., the NLSY items). These results also have significant implications for integrative data analysis (Curran & Hussong, 2009; Lenzerini, 2002): differing item responses across QF questionnaires can present considerable challenges, as the bins frequently have overlapping boundaries.
Limitations
The issue of scaling the bin values was not considered in the analyses in this article; in line with common practice, each frequency response option was scaled to the midpoint of its bin. The scaling value for the highest quantity bin of each QF measure was set to the average quantity of all individuals above the highest threshold. This uses information not available to a researcher who assesses alcohol consumption only with quantity and frequency items, and thus yields a more accurate scaling value for this category than can be expected in real-world settings. A wider discussion of the effect of scaling bins of QF measures should include approaches to imputing a value for the highest bin in the absence of knowledge of the true distribution of the sample’s consumption. Although the problem of imputing a value for the highest bin is the most obvious, the issue generalizes across all strata of consumption levels: depending on the width of a bin and the underlying true distribution of consumption, midpoints could under- or overestimate the median. Consequently, calibration studies in which a random subsample of participants receives both the QF measure and a TLFB would permit more accurate conversion metrics if time and resources allow. The current analyses also considered only drinking patterns over the 86 days before randomization, whereas most of the comparison QF measures refer to a 12-month period. Typical drinking over a year may vary more than drinking within the past 3 months.
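The difference between the scaling used here and what a QF-only study could do can be illustrated with a small sketch. It assumes hypothetical quantity options with an open "10 or more" top option; closed options take their midpoints, and the top option takes either the mean of the true values at or above its lower bound (the approach described above, which requires TLFB data) or, as a naive fallback chosen purely for illustration, its lower bound. The function name and data are hypothetical.

```python
from statistics import mean
from typing import List, Optional, Sequence, Tuple

Option = Tuple[float, Optional[float]]  # (low, high); high=None marks an open top option


def scaling_values(options: List[Option],
                   true_values: Optional[Sequence[float]] = None) -> List[float]:
    """Midpoints for closed options. For the open top option: the mean of the true
    values at or above its lower bound when known, else the lower bound itself."""
    values = []
    for low, high in options:
        if high is not None:
            values.append((low + high) / 2)
            continue
        above = [v for v in (true_values or []) if v >= low]
        values.append(float(mean(above)) if above else float(low))
    return values


quantity_options: List[Option] = [(1, 2), (3, 4), (5, 6), (7, 9), (10, None)]
tlfb_quantities = [1, 3, 4, 10, 12, 20]  # hypothetical typical quantities

print(scaling_values(quantity_options, tlfb_quantities))  # top option scaled with TLFB data
print(scaling_values(quantity_options))                   # what a QF-only study might use
```

Here the top option is scaled to 14.0 with TLFB information but to 10.0 without it, which would understate volume for the heaviest drinkers; any real imputation rule would need to be justified for the sample at hand.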
These analyses presuppose the accuracy of the TLFB as an estimate of drinking quantity. Although QF measures have been shown to yield lower estimates of overall drinking than a TLFB interview (O’Hare, 1991), the TLFB in turn yields lower reports of overall drinking than real-time measures (Carney et al., 1998; Searles et al., 2000). Future studies should examine whether the same differences are found with other highly resolved, fine-grained measures of alcohol consumption; the TLFB is not the only kind of data that could permit this kind of work. Ecological momentary assessment and daily diary studies contain enough detailed data on drinking to allow the same approach, and other data sources may be useful for evaluating the generality of the patterns seen with the TLFB. Various approaches assess consumption over a more highly resolved time frame: daily diaries and ecological momentary assessment both record the number of drinks in close temporal proximity to the drinking events, and emerging approaches directly assess blood alcohol in real time (e.g., transdermal alcohol sensors). However, all these approaches impose considerable time and effort burden on the participant compared with brief, efficient QF questionnaires. Because of this, QF questionnaires are likely to be employed in most clinical and research settings.
Furthermore, the above results can be interpreted as reflecting the correspondence between the TLFB and the QF in a best-case scenario, in which an individual is able to calculate their median drinks per drinking day. Because the term “typical” is ambiguous and it is not clear what heuristic respondents use when answering a quantity item, we also examined the mean and the mode in parallel analyses (see the appendix). The median showed the best correspondence with the TLFB, and the mode corresponded slightly less well than the mean; there were no cases in which the ordering of performance differed based on the measure of typical drinking. Although the mode is intuitively the statistic most likely to match an individual’s interpretation of “typical” when answering the quantity items, our preliminary analyses did not show it to correspond most closely with the results of QF items. It is much more likely that individuals are heterogeneous in how they answer these items, or that they calculate no statistic at all and instead rely on their most recent drinking occasions through some type of availability heuristic (Carey, Borsari, Carey, & Maisto, 2006; Tversky & Kahneman, 1973). The discussion of “typical” drinking is somewhat outside the scope of this article, whose main focus was to determine the effects of alternative binning and scaling of the drinking history. It should also be noted that the two studies used differing definitions of a standard alcoholic drink; however, this was not an issue for the purposes of this study, because no inferences are made between studies.
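The three candidate summaries of "typical" quantity can be computed directly from a TLFB record. A minimal sketch, assuming drinking days are those with at least one drink and, as an arbitrary assumption, that ties for the mode are broken by taking the largest value; `typical_quantity` is a hypothetical helper.

```python
from statistics import mean, median, multimode
from typing import Dict, Sequence


def typical_quantity(daily_drinks: Sequence[int]) -> Dict[str, float]:
    """Mean, median, and mode of drinks per drinking day from a TLFB record."""
    drinking_days = [d for d in daily_drinks if d > 0]
    if not drinking_days:
        return {"mean": 0.0, "median": 0.0, "mode": 0.0}
    return {
        "mean": float(mean(drinking_days)),
        "median": float(median(drinking_days)),
        # Assumption: when several values tie for most frequent, take the largest.
        "mode": float(max(multimode(drinking_days))),
    }


# Hypothetical nine-day record: mostly 2-drink days plus two heavier days.
print(typical_quantity([0, 2, 2, 0, 3, 8, 0, 2, 6]))
```

For this record the median is 2.5 and the mode is 2, while the two heavier days pull the mean up to about 3.83, showing how the choice of summary changes the quantity answer that a QF item would receive.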
Some problems common to all alcohol assessments are differences in the definition of a standard drink across cultures (12–14 grams of ethanol in the United States); moreover, the alcohol content of beers, wines, and even distilled spirits can vary by a factor of 2. This is in addition to variation in what people consider to be a standard drink. Some questionnaires try to address this issue by establishing and presenting standards for what is meant by a “standard drink.” We also recognize that the alcohol content of beers, wines, and distilled spirits, and the definition of a standard drink, can vary greatly by geographical location. For example, a standard drink in the United States is defined as roughly 10 grams of alcohol, while a standard drink in Japan is defined as 20 grams (Turner, 1990). Consequently, we must always be careful in generalizing any QF data across countries and historical periods.
Conclusion
While QF measures are widely used in the study of drinking behavior, they vary widely in the number and ranges of their response options. Although commonly used QF measures appear to be highly correlated with one another, they differ significantly, to varying degrees, in their ability to resolve the total volume of alcohol consumed over a given period of time and to estimate the frequency of heavy drinking. Our analyses suggest that these problems can be mitigated by employing more finely resolved measures of both quantity and frequency when more comprehensive assessments (e.g., graduated QF scales, TLFB, electronic diaries, and ecological momentary assessment) are not feasible.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The present study was supported by the NIH grants K05AA017242, T32AA013526, and R01AA024133.
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Supplemental Material
Supplemental material for this article is available online.
See definitions of hazardous drinking thresholds at http://www.niaaa.nih.gov/alcohol-health/overview-alcohol-consumption/moderate-binge-drinking
The National Alcohol Survey is a series of nationally representative surveys conducted every 5 to 6 years by the Alcohol Research Group (www.arg.org) that assesses alcohol use patterns and problems related to alcohol. The NAS began collection in the 1960s and is the longest consistently repeated alcohol survey. For more detail, see http://arg.org/center/national-alcohol-surveys.
The procedure for the TLFB portion of data collection in the ER Study (Vinson et al., 2003) was modeled on Project MATCH (Project MATCH Research Group, 1993), the predecessor of COMBINE (Anton et al., 2006).
It is important to note that the NIAAA defines a drinking binge as drinking that occurs within 2 hours. However, the duration of drinking was not assessed in the ER study, so this criterion was disregarded in these analyses. We acknowledge this as a limitation of the results.
Kendall’s τ (Kendall, 1938) is a rank correlation coefficient; the tie-adjusted form used here is suited to data with many tied ranks, which are likely in the binned data created by the QF measures. Kendall’s τ is calculated as $\tau = \frac{n_c - n_d}{\sqrt{N_1 N_2}}$, where $n_c$ and $n_d$ are the numbers of concordant and discordant pairs, $N_1$ is the number of pairs not tied in the TLFB, and $N_2$ is the number of pairs not tied in the QF.
References
- Aalto M, Alho H, Halme JT, & Seppa K (2009). AUDIT and its abbreviated versions in detecting heavy and binge drinking in a general population survey. Drug and Alcohol Dependence, 103, 25–29. doi: 10.1016/j.drugalcdep.2009.02.013 [DOI] [PubMed] [Google Scholar]
- Anton R, O’Malley S, Ciraulo D, Cisler RA, Couper D, Donovan DM, … Zweben A (2006). Combined pharmacotherapies and behavioral interventions for alcohol dependence. The COMBINE study: A randomized controlled trial. JAMA Journal of the American Medical Association, 295, 2003–2017. [DOI] [PubMed] [Google Scholar]
- Babor TF, Brown J, & Del Boca FK (1990). Validity of self-reports in applied research on addictive behaviors: Fact or fiction? Behavioral Assessment, 12, 5–31. [Google Scholar]
- Boynton MH, & Richman LS (2014). An online daily diary study of alcohol use using Amazon’s Mechanical Turk. Drug and Alcohol Review, 33, 456–461. doi: 10.1111/dar.12163 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Bureau of Labor Statistics, US Department of Labor. (2012). National Longitudinal Survey of Youth 1979 Cohort, 1979–2010 (rounds 1–24). Produced and distributed by the Center for Human Resource Research. [Google Scholar]
- Carey KB, Borsari B, Carey MP, & Maisto SA (2006). Patterns and importance of self-other differences in college drinking norms. Psychology of Addictive Behaviors, 20, 385–393. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Carney MA, Tennen H, Affleck G, Del Boca FK, & Kranzler HR (1998). Levels and patterns of alcohol consumption using timeline follow-back, daily diaries and real-time “electronic interviews.” Journal of Studies on Alcohol, 59, 447–454. [DOI] [PubMed] [Google Scholar]
- Corrao G, Bagnardi V, Zambon A, & Arico S (1999). Exploring the dose-response relationship between alcohol consumption and the risk of several alcohol-related conditions: A meta-analysis. Addiction, 94, 1551–1573. [DOI] [PubMed] [Google Scholar]
- Courtenay WH (2000). Engendering health: A social constructionist examination of men’s health beliefs and behaviors. Psychology of Men & Masculinities, 1, 4–15. [Google Scholar]
- Curran PJ, & Hussong AM (2009). Integrative data analysis: The simultaneous analysis of multiple data sets. Psychological Methods, 14, 81–100. doi: 10.1037/a0015914 [DOI] [PMC free article] [PubMed] [Google Scholar]
- de Goeij MCM, Suhrcke M, Toffolutti V, van de Mheen D, Schoenmakers TM, & Kunst AE (2015). How economic crises affect alcohol consumption and alcohol-related health problems: A realist systematic review. Social Science & Medicine, 131, 131–146. doi: 10.1016/J.SOCSCIMED.2015.02.025 [DOI] [PubMed] [Google Scholar]
- DeHart T, Tennen H, Armeli S, Todd M, & Mohr C (2009). A diary study of implicit self-esteem, interpersonal interactions and alcohol consumption in college students. Journal of Experimental Social Psychology, 45, 720–730. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Dulin PL, Alvarado CE, Fitterling JM, & Gonzalez VM (2017). Comparisons of alcohol consumption by timeline follow back vs. smartphone-based daily interviews. Addiction Research & Theory, 25, 195–200. doi: 10.1080/16066359.2016.1239081 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Feunekes GI, van’t Veer P, van Staveren WA, & Kok FJ (1999). Alcohol intake assessment: The sober facts. American Journal of Epidemiology, 150, 105–112. [DOI] [PubMed] [Google Scholar]
- Grant BF, Dawson DA, Stinson FS, Chou PS, Kay W, & Pickering R (2003). The Alcohol Use Disorder and Associated Disabilities Interview Schedule–IV (AUDADIS-IV): Reliability of alcohol consumption, tobacco use, family history of depression and psychiatric diagnostic modules in a general population sample. Drug and Alcohol Dependence, 71, 7–16. [DOI] [PubMed] [Google Scholar]
- Grant KA, Tonigan JS, & Miller WR (1995). Comparison of three alcohol consumption measures: A concurrent validity study. Journal of Studies on Alcohol, 56, 168–172. [DOI] [PubMed] [Google Scholar]
- Greenfield TK (2000). Ways of measuring drinking patterns and the difference they make: Experience with graduated frequencies. Journal of Substance Abuse, 12, 33–49. [DOI] [PubMed] [Google Scholar]
- Harris KM, Halpern CT, Whitsel E, Hussey J, Tabor J, Entzel P, & Udry JR (2009). The National Longitudinal Study of adolescent to adult health: Research design. Retrieved from http://www.cpc.unc.edu/projects/addhealth/design
- Hasin DS, Keyes KM, Hatzenbuehler ML, Aharonovich EA, & Alderson D (2007). Alcohol consumption and posttraumatic stress after exposure to terrorism: Effects of proximity, loss, and psychiatric history. American Journal of Public Health, 97, 2268–2275. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Heeb JL, & Gmel G (2005). Measuring alcohol consumption: A comparison of graduated frequency, quantity frequency, and weekly recall diary methods in a general population survey. Addictive Behaviors, 30, 403–413. [DOI] [PubMed] [Google Scholar]
- Kendall M (1938). A new measure of rank correlation. Biometrika, 30(12), 81–89. [Google Scholar]
- Koppes LLJ, Twisk JWR, Snel J, & Kemper HCG (2002). Concurrent validity of alcohol consumption measurement in a “healthy” population; quantity-frequency questionnaire v. dietary history interview. British Journal of Nutrition, 88, 427–434. doi: 10.1079/BJN2002671
- Lenzerini M (2002). Data integration: A theoretical perspective. In Proceedings of the Twenty-First ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (pp. 233–246). doi: 10.1145/543613.543644
- Midanik LT (1994). Comparing usual quantity/frequency and graduated frequency scales to assess yearly alcohol consumption: Results from the 1990 US National Alcohol Survey. Addiction, 89, 407–412.
- Midanik LT, Tam TW, Greenfield TK, & Caetano R (1996). Risk functions for alcohol-related problems in a 1988 US national sample. Addiction, 91, 1427–1437.
- Miller WR, Tonigan JS, & Longabaugh R (1995). The Drinker Inventory of Consequences (DrInC): An instrument for assessing adverse consequences of alcohol abuse: Test manual (No. 95). Washington, DC: U.S. Department of Health and Human Services, Public Health Service, National Institutes of Health, National Institute on Alcohol Abuse and Alcoholism.
- National Institute on Alcohol Abuse and Alcoholism. (2003). The Task Force on Recommended Alcohol Questions. Retrieved from http://www.niaaa.nih.gov/research/guidelines-and-resources/recommended-alcohol-questions
- O’Hare T (1991). Measuring alcohol consumption: A comparison of the retrospective diary and the quantity-frequency methods in a college drinking survey. Journal of Studies on Alcohol, 52, 500–502.
- Patrick ME, & Azar B (2018). High-intensity drinking. Alcohol Research, 39, 49–55.
- Pedersen ER, Grow J, Duncan S, Neighbors C, & Larimer ME (2012). Concurrent validity of an online version of the Timeline Followback assessment. Psychology of Addictive Behaviors, 26, 672–677. doi: 10.1037/a0027945
- Piasecki TM (2019). Assessment of alcohol use in the natural environment. Alcoholism: Clinical and Experimental Research, 43, 564–577. doi: 10.1111/acer.13975
- Project MATCH Research Group. (1993). Project MATCH (Matching Alcoholism Treatment to Client Heterogeneity): Rationale and methods for a multisite clinical trial matching patients to alcoholism treatment. Alcoholism: Clinical and Experimental Research, 17, 1130–1145.
- Rehm J, Greenfield TK, & Rogers JD (2001). Average volume of alcohol consumption, patterns of drinking, and all-cause mortality: Results from the US National Alcohol Survey. American Journal of Epidemiology, 153, 64–71.
- Reynolds K, Lewis B, Nolen JDL, Kinney GL, Sathya B, & He J (2003). Alcohol consumption and risk of stroke: A meta-analysis. JAMA, 289, 579–588.
- Rimm EB, Giovannucci EL, Willett WC, Colditz GA, Ascherio A, Rosner B, & Stampfer MJ (1991). Prospective study of alcohol consumption and risk of coronary disease in men. Lancet, 338, 464–468.
- Robinson SM, Sobell LC, Sobell MB, & Leo GI (2014). Reliability of the Timeline Followback for cocaine, cannabis, and cigarette use. Psychology of Addictive Behaviors, 28, 154–162. doi: 10.1037/a0030992
- Rossow I (2000). Suicide, violence and child abuse: A review of the impact of alcohol consumption on social problems. Contemporary Drug Problems, 27, 397–433.
- SAMHSA’s Center for the Application of Prevention Technologies, Northeast Resource Team (2011). Focus on student binge drinking: The prevalence and consequences. Waltham, MA: Author.
- Saunders JB, Aasland OG, Babor TF, De la Fuente JR, & Grant M (1993). Development of the Alcohol Use Disorders Identification Test (AUDIT): WHO collaborative project on early detection of persons with harmful alcohol consumption-II. Addiction, 88, 791–804.
- Searles JS, Helzer JE, Rose GL, & Badger GJ (2002). Concurrent and retrospective reports of alcohol consumption across 30, 90 and 366 days: Interactive voice response compared with the timeline follow back. Journal of Studies on Alcohol, 63, 352–362.
- Searles JS, Helzer JE, & Walter DE (2000). Comparison of drinking patterns measured by daily reports and timeline follow back. Psychology of Addictive Behaviors, 14, 277–286. doi: 10.1037/0893-164X.14.3.277
- Shiffman S (2016). Ecological momentary assessment. In Sher KJ (Ed.), The Oxford handbook of substance use and substance use disorders (Vol. 2, pp. 466–509). New York, NY: Oxford University Press.
- Sobell LC, Agrawal S, Sobell MB, Leo GI, Young LJ, Cunningham JA, & Simco ER (2003). Comparison of a quick drinking screen with the timeline followback for individuals with alcohol problems. Journal of Studies on Alcohol, 64, 858–861.
- Sobell LC, Cellucci T, Nirenberg TD, & Sobell MB (1982). Do quantity-frequency data underestimate drinking-related health risks? American Journal of Public Health, 72, 823–828.
- Sobell LC, & Sobell MB (1992). Timeline follow-back. In Litten RZ & Allen JP (Eds.), Measuring alcohol consumption (pp. 41–72). Totowa, NJ: Humana Press.
- Sobell LC, & Sobell MB (1995). Alcohol consumption measures. Assessing Alcohol Problems, 4, 55–76.
- Stahre M, Naimi T, Brewer R, & Holt J (2006). Measuring average alcohol consumption: The impact of including binge drinks in quantity-frequency calculations. Addiction, 101, 1711–1718. doi: 10.1111/j.1360-0443.2006.01615.x
- Stockwell T, Donath S, Cooper-Stanbury M, Chikritzhs T, Catalano P, & Mateo C (2004). Under-reporting of alcohol consumption in household surveys: A comparison of quantity-frequency, graduated-frequency and recent recall. Addiction, 99, 1024–1033.
- Straus R, & Bacon SD (1953). Drinking in college. New Haven, CT: Yale University Press.
- Turner C (1990). How much alcohol is in a ‘standard drink’? An analysis of 125 studies. British Journal of Addiction, 85, 1171–1175.
- Tversky A, & Kahneman D (1973). Availability: A heuristic for judging frequency and probability. Cognitive Psychology, 5, 207–232.
- U.S. Department of Health and Human Services. (2007). The Surgeon General’s call to action to prevent and reduce underage drinking. Washington, DC: U.S. Department of Health and Human Services, Office of the Surgeon General.
- Vinson DC, MacLure M, Reidinger C, & Smith GS (2003). A population-based case-crossover and case-control study of alcohol and the risk of injury. Journal of Studies on Alcohol, 64, 358–366.
- Wechsler H, & Nelson TF (2001). Binge drinking and the American college student: What’s five drinks? Psychology of Addictive Behaviors, 15, 287–291.
- Wiers RW, Hoogeveen KJ, Sergeant JA, & Gunning WB (1997). High- and low-dose alcohol-related expectancies and the differential associations with drinking in male and female adolescents and young adults. Addiction, 92, 871–888.
- Zemore SE, Karriker-Jaffe K, & Mulia N (2013). Temporal trends and changing racial/ethnic disparities in alcohol problems: Results from the 2000 to 2010 National Alcohol Surveys. Journal of Addiction Research and Therapy, 4, 160.