Expired-air carbon monoxide (CO) is commonly used to biochemically verify smoking status. The CO cutoff and CO monitor brand may affect the probability of classifying smokers as abstinent, thus influencing conclusions about the efficacy of cessation trials. No systematic reviews have tested this hypothesis. Therefore, we performed a meta-analysis examining whether the likelihood of smoking cessation classification varied due to CO cutoff and monitor brand.
Eligible studies (k=122) longitudinally assessed CO-verified cessation in adult smokers in randomized trials. Primary meta-regressions separately assessed differences in quit classification likelihood due to continuous and categorical CO cutoffs (Low, 3–4 parts per million [ppm]; [SRNT] Recommended, 5–6 ppm; Moderate, 7–8 ppm; and High, 9–10 ppm); exploratory analyses compared likelihood outcomes between monitor brands: Bedfont and Vitalograph.
The likelihood of quit classification increased 18% with each 1 ppm increase above the lowest cutoff (3 ppm). Odds of classification as quit significantly increased between each cutoff category and High: 261% increase from Low; 162% increase from Recommended; and 150% increase from Moderate. There were no differences in cessation classification between monitor brands.
As expected, higher CO cutoffs were associated with greater likelihood of cessation classification. The lack of CO monitor brand differences may have been due to model-level variance that could not be examined in the present dataset. Researchers are advised to report outcomes using a range of cutoffs, including the recommended range (5–6 ppm), and the CO monitor brand/model used. Using higher CO cutoffs significantly increases likelihood of quit classification, possibly artificially inflating the apparent efficacy of treatment strategies.
Keywords: Smoking Cessation, Bioverification, Expired-air Carbon Monoxide, Meta-analysis
Smoking cessation randomized clinical trials are the gold standard for determining the efficacy of interventions to promote quitting. Biochemical verification of abstinence in smoking cessation trials adds rigor to study methods and is strongly encouraged whenever possible (Benowitz et al., 2020). In other words, verifying smoking status objectively, rather than by self-report, is critical for accurately determining the efficacy of cessation treatments. Self-reported smoking status can be subject to bias, including recall bias and social desirability reporting, in which smokers unintentionally or intentionally misreport their smoking behavior (Gorber et al., 2009).
Cotinine and expired-air carbon monoxide (CO) testing are two of the most common forms of biochemical verification of smoking status. While both methods are feasible as point-of-care tests, expired-air CO has the advantage of being less costly, less invasive to obtain, and not affected by the use of nicotine replacement products (Benowitz et al., 2020, 2002). Historically, the common threshold (i.e., cutoff) of < 8–10 parts per million (ppm) for CO was considered indicative of abstinence, and was the gold standard of detection endorsed by the Society for Research on Nicotine and Tobacco (SRNT) (Benowitz et al., 2002).
However, that cutoff recommendation was made in 2002 and since that time, there has been increasing skepticism about its validity. More recently, research has supported lowering the threshold to 6 ppm and potentially to as low as 3 ppm as a means of increasing specificity (Cropsey et al., 2006, 2014; Deveci et al., 2004; Javors et al., 2005; Low et al., 2004; MacLaren et al., 2010; Perkins et al., 2013). Single sample non-randomized studies assessing the impact of reducing the CO threshold on cessation prevalence found mixed results, depending on which CO cutoff was employed (Brose et al., 2013; Cropsey et al., 2008). For example, Cropsey et al. (2008) found that the prevalence of cessation doubled when comparing a cutoff of 3 ppm versus the standard 10 ppm (18.4% vs. 37.2% at end of treatment). Further, while Brose et al. (2013) indicated no significant differences when using < 10 ppm vs. < 8 ppm (35% vs. 34.7%), they found that the prevalence of cessation reduced significantly when compared to a < 3 ppm cutoff (26.3%). Taken together, these findings suggest that the CO cutoff utilized to determine abstinence does impact cessation prevalence. Yet another important confounding variable to consider is the type of CO monitor used—prior research has demonstrated that CO results from two commonly used monitor brands differ significantly (Karelitz et al., 2017).
Based on findings from these studies, SRNT recently updated biomarker verification recommendations for tobacco use and abstinence and reduced the CO threshold recommendation for smoking cessation to 5–6 ppm, while also recommending that all studies report which model CO monitor was utilized (Benowitz et al., 2020). However, these recommendations are based on a narrative synthesis of the literature and did not distinctly measure the impact of using different CO cutoffs on smoking cessation prevalence. The purpose of this study was to conduct a meta-analysis of published smoking cessation randomized trials to examine whether the likelihood of being classified as quit varies due to use of different CO cutoffs, ranging 3–10 ppm. We also explored whether cessation outcomes differed by the brand of CO monitor used.
2.1. Search strategy and selection criteria
We conducted a literature search in April 2020 in PubMed using the following combination of keywords/terms:
(“Bupropion”[MeSH Terms] OR “varenicline”[MeSH Terms] OR “Tobacco Use Cessation Devices”[MeSH Terms]) AND (“smoking cessation”[MeSH Terms] OR (“smoking”[All Fields] AND “cessation”[All Fields]) OR “smoking cessation”[All Fields]) AND Clinical Trial[ptyp] AND (“2010/01/01”[PDAT] : “2020/12/31”[PDAT]) AND “humans”[MeSH Terms] AND English[lang] AND “adult”[MeSH Terms]
We also solicited articles via the SRNT Treatment Research Network listserv to identify additional studies for inclusion.
Article titles and abstracts were reviewed by the authors to determine eligibility. Studies were eligible for inclusion if they: (a) longitudinally assessed cessation in adult smokers (i.e., ages ≥18 years); (b) randomized participants to treatment groups; (c) recruited ≥ 50 participants (Nüesch et al., 2010); (d) used expired-air CO to confirm abstinence; (e) reported the CO cutoff that was used; (f) presented original data (i.e., secondary analyses were not included) published no earlier than 2010 in a peer-reviewed journal; and (g) were written in English. Our goal was not to provide a comprehensive review of the entire smoking cessation trial literature; therefore, we limited inclusion to studies published since 2010. We expected a literature search within this ten-year period to identify enough relevant studies to allow us to thoroughly examine our research question using the meta-analytic procedures outlined below.
2.2. Study coding and data extraction
Nine coders—three authors (JLK, EAM, CWC) and six assistants—were trained on procedures for independently coding data from eligible studies. In order to standardize data extraction, data were coded into an online database via a survey programmed in Qualtrics (Qualtrics, 2020). Data from all studies were double-entered; issues encountered during data coding and discrepancies between coders were resolved in consultation with authors JLK and EAM. Extracted variables included: sample size, intervention/treatment type, follow-up period (in weeks), CO cutoff, brand of CO monitor used, and the proportion of participants who were classified as quit (i.e., the dependent measure). Cessation outcomes from all follow up periods that were reported were coded separately for each intervention type (Lipsey and Wilson, 2001).
2.3. Data synthesis and analysis plan
Data were analyzed using meta-regression with restricted maximum likelihood (REML) estimation in Comprehensive Meta-Analysis 3.0 (CMA) (Borenstein et al., 2013). CMA software converted cessation proportions (i.e., the percent of each treatment group with CO below the respective criterion) into logits for use as the effect size in all analyses. These normally distributed logits are preferred over proportions, as the latter are constrained between 0 and 1. Analysis of proportion data often results in underestimated confidence intervals and overestimated levels of heterogeneity (Lipsey and Wilson, 2001). For all results, β indicates a logit regression coefficient and ‘odds’ refers to an exponentiated β.
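The conversion from a proportion to a logit can be sketched in a few lines. This is a minimal illustration, assuming the standard logit transform and its large-sample variance; it is not a reproduction of CMA's internal routine, and the function names and the example counts (25 of 100) are hypothetical.

```python
import math

def logit_effect_size(quit_count, n):
    """Return (logit, standard_error) for quit_count of n participants classified as quit.

    Assumes 0 < quit_count < n; proportions of exactly 0 or 1 would need a
    continuity correction, which this sketch does not attempt.
    """
    p = quit_count / n
    logit = math.log(p / (1 - p))
    # Large-sample variance of a logit-transformed proportion.
    variance = 1 / (n * p) + 1 / (n * (1 - p))
    return logit, math.sqrt(variance)

def logit_to_proportion(logit):
    """Back-transform a logit to a proportion between 0 and 1."""
    return 1 / (1 + math.exp(-logit))

# Hypothetical treatment group: 25 of 100 participants below the CO cutoff.
example_logit, example_se = logit_effect_size(quit_count=25, n=100)
```

Unlike the raw proportion, the logit is unbounded, which is why it behaves better as a regression outcome.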
As our aim was to examine whether the likelihood of classifying smokers as abstinent varied due to the CO cutoff used, and not to compare specific intervention or treatment efficacy, analyses collapsed across treatment subgroups and used study as the level of analysis (Borenstein et al., 2001). Effect sizes were averaged across follow-up periods, where applicable (due to limitations of CMA software, analyzing the effect of follow-up period would require treating these repeated-measures data as independent, increasing the risk of Type I error). All analyses used random-effects models, which allowed for estimation of both between-study (T²) and within-study sources of variance (Nikolakopoulou et al., 2014). To quantify the proportion of variance explained for each model with covariates, we calculated R² using the following formula:
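A conventional definition, assumed here for illustration and not confirmed as the exact expression the analysis used, is the proportional reduction in between-study variance when covariates are added to the intercept-only model. The subscripts are labels introduced for this sketch.

```latex
R^2 = 1 - \frac{T^2_{\text{with covariates}}}{T^2_{\text{intercept only}}}
```

Under this assumed definition, the rounded T² values reported in Sections 3.2.1 and 3.3.1 (0.70 and 0.59) give 1 − 0.59/0.70 ≈ 0.16, which is in line with the reported R² of 0.15 once rounding of the inputs is taken into account.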
Effect size heterogeneity was assessed using Cochran’s Q and I² statistics. When significant, Cochran’s Q indicates that the heterogeneity in effect sizes between studies is not due to random error (Higgins and Thompson, 2002). The I² statistic reflects the amount of variance between studies due to real differences (i.e., not sampling error). In other words, I² quantifies the proportion of heterogeneity that may possibly be explained by covariates; values of 0, <30, and >50% indicate no, moderate, and high levels of heterogeneity, respectively (Higgins and Thompson, 2002).
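Both statistics are simple to compute from study-level effect sizes and variances. The sketch below uses the usual inverse-variance definition of Cochran's Q and the Higgins and Thompson definition of I²; the function names are illustrative, and the last two lines apply the I² formula to the Q values reported in Sections 3.2.1 and 3.3.1.

```python
def cochran_q(effects, variances):
    """Cochran's Q: weighted sum of squared deviations from the pooled effect."""
    weights = [1 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    return sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))

def i_squared(q, df):
    """I² as a percentage: share of total variation beyond sampling error."""
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100

# Q and degrees of freedom as reported for the two models in this paper.
i2_intercept_only = round(i_squared(3584.65, 121), 2)
i2_continuous_cutoff = round(i_squared(3226.76, 120), 2)
```

Applied to the reported Q statistics, this definition returns 96.62% and 96.28%, matching the I² values given in the Results.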
2.3.1. Preliminary analysis and publication bias
A preliminary intercept-only meta-regression model was first estimated to determine whether the overall prevalence of cessation was greater than zero. Publication bias was then assessed across all studies using Kendall’s tau (Begg and Mazumdar, 1994) and Duval and Tweedie’s trim and fill method (Duval, 2005; Duval and Tweedie, 2000). Kendall’s tau looks for an inverse correlation between effect size and sample size, which would indicate publication bias (i.e., whether large studies tended to have small effect sizes and small studies tended to have large effect sizes). Duval and Tweedie’s trim and fill method imputes values for missing studies needed to balance the funnel plot and tests whether these imputed values affect the overall effect size.
2.3.2. Primary analyses
Primary analyses used random effects meta-regression to assess whether the CO cutoff used to determine quit status was related to the cessation effect size. We separately examined CO cutoff as a continuous (centered at 3 ppm) and categorical covariate. Categories for CO cutoffs were guided by the most recent recommendations for using CO to verify smoking abstinence (Benowitz et al., 2020): Low (3–4 ppm; k = 13), Recommended (5–6 ppm; k = 16), Moderate (7–8 ppm; k = 29), and High (9–10 ppm; k = 64).
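The two codings of the covariate can be written out explicitly. This is a minimal sketch, assuming whole-number cutoffs in the 3–10 ppm range; the function names are hypothetical, while the category bounds and study counts are taken from the text above.

```python
# (name, lowest ppm, highest ppm, number of studies k) for each category.
CATEGORIES = [
    ("Low", 3, 4, 13),
    ("Recommended", 5, 6, 16),
    ("Moderate", 7, 8, 29),
    ("High", 9, 10, 64),
]

def categorize_cutoff(cutoff_ppm):
    """Map a CO cutoff in ppm to its category name."""
    for name, low, high, _k in CATEGORIES:
        if low <= cutoff_ppm <= high:
            return name
    raise ValueError(f"cutoff {cutoff_ppm} ppm is outside the 3-10 ppm range")

def center_cutoff(cutoff_ppm):
    """Center the continuous covariate so that 0 corresponds to 3 ppm."""
    return cutoff_ppm - 3

total_studies = sum(k for _name, _low, _high, k in CATEGORIES)
```

Centering at 3 ppm means the regression intercept describes studies using the lowest cutoff, and the slope describes the change per 1 ppm above it.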
2.3.3. Secondary analyses
As previously discussed, earlier research by Karelitz et al. (2017) found that CO values can vary due to the brand of CO monitor used. Therefore, secondary analyses examined the potential moderating effect of CO monitor brand (Vitalograph or Bedfont only) on cessation effect size by including a dichotomous covariate for CO monitor brand and the interaction of CO monitor brand by continuous CO cutoff.
2.3.4. Exploratory analyses
Exploratory analyses examined whether adjusting CO cutoffs based on monitor brand would affect associations with cessation effect size. We used data from an earlier study of 654 pairs of consecutively obtained CO values from Vitalograph and Bedfont monitors (Karelitz et al., 2017) to derive equivalent CO values between these monitor brands. Vitalograph CO values were regressed on Bedfont CO values to obtain a conversion equation.
Using this conversion equation to adjust Vitalograph CO cutoffs resulted in values ≤ 5 ppm each being increased by 1 ppm and those 6–10 ppm being increased by 2 ppm.
3.1. Characteristics of included studies
The Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) flow diagram (Moher et al., 2009) is displayed in Figure 1. The literature search identified 2,279 studies and an additional 14 publications were found through searching reference lists and responses to our listserv request. Following removal of duplicates, titles and abstracts of 2,173 articles were screened and 596 full-texts were assessed for eligibility. Overall, a total of 122 individual studies provided 605 effect sizes.
Figure 1.
Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) flow diagram of study selection.
3.1.1. Study-level characteristics
Study characteristics are presented in Supplementary Table 1. Most studies (116 out of 122) provided data for more than one follow-up period. Duration of follow-up periods ranged from 1 to 64 weeks, with a mean (SD) of 20.5 (14.9) weeks. Sensitivity analyses excluding follow-up periods < 4 weeks (11 effect sizes excluded) gave results that were not different from those of analyses including all follow-up periods, so all follow-up periods were included in the results detailed below. The number of subgroups per study ranged from 2 to 6, with a mean (SD) of 2.3 (0.8). Fewer than half of the studies reported the brand of CO monitor used (k=51); 52 authors responded to our email inquiry with the missing CO monitor information. Most studies used a Bedfont CO monitor (k=73), while others used Vitalograph (k=19), a mix of Bedfont and Vitalograph (k=2), other brands (k=7), or did not know which brand they used (k=3). The number of participants analyzed per study ranged from 20 to 1841 (combined n = 46,949), with a mean (SD) of 384.8 (368.0). Due to participant attrition or exclusion following randomization to treatment groups, two studies analyzed fewer than the n ≥ 50 recruited. Sensitivity analyses excluding these two studies provided results consistent with analyses including all studies; therefore, these two studies were included in the results reported in subsequent sections.
3.2. Preliminary analyses
3.2.1. Overall cessation classification
The overall proportion classified as quit, 27.86% (SE = 1.08), was significantly greater than zero when collapsing across follow-up periods and interventions, β = −1.28, 95% CI [−1.43, −1.12], t(121) = −16.22, p < 0.001. Cessation effect sizes varied significantly, Q(121) = 3584.65, p < 0.001, with between-study variance, T², estimated at 0.70. Almost all observed variance (I² = 96.62%) reflected differences in study effects.
3.2.2. Publication bias
Kendall’s tau was not significant, τ = −0.12, Zτ = 1.95, p > 0.05, suggesting no publication bias. Duval and Tweedie’s trim and fill method found no missing studies to the left of the mean, further supporting an absence of publication bias.
3.3. Primary meta-regression analyses
3.3.1. Continuous CO cutoff
CO cutoff (centered at 3 ppm) was significantly associated with cessation effect sizes, β = 0.17, SE = 0.04, 95% CI [0.10, 0.24], t(120) = 4.66, p < 0.001. As illustrated in Figure 2, for each one-unit increase in CO cutoff above 3 ppm, the odds of being classified as abstinent increased by 18%. CO cutoff explained 15% of the variance in effect sizes, R² = 0.15. There was significant heterogeneity across effect sizes, Q(120) = 3226.76, p < 0.001, with between-study variance, T², estimated at 0.59 and I² of 96.28%.
Figure 2.
Meta-regression estimated likelihood of being classified as abstinent by the expired-air carbon monoxide cutoff (CO; in ppm units) used to determine smoking status. Each one ppm increase in CO cutoff resulted in an 18% increase in likelihood of being classified as abstinent.
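The 18% figure follows from exponentiating the logit coefficient, as described in Section 2.3. The sketch below shows that arithmetic using the rounded β and confidence limits reported above; the function name is illustrative.

```python
import math

def percent_change_in_odds(beta, units=1):
    """Percent change in odds for a `units` increase in the covariate."""
    return (math.exp(beta * units) - 1) * 100

# Rounded coefficient and 95% confidence limits for continuous CO cutoff.
per_ppm = percent_change_in_odds(0.17)
per_ppm_lower = percent_change_in_odds(0.10)
per_ppm_upper = percent_change_in_odds(0.24)
```

With the rounded β of 0.17, the result is roughly 18.5% per 1 ppm, consistent with the reported 18%. Because the effect is multiplicative on the odds scale, increases over several ppm compound rather than add.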
3.3.2. Categorical CO criteria
Regression-estimated percent abstinent (with 95% CIs) by CO cutoff category are presented in Table 1. Overall, there was significant heterogeneity in effect sizes between CO cutoff categories, F(3, 118) = 5.70, p = 0.001. The odds of being classified as abstinent significantly increased between each cutoff category and High (9–10 ppm): 261% increase from Low (3–4 ppm), p < 0.001; 162% increase from Recommended (5–6 ppm), p = 0.04; and a 150% increase from Moderate (7–8 ppm), p = 0.03.
Table 1.
Meta-regression estimated percent abstinent and 95% confidence intervals by CO cutoff categories.
CO Cutoff Category | k | Percent Abstinent | 95% CI LL | 95% CI UL |
Low (3–4 ppm) | 13 | 13.78 | 8.57 | 22.15 |
Recommended (5–6 ppm) | 16 | 22.21 | 14.77 | 33.19 |
Moderate (7–8 ppm) | 29 | 23.91 | 17.67 | 32.36 |
High (9–10 ppm) | 64 | 35.94 | 29.38 | 43.97 |
Note. Values estimated using meta-regression; CO is expired-air carbon monoxide; k is number of studies; LL is lower limit; UL is upper limit; ppm is parts per million of CO.
Relative to the Low category (3–4 ppm), the odds of being classified as abstinent were not significantly different from those in the Recommended (5–6 ppm), p = 0.13, or Moderate (7–8 ppm) categories, p = 0.05. Similarly, cessation classification odds were not different between the Recommended (5–6 ppm) and Moderate (7–8 ppm) categories, p = 0.77.
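The category contrasts with High can be checked against Table 1 with simple arithmetic. The sketch below divides the High estimate by each other category's estimate; it is an illustrative reading of the published numbers, not a re-analysis, and the variable and function names are hypothetical.

```python
# Regression-estimated values from Table 1.
table_1 = {
    "Low": 13.78,
    "Recommended": 22.21,
    "Moderate": 23.91,
    "High": 35.94,
}

def ratio_to_high(category):
    """Ratio of the High (9-10 ppm) estimate to the named category's estimate."""
    return table_1["High"] / table_1[category]

ratios = {name: round(ratio_to_high(name), 2)
          for name in ("Low", "Recommended", "Moderate")}
```

The resulting ratios (about 2.61, 1.62, and 1.50) line up with the 261%, 162%, and 150% figures, which suggests that those percentages express the High estimate relative to each comparison category.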
3.4. Secondary analyses
Analyses involving CO monitor brands were restricted to studies reporting having exclusively used one of the two most commonly reported brands: Bedfont (k = 73) and Vitalograph (k = 19). There was no difference in effect sizes between CO monitor brands, β = 0.31, 95% CI [−0.82, 1.43], t(88) = 0.54, p = 0.59. Similarly, the interaction of CO cutoff by CO monitor brand was not significant, β = −0.05, 95% CI [−0.26, 0.15], t(88) = −0.52, p = 0.60.
3.5. Exploratory analyses
To equate CO cutoffs between monitor brands, Vitalograph cutoffs ranging 1–5 ppm were increased by 1 ppm and those 6–10 ppm were adjusted upwards by 2 ppm. Only studies having reported using Vitalograph or Bedfont monitors were included in exploratory analyses (k = 92). Using the adjusted CO cutoffs, neither the main effect of CO monitor brand nor the interaction of adjusted cutoffs by monitor brand had a significant association with cessation effect size, ps > 0.44.
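The adjustment rule amounts to a two-step lookup. The sketch below encodes only the rule as stated in the text (add 1 ppm for Vitalograph cutoffs of 1–5 ppm, add 2 ppm for 6–10 ppm); it does not reproduce the underlying conversion equation, and the function name and the handling of brands are assumptions made for illustration.

```python
def adjust_cutoff(cutoff_ppm, brand):
    """Return the cutoff on a common scale; only Vitalograph values are shifted."""
    if brand == "Bedfont":
        return cutoff_ppm
    if brand != "Vitalograph":
        raise ValueError("only Bedfont and Vitalograph were included")
    if 1 <= cutoff_ppm <= 5:
        return cutoff_ppm + 1
    if 6 <= cutoff_ppm <= 10:
        return cutoff_ppm + 2
    raise ValueError(f"no adjustment defined for {cutoff_ppm} ppm")

adjusted_examples = [adjust_cutoff(c, "Vitalograph") for c in (3, 5, 6, 10)]
```

For example, a Vitalograph cutoff of 6 ppm is treated as 8 ppm, whereas a Bedfont cutoff of 6 ppm is left unchanged.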
Biochemical verification of smoking status is crucial for the rigorous evaluation of cessation. Complicating such measurement is the variation in cutoffs used to classify participants as abstinent or not. Focusing on one commonly used bioverification method—expired-air CO—we used meta-analysis techniques to examine how the likelihood of being classified as abstinent varied across a range of cutoffs among randomized smoking cessation trials. Overall, we identified a significant amount of heterogeneity in the likelihood of being classified as abstinent across all studies. Importantly, CO cutoff was found to significantly affect cessation classification. As expected, studies using higher CO cutoffs to determine smoking status were more likely to classify participants as abstinent than those using lower cutoffs. The likelihood of being classified as abstinent increased with higher cutoffs, with an 18% rise in classification with each 1 ppm increase above 3 ppm—the lowest cutoff used among included studies.
Studies using cutoffs at the higher end of measurement may incorrectly classify nonabstinent smokers as being abstinent, leading to cessation outcomes that may not be indicative of real-world patterns and quit success. On the other hand, cessation levels within studies using lower cutoffs would seem relatively underwhelming compared to those using higher cutoffs, which has implications for further evaluation of those strategies and for their implementation and adoption in the treatment of tobacco use disorder. The absolute proportion quit reported in a study likely influences future work and clinician adoption, without consideration of how abstinence was determined.
Comparing between categorical CO cutoffs, we found that studies using the highest cutoffs 9–10 ppm were 261% more likely to classify participants as quit than those using cutoffs 3–4 ppm, consistent with earlier studies that found a similar pattern of cessation classification when comparing between low and high CO cutoffs within their respective samples (Brose et al., 2013; Cropsey et al., 2008). However, we also observed significant differences between the middle cutoff categories (i.e., 5–6 and 7–8 ppm) versus the highest 9–10 ppm category—contrary to Brose et al. (2013). It is important to note that each of these earlier studies examined how adjusting the CO cutoff affected quit proportions within their respective samples, whereas the current project compared between studies. It is unclear whether the current findings would generalize to within-study comparisons. Additional research is needed to confirm our findings by examining within-study differences in cessation classification across a range of CO cutoffs.
The likelihood of being classified as quit did not vary between the two most commonly used CO monitor brands, Vitalograph and Bedfont. While contrary to earlier research documenting differences in measurement between these monitor brands (Karelitz et al., 2017), the current null findings could be due to within-brand model-level idiosyncrasies. The Vitalograph BreathCO model has been available since 1999 (Vitalograph USA, 1999), whereas Bedfont has released 13 different models since 2000 (Covita, 2020). A recent study by Tuck et al. (2020) found significant variation in CO measurement between different models of Bedfont CO monitors, supporting the notion that model-level variance among Bedfont monitors may have hindered our ability to detect brand-level effects. Underreporting of CO monitor model information prevented further probing for such an effect in the current project. Future work should examine the impact of model-level variance on CO outcomes and conclusions.
Results should be interpreted in consideration of the study limitations. It is possible that not all potentially eligible studies were identified in our literature search, given the vast amount of smoking cessation research in the literature. While we made efforts to identify additional studies outside of the literature search (i.e., SRNT listserv request), our meta-analysis may have unintentionally missed otherwise eligible studies. Excluding studies that recruited fewer than 50 participants may have prevented inclusion of otherwise eligible studies. This decision was made to limit publication bias, the risk of which can increase when smaller studies are included in meta-analyses (Nüesch et al., 2010). We did not assess the influence of intervention type on cessation effect size, a variable that could have contributed to unexplained heterogeneity. As stated earlier, our aim was to assess how the CO cutoff used affected cessation effect sizes, not to gauge treatment or intervention effectiveness. Additional research is needed to determine whether CO cutoffs affect cessation outcomes within and between intervention types.
Findings from the current meta-analysis can inform design and reporting of findings from future smoking cessation studies. Overall, the current findings suggest that use of cutoffs ≥ 9 ppm can lead to significantly more participants being classified as quit compared to cutoffs ≤ 8 ppm. Researchers choosing to use CO cutoffs greater than the SRNT-recommended 5–6 ppm (Benowitz et al., 2020) in their cessation studies should also report outcomes when lower cutoffs are applied to allow for cross-study comparisons. Almost 60% of included studies did not report any information on the CO monitor used to confirm abstinence, consistent with an earlier literature search of smoking studies (Karelitz et al., 2017). Given that quit status is typically the primary dependent variable examined in smoking cessation research, it is important to identify how this variable was assessed, including information on the brand and model of the CO monitor. As shown here, the CO cutoff used has a large impact on abstinence classification and absolute rates of abstinence reported in smoking cessation publications. Consistent reporting of this CO monitor information—as earlier recommended (Benowitz et al., 2020; Karelitz et al., 2017)—would allow future research to explore variation in cessation outcomes due to CO monitor brand and model.
In conclusion, greater transparency is needed in the reporting of findings from smoking cessation research. We have shown that using a CO cutoff above the SRNT-recommended 5–6 ppm (Benowitz et al., 2020) results in a greater likelihood of classifying participants as abstinent, potentially leading to artificially inflated estimates of abstinence. Therefore, reporting abstinence outcomes using a range of cutoffs (e.g., 6 ppm, 8 ppm, and 10 ppm) would provide transparency in results, allow for cross-study comparisons, and better inform decisions regarding novel treatments or strategies for tobacco use disorder. Additionally, studies relying on expired-air CO to determine smoking status should report the brand and model of the CO monitor used. It is standard practice to provide the names and citations for measures used to quantify dependent variables in smoking cessation research (e.g., withdrawal, craving, self-efficacy); providing the brand and model of the CO monitor should not be the exception to this rule.
Supplementary Material
Bioverification of smoking abstinence is critical for any study assessing cessation
Likelihood of quit classification increased 18% with each 1 ppm increase over 3 ppm
More transparency is needed in reporting of smoking cessation research
Cessation outcomes need to be reported across several CO cutoff levels
Details of the CO monitor brand and model need to be reported in cessation studies
The authors wish to thank Elizabeth Bradley, Elizabeth Chapman, Violet Fratzke, Jenny Nankoua, Benjamin Laprise, and Peter Leahy at the Medical University of South Carolina, who assisted in data coding and extraction. The authors would also like to thank the researchers who responded to our email requesting additional information about the CO monitor used in their studies.
Role of Funding Source
This project was supported by National Institutes of Health grants (T32CA186873 [PI: Yuan]; R37CA237245 [PI: McClure]; K01DA043413 [PI: Pacek]; R01DA046096-01A1 [PI: Cropsey]; R01DA044112-01A1 [PI: Cropsey]; R01DA039678-01A1 [PI: Cropsey]; U01AA020802 [PI: Cropsey]). The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.
Conflict of Interest
No conflict declared.
