Author manuscript; available in PMC: 2014 Jan 13.
Published in final edited form as: Ann Appl Stat. 2012;6(4):1689–1706. doi: 10.1214/12-AOAS557

Truth and Memory: Linking Instantaneous and Retrospective Self-Reported Cigarette Consumption

Hao Wang, Saul Shiffman, Sandra D Griffith, Daniel F Heitjan
PMCID: PMC3889075  NIHMSID: NIHMS474004  PMID: 24432181

Abstract

Studies of smoking behavior commonly use the time-line follow-back (TLFB) method, or periodic retrospective recall, to gather data on daily cigarette consumption. TLFB is considered adequate for identifying periods of abstinence and lapse but not for measurement of daily cigarette consumption, thanks to substantial recall and digit preference biases. With the development of the hand-held electronic diary (ED), it has become possible to collect cigarette consumption data using ecological momentary assessment (EMA), or the instantaneous recording of each cigarette as it is smoked. EMA data, because they do not rely on retrospective recall, are thought to more accurately measure cigarette consumption. In this article we present an analysis of consumption data collected simultaneously by both methods from 236 active smokers in the pre-quit phase of a smoking cessation study. We define a statistical model that describes the genesis of the TLFB records as a two-stage process of mis-remembering and rounding, including fixed and random effects at each stage. We use Bayesian methods to estimate the model, and we evaluate its adequacy by studying histograms of imputed values of the latent remembered cigarette count. Our analysis suggests that both mis-remembering and heaping contribute substantially to the distortion of self-reported cigarette counts. Higher nicotine dependence, white ethnicity and male sex are associated with greater remembered smoking given the EMA count. The model is potentially useful in other applications where it is desirable to understand the process by which subjects remember and report true observations.

Keywords and phrases: Bayesian analysis, heaping, latent variables, longitudinal data, smoking cessation

1. Introduction

A common technique for eliciting consumption in studies of substance abuse is the time-line follow-back (TLFB) method, in which one asks subjects to report daily consumption retrospectively over the preceding week, month or other designated period. In smoking cessation research, for example, TLFB is one important method for measuring cigarette consumption and defining periods of quit and lapse.

Although TLFB is a practical approach to quantifying average smoking behavior (Brown et al. 1998), TLFB data can harbor substantial errors as measures of daily consumption (Klesges et al. 1995). TLFB questionnaires request exact daily cigarette counts, which smokers are unlikely to remember, particularly after several days have passed. Moreover some smokers may understate consumption to avoid the social stigma attached to excessive smoking or an inability to quit (Boyd et al. 1998). Thus smoking cessation studies typically require validation of TLFB reports of zero consumption by biochemical measurement of exhaled carbon monoxide or nicotine metabolites from saliva or blood.

A second concern is that histograms of TLFB-derived daily cigarette counts commonly exhibit spikes at multiples of 20, 10 or even 5 cigarettes. This phenomenon, known as “digit preference” or “heaping”, is thought to reflect a tendency to report consumption in terms of packs (each pack in the US contains 20 cigarettes) or half or quarter packs. The heaps presumably arise because many smokers do not remember precisely how many cigarettes they smoked and therefore report their count rounded off to a nearby convenient number. It has also been hypothesized that some smokers consume exactly an integral number of packs per day as a self-rationing strategy (Farrell, Fry and Harris 2003), but evidence so far suggests that such behavior, if it exists, causes only a small fraction of the observed heaping (Wang and Heitjan 2008). Indeed, Klesges et al. (1995) observed that the distribution of biochemical residues of smoking is smooth, suggesting that heaping is a phenomenon of reporting rather than consumption.

Recall bias and heaping bias in self-reported longitudinal cigarette counts potentially affect estimates of both means and treatment effects. Moreover, heaping may lead to underestimation of within-subject variability, because some smokers regularly report one pack rather than a precise count that varies around some mean in the vicinity of 20. If a large enough fraction of subjects in a study are of this kind, estimates of both within-subject and between-subject variability can be distorted.

Although there has been substantial research on statistical modeling of heaping and digit preference in a range of disciplines (Heitjan and Rubin 1990; Heitjan and Rubin 1991; Ridout and Morgan 1991; Pickering 1992; Klerman 1993; Torelli and Trivellato 1993; Dellaportas et al. 1996; Roberts and Brewer 2001; Wright and Bray 2003; Wolff and Augustin 2003), the only such application in smoking cessation research is that of Wang and Heitjan (2008), who described a latent-variable rounding model for heaped univariate TLFB cigarette count data. They postulated that the reported cigarette count is a function of the unobserved true count and a latent heaping behavior variable. The latter can take one of four values, representing exact reporting, rounding to the nearest 5, rounding to the nearest 10, and rounding to the nearest 20. Except for “exact” reporters (i.e., those who report counts not divisible by 5), one obtains at best partial information on the true count and the heaping behavior. They analyzed univariate count data from a smoking cessation clinical trial, assuming a zero-inflated negative binomial distribution for the true underlying counts together with an ordered categorical logistic selection model for heaping behavior given true count.

The analysis of Wang and Heitjan (2008) has three important limitations: First, they included only data from the last day of eight weeks of treatment, ignoring the 55 preceding days. Second, they assumed — without empirical verification — that reported counts not divisible by 5 were accurate. And third, they assumed that the preference for counts ending in 0 or 5 actually represented rounding rather than some other form of reporting error. That is, a declared count of 20 cigarettes was taken to mean that the true count was somewhere between 10 and 30 cigarettes, and was merely misreported as 20. In the absence of more accurate data on the true, underlying count, attempts to model heaping must rely on some such assumptions.

Precise assessment of smoking behavior has taken on increasing importance as researchers explore the value of reducing consumption as a way to lessen the harms of smoking (Shiffman et al. 2002, Hatsukami et al. 2002) and to improve the chance of ultimately quitting (Shiffman et al. 2009, Cheong et al. 2007). The advent of the inexpensive hand-held electronic diary (ED) that allows the instantaneous recording of ad libitum smoking has created the possibility of making much more accurate measurements. Such evaluation is an instance of ecological momentary assessment (EMA; Stone and Shiffman 1994), in that it generates records of events logged as they occur in real-life settings. In Shiffman (2009), researchers asked 236 participants in a smoking cessation study to use a specially programmed ED to record each cigarette as it was smoked over a 16-day pre-quit period; moreover the ED periodically prompted the smokers to record any cigarettes they had missed. At days 3, 8 and 15, subjects visited the clinic to complete a TLFB assessment of daily smoking since the preceding visit (2, 5 or 7 days previously), stating how many cigarettes they had smoked each day. The study found that while the TLFB data contained the expected heaps at multiples of 10 and 20, the EMA data had practically none. Average smoking rates from the two methods were moderately correlated (r = 0.77), but the within-subject correlation of daily consumption between TLFB and EMA was modest (r = 0.29). Self-report TLFB consumption was on average higher than EMA (by 2.5 cigarettes), but on 32% of days, subjects recorded more cigarettes by EMA than they later recalled by TLFB.

These data provide us with an opportunity — unprecedented, so far as we know — to study the relationship between self-reports of daily cigarette consumption by TLFB and EMA. To describe this relationship, we develop a statistical model with two components: The first is a regression that predicts the patient's notional “remembered” cigarette count (a latent factor) from the EMA count. The second is a regression that predicts the rounding behavior — described as in Wang and Heitjan (2008) with an ordinal logistic regression — from the remembered count and fully observed predictors. The models include random subject effects that describe the propensities of the subjects to mis-remember their actual consumption (in the first component) and to report the remembered consumption with a characteristic degree of accuracy (in the second). Assuming that EMA represents the true count, the first component of the model allows us to examine the recall bias resulting from mis-remembering, while the second component describes the heaped reporting errors.

2. Notation and model

Let Yit denote the observed heaped TLFB consumption for subject i on day t, i = 1,…, n, t = 1,…, mi, and let Yi = (Yi1,…, Yimi)T denote the vector of TLFB data for subject i. Let Xit be the EMA consumption on subject i, day t, and let Xi = (Xi1,…, Ximi)T be the vector of EMA data for subject i. We furthermore let Zi=(ZiR,ZiH) be a vector of baseline predictors for subject i, with ZiR representing predictors of recall and ZiH predictors of heaping. These predictor sets may overlap.

2.1. A model for remembered cigarette count

The first part of our model assumes that for each day and subject there is a notional remembered cigarette count, denoted Wit, with Wi = (Wi1, …, Wimi)T. We assume Wit is distributed as Poisson conditionally on a random effect bi, the EMA smoking pattern Xit and the covariate vector Zi, with mean

$$E(W_{it} \mid X_{it}, Z_i, b_i) = \exp\left(\beta_0 + \ln(X_{it})\,\beta_1 + Z_i^R \beta_2 + b_i\right). \tag{2.1}$$

The parameters β1 and β2 represent the effects of EMA consumption and baseline predictors, respectively, on the latent remembered count. The random effect bi, which we assume normally distributed with mean 0 and variance σb2, represents heterogeneity among subjects. We note that there are no 0 values of Xit in the Shiffman data, which are from a pre-quit study in which subjects were encouraged to smoke as normal. Thus we can include ln(Xit) as a predictor. In more general contexts where 0 EMA counts are possible, one can adjust the model in simple ways to avoid this problem. Moreover when excessive 0 counts occur in the TLFB data, one can fit a zero-inflated count model, as in Wang and Heitjan (2008), for the remembered count.
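As a worked consequence of (2.1), offered here as a sketch rather than a result stated in the text: because bi is normal with mean 0 and variance σb2, exp(bi) is lognormal with mean exp(σb2/2). Averaging the conditional mean over subjects therefore gives the marginal mean

```latex
\begin{aligned}
E(W_{it} \mid X_{it}, Z_i)
  &= E_{b_i}\!\left[\exp\left(\beta_0 + \ln(X_{it})\,\beta_1 + Z_i^R \beta_2 + b_i\right)\right] \\
  &= X_{it}^{\beta_1}\,\exp\!\left(\beta_0 + Z_i^R \beta_2 + \tfrac{1}{2}\sigma_b^2\right).
\end{aligned}
```

Written this way, the remembered count is a power function of the EMA count: a value of β1 below 1 means that remembered consumption grows less than proportionally with recorded consumption.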

2.2. A model for the latent heaping process

Following Wang and Heitjan (2008), we assume that a latent rounding indicator Git, with Gi = (Gi1, …, Gimi)T, dictates the degree of rounding to be applied to the notional remembered count Wit. Specifically, we let Git take one of four possible values: Git = 1 implies reporting the exact count, Git = 2 implies rounding to the nearest multiple of 5, Git = 3 implies rounding to the nearest multiple of 10, and Git = 4 implies rounding to the nearest multiple of 20. We assume that the probability distribution of the heaping indicator depends on Wit, a subject-level random effect $u_i \sim N(0, \sigma_u^2)$ that is independent of bi, and a baseline predictor vector ZiH. Specifically, we propose the following proportional odds model for the conditional distribution of Git:

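$$f(G_{it} = g \mid W_{it}, Z_i, u_i) = \begin{cases} 1 - q(\gamma_1 + \eta_{it} + u_i), & \text{if } g = 1; \\ q(\gamma_1 + \eta_{it} + u_i) - q(\gamma_2 + \eta_{it} + u_i), & \text{if } g = 2; \\ q(\gamma_2 + \eta_{it} + u_i) - q(\gamma_3 + \eta_{it} + u_i), & \text{if } g = 3; \\ q(\gamma_3 + \eta_{it} + u_i), & \text{if } g = 4. \end{cases} \tag{2.2}$$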

Here ηit=Witγ0+ZiHβ3, and q(·) is the inverse logit function q(x) = exp(x)/(1 + exp(x)). The parameters γ1 > γ2 > γ3 refer to the successive intercepts of the logistic regressions, γ0 refers to its slope with respect to the remembered count, and β3 refers to its slopes with respect to the vector of heaping predictors ZiH. The random effect ui describes between-subject differences in heaping propensity not otherwise accounted for in the model.
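A short worked restatement may help in reading the proportional odds model; it follows from summing the category probabilities of the heaping model above level g and is not an additional assumption:

```latex
\begin{aligned}
\Pr(G_{it} > g \mid W_{it}, Z_i, u_i) &= q(\gamma_g + \eta_{it} + u_i), \qquad g = 1, 2, 3, \\
\operatorname{logit}\,\Pr(G_{it} > g \mid W_{it}, Z_i, u_i) &= \gamma_g + \eta_{it} + u_i .
\end{aligned}
```

Each intercept γg is thus the log odds of rounding more coarsely than level g when ηit and ui are both 0. The ordering γ1 > γ2 > γ3 keeps every category probability nonnegative, and a positive γ0 shifts probability toward coarser rounding as the remembered count grows.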

2.3. The coarsening function

As in Wang and Heitjan (2008), the model links the observed Yit to the latent Wit and Git via the coarsening function h(·, ·):

$$Y_{it} = h(W_{it}, G_{it}), \quad i = 1, \ldots, n, \quad t = 1, \ldots, m_i.$$

For example, at time t, subject i with Wit = 14 and Git = 1 reports h(14, 1) = 14, whereas h(14, 2) = 15, h(14, 3) = 10, and h(14, 4) = 20. Figure 1 illustrates this heaping mechanism.

Fig 1. Reported cigarette count Y as a function of the underlying count W and the rounding behavior G.
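The coarsening function can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the name `coarsen` is hypothetical, and the treatment of exact ties (for example, a count of 15 rounded to the nearest 10) is an assumption, since the text does not say which way ties go.

```python
# Rounding unit for each value of the latent heaping indicator G.
ROUNDING_UNIT = {1: 1, 2: 5, 3: 10, 4: 20}


def coarsen(w: int, g: int) -> int:
    """Return the reported count h(w, g) for remembered count w and rounding type g.

    Assumption (not stated in the text): exact ties are rounded up.
    """
    unit = ROUNDING_UNIT[g]
    # Integer arithmetic for "round to nearest multiple of unit, ties up".
    return ((2 * w + unit) // (2 * unit)) * unit


# The worked example from the text: W = 14 under each rounding type.
print([coarsen(14, g) for g in (1, 2, 3, 4)])  # [14, 15, 10, 20]
```

Integer arithmetic is used so that the result does not depend on floating-point rounding conventions.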

A coarsened outcome yit may arise from possibly several (wit, git) pairs. We denote the set of such pairs as WG(yit) = {(wit, git): yit = h(wit, git)}. For example, a reported consumption of yit = 5 may represent a precise unrounded value ((wit, git) = (5, 1)) or rounding across a range of nearby values ((wit, git) ∈ {(3, 2), (4, 2), (5, 2), (6, 2), (7, 2)}). For subject i, the probability of the observed yit at time t is the sum of the probabilities of the (wit, git) pairs that would give rise to it. The density of reported consumption yit given the random effects can therefore be expressed as

$$f(y_{it} \mid b_i, u_i) = \sum_{(w_{it}, g_{it}) \in WG(y_{it})} f(w_{it} \mid b_i)\, f(g_{it} \mid w_{it}, u_i).$$
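The sum over WG(yit) is finite and easy to enumerate. The sketch below is illustrative only: all function names are hypothetical, ties are assumed to round up, `mean` stands for the Poisson mean in (2.1) evaluated at a given bi, and `offset` stands for ZiHβ3 + ui.

```python
import math

ROUNDING_UNIT = {1: 1, 2: 5, 3: 10, 4: 20}


def coarsen(w, g):
    """h(w, g): round w to the nearest multiple of the unit for g (ties up, an assumption)."""
    unit = ROUNDING_UNIT[g]
    return ((2 * w + unit) // (2 * unit)) * unit


def preimage(y):
    """Enumerate WG(y): all pairs (w, g) with w >= 0 and h(w, g) = y."""
    # No w further than 10 from y can be rounded to y by any of the four units.
    return [(w, g) for g in (1, 2, 3, 4)
            for w in range(max(0, y - 10), y + 11) if coarsen(w, g) == y]


def poisson_pmf(w, mean):
    return math.exp(w * math.log(mean) - mean - math.lgamma(w + 1))


def inv_logit(x):
    return 1.0 / (1.0 + math.exp(-x))


def heaping_probs(w, gammas, gamma0, offset=0.0):
    """Category probabilities of the proportional odds model (2.2)."""
    c = [inv_logit(gk + gamma0 * w + offset) for gk in gammas]
    return {1: 1.0 - c[0], 2: c[0] - c[1], 3: c[1] - c[2], 4: c[2]}


def reported_density(y, mean, gammas, gamma0, offset=0.0):
    """f(y | b, u): sum of f(w | b) f(g | w, u) over the pairs in WG(y)."""
    return sum(poisson_pmf(w, mean) * heaping_probs(w, gammas, gamma0, offset)[g]
               for w, g in preimage(y))
```

Under these assumptions `preimage(5)` returns the six pairs listed in the text, and a reported count that is not a multiple of 5, such as 14, has the single preimage (14, 1).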

2.4. Estimation

We estimate the model by a Bayesian approach that employs importance sampling (Gelman et al. 2004; Tanner 1993) to avoid iterative simulation of parameters. The steps are as follows: We first compute the posterior mode and information using a quasi-Newton method with finite-difference derivatives (Dennis et al. 1983). We then approximate the posterior with a multivariate t density with 5 degrees of freedom, with mean equal to the posterior mode and dispersion equal to the inverse of the posterior information matrix at the mode. Next, we draw a large number (4,000) of samples from this proposal distribution, at each draw computing the importance ratio r of the true posterior density to the proposal density. We then use sampling-importance resampling (SIR) to improve the approximation of the posterior (Gelman et al. 2004). We evaluate posterior moments by averaging functions of the simulated parameter draws with the importance ratios r as weights. The choice of a t with a small number of degrees of freedom as the importance density is intended to balance the convergence of the Monte Carlo integrals and the efficiency of the simulation.
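The importance-sampling steps can be illustrated on a one-dimensional toy problem. This is a sketch under stated assumptions, not the authors' R implementation: the target here is an arbitrary unnormalized log posterior supplied by the caller, the proposal is a univariate t with 5 degrees of freedom, all function names are hypothetical, and the resampling step draws with replacement for simplicity.

```python
import math
import random


def t_logpdf(x, df, loc, scale):
    """Log density of a location-scale Student t distribution."""
    z = (x - loc) / scale
    return (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
            - 0.5 * math.log(df * math.pi) - math.log(scale)
            - (df + 1) / 2 * math.log1p(z * z / df))


def t_draw(rng, df, loc, scale):
    """Draw a Student t variate as normal / sqrt(chi-square / df)."""
    chi2 = rng.gammavariate(df / 2, 2.0)
    return loc + scale * rng.gauss(0.0, 1.0) / math.sqrt(chi2 / df)


def importance_sample(log_post, mode, scale, n_draws=4000, df=5, seed=1):
    """Draw from the t proposal and compute importance ratios (up to a constant)."""
    rng = random.Random(seed)
    draws = [t_draw(rng, df, mode, scale) for _ in range(n_draws)]
    log_r = [log_post(x) - t_logpdf(x, df, mode, scale) for x in draws]
    top = max(log_r)  # subtract the maximum for numerical stability
    weights = [math.exp(v - top) for v in log_r]
    return draws, weights, rng


def weighted_mean(draws, weights):
    """Posterior mean estimate: average of the draws weighted by importance ratio."""
    return sum(w * x for x, w in zip(draws, weights)) / sum(weights)


def sir(draws, weights, k, rng):
    """Sampling-importance resampling: resample draws in proportion to their weights."""
    return rng.choices(draws, weights=weights, k=k)


# Toy target: unnormalized normal log posterior with mean 1 and SD 0.5.
draws, weights, rng = importance_sample(lambda x: -0.5 * ((x - 1.0) / 0.5) ** 2,
                                        mode=1.0, scale=0.5)
posterior_mean = weighted_mean(draws, weights)
resampled = sir(draws, weights, k=1000, rng=rng)
```

Because the t proposal has heavier tails than the toy target, the importance ratios stay bounded, which is the property the choice of a low-degrees-of-freedom t is meant to secure.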

Letting θ = (β0, β1, β2, β3, σb, γ1, γ2, γ3, γ0, σu), the likelihood contribution from subject i is

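$$L(\theta; y_i) = \iint \prod_{t=1}^{m_i} \left[ \sum_{(w_{it}, g_{it}) \in WG(y_{it})} f(w_{it} \mid b_i)\, f(g_{it} \mid w_{it}, u_i) \right] f(b_i)\, f(u_i)\, db_i\, du_i; \tag{2.3}$$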

we approximate the integral in (2.3) by Gaussian quadrature. We choose proper but vague priors for the parameters, which we assume are a priori independent (except for γj, j = 1, 2, 3, as noted below). The parameter β1 in the Poisson mixed model (2.1), representing the slope of the latent recall on the EMA recorded consumption, is given a normal prior $\beta_1 \sim N(1, 10^2)$, whereas the priors of the other regression parameters in both model parts are set to $N(0, 10^2)$, subject to the constraint γ1 > γ2 > γ3. We assign the random-effect variances inverse-gamma priors with mean and SD both equal to 1, a reasonably vague specification (Carlin and Louis, 2000). We obtain the posterior mode and information using SAS PROC NLMIXED, and implement Bayesian importance sampling in R.
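To show what Gaussian quadrature does for an integral over a normal random effect, here is a minimal sketch of a five-point Gauss-Hermite rule in one dimension. It is not the authors' SAS code: the function name is hypothetical, the nodes and weights are the standard tabulated five-point values, and the number of quadrature points actually used in the analysis is not stated in the text. The double integral in (2.3) could be handled by nesting one such rule for bi inside another for ui.

```python
import math

# Standard five-point Gauss-Hermite rule for integrals of g(x) * exp(-x^2).
GH_NODES = (-2.020182870456086, -0.958572464613819, 0.0,
            0.958572464613819, 2.020182870456086)
GH_WEIGHTS = (0.019953242059046, 0.393619323152241, 0.945308720482942,
              0.393619323152241, 0.019953242059046)


def normal_expectation(func, sd):
    """Approximate E[func(b)] for b ~ N(0, sd^2).

    Uses the change of variable b = sqrt(2) * sd * x, which turns the normal
    density into the exp(-x^2) weight of the Gauss-Hermite rule.
    """
    total = sum(w * func(math.sqrt(2.0) * sd * x)
                for x, w in zip(GH_NODES, GH_WEIGHTS))
    return total / math.sqrt(math.pi)


# Check against a known answer: E[exp(b)] = exp(sd^2 / 2) for normal b.
approx = normal_expectation(math.exp, sd=0.3)
exact = math.exp(0.3 ** 2 / 2)
```

With a random-effect SD of 0.3, five nodes already reproduce the lognormal mean to high accuracy; a wider random-effect distribution would call for more nodes or an adaptive rule.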

3. Model checking

With heaped data, the unavailability of simple graphical diagnostics such as residual plots complicates model evaluation. We therefore resort to examination of repeated draws of latent quantities from their posterior distributions, in the spirit of Bayesian posterior predictive checks (Rubin 1984; Gelman et al. 1996; Gelman et al. 2005). Specifically, we evaluate the adequacy of model assumptions using imputed values of the latent recall W, which we compare to its implied marginal distribution under the model.

Imputations of latent Wi and Gi are ultimately based on the posterior density f(θ|yi) of the model parameter θ given the observed data yi. Heitjan and Rubin (1990), sampling univariate y values, used an acceptance-rejection procedure to draw quantities analogous to our W and G from a confined bivariate normal distribution. In our model, the correlation within Wi and Gi vectors poses a challenge to simulation. Note however that given the subject-specific effects bi and ui, the components of Wi and Gi are independent. Thus, we can readily simulate (Wi, Gi) from the joint posterior of (Wi, Gi, bi, ui). For each simulated θ and the observed data yi, the posterior distribution of (Wi, Gi, bi, ui) is

$$f(w_i, g_i, b_i, u_i \mid y_i, \theta) = \frac{f(w_i, g_i, b_i, u_i \mid \theta)\, f(y_i \mid w_i, g_i, b_i, u_i, \theta)}{f(y_i \mid \theta)}.$$

Because the values of wit and git together determine yit, we have that

$$f(y_i \mid w_i, g_i, b_i, u_i, \theta) = \prod_{t=1}^{m_i} I\big((w_{it}, g_{it}) \in WG(y_{it})\big),$$

where I is an indicator function. Accordingly,

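$$\begin{aligned} f(w_i, g_i, b_i, u_i \mid y_i, \theta) &\propto f(w_i, g_i, b_i, u_i \mid \theta) \prod_{t=1}^{m_i} I\big((w_{it}, g_{it}) \in WG(y_{it})\big) \\ &= f(w_i, g_i \mid b_i, u_i, \theta)\, f(b_i, u_i \mid \theta) \prod_{t=1}^{m_i} I\big((w_{it}, g_{it}) \in WG(y_{it})\big) \\ &= f(w_i \mid b_i, \theta)\, f(g_i \mid w_i, u_i, \theta)\, f(b_i, u_i \mid \theta) \prod_{t=1}^{m_i} I\big((w_{it}, g_{it}) \in WG(y_{it})\big) \\ &= \left( \prod_{t=1}^{m_i} f(w_{it} \mid b_i, \theta)\, f(g_{it} \mid w_{it}, u_i, \theta)\, I\big((w_{it}, g_{it}) \in WG(y_{it})\big) \right) f(b_i \mid \sigma_b)\, f(u_i \mid \sigma_u). \end{aligned}$$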

Thus given random effects bi and ui, the imputation of (wi, gi) is obtained by independent draws of (wit, git), t = 1,…,mi, which can be implemented as an acceptance-rejection procedure. We therefore impute the data as follows:

  1. Make independent draws, θ(k), k = 1,…,K from f(θ|yi) by SIR.

  2. Given θ(k), for i = 1,…, n, independently draw $b_i^{(k)} \sim N(0, \sigma_b^{(k)2})$ and $u_i^{(k)} \sim N(0, \sigma_u^{(k)2})$.

  3. For i = 1,…, n, given θ(k) and bi(k), for t = 1,…, mi, draw wit(k) as Poisson with mean (2.1). Then given θ(k), ui(k) and wit(k), draw misreporting type git(k) from (2.2). If $I\big((w_{it}^{(k)}, g_{it}^{(k)}) \in WG(y_{it})\big) = 0$, discard $(w_{it}^{(k)}, g_{it}^{(k)})$ and repeat this step until $I\big((w_{it}^{(k)}, g_{it}^{(k)}) \in WG(y_{it})\big) = 1$.
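Step 3 of the procedure above can be sketched as a simple acceptance-rejection loop for a single subject-day. This is an illustration under stated assumptions, not the authors' implementation: function names are hypothetical, ties in the coarsening function are assumed to round up, `mean` is the Poisson mean (2.1) already evaluated at the drawn bi, `u` is the drawn ui (plus any ZiHβ3 term), and the `max_tries` guard is an addition for safety.

```python
import math
import random

ROUNDING_UNIT = {1: 1, 2: 5, 3: 10, 4: 20}


def coarsen(w, g):
    """h(w, g): round w to the nearest multiple of the unit for g (ties up, an assumption)."""
    unit = ROUNDING_UNIT[g]
    return ((2 * w + unit) // (2 * unit)) * unit


def draw_poisson(rng, mean):
    """Knuth's multiplication method; adequate for the moderate means used here."""
    limit, k, prod = math.exp(-mean), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def draw_heaping(rng, w, gammas, gamma0, u):
    """Draw G from the proportional odds model using P(G > g) = q(gamma_g + gamma0*w + u)."""
    exceed = [1.0 / (1.0 + math.exp(-(gk + gamma0 * w + u))) for gk in gammas]
    v = rng.random()
    return 1 + sum(v < c for c in exceed)


def impute_pair(rng, y, mean, gammas, gamma0, u, max_tries=1_000_000):
    """Redraw (w, g) until the pair is consistent with the reported count y."""
    for _ in range(max_tries):
        w = draw_poisson(rng, mean)
        g = draw_heaping(rng, w, gammas, gamma0, u)
        if coarsen(w, g) == y:
            return w, g
    raise RuntimeError("no accepted draw; y may be very unlikely under the model")


rng = random.Random(0)
gammas = (-1.485, -5.280, -10.141)
w15, g15 = impute_pair(rng, 15, 23.2, gammas, 0.1098, 0.0)
```

The expected number of redraws grows as the reported count becomes less probable under the model, so an implementation for real data might instead sample directly from the finite set WG(yit) with normalized probabilities.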

To assess model fit, we plot K histograms of the imputed latent count W. Implausible patterns in these histograms, such as peaks or troughs at multiples of 5, suggest incorrect modeling of the heaping. We can also base discrepancy diagnostics specifically on the fractions of reported consumptions that are divisible by 5.
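A divisibility diagnostic of this kind is simple to compute for each set of imputed counts. The helper below is a hypothetical sketch; the choice to report multiples of 5, 10 and 20 together is an illustration rather than something specified in the text.

```python
def heap_fractions(counts):
    """Fraction of counts divisible by 5, by 10 and by 20."""
    n = len(counts)
    return {m: sum(c % m == 0 for c in counts) / n for m in (5, 10, 20)}


# Toy example: four of six counts are multiples of 5, three of 10, two of 20.
fractions = heap_fractions([10, 11, 12, 15, 20, 40])
```

Applied to each of the K imputed data sets and to the reported counts, the fractions can be compared directly: a well-fitting heaping model should give imputed fractions without the excess at multiples of 5 seen in the reported data.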

4. Simulations

To examine the performance of our approach, we conducted simulations replicating the structure of the Shiffman data with m = 12 non-visit-day observations per subject. Each data set consisted of n = 100 subjects, and for simplicity we do not consider baseline covariates. For each subject we first set xi as an observed EMA count vector from the data and generated a random effect $b_i \sim N(0, \sigma_b^2 = 0.09)$. We then generated Wit values as independent Poisson deviates with conditional mean (2.1). With β0 = 2.358, β1 = 0.2628, when bi = 0 and EMA count xit = 20, the mean latent recall is 23.2, and when xit = 30 it is 25.8. With the random effect distributed as designated above, the marginal mean recalls for xit = 20 and xit = 30 are 24.3 and 27.0, respectively.
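The conditional and marginal means quoted above can be checked with a few lines of Python. The function names are hypothetical; the marginal mean uses the standard lognormal identity E[exp(b)] = exp(σb2/2) for a normal random effect.

```python
import math


def conditional_mean(x, beta0, beta1, b=0.0):
    """Mean remembered count from (2.1) with no baseline covariates."""
    return math.exp(beta0 + beta1 * math.log(x) + b)


def marginal_mean(x, beta0, beta1, var_b):
    """Mean remembered count averaged over b ~ N(0, var_b)."""
    return conditional_mean(x, beta0, beta1) * math.exp(var_b / 2.0)


for x in (20, 30):
    print(x,
          round(conditional_mean(x, 2.358, 0.2628), 1),
          round(marginal_mean(x, 2.358, 0.2628, 0.09), 1))
```

With the Case 1 settings this reproduces the conditional means 23.2 and 25.8 and the marginal means 24.3 and 27.0.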

Next we generated the latent heaping behavior indicator Git from (2.2). We set the parameters to their estimates from the Shiffman data: The intercepts γ1, γ2, γ3 were −1.485, −5.280 and −10.141, respectively, and the slope γ0 was 0.1098. We simulated the random effect $u_i \sim N(0, \sigma_u^2 = 7.1)$. Under this setting, when ui = 0 and wit = 22, the probability of exact reporting is 28.3%, and the probabilities of rounding to the nearest multiples of 5, 10 and 20 are 66.3%, 5.4% and 0.04%, respectively. When the latent count wit = 36, these probabilities are 7.8%, 71.2%, 20.8% and 0.2%, respectively. The simulated latent wit and git determined yit as illustrated in Figure 1.
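The heaping probabilities quoted above follow directly from the proportional odds model (2.2). The sketch below, with a hypothetical function name, evaluates the four category probabilities for a given remembered count with ui = 0 and no baseline covariates.

```python
import math


def heaping_probs(w, gammas, gamma0, u=0.0):
    """Probabilities of exact reporting and of rounding to the nearest 5, 10 and 20."""
    # exceed[g-1] is P(G > g), the inverse logit of gamma_g + gamma0 * w + u.
    exceed = [1.0 / (1.0 + math.exp(-(gk + gamma0 * w + u))) for gk in gammas]
    return [1.0 - exceed[0], exceed[0] - exceed[1], exceed[1] - exceed[2], exceed[2]]


case1 = (-1.485, -5.280, -10.141)
probs_22 = heaping_probs(22, case1, 0.1098)
probs_36 = heaping_probs(36, case1, 0.1098)
```

Up to rounding, `probs_22` and `probs_36` match the percentages reported in the text for remembered counts of 22 and 36.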

These parameter values allow for considerable discrepancy between remembered and recorded consumption. To examine our methods when the latent recall and EMA match more closely, we conducted a second simulation under parameter values that gave better agreement. In this scenario, we assumed β0 = 0 and β1 = 1 with $b_i \sim N(0, 0.05)$. Thus when bi = 0, the expected precise recall E(wit) = xit, and the marginal mean recalls are 20.5 and 30.8 for EMA counts of 20 and 30, respectively. We set the parameters in the heaping behavior models at −1.07, −4.37, −6.52 and 0.088 for γ1, γ2, γ3 and γ0, respectively, and $\sigma_u^2 = 5.9$. In this case, when ui = 0, the probabilities of reporting exactly and to the nearest multiples of 5, 10 and 20 for a true count of 22 are 29.6%, 62.3%, 7.1% and 1%, respectively.

Table 1 presents summaries of 100 simulations of estimates of the parameter θ = (β0, β1, σb, γ1, γ2, γ3, γ0, σu). Under both scenarios, the MLEs of the fixed-effect coefficients fell near the true values on average, with no more than 0.5% bias for the parameters in the recall model and no more than 2.7% bias for those in the heaping model. The random-effect variance parameters are also well estimated, with bias less than 1%. The coverage probabilities of nominal 95% confidence intervals range from 93% to 98%, except for γ3 in Case 1, where coverage is only 80%. The poor coverage rate for this parameter is a consequence of instability in the inverse Hessian matrix; it can be improved by creating parametric bootstrap confidence intervals (Table 2). The simulation shows good performance of the MLEs, and as the sample size is large we expect the Bayesian estimates to behave similarly. Moreover, the maximization part of the MLE calculation can help identify multimodality of the likelihood, should it occur, and singularity of the Hessian that we use in the Bayesian sampling.

Table 1.

Results of 100 simulations of the mis-remembering/heaping model.

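| Parameter | True value | Mean of estimate | SD of estimate | Bias | MSE | Coverage of 95% CI (%) |
|---|---|---|---|---|---|---|
| Case 1: Estimated mis-remembering | | | | | | |
| Latent recall | | | | | | |
| β0 | 2.36 | 2.36 | 0.07 | 0.002 | 0.07 | 95 |
| β1 | 0.26 | 0.26 | 0.02 | 0.001 | 0.02 | 93 |
| σb | 0.30 | 0.30 | 0.02 | 0.001 | 0.02 | 95 |
| Heaping behavior | | | | | | |
| γ1 | −1.49 | −1.53 | 0.56 | −0.04 | 0.56 | 94 |
| γ2 | −5.28 | −5.31 | 0.66 | −0.03 | 0.66 | 98 |
| γ3 | −10.14 | −9.99 | 2.55 | 0.15 | 2.54 | 80 |
| γ0 | 0.11 | 0.11 | 0.02 | 0.002 | 0.02 | 96 |
| σu | 2.67 | 2.61 | 0.29 | −0.06 | 0.29 | 98 |
| Case 2: Minimal mis-remembering | | | | | | |
| Latent recall | | | | | | |
| β0 | 0.0 | −0.01 | 0.09 | −0.01 | 0.09 | 94 |
| β1 | 1.0 | 1.00 | 0.03 | 0.005 | 0.03 | 94 |
| σb | 0.22 | 0.22 | 0.02 | −0.001 | 0.02 | 97 |
| Heaping behavior | | | | | | |
| γ1 | −1.07 | −1.08 | 0.43 | −0.007 | 0.43 | 98 |
| γ2 | −4.37 | −4.36 | 0.60 | 0.007 | 0.59 | 94 |
| γ3 | −6.52 | −6.43 | 0.66 | 0.09 | 0.67 | 94 |
| γ0 | 0.088 | 0.090 | 0.02 | 0.002 | 0.02 | 95 |
| σu | 2.44 | 2.41 | 0.27 | −0.02 | 0.27 | 95 |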

Table 2.

Results of 100 simulations of the mis-remembering/heaping model with parameters estimated from the data (Case 1) and SEs computed by the parametric bootstrap.

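| Parameter | True value | Mean of estimate | SD of estimate | Bias | MSE | Coverage of 95% CI (%) |
|---|---|---|---|---|---|---|
| Latent recall | | | | | | |
| β0 | 2.36 | 2.36 | 0.08 | −0.003 | 0.08 | 90 |
| β1 | 0.26 | 0.26 | 0.02 | 0.001 | 0.02 | 90 |
| σb | 0.30 | 0.30 | 0.02 | −0.001 | 0.02 | 95 |
| Heaping behavior | | | | | | |
| γ1 | −1.49 | −1.61 | 0.55 | −0.12 | 0.56 | 94 |
| γ2 | −5.28 | −5.42 | 0.69 | −0.14 | 0.70 | 96 |
| γ3 | −10.14 | −10.61 | 3.56 | −0.47 | 3.58 | 87 |
| γ0 | 0.11 | 0.11 | 0.02 | 0.005 | 0.02 | 95 |
| σu | 2.67 | 2.64 | 0.32 | −0.03 | 0.32 | 92 |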

5. Data analysis

We applied the method of §2 to the Shiffman data, with the aim of evaluating our posited two-stage process as an explanation for the discrepancy between actual and reported consumption. To focus on the link between the self-report and true count, our first analysis included only log EMA count in (2.1) and a visit day indicator in (2.2). The latter is important because it seems reasonable that distance in time from the event would be a strong predictor of heaping coarseness. Our second analysis expanded the recall model to include a range of baseline characteristics: demographics (age, sex, race and education); addiction; measures of nicotine dependence (the Fagerström Test for Nicotine Dependence [FTND] and the Nicotine Dependence Syndrome Scale [NDSS]); and EMA compliance measured as the daily percentage of missed prompts. Age, education, FTND, and EMA compliance are considered as quantitative variables, sex and race are binary indicators, and addiction is a categorical variable taking three levels (possible, probable, and definite). They are the first variables that a smoking researcher would think to investigate, and could potentially affect remembered count or heaping probability. The two measures of nicotine dependence, FTND and NDSS, showed only a modest correlation (Spearman r = 0.56 in our data), so we included both in the model. The dataset and programming code are included in the supplementary materials (Wang et al. 2012).

5.1. Evaluating goodness of fit

We evaluated model fit by creating multiple draws from the posterior predictive distribution of latent quantities as discussed in §3. Lack of smoothness in the histogram of the imputed latent count would suggest an inadequate heaping model.

We evaluated goodness of fit for the model that includes log EMA count in (2.1) and a visit day indicator in (2.2). The top row in Figure 2 displays the histograms of TLFB cigarette consumption at Days 3 (a visit day), 9 and 14. The spikes at 10, 15, 20, 25, 30, etc. are characteristic of self-reported cigarette counts (Wang and Heitjan 2008). As many as 70% of subjects reported cigarette smoking in multiples of 5 for non-visit-day consumption, whereas for the visit day (Day 3) that number is only 48%. Only 1/4 of the counts on the visit day ended in 0.

Fig 2. Top row: Histogram of self-reported cigarette consumption. Lower three rows: Histograms of draws from the posterior distribution of the latent exact consumption recall.

The next three rows represent independent draws of the latent count Wit. The spikes at multiples of 20, 10 or 5 have disappeared. Compared to the self-reported count, the percentage of subjects whose exact counts are divisible by 5 (or 10 or 20) is smaller and consistent across time. Averaged over three imputations, the fraction of counts that are multiples of 5 is 27%, 25% and 23% on Days 3, 9 and 14, respectively, and 15%, 14% and 12% are multiples of 10. These checks indicate that our model offers a plausible explanation for the heaping.

5.2. The fitted model

To assess the impact of the assumed correlation structure, we fit the model as proposed in (2.1) and (2.2) and also a model that excludes random effects. Posterior modes and 95% credible intervals (CIs) appear in Tables 3 and 4. The estimates in both the remembered count model that characterizes the latent recall process and the heaping behavior model are sensitive to the assumption of random effects. The Bayesian information criterion (BIC) of the model with two random effects is 14,705 when including EMA as the only predictor and 14,059 when including EMA and the baseline patient characteristic predictors. The BICs for the corresponding models excluding random effects are 18,340 and 16,641, respectively. Thus the evidence is overwhelming that the mixed model is preferable. Furthermore, we included the patient characteristic predictors as covariates in both the remembered count model and heaping process model, but this model (BIC = 14,079) is less favorable than the model with the covariates in just the latent remembered count model. None of these predictors is significant in the heaping process model (results not shown).

Table 3.

Estimated parameters from the Shiffman data under simple models for recall (EMA only) and heaping (remembered count and visit day indicator).

| Parameter | Random effects model: posterior mode | Random effects model: 95% CI | Independence model: posterior mode | Independence model: 95% CI |
|---|---|---|---|---|
| Latent recall: Poisson model | | | | |
| Intercept: β0 | 2.32 | [2.24, 2.40] | 1.14 | [1.09, 1.20] |
| ln(EMA): β1 | 0.27 | [0.25, 0.30] | 0.68 | [0.66, 0.69] |
| σb² | 0.09 | [0.08, 0.11] | | |
| Heaping behavior: Proportional odds model | | | | |
| Intercept 1: γ1 | −1.50 | [−2.17, −0.85] | −1.06 | [−1.30, −0.84] |
| Intercept 2: γ2 | −5.21 | [−6.14, −4.43] | −2.94 | [−3.26, −2.65] |
| Intercept 3: γ3 | −10.15 | [−12.49, −8.48] | −4.17 | [−4.59, −3.82] |
| Exact count (latent): w | 0.11 | [0.09, 0.13] | 0.07 | [0.06, 0.08] |
| Visit day | −2.96 | [−3.50, −2.50] | −1.29 | [−1.54, −1.06] |
| σu² | 6.65 | [5.12, 9.08] | | |

Table 4.

Estimated parameters from the Shiffman data under an expanded model for recall.

| Parameter | Random effects model: posterior mode | Random effects model: 95% CI | Independence model: posterior mode | Independence model: 95% CI |
|---|---|---|---|---|
| Latent recall: Poisson model | | | | |
| Intercept: β0 | 2.34 | [2.21, 2.49] | 1.51 | [1.44, 1.58] |
| ln(EMA): β1 | 0.25 | [0.23, 0.28] | 0.53 | [0.51, 0.55] |
| Addicted | | | | |
| Possible vs. Definite | 0.07 | [−0.10, 0.24] | 0.05 | [0.01, 0.09] |
| Probable vs. Definite | −0.01 | [−0.11, 0.08] | −0.02 | [−0.04, 0.006] |
| FTND | 0.06 | [0.04, 0.08] | 0.04 | [0.03, 0.05] |
| NDSS | 0.08 | [0.05, 0.12] | 0.05 | [0.04, 0.06] |
| EMA compliance | 0.13 | [−0.28, 0.51] | 0.39 | [0.29, 0.49] |
| Age | 0.002 | [−0.001, 0.006] | 0.003 | [0.002, 0.004] |
| Race (Black vs. White) | −0.14 | [−0.27, −0.01] | −0.06 | [−0.10, −0.03] |
| Sex (Male vs. Female) | 0.16 | [0.10, 0.23] | 0.12 | [0.09, 0.23] |
| Education | −0.001 | [−0.03, 0.02] | 0.003 | [−0.004, 0.009] |
| σb² | 0.06 | [0.05, 0.07] | | |
| Heaping behavior: Proportional odds model | | | | |
| Intercept 1: γ1 | −1.62 | [−2.35, −0.90] | −1.14 | [−1.37, −0.91] |
| Intercept 2: γ2 | −5.52 | [−6.42, −4.61] | −3.15 | [−3.47, −2.82] |
| Intercept 3: γ3 | −10.31 | [−12.65, −8.37] | −4.54 | [−4.99, −4.08] |
| Exact count: w | 0.11 | [0.09, 0.14] | 0.07 | [0.06, 0.08] |
| Visit day | −2.99 | [−3.51, −2.47] | −1.26 | [−1.50, −1.02] |
| σu² | 6.79 | [4.73, 8.68] | | |

The 95% CI of β1 is [0.23,0.28], indicating that remembered consumption is positively associated with recorded EMA consumption. In addition, baseline patient characteristics FTND, NDSS, race and gender have significant effects on the recall process. For fixed EMA count, the following characteristics are associated with greater remembered smoking: higher nicotine dependence (measured by both FTND and NDSS), white ethnicity (compared to black), and male sex.

Figure 3 displays the estimated curve of the mean of Wit against the EMA count. A natural hypothesis is that the estimated latent mean agrees with EMA, which would be reflected in the Poisson model by an estimated intercept of 0 and slope of 1; one might call this a model of unbiased memory. To the contrary, Figure 3 shows that the fitted mean curve diverges substantially from the 45° line, with the lighter smokers on average overestimating their consumption and the heavier smokers underestimating consumption. The mean remembered consumption agrees with the true count roughly in the range 22–26 cigarettes, or slightly more than a pack per day.

Fig 3. Estimate of the conditional mean of recalled count given EMA count in the Poisson mis-remembering model. Covariates are fixed at education=high school, addicted=definitely, race=white, sex=female, and mean values of the quantitative predictors: FTND=5.97, NDSS=−0.023, age=43.5, and EMA non-compliance=10.1%.
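The point where mean remembered consumption equals the EMA count can be worked out in closed form for the simpler model of Table 3 (EMA as the only recall predictor). Setting exp(β0 + β1 ln x + c) = x gives ln x = (β0 + c)/(1 − β1). The sketch below is illustrative only: the function name is hypothetical, and it uses the Table 3 posterior modes rather than the covariate-adjusted model plotted in Figure 3.

```python
import math


def fixed_point(beta0, beta1, shift=0.0):
    """EMA count x at which exp(beta0 + beta1 * ln(x) + shift) equals x (beta1 != 1)."""
    return math.exp((beta0 + shift) / (1.0 - beta1))


# Conditional on b = 0, using the posterior modes in Table 3.
crossing_conditional = fixed_point(2.32, 0.27)
# Averaged over subjects: the lognormal mean adds sigma_b^2 / 2 = 0.045.
crossing_marginal = fixed_point(2.32, 0.27, shift=0.09 / 2)
```

Under these inputs the crossing points are about 24 cigarettes conditionally and about 25.5 marginally, both inside the 22–26 range described above. Below the crossing point the fitted mean exceeds the EMA count, and above it the fitted mean falls short.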

Figure 4 shows the estimated heaping probability as a function of remembered cigarette consumption for visit and non-visit days. The probability of rounded-off reporting increases rapidly as the remembered count increases, although surprisingly the probability of rounding to the nearest 20 is not large for either type of day. When the perception of smoking is more than two packs, say 41 cigarettes, the chance of heaped reporting rises to more than 84%, of which 37% is attributed to half-pack rounding. The results confirm that the degree of heaping is much smaller on visit days. For example, only 51% of subjects round off the visit-day count when reporting 41 cigarettes, and among those 39% round off to the nearest multiple of 5.

Fig 4. Estimated rounding behavior given EMA count in the proportional odds heaping model.

6. Discussion

We have developed a model to describe the process whereby exact longitudinal measurements become distorted by retrospective recall. Our approach uses latent processes to explain the data as a result of mis-remembering and rounding: A model of the latent exact value describes subject-level recall and allows for association over time and with baseline predictors, while a misreporting model describes the dependence of heaping coarseness on the latent value and other predictors. Random effects represent individual propensities in recall and heaping; in our data, inferences depend strongly on the inclusion of these random effects.

The data suggest that both mis-remembering and heaping contribute substantially to the distortion of cigarette counts. The curve of mean remembered count as a function of EMA count departs markedly from the 45° line, with lighter smokers overstating consumption and heavier smokers understating consumption. The remembered smoking coincides with the accurate EMA count at around 24 cigarettes, suggesting that the popularity of reporting one pack per day is partially a result of the general heaping behavior rather than a particular affinity for remembering a pack a day. The curves of heaping probabilities suggest that exact reporting is uncommon and practically disappears beyond about 40 cigarettes/day. Nevertheless it is interesting just how much of the misreporting is due to mis-remembering. The remembered cigarette consumption depends not only on true consumption, but also on the subject's sex, race and degree of nicotine dependence.

The interpretation of our model components as representing memory and rounding depends on the assumption that EMA data are exact. Of course, even EMA data are subject to errors, as smokers may neglect to record cigarettes both at the time of smoking and later. Yet good correspondence with smoking biomarkers strongly supports the use of EMA over TLFB as a proxy for the truth (Shiffman 2009).

We have implemented our model with a combination of standard numerical methods including Gaussian quadrature, quasi-Newton optimization, and sampling-importance resampling. Our experience suggests that with the model as specified, and incorporating a modest number of predictors, the method is robust and efficient. Increasing the number of random effects would increase the time demands (from the numerical integration) and raise the possibility of numerical instability (from possible errors in integration). For more extensive models, sophisticated approaches based on MCMC sampling would be necessary.
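For readers unfamiliar with sampling-importance resampling (SIR), the following is a generic one-parameter sketch, not the authors' implementation. The target density is a hypothetical stand-in (a normal with mean 1.0 and standard deviation 0.5), the proposal is a deliberately wider normal, and resampling is done with replacement for simplicity.

```python
import math
import random

def log_target(theta):
    # Hypothetical unnormalized log posterior: normal with mean 1.0 and sd 0.5.
    return -0.5 * ((theta - 1.0) / 0.5) ** 2

def log_proposal(theta, center, scale):
    # Log density of the normal proposal, up to an additive constant.
    return -0.5 * ((theta - center) / scale) ** 2

def sir(n_draws, n_keep, center, scale, rng):
    """Draw from the proposal, weight by target/proposal, then resample."""
    draws = [rng.gauss(center, scale) for _ in range(n_draws)]
    log_w = [log_target(t) - log_proposal(t, center, scale) for t in draws]
    top = max(log_w)                      # subtract the max for numerical stability
    weights = [math.exp(lw - top) for lw in log_w]
    return rng.choices(draws, weights=weights, k=n_keep)

rng = random.Random(2012)
sample = sir(n_draws=20000, n_keep=2000, center=1.0, scale=1.0, rng=rng)
mean = sum(sample) / len(sample)
sd = math.sqrt(sum((s - mean) ** 2 for s in sample) / (len(sample) - 1))
print(round(mean, 2), round(sd, 2))
```

In practice the proposal would typically be centered at the posterior mode found by the optimizer, with a spread taken from the curvature there; the wider the proposal relative to the target, the more stable the importance weights tend to be.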

Our model allows for the inclusion of covariates to better explain the discrepancy between smokers' self-perceived behaviors and reality. It also provides a basis for predicting true counts (effectively the EMA data) from reported TLFB counts. This would be a valuable activity in the large number of studies that do not collect EMA data. To predict true counts from the recalled counts, we first need to estimate the parameters θ in the model using a subset of the primary study or an external independent study that collects both TLFB count Y and accurate EMA count X. Then we can impute the true count together with the latent remembered count and heaped reporting behavior. Specifically, the posterior distribution of (Wi, Gi, xi, bi, ui) is

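$$
\begin{aligned}
f(w_i, g_i, x_i, b_i, u_i \mid y_i, \theta)
&= \frac{f(w_i, g_i, x_i, b_i, u_i \mid \theta)\, f(y_i \mid w_i, g_i, x_i, b_i, u_i, \theta)}{f(y_i \mid \theta)} \\
&\propto \left( \prod_{t=1}^{m_i} f(w_{it} \mid x_{it}, b_i, \theta)\, f(g_{it} \mid w_{it}, u_i, \theta)\, I\big((w_{it}, g_{it}) \in WG(y_{it})\big) \right) f(x_i)\, f(b_i \mid \sigma_b)\, f(u_i \mid \sigma_u),
\end{aligned}
$$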

where f(xi) is the density function of the true count. Imputation follows steps similar to those described in §3, with θ set equal to the maximum likelihood estimates.
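The structure of this posterior can be illustrated with a deliberately simplified sketch: a single day, no random effects, and exact enumeration instead of imputation. All numerical values are hypothetical, including the uniform prior f(x) on 1 to 60 cigarettes, the log-linear memory model, and the proportional odds heaping parameters; ties in rounding are resolved upward as a convention of this sketch.

```python
import math

GRIDS = (1, 5, 10, 20)        # exact, nearest 5, nearest 10, nearest 20
CUTS = (2.0, 4.0, 6.0)        # hypothetical heaping cutpoints
ALPHA = 0.1                   # hypothetical heaping slope on remembered count
BETA0, BETA1 = 1.27, 0.6      # hypothetical memory-model coefficients

def expit(z):
    return 1.0 / (1.0 + math.exp(-z))

def poisson_pmf(w, mu):
    return math.exp(w * math.log(mu) - mu - math.lgamma(w + 1))

def heaping_probs(w):
    """P(grid | remembered count w) under a cumulative-logit form."""
    cum = [expit(c - ALPHA * w) for c in CUTS] + [1.0]
    return [cum[0]] + [cum[k] - cum[k - 1] for k in range(1, len(cum))]

def round_to(w, g):
    """Nearest multiple of g, with ties rounded up."""
    return g * math.floor(w / g + 0.5)

def posterior_true_count(y, prior, w_max=120):
    """p(x | y) proportional to f(x) * sum over (w, g) consistent with y."""
    post = {}
    for x, px in prior.items():
        mu = math.exp(BETA0 + BETA1 * math.log(x))
        lik = 0.0
        for w in range(w_max + 1):
            fw = poisson_pmf(w, mu)
            for g, pg in zip(GRIDS, heaping_probs(w)):
                if round_to(w, g) == y:     # indicator: (w, g) could produce y
                    lik += fw * pg
        post[x] = px * lik
    total = sum(post.values())
    if total == 0.0:
        raise ValueError("reported count has zero probability under the model")
    return {x: p / total for x, p in post.items()}

prior = {x: 1.0 / 60 for x in range(1, 61)}   # hypothetical uniform f(x)
post = posterior_true_count(20, prior)
post_mean = sum(x * p for x, p in post.items())
print(round(post_mean, 1))
```

The full model would add the subject-level random effects and the product over days, which is why the paper turns to numerical integration and imputation instead of direct enumeration.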

The methods developed here may also find application in a wide variety of settings in social and medical science involving self-reported data, for example, assessing sexual risk behavior, trial drug consumption, eating episodes and financial expenditures.

Supplementary Material

ZIP file containing missing data

Acknowledgments

We are grateful to two associate editors and a referee, whose perceptive comments and suggestions greatly improved the paper.

Footnotes

Supplementary Material: Supplement: Data and programming code for the analysis (doi:???http://lib.stat.cmu.edu/aoas/???/???). It contains the daily TLFB and EMA datasets, and SAS and R code to implement the method.

Contributor Information

Hao Wang, Email: wang76@jhmi.edu.

Saul Shiffman, Email: shiffman@pinneyassociates.com.

Sandra D. Griffith, Email: sgrif@mail.med.upenn.edu.

Daniel F. Heitjan, Email: dheitjan@upenn.edu.

References

  1. Boyd NR, Windsor RA, Perkins LL, et al. Quality of measurement of smoking status by self-report and saliva cotinine among pregnant women. Maternal and Child Health Journal. 1998;22:77–83. doi: 10.1023/a:1022936705438.
  2. Brown RA, Burgess ES, Sales SD, Whiteley JA, Evans DM, Miller IW. Reliability and validity of a smoking timeline follow-back interview. Psychology of Addictive Behaviors. 1998;12:101–112.
  3. Carlin BP, Louis TA. Bayes and Empirical Bayes Methods for Data Analysis, second edition. Boca Raton, FL: Chapman & Hall/CRC; 2000.
  4. Cheong Y, Yong HH, Borland R. Does how you quit affect success? A comparison between abrupt and gradual methods using data from the International Tobacco Control Policy Evaluation Study. Nicotine & Tobacco Research. 2007;9:801–810. doi: 10.1080/14622200701484961.
  5. Dellaportas P, Stephens DA, Smith AFM, Guttman I. A comparative study of perinatal mortality using a two component mixture model. In: Berry DA, Stangl DK, editors. Bayesian Biostatistics. New York: Dekker; 1996. pp. 601–616.
  6. Dennis JE Jr, Schnabel RB. Numerical Methods for Unconstrained Optimization and Nonlinear Equations. Englewood Cliffs, NJ: Prentice-Hall; 1983.
  7. Farrell L, Fry TRL, Harris MN. 'A pack a day for twenty years': smoking and cigarette pack sizes. Research Paper Number 887. Department of Economics, University of Melbourne; 2003.
  8. Gelman A, Carlin JB, Stern HS, Rubin DB. Bayesian Data Analysis, second edition. Boca Raton, FL: Chapman & Hall/CRC; 2004.
  9. Gelman A, Meng XL, Stern HS. Posterior predictive assessment of model fitness via realized discrepancies (with discussion). Statistica Sinica. 1996;6:733–807.
  10. Gelman A, Van Mechelen I, Verbeke G, Heitjan DF, Meulders M. Multiple imputation for model checking: complete-data plots with missing and latent data. Biometrics. 2005;61:74–85. doi: 10.1111/j.0006-341X.2005.031010.x.
  11. Hatsukami DK, Slade J, Benowitz NL, Giovino GA, Gritz ER, Leischow S, Warner KE. Reducing tobacco harm: Research challenges and issues. Nicotine & Tobacco Research. 2002;4(Suppl 2):S89–S101. doi: 10.1080/1462220021000032852.
  12. Heitjan DF, Rubin DB. Inference from coarse data via multiple imputation with application to age heaping. Journal of the American Statistical Association. 1990;85:304–314.
  13. Heitjan DF, Rubin DB. Ignorability and coarse data. Annals of Statistics. 1991;19:2244–2253.
  14. Klerman JA. Heaping in retrospective data: Insights from Malaysian family life surveys' breastfeeding data. The RAND Corporation; 2003.
  15. Klesges RC, Debon M, Ray JW. Are self-reports of smoking rate biased? Evidence from the second National Health and Nutrition Examination Survey. Journal of Clinical Epidemiology. 1995;48:1225–1233. doi: 10.1016/0895-4356(95)00020-5.
  16. Pickering RM. Digit preference in estimated gestational age. Statistics in Medicine. 1992;11:1225–1238. doi: 10.1002/sim.4780110908.
  17. Ridout MS, Morgan BJT. Modeling digit preference in fecundability studies. Biometrics. 1991;47:1423–1433.
  18. Roberts JM, Brewer DD. Measures and tests of heaping in discrete quantitative distributions. Journal of Applied Statistics. 2001;28:887–896.
  19. Rubin DB. Bayesianly justifiable and relevant frequency calculations for the applied statistician. Annals of Statistics. 1984;12:1151–1172.
  20. Shiffman S, Gitchell JG, Warner KE, Slade J, Henningfield JE, Pinney JM. Tobacco harm reduction: Conceptual structure and nomenclature for analysis and research. Nicotine & Tobacco Research. 2002;4:113–129. doi: 10.1080/1462220021000032717.
  21. Shiffman S. How many cigarettes did you smoke? Assessing cigarette consumption by global report, time-line follow-back, and ecological momentary assessment. Health Psychology. 2009;28:519–526. doi: 10.1037/a0015197.
  22. Shiffman S, Ferguson SG, Strahs KR. Quitting smoking by gradual reduction using nicotine gum – A controlled trial. American Journal of Preventive Medicine. 2009;36:96–104. doi: 10.1016/j.amepre.2008.09.039.
  23. Stone AA, Shiffman S. Ecological momentary assessment in behavioral medicine. Annals of Behavioral Medicine. 1994;16:199–202.
  24. Tanner MA. Tools for Statistical Inference, second edition. New York: Springer; 1993.
  25. Torelli N, Trivellato U. Modelling inaccuracies in job-search duration data. Journal of Econometrics. 1993;59:187–211.
  26. Wang H, Heitjan DF. Modeling heaping in self-reported cigarette counts. Statistics in Medicine. 2008;27:3789–3804. doi: 10.1002/sim.3281.
  27. Wang H, Shiffman S, Griffith SD, Heitjan DF. Supplement to "Truth and memory: Linking instantaneous and retrospective self-reported cigarette consumption". 2012. doi: 10.1214/12-AOAS557.
  28. Wolff J, Augustin T. Heaping and its consequences for duration analysis: A simulation study. Allgemeines Statistisches Archiv. 2003;87:59–86.
  29. Wright DE, Bray I. A mixture model for rounded data. The Statistician. 2003;52:3–13.
