Abstract
Ecological momentary assessment (EMA) is a set of longitudinal methods that researchers can use to understand complex processes (e.g., health, behavior, emotion) in “high resolution.” Although technology has made EMA data collection easier, concerns remain about the consistency and quality of data collected from participants who are enrolled and followed online. In this study, we used EMA data from a larger study on HIV-risk behavior among men who have sex with men (MSM) to explore whether several indicators of data consistency/quality differed between those who elected to enroll in person and those who enrolled online. One hundred MSM (age 18–54) completed a 30-day EMA study. Forty-six of these participants chose to enroll online. There were no statistically significant differences in response rates for any survey type (e.g., daily diary [DD], experience sampling [ES], event-contingent [EC]) between participants who enrolled in person versus online. DD and ES survey response rates were consistent across the study and did not differ between groups. EC response rates fell sharply across the study, but this pattern was also consistent across groups. Participants’ responses on the DD were generally consistent with a post-study follow-up Timeline Followback (TLFB), with some underreporting on the TLFB, but this pattern was consistent across both groups. In this sample of well-educated, mostly White MSM recruited from urban areas, EMA data collected from participants followed online were as consistent, reliable, and valid as data collected from participants followed in person. These findings yield important insights about best practices for EMA studies, with cautions regarding generalizability.
Keywords: Ecological momentary assessment, longitudinal methods, online recruitment, reliability, telehealth
Introduction
Ecological momentary assessment refers to a family of intensive longitudinal methods that are intended to help researchers study behaviors and experiences at particular moments in time (Shiffman, Stone, & Hufford, 2008; Stone & Shiffman, 1994; Wray, Merrill, & Monti, 2014). These methods use a combination of sampling strategies, like periodic diary methods, event-contingent reports, and experience sampling, to explore dynamic processes in high resolution and nearly in real time, as participants go about their lives (Shiffman et al., 2008). Although these methods have been in use since at least the 1980s, recent exponential growth in technology has facilitated a surge in EMA research, as devices and software that can be used to conduct these studies have become more accessible, affordable, and ubiquitous (Wray et al., 2014). To date, EMA methods have been used to study conditions and behaviors as wide-ranging as chronic disease (e.g., COPD; Walters, Walters, Wills, Robinson, & Wood-Baker, 2012), fibromyalgia (Williams et al., 2004), Parkinson’s disease (Fernie, Spada, & Brown, 2019), health behaviors (e.g., smoking, alcohol/drug use; Wray et al., 2014), diet (Thomas, Doshi, Crosby, & Lowe, 2011), mental health (e.g., psychosis; Granholm, Loh, & Swendsen, 2007), anxiety (Walz, Nauta, & aan het Rot, 2014), and self-harm (Granholm et al., 2007; Husky et al., 2014).
Although EMA methods have become easier to set up and use, accessing appropriate populations is still a key challenge for many researchers. Even in the last few years, the vast majority of published EMA studies have recruited and enrolled participants entirely in-person (e.g., Fernie et al., 2019; Manini et al., 2019), limiting the populations they can access to those in their immediate area or to those with whom they have regular in-person contact. However, recruiting and enrolling participants in EMA studies via the internet may allow researchers much wider access to more clinically relevant and/or representative populations, reduce the time needed to collect sufficiently large samples, and save costs. At least some of the reluctance to recruit and enroll participants in EMA studies online is likely driven by the belief that face-to-face interaction with participants leads to more investment in the study and improves protocol adherence. However, we are aware of no studies published to date that have compared response rates and data quality across participants recruited and oriented to EMA studies online versus in-person.
The steps used to orient participants to EMA studies and train them on study procedures play a critical role in their success. As EMA studies often involve complex procedures that rely on the engagement of participants, these procedures must be carefully explained to ensure (a) that participants understand when they are expected to respond to surveys, (b) that they are committed to responding as much as possible, and (c) that they provide high-quality and accurate data. For EMA studies that recruit online, these orientation and training appointments can be conducted via videoconferencing, which involves meeting virtually via web camera. In addition to enabling researchers to recruit with a wider reach, this approach allows participants to complete EMA studies from the convenience of their own homes. As nearly 80% of Americans now own smartphones (Smith, 2017), and given that webcams are now standard on most of these devices, many populations already have access to the equipment they need. A body of research has explored using videoconferencing in the context of telemedicine and has not shown substantial differences in the quality of care provided via videoconferencing compared with face-to-face care (McLean et al., 2013; Currell et al., 2000). Similarly, several studies have explored ‘teleconsent’ (a means of embedding the consenting process into a telemedicine session) and have found the approach to be satisfactory among both researchers and participants (e.g., Bunnell et al., 2019; Welch et al., 2016). However, we are aware of no studies that have compared markers of adherence and data quality in complex longitudinal studies like EMA across participants enrolled in person and via videoconferencing.
In the context of an EMA study that explored risk factors for HIV-risk behavior among men who have sex with men (MSM), we recruited participants from several major metro areas in the northeastern United States using both online and in-person outreach methods. In the current study, we used data from this larger study to explore whether response rates, behavioral reactivity, and several indicators of data quality differed across participants who chose to enroll via videoconferencing versus those who elected to enroll in person. Behavioral reactivity refers to the extent to which the frequency or quality of a behavior changes as a result of being monitored or assessed (Nelson, 1977) and is a concept that is especially relevant to EMA research. Although the broader study focused on the specific domain of HIV-risk, the comparisons we report span a variety of behaviors and constructs we assessed in the study in order to strengthen the rigor of our assessment.
Methods
Participants
Participants (N = 100) were recruited from gay-oriented smartphone dating apps (e.g., Grindr, Scruff), social networking sites (e.g., Facebook, Instagram), and via in-person outreach (e.g., flyers and business cards posted or left at local coffee shops, retail spaces, and bars) in the northeastern US from March 2014 to October 2018. Table 1 provides demographic characteristics for the full study sample.¹ Eligible participants were: (1) 18+ years old, (2) assigned male sex at birth, (3) currently male gender, (4) HIV-negative or of unknown status, (5) able to read and speak English fluently, and (6) not currently prescribed or taking pre-exposure prophylaxis (PrEP). They also reported (7) having had condomless anal sex (CAS) with a non-exclusive male partner at least once in the past 30 days and (8) consuming five or more drinks on a single occasion at least once in the past 30 days. Since our aim was to study non-treatment-seeking MSM, we excluded those who were currently receiving counseling or medications for alcohol or drug problems. Finally, for safety reasons, participants were excluded if they (1) were currently receiving treatment for serious mental illness (e.g., schizophrenia, bipolar disorder) or (2) had injected drugs within the last three months, as assessed via self-report.
TABLE 1.
Demographic Characteristics and Key Variables (N = 100)
| Characteristics | In-Person (N = 54), Mean (SD) or N (%)ᵃ | Remote (N = 46), Mean (SD) or N (%)ᵃ | χ² or tᵇ | p |
|---|---|---|---|---|
| Age (Range: 18 – 54) | 26.6 (7.2) | 27.7 (8.2) | −0.71 | .477 |
| Race | ||||
| White | 43 (82.7) | 32 (74.4) | 0.62 | .431 |
| Black or African American | 3 (5.6) | 1 (2.2) | ||
| Asian | 2 (3.7) | 6 (13.0) | ||
| American Indian/Alaska Native | 0 (0.0) | 1 (2.2) | ||
| Multiracial | 3 (5.6) | 3 (6.5) | ||
| Chose not to respond | 3 (5.6) | 2 (4.4) | ||
| Ethnicity (Hispanic or Latino) | 8 (14.8) | 8 (17.4) | 0.12 | .726 |
| HIV-status (self-reported) | ||||
| Negative | 46 (85.2) | 37 (80.4) | 0.40 | .529 |
| Don’t know | 8 (14.8) | 9 (19.6) | ||
| Currently in sexually-exclusive relationship | 4 (7.3) | 1 (2.1) | 1.49 | .222 |
| Avg. length of relationship (in months) | 1.3 (0.96) | 2 (--) | -- | -- |
| College degree | 27 (50.0) | 27 (58.7) | 0.76 | .385 |
| Low incomeᶜ | 15 (27.8) | 14 (30.4) | 0.09 | .770 |
| Unemployedᵈ | 7 (13.0) | 6 (13.0) | 0.01 | .990 |
| Full or part-time student | 13 (24.1) | 9 (19.6) | 0.29 | .587 |
| Identify as gay or bisexual | 48 (88.9) | 46 (100.0) | 5.44 | .020 |
| Avg. # total EMA days completed | 29.0 (1.7) | 28.9 (2.4) | 0.10 | .922 |
Note. ᵃ For continuous variables (e.g., age, average length of relationship), means and standard deviations are reported, Mean (SD); for categorical variables (e.g., racial/ethnic categories), frequencies and percentages of each group are shown, N (%). ᵇ Independent samples t-tests were used to test whether continuous variables differed across those enrolling online versus in person, and χ² tests were used for categorical variables (e.g., race, ethnicity). ᶜ Represents those with a household annual income <$30,000/year. ᵈ Full- and part-time students were considered ‘employed.’
Measures
Daily diary surveys.
Daily diary surveys assessed sexual behavior, alcohol use, and drug use over the past 24 hours. To assess sexual behavior, we asked participants to indicate the number of sex partners they had on each day (up to 4). For each partner, we asked participants to report that partner’s characteristics (e.g., HIV status, whether the partner was sexually exclusive), which sex acts they engaged in with that partner (oral, insertive anal, receptive anal, or vaginal sex), and whether they used a condom for each act. To assess alcohol use, we asked participants to report the number of standard drinks they consumed over the last 24 hours (standard drink = 12 oz. beer, 5 oz. wine, 1.5 oz. liquor) and the number of hours over which they drank. Finally, to assess drug use, we asked participants whether they used any drugs over the last 24 hours and, if so, which types of drugs they used (e.g., marijuana, cocaine, methamphetamine, prescription painkillers, sedatives, or stimulants). Participants could select multiple types from a list of nine categories.
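To make the structure of these reports concrete, the sketch below shows one way a single daily diary record could be represented. This is our illustration only; the class and field names are invented and do not reflect the actual MetricWire survey schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class PartnerReport:
    """One of up to four partners reported for a given day (illustrative)."""
    hiv_status: str                  # "negative", "positive", or "unknown"
    exclusive: bool                  # sexually exclusive partner?
    sex_acts: List[str] = field(default_factory=list)      # e.g., ["oral", "receptive_anal"]
    condom_used: List[bool] = field(default_factory=list)  # parallel to sex_acts

@dataclass
class DailyDiaryRecord:
    """A single morning report covering the previous 24 hours (illustrative)."""
    study_day: int                          # 1-30
    partners: List[PartnerReport]           # length 0-4
    standard_drinks: int                    # 12 oz. beer = 5 oz. wine = 1.5 oz. liquor
    drinking_hours: Optional[float] = None  # hours over which drinking occurred
    drug_types: List[str] = field(default_factory=list)    # subset of nine categories
```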
Experience sampling surveys.
We asked participants to rate their current affect using items selected from the Positive and Negative Affect Scales – Extended Form (Watson & Clark, 1994). General positive affect was assessed using three items from the Joviality subscale (happy, enthusiastic, excited), each rated on a 0 (very slightly or not at all) to 4 (extremely) scale. These three items were selected from a broader set of positive affect items to capture both higher- and lower-arousal aspects of positive affect and for their face validity given our population of interest, while keeping experience sampling surveys brief. The average overall reliability coefficient for this short scale was very high (α = .90). Although we also asked participants about several dimensions of negative affect, we collected only one item from each of the four basic negative affect scales (sadness, fear, hostility, guilt) to balance capturing data on a range of affective states with the need to keep these surveys brief. Past research also suggests that these items more aptly reflect distinct factors (rather than a single overall negative affect dimension; Watson & Clark, 1994). As such, our analyses focus primarily on positive affect.
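For readers unfamiliar with the reliability coefficient reported above, the sketch below computes Cronbach’s alpha for a three-item scale using the standard formula; the ratings shown are invented for illustration.

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for an (n_respondents x k_items) rating matrix:
    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total score))."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)      # per-item sample variances
    total_var = items.sum(axis=1).var(ddof=1)  # variance of summed scale scores
    return k / (k - 1) * (1 - item_vars.sum() / total_var)

# Invented ratings of happy/enthusiastic/excited on the 0-4 scale
ratings = np.array([[3, 3, 2],
                    [1, 1, 0],
                    [4, 3, 4],
                    [2, 2, 2],
                    [0, 1, 1]])
print(round(cronbach_alpha(ratings), 2))  # 0.94
```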
Post-study follow-up survey.
At the end of the 30-day study period, participants completed an online follow-up survey. In this survey, we asked participants to complete an online version of the Timeline Followback (TLFB; Sobell, Brown, Leo, & Sobell, 1996) that we created and have described in detail elsewhere (Wray et al., 2019). This TLFB assessed alcohol use, drug use, and sexual behavior on each day of the same 30-day period in which participants completed EMA assessments, and it assessed these behaviors in the same way. That is, for each day of the 30-day period, participants indicated the number of partners they had sex with (0–4 partners), each partner’s characteristics, and the sexual behaviors that occurred with each. Participants also reported the number of standard drinks they had on each day and the drugs they used. Past studies show that reports of these behaviors are reliable and valid when collected via computer-facilitated TLFBs (Simons, Wills, Emery, & Marks, 2015; Dolezal et al., 2012; Hjorthøj, Hjorthøj, & Nordentoft, 2012; Turner et al., 1998; Sobell et al., 1996).
Procedures
Participants first completed an online screening survey to determine their eligibility. Staff then contacted those who were eligible to schedule an enrollment and training appointment either in person or online via videoconferencing, depending on the participant’s preference. We conducted videoconference appointments using either Skype, Google Hangouts, or Zoom, again depending on the participant’s preference. During these appointments, staff obtained informed consent, reviewed study procedures, and asked participants to complete an online baseline survey that collected data on person-level demographics, behavioral characteristics, and the TLFB. Then, staff walked participants through downloading the MetricWire app to their personal smartphones, which was used to collect EMA data throughout the study. Staff then provided thorough training on how to use the app, explained the types of surveys participants would be asked to complete, and walked them through a typical day in the study, demonstrating how to initiate various assessments. We also explained the meaning of each question in each survey during these walkthroughs.
We instructed participants to complete three different types of assessments: (1) a self-initiated daily diary assessment, to be completed upon waking each day, (2) up to six signal-prompted experience sampling assessments per day, delivered at random times in three-hour windows between 9 am and midnight, and (3) event-contingent assessments, which participants were instructed to complete whenever they began to drink alcohol or use drugs on a given day. We also coached participants during their appointments to achieve response rates of 100% for the daily diary surveys and at least 80% for the experience sampling prompts. During enrollment appointments, we used several procedures to help participants achieve these response rates, including: (1) building rapport, (2) soliciting buy-in to the study (e.g., conveying the importance of the data to understanding behavior, asking for their investment), (3) discussing strategies that can enhance response rates/quality (e.g., “get into a routine, make sure the ringer volume is high, keep the phone with you whenever possible, set an alarm for morning surveys”), and (4) emphasizing confidentiality of responses. This coaching was consistent across both in-person and videoconference participants. We also sent feedback to participants each week throughout the study about their response rates via email and messages sent through the MetricWire app. If a participant fell below the targets for a given week, staff would contact him by phone to remind him of ways to improve. At the end of the 30 days, participants also completed a follow-up assessment, which involved collecting a TLFB covering the 30-day period in which participants were completing EMA assessments.
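As an illustration of how such signal-contingent prompts might be scheduled, the sketch below draws one random time per fixed three-hour window between 9 am and midnight. Because the text does not fully specify how the up-to-six daily prompts map onto these windows, the sketch assumes one prompt per window; all function and variable names are our own.

```python
import random
from datetime import datetime, timedelta

def daily_prompt_times(start_hour=9, end_hour=24, window_hours=3, seed=None):
    """Draw one random prompt time per fixed window between start_hour and end_hour."""
    rng = random.Random(seed)
    times = []
    for window_start in range(start_hour, end_hour, window_hours):
        offset = rng.randrange(window_hours * 60)  # random minute within the window
        t = datetime(2000, 1, 1, window_start) + timedelta(minutes=offset)
        times.append(t.strftime("%H:%M"))
    return times

print(daily_prompt_times(seed=1))  # prints five HH:MM strings, one per window
```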
We compensated participants based on their response rates. Participants who enrolled in person could earn $2 for each daily diary survey they completed, plus a “bonus” of $10 for every 10 days in which they submitted 100% of these surveys, as well as $0.50 for each random survey, with a “bonus” of $10 for every 10 days in which they completed >80% (a total of $210 possible). Participants who enrolled remotely were paid slightly less for each of these surveys and bonuses, given that they did not incur travel costs to participate (a total of $185 possible). Brown University’s IRB reviewed and approved all procedures.
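The $210 maximum for in-person participants can be reconstructed from the stated schedule, assuming a 30-day study with six experience sampling prompts per day; the short calculation below shows the arithmetic.

```python
# Maximum possible compensation for in-person participants (30-day study).
days = 30
daily_diary = days * 2.00        # $2 per completed daily diary        -> $60
dd_bonus = (days // 10) * 10.00  # $10 per 10 days at 100%             -> $30
exp_sampling = days * 6 * 0.50   # $0.50 per random prompt, 6 per day  -> $90
es_bonus = (days // 10) * 10.00  # $10 per 10 days above 80%           -> $30
print(daily_diary + dd_bonus + exp_sampling + es_bonus)  # 210.0
```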
Data Analysis Plan
We conducted all statistical analyses in Stata 16/SE (StataCorp, 2007). We were primarily interested in comparing EMA data collected from participants who enrolled in person versus online across three areas: response rates (i.e., adherence), reactivity, and erratic responding.

Response rate refers to the percentage of each type of survey that participants actually completed, out of the total number we asked them to complete. To examine this, we first calculated overall and participant-level response rates for each assessment type (daily diary, experience sampling, event-contingent) and then used t-tests to compare the in-person and online groups for each survey type.

To examine evidence of behavioral reactivity, meaning any systematic change in the number of behaviors reported as a function of time, we calculated the average survey response rate and the total number of alcoholic drinks, drug use events, and sex events reported for each study day. We then plotted these response rates and the percentage of participants reporting each behavior over time within each enrollment type (in-person, online), and fit LOWESS lines to visualize trends. We also fit Poisson and beta regression models to these day-level averages, with study day, enrollment type, and their interaction as predictors, to explore whether response rates and behavior reporting varied systematically over time, across enrollment method, or both. Because the number of behaviors reported on each study day should be relatively consistent absent any intervening factors, systematic change in behavior reports as a function of time might be considered evidence of behavioral reactivity.

Finally, we explored evidence of erratic responding (i.e., whether participants provided incongruent or careless responses across different items or survey types) in two ways. First, we matched participants’ reports of their drinking, drug use, and sexual behavior on specific days across daily diary surveys and the follow-up TLFB, and calculated their percent agreement and Cohen’s kappa. We also generated scatterplots reflecting the agreement between the total amounts of each of these behaviors (sex events, drinking days, drug use days) across these two methods to help visualize overall agreement. Second, we calculated day-level Cronbach’s alpha statistics for participants’ responses on the three-item positive affect scale, and then characterized each study day in terms of whether its alpha value was acceptable (α ≥ .70), questionable (.60 ≤ α < .70), poor (.50 ≤ α < .60), or unacceptable (α < .50). We then plotted the percentage of study days that fell in each of these categories by enrollment type. Although this approach ignores the nested nature of these data, only the descriptive data (percentages, frequency plots) of the raw day-level Cronbach’s alphas were used to explore differences across enrollment type. These alpha statistics were based on only 6–7 surveys each and so likely overestimated the number of days classified as less than “acceptable.” However, we believe that the ease of interpretation associated with this approach outweighs these limitations. Together, these analyses allowed us to assess whether data provided by participants who enrolled online were less consistent across multi-item scales, or more discrepant across EMA versus TLFB, when compared with data from those who enrolled in person.
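To illustrate the day-level agreement analysis, the sketch below computes percent agreement and Cohen’s kappa for matched EMA and TLFB reports. The authors used Stata; this Python version with invented 0/1 indicators is for illustration only.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import cohen_kappa_score

# Invented data: one row per participant-day, with 0/1 indicators of a behavior
# (e.g., any alcohol use) as reported via the EMA daily diary and via the
# follow-up TLFB for the same calendar day.
rng = np.random.default_rng(0)
ema = rng.integers(0, 2, size=300)
tlfb = np.where(rng.random(300) < 0.85, ema, 1 - ema)  # roughly 85% agreement
df = pd.DataFrame({"ema": ema, "tlfb": tlfb})

pct_agree = (df["ema"] == df["tlfb"]).mean() * 100  # raw percent agreement
kappa = cohen_kappa_score(df["ema"], df["tlfb"])    # chance-corrected agreement
print(f"% agreement: {pct_agree:.1f}, Cohen's kappa: {kappa:.2f}")
```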
Results
Response Rates
Table 2 shows response rates to daily diary, experience sampling, and event-contingent assessments across participants overall, and among those who enrolled in person and online. Although both daily diary and experience sampling response rates were very close to their targets overall, participants initiated an event-contingent assessment just over a third of the time when drinking or drug use was reported on the following morning’s daily diary. However, there were no statistically significant differences in response rates for any survey type across those who elected to enroll in person versus those who chose to enroll online.
TABLE 2.
Response rates (%) by survey type and enrollment method

| Survey type | Overall M (SD) | In-Person M (SD) | Online M (SD) | t | p |
|---|---|---|---|---|---|
| Daily diary | 98.1 (5.6) | 97.7 (5.0) | 98.5 (6.2) | 0.72 | .474 |
| Experience sampling | 77.3 (13.2) | 78.4 (14.5) | 76.0 (11.5) | −0.90 | .371 |
| Event-contingent | 37.3 (24.5) | 39.8 (24.6) | 34.3 (24.3) | −1.08 | .281 |
Behavioral Reactivity and Response Fatigue
Figure 1 shows the percentage of enrolled participants who completed daily diary and experience sampling assessments across days of the study period, grouped by those enrolled in person and those enrolled online. It also shows the percentage of event-contingent assessments that were completed when alcohol or drug use was reported on the following morning’s daily diary assessment across days of the study period, again by those enrolled in person and remotely. Participants’ overall response rates to daily diary surveys did not appear to change systematically over the course of the study period, or across those enrolled in person versus online. Regression models (see Table 3) support this finding, since neither time nor a dummy variable reflecting enrollment method (nor their interaction) was significantly associated with response rates. Experience sampling survey response rates followed a similar pattern. However, the percentage of the time participants successfully initiated an event-contingent assessment when they reported drinking or drug use on the following morning’s daily diary declined relatively sharply over time. This decline was similar across participants enrolled via either method (in person, online). Regression models offer further support for these findings, showing that time was significantly and negatively associated with event-contingent survey response rates.
Figure 1.

Percentage of enrolled participants who completed each assessment type across days of the study by enrollment method.
TABLE 3.
Beta and Poisson regression models testing whether response rates and behaviors differed across study day and enrollment method

**Response rates**

| Predictor | Daily diary IRR (SE), p | Exp. sampling IRR (SE), p | Event-contingent IRR (SE), p |
|---|---|---|---|
| Study day | 0.99 (.02), .626 | 1.00 (.01), .754 | **0.97 (.00), <.001** |
| Enrolled online | 0.97 (.40), .936 | 0.81 (.11), .123 | 0.90 (.06), .137 |
| Study day × enrollment type | 1.03 (.02), .165 | 1.00 (.01), .706 | 1.00 (.01), .376 |

**Behavior reports**

| Predictor | Alcohol use β (SE), p | Drug use β (SE), p | Sex event β (SE), p |
|---|---|---|---|
| Study day | **−0.03 (.01), <.001** | −0.01 (.01), .608 | −0.01 (.01), .332 |
| Enrolled online | −0.09 (.14), .551 | **0.72 (.10), <.001** | 0.09 (.18), .617 |
| Study day × enrollment type | 0.01 (.01), .394 | −0.01 (.01), .268 | −0.01 (.01), .548 |

Note. Incidence rate ratios (IRRs) are shown for response rate models and coefficients (β) for behavior report models. Values significant at p < .05 are shown in bold.
Figure 2 shows the percentage of participants reporting any alcohol use, drug use, or sex event across days of the study, grouped by those enrolled in person and online. Overall, a relatively consistent percentage of participants reported each of these behaviors over the course of the study period. However, a higher percentage of participants who were enrolled and oriented in person reported drug use per study day than those who were enrolled online. In addition, the upper panels show that the percentage of participants who reported alcohol use on a given day decreased as the study went on. These inferences were supported by the results of regression models, which showed that neither time nor enrollment method was associated with the percentage of sex event reports, but study day was significantly and negatively associated with the percentage of participants who reported drinking on a given day. To illustrate this trend, an average of about 52% of participants reported drinking on a given day during the first five days of the study, while an average of only about 35% reported drinking on the last five days, a difference of 17 percentage points. Online enrollment was also associated with fewer drug use events reported per study day relative to in-person enrollment.
Figure 2.
Percentage of enrolled participants who reported various behaviors across days of the study by enrollment method.
Erratic Responding
Table 4 gives percent agreement and Cohen’s kappa values comparing whether participants’ reports of alcohol use, drug use, and sex events agreed across EMA and follow-up TLFB methods on each study day. Overall, the percent of days on which participants’ reports of these behaviors agreed across the two methods was relatively high, ranging from 70% to 88% across all behaviors. Drug use reports had particularly high agreement across the two methods. Cohen’s kappa values similarly ranged from fair to good, with the lowest agreement for sex events reported across the two methods. Importantly, though, agreement across the two methods did not appear to differ between those enrolled in person and those enrolled online, suggesting that both groups were about equally able to remember when they engaged in each behavior after the 30-day study was over.

Figure 3 shows the total number of days on which each participant reported alcohol use, drug use, and sex via EMA (y-axis) and TLFB (x-axis), and further supports these findings. The dashed reference line represents perfect agreement between the two methods on the total number of days participants reported engaging in each behavior. Most observations are grouped slightly above this reference line, and this pattern is similar across participants enrolling through either method. As such, these figures illustrate that participants generally reported more of these behaviors via EMA than via TLFB, but that this pattern was relatively consistent across participants electing to enroll in person and those choosing to enroll online.

Finally, Figure 4 shows the number of participant-days on which participants provided ratings on the positive affect scale that were acceptable (versus poor or unacceptable) by enrollment method. Participants enrolled in person provided responses of acceptable consistency on 84.8% of person-days, compared with 83.6% of person-days among those enrolled online, χ²(2) = 0.57, p = .751.
TABLE 4.
Percent agreement and Cohen’s Kappa for alcohol use, drug use, and sex events reported across EMA and follow-up TLFB by enrollment method
| Behavior reports | In-Person % agreement | In-Person Cohen’s κ | Online % agreement | Online Cohen’s κ |
|---|---|---|---|---|
| Alcohol use | 69.8% | 0.36 | 69.4% | 0.34 |
| Drug use | 85.6% | 0.70 | 87.8% | 0.72 |
| Sex events | 80.3% | 0.26 | 83.7% | 0.39 |
Figure 3.

Agreement in total number of behaviors reported over the study period across EMA and TLFB methods by enrollment method.
Note. These figures compare the total number of three behaviors (noted to the left of each panel) reported via daily diary surveys as part of the EMA protocol (y-axis) and via the follow-up TLFB covering the same period that participants completed at the end of the study (x-axis). The dashed diagonal line represents what would be perfect agreement across the two methods.
Figure 4.
Percent of study days in which reliabilities of participants’ positive affect ratings were acceptable, poor, or unacceptable, by enrollment method.
Discussion
EMA methods have been used to study a variety of processes, from chronic disease to mental health, and can help expand our understanding of how and why complex experiences and behaviors occur among individuals in their daily lives. They can also allow researchers to test very detailed hypotheses about the order and timing in which dynamic processes occur (Wray et al., 2014). However, accessing key populations still poses a challenge for many researchers. In this paper, we assessed whether several metrics of data consistency and quality differed across participants who chose to enroll in an EMA study of HIV-risk behavior in person versus those who elected to enroll online. Overall, there were few substantial differences between these groups in terms of response rates, reactivity, and erratic responding. These results suggest that, with thorough procedures for onboarding/orienting participants to the study, soliciting commitment to the study, incentivizing responses and targets, and providing feedback, researchers can recruit and enroll similar samples of participants into intensive longitudinal studies (like EMA) from afar. This eases access to some key populations and could boost the clinical relevance and generalizability of some studies. One exception was that a larger percentage of participants who enrolled in our study in person reported using drugs on a given study day, when compared with those who enrolled online. In addition, since this sample comprised mostly White, well-educated MSM recruited from urban areas, similar research with more diverse and/or rural populations will be needed in order to draw firm conclusions about the quality and consistency of EMA data collected online in those populations. We discuss our study’s primary findings in each area further below.
Response Rates
Response rates to surveys for which we specified a target (100% for daily diary and 80% for experience sampling) were generally very high, and were higher than reported in many past EMA studies on sensitive behaviors conducted with similar populations (Livingston, Flentje, Heck, Szalda-Petree, & Cochran, 2017; Rowe et al., 2016; Yang et al., 2015). This suggests that by setting reasonable targets, incentivizing those targets with bonuses, and providing periodic feedback to participants about their performance, researchers can encourage participants to reach these targets. One area in which response rates were exceptionally low, however, was the event-contingent assessments. This is likely because we did not specify a target response rate for these assessments, provide response rate feedback, or incentivize their completion. We originally opted not to take these steps because we felt that doing so could inadvertently incentivize the behavior itself (i.e., alcohol/drug use). For example, compensating participants each time they complete an assessment that is contingent upon starting to drink or use drugs could encourage some participants to use such substances more frequently in pursuit of additional compensation. In future studies, however, researchers could consider alternative ways to incentivize completion of these surveys. For instance, one approach might involve offering payments or bonuses only when participants successfully complete an event-contingent assessment that matches the following morning’s daily diary assessment. This way, participants would be paid not simply for initiating an event-contingent assessment (which could incentivize drinking or drug use), but for ensuring that the two reports match. Although this approach could lead some participants to avoid reporting alcohol or drug use on their DD assessment if they forgot to complete an event-contingent survey when they began drinking or using drugs the day before, incentivizing participants for ensuring that an event-contingent assessment was initiated a certain percentage of the time could help attenuate this concern. This would allow participants some room for error while making it more difficult for them to track their own response rate and respond to a given survey in a way that maximizes their incentive; the easiest route to earning these incentives would be simply remembering to initiate the event-contingent assessment. Even without incentives, though, event-contingent response rates might have been improved by simply providing a target and giving participants feedback. Regardless, these low response rates reaffirm the importance of duplicating critical questions and items across multiple survey types to ensure critical data are not missed. For this reason, most of the data we collected through event-contingent assessments were also collected via experience sampling prompts.
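One way to operationalize the match-based bonus proposed above is sketched below; the 80% match threshold, bonus amount, and function names are all hypothetical choices for illustration, not part of this study’s actual compensation scheme.

```python
def ec_bonus_earned(ec_initiated, dd_reported_use, match_target=0.80, bonus=10.00):
    """Award a bonus when an event-contingent (EC) assessment was initiated on
    at least `match_target` of days where the next morning's daily diary (DD)
    reported drinking or drug use. All thresholds here are hypothetical."""
    # Keep only days on which the DD reported any alcohol/drug use.
    matched = [ec for ec, dd in zip(ec_initiated, dd_reported_use) if dd]
    if not matched:                 # no use days: nothing to match, award bonus
        return bonus
    match_rate = sum(matched) / len(matched)
    return bonus if match_rate >= match_target else 0.0

# Hypothetical 10-day stretch: DD reported use on 4 days; EC initiated on 3 of them.
dd = [True, False, True, False, True, True, False, False, False, False]
ec = [True, False, True, False, False, True, False, False, False, False]
print(ec_bonus_earned(ec, dd))  # 3/4 = 0.75 < 0.80 -> 0.0 (no bonus)
```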
Critically, there were no differences in response rates between those enrolled in person and those enrolled remotely, and both groups followed almost exactly the same response rate pattern, whether response rates were impressive (daily diary, experience sampling) or otherwise (event-contingent). This suggests that participants who are enrolled remotely can adhere to very complex, intensive longitudinal studies (like EMA) as long as they are thoroughly oriented to the study and their response rates are properly incentivized.
Behavioral Reactivity and Response Fatigue
Analyses exploring response fatigue showed that response rates to the two main survey types (daily diary and experience sampling) were very consistent across the study period and did not appear to change systematically over time. This would suggest that (1) providing a target, (2) incentivizing it, and (3) providing feedback collectively help these rates stay consistent over the monitoring period. However, event-contingent assessment response rates fell relatively sharply across the study period. Again, this suggests that assessments that are unincentivized and unmonitored may have fairly strong response rates initially, but are likely to decline across the study period. As with response rates in general, no difference was evident between enrollment groups, suggesting that remotely enrolled participants likely do not exhibit more fatigue or less commitment to EMA studies over time.
In terms of behavioral reactivity, our results show that participants’ reports of alcohol use declined systematically over time, which is consistent with past studies suggesting that intensive longitudinal studies can introduce some reactivity in MSM (Newcomb, Swann, Mohr, & Mustanski, 2018). However, participants’ reports of drug use and sexual behavior did not change systematically across the study period, regardless of how they enrolled, suggesting that the method itself had little effect on the reporting of these behaviors. Participants enrolled in person, however, reported more drug use overall. This could suggest that, in EMA studies of drug use, participants may be more forthcoming about drug use after having established face-to-face rapport with a study staff member and being assured about confidentiality protections in person. Together, these results suggest that reactivity may be behavior-specific and/or that heavy-drinking participants’ vigilance in reporting their alcohol use may wane over time. Importantly, reports of these behaviors across the study period did not differ systematically by how participants were enrolled, suggesting that those enrolled online provided reports of sensitive behaviors that were very similar to those enrolled in person. This supports the notion that researchers can recruit and enroll similar samples of participants online without the risk of introducing additional reactivity effects.
Erratic Responding
We evaluated erratic responding by comparing whether participants’ reports of various behaviors (e.g., alcohol, drug use, sex) differed when collected via EMA versus the follow-up TLFB at the end of the study. If reports of these behaviors were different across methods in one enrollment group versus the other (i.e., in person vs. online), it would suggest that participants enrolled through that method were reporting behaviors erratically via EMA that they later could not match when reporting via TLFB. However, our results suggest that participants’ reports of these behaviors were relatively consistent across EMA and TLFB, across both enrollment groups. However, there was an overall trend toward underreporting behaviors on the TLFB when compared to EMA, again across both groups, which is consistent with past research (Dulin, Alvarado, Fitterling, & Gonzalez, 2017; Wray, Kahler, & Monti, 2016; Schroder, Johnson, & Wiebe, 2007). Comparing participants’ reports across these two methods at the day-level showed that there was generally poor agreement about when each behavior occurred during the 30-day period, but that agreement did not significantly differ across enrollment groups. This finding suggests that agreement on the timing of behaviors was similar across the two enrollment approaches, and that generally, EMA methods may offer critical advantages over recall-based methods when researchers are interested in the precise timing of events or behaviors.
Finally, as another way of exploring erratic responding, we calculated person-day-level reliabilities of participants’ positive affect ratings across three items from a well-validated and commonly used scale. We then classified them according to conventional categories (acceptable: α ≥ .70; poor: α = .60–.69; unacceptable: α < .60) and plotted the percent of study days classified in each category by enrollment method. We adopted this approach because of its ease of interpretation, but acknowledge that it is imperfect for several reasons: (1) this very brief “scale” consists of only three items, (2) reliabilities across person-days were each based on only 6–7 assessments of this scale, and (3) affect is inherently dynamic and is expected to fluctuate over the course of each day, so calculating consistency in responses to this short scale over the course of a day inevitably captures some expected variability. However, all of these limitations would likely lead us to underestimate the true consistency across similar items assessed each day (i.e., produce Cronbach’s alpha values that seem worse than they really were). Therefore, this represents a relatively conservative approach to answering this question. Overall, about 84–85% of person-days of positive affect ratings had acceptable reliabilities, while 8–9% were poor and about 7% were unacceptable, suggesting that the vast majority of ratings provided by all EMA participants were reliable. Furthermore, no difference in the percentage of acceptable versus unacceptable reliabilities was evident by enrollment method, suggesting that participants who enrolled remotely provided data that were as reliable as those enrolled in person.
Limitations
Several limitations should be noted, most of which involve our sample. First, these results were drawn from a study of high-risk MSM, and our sample was young, predominantly White, and mostly socioeconomically advantaged, so these findings must be replicated in other, more diverse samples before any firm conclusions can be drawn. Similarly, populations with different demographic characteristics (e.g., older age, rural location) may not fare as well with many of the online enrollment procedures we used. Furthermore, while smartphone ownership is widespread, certain populations may lack access to (or familiarity with) the technologies this approach requires (i.e., smartphone, computer, and webcam), which could further limit reach for key populations. This online/remote approach may also not be suitable for certain vulnerable populations, such as people who inject drugs or sex workers, because of concerns about confidentiality, trust, and the potential social/legal consequences of participating. Face-to-face recruitment/enrollment methods may be better suited for populations like these, so that researchers can build rapport and respond to concerns more effectively. Finally, while many of our analyses were conducted on day-level or within-day data comprising thousands of observations, some person-level estimates (e.g., overall response rates, total behavior reports over the study period) may have been affected by the relatively small sample size.
Conclusions
Overall, the present findings suggest that EMA researchers can access populations similar to this one from further afield by recruiting, enrolling, and following them entirely online, without meaningful differences in protocol adherence, behavioral reactivity, or erratic responding. Using online procedures like these may enable researchers to access and study more clinically relevant samples. In general, though, researchers should carefully consider setting target response rates, providing feedback on those response rates throughout the study, and incentivizing survey completion for essential survey types. Further research may also be needed to better understand reactivity in EMA studies of alcohol use and to explore ways of mitigating it.
Public Significance Statements.
Recruiting and enrolling participants into intensive and/or longitudinal research studies (like ecological momentary assessment [EMA] studies) online can enable researchers to reach more clinically relevant, high-priority, and diverse samples, but many are skeptical that data collected from online participants will be accurate and of sufficient quality. In a study of mostly well-educated, predominantly White men who have sex with men (MSM) recruited from urban areas, we found few differences in aspects of data consistency and quality across participants who chose to be enrolled and followed online versus those who chose to enroll in-person, suggesting that researchers can consider recruiting similar samples online without sacrificing data quality.
Acknowledgements
This manuscript was supported by P01AA019072 and L30AA023336 from the National Institute on Alcohol Abuse and Alcoholism.
Footnotes
Informed consent: Informed consent was obtained from all participants included in the study.
Ethical approval: All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards.
¹Table 1 shows that more participants who identified as “straight” or “other” enrolled in person than enrolled remotely. As such, we conducted all analyses among only participants identifying as gay or bisexual and then compared them with the same analyses including all participants. None of the results differed, so we report results from the full sample.
References
- Bunnell BE, Sprague G, Qanungo S, Nichols M, Magruder K, Lauzon S, … Welch BM (2019). An exploration of useful telemedicine-based resources for clinical research. Telemedicine and e-Health, 1–15. 10.1089/tmj.2018.0221
- Currell R, Urquhart C, Wainwright P, & Lewis R (2000). Telemedicine versus face to face patient care: Effects on professional practice and health care outcomes. Cochrane Database of Systematic Reviews, (2). 10.1002/14651858.CD002098
- Dolezal C, Marhefka SL, Santamaria EK, Leu C-S, Brackis-Cott E, & Mellins CA (2012). A comparison of audio computer-assisted self-interviews to face-to-face interviews of sexual behavior among perinatally HIV-exposed youth. Archives of Sexual Behavior, 41(2), 401–410. 10.1007/s10508-011-9769-6
- Dulin PL, Alvarado CE, Fitterling JM, & Gonzalez VM (2017). Comparisons of alcohol consumption by timeline follow back vs. smartphone-based daily interviews. Addiction Research & Theory, 25(3), 195–200. 10.1080/16066359.2016.1239081
- Fernie BA, Spada MM, & Brown RG (2019). Motor fluctuations and psychological distress in Parkinson’s disease. Health Psychology, 38(6), 518–526. 10.1037/hea0000736
- Granholm E, Loh C, & Swendsen J (2007). Feasibility and validity of computerized ecological momentary assessment in schizophrenia. Schizophrenia Bulletin, 34(3), 507–514. 10.1093/schbul/sbm113
- Hjorthøj CR, Hjorthøj AR, & Nordentoft M (2012). Validity of timeline follow-back for self-reported use of cannabis and other illicit substances—systematic review and meta-analysis. Addictive Behaviors, 37(3), 225–233. 10.1016/j.addbeh.2011.11.025
- Husky M, Olié E, Guillaume S, Genty C, Swendsen J, & Courtet P (2014). Feasibility and validity of ecological momentary assessment in the investigation of suicide risk. Psychiatry Research, 220(1–2), 564–570. 10.1016/j.psychres.2014.08.019
- Livingston NA, Flentje A, Heck NC, Szalda-Petree A, & Cochran BN (2017). Ecological momentary assessment of daily discrimination experiences and nicotine, alcohol, and drug use among sexual and gender minority individuals. Journal of Consulting and Clinical Psychology, 85(12), 1131. 10.1037/ccp0000252
- Manini TM, Mendoza T, Battula M, Davoudi A, Kheirkhahan M, Young ME, … Rashidi P (2019). Perception of older adults toward smartwatch technology for assessing pain and related patient-reported outcomes: Pilot study. JMIR mHealth and uHealth, 7(3). 10.2196/10044
- McLean S, Sheikh A, Cresswell K, Nurmatov U, Mukherjee M, Hemmi A, & Pagliari C (2013). The impact of telehealthcare on the quality and safety of care: A systematic overview. PLoS ONE, 8(8). 10.1371/journal.pone.0071238
- Nelson RO (1977). Assessment and therapeutic functions of self-monitoring. Progress in Behavior Modification, 5, 263–308. 10.1016/B978-0-12-535605-3.50012-1
- Newcomb ME, Swann G, Mohr D, & Mustanski B (2018). Do diary studies cause behavior change? An examination of reactivity in sexual risk and substance use in young men who have sex with men. AIDS and Behavior, 22(7), 2284–2295. 10.1007/s10461-018-2027-3
- Rowe C, Hern J, DeMartini A, Jennings D, Sommers M, Walker J, & Santos G-M (2016). Concordance of text message ecological momentary assessment and retrospective survey data among substance-using men who have sex with men: A secondary analysis of a randomized controlled trial. JMIR mHealth and uHealth, 4(2). http://mhealth.jmir.org/2016/2/e44/
- Schroder KE, Johnson CJ, & Wiebe JS (2007). Interactive voice response technology applied to sexual behavior self-reports: A comparison of three methods. AIDS and Behavior, 11(2), 313–323. 10.1007/s10461-006-9145-z
- Shiffman S, Stone AA, & Hufford MR (2008). Ecological momentary assessment. Annual Review of Clinical Psychology, 4, 1–32. Retrieved from http://www.ncbi.nlm.nih.gov/pubmed/18509902
- Simons JS, Wills TA, Emery NN, & Marks RM (2015). Quantifying alcohol consumption: Self-report, transdermal assessment, and prediction of dependence symptoms. Addictive Behaviors, 50, 205–212. 10.1016/j.addbeh.2015.06.042
- Smith A (2017). Record shares of Americans now own smartphones, have home broadband. Pew Research Center.
- Sobell LC, Brown J, Leo GI, & Sobell MB (1996). The reliability of the Alcohol Timeline Followback when administered by telephone and by computer. Drug and Alcohol Dependence, 42(1), 49–54. 10.1016/0376-8716(96)01263-X
- StataCorp (2007). Stata Statistical Software: Release 10. College Station, TX: StataCorp LP.
- Stone AA, & Shiffman S (1994). Ecological momentary assessment (EMA) in behavioral medicine. Annals of Behavioral Medicine, 16(3), 199–202. 10.1093/abm/16.3.199
- Turner CF, Ku L, Rogers SM, Lindberg LD, Pleck JH, & Sonenstein FL (1998). Adolescent sexual behavior, drug use, and violence: Increased reporting with computer survey technology. Science, 280(5365), 867–873. 10.1126/science.280.5365.867
- Walters EH, Walters J, Wills KE, Robinson A, & Wood-Baker R (2012). Clinical diaries in COPD: Compliance and utility in predicting acute exacerbations. International Journal of Chronic Obstructive Pulmonary Disease, 7, 427. 10.2147/COPD.S32222
- Walz LC, Nauta MH, & aan het Rot M (2014). Experience sampling and ecological momentary assessment for studying the daily lives of patients with anxiety disorders: A systematic review. Journal of Anxiety Disorders, 28(8), 925–937. 10.1016/j.janxdis.2014.09.022
- Watson D, & Clark LA (1994). The PANAS-X: Manual for the Positive and Negative Affect Schedule—Expanded Form. University of Iowa.
- Welch BM, Marshall E, Qanungo S, Aziz A, Laken M, Lenert L, & Obeid J (2016). Teleconsent: A novel approach to obtain informed consent for research. Contemporary Clinical Trials Communications, 3, 74–79. 10.1016/j.conctc.2016.03.002
- Williams DA, Gendreau M, Hufford MR, Groner K, Gracely RH, & Clauw DJ (2004). Pain assessment in patients with fibromyalgia syndrome: A consideration of methods for clinical trials. The Clinical Journal of Pain, 20(5), 348–356.
- Wray TB, Adia AC, Pérez AE, Simpanen EM, Woods L-A, Celio MA, & Monti PM (2019). Timeline: A web application for assessing the timing and details of health behaviors. The American Journal of Drug and Alcohol Abuse, 45(2), 141–150. 10.1080/00952990.2018.1469138
- Wray TB, Kahler CW, & Monti PM (2016). Using ecological momentary assessment (EMA) to study sex events among very high-risk men who have sex with men (MSM). AIDS and Behavior, 20(10), 2231–2242. 10.1007/s10461-015-1272-y
- Wray TB, Merrill JE, & Monti PM (2014). Using ecological momentary assessment (EMA) to assess situation-level predictors of alcohol use and alcohol-related consequences. Alcohol Research: Current Reviews, 36(1), 19.
- Yang C, Linas B, Kirk G, Bollinger R, Chang L, Chander G, … Latkin C (2015). Feasibility and acceptability of smartphone-based ecological momentary assessment of alcohol use among African American men who have sex with men in Baltimore. JMIR mHealth and uHealth, 3(2), e67. 10.2196/mhealth.4344