Abstract
Ecological Momentary Assessments (i.e., EMA, repeated assessments in daily life) are widespread in many fields of psychology and related disciplines. Yet, little knowledge exists on how differences in study designs and samples predict study compliance and dropout—two central parameters of data quality in (micro-)longitudinal research. The current meta-analysis included k = 477 articles (496 samples, total N = 677,536). For each article, we coded the design, sample characteristics, compliance, and dropout rate. The results showed that on average EMA studies scheduled six assessments per day, lasted for 7 days, and obtained a compliance of 79%. Studies with more assessments per day scheduled fewer assessment days, yet the number of assessments did not predict compliance or dropout rates. Compliance was significantly higher in studies providing financial incentives. Otherwise, design and sample characteristics had little effect. We discuss the implications of the findings for planning, reporting, and reviewing EMA studies.
Keywords: meta-analysis, ecological momentary assessment, experience sampling method, ambulatory assessments, study compliance, dropout
For the last two decades, Ecological Momentary Assessments (EMA1), that is, repeated, frequent assessments in people’s daily life, have become very common in different fields of psychology (e.g., emotion, health, work, and social relationships) and related disciplines (Bolger & Laurenceau, 2013; May et al., 2018; Van Berkel et al., 2017; van Roekel et al., 2019). Given that EMA is used in very different fields, the examined topics and populations often require different assessment schedules, but only vague guidance for planning studies is available to date: “Low frequency [i.e., rare] states require a higher sampling rate or event-contingent sampling, and longer total duration compared with very frequent states” (Wrzus & Mehl, 2015, p. 253).
In deciding on the assessment schedule, that is, the number of assessments per day and the total duration of assessment days, researchers have to balance competing goals: A first goal is to obtain a sufficient number of assessments per person to study the phenomenon of interest reliably, and to obtain sufficient statistical power and precision for the planned analyses. A second goal when studying within-person dynamics and patterns is to match the sampling schedule to the assumed “natural frequency” of the investigated phenomenon. An obvious example is the assessment of tiredness at least twice a day, before and after a night's sleep, instead of every other day. Thus, researchers have to decide (a) how often per day and (b) for how many days or weeks they will assess the target phenomenon. This relates to the final goal: to keep the number of assessments manageable for the target population to reduce participant burden and thus to retain adherence to the assessment protocol (see also Hasselhorn et al., 2021; Janssens et al., 2018). Study adherence is central for obtaining reliable results and refers to both participants remaining in the study (i.e., low participant dropout) and participants providing the intended amount of data (i.e., the compliance rate, which is the proportion of answered assessments relative to the scheduled assessments).
The aims of the current meta-analysis are to examine how differences in assessment schedules affect dropout and compliance, and how these effects might differ across samples (e.g., varying in age or health) in various research areas of psychology. In contrast to previous single studies (Eisele et al., 2020; Guo et al., 2017), meta-analyses allow examining multiple factors simultaneously, such as assessment frequency, study duration, incentives, research topic, age, and health of sample, as well as interactions among these factors.
Reasons for Increased Use of EMA
We briefly revisit the main advantages of using EMA in diverse fields of psychology and related disciplines: real-time, real-life, and within-person (Mehl & Conner, 2012). First, retrospective memory and report biases are greatly reduced in EMA, even compared with end-of-day diaries, because participants are asked what they were doing, feeling, or thinking at a given moment or during the past hour(s) (Lucas et al., 2020; Neubauer et al., 2020; Robinson & Clore, 2002; Schwarz, 2012). Second, behavior that occurs rarely or is unethical to induce in laboratory experiments, such as physical violence or suicidal thoughts, can be observed in daily life (Kleiman et al., 2017; Reis, 2012). Third, repeated assessments per person make it possible to study within-person effects (e.g., how variation in sleep influences well-being) as well as within-person dynamics over substantially longer time periods than typical experiments (e.g., how feelings and behavior change from 1 hr, day, or week to another; Carpenter et al., 2016; Hamaker, 2012; Molenaar, 2004; Molenaar et al., 2009).
Despite EMA offering these advantages and being increasingly and easily used in diverse research areas, the study designs and the examined populations differ substantially, and these differences might systematically affect data quality, that is, compliance and dropout. In the following sections, we review previous knowledge on how design characteristics (e.g., assessment schedules, incentives, and research topics) and characteristics of the sample (e.g., gender, age, and health) might contribute to participants’ adherence to study protocols.
Assessment Schedules and Incentives in EMA
When reading current EMA studies, the impression arises that most studies ask participants to report around 5 times a day for a week or less. Yet notable exceptions have been conducted: For example, young adults received 200 signals within 4 consecutive days, that is, signals every 17 min during their waking hours, to report their momentary affect with 1 item (Kuppens et al., 2010). Equally demanding were studies in which young adults answered one or two assessments per day on consecutive days for 2 months (Larsen, 1987) or even for 6 months (Epstein & Preston, 2012). Thus, notable differences in assessment schedules exist, and the current meta-analysis examines both potential systematic differences between research topics in assessment schedules and the effects of assessment schedules on study adherence.
Assessment Schedules
Previous meta-analyses on EMA studies in specific subdisciplines (e.g., clinical research and adolescent development) described surprisingly similar assessment schedules of on average four to six assessments per day for almost 2 weeks (Heron et al., 2017; Jones et al., 2019; May et al., 2018; Morren et al., 2009; Ono et al., 2019; Ottenstein & Werner, 2021; Rintala et al., 2019; Soyster et al., 2019; van Berkel et al., 2019; van Roekel et al., 2019; Wen et al., 2017, see Supplementary Table S1 for details). At the same time, these meta-analyses also document substantial variability in the number of daily assessments and the number of assessment days, which might reflect specific decisions related to research topic and could affect participants’ adherence to the study protocol (i.e., compliance rates and dropout).
In most previous studies, the compliance rate (i.e., percentage of answered assessments from all scheduled assessments) did not differ with the number of assessments per day (Jones et al., 2019; Ottenstein & Werner, 2021; Rintala et al., 2019; Vachon et al., 2019). Studies that experimentally varied the number of assessments per day demonstrated likewise that compliance for 14 days was comparable when answering 3, 6, 9, or 12 prompts per day (Eisele et al., 2020; Stone et al., 2003). In contrast, as assessment days increased within studies, typically for more than 1 week, participation often became burdensome and compliance declined over time (Ono et al., 2019; Rintala et al., 2019; Silvia et al., 2013). Accordingly, some studies schedule break days without assessments in between assessment days to maintain motivation and thus compliance for longer periods (e.g., Neubauer et al., 2018; Riediger et al., 2009).
When comparing studies of different duration, often no significant differences in compliance rate were observed—likely because longer studies often demanded fewer assessments per day to reduce participant burden (Jones et al., 2019; Soyster et al., 2019; van Roekel et al., 2019). Most previous studies showed that the applied assessments work reasonably well with respect to compliance rates, likely because incentives and further efforts to encourage compliant participation are provided with intense, burdensome assessments (e.g., Kuppens et al., 2010).
Incentives
Although not all EMA studies report how they compensated participants (see also Jones et al., 2019; van Berkel et al., 2019; van Roekel et al., 2019), the majority provide financial incentives. Providing any kind of monetary or other incentive has been associated with higher compliance rates compared with no incentives (Ottenstein & Werner, 2021; Vachon et al., 2019; van Berkel et al., 2019). An experimental study on providing monetary incentives or not confirmed that compensating participants for a demanding EMA study increased both data quality (i.e., compliance rates) and willingness to participate (i.e., reduced sample selectivity; Ludwigs et al., 2020). Regarding the amount of incentives, meta-analyses found no significant association with compliance rates (Jones et al., 2019; Morren et al., 2009; Wen et al., 2017), yet it might be possible that the amount of incentives matches the demands of the study and thereby compensates for more intense assessment schedules.
Research Topic
In addition to the expected participant burden, the specific research topic might guide decisions on how often and for how many days people are asked to report thoughts, feelings, and behaviors. Previous theoretical work has suggested sampling less frequent events more often and for longer periods so as not to miss the event (Collins & Graham, 2002; Wrzus & Mehl, 2015). For example, positive emotions and social interactions occur frequently and thus can be captured reliably with a couple of assessments per day for a few days (e.g., Kuppens et al., 2010; Nezlek, 1993). In contrast, suicidal thoughts occur rarely and were thus assessed, for example, 4 times a day for 28 days (Kleiman et al., 2017). The previous meta-analyses on EMA did not systematically compare whether assessment schedules differed with research topics. Thus, the current meta-analysis explores differences between studies focusing on emotions, health behavior, social interactions, and further topics frequently examined using EMA (Hoppmann & Riediger, 2009).
Sample Effects in EMA
Study adherence (i.e., compliance rates and dropout) might vary not only across assessment schedules and incentive structures as described before but also across different samples. At the same time, different samples might require different assessment schedules: For example, highly intense schedules, with measurements every 17 min throughout the day, might overburden some populations (e.g., older adults or employees) more than others (e.g., young adults and students).
Gender
Although some studies observed higher compliance among women compared with men (Rintala et al., 2019; Silvia et al., 2013; Vachon et al., 2019; van Roekel et al., 2019), other studies did not examine gender differences (Heron et al., 2017; May et al., 2018; van Berkel et al., 2019; Wen et al., 2017) or found no significant differences (Jones et al., 2019; Morren et al., 2009; Ono et al., 2019; Ottenstein & Werner, 2021; Soyster et al., 2019). Research on gender differences in responsibility and conscientiousness suggests that women are more reliable on average (Schmitt et al., 2008), which would also contribute to closer adherence to the assessment schedule. However, individual differences in conscientiousness were not significantly related to compliance rates in two EMA studies (Courvoisier et al., 2012; Soyster et al., 2019). The absence of gender differences is not always attributable to insufficient power to detect differences but instead might indicate complex interplays between gender, assessment schedule, and incentives, which have not been tested in previous studies.
Age
With respect to age differences, one would assume that researchers apply less demanding assessment schedules to children and adolescents compared with adults to reduce participant burden for younger samples. Yet, the three meta-analyses on EMA in adolescence (Heron et al., 2017; van Roekel et al., 2019; Wen et al., 2017) reported equally demanding schedules of on average four to six assessments per day for almost 2 weeks (Supplementary Table S1). Interestingly, the compliance rates did not differ with age (van Roekel et al., 2019) or age differences were not reported (Heron et al., 2017; Wen et al., 2017), although adolescents are on average less dependable and reliable compared with preadolescent children.
Among adults, age differences in compliance rates might be nonlinear with the highest compliance among older adults and a leveling off in old age (Ono et al., 2019) as young adults and working middle-aged adults might be either less responsible or not always able to respond to assessment prompts throughout their daily routines (Beal, 2015; Morren et al., 2009; Rintala et al., 2019). At the same time, several previous studies found no significant age differences in compliance rates (Jones et al., 2019; Soyster et al., 2019; Vachon et al., 2019), which might be partly attributable to restricted age ranges and examining only linear age differences. Thus, theoretically, we assume compliance to be nonlinearly related to age with the highest compliance among elementary school children and older adults before leveling off in old age. Again, incentives might compensate for age differences in compliance. This reasoning suggests that age differences in adherence might be modulated by certain design characteristics (e.g., the total number of assessments or the incentives provided).
Clinical Samples and Participants With Mental or Physical Illnesses
EMA studies have a long history in clinical research because symptoms can be monitored more accurately compared with retrospective reports (Hufford & Shields, 2002). EMA has thus been implemented successfully in studies on mental or physical illnesses. Descriptively, the implemented assessment schedules are highly similar to schedules used in healthy samples, with around six assessments per day for 1 to 2 weeks (May et al., 2018; Morren et al., 2009; Vachon et al., 2019; see Table S1).
Likewise, compliance rates of around 80% suggest that, on average, EMA provides similar data density in clinical samples as in healthy samples. Accordingly, previous studies found only very small differences in compliance between different clinical samples (Soyster et al., 2019; Wen et al., 2017). Where differences were detected, patients with psychosis were somewhat less compliant compared with healthy participants (Jones et al., 2019; Rintala et al., 2019; Silvia et al., 2013; Vachon et al., 2019). Surprisingly, there were no significant differences between patients diagnosed with depression and healthy participants (Rintala et al., 2019; Vachon et al., 2019), although symptoms of depression include listlessness. Perhaps diagnostic groups do not differentiate finely enough between levels of symptom severity, as more severe symptoms of psychosis or depression have been related to lower compliance rates, albeit in a sample of young students (Silvia et al., 2013).
In addition to mental or physical illnesses impeding study participation, being ill might also motivate participation, especially in studies that examine participants’ clinical conditions. For example, several studies on pain demonstrated that neither diagnosis nor pain intensity predicted compliance rates, which were quite high overall (Morren et al., 2009; Ono et al., 2019). Thus, clinical conditions might not interfere much with EMA schedules and in some cases might even increase the motivation to participate compliantly if the study topic targets the illness. Furthermore, participant burden, that is, more assessments per day and more assessment days, might moderate the effects of clinical illness on compliance rates, with greater burden predicting lower compliance among ill participants compared with healthy participants.
Preregistered Hypotheses
The hypotheses follow the theoretical background, while in the preregistration (registered on January 29, 2019, under https://osf.io/exvdc), the hypotheses had a different order but identical wording. We did not always assume similar effects for compliance and dropout and thus registered only hypotheses that were grounded in previous work and theoretical assumptions outlined before.
Design Differences of Research Fields
Hypothesis 1 (H1): The number of assessments varies with the content of EMA questionnaires: The highest numbers of assessments per day occur for affective states.
Design Effects on Compliance and Dropout
Hypothesis 2 (H2): With higher number of assessments, compliance is lower. The association might be accelerated with higher numbers of assessments.
Hypothesis 3 (H3): With higher number of assessments (per day or total), dropout is higher. The association might follow an exponential function.
Hypothesis 4 (H4): The implementation of break days increases compliance.
Hypothesis 5 (H5): The implementation of break days decreases dropout.
Sample Effects on Compliance and Dropout
Hypothesis 6 (H6): Compliance, that is, percentage of filled out assessments, varies with age of the participants and follows a U-shaped curve: Compliance is highest among children and older participants and lowest among adolescents.
Hypothesis 7 (H7): Study compliance is higher among women compared with men.
Combined Effects of Design and Sample Characteristics on Compliance and Dropout
Hypothesis 8 (H8): Dropout is higher among samples with ill participants, except when the study topic is about health-related issues (study topic moderates the effect of health on dropout).
Hypothesis 9a–c (H9a–c): The negative association between number of assessments and compliance is moderated by age, incentive, and participant care: stronger associations among adolescents; with no or little incentives; and without further care/contact with participants.
Hypothesis 10a–e (H10a–e): The positive association between number of assessments and dropout is moderated by sample age (association more pronounced among adolescents and less pronounced among older participants), gender (association less pronounced among women), sample health (association more pronounced among ill participants), incentive and care (association less pronounced when monetary incentives or care are provided).
Method
Disclosure of Open Science Practices
We designed and reported the results of the meta-analysis in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) checklist (Shamseer et al., 2015). We preregistered the hypotheses, data collection, and coding on January 29, 2019, before starting the data collection, that is, study retrieval, on https://osf.io/exvdc. Data, materials, and code for analyses and figures can be accessed at https://osf.io/a34nj/. No ethical approval was required because we analyzed data from published articles rather than human participants directly.
Search of Studies and Inclusion Criteria
The databases Academic Search Complete, PsycArticles, and PsycInfo were searched via EBSCOhost web to obtain EMA studies, that is, studies in which participants answered one or more questions several times per day for several days. Accordingly, the search terms were experience sampling OR momentary assessment OR event-contingent sampling OR ambulatory assessment, and the exclusion criteria were:
The article was not empirical (e.g., case study or review).
No EMA method was used (e.g., daily diary with only one assessment per day; study assessed only physiological parameters).
The article was not in English or German.
From 4,024 unique hits (i.e., without duplicates), 2,501 were excluded based on the exclusion criteria. We retrieved the PDFs of the remaining 1,523 articles and coded about one-third as a random subsample.2 During coding, we removed further articles that were not empirical (e.g., reviews; k = 19), did not apply an EMA method (k = 31), or reported insufficient information on the number of assessments (k = 7; Figure 1). After the exclusion of six mistakenly coded studies with only one assessment per day or one assessment day, information on 496 samples from 477 articles was included in the analyses.
Figure 1.
Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) Flow Diagram of the Literature Search, Study Exclusion, and Study Coding.
Coding of the Studies
We developed a detailed coding scheme that focused on three domains: (a) EMA design characteristics, (b) sample characteristics, and (c) compliance and dropout. In addition, we recorded the study information including first author, publication year, and publication type. The complete coding manual can be retrieved from the OSF repository accompanying this work: https://osf.io/a34nj/.
During the coding of the studies, we paid attention to similarities of samples in articles from the same authors to avoid including the same study (i.e., data set) repeatedly, if it had been published in several articles. In addition, we checked the age and gender distribution of studies with the same sample size and, in the case of multiple publications from the same data set, kept the publication that provided the most information on the study and sample characteristics.
Study Design Characteristics
Number of Assessments
We recorded the number of scheduled assessments per day, intervals between assessments, scheduled number of assessment days, and scheduled break days as continuous information. The average numbers of actual assessments per day and in total across the entire study were recorded in two separate variables because studies vary in how they report the actual number of assessments.
Schedule
We coded the sampling scheme of the study according to established taxonomies (Wrzus & Mehl, 2015): 1 = interval-contingent (e.g., assessments every 3 hr, not at random times), 2 = signal-contingent (e.g., auditory signals occurring at [quasi-]random or nonrandom times), 3 = event-contingent (e.g., assessments after specific events such as conflicts or meals), 4 = mixed (e.g., combination of signal- and event-contingent assessments).
Study Topic
Based on previous reviews (Hoppmann & Riediger, 2009; van Roekel et al., 2019), the topic of the research was categorized using the following categories: 1 = physical health and health behavior (e.g., physical activity and diet), 2 = mental health, 3 = emotions (e.g., affective states, well-being, stress, and emotion regulation), 4 = romantic or marital relationships (including sexual behavior), 5 = friendships, 6 = family relationships, 7 = general social relationships or social interactions, 8 = education (school/university), 9 = work/employment, 10 = consumer behavior (shopping, video games, and music), 11 = situations, places, 12 = personality states, 13 = other (e.g., daily activities). Two variables with the same categories were used to accommodate multiple topics in the same study. The variables often corresponded to the main predictor and outcome of the EMA results reported in the article. For example, a study examining associations between momentary affect and food choices (Ashurst et al., 2018) was coded as examining Topics 1 (i.e., physical health behavior including diet) and 3 (i.e., emotion). We summarized the two variables into 13 dummy-coded variables representing the 13 categories.
Additional Physiological or Other Measures
One variable represented whether the EMA study included further measures: 1 = none, 2 = mobile sensing (e.g., GPS tracking), 3 = wearable sensors (e.g., watches or chest belts that measure cardiac activity), and 4 = other (e.g., saliva sampling).
Incentives
Studies were coded for providing the following incentives: 1 = no incentives, 2 = immediate feedback, 3 = later feedback or study results, 4 = monetary incentives, 5 = vouchers (e.g., books and gift cards), 6 = lottery for money, gift, or voucher, 7 = new medical treatment (e.g., participation in novel therapy approach), 8 = sweets or food, 9 = course credit, 10 = combination of incentives (e.g., course credit and lottery), 11 = other. In case of individual monetary incentives including vouchers, the amount in USD was coded. Category 1 (no incentive) was only coded if it was explicitly stated that no incentive for participation was provided (if no information was given, this variable was coded as missing). We also coded whether additional efforts were made to enforce high compliance (1 = no, 2 = yes, e.g., requiring minimum compliance rate to pay incentives). Finally, it was assessed whether participants were contacted during or after the EMA phase to monitor or incentivize study participation: 1 = no/not reported, 2 = direct personal contact, for example, phone calls, messages, 3 = general contact, for example, generic messages to everybody, holiday cards, 4 = events hosted by researchers, 5 = other.
Sample Characteristics
Sample Size and Sample Type
The size of the sample was coded continuously. Depending on how participants were recruited, we categorized the sample as 1 = representative sample, 2 = convenience sample, 3 = online sample, or 4 = college student sample.
Age
In addition to average age, SD, and range of participants’ age (if available), we coded the age group from the available information (e.g., first graders, average age, and age range) using the categories: 1 = childhood (≤12 years), 2 = adolescence (13–18 years), 3 = young adulthood (19–39 years), 4 = middle adulthood (40–64 years), 5 = late adulthood (65–80 years), 6 = lifespan sample including childhood or adolescence, 7 = adult lifespan sample. The age group describes the age of most participants. If the age mean, SD, or age range suggested that people from more than one age group were studied, the categories 6 or 7 for lifespan samples were chosen.
Gender
The proportion of female participants was coded continuously.
Sample Health
The samples were categorized as 1 = healthy sample, 2 = clinical sample with physical illnesses, 3 = clinical sample with mental illnesses, or 4 = mixed sample.
Study Adherence
Compliance Rate
Participants’ average compliance with the study protocol/assessment schedule was continuously coded as the percentage of completed assessments relative to the number of scheduled assessments. In cases when the compliance rate was not reported as a percentage, it was computed from the completed number of assessments relative to the scheduled number of assessments.
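For illustration only, the computation described above can be sketched in a few lines of Python (a minimal sketch with hypothetical numbers; the function and variable names are ours and not part of the coding manual):

```python
def compliance_rate(completed: int, per_day: float, days: float) -> float:
    """Percentage of completed assessments relative to all scheduled assessments."""
    scheduled = per_day * days
    return 100.0 * completed / scheduled

# hypothetical example: 33 of 6 x 7 = 42 scheduled prompts answered -> about 78.6%
print(compliance_rate(completed=33, per_day=6, days=7))
```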
Dropout
The percentage of participants who started the EMA but did not continue until the end was coded as dropout. Reported dropout reasons were coded as: 1 = unknown/none of the reasons, 2 = personal reasons (e.g., relocation), 3 = health-related reasons, 4 = motivational reasons, 5 = refusal (e.g., content-related), and 6 = mixed (e.g., motivation and technical problems).
Measurement Reactivity
Whether the repeated assessment of momentary experiences and behaviors changed the reports of such experiences and behaviors was coded as measurement reactivity using the categories: 1 = not reported, 2 = linear change over time in EMA variables, 3 = difference of EMA participants compared with control group without EMA, 4 = reactivity directly reported (e.g., participants rated reactivity), and 5 = reactivity tested, but no significant reactivity observed.
Interrater Agreement
A total of 15 coders3 (3 research assistants and 12 students) were trained and coded 10 studies to analyze interrater agreement. Average agreement across coders was sufficient for the central study variables, with ICC(1,1) = .85 and .91 (see Supplementary Table S2 for details on interrater agreement). After double coding, the raters received feedback on their accuracy and continued coding. During further coding, open questions and unclear cases were resolved in regular consensus meetings.
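A minimal sketch of how such an agreement check could be computed, assuming Python with the pingouin package and a small invented set of multiply coded values (the original coding was not necessarily computed this way):

```python
import pandas as pd
import pingouin as pg  # assumed available; any ICC implementation would do

# hypothetical long-format ratings: three coders coding "assessments per day" for three studies
ratings = pd.DataFrame({
    "study": [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "coder": ["A", "B", "C"] * 3,
    "value": [6, 6, 5, 4, 4, 4, 8, 7, 8],
})

icc = pg.intraclass_corr(data=ratings, targets="study", raters="coder", ratings="value")
# ICC(1,1): one-way random effects, single rater
print(icc.loc[icc["Type"] == "ICC1", ["ICC", "CI95%"]])
```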
Publication Bias
The current meta-analysis did not aim to synthesize a specific effect size for which publication bias might exist (Hunter & Schmidt, 2004; McShane et al., 2016). Instead, we aimed at describing the current state of the art of EMA across different research topics in published research. We neither searched specifically for unpublished studies nor excluded dissertations because publication bias usually operates on central study questions and outcomes, whereas the current study focused on design and sample characteristics. Nonetheless, studies with weak designs, for example, with few participants and/or few assessments, might be less likely to be published because of methodological weaknesses. We will discuss the observed range of assessments and compliance in light of potential publication biases.
Data Analyses
Because most articles (461/477 = 96.6%) reported results from one sample only, we treated the coded information of all 496 samples as independent. Given the small number of articles that reported multiple samples, we deviated from the preregistered meta-regression and used linear models assuming independent observations (i.e., OLS regression, t tests, and analyses of variance [ANOVAs]) for the analyses.
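As an illustration of this analytic approach, the following Python sketch uses simulated data as a stand-in for the coded samples (the authors' actual analysis code is available in the OSF repository and may differ):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm
from scipy import stats

rng = np.random.default_rng(1)
# simulated stand-in: one independent row per sample
d = pd.DataFrame({
    "compliance": rng.normal(79, 14, 200),
    "total_assessments": rng.integers(10, 200, 200),
    "incentive": rng.choice(["money", "other", "none"], 200),
    "break_days": rng.integers(0, 2, 200),
})

# OLS regression in place of the preregistered meta-regression
print(smf.ols("compliance ~ total_assessments", data=d).fit().summary())

# t test: studies with vs. without break days
print(stats.ttest_ind(d.loc[d.break_days == 1, "compliance"],
                      d.loc[d.break_days == 0, "compliance"]))

# one-way ANOVA: compliance across incentive groups
print(anova_lm(smf.ols("compliance ~ C(incentive)", data=d).fit()))
```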
Results
Description of Study Designs and Samples
Study Characteristics
We first describe the designs of the EMA studies (see complete descriptive statistics in Table 1), before turning to the underlying samples (see Table 2 for complete descriptive statistics). The majority of the included studies used a signal-contingent (58%) or interval-contingent (18%) sampling scheme. Only a few studies (5%) applied purely event-contingent sampling, while mixed schemes combining several sampling strategies also occurred (16%, Table 1). The average study was scheduled for 12.4 days (Mdn = 7) with on average 6.53 (Mdn = 6) assessments per day. The most common study topics included emotion (k = 327 studies, 66%), mental health (k = 116, 23%), and physical health and health behavior (k = 109, 22%). A total of 164 studies (33%) reported monetary incentives for study participation, with on average USD 96.47 (Mdn = 50). In 75 (15%) studies, it was explicitly mentioned that no incentives were provided. Other incentives (e.g., partial course credit, vouchers, lotteries, or combinations of these) were provided in 92 (19%) studies, and 165 (33%) studies did not report any information on incentives. A total of 134 (27%) studies reported that they used some form of compliance reinforcement, such as incentives dependent on providing a minimum percentage of assessments. In addition, in 78 (16%) studies researchers had contact with participants over the course of the study (e.g., via phone calls, personal contact, or events hosted by the researchers), while the majority of studies did not report contact with participants (k = 418, 84%).
Table 1.
Descriptive Statistics of the Designs of the Studies Included in the Analyses.
| Categorical variables | % | k | |
|---|---|---|---|
| Schedule | |||
| 1 = interval contingent | 18.1 | 90 | |
| 2 = signal contingent | 57.9 | 287 | |
| 3 = event contingent | 5.0 | 25 | |
| 4 = mixed | 15.7 | 78 | |
| –9 = Missing | 3.2 | 16 | |
| Study Topic | |||
| 1 = physical health and health behavior | 22.0a | 109 | |
| 2 = mental health | 23.4 | 116 | |
| 3 = emotions | 65.9 | 327 | |
| 4–7 = social relationships and interactions | 15.1 | 75 | |
| 8 and 9 = education or work | 8.7 | 43 | |
| 10 = consumer behavior | 4.2 | 21 | |
| 11 = situations, places | 18.3 | 91 | |
| 12 = personality states | 13.3 | 66 | |
| 13 = other (e.g., goals) | 9.3 | 46 | |
| Additional measures | |||
| 1 = none | 88.1 | 437 | |
| 2 = mobile sensing (e.g., GPS or Apps) | 5.2 | 26 | |
| 3 = wearable sensors | 4.2 | 21 | |
| 4 = other (e.g., saliva sampling) | 2.4 | 12 | |
| Incentives | |||
| 1 = no incentives | 15.1 | 75 | |
| 4 = direct monetary incentives | 33.1 | 164 | |
| 5, 6, 10 = other financial incentives/combination | 11.9 | 59 | |
| 9 = course credit | 3.8 | 19 | |
| 11 = other (incl. feedback) | 2.8 | 14 | |
| –9 = Missing | 33.3 | 165 | |
| Continuous variables | M | SD | Mdn (range) |
| Scheduled assessment days (k = 482) | 12.40 | 16.38 | 7 (2–180) |
| Scheduled assessments per day (k = 390) | 6.53 | 5.02 | 6 (1.7–81) |
| Actual assessments per day (k = 287) | 5.14 | 3.44 | 4.6 (0.1–35.4) |
| Actual total number of assessments (k = 347) | 51.11 | 45.29 | 39.9 (2.4–455) |
| Intervals between assessments (in min, k = 233) | 141.12 | 89.57 | 120 (15–720) |
| Amount of incentive in USD (k = 205) | 96.47 | 179.65 | 50 (5–2,310) |
Note. Percentages of topic domains add up to more than 100% because up to two topics were coded per study.
Table 2.
Descriptive Statistics of the Samples Included in the Meta-Analyses.
| Categorical variables | % | k | |
|---|---|---|---|
| Sample type | |||
| 1 = representative sample | 10.7 | 53 | |
| 2 = convenience sample | 59.7 | 296 | |
| 3 = online sample | 0.8 | 4 | |
| 4 = college student sample | 22.0 | 109 | |
| −9 = Missing | 6.9 | 34 | |
| Age group | |||
| 1 = childhood (≤12 years) | 1.2 | 6 | |
| 2 = adolescence (13–18 years) | 11.3 | 56 | |
| 3 = young adulthood (19–39 years) | 36.9 | 183 | |
| 4 = middle adulthood (40–64 years) | 2.0 | 10 | |
| 5 = late adulthood (65–80 years) | 0.4 | 2 | |
| 6 = lifespan sample including children | 2.8 | 14 | |
| 7 = adult lifespan sample | 41.5 | 206 | |
| −9 = Missing | 3.8 | 19 | |
| Sample health | |||
| 1 = healthy sample | 61.1 | 303 | |
| 2 = clinical sample with physical illnesses | 6.9 | 34 | |
| 3 = clinical sample with mental illnesses | 15.9 | 79 | |
| 4 = mixed sample | 13.7 | 68 | |
| −9 = Missing | 2.4 | 12 | |
| Continuous variables | M | SD | Mdn (range) |
| Sample size (k = 496) | 136.6 | 176.0 | 87.5 (4–2,001) |
| Mean sample age (k = 429) | 31.50 | 12.66 | 29.5 (4.5–71.4) |
| Proportion female participants (in %, k = 473) | 62.98 | 22.55 | 61.3 (0–100) |
Study Topic and Number of Assessments (H1)
To test the hypothesis that the number of assessments per day would be higher in studies targeting affective states (H1), we compared studies that focused on emotions (k = 327) with the other studies. Preliminary analyses identified two outliers in the number of assessments per day (81 and 44 assessments per day). To remove the potential influence of these outliers, the values were winsorized to the next-highest observation (20 daily assessments) for all further analyses. No significant difference in the number of assessments was observed in studies focusing on emotion (M = 6.30, SD = 2.58) compared with the other studies, M = 6.32, SD = 3.40, t(388) = 0.06, p = .955.
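The outlier handling and group comparison described here could look roughly like the following Python sketch (invented values; in the meta-analysis the cap was the next-highest observed value of 20 daily assessments):

```python
import pandas as pd
from scipy import stats

# hypothetical coded values: scheduled assessments per day and whether emotions were a topic
d = pd.DataFrame({
    "per_day": [3, 5, 6, 6, 8, 10, 44, 81],
    "emotion": [1, 1, 0, 1, 0, 0, 1, 0],
})

# winsorize extreme values to the next-highest non-outlying observation
d["per_day_w"] = d["per_day"].clip(upper=20)

# H1: compare emotion-focused studies with all other studies
t, p = stats.ttest_ind(d.loc[d.emotion == 1, "per_day_w"],
                       d.loc[d.emotion == 0, "per_day_w"])
print(round(t, 2), round(p, 3))
```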
Sample Characteristics
Regarding the samples included, the average study included 136.6 participants (Mdn = 87.5). Most studies included healthy samples (61%), while clinical samples as well as samples including both patients and healthy controls were similarly frequent (Table 2). Most studies recruited convenience samples (60%), followed by college student samples (22%) and representative samples (11%). Seven percent of studies did not provide information on the type of sample, which is also reflected in the missing information on sample age (Table 2). The largest share of samples (42%) was classified as adult lifespan samples, followed by young adulthood (37%) and adolescence (11%). Only six samples were classified as childhood samples, and two samples were drawn from late adulthood only. The samples were often rather balanced in gender, with on average 63% female participants.
Descriptive Information on Compliance, Dropout, and Reactivity
The compliance rate was explicitly reported in 307 studies. For an additional 40 studies, the articles provided enough information (i.e., scheduled assessment days, scheduled assessments per day, and the actually obtained number of assessments), and we estimated compliance from this information, that is, obtained total number of assessments/(scheduled assessment days × scheduled assessments per day). Thus, all subsequent analyses are based on the compliance rates of 347 studies, that is, the compliance rate as reported or, if not reported, as estimated from other study information.
Compliance
On average, compliance was 79.19% (SD = 13.64%; range = 9.83–101.85%; Mdn = 81.8%).4 In a majority of the studies (k = 197; 56.8%), reported compliance was 80% or greater; only nine (2.6%) of the studies reported compliance rates below 50%. Figure 2 (left panel) shows the distribution of the 347 compliance rates.
Figure 2.
Distribution of Compliance (Left Panel) and Dropout Rates (Right Panel).
Note. Compliance k = 347 samples (left panel); dropout rates k = 140 samples (right panel).
Dropout
Dropout was reported in only 140 studies (28.2% of all samples), with an average dropout rate of 10.58% (SD = 11.57%; range = 0–72%; Mdn = 7.1%); see Figure 2 (right panel) for the distribution of dropout rates. Of the 140 studies that reported the percentage of dropout, 69% did not provide reasons for dropout. The most common category comprised diverse problems (e.g., technical issues and motivational problems; 17%, k = 24); motivational reasons of participants (9%, k = 13) and other reasons (12%, k = 18) were also mentioned.
Measurement Reactivity
The vast majority of studies (k = 475; 95.8%) did not report any tests of potential measurement reactivity. Of the remaining studies, 15 tested for reactivity and reported no evidence for measurement reactivity. One study reported a linear change in the momentarily assessed variable(s) over time, one study reported a significant difference between the EMA sample and the control group without EMA, and in four studies participants reported perceived reactivity in questionnaires.
Design Effects on Compliance and Dropout
Number of Assessments (H2 and H3)
The total number of scheduled assessments was computed as the product of the number of assessments per day and the number of planned study days. Notably, the winsorized number of assessments per day was negatively correlated with the number of study days, r = −.269, p < .001, 95% confidence interval (CI) [−.36, −.17], suggesting that studies with more assessments per day tended to last for fewer days. Overall, studies combining a large number of assessments per day with a long study duration were nonexistent among the sampled studies (see Figure 3). Descriptive analyses suggested six outliers in the total number of assessments, which were winsorized (i.e., set to the next-largest observation of 176 total assessments) for all subsequent analyses. Contrary to H2 and H3, the total number of scheduled assessments was unrelated to compliance, b = .008, p = .714, r = .021, and dropout, b = −.002, p = .928, r = −.009. Investigating higher order polynomial effects (quadratic, cubic, quartic) or nonlinear (exponential) effects of the number of assessments did not reveal any statistically meaningful associations with compliance or dropout. Graphical inspection of the scatterplots (Figure 4) did not provide any evidence for a nonlinear association either. Unregistered analyses suggested a quadratic association between the number of assessments per day and compliance (linear b = −.682, p = .055; quadratic b = .103, p = .034): Compliance was lower with more assessments per day, and this effect flattened with more assessments. Furthermore, in contrast to H3, the number of assessments per day was also unrelated to dropout rate, b = .230, p = .528, r = −.06, and there was no evidence for polynomial or exponential associations between the number of assessments per day and dropout rate.
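The unregistered quadratic analysis mentioned above can be sketched as follows (Python/statsmodels with simulated data; coefficients will of course differ from the reported ones):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
# simulated stand-in for the coded data: winsorized assessments per day and compliance
d = pd.DataFrame({"per_day": rng.uniform(1.7, 20, 300)})
d["compliance"] = 85 - 0.7 * d["per_day"] + 0.1 * d["per_day"] ** 2 + rng.normal(0, 12, 300)

# linear + quadratic trend of daily assessments on compliance
fit = smf.ols("compliance ~ per_day + I(per_day ** 2)", data=d).fit()
print(fit.params)  # intercept, linear, and quadratic coefficients
```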
Figure 3.
Scatterplot of Number of Assessments Per Day With Number of Study Days.
Note. Solid lines depict the estimated linear association between the number of assessment days and the number of daily assessments.
Figure 4.
Scatterplot of Compliance (Left) and Dropout (Right) With the Number of Scheduled Assessments.
Note. Solid lines depict the loess-smoothed association between the two variables.
Break Days (H4 and H5)
Only a few studies (9.5%) reported that break days between assessment days were implemented. Regarding the number of break days implemented, four studies were identified as outliers (with >100 break days). After winsorizing these data to the nearest observation (42 break days), studies with break days scheduled on average 11.3 days (Mdn = 6, SD = 12.4, range = 1–42) without assessments before the next assessments started. Contrary to H4 and H5, compliance rates, t(327) = 0.49, p = .621, and dropout rates, t(128) = −0.39, p = .697, did not differ between studies that implemented break days (compliance: M = 78.42%, SD = 15.39%; dropout: M = 10.98%, SD = 9.14%) and those that did not (compliance: M = 79.59%, SD = 13.22%; dropout: M = 9.77%, SD = 10.30%).
Exploratory Analyses on Incentives and Compliance Reinforcement
In exploratory analyses, we also examined the main effects of incentives, compliance reinforcement, and contact with participants throughout the study on compliance and dropout. For incentives, we combined the 11 coded categories into three groups: direct monetary incentives, other types of incentives, or no incentives. One-way ANOVAs revealed no statistically meaningful differences in dropout rates, F(2, 105) = 0.18, p = .838, η² = .003, but statistically significant differences in compliance rates, F(2, 240) = 5.51, p = .005, η² = .044. Post hoc contrasts (adjusted for multiple testing via the Tukey correction) revealed higher compliance in samples with monetary incentives (M = 82.21%, SD = 11.77%) compared with both other incentive methods (M = 77.52%, SD = 11.78%), t(240) = 2.55, p = .031, and no incentives (M = 76.20%, SD = 14.75%), t(240) = 2.84, p = .014. Samples with other incentives did not differ from samples without any incentives in their compliance, t(240) = 0.58, p = .833. The amount of monetary incentives was higher the more assessments were scheduled, r = .51, p < .001, 95% CI [.38, .61], yet the monetary amount was not significantly associated with compliance, r = .09, p = .272, 95% CI [−.07, .25], k = 151, or dropout rates, r = −.11, p = .411, 95% CI [−.35, .16], k = 58.
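A sketch of this incentive-group comparison with Tukey-corrected post hoc contrasts (Python/statsmodels, with simulated group means loosely matching those reported; not the authors' code):

```python
import numpy as np
import pandas as pd
from statsmodels.formula.api import ols
from statsmodels.stats.anova import anova_lm
from statsmodels.stats.multicomp import pairwise_tukeyhsd

rng = np.random.default_rng(3)
# simulated compliance rates for three incentive groups
d = pd.DataFrame({
    "incentive": np.repeat(["money", "other", "none"], 80),
    "compliance": np.concatenate([rng.normal(82, 12, 80),   # monetary incentives
                                  rng.normal(78, 12, 80),   # other incentives
                                  rng.normal(76, 15, 80)]), # no incentives
})

# one-way ANOVA followed by Tukey-adjusted pairwise comparisons
print(anova_lm(ols("compliance ~ C(incentive)", data=d).fit()))
print(pairwise_tukeyhsd(d["compliance"], d["incentive"]))
```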
Samples with compliance reinforcements and samples without reinforcements did not differ in either compliance, t(345) = 0.62, p = .533, or dropout, t(138) = −0.91, p = .362. There were also no statistically meaningful differences in either compliance, t(345) = −1.14, p = .254, or dropout, t(138) = 0.72, p = .470, when comparing samples with contact during or after the EMA phase and those samples for whom no participant contact was reported.
Sample Effects on Compliance and Dropout
Age (H6)
To examine the effects of age, we reordered the age categories in ascending order with respect to participants’ average age across all samples: childhood (observed mean age = 9.73 years), adolescence (15.23 years), young adulthood (23.98 years), lifespan samples (including those with and without children and adolescents; 40.43 years), middle adulthood (47.09 years), and late adulthood (70.60 years). A one-way ANOVA suggested no statistically meaningful difference in compliance between these six groups, F(5, 328) = 2.00, p = .078, η² = .030. To test the specific hypothesis regarding the U-shaped association between age and compliance (H6), we conducted a regression predicting compliance from a linear and a quadratic effect of the age group variable (centered on the third age group, young adulthood). Neither the linear, b = 1.28, p = .204, nor the quadratic effect, b = 1.41, p = .087, was statistically meaningful. Figure 5 depicts the compliance rates for each age group. Additional analyses, including average sample age as continuous linear and quadratic predictors of compliance (sample age was centered on 30 years), also demonstrated no significant linear, b = .053, p = .463, or quadratic age effect on compliance, b = .003, p = .526.
Figure 5.
Compliance Rates in Samples Belonging to Different Age Groups.
Note. The figure depicts individual studies (gray dots) and average compliance by age group (red dots). Error bars depict 95% bootstrap confidence interval.
Gender (H7)
The association between the proportion of women in the study and compliance (H7) was also not statistically significant, b = .063, p = .066, r = .102. When using only the explicitly reported compliance rates (k = 308), the association between gender composition and compliance was statistically significant, with higher compliance rates in studies including relatively more women, b = .070, p = .034, r = .125. In exploratory analyses, we tested gender effects on dropout but observed no statistically significant associations, b = −.058, p = .187, r = −.115.
Combined Effects of Design and Sample Characteristics on Compliance and Dropout
Health and Study Topic (H8)
We did not expect compliance rates to vary with the health status of EMA participants per se, and exploratory analyses showed that compliance rates did not differ between healthy, physically ill, mentally ill, and mixed samples, F(3, 336) = 2.02, p = .111, η² = .018. To test the hypothesized difference in dropout between healthy and ill samples (H8), we compared dropout rates between the four sample types using an ANOVA, which yielded no statistically meaningful overall difference between the four groups, F(3, 133) = 1.28, p = .285, η² = .028. The planned contrast comparing the healthy group to the two patient groups was also not statistically significant, t(133) = 1.78, p = .078. Furthermore, contrary to expectations (H8), the study topic did not moderate differences in dropout between healthy and ill participants, either in studies targeting physical health, F(3, 129) = 0.88, p = .453, or mental health, F(3, 129) = 0.27, p = .845. Exploratory analyses examining the corresponding interaction effects on study compliance also yielded no improvement in model fit when including these interactions, F(3, 332) < 2.14, p > .095 for both physical and mental health.
Incentives and Number of Assessments
We next tested the interaction between the number of assessments with incentives, contact with participants during the EMA, and compliance reinforcement on compliance rates and dropout, respectively (H9b–c and H10d–e). Providing incentives (monetary, other, and no incentives) did not moderate the association between number of assessments and compliance, F(2, 211) = 0.71, p = .492, or dropout, F(2, 76) = 0.56, p = .571. Contact with participants during the EMA study also had no statistically meaningful effect on the association between number of assessments and compliance, F(1, 305) = 1.07, p = .302, or dropout, F(1, 105) = 0.40, p = .526.
For compliance reinforcement (in exploratory analyses), the interaction with the number of assessments was statistically significant when predicting the compliance rate, F(1, 303) = 7.07, p = .008. Inspection of the regression coefficients revealed an unexpected negative interaction effect, b = −.132, p = .008, indicating that compliance was lower with more assessments in samples that received incentives contingent on compliance (i.e., reinforced compliance), b = −0.090, 95% CI [−0.175, −0.006], whereas in studies without compliance reinforcement a higher number of assessments was descriptively, but not statistically significantly, associated with higher compliance, b = 0.042, 95% CI [−0.007, 0.092]. Figure 6 depicts the model-predicted association between the number of assessments and compliance separately by compliance reinforcement. To illuminate this unexpected interaction pattern, we examined differences between studies with and without contingent reinforcement in more detail. While neither the number of assessments per day, t(388) = 0.76, p = .447, nor the total number of assessments, t(380) = −0.51, p = .609, differed significantly between samples with and without contingent reinforcement, the number of study days was slightly larger in samples with (M = 14.80, SD = 19.81) compared to without (M = 11.50, SD = 14.81) contingent reinforcement, t(480) = −1.98, p = .048. In addition, the samples with contingent reinforcement were slightly younger (M = 29.48, SD = 11.86) than the samples without contingent reinforcement (M = 32.33, SD = 12.90), t(427) = 2.12, p = .034. Contingent reinforcement was also associated with the type of reinforcement (monetary incentives, other incentives, and no incentives), χ²(2) = 26.63, p < .001. While monetary incentives were provided in 61% of the samples with contingent reinforcement, they were provided in only 44% of the samples without contingent reinforcement. There was no interaction of compliance reinforcement with the number of assessments when predicting the dropout rate, F(1, 105) = 1.99, p = .162.
Figure 6.
Predicted Association Between Total Number of Assessments and Compliance Rate Moderated by Compliance Reinforcement.
Age and Number of Assessments
Next, we tested whether sample age moderated the effect of the number of assessments on compliance (H9a) and dropout (H10a). Given the small group size, the two samples with very old adults were removed for both analyses, and the childhood samples were additionally removed for the analyses on dropout rates. We set up two regression models with compliance rate and dropout rate as dependent variables, respectively, and the (winsorized) number of total assessments (centered on 50 assessments), age group, and their interaction as predictors. The model with the interaction did not improve the explained variance for either compliance rate, F(4, 284) = 0.06, p = .993, or dropout rate, F(3, 97) = 2.27, p = .085, compared with a model containing only the main effects, yielding no evidence for the hypothesized moderation.
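The moderation test reported here amounts to comparing a main-effects model with an interaction model; a rough sketch under simulated data (Python/statsmodels) is:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

rng = np.random.default_rng(4)
# simulated samples: total assessments (centered on 50) and age group
d = pd.DataFrame({
    "total_c": rng.integers(10, 200, 300) - 50,
    "age_group": rng.choice(["adolescence", "young", "middle"], 300),
})
d["compliance"] = 79 + 0.01 * d["total_c"] + rng.normal(0, 13, 300)

main = smf.ols("compliance ~ total_c + C(age_group)", data=d).fit()
inter = smf.ols("compliance ~ total_c * C(age_group)", data=d).fit()

# F test of the added interaction terms: does age group moderate the assessment effect?
print(anova_lm(main, inter))
```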
Gender or Health and Number of Assessments
Contrary to our expectations, neither gender composition (H10b), F(1, 100) = 0.02, p = .990, nor the health status of the sample (H10c), F(3, 98) = 2.26, p = .087, moderated the association between the number of assessments and dropout rates. In exploratory analyses, we also examined interaction effects of gender and health status, respectively, with the number of assessments for compliance rates. We note that our preregistered hypotheses referred to the interactive effects on dropout rate only, so these analyses should be considered post hoc. Gender composition, F(1, 287) = 6.38, p = .012, but not sample health status, F(3, 293) = 2.03, p = .110, moderated the association between the number of assessments and compliance. Simple slope analyses showed that the effect of gender composition on compliance was stronger in studies with more assessments: At the average level of the number of assessments (60 assessments), the association between gender composition and compliance was b = .080, p = .036; the association was stronger in samples with more assessments (1 SD above the average: 97 assessments), b = .180, p = .001, and not statistically meaningful in samples with fewer assessments (1 SD below the average: 24 assessments), b = −.021, p = .711.
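The simple-slope probing described above can be reproduced by re-centering the moderator and re-estimating the model; a hedged Python sketch with simulated data follows (the centering values 24, 60, and 97 are taken from the text, everything else is invented):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(5)
# simulated samples: proportion of women, total assessments, and compliance
d = pd.DataFrame({
    "women": rng.uniform(0, 100, 300),
    "assessments": rng.normal(60, 37, 300),
})
d["compliance"] = (75 + 0.02 * d["women"]
                   + 0.002 * d["women"] * (d["assessments"] - 60)
                   + rng.normal(0, 12, 300))

def gender_slope_at(center):
    """Slope of gender composition on compliance with assessments centered at `center`."""
    d["a_c"] = d["assessments"] - center
    fit = smf.ols("compliance ~ women * a_c", data=d).fit()
    return fit.params["women"], fit.pvalues["women"]

for center in (24, 60, 97):  # -1 SD, mean, +1 SD of the number of assessments
    print(center, gender_slope_at(center))
```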
Sensitivity Analyses
Articles Published in 2007 or Later
In exploratory analyses, we included only studies published in 2007 or later to focus on studies using mobile devices for data collection. We chose 2007 as the cutoff because handheld computers and mobile phones with larger screens for data entry became more widely available during this time. Accordingly, the number of EMA studies increased sharply from fewer than 10 articles per year before 2007 to 19 articles in 2007 and increased steadily thereafter: on average k = 15 articles per year from 2007 to 2010, k = 45 articles per year from 2011 to 2015, and k = 52 articles per year thereafter. The compliance and dropout rates did not differ from the full data set: compliance rate M = 78.75% (vs. 79.19% in all studies) and dropout M = 10.70% (vs. 10.58% in all studies). Furthermore, the results based on the restricted data set of newer studies (k = 439) were nearly identical, yet statistical tests differed for two results: The association between the proportion of women in the study and compliance (H7) was now just statistically significant (r = .11, p = .049). In contrast, the simple slope effect of lower compliance with more assessments in samples that received contingent incentives was no longer statistically significant, b = −0.089, 95% CI [−0.177, 0.0004]. Zero-order correlations are presented in Supplementary Table S4.
Simultaneous Analyses of Number of Assessments, Assessment Days, and Moderators
Furthermore, we included the number of assessments per day, the number of assessment days, their interaction, as well as the moderator variables in one regression model predicting compliance or dropout rates (Supplementary Table S5). These additional analyses allowed us to test whether the previous results were sensitive to the chosen analytic approach (i.e., examining effects of the number of assessments per day and assessment days combined into one variable) or whether moderating effects occurred that were specific to the number of assessments per day or assessment days. When interpreting the results of these additional analyses, the reduced number of studies has to be taken into account because considerably fewer studies provided information on all moderator variables included in the model (for compliance k = 189 of 347; for dropout k = 70 of 140). In general, the additional analyses confirmed the results reported for the specific hypotheses, and we therefore only summarize them here. Regression coefficients and model fit are reported in Supplementary Table S5.
Regarding compliance rates, the additional analyses confirmed that compliance was higher when participants received monetary incentives. The analyses also consistently showed that the association of lower compliance with more assessment days was stronger with a greater proportion of men in the sample (Supplementary Table S5, Model S2). In addition, two significant three-way interactions occurred between the number of assessments per day, the number of assessment days, and monetary incentives or sample health, respectively (Supplementary Table S5, Model S2). When participants received monetary incentives or were healthy, the association of lower compliance with the combination of more assessments per day and more assessment days (i.e., the two-way interaction) was buffered (i.e., less negative).
Regarding dropout rates, the overall models were not statistically significant (Supplementary Table S5, Models S3 and S4). Taking also into account that these results stem from only 69 studies, which represented only 50% of all studies reporting dropout rates, interpreting the individual coefficients did not seem warranted (Supplementary Table S5, Model S4).
Discussion
The central aims of the meta-analysis were to examine how differences in the assessment schedules of EMA studies relate to the average compliance and dropout of participants and how such effects might differ with incentives and across samples varying, for example, in age or health status. Overall, the on average high compliance (79%) and the low average dropout (10% in studies that reported dropout) varied somewhat between EMA studies, but this variation was largely unrelated to study differences in assessment schedules or sample characteristics. The meta-analytic results showed higher compliance in studies offering monetary incentives compared with other or no incentives. We next discuss the relative robustness of compliance and dropout as well as further factors that can facilitate or undermine compliance.
Assessment Schedule, Incentives, Sample Characteristics, and Compliance
Although previous studies showed that repeated momentary assessments become burdensome and compliance declines after the first days (Rintala et al., 2019; Silvia et al., 2013), the current results did not support this effect at the level of studies. Contrary to what was hypothesized, yet consistent with some previous meta-analyses from different fields (Hufford & Shields, 2002; Jones et al., 2019), the total number of assessments, the number of assessment days, and the number of assessments per day did not predict participants’ compliance with the assessment schedule. Two explanations seem especially likely. For one thing, participants were aware of the schedule when agreeing to participate. Thus, EMA studies with more assessment days or assessments might partly have selected participants willing to adhere to more demanding schedules. These participants then provided information on 80% of scheduled assessments, on average. This still allows the possibility that within each study, participants were more compliant in the first days and less compliant as the study advanced, as shown in prior work (Rintala et al., 2019; Silvia et al., 2013).
In addition, the number of assessment days and the number of assessments per day were negatively related across studies; accordingly, studies seemed to be generally designed to minimize the burden for participants. Thus, more assessments per day or more assessment days than the typical numbers (i.e., six assessments per day for 7 days) seem feasible and do not necessarily lead to low compliance, if they are balanced with fewer days or fewer assessments per day, respectively. For example, highly frequent assessments were conducted for only a couple of days (Kuppens et al., 2010), whereas long studies requested only one or two assessments per day (Epstein & Preston, 2012). In fact, no study included in this meta-analysis combined a very intense daily assessment schedule (e.g., more than 10 assessments per day) with a long study period (e.g., more than 14 days of assessment).
Similar to the balance between the number of days and the number of assessments per day, the amount of financial incentives was also well matched to the study demands and was higher when more assessments were scheduled in total. As a consequence, the amount of incentives did not predict compliance or dropout, nor did it moderate the association between the number of assessments and compliance or dropout. However, studies that provided direct monetary incentives achieved significantly higher compliance (by almost one-half SD) compared with studies without incentives. This finding is in line with previous meta-analytic evidence from clinical research and computer science (Vachon et al., 2019; van Berkel et al., 2019). Other reviews did not observe effects of incentives, but the studies they included hardly varied in whether incentives were provided (Morren et al., 2009; Wen et al., 2017).
We assumed that break days between assessment days would reduce participants' burden and thus increase compliance. Similarly, we expected that participation-contingent reinforcement and additional participant care (e.g., phone calls and social events) would increase study motivation and thus compliance. The current meta-analysis observed none of these effects. Break days, contingent reinforcement, and participant care were scarcely implemented, yet contingent reinforcement and participant care were reported more often in more demanding designs (studies with more assessment days and/or more total assessments). Still, we observed no evidence for interaction effects showing that such study elements buffer against high study demands (i.e., high numbers of assessments). Unexpectedly, contingent reinforcement had a negative effect on compliance in more demanding studies: In studies with about 81 or more assessments overall, compliance rates were higher without participation-contingent reinforcement than with it. Again, people who sign up for studies with demanding assessment schedules might be positively selected and intrinsically motivated to participate reliably without further reinforcement. Contingent extrinsic rewards might undermine intrinsic motivation, and depending on the minimum criteria, contingent rewards might even backfire: For example, if participants receive compensation only if they complete at least 70% of all assessments, they might be motivated to attain this goal but stop completing EMA prompts once they have reached the minimum number. As a consequence, overall compliance might only just exceed 70% and hence remain lower than the average compliance of about 80% observed in this meta-analysis.
The meta-analysis examined sample characteristics as the second set of factors that might affect study compliance and dropout. The included studies covered a broad spectrum of age periods, gender proportions, and health statuses, but none of these sample characteristics was consistently related to study compliance or dropout. This pattern of findings is partly consistent with recent meta-analyses on EMA studies in adolescent or clinical samples (e.g., Jones et al., 2019; van Roekel et al., 2019; Wen et al., 2017), whereas some studies observed higher compliance among older participants (Morren et al., 2009; Ono et al., 2019; Rintala et al., 2019) or among healthy participants compared with participants with specific clinical disorders (e.g., Jones et al., 2019; Vachon et al., 2019).
Also, hypothesized combined effects of topic and sample, such as higher compliance and lower dropout among ill samples participating in health-related EMA studies, received no support. Again, this might be attributable to selection effects: Participants who agree to EMA studies complete them rather reliably, irrespective of the study topic. Gender was a noteworthy exception: Although samples with higher proportions of females did not show substantially higher compliance or lower dropout on average, gender differences became apparent in demanding studies. With about 60 or more EMA assessments in total, average study compliance was higher in samples with a higher proportion of females, and this gender effect approximately doubled in studies with about 100 assessments. Thus, samples with more females showed higher average compliance, especially in studies with many assessments. This conditional gender effect might explain why only some previous meta-analyses observed lower compliance among men compared with women (see Table S1; Rintala et al., 2019; Vachon et al., 2019; van Roekel et al., 2019). Reviews that did not observe gender differences in compliance often included studies with fewer assessments (e.g., Jones et al., 2019).
Limitations
The current meta-analysis examined compliance and dropout rates in EMA studies and considered a broad spectrum of study designs, topics, and sample characteristics in about 500 studies. Unfortunately, some characteristics could not be examined (e.g., socioeconomic status of participants) because the original studies provided too little information (see also Trull & Ebner-Priemer, 2020). Further studies and meta-analyses on specific populations additionally examined effects of time of day and questionnaire length: These studies observed higher compliance in the evenings (Courvoisier et al., 2012; Silvia et al., 2013; van Berkel et al., 2019), whereas the number of questions in EMA studies hardly predicted differences in compliance (Hasselhorn et al., 2021; Jones et al., 2019; van Roekel et al., 2019; but see Eisele et al., 2020; Morren et al., 2009), likely because most EMA studies already use short questionnaires.
Regarding study dropout, fewer than one-third of studies reported dropout rates and reasons for dropout. Similarly, measurement reactivity, that is, changes in responding to EMA questions due to the assessment itself, could not be analyzed because reactivity was reported too rarely in the original studies (i.e., in 3% of studies). Most of the studies that tested measurement reactivity observed none (e.g., when studying pain, Stone et al., 2003; cognitive lapses, Lange & Süß, 2014; social comparisons, McKee et al., 2013). Other studies observed reactivity effects in emotional experiences (McCarthy et al., 2015; Shrout et al., 2018) or in momentary cognitive performance due to practice effects (e.g., Riediger et al., 2011). Thus, the conditions under which reactivity occurs, and how strongly, remain topics for further investigation.
The meta-analysis focused on compliance and dropout rates as indicators of data quality. We included studies with paper-and-pencil-based or electronic assessments to cover the broad range of EMA studies. Still, paper-and-pencil-based EMA might provide somewhat less precise information on compliance. Sensitivity analyses focusing on studies published in 2007 and later, which thus likely used electronic data collection, did not reveal different conclusions about the associations between design, sample, and study compliance.
Relatedly, in event-contingent EMA studies, which represented about 20% of the reviewed EMA studies, the total number of events, and thus of event-based assessments, is unknown beforehand; compliance with the study protocol is therefore only indirectly measurable, for example, through continuous behavioral observation (e.g., EAR, Mehl, 2017; mobile sensing, Harari et al., 2016). A comparison of event-contingent and signal-contingent assessment of social interactions yielded similar data quality and covariation with affective states and personality traits, yet a higher number of social interactions was captured with event-contingent assessment over 1 week (Himmelstein et al., 2019). At the moment, it remains open whether compliance in event-contingent designs is equally robust to variation in study length or sample composition.
In addition to compliance rates (i.e., the percentage of completed assessments relative to all scheduled assessments) and dropout (i.e., quitting study participation early), other types of noncompliance have been observed in EMA and other questionnaire studies. For example, responses completed some minutes after the scheduled assessment time may be counted as compliant in most cases, whereas for research questions on very short-lived experiences or behaviors, such delayed answers could be counted as noncompliant responding. Future research might take a more fine-grained perspective and distinguish delayed from immediate responses. Response patterns, such as participants repeatedly selecting only the first answer option, also indicate careless responding and lead to invalid data (Eisele et al., 2020; Meade & Craig, 2012). In our experience, if the EMA questionnaire is short and varied enough (e.g., diverse question and answer formats), such response patterns are rare and can be detected easily. The length of individual EMA assessments should nevertheless be considered and ideally pilot-tested carefully, because excessive length can have implications for participant burden, compliance, and dropout (e.g., Eisele et al., 2020; Hasselhorn et al., 2021; Morren et al., 2009).
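To illustrate how such checks could be operationalized, the following is a minimal sketch in Python; the data, column names (delay_min, item_responses), and the 15-minute cutoff are hypothetical and would need to be adapted to the EMA platform and research question at hand.

```python
import pandas as pd

# Hypothetical EMA response log: one row per answered prompt.
# Column names and values are illustrative only, not taken from the
# meta-analysis or from any specific EMA platform.
prompts = pd.DataFrame({
    "participant": ["p1", "p1", "p2", "p2"],
    "delay_min": [2, 35, 0, 1],  # minutes between signal and completed response
    "item_responses": [[3, 4, 2, 5], [1, 1, 1, 1], [2, 3, 3, 4], [1, 1, 1, 1]],
})

DELAY_CUTOFF_MIN = 15  # assumed cutoff; short-lived phenomena may need stricter limits

# Flag delayed responses, which could be treated as noncompliant
# when studying very short-lived experiences or behaviors.
prompts["delayed"] = prompts["delay_min"] > DELAY_CUTOFF_MIN

# Flag straight-lining: every item within a prompt received the same answer
# (e.g., always the first option), one simple indicator of careless responding.
prompts["straight_lined"] = prompts["item_responses"].apply(
    lambda responses: len(set(responses)) == 1
)

print(prompts[["participant", "delayed", "straight_lined"]])
```

In practice, flagged prompts would typically be inspected rather than excluded automatically, and stricter delay cutoffs may be appropriate for fast-changing states.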
The current meta-analysis aimed to include published EMA studies as broadly as possible. Strictly speaking, we cannot draw inferences about daily diary studies: We excluded such studies because their questionnaires are completed only once a day, generally in the evening at home, and thus capture environmental aspects and behavioral dynamics less well or only retrospectively (Trull & Ebner-Priemer, 2020). From the more than 1,400 retrieved studies, we coded a random subsample of about 500 studies, which is more comprehensive than previous meta-analyses on EMA in adolescent or clinical samples (e.g., Jones et al., 2019; May et al., 2018; Wen et al., 2017; Table S1). Although the included studies were not susceptible to publication bias based on the statistical significance of results concerning the original research question, some publication bias might still have occurred, as larger samples reported slightly lower compliance and higher dropout (see Supplementary Table S3). This might be attributable to additional efforts to maintain participants' motivation, and thus compliance and adherence, in smaller studies. Overall, we observed a high average compliance rate, which is comparable to previous meta-analyses on EMA studies from other fields (e.g., May et al., 2018; Vachon et al., 2019). Also, more than half of the included EMA studies reported a compliance rate of more than 80%, and only 2.6% of included EMA studies reported compliance rates below 50% of scheduled assessments. Accordingly, EMA studies with low compliance rates or other methodological weaknesses (e.g., too few participants, too few assessments) might be less likely to be published, and thus included in the current analyses, because of these flaws.
Finally, the current meta-analysis cannot address the effects of sample selectivity. For several findings, we argued that participants know the demands of a study when agreeing to participate and thus might differ in some characteristics from the general population (e.g., conscientiousness and open-mindedness for academic research). Previous studies on EMA selectivity are scarce; although women and young adults are more likely to participate, especially when incentives are small or absent (Ludwigs et al., 2020), participants do not seem to differ in personality characteristics or cognitive abilities from nonparticipants or the general population (Ludwigs et al., 2020; Riediger et al., 2011).
Future Directions of EMA
Over the last decades, psychology and related fields have seen a steep increase in the number of EMA studies. The large majority of studies using this complex paradigm focus on obtaining correlational self-report data from individuals. We would like to encourage researchers to extend EMA studies further along the following lines, illustrated with example studies.
First, in addition to self-reported thoughts, feelings, behavior, and situational characteristics, researchers can employ physiological and environmental sensors (e.g., cardiac activity, Houtveen & de Geus, 2009; Wrzus et al., 2013; physical activity, Shcherbina et al., 2017; location/GPS, Doherty et al., 2014; surrounding light, noise, or temperature, van Laerhoven, 2022). Among the currently examined EMA studies, only 12% included additional sensors and physiological data, yet this number might increase as data collection and synchronization of sensor and self-report data become increasingly easy. Importantly, multi-method data will reduce the ambiguity inherent in analyzing self-reports. In addition, non-self-report data might reduce participant burden and thus increase compliance, if some information can be obtained unobtrusively and without effort from participants. Furthermore, experimental variations become ever more common in EMA (e.g., Beal, 2015; Schmiedek & Neubauer, 2020) and offer insights into causal mechanisms while maintaining the real-life basis. Still, further research is necessary on how additional methods and experimental manipulations affect study compliance and dropout.
Second, dyadic EMA studies add another layer of complexity because both partners have to invest time and report at the same time on the same phenomena (e.g., the same conflict or interaction; Pauly et al., 2020; Rauers et al., 2013). For dyadic analyses, dyadic compliance is necessary, that is, concurrent information from both partners. The current results indicate that each partner has to respond to at least 90% of assessments to obtain dyadic compliance higher than 80%, assuming that noncompliance is independent for the dyad members (i.e., .9 × .9 = .81). Yet, technical solutions can help address such challenges, for example, GPS- or Bluetooth-triggered assessments that are delivered when partners (i.e., their smartphones) are close to each other (Timmons et al., 2017). A related technological innovation, presenting questions on smartwatches, reduces the perceived burden of participants and might increase compliance (Intille et al., 2016).
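To spell out the dyadic-compliance arithmetic mentioned above, assume that both partners have the same individual compliance rate \(p\) and that noncompliance is independent across partners (the notation is ours, for illustration only):

\[
p_{\text{dyad}} = p \times p = p^2 \;\geq\; .80 \quad\Longrightarrow\quad p \;\geq\; \sqrt{.80} \approx .894 .
\]

That is, each partner needs to answer roughly 90% of the scheduled assessments for concurrent data from both partners to be available at 80% or more of assessments, mirroring the .9 × .9 = .81 example above.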
Conclusion and Recommendations
Researchers interested in studying human feelings, thoughts, and behaviors in daily life using EMAs have little specific guidance when planning the design for a specific topic and population. Findings based on almost 500 reviewed EMA studies showed that, on average, fewer assessments per day were scheduled when the assessment period was longer (i.e., when there were more assessment days). In the majority of the studies, the average compliance was 80% or higher, with only a handful of studies reporting compliance below 50%. Within the observed range of realized EMA designs, the number of assessments (per day or in total) and also sample characteristics were largely unrelated to differences in compliance. Thus, when piloting EMA designs, researchers should aim for a combination of assessment days and assessments per day that attains compliance of at least about 80% in the targeted sample.
Financial incentives may be helpful, and the results indicate higher compliance in studies with monetary incentives. The effect of financial incentives seems to be invariant across different levels of study intensity, potentially because the number of assessments was strongly associated with the amount of incentives. Surprisingly, contingent incentives might backfire in very intense sampling designs. If contingent incentives are used, it would be advisable to set the criterion high enough to avoid satisficing (i.e., participants stopping once a low criterion is reached). It needs to be acknowledged, though, that inferring such recommendations for individual studies still entails uncertainty because the current findings refer to the level of a population of studies (e.g., see the Simpson paradox; Kievit et al., 2013; Simpson, 1951). That is, the present findings cannot refute the possibility that adding more assessments to a given study might decrease compliance with the study protocol.
Still, we are confident in concluding that, in general, EMA studies constitute a viable research paradigm for many areas of psychology and beyond. With careful planning of study designs and incentives tailored to the research area and target sample, high data quality can usually be obtained. Future EMA studies should follow APA reporting standards (Appelbaum et al., 2018; see also Trull & Ebner-Priemer, 2020) and report study designs comprehensively, including incentives, sample characteristics, and indices of data quality (i.e., compliance, reactivity, sample selectivity, and selective dropout), to allow further investigation of design and sample effects in studies employing repeated momentary assessments.
Supplemental Material
Supplemental material, sj-DOCX-1-asm-10.1177_10731911211067538 for Ecological Momentary Assessment: A Meta-Analysis on Designs, Samples, and Compliance Across Research Fields by Cornelia Wrzus and Andreas B. Neubauer in Assessment
EMA is a method of ambulatory assessment (AA), which “comprises the use of field methods to assess the ongoing behavior, physiology, experience and environmental aspects in naturalistic or unconstrained settings” (SAA, 2018). While AA can also include the measurement of ambient environmental parameters, for example, street noise at homes, EMA typically focuses on repeated self-reports, partly combined with physiological or activity assessments. Daily diaries can be counted as AA but differ from EMA because they are completed only once per day, typically in the evening and at home. Unlike EMA, daily diaries typically assess retrospective evaluations of the previous day, which are related to, but still distinct from momentary assessments collected in EMA (e.g., Neubauer et al., 2020).
We had initially preregistered that all retrieved studies would be coded because we expected 100–200 suitable studies based on previous meta-analyses. Due to the unexpectedly large number of eligible studies and the resulting coding costs (training coders takes about 40 hr; coding 200 studies takes about 50 workdays), we decided to code a random subsample of about one-third. Articles were randomly chosen by selecting every third article in a folder sorted by the first author.
Three research assistants coded a total of 290 studies, and 12 students additionally coded up to 20 studies each.
The average compliance exceeded 100% in some studies because participants continued to answer EMA questions after the scheduled number of days.
Footnotes
Authors’ Note: We are grateful to Janina Bühler and Florian Schmiedek for valuable feedback on this manuscript as well as to the students supporting the coding of the studies.
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding: The author(s) received no financial support for the research, authorship, and/or publication of this article.
ORCID iDs: Cornelia Wrzus https://orcid.org/0000-0002-6290-959X; Andreas B. Neubauer https://orcid.org/0000-0003-0515-1126
Supplemental Material: Supplemental material for this article is available online.
References
- Appelbaum M., Cooper H., Kline R. B., Mayo-Wilson E., Nezu A. M., Rao S. M. (2018). Journal article reporting standards for quantitative research in psychology: The APA Publications and Communications Board task force report. American Psychologist, 73(1), 3–25. 10.1037/amp0000191
- Ashurst J., Van Woerden I., Dunton G., Todd M., Ohri-Vachaspati P., Swan P., Bruening M. (2018). The association among emotions and food choices in first-year college students using mobile-ecological momentary assessments. BMC Public Health, 18(1), 573–581. 10.1186/s12889-018-5447-0
- Beal D. J. (2015). ESM 2.0: State of the art and future potential of experience sampling methods in organizational research. Annual Review of Organizational Psychology and Organizational Behavior, 2(1), 383–407.
- Bolger N., Laurenceau J. P. (2013). Intensive longitudinal methods. Guilford Press.
- Carpenter R. W., Wycoff A. M., Trull T. J. (2016). Ambulatory assessment: New adventures in characterizing dynamic processes. Assessment, 23(4), 414–424.
- Collins L. M., Graham J. W. (2002). The effect of the timing and spacing of observations in longitudinal studies of tobacco and other drug use: Temporal design considerations. Drug and Alcohol Dependence, 68, 85–96. 10.1016/S0376-8716(02)00217-X
- Courvoisier D. S., Eid M., Lischetzke T. (2012). Compliance to a cell phone-based ecological momentary assessment study: The effect of time and personality characteristics. Psychological Assessment, 24, 713–720.
- Doherty S. T., Lemieux C. J., Canally C. (2014). Tracking human activity and well-being in natural environments using wearable sensors and experience sampling. Social Science & Medicine, 106, 83–92.
- Eisele G., Vachon H., Lafit G., Kuppens P., Houben M., Myin-Germeys I., Viechtbauer W. (2020). The effects of sampling frequency and questionnaire length on perceived burden, compliance, and careless responding in experience sampling data in a student population. Assessment. 10.1177/1073191120957102
- Epstein D. H., Preston K. L. (2012). TGI Monday? Drug-dependent outpatients report lower stress and more happiness at work than elsewhere. The American Journal on Addictions, 21(3), 189–198. 10.1111/j.1521-0391.2012.00230.x
- Guo B., Chen H., Yu Z., Nan W., Xie X., Zhang D., Zhou X. (2017). TaskMe: Toward a dynamic and quality-enhanced incentive mechanism for mobile crowd sensing. International Journal of Human-Computer Studies, 102, 14–26. 10.1016/j.ijhcs.2016.09.002
- Hamaker E. (2012). Why researchers should think “within-person”: A paradigmatic rationale. In Mehl M. R., Conner T. (Eds.), Handbook of research methods for studying daily life (pp. 43–61). Guilford Press.
- Harari G. M., Lane N. D., Wang R., Crosier B. S., Campbell A. T., Gosling S. D. (2016). Using smartphones to collect behavioral data in psychological science: Opportunities, practical considerations, and challenges. Perspectives on Psychological Science, 11(6), 838–854. 10.1177/1745691616650285
- Hasselhorn K., Ottenstein C., Lischetzke T. (2021). The effects of assessment intensity on participant burden, compliance, within-person variance, and within-person relationships in ambulatory assessment. Behavior Research Methods. 10.3758/s13428-021-01683-6
- Heron K. E., Everhardt R. S., McHale S. M., Smyth J. M. (2017). Using mobile-technology-based Ecological Momentary Assessment (EMA) methods with youth: A systematic review and recommendations. Journal of Pediatric Psychology, 42, 1087–1107. 10.1093/jpepsy/jsx078
- Himmelstein P. H., Woods W. C., Wright A. G. C. (2019). A comparison of signal- and event-contingent ambulatory assessment of interpersonal behavior and affect in social situations. Psychological Assessment, 31(7), 952–960. 10.1037/pas0000718
- Hoppmann C. A., Riediger M. (2009). Ambulatory assessment in lifespan psychology: An overview of current status and new trends. European Psychologist, 14, 98–108.
- Houtveen J. H., de Geus E. J. (2009). Noninvasive psychophysiological ambulatory recordings: Study design and data analysis strategies. European Psychologist, 14(2), 132–141.
- Hufford M. R., Shields A. L. (2002). Electronic diaries. Applied Clinical Trials, 11(4), 46–56.
- Hunter J. E., Schmidt F. L. (2004). Methods of meta-analysis: Correcting error and bias in research findings (2nd ed.). SAGE.
- Intille S., Haynes C., Maniar D., Ponnada A., Manjourides J. (2016). μEMA: Microinteraction-based ecological momentary assessment (EMA) using a smartwatch [Paper presentation]. Proceedings of the 2016 ACM International Joint Conference on Pervasive and Ubiquitous Computing, Heidelberg, Germany. 10.1145/2971648.2971717
- Janssens K. A. M., Bos E. H., Rosmalen J. G. M., Wichers M. C., Riese H. (2018). A qualitative approach to guide choices for designing a diary study. BMC Medical Research Methodology, 18(1), Article 140. 10.1186/s12874-018-0579-6
- Jones A., Remmerswaal D., Verveer I., Robinson E., Franken I. H. A., Wen C. K. F., Field M. (2019). Compliance with ecological momentary assessment protocols in substance users: A meta-analysis. Addiction, 114(4), 609–619. 10.1111/add.14503
- Kievit R., Frankenhuis W. E., Waldorp L., Borsboom D. (2013). Simpson’s paradox in psychological science: A practical guide. Frontiers in Psychology, 4, Article 513.
- Kleiman E. M., Turner B. J., Fedor S., Beale E. E., Huffman J. C., Nock M. K. (2017). Examination of real-time fluctuations in suicidal ideation and its risk factors: Results from two ecological momentary assessment studies. Journal of Abnormal Psychology, 126(6), 726–738. 10.1037/abn0000273
- Kuppens P., Oravecz Z., Tuerlinckx F. (2010). Feelings change: Accounting for individual differences in the temporal dynamics of affect. Journal of Personality and Social Psychology, 99, 1042–1060. 10.1037/a0020962
- Lange S., Süß H.-M. (2014). Measuring slips and lapses when they occur–Ambulatory assessment in application to cognitive failures. Consciousness and Cognition, 24, 1–11. 10.1016/j.concog.2013.12.008
- Larsen R. J. (1987). The stability of mood variability: A spectral analytic approach to daily mood assessments. Journal of Personality and Social Psychology, 52(6), 1195–1204.
- Lucas R. E., Wallsworth C., Anusic I., Donnellan B. (2020). A direct comparison of the day reconstruction method and the experience sampling method. Journal of Personality and Social Psychology, 120, 816–835. 10.1037/pspp0000289
- Ludwigs K., Lucas R., Veenhoven R., Richter D., Arends L. (2020). Can happiness apps generate nationally representative datasets?—A case study collecting data on people’s happiness using the German Socio-economic Panel. Applied Research in Quality of Life, 15, 1135–1149. 10.1007/s11482-019-09723-2
- May M., Junghaenel D. U., Ono M., Stone A. A., Schneider S. (2018). Ecological Momentary Assessment methodology in chronic pain research: A systematic review. The Journal of Pain, 19(7), 699–716. 10.1016/j.jpain.2018.01.006
- McCarthy D. E., Minami H., Yeh V. M., Bold K. W. (2015). An experimental investigation of reactivity to ecological momentary assessment frequency among adults trying to quit smoking. Addiction, 110(10), 1549–1560. 10.1111/add.12996
- McKee S., Smith H. J., Koch A., Balzarini R., Georges M., Callahan M. P. (2013). Looking up and seeing green: Women’s everyday experiences with physical appearance comparisons. Psychology of Women Quarterly, 37(3), 351–365. 10.1177/0361684312469792
- McShane B. B., Böckenholt U., Hansen K. T. (2016). Adjusting for publication bias in meta-analysis: An evaluation of selection methods and some cautionary notes. Perspectives on Psychological Science, 11(5), 730–749. 10.1177/1745691616662243
- Meade A. W., Craig S. B. (2012). Identifying careless responses in survey data. Psychological Methods, 17(3), 437–455. 10.1037/a0028085
- Mehl M. R. (2017). The Electronically Activated Recorder (EAR): A method for the naturalistic observation of daily social behavior. Current Directions in Psychological Science, 26, 184–190. 10.1177/0963721416680611
- Mehl M. R., Conner T. (Eds.). (2012). Handbook of research methods for studying daily life. Guilford Press.
- Molenaar P. C. M. (2004). A manifesto on psychology as idiographic science: Bringing the person back into scientific psychology, this time forever. Measurement: Interdisciplinary Research and Perspectives, 2, 201–218.
- Molenaar P. C. M., Sinclair K. O., Rovine M. J., Ram N., Corneal S. E. (2009). Analyzing developmental processes on an individual level using nonstationary time series modeling. Developmental Psychology, 45, 260–271.
- Morren M., van Dulmen S., Ouwerkerk J., Bensing J. (2009). Compliance with momentary pain measurement using electronic diaries: A systematic review. European Journal of Pain, 13, 354–365. 10.1016/j.ejpain.2008.05.010
- Neubauer A. B., Lerche V., Voss A. (2018). Interindividual differences in the intraindividual association of competence and well-being: Combining experimental and intensive longitudinal designs. Journal of Personality, 86, 698–713. 10.1111/jopy.12351
- Neubauer A. B., Scott S. B., Sliwinski M. J., Smyth J. M. (2020). How was your day? Convergence of aggregated momentary and retrospective end-of-day affect ratings across the adult life span. Journal of Personality and Social Psychology, 119, 185–203. 10.1037/pspp0000248
- Nezlek J. B. (1993). The stability of social interaction. Journal of Personality and Social Psychology, 65, 930–941.
- Ono M., Schneider S., Junghaenel D. U., Stone A. A. (2019). What affects the completion of Ecological Momentary Assessments in chronic pain research? An individual patient data meta-analysis. Journal of Medical Internet Research, 21(2), Article e11398. 10.2196/11398
- Ottenstein C., Werner L. (2021). Compliance in ambulatory assessment studies: Investigating study and sample characteristics as predictors. Assessment. 10.1177/10731911211032718
- Pauly T., Keller J., Knoll N., Michalowski V. I., Hohl D. H., Ashe M. C., Gerstorf D., Madden K. M., Hoppmann C. A. (2020). Moving in sync: Hourly physical activity and sedentary behavior are synchronized in couples. Annals of Behavioral Medicine, 54(1), 10–21.
- Rauers A., Blanke E., Riediger M. (2013). Everyday empathic accuracy in younger and older couples: Do you need to see your partner to know his or her feelings? Psychological Science, 24(11), 2210–2217.
- Reis H. T. (2012). Why researchers should think “real-world”: A conceptual rationale. In Mehl M. R., Conner T. (Eds.), Handbook of research methods for studying daily life (pp. 3–21). Guilford Press.
- Riediger M., Schmiedek F., Wagner G. G., Lindenberger U. (2009). Seeking pleasure and seeking pain: Differences in pro- and contra-hedonic motivation from adolescence to old age. Psychological Science, 20, 1529–1535. 10.1111/j.1467-9280.2009.02473.x
- Riediger M., Wrzus C., Schmiedek F., Wagner G. G., Lindenberger U. (2011). Is seeking bad mood cognitively demanding? Contra-hedonic orientation and working-memory capacity in everyday life. Emotion, 11, 656–665. 10.1037/a0022756
- Rintala A., Wampers M., Myin-Germeys I., Viechtbauer W. (2019). Response compliance and predictors thereof in studies using the experience sampling method. Psychological Assessment, 31(2), 226–235. 10.1037/pas0000662
- Robinson M. D., Clore G. L. (2002). Belief and feeling: Evidence for an accessibility model of emotional self-report. Psychological Bulletin, 128(6), 934–960.
- Schmiedek F., Neubauer A. B. (2020). Experiments in the wild: Introducing the within-person encouragement design. Multivariate Behavioral Research, 55(2), 256–276. 10.1080/00273171.2019.1627660
- Schmitt D. P., Realo A., Voracek M., Allik J. (2008). Why can’t a man be more like a woman? Sex differences in Big Five personality traits across 55 cultures. Journal of Personality and Social Psychology, 94, 168–182.
- Schwarz N. (2012). Why researchers should think “real-time”: A cognitive rationale. In Mehl M. R., Conner T. (Eds.), Handbook of research methods for studying daily life (pp. 22–42). Guilford Press.
- Shamseer L., Moher D., Clarke M., Ghersi D., Liberati A., Petticrew M., . . . Stewart L. A. (2015). Preferred reporting items for systematic review and meta-analysis protocols (PRISMA-P) 2015: Elaboration and explanation. BMJ, 349. 10.1136/bmj.g7647
- Shcherbina A., Mattsson C. M., Waggott D., Salisbury H., Christle J. W., Hastie T., Wheeler M. T., Ashley E. A. (2017). Accuracy in wrist-worn, sensor-based measurements of heart rate and energy expenditure in a diverse cohort. Journal of Personalized Medicine, 7(2), 3–15. 10.3390/jpm7020003
- Shrout P. E., Stadler G., Lane S. P., McClure M. J., Jackson G. L., Clavél F. D., . . . Bolger N. (2018). Initial elevation bias in subjective reports. Proceedings of the National Academy of Sciences, 115(1), E15–E23. 10.1073/pnas.1712277115
- Silvia P. J., Kwapil T. R., Eddington K. M., Brown L. H. (2013). Missed beeps and missing data: Dispositional and situational predictors of nonresponse in experience sampling research. Social Science Computer Review, 31(4), 471–481. 10.1177/0894439313479902
- Simpson E. H. (1951). The interpretation of interaction in contingency tables. Journal of the Royal Statistical Society: Series B (Methodological), 13(2), 238–241.
- Society for Ambulatory Assessment. (2018). Bylaws. https://ambulatory-assessment.org/
- Soyster P. D., Bosley H. G., Reeves J. W., Altman A. D., Fisher A. J. (2019). Evidence for the feasibility of person-specific Ecological Momentary Assessment across diverse populations and study designs. Journal for Person-Oriented Research, 5, 53–64.
- Stone A. A., Broderick J. E., Schwartz J. E., Shiffman S., Litcher-Kelly L., Calvanese P. (2003). Intensive momentary reporting of pain with an electronic diary: Reactivity, compliance, and patient satisfaction. Pain, 104(1–2), 343–351. 10.1016/s0304-3959(03)00040-x
- Timmons A. C., Baucom B. R., Han S. C., Perrone L., Chaspari T., Narayanan S. S., Margolin G. (2017). New frontiers in ambulatory assessment: Big data methods for capturing couples’ emotions, vocalizations, and physiology in daily life. Social Psychological and Personality Science, 8(5), 552–563. 10.1177/1948550617709115
- Trull T. J., Ebner-Priemer U. W. (2020). Ambulatory assessment in psychopathology research: A review of recommended reporting guidelines and current practices. Journal of Abnormal Psychology, 129, 56–63.
- Vachon H., Viechtbauer W., Rintala A., Myin-Germeys I. (2019). Compliance and retention with the Experience Sampling Method over the continuum of severe mental disorders: A systematic review and meta-analysis. Journal of Medical Internet Research, 21(12), Article e14475. 10.2196/14475
- Van Berkel N., Ferreira D., Kostakos V. (2017). The experience sampling method on mobile devices. ACM Computing Surveys, 50(6), 1–40. 10.1145/3123988
- van Berkel N., Goncalves J., Lovén L., Ferreira D., Hosio S., Kostakos V. (2019). Effect of experience sampling schedules on response rate and recall accuracy of objective self-reports. International Journal of Human-Computer Studies, 125, 118–128. 10.1016/j.ijhcs.2018.12.002
- van Laerhoven K. (2022). Beyond the smartphone I: The future of wearables as mobile sensors. In Mehl M. R., Eid M., Wrzus C., Harari G., Ebner-Priemer U. W. (Eds.), Mobile sensing in psychology: Methods and applications. Guilford Press.
- van Roekel E., Keijsers L., Chung J. M. (2019). A review of current ambulatory assessment studies in adolescent samples and practical recommendations. Journal of Research on Adolescence, 29(3), 560–577.
- Wen C. K. F., Schneider S., Stone A. A., Spruijt-Metz D. (2017). Compliance with mobile Ecological Momentary Assessment protocols in children and adolescents: A systematic review and meta-analysis. Journal of Medical Internet Research, 19(4), Article e132. 10.2196/jmir.6641
- Wrzus C., Mehl M. R. (2015). Lab and/or field? Measuring personality processes and their social consequences. European Journal of Personality, 29, 250–271. 10.1002/per.1986
- Wrzus C., Müller V., Wagner G. G., Lindenberger U., Riediger M. (2013). Affective and cardiovascular responding to unpleasant events from adolescence to old age: Complexity of events matters. Developmental Psychology, 49, 384–397. 10.1037/a0028325