Abstract
Ecological Momentary Assessment (EMA) is a timely method for capturing differences between hearing aids (HAs) or HA features in real-world environments. Studies vary greatly in their reporting periods, that is, in how long ago an event may have occurred and still be reported, and in how events are selected when ratings do not summarize over a period of time. The potential effects of different reporting periods on HA or HA feature contrast remain unexplored. In a 14-day EMA study, 22 hearing-aid users assessed both a basic and an advanced HA program, which were switched daily without participant control. Several times daily, participants used a smartphone app to report on satisfaction with the HA program, overall listening experience, sound quality, and listening effort. The app had participants focus on the current situation, denoting a momentary reporting period, and on the worst listening experience within the previous 30 min, denoting a short-term retrospective period. Participants also completed an end-of-day questionnaire. Sound-pressure levels and HA classifier data were recorded continuously. Mixed modeling was used to examine the impact of reporting periods on ratings. The main findings showed no rating differences between the two HA programs in momentary or end-of-day assessments. However, differences emerged in short-term retrospective reporting, that is, in the assessments of the worst experience within the preceding 30 min. Various time effects were also observed. The analysis of sound-pressure levels and HA classifier data showed that the different reporting periods captured different snapshots of real-world conditions. In conclusion, this study underscores the need to diligently define reporting periods in hearing-related EMA research.
Keywords: ecological momentary assessment, coverage EMA, hearing-aid comparison, EMA methods, real-world environments
Introduction
Ecological Momentary Assessment (EMA) methods, also known as the Experience Sampling Method or Ambulatory Assessment, are increasingly used in hearing research (Holube et al., 2020; Schinkel-Bielefeld et al., 2024). EMA, which is often conducted using smartphones, involves participants repeatedly reporting on their perceptions and emotions in situ, often accompanied by brief descriptions of their auditory situation. As a result, EMA is a context-sensitive approach that is well suited to exploring auditory reality for diverse groups and individuals, and to investigating whether specific hearing aids (HAs) or specific HA features benefit the user in real-life situations.
The EMA design requires particularly careful consideration of HA features such as noise reduction and directional microphone technology, which are only active in specific, usually challenging situations in which they can demonstrate their full potential. When laboratory tests show significant differences between HA features or devices, but no differences are detected in EMA studies, two main explanations are possible. Firstly, the demanding situations or challenging acoustic conditions occur so rarely in real-life environments that the hearing-aid feature is of limited benefit in everyday life. Secondly, such situations exist, but, due to the specific design of the EMA study, are frequently overlooked. As a consequence, the methodology may lack the sensitivity to detect actual differences. If participants can only report their experiences concurrently or shortly after they occur, the timeframe for assessment, known as the EMA reporting period, is very narrow. This can increase the risk of missing relevant, but rare or short-lived, situations, and those that are incompatible with smartphone usage. The current study is a methodological investigation into this second possibility, specifically examining the influence of the reporting period in a HA comparison study.
In controlled laboratory tests, challenging conditions are intentionally created for speech audiometry in noise, whereas such conditions are relatively rare in real-world environments (e.g., Smeds et al., 2020; von Gablenz et al., 2021; Wagener et al., 2008; Wu et al., 2018). The sound-pressure level (SPL) is low in many everyday situations, and signal-to-noise ratios (SNRs) are mostly clearly positive (Jorgensen et al., 2022; Smeds et al., 2015; Wu et al., 2018). This is particularly true for the largest target group, i.e., older adults with hearing impairment requiring support, who often spend time in quiet environments (Humes et al., 2018; Wu & Bentler, 2012). However, it is not clear whether more challenging situations are simply of little interest in everyday life, or whether they are avoided because they are too demanding, despite the desire to communicate in these conditions. A hint towards the avoidance of challenging situations might be that the SNRs experienced in everyday life correlate with individual acceptable noise levels (Schinkel-Bielefeld et al., 2023).
Borschke et al. (2024) argued that people do not voluntarily remain for an extended period in situations in which they describe their hearing experience as being difficult. The authors further found that people often actively shape their acoustic situation by quickly changing their environment or modifying sound levels (e.g., getting closer to conversation partners, turning the volume of a media device down or up). Moreover, conducting an EMA assessment is not possible, or is considered inappropriate, in certain situations (Rintala et al., 2023; Schinkel-Bielefeld et al., 2020). In social situations, using a smartphone may conflict with politeness, while in other situations, such as when driving a car, safety considerations may limit smartphone use. Also, in very challenging situations, participants may concentrate on understanding, rather than using additional cognitive resources for responding to a survey. As a result, these situations may be underrepresented in EMA data, particularly when the reporting period is very short. While situations in which it is hard to report for social or safety reasons may be equally missing in all test conditions, situations that have been modified or avoided due to their challenging nature could be missing primarily for the HA program that provides less benefit, thereby systematically diminishing the sensitivity of EMA. Because such situations are brief before they are modified, they are less likely to be reported in prompt-based EMA. However, unless those modifications have been made subconsciously, they can be reported in user-initiated or short-term retrospective surveys.
In a conventional EMA design, to minimize recall biases, assessments are typically conducted as close to the event as possible, or even simultaneously. Nevertheless, the reporting periods in hearing research EMA studies vary widely. For example, Jensen et al. (2019), Smeds et al. (2019), and Glista et al. (2021, 2024) incorporated paired comparisons into their EMA sampling designs. Participants were not only asked to take surveys, but to switch between two hearing programs in real-life situations, and to evaluate their performance in a comparative manner. This approach requires momentary assessments, but can be considered a more invasive EMA variant, as it not only interrupts the participants’ daily activities, but also requires them to actively engage in a test procedure. Henry et al. (2012) employed a strictly momentary approach, with each EMA question formulated in the present tense and explicitly including the phrase “right now.” Other studies, such as Bosman et al. (2021), Jenstad et al. (2021), and Wu et al. (2021), used short reporting periods of up to 10 min. Schinkel-Bielefeld et al. (2020) and Lelic et al. (2021) instructed participants to start a survey by opening the EMA app during the situation they wanted to assess, resulting in a relatively momentary reporting period, but allowed them to complete the survey within 30 min or within a 30- to 60-min time frame. Wu et al. (2020a, 2020b) employed different reporting periods for prompted and self-initiated surveys. Prompted surveys had to be completed within 5 min, whereas self-initiated surveys could be completed up to 60 min after the evaluated situation. In other studies, such as Timmer et al. (2018), von Gablenz et al. (2021), and Borschke et al. (2024), participants were instructed that momentary assessments were the preferred mode, but retrospective assessments were allowed. Participants used categorical scales to rate how long ago the event or perception they reported occurred. A more distinct approach was used by Galvez et al. (2012), who asked participants to reflect on the time elapsed since the previous assessment, resulting in a reporting period of 150–180 min.
In addition to the variety of reporting periods, there are mixed results from studies comparing HA models, features, or settings. Strictly momentary assessments using paired comparisons have proved to be a sensitive method, particularly when expecting only minor performance differences (Amlani & Schafer, 2009). Glista et al. (2021) and Smeds et al. (2019) successfully identified significant preferences for particular programs at the group level. Conversely, Jensen et al. (2019) employed a similar approach, yet found no significant preference between two HA programs.
Other HA comparison studies have added to these diverse outcomes. In a methodologically comprehensive study, Wu et al. (2019) investigated the effects of HA model (basic vs. premium) and of directional microphone technology and noise reduction (features on vs. off) using a single-blinded cross-over EMA design. Participants were prompted to complete surveys and encouraged to take a survey on their own initiative whenever they had a new listening experience. The reporting period was 5 min in both cases. The results showed an effect for the HA features on versus off, but no strong evidence for the superiority of the premium HA and premium HA features. Statistical analyses were conducted separately for classes of noisiness defined by the participants’ rating. Drawing conclusions was difficult due to the small number of surveys related to acoustically challenging situations, particularly those with noise. Bosman et al. (2021) researched high-frequency amplification in adults with hearing impairment. Participants controlled two HA programs, each randomly assigned to a different high-frequency amplification scheme. They were instructed to switch programs frequently, completing a survey with each change. Although no specific reporting period was defined, the design implied a strict momentary approach. Overall, EMA ratings showed no statistical difference between the two amplification schemes. However, after incorporating contextual information from the HA classifiers, differences emerged for situations classified as “Speech” or “Noisy Speech.” Andersson et al. (2021) examined the benefits of a noise management system using a single-blind EMA design. The study employed a combined sampling strategy, apparently expecting momentary in-situ reporting, although this was not explicitly stated in the article. The EMA ratings alone showed no statistical benefit from the noise management system. Differences in ratings for the noise management system turned on versus off were only statistically significant when HA classifier data were used, but, contrary to expectations, not for situations that fell into the noise sound class. The authors suggested that the low prevalence of noise environments—only 4% of EMA surveys were related to noise, and 8% were related to speech-in-noise settings—may explain the lack of observed differences in the noise sound class. Christensen et al. (2024b) used a single-blind study to compare a deep-neural-network-based noise reduction system (DNN-NR) with a conventional system (NR). Sampling involved pseudo-random prompts and self-initiated surveys. Participants noted whether the situation was ongoing, but because the results were similar regardless of whether such cases were excluded, the analysis included all data. Acoustic data from the 5 min preceding the survey were used, suggesting an effectively momentary approach, although the maximum reporting delay was not specified in the article. No significant difference was found in the overall satisfaction ratings. However, when separate models were developed for DNN-NR and conventional NR, using satisfaction ratings as outcomes and SNR and SPL as predictors, a benefit for DNN-NR appeared. In the DNN-NR model, satisfaction ratings were independent of SNR, unlike in the NR model. The authors highlighted the importance of pairing subjective ratings with data logging for comparing HA features in real-world settings. An expanded analysis confirmed this overall conclusion (Christensen et al., 2024a).
The above summary of the current state of research broadly reveals that subjective ratings from EMA studies alone provide only a weak basis for researching HA performance differences. The number of challenging, difficult listening situations is typically low in EMA data, making the combination of subjective assessments and contextual information from data logging crucial for detecting a difference. However, such acoustical descriptors are not consistently available in all kinds of study settings. Additionally, HA data processing often lacks transparency and varies depending on the brand and model. This applies not only to situational classifier data and SNR estimates, but also to more basic parameters such as SPL. For example, Christensen et al. (2024b) were confronted with different frequency weighting schemes when comparing SPL for different HA programs running on the same HA. Given these conditions, it is worth considering how the EMA method itself could be adapted to enhance sensitivity to the point where even smaller differences in HA performance can be detected without relying on contextual information from the HA.
Recognizing that the benefits of advanced signal processing algorithms are most likely to emerge in challenging, often noisy listening environments, it is crucial that the EMA sampling strategy effectively captures these situations. Extending the reporting period, thus allowing for short-term retrospective assessments, while focusing on relatively difficult listening experiences, could enhance the method's sensitivity. Instead of focusing solely on momentary points, as did the original EMA methods, Stone et al. (2023) refer to a coverage model of EMA, a concept introduced by Shiffman et al. (2008), which involves considering longer periods of time. With this in mind, the present EMA study was designed to explore how the EMA reporting period affects the contrast between two HA programs. Participants received HAs with two programs—basic and advanced—that automatically alternated each day. They were randomly prompted to take a survey, but could also initiate a survey at any time. In each survey, participants first described and rated the current listening situation. Participants were then asked if they noted any changes in the situation or their perception, and, if so, to rate the worst listening experience in the past 30 min. In addition to the daytime surveys, participants were requested to complete an end-of-day survey each evening, which covered several aspects addressed in the daytime assessments. To define the scope of our study, the primary and secondary research questions are as follows:
Does a short-term retrospective assessment provide a greater contrast between the two HA programs when compared to strictly momentary assessments? (Primary RQ)
Do different reporting types alter the distribution of everyday situations as captured by the EMA survey responses and the HA classifier data? If so, in what way? (Secondary RQ)
Materials and Methods
Participants
A cohort of 22 German-speaking individuals with hearing impairment, with a mean age of 70 years (SD = 8 years), was recruited between March and June 2023 from a volunteer database maintained by the Hörzentrum Oldenburg. Hearing loss ranged from moderate to moderately severe, as indicated by a pure tone average (PTA) at frequencies of 0.5, 1, 2, and 4 kHz, with a mean of 41 dB HL (SD = 11 dB) in the better ear and a mean of 45 dB HL (SD = 10 dB) in the worse ear. The majority of participants had extensive experience with HAs, with a mean duration of 9 years (SD = 7 years); one participant reported not using HAs on a regular basis. Moreover, 17 participants (77%) expressed satisfaction with the performance of their HAs, ranging from rather satisfied to very satisfied. Among the participants, 17 (77%) were male, 17 (77%) were retired, and 15 (68%, 1 response missing) had achieved the highest level of school education. All participants reported good to excellent general health status (1 response missing). Prior participation in an EMA study was reported by six (27%) of the participants.
Experimental Procedure
The study participants had two appointments, at least 14 days apart, at the Institute of Hearing Technology and Audiology in Oldenburg. Each participant was expected to complete a 14-day period of EMA. However, for scheduling reasons, the interval between the two appointments sometimes exceeded 14 full days, resulting in a few additional days of data collection in four cases.
During the first appointment, participants received instructions about the study, including practical training on the EMA app. Participants were also fitted with study HAs. Pure-tone audiometry was conducted in an audiometric booth using the Unity audiometer 2 and HDA200 headphones. Participants returned the completed Hearli-Q (Lelic et al., 2022) and AStra questionnaires (Fischer et al., 2025), which they received by mail before the appointment.
During the second appointment, participants returned the smartphone and study HAs. As a measure to check for reactivity, the participants completed selected items from the AStra questionnaire (no. 14–27 and no. 32–37) that were deemed suitable for assessing potential behavioral and attitudinal changes following the EMA survey, as compared to the period before. Furthermore, the participants took part in a standardized exit interview, which included inquiries about the usability of the EMA system and the comprehensibility of tasks.
Hearing-aid Fitting
Using the NAL-NL2 fitting formula (Keidser et al., 2011) for experienced listeners, the participants were fitted with Signia Pure 312 7AX HAs equipped with closed click sleeves. An own-voice training session was conducted, and the gain settings were fine-tuned to accommodate individual participant preferences and requirements. Two participants returned to the institute during the initial stages of the EMA phase to have their HAs adjusted. Additional measurements were performed to verify and validate the quality of HA fitting, including probe-microphone measurements and the Freiburg monosyllabic speech recognition test with two test lists. Probe-microphone measurements were performed using the ISTS signal (Holube et al., 2010), with an input level of 65 dB SPL. From 0.5 to 4 kHz, the prescribed NAL-NL2 gain and the applied gain differed on average in absolute terms by 2.2 dB. The average speech recognition score with study HAs was 90% (range: 84%–94%) on the Freiburg monosyllabic speech test at a level of 65 dB SPL in the quiet condition. Two distinct hearing programs were configured: one program with adaptive directionality and noise-reduction capabilities, and another program that essentially compensated for the audiometric loss. These HA programs are hereinafter referred to as ‘advanced’ (A) and ‘basic’ (B), respectively.
EMA app
Participants were given a Galaxy S20 smartphone (Android 11) with a preinstalled EMA app (Version 4.15.7, developed internally by WS Audiology), to which the study HAs were connected via Bluetooth. The EMA app provided the daytime EMA questionnaire and the end-of-day questionnaire that was completed once daily in the evening, hereinafter referred to as EoD. The app additionally controlled the storage and upload of objective acoustical feature data processed in the HA, such as situation classifier and SPL data. The app included a Do-Not-Disturb (DND) feature that enabled participants to set personalized settings for their usual sleeping hours. Participants could adjust the DND periods as needed throughout the 2-week field trial, but were asked to allow survey prompts for at least six hours per day.
The app was designed to switch programs automatically each day at 4 am, or whenever the smartphone next reconnected to the HAs via Bluetooth. Participants did not have control over the HA program selection. While participants knew that there were two programs that randomly alternated, the sound that the HA normally plays to indicate a program change was disabled to blind participants to the active program. They could, however, adjust the volume of the HA using the EMA app. Participants received six random prompts per day to complete a survey; unanswered prompts expired after 3 min. They could also initiate a survey at any time, particularly if they were dissatisfied with their hearing experience in any way. Thus, each survey falls into one of two sampling-type categories: prompted or self-initiated. After completing a survey, there was a 45-min refractory period during which no additional prompts were sent.
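To make the sampling scheme concrete, the following sketch illustrates how random prompts might be scheduled around a Do-Not-Disturb window and the 45-min refractory period. This is a hypothetical reconstruction for illustration only; all function and variable names are ours and do not reflect the actual app implementation.

```python
import random
from datetime import datetime, time, timedelta

PROMPTS_PER_DAY = 6
PROMPT_EXPIRY = timedelta(minutes=3)   # unanswered prompts expire after 3 min
REFRACTORY = timedelta(minutes=45)     # no new prompts within 45 min of a completed survey

def in_dnd_window(t: time, dnd_start: time, dnd_end: time) -> bool:
    """True if t falls within the (possibly midnight-spanning) Do-Not-Disturb window."""
    if dnd_start <= dnd_end:
        return dnd_start <= t < dnd_end
    return t >= dnd_start or t < dnd_end

def draw_prompt_times(day_start: datetime, dnd_start: time, dnd_end: time) -> list[datetime]:
    """Draw random prompt times for one day, avoiding the Do-Not-Disturb window."""
    prompts: list[datetime] = []
    while len(prompts) < PROMPTS_PER_DAY:
        candidate = day_start + timedelta(minutes=random.randint(0, 24 * 60 - 1))
        if not in_dnd_window(candidate.time(), dnd_start, dnd_end):
            prompts.append(candidate)
    return sorted(prompts)

def may_prompt(now: datetime, last_survey_end: datetime | None) -> bool:
    """Suppress prompts during the refractory period after a completed survey."""
    return last_survey_end is None or now - last_survey_end >= REFRACTORY

# Example: prompts for one day with a DND window from 22:00 to 07:00.
schedule = draw_prompt_times(datetime(2023, 5, 1), time(22, 0), time(7, 0))
```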
The EMA questionnaire consisted of two parts, as schematically shown in Figure 1. First, participants were asked about their mood and the current situation. Participants were informed that “current” referred to the situation they were in when they decided to take the survey, whether prompted by the study or on their own initiative. The environment was broadly categorized as home, mobility, social, work, or other, but collapsed for the analysis into three categories: home, mobility, and other. Participants also described the listening task using the Common Sound Scenarios (CoSS) scheme (Wolters et al., 2016). They rated their overall listening experience, listening effort, sound quality, and HA satisfaction on 7-point categorical scales. Listening effort ratings were requested only when the listening task focused on understanding speech. The wording of these attributes and the extreme categories of the response scales are detailed in Table 1.
Figure 1.
Structure of the conditionally two-part EMA questionnaire. The first part refers to the momentary reporting period. The second part, in the event of a change, refers to the retro-selective reporting period.
Table 1.
Hearing-Related Attributes Addressed in the EMA Questionnaires: Wording, Short-Form, and Extreme Categories.
| Question | Short Form | Extreme Categories of the 7-Point Categorical Scales |
|---|---|---|
| Overall, how do you assess the current/past listening experience? | Global rating | Very good ↔ very bad |
| How effortful is/was listening in this situation? (EoD: How effortful was listening overall today?) | Listening effort | Extremely effortful ↔ effortless |
| How is/was the sound quality of your HAs? | Sound quality | Very good ↔ very bad |
| How satisfied are/were you with your HAs in this situation? (EoD: How satisfied were you overall with your HAs today?) | HA satisfaction | Very satisfied ↔ very unsatisfied |
Note. Questions in past tense were used when asking about the worst listening experience in the last 30 min.
Second, participants were asked whether the situation, or their hearing experience, had changed within the last 30 min (“Please recall the last 30 min. Were there any changes in your situation or auditory perception?”). Changes in situation were explained as shifts in, for example, location, activities, listening targets, or present sound sources, while changes in auditory perception referred to variations in how sounds were perceived, even within consistent settings. If such changes were noted, the participants proceeded to the second part of the questionnaire. This part focused on the worst auditory experience during that period. If the participants were unable to choose between different situations, they were instructed to select the most recent one (“Please describe the worst auditory experience you had in the last 30 min. It does not necessarily have to be negative. If you cannot decide on a situation, just describe the most recent one”). The same set of questions from the first part was then also asked in relation to this specific situation.
The setting of a 30-min interval, although somewhat arbitrary, was informed by findings from von Gablenz et al. (2021), which showed that 93% of assessed situations fell within this period. Fürstenberg et al. (2024) also demonstrated that this interval was well suited to capturing rare situations, showing only minor memory effects. Furthermore, pilot work conducted before the start of the study confirmed the practicality of this interval, as it was well managed by participants.
Additionally, participants were asked a few more questions, such as whether it would have been feasible to respond in that situation, whether they attempted to alter the listening situation in any way, and how much it would have bothered them to do so. These items are not relevant to the current research questions and were not considered in the present analysis. Overall, the questionnaire flow was adaptive and included up to 25 items.
The EoD included up to seven questions addressing fatigue and potential nonhearing-related causes. Additionally, it requested an overall assessment of listening effort and HA satisfaction, using the same scales as the daytime EMA questionnaire (see Table 1). The full versions of the EoD and the EMA questionnaire in German, and their English translations, are available in the supplement to this article.
In addition to the questionnaires, the EMA app managed the extraction and storage of objective data characterizing the acoustic conditions. For the analysis, data related to own-voice detection, HA situational classifiers, and SPL were used. Classification and own-voice detection were conducted every second; these data were then aggregated to percentages per minute for analysis. The HA classifier distinguished between the categories speech in quiet, speech in noise, noise, music, car, and quiet. The data were saved every 15 min (and discarded if the HA was restarted before the 15-min interval had been saved). They were then stored inside the HA for 24 hr of operation time and were lost if not collected by the phone within this time frame.
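As an illustration of this per-second-to-per-minute aggregation, a minimal sketch using pandas is shown below. The column names and example values are hypothetical and do not correspond to the actual data format of the HA or the app.

```python
import pandas as pd

# Hypothetical per-second log: one row per second with the sound class
# assigned by the HA classifier (synthetic example covering 3 minutes).
log = pd.DataFrame({
    "timestamp": pd.date_range("2023-05-01 10:00:00", periods=180, freq="s"),
    "sound_class": ["quiet"] * 90 + ["speech in noise"] * 60 + ["noise"] * 30,
})

# Aggregate the per-second labels to a percentage per sound class and minute.
per_minute = (
    log.groupby([pd.Grouper(key="timestamp", freq="1min"), "sound_class"])
       .size()
       .groupby(level=0)
       .transform(lambda counts: 100 * counts / counts.sum())
       .rename("percent")
       .reset_index()
)
print(per_minute)
```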
Variables and Short Forms
To distinguish the same parameters or variables across the two reporting periods of the daytime questionnaire, the short forms “momentary” and “retro-selective” are used. Additionally, data from both questionnaire parts are conditionally combined and labeled “retro-combined.” In total, the following terms are used:
Momentary: Refers to the situation and perception immediately before or during the completion of the questionnaire.
Retro-selective: Refers to the worst auditory experience during the previous 30 min (excluding the ratings for the current situation).
Retro-combined: Refers to the worst auditory experience during the previous 30 min. If no situational or perceptual changes were reported, the ratings were assumed to be identical to the current ratings (see the construction sketch following the survey-type definitions below).
For the analysis of objective HA data, including SPL and situation classifications, the surveys were categorized based on whether a change in situation was reported or not. Specifically, two types of surveys were defined:
No-change survey: no change of listening situation or perception in the last 30 min, thus only momentary assessments available
Change survey: change of listening situation or perception in the last 30 min, thus both momentary assessments and retro-selective assessments available
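The construction of the retro-combined ratings described above can be sketched as follows. This is a minimal pandas example with hypothetical column names; where no change was reported, the momentary rating stands in for the retro-selective one.

```python
import pandas as pd

# Hypothetical per-survey table with momentary and retro-selective ratings.
surveys = pd.DataFrame({
    "rating_momentary":       [5, 6, 4, 7],
    "rating_retro_selective": [2, None, 3, None],   # only present for change surveys
    "change_reported":        [True, False, True, False],
})

# Retro-combined: take the retro-selective rating where a change was reported,
# otherwise fall back to the momentary rating of the same survey.
surveys["rating_retro_combined"] = surveys["rating_retro_selective"].where(
    surveys["change_reported"], surveys["rating_momentary"]
)
print(surveys)
```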
Analysis
Data analysis was conducted descriptively using conventional parameters, such as the mean (M), standard deviation (SD), and relative and absolute frequency distributions, along with corresponding visualizations. Unless otherwise noted, descriptive statistics were calculated from data aggregated at the individual participant level. Accordingly, overall distributional descriptors were calculated as the mean of the percentages for individual participants. Paired t-tests on the individually aggregated data were used to examine interaction effects between the HA program and features of survey sampling and to quantify mean differences.
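As a sketch of this aggregate-then-test approach, the data could first be collapsed to one value per participant and HA program before applying the paired t-test; scipy.stats.ttest_rel implements the paired test. The column names used here are hypothetical.

```python
import pandas as pd
from scipy.stats import ttest_rel

def paired_program_test(surveys: pd.DataFrame) -> tuple[float, float]:
    """Compare the per-participant share of change surveys between the HA programs.

    Expects one row per valid daytime survey with hypothetical columns
    'participant', 'program' ('A' = advanced, 'B' = basic), and 'change' (bool).
    """
    share = (
        surveys.groupby(["participant", "program"])["change"]
               .mean()                      # proportion of change surveys
               .unstack("program") * 100    # one column per program, in percent
    )
    t, p = ttest_rel(share["A"], share["B"])
    return float(t), float(p)
```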
A mixed multinomial model was used to assess whether the distribution of environments, listening intentions, and targets differed between surveys related to the advanced and the basic program, with a significant main effect indicating that program A was not used in the same environmental conditions as program B. A conventional linear mixed model was used for interval-scaled outcomes (e.g., the number of surveys completed per day) to analyze potential time effects. In all models, study participants were treated as random effects.
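A minimal sketch of such a linear mixed model with a random intercept per participant, using statsmodels and synthetic data for illustration, is given below; the variable names are hypothetical.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic data frame: one row per participant and day with the number of
# completed daytime surveys (illustrative values only).
rng = np.random.default_rng(1)
daily = pd.DataFrame({
    "participant": np.repeat([f"P{i:02d}" for i in range(1, 23)], 14),
    "day_index": np.tile(np.arange(1, 15), 22),
})
daily["n_surveys"] = rng.poisson(6 - 0.08 * daily["day_index"])

# Random-intercept model: surveys per day as a function of the day index.
model = smf.mixedlm("n_surveys ~ day_index", data=daily, groups=daily["participant"])
result = model.fit()
print(result.summary())   # the day_index coefficient estimates the linear time trend
```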
The data related to the main research question, i.e., the relationships between the HA programs and the subjective ratings from both the daytime and the EoD surveys, were also analyzed using mixed linear modeling. Since the ratings were collected on categorical scales, mixed models were built as cumulative-link mixed models (CLMMs) with a logit link function (Bauer & Sterba, 2011). CLMMs are suitable for the analysis of complex EMA data and provide results similar in quality to Bayesian modeling (Leijon et al., 2023). CLMMs were also constructed for subsets of the data, incorporating additional variables as fixed factors and interaction terms. In each of these models, the study participants were treated as random effects, while the HA program and a time variable (day, week) were considered as fixed factors. The CLMMs were built separately for every reporting period in question and structured as follows: OrdinalOutcome = β1 × HA_program + β2 × TimeVariable + … + γ × ParticipantID + c. Owing to the study design, the ratings from the different reporting periods are not independent. This dependency can only be effectively addressed in the models for the momentary ratings by adding a binary variable indicating the presence of a short-term retrospective rating in the same survey. Such extended CLMMs on momentary ratings were also built and are reported in a summarized form. All statistical tests were conducted with a two-tailed alpha level of 0.05.
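For reference, the general form of such a cumulative-link mixed model with a logit link can be written as follows (a schematic notation, not the exact software syntax used):

$$\operatorname{logit} P(Y_{ij} \le k) = \theta_k - \left(\beta_1\,\mathrm{HAprogram}_{ij} + \beta_2\,\mathrm{Time}_{ij} + \dots + u_j\right), \qquad u_j \sim \mathcal{N}(0, \sigma_u^2),$$

where $Y_{ij}$ denotes the ordinal rating given by participant $j$ in survey $i$, $\theta_k$ the category thresholds, and $u_j$ the participant-specific random intercept. Under this parameterization, and with the rating scales coded as in the present study, positive $\beta$ coefficients correspond to better ratings.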
Results
Data Overview
Participants completed 2,112 daytime surveys. Excluded from the analysis were incomplete surveys (n = 106) and those started 40 min or less after a program change (n = 45), resulting in 1,961 valid daytime surveys. Additionally, the participants initiated 405 EoD surveys. After removing EoD surveys that were incomplete or superseded by a subsequent EoD survey (n = 107), or completed within 8 hr following the last program change, 264 valid EoD surveys remained available for analysis.
Due to Bluetooth connection problems, HA classifier and SPL data for the 30-min interval preceding the survey were only available for 1,779 surveys. The number of surveys varied among participants. On average, each study participant completed 89 daytime surveys (SD = 16, range: 43–115) and 12 EoD surveys (SD = 3, range: 1–18) on 16 different days (SD = 2, range: 12–21).
Overall, an approximately balanced use of both HA programs was observed across all surveys, with the advanced HA program A being associated with 46% (SD = 11%; range: 21–72%) of the daytime surveys and 49% (SD = 21%; range: 0–100%) of the EoD surveys. A paired t-test revealed no significant interaction between the number of surveys completed and the HA program, for either the daytime surveys, ΔM = 6.0, t(21) = –1.4, p = .169, or the EoD surveys, ΔM = 0.7, t(21) = 0.7, p = .380.
In the daytime surveys, 47% (SD = 14%; range: 22%–78%) followed a prompt. The median time interval between two successive surveys was 92 min (SD = 13 min; range: 71–120 min). No significant interaction was found between the HA program and the sampling type, ΔM = −5.7, t(21) = –1.6, p = .127. Of the 1,961 surveys conducted, a change in the listening situation or perception in the preceding 30 min was reported in 401 surveys (change surveys). The percentage of change surveys varied among participants, with M = 21% (SD = 15%; range: 0%–63%). No significant interaction was found between the percentage of change surveys and the HA program, ΔM = –1.0, t(21) = 0.6, p = .579.
Separate mixed-multinomial regression models were used for each dependent variable: environment (home, mobility, other), CoSS listening intentions, and CoSS tasks. Each model included the HA program as a fixed effect and the study participant as a random effect. None of these models showed a significant main effect of the HA program on the respective dependent variable; thus, the distribution of real-world environments, listening intentions, and tasks did not differ when using HA program A or B.
The mean RMS SPL averaged over the left and right HA for a 30-min period before responding to the daytime surveys was 55.2 dB SPL (SD = 9.9 dB SPL). An independent-samples t-test indicated no significant difference in mean SPL during the usage of the two HA programs, ΔM = 0.6, t(1777) = 1.2, p = .215.
Overall, the statistical analysis confirmed that the daily switching between HA programs ensured that participants encountered similar conditions and demands with both the basic and advanced programs.
Time and Sequence Effects
Participants completed selected items from the AStra questionnaire before and after the EMA phase to explore potential changes in attitudes, discussed as reactivity in the EMA context. AStra scores before and after the intervention showed no significant differences at the group level, as indicated by a related-samples Wilcoxon signed-rank test. Consequently, based on AStra responses, repeated assessments with EMA did not appear to significantly alter hearing-related behaviors or attitudes.
However, within the EMA data itself, time effects were observable, affecting the number of completed daily questionnaires, the percentage of no-change surveys, and, more importantly, the ratings provided. Since the first and last days of each EMA phase did not represent full days, they were excluded from the following analyses of time and sequence effects. Analyses were conducted either using a day index variable or by comparing the first and second halves of the EMA phase, simply referred to as the first and second week.
Decreasing Number of Surveys
The number of daytime surveys completed each day exhibited a slight decrease over time. This decrease was not statistically significant when comparing the first and second EMA week using a two-sided t-test, ΔM = –0.5, t(21) = 1.849, p = .079, on data aggregated for each study participant. Nevertheless, when analyzing the time effect continuously with a day index variable using a mixed-effects model approach, it reached significance, β = –0.080, t(296) = −3.232, p = .001.
Decreasing Share of Surveys Reporting a Change
The percentage of surveys in which participants reported changes in environments and/or hearing-related perceptions also declined slightly over time. The median proportion was 21% in the first week and 18% in the second week. This decrease was not statistically significant, as shown by a paired t-test on data aggregated for each study participant, ΔM = 2.8, t(21) = 2.535, p = .272. However, a time effect was confirmed by a linear mixed model analysis using a day index variable, β = –0.591, t(295) = –2.416, p = .016.
Increasingly Critical Subjective Ratings
The ratings on the main attributes themselves also exhibited a weak-to-moderate temporal effect. Over time, participants became increasingly critical regarding the device-related attributes: the global rating and sound quality of current listening situations were scored lower, and HA satisfaction decreased. The time effect was independent of the HA program and statistically significant for many attributes. CLMMs including a simple sequence variable or the first versus second EMA week as predictors yielded F-values from 8.7 to 15.1 and p-values ≤ .03. The only attribute that stood out was listening effort: a trend towards a slight decrease over time was observed when the time variable was used as the only predictor, but the significance criteria were only met in more complex modeling, as shown in Tables 2 and 4.
Table 2.
CLMM for Momentary Assessments: Beta Coefficients and 95% Confidence Intervals (Logits).
| Outcome Variable | Predictor | Beta Coefficient | 95% CI Low | 95% CI High | p-Value |
|---|---|---|---|---|---|
| HA satisfaction | Advanced HA | 0.075 | −0.114 | 0.265 | .437 |
| | First EMA week | 0.264 | 0.078 | 0.450 | .005 |
| | Background sounds | −0.405 | −0.621 | −0.190 | <.001 |
| | Environment: Other | −0.573 | −0.850 | −0.297 | <.001 |
| | Environment: Mobility | −0.595 | −0.859 | −0.332 | <.001 |
| Global rating | Advanced HA | 0.071 | −0.114 | 0.257 | .449 |
| | First EMA week | 0.292 | 0.110 | 0.474 | .002 |
| | Background sounds | −0.590 | −0.801 | −0.380 | <.001 |
| | Environment: Other | −0.613 | −0.887 | −0.340 | <.001 |
| | Environment: Mobility | −0.779 | −1.037 | −0.521 | <.001 |
| Sound quality | Advanced HA | 0.170 | −0.020 | 0.359 | .079 |
| | First EMA week | 0.228 | 0.042 | 0.414 | .016 |
| | Background sounds | −0.592 | −0.808 | −0.376 | <.001 |
| | Environment: Other | −0.414 | −0.694 | −0.133 | .004 |
| | Environment: Mobility | −0.400 | −0.663 | −0.137 | .003 |
| Listening effort | Advanced HA | 0.022 | −0.199 | 0.244 | .844 |
| | First EMA week | −0.283 | −0.503 | −0.064 | .011 |
| | Background sounds | −0.831 | −1.090 | −0.572 | <.001 |
| | Environment: Other | −0.977 | −1.303 | −0.651 | <.001 |
| | Environment: Mobility | −0.725 | −1.046 | −0.405 | <.001 |
Note. Reference categories: basic HA program, second week, no report of background sounds, home. Positive values indicate better ratings than for the reference condition, i.e., higher HA satisfaction, better global rating, better sound quality, and lower listening effort.
Table 4.
CLMM for Retro-Combined Assessments: Beta Coefficients and 95% Confidence Intervals (Logits).
| Outcome Variable | Predictor | Beta Coefficient | 95% CI Low | 95% CI High | p-Value |
|---|---|---|---|---|---|
| HA satisfaction | Advanced HA | 0.067 | −0.118 | 0.252 | .478 |
| | First EMA week | 0.248 | 0.067 | 0.430 | .007 |
| | Background sounds | −0.405 | −0.616 | −0.194 | <.001 |
| | Environment: Other | −0.638 | −0.900 | −0.376 | <.001 |
| | Environment: Mobility | −0.827 | −1.075 | −0.579 | <.001 |
| Global rating | Advanced HA | 0.109 | −0.070 | 0.289 | .232 |
| | First EMA week | 0.276 | 0.100 | 0.453 | .002 |
| | Background sounds | −0.505 | −0.710 | −0.299 | <.001 |
| | Environment: Other | −0.831 | −1.088 | −0.575 | <.001 |
| | Environment: Mobility | −1.053 | −1.296 | −0.811 | <.001 |
| Sound quality | Advanced HA | 0.184 | −0.001 | 0.369 | .051 |
| | First EMA week | 0.200 | 0.019 | 0.381 | .031 |
| | Background sounds | −0.534 | −0.745 | −0.323 | <.001 |
| | Environment: Other | −0.440 | −0.704 | −0.177 | .001 |
| | Environment: Mobility | −0.716 | −0.964 | −0.469 | <.001 |
| Listening effort | Advanced HA | 0.143 | −0.076 | 0.363 | .201 |
| | First EMA week | −0.232 | −0.449 | −0.015 | .036 |
| | Background sounds | −0.826 | −1.081 | −0.571 | <.001 |
| | Environment: Other | −1.040 | −1.350 | −0.730 | <.001 |
| | Environment: Mobility | −0.689 | −1.000 | −0.378 | <.001 |
Note. Reference categories: basic HA program, second week, no report of background sounds, home. Positive values indicate better ratings than the reference condition, i.e., higher HA satisfaction, better global rating, better sound quality, and lower listening effort.
Distribution of Real-World Conditions by Reporting Period
Retrospectively reported listening experiences were, on average, rated poorer than momentarily reported experiences, a result that was expected due to the study design. However, real-world environments, listening tasks, and conditions substantially differed by reporting period, and these differences may also have impacted the analysis of HA-program differences. Figure 2 shows proportional statistics of real-world conditions for the three reporting types.
Figure 2.
Proportional statistics for environments, listening intentions, and speech listening events separately for momentary, retro-selective, and retro-combined reporting in daytime surveys. Data aggregated on an individual level and rescaled to 100%.
As can be seen in Figure 2, the distribution of listening situations in the momentary assessments differed from that in the retro-selective and retro-combined assessments. Retro-selective situations were significantly more often related to mobility, ΔM = 24.4, t(20) = 4.856, p < .001, and less frequently related to home environments, ΔM = 35.1, t(20) = 7.565, p < .001, than momentary assessments. This pattern aligns with the finding that listening intentions categorized as focused listening in the CoSS framework (which in the current data primarily included activities such as watching TV or listening to the radio), as well as speech listening events, accounted for a higher proportion of the momentary assessments than of the retro-selective assessments. However, neither of these differences reached significance. Due to the relatively large number of no-change surveys, the distributions of the retro-combined data, combined from retro-selective and momentary assessments, are very similar to those of the momentary data.
Objective data recorded by the HA reveal differences in the 30-min intervals preceding change and no-change surveys. Figures 3 and 4 illustrate the distribution of HA classifier and SPL data, respectively. They indicate a greater presence of car segments (15% vs. 6%), fewer quiet segments (37% vs. 51%), and higher RMS levels (medians 59 dB SPL and 55 dB SPL, respectively) in the 30 min before surveys reporting a change of situation or perception, compared to no-change surveys.
Figure 3.
Classifier proportions in the 30-min interval before starting an EMA survey. Data aggregated on individual level and rescaled to 100%.
Figure 4.
Density distribution and median of SPLs in the 30-min interval before taking a survey for no-change surveys (blue) and change surveys (red).
HA Program Contrast
To explore the relationship between categorical quality ratings and various predictors, a series of exploratory CLMM analyses were conducted. The models were developed separately for the reporting periods in question. Based on expected relationships, the models included the following variables alongside the HA program: sampling type (prompted vs. self-initiated), presence of background sounds, environment, day sequence, and, alternatively, EMA week (first vs. second).
The sampling type consistently affected the assessments, i.e., somewhat poorer listening experiences were reported in surveys initiated by the participants voluntarily than in those following random prompts. However, including the sampling type in the CLMMs hardly altered the estimates for the other predictors and did not enhance the model quality, as indicated by the AIC score (where a lower value signifies better model fit). Therefore, the models reported here were developed without this variable. In contrast, contextual variables characterizing the environments, and week as the time variable, were included. The environments were categorized as home, mobility, or other, and the presence of background sounds of any kind was coded as a binary variable. The results for a total of 12 CLMMs established for the daytime surveys are presented in separate tables for the three reporting periods: momentary (Table 2), retro-selective (Table 3), and retro-combined (Table 4). Each table includes models for the four attributes observed in this study: HA satisfaction, global rating, listening effort, and sound quality. Model results for the summarizing EoD assessments are shown in Table 5. Positive coefficients generally indicate better ratings compared to the reference. To aid interpretation regarding the primary research question, i.e., the impact of the two HA programs on EMA ratings, Figure 5 presents the beta coefficients and 95% confidence intervals from the CLMM models across the different reporting periods.
Table 3.
CLMM for Retro-Selective Assessments: Beta Coefficients and 95% Confidence Intervals (Logits).
| Outcome Variable | Predictor | Beta Coefficient | 95% CI Low | 95% CI High | p-Value |
|---|---|---|---|---|---|
| HA satisfaction | Advanced HA | 0.325 | −0.065 | 0.716 | .102 |
| | First EMA week | 0.265 | −0.132 | 0.662 | .190 |
| | Background sounds | −0.524 | −1.024 | −0.024 | .040 |
| | Environment: Other | −0.298 | −0.819 | 0.223 | .262 |
| | Environment: Mobility | −0.365 | −0.856 | 0.125 | .144 |
| Global rating | Advanced HA | 0.413 | 0.029 | 0.797 | .035 |
| | First EMA week | 0.154 | −0.237 | 0.545 | .440 |
| | Background sounds | −0.431 | −0.925 | 0.064 | .088 |
| | Environment: Other | −0.620 | −1.135 | −0.106 | .018 |
| | Environment: Mobility | −0.822 | −1.308 | −0.337 | <.001 |
| Sound quality | Advanced HA | 0.492 | 0.096 | 0.888 | .015 |
| | First EMA week | 0.237 | −0.165 | 0.639 | .248 |
| | Background sounds | −0.261 | −0.766 | 0.245 | .311 |
| | Environment: Other | −0.100 | −0.626 | 0.427 | .710 |
| | Environment: Mobility | −0.441 | −0.937 | 0.054 | .081 |
| Listening effort | Advanced HA | 0.652 | 0.121 | 1.184 | .016 |
| | First EMA week | −0.278 | −0.806 | 0.249 | .300 |
| | Background sounds | −0.812 | −1.464 | −0.160 | .015 |
| | Environment: Other | −0.070 | −0.749 | 0.608 | .839 |
| | Environment: Mobility | 0.002 | −0.711 | 0.715 | .995 |
Note. Reference categories: basic HA program, second week, no report of background sounds, home. Positive values indicate better ratings than the reference condition, i.e., higher HA satisfaction, better global rating, better sound quality, and lower listening effort.
Table 5.
CLMM for EoD Assessments: Beta Coefficients and 95% Confidence Intervals (Logits).
| Outcome Variable | Predictor | Beta Coefficient | 95% CI Low | 95% CI High | p-Value |
|---|---|---|---|---|---|
| HA satisfaction | Advanced HA | 0.328 | −0.195 | 0.851 | .218 |
| | First EMA week | 0.023 | −0.477 | 0.523 | .927 |
| Listening effort | Advanced HA | 0.151 | −0.341 | 0.642 | .547 |
| | First EMA week | 0.017 | −0.450 | 0.484 | .941 |
Note. Reference categories: basic HA program, second week. Positive values indicate higher HA satisfaction and lower listening effort.
Figure 5.
Beta coefficients and 95% confidence intervals from the complex CLMMs (see Tables 2–5) for different attributes and reporting types. Positive values indicate better ratings for the advanced than for the basic HA program. Note. Listening effort was reversed to facilitate interpretation.
In momentary assessments, the presence of background sounds and being in nondomestic environments were generally associated with poorer ratings. As shown in Table 2, the logit beta coefficients for these factors were often in the range of |0.4| to |1.0|, which is substantially higher than the coefficients estimated for the HA program contrast, which do not exceed |0.2| for any attribute and generally fall well below this value. In momentary assessments, even the time effect is more pronounced than the HA program contrast, with beta coefficients >|0.2|. Notably, global rating, sound quality, and HA satisfaction progressively worsened over time, while listening effort improved (lower listening effort in the second than in the first week). The significance criterion is consistently met by the factors background sounds, environment, and week as the time variable, but not by the HA program in any model of momentary ratings.
In contrast, the models related to retro-selective assessments showed a different pattern (Table 3, Figure 5). The effects of time, the presence of background sounds, and nondomestic environments were markedly reduced, with the corresponding coefficients ranging from 0 to |0.5| and often not meeting the significance criterion. On the other hand, differences between the HA programs became apparent (see also Figure 5). The associated beta coefficients ranged from |0.3| to |0.7| in favor of the advanced HA program. Ratings for the advanced HA program were significantly better for the attributes global rating, sound quality, and listening effort.
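To give a sense of the effect size, the logit coefficients can be converted into odds ratios by exponentiation. Taking the sound-quality coefficient for the advanced program from the retro-selective models (Table 3) as an example:

$$\beta = 0.492 \;\Rightarrow\; \mathrm{OR} = e^{0.492} \approx 1.64,$$

i.e., under the advanced program the odds of a sound-quality rating falling into a better rather than a worse category were roughly 1.6 times those under the basic program, holding the other predictors constant (assuming proportional odds).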
The CLMMs for the retro-combined condition shown in Table 4 and Figure 5 qualitatively reflect the estimates of the momentary ratings, largely due to the sample distribution (21% retro-selective vs. 79% momentary).
The modeling was repeated separately for the environments and each reporting period. As predictors, HA program, presence of background sounds, and the time variable were used. For ratings related to the environment home, no effect for the HA program was detectable in any reporting period. Rating differences between the two HA programs were only observed in the mobility and other environments. In these nondomestic environments, for all three reporting periods, including the momentary reporting, the beta coefficients indicating differences between HA programs were often significant or nearly significant across models. Specifically, this applies to the attributes of sound quality and listening effort in environments categorized as other, as well as the attributes HA satisfaction, global rating, and sound quality in mobility environments.
Unlike the daytime surveys, the CLMM analysis of the EoD data for the attributes HA satisfaction and listening effort indicated no effect, either for the HA program or the time variable (Table 5, Figure 5).
As mentioned in the Methods subsection, the study design implies a dependency in the ratings across the reporting periods. As this dependency can only be effectively addressed for the momentary ratings, a binary variable was added to the respective CLMM to indicate the presence of a short-term retrospective rating in the same survey. This binary factor significantly affected the momentary ratings for global rating and listening effort. The momentary ratings were poorer when a short-term retrospective rating followed, compared to when participants reported no change in situation or perception in the last 30 min. As in the models on momentary ratings presented in Table 2 and Figure 5, no significant association with the hearing program was observed here either.
Discussion
This study examined a methodological question, namely, which EMA design is more sensitive to small differences between two conditions. While we focused here on HA program evaluation, we assume that methodological improvements are also valid for other treatments or conditions under test. The premise was that differences between HAs or HA features that can be reliably observed in laboratory settings often are not manifest in EMA surveys (e.g., Wu et al., 2019). This discrepancy may be due to the fact that the HA features being examined do not produce perceivable differences in everyday acoustic environments, or because the EMA method is not sensitive enough to reliably capture existing perceptual differences. In particular, sensitivity may be reduced if challenging situations are more frequently modified in the HA program that provides less benefit. The situation before modification may be easier to report in a short-term retrospective. The primary research question therefore focused on the impact of the reporting period on the observed contrast. In this study, experienced HA users were fitted with study HAs that alternated daily between a basic and an advanced program. Participants were asked to provide both a strictly momentary assessment of their listening experience, and, if any changes in their listening situation or perception occurred within the previous 30 min, to recall and evaluate the worst listening experiences during that period. If no change was reported, the approach assumed that the current rating applied to every moment and situation within the preceding 30-min interval. Based on this, three reporting types for daytime assessments were distinguished in the analysis: momentary, retro-selective, and retro-combined. Additionally, at the end of each day the participants were asked to provide an overall assessment of listening effort and HA satisfaction.
Higher Contrast in Short-Term Retrospective Reporting
In summary, the study demonstrated that the observed contrasts between the basic and the advanced HA program were clearly associated with the type of reporting period used. No differences were evident in the strictly momentary assessments, whereas the retro-selective assessments indicated a favorable difference for the advanced program, such that the criteria for statistical significance were met for the attributes global rating, sound quality, and listening effort (not HA satisfaction). The most pronounced effect was observed for sound quality. Only 21% of the surveys indicated that participants experienced a change in their listening situation or perception in the 30 min prior. Consequently, the HA difference was not apparent for the retro-combined reporting period, which used momentary ratings when retro-selective data were unavailable. Only the attribute sound quality showed a trend, narrowly missing statistical significance. The EoD questionnaires revealed no discernible differences between the two HA programs.
Distribution of Real-World Context Factors by Reporting Period
The reporting periods used in the daytime surveys related to different selection principles and resulted in different distributions of listening intentions and environments in the EMA data. While home situations accounted for approximately two-thirds of the momentary and retro-combined assessments, they comprised less than one-third of the retro-selective assessments. Roughly two-thirds of the retro-selective assessments related to nonhome situations, such as mobility, work, or various social activities. Momentary and retro-selective surveys also showed differences in their distributions of CoSS intention categories. The distribution in the momentary assessments aligned closely with findings from other EMA studies on similar populations. Specifically, the CoSS intention levels in momentary assessments were 33% for communication, 24% for focused listening, and 43% for nonspecific situations, which is similar to the results reported by Smeds et al. (2019) with 30%, 23%, and 47%, Smeds et al. (2020) with 31%, 23%, and 45%, and Jensen et al. (2019) with approximately 33%, 27%, and 40% (respectively, extracted from plots). In contrast, the retro-selective assessments revealed a lower proportion of focused listening, with a higher share of nonspecific intentions. Although distribution differences in CoSS intention categories suggested their inclusion in CLMM modeling, exploratory analyses revealed several drawbacks. Models using CoSS intention categories as predictors showed no significant effects on the attributes and tended toward poorer model fit, as indicated by higher AIC values. Importantly, for our main research question, model comparability was compromised, as listening effort ratings cannot be reasonably modeled for the “passive listening” category.
Different reporting types captured distinct time segments of daily life, as also shown by HA classifier data and SPL distributions. The 30-min intervals prior to change surveys, i.e., surveys including retro-selective assessments, showed higher SPLs and a larger proportion of segments categorized as Car and Noise by HA classifiers, compared to surveys with only momentary assessments. This suggests that HA preferences were pronounced in nonhome situations and might be masked by the predominance of quiet and easy listening situations in momentary assessments, which are frequently sampled by random prompts. This finding aligns with Bosman et al. (2021) and Christensen et al. (2024a, 2024b), who emphasized the value of including HA classifier data to differentiate various, including more challenging, everyday situations in HA comparisons. However, in our study, two important reasons argued against using HA data for modeling. First, there was a notable reduction in cases due to Bluetooth connectivity issues. Second, and more importantly, modeling short-term retrospective ratings required selecting an appropriate metric for the 30-min interval data—such as the mean, median, maximum, or another parameter—which added complexity. In our study, we investigated EMA design modifications that could improve sensitivity to smaller differences, acknowledging that contextual information provided by the HA or other technical equipment might not always be available.
Time Effects
Particular attention is warranted for the time effects observed in this study, which may suggest a form of reactivity. Although the AStra questionnaire completed before and after the EMA phase indicated no changes in the participants’ attitudes towards their hearing problems or in how they managed them, the EMA data themselves revealed mostly small, but robust, changes in several respects. First, there was a decline over time in both the number of completed surveys and the proportion of surveys reporting a change of listening situation or perception. This may be attributed to the cumulative burden over a 2-week EMA period: participants had to respond to up to 13 additional items when reporting any change in situation or perception. It is therefore possible that participants opted for “no change” as a shortcut to end the survey, even though, upon further reflection, they might have recalled changes. Ultimately, we cannot determine to what extent participants took this shortcut.
More serious than these numerical reductions are the shifts in ratings. Previous EMA hearing studies typically did not detect reactivity to repeated measurements (e.g., Jenstad et al., 2021; Timmer et al., 2017; von Gablenz et al., 2021). Commonly, however, reactivity was evaluated with questionnaires completed before and after the EMA phase, rather than by examining the EMA data themselves. In an earlier study with a short EMA phase lasting only four days, we employed the latter approach and found no evidence of reactivity. Christensen et al. (2024b) incorporated a time variable (day) into their EMA modeling, but did not report any outcomes regarding this parameter.
In the current study, participants were equipped with HAs and HA programs that were unfamiliar to them, and no adaptation period was included before the EMA phase started. As a result, temporal changes due to adaptation would not have been unexpected. For example, in an EMA study comparing HA programs, Bosman et al. (2021) found improvements in sound-quality ratings and program suitability over time. However, the changes observed in our study point in a different direction. Specifically, the ratings of HA satisfaction, global rating, and sound quality, all attributes closely tied to the HAs, deteriorated over time. This trend argues against an adaptation effect. Instead, it suggests a focus effect similar to that described by Lelic et al. (2023, 2024), but operating in the opposite direction. Lelic et al. asked participants to report positive experiences related to their hearing and found that HA satisfaction was significantly higher among experienced users (Lelic et al., 2023) and first-time users (Lelic et al., 2024) than in control groups not instructed to report positive perceptions. The current study adopted the opposite approach of “negative focusing” by inquiring about the worst listening experience in the retro-selective survey segment. However, not all changes in ratings over time followed the same trend. The complex CLMM showed an improvement for listening effort, that is, a decrease in reported effort over time, suggesting an adaptation effect. This opposing temporal shift in ratings might seem paradoxical, but it is important to note that HA satisfaction, global rating, and sound quality are assessments of the device, an external reference point possibly subject to high expectations, whereas listening effort reflects an internal personal experience. It might also simply be a variant of the “honeymoon” effect, known for first-time HA users but also discussed for experienced and refitted HA users (Wolff et al., 2024). Further research is necessary to determine to what extent the observed effects are indeed negative focus effects. In light of a possible negative focus effect, it seems advisable to restrict EMA designs that specifically sample negative experiences to selected populations (avoiding, for example, their use in HA first-fit assessments) and to rather short durations.
Despite the temporally opposing changes in the ratings of listening effort and the other attributes, the ratings remained strongly correlated, as observed in other EMA studies (e.g., Christensen et al., 2024b; von Gablenz et al., 2021).
Limitations
One limitation of this study is the absence of a preparatory phase. This affected both the adaptation to the study HAs and the conduct of EMA, which, in other studies, was preceded by a several-day practice phase (e.g., Wu et al., 2018). Furthermore, participant responses in the final interview indicated that the EMA task was challenging. Although clearly negative ratings were very rarely given, about a third of the responses often fell into only moderately positive categories. For example, in response to whether the instructions were clearly understandable, 14 participants answered “yes” and eight participants answered “rather yes.” Recalling the previous 30 min was considered “easy” by eight participants and “somewhat easy” by 11 participants, although three participants found this task “somewhat difficult.” Additionally, six participants found it somewhat difficult and two found it difficult to distinguish between current and past assessments, while five found it “easy” and nine found it “rather easy.” This supports the considerations of Stone et al. (2023) that combining momentary EMA with “coverage EMA” within a single momentary protocol is efficient but raises new questions, particularly whether participants actually switch reporting periods or maintain them. The EMA study indeed involved a complex decision-making process. Initially, participants needed to recall the events and perceptions from the preceding 30 min to determine any differences from the current situation. Then, they had to make a selection decision, considering both perceptual and temporal aspects. Choosing and assessing the worst listening experience can be especially challenging when there are no inherently bad or poor experiences, only relatively less favorable ones among generally good or neutral experiences. Overall, it is not typical in daily life to continuously monitor changes in such a detailed manner and to make multiple decisions about perceptual differences, as required in the current study. If changes were reported, the survey extended beyond the momentary assessment, with additional items that might, at times, have tempted participants to report no changes. Having two separate EMA phases for momentary and retro-selective assessments might have simplified the task for participants, eliminated the risk of unreported changes, and provided a broader data base. Such an EMA design would also have removed the dependency between momentary and short-term retrospective ratings (in terms of timing, not participants). However, this approach was not feasible for this research project due to the increased effort required.
In the analysis, two aspects appeared peculiar from the data analysts’ perspective. First, certain data entries showed inconsistencies in temporal progression. Participants could initiate a survey at any time on their own, even at intervals much shorter than 30 min. For example, a total of 34 surveys were initiated within 10 min or less of the previous one, including 20 with an interval of 3 min or less in which the described environment and listening intention remained unchanged. These 20 surveys mostly exhibited only minor differences in description and ratings and were fairly evenly distributed across seven different participants, without any notable concentration. Second, some participants’ ratings showed very low variance, and one participant showed essentially none: across different listening situations and intentions, 79 out of 82 surveys contained identical categorical selections for all four attributes, and only four of 414 ratings deviated to the adjacent category. It remains unclear whether this participant genuinely perceived no differences or whether the scale was too coarse to capture the nuances this participant experienced. Both aspects (surveys initiated in rapid succession and extremely low variance in ratings) are puzzling but do not, in our view, justify excluding the data. Data collection with EMA is largely beyond the researchers’ control; since skipping a survey is easier than responding without due care, we assume that participants answered with appropriate care.
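The two plausibility checks described in this paragraph can be expressed in a few lines of pandas. The sketch below uses hypothetical column names for the survey table and simply flags rapid successive survey starts and participants with near-constant ratings; it is an illustration of the checks, not the code used in the study.

```python
# Sketch of the two plausibility checks discussed above (hypothetical columns:
# 'participant', 'start_time', and the four attribute ratings).
import pandas as pd

surveys = surveys.sort_values(["participant", "start_time"])
gap_min = surveys.groupby("participant")["start_time"].diff().dt.total_seconds() / 60

rapid = surveys[gap_min <= 10]   # surveys started within 10 min of the previous one
very_rapid = surveys[gap_min <= 3]

attrs = ["satisfaction", "global_rating", "sound_quality", "listening_effort"]
# Participants who used at most two (adjacent) categories per attribute
n_categories = surveys.groupby("participant")[attrs].nunique()
near_constant = n_categories[(n_categories <= 2).all(axis=1)]
```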
Conclusion
The definition of the reporting period in audiologically motivated EMA studies should be considered carefully. This study demonstrated that EMA surveys referencing recent past experiences and incorporating a selective element capture a distinct distribution of real-world snapshots compared to a strictly momentary approach. Specifically, the retro-selective assessments showed a stronger contrast between the advanced and basic HA program than the strictly momentary assessments. This suggests that short-term retrospective, selective reporting could enhance the sensitivity of the EMA method to small perceptual differences.
Supplemental Material
Supplemental material, sj-docx-1-tia-10.1177_23312165261421698 for Beyond the Moment: How EMA Reporting Periods Affect Sampled Situations and Sensitivity to Hearing Aid Differences by Petra von Gablenz, Inga Holube and Nadja Schinkel-Bielefeld in Trends in Hearing
Acknowledgments
We thank Patricia Fürstenberg for hearing aid fitting, audiometric assessments, and study assistance. English language services were provided by www.stels-ol.de.
Footnotes
ORCID iDs: Petra von Gablenz https://orcid.org/0000-0002-7346-7950
Inga Holube https://orcid.org/0009-0001-1936-8855
Nadja Schinkel-Bielefeld https://orcid.org/0000-0003-1273-176X
Ethical Approval and Informed Consent Statements: The study was approved by the ethics committee of Carl von Ossietzky University Oldenburg (Kommission für Forschungsfolgenabschätzung und Ethik, Approval No. EK/2022/065), and all participants gave written informed consent.
Funding: The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: Data collection was funded by WS Audiology. Data analysis and manuscript preparation were supported with funds from the governmental funding initiative zukunft.niedersachsen of the Volkswagen Foundation, project “Data-driven health (DEAL)”. Publication was supported by the Open Access Publication Fund of Jade University of Applied Sciences.
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Data Availability: Fully anonymized datasets from this study are available upon reasonable request.
Supplemental Material: Supplemental material for this article is available online.
References
- Amlani A. M., Schafer E. C. (2009). Application of paired-comparison methods to hearing aids. Trends in Amplification, 13(4), 241–259. 10.1177/1084713809352908
- Andersson K. E., Andersen L. S., Christensen J. H., Neher T. (2021). Assessing real-life benefit from hearing-aid noise management: SSQ12 questionnaire versus ecological momentary assessment with acoustic data-logging. American Journal of Audiology, 30(1), 93–104. 10.1044/2020_AJA-20-00042
- Bauer D. J., Sterba S. K. (2011). Fitting multilevel models with ordinal outcomes: Performance of alternative specifications and methods of estimation. Psychological Methods, 16(4), 373. 10.1037/a0025813
- Borschke I., Jürgens T., Schinkel-Bielefeld N. (2024). How individuals shape their acoustic environment: Implications for hearing aid comparison in ecological momentary assessment. Ear and Hearing, 45(4), 985–998. 10.1097/AUD.0000000000001490
- Bosman A. J., Christensen J. H., Rosenbom T., Patou F., Janssen A., Hol M. K. (2021). Investigating real-world benefits of high-frequency gain in bone-anchored users with ecological momentary assessment and real-time data logging. Journal of Clinical Medicine, 10(17), 3923. 10.3390/jcm10173923
- Christensen J. H., Rumley J., Gil-Carvajal J. C., Whiston H., Lough M., Saunders G. H. (2024a). Predicting individual hearing aid preference from self-reported listening experiences in daily life. Ear and Hearing, 45(5), 1313–1325. 10.1097/AUD.0000000000001520
- Christensen J. H., Whiston H., Lough M., Gil-Carvajal J. C., Rumley J., Saunders G. H. (2024b). Evaluating real-world benefits of hearing aids with deep neural network-based noise reduction: An ecological momentary assessment study. American Journal of Audiology, 33(1), 242–253. 10.1044/2023_AJA-23-00149
- Fischer R.-L., Williger B., Incerti L., Kamin S. T. (2025). Hearing-related adaptive strategies: Development and initial validation of the AStra scale. Zeitschrift für Gerontologie und Geriatrie, 59, 67–73. 10.1007/s00391-025-02482-w
- Fürstenberg P., Holube I., Lelic D., Schinkel-Bielefeld N. (2024). Memory bias in Ecological Momentary Assessment – a pilot study. GMS Zeitschrift für Audiologie – Audiological Acoustics, 6(Doc23), 1–6. 10.3205/zaud000058
- Galvez G., Turbin M. B., Thielman E. J., Istvan J. A., Andrews J. A., Henry J. A. (2012). Feasibility of ecological momentary assessment of hearing difficulties encountered by hearing aid users. Ear and Hearing, 33(4), 497–507. 10.1097/AUD.0b013e3182498c41
- Glista D., O'Hagan R., Beh K., Crukley J., Scollie S., Cornelisse L. (2024). Real-world assessment of listener preference for hearing aid technology levels in socially involved situations. Frontiers in Audiology and Otology, 2, 1430992. 10.3389/fauot.2024.1430992
- Glista D., O’Hagan R., Van Eeckhoutte M., Lai Y., Scollie S. (2021). The use of ecological momentary assessment to evaluate real-world aided outcomes with children. International Journal of Audiology, 60(sup1), S68–S78. 10.1080/14992027.2021.1881629
- Henry J. A., Galvez G., Turbin M. B., Thielman E. J., McMillan G. P., Istvan J. A. (2012). Pilot study to evaluate ecological momentary assessment of tinnitus. Ear and Hearing, 32(2), 279–290. 10.1097/AUD.0b013e31822f6740
- Holube I., Fredelake S., Vlaming M., Kollmeier B. (2010). Development and analysis of an international speech test signal (ISTS). International Journal of Audiology, 49(12), 891–903. 10.3109/14992027.2010.506889
- Holube I., von Gablenz P., Bitzer J. (2020). Ecological momentary assessment in hearing research: Current state, challenges, and future directions. Ear and Hearing, 41, 79S–90S. 10.1097/AUD.0000000000000934
- Humes L. E., Rogers S. E., Main A. K., Kinney D. L. (2018). The acoustic environments in which older adults wear their hearing aids: Insights from datalogging sound environment classification. American Journal of Audiology, 27(4), 594–603. 10.1044/2018_AJA-18-0061
- Jensen N. S., Hau O., Lelic D., Herrlin P., Wolters F., Smeds K. (2019). Evaluation of auditory reality and hearing aids using an ecological momentary assessment (EMA) approach. In Proceedings of the 23rd International Congress on Acoustics, Aachen, Germany (pp. 6545–6552). German Acoustical Society.
- Jenstad L. M., Singh G., Boretzki M., DeLongis A., Fichtl E., Ho R., Huen M., Meyer V., Pang F., Stephenson E. (2021). Ecological momentary assessment: A field evaluation of subjective ratings of speech in noise. Ear and Hearing, 42(6), 1770–1781. 10.1097/aud.0000000000001071
- Jorgensen E., Xu J., Chipara O., Oleson J., Galster J., Wu Y.-H. (2022). Auditory environments and hearing aid feature activation among younger and older listeners in an urban and rural area. Ear and Hearing, 44(3), 603–618. 10.1097/AUD.0000000000001308
- Keidser G., Dillon H., Flax M., Ching T., Brewer S. (2011). The NAL-NL2 prescription procedure. Audiology Research, 1(e24), 88–90. 10.4081/audiores.2011.e24
- Leijon A., von Gablenz P., Holube I., Taghia J., Smeds K. (2023). Bayesian analysis of Ecological Momentary Assessment (EMA) data collected in adults before and after hearing rehabilitation. Frontiers in Digital Health, 5(1100705), 1–13. 10.3389/fdgth.2023.1100705
- Lelic D., Nielsen J., Parker D., Marchman Rønne F. (2021). Critical hearing experiences manifest differently across individuals: Insights from hearing aid data captured in real-life moments. International Journal of Audiology, 61(5), 428–436. 10.1080/14992027.2021.1933621
- Lelic D., Parker D., Herrlin P., Wolters F., Smeds K. (2023). Focusing on positive listening experiences improves hearing aid outcomes in experienced hearing aid users. International Journal of Audiology, 63(6), 420–430. 10.1080/14992027.2023.2190006
- Lelic D., Wolters F., Herrlin P., Smeds K. (2022). Assessment of hearing-related lifestyle based on the common sound scenarios framework. American Journal of Audiology, 31(4), 1299–1311. 10.1044/2022_AJA-22-00079
- Lelic D., Wolters F., Schinkel-Bielefeld N. (2024). Measuring hearing aid satisfaction in everyday listening situations: Retrospective and in-situ assessments complement each other. Journal of the American Academy of Audiology, 35(01/02), 30–39. 10.1055/a-2265-9418
- Rintala A., Wampers M., Lafit G., Myin-Germeys I., Viechtbauer W. (2023). Perceived disturbance and predictors thereof in studies using the experience sampling method. Current Psychology, 42(8), 6287–6301. 10.1007/s12144-021-01974-3
- Schinkel-Bielefeld N., Burke L., Holube I., Iankilevitch M., Jenstad L. M., Lelic D., Naylor G., Singh G., Smeds K., von Gablenz P., Wolters F., Wu Y.-H. (2024). Implementing ecological momentary assessment in audiological research: Opportunities and challenges. American Journal of Audiology, 33(3), 648–673. 10.1044/2024_AJA-23-00249
- Schinkel-Bielefeld N., Kunz P., Zutz A., Buder B. (2020). Evaluation of hearing aids in everyday life using ecological momentary assessment: What situations are we missing? American Journal of Audiology, 29(3S), 591–609. 10.1044/2020_AJA-19-00075
- Schinkel-Bielefeld N., Ritslev J., Lelic D. (2023). Reasons for ceiling ratings in real-life evaluations of hearing aids: The relationship between SNR and hearing aid ratings. Frontiers in Digital Health, 5, 1134490. 10.3389/fdgth.2023.1134490
- Shiffman S., Stone A. A., Hufford M. R. (2008). Ecological momentary assessment. Annual Review of Clinical Psychology, 4(1), 1–32. 10.1146/annurev.clinpsy.3.022806.091415
- Smeds K., Dahlquist M., Larsson J., Herrlin P., Wolters F. (2019). LEAP, a new laboratory test for evaluating auditory preference. In Proceedings of the 23rd International Congress on Acoustics, Aachen, Germany (pp. 7608–7615). German Acoustical Society.
- Smeds K., Gotowiec S., Wolters F., Herrlin P., Larsson J., Dahlquist M. (2020). Selecting scenarios for hearing-related laboratory testing. Ear and Hearing, 41, 20S–30S. 10.1097/AUD.0000000000000930
- Smeds K., Wolters F., Rung M. (2015). Estimation of signal-to-noise ratios in realistic sound scenarios. Journal of the American Academy of Audiology, 26(2), 183–196. 10.3766/jaaa.26.2.7
- Stone A. A., Schneider S., Smyth J. M. (2023). Evaluation of pressing issues in ecological momentary assessment. Annual Review of Clinical Psychology, 19, 107–131. 10.1146/annurev-clinpsy-080921-083128
- Timmer B. H., Hickson L., Launer S. (2017). Ecological momentary assessment: Feasibility, construct validity, and future applications. American Journal of Audiology, 26(3S), 436–442. 10.1044/2017_AJA-16-0126
- Timmer B. H., Hickson L., Launer S. (2018). Do hearing aids address real-world hearing difficulties for adults with mild hearing impairment? Results from a pilot study using ecological momentary assessment. Trends in Hearing, 22, 1–15. 10.1177/2331216518783608
- von Gablenz P., Kowalk U., Bitzer J., Meis M., Holube I. (2021). Individual hearing aid benefit in real life evaluated using ecological momentary assessment. Trends in Hearing, 25, 1–18. 10.1177/2331216521990288
- Wagener K., Hansen M., Ludvigsen C. (2008). Recording and classification of the acoustic environment of hearing aid users. Journal of the American Academy of Audiology, 19(4), 348–370. 10.3766/jaaa.19.4.7
- Wolff A., Houmøller S. S., Tsai L.-T., Hougaard D. D., Gaihede M., Hammershøi D., Schmidt J. H. (2024). The effect of hearing aid treatment on health-related quality of life in older adults with hearing loss. International Journal of Audiology, 63(7), 500–509. 10.1080/14992027.2023.2218994
- Wolters F., Smeds K., Schmidt E., Christensen E. K., Norup C. (2016). Common sound scenarios: A context-driven categorization of everyday sound environments for application in hearing-device research. Journal of the American Academy of Audiology, 27(7), 527–540. 10.3766/jaaa.15105
- Wu Y.-H., Bentler R. A. (2012). Do older adults have social lifestyles that place fewer demands on hearing? Journal of the American Academy of Audiology, 23(9), 697–711. 10.3766/jaaa.23.9.4
- Wu Y.-H., Stangl E., Chipara O., Hasan S. S., DeVries S., Oleson J. (2019). Efficacy and effectiveness of advanced hearing aid directional and noise reduction technologies for older adults with mild to moderate hearing loss. Ear and Hearing, 40(4), 805–822. 10.1097/AUD.0000000000000672
- Wu Y.-H., Stangl E., Chipara O., Hasan S. S., Welhaven A., Oleson J. (2018). Characteristics of real-world signal to noise ratios and speech listening situations of older adults with mild to moderate hearing loss. Ear and Hearing, 39(2), 293–304. 10.1097/AUD.0000000000000486
- Wu Y.-H., Stangl E., Chipara O., Zhang X. (2020a). Test-retest reliability of ecological momentary assessment in audiology research. Journal of the American Academy of Audiology, 31(8), 599–612. 10.1055/s-0040-1717066
- Wu Y.-H., Stangl E., Oleson J., Caraher K., Dunn C. C. (2021). Personal characteristics associated with ecological momentary assessment compliance in adult cochlear implant candidates and users. Journal of the American Academy of Audiology, 33(3), 158–169. 10.1055/a-1674-0060
- Wu Y.-H., Xu J., Stangl E., Pentony S., Vyas D., Chipara O., Gudjonsdottir A., Oleson J., Galster J. (2020b). Why ecological momentary assessment surveys go incomplete?: When it happens and how it impacts data. Journal of the American Academy of Audiology, 32(1), 16–26. 10.1055/s-0040-1719135