American Journal of Audiology. 2025 Apr 7;34(2):305–320. doi: 10.1044/2025_AJA-24-00226

Effective Design for Experiments on Small-Group Conversation: Insights From an Example Study

Raluca Nicoras a, Bryony Buck a, Rosa-Linde Fischer b, Matthew Godfrey a, Lauren V Hadley a, Karolina Smeds a,c, Graham Naylor a
PMCID: PMC12151282  PMID: 40194778

Abstract

With current advances in experimental techniques, there is renewed interest in studying communication behavior, reflecting a desire to improve our understanding of hearing disability and the effects of treatment interventions at the level of in-the-moment behaviors. Group conversations are among the most challenging situations for people with hearing loss. Experiments on group conversations are increasingly common and disproportionately more demanding to design and execute than dyad studies. Thorough design and planning are critical for successfully capturing valid behavioral data, highlighting the value of sharing behind-the-scenes experiences with the researcher community.

We have completed a laboratory study of four-way group conversations involving people with and without hearing loss. This article describes the goals and compromises involved in our design choices and evaluates their effectiveness through participant feedback. Aspects covered include contrasts and covariates, group composition and physical arrangement, participant characteristics, hearing devices, participant experience, physical environment, conversational task, and measurement modalities. Next, we briefly describe the experiment's execution. Finally, we analyze and discuss participants' feedback and reflect on what proved effective, what did not, and what design “worries” proved founded or unfounded.

We hope thereby to provide support and inspiration for others who may be faced with similar design challenges. The main message is that such an experiment can be carried out successfully and in such a way that the behavioral and self-report data collected are likely to carry a relatively high degree of ecological validity while still supporting experimental and statistical control.


Group conversations fulfill our need for social connections and enhance our sense of community. Whether it is a family dinner-table discussion, a business meeting, or a group therapy session, a group conversation is a complex dynamic system. Group conversations are of interest across many research areas, including sociology, psychology, linguistics, and computer science. In hearing science, we wish to improve our understanding of hearing disability through analyzing and interpreting the effects of hearing loss and treatment interventions on in-the-moment conversation success and behavior.

Previous research has largely focused on conversations between two people, although many conversations involve more than two interlocutors (Dunbar et al., 1995; Peperkoorn et al., 2020). Acknowledging that group conversations, especially in noisy backgrounds, are among the most challenging situations for people with hearing loss (PwHL; Vas et al., 2017), increasing numbers of research groups are undertaking experiments to study moment-to-moment behaviors in such conversations (Hadley et al., 2021; Örnolfsson et al., 2023; Petersen, 2024). Interactions involving more than two people introduce new complexities and affordances: less predictable speaker sequences, more competition for the floor, and a larger repertoire of behaviors (due to additional possible conversational roles). Focusing solely on dyads cannot illuminate all relevant aspects of conversational interaction, and understanding hearing disability as it occurs in group conversations requires more than extrapolation from the dyad case.

Moving from dyads to groups, experiments become disproportionately more complex to design and execute, including the demands on technology, researchers, space, time, and logistics. As group size increases, the number of possible interactive effects inflates drastically. Group participants may perceive greater social pressure to conform (Bond, 2005), posing a potential ethical issue, as individuals may be reluctant to withdraw consent or request a break. These challenges highlight the importance of thorough design, planning, and execution for successful experiments on group conversations.

Purpose of This Article

Given the high cost in terms of execution, participant burden, and novel ethical considerations, we see it as important to share our behind-the-scenes story for the benefit of other researchers. Typically, articles only report the final experimental design. However, valuable information can be missed when the design process is not disclosed. This article is an attempt to describe the goals and compromises involved in our choices and to evaluate their effectiveness. We hope thereby to provide a useful reference for others who may be faced with similar design challenges.

We first detail our study objectives and then elaborate on the compromises involved in designing our experiment. Next, we briefly describe the experiment's execution. Finally, we examine participants' postsession feedback and reflect on our insights from what proved effective, what did not, and what concerns proved founded or unfounded. Detailed analyses and interpretations of the results of our study will be reported in later publications.

Design Goals and Compromises

Research Objectives and Scope

Our research objective was to illuminate differences in communication behaviors and perceptions of conversation success between people with normal hearing (PwNH) and PwHL, during group conversations held in different levels of adversity. We chose to manipulate noise level and hearing aid (HA) use as forms of listening adversity, assessing the effect of manipulating these by collecting behavioral data via video, audio, and three-dimensional (3D) motion capture, alongside self-reports of conversation success. By recording both behavioral and subjective data, we aimed for a multifaceted understanding of the participants' experiences. Full technical specifications of the experiment are omitted from this section in favor of issues of principle.

Overarching Design Goals

Defining conditions, variables, and experimental manipulations of interest is not straightforward. Our final experimental design was the culmination of a series of linked decisions concerning experimental confounds and covariates. This section describes the design goals and the compromises that were considered and integrated to promote the best possible ecological validity of outcomes (Keidser et al., 2020) within our research aims and practical constraints. To achieve this, participants should exhibit behavior as close as possible to that which they would exhibit in a corresponding real-life situation. This implies that participants must experience sufficient realism in diverse aspects of the experimental situation, despite knowing that it is an artificial experiment. It further implies that they must find the experimental encounter itself acceptable, that is, not eliciting affective reactions that might distort their behavior.

The extent to which the two driving factors of Realism and Acceptability were satisfactorily fulfilled cannot be deduced by examining the objective behavioral outcomes themselves; it has to be assessed by other means. The only means available is participant report after the fact, and this is the approach taken in this article. Nevertheless, we did not blindly design the experiment and hope for the best. We preemptively considered design factors with respect to their likely effects on Realism and Acceptability, based on existing evidence and logic. Ideal choices could not be made on all design factors simultaneously, and hence, we also considered the joint effects of various compromises.

All the decisions described below serve the goals of Realism and Acceptability, either directly or indirectly. Whether these goals were fulfilled is considered in the Results and Discussion sections.

Contrasts, Covariates, and Overall Factorial Design

In group conversation studies, the number of between-participant covariates potentially influencing outcomes increases with the level of ambition with respect to empirical realism. Covariates must be either controlled as explicit contrasts, minimized through counterbalancing or participant inclusion/exclusion criteria, measured and accounted for statistically, or simply acknowledged as risk factors. Immediately obvious covariates that could plausibly affect outcomes, and which we decided to control or balance at recruitment time, were age, hearing ability, and gender.

Potential within-participant contrasts that could plausibly affect perceived conversation success and behaviors include environment (e.g., noise, reverberation, positioning in space), HA operation (e.g., types of HAs/different programs), and conversation type (e.g., relational vs. transactional; Pichora-Fuller et al., 1998; Yeomans et al., 2022). Relational conversations, often occurring between close family and friends, foster bonding and enhance relational satisfaction through activities like joking or sharing personal thoughts. In contrast, transactional conversations are goal oriented, typically work related, and aimed at achieving specific objectives.

Substantial literature confirms that noise and spatial arrangement have major effects on conversational adversity and coping behaviors. As this experiment took place during a time of “social distancing” due to the COVID-19 pandemic, we were constrained to relatively fixed participant positioning, leaving environmental noise, conversation type, and HA operation/use as the primary factors available for manipulation.

Considering our abovementioned research objectives, it was decided to categorize participants by hearing ability (PwNH and PwHL) and manipulate adversity across conversations by varying the noise environment and the wearing of HAs (PwHL aided vs. PwHL unaided). This reduced the possible design considerations to the following, to be detailed in the upcoming sections:

  • Group composition (age, gender, hearing ability), size, and physical arrangement

  • Characteristics of PwHL

  • Specifications of HAs

  • Participant experience

  • Physical environment (acoustic and visual)

  • Conversational task

Group Composition, Size, and Physical Arrangement

Decisions concerning the composition of the conversational group are crucial for a study of this sort. An early decision of principle, with ramifications for many other decisions, was to combine PwNH and PwHL in each conversational group. Some behavioral accommodations are known to occur in PwNH when conversing with PwHL (e.g., Beechey et al., 2020; Hazan & Baker, 2011), and we wished to be able to observe any such mutual influences.

Another early decision was to omit a “confederate” or “facilitator” role from the group. A covert confederate would be hard to keep secret over a 90-min session, and suspicion of a confederate among some or all the real participants would undermine the face validity of the experiment. In any case, it was unclear what benefit a confederate or facilitator might bring. For instance, the objective was not to keep talking at all costs; if adversity caused people to stop talking, that would itself constitute informative data.

Concerning the age range(s) to include, given that hearing loss is most prevalent among older adults and that older adults most frequently communicate with others in the age range between their own age and midlife (Ajrouch et al., 2005), we decided to focus exclusively on older adults. This also avoids issues of over- and under-accommodation, stereotyping behaviors, and serial monologuing that occur in intergenerational communication (Samter, 2003).

Turning to the size of the group, the primary balance to be struck was between a size large enough to accommodate group-like behaviors but small enough to be practical in execution. It has been argued that however large a gathering of people may be, a conversation of mutual (rather than one to many) character rarely encompasses more than four people (Krems & Wilkes, 2019). We could not find arguments in favor of a group size greater than four, strong enough to outweigh the increased logistical cost. The choice remained between three and four participants; we chose four for the following reasons:

  1. A four-way conversation can fragment (“schisming”; Egbert, 1997) into two two-way conversations, a phenomenon that might be more prevalent with PwHL, and in adverse conditions.

  2. Withdrawal behaviors are believed to be common among PwHL (e.g., Bennett et al., 2022). A four-way conversation is more “permissive” of withdrawal than a three-way one, rendering any tendency to withdraw more detectable.

  3. We wished to combine PwNH and PwHL. In a group of three, there would be an imbalance. Having a majority of PwHL in a group would not be representative of the most common everyday situations. With PwNH in the majority, the adversity would fall disproportionately on the lone person with hearing loss, perhaps aggravating any feelings of anxiety and estrangement. Although this may reflect the experiences of PwHL in real life, we considered it unacceptable from an ethical perspective. A group of four, with two PwNH and two PwHL, ensures that no individual is left feeling alone with their hearing difficulty.

  4. A group composed of two PwHL (one male, one female) and two PwNH (ditto) would allow us to observe potential in-group behaviors, such as schisming by hearing ability or gender, in-group side comments, or notable differences between in-groups in perceived conversation success.

Henceforth, a conversational group in the experiment is referred to as a “quartet” to avoid confusion with PwNH versus PwHL as a grouping variable. Concerning the physical arrangement, there are two ways of arranging 2 × PwNH + 2 × PwHL in a circle, either with alike hearing and differing gender opposite each other or alike gender and differing hearing opposite each other. We chose the latter, as it made it possible for all participants to establish a subconversation with a neighbor of either hearing type.

Some aspects of conversation behavior are known to differ, depending on whether the interlocutors are mutually acquainted (Templeton et al., 2023). Given our desire to recruit quartets composed as described above, it would be impractical to also require mutual familiarity; therefore, we chose to recruit previously unacquainted participants. We can only note that the outcomes may not be generalizable to groups of familiar interlocutors.

Characteristics of the Group of Participants With Hearing Loss

As we were particularly focused on the effect of hearing loss, we considered many aspects of this for both group composition (above) and experimental conditions (here and below). The severity of the hearing loss of PwHL may be expected to have a substantial effect on outcomes, but recruiting PwHL with tightly matched hearing losses would be impractical. Our chosen approach was (a) to specify a limited range of eligible hearing acuities for the PwHL, covering moderate to severe hearing losses, and (b) to obtain distinct hearing groups (PwNH vs. PwHL) by ensuring a sizeable gap in hearing ability between the worst hearing PwNH and the best hearing PwHL.

We wished to contrast outcomes when PwHL were versus were not wearing HAs. This required decisions about PwHL inclusion criteria concerning everyday HA use. We identified habitual use of HAs and duration of this use to be important factors to consider. Non-users would be likely to behave quite differently from users, when required to wear HAs (they are non-users for a reason). For people who had only recently become hearing aid users, the intensive experience of the experiment itself might disturb their process of acclimatization or, in the extreme, put them off using their HAs. Either outcome would be ethically questionable. Since experienced and habitual users would be unlikely to suffer such disturbance, and they would presumably possess well-established patterns of communication behaviors, we solely recruited PwHL from that population.

A further aspect of HA use, which could generate significant confounds, is whether a participant used an HA in one or both ears, and if in one, on which side. It is easy to see that a participant with unilateral hearing loss and/or amplification is likely to exhibit behaviors (e.g., head rotation) quite different from those of someone with bilateral hearing loss and amplification. Although comparisons between unilateral and bilateral HA users are undoubtedly interesting, including both groups would greatly complicate analysis and interpretation. A comparison between users of unilateral and bilateral HAs was deemed to be better served as a topic for future research. As such, we limited recruitment to people with roughly symmetrical hearing loss and bilateral fittings.

HAs

There were two design decisions to be made regarding HAs, arising from the requirement for PwHL to be receiving HA amplification in some experimental conditions. First, should they wear their own aids, or should we provide aids with amplification prescribed according to the severity of the participant's hearing loss? Arguments in favor of using their own aids were (a) that the participant would be fully accustomed to the sound and presumably would have integrated knowledge of how the devices behave into their own behavioral patterns and (b) that no experimenter-led adjustments would have to be made (e.g., to eliminate feedback) that might disturb the participant(s). If experimenter-provided amplification were used, we would have confidence that all participants were receiving clinically appropriate amplification. On balance, we had no reason to suppose that the participants' own aids were providing substantially inappropriate amplification since they were in daily use. Therefore, we decided that participants should wear their own HAs.

The second decision was whether to allow participants to make adjustments themselves during the session, for example, changing the volume control setting. Almost all modern HAs provide adaptive amplification, so there should be little need for volume control changes in typical environments. In addition, it would be ethically unsound to insist on a participant enduring discomfort, should it occur, by not allowing manual adjustments in the moment. Finally, adjusting the controls is a monitorable behavior, that is, salient data. For these reasons, we decided to advise participants that they could make adjustments to their devices as they saw fit.

The Participant Experience

To promote the aforementioned goals of Realism and Acceptability, the participants' likely experience of the experiment must be carefully considered. To promote Realism, the task and environment should be designed to possess sufficient face validity and intrinsic engagement that the participants behave as spontaneously as possible, and the situations presented in the experiment should be representative of the participants' real-life experiences. To promote Acceptability, participants should not feel pressured to please the experimenter. They should also feel comfortable in their interactions with other participants, not pressured to conform or reveal personal information, and without feeling excessively anxious.

It is likely that the behavior of individual participants would depend on the extent to which members of a quartet were aware of each other's hearing and/or amplification status. Logistically, we could not readily control whether they met and talked among themselves in the waiting area before the session. Furthermore, counterbalancing of experimental conditions would mean that participants in different quartets might reveal or become aware of each other's hearing status at different points within the session. Additionally, any stigma-related behavior by PwHL (e.g., trying to conceal their hearing loss), while valid in itself, would be likely to fail in uncontrolled ways over the course of a session. All in all, it was clear that the best course would be to ensure that participants were aware of each other's hearing status from the start.

Physical Environment

Laboratory studies are created with the goal of controlling critical variables that are uncontrollable outside of the lab. In an experiment concerned with hearing, this inevitably means controlling the acoustical environment. As this experiment was about group communication, we were obliged to consider visual and spatial parameters as well. The goal when designing the environment was not to fool participants that they were somewhere other than in the lab but to facilitate a sufficient “suspension of disbelief” (Böcking, 2008), so they engaged in the task in a realistic manner.

We could not allow participants ambulatory freedom or very close mutual proximity but wished to achieve at least a modicum of realism in a four-way conversation scenario. To satisfy these constraints, we chose to create an environment approximating a café or small function room with respect to acoustics and mutual placement of interlocutors by using multitalker babble (detailed later) as the background sound and seating the participants equidistantly around a circular table. We could be confident that low and high levels of noise would provide substantially differing levels of adversity for most, if not all, participants. In order to elucidate the transition between these states, we added a third, middle noise level. In addition to the babble noise, the soundscape could have included moving sources, transient sounds (such as crockery, cutlery or machinery), and/or music for added realism. However, while this would be unlikely to overcome the visual and contextual reminders signaling the actual situation, it might well add variance to the observed behaviors that would be difficult to attribute to any specific variable. Therefore, we decided against such additions. No attempt was made to simulate a visual setting other than the actual one—four people seated around a table in a typical well-lit sound-attenuated listening laboratory.

Conversational Task

The conversational task plays a core role in the experimental design. It determines many aspects of behavioral constraint and freedom, influences participants' interpretation of the experimental situation and thereby the naturalness of their behavior, and constrains when and how outcome measures may be obtained. In the current study, our goals for the task were (a) to afford all participants a priori equal opportunity to engage in the conversation; (b) to provide a task resembling an everyday conversation with minimum theatrical “props,” so as to increase its face validity for participants and thus maximize the likelihood of natural behavior; and (c) to provide a basis for meaningful participant judgments of various aspects of the conversation. To afford all participants an equal basis for contributing, the task must not require or favor specific prior knowledge, since once such knowledge has been disclosed, the conversation tends to come to an end. Tasks involving physical manipulation of objects elicit a lot of body/head movement and gaze trajectories that are unrelated to the communication. Puzzle-solving tasks, for example, spot-the-difference (Baker & Hazan, 2011) or tangram (Beechey et al., 2019), typically also require substantial amounts of eye gaze away from interlocutors. Thus, these kinds of tasks were not ideal for our purposes. A further goal was to avoid any potentially controversial topic in which participants might be personally invested or at loggerheads (e.g., religion, politics).

Taking all the above into consideration, our participants' task was to converse freely, starting from a provided scenario prompt shaped such as to encourage discussion toward a consensus, but without any pressure to actually conclude. The conversation prompt itself may subtly encourage, more or less, exchange among the group (e.g., “Discuss how the area you live in has changed over the years” will tend to favor long monologues compared to “You have been granted three wishes by a genie. Decide as a group what your three wishes will be”; Buck et al., 2022). In order to minimize any confound arising from the interaction of the specific prompt used in a given conversation and the conditions (noise, aiding) during that conversation, we created six prompts (e.g., “Plan a dinner party, using only food and drink that you all dislike”; complete list in Appendix A) designed to encourage a consensus discussion derived from items used or suggested in previous studies in the literature.

Although light-hearted consensus scenarios limit the likelihood of heated controversy, they present hypothetical situations that are not consequential for participants and which they might not talk about in their daily life. Participants might therefore digress into a meta-conversation (“Why are we talking about this?”) or a different topic entirely. However, this is immaterial for our purposes. Scenarios were deliberately developed to encourage light-hearted engagement rather than serious goal-oriented (potentially competitive) performance. We expected the resulting conversations to comprise a mix of relational and transactional characteristics (Yeomans et al., 2022; for details, see the “Contrasts, Covariates, and Overall Factorial Design” section).

The duration of the conversation task should be sufficient for participants to adjust to each new experimental condition and respond to it, both in behavior and in self-report, bearing in mind that conversation involves continuous ebb and flow. As such, very short conversations were deemed unlikely to provide representative outcome data. A perception of time pressure among participants could affect behavior. On the other hand, conversations should not be so long that participants either run out of things to say or experience boredom or excessive fatigue. Addressing these issues, we chose to cut conversations off after a specified time rather than letting them run on for as long as they could. We originally specified 8 min per conversation, but pilot testing indicated that this was too long, as some conversation topics “ran dry,” and the overall duration of the full experiment session became fatiguing. The duration of each conversation was therefore set to 6 min.

Experimental Design: Summary

Following the decisions detailed above, the factors and conditions remaining in our design were as follows. The experiment was to involve four participants at a time, one male with normal hearing, one male with hearing loss, one female with normal hearing, and one female with hearing loss. Quartets would be instructed to engage in 6-min conversations using consensus-based prompts. The adversity of the situation would differ for each conversation by contrasting whether PwHL were using their (own, bilateral) HAs or not and the background noise level. The overall 2 × 2 × 3 factorial design thus encompassed hearing ability (PwNH/PwHL), HA status (aided/unaided), and background noise level (low/mid/high). The latter two contrasts were repeated measures that might be susceptible to order effects due to familiarization and/or fatigue. Pseudo-counterbalancing across quartets was applied to these contrasts and to the conversation topics by means of a Greco-Latin square. This required 12 quartets for its completion.
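As an illustration of the general counterbalancing idea (a minimal sketch, not necessarily the exact Greco-Latin square used in the study), the following code crosses the two aiding orders with the six possible within-block noise orders to yield 12 quartet schedules and rotates the six topics Latin-square fashion across quartets. All names and data structures are assumptions for illustration.

```python
from itertools import permutations

NOISE_LEVELS = ["low", "mid", "high"]            # 30, 54, 72 dBA
AIDING_ORDERS = [("aided", "unaided"), ("unaided", "aided")]
TOPICS = [f"topic_{i + 1}" for i in range(6)]    # the six consensus prompts

# The six possible within-block orders of the three noise levels.
noise_orders = list(permutations(NOISE_LEVELS))  # 3! = 6

schedules = []
cells = [(a, n) for a in AIDING_ORDERS for n in noise_orders]  # 2 x 6 = 12
for q, (aiding_order, noise_order) in enumerate(cells):
    # Rotate topics Latin-square fashion so that, across the 12 quartets,
    # each topic appears twice in every serial position.
    topics = TOPICS[q % 6:] + TOPICS[:q % 6]
    conversations = []
    for block, aided in enumerate(aiding_order):
        for slot, noise in enumerate(noise_order):
            conversations.append({"block": block + 1,
                                  "aided": aided,
                                  "noise": noise,
                                  "topic": topics[block * 3 + slot]})
    schedules.append({"quartet": q + 1, "conversations": conversations})

assert len(schedules) == 12                      # 12 quartets complete the scheme
for s in schedules[:2]:
    print(s["quartet"], [(c["aided"], c["noise"], c["topic"])
                         for c in s["conversations"]])
```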

Measurement of Outcomes and Behavior

Self-Reports of Conversation Success

A participant's perception of “how well it is going” may vary across time, triggered by events in the conversation. In order to facilitate identification of such events, we wished to include some form of self-report actionable during the conversation itself. An ideal implementation would be (a) minimally disruptive to the conversation behavior of the individual participant making the self-report, (b) minimally likely to influence other participants, (c) available to the participant at all times, (d) time-aligned with other recordings, and (e) made use of to a meaningful extent by participants.

Based on the above, we decided against pen-and-paper notes, calling out of ratings, video recording of hand signals, and periodic interruptions to collect ratings. We decided to use a visual analogue slider scale (Louven et al., 2022) on a small touch screen at table height in front of each participant. This solution is not perfect. Obviously, a participant cannot make a self-report without it interfering to some extent with their conversation behavior. Secreting response devices below the table might reduce the risk of distracting other participants but would make responding itself more demanding and error prone. Furthermore, one participant's visible interaction with their response slider might encourage spurious responses from others.

Participants could tap or drag the slider to a new value whenever they wished during a conversation. We provided only one scale, as shown in Figure 1. The slider was continually visible to each participant during a conversation, and collection of in-the-moment ratings relied on the participants themselves remembering to interact with the screen. Participants were instructed to respond based on their own subjective feeling of success, without being prompted to consider specific criteria. They were encouraged to focus on how successful the conversation felt for them personally, rather than for the group as a whole.

Figure 1.

The image displays a visual rating scale for the question, How would you rate this conversation? The color on the rating scale changes from red on the left, through yellow in the middle, to green on the right. Red represents “not at all successful,” and green represents “very successful.” A hand-shaped slider icon sits over the yellow region.

The slider displayed on the screen as seen by participants during the experiment. Participants could tap or drag the cursor to a new point on the slider to represent their perceived change in conversation success. Leftward: decreased perceived success; rightward: increased perceived success.
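Continuous ratings were collected on the touch screens described above (implemented with the emoTouch software mentioned in the Acknowledgments). As a minimal, generic sketch of the underlying idea, time-stamped logging of every slider change for later alignment with the audio, video, and motion recordings, the code below implements a stand-alone slider that writes session-relative timestamps and values to a CSV file. The 0–100 scale, file name, and widget layout are illustrative assumptions, not the emoTouch interface.

```python
import csv
import time
import tkinter as tk

LOGFILE = "slider_log_p1.csv"        # hypothetical output path

def main():
    root = tk.Tk()
    root.title("How would you rate this conversation?")

    t0 = time.monotonic()            # session-relative clock for later alignment
    log = open(LOGFILE, "w", newline="")
    writer = csv.writer(log)
    writer.writerow(["t_seconds", "rating_0_100"])

    def on_change(value):
        # Fires on every value change as the participant taps or drags the
        # slider; each change is stored with a session-relative timestamp.
        writer.writerow([round(time.monotonic() - t0, 3), value])
        log.flush()

    scale = tk.Scale(root, from_=0, to=100, orient="horizontal", length=400,
                     showvalue=False, command=on_change,
                     label="not at all successful  <->  very successful")
    scale.set(50)                    # start at the midpoint of the scale
    scale.pack(padx=20, pady=20)

    root.mainloop()
    log.close()

if __name__ == "__main__":
    main()
```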

In order to obtain a more nuanced but less time-varying view, a postconversation rating task was also included. Participants were presented with an 11-item survey (see Appendix B) after each conversation, each item accompanied by a 5-point Likert scale, one item per screen. Items were primarily derived from the conversation success factors found by Nicoras et al. (2022).

Recording of Behavior

Conversation is a multimodal activity (Holler & Levinson, 2019), and as such we should expect responses to adversity in conversation also to be multimodal. Counterbalancing the urge to capture everything is the risk that the machinery of capture may substantially impact the behaviors being captured. Our decisions were driven by the desire to strike a satisfactory balance between collecting as rich a behavioral data set as possible and risking that the recording paraphernalia affected participants' behaviors. The practicalities of mounting, calibrating, and checking multiple body-mounted devices on four participants also played a significant role. We considered a wide variety of behavioral recording methods and modalities, including audio, 2D video, 3D motion capture, eye tracking, and pupillometry.

To capture participants' speech in such a way that diarization of individual participants' speech would be possible even against high-level background noise, we used head-mounted directional microphones close to the mouth. An omnidirectional microphone in the center of the table enabled relative level calibration of each participant's voice audio.
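As a sketch of how such relative calibration might work, assuming each participant in turn produces a short calibration utterance captured simultaneously by their headset microphone and the central table microphone (an assumption; the exact procedure may differ), per-channel corrections could be derived as follows.

```python
import numpy as np

def rms_db(x):
    """RMS level of a signal in dB (arbitrary reference)."""
    return 20.0 * np.log10(np.sqrt(np.mean(np.square(x))) + 1e-12)

def relative_corrections(headset_takes, reference_takes):
    """
    headset_takes[i]  : calibration speech of participant i from their headset mic.
    reference_takes[i]: the same utterance captured by the central table mic.
    Returns a dB correction per headset channel that places all channels on a
    common relative level scale (equal level at the table mic then corresponds
    to equal corrected level across headset channels).
    """
    offsets = [rms_db(h) - rms_db(r)
               for h, r in zip(headset_takes, reference_takes)]
    mean_offset = float(np.mean(offsets))
    return [mean_offset - o for o in offsets]

# Illustrative use with synthetic signals (scaled copies of the same "speech").
rng = np.random.default_rng(0)
speech = rng.standard_normal(16000)
headsets = [0.5 * speech, 1.0 * speech, 2.0 * speech, 0.8 * speech]
reference = [0.1 * speech] * 4       # assumes comparable distance to the table mic
print([round(c, 2) for c in relative_corrections(headsets, reference)])
```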

2D video cameras would capture visible, nonverbal behaviors. Four long-focal-length cameras captured individual participants' faces, and a wide-angle camera captured the scene as a whole.

3D infrared motion tracking would allow high-resolution investigation of participants' movement patterns. Using retroreflective markers placed on participants' heads would provide head position and orientation. Although markers across the upper body of participants would allow us to record gestures, we decided to include only head markers to reduce the equipment burden on participants and time needed for calibration.

Many of our elderly participants would be wearing spectacles, and half of them would also have to insert and remove HAs, so we decided not to use head-mounted glasses for recording eye gaze and/or pupillometry. This omission could be partially compensated for by estimating gaze direction from head orientation (Hládek et al., 2019; Lu & Brimijoin, 2022). 2D video recordings could be used to identify hand and face gestures (Mahmoud & Robinson, 2011).
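As a minimal sketch of such a head-orientation-based gaze proxy (our own illustration, not the study's analysis pipeline), the code below picks the interlocutor whose direction deviates least from the head's facing direction, given head position and yaw derived from the motion-capture markers.

```python
import numpy as np

def likely_gaze_target(head_pos, head_yaw_deg, others):
    """
    head_pos     : (x, y) head position in room coordinates (meters).
    head_yaw_deg : head orientation in degrees (0 = +x axis, counterclockwise).
    others       : dict mapping interlocutor label -> (x, y) position.
    Returns (label, angular_error_deg) of the interlocutor whose direction
    deviates least from the head's facing direction.
    """
    head_pos = np.asarray(head_pos, dtype=float)
    best_label, best_err = None, np.inf
    for label, pos in others.items():
        dx, dy = np.asarray(pos, dtype=float) - head_pos
        bearing = np.degrees(np.arctan2(dy, dx))
        err = abs((bearing - head_yaw_deg + 180.0) % 360.0 - 180.0)  # wrap to [0, 180]
        if err < best_err:
            best_label, best_err = label, err
    return best_label, best_err

# Chair centres on a circle 2 m across; same-gender, different-hearing pairs opposite.
seats = {"NH_female": (1.0, 0.0), "NH_male": (0.0, 1.0),
         "HL_female": (-1.0, 0.0), "HL_male": (0.0, -1.0)}
me = seats["NH_female"]
others = {k: v for k, v in seats.items() if k != "NH_female"}
print(likely_gaze_target(me, head_yaw_deg=170.0, others=others))  # -> HL_female
```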

Experiment Execution

This section describes how the experiment was carried out, as the instantiation of the many choices described in previous sections. Technical details are limited to those necessary to assess the participants' experience. Details sufficient for replication will be made available on the Open Science Framework (osf.io).

Participants

Inclusion criteria for all participants were

  • Age 50–75 years

  • Fluent English speaker

  • Self-report of no medical condition (other than hearing loss) impacting communication

  • Self-report of normal or corrected to normal visual acuity

  • Self-report of no concerns about transmission of the coronavirus due to participating

  • Left–right asymmetry of four-frequency (0.5, 1, 2, and 4 kHz) pure-tone average hearing threshold (4F-PTA) < 10 dB HL.

In addition, for the PwNH:

  • Worse-ear 4F-PTA < 15 dB HL.

In addition, for the PwHL:

  • Better ear 4F-PTA of 40–65 dB HL.

  • Experienced (> 6 months) and habitual (> 2 hr/day) user of bilateral HAs.

Recruitment and scheduling difficulties forced us to include a few participants who fell slightly outside these ranges on one or more criteria. As a result, the age range expanded to 50–78 years, the worse-ear 4F-PTA limit for PwNH to 28.75 dB HL, the better-ear 4F-PTA range for PwHL to 35–65 dB HL, and the 4F-PTA asymmetry limit to 15 dB HL. One participant had only 3 months' experience of HA use.
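For concreteness, the audiometric inclusion criteria listed above (in their original, pre-expansion form) can be operationalized as follows; the classification helper and the example audiogram values are our own illustration.

```python
import numpy as np

FOUR_FREQS = (500, 1000, 2000, 4000)   # Hz

def four_freq_pta(ear):
    """Mean threshold (dB HL) across 0.5, 1, 2, and 4 kHz for one ear."""
    return float(np.mean([ear[f] for f in FOUR_FREQS]))

def classify(left, right):
    """
    left, right: dicts {frequency_Hz: threshold_dB_HL} for the two ears.
    Returns 'PwNH', 'PwHL', or 'ineligible' under the original criteria.
    """
    pta_l, pta_r = four_freq_pta(left), four_freq_pta(right)
    better, worse = min(pta_l, pta_r), max(pta_l, pta_r)
    if abs(pta_l - pta_r) >= 10:        # left-right asymmetry must be < 10 dB
        return "ineligible"
    if worse < 15:                      # PwNH: worse-ear 4F-PTA < 15 dB HL
        return "PwNH"
    if 40 <= better <= 65:              # PwHL: better-ear 4F-PTA of 40-65 dB HL
        return "PwHL"
    return "ineligible"

left  = {500: 45, 1000: 50, 2000: 55, 4000: 60}   # 4F-PTA = 52.5 dB HL
right = {500: 40, 1000: 45, 2000: 55, 4000: 60}   # 4F-PTA = 50.0 dB HL
print(classify(left, right))                       # -> 'PwHL'
```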

The final sample consisted of 72 older adults (mean age = 67.9 years, SD = 7.5; 39 females) who completed the experiment. Half of the participants (n = 36) were classified as PwNH, with a mean better-ear 4F-PTA of 13.1 dB HL (range: −2.5 to 28.7 dB HL); none were HA users. The other half (n = 36) were classified as PwHL, with a mean better-ear 4F-PTA of 50.1 dB HL (range: 35 to 65 dB HL); all were experienced bilateral HA users. Participants were unacquainted, fluent in English, and residents of Glasgow, United Kingdom.

Quartets

Eighteen quartet sessions (“Session 2”; see Table 1) generated complete or near-complete data; however, not all of these could be composed with the desired gender balance (five had a 3:1 gender distribution). Participants were invited to attend two sessions and were paid £10 plus travel expenses for each session. All participants gave their written consent for participation in the study. This research received ethical approval from the West of Scotland Research Ethics Committee (18/WS/0007) and the National Health Service Greater Glasgow and Clyde R&D (GN18EN094).

Table 1.

Details of the first and second sessions.

Session 1 (eligibility and preparation; individual; 40–60 min):
  • Informed consent
  • Hearing test (audiometry)
  • 5× questionnaires
  • Experiment familiarization (introduction to testing room; practice using head crowns, mics, and self-report slider; opportunity for questions)
  • Payment

Session 2 (main experiment; group conversation; 90–120 min):
  • Informed consent
  • Introduction to group and briefing
  • Setup, calibration, practice with slider and questions
  • Experiment (instructions given before each conversation): Block 1 (3× conversations), comfort break, Block 2 (3× conversations)
  • Debriefing, payment, model-release consent

Experimental Setup

Figure 2 illustrates the experimental setup. Four participants were seated equidistantly around a circular table (1-m diameter) in a sound-deadened, sound-attenuated room (4.3 m × 4.6 m × 2.6 m). The centers of opposite chairs were 2 m apart. Four loudspeakers playing background sounds were situated in a square around the table, each at a height of 1.4 m and 2.5 m from the table center, offset by 45° from the participants' positions. In addition, four “dummy” loudspeakers were present, providing visually plausible locations for background sound sources but actually acting as video-camera stands. Response tablets were fixed to the table in front of each participant.

Figure 2.

An illustration of the experimental setup. Two male and two female interlocutors are seated around the table so that the two males face each other and the two females face each other. Within each gender pair, one has normal hearing and the other has hearing loss. A microphone is fitted on each interlocutor, and each interlocutor also has a tablet. The table is 1 m in diameter, with opposite chairs 2 m apart. Four active loudspeakers are positioned toward the corners of the room, 2.5 m from the center of the table. Four cameras are placed on inactive loudspeakers, and one wide-angle camera is placed on one of the active loudspeakers in the corner of the room.

Schematic plan of the experimental setup. See text for full explanation. NH = normal hearing; HL = hearing loss.
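For reference, the geometry described above and shown in Figure 2 can be expressed numerically. The sketch below computes illustrative chair and active-loudspeaker coordinates; the coordinate frame and angular conventions are assumptions, while heights and distances follow the description above.

```python
import math

def ring(n, radius, start_deg=0.0):
    """n points equally spaced on a circle of the given radius (meters)."""
    return [(round(radius * math.cos(math.radians(start_deg + i * 360 / n)), 2),
             round(radius * math.sin(math.radians(start_deg + i * 360 / n)), 2))
            for i in range(n)]

# Chair centres: opposite chairs 2 m apart, i.e., a ring of radius 1.0 m.
chairs = ring(4, radius=1.0, start_deg=0.0)
# Active loudspeakers: 2.5 m from the table centre, offset 45 deg from the chairs,
# each mounted at a height of 1.4 m.
speakers = [(x, y, 1.4) for x, y in ring(4, radius=2.5, start_deg=45.0)]

print("chairs   :", chairs)
print("speakers :", speakers)
```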

Background Sounds

Background sound (if present) was presented continuously at a constant level for the 6 min of each conversation, including 10 s of audio ramp-up and ramp-down applied to avoid startling participants and to allow HA amplification to adapt. The three background noise levels (equal at all participant positions) were low (30 dBA, loudspeakers muted, in-room equipment noise only), medium (54 dBA babble), and high (72 dBA babble). A level of 30 dBA would be inaudible for most listeners with at least moderate hearing loss; 54 dBA corresponds to a populated but not too noisy social environment; 72 dBA corresponds to a typical restaurant at medium occupancy (To & Chung, 2015). The background sound was multitalker babble, constructed as follows. The signal for each loudspeaker consisted of two male–female voice pairs. A 2-dB level difference was applied between the pairs to simulate a closer and a more distant pair. Each loudspeaker played a different instance of such a mix. The overall result was thus 16-talker babble, with some background talkers relatively close and others more distant, spatially distributed fairly evenly around the quartet.
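A minimal sketch of the babble construction described above, assuming four mono talker recordings per loudspeaker (two male–female pairs, with the "far" pair attenuated by 2 dB); the normalization is illustrative, and calibration to the target dBA levels at the listening positions would be done separately.

```python
import numpy as np

def loudspeaker_babble(near_pair, far_pair, far_attenuation_db=2.0):
    """
    near_pair, far_pair: (male, female) tuples of equal-length mono signals.
    The 'far' pair is attenuated by 2 dB to simulate a more distant pair of
    background talkers; the mix is normalized, with absolute level set at playback.
    """
    far_gain = 10.0 ** (-far_attenuation_db / 20.0)
    mix = near_pair[0] + near_pair[1] + far_gain * (far_pair[0] + far_pair[1])
    return mix / np.max(np.abs(mix))

# Stand-ins for talker recordings: independent noise, 5 s at 48 kHz.
rng = np.random.default_rng(1)
talker = lambda: rng.standard_normal(48000 * 5)

# One different four-talker mix per loudspeaker -> 16-talker babble overall.
loudspeaker_signals = [loudspeaker_babble((talker(), talker()),
                                          (talker(), talker()))
                       for _ in range(4)]
print(len(loudspeaker_signals), loudspeaker_signals[0].shape)
```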

Procedure

Members of the departmental participant pool whose archived data met inclusion criteria for PwNH or PwHL were invited to participate. Table 1 gives a breakdown of the two sessions.

Introductory Individual Assessment (Session 1)

Participants attended Session 1 individually and were briefed as to the nature of the two-session experiment. After giving consent to take part, participants' audiograms were measured, and they completed a battery of five short questionnaires, the purposes of which were to provide data on potentially influential covariates such as individuals' own assessment of their hearing ability, autistic traits, or depressive states. Participants who still met the audiometric inclusion criteria after assessment were shown the experimental setup, given the opportunity to practice using the equipment, and invited to return as part of a quartet for the second experimental session. When matching participants to form quartets, we prioritized gender balance and age (aiming for a maximum 10-year range) over audiometric data (already controlled by the PwNH and PwHL recruitment criteria).
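As a simple illustration of the matching priorities described above (target quartet composition and a maximum 10-year age span), rather than the scheduling procedure actually used, a candidate quartet could be screened as follows.

```python
def quartet_ok(members, max_age_range=10):
    """
    members: list of four dicts with keys 'gender' ('M'/'F'),
             'hearing' ('NH'/'HL'), and 'age' (years).
    True if the quartet has the target composition (one NH male, one NH female,
    one HL male, one HL female) and an age span of at most `max_age_range` years.
    """
    combos = sorted((m["gender"], m["hearing"]) for m in members)
    target = sorted([("M", "NH"), ("F", "NH"), ("M", "HL"), ("F", "HL")])
    ages = [m["age"] for m in members]
    return combos == target and (max(ages) - min(ages)) <= max_age_range

candidate = [{"gender": "M", "hearing": "NH", "age": 66},
             {"gender": "F", "hearing": "NH", "age": 70},
             {"gender": "M", "hearing": "HL", "age": 72},
             {"gender": "F", "hearing": "HL", "age": 64}]
print(quartet_ok(candidate))   # True: balanced composition, 8-year age span
```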

Group Experiment (Session 2)

Session 2 comprised two blocks of three 6-min conversations. Quartet members were seated according to hearing ability and gender, as shown in Figure 2. Participants were introduced to one another, given an overview of the session, and refamiliarized with the recording setup. Written consent was obtained, and participants were reminded of their right to withdraw at any time. Participants often disclosed their hearing status as soon as the quartet were together, typically instigated by a PwHL asking if others were also hard of hearing. Regardless, we inquired whether those with HAs had brought them with them and whether they were working properly. Thus, the hearing status of each participant was made apparent to all others.

A 2-min practice conversation was held, with 1 min in 30-dBA background noise and 1 min in 54 dBA, using a topic prompt not included in the main conversations (see Appendix A). PwHL wore their HAs for this conversation. The purpose was to give participants an opportunity to talk to one another directly and become accustomed to the surroundings and equipment. Any obvious procedural misunderstandings were addressed before proceeding.

The quartet then engaged in six conversations, split over two equal blocks. In one block, PwHL used their HAs; in the other, they did not (order was counterbalanced across quartets). Each block contained all three noise conditions, with the order counterbalanced across quartets and aiding conditions. The two blocks were separated by a brief comfort break. Before each conversation, one of six conversation prompts (see Appendix A) was read out by an experimenter and displayed on participants' response tablets. Participants were asked to discuss the topic and come to a consensus that best suited the quartet. Quartets were instructed to continue the conversation, if possible, for the full 6 min, and not to worry if the conversation drifted off-topic.

Throughout each conversation, the question “How would you rate this conversation?” was displayed on the touch-screen tablet in front of each participant. Participants could respond by dragging or tapping the slider position anywhere between left (“not at all successful”) and right (“very successful”; see Figure 1) without being prompted to consider specific criteria. After each conversation, participants completed an 11-item retrospective rating task (see Appendix B) using the tablet.

On completion of the second block, participants were debriefed collectively and given the opportunity to ask questions and provide feedback. Participants were also invited for an individual debrief, to be conducted either by phone or e-mail, depending on their preference.

Results

The experimental sessions generated 10.8 hr of multimodal time-aligned recordings, including audio, video, motion capture, and continuous slider feedback from participants, gathered from 18 quartets conversing in six different conditions (108 six-minute conversations). Each conversation also generated four participants' responses to the 11-item rating task, and debriefings generated textual notes and transcriptions. First, we examine these data to assess whether our design goals were achieved and then briefly highlight participant behaviors whose rare occurrence may provide insight.

Evidence of Achievement of Design Goals

Drawing on evidence from qualitative debriefing data, we address the overarching goals of Realism and Acceptability as follows:

  1. Realism

    1. Did participants feel intrinsic engagement with the conversation task?

    2. Did they feel that the experimental conditions resembled real-life situations?

    3. Did the scenario and task facilitate suspension of disbelief in participants?

  2. Acceptability

    1. Did participants feel pressured to please the experimenters?

    2. Did participants feel comfortable in their interactions with other participants?

We conducted an exploratory thematic analysis of the transcriptions of the quartet (n = 18) and individual (n = 32) debriefing sessions (Braun & Clarke, 2006; see Table 2). We used a deductive approach, coding material through the lenses of the two goals. For completeness, we examined whether participants' feedback about the experiment was related to their ratings of conversation success, but no meaningful relationship was identified.

Table 2.

Demographic details of participants for group and individual debriefs, including gender, age, hearing status, and quartets represented.

Characteristic              Group debrief    Individual debrief
Gender: male                      33                 15
Gender: female                    39                 17
Age, M (SD), years            67.8 (7.5)         65.3 (6.2)
Hearing status: PwNH              36                 12
Hearing status: PwHL              36                 20
Quartets represented              18                 16

Note. PwNH = people with normal hearing; PwHL = people with hearing loss.

Realism—positive (18 comments). Participants explained how specific conditions relate to their day-to-day life. The high background noise condition was highlighted as representative of the struggles people usually have. Participants reported that once they began talking, the recording room and equipment were quickly forgotten, and that these generally did not hamper their conversations. Conversations were experienced as flowing naturally and drifting off-topic similarly to daily life interactions.

Realism—negative (nine comments). The extreme conditions (unaided with 72 dBA noise level, aided with 30 dBA) were considered less representative by the PwHL. On the one hand, these participants rarely choose not to wear their HAs when experiencing severe difficulty. On the other hand, they also considered the aided condition in low background noise unrealistic, as the sound-attenuated room provided an unusually favorable environment. Some participants expressed that the conversation topics were not of a type they would discuss in their daily life.

Acceptability—positive (28 comments). The majority of participants described the experience as enjoyable and pleasant. Some showed their approval for the topic prompts received, describing them as engaging, thought provoking, and fun. Participants' commitment seemed to be motivated by the idea that engaging in research is beneficial and holds value for individuals dealing with hearing loss.

Acceptability—negative (eight comments). Two PwHL described the experiment as being stressful, frustrating, and upsetting when required to converse without wearing HAs. Moreover, despite our care in choosing topics that would minimize the likelihood of talk drifting into controversial issues, two conversations (1.9%) drifted to politics, making some participants reluctant to participate in those conversations.

A few comments linked negative acceptability and positive realism, whereby the most challenging conditions were reported not only as emotionally stressful but also similar to situations encountered in daily life. An alternative expression of this effect was in the form of acceptance of these very difficult conditions only because it was an experiment; in real life, they would have departed from the situation.

The above data indicate that participants experienced the intended intrinsic engagement with the task (Goal 1a), resemblance to real-life situations (1b), and suspension of disbelief (1c). We saw no explicit evidence of participants feeling pressured to please the experimenters (2a), although comments referring to the value of contributing to research might be indicative of an “experiment” factor motivating continued participation even when uncomfortable. Participants were largely at ease in their interactions within quartets (2b), despite occasional instances of discomfort when talk turned to politics or under the most challenging conditions.

Infrequent Behaviors

Of the potential behaviors that influenced our design choices, three were observed very rarely: (a) HA adjustment (one occurrence), (b) schisming (two occurrences), and (c) lapse into prolonged silence (one occurrence—unaided, loud background noise). Despite two participants experiencing negative emotional responses to the adverse conditions, nobody withdrew from the study during Session 2. Participants were generally cooperative and engaged with the task. The frequency of slider interactions tended to increase with background noise level (see Figure 3), with median values equivalent to an interaction every 7–15 s. Of the 18 × 6 × 4 = 432 instances, nine (shared across four participants) recorded no interaction with the slider. While three of these participants might simply have forgotten to use the slider, one (NH) participant chose to be uncooperative, explicitly expressing a lack of willingness to engage with the slider during three conversations (on the basis of “I wouldn't be doing that in a real-life conversation”).

Figure 3.

A box plot of slider interactions by background noise level. The data are as follows. 1. Noise level 30 dBA: minimum 0, maximum 75, first quartile 10, median 20, third quartile 35. 2. Noise level 54 dBA: minimum 0, maximum 98, first quartile 10, median 25, third quartile 40. 3. Noise level 72 dBA: minimum 0, maximum 180, first quartile 20, median 50, third quartile 80.

Number of slider interactions per conversation, split by background noise level. Each point represents the value for one participant in one conversation. Box plots show median, interquartile ranges, median ±2.5 × interquartile range (minimum zero), and outliers.

Discussion

The purpose of this article has been to present the objectives and trade-offs inherent in our design decisions, offering an assessment of their efficacy, specifically with respect to application in four-person conversations involving unacquainted people with and without hearing loss and across a range of noise environments. Our assessment includes insights related to our design decisions as reflected in the level of realism and acceptability of our experimental conditions as perceived by the participants.

Realism

The experience of realism achieved in our design lends a strong sense of ecological validity (in the sense intended by Keidser et al., 2020: “the degree to which research findings reflect real-life hearing-related function, activity, or participation”) to the outcome data collected. By achieving a situation in which people could discount their technical surroundings and immerse themselves in discussions of the topic at hand, we provide strong indication that subsequent interpretations will be robust when extrapolated to the wider conversational world. The generalizability of objective results from our study is nevertheless strictly limited, due to some influential contextual variables (e.g., mutual familiarity, age, conversation type, freedom of movement) being fixed. Removing HAs was considered unrealistic by some PwHL. However, we considered it necessary as a window into the consequences of untreated hearing loss.

Overall, participants reported that the conversations flowed naturally, indicative of a high level of realism. Although a minority of participants reported that such whimsical hypothetical topics would be unlikely to arise in their daily life, nobody expressed a view that the tone of the conversations was unrealistic.

Most quartets quickly developed a camaraderie that facilitated the partial forgetting of the experimental context. This may well be a strength of group-based experiments, where the responsibility for progressing the conversation is shared among the group members. This effect confirms the wisdom of our decision not to include a confederate in the quartet.

The quiet condition, when aided, was found to be unrealistically easy by some participants. This suggests that the low level of reverberation in the sound-deadened test room became an issue for realism when noise was absent.

Most participants' frequent interaction with the slider suggests that collecting continuous feedback during conversational tasks is feasible. However, while the slider enabled real-time judgments, it also introduced an artificial element into the conversational process. In everyday life, individuals may continuously evaluate their conversational experiences internally but do not typically report or quantify these evaluations.

Acceptability

Although participants were informed beforehand about the content of the experimental session and had an individual familiarization session, we could not fully predict how easy or difficult the conversations would be perceived in the moment, in the presence of the other participants. Our findings revealed a mixed emotional experience, although clearly leaning toward more positive than negative feelings. We were mindful that grouping four unacquainted participants together could induce strong pressure to conform. There are indications that this did infrequently occur, with a few PwHL reporting that in real-life situations of such adversity, they would have given up rather than persisting.

While some participants found the provided conversation topics to be interesting, thought provoking, and enjoyable, others were less enthusiastic. The instructions given explicitly permitted participants' talk to drift onto unrelated topics. Although this enhances realism, it also opens the door to controversial subjects, as was evident in two instances, causing discomfort and hesitant participation in a few participants.

The acceptability of the experimental conditions was also influenced by participants' perception of their role in the research process. There were indications that some participants' desire to “do well” and “help research” may have influenced their behavior (specifically, keeping going when they might have given up in an equivalent real-life situation).

Infrequent Behaviors

The fact that we very infrequently observed HA adjustment, schisming, or lapsed conversation might be taken to indicate that these phenomena can confidently be ignored in experiments of this sort. However, they are all likely to be sensitive to the particular combination of experimental conditions and analyses. Both schisming and lapsing are phenomena that evade straightforward definition. In the former case, as one looks more closely at the data via transcription of lexical content, microschisms become apparent. In the latter case, pauses can be awkwardly long without constituting an ultimate cessation of talk.

General Strengths and Limitations

Capturing multimodal recordings of group conversation behaviors provides extremely rich data. Given the overarching purpose of our work—to understand and ultimately alleviate hearing disability—such data are only useful if they reflect everyday behavior to a reasonable extent and within known constraints. We believe that the evidence presented here supports confidence in the validity of the data and clarifies the likely boundaries of that validity.

It deserves to be repeated that any results arising from this study cannot be interpreted as generalizable to all real-life group conversation situations, as many aspects were held constant whose variability in real life is likely to influence behavior. Such aspects include size and composition of the group (age, gender, dis/ability, familiarity), type of conversation, freedom of movement, and environmental manipulations. Specifically, while the use of 16-talker babble at two levels ensured experimental control, it does not fully capture the acoustic complexity of real-world soundscapes, which often include transient sounds, background music, or moving noise sources.

Conversations are inherently unique, even within the same group of individuals, and this was also apparent in our study. Additionally, group-level differences, such as specific combinations of personality traits, could further influence how interactions unfold and how acceptable and realistic participants considered the experiment to be. A round-robin design where individuals are assessed across multiple group configurations could help disentangle these effects. However, such an approach would greatly increase participant burden and logistical complexity.

Alleviating Participant Distress and Giving Up as Data

When the researchers' aim is to understand how adversity affects people, there is necessarily a risk of participant discomfort, stress, or anxiety. Our experience with running this study highlights how this factor plays out quite differently when participants take part in groups, compared to single-participant designs. Due to the diversity of individuals' responses to adversity, some participants may become highly distressed in a situation where others are not at all troubled. In single-participant designs, this is less likely to be problematic, as peer pressure to conform is absent. Although such peer pressure is arguably also present in real-life social situations, the drivers of whether to remain in the situation or to leave are quite different. Hence, we may observe behavior whose validity (for our purposes) is questionable.

For anyone repeating a study similar to this one, we would recommend including a mechanism to deal with occurrences of distress. Distress might be defused, and captured as data, by, for example, having a button on the participants' response tablet labeled "Click here if this situation is so bad that you would give up in real life," which when clicked changes to "OK, feel free to stop trying for the remainder of this conversation."
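To illustrate how such a mechanism could be realized, the following is a minimal sketch in Python/tkinter. It is purely hypothetical and is not the emoTouch setup used in this study; the button wording is taken from the suggestion above, while the window layout, timestamp logging, and one-click lockout are our own assumptions.

```python
# Hypothetical "give up" button for a participant response tablet.
# Logs the moment a participant reports they would have abandoned the
# conversation in real life, then acknowledges the click and locks out
# further presses for the remainder of the conversation.
import time
import tkinter as tk

event_log = []  # (timestamp, event) records, to be merged with other session data


def on_give_up():
    event_log.append((time.time(), "participant_would_give_up"))
    button.config(
        text="OK, feel free to stop trying for the remainder of this conversation.",
        state=tk.DISABLED,  # one report per conversation is sufficient
    )


root = tk.Tk()
root.title("Conversation response panel")
button = tk.Button(
    root,
    text="Click here if this situation is so bad that you would give up in real life",
    wraplength=400,
    command=on_give_up,
)
button.pack(padx=20, pady=20)
root.mainloop()
```

The click timestamps could then be aligned with the behavioral recordings, so that "giving up" becomes an analyzable data point rather than an unrecorded loss of engagement.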

Conclusions

This article provides an in-depth description of design decisions and how they played out in the execution of a novel small-group conversation experiment involving interlocutors with normal hearing and PwHL. By describing the trade-offs made during the design process and the challenges faced during execution, we believe this article can serve as a valuable reference for other researchers.

The core message is that such an experiment can be carried out successfully and in such a way that the behavioral and self-report data collected are likely to possess a high degree of ecological validity while still supporting experimental and statistical control. Many choices must be made during the design phase, and they deserve careful thought because of the complex web of effects—both desirable and unintended—that are involved.

Data Availability Statement

Data will be made available upon publication of the substantive results from this study.

Acknowledgments

R.N., R.-L.F., and K.S. were supported by WS Audiology. B.B., M.G., and G.N. were supported by Medical Research Council Grants MR/S003576/1 and MR/X003620/1. L.V.H. was supported by a UK Research and Innovation Future Leaders Fellowship Grant MR/T041471/1. The authors thank all the participants who took part in the study, Christoph Louvan and his team for their help with emoTouch software, and Andrew Lavens for his technical support.

Appendix A

Conversation Topics

Training conversation topic:

“Discuss how Glasgow has changed over the past ten years.”

Conversation topics used during data collection:

  1. “Discuss and plan a dinner party together using only food and drinks you all dislike.”

  2. “Imagine that you are writing a book together. Each of you is a character. Discuss what your character would be and what the book would be about.”

  3. “Imagine that life on land is no longer possible. In order to survive you either have to live underwater or in space. Discuss which you would prefer as a group.”

  4. “Imagine that you are heroes in a movie. What superpowers do you have and what will the movie be called?”

  5. “Discuss and decide as a group what is the weirdest thing you have ever eaten. Would those in the group, who have never eaten that food, give it a try?”

  6. “A genie has granted your group three wishes. Discuss and agree on which three wishes would be the best for all of you.”

  7. Extra topic used only in one quartet: “Think about what movies you have watched in the past six months. Discuss and agree as a group on which are the best three movies.”

Appendix B

Conversation Success Survey

This survey was administered after each conversation. All items except Item 2 were inspired by the facets of conversation success identified by Nicoras et al. (2022). Copyright © 2022 Nicoras, R., Gotowiec, S., Hadley, L. V., Smeds, K., & Naylor, G. Published by Informa UK Limited, trading as Taylor & Francis Group on behalf of British Society of Audiology, International Society of Audiology, and Nordic Audiological Society. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Survey questions and response options (each item rated on a 5-point scale):

1) To what extent would you rate this conversation as successful? (1 = Unsuccessful, 5 = Successful)
2) Did you do anything or say anything in a particular way to make this conversation successful? (1 = Not at all, 5 = Very much)
3) How easy was it for you to follow the conversation? (1 = Not easy, 5 = Very easy)
4) To what extent was the person sitting on your left speaking in a helpful way? (1 = Not helpful, 5 = Very helpful)
5) To what extent was the person sitting in front of you speaking in a helpful way? (1 = Not helpful, 5 = Very helpful)
6) To what extent was the person sitting on your right speaking in a helpful way? (1 = Not helpful, 5 = Very helpful)
7) How connected did you feel with other participants? (1 = Not connected, 5 = Very connected)
8) To what extent did you share information successfully? (1 = Not successfully, 5 = Very successfully)
9) How enjoyable was this conversation for you? (1 = Not enjoyable, 5 = Very enjoyable)
10) To what extent was this conversation flowing smoothly? (1 = Not smoothly, 5 = Very smoothly)
11) How often did you feel uncomfortable/anxious during this conversation? (1 = Never, 5 = Always)
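For researchers administering this survey digitally after each conversation, the items can be encoded as a simple data structure. The sketch below is hypothetical and is not the emoTouch configuration used in the study; the anchor labels follow the list above, while the variable names and the validation helper are our own assumptions.

```python
# Hypothetical encoding of the Appendix B survey items. Anchor labels follow
# the survey above; names and structure are illustrative only.
SURVEY_ANCHORS = {
    1: ("Unsuccessful", "Successful"),
    2: ("Not at all", "Very much"),
    3: ("Not easy", "Very easy"),
    4: ("Not helpful", "Very helpful"),        # person on your left
    5: ("Not helpful", "Very helpful"),        # person in front of you
    6: ("Not helpful", "Very helpful"),        # person on your right
    7: ("Not connected", "Very connected"),
    8: ("Not successfully", "Very successfully"),
    9: ("Not enjoyable", "Very enjoyable"),
    10: ("Not smoothly", "Very smoothly"),
    11: ("Never", "Always"),
}


def record_response(item: int, rating: int, responses: dict) -> None:
    """Store one rating after checking it lies on the 1-5 scale used for every item."""
    if item not in SURVEY_ANCHORS:
        raise ValueError(f"Unknown survey item: {item}")
    if not 1 <= rating <= 5:
        raise ValueError(f"Rating for item {item} must be between 1 and 5, got {rating}")
    responses[item] = rating
```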

Funding Statement

R.N., R.-L.F., and K.S. were supported by WS Audiology. B.B., M.G., and G.N. were supported by Medical Research Council Grants MR/S003576/1 and MR/X003620/1. L.V.H. was supported by a UK Research and Innovation Future Leaders Fellowship Grant MR/T041471/1.

References

  1. Ajrouch, K. J., Blandon, A. Y., & Antonucci, T. C. (2005). Social networks among men and women: The effects of age and socioeconomic status. Journals of Gerontology: Series B: Psychological Sciences and Social Sciences, 60(6), S311–S317. 10.1093/geronb/60.6.S311
  2. Baker, R., & Hazan, V. (2011). DiapixUK: Task materials for the elicitation of multiple spontaneous speech dialogs. Behavior Research Methods, 43(3), 761–770. 10.3758/s13428-011-0075-y
  3. Beechey, T., Buchholz, J. M., & Keidser, G. (2019). Eliciting naturalistic conversations: A method for assessing communication ability, subjective experience, and the impacts of noise and hearing impairment. Journal of Speech, Language, and Hearing Research, 62(2), 470–484. 10.1044/2018_JSLHR-H-18-0107
  4. Beechey, T., Buchholz, J. M., & Keidser, G. (2020). Hearing impairment increases communication effort during conversations in noise. Journal of Speech, Language, and Hearing Research, 63(1), 305–320. 10.1044/2019_JSLHR-19-00201
  5. Bennett, R. J., Saulsman, L., Eikelboom, R. H., & Olaithe, M. (2022). Coping with the social challenges and emotional distress associated with hearing loss: A qualitative investigation using Leventhal's self-regulation theory. International Journal of Audiology, 61(5), 353–364. 10.1080/14992027.2021.1933620
  6. Böcking, S. (2008). Suspension of disbelief. In W. Donsbach (Ed.), The international encyclopedia of communication (pp. 4913–4915). Blackwell. 10.1002/9781405186407.wbiecs121
  7. Bond, R. (2005). Group size and conformity. Group Processes & Intergroup Relations, 8(4), 331–354. 10.1177/1368430205056464
  8. Braun, V., & Clarke, V. (2006). Using thematic analysis in psychology. Qualitative Research in Psychology, 3(2), 77–101. 10.1191/1478088706qp063oa
  9. Buck, B., McLaren, A., & Naylor, G. (2022, June 12–15). Worse hearers make better listeners: The effects of hearing loss on turn-taking and head movement in virtual conversations [Paper presentation]. 6th International Conference on Cognitive Hearing Science for Communication, Linkoping, Sweden.
  10. Dunbar, R. I. M., Duncan, N. D. C., & Nettle, D. (1995). Size and structure of freely forming conversational groups. Human Nature, 6(1), 67–78. 10.1007/BF02734136
  11. Egbert, M. M. (1997). Schisming: The collaborative transformation from a single conversation to multiple conversations. Research on Language and Social Interaction, 30(1), 1–51. 10.1207/s15327973rlsi3001_1
  12. Hadley, L. V., Whitmer, W. M., Brimijoin, W. O., & Naylor, G. (2021). Conversation in small groups: Speaking and listening strategies depend on the complexities of the environment and group. Psychonomic Bulletin & Review, 28(2), 632–640. 10.3758/s13423-020-01821-9
  13. Hazan, V., & Baker, R. (2011). Acoustic-phonetic characteristics of speech produced with communicative intent to counter adverse listening conditions. The Journal of the Acoustical Society of America, 130(4), 2139–2152. 10.1121/1.3623753
  14. Hládek, Ĺ., Porr, B., Naylor, G., Lunner, T., & Owen Brimijoin, W. (2019). On the interaction of head and gaze control with acoustic beam width of a simulated beamformer in a two-talker scenario. Trends in Hearing, 23. 10.1177/2331216519876795
  15. Holler, J., & Levinson, S. C. (2019). Multimodal language processing in human communication. Trends in Cognitive Sciences, 23(8), 639–652. 10.1016/j.tics.2019.05.006
  16. Keidser, G., Naylor, G., Brungart, D. S., Caduff, A., Campos, J., Carlile, S., Carpenter, M. G., Grimm, G., Hohmann, V., Holube, I., Launer, S., Lunner, T., Mehra, R., Rapport, F., Slaney, M., & Smeds, K. (2020). The quest for ecological validity in hearing science: What it is, why it matters, and how to advance it. Ear and Hearing, 41(Suppl. 1), 5S–19S. 10.1097/AUD.0000000000000944
  17. Krems, J. A., & Wilkes, J. (2019). Why are conversations limited to about four people? A theoretical exploration of the conversation size constraint. Evolution and Human Behavior, 40(2), 140–147. 10.1016/j.evolhumbehav.2018.09.004
  18. Louven, C., Scholle, C., Gehrs, F., & Lenz, A. (2022). emoTouch Web—A web-based system for continuous real time studies with smartphones, tablets, and desktop computers. Jahrbuch Musikpsychologie, 30(April). 10.5964/jbdgm.137
  19. Lu, H., & Brimijoin, W. O. (2022). Sound source selection based on head movements in natural group conversation. Trends in Hearing, 26. 10.1177/23312165221097789
  20. Mahmoud, M., & Robinson, P. (2011). Interpreting hand-over-face gestures. In S. D'Mello, A. Graesser, B. Schuller, & J. C. Martin (Eds.), Affective computing and intelligent interaction. ACII 2011. Lecture notes in computer science (Vol. 6975, pp. 248–255). Springer. 10.1007/978-3-642-24571-8_27
  21. Nicoras, R., Gotowiec, S., Hadley, L. V., Smeds, K., & Naylor, G. (2022). Conversation success in one-to-one and group conversation: A group concept mapping study of adults with normal and impaired hearing. International Journal of Audiology, 62(9), 868–876. 10.1080/14992027.2022.2095538
  22. Örnolfsson, I., May, T., Ahrens, A., & Dau, T. (2023). How noise impacts decision-making in triadic conversations. Proceedings of Forum Acusticum, September, 429–432. 10.61782/fa.2023.0720
  23. Peperkoorn, L. S., Becker, D. V., Balliet, D., Columbus, S., Molho, C., & Van Lange, P. A. M. (2020). The prevalence of dyads in social life. PLOS ONE, 15(12), 1–17. 10.1371/journal.pone.0244188
  24. Petersen, E. B. (2024). Investigating conversational dynamics in triads: Effects of noise, hearing impairment, and hearing aids. Frontiers in Psychology, 15(April). 10.3389/fpsyg.2024.1289637
  25. Pichora-Fuller, M. K., Johnson, C. E., & Roodenburg, K. E. J. (1998). The discrepancy between hearing impairment and handicap in the elderly: Balancing transaction and interaction in conversation. Journal of Applied Communication Research, 26(1), 99–119. 10.1080/00909889809365494
  26. Samter, W. (2003). Handbook of communication and social interaction skills. Psychology Press.
  27. Templeton, E. M., Chang, L. J., Reynolds, E. A., Cone Lebeaumont, M. D., & Wheatley, T. (2023). Long gaps between turns are awkward for strangers but not for friends. Philosophical Transactions of the Royal Society B: Biological Sciences, 378(1875), Article 20210471. 10.1098/rstb.2021.0471
  28. To, W. M., & Chung, A. W. L. (2015). Restaurant noise: Levels and temporal characteristics. Noise & Vibration Worldwide, 46(8), 11–17. 10.1260/0957-4565.46.8.11
  29. Vas, V., Akeroyd, M. A., & Hall, D. A. (2017). A data-driven synthesis of research evidence for domains of hearing loss, as reported by adults with hearing loss and their communication partners. Trends in Hearing, 21, 1–25. 10.1177/2331216517734088
  30. Yeomans, M., Schweitzer, M. E., & Brooks, A. W. (2022). The conversational circumplex: Identifying, prioritizing, and pursuing informational and relational motives in conversation. Current Opinion in Psychology, 44, 293–302. 10.1016/j.copsyc.2021.10.001
