Author manuscript; available in PMC: 2024 Jul 29.
Published in final edited form as: Hamlyn Symp Med Robot. 2024 Jun;16:103–104.

Acoustic Patterns of Interprofessional Communication and Quality of Teamwork in the Cardiac Operating Theatre

Shrivatsa Mishra 1, Roger D Dias 2, Marco A Zenati 2,3, Theodora Chaspari 1
PMCID: PMC11285017  NIHMSID: NIHMS2011746  PMID: 39076410

INTRODUCTION

Medical errors are the third leading cause of death, after heart disease and cancer, causing around 250,000 deaths every year in the United States. Approximately 40% of these errors happen in the operating room (OR), and 50% of the resulting complications are avoidable. Most of these errors are related to communication failures. Effective communication among team members during cardiac surgery is therefore paramount for ensuring patient safety and successful outcomes. Clear and concise communication among the members of the surgical team facilitates the coordination of complex tasks, enhances situational awareness, and allows for swift responses to unexpected developments [1].

Behavioral data analytics leverage multimodal data and artificial intelligence (AI) methods and can offer valuable insights into communication cues during cardiac surgery. By integrating data sources such as audio, video, and physiology, AI algorithms can capture team members’ tone of voice, language patterns, and body language, ultimately distinguishing between positive and negative interprofessional communication behaviors. Speech signals, in particular, can serve as indicators of both positive and negative interprofessional communication cues. A recent study analyzing conversational data during team interactions showed that team performance can be predicted from linguistic and acoustic features [2]. In another study, speech features extracted from conversations between team members were significantly correlated with team collaboration effort [3]. Identifying the vocal behaviors that affect team functioning on a moment-to-moment basis can yield a unique set of materials for personalized team training (e.g., video playback [4]) aimed at proactively addressing communication challenges, optimizing collaboration, and ensuring safer, more efficient cardiac surgery procedures. This paper investigates acoustic measures extracted from speech during two phases of simulated cardiac surgery sessions representing good and poor interprofessional communication behaviors, and examines differences in these acoustic measures between the two types of sessions.

MATERIALS AND METHODS

Study Design and Setting:

Data for this study come from four scripted simulation videos that were produced for educational purposes on OR team training by the American Society of Extracorporeal Technology (AmSECT). The data include one scripted cardiac surgery scenario displaying poor interprofessional communication and one scenario exemplifying good communication. Each simulation covers two phases: the pre-operative briefing and the cardiopulmonary bypass (CPB) phase.

Population:

The simulations involved four representative members of a primary cardiac surgical team: the attending surgeon, the attending anesthesiologist, the primary perfusionist, and the scrub nurse, who acted out the scripted scenarios.

Procedures:

All sessions were video recorded, and video editing took place immediately after the simulations were recorded. Audio was then extracted from the videos, and the transcripts were manually examined to verify the correct start and end time of each dialogue turn.
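
As an illustration of the audio-extraction step, the sketch below pulls a mono audio track out of a recorded simulation video using the widely available ffmpeg tool; the file names are hypothetical, and the paper does not specify the exact toolchain used.

```python
# Sketch: extract a mono 16 kHz WAV track from a simulation video with ffmpeg.
# File names are hypothetical; requires ffmpeg on the system PATH.
import subprocess

def extract_audio(video_path: str, wav_path: str) -> None:
    """Pull the audio stream out of a recorded simulation video."""
    subprocess.run(
        [
            "ffmpeg", "-y",       # overwrite output if it exists
            "-i", video_path,     # input video
            "-vn",                # drop the video stream
            "-ac", "1",           # mono
            "-ar", "16000",       # 16 kHz sampling rate, common for speech
            wav_path,
        ],
        check=True,
    )

extract_audio("good_communication_session.mp4", "good_communication_session.wav")
```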

Analysis:

Acoustic measures indicative of voice prosody and intonation were extracted using the Geneva Minimalistic Acoustic Parameter Set (GeMAPS) [5]. GeMAPS includes theoretically stipulated and empirically validated acoustic measures of affect, covering frequency, energy/amplitude, and spectral balance parameters. Each acoustic measure was extracted over each speaker turn at a short time scale of 20–60 ms, following standard practices in speech analysis for capturing human affect [6].
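
For concreteness, the sketch below shows how frame-level GeMAPS descriptors could be extracted per speaker turn with the open-source opensmile Python package (a reference implementation of GeMAPS); the audio file name and turn timestamps are hypothetical, as the paper does not name its extraction software.

```python
# Sketch: per-turn GeMAPS extraction with the `opensmile` package
# (pip install opensmile). File name and turn boundaries are hypothetical.
import opensmile

# Frame-level low-level descriptors (F0, loudness, jitter, shimmer, MFCCs, ...)
smile = opensmile.Smile(
    feature_set=opensmile.FeatureSet.GeMAPSv01b,
    feature_level=opensmile.FeatureLevel.LowLevelDescriptors,
)

# Hypothetical dialogue turns from the transcript: (speaker, start_s, end_s).
turns = [("surgeon", 12.4, 15.9), ("perfusionist", 16.1, 18.0)]

for speaker, start, end in turns:
    # One row per short-time analysis frame, one column per GeMAPS descriptor.
    lld = smile.process_file("session_audio.wav", start=start, end=end)
    print(speaker, lld.mean().head())  # simple per-turn summary
```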

Linear mixed-effects model (LME) analysis was further applied to identify significant differences in terms of the acoustic measures between the good communication and the poor communication videos. The LME models accounted for the multiple phases (i.e., briefing, CPB) within each video and were defined as:

$$\text{AcousticMeasure}_{i,j} = a \cdot \text{CommunicationRating}_{i,j} + b \cdot t_j + \mu_i$$

where AcousticMeasure_{i,j} is the acoustic measure of video i over task j, CommunicationRating_{i,j} is the variable that represents the communication quality of video i (i.e., 1 for ‘good’ and 0 for ‘poor’), t_j represents the phase of the video (i.e., 0 for briefing, 1 for CPB), and μ_i is the video-specific mean of the acoustic measure. The coefficient a represents the association between the acoustic measure and communication quality, while the coefficient b represents the association between the acoustic measure and the video phase.
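
As a hedged illustration of this model specification, the sketch below fits the same form of model with statsmodels; the data are synthetic and the column names are hypothetical, so it demonstrates the specification rather than reproducing the study’s analysis.

```python
# Sketch: per-measure LME with a random intercept per video (statsmodels).
# Data are synthetic; columns mirror the model terms defined above.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "rating": rng.integers(0, 2, n),                  # 1 = good, 0 = poor
    "phase": rng.integers(0, 2, n),                   # 0 = briefing, 1 = CPB
    "video": rng.choice(["v1", "v2", "v3", "v4"], n),
})
df["loudness"] = (0.5 - 0.08 * df["rating"] + 0.01 * df["phase"]
                  + rng.normal(0, 0.1, n))

# Fixed effects a (rating) and b (phase); random intercept mu_i per video.
model = smf.mixedlm("loudness ~ rating + phase", df, groups=df["video"])
print(model.fit().summary())
```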

RESULTS

Acoustic measures are visualized separately for the good and poor communication videos via box plots that depict the distribution of each measure per video. A preliminary examination suggests differences between the good and poor communication sessions, particularly in terms of mean fundamental frequency (F0), loudness, and the first four Mel-frequency cepstral coefficients (MFCCs) (Figure 1). Results from the LME models further suggest a significant effect of the session rating (i.e., good/poor) on the acoustic measures, but no effect of the phase type (i.e., briefing/CPB) on the same measures (Table 1). Indicatively, significant differences between the good and poor sessions are found for energy/amplitude measures such as loudness (perceived speech intensity) and shimmer (perceived trembling in speech intensity). Spectral parameters that quantify differences between low- and high-frequency regions of speech also show significant differences between the two types of sessions. Specifically, significant differences are observed between the good and poor sessions in terms of the alpha ratio (ratio of the summed speech energy in the low-frequency band, 50–1000 Hz, to that in the high-frequency band, 1–5 kHz), the Hammarberg index (ratio of the strongest energy peak in the 0–2 kHz region to the strongest peak in the 2–5 kHz region), and spectral flux (temporal difference in spectral content between two consecutive speech frames). Finally, jitter, a frequency-related measure that quantifies temporal deviations in speech periodicity, also shows significant differences between the good and poor sessions.

Figure 1: Box plots of acoustic measures for good and poor interprofessional communication sessions.
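
The sketch below shows how such box plots could be produced from a long-format table of per-turn measures; the DataFrame contents and column names are hypothetical placeholders.

```python
# Sketch: grouped box plots of acoustic measures by session rating.
# Values below are hypothetical placeholders for the per-turn measures.
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

df = pd.DataFrame({
    "measure": ["loudness"] * 4 + ["jitter"] * 4,
    "value": [0.52, 0.61, 0.44, 0.47, 0.021, 0.034, 0.026, 0.015],
    "rating": ["good", "poor"] * 4,
})

ax = sns.boxplot(data=df, x="measure", y="value", hue="rating")
ax.set_xlabel("Acoustic measure")
ax.set_ylabel("Value")
plt.tight_layout()
plt.show()
```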

Table 1: Coefficients and p-values of the LME models.

Acoustic measure   | Session rating (good/poor) | Phase (briefing/CPB)
Loudness           | a = −0.083, p < 0.001      | b = 0.105, p = 0.869
Alpha ratio        | a = 0.139, p < 0.001       | b = −1.103, p = 0.859
Hammarberg index   | a = −0.334, p < 0.001      | b = 1.488, p = 0.868
Spectral flux      | a = −0.095, p < 0.001      | b = 0.093, p = 0.862
Jitter             | a = −0.009, p < 0.001      | b = 0.004, p = 0.969
Shimmer            | a = −0.115, p < 0.001      | b = −0.137, p = 0.907
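
To make the spectral-balance measures concrete, the sketch below computes the alpha ratio and Hammarberg index for a single analysis frame with numpy, following the band definitions given in the Results; it is a simplified illustration, not the exact GeMAPS implementation (which applies windowing and dB scaling).

```python
# Sketch: alpha ratio and Hammarberg index for one speech frame,
# per the band definitions stated in the Results section.
import numpy as np

def spectral_balance(frame: np.ndarray, sr: int = 16000):
    """Return (alpha ratio, Hammarberg index) for a single frame."""
    spectrum = np.abs(np.fft.rfft(frame)) ** 2        # power spectrum
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)

    low = (freqs >= 50) & (freqs <= 1000)             # 50-1000 Hz band
    high = (freqs > 1000) & (freqs <= 5000)           # 1-5 kHz band
    alpha_ratio = spectrum[low].sum() / spectrum[high].sum()

    peak_low = spectrum[freqs <= 2000].max()          # strongest peak, 0-2 kHz
    peak_high = spectrum[(freqs > 2000) & (freqs <= 5000)].max()
    hammarberg = peak_low / peak_high
    return alpha_ratio, hammarberg

# Example on a synthetic 40 ms frame (640 samples at 16 kHz).
frame = np.random.default_rng(0).normal(size=640)
print(spectral_balance(frame))
```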

DISCUSSION

Results indicate significant differences in the vocal characteristics of the surgical team members between the good and poor communication sessions. This suggests that AI systems could rely on such acoustic measures to detect subtle nuances in communication dynamics between cardiac surgery team members in real time, ultimately enabling team alerts and facilitating targeted interventions such as clarifications and conflict resolution. Despite the promising results, these findings should be approached with caution, as the acoustic measures are reported cumulatively across team members. Future analyses will disaggregate these measures per team member, thereby addressing inherent inter-individual differences in acoustics. Moreover, the small sample size and the scripted nature of the data underscore the need to include additional real-life data in future analyses. Beyond acoustics, linguistic analysis of the conversations will offer complementary insights, enabling us to pinpoint specific linguistic and paralinguistic elements of team discussions that contribute to interprofessional communication.

Acknowledgment:

This work was supported by the National Heart, Lung, and Blood Institute (NHLBI) of the National Institutes of Health (NIH) [Grant Numbers: R01HL126896, R01HL157457].

REFERENCES

1. Dias RD, et al. Dissecting cardiac surgery: A video-based recall protocol to elucidate team cognitive processes in the operating room. Annals of Surgery, 2021. 274(2): p. e181.
2. Murray G and Oertel C. Predicting group performance in task-based interaction.
3. Reilly JM and Schneider B. Predicting the quality of collaborative problem solving through linguistic analysis of discourse. International Educational Data Mining Society, 2019.
4. de José Belzunce M. Micro-videos and micro-behaviors as an innovative methodology for training in soft skills.
5. Eyben F, et al. The Geneva minimalistic acoustic parameter set (GeMAPS) for voice research and affective computing. IEEE Transactions on Affective Computing, 2015. 7(2): p. 190–202.
6. Hansen JHL and Patil S. Speech under stress: Analysis, modeling and recognition. Speaker Classification I: Fundamentals, Features, and Methods, 2007: p. 108–137.
