Figure 2.
Overview of the methodological process for this study. Spoken dialogs are divided into individual speaking turns. Next, 429 acoustic features (grouped into five acoustic feature sets) are extracted from each speaking turn in every conversation. Proximity and synchrony scores are calculated for each acoustic feature, yielding 858 entrainment scores per speaking turn (429 features × 2 score types). Predictive modeling is used to evaluate the degree of entrainment (i.e., the degree to which entrainment scores can be used to distinguish real from sham conversational turns) and the relationship between entrainment and conversational success (i.e., the degree to which entrainment scores can be used to predict conversational efficiency and quality scores). EMS = envelope modulation spectrum; LTAS = long-term average spectrum; MFCC = mel-frequency cepstral coefficient; VR = voice report.