2013 Jun 17;23(12):1075–1080. doi: 10.1016/j.cub.2013.04.055

Figure 2.

Distance-to-Mean in Voice Space

(A) Stimuli from Experiment 1 (32 natural voices per gender) are represented as individual points in the three-dimensional space defined by their average log(f0), log(FD), and HNR, Z scored by gender (resulting in overlapping male and female stimulus clouds). Red discs represent female voices; blue discs represent male voices. The prototypical voices generated by averaging together all same-gender stimuli are located on top of the stimulus cloud (triangles) owing to their high HNR value. Distance-to-mean $= \sqrt{d_{f0}^2 + d_{HNR}^2 + d_{FD}^2}$.
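
As an illustration of the distance-to-mean measure in (A), the following is a minimal sketch rather than the authors' analysis code. It assumes each voice is described by a row of three acoustic values (log f0, log FD, HNR), that Z scoring uses the population standard deviation within each gender, and that distance is measured from the per-gender centroid, which sits at the origin after Z scoring. The function name `distance_to_mean` and the input layout are hypothetical.

```python
import numpy as np

def distance_to_mean(features, gender):
    """Euclidean distance of each voice from its gender centroid in Z-scored space.

    features: (n_voices, 3) array of [log f0, log FD, HNR] (assumed layout).
    gender:   (n_voices,) array of gender labels.
    """
    features = np.asarray(features, dtype=float)
    gender = np.asarray(gender)
    z = np.empty_like(features)
    for g in np.unique(gender):
        mask = gender == g
        group = features[mask]
        # Z score each acoustic dimension within the gender group
        # (population standard deviation, ddof=0, is an assumption).
        z[mask] = (group - group.mean(axis=0)) / group.std(axis=0)
    # After Z scoring, the gender mean is the origin, so the distance
    # reduces to sqrt(d_f0^2 + d_FD^2 + d_HNR^2).
    return np.sqrt((z ** 2).sum(axis=1))
```

Because the Z scoring is done separately for each gender, male and female voices end up in the same normalized space, which is what produces the overlapping stimulus clouds described in (A).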

(B) Voice averaging in Experiment 1. Spectrograms of example voice stimuli (top row) represent male speakers uttering the syllable “had.” Black circles indicate manually identified time-frequency landmarks put in correspondence across stimuli during averaging, corresponding to the frequencies of the first three formants at onset of phonation (left side), at onset of formant transition, and at offset of phonation (right side). A prototypical voice (bottom) is generated by morphing together stimuli from 32 different speakers. Note the smooth texture caused by averaging, resulting in high HNR values.

(C) Histograms of distance-to-mean distributions for the voice stimuli of Experiment 1 (gray) and Experiment 2 (black); both distributions peak at intermediate values of distance-to-mean.

(D) Scatterplot of distance-to-mean versus distinctiveness ratings (Z scored) for the 64 stimuli of Experiment 1. Distance-to-mean explains over half of the variance in distinctiveness ratings (R² = 0.53): voices with greater distance-to-mean are judged to be more distinctive. See also Figure S2 for correlation coefficients in other spaces.
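
The variance-explained figure in (D) can be illustrated with a short sketch. The caption does not say how the value was computed; the code below assumes a simple linear regression of distinctiveness ratings on distance-to-mean, in which case the proportion of variance explained equals the squared Pearson correlation. The function name `r_squared` is hypothetical.

```python
import numpy as np

def r_squared(distance, rating):
    """Proportion of variance in `rating` explained by a linear fit on `distance`.

    For a simple linear regression with one predictor, this equals
    the squared Pearson correlation coefficient.
    """
    distance = np.asarray(distance, dtype=float)
    rating = np.asarray(rating, dtype=float)
    r = np.corrcoef(distance, rating)[0, 1]
    return r ** 2
```

Under this assumption, a reported value of 0.53 corresponds to a correlation of roughly 0.73 in magnitude between distance-to-mean and distinctiveness.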