Table B2.
No. | Sign | Modeᵃ (P / A) | Calculation | |
---|---|---|---|---|
1 | Reduced dispersion of corner vowels from center | X | There are four corner vowels. The center is defined using the average first and second formant frequencies over the four corner vowels. Dispersion is the weighted mean distance of the corner vowels to that center. “Weighted” means each vowel occurrence is separately included in the dispersion calculation. In comparison, the center location calculation is unweighted: the average formant frequency pairs are calculated separately for each of the four vowels, and the resulting four frequency pairs are then averaged to get the vowel center. | |
2 | Reduced dispersion of corner vowels from ^ | X | The location of any vowel is its average first and second formant frequencies. Dispersion is the average distance from the location of each of the four corner vowels to the location of ^. | |
3 | Reduced average pairwise distance of corner vowels | X | This is the average distance from the location of each corner vowel to the location of the other three corner vowels. | |
4 | Increased duration of corner vowels | X | The weighted mean of the length of the corner vowels. “Weighted” means each vowel occurrence is separately included in the calculation. | |
5 | Increased duration for middle vowels and diphthongs | X | This includes eight monophthongs and five diphthongs. | |
6 | Reduced % vowel phoneme target consistency | X | A type is a distinct Y-line word considering just the phonemes (see PEPPER [2019] for a description of X, Y, and Z lines). A token is a specific occurrence. A type can have many tokens. Tokens that have anything questionable in the X or Y lines are ignored. Phonemes that are questionable in the Z line are ignored. Cases where a phoneme occurs twice (e.g., “b” in “bob”) count as two types. For a given phoneme in a type, consistency looks at just the corresponding position in the tokens for errors. Within a type, consistency is defined as the probability, for the selected phoneme, that any two randomly selected (without replacement) tokens have the same obtained result (considering either phoneme only or phoneme and diacritics). For error consistency, the two tokens are selected just from those with errors; for target consistency, they are selected from all the tokens (assuming at least one token has an error). For example, if three different obtained results occur with frequencies of I, J, and K, then this probability is $P = \frac{I(I-1) + J(J-1) + K(K-1)}{N(N-1)}$, where N = I + J + K. To combine these probabilities over types, we weight each probability by N − 1, because a type with only one eligible token gives us no information. For our “numerator,” we store the sum of each type's probability times its N − 1. For our “denominator,” we store the sum of each type's N − 1. Target consistency considers only those types with at least two tokens where at least one has an error. Phoneme consistency considers just substitutions and deletions to be errors. | |
7 | Reduced % vowel target consistency | X | Complete consistency considers substitutions, deletions, and distortions to be errors. Distortions are diacritics on the Z line only (produced but not intended) aside from stress and juncture diacritics. | |
8 | Reduced % correct glides | X | There are two English glides. | |
9 | Increased relative distortion index: sibilants | X | Percentage of sibilant distortions of all sibilant errors (distortions, substitutions, and deletions). | |
10 | Reduced % dentalized sibilants of distorted sibilants | X | Distortions are diacritics on the Z line only (produced but not intended) aside from stress and juncture diacritics. There are three English sibilants. | |
11 | Increased relative distortion index for early consonants | X | Percentage of distorted early eight consonants of all early eight errors. | |
12 | Decreased first moment on /s/ initial singletons | X | See centroid definition at https://en.wikipedia.org/wiki/Spectral_centroid | |
13 | Increased sqrt of the second moment for /s/ initial singletons | X | Sqrt is the abbreviation for square root. | |
14 | Increased sqrt of the second moment for /s/ initial and /s/ and /z/ final singletons | X | The same as the previous item except that final /s, z/ are included. | |
15 | Increased all consonant–consonant duration | X | Average length in milliseconds of all consonant pairs where the consonants are less than 0.1 s apart and they are not the same consonant or a cognate. | |
16 | Increased Diacritic Modification Index (DMI) class: place % | X | Percentage of phonemes with one or more tongue configuration or position diacritics. | |
17 | Increased DMI class: duration % | X | Percentage of phonemes lengthened or shortened. | |
18 | Increased % of epenthesis errors | X | Percent of epenthesis errors (vowel addition) by token (by word). | |
19 | Increased PM errors: % of addition, breath, repeat, or long | X | Percentage of pause opportunities with one or more of addition, breath, repeat, or long. (Counted even if grope, change, or abrupt is also present.) | |
20 | Reduced syllables per second (without pauses) | X | Syllables per second for first 12 coded utterances after pauses are removed. | |
21 | Increased syllable length in ms (without pauses) | X | Uses the first 12 coded utterances. | |
22 | Increased % of prosody-voice (PV) codes 15/16 EE codes of all coded utterances without fast/acceleration (uncircled and circled) | X | EE is the abbreviation for excessive/equal stress, an inappropriate stress pattern that can occur on utterances that have PVSP inappropriate code of 15 or 16 (see pp. 31–32 of the PVSP manual [Shriberg, Kwiatkowski, & Rasmussen, 1990]). Inappropriate fast and/or accelerated speech (PV11/12) is defined as greater than four syllables per second for children and greater than six syllables per second for adolescents and adults. Uncircled and circled are treated as inappropriate and appropriate, respectively. Circled codes give speakers the benefit of the doubt when a coding decision is difficult to make. | |
23 | Increased % of prosody-voice codes 15/16 EE codes of all PV15/16 codes (uncircled and circled) | X | The same as above except the denominator is the number of PV15/16 codes of any kind. | |
24 | Decreased intensity difference, dB, fricative + vowel | X | For a fricative–vowel pair, the intensity difference is the intensity of the vowel in dB minus the intensity of the fricative in dB. This uses the average intensity difference over all fricative–vowel pairs in the transcript where both phonemes have been delimited during the acoustic analysis. | |
25 | Decreased F0 for all vowels and diphthongs | X | F0 is the fundamental frequency at the characteristic point for those vowels and diphthongs that were delimited during the acoustic analysis. | |
26 | Decreased range of characteristic F0 for delimited vowels/diphthongs | X | This is the overall maximum F0 minus the overall minimum F0. | |
27 | Increased % jitter for vowelsᵇ | X | “Jitter is the cycle-to-cycle variation of fundamental frequency, i.e., the average absolute difference between consecutive periods.” | |
28 | Increased % shimmer for vowelsᵇ | X | “Shimmer (dB) is expressed as the variability of the peak-to-peak amplitude in decibels, i.e., the average absolute base-10 logarithm of the difference between the amplitudes of consecutive periods, multiplied by 20.” | |
29 | Decreased HNR dB for vowels | X | TF32 (Milenkovic, 2001)ᶜ calculates the SNR (signal-to-noise ratio) in dB. To calculate the HNR (harmonics-to-noise ratio): $\text{Power} = 10^{\text{SNR}/10}$ and $\text{HNR} = 10 \log_{10}(\text{Power} - 1)$. | |
30 | Increased % voice quality resonance wrong | X | Inappropriate resonance in the PVSP includes inappropriate codes 30, 31, and 32 (nasal, denasal, and nasopharyngeal) | |
31 | Decreased F1 /a/ (nasal) | X | First formant frequency for /a/ | |
32 | Decreased F2 for high vowels (nasopharyngeal) | X | Second formant frequency for /i/ and /u/ |
Note. PVSP = Prosody-Voice Screening Profile; PM = Pause Marker.
ᵃA = acoustic; P = perceptual.
ᵇJitter and shimmer definitions adapted from “Jitter and Shimmer Measurements for Speaker Recognition,” by M. Farrús, J. Hernando, and P. Ejarque, 2007, Proceedings of the Interspeech, 778–781.
ᶜTF32: Department of Electrical and Computer Engineering, University of Wisconsin-Madison.
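
The calculations described for Items 1, 6, and 29 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the software used to produce the table: the function names and input layouts are hypothetical, distance is taken to be Euclidean distance in Hz (the table says only “distance”), and the caller is assumed to have already applied the token and type eligibility rules described for Item 6.

```python
import math
from collections import Counter
from statistics import mean


def corner_vowel_dispersion(tokens):
    """Item 1. tokens: list of (vowel, f1_hz, f2_hz), one entry per occurrence.

    Assumption: "distance" means Euclidean distance in the F1-F2 plane.
    """
    by_vowel = {}
    for vowel, f1, f2 in tokens:
        by_vowel.setdefault(vowel, []).append((f1, f2))
    # Unweighted center: average within each vowel, then across vowels.
    vowel_means = [
        (mean(f1 for f1, _ in pts), mean(f2 for _, f2 in pts))
        for pts in by_vowel.values()
    ]
    c1 = mean(m[0] for m in vowel_means)
    c2 = mean(m[1] for m in vowel_means)
    # Weighted dispersion: every occurrence contributes its own distance.
    return mean(math.hypot(f1 - c1, f2 - c2) for _, f1, f2 in tokens)


def type_consistency(obtained):
    """Item 6, one type. obtained: list of obtained results, one per token.

    Returns (probability, weight) with weight = N - 1, or None if N < 2.
    """
    n = len(obtained)
    if n < 2:
        return None  # a single eligible token carries no information
    same_pairs = sum(k * (k - 1) for k in Counter(obtained).values())
    return same_pairs / (n * (n - 1)), n - 1


def pooled_consistency(types):
    """Item 6, all types: percentage weighted by each type's N - 1."""
    numerator = denominator = 0.0
    for obtained in types:
        result = type_consistency(obtained)
        if result is None:
            continue
        probability, weight = result
        numerator += probability * weight
        denominator += weight
    return 100.0 * numerator / denominator if denominator else float("nan")


def hnr_from_snr(snr_db):
    """Item 29. Convert SNR in dB to HNR in dB; requires snr_db > 0."""
    power = 10 ** (snr_db / 10)
    return 10 * math.log10(power - 1)
```

For example, a type whose three eligible tokens yield two identical results and one different result gives a probability of 2/6 with a weight of 2. The conversion in `hnr_from_snr` is defined only for SNR values above 0 dB, because Power − 1 must be positive.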