Abstract
Objectives
Singers, college students, and females are groups known to be at an elevated risk of developing functional/hyperfunctional voice disorders; therefore, female college students majoring in vocal performance may be at an even higher risk. To mitigate this risk, it would be helpful to know the “safe limits” for voice use that would help maintain vocal health in this vulnerable group, but there is a paucity of high-quality objective information upon which to base such limits. This study employed weeklong ambulatory voice monitoring in a large group of vocally healthy female college student singers to begin providing the types of objective data that could be used to help develop improved vocal health guidelines.
Methods
Participants included 64 vocally healthy females currently enrolled in a vocal performance or similar program at a college or university. An ambulatory voice monitor recorded neck-surface acceleration throughout a typical week. A singing classifier was applied to the data to separate singing from speech. Weeklong vocal dose measures and distributional characteristics for standard voice measures were computed separately for singing and speech, and for both types of phonation combined.
Results
Participants spent 6.2% of the total monitoring time speaking and 2.1% singing (with total phonation time being 8.4%). Singing had a higher fo mode, more pitch variability, higher average SPL, negatively skewed SPL distributions, lower average CPP, and higher H1-H2 values than speaking.
Conclusions
These results provide a basis for beginning to establish vocal health guidelines for female students enrolled in college-level vocal performance programs and for future studies of the types of voice disorders that are common in this group. Results also demonstrate the potential value that ambulatory voice monitoring may have in helping to objectively identify vocal behaviors that could contribute to voice problems in this population.
Keywords: Ambulatory monitoring, vocal dose, singer, vocal health guidelines, vocal behavior
Introduction
Singers are known to carry an increased risk of developing voice disorders and often represent a large portion of patients who are seen in voice clinics [1]. There is also a higher incidence of voice problems in college students when compared with the general population [2], and a higher incidence of voice-use related problems in females than males [3]. Therefore, it stands to reason that female college students who are enrolled in a vocal performance or voice concentration program may be at an even higher risk because of participation in an extensive array of co-curricular and extracurricular activities that are vocally demanding, such as voice lessons, choral rehearsals, opera or theater rehearsals, performances, auditions, individual rehearsals, non-curricular singing engagements, and church singing activities, among others [4]. Some may also have full social lives and service jobs that might result in frequent or loud speaking voice use. Students might not be exposed to or understand the implications of poor vocal hygiene and excessive voice use, which may contribute to an increased risk for developing voice disorders. Anecdotally, a high percentage of the patients seen at the Massachusetts General Hospital Voice Center, where this study was performed, are female students enrolled in vocal performance programs in Boston Area colleges and conservatories.
The voice disorders that are more prevalent in singers typically include those associated with vocal hyperfunction (VH), defined as excessive and/or imbalanced muscular forces related to voice production [5]. The extent to which various factors contribute to VH is still under investigation. One manifestation of this condition is phonotraumatic vocal hyperfunction (PVH), which leads to development of benign vocal fold lesions such as nodules or polyps. Evidence has suggested that possible causes of PVH include cumulative vocal fold tissue damage and/or reaction to persistent tissue inflammation [6–12]. Those in high voice-use occupations, such as singers, are more likely to develop phonotrauma, though it is puzzling why some singers develop phonotraumatic lesions while other singers with similar vocal demands do not [13].
Singers who pursue medical care also commonly complain of vocal fatigue and increased vocal effort with singing and/or speaking in the absence of phonotrauma, which are common symptoms associated with non-phonotraumatic vocal hyperfunction (NPVH). There is still no clear understanding about what factors contribute to the symptoms of vocal fatigue or effort, though in NPVH these could conceivably be related to the amount and/or type of voice use.
Even though amount and type of voice use are believed to play a major role in causing the higher incidence of voice disorders in singers, there are currently no empirically based guidelines that define “safe limits” for voice use in singers. As already noted, this may be particularly problematic for college students enrolled in vocal performance programs, who may be at even greater risk of developing voice disorders due to a combination of co-curricular and extracurricular activities. It is therefore important that any guidelines encompass and differentiate singing and non-singing (speaking) voice use.
The development and use of ambulatory voice monitoring devices to assess daily voice use has become more prevalent over the past two decades, making it easier to objectively analyze vocal function during activities of daily living [6, 7, 14, 15]. Such devices unobtrusively monitor phonation via a small neck-placed sensor, typically an accelerometer (ACC), that senses neck-skin vibration [16]. In addition to tracking traditional voice measures in daily life, such as fundamental frequency (fo) and sound pressure level (SPL) [17–21], the devices are also used to obtain vocal dose measures, which are an attempt to estimate the cumulative physical load placed on the larynx during periods of phonation (e.g. one day, one week, etc.), and more specifically the exposure of vocal fold tissue to mechanical stress [22]. Several studies have demonstrated methods for estimating various vocal doses in daily life, with the most typical being time dose (accumulated phonation time), cycle dose (total vocal fold oscillatory cycles) and distance dose (total distance, in meters, traveled by the vocal folds based on an algorithmic combination of SPL, fo, and phonation time) [7, 22, 23]. More recently, measures such as cepstral peak prominence (CPP), which can provide insight into the periodicity of the voice signal (related to voice quality), and H1-H2, which can reflect glottal closure (related to the underlying phonatory physiology), have been extracted from the acceleration signal and have been shown to correlate strongly with the same measures extracted from the acoustic signal [12, 24–26]. 
Recent work has also employed the use of higher-order distributional characteristics, such as skewness and kurtosis, of week-long ambulatory data to describe voice behaviors that might not be well-represented by the lower-order characteristics of means and standard deviations [26].
Only a few previous studies have provided empirical data pertaining to the amount and characteristics of typical daily voice use in vocally healthy singers. Most of these studies have been done in small groups and were designed to investigate how vocal dose is related to vocal effort/fatigue and vocal function/quality in singers [27–29].
Carroll et al. [27] used ambulatory voice monitoring (National Center for Voice and Speech (NCVS) Dosimeter™) in a study of seven vocally healthy classical singers (2 female) to find evidence that singer-perceived vocal effort increased on the same day that vocal dose increased. Additionally, they found that voice deterioration and increased vocal effort were more intense after back-to-back days of high vocal doses. While this study was informative regarding the potential impact of vocal doses on perceived vocal effort/fatigue, data were normalized within individual participants and it is not possible to extract interpretable/translatable values for measures of vocal function and vocal dose, nor did it separate the contributions of speaking and singing in analyzing the data.
Two additional small sample studies used ambulatory monitoring to report vocal dose values for individual vocally healthy student singers in different educational programs (2 subjects each in music education, vocal performance or musical theatre) [29] and with respect to the type/intensity of singing activity (2 subjects) [28]. While these studies provided some initial objective data on voice use in vocally healthy singers, the number of subjects and measures were too limited to be used in establishing voice health guidelines.
Only one ambulatory voice monitoring study has been conducted on a larger group of college student singers [4]. This study monitored 19 singers (8 women) for 3 weekdays and focused on investigating relationships between dose measures and common voice quality-related metrics (e.g., fo, SPL, jitter, shimmer, long-term average spectrum [LTAS], pitch strength, etc.). Results showed that higher vocal doses were significantly correlated with greater vocal intensity, more vocal clarity and less perturbation, and there were significant differences in some acoustic voice quality-related measures among choral singing, solo singing and speaking. However, the extent to which results for the vocal function and dose measures can be applied toward establishing voice health guidelines (which was not the goal of the study) is somewhat limited because the vocal health of the participants was not verified by a laryngeal examination and the number of subjects is still relatively small (group sizes for gender-sensitive measures further reduced to 8 women and 11 men). There is also some concern about interpreting measures such as shimmer and LTAS when extracted from the ACC signal because they are not highly correlated with those extracted from the acoustic signal [24].
One potential weakness in most of the studies that have used ambulatory monitoring to estimate vocal function and dose measures in singing is their reliance on subject-generated activity logs to identify time blocks during the day when subjects were primarily engaged in singing or speaking [4, 28, 29]. The accuracy of such self-estimates is questionable, particularly given past research showing that people tend not to be highly accurate when estimating their own voice use [30]. Subjects may also have engaged in speaking and singing during time blocks that were labeled as primarily one or the other. For example, it is feasible that a singer might speak periodically when rehearsing, and singing might occur spontaneously when speaking. Without automated detection of singing, it would be impossible to identify which parts of the hours-long ambulatory recordings represent singing with a high degree of accuracy. To avoid the confounding effect of conflating singing and speech in ambulatory voice data, we have recently developed an automatic singing classification method based on processing the ACC signal. Validation of this method has resulted in highly accurate detection of singing in a training set and separate test set comprised of weeklong ambulatory recordings of the ACC signal in both patients with phonotrauma and controls with healthy voices [31].
The goal of this investigation is to provide objective information on the amount and characteristics of typical voice use in a large group of female student singers who are documented to have normal voices. This is accomplished using weeklong ambulatory voice monitoring and automated data processing that includes differentiation of singing and speech and the extraction of vocal dose and vocal function measures. The results of this work can be used to help develop improved vocal health guidelines (i.e., begin to identify “healthy” or “safe” limits of voice use for this at-risk population), and as a basis for future studies of the types of voice disorders that are common in this group.
Methods
Participants
Sixty-four female singers currently enrolled in a vocal performance or voice concentration major at a college or university were recruited through convenience sampling. All singers were subject to a screening process to confirm healthy vocal status, which included (1) a phone interview with a voice-specialized speech-language pathologist (SLP), (2) report of no current or history of voice problems, (3) a normal hearing screening, and (4) a normal laryngeal stroboscopic evaluation with a voice-specialized SLP. All enrolled participants were judged to have a normal larynx by the SLP who performed the exam.
Average age of participants was 21 years (SD = 2.6, range = 18 – 28). Information regarding current year in school was not obtained. Of the 64 participants, 12 participants identified their primary singing-genre as classical, and 52 reported the primary singing-genre to be non-classical. Sub-genres within non-classical singers (e.g., musical theater, rock, etc.) were not obtained on the entire sample.
Data Collection
Ambulatory voice data were obtained using a neck-placed miniature accelerometer as the phonation sensor (ACC; model BU-27135, Knowles Electronics, Itasca, Illinois, USA) and a custom smartphone application (Voice Health Monitor; VHM) as the data acquisition platform [11]. The unprocessed ACC signal was recorded at an 11,025 Hz sampling rate, 16-bit quantization, and 80 dB dynamic range to obtain frequency content of neck-surface vibrations up to 5,000 Hz. Additional specifications of the system have been detailed in previous publications [11, 12].
The ACC assembly was affixed to the center of the neck, above the suprasternal notch and below the thyroid prominence, using hypoallergenic double-sided tape (Model 2181, 3M, Maplewood, Minnesota, USA). Each enrolled participant was monitored for one full typical week (7 days) while school was in session. They were instructed to participate in all normal voice-use activities (e.g., rehearsals, performances, social activities, etc.). Each morning, the participant was prompted to complete a daily SPL calibration sequence, which was recorded by a small handheld digital recorder positioned 15 cm from the lips [11, 32]. In addition to allowing calculation of ACC-based estimates of SPL, these daily calibrations also provided ongoing verification that the system was operating properly.
Voice feature extraction
Weeklong data were processed to yield several voice features and vocal dose measures. The approach that was used to calibrate the ACC sensors with the acoustic signal has been described in previous publications [11, 12]. Measures of SPL and fo were extracted from non-overlapping frames of 50 ms in duration. To distinguish between voiced and non-voiced activity (i.e., voice activity detection) in the ACC signal, each frame was considered voiced if it passed the following thresholds, as described in previous publications [17, 26, 33]: (a) SPL was greater than 45 dB SPL at 15 cm, (b) the first nonzero-lag peak in the normalized autocorrelation exceeded a threshold of 0.6, (c) fo was between 70 and 1000 Hz, and (d) the ratio of low- to high-frequency energy exceeded 20 dB. These criteria were implemented to eliminate nonphonatory data, such as tapping or rubbing against the sensor and electrical artifacts, from the data extracted for analysis.
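The four frame-level criteria can be illustrated with a minimal Python sketch. The `Frame` container, its field names, and the inclusive treatment of the 70 to 1000 Hz bounds are assumptions made for illustration only; they are not taken from the authors' processing code.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    """Features of one 50 ms accelerometer frame (field names are hypothetical)."""
    spl_db: float             # estimated SPL in dB SPL at 15 cm
    autocorr_peak: float      # first nonzero-lag peak of the normalized autocorrelation
    fo_hz: float              # fundamental frequency estimate in Hz
    low_high_ratio_db: float  # ratio of low- to high-frequency energy in dB


def is_voiced(frame: Frame) -> bool:
    """Apply criteria (a) through (d); a frame must pass all four to count as voiced."""
    return (
        frame.spl_db > 45.0                    # (a) SPL threshold
        and frame.autocorr_peak > 0.6          # (b) periodicity threshold
        and 70.0 <= frame.fo_hz <= 1000.0      # (c) fo range (inclusive bounds assumed)
        and frame.low_high_ratio_db > 20.0     # (d) spectral tilt threshold
    )


frames = [
    Frame(spl_db=82.0, autocorr_peak=0.85, fo_hz=210.0, low_high_ratio_db=30.0),
    Frame(spl_db=40.0, autocorr_peak=0.90, fo_hz=210.0, low_high_ratio_db=30.0),  # too quiet
    Frame(spl_db=82.0, autocorr_peak=0.30, fo_hz=210.0, low_high_ratio_db=30.0),  # aperiodic
]
voiced_flags = [is_voiced(f) for f in frames]
```

Requiring all four criteria jointly is what lets the detector reject events such as sensor taps, which may be loud but are neither periodic nor dominated by low-frequency energy.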
Average amounts of voice use (vocal loads) for each participant were quantified through three cumulative vocal dose measures. Accumulated phonation time (time dose) was calculated as the percent of phonation during the total time the participant was monitored. Cycle dose was estimated as the number of vocal fold oscillations that occurred during the total time monitored. Distance dose represents the cumulative distance the vocal folds traveled during the total monitoring time. This combines the estimates of cycle dose and vibratory amplitude based on SPL [17, 22]. Cycle dose and distance dose were normalized by time (per hour) since total monitored time differed among participants. Doses were determined based on total phonation (singing and speech combined) and for singing and speech separately.
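As a rough illustration of how time dose and cycle dose follow from frame-level data, the sketch below assumes that each voiced 50 ms frame contributes its full duration to phonation time and fo × 0.05 cycles to cycle dose. The function names and the list-of-fo input are hypothetical. Distance dose is omitted because it also requires an SPL-based vibratory amplitude model [17, 22] that is not specified here.

```python
FRAME_S = 0.05  # frame duration in seconds (50 ms, non-overlapping)


def time_dose_percent(voiced_fo_hz, monitored_s):
    """Percent of monitored time occupied by voiced frames."""
    return 100.0 * len(voiced_fo_hz) * FRAME_S / monitored_s


def cycle_dose_per_hour(voiced_fo_hz, monitored_s):
    """Estimated vocal fold oscillations per monitored hour."""
    total_cycles = sum(fo * FRAME_S for fo in voiced_fo_hz)
    return total_cycles / (monitored_s / 3600.0)


# Toy example: two voiced frames (200 Hz and 300 Hz) within 0.2 s of monitoring.
voiced_fo = [200.0, 300.0]
toy_time_dose = time_dose_percent(voiced_fo, monitored_s=0.2)
toy_cycle_dose = cycle_dose_per_hour(voiced_fo, monitored_s=0.2)
```

Dividing by monitored hours rather than by phonation time is what makes the per-hour doses sensitive to how much a participant phonates, not only to how they phonate.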
Additional voice features were computed from the raw ACC signal. To calculate CPP from the acceleration signal, two discrete Fourier transforms were computed in succession with a logarithmic transformation between them. CPP was defined as the dB difference between the magnitude of the highest peak and the baseline regression level in the averaged power cepstrum [12, 24]. H1-H2 was derived from a 1024-point fast Fourier transform of each 50-ms frame. The measure is the difference between amplitudes, in dB, of the first and second harmonics in the magnitude spectrum [25]. All voice features were determined based on total phonation (singing and speech combined) and for singing and speech separately.
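The H1-H2 definition can be sketched on a synthetic frame as follows. This is an illustration under stated assumptions, not the authors' implementation: the Hann window, the peak search of ±1 bin around each harmonic, and the function names are all hypothetical choices. Only the needed DFT bins are evaluated directly, which is equivalent to reading those bins from a zero-padded 1024-point FFT.

```python
import cmath
import math


def dft_bin_magnitude(frame, k, n_fft=1024):
    """Magnitude of bin k of an n_fft-point DFT of the (implicitly zero-padded) frame."""
    acc = 0j
    for n, x in enumerate(frame):
        acc += x * cmath.exp(-2j * math.pi * k * n / n_fft)
    return abs(acc)


def h1_h2_db(frame, fo_hz, fs_hz=11025, n_fft=1024, search_bins=1):
    """Difference in dB between first- and second-harmonic spectral magnitudes."""
    n = len(frame)
    # Hann window (an assumption; the text does not name a window).
    windowed = [x * (0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]

    def harmonic_peak(freq_hz):
        centre = round(freq_hz * n_fft / fs_hz)
        return max(dft_bin_magnitude(windowed, k, n_fft)
                   for k in range(centre - search_bins, centre + search_bins + 1))

    return 20.0 * math.log10(harmonic_peak(fo_hz) / harmonic_peak(2.0 * fo_hz))


# Synthetic 50 ms frame: second harmonic at half the amplitude of the first.
fs = 11025
fo = 20 * fs / 1024                 # places H1 and H2 exactly on DFT bins
n_samples = int(0.05 * fs)          # 551 samples
frame = [math.cos(2 * math.pi * fo * i / fs)
         + 0.5 * math.cos(2 * math.pi * 2 * fo * i / fs)
         for i in range(n_samples)]
example_h1_h2 = h1_h2_db(frame, fo)
```

For this synthetic frame the result should be close to 20·log10(2), roughly 6 dB, since the first harmonic has twice the amplitude of the second.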
Singing Detection
The ACC-based data were further processed with an automated singing classifier to separate singing from speech and thus allow for independent analysis of each mode of phonation. The details of the classification system have been described by the authors in a recent publication [31]. In a training set, the singing classifier resulted in an overall accuracy of 93.3%, sensitivity of 90.3%, and specificity of 96.4%. Applying the classification method to a held-out test set resulted in an overall accuracy of 94.2%, sensitivity of 93.5%, and specificity of 95.0% [31].
Statistical Analysis
Descriptive statistics for the full participant sample are presented for each of the following features, in singing, speech, and combined phonation: SPL, fo, phonation time (%), cycle dose (cycles per hour), distance dose (meters per hour), CPP, and H1-H2. The average distributional parameters of mean (mode for fo), standard deviation (SD), skewness, and kurtosis were analyzed as individual features for SPL, fo, CPP, and H1-H2. Skewness is a measure of the asymmetry of the weeklong distributions. A skewness value of 0 would indicate a symmetric distribution; a negative skewness value would indicate that most of the data lies above the mean (e.g., a negatively skewed SPL distribution suggests speaking louder than average more frequently). Kurtosis is a measure that reflects the variability to the extremes of the distributions. A high kurtosis value indicates less variability to the extremes of the weeklong distribution.
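The two higher-order parameters can be written as standardized central moments, as in the sketch below. The use of population (biased) moments and of the non-excess kurtosis convention, under which a normal distribution has a value of 3, are assumptions; the text does not state which estimator was used.

```python
def skewness_and_kurtosis(values):
    """Return (skewness, kurtosis) from central moments; kurtosis is non-excess."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # variance (population form)
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2


# A symmetric sample has zero skewness.
sym_skew, sym_kurt = skewness_and_kurtosis([1.0, 2.0, 3.0, 4.0, 5.0])

# Most values sit above the mean, with a tail toward low values: negative skewness.
neg_skew, _ = skewness_and_kurtosis([1.0, 5.0, 5.0, 5.0, 5.0])
```

In the second sample, four of the five values lie above the mean of 4.2, which mirrors the interpretation of a negatively skewed SPL distribution given above.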
Paired t-tests were completed for all voice parameters to identify statistically significant differences between singing and speech. A Bonferroni correction was applied to the significance value; differences were considered to be statistically significant at p ≤ .002.
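A paired t statistic and a Bonferroni-adjusted threshold can be sketched as follows. The number of comparisons is not stated in the text, so the value of 20 used below is purely an illustrative assumption, as is the family-wise alpha of .05; the function names are hypothetical.

```python
import math
from statistics import mean, stdev


def paired_t(a, b):
    """Paired t statistic and degrees of freedom for matched samples a and b."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    t_stat = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t_stat, n - 1


def bonferroni_alpha(family_alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    return family_alpha / n_tests


# Toy data: one value per participant for singing and for speech.
singing = [2.0, 4.0, 6.0]
speech = [1.0, 2.0, 3.0]
t_stat, dof = paired_t(singing, speech)

# Assumed family-wise alpha and number of comparisons (illustrative only).
per_test_alpha = bonferroni_alpha(0.05, 20)
```

With this ordering of arguments, a positive t value means the measure was higher in singing than in speech.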
No attempt was made to identify statistically significant differences in voice features between the two singing-genre groups because of the small size of the classical singers group (i.e., statistical tests would be underpowered). Assessing the impact of different singing genres on overall vocal load in student singers is a future goal once sufficient additional data are collected.
Results
Table 1 shows group-based descriptive statistics (means and standard deviations) and distributional parameters for all vocal dose and voice measures as well as the results of t-tests. Participants wore the ambulatory voice monitor for an average of 86 hours, 22 minutes, and 47 seconds, with a standard deviation of 16 hours, 19 minutes and 4 seconds. Average total monitoring time per day was 12 hours and 20 minutes. Participants phonated, on average, 8.4% of the total monitoring time, equal to an average of 7 hours, 14 minutes and 39 seconds. The average percentage of time spent speaking throughout the course of a week (6.2%, equal to an average of 5 hours, 21 minutes and 58 seconds) was significantly higher than the average percentage of time spent singing (2.1%, equal to an average of 1 hour, 52 minutes and 13 seconds). Thus, on average in this group of singers, 25.8% of their phonation time was spent singing. Because time spent speaking during the week was approximately three times higher than time spent singing, the cycle and distance dose values are significantly higher for speech than for singing (despite significantly higher SPL and fo during singing). We also calculated cycle dose and distance dose separately for each phonation mode, normalized by total phonation time per mode. This method yielded average cycle dose estimates of 373.7 cycles per second for singing and 235.7 cycles per second for speech. Distance dose estimates averaged 1.4 meters per second for singing and 0.9 meters per second for speech.
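The duration arithmetic in this paragraph can be checked with a few lines of Python. The helper name `to_seconds` is hypothetical; the durations are the group averages reported above.

```python
def to_seconds(hhmmss):
    """Convert an 'hh:mm:ss' string to seconds."""
    hours, minutes, seconds = (int(part) for part in hhmmss.split(":"))
    return hours * 3600 + minutes * 60 + seconds


monitored_s = to_seconds("86:22:47")  # average weekly monitoring time
combined_s = to_seconds("07:14:39")   # average combined phonation time
singing_s = to_seconds("01:52:13")    # average singing time
speech_s = to_seconds("05:21:58")     # average speech time

singing_share_pct = 100.0 * singing_s / combined_s  # share of phonation spent singing
speech_to_singing = speech_s / singing_s            # ratio of speech to singing time
daily_monitoring_h = monitored_s / 7 / 3600.0       # average monitoring hours per day
```

The singing share works out to about 25.8%, the speech-to-singing ratio to just under 3, and the daily monitoring time to roughly 12.3 hours, in line with the values reported in the text.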
Table 1:
Descriptive Statistics of Voice Measures and Vocal Dose for Combined and Separated Phonation
Measure | Combined Phonation | Singing | Speech | t | p |
---|---|---|---|---|---|
Phonation Time (hh:mm:ss) | 07:14:39 (02:43:51) | 01:52:13 (01:21:21) | 05:21:58 (01:49:10) | −16.8 | < .001 |
Phonation time (%) | 8.4 (2.5) | 2.1 (1.4) | 6.2 (1.8) | −17.7 | < .001 |
Cycle dose (cycles/hr) | 81,735.2 (27,354.4) | 28,642.9 (18,943.9) | 53,061.8 (15,385.3) | −9.3 | < .001 |
Distance dose (m/hr) | 315.4 (124.3) | 104.3 (70.6) | 210.9 (82.7) | −9.5 | < .001 |
SPL | |||||
Mean (dB) | 84.8 (5.3) | 86.9 (6.4) | 83.9 (5.4) | 5.8 | < .001 |
SD (dB) | 12.5 (2.5) | 12.9 (2.6) | 12.0 (2.5) | 4.9 | < .001 |
Skewness | .003 (.309) | −.117 (.354) | .000 (.323) | −3.4 | .001 |
Kurtosis | 3.067 (.395) | 3.078 (.577) | 3.156 (.418) | −1.2 | .233 |
fo | |||||
Mode (Hz) | 205.7 (18.2) | 325.4 (41.1) | 203.5 (16.5) | 24.9 | < .001 |
SD (Hz) | 91.6 (17.0) | 94.6 (15.1) | 62.4 (8.7) | 16.1 | < .001 |
Skewness | 1.622 (.419) | 1.091 (.326) | 2.236 (.409) | −18.7 | < .001 |
Kurtosis | 6.737 (2.221) | 5.087 (1.542) | 11.970 (3.447) | −14.9 | < .001 |
CPP | |||||
Mean (dB) | 22.7 (1.1) | 21.5 (1.4) | 23.1 (1.1) | −14.1 | < .001 |
SD (dB) | 4.5 (.3) | 4.0 (.4) | 4.5 (.3) | −12.7 | < .001 |
Skewness | −.183 (.179) | .046 (.285) | −.276 (.159) | 11.2 | < .001 |
Kurtosis | 2.366 (.153) | 2.579 (.319) | 2.386 (1.474) | 4.8 | < .001 |
H1−H2 | |||||
Mean (dB) | 5.5 (2.0) | 9.7 (2.0) | 4.2 (2.1) | 19.4 | < .001 |
SD (dB) | 7.2 (.8) | 7.3 (.8) | 6.6 (.7) | 7.8 | < .001 |
Skewness | .647 (.207) | .481 (.246) | .663 (.208) | −4.4 | < .001 |
Kurtosis | 3.473 (.537) | 2.990 (.470) | 3.680 (.512) | −9.8 | < .001 |
Note: Means and standard deviations presented for each parameter. Paired t-test results are presented for differences between singing and speech. P-values are significant at ≤.002.
As expected, paired t-tests revealed that there were statistically significant differences between speech and singing for most voice parameters. SPL kurtosis was the only parameter not statistically different between singing and speech. Average SPL is higher in singing than speech by 3.0 dB. Based on SPL skewness, speech has a symmetric distribution pattern while singing is slightly negatively skewed (i.e., more time is being spent louder than average). The fo mode in singing is 121.9 Hz higher than in speech, and singing has more fo variability, based on the larger SD and smaller kurtosis values. While there is a positively skewed fo distribution in both phonation modes, speech is skewed even more positively than singing. This indicates that more time was spent speaking in the lower-than-average part of the frequency range, while more of the variation in singing involved higher than average fo.
CPP is higher in speech than in singing by 1.6 dB, and there is less CPP variability in singing. CPP skewness in singing is close to zero while speech has a negatively skewed CPP distribution. This indicates that when participants are speaking they are spending relatively more time producing phonation with higher than average levels of periodic energy. Mean H1-H2 in speech is 5.5 dB lower than in singing, indicating more abrupt/complete glottal closure in speech than in singing.
Discussion
The purpose of this investigation was to identify typical values for vocal dose and voice measures in an unprecedentedly large cohort of vocally healthy female college student singers as a basis for beginning to develop voice health guidelines for this at-risk population.
This is the first ambulatory study of singers that has implemented an automatic singing detection tool to provide objective differentiation of singing and speech in ambulatory accelerometer-based recordings of phonation. The ability to obtain objective estimates of singing and speaking voice use is a clear advancement over previous ambulatory studies of singers that relied on the questionable accuracy of participant self-reporting (logs) to determine when singing or speaking was occurring [4, 28, 29]. It is not surprising that all but one of the voice measures (SPL kurtosis) were significantly different for speech and singing, given the subjective difference in these two modes of phonation.
This study is the first to establish average normal values for the ACC-based measures of CPP and H1-H2 in healthy singers during typical daily voice use. The potential usefulness of these measures is based largely on recent work showing that they correlate well with the same parameters extracted from the acoustic signal and thus may be interpreted in a similar fashion [24, 25]. H1-H2 reflects the abruptness of glottal closure and the dimension of breathy-to-strained voice quality [34–36]. H1-H2 has recently been found to be highly discriminative between patients with phonotrauma and matched controls in weeklong ambulatory voice data (that analyzed singing and speech as combined phonation), with a more restricted distribution of H1-H2 (smaller standard deviation with less variation toward higher values) being observed in patients than in a matched control group [26]. This result was interpreted to indicate a higher prevalence of more abrupt glottal closure in the patients with phonotrauma. In the current study, average measures of H1-H2 were found to be lower in speech, indicating that speech tends to have a more abrupt and possibly more complete glottal closure than singing. Vocal fold vibration tends to be more symmetric in terms of opening and closing phases and is associated with less abrupt and potentially less complete glottal closure when producing higher pitches, which might contribute to the overall higher H1-H2 values in singing. This finding suggests that speaking may have greater potential to contribute to phonotrauma than singing in student singers, particularly coupled with the fact that the students spend much more time speaking than singing. Further work should explore whether speaking vocal load is higher and H1-H2 values in speech are lower in singers who have phonotrauma compared to healthy singers.
We chose to analyze CPP because it has been recommended for clinical use to quantify the level of periodic energy in the voice signal [37], and it has also been correlated with clinician perceptual ratings of overall dysphonia [38]. Average CPP was found to be higher in speech than in singing, which was surprising because we expected singing to have higher levels of periodic energy than speech. The CPP speech distribution was also found to be substantially more negatively skewed than singing, which indicates that singers are spending relatively more time producing speech phonation with higher than average levels of periodic energy. We suspect that this result might correspond to our finding that H1-H2 values reflect more abrupt and probably more complete glottal closure during speech along with the notion that higher fo modes in singing are associated with higher H1-H2. Increased glottal closure would theoretically lead to higher CPP values.
No other studies investigating daily voice measures in healthy singers have explored the higher-order distributional parameters such as skewness and kurtosis, which have the potential to reveal additional information about daily vocal behavior. We found that SPL skewness was .003 in combined phonation and .000 in speech, representing a symmetric SPL distribution. In a recent study that compared patients with phonotrauma to controls without separating singing from speech, negative SPL skewness (phonating louder than average more often) contributed significantly to identifying patients with phonotrauma, while a normal SPL distribution was found in controls [26]. Our current results replicate what was found for the control subjects in the previous study. We did find slight negative skewness in the singing SPL distribution, which is viewed as another indication of the higher physiological demands of singing.
The extent to which results from the current study can be compared with those from previous ambulatory investigations of student singers is limited by differences in methodology, including the use of participant self-reporting to identify periods of speaking and singing, inclusion of males, variation in duration of monitoring, etc. In terms of overall amounts of voice use, Gaskill et al. [29], who monitored six student singers (3 males and 3 females) for 4 or 5 class days, reported an average daily phonation percentage of 12.91%, whereas Schloneger and Hunter [4] reported a value of 10.33% for the 8 female student singers in their study who were monitored for 3 days. These values are both higher than the average percent phonation time of 8.4% (SD = 2.5%) that was found in the current study. There are several possible sources for this difference, including the use of different ambulatory monitoring instrumentation, different data processing methods (e.g., differences in voice activity detection algorithms), and sampling error associated with the relatively small numbers of participants (including further reduction in group sizes due to including males) and monitoring days in the previous studies as compared to the current study of 64 female participants with most being monitored for an entire week (7 consecutive days, which included both class days and weekends).
It is not possible to compare results from the current study based on analyzing singing and speech phonation separately with previous ambulatory studies of student singers [4, 28, 29], again primarily because those previous studies relied on participant self-reporting to identify periods of singing and speech from which voice measures were then extracted, including vocal doses. The use of a discriminatory singing and speech classifier in the current study produced objective estimates of how much each mode of phonation contributes to the overall weekly voice use/vocal load of the student singer group. This is critical to establishing comprehensive voice health guidelines and ultimately helping to determine the potential/relative contribution of speech and singing to voice disorders in this at-risk group – particularly for studying the most common types of phonotraumatic disorders in which the cumulative effect of voice use (chronic vocal fold trauma) is believed to play a major role. However, this objective approach to differentiating singing and speech produced a result that appears to be at odds with previous studies. Because participants in the current study spent much less time phonating to sing than they did to speak, all the vocal dose measures are higher for speech than for singing. Previous ambulatory studies that estimated vocal doses during self-identified periods of either singing or speech found just the opposite (i.e. higher vocal doses for singing than speech), primarily because the segments identified as singing had much more phonation than those identified as speech – appropriately reflecting the higher level of physiological effort associated with singing. 
However, if we use our data to calculate the cycle and distance doses separately for singing and speech and normalize each value by the total phonation time of each mode, we get cycle dose estimates of 373.7 cycles per second for singing and 235.7 cycles per second for speech, and distance dose estimates of 1.4 meters per second for singing and 0.9 meters per second for speech. The higher dose values for singing are consistent with previous studies and confirm the notion that singing, while not occurring as frequently, is potentially more demanding on voice production mechanisms than speaking – due primarily to the higher average SPL and fo values associated with singing.
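The renormalization can be approximated from the group means in Table 1, as sketched below. This is only a rough consistency check: it divides mean doses by mean (rounded) phonation percentages, whereas the reported values were presumably averaged across per-participant ratios, so the results are expected to agree only approximately. The function name is hypothetical.

```python
def per_phonation_second(dose_per_monitored_hour, phonation_pct):
    """Convert a dose per monitored hour into a dose per second of phonation."""
    phonation_s_per_hour = 3600.0 * phonation_pct / 100.0
    return dose_per_monitored_hour / phonation_s_per_hour


# Group means from Table 1 (doses per monitored hour, phonation time in percent).
cycle_singing = per_phonation_second(28642.9, 2.1)   # reported: 373.7 cycles/s
cycle_speech = per_phonation_second(53061.8, 6.2)    # reported: 235.7 cycles/s
distance_singing = per_phonation_second(104.3, 2.1)  # reported: 1.4 m/s
distance_speech = per_phonation_second(210.9, 6.2)   # reported: 0.9 m/s
```

The approximations land within a few percent of the reported values, and in both cases the singing dose per second of phonation exceeds the speech dose, reversing the ordering seen in the per-monitored-hour doses.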
In addition to studies of daily or weekly voice use in student singers, several previous studies have used ambulatory voice monitoring to describe the voice characteristics and vocal loads for other selected occupations that require high voice use (e.g., teachers, call center operators) and are also at higher-than-normal risk for developing voice disorders [13, 39, 40]. Titze et al [41] reported that teachers demonstrated an average voicing percentage of 23% during the work day compared to 12% during nonwork hours. Similarly, Hunter and Titze [20] found that a cohort of 57 teachers had an average occupational phonation percentage of 29.9% and a non-occupational percentage of 14.4%. Cantarella et al [42] reported that a group of 92 call center operators had an average of 14.74% phonation time during work-designated time and 6.23% phonation time during non-work time periods. With the exception of the non-work value for call center operators (6.23%), all these previous values for percent phonation time are much higher than the overall average of 8.4% that was found for the vocally healthy student singers in the current study during a week of monitoring that included both speaking and singing. Only the average percentage of time the student singers spent phonating for speech (6.2%) approximated the lowest values previously reported for non-work-related phonation in high voice-use occupations. These somewhat surprising results potentially indicate that vocally healthy student singers use their voices much less than members of other occupations that are also at high risk for developing voice disorders. However, differences in phonation time values between past studies and the current study might also be due to technical differences in the voice monitoring approaches that were used. Past studies tended to use only SPL and fo thresholds to identify voicing in the ambulatory accelerometer data.
To increase the accuracy of phonation detection, the voice activity detection algorithm in the current study used the normalized autocorrelation peak and the ratio of low- to high-frequency energy in addition to SPL and fo thresholds.
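A frame-level voicing decision that combines these four criteria, and the percent phonation time that follows from it, might be sketched in Python as below. This is only an illustration of the general approach: the threshold values, the class and function names, and the assumption that every criterion must be met for a frame to count as voiced are hypothetical and are not specified in this text.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Frame:
    """Per-frame features from the neck-surface accelerometer signal."""
    spl_db: float         # estimated sound pressure level (dB SPL)
    fo_hz: float          # fundamental frequency estimate (Hz)
    autocorr_peak: float  # normalized autocorrelation peak (0 to 1)
    lh_ratio_db: float    # ratio of low- to high-frequency energy (dB)


@dataclass(frozen=True)
class VadThresholds:
    """Hypothetical placeholder thresholds; not the values used in this study."""
    min_spl_db: float = 45.0
    min_fo_hz: float = 70.0
    max_fo_hz: float = 1000.0
    min_autocorr_peak: float = 0.6
    min_lh_ratio_db: float = 22.0


def is_voiced(frame: Frame, th: VadThresholds = VadThresholds()) -> bool:
    """A frame counts as voiced only if it passes all four criteria."""
    return (
        frame.spl_db > th.min_spl_db
        and th.min_fo_hz <= frame.fo_hz <= th.max_fo_hz
        and frame.autocorr_peak > th.min_autocorr_peak
        and frame.lh_ratio_db > th.min_lh_ratio_db
    )


def percent_phonation(frames: Sequence[Frame], th: VadThresholds = VadThresholds()) -> float:
    """Voiced frames as a percentage of all monitored frames."""
    if not frames:
        return 0.0
    voiced = sum(1 for f in frames if is_voiced(f, th))
    return 100.0 * voiced / len(frames)


if __name__ == "__main__":
    # Made-up frames (not study data): only the first passes every criterion.
    frames = [
        Frame(60.0, 220.0, 0.8, 30.0),
        Frame(40.0, 220.0, 0.8, 30.0),  # too quiet
        Frame(60.0, 220.0, 0.3, 30.0),  # weak periodicity
        Frame(60.0, 220.0, 0.8, 10.0),  # too much high-frequency energy
    ]
    print(percent_phonation(frames))
```

Because the two additional criteria can only reject frames that SPL and fo thresholds alone would accept, a detector of this kind would be expected to yield equal or lower percent phonation values than a detector using SPL and fo only, which is consistent with the technical explanation offered above.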
The development of new and improved methods for extracting phonatory measures from the neck-surface acceleration signal is an ongoing focus of our group. As part of that process, an additional quality check was performed to assess the performance of the singing classifier in processing the data for this study (screening for potential misclassifications), even though the algorithm had already been tested and validated on segments of ambulatory data [31]. Week-long distributions for the normalized autocorrelation peak and fo modes, the two input features of the classifier, were examined for each subject-week. Subject-weeks with a marked lack of clear separation between the labeled “singing” and “speech” classes were flagged for review by a trained listener, as this pattern was shown to be present in some of the more significant misclassifications during previous classifier testing. A subset of subjects was determined to have misclassification rates that exceeded the specifications of the classifier. After repeating the statistical analysis with these subjects removed, it was found that the results and key findings were not significantly altered, and thus the data for these subjects are included in the final results that are reported. However, investigation into the misclassified frames revealed instances of what seemed to be speaking in a “singing-like” fashion, which resulted in an increased likelihood of misclassification. This observation brings to light the potential limitations of using a binary classifier (speech or singing). While singing and speech typically represent two fundamentally different modes of phonation, there are times when individuals may produce phonation that combines characteristics of both. This suggests that future work should consider modifying the singing and speech classifier to create a non-binary version that classifies phonation along a continuum.
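The screening step described above, in which subject-weeks with poorly separated “singing” and “speech” feature distributions are flagged for listener review, could be automated along the following lines. The separation criterion is not specified in this text, so the sketch assumes one simple possibility: the absolute difference between class means divided by the pooled standard deviation, compared against a hypothetical cutoff of 1.0. The function names and the cutoff are illustrative only.

```python
import math
from statistics import mean, variance
from typing import Sequence


def separation_index(singing: Sequence[float], speech: Sequence[float]) -> float:
    """Absolute difference of class means in units of the pooled standard deviation."""
    n1, n2 = len(singing), len(speech)
    if n1 < 2 or n2 < 2:
        raise ValueError("each class needs at least two values")
    pooled_var = ((n1 - 1) * variance(singing) + (n2 - 1) * variance(speech)) / (n1 + n2 - 2)
    diff = abs(mean(singing) - mean(speech))
    if pooled_var == 0.0:
        # No spread within either class: perfectly separated unless the means coincide.
        return math.inf if diff > 0.0 else 0.0
    return diff / math.sqrt(pooled_var)


def needs_review(
    singing: Sequence[float],
    speech: Sequence[float],
    min_separation: float = 1.0,  # hypothetical cutoff
) -> bool:
    """Flag a subject-week when the two labeled classes overlap heavily on a feature."""
    return separation_index(singing, speech) < min_separation


if __name__ == "__main__":
    # Made-up normalized autocorrelation peak values (not study data).
    clear_singing, clear_speech = [0.90, 0.80, 0.85, 0.95], [0.50, 0.40, 0.45, 0.55]
    mixed_singing, mixed_speech = [0.50, 0.60, 0.70, 0.80], [0.45, 0.55, 0.65, 0.75]
    print(needs_review(clear_singing, clear_speech))
    print(needs_review(mixed_singing, mixed_speech))
```

The same check can be applied to each of the two classifier input features in turn; how the per-feature results are combined into a single flag is left open here.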
The results of the current study can be used in the future to guide voice teachers and/or vocal health care specialists in how to tailor education for individual singing students and patients. Implementation of ambulatory voice monitoring could allow the teacher or clinician to identify vocal function and behaviors that fall outside of the normal ranges established here and guide behavior change recommendations in order to curb the risk for developing vocal fold pathology or other symptoms related to vocal hyperfunction.
The interpretation of this study’s results is limited by the lack of comparisons with data from female student singers who have vocal pathology, and the lack of more detailed information about specific singing subgenres. Without this information it is not possible to identify which measures may be the most salient indicators of the potential risk for developing vocal pathology, particularly with respect to individual singing styles. Comparing the daily vocal function and behavior of female student singers with vocal pathology and matched controls is a goal of future work. The concern about singing style (subgenre) is related to the view that, in addition to the amount of voice use, the type of phonation (e.g., variation in vocal fold closure velocities and impact stresses associated with different levels of adduction forces) may also play a major role in the etiology of vocal pathology, particularly phonotrauma. While current vocal dose measures are designed to characterize the total amount of voice use or vocal load, they do not capture potentially important aspects of phonatory physiology that could influence the risk for developing vocal pathology. For instance, phonation produced for a given period of time that involves higher vocal fold impact stresses due to increased adduction (i.e., pressed vocal folds) is more likely to be associated with phonotrauma compared to the same amount of phonation with less adducted vocal folds. The more recent extraction of measures such as H1-H2 and CPP from the neck-skin acceleration signal recorded during ambulatory monitoring has the potential to provide additional information about the prevalence of healthy versus pathophysiological phonatory mechanisms, particularly in the context of different singing styles. More information on the styles of singing that both non-classical and classical singers use would be helpful when analyzing differences between groups.
For example, we might see a prevalence of lower H1-H2 values and higher CPP values among musical theater singers who belt compared to jazz singers who incorporate breathiness into their singing style. This could indicate more pressed and abrupt vocal fold closure (higher vocal fold impact stresses) for the musical theater singers, with an associated increased potential risk of phonotrauma compared to the jazz singers for the same amount/time of phonation.
We also had imbalanced sample sizes between the two primary singing genres (classical and non-classical): the group of non-classical singers in our cohort was much larger than the group of classical singers. Though we did not formally report the significance of differences in voice parameters between singing genres (due to lack of power and the imbalanced group sizes), we acknowledge above that style of singing could impact the “safe limits” for singing voice use. We did informally assess differences between groups using Bonferroni-corrected independent t-tests, and the only differences found were in fo SD (in combined phonation and in singing): classical singers had somewhat larger pitch variability than non-classical singers. Other parameters approached significance and had medium-to-large effect sizes, suggesting that an analysis with more power might reveal between-group differences that we do not currently find.
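For readers unfamiliar with the correction mentioned above, the Bonferroni procedure multiplies each uncorrected p-value by the number of comparisons (capped at 1) before comparing it with the significance level. The Python sketch below illustrates only that adjustment step. The parameter names and p-values are made up for illustration and are not results from this study, and the 0.05 significance level is an assumption.

```python
from typing import Dict, List


def bonferroni_adjust(p_values: Dict[str, float]) -> Dict[str, float]:
    """Multiply each p-value by the number of comparisons, capping the result at 1."""
    m = len(p_values)
    return {name: min(1.0, p * m) for name, p in p_values.items()}


def significant_after_correction(p_values: Dict[str, float], alpha: float = 0.05) -> List[str]:
    """Names of the comparisons whose adjusted p-value stays below alpha."""
    adjusted = bonferroni_adjust(p_values)
    return [name for name, p in adjusted.items() if p < alpha]


if __name__ == "__main__":
    # Hypothetical uncorrected p-values from independent t-tests (not study results).
    uncorrected = {
        "fo_sd_singing": 0.001,
        "spl_mean_singing": 0.03,
        "cpp_mean_singing": 0.20,
    }
    print(bonferroni_adjust(uncorrected))
    print(significant_after_correction(uncorrected))
```

In this made-up example, a comparison with an uncorrected p-value of 0.03 no longer meets the 0.05 level after adjustment, which is the kind of result described above as approaching significance.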
This work begins to establish normal values for healthy student singers and can serve as the basis for future work that focuses on identifying differences between singers with vocal pathology and those with healthy voices. While recent work has investigated ambulatory differences between patients with phonotrauma and healthy controls, no work to date has specifically explored the daily vocal behavior of singers with phonotrauma in speech and singing separately. Based on the current results differentiating vocal characteristics found in singing versus speaking, we suspect that the speaking voice may be important when analyzing the differences between singers with phonotrauma and vocally healthy matched controls.
This study also demonstrates the potential that ambulatory voice monitoring may have in helping to maintain the voice health of student singers. One could imagine coupling the current results on typical voice use in vocally healthy student singers with the monitoring of new singing students to determine which ones may be at increased risk for developing a voice disorder because they are exceeding the normal benchmarks. The results of such “ambulatory screenings” could be used to counsel students about modifications in their vocal behavior that could prevent voice disorders from developing. The future use of ambulatory monitoring technology to more directly investigate the role of faulty vocal behaviors and excessive vocal loads in causing voice disorders in student singers will only increase its value as a diagnostic tool.
Conclusion
This work uses week-long ambulatory monitoring to characterize voice use in a large group of vocally healthy female college student singers. The results provide a basis for beginning to establish vocal health guidelines for female students enrolled in college-level vocal performance programs and for future studies of the types of voice disorders that are common in this group. Results also demonstrate the potential value that ambulatory voice monitoring may have in helping to objectively identify vocal behaviors that could contribute to voice problems in this population.
Acknowledgments
Financial Disclosures: This work was supported by the National Institutes of Health (NIH) National Institute on Deafness and Other Communication Disorders under grants R33 DC011588 and P50 DC015446.
Footnotes
Conflict of interest statement: Dr. Robert Hillman has a financial interest in Inno-Voyce LLC, a company focused on developing and commercializing technologies for the prevention, diagnosis and treatment of voice-related disorders. Dr. Hillman’s interests were reviewed and are managed by Massachusetts General Hospital and Partners HealthCare in accordance with their conflict of interest policies.
References
- 1. Phyland DJ, Oates J and Greenwood KM, Self-reported voice problems among three groups of professional singers. Journal of Voice, 1999. 13(4): p. 602–611.
- 2. Merrill RM, Tanner K, Merrill JG, McCord MD, Beardsley MM and Steele BA, Voice symptoms and voice-related quality of life in college students. Annals of Otology, Rhinology & Laryngology, 2013. 122(8): p. 511–519.
- 3. Roy N, Merrill RM, Thibeault S, Parsa RA, Gray SD and Smith EM, Prevalence of voice disorders in teachers and the general population. Journal of Speech, Language, and Hearing Research, 2004. 47(2): p. 281–293.
- 4. Schloneger MJ and Hunter EJ, Assessments of voice use and voice quality among college/university singing students ages 18–24 through ambulatory monitoring with a full accelerometer signal. Journal of Voice, 2017. 31(1): p. 124.e21–124.e30.
- 5. Hillman RE, Holmberg EB, Perkell JS, Walsh M and Vaughan C, Objective assessment of vocal hyperfunction: An experimental framework and initial results. Journal of Speech and Hearing Research, 1989. 32(2): p. 373–392.
- 6. Cheyne HA, Hanson HM, Genereux RP, Stevens KN and Hillman RE, Development and testing of a portable vocal accumulator. Journal of Speech, Language, and Hearing Research, 2003. 46(6): p. 1457–67.
- 7. Popolo PS, Švec JG and Titze IR, Adaptation of a Pocket PC for use as a wearable voice dosimeter. Journal of Speech, Language, and Hearing Research, 2005. 48(4): p. 780–791.
- 8. Czerwonka L, Jiang JJ and Tao C, Vocal nodules and edema may be due to vibration-induced rises in capillary pressure. Laryngoscope, 2008. 118(4): p. 748–52.
- 9. Karkos PD and McCormick M, The etiology of vocal fold nodules in adults. Current Opinion in Otolaryngology & Head & Neck Surgery, 2009. 17(6): p. 420–423.
- 10. Tao C, Jiang JJ and Czerwonka L, Liquid accumulation in vibrating vocal fold tissue: A simplified model based on a fluid-saturated porous solid theory. Journal of Voice, 2010. 24(3): p. 260–269.
- 11. Mehta DD, Zañartu M, Feng SW, Cheyne HA II and Hillman RE, Mobile voice health monitoring using a wearable accelerometer sensor and a smartphone platform. IEEE Transactions on Biomedical Engineering, 2012. 59(11): p. 3090–3096.
- 12. Mehta DD, Van Stan JH, Zañartu M, Ghassemi M, Guttag JV, Espinoza VM, Cortes JP, Cheyne HA II, and Hillman RE, Using Ambulatory Voice Monitoring to Investigate Common Voice Disorders: Research Update. Front Bioeng Biotechnol, 2015. 3: p. 155.
- 13. Roy N, Merrill RM, Gray SD and Smith EM, Voice disorders in the general population: Prevalence, risk factors, and occupational impact. Laryngoscope, 2005. 115(11): p. 1988–1995.
- 14. Švec JG, Hunter EJ, Popolo PS, Rogge-Miller K and Titze IR, The calibration and setup of the NCVS dosimeter. NCVS Online Technical Memo, 2004. 2: p. 1–52.
- 15. Carullo A, Penna A, Vallan A, Astolfi A and Bottalico P, A portable analyzer for vocal signal monitoring. in 2012 IEEE International Instrumentation and Measurement Technology Conference (I2MTC). 2012.
- 16. Van Stan JH, Gustafsson J, Schalling E and Hillman RE, Direct comparison of three commercially available devices for voice ambulatory monitoring and biofeedback. Perspectives on Voice and Voice Disorders, 2014. 24(2): p. 80–86.
- 17. Van Stan JH, Mehta DD, Zeitels SM, Burns JA, Barbu AM and Hillman RE, Average ambulatory measures of sound pressure level, fundamental frequency, and vocal dose do not differ between adult females with phonotraumatic lesions and matched control subjects. Annals of Otology, Rhinology, and Laryngology, 2015. 124(11): p. 864–874.
- 18. Ghassemi M, Van Stan JH, Mehta DD, Zañartu M, Cheyne HA II, Hillman RE and Guttag JV, Learning to detect vocal hyperfunction from ambulatory neck-surface acceleration features: Initial results for vocal fold nodules. IEEE Transactions on Biomedical Engineering, 2014. 61(6): p. 1668–1675.
- 19. Hillman RE, Heaton JT, Masaki A, Zeitels SM and Cheyne HA, Ambulatory monitoring of disordered voices. Annals of Otology, Rhinology, and Laryngology, 2006. 115(11): p. 795–801.
- 20. Hunter EJ and Titze IR, Variations in intensity, fundamental frequency, and voicing for teachers in occupational versus nonoccupational settings. Journal of Speech, Language, and Hearing Research, 2010. 53(4): p. 862–875.
- 21. Titze IR and Hunter EJ, Comparison of vocal vibration-dose measures for potential-damage risk criteria. Journal of Speech, Language, and Hearing Research, 2015. 58(5): p. 1425–1439.
- 22. Titze IR, Švec JG and Popolo PS, Vocal dose measures: Quantifying accumulated vibration exposure in vocal fold tissues. Journal of Speech, Language, and Hearing Research, 2003. 46(4): p. 919–932.
- 23. Švec JG, Popolo PS and Titze IR, Measurement of vocal doses in speech: Experimental procedure and signal processing. Logopedics, Phoniatrics, Vocology, 2003. 28(4): p. 181–192.
- 24. Mehta D, Van Stan J and Hillman R, Relationships between vocal function measures derived from an acoustic microphone and a subglottal neck-surface accelerometer. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2016. 24(4): p. 659–668.
- 25. Mehta DD, Espinoza VM, Van Stan JH, Zañartu M and Hillman RE, The difference between first and second harmonic amplitudes correlates between glottal airflow and neck-surface accelerometer signals during phonation. J Acoust Soc Am, 2019. 145(5): p. EL386.
- 26. Van Stan JH, Mehta DD, Ortiz AJ, Burns JA, Toles LE, Marks KL, Vangel M, Hron T, Zeitels S, and Hillman RE, Differences in Weeklong Ambulatory Vocal Behavior Between Female Patients With Phonotraumatic Lesions and Matched Controls. Journal of Speech, Language, and Hearing Research, 2020. 63(2): p. 372–384.
- 27. Carroll T, Nix J, Hunter E, Emerich K, Titze I and Abaza M, Objective measurement of vocal fatigue in classical singers: A vocal dosimetry pilot study. Otolaryngology–Head and Neck Surgery, 2006. 135(4): p. 595–602.
- 28. Schloneger MJ, Graduate student voice use and vocal efficiency in an opera rehearsal week: a case study. Journal of Voice, 2011. 25(6): p. e265–e273.
- 29. Gaskill CS, Cowgill JG and Many S, Comparing the vocal dose of university students from vocal performance, music education, and music theater. Journal of Singing, 2013. 70(1): p. 11.
- 30. Mehta DD, Cheyne HA II, Wehner A, Heaton JT and Hillman RE, Accuracy of self-reported estimates of daily voice use in adults with normal and disordered voices. American Journal of Speech-Language Pathology, 2016. 25(4): p. 634–641.
- 31. Ortiz AJ, Toles LE, Marks KL, Capobianco S, Mehta DD, Hillman RE and Van Stan JH, Automatic speech and singing classification in ambulatory recordings for normal and disordered voices. The Journal of the Acoustical Society of America, 2019. 146(1): p. EL22–EL27.
- 32. Švec JG, Titze IR and Popolo PS, Estimation of sound pressure levels of voiced speech from skin vibration of the neck. The Journal of the Acoustical Society of America, 2005. 117(3): p. 1386–94.
- 33. Mehta DD, Van Stan JH, Zañartu M, Ghassemi M, Guttag JV, Espinoza VM, Cortes JP, Cheyne HA II, and Hillman RE, Using Ambulatory Voice Monitoring to Investigate Common Voice Disorders: Research Update. Front Bioeng Biotechnol, 2015. 3(155): p. 155.
- 34. Klatt DH and Klatt LC, Analysis, synthesis, and perception of voice quality variations among female and male talkers. The Journal of the Acoustical Society of America, 1990. 87(2): p. 820–857.
- 35. Lowell SY, Kelley RT, Awan SN, Colton RH and Chan NH, Spectral- and cepstral-based acoustic features of dysphonic, strained voice quality. Annals of Otology, Rhinology, and Laryngology, 2012. 121(8): p. 539–548.
- 36. Hillenbrand J, Cleveland RA and Erickson RL, Acoustic correlates of breathy vocal quality. Journal of Speech and Hearing Research, 1994. 37(4): p. 769–778.
- 37. Patel RR, Awan SN, Barkmeier-Kraemer J, Courey M, Deliyski D, Eadie T, Paul D, Švec JG, and Hillman R, Recommended Protocols for Instrumental Assessment of Voice: American Speech-Language-Hearing Association Expert Panel to Develop a Protocol for Instrumental Assessment of Vocal Function. American Journal of Speech-Language Pathology, 2018: p. 1–19.
- 38. Awan SN, Roy N, Jetté ME, Meltzner GS and Hillman RE, Quantifying dysphonia severity using a spectral/cepstral-based acoustic index: Comparisons with auditory-perceptual judgements from the CAPE-V. Clinical Linguistics & Phonetics, 2010. 24(9): p. 742–758.
- 39. Verdolini K and Ramig LO, Review: Occupational risks for voice problems. Logopedics, Phoniatrics, Vocology, 2001. 26(1): p. 37–46.
- 40. Titze IR, Lemke J and Montequin D, Populations in the U.S. workforce who rely on voice as a primary tool of trade: A preliminary report. Journal of Voice, 1997. 11(3): p. 254–259.
- 41. Titze IR, Hunter EJ and Švec JG, Voicing and silence periods in daily and weekly vocalizations of teachers. The Journal of the Acoustical Society of America, 2007. 121(1): p. 469–478.
- 42. Cantarella G, Iofrida E, Boria P, Giordano S, Binatti O, Pignataro L, Manfredi C, Forti S, and Dejonckere P, Ambulatory phonation monitoring in a sample of 92 call center operators. Journal of Voice, 2014. 28(3): p. 393.e1–393.e6.