Abstract
Purpose
Although semi-occluded vocal tract gestures, including phonation through thin tubes or straws, have a long history of use in voice therapy, the efficacy of phonation through tubes has not been established. This study compares results from a therapy program based on phonation through a flow-resistant tube (FRT) with results from Vocal Function Exercises (VFE), an established set of exercises that utilize oral semi-occlusions.
Method
Twenty subjects (16 women, 4 men) with dysphonia and/or vocal fatigue were randomly assigned to 1 of 4 treatment conditions: (a) immediate FRT therapy, (b) immediate VFE therapy, (c) delayed FRT therapy, or (d) delayed VFE therapy. Subjects receiving delayed therapy served as a no-treatment control group.
Results
Voice Handicap Index (Jacobson et al., 1997) scores showed significant improvement for both treatment groups relative to the no-treatment group. Comparison of the effect sizes suggests FRT therapy is noninferior to VFE in terms of reduction in Voice Handicap Index scores. Significant reductions in Roughness on the Consensus Auditory-Perceptual Evaluation of Voice (Kempster, Gerratt, Verdolini Abbott, Barkmeier-Kraemer, & Hillman, 2009) were found for the FRT subjects, with no other significant voice quality findings.
Conclusions
VFE and FRT therapy may improve voice quality of life in some individuals with dysphonia. FRT therapy was noninferior to VFE in improving voice quality of life in this study.
Voice disorders affect the ability to communicate at work and in recreational activities. The lifetime prevalence of self-reported voice problems in adults has been found to be almost 30%, with a point prevalence from 6.6% to 7.5% (Cohen, 2010; Roy, Merrill, Gray, & Smith, 2005). Treatments for voice disorders include medication, surgery, and behavioral therapy. Voice therapy is frequently used either as a primary treatment or as an adjunct to surgical intervention. Many successful voice therapy techniques and programs are based on exercises that semi-occlude the vocal tract (SOVT). SOVT exercises such as phonating through straws and tubes are becoming increasingly common in clinical practice. However, the clinical efficacy of these exercises has not been studied to date.
Semi-occluded voice exercises are rooted in a long tradition of use in training vocal performers. For example, some exercises involve using the hand during phonation to partially cover the mouth (Aderhold, 1963) or completely cover the mouth (Coffin, 1987), thus creating either a semi-occlusion or a complete occlusion for a brief moment. Oscillatory SOVT exercises, such as lip trills, tongue trills, and raspberries (labio-lingual trills), have been used in training of the acting voice as well as the singing voice (Linklater, 1976; Nix, 1999). Engel (1927) described the use of SOVT for acting voice, suggesting that narrowing the mouth with the tongue tip against the alveolar ridge produces efficient voicing. This technique was further developed by Lessac (1997) in the resonance exercises producing a vocal quality called y-buzz, which utilizes oral narrowing of the vocal tract to create buzzy sensations in the face due to heightened acoustic pressures in the narrowed region.
Several well-known voice therapy programs utilize SOVT exercises as key components of the therapy. For example, Lessac-Madsen Resonant Voice Therapy (LMRVT) draws from performing voice techniques as described above, and relies upon semi-occluded consonants such as fricatives and the nasals /m/, /n/, and /ŋ/ as key training gestures and embedded cues in connected speech (Orbelo, Li, & Verdolini Abbott, 2014; Verdolini, 2000; Verdolini-Marston, Burke, Lessac, Glaze, & Caldwell, 1995). Resonant voice has been defined clinically as “easy to produce and buzzy in the facial tissues” and scientifically as “a reinforcement of the source by the vocal tract” (Titze & Verdolini Abbott, 2012). In a similar way, the Accent Method (AM) utilizes consonants such as voiced fricatives that provide oral semi-occlusions in rhythmic vocalizations, moving progressively from nonspeech to connected speech exercises (Kotby & Fex, 1998). Vocal Function Exercises (VFE), a therapy technique based upon the work of Briess (1957, 1959), are a series of nonspeech daily exercises that incorporate semi-occlusion at the lips as the primary training gesture (Stemple, 1993, 2005). In addition, there is a long history of using straws or tubes to extend the vocal tract and provide resistance during phonation (Gundermann, 1977; Habermann, 1980; Laukkanen, 1992; Sovijärvi, 1966; Spiess, 1904; Tapani, 1992). A variation on this gesture is phonation through a tube submerged in water (Simberg & Laine, 2007).
SOVT exercises have been validated by several theoretical investigations. In a modeling study (Titze & Laukkanen, 2007), semi-occlusion of the vocal tract at the lips paired with narrowing of the epilaryngeal tube increased inertive reactance in the range of 200–1000 Hz, reinforcing vocal fold vibration and increasing vocal economy (defined as maximum flow declination rate divided by maximum area declination rate). Lengthening of the vocal tract, as achieved with phonation through a thin tube, further increases vocal tract inertance (see Figure 1) (Titze & Verdolini Abbott, 2012). Vocal tract inertance has previously been shown to have desirable effects. Titze (1988) showed that, acoustically, an inertive vocal tract reduces phonation threshold pressure (PTP). Vocal tract inertance also skews the flow pulse to increase maximum flow declination rate (Rothenberg, 1981; Titze, 2006a), thus increasing the intensity of higher harmonics in the acoustic spectrum. SOVT may also provide a correction for so-called pressed voice. The oral pressure produced by a semi-occlusion behind the lips acts on the superior surface of the vocal folds to keep them separated, thereby helping to maintain a rectangular glottal shape (Titze, 2014). Squared-up vocal folds, where the medial surfaces of the vocal folds are parallel or nearly parallel, have the lowest PTP, requiring less vocal fold adduction. This has been shown repeatedly with physical models and with computational models (Chan, Titze, & Titze, 1997; Titze, 1988). The rationale and underpinnings of voice therapy with both a frontal semi-occlusion (the lips) and a rear semi-occlusion (in the epilarynx tube) have also been outlined in terms of maximum power output (Titze, 2006b).
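The two quantities discussed in this paragraph can be written compactly. The expression for inertance below is the standard lumped-element, low-frequency approximation for a uniform tube; it is not taken from the studies cited here and is offered only as a sketch, with the symbols (air density, tube length, cross-sectional area) introduced for illustration. The second expression restates the definition of vocal economy given above.

```latex
% Acoustic inertance of a uniform tube (lumped, low-frequency approximation):
% \rho = air density, L = tube length, A = cross-sectional area (assumed symbols).
I = \frac{\rho L}{A}

% Vocal economy as defined in the text: maximum flow declination rate (MFDR)
% divided by maximum area declination rate (MADR).
E_{\mathrm{vocal}} = \frac{\mathrm{MFDR}}{\mathrm{MADR}}
```

Under this approximation, inertance grows with tube length and falls with cross-sectional area, which is consistent with the statement that lengthening the vocal tract with a thin tube increases inertance.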
SOVT therapy programs and exercises vary in their complexity and claims about efficacy, physiologic underpinnings, and ease of application in a clinical setting. Although some treatment programs have had beneficial effects documented in randomized controlled trials (RCTs), others remain in a very exploratory stage. We will review the evidence supporting several commonly used treatment approaches here.
Resonant Voice Therapy
Several studies have sought to define resonant voice, which is often judged perceptually by clinicians and coaches. Perceptually, resonant voice has been defined as voice involving perceptible anterior oral vibrations that feels easy to produce (Verdolini-Marston et al., 1995). Biomechanically, this voicing pattern has been shown to be associated with a barely ad/abducted vocal fold configuration (Verdolini, Druker, Palmer, & Samawi, 1998), resulting in a closed quotient between .5 and .6. The vocal folds are neither hyperadducted nor hypoadducted (Peterson, Verdolini-Marston, Barkmeier, & Hoffman, 1994). Acoustically, there is evidence that adjustment of the relationship between harmonics and formants can lead to maximized acoustic output (Barrichelo-Lindstrom & Behlau, 2009; Smith, Finnegan, & Karnell, 2005). Resonant voice may in fact rely on maximizing source-filter interaction (Titze, 2004), thereby lowering PTP through increased inertance of the vocal tract air column (Titze, 2001), which is initially trained through the use of semi-occluded consonant sounds.
At least three published studies have examined the efficacy of resonant voice therapy. First, Verdolini-Marston et al. (1995) conducted a small-scale, preliminary study using a prospective, randomized controlled design with blinding. The authors examined college-aged women with nodules who were treated with either resonant voice therapy or confidential voice therapy in a two-week intensive therapy program. Treatment benefits were demonstrated across auditory-perceptual measures, laryngoscopy findings, and self-ratings of vocal effort. An interesting finding was that treatment benefits were shown to be dependent upon compliance, but not therapy type. Limitations of this study included, as the authors noted, the inability to use parametric statistics to determine the comparative magnitude of treatment benefits across treatment types, although nonparametric, binomial statistics did produce meaningful results. Second, in an RCT (Roy et al., 2003), 64 teachers with dysphonia were randomly assigned to one of three treatment groups: voice amplification, resonant voice therapy, or respiratory muscle therapy. Treatment benefits were observed in the voice amplification and resonant voice therapy groups, based upon reduction in Voice Handicap Index (VHI) scores (Jacobson et al., 1997) and subjects' self-ratings of voice symptom severity. In a third study, 24 teachers were treated in small groups using a protocol based on LMRVT, without a control group (Chen, Hsiao, Hsiao, Chung, & Chiang, 2007). Diagnoses included four subjects with muscle tension dysphonia, six subjects with vocal fold nodules, and 14 subjects with “chronic corditis.” Significant improvements were noted after therapy in auditory-perceptual measures, laryngostroboscopy ratings, speech fundamental frequency (F0), F0 range, PTP, and VHI-physical domain scores. 
Data are forthcoming from several prospective, randomized clinical trials of LMRVT, one of which shows longitudinal improvements in VHI scores continuing over a 1-year follow-up period after treatment (K. Verdolini Abbott, personal communication, September 12, 2014).
AM
Several studies of varying quality have investigated the efficacy of AM of voice therapy. AM therapy utilizes consonants such as voiced fricatives that provide oral semi-occlusions in rhythmic vocalizations, moving progressively from nonspeech to connected speech exercises (Kotby & Fex, 1998). Smith and Thyme (1976) measured acoustic changes in a pre/post uncontrolled, unmasked study of 30 students without dysphonia who participated in 10 sessions of AM therapy. The authors reported improvements in various spectrographic parameters. Kotby, El-Sady, Basiouny, Abou-Rass, and Hegazi (1991) studied the effects of AM therapy on 28 subjects with dysphonias of a variety of origins (functional dysphonia, vocal fold lesions, and vocal fold immobility) in an uncontrolled, unmasked study. After 20 therapy sessions, positive changes in voice performance were reported by 89.3% of subjects. In addition, the authors reported improved auditory perceptual ratings of voice quality in over half of subjects, reduced nodule size in all six subjects with nodules, and significant improvements in some aerodynamic measures in group pre/post comparisons. Likewise, improvements in auditory perceptual and acoustic measures were found in an uncontrolled, unmasked study of 10 subjects with functional dysphonia (Fex, Fex, Shiromoto, & Hirano, 1994). In the largest study to date of AM therapy, Bassiouny (1998) conducted a randomized, controlled, double-masked trial of AM therapy with 42 subjects with dysphonia, including functional dysphonia and dysphonia associated with vocal fold lesions and vocal fold immobility. Subjects were randomly assigned to 20 sessions of AM therapy including vocal hygiene advice plus the accent exercises, or 10 sessions of vocal hygiene only. The AM group improved more in auditory perceptual, acoustic, and aerodynamic measures than the vocal hygiene group. The AM group also showed improvements in stroboscopy parameters, which were not found in the vocal hygiene group.
VFE
A number of well-designed studies have documented the efficacy of VFE, a set of pitch range and duration exercises for voice that use SOVT postures. Two controlled studies have shown changes in flow rate, phonatory volume, maximum phonation time, and pitch range from before to after voice training with VFE, one in vocally healthy subjects (Stemple, Lee, D'Amico, & Pickup, 1994) and one in singers (Sabol, Lee, & Stemple, 1995). One study of preventive VFE treatment with teachers did not show significant effects of VFE, but the outcomes were likely limited by inadequate subject training in the exercises (Pasa, Oates, & Dacakis, 2007). Several RCTs of VFE with dysphonic subjects have shown positive effects, including increased self-ratings of voice improvement, ease, and clarity of voice (Roy et al., 2001); significant improvements in Voice Symptom Severity Scale scores (Gillivan-Murphy, Drinnan, O'Dwyer, Ridha, & Carding, 2006); and improvements in perturbation, harmonics-to-noise ratio, and auditory perceptual judgments of voice quality (Nguyen & Kenny, 2009).
Several studies have also addressed the effects of a VFE regimen on aging voices. Two uncontrolled studies showed significant decreases in VHI scores, self-ratings of phonatory effort level, and auditory perceptual measures of breathiness and strain (Sauder, Roy, Tanner, Houtz, & Smith, 2010), as well as significant improvement in maximum phonation time and several aerodynamic measures related to glottal closure (Gorman, Weinrich, Lee, & Stemple, 2008). One controlled study of aging community choral singers found significant improvements in perceived roughness, maximum phonation time, jitter, shimmer, and harmonics-to-noise ratio in the VFE group (Tay, Phyland, & Oates, 2012). A preliminary RCT of VFE in elderly individuals with presbyphonia showed significant improvements in voice-related quality of life after treatment compared to a no-treatment control group (Ziegler, Verdolini Abbott, Johns, Klein, & Hapner, 2013), though a measure of phonatory effort did not improve significantly.
SOVT Exercises with Tubes and Straws
The effects of various SOVT exercises with narrow tubes between the lips on the physiology of sound production during exercise have been investigated in a handful of small studies, with varying results. Laryngeal elevation has been observed using dual-channel electroglottography during phonation into a glass tube (Laukkanen, Lindholm, & Vilkman, 1995b), whereas two single-subject X-ray computed tomography (CT) studies have found no change in laryngeal position (Vampola, Laukkanen, Horacek, & Svec, 2011) or a lowering of laryngeal position (Guzman et al., 2013) during phonation into tubes or straws. These conflicting results may be due to limitations in the accuracy of dual-channel electroglottography for measuring laryngeal position (Laukkanen, Takalo, Vilkman, Nummenranta, & Lipponen, 1999), or due to differences in position (i.e., upright vs. supine) during the different procedures. Laukkanen, Titze, Hoffman, and Finnegan (2008) recorded electromyography signals in a single subject during phonation into various tubes and during production of a voiced bilabial fricative. The subject increased thyroarytenoid (TA) activity, relative to cricothyroid and lateral cricoarytenoid activation, in response to increased vocal tract impedance during tube and fricative phonation. A computer simulation done as part of this study found that greater vocal economy (defined as maximum flow declination rate divided by maximum area declination rate) and glottal efficiency (defined as radiated output power divided by glottal aerodynamic power) were obtained with a higher TA/CT ratio, with adduction by lateral cricoarytenoid adjusted to maximize these outcomes (generally around 22%). The effects of various diameter straws on aerodynamic and vocal fold vibratory characteristics have been investigated in two trained singers (Titze, Finnegan, Laukkanen, & Jaiswal, 2002). 
The findings suggested that with decreased straw diameter, lung pressures greatly increase, but without a concomitant increase in amplitude of vibration of the vocal folds or closed quotient; thus it was concluded that larger collision forces and pressed voice are not likely to occur during straw phonation. Observations of closed quotient during tube phonation have been mixed, however, including one study finding increased vocal fold contact during tube phonation (Laukkanen, 1992), and another study finding a general trend toward decreased contact quotient during tube phonation, but with a great deal of variability (Gaskill & Quinney, 2012).
Some studies have looked at effects on voice production immediately following use of SOVT exercises. In their study of tube phonation, Laukkanen et al. (1995b) used surface electromyography (sEMG) to estimate muscular activity during production of vowels before and after 20 prolonged phonations into a glass tube, and found that sEMG activity after tube phonation increased in women, whereas it lessened in men. In a similar study of phonation before and after 20 productions of a voiced bilabial fricative, Laukkanen, Lindholm, Vilkman, Haataja, and Alku (1996) found decreased muscular activity, as estimated by sEMG, after exercise, without acoustic changes. In another study, glottal resistance was found to decrease for most of 11 subjects, and laryngeal efficiency decreased in about half of subjects after exercising with 10 tokens of voiced bilabial fricatives, /m/, and tube phonation, due to increased glottal flow (Laukkanen, Lindholm, & Vilkman, 1995a), whereas both measures increased for a subject with glottal insufficiency due to decreased glottal flow. In fact, SOVT exercises appeared to have an immediate impact on the glottal width. In a study of a single subject using CT before, during, and after 5 min of tube phonation (Guzman et al., 2013), the ratio between the pharyngeal inlet area and the area of the epilaryngeal tube increased during and after the exercises, accompanied by better velopharyngeal closure and lower laryngeal position. In a similar case, another single-subject CT study found increased cross-sectional area of the vocal tract relative to epilaryngeal area and improved velopharyngeal closure during vowel phonation performed after 5 min of phonation into a tube (Vampola et al., 2011). This change in relative area would lead to increased vocal tract inertance, as described above. 
In addition, Enflo, Sundberg, Romedahl, and McAllister (2013) found increased collision threshold pressure in singers phonating immediately after phonation into tubes submerged in water for 2 min.
There is also some evidence that SOVT exercises have immediate effects on acoustic output and perceived voice quality. In one study, F0 decreased after 1 min of tube phonation, attributed to possible decreased muscular tension (Sampaio, Oliveira, & Behlau, 2008), although in another study, F0 did not change after the same duration of exercise (Costa, Costa, Oliveira, & Behlau, 2011). Guzman et al. (2013) found increased spectral prominence of the singer's/speaker's formant cluster after 5 min of tube phonation in a single subject. Perceptual measures of voice quality have suggested positive effects of tube phonation. Several studies reported improved perceptual judgments of voice quality immediately after 1–5 min of phonation through tubes (Enflo et al., 2013; Guzman et al., 2013; Sampaio et al., 2008) and after exercising with three sets of 15 voiced tongue trills (Schwarz & Cielo, 2009). However, Costa et al. (2011) again reported no change in perceptual judgments of voice quality after straw phonation, but concluded that their exercise dosage may have been insufficient to see effects. In their study, subjects with vocal fold lesions did improve in vocal self-assessment ratings after straw phonation.
Some of the variability in the above findings may be attributed to differences in the semi-occlusions used. Semi-occlusions vary in the intraoral pressures they create, and these pressures vary across subjects (Maxfield, Titze, Hunter, & Kapsner-Smith, 2014). Even in the case of phonation through tubes, which is more controlled than some other SOVT exercises, the diameter of the tube will have an impact on the pressure-flow relationship and the resistance created at the lips, therefore changing the back pressure created during phonation (Titze et al., 2002). A certain amount of narrowing/resistance may be necessary to create desirable changes; indeed, in the CT imaging and acoustic analysis conducted by Guzman et al. (2013) described above, greater beneficial changes were induced by phonation through a narrow stirring straw than through a wider glass tube.
Speech Versus Nonspeech Exercises
The voice therapy approaches described utilize nonspeech exercises, speech exercises, or a combination of both. VFE and tube phonation, for example, rely on nonspeech semi-occlusions and do not incorporate direct training of speech production. LMRVT begins with nonspeech exploratory exercises such as humming and pitch glides, but progresses quickly to semi-occlusions embedded in speech. There is some controversy over how motor learning may occur in nonspeech versus speech exercises for voice. Views on the potential for learning and carryover into communication depend in part on one's theoretical perspective.
For example, in schema theory, a key concept has to do with generalized motor programs (GMPs), which are proposed to be central representations of movement. The suggestion is that GMPs are developed with practice and once acquired, are parameterized for each trial within a class of movements to govern movement (e.g., Schmidt, 1975). A schema is a hypothetical three-dimensional cognitive space that relates initial conditions for movement, parameters applied to the GMP, and movement outcomes (recall schema) or that relates initial conditions, sensory consequences of movement, and movement outcomes (recognition schema). Schema theory predicts several variables should influence learning, including the distribution of practice and provision of knowledge of results. Most germane to the present discussion, schema theory also predicts that generalization from trained to untrained exemplars of a movement should be enhanced by variable practice, due to an enrichment of data in the recall schema “space” that it induces (Schmidt, 1975). Considerable data are consistent with this prediction (for review, see Titze & Verdolini Abbott, 2012). However, there is not uniform agreement about the reasons for it. According to one alternative view, the variable practice effect can, paradoxically, be explained by the specificity of practice principle. In this view, the reason that variable practice enhances generalization is not because some rule-based cognitive schema is enhanced by it. Rather, generalization is enhanced by variable practice because if a person practices a task in a large number of ways, when he or she encounters a novel version of it in the future, the chances increase that that novel version will in some way approximate at least one exemplar encountered in the past. Thus, the learner can use a prior cognitive or neural trace to guide movement in the present context (for related discussion see, for example, Kumaran & McClelland, 2012). 
The specificity of practice principle would suggest that the benefits of nonspeech SOVT training for speech will be enhanced by training its corollaries in actual speech.
In contrast to approaches to motor learning that emphasize cognitive processes, the dynamical systems theory of motor control de-emphasizes the role of cognition. In this approach, the focus is on the emergence of behavior from self-organization of independent biological subsystems in response to interactions across subject, task, and environment (Glazier, Davids, & Bartlett, 2003). Central representations of movement are considered largely superfluous. From a dynamical systems perspective, changes in motor behavior are not the result of central cognitive processes, but rather are induced by external conditions and peripheral responses (e.g., altered voicing physiology in response to physical conditions imposed by SOVT). Some carryover to speech could still occur, analogous to general physical preparation in sports literature—and indeed the idea of “taking your larynx to the gym” has been proposed as a purpose of these exercises. As an alternative to preserve the principle of task specificity as conceptualized in cognitively based motor learning approaches, semi-occluded nonspeech exercises may be designed to bear a closer resemblance to speech, such as incorporating variations in pitch and loudness and speech-like prosodic contours.
Purpose of Study
To date, no studies have been performed investigating the effects of a therapeutic tube phonation protocol, though these exercises have a long history of use in voice studios and clinics. There is some evidence that tube phonation has immediate positive effects such as improved voice quality, decreased muscular activation, and decreased effort or tension. This supports the theoretical notion that semi-occlusion of the vocal tract may lead to efficient voice production by optimizing glottal configuration and vocal tract impedance. In addition, tube phonation appears to allow full engagement of respiratory muscles and stretching of the vocal folds while avoiding increased impact stress to the true vocal folds (Titze et al., 2002).
Exercises that utilize phonation through a narrow flow-resistant tube (FRT) or straw have the benefit of being easy to teach, easy to learn, and convenient. Like VFE, FRT therapy utilizes nonspeech semi-occluded exercises to elicit healthy voicing. Unlike VFE, the FRT protocol utilized in this study includes variations in loudness and speech-like prosodic contours, which we hypothesize may improve carryover to connected speech. Although the efficacy of VFE has been demonstrated in several well-designed clinical trials, systematic and controlled study of FRT exercise programs is needed to support their routine use in habilitation and rehabilitation of the voice.
This study aims to provide initial evidence regarding the efficacy of FRT exercises in a clinical RCT utilizing VFE as a standard of comparison. The study design seeks to determine whether FRT exercises produce an amount of clinical change that is equivalent (noninferior) to change produced by VFE, a voice therapy program with documented beneficial effects in clinical trials. Our primary outcome measure is the VHI (Jacobson et al., 1997) quality of life scale; results of the Consensus Auditory-Perceptual Evaluation of Voice (CAPE-V; Kempster, Gerratt, Verdolini Abbott, Barkmeier-Kraemer, & Hillman, 2009) serve as a secondary measure.
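A comparison of this kind is often summarized with a standardized effect size. The statistical procedure actually used in the study is not described in this section, so the following is only a minimal sketch of one common choice, Cohen's d with a pooled standard deviation. The function name `cohens_d` and all change scores are hypothetical and are not study data.

```python
import math
from statistics import mean, variance


def cohens_d(x, y):
    """Cohen's d for two independent samples, using the pooled sample variance."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    return (mean(x) - mean(y)) / math.sqrt(pooled_var)


# Hypothetical pre-to-post changes in VHI total score (negative = improvement).
frt_change = [-12, -8, -15, -5, -10]
vfe_change = [-10, -9, -14, -4, -8]

d = cohens_d(frt_change, vfe_change)
print(f"Cohen's d (FRT vs. VFE change scores): {d:.2f}")
```

A noninferiority conclusion would additionally require a prespecified margin, which this sketch does not attempt to supply.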
Methods
Subjects
Twenty-five individuals were screened for participation in the study. Twenty-one individuals with dysphonia (17 women, 4 men) were enrolled in and completed the study. All recruitment and study procedures were approved by the University of Utah Institutional Review Board. Inclusion criteria for participation in the study included complaint of chronic vocal fatigue and/or hoarseness, and age over 18 years. Subjects were diagnosed by a local otolaryngologist and speech-language pathologist and referred for voice therapy (see Appendix for diagnoses). Subjects were excluded if they showed evidence of a laryngeal condition requiring immediate medical attention (e.g., laryngeal cancer) or voice rest (e.g., vocal fold hemorrhage), or if their participation in the study was otherwise determined to be contra-indicated by the study team. No other exclusion criteria were used. Data from one woman were excluded from the analysis due to confounding outside treatment (voice rest). Passage of subjects through the trial is depicted in Figure 2, according to the Consolidated Standards of Reporting Trials (CONSORT; Schulz, Altman, & Moher, 2010).
The average age of subjects was 51.5 years, SD = 11.4 years, with a range of 32–72 years. Subjects' ages, diagnoses, and primary voice complaints are given in the Appendix. The most common diagnosis was laryngopharyngeal reflux with vocal fold edema. Seven subjects had a current or prior history of vocal fold lesions. Two subjects were diagnosed with vocal fold paresis or paralysis. All subjects complained of hoarseness and/or vocal fatigue as primary symptoms. Average time since onset was 6 years 6 months, SD = 11 years, with a range of 4 months to 43 years. One subject had received prior voice therapy. Six subjects had prior voice/singing training (three in each therapy group), for an average of 9 years (range 6–12 years). No subjects were current smokers.
Procedures
Subjects were randomly assigned to one of four treatment groups: (a) immediate FRT therapy, (b) immediate VFE therapy, (c) delayed FRT therapy, or (d) delayed VFE therapy. Subjects were distributed equally among groups (1:1 allocation). The immediate-therapy groups (a and b) began treatment 1 week following their initial assessment; the delayed-therapy groups (c and d) had a 6-week no-treatment period after initial assessment, followed by re-assessment and therapy. All groups participated in a final assessment 1 week following completion of therapy. Thus, by using measures before and after the no-treatment waiting period, the delayed-therapy groups also served as a no-treatment control group.
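The randomization procedure itself is not specified here. As an illustration only, balanced allocation to four groups can be sketched by shuffling an equal number of labels per group; the function name `allocate`, the subject identifiers, and the seed are assumptions made for this example.

```python
import random
from collections import Counter

GROUPS = ("immediate FRT", "immediate VFE", "delayed FRT", "delayed VFE")


def allocate(subject_ids, seed=None):
    """Pair each subject with a group label drawn from a shuffled, balanced list."""
    rng = random.Random(seed)
    n = len(subject_ids)
    # Each group appears n // 4 times; any remainder goes to randomly chosen groups.
    labels = list(GROUPS) * (n // len(GROUPS))
    labels += rng.sample(GROUPS, n % len(GROUPS))
    rng.shuffle(labels)
    return dict(zip(subject_ids, labels))


# Hypothetical identifiers for 20 subjects.
allocation = allocate([f"S{i:02d}" for i in range(1, 21)], seed=1)
counts = Counter(allocation.values())
print(counts)
```

With 20 subjects, each group receives five; with a number not divisible by four, group sizes differ by at most one.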
Treatment, assessment, and recording protocols were approved by the University of Utah Institutional Review Board. All assessments and treatment sessions took place at the National Center for Voice and Speech main office, at the University of Utah, in Salt Lake City, UT. Screening, paperwork, and interviews were conducted in a quiet room. Recording procedures took place in the recording laboratory. Therapy sessions were conducted in the recording laboratory or in the clinician's office.
All subjects completed an initial assessment, consisting of a history taken by the clinician, the VHI (Jacobson et al., 1997), rigid laryngostroboscopic examination, and voice recording tasks including sustained vowels, oral reading, and conversation. The VHI and voice recording tasks were readministered to subjects in the delayed therapy groups after the control phase and prior to initiation of therapy. Posttherapy assessments also included an exit interview with questions about self-perception of voice and treatment effects.
The treatment period consisted of six 30–60-min treatment sessions, one per week, plus a home exercise program. In the initial session, subjects were instructed in their assigned exercises, with clinician demonstration and feedback. In subsequent sessions, the subjects performed each exercise a fixed number of times with clinician monitoring and feedback. The therapy sessions and home programs for the two treatments were designed to provide comparable amounts of clinical treatment time and home exercise.
The VFE therapy sessions consisted of the four exercises described by Stemple (1993): (a) warm-up exercise, /i/ with nasal focus performed on F (above middle C for women, below middle C for men) as long as possible on one breath, 10 repetitions; (b) stretching exercise, slow upward pitch glide performed on /no/ with semi-occlusion at the lips (creating a buzzing sensation), 10 repetitions; (c) contracting exercise, slow downward pitch glide performed on /no/ with semi-occlusion at the lips, 10 repetitions; and (d) low-impact adductory power exercise, /o/ with semi-occlusion at the lips performed five times each on middle C, D, E, F, G (one octave lower for men), for as long as possible on each breath. All exercises were performed as softly as possible with the voice still engaged. For the home program, subjects completed two repetitions each of exercises a–c, and two repetitions on each note for exercise d. Subjects were instructed to complete the home exercises four times daily. The frequency of the VFE home program was increased from the published standard (two practice sessions daily) in order to match the FRT program in terms of frequency of practice and total time spent on voice exercises. Subjects were given an MP3 player with audio instructions for the home exercise program. They were also given a log sheet for home practice.
The FRT exercise program consisted of four exercises performed while phonating through a stirring straw 14.1 cm long and 0.4 cm in diameter. Subjects were instructed to allow airflow only through the straw (not through the nose or around the straw), to use an abdomino-thoracic (nonclavicular) breathing pattern, and to maintain a relaxed upper body posture. Exercises were performed in full voice, though the semi-occluded sound is perceived as reduced in loudness. The exercises performed during therapy sessions were (a) 10 repetitions of a pitch glide up and back down; (b) 10 repetitions of an accent exercise, creating about five to seven “hills” of sound by varying the pitch and loudness of the voice using increased breath support (vs. adduction); (c) singing through the straw with melody but without articulation, a total of 10 short songs (e.g., one verse of “Mary Had a Little Lamb”) or equivalent longer ones; and (d) “reading” through the straw, emphasizing prosody (intonation and stress) while producing voice without articulation, approximately five medium-length paragraphs (around five to 10 sentences) per session. For the home program, subjects were instructed to complete 1 min of each exercise, four times daily (four 4-min practice sessions). Subjects were given an MP3 player with audio instructions for the home exercise program. They were also given a log sheet for home practice.
Most of the therapy was administered by the first author, except for several sessions administered by the third author. Assessments were administered by the first author or another trained member of the research team. All posttherapy recordings were administered by members of the research team other than the first author to prevent artifact from the subjects' familiarity with the clinician.
VHI
The primary outcome measure was a patient self-report measure of voice quality of life, the VHI (Jacobson et al., 1997). The VHI consists of 30 questions related to physical, functional, and emotional aspects of voice, rated on a 5-point equal-appearing interval (EAI) scale, from 0, never, to 4, always. Voice disorders affect an individual's ability to communicate, work, and maintain social relationships. Self-report measures such as the VHI are ecologically valid and robust, reflecting the impact of a voice disorder on an individual's life. Of available voice quality-of-life self-report tools, the VHI has been shown to have the best psychometric properties (Franic, Bramlett, & Bothe, 2005), and it is commonly used in clinical practice.
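As a minimal sketch of how a VHI total score is tallied, assuming the total is the simple sum of the 30 item ratings (each rated 0 to 4, so totals range from 0 to 120). The function name and the example responses are hypothetical and are not taken from the study data.

```python
VHI_ITEM_COUNT = 30   # number of VHI questions, as described in the text
VHI_MAX_RATING = 4    # ratings run from 0 (never) to 4 (always)


def vhi_total(item_ratings):
    """Sum the 30 item ratings into a VHI total score (0-120).

    Assumes the total is the plain sum of all item ratings.
    """
    if len(item_ratings) != VHI_ITEM_COUNT:
        raise ValueError("expected exactly 30 item ratings")
    for rating in item_ratings:
        if rating not in range(0, VHI_MAX_RATING + 1):
            raise ValueError("each rating must be an integer from 0 to 4")
    return sum(item_ratings)


# Hypothetical responses for illustration only (not study data).
example_responses = [2] * 10 + [1] * 10 + [0] * 10
example_total = vhi_total(example_responses)  # 20 + 10 + 0 = 30
```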
Analysis
VHI total scores were tallied according to standard procedures. Responses to exit interview questions were entered into a spreadsheet and tallied to examine group trends.
The second sentence of the Rainbow Passage (Fairbanks, 1960) was extracted from audio recordings collected in data acquisition software (ADInstruments Labchart v7.3) for each assessment, and normalized to 90% using acoustic software (Goldwave v5.58). Three experienced judges were asked to rate all voice samples using the first four dimensions of the CAPE-V (overall severity, roughness, breathiness, and strain; Kempster et al., 2009) on 100-mm visual analog scales (VAS). Judges were masked to group assignment, treatment phase (control vs. treatment), and pre/post status of the voice samples. The voice samples were presented in random order. All three judges work in specialized voice practices, with an average of 16 years of experience (range 7–22 years). The rating forms were scored, and then scores for each subject were averaged across the three judges. Average scores were used for group analyses.
Interrater agreement was assessed using procedures described by Kreiman and Gerratt (1998). Agreement equivalent to within 1 point on a 7-point EAI scale was calculated for each possible pair of raters for each voice sample. On a 100-mm visual analog scale, scores that fell within 7.2 mm were considered to be in exact agreement on a 7-point EAI scale, and scores that were within 21.5 mm (7.2 + 14.3 mm) were considered to be within 1 scale value. Two scores were considered to agree if they fell within ±21.5 mm on the VAS (probability of chance agreement p = .39). The probability of agreement was calculated by tallying all pairs of scores that agreed and dividing by the total number of score pairs. Twenty percent of the voice samples were randomly selected to be repeated for each judge. Repeat ratings were used to assess intrarater agreement, using the same calculation.
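The pairwise agreement calculation described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' analysis code: the function name and the example ratings are hypothetical, and a difference of exactly 21.5 mm is assumed to count as agreement.

```python
from itertools import combinations

AGREEMENT_WINDOW_MM = 21.5  # from the text: within 1 scale value on a 7-point EAI scale


def pairwise_agreement(ratings_by_sample, window_mm=AGREEMENT_WINDOW_MM):
    """Return the proportion of rater pairs whose VAS scores agree.

    ratings_by_sample: one list per voice sample, holding each rater's
    score in mm on the 100-mm visual analog scale.
    """
    agreeing_pairs = 0
    total_pairs = 0
    for ratings in ratings_by_sample:
        # Every possible pair of raters for this voice sample.
        for score_a, score_b in combinations(ratings, 2):
            total_pairs += 1
            # Assumption: the boundary value itself counts as agreement.
            if abs(score_a - score_b) <= window_mm:
                agreeing_pairs += 1
    if total_pairs == 0:
        raise ValueError("need at least one pair of ratings")
    return agreeing_pairs / total_pairs


# Hypothetical ratings (mm) from three raters on three samples.
example_ratings = [
    [10, 25, 40],  # two of three pairs agree
    [50, 55, 90],  # one of three pairs agrees
    [5, 5, 5],     # all three pairs agree
]
example_agreement = pairwise_agreement(example_ratings)  # 6 of 9 pairs
```

Intrarater agreement follows from the same function if each inner list holds one rater's original and repeated score for the same sample.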
A mixed-effects linear regression model was used to account for both within- and between-subjects measures due to the partial crossover design, as some subjects participated in both the no-treatment control phase as well as the treatment phase of the study. Efficacy of the two therapy approaches relative to the no-treatment control condition was first analyzed. In the mixed-effects linear regression model, the outcome variable was the posttherapy measurement. Two indicator (dummy) variables were included as predictors (1 = FRT, 0 = otherwise; 1 = VFE, 0 = otherwise), so that the no-treatment control phase was the referent group. The pretherapy (baseline) outcome measurement was included as a covariate to control for any baseline differences between groups. In this model, the regression coefficient for FRT therapy represents the difference in change from baseline between FRT therapy and the no-treatment control phase (the measure of FRT efficacy). The coefficient for VFE is interpreted similarly. Comparison of posttherapy outcomes while controlling for pretreatment baselines measures the difference in pre- to posttreatment change, while providing greater statistical power than a direct change analysis (i.e., computing the difference between pre- and posttreatment and using that for the outcome variable). Furthermore, this approach is not subject to the regression towards the mean bias that occurs in a direct change analysis (Frison & Pocock, 1992; Vickers & Altman, 2001). This model was applied to both the VHI and CAPE-V scores. To test the efficacy of FRT therapy against VFE on the primary outcome variable, VHI, the regression coefficient of FRT therapy (FRT relative to control) was compared to the regression coefficient of VFE (VFE relative to control) using a Wald posttest (Harrell, 2001). 
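One way to write the model described above is sketched below. The notation is ours, not the authors'. A subject-level random intercept $u_i$ is assumed, since the text specifies a mixed-effects model but does not state its random-effects structure. Here $i$ indexes subjects and $j$ indexes study phases (control or treatment).

```latex
\begin{align}
Y^{\text{post}}_{ij} &= \beta_0
  + \beta_{\text{FRT}}\,\text{FRT}_{ij}
  + \beta_{\text{VFE}}\,\text{VFE}_{ij}
  + \beta_{\text{pre}}\,Y^{\text{pre}}_{ij}
  + u_i + \varepsilon_{ij}, \\
u_i &\sim \mathcal{N}(0, \sigma_u^2), \qquad
\varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2), \\
z &= \frac{\hat{\beta}_{\text{FRT}} - \hat{\beta}_{\text{VFE}}}
          {\widehat{\mathrm{SE}}\!\left(\hat{\beta}_{\text{FRT}} - \hat{\beta}_{\text{VFE}}\right)}
\end{align}
```

In this form, $\beta_{\text{FRT}}$ and $\beta_{\text{VFE}}$ are the baseline-adjusted differences from the no-treatment control phase, and the last line is the Wald statistic for the FRT versus VFE comparison under the null hypothesis $\beta_{\text{FRT}} = \beta_{\text{VFE}}$.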
For the secondary outcome, CAPE-V, which is made up of four subscales, a multiple comparison adjustment was made for four comparisons using the Bonferroni multiple-comparison procedure. The confidence intervals were similarly adjusted for four comparisons, so the reported 95% confidence interval (CI) is actually a 98.75% CI, which is a Bonferroni-adjusted CI.
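The arithmetic behind the Bonferroni adjustment can be sketched as below. The function names are hypothetical, and the p-value adjustment shown (multiply by the number of comparisons and cap at 1) is the standard Bonferroni form, which is assumed here to be the one the authors applied.

```python
def bonferroni_levels(alpha, n_comparisons):
    """Return the per-comparison alpha and the matching CI coverage."""
    per_comparison_alpha = alpha / n_comparisons
    ci_coverage = 1.0 - per_comparison_alpha
    return per_comparison_alpha, ci_coverage


def bonferroni_adjusted_p(p_value, n_comparisons):
    """Standard Bonferroni-adjusted p value, capped at 1."""
    return min(1.0, p_value * n_comparisons)


# Four CAPE-V subscales at a familywise alpha of .05, as in the text.
per_alpha, coverage = bonferroni_levels(0.05, 4)
# per_alpha is 0.0125 and coverage is 0.9875, i.e., a 98.75% CI
```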
For descriptive purposes, VHI pre- and postchange scores were calculated for each subject (post minus pre), as well as group averages and standard deviations at each time point. Change scores were tallied within score ranges. The original VHI validation study states that 18 points is a minimum clinically significant change, on the basis of test–retest variability findings (Jacobson et al., 1997). Subsequent studies have found a range of values for minimally significant change as low as 8–13 points (Solomon et al., 2013). In particular, small amounts of change in mild and moderate scores may be more clinically significant, representing a larger percentage of change (Rosen, Murry, Zinn, Zullo, & Sonbolian, 2000). In a prospective study of 91 patients, Solomon et al. (2013) found that a change in VHI total score of as little as 13–16 points was a highly sensitive and specific indicator of clinically meaningful change in voice. Thus, total VHI change scores were tabulated in the following score ranges: increased, decreased 0–12, 13–17, and ≥18.
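The descriptive tally can be sketched as follows, using the FRT group's pre- and posttherapy VHI totals as reported in Table 2. The function and variable names are hypothetical, and a change of exactly 0 is assumed to fall in the "decreased 0-12" range, since the text does not say how an unchanged score was classified.

```python
from collections import Counter
from statistics import mean, stdev


def categorize_change(change):
    """Classify a VHI change score (post minus pre; negative = improvement)."""
    if change > 0:
        return "increased"
    decrease = -change
    if decrease <= 12:
        return "decreased 0-12"   # assumption: no change (0) lands here
    if decrease <= 17:
        return "decreased 13-17"
    return "decreased >=18"


# FRT group VHI totals from Table 2, in table order.
frt_pre = [60, 23, 46, 18, 33, 39, 46, 23, 60, 46]
frt_post = [4, 25, 37, 8, 27, 17, 30, 15, 45, 8]

changes = [post - pre for pre, post in zip(frt_pre, frt_post)]
tally = Counter(categorize_change(c) for c in changes)

mean_change = mean(changes)   # group average change
sd_change = stdev(changes)    # sample standard deviation (n - 1)
```

With these inputs the tally gives one increased score, four decreases of 0 to 12 points, two of 13 to 17 points, and three of 18 points or more. The mean change rounds to −17.8 and the sample standard deviation to 17.2, matching Table 2.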
Results
All subjects tracked their home practice on a daily log sheet. Subjects in the FRT group averaged 14.4 min per day of home exercise, and subjects in the VFE group averaged 14.5 min per day. Therapy session length was tracked by the clinician. FRT sessions averaged 42 min in length, whereas VFE sessions averaged 51 min in length. Sixteen of the 20 subjects completed all six sessions of therapy. Two subjects in each group missed one to three sessions due to illness or time conflicts.
VHI
A mixed-effects linear regression was used to test for significant differences in VHI scores between each treatment group (VFE and FRT) and the no-treatment control phase. Both treatment groups showed significantly more improvement in the total VHI score than the control condition (FRT: p < .001; VFE: p = .048). The change coefficient for FRT therapy was −12.6 (95% CI [−17.8, −7.4]), and for VFE was −5.4 (95% CI [−10.8, 0.04]), where a negative number represents a better pre- to posttreatment outcome than the control condition.
In a postregression comparison of slopes, the change for FRT therapy was in the direction of a greater improved outcome than VFE (difference = −7.2, 95% CI [−14.5, 0.1], p = .054). The lower bound of this confidence interval, −14.5, represents 14.5 points more reduction in total VHI scores with FRT therapy than with VFE. The upper bound of this confidence interval, 0.1, represents 0.1 points less reduction in total VHI scores with FRT therapy than with VFE.
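As a rough consistency check on the reported comparison, the standard error and p value can be recovered from the confidence interval. This sketch assumes a normal (Wald) approximation with a critical value of 1.96. The authors' software may have used a different reference distribution, so only approximate agreement with the reported p = .054 should be expected. The function names are hypothetical.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def wald_from_ci(estimate, ci_lower, ci_upper, z_crit=1.959964):
    """Recover SE, z, and two-sided p from an estimate and its 95% CI.

    Assumes the CI was built as estimate +/- z_crit * SE.
    """
    se = (ci_upper - ci_lower) / (2.0 * z_crit)
    z = estimate / se
    p = 2.0 * (1.0 - normal_cdf(abs(z)))
    return se, z, p


# FRT vs. VFE difference and 95% CI as reported in the text.
se, z, p = wald_from_ci(-7.2, -14.5, 0.1)
```

Under these assumptions the standard error comes out near 3.7 VHI points and the two-sided p value near .05, in line with the reported result.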
Individual total VHI change scores were calculated for all conditions (control, FRT, VFE), and were tallied in score range categories (increased, decreased 0–12, 13–17, and ≥18). Individual scores, group averages and standard deviations, and change scores are presented in Tables 1, 2, and 3. In the control phase, five of 10 subjects' scores increased, four subjects' scores decreased by 0–12 points, and one subject's score decreased by ≥18 points. In the FRT group, one of 10 subjects' scores increased, four subjects' scores decreased by 0–12 points, two subjects' scores decreased by 13–17 points, and three subjects' scores decreased by ≥18 points. In the VFE group, three of 10 subjects' scores increased, two subjects' scores decreased by 0–12 points, one subject's score decreased by 13–17 points, and four subjects' scores decreased by ≥18 points.
Table 1. Voice Handicap Index (VHI) total scores before and after the no-treatment control phase.
Subject | Pre-VHI | Post-VHI | Change |
---|---|---|---|
F19 | 45 | 46 | 1 |
F20 | 16 | 18 | 2 |
F21 | 66 | 36 | −30 |
F23 | 25 | 33 | 8 |
F25 | 26 | 27 | 1 |
F26 | 36 | 46 | 10 |
F27 | 52 | 46 | −6 |
F28 | 25 | 23 | −2 |
F29 | 38 | 37 | −1 |
M16 | 71 | 68 | −3 |
Average | 40 | 38 | −2 |
SD | 18.4 | 14.4 | 11.0 |
Table 2. Voice Handicap Index (VHI) total scores before and after flow-resistant tube (FRT) therapy.
Subject | Pre-VHI | Post-VHI | Change |
---|---|---|---|
F03 | 60 | 4 | −56 |
F16 | 23 | 25 | 2 |
F19 | 46 | 37 | −9 |
F20 | 18 | 8 | −10 |
F23 | 33 | 27 | −6 |
F24 | 39 | 17 | −22 |
F26 | 46 | 30 | −16 |
F28 | 23 | 15 | −8 |
M13 | 60 | 45 | −15 |
M15 | 46 | 8 | −38 |
Average | 39.4 | 21.6 | −17.8 |
SD | 15.0 | 13.5 | 17.2 |
Table 3. Voice Handicap Index (VHI) total scores before and after Vocal Function Exercises (VFE) therapy.
Subject | Pre-VHI | Post-VHI | Change |
---|---|---|---|
F13 | 58 | 30 | −28 |
F15 | 55 | 19 | −36 |
F17 | 51 | 28 | −23 |
F21 | 36 | 11 | −25 |
F22 | 10 | 18 | 8 |
F25 | 27 | 20 | −7 |
F27 | 46 | 30 | −16 |
F29 | 37 | 41 | 4 |
M14 | 64 | 78 | 14 |
M16 | 68 | 62 | −6 |
Average | 45.2 | 33.7 | −11.5 |
SD | 17.9 | 21.2 | 16.8 |
Consensus Auditory-Perceptual Evaluation of Voice
A mixed-effects linear regression was used to test for significant differences in masked CAPE-V scores between each treatment group (VFE and FRT) and the no-treatment control condition. Analyses were completed for the parameters Overall Severity, Roughness, Breathiness, and Strain. After adjusting the p values and CIs for four multiple comparisons, given four CAPE-V subscales, separately within each treatment group, FRT therapy had significant improvement in Roughness relative to control (change coefficient = −10.2, 95% CI [−20.1, −0.26], p = .040), but VFE did not achieve significant improvement (change coefficient = −7.8, 95% CI [−17.5, 1.8], p = .17). In both groups, results were not significant for Overall Severity (FRT change coefficient = −6.4, 95% CI [−15.9, 3.1], p = .37; VFE change coefficient = −7.4, 95% CI [−17.0, 2.3], p = .23), Strain (FRT change coefficient = −8.8, 95% CI [−20.5, 2.9], p = .24; VFE change coefficient = −8.6, 95% CI [−20.2, 3.0], p = .26) or Breathiness (FRT change coefficient = −3.3, 95% CI [−10.7, 4.1], p = .68; VFE change coefficient = −4.2, 95% CI [−11.8, 3.4], p = .68).
Interrater agreement was assessed by calculating the probability of agreement (±21.5 mm) between each possible pair of raters for each voice sample (probability of chance agreement p = .39). The probability of interrater agreement was p = .83 for Overall Severity, p = .75 for Roughness, p = .81 for Breathiness, and p = .59 for Strain.
Intrarater agreement was assessed by calculating the probability that a rater agreed with him/herself (±21.5 mm) in repeated ratings of the same voice sample (probability of chance agreement p = .39). The probability of intrarater agreement for each rater and each voice quality dimension is given in Table 4. The results for both interrater and intrarater agreement are consistent with other studies of expert listener agreement for auditory perceptual judgments of voice quality (Eadie & Kapsner-Smith, 2011; Kreiman & Gerratt, 1998).
Table 4. Probability of intrarater agreement for each rater and voice quality dimension.
Rater | Overall Severity | Roughness | Breathiness | Strain |
---|---|---|---|---|
Rater 1 | .9 | .9 | .9 | 1.0 |
Rater 2 | 1.0 | .9 | 1.0 | 1.0 |
Rater 3 | .9 | .9 | .6 | .7 |
Note. Chance agreement p = .39.
Exit Interviews
Interviews were conducted with each subject at the final evaluation session. Responses were tabulated and are presented in Table 5. Statistical analysis was not performed; data are presented for descriptive purposes only. Of note, nearly all subjects perceived improvement in voice after treatment, and many reported decreased vocal fatigue. The FRT subjects had a higher rate of receiving positive comments from others about their voice, indicated more often that they spoke more posttherapy, and more often noted an increased ability to participate in activities.
Table 5. Exit interview responses, as the percentage of subjects in each treatment group.
Interview question | FRT (%) | VFE (%) |
---|---|---|
Noticed improvements in speech or voice | 100 | 90 |
Received positive comments about voice from others | 60 | 30 |
Less vocal fatigue/pain than before | 60 | 70 |
More vocal fatigue/pain than before | 0 | 0 |
Voice is better in the evenings than before treatment | 80 | 70 |
Speech is less effortful | 80 | 90 |
Felt voice was back to usual | 90 | 80 |
Spoke more after therapy | 40 | 10 |
Noticed increased attentiveness from others | 40 | 0 |
Increased participation ability | 50 | 20 |
Decreased pitch breaks | 80 | 80 |
Increased steadiness | 90 | 90 |
Improved singing voice | 80 | 70 |
Note. FRT = flow-resistant tubes; VFE = Vocal Function Exercises.
Discussion
SOVT exercises, such as phonation through thin tubes, have been used for some time in both professional voice coaching and in voice rehabilitation. Anecdotal reports suggest they are beneficial, but until now no controlled trials have been completed to examine their efficacy. This prospective clinical RCT aimed to establish preliminary evidence regarding the efficacy of FRT phonation for treating dysphonia.
Based upon comparison of pre- and posttreatment scores on the VHI (Jacobson et al., 1997), this study demonstrated significant positive effects for both VFE and FRT therapy in patients with mild to moderate dysphonia and/or vocal fatigue, relative to a no-treatment control condition. Furthermore, these results support the conclusion that FRT therapy is noninferior to, or just as good as, VFE. This comes from examination of the upper bound of the 95% CI for the FRT to VFE comparison, which was 0.1. With a positive number representing less improvement, one can conclude with 95% confidence that FRT therapy has at most one tenth of a single point less improvement than VFE on the VHI outcome. A difference this small is of no clinical consequence, being no different than equal improvement.
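The noninferiority reasoning above can be sketched as a comparison between the upper CI bound and a noninferiority margin. The text does not state a prespecified margin, so the 13-point value below is purely an illustrative assumption, borrowed from the lower end of the clinically meaningful change range cited from Solomon et al. (2013). The function name is hypothetical.

```python
def is_noninferior(ci_upper_bound, margin):
    """Noninferiority holds when the CI excludes the margin.

    Positive values mean LESS improvement for the new treatment (FRT)
    than for the comparison treatment (VFE), as in the text.
    """
    return ci_upper_bound < margin


FRT_VS_VFE_CI_UPPER = 0.1   # upper bound of the 95% CI reported in the text
ILLUSTRATIVE_MARGIN = 13.0  # ASSUMPTION: not a margin stated by the authors

frt_noninferior = is_noninferior(FRT_VS_VFE_CI_UPPER, ILLUSTRATIVE_MARGIN)
```

Because the upper bound is only 0.1 points, the same conclusion would hold for any margin larger than that, which is the sense in which the authors describe the difference as being of no clinical consequence.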
Inspection of raw VHI change scores reveals comparable results for the two treatment groups. Half of the subjects in each treatment group reduced their total VHI scores by ≥13 points, but only one subject in the control condition had a decrease in that range. Of note, the one subject who experienced improvement during the control phase also improved significantly during the treatment phase (VFE), and therefore may have had other factors influencing her voice quality of life. In a study by Solomon et al. (2013), a difference as small as 13–16 points in total VHI score was a highly sensitive and specific indicator of clinically meaningful change. This suggests that half of the subjects in each treatment group experienced clinically meaningful change in voice quality of life after completion of the therapy protocol.
CAPE-V scores were included in the present study as a secondary measure. A statistically significant difference was seen between the FRT treatment group and the control phase for the voice quality dimension Roughness as rated on the CAPE-V by masked judges; no other significant results were found. Given the relatively small amount of change, as indicated by the change coefficient (−10.2, CI [−17.9, −2.39]), it is unclear to what degree this result is clinically significant. Normative data do not exist for the CAPE-V instrument, and further study is needed to determine whether small but statistically significant differences such as these are meaningful. Results for Strain were not significant, despite the fact that the majority of subjects reported decreased vocal effort and fatigue after treatment. Results for Strain should be interpreted with caution, given that the probability of interrater agreement was only p = .59. In studies of auditory perceptual assessment of voice, Strain is consistently the least reliable dimension, and in fact may require cues available to the speaker but not the listener in order to judge (e.g., kinesthetic; Eadie & Kapsner-Smith, 2011).
The efficacy of VFE for treating dysphonia and presbyphonia has already been established in several studies (Gillivan-Murphy et al., 2006; Gorman et al., 2008; Roy et al., 2001; Sauder et al., 2010; Tay et al., 2012; Ziegler et al., 2013), and this study adds support to this evidence. Given the repeated findings that VFE are an effective voice therapy technique, our finding that a 0.4-cm–diameter FRT induced a comparable amount of change in VHI scores as VFE in subjects with dysphonia provides evidence for the efficacy of semi-occluding the vocal tract in vocal exercise.
Some hypotheses regarding the physical mechanisms of voice improvement using SOVT exercises may be developed based upon previous modeling studies. As described above, semi-occlusion of the vocal tract creates a positive intraoral pressure that may facilitate an optimal, near-rectangular shape of the glottis (Titze, 1988). Laukkanen et al. (2008) found that vocal economy and glottal efficiency increase with an increased thyroarytenoid-to-cricothyroid activation ratio in computer modeling, and observed increased TA activation using electromyography in one subject during SOVT exercises. Also, increased vocal tract inertance facilitates self-sustained oscillation of the vocal folds. For example, Titze and Laukkanen (2007) conducted a modeling study in which they simulated phonation through a tube and the vowel /u/, while varying the degree of glottal adduction and epilaryngeal tube area. They found that oral semi-occlusion increased inertive reactance in the 200–1000 Hz range, but noted that the effect was strong only when the epilaryngeal tube was also narrowed. Furthermore, they examined the effects on the economy of voice production, defined as maximum flow declination rate divided by maximum area declination rate, and found that the greatest economy occurred when the epilaryngeal tube was narrowed, provided that adduction was sufficient. This effect for tube phonation was comparable to that for /u/, but intraoral acoustic pressures during tube phonation were three times greater than those during production of /u/. They concluded that this may provide an abundance of vibratory sensations in the face that could provide feedback for learning optimal vocal tract configuration for vocal economy. 
Also, there is preliminary evidence from two single-subject CT imaging studies to suggest that the area ratio between the pharyngeal inlet and epilaryngeal tube may in fact increase in response to phonation through tubes (Guzman et al., 2013; Vampola et al., 2011), particularly when the tube is very narrow, as utilized in the present study (Guzman et al., 2013). These combined effects—optimization of glottal shape, epilaryngeal narrowing, and increased inertance of the vocal tract—are likely to be facilitated by tube phonation. Subjects may rely on sensory cues such as facial vibrations to identify and habituate ideal vocal tract configurations. Further study of the physiologic effects of FRT therapy is needed to confirm the mechanisms that underlie the treatment effects observed in this study.
The role of motor learning in nonspeech voice exercises is an interesting question that remains to be answered. Although the current study does not address this question, we propose that the design of the FRT protocol used in this study is not in conflict with principles of motor learning, including task specificity, distributed practice, and provision of knowledge of results. Phonation into a narrow tube while producing speech-like prosody (the “reading” task in this study) is a task very similar to speech. It is simplified by the removal of articulation, which reduces the number of potential vocal instabilities created by a constantly changing vocal tract shape. In addition, semi-occlusion and elongation of the vocal tract impose conditions that create inertive reactance, a desirable condition for vocal fold vibration. With approaches that do not impose as much control on the vocal tract configuration, such as VFE or humming in LMRVT, more trial and error is generally necessary to achieve an accurate production. The tube in FRT therapy might be thought of as akin to training wheels on a bicycle that keep the rider upright; it allows the user to experience the target behavior reliably before mastery is achieved. In terms of attention to movement effects (knowledge of results), subjects have access to sensory cues including vibration and ease during FRT exercises, possibly related to the acoustic pressures of an inertive vocal tract and reduced PTP. As alluded to above, subjects may use those same sensory cues to facilitate establishing similarly efficient phonation during speech when the tube is removed. Finally, frequent, distributed practice is a key element of the FRT program, in order to facilitate learning.
In a postregression comparison of change slopes, results were in the direction of a greater improvement for FRT therapy than VFE (7.2 more points of improvement), though the 95% CI ranged from 14.5 more points of improvement to 0.1 fewer points of improvement for FRT therapy over VFE. Superiority testing with a larger group of subjects may clarify whether significant differences in outcomes exist between the two treatments. Differences such as ease of learning of the two treatment programs may play a role in treatment outcomes. Although no objective measures were taken regarding learning of the two exercise programs, clinician observations suggest the FRT therapy may be easier for subjects to master. Most subjects were able to learn and independently perform the FRT exercises adequately within one session. The time required for learning VFE varied between subjects, from one to several sessions; in some cases nearly the full treatment period was needed for mastery of VFE. This difference may be explained by the fact that during FRT exercises, the semi-occlusion is produced and controlled automatically with the presence of a small tube or straw in the mouth; accurate production of the “buzzy” sensation during VFE requires the subject to discover and then consistently reproduce a SOVT configuration. Furthermore, pitch matching is required to accurately perform VFE, which requires those subjects with no background in music or singing to dedicate cognitive resources to this unfamiliar task. In contrast, pitch is not specified in FRT therapy, beyond general instructions to vary upward or downward. The FRT home program may also be easier for subjects to implement correctly than the VFE home program. FRT exercises are easy to complete without cues other than timing, which can be done with an ordinary watch. In contrast, VFE exercises require pitch cues, using either a recorded model or a pitch pipe or keyboard.
Loudness differences between the SOVT exercises used in the FRT therapy versus VFE may also contribute to the difference in outcomes. VFE are performed in as soft a voice as possible, while still engaging vocally, with no variation of prosody or accents. The FRT program designed for this study specifically uses full voice as well as varying prosody (melody and accents). The “reading through the straw” exercise allows subjects to experience speech-like prosody during semi-occlusion. These elements may have encouraged carryover of the beneficial effects of semi-occlusion into daily voice use.
In the present study, half of the subjects in each treatment condition (VFE and FRT) experienced clinically meaningful reductions in VHI scores after completion of the treatment protocol. Given the small number of subjects in the present study, it is not possible to characterize those who experienced improvement versus those who did not in any meaningful way. It is possible that subjects vary in their response to nonspeech SOVT treatments on the basis of dysphonia diagnosis, time since onset, personality, or learning characteristics, among other attributes. Further studies examining subject characteristics and response to treatment would facilitate evidence-based selection of treatments for individual patients. In particular, because semi-occluding the vocal tract purports to produce a separation of the vocal folds with supraglottal pressure, which may alleviate problems with hyperadduction, it would be important to test the value of these therapies for disorders related to hypoadduction.
One limitation of this study is that the majority of the therapy was provided by a single clinician, thus introducing the possibility of clinician bias. A second limitation of the study results relates to study design; because a sham treatment condition was not included, it is possible that the effects measured in this study were due to a placebo effect. VFE were selected as a comparison treatment on the basis of prior clinical trials with positive results. A placebo effect, if present, should be similar in both treatment groups. Given that several well-designed studies have documented treatment efficacy for VFE utilizing a variety of outcome measures, this would appear to lend support to the present results; however, future studies should utilize a placebo treatment group to confirm these results. This study also included relatively small numbers of subjects; given that significant results were found, it appears the study was adequately powered to test efficacy of the treatments. Another limitation of this study is the lack of longer-term follow-up, which is important to document learning and stabilization of the new, desirable behaviors. Also, treatment outcomes were limited to voice quality of life, with exploratory analysis of auditory perceptual voice quality. Future studies should involve multiple clinicians and clinical sites, as well as additional treatment outcomes such as aerodynamic and videostroboscopic measurement, to strengthen evidence of treatment effects and effectiveness.
Conclusions
In the present study, FRT therapy was an effective voice therapy protocol that improved voice quality of life in subjects with mild to moderate dysphonia and/or vocal fatigue. Significant improvement in VHI scores was observed in both FRT and VFE groups relative to no treatment. FRT therapy was shown to be noninferior to VFE in improving voice quality of life.
Further study is needed to establish additional treatment effects, appropriate diagnoses/patient characteristics to receive this therapy, and ideal treatment dosage and intensity. Treatment dosage and intensity should be investigated to determine ideal therapy and home practice schedules to effect change with maximum clinical efficiency. Exploration of the mechanisms underlying voice change with SOVT exercises is also incomplete. Also, the tube diameter and length were not varied in this study. In informal observation, subjects respond differently to small-diameter tubes and large-diameter tubes. Finally, maintenance of therapeutic benefits should also be studied with medium- and long-term follow-up with subjects.
Acknowledgments
This work was supported by the National Institute on Deafness and other Communication Disorders, Grant DC004224. Additional support was received from the National Center for Research Resources and the National Center for Advancing Translational Sciences through Grant 8UL1TR000105 (formerly UL1RR025764). Lynn Maxfield provided assistance in data collection and Russell Banks in data analysis. Greg Stoddard assisted with statistical analysis.
Appendix
Demographic Data for Subjects Enrolled in Study
Subject | Age | Diagnosis | Primary complaint | Time since onset |
---|---|---|---|---|
M13 | 53 | LPR, L TVF ulcer, MTD | Vocal fatigue, strain | 30 years |
M14 | 37 | Unilateral TVF paresis | Hoarseness, vocal fatigue, discomfort | 6 years |
M15 | 32 | L TVF hemorrhage and polyp, resolved | Reduced pitch and loudness range, loses voice easily | 8 months |
M16 | 61 | Remote GSW anterior neck, R TVF granuloma, LPR | Hoarseness | 43 years |
F03 | 60 | TVF edema vs. resolving nodules | Hoarseness, vocal fatigue | 1 year 3 months |
F13 | 37 | LPR, TVF edema | Hoarseness, vocal fatigue, cough | 3 years |
F15 | 47 | LPR, TVF edema | Hoarseness, low pitch, reduced volume, difficulty singing | 4 years |
F16 | 51 | LPR, TVF edema, MTD | Vocal fatigue | 2 years 2 months |
F17 | 34 | LPR, TVF edema, MTD | Vocal fatigue, reduced singing range | 1 year 6 months |
F19 | 66 | LPR, TVF edema, MTD | Hoarseness, cough | 5 months |
F20 | 72 | LPR, TVF edema | Hoarseness | 12 months |
F21 | 66 | Bilateral intubation granulomas, resolved | Hoarseness | 10 months |
F22 | 56 | LPR, TVF edema, MTD | Episodic hoarseness, coughing | 8 months |
F23 | 56 | TVF polyp excised 2001; R TVF polyp excised 2005; LPR, TVF edema, MTD | Hoarseness | 11 years |
F24 | 47 | LPR, TVF edema | Hoarseness, globus | 8 months |
F25 | 39 | LPR, TVF edema, MTD | Vocal fatigue, discomfort | 9 months |
F26 | 54 | L TVF paralysis, postthyroplasty | Hoarseness, vocal fatigue, difficulty singing | 4 years |
F27 | 42 | TVF nodules | Hoarseness, vocal fatigue, reduced loudness | 9 years |
F28 | 61 | LPR | Hoarseness, vocal fatigue | 10 years |
F29 | 59 | LPR/TVF edema | Hoarseness, vocal fatigue, globus, difficulty singing | 4 years |
Note. LPR = laryngopharyngeal reflux; L = left; TVF = true vocal fold; MTD = muscle tension dysphonia; GSW = gunshot wound; R = right.
References
- Aderhold E. (1963). Sprecherziehung des Schauspielers: Grundlagen und Methoden [Speech training of the actor: Principles and methods]. Berlin, Germany: Henschelverlag.
- Barrichelo-Lindstrom V., & Behlau M. (2009). Resonant voice in acting students: Perceptual and acoustic correlates of the trained Y-Buzz by Lessac. Journal of Voice, 23, 603–609.
- Bassiouny S. (1998). Efficacy of the accent method of voice therapy. Folia Phoniatrica et Logopaedica, 50, 146–164.
- Briess F. B. (1957). Voice therapy: Part I. Identification of specific laryngeal muscle dysfunction by voice testing. Archives of Otolaryngology–Head & Neck Surgery, 66, 375–382.
- Briess F. B. (1959). Voice therapy: Part II. Essential treatment phases of specific laryngeal muscle dysfunction. Archives of Otolaryngology–Head & Neck Surgery, 69, 61–69.
- Chan R. W., Titze I. R., & Titze M. R. (1997). Further studies of phonation threshold pressure in a physical model of the vocal fold mucosa. The Journal of the Acoustical Society of America, 101, 3722–3727.
- Chen S. H., Hsiao T.-Y., Hsiao L.-C., Chung Y.-M., & Chiang S.-C. (2007). Outcome of resonant voice therapy for female teachers with voice disorders: Perceptual, physiological, acoustic, aerodynamic, and functional measurements. Journal of Voice, 21, 415–425.
- Coffin B. (1987). Coffin's sounds of singing: Principles and applications of vocal techniques with chromatic vowel chart. Lanham, MD: Scarecrow Press.
- Cohen S. M. (2010). Self-reported impact of dysphonia in a primary care population: An epidemiological study. The Laryngoscope, 120, 2022–2032.
- Costa C. B., Costa L. H. C., Oliveira G., & Behlau M. (2011). Immediate effects of the phonation into a straw exercise. Brazilian Journal of Otorhinolaryngology, 77, 461–465.
- Eadie T. L., & Kapsner-Smith M. (2011). The effect of listener experience and anchors on judgments of dysphonia. Journal of Speech, Language, and Hearing Research, 54, 430–447.
- Enflo L., Sundberg J., Romedahl C., & McAllister A. (2013). Effects on vocal fold collision and phonation threshold pressure of resonance tube phonation with tube end in water. Journal of Speech, Language, and Hearing Research, 56, 1530–1538.
- Engel E. F. (1927). Stimmbildungslehre [Voice pedagogy]. Dresden, Germany: Weise. [Google Scholar]
- Fairbanks G. (1960). Voice and articulation drillbook (2nd ed.). New York, NY: HarperCollins. [Google Scholar]
- Fex B., Fex S., Shiromoto O., & Hirano M. (1994). Acoustic analysis of functional dysphonia: Before and after voice therapy (accent method). Journal of Voice, 8, 163–167. [DOI] [PubMed] [Google Scholar]
- Franic D. M., Bramlett R. E., & Bothe A. C. (2005). Psychometric evaluation of disease specific quality of life instruments in voice disorders. Journal of Voice, 19, 300–315. [DOI] [PubMed] [Google Scholar]
- Frison L., & Pocock S. J. (1992). Repeated measures in clinical trials: Analysis using mean summary statistics and its implications for design. Statistics in Medicine, 11, 1685–1704. [DOI] [PubMed] [Google Scholar]
- Gaskill C. S., & Quinney D. M. (2012). The effect of resonance tubes on glottal contact quotient with and without task instruction: A comparison of trained and untrained voices. Journal of Voice, 26, e79–e93. [DOI] [PubMed] [Google Scholar]
- Gillivan-Murphy P., Drinnan M. J., O'Dwyer T. P., Ridha H., & Carding P. (2006). The effectiveness of a voice treatment approach for teachers with self-reported voice problems. Journal of Voice, 20, 423–431. [DOI] [PubMed] [Google Scholar]
- Glazier P. S., Davids K., & Bartlett R. M. (2003). Dynamical systems theory: A relevant framework for performance-oriented sports biomechanics research. Sportscience, 7 Retrieved from http://www.sportsci.org/jour/03/psg.htm [Google Scholar]
- Gorman S., Weinrich B., Lee L., & Stemple J. C. (2008). Aerodynamic changes as a result of vocal function exercises in elderly men. The Laryngoscope, 118, 1900–1903. [DOI] [PubMed] [Google Scholar]
- Gundermann H. (1977). Die Behandlung der gestorten Sprechstimme [The treatment of the pathological speaking voice]. Stuttgart, NY: Fischer. [Google Scholar]
- Guzman M., Laukkanen A. M., Krupa P., Horáček J., Švec J. G., & Geneid A. (2013). Vocal tract and glottal function during and after vocal exercising with resonance tube and straw. Journal of Voice, 27, 523.e19–523.e34. [DOI] [PubMed] [Google Scholar]
- Habermann G. (1980). Funktionelle Stimmstörungen und ihre Behandlung [Functional voice disorders and their treatment]. Archives of Oto-Rhino-Laryngology, 227, 171–345. [DOI] [PubMed] [Google Scholar]
- Harrell F. E., Jr (2001). Regression modeling strategies with applications to linear models, logistic regression, and survival analysis. New York, NY: Springer. [Google Scholar]
- Jacobson B. H., Johnson A., Grywalski C., Silbergleit A., Jacobson G., Benninger M. S., & Newman C. W. (1997). The Voice Handicap Index (VHI): Development and validation. American Journal of Speech-Language Pathology, 6(3), 66–70. [Google Scholar]
- Kempster G. B., Gerratt B. R., Verdolini Abbott K., Barkmeier-Kraemer J., & Hillman R. E. (2009). Consensus auditory-perceptual evaluation of voice: Development of a standardized clinical protocol. American Journal of Speech-Language Pathology, 18, 124–132. [DOI] [PubMed] [Google Scholar]
- Kotby M. N., El-Sady S. R., Basiouny S. E., Abou-Rass Y. A., & Hegazi M. A. (1991). Efficacy of the accent method of voice therapy. Journal of Voice, 5, 316–320. [Google Scholar]
- Kotby M. N., & Fex B. (1998). The accent method: Behavior readjustment voice therapy. Logopedics Phonatrics Vocology, 23, 39–43. [Google Scholar]
- Kreiman J., & Gerratt B. R. (1998). Validity of rating scale measures of voice quality. The Journal of the Acoustical Society of America, 104, 1598–1608. [DOI] [PubMed] [Google Scholar]
- Kumaran D., & McClelland J. L. (2012). Generalization through the recurrent interaction of episodic memories: A model of the hippocampal system. Psychological Review, 119, 573–616. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Laukkanen A. (1992). About the so called “resonance tubes” used in Finnish voice training practice. Logopedics Phoniatrics Vocology, 17, 151–161. [Google Scholar]
- Laukkanen A. M., Lindholm P., & Vilkman E. (1995a). On the effects of various vocal training methods on glottal resistance and efficiency. Folia Phoniatrica et Logopaedica, 47, 324–330. [DOI] [PubMed] [Google Scholar]
- Laukkanen A. M., Lindholm P., & Vilkman E. (1995b). Phonation into a tube as a voice training method: Acoustic and physiologic observations. Folia Phoniatrica et Logopaedica, 47, 331–338. [DOI] [PubMed] [Google Scholar]
- Laukkanen A. M., Lindholm P., Vilkman E., Haataja K., & Alku P. (1996). A physiological and acoustic study on voiced bilabial fricative /β:/ as a vocal exercise. Journal of Voice, 10, 67–77. [DOI] [PubMed] [Google Scholar]
- Laukkanen A. M., Takalo R., Vilkman E., Nummenranta J., & Lipponen T. (1999). Simultaneous videofluorographic and dual-channel electroglottographic registration of the vertical laryngeal position in various phonatory tasks. Journal of Voice, 13, 60–71. [DOI] [PubMed] [Google Scholar]
- Laukkanen A. M., Titze I. R., Hoffman H., & Finnegan E. (2008). Effects of a semioccluded vocal tract on laryngeal muscle activity and glottal adduction in a single female subject. Folia Phoniatrica et Logopaedica, 60, 298–311. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Lessac A. (1997). The use and training of the human voice: A bio-dynamic approach to vocal life. Mountain View, CA: Mayfield. [Google Scholar]
- Linklater K. (1976). Freeing the natural voice. New York, NY: Drama Book Publishers. [Google Scholar]
- Maxfield L., Titze I., Hunter E., & Kapsner-Smith M. (2014). Intraoral pressures produced by thirteen semi-occluded vocal tract gestures. Logopedics Phonatrics Vocology. Advance online publication. doi:10.3109/14015439.2014.913074 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Nguyen D. D., & Kenny D. T. (2009). Randomized controlled trial of vocal function exercises on muscle tension dysphonia in Vietnamese female teachers. Journal of Otolaryngology–Head & Neck Surgery, 38, 261–278. [PubMed] [Google Scholar]
- Nix J. (1999). Lip trills and raspberries: “High spit factor” alternatives to the nasal continuant consonants. Journal of Singing, 55, 15–19. [Google Scholar]
- Orbelo D. M., Li N. Y.-K., & Verdolini Abbott K. (2014). Lessac-Madsen resonant voice therapy in the treatment of secondary MTD. In Stemple J. C., & Hapner E. R. (Eds.), Voice therapy: Clinical studies (4th ed.). San Diego: Plural. [Google Scholar]
- Pasa G., Oates J., & Dacakis G. (2007). The relative effectiveness of vocal hygiene training and vocal function exercises in preventing voice disorders in primary school teachers. Logopedics Phoniatrics Vocology, 32, 128–140. [DOI] [PubMed] [Google Scholar]
- Peterson K. L., Verdolini-Marston K., Barkmeier J. M., & Hoffman H. T. (1994). Comparison of aerodynamic and electroglottographic parameters in evaluating clinically relevant voicing patterns. Annals of Otology, Rhinology, and Laryngology, 103, 335–346. [DOI] [PubMed] [Google Scholar]
- Rosen C. A., Murry T., Zinn A., Zullo T., & Sonbolian M. (2000). Voice Handicap Index change following treatment of voice disorders. Journal of Voice, 14, 619–623. [DOI] [PubMed] [Google Scholar]
- Rothenberg M. (1981). Acoustic interaction between the glottal source and the vocal tract. In Stevens K. N., & Hirano M. (Eds.), Vocal fold physiology (pp. 305–328). Tokyo, Japan: University of Tokyo Press. [Google Scholar]
- Roy N., Gray S. D., Simon M., Dove H., Corbin-Lewis K., & Stemple J. C. (2001). An evaluation of the effects of two treatment approaches for teachers with voice disorders: A prospective randomized clinical trial. Journal of Speech, Language, and Hearing Research, 44, 286–296. [DOI] [PubMed] [Google Scholar]
- Roy N., Merrill R. M., Gray S. D., & Smith E. M. (2005). Voice disorders in the general population: Prevalence, risk factors, and occupational impact. The Laryngoscope, 115, 1988–1995. [DOI] [PubMed] [Google Scholar]
- Roy N., Weinrich B., Gray S. D., Tanner K., Stemple J. C., & Sapienza C. M. (2003). Three treatments for teachers with voice disorders: A randomized clinical trial. Journal of Speech, Language, and Hearing Research, 46, 670–688. [DOI] [PubMed] [Google Scholar]
- Sabol J. W., Lee L., & Stemple J. C. (1995). The value of vocal function exercises in the practice regimen of singers. Journal of Voice, 9, 27–36. [DOI] [PubMed] [Google Scholar]
- Sampaio M., Oliveira G., & Behlau M. (2008). Investigation of the immediate effects of two semi-ocluded vocal tract exercises. Pró-Fono, 20, 261–266. [DOI] [PubMed] [Google Scholar]
- Sauder C., Roy N., Tanner K., Houtz D. R., & Smith M. E. (2010). Vocal function exercises for presbylaryngis: A multidimensional assessment of treatment outcomes. Annals of Otology, Rhinology, and Laryngology, 119, 460–467. [DOI] [PubMed] [Google Scholar]
- Schmidt R. A. (1975). A schema theory of discrete motor skill learning. Psychological Review, 82, 225–260. [Google Scholar]
- Schulz K. F., Altman D. G., & Moher D. (2010). CONSORT 2010 statement: Updated guidelines for reporting parallel group randomised trials. British Medical Journal, 340, c332. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Schwarz K., & Cielo C. A. (2009). Vocal and laryngeal modifications produced by the sonorous tongue vibration technique. Pró-Fono, 21, 161–166. [DOI] [PubMed] [Google Scholar]
- Simberg S., & Laine A. (2007). The resonance tube method in voice therapy: Description and practical implementations. Logopedics Phonatrics Vocology, 32, 165–170. [DOI] [PubMed] [Google Scholar]
- Smith C. G., Finnegan E. M., & Karnell M. P. (2005). Resonant voice: Spectral and nasendoscopic analysis. Journal of Voice, 19, 607–622. [DOI] [PubMed] [Google Scholar]
- Smith S., & Thyme K. (1976). Statistic research on changes in speech due to pedagogic treatment (the accent method). Folia Phoniatrica et Logopaedica, 28, 98–103. [DOI] [PubMed] [Google Scholar]
- Solomon N. P., Helou L. B., Henry L. R., Howard R. S., Coppit G., Shaha A. R., & Stojadinovic A. (2013). Utility of the Voice Handicap Index as an indicator of postthyroidectomy voice dysfunction. Journal of Voice, 27, 348–354. [DOI] [PubMed] [Google Scholar]
- Sovijärvi A. (1966). Äänifysiologiasta ja artikulaatiotekniikasta [On voice physiology and articulatory technique]. Helsinki, Finland: Department of Phonetics, University of Helsinki. [Google Scholar]
- Spiess G. (1904). Kurze anleitung zur erlernung einer richtigen tonbildung in sprache und gesang [Short comment on learning correct sound production in speech and singing]. Frankfurt, Germany: Verlag von Johannes Alt. [Google Scholar]
- Stemple J. C. (1993). Voice therapy: Clinical studies. St. Louis, MO: Mosby Year Book. [Google Scholar]
- Stemple J. C. (2005). A holistic approach to voice therapy. Seminars in Speech and Language, 26, 131–137. [DOI] [PubMed] [Google Scholar]
- Stemple J. C., Lee L., D'Amico B., & Pickup B. (1994). Efficacy of vocal function exercises as a method of improving voice production. Journal of Voice, 8, 271–278. [DOI] [PubMed] [Google Scholar]
- Tapani M. (1992). Resonaattoriputki toiminnallisen ääihäiriön hoitmenetelmänä. Seitsemän naispotilaan seurantatutkimus [Resonance tube as a therapy method for a functional voice disorder. A follow-up study of seven female patients] (Unpublished master's thesis). University of Helsinki, Helsinki, Finland. [Google Scholar]
- Tay E. Y., Phyland D. J., & Oates J. (2012). The effect of vocal function exercises on the voices of aging community choral singers. Journal of Voice, 26, 672.e19–e27. [DOI] [PubMed] [Google Scholar]
- Titze I. R. (1988). The physics of small-amplitude oscillation of the vocal folds. The Journal of the Acoustical Society of America, 83, 1536–1552. [DOI] [PubMed] [Google Scholar]
- Titze I. R. (2001). Acoustic interpretation of resonant voice. Journal of Voice, 15, 519–528. [DOI] [PubMed] [Google Scholar]
- Titze I. R. (2004). A theoretical study of F0-F1 interaction with application to resonant speaking and singing voice. Journal of Voice, 18, 292–298. [DOI] [PubMed] [Google Scholar]
- Titze I. R. (2006a). Theoretical analysis of maximum flow declination rate versus maximum area declination rate in phonation. Journal of Speech, Language, and Hearing Research, 49, 439–447. [DOI] [PubMed] [Google Scholar]
- Titze I. R. (2006b). Voice training and therapy with a semi-occluded vocal tract: Rationale and scientific underpinnings. Journal of Speech, Language, and Hearing Research, 49, 448–459. [DOI] [PubMed] [Google Scholar]
- Titze I. R. (2014). Bi-stable vocal fold adduction: A mechanism of modal-falsetto register shifts and mixed registration. The Journal of the Acoustical Society of America, 135, 2091–2101. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Titze I. R., Finnegan E., Laukkanen A.-M., & Jaiswal S. (2002). Raising lung pressure and pitch in vocal warm-ups: The use of flow-resistant straws. Journal of Singing, 58, 329–338. [Google Scholar]
- Titze I. R., & Laukkanen A. M. (2007). Can vocal economy in phonation be increased with an artificially lengthened vocal tract? A computer modeling study. Logopedics Phoniatrics Vocology, 32, 147–156. [DOI] [PubMed] [Google Scholar]
- Titze I. R., & Verdolini Abbott K. (2012). Vocology: The science and practice of voice habilitation. Salt Lake City, UT: National Center for Voice and Speech. [Google Scholar]
- Vampola T., Laukkanen A. M., Horacek J., & Svec J. G. (2011). Vocal tract changes caused by phonation into a tube: A case study using computer tomography and finite-element modeling. The Journal of the Acoustical Society of America, 129, 310–315. [DOI] [PubMed] [Google Scholar]
- Verdolini K. (2000). Resonant voice therapy. In Stemple J. C. (Ed.), Voice therapy: Clinical case studies (2nd ed., pp. 46–61). San Diego, CA: Singular. [Google Scholar]
- Verdolini K., Druker D. G., Palmer P. M., & Samawi H. (1998). Laryngeal adduction in resonant voice. Journal of Voice, 12, 315–327. [DOI] [PubMed] [Google Scholar]
- Verdolini-Marston K., Burke M. K., Lassac A., Glaze L., & Caldwell E. (1995). A preliminary study of two methods of treatment for laryngeal nodules. Journal of Voice, 9, 74–85. [DOI] [PubMed] [Google Scholar]
- Vickers A. J., & Altman D. G. (2001). Statistics notes: Analysing controlled trials with baseline and follow up measurements. British Medical Journal, 323, 1123–1124. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Ziegler A., Verdolini Abbott K., Johns M., Klein A., & Hapner E. R. (2013). Preliminary data on two voice therapy interventions in the treatment of presbyphonia. The Laryngoscope, 124, 1869–1876. [DOI] [PubMed] [Google Scholar]