Journal of Speech, Language, and Hearing Research. 2024 Sep 25;67(10):3521–3535. doi: 10.1044/2024_JSLHR-23-00727

Floating Ball Voice Therapy: Preliminary Effects on Outcomes and Predicting Individual Patient Differences in Generalization

Jarrad H Van Stan a,b,c, Robert E Hillman a,b,c, Carol Krusemark a,c, Jason Muise a,c, Tara Stadelman-Cohen a,c, Daryush D Mehta a,b,c, Dagmar Sternad d
PMCID: PMC11482575  PMID: 39320344

Abstract

Purpose:

Floating ball voice therapy (FBVT) is a voice-controlled virtual environment based on a common treatment component across multiple evidence-based therapies: improved vocal efficiency (target) via practicing voicing with modified resonance and airflow (ingredient). This study preliminarily tested FBVT's effects on outcomes and the potential for its novel variability metrics to predict individual patient generalization.

Method:

Ten patients with nonphonotraumatic vocal hyperfunction (NPVH) practiced FBVT for 10 days. Outcomes were assessed by a vocal efficiency ratio, a validated NPVH index, the patient-reported Voice-Related Quality of Life (V-RQOL), and forced-choice auditory judgments of overall severity. Exploration in early practice (Day 1) was estimated by how the patient's two-dimensional variability (mean airflow and intensity) related to error (difference between the patient-produced and normative vocal efficiency ratio). Generalization from the game to spontaneous speech was evaluated using the validated NPVH index.

Results:

Ten days of FBVT were associated with improved vocal efficiency (Cohen's d = 1.3), NPVH index (d = −1.1), V-RQOL total score (d = 0.9), and overall severity (odds ratio = 2.5). Patients who generalized on Day 10 exhibited airflow/intensity exploration that was more aligned with the error gradient on Day 1 (d = 0.6–1.2).

Conclusions:

A relatively small dosage of FBVT (i.e., 10 practice sessions) was associated with multiple improved voice therapy outcomes. The FBVT variability metrics on Practice Day 1 demonstrated strong potential to predict which patients generalized to connected speech. Future work can more thoroughly evaluate effects on outcomes and characterize the quality of vocal exploration in a larger patient population.

Supplemental Material:

https://doi.org/10.23641/asha.27040873


Many voice disorders are believed to be caused by and/or associated with pathological vocal behaviors in daily life, and voice therapy is the primary treatment option (Ramig & Verdolini, 1998). Successful voice therapy relies on the patient learning or relearning improved vocal motor skills that carry over from the clinic into the patient's activities of daily living, that is, generalization. Multiple voice therapies have been shown to elicit generalization in groups of patients. However, these therapies use very different ingredients, such as practicing voicing with forward resonance and/or modified airflow only in maximally sustained vowels (Angadi et al., 2019), while moving up and down a speech hierarchy (Roy et al., 2003), or only in spontaneous conversation (Gillespie et al., 2019). Across these different therapeutic approaches, generalization still varies widely among individual patients, from no effect to large positive effect sizes, and gains may either fade away after a few weeks or persist for years (Roy et al., 1997; Van Lierde et al., 2007). Arguably, objective prognostic metrics that predict which patients may generalize would help clinicians understand and reduce this wide range in outcomes.

The field of motor control and learning is rich with theories attempting to quantify how the central nervous system controls and learns new movements (e.g., Schmidt et al., 2018). Motor tasks with redundancy—tasks with multiple ways to achieve success—have potential to offer insights into how people generalize and retain new vocal motor behaviors (Müller & Sternad, 2009). By definition, redundant motor tasks have multiple variables for executing the action (execution variables) that map onto fewer variables that quantify the success of the task (result, e.g., error). This many-to-one mapping creates an infinite number of combinations of execution variables that achieve a given result or outcome. The relation between execution variables and the result creates a manifold of solutions defining those executions that all lead to zero error. To systematically study such motor tasks, virtual environments have been developed, where the physics of the task is mathematically modeled to quantify which subset of execution variables fully determines the error (Sternad, 2015; Sternad et al., 2014). Using this approach, investigators can quantify how subjects learn a motor skill by relating practice-based improvements in performance (i.e., reductions in error) to changes in the execution variables.

All vocal skills targeted in voice therapy are redundant by nature. Patients must learn how to covary multiple execution variables at one level—for example, mean flow, subglottal pressure, vocal intensity—to achieve a desired result defined at another level—for example, improved vocal efficiency ratios (decreased input for the same output). However, current virtual environments for vocal skills are confined to simple tasks focusing solely on error and cannot assess the contribution of different motor learning processes. For example, the software program Visi-pitch (KayPENTAX) allows users to manipulate objects by modifying pitch, loudness, or both simultaneously. Changes in pitch and/or loudness are both the execution variables and the result at the same time. Most voice therapy biofeedback tools simply display a number representing the desired target behavior, for example, jitter and cepstral peak prominence (CPP), and ask patients to modify this number directly, for example, increase, decrease, or stay in a desired range (Ferrand, 1995; Van Leer et al., 2017). Thus, the data provide minimal insight into how vocal performance is achieved, what accounts for any improvements, or how robust a learned behavior is.

Research on the control of redundant motor tasks often characterizes the structure of the ever-present variability in movement, where “structure” refers to the distribution of metrics (e.g., Gaussian, anisotropy) or their temporal evolution (e.g., Brownian motion, pink noise). The human sensorimotor system exhibits variability or noise at multiple time scales as well as at all levels of function (e.g., the cellular physiology of neuronal activation or the accuracy of throwing a ball) and skill (e.g., even expert performance produces trial-to-trial fluctuations; Ajemian et al., 2013; Faisal et al., 2008). Thus, studying distributional and temporal variability of motor performance should provide valuable insights into a subject's sensorimotor function. For example, multiple recent studies have shown, paradoxically, that subjects with larger variability in early practice learned faster (although “variability” was defined differently in each study; Ranganathan et al., 2022). It is believed that a subject's baseline variability represents their exploration, which helps them discover the mapping between execution variables and the result. Applied to voicing, this may mean that aspects of voicing are discovered that then transfer across the speech hierarchy and facilitate generalization. Poor exploration in early practice can be likened to a machine learning model that has been overtrained, or overfit, on only part of a feature space and therefore does not generalize well to new data or tasks. Thus, for the motor system to learn and generalize skills, it may not be primarily concerned with simply decreasing variability, but rather with selectively channeling variability according to the task demands.
In contrast to this growing knowledge about sensorimotor behavior, the study of variability in vocal motor behavior has focused primarily on overall decreases or increases in relation to normal versus pathological performance (Ghasemzadeh et al., 2015; Teixeira et al., 2013; Zhang et al., 2005).

In previous work, a virtual throwing paradigm with redundancy from the motor skill literature was successfully modified and applied to vocal motor learning (Sternad et al., 2014; Van Stan et al., 2017). The results indicated that the evolution of variability during vocal motor learning was very similar to that during limb motor learning. Subsequently, the virtual throwing paradigm was extended to a different virtual task with direct clinical applications, floating ball voice therapy (FBVT), which also replicated limb motor control results in relation to variability and learning (Van Stan, Park, et al., 2021). Per Table 1, FBVT represents a common treatment component (i.e., a single target and its associated ingredients) across multiple diverse evidence-based therapies (Gillespie et al., 2019; Roy & Leeper, 1993; Roy et al., 2003, 2017; Stemple, 2005): a target approximating vocal efficiency, and vocal practice ingredients focused on sustained phonation, mean flow, forward resonance, pitch, intensity, and a semi-occluded vocal tract. This common treatment component was identified during semistructured qualitative interviews (Van Stan et al., 2024) based on the Rehabilitation Treatment Specification System (RTSS; Hart et al., 2019; Van Stan, Whyte, et al., 2021). During vocal practice, the floating ball task measured two variables, mean flow (L/s) and intensity (dB C), because they were the two most critical to achieving the target: a normative intensity-to-flow ratio of 800 dB per L/s for female participants and 835 dB per L/s for male participants. Specifically, achieving a normative ratio of intensity to mean flow was considered clinically successful regardless of how long patients sustained their phonation, how accurately they produced specific pitches, how much semi-occlusion they used, and/or how much forward resonance they produced (ranging from just noticeable to obvious).
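To make the task's redundancy concrete, its zero-error solutions can be written as a set in execution space. The expression below is an illustrative restatement derived from the ratios given here and the 0.08–0.1 L/s flow range in Table 1, not a formula from the original study; the symbols \bar{U} (mean flow, L/s) and I (vocal intensity, dB) are chosen here for illustration.

$$
\mathcal{M} = \left\{ (\bar{U}, I) \;:\; 0.08 \le \bar{U} \le 0.1\ \text{L/s},\ \ I = r\,\bar{U} \right\},
\qquad
r = \begin{cases} 800 & \text{female participants} \\ 835 & \text{male participants} \end{cases}
$$

For female participants this segment runs from (0.08 L/s, 64 dB) to (0.1 L/s, 80 dB); for male participants, from (0.08 L/s, 66.8 dB) to (0.1 L/s, 83.5 dB). Every point on the segment meets the target, so infinitely many flow–intensity combinations produce the same successful result, which is the many-to-one mapping described above.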

Table 1.

Clinical target (desired change in patient), ingredients (clinician actions to achieve target), and doses (amount of ingredients) from multiple evidence-based therapies underlying the floating ball task.

Treatment concept: Description
Target (intensity–flow): Normative ratio of vocal intensity (in decibels of sound pressure level) to mean flow, for mean flow between 0.08 L/s and 0.1 L/s.
Ingredient 1 (practice): Provide opportunities to practice voicing (i) between 0.08 L/s and 0.1 L/s of mean flow, (ii) with forward resonance, (iii) as softly as possible, (iv) on the semi-occluded vocal tract vowel /o/, (v) with sustained voicing, (vi) until running out of air, (vii) at the following pitches: F, F, C, C, D, D, E, E, F, F, G, G in the 4th (males) or 5th (females) octave.
Dose: 12 total repetitions.
Ingredient 2 (feedback): Provide quantitative feedback on error during each practice trial via a virtualization of the floating ball and target.
Dose: 100% feedback.
Ingredients 3–5 (volition): Provide volition ingredients to improve the patient's capability and motivation to perform the task.
  • Capability (correct performance): Before playing for the first time, educate the patient that they must (a) take the largest inhalation possible, (b) voice as long as possible, (c) voice as softly as possible, (d) remain on the musical note requested for the entire time, and (e) keep going if their voice breaks. If any of these are not done, the trial will be stopped and restarted. Do not provide clinician models of correct performance, nor education on what type of resonance to use or what amount of mean flow to use. Modeling and coaching would minimize the patient's need to explore the vocal execution space for the “correct” ways of voicing.

  • Capability (correct attention): Ask the patient if they noticed any changes in how their voice sounded or felt in relation to the ball movements on the screen (do not tell them what to see, feel, or hear).

  • Motivation (answer patient questions): Describe underlying rationales for various aspects of the practice when the patient asks for them (do not provide this information unless the patient asks for it). Keep rationales as short as possible, and always directly link them to how it will help the patient's voice.


Dose: Generally, provide as few volition ingredients as possible. Correct performance: Provide at the beginning of Day 1 and whenever the patient produces a grossly incorrect trial. Correct attention: Provide when the patient seems not to be paying attention. Answer questions: Provide when the patient asks for information.

The target, a normative mean flow–intensity ratio, approximated vocal efficiency without subglottal pressure in the denominator (Hillman et al., 1989). Subglottal pressure was not explicitly outlined in any of the ingredients or targets in the RTSS-based treatment descriptions of the four evidence-based voice therapies underlying the task, so it was not included. The vocal function exercise program provided the most explicit and objective definition of this normative relation and was used to initially quantify the target's mean flow range: voicing between 0.08 and 0.1 L/s (Hirano & McCormick, 1986). To establish a normative range of vocal intensity within this mean airflow range, a previously acquired database of 23 female and 24 male vocally healthy participants producing soft, sustained /o/ vowels was analyzed. Normal laryngeal status was confirmed via endoscopy, and the normative data set has been described previously (Van Stan, Park, et al., 2021). The number of practice repetitions (dose) differs across voice therapy protocols; because the vocal function exercises protocol was the most prescriptive, the task includes 12 maximally sustained vowels at musical pitches in the following order: F, F, C, C, D, D, E, E, F, F, G, G in the third or fourth octaves for male and female participants, respectively.

The goal of this study was twofold: to provide an initial evaluation of (a) FBVT's effect on clinical outcomes of vocal efficiency (Titze, 1992), Voice-Related Quality of Life (V-RQOL; Hogikyan & Sethuraman, 1999), and overall auditory perception of voice severity (Kempster et al., 2009) and (b) any associations between FBVT's variability metrics and individual patient generalization to spontaneous speech. First, we hypothesized that 10 practice sessions would be associated with group-level improvements in objective voice measures, V-RQOL, and auditory perception of overall severity. The aim of FBVT (per the RTSS, an “aim” is what the ingredients are hypothesized to achieve by changing the target) is improvement in V-RQOL scores. Although patient-reported outcome measures are only distantly related to the ingredients and targets of FBVT, they are a ubiquitous standard for evaluating the overall effectiveness of an intervention; for reviews, see the works of Desjardins et al. (2017), Ramig and Verdolini (1998), and Saccente-Kennedy et al. (2024). Second, we hypothesized that variability metrics based on execution variables in FBVT would better characterize the quality of vocal exploration in early practice than more general measures such as the standard deviation of error. Thus, we postulated that, in early practice, FBVT variability metrics would be more strongly associated with individual patient generalization to spontaneous speech than traditional measures of variability.

Method

The institutional review board at Massachusetts General Hospital approved all study procedures (protocol number 2015P000361).

Participants

All patients enrolled in this study exhibited nonphonotraumatic vocal hyperfunction (NPVH; Hillman et al., 2020). NPVH is characterized by a host of habitual, chronic voice-related symptoms in daily life (e.g., dysphonia, vocal fatigue) in the absence of signs of phonotrauma or other phonation-disrupting structural or neurological impairments. Thus, NPVH includes common diagnoses such as primary muscle tension dysphonia (pMTD; Verdolini et al., 2006) and functional aphonia or dysphonia (Ruotsalainen et al., 2008). Patients with NPVH were recruited for this study because (a) NPVH is associated with many of the most commonly treated voice disorders (Bhattacharyya, 2014); (b) the treatments that influenced the design of the floating ball task are based on multiple studies demonstrating improvements in this patient population (Angadi et al., 2019; Gillespie et al., 2019; Roy & Leeper, 1993; Roy et al., 2003, 2017); and (c) patients with NPVH have anatomically normal vocal folds, that is, physiologically, they should be capable of matching the game's target based on normative data. The FBVT was based on therapies that have treated patients with various types of vocal hyperfunction including those traditionally labeled as secondary muscle tension dysphonia (sMTD; e.g., reactive to nodules, polyps, sulci, paresis, granuloma) and pMTD (also called NPVH). The developers of the therapies underlying the floating ball therapy completed in-depth RTSS qualitative interviews and reported that the treatment ingredients, targets, and aims are not different based on whether the patient is diagnosed with pMTD, sMTD, or a specific structural or neurological diagnosis underlying the sMTD (Van Stan et al., 2024). Diagnoses are based on an evaluation by a laryngologist and speech-language pathologist (SLP) at the Massachusetts General Hospital Voice Center that aligns with the American Speech-Language-Hearing Association's recommendations (Patel et al., 2018).

Patients with NPVH were also eligible if they had secondary diagnoses of laryngopharyngeal reflux and/or gastroesophageal reflux disease. Patients were excluded if they had any secondary diagnoses related to structural or neurological disorders in their case history or seen during laryngoscopy: specifically, laryngitis, loss of superficial lamina propria, benign lesion, polyp, cyst, dysphagia, sulci, paradoxical vocal fold motion, any mention (confirmed or possible) of upper airway paralysis or paresis, polypoid corditis, keratosis, presbylarynx, fibrovascular changes, leukoplakia, injury of the recurrent or superior laryngeal nerve, or history of radiation or neurological impairment (Kridgen et al., 2021). The enrolled patients with NPVH were eight female and two male participants with a mean age of 28 years (range: 21–35 years). Eight patients reported vocal fatigue that worsened over the course of the day/week, with occasional dysphonia during vocal fatigue, while two patients reported constant vocal fatigue and dysphonia. The treating therapist provided a single overall severity rating for each patient using the Consensus Auditory-Perceptual Evaluation of Voice (CAPE-V; Kempster et al., 2009). The group's mean (range) overall severity score was 10.6 (2–25). These perceptual ratings were included only to provide a general description of the group's mild overall auditory severity and were not used in any subsequent analyses. Despite the overall mild dysphonia, many of these patients reported that their voice disorder had obvious effects on their daily life. Patient-reported total V-RQOL mean (range) at baseline was 65.5 (22.5–92.5). Ten patients in total were enrolled to preliminarily estimate effect sizes for future power analyses.

FBVT: Experimental Setup

Figure 1A shows the experimental setup for performing the virtual floating ball task. Each subject's oral airflow was recorded using a customized pneumotachograph (Phonatory Aerodynamic System, Model 6600, PENTAX) and a Glottal Enterprises flow sensor (pressure transducer PT-2E and Model MS-110, Glottal Enterprises) attached to the pressure tap most proximal to the subject's mouth. Vocal intensity was recorded using a unidirectional, cardioid condenser microphone (MKE104, Sennheiser Electronic GmbH) placed 10 cm from the distal end of the pneumotachograph. Both signals were input into a laptop (XPS 13, Dell) running Windows 10 (Microsoft). The native laptop sound card was bypassed by using an external soundcard adaptor (ICUSBAUDIO, StarTech). A miniature accelerometer (BU-27135, Knowles Electronics) was adhered to the patient's anterior neck above the sternal notch using hypoallergenic double-sided tape (Model 2181, 3M).

Figure 1.

[Figure 1 image: (A) photographs of the setup: the participant wears a nose clip and holds a pneumotachograph fitted with a flow sensor and microphone, an accelerometer is attached to the neck, and data are sent to a laptop; an inset shows the physical floating ball device. (B) Five schematic ball positions relative to the target levels Q_T and A_T (height h = 200 pixels). (C) Time-domain traces of mean flow (L/s), vocal intensity (dB), ball height, and error (pixels). (D) Three-dimensional error surface (pixels) over mean flow and vocal intensity.]

(A) Subjects control a floating ball with mean flow and vocal intensity during sustained voicing. Inset shows the real-life floating ball game. (B) Five schematic examples illustrate how mean flow (ball height; orange Q) and vocal intensity (ball amplitude; vertical arrows; orange A) control the floating ball. The black target rectangles represent the desired mean flow (height, gray Q_T) and intensity (vertical length of the rectangle, gray A_T) based on the normative ratio. (C) Time series of one trial. Top: mean flow (purple line); vocal intensity (orange line). Middle: amplitudes of ball oscillation produced by the subject's mean flow and vocal intensity (gray lines); the height of the target rectangle (black lines). Bottom: error measured in pixels (pix) of the display. (D) Execution space indicating the error for all combinations of mean flow and vocal intensity. The large colored circles represent the five examples in Panel B; the small gray circles represent the time series in Panel C, plotted as values in 200-ms analysis windows.

Before starting a session, the signal from the pressure transducer (flow sensor) was calibrated for estimating airflow in units of liters per second (L/s) using reference airflow levels (MCU-4 Pneumotach Calibration Unit, Glottal Enterprises). The acoustic signal was calibrated using two complex tones at increasing intensity levels measured by a Class 2 sound-level meter (NL-20, RION) to map the uncalibrated voltage signal to units of pascal and C-weighted decibels (dB C) at 10 cm. The software processed both airflow and microphone signals (recorded with a 10-kHz low-pass filter, 22,050-Hz sampling rate, and 16-bit quantization) over 50-ms nonoverlapping windows to produce quasi–real-time estimates of mean airflow (L/s) and vocal intensity (dB C). The MS-110's amplitude modulation (8-kHz carrier frequency) was used to preserve the DC offset in the flow signal, and the signal was demodulated in real time by the custom virtual task software on the laptop. Subjects wore a noseclip to prevent airflow through the nose during voicing.
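The calibration and windowed estimation described above can be sketched as follows. This is a minimal Python/NumPy illustration under stated assumptions, not the study's custom software: calibration is modeled as linear (a two-point fit for flow and a pascals-per-volt gain from reference tones for sound pressure), the helper names are hypothetical, the MS-110 demodulation step is omitted, and no C-weighting filter is applied, so levels are unweighted dB re 20 µPa rather than dB C.

```python
import numpy as np

FS = 22_050             # sampling rate (Hz), from the text
WIN = int(0.050 * FS)   # 50-ms window: 0.05 * 22050 = 1102.5, truncated to 1102 samples
P_REF = 20e-6           # reference pressure for dB SPL (Pa)


def two_point_calibration(v1, x1, v2, x2):
    """Hypothetical helper: linear map from raw voltage to physical units
    (e.g., L/s) through two reference points; assumes a linear sensor."""
    gain = (x2 - x1) / (v2 - v1)
    offset = x1 - gain * v1
    return lambda v: gain * np.asarray(v, dtype=float) + offset


def acoustic_gain(ref_vrms, ref_db):
    """Hypothetical helper: pascals-per-volt scale from reference tones of known
    level (dB re 20 uPa), averaged across tones; assumes a linear microphone chain."""
    ref_pa = P_REF * 10 ** (np.asarray(ref_db, dtype=float) / 20)
    return float(np.mean(ref_pa / np.asarray(ref_vrms, dtype=float)))


def window_estimates(flow_lps, pressure_pa, win=WIN):
    """Mean airflow (L/s) and level (dB re 20 uPa) over nonoverlapping windows.
    No C-weighting filter is applied, so the levels are unweighted."""
    n = min(len(flow_lps), len(pressure_pa)) // win   # complete windows only
    flow = np.asarray(flow_lps[: n * win], dtype=float).reshape(n, win)
    pres = np.asarray(pressure_pa[: n * win], dtype=float).reshape(n, win)
    rms = np.sqrt(np.mean(pres ** 2, axis=1))
    return flow.mean(axis=1), 20 * np.log10(rms / P_REF)
```

A 50-ms window at 22,050 Hz spans 1102.5 samples; the sketch truncates to 1102, and how the original software handled the half sample is not stated.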

FBVT: Virtualization of Treatment Theory

During gameplay, subjects used sustained phonation to vocally control a computational model of a floating ball, inspired by the commonly used “flowball” device (Lã et al., 2017). During sustained phonation, the ball dynamics were fully determined by two quantities expressed in pixels: the subject's vocal intensity, translated into the amplitude A of ball oscillation, and the subject's mean flow, translated into the mean ball height Q. The ball position along the vertical y-axis (y), oscillating at a frequency f of 2 Hz (Lã et al., 2017), at any point in time t (milliseconds) during a trial was determined by A and Q per Equation 1:

$$y(t) = A_i \cos(2\pi f t) + Q_i. \qquad (1)$$
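Equation 1 can be evaluated directly. This minimal sketch assumes time is expressed in seconds so that the product f·t is dimensionless (with t in milliseconds, as stated above, t would first be divided by 1000); the function name is hypothetical.

```python
import numpy as np

F_BALL = 2.0  # ball oscillation frequency (Hz), from the text


def ball_height(t_s, amplitude_px, mean_height_px, f_hz=F_BALL):
    """Eq. 1: vertical ball position y(t) = A*cos(2*pi*f*t) + Q, in pixels.

    t_s is in seconds (an assumption; divide milliseconds by 1000 first)."""
    t = np.asarray(t_s, dtype=float)
    return amplitude_px * np.cos(2 * np.pi * f_hz * t) + mean_height_px


# Example: 0.5 s (one full 2-Hz cycle) sampled every 50 ms, with A = 30 px, Q = 100 px.
y = ball_height(np.arange(11) * 0.05, 30.0, 100.0)
```

With these values the ball peaks at Q + A = 130 px at t = 0 and reaches its low point Q − A = 70 px a quarter second later.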

As shown in Figure 1B, two aspects of the target rectangle in the floating ball task implemented a normative mean flow–vocal intensity relation: (a) vertical up/down movement of the rectangle on the screen and (b) vertical lengthening and shortening of the rectangle itself. The vertical position of the target rectangle on the monitor represented the desired mean flow; it moved up and down on the screen only if the subject voiced between 0.08 and 0.1 L/s, and zero error was possible only within this range of mean flow. When the subject produced mean flow above 0.1 L/s or below 0.08 L/s, the target rectangle remained at the height for 0.1 L/s or 0.08 L/s, respectively, and the ball traveled above or below the rectangle. Figures 1B, 1C, and 1D show examples of ball oscillation above, within, and below the target rectangle. The vertical length (i.e., distance between the upper and lower bounds) of the target rectangle represented a normative vocal intensity throughout the mean flow range of 0.08–0.1 L/s. When the subject produced a mean flow within the 0.08–0.1 L/s zone, the distance between the upper and lower bounds of the rectangle represented a vocal intensity equal to the mean flow multiplied by 800 or 835, that is, the mean normative female or male ratio, respectively. When the subject produced a mean flow above or below this zone, the distance between the rectangle's upper and lower bounds represented the normative vocal intensity for 0.1 L/s or 0.08 L/s, respectively (80 dB and 64 dB for female participants; 83.5 dB and 67 dB for male participants).
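The target logic just described can be sketched in a few lines. This hypothetical helper returns the target flow and target intensity in physical units (L/s and dB); how these values were converted to pixels on screen is not specified in the text.

```python
NORM_RATIO = {"female": 800.0, "male": 835.0}  # dB per L/s, from the text
FLOW_MIN, FLOW_MAX = 0.08, 0.10                # target flow range (L/s), from the text


def target(mean_flow_lps, sex):
    """Hypothetical helper: (target flow in L/s, target intensity in dB).

    Inside 0.08-0.1 L/s the target tracks the produced flow; outside it, the
    target is held at the nearer limit. Intensity is target flow times the ratio."""
    flow_t = min(max(mean_flow_lps, FLOW_MIN), FLOW_MAX)
    return flow_t, flow_t * NORM_RATIO[sex]
```

Evaluating the range limits reproduces the values above: 0.08 × 800 = 64 dB and 0.1 × 800 = 80 dB for female participants, and 0.1 × 835 = 83.5 dB and 0.08 × 835 = 66.8 dB (given as 67 dB in the text) for male participants.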

Error E was evaluated in pixels at all points in time t by calculating the Euclidean distance between the subject-produced mean ball height Q and oscillation amplitude A and the target rectangle's height and distance between its upper/lower bounds, Q_T and A_T, respectively (see Figure 1B), according to Equation 2:

$$E(t) = \sqrt{\left(Q - Q_T\right)^2 + \left(A - A_T\right)^2}. \qquad (2)$$

All variables were represented as pixels in Equation 2. Since the virtual environment allowed a direct mathematical mapping between the two execution variables (mean flow and vocal intensity) and the resulting error, this error could be portrayed with a color code on a two-dimensional execution space (Figure 1D).
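A sketch of Equation 2, and of the execution-space error map in Figure 1D, follows. The pixel scale factors PX_PER_LPS and PX_PER_DB are hypothetical placeholders, because the display mapping from L/s and dB to pixels is not given; only the target logic (flow held within 0.08–0.1 L/s, intensity equal to flow times the normative ratio) comes from the text. The female ratio of 800 is used for illustration, and the grid limits are assumed.

```python
import numpy as np

RATIO = 800.0                    # female normative ratio (dB per L/s), from the text
FLOW_MIN, FLOW_MAX = 0.08, 0.10  # target flow range (L/s), from the text

# HYPOTHETICAL display scaling: the text does not give the L/s- or dB-to-pixel mapping.
PX_PER_LPS = 2000.0              # mean flow -> mean ball height (px per L/s)
PX_PER_DB = 2.0                  # intensity -> oscillation amplitude (px per dB)


def error_px(flow_lps, intensity_db):
    """Eq. 2: Euclidean distance between produced (Q, A) and target (Q_T, A_T),
    in pixels. Works elementwise, so it can evaluate a whole grid at once."""
    flow = np.asarray(flow_lps, dtype=float)
    inten = np.asarray(intensity_db, dtype=float)
    flow_t = np.clip(flow, FLOW_MIN, FLOW_MAX)              # target flow held at range limits
    q, q_t = flow * PX_PER_LPS, flow_t * PX_PER_LPS         # ball height vs. target height
    a, a_t = inten * PX_PER_DB, flow_t * RATIO * PX_PER_DB  # amplitude vs. target length
    return np.hypot(q - q_t, a - a_t)


# Execution space (cf. Figure 1D): error for every (flow, intensity) pair on a grid.
flows = np.linspace(0.0, 0.3, 301)           # L/s (assumed limits)
intensities = np.linspace(50.0, 100.0, 501)  # dB (assumed limits)
F, I = np.meshgrid(flows, intensities)       # both shaped (501, 301)
E = error_px(F, I)
```

The zero-error valley of E is the line segment where intensity equals 800 times flow between 0.08 and 0.1 L/s; outside that flow range the error grows with the vertical distance between ball and rectangle.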

Study Design

Patients completed this study after their laryngology evaluation but before their SLP evaluation. Stimulability testing occurs during the SLP's evaluation, and we did not want that testing to affect patients' exploration during Day 1 practice. Patients came to the Voice Center to complete 10 in-person practice sessions, with no more than 3 days between consecutive sessions. Each session lasted 20–30 min and included 12 trials of the virtual task. On Practice Days 1 and 10, patients produced a spontaneous speech sample containing 30 s of voicing. The spontaneous speech sample was collected before practice on Day 1 and both before and after practice on Day 10. Patients were asked to provide a speech sample twice on Day 10 for two reasons: (a) It allowed investigators to evaluate any immediate post-effects of semi-occluded vocal tract exercises, and (b) it provided two chances for the patients to demonstrate improved/generalized voicing. The speech sample was elicited with a standard prompt, “Tell me what you are going to do after you leave here today.” This question was chosen because it required minimal cognitive load; increased cognitive load has been shown to reduce vocal performance (MacPherson et al., 2017) and could impede generalization. Spontaneous speech samples on Day 1 and Day 10 were obtained by the treating SLP.

Per the study protocol outlined in Table 1, the SLP was instructed to provide minimal assistance, rarely providing instructions or feedback throughout the 10 practice sessions, with only the following exceptions: (a) provide instructions on how to play the game at the beginning of Practice Session 1 and (b) stop grossly incorrect voicing when it occurred, inform the patient what was incorrect, and ask the patient to restart the trial. SLP feedback, cueing, coaching, and so forth were intentionally minimized to avoid influencing the patient's exploration in early practice and to test the utility of FBVT's quantitative feedback/visualization in isolation. The patients were not provided clinician models of correct performance, nor educated on what type of resonance or what amount of mean flow to use. Modeling and coaching in these ways would minimize the patients' need to explore the execution space for the “correct” manners of voicing, that is, reduce estimates of exploration. An individual trial was considered incorrect and was restarted if the subject took a shallow breath before starting the trial, produced loud voicing, strayed from the required pitch by more than 2–3 semitones, stopped the prolonged voicing before running out of air, or produced multiple consecutive seconds of moderate (or worse) roughness, strain, breathiness, or nonmodal phonation (e.g., vocal fry, diplophonia). Patients were asked to stop or redo very few practice trials: mean (range) of 8% (2%–14%) of trials. Most repeated trials occurred in the latter half of the practice period (Sessions 6 through 10) and were needed because patients stopped voicing before all respiratory volumes were expelled.

Analysis of Distributional Variability

Figure 2 illustrates the distributional variability metrics of tolerance cost (T-Cost) and noise cost (N-Cost; R. Cohen & Sternad, 2009). T-Cost captures the cost of the data not being at the best “place” in the execution space. T-Cost was estimated by generating an optimized data set in which the mean vocal intensity and flow were shifted in the execution space to the location yielding the best overall result. More specifically, the execution space was parsed into a grid of 1500 × 1500 points (the boundaries of this grid were determined by the limits of the task). The data set was then shifted across this grid through every possible center point, and its mean result was evaluated at each location; the dispersion of the data in execution space was preserved during this process. The lowest mean error across all locations was compared with the mean error of the actual data set: the algebraic difference between the actual mean error and this optimal mean error defined T-Cost, which expresses how much performance could have improved had the data been located elsewhere in the execution space.
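The T-Cost procedure can be sketched as a brute-force search over candidate centers. This is an illustrative reimplementation, not the authors' code: it uses hypothetical pixel scale factors (PX_PER_LPS, PX_PER_DB) and the female ratio of 800 in its error function, the task limits (0–0.3 L/s, 50–100 dB) are assumed, and the grid is reduced from 1500 × 1500 to 150 × 150 to keep memory modest.

```python
import numpy as np

RATIO = 800.0                        # female normative ratio (dB per L/s), from the text
FLOW_MIN, FLOW_MAX = 0.08, 0.10      # target flow range (L/s), from the text
PX_PER_LPS, PX_PER_DB = 2000.0, 2.0  # HYPOTHETICAL display scaling (px per L/s, px per dB)


def error_px(flow, inten):
    """Eq. 2 error in pixels; broadcasts over arrays of any compatible shape."""
    flow_t = np.clip(flow, FLOW_MIN, FLOW_MAX)
    return np.hypot((flow - flow_t) * PX_PER_LPS, (inten - RATIO * flow_t) * PX_PER_DB)


def t_cost(flow, inten, flow_lim=(0.0, 0.3), inten_lim=(50.0, 100.0), n_grid=150):
    """Tolerance cost: actual mean error minus the lowest mean error reached by
    translating the whole data set (dispersion preserved) to any grid center.
    flow_lim and inten_lim are ASSUMED task limits; the text used a 1500-point grid."""
    flow, inten = np.asarray(flow, float), np.asarray(inten, float)
    actual = error_px(flow, inten).mean()
    d_f, d_i = flow - flow.mean(), inten - inten.mean()      # deviations, kept fixed
    c_f = np.linspace(*flow_lim, n_grid)[:, None, None]      # candidate mean flows
    c_i = np.linspace(*inten_lim, n_grid)[None, :, None]     # candidate mean intensities
    grid_err = error_px(c_f + d_f, c_i + d_i).mean(axis=-1)  # (n_grid, n_grid) mean errors
    return actual - grid_err.min()
```

Because the search is restricted to grid points, a coarse grid can make T-Cost come out slightly negative when the data are already near the optimal location; a finer grid, as in the text, reduces that discretization error.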

Figure 2.

[Figure 2 image: six surface plots of error (pixels) over mean flow (L/s) and vocal intensity (dB). Gray points are the subject's data; red points are the T-Cost-optimized surrogate data and blue points the N-Cost-optimized surrogate data. Early practice: T-Cost = 45, N-Cost = 2; mid-practice: T-Cost = 3, N-Cost = 22; late practice: T-Cost = 2, N-Cost = 2.]

Example and optimized sets of three practice trials from one subject. The left column shows data optimized in terms of tolerance cost (T-Cost), and the right column shows data optimized in terms of noise cost (N-Cost). Gray circles represent 200-ms segments of the subject's actual sustained voicing, and the red/blue circles represent surrogate data with one component optimized (T- or N-Cost, respectively). The top, middle, and bottom panels show data from Practice Days 1, 4, and 10, respectively.

N-Cost is the cost to overall performance due to nonoptimal stochastic variability in the execution space. N-Cost was estimated by generating an optimized data set in which variability was reduced stepwise to achieve the lowest possible mean error, while the overall mean vocal intensity and flow were left unchanged. One might expect every data set to perform best when reduced to a single point (its mean vocal intensity and flow). This does not always hold: depending on the geometry of the solution space, a data set with small but nonzero dispersion may produce the lowest mean error. In the numerical procedure, the radial distance from every data point to its mean was divided into 100 steps. All data points were then shrunk toward their mean in 1% increments, and the mean error was evaluated at each increment. N-Cost was defined as the algebraic difference between the mean error of the original data set and the lowest mean error across increments (the optimized data set). This value expresses how much the data could have improved if only their dispersion had been reduced.
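A corresponding minimal sketch of the N-Cost procedure uses the same assumptions as before. NumPy is assumed, and `error_fn` and `toy_error` are hypothetical placeholders for the FBVT error surface, not the study's code. Scaling every point's deviation from the mean by the same factor moves each point along its radial line toward the mean. The mean of the data set stays unchanged.

```python
import numpy as np

def n_cost(flow, intensity, error_fn, n_steps=100):
    """Noise cost (N-Cost): mean error of the actual data minus the lowest mean
    error obtained by shrinking every point toward the (fixed) mean in
    1/n_steps increments of its radial distance."""
    flow = np.asarray(flow, dtype=float)
    intensity = np.asarray(intensity, dtype=float)
    mf, mi = flow.mean(), intensity.mean()
    actual = error_fn(flow, intensity).mean()

    best = actual
    for k in range(1, n_steps + 1):
        scale = 1.0 - k / n_steps  # fraction of the original dispersion kept
        errs = error_fn(mf + scale * (flow - mf), mi + scale * (intensity - mi))
        best = min(best, errs.mean())
    return actual - best

# Hypothetical error surface for illustration only (made-up manifold intensity = 100 * flow).
def toy_error(f, i):
    return np.abs(i - 100.0 * f)

print(n_cost([0.5, 0.5], [40.0, 60.0], toy_error))
```

Here the mean (0.5, 50) lies on the toy manifold, but the points straddle it. Shrinking them toward the mean removes all of the error, so N-Cost equals the original mean error.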

NPVH Index

Objective improvements during the practice trials, and generalization of these improvements into spontaneous speech samples, were estimated from the neck-skin acceleration (ACC) signal using a validated NPVH index. Using the ACC signal from daily life, the NPVH index previously discriminated (a) patients with NPVH from their age-, sex-, and occupation-matched vocally healthy controls, (b) patients with NPVH before versus after therapy, and (c) patients with phonotrauma, who were accurately classified as “not NPVH” (Van Stan, Ortiz, et al., 2021). The NPVH index relies on the patient's modal H1–H2 (the difference between the first and second harmonic magnitudes) to estimate the degree of vocal fold closure during voicing (Klatt & Klatt, 1990). It also relies on the mean CPP to capture a broad range of inefficient periodic and aperiodic voicing (Awan et al., 2013). The combination of H1–H2 mode and CPP mean ostensibly represents the continuum of ways patients can voice inefficiently without increased risk of phonotrauma, such as vocal fry (low H1–H2, typical CPP) and breathy voicing (high H1–H2, low CPP). The probability p that a patient's data are classified as coming from a patient with NPVH results from a logistic transformation of the patient's daily CPP mean C and H1–H2 mode H, as shown in Equation 3:

$$p = \frac{1}{1 + e^{-0.781C - 0.230H + 17.184}} \qquad (3)$$

To represent the output of the NPVH index, the logit was used instead of the probability. As shown in Equation 4, the logit Lt is an inverse transformation (i.e., link function) of the nonlinear probability p estimate:

$$L_t = \log\left(\frac{p}{1 - p}\right) \qquad (4)$$

The logit was used because changes in Lt (ΔLt) can be interpreted equally throughout the scale. This is not true for the probability. For example, a reduction in p from .99 to .98 (ΔLt = −0.31) represents a much larger improvement than a reduction in p from .50 to .49 (ΔLt = −0.02). Negative values of Lt represent voice use in the normative range. The index outputs a logit that typically ranges between −2 and +2, with higher values associated with higher degrees of NPVH.
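Equations 3 and 4 can be checked numerically with a short Python sketch. It follows Equation 3 with the exponent read as −0.781C − 0.230H + 17.184. It also assumes a base-10 logarithm in Equation 4, because that base reproduces the ΔLt examples in the text (−0.31 and −0.02). With the natural logarithm, the logit of Equation 3 reduces exactly to 0.781C + 0.230H − 17.184. The function names are hypothetical, and the C and H inputs below are arbitrary illustrative values, not patient data.

```python
import math

def npvh_probability(cpp_mean, h1h2_mode):
    """Equation 3: probability that daily voice use is classified as NPVH,
    from the daily CPP mean (C) and the H1-H2 mode (H)."""
    z = -0.781 * cpp_mean - 0.230 * h1h2_mode + 17.184
    return 1.0 / (1.0 + math.exp(z))

def logit(p, base=10.0):
    """Equation 4. Base 10 is an assumption that matches the text's examples."""
    return math.log(p / (1.0 - p), base)

# Equal steps in p are not equal steps in Lt:
print(round(logit(0.98) - logit(0.99), 2))  # change near the top of the scale
print(round(logit(0.49) - logit(0.50), 2))  # change near the middle of the scale
print(logit(npvh_probability(20.0, 10.0)))  # illustrative inputs only
```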

The NPVH index was used as an objective measure of generalization, instead of the FBVT's loudness–mean airflow ratio, for multiple reasons. First, the classifier was deliberately designed to represent the putative pathophysiology of NPVH, and this specificity to NPVH was empirically validated on patients (Van Stan, Ortiz, et al., 2021). In contrast, various vocal efficiency ratios have been associated with vocal hyperfunction (VH) in general (Espinoza et al., 2020; Titze, 2013). Second, the classifier was trained and validated on voice use in daily life, which is ostensibly the most ecologically valid way to assess generalization. A vocal efficiency ratio based on ambulatory monitoring will require a robust method of estimating mean airflow and/or subglottal pressure from the ACC signal, which is an active area of current research (Cortés et al., 2022; Donhauser et al., 2024; Fryd et al., 2016; Ibarra et al., 2024).

Statistics

Since this was a proof-of-concept study, all group-based comparisons across therapy were described with effect sizes. Cohen's d effect sizes were calculated for changes in continuous variables. Specifically, the comparisons included each patient's mean error, T-Cost, N-Cost, and NPVH index across the 12 practice trials on Day 1 versus those on Day 10. Total V-RQOL scores before Day 1 and after Day 10 were compared as single values rather than means, because the patient-reported scale yields one summary score per completion. Traditional cutoffs for small, medium, and large Cohen's d values were 0.2, 0.5, and 0.8, respectively (J. Cohen, 1988).

A forced-choice auditory assessment was completed to evaluate changes in overall severity. Three voice-specialized SLPs listened to each patient's spontaneous speech sample before and after therapy (20 total samples). They did so on two separate occasions to assess intrarater reliability (40 total ratings). The SLPs were blinded to each recording's treatment status because the order of pretherapy and posttherapy recordings was randomized. In the forced-choice design, the SLPs listened to two recordings and chose which one was “better” on the CAPE-V's “overall severity” scale, guided by the CAPE-V's definition of overall severity. Fleiss kappa and Cohen's kappa quantified interrater and intrarater agreement, respectively. Traditional cutoffs for moderate, substantial, and almost perfect agreement were set to 0.4, 0.6, and 0.8, respectively (J. Cohen, 1988). Odds ratios quantified the changes in perceptual judgments before versus after therapy. Traditional cutoffs for small, medium, and large odds ratios were 1.68, 3.47, and 6.71, respectively (Chen et al., 2010).
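Fleiss kappa for interrater agreement can be computed with a short function. This is a generic sketch of the standard Fleiss formula and assumes NumPy. It also assumes that each pretherapy/posttherapy pair is treated as one item rated into one of two categories, for example "posttherapy better" versus "pretherapy better". The study's exact item structure is not specified, and the counts in the usage line are hypothetical, not the study's ratings.

```python
import numpy as np

def fleiss_kappa(counts):
    """Fleiss kappa for an (items x categories) matrix in which counts[i, j]
    is the number of raters who assigned item i to category j. Every row
    must sum to the same number of raters n."""
    counts = np.asarray(counts, dtype=float)
    n_items = counts.shape[0]
    n = counts[0].sum()  # raters per item
    if not np.allclose(counts.sum(axis=1), n):
        raise ValueError("every item needs the same number of ratings")
    p_j = counts.sum(axis=0) / (n_items * n)               # overall category proportions
    p_i = ((counts ** 2).sum(axis=1) - n) / (n * (n - 1))  # observed agreement per item
    p_bar, p_e = p_i.mean(), (p_j ** 2).sum()              # mean observed vs. chance agreement
    return (p_bar - p_e) / (1.0 - p_e)

# Hypothetical: 4 recording pairs, 3 raters, columns = [post better, pre better].
print(fleiss_kappa([[3, 0], [2, 1], [0, 3], [1, 2]]))
```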

Individual patients were considered to have generalized if they showed a clinically meaningful improvement in the NPVH index from Day 1 to Day 10 in their sustained vowel trials and spontaneous speech recordings. A clinically meaningful improvement was a decrease of at least 0.28 logits (ΔNPVH index ≤ −0.28). This threshold was chosen because it was the average voice therapy effect in a previous study (Van Stan, Ortiz, et al., 2021).

It was hypothesized that the patients who generalized after Day 10 practice, according to the NPVH index, would demonstrate higher normalized T-Costs and lower normalized N-Costs in Day 1 practice than patients who did not generalize. The quality of exploration in early practice was estimated by T-Cost and N-Cost in the first practice session. Specifically, high T-Cost and low N-Cost represented task exploration that should benefit generalization, whereas low T-Cost and high N-Cost represented exploration with less potential benefit for generalization. High T-Cost in early practice represents beneficial exploration because substantial improvement would be possible by simply moving the patient's current behavior closer to the solution manifold. In contrast, low T-Cost in early practice means that the patient's strategy for exploring the task cannot simply be shifted to improve performance. Low N-Cost in early practice would also indicate beneficial exploration, because the patient's dispersion would be well matched to the error gradient; that is, little improvement would be possible by minimizing variability. In contrast, high N-Cost in early practice means that the patient will need to reduce their dispersion to improve performance, which would limit their ability to explore. We obtained the mean T-Cost and N-Cost across all 12 trials per patient during Practice Day 1. To compare across patients, the costs were normalized by dividing them by the patient's mean error across all Day 1 trials. As a comparison measure, the standard deviation of error was calculated on the 12 trials from Practice Day 1 and normalized by the mean error. It was hypothesized that the standard deviation would not differ between patients who generalized and those who did not. The reason is that standard deviation is agnostic to the airflow–loudness solution manifold and the error gradient.

Results

Individual patient data. All individual patient data for all outcome measures before and after the floating ball therapy can be found in Supplemental Materials S1 and S2.

Duration of therapy. The 10 patients completed the 10 practice days over a mean of 15 days (minimum = 10 days, maximum = 20 days).

Changes in outcomes. Large group-based improvements were shown in error, cost metrics, NPVH index, and V-RQOL after 10 sessions of FBVT. Figure 3 shows the patients' mean error, T-Cost, and N-Cost across practice. After 10 days of practice, the patients moved closer to the FBVT's normative mean flow–vocal intensity ratio (mean change in error = −30 pixels, d = −1.3). As seen in Figure 3, the evolution of the costs across practice replicated previous studies (Abe & Sternad, 2013; Müller & Sternad, 2009; Van Stan et al., 2017; Van Stan, Park, et al., 2021). T-Cost decreased quickly during early stages of learning. N-Cost became the larger contributor to error in middle-to-late practice and decreased slowly during later stages of learning. Compared with Practice Day 1, both mean costs were reduced on Day 10 (T-Cost: −8 pixels, d = −0.9; N-Cost: −7 pixels, d = −1.8). The NPVH logit decreased from Day 1 to Day 10 sustained vowel practice (mean change = −0.59, d = −1.1). As illustrated in Figure 4, the patients improved their total V-RQOL score after 10 practice days (mean change = 19.75, d = 0.89). Auditory SLP judgments of overall voice severity yielded moderate intrarater agreement (Cohen's kappa = 0.47) and moderate interrater agreement (Fleiss kappa = 0.4). Posttherapy spontaneous speech samples were more likely than pretherapy samples to be rated as better, with a small effect size (odds ratio = 2.43).

Figure 3.

The mean value of N-Cost drops from 12 on day 1 to 8 on day 10. The mean value of T-Cost drops from 11 on day 1 to about 5 on day 10. The mean value of the error drops from 42 on day 1 to about 18 on day 10.

Mean group error (black), T-Cost (red), N-Cost (blue), and range (error bars) for the 10 patients with nonphonotraumatic vocal hyperfunction (NPVH) over 10 days of practice. Error range on Day 1 goes up to 120 pixels (represented by arrowhead).

Figure 4.

A dot plot shows mean Voice-Related Quality of Life ratings by assessment: 65 on Day 1 and 85 on Day 10.

Mean Voice-Related Quality of Life (V-RQOL) ratings for the 10 patients with nonphonotraumatic vocal hyperfunction (NPVH) on Practice Days 1 and 10. Error bars represent range.

Predicting generalization to spontaneous speech. As shown in Table 2, after 10 practice sessions, five patients showed quantitative evidence of generalization into spontaneous speech per the NPVH index. Compared with patients who did not generalize, those who generalized exhibited higher normalized T-Cost (d = 1.2) and lower normalized N-Cost (d = −0.6) during Practice Session 1. Although these effect sizes are strong, they do not show a perfect separation between the groups' Day 1 behavior (high T-Cost/low N-Cost vs. low T-Cost/high N-Cost) and generalization, which is to be expected. Standard deviation during Practice Session 1 did not appear to differ between those who generalized and those who did not (d = −0.02).

Table 2.

Individual patient performance on practice Day 1 and change in nonphonotraumatic vocal hyperfunction (NPVH) index (Day 10 minus Day 1) in floating ball voice therapy trials and spontaneous speech recordings.

Patient | Day 1 floating ball trials: T-Cost/error | Day 1 floating ball trials: N-Cost/error | Day 1 floating ball trials: SD/error | Δ NPVH index, Day 1 vs. Day 10 floating ball trials | Δ NPVH index, Day 1 vs. Day 10 spontaneous speech
1 | 0.27 | 0.38 | 0.20 | −0.54 | −0.66ᵃ
2 | 0.22 | 0.40 | 0.24 | −0.67 | −0.39ᵃ
3 | 0.50 | 0.21 | 0.35 | −0.99 | −0.62ᵃ
4 | 0.13 | 0.56 | 0.29 | −0.95 | −0.44ᵇ
5 | 0.47 | 0.16 | 0.27 | −1.05 | −0.34ᶜ
6 | 0.12 | 0.64 | 0.25 | −0.93 | —
7 | 0.14 | 0.60 | 0.40 | — | —
8 | 0.09 | 0.32 | 0.29 | — | —
9 | 0.09 | 0.66 | 0.18 | — | —
10 | 0.33 | 0.11 | 0.23 | −1.23 | —

Note. Em dashes represent no significant change. T-Cost = tolerance cost; N-Cost = noise cost; SD = standard deviation.

ᵃ Index improved both before and after Day 10 practice.
ᵇ Index improved only before Day 10 practice.
ᶜ Index improved only after Day 10 practice.
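As a rough check on the group comparison, the Python sketch below classifies patients from the spontaneous-speech column of Table 2 using the −0.28-logit threshold. It then computes Cohen's d for each normalized Day 1 measure. The pooled-standard-deviation form of d is an assumption, because the text does not specify the variant. Cells marked with an em dash in Table 2 are treated as no meaningful change. With the tabled values, the sketch gives d ≈ 1.2 for T-Cost/error and d ≈ −0.6 for N-Cost/error, matching the reported values. For SD/error it gives d ≈ 0 rather than −0.02, a difference that may reflect rounding in the table.

```python
import statistics

# Table 2 values: (T-Cost/error, N-Cost/error, SD/error, delta NPVH index in
# spontaneous speech). None marks "no significant change" in Table 2.
TABLE_2 = {
    1: (0.27, 0.38, 0.20, -0.66),
    2: (0.22, 0.40, 0.24, -0.39),
    3: (0.50, 0.21, 0.35, -0.62),
    4: (0.13, 0.56, 0.29, -0.44),
    5: (0.47, 0.16, 0.27, -0.34),
    6: (0.12, 0.64, 0.25, None),
    7: (0.14, 0.60, 0.40, None),
    8: (0.09, 0.32, 0.29, None),
    9: (0.09, 0.66, 0.18, None),
    10: (0.33, 0.11, 0.23, None),
}
THRESHOLD = -0.28  # clinically meaningful decrease in the NPVH index (logits)

def generalized(delta):
    """True if the Day 10 minus Day 1 change is a meaningful decrease."""
    return delta is not None and delta <= THRESHOLD

def cohens_d(a, b):
    """Cohen's d using a pooled standard deviation (sample variances)."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * statistics.variance(a)
                  + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    return (statistics.mean(a) - statistics.mean(b)) / pooled_var ** 0.5

gen = [row for row in TABLE_2.values() if generalized(row[3])]
non = [row for row in TABLE_2.values() if not generalized(row[3])]

for idx, name in enumerate(["T-Cost/error", "N-Cost/error", "SD/error"]):
    d = cohens_d([row[idx] for row in gen], [row[idx] for row in non])
    print(f"{name}: d = {d:.2f}")
```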

Discussion

Outcomes. The first aim of this study evaluated the FBVT's effects on clinical outcomes. On average, the patients demonstrated improved vocal efficiency ratios (i.e., reduced error), NPVH indices, and self-reported V-RQOL total scores with large effect sizes. The strength of these group-based results was surprising because the total therapy dose (10 practice sessions across 2–3 weeks) was well below typical doses. For example, evidence-based voice therapies often take 4–6 weeks, with home programs that ask patients to practice at least every day (Desjardins et al., 2017; Ramig & Verdolini, 1998). FBVT might increase the speed of vocal motor learning compared with traditional voice therapy approaches for two reasons. First, it provided real-time objective feedback with a very specific task goal (i.e., the normative ratio), which the patients viewed as concrete, reliable, and minimally ambiguous. In contrast, the voice therapy process currently relies on the patient's and clinician's subjective judgments, which are often unreliable or variable. Furthermore, patients often report that vocal exercises can be abstract, difficult to understand, challenging to produce, and hard to judge (van Leer & Connor, 2010). Second, the FBVT's metrics are directly aligned with a treatment component that is common across multiple evidence-based therapies for VH: improved vocal efficiency. Voice-related biofeedback approaches often measure pitch, loudness, and/or CPP (e.g., Visi-Pitch [KayPENTAX] and smartphone apps; Van Leer et al., 2017). These measures are only indirectly related to treatment targets for VH. For example, patients can produce inefficient voicing at any pitch, loudness, or CPP value (Roy & Hendarto, 2005; Van Stan et al., 2015; Zañartu et al., 2014).

Generalization. Despite the absence of practice in spontaneous speech and a small treatment dose (only 10 practice sessions), there was strong evidence of generalization. The SLPs were more likely to rate patients' posttherapy spontaneous speech recordings as having lower overall auditory-perceptual severity than pretherapy recordings, albeit with a small effect size and moderate agreement. The small effect size is a reasonable result for two reasons. First, most of the patients had minimal-to-mild auditory-perceptual ratings of overall severity at baseline, leaving little room for improvement. Second, the patients' complaints were mainly vocal fatigue or voice degradation as their days or weeks wore on, not constant or obvious dysphonia.

Although the group improved on all outcome measures, half of the patients showed quantitative evidence of generalization into spontaneous speech per the NPVH index. Because voice therapy studies have overwhelmingly reported only group-based measures, it is unclear whether 50% generalization according to a quantitative voice measure is a strong, typical, or weak finding relative to previous literature. Although five patients did not exhibit quantitative evidence of generalization, none of them self-reported a worsening of their V-RQOL after therapy.

FBVT includes a semi-occluded vocal tract (SOVT), and SOVTs have been associated with immediate improvements in voicing that are putatively physiological rather than skill based (e.g., impedance matching, resistance to phonation; Titze, 2006). To evaluate this possibility, patients provided a spontaneous speech sample before and after the practice trials on Day 10. The SOVT's physiological effect did not seem to benefit generalization measures. The NPVH index was higher (worse) after practice than before it (d = 0.35, small effect size), and most patients who generalized did so both before and after practice on Day 10. Evidence of generalization from sustained vowel trials to spontaneous speech appeared to contradict classical motor learning studies. Those studies suggested that task-specific practice (e.g., spontaneous speech rather than sustained vowels) is necessary for generalization (Bayona et al., 2005). These previous studies mainly reported group-based differences, but closer attention to individuals within groups has shown that many demonstrated strong treatment effects despite being in the “worse” group. For example, Van Stan et al. (2022) found that some individual patients strongly retained their voicing improvements in daily life despite receiving the highest dose of feedback. This finding stands out because retention is typically more strongly associated with lower amounts of feedback (Salmoni et al., 1984).

Characterizing vocal exploration. The second aim of this study assessed any relationship between two quantities. The first was the quality of vocal exploration in early practice, measured by the cost metrics. The second was the generalization of vocal improvements to spontaneous speech after 10 practice sessions, measured by the NPVH index. Results supported the hypothesis about beneficial exploration: Early practice trials with higher T-Cost and lower N-Cost values were associated with generalization from sustained vowels to spontaneous speech. Importantly, the standard deviation of error was not related to generalization, probably because it is agnostic to the mapping between execution variables and error.

Figure 5 uses two examples to illustrate how the cost metrics capture the range from beneficial to nonbeneficial exploration. Patients 3 and 9 began at similar levels of error on Day 1 (33 and 30 pixels, respectively). After 10 days of practice, Patient 3 generalized to in-clinic spontaneous speech and Patient 9 did not (see Table 2). Patient 3's distribution showed beneficial exploration. Substantial improvement was possible by simply moving her current behavior (gray data) closer to the solution manifold (red data), producing a large drop in error, as reflected by her high T-Cost. Her dispersion was also well matched to the error gradient, as indicated by the blue data and low N-Cost. Patient 9's voicing distribution did not represent beneficial exploration. Shifting her behavior (gray data) onto the solution manifold (red data) did little to improve her error (low T-Cost), and her dispersion was not well matched to the error gradient, as indicated by the blue data and high N-Cost. Patient 9's behavior covered a larger area of the execution space for the same average error. It could therefore be hypothesized that she mapped out more of the error gradient in early practice than Patient 3. However, the location of her performance in the execution space (T-Cost) and the variability around that location (N-Cost) suggest a different account. She may have primarily noticed how large changes (not small changes) in error relate to loudness and airflow modifications. To further support these interpretations about error sensitivity, future work could include temporal measures to investigate variations in persistent or antipersistent structure (Abe & Sternad, 2013).

Figure 5.

Two surface plots (left: Patient 3; right: Patient 9). In both plots, the horizontal axes represent mean flow (liters per second) and vocal intensity (decibels), and the vertical axis represents error (pixels). Gray points are the actual data; red and blue points are the T-Cost- and N-Cost-optimized data, respectively. Patient 3: error = 33 pixels, T-Cost = 23 pixels, N-Cost = 5 pixels. Patient 9: error = 30 pixels, T-Cost = 3 pixels, N-Cost = 29 pixels.

Left panel: Patient 3's distribution of her current behavior (gray data) as well as its ideal location (red data) and amount of stochasticity (blue data), which represents beneficial exploration. Right panel: Patient 9's distribution of her current behavior (gray data) as well as its ideal location (red data) and amount of stochasticity (blue data), which does not represent beneficial exploration. Circles = 200-ms segments of the original and optimized sustained voicing data.

Future Work

Future studies with a larger patient population should replicate the current findings with more statistical power. They should also compare the FBVT's performance with standard-of-care voice therapy or other established evidence-based therapies. Because the treatment is entirely quantitative and virtual, there is potential to develop model systems. Examples include computational learning models that provide causal insights into motor learning (e.g., Bayesian approaches using a Kalman filter; Abe & Sternad, 2013; He et al., 2016). Computational vocal fold models could also provide additional physiological insights into vocal motor learning (Titze & Story, 2002; Zañartu et al., 2014). Metrics associated with generalization could help objectively explain variations in individual patient outcomes, identify an individual patient's best treatment option, and/or evaluate their readiness for discharge.

Virtual learning metrics could identify subgroups of VH, such as those with aberrant motor learning and/or control. For example, some patients with VH may have impaired sensorimotor function, not in a “physical” sense (paresis, paralysis) but in a “control” sense (decreased sensitivity to vocal error). Studies have shown that VH can be associated with reduced perceptual and motor sensitivity to vocal intensity and frequency (Abur et al., 2021). The virtual task might identify VH subgroups with decreased error sensitivity. If so, future work could design treatment strategies such as error amplification methods that implicitly call attention to the user's reduced error sensitivity (Hasson et al., 2016; Thorp et al., 2017). Patients in this study received feedback on every practice trial (i.e., 100% feedback frequency). At a group level, this can reduce generalization compared with less frequent feedback (Salmoni et al., 1984). Thus, future work could include practice trials without the visualization to evaluate whether reduced feedback frequency facilitates generalization.

Limitations

One limitation of this study was the small sample size, so the strength of any differences could change as future studies collect data on more patients. Measuring generalization with an ambulatory voice monitoring classifier has high ecological validity; however, the classifier does not measure the same quantity as the FBVT's vocal efficiency error metric. The FBVT's error metric cannot be adapted to ambulatory data because, to the authors' knowledge, measuring or estimating mean airflow in daily life is not currently possible. It is, however, an active area of research with promising advances (Cortés et al., 2022; Fryd et al., 2016; Lin et al., 2019; Marks et al., 2020). Future work could try to identify aspects of the neck-skin acceleration signal that correlate with the FBVT's target ratio, which would enable a more direct measure of generalization in daily life. The patients in this study were all judged to have minimal or mild dysphonia, so how the FBVT works in patients with moderate and severe dysphonia remains to be addressed in future work. These patients may need facilitating gestures to first change their overall manner of voicing before starting therapy with the floating ball voice game. Such gestures include throat clearing, humming, cues for clear speech, and lip trills.

Summary and Conclusions

This study provided a preliminary test of the FBVT's effects on voice therapy outcomes. It also tested the potential for the FBVT's novel variability metrics to predict individual patient generalization based on exploration in early practice. In a group of patients with NPVH, FBVT was associated with a more normative ratio of mean airflow to intensity during voicing. It was also associated with an improved NPVH index, improved overall auditory-perceptual voice severity, and improved total V-RQOL. Five out of 10 patients showed quantitative evidence (per the NPVH index) of generalizing into spontaneous speech. The cost metrics demonstrated strong potential to predict individual patients' generalization. Specifically, patients who generalized into spontaneous speech on Day 10 exhibited airflow/intensity exploration on Day 1 that was more aligned with the error gradient. Future work can continue to evaluate the FBVT's efficacy and effectiveness, as well as its potential to predict variation in outcomes across individual patients.

Data Availability Statement

De-identified overall average values for variables in this study are included in Supplemental Materials S1 and S2. Because the data are from human subjects, more detailed data may be available upon request, if appropriate, and require a data use agreement.

Supplementary Material

Supplemental Material S1. Individual patient average values Error (pixels), T-Cost (pixels), N-Cost (pixels), and V-RQOL (arbitrary units) on days 1 and 10 of practice.
Supplemental Material S2. Individual NPVH indices (logit) for patients during early practice trials, late practice trials, and spontaneous speech during day 1, before day 10 practice, and after day 10 practice.

Acknowledgments

This work was supported by the Voice Health Institute and the National Institutes of Health (NIH) under Grant Nos. R01-HD045639 and R37-HD087089 (PI: Dagmar Sternad), R33-DC011588 and P50-DC015446 (PD: Robert Hillman; PI: Jarrad Van Stan), and R21-DC016124 and R01-DC020247 (PI: Jarrad Van Stan). The article's contents are solely the responsibility of the authors and do not necessarily represent the official views of the NIH. We would like to thank Matthew Jarvis for his assistance in developing the software for the FBVT and postprocessing algorithms and Rakshith Lokesh for his assistance in the creation of the figures.


References

  1. Abe, M. O., & Sternad, D. (2013). Directionality in distribution and temporal structure of variability in skill acquisition. Frontiers in Human Neuroscience, 7, Article 225. 10.3389/fnhum.2013.00225 [DOI] [PMC free article] [PubMed] [Google Scholar]
  2. Abur, D., Subaciute, A., Kapsner-Smith, M., Segina, R. K., Tracy, L. F., Noordzij, J. P., & Stepp, C. E. (2021). Impaired auditory discrimination and auditory–motor integration in hyperfunctional voice disorders. Scientific Reports, 11(1), Article 13123. 10.1038/s41598-021-92250-8 [DOI] [PMC free article] [PubMed] [Google Scholar]
  3. Ajemian, R., D'Ausilio, A., Moorman, H., & Bizzi, E. (2013). A theory for how sensorimotor skills are learned and retained in noisy and nonstationary neural circuits. Proceedings of the National Academy of Sciences, 110(52), E5078–E5087. 10.1073/pnas.1320116110 [DOI] [PMC free article] [PubMed] [Google Scholar]
  4. Angadi, V., Croake, D., & Stemple, J. (2019). Effects of vocal function exercises: A systematic review. Journal of Voice, 33(1), 124.e13–124.e34. 10.1016/j.jvoice.2017.08.031 [DOI] [PubMed] [Google Scholar]
  5. Awan, S. N., Solomon, N. P., Helou, L. B., & Stojadinovic, A. (2013). Spectral-cepstral estimation of dysphonia severity: External validation. Annals of Otology, Rhinology & Laryngology, 122(1), 40–48. 10.1177/000348941312200108 [DOI] [PubMed] [Google Scholar]
  6. Bayona, N. A., Bitensky, J., Salter, K., & Teasell, R. (2005). The role of task-specific training in rehabilitation therapies. Topics in Stroke Rehabilitation, 12(3), 58–65. 10.1310/BQM5-6YGB-MVJ5-WVCR [DOI] [PubMed] [Google Scholar]
  7. Bhattacharyya, N. (2014). The prevalence of voice problems among adults in the United States. The Laryngoscope, 124(10), 2359–2362. 10.1002/lary.24740 [DOI] [PubMed] [Google Scholar]
  8. Chen, H., Cohen, P., & Chen, S. (2010). How big is a big odds ratio? Interpreting the magnitudes of odds ratios in epidemiological studies. Communications in Statistics—Simulation and Computation, 39(4), 860–864. 10.1080/03610911003650383 [DOI] [Google Scholar]
  9. Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Erlbaum. [Google Scholar]
  10. Cohen, R., & Sternad, D. (2009). Variability in motor learning: Relocating, channeling and reducing noise. Experimental Brain Research, 193(1), 69–83. 10.1007/s00221-008-1596-1 [DOI] [PMC free article] [PubMed] [Google Scholar]
  11. Cortés, J. P., Lin, J. Z., Marks, K. L., Espinoza, V. M., Ibarra, E. J., Zañartu, M., Hillman, R. E., & Mehta, D. D. (2022). Ambulatory monitoring of subglottal pressure estimated from neck-surface vibration in individuals with and without voice disorders. Applied Sciences, 12(21), Article 10692. 10.3390/app122110692 [DOI] [PMC free article] [PubMed] [Google Scholar]
  12. Desjardins, M., Halstead, L., Cooke, M., & Bonilha, H. S. (2017). A systematic review of voice therapy: What “effectiveness” really implies. Journal of Voice, 31(3), 392.e13–392.e32. 10.1016/j.jvoice.2016.10.002 [DOI] [PubMed] [Google Scholar]
  13. Donhauser, J., Tur, B., & Döllinger, M. (2024). Neural network-based estimation of biomechanical vocal fold parameters. Frontiers in Physiology, 15, Article 1282574. 10.3389/fphys.2024.1282574 [DOI] [PMC free article] [PubMed] [Google Scholar]
  14. Espinoza, V. M., Mehta, D. D., Van Stan, J. H., Hillman, R. E., & Zañartu, M. (2020). Glottal aerodynamics estimated from neck-surface vibration in women with phonotraumatic and nonphonotraumatic vocal hyperfunction. Journal of Speech, Language, and Hearing Research, 63(9), 2861–2869. 10.1044/2020_JSLHR-20-00189
  15. Faisal, A. A., Selen, L. P., & Wolpert, D. M. (2008). Noise in the nervous system. Nature Reviews Neuroscience, 9(4), 292–303. 10.1038/nrn2258
  16. Ferrand, C. T. (1995). Effects of practice with and without knowledge of results on jitter and shimmer levels in normally speaking women. Journal of Voice, 9(4), 419–423. 10.1016/S0892-1997(05)80204-8
  17. Fryd, A. S., Van Stan, J. H., Hillman, R. E., & Mehta, D. D. (2016). Estimating subglottal pressure from neck-surface acceleration during normal voice production. Journal of Speech, Language, and Hearing Research, 59(6), 1335–1345. 10.1044/2016_JSLHR-S-15-0430
  18. Ghasemzadeh, H., Khass, M. T., Arjmandi, M. K., & Pooyan, M. (2015). Detection of vocal disorders based on phase space parameters and Lyapunov spectrum. Biomedical Signal Processing and Control, 22, 135–145. 10.1016/j.bspc.2015.07.002
  19. Gillespie, A. I., Yabes, J., Rosen, C. A., & Gartner-Schmidt, J. L. (2019). Efficacy of conversation training therapy for patients with benign vocal fold lesions and muscle tension dysphonia compared to historical matched control patients. Journal of Speech, Language, and Hearing Research, 62(11), 4062–4079. 10.1044/2019_JSLHR-S-19-0136
  20. Hart, T., Dijkers, M. P., Whyte, J., Turkstra, L. S., Zanca, J. M., Packel, A., Van Stan, J. H., Ferraro, M., & Chen, C. (2019). A theory-driven system for the specification of rehabilitation treatments. Archives of Physical Medicine and Rehabilitation, 100(1), 172–180. 10.1016/j.apmr.2018.09.109
  21. Hasson, C. J., Zhang, Z., Abe, M. O., & Sternad, D. (2016). Neuromotor noise is malleable by amplifying perceived errors. PLoS Computational Biology, 12(8), Article e1005044. 10.1371/journal.pcbi.1005044
  22. He, K., Liang, Y., Abdollahi, F., Fisher Bittmann, M., Kording, K., & Wei, K. (2016). The statistical determinants of the speed of motor learning. PLOS Computational Biology, 12(9), Article e1005023. 10.1371/journal.pcbi.1005023
  23. Hillman, R. E., Holmberg, E. B., Perkell, J. S., Walsh, M., & Vaughan, C. (1989). Objective assessment of vocal hyperfunction: An experimental framework and initial results. Journal of Speech and Hearing Research, 32(2), 373–392. 10.1044/jshr.3202.373
  24. Hillman, R. E., Stepp, C. E., Van Stan, J. H., Zañartu, M., & Mehta, D. D. (2020). An updated theoretical framework for vocal hyperfunction. American Journal of Speech-Language Pathology, 29(4), 2254–2260. 10.1044/2020_AJSLP-20-00104
  25. Hirano, M., & McCormick, K. R. (1986). Clinical examination of voice by Minoru Hirano. The Journal of the Acoustical Society of America, 80(4), Article 1273. 10.1121/1.393788
  26. Hogikyan, N. D., & Sethuraman, G. (1999). Validation of an instrument to measure voice-related quality of life (V-RQOL). Journal of Voice, 13(4), 557–569. 10.1016/S0892-1997(99)80010-1
  27. Ibarra, E. J., Arias-Londoño, J. D., Godino-Llorente, J., Mehta, D. D., & Zañartu, M. (2024). Subject-specific modelling of the subglottal pressure estimation from neck-surface vibration signals by domain adaptation. Authorea Preprints. 10.36227/techrxiv.171173471.15232421/v1
  28. Kempster, G. B., Gerratt, B. R., Verdolini Abbott, K., Barkmeier-Kraemer, J., & Hillman, R. E. (2009). Consensus Auditory-Perceptual Evaluation of Voice: Development of a standardized clinical protocol. American Journal of Speech-Language Pathology, 18(2), 124–132. 10.1044/1058-0360(2008/08-0017)
  29. Klatt, D. H., & Klatt, L. C. (1990). Analysis, synthesis, and perception of voice quality variations among female and male talkers. The Journal of the Acoustical Society of America, 87(2), 820–857. 10.1121/1.398894
  30. Kridgen, S., Hillman, R. E., Stadelman-Cohen, T., Zeitels, S., Burns, J. A., Hron, T., Krusemark, C., Muise, J., & Van Stan, J. H. (2021). Patient-reported factors associated with the onset of hyperfunctional voice disorders. Annals of Otology, Rhinology & Laryngology, 130(4), 389–394. 10.1177/0003489420956379
  31. Lã, F. M., Wistbacka, G., Andrade, P. A., & Granqvist, S. (2017). Real-time visual feedback of airflow in voice training: Aerodynamic properties of two flow ball devices. Journal of Voice, 31(3), 390.e1–390.e8. 10.1016/j.jvoice.2016.09.024
  32. Lin, J. Z., Espinoza, V. M., Marks, K. L., Zañartu, M., & Mehta, D. D. (2019). Improved subglottal pressure estimation from neck-surface vibration in healthy speakers producing non-modal phonation. IEEE Journal of Selected Topics in Signal Processing, 14(2), 449–460. 10.1109/JSTSP.2019.2959267
  33. MacPherson, M. K., Abur, D., & Stepp, C. E. (2017). Acoustic measures of voice and physiologic measures of autonomic arousal during speech as a function of cognitive load. Journal of Voice, 31(4), 504.e1–504.e9. 10.1016/j.jvoice.2016.10.021
  34. Marks, K. L., Lin, J. Z., Burns, J. A., Hron, T. A., Hillman, R. E., & Mehta, D. D. (2020). Estimation of subglottal pressure from neck surface vibration in patients with voice disorders. Journal of Speech, Language, and Hearing Research, 63(7), 2202–2218. 10.1044/2020_JSLHR-19-00409
  35. Müller, H., & Sternad, D. (2009). Motor learning: Changes in the structure of variability in a redundant task. In D. Sternad (Ed.), Progress in motor control: A multidisciplinary perspective (Vol. 629, pp. 439–456). Springer. 10.1007/978-0-387-77064-2_23
  36. Patel, R. R., Awan, S. N., Barkmeier-Kraemer, J., Courey, M., Deliyski, D., Eadie, T., Paul, D., Švec, J. G., & Hillman, R. (2018). Recommended protocols for instrumental assessment of voice: American Speech-Language-Hearing Association expert panel to develop a protocol for instrumental assessment of vocal function. American Journal of Speech-Language Pathology, 27(3), 887–905. 10.1044/2018_AJSLP-17-0009
  37. Ramig, L. O., & Verdolini, K. (1998). Treatment efficacy: Voice disorders. Journal of Speech, Language, and Hearing Research, 41(1), S101–S116. 10.1044/jslhr.4101.s101
  38. Ranganathan, R., Cone, S., & Fox, B. (2022). Predicting individual differences in motor learning: A critical review. Neuroscience & Biobehavioral Reviews, 141, Article 104852. 10.1016/j.neubiorev.2022.104852
  39. Roy, N., Bless, D. M., Heisey, D., & Ford, C. N. (1997). Manual circumlaryngeal therapy for functional dysphonia: An evaluation of short- and long-term treatment outcomes. Journal of Voice, 11(3), 321–331. 10.1016/S0892-1997(97)80011-2
  40. Roy, N., & Hendarto, H. (2005). Revisiting the pitch controversy: Changes in speaking fundamental frequency (SFF) after management of functional dysphonia. Journal of Voice, 19(4), 582–591. 10.1016/j.jvoice.2004.08.005
  41. Roy, N., & Leeper, H. A. (1993). Effects of the manual laryngeal musculoskeletal tension reduction technique as a treatment for functional voice disorders: Perceptual and acoustic measures. Journal of Voice, 7(3), 242–249. 10.1016/S0892-1997(05)80333-9
  42. Roy, N., Peterson, E. A., Pierce, J. L., Smith, M. E., & Houtz, D. R. (2017). Manual laryngeal reposturing as a primary approach for mutational falsetto. The Laryngoscope, 127(3), 645–650. 10.1002/lary.26053
  43. Roy, N., Weinrich, B., Gray, S. D., Tanner, K., Stemple, J. C., & Sapienza, C. M. (2003). Three treatments for teachers with voice disorders. Journal of Speech, Language, and Hearing Research, 46(3), 670–688. 10.1044/1092-4388(2003/053)
  44. Ruotsalainen, J., Sellman, J., Lehto, L., & Verbeek, J. (2008). Systematic review of the treatment of functional dysphonia and prevention of voice disorders. Otolaryngology–Head and Neck Surgery, 138(5), 557–565. 10.1016/j.otohns.2008.01.014
  45. Saccente-Kennedy, B., Gillies, F., Desjardins, M., Van Stan, J. H., & Govender, R. (2024). A systematic review of speech-language pathology interventions for presbyphonia using the Rehabilitation Treatment Specification System. Journal of Voice. Advance online publication. 10.1016/j.jvoice.2023.12.010
  46. Salmoni, A. W., Schmidt, R. A., & Walter, C. B. (1984). Knowledge of results and motor learning: A review and critical reappraisal. Psychological Bulletin, 95(3), 355–386. 10.1037/0033-2909.95.3.355
  47. Schmidt, R. A., Lee, T. D., Winstein, C., Wulf, G., & Zelaznik, H. N. (2018). Motor control and learning: A behavioral emphasis. Human Kinetics.
  48. Stemple, J. C. (2005). A holistic approach to voice therapy. Seminars in Speech and Language, 26(2), 131–137. 10.1055/s-2005-871209
  49. Sternad, D. (2015, June 9–12). From theoretical analysis to clinical assessment and intervention: Three interactive motor skills in a virtual environment. 2015 International Conference on Virtual Rehabilitation (ICVR), Valencia, Spain. 10.1109/ICVR.2015.7358579
  50. Sternad, D., Huber, M. E., & Kuznetsov, N. (2014). Acquisition of novel and complex motor skills: Stable solutions where intrinsic noise matters less. In M. Levin (Ed.), Progress in motor control: Skill learning, performance, health, and injury (Vol. 826, pp. 101–124). Springer.
  51. Teixeira, J. P., Oliveira, C., & Lopes, C. (2013). Vocal acoustic analysis—Jitter, shimmer and HNR parameters. Procedia Technology, 9, 1112–1122. 10.1016/j.protcy.2013.12.124
  52. Thorp, E. B., Kording, K. P., & Mussa-Ivaldi, F. A. (2017). Using noise to shape motor learning. Journal of Neurophysiology, 117(2), 728–737. 10.1152/jn.00493.2016
  53. Titze, I. R. (1992). Vocal efficiency. Journal of Voice, 6(2), 135–138. 10.1016/S0892-1997(05)80127-4
  54. Titze, I. R. (2006). Voice training and therapy with a semi-occluded vocal tract: Rationale and scientific underpinnings. Journal of Speech, Language, and Hearing Research, 49(2), 448–459. 10.1044/1092-4388(2006/035)
  55. Titze, I. R. (2013). Quantifying vocal efficiency and economy—How can computation augment clinical assessment? Proceedings of Meetings on Acoustics, 19(1), Article 060244. 10.1121/1.4799035
  56. Titze, I. R., & Story, B. H. (2002). Rules for controlling low-dimensional vocal fold models with muscle activation. The Journal of the Acoustical Society of America, 112(3), 1064–1076. 10.1121/1.1496080
  57. van Leer, E., & Connor, N. P. (2010). Patient perceptions of voice therapy adherence. Journal of Voice, 24(4), 458–469. 10.1016/j.jvoice.2008.12.009
  58. Van Leer, E., Pfister, R. C., & Zhou, X. (2017). An iOS-based cepstral peak prominence application: Feasibility for patient practice of resonant voice. Journal of Voice, 31(1), 131.e9–131.e16. 10.1016/j.jvoice.2015.11.022
  59. Van Lierde, K. M., Claeys, S., De Bodt, M., & Van Cauwenberge, P. (2007). Long-term outcome of hyperfunctional voice disorders based on a multiparameter approach. Journal of Voice, 21(2), 179–188. 10.1016/j.jvoice.2005.11.002
  60. Van Stan, J. H., Mehta, D. D., Zeitels, S. M., Burns, J. A., Barbu, A. M., & Hillman, R. E. (2015). Average ambulatory measures of sound pressure level, fundamental frequency, and vocal dose do not differ between adult females with phonotraumatic lesions and matched control subjects. Annals of Otology, Rhinology & Laryngology, 124(11), 864–874. 10.1177/0003489415589363
  61. Van Stan, J. H., Ortiz, A. J., Cortes, J. P., Marks, K. L., Toles, L. E., Mehta, D. D., Burns, J. A., Hron, T., Stadelman-Cohen, T., Krusemark, C., Muise, J., Fox-Galalis, A. B., Nudelman, C., Zeitels, S., & Hillman, R. E. (2021). Differences in daily voice use measures between female patients with nonphonotraumatic vocal hyperfunction and matched controls. Journal of Speech, Language, and Hearing Research, 64(5), 1457–1470. 10.1044/2021_JSLHR-20-00538
  62. Van Stan, J. H., Ortiz, A. J., Sternad, D., Mehta, D. D., Huo, C., & Hillman, R. E. (2022). Ambulatory voice biofeedback: Acquisition and retention of modified daily voice use in patients with phonotraumatic vocal hyperfunction. American Journal of Speech-Language Pathology, 31(1), 409–418. 10.1044/2021_AJSLP-21-00141
  63. Van Stan, J. H., Park, S.-W., Jarvis, M., Mehta, D. D., Hillman, R. E., & Sternad, D. (2017). Measuring vocal motor skill with a virtual voice-controlled slingshot. The Journal of the Acoustical Society of America, 142(3), 1199–1212. 10.1121/1.5000233
  64. Van Stan, J. H., Park, S.-W., Jarvis, M., Stemple, J., Hillman, R. E., & Sternad, D. (2021). Quantitative assessment of learning and retention in virtual vocal function exercises. Journal of Speech, Language, and Hearing Research, 64(1), 1–15. 10.1044/2020_JSLHR-20-00357
  65. Van Stan, J. H., Roy, N., Stemple, J. C., Gartner-Schmidt, J., Gillespie, A., Whyte, J., Duffy, J., & Turkstra, L. (2024). Rehabilitation Treatment Specification System: Content and criterion validity across evidence-based voice therapies for muscle tension dysphonia. American Journal of Speech-Language Pathology, 33(4), 1774–1791. 10.1044/2024_AJSLP-23-00362
  66. Van Stan, J. H., Whyte, J., Duffy, J. R., Barkmeier-Kraemer, J., Doyle, P., Gherson, S., Kelchner, L., Muise, J., Petty, B., Roy, N., Stemple, J., Thibeault, S., & Jorgensen Tolejano, C. (2021). Voice therapy according to the Rehabilitation Treatment Specification System: Expert consensus ingredients and targets. American Journal of Speech-Language Pathology, 30(5), 2169–2201. 10.1044/2021_AJSLP-21-00076
  67. Verdolini, K., Rosen, C., & Branski, R. C. (Eds.). (2006). Classification manual for voice disorders-I. Erlbaum.
  68. Zañartu, M., Galindo, G. E., Erath, B. D., Peterson, S. D., Wodicka, G. R., & Hillman, R. E. (2014). Modeling the effects of a posterior glottal opening on vocal fold dynamics with implications for vocal hyperfunction. The Journal of the Acoustical Society of America, 136(6), 3262–3271. 10.1121/1.4901714
  69. Zhang, Y., Jiang, J. J., Wallace, S. M., & Zhou, L. (2005). Comparison of nonlinear dynamic methods and perturbation methods for voice analysis. The Journal of the Acoustical Society of America, 118(4), 2551–2560. 10.1121/1.2005907

Associated Data


Supplementary Materials

Supplemental Material S1. Individual patient average values of Error (pixels), T-Cost (pixels), N-Cost (pixels), and V-RQOL (arbitrary units) on Days 1 and 10 of practice.
JSLHR-67-3521-s001.pdf (393.7KB, pdf)
Supplemental Material S2. Individual patients' NPVH indices (logit) for early practice trials, late practice trials, and spontaneous speech on Day 1, before Day 10 practice, and after Day 10 practice.
JSLHR-67-3521-s002.pdf (392.1KB, pdf)

Data Availability Statement

De-identified overall average values for the variables in this study are included in Supplemental Materials S1 and S2. Because the data are from human subjects, more detailed data may be available upon request, where appropriate, and would require a data use agreement.


Articles from Journal of Speech, Language, and Hearing Research : JSLHR are provided here courtesy of American Speech-Language-Hearing Association
