Abstract
An application of functional data analysis (FDA) (Ramsay and Silverman, 2005, Functional Data Analysis, 2nd ed. (Springer-Verlag, New York)) for linguistic experimentation is explored. The functional time-registration method provided by FDA is shown to offer novel advantages in the investigation of articulatory timing. Traditionally, articulatory studies examining the effects of linguistic variables such as prosody on articulatory timing have relied on comparing the durations of speech intervals of interest defined by kinematic landmarks. Such measurements, however, do not preserve information on the detailed, continuous pattern of articulatory timing that unfolds during these intervals. We present an approach that allows the analysis of entire, continuous kinematic trajectories obtained in a movement tracking experiment examining the influence of a phrasal boundary on articulatory patterning. FDA time deformation functions, after alignment of test and reference (control) signals, reveal delaying of articulator movement (i.e., slowing of the internal clock rate) in the presence of a phrase boundary as the speech stream recedes from the boundary. This is a theoretically predicted pattern (Byrd and Saltzman, 2003, The elastic phrase: Modeling the dynamics of boundary-adjacent lengthening, Journal of Phonetics 31, 149–180.), which would be more difficult to validate with a traditional interval-based approach. It is concluded that the FDA time alignment method provides a useful tool for characterizing timing patterns in linguistic experimentation based on continuous kinematic trajectories.
I. INTRODUCTION
A. Background
In the past, experiments testing for the effects of linguistic variables on the temporal patterning of articulation have relied on comparing the durations of articulatory intervals defined piecewise by kinematic landmarks. For example, a number of articulatory movement tracking studies have shown that lengthening of articulatory movements occurs at prosodic boundaries (Edwards et al., 1991; Beckman and Edwards, 1992; Byrd and Saltzman, 1998; Byrd, Kaun, Narayanan, and Saltzman, 2000; Fougeron, 2001; Cho and Keating, 2001; Tabain, 2003; Keating et al., 2004; Cho, in press; and Tabain and Perrier, 2005). Previous articulatory studies of this sort have relied on kinematic landmarks such as movement edges/extrema and peak velocities to define speech intervals of interest and compare their durations. Such measurements, however, lack information on the detailed pattern of articulatory timing that unfolds along the time dimension within those intervals. It is expected that examining articulatory trajectories in a continuous way can reveal this timing evolution and thus facilitate an understanding of the linguistic patterning.
The statistical framework called functional data analysis (FDA), introduced by Ramsay (1982; followed by Ramsay and Silverman, 1997, Ramsay and Silverman, 2005), offers a novel alternative that can consider entire, continuous kinematic trajectories obtained in various experimental conditions. FDA allows the deformation or warping of these trajectories over time to be characterized and compared within and across subjects in evaluating the linguistic variable of interest. For example, prosodic models that seek to explain how speakers modulate the spatiotemporal organization of articulatory gestures as a function of their phrasal position are particularly informed by examining continuous kinematic trajectories. It has been hypothesized that the internal “clock rate” that controls the temporal unfolding of utterances is slowed as a phrase boundary is approached and speeds up again as the boundary recedes (Byrd and Saltzman, 2003). Because such a change in articulatory dynamics is best described in the continuous time dimension, it is hypothesized that the FDA time alignment method will be able to detect such local temporal fluctuation of gestural activation near prosodic boundaries. In addition, the resulting continuous time warping functions provide data useful in constructing and verifying such models.
While most conventional statistical methods process a collection of individual data points, the FDA statistical framework is designed to process a collection of functions or curves (Ramsay and Silverman, 2005). The term “functional” reflects a view that by expressing discrete data in a functional form, one can better represent the underlying continuity of the physical or physiological system generating the data. Each curve is regarded as a sample of an underlying common pattern. It also permits a more natural way to utilize its derivatives (e.g., velocity and acceleration) for system description or modeling. In practice, such a functional representation of data is achieved by converting the raw sampled data points into a continuous function based on basis function expansion and smoothing.
The FDA framework provides novel data processing and statistical analysis algorithms for the creation and exploration of functional data (Ramsay and Silverman, 2005). Specifically, two essential data processing methods in the FDA framework are functional data smoothing and functional time alignment or time registration methods. These methods have been developed to prepare data for further analysis in the FDA framework, such as functional analysis of variance, functional principal component analysis, and functional canonical correlation analysis. They can be equally useful for other applications in which data smoothing or time registration of sequential data is desirable. In fact, the motivation of this study is to extend the usefulness of the functional time registration method applied for articulatory speech production studies.
The FDA time registration method has been applied in the analysis of lip movements (Ramsay, Munhall, Gracco, and Ostry, 1996), in aligning laryngeal and audio signals (Lucero et al., 1997; Lucero and Koenig, 2000), in the variability analysis of oral airflow data in children’s speech (Koenig and Lucero, 2002), and in the variability study of VCV articulation (Lucero and Löfqvist, 2005). However, in these studies the main focus has been either to demonstrate the FDA time registration method or to estimate signal average and variability in an optimal way from repeated productions of the same utterance. Here we present an extended use of the FDA time registration method for the analysis of kinematic articulatory trajectory data obtained in different linguistic conditions. Specifically, we investigate the difference in tongue-tip temporal patterning in two contrasting prosodic environments, namely, in the presence and absence of an intonational phrase boundary.
B. FDA functional data smoothing
Functional data smoothing is the first step of any data analysis in the FDA framework, and its purpose is to convert raw discrete data points into a smoothly varying function. This emphasizes patterns in the data by minimizing short-term deviation due to measurement errors or inherent system noise. We will give a brief mathematical outline of the FDA smoothing method.
In Ramsay and Silverman (1997, 2005), a preferred approach to the functional data smoothing is the classic least square error minimization method augmented with a regularization term or “roughness penalty” for the control of degree of smoothness, and the cost function F to be minimized is set to
$$F = \sum_{j} \left[ x_j - y(t_j) \right]^2 + \lambda \int \left[ \frac{d^4 y(t)}{dt^4} \right]^2 dt \tag{1}$$

where $x_j$ denotes an observed value at time $t_j$ in a discrete data sequence x, y(t) is the function to be estimated from the observed sequence x, λ is a smoothing parameter, and $d^4/dt^4$ denotes the fourth-order time derivative. Now the function y(t) is modeled as a linear combination of a set of basis functions,
$$y(t) = \sum_{k=1}^{K} c_k \, \phi_k(t) \tag{2}$$

where $\phi_k(t)$ is the kth basis function with weight $c_k$, and K is the number of basis functions. Then the task of the functional data smoothing is to find the coefficients $c_k$ that minimize the cost function F through an iterative minimization procedure.
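The basis expansion in Eq. (2) can be illustrated with a small, dependency-free sketch. It evaluates B-splines with the standard Cox-de Boor recursion on a clamped, equally spaced knot vector over normalized time [0, 1]. The knot placement, the function names, and the all-ones coefficient vector are assumptions made for illustration; they are not taken from the MATLAB FDA code used in the study.

```python
def bspline_basis(k, order, knots, t):
    """Value at t of the k-th B-spline of the given order (degree = order - 1),
    computed with the Cox-de Boor recursion."""
    if order == 1:
        # Piecewise-constant indicator on a half-open knot interval.
        return 1.0 if knots[k] <= t < knots[k + 1] else 0.0
    left_den = knots[k + order - 1] - knots[k]
    right_den = knots[k + order] - knots[k + 1]
    left = 0.0
    right = 0.0
    if left_den != 0.0:
        left = (t - knots[k]) / left_den * bspline_basis(k, order - 1, knots, t)
    if right_den != 0.0:
        right = (knots[k + order] - t) / right_den * bspline_basis(k + 1, order - 1, knots, t)
    return left + right


def clamped_knots(n_basis, order):
    """Knot vector on [0, 1] with each end knot repeated `order` times
    and equally spaced interior knots (an illustrative choice)."""
    n_interior = n_basis - order
    interior = [(i + 1) / (n_interior + 1) for i in range(n_interior)]
    return [0.0] * order + interior + [1.0] * order


def evaluate_expansion(coeffs, order, knots, t):
    """y(t) = sum_k c_k * phi_k(t), as in Eq. (2). Valid for 0 <= t < 1."""
    return sum(c * bspline_basis(k, order, knots, t) for k, c in enumerate(coeffs))


K, ORDER = 20, 6                 # basis size and order quoted in the text
KNOTS = clamped_knots(K, ORDER)
coeffs = [1.0] * K               # hypothetical weights: B-splines then sum to 1
print(evaluate_expansion(coeffs, ORDER, KNOTS, 0.37))
```

With all weights equal to one, the expansion returns one at any time inside the interval, which is a quick way to confirm that the basis is set up correctly before fitting real coefficients.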
The choice of basis function depends on the temporal characteristics of the data. B-splines (de Boor, 2001) are the typical choice for nonperiodic observation sequences. The smoothing parameter λ balances exact data fitting against smoothing through the roughness penalty (see footnote 1). As λ approaches zero, the fit follows the data more exactly because the cost function is dominated by the least square error term. The fourth-order time derivative of y(t) is used in the roughness penalty term in order to guarantee the smoothness of the second-order time derivative of y(t), which is related to the curvature of y(t). An output of the FDA data smoothing based on Eq. (1) is illustrated in Fig. 1 for a tongue tip kinematic velocity trajectory. A set of 20 B-spline basis functions of order 6 was tested with two λ values (1E-9 and 1E-12). The dotted line in each panel represents the original signal. It can be seen that, for a given number of basis functions and order, the choice of λ is critical to faithfully represent the original signal. In fact, the choice of λ was found to be more important in data smoothing than the selection of the order and number of B-splines. By appropriately selecting λ and the order and number of basis functions, one can achieve a flexible approximation of discrete data in a functional form.
FIG. 1.

An example output of FDA smoothing using a tongue tip velocity trajectory. A set of 20 B-splines of order 6 was used for smoothing, but the penalty parameter λ was varied—1E-9 for the top panel and 1E-12 for the bottom panel. The dotted line in each panel represents the original signal and the solid line the smoothed signal. It is illustrated that, for a given number of basis functions and order, the choice of λ is critical to faithfully represent the raw signal.
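The trade-off controlled by λ can be reproduced with a discrete analogue of Eq. (1). The sketch below is not the B-spline method of the paper: it swaps the integral of the squared fourth derivative for a sum of squared fourth-order differences (a Whittaker-type smoother) and solves the resulting linear system directly rather than iteratively. The synthetic signal and the two λ values are hypothetical, and because the penalty is discretized differently, these λ values are not comparable to the 1E-9 and 1E-12 quoted above.

```python
import math


def fourth_difference_penalty(n):
    """Return P = D^T D, where D takes fourth-order differences of a length-n sequence."""
    stencil = [1.0, -4.0, 6.0, -4.0, 1.0]
    P = [[0.0] * n for _ in range(n)]
    for row in range(n - 4):                      # one row of D per stencil position
        for a, sa in enumerate(stencil):
            for b, sb in enumerate(stencil):
                P[row + a][row + b] += sa * sb
    return P


def solve(A, b):
    """Solve A y = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    y = [0.0] * n
    for r in range(n - 1, -1, -1):                # back substitution
        tail = sum(M[r][c] * y[c] for c in range(r + 1, n))
        y[r] = (M[r][n] - tail) / M[r][r]
    return y


def smooth(x, lam):
    """Minimise sum_j (x_j - y_j)^2 + lam * sum_i (fourth difference of y)_i^2."""
    n = len(x)
    P = fourth_difference_penalty(n)
    A = [[(1.0 if i == j else 0.0) + lam * P[i][j] for j in range(n)] for i in range(n)]
    return solve(A, x)


def roughness(y):
    """Sum of squared fourth differences, the discrete penalty term."""
    return sum((y[i] - 4 * y[i + 1] + 6 * y[i + 2] - 4 * y[i + 3] + y[i + 4]) ** 2
               for i in range(len(y) - 4))


# Hypothetical test signal: one sine cycle plus a small alternating disturbance.
n = 40
x = [math.sin(2.0 * math.pi * j / (n - 1)) + 0.1 * (-1) ** j for j in range(n)]
light = smooth(x, 1e-6)   # small lambda: output stays close to the data
heavy = smooth(x, 10.0)   # large lambda: output is much smoother
print(roughness(light), roughness(heavy))
```

A small λ leaves the disturbance almost untouched, while a large λ removes it at the cost of a larger residual, which mirrors the contrast between the two panels of Fig. 1.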
C. FDA time registration method
Time alignment or registration refers to an operation by which signals are aligned in time so that a measure of distance between the signals and a reference is minimized. As illustrated in Fig. 2, it is common to observe that signals obtained under the same experimental condition differ in the timings (or phase) and amplitudes of signal landmarks (e.g., major peaks and valleys, zero-crossings), even after duration normalization by an equal-point resampling. The objective of time alignment is to find a common time path between two signals with different properties (one designated the reference signal and one designated the test signal) by expanding or compressing the physical or clock time of a test signal against the reference. The resulting common time path or “time warping function” represents an intersignal timing relation, that is, local advancing or slowing of the internal or system time of a test signal with respect to the physical or clock time of the reference signal. We describe below the conceptual outline of the FDA time registration method adapted for this study. For mathematical details, the reader should refer to Ramsay and Silverman (2005).
FIG. 2.

(Top) Plots of the velocity profiles of control utterances (12 reps) for subject A before time alignment (left) and after time alignment (right). (Bottom) Plots of the velocity profiles of test utterances (12 reps) for subject A before time alignment (left) and after time alignment (right). It is clear that signal average and variability can be measured more accurately after time alignment. (Middle) The middle panel also shows a time deformation function for the control signals compared to reference.
Once test and reference signals are represented as functional forms through the FDA smoothing, the task of FDA time registration is to find a smooth time warping function h(t) that minimizes the difference or distance between test and reference signals. In Ramsay and Silverman (1997, 2005), a general approach to FDA time registration is formulated as finding h(t) by minimizing the cost function
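$$F(h) = \int_0^T \left[ x_{\mathrm{test}}(h(t)) - x_{\mathrm{ref}}(t) \right]^2 dt + \lambda \int_0^T w(t)^2 \, dt \tag{3}$$

where $x_{\mathrm{test}}$ and $x_{\mathrm{ref}}$ denote the smoothed test and reference signals, h(t) is the time warping function to be determined, λ is a smoothing parameter, w(t) is a smoothness control function for h(t), and T is the end point of the time path. One can note that the form of the cost function is the same as that of the FDA smoothing, that is, the least square minimization augmented with a roughness penalty term.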
Since the dimension of h(t) is time, it should be strictly increasing or monotonic and its time derivative should always be positive. Based on these constraints, h(t) can be set to satisfy Eq. (4):
$$\frac{d^2 h(t)}{dt^2} = w(t) \, \frac{dh(t)}{dt} \tag{4}$$

That is, the first time derivative of h(t), not h(t) itself, is modeled as an exponential growth function, and w(t) controls the behavior of h(t) (see footnote 2). For instance, when w(t) is positive, the rate of internal time change of the test signal h(t) is slowed when compared to the physical time [i.e., h(t) > t], and thus the test signal runs “late.” That is, the same landmark occurs later in clock time when compared to the reference signal. It is noted that the square of w(t) is used as the regularization term in Eq. (3), which by Eq. (4) is equivalent to the square of the “relative curvature” of h(t) [i.e., the second time derivative of h(t) scaled by its first derivative].
The general solution of Eq. (4) is obtained by integrating it twice; the solution is
$$h(t) = C_0 + C_1 \int_0^t \exp\left[ \int_0^u w(v) \, dv \right] du \tag{5}$$

where $C_0$ and $C_1$ are determined so that h(0) = 0 and h(T) = 1. $C_0$ represents a linear time shift, and T is the end point of normalized time. T can be set to 1 without loss of generality if the durations of test and reference signals are normalized before time registration. This is the usual practice in the FDA time alignment procedure.
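As a worked check on the step from Eq. (4) to Eq. (5), the two integrations can be written out explicitly. The sketch assumes normalized time (T = 1); the dummy integration variables u and v are notational choices made here, not taken from the source.

$$
\begin{aligned}
\frac{d^2 h}{dt^2} = w(t)\,\frac{dh}{dt}
&\;\Rightarrow\; \frac{d}{dt}\,\ln\frac{dh}{dt} = w(t) \\
&\;\Rightarrow\; \frac{dh}{dt} = C_1 \exp\left[\int_0^t w(v)\,dv\right] \\
&\;\Rightarrow\; h(t) = C_0 + C_1 \int_0^t \exp\left[\int_0^u w(v)\,dv\right] du .
\end{aligned}
$$

Imposing h(0) = 0 gives $C_0 = 0$, and imposing h(1) = 1 gives $C_1 = 1 \big/ \int_0^1 \exp\left[\int_0^u w(v)\,dv\right] du$. Because the exponential is always positive, $C_1 > 0$ and dh/dt > 0 everywhere, so h(t) is strictly increasing for any choice of w(t).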
Now, the task of finding the monotonic time warping function h(t) is reduced to the task of determining w(t). For that purpose, w(t) is expressed as a linear combination of basis functions as in Eq. (2), and h(t) can be determined from w(t) which minimizes the cost function given in Eq. (3).
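To make the construction concrete, the following sketch builds h(t) numerically from a given w(t) on normalized time, using trapezoidal sums for both integrations. The function name `warp_from_w`, the grid size, and the cosine-shaped w(t) are hypothetical; in the study w(t) is a B-spline expansion whose coefficients are found by minimizing Eq. (3), a step not shown here.

```python
import math


def warp_from_w(w, n=200):
    """Return a grid t on [0, 1] and the warp h(t) built from w(t).

    h(t) is proportional to the integral of exp(W(u)), where W(u) is the
    integral of w from 0 to u. Both integrals use the trapezoidal rule, and
    the result is scaled so that h(0) = 0 and h(1) = 1.
    """
    t = [i / n for i in range(n + 1)]
    dt = 1.0 / n
    W = [0.0]
    for i in range(1, n + 1):
        W.append(W[-1] + 0.5 * (w(t[i - 1]) + w(t[i])) * dt)
    rate = [math.exp(v) for v in W]      # proportional to dh/dt, always positive
    h = [0.0]
    for i in range(1, n + 1):
        h.append(h[-1] + 0.5 * (rate[i - 1] + rate[i]) * dt)
    total = h[-1]
    return t, [v / total for v in h]     # dividing by the total fixes C1


# w(t) = 0 gives the identity warp h(t) = t.
t, h_identity = warp_from_w(lambda s: 0.0)

# A hypothetical, non-constant w(t) still yields a strictly increasing warp.
t, h_example = warp_from_w(lambda s: 2.0 * math.cos(2.0 * math.pi * s))
print(h_example[0], h_example[len(h_example) // 2], h_example[-1])
```

Whatever values w(t) takes, the exponential keeps the slope of h(t) positive, which is why the search can be carried out over w(t) without any explicit monotonicity constraint.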
Because our major interest is in timing, we focus on timing differences in landmarks occurring in both test and reference articulator velocity patterns. Therefore, landmark time registration with the aforementioned monotone smoothing method is used in this study in order to take advantage of the clear landmark locations observed in the velocity patterns. Landmark time registration accepts predetermined signal landmark time points as break points, and performs time alignment between two adjacent landmark points by linear shifting and scaling of the basis functions. Twelve B-splines of order 4 and a λ value of 1E-12 are used to represent w(t) in this study. All the computations are based on MATLAB implementations of the FDA smoothing and time registration algorithms publicly available at ftp://ego.psych.mcgill.ca/pub/ramsay/FDAfuns/.
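The idea of landmark registration can be sketched with a deliberately simplified warp that interpolates linearly between matched landmarks. This is a stand-in for the smooth monotone warp described above, not the MATLAB routine used in the study; the function name `landmark_warp` and the landmark times are hypothetical.

```python
from bisect import bisect_right


def landmark_warp(ref_marks, test_marks):
    """Build h(t), mapping reference time to test time on [0, 1].

    ref_marks and test_marks are matched interior landmark times in
    normalized time. The end points 0 and 1 are pinned. Linear interpolation
    between landmarks is a simplification of the smooth monotone warp.
    """
    xs = [0.0] + list(ref_marks) + [1.0]
    ys = [0.0] + list(test_marks) + [1.0]
    if len(xs) != len(ys):
        raise ValueError("landmark lists must have equal length")
    if any(b <= a for a, b in zip(xs, xs[1:])) or any(b <= a for a, b in zip(ys, ys[1:])):
        raise ValueError("landmarks must be strictly increasing inside (0, 1)")

    def h(t):
        # Locate the landmark segment containing t, clamped to a valid index.
        i = min(max(bisect_right(xs, t) - 1, 0), len(xs) - 2)
        frac = (t - xs[i]) / (xs[i + 1] - xs[i])
        return ys[i] + frac * (ys[i + 1] - ys[i])

    return h


ref = [0.2, 0.4, 0.6, 0.8]        # hypothetical reference zero-crossings
test = [0.3, 0.5, 0.68, 0.84]     # hypothetical test zero-crossings (later)
h = landmark_warp(ref, test)
print(h(0.2), h(0.5), h(1.0))
```

Each reference landmark is mapped exactly onto its test counterpart, so h(t) > t wherever the test landmarks occur later than the reference landmarks.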
II. METHOD
A. Speech materials
A subset of the speech materials described in a previous study (Byrd et al., 2004, submitted) was used; the stimuli are given in Table I. The goal was to study rightward phrase boundary effects using sentences with the same phonological string varying in the presence or absence of an intonational phrase boundary.
TABLE I.
Sentence 2 tests the rightward effect: the boundary is after the first consonant, and the consonants to be measured are D D N. Sentence 1 is the control sentence and contains no boundary.
| Effect | Consonants | Sentence |
|---|---|---|
| Control | NDDN | Birdhunting, we were shocked to see a new dodo knocking on wooden posts |
| Rightward effect | N #DDN | At the zoo, we were shocked to see a Gnu. Dodo knocking about, however, would have been more surprising |
The target sequence in each sentence was […nV dV dV nV…]. The Carstens Articulograph (AG200) was used to track a sensor adhered to the tongue tip. Sensors were also tracked on the maxilla and bridge of the nose for head movement correction, and a sample of the occlusal plane of each subject was acquired. Sensor position was sampled at 200 Hz during articulation. After data collection, the tongue-tip sensor position data was corrected for head movement and rotated to the occlusal plane. The tongue tip y (vertical) signal was differentiated in order to derive the tongue-tip movement velocity. The position and velocity data were smoothed before and after differentiation with a ninth-order Butterworth filter of cutoff frequency 15 Hz. Four native speakers of American English participated. Subjects read each sentence 12 times and were instructed to read in a casual, conversational style. Subjects will be referred to as Subject A, Subject D (the second author), Subject E, and Subject J.
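The differentiation step can be sketched as follows. The example uses central differences at the 200 Hz sampling rate stated above, with one-sided differences at the two ends. This is an assumption about the numerical scheme, which the text does not specify, and the Butterworth filtering applied before and after differentiation is omitted. The position values are made up.

```python
def velocity(position, fs=200.0):
    """Approximate the time derivative of a sampled position signal.

    Central differences are used for interior samples and one-sided
    differences at the two ends. `fs` is the sampling rate in Hz.
    """
    n = len(position)
    if n < 2:
        raise ValueError("need at least two samples")
    dt = 1.0 / fs
    v = [0.0] * n
    v[0] = (position[1] - position[0]) / dt
    v[-1] = (position[-1] - position[-2]) / dt
    for i in range(1, n - 1):
        v[i] = (position[i + 1] - position[i - 1]) / (2.0 * dt)
    return v


# Hypothetical vertical position rising at a constant 3 units per second.
position = [3.0 * i / 200.0 for i in range(10)]
v = velocity(position)
print(v[:3])
```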
Because this experiment was designed to investigate tongue tip trajectories for alveolar consonants, we will denote the target sequence for ease of presentation as [D D N]. The control sentence contained the sequence [D D N] with no preceding phrase boundary. To examine the rightward effect of the phrase boundary, the consonants [#D D N] with a preceding intonational phrase boundary (sentence 2) were compared against the same sequence in the no-boundary control utterance. Each target sequence [D D N] (from the onset of /d/ to the closure of /n/) was identified in all sentences. The initial edge was defined as the zero-crossing associated with the peak tongue tip raising movement for the [d] and the final edge as the zero-crossing associated with the peak tongue tip raising movement for the [n]. The velocity signals were then processed for each subject using the FDA time alignment procedure described in the next section (see footnote 3).
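Locating velocity zero-crossings, which serve as interval edges here and as registration landmarks later, can be sketched with linear interpolation between neighboring samples of opposite sign. The function name and the sample values are hypothetical, and the study may have located its landmarks differently.

```python
def zero_crossings(times, values):
    """Return the times at which `values` changes sign.

    Each crossing is placed by linear interpolation between the two nearest
    nonzero samples of opposite sign, so a sample that is exactly zero is
    handled as part of the crossing that passes through it.
    """
    crossings = []
    prev_i = None                      # index of the most recent nonzero sample
    for i, v in enumerate(values):
        if v == 0.0:
            continue
        if prev_i is not None and (values[prev_i] > 0) != (v > 0):
            a, b = values[prev_i], v
            frac = a / (a - b)         # fraction of the way from prev_i to i
            crossings.append(times[prev_i] + frac * (times[i] - times[prev_i]))
        prev_i = i
    return crossings


# Hypothetical velocity samples with two sign changes.
print(zero_crossings([0, 1, 2, 3, 4], [1, -1, -2, 2, 1]))
```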
B. Time alignment procedure
First, a linear time normalization is applied to each individual velocity signal by resampling so that each signal has 200 equally spaced data points (see Fig. 1). Twenty B-splines of order 6 and a λ value of 1E-12 are used for smoothing. A reference signal for each subject is then determined from control signals (those without a phrase boundary before the target string) as follows: Initially, an average of the control signals is computed and used as an initial reference signal for time alignment. After time alignment, an average of time-aligned control signals is computed again and used as a reference. Next, test signals (those having a phrase boundary before the target string) are subject to the landmark time registration with respect to the reference signal, and each time warping function is computed against the reference signal. For landmarks, the internal four zero-crossings of each test signal (see Fig. 2) are used as internal break points. The zero-crossings selected as landmarks correspond to time points where the tongue tip is about to move away from the position extrema. After time alignment, a time deformation function $F_{\mathrm{test}}(t)$ is computed as follows:

$$F_{\mathrm{test}}(t) = h(t) - t \tag{6}$$

which represents a delay [$F_{\mathrm{test}}(t) > 0$] or advance [$F_{\mathrm{test}}(t) < 0$] of the internal clock time of a test signal with respect to the reference.
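The deformation function can be computed on a grid once a warp is available. The sketch below assumes the deformation is the difference between the warp h(t) and clock time t on normalized time, consistent with the sign convention stated above (positive means delay). The warp used here is invented to give an early-peaking, asymmetric shape and is not data from the study.

```python
def deformation(h, n=200):
    """Evaluate F(t) = h(t) - t on an equally spaced grid over [0, 1]."""
    t = [i / n for i in range(n + 1)]
    return t, [h(ti) - ti for ti in t]


def hypothetical_warp(t):
    """A made-up monotone warp with pinned end points and an early delay peak."""
    return t + 0.5 * t * (1.0 - t) ** 3


t, F = deformation(hypothetical_warp)
peak = max(range(len(F)), key=lambda i: F[i])
print("largest delay", F[peak], "at normalized time", t[peak])
```

For this invented warp the delay peaks at one quarter of the interval and then decays slowly, the kind of skew that a pair of duration measurements would not reveal.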
It may be noted that there are several ways to do time alignment between two groups of signals: select one typical control signal as a reference and compare it with all test signals, or compare averaged signals (i.e., averaged control and averaged test signals), or, as in this study, compare test signals to a reference signal that is created by averaging the control signals. We chose the last method because the variability in timing among control signals is much less than that between test and control signals or within the set of test signals (see Fig. 2); therefore, the use of an averaged reference signal can be justified. Further, whereas the global alignment procedure (i.e., time alignment without landmarks) was used to create the reference signal, landmark alignment was chosen to compare test to reference due to the high temporal variability present in the test signals, the relative computational efficiency of the landmark alignment procedure, and the relevance of articulatory events to this study. In Fig. 2, tongue tip velocity profiles for control and test utterances are shown before and after FDA time alignment (subject A).
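The first and last steps of the reference-building procedure, linear time normalization and averaging, can be sketched as below. Linear interpolation is assumed for the resampling, which the text does not specify, and the intermediate alignment of the control signals before re-averaging is left out. The signals are hypothetical.

```python
def resample(signal, n_points=200):
    """Linearly interpolate `signal` onto n_points equally spaced positions
    spanning its full duration (first and last samples are preserved)."""
    m = len(signal)
    if m < 2 or n_points < 2:
        raise ValueError("need at least two input and two output points")
    out = []
    for i in range(n_points):
        pos = i * (m - 1) / (n_points - 1)   # fractional index into `signal`
        lo = min(int(pos), m - 2)
        frac = pos - lo
        out.append(signal[lo] * (1.0 - frac) + signal[lo + 1] * frac)
    return out


def average(signals):
    """Pointwise mean of equally long signals."""
    return [sum(column) / len(signals) for column in zip(*signals)]


# Two hypothetical control tokens of different lengths.
token_a = [float(i) for i in range(137)]
token_b = [2.0 * i for i in range(181)]
reference = average([resample(token_a), resample(token_b)])
print(len(reference), reference[0], reference[-1])
```

In the procedure described above, this average would serve only as the initial reference; the control signals are then aligned to it and averaged again.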
III. RESULTS
In Fig. 3 the resulting time deformation functions of each individual test signal for subjects A, D, E, and J are shown. [Figure 2 (middle panel) shows a comparable deformation function for control utterances only (Subject A).] Because a linear time normalization is done before the time alignment, the resulting time warping or deformation function reflects nonlinear, local timing variations in tongue tip closing and release gestures. It is noted that because end points for this analysis are anchored or “pinned” at the edges of the interval of interest, timing effects at the two end points of the overall interval of interest are not discernible.
FIG. 3.

Time deformations of test signals with respect to the reference for subject A (top left), subject D (top right), subject E (bottom left), and subject J (bottom right). The plots indicate delay or slowing (deformation in the positive direction) of tongue tip articulation in the test condition as compared to the control condition. Note the general asymmetric shapes of deformation patterns. Variability among repetitions is also observed for the degree of slowing, but the slowing shows a similar patterning over time.
One can clearly observe detailed patterns of delay relative to the reference pattern of articulator movement as the speech stream recedes from the phrase boundary (recall that the initial and final end points are fixed and are not informative). Generally these temporal modifications due to the presence of the phrase boundary become reduced as time elapses, i.e., the temporal perturbation is largest close to the boundary and diminishes more remotely (Byrd and Saltzman, 2003) as can be seen by the skew of the time deformation functions and the steeper onset slope than falloff slope of the functions. It is also observed that although there are differences in the amount of time deformation among repetitions, the patterns are fairly similar across repetitions for most subjects. These observations would be difficult to isolate with the conventional landmark-based articulatory timing measurements.
IV. CONCLUDING REMARKS
We conclude from the results of this articulatory kinematic experiment analyzed with FDA time registration that rightward prosodic effects on articulation are greatest locally at the boundary and decrease with distance from the boundary. Such a pattern of delay adjacent to a phrase boundary diminishing with distance from the boundary is predicted within the prosodic π-gesture model of Byrd and Saltzman (2003).
In the past, experiments testing for the effects of linguistic variables on the temporal patterning of articulation have relied on comparing the durations of intervals defined piecewise by kinematic landmarks such as movement edges/extrema and peak velocities. FDA offers a novel alternative that considers entire, continuous kinematic trajectories obtained in various experimental conditions so that the deformation or warping of these trajectories over time can be characterized and compared within subject as well as across subjects.
The FDA smoothing and time registration method can be potentially useful for any applications where data smoothing or time alignment of articulatory trajectories is desirable. In addition, the coefficient set obtained by the FDA smoothing can be used as feature vectors for pattern classification or categorization of time series data including articulatory trajectories. Finally, by integrating the time deformation functions one also can quantify the degree of time deformation, which should be useful to model the strength of boundary types. In the future, we will explore the use of time derivatives of the time deformation function in conjunction with kinematic trajectories for describing articulatory dynamics associated with linguistic conditions.
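Quantifying the degree of deformation by integration can be sketched with the trapezoidal rule. The function name and the parabolic deformation curve are hypothetical. Note that a signed integral lets delays and advances cancel, so an absolute or squared version may be preferable when both occur; that choice is not addressed in the text.

```python
def deformation_area(t, F):
    """Trapezoidal integral of the deformation function F over the grid t."""
    return sum(0.5 * (F[i] + F[i + 1]) * (t[i + 1] - t[i]) for i in range(len(t) - 1))


# Hypothetical deformation F(t) = t * (1 - t) on normalized time; exact area 1/6.
n = 1000
t = [i / n for i in range(n + 1)]
F = [ti * (1.0 - ti) for ti in t]
area = deformation_area(t, F)
print(area)
```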
Acknowledgments
The authors gratefully acknowledge the support of NIH Grant No. DC03172, the assistance of Dr. James Mah and Professor Shri Narayanan, and the comments of a very helpful anonymous reviewer.
Footnotes
1. Because the criteria for an optimal degree of smoothness seem fuzzy and can be subjective and problem dependent, it is unclear whether an automatic way to estimate some essential smoothing parameters (e.g., the degree and number of basis functions and λ) can be optimally formulated within the FDA framework.
2. In general, an exponential growth function g(t) is modeled as dg(t)/dt = rg(t), where r is a growth rate. One more time derivative of the exponential growth model with a time-dependent growth rate will yield the model equation of the time warping function h(t).
3. It is noted that the velocity signal has been chosen for time alignment for two main reasons. First, the velocity pattern has traditionally been used for the analysis of skilled movements because underlying dynamic parameters that describe the motion can be derived from the velocity patterns (cf. Nelson, 1983). Second, a velocity pattern has well-defined landmarks (e.g., extrema and zero crossings), which facilitates the FDA landmark time registration. We confirmed that as long as the same landmark time points are used, the time deformation functions estimated from position are effectively the same as those estimated using the velocity pattern.
Contributor Information
Sungbok Lee, USC Department of Linguistics and USC Viterbi School of Engineering, 3601 Watt Way, GFS 301, Los Angeles, California 90089-1693.
Dani Byrd, USC Department of Linguistics, 3601 Watt Way, GFS 301, Los Angeles, California 90089-1693.
Jelena Krivokapić, USC Department of Linguistics, 3601 Watt Way, GFS 301, Los Angeles, California 90089-1693.
References
- Beckman ME, Edwards J. Intonational categories and the articulatory control of duration. In: Tohkura Y, Vatikiotis-Bateson E, Sagisaka Y, editors. Speech Perception, Production and Linguistics Structure. Ohmsha; Tokyo, Japan: 1992. pp. 359–375.
- Byrd D, Kaun A, Narayanan S, Saltzman E. Phrasal signatures in articulation. In: Broe MB, Pierrehumbert JB, editors. Papers in Laboratory Phonology V. Acquisition and the Lexicon. Cambridge University Press; 2000. pp. 70–87.
- Byrd D, Krivokapić J, Lee S. How far, how long: On the temporal scope of phrase boundary effects. doi: 10.1121/1.2217135. submitted.
- Byrd D, Saltzman E. Intragestural dynamics of multiple phrasal boundaries. J Phonetics. 1998;26:173–199.
- Byrd D, Saltzman E. The elastic phrase: Modeling the dynamics of boundary-adjacent lengthening. J Phonetics. 2003;31:149–180.
- Cho T. Manifestation of prosodic structure in articulation: Evidence from lip movement kinematics in English. In: Goldstein L, editor. Laboratory Phonology 8: Varieties of Phonological Competence. Walter De Gruyter Inc; New York: in press.
- Cho T, Keating P. Articulatory and acoustic studies on domain-initial strengthening in Korean. J Phonetics. 2001;29:155–190.
- de Boor C. A Practical Guide to Splines. Revised ed. Springer-Verlag; New York: 2001.
- Edwards J, Beckman ME, Fletcher J. The articulatory kinematics of final lengthening. J Acoust Soc Am. 1991;89:369–382. doi: 10.1121/1.400674.
- Fougeron C. Articulatory properties of initial segments in several prosodic constituents in French. J Phonetics. 2001;29:109–135.
- Keating P, Cho T, Fougeron C, Hsu C. Domain-initial articulatory strengthening in four languages. In: Local J, Ogden R, Temple R, editors. Phonetic Interpretation (Papers in Laboratory Phonology VI). Cambridge University Press; Cambridge: 2004. pp. 143–161.
- Koenig LK, Lucero JL. The use of functional data analysis to study variability in children’s speech: Further data. J Acoust Soc Am. 2002;111:2478.
- Lucero J, Koenig L. Time normalization of voice signals using functional data analysis. J Acoust Soc Am. 2000;108:1408–1420. doi: 10.1121/1.1289206.
- Lucero JL, Löfqvist A. Measures of articulatory variability in VCV sequence. ARLO. 2005;6:80–84.
- Lucero J, Munhall K, Gracco V, Ramsay J. On the registration of time and the patterning of speech movement. J Speech Lang Hear Res. 1997;40:1111–1117. doi: 10.1044/jslhr.4005.1111.
- Nelson WL. Physical principles for economies of skilled movements. Biol Cybern. 1983;46:135–147. doi: 10.1007/BF00339982.
- Ramsay JO. When the data are functions. Psychometrika. 1982;47:379–396.
- Ramsay JO, Munhall KG, Gracco VL, Ostry DJ. Functional data analysis of lip motion. J Acoust Soc Am. 1996;99:3718–3727. doi: 10.1121/1.414986.
- Ramsay JO, Silverman BW. Functional Data Analysis. Springer-Verlag; New York: 1997.
- Ramsay JO, Silverman BW. Functional Data Analysis. 2nd ed. Springer-Verlag; New York: 2005.
- Sakoe H, Chiba S. Dynamic programming algorithm optimization for spoken word recognition. IEEE Trans Acoust, Speech, Signal Process. 1978;26:43–49.
- Strik H, Boves L. A dynamic programming algorithm for time-aligning and averaging physiological signals related to speech. J Phonetics. 1990;19:367–378.
- Tabain M. Effects of prosodic boundary on /aC/ sequences: articulatory results. J Acoust Soc Am. 2003;113:2834–2849. doi: 10.1121/1.1564013.
- Tabain M, Perrier P. Articulation and acoustics of /i/ at prosodic boundaries in French. J Phonetics. 2005;33:77–100.
- West P. The extent of coarticulation of English liquids: an acoustic and articulatory study. ICPhS99; San Francisco: 1999.
