Abstract
Purpose
Accurate tissue motion tracking within the tongue can help to diagnose and treat vocal-tract-related disorders, evaluate speech quality before and after surgery, and support various scientific studies. We compared tissue tracking results from four widely used deformable registration (DR) methods applied to Cine-MRI with those from harmonic phase (HARP)-based tracking applied to tagged-MRI.
Method
Ten subjects repeated the words “a geese” multiple times while sagittal images of the head were collected at 26 Hz, first as a tagged-MRI data set and then as a Cine-MRI data set. HARP tracked the motion of eight specified tissue points in the tagged data set. Four DR methods (diffeomorphic demons, and free-form deformations based on cubic B-splines with each of three different similarity measures) were used to track the same eight points in the Cine-MRI data set. Individual points were tracked, and length changes of several muscles were calculated, using both the DR-based and the HARP-based tracking methods.
Results
Results showed that the DR tracking errors were non-systematic and varied in direction, amount, and timing across speakers and within speakers. Comparison of HARP and DR tracking with manual tracking showed better tracking results for HARP except at the tongue surface, where mistracking caused greater errors in HARP than DR.
Conclusions
Tissue point tracking using DR methods contains non-systematic errors within and across subjects, making it less successful than tagged-MRI tracking within the tongue. However, HARP sometimes mistracks points at the tongue surface in tagged MRI because of its limited bandpass filter and tag fading, so DR measures surface tissue points on Cine-MRI more successfully than HARP does. Therefore, a hybrid method is being explored.
Keywords: Deformable registration, tongue motion, MRI, muscle length, point tracking
Introduction
Tongue motion is usually measured at the tongue surface. Imaging techniques such as Cine-MRI provide good brightness contrast, allowing extraction and tracking of tongue surface contour motion. Point tracking systems, like electromagnetic articulography (EMA), track motion of pellets affixed to the tongue surface. These two types of data assess tongue surface shape, motion, and position in order to elucidate tongue and vocal tract function. Measurements within the tongue would enhance these data, but are difficult to obtain. The only method currently available is tagged-MRI (Parthasarathy, Prince, Stone, Murano, & Nessaiver, 2007), which is much less available than EMA or Cine-MRI. Therefore, the present work assesses the use of Cine-MRI movies to track internal tissue point motion in the tongue using deformable registration (DR) to see whether DR can track tissue-points from Cine-MRI as accurately as they can be tracked from tagged-MRI.
Tagged-MRI has been used in previous work to track internal tissue motion of the tongue (Parthasarathy, et al., 2007). Since Zerhouni and colleagues (Zerhouni, Parish, Rogers, Yang, & Shapiro, 1988) first developed tagged-MRI to create visible myocardial markers, it has been widely used in cardiac motion tracking (Axel, 2002; Ibrahim el, 2011). It has also been used for other motion estimation tasks, including imaging tongue motion in speech (Parthasarathy, et al., 2007). Tagged-MRI alters the magnetization in planes of tissue, producing image intensity patterns whose motion is then tracked through time (Axel & Dougherty, 1989). More specifically, MRI records the density of hydrogen in tissue, and tagged-MRI temporarily modifies the magnetization of selected tissue planes; tagging can be understood as multiplying the magnetization of the anatomy by a two-dimensional sinusoid (Parthasarathy, et al., 2007). The tags are placed just before the subject begins speaking, and cine-MRI images are captured throughout the speech task. As the tissue moves, the tags move with it, allowing the internal deformation of the tongue, lips and velum to be observed and tracked. Detailed information about soft-tissue deformation makes it possible to compute many other functional measures, such as displacement, velocity, rotation, translation, elongation, strain, and local deformation.
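The description of tagging as a multiplication of the anatomy's magnetization by a two-dimensional sinusoid can be made concrete with a toy model. The Python sketch below assumes a simple SPAMM-style pattern of the form (1 + cos)/2 in each in-plane direction and uses a hypothetical helper, `apply_tags`; it is not the MICSR acquisition used in this study, only an illustration of what the multiplication does to image intensity.

```python
import numpy as np

def apply_tags(anatomy, spacing_px):
    """Multiply an image by a separable 2-D sinusoidal tag pattern.

    Simplified (assumed) model: 0.5 * (1 + cos) in x times 0.5 * (1 + cos) in y,
    so intensity is kept at tag peaks and nulled at tag troughs.
    """
    rows, cols = anatomy.shape
    y, x = np.mgrid[0:rows, 0:cols]
    k = 2 * np.pi / spacing_px  # spatial frequency of the tags (radians per pixel)
    pattern = 0.25 * (1 + np.cos(k * x)) * (1 + np.cos(k * y))
    return anatomy * pattern

# A 6 mm tag separation at 1.875 mm per pixel gives 3.2 pixels per tag period.
tagged = apply_tags(np.ones((64, 64)), spacing_px=6.0 / 1.875)
```

Because the pattern multiplies the anatomy, any deformation of the tissue carries the pattern with it, which is what makes the tags trackable.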
Several tagging methods have been proposed, including spatial modulation of magnetization (SPAMM) (Axel & Dougherty, 1989), complementary SPAMM (CSPAMM) (Fischer, McKinnon, Maier, & Boesiger, 1993), and delay alternating with nutation for tailored excitation (DANTE) (Morris & Freeman, 2011). However, although these methods offer different spatial tag patterns, quantifying the tissue motion itself still requires specialized postprocessing. Several methods to process tagged images have been proposed, including FindTags (Guttman, Prince, & McVeigh, 1994), harmonic phase (HARP) (N. F. Osman, Kerwin, McVeigh, & Prince, 1999; N. F. Osman, McVeigh, & Prince, 2000), Gabor filters (Chen, Wang, Chung, Metaxas, & Axel, 2010), and SinMod (Arts et al., 2010). In addition, both displacement encoding with stimulated echoes (DENSE) (Aletras, Ding, Balaban, & Wen, 1999) and strain encoded imaging (SENC) (N.F. Osman, Sampath, Atalar, & Prince, 2001) use stimulated echoes to encode displacement and strain, respectively, more directly in the recovered signals. For an in-depth review of tagged-MRI and stimulated echo techniques and applications, we refer the reader to (Axel, Montillo, & Kim, 2005; Ibrahim el, 2011).
Cine-MRI is a fast imaging technique that does not use tags. It is frequently used in speech analysis because it allows non-invasive observation and measurement of tongue motion (Narayanan, Nayak, Lee, Sethy, & Byrd, 2004; Stone et al., 2001; Story, 2009; Winkler, Fuchs, Perrier, & Tiede, 2011). Cine-MRI might be well suited to studies of tongue motion if it could relate tongue muscle shortening to tongue surface deformation. However, Cine-MRI is weak at such mapping because it neither images internal muscle shortening nor tracks the tissue-point motion needed to measure changes in the positions of muscle origins and insertions. Despite these limitations, Cine-MRI could supplement tagged-MRI, because tagged-MRI is less successful at surface tracking than at internal tracking. This study is therefore motivated by the idea that Cine-MRI could provide better tissue tracking at the tongue surface than tagged-MRI. A recent study (J. Woo, Stone, & Prince, 2011) fused Cine-MRI and high-resolution MRI in order to impose muscle anatomy from high-resolution MRI onto the time-frames of Cine-MRI. In the future, rapid development of MRI technology could also allow high-quality Cine-MRI to display textures within the tongue that reflect tissue characteristics, such as muscle bundles, fascia, and tendons. Textural information in Cine-MRI tongue images might allow tracking of tissue points within muscles or at their origins and insertions. In the present study, however, we used currently available Cine-MRI images, which are representative of the current state of the art.
In this work, we applied four widely used DR methods to track tissue points from Cine-MRI and compared their performance with HARP tracking from tagged-MRI acquired in the same spatio-temporal coordinates as the Cine-MRI. In general, DR aims to find correspondences between source and target images through a nonlinear transformation of the coordinate system. By registering successive time frames, it is possible to track the motion of tissue points. The four intensity-based DR methods were (1) diffeomorphic demons (Vercauteren, Pennec, Perchant, & Ayache, 2009) and free-form deformations based on parametric cubic B-splines (Rueckert et al., 1999) with three different similarity measures: (2) mutual information (MI), (3) sum of squared differences (SSD), and (4) normalized cross correlation (NCC). These methods were chosen because they are state-of-the-art and because their different similarity measures and transformation models affect tracking performance differently. Each similarity measure has distinct characteristics. MI is the most popular similarity measure for multimodal image registration (Pluim, Maintz, & Viergever, 2003). SSD and NCC are used when two images are acquired similarly and have a similar intensity range (J. Woo et al., 2010). In particular, NCC is well suited when the intensity distributions of the source and target images have an affine relationship (Hermosillo, Chefd’Hotel, & Faugeras, 2002). Diffeomorphic demons, like SSD, assumes that corresponding pixels have the same intensity values, but it uses spatial derivatives to form putative directions in which to warp the underlying coordinate system. Both transformation models are considered highly deformable (so-called free-form), but the B-spline model is parametric, whereas diffeomorphic demons uses a non-parametric model that enforces a diffeomorphism.
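For reference, the three similarity measures named above can each be written in a few lines. The Python sketch below uses textbook definitions; the function names `ssd`, `ncc`, and `mutual_information` and the 32-bin joint histogram are illustrative assumptions. ITK's metric implementations differ in details such as Parzen-window histogram estimation and sign conventions, so this is not the code used in the study.

```python
import numpy as np

def ssd(a, b):
    """Sum of squared differences (lower means more similar)."""
    d = a.astype(float) - b.astype(float)
    return float(np.sum(d * d))

def ncc(a, b):
    """Normalized cross correlation in [-1, 1] (higher means more similar)."""
    a = a.astype(float).ravel() - a.mean()
    b = b.astype(float).ravel() - b.mean()
    denom = np.sqrt(np.sum(a * a) * np.sum(b * b))
    return float(np.sum(a * b) / denom) if denom > 0 else 0.0

def mutual_information(a, b, bins=32):
    """Mutual information estimated from a joint histogram (higher is better)."""
    joint, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)  # marginal distribution of a
    py = pxy.sum(axis=0, keepdims=True)  # marginal distribution of b
    nz = pxy > 0                         # skip empty bins (0 * log 0 = 0)
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px * py)[nz])))

rng = np.random.default_rng(0)
frame = rng.random((32, 32))
brighter = 2.0 * frame + 10.0          # affine intensity change, same "anatomy"
print(ssd(frame, brighter) > 0)        # SSD penalizes the intensity change
print(round(ncc(frame, brighter), 6))  # NCC is unaffected by it
```

The usage lines illustrate the point made above: an affine intensity change leaves NCC at 1 and keeps MI high, while SSD penalizes it, which is why SSD-type methods need intensity normalization first.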
Previous studies have tracked the motion of the heart using Cine-MRI (Chandrashekara, Mohiaddin, & Rueckert, 2005; Gupta & Prince, 1995). Chandrashekara and colleagues (Chandrashekara, et al., 2005) in particular used DR methods and found that cine measurements were well correlated with tagged measurements; however, Cine-MRI did not capture cardiac twisting well. Tongue motion differs from heart motion in that the tongue is more deformable and its motions are faster and more varied, which could diminish the quality of DR tracking. On the other hand, the tongue is less homogeneous than the heart, so DR tracking might be able to exploit its tissue heterogeneity.
In this work, midsagittal tissue points were specified to bound intermediate segments of the superior longitudinal (SL) muscle and to mark the origins and insertions of the genioglossus (GG) muscles. Muscle length changes during speech were determined and validated against tagged-MRI data for the same tasks to decide whether traditional Cine-MRI movies could provide reasonably accurate measurements of tissue point motion and muscle length change. To make these comparisons, a Cine-MRI and a tagged-MRI dataset were collected in a single recording session using identical spatial and temporal parameters. The tagged-MRI data, analyzed using HARP, were used to assess the errors of the four DR estimates of tissue-point location and muscle length. Hand-tracked tissue points were used to compare tissue-point tracks from HARP and DR.
Methods
Subjects and Speech Task
The subjects were 10 normal native speakers of American English between the ages of 22 and 52, six females and four males. The speech task was “a geese.” This task starts with a schwa, which positions the tongue in the center of the vowel space. The schwa is followed by upward tongue body motion (/g/), forward body motion (/i/), and backing of the tongue body (/s/), while the tip and blade remain elevated. The jaw is minimally engaged, so vocal tract shaping depends heavily on tongue deformation. Finally, the task can be repeated in one second, which is our MRI record time.
MRI Instrumentation and Data Collection
All MRI scanning was performed on a Siemens 3.0 T Tim Trio system (Siemens Medical Solutions, Malvern, PA) with a 16-channel head and neck coil. Two midsagittal MRI datasets were collected from each speaker in the same session: a midsagittal Cine-MRI movie (cine) and a midsagittal tagged-MRI movie (tagged). The tagged-MRI dataset was collected using Magnitude Imaged CSPAMM Reconstructed (MICSR) images (NessAiver & Prince, 2003; Parthasarathy, et al., 2007). Both datasets had identical parameters, including slice location and field of view: a 1-second record duration, 26 time-frames (TFs) per second, 6 mm slice thickness (also the tag separation in the tagged data), no gap between slices, and 1.875 mm × 1.875 mm in-plane resolution. Seven sagittal slices were acquired. Other sequence parameters were TR 36 ms, TE 1.47 ms, flip angle 6°, and turbo factor 11.
Both MRI methods produce a single ‘movie’ for each slice by acquiring and summing multiple repetitions of the speech task. MICSR requires 3 repetitions per tissue slice acquired 4 times (12 repetitions), and the Cine-MRI algorithm used here requires 5 repetitions per tissue slice. The midsagittal data were extracted from a sagittal ‘stack’ of 5, 7, or 9 slices, depending on the subject’s tongue size; this study focuses on the midsagittal slice to analyze the motion. Speaker precision across repetitions was optimized by training the subject to speak to a metronome beat, which was also used in the scanner. Data from misaligned repetitions were discarded. The training method was based on the work of Masaki and colleagues (Masaki et al., 1999). Acoustic recordings of speech were made in the MRI scanner with a subtraction-type fiber optic microphone (Optoacoustics, Ltd., Israel) to corroborate phoneme locations in the MRI image sequences (Boersma & Weenink, 2010).
Tissue Point Selection and Tracking
During analysis, 8 tissue points were selected manually within the tongue for each subject in the first time-frame (TF-1) of the cine movie (see Figure 1). Points 1-5 represent locations along the superior longitudinal (SL) muscle at the tongue (1) tip, (2) blade, (3) dorsum, (4) pharynx, and (5) root; points 6-8 represent the origins of the (6) anterior (GGa), (7) medial (GGm), and (8) posterior (GGp) genioglossus.
Figure 1.

Eight points and their affiliated muscles. Points 1-5 define segments within the SL muscle. The rest indicate user-defined origins and insertions of fibers in the GGa (2-6), GGm (4-7), and GGp (5-8) muscles. The red square defines the region of interest used in the analyses.
Deformable registration tracks, or registers, each pixel in TF-1 of the Cine-MRI sequence with its closest match, based on image features, in each of the remaining 25 time-frames. To track the motion of the tongue, we need the coordinate-system transformation describing how each point moves over time. To find this transformation, we used either (1) diffeomorphic demons (Vercauteren, et al., 2009) or free-form deformations based on cubic B-splines (Rueckert, et al., 1999) with similarity measures (2) MI, (3) NCC, or (4) SSD. These algorithms are available in the ITK (Insight Segmentation and Registration Toolkit) library (Ibanez, Schroeder, Ng, & Cates, 2003). SSD and diffeomorphic demons assume that corresponding pixels have the same intensity values, so histogram matching, which normalizes the intensity values of a source frame based on those of the target frame, was applied before these registrations. MI and NCC can compensate for intensity differences, so histogram matching was not used for them. All steps, including preprocessing and the registration algorithms themselves, were fully automated.
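Histogram matching as described above can be sketched as cumulative-distribution (quantile) mapping. This is a minimal, assumed Python version with a hypothetical name, `match_histogram`; ITK provides `HistogramMatchingImageFilter`, which matches a user-chosen number of quantile points rather than the full distribution, and the text does not say which settings the study used.

```python
import numpy as np

def match_histogram(source, reference):
    """Remap source intensities so their distribution matches the reference.

    Each distinct source value is sent to the reference value at the same
    cumulative-probability level (full quantile mapping, an assumption here).
    """
    src = source.ravel()
    s_values, s_idx, s_counts = np.unique(src, return_inverse=True, return_counts=True)
    r_values, r_counts = np.unique(reference.ravel(), return_counts=True)
    s_cdf = np.cumsum(s_counts) / src.size        # empirical CDF of the source
    r_cdf = np.cumsum(r_counts) / reference.size  # empirical CDF of the reference
    matched = np.interp(s_cdf, r_cdf, r_values)   # invert the reference CDF
    return matched[s_idx].reshape(source.shape)

rng = np.random.default_rng(0)
frame_t = rng.random((16, 16))
frame_1 = 3.0 * frame_t + 5.0  # same anatomy, different intensity scale
normalized = match_histogram(frame_t, frame_1)
```

After matching, the two frames share an intensity distribution, which is the precondition SSD and demons rely on.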
Let $\mathrm{TF}(\mathbf{x},t): \Omega \times \mathbb{R}^{+} \to \mathbb{R}^{+}$ denote the space-time acquisition of the frames, where $\Omega \subset \mathbb{R}^{2}$ is an open and bounded domain and $t$ denotes the time frame (i.e., $t = 1,\dots,26$). For brevity, the first time frame, $\mathrm{TF}(\mathbf{x},1): \Omega \to \mathbb{R}^{+}$, is denoted TF-1. After DR, the locations of the 8 tissue points chosen in TF-1 were tracked in the other 25 time-frames using all four methods. Linear interpolation was used to evaluate values at locations between image grid points.
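The point-tracking step can be illustrated with a short sketch. It assumes (our assumption, not stated in the text) that each registration yields dense displacement fields `disp_x`, `disp_y` on the TF-1 pixel grid, so that a TF-1 point $\mathbf{x}$ lands at $\mathbf{x} + \mathbf{u}(\mathbf{x})$ in the target frame, with $\mathbf{u}$ evaluated by bilinear interpolation. The helper names are hypothetical.

```python
import numpy as np

def bilinear(field, x, y):
    """Bilinearly interpolate a 2-D array at a continuous (x=column, y=row) location.

    Assumes 0 <= x <= cols - 1 and 0 <= y <= rows - 1.
    """
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    x1 = min(x0 + 1, field.shape[1] - 1)
    y1 = min(y0 + 1, field.shape[0] - 1)
    fx, fy = x - x0, y - y0
    top = (1 - fx) * field[y0, x0] + fx * field[y0, x1]
    bottom = (1 - fx) * field[y1, x0] + fx * field[y1, x1]
    return (1 - fy) * top + fy * bottom

def track_point(point, disp_x, disp_y):
    """Map a TF-1 point (x, y) into the target frame as x + u(x), in pixels."""
    x, y = point
    return (x + bilinear(disp_x, x, y), y + bilinear(disp_y, x, y))

# Toy field: horizontal displacement grows by 0.5 pixel per column, none vertically.
disp_x = np.tile(0.5 * np.arange(10.0), (10, 1))
disp_y = np.zeros((10, 10))
new_x, new_y = track_point((2.5, 3.0), disp_x, disp_y)
new_x_mm = new_x * 1.875  # convert with the 1.875 mm in-plane pixel size
```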
For all four registration methods, we considered sequential versus individual tracking. Individual tracking takes a single tissue point in TF-1 and independently deforms it to the optimal location in the other frames: 1→2, 1→3, 1→4, …, 1→26. Therefore any one transformation is unaffected by the others. Sequential tracking deforms the single tissue point from TF-1 to TF-2, then from 2→3, 3→4, …, etc., so that the path of that point is followed. Sequential tracking has the potential to propagate errors made in early frames to registration of later frames.
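The difference between the two schemes, and why sequential tracking tends to accumulate error, can be shown with a toy simulation. In the sketch below, `register(s, t, p)` is a hypothetical stand-in for any DR method that maps a point in frame s to frame t; the toy version returns the true motion plus independent Gaussian error, which is a simplifying assumption (real registration errors are neither independent nor Gaussian).

```python
import numpy as np

N_FRAMES = 26

def individual_track(p0, register):
    """Individual tracking: register TF-1 directly to each frame (1->2, 1->3, ...)."""
    return [p0] + [register(1, t, p0) for t in range(2, N_FRAMES + 1)]

def sequential_track(p0, register):
    """Sequential tracking: chain 1->2, 2->3, ..., starting each step from the last estimate."""
    path = [p0]
    for t in range(2, N_FRAMES + 1):
        path.append(register(t - 1, t, path[-1]))
    return path

def make_noisy_register(truth, sigma, rng):
    """Toy DR stand-in (assumption): true frame-to-frame motion plus random error."""
    def register(s, t, p):
        return p + (truth[t] - truth[s]) + rng.normal(0.0, sigma, size=2)
    return register

# Toy ground truth: a point moving 10 pixels in x over frames 1..26 (index 0 unused).
truth = np.zeros((N_FRAMES + 1, 2))
truth[1:, 0] = np.linspace(0.0, 10.0, N_FRAMES)
register = make_noisy_register(truth, sigma=0.5, rng=np.random.default_rng(1))

final_ind = [np.linalg.norm(individual_track(truth[1], register)[-1] - truth[-1]) for _ in range(300)]
final_seq = [np.linalg.norm(sequential_track(truth[1], register)[-1] - truth[-1]) for _ in range(300)]
print(np.mean(final_ind), np.mean(final_seq))
```

Under these assumptions the last-frame error of individual tracking stays at the single-registration level, while sequential tracking sums 25 errors, so its spread grows roughly with the square root of the number of steps.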
The tagged-MRI dataset, which provides the best available measurement of internal tissue motion, was used as the reference for calculating the error in the DR estimates of point location and muscle length. Figure 2 shows a tagged MRI image with the motion paths of the 8 tissue points for subject 7. The black and white grid depicts the intersections of horizontal and vertical tagged regions; during motion, it helps visualize local tag deformation in the tongue. The image shows time-frame 1, before any motion has occurred, so the grid is undeformed and consists of squares. The tissue point numbers identify the same tissue points as in Figure 1.
Figure 2.
A tagged MRI image illustrating the motion-path of 8 tissue points for subject 7. The back of the tongue moves faster and farther than the front. Colors indicate time, with yellow occurring earlier than red. Please note that the black and white grid is not an artifact, it is the tag grid filled in with black and white squares.
The MICSR images were analyzed using the HARP method, which tracks tags by following changes in harmonic phase over time (NessAiver & Prince, 2003). More specifically, tagged images have two harmonic peaks in the frequency domain (Xing et al., 2013). A bandpass filter is used to isolate a spectral peak, which reduces the resolution of the reconstructed motion field and causes blurring. HARP uses the harmonic phase of the horizontally and vertically tagged images to track every point through time, yielding a dense 2D motion field. In addition, refinement methods were used to address erroneous tracking due to large tongue motion (Liu & Prince, 2010) and tag jumping (Liu, Murano, Stone, & Prince, 2007; Liu & Prince, 2010). HARP has provided fast, accurate assessment of myocardial strain (Garot, Bluemke, Osman, Rochitte, McVeigh, et al., 2000) and regional function (Garot, Bluemke, Osman, Rochitte, Zerhouni, et al., 2000) of the heart from tagged MRI, and it is widely recognized as a highly accurate tool for tagged-MRI analysis, serving as a reference for validating other measurements of strain (Cho, Chan, Leano, Strudwick, & Marwick, 2006) and cardiac torsional deformation (Notomi et al., 2005). Therefore, it is used here as the reference for assessing the accuracy of the DR tracking methods.
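The phase-extraction step that HARP relies on can be sketched as follows. This minimal Python version is an illustration under stated assumptions, not the HARP software: it isolates one harmonic peak of a synthetic SPAMM-like image with a circular bandpass (the name `harmonic_phase` and the `k_cols`/`radius` parameters are hypothetical) and returns the wrapped phase. HARP's tracking step, which follows points whose harmonic phase stays constant over time, is not shown.

```python
import numpy as np

def harmonic_phase(image, k_cols, radius):
    """Wrapped harmonic phase for tags that vary along the column (x) direction.

    k_cols: tag frequency in cycles per image width (assumed known here).
    radius: radius of a circular bandpass centered on that spectral peak.
    """
    rows, cols = image.shape
    spectrum = np.fft.fft2(image)
    fy = np.fft.fftfreq(rows) * rows      # integer frequency index along rows
    fx = np.fft.fftfreq(cols) * cols      # integer frequency index along columns
    FX, FY = np.meshgrid(fx, fy)          # both shaped (rows, cols)
    bandpass = (FX - k_cols) ** 2 + FY ** 2 <= radius ** 2
    harmonic = np.fft.ifft2(spectrum * bandpass)  # complex harmonic image
    return np.angle(harmonic)             # phase wrapped to [-pi, pi]

# Synthetic tagged image: 8 tag periods across a 64-pixel field, no anatomy.
rows = cols = 64
x = np.tile(np.arange(cols), (rows, 1))
image = 1.0 + np.cos(2 * np.pi * 8 * x / cols)
phase = harmonic_phase(image, k_cols=8, radius=3)
```

On real data the anatomy multiplies the tag pattern and broadens each peak, so the bandpass radius trades spatial resolution against contamination from neighboring peaks, which is the blurring effect described above.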
Because HARP tracks less accurately at the tongue edges, the 8 points were selected slightly below the surface in TF-1 of the cine-MR images. They were then superimposed on TF-1 of the tagged images; recall that, absent head motion, the tissue points should be the same in both data sets. The locations of these tissue points were then tracked by HARP through the 26 TFs (cf. Parthasarathy, et al., 2007). Examining their motion in the tagged datasets allowed possible tag jumping or mistracking to be identified. If a mistracking or tag-jumping error occurred in the tagged data, a neighboring point was selected in the tagged dataset and tracked instead. Once a well-tracked point was identified, it was also used in the cine data set. As far as possible, the 8 points were located at or near the two ends of the muscle sections identified above.
Evaluation
In this section, we describe a series of experiments to assess and quantitatively compare the tracking performance of HARP and DR. These include tracking individual points, measuring muscle length change, and comparing HARP and DR tracking with manual tracking.
Tracking Motion of Specific Tissue Points using DR and HARP
To test the accuracy of DR relative to HARP, the motion of the 8 tissue points was tracked. Error is defined as the absolute difference (distance) between the point location tracked by HARP and that tracked by the DR method under consideration. We report global statistics on the tracking errors in all 10 subjects and give special scrutiny to the two subjects whose tracking errors were the best and the worst.
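The error definition can be written out as a short computation. The sketch below assumes the error is the per-frame Euclidean distance between the DR and HARP positions, converted with the 1.875 mm pixel size, and that each Table 1 entry is that distance averaged over the 26 frames; the helper names are hypothetical. Feeding subject 7's per-point NCC means from Table 1b through `summarize` gives a mean of about 1.7 mm and a sample SD of about 0.6 mm, matching that column, which suggests (but does not confirm) that the SD row is a sample SD over the 8 points.

```python
import numpy as np

PIXEL_MM = 1.875  # in-plane pixel size stated in the Methods

def tracking_error_mm(dr_path, harp_path, pixel_mm=PIXEL_MM):
    """Per-frame Euclidean distance (mm) between two (T, 2) point paths given in pixels."""
    diff = np.asarray(dr_path, dtype=float) - np.asarray(harp_path, dtype=float)
    return np.linalg.norm(diff, axis=1) * pixel_mm

def summarize(errors_by_point):
    """Average each point's errors over frames, then mean and sample SD across points."""
    per_point = np.array([np.mean(e) for e in errors_by_point])
    return per_point, float(per_point.mean()), float(per_point.std(ddof=1))

# Subject 7, NCC (Table 1b): per-point means, repeated over 26 frames for illustration.
subject7 = [1.3, 2.2, 2.2, 2.2, 2.0, 0.8, 1.1, 1.8]
_, mean_mm, sd_mm = summarize([np.full(26, v) for v in subject7])
```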
Error in Muscle Length Change using DR and HARP
Muscle length was calculated as the Euclidean distance between tracked points. The distances between these points define 4 segments of the superior longitudinal (SL) muscle [1-2, 2-3, 3-4, 4-5] and 3 regions of the genioglossus: GGa [2-6], GGm [4-7], and GGp [5-8] (see Figure 1). Points 6 and 7 were above the tendinous origin of GG, and point 8 was below it. Lengths were computed using both DR-based and HARP-based tracking.
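The segment definitions above translate directly into code. In this sketch the point pairs come from the text, while the array layout `(T, 8, 2)` in millimetres and the helper names are assumptions. The text does not say exactly how the percentage in Figure 5(b) is normalized, so `percent_of_reference` (DR length divided by HARP length, frame by frame) is only one plausible reading.

```python
import numpy as np

# (start, end) point numbers from Figure 1 for each measured segment.
SEGMENTS = {
    "SL 1-2": (1, 2), "SL 2-3": (2, 3), "SL 3-4": (3, 4), "SL 4-5": (4, 5),
    "GGa": (2, 6), "GGm": (4, 7), "GGp": (5, 8),
}

def segment_lengths(points_mm):
    """points_mm: (T, 8, 2) tracked positions in mm -> {segment: (T,) lengths in mm}."""
    p = np.asarray(points_mm, dtype=float)
    return {name: np.linalg.norm(p[:, a - 1] - p[:, b - 1], axis=1)
            for name, (a, b) in SEGMENTS.items()}

def percent_of_reference(length, reference_length):
    """Length as a percentage of a reference curve (e.g., HARP), frame by frame."""
    return 100.0 * np.asarray(length, dtype=float) / np.asarray(reference_length, dtype=float)

# Two frames in which point 8 sits 5 mm from point 5 (all other points at the origin).
pts = np.zeros((2, 8, 2))
pts[:, 7] = [3.0, 4.0]
lengths = segment_lengths(pts)
```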
Comparison Between Tissue Point Tracks using DR, HARP, and Manual Tracking
A test was performed to determine whether it might be advantageous to combine DR and HARP in tracking tissue points. The tongue is surrounded by air, and tissue points at the tongue surface are often poorly tracked by HARP (Parthasarathy, et al., 2007). Deformable registration, on the other hand, excels at matching edges. Thus, combining the methods might provide excellent tissue point tracking both on the surface and within the tongue. To test the idea of using DR at the edges and HARP within the tongue, three points were selected from the subject with the best-tracked points (subject 7) to give HARP and NCC the best chance of tracking accurately. These points were also tracked by hand to compare the experimenter’s judgment with the NCC and HARP results. To make manual tracking feasible, the points were chosen at intersecting gridlines in the MICSR images. The first (point A) was chosen at the gridline intersection closest to point 4, as superficial as the intersecting tag lines would allow. The other two (points B and C) were chosen at the next two closest tag-line intersections, successively deeper in the tongue. The measured distances were 6 mm between points A and B and 8 mm between points B and C. The second author tracked the three points manually to determine their position in each time frame, and these measurements were used to validate both the HARP and the NCC point-tracking trajectories.
Results
Tracking Motion of Specific Tissue Points using DR and HARP
Table 1 presents the tracking errors of the four tracking methods. NCC performed best by a small margin, so this paper presents the results of the NCC method. In addition, because MI is the most commonly used similarity measure in DR, the MI results are also presented. The individual tracking method produced less error than sequential tracking for all subjects and methods; therefore, only the individual tracking results are presented.
Table 1.
a. Error for tissue points tracked with mutual information (MI), each averaged over 26 time-frames (unit: mm).

| Point / Subject | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 3.9 | 3.1 | 7.7 | 6.4 | 3.5 | 6.3 | 1.1 | 4.1 | 4.3 | 4.3 |
| 2 | 6.8 | 4.2 | 3.7 | 3.2 | 7.2 | 4.3 | 3.8 | 3.3 | 8.1 | 5.9 |
| 3 | 7.0 | 5.3 | 6.5 | 4.3 | 7.4 | 4.5 | 2.5 | 7.1 | 3.6 | 6.1 |
| 4 | 2.5 | 3.0 | 4.0 | 2.3 | 3.5 | 2.5 | 2.8 | 5.0 | 5.2 | 4.8 |
| 5 | 2.0 | 1.9 | 2.7 | 2.9 | 2.4 | 2.0 | 2.7 | 4.5 | 4.5 | 6.8 |
| 6 | 1.6 | 3.8 | 2.5 | 3.0 | 1.3 | 4.3 | 1.3 | 1.8 | 4.7 | 1.9 |
| 7 | 3.4 | 5.7 | 4.1 | 2.9 | 2.6 | 6.2 | 1.7 | 2.0 | 5.6 | 2.0 |
| 8 | 2.4 | 5.5 | 4.0 | 2.5 | 2.4 | 4.6 | 1.9 | 1.9 | 4.0 | 2.0 |
| Mean | 3.7 | 4.1 | 4.4 | 3.4 | 3.8 | 4.3 | 2.2 | 3.7 | 5.0 | 4.2 |
| SD | 2.1 | 1.4 | 1.8 | 1.3 | 2.3 | 1.5 | 0.9 | 1.9 | 1.4 | 2.0 |

b. Error for tissue points tracked with normalized cross correlation (NCC), each averaged over 26 time-frames (unit: mm).

| Point / Subject | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 2.7 | 3.4 | 7.6 | 5.5 | 2.4 | 6.7 | 1.3 | 3.1 | 2.8 | 3.9 |
| 2 | 3.7 | 3.3 | 3.3 | 3.8 | 5.2 | 4.7 | 2.2 | 2.7 | 4.1 | 3.1 |
| 3 | 4.7 | 2.6 | 2.7 | 3.9 | 4.2 | 2.6 | 2.2 | 6.5 | 2.7 | 3.2 |
| 4 | 4.5 | 2.9 | 2.8 | 2.1 | 2.7 | 2.4 | 2.2 | 5.1 | 3.7 | 3.6 |
| 5 | 2.7 | 2.1 | 2.1 | 2.4 | 1.6 | 2.0 | 2.0 | 4.1 | 3.1 | 4.5 |
| 6 | 2.9 | 3.8 | 2.6 | 3.3 | 2.1 | 3.2 | 0.8 | 1.8 | 3.0 | 2.6 |
| 7 | 2.8 | 5.2 | 2.2 | 2.7 | 2.1 | 3.5 | 1.1 | 2.0 | 3.7 | 2.6 |
| 8 | 1.8 | 5.0 | 2.4 | 1.9 | 1.6 | 1.6 | 1.8 | 1.6 | 2.9 | 2.2 |
| Mean | 3.2 | 3.5 | 3.2 | 3.2 | 2.7 | 3.3 | 1.7 | 3.4 | 3.2 | 3.2 |
| SD | 1.0 | 1.1 | 1.8 | 1.2 | 1.3 | 1.7 | 0.6 | 1.7 | 0.5 | 0.8 |

c. Error for tissue points tracked with the sum of squared differences (SSD), each averaged over 26 time-frames (unit: mm).

| Point / Subject | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 2.7 | 3.6 | 7.4 | 6.0 | 2.6 | 6.6 | 1.5 | 2.8 | 4.1 | 5.0 |
| 2 | 6.0 | 5.1 | 4.0 | 5.0 | 7.8 | 4.6 | 3.0 | 3.8 | 7.0 | 4.9 |
| 3 | 4.9 | 2.4 | 2.8 | 3.8 | 4.4 | 3.5 | 2.0 | 7.8 | 3.8 | 3.7 |
| 4 | 5.0 | 3.1 | 3.7 | 2.0 | 3.2 | 1.7 | 2.7 | 5.8 | 4.7 | 4.1 |
| 5 | 2.2 | 2.2 | 2.5 | 2.9 | 2.3 | 2.0 | 1.9 | 4.1 | 5.4 | 5.1 |
| 6 | 2.9 | 5.1 | 2.3 | 4.7 | 2.8 | 4.0 | 1.2 | 1.8 | 4.8 | 2.5 |
| 7 | 3.0 | 7.7 | 2.7 | 4.2 | 4.0 | 6.2 | 1.2 | 2.1 | 6.8 | 2.1 |
| 8 | 2.7 | 7.3 | 3.1 | 3.0 | 3.4 | 2.5 | 3.3 | 1.3 | 5.0 | 2.0 |
| Mean | 3.7 | 4.6 | 3.6 | 4.0 | 3.8 | 3.9 | 2.1 | 3.7 | 5.2 | 3.7 |
| SD | 1.9 | 2.3 | 1.8 | 2.2 | 2.0 | 1.8 | 1.3 | 1.8 | 2.5 | 1.6 |

d. Error for tissue points tracked with diffeomorphic demons, each averaged over 26 time-frames (unit: mm).

| Point / Subject | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 3.6 | 2.1 | 7.7 | 5.4 | 0.8 | 6.5 | 1.1 | 4.5 | 3.3 | 3.6 |
| 2 | 4.8 | 2.4 | 2.6 | 3.2 | 1.7 | 4.9 | 2.0 | 3.3 | 4.3 | 4.4 |
| 3 | 6.4 | 3.3 | 3.7 | 2.7 | 2.8 | 4.0 | 1.3 | 4.9 | 5.1 | 4.6 |
| 4 | 5.0 | 3.2 | 2.6 | 1.0 | 1.9 | 2.6 | 1.5 | 3.8 | 6.9 | 4.7 |
| 5 | 2.8 | 2.5 | 2.2 | 2.0 | 1.3 | 1.4 | 2.4 | 3.6 | 7.6 | 4.5 |
| 6 | 2.3 | 2.0 | 1.9 | 3.9 | 1.4 | 4.4 | 0.6 | 1.1 | 3.8 | 2.3 |
| 7 | 2.9 | 2.9 | 1.7 | 2.5 | 2.4 | 4.3 | 1.1 | 1.6 | 6.2 | 1.4 |
| 8 | 1.8 | 2.8 | 2.2 | 2.5 | 2.2 | 4.2 | 1.1 | 1.3 | 5.1 | 1.9 |
| Mean | 3.7 | 2.6 | 3.1 | 2.9 | 1.8 | 4.1 | 1.4 | 3.0 | 5.3 | 3.4 |
| SD | 1.7 | 1.4 | 1.4 | 1.4 | 0.9 | 1.7 | 0.9 | 1.4 | 2.5 | 1.3 |
Figure 3 displays the tissue point trajectories of two subjects, tracked by HARP, MI, and NCC. The circles indicate the location of each tissue point in the first (red) and last (blue) time-frame of “a geese.” The green line shows the trajectory through the 26 TFs. Subject 7, the best-tracked subject, had mean errors of 2.2 mm and 1.7 mm for MI and NCC tracking, respectively. Subject 3 had large mean errors with both methods, 4.4 mm (MI) and 3.2 mm (NCC), and particularly poor tracking of the tongue tip and dorsum (points 1 and 3).
Figure 3.

Motion patterns of eight points for subjects 7 and 3 using HARP tracking in the tagged-MRI dataset and using mutual information and normalized cross correlation in the cine-MRI dataset. The red circle is the first, the blue circle is the last time-frame, and the path is green.
All four DR methods generated considerable tracking error. Errors for the best and worst methods are shown in Table 1 and Figure 4; Table 1 gives the mean error over the 26 TFs for each point by subject, including MI (worst) and NCC (best). NCC was considerably better than MI, but in most cases both methods performed poorly within the tongue (see Figure 4). Points 6, 7, and 8 had smaller errors because they were near the mandible and moved very little. The other points had fairly large errors, particularly points 1 through 4, which had large motions. Figure 4 graphs the errors of the two methods relative to HARP for points 1 and 3 for subjects 3 and 7. There were no errors in TF-1, because the starting points were the same in both datasets. For both subjects, NCC (bottom row) tracked more accurately than MI, though point 1 for subject 3 had nearly identical error with both DR methods.
Figure 4.

Error compared to HARP tracking of points 1 and 3 for subjects with small (subject 7) and large (subject 3) DR tracking errors.
Error in Muscle Length Change using DR and HARP
The SL muscle had small length changes for this task, possibly because its segments were short. In addition, errors for all SL muscle segments were highly variable. Since GGp length was derived from points 5 and 8 (Figure 1), both of which were tracked fairly well for all subjects (Table 1), only the DR data for the GGp, which were among the best, are used to demonstrate the error patterns.
Figure 5(a) shows the GGp muscle length changes calculated using HARP, along with the lengths calculated using MI and NCC. Figure 5(b) shows the GGp muscle length change as a percentage with respect to the HARP tracking results. The contact times for /g/, /i/, and /s/ were determined by visual inspection of the vocal tract in the midsagittal MR image sequences. HARP tracking of GGp (solid) shows that it started to shorten before tongue-palate contact was made for /g/ (first vertical line) in all subjects, with the most rapid shortening occurring before /i/ contact (second vertical line). Thus, shortening of the GGp was associated most strongly with /i/ production. GGp remained shortened during the /i/ and began to lengthen just after /s/ contact (third vertical line) in all subjects except subject 6, who continued shortening GGp through the /s/. The results of the MI (dashed) and NCC (dotted) registrations were sometimes similar in pattern and timing to the HARP results (solid), as seen for subject 5. The errors, however, were not systematic within or across subjects. DR overestimated length in subject 1 and underestimated it in subjects 3, 4, and 8. For subjects 2, 6, and 10, DR did both at different times, and DR was quite noisy for subjects 7 and 9. Timing errors also varied from subject to subject. The frame at which the GGp began to shorten or lengthen was fairly accurate for subject 1 but not for most of the others. As with the direction and extent of motion, timing errors were not predictable. Similar error patterns were seen for the GGa, GGm, and SL muscles (not shown).
Figure 5.
(a) Comparison of GGp muscle length changes for 10 subjects. Mutual information (dashed) and normalized cross correlation (dotted) estimates differ from HARP tracks (solid) by overestimation, underestimation, or both. (b) Comparison of GGp muscle length changes relative to the HARP tracking as a percentage (%).
Comparison Between Tissue Point Tracks Using DR, HARP, and Manual Tracking
Three additional tissue points were manually tracked (see Figure 6(a)) and compared with automatic tracking by the NCC and HARP methods. Figure 6(b) shows the errors of both automatic methods relative to the manual tracks for the three points. Point A was the most superficial point. For HARP (solid line), well-tracked time-frames usually had sub-pixel errors (recall that one pixel = 1.875 mm in both datasets). The large errors for point A starting at TF-17 were caused by HARP mistracking at the tongue surface: by that time the tag pattern had faded substantially in the image, and the algorithm tracked a point in the airway. The error then propagated through the remaining TFs. The NCC errors (dashed line) were consistently larger than the HARP errors, except when the mistracking occurred. Although the DR errors were greater overall, they were more consistent.
Figure 6.

(a) Three tissue points including: A- surface point, B - point 6 mm deep to point A, C - point 8 mm deep to point B. (b) Tongue tracking errors of HARP (solid) and normalized cross correlation (dashed) relative to manual tracking of three tissue points.
Discussion
Tissue-point tracks calculated by HARP from tagged-MRI data revealed varied amounts of motion across subjects and points (Figure 3). DR was poorer at tracking internal points, especially when motions were large. This is partly because the DR methods employed in the present work find pixel correspondences based on appearance measures such as intensity. Distinctive intensity changes, such as edges, are therefore captured well, whereas internal tissue, which lacks distinctive features, is not. The internal tissue patterning of the Cine-MRI did not sufficiently aid DR, and the resulting point-tracking values were not close enough to HARP to be useful. Registration is often cast as an optimization problem in which data fidelity (i.e., a similarity measure) and regularization are combined to find the transformation that best aligns two images. Because registration is an ill-posed problem, regularization is needed (J. Woo, et al., 2010). In the present work, regularization through the B-spline model plays an important role in homogeneous regions where intensity values do not change (Modersitzki, 2004). However, although the B-spline carries an intrinsic regularization (i.e., a type of model on the deformations), its effect is limited, because the cubic B-spline basis functions have local support in the neighborhood of control points (Rueckert, et al., 1999), and the regularization may not accurately reflect how the tongue deforms. This is why tags have an advantage: they are features that give a regular indication of motion on a local basis without any prior assumption about the deformation. In addition, tag-based tracking rests on an entirely different methodology, one that tracks actual material points whose phases differ because of magnetic tagging.
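The local-support argument can be checked numerically. The sketch below assumes a one-dimensional uniform cubic B-spline free-form deformation, u(x) = Σ_i c_i B(x/δ − i), which is a simplification of the 2-D tensor-product model of Rueckert et al.; the function names are hypothetical. Changing a single control coefficient alters the displacement only within two control-point spacings of that control point, which is one way to see why the regularization the B-spline provides is local.

```python
import numpy as np

def cubic_bspline(u):
    """Uniform cubic B-spline basis centered at 0; zero wherever |u| >= 2."""
    a = np.abs(np.asarray(u, dtype=float))
    out = np.zeros_like(a)
    inner = a < 1
    outer = (a >= 1) & (a < 2)
    out[inner] = 2.0 / 3.0 - a[inner] ** 2 + 0.5 * a[inner] ** 3
    out[outer] = (2.0 - a[outer]) ** 3 / 6.0
    return out

def ffd_displacement_1d(x, coeffs, spacing):
    """1-D free-form deformation u(x) = sum_i c_i * B(x / spacing - i)."""
    x = np.asarray(x, dtype=float)
    i = np.arange(len(coeffs))
    basis = cubic_bspline(x[:, None] / spacing - i[None, :])  # (len(x), len(coeffs))
    return basis @ np.asarray(coeffs, dtype=float)

# Perturb only control point 5 (spacing 4 pixels) and sample the displacement.
x = np.arange(40)
coeffs = np.zeros(12)
coeffs[5] = 1.0
u = ffd_displacement_1d(x, coeffs, spacing=4.0)
```

The displacement is nonzero only for 12 < x < 28, that is, within 8 pixels (two spacings) of the control point at x = 20, so a homogeneous region far from any informative feature receives no pull from distant well-matched edges through this basis.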
The individual tracking method performed better than the sequential tracking method. As noted earlier, this is because sequential tracking propagates errors from frame to frame, whereas the individual tracking method registers each target frame only to TF-1. In addition, only the B-spline DR methods with the NCC and MI similarity measures were used for these computations, because the B-spline method with SSD and the diffeomorphic demons method, like some of the other methods tested, failed to track large deformations.
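The error-propagation argument can be illustrated with a small Monte Carlo sketch. It assumes, purely for illustration, that every pairwise registration adds an independent, zero-mean Gaussian displacement error with the same standard deviation σ; the frame count, error size, and function name are hypothetical. Under that assumption, chaining frame-to-frame registrations gives an error that grows roughly as σ√t after t steps, whereas registering every target frame directly to TF-1 keeps it near σ.

```python
import numpy as np


def simulate_tracking_rms(n_frames=26, sigma_mm=0.5, n_trials=5000, seed=0):
    """RMS point-position error per frame for individual vs sequential tracking.

    Hypothetical error model: each pairwise registration contributes an
    independent N(0, sigma_mm**2) displacement error; frame 0 is TF-1 (no error).
    """
    rng = np.random.default_rng(seed)

    # Individual tracking: each frame is registered straight to TF-1,
    # so each frame carries exactly one registration error.
    individual = rng.normal(0.0, sigma_mm, size=(n_trials, n_frames))
    individual[:, 0] = 0.0

    # Sequential tracking: frame t is registered to frame t-1 and the
    # displacements are chained, so the errors add up along the sequence.
    steps = rng.normal(0.0, sigma_mm, size=(n_trials, n_frames))
    steps[:, 0] = 0.0
    sequential = np.cumsum(steps, axis=1)

    def rms(err):
        return np.sqrt(np.mean(err ** 2, axis=0))

    return rms(individual), rms(sequential)


ind_rms, seq_rms = simulate_tracking_rms()
print(f"last frame: individual {ind_rms[-1]:.2f} mm, sequential {seq_rms[-1]:.2f} mm")
```

The equal-σ assumption is a simplification: registering directly to TF-1 must bridge larger deformations, so in practice the per-registration error is not constant and the trade-off depends on the DR method.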
In the HARP tracks, muscle shortening patterns across subjects were generally linked to phoneme identity (Figure 5(a)). During the upward and forward movement of the tongue body into /g/ and /i/, the GGp muscle shortened, and after the onset of /s/ it began to lengthen, consistent with the local deformations that occur during tongue motion. Some inter-subject differences may also have arisen from inadvertently choosing non-equivalent points across subjects; better methods of muscle identification would help prevent this type of error. Muscle lengths tracked with DR sometimes showed shortening patterns and timing similar to those of the HARP tracks; however, the errors were not systematic (Figure 5). Therefore, with current image quality and DR methods, it is difficult to reliably recover the true shortening pattern or to reduce errors of overestimation, underestimation, extreme enhancement, or poor tracking. However, if image quality improves enough to capture detailed muscle information, DR methods could produce results comparable to HARP tracking, as has been demonstrated in other applications (Chandrashekara, et al., 2005; Ouyang, Li, & El Fakhri, 2013).
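Muscle length change can be computed from two tracked endpoints. The sketch below assumes, as a simplification, that a muscle's length at each time frame is the straight-line distance between two tracked tissue points on that muscle, and it reports the change relative to the first frame (TF-1); the function name and the two-point definition are illustrative assumptions rather than the exact procedure used in this study. Negative values indicate shortening.

```python
import numpy as np


def muscle_length_change(start_pts, end_pts):
    """Percent length change of a muscle segment relative to the first frame.

    start_pts, end_pts: arrays of shape (n_frames, 2) holding the (x, y)
    positions, in mm, of two tracked tissue points on the same muscle.
    """
    start_pts = np.asarray(start_pts, dtype=float)
    end_pts = np.asarray(end_pts, dtype=float)
    lengths = np.linalg.norm(end_pts - start_pts, axis=1)  # per-frame length
    return 100.0 * (lengths - lengths[0]) / lengths[0]     # < 0 means shortening


# Two frames: the segment shrinks from 10 mm to 8 mm (20% shortening).
a = [[0.0, 0.0], [0.0, 0.0]]
b = [[10.0, 0.0], [8.0, 0.0]]
print(muscle_length_change(a, b))
```

Feeding the same point pair through this function once with HARP tracks and once with DR tracks yields two curves whose shapes and timing can be compared directly.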
The DR tracking method has some limitations that can be addressed and others that are inherent to the task. One inherent limitation is that Cine-MRI images are not designed to capture tissue points. The Cine-MR images used in this study have low resolution (1.875 × 1.875 × 6 mm). Higher-resolution images would contain tissue features that a DR algorithm could exploit more effectively. The same would hold for ultrasound or X-ray images: the better different tissue types are resolved, the better a DR algorithm would be expected to perform. A second limitation is that DR relies on edges and other distinctive image features, and does a poorer job of tracking pixels in homogeneous regions. Several modifications could improve the DR estimation of tissue point motion in Cine-MRI. A motion model, such as the average tongue motion (i.e., a motion atlas) derived from a large corpus of tagged-MRI data, could serve as a statistical prior in the regularization to constrain and guide the tracking algorithm; this is the subject of ongoing research. Alternatively, a refinement technique could start from a well-tracked set of pixels, such as the tongue surface, and use them to constrain the tracking of neighboring pixels, proceeding until the entire tongue is tracked. Regularization methods like these may be less applicable to certain patient populations, however, because many patients, such as those with neurological impairments, are more likely to produce variable and unpredictable utterances.
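One way to picture how a motion-atlas prior could fill in homogeneous regions is a pointwise quadratic energy. The sketch below is hypothetical and is not a method implemented in this work: for each pixel it combines the DR displacement d_i, weighted by a data-confidence weight w_i (large at edges, near zero in homogeneous tissue), with an atlas displacement p_i weighted by λ. Minimizing Σ_i w_i(u_i − d_i)² + λ Σ_i (u_i − p_i)² gives the closed form u_i = (w_i d_i + λ p_i)/(w_i + λ), so well-supported pixels follow the data and poorly supported ones fall back toward the atlas. A practical method would also couple neighboring pixels through a spatial smoothness term, which this sketch omits.

```python
import numpy as np


def blend_with_atlas_prior(dr_disp, confidence, atlas_disp, lam=1.0):
    """Minimize sum w*(u - d)**2 + lam*sum (u - p)**2 independently per pixel.

    dr_disp:    displacement estimated by DR at each pixel (d)
    confidence: non-negative data weight per pixel (w); ~0 in flat regions
    atlas_disp: displacement predicted by a hypothetical motion atlas (p)
    lam:        prior strength (lambda > 0)
    """
    d = np.asarray(dr_disp, dtype=float)
    w = np.asarray(confidence, dtype=float)
    p = np.asarray(atlas_disp, dtype=float)
    # Setting the derivative 2*w*(u - d) + 2*lam*(u - p) to zero gives:
    return (w * d + lam * p) / (w + lam)


# Edge pixel (high confidence) vs. homogeneous pixel (no data support).
u = blend_with_atlas_prior(dr_disp=[2.0, 5.0], confidence=[10.0, 0.0],
                           atlas_disp=[1.0, 1.0], lam=1.0)
print(u)
```

The same pull toward p_i is what makes such priors risky for speakers whose motion departs from the atlas, as with the variable patient populations mentioned above.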
One limitation of HARP is that near edges, such as the tongue surface, it is more likely to mistrack tissue points, because the HARP bandpass filter blurs the tag pattern near the edges. Another limitation is that HARP’s ability to track degrades as the tags fade. Although retagging or data-interleaving techniques are available to address tag fading, speech applications already require many repetitions, which hampers the use of such techniques. Cine-MRI and DR are not subject to these limitations. Thus, with the proper constraints, DR could be used at the tongue surface, especially at later time frames, to supplement HARP measurements. Such a “hybrid” algorithm has been reported in a limited data set (J. Woo, et al., 2013). Finally, the comparisons in the present work were based on 2D mid-sagittal slices. This could be improved by comparing DR point tracking on full 3D super-resolution volumes (J. Woo, Murano, Stone, & Prince, 2012) with the incompressible deformation estimation algorithm (IDEA) (Liu et al., 2012), a comparison that is currently under way.
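The edge effect can be reproduced in a 1D toy model. The sketch below is a simplified illustration, not the HARP implementation used in this study: it assumes a tissue profile that is 1 inside the tongue and 0 in air, modulates it with a sinusoidal tag pattern 1 + cos(ωx), keeps a rectangular band of DFT bins around the tag frequency as a hypothetical stand-in for the HARP bandpass filter, and takes the angle of the filtered complex signal as the harmonic phase. Far from the tissue-air boundary the recovered phase matches ωx closely. Within a few pixels of the boundary, the filter smooths the tag amplitude and part of the boundary's broadband spectrum falls inside the passband, so the phase error grows. Tag fading is not modeled.

```python
import numpy as np


def harmonic_phase_error(n=256, tag_bin=32, half_width=12):
    """Wrapped error of a 1D HARP-style phase estimate along a tissue/air profile.

    Toy assumptions: tissue occupies the middle half of the profile, tags are
    1 + cos(omega * x), and the bandpass filter is a rectangular window of
    DFT bins centred on the tag frequency.
    """
    x = np.arange(n)
    tissue = ((x >= n // 4) & (x < 3 * n // 4)).astype(float)  # 1 = tongue, 0 = air
    omega = 2.0 * np.pi * tag_bin / n                           # tag spatial frequency
    image = tissue * (1.0 + np.cos(omega * x))                  # tagged intensity profile

    spectrum = np.fft.fft(image)
    passband = np.abs(np.arange(n) - tag_bin) <= half_width     # keep only the +omega peak
    harmonic = np.fft.ifft(spectrum * passband)                 # complex harmonic image

    # Harmonic phase minus true tag phase, wrapped into [-pi, pi].
    phase_diff = np.angle(harmonic) - omega * x
    return x, np.abs(np.angle(np.exp(1j * phase_diff)))


x, err = harmonic_phase_error()
interior = (x >= 96) & (x < 160)                                # >= 32 px from both edges
near_edge = ((x >= 64) & (x < 68)) | ((x >= 188) & (x < 192))   # inside tissue, next to air
print(f"max phase error, interior:  {err[interior].max():.4f} rad")
print(f"max phase error, near edge: {err[near_edge].max():.4f} rad")
```

Widening the passband reduces the blur but admits more of the unwanted spectral peaks, which is why the bandpass filter cannot simply be opened up to remove the edge problem.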
Conclusion
DR applied to Cine-MRI has the potential to support calculation of tissue-point and muscle properties if the images are sufficiently detailed. At present, however, this approach cannot track at sub-pixel resolution, as HARP does. Nevertheless, where HARP cannot be used or has limitations of its own, such as at the tissue-air interface, DR might be a useful strategy for some studies, provided that its limitations are well recognized and accounted for in the analysis.
Acknowledgement
This research was supported in part by NIH grants R01 CA133015 and K99 DC012575.
Footnotes
Parts of this paper were included in the Proceedings of the International Seminar on Speech Production, Montreal, Canada, June 2011.
Contributor Information
Jonghye Woo, University of Maryland School of Dentistry, Dept of Neural and Pain Sciences and Johns Hopkins University, Dept of Electrical and Computer Engineering.
Maureen Stone, University of Maryland School of Dentistry, Dept of Neural and Pain Sciences, Dept of Orthodontics.
Yuanming Suo, Johns Hopkins University, Dept of Electrical and Computer Engineering.
Emi Z. Murano, Johns Hopkins Hospital, Dept of Otolaryngology-Head and Neck Surgery.
Jerry L. Prince, Johns Hopkins University, Dept of Electrical and Computer Engineering.
References
- Aletras AH, Ding S, Balaban RS, Wen H. DENSE: Displacement Encoding with Stimulated Echoes in Cardiac Functional MRI. Journal of Magnetic Resonance. 1999;137:247–252. doi: 10.1006/jmre.1998.1676.
- Arts T, Prinzen FW, Delhaas T, Milles J, Rossi AC, Clarysse P. Mapping displacement and deformation of the heart with local sine-wave modeling. IEEE Transactions on Medical Imaging. 2010;29(5):1114–1123. doi: 10.1109/TMI.2009.2037955.
- Axel L. Biomechanical dynamics of the heart with MRI. Annu Rev Biomed Eng. 2002;4:321–347. doi: 10.1146/annurev.bioeng.4.020702.153434.
- Axel L, Dougherty L. MR imaging of motion with spatial modulation of magnetization. Radiology. 1989;171(3):841–845. doi: 10.1148/radiology.171.3.2717762.
- Axel L, Montillo A, Kim D. Tagged magnetic resonance imaging of the heart: a survey. Med Image Anal. 2005;9(4):376–393. doi: 10.1016/j.media.2005.01.003.
- Boersma P, Weenink D. Praat: doing phonetics by computer. 2010. http://www.praat.org.
- Chandrashekara R, Mohiaddin RH, Rueckert D. Comparison of Cardiac Motion Fields from Tagged and Untagged MR Images Using Nonrigid Registration. Paper presented at: Functional Imaging and Modeling of the Heart; 2005.
- Chen T, Wang X, Chung S, Metaxas D, Axel L. Automated 3D motion tracking using Gabor filter bank, robust point matching, and deformable models. IEEE Transactions on Medical Imaging. 2010;29(1):1–11. doi: 10.1109/TMI.2009.2021041.
- Cho GY, Chan J, Leano R, Strudwick M, Marwick TH. Comparison of two-dimensional speckle and tissue velocity based strain and validation with harmonic phase magnetic resonance imaging. Am J Cardiol. 2006;97(11):1661–1666. doi: 10.1016/j.amjcard.2005.12.063.
- Fischer SE, McKinnon GC, Maier SE, Boesiger P. Improved myocardial tagging contrast. Magn Reson Med. 1993;30(2):191–200. doi: 10.1002/mrm.1910300207.
- Garot J, Bluemke DA, Osman NF, Rochitte CE, McVeigh ER, Zerhouni EA, et al. Fast determination of regional myocardial strain fields from tagged cardiac images using harmonic phase MRI. Circulation. 2000;101(9):981–988. doi: 10.1161/01.cir.101.9.981.
- Garot J, Bluemke DA, Osman NF, Rochitte CE, Zerhouni EA, Prince JL, et al. Transmural contractile reserve after reperfused myocardial infarction in dogs. J Am Coll Cardiol. 2000;36(7):2339–2346. doi: 10.1016/s0735-1097(00)00992-x.
- Gupta SN, Prince JL. On Variable Brightness Optical Flow for Tagged MRI. Paper presented at: Information Processing in Medical Imaging; 1995.
- Guttman MA, Prince JL, McVeigh ER. Tag and Contour Detection in Tagged MR Images of the Left Ventricle. IEEE Transactions on Medical Imaging. 1994;13(1):74–88. doi: 10.1109/42.276146.
- Hermosillo G, Chefd’Hotel C, Faugeras O. Variational Methods for Multimodal Image Matching. International Journal of Computer Vision. 2002;50:329–343.
- Ibanez L, Schroeder W, Ng L, Cates J. The ITK Software Guide. Albany, NY: Kitware Inc; 2003.
- Ibrahim el SH. Myocardial tagging by cardiovascular magnetic resonance: evolution of techniques--pulse sequences, analysis algorithms, and applications. J Cardiovasc Magn Reson. 2011;13:36. doi: 10.1186/1532-429X-13-36.
- Liu X, Abd-Elmoniem KZ, Stone M, Murano EZ, Zhuo J, Gullapalli RP, et al. Incompressible deformation estimation algorithm (IDEA) from tagged MR images. IEEE Trans Med Imaging. 2012;31(2):326–340. doi: 10.1109/TMI.2011.2168825.
- Liu X, Murano E, Stone M, Prince JL. HARP tracking refinement using seeded region growing. Paper presented at: 4th IEEE International Symposium on Biomedical Imaging; 2007.
- Liu X, Prince JL. Shortest path refinement for motion estimation from tagged MR images. IEEE Trans Med Imaging. 2010;29(8):1560–1572. doi: 10.1109/TMI.2010.2045509.
- Masaki S, Tiede M, Honda K, Shimada Y, Fujimoto I, Nakamura Y, et al. MRI-based speech production study using a synchronized sampling method. J Acoust Soc Jpn. 1999;20(5):375–379.
- Modersitzki J. Numerical Methods for Image Registration. Oxford University Press; 2004.
- Morris GA, Freeman R. Selective excitation in Fourier transform nuclear magnetic resonance. 1978. J Magn Reson. 2011;213(2):214–243. doi: 10.1016/j.jmr.2011.08.031.
- Narayanan S, Nayak K, Lee S, Sethy A, Byrd D. An approach to real-time magnetic resonance imaging for speech production. J Acoust Soc Am. 2004;115(4):1771–1776. doi: 10.1121/1.1652588.
- NessAiver M, Prince JL. Magnitude image CSPAMM reconstruction (MICSR). Magn Reson Med. 2003;50(2):331–342. doi: 10.1002/mrm.10523.
- Notomi Y, Setser RM, Shiota T, Martin-Miklovic MG, Weaver JA, Popovic ZB, et al. Assessment of left ventricular torsional deformation by Doppler tissue imaging: validation study with tagged magnetic resonance imaging. Circulation. 2005;111(9):1141–1147. doi: 10.1161/01.CIR.0000157151.10971.98.
- Osman NF, Kerwin WS, McVeigh ER, Prince JL. Cardiac motion tracking using CINE harmonic phase (HARP) magnetic resonance imaging. Magn Reson Med. 1999;42:1048–1060. doi: 10.1002/(sici)1522-2594(199912)42:6<1048::aid-mrm9>3.0.co;2-m.
- Osman NF, McVeigh ER, Prince JL. Imaging Heart Motion Using Harmonic Phase MRI. IEEE Transactions on Medical Imaging. 2000;19(3):186–202. doi: 10.1109/42.845177.
- Osman NF, Sampath S, Atalar E, Prince JL. Imaging longitudinal cardiac strain on short-axis images using strain-encoded MRI. Magn Reson Med. 2001;46(2):324–334. doi: 10.1002/mrm.1195.
- Ouyang J, Li Q, El Fakhri G. Magnetic resonance-based motion correction for positron emission tomography imaging. Semin Nucl Med. 2013;43(1):60–67. doi: 10.1053/j.semnuclmed.2012.08.007.
- Parthasarathy V, Prince JL, Stone M, Murano EZ, Nessaiver M. Measuring tongue motion from tagged cine-MRI using harmonic phase (HARP) processing. J Acoust Soc Am. 2007;121(1):491–504. doi: 10.1121/1.2363926.
- Pluim JP, Maintz JB, Viergever MA. Mutual-information-based registration of medical images: a survey. IEEE Trans Med Imaging. 2003;22(8):986–1004. doi: 10.1109/TMI.2003.815867.
- Rueckert D, Sonoda LI, Hayes C, Hill DLG, Leach MO, Hawkes DJ. Nonrigid registration using free-form deformations: application to breast MR images. IEEE Transactions on Medical Imaging. 1999;18(8):712–721. doi: 10.1109/42.796284.
- Stone M, Davis EP, Douglas AS, Aiver MN, Gullapalli R, Levine WS, et al. Modeling tongue surface contours from Cine-MRI images. J Speech Lang Hear Res. 2001;44(5):1026–1040. doi: 10.1044/1092-4388(2001/081).
- Story BH. Vowel and consonant contributions to vocal tract shape. J Acoust Soc Am. 2009;126(2):825–836. doi: 10.1121/1.3158816.
- Vercauteren T, Pennec X, Perchant A, Ayache N. Diffeomorphic demons: efficient non-parametric image registration. Neuroimage. 2009;45(1 Suppl):S61–72. doi: 10.1016/j.neuroimage.2008.10.040.
- Winkler R, Fuchs S, Perrier P, Tiede M. Biomechanical tongue models: An approach to studying inter-speaker variability. Proceedings of Interspeech. 2011;2011:273–276.
- Woo J, Dey D, Cheng VY, Hong BW, Ramesh A, Sundaramoorthi G, et al. Nonlinear registration of serial coronary CT angiography (CCTA) for assessment of changes in atherosclerotic plaque. Med Phys. 2010;37(2):885–896. doi: 10.1118/1.3284541.
- Woo J, Lee J, Bogovic J, Murano EZ, Xing F, Stone M, et al. Multi-subject atlas built from structural tongue magnetic resonance images. Paper presented at: POMA-ICA; 2013.
- Woo J, Murano EZ, Stone M, Prince JL. Reconstruction of high-resolution tongue volumes from MRI. IEEE Trans Biomed Eng. 2012;59(12):3511–3524. doi: 10.1109/TBME.2012.2218246.
- Woo J, Stone M, Prince JL. Deformable registration of high-resolution and cine MR tongue images. Med Image Comput Comput Assist Interv. 2011;14(Pt 1):556–563. doi: 10.1007/978-3-642-23623-5_70.
- Xing F, Lee J, Murano E, Woo J, Stone M, Prince JL. Estimating 3D Tongue Motion with MR Images. Paper presented at: IEEE Asilomar Conference on Signals, Systems and Computers; Asilomar; 2013.
- Zerhouni EA, Parish DM, Rogers WJ, Yang A, Shapiro EP. Human heart: tagging with MR imaging--a method for noninvasive assessment of myocardial motion. Radiology. 1988;169(1):59–63. doi: 10.1148/radiology.169.1.3420283.




