Abstract
The ability to provide absolute calibrated measurement of the laryngeal structures during phonation is of paramount importance to voice science and clinical practice. Calibrated three-dimensional measurement could provide essential information for modeling purposes, for studying the developmental aspects of vocal fold vibration, for refining functional voice assessment and treatment outcomes evaluation, and for more accurate staging and grading of laryngeal disease. Recently, a laser-calibrated transnasal fiberoptic endoscope compatible with high-speed videoendoscopy (HSV) and capable of providing three-dimensional measurements was developed. The optical principle employed is to project a grid of 7×7 green-laser points across the field of view (FOV) at an angle relative to the imaging axis, such that (after calibration) the position of each laser point within the FOV encodes the vertical distance from the tip of the endoscope to the laryngeal tissues. The purpose of this study was to develop a precise method for vertical calibration of the endoscope. Investigation showed that the positions of the laser points depend not only on the vertical distance but also on the parameters of the lens coupler, including the position of the FOV within the image frame and the rotation angle of the endoscope. The presented automatic calibration method was developed to compensate for the effect of these parameters. Statistical image processing and pattern recognition were used to detect the FOV, the center of the FOV, and the fiducial marker. This step normalizes the HSV frames to a standard coordinate system and removes the dependence of the laser-point positions on the parameters of the lens coupler. Then, using a statistical learning technique, a calibration protocol was developed to model the trajectories of all laser points as the working distance was varied. Finally, a set of experiments was conducted to measure the accuracy and reliability of every step of the procedure. The system was able to measure absolute vertical distance with a mean percent error in the range of 1.7% to 4.7%, depending on the working distance.
Keywords: high-speed videoendoscopy, laser calibration, flexible endoscopy, spatial calibrated measurements, statistical signal processing, statistical learning
1. Introduction
Voice science has implemented various approaches for better understanding phonatory physiology, which is of paramount importance for improving our knowledge of human communication and for advancing the clinical management of voice disorders. In a broad sense, the approaches for studying the vibratory behavior of the vocal folds may be divided into two main groups: outcome-based and internal-based approaches. In the outcome-based approaches, the output of the system (e.g., acoustic or other external signal) is analyzed in order to infer information regarding the internal states of the dynamic system. Internal-based approaches, on the other hand, provide direct information regarding the internal states of the system. As known from the mathematical analysis of dynamic systems and control theory, the internal states of a dynamic system (e.g., laryngeal structure and vibration of the vocal folds) cannot always be inferred from its output (e.g., acoustic or other external signal).1 Therefore, using imaging techniques for direct observation and study of vocal fold vibration and laryngeal structure is of crucial importance for advancing theory and clinical practice. In fact, the direct observation of vocal fold vibration through endoscopic imaging has long been an integral part of the instrumental assessment of voice and speech.2
A persistent issue with optical endoscopic imaging has been the presentation of the three-dimensional (3D) laryngeal physiology in a two-dimensional space. In that regard, endoscopic images are not a true representation of the actual phenomena being captured. Additionally, it is well known that the size of objects in an image is inversely proportional to their distance from the camera. Therefore, conventional imaging techniques are deprived of important information regarding the absolute size of objects and their 3D structure. Considering that vocal fold vibration occurs in both the horizontal and vertical planes, a 3D representation would provide a significant amount of information regarding normal and abnormal voice production. Furthermore, the ability to obtain absolute measurements of the laryngeal tissues and structures would provide essential information for the kinematic and aerodynamic modeling of vocal fold behavior3,4, for studying the developmental aspects of vocal fold vibration5 and laryngeal tissues, for better assessment of different treatment approaches for voice disorders, and for more accurate staging and grading of relevant laryngeal disease6. Researchers have been working on augmenting laryngeal imaging systems with absolute measurement and/or 3D reconstruction capabilities for more than two decades.5–14 Most often, these goals are achieved by projecting a laser pattern with certain topological properties onto the field of view (FOV) and then using the position and displacement of the laser pattern to achieve absolute measurement or 3D reconstruction.6–8,14–16 Three main components can be identified in (almost) all systems designed for this purpose: the laser-projection component, the imaging component, and the endoscopic instrumentation. These three components determine the functionality, characteristics, and capabilities of each system.
Considering the underlying principles for creating the laser pattern, existing systems can be divided into three main categories. Systems in the first category use the well-known laser-triangulation principle.17 The main idea behind these systems is to project a laser point (or line) onto the target surface and then record the scene from a different angle; the angular difference between the laser-projection and imaging axes captures the vertical displacement of the target surface. The single-point8,10 and single-line12 laser-projection systems fall under this category. Systems in the second category are based on the projection of structured laser light. These systems project a set of (commonly two) parallel laser beams with a known horizontal separation onto the target surface; the distance between the parallel laser patterns in the image then acts as a scale for converting pixels into millimeters. Two-point7,9,11, two-parallel-line13, and multiple-parallel-line5 projection systems are examples from this category. Finally, systems in the third category combine structured-light projection with the laser-triangulation technique to achieve the desired measurement goals; the multiple-point laser-projection systems are examples from this category.6,14,15 It is noteworthy that systems from each category have a different functionality. Systems from the first category can only capture the vertical movement of the target surface, whereas systems from the second category are typically used for enabling absolute measurements in the horizontal plane. Systems in the third category are by far the most flexible and, depending on the design, can provide detailed information regarding vertical movements as well as absolute measurements in the horizontal plane. This wealth of information comes at the cost of more complex hardware (optical) and software (algorithm) design.
Considering the optical imaging component, two main technologies can be differentiated: videostroboscopy and high-speed videoendoscopy (HSV), where videostroboscopy has been the “gold standard” approach for clinical voice evaluations.2,18,19 Videostroboscopy “provides real-time audiovisual feedback and continues to be the imaging modality of choice by voice clinicians.”19 This technique uses very short flashes of light to take a sequence of pictures from different glottal cycles and then assembles them into a motion picture. An external trigger based on the vibratory phase of the acoustic or electroglottographic signal determines the timing of the flashes. In this fashion, the assembled images represent a slow-motion rendering of the true vibration of the vocal folds.20,21 Consequently, videostroboscopy does not present the actual vibratory patterns of the vocal folds, and its captured images deviate substantially from the true pattern as the vibration becomes irregular and aperiodic.19,22 On the other hand, HSV systems capture the true vibratory patterns of the vocal folds and are therefore more appropriate for studying the intra-cycle characteristics of vocal fold vibration.18,21 In summary, the imaging component determines the temporal resolution of the captured images and consequently has a significant role in the type of phenomena that can be captured and studied. Systems based on stroboscopy are applicable to stationary phenomena, whereas HSV systems can be used for capturing non-stationary behaviors such as the onset and offset of phonation and aperiodic phonation.
Considering the type of endoscopic instrument, two categories of endoscopes are available: rigid and flexible. The rigid endoscope provides images with better spatial resolution and visual quality, but at the same time it affects voice and speech production because transoral insertion requires unnatural retraction of the tongue for adequate laryngeal exposure; thus, only a limited set of stimuli can be elicited. On the other hand, flexible endoscopy does not interfere with the articulators, and speech can be produced with minimal interference; therefore, it can be more ecologically valid. Additionally, there are fewer restrictions on the type of stimuli that can be produced, so it can be used for the analysis and study of the vibratory patterns of the vocal folds during connected speech.23 Finally, it provides the possibility of simultaneous recording of aerodynamic measurements.24–26
Table 1 summarizes the taxonomy of the different systems with laser-projection capabilities in the literature. Recently, we developed a new flexible fiberoptic endoscope with laser-projection capabilities. The new system uses a flexible endoscope for accessing the superior view of the larynx, which allows a wide range of stimuli to be elicited, while the optical characteristics of the laser-projection system were designed to be compatible with HSV systems and to provide good visual contrast between the laser points and the background. The system was designed so that absolute measurements in both the horizontal and vertical planes are possible. Combining these characteristics, the new system can provide 3D information regarding the vocal fold vibratory pattern and the laryngeal configuration during laryngeal maneuvers, phonation, and connected speech. In order to achieve this goal, the system first needs to be calibrated.
Table 1.
Literature-based taxonomy of different imaging systems with laser projection. The following abbreviations are used in the table: Strob. (videostroboscopy), HSV (high-speed videoendoscopy), 3D (three-dimensional reconstruction), nm (nanometer), mW (milliwatt)
| Year | Ref. | Laser pattern | Projection technique | Imaging | Endoscope | Functionality | Other notes |
|---|---|---|---|---|---|---|---|
| 1997 | [7] | 2-point | parallel beams | Strob. | 90°, rigid | horizontal | red laser, 670 nm, 3 mW power per laser point |
| 2001 | [8] | 1-point | triangulation | Strob. | 70°, rigid | vertical | red laser, 643 nm |
| 2002 | [9] | 2-point | parallel beams | HSV | 90°, rigid | horizontal | red laser, 633 nm |
| 2004 | [10] | 1-point | triangulation | HSV | 70°, rigid | vertical | 1 mW power |
| 2004 | [11] | 2-point | parallel beams | Strob. | 70°, rigid | horizontal | green laser, 1 mW power at source |
| 2006 | [6] | 23-point | structured light+triangulation | Strob. | flexible | horizontal | green laser, 150 mW power at source |
| 2008 | [12] | 1-line | triangulation | HSV | 90°, rigid | vertical+horizontal along a single line | red laser, 653 nm, irradiance of 1800 W/m2 |
| 2008 | [13] | 2-line | parallel lines | HSV | 90°, rigid | vertical+horizontal along two lines | red laser, 635 nm, 22 mW power |
| 2010 | [14] | 196-point | structured light+triangulation | HSV | 70°, rigid | vertical+horizontal+3D | - |
| 2013 | [5] | 21-line | parallel lines | HSV | 70°, rigid | horizontal | green laser, 300 mW power at source, irradiance of 1100 W/m2 at working distance of 30 mm |
| 2016 | [15] | 324-point | structured light+triangulation | HSV | 70°, rigid | vertical+horizontal+3D | 532 nm, 150 mW power at source, 80 mW at the tip of the endoscope, irradiance of 1000 W/m2 at working distance of 60 mm |
| 2019 | this work | 49-point | structured light+triangulation | HSV | flexible | vertical+horizontal+3D | green laser, 520 nm, 55 mW at source, 20 mW at the tip of the endoscope, irradiance of 372 W/m2 at working distance of 20 mm |
The main purpose of this paper is to present in detail the algorithm for automatic calibration of the laser-projection system along the vertical plane. It is noteworthy that calibration along the horizontal plane and the segmentation method for automatic detection of laser points in in-vivo recordings are works in progress and are not subjects of this article. The proposed approach has a modular architecture and, in that regard, is a combination of different components (modules). The main advantage of this approach is the independence of the modules from one another. This feature allows separate evaluation of each module; furthermore, each module can be updated and replaced with a better approach in the future. The paper is organized as follows. Section 2 presents the calibration protocol, the algorithmic details for removing the dependence of the laser-point positions on the lens-coupler parameters, and finally the algorithm and method for decoding the vertical distances from the laser-point positions. Section 3 is devoted to the experimental analysis of the proposed method and its results, where the performance of each module is evaluated separately. Section 4 presents a discussion and section 5 the conclusions.
2. Material and method
2.1. Laser-projection endoscope
A surgical flexible endoscope, Fiber Naso Pharyngo Laryngoscope Model FNL-15RP3 (PENTAX Medical, Montvale, NJ), with three channels (surgical, imaging, and light-delivery channels) was used for developing the laser-projection endoscope with absolute measurement capabilities. The surgical channel is used for delivering green laser light with a wavelength of 520 nm to the distal tip of the endoscope, where a diffraction-based system splits it into a mesh pattern of 7×7 laser points. The size of the laser pattern is 16×16 mm at a working distance of 20 mm. The imaging channel of the endoscope allows coupling the endoscope with a color/monochrome high-speed digital camera and recording the superior view of the larynx with the projected laser pattern at distances ranging from 5 mm to 35 mm. The third channel utilizes a fiberoptic light-delivery system that can be coupled with a xenon light source with power up to 300 W. Figure 1 depicts the calibrated endoscope with its main components.
Figure 1.
Calibrated flexible endoscope with insertion tube diameter of 4.9 mm and its main components.
2.2. Calibration protocol and recordings
To achieve absolute measurements in the vertical plane, the endoscope should first be calibrated. More specifically, the position of the laser points in the FOV is a non-linear function of the lens-coupler parameters and the working distance. Calibration is the process that accounts for these factors and finds the mathematical function for decoding the desired measurements from the positions of the laser points. In order to find that function, a data-science approach based on statistical pattern recognition and statistical learning techniques is adopted in this paper.
Figure 2(A) shows the recording setup. The laser-projection endoscope was connected to a high-speed monochrome camera, Phantom v7.1 (Vision Research Inc., Wayne, NJ), using a 45-mm lens coupler and a 300-W xenon light source. Considering that calibrations are typically done under a controlled environment and the best possible settings, a monochrome camera was used for this phase. Monochrome cameras have higher sensitivity compared with their color versions and do not use Bayer-decomposition filters.27 These characteristics result in a sharper image with better-defined edges. It is noteworthy that using the monochrome camera does not impose restrictions on the application of the system, and the calibrated endoscope can be used with a color camera after calibration. The camera and the endoscope were mounted on a vertical plane perpendicular to the target surface, and the FOV was recorded at a speed of 7000 frames per second with a spatial resolution of 288×280 pixels. The target surface was attached to an adjustable arm that allowed the working distance to the distal end of the endoscope to be regulated with high precision. The working distance was varied from 5 mm to 35 mm in 1-mm steps and was measured using a digital height gauge with an accuracy of 0.001” (0.03 mm).
Figure 2.
(A) Calibration setup; (B) measuring the distance to the tip of the endoscope; and (C) measuring the distance to the fixture.
Accurate measurement of the working distance depended on accurately leveling the gauge arm with the distal end of the endoscope (Figure 2(B)), which had to be verified visually and was therefore time-consuming and subject to variability. Therefore, a fixture was placed in the setup about 2 cm above the distal end of the endoscope, and the following procedure was implemented for measuring the distance between the tip of the endoscope and the top surface of the fixture. The measurement arm of the gauge was positioned on the top surface of the fixture and the height was recorded (Figure 2(C)). Then, the measurement arm was positioned level with the tip of the endoscope and the height was recorded again. To check the leveling of the two surfaces, a 13-megapixel smartphone camera was positioned at the same vertical level as the tip of the endoscope, and the digital magnification feature of the camera was used to fine-tune the position of the measurement arm (Figure 2(B)). These steps were repeated ten times and the results were averaged. The average distance to the fixture was 45.01±0.03 mm and the average distance to the tip of the endoscope was 21.85±0.11 mm. The distance to the fixture shows a lower standard deviation, supporting better measurement accuracy when the fixture is used as the reference point. From these measurements, the distance between the tip of the endoscope and the top surface of the fixture was estimated as their difference, 45.01 − 21.85 = 23.16 mm.
Two different recordings were made at each working distance. In the first recording, a white piece of paper was used as the target, the xenon light was turned off, and the laser-projection system was turned on at maximum power. In the second recording, a multi-resolution grid paper (1-mm, 2-mm, and 10-mm boxes) was used, the laser-projection system was turned off, and the xenon light was turned on. Throughout the paper, these two recordings are referred to as laser recordings and grid recordings, respectively. Each set of recordings serves a different purpose in the calibration procedure: the laser recordings are used for finding the accurate position of the laser points in the FOV, whereas the grid recordings are used for estimating the recording parameters. The grid recordings are also necessary for the horizontal calibration of the system, but that is a separate procedure, which is not presented in this report. It is noteworthy that these two recording conditions are only used to remove confounding factors from the different calibration processes and to maximize accuracy; they do not impose any restrictions on the application of the system and do not need to be replicated during clinical data collection. Finally, since the intensity of pixels increases at shorter working distances, leading to possible saturation of the image, the exposure time of the camera and the power of the light source were adjusted at each step to prevent image saturation.
2.3. Measuring vertical distance
The position of the laser points in the captured image is a deterministic function of the vertical distance between the distal tip of the endoscope and the target surface, and of the lens-coupler parameters. This section presents the automatic approach for compensating for the effect of the lens-coupler parameters and for decoding the vertical distances from the positions of the laser points.
2.3.1. Compensating for the lens-coupler parameters
Some of the lens-coupler parameters change the position of the laser points in the FOV even if the working distance is kept constant. Those parameters include the focal distance of the lens coupler connecting the endoscope to the camera and the position and angle of the endoscopic eyepiece relative to the lens coupler. In order to decode the vertical displacements, these parameters are first estimated from the recordings and then compensated for. After that, the positions of the laser points are a function of the vertical distance only and hence can be used for the measurements. The effects of the different lens-coupler parameters and the corresponding compensation approaches are presented as follows.
2.3.1.1. Recording model
The focal distance of the lens coupler determines the magnification of the camera; using a higher magnification results in an image where everything is larger. Therefore, the number of pixels between given laser points (equivalently, the x-y coordinates of the laser points in the image) depends on the magnification of the camera. The second source of variability comes from the rotation of the endoscopic eyepiece inside the lens coupler attached to the camera. Because the camera is fixed, the recording frame remains constant, but the FOV and everything inside it undergo a rotation transformation. Therefore, when the endoscope is rotated, the projected laser pattern is rotated as well, meaning that the x-y coordinates of the laser points in the image depend on the endoscope rotation. The last source of variability stems from the displacement of the eyepiece within the lens coupler. More specifically, the position of the eyepiece inside the lens coupler is not fixed, and it can move in the horizontal plane. When the eyepiece is displaced, the whole FOV is displaced within the image frame. Consequently, the x-y coordinates of the laser points in the image depend on the position of the eyepiece within the lens coupler.
To account for the variations due to these lens-coupler parameters, we first need a model that describes the effect of each parameter on the recorded images. The model used for this purpose consists of three main transformations: scaling (the effect of magnification), rotation (the effect of eyepiece rotation), and translation (the effect of eyepiece displacement). The aim of this model is to map recordings with variable parameters into a fixed, standard coordinate system in which the x-y coordinates of the laser points are independent of those lens-coupler parameters.
Let I(x,y) denote the original image and i(x,y) a pixel from it, and let J(x′,y′) denote the mapping of that image into the standard coordinate system and j(x′,y′) the corresponding pixel in the new image. Also, let T, R, I₂, and k denote a translation vector, a rotation matrix, the 2×2 identity matrix, and a scaling factor, respectively. Equation 1 shows the model:

$$\begin{bmatrix} x' \\ y' \end{bmatrix} = k\, I_2\, R \left( \begin{bmatrix} x \\ y \end{bmatrix} - T \right) \qquad (1)$$
Now, if the values of T, R, and k are determined, the mapping can be carried out. Considering the aim of the mapping, a few considerations should be taken into account when determining these parameters. First, the parameters should be determined so that the new image (j(x′, y′) ∈ J(x′, y′)) is invariant to the lens-coupler parameters. Second, the estimation of those parameters should be computationally efficient. Third, the estimated parameters should be relatively robust to different sources of noise.
The effect of the eyepiece displacement manifests itself as a change in the position of the FOV within the image frame. Similarly, the effect of the focal length of the lens coupler manifests itself through the size of the FOV. Therefore, both magnification and eyepiece displacement can be compensated for by parametrizing the FOV. Both visual inspection and objective assessment confirmed that the FOV can be approximated by a circle. Fortunately, very efficient algorithms have been developed for the parametrization of circular objects.28,29 Additionally, circles have a very well-defined and smooth topological shape, which makes the estimation of their parameters robust to noise. Therefore, the translation transformation (T) was defined so that the center of the new coordinate system coincides with the center of the FOV, and the radius of the FOV was used to account for the magnification effect, making the size of pixels constant. Flexible endoscopes have a fiducial marker on their distal end that remains fixed relative to the FOV and the target surface (Figure 3); this marker helps with determining the orientation during flexible endoscopy. The laser-projection optics are glued inside the surgical channel and their position relative to the fiducial marker is fixed; therefore, the position of the fiducial marker can be used as a reference for compensating the effect of the rotation of the endoscope within the lens coupler. In summary, Figure 3 shows a diagram of the model. Based on this model, the recorded image (I(x,y)) undergoes a series of transformations, including a translation, a rotation, and a scaling, and is converted into a new image (J(x′,y′)) in a standard coordinate system. In the translation phase, the center of the coordinate system is shifted to the center of the FOV. The rotation transformation brings the fiducial marker to a predetermined position (e.g., 0 degrees). Finally, the scaling transformation stretches or shrinks the FOV so that its radius equals a predetermined value r.
Figure 3.
Model for compensating the recording parameters of the system.
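To make the mapping of Equation 1 concrete, the following minimal sketch (in Python/NumPy, our illustration language of choice; the original implementation is not specified in this paper) applies the three transformations directly to detected laser-point coordinates. The standard radius `r_std` is a hypothetical predetermined constant.

```python
import numpy as np

def to_standard_frame(points, cx, cy, theta, r_meas, r_std=100.0):
    """Map laser-point coordinates into the standard coordinate system
    (Equation 1): translate the FOV center (cx, cy) to the origin, rotate
    by -theta so the fiducial marker lands at 0 degrees, and scale so the
    FOV radius equals the predetermined r_std."""
    p = np.asarray(points, dtype=float) - np.array([cx, cy])  # translation T
    c, s = np.cos(-theta), np.sin(-theta)
    R = np.array([[c, -s],
                  [s,  c]])                                   # rotation R
    k = r_std / r_meas                                        # scaling factor k
    return k * (p @ R.T)
```

Applying the same transformation to every frame of a recording normalizes all laser-point coordinates to the same coordinate system, regardless of the lens-coupler configuration.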
2.3.1.2. Automatic estimation of the mapping
Based on Equation 1, the mapping consists of three main transformations. As shown in Figure 3, the parameters of those transformations can be estimated from two morphological components of the image. That is, the center of the FOV and its radius are used for estimating the parameters of the translation and scaling transformations, and the angle (θ) between the horizontal line and the line connecting the center of the FOV to the fiducial marker determines the parameter of the rotation transformation. This section presents the algorithms and image-processing techniques that were used for finding these two important morphological landmarks.
First, the detection of the FOV is considered. The lighting channels of the endoscope provide illumination for the FOV, which is then trimmed by the field of view of the endoscope, leaving the pixels outside of the endoscopic circle quite dark. Therefore, it is possible to apply a thresholding technique and find a rough estimate of the FOV. However, any error in estimating the center and radius of the FOV would change the position of the laser points in the standard coordinate system, introducing an error in the estimated vertical distances. To devise a more robust approach, the FOV finder module incorporates an additional source of information. Assuming the noise and distortions have linear effects, the geometrical shape of the FOV remains intact. Therefore, combining the geometrical information with the illumination differences inside and outside of the FOV yields a robust method.
The FOV has a circular shape, and therefore the pixels on its boundary can be expressed using a precise mathematical equation. Let (x₀, y₀) denote the center of a circle with radius r; Equation 2 shows the locus of points on the perimeter of that circle:

$$(x - x_0)^2 + (y - y_0)^2 = r^2 \qquad (2)$$
The Hough transform is a popular technique in the computer-vision community; it was initially developed for the detection of lines and other analytically defined shapes (e.g., circles, ellipses)30 and was later extended to arbitrary shapes.31 Considering that the FOV has an analytically well-defined shape, the Hough transform can be used for capturing the geometrical information of the FOV. In summary, the FOV finder module consists of two steps. In the first step, a thresholding technique is applied to the gray-scale image. This step uses the difference between the intensity of pixels inside and outside of the FOV and converts the gray-scale image into a binary image. In the second step, the binary image is fed into the Hough transform algorithm, which finds the center and radius of the circle that best fits the binary image.
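A minimal sketch of this two-step FOV finder follows, assuming scikit-image is available; the Otsu threshold and the radius search window are illustrative assumptions rather than the exact choices made here.

```python
import numpy as np
from skimage.filters import threshold_otsu
from skimage.feature import canny
from skimage.transform import hough_circle, hough_circle_peaks

def find_fov(gray, radii=np.arange(80, 160)):
    """Estimate the FOV center and radius. Step 1: threshold the gray-scale
    frame (the FOV is bright, the surround dark). Step 2: fit a circle to
    the boundary of the binary mask with the circular Hough transform.
    `radii` is an assumed search range in pixels."""
    mask = gray > threshold_otsu(gray)       # intensity information
    edges = canny(mask.astype(float))        # boundary of the candidate FOV
    accum = hough_circle(edges, radii)       # geometric information
    _, cx, cy, r = hough_circle_peaks(accum, radii, total_num_peaks=1)
    return float(cx[0]), float(cy[0]), float(r[0])
```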
The second landmark in the image is the fiducial marker (Figure 3). The position of this landmark relative to the center of the FOV and the horizontal line determines the rotation-transformation parameter. In order to make its detection as accurate and robust as possible, two different sources of information were identified and combined. First, the fiducial marker is fabricated through a physical notch in the FOV; therefore, it is the most likely region outside the FOV to be bright, and hence the intensity of the pixels within the fiducial marker differs from that of other regions outside of the FOV. Second, the fiducial marker is attached to the exterior of the FOV, and therefore there is no need to check all pixels outside of the FOV. Using this spatial information removes some incorrect candidates and improves the performance of the fiducial finder module.
The fiducial finder module has two main steps. First, a torus mask centered at the center of the FOV, with inner radius r+1 and outer radius r+8, was applied to the image. This step incorporates the spatial information into the method. Next, a threshold was applied to the remaining pixels. Due to the imperfect circular shape of the FOV and the leakage of light outside the FOV, a very thin arc could be present at this stage. In order to remove such artifacts, the binary image underwent a morphological opening operation with a disk-shaped structuring element,32 which removed any shape with thickness less than 3 pixels. The final step is to quantify the position of the detected fiducial marker. The centroid of an object is relatively insensitive to noise and is therefore a robust estimate of the location of that object inside the image. Therefore, the centroid of the biggest connected element was computed as the location of the fiducial marker. Let B(x, y) denote a binary image of size m×n pixels. Equation 3 shows how its centroid (x̄, ȳ) can be computed:
$$\bar{x} = \frac{1}{|A|}\sum_{x=1}^{m}\sum_{y=1}^{n} x\, B(x,y), \qquad \bar{y} = \frac{1}{|A|}\sum_{x=1}^{m}\sum_{y=1}^{n} y\, B(x,y) \qquad (3)$$
where |A| denotes the area of image B, which can be computed from Equation 4:

$$|A| = \sum_{x=1}^{m}\sum_{y=1}^{n} B(x,y) \qquad (4)$$
After finding the centroid of the fiducial marker, the rotation angle is computed using Equation 5.
$$\theta = \tan^{-1}\!\left(\frac{\bar{y} - y_0}{\bar{x} - x_0}\right) \qquad (5)$$

where (x₀, y₀) is the center of the FOV.
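The steps above can be summarized in a short sketch (again scikit-image based; the binarization threshold is left as an input because its exact selection is not specified here):

```python
import numpy as np
from skimage.morphology import opening, disk
from skimage.measure import label, regionprops

def fiducial_angle(gray, cx, cy, r, thresh):
    """Locate the fiducial marker and return the rotation angle. A torus
    mask (inner radius r+1, outer radius r+8) encodes the spatial prior,
    the threshold encodes the intensity evidence, opening suppresses thin
    light-leakage arcs, and the centroid of the largest blob (Equations 3
    and 4) yields the angle relative to the FOV center (Equation 5)."""
    yy, xx = np.indices(gray.shape)
    dist = np.hypot(xx - cx, yy - cy)
    torus = (dist >= r + 1) & (dist <= r + 8)
    binary = (gray > thresh) & torus
    binary = opening(binary, disk(1))        # removes shapes < 3 px thick
    regions = regionprops(label(binary))
    if not regions:
        raise RuntimeError("fiducial marker not found")
    ybar, xbar = max(regions, key=lambda p: p.area).centroid
    return np.arctan2(ybar - cy, xbar - cx)  # theta of Equation 5
```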
2.3.2. Algorithm for distance estimation
After mapping a frame into the standard coordinate system, the position of each laser point in the new image (J(x′, y′)) depends only on the vertical distance between the distal tip of the endoscope and the target surface. This section presents the details of the algorithm for automatic detection of the laser points and for decoding the vertical distances from their positions.
2.3.2.1. Automatic detection of laser points
The accuracy of the laser-point detection module has a significant effect on the accuracy of the vertical distance estimation: any error in the detection of the laser points, or in the quantification of their positions, translates into vertical-distance inaccuracies. To devise a robust detection algorithm and an accurate calibration method, the characteristics of the projected laser points should be known. Figure 4 shows a frame from one of the laser recordings. As shown, the energy of the laser source is not uniformly divided between the laser points; the points in the middle are significantly brighter than the points in the periphery. Additionally, as shown in Figure 4(A,C), the row and column sums of the image indicate that the intensity of each laser point has a bell-shaped spatial distribution, with the highest intensity at the center followed by a fast decay toward the distal pixels.
Figure 4.
Plot of the laser points: (A) sum of pixel intensities along the rows; (B) the original image; and (C) sum of pixel intensities along the columns.
Different characteristics and sources of information were taken into account during the design of the laser detection module. First, the difference between the intensity of the laser points and the background was exploited through an adaptive thresholding approach; because the intensity of pixels is a function of the working distance, an adaptive approach was necessary. For that purpose, the histogram of pixel intensities was constructed, and the first bin was considered the black reference and discarded. The cumulative distribution function (CDF) of the logarithms of the remaining bins was estimated, and the intensity at which the CDF reached 0.4 was selected as the intensity threshold. Second, referring to Figures 4(A) and (C), a very large gradient magnitude around the laser points is expected; therefore, an adaptive thresholding approach was used to exploit this information as well. To that end, the histogram of the gradient magnitude of the image was constructed, and the value of the sixth bin was used as the gradient threshold. The two thresholds were applied to the image, followed by a morphological opening operation with a disk-shaped structuring element. At this point, every laser point is represented by a blob whose centroid can be computed. Because the intensities of the laser points have a bell-shaped spatial distribution, the weighted centroid was used instead. Third, the laser points should have circular shapes, but most of the time the extracted blobs lack this characteristic, and the weighted centroid would be affected by such artifacts. To remedy that, and to incorporate the morphological information of the laser points, a disk with a radius of 7 pixels was constructed around every centroid, and the final position of each laser point was computed as the weighted centroid of the pixels within that disk.
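A condensed sketch of this detection pipeline is given below. The 0.4 CDF level, the sixth gradient bin, and the 7-pixel refinement disk follow the description above; the histogram bin count and the opening structuring element are assumptions.

```python
import numpy as np
from scipy import ndimage

def detect_laser_points(img, refine_radius=7):
    """Adaptive intensity and gradient thresholds, morphological opening,
    then a refined weighted centroid inside a disk around each blob."""
    img = img.astype(float)
    # Intensity threshold: discard the first (black-reference) bin, then
    # take the value where the CDF of the log bin counts reaches 0.4.
    counts, edges = np.histogram(img, bins=64)
    logc = np.log1p(counts[1:])
    t_int = edges[1 + int(np.searchsorted(np.cumsum(logc) / logc.sum(), 0.4))]
    # Gradient threshold: the value of the sixth bin of the gradient histogram.
    gy, gx = np.gradient(img)
    gmag = np.hypot(gx, gy)
    t_grad = np.histogram(gmag, bins=64)[1][6]
    mask = (img > t_int) & (gmag > t_grad)
    mask = ndimage.binary_opening(mask, structure=np.ones((3, 3)))
    labels, n = ndimage.label(mask)
    yy, xx = np.indices(img.shape)
    points = []
    for cy, cx in ndimage.center_of_mass(img, labels, range(1, n + 1)):
        # Refinement: weighted centroid over a disk around the blob, which
        # restores the circular support the raw blob may lack.
        w = img * (np.hypot(xx - cx, yy - cy) <= refine_radius)
        points.append(((w * xx).sum() / w.sum(), (w * yy).sum() / w.sum()))
    return points  # (x, y) weighted centroids of the detected laser points
```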
2.3.2.2. Vertical distance decoding:
Figure 5 shows how the x-y coordinates of each laser point vary with the working distance. Based on this figure, each laser point travels along a unique and well-defined trajectory, and hence its position within that trajectory can be used for decoding the vertical distance from that point to the tip of the endoscope. Additionally, it is evident that each laser point has some idiosyncratic characteristics. As seen in Figure 5(B), the behaviors of the laser points differ: some travel along a line (almost) perpendicular to the x-axis, indicating very small variations in their x-coordinate, while others travel along non-linear trajectories and show significant variations in the x-coordinate. Interestingly, some of these points deflect to the right and some to the left. Considering that each laser point has a slightly different projection angle, these variations are to be expected. It would be desirable to have trajectories that vary along only one axis, but these observations show that such a characteristic cannot be achieved perfectly.
Figure 5.
Position of each laser point as a function of working distance, where each color shows a different laser point: (A) x-y coordinates as a function of working distance; (B) x-coordinate as a function of working distance; and (C) y-coordinate as a function of working distance.
In order to make the decoding process efficient and fast, the trajectory of each laser point was modeled using a function. Let (x_{ij}, y_{ij}) denote the position of laser point i at working distance j (1 ≤ i ≤ 49, 1 ≤ j ≤ 31). This point can be converted into the polar coordinate system using Equation 6:

$$r_{ij} = \sqrt{x_{ij}^2 + y_{ij}^2}, \qquad \theta_{ij} = \tan^{-1}\!\left(\frac{y_{ij}}{x_{ij}}\right) \qquad (6)$$
Now, the goal is to find a family of parametrized functions $f_\beta$ and proper parameters β such that, on average, the estimated distances ($\hat{d}_{ij}$) and the true distances ($d_{ij}$) are near each other based on a properly defined distance function ($L$). Equations 7 and 8 show these:

$$\hat{d}_{ij} = f_\beta(r_{ij}) \qquad (7)$$

$$\beta^{*} = \arg\min_{\beta} \sum_{j} L\!\left(d_{ij}, \hat{d}_{ij}\right) \qquad (8)$$
The parameter β can be determined using the optimization in Equation 8 and a set of data points (training phase). After that, the trained function can be used for decoding the working distances of new data points. As shown in Figure 5(B–C), different laser points follow relatively similar semi-exponential trajectories; thus, the same family of curves ($f_\beta$) was used for decoding. However, in order to capture the idiosyncratic characteristics of each trajectory, the training phase was carried out separately for each laser point; therefore, a total of 49 different curves were trained during this phase. Equation 9 shows the family of curves that was used:
$$f_{\beta}(r) = \beta_1 e^{\beta_2 r} + \beta_3 e^{\beta_4 r} \qquad (9)$$
Finally, normalization of the data points most often improves the performance of machine-learning algorithms.33 Let $m_i$ and $\sigma_i$ denote the mean and standard deviation of the radii of all laser points from trajectory i; Equation 10 shows the employed normalization process. The normalized values were then used for training.

$$\tilde{r}_{ij} = \frac{r_{ij} - m_i}{\sigma_i} \qquad (10)$$
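Under the reconstruction of Equations 9 and 10 given above (the two-term exponential form is our assumption; the text specifies only a semi-exponential family), the per-point training and decoding reduce to a few lines with SciPy:

```python
import numpy as np
from scipy.optimize import curve_fit

def f_beta(r, b1, b2, b3, b4):
    # Assumed two-term exponential family standing in for Equation 9.
    return b1 * np.exp(b2 * r) + b3 * np.exp(b4 * r)

def train_trajectory(x, y, d):
    """Training phase for one laser point: polar radius (Equation 6),
    normalization (Equation 10), then least-squares fitting, i.e.,
    Equation 8 with a squared-error loss."""
    radius = np.hypot(x, y)
    m, s = radius.mean(), radius.std()
    beta, _ = curve_fit(f_beta, (radius - m) / s, d,
                        p0=(1.0, -1.0, 1.0, 1.0), maxfev=20000)
    return m, s, beta

def decode_distance(x, y, m, s, beta):
    """Testing phase: map a new laser-point position to a distance."""
    return f_beta((np.hypot(x, y) - m) / s, *beta)
```

Training 49 such functions, one per trajectory, completes the calibration; decoding a frame then amounts to one function evaluation per detected laser point.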
3. Analysis and results
3.1. Evaluation of FOV and the fiducial finder modules
The performance of the methods for compensating the effect of the lens-coupler parameters was evaluated. Doing so requires a ground truth as a reference for comparison. To measure the performance of each module separately, the standard deviation of the estimated parameters within a recording was used as the evaluation criterion. To that end, the videos from the grid recordings were used. During those recordings, the configuration of the camera and endoscope was kept constant; therefore, the position of the FOV, the radius of the FOV, and the position of the fiducial marker should be the same for all of them. This observation was used for the objective evaluation of the implemented algorithms. For that purpose, each recording was divided into batches of 200 frames. The frames within each batch were averaged, and the results were fed into the algorithms for estimating the center of the FOV, the radius of the FOV, and the angle of the fiducial marker. Figure 6 shows the centralized distributions (the means were subtracted to make the plots more comparable) of the estimated parameters over all batches and recordings. As seen in these plots, the centralized probability density functions are concentrated around zero with very sharp peaks, supporting that the proposed FOV and fiducial finder modules are quite robust and have very stable performance.
Figure 6.
Distribution of the variability in the output of FOV and the fiducial finder modules. (A) Distribution of the centralized coordinates of FOV center. (B) Distribution of the centralized radius of FOV. (C) Distribution of the centralized fiducial angle.
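The batch-and-average evaluation used here, and again in the next subsection, can be sketched as follows, where `estimator` stands for any module that maps an averaged frame to parameters (FOV center and radius, fiducial angle, or laser-point coordinates):

```python
import numpy as np

def batch_stability(frames, estimator, batch_size=200):
    """Divide a recording into non-overlapping batches, average each batch
    to suppress additive noise, run `estimator` on every batch mean, and
    return the standard deviation of the estimates across batches."""
    n_batches = len(frames) // batch_size
    estimates = [estimator(np.mean(frames[b * batch_size:(b + 1) * batch_size],
                                   axis=0))
                 for b in range(n_batches)]
    return np.std(np.asarray(estimates, dtype=float), axis=0)
```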
3.2. Evaluation of the laser finder module
The performance of the laser detection module was evaluated using the videos from the laser recordings. To that end, each recording at a specific working distance was divided, without any overlap, into 11 batches of 200 frames. The frames within each batch were averaged to remove the effect of additive noise, and the positions of the laser points for each batch were estimated using the proposed algorithm. Because all batches were recorded at the same working distance, the estimated positions should show no variation in the ideal case. Therefore, the standard deviation of the (x,y) coordinates of each laser point over all batches can be used to evaluate the performance of the algorithm. The same approach was repeated for all working distances. The distribution of this evaluation criterion had a mean of 0.012 pixels and a standard deviation of 0.0224 pixels. Figure 7 shows the distribution of this evaluation metric. The probability density function is concentrated around a small value near zero with a very sharp peak, supporting that the employed approach for detection of laser points is quite robust and has very stable performance.
Figure 7.
Distribution of the variability in the output of the laser finder module.
3.3. Displacement analysis and vertical resolution of the system
The displacement of the laser points as the working distance was varied was analyzed next. To that end, the positions of all laser points were computed for all working distances, and the magnitude of displacement was plotted as a function of the variation in working distance. Figure 8(A) shows the magnitude of displacement when the working distance is changed from 35 mm to another target distance. This figure clearly shows a semi-exponential relationship between the working distance and the magnitude of displacement: at large working distances the displacement is small, but at small working distances the magnitude of displacement is much larger. To present this phenomenon better, the magnitude of displacement between two consecutive working distances was computed. In this fashion, the decrement in working distance is kept constant (around 1 mm) while the effect of different working distances is studied. Figure 8(B) shows the result. Clearly, at large working distances (>20 mm), reducing the working distance by 1 mm leads to a small variation in the position of the laser points, whereas at short working distances the same reduction produces a much larger variation. Considering that the variation in the position of the laser points captures the vertical displacement of the target surface, these analyses show that the vertical resolution of the endoscope is a function of the working distance: vertical movements can be measured with higher resolution at shorter working distances.
Figure 8.
Displacement analysis of the laser points as working distance is changing. (A) Magnitude of variation in position of the laser points as working distance is changing from 35 mm to a new distance. (B) Magnitude of variation in position of the laser points for 1 mm decrement at different working distances.
Figure 8 also shows that different laser points at the same working distance exhibit different behaviors. More specifically, some laser points show a higher magnitude of displacement, indicating higher sensitivity to variations in working distance. To determine whether those points bear a particular relationship to one another, another analysis was carried out: the average magnitude of displacement for a 1-mm decrement at different working distances (Figure 8(B)) was computed separately for each laser point, and the result was plotted. Figure 9 presents the employed indexing and the result. The result in Figure 9(B) is significant in several regards. First, the figure shows a specific, non-random pattern. Therefore, the variability seen in Figure 8 does not stem from the detection algorithm but is inherent to the optical characteristics of the system. Second, assuming that the square grid of 7×7 points is parallel to the x-y axes (Figure 9(A)), the points with the highest sensitivity to vertical displacement were in the three middle rows, and the first and last rows had the lowest sensitivity. Therefore, the best reconstruction of vertical movements is achieved if the target region is covered with laser points from the three middle rows.
Figure 9.
(A) indexing used in this paper. (B) Average magnitude of displacement of each laser point.
3.4. Evaluation of vertical distance measurements
The system was evaluated using two different criteria. First, goodness of fit of the functions during the training phase was analyzed. For that purpose, the values of root mean square error (RMSE) and adjusted r-squared were computed. Figure 10 shows these values for each individual function.
Figure 10.
(A) Root mean square error (RMSE) of the different trajectories in the training phase; (B) adjusted R-squared of the different trajectories in the training phase.
Figure 10 shows that the training error has peaks for RMSE and dips for adjusted r-squared for laser point indices {7, 14, 21, 28, 35, 42, 49}. These trajectories correspond to the top row of the projection pattern. Therefore, for these points, high values of error in the testing phase are expected. That is, the points on the top row would have higher vertical measurement error. Referring to the trajectories with the best performance, most of them were from middle rows of the projection pattern and hence those rows would have lower vertical measurement error. This last observation concurs with results shown in Figure 9.
Next, the performance of the system in the testing scenario was analyzed. To that end, the target surface was positioned at fifteen new working distances, and the positions of the laser points were recorded. After finding the x-y coordinates of the laser points using the approach described above, they were mapped into the polar coordinate system using Equation 6. The radius in the polar coordinate system was then fed as the input to all 49 trained functions, and each function returned an estimated vertical distance. Figure 11 shows a boxplot of the error at each working distance during the testing phase. Considering that the top row of the laser-projection pattern had quite different performance in the training phase (Figure 10), two scenarios are reported: first, results from all trajectories were used for finding the vertical distance (A); second, the functions corresponding to the top row of the projection pattern were excluded from the analysis (B).
Figure 11.
Boxplot of vertical measurement errors at different working distances. (A) Results from all functions. (B) Results when functions from the top row are discarded.
Referring to Figure 11, some observations can be made. First, the estimation error at short working distances (<20 mm) is much lower than at large working distances. This observation agrees with the results and discussion of section 3.3 and Figure 8(B). Considering the working distance of flexible endoscopy and the fact that these endoscopes can get close to the target tissue, this characteristic can be utilized very efficiently during the examination; as a rule of thumb, the proximity of the endoscope to the target tissue can be ensured by filling the image with the tissue of interest. Second, when the points on the top row are discarded, the estimation error is reduced considerably. Finally, Table 2 reports the measurement error for each working distance, comparing the mean of the estimation error for the whole pattern (averaged over all 49 functions) to the case when the top row is discarded (averaged over 42 functions). This value is significant because, if a flat and horizontal target surface can be assumed, averaging multiple measurements removes a significant amount of error from the measurements. It also provides the lower (upper) bound on the error (accuracy) of the measurements from the device. Additionally, the mean percent error (mPE), defined as the mean of the absolute error divided by the working distance, and the maximum percent error (MPE), defined as the maximum of the absolute error divided by the working distance, are also computed and reported.
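For reference, the error statistics reported in Table 2 can be computed as in this small sketch:

```python
import numpy as np

def error_stats(estimates, true_dist):
    """Mean signed error (mm), mean percent error (mPE), and maximum
    percent error (MPE) for one working distance."""
    err = np.asarray(estimates, dtype=float) - true_dist
    pe = np.abs(err) / true_dist * 100.0
    return err.mean(), pe.mean(), pe.max()
```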
Table 2.
Statistics of the measurement error. The working distances and mean errors are in mm; mPE and MPE are percentages. The number in parentheses signifies the number of functions used in the measurements
| Dist. | mean (49) | mPE (49) | MPE (49) | mean (42) | mPE (42) | MPE (42) |
|---|---|---|---|---|---|---|
| 5.77 | −0.43 | 10.4% | 141.1% | 0.04 | 1.7% | 5.7% |
| 7.95 | 0.17 | 3.4% | 21.3% | 0.02 | 1.7% | 5.6% |
| 9.54 | 0.19 | 3.1% | 32.5% | 0.03 | 1.5% | 5.8% |
| 11.87 | 0.21 | 2.6% | 25.2% | 0.06 | 1.2% | 6.4% |
| 13.38 | −0.03 | 2.1% | 25.9% | −0.09 | 1.3% | 6.8% |
| 15.39 | 0.76 | 4.9% | 20.6% | 0.57 | 3.7% | 10.2% |
| 16.92 | 0.38 | 2.8% | 14.5% | 0.33 | 2.2% | 5.3% |
| 18.41 | −0.42 | 3.1% | 16.9% | −0.15 | 1.8% | 6.6% |
| 20.3 | −0.84 | 5.5% | 30.8% | −0.34 | 3.3% | 13.6% |
| 21.69 | −0.78 | 4% | 14.8% | −0.6 | 3.2% | 14.8% |
| 23.16 | −1.3 | 5.6% | 20.7% | −0.99 | 4.3% | 13.8% |
| 25.34 | −0.4 | 2.7% | 8.8% | −0.43 | 2.2% | 7.2% |
| 27.06 | 0.51 | 3.3% | 9.3% | 0.66 | 3.2% | 9.3% |
| 28.69 | 1.13 | 4.1% | 13.8% | 0.9 | 3.3% | 10.8% |
| 30.14 | 1.53 | 5.1% | 12.7% | 1.41 | 4.7% | 12% |
4. Discussion
Speech and voice are the outcome of an intricate collaborative function between different systems of the body. The pulmonary system provides the driving force for the voice and speech production system, and its effect can be measured on a calibrated scale using airflow and air-pressure measurements, which are used for modeling the underlying mechanisms. On the output side, the intensity of the acoustic signal can also be measured on a calibrated scale using the sound pressure level. The methodology and the required instrumentation for performing these measurements have been available to researchers for a long time.34 One of the remaining pieces for developing a comprehensive model of voice and speech production is performing kinematic measurements of the vocal folds and their vibratory pattern on a calibrated scale. Having access to a device with absolute measurement capabilities along the horizontal and vertical planes would address this gap. Additionally, personalized medicine35 and patient-specific modeling36 are topics of high importance to medicine, because they allow the differences between individuals to be taken into account during diagnosis and treatment. In patient-specific modeling, such differences can be fed into computational models for improving the diagnosis and treatment of patients by making better predictions about the outcome of different therapeutic options and surgeries.36 Considering that most current patient-specific modeling approaches rely on the geometry of the tissues derived from 3D imaging techniques, instrumentation with absolute measurement and 3D reconstruction capabilities would be beneficial for developing patient-specific models for populations with voice disorders. Finally, imaging techniques with absolute measurement capabilities can significantly enhance evidence-based practice, an important clinical topic in all fields, including laryngology and speech-language pathology.37 More specifically, the ability to perform absolute measurements on tissues and to reconstruct the 3D vibratory patterns of the vocal folds would provide researchers and clinicians with the means for measuring the size of lesions and performing quantitative analysis of the kinematics of the vocal folds. This information can be obtained before and after therapy, and the comparison between the two would allow the efficacy of the therapy to be evaluated. Other important clinical applications of an imaging system with calibrated measurement capabilities include studying the developmental aspects of the laryngeal tissues and the resulting changes in vocal fold vibration5, and more accurate staging and grading of relevant laryngeal disease.6
This paper provides a detailed analysis of the calibration characteristics and procedures, which is the first step toward developing an accurate instrument allowing absolute measurements of vocal fold vibratory kinematics. Achieving the above-mentioned goals depends on a software solution that performs several additional tasks. From the end-user perspective, the laser points should first be detected and tracked in in-vivo recordings. This module should efficiently handle the non-uniform intensity of the laser points, the high-intensity reflection points in the recorded images, and the non-uniform reflections of the tissues. A second module would then take the estimated positions of the laser points as input and perform the required measurements and the reconstruction of the 3D vibratory pattern of the vocal folds. Establishing the relationship between the positions of the laser points and the target measurements is the prerequisite for this second module. This process is known as calibration, and calibration along the vertical dimension was the focus of this article. To that end, an automatic modular solution was proposed for performing vertical calibration. The modular solution breaks the system into different components and has several important advantages. It makes objective analysis of each module possible, in that different sources of error can be distinguished and each of them can be quantified separately. It also provides flexibility in the design, where each module may be replaced independently with a better solution in the future. Another feature of the proposed calibration method is its data-science approach. Considering that each laser point has idiosyncratic characteristics, and that the manufacturing of each endoscope and the different endoscopic brands introduce additional differences, this approach adds significant flexibility to the system. In that regard, the calibration system was designed based on a set of parameters (a translation vector, a rotation matrix, a scaling factor, and the parameters of the decoding functions $f_\beta$), where the parameters of $f_\beta$ are determined separately for each endoscope and the remaining parameters are computed per recording. Another distinctive feature of the data-science approach is its robustness to measurement error. More specifically, all measurements have some inherent error, and using statistical learning approaches can remove the random error component and hence improve the performance of the system; this feature improves as the number of training samples increases.
5. Conclusion
The ability to provide absolute calibrated measurements and to estimate the vertical vibratory pattern of the vocal folds would further advance the kinematic and aerodynamic modeling of voice production, enabling new clinically significant research approaches, such as patient-specific modeling and the study of laryngeal development. With these goals in mind, this paper presented an automatic and modular approach for the calibration of a newly developed transnasal fiberoptic endoscope with absolute measurement and 3D reconstruction capabilities. This was achieved by mapping the recorded image into a standard and fixed coordinate system, where the position of the laser points is independent of the lens-coupler parameters, such as the magnification of the camera, the rotation of the endoscope relative to the camera, and the displacement of the endoscope within the lens coupler. Consequently, the position of the laser points in this new coordinate system is only a function of the working distance. The analysis showed that each laser point travels along a unique and deterministic trajectory, making efficient decoding of the vertical distance possible. The decoder was implemented based on statistical learning techniques, where a different function was trained for each trajectory. The trained function produces the estimated vertical distance for a given input. Each module of the system was tested separately, and the results were satisfactory. The system was able to measure absolute vertical distance with a mean percent error varying from 1.7% to 4.7%, depending on the working distance.
Supplementary Material
Video S1 presents a simulation of the principle behind the laser-projection system and how the vertical distance is encoded. In all images, the blue diamonds denote the positions of the laser points when the target surface is flat and equidistant from the camera, whereas the red circles denote the actual positions of the laser points as the target surface gets deformed. The image on the left shows the actual 3D structures of the target surface superimposed with the laser points. The image on the right shows the 2D images that the camera would capture. Two important observations can be made from these images. First, the direction of the displacement encodes whether the distance to the camera is increasing or decreasing. Second, the magnitude of the displacement encodes the changes along the vertical direction.
Acknowledgments:
Funding provided by the Michigan State University Foundation, the Voice Health Institute, and the National Institutes of Health (NIH) National Institute on Deafness and Other Communication Disorders (Grant P50 DC015446). The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health. The authors acknowledge the contributions of Drs. Milen Shishkov and Brett Bouma from the Wellman Center for Photomedicine, Massachusetts General Hospital, Harvard Medical School, Boston, MA, who designed and developed the laser-projection system for the transnasal fiberoptic endoscope and documented the system’s technical specifications.
Footnotes
Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
A portion of this study has been accepted for presentation at the 48th Symposium of The Voice Foundation: Care of the Professional Voice, Philadelphia, PA, May 29–June 2, 2019.
References
1. Kalman RE. Mathematical description of linear dynamical systems. J Soc Ind Appl Math Ser A Control. 1963;1(2):152–192.
2. Patel RR, Awan SN, Barkmeier-Kraemer J, et al. Recommended Protocols for Instrumental Assessment of Voice: American Speech-Language-Hearing Association Expert Panel to Develop a Protocol for Instrumental Assessment of Vocal Function. Am J Speech-Language Pathol. 2018:1–19.
3. Hunter EJ, Titze IR, Alipour F. A three-dimensional model of vocal fold abduction/adduction. J Acoust Soc Am. 2004;115(4):1747–1759.
4. Thomson SL, Mongeau L, Frankel SH. Aerodynamic transfer of energy to the vocal folds. J Acoust Soc Am. 2005;118(3):1689–1700.
5. Patel RR, Donohue KD, Lau D, Unnikrishnan H. In vivo measurement of pediatric vocal fold motion using structured light laser projection. J Voice. 2013;27(4):463–472.
6. Kobler JB, Rosen DI, Burns JA, et al. Comparison of a flexible laryngoscope with calibrated sizing function to intraoperative measurements. Ann Otol Rhinol Laryngol. 2006;115(10):733–740.
7. Herzon GD, Zealear DL. New laser ruler instrument for making measurements through an endoscope. Otolaryngol Head Neck Surg. 1997;116(6):689–692.
8. Manneberg G, Hertegård S, Liljencrantz J. Measurement of human vocal fold vibrations with laser triangulation. Opt Eng. 2001;40(9):2041–2045.
9. Schuberth S, Hoppe U, Döllinger M, Lohscheller J, Eysholdt U. High-precision measurement of the vocal fold length and vibratory amplitudes. Laryngoscope. 2002;112(6):1043–1049.
10. Larsson H, Hertegård S. Calibration of high-speed imaging by laser triangulation. Logoped Phoniatr Vocol. 2004;29(4):154–161.
11. Schade G, Leuwer R, Kraas M, Rassow B, Hess MM. Laryngeal morphometry with a new laser "clip on" device. Lasers Surg Med. 2004;34(5):363–367.
12. George NA, de Mul FFM, Qiu Q, Rakhorst G, Schutte HK. New laryngoscope for quantitative high-speed imaging of human vocal folds vibration in the horizontal and vertical direction. J Biomed Opt. 2008;13(6):064024.
13. Wurzbacher T, Voigt I, Schwarz R, et al. Calibration of laryngeal endoscopic high-speed image sequences by an automated detection of parallel laser line projections. Med Image Anal. 2008;12(3):300–317.
14. Luegmair G, Kniesburges S, Zimmermann M, Sutor A, Eysholdt U, Döllinger M. Optical reconstruction of high-speed surface dynamics in an uncontrollable environment. IEEE Trans Med Imaging. 2010;29(12):1979–1991.
15. Semmler M, Kniesburges S, Birk V, Ziethe A, Patel R, Döllinger M. 3D reconstruction of human laryngeal dynamics based on endoscopic high-speed recordings. IEEE Trans Med Imaging. 2016;35(7):1615–1624.
16. Luegmair G, Mehta DD, Kobler JB, Döllinger M. Three-Dimensional Optical Reconstruction of Vocal Fold Kinematics Using High-Speed Video With a Laser Projection System. IEEE Trans Med Imaging. 2015;34(12):2572–2582.
17. Ji Z, Leu M-C. Design of optical triangulation devices. Opt Laser Technol. 1989;21(5):339–341.
18. Deliyski DD, Petrushev PP, Bonilha HS, Gerlach TT, Martin-Harris B, Hillman RE. Clinical implementation of laryngeal high-speed videoendoscopy: Challenges and evolution. Folia Phoniatr Logop. 2008;60(1):33–44.
19. Mehta DD, Hillman RE. Current role of stroboscopy in laryngeal imaging. Curr Opin Otolaryngol Head Neck Surg. 2012;20(6):429.
20. Hillman RE, Mehta DD. The science of stroboscopic imaging. In: Kendall KA, Leonard RJ, eds. Laryngeal Evaluation: Indirect Laryngoscopy to High-Speed Digital Imaging. New York, NY: Thieme; 2010:101–109.
21. Deliyski D. Laryngeal high-speed videoendoscopy. In: Kendall KA, Leonard RJ, eds. Laryngeal Evaluation: Indirect Laryngoscopy to High-Speed Digital Imaging. New York, NY: Thieme Medical; 2010:245–270.
22. Powell M, Deliyski DD, Zeitels SM, et al. Efficacy of videostroboscopy and high-speed videoendoscopy to measure change in vocal-fold vibratory function before and after phonomicrosurgery in patients with mass lesions. J Voice (in press).
23. Naghibolhosseini M, Deliyski DD, Zacharias SRC, de Alarcon A, Orlikoff RF. Temporal Segmentation for Laryngeal High-Speed Videoendoscopy in Connected Speech. J Voice. 2018;32(2):256.e1.
24. Kobler JB, Zeitels SM, Hillman RE, Kuo J. Assessment of vocal function using simultaneous aerodynamic and calibrated videostroboscopic measures. Ann Otol Rhinol Laryngol. 1998;107(6):477–485.
25. Mehta DD, Deliyski DD, Zeitels SM, Zañartu M, Hillman RE. Integration of transnasal fiberoptic high-speed videoendoscopy with time-synchronized recordings of vocal function. ePhonoscope. 2015:105–114.
26. Zañartu M, Mehta DD, Ho JC, Wodicka GR, Hillman RE. Observation and analysis of in vivo vocal fold tissue instabilities produced by nonlinear source-filter coupling: a case study. J Acoust Soc Am. 2011;129(1):326–339.
27. Bayer BE. Color imaging array. US Patent 3,971,065; 1976.
28. Atherton TJ, Kerbyson DJ. Size invariant circle detection. Image Vis Comput. 1999;17(11):795–803.
29. Yuen HK, Princen J, Illingworth J, Kittler J. Comparative study of Hough transform methods for circle finding. Image Vis Comput. 1990;8(1):71–77.
30. Duda RO, Hart PE. Use of the Hough Transformation to Detect Lines and Curves in Pictures; 1971.
31. Ballard DH. Generalizing the Hough transform to detect arbitrary shapes. Pattern Recognit. 1981;13(2):111–122.
32. Dougherty ER, Lotufo RA. Hands-on Morphological Image Processing. Vol 59. SPIE Press; 2003.
33. Ghasemzadeh H, Tajik Khass M, Khalil Arjmandi M, Pooyan M. Detection of vocal disorders based on phase space parameters and Lyapunov spectrum. Biomed Signal Process Control. 2015;22:135–145. doi:10.1016/j.bspc.2015.07.002
34. Baken RJ, Orlikoff RF. Clinical Measurement of Speech and Voice. Cengage Learning; 2000.
35. Hamburg MA, Collins FS. The path to personalized medicine. N Engl J Med. 2010;363(4):301–304.
36. Neal ML, Kerckhoffs R. Current progress in patient-specific modeling. Brief Bioinform. 2009;11(1):111–126.
37. Roy N, Barkmeier-Kraemer J, Eadie T, et al. Evidence-based clinical voice assessment: a systematic review. Am J Speech-Language Pathol. 2013.