Abstract
Accurately imaging the spatial distribution of longitudinal speed of sound (SoS) has a profound impact on image quality and the diagnostic value of ultrasound. Knowledge of SoS distribution allows effective aberration correction to improve image quality. SoS imaging also provides a new contrast mechanism to facilitate disease diagnosis. However, SoS imaging is challenging in the pulse-echo mode. Deep learning (DL) is a promising approach for pulse-echo SoS imaging, which may yield more accurate results than pure physics-based approaches. Herein, we developed a robust DL approach for SoS imaging that learns the nonlinear mapping between measured time shifts and the underlying SoS without being subject to the constraints of a specific forward model. Various strategies were adopted to enhance model performance. Time-shift maps were computed by adopting a common mid-angle configuration from the non-DL literature, normalizing complex beamformed ultrasound data, and accounting for depth-dependent frequency when converting phase shifts to time shifts. The structural similarity index measure (SSIM) was incorporated into the loss function to learn the global structure for SoS imaging. A two-stage training strategy was employed, leveraging computationally efficient ray-tracing synthesis for extensive pretraining, and more realistic but computationally expensive full-wave simulations for fine-tuning. Using these combined strategies, our model was shown to be robust and generalizable across different conditions. The simulation-trained model successfully reconstructed the SoS maps of phantoms using experimental data. Compared with the physics-based inversion approach, our method improved reconstruction accuracy and contrast-to-noise ratio in phantom experiments. These results demonstrated the accuracy and robustness of our approach.
Index Terms: Speed of sound imaging, Pulse echo, Deep learning, Time shift
I. INTRODUCTION
Accurate imaging of the spatial distribution of the longitudinal speed of sound (SoS) holds substantial value in enhancing both image quality and diagnostic effectiveness in medical ultrasound [1]–[7]. Knowledge of the SoS distribution enables effective aberration correction, significantly improving the overall quality of ultrasound images [3], [4]. SoS is also an important material property that changes with disease processes, as in liver steatosis [5], [6] and muscle loss [7]. Clinical success has particularly been demonstrated using transmission-based ultrasound computed tomography (USCT) and with high resolution [1], [2], [8]. However, USCT requires a rigid configuration of transducer elements surrounding the organ to be imaged and often requires a water bath [1], [2], [8], [9]. This limits the use of USCT primarily to breasts and demands a specialized heavy system. There is a need to develop a SoS imaging approach in the pulse-echo mode that can be performed from a single side of the human body [10]–[12]. Such a pulse-echo-based approach not only enables imaging of organs inside the human body, but also enhances accessibility, mobility, and cost-effectiveness through compatibility with hand-held transducer probes.
To pursue SoS imaging in the pulse-echo mode, various methods based on physics modeling have been proposed. Assuming a homogeneous SoS distribution, investigators developed methods that estimate the SoS value by maximizing the signal coherency of pulse echoes [6], [13], [14]. While these methods show promise in liver fat quantification [6], they do not provide the capability of resolving inhomogeneous SoS distribution. To relax the homogeneity assumption, methods were developed in [15]–[17] that consider an imaged medium with horizontal layering. Furthermore, methods aiming at an arbitrary SoS distribution were developed in [11], [18], [19], where SoS distribution is calculated from the apparent shifts of the echo signals between two different wave propagation directions. The apparent shifts are typically computed as phase shifts based on radiofrequency (RF) data [11], [18], [20] or speckle displacements based on B-Mode images [12], [21], [22]. These methods invert a linear model that relates apparent shifts to tissue SoS, assuming straight rays. The solution to this linear model is achieved with frequency-domain reconstruction [11], [23], spatial-domain reconstruction [12], [18], [24], and guidance of manually segmented B-mode images [20]. Adaptations have been explored to accommodate convex probes [25], [26] and diverging waves [21], [22]. These methods offer a solution facilitated by the straight ray assumption while neglecting the diffraction and refraction caused by SoS variation. Effects caused during signal processing, including beamforming and phase computation, are also yet to be considered. It is conceivable to derive a more accurate model to improve SoS reconstruction by considering further details about the wave physics; however, the forward model would be challenging to build due to the complexity of physics involved, and the inverse problem would be time consuming to solve as the model complexity increases. 
The design of an accurate forward model is further challenged by potentially unknown factors contributing to the model.
Deep learning (DL) has been investigated as an alternative approach for SoS imaging [10], [27]–[30]. Supported by the universal approximation theorem, DL has the potential to reconstruct SoS more accurately than physics-based non-DL approaches, as deep neural networks may be able to implicitly model the complex physics as well as factors that are not considered in an explicit non-DL inversion model. Moreover, DL inference is fast, so the real-time imaging requirement can still be met while the complex physics is modeled implicitly. Several network architectures have been explored for SoS imaging, including auto-encoder [10], [29], linked auto-encoder [28], generative adversarial network [30], and variational network [27].
While DL methods have promising potential to achieve high SoS reconstruction accuracy, supervised training of DL models is challenged by the lack of a large amount of training data with ground truth SoS images. Simulations are often required to generate training data. However, the realistic simulations of ultrasound RF signals are extremely challenging due to the dependence of RF signals on numerous factors such as ultrasound system settings (e.g., center frequency and transducer response function) and tissue properties (e.g., attenuation). Therefore, DL models that use RF data (e.g., channel data) as the input face challenges in generalizability. Instead of using the RF data as the input, Bernhard et al. [27] used the speckle displacements (a form of apparent signal shifts) as the input to the DL model, which effectively reduced the gap between simulations and real data because the speckle displacement is less dependent on the system settings than the RF signals. Their approach also employed multidomain simulations to further reduce the gap between simulations and real data. However, their approach still relies on an explicit physics-based forward model, and variational networks were used to numerically solve the inverse optimization of a given straight-ray-based forward model. Specifically, variational networks were used to unroll the iteration loops of gradient descent optimization for the inverse problem, and an explicit forward model transforming the SoS to the speckle displacements was required for the initialization of loop unrolling and computation at each layer of the variational network.
Inspired by the prior progress in the field discussed above, herein we develop a robust DL approach for SoS imaging that does not require an explicit physics-based forward model during inference, thus avoiding potential limitations posed by explicit modeling assumptions. Unlike prior work that formulated SoS imaging as an inverse optimization problem and constructed a network to learn the individual gradient descent steps of the optimization [27], we reformulate the problem as an image-to-image translation DL problem that directly learns the mapping between the measured time shifts (a form of apparent shifts) and the SoS without an explicit forward model. We demonstrate that the popular U-Net architecture [31] originally developed for image segmentation is effective for converting time-shift maps to SoS maps. With this DL formulation, we incorporated the structural similarity index measure (SSIM) [32] into the loss function to help preserve structural information in the SoS images. The training process is split into pretraining with computationally efficient ray-tracing-based data to extensively pretrain the model, and fine-tuning with more realistic full-wave simulation data to finalize the model.
In the context of DL, we incorporated and extended the advances from non-DL SoS imaging. Specifically, we transform raw echo data into time-shift maps, which is demonstrated to improve the robustness of the DL model. To ensure high quality input into the DL model, we adopt from the non-DL literature the common mid-angle (CMA) configuration [18] to enhance the quality of time-shift computation. To account for frequency downshift with depth, we consider the depth-dependent frequency when converting phase shifts to time shifts. Additionally, we normalize the RF data during phase shift computation to reduce the influence of large local amplitude variations (e.g., specular reflections). The effectiveness of our approach is demonstrated in simulation and phantom experiments. The implementation of our method is publicly available at: https://github.com/haotian-c/SoS_time_shift_DL.
II. METHODOLOGY
The overview of our proposed SoS imaging approach is illustrated in Fig. 1. The ultrasound transducer transmits a series of plane waves to insonify the medium from various angles. Upon receiving channel data, CMA beamforming is performed to obtain frames of beamformed RF data, where the phase shifts in the RF signals between different transmit (TX) angles are computed and translated to time shifts that consider the depth dependence of center frequency due to attenuation. The time-shift maps contain rich information on the SoS distribution and serve as the input to a U-Net model that learns the mapping between the measured time-shift maps and the underlying SoS.
Fig. 1.
Schematic diagram of the proposed method for SoS imaging: Channel data are received after transmitting a series of plane waves of various steering angles. The data undergo common mid-angle delay and sum (DAS) beamforming and phase-shift computation to generate time-shift maps, which serve as input to a U-Net model that learns to map the measured time-shift maps to the underlying SoS. The model is pretrained using ray-tracing synthesized data and fine-tuned using full-wave simulation data, and the loss function combines information from both the mean square error (MSE) and the structural similarity index measure (SSIM).
A. Time-shift maps as input
Time-shift maps are used as DL input in our approach to enhance DL generalizability and robustness. This choice was adopted because non-DL SoS approaches [18], [21], [22], [24], [26] have shown that such apparent shift data offer a measure of aberration that is independent of the waveform data, and DL approaches are expected to benefit from the same feature. In this subsection we summarize the theory related to time-shift maps and SoS imaging.
In principle, SoS distribution determines the travel time of sound waves and is therefore reflected in certain time-domain measurements. Consider a scatterer at location $\mathbf{r}$ and an acoustic pulse that takes a round-trip path $P$ to reach the scatterer and return. The total time of flight $t(\mathbf{r})$ can be expressed as [18]:
$$ t(\mathbf{r}) = \int_{P} \frac{1}{c(\mathbf{r}')} \, dl \tag{1} $$
where $\mathbf{r}'$ denotes a location on the path $P$, $c(\mathbf{r}')$ denotes the SoS at that location, and $dl$ is an infinitesimal line element along the path. Following convention in the SoS literature, we define slowness $\sigma = 1/c$. Equation (1) shows that $t$ is location dependent. However, the received channel data are a mixture of backscattered signals from various locations. To relate signals to specific locations, delay-and-sum (DAS) beamforming is often performed with an assumed SoS value $c_0$. When the actual SoS deviates from this assumed value, the true time of flight for a scatterer at location $\mathbf{r}$ deviates from the expected time of flight computed using $c_0$. This difference, known as the aberration delay $\tau(\mathbf{r})$, is given by [18]:
$$ \tau(\mathbf{r}) = \int_{P} \Delta\sigma(\mathbf{r}') \, dl \tag{2} $$
where $\Delta\sigma = \sigma - \sigma_0$ is the slowness deviation, which represents the deviation of slowness from the assumed value $\sigma_0 = 1/c_0$.
The aberration delay depends on the specific propagation path. As a result, an apparent local phase shift may be observed between a pair of beamformed RF signals obtained from different paths, for example, signals with different TX angles (defined during transmission) or different receive (RX) angles (defined during beamforming). The measured phase shift is related to the SoS by [18]:
| (3) |
where is the center frequency estimated from echo data, and are the TX and RX angles of the -th propagation path , and , as shown in Fig. 2. The denominator accounts for the angle between each propagation path and the mid angle, assuming both round-trip paths share the same mid angle.
Fig. 2.
Illustration of the physics modeling that calculates aberration delay from pulse propagation paths and considers the projection angle.
Equation (3) shows that the measured phase shift is dependent on the center frequency $f_c$. Under the same SoS distribution, a higher center frequency leads to faster signal oscillation, causing a larger phase shift. To avoid this issue, we convert the phase shift to time shift. Additionally, $f_c$ exhibits a depth-dependent downshift due to attenuation, which is considered in the phase-to-time conversion in our approach, where the time shift $\Delta t$ is computed from the phase shift $\Delta\phi$ using the center frequency $f_c(z)$ at depth $z$ by:
$$ \Delta t = \frac{\Delta\phi}{2\pi f_c(z)} \tag{4} $$
In this study, the center frequency was estimated from RF data beamformed with a mid-angle of . This mid-angle produces axial oscillations in the beamformed RF data, facilitating accurate frequency estimation along each axial line. A sliding window was applied along the depth, and the frequency spectrum within each window was computed using fast Fourier transform (FFT). The center frequency at each depth was computed as the spectral centroid of the windowed signal.
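The frequency estimation and phase-to-time conversion described above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function names, the Hann taper, the use of the power spectrum for the centroid, the window length, the hop size, and the synthetic chirp test signal are all assumptions.

```python
import numpy as np

def centroid_frequency(rf_line, fs, win_len, hop):
    """Spectral-centroid frequency (Hz) in sliding windows along one axial RF line."""
    starts = np.arange(0, len(rf_line) - win_len + 1, hop)
    freqs = np.fft.rfftfreq(win_len, d=1.0 / fs)
    taper = np.hanning(win_len)  # assumed taper to limit spectral leakage
    fc = np.empty(len(starts))
    for k, s in enumerate(starts):
        power = np.abs(np.fft.rfft(rf_line[s:s + win_len] * taper)) ** 2
        fc[k] = np.sum(freqs * power) / np.sum(power)
    centers = starts + win_len // 2  # sample index at the middle of each window
    return centers, fc

def phase_to_time(phase_shift, fc_at_depth):
    """Eq. (4)-style conversion: time shift = phase shift / (2*pi*f_c(z))."""
    return phase_shift / (2.0 * np.pi * fc_at_depth)

# Synthetic axial line whose frequency drops from 5.2 MHz to 4.2 MHz with depth.
fs = 40e6                      # assumed sampling rate
n = 4096
f_inst = np.linspace(5.2e6, 4.2e6, n)
rf = np.cos(2.0 * np.pi * np.cumsum(f_inst) / fs)

centers, fc = centroid_frequency(rf, fs, win_len=256, hop=128)
fc_per_sample = np.interp(np.arange(n), centers, fc)

# The same phase shift maps to a larger time shift where the frequency is lower.
dt = phase_to_time(np.full(n, 0.1), fc_per_sample)
```

With a fixed center frequency, the deeper samples in this toy example would be assigned time shifts that are too small, which is the bias the depth-dependent conversion is meant to avoid.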
The task of SoS imaging involves reconstructing SoS values from these time-shift values. This reconstruction is typically achieved by inverting the forward model (Equation (3)) and assuming a straight-ray propagation path for and , which is described as follows.
Ray-tracing modeling:
Under the straight-ray assumption, time shift can be modeled as a linear summation of slowness deviation along a straight ray path. With this linearity, the relationship between time shift and slowness deviation can be expressed as a matrix multiplication [18]:
$$ \Delta\mathbf{t} = \mathbf{L}\,\Delta\boldsymbol{\sigma} \tag{5} $$
where vector $\Delta\mathbf{t}$ contains a total of $M$ vectorized time-shift maps, with each map having $N$ pixels. Matrix $\mathbf{L}$ describes a forward model that relates the measured $\Delta\mathbf{t}$ to the underlying $\Delta\boldsymbol{\sigma}$, which denotes the vectorized slowness deviation map. $\Delta\boldsymbol{\sigma}$ can be obtained by inverting the forward model with regularizations. The SoS map is obtained by reshaping $\Delta\boldsymbol{\sigma}$ and converting from slowness deviation values to SoS values.
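To make the linear structure of Eq. (5) concrete, the following sketch builds a small forward matrix and inverts it. It is a deliberately simplified toy and not the forward model of [18]: each row integrates slowness deviation along a one-way straight ray with nearest-neighbor sampling, a time shift is taken as the difference between two ray angles, and the inversion uses plain Tikhonov (ridge) regularization. The grid size, the angle pairs, the units (mm and µs), and all function names are assumptions.

```python
import numpy as np

def ray_matrix(nz, nx, dz, angle_deg):
    """Each row integrates slowness deviation along a straight ray from the surface to one pixel."""
    tan_a = np.tan(np.deg2rad(angle_deg))
    step = dz / np.cos(np.deg2rad(angle_deg))  # path length per depth row
    A = np.zeros((nz * nx, nz * nx))
    for iz in range(nz):
        for ix in range(nx):
            for k in range(iz + 1):
                jx = int(round(ix + (k - iz) * tan_a))  # nearest lateral sample at row k
                if 0 <= jx < nx:                        # rays leaving the grid are truncated
                    A[iz * nx + ix, k * nx + jx] += step
    return A

def forward_matrix(nz, nx, dz, angle_pairs):
    """Stack one block per angle pair: time shift = delay(angle a) - delay(angle b)."""
    return np.vstack([ray_matrix(nz, nx, dz, a) - ray_matrix(nz, nx, dz, b)
                      for a, b in angle_pairs])

def invert_tikhonov(L, dt, lam):
    """Ridge-regularized least squares: argmin ||L x - dt||^2 + lam * ||x||^2."""
    return np.linalg.solve(L.T @ L + lam * np.eye(L.shape[1]), L.T @ dt)

nz, nx, dz = 12, 12, 0.5                      # grid size and pixel spacing in mm (assumed)
L = forward_matrix(nz, nx, dz, [(-10.0, 0.0), (0.0, 10.0), (-10.0, 10.0)])

c0 = 1.540                                     # assumed beamforming SoS in mm/us
c_true = np.full((nz, nx), c0)
c_true[4:8, 4:8] = 1.580                       # square inclusion
dsigma = (1.0 / c_true - 1.0 / c0).ravel()     # vectorized slowness deviation

dt = L @ dsigma                                # synthesized time shifts (us)
dsigma_hat = invert_tikhonov(L, dt, lam=1e-6)
c_hat = (1.0 / (dsigma_hat + 1.0 / c0)).reshape(nz, nx)
```

Because the operator is a plain matrix, synthesizing a time-shift map costs one matrix-vector product, which is what makes large-scale ray-tracing synthesis inexpensive.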
DL:
While the ray-tracing model offers a simplified solution, in reality, the relationship between time shift and SoS is more complicated because: 1) propagation of sound waves does not necessarily follow a straight path, and involves refraction, diffraction, and interference; and 2) processing of echoes involves beamforming and phase-shift computation that have combined effects on the resulting time-shift measurement. Sources contributing to the latter complexity may include velocity-depth ambiguity due to incorrect beamforming SoS values, incorrectly assumed steering angle in beamforming, and decorrelation due to strong aberration or the presence of signal noise.
Effects from these two sources contribute to a more realistic forward model $\mathcal{F}$, describing the relationship between measured time-shift maps $\Delta\mathbf{T}$ (represented by a 3-D tensor) and the underlying SoS map $\mathbf{c}$:
$$ \Delta\mathbf{T} = \mathcal{F}(\mathbf{c}) \tag{6} $$
Accurate formulation of $\mathcal{F}$ remains a challenging task. Inverting $\mathcal{F}$ is also computationally expensive due to nonlinear components, e.g., the SoS-dependent refraction, hampering the potential for real-time application. Owing to these difficulties in forward modeling and inversion, our paper proposes to learn the reconstruction function directly using DL:
$$ \hat{\mathbf{c}} = \mathcal{N}(\Delta\mathbf{T}) \tag{7} $$
where $\hat{\mathbf{c}}$ is the reconstructed SoS map and $\mathcal{N}$ denotes a trained neural network.
Upon receiving channel data, we performed beamforming and phase shift computation to obtain . In the context of SoS imaging, it is challenging to accurately compute phase shift, due to an artifact introduced by changing TX angles [18], [19], [23]. The artifact arises from the rotated oscillation direction of beamformed RF when TX angles are changed. To address this issue, our method utilized a CMA beamforming approach [18], [23], [33] that aligns the oscillation direction of the RFs. This approach results in high accuracy for phase-shift computation [18]. Furthermore, we improved phase-shift computation for SoS imaging by considering potential large local amplitude variations in echo signals. The detailed process is presented in Section II-B.
B. Beamforming and phase-shift computation
This section describes how to obtain the time-shift maps to be used as the input to the DL model. The detailed process is illustrated in Fig. 3 and is extended from the principles described in [18].
Fig. 3.
Illustration of the ultrasound data acquisition and processing steps to yield the time-shift maps.
Wave transmission.
A sequence of 71 steered plane waves was transmitted. The sequence covered TX angles from −17.5° to 17.5° with a 0.5° interval. Following each wave transmission, echoes were recorded by all transducer elements. The 71 transmissions led to 71 frames of channel data.
Beamforming.
Each frame of channel data was beamformed using the DAS algorithm with dynamic receive focusing and a constant f-number (=3). Importantly, beamforming was performed using a CMA approach [18], that is, for a pre-specified mid-angle $\psi$ defined as
$$ \psi = \frac{\theta_{\mathrm{TX}} + \theta_{\mathrm{RX}}}{2} \tag{8} $$
where $\theta_{\mathrm{TX}}$ is the TX angle and the RX angle $\theta_{\mathrm{RX}}$ is defined as the angle of the line connecting the beamforming pixel and the central element of the RX aperture. In practice, given $\psi$ and $\theta_{\mathrm{TX}}$, Eq. (8) determines the RX angle $\theta_{\mathrm{RX}}$, which then determines the central element of the RX aperture. The width of the RX aperture is determined by the f-number.
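The geometry implied by the CMA configuration can be sketched as follows. This is an illustration under stated assumptions: the function names are hypothetical, the sign convention (a positive RX angle places the aperture center at smaller lateral position than the pixel) is assumed, and the aperture width is taken as pixel depth divided by the f-number.

```python
import numpy as np

# 71 TX angles from -17.5 deg to 17.5 deg in 0.5 deg steps, as in the transmit sequence.
tx_angles = np.linspace(-17.5, 17.5, 71)

def rx_angle(mid_angle_deg, tx_angle_deg):
    """Solve the mid-angle definition (TX + RX) / 2 = mid for the RX angle."""
    return 2.0 * mid_angle_deg - tx_angle_deg

def rx_aperture(pixel_x, pixel_z, rx_angle_deg, f_number=3.0):
    """Center and width of the RX aperture on the transducer surface (z = 0).

    Assumed convention: the line from the pixel to the aperture center makes
    the RX angle with the depth axis; width = depth / f-number.
    """
    center_x = pixel_x - pixel_z * np.tan(np.deg2rad(rx_angle_deg))
    width = pixel_z / f_number
    return center_x, width

# Example: pixel at 30 mm depth, mid-angle 0 deg, TX angle 10 deg.
phi = rx_angle(0.0, 10.0)                      # RX angle mirrors the TX angle
center_x, width = rx_aperture(0.0, 30e-3, phi)
```

For a mid-angle of 0° the RX angle is the mirror image of the TX angle, so every TX/RX pair sharing that mid-angle yields beamformed RF data with the same oscillation direction.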
To enrich the information available in the final time-shift maps to be used for DL, we performed beamforming using multiple mid-angles following the non-DL literature [18]: , and . Specifically, each of the 71 channel data frames was beamformed using . The channel data frames obtained with were additionally beamformed using , and the channel data frames obtained with were additionally beamformed using . This process resulted in three groups of beamformed RF frames, each obtained from a separate mid-angle.
After beamforming, coherent plane wave compounding was performed to reduce clutter. Compounding was performed by summing the neighboring frames (TX angle difference ) within the same mid-angle . Finally, a beamformed RF frame , represented as passband analytic signals, was obtained for each combination of and .
Phase shift computation.
The phase shift was computed using the zero-lag cross-correlation of a pair of beamformed RF frames after normalization. First, each RF frame $u$ was normalized to yield the normalized RF frame $\tilde{u}$ by
$$ \tilde{u} = \frac{u}{|u| + \epsilon} \tag{9} $$
where $\epsilon$ was used to ensure numerical stability and prevent division by zero. The normalization was performed to avoid the influence of RF signal amplitude variations due to factors such as large specular reflections.
Second, the zero-lag cross-correlation $C$ of a pair of normalized frames $\tilde{u}_1$ and $\tilde{u}_2$ was computed by
$$ C = \mathrm{LPF}\left(\tilde{u}_1 \odot \tilde{u}_2^{*}\right) \tag{10} $$
where LPF represents low-pass filtering, which is part of the cross-correlation operation and filters out speckle noise [23], [34]. LPF was implemented by convolution with a 2-D Hann function having a full width at half maximum (FWHM) of 2.0 mm × 2.0 mm. The operator $\odot$ denotes element-wise multiplication, and $\tilde{u}_2^{*}$ denotes the complex conjugate of $\tilde{u}_2$. The phase shift $\Delta\phi$ was then obtained by calculating the argument of $C$ by
$$ \Delta\phi = \arg(C) \tag{11} $$
For each mid angle, the phase shifts between two beamformed images sharing the mid angle were calculated using this method. In total, three mid angles (i.e., −7.5°, 0°, and 7.5°) were used, leading to three time-shift maps, as shown in Fig. 3. Time shifts were converted from the measured phase shift using Equation (4). To prevent aliasing, phase-shift computation was performed in small angular steps that were later summed up [18]. Subsequently, averaging was performed for the two time-shift maps sharing the same round-trip angles (switching TX and RX angles). This averaging helps strengthen the signal while smoothing out the noise.
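The phase-shift computation described in this subsection (amplitude normalization, low-pass-filtered zero-lag cross-correlation, and argument extraction) can be sketched as below. The sketch makes several assumptions: the function names are hypothetical, the pixel spacing of 0.1 mm is invented for illustration, the Hann kernel support is set to twice the FWHM, and the convolution is implemented with zero-padded FFTs.

```python
import numpy as np

def hann2d_kernel(n_z, n_x):
    """Separable 2-D Hann kernel normalized to unit sum."""
    k = np.outer(np.hanning(n_z), np.hanning(n_x))
    return k / k.sum()

def lowpass2d(img, kernel):
    """'Same'-size 2-D convolution via zero-padded FFT (supports complex input)."""
    kz, kx = kernel.shape
    shape = (img.shape[0] + kz - 1, img.shape[1] + kx - 1)
    full = np.fft.ifft2(np.fft.fft2(img, s=shape) * np.fft.fft2(kernel, s=shape))
    z0, x0 = (kz - 1) // 2, (kx - 1) // 2
    return full[z0:z0 + img.shape[0], x0:x0 + img.shape[1]]

def phase_shift(u1, u2, kernel, eps=1e-12):
    """Phase of the low-pass-filtered product of two amplitude-normalized analytic frames."""
    u1n = u1 / (np.abs(u1) + eps)   # unit-magnitude phasors, Eq. (9)-style
    u2n = u2 / (np.abs(u2) + eps)
    corr = lowpass2d(u1n * np.conj(u2n), kernel)
    return np.angle(corr)

# Toy frames: complex speckle with one very bright "specular" pixel.
rng = np.random.default_rng(0)
u1 = rng.standard_normal((64, 48)) + 1j * rng.standard_normal((64, 48))
u1[30, 20] *= 1000.0
u2 = 0.5 * u1 * np.exp(-1j * 0.3)   # second frame lags the first by 0.3 rad

# A Hann window reaches half maximum at a quarter of its support on each side,
# so a 2.0 mm FWHM at 0.1 mm pixels corresponds to a support of about 41 samples.
kernel = hann2d_kernel(41, 41)
dphi = phase_shift(u1, u2, kernel)
```

Without the normalization step, the bright pixel would dominate the filtered correlation in its neighborhood; after normalization every pixel contributes a unit phasor, so the recovered phase is uniform in this toy case.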
C. Deep learning
1). Network architecture
U-Net [31] was used as our network architecture for several reasons: 1) The use of U-Net was informed by recent work in the literature that showed that the forward process of relating the SoS with the speckle displacement can be modeled by a 2-D convolution operation [23], [35]. It is therefore reasonable that the inverse process can also be formulated as a 2-D deconvolution problem. 2) U-Net has demonstrated promising performance in tomographic reconstruction and inverse problems in medical imaging [36], [37]. 3) U-Net’s encoder-decoder structure with skip connections enables it to capture both local and global dependencies in complicated wave propagation processes.
Our U-Net implementation is described as follows. In the encoding path, each encoding block contained two 3×3 convolutional layers with tanh activations, followed by a 2×2 max pooling layer. A dropout layer with a 0.3 dropout rate was added after each encoding block to reduce overfitting. In the decoding path, each decoding block included a 2×2 transposed convolutional layer with a stride of 2 and two 3×3 convolutional layers with batch normalization (BN) and tanh activations. Finally, 3×3 convolutional layers were added as the output layer.
2). Loss function
The loss function was composed of 1) the mean squared error (MSE) loss that quantifies pixel-wise difference, and 2) the structural similarity index measure (SSIM) loss that quantifies structural differences:
| (12) |
where is the MSE loss, is the SSIM loss, and is a hyperparameter balancing the relative contributions of the two loss terms in the total loss function.
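For each pair of U-Net-predicted SoS image $\hat{c}$ and the corresponding ground-truth image $c$, the MSE loss is defined as
$$ \mathcal{L}_{\mathrm{MSE}} = \frac{1}{N} \sum_{i=1}^{N} \left(\hat{c}_i - c_i\right)^2 \tag{13} $$
where $\hat{c}_i$ is the reconstructed SoS value at pixel $i$, $c_i$ is the corresponding ground-truth SoS value at pixel $i$, and $N$ is the number of pixels.
The SSIM loss is defined as
$$ \mathcal{L}_{\mathrm{SSIM}} = 1 - \mathrm{SSIM}(\hat{c}, c) \tag{14} $$
where $\mathrm{SSIM}(\hat{c}, c)$ is computed by
$$ \mathrm{SSIM}(\hat{c}, c) = \frac{\left(2\mu_{\hat{c}}\mu_{c} + C_1\right)\left(2\sigma_{\hat{c}c} + C_2\right)}{\left(\mu_{\hat{c}}^2 + \mu_{c}^2 + C_1\right)\left(\sigma_{\hat{c}}^2 + \sigma_{c}^2 + C_2\right)} \tag{15} $$
where $\mu_{\hat{c}}$ and $\mu_{c}$ are the mean pixel values of the reconstructed and ground-truth SoS maps, respectively, and $\sigma_{\hat{c}}^2$ and $\sigma_{c}^2$ are the pixel variances of the reconstructed and ground-truth maps, respectively. The term $\sigma_{\hat{c}c}$ represents the covariance between the reconstructed and ground-truth maps. The constants $C_1$ and $C_2$ were included to stabilize the division when the denominator approaches zero.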
Because the network was trained using batches of samples (images), the loss function for each batch was computed by averaging the losses of all the samples within the batch.
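A minimal NumPy sketch of a combined MSE and SSIM loss with batch averaging is given below. Several choices are assumptions rather than details from this work: the additive weighting `mse + lam * ssim_loss`, the SSIM loss taken as one minus SSIM, the SSIM statistics computed globally over each map rather than in local windows, the stabilizing constants, and SoS maps normalized to the range [0, 1].

```python
import numpy as np

def mse_loss(pred, target):
    """Mean squared pixel-wise difference."""
    return np.mean((pred - target) ** 2)

def ssim_global(pred, target, c1, c2):
    """SSIM from whole-map means, variances, and covariance (no sliding window)."""
    mu_p, mu_t = pred.mean(), target.mean()
    var_p = np.mean((pred - mu_p) ** 2)
    var_t = np.mean((target - mu_t) ** 2)
    cov = np.mean((pred - mu_p) * (target - mu_t))
    num = (2.0 * mu_p * mu_t + c1) * (2.0 * cov + c2)
    den = (mu_p ** 2 + mu_t ** 2 + c1) * (var_p + var_t + c2)
    return num / den

def combined_loss(pred_batch, target_batch, lam, c1=1e-4, c2=9e-4):
    """Batch-averaged loss; the weighting 'mse + lam * (1 - ssim)' is an assumed form."""
    per_sample = [
        mse_loss(p, t) + lam * (1.0 - ssim_global(p, t, c1, c2))
        for p, t in zip(pred_batch, target_batch)
    ]
    return float(np.mean(per_sample))

rng = np.random.default_rng(1)
target = rng.uniform(0.0, 1.0, size=(4, 73, 89))        # batch of normalized maps
pred = np.clip(target + 0.05 * rng.standard_normal(target.shape), 0.0, 1.0)

loss_same = combined_loss(target, target, lam=0.5)
loss_noisy = combined_loss(pred, target, lam=0.5)
```

In a training framework the same expressions would be written with differentiable tensor operations; the NumPy version only illustrates how the two terms respond to a perturbed prediction.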
3). Training strategy
The model was pre-trained on straight-ray data and subsequently fine-tuned using full-wave simulation data.
The use of time-shift maps as the input enables the efficient generation of a large number of straight-ray synthesized time-shift and SoS map pairs to facilitate model training because computing the time-shift map from a given SoS map using the straight-ray theory is orders of magnitude faster than generating synthetic ultrasound data from a given SoS map using full-wave simulation. However, the straight-ray computation is less accurate and less realistic than full-wave simulations. To combine the advantages of both computational methods (i.e., straight-ray computation and full-wave simulation), we divided our model training into two stages:
The pre-training stage: the model was trained with 1.28 million sample pairs computed by using the straight-ray forward model without requiring full-wave simulations, as detailed in Section II-D.
The fine-tuning stage: the model was fine-tuned with 2,000 sample pairs obtained from full-wave simulations where the time-shift maps were obtained from the simulated channel RF data for the given SoS maps.
The network was trained using the Adam optimizer with a learning rate of and a batch size of 32. The pre-training stage ran for 3 epochs, and the fine-tuning stage for 50 epochs. The total number of trainable parameters of the model was 8,654,705.
Hyperparameters of the U-Net were determined on a validation set including an additional 200 sample pairs obtained from full-wave simulations.
D. Ray-tracing synthesis
The physics model assuming straight rays is widely utilized in the SoS imaging literature, resulting in a linear forward model applicable through matrix multiplication. In this study, we employed this forward model to efficiently generate time-shift maps. This process involves a linear transformation applied to SoS maps to produce corresponding time-shift maps, requiring less than 0.01 seconds to generate each map, in contrast to the over 15 minutes per map needed for full-wave simulations on an NVIDIA A40 GPU. Full-wave simulations are time consuming in part because of the large number of transmissions that need to be simulated. While it is practical to generate 1.28 million sample pairs through ray-tracing computation, it is impractical to generate such a large amount of data through full-wave simulation.
To diversify the image patterns, we synthesized SoS maps from natural images in the ImageNet dataset [38]. The natural images were resized to pixels to be consistent with the neural network input. Each natural image was converted to grayscale and proportionally mapped to the SoS values ranging from 1400 m/s to 1600 m/s. Finally, the images were converted to time-shift maps (Fig. 4). A total of 1.28 million input-output pairs were synthesized for model pretraining. After this pretraining stage, the more accurate but computationally expensive full-wave simulated sample pairs were then used to fine-tune the model.
Fig. 4.
Generation of ray-tracing synthesized data. The colored natural image (a) was converted to gray scale (b), which was then converted to a SoS map (c). The time-shift map (d) was then efficiently computed from the SoS map using the straight-ray approximation.
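The conversion from a natural image to a SoS map (panels (a) to (c) of Fig. 4) can be sketched as follows. The details are assumptions: grayscale conversion uses the ITU-R BT.601 luma weights, "proportionally mapped" is interpreted as a linear map of the 0 to 255 gray range onto 1400 to 1600 m/s, resizing uses nearest-neighbor sampling, and the 73 × 89 output shape in the example is only illustrative.

```python
import numpy as np

def resize_nearest(img, out_shape):
    """Nearest-neighbor resize of a 2-D array (simple stand-in for an image library)."""
    h, w = img.shape
    rows = (np.arange(out_shape[0]) * h / out_shape[0]).astype(int)
    cols = (np.arange(out_shape[1]) * w / out_shape[1]).astype(int)
    return img[np.ix_(rows, cols)]

def image_to_sos(rgb, out_shape, c_min=1400.0, c_max=1600.0):
    """Map an 8-bit RGB image to a SoS map in m/s."""
    gray = rgb[..., :3].astype(float) @ np.array([0.299, 0.587, 0.114])
    gray = resize_nearest(gray, out_shape)
    return c_min + (gray / 255.0) * (c_max - c_min)

# Random array standing in for a natural image.
rng = np.random.default_rng(2)
rgb = rng.integers(0, 256, size=(350, 400, 3), dtype=np.uint8)
sos = image_to_sos(rgb, out_shape=(73, 89))
```

The resulting map can then be passed through a straight-ray forward operator to obtain the paired time-shift maps used for pretraining.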
E. Full-wave simulation
Full-wave simulation data used for fine-tuning were generated using the k-Wave toolbox [39]. A GE 9L-D transducer (GE Healthcare, Chicago, Illinois, USA) was simulated, with 192 transducer elements, a 0.23-mm pitch, and a 5.2-MHz center frequency. To reduce crosstalk, a highly absorbing medium (30 dB/cm-MHz) was placed between transducer elements in the simulation. The simulation medium was defined in two dimensions. A total of 1024×1024 grid points were simulated, including perfectly matched layers with a thickness of 20 grid points on all four sides to absorb waves. The spatial resolution of the simulation was 0.046 mm, and the temporal resolution was 5.97 ns with a Courant-Friedrichs-Lewy number of 0.2.
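The reported grid parameters can be cross-checked with a few lines of arithmetic. The sketch below assumes the usual relation between time step, grid spacing, and reference sound speed, dt = CFL · dx / c; the reference speed it derives is an inference from the reported numbers, not a value stated in the text.

```python
# Reported simulation parameters.
dx = 0.046e-3        # grid spacing (m)
dt = 5.97e-9         # time step (s)
cfl = 0.2            # Courant-Friedrichs-Lewy number
pitch = 0.23e-3      # element pitch (m)
n_elements = 192
fc = 5.2e6           # center frequency (Hz)
n_grid = 1024        # grid points per side, including absorbing layers
pml = 20             # absorbing-layer thickness (grid points per side)

# Reference sound speed implied by dt = cfl * dx / c_ref (about 1541 m/s).
c_ref = cfl * dx / dt

# Spatial sampling relative to the element pitch and the wavelength.
points_per_element = pitch / dx              # about 5 grid points per element
points_per_wavelength = (c_ref / fc) / dx    # about 6.4 at the center frequency

# Usable width inside the absorbing layers versus the transducer aperture.
interior_width = (n_grid - 2 * pml) * dx     # about 45.3 mm
aperture_width = n_elements * pitch          # about 44.2 mm
```

Under these assumptions the 192-element aperture fits within the non-absorbing part of the grid with roughly 1 mm to spare, and each element spans a whole number of grid points.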
The SoS maps used for simulation were generated to contain randomized ellipses and layered structures, mimicking lesions and the layered tissue structure of the abdominal wall, as illustrated in Fig. 5. Specifically, the size, shape, location, and orientation of the ellipses were randomized, and the layers were randomly tilted rather than restricted to the horizontal orientation. Geometries were allowed to intersect, creating more complex structures that better mimic real tissue. The SoS values varied from 1400 m/s to 1600 m/s. The background mass density was set to 1000 kg/m³. Scatterers were simulated in the density domain, where the mass density of the scatterers varied between −3% and +3%. To mitigate strong specular reflections and maintain simulation stability, both the SoS and density maps were smoothed to minimize sharp discontinuities, and the smoothed SoS maps were used as the ground truth for fine-tuning.
Fig. 5.
Medium setup for full-wave simulation: exemplary SoS map (a) and mass density map (b).
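A medium of the kind shown in Fig. 5 could be generated along the following lines. This is a hypothetical sketch: the number of ellipses, the size and tilt ranges, the single tilted layer, the moving-average smoothing kernel, and the choice to perturb the density at every grid point are all assumptions made for illustration.

```python
import numpy as np

def smooth_box(img, width):
    """Separable moving-average smoothing with edge padding (width must be odd)."""
    kernel = np.ones(width) / width
    out = np.pad(img, width // 2, mode="edge")
    out = np.apply_along_axis(lambda m: np.convolve(m, kernel, mode="valid"), 0, out)
    out = np.apply_along_axis(lambda m: np.convolve(m, kernel, mode="valid"), 1, out)
    return out

def random_sos_map(shape, rng, n_ellipses=3, c_range=(1400.0, 1600.0), smooth_px=5):
    """Random SoS map with one tilted layer and several rotated ellipses (m/s)."""
    nz, nx = shape
    z, x = np.mgrid[0:nz, 0:nx]
    sos = np.full(shape, rng.uniform(*c_range))

    # Tilted layer: boundary depth varies linearly with lateral position.
    slope = np.tan(np.deg2rad(rng.uniform(-10.0, 10.0)))
    top = rng.uniform(0.1, 0.5) * nz + slope * (x - nx / 2.0)
    thickness = rng.uniform(0.05, 0.2) * nz
    sos[(z >= top) & (z < top + thickness)] = rng.uniform(*c_range)

    # Ellipses with random center, semi-axes, and orientation; overlaps are allowed.
    for _ in range(n_ellipses):
        cz, cx = rng.uniform(0, nz), rng.uniform(0, nx)
        az, ax = rng.uniform(0.05, 0.25) * nz, rng.uniform(0.05, 0.25) * nx
        th = rng.uniform(0.0, np.pi)
        u = (x - cx) * np.cos(th) + (z - cz) * np.sin(th)
        v = -(x - cx) * np.sin(th) + (z - cz) * np.cos(th)
        sos[(u / ax) ** 2 + (v / az) ** 2 <= 1.0] = rng.uniform(*c_range)

    return smooth_box(sos, smooth_px)

def random_density_map(shape, rng, background=1000.0, variation=0.03):
    """Background density (kg/m^3) with uniformly distributed +/-3% perturbations."""
    return background * (1.0 + rng.uniform(-variation, variation, size=shape))

rng = np.random.default_rng(3)
sos_map = random_sos_map((128, 96), rng)
density_map = random_density_map((128, 96), rng)
```

Because the smoothing is a local average, the smoothed SoS values stay within the 1400 to 1600 m/s range, and here the density is drawn independently of the SoS.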
III. EVALUATION SETUP
A series of studies were conducted to evaluate the proposed SoS imaging approach in a stepwise fashion, from a numerical study based on straight-ray synthesized time-shift maps to demonstrate the feasibility of reconstructing the SoS from time-shift maps using DL, to a full-wave simulation study to demonstrate the robustness of the approach, and to a phantom study to further evaluate the proposed approach.
A. Straight-ray-based numerical study
Our DL-based approach estimates the SoS distribution by performing image-to-image translation, rather than by inverting a linear forward model. Before demonstrating its potential advantages, we first evaluated whether time-shift DL could at least function as a linear problem solver. For this reason, the DL model evaluated in the straight-ray-based numerical study was a pretrained model that did not undergo fine tuning.
Given a ground-truth SoS map, the corresponding ray-tracing synthesized time-shift maps were calculated using a linear forward model [18]. By inverting the synthesized time-shift maps to a SoS map, the DL model is essentially solving the linear inverse problem in this case.
To intensively test the learning capabilities of DL, time-shift maps were generated from SoS maps with complicated patterns. These SoS maps were derived from a subset of ImageNet that was excluded from training. This data generation process was previously described in Section II-D for generating training data and is used here for evaluation. Additionally, we incorporated SoS maps from an open-access dataset, which was obtained using tools from the Virtual Imaging Clinical Trial for Regulatory Evaluation (VICTRE) project [40]. This dataset provides anatomically realistic SoS patterns for validation.
Inverse crime [41] can occur if reconstruction is performed using time-shift maps synthesized from SoS maps at the same resolution (73×89 pixels), leading to overly optimistic performance. To rigorously assess model robustness, time-shift maps were instead derived from higher-resolution SoS maps, which were then downsampled to 73×89. Due to variation in image sizes across data sets (e.g., averaging around 400×350 pixels in ImageNet [38]), we resized each high-resolution SoS map to to compute the time-shift map, and then downsampled the resulting map to 73 × 89. We explored several values for the scaling factor , with .
B. Full-wave simulation study
The acoustic signals captured by ultrasound transducers are influenced by a variety of acoustic properties and pulse transmit settings. A robust DL model is supposed to accurately discern the SoS distribution from other coinciding acoustic properties and imaging system settings. To assess the model’s robustness, we devised a controlled study in which the SoS maps were held fixed while other parameters were modified. A robust model is expected to produce relatively consistent outcomes. This study was conducted in simulation due to the convenience of adjusting medium properties and pulse settings in simulations.
We simulated two types of fluctuations to test model robustness: density distribution and pulse settings. All other simulation settings remained the same as those described in Section II-E.
Mass density distribution:
When generating training data, the density was set to be uncorrelated with the SoS, preventing information leakage from the SoS map. In reality, however, soft tissues may have density values positively correlated with the SoS [42]. This scenario was also simulated to test robustness.
Pulse settings may differ across transducer probes. Even for the same transducer, parameters such as the center frequency are often tuned to optimize image quality. Robustness against center frequency change was tested.
As a result, we evaluated the reconstruction performance under three simulation conditions:
Original Condition was the condition for model training, with a 5.2-MHz center frequency and the mass density being uncorrelated with SoS.
Altered Condition 1 had an altered density assumption, with mass density linearly correlated with SoS.
Altered Condition 2 had an altered center frequency, with the center frequency changed from 5.2 MHz to 4 MHz.
Full-wave simulations were performed at the Original Condition as well as altered conditions on 50 testing samples. This approach allowed evaluating the model’s ability to discern SoS distribution while being robust to coinciding changes in acoustic properties and imaging system settings.
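The three conditions can be written down as a small configuration, together with the two density assumptions. This is a sketch under stated assumptions: the Gaussian density statistics, the reference values, and the linear slope are hypothetical placeholders, not the values used in the simulations; only the center frequencies and the uncorrelated/linearly-correlated distinction come from the text.

```python
import numpy as np

# Center frequencies and density modes follow the text.
CONDITIONS = {
    "original": {"center_freq_mhz": 5.2, "density": "uncorrelated"},
    "altered_1": {"center_freq_mhz": 5.2, "density": "linear"},
    "altered_2": {"center_freq_mhz": 4.0, "density": "uncorrelated"},
}


def density_uncorrelated(sos, rng, rho_mean=1000.0, rho_std=20.0):
    """Density drawn independently of SoS (hypothetical Gaussian statistics)."""
    return rng.normal(rho_mean, rho_std, size=sos.shape)


def density_linear(sos, c_ref=1540.0, rho_ref=1000.0, slope=0.5):
    """Density as a linear function of SoS.

    slope is in (kg/m^3)/(m/s) and is a hypothetical value for illustration.
    """
    return rho_ref + slope * (sos - c_ref)


def make_density(sos, condition, rng):
    """Build the density map for one of the named conditions."""
    mode = CONDITIONS[condition]["density"]
    if mode == "linear":
        return density_linear(sos)
    return density_uncorrelated(sos, rng)


rng = np.random.default_rng(0)
sos_map = rng.uniform(1400.0, 1600.0, size=(73, 89))  # synthetic SoS map
rho_orig = make_density(sos_map, "original", rng)
rho_alt1 = make_density(sos_map, "altered_1", rng)
```

Keeping the SoS map fixed and swapping only the density map or the center frequency isolates the effect of each nuisance parameter on the reconstruction.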
We compared the performances of three reconstruction methods: 1) physics-based inversion that applies linear model-based inverse optimization on the time-shift maps to yield the SoS map, 2) DL using channel data as the input, and 3) DL using time-shift maps as the input. The characteristics of these three methods are summarized in Table I.
TABLE I.
Summary of the reconstruction methods
| Method | Input | Requires Explicit Forward Model? | Model Linearity |
|---|---|---|---|
| Physics-based inversion | Time-shift maps | Yes | Linear |
| Channel-data DL | Channel data | No | Non-linear |
| Time-shift DL (Ours) | Time-shift maps | No | Non-linear |
The first comparator, physics-based inversion, involved beamforming and phase-shift computation. The SoS estimate was obtained by solving a linear optimization problem [12]:
| (16) |
which yielded an estimate of the slowness deviation map from which the SoS map was computed. Beamforming SoS was determined for each imaging object using the iterative process described in [22]. Total variation regularization was applied, where and denote the differential operators in the axial and lateral directions, respectively. To avoid introducing directional bias in the reconstruction, and were set to be equal. The regularization strength was empirically tuned on training data based on root-mean-square error (RMSE), yielding , which were then fixed during evaluation. Equation (16) was used throughout Section III for physics-based reconstruction. Following [12], the inverse problem was solved using the MATLAB CVX toolbox [43], [44].
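For orientation, a total-variation-regularized linear inversion of the kind described here is often written in the generic form below. The notation is an assumption made for illustration and is not taken from (16): $\mathbf{L}$ denotes the straight-ray path matrix, $\Delta\boldsymbol{\sigma}$ the slowness deviation, $\Delta\boldsymbol{\tau}$ the measured time shifts, $\mathbf{D}_z$ and $\mathbf{D}_x$ the axial and lateral difference operators, $\lambda_z$ and $\lambda_x$ the regularization weights, and $c_0$ the beamforming SoS. The choice of an $\ell_1$ data term is likewise an assumption.

```latex
\begin{align}
\Delta\hat{\boldsymbol{\sigma}}
  &= \arg\min_{\Delta\boldsymbol{\sigma}}
     \left\| \mathbf{L}\,\Delta\boldsymbol{\sigma} - \Delta\boldsymbol{\tau} \right\|_1
     + \lambda_z \left\| \mathbf{D}_z\,\Delta\boldsymbol{\sigma} \right\|_1
     + \lambda_x \left\| \mathbf{D}_x\,\Delta\boldsymbol{\sigma} \right\|_1 , \\
\hat{c}
  &= \left( \frac{1}{c_0} + \Delta\hat{\sigma} \right)^{-1}
  \quad \text{(applied pixel-wise)} .
\end{align}
```

The second line follows from the definition of slowness as the reciprocal of SoS: the estimated deviation is added to the reference slowness $1/c_0$ and the sum is inverted.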
The second comparator, the channel-data DL method, was implemented using a U-Net backbone, and the network architecture design followed [10], [29] using convolutional encoding and decoding paths. Following [10], which used three input frames from symmetric insonification directions (left, center, right), we used three channel data frames acquired at 0°, +10°, and –10° steering angles. Each frame contained channel data of size 1723 × 192 (sample length × transducer elements). As in [10], we adopted the “middle network” design, where each input frame was independently encoded to extract directional features before being concatenated and passed into a shared encoding path for joint processing. The channel-data DL model had a total of 7,795,777 trainable parameters.
For a fair comparison, all three methods were set to have the same output size of pixels. This reconstruction view was located in the center of the input time-shift maps. This region was located at the intersection of the plane waves with different Tx angles.
C. Physical phantom study
The experimental data were acquired using the GE 9L-D linear transducer probe connected to a Vantage 256 system (Verasonics, Kirkland, WA, USA). Two tissue-mimicking phantoms were imaged: a breast phantom (Model 073) (Sun Nuclear, Norfolk, VA, USA) and an abdominal phantom (Model 057A) (Sun Nuclear). The reference SoS value of each phantom compartment was provided by the phantom manufacturer.
The phantom study was conducted to compare the three reconstruction approaches described in Section III-B. Additionally, given that the phantom experiments involved noisier, more realistic data, we conducted an ablation study to assess the contribution of various components of our time-shift DL method under realistic noise conditions. Specifically, we ablated the ray-tracing data pretraining and the SSIM loss. The ray-tracing data pretraining, made possible by using time-shift maps as the input, enables extensive training of the network. The SSIM loss encourages the model to recognize global structural features, rather than focusing solely on local variations.
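To illustrate how an SSIM term can be combined with a pixel-wise term, the sketch below computes a single-window (global) SSIM following the standard definition in [32] and adds it to a mean-squared error. It is a simplified illustration, not the loss defined in Section II-C-2: the global window, the weight `alpha`, and the function names are assumptions, and practical implementations usually evaluate SSIM over local windows on normalized data.

```python
import numpy as np


def global_ssim(x, y, data_range):
    """Single-window SSIM between two 2-D maps.

    Uses the standard stabilizing constants K1 = 0.01 and K2 = 0.03.
    """
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    num = (2.0 * mu_x * mu_y + c1) * (2.0 * cov_xy + c2)
    den = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return float(num / den)


def combined_loss(pred, target, data_range, alpha=0.5):
    """MSE plus a weighted (1 - SSIM) term; alpha is a hypothetical weight."""
    mse = float(np.mean((pred - target) ** 2))
    return mse + alpha * (1.0 - global_ssim(pred, target, data_range))


rng = np.random.default_rng(0)
target = rng.uniform(1400.0, 1600.0, size=(73, 89))  # synthetic SoS map
noisy = target + rng.normal(0.0, 5.0, size=target.shape)
loss_same = combined_loss(target, target, data_range=200.0)
loss_noisy = combined_loss(noisy, target, data_range=200.0)
```

Because SSIM compares means, variances, and covariance rather than individual pixels, the added term penalizes structural mismatch that a pixel-wise error alone can tolerate.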
In addition, we investigated the effects of beamforming SoS values. Previous studies using physics-based inversion have shown that reconstruction quality depends on the choice of beamforming SoS [18], [26]. Therefore, we examined how beamforming SoS impacts the performance of the time-shift DL model, in comparison with physics-based inversion. The beamforming SoS values were uniformly sampled from 1400, 1450, 1500, 1550, and 1600 m/s. While some of these values were extreme and deviated significantly from the typical tissue SoS of 1540 m/s, they served as stress tests for evaluating the robustness of the proposed time-shift DL model.
IV. Results
A. Straight-ray-based numerical study
Both time-shift DL and physics-based inversion with small regularization values produced accurate SoS reconstructions for a scaling factor of 1, as shown in Fig. 6. Physics-based inversion achieved slightly higher accuracy than time-shift DL under this setting (Fig. 7, scaling factor of 1), which was expected because physics-based inversion used an inverse model that exactly matched the forward model. While this setting was a useful sanity check to ensure implementation accuracy, it is inappropriate for rigorous evaluation of model performance due to the risk of inverse crime.
Fig. 6.
Ground-truth SoS maps (left column) and the corresponding SoS maps reconstructed from straight-ray-synthesized time-shift maps for scaling factors of 1, 3, and 5. Two reconstruction approaches were compared, time-shift DL and physics-based inversion. Various hyper-parameter values (, , and ) were also compared for physics-based inversion. was used as a shorthand for and , which were set equal.
Fig. 7.
Reconstruction error (RMSE) versus scaling factor for time-shift DL and physics-based inversion performed using various hyper-parameter values for straight-ray-synthesized data.
A more rigorous evaluation required inspecting the results obtained with scaling factors greater than 1, where the time-shift maps were generated from higher-resolution SoS maps, that is, using a forward model with a higher resolution than the inversion model. The performance of physics-based inversion with small regularization degraded quickly as the scaling factor increased. While large regularization improved robustness, it appeared to over-smooth the reconstruction, losing fine details. In contrast, time-shift DL demonstrated improved resilience to resolution mismatch, consistently yielding the lowest RMSE across all scaling factors ≥ 2 (Fig. 7). This suggested that the learned model generalized better to variations in the forward model.
Close examination of the reconstructed SoS maps showed that the reconstruction error was concentrated in the bottom area. This was expected: the SoS variation at a given location affects the time-shift signal in the posterior region, and the bottom pixels of the SoS map lack an adequate region beneath them to provide the corresponding time-shift information. A prior study using physics-based inversion also reported an artifact at the bottom [11]. To avoid the artifact, we suggest that the time-shift maps extend deeper than the SoS maps, allowing adequate incorporation of time-shift information at the bottom.
B. Full-wave simulation study
Fig. 8 shows examples of the ground-truth SoS maps and the reconstructed SoS maps, where each row corresponds to a different digital phantom. Both physics-based inversion and time-shift DL demonstrated high consistency across different conditions, whereas channel-data DL degraded visibly when the simulation conditions changed between training and testing. Time-shift DL better preserved SoS contrast than physics-based inversion. This difference can be attributed to the regularization techniques applied in physics-based inversion. In particular, spatial-gradient regularization, including total-variation regularization, is commonly used in SoS imaging [12], [18] to stabilize the ill-posed inverse problem. However, this regularization penalizes spatial variations in SoS values, leading to reduced contrast. An additional observation was that physics-based inversion struggled to reconstruct strong inclusion contrasts, as observed in row 3 of Fig. 8, where the actual wave propagation deviated from the straight-ray assumption.
Fig. 8.
Ground truth and reconstructed SoS maps on full-wave simulation data. DL models were trained on the Original Condition that had a 5.2 MHz center frequency and the mass density uncorrelated with SoS. Altered Condition 1 had the altered density assumption, with mass density linearly correlated with SoS. Altered Condition 2 had altered center frequency, with center frequency changed from 5.2 MHz to 4 MHz.
Fig. 9 summarizes reconstruction errors in each simulation condition. In the Original Condition under which DL models were trained, both DL methods achieved high accuracy (i.e., low RMSE) on the test set. However, in the altered conditions, channel-data DL exhibited a large increase in RMSE, indicating a drop in generalization performance. Additionally, physics-based inversion showed large variations in RMSE across samples, suggesting high inter-sample performance variability.
Fig. 9.
RMSEs of three reconstruction methods, reported in different conditions on full-wave simulated data. Altered Condition 1 had an altered density assumption. Altered Condition 2 had an altered center frequency.
To further investigate this variability, we present in Fig. 10 the RMSEs for all individual test samples and illustrate the relationship between reconstruction error (RMSE) and the within-image standard deviation (STD) of the ground-truth SoS distribution. A higher STD indicates greater SoS contrast. As shown in these plots, when SoS contrast was low, all the methods maintained low errors. The differences became more pronounced as SoS contrast increased. In higher-contrast cases, physics-based inversion exhibited higher errors across all the conditions, indicating its difficulty in handling large SoS contrast. In lower-contrast cases, physics-based inversion maintained low errors. This stratification in error distribution contributed to the inter-sample variability observed in Fig. 9. Channel-data DL performed reasonably well under the Original Condition but showed a substantial increase in RMSE under altered conditions, particularly when the density assumption was altered. Overall, time-shift DL consistently maintained low RMSE and effectively handled strong SoS contrasts, demonstrating its robustness across various conditions.
Fig. 10.
Distribution of reconstruction RMSE versus the within-image standard deviation (STD) of ground-truth SoS on full-wave simulated data under (a) Original Condition; (b) Altered Condition 1, with an altered density assumption; and (c) Altered Condition 2, with an altered center frequency. Each marker on the plots represents one digital phantom.
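The two quantities plotted against each other in Fig. 10 can be computed per sample as sketched below. The function names and the synthetic inputs are illustrative assumptions; only the definitions (RMSE between reconstruction and ground truth, and the within-image STD of the ground truth) follow the text.

```python
import numpy as np


def rmse(pred, truth):
    """Root-mean-square error between two SoS maps, in m/s."""
    return float(np.sqrt(np.mean((pred - truth) ** 2)))


def contrast_vs_error(preds, truths):
    """Return per-sample (within-image STD of ground truth, RMSE) arrays."""
    stds = np.array([float(np.std(t)) for t in truths])
    errs = np.array([rmse(p, t) for p, t in zip(preds, truths)])
    return stds, errs


# Synthetic example: three "phantoms" with increasing SoS contrast and a
# constant 10 m/s bias standing in for a reconstruction error.
rng = np.random.default_rng(0)
truths = [1540.0 + s * rng.standard_normal((73, 89)) for s in (5.0, 20.0, 40.0)]
preds = [t + 10.0 for t in truths]
stds, errs = contrast_vs_error(preds, truths)
```

Plotting `errs` against `stds` for each method and condition gives one marker per digital phantom, as in Fig. 10.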
C. Physical phantom study
Fig. 11 presents the experimental results obtained on a tissue-mimicking abdominal phantom (top row) and a breast phantom (middle and bottom rows). According to the vendor, the abdominal phantom had SoS values of 1430 m/s for the top fat-mimicking layer, 1555 m/s for the middle muscle-mimicking layer, and 1540 m/s for the bottom liver-mimicking region. The breast phantom had vendor-provided SoS values of 1460 m/s for the top layer, 1520 m/s for the background of the bottom layer, and 1580 m/s for the hypoechoic lesions. Presented in Fig. 11 are (from left to right): B-mode images, reference SoS maps created using vendor-provided SoS values, and SoS maps reconstructed using the physics-based inversion, channel-data DL, an ablated version of time-shift DL (without the pretraining step described in Section II-C-3 and without the SSIM loss described in Section II-C-2), another ablated version of time-shift DL (with pretraining but without SSIM loss), and the full time-shift DL.
Fig. 11.
Experimental results on the tissue-mimicking abdominal phantom (top row) and breast phantom (middle and bottom rows), obtained using multiple methods.
In the B-mode images, bright reflections resembling the noise present in real tissue can be observed in the breast phantom, mimicking the conditions expected in challenging real-world scenarios.
Physics-based inversion effectively differentiated the layered structure in the reconstructed SoS maps, especially for the abdominal phantom. For the breast phantom, although the inclusions were not as salient, some contrast was noticeable in the inclusion regions.
In contrast, the channel-data DL approach had challenges in distinguishing tissue-mimicking compartments.
Overall, the proposed time-shift DL exhibited the best reconstruction accuracy and the highest contrast-to-noise ratio (CNR), as summarized in Table II. Compared with physics-based inversion, the time-shift DL approach reduced the RMSE by 23.6% (from 21.2 to 16.2 m/s) and the mean absolute error (MAE) by 40.9% (from 15.9 to 9.4 m/s).
TABLE II.
Reconstruction performances in phantom experiments.
| Method | RMSE [m/s] | MAE [m/s] | CNR |
|---|---|---|---|
| Physics-based inversion | 21.2 | 15.9 | 0.62 |
| Channel-data DL | 76.1 | 67.6 | 0.14 |
| Time-shift DL, ablation (w/o pretraining and w/o SSIM loss) | 18.1 | 13.2 | 2.22 |
| Time-shift DL, ablation (w/o SSIM loss) | 17.5 | 10.6 | 2.23 |
| Time-shift DL | 16.2 | 9.4 | 2.68 |
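The relative improvements quoted for time-shift DL over physics-based inversion follow directly from the entries of Table II. The short check below reproduces that arithmetic; the dictionary layout and the helper name `relative_reduction` are illustrative choices.

```python
# RMSE and MAE values (m/s) copied from Table II.
TABLE_II = {
    "physics_based_inversion": {"rmse": 21.2, "mae": 15.9},
    "time_shift_dl": {"rmse": 16.2, "mae": 9.4},
}


def relative_reduction(baseline, new):
    """Percentage reduction of `new` relative to `baseline`."""
    return 100.0 * (baseline - new) / baseline


base = TABLE_II["physics_based_inversion"]
ours = TABLE_II["time_shift_dl"]
rmse_reduction = round(relative_reduction(base["rmse"], ours["rmse"]), 1)
mae_reduction = round(relative_reduction(base["mae"], ours["mae"]), 1)
print(rmse_reduction, mae_reduction)  # prints 23.6 40.9
```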
The ablation study showed that time-shift DL benefited from both the pretraining and the SSIM loss. In particular, using the SSIM loss yielded smoother results within each segment, potentially because it allowed the network to learn structural information.
Fig. 12 presents the reconstructed SoS maps using different beamforming SoS values, with the corresponding reconstruction errors shown in Fig. 13. The performance of physics-based inversion degraded as the beamforming SoS deviated from true SoS, and the degree of degradation increased as the beamforming SoS error increased. Time-shift DL showed a reduced dependency on the beamforming SoS. However, time-shift DL did not eliminate the influence of beamforming SoS. This result was expected because both physics-based inversion and time-shift DL used beamformed RF data in the process and thus were subject to the influence of beamforming SoS. However, time-shift DL seemed to be less subject to the beamforming SoS likely because the time-shift DL approach does not rely on an explicit straight-ray-based forward model.
Fig. 12.
Experimental results on the influence of the beamforming SoS value on SoS reconstruction in the physical phantom study: (a) physics-based inversion, and (b) time-shift DL.
Fig. 13.
Reconstruction error (RMSE) versus beamforming SoS value in the physical phantom study: A comparison between physics-based inversion and time-shift DL.
V. Discussion
This paper presents a robust DL approach for SoS imaging that learns the nonlinear mapping between measured time shifts and the underlying SoS without being subject to the constraints of a specific forward model. By incorporating multiple strategies to enhance time-shift computation and model training, our approach achieved more accurate SoS reconstruction than physics-based inversion, thanks to the nonlinear mapping capability of the DL model. It also achieved more robust performance across varying conditions than channel-data DL, thanks to the reduced system dependence of time-shift data and the use of advanced techniques to obtain the time-shift data more accurately and effectively. Critical to the success of our approach was the adoption of common mid-angle (CMA) tracking techniques from the non-DL literature, which allowed us to extract SoS-related information from the raw data more effectively. Fig. 14 provides an exemplary time-shift map obtained from the breast phantom, along with the B-mode image for reference. A pattern resembling a rocket’s tail flame is observed posterior to the inclusion, revealing a high spatial affinity between the time-shift signal and the inclusion.
Fig. 14.
Time-shift maps obtained from the breast phantom: (a) B-mode image, (b) time-shift map along 0° mid angle, (c) time-shift map along 7.5° mid angle, and (d) time-shift map along −7.5° mid angle.
Pulse-echo SoS imaging has been a longstanding challenge. Despite the availability of equipment (i.e., multi-element linear electronic arrays) for over half a century, the proof-of-concept study of SoS imaging emerged only within the last decade [11]. In recent years, advancements in DL and simulation software have provided an opportunity to pursue SoS imaging in a data-driven manner. Several DL approaches were proposed to reconstruct SoS maps directly from raw channel data. While these approaches showed admirable results on simulation data, they encountered a drastic performance decline on experimental data. This generalizability issue was confirmed in our paper through phantom experiments and a controlled simulation study. Our results demonstrated that the trained channel-data DL was susceptible to changes in testing conditions. In contrast, the time-shift DL showed better robustness.
While time-shift maps contain valuable information for inferring SoS maps, the best approach to utilize time-shift maps for SoS imaging remains an open question. In this work, we summed the time-shift maps computed over small angular steps and further averaged pairs of maps sharing the same underlying round-trip path (but with reversed TX and RX angles) to enhance signal strength before inputting them into the DL model. Alternatively, individual time-shift maps at smaller angular offsets could directly serve as DL inputs. While potentially capturing richer angular information, this approach can increase model complexity and training data requirements, highlighting a trade-off between information richness and model complexity. Other aggregation strategies may also be explored. For instance, all time-shift maps corresponding to each mid-angle could be averaged, as averaging retains information about aberration effects and integral SoS along propagation paths [4], [44]. These alternative strategies for utilizing time-shift maps represent interesting avenues for future exploration.
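The pair-averaging step mentioned above, in which maps with reversed TX and RX angles are averaged, can be sketched as follows. The dictionary layout keyed by `(tx_deg, rx_deg)` and the function name are assumptions for illustration; the preceding summation over small angular steps is not shown.

```python
import numpy as np


def average_reciprocal_pairs(maps):
    """Average time-shift maps that share a round-trip path.

    maps: dict mapping (tx_deg, rx_deg) to a 2-D time-shift map.
    Maps keyed (a, b) and (b, a) are grouped and averaged; the result is
    keyed by the sorted angle pair.
    """
    groups = {}
    for (tx, rx), m in maps.items():
        key = (min(tx, rx), max(tx, rx))
        groups.setdefault(key, []).append(m)
    return {key: np.mean(group, axis=0) for key, group in groups.items()}


# Toy maps: a reciprocal pair plus one unpaired map.
toy_maps = {
    (-5.0, 5.0): np.full((73, 89), 1.0),
    (5.0, -5.0): np.full((73, 89), 3.0),
    (0.0, 10.0): np.full((73, 89), 7.0),
}
averaged = average_reciprocal_pairs(toy_maps)
```

An unpaired map passes through unchanged, so the same function can be applied even when some reciprocal acquisitions are missing.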
Our work has several limitations, which represent directions for future study. First, the plane-wave approach used in our method may limit penetration depth, which could affect applications requiring deeper imaging. While plane-wave imaging ensures a uniform wavefront, facilitating CMA beamforming and time-shift estimation, its energy distribution can be less effective at greater depths compared to focused-beam transmissions. Future work could explore SoS imaging approaches incorporating focused-beam transmissions, which would allow for greater penetration. Second, our model was developed for a linear probe and has a limited imaging view. Future extensions could involve adapting the model to a curvilinear array, which would provide an expanded view. Third, our current testing protocol does not account for probe motion. However, given the ultrafast data acquisition capabilities of the Verasonics system, as used in other SoS imaging studies [11], [18], [24], we anticipate minimal impact from probe motion. Nonetheless, future studies will address potential motion issues to further improve the results. Fourth, we did not test our method on in vivo data because ground-truth SoS is not available in vivo. Although phantom-based testing is appropriate at this stage, future in vivo human studies are needed to assess the clinical value of the proposed method.
VI. Conclusion
The proposed DL approach for SoS imaging demonstrated superior performance in experimental data, compared with physics-based inversion and channel-data DL. Future in vivo studies are needed to further evaluate the approach and demonstrate clinical utility.
VII. Acknowledgments
The authors wish to thank MD Rizwanul Kabir, Yuanbin Zhu, and Jingyi Zuo for their assistance with ultrasound data acquisition.
This work was supported in part by the National Institute of Biomedical Imaging and Bioengineering (NIBIB) of the National Institutes of Health (NIH) under Award number R21EB032638. The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH.
Contributor Information
Haotian Chen, Department of Electrical and Computer Engineering, University of Illinois Urbana-Champaign, Urbana, IL 61801, USA.
Aiguo Han, Department of Biomedical Engineering and Mechanics, Virginia Polytechnic Institute and State University, Blacksburg, VA 24061, USA.
References
- [1] Duric N, Littrup P, Poulo L, Babkin A, Pevzner R, Holsapple E, Rama O, and Glide C, “Detection of breast cancer with ultrasound tomography: First results with the Computed Ultrasound Risk Evaluation (CURE) prototype,” Med Phys, vol. 34, no. 2, pp. 773–785, 2007.
- [2] Duric N, Sak M, Fan S, Pfeiffer RM, Littrup PJ, Simon MS, Gorski DH, Ali H, Purrington KS, Brem RF, Sherman ME, and Gierach GL, “Using whole breast ultrasound tomography to improve breast cancer risk assessment: A novel risk factor based on the quantitative tissue property of sound speed,” J Clin Med, vol. 9, no. 2, p. 367, 2020.
- [3] Ali R, Brevett T, Zhuang L, Bendjador H, Podkowa AS, Hsieh SS, Simson W, Sanabria SJ, Herickhoff CD, and Dahl JJ, “Aberration correction in diagnostic ultrasound: A review of the prior field and current directions,” Zeitschrift fur Medizinische Physik, vol. 33, no. 3, pp. 267–291, 2023.
- [4] Ali R and Dahl JJ, “Distributed phase aberration correction techniques based on local sound speed estimates,” in IEEE International Ultrasonics Symposium, IUS, 2018, pp. 1–4.
- [5] Stähli P, Becchetti C, Korta Martiartu N, Berzigotti A, Frenz M, and Jaeger M, “First-in-human diagnostic study of hepatic steatosis with computed ultrasound tomography in echo mode,” Communications Medicine, vol. 3, no. 1, p. 176, 2023.
- [6] Imbault M, Dioguardi Burgio M, Faccinetto A, Ronot M, Bendjador H, Deffieux T, Triquet EO, Rautou PE, Castera L, Gennisson JL, Vilgrain V, and Tanter M, “Ultrasonic fat fraction quantification using in vivo adaptive sound speed estimation,” Phys Med Biol, vol. 63, no. 21, p. 215013, 2018.
- [7] Sanabria SJ, Martini K, Freystätter G, Ruby L, Goksel O, Frauenfelder T, and Rominger MB, “Speed of sound ultrasound: A pilot study on a novel technique to identify sarcopenia in seniors,” Eur Radiol, vol. 29, no. 1, pp. 3–12, 2019.
- [8] Wiskin J, Malik B, Natesan R, and Lenox M, “Quantitative assessment of breast density using transmission ultrasound tomography,” Med Phys, vol. 46, no. 6, pp. 2610–2620, 2019.
- [9] Natesan R, Wiskin J, Lee S, and Malik BH, “Quantitative assessment of breast density: Transmission ultrasound is comparable to mammography with tomosynthesis,” Cancer Prevention Research, vol. 12, no. 12, pp. 871–876, 2019.
- [10] Feigin M, Freedman D, and Anthony BW, “A deep learning framework for single-sided sound speed inversion in medical ultrasound,” IEEE Trans Biomed Eng, vol. 67, no. 4, pp. 1142–1151, 2020.
- [11] Jaeger M, Held G, Peeters S, Preisser S, Grünig M, and Frenz M, “Computed ultrasound tomography in echo mode for imaging speed of sound using pulse-echo sonography: Proof of principle,” Ultrasound Med Biol, vol. 41, no. 1, pp. 235–250, 2015.
- [12] Sanabria SJ, Ozkan E, Rominger M, and Goksel O, “Spatial domain reconstruction for imaging speed-of-sound with pulse-echo ultrasound: Simulation and in vivo study,” Phys Med Biol, vol. 63, no. 21, p. 215015, 2018.
- [13] Imbault M, Faccinetto A, Osmanski BF, Tissier A, Deffieux T, Gennisson JL, Vilgrain V, and Tanter M, “Robust sound speed estimation for ultrasound-based hepatic steatosis assessment,” Phys Med Biol, vol. 62, no. 9, p. 3582, 2017.
- [14] Dioguardi Burgio M, Imbault M, Ronot M, Faccinetto A, Van Beers BE, Rautou PE, Castera L, Gennisson JL, Tanter M, and Vilgrain V, “Ultrasonic adaptive sound speed estimation for the diagnosis and quantification of hepatic steatosis: A pilot study,” Ultraschall in der Medizin, vol. 40, no. 6, pp. 722–733, 2019.
- [15] Ali R, Telichko AV, Wang H, Sukumar UK, Vilches-Moure JG, Paulmurugan R, and Dahl JJ, “Local sound speed estimation for pulse-echo ultrasound in layered media,” IEEE Trans Ultrason Ferroelectr Freq Control, vol. 69, no. 2, pp. 500–511, 2022.
- [16] Ali R, Hyun D, and Dahl JJ, “Application of common midpoint gathers to medical pulse-echo ultrasound for optimal coherence and improved sound speed estimation in layered media,” in IEEE International Ultrasonics Symposium, IUS, 2020, pp. 1–4.
- [17] Heriard-Dubreuil B, Besson A, Wintzenrieth F, Cohen-Bacrie C, and Thiran JP, “Refraction-based speed of sound estimation in layered media: An angular approach,” IEEE Trans Ultrason Ferroelectr Freq Control, vol. 70, no. 6, pp. 486–497, 2023.
- [18] Stähli P, Kuriakose M, Frenz M, and Jaeger M, “Improved forward model for quantitative pulse-echo speed-of-sound imaging,” Ultrasonics, vol. 108, p. 106168, 2020.
- [19] Jaeger M and Frenz M, “Towards clinical computed ultrasound tomography in echo-mode: Dynamic range artefact reduction,” Ultrasonics, vol. 62, pp. 299–304, 2015.
- [20] Stähli P, Frenz M, and Jaeger M, “Bayesian approach for a robust speed-of-sound reconstruction using pulse-echo ultrasound,” IEEE Trans Med Imaging, vol. 40, no. 2, pp. 457–467, 2021.
- [21] Rau R, Schweizer D, Vishnevskiy V, and Goksel O, “Speed-of-sound imaging using diverging waves,” Int J Comput Assist Radiol Surg, vol. 16, no. 7, pp. 1201–1211, 2021.
- [22] Schweizer D, Rau R, Bezek CD, Kubik-Huch RA, and Goksel O, “Robust imaging of speed-of-sound using virtual source transmission,” IEEE Trans Ultrason Ferroelectr Freq Control, vol. 70, no. 10, pp. 1308–1318, 2023.
- [23] Podkowa AS and Oelze ML, “The convolutional interpretation of registration-based plane wave steered pulse-echo local sound speed estimators,” Phys Med Biol, vol. 65, no. 2, p. 025003, 2020.
- [24] Beuret S, Hériard-Dubreuil B, Martiartu NK, Jaeger M, and Thiran J-P, “Windowed Radon transform for robust speed-of-sound imaging with pulse-echo ultrasound,” IEEE Trans Med Imaging, vol. 43, no. 4, pp. 1579–1593, 2024.
- [25] Sanabria SJ, Brevett T, Telichko A, and Dahl J, “Speed of sound imaging with curvilinear probes from full-synthetic aperture data,” in IEEE International Ultrasonics Symposium, IUS, 2022, pp. 1–4.
- [26] Jaeger M, Stähli P, Martiartu NK, Yolgunlu PS, Frappart T, Fraschini C, and Frenz M, “Pulse-echo speed-of-sound imaging using convex probes,” Phys Med Biol, vol. 67, no. 21, p. 215016, 2022.
- [27] Bernhardt M, Vishnevskiy V, Rau R, and Goksel O, “Training variational networks with multidomain simulations: Speed-of-sound image reconstruction,” IEEE Trans Ultrason Ferroelectr Freq Control, vol. 67, no. 12, pp. 2584–2594, 2020.
- [28] Khun Jush F, Dueppenbecker PM, and Maier A, “Speed-of-sound mapping for pulse-echo ultrasound raw data using linked-autoencoders,” in Workshop on Machine Learning for Multimodal Healthcare Data, 2023, pp. 103–114.
- [29] Khun Jush F, Biele M, Dueppenbecker PM, and Maier A, “Deep learning for ultrasound speed-of-sound reconstruction: Impacts of training data diversity on stability and robustness,” Machine Learning for Biomedical Imaging, vol. 2, pp. 202–236, 2023.
- [30] Pavlov I, Prado E, Navab N, and Zahnd G, “Towards in-vivo ultrasound-histology: Plane-waves and generative adversarial networks for pixel-wise speed of sound reconstruction,” in IEEE International Ultrasonics Symposium, IUS, 2019, pp. 1913–1916.
- [31] Ronneberger O, Fischer P, and Brox T, “U-net: Convolutional networks for biomedical image segmentation,” in Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2015, vol. 9351, pp. 234–241.
- [32] Wang Z, Bovik AC, Sheikh HR, and Simoncelli EP, “Image quality assessment: From error visibility to structural similarity,” IEEE Transactions on Image Processing, vol. 13, no. 4, 2004.
- [33] Haun MA, Jones DL, and O’Brien WD Jr., “Overdetermined least-squares aberration estimates using common-midpoint signals,” IEEE Trans Med Imaging, vol. 23, no. 10, pp. 1205–1220, 2004.
- [34] O’Donnell M, Skovoroda AR, Shapo BM, and Emelianov SY, “Internal displacement and strain imaging using ultrasonic speckle tracking,” IEEE Trans Ultrason Ferroelectr Freq Control, vol. 41, no. 3, 1994.
- [35] Bezek CD, Haas M, Rau R, and Goksel O, “Learning the imaging model of speed-of-sound reconstruction via a convolutional formulation,” IEEE Trans Med Imaging, 2024.
- [36] Jin KH, McCann MT, Froustey E, and Unser M, “Deep convolutional neural network for inverse problems in imaging,” IEEE Transactions on Image Processing, vol. 26, no. 9, 2017.
- [37] Mizusawa S, Sei Y, Orihara R, and Ohsuga A, “Computed tomography image reconstruction using stacked U-Net,” Computerized Medical Imaging and Graphics, vol. 90, 2021.
- [38] Deng J, Dong W, Socher R, Li L-J, Li K, and Fei-Fei L, “ImageNet: A large-scale hierarchical image database,” in 2009 IEEE Conference on Computer Vision and Pattern Recognition, 2010, pp. 248–255.
- [39] Treeby BE and Cox BT, “k-Wave: MATLAB toolbox for the simulation and reconstruction of photoacoustic wave fields,” J Biomed Opt, vol. 15, no. 2, p. 021314, 2010.
- [40] Li F, Villa U, Park S, and Anastasio M, “2D acoustic numerical breast phantoms and USCT measurement data,” Harvard Dataverse, 2021.
- [41] Colton D and Kress R, “Inverse acoustic and electromagnetic scattering theory: Fourth edition,” in Applied Mathematical Sciences (Switzerland), vol. 93, 2019.
- [42] Mast TD, “Empirical relationships between acoustic parameters in human soft tissues,” Acoustics Research Letters Online, vol. 1, no. 2, pp. 37–42, 2000.
- [43] Grant M and Boyd S, “CVX: Matlab software for disciplined convex programming, ver 2.1,” Available at http://cvxr.com/cvx/, 2017.
- [44] Grant MC and Boyd SP, “Graph implementations for nonsmooth convex programs,” Lecture Notes in Control and Information Sciences, vol. 371, 2008.
- [45] Jakovljevic M, Hsieh S, Ali R, Chau Loo Kung G, Hyun D, and Dahl JJ, “Local speed of sound estimation in tissue using pulse-echo ultrasound: Model-based approach,” J Acoust Soc Am, vol. 144, no. 1, 2018.














