Photoacoustics. 2026 Jan 30;48:100804. doi: 10.1016/j.pacs.2026.100804

Ultrasound-guided sound speed correction for photoacoustic computed tomography

Xuanhao Zhang a,1, Zheng Qu a,1, Bin Ouyang a, Lidai Wang a,b,
PMCID: PMC12890839  PMID: 41685117

Abstract

Photoacoustic computed tomography (PACT) reconstructs high-resolution images of various chromophores in deep biological tissue. A key to high-quality reconstruction is accurate compensation for the spatially heterogeneous speed of sound (SoS) in tissue. Existing computational methods often estimate or compensate for SoS by tuning it directly in the image domain, for example by optimizing sharpness or contrast of reconstructed PA images. However, because the PA signal-to-noise ratio (SNR) decays rapidly with depth due to optical attenuation, such image-domain cues become less informative in deeper regions, limiting SoS accuracy there. Here, we present a dual-modal deep learning framework to correct the heterogeneous SoS via jointly processing co-registered PA and ultrasound (US) images. We estimate the spatially varying SoS map from the US image and then fuse the SoS map with the PA image to compute a reduced-aberration photoacoustic image. This method takes advantage of the rich speckle and high SNR in the co-registered US image – and thus can compensate for SoS with high accuracy and efficiency. We tested this method on numerical and tissue-mimicking phantoms, demonstrating cross-domain generalization. In-vivo results demonstrate that incorporation of the predicted SoS maps significantly improved PA image quality, enhancing structural detail and reducing acoustic artifacts. By fusing the US and PA images, our method produces high-contrast PA images with significantly reduced SoS distortion and artifacts.

Keywords: Photoacoustic computed tomography, Deep learning, Speed of sound image reconstruction, Aberration correction

1. Introduction

Photoacoustic (PA) tomography is a non-invasive biomedical imaging technique that can furnish comprehensive insights into the anatomical [1], [2], [3], [4], [5], functional [6], [7], [8], [9], and molecular [10], [11], [12] attributes of deep biological tissues at high resolution. To reconstruct a PA image, photoacoustic computed tomography (PACT) commonly uses universal back-projection (UBP) or related beamforming schemes to combine time series from multiple receive elements under the assumption of a known speed of sound (SoS). Accurate knowledge of the spatially varying SoS is therefore critical for producing high-contrast, artifact-reduced PA images [13], [14], [15].

Direct measurement of the SoS map at each anatomical site is costly, time consuming, and seldom performed in practical PACT systems [13], [16], [17]. A common practice is to use one or a few fixed SoS values to approximate the SoS map [2], [6], [18], and some methods adaptively estimate such global values from PA data [19]. Although these strategies can correct low-order phase errors, they remain suboptimal in acoustically heterogeneous tissue.

Substantial effort has focused on estimating dense SoS maps directly from PA measurements. Model-based reconstruction approaches have explored joint reconstruction techniques to simultaneously estimate SoS and PA initial pressure using PA signals and acoustic wave equations in either frequency domain [20], [21], [22], [23], [24], [25], or time domain [26], [27], [28]. More recently, constrained optimization–based joint reconstruction methods have been proposed to further stabilize the simultaneous estimation of initial pressure and SoS from PA data alone [29]. In parallel, data-driven techniques have been explored to adaptively estimate SoS in complex media [30], [31], [32], [33]. Despite their promise, many such methods are computationally intensive and sensitive to noise, limiting their practicality in real-time or low signal-to-noise ratio (SNR) settings. In addition, k-Wave–based PACT simulation frameworks have been developed to generate realistic in-silico and numerical breast phantoms, providing synthetic datasets for systematically testing reconstruction and quantitative imaging strategies [34], [35], [36].

Building on this foundation, deep learning was introduced to further advance PACT reconstruction. Initial works demonstrated feasibility on in-silico phantoms and simplified setups [37], [38], [39], [40], and later deep neural networks (DNNs) showed promise in mitigating under-sampling and noise in-vivo [41], [42], [43], [44]. More recently, learning-based image reconstruction has been applied to fast 3D transcranial PACT to mitigate skull-induced aberrations and recover vascular structures from distorted measurements [45]. To more directly address SoS-related artifacts, Jeon et al. developed SegU-Net, which reduced artifacts in both simulations and in-vivo experiments [46], yet its performance remains dependent on PA image quality, limiting its broader applicability. Subsequently, Dehner et al. introduced DeepMB, a real-time PA reconstructor that enables interactive adjustment of a single global SoS value for fast, high-quality imaging [47]. However, its reliance on one global SoS value, rather than a spatially varying SoS map, can limit performance in strongly heterogeneous media.

Because ultrasound (US) typically exhibits rich speckle and high SNR, estimating dense SoS maps from US is generally more stable and accurate than from PA alone. Traditional approaches, such as those developed by Jaeger et al., laid the foundation for SoS imaging and aberration correction using multiple ultrasound transmission angles [48], followed by improved forward models and estimation pipelines [49], [50]. Recently, deep learning has further advanced this field, particularly with synthetic datasets generated through the k-Wave software. These datasets allow precise annotation of ultrasound signals with local sound speed information, enabling more accurate SoS predictions [51], [52], [53], [54] and supporting both research and potential clinical translation.

Building upon these advancements, Shi et al. developed a DNN model that generates SoS maps from US images and then applies the conventional time-reversal (TR) method to reconstruct PA images [55]. While this approach compensates for heterogeneous SoS, TR remains computationally expensive and can struggle with noise and background clutter. To our knowledge, a unified end-to-end architecture that estimates SoS from US and performs PA reconstruction via deep learning has not been developed.

Here we present an end-to-end deep-learning framework that (i) predicts a spatially varying SoS map from US B-mode and (ii) reconstructs a PA image conditioned on that map—enabling aberration correction without RF data or TR. We acquire co-registered US/PA frames on a clinically compatible PA–US system. A fine-tuned residual encoder–decoder in our network predicts the SoS map from the US image. From the predicted SoS range, we uniformly sample eight SoS values and beamform PA images to form an 8-image stack. The predicted SoS map is then one-hot encoded into 80 bins and fed to squeeze-and-excitation (SE) gates to modulate decoder features [56], while the 8-image PA stack is projected by a 1×1 convolution into an 80-channel embedding. The two streams are fused to yield the final PA reconstruction. By exploiting the high SNR and rich speckle of US and explicit SoS conditioning, our method corrects phase errors, suppresses sidelobes, and reduces background noise with far lower runtime than the TR method. We validate the approach on in-silico, ex-vivo, and in-vivo data, showing consistent gains in full width at half maximum (FWHM), contrast-to-noise ratio (CNR), peak signal-to-noise ratio (PSNR; reported only where a reference exists—in-silico and ex-vivo phantom), and computational efficiency in heterogeneous media.

2. Methods

2.1. Deep-learning network architecture

As shown in Fig. 1, our deep-learning network has two main tasks: estimating the SoS map from the US data and fusing the SoS map with the photoacoustic image. The network uses ResU-Net as the backbone architecture for both tasks [57], [58], which consists of an encoder and decoder connected by residual connections at each level.

Fig. 1.

Fig. 1

Two-task framework: Task 1 (blue) estimates the SoS map from a US stack using a pretrained ResU-Net (ResNet-34 encoder). Task 2 (gray) reconstructs PA images by fusing the predicted SoS (one-hot, 80 ch) with PA data (DAS×8 using single SoS values from the map, then 1×1 conv → 80 ch). Down-sampled SoS features modulate decoder activations via SE blocks (GAP–FC–ReLU–FC–Sigmoid, channel-wise weighting). Inputs/outputs are labeled. ‘ch’ in the figure means ‘channel’.

In the first task, we estimate the SoS map from US images. To better exploit speckle and texture, the network takes a multi-channel input composed of the original B-mode image plus two log-compressed variants with different offsets ("gain"). Here, "gain" is the dB-domain offset used in the log mapping (not electronic amplification) and effectively defines a display window [−G, 0] dB that is rescaled to [0, 1]. We use two windows [−G_low, 0] and [−G_high, 0] with G_low < G_high. G_low yields a tighter window that accentuates bright structures and edges (sharper boundaries), whereas G_high uses a wider window that lifts low-intensity echoes and reveals more subtle tissue texture at greater depths. For each group of images that shares similar acquisition and contrast statistics (e.g., one in-silico scene type, the ex-vivo phantom data, or the in-vivo data), we choose a single pair of G_low and G_high by briefly sweeping candidate values to bracket the B-mode intensity histogram. The same pair is then used for all images in that group, and we found the results to be insensitive to the exact choice. This selection mimics clinical gain tuning and exposes the model to complementary views of the same anatomy, improving over prior single-channel or synthetic-delay inputs [27], [51], [53]. Compared to the conventional U-Net [58], our ResU-Net employs a ResNet-34 encoder (stages with 3/4/6/3 residual blocks) pretrained on ImageNet [59]. Residual connections in the network ease optimization and deepen feature extraction; transfer learning provides generic priors on edges and tissue texture that speed convergence and improve accuracy with limited data. Batch normalization and rectified linear units (ReLU) follow each convolutional and transposed convolutional layer, except the final 1×1 convolutional layer, to stabilize training and introduce non-linearity.
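
The gain-windowing above can be sketched as follows (a minimal numpy sketch; the function names and the example G values are hypothetical, not the paper's implementation):

```python
import numpy as np

def log_compress(envelope, gain_db):
    """Map an envelope image to a [-gain_db, 0] dB display window rescaled to [0, 1].

    `gain_db` is the dB-domain offset G described in the text,
    not electronic amplification.
    """
    env = envelope / (envelope.max() + 1e-12)           # normalize to peak = 1
    db = 20.0 * np.log10(np.clip(env, 1e-12, None))     # convert to dB (<= 0)
    windowed = np.clip(db, -gain_db, 0.0)               # keep the [-G, 0] dB window
    return (windowed + gain_db) / gain_db               # rescale to [0, 1]

def make_multichannel_input(envelope, g_low=40.0, g_high=60.0):
    """Stack the raw B-mode with two log-compressed variants (G_low < G_high)."""
    raw = envelope / (envelope.max() + 1e-12)
    return np.stack([raw,
                     log_compress(envelope, g_low),
                     log_compress(envelope, g_high)], axis=0)
```

The wider window (G_high) maps every pixel to a value at least as large as the tighter window does, which is what "lifts low-intensity echoes" means in practice.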

In the second task, we fuse the estimated SoS map with the PA data. We adopt an 80-channel ResU-Net variant (U-shaped), chosen for its ability to capture features from both the SoS distribution and PA artifacts. In this configuration, pretrained weights are not used because the modified 80-channel input is incompatible with standard pretrained backbones. To better align the two data modalities, the predicted SoS map is one-hot encoded into 80 discrete channels (details in Supplementary Method 9). In parallel, we beamform eight single-SoS DAS reconstructions using speeds uniformly sampled from the min–max range of the predicted SoS; these eight images are stacked and passed through a 1×1 convolution to produce an 80-channel feature map, matched to the SoS one-hot dimensionality. The 80-channel SoS map is downsampled to match the spatial resolution at each decoder stage and fed to an SE block [56], which generates channel-wise reweighting vectors. Within the SE block, global average pooling (GAP) first "squeezes" each feature map into a per-channel descriptor; this descriptor is then passed through two fully connected (FC) layers with a ReLU and a sigmoid to generate the channel-wise weights. These weights are finally applied to the decoder features via element-wise channel multiplication. This design encourages the network to emphasize acoustically informative channels and strengthens the fusion of SoS information with PA reconstructions.
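
The SE gating step can be sketched conceptually (a minimal numpy sketch with hypothetical weight matrices standing in for the learned FC layers; the actual network is a trained PyTorch model):

```python
import numpy as np

def se_gate(decoder_feats, sos_feats, w1, w2):
    """Channel-wise SE gating: squeeze the SoS features by global average
    pooling, excite through two FC layers (ReLU then sigmoid), and reweight
    the decoder features channel by channel.

    decoder_feats, sos_feats: (C, H, W); w1: (C_r, C); w2: (C, C_r),
    where C_r is the bottleneck (reduced) channel count.
    """
    squeeze = sos_feats.mean(axis=(1, 2))            # GAP -> per-channel descriptor (C,)
    hidden = np.maximum(w1 @ squeeze, 0.0)           # FC + ReLU (bottleneck)
    weights = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))   # FC + sigmoid -> (C,) in (0, 1)
    return decoder_feats * weights[:, None, None]    # element-wise channel multiplication
```

Because the sigmoid keeps every weight in (0, 1), the gate can only attenuate channels, never amplify them; "emphasis" arises by suppressing the less informative channels relative to the rest.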

2.2. Loss functions and training configuration

For both deep-learning tasks—estimating SoS maps and generating high-resolution PA images—the models were trained over 300 epochs with a batch size of two, employing distinct learning rates tailored to each task. Specifically, learning rates were chosen by short log-spaced range tests with validation monitoring. For the first task we tested {1×10⁻², 5×10⁻³, 2×10⁻³, 1×10⁻³, 5×10⁻⁴}; for the second task we examined {1×10⁻⁴, 5×10⁻⁵, 2×10⁻⁵, 1×10⁻⁵, 5×10⁻⁶}. Each candidate was run for 10–15 pilot epochs, and we selected the largest rate that yielded a stable, monotonically decreasing validation loss without divergence: 2×10⁻³ for the first task and 2×10⁻⁵ for the second task. The lower rate in the second task reflects its greater sensitivity to step size from fusing multimodal inputs and an untrained 80-channel encoder. After selection, learning rates were kept fixed. To prevent overfitting, early stopping was implemented based on the validation loss. Training was optimized using the Adam optimizer [60] with a weight decay of 1×10⁻⁸, which helped stabilize updates and regularize the models. All models were developed using PyTorch [61] version 1.12 and executed on an NVIDIA GeForce RTX 4090D graphics card. GPU memory usage differed between tasks, with the first task consuming 2.7 GB and the second task 11.8 GB. For both tasks, we optimized performance using the following loss function:
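
The range-test selection rule described above can be sketched as follows (an illustrative sketch; the function name and the pilot-loss format are our assumptions, not the paper's code):

```python
import numpy as np

def pick_learning_rate(candidates, val_losses):
    """Select the largest candidate rate whose pilot-run validation losses
    decrease monotonically (no divergence or oscillation).

    `val_losses[lr]` holds the per-epoch validation losses from the short
    pilot run for candidate rate `lr`.
    """
    stable = [lr for lr in candidates
              if np.all(np.diff(val_losses[lr]) < 0.0)]
    # Fall back to the smallest rate if no candidate was stable.
    return max(stable) if stable else min(candidates)
```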

L = L_char(X_S, Y) + μ·L_edge(X_S, Y) (1)

Here, X_S denotes the predicted image and Y represents the ground-truth image. The coefficient μ, which controls the relative weighting of the two loss components, was fixed at 0.05, following established practices [62]. The term L_char is the Charbonnier loss [63]:

L_char = √(‖X_S − Y‖² + ε²) (2)

We set the constant ε to 10⁻³ across all experiments based on empirical determination. Additionally, L_edge is the edge loss, which is formulated as:

L_edge = √(‖∇²(X_S) − ∇²(Y)‖² + ε²) (3)

where ∇² represents the Laplacian operator. To prevent overfitting, we applied early stopping based on the validation PSNR between the predicted image X_S and the ground truth Y: we saved the checkpoint with the highest validation PSNR and stopped training when this metric failed to improve within a fixed patience window.
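
Under the definitions in Eqs. (1)–(3), the combined loss can be sketched in numpy (a per-pixel-averaged reading of the Charbonnier and edge terms with a 5-point-stencil Laplacian; the averaging convention and stencil are our assumptions, and the paper trains with a PyTorch implementation):

```python
import numpy as np

def _laplacian(img):
    """5-point-stencil Laplacian with zero-valued borders."""
    out = np.zeros_like(img)
    out[1:-1, 1:-1] = (img[2:, 1:-1] + img[:-2, 1:-1] +
                       img[1:-1, 2:] + img[1:-1, :-2] - 4.0 * img[1:-1, 1:-1])
    return out

def charbonnier(x, y, eps=1e-3):
    """Eq. (2): smooth L1-like penalty, averaged over pixels."""
    return np.mean(np.sqrt((x - y) ** 2 + eps ** 2))

def total_loss(x, y, mu=0.05, eps=1e-3):
    """Eq. (1): Charbonnier term plus the edge (Laplacian) term weighted by mu."""
    edge = np.mean(np.sqrt((_laplacian(x) - _laplacian(y)) ** 2 + eps ** 2))
    return charbonnier(x, y, eps) + mu * edge
```

With identical inputs the loss reduces to ε·(1 + μ), the floor introduced by the Charbonnier smoothing constant.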

2.3. In-silico dataset preparation

A dual-modal PA–US imaging system was simulated using the k-Wave toolbox [52]. Ultrasound RF data were generated from spatial distributions of SoS and density, while PA RF data were generated from scene-specific initial-pressure maps (p0); both modalities were run on the same spatial grid, time step (Δt), transducer model, and perfectly matched layer (PML) to ensure identical acoustic settings. Fig. 2 summarizes the distribution of our in-silico scenes. We instantiate six scenario categories: (1) Small Tissue; (2) Various Tissues (large tissue mixed with smaller tissue); (3) Small Tissue + Skin; (4) Big Tissue + Skin; (5) Various Tissues + Skin (skin over mixed large/small tissues); and (6) Multi-layers (stacked skin, big tissue, small tissue, and blood/water). Class-wise acoustic parameters are listed in Table 1 [54], [64], [65], [66], [67], [68], [69], [70], [71], [72]. For example, "Big Tissue" is obtained by averaging values for skeletal muscle and heart muscle; "Small Tissue" follows heart-lumen-like parameters; and "Skin" uses typical values reported for human skin (epidermis and dermis) as the superficial soft-tissue layer. In the spatial layout, "Big Tissue" regions are modeled with simplified ellipses to capture realistic bulk mass variation, whereas the placement of "Small Tissue" regions is guided by image structures derived from breast ultrasound datasets [73]. Each scene is generated either with or without a uniform skin layer to maintain consistent boundary conditions when needed. Overall, the six scenarios are constructed from four base tissue classes—skin, small tissue, big tissue, and blood/water—with the skin-covered configurations and "Multi-layers" cases formed by stacking these classes.

Fig. 2.

Fig. 2

(a) Data-generation and reconstruction pipeline. PA branch: an initial-pressure map built from vessel masks plus simple synthetic shapes, together with the same SoS map, is supplied to k-Wave to simulate PA RF data; TR reconstruction with the true heterogeneous SoS map (TR+GT) yields the ground-truth PA image. US branch: a sound-speed map and a density map are supplied to k-Wave to simulate US RF data, which are then reconstructed with DAS beamforming to yield US B-mode images. (b) Representative in-silico scenes: Small Tissue, Various Tissues, Small Tissue-Skin, Big Tissue-Skin, Various Tissues-Skin, and Multi-Layers. For each scene we show the sound map (m/s) and the density map (kg/m³) shared by US and PA, and a representative US image; the corresponding PA initial-pressure map is generated as described in (a).

Table 1.

Characteristic parameters for speed of sound and density for different tissues.

Name  μ_c (m/s)  δ_c (m/s)  μ_density (kg/m³)  δ_density (kg/m³)
Skin 1624.0 91.8 1109.0 14.0
Small Tissues (e.g. Heart Lumen) 1578.2 11.3 1050.0 19.0
Big Tissue (e.g., Muscle, Heart Muscle) 1574.9 19.9 1085.2 29.8
Blood 1549.3 1.3 1020.0 1.0
Water 1482.3 0.0 994.0 0.0

Note: μ_c is the mean value of SoS and δ_c is the standard deviation of SoS. μ_density is the mean value of the density map, and δ_density is the standard deviation of the density map. Units: SoS (m/s), density (kg/m³).

For PA, each scene additionally includes an initial-pressure map p0 built from vessel masks in [74], augmented with simple ellipses and lines to create spatially varying absorbers. We assume homogeneous 1064-nm illumination and assign absorption primarily to the vessel masks (background negligible). At this wavelength, vessel absorption coefficients are sampled from reported ranges for oxygenated and deoxygenated blood, so that different vessels exhibit slightly different p0 amplitudes even after normalization and thus span a range of intrinsic absorber strengths. Under this homogeneous-fluence assumption, depth-dependent optical attenuation inside the object is not modeled. The p0 maps are optionally edge-smoothed and normalized, then propagated in k-Wave on the same SoS and density grids used for US, so US scattering and PA propagation share identical acoustic conditions (see Supplementary Method 3 for the p0 formulation and symbol definitions).

After generating the spatial distributions of density and SoS for US imaging, as well as the p0 for PA imaging, the corresponding RF data were generated using the k-Wave toolbox [52], as illustrated in Fig. 2(a). We denote the RF data matrices by r_us ∈ ℝ^(T_us×M) and r_pa ∈ ℝ^(T_pa×M), where each column is the time trace recorded by one receive element, M = 128, and T_* is the number of time samples per A-line. In our setup, T_us = 4301 and T_pa = 2048. The probe was modeled as a flat 1D linear array of 128 contiguous elements with 0.3-mm lateral pitch and width (4 grid points, zero kerf), placed along the top boundary of the computational grid, centered laterally and aligned with the y-axis, and used for both transmission and reception. Both modalities were simulated under identical acoustic settings—the same spatial grid, medium maps, transducer model, PML, and Δt (thus the same temporal sampling rate; see Supplementary Method 4 for discretization and sampling details). Detailed simulation parameters (Table 2) were as follows.

Table 2.

k-Wave parameter values.

k-wave parameter Parameter value
Grid size 0.075 mm
Simulation region size 30 mm×45 mm×8 mm
Speed of sound Designed using Ultrasound Dataset [73]
Media Density Designed using Ultrasound Dataset [73]
Transducer element number 128
Transducer element width 0.3 mm
CFL (Courant–Friedrichs–Lewy) number 0.3
PML thickness 20 grid points

During data generation, PA RF data in all splits (train/val/test) were corrupted with additive white Gaussian noise (AWGN) using k-Wave's addNoise. For each frame (2048 samples × 128 channels), a target SNR was uniformly sampled between 5 and 15 dB, and addNoise was applied channel-wise to scale the zero-mean noise accordingly, yielding r_pa^noisy = r_pa + n ∈ ℝ^(2048×128). We did not add noise to US RF data, as its acquisition SNR is typically high and stable under our settings. To support both tasks—SoS estimation from US and PA reconstruction using the estimated SoS—samples from each configuration were proportionally allocated to training, validation, and testing; the per-task distributions are summarized in Table 3.
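
The channel-wise noise scaling can be sketched as follows (a simplified numpy stand-in for k-Wave's addNoise, written for illustration; it is not the toolbox routine itself):

```python
import numpy as np

def add_awgn(rf, target_snr_db, rng=None):
    """Add zero-mean white Gaussian noise channel by channel so that each
    channel (column) reaches the requested SNR in dB."""
    rng = np.random.default_rng() if rng is None else rng
    noisy = np.empty_like(rf, dtype=float)
    for ch in range(rf.shape[1]):                  # columns are receive elements
        sig_pow = np.mean(rf[:, ch] ** 2)
        noise_pow = sig_pow / (10.0 ** (target_snr_db / 10.0))
        noisy[:, ch] = rf[:, ch] + rng.normal(0.0, np.sqrt(noise_pow), rf.shape[0])
    return noisy
```

In the pipeline above, the target SNR would be drawn per frame from U(5, 15) dB before calling this function.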

Table 3.

Sample numbers for training, testing and validation.

Small Tissue Various Tissues Small Tissue-Skin Big Tissue-Skin Various Tissues-Skin Multi-Layers Total
Training 300 300 900 600 900 600 3600
Validation 50 50 150 100 150 100 600
Testing 150 150 450 300 450 300 1800
All 500 500 1500 1000 1500 1000 6000

2.4. Experimental setup and data acquisition

2.4.1. Dual-modal US/PACT system and common acquisition settings

A co-registered ultrasound and photoacoustic tomography (US/PACT) system was used in this study (Supplementary Fig. 2 and Supplementary Method 2). A Q-switched 1064-nm Nd:YAG laser (Spectra-Physics) operated at a pulse repetition frequency of 20 Hz, and the laser output was delivered via a custom bifurcated fiber bundle to illuminate the target on both sides of the ultrasound probe. The surface fluence was set to 15 mJ/cm², remaining below the ANSI safety limit. Photoacoustic signals were detected by a 128-element linear array transducer (Verasonics L11–4V; center frequency 6.25 MHz, 96 % bandwidth) and acquired using a Verasonics Vantage research platform. For ultrasound imaging, five plane waves were transmitted at angles of −10°, −5°, 0°, 5°, and 10° with 100 μs intervals, followed by laser triggering for PA acquisition. Unless otherwise specified, the same acquisition settings were used across experiments to ensure fair comparisons.

2.4.2. Reference SoS measurement by acoustic transmission test

Material-matched reference sound speeds were measured using an acoustic transmission test with a tungsten-wire PA target, following prior work [13], [75]. Briefly, homogeneous slabs (agar, all-fat, and all-muscle) of known thickness d were placed in a water bath above a tungsten-wire reference. Under identical acquisition settings, PA A-lines from the tungsten wire were acquired with and without the slab in place. The slab sound speed cs was computed from the additional time-of-flight delay Δt introduced by the slab:

c_s = (1/c_w − Δt/d)⁻¹ (4)

where c_w denotes the speed of sound in water and Δt is the measured time shift between the tungsten-wire–induced PA peaks acquired with and without the slab. Reference SoS statistics (mean and standard deviation) were computed from repeated acquisitions for each slab material.
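
Eq. (4) translates directly into code; the sketch below (with hypothetical variable names) recovers the slab sound speed from the measured delay:

```python
def slab_sound_speed(c_water, delta_t, d):
    """Eq. (4): c_s = (1/c_w - dt/d)^(-1).

    c_water: speed of sound in water (m/s); delta_t: extra time-of-flight
    delay introduced by the slab (s, positive when the slab is faster than
    water); d: slab thickness (m).
    """
    return 1.0 / (1.0 / c_water - delta_t / d)
```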

2.4.3. In-vivo studies and ethics approvals

In-vivo validation was conducted in both an animal model and human volunteers. For the animal study, we imaged the neck of an 8-week-old male Sprague–Dawley rat (180–220 g). The rat was anesthetized with 1.2 % vaporized isoflurane and positioned supine on a 37 °C heated pad. A membrane-sealed water tank was placed over the neck with ultrasound gel for acoustic coupling. All animal procedures were approved by the Animal Ethics Committee of the City University of Hong Kong. For the human study, in-vivo US/PA imaging was performed on the forearm of healthy adult volunteers under a protocol approved by the Human Research Ethics Committee of the City University of Hong Kong. In some acquisitions, an additional superficial coupling layer was introduced to emulate acoustic heterogeneity (details provided in the corresponding Results section).

2.4.4. ROI definition and quantitative metrics

For quantitative evaluation, we computed CNR and lateral FWHM using consistent ROIs within each dataset. For CNR, a signal ROI was defined as a depth band containing visible vessels, while a background ROI was defined as a vessel-free depth band with the same lateral extent to estimate background fluctuation. The same ROI sizes and lateral span were used for all reconstruction methods for that dataset. When different SoS assumptions led to an apparent axial shift of structures, ROIs were shifted only along depth to remain centered on the same anatomical region, while keeping ROI dimensions unchanged to ensure fair comparison. CNR was computed using the standard ROI-based definition (Supplementary Method 8).
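
One common ROI-based CNR definition consistent with the description above can be sketched as follows (the paper's exact formula is in Supplementary Method 8, so this particular form is our assumption):

```python
import numpy as np

def cnr(image, signal_roi, background_roi):
    """ROI-based contrast-to-noise ratio: |mean(signal) - mean(background)|
    normalized by the background standard deviation.

    ROIs are (row_slice, col_slice) tuples over the same image.
    """
    sig = image[signal_roi]
    bg = image[background_roi]
    return np.abs(sig.mean() - bg.mean()) / (bg.std() + 1e-12)
```

Shifting both ROIs by the same depth offset, as described above for SoS-induced axial shifts, leaves the ROI dimensions and hence the comparison unchanged.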

FWHM was used for quantitative evaluation of profile sharpness on a representative target that is visible across reconstructions. For each dataset, we selected a target (e.g., a point target in in-silico phantoms or a representative vessel segment in experiments) and kept the target identity and measurement location consistent across all methods. For each reconstruction, a 1D intensity profile was extracted through the local peak of the target. The profile was normalized by its peak value, and the FWHM was computed as the full width at half maximum. Depending on the experiment, the profile was taken either along the axial direction or the lateral direction. In the in-silico point-target study, we report both axial and lateral FWHM as functions of depth. In ex-vivo and in-vivo vessel experiments, we primarily report lateral FWHM, because vessel orientations and shapes can make axial width less comparable across cases.
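
The FWHM measurement described above can be sketched with linear interpolation at the half-maximum crossings (a minimal sketch; the interpolation detail is our assumption):

```python
import numpy as np

def fwhm(profile, dx=1.0):
    """Full width at half maximum of a 1D intensity profile.

    The profile is normalized by its peak; the two half-maximum crossings
    are located by linear interpolation. `dx` is the sample spacing.
    """
    p = np.asarray(profile, dtype=float)
    p = p / p.max()                          # normalize to peak = 1
    above = np.where(p >= 0.5)[0]
    left, right = above[0], above[-1]
    if left > 0:                             # interpolate the left crossing
        l = left - 1 + (0.5 - p[left - 1]) / (p[left] - p[left - 1])
    else:
        l = float(left)
    if right < len(p) - 1:                   # interpolate the right crossing
        r = right + (0.5 - p[right]) / (p[right + 1] - p[right])
    else:
        r = float(right)
    return (r - l) * dx
```

For a Gaussian profile with standard deviation σ, this returns approximately 2.355σ, the analytic FWHM.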

In addition to ROI definitions for quantitative metrics, region partitioning is also required for the dual-SoS method, where a piecewise-constant SoS map is constructed for two-layer beamforming. Specifically, the dual-SoS map was constructed as a two-region, piecewise-constant approximation derived from the US image. The split boundary was defined using the most clearly delineated layer interface. For the three-layer phantom in Fig. 4(a), we used the second (most prominent) interface as the split, merging the upper two layers into one region and treating the bottom layer as the second region; the two SoS values were set as the mean SoS within each region.
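
The two-region piecewise-constant construction can be sketched as follows (a minimal sketch; the split-row input is a hypothetical stand-in for the interface delineated in the US image):

```python
import numpy as np

def dual_sos_map(sos_map, split_row):
    """Two-region piecewise-constant approximation of a predicted SoS map:
    each region (above/below the layer interface) is replaced by its mean."""
    out = np.empty_like(sos_map, dtype=float)
    out[:split_row] = sos_map[:split_row].mean()
    out[split_row:] = sos_map[split_row:].mean()
    return out
```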

Fig. 4.

Fig. 4

(a) Qualitative results. I: GT SoS map used in k-Wave; II: US B-mode (DAS beamforming from k-Wave-simulated RF data); III: SoS predicted by deep learning; IV: PA image reconstructed with DAS using the single SoS selected by the AUS (optimal c = 1526 m/s); V: dual-speed DAS (CF-number based, c_shallow/c_deep = 1520/1480 m/s; three-layer split at the second US interface; see Section 2.4.4); VI: deep-learning PA-only fusion using SegU-Net; VII: time-reversal with the predicted SoS; VIII: proposed DL reconstructor using B-mode stacks + predicted SoS (RF-free); IX: reference (time-reversal with ground-truth SoS; green boxes mark the 12 point targets). (b–e) Quantitative analysis: (b) CNR comparison across methods; (c) axial FWHM vs. depth; (d) PSNR vs. reference (time-reversal with ground-truth SoS); (e) lateral FWHM vs. depth.

3. Results

3.1. In-silico and ex-vivo phantom study of speed of sound estimation

For each ground truth SoS map, we generated 20 independent US-RF realizations by applying small RF-level perturbations while keeping the SoS map fixed, and we report errors as mean absolute error (MAE)±standard deviation (SD) across these realizations (Table 4).

Table 4.

Category-wise SoS estimation error (MAE, m/s; mean±SD) and structural similarity (SSIM; mean±SD) for our network on the in-silico test set. Statistics are computed over 20 US-RF realizations per ground-truth SoS map.

Category SoS estimation error (MAE, mean±SD) Structural similarity index (SSIM)
Small Tissue 0.95 ± 0.06 0.81 ± 0.02
Various Tissues 1.28 ± 0.11 0.89 ± 0.01
Small Tissue-Skin 2.27 ± 0.18 0.78 ± 0.02
Big Tissue-Skin 1.95 ± 0.14 0.98 ± 0.01
Various Tissues-Skin 2.82 ± 0.19 0.88 ± 0.03
Multi-Layers 3.90 ± 0.49 0.79 ± 0.02

Under this protocol, our fine-tuned ResU-Net delivers accurate SoS estimation across tissue configurations. In simpler cases such as "Small Tissue," the network yields minimal error, while more complex scenarios—layered structures or skin interfaces—lead to larger deviations. The highest error is observed in the "Multi-Layers" case, reflecting the greater challenge posed by abrupt acoustic transitions. Despite these variations, ResU-Net reliably preserves anatomical boundaries and reconstructs key spatial patterns of SoS, indicating strong generalization to heterogeneous media. To quantify structural fidelity, we additionally computed the structural similarity index (SSIM) between the predicted and ground-truth SoS maps. ResU-Net attains high SSIM values across all categories and outperforms U-Net and SegU-Net (see Table 4 and Supplementary Table S2). A detailed comparison with U-Net and SegU-Net is provided in Supplementary Fig. 1, where ResU-Net consistently outperforms the other models in both spatial fidelity and quantitative accuracy.

For quantitative assessment, the mean SoS and its standard deviation were computed from ten repeated acquisitions and compared with material-specific reference SoS values measured on homogeneous slabs using the acoustic transmission test described in Section 2.4.2; these reference values are listed in Supplementary Table 3. As summarized in Fig. 3(b), our network delineates muscle–fat boundaries and accurately recovers the depth-dependent SoS with minimal bias, supporting generalization beyond the synthetic domain. The quantitative comparison is reported in Table 5.

Fig. 3.

Fig. 3

(a) In-silico test results across six categories (Small Tissue, Various Tissues, Small-Tissue–Skin, Big-Tissue–Skin, Various-Tissues–Skin, Multi-Layers). For each case the rows show the US B-mode input, the ResU-Net prediction, and the ground-truth SoS map; color bars indicate US intensity (dB) and SoS (m/s). Errors are summarized as MAE±SD over 20 independent RF realizations per ground-truth SoS map (see Table 4). (b) Ex-vivo layered bovine-steak/agar phantom. Left: specimens; middle: US B-mode; right: predicted SoS map with labeled ROIs (1–5). ROIs 3–1 and 3–2 denote the muscle and fatty regions within the third bovine layer, respectively. ROI-wise reference SoS values are obtained from homogeneous slab measurements using the tungsten-wire method (Supplementary Method 5; Supp. Table 3), and the corresponding MAE±SD are reported in Table 5.

Table 5.

ROI-wise Speed-of-Sound Evaluation on the Ex-Vivo Layered Phantom (Material-Wise Reference vs. Our Network Estimates). ‘Ground Truth SoS’ denotes the material-wise reference mean±SD from homogeneous slab measurements (Supp. Table S1). ‘Estimated SoS’ is the segment-wise mean±SD from the predicted SoS map. ‘Error’ reports bias = (estimated mean − reference mean); the uncertainty reflects the SD of the segment estimate.

Area  Depth (mm)  Material  Ground-truth SoS (m/s)  Estimated SoS (ResU-Net, m/s)  Error (m/s)
1  20.0–24.9  steak (muscle)  1604.6 ± 6.2  1609.5 ± 7.9  4.9 ± 7.9
2  24.9–27.6  agar  1538.3 ± 1.2  1526.0 ± 9.6  12.3 ± 9.6
3–1  27.6–32.7  steak (muscle)  1604.6 ± 6.2  1593.4 ± 10.2  11.2 ± 10.2
3–2  27.6–32.7  steak (fat)  1502.4 ± 3.9  1494.2 ± 1.5  8.2 ± 1.5
4  32.7–35.4  agar  1538.3 ± 1.2  1540.0 ± 9.6  1.7 ± 9.6
5  35.4–40.0  steak (fat)  1502.4 ± 3.9  1514.8 ± 11.9  12.4 ± 11.9

3.2. In-silico phantom study of photoacoustic image reconstruction using SoS map

In the in-silico study, we first used k-Wave to synthesize co-registered US/PA data from ground-truth (GT) SoS maps for the six tissue scenarios described in Section 2.3. These data were used to train and quantitatively evaluate the SoS-prediction network. For the reconstruction experiment in Fig. 4, we additionally generated a three-layer in-silico phantom using the same k-Wave configuration but with an extended axial field-of-view (12 cm), containing twelve point absorbers placed between 2 and 8 cm depth to enable depth-dependent FWHM and CNR analysis. Our network first predicted the SoS map from the US images (MAE = 3.4 m/s). We used several reconstructions for comparison: DAS with a single SoS using automatic SoS selection (AUS) via the Brenner focus function [76] (details in Supplementary Method 6.1); dual-speed DAS chosen by CF number [2] (Supplementary Method 6.2); deep-learning PA-only fusion (SegU-Net) [46]; time-reversal with the predicted SoS (TR+pred); and our method [77]. Time-reversal with the ground-truth SoS (TR+GT) served as the reference.
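
The Brenner-based automatic SoS selection (AUS) can be sketched as follows (a minimal sketch; the two-sample step size and the beamforming callback are illustrative assumptions, with the actual procedure in Supplementary Method 6.1):

```python
import numpy as np

def brenner_focus(image):
    """Brenner gradient focus measure: sum of squared two-sample differences,
    taken here along depth. Larger values indicate a sharper image."""
    d = image[2:, :] - image[:-2, :]
    return float(np.sum(d ** 2))

def select_sos(candidate_speeds, beamform):
    """Pick the single SoS whose DAS reconstruction maximizes the Brenner
    focus function; `beamform(c)` is a user-supplied reconstruction callback
    that returns the image beamformed at speed c."""
    scores = [brenner_focus(beamform(c)) for c in candidate_speeds]
    return candidate_speeds[int(np.argmax(scores))]
```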

Fig. 4 summarizes these comparisons. As expected, TR+GT provides the sharpest reference reconstruction. Using the predicted SoS in TR (TR+pred) yields axial and lateral FWHM curves that closely track those of TR+GT, confirming that the SoS error of 3.4 m/s is small enough for a highly SoS-sensitive model-based reconstruction. Our learned method achieves resolution largely comparable to TR+pred (slightly broader axial FWHM at some depths but similar or even smaller lateral FWHM at others), while consistently providing higher CNR (Fig. 4b) and higher PSNR relative to TR+GT (Fig. 4d). All bars are reported as mean ± SD, computed from 10 independent re-simulations in which we added AWGN of the same power to the PA RF data and re-ran the k-Wave pipeline (the US RF data remain noise-free).
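The Monte Carlo protocol above adds an independent AWGN realization of fixed power to the PA RF data before each re-run. A minimal sketch (the noise power of 0.01 and the channel/sample counts are placeholders, not the paper's values):

```python
import numpy as np

def add_awgn(rf, noise_power, rng):
    """Add zero-mean white Gaussian noise of a fixed power
    (variance) to the RF data array."""
    noise = rng.normal(0.0, np.sqrt(noise_power), size=rf.shape)
    return rf + noise

# Ten independent realizations at the same noise power, as in the
# mean +/- SD protocol; each would then be fed through reconstruction.
rng = np.random.default_rng(0)
rf_clean = np.zeros((128, 2048))  # channels x time samples (placeholder)
trials = [add_awgn(rf_clean, noise_power=0.01, rng=rng) for _ in range(10)]
```

Keeping the noise power fixed across trials makes the spread of the resulting FWHM/CNR/PSNR values attributable to noise realization alone.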

Overall, these in-silico results indicate that a reasonably accurate SoS predicted from US is sufficient for effective acoustic correction, and that our method achieves a favorable trade-off: resolution close to TR+pred and TR+GT, but with improved CNR and substantially lower computational cost than physics-based TR (see Section 3.6 and Supplementary Method 8 for an overall runtime comparison), while providing markedly better image quality than classical DAS beamforming.

3.3. Ex-vivo tissue phantom study for enhanced PA reconstruction using SoS map

To further validate our method, we performed an ex-vivo study with a chicken-breast slab covering a 90 μm tungsten wire (Fig. 5a). The reference image (Fig. 5i) was acquired after removing the tissue, providing a distortion-free result. The chicken-breast SoS was measured as 1539.8 ± 1.47 m/s using the acoustic transmission test in Eq. (4) on six samples of different thicknesses. The predicted SoS map (Fig. 5c), with a mean of 1542.8 ± 3.67 m/s over six images, agrees closely with this measurement, supporting accurate aberration correction in heterogeneous tissue. Fig. 5 compares PA reconstructions: (d) single-SoS AUS (1540 m/s); (e) dual-speed DAS (CF based); (f) deep-learning PA-only fusion (SegU-Net); (g) time-reversal with the predicted SoS map; and (h) our proposed method using the predicted SoS map. Consistent with the in-silico study, our method (Fig. 5h) best localizes the wire, with higher contrast and fewer sidelobes than single- or dual-speed beamforming (Fig. 5d–e). The TR+pred result (Fig. 5g) improves focusing but remains noisier, while SegU-Net (Fig. 5f) exhibits poorer axial sharpness and occasional spurious double peaks.
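Eq. (4) is not reproduced in this section; a common form of the acoustic transmission test is the water-substitution method, which estimates a slab's SoS from the arrival-time shift it causes relative to a water-only path. A minimal sketch under that assumption, with placeholder thicknesses and an assumed water reference speed:

```python
import numpy as np

C_WATER = 1480.0  # m/s, assumed water reference speed (temperature-dependent)

def substitution_sos(thickness_m, dt_s, c_w=C_WATER):
    """Water-substitution estimate: inserting a slab of thickness d
    shifts the through-transmission arrival time by
    dt = d/c_w - d/c_s, so c_s = d / (d/c_w - dt)."""
    return thickness_m / (thickness_m / c_w - dt_s)

# Repeat over slabs of different thickness and report mean +/- SD,
# mirroring the six-sample protocol in the text (values are synthetic).
thicknesses = np.array([0.010, 0.015, 0.020])        # m (placeholders)
dts = thicknesses / C_WATER - thicknesses / 1540.0   # shifts for a 1540 m/s slab
est = substitution_sos(thicknesses, dts)
print(est.mean(), est.std())
```

Measuring several thicknesses and averaging, as done here with six samples, reduces the sensitivity of the estimate to timing jitter in any single measurement.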

Fig. 5.

Fig. 5

Ex-vivo chicken-breast experiment. (a) Schematic showing the measurement plane. (b) B-mode US image of the region (scale bar: 5 mm). (c) Predicted SoS map (m/s). (d–i) PA reconstructions with different methods: (d) DAS using the single SoS selected by the AUS (optimal c = 1540 m/s); (e) dual-speed DAS based on the CF number (cshallow/cdeep = 1500/1550 m/s); (f) deep-learning PA-only fusion using SegU-Net; (g) time-reversal using the predicted SoS map; (h) proposed deep-learning method with the predicted SoS map; and (i) reference image acquired after removing the tissue (tungsten wire only). (j) CNR comparison (dB) of the wire target across methods. (k) Gain-normalized PSNR (dB) against the reference wire image within the common ROI. (l) Lateral profiles through the wire (peak-normalized) with FWHM reported in the legend; the dashed line denotes half-maximum. Mean ± SD for (j–l) are computed from 6 repeated acquisitions.

For quantitative analysis, we extracted lateral profiles through the wire and computed FWHM (Fig. 5l), CNR (Fig. 5j), and PSNR (Fig. 5k); means and standard deviations were computed from 6 repeated acquisitions. The lateral profiles (Fig. 5l) show that our method has a narrower FWHM than the other methods, and the bar chart in Fig. 5(j) indicates that it also achieves the highest CNR. PSNR in Fig. 5(k) shows only minor differences across methods: because PSNR is computed in a cropped ROI after only a tiny lateral shift and a single global gain (Supplementary Method 7), it primarily reflects structural MSE around the single wire target, so the methods yield similar PSNR.
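A typical FWHM measurement from a peak-normalized lateral profile locates the half-maximum crossings with linear interpolation; the helper below is our own sketch of that procedure, not the paper's code:

```python
import numpy as np

def fwhm(profile, dx):
    """Full width at half maximum of a single-peaked profile sampled
    at spacing dx, with linear interpolation at the crossings."""
    p = np.asarray(profile, dtype=float)
    p = p / p.max()                      # peak-normalize
    above = np.flatnonzero(p >= 0.5)     # samples at or above half-max
    left, right = above[0], above[-1]

    def cross(i, j):
        # fractional index where the profile crosses 0.5 between i and j
        return i + (0.5 - p[i]) / (p[j] - p[i]) * (j - i)

    x_l = cross(left - 1, left) if left > 0 else float(left)
    x_r = cross(right + 1, right) if right < len(p) - 1 else float(right)
    return (x_r - x_l) * dx
```

For a Gaussian line-spread function this recovers the familiar FWHM = 2√(2 ln 2)·σ ≈ 2.355σ; the interpolation keeps the estimate sub-pixel accurate, which matters when comparing methods whose widths differ by less than a pixel.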

Collectively, in this ex-vivo phantom with a ground-truth reference, our method achieves higher CNR while maintaining FWHM comparable to TR+pred. Compared with SegU-Net, it produces a sharper depiction of the wire target with comparable CNR. Overall, it provides the best balance of resolution and contrast among the five methods, demonstrating effective correction of SoS-induced aberrations and improved PA image fidelity.
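The gain-normalized PSNR reported in this section (Supplementary Method 7) matches the reconstruction to the reference with a single global gain before computing PSNR. A sketch under two assumptions: the gain is the least-squares fit, and the ROI crop and tiny lateral shift have already been applied:

```python
import numpy as np

def gain_normalized_psnr(img, ref):
    """PSNR (dB) after a single global gain match: scale `img` by the
    least-squares gain g = <ref, img>/<img, img>, then compare to `ref`."""
    img = np.asarray(img, dtype=float).ravel()
    ref = np.asarray(ref, dtype=float).ravel()
    g = ref.dot(img) / img.dot(img)          # closed-form LS gain
    mse = np.mean((g * img - ref) ** 2)
    peak = ref.max()
    return 10.0 * np.log10(peak ** 2 / mse)
```

Because a single scalar gain cannot reshape structure, this metric is insensitive to overall brightness differences and, as noted above, mostly reflects structural error around the wire, which is why the methods score similarly.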

3.4. In-vivo prediction for internal jugular vein (IJV)

The in-vivo study protocols and ethics approvals are described in Section 2.4.3. Fig. 6 summarizes the reconstruction results of the comparison methods and our method.

Fig. 6.

Fig. 6

(a) Schematic of the in-vivo rat experiment with the imaging plane indicated. (b) B-mode US image of the jugular-vein region. (c) SoS map predicted by the deep-learning network. (d) PA image reconstructed with DAS using the single SoS selected by the AUS (optimal c = 1514 m/s). (e) Dual-speed DAS reconstruction, where the two speeds are chosen by the CF number (cshallow/cdeep = 1485/1580 m/s). (f) Deep-learning PA-only fusion using SegU-Net. (g) Time-reversal reconstruction driven by the predicted SoS map (TR+pred). (h) Our method using the predicted SoS map. In (d–h), dashed boxes mark two vessels (Line 1 and Line 2) used for quantitative evaluation; scale bar, 3 mm. (i) CNR comparison across methods (bar chart) measured within the field. (j,k) Lateral FWHM (bar charts) for Line 1 and Line 2, respectively. Error bars in (i–k) denote SD computed from 6 images.

From the B-mode US image (Fig. 6b), our network predicts a spatially varying SoS map (Fig. 6c) that is then fused with the PA image stack. We compared: AUS (Fig. 6d), dual-speed DAS (CF based; Fig. 6e), deep-learning PA-only fusion (SegU-Net; Fig. 6f), time-reversal driven by the predicted SoS (TR+pred; Fig. 6g), and our method (Fig. 6h). The dashed boxes in Fig. 6(d–h) mark two vessels (Line 1 and Line 2) used for quantitative analysis. The CNR comparison (Fig. 6i) shows our method achieving the highest CNR in the field, with SegU-Net a close second, while AUS and dual-speed DAS are lowest. TR+pred trails well behind our method due to stronger background clutter.
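The CNR values here compare a vessel ROI against a background ROI in dB. The exact ROI definitions follow the paper's supplementary methods, which are not reproduced, so the background-only noise term below is one common convention rather than the paper's confirmed formula:

```python
import numpy as np

def cnr_db(image, target_mask, background_mask):
    """Contrast-to-noise ratio in dB between a target ROI (e.g. a
    vessel) and a background ROI:
    CNR = 20 * log10(|mu_t - mu_b| / sigma_b)."""
    t = image[target_mask]
    b = image[background_mask]
    return 20.0 * np.log10(abs(t.mean() - b.mean()) / b.std())

# Toy usage: a bright 2x2 target over a noisy background.
rng = np.random.default_rng(0)
img = 0.1 * rng.standard_normal((10, 10))
img[2:4, 2:4] += 10.0
tm = np.zeros((10, 10), dtype=bool); tm[2:4, 2:4] = True
bm = np.zeros((10, 10), dtype=bool); bm[6:, 6:] = True
print(cnr_db(img, tm, bm))  # roughly 40 dB for these parameters
```

Reduced background clutter lowers both μ_b and σ_b, which is why the clutter suppression noted for our method translates directly into higher CNR.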

For Line 1, the upper vessel wall exhibited composite peaks due to near-field interference (most evident with AUS). We therefore measured FWHM on the lower wall for Line 1 and on the clean upper wall for Line 2. Under this setting, our method produced the smallest FWHM at both Line 1 and Line 2, comparable to TR+pred. Taken together, our method shows sharper vessel profiles while delivering higher vessel-to-background CNR, underscoring the value of injecting a predicted SoS map from US images to enable effective aberration correction in vivo.

3.5. In-vivo imaging of human forearm

To further assess clinical generalizability, we performed in-vivo US/PA imaging on a healthy adult volunteer under an approved protocol (Section 2.4.3). In this experiment, a thin, boneless chicken-breast layer was placed between the probe and the forearm to introduce superficial acoustic heterogeneity (Fig. 7a).

Fig. 7.

Fig. 7

(a) Schematic of the forearm experiment; a boneless chicken-breast slab was placed on the forearm to emulate layered soft-tissue propagation. (b) B-mode US image. (c) SoS map predicted by the deep network. (d–h) PA reconstructions: (d) single-SoS DAS selected by an autofocus approach (AUS; optimal c = 1668 m/s); (e) dual-speed DAS chosen by the CF number (cshallow/cdeep = 1500/1668 m/s); (f) deep-learning PA-only fusion (SegU-Net); (g) time-reversal using the predicted SoS; and (h) the proposed DL reconstructor conditioned on the predicted SoS. In (d–h), dashed boxes mark two vessels (Line 1 and Line 2) used for quantification, and the insets show magnified views; scale bar, 3 mm. (i) CNR comparison for the representative acquisition shown in this figure, within a common ROI after a single global gain match (cohort-level case-by-case CNR is summarized in Supplementary Fig. S3b). (j,k) Lateral FWHM (bar charts) for Line 1 and Line 2, respectively. Error bars in (i–k) denote SD computed from 6 repeated images for this acquisition.

From the co-registered US data, we estimated a spatially varying SoS map (Fig. 7c) and generated several PA reconstructions: single-SoS DAS with AUS, dual-speed DAS (CF based), deep-learning PA-only fusion (SegU-Net), time-reversal using the predicted SoS (TR+pred), and our method (Fig. 7d–h; US image in Fig. 7b).
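The dual-speed DAS baseline selects between two sound speeds via the CF (coherence factor) number [2]. The selection rule itself (Supplementary Method 6.2) is not reproduced here; the sketch below shows only the standard per-pixel coherence factor on delayed channel data, i.e. coherent power over total power:

```python
import numpy as np

def coherence_factor(delayed):
    """Coherence factor per pixel from delayed channel data of shape
    (n_channels, n_pixels): CF = |sum_n s_n|^2 / (N * sum_n |s_n|^2),
    ranging from 0 (incoherent) to 1 (perfectly coherent)."""
    n = delayed.shape[0]
    coherent = np.abs(delayed.sum(axis=0)) ** 2
    incoherent = n * (np.abs(delayed) ** 2).sum(axis=0)
    return coherent / np.maximum(incoherent, 1e-30)
```

When the assumed beamforming speed matches the true SoS, the delayed channel signals align and the CF approaches 1, so comparing CF under candidate speeds gives a natural criterion for choosing the shallow and deep speeds.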

Visually, our method yields clearer vessel continuity and boundaries. For quantification, we extracted lateral intensity profiles along two representative vessels (Line 1 and Line 2 in Fig. 7d; results in Fig. 7j,k). To avoid bias, profiles were taken on segments locally orthogonal to the vessel wall that exhibit a single-peaked line-spread function; we excluded the superficial, obliquely oriented segment of Vessel 2 and the upper wall of Vessel 1. Our method achieved the narrowest lateral FWHM for both Line 1 and Line 2 among the five methods. Thus, even under stronger, layered SoS heterogeneity, our method preserves sharper vessels with high CNR, underscoring robust aberration correction.

3.6. Overall performance and computational cost

Across the in-silico phantoms, our model attains CNR on par with TR+GT and higher than TR+pred, while suppressing sidelobes and artifacts (Fig. 4). On the ex-vivo phantom and in-vivo data, our method delivers the highest CNR while maintaining lateral FWHM comparable to TR+pred (Fig. 5, Fig. 6, Fig. 7).

For the six other in-vivo forearm acquisitions, our method achieves a mean CNR of 48.38 ± 4.23 dB versus 33.94 ± 1.50 dB for TR+pred, while reducing the per-frame runtime from 37.89 ± 6.68 s to 2.54 ± 0.06 s (≈15× speed-up), as summarized in Supplementary Fig. S3.

CNR varies across acquisitions due to differences in probe angle/contact and vessel morphology. Supplementary Fig. S3 summarizes the per-case results across the six acquisitions, with implementation details provided in Supplementary Method 8. Together, these results confirm consistent CNR gains and a substantial reduction in computational cost for our approach compared with TR+pred.

4. Conclusions

In this study, we address a significant challenge in PA imaging: SoS aberrations and streak artifacts, which critically degrade image quality. We propose a framework that leverages the high signal-to-noise ratio and rich speckle information of co-registered US images to estimate spatially resolved SoS maps. These maps are then incorporated into a deep-learning-based PA reconstruction pipeline, enabling correction of acoustic aberrations and improved image fidelity. The proposed deep learning model accurately estimates SoS distributions and integrates them into the PA reconstruction process. Unlike methods such as that of Shi et al. [55], which rely on computationally intensive TR-based PA reconstructions, our approach offers a more efficient and scalable alternative.

Extensive in-silico, ex-vivo, and in-vivo studies show that, against four methods—AUS, dual-speed DAS (CF based), TR+pred, and an image-domain network without an SoS prior (SegU-Net)—our method yields sharper vascular delineation across depths/tissues, higher CNR, comparable PSNR (in-silico and ex-vivo phantom only) and substantially lower runtime than time-reversal. The integration of US-derived SoS maps successfully enhances the robustness of PA imaging, which is particularly critical for clinical applications where anatomical and acoustic heterogeneity are prevalent.

The present work is intentionally focused on soft-tissue–like heterogeneity, where SoS varies within a moderate range and density transitions are relatively smooth. In addition, our in-silico optical model assumes homogeneous 1064-nm illumination with contrast encoded through vessel-dependent absorption coefficients, without simulating depth-dependent fluence variations inside the object. While our experiments indicate that this simplified fluence model is sufficient to study the impact of SoS heterogeneity on PA reconstruction in this regime, more realistic light-transport modeling will be required to fully capture fluence distortions in highly heterogeneous tissues, such as brain imaging through the skull. Large discontinuities in both SoS and density at bone–soft-tissue interfaces can introduce strong reflections, refraction, mode conversion, and shadowing that are not yet represented in our current training data.

In future work, we plan to exploit the plug-in nature of our framework—US-based SoS estimation plus SoS-conditioned PA reconstruction—to retrain the SoS module for bone-inclusive phantoms and to combine US-driven SoS priors with more advanced acoustic and optical models specifically tailored for transcranial PACT. We expect such extensions to further widen the applicability of SoS-aware PA reconstruction to challenging neurovascular and deep-tissue imaging scenarios.

CRediT authorship contribution statement

Lidai Wang: Writing – review & editing, Supervision, Methodology, Funding acquisition, Conceptualization. Bin Ouyang: Validation, Methodology. Zheng Qu: Writing – review & editing, Visualization, Validation, Methodology, Conceptualization. Xuanhao Zhang: Writing – review & editing, Writing – original draft, Visualization, Validation, Methodology, Investigation, Conceptualization.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

The authors express their heartfelt gratitude to Prof. Jiang Liu and Prof. Yan Hu from the Southern University of Science and Technology for their insightful guidance and invaluable suggestions. This work was supported in part by the Research Grants Council of the Hong Kong Special Administrative Region under grants [11104922, 11103320] and the National Natural Science Foundation of China under grants [81627805, 61805102].

Biographies

graphic file with name fx1.jpg

Xuanhao Zhang: Xuanhao Zhang is a Ph.D. student at the Department of Biomedical Engineering, City University of Hong Kong. He received a Bachelor's degree from the University of Nottingham. His research focuses on photoacoustic imaging and ultrasound imaging.

graphic file with name fx2.jpg

Zheng Qu: Zheng Qu is a Ph.D. student at the Department of Biomedical Engineering, City University of Hong Kong. He received Bachelor's and Master's degrees from Tianjin University. His research focuses on photoacoustic imaging and ultrasound imaging.

graphic file with name fx3.jpg

Bin Ouyang: Bin Ouyang is a Ph.D. candidate at the Department of Biomedical Engineering, City University of Hong Kong. He received a Bachelor's degree from Sun Yat-sen University. His research focuses on photoacoustic imaging and flexible electronics.

graphic file with name fx4.jpg

Lidai Wang: Lidai Wang received Bachelor's and Master's degrees from Tsinghua University, Beijing, and a Ph.D. degree from the University of Toronto, Canada. After working as a postdoctoral research fellow in Prof. Lihong Wang's group, he joined the City University of Hong Kong in 2015. His research focuses on biophotonics, biomedical imaging, wavefront engineering, ultrasonically encoded photoacoustic flowgraphy (UE-PAF), instrumentation, and their biomedical applications.

Appendix A. Supplementary material

Supplementary data associated with this article can be found in the online version at doi:10.1016/j.pacs.2026.100804.

mmc1.docx (5.8MB, docx)

Data availability

Data will be made available on request.

References

  • 1.Zhang Y., et al. Video-rate dual-modal wide-beam harmonic ultrasound and photoacoustic computed tomography. IEEE Trans. Med. Imaging. 2022;41(3):727–736. doi: 10.1109/TMI.2021.3122240. [DOI] [PubMed] [Google Scholar]
  • 2.Zhang Y., Wang L. Video-rate full-ring ultrasound and photoacoustic computed tomography with real-time sound speed optimization. Biomed. Opt. Express. 2022 Jul 27;13(8):4398–4413. doi: 10.1364/BOE.464360. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 3.Wang L.V., Hu S. Photoacoustic tomography: in vivo imaging from organelles to organs. Science. 2012;335(6075):1458–1462. doi: 10.1126/science.1216210. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 4.Helfen A., Masthoff M., Claussen J., et al. Multispectral optoacoustic tomography: intra- and interobserver variability using a clinical hybrid approach. J. Clin. Med. 2019;8(1):63. doi: 10.3390/jcm8010063. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 5.Xia J., Wang L.V. Small-animal whole-body photoacoustic tomography: a review. IEEE Trans. Biomed. Eng. 2014;61(5):1380–1389. doi: 10.1109/TBME.2013.2283507. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 6.Li L., Zhu L., Ma C., Lin L., Yao J., Wang L., Maslov K., Zhang R., Chen W., Shi J., Wang L.V. Single-impulse panoramic photoacoustic computed tomography of small-animal whole-body dynamics at high spatiotemporal resolution. Nat. Biomed. Eng. 2017;1(5) doi: 10.1038/s41551-017-0071. 1. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 7.Kim J., et al. Super-resolution localization photoacoustic microscopy using intrinsic red blood cells as contrast absorbers. Light Sci. Appl. 2019;8:103. doi: 10.1038/s41377-019-0220-4. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 8.Wang L.V., Yao J. A practical guide to photoacoustic tomography in the life sciences. Nat. Methods. 2016;13(8):627–638. doi: 10.1038/nmeth.3925. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 9.Zhu J., et al. Self-fluence-compensated functional photoacoustic microscopy. IEEE Trans. Med. Imaging. 2021;40(12):3856–3866. doi: 10.1109/TMI.2021.3099820. [DOI] [PubMed] [Google Scholar]
  • 10.Liu C., Chen J., Zhang Y., Zhu J., Wang L. Five-wavelength optical-resolution photoacoustic microscopy of blood and lymphatic vessels. Adv. Photonics. 2021;3(1) [Google Scholar]
  • 11.Chen J., Zhang Y., Li X., Zhu J., Li D., Li S., Lee C.S., Wang L. Confocal visible/NIR photoacoustic microscopy of tumors with structural, functional, and nanoprobe contrasts. Photonics Res. 2020;8(12):1875–1880. [Google Scholar]
  • 12.Knox H.J., Chan J. Acoustogenic probes: a new frontier in photoacoustic imaging. Acc. Chem. Res. 2018;51(11):2897–2905. doi: 10.1021/acs.accounts.8b00351. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 13.Mercep E., et al. Transmission-reflection optoacoustic ultrasound (TROPUS) computed tomography of small animals. Light Sci. Appl. 2019;8:18. doi: 10.1038/s41377-019-0130-5. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 14.Jose J., et al. Speed-of-sound compensated photoacoustic tomography for accurate imaging. Med. Phys. 2012;39(12):7262–7271. doi: 10.1118/1.4764911. [DOI] [PubMed] [Google Scholar]
  • 15.Huang C., et al. Aberration correction for transcranial photoacoustic tomography of primates employing adjunct image data. J. Biomed. Opt. 2012;17(6) doi: 10.1117/1.JBO.17.6.066016. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 16.Cai C., et al. Feature coupling photoacoustic computed tomography for joint reconstruction of initial pressure and sound speed in vivo. Biomed. Opt. Express. 2019;10(7):3447–3462. doi: 10.1364/BOE.10.003447. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 17.Belhachmi Z., Glatz T., Scherzer O. A direct method for photoacoustic tomography with inhomogeneous sound speed. Inverse Probl. 2016;32(4) [Google Scholar]
  • 18.Lin L., et al. High-speed three-dimensional photoacoustic computed tomography for preclinical research and clinical translation. Nat. Commun. 2021;12(1):882. doi: 10.1038/s41467-021-21232-1. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 19.Tang Y., et al. High-fidelity deep functional photoacoustic tomography enhanced by virtual point sources. Photoacoustics. 2023;29 doi: 10.1016/j.pacs.2023.100450. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 20.Huang C., et al. Full-wave iterative image reconstruction in photoacoustic tomography with acoustically inhomogeneous media. IEEE Trans. Med. Imaging. 2013;32(6):1097–1110. doi: 10.1109/TMI.2013.2254496. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 21.Pattyn A., et al. Model-based optical and acoustical compensation for photoacoustic tomography of heterogeneous mediums. Photoacoustics. 2021;23 doi: 10.1016/j.pacs.2021.100275. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 22.Jiang H., Yuan Z., Gu X. Spatially varying optical and acoustic property reconstruction using finite-element-based photoacoustic tomography. J. Opt. Soc. Am. A. 2006;23(4):878–888. doi: 10.1364/josaa.23.000878. [DOI] [PubMed] [Google Scholar]
  • 23.Yuan Z., Zhang Q., Jiang H. Simultaneous reconstruction of acoustic and optical properties of heterogeneous media by quantitative photoacoustic tomography. Opt. Express. 2006;14:6749–6754. doi: 10.1364/oe.14.006749. [DOI] [PubMed] [Google Scholar]
  • 24.Yuan Z., Jiang H. Simultaneous recovery of tissue physiological and acoustic properties and the criteria for wavelength selection in multispectral photoacoustic tomography. Opt. Lett. 2009;34:1714–1716. doi: 10.1364/ol.34.001714. [DOI] [PubMed] [Google Scholar]
  • 25.Matthews T.P., et al. Parameterized joint reconstruction of the initial pressure and sound speed distributions for photoacoustic computed tomography. SIAM J. Imaging Sci. 2018;11(2):1560–1588. doi: 10.1137/17M1153649. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 26.Zhang J., Wang K., Yang Y., Anastasio M.A. Simultaneous reconstruction of speed-of-sound and optical absorption properties in photoacoustic tomography via a time-domain iterative algorithm. In: Photons Plus Ultrasound: Imaging and Sensing 2008: The Ninth Conference on Biomedical Thermoacoustics, Optoacoustics, and Acousto-optics. Vol. 6856. SPIE; 2008. pp. 427–434.
  • 27.Jakovljevic M., Hsieh S., Ali R., Chau Loo Kung G., Hyun D., Dahl J.J. Local speed of sound estimation in tissue using pulse-echo ultrasound: model-based approach. J. Acoust. Soc. Am. 2018;144(1):254–266. doi: 10.1121/1.5043402. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 28.Huang C., Wang K., Schoonover R.W., Wang L.V., Anastasio M.A. Joint reconstruction of absorbed optical energy density and sound speed distributions in photoacoustic computed tomography: a numerical investigation. IEEE Trans. Comput. Imaging. 2016;2(2):136–149. doi: 10.1109/TCI.2016.2523427. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 29.Jeong G., Villa U., Anastasio M.A. Revisiting the joint estimation of initial pressure and speed-of-sound distributions in photoacoustic computed tomography with consideration of canonical object constraints. Photoacoustics. 2025;43 doi: 10.1016/j.pacs.2025.100700. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 30.Zhang Y., Li S., Wang Y., Sun Y., Huang T., Xiang W., Li C. Iterative optimization algorithm with structural prior for artifacts removal of photoacoustic imaging. Photoacoustics. 2025 doi: 10.1016/j.pacs.2025.100726. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 31.Chen S., Jing X., Li S., Yin Z., Yang H. Inversion of sound speed field in photoacoustic imaging based on root mean square propagation algorithm. Appl. Sci. 2024;14(8):3381. [Google Scholar]
  • 32.Poimala J., Cox B., Hauptmann A. Compensating unknown speed of sound in learned fast 3D limited-view photoacoustic tomography. Photoacoustics. 2024;37 doi: 10.1016/j.pacs.2024.100597. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 33.Cui M., et al. Adaptive photoacoustic computed tomography. Photoacoustics. 2021;21 doi: 10.1016/j.pacs.2020.100223. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 34.Chen P., Park S., Jeong G., Cam R.M., Huang H.K., Villa U., Anastasio M.A. Vol. 13319. SPIE; 2025. Benchmarking deep learning-based reconstruction in photoacoustic computed tomography with clinically relevant synthetic datasets; pp. 70–76. (Photons Plus Ultrasound: Imaging and Sensing 2025). [Google Scholar]
  • 35.Park S., Villa U., Li F., Cam R.M., Oraevsky A.A., Anastasio M.A. Stochastic three-dimensional numerical phantoms to enable computational studies in quantitative optoacoustic computed tomography of breast cancer. J. Biomed. Opt. 2023;28(6) doi: 10.1117/1.JBO.28.6.066002. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 36.Gröhl J., Dreher K.K., Schellenberg M., Rix T., Holzwarth N., Vieten P., Ayala L., Bohndiek S.E., Seitel A., Maier-Hein L. SIMPA: an open-source toolkit for simulation and image processing for photonics and acoustics. J. Biomed. Opt. 2022;27(8) doi: 10.1117/1.JBO.27.8.083010. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 37.Gutta S., Kadimesetty V.S., Kalva S.K., Pramanik M., Ganapathy S., Yalavarthy P.K. Deep neural network-based bandwidth enhancement of photoacoustic data. J. Biomed. Opt. 2017;22(11):116001. doi: 10.1117/1.JBO.22.11.116001. [DOI] [PubMed] [Google Scholar]
  • 38.Awasthi N., et al. Deep neural network-based sinogram super-resolution and bandwidth enhancement for limited-data photoacoustic tomography. IEEE Trans. Ultrason. Ferroelectr. Freq. Control. 2020;67(12):2660–2673. doi: 10.1109/TUFFC.2020.2977210. [DOI] [PubMed] [Google Scholar]
  • 39.Shan H., Wiedeman C., Wang G., Yang Y. Vol. 11105. SPIE; 2019. Simultaneous reconstruction of the initial pressure and sound speed in photoacoustic tomography using a deep-learning approach; pp. 18–27. (Novel Optical Systems, Methods, and Applications XXII). [Google Scholar]
  • 40.Anas E.M., Zhang H.K., Audigier C., Boctor E.M. Robust photoacoustic beamforming using dense convolutional neural networks. In: MICCAI 2018 Workshops. Springer; 2018. pp. 3–11. [Google Scholar]
  • 41.Davoudi N., Deán-Ben X.L., Razansky D. Deep learning optoacoustic tomography with sparse data. Nat. Mach. Intell. 2019;1(10):453–460. [Google Scholar]
  • 42.Hauptmann A., Lucka F., Betcke M., Huynh N., Adler J., Cox B., Beard P., Ourselin S., Arridge S. Model-based learning for accelerated, limited-view 3-D photoacoustic tomography. IEEE Trans. Med. Imaging. 2018;37(6):1382–1393. doi: 10.1109/TMI.2018.2820382. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 43.Shahid H., et al. A deep learning approach for the photoacoustic tomography recovery from undersampled measurements. Front. Neurosci. 2021;15 doi: 10.3389/fnins.2021.598693. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 44.Zhang H., Hongyu L.I., Nyayapathi N., Wang D., Le A., Ying L., Xia J. A new deep learning network for mitigating limited-view and under-sampling artifacts in ring-shaped photoacoustic tomography. Comput. Med. Imaging Graph. 2020;84 doi: 10.1016/j.compmedimag.2020.101720. [DOI] [PubMed] [Google Scholar]
  • 45.Huang H.K., Kuo J., Zhang Y., Aborahama Y., Cui M., Sastry K., Park S., Villa U., Wang L.V., Anastasio M.A. Fast aberration correction in 3D transcranial photoacoustic computed tomography via a learning-based image reconstruction method. Photoacoustics. 2025;43 doi: 10.1016/j.pacs.2025.100698. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 46.Jeon S., et al. A deep learning-based model that reduces speed of sound aberrations for improved in vivo photoacoustic imaging. IEEE Trans. Image Process. 2021;30:8773–8784. doi: 10.1109/TIP.2021.3120053. [DOI] [PubMed] [Google Scholar]
  • 47.Dehner C., et al. A deep neural network for real-time optoacoustic image reconstruction with adjustable speed of sound. Nat. Mach. Intell. 2023;5(10):1130–1141. [Google Scholar]
  • 48.Jaeger M., et al. Computed ultrasound tomography in echo mode for imaging speed of sound using pulse-echo sonography: proof of principle. Ultrasound Med. Biol. 2015;41(1):235–250. doi: 10.1016/j.ultrasmedbio.2014.05.019. [DOI] [PubMed] [Google Scholar]
  • 49.Stähli P., et al. Improved forward model for quantitative pulse-echo speed-of-sound imaging. Ultrasonics. 2020;108 doi: 10.1016/j.ultras.2020.106168. [DOI] [PubMed] [Google Scholar]
  • 50.Simson W., Zhuang L., Sanabria S.J., Antil N., Dahl J.J., Hyun D. Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer Nature Switzerland; Cham: 2023. Differentiable beamforming for ultrasound autofocusing; pp. 428–437. [Google Scholar]
  • 51.Feigin M., Freedman D., Anthony B.W. A deep learning framework for single-sided sound speed inversion in medical ultrasound. IEEE Trans. Biomed. Eng. 2020;67(4):1142–1151. doi: 10.1109/TBME.2019.2931195. [DOI] [PubMed] [Google Scholar]
  • 52.Treeby B.E., Cox B.T. k-Wave: MATLAB toolbox for the simulation and reconstruction of photoacoustic wave fields. J. Biomed. Opt. 2010;15(2) doi: 10.1117/1.3360308. [DOI] [PubMed] [Google Scholar]
  • 53.Heller M., Schmitz G. Proceedings of the 2021 IEEE International Ultrasonics Symposium (IUS) IEEE; 2021. Deep learning-based speed-of-sound reconstruction for single-sided pulse-echo ultrasound using a coherency measure as input feature; pp. 1–4. [Google Scholar]
  • 54.Simson W.A., Paschali M., Sideri-Lampretsa V., Navab N., Dahl J.J. Investigating pulse-echo sound speed estimation in breast ultrasound with deep learning. Ultrasonics. 2024;137 doi: 10.1016/j.ultras.2023.107179. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 55.Shi M., Vercauteren T., Xia W. Learning-based sound speed estimation and aberration correction for linear-array photoacoustic imaging. Photoacoustics. 2024 Aug 1;38 doi: 10.1016/j.pacs.2024.100621. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 56.Hu J., Shen L., Sun G. Proc. IEEE CVPR; 2018. Squeeze-and-excitation networks; pp. 7132–7141. [Google Scholar]
  • 57.Zhang Z., Liu Q., Wang Y. Road extraction by deep residual U-Net. IEEE Geosci. Remote Sens. Lett. 2018;15(5):749–753. [Google Scholar]
  • 58.Ronneberger O., Fischer P., Brox T. U-Net: convolutional networks for biomedical image segmentation. In: MICCAI 2015. LNCS Vol. 9351. Springer; 2015. [DOI] [Google Scholar]
  • 59.He K., Zhang X., Ren S., Sun J. Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2016. pp. 770–778.
  • 60.Kingma D.P. Adam: a method for stochastic optimization. 2014. arXiv preprint arXiv:1412.6980. [Google Scholar]
  • 61.Paszke A., et al. PyTorch: an imperative style, high-performance deep learning library. Adv. Neural Inf. Process. Syst. 2019;32 [Google Scholar]
  • 62.Jiang K., Wang Z., Yi P., Chen C., Huang B., Luo Y., Ma J., Jiang J. Multi-scale progressive fusion network for single image deraining. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition; 2020. pp. 8346–8355.
  • 63.Charbonnier P., Blanc-Feraud L., Aubert G., Barlaud M. Vol. 2. IEEE; 1994. Two deterministic half-quadratic regularization algorithms for computed imaging; pp. 168–172. (Proceedings of 1st International Conference on Image Processing). [Google Scholar]
  • 64.Khun Jush F., Dueppenbecker P.M., Maier A. Proceedings of the Annual Conference on Medical Image Understanding and Analysis. Springer International Publishing; Cham: 2021. Data-driven speed-of-sound reconstruction for medical ultrasound: impacts of training data format and imperfections on convergence; pp. 140–150. [Google Scholar]
  • 65.Jush F.K., Biele M., Dueppenbecker P.M., Schmidt O., Maier A. Proceedings of the 2020 IEEE International Ultrasonics Symposium (IUS) IEEE; 2020. DNN-based speed-of-sound reconstruction for automated breast ultrasound; pp. 1–7. [Google Scholar]
  • 66.Burger B., et al. Real-time GPU-based ultrasound simulation using deformable mesh models. IEEE Trans. Med. Imaging. 2013;32(3):609–618. doi: 10.1109/TMI.2012.2234474. [DOI] [PubMed] [Google Scholar]
  • 67.Duck F. Physical Properties of Tissues: A Comprehensive Reference Book. Academic Press; 2013. [Google Scholar]
  • 68.McIntosh R.L., Anderson V. A comprehensive tissue properties database provided for the thermal assessment of a human at rest. Biophys. Rev. Lett. 2010;5(3):129–151. [Google Scholar]
  • 69.Kyriakou A. Multi-physics computational modeling of focused ultrasound therapies. Doctoral dissertation, ETH Zurich. [Google Scholar]
  • 70.Chivers R.C., Parry R.J. Ultrasonic velocity and attenuation in mammalian tissues. J. Acoust. Soc. Am. 1978;63(3):940–953. doi: 10.1121/1.381774. [DOI] [PubMed] [Google Scholar]
  • 71.Goss S.A., Johnston R.L., Dunn F. Comprehensive compilation of empirical ultrasonic properties of mammalian tissues. J. Acoust. Soc. Am. 1978;64(2):423–457. doi: 10.1121/1.382016. [DOI] [PubMed] [Google Scholar]
  • 72.Begui Z.E. Acoustic properties of the refractive media of the eye. J. Acoust. Soc. Am. 1954;26(3):365–368. [Google Scholar]
  • 73.Al-Dhabyani W., Gomaa M., Khaled H., Fahmy A. Dataset of breast ultrasound images. Data Brief. 2020;28 doi: 10.1016/j.dib.2019.104863. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 74.Chen M., et al. Simultaneous photoacoustic imaging of intravascular and tissue oxygenation. Opt. Lett. 2019;44(15):3773–3776. doi: 10.1364/OL.44.003773. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 75.Zell K., et al. Acoustical properties of selected tissue phantom materials for ultrasound imaging. Phys. Med. Biol. 2007;52(20):N475–N484. doi: 10.1088/0031-9155/52/20/N02. [DOI] [PubMed] [Google Scholar]
  • 76.Treeby B.E., Zhang E.Z., Cox B.T. Automatic sound speed selection in photoacoustic image reconstruction using an autofocus approach. J. Biomed. Opt. 2011;16(9) doi: 10.1117/1.3619139. [DOI] [PubMed] [Google Scholar]
  • 77.Treeby B.E., Zhang E.Z., Cox B.T. Photoacoustic tomography in absorbing acoustic media using time reversal. Inverse Probl. 2010;26(11) [Google Scholar]

Associated Data


Supplementary Materials

Supplementary material

mmc1.docx (5.8MB, docx)

Data Availability Statement

Data will be made available on request.


Articles from Photoacoustics are provided here courtesy of Elsevier