PLOS One. 2025 Jan 3;20(1):e0314944. doi: 10.1371/journal.pone.0314944

An end-to-end implicit neural representation architecture for medical volume data

Armin Sheibanifard 1, Hongchuan Yu 1,*, Zongcai Ruan 2, Jian J Zhang 1
Editor: Xiyu Liu 3
PMCID: PMC11698368  PMID: 39752347

Abstract

Medical volume data are rapidly increasing, growing from gigabytes to petabytes, which presents significant challenges in organisation, storage, transmission, manipulation, and rendering. To address the challenges, we propose an end-to-end architecture for data compression, leveraging advanced deep learning technologies. This architecture consists of three key modules: downsampling, implicit neural representation (INR), and super-resolution (SR). We employ a trade-off point method to optimise each module’s performance and achieve the best balance between high compression rates and reconstruction quality. Experimental results on multi-parametric MRI data demonstrate that our method achieves a high compression rate of up to 97.5% while maintaining superior reconstruction accuracy, with a Peak Signal-to-Noise Ratio (PSNR) of 40.05 dB and Structural Similarity Index (SSIM) of 0.96. This approach significantly reduces GPU memory requirements and processing time, making it a practical solution for handling large medical datasets.

1 Introduction

Medical visualisation commonly involves volumetric medical data such as CT, MRI, PET scans, and confocal spectral microscopy images. This technique is essential in clinical practice across various biomedical disciplines, such as radiology, nuclear medicine, surgery planning, and nearly all neuroscience sub-fields. However, the generated volume data often become very large, sometimes reaching terabyte scale. For instance, biological volumetric datasets that capture microscale details of cells or tissues are commonly produced [1–5]. The emerging challenges lie in organising, storing, transmitting, manipulating, and rendering such terabyte-scale volume data.

Recent advances in deep neural networks have led to their rapid application in medical imaging [6–8]. In particular, implicit neural representations such as SIREN [9] have become an approach for compressing volumetric medical images by storing the parameters of a trained neural network instead of explicit voxel data. However, the compression rate is often limited and volumetric data still require considerable memory, especially GPU memory. This results in high memory demands and longer training times for deep learning applications. In addition, little research currently addresses these specific challenges.

To address these challenges, this paper presents an End-to-End architecture that improves compression rates and reduces GPU memory usage, based on our previous work [10]. The proposed architecture consists of three key modules: a downsampling module, an Implicit Neural Representation (INR) module, and a 3D Super-Resolution (SR) module (e.g., [11]). The downsampling module reduces data size, enabling the INR module to represent the volume using a compact deep neural network. The SR module then reconstructs the original high-resolution volume from the INR module output. This architecture reduces memory needs and allows for more efficient neural network training. The main challenge lies in achieving a high compression rate and minimal reconstruction loss. To address this, we propose a trade-off point method that optimises the configuration of each module to achieve peak performance. This approach can be generalised to a wide range of deep network designs. Our key contributions include:

  • We propose an End-to-End architecture with three computational modules, designed to optimise volumetric data compression by achieving a high compression rate while maintaining superior reconstruction quality and minimising GPU memory consumption.

  • We introduce a trade-off point method to determine the optimal configuration for the proposed End-to-End architecture, balancing key performance metrics such as compression rate and reconstruction quality.

The rest of the paper is structured as follows. Section 2 briefly reviews related work. Section 3 presents the proposed architecture and the trade-off point method. Section 4 presents experimental results and analysis. Finally, Section 5 concludes our work.

2 Background and relevant literature

In our previous work [10], we developed an architecture that leveraged existing pre-trained deep networks to decrease the volume data size. The basic idea is to transform volume data into an implicit neural network representation, such as SIREN [9], to compress the data while maintaining reconstruction accuracy. However, pre-trained deep networks often struggle to generalise well, especially with medical volume data. Many pre-trained Super-Resolution deep networks require fine-tuning for different medical datasets. A “one-size-fits-all” approach does not work, since each dataset has its own characteristics. Therefore, this paper aims to train an end-to-end deep network, rather than simply piecing together multiple pre-trained networks.

2.1 Implicit neural representation

Representing 3D geometry for rendering and reconstruction involves trade-offs across fidelity, efficiency, and compression capabilities. The DeepSDF model [12] uses a continuous Signed Distance Function (SDF) to represent shapes. Another approach [13] employs an encoder-decoder neural architecture for lossless compression. However, this method has a high inference time due to explicit optimisation requirements.

MedZip [14] proposes a lossless compression technique employing Long Short-Term Memory (LSTM) for volumetric MRI and CT. NeRF [15] presents a notable method for synthesising new views of a volumetric scene through implicit neural representation as a continuous function. However, it is outperformed by SIRENs [9] due to its time consumption. [16] presents a 3D representation technique to reduce memory usage by predicting an occupancy function for a continuous volume. COIN [17] applies a multi-layer perceptron (MLP) to implicit neural network compression by encoding geometric inputs. However, it demonstrates inferior performance compared to state-of-the-art compression methods. INR-GAN [18] applies a GAN model to multi-scale Implicit Neural Representations (INRs) but struggles with artefacts when dealing with high-frequency features. NeRP [19] introduces a novel approach to generate a computational image from sampled sensor data. However, dealing with sparsely sampled images encounters additional hurdles due to limited data points. Unlike previous deep learning methods for image reconstruction, NeRP leverages both the internal structure of an image prior and the physics governing sparsely sampled measurements to represent the entire subject.

2.2 Super-resolution techniques

Numerous techniques leveraging convolutional neural networks (CNNs) have demonstrated exceptional performance in image super-resolution (SR). The pioneering work of SRCNN [20] introduced CNNs to SR by learning a non-linear mapping from low-resolution to high-resolution images with only three convolution layers. CNN-based methods have shown impressive performance in SR, but they become impractical under constraints on time and memory resources [21–30]. SRNO [11] is designed for continuous super-resolution tasks. It treats each image as a function and learns a mapping between finite-dimensional function spaces, enabling it to train and generalise across various discretisation levels. Experiments demonstrate that SRNO surpasses other arbitrary-scale super-resolution methods in terms of both performance and computational time, particularly excelling in capturing global image structures, which is important in medical imaging.

Table 1 highlights the gaps between the proposed method and four state-of-the-art models—SIREN [9], MedZip [14], NeRF [15], and COIN [17]—across several key metrics: high compression rate, low GPU memory consumption, high reconstruction quality (PSNR > 40), good visual similarity (SSIM > 0.9), scalability to large datasets, fast training time, adaptability to medical imaging, and handling high-frequency features. The proposed method addresses several limitations of existing models, particularly in achieving high compression rates and excellent reconstruction quality, while maintaining efficiency in GPU memory usage and adaptability to medical imaging tasks.

Table 1. Identifying gaps in state-of-the-art models compared to the proposed method.

Feature/Metric Proposed Method SIREN [9] MedZip [14] NeRF [15] COIN [17]
High Compression Rate
Low GPU Memory Consumption
High Reconstruction Quality (PSNR > 40)
Good Visual Similarity (SSIM > 0.9)
Scalable to Large Datasets
Fast Training Time
Adaptability to Medical Imaging
Handles High-Frequency Features Well

3 Methodology

In this section, we first present the end-to-end architecture and then introduce the trade-off point approach to evaluate the proposed architecture in terms of compression efficiency and reconstruction accuracy.

3.1 Proposed end-to-end architecture

Our end-to-end architecture, shown in Fig 1, is composed of three core modules: Downsampling, Implicit Neural Representation (INR), and Super-Resolution (SR). The Downsampling module does not require training, whereas the INR and SR modules are trained jointly in an end-to-end way. We employ an L1 loss function to evaluate reconstruction quality here. In the following sections, we explain each module individually.

Fig 1. Workflow of the proposed end-to-end architecture, including downsampling, implicit neural representation (INR), and super-resolution (SR) modules.

3.1.1 3D downsampling module

Given a high-resolution volume x, this module aims to acquire its low-resolution counterpart y. The relationship between x and y can be modelled as follows,

$$y = F_{LR}^{-1}\, D\, F_{HR}\, x + n \tag{1}$$

where $F_{HR}$ is the FFT operator for the high-resolution regime, $F_{LR}^{-1}$ is the inverse FFT operator for the low-resolution regime, $D$ is the low-pass operator in the frequency domain, and $n$ is the noise. The Fourier transform is widely employed in medical imaging [31]. The operator $D$ is both controllable and easy to implement in the frequency domain. In our case, it effectively generates low-resolution volumes at downsampling scales of ×1/2, ×1/4, and ×1/8. This module does not need training.
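As a rough illustration of Eq 1 with the noise term omitted, the low-pass operator D can be realised as a centred crop of the shifted spectrum. The sketch below uses NumPy. The function name `fourier_downsample`, the integer `scale` argument, and the amplitude rescaling are assumptions made for illustration and are not taken from the released code.

```python
import numpy as np

def fourier_downsample(volume, scale):
    """Downsample a 3D volume by an integer factor via a frequency-domain crop.

    Hypothetical helper: scale=2 corresponds to the x1/2 setting in the text.
    """
    D, H, W = volume.shape
    d, h, w = D // scale, H // scale, W // scale

    # F_HR: forward FFT, shifted so the zero frequency sits at index N // 2.
    spectrum = np.fft.fftshift(np.fft.fftn(volume))

    # D: keep only the central (low-frequency) block of the spectrum.
    # Starting at N // 2 - n // 2 keeps the zero frequency at index n // 2.
    z0, y0, x0 = D // 2 - d // 2, H // 2 - h // 2, W // 2 - w // 2
    cropped = spectrum[z0:z0 + d, y0:y0 + h, x0:x0 + w]

    # F_LR^{-1}: inverse FFT on the smaller grid.
    low = np.fft.ifftn(np.fft.ifftshift(cropped))

    # Rescale so mean intensity is preserved (assumption), and drop the
    # small imaginary residue left by the asymmetric crop.
    return low.real * (d * h * w) / (D * H * W)

# Usage: a 64^3 patch reduced to 32^3.
patch = np.random.default_rng(0).random((64, 64, 64))
low_res = fourier_downsample(patch, 2)
```

Because the crop size is the only free parameter, changing the downsampling scale requires no retraining, which is consistent with the module being training-free.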

3.1.2 3D implicit neural representation (INR)

The INR module harnesses the capabilities of implicit neural networks to efficiently encode volumetric data. Specifically, using INR for low-resolution volumes helps prevent memory overflow. Unlike conventional explicit representations, INRs depict the volume as a continuous function that maps spatial coordinates to voxel intensity values. This enables a concise representation that can be readily adjusted to different levels of detail. Drawing inspiration from recent breakthroughs in implicit neural representations, we employed a multi-layer perceptron (MLP) architecture with periodic activation functions (i.e., SIREN [9]) to effectively capture the intricate structures within the volumetric data.
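To make the coordinate-to-intensity mapping concrete, the following is a minimal NumPy sketch of a SIREN-style MLP forward pass. The initialisation bounds and the frequency factor `w0 = 30` follow the scheme described in the SIREN paper [9]; the zero bias initialisation, the function names, and the coordinate range of [-1, 1] are assumptions for illustration. With three hidden layers of 30 neurons, three input coordinates, and one output, the parameter count matches the 2011 reported in Table 3.

```python
import numpy as np

def init_siren(hidden_layers=3, width=30, in_dim=3, out_dim=1, w0=30.0, seed=0):
    """Create (weight, bias) pairs for a SIREN-style MLP (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    dims = [in_dim] + [width] * hidden_layers + [out_dim]
    params = []
    for i, (fan_in, fan_out) in enumerate(zip(dims[:-1], dims[1:])):
        # First layer: U(-1/fan_in, 1/fan_in); later layers are scaled by w0.
        bound = 1.0 / fan_in if i == 0 else np.sqrt(6.0 / fan_in) / w0
        weight = rng.uniform(-bound, bound, size=(fan_in, fan_out))
        bias = np.zeros(fan_out)  # assumption: zero-initialised biases
        params.append((weight, bias))
    return params

def siren_forward(params, coords, w0=30.0):
    """Map (N, 3) coordinates to (N, 1) intensities with sine activations."""
    h = coords
    for weight, bias in params[:-1]:
        h = np.sin(w0 * (h @ weight + bias))
    weight, bias = params[-1]
    return h @ weight + bias  # final layer is linear

def count_parameters(params):
    """Total number of weights and biases, used as the network size."""
    return sum(weight.size + bias.size for weight, bias in params)

# Usage: evaluate an untrained network on an 8 x 8 x 8 coordinate grid.
axis = np.linspace(-1.0, 1.0, 8)
coords = np.stack(np.meshgrid(axis, axis, axis, indexing="ij"), axis=-1).reshape(-1, 3)
net = init_siren(hidden_layers=3, width=30)
values = siren_forward(net, coords)
```

Because the network is queried per coordinate, the same weights can be sampled on any grid, which is what allows the representation to be adjusted to different levels of detail.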

3.1.3 3D super resolution (SR) module

The SR module employs the super-resolution model SRNO [11]. The SRNO model utilises deep learning to learn intricate transformations from low-resolution to high-resolution data. Beyond enhancing resolution, SRNO models frequently possess intrinsic denoising abilities, resulting in cleaner and clearer images. Compared to other super-resolution techniques, SRNO models can produce images with fewer artefacts, such as ringing and blurring [11]. Moreover, the number of channels in the attention structure can significantly influence the SRNO model’s performance. Thus, we regard it as a hyper-parameter of the SRNO model and evaluate SRNO with respect to it.

3.2 Trade-off point approach

To achieve an overall optimal performance for our proposed end-to-end architecture, we propose a metric system to measure overall performance and further determine the optimal setting for each module accordingly. This design method is called the Trade-off Point Method. Our metric system includes four measurements: Peak Signal-to-Noise Ratio (PSNR), Structural Similarity Index (SSIM), Bitrate, and Compression Rate (CR) as below. PSNR provides a measure of pixel-level accuracy by calculating the ratio of signal power to noise power, yet it often does not correspond to human visual perception. In contrast, SSIM assesses perceptual quality by comparing luminance, contrast, and structure, but may overlook precise pixel-wise errors. Recognising the limitations of using PSNR or SSIM alone for performance measurement, we combine both metrics to evaluate image quality thoroughly.

3.2.1 Metric definition

  • Peak Signal-to-Noise Ratio (PSNR) is a metric used to measure the quality of a reconstructed or compressed signal compared to the original signal. It is expressed in decibels (dB) and is calculated using the following formula:
    $$\mathrm{PSNR}=10\cdot\log_{10}\left(\frac{\mathrm{MAX}^2}{\mathrm{MSE}}\right) \tag{2}$$
    where MAX is the maximum possible pixel value of the image (e.g., 255 for an 8-bit image), and MSE is the Mean-Squared Error between the original and reconstructed images.

    A high PSNR value indicates a high-quality reconstruction, as it signifies that the reconstructed signal is closer to the original signal in terms of fidelity.

  • Structural Similarity Index Measurement (SSIM) is a metric to assess the similarity between a reference (original) image and a distorted or processed image. SSIM quantifies similarity by considering three key components: luminance, contrast, and structure. It is defined as,
    $$\mathrm{SSIM}(x,y)=\frac{(2\mu_x\mu_y+C_1)(2\sigma_{xy}+C_2)}{(\mu_x^2+\mu_y^2+C_1)(\sigma_x^2+\sigma_y^2+C_2)} \tag{3}$$
    where $\mu_x$ and $\mu_y$ are the means of the original and distorted images, respectively, $\sigma_x^2$ and $\sigma_y^2$ are the variances of the original and distorted images, respectively, $\sigma_{xy}$ is the covariance of the original and distorted images, and $C_1$ and $C_2$ are small constants added for numerical stability. The SSIM value ranges from -1 to 1, with 1 indicating perfect similarity. High SSIM values indicate high similarity between the images, while low values suggest more significant differences or distortions.
  • Bitrate: Bitrate is a metric used in digital imaging to quantify the amount of data assigned to each pixel in a raster image, expressed in bits per pixel (bpp). It indicates the level of detail or precision in representing colour or intensity information for each pixel. High bpp values typically result in high image quality but large file size, while low bpp values lead to low quality but small files. It is computed as,
    $$\mathrm{Bitrate}=\frac{\text{Total bits}}{\text{Total pixels}} \tag{4}$$
    In greyscale images, each pixel is represented by a single channel (e.g., luminance), so the bitrate reduces to,
    $$\mathrm{Bitrate}=\frac{\text{Bit depth}}{1} \tag{5}$$

    When compression techniques are applied, the bitrate measures how many bits are spent per pixel of the image and is used to assess the trade-off between image quality and file size. High bitrate values generally result in high-quality but large image files, while low bitrate values lead to more aggressive compression and small files but with potential quality loss.

  • Downsampling Scale (DS): Let $D_x$, $D_y$, and $D_z$ be the original dimensions of the 3D image stack along the x, y, and z axes, respectively, and let $(d_x, d_y, d_z)$ be the new dimensions after downsampling. The DS $(s_x, s_y, s_z)$ is defined as,
    $$d_x=\frac{D_x}{s_x},\quad d_y=\frac{D_y}{s_y},\quad d_z=\frac{D_z}{s_z}$$

    We may simply set (sx, sy, sz) identically.

  • Number of the neurons in SIREN (SN): With SIREN’s layer count set at 3, each layer contains an identical number of neurons. We adjust the neuron count per layer from 30 to 230, using this to represent SIREN’s size.

  • Number of Channels (NC): We incorporate the 3D version of SRNO into the SR module. The cornerstone of a super-resolution network lies in its feature extractor. Existing super-resolution models possess their own topologies for their feature extractors. The number of Channels indicates the feature extractor’s size, thereby reflecting the complexity of the super-resolution network. This complexity is particularly influenced by the downsampling scale within our proposed architecture, leading to a significant increase in channel numbers due to the abundance of volume data. To minimise the size of the SR module in our proposed architecture, we initially assess the performance of the SR module with different sizes of attention mechanisms and fully connected layer submodules, after which we fix the topologies and sizes of these two submodules. However, the channel number of the feature extractor remains adaptable to accommodate varying reconstruction accuracy requirements.

  • Compression Rate (CR): The CR measures the reduction in size achieved by compression, relative to the size of the uncompressed data. A high compression rate indicates an efficient compression process, as it signifies a large reduction in data size. It is defined as,
    $$\mathrm{CR}=\left(1-\frac{\text{Size of the network}}{\text{Size of uncompressed data}}\right)\times 100\% \tag{6}$$

    In this paper, we define the size of a deep network by its weight count and the size of a volume by its voxel number.
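The metric definitions above can be sketched in a few lines of NumPy. This is an illustrative sketch rather than the evaluation code used in the experiments: `global_ssim` applies Eq 3 once over the whole volume, whereas common SSIM implementations average it over local windows, and the default of 32 bits per weight in `bitrate_bpp` is an assumption, chosen because it reproduces the bitrates reported for a 64 × 64 × 64 patch. All function names are hypothetical.

```python
import numpy as np

def psnr(original, reconstructed, max_value=1.0):
    """Eq 2; max_value=1.0 assumes volumes normalised to [0, 1]."""
    mse = np.mean((original - reconstructed) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(max_value ** 2 / mse)

def global_ssim(x, y, max_value=1.0, k1=0.01, k2=0.03):
    """Eq 3 evaluated once over the whole volume (simplification)."""
    c1, c2 = (k1 * max_value) ** 2, (k2 * max_value) ** 2
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = np.mean((x - mu_x) * (y - mu_y))
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator

def bitrate_bpp(num_weights, num_voxels, bits_per_weight=32):
    """Eq 4 with total bits = weights x bits per weight (assumed 32)."""
    return num_weights * bits_per_weight / num_voxels

def compression_rate(num_weights, num_voxels):
    """Eq 6: network size as weight count, volume size as voxel count."""
    return (1.0 - num_weights / num_voxels) * 100.0

# Usage: a network with 6348 weights representing a 64^3 patch.
num_voxels = 64 ** 3
cr = compression_rate(6348, num_voxels)
bpp = bitrate_bpp(6348, num_voxels)
```

Under these assumptions, 6348 weights on a 64 × 64 × 64 patch give a CR of about 97.58% and a bitrate of about 0.775 bpp, in line with the first row of Table 4.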

3.2.2 Trade-off settings

To find the trade-off settings for the individual modules, we first apply the metrics of PSNR, SSIM, and CR defined in the above section separately to a specific volume of data concerning three dimensions: DS, NC, and SN. The different combinations of DS, NC, and SN result in different measurements, which are stored in a 3D array, as shown in Fig 2. We need to balance the performance of (PSNR, SSIM, and CR) associated with the combination of three dimensions (DS, NC, SN) to determine the trade-off point for our end-to-end architecture. This may be described as,

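$$\min_{(x,y,z)\in \mathrm{3DA}}\left(\frac{1}{\mathrm{PSNR}}+1-|\mathrm{SSIM}|+(1-\mathrm{CR})\right)\quad\text{subject to}\quad\begin{cases}c_1: x-DS_{\max}=0\\ c_2: y-NC_{\min}=0\\ c_3: z-SN_{\min}=0\end{cases} \tag{7}$$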

where, 3DA denotes the 3D array with 3 dimensions, DS, NC, SN, and DSmax denotes the given maximum value for DS, and others have a similar definition. Applying the Augmented Lagrangian method here yields,

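$$\mathrm{TradeOff}=\operatorname*{arg\,min}_{(x,y,z)\in \mathrm{3DA}}\left(\frac{1}{\mathrm{PSNR}}+1-|\mathrm{SSIM}|+(1-\mathrm{CR})-\sum_{i=1}^{3}\alpha_i c_i+\frac{1}{2}\beta\sum_{i=1}^{3}c_i^2\right) \tag{8}$$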

where $\alpha_i$ are the Lagrange multipliers and $\beta$ is the penalty parameter. The resulting (x,y,z) is called the trade-off point. To visualise it, we compute the marginal distributions concerning three dimensions separately on 3DA as below,

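$$\begin{cases}\mathrm{PSNR}(x\in \mathrm{3DA}(DS))=\sum_{(y,z)\in \mathrm{3DA}(NC,SN)}\mathrm{PSNR}(x,y,z)\\ \mathrm{SSIM}(x\in \mathrm{3DA}(DS))=\sum_{(y,z)\in \mathrm{3DA}(NC,SN)}\mathrm{SSIM}(x,y,z)\\ \mathrm{CR}(x\in \mathrm{3DA}(DS))=\sum_{(y,z)\in \mathrm{3DA}(NC,SN)}\mathrm{CR}(x,y,z)\end{cases} \tag{9}$$
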
Fig 2. Illustration of the data structure in the context of the metrics, PSNR, SSIM and CR, according to the DS, NC and SN dimensions.

There are a total of three sets of marginal distributions. Each set illustrates the PSNR bounds, SSIM bounds, and CR bounds concerning the scale at each dimension specified by the trade-off point, one after another. Theoretical equivalence is expected among these three sets of PSNR, SSIM and CR bounds at the trade-off point. The trade-off point indicates the tolerance of the proposed architecture in three dimensions at an expected PSNR, SSIM and CR bounds level. The area delimited by the trade-off point intuitively and quantitatively illustrates the proposed architecture’s performance.
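The search in Eq 8 and the marginals in Eq 9 can be sketched as a brute-force evaluation over the 3D array. The example below is a sketch under several assumptions: the constraints are divided by the range of each axis so that they are comparable, the marginals are normalised as means, and the values of `alpha` and `beta` are hypothetical. The arrays hold a small subset of the values from Tables 4 and 5 (DS ∈ {1/2, 1/4}, NC ∈ {4, 8}, SN ∈ {30, 50}), indexed as `[DS, NC, SN]`.

```python
import numpy as np

def objective(psnr, ssim, cr_percent):
    """Unconstrained cost of Eq 7, with CR converted to a fraction."""
    return 1.0 / psnr + (1.0 - np.abs(ssim)) + (1.0 - cr_percent / 100.0)

def constraint_grid(ds_values, nc_values, sn_values):
    """c1 = x - DS_max, c2 = y - NC_min, c3 = z - SN_min on the full grid.

    Assumption: each axis is divided by its range to make scales comparable.
    """
    def scaled(values, target):
        values = np.asarray(values, dtype=float)
        span = np.ptp(values) or 1.0
        return (values - target(values)) / span

    c1, c2, c3 = np.meshgrid(scaled(ds_values, np.max),
                             scaled(nc_values, np.min),
                             scaled(sn_values, np.min), indexing="ij")
    return np.stack([c1, c2, c3])

def trade_off_point(psnr, ssim, cr_percent, constraints, alpha, beta):
    """Eq 8: index of the minimum augmented-Lagrangian cost, plus the costs."""
    alpha = np.asarray(alpha, dtype=float)
    penalty = (-np.tensordot(alpha, constraints, axes=1)
               + 0.5 * beta * np.sum(constraints ** 2, axis=0))
    cost = objective(psnr, ssim, cr_percent) + penalty
    index = np.unravel_index(np.argmin(cost), cost.shape)
    return tuple(int(i) for i in index), cost

def marginal_means(metric):
    """Eq 9-style marginals per dimension, normalised as means (assumption)."""
    return {"DS": metric.mean(axis=(1, 2)),
            "NC": metric.mean(axis=(0, 2)),
            "SN": metric.mean(axis=(0, 1))}

# Subset of Tables 4 and 5, indexed [DS, NC, SN].
ds_values, nc_values, sn_values = [0.5, 0.25], [4, 8], [30, 50]
psnr = np.array([[[33.885, 35.211], [32.829, 36.009]],
                 [[33.221, 33.985], [35.348, 36.051]]])
ssim = np.array([[[0.885, 0.915], [0.825, 0.922]],
                 [[0.858, 0.892], [0.886, 0.917]]])
cr = np.array([[[97.578, 96.304], [94.727, 93.452]],
               [[95.250, 93.976], [90.091, 88.817]]])

constraints = constraint_grid(ds_values, nc_values, sn_values)
index, cost = trade_off_point(psnr, ssim, cr, constraints,
                              alpha=[0.0, 0.0, 0.0], beta=0.1)
best = (ds_values[index[0]], nc_values[index[1]], sn_values[index[2]])
```

The penalty terms steer the search toward the largest DS and the smallest NC and SN, so the selected point depends on the chosen `alpha` and `beta` as well as on the measured metrics.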

4 Materials and experimental results

Our experiments can be categorised into two parts. The first part aims to justify the selection of each module in our proposed end-to-end architecture. The second part involves applying the trade-off point method to determine an optimal architecture that balances various considerations.

4.1 Data and implementation setup

The dataset comprises 750 multi-parametric magnetic resonance images (mp-MRI) collected from patients diagnosed with either glioblastoma or lower-grade glioma [32]. We select a T2 Fluid-Attenuated Inversion Recovery (FLAIR) 3D scan from a random patient, with a size of 155 × 240 × 240. The implementation of our architecture starts with a high-resolution 3D volumetric input, such as a medical scan, denoted as x. Initially, the input volume undergoes normalisation, scaling the voxel values to a range between 0 and 1. To streamline computations, the volume is segmented into smaller patches, each measuring 64 × 64 × 64. Patches in which 70% or more of the voxels are non-zero, and which therefore contain more information, are classified as High-Resolution (HR) patches. From these, one HR patch is selected as the high-resolution input for further processing.
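The preparation steps described above (normalisation, patching, and selection of HR patches) might look as follows in NumPy. This is a sketch under assumptions: min-max normalisation is assumed, patches are taken on a non-overlapping grid with any remainder discarded, and the function names are hypothetical rather than taken from the released code.

```python
import numpy as np

def normalise(volume):
    """Scale voxel values to [0, 1] with min-max normalisation (assumption)."""
    volume = volume.astype(np.float64)
    vmin, vmax = volume.min(), volume.max()
    if vmax == vmin:
        return np.zeros_like(volume)
    return (volume - vmin) / (vmax - vmin)

def extract_patches(volume, size=64):
    """Split a 3D volume into non-overlapping size^3 patches.

    Voxels that do not fill a complete patch are discarded (assumption).
    """
    depth, height, width = volume.shape
    patches = []
    for z in range(0, depth - size + 1, size):
        for y in range(0, height - size + 1, size):
            for x in range(0, width - size + 1, size):
                patches.append(volume[z:z + size, y:y + size, x:x + size])
    return patches

def select_hr_patches(patches, min_nonzero_fraction=0.7):
    """Keep patches in which at least 70% of the voxels are non-zero."""
    return [p for p in patches
            if np.count_nonzero(p) / p.size >= min_nonzero_fraction]

# Usage on a synthetic stand-in for a 155 x 240 x 240 scan.
scan = np.zeros((155, 240, 240), dtype=np.uint8)
scan[:, 40:200, 40:200] = 200  # hypothetical non-zero region
patches = extract_patches(normalise(scan))
hr_patches = select_hr_patches(patches)
```

With a non-overlapping grid, a 155 × 240 × 240 volume yields 2 × 3 × 3 = 18 candidate patches, from which the HR patches are then filtered.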

Once the data are prepared, the 3D Downsampling module applies a Fourier Transform to convert the high-resolution volume from the spatial domain to the frequency domain. A low-pass filter is then used to eliminate high-frequency components, thereby reducing resolution. This removal process is crucial in medical imaging, as it decreases the data size while preserving essential information, ultimately easing the model processing load. The Inverse Fourier Transform reverts the data to the spatial domain, yielding a low-resolution version of the original volume.

Next, the downsampled volume is processed through the 3D Implicit Neural Representation (INR) module. Here, a Multi-Layer Perceptron (MLP) utilising Sinusoidal Activation Functions (SIREN) maps input coordinates to output voxel intensities, enabling the neural network to represent complex structures as continuous functions. These functions are then converted into voxel intensities.

Following this, the 3D Super-Resolution (SR) module employs a 3D Convolutional Neural Network (CNN) for feature extraction, incorporating an Attention Mechanism to prioritise significant features. This SR module improves the resolution of the volume, restoring it to a level close to the original.

The reconstructed volume, denoted as y, is compared to the original x using an L1 loss function to assess and optimise reconstruction quality. The entire system is trained using the Adam optimiser with a learning rate of 0.0015 for 5,000 epochs on an NVIDIA A4000 16GB GPU with CUDA support in the PyTorch framework. All source code and results are available at https://github.com/asheibanifard/EndtoEndCompression.

4.2 Trade-off architecture

4.2.1 3D downsampling module

The Downsampling module does not require training. This implies that the downsampling scale is preset without consideration of the final result quality. We select three downsampling scales of 1/2, 1/4, and 1/8 in our experiments. It is necessary to test the performance of the proposed architecture, particularly the INR module, at these three downsampling scales. Table 2 presents a comprehensive comparison of reconstruction results for different downsampling scales, illustrating the effectiveness of our proposed architecture in maintaining a high reconstruction quality across various compression levels. It can be noted that decreasing the downsampling scale does not significantly degrade the quality of the reconstruction. Additionally, non-standard sampling scales like 1/3, 1/5, or 1/7 would introduce unnecessary complexity and inconsistencies without offering meaningful improvements, making them less suitable for the architecture’s goals. Thus, these three downsampling scales are acceptable.

Table 2. Performance of the INR module and the whole end-to-end architecture.

(The upper row shows the performance of a single SIREN and the lower row shows that of the whole end-to-end architecture).

INR module
Scale Avg Bitrate ↓ Avg CR (%)↑ Avg PSNR ↑ Avg SSIM ↑ Avg #Para ↓
1/2 5.21 83.71 36.96 0.95 42711
1/4 5.21 83.71 51.48 1.00 42711
1/8 5.21 83.71 67.34 1.00 42711
Whole end-to-end Architecture
Scale Avg Bitrate ↓ Avg CR (%)↑ Avg PSNR ↑ Avg SSIM ↑ Avg #Para ↓
1/2 5.743 82.052 38.001 0.956 47048.0
1/4 6.655 79.200 38.381 0.953 54524.0
1/8 10.062 68.553 39.462 0.961 82436.0

4.2.2 3D INR module

We opt for the SIREN model [9] as our INR module, focusing on two primary aspects of the SIREN structure: the number of layers and the number of neurons per layer. The goal is to use a compact SIREN model to enhance the compression rate (CR). We experiment with various configurations of the SIREN model, altering the layer count and neuron count per layer, as detailed in Table 3. We find that a SIREN network with 3 layers and between 30 and 230 neurons per layer offers satisfactory performance, especially for small volume data inputs, while substantially cutting down on GPU memory usage. Furthermore, we compare the performance of a single SIREN model against our proposed architecture, as shown in Table 2. The notable benefit is a dramatic reduction in GPU memory consumption while maintaining comparable reconstruction quality. Additionally, using more than 230 neurons per layer increases the model’s capacity to represent detailed structures but leads to diminishing returns in terms of reconstruction quality. Beyond 230 neurons, the gains in PSNR and SSIM are marginal, while the computational cost and GPU memory usage increase significantly. This increased complexity does not translate into substantial improvements in performance, making the additional computational overhead unjustified. Thus, we prefer the SIREN model with 3 layers in the INR module.

Table 3. Average values for different INR layers and neurons.
Layers Neurons Bitrate(bpp) ↓ CR (%)↑ PSNR ↑ SSIM ↑ #Para ↓
3 30 0.245 99.233 31.081 0.767 2011
3 50 0.653 97.959 32.205 0.804 5351
3 70 1.256 96.074 34.550 0.903 10291
3 90 2.055 93.579 35.637 0.923 16831
3 110 3.048 90.474 36.610 0.942 24971
3 130 4.237 86.759 37.862 0.960 34711
3 150 5.621 82.433 38.389 0.964 46051
3 170 7.201 77.497 38.626 0.965 58991
3 190 8.976 71.950 39.954 0.975 73531
3 210 10.946 65.793 39.553 0.974 89671
3 230 13.112 59.026 40.934 0.981 107411
4 30 0.359 98.878 29.008 0.627 2941
4 50 0.964 96.986 32.472 0.825 7901
4 70 1.863 94.178 34.814 0.902 15261
4 90 3.054 90.455 36.450 0.937 25021
4 110 4.539 85.817 37.251 0.951 37181
4 130 6.316 80.262 39.872 0.974 51741
4 150 8.386 73.793 41.887 0.984 68701
4 170 10.750 66.407 42.395 0.986 88061
4 190 13.406 58.107 42.738 0.988 109821
4 210 16.355 48.890 43.586 0.989 133981
4 230 19.597 38.758 44.335 0.991 160541
5 30 0.473 98.523 30.846 0.781 3871
5 50 1.276 96.013 32.469 0.799 10451
5 70 2.470 92.282 38.391 0.964 20231
5 90 4.054 87.331 38.456 0.963 33211
5 110 6.029 81.159 40.037 0.976 49391
5 130 8.395 73.766 41.990 0.985 68771
5 150 11.151 65.152 42.953 0.988 91351
5 170 14.298 55.318 42.284 0.986 117131
5 190 17.836 44.263 43.366 0.989 146111
5 210 21.764 31.987 44.676 0.991 178291
5 230 26.083 18.491 44.616 0.991 213671
6 30 0.586 98.169 30.077 0.774 4801
6 50 1.587 95.041 36.246 0.935 13001
6 70 3.076 90.387 39.529 0.974 25201
6 90 5.054 84.207 40.204 0.977 41401
6 110 7.520 76.501 41.779 0.984 61601
6 130 10.474 67.270 41.733 0.984 85801
6 150 13.916 56.512 43.195 0.988 114001
6 170 17.847 44.229 42.773 0.988 146201
6 190 22.266 30.420 44.301 0.991 182401
6 210 27.173 15.084 42.274 0.984 222601
6 230 32.568 -1.777 43.441 0.988 266801

4.2.3 3D super-resolution module

We utilise SRNO [11] for the SR module due to its compact size, as evidenced by the average number of parameters of deep networks in Table 2. We also compare our end-to-end architecture with cutting-edge methods [32–37]. Table 8 reveals that (1) the SR module performs effectively, as our architecture, using a 3-layer SIREN, matches the reconstruction quality of a standalone 5-layer SIREN; and (2) our architecture surpasses other state-of-the-art image compression methods in terms of PSNR and SSIM.

4.2.4 Find a trade-off architecture by trade-off point approach

To find the trade-off point for our proposed architecture, we first test it on all combinations of NC, DS and SN. The results are presented in Table 4 (4 channels of feature extraction in the SRNO model), Table 5 (8 channels), and Table 6 (16 channels). The trade-off point of the proposed architecture is then calculated using Eq 8, giving (NC = 4, DS = 1/2, SN = 30). At the trade-off point, the PSNR upper bound is around 38, the SSIM upper bound is around 0.94, and the CR upper bound is around 76.6%, as shown in Table 7. This is a good setting for the proposed architecture, as it reaches a high compression rate and good quality for reconstruction.

Table 4. The results of our proposed architecture with 4 channels of shallow feature extractor in SR module.
Scale #Neurons Bitrate(bpp) ↓ CR(%) ↑ PSNR(dB) ↑ SSIM ↑ #Para ↓ GPU memory(GB) ↓
1/2 30 0.775 97.578 33.885 0.885 6348 1.366
1/2 50 1.183 96.304 35.211 0.915 9688 1.395
1/2 70 1.786 94.420 36.853 0.947 14628 1.426
1/2 90 2.584 91.925 37.682 0.961 21168 1.456
1/2 110 3.578 88.820 38.547 0.969 29308 1.484
1/2 130 4.767 85.104 38.968 0.972 39048 1.512
1/2 150 6.151 80.779 38.867 0.974 50388 1.540
1/2 170 7.730 75.842 39.419 0.973 63328 1.570
1/2 190 9.505 70.296 39.603 0.976 77868 1.599
1/2 210 11.476 64.139 39.319 0.970 94008 1.629
1/2 230 13.641 57.372 39.665 0.976 111748 1.661
1/4 30 1.520 95.250 33.221 0.858 12452 1.319
1/4 50 1.928 93.976 33.985 0.892 15792 1.322
1/4 70 2.531 92.091 34.503 0.916 20732 1.331
1/4 90 3.329 89.597 34.789 0.921 27272 1.335
1/4 110 4.323 86.491 34.753 0.915 35412 1.333
1/4 130 5.512 82.776 34.825 0.921 45152 1.349
1/4 150 6.896 78.450 35.080 0.923 56492 1.348
1/4 170 8.476 73.514 35.001 0.924 69432 1.347
1/4 190 10.250 67.967 34.977 0.919 83972 1.347
1/4 210 12.221 61.810 35.300 0.927 100112 1.354
1/4 230 14.386 55.043 35.393 0.922 117852 1.355
1/8 30 7.481 76.622 40.991 0.977 61284 1.313
1/8 50 7.889 75.348 36.212 0.909 64624 1.314
1/8 70 8.492 73.463 40.873 0.977 69564 1.312
1/8 90 9.290 70.969 38.934 0.965 76104 1.315
1/8 110 10.284 67.863 40.995 0.979 84244 1.319
1/8 130 11.473 64.148 40.799 0.978 93984 1.316
1/8 150 12.857 59.822 40.150 0.975 105324 1.318
1/8 170 14.437 54.886 39.587 0.974 118264 1.316
1/8 190 16.211 49.339 39.866 0.973 132804 1.318
1/8 210 18.182 43.182 38.954 0.966 148944 1.317
1/8 230 20.347 36.415 39.094 0.960 166684 1.319

Table 5. The results of our proposed network with 8 channels of shallow feature extractor in SR module.
Scale #Neurons Bitrate(bpp) ↓ CR(%) ↑ PSNR(dB) ↑ SSIM ↑ #Para ↓ GPU memory(GB) ↓
1/2 30 1.688 94.727 32.829 0.825 13824 1.330
1/2 50 2.095 93.452 36.009 0.922 17164 1.360
1/2 70 2.698 91.568 37.112 0.945 22104 1.387
1/2 90 3.497 89.073 38.502 0.964 28644 1.422
1/2 110 4.490 85.968 39.373 0.975 36784 1.451
1/2 130 5.679 82.253 39.127 0.974 46524 1.479
1/2 150 7.063 77.927 39.365 0.971 57864 1.506
1/2 170 8.643 72.990 41.106 0.982 70804 1.537
1/2 190 10.418 67.444 40.133 0.979 85344 1.562
1/2 210 12.388 61.287 38.495 0.976 101484 1.595
1/2 230 14.554 54.520 40.149 0.976 119224 1.625
1/4 30 3.171 90.091 35.348 0.886 25976 1.277
1/4 50 3.579 88.817 36.051 0.917 29316 1.281
1/4 70 4.182 86.932 37.443 0.941 34256 1.289
1/4 90 4.980 84.438 36.178 0.935 40796 1.291
1/4 110 5.974 81.332 38.016 0.948 48936 1.295
1/4 130 7.163 77.617 37.564 0.946 58676 1.307
1/4 150 8.547 73.291 38.021 0.948 70016 1.307
1/4 170 10.126 68.355 36.991 0.944 82956 1.307
1/4 190 11.901 62.808 38.424 0.950 97496 1.310
1/4 210 13.872 56.651 36.755 0.942 113636 1.311
1/4 230 16.037 49.884 36.696 0.942 131376 1.319
1/8 30 15.038 53.006 45.056 0.990 123192 1.273
1/8 50 15.446 51.732 44.628 0.989 126532 1.272
1/8 70 16.049 49.847 45.004 0.990 131472 1.276
1/8 90 16.847 47.353 44.226 0.987 138012 1.272
1/8 110 17.841 44.247 45.528 0.992 146152 1.277
1/8 130 19.030 40.532 43.596 0.985 155892 1.276
1/8 150 20.414 36.206 44.410 0.990 167232 1.276
1/8 170 21.994 31.270 43.539 0.986 180172 1.276
1/8 190 23.769 25.723 43.650 0.988 194712 1.277
1/8 210 25.739 19.566 42.061 0.981 210852 1.277
1/8 230 27.904 12.799 41.435 0.978 228592 1.275
Table 6. The results of our proposed network with 16 channels of shallow feature extractor in SR module.
Scale # Neurons Bitrate(bpp) ↓ CR(%) ↑ PSNR(dB) ↑ SSIM ↑ #Para ↓ GPU memory(GB) ↓
1/2 30 5.095 84.079 34.469 0.858 41736 1.366
1/2 50 5.502 82.805 37.843 0.947 45076 1.396
1/2 70 6.105 80.920 39.609 0.965 50016 1.424
1/2 90 6.904 78.426 39.634 0.967 56556 1.457
1/2 110 7.897 75.320 40.585 0.979 64696 1.486
1/2 130 9.086 71.605 38.699 0.975 74436 1.514
1/2 150 10.471 67.279 41.600 0.982 85776 1.546
1/2 170 12.050 62.343 39.853 0.969 98716 1.568
1/2 190 13.825 56.796 41.416 0.981 113256 1.601
1/2 210 15.795 50.639 41.327 0.977 129396 1.632
1/2 230 17.961 43.872 39.053 0.977 147136 1.660
1/4 30 8.055 74.829 35.132 0.891 65984 1.272
1/4 50 8.462 73.555 38.964 0.941 69324 1.279
1/4 70 9.065 71.671 40.981 0.965 74264 1.290
1/4 90 9.864 69.176 40.182 0.965 80804 1.289
1/4 110 10.857 66.071 40.705 0.969 88944 1.292
1/4 130 12.046 62.355 42.462 0.979 98684 1.298
1/4 150 13.431 58.029 42.580 0.981 110024 1.307
1/4 170 15.010 53.093 41.079 0.970 122964 1.308
1/4 190 16.785 47.546 42.322 0.979 137504 1.306
1/4 210 18.755 41.389 41.177 0.974 153644 1.311
1/4 230 20.921 34.622 41.987 0.979 171384 1.313
1/8 30 31.734 0.830 49.798 0.998 259968 1.270
1/8 50 32.142 -0.444 48.750 0.996 263308 1.271
1/8 70 32.745 -2.328 48.238 0.995 268248 1.271
1/8 90 33.543 -4.823 48.497 0.997 274788 1.271
1/8 110 34.537 -7.928 46.755 0.993 282928 1.280
1/8 130 35.726 -11.644 48.436 0.997 292668 1.275
1/8 150 37.110 -15.970 50.539 0.999 304008 1.277
1/8 170 38.690 -20.906 48.897 0.996 316948 1.273
1/8 190 40.465 -26.453 48.730 0.997 331488 1.275
1/8 210 42.435 -32.610 46.971 0.994 347628 1.277
1/8 230 44.601 -39.377 47.476 0.996 365368 1.277
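
The Bitrate and CR columns in Tables 5 and 6 appear to follow directly from the parameter count. The following is a minimal sketch of that arithmetic, assuming each parameter is stored in 32 bits and each patch holds 64 × 64 × 64 voxels at 32 bits per voxel. These assumptions are inferred from the tabulated values and from the patch size quoted for Fig 6; they are not formulas stated in this section, and the function names are illustrative.

```python
# Assumed constants (inferred from the tables, not stated by the authors here).
PATCH_VOXELS = 64 ** 3   # voxels in one (64, 64, 64) patch
BITS_PER_PARAM = 32      # assumed 32-bit storage per network parameter
BITS_PER_VOXEL = 32      # assumed bit depth of the uncompressed voxel


def bitrate_bpp(num_params: int) -> float:
    """Bits spent on network parameters per voxel of the patch."""
    return num_params * BITS_PER_PARAM / PATCH_VOXELS


def compression_rate(num_params: int) -> float:
    """Storage saved relative to the raw patch, in percent (can be negative)."""
    return 100.0 * (1.0 - bitrate_bpp(num_params) / BITS_PER_VOXEL)


# Parameter counts taken from Table 5 (first row) and Table 6 (1/8 scale rows).
for n in (13824, 259968, 263308):
    print(n, round(bitrate_bpp(n), 3), round(compression_rate(n), 3))
```

Under these assumptions, any configuration with more than 262144 parameters needs more storage than the raw patch, which would explain the negative CR values in the 1/8-scale rows of Table 6.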
Table 7. Our proposed architecture’s trade-off point.
Marginal values NC = 4 DS = 1/2 SN = 30
1/PSNR 0.02598 0.02681 0.02694
1 − |SSIM| 0.04287 0.05491 0.09239
1 − CR 0.23398 0.25709 0.25887

This is further illustrated by Eq 9. Figs 3–5 show the three sets of marginal distributions along the dimensions NC, DS, and SN, respectively. Accepting a lower CR allows a larger SIREN size (SN) or channel number (NC); however, the reconstruction quality (PSNR or SSIM) improves only slightly. Thus, enlarging the model size or channel number does not significantly improve reconstruction quality. Additionally, compared with the existing approaches in Table 8, our architecture achieves the lowest bitrate (0.775 bpp, against 2.5 bpp for the next-lowest reported value), so the compressed file is significantly smaller. Our results (PSNR and SSIM) remain comparable with those of “3D-VOI-OMLSVD [34]”. Fig 6 further shows the reconstructed slices of volume data.
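
The marginal values in Table 7 can be read as averages of the three cost terms along one dimension of the configuration grid. The sketch below shows one way to compute such marginals, assuming the results are arranged in an array with axis order (NC, DS, SN) and that each marginal is a plain mean over the other two axes. Both are assumptions for illustration, and `marginal_costs` is a hypothetical helper. The rule for selecting the trade-off point from these curves is given by Eq 9 and is not reproduced here.

```python
import numpy as np

# Assumed axis order of the results grid: (NC, DS, SN).
AXES = {"NC": 0, "DS": 1, "SN": 2}


def marginal_costs(psnr, ssim, cr, dim):
    """Mean of each cost term along `dim`, averaged over the other two axes.

    psnr is in dB, ssim is unitless, and cr is a fraction (CR% / 100).
    """
    psnr, ssim, cr = (np.asarray(a, dtype=float) for a in (psnr, ssim, cr))
    costs = {
        "1/PSNR": 1.0 / psnr,
        "1-|SSIM|": 1.0 - np.abs(ssim),
        "1-CR": 1.0 - cr,
    }
    other = tuple(a for a in range(3) if a != AXES[dim])
    return {name: c.mean(axis=other) for name, c in costs.items()}


# Small excerpt of Tables 5 and 6: NC in (8, 16), DS in (1/2, 1/4), SN in (30, 50).
psnr = [[[32.829, 36.009], [35.348, 36.051]],
        [[34.469, 37.843], [35.132, 38.964]]]
ssim = [[[0.825, 0.922], [0.886, 0.917]],
        [[0.858, 0.947], [0.891, 0.941]]]
cr_percent = [[[94.727, 93.452], [90.091, 88.817]],
              [[84.079, 82.805], [74.829, 73.555]]]

nc_marginals = marginal_costs(psnr, ssim, np.array(cr_percent) / 100.0, "NC")
for name, values in nc_marginals.items():
    print(name, np.round(values, 5))
```

This excerpt is far too small to reproduce the values in Table 7; it only illustrates the averaging step, which would need the full grid of sampled configurations.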

Fig 3. The trade-off point for the number of channels (NC) in the SR module with respect to the performance metrics 1/PSNR, 1-SSIM, and 1-CR. The red dashed lines indicate the intersection where the optimal trade-off is achieved, balancing compression efficiency and reconstruction quality.

Fig 5. The trade-off point for the number of neurons (SN) in the SIREN model, plotted against the performance metrics 1/PSNR, 1-SSIM, and 1-CR. The red dashed lines indicate the optimal configuration of neurons in the SIREN model for achieving high reconstruction quality with minimal compression loss.

Table 8. Comparison of our architecture with other state-of-the-art methods in terms of PSNR, SSIM, CR, bitrate, and GPU memory in volume reconstruction.
Method avg PSNR ↑ avg SSIM ↑ CR(%)↑ Bitrate(bpp) ↓ GPU(GB) ↓
Single SIREN [9] 40.008 0.947 67.062 10.348 3.390
Devadoss et al. [33] 34.1098 - 78.16 4.580 -
MVAR [32] 40.050 - 90.00 - -
3D-VOI-OMLSVD [34] 42.04 0.978 89.17 2.54 -
aiWave-heavy [36] 39.00 - - 2.5 -
Block CS [35] 30.86 0.7489 50.00 - -
EZW with Haar [37] 30.15 - 40.31 - -
Our Architecture 40.052 0.961 97.578 0.775 0.769
Fig 6. The left column shows the different original slices of the volume with sizes of (155, 240, 240); the middle column shows the labelled patches of the slices with sizes of (64, 64, 64); the right column shows the patches reconstructed by our architecture.

Fig 4. The trade-off point for the downsampling scale (DS), based on the performance metrics 1/PSNR, 1-SSIM, and 1-CR. The red dashed lines highlight where the downsampling scale achieves an optimal balance between compression rate and reconstruction accuracy.

Additionally, Fig 7 shows a steady optimisation process over 5000 epochs, with continuous improvements in reconstruction accuracy and structural similarity. The PSNR curve exceeds 40 dB, indicating high reconstruction quality with minimal error. The SSIM curve approaches 0.96, demonstrating the model’s effectiveness in preserving perceptual and structural fidelity. The steady decrease in the loss function, alongside the PSNR and SSIM improvements, confirms effective convergence. These results, consistent with the final performance metrics in Table 8, highlight the architecture’s ability to balance compression efficiency and high-quality reconstruction, making it ideal for medical imaging.
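
To put the 40 dB figure in context, the sketch below computes PSNR using its standard definition, 10·log10(MAX² / MSE). The data range (here 1.0, i.e. intensities normalised to [0, 1]) is an assumption, since the normalisation used by the authors is not given in this section, and the `psnr` helper is illustrative, not taken from the released code.

```python
import numpy as np


def psnr(reference, reconstruction, data_range=1.0):
    """Peak signal-to-noise ratio in dB for two arrays of equal shape."""
    reference = np.asarray(reference, dtype=np.float64)
    reconstruction = np.asarray(reconstruction, dtype=np.float64)
    mse = np.mean((reference - reconstruction) ** 2)
    if mse == 0:
        return float("inf")  # identical volumes
    return 10.0 * np.log10(data_range ** 2 / mse)


# Synthetic example: a random volume and a copy offset by 1% of the data range.
rng = np.random.default_rng(0)
volume = rng.random((16, 16, 16))
offset_volume = volume + 0.01
print(psnr(volume, offset_volume))
```

With a data range of 1.0, a root-mean-square error of 0.01 gives a PSNR of about 40 dB, so the reported value corresponds to an average error of roughly 1% of the intensity range under this normalisation. SSIM is a separate, window-based measure and is not covered by this sketch.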

Fig 7. Training procedure of the architecture according to the trade-off point setting.


Remark: The proposed trade-off point approach serves as a pragmatic optimisation strategy. In the context of the compression problem, it is essential to balance various requirements, including downsampling scales, INR module size, SR module structure, etc., rather than overemphasising one or two factors. The trade-off point approach addresses this challenge by elegantly optimising the parameters involved.

5 Conclusion and future work

In this paper, we proposed an innovative architecture that integrates available deep-learning techniques with a focus on compressing volume data while maintaining high reconstruction fidelity. One notable aspect of our approach is the utilisation of emerging deep learning technologies, which have witnessed rapid development in recent years. We emphasised the importance of carefully considering various factors such as network architecture, computational efficiency, and reconstruction accuracy when designing and implementing the end-to-end solution. To this end, we proposed the end-to-end network architecture for volume data compression and developed the trade-off approach to determine optimal settings for individual modules, which is a practical method to balance performance considerations in the context of medical visualisation tasks.

5.1 Limitations

5.1.1 Generalisation to diverse medical datasets

Applying the proposed end-to-end architecture to various volume datasets requires significant retraining time for each dataset individually, as there is no fine-tuning strategy in place to speed up this process.

5.1.2 Time complexity of trade-off point approach

The trade-off point method necessitates sampling the model’s performance across different architecture settings, which is highly time-consuming.

5.2 Future work

Beyond compression, real-time rendering of very large medical volume data is an important goal. Combining compression with rendering could enable real-time visualisation of such data. In future work, we intend to focus on volume-rendering techniques that leverage implicit neural representations. This research direction shows significant promise for advancements in the field of visualisation.

Data Availability

The data and codes supporting the findings of this study are openly available at GitHub and can be accessed at https://github.com/asheibanifard/EndtoEndCompression.

Funding Statement

This research was partially supported by the EU Horizon Project-ACMod (No. 101130271). Zongcai Ruan was supported by STI2030-Major Projects of China (2021ZD0204002). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

  • 1. Denk W, Horstmann H. Serial Block-Face Scanning Electron Microscopy to Reconstruct Three-Dimensional Tissue Nanostructure. PLOS Biology. 2004;2(11):e329. doi: 10.1371/journal.pbio.0020329
  • 2. Micheva KD, Smith SJ. Array Tomography: A New Tool for Imaging the Molecular Architecture and Ultrastructure of Neural Circuits. Neuron. 2007;55(1):25–36. doi: 10.1016/j.neuron.2007.06.014
  • 3. Xu CS, Hayworth KJ, et al. Enhanced FIB-SEM systems for large-volume 3D imaging. eLife. 2017;6:e25916. doi: 10.7554/eLife.25916
  • 4. Scheffer LK, Xu CS, et al. A connectome and analysis of the adult Drosophila central brain. eLife. 2020;9:e57443. doi: 10.7554/eLife.57443
  • 5. Peddie CJ, Genoud C, Kreshuk A, et al. Volume electron microscopy. Nature Reviews Methods Primers. 2022;2(1):51. doi: 10.1038/s43586-022-00131-9
  • 6. Jaba Deva Krupa A, Dhanalakshmi S, Lai KW, Tan Y, Wu X. An IoMT enabled deep learning framework for automatic detection of fetal QRS: A solution to remote prenatal care. Journal of King Saud University - Computer and Information Sciences. 2022;34(9):7200–7211. doi: 10.1016/j.jksuci.2022.07.002
  • 7. Wu X, Zhang YT, Lai KW, Yang MZ, Yang GL, Wang HH. A Novel Centralized Federated Deep Fuzzy Neural Network with Multi-objectives Neural Architecture Search for Epistatic Detection. IEEE Transactions on Fuzzy Systems. 2024; p. 1–13. doi: 10.1109/TFUZZ.2024.3369944
  • 8. Varoquaux G, Cheplygina V. Machine learning for medical imaging: methodological failures and recommendations for the future. npj Digital Medicine. 2022;5(1):48. doi: 10.1038/s41746-022-00592-y
  • 9. Sitzmann V, Martel JNP, et al. Implicit Neural Representations with Periodic Activation Functions; 2020. Available from: https://arxiv.org/abs/2006.09661.
  • 10. Sheibanifard A, Yu H. A Novel Implicit Neural Representation for Volume Data. Applied Sciences. 2023;13(5). doi: 10.3390/app13053242
  • 11. Wei M, Zhang X. Super-Resolution Neural Operator; 2023.
  • 12. Park JJ, Florence PR, Straub J, Newcombe RA, Lovegrove S. DeepSDF: Learning Continuous Signed Distance Functions for Shape Representation. CoRR. 2019;abs/1901.05103.
  • 13. Tang D, Singh S, Chou PA, Haene C, Dou M, Fanello S, et al. Deep Implicit Volume Compression; 2020.
  • 14. Nagoor OH, Whittle J, Deng J, Mora B, Jones MW. MedZip: 3D Medical Images Lossless Compressor Using Recurrent Neural Network (LSTM). In: 2020 25th International Conference on Pattern Recognition (ICPR); 2021. p. 2874–2881.
  • 15. Mildenhall B, Srinivasan PP, Tancik M, Barron JT, Ramamoorthi R, Ng R. NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis. Commun ACM. 2021;65(1):99–106. doi: 10.1145/3503250
  • 16. Mescheder LM, Oechsle M, Niemeyer M, et al. Occupancy Networks: Learning 3D Reconstruction in Function Space. CoRR. 2018;abs/1812.03828.
  • 17. Dupont E, Goliński A, Alizadeh M, Teh YW, Doucet A. COIN: COmpression with Implicit Neural representations; 2021.
  • 18. Skorokhodov I, Ignatyev S, Elhoseiny M. Adversarial Generation of Continuous Images. CoRR. 2020;abs/2011.12026.
  • 19. Shen L, Pauly J, Xing L. NeRP: Implicit Neural Representation Learning With Prior Embedding for Sparsely Sampled Image Reconstruction. IEEE Transactions on Neural Networks and Learning Systems. 2022; p. 1–13. doi: 10.1109/TNNLS.2022.3177134
  • 20. Dong C, Loy CC, He K, Tang X. Image Super-Resolution Using Deep Convolutional Networks; 2015.
  • 21. Kim J, Lee JK, Lee KM. Accurate Image Super-Resolution Using Very Deep Convolutional Networks; 2016.
  • 22. Dong C, Loy CC, Tang X. Accelerating the Super-Resolution Convolutional Neural Network; 2016.
  • 23. Lim B, Son S, Kim H, Nah S, Lee KM. Enhanced Deep Residual Networks for Single Image Super-Resolution; 2017.
  • 24. Tai Y, Yang J, Liu X, Xu C. MemNet: A Persistent Memory Network for Image Restoration; 2017.
  • 25. Zhang Y, Tian Y, Kong Y, Zhong B, Fu Y. Residual Dense Network for Image Super-Resolution. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR); 2018.
  • 26. Dai T, Cai J, Zhang Y, Xia ST, Zhang L. Second-Order Attention Network for Single Image Super-Resolution. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); 2019.
  • 27. Liu D, Wen B, Fan Y, Loy CC, Huang TS. Non-Local Recurrent Network for Image Restoration; 2018.
  • 28. Niu B, Wen W, Ren W, Zhang X, Yang L, Wang S, et al. Single Image Super-Resolution via a Holistic Attention Network; 2020.
  • 29. Mei Y, Fan Y, Zhou Y. Image Super-Resolution With Non-Local Sparse Attention. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); 2021. p. 3517–3526.
  • 30. Zhang X, Zeng H, Guo S, Zhang L. Efficient Long-Range Attention Network for Image Super-resolution; 2022. doi: 10.1007/978-3-031-19790-1_39
  • 31. de Leeuw den Bouter ML, Ippolito G, et al. Deep learning-based single image super-resolution for low-field MR brain images. Scientific Reports. 2022;12(1):6362. doi: 10.1038/s41598-022-10298-6
  • 32. Fahrni G, Rotzinger DC, Nakajo C, Dehmeshki J, Qanadli SD. Three-Dimensional Adaptive Image Compression Concept for Medical Imaging: Application to Computed Tomography Angiography for Peripheral Arteries. Journal of Cardiovascular Development and Disease. 2022;9(5). doi: 10.3390/jcdd9050137
  • 33. Devadoss CP, Sankaragomathi B. Near lossless medical image compression using block BWT–MTF and hybrid fractal compression techniques. Cluster Computing. 2019;22(5):12929–12937. doi: 10.1007/s10586-018-1801-3
  • 34. Boopathiraja S, Kalavathi P, Deoghare S, Prasath VBS. Near Lossless Compression for 3D Radiological Images Using Optimal Multilinear Singular Value Decomposition (3D-VOI-OMLSVD). Journal of Digital Imaging. 2023;36(1):259–275. doi: 10.1007/s10278-022-00687-8
  • 35. Chakraborty P, Chandrapragasam T. Extended Applications of Compressed Sensing Algorithm in Biomedical Signal and Image Compression. Journal of The Institution of Engineers (India): Series B. 2022;103(1):83–91. doi: 10.1007/s40031-021-00592-8
  • 36. Xue D, Ma H, Li L, Liu D, Xiong Z. aiWave: Volumetric Image Compression with 3-D Trained Affine Wavelet-like Transform; 2022. Available from: https://arxiv.org/abs/2203.05822.
  • 37. Miya J, Ansari M. Wavelet Techniques for Medical Images Performance Analysis and Observations with EZW and Underwater Image Processing. Wireless Personal Communications. 2021;116. doi: 10.1007/s11277-020-07238-w

Decision Letter 0

Rossana Mastrandrea

21 May 2024

PONE-D-24-12715
An End-to-End Implicit Neural Representation Network for Medical Volume Data
PLOS ONE

Dear Dr. Yu,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by Jul 05 2024 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

Rossana Mastrandrea

Academic Editor

PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at 

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and 

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. Please note that PLOS ONE has specific guidelines on code sharing for submissions in which author-generated code underpins the findings in the manuscript. In these cases, all author-generated code must be made available without restrictions upon publication of the work. Please review our guidelines at https://journals.plos.org/plosone/s/materials-and-software-sharing#loc-sharing-code and ensure that your code is shared in a way that follows best practice and facilitates reproducibility and reuse.

3. Thank you for stating the following financial disclosure: 

   "EU Horizon Marie Skłodowska-Curie Action;

Grant No. 101130271;

Title: Affective Computing Models: from Facial Expression to Mind-Reading"

Please state what role the funders took in the study.  If the funders had no role, please state: "The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript." 

If this statement is not correct you must amend it as needed. 

Please include this amended Role of Funder statement in your cover letter; we will change the online submission form on your behalf.

4. When completing the data availability statement of the submission form, you indicated that you will make your data available on acceptance. We strongly recommend all authors decide on a data sharing plan before acceptance, as the process can be lengthy and hold up publication timelines. Please note that, though access restrictions are acceptable now, your entire data will need to be made freely accessible if your manuscript is accepted for publication. This policy applies to all data except where public deposition would breach compliance with the protocol approved by your research ethics board. If you are unable to adhere to our open data policy, please kindly revise your statement to explain your reasoning and we will seek the editor's input on an exemption. Please be assured that, once you have provided your new statement, the assessment of your exemption will not hold up the peer review process.


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Partly

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: I Don't Know

Reviewer #2: Yes

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: No

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: A preliminary theoretical analysis of the proposed methodology motivates its rationale. Also, this manuscript has many problems, such as structure, contribution, proposed method, etc. So, In my opinion, this is not suitable for publication.

Reviewer #2: The issues are listed in the following:

1. The professional English editing is recommended. The authors should get editing help from someone with full professional proficiency in English.

2. What is the main difference or importance of the proposed methods and the other state-of-the-arts?

3. The abstract should reflect the contributions of the manuscript. I suggest rewriting it.

4. More recent references in the context of this study need to be added and discussed in manuscript. For example, “A Secure Visual Framework for Multi-index Privacy Protection Evaluation in Networks, doi: 10.1016/j.dcan.2022.05.007.”

5. The Conclusion section should point out the potential disadvantages and possible future research directions of the manuscript. How this work can be extended in future?

**********

6. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

**********

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

Decision Letter 1

Xiyu Liu

17 Sep 2024

PONE-D-24-12715R1
An End-to-End Implicit Neural Representation Architecture for Medical Volume Data
PLOS ONE

Dear Dr. Yu,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by Nov 01 2024 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

We look forward to receiving your revised manuscript.

Kind regards,

Xiyu Liu

Academic Editor

PLOS ONE


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: (No Response)

Reviewer #2: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: (No Response)

Reviewer #2: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: (No Response)

Reviewer #2: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: (No Response)

Reviewer #2: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: (No Response)

Reviewer #2: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: This manuscript proposes an innovative architecture that integrates available deep-learning techniques focusing on compressing volume data while maintaining high reconstruction fidelity. Adequate revisions to the following points should be undertaken to justify the recommendation for publication.

  • The authors should clearly state the limitations of the proposed method in other real applications.

  • The abstract section is fragile. Please rewrite it, explain the result obtained and contribution, improve a proposed method, and delete unnecessary information.

  • Proofread the manuscript carefully to eliminate any grammatical errors or typos and ensure clarity and coherence in writing. Additionally, adhere to the formatting and style guidelines specified by the target journal or publication venue to enhance the professionalism of the manuscript.

  • I suggest the authors add a table at the end of the literature review and compare the reviewed papers to clarify the research gap better.

  • Please write your contribution to this paper in the Introduction section.

  • Expand the critical results in the conclusion. Focus on the main developments in the finale. Also, write the main contributions in the conclusion.

  • Numerical results are good enough, but more explanations are required to analyze each figure presented.

  • The simulation section needs to be more detailed. The authors should provide more information about the data they employed and the simulation process.

  • Please change the “Conclusion” section to “Conclusion and Future Work” and write future work.

  • All figures are of low quality, so please improve all of them.

Good luck

Reviewer #2: The issues are listed in the following:

1. The professional English editing is recommended. The authors should get editing help from someone with full professional proficiency in English.

2. The abstract contains a large amount of information. It is recommended to simplify the language and highlight the innovative points and main results of the study. It is suggested to update some higher quality or newer literature. For example, the introduction of convolutional neural networks can be referenced as "A Novel Centralized Federated Deep Fuzzy Neural Network with Multi-objectives Neural Architecture Search for Epistatic Detection, DOI: 10.1109/TFUZZ.2024.3369944", and the Fourier Transform technique introduction can be quoted as "An IoMT enabled deep learning framework for automatic detection of fetal QRS: A solution to remote prenatal care, DOI: 10.1016/j.jksuci.2022.07.002".

3. The figures included in the manuscript are not sufficiently clear. High-resolution images should be provided to ensure that all details are visible and can be thoroughly examined by readers.

4. The introduction should provide a clearer overview of the motivation, objectives, and main contributions of the study. Additionally, the limitations of existing technologies and how your work addresses these gaps should be briefly discussed.

5. Please outline your main contributions more clearly in the introduction and conclusion sections. This should include the specific innovations of your proposed end-to-end architecture and the advantages over traditional methods.

6. Your proposed end-to-end architecture requires a more detailed description.

7. The results section requires more details to substantiate your claims. For instance, for the performance evaluation of each module, more quantitative data and visualizations should be provided.

8. The selection of the dataset, experimental setup, and evaluation metrics need more detailed descriptions to ensure the reproducibility and validity of the results.

9. Please ensure that all relevant and up-to-date literature is cited and that the citation format adheres to the journal's requirements.

10. Discuss in more detail within the text how your work relates to existing technologies. This includes a comparison of your work with the current state-of-the-art and where your work offers improvements.

**********

7. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

**********

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2025 Jan 3;20(1):e0314944. doi: 10.1371/journal.pone.0314944.r004

Author response to Decision Letter 1


25 Oct 2024

- Response to reviewer #1:

1. The proposed method’s limitations have been stated in the conclusion in the revision [349-356].

2. The abstract has been rewritten in the revision.

3. Grammatical errors and typos have been corrected in the revision, and the journal guidelines were checked.

4. The suggested table has been added as Table 1 in the revision.

5. The contributions were mentioned before, but they have been expanded for clarification in the revision [31-37].

6. The conclusion has been rewritten accordingly in the revision [338-348].

7. In the “Find a trade-off architecture by trade-off point approach” subsection, we added Fig 7 to show the training procedure of the end-to-end architecture. We also rewrote the Materials and Experimental Results section to explain each figure and table in detail in the revision [313-330].

8. The required explanations of the simulation process have been added to the Materials and Experimental Results section. The “Data and implementation setup” subsection has been rewritten in the revision [234-260].

9. The section title has been changed as requested in the revision [337].

10. The PACE tool standardised the images according to the guidelines. For assurance, the original figures' configuration and quality were also adapted to the guidelines, with a maximum resolution of 600 dpi.

- Response to reviewer #2:

1. The revision has been proofread to correct errors.

2. The missing references have been cited in the revision [11-12].

3. The PACE tool standardised the images according to the guidelines. For assurance, the original high-quality figures were uploaded instead in the revision.

4. The introduction has been rewritten accordingly and, for clarification, Table 1 has been added to show the gaps in the revision.

5. The contributions have been explicitly stated in the introduction and conclusion in the revision [31-37][338-348].

6. The “Proposed End-to-End architecture” subsection has been updated accordingly, and the “Data and implementation setup” subsection has been updated as well in the revision [100-104].

7. In the “Find a trade-off architecture by trade-off point approach” subsection, we added Fig 7 to show the training procedure of the end-to-end architecture. We also rewrote the Materials and Experimental Results section to explain each figure and table in detail in the revision [265-268][272-274][287-293].

8. More details have been added to the “Data and implementation setup” subsection in the revision [231-234].

9. The citations and their format have been reviewed in the revision.

10. Table 7 shows the comparison with the state of the art (SOTA); both its caption and its description in the text have been updated in the revision [319-321].

Attachment

Submitted filename: Response to Reviewers.docx

pone.0314944.s002.docx (11.9KB, docx)

Decision Letter 2

Xiyu Liu

19 Nov 2024

An End-to-End Implicit Neural Representation Architecture for Medical Volume Data

PONE-D-24-12715R2

Dear Dr. Yu,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice will be generated when your article is formally accepted. Please note, if your institution has a publishing partnership with PLOS and your article meets the relevant criteria, all or part of your publication costs will be covered. Please make sure your user information is up-to-date by logging into Editorial Manager® and clicking the ‘Update My Information’ link at the top of the page. If you have any questions relating to publication charges, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Xiyu Liu

Academic Editor

PLOS ONE

Additional Editor Comments (optional):

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: All comments have been addressed

Reviewer #2: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: The authors have addressed all the issues according to my previous comments. The related work has been enriched, and the indistinct description and deficient analysis have been further refined. More discussions have also been added. This paper has been revised thoroughly to reach the standard for publication. Consequently, I advise you to accept this paper.

Reviewer #2: All of my questions have been addressed and have been revised as required for the proposed publication in this journal.

**********

7. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

**********

Acceptance letter

Xiyu Liu

20 Dec 2024

PONE-D-24-12715R2

PLOS ONE

Dear Dr. Yu,

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now being handed over to our production team.

At this stage, our production department will prepare your paper for publication. This includes ensuring the following:

* All references, tables, and figures are properly cited

* All relevant supporting information is included in the manuscript submission

* There are no issues that prevent the paper from being properly typeset

If revisions are needed, the production department will contact you directly to resolve them. If no revisions are needed, you will receive an email when the publication date has been set. At this time, we do not offer pre-publication proofs to authors during production of the accepted work. Please keep in mind that we are working through a large volume of accepted articles, so please give us a few weeks to review your paper and let you know the next and final steps.

Lastly, if your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

If we can help with anything else, please email us at customercare@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Professor Xiyu Liu

Academic Editor

PLOS ONE

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    Attachment

    Submitted filename: response to reviewers.pdf

    pone.0314944.s001.pdf (51.6KB, pdf)
    Attachment

    Submitted filename: Response to Reviewers.docx

    pone.0314944.s002.docx (11.9KB, docx)

    Data Availability Statement

    The data and code supporting the findings of this study are openly available on GitHub at https://github.com/asheibanifard/EndtoEndCompression.


    Articles from PLOS ONE are provided here courtesy of PLOS
