BioMed Research International. 2022 Dec 8;2022:4431536. doi: 10.1155/2022/4431536

Super-Resolution Swin Transformer and Attention Network for Medical CT Imaging

Jianhua Hu 1, Shuzhao Zheng 1, Bo Wang 1, Guixiang Luo 2, WoQing Huang 1, Jun Zhang 1
PMCID: PMC9754833  PMID: 36531651

Abstract

Computerized tomography (CT) is widely used for clinical screening and treatment planning. Because CT radiation is harmful to the human body, we aimed to reduce the X-ray dose while still achieving high-quality CT imaging from low-intensity X-rays. To this end, we applied an innovative vision transformer to medical image super-resolution (SR) to produce high-definition reconstructions. We propose the swin transformer and attention network (STAN), which employs an attention mechanism to overcome the long-range dependency difficulties encountered in CNNs and RNNs and thereby enhances and restores the quality of medical CT images. We adopted the peak signal-to-noise ratio (PSNR) for performance comparison with other mainstream SR reconstruction models used in medical CT imaging. Experimental results reveal that the proposed STAN model yields superior medical CT imaging results compared with existing CNN-based SR techniques. The self-attention mechanism in STAN extracts critical features and long-range information more effectively, thereby enhancing the quality of medical CT image reconstruction.

1. Introduction

Computerized tomography (CT) images are used by doctors in clinical practice to assess a patient's condition. Good image quality is crucial for effective and accurate screening and diagnosis [1]. CT imaging played a vital role in the diagnosis and treatment of COVID-19 [2–4]. CT images are obtained using X-rays, but X-ray radiation is harmful to the human body. To reduce the radiation dose, the X-ray intensity is lowered during scanning, which results in low-resolution, blurred CT images. Obtaining high-definition medical CT images through super-resolution (SR) is therefore an important research topic, and many deep learning (DL)-based SR techniques have been proposed for this purpose [4–6].

Convolutional neural networks (CNNs) have been used to accomplish SR tasks. The SRCNN network was the earliest CNN-based approach to reconstruct high-resolution images from low-resolution inputs through end-to-end nonlinear feature mapping and reconstruction [7]. Currently, DL is widely used in SR applications [8–10]. Shan et al. [11] improved the initial CNN-based SR method by introducing residual learning and an attention mechanism.

Powerful reconstruction algorithms have been proposed to improve SR capability. In 2016, FSRCNN [12] was proposed to improve on the SRCNN model by moving upsampling later in the network, which increased the running speed. Many CNN-based SR algorithms have since been proposed that incorporate residual learning and attention mechanisms, increase model depth, improve speed, reduce complexity, and boost SR performance [13–22].

CNNs are used in mainstream medical CT image SR algorithms because they perform very well in the image domain. However, CNNs cannot readily capture long-range features. The transformer was originally developed for sequence tasks such as natural language processing, but it has recently been employed for SR as a replacement for CNNs [23, 24] because its self-attention (SA) mechanism supports long-range feature extraction and yields very good performance in the image domain. Transformer models in the field of medical imaging have been extensively studied [25]. A transformer network with a self-attention approach to feature extraction improves SR performance. Nevertheless, few studies have used the transformer network to improve the SR of medical images. Therefore, in this study, we applied a transformer network to the SR reconstruction of medical CT images.

To reconstruct medical CT images, in this study, we developed the swin transformer and attention network (STAN) model. The main advantage of STAN is that it can learn feature information better. STAN consists of three types of blocks: the low-frequency feature extraction block, deep feature extraction block (including attention transformer blocks (ATBs)), and high-resolution image reconstruction block.

To preserve the low-frequency information, the low-frequency feature extraction block is directly connected to the reconstruction model. The deep feature extraction module mainly consists of ATBs. To extract image edge and texture information, a shifted-window mechanism with a fixed window size is used, which reduces resource consumption. Finally, in the high-resolution image reconstruction block, the features of the first two modules are fused across multiple layers, and the low-resolution image is reconstructed into a high-resolution one.

The main contributions of this study are as follows:

  1. A swin-transformer-based SR network for medical CT images is proposed. The attention mechanism improves the network's ability to extract features, including edge and flat-area information, from medical CT images and to reconstruct high-quality CT images

  2. We developed a low-frequency extraction module together with an attention mechanism to capture the long-range dependency features of the image

  3. To handle long-range dependencies in images, we used a shifted-window mechanism, overcoming the limitation of traditional transformers, which divide the input image into fixed-size patches

2. Related Work

The traditional SR algorithm uses bicubic interpolation to upsample an image and has the disadvantages of losing details and blurring the image. Therefore, neural networks have been employed for SR, and the transformer network can further improve on the performance of a traditional CNN. With the development of SR, many scholars have applied SR technology to improve the clarity and reliability of medical CT images, typically through the following three approaches:

  1. Obtaining SR images by using CNNs: CNNs are mainly used to perform transformations between images of different resolutions (e.g., LR image to HR image). Because images have different characteristics, different scaling methods are needed to recover different image details; nonlinear mapping is therefore performed to recover the lost high-frequency details. CNNs are widely used to reconstruct high-quality images and realize SR through densely connected convolutions, multichannel networks, and symmetric skip connections [26, 27]

  2. The use of transformer networks in image applications: transformer networks were originally used for natural language processing. Because of their attention mechanism and ability to model long-range dependencies, transformer networks are highly suitable for image feature extraction. Therefore, the transformer [28, 29] is widely used in image processing because of its ability to better capture information and to be combined with CNNs. Pan et al. [23] proposed a high-quality reconstruction transformer that captures global image features for medical CT image reconstruction

  3. The use of SR in the field of medical CT imaging: DL technology is extensively employed for medical CT imaging [30–32]. Many scholars have applied SR technology to the medical field [33–35]. SR technology reconstructs high-definition images tailored to the characteristics of medical images, which can effectively improve image quality and reduce the X-ray dose delivered to the human body

In this study, we designed the STAN model to reconstruct medical CT images. In addition, we introduced a self-attention mechanism in the network model so that long-range information can be exploited during updates; moreover, the smooth regions of medical images are reconstructed with better quality.

3. Methods

The architecture of the medical CT image enhancement network is presented here. For the SR reconstruction of medical CT images, a transformer and an attention network are employed. To improve the extraction of low-frequency and high-frequency medical feature information, we designed the STAN model. We used a transformer network instead of traditional CNNs to considerably improve the quality of medical CT images and their edge information. The proposed system comprises three types of blocks: the low-frequency feature extraction block, the deep feature extraction block, and the high-resolution image reconstruction block.

3.1. Network Architecture

The structure of the proposed STAN model is illustrated in Figure 1. STAN employs an efficient long-range attention transformer network for reconstructing high-resolution images from low-resolution medical CT images. STAN includes the low-frequency feature extraction block (for the extraction of flat-area image information), the deep feature extraction block (including six ATBs), and the high-resolution CT image reconstruction block (for the extraction of image edge information). Low-resolution images are input into the STAN model. The low-frequency feature extraction block extracts the low-frequency feature information from medical CT images by using multilayer CNNs. The deep feature extraction block employs the self-attention transformer network to extract the edge information of medical CT images, and multichannel feature information is obtained by adding its output to that of the previous block. The high-resolution CT image reconstruction block fuses the low-frequency and high-frequency information, extracts the feature data of multiple channels, and upsamples the image to obtain an SR medical CT image.

Figure 1. The architecture of the proposed STAN model. STAN includes the low-frequency feature extraction block (for the extraction of flat-area image information), the deep feature extraction block (including six ATBs), and the high-resolution CT image reconstruction block (for the extraction of image edge information).

The proposed STAN procedure is summarized in Algorithm 1. The low-resolution input image is I_LQ, and H_LF(·) denotes the low-frequency feature extraction block. The deep feature extraction block contains M ATBs, and each ATB has L STLs followed by a convolution operation. After the deep feature extraction block, the high-quality image is reconstructed by the high-resolution CT image reconstruction block.
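A minimal PyTorch sketch of this three-stage pipeline is given below; the submodule names and interfaces (LowFreqBlock, ATB, ReconstructionBlock, and the constructor arguments) are illustrative assumptions, not the authors' released code.

```python
import torch
import torch.nn as nn

class STAN(nn.Module):
    """Sketch of the three-stage STAN pipeline described above (illustrative only)."""
    def __init__(self, low_freq_block: nn.Module, atbs: nn.ModuleList, recon_block: nn.Module):
        super().__init__()
        self.low_freq_block = low_freq_block   # H_LF: shallow convolutional layers
        self.atbs = atbs                       # M attention transformer blocks (H_DF)
        self.recon_block = recon_block         # H_REC: fusion + pixel-shuffle upsampling

    def forward(self, i_lq: torch.Tensor) -> torch.Tensor:
        f0 = self.low_freq_block(i_lq)         # Eq. (1): F_0 = H_LF(I_LQ)
        f = f0
        for atb in self.atbs:
            f = atb(f)                         # Eq. (2): F_DF = H_DF(F_0)
        f_df = f
        return self.recon_block(f0 + f_df)     # Eq. (7): I_RHQ = H_REC(F_0 + F_DF)
```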

3.2. Low-Frequency Feature Extraction Block

As shown in Figure 2, the low-frequency feature extraction block realizes low-frequency information extraction and consists of three layers. A low-resolution image is taken as input. After feature extraction with a 3 × 3 convolution, fine feature extraction is performed using a 1 × 1 convolution. Finally, the low-frequency output of the block is obtained using a convolution with a 3 × 3 kernel.

Figure 2. Low-frequency feature extraction block.

A low-resolution image is input as I_LQ. Two 3 × 3 convolution layers and one 1 × 1 convolution layer are then used to obtain the low-frequency feature output as follows:

F_0 = H_{LF}(I_{LQ}). (1)

This module uses a multilayer network to better accomplish the extraction of low-frequency information.
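As a concrete illustration, the three-layer structure described above could be expressed in PyTorch as follows; the channel width of 64 is an assumed value, since the text does not specify it.

```python
import torch.nn as nn

class LowFreqBlock(nn.Module):
    """Low-frequency feature extraction: 3x3 conv -> 1x1 conv -> 3x3 conv.
    The channel width (64) is an assumption; the paper does not fix it here."""
    def __init__(self, in_channels: int = 3, channels: int = 64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_channels, channels, kernel_size=3, padding=1),  # coarse 3x3 feature extraction
            nn.Conv2d(channels, channels, kernel_size=1),                # fine 1x1 refinement
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),     # final 3x3 output
        )

    def forward(self, x):
        return self.body(x)  # F_0 = H_LF(I_LQ)
```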

3.3. Deep Feature Extraction Block

Deep features F_{DF} ∈ ℝ^{H×W×C} are extracted from the low-frequency feature output F_0 as follows:

F_{DF} = H_{DF}(F_0), (2)

where H_{DF}(·) comprises six ATBs. The composition and principle of the ATB are described in detail below.

As shown in Figure 3, ATBs are composed of swin transformer layers (STLs) and convolutional layers with a self-attention function. The STL is the base component of the ATB: the block comprises multiple STLs and ends with a convolutional layer. In this study, the number of STLs in an ATB was set to 6 to balance extraction performance and model complexity.

Figure 3. Attention transformer block.

F_{i,0} denotes the input of the i-th ATB. Feature maps F_{i,1}, F_{i,2}, …, F_{i,L} are extracted by its STLs as follows:

F_{i,j} = H_{STL_{i,j}}(F_{i,j-1}), \quad j = 1, 2, \ldots, L, (3)

where H_{STL_{i,j}}(·) denotes the j-th STL of the i-th ATB. This design offers two advantages: spatially varying convolution and a residual connection to the reconstruction module.

The output of ATB can be formulated as

F_{i,out} = H_{CONV_i}(F_{i,L}) + F_{i,0}, (4)

where H_{CONV_i}(·) is the convolutional layer at the end of the i-th ATB.
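A hedged sketch of an ATB following Eqs. (3) and (4) is shown below; the stl_factory argument is a placeholder, and each STL is assumed to map a (B, C, H, W) tensor to one of the same shape (window reshaping treated as internal to the STL), which is a simplification.

```python
import torch.nn as nn

class ATB(nn.Module):
    """Attention transformer block sketch: L swin transformer layers followed by a
    convolution, with a residual connection (Eqs. (3)-(4)). `stl_factory` builds one
    STL and is assumed to return a module acting on (B, C, H, W) tensors."""
    def __init__(self, channels: int, num_stl: int, stl_factory):
        super().__init__()
        self.stls = nn.ModuleList([stl_factory() for _ in range(num_stl)])
        self.conv = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, x):
        f = x                         # F_{i,0}
        for stl in self.stls:
            f = stl(f)                # F_{i,j} = H_STL_{i,j}(F_{i,j-1})
        return self.conv(f) + x       # F_{i,out} = H_CONV_i(F_{i,L}) + F_{i,0}
```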

The STL realizes the self-attention mechanism of the transformer. Its most important features are the use of local attention and the shifted-window mechanism. Given an input of size H × W × C, the ATB splits it into nonoverlapping M × M local windows, reshaping the input into an (HW/M²) × M² × C feature, where HW/M² is the number of windows.
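The window partitioning described above amounts to a simple tensor reshape; a minimal helper, assuming H and W are divisible by M, is sketched below.

```python
import torch

def window_partition(x: torch.Tensor, m: int) -> torch.Tensor:
    """Split a (B, H, W, C) feature map into non-overlapping M x M windows.
    Returns a tensor of shape (B * H*W / M^2, M*M, C), matching the reshaping
    described in the text. Assumes H and W are divisible by M."""
    b, h, w, c = x.shape
    x = x.view(b, h // m, m, w // m, m, c)
    windows = x.permute(0, 1, 3, 2, 4, 5).reshape(-1, m * m, c)
    return windows

# Example: a 64 x 64 feature map with 96 channels and window size M = 8
feat = torch.randn(1, 64, 64, 96)
print(window_partition(feat, 8).shape)  # torch.Size([64, 64, 96]): 64 windows of 8*8 tokens
```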

The STL consists of three components: the layer normalization (LN) layer (used for regularization), the multicontrol head SA (MCSA) layer, and the multilayer control perceptron (MLCP) layer. The MLCP layer is composed of two fully connected layers with a nonlinear transformation between them for feature extraction. An LN layer is added before both the MCSA and MLCP layers, and residual connections link the two modules. The process is as follows:

X = \mathrm{MCSA}(\mathrm{LN}(X)) + X, (5)
X = \mathrm{MLCP}(\mathrm{LN}(X)) + X. (6)
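A minimal sketch of an STL implementing Eqs. (5) and (6) is given below; nn.MultiheadAttention is used as a stand-in for the paper's MCSA over the tokens of one window, and the shifted-window masking of a full swin layer is omitted, so this is an assumption-laden simplification rather than the authors' implementation.

```python
import torch
import torch.nn as nn

class STL(nn.Module):
    """Swin-transformer-layer sketch implementing Eqs. (5)-(6):
    X = MCSA(LN(X)) + X;  X = MLCP(LN(X)) + X.
    nn.MultiheadAttention stands in for the windowed attention; shifted-window
    masking is omitted for brevity."""
    def __init__(self, dim: int, num_heads: int = 4, mlp_ratio: int = 2):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.mlcp = nn.Sequential(                # two fully connected layers
            nn.Linear(dim, dim * mlp_ratio),
            nn.GELU(),
            nn.Linear(dim * mlp_ratio, dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:    # x: (num_windows, M*M, C)
        y = self.norm1(x)
        x = self.attn(y, y, y, need_weights=False)[0] + x  # Eq. (5)
        x = self.mlcp(self.norm2(x)) + x                    # Eq. (6)
        return x
```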

3.4. High-Resolution Image Reconstruction Block

A high-quality image is reconstructed from the medical CT image by aggregating the low-frequency and deep features:

I_{RHQ} = H_{REC}(F_0 + F_{DF}), (7)

where H_{REC}(·) denotes the reconstruction module. The shallow features mainly contain low-frequency information, while the deep features are used to recover the missing high frequencies. A long skip connection transmits the low-frequency information directly to the medical CT image reconstruction module and helps the deep feature extraction module focus on high-frequency information.

The high-resolution image reconstruction block (Figure 4) comprises a 64-channel CNN whose feature maps are of size H/2 × W/2 × 64. The 64-channel feature map is upsampled using the pixel shuffle method. Finally, a 3-channel convolution generates the high-definition image output.
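A hedged sketch of such a reconstruction block is shown below; the use of one pixel-shuffle stage per factor of 2 and the 3 × 3 kernel sizes are assumptions, since the text specifies only the 64-channel convolution, the pixel shuffle, and the final 3-channel convolution.

```python
import torch.nn as nn

class ReconstructionBlock(nn.Module):
    """Reconstruction sketch: conv to 64 channels, pixel-shuffle upsampling,
    then a final conv to a 3-channel image. One shuffle stage per factor of 2
    (e.g., two stages for x4) is an assumed design choice."""
    def __init__(self, in_channels: int = 64, scale: int = 4):
        super().__init__()
        layers = [nn.Conv2d(in_channels, 64, kernel_size=3, padding=1)]
        num_stages = {2: 1, 4: 2}[scale]
        for _ in range(num_stages):
            layers += [nn.Conv2d(64, 64 * 4, kernel_size=3, padding=1), nn.PixelShuffle(2)]
        layers.append(nn.Conv2d(64, 3, kernel_size=3, padding=1))
        self.body = nn.Sequential(*layers)

    def forward(self, x):
        return self.body(x)  # I_RHQ = H_REC(F_0 + F_DF)
```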

Figure 4. High-resolution image reconstruction block.

The primary function of pixel shuffle is to rearrange a feature map with r² channels into a feature map whose width and height are enlarged by a factor of r, i.e., from size r²C × h × w to C × hr × wr (e.g., an original feature map of size 4 × 128 × 128 is rearranged into 1 × (128 × 2) × (128 × 2)), where r is the upsampling factor, that is, the magnification of the image.
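This shape change can be checked directly with PyTorch's nn.PixelShuffle, which performs the rearrangement described above:

```python
import torch
import torch.nn as nn

# The example from the text: a 4 x 128 x 128 feature map (r^2 = 4 channels)
# rearranged by pixel shuffle with r = 2 into a 1 x 256 x 256 map.
x = torch.randn(1, 4, 128, 128)           # (batch, C*r^2, H, W)
y = nn.PixelShuffle(upscale_factor=2)(x)
print(y.shape)                             # torch.Size([1, 1, 256, 256])
```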

4. Results

We evaluated the performance of the proposed model on an open-source dataset, measuring image quality with the peak signal-to-noise ratio (PSNR) metric. Compared with other advanced SR methods, the proposed model offers clear performance advantages.

4.1. Dataset

We used the largest publicly available medical CT image dataset, DeepLesion [36], for training and testing the model. This dataset not only includes the key CT slices containing the important lesions but also provides three-dimensional context (additional slices 30 mm above and below the key slices). The size of the dataset is 221 GB. Because of the huge amount of data, 11,500 high-quality CT images were randomly selected and divided into three parts: the majority were used for training (10,000), and the remainder for validation (1,000) and testing (500). Each training sample consists of an original image and its counterpart downsampled by bicubic interpolation using torchvision.transforms.resize() from the PyTorch library. The source HR medical CT image was reduced to an LR image to serve as the network input, and the original HR image was used as the label for training the DL neural network. To improve training accuracy, the training set was expanded through data augmentation to improve the generalization ability.
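A minimal sketch of how such an LR/HR training pair could be produced with torchvision's functional resize (used here as a stand-in for the resizing call mentioned above) is given below; the 192-pixel HR patch size is chosen only so that the ×4 LR patch matches the 48-pixel patch of Section 4.2 and is otherwise an assumption.

```python
import torch
from torchvision.transforms import InterpolationMode
from torchvision.transforms import functional as TF

def make_lr_hr_pair(hr: torch.Tensor, scale: int = 4):
    """Create an (LR, HR) training pair by bicubic downsampling, as described above.
    `hr` is a (C, H, W) tensor; H and W are assumed divisible by `scale`."""
    _, h, w = hr.shape
    lr = TF.resize(hr, [h // scale, w // scale], interpolation=InterpolationMode.BICUBIC)
    return lr, hr

# Example with a dummy 3 x 192 x 192 patch (48 x 48 LR patch at x4)
lr, hr = make_lr_hr_pair(torch.rand(3, 192, 192), scale=4)
print(lr.shape, hr.shape)  # torch.Size([3, 48, 48]) torch.Size([3, 192, 192])
```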

4.2. Implementation Details

The three-channel (RGB) pixels of the input image were downscaled from the original data to obtain the LR image, and the original data were used as label data and input into the network. Six ATBs were used. The sliding window size of each transformer layer was set to 8, and the patch size corresponding to the LR image resolution was 48.

The Adam optimizer was adopted, with its two improvements of gradient moving averages and bias correction. The learning rate decayed by a factor of 0.999 with each update, and the initial learning rate was 2 × 10−4. The pixel shuffle method was used for image upsampling.
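The training configuration described above could be set up as in the following sketch; the use of ExponentialLR to realize the 0.999 per-update decay is an assumption about how the schedule was implemented, and the model and loss are placeholders.

```python
import torch

def build_optimizer(model: torch.nn.Module):
    """Training setup sketch matching Section 4.2: Adam with an initial learning
    rate of 2e-4 and a multiplicative decay factor of 0.999 per update.
    The loss function and training loop are omitted."""
    optimizer = torch.optim.Adam(model.parameters(), lr=2e-4)
    scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.999)
    return optimizer, scheduler

# After each optimizer.step(), call scheduler.step() to apply the 0.999 decay.
```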

4.3. Evaluation Index

We evaluated the reconstructed SR images by using two methods: subjective evaluation and objective evaluation. Many factors influence subjective evaluation, and the reconstructed SR images are evaluated mainly based on human visual perception. In this study, the PSNR was measured as the objective evaluation metric to study the performance of high-resolution restoration networks for medical CT images. To demonstrate the superiority of the proposed model visually, we calculated the PSNR values of the SR images generated using the proposed method and other methods and compared them.

PSNR is an objective criterion for evaluating images. The calculation method is as follows:

\mathrm{MSE} = \frac{1}{mn}\sum_{i=0}^{m-1}\sum_{j=0}^{n-1}\left(J(i,j) - L(i,j)\right)^2. (8)

Then, PSNR can be obtained as follows:

\mathrm{PSNR} = 10\log_{10}\frac{\mathrm{MaxValue}^2}{\mathrm{MSE}} = 10\log_{10}\frac{\left(2^{\mathrm{bits}}-1\right)^2}{\mathrm{MSE}}, (9)

where J and L are the two images being compared, J(i, j) and L(i, j) are their pixel values at position (i, j), and the size of the image is m × n. The greater the PSNR, the better the medical CT image quality, and vice versa.
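For reference, Eqs. (8) and (9) translate into the following small Python function; the 8-bit maximum value of 255 is an assumption about the pixel range of the evaluated images, and this is a generic implementation rather than the authors' evaluation script.

```python
import numpy as np

def psnr(img_j: np.ndarray, img_l: np.ndarray, max_value: float = 255.0) -> float:
    """PSNR following Eqs. (8)-(9). Inputs are two m x n (or m x n x 3) arrays
    with pixel values in [0, max_value]; max_value = 2**bits - 1 (255 for 8-bit images)."""
    mse = np.mean((img_j.astype(np.float64) - img_l.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_value ** 2 / mse)

# Example: PSNR between a reference image and a noisy copy
ref = np.random.randint(0, 256, (64, 64), dtype=np.uint8)
noisy = np.clip(ref.astype(np.int16) + np.random.randint(-5, 6, ref.shape), 0, 255).astype(np.uint8)
print(round(psnr(ref, noisy), 2))
```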

4.4. Ablation Study

To better understand how STAN performs SR on medical CT images, a comprehensive ablation study on ATBs was performed to evaluate the role of the key parts of the proposed STAN model, as well as the network depth and the choice of attention mechanism.

As can be seen in Table 1, we studied the effect of removing and adding ATB modules on the performance of the medical CT image reconstruction network. To analyze the effect of the low-frequency feature extraction block and the ATBs on the performance of the STAN model, we conducted ablation experiments with different numbers of modules and recorded the corresponding PSNR under the ×4 scaling condition. The number of ATBs affects the PSNR: the more ATBs, the higher the PSNR.

Table 1. Ablation study on ATB design.

Number of ATBs | PSNR (×4, dB)
1  | 23.3
2  | 25.78
4  | 28.967
6  | 31.34
8  | 31.90
10 | 32.02

As shown in Figure 5, we studied the relationship between the number of ATBs and the PSNR performance on DeepLesion for image SR (×4). To obtain a relatively lightweight model, the number of ATBs was selected as 6, and the number of convolutional layers was 3 in the final test performance experiment.

Figure 5. Relationship between PSNR and the number of ATBs in STAN.

4.5. Analysis of Experimental Results

After network optimization, the performance comparison results in terms of PSNR at scale factors of ×2 and ×4 are presented in Table 2. We analyzed different algorithms on the DeepLesion testing set. Compared with the bicubic method, the PSNR of STAN improved by 9.58 and 13.36 dB at scale factors of ×2 and ×4, respectively. Compared with the earliest DL neural network method, the PSNR of STAN improved by 3.81 and 3.56 dB at scale factors of ×2 and ×4, respectively.

Table 2. PSNR (dB) values of the proposed STAN model and comparison methods on the DeepLesion test set.

Model | ×2 | ×4
Bicubic | 23.32 | 21.76
SRCNN [37] | 33.17 | 27.78
DRRN [38] | 34.56 | 29.65
MDSR [39] | 34.85 | 29.90
RDN [40] | 34.96 | 30.24
RCAN [41] | 34.99 | 30.35
LTE [42] | 36.22 | 31.11
STAN (ours) | 36.98 | 31.34

As can be seen in Figure 6, bicubic reconstruction of medical CT images yielded the worst visual effect and the lowest PSNR. The DL-based SR algorithms performed better than the interpolation-based algorithm. The proposed transformer-based STAN model outperformed the best competing CNN-based SR method by 0.76 and 0.23 dB at scale factors of ×2 and ×4, respectively.

Figure 6. Medical CT image testing using different algorithms on the DeepLesion dataset.

Thus, the proposed STAN model exhibited superior performance to the CNN-based SR method, demonstrating that the transformer network yields obvious performance advantages in medical CT imaging.

Medical CT images were reconstructed at multiple resolutions using the different algorithms; the results on the DeepLesion dataset are shown in Figure 6.

5. Conclusions

For SR of medical CT images, we proposed the STAN model, which uses the SA mechanism for feature extraction and solves the long-range dependency problem encountered in CNNs and RNNs, allowing it to capture more important feature information. In STAN, features are computed over nonoverlapping windows, and feature extraction is performed using the self-attention mechanism.

We experimentally demonstrated the SR effectiveness of the proposed STAN model on medical CT images, using the PSNR metric for performance comparison. The results reveal that the PSNR of the proposed STAN model is considerably better than that of CNN-based SR methods. The SA mechanism in STAN yields clearer reconstruction results, and the reconstruction of the low-frequency regions of medical CT images is better. However, medical imaging may suffer from image noise due to hardware equipment and the external environment. Our next step is therefore to denoise medical CT images during the SR process.

Algorithm 1. The implementation steps of the proposed STAN for medical super-resolution.

Acknowledgments

The authors acknowledge the Foundation for 2022 Basic and Applied Basic Research Project of Guangzhou Basic Research Plan (research on video compression algorithm based on dual neural network, Grant 202201011753), the 2020 University Industry University Research Innovation Fund New Generation Information Technology Innovation Project of China (key project, Grant 2020ITA03004), the 2021 characteristic innovation project of the Department of Education of Guangdong Province (2021KTSCX217), the Research on Classified and Accurate Training of Higher Vocational IT Talents based on Education Big Data Under the Background of Enrolment Expansion (Grant 2021GXJK714), and the Innovative Research Team in Universities of Guangdong Province of China (Grant 2021KCXTD079).

Data Availability

The medical CT image data used to support the findings of this study have been deposited in the DeepLesion repository (https://nihcc.app.box.com/v/DeepLesion). This is an open-source medical dataset that can be downloaded and used freely.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

References

  • 1.Brenner D. J., Elliston C. D., Hall E. J., Berdon W. E. Estimated risks of radiation-induced fatal cancer from pediatric CT. American Journal of Roentgenology . 2001;176(2):289–296. doi: 10.2214/ajr.176.2.1760289. [DOI] [PubMed] [Google Scholar]
  • 2.Bakator M., Radosav D. Deep learning and medical diagnosis: a review of literature. Multimodal Technologies and Interaction . 2018;2(3):p. 47. doi: 10.3390/mti2030047. [DOI] [Google Scholar]
  • 3.Long C., Xu H., Shen Q., et al. Diagnosis of the coronavirus disease (COVID-19): rRT-PCR or CT? European Journal of Radiology . 2020;126, article 108961 doi: 10.1016/j.ejrad.2020.108961. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 4.Rahaman M. M., Li C., Yao Y., et al. Identification of COVID-19 samples from chest X-ray images using deep learning: a comparison of transfer learning approaches. Journal of X-Ray Science and Technology . 2020;28(5):821–839. doi: 10.3233/XST-200715. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 5.Chen Z., Guo X., Woo P. Y., Yuan Y. Super-resolution enhanced medical image diagnosis with sample affinity interaction. IEEE Transactions on Medical Imaging . 2021;40(5):1377–1389. doi: 10.1109/TMI.2021.3055290. [DOI] [PubMed] [Google Scholar]
  • 6.Georgescu M. I., Ionescu R. T., Miron A. I., et al. Multimodal multi-head convolutional attention with various kernel sizes for medical image super-resolution. 2022. https://arxiv.org/abs/2204.04218.
  • 7.Kim J., Lee J. K., Lee K. M. Accurate image super-resolution using very deep convolutional networks. Proceedings of the IEEE conference on computer vision and pattern recognition . 2016:1646–1654. [Google Scholar]
  • 8.Passarella L. S., Mahajan S., Pal A., Norman M. R. Reconstructing high resolution ESM data through a novel fast super resolution convolutional neural network (FSRCNN). Geophysical Research Letters . 2022;49(4) doi: 10.1029/2021GL097571. [DOI] [Google Scholar]
  • 9.Wang X., Yu K., Wu S., et al. Enhanced deep residual networks for single image super-resolution; Proceedings of the European conference on computer vision (ECCV) workshops ; 2018. [Google Scholar]
  • 10.Lodha I., Kolur L., Krishnan K., Dheenadayalan K., Sitaram D., Nandi S. Cost-optimized video transfer using real-time super resolution convolutional neural networks. 5th Joint International Conference on Data Science & Management of Data (9th ACM IKDD CODS and 27th COMAD); 2022; Bangalore, India. [DOI] [Google Scholar]
  • 11.Shan L., Bai X., Liu C., Feng Y., Liu Y., Qi Y. Super-resolution reconstruction of digital rock CT images based on residual attention mechanism. Advances in Geo-Energy Research . 2022;6(2):157–168. doi: 10.46690/ager.2022.02.07. [DOI] [Google Scholar]
  • 12.Dong C., Loy C. C., Tang X. Accelerating the super-resolution convolutional neural network. Proceedings of the European Conference on Computer Vision (ECCV 2016); 2016; Amsterdam, The Netherlands. pp. 391–407. [DOI] [Google Scholar]
  • 13.Taş M., Yılmaz B. Super resolution convolutional neural network based pre-processing for automatic polyp detection in colonoscopy images. Computers & Electrical Engineering . 2021;90(12, article 106959) doi: 10.1016/j.compeleceng.2020.106959. [DOI] [Google Scholar]
  • 14.Deshpande A., Estrela V. V., Patavardhan P. The DCT-CNN-ResNet50 architecture to classify brain tumors with super-resolution, convolutional neural network, and the ResNet50. Neuroscience Informatics . 2021;1(4, article 100013) doi: 10.1016/j.neuri.2021.100013. [DOI] [Google Scholar]
  • 15.Yaroshenko M. O., Varfolomieiev A. Y., Yaganov P. O. Hierarchical convolutional neural network for infrared image super-resolution. Microsystems Electronics and Acoustics . 2021;26(1) doi: 10.20535/2523-4455.mea.230603. [DOI] [Google Scholar]
  • 16.Qiu D., Zheng L., Zhu J., Huang D. Multiple improved residual networks for medical image super-resolution. Future Generation Computer Systems . 2021;116:200–208. doi: 10.1016/j.future.2020.11.001. [DOI] [Google Scholar]
  • 17.Chang Q., Jia X., Lu C., Ye J. Multi-Attention Residual Network for Image Super Resolution. International Journal of Pattern Recognition and Artificial Intelligence . 2022;36(8, article 2254009) doi: 10.1142/S021800142254009X. [DOI] [Google Scholar]
  • 18.Brown K. G., Waggener S. C., Redfern A. D., Hoyt K. Faster super-resolution ultrasound imaging with a deep learning model for tissue decluttering and contrast agent localization. Biomedical Physics & Engineering Express . 2021;7(6, article 065035) doi: 10.1088/2057-1976/ac2f71. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 19.Wang X., Yu K., Wu S., et al. Esrgan: enhanced super-resolution generative adversarial networks. European Conference on Computer Vision Workshops; 2018; Munich, Germany. pp. 701–710. [Google Scholar]
  • 20.Zhang J., Zou X., Kuang L. D., Wang J., Sherratt R. S., Yu X. CCTSDB 2021: a more comprehensive traffic sign detection benchmark. Human-centric Computing and Information Sciences . 2022;12 doi: 10.22967/HCIS.2022.12.023. [DOI] [Google Scholar]
  • 21.Chen Y., Liu L., Phonevilay V., et al. Image super-resolution reconstruction based on feature map attention mechanism. Applied Intelligence . 2021;51(7):4367–4380. doi: 10.1007/s10489-020-02116-1. [DOI] [Google Scholar]
  • 22.Zhang J., Li C., Kosov S., et al. LCU-Net: a novel low-cost U-Net for environmental microorganism image segmentation. Pattern Recognition . 2021;115:p. 107885. doi: 10.1016/j.patcog.2021.107885. [DOI] [Google Scholar]
  • 23.Pan J., Zhang H., Wu W., Gao Z., Wu W. Multi-domain integrative Swin transformer network for sparse-view tomographic reconstruction. Patterns . 2022;3(6, article 100498) doi: 10.1016/j.patter.2022.100498. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 24.Zhou B., Schlemper J., Dey N., et al. DSFormer: a dual-domain self-supervised transformer for accelerated multi-contrast MRI reconstruction. 2022. https://arxiv.org/abs/2201.10776. [DOI] [PubMed]
  • 25.Shamshad F., Khan S., Zamir S. W., et al. Transformers in medical imaging: a survey. 2022. https://arxiv.org/abs/2201.09873 . [DOI] [PubMed]
  • 26.Zou Y., Xiao F., Zhang L., Chen Q., Wang B., Hu Y. Image super-resolution using convolutional neural network with symmetric skip connections. Fourth International Conference on Photonics and Optical Engineering; 2021 Jan 15; Xi’an, China. pp. 283–288. [Google Scholar]
  • 27.Wang Y., Zeng S., Wang X., Wang J. Super-resolution multi-focus image fusion based on convolutional neural network. Journal of Physics: Conference Series . 2021;1885(2, article 022011) doi: 10.1088/1742-6596/1885/2/022011. [DOI] [Google Scholar]
  • 28.Liu Z., Lin Y., Cao Y., et al. Swin transformer: hierarchical vision transformer using shifted windows. 2021. https://arxiv.org/abs/2103.14030 .
  • 29.Chen H., Li C., Li X., et al. Gashis-transformer: a multi-scale visual transformer approach for gastric histopathology image classification. 2021. https://arxiv.org/abs/2104.14528 .
  • 30.Kumar R., Khan A. A., Kumar J., et al. Blockchain-federated-learning and deep learning models for covid-19 detection using ct imaging. IEEE Sensors Journal . 2021;21(14):16301–16314. doi: 10.1109/JSEN.2021.3076767. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 31.He K., Liu X., Li M., Li X., Yang H., Zhang H. Noninvasive KRAS mutation estimation in colorectal cancer using a deep learning method based on CT imaging. BMC medical imaging . 2020;20(1):p. 59. doi: 10.1186/s12880-020-00457-4. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 32.Wu W., Hu D., Niu C., et al. Deep learning based spectral CT imaging. Neural Networks . 2021;144:342–358. doi: 10.1016/j.neunet.2021.08.026. [DOI] [PubMed] [Google Scholar]
  • 33.Gupta R., Sharma A., Kumar A. Super-resolution using GANs for medical imaging. Procedia Computer Science . 2020;173:28–35. doi: 10.1016/j.procs.2020.06.005. [DOI] [Google Scholar]
  • 34.Zhu J., Tan C., Yang J., Yang G., Lio P. Arbitrary scale super-resolution for medical images. International Journal of Neural Systems . 2021;31(10, article 2150037) doi: 10.1142/S0129065721500374. [DOI] [PubMed] [Google Scholar]
  • 35.Zhang J., Gong L. R., Yu K., Qi X., Wen Z., Hua Q. 3D reconstruction for super-resolution CT images in the Internet of health things using deep learning. IEEE Access . 2020;8:121513–121525. doi: 10.1109/ACCESS.2020.3007024. [DOI] [Google Scholar]
  • 36.Ke Y., Wang X., Le L., Summers R. M. DeepLesion: automated mining of large-scale lesion annotations and universal lesion detection with deep learning. Journal of medical imaging . 2018;5(3, article 036501) doi: 10.1117/1.JMI.5.3.036501. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 37.Dong C., Loy C. C., He K., Tang X. Image super-resolution using deep convolutional networks. IEEE Transactions on Pattern Analysis and Machine Intelligence . 2016;38(2):295–307. doi: 10.1109/TPAMI.2015.2439281. [DOI] [PubMed] [Google Scholar]
  • 38.Tai Y., Yang J., Liu X. Image super-resolution via deep recursive residual network. Proceedings of the IEEE conference on computer vision and pattern recognition.; 2017; Honolulu, Hawaii. pp. 3147–3155. [Google Scholar]
  • 39.Lim B., Son S., Kim H., Nah S., Mu Lee K. Enhanced deep residual networks for single image super-resolution. Proceedings of the IEEE conference on computer vision and pattern recognition workshops; 2017; Honolulu, Hawaii. pp. 136–144. [Google Scholar]
  • 40.Zhang Y., Tian Y., Kong Y., Zhong B., Fu Y. Residual dense network for image super-resolution. Proceedings of the IEEE conference on computer vision and pattern recognition; 2018; Salt Lake City, Utah. pp. 2472–2481. [Google Scholar]
  • 41.Zhang Y., Li K., Li K., Wang L., Zhong B., Fu Y. Image super-resolution using very deep residual channel attention networks. Proceedings of the European conference on computer vision (ECCV); 2018; Munich, Germany. pp. 286–301. [Google Scholar]
  • 42.Lee J., Jin K. H. Local texture estimator for implicit representation function. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition; 2022; Ernest N. Moral Convention Center, New Orleans. pp. 1929–1938. [Google Scholar]


