Author manuscript; available in PMC: 2025 Aug 28.
Published in final edited form as: Proc IEEE Int Symp Biomed Imaging. 2025 May 12;2025:10.1109/isbi60581.2025.10981104. doi: 10.1109/isbi60581.2025.10981104

TPOT: TOPOLOGY PRESERVING OPTIMAL TRANSPORT IN RETINAL FUNDUS IMAGE ENHANCEMENT

Xuanzhao Dong 1, Wenhui Zhu 1, Xin Li 1, Guoxin Sun 1, Yi Su 2, Oana M Dumitrascu 3, Yalin Wang 1
PMCID: PMC12380521  NIHMSID: NIHMS2104092  PMID: 40881625

Abstract

Retinal fundus photography enhancement is important for diagnosing and monitoring retinal diseases. However, early approaches to retinal image enhancement, such as those based on Generative Adversarial Networks (GANs), often struggle to preserve the complex topological information of blood vessels, resulting in spurious or missing vessel structures. The persistence diagram, which captures topological features based on the persistence of topological structures under different filtrations, provides a promising way to represent the structure information. In this work, we propose a topology-preserving training paradigm that regularizes blood vessel structures by minimizing the differences of persistence diagrams. We call the resulting framework Topology Preserving Optimal Transport (TPOT). Experimental results on a large-scale dataset demonstrate the superiority of the proposed method compared to several state-of-the-art supervised and unsupervised techniques, both in terms of image quality and performance in the downstream blood vessel segmentation task. The code is available at https://github.com/Retinal-Research/TPOT.

Keywords: Retinal Fundus Photography, Image Denoising, GANs, Topology Preserving

1. INTRODUCTION

Retinal color fundus photography (CFP) plays a critical role in diagnosing ocular diseases [4, 5]. However, retinal fundus imaging equipment, particularly nonmydriatic cameras, faces challenges such as artifacts and blurring caused by uncontrollable factors (e.g., variability in ambient light conditions, capture errors, or operator mistakes). These issues often result in degraded retinal images. Some types of degradation, such as contrast mismatch, inconsistent illumination, and vessel occlusions, can impair the accurate visualization of lesions and blood vessels, which are essential for diagnosing retinal diseases such as diabetic retinopathy. Therefore, developing a robust framework to enhance low-quality images is vital for improving diagnostic accuracy [1, 2].

Since it is both difficult and costly to collect paired degraded and clean retinal images, research on retinal fundus image quality enhancement has shifted from supervised algorithms [6] to unsupervised algorithms, leading to remarkable improvements in image quality. These unsupervised approaches typically model retinal quality enhancement as an end-to-end image-to-image (I2I) translation task using Generative Adversarial Networks (GANs), where low-quality images X are transformed into their high-quality counterparts Y. Specifically, CycleGAN [7] serves as a common strategy, as its cycle consistency regularization enables learning meaningful bidirectional mappings and improves generation alignment, albeit with additional computational cost and potential artifacts.
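The cycle-consistency regularization mentioned above can be sketched as a small loss function. In this illustrative sketch, G and F are placeholder callables standing in for CycleGAN's two generators (X→Y and Y→X), not trained networks:

```python
import numpy as np

def cycle_consistency_loss(G, F, x, y):
    """L1 cycle-consistency penalty in the style of CycleGAN [7]:
    mapping an image to the other domain and back should reproduce it.
    G: X -> Y and F: Y -> X are placeholder callables, not trained nets."""
    return float(np.abs(F(G(x)) - x).mean() + np.abs(G(F(y)) - y).mean())
```

When F exactly inverts G the penalty vanishes; during training it is minimized jointly with the adversarial losses, which is what encourages meaningful bidirectional mappings at extra computational cost.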

To address these limitations, Wang et al. [8] applied optimal transport (OT) theory to learn optimal mappings between two image domains, and Zhu et al. [1, 2] introduced the Structural Similarity Index (SSIM) to regularize the generator and prevent excessive distortion of lesion structures. However, these methods often prove suboptimal, e.g., suffering from mode collapse, when handling images with complex structures or multimodal distributions. To mitigate this, Dong et al. [9] proposed an unpaired neural Schrödinger bridge method to preserve a smooth and probabilistically consistent transformation between distributions. However, its iterative learning process can smooth out high-frequency information, producing suboptimal structure preservation.

Topology preservation has become a key focus in segmentation tasks. Specifically, Hu et al. [10] extended beyond simple geometric structures by constructing a differentiable topology loss based on persistence diagrams and incorporating it into the segmentation training process. However, existing fundus image enhancement methods primarily focus on maintaining semantic consistency in the feature space [11], without introducing topology preservation or leveraging the natural structural prior (e.g., vessel topology).

In this paper, we propose a novel topology-preserving retinal fundus image enhancement framework, named Topology Preserving Optimal Transport (TPOT), which leverages Optimal Transport (OT) theory to learn the optimal mappings between low-quality and high-quality images while incorporating topology regularization to preserve the complex blood vessel structures. Our main contributions are threefold: (i) To the best of our knowledge, this is the first work to introduce topology preservation into OT-GAN-based retinal enhancement methods. (ii) We extend the application of topology loss beyond segmentation tasks, demonstrating its effectiveness in preserving topological structures during image generation. (iii) We conduct experiments on public retinal fundus datasets, showing our method outperforms various supervised and unsupervised approaches in denoising and downstream vessel segmentation.

2. TOPOLOGY PRESERVING OPTIMAL TRANSPORT

Our proposed method contains two main modules: optimal-transport-guided domain transformation and topology-preserving regularization for retinal blood vessels. The whole framework is shown in Fig. 1.

Fig. 1.

(A) illustrates the TPOT framework. The structures of Gθ and Dβ follow the design outlined in [1, 2], and SN follows the design presented in [3]. (B) represents the changes in the segmentation masks of the images during training. The orange box highlights regions with complex topological structures. (C) provides an example of how our topology-preserving regularization operates. The yellow crosses and blue dots represent corresponding persistent features in SN(x) and SN(Gθ(x)), respectively. The green dots indicate persistent noise features that must be removed during training. The model penalizes the differences between corresponding points to encourage topological consistency. Further details are discussed in Sec. 2.

Quality enhancement guided by optimal transport.

Let X and Y represent the domains of low-quality and high-quality images, with corresponding probability measures μ and ν. The retinal fundus image enhancement task can be naturally framed as solving Monge’s optimal transport problem, which is formulated as follows:

f^* = \inf_{f} \int_{X} C(x, f(x)) \, d\mu(x) \quad \text{subject to} \quad f_{\#}\mu = \nu \qquad (1)

where C(·, ·): X × Y → ℝ+ represents the cost function, f denotes the candidate mapping function, and the constraint ensures alignment between the two probability measures. Inspired by the works in [1, 2, 8], Eq. 1 can be modeled using an adversarial training strategy. Specifically, the generator Gθ, parameterized by θ, models the optimal mapping f*. An identity regularization C(y, Gθ(y)) ensures that the generator does not introduce unnecessary changes. The discriminator Dβ estimates the Wasserstein-1 distance between PY and PGθ(X), fulfilling the pushforward condition in Eq. 1. This leads to the following objective function:

\max_{G_\theta} \min_{D_\beta} \; \mathbb{E}_{x \sim P_X} C(x, G_\theta(x)) + \mathbb{E}_{y \sim P_Y} C(y, G_\theta(y)) + \lambda \, W_1(P_Y, P_{G_\theta(X)}) \qquad (2)

Here, the cost functions C(x, Gθ(x)) and C(y, Gθ(y)) are calculated based on the locally quasi-convex Multi-Scale Structural Similarity Index Measure [12, 13]. To enforce 1-Lipschitz continuity of the discriminator, a gradient penalty is applied.
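The SSIM-based cost can be illustrated with a simplified, single-scale global SSIM. This numpy sketch is a hedged stand-in: the paper uses a locally quasi-convex multi-scale variant [12, 13], not this exact formula:

```python
import numpy as np

def ssim_global(x, y, data_range=1.0, k1=0.01, k2=0.03):
    """Single-scale, global SSIM between two images -- a simplified
    stand-in for the locally quasi-convex multi-scale SSIM cost."""
    c1, c2 = (k1 * data_range) ** 2, (k2 * data_range) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()           # cross-covariance
    num = (2 * mx * my + c1) * (2 * cov + c2)
    den = (mx ** 2 + my ** 2 + c1) * (vx + vy + c2)
    return num / den
```

A dissimilarity cost can then be taken as 1 − SSIM, so that identical images incur zero cost; the actual cost in the paper is the multi-scale, locally computed variant.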

Topology-preserving regularization.

Persistent homology provides a continuous representation of topological changes as the threshold α varies. Specifically, let SN denote the pre-trained segmentation network, and let SN(x) and SN(Gθ(x)) denote the output likelihood maps of the low-quality input and its synthetic high-quality counterpart, respectively (e.g., SN(x)_0.5 = {p | SN(x)(p) > 0.5} is the binary segmentation mask of x). As α gradually decreases, an increasing number of pixels in SN(Gθ(x))_α are classified as foreground (e.g., blood vessels), leading to corresponding changes in the topological structures. For instance, as α decreases, vessels that form loops may lose their structure as more neighboring pixels are labeled as foreground vessels. The persistence diagrams, which summarize these topological changes, are denoted Dgm(SN(x)) and Dgm(SN(Gθ(x))). Each diagram consists of persistent points, denoted P = {(b1, d1)} and PG = {(b2, d2)}, where b1, b2 are the birth thresholds and d1, d2 the death thresholds of the topological features. Following the idea outlined in [10], the topology regularization C_topo(x, Gθ(x)) is defined as:

C_{\text{topo}}(x, G_\theta(x)) = \min_{\alpha} \sum_{p_G \in \mathrm{Dgm}(SN(G_\theta(x)))} \| p_G - \alpha(p_G) \|^2 = \sum_{(b_2, d_2) \in \mathrm{Dgm}(SN(G_\theta(x)))} [b_2 - \alpha^*(b_2)]^2 + [d_2 - \alpha^*(d_2)]^2 \qquad (3)

where (α*(b2), α*(d2)) ∈ P, and α* represents the optimal correspondence of points between P and PG. This correspondence is calculated based on the persistence (e.g., |b1 − d1|) of the critical points in P and PG. We compute the topology regularization on image patches; the final topology regularization is expressed as:

C_{\text{topo}}(x, G_\theta(x)) = \sum_{i} C_{\text{topo}}^{i}(x_i, G_\theta(x)_i) \qquad (4)

where i indexes the image patches. As illustrated in Fig. 1, by aligning the persistent points in PG with their corresponding points in P, and by removing noisy points that lack matches, we ensure that the enhancement process preserves the structural integrity of blood vessels. Consequently, as training progresses, the complex topological structure of the blood vessels (highlighted in the orange box) becomes increasingly consistent between x and Gθ(x). Additionally, since Eq. 3 is computed as the mean squared error between critical points in P and PG, this regularization is convex and has a well-defined gradient [10]. Thus, the final training objective is given by:

\max_{G_\theta} \min_{D_\beta} \; \lambda_{\text{ssim}} \mathbb{E}_{x \sim P_X} C(x, G_\theta(x)) + \lambda_{\text{idt}} \mathbb{E}_{y \sim P_Y} C(y, G_\theta(y)) + \lambda_{\text{topo}} \mathbb{E}_{x \sim P_X} C_{\text{topo}}(x, G_\theta(x)) \cdot \mathbb{I}_{t_i} + W_1(P_Y, P_{G_\theta(X)}) \qquad (5)

where I_{t_i} is the indicator function, signifying that Gθ incorporates the topology regularization only after t_i epochs.
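The regularization has two ingredients: extracting a persistence diagram from a likelihood map via a superlevel-set filtration, and matching points between two diagrams to form the squared-distance loss of Eq. 3. Both can be sketched in pure Python. The 1D profile, the union-find construction, and the greedy persistence-rank matching below are simplifications for illustration only; the paper operates on 2D segmentation masks and uses the optimal correspondence of [10]:

```python
def superlevel_persistence_1d(values):
    """0-dimensional persistence of a 1D likelihood profile as the
    threshold alpha sweeps downward: components are born at local maxima
    and die when two components merge (elder rule)."""
    n = len(values)
    parent, birth, added, pairs = list(range(n)), [None] * n, [False] * n, []

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path compression
            i = parent[i]
        return i

    # sweep alpha downward: pixels enter in order of decreasing likelihood
    for i in sorted(range(n), key=lambda k: -values[k]):
        added[i], birth[i] = True, values[i]
        for j in (i - 1, i + 1):
            if 0 <= j < n and added[j]:
                ri, rj = find(i), find(j)
                if ri == rj:
                    continue
                # elder rule: the younger (later-born) component dies here
                elder, younger = (ri, rj) if birth[ri] > birth[rj] else (rj, ri)
                if birth[younger] > values[i]:     # drop zero-persistence pairs
                    pairs.append((birth[younger], values[i]))
                parent[younger] = elder
    pairs.append((birth[find(0)], min(values)))    # essential component
    return pairs


def topo_loss(P, PG):
    """Squared-distance loss between diagrams (cf. Eq. 3): match points by
    persistence rank; unmatched noise points in PG are pulled toward the
    diagonal, mimicking the noise removal shown in Fig. 1(C)."""
    rank = lambda pts: sorted(pts, key=lambda p: -abs(p[0] - p[1]))
    P, PG = rank(P), rank(PG)
    loss = 0.0
    for k, (b2, d2) in enumerate(PG):
        if k < len(P):                      # matched point: pull toward it
            b1, d1 = P[k]
            loss += (b2 - b1) ** 2 + (d2 - d1) ** 2
        else:                               # unmatched noise: push to diagonal
            m = 0.5 * (b2 + d2)
            loss += (b2 - m) ** 2 + (d2 - m) ** 2
    return loss
```

For example, the profile [0.9, 0.2, 0.8, 0.1, 0.7] yields the (birth, death) pairs (0.8, 0.2), (0.7, 0.1), and (0.9, 0.1); comparing such diagrams with topo_loss penalizes topological features of the vessel map that drift during enhancement.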

3. EXPERIMENTS AND RESULTS

3.1. Experimental Details

Baselines

We compare our method against the following baselines. Supervised algorithm: cofe-Net [6]. Unsupervised algorithms: CycleGAN [7], OTT-GAN [8], OTRE-GAN [1], WGAN [14], Context-aware OT [11], and CUNSB-RFIE [9]. For the quality enhancement task, we use the Structural Similarity Index Measure (SSIM) and Peak Signal-to-Noise Ratio (PSNR) as evaluation metrics. For the downstream vessel segmentation task, we assess performance using the Area under the ROC Curve (AUC), Area under the Precision-Recall Curve (PR), F1-score, and Specificity (SP).
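As a concrete example of the quality metrics, PSNR follows directly from the mean squared error. A minimal sketch, assuming images are given as flat lists of intensities in [0, 1]:

```python
import math

def psnr(x, y, data_range=1.0):
    """Peak Signal-to-Noise Ratio between two images given as flat lists
    of pixel intensities; higher means closer to the reference."""
    mse = sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)
    if mse == 0:
        return float("inf")                 # identical images
    return 10.0 * math.log10(data_range ** 2 / mse)
```

For instance, a uniform error of 0.1 on a unit-range image gives a PSNR of 20 dB, since the MSE is 0.01.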

Dataset.

For the quality enhancement task, we used the algorithms outlined in [6] to generate degraded images combining spot artifacts, illumination changes, and blurring. All models were trained on 5,000 EyeQ images [15] and tested on 6,217 EyeQ images. Training used the supervised or unsupervised mode, depending on the specific method. The vessel segmentation task was conducted on the enhanced images, training from scratch on the DRIVE [16] dataset and adhering to the official training/testing split. Before any other operations, all images were cropped and resized to 256 × 256 pixels, except for the method of [6].

TPOT settings.

The generator and discriminator used in Eq. 5 follow the design outlined in [1, 2], and the pre-trained segmentation network is adopted from [3]. We keep the same parameter settings as [6, 2, 11, 9] for all baselines. For TPOT, we trained the model for 200 epochs using the RMSprop optimizer, with t_i = 100. The learning rate in each phase started at 2 × 10^-4 and decayed by a factor of 10 every 50 epochs. Additional parameters were set as follows: λ_ssim = 30, λ_idt = 15, λ_topo = 1, and the patch size in Eq. 4 was set to 65.
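The training schedule above (step-decayed learning rate, plus the t_i-gated topology weight from Eq. 5) can be sketched with two small helpers. The constants mirror the stated settings, but the helpers themselves are illustrative, not the authors' code:

```python
def lr_at(epoch, base=2e-4, decay=10, step=50):
    """Learning-rate schedule from Sec. 3.1: start at 2e-4 and divide
    by 10 after every 50 epochs."""
    return base / (decay ** (epoch // step))

def topo_weight(epoch, lam_topo=1.0, t_i=100):
    """Indicator-gated topology weight from Eq. 5: the topology term is
    switched on only once training has passed t_i epochs."""
    return lam_topo if epoch >= t_i else 0.0
```

Gating the topology term lets the generator first learn a reasonable enhancement mapping before the persistence-based regularization begins shaping vessel structure.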

3.2. Experimental Results

Quality enhancement task.

The metric scores are shown in Tab. 1, where our method achieves the best performance in both SSIM and PSNR, highlighting its superiority in preserving perceptual quality and pixel-wise alignment. Among the other models, as shown in Fig. 2(A), the Context-aware OT method shows noticeable semantic deficiencies at the image boundaries, resulting in the disappearance of blood vessels. In contrast, CUNSB-RFIE maintains better consistency in image space and effectively handles spot artifacts, though it slightly alters the background. Other GAN-based models (e.g., CycleGAN, OTRE-GAN, OTT-GAN) struggle to preserve peripheral blood vessels with complex topological structures (e.g., loops), resulting in over-tampering of vessel structures. Our method outperforms these approaches by preserving both overall image quality and vessel structures.

Table 1.

Performance comparison with baselines.

(SSIM and PSNR are measured on the EyeQ quality enhancement task; AUC, PR, F1-score, and SP on the downstream vessel segmentation task.)

| Method | Type | SSIM ↑ | PSNR ↑ | AUC ↑ | PR ↑ | F1-score ↑ | SP ↑ |
| --- | --- | --- | --- | --- | --- | --- | --- |
| cofe-Net [6] | Supervised | 0.9408 | 24.907 | 0.9188 | 0.7698 | 0.6890 | 0.9801 |
| Context-aware OT [11] | Unsupervised | 0.8497 | 21.088 | 0.8650 | 0.6547 | 0.5887 | 0.9765 |
| CycleGAN [7] | Unsupervised | 0.9313 | 25.076 | 0.9015 | 0.7277 | 0.6462 | 0.9801 |
| OTRE-GAN [1] | Unsupervised | 0.9392 | 24.812 | 0.9156 | 0.7678 | 0.6918 | 0.9797 |
| OTT-GAN [8] | Unsupervised | 0.9275 | 24.065 | 0.9034 | 0.7400 | 0.6608 | 0.9812 |
| WGAN [14] | Unsupervised | 0.9266 | 24.793 | 0.9081 | 0.7494 | 0.6768 | 0.9764 |
| CUNSB-RFIE [9] | Unsupervised | 0.9121 | 24.242 | 0.9163 | 0.7625 | 0.6872 | 0.9784 |
| Ours | Unsupervised | 0.9417 | 25.196 | 0.9191 | 0.7748 | 0.6926 | 0.9816 |

The best performance within each column is highlighted in bold.

Fig. 2.

(A) illustrates the result of quality enhancement task on the EyeQ dataset. (B) shows the enhanced DRIVE image along with the corresponding vessel segmentation. The orange box highlights a patch with a complex topological structure, where our method demonstrates greater consistency.

Vessel segmentation task.

As shown in Tab. 1, our method achieves the best performance across all four evaluation metrics. Specifically, Fig. 2(B) presents the vessel segmentation illustration on the DRIVE dataset. Due to vessel deficiencies at the image level, the Context-OT model produces noisy boundary predictions. Other GAN-based methods (e.g., OTRE-GAN and CycleGAN) improve overall image quality but struggle to preserve peripheral vessel structures. In the highlighted segmentation patch, CUNSB demonstrates a better ability to predict small vessel branches, while other methods tend to classify them as background. Our method preserves the complex topology structure, leading to continuous and consistent vessel morphology.

3.3. Ablation Study

Our method leverages topology-preserving regularization to maintain the blood vessel structure during enhancement. To evaluate the effect of this additional regularization on the final generation performance, we conducted ablation studies on the EyeQ dataset, with the results reported in Tab. 2. The absence of topology-preserving regularization, as represented by OTRE-GAN, resulted in relatively poorer performance. As we incrementally increased the weight λtopo in Eq. 5, we observed a clear performance improvement, even with a small weight. This demonstrates that topology-preserving regularization contributes significantly to the overall enhancement quality.

Table 2.

Ablation study over Topology-preserving regularization.

| λtopo | SSIM | PSNR |
| --- | --- | --- |
| 0 (OTRE-GAN) | 0.9392 | 24.812 |
| 0.1 | 0.9409 | 24.963 |
| 1 | 0.9417 | 25.196 |
| 10 | 0.9413 | 25.126 |

4. CONCLUSION

In this study, we propose a Topology Preserving Optimal Transport (TPOT) pipeline for enhancing retinal images. We incorporate topology regularization to preserve blood vessel structures, promoting vessel alignment during enhancement. We demonstrate the effectiveness of TPOT in denoising and in maintaining the topological integrity of blood vessels on the EyeQ and DRIVE datasets, respectively. Although we relied on synthetically degraded images and faced domain-shift challenges when obtaining segmentation outputs, our findings underscore the importance of preserving topological structures in medical image generation. Future research will further explore the potential of topology regularization in other medical image generation applications.

ACKNOWLEDGMENTS

This work was supported by grants from the National Institutes of Health (RF1AG073424, R01EY032125, and R01DE030286) and the State of Arizona via the Arizona Alzheimer Consortium.

Footnotes

COMPLIANCE WITH ETHICAL STANDARDS

This research study was conducted retrospectively using human subject data available in open access by [15]. Ethical approval was not required as confirmed by the license attached with the open-access data.

REFERENCES

  • [1]. Zhu Wenhui, Qiu Peijie, Dumitrascu Oana M, Sobczak Jacob M, Farazi Mohammad, Yang Zhangsihao, Nandakumar Keshav, and Wang Yalin, "OTRE: Where optimal transport guided unpaired image-to-image translation meets regularization by enhancing," in International Conference on Information Processing in Medical Imaging. Springer, 2023, pp. 415–427.
  • [2]. Zhu Wenhui, Qiu Peijie, Farazi Mohammad, Nandakumar Keshav, Dumitrascu Oana M, and Wang Yalin, "Optimal transport guided unsupervised learning for enhancing low-quality retinal images," Proc IEEE Int Symp Biomed Imaging, 2023.
  • [3]. Zhou Yuqian, Yu Hanchao, and Shi Humphrey, "Study group learning: Improving retinal vessel segmentation trained with noisy labels," in Medical Image Computing and Computer Assisted Intervention – MICCAI 2021: 24th International Conference, Strasbourg, France, September 27–October 1, 2021, Proceedings, Part I. Springer, 2021, pp. 57–67.
  • [4]. Zhu Wenhui, Qiu Peijie, Chen Xiwen, Li Xin, Lepore Natasha, Dumitrascu Oana M, and Wang Yalin, "nnMobileNet: Rethinking CNN for retinopathy research," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, June 2024, pp. 2285–2294.
  • [5]. Dumitrascu Oana M, Li Xin, Zhu Wenhui, Woodruff Bryan K, Nikolova Simona, Sobczak Jacob, Youssef Amal, Saxena Siddhant, Andreev Janine, Caselli Richard J, Chen John J, and Wang Yalin, "Color fundus photography and deep learning applications in Alzheimer's disease," Mayo Clinic Proceedings: Digital Health, 2024.
  • [6]. Shen Ziyi, Fu Huazhu, Shen Jianbing, and Shao Ling, "Modeling and enhancing low-quality retinal fundus images," IEEE Transactions on Medical Imaging, vol. 40, no. 3, pp. 996–1006, 2020.
  • [7]. Zhu Jun-Yan, Park Taesung, Isola Phillip, and Efros Alexei A, "Unpaired image-to-image translation using cycle-consistent adversarial networks," in Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 2223–2232.
  • [8]. Wang Wei, Wen Fei, Yan Zeyu, and Liu Peilin, "Optimal transport for unsupervised denoising learning," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 45, no. 2, pp. 2104–2118, 2022.
  • [9]. Dong Xuanzhao, Vasa Vamsi Krishna, Zhu Wenhui, Qiu Peijie, Chen Xiwen, Su Yi, Xiong Yujian, Yang Zhangsihao, Chen Yanxi, and Wang Yalin, "CUNSB-RFIE: Context-aware unpaired neural Schrödinger bridge in retinal fundus image enhancement," arXiv preprint arXiv:2409.10966, 2024.
  • [10]. Hu Xiaoling, Li Fuxin, Samaras Dimitris, and Chen Chao, "Topology-preserving deep image segmentation," Advances in Neural Information Processing Systems, vol. 32, 2019.
  • [11]. Vasa Vamsi Krishna, Qiu Peijie, Zhu Wenhui, Xiong Yujian, Dumitrascu Oana, and Wang Yalin, "Context-aware optimal transport learning for retinal fundus image enhancement," arXiv preprint arXiv:2409.07862, 2024.
  • [12]. Wang Zhou, Simoncelli Eero P, and Bovik Alan C, "Multiscale structural similarity for image quality assessment," in The Thirty-Seventh Asilomar Conference on Signals, Systems & Computers. IEEE, 2003, vol. 2, pp. 1398–1402.
  • [13]. Brunet Dominique, Vrscay Edward R, and Wang Zhou, "On the mathematical properties of the structural similarity index," IEEE Transactions on Image Processing, vol. 21, no. 4, pp. 1488–1499, 2011.
  • [14]. Arjovsky Martin, Chintala Soumith, and Bottou Léon, "Wasserstein generative adversarial networks," in International Conference on Machine Learning. PMLR, 2017, pp. 214–223.
  • [15]. Fu Huazhu, Wang Boyang, Shen Jianbing, et al., "Evaluation of retinal image quality assessment networks in different color-spaces," MICCAI, pp. 48–56, 2019.
  • [16]. Staal J, Abramoff MD, Niemeijer M, Viergever MA, and van Ginneken B, "Ridge-based vessel segmentation in color images of the retina," IEEE Transactions on Medical Imaging, vol. 23, no. 4, pp. 501–509, 2004.
