Fundus image enhancement algorithm based on a convolutional dictionary diffusion model

Wang Zhen1,2, Huo Guanglei3, Lan Hai2, Hu Jianmin4, Wei Xian5 (1. College of Mechanical and Electrical Engineering, Fujian Agriculture and Forestry University, Fuzhou 350108, China; 2. Fujian Institute of Research on the Structure of Matter, Chinese Academy of Sciences, Fuzhou 350002, China; 3. Quanzhou Tongwei Technology Co., Ltd., Quanzhou 362008, China; 4. School of Medical Technology and Engineering, Fujian Medical University, Fuzhou 350122, China; 5. Software Engineering Institute, East China Normal University, Shanghai 200062, China)

Abstract
Objective Retinal fundus images are widely used in the clinical screening and diagnosis of ophthalmic diseases. However, blurring caused by defocus, poor lighting conditions, and other factors prevents physicians from making correct diagnoses, and the images restored by existing enhancement methods still suffer from blurring, missing high-frequency information, and increased noise. This paper proposes a convolutional dictionary diffusion model that combines the denoising capability of convolutional dictionary learning with the flexibility of a conditional diffusion model to address these problems. Method The algorithm consists of two processes: a diffusion process and a denoising process. First, random noise is gradually added to the input image until it approaches pure noise; then a neural network is trained to gradually remove the noise until a clear image is obtained. A convolutional network is used to implement convolutional dictionary learning and obtain a sparse representation of the image. The algorithm makes full use of the prior information of the image and effectively avoids missing high-frequency information and increased noise in the reconstructed image. Result The model was trained on the EyePACS dataset and tested on the synthetic datasets DRIVE (digital retinal images for vessel extraction), CHASEDB1 (child heart and health study in England), and ROC (retinopathy online challenge) as well as the real datasets RF (real fundus) and HRF (high-resolution fundus), verifying the performance of the proposed method on image enhancement and its cross-dataset generalization. Compared with the original diffusion model (learning enhancement from degradation, Led), the evaluation metrics peak signal-to-noise ratio (PSNR) and learned perceptual image patch similarity (LPIPS) improve by 1.992 9 dB and 0.028 9 on average, respectively. In addition, using the proposed method as preprocessing for downstream tasks on real ophthalmic images effectively improves downstream performance. Retinal vessel segmentation experiments on the DRIVE dataset, which provides segmentation labels, show that compared with the original diffusion model, the segmentation metrics area under the curve (AUC), accuracy (Acc), and sensitivity (Sen) improve by 0.031 4, 0.003 0, and 0.073 8 on average, respectively. Conclusion The proposed method removes blur and restores richer details while preserving real fundus features, which benefits clinical image analysis and application.
Keywords
Fundus image enhancement algorithm based on convolutional dictionary diffusion model

Wang Zhen1,2, Huo Guanglei3, Lan Hai2, Hu Jianmin4, Wei Xian5(1.College of Mechanical and Electrical Engineering, Fujian Agriculture and Forestry University, Fuzhou 350108, China;2.Fujian Institute of Research on the Structure of Matter, Chinese Academy of Sciences, Fuzhou 350002, China;3.Quanzhou Tongwei Technology Co., Ltd., Quanzhou 362008, China;4.School of Medical Technology and Engineering, Fujian Medical University, Fuzhou 350122, China;5.Software Engineering Institute, East China Normal University, Shanghai 200062, China)

Abstract
Objective Retinal fundus images have important clinical applications in ophthalmology. These images can be used to screen and diagnose various ophthalmic diseases, such as diabetic retinopathy, macular degeneration, and glaucoma. However, the acquisition of these images is often affected by various factors in real scenarios, including lens defocus, poor ambient light conditions, patient eye movements, and camera performance. These issues often lead to quality problems such as blurriness, unclear details, and inevitable noise in fundus images. Such poor-quality images pose a challenge to ophthalmologists in their diagnostic work. For example, blurred images lack detailed information about the morphological structure of the retina, which makes it difficult for physicians to accurately localize and identify abnormalities, lesions, exudations, and other conditions. Existing enhancement methods for fundus images have made progress in improving image quality. However, some problems still exist, such as image blurring, artifacts, missing high-frequency information, and increased noise. Therefore, in this study, we propose a convolutional dictionary diffusion model, which combines convolutional dictionary learning with a conditional diffusion model. This algorithm aims to address the abovementioned problems of low-quality images and provide an effective tool for fundus image enhancement. Our approach can improve the quality of fundus images and enable physicians to increase diagnostic confidence, improve assessment accuracy, monitor treatment progress, and ensure better care for patients. This method will contribute to ophthalmic research and provide more opportunities for prospective healthcare management and medical intervention, which positively impacts patients’ ocular health and overall quality of life. Method The algorithm consists of two parts: the forward diffusion process and the reverse denoising process.
First, random noise is gradually added to the input image to obtain a purely noisy image. Then, a neural network is trained to gradually remove the noise from the image until a clear image is finally obtained. This study takes the blurred fundus image as the conditional information to better preserve the fine-grained structure of the image. Because collecting paired blurred and clear fundus images is difficult, synthetic fundus datasets are widely used for training; a Gaussian filtering algorithm is therefore designed to simulate defocus-blurred images. In the training process, the conditional information and the noisy image are first concatenated and fed into the network, and the abstract features of the image are extracted by progressively reducing the image size through downsampling. This procedure significantly reduces the time and space complexity of the sparse representation calculation. Then, a convolutional network is used to implement convolutional dictionary learning and obtain the sparse representation of the image. Given that the self-attention mechanism can capture non-local similarity and long-range dependency, this study adds self-attention to the convolutional dictionary learning module to improve the reconstruction quality. Finally, hierarchical feature extraction is achieved by feature concatenation to realize information fusion between different levels and make better use of local features in the image. The downsampled features are restored to the original image size by a transposed convolutional layer. The model minimizes the negative log-likelihood loss, which represents the difference in probability distribution between the generated image and the original image. After the model is trained, a clear fundus image is generated by gradually removing the noise from a purely noisy image, with the blurred image as conditional input.
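The degradation synthesis and forward noising steps described above can be sketched as follows. This is a minimal illustration, assuming a standard DDPM-style linear noise schedule and a Gaussian filter as the defocus model; the function names and parameter values are illustrative, not taken from the paper's implementation:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def synthesize_defocus(clear, sigma=3.0):
    """Simulate defocus blur with Gaussian filtering (the conditional input)."""
    # Blur only the spatial axes of an (H, W, C) fundus image.
    return gaussian_filter(clear, sigma=(sigma, sigma, 0))

def forward_diffusion(x0, t, alpha_bar, rng):
    """Sample x_t ~ q(x_t | x_0): gradually add Gaussian noise to the clear image."""
    eps = rng.standard_normal(x0.shape)
    xt = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return xt, eps  # eps is the denoising network's regression target

# Linear beta schedule; alpha_bar[t] shrinks toward 0, so x_T tends to pure noise.
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alpha_bar = np.cumprod(1.0 - betas)

rng = np.random.default_rng(0)
clear = rng.random((64, 64, 3))          # stand-in for a clear fundus patch
blurred = synthesize_defocus(clear)      # conditional information
xt, eps = forward_diffusion(clear, t=500, alpha_bar=alpha_bar, rng=rng)

# The network input is the noisy image concatenated with the blurred condition
# along the channel axis, matching the training procedure described above.
net_input = np.concatenate([xt, blurred], axis=-1)
```

During sampling, the same concatenation is applied at every reverse step, so the blurred image steers denoising toward the underlying fundus structure.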
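The convolutional dictionary learning module rests on convolutional sparse coding, i.e., representing the image as a sum of dictionary filters convolved with sparse coefficient maps. A minimal ISTA-style sketch, assuming a fixed random dictionary for illustration (in the paper the dictionary is learned by the network):

```python
import numpy as np
from scipy.signal import fftconvolve

def soft_threshold(x, lam):
    """Proximal operator of the L1 norm; drives small coefficients to zero."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def conv_sparse_code(image, filters, lam=0.1, step=0.01, iters=50):
    """ISTA for convolutional sparse coding: image ~ sum_k d_k * z_k, z_k sparse."""
    codes = np.zeros((len(filters),) + image.shape)
    for _ in range(iters):
        recon = sum(fftconvolve(z, d, mode="same") for z, d in zip(codes, filters))
        residual = recon - image
        for k, d in enumerate(filters):
            # Gradient step uses correlation (convolution with the flipped filter),
            # followed by soft thresholding to keep the codes sparse.
            grad = fftconvolve(residual, d[::-1, ::-1], mode="same")
            codes[k] = soft_threshold(codes[k] - step * grad, step * lam)
    return codes

rng = np.random.default_rng(0)
filters = rng.standard_normal((4, 5, 5)) * 0.1  # 4 random 5x5 dictionary atoms
image = rng.random((32, 32))                    # stand-in grayscale patch
codes = conv_sparse_code(image, filters)
sparsity = np.mean(codes == 0.0)                # fraction of zero coefficients
```

Unrolling iterations of this kind into network layers is the standard way to implement dictionary learning with convolutional networks, which is the design the method builds on.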
Result The proposed model was trained on the EyePACS dataset, and experiments were performed on the synthetic datasets DRIVE (digital retinal images for vessel extraction), CHASEDB1 (child heart and health study in England), and ROC (retinopathy online challenge) as well as the real datasets RF (real fundus) and HRF (high-resolution fundus) to demonstrate the generalizability of our model. Experimental results show that the evaluation metrics peak signal-to-noise ratio (PSNR) and learned perceptual image patch similarity (LPIPS) improve on average by 1.992 9 dB and 0.028 9, respectively, compared with the original diffusion model (learning enhancement from degradation, Led). Moreover, the proposed approach was used as a preprocessing module for downstream tasks. A retinal vessel segmentation experiment demonstrates that our approach can benefit downstream tasks in clinical applications. The results of segmentation experiments on the DRIVE dataset show that all segmentation metrics improve compared with the original diffusion model. Specifically, the area under the curve (AUC), accuracy (Acc), and sensitivity (Sen) improve by 0.031 4, 0.003 0, and 0.073 8 on average, respectively. Conclusion The proposed method provides a practical tool for fundus image deblurring and a new perspective for improving the quality and accuracy of diagnosis. This approach has a positive impact on patients and ophthalmologists and is expected to promote further development in the interdisciplinary research of ophthalmology and computer science.
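For reference, the PSNR metric reported above can be computed as in the following sketch, assuming images normalized to [0, 1]; LPIPS requires a pretrained perceptual network and is omitted here:

```python
import numpy as np

def psnr(reference, restored, data_range=1.0):
    """Peak signal-to-noise ratio in dB between a reference and a restored image."""
    mse = np.mean((reference - restored) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(data_range ** 2 / mse)

rng = np.random.default_rng(0)
ref = rng.random((32, 32))
noisy = np.clip(ref + 0.05 * rng.standard_normal(ref.shape), 0.0, 1.0)
print(f"PSNR: {psnr(ref, noisy):.2f} dB")  # higher means closer to the reference
```

Note that PSNR is measured in dB while LPIPS is a unitless perceptual distance (lower is better), which is why the two improvements above are reported on different scales.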
Keywords
