Seen-through Document Image Restoration Algorithm Based on a Fuzzy Diffusion Model

王义杰; 龚嘉鑫; 梁宗宝; 崇乾鹏; 程翔; 徐金东

Published: 2024-09-12

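王义杰¹, 龚嘉鑫¹, 梁宗宝¹, 崇乾鹏¹, 程翔², 徐金东¹ (1. School of Computer and Control Engineering, Yantai University; 2. School of Information Science and Technology, Peking University)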

Abstract

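Objective When documents are digitized, factors such as ink density and paper transparency can make the content on the reverse side visible through the paper. This produces a seen-through phenomenon in the digital image and hinders practical use of the document image. To address this, this paper proposes a fuzzy diffusion model that combines fuzzy logic with the idea of mean reversion. It requires no prior knowledge and strengthens the diffusion model's ability to handle random factors in document images, so that it both removes the seen-through phenomenon and improves the visual quality of the image. Method The proposed method degrades the original image by continuously adding random noise through a mean-reverting stochastic differential equation (SDE), converting it into a seen-through mean state with fixed Gaussian noise. Fuzzy logic operations are then introduced into the noise network to infer the membership degree of each pixel, so that the model learns the noise and data distributions better. In the reverse process, the low-quality image is gradually restored by simulating the corresponding reverse-time SDE, which yields a clean image free of seen-through. Result The proposed algorithm was trained separately on a synthetic grayscale dataset and a synthetic color dataset and tested on three synthetic datasets and two real datasets. Compared with five representative existing methods, it achieved the best visual quality and, to some extent, removed the noise in the original images. It also achieved the best results on all four evaluation metrics: peak signal-to-noise ratio (PSNR), structural similarity index (SSIM), learned perceptual image patch similarity (LPIPS), and Fréchet inception distance (FID). Conclusion The proposed method effectively resolves the seen-through phenomenon in different types of document images, improves the accuracy and efficiency of seen-through removal, and is expected to be integrated into practical hardware such as cameras and scanners.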
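The forward process in the Method part can be illustrated for individual pixel values. The sketch below assumes the standard mean-reverting (Ornstein-Uhlenbeck) form dx = θ(μ − x)dt + σ dW, where x(0) is the clean pixel and μ is the seen-through pixel. The constant coefficients `theta` and `sigma` and the function names are illustrative assumptions; the abstract does not give the actual noise schedules. Under these assumptions the state at time t is Gaussian with a closed-form mean and standard deviation, and for large t it approaches the seen-through mean μ with a fixed noise variance σ²/(2θ).

```python
import math
import random


def forward_marginal(x0, mu, t, theta=1.0, sigma=0.5):
    """Mean and std of x(t) for dx = theta*(mu - x) dt + sigma dW, x(0) = x0.

    Constant theta and sigma are assumptions of this sketch.
    """
    decay = math.exp(-theta * t)
    mean = mu + (x0 - mu) * decay
    std = math.sqrt(sigma ** 2 / (2.0 * theta) * (1.0 - decay ** 2))
    return mean, std


def diffuse(clean, seen_through, t, theta=1.0, sigma=0.5, rng=None):
    """Sample the noisy state at time t for flat lists of pixel values."""
    if rng is None:
        rng = random.Random(0)  # fixed seed so the sketch is reproducible
    noisy = []
    for x0, mu in zip(clean, seen_through):
        mean, std = forward_marginal(x0, mu, t, theta, sigma)
        noisy.append(mean + std * rng.gauss(0.0, 1.0))
    return noisy
```

At t = 0 the function returns the clean pixel unchanged; as t increases, the mean drifts toward the seen-through value while the noise level saturates, which matches the "seen-through mean state with fixed Gaussian noise" described in the text.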

Keywords

Diffusion model, fuzzy logic, image restoration, seen-through removal, stochastic differential equation

FDM: Document Image Seen-through Removal via Fuzzy Diffusion Models

(School of Computer and Control Engineering, Yantai University)

Abstract

Objective Document images have important applications in fields such as optical character recognition (OCR), historical document restoration, and electronic reading. However, when a document is scanned or photographed, factors such as ink density and paper transparency may make the content on the reverse side visible through the paper, producing a digital image with a "seen-through" phenomenon that hinders practical use. In addition, image acquisition is often affected by several sources of uncertainty, including differences in camera performance, paper quality, lighting conditions, lens shake, and variations in the physical properties of the documents themselves. These random factors add noise to document images and may complicate the seen-through phenomenon, which in turn affects downstream tasks such as text recognition, word identification, and layout analysis. Although restoring the content of document images is important, the backgrounds of many color document images also carry valuable information, so recovering color images with complex backgrounds degraded by seen-through poses its own challenges. Existing seen-through removal methods have improved image quality, but algorithms specifically designed to handle varying degrees of seen-through, complex background colors, and uncertainty factors have not yet been developed. To address these issues, this paper aims to develop a comprehensive algorithm for the diverse seen-through problems in regular, handwritten, and color document images. We therefore propose the Fuzzy Diffusion Model (FDM), which integrates fuzzy logic with conditional diffusion models. 
The algorithm is intended to restore document images affected by seen-through phenomena of various types and degrees, offering a new approach to document image enhancement and restoration. Method The algorithm consists of a forward diffusion process and a corresponding reverse denoising process. First, we gradually add Gaussian noise to the input image using a mean-reverting stochastic differential equation (SDE), which yields a seen-through mean state with fixed Gaussian noise. Next, we train a neural network to predict the noise at the current time step from the noisy image and estimate the score function from the predicted noise. Finally, in the reverse process, we gradually restore the low-quality image by simulating the corresponding reverse-time SDE until a clean image without seen-through is generated. To handle the uncertainty factors in document images, we design a fuzzy block in the skip connections of the noise network to compute the membership degree of each pixel. Specifically, the fuzzy operation works on nine pixels (the pixel itself and its surrounding neighbors), and the final membership degree of the pixel is obtained by fuzzy inference. The noise network follows the U-Net structure used in the denoising diffusion probabilistic model (DDPM), except that all group normalization and self-attention layers are removed to improve inference efficiency. In the middle part of the network, we introduce atrous spatial pyramid pooling (ASPP) to enlarge the receptive field and extract richer features. Because paired seen-through document images are difficult to obtain in the real world, we propose a new protocol for synthesizing seen-through images. 
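The per-pixel fuzzy operation described above can be sketched on a small grayscale image. The abstract does not state the membership function or the inference rule, so everything specific here is an assumption made for illustration: the nine pixels are read as a 3×3 window, the membership function is a Gaussian centered on the pixel's own value, the nine degrees are aggregated by their arithmetic mean, and borders are handled by edge replication. The name `fuzzy_membership` and the parameter `sigma` are hypothetical.

```python
import math


def fuzzy_membership(image, sigma=0.1):
    """Per-pixel membership degree in [0, 1] from a 3x3 neighbourhood.

    image: list of rows of floats in [0, 1]. Border pixels reuse the nearest
    valid pixel (edge replication). Gaussian membership and mean aggregation
    are assumptions of this sketch, not the paper's stated design.
    """
    height, width = len(image), len(image[0])
    result = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # Gather the 9 pixels: the centre and its 8 neighbours.
            window = [
                image[min(max(y + dy, 0), height - 1)][min(max(x + dx, 0), width - 1)]
                for dy in (-1, 0, 1)
                for dx in (-1, 0, 1)
            ]
            centre = image[y][x]
            # Gaussian membership of each window pixel relative to the centre.
            degrees = [
                math.exp(-((p - centre) ** 2) / (2.0 * sigma ** 2))
                for p in window
            ]
            # Aggregate the 9 degrees (assumed rule: arithmetic mean).
            result[y][x] = sum(degrees) / len(degrees)
    return result
```

With this choice, a pixel that agrees with its neighborhood receives a membership degree close to 1, while an isolated outlier receives a low degree, which gives the network a simple signal about local uncertainty.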
During training, the seen-through image is fed into the noise network as conditional information together with the noisy image, which guides the model toward the target distribution. After training, the seen-through image again serves as the conditional input while the model progressively predicts the noise in the noisy image and generates a clear document image. Result We trained the model separately on the synthetic grayscale dataset and the synthetic color dataset and tested it on three synthetic datasets and two real datasets. The test sets comprise synthetic grayscale document images, synthetic color document images, synthetic handwritten document images, the Media Team Oulu document dataset, and real CET-6 seen-through document images. Compared with five representative existing methods, the proposed method achieved the best visual quality and also removed, to some extent, the noise present in the original images. It achieved the best results on four evaluation metrics: peak signal-to-noise ratio (PSNR), structural similarity index (SSIM), learned perceptual image patch similarity (LPIPS), and Fréchet inception distance (FID). On the grayscale dataset, it obtained the best PSNR and FID among the compared methods, 35.05 and 30.69, respectively. On the synthetic color dataset, it obtained the best SSIM, LPIPS, and FID, 0.986, 0.0053, and 20.03, respectively. To assess the stability of the proposed method, we also report the variance of the SSIM metric, for which it achieved the best result of 0.0053. Conclusion The proposed FDM addresses several challenges in removing seen-through from document images, including the lack of paired seen-through document images, residual seen-through, the difficulty of handling complex backgrounds, and the uniform treatment of uncertainty factors in images. 
As a result, it effectively removes seen-through from different types of document images and improves the accuracy and efficiency of the task. It is expected to be integrated into practical hardware devices such as cameras and scanners.
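For readers unfamiliar with the first of the four metrics, PSNR is defined as 10·log10(MAX² / MSE), where MSE is the mean squared error between the reference and restored images and MAX is the largest possible pixel value. The sketch below is a plain-Python illustration, not the authors' evaluation code; the 8-bit range (`max_value=255.0`) and the list-of-rows image layout are assumptions.

```python
import math


def psnr(reference, restored, max_value=255.0):
    """PSNR in dB between two equal-size images given as lists of rows."""
    squared_error = 0.0
    count = 0
    for ref_row, res_row in zip(reference, restored):
        for ref_pixel, res_pixel in zip(ref_row, res_row):
            squared_error += (ref_pixel - res_pixel) ** 2
            count += 1
    if count == 0:
        raise ValueError("images must contain at least one pixel")
    mse = squared_error / count
    if mse == 0.0:
        return math.inf  # identical images: PSNR is unbounded
    return 10.0 * math.log10(max_value ** 2 / mse)
```

Higher PSNR means the restored image is closer to the reference. Of the other three metrics, SSIM is also better when higher, while LPIPS and FID are better when lower.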

Keywords

Diffusion models, fuzzy logic, image restoration, seen-through removal, stochastic differential equations
