Image inpainting model with consistent global and local attributes
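Abstract
Objective Image inpainting is an active research topic in computer vision. Deep-learning-based inpainting methods have made progress, but they struggle on images whose global and local attributes are closely related. When repairing large missing regions in particular, the semantic plausibility, structural coherence, and detail accuracy of the results leave room for improvement. To address these problems, an image inpainting model is proposed that is based on a fully convolutional network and incorporates the idea of generative adversarial networks. Method Based on a fully convolutional neural network combined with skip connections, dilated convolution, and related techniques, a novel image inpainting network is proposed as the generator that repairs defective images. Structural similarity (SSIM) is introduced as the reconstruction loss for inpainting, supervising and guiding model learning from the perspective of the human visual system to improve the inpainting results. Improved global and local context discriminator networks serve as a two-way discriminator that judges whether the inpainting results are real or fake. In addition, combined with the adversarial loss, a joint loss is proposed to supervise model training so that the content of the repaired region is realistic and natural and its attributes are consistent with the whole image. Result To verify the effectiveness of the proposed model, its inpainting results are compared with those of current mainstream inpainting algorithms on the CelebA-HQ dataset, using both subjective perception and objective metrics. The results show that the proposed method improves the semantic plausibility, structural coherence, and detail accuracy of the inpainting results, with the mean peak signal-to-noise ratio (PSNR) and mean SSIM reaching 31.30 dB and 90.58%, respectively. Conclusion The proposed image inpainting model understands the high-level semantics of images better and captures contextual and detail information more precisely, yielding inpainting results that better match human visual perception.
Keywords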
Image inpainting model with consistent global and local attributes
Sun Jinguang, Yang Zhongwei, Huang Sheng (School of Electronic and Information Engineering, Liaoning Technical University, Huludao 125105, China)
Abstract
Objective Image inpainting is an active research topic in computer vision. In recent years, the task has been treated in deep learning as a conditional pattern generation problem and has received much attention from researchers. Compared with traditional algorithms, deep-learning-based inpainting methods apply to a wider range of scenarios and produce better results. Nevertheless, these methods have limitations. In particular, the semantic rationality, structural coherence, and detail accuracy of their results need improvement when processing images whose global and local attributes are closely related, especially images with a large defect area. To solve these problems, this paper proposes a novel image inpainting model based on a fully convolutional neural network and the idea of generative adversarial networks. The model optimizes the network structure, loss constraints, and training strategy to obtain improved inpainting results. Method First, this paper proposes a novel image inpainting network that serves as the generator and repairs defective images by using effective methods from the field of image processing. The network is a fully convolutional neural network built in encoder-decoder form. We replace part of the convolutional layers in the decoding stage of the network with dilated convolutions. We also stack dilated convolutions with multiple dilation rates so that, on small feature maps, each output position covers a larger area of the input image than with ordinary convolution. This effectively enlarges the receptive field of the convolution kernel without increasing the amount of computation and gives the network a better understanding of the image. We also set long skip connections between the corresponding stages of the encoder and decoder. These connections strengthen structural information by transmitting low-level features to the decoding stage.
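The receptive-field claim for stacked dilated convolutions can be checked with a small calculation. The sketch below is only an illustration: it assumes stride-1 layers with 3x3 kernels, and the dilation rates are hypothetical values chosen for the example, not the rates used in the paper. The helper name `receptive_field` is likewise introduced here for illustration.

```python
def receptive_field(kernel_size, dilation_rates):
    """Per-axis receptive field of stacked stride-1 convolutions.

    Each layer with kernel size k and dilation d widens the receptive
    field by (k - 1) * d pixels, while its weight count stays k * k.
    """
    rf = 1
    for d in dilation_rates:
        rf += (kernel_size - 1) * d
    return rf


# Four ordinary 3x3 layers versus four dilated 3x3 layers.
# The rates [2, 4, 8, 16] are an assumption made for this example.
plain = receptive_field(3, [1, 1, 1, 1])
dilated = receptive_field(3, [2, 4, 8, 16])
print(plain, dilated)  # prints: 9 61
```

Both stacks use the same number of weights per layer, so under these assumptions the larger receptive field comes at no extra parameter or multiply-add cost per output position.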
These skip connections also enhance the correlation among deep features and reduce the difficulty of network training. Second, we introduce structural similarity (SSIM) as the reconstruction loss for image inpainting. This image quality index is built from the perspective of the human visual system and differs from the common per-pixel mean square error (MSE) loss: it comprehensively evaluates the similarity between two images in terms of luminance, contrast, and structure. Used as the reconstruction loss, SSIM can effectively improve the visual quality of inpainting results. Third, we use improved global and local context discriminators as a two-way discriminator to judge the authenticity of the inpainting results. The global context discriminator ensures that the attributes of the inpainted area are consistent with the entire image, whereas the local context discriminator improves the detail of the inpainted area. Combined with the adversarial loss, this paper proposes a joint loss to improve the performance of the model and reduce the difficulty of its training. Following the training scheme of generative adversarial networks, we present a method that alternately trains the image inpainting network and the image discriminative networks and obtains good results. In practical applications, only the image inpainting network is used to repair defective images. Result To verify the effectiveness of the proposed model, we compare its inpainting results with those of mainstream image inpainting algorithms on the CelebA-HQ dataset, using subjective perception and objective indicators. To give each compared algorithm its best inpainting results in the controlled experiments, we use the official versions of their code and examples; their inpainting results are obtained by loading pre-trained files or from online demos.
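The SSIM reconstruction loss described in the Method part can be sketched as follows. This is a simplified illustration, not the paper's implementation: it computes SSIM once over a whole flattened grayscale patch instead of over sliding windows, it uses the conventional constants C1 = (0.01 L)^2 and C2 = (0.03 L)^2, and it assumes the loss takes the form 1 - SSIM. The function names are hypothetical.

```python
def ssim_global(x, y, data_range=1.0):
    """SSIM of two equal-length pixel lists, using global statistics.

    Assumption: a single window covering the whole patch; practical
    implementations average SSIM over many local windows.
    """
    n = len(x)
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    var_x = sum((a - mu_x) ** 2 for a in x) / n
    var_y = sum((b - mu_y) ** 2 for b in y) / n
    cov = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    c1 = (0.01 * data_range) ** 2  # stabilizes the luminance term
    c2 = (0.03 * data_range) ** 2  # stabilizes the contrast/structure term
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov + c2)
    denominator = (mu_x * mu_x + mu_y * mu_y + c1) * (var_x + var_y + c2)
    return numerator / denominator


def ssim_loss(x, y, data_range=1.0):
    """Assumed loss form: 0 for identical patches, larger when they differ."""
    return 1.0 - ssim_global(x, y, data_range)


original = [0.1, 0.5, 0.9, 0.3]
inverted = [0.9, 0.5, 0.1, 0.7]  # same contrast, opposite structure
print(ssim_loss(original, original), ssim_loss(original, inverted))
```

Unlike a per-pixel MSE, this score depends on means, variances, and covariance, so it penalizes a patch whose structure is reversed even when its contrast is unchanged.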
We apply a specific defect mask to 50 randomly selected images as test cases, repair them with the different inpainting algorithms, and collect statistics for the comparison. The CelebA-HQ dataset is a cropped and super-resolution-reconstructed version of the CelebA dataset and contains 30 000 high-resolution face images. The human face is a special kind of image that has specific characteristic features as well as rich detail. Therefore, face images can fully test the expressiveness of an image inpainting method. Because the proposed algorithm accounts for the consistency of global and local image attributes, the controlled experiments show that the model improves on the other algorithms in semantic rationality, structural coherence, and detail performance. Subjectively, the model produces natural edge transitions and finely detailed inpainted areas. Objectively, the model achieves an average peak signal-to-noise ratio (PSNR) and SSIM of 31.30 dB and 90.58%, respectively, both of which exceed those of mainstream deep-learning-based image inpainting algorithms. To verify its generality, we also test the model on the Places2 dataset. Conclusion This paper proposes a novel image inpainting model that shows improvements in network structure, loss constraints, training strategy, and inpainting results. The model also shows a better understanding of the high-level semantics of images. Because it captures context and detail accurately, the proposed model obtains inpainting results that better match human visual perception. In the future, we will continue to improve the inpainting results and explore the conditional image inpainting task. We plan to improve and optimize the model in terms of network structure and loss constraints to reduce the loss of image information during feature extraction under a controllable network training setup.
We will also try to make fuller use of the defect mask through a channel-domain attention mechanism to further improve inpainting quality. We also plan to analyze the relationship between image boundary structure and feature reconstruction. We aim to improve the convergence speed of network training and the quality of image inpainting by using an accurate and effective loss function. Furthermore, we intend to use human-computer interaction or preset conditions to influence the inpainting results, thereby exploring further practical value of the model.
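For readers unfamiliar with the PSNR figure reported in the Result part, the metric can be sketched as below. This is a minimal illustration of the standard definition, PSNR = 10 log10(MAX^2 / MSE), assuming 8-bit pixel values flattened into lists; the function name `psnr` and the sample values are hypothetical and are not the evaluation code or data of the paper.

```python
import math


def psnr(reference, restored, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel lists."""
    mse = sum((r - s) ** 2 for r, s in zip(reference, restored)) / len(reference)
    if mse == 0:
        return math.inf  # identical images: no noise at all
    return 10.0 * math.log10(max_value ** 2 / mse)


# Hypothetical pixel values: every restored pixel is off by one gray level.
reference = [10, 20, 30, 40]
restored = [11, 21, 31, 41]
print(round(psnr(reference, restored), 2))  # prints: 48.13
```

Higher values indicate a smaller mean squared error relative to the peak pixel value, which is why PSNR is usually reported together with a perceptual index such as SSIM.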
Keywords
image inpainting; fully convolutional neural network; dilated convolution; skip connection; adversarial loss