Generative adversarial super-resolution algorithm combining perceptual loss

Yang Juan, Li Wenjing, Wang Ronggui, Xue Lixia (School of Computer Science and Information Engineering, Hefei University of Technology, Hefei 230601, China)

Abstract
Objective Most existing deep-learning-based single-image super-resolution algorithms adopt the mean square error as the objective function in order to obtain high image-evaluation indices. However, the reconstructed images suffer severe loss of high-frequency information and blurred texture edges, and thus fail to meet the demands of subjective visual perception. Moreover, existing deep models often pursue better reconstruction by deepening the network, which introduces the vanishing-gradient problem and makes training more difficult. To address these issues, this paper proposes a super-resolution reconstruction algorithm combining perceptual loss: a residual network model built on a generative adversarial framework improves the feature-reconstruction ability for low-resolution images and faithfully restores the missing high-frequency semantic information. Method The proposed model contains two modules: a generator subnetwork and a discriminator subnetwork. The generator mainly consists of a feature pyramid containing dense residual blocks, where every convolutional layer in each dense residual block uses 3×3 filters. The generator completes its reconstruction task by progressively extracting high-frequency features of the image at different scales. The discriminator introduces strided convolution and global average pooling into a multilayer feed-forward network, effectively learning the data distribution of the images reconstructed by the generator; it then judges the authenticity of the generated images and feeds the result back to the generator. Finally, the algorithm optimizes the objective function fused with the perceptual loss to update the network parameters. Result Using peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) as objective evaluation metrics, the 4× reconstruction results on the Set5 and Set14 datasets reach PSNRs of 31.72 dB and 28.34 dB and SSIMs of 0.892 4 and 0.785 6, respectively, a clear improvement over other methods. Conclusion The generative adversarial super-resolution algorithm combining perceptual loss accurately recovers texture details and reconstructs visually pleasing high-resolution images.
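The fused objective described in the abstract combines a perceptual content term with an adversarial term. A minimal NumPy sketch of that combination follows; the feature extractor here is a hypothetical fixed random projection standing in for the pretrained-network features a real perceptual loss would use, and the weighting factor `lam` is an assumed value, not one taken from the paper.

```python
import numpy as np

def mse(a, b):
    """Pixel-wise mean square error."""
    return float(np.mean((a - b) ** 2))

# Hypothetical fixed feature extractor: a random linear map plus ReLU,
# standing in for the pretrained-network features of a real perceptual loss.
rng = np.random.default_rng(0)
W = rng.normal(size=(64, 16))

def features(x):
    # x: a flattened 8x8 patch (64 values)
    return np.maximum(x @ W, 0.0)

def perceptual_loss(sr, hr):
    """MSE measured in feature space rather than pixel space."""
    return mse(features(sr), features(hr))

def total_loss(sr, hr, d_score, lam=1e-3):
    """Perceptual content term plus weighted adversarial term -log D(G(x))."""
    adv = -np.log(d_score + 1e-12)
    return perceptual_loss(sr, hr) + lam * adv
```

A perfect reconstruction paired with a fully fooled discriminator (`d_score` close to 1) drives this loss toward zero, while a low `d_score` penalizes the generator through the adversarial term.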
Keywords
Generative adversarial network for image super-resolution combining perceptual loss

Yang Juan, Li Wenjing, Wang Ronggui, Xue Lixia (School of Computer Science and Information Engineering, Hefei University of Technology, Hefei 230601, China)

Abstract
Objective Single image super-resolution (SISR) is a research hotspot in computer vision. SISR aims to reconstruct a high-resolution image from its low-resolution counterpart and is widely used in video surveillance, remote sensing, and medical imaging. In recent years, owing to the rapid development of deep learning, many researchers have concentrated on convolutional networks for SISR. Early methods constructed shallow convolutional networks, which perform poorly in improving the quality of reconstructed images. Moreover, these methods adopt the mean square error as the objective function to obtain a high evaluation index. As a result, they cannot characterize edge details well, and thus fail to infer plausible high-frequency content. To address this problem, we propose a novel generative adversarial network (GAN) for image super-resolution that combines a perceptual loss to further improve SR performance. This method outperforms state-of-the-art methods by a large margin in terms of peak signal-to-noise ratio and structural similarity, yielding a noticeable improvement in the reconstruction results. Method SISR is inherently ill-posed because many high-resolution solutions exist for any given low-resolution input. In other words, it is an underdetermined inverse problem that does not have a unique solution. Classical methods constrain the solution space by exploiting prior information about natural-scene images, which leads to unsatisfactory color and context accuracy compared with real high-resolution images. With its strong feature-representation ability, a CNN outperforms such conventional methods. However, most feed-forward CNNs for super-resolution are single-path models whose reconstruction performance is limited because they optimize the mean square error (MSE) pixelwise between the super-resolved image and the ground truth, and measuring pixel-wise differences cannot capture perceptual semantics well.
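The limitation of pixel-wise MSE noted above can be illustrated numerically: when several sharp high-resolution patches are equally plausible explanations of a low-resolution input, their blurry average attains a lower expected MSE than any sharp candidate, so an MSE-trained network is biased toward over-smoothed output. A minimal NumPy sketch with toy 1-D "edge" patches (illustrative data, not from the paper):

```python
import numpy as np

# Two equally plausible sharp HR "patches": a step edge at two positions.
edge_a = np.array([0., 0., 0., 1., 1., 1.])
edge_b = np.array([0., 0., 1., 1., 1., 1.])

# The pixel-wise average of both: a blurred, half-intensity transition.
blurry = (edge_a + edge_b) / 2

def expected_mse(pred):
    """Expected MSE when the true patch is edge_a or edge_b with equal probability."""
    return 0.5 * np.mean((pred - edge_a) ** 2) + 0.5 * np.mean((pred - edge_b) ** 2)

# The blurry average beats either sharp guess under expected MSE,
# even though it resembles neither plausible patch.
print(expected_mse(blurry), expected_mse(edge_a))  # 1/24 < 1/12
```

This is exactly why a perceptual or adversarial term is needed to push the generator back toward one of the sharp, plausible solutions.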
Therefore, we propose a novel GAN for image super-resolution that integrates a perceptual loss to boost visual performance. Our model consists of two modules. The generative subnetwork is mainly composed of a Laplacian feature pyramid whose fundamental components are dense residual blocks. We introduce global residual learning in the identity branch of each residual unit to construct the dense residual block; the full usage of all layers not only stabilizes the training process but also effectively preserves information flow through the network. As a result, the generative subnetwork can progressively extract high-frequency features of the reconstructed image at different scales. The discriminative subnetwork is a feed-forward CNN that introduces strided convolution and global average pooling to enlarge the receptive field and reduce spatial dimensions over a large image region, ensuring efficient memory usage and fast inference. The discriminator estimates the probability that a generated high-resolution image comes from the ground truth rather than from the generative subnetwork by inspecting their feature maps, and it feeds back the result to help the generator synthesize more perceptually convincing high-frequency details. Finally, the model optimizes the objective function to complete the parameter update. Result All experiments are implemented in the PyTorch framework. We train PSGAN (perceptual super-resolution using generative adversarial network) for 100 epochs on the 291-image dataset. Following previous work, we convert all RGB images into YCbCr format and super-resolve the Y channel only because the human eye is most sensitive to this channel. We choose two standard datasets (Set5 and Set14) to verify the effectiveness of the proposed network against other state-of-the-art methods.
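As a concrete illustration of how the discriminator's two building blocks behave, the sketch below shows a strided convolution shrinking spatial dimensions and global average pooling collapsing the resulting feature map to a single score. This is a single-channel NumPy toy with a random filter, not a trained discriminator weight.

```python
import numpy as np

def strided_conv2d(x, kernel, stride=2):
    """Valid cross-correlation of a 2-D map with a square kernel at the given stride.
    x: (H, W) single-channel map; kernel: (k, k)."""
    k = kernel.shape[0]
    H, W = x.shape
    out_h = (H - k) // stride + 1
    out_w = (W - k) // stride + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = x[i * stride:i * stride + k, j * stride:j * stride + k]
            out[i, j] = np.sum(patch * kernel)
    return out

def global_average_pool(fmap):
    """Collapse a feature map to one scalar, as in the discriminator head."""
    return float(np.mean(fmap))

rng = np.random.default_rng(0)
x = rng.normal(size=(32, 32))        # toy single-channel input
kernel = rng.normal(size=(3, 3))     # one hypothetical 3x3 filter

fmap = strided_conv2d(x, kernel, stride=2)   # spatial dims roughly halved
score = global_average_pool(fmap)
print(fmap.shape)  # (15, 15)
```

Stacking such strided layers rapidly enlarges the receptive field, and the final pooled scalar is what a sigmoid would turn into the real/fake probability.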
For subjective visual evaluation, experiment results show that the accuracy of all test samples is reasonable given that the perceptual quality difference between the original ground truth and our generated high-resolution image is not significant. Overall, PSGAN achieves superior clarity and barely shows ripple artifacts. For objective evaluation, the average peak signal-to-noise ratio achieved by this method is 37.44 dB and 33.14 dB on Set5 and Set14 with scale factor 2, and 31.72 dB and 28.34 dB with scale factor 4, respectively. In terms of structural similarity, the proposed approach obtains 0.961 4/0.892 4 on Set5 and 0.919 3/0.785 6 on Set14 (scale factors 2/4), indicating that PSGAN produces the best index results. In terms of perceptual measures, we calculate the FSIM of each method; our PSGAN obtains 0.92/0.91 on Set5 and 0.92/0.88 on Set14. Experiment results demonstrate that our method improves the upsampled image quality by a large margin. Conclusion We employ a compact and recurrent CNN that mainly consists of dense residual blocks to super-resolve high-resolution images progressively. Comprehensive experiments show that PSGAN achieves considerable improvement in quantitative metrics and visual perception over other state-of-the-art methods. The algorithm provides stronger supervision for brightness consistency and texture recovery and can be applied to photorealistic super-resolution of natural-scene images.
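The PSNR figures above are computed on the Y channel, following the evaluation protocol stated earlier. A minimal NumPy version of that step is sketched below; the BT.601 luma weights are the standard ones, assumed here rather than quoted from the paper.

```python
import numpy as np

def rgb_to_y(img):
    """Luma (Y) channel from an RGB image in [0, 255], ITU-R BT.601 weights."""
    return 0.299 * img[..., 0] + 0.587 * img[..., 1] + 0.114 * img[..., 2]

def psnr(ref, test, max_val=255.0):
    """Peak signal-to-noise ratio in dB between two same-shaped images."""
    err = np.mean((ref.astype(np.float64) - test.astype(np.float64)) ** 2)
    if err == 0:
        return float('inf')
    return 10.0 * np.log10(max_val ** 2 / err)

# Toy check: two images whose pixels differ by exactly 1 everywhere
# give MSE = 1, hence PSNR = 20*log10(255) ~= 48.13 dB.
a = np.full((8, 8), 100.0)
b = a + 1.0
print(round(psnr(a, b), 2))  # 48.13
```

In practice, `rgb_to_y` would be applied to both the super-resolved output and the ground truth before calling `psnr`, so the score reflects only the channel the network actually reconstructs.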
Keywords
