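A detail-complementary convolution model for image super-resolution reconstruction
Li Langyu1,2,3, Su Zhuo1,2, Shi Xiaohong4, Huang Enbo1,2, Luo Xiaonan5(1.School of Data and Computer Science, Sun Yat-sen University, Guangzhou 510006;2.National Engineering Research Center of Digital Life, Sun Yat-sen University, Guangzhou 510006;3.Research Institute of Sun Yat-sen University in Shenzhen, Shenzhen 518057;4.School of Information Science, Xinhua College of Sun Yat-sen University, Guangzhou 510520;5.School of Computer Science and Information Security, Guilin University of Electronic Technology, Guilin 541004) Abstract
Objective Existing super-resolution convolutional neural networks need ever deeper architectures and more training to reconstruct high-resolution images well. As a result, they depend heavily on the number of training samples, have many parameters that make training difficult, require many training iterations, and place heavy demands on hardware. To address these problems, this paper proposes an improved super-resolution reconstruction network model. Method Unlike the traditional single-input model, this paper adopts a dual-input, detail-complementary network model that adds a new input alongside the feature extraction and mapping network of the original single-input SRCNN model. Drawing on the local similarity of images, a detail supplement network is built to supplement image features, and one convolutional layer fuses the features from the detail supplement network with those extracted by the feature extraction network to reconstruct the high-resolution image. Result The proposed method is compared with other mainstream methods both subjectively and objectively, in terms of numerical results and visual quality. At a network depth similar to that of SRCNN, the PSNR of the proposed method at a magnification factor of 3 is 0.17 dB and 0.08 dB higher than that of SRCNN on Set5 and Set14, respectively. Subjectively, the proposed method restores image edges and texture details well. Conclusion Experiments show that the proposed detail-complementary network model obtains effective reconstructed images and retains more image details with less training and a relatively shallow network.
Keywords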
Mutual-detail convolution model for image super-resolution reconstruction
Li Langyu1,2,3, Su Zhuo1,2, Shi Xiaohong4, Huang Enbo1,2, Luo Xiaonan5(1.School of Data and Computer Science, Sun Yat-sen University, Guangzhou 510006, China;2.National Engineering Research Center of Digital Life, Sun Yat-sen University, Guangzhou 510006, China;3.Research Institute of Sun Yat-sen University in Shenzhen, Shenzhen 518057, China;4.School of Information Science, Xinhua College of Sun Yat-sen University, Guangzhou 510520, China;5.School of Computer Science and Information Security, Guilin University of Electronic Technology, Guilin 541004, China) Abstract
Objective Single-image super-resolution (SR) is a classical problem in computer vision. In visual information processing, high-resolution images are in demand because they carry more useful information in applications such as medical imaging, remote sensing, video surveillance, and entertainment. However, in some scenes, such as long-distance shooting, only low-resolution images of specific objects can be obtained because of the limitations of physical devices. SR has therefore attracted considerable attention from the computer vision community in the past decades. We address the problem of generating a high-resolution image from a given low-resolution image, which is commonly referred to as single-image SR. Early methods include bicubic interpolation, Lanczos resampling, statistical priors, neighbor embedding, and sparse coding. In recent years, a series of convolutional neural network (CNN) models has been proposed for single-image SR. Deep learning attempts to learn layered, hierarchical representations of high-dimensional data. However, the classical CNN for SR is a single-input model, which limits its performance. Such CNNs require deep networks, considerable training, and a large number of sample images to recover images with good details. These requirements lead to numerous parameters that make training difficult, a large number of training iterations, and heavy hardware demands. In view of these problems, we propose an improved super-resolution reconstruction network model. Method Unlike the traditional single-input model, we adopt a mutual-detail convolution model with two inputs. Combining paths of different scales enables the model to cover a wide range of receptive fields. Features of image blocks of different sizes complement one another at different scales. Low-dimensional and high-dimensional features are combined to supplement the details of the restored images and thereby improve the quality and detail of the reconstruction.
Traditional self-similarity-based methods can also be combined with neural networks. The entire convolution model can be divided into three parts: the F1, F2, and F3 networks. F1 is the feature extraction and nonlinear mapping network with four layers; it uses filters with spatial sizes of 9×9 and 3×3. F2 is the detail network used to complement the features of F1; it consists of two layers with filters of spatial sizes 11×11 and 5×5. F3 is the reconstruction network. We use mean squared error as the loss function and minimize it using stochastic gradient descent (SGD) with standard backpropagation. The network takes an original low-resolution image and a low-resolution image interpolated to the desired size as inputs and predicts the image details. The added input supplements the high-frequency information that is lost during the reconstruction process. As shown in the literature, deep learning generally benefits from big-data training. We use a training dataset of 500 images from BSD500, augmented with flipped and rotated versions of the training images; the original images are rotated by 90° and 270°. To limit training time and storage, the training images are split into 33×33 and 39×39 patches with a stride of 14. We set the SGD mini-batch size to 64 and the momentum parameter to 0.9. Result We use Set5 and Set14 as the validation sets. Following previous experiments, we adopt the conventional approach to super-resolving color images: we transform the color images into the YCbCr space and apply the SR algorithms only to the Y channel, whereas the Cb and Cr channels are upscaled by bicubic interpolation. We show the quantitative and qualitative results of our method in comparison with those of state-of-the-art methods.
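The Y-channel convention described above can be illustrated with a minimal sketch. The abstract does not say which YCbCr variant was used, so the code assumes the 8-bit studio-range ITU-R BT.601 coefficients that are common in SR evaluation. The function names `rgb_to_ycbcr` and `split_ycbcr` are hypothetical and are not taken from the paper.

```python
def rgb_to_ycbcr(r, g, b):
    """Convert one 8-bit RGB pixel to YCbCr.

    Assumption: ITU-R BT.601 studio-range coefficients (Y in [16, 235],
    Cb and Cr in [16, 240]); the paper's exact variant is not stated.
    """
    rn, gn, bn = r / 255.0, g / 255.0, b / 255.0
    y = 16.0 + 65.481 * rn + 128.553 * gn + 24.966 * bn
    cb = 128.0 - 37.797 * rn - 74.203 * gn + 112.0 * bn
    cr = 128.0 + 112.0 * rn - 93.786 * gn - 18.214 * bn
    return y, cb, cr


def split_ycbcr(image):
    """Split an image (list of rows of (r, g, b) tuples) into Y, Cb, Cr planes."""
    planes = ([], [], [])
    for row in image:
        converted = [rgb_to_ycbcr(*px) for px in row]
        # zip(*converted) regroups the per-pixel triples into three channel rows.
        for plane, channel in zip(planes, zip(*converted)):
            plane.append(list(channel))
    return planes


if __name__ == "__main__":
    tiny = [[(255, 255, 255), (0, 0, 0)],
            [(255, 0, 0), (0, 0, 255)]]
    y, cb, cr = split_ycbcr(tiny)
    # Only the Y plane would be fed to the SR model; Cb and Cr would be
    # upscaled separately (by bicubic interpolation in the paper).
    print(y)
```

Under this convention, the learned model only has to restore luminance detail, while the chroma planes are handled by plain interpolation.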
Compared with traditional methods and SRCNN, our method obtains better peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) values on the Set5 and Set14 datasets. For an upscaling factor of 3, the average PSNR of our method is 0.17 dB and 0.08 dB higher than that of the next best approach, SRCNN, on Set5 and Set14, respectively. A similar trend is observed when SSIM is used as the performance metric. Compared with SRCNN, our approach needs two orders of magnitude fewer training iterations. With a lightweight structure, our method achieves performance superior to that of the compared state-of-the-art methods. Conclusion The experiments show that the proposed method can effectively reconstruct images with considerable detail using little training and a relatively shallow network. However, compared with the results of very deep neural networks, the results of our method are not sufficiently precise, and the network structure is relatively simple. In future work, we will consider using deeper layers to acquire more image features at different layers and extending our model to other image tasks.
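To make the reported PSNR gains easier to interpret, the following minimal sketch computes PSNR from the mean squared error and converts a PSNR gain into the implied MSE ratio. It assumes 8-bit pixel values (peak 255) and images stored as nested lists. The helper names `mse`, `psnr`, and `mse_ratio_for_gain` are hypothetical, and the code is not the authors' evaluation script.

```python
import math


def mse(ref, test):
    """Mean squared error between two equally sized 2-D lists of pixel values."""
    total, count = 0.0, 0
    for ref_row, test_row in zip(ref, test):
        for a, b in zip(ref_row, test_row):
            total += (a - b) ** 2
            count += 1
    return total / count


def psnr(ref, test, peak=255.0):
    """PSNR in dB; `peak` is the maximum pixel value (assumed 8-bit here)."""
    err = mse(ref, test)
    if err == 0:
        return math.inf  # identical images have unbounded PSNR
    return 10.0 * math.log10(peak * peak / err)


def mse_ratio_for_gain(gain_db):
    """MSE ratio (new / old) implied by a PSNR gain of `gain_db` on one image."""
    return 10.0 ** (-gain_db / 10.0)


if __name__ == "__main__":
    reference = [[10, 20], [30, 40]]
    restored = [[11, 21], [31, 41]]  # every pixel is off by one, so MSE = 1
    print(psnr(reference, restored))
    print(mse_ratio_for_gain(0.17), mse_ratio_for_gain(0.08))
```

For a single image, a gain of 0.17 dB corresponds to an MSE ratio of about 0.962 (roughly 3.8% lower MSE), and a gain of 0.08 dB to about 0.982 (roughly 1.8% lower). Because the abstract reports gains in average PSNR over a dataset, these figures are only indicative.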
Keywords