Image super-resolution reconstruction with a multi-stage fusion network

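Shen Mingyu, Yu Pengfei, Wang Ronggui, Yang Juan, Xue Lixia (School of Computer Science and Information Engineering, Hefei University of Technology, Hefei 230601, China)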

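Abstract
Objective In recent years, deep convolutional neural networks have become a research focus in single-image super-resolution reconstruction. Most network structures are built by chained stacking, which leaves the connections between layers weak and the hierarchical features underused. To address these problems, an image super-resolution reconstruction method based on a multi-stage fusion network is proposed to further improve reconstruction quality. Method First, a feature extraction network obtains the low-frequency features of the image, which serve as the input of two sub-networks. One sub-network uses an encoding network to obtain the structural feature information of the low-resolution image. The other uses a multi-path feedforward network composed of staged feature fusion units to obtain high-frequency features; each fusion unit fuses the features of several consecutive layers and obtains effective features adaptively. Then, multi-path connections link the different feature fusion units to strengthen the relationships among them, extract more effective features, and raise the utilization of hierarchical features. Finally, the features from the two sub-networks are fused, and residual learning is used to reconstruct the high-resolution image. Result Experiments are conducted on four benchmark test sets: Set5, Set14, B100, and Urban100. At a scale factor of 4, the peak signal-to-noise ratios are 31.69 dB, 28.24 dB, 27.39 dB, and 25.46 dB, respectively, an improvement over the results of other methods. Conclusion The proposed network overcomes the drawbacks of the chained structure. It extracts more high-frequency information by making full use of hierarchical features, also uses the structural feature information carried by the low-resolution image itself to complete the reconstruction, and achieves good reconstruction results.
Keywords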
Image super-resolution reconstruction via deep network based on multi-staged fusion

Shen Mingyu, Yu Pengfei, Wang Ronggui, Yang Juan, Xue Lixia (School of Computer Science and Information Engineering, Hefei University of Technology, Hefei 230601, China)

Abstract
Objective Image super-resolution is an important branch of digital image processing and computer vision. It has been widely used in video surveillance, medical imaging, and security imaging in recent years. Super-resolution aims to reconstruct a high-resolution image from an observed, degraded low-resolution one. Early methods include interpolation, neighborhood embedding, and sparse coding. Deep convolutional neural networks have recently become a major research topic in single-image super-resolution reconstruction. Such networks can learn the mapping between high- and low-resolution images better than traditional learning-based methods. However, many deep learning-based methods have two evident drawbacks. First, most methods build the network by chained stacking, so each layer is related only to the layer before it and inter-layer relationships are weak. Second, the hierarchical features of the network are only partially utilized. These shortcomings can lead to loss of high-frequency components. To address them, a novel image super-resolution reconstruction method based on a multi-staged fusion network is proposed to improve the quality of image reconstruction. Method Numerous studies have shown that feature reuse can improve the capability of a network to extract and express features. Our method is therefore based on feature reuse, implemented through multipath connections in two forms: a global multipath mode and local fusion units. First, the proposed model takes an interpolated low-resolution image as input. The feature extraction network extracts shallow features as the input of the mixture network. The mixture network consists of two parts. The first part is a pixel encoding network, which obtains the structural feature information of the image.
This network has four weight layers, each consisting of 64 filters of size 1×1, which preserves the distribution of the feature maps. The process is similar to encoding and decoding pixels. The other part is a multi-path feedforward network, which extracts the high-frequency components needed for reconstruction. It is formed by staged feature fusion units connected in a multi-path mode. Each fusion unit is composed of a dense connection layer, a residual learning layer, and a feature selection layer. The dense connection layer is composed of four weight layers, each with 32 filters of size 3×3; it improves the nonlinear mapping capability of the network and extracts abundant high-frequency information. The residual learning layer contains a 1×1 weight layer to alleviate the vanishing gradient problem. The feature selection layer uses a 1×1 weight layer to obtain effective features. Then, the multi-path mode is used to connect the different units, which strengthens the relationships among the fusion units, extracts more effective features, and increases the utilization of hierarchical features. Both sub-networks output 64 feature maps; their outputs are fused and passed to the reconstruction network, which includes a 1×1 weight layer. This yields the residual image between the low- and high-resolution images. Finally, the reconstructed image is obtained by combining the original low-resolution image with the residual image. During training, we select the rectified linear unit as the activation function to accelerate training and avoid vanishing gradients. For a weight layer with 3×3 filters, we pad one pixel so that all feature maps have the same size, which can improve the edge information of the reconstructed image. Furthermore, the initial learning rate is set to 0.1 and then halved every 10 epochs, which can accelerate network convergence.
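The padding and learning-rate settings described above can be checked with a little arithmetic. The sketch below is illustrative only and is not the authors' training code: the helper names `conv_output_size` and `learning_rate` and the 41-pixel patch side are assumptions introduced for the example, and the schedule assumes that the halving takes effect at epochs 10, 20, and so on, counting from epoch 0.

```python
def conv_output_size(size: int, kernel: int, padding: int, stride: int = 1) -> int:
    """Spatial size after a convolution: floor((size + 2*padding - kernel) / stride) + 1."""
    return (size + 2 * padding - kernel) // stride + 1


def learning_rate(epoch: int, initial: float = 0.1, step: int = 10,
                  factor: float = 0.5) -> float:
    """Step schedule: start at `initial` and multiply by `factor` every `step` epochs."""
    return initial * factor ** (epoch // step)


if __name__ == "__main__":
    patch_side = 41  # hypothetical training patch side, not taken from the abstract

    # A 3x3 filter with one pixel of padding keeps the feature-map size unchanged,
    # as does a 1x1 filter with no padding.
    print(conv_output_size(patch_side, kernel=3, padding=1))
    print(conv_output_size(patch_side, kernel=1, padding=0))

    # Without padding, every 3x3 layer would shrink the map by two pixels.
    print(conv_output_size(patch_side, kernel=3, padding=0))

    # Learning rate at a few sample epochs under the assumed step schedule.
    for epoch in (0, 9, 10, 20, 35):
        print(epoch, learning_rate(epoch))
```

Keeping every feature map the same size is what allows features from different layers and different fusion units to be fused directly, which the multi-path design relies on.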
We train with mini-batch SGD and set the momentum parameter to 0.9. We use 291 images as the training set. In addition, we apply data augmentation (rotation by 90°, 180°, and 270°, and vertical flip) to enlarge the training set, which mitigates overfitting and increases sample diversity. The network is trained with multiple scale factors (×2, ×3, and ×4) so that it can handle reconstruction at different scale factors. Result All experiments are implemented in the PyTorch framework. We use four common benchmark sets (Set5, Set14, B100, and Urban100) to evaluate our model, with peak signal-to-noise ratio as the evaluation criterion. Images in RGB space are converted to YCbCr space. The proposed algorithm reconstructs only the luminance channel Y because human vision is highly sensitive to luminance. The Cb and Cr channels are reconstructed by interpolation. For a scale factor of 4, the results on the four benchmark sets are 31.69 dB, 28.24 dB, 27.39 dB, and 25.46 dB, respectively. The proposed method shows better performance and visual quality than Bicubic, A+, SRCNN, VDSR, DRCN, and DRRN. In addition, we have validated the effectiveness of the proposed components: the multipath mode, the staged fusion unit, and the pixel encoding network. Conclusion The proposed network overcomes the shortcomings of the chained structure and extracts abundant high-frequency information by fully utilizing the hierarchical features. It also uses the structural feature information carried by the low-resolution image to complete the reconstruction. Furthermore, techniques including dense connection and residual learning are adopted to accelerate convergence and mitigate gradient problems during training. Extensive experiments show that the proposed method can reconstruct an image with more high-frequency details than other methods with the same preprocessing step.
In future work, we will consider applying recursive learning and increasing the number of training samples to optimize the model further.
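As a rough illustration of the evaluation protocol, the following sketch computes the peak signal-to-noise ratio on the luminance channel only. It is a minimal example under stated assumptions, not the authors' evaluation script: the BT.601 conversion coefficients, the 8-bit peak value of 255, and the helper names `luminance` and `psnr_y` are not given in the abstract, and any border cropping used in the experiments is not modeled.

```python
import math
from typing import Sequence, Tuple

Pixel = Tuple[float, float, float]   # (R, G, B), each in [0, 255]
Image = Sequence[Sequence[Pixel]]    # rows of pixels


def luminance(pixel: Pixel) -> float:
    """Y of YCbCr for an 8-bit RGB pixel (assumed BT.601 coefficients, range 16..235)."""
    r, g, b = pixel
    return 16.0 + (65.481 * r + 128.553 * g + 24.966 * b) / 255.0


def psnr_y(reference: Image, reconstructed: Image, peak: float = 255.0) -> float:
    """PSNR in dB between the luminance channels of two equally sized RGB images."""
    if len(reference) != len(reconstructed):
        raise ValueError("images must have the same number of rows")
    squared_error = 0.0
    count = 0
    for ref_row, rec_row in zip(reference, reconstructed):
        if len(ref_row) != len(rec_row):
            raise ValueError("image rows must have the same length")
        for ref_px, rec_px in zip(ref_row, rec_row):
            diff = luminance(ref_px) - luminance(rec_px)
            squared_error += diff * diff
            count += 1
    if count == 0:
        raise ValueError("images must contain at least one pixel")
    mse = squared_error / count
    if mse == 0.0:
        return math.inf  # identical luminance channels
    return 10.0 * math.log10(peak * peak / mse)


if __name__ == "__main__":
    # Tiny hypothetical 2x2 images, used only to exercise the functions.
    ground_truth = [[(120, 130, 140), (10, 20, 30)],
                    [(200, 180, 160), (90, 90, 90)]]
    estimate = [[(121, 129, 141), (12, 19, 30)],
                [(198, 181, 161), (90, 92, 89)]]
    print(psnr_y(ground_truth, estimate))
```

Because only Y is compared, errors in the interpolated Cb and Cr channels do not affect the reported score.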
Keywords
