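Lightweight attention feature selection recursive network for super-resolution reconstruction
Abstract
Objective Deep convolutional networks perform very well in image super-resolution (SR) reconstruction, and a growing number of methods favor deeper and wider network designs. However, such complex architectures place ever higher demands on computing resources. With the spread of intelligent edge devices such as smartphones, efficient SR algorithms have broad practical applications. This paper therefore proposes an extremely lightweight and efficient SR network that uses a recursive feature selection unit and a parameter-sharing mechanism to greatly reduce both the number of parameters and the floating point operations (FLOPs) while retaining excellent reconstruction performance. Method The network consists of three parts: shallow feature extraction, deep feature extraction and upsampling reconstruction. The shallow feature extraction module contains a single convolutional layer. The features it produces pass recursively through a feature selection unit equipped with an efficient channel attention module, which performs the non-linear mapping and extracts deep features. The feature selection unit contains a feature enhancement module made up of several convolutional layers; it retains part of the features from each convolutional layer and fuses them at the end of the module to enhance hierarchical information. The efficient channel attention module then recalibrates the features of each channel. The recursive mechanism (six recursions) effectively improves performance while greatly reducing the number of parameters. In upsampling reconstruction, a parameter-shared upsampling module upscales the shallow and deep features and fuses them to obtain the high-resolution image. Result Compared with state-of-the-art lightweight networks, the proposed network greatly reduces the number of parameters and FLOPs. In quantitative evaluation on the benchmark datasets Set5, Set14, B100, Urban100 and Manga109, it also achieves better results on the image quality metrics peak signal-to-noise ratio (PSNR) and structural similarity (SSIM). Conclusion The recursive feature selection unit effectively mines the high-frequency information of images, and the parameter-sharing mechanism greatly reduces the number of parameters, achieving lightweight, high-quality SR reconstruction.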
Keywords
Lightweight attention feature selection recursive network for super-resolution
Xu Wenjie1,2, Song Huihui1,2, Yuan Xiaotong1,2, Liu Qingshan1,2
(1. Collaborative Innovation Center on Atmospheric Environment and Equipment Technology, Nanjing University of Information Science and Technology, Nanjing 210044, China; 2. Jiangsu Key Laboratory of Big Data Analysis Technology, Nanjing 210044, China)
Abstract
Objective Deep convolutional neural networks have shown strong reconstruction ability in the image super-resolution (SR) task. Efficient super-resolution has broad practical application because of the popularity of intelligent edge devices such as mobile phones. This paper proposes a very lightweight and efficient super-resolution network. Based on a recursive feature selection module and a parameter sharing mechanism, the proposed method greatly reduces the number of parameters and floating point operations (FLOPs) while achieving excellent reconstruction performance. Method The proposed lightweight attention feature selection recursive network (AFSNet) consists of three key components: low-level feature extraction, high-level feature extraction and upsample reconstruction. In the low-level feature extraction part, the input low-resolution image passes through a 3×3 convolutional layer that extracts the low-level features. In the high-level feature extraction part, a recursive feature selection module (FSM) is designed to capture the high-level features. At the end of the network, a shared upsample block super-resolves the low-level and high-level features to obtain the final high-resolution image. Specifically, the FSM contains a feature enhancement block and an efficient channel attention block. The feature enhancement block has four convolutional layers. Unlike a plain cascade of convolutional layers, this block retains part of the features from each convolutional layer and fuses them at the end of the block. Features extracted by different convolutional layers carry different levels of hierarchical information, so the network preserves part of them step by step and aggregates them at the end of the block. An efficient channel attention (ECA) block follows the feature enhancement block.
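The channel flow through the feature enhancement block can be sketched as simple bookkeeping. This is a minimal illustration, not the authors' implementation. The widths (64 channels per convolution, 16 retained at each split) are the values given in the training details of this abstract. The function name feb_channel_plan and the choice that the fourth convolution outputs only the retained slice are assumptions made for the example.

```python
def feb_channel_plan(channels=64, keep=16, n_convs=4):
    """Trace channel widths through a progressive split-and-fuse block.

    Returns a list of (in_ch, out_ch, kept, passed) per convolution and the
    width of the concatenated (fused) tensor. Assumption: the last
    convolution emits only the `keep` channels that are retained.
    """
    plan = []
    in_ch = channels
    for layer in range(n_convs):
        last = layer == n_convs - 1
        out_ch = keep if last else channels
        passed = 0 if last else out_ch - keep  # channels sent to the next conv
        plan.append((in_ch, out_ch, keep, passed))
        in_ch = passed
    fused = sum(kept for _, _, kept, _ in plan)  # width after concatenation
    return plan, fused


plan, fused = feb_channel_plan()
for i, (cin, cout, kept, passed) in enumerate(plan, start=1):
    print(f"conv{i}: in={cin} out={cout} kept={kept} passed={passed}")
print("fused width:", fused)
```

Under these assumptions the four retained slices concatenate back to 64 channels, so the block's output width matches its input width and the block can be applied recursively.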
Unlike the channel attention (CA) in the residual channel attention network (RCAN), which uses two 1×1 convolutional layers with dimensionality reduction to realize non-linear mapping and cross-channel interaction, the ECA avoids the dimensionality reduction operation. It implements a local cross-channel interaction strategy without dimensionality reduction via one-dimensional (1D) convolution. Furthermore, the ECA block adaptively selects the kernel size of the 1D convolution, which determines the coverage of local cross-channel interaction. The ECA block improves reconstruction performance without a meaningful increase in the number of parameters. The network also employs a recursive mechanism to share parameters across the feature enhancement blocks, which sharply reduces the number of parameters. At the end of the high-level feature extraction part, the network concatenates and fuses the outputs of all FSMs. This multi-stage feature fusion (MSFF) mechanism lets the network capture valuable contextual information. In the upsample reconstruction part, the network uses a shared upsample block, which consists of a convolutional layer and a sub-pixel layer, to reconstruct the low-level and high-level features into a high-resolution image. The high-resolution image thus fuses low- and high-frequency information without increasing the number of parameters. Result The DF2K dataset is used for training; it comprises 800 images from the DIV2K dataset and 2,650 images from the Flickr2K dataset. Data augmentation consists of random horizontal flipping and 90-degree rotation. The corresponding low-resolution images are obtained by bicubic downsampling of the high-resolution images with scale factors of ×2, ×3 and ×4. Evaluation uses five benchmark datasets: Set5, Set14, B100, Urban100 and Manga109.
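To make the comparison between CA and ECA concrete, the sketch below counts the learnable parameters of each attention variant for a given channel width. It is a rough illustration under stated assumptions. The adaptive kernel-size rule with gamma=2 and b=1 is the one published for ECA-Net, and the reduction ratio of 16 is the RCAN default; the abstract does not state which values AFSNet uses. The function names are hypothetical.

```python
import math


def eca_kernel_size(channels, gamma=2, b=1):
    """Adaptive 1D kernel size: nearest odd number to (log2(C) + b) / gamma.

    gamma and b are the ECA-Net defaults, assumed here for illustration.
    """
    t = int(abs((math.log2(channels) + b) / gamma))
    return t if t % 2 else t + 1


def ca_params(channels, reduction=16):
    """Parameters of a squeeze-and-excitation style CA with two 1x1 convs."""
    hidden = channels // reduction
    squeeze = channels * hidden + hidden      # C -> C/r, with bias
    excite = hidden * channels + channels     # C/r -> C, with bias
    return squeeze + excite


def eca_params(channels, gamma=2, b=1):
    """Parameters of ECA: a single bias-free 1D convolution kernel."""
    return eca_kernel_size(channels, gamma, b)


c = 64
print("ECA kernel size:", eca_kernel_size(c))
print("CA parameters:  ", ca_params(c))
print("ECA parameters: ", eca_params(c))
```

With 64 channels the 1D kernel has only a handful of weights, which is why adding ECA leaves the overall parameter budget practically unchanged.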
Peak signal-to-noise ratio (PSNR) and structural similarity index (SSIM) are used as the evaluation metrics for reconstruction performance. Following the evaluation protocol of the residual dense network (RDN), image borders are cropped and the metrics are calculated on the luminance channel of the transformed YCbCr space. In training, 16 low-resolution patches of size 48×48 and their corresponding high-resolution patches are randomly cropped. In the high-level feature extraction stage, six recursive feature selection modules are used. The number of channels in each convolutional layer of the FSM is set to C=64. In each channel split operation, the features of 16 channels are preserved and the remaining 48 channels proceed to the next convolution. The network is trained with the L1 loss function and optimized with the Adam optimizer. The initial learning rate is set to 2×10⁻⁴ and halved every 200 epochs. The network is implemented in the PyTorch framework and accelerated with an NVIDIA 2080 Ti GPU. AFSNet is compared with several state-of-the-art lightweight single image super-resolution (SISR) methods based on convolutional neural networks (CNNs). AFSNet achieves the best PSNR and SSIM among all compared methods on almost all benchmark datasets, the exception being the ×2 results on Set5. In particular, AFSNet has far fewer parameters and much lower FLOPs. For ×4 SR on the Set14 test dataset, PSNR increases by 0.4 dB, 0.6 dB and 0.43 dB compared with SRFBN-S, IDN and CARN-M respectively, while the number of parameters of AFSNet is lower by 47%, 53% and 38% respectively. Meanwhile, AFSNet requires 24.5 G FLOPs, lower than the usual 30 G FLOPs. In addition, an ablation study is conducted on the effectiveness of the ECA module and the MSFF mechanism.
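The evaluation protocol (crop the borders, then measure PSNR on the luminance channel) can be sketched in plain Python as follows. This is an illustrative sketch rather than the authors' evaluation script. The luma coefficients are the ITU-R BT.601 values for 8-bit RGB, and the border width is left as a parameter because the abstract does not state how many pixels are cropped. The names rgb_to_y and psnr_y are hypothetical.

```python
import math


def rgb_to_y(r, g, b):
    """BT.601 luma of an 8-bit RGB pixel; the result lies in [16, 235]."""
    return 16.0 + (65.481 * r + 128.553 * g + 24.966 * b) / 255.0


def psnr_y(sr, hr, border):
    """PSNR on the Y channel after cropping `border` pixels from each side.

    sr, hr: images as H x W nested lists of (r, g, b) tuples in [0, 255].
    The border width is an assumption supplied by the caller.
    """
    h, w = len(hr), len(hr[0])
    if h - 2 * border <= 0 or w - 2 * border <= 0:
        raise ValueError("border is too large for this image")
    sq_err, n = 0.0, 0
    for i in range(border, h - border):
        for j in range(border, w - border):
            d = rgb_to_y(*sr[i][j]) - rgb_to_y(*hr[i][j])
            sq_err += d * d
            n += 1
    mse = sq_err / n
    return math.inf if mse == 0 else 10.0 * math.log10(255.0 ** 2 / mse)


# Toy 4x4 images: identical inputs give infinite PSNR.
black = [[(0, 0, 0)] * 4 for _ in range(4)]
white = [[(255, 255, 255)] * 4 for _ in range(4)]
print(psnr_y(black, black, border=1))
print(round(psnr_y(black, white, border=1), 4))
```

Measuring on Y only and excluding the borders keeps the numbers comparable with other methods that follow the same protocol.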
The ablation uses ×4 SR on Set5 as the test setting. When the ECA module and the MSFF mechanism are removed from AFSNet, PSNR decreases by 0.09 dB and 0.11 dB respectively, which shows the effectiveness of both components. Conclusion This paper presents a lightweight attention feature selection recursive network for super-resolution that improves reconstruction performance without a large number of parameters or FLOPs. The network employs a 3×3 convolutional layer in the low-level feature extraction part to extract low-level features from the low-resolution (LR) image; six recursive feature selection modules then learn the non-linear mapping and exploit high-level features. The FSM preserves hierarchical features step by step and aggregates them according to the importance of the candidate features as evaluated by the efficient channel attention module. Meanwhile, multi-stage feature fusion, which concatenates the outputs of all FSMs, effectively captures contextual information from different stages. The extracted low-level and high-level features are upsampled by a parameter-shared upsample block.
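The sub-pixel layer inside the upsample block rearranges channels into spatial positions. The pure-Python sketch below shows that rearrangement for a scale factor r, using the same index convention as PyTorch's PixelShuffle. It is a minimal illustration only: the convolution that precedes the sub-pixel layer is omitted, and the function name pixel_shuffle and the nested-list tensor layout are choices made for the example.

```python
def pixel_shuffle(x, r):
    """Rearrange a [C*r*r][H][W] nested list into [C][H*r][W*r].

    Output pixel (c, y*r + i, z*r + j) is taken from input channel
    c*r*r + i*r + j at position (y, z).
    """
    c_in, h, w = len(x), len(x[0]), len(x[0][0])
    if c_in % (r * r):
        raise ValueError("channel count must be divisible by r*r")
    c_out = c_in // (r * r)
    out = [[[0] * (w * r) for _ in range(h * r)] for _ in range(c_out)]
    for c in range(c_out):
        for i in range(r):
            for j in range(r):
                src = x[c * r * r + i * r + j]
                for y in range(h):
                    for z in range(w):
                        out[c][y * r + i][z * r + j] = src[y][z]
    return out


# Four 1x1 channels become one 2x2 channel for r = 2.
features = [[[0]], [[1]], [[2]], [[3]]]
print(pixel_shuffle(features, 2))  # [[[0, 1], [2, 3]]]
```

Because the rearrangement itself has no learnable weights, sharing the block between the low-level and high-level branches only requires reusing the one convolution in front of it.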
Keywords
image super-resolution; lightweight networks; recursive mechanism; parameter sharing; feature enhancement; efficient channel attention