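Super-resolution reconstruction of a single remote sensing image combined with deep learning
Abstract
Objective To overcome the drawbacks of traditional super-resolution reconstruction methods for remote sensing images, which depend on multi-temporal image sequences of the same scene and require prior registration; to address the low training efficiency and overfitting of learning-based methods; and to weaken the block effect left by interpolation, thereby improving the super-resolution reconstruction of a single remote sensing image. Method First, a deep neural network based on four convolutional layers is constructed, and a parametric rectified linear unit layer and a local response normalization layer are added after each of the first three convolutional layers for optimization; training this network yields the super-resolution reconstruction model for remote sensing images. Next, bicubic interpolation is applied to the luminance space of a multiband remote sensing image, and the model is used to reconstruct the interpolated result. Guided by the reconstructed luminance result, joint bilateral filtering is then used to enhance the edge details of the chrominance space. Result When applied to the experimental remote sensing images at reconstruction factors of 2, 3, and 4, the method outperforms the comparison methods on the no-reference indexes, with the mean clarity improving by about 2.5 units. It also achieves good full-reference results: at a factor of 2, the peak signal-to-noise ratio is about 2 dB higher than that of traditional interpolation. The average training efficiency is more than 3 times that of other learning-based methods, and the reconstructed remote sensing images look finer and more natural. Conclusion Experimental results show that the designed network resists overfitting well and trains efficiently. Reconstruction works on a single remote sensing image, does not depend on image sequences, and is not affected by the band; the reconstructed results show good detail, and the method is broadly applicable.
Keywords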
Super-resolution reconstruction of single remote sensing image combined with deep learning
Li Xin1,2, Wei Hongwei1, Zhang Hongqun1 (1. Institute of Remote Sensing and Digital Earth, Chinese Academy of Sciences, Beijing 100094, China; 2. University of Chinese Academy of Sciences, Beijing 100049, China)
Abstract
Objective Super-resolution (SR), which restores a high-resolution (HR) image from a single low-resolution (LR) image or a sequence of LR images, is widely applied in image processing, especially in remote sensing. Demand for HR remote sensing images is growing with the rapid advancement of remote sensing technology in agriculture and forestry monitoring, urban planning, and military reconnaissance. However, traditional interpolation-based methods cannot achieve satisfactory results, while reconstruction-based methods require pre-registration and are constrained by the lack of sequential images. Several modern learning-based methods still suffer from complicated networks, long training times, and neglect of the chrominance space. To solve these problems, this paper proposes a novel SR method combined with deep learning that achieves high-quality SR reconstruction of a single remote sensing image, thereby overcoming traditional drawbacks such as dependence on image sequences or registration. The proposed method also aims to improve training efficiency, reduce the risk of overfitting during training, and weaken the block effect left by chrominance interpolation. Method The proposed SR reconstruction is carried out in the luminance and chrominance spaces of a single remote sensing image. First, a network model named PL-CNN, based on a four-layer convolutional neural network (CNN), is optimized with parametric rectified linear unit (PReLU) and local response normalization (LRN) layers placed after each of the first three convolutional layers, in view of the autocorrelation and texture richness of remote sensing images. In the PL-CNN, the first to fourth convolutional layers successively perform feature extraction, feature enhancement, nonlinear mapping, and reconstruction. The PReLU layers accelerate training while retaining image features. The LRN layers are used to avoid overfitting, which further enhances the final SR result.
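The two layer types added to the network can be illustrated with a minimal sketch. It assumes the common across-channel form of LRN, b_i = a_i / (k + alpha * sum of a_j^2 over n neighboring channels)^beta; the function names and the default values of n, k, alpha, and beta are illustrative assumptions, not the settings used in PL-CNN.

```python
def prelu(x, slope):
    """PReLU: pass positive inputs through, scale negative inputs by a learned slope."""
    return x if x > 0 else slope * x


def lrn(channels, n=5, k=2.0, alpha=1e-4, beta=0.75):
    """Across-channel local response normalization at one pixel location.

    channels: activations of all feature maps at the same pixel.
    n, k, alpha, beta: illustrative hyperparameters (assumed, not from the paper).
    """
    half = n // 2
    out = []
    for i, a in enumerate(channels):
        # Neighboring channels within the window, clipped at the ends.
        lo = max(0, i - half)
        hi = min(len(channels), i + half + 1)
        energy = sum(v * v for v in channels[lo:hi])
        out.append(a / (k + alpha * energy) ** beta)
    return out
```

Unlike a plain ReLU, the PReLU keeps a scaled copy of negative responses, which is one way to read the statement that it retains image features; the LRN divides each response by the local energy of its neighboring feature maps.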
Then, the proposed PL-CNN is trained for 2.5 million iterations at a given upscaling factor, with the mean square error as the loss function, to obtain the SR model. The training data come from the UC Merced land use dataset, which has a resolution of 0.3 m and covers 21 categories of remote sensing scenes. The training inputs are simulated LR remote sensing image patches, and the target outputs are the corresponding original HR remote sensing images. For multiband images, the model is used to obtain a reconstructed result in the luminance space. Then, joint bilateral filtering with a 3×3 pixel window, guided by this luminance result, is applied to improve the edge details of the chrominance space after bicubic interpolation. A single-band image can be considered a special case of a multiband image whose reconstruction excludes the chrominance part. Result A series of simulation experiments is conducted to verify the validity and applicability of the proposed SR method, and a dataset (RS5) of five remote sensing images with different sizes and resolutions is established to serve as the experimental images. Full-reference and no-reference evaluations are applied to assess the quality of the SR reconstructed images objectively and fairly. The full-reference indexes are peak signal-to-noise ratio (PSNR) and structural similarity index (SSIM), while the no-reference indexes are the spatial-spectral entropy-based quality (SSEQ) index and clarity. Results show that the proposed reconstruction of RS5 is superior to the compared methods on the no-reference indexes at upscaling factors of 2, 3, and 4. The SSEQ is improved, and the mean clarity value increases by about 2.5 units. The proposed method also shows advantages in PSNR and efficiency: at an upscaling factor of 2, its PSNR is about 2 dB higher than that of bicubic interpolation, and its average training time is one-third or less of that of the other learning-based methods.
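The chrominance step can be sketched as a joint bilateral filter over a 3×3 window, in which the range weight is computed from the reconstructed luminance (the guide) rather than from the chrominance channel itself. This is a minimal illustration on nested lists; the function name and the Gaussian widths sigma_s and sigma_r are assumptions, not values reported for the method.

```python
import math


def joint_bilateral_3x3(chroma, guide, sigma_s=1.0, sigma_r=0.1):
    """Filter `chroma` with weights taken from `guide` (same size, 2D lists).

    sigma_s: spatial Gaussian width in pixels (assumed value).
    sigma_r: range Gaussian width on guide intensities (assumed value).
    """
    h, w = len(chroma), len(chroma[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            num = den = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:
                        # Spatial weight: closer neighbors count more.
                        ws = math.exp(-(dx * dx + dy * dy) / (2 * sigma_s ** 2))
                        # Range weight: neighbors with similar luminance count more.
                        diff = guide[yy][xx] - guide[y][x]
                        wr = math.exp(-(diff * diff) / (2 * sigma_r ** 2))
                        num += ws * wr * chroma[yy][xx]
                        den += ws * wr
            # den is at least 1 because the center pixel always has weight 1.
            out[y][x] = num / den
    return out
```

Because neighbors across a luminance edge receive almost no weight, chrominance values are averaged only within regions of similar luminance, which is how the guided filter can smooth blocky interpolation while keeping edges sharp.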
The visualized first-layer filters are rich in texture, and the typical feature maps are progressively enhanced from layer to layer. The ability of joint bilateral filtering to remove the block effect and sharpen edges is readily verified by comparing the chrominance-space images before and after filtering. Furthermore, the PSNR continues to improve as the number of iterations increases, which indicates a possible direction for further improvement. To verify the band applicability of the proposed method, a Landsat-8 image of Tangshan, China, is selected for reconstruction with PL-CNN and decomposed into red, green, and blue bands. The PSNR of each band exceeds 28 dB, and the average SSIM is approximately 98.5%. The means and standard deviations of the original and reconstructed images are close in all three bands, which shows that the proposed method is not restricted by band and is robustly applicable. Conclusion An SR reconstruction method for a single remote sensing image combined with deep learning is proposed. The optimized network, PL-CNN, built on the CNN, extracts additional features and resists overfitting well. Moreover, the PReLU structure effectively accelerates training. Experimental results suggest that the proposed method does not depend on image sequences or on the band; it targets a single remote sensing image, takes the chrominance space into account, and its reconstruction quality at several upscaling factors shows clear advantages over traditional SR reconstruction methods. Owing to the natural and clear visual effect of images reconstructed with PL-CNN, the method has broad prospects, especially in remote sensing. Future studies may use additional samples, appropriately increase the number of iterations, and focus on higher upscaling factors.
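For reference, the per-band PSNR figures quoted above follow the standard definition PSNR = 10 log10(peak^2 / MSE). The sketch below computes it for one band stored as nested lists; the function names and the 8-bit peak value of 255 are assumptions made for illustration, since the bit depth used in the experiments is not stated here.

```python
import math


def mse(ref, test):
    """Mean square error between two equally sized 2D bands."""
    total = 0.0
    count = 0
    for row_ref, row_test in zip(ref, test):
        for a, b in zip(row_ref, row_test):
            total += (a - b) ** 2
            count += 1
    return total / count


def psnr(ref, test, peak=255.0):
    """Peak signal-to-noise ratio in dB; `peak` assumes 8-bit data."""
    err = mse(ref, test)
    if err == 0:
        return math.inf  # identical bands
    return 10 * math.log10(peak * peak / err)
```

Applying `psnr` to each of the red, green, and blue bands separately gives one value per band, which is the form in which the band-applicability result is reported.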
Keywords
remote sensing image; super resolution; deep learning; convolutional neural networks (CNN); joint bilateral filtering