Spatiotemporal fusion of remote sensing images via conditional generative adversarial networks

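Li Changjie1,2, Song Huihui1,2, Zhang Kaihua1,2, Zhang Xiaolu1,2, Liu Qingshan1,2 (1. Collaborative Innovation Center on Atmospheric Environment and Equipment Technology, Nanjing University of Information Science and Technology, Nanjing 210044; 2. Jiangsu Key Laboratory of Big Data Analysis Technology, Nanjing 210044)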

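Abstract
Objective Hardware limitations of satellite remote sensing create a trade-off between the temporal and spatial resolution of the acquired images. Spatiotemporal fusion offers an efficient, low-cost way to resolve this problem by fusing two types of remote sensing data with complementary spatial and temporal properties (typically Landsat and MODIS (moderate-resolution imaging spectroradiometer) images) into fused data with both high spatial and high temporal resolution. Method A spatiotemporal fusion method based on conditional generative adversarial networks is proposed, which can efficiently process the large volumes of remote sensing data found in practical applications. Compared with existing learning models, the model has the following advantages: 1) it explicitly relates MODIS and Landsat images by learning a nonlinear mapping; 2) it learns effective image features automatically; 3) it unifies feature extraction, nonlinear mapping, and image reconstruction in a single framework for optimization. In the training stage, a conditional generative adversarial network is used to establish the nonlinear mapping between downsampled Landsat and MODIS images, and a multiscale superresolution conditional generative adversarial network is then trained between the original and downsampled Landsat images. The prediction process contains two layers, each comprising a prediction based on a conditional generative adversarial network and a fusion model; the two layers respectively realize the nonlinear mapping from MODIS to downsampled Landsat data and the superresolution reconstruction from downsampled Landsat to original Landsat data. Result On the benchmark datasets CIA (Coleambally Irrigation Area) and LGC (Lower Gwydir Catchment), the conditional generative adversarial network method achieves the leading results on all four evaluation metrics. For example, on the CIA dataset, RMSE (root mean squared error), SAM (spectral angle mapper), SSIM (structural similarity), and ERGAS (erreur relative globale adimensionnelle de synthèse) improve on average by 0.001, 0.15, 0.008, and 0.065, respectively; on the LGC dataset, they improve on average by 0.0012, 0.7, 0.018, and 0.0089, respectively. These results are clearly better than those of existing methods based on sparse representation and on convolutional neural networks. Conclusion The proposed conditional generative adversarial fusion model can fully learn the complex nonlinear mapping between Landsat and MODIS images and produces more accurate fusion results.
Keywords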
Spatiotemporal fusion of satellite images via conditional generative adversarial learning

Li Changjie1,2, Song Huihui1,2, Zhang Kaihua1,2, Zhang Xiaolu1,2, Liu Qingshan1,2 (1. Collaborative Innovation Center on Atmospheric Environment and Equipment Technology, Nanjing University of Information Science and Technology, Nanjing 210044, China; 2. Jiangsu Key Laboratory of Big Data Analysis Technology, Nanjing 210044, China)

Abstract
Objective Spatiotemporal fusion of satellite images is an important problem in remote sensing fusion research. As global environmental change intensifies, satellite remote sensing data play an indispensable role in monitoring crop growth and landform changes. Dynamic monitoring depends on continuous observation, so high temporal resolution is an essential attribute of the required remote sensing data. Moreover, the fragmentation of the global terrestrial landscape means that these applications also require data with high spatial resolution. However, current satellite platforms can hardly capture data with both high spatial and high temporal resolution because of technological and cost constraints. For example, Landsat images have a high spatial resolution but a low temporal resolution, whereas MODIS (moderate-resolution imaging spectroradiometer) images have a high temporal resolution but a low spatial resolution. Spatiotemporal fusion provides an effective way to fuse two such types of remote sensing data with complementary spatial and temporal properties (Landsat and MODIS images are typical representatives) and to generate fused data with both high spatial and high temporal resolution, which also greatly facilitates research on actual terrain and landform changes. Method A spatiotemporal fusion method based on the conditional generative adversarial network (CGAN), which can efficiently handle massive remote sensing data in practical applications, is proposed to solve this problem. The CGAN extends the GAN (generative adversarial network) by introducing the internal ground truth image as a condition variable to guide the learning of the discriminator network, which makes training more directed and easier.
In this study, the asymmetric Laplacian pyramid network is used as the generator of the CGAN, and the VGG (visual geometry group) net is taken as the discriminator. The asymmetric Laplacian pyramid network consists of two branches: a high-frequency branch, which mainly extracts image details (residual images), and a low-frequency branch, which extracts shallow features. The two branches progressively reconstruct the images in a coarse-to-fine manner. The discriminator is the VGG19 (visual geometry group 19-layer net) network, in which the ReLU activation function is replaced by the Leaky ReLU function and the number of channels of the convolutional kernels doubles progressively from 64 to 1,024. A fully connected layer and a sigmoid activation function then yield the probability of the sample class. Two CGAN models are designed: one for the nonlinear mapping from MODIS to downsampled Landsat images, and a superresolution model that reconstructs original Landsat images from downsampled Landsat images. Compared with existing shallow learning methods, especially those based on sparse representation, the proposed CGAN-based model has the following merits: 1) it explicitly correlates MODIS and downsampled Landsat images by learning a nonlinear mapping; 2) it automatically learns and extracts effective image features and image details; and 3) it unifies feature extraction, nonlinear mapping, and image reconstruction into one optimization framework. In the training stage, a nonlinear mapping is first trained between the MODIS and downsampled Landsat data using the CGAN model. Then, a multiscale superresolution CGAN is trained between the downsampled and original Landsat data. The prediction procedure contains two layers, each consisting of a CGAN-based prediction and a fusion model. The fusion model adopts a high-pass model, which is explained later in the paper.
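The coarse-to-fine reconstruction from a low-frequency image plus high-frequency residuals can be illustrated with a classical (non-learned) Laplacian pyramid. The following is only a minimal NumPy sketch of that idea, not the asymmetric network of the paper: the function names are hypothetical, every level uses a fixed factor of 2, and block averaging and nearest-neighbour repetition stand in for the learned feature extraction and upsampling.

```python
import numpy as np

def downsample2(img):
    """Halve resolution by averaging non-overlapping 2x2 blocks (even sizes only)."""
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def upsample2(img):
    """Double resolution by nearest-neighbour repetition."""
    return np.repeat(np.repeat(img, 2, axis=0), 2, axis=1)

def build_pyramid(img, levels):
    """Return the coarsest image and the residuals ordered from coarse to fine."""
    residuals = []
    current = img
    for _ in range(levels):
        low = downsample2(current)
        # High-frequency detail lost by going one level coarser.
        residuals.append(current - upsample2(low))
        current = low
    return current, residuals[::-1]

def reconstruct(coarse, residuals):
    """Rebuild the image coarse to fine: upsample, then add the residual."""
    current = coarse
    for res in residuals:
        current = upsample2(current) + res
    return current

rng = np.random.default_rng(0)
image = rng.random((8, 8))          # synthetic single-band image
coarse, residuals = build_pyramid(image, levels=2)
restored = reconstruct(coarse, residuals)
```

In a learned pyramid network the residuals are predicted by the high-frequency branch instead of being computed from the target image, which is what makes superresolution possible at prediction time.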
One layer performs the nonlinear mapping from MODIS to downsampled Landsat data; the other is the superresolution reconstruction network, which performs image superresolution at upsampling scales of two and five times. Result Four indicators are commonly used to evaluate the performance of spatiotemporal fusion of remote sensing images. The first is the root mean square error, which measures the radiometric difference between the fusion result and the ground truth. The spectral angle mapper is the second index and measures the spectral distortion of the result. The structural similarity is the third metric and measures the similarity of the overall spatial structures between the fusion result and the ground truth. Finally, the erreur relative globale adimensionnelle de synthèse is selected as the last index to evaluate the overall fusion result. Extensive evaluations are carried out on two groups of commonly used Landsat-MODIS benchmark datasets. A quantitative evaluation of all predicted dates and the visual effects on one key date show that the method achieves more accurate fusion results than sparse-representation-based methods and deep convolutional networks. Conclusion A CGAN model that introduces an external condition to reconstruct images better is proposed. A nonlinear mapping CGAN is trained to deal with the highly nonlinear correspondence between downsampled Landsat and MODIS data. Moreover, a multiscale superresolution CGAN is trained to bridge the huge spatial resolution gap (10 times) between original and downsampled Landsat data. The model is compared experimentally with existing methods, such as sparse-representation-based methods and deep convolutional neural network methods. Experimental results show that the model outperforms several state-of-the-art spatiotemporal fusion approaches.
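Three of the four indicators named above have short closed forms. The sketch below shows one common way to compute them with NumPy. The function names, the (height, width, bands) array layout, the choice of radians for SAM, and the `ratio` argument of ERGAS (the ratio between the fine and coarse pixel sizes) are assumptions made for illustration, and the exact variants used in the paper may differ. SSIM is omitted because it needs a windowed implementation.

```python
import numpy as np

def rmse(pred, ref):
    """Root mean squared error over all pixels and bands."""
    return float(np.sqrt(np.mean((pred - ref) ** 2)))

def sam(pred, ref, eps=1e-12):
    """Mean spectral angle in radians between per-pixel spectra."""
    dot = np.sum(pred * ref, axis=-1)
    norms = np.linalg.norm(pred, axis=-1) * np.linalg.norm(ref, axis=-1)
    # Guard against zero-length spectra and rounding outside [-1, 1].
    cos = np.clip(dot / np.maximum(norms, eps), -1.0, 1.0)
    return float(np.mean(np.arccos(cos)))

def ergas(pred, ref, ratio):
    """ERGAS = 100 * ratio * sqrt(mean over bands of (RMSE_k / mean_k)^2)."""
    band_rmse = np.sqrt(np.mean((pred - ref) ** 2, axis=(0, 1)))
    band_mean = np.mean(ref, axis=(0, 1))
    return float(100.0 * ratio * np.sqrt(np.mean((band_rmse / band_mean) ** 2)))

# Synthetic example: a constant reference and a uniformly biased prediction.
ref = np.full((4, 4, 3), 0.5)
pred = ref + 0.1
scores = (rmse(pred, ref), sam(pred, ref), ergas(pred, ref, ratio=0.1))
```

For RMSE, SAM, and ERGAS, lower values indicate a better fusion result, whereas for SSIM higher is better.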
Keywords
