A review of research on multimodal remote sensing image registration methods

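Zhu Bai1,2, Ye Yuanxin1,2 (1. Faculty of Geosciences and Engineering, Southwest Jiaotong University, Chengdu 611756; 2. State-Province Joint Engineering Laboratory of Spatial Information Technology for High-Speed Railway Safety, Chengdu 611756)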

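Abstract
With the continuous development of Earth observation technology, numerous integrated stereoscopic observation facilities have been deployed on spaceborne, airborne, and terrestrial platforms. These sensors can dynamically provide multimodal remote sensing images with different spatial, temporal, and spectral resolutions, and only by making full use of these images can more reliable and comprehensive Earth observation results be delivered for applications such as natural resource management, disaster prevention and mitigation, and environmental monitoring. However, because different sensors have different imaging mechanisms, multimodal images exhibit significant radiation, geometric, temporal, and viewpoint differences, which pose great challenges for high-precision registration. To advance research on multimodal remote sensing image registration, this paper systematically sorts out, analyzes, introduces, and summarizes the current mainstream registration methods. It first traces how the research evolved from single-modal to multimodal remote sensing image registration. It then analyzes the core ideas of representative algorithms among area-based, feature-based, and deep-learning-based methods, and gives links to the code that has been open-sourced. It also shares the existing public multimodal remote sensing image registration datasets and describes their contents and characteristics in detail. Finally, it presents the problems and severe challenges that remain in high-precision registration of multimodal remote sensing images and gives an outlook on future research trends, with the aim of promoting deeper breakthroughs and innovations in the field.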
Keywords
Multimodal remote sensing image registration: a survey

Zhu Bai1,2, Ye Yuanxin1,2 (1. Faculty of Geosciences and Engineering, Southwest Jiaotong University, Chengdu 611756, China; 2. State-Province Joint Engineering Laboratory of Spatial Information Technology for High-Speed Railway Safety, Chengdu 611756, China)

Abstract
The advent of new infrastructure construction and the era of intelligent photogrammetry have facilitated the rapid development of global aerospace and aviation remote sensing technology. Numerous integrated stereoscopic observation facilities carrying multiple sensors have been deployed on spaceborne, airborne, and terrestrial platforms, and sensor types have developed from traditional single-mode sensors (e.g., optical sensors) to a new generation of multimodal sensors (e.g., multispectral, hyperspectral, light detection and ranging (LiDAR), and synthetic aperture radar (SAR) sensors). These advanced sensors can dynamically provide multimodal remote sensing images with different spatial, temporal, and spectral resolutions. Joint processing of spaceborne, airborne, and terrestrial multimodal data yields more reliable, comprehensive, and accurate observation results than single-modal sensors can provide. Multi-level and multi-perspective Earth observation can be effectively achieved only by fully integrating and utilizing the various multimodal remote sensing images, so investigating multimodal remote sensing image registration has great scientific significance. To promote the development of multimodal remote sensing image registration research, we systematically sort out, analyze, introduce, and summarize the current mainstream registration methods for multimodal remote sensing images. We first trace the evolution of research from single-modal to multimodal remote sensing image registration. We then analyze the core ideas of representative algorithms among the area-based, feature-based, and deep-learning-based pipelines, and introduce the contributions of the author team to the field. The area-based registration (template matching) pipeline mainly includes two types of methods: information-theory-based and structural-feature-based registration.
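As a rough illustration of the information-theory-based branch of template matching, the sketch below scores candidate offsets by mutual information and keeps the best one. It is a minimal example under stated assumptions and not any specific method from the survey: the function names (`mutual_information`, `match_template_mi`), the histogram bin count, the exhaustive search, and the synthetic "second modality" (a nonlinear, contrast-inverted mapping of the reference patch) are all hypothetical choices made for illustration.

```python
import numpy as np

def mutual_information(a, b, bins=8):
    """Mutual information (in nats) between two equally sized patches,
    estimated from their joint intensity histogram."""
    joint, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)   # marginal of a, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)   # marginal of b, shape (1, bins)
    nz = pxy > 0                          # skip empty cells to avoid log(0)
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))

def match_template_mi(image, template, search_origin, radius):
    """Exhaustively test top-left corners within +/- radius of search_origin
    (row, col) and return the corner with the highest mutual information."""
    th, tw = template.shape
    r0, c0 = search_origin
    best_score, best_pos = -np.inf, None
    for r in range(max(r0 - radius, 0), min(r0 + radius, image.shape[0] - th) + 1):
        for c in range(max(c0 - radius, 0), min(c0 + radius, image.shape[1] - tw) + 1):
            score = mutual_information(image[r:r + th, c:c + tw], template)
            if score > best_score:
                best_score, best_pos = score, (r, c)
    return best_pos, best_score

# Synthetic example (assumption): the template is a nonlinear, inverted
# function of a reference patch, loosely imitating a different modality.
rng = np.random.default_rng(0)
reference = rng.random((64, 64))
template = np.exp(-3.0 * reference[20:36, 25:41])

# The rough search origin stands in for a prediction from georeferencing.
found_pos, found_score = match_template_mi(reference, template, (18, 22), radius=6)
print(found_pos, round(found_score, 3))
```

Mutual information depends only on the statistical dependence between the two patches, not on a linear intensity relationship, which is why it can still peak at the correct offset when the intensities are nonlinearly related. The exhaustive loop is kept for clarity; it is far too slow for real imagery.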
Structural-feature-based methods use either sparse or dense structural features. In terms of overall registration robustness and efficiency, dense-structural-feature-based methods are clearly effective at handling significant nonlinear radiation differences between multimodal remote sensing images and can meet many current application needs. However, the area-based pipeline generally relies on the georeferencing of the remote sensing images to predict the rough search range for template matching. Feature-based registration methods can be divided into three categories: feature registration based on gradient optimization, on local self-similarity (LSS), and on phase congruency. Gradient-optimization methods usually design consistent gradients for specific types of multimodal images, so they generally generalize poorly and have difficulty maintaining the same performance on other types of multimodal images. LSS-based methods are also limited: the relatively low discriminative power of LSS descriptors may prevent robust matching in the presence of complex nonlinear radiation differences. Phase-congruency-based methods have high computational complexity, and their registration process is generally time consuming. The feature-based pipeline uses the local spatial relationships between adjacent pixels to construct a high-dimensional feature vector for each feature point. Compared with template matching, it usually carries a heavier computational burden, and matching is prone to many outliers, especially in multimodal registration where scale, rotation, and radiation differences exist simultaneously.
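To make the idea of a dense structural feature more concrete, the sketch below builds a per-pixel feature from gradient orientation and compares two images in that feature space. It is a deliberately simplified stand-in and not one of the descriptors analyzed in the survey: the doubled-angle encoding, the function names (`orientation_feature`, `ncc`), and the test case (a pure contrast inversion) are assumptions chosen for illustration, and they cover only one simple kind of radiation difference.

```python
import numpy as np

def orientation_feature(img):
    """Dense per-pixel structure feature (illustrative assumption):
    gradient orientation encoded with a doubled angle and weighted by
    gradient magnitude. A gradient and its negation map to the same
    vector, so contrast inversion leaves the feature unchanged."""
    gy, gx = np.gradient(img.astype(float))   # derivatives along rows, cols
    mag = np.hypot(gx, gy)
    theta = np.arctan2(gy, gx)
    return np.stack([mag * np.cos(2 * theta), mag * np.sin(2 * theta)], axis=-1)

def ncc(a, b):
    """Normalized cross-correlation between two arrays of equal shape."""
    a = a.ravel() - a.mean()
    b = b.ravel() - b.mean()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

rng = np.random.default_rng(1)
img = rng.random((32, 32))
inverted = 1.0 - img   # same structure, inverted contrast

raw_similarity = ncc(img, inverted)
feature_similarity = ncc(orientation_feature(img), orientation_feature(inverted))
print(raw_similarity, feature_similarity)
```

On raw intensities the two images are anti-correlated, while in the orientation feature space they are almost identical. Template matching on such dense features, rather than on intensities, is the general design idea; real descriptors are considerably richer than this two-channel example.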
In general, feature-based methods are less robust than area-based methods. The deep-learning-based pipeline can be divided into modular and end-to-end registration methods. The most common strategy in modular registration is to embed deep networks into feature-based or area-based methods. This approach exploits the fully data-driven, high-dimensional deep feature extraction capability of deep learning to generate more robust features, more effective descriptors, or better similarity measures, which improves the robustness of image registration. Modular registration methods can be subdivided into three categories: learning-based template matching, learning-based feature matching, and style-transfer-based modal unification. Modular methods are easy to train and highly flexible, but they have difficulty avoiding the error accumulation that easily occurs in multi-stage tasks and may fall into local optima. End-to-end registration methods construct a single neural network that directly estimates the geometric transformation parameters or the deformation field. Because the whole end-to-end network is trained toward one consistent objective, it can in principle reach a globally optimal solution. However, such networks are difficult to train and have poor interpretability. Moreover, no complete and comprehensive database containing all types of multimodal remote sensing image pairs is available to date, and the lack of training and testing data greatly limits the development of deep-learning-based registration methods. Furthermore, we share the existing public registration datasets of multimodal remote sensing images, supplemented by a small number of registration datasets from the field of computer vision.
Finally, we analyze the existing problems and challenges in current research on high-precision registration of multimodal remote sensing images and give an outlook on future research trends, with the aim of promoting further breakthroughs and innovations in the field of multimodal remote sensing image registration.
Keywords
