  • Published: 2024-09-12
Dual-branch U-Net based on hybrid attention for hyperspectral pansharpening

Yang Yong, Wang Xiaozheng, Liu Xuan, Huang Shuying, Liu Ziyang, Wang Shuzhao (Tiangong University)

Abstract
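Objective Hyperspectral (HS) pansharpening aims to fuse a high-spatial-resolution panchromatic (PAN) image with a low-spatial-resolution hyperspectral (LRHS) image to generate a high-spatial-resolution hyperspectral (HRHS) image. Existing pansharpening algorithms often ignore the modality difference between PAN and HS images, which makes feature extraction inaccurate and leads to spectral and spatial distortions in the fused results. To address this problem, this paper proposes a dual-branch U-Net based on hybrid attention (DUNet-HA) that extracts and fuses multi-scale spatial-spectral features of PAN and HS images.
Method In the network, a hybrid attention module (HAM) is designed to encode the features at each scale. Within the HAM, channel and spatial self-attention modules enhance the spectral and spatial features, while a double cross attention module (DCAM) learns the spatial-spectral dependencies between the cross-modal PAN and HS features to guide the reconstruction of both. Compared with the classic hybrid Transformer structure, the DCAM corrects the two kinds of image features by computing cross-attention weights that are independent of the query position, which reduces the computational cost of the model while improving network performance.
Result DUNet-HA was compared with 11 recent methods on three widely used HS image datasets. On the Pavia center dataset, compared with the second-best method, HyperRefiner, it improves the peak signal-to-noise ratio (PSNR) by 1.10 dB and lowers the spectral angle mapper (SAM) by 0.40. On the Botswana dataset, PSNR improves by 1.29 dB and SAM drops by 0.14; on the Chikusei dataset, PSNR improves by 0.39 dB and SAM drops by 0.12.
Conclusion The results show that the proposed DUNet-HA fuses spectral-spatial information more effectively and significantly improves the quality of hyperspectral pansharpening results.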
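The abstract states that DCAM computes cross-attention weights that do not depend on the query position. The sketch below shows one way such query-independent cross attention can work, in the spirit of global-context pooling; it is not the authors' DCAM. Assumptions: both feature maps are flattened to N positions × C channels with the same N and C, a hypothetical (normally learned) key vector scores each position, and the pooled context is added back without the extra transforms a real module would apply. The names `global_weights`, `cross_context`, and `double_cross` are hypothetical.

```python
import math
from typing import List, Tuple

# Hypothetical layout: a feature map flattened to N positions, each a list of C channels.
Feature = List[List[float]]


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def global_weights(guide: Feature, w_key: List[float]) -> List[float]:
    # One logit per position from a hypothetical key vector.
    # There is no query term, so every output position shares this distribution.
    logits = [sum(w * v for w, v in zip(w_key, pos)) for pos in guide]
    return softmax(logits)


def cross_context(guide: Feature, source: Feature, w_key: List[float]) -> List[float]:
    # Weights come from one modality and pool the other modality's features
    # into a single C-dimensional context vector (O(N*C) work, no N x N map).
    weights = global_weights(guide, w_key)
    channels = len(source[0])
    return [sum(a * pos[c] for a, pos in zip(weights, source)) for c in range(channels)]


def double_cross(pan: Feature, hs: Feature,
                 w_pan: List[float], w_hs: List[float]) -> Tuple[Feature, Feature]:
    # PAN-derived weights pool HS features and vice versa; each context vector is
    # broadcast-added to every position of the branch it pooled (no extra transform here).
    ctx_hs = cross_context(pan, hs, w_pan)
    ctx_pan = cross_context(hs, pan, w_hs)
    hs_out = [[v + c for v, c in zip(pos, ctx_hs)] for pos in hs]
    pan_out = [[v + c for v, c in zip(pos, ctx_pan)] for pos in pan]
    return pan_out, hs_out
```

Because the weights are shared by every position, the cost grows linearly with N, whereas query-dependent attention has to build a full N × N map.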
Dual-branch U-Net based on hybrid attention for hyperspectral pansharpening

Yang Yong, Wang Xiaozheng, Liu Xuan, Huang Shuying, Liu Ziyang, Wang Shuzhao (Tiangong University)

Abstract
Objective Hyperspectral (HS) images are acquired by spectral imaging systems that sample hundreds of contiguous narrow spectral bands, so they provide rich spectral information. However, because each narrow band receives little energy, HS images typically have low spatial resolution. In contrast, single-band panchromatic (PAN) images from PAN imaging systems provide rich spatial information but have low spectral resolution. In remote sensing applications that require high-resolution hyperspectral (HRHS) images with both high spectral and high spatial resolution, neither PAN nor HS images alone meet the requirements. HS pansharpening therefore fuses the spatial information of PAN images with the spectral information of HS images to obtain HRHS images. This technology has received considerable attention in remote sensing and is important for tasks such as military surveillance, environmental monitoring, object identification, and classification. HS pansharpening methods fall into two categories: traditional methods and deep learning (DL)-based methods. Traditional methods comprise four classes: component substitution-based, multi-resolution analysis-based, Bayesian-based, and model-based methods. Although these methods are easy to implement and physically interpretable, they often suffer from spatial and spectral distortions caused by inappropriate prior assumptions and imprecise handcrafted feature extraction. Owing to their powerful feature-learning capability, DL-based methods have been widely applied to HS pansharpening. Although they outperform traditional methods, their fused images still exhibit spectral and spatial distortions, because they neglect the need to process spectral and spatial features differently and because the mapping between multi-channel images is complex.
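Component substitution, the first traditional family listed above, can be illustrated with a generic scheme: upsample the HS bands, form an intensity component from them, match the PAN image to that intensity, and inject the difference into every band. The sketch assumes equal band weights, unit injection gains, and nearest-neighbour upsampling; it is not GSA or any other specific method compared in the paper, and the function names are hypothetical.

```python
from typing import List, Tuple

Band = List[List[float]]  # rows x cols


def upsample_nearest(band: Band, ratio: int) -> Band:
    # Nearest-neighbour upsampling by an integer ratio
    # (practical methods usually use bicubic or MTF-matched interpolation).
    return [[band[i // ratio][j // ratio] for j in range(len(band[0]) * ratio)]
            for i in range(len(band) * ratio)]


def mean_std(img: Band) -> Tuple[float, float]:
    vals = [v for row in img for v in row]
    mu = sum(vals) / len(vals)
    var = sum((v - mu) ** 2 for v in vals) / len(vals)
    return mu, var ** 0.5


def cs_pansharpen(lr_hs: List[Band], pan: Band, ratio: int) -> List[Band]:
    # Assumes the PAN image is exactly `ratio` times larger than each HS band.
    up = [upsample_nearest(b, ratio) for b in lr_hs]
    rows, cols = len(pan), len(pan[0])
    # Intensity component: equal-weight average of the upsampled bands.
    intensity = [[sum(b[i][j] for b in up) / len(up) for j in range(cols)]
                 for i in range(rows)]
    # Match the PAN mean and standard deviation to the intensity component.
    mu_p, sd_p = mean_std(pan)
    mu_i, sd_i = mean_std(intensity)
    scale = sd_i / sd_p if sd_p > 0 else 1.0
    pan_m = [[(v - mu_p) * scale + mu_i for v in row] for row in pan]
    # Substitution: add the spatial detail (matched PAN - intensity) to every band, unit gain.
    return [[[b[i][j] + pan_m[i][j] - intensity[i][j] for j in range(cols)]
             for i in range(rows)] for b in up]


lr = [[[0.0, 4.0]], [[2.0, 6.0]]]  # two 1x2 bands
pan = [[0.0, 4.0, 0.0, 4.0], [0.0, 4.0, 0.0, 4.0]]
print(cs_pansharpen(lr, pan, 2))
```

In this toy case the two output bands keep their constant offset of 2 while taking on the PAN texture. Injecting the same detail into every band, regardless of how well each band correlates with PAN, is a common reason CS outputs can show spectral distortion.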
Since the introduction of the Transformer architecture, which can learn global correlations in images, some researchers have tried to improve HS pansharpening by using it to model the relationships between the two modalities. However, the high computational cost and low parameter efficiency of Transformer structures have limited their application. To fuse PAN and HS images effectively, this paper proposes a dual-branch U-Net based on hybrid attention (DUNet-HA) that performs multi-scale feature fusion for HS pansharpening. At each scale, a spatial attention branch, a spectral attention branch, and a dual-cross attention module branch are constructed. These branches enhance the spatial features of PAN images and the spectral features of HS images, respectively, and make the cross-modal features complementary. The dual-cross attention module is designed to avoid the costly query-matrix computation of Transformers.
Method The proposed DUNet-HA contains two U-Net branches, one for the PAN image and the other for the upsampled HS image, which extract and complement texture and spectral features. At each scale, a hybrid attention module (HAM) encodes the features of both images. The HAM comprises a spatial attention module (Spa-A), a spectral attention module (Spe-A), and a dual-cross attention module (DCAM). Spa-A and Spe-A enhance the texture features of the PAN image and the spectral features of the HS image, respectively, while DCAM corrects and complements these features. The enhanced and corrected features are integrated to obtain the encoded features at each scale. The decoder mainly performs feature fusion and reconstruction. In addition, DCAM is used to capture global contextual information, and the encoded features, decoded features, and corrected complementary features at the same scale are integrated directly to better handle high-level spatial and spectral features.
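Spe-A is described only as enhancing the spectral features of the HS branch. A common way to do this is channel self-attention, where each band is re-expressed as a softmax-weighted mix of all bands according to their pairwise similarity. The sketch below assumes that form, a flattened C × N feature layout, and a fixed residual weight `gamma` (learned in practice); it is not necessarily how the paper implements Spe-A, and `channel_self_attention` is a hypothetical name.

```python
import math
from typing import List

Matrix = List[List[float]]  # hypothetical layout: C channels (bands) x N positions


def softmax_row(row: List[float]) -> List[float]:
    m = max(row)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in row]
    s = sum(e)
    return [v / s for v in e]


def channel_self_attention(x: Matrix, gamma: float = 1.0) -> Matrix:
    c, n = len(x), len(x[0])
    # C x C affinity: dot product between every pair of band feature maps.
    energy = [[sum(a * b for a, b in zip(x[i], x[j])) for j in range(c)] for i in range(c)]
    attn = [softmax_row(r) for r in energy]
    # Each band becomes an attention-weighted mix of all bands...
    mixed = [[sum(attn[i][k] * x[k][p] for k in range(c)) for p in range(n)]
             for i in range(c)]
    # ...added back to the input through a residual connection scaled by gamma.
    return [[x[i][p] + gamma * mixed[i][p] for p in range(n)] for i in range(c)]
```

The affinity matrix is C × C, so the cost scales with C²·N rather than with N², which is one reason channel attention is cheap compared with spatial self-attention on large feature maps.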
The proposed DCAM is a novel cross-attention structure that replaces the attention computation of the Transformer architecture with a query-independent matrix computation, reducing computational cost. DCAM maps the cross-feature space of PAN and HS images to guide feature interaction for correction and supplementation.
Result To validate the effectiveness of DUNet-HA, extensive experiments were conducted on three widely used HS datasets: Pavia center, Botswana, and Chikusei. DUNet-HA was compared with state-of-the-art (SOTA) methods, including five traditional methods (CNMF, CFPCA, SFIM, GSA, and MTF_GLP_HPM) and six DL-based methods (SSFCNN, HyperPNN, DHP-DARN, DIP-Hyperkite, Hyper-DSNet, and HyperRefiner). All methods were evaluated with five objective indicators: spectral cross correlation (SCC), spectral angle mapper (SAM), root mean square error (RMSE), erreur relative globale adimensionnelle de synthèse (ERGAS), and peak signal-to-noise ratio (PSNR). Experimental results with a scale factor of 4 show that the proposed method outperforms the other SOTA methods in both objective metrics and visual quality. Specifically, on the Pavia center dataset, compared with the second-best method, HyperRefiner, the proposed method improves PSNR by 1.10 dB and reduces SAM and ERGAS by 0.40 and 0.28, respectively. Its objective results on the other two datasets also surpass those of HyperRefiner. The visual results indicate that the proposed method better recovers fine-grained spatial textures and spectral details. Ablation studies further show that the DCAM structure significantly improves the fusion process.
Conclusion This paper proposes a dual-branch interactive U-Net named DUNet-HA for HS pansharpening. The network extracts and reconstructs spatial and spectral information from PAN and HS images through a parallel dual U-Net structure to achieve more accurate fusion results.
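Four of the five indicators listed above can be computed directly from a reference HRHS image and a fused estimate. The sketch uses common textbook definitions; papers differ in details such as per-band versus whole-image PSNR and the peak value, so its numbers are not guaranteed to match the paper's evaluation code. It assumes a cube flattened to N pixels × B bands, data scaled to [0, 1] for the default PSNR peak, and ERGAS with a resolution ratio of 4 to match the scale factor above. SCC is omitted because its exact definition, including any filtering step, varies between implementations.

```python
import math
from typing import List

Cube = List[List[float]]  # N pixels x B bands (flattened layout, an assumption)


def rmse(ref: Cube, est: Cube) -> float:
    diffs = [(r - e) ** 2 for rp, ep in zip(ref, est) for r, e in zip(rp, ep)]
    return math.sqrt(sum(diffs) / len(diffs))


def psnr(ref: Cube, est: Cube, peak: float = 1.0) -> float:
    # Whole-cube PSNR in dB; many HS papers instead average per-band PSNR.
    err = rmse(ref, est)
    return float("inf") if err == 0 else 20 * math.log10(peak / err)


def sam_degrees(ref: Cube, est: Cube) -> float:
    # Mean spectral angle (degrees) between reference and estimated pixel spectra.
    angles = []
    for rp, ep in zip(ref, est):
        dot = sum(r * e for r, e in zip(rp, ep))
        norm = math.sqrt(sum(r * r for r in rp) * sum(e * e for e in ep))
        cos = max(-1.0, min(1.0, dot / norm)) if norm > 0 else 1.0
        angles.append(math.degrees(math.acos(cos)))
    return sum(angles) / len(angles)


def ergas(ref: Cube, est: Cube, ratio: int = 4) -> float:
    # ERGAS = 100 * (1/ratio) * sqrt(mean over bands of (RMSE_b / mean_b)^2).
    bands = len(ref[0])
    total = 0.0
    for b in range(bands):
        r = [p[b] for p in ref]
        e = [p[b] for p in est]
        band_rmse = math.sqrt(sum((x - y) ** 2 for x, y in zip(r, e)) / len(r))
        band_mean = sum(r) / len(r)
        total += (band_rmse / band_mean) ** 2
    return 100.0 / ratio * math.sqrt(total / bands)
```

Lower is better for SAM, RMSE, and ERGAS, and higher is better for PSNR, which is why the reported gains appear as PSNR increases and SAM or ERGAS decreases.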
At each scale of the encoder, a HAM enhances the spatial features of PAN images and the spectral features of HS images using spatial attention and spectral attention, respectively. DCAM then complements these features, which reduces the modality difference between PAN and HS image features and lets them supplement each other to guide feature interaction. Compared with the classic hybrid Transformer attention structure, DCAM improves network performance while reducing the number of parameters and the computational cost. Extensive experiments on three widely used HS datasets demonstrate that the proposed DUNet-HA outperforms several SOTA methods in both quantitative and qualitative evaluations.
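The claim that avoiding query-dependent attention saves computation can be made concrete by counting the attention step alone. The sketch compares a standard N × N attention map with a single query-independent weight vector for an H × W feature map with C channels. It ignores projections, normalisation, and every other part of DCAM, whose exact cost the abstract does not give; the 64 × 64 × 32 feature-map size and the name `attention_step_cost` are hypothetical.

```python
def attention_step_cost(height: int, width: int, channels: int) -> dict:
    """Count attention-map entries and multiply-accumulates (MACs) for one attention step."""
    n = height * width  # number of spatial positions (tokens)
    return {
        # Query-dependent attention: Q K^T builds an N x N map (N*N*C MACs),
        # and applying it to V costs another N*N*C MACs.
        "standard_map_entries": n * n,
        "standard_macs": 2 * n * n * channels,
        # Query-independent attention: one logit per position (N*C MACs)
        # and one weighted pooling over positions (N*C MACs), shared by all queries.
        "query_independent_map_entries": n,
        "query_independent_macs": 2 * n * channels,
    }


cost = attention_step_cost(64, 64, 32)
print(cost["standard_macs"] // cost["query_independent_macs"])  # 4096, i.e. N
```

Under these assumptions the saving in the attention step grows with the number of positions N, which is largest at the finest scale of a U-Net.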