Point cloud semantic segmentation combining bilateral cross-enhancement and self-attention compensation

Zhu Zhongjie1,2, Zhang Rong1, Bai Yongqiang1, Wang Yuer1, Sun Jiamin1,2 (1. Ningbo Key Laboratory of DSP, Zhejiang Wanli University, Ningbo 315000, China; 2. Faculty of Information Science and Engineering, Ocean University of China, Qingdao 266000, China)

Abstract
Objective Existing point cloud semantic segmentation methods make insufficient use of geometric and semantic feature information, which leads to poor segmentation performance, particularly insufficient local fine-grained segmentation accuracy. To address this problem and improve segmentation performance, this paper proposes a new point cloud semantic segmentation algorithm that combines bilateral cross-enhancement with self-attention compensation to fully fuse geometric and semantic contextual information. Method First, a spatial aggregation module based on bilateral cross-enhancement is designed: local geometric and semantic contextual information is mapped into a common space for cross-learning and enhancement and then aggregated into local contextual information. Then, global contextual information extracted by a self-attention mechanism is fused with the enhanced local contextual information to compensate for the singularity of the local context, yielding a complete feature map. Finally, the multi-resolution features output at each stage of the spatial aggregation module are fed into a feature fusion module for multi-scale feature fusion, producing the final comprehensive feature map for high-performance semantic segmentation. Result Experimental results show that on the S3DIS (Stanford 3D indoor spaces) dataset, the proposed algorithm achieves a mean intersection over union (mIoU) of 70.2%, a mean class accuracy (mAcc) of 81.7%, and an overall accuracy (OA) of 88.3%, which are 2.4%, 2.0%, and 1.0% higher, respectively, than those of the representative algorithm RandLA-Net. Tested separately on Area 5 of S3DIS, the proposed algorithm achieves an mIoU of 66.2%, 5.0% higher than that of RandLA-Net. Conclusion The spatial aggregation module not only makes full use of local geometric and semantic contextual information to enhance the local context but also fuses local and global contextual information via the self-attention mechanism, improving feature completeness and the correlation between local and global contexts, and thus effectively improves the local fine-grained segmentation accuracy of point clouds. In the visual analysis, the proposed algorithm shows markedly better local fine-grained segmentation of point cloud scenes than the comparison algorithms, which validates its effectiveness.
Bilateral cross enhancement with self-attention compensation for semantic segmentation of point clouds

Zhu Zhongjie1,2, Zhang Rong1, Bai Yongqiang1, Wang Yuer1, Sun Jiamin1,2(1.Ningbo Key Laboratory of DSP, Zhejiang Wanli University, Ningbo 315000, China;2.Faculty of Information Science and Engineering, Ocean University of China, Qingdao 266000, China)

Abstract
Objective Point cloud semantic segmentation is a computer vision task that assigns a semantic label to every point in a 3D point cloud. Specifically, according to its location and other attributes, each point is assigned to a predefined semantic category, such as ground, building, vehicle, or pedestrian. Existing methods for point cloud semantic segmentation can be broadly categorized into three types: projection-, voxel-, and raw point cloud-based methods. Projection-based methods project the 3D point cloud onto a 2D plane (e.g., an image) and then apply standard image-based segmentation techniques. Voxel-based methods divide the point cloud space into regular voxel grids and assign semantic labels to each voxel. Both approaches require a data transformation that inevitably loses some feature information. By contrast, raw point cloud-based methods process the point cloud directly, without any transformation, so the network receives the original point cloud data intact. Accurate semantic segmentation requires fully considering and exploiting the geometric and semantic feature information of every point in the scene. However, existing methods generally extract, process, and utilize geometric and semantic features separately, ignoring the correlation between them, which degrades local fine-grained segmentation accuracy. Therefore, this study proposes a new algorithm for point cloud semantic segmentation based on bilateral cross-enhancement and self-attention compensation. It not only fully exploits the geometric and semantic feature information of the point cloud but also constructs offsets between them as a medium for information interaction. In addition, it fuses local and global feature information, which improves feature completeness and overall segmentation performance and ensures that local and global contexts are fully represented and utilized during segmentation. By considering the overall information of the point cloud scene, the algorithm segments both local fine-grained details and larger-scale structures more accurately. Method First, the original input point cloud is preprocessed to extract geometric contextual information and initial semantic contextual information: the geometric context is represented by the original coordinates of the points in 3D space, while the initial semantic context is extracted with a multilayer perceptron. Next, a spatial aggregation module is designed, consisting of bilateral cross-enhancement units and self-attention units. In the bilateral cross-enhancement units, local geometric and semantic contextual information is first extracted by constructing local neighborhoods over the preprocessed geometric and initial semantic contexts. Offsets are then constructed to enable cross-learning and mutual enhancement of the local geometric and semantic contexts by mapping them into a common space. Finally, the enhanced local geometric and semantic contexts are aggregated into local contextual information.
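To make the bilateral cross-enhancement step concrete, the following is a minimal PyTorch sketch, not the authors' implementation: the module name, layer sizes, neighborhood size K, and the use of offsets as additive corrections predicted from the opposite branch are illustrative assumptions. It shows how local geometric and semantic neighborhood features might be mapped into a common space, cross-enhanced via learned offsets, and aggregated into local contextual features.

```python
import torch
import torch.nn as nn

class BilateralCrossEnhancement(nn.Module):
    """Illustrative sketch of a bilateral cross-enhancement unit.

    Maps local geometric and semantic neighborhood features into a
    common space, exchanges information via learned offsets, and
    aggregates the enhanced features into local contextual features.
    All layer sizes are assumptions, not the paper's exact design.
    """

    def __init__(self, d_geo=3, d_sem=64, d_common=64):
        super().__init__()
        self.geo_mlp = nn.Linear(d_geo, d_common)        # geometry -> common space
        self.sem_mlp = nn.Linear(d_sem, d_common)        # semantics -> common space
        self.geo_offset = nn.Linear(d_common, d_common)  # offset predicted from semantics
        self.sem_offset = nn.Linear(d_common, d_common)  # offset predicted from geometry
        self.aggregate = nn.Linear(2 * d_common, d_common)

    def forward(self, geo_nbr, sem_nbr):
        # geo_nbr: (B, N, K, d_geo)  k-NN neighborhood coordinates
        # sem_nbr: (B, N, K, d_sem)  k-NN neighborhood semantic features
        g = self.geo_mlp(geo_nbr)                        # (B, N, K, d_common)
        s = self.sem_mlp(sem_nbr)                        # (B, N, K, d_common)
        # Cross-learning: each branch is shifted by an offset predicted
        # from the other branch, letting geometry and semantics interact.
        g_enh = g + self.geo_offset(s)
        s_enh = s + self.sem_offset(g)
        # Aggregate over the K neighbors (max pooling) and fuse both branches.
        local = torch.cat([g_enh.max(dim=2).values,
                           s_enh.max(dim=2).values], dim=-1)  # (B, N, 2*d_common)
        return self.aggregate(local)                     # (B, N, d_common)

# Example usage with hypothetical shapes: 2 scenes, 1024 points, 16 neighbors.
unit = BilateralCrossEnhancement()
geo = torch.randn(2, 1024, 16, 3)
sem = torch.randn(2, 1024, 16, 64)
out = unit(geo, sem)  # (2, 1024, 64) local contextual features
```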
Next, the self-attention mechanism extracts global contextual information, which is fused with the local contextual information to compensate for the singularity of the local context, yielding a complete feature map. Finally, the multi-resolution feature maps produced at the different stages of the spatial aggregation module are fed into the feature fusion module for multi-scale feature fusion, producing the final comprehensive feature map and thereby achieving high-performance semantic segmentation. Result Experimental results on the Stanford 3D indoor spaces dataset (S3DIS) show a mean intersection over union (mIoU) of 70.2%, a mean class accuracy (mAcc) of 81.7%, and an overall accuracy (OA) of 88.3%, which are 2.4%, 2.0%, and 1.0% higher, respectively, than those of the existing representative algorithm RandLA-Net. For Area 5 of S3DIS tested separately, the mIoU is 66.2%, which is 5.0% higher than that of RandLA-Net. Segmentation results are also visualized on the Semantic3D dataset. Conclusion The spatial aggregation module makes full use of geometric and semantic contextual information to enrich the local context, and its self-attention-based fusion of local and global contextual information ensures a comprehensive feature representation. As a result, the proposed algorithm markedly improves the segmentation accuracy of fine-grained local details in point clouds. Visual analysis further validates its effectiveness: compared with the baseline algorithms, the proposed algorithm segments local fine-grained regions of point cloud scenes noticeably better, offering a promising solution for improving fine-grained segmentation accuracy in local regions of point clouds.
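As a rough illustration of the remaining two steps, the sketch below pairs a standard multi-head self-attention pass (a stand-in for the paper's self-attention unit, whose exact formulation is not given here) with a simple multi-scale fusion helper. The head count, concatenation-based fusion, and nearest-neighbor upsampling are all assumptions made for illustration.

```python
import torch
import torch.nn as nn

class SelfAttentionCompensation(nn.Module):
    """Sketch: extract global context with self-attention and fuse it
    with the enhanced local features to compensate their locality."""

    def __init__(self, d=64, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(d, heads, batch_first=True)
        self.fuse = nn.Linear(2 * d, d)

    def forward(self, local_feat):
        # local_feat: (B, N, d) local contextual features from the
        # bilateral cross-enhancement unit.
        global_ctx, _ = self.attn(local_feat, local_feat, local_feat)
        fused = torch.cat([local_feat, global_ctx], dim=-1)
        return self.fuse(fused)  # (B, N, d) complete feature map


def fuse_multiscale(features, n_points):
    """Sketch of multi-scale fusion: upsample the multi-resolution
    feature maps from each stage to full resolution (nearest-neighbor
    interpolation here, purely illustrative) and concatenate them."""
    upsampled = []
    for f in features:  # each f: (B, N_i, d) at a coarser resolution
        # (B, N_i, d) -> (B, d, N_i) for 1D interpolation, then back.
        f = torch.nn.functional.interpolate(
            f.transpose(1, 2), size=n_points, mode="nearest").transpose(1, 2)
        upsampled.append(f)
    return torch.cat(upsampled, dim=-1)  # (B, n_points, sum of stage widths)
```

In a real pipeline the fused multi-scale map would feed a per-point classification head; concatenation is used here only as the simplest fusion operator consistent with the description above.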