Multi-dimensional multi-layer point cloud analysis for shape features

Xu Jiali1, Fang Zhijun1, Wu Shiqian2 (1. School of Electronic and Electrical Engineering, Shanghai University of Engineering Science, Shanghai 201620, China; 2. College of Machinery and Automation, Wuhan University of Science and Technology, Wuhan 430081, China)

Abstract
Objective 3D point clouds are the main data structure for encoding geometric information. Unlike 2D visual data, point clouds conceal shape features that are essential to 3D objects. To better mine shape features from disordered point clouds, this paper proposes a multi-dimensional multi-layer neural network (MM-Net) that processes point cloud data end-to-end and robustly. Method The multi-dimensional feature correction and fusion (MDCF) module adaptively corrects local and point-wise features in multiple dimensions and integrates them into a high-dimensional space to obtain rich regional shapes. Meanwhile, the multi-layer feature articulation (MLFA) module exploits the long-range dependencies among multiple layers to infer the global shape required by the network. In addition, two network architectures, MM-Net-C (multi-dimensional multi-layer feature classification network) and MM-Net-S (multi-dimensional multi-layer feature segmentation network), are designed for point cloud classification and segmentation tasks, respectively. Result The proposed method is evaluated on the public ModelNet40 and ShapeNet datasets and compared with a variety of methods. On ModelNet40, the classification accuracy of MM-Net-C is 2.2% and 1.9% higher than that of PointNet++ and DGCNN (dynamic graph convolutional neural network), respectively; on ShapeNet, the segmentation accuracy of MM-Net-S is 1.2% and 0.4% higher than that of ELM (extreme learning machine) and A-CNN (annularly convolutional neural networks), respectively. Furthermore, ablation experiments on ModelNet40 verify the reliability of the MM-Net architecture and confirm the necessity of the MDCF and MLFA module designs. Conclusion The proposed MM-Net achieves excellent performance in both classification and segmentation tasks.
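
For illustration, the following is a minimal PyTorch-style sketch of how the classification network MM-Net-C might chain the MDCF and MLFA components described above. The placeholder blocks, stage widths, and the simple max-pooled fusion are assumptions made for this sketch, not the paper's actual layer configuration.

import torch
import torch.nn as nn

class MDCFStage(nn.Module):
    """Placeholder for one multi-dimensional feature correction and fusion stage."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(in_dim, out_dim), nn.ReLU())

    def forward(self, x):                        # x: (B, N, in_dim) point-wise features
        return self.mlp(x)                       # stand-in for corrected/fused local features

class MLFA(nn.Module):
    """Placeholder: link features from all stages into one global vector."""
    def __init__(self, dims, out_dim=1024):
        super().__init__()
        self.proj = nn.Linear(sum(dims), out_dim)

    def forward(self, feats):                    # feats: list of (B, N, d_i) per-stage features
        fused = torch.cat(feats, dim=-1)         # jump fusion across layers
        return self.proj(fused).max(dim=1).values  # (B, out_dim) global feature

class MMNetC(nn.Module):
    def __init__(self, n_classes=40):
        super().__init__()
        dims = [64, 128, 256]                    # assumed stage widths
        self.stages = nn.ModuleList(
            [MDCFStage(i, o) for i, o in zip([3] + dims[:-1], dims)])
        self.mlfa = MLFA(dims)
        self.head = nn.Linear(1024, n_classes)

    def forward(self, xyz):                      # xyz: (B, N, 3) raw point cloud
        feats, x = [], xyz
        for stage in self.stages:                # three tandem MDCF stages
            x = stage(x)
            feats.append(x)
        return self.head(self.mlfa(feats))

logits = MMNetC()(torch.rand(8, 1024, 3))        # (8, 40) class scores

In the actual network, each MDCF stage would additionally perform the local-area division and geometric-relation encoding described in the detailed abstract below.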
Keywords
Multi-dimensional multi-layer point cloud analysis for shape features

Xu Jiali1, Fang Zhijun1, Wu Shiqian2 (1. School of Electronic and Electrical Engineering, Shanghai University of Engineering Science, Shanghai 201620, China; 2. College of Machinery and Automation, Wuhan University of Science and Technology, Wuhan 430081, China)

Abstract
Objective With the widespread use of depth cameras and 3D scanning equipment, 3D data with point clouds as the main structure have become more readily available. As a result, 3D point clouds are widely used in practical applications such as self-driving cars, location recognition, robot localization, and remote sensing. In recent years, the great success of convolutional neural networks (CNNs) has changed the landscape of 2D computer vision. However, CNNs cannot directly process unstructured data such as point clouds because of the disorderly, irregular characteristics of 3D point clouds. Therefore, mining shape features from disordered point clouds has become a viable research direction in point cloud analysis. Method An end-to-end multi-dimensional multi-layer neural network (MM-Net), which can directly process point cloud data, is presented in this paper. The multi-dimensional feature correction and fusion (MDCF) module can rationally correct local features in different dimensions. First, the local area division unit uses farthest point sampling and ball query to construct local areas at different radii, from which the module obtains the required 10D geometric relations and local features. Inspired by related research, the module uses the geometric relations to modify the point-wise features, enhance the interaction between points, and encode useful local features, which are supplemented by the point-wise features. Finally, the shape features of different region ranges are fused and mapped to a higher-dimensional space. At the same time, the multi-layer feature articulation (MLFA) module focuses on integrating the contextual relationships between local regions to extract global features. In particular, these local regions are treated as distinct nodes, and global features are acquired through convolution and jump fusion. The MLFA module uses the long-range dependencies between multiple layers to reason about the global shape required by the network. Furthermore, two network architectures, the multi-dimensional multi-layer feature classification network (MM-Net-C) and the multi-dimensional multi-layer feature segmentation network (MM-Net-S), are designed in this paper for point cloud classification and segmentation tasks. In detail, MM-Net-C passes the input through three tandem MDCF modules, producing three layers of interlinked local shape features. The global features are then obtained by connecting and integrating the correlations between local regions through the MLFA module. In MM-Net-S, after processing by the MLFA module, the object data are encoded into a 1 024-dimensional global feature vector. The features are then summed to obtain shapes that fuse local and global information, so that they can be linked to the labels of the objects (e.g., motorbikes, cars). This process is followed by feature propagation, where successive upsampling operations recover the details of the original object data and yield a robust point-wise vector. Finally, the outputs of the different feature propagation layers are integrated and fed into a convolution operation, and the features are transformed to obtain an accurate prediction for each point of the object. Result The proposed method is thoroughly tested on the publicly available ModelNet40 and ShapeNet datasets, and the experimental results are compared with those of various methods.
On the ModelNet40 dataset, MM-Net-C is compared with several pnt-based methods (which take only point cloud coordinates as input): it improves accuracy by 1.9% over the dynamic graph convolutional neural network (DGCNN) (92.2%) and by 0.5% over the relation-shape convolutional neural network (RS-CNN) (93.6%). MM-Net-C is also compared with several pnt-nor-based methods (which take both the coordinates and the normal vectors of the input point cloud): it improves accuracy by 2.4% over point attention transformers (PAT) (91.7%), by 1.6% over PointConv (92.5%), and by 0.9% over PointASNL (93.2%). Even when other studies use more input points for training, MM-Net-C still outperforms them; for example, it improves accuracy by 2.2% over PointNet++ (5 k, 91.9%) and by 0.7% over the self-organizing network (SO-Net) (5 k, 93.4%). In addition, MM-Net-C achieves higher accuracy than other methods with less complexity. For example, compared with PointCNN (8.20 M, 91.7%), MM-Net-C has less than one-eighth the number of parameters while increasing accuracy by 2.4%; compared with RS-CNN (1.41 M, 93.6%), it has 0.33 M fewer parameters while increasing accuracy by 0.5%. On the ShapeNet dataset, MM-Net-S improves accuracy by 1.4% over DGCNN (85.1%), by 0.8% over the shape-oriented convolutional neural network (SO-CNN) (85.7%), and by 0.4% over annularly convolutional neural networks (A-CNN) (86.1%). Ablation experiments are also conducted on the ModelNet40 dataset to confirm the effectiveness of the MM-Net architecture, and their results validate the need for the MDCF and MLFA module designs. The results further confirm that the MDCF module, which modifies rich point-wise features and fuses them with potential local features, effectively improves the network's mining of shape information within a local region. Meanwhile, the MLFA module captures contextual information at the global scale and reinforces the long-range dependencies between different layers, effectively enhancing the robustness of the model when dealing with complex shapes. Ablation experiments are also conducted on whether the MDCF module needs to be designed with different dimensions; the results demonstrate that MM-Net performs better than RS-CNN at the same dimensionality. Conclusion In this paper, an MM-Net with the MDCF and MLFA modules as its core components is proposed. Extensive experiments and thorough comparisons verify that MM-Net achieves higher accuracy with the advantage of fewer parameters.
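
To make the local-area division step described in the Method section concrete, the following is a minimal NumPy sketch of farthest point sampling, ball query, and an assumed 10D geometric relation per neighbour (centroid xyz, neighbour xyz, offset xyz, and Euclidean distance). The function names and the exact relation layout are illustrative assumptions, not the paper's implementation.

import numpy as np

def farthest_point_sampling(points, n_samples):
    """points: (N, 3) array; returns indices of n_samples region centroids."""
    n = points.shape[0]
    chosen = np.zeros(n_samples, dtype=np.int64)
    dist = np.full(n, np.inf)
    chosen[0] = np.random.randint(n)             # start from a random point
    for i in range(1, n_samples):
        d = np.linalg.norm(points - points[chosen[i - 1]], axis=1)
        dist = np.minimum(dist, d)               # distance to nearest chosen point
        chosen[i] = np.argmax(dist)              # pick the farthest remaining point
    return chosen

def ball_query(points, centroids, radius, k):
    """For each centroid, return up to k neighbour indices within the given radius."""
    groups = []
    for c in centroids:
        d = np.linalg.norm(points - points[c], axis=1)
        idx = np.where(d <= radius)[0][:k]
        if idx.size < k:                         # pad by repeating the centroid index
            idx = np.concatenate([idx, np.full(k - idx.size, c)])
        groups.append(idx)
    return np.stack(groups)                      # (n_samples, k)

def geometric_relations(points, centroids, groups):
    """Assumed 10D relation: centroid xyz + neighbour xyz + offset xyz + distance."""
    rel = []
    for c, idx in zip(centroids, groups):
        center = points[c]
        nbr = points[idx]
        offset = nbr - center
        dist = np.linalg.norm(offset, axis=1, keepdims=True)
        rel.append(np.concatenate(
            [np.tile(center, (len(idx), 1)), nbr, offset, dist], axis=1))
    return np.stack(rel)                         # (n_samples, k, 10)

pts = np.random.rand(1024, 3).astype(np.float32)
ctr = farthest_point_sampling(pts, 128)
grp = ball_query(pts, ctr, radius=0.2, k=32)
rel = geometric_relations(pts, ctr, grp)         # relations fed to the MDCF module

Running the grouping at several radii, as the Method section describes, would simply repeat ball_query and geometric_relations with different radius values before fusion.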
Keywords
