Point cloud data classification and segmentation model fusing graph convolution and differentiated pooling functions
Abstract
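Objective When deep networks are applied to the classification and segmentation of 3D point cloud data, accuracy depends closely on how well the model describes global and local features. Existing feature extraction networks usually combine global features with local features at different scales, ignoring the structural information and positional relationships between points. To address this, a graph convolution neural network (GCN) and an improved pooling layer function are introduced into the classification and segmentation model to strengthen local feature representation and obtain richer global features, improving the model's classification and segmentation performance on point cloud data. Method The GCN module constructs a graph structure with the K-nearest neighbor algorithm and obtains local features through edge convolution over adjacent point pairs; the GCN is dynamically extended within the deep network so that the model obtains complete local features. In the pooling layer, differentiated pooling functions are selected to jointly extract several global features and combine them, which keeps the model robust when the data are jittered. Result Classification, part segmentation, and semantic scene segmentation experiments are carried out on the ModelNet40, ShapeNet, and S3DIS (Stanford large-scale 3D indoor semantics) datasets to verify the model's classification and segmentation performance. Compared with PointNet, overall accuracy and average classification accuracy in the ModelNet40 classification experiment improve by 4% and 3.7%, respectively; on the ShapeNet part segmentation dataset and the S3DIS indoor scene dataset, the mean intersection-over-union (mIoU) is 1.4% and 9.8% higher, respectively. Tests with different pooling functions show that the differentiated pooling function proposed in this paper raises average classification accuracy by 0.9% compared with the pooling function proposed in PointNet, effectively improving model performance. Conclusion The improved network model can effectively capture the global and local features of point cloud data and achieves better classification and segmentation results.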
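The graph construction and edge convolution outlined in the Method part of the abstract can be illustrated with a minimal NumPy sketch. The abstract does not give the exact edge function, so the form used here (a shared linear map plus ReLU applied to the concatenation of the centre feature and the neighbour offset, followed by a max over neighbours) is an assumption borrowed from common edge-convolution designs, and the names `knn_indices`, `edge_features`, and `edge_conv` are hypothetical. Recomputing the neighbours from the output features of one layer before applying the next is one plausible reading of the "dynamic" graph.

```python
import numpy as np

def knn_indices(points, k):
    """Return, for each row of `points`, the indices of its k nearest rows (self excluded)."""
    # Pairwise squared Euclidean distances, shape (N, N).
    diff = points[:, None, :] - points[None, :, :]
    dist2 = np.sum(diff ** 2, axis=-1)
    # Exclude each point from its own neighbourhood.
    np.fill_diagonal(dist2, np.inf)
    return np.argsort(dist2, axis=1)[:, :k]

def edge_features(feats, idx):
    """Build per-edge inputs [x_i, x_j - x_i] for every point i and neighbour j."""
    neighbours = feats[idx]                                        # (N, k, C)
    centre = np.repeat(feats[:, None, :], idx.shape[1], axis=1)    # (N, k, C)
    # ASSUMPTION: centre feature concatenated with the relative offset.
    return np.concatenate([centre, neighbours - centre], axis=-1)  # (N, k, 2C)

def edge_conv(feats, idx, weight):
    """Shared linear map + ReLU on every edge, then max over the k neighbours."""
    edges = edge_features(feats, idx)          # (N, k, 2C)
    hidden = np.maximum(edges @ weight, 0.0)   # (N, k, C_out)
    return hidden.max(axis=1)                  # (N, C_out)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    cloud = rng.normal(size=(128, 3))          # toy point cloud
    w1 = rng.normal(size=(6, 16))              # illustrative random weights
    w2 = rng.normal(size=(32, 16))

    idx1 = knn_indices(cloud, k=8)             # graph built in coordinate space
    h1 = edge_conv(cloud, idx1, w1)
    idx2 = knn_indices(h1, k=8)                # graph rebuilt in feature space
    h2 = edge_conv(h1, idx2, w2)
    print(h1.shape, h2.shape)
```

The dense distance matrix keeps the sketch short but costs O(N²) memory; a spatial index would be the usual choice for larger clouds.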
Keywords
Point cloud data classification and segmentation model using graph CNN and different pooling functions
Zhang Xinliang, Fu Pengfei, Zhao Yunji, Xie Heng, Wang Wanru (College of Electrical Engineering and Automation, Henan Polytechnic University, Jiaozuo 454000, China)
Abstract
Objective Deep feature representation of 3D models is the key to, and a prerequisite for, 3D object recognition and 3D semantic segmentation, and it has broad application prospects in robotics, autonomous driving, virtual reality, and remote sensing and mapping. Semantic segmentation has made great progress with the help of deep learning, but most methods are designed for 2D images. Because unstructured 3D point clouds are large, uneven in density, and irregular in shape, classifying and segmenting them remains challenging. Traditional convolutional neural networks (CNNs) require regular data as input, so a point cloud must first be converted into multiple views or a voxel grid. Existing deep networks that process point clouds directly handle the unordered nature of the points through a pooling layer, which allows the model to classify and segment the raw point cloud. The accuracy of such a model is closely related to the network's ability to describe global and local features. Existing feature extraction networks often combine global features with local features at different scales while ignoring the structural information and positional relationships among points. As a result, the pooling layer cannot generate sufficiently distinctive global feature vectors, which leads to low classification and segmentation accuracy. Method To improve performance, a graph CNN (GCN) and an improved pooling layer function are introduced into the classification and segmentation model. This strengthens local feature representation and yields richer global features, improving the network's ability to process point cloud data.
In the GCN, a graph structure is constructed by connecting each vertex to its K nearest points with the K-nearest neighbor algorithm. Convolution is then applied to the edges and relative positions of adjacent point pairs in the graph, so that finer local features implicit in the point cloud are extracted. The graph structure is not fixed: it is updated dynamically, and the graph convolution module can be stacked several times in the network to further capture the local characteristics of the point cloud. In the pooling layer, a hybrid pooling structure composed of two parallel pooling channels is adopted to obtain the global feature vectors. The max pooling channel extracts the maximal feature vector, while the max-average pooling channel produces a synthetic feature from the maximal and mean feature vectors. The resulting feature vectors are concatenated to form the final global feature vector of the network, which makes the network robust to jittered data. Result The ModelNet40, ShapeNet, and Stanford large-scale 3D indoor semantics (S3DIS) datasets are widely used to test classification, part segmentation, and semantic scene segmentation performance. Experiments are carried out on these three datasets to validate the model. In the ModelNet40 classification experiment, the proposed model classifies better than the other competing models. Overall accuracy and average classification accuracy improve by 4% and 3.7%, respectively, compared with PointNet. On the ShapeNet part segmentation dataset, the mean intersection-over-union (mIoU) is used as the index for evaluating segmentation performance, and the proposed model also obtains satisfactory segmentation results in the comparison test.
Specifically, our model's mIoU is 1.4% higher than that of PointNet. On the S3DIS indoor scene dataset, our model's mIoU is 9.8% higher than that of PointNet. Furthermore, different pooling functions are tested to verify the effectiveness of the proposed hybrid pooling function. The results show that it improves average classification accuracy by 0.9% compared with the pooling function used in PointNet. Conclusion Experimental results show that introducing the GCN into the network model allows the local features of point cloud data to be extracted effectively. The hybrid pooling function also substantially improves the global features by incorporating additional information. Overall, the proposed network model effectively captures the global and local features of point cloud data and achieves better classification and segmentation results.
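The two-channel hybrid pooling described in the Method part of the abstract can be sketched as follows. The abstract states only that one channel takes the maximum and the other produces a synthetic feature from the maximal and mean vectors; how the two are combined is not specified, so the element-wise sum used below is an assumption, and the function name `hybrid_pool` is hypothetical.

```python
import numpy as np

def hybrid_pool(feats):
    """Pool per-point features of shape (N, C) into one global vector of shape (2C,)."""
    # Channel 1: max pooling over the points.
    max_feat = feats.max(axis=0)
    # Channel 2: "max-average" pooling.
    # ASSUMPTION: the synthetic feature is the element-wise sum of max and mean.
    max_avg_feat = max_feat + feats.mean(axis=0)
    # The two channel outputs are concatenated into the final global feature.
    return np.concatenate([max_feat, max_avg_feat])

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    per_point = rng.normal(size=(1024, 64))    # toy per-point features
    global_feat = hybrid_pool(per_point)

    # Both channels are symmetric functions, so point order does not matter.
    shuffled = per_point[rng.permutation(len(per_point))]
    print(global_feat.shape, np.allclose(global_feat, hybrid_pool(shuffled)))
```

Because max and mean are both symmetric in their inputs, the pooled vector is unchanged when the points are reordered, which is the property that lets a pooling layer handle unordered point clouds.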
Keywords
point cloud; deep learning; graph convolution neural network (GCN); hybrid pooling function; classification and segmentation; joint feature