3D object recognition by fusing a perceptron residual network and an extreme learning machine
Abstract
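Objective With the development of 3D scanning and virtual reality technology, 3D recognition of real objects has become a research hotspot. To address the long training times and unsatisfactory recognition performance of existing deep-learning-based methods, a 3D object recognition method combining a perceptron residual network and the extreme learning machine (ELM) is proposed. Method Built on the ELM framework, the method uses a multi-layer perceptron residual network to learn multi-view projection features of 3D objects, and uses the extracted feature data and the known label data to train an ELM classification layer, a K-nearest neighbor (KNN) classification layer, and a support vector machine (SVM) classification layer simultaneously to recognize 3D objects. The network replaces traditional convolutional layers with convolutional layers augmented by a multi-layer perceptron. The convolutional network is composed of improved residual units, each containing multiple parallel residual channels with a constant number of convolution kernels, which are used to fit residual functions of different mathematical forms. Half of the convolution kernel parameters and perceptron parameters in the network are generated randomly from a Gaussian distribution, and the rest are obtained through training optimization. Result The proposed method achieves 94.18% accuracy on the Princeton 3D model dataset and 97.46% accuracy on the 2D NORB dataset. The algorithm achieves the best results to date on both international benchmark datasets. In addition, using the ELM framework reduces the training time of the proposed algorithm by three orders of magnitude compared with deep-learning-based methods. Conclusion This paper proposes a method for recognizing 3D objects from multi-view images. Experiments show that it achieves a higher recognition rate and stronger robustness to interference than existing ELM methods and recent methods such as deep learning, while requiring few tuning parameters and converging quickly.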
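The ELM classification layer mentioned above can be illustrated with a minimal sketch of a standard single-hidden-layer ELM: the hidden weights are drawn from a Gaussian distribution and left untrained, and only the output weights are solved in closed form. This is not the authors' network. The function names `train_elm` and `predict_elm`, the `tanh` activation, the hidden size, and the ridge term `reg` are all assumptions made for illustration, and the input `features` stands in for the features extracted by the residual network.

```python
import numpy as np

def train_elm(features, labels, n_hidden=64, reg=1e-3, seed=0):
    """Fit a basic ELM classifier (illustrative; names and defaults are assumptions)."""
    rng = np.random.default_rng(seed)
    n_classes = int(labels.max()) + 1
    # Hidden-layer weights and biases are sampled from a Gaussian and never trained.
    W = rng.standard_normal((features.shape[1], n_hidden))
    b = rng.standard_normal(n_hidden)
    H = np.tanh(features @ W + b)        # hidden-layer activations
    targets = np.eye(n_classes)[labels]  # one-hot encoding of integer labels
    # Output weights: ridge-regularized least squares, solved in closed form.
    beta = np.linalg.solve(H.T @ H + reg * np.eye(n_hidden), H.T @ targets)
    return W, b, beta

def predict_elm(model, features):
    """Return one row of class scores per sample."""
    W, b, beta = model
    return np.tanh(features @ W + b) @ beta
```

Because training reduces to a single linear solve rather than iterative backpropagation, this kind of layer is typically fast to fit, which is consistent with the training-time advantage the abstract attributes to the ELM framework.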
Keywords
3D object recognition combining perceptron residual network and extreme learning machine
Huang Qiang1, Wang Yongxiong1,2 (1. School of Optical Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China; 2. Shanghai Engineering Research Center of Assistive Devices, Shanghai 200093, China)
Abstract
Objective With the development of 3D scanning and virtual reality technology, 3D recognition of real objects has become a major research topic. It is also one of the most challenging tasks in natural scene understanding. Recognizing objects from smartphone photos is already widely used because 2D images are relatively easy to acquire and process. Recent advances in real-time SLAM and laser scanning have made 3D models of real objects more available, so effective methods are needed to process these models and recognize the corresponding 3D objects or scenes. Some studies use image-based methods that obtain 3D features through deep convolutional neural networks and are memory efficient. Other studies use point-set or volume-based methods, whose inputs are closer to the structure of the actual objects; the networks are correspondingly more complicated and require substantial computing resources. These studies have made progress, but the accuracy and real-time performance of 3D object recognition still need improvement. To address this problem, this study proposes a new 3D object recognition model that combines a perceptron residual network with the extreme learning machine (ELM). Method Built on the ELM framework, the model uses the proposed multi-layer perceptron residual network to learn multi-view projection features of 3D objects. It also uses a multi-channel integrated classifier composed of an ELM, a K-nearest neighbor (KNN) classifier, and a support vector machine (SVM) to identify 3D objects. The classifiers are not simply stacked, which would carry a high risk of overfitting. Instead, after the ELM prediction output vector is obtained, the difference e between its largest and second-largest probability values is calculated.
When the difference is small, the two corresponding categories are both close to the real category, and the current classifier has a high probability of making a classification error; in that case, the other two classifiers are used for classification. Comparing e with a threshold T avoids running multiple classifiers on every sample without loss of precision; unlike AdaBoost, the network uses only one classifier in most cases. To increase the nonlinearity of the low-level network, a convolutional layer with a multi-layer perceptron replaces the traditional convolutional layer. The convolutional network consists of the proposed improved residual units. Each unit contains multiple parallel residual channels with a constant number of convolution kernels, which can fit residual functions of different mathematical forms, and convolution kernel parameters of the same size are shared. Unlike in the traditional extreme learning machine, half of the convolution kernel parameters and perceptron parameters in the network are randomly generated from a Gaussian distribution, and the remaining parameters are obtained through training optimization. The extracted feature data and the known label data are used to train the extreme learning machine, KNN, and SVM classification layers. A confidence threshold set at the output layer lets the network decide whether to use the KNN and SVM classifiers, and a voting mechanism selects the output class. Result The proposed method achieves 94.18% accuracy on the Princeton 3D model dataset and 97.46% accuracy on the NORB 2D image dataset. The Princeton 3D model dataset is a widely used benchmark for validating 3D object recognition; its 3D models include common furniture, vehicles, musical instruments, and electronics.
The NORB dataset is one of the most commonly used image datasets. The proposed method achieves the best results on both benchmark datasets. Because it is built on the extreme learning machine framework, its training time is three orders of magnitude shorter than that of other deep learning methods, which makes the approach suitable for practical applications. In addition, we verify the effects of different parameters on recognition performance, such as the number of projected views, the number of residual channels, and the confidence threshold T of the classification layer. Conclusion Experiments show that the proposed method has higher recognition accuracy and stronger robustness to interference than existing ELM methods and deep learning methods. It has fewer tunable parameters and converges faster. The proposed network is suitable for both 3D object recognition and common image recognition. This study explores a network that can handle high-dimensional data with low complexity, and the experiments demonstrate that it performs well.
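The confidence-threshold decision described in the Method part can be sketched as follows. This is a minimal illustration for a single sample, not the authors' implementation. The function name `cascade_predict` and its argument names are hypothetical, the inclusion of the ELM's own top class in the vote is an assumption, and so is the fallback to the ELM prediction when all three classifiers disagree, since the abstract does not specify either detail.

```python
import numpy as np

def cascade_predict(elm_probs, knn_label, svm_label, threshold):
    """Pick a class for one sample (illustrative sketch; names are assumptions).

    elm_probs: 1-D array of ELM class probabilities for the sample.
    knn_label, svm_label: integer class labels predicted by KNN and SVM.
    threshold: the confidence threshold T from the text.
    """
    order = np.argsort(elm_probs)
    top1, top2 = order[-1], order[-2]
    # e: gap between the largest and second-largest ELM probabilities.
    e = elm_probs[top1] - elm_probs[top2]
    if e >= threshold:
        # Confident case: only the ELM output is used.
        return int(top1)
    # Uncertain case: majority vote over ELM, KNN, and SVM (assumed scheme).
    votes = np.bincount([top1, knn_label, svm_label], minlength=len(elm_probs))
    if votes.max() >= 2:
        return int(votes.argmax())
    # Assumed tie-break: keep the ELM prediction when all three disagree.
    return int(top1)
```

In a real pipeline the KNN and SVM predictions would be computed only inside the uncertain branch, so that most samples pay for a single classifier; they are passed in here only to keep the sketch short.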
Keywords
multi-layer perceptron residual network; multi-channel classifier; extreme learning machine (ELM); 3D object recognition; feature extraction