Deep pure pursuit: a human-like steering control model for autonomous driving
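Shan Yunxiao1, Huang Runhui2, He Ze1, Gong Zhihao1, Jing Min3, Zou Xuesong3 (1. School of Computer Science, Sun Yat-sen University, Guangzhou 510006; 2. School of Intelligent Systems Engineering, Sun Yat-sen University, Guangzhou 510006; 3. Yunzhou Intelligence Technology Co. Ltd., Zhuhai 519080)
Abstract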
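Objective In autonomous driving systems, steering a vehicle to track a given path is one of the key technologies for driving. Many methods based on traditional control can track a path accurately, but achieving human-like steering behavior during tracking remains one of the challenging problems for current tracking techniques. Existing traditional steering models do not take human driving behavior into account and can hardly imitate the driving process. In addition, most existing neural-network-based steering control models take only video frames as input and lack robustness and interpretability. We therefore propose a steering model that fuses a neural network with a traditional controller: the deep pure pursuit model (deep PP). Method In deep PP, a convolutional neural network (CNN) first extracts visual features of the driving environment, while a traditional pure pursuit (PP) controller combines the vehicle motion model with the vehicle's own position to compute the steering control needed to track the given globally planned path. The PP steering result vector and the visual feature vector are then concatenated into a fused feature vector, and a mapping model between the fused feature vector and human steering behavior is built to predict the steering angle of the autonomous vehicle. Result Experiments are conducted on a CARLA (Car Learning to Act) simulation dataset and a real-world dataset, and the model is compared with the CNN model of the Udacity challenge and with a traditional controller. The results show that, under 14 complex weather conditions in the simulation dataset, deep PP follows the steering commands of the autopilot more closely than the CNN model and the traditional steering controller. Measured by root mean square error (RMSE), deep PP improves on the CNN model by 50.28% and on the traditional controller by 35.39%. Finally, real-world experiments verify the practicality of the proposed model in real scenes. Conclusion The proposed human-like steering model combines camera visual information, position information, and vehicle motion model information, so that the steering behavior of the autonomous vehicle is closer to human driving behavior and remains highly robust under various complex driving conditions.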
Keywords
Human-like steering model for autonomous driving based on deep pure pursuit method
Shan Yunxiao1, Huang Runhui2, He Ze1, Gong Zhihao1, Jing Min3, Zou Xuesong3 (1. School of Computer Science, Sun Yat-sen University, Guangzhou 510006, China; 2. School of Intelligent Systems Engineering, Sun Yat-sen University, Guangzhou 510006, China; 3. Zhuhai Yunzhou Intelligence Technology Co. Ltd., Zhuhai 519080, China)
Abstract
Objective Path tracking, a long-studied component of autonomous vehicles, steers the vehicle along a defined path. A traditional steering controller predicts the front-wheel steering angle from the vehicle's current location and the path information. It can track the path, but its behavior is purely mechanical: it does not adapt to the driving scene or environment as a human driver does. Requiring human-like steering behavior makes the steering problem considerably harder. Researchers have therefore used neural networks as steering models, training them on images captured by a front-facing camera mounted on the vehicle together with the associated steering angles recorded from human drivers or simulators; such a model is known as an end-to-end neural network. However, most existing networks take only camera frames as input and ignore other available information such as the location, motion, and model of the vehicle. To learn the relationship between image frames and driving behavior and to generalize well, an end-to-end network needs a large-scale training dataset that covers all kinds of driving weather and scenes, such as rain, snow, overexposure, and underexposure. Its overdependence on cameras also makes steering performance highly sensitive to the environment.
Combining the traditional steering controller with an end-to-end neural network lets each offset the other's weaknesses. Trained on only a small-scale dataset that covers fewer driving scenes, the combined network can still produce control behavior that is human-like, robust, and applicable to multiple driving scenes. In this paper, we propose a fusion neural network framework called deep pure pursuit (deep PP), which incorporates a convolutional neural network (CNN) with a traditional steering controller to build a robust steering model. Method In this study, we build a human-like steering model that fuses a visual geometry group network (VGG)-type CNN with a traditional steering controller, pure pursuit (PP). The VGG-type CNN consists of eight layers: three convolutional layers, three pooling layers, and two fully connected layers. The convolutional layers use non-strided 3×3 convolutions with 32, 64, and 128 kernels. Each convolutional layer is followed by a 2×2 max-pooling layer with stride 2 to reduce the number of parameters. The fully connected layers function as the steering controller. While the CNN extracts visual features from video frames, PP exploits the location information and the motion model information. Fifty target points on the defined path ahead of the vehicle are selected, and PP computes a predicted front-wheel steering angle for each. The minimum and maximum look-ahead distances of PP are set to 1.5 m and 20 m ahead of the vehicle, respectively. The visual features from the CNN and the 50 steering angles from PP are then combined into a single feature vector, and the concatenated features are fed to the fully connected layers to build the mapping relationship. For data augmentation, the images are flipped and rotated to improve the model's ability to recover from a poor location or orientation.
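The PP branch described above can be illustrated with a minimal sketch. It uses the standard pure pursuit law for a bicycle model, delta = atan(2 L sin(alpha) / l_d), where L is the wheelbase, l_d the distance to the target point, and alpha the angle between the vehicle heading and the line to the target. Several details are assumptions made only for illustration and are not stated in the abstract: the 50 look-ahead distances are spaced evenly between 1.5 m and 20 m, the path is an ordered list of (x, y) points starting near the vehicle, the wheelbase default of 2.9 m is a placeholder, and the helper names (`pure_pursuit_angle`, `target_at_distance`, `pp_feature_vector`, `fuse`) are hypothetical.

```python
import math


def pure_pursuit_angle(pose, target, wheelbase):
    """Front-wheel steering angle (rad) that steers a bicycle model toward target."""
    x, y, yaw = pose
    dx, dy = target[0] - x, target[1] - y
    lookahead = math.hypot(dx, dy)
    if lookahead < 1e-6:
        return 0.0  # target coincides with the vehicle; no steering needed
    alpha = math.atan2(dy, dx) - yaw  # bearing of the target relative to heading
    # Equivalent to atan(2 * L * sin(alpha) / l_d) because lookahead > 0.
    return math.atan2(2.0 * wheelbase * math.sin(alpha), lookahead)


def target_at_distance(pose, path, lookahead):
    """First path point at least `lookahead` metres from the vehicle.

    Assumes `path` is ordered along the driving direction and starts near the vehicle.
    """
    x, y, _ = pose
    for px, py in path:
        if math.hypot(px - x, py - y) >= lookahead:
            return (px, py)
    return path[-1]  # path is shorter than the look-ahead distance


def pp_feature_vector(pose, path, wheelbase=2.9, n_points=50,
                      ld_min=1.5, ld_max=20.0):
    """Steering angles for n_points look-ahead distances (even spacing is an assumption)."""
    step = (ld_max - ld_min) / (n_points - 1)
    angles = []
    for i in range(n_points):
        target = target_at_distance(pose, path, ld_min + i * step)
        angles.append(pure_pursuit_angle(pose, target, wheelbase))
    return angles


def fuse(visual_features, pp_angles):
    """Concatenate CNN features with the PP angles into one feature vector."""
    return list(visual_features) + list(pp_angles)
```

For a vehicle at the origin heading along a straight path, `pp_feature_vector((0.0, 0.0, 0.0), path)` yields 50 angles that are all zero; on a curved path the angles vary with the look-ahead distance, which is the extra signal the fused vector hands to the fully connected layers.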
In each image, the bottom 30 pixels and the top 40 pixels are cropped to remove the front of the car and most of the sky above the horizon; the cropped images are then resized to a lower resolution to accelerate training and testing. Our model is implemented in Google's TensorFlow, and the experiments are conducted on a Titan X GPU. The maximum number of epochs is set to 10, each epoch contains 10 000 training frames, and the batch size is 32. The model is trained with the Adam optimizer at a learning rate of 1E-4 and uses ReLU as its activation function. Root mean square error (RMSE) is used to evaluate the performance of the different models. Result To train and validate the proposed solution, we collect datasets with the CARLA (Car Learning to Act) simulator and a real-life autonomous vehicle. On the simulation dataset, the models are trained under the ClearNoon weather parameter and evaluated under 14 poor driving-weather conditions. For the real-life dataset, 13 080 frames are collected for training and 2 770 frames for testing. To verify the effectiveness of deep PP, we compare it with a CNN model from the Udacity challenge and with a traditional steering controller, PP. The experimental results show that, under the 14 poor driving conditions, our steering model tracks the steering commands of the CARLA autopilot more closely than the CNN and PP do, improving the RMSE by 50.28% and 35.39%, respectively. The proposed model is also tested on the real-life dataset to demonstrate its applicability. An analysis of different look-ahead distances shows that the PP controller is sensitive to the look-ahead distance. The maximal deviation from the human driver's steering commands reaches 0.245 2 rad. An analysis of location noise for the PP controller and deep PP shows that deep PP is more robust to location drift.
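The cropping step and the RMSE metric mentioned above can be sketched in a few lines. This is only an illustration under stated assumptions: the image is a row-major list of pixel rows, the relative improvement is taken to be (baseline RMSE - model RMSE) / baseline RMSE, which the abstract does not define explicitly, and the function names (`crop_rows`, `rmse`, `relative_improvement`) are hypothetical.

```python
import math


def crop_rows(image, top=40, bottom=30):
    """Drop the top rows (sky) and bottom rows (front of the car) of a row-major image."""
    return image[top:len(image) - bottom]


def rmse(predicted, reference):
    """Root mean square error between predicted and reference steering angles."""
    if not predicted or len(predicted) != len(reference):
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((p - r) ** 2 for p, r in zip(predicted, reference))
    return math.sqrt(squared / len(predicted))


def relative_improvement(baseline_rmse, model_rmse):
    """Percentage reduction in RMSE relative to a baseline (assumed definition)."""
    return 100.0 * (baseline_rmse - model_rmse) / baseline_rmse
```

Under this assumed definition, a reported improvement of 50.28% would mean that the RMSE of deep PP is a little under half that of the CNN baseline.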
Conclusion In this study, we propose a fusion neural network framework that combines visual features from the camera with additional location information and motion model information. The experimental results show that our model tracks the steering commands of the autopilot or human driver more closely than the CNN model of the Udacity challenge and PP, and maintains high robustness under 14 poor driving conditions.
Keywords