Parallel visual perception for intelligent driving: basic concepts, framework, and applications

Li Xuan1,2,3, Wang Feiyue3 (1. Peng Cheng Laboratory, Shenzhen 518055, China; 2. School of Automation, Beijing Institute of Technology, Beijing 100081, China; 3. The State Key Laboratory of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China)

Abstract
Objective As a promising solution to traffic congestion and accidents, intelligent vehicles are receiving increasing attention. Efficient visual perception technology can meet the safety, comfort, and convenience requirements of intelligent vehicles; visual perception is therefore a key technology in intelligent vehicle systems. Research on intelligent driving focuses on improving visual performance in complex tasks. However, complex imaging conditions pose significant challenges to visual perception research. Vision models rely on diverse datasets to ensure their performance. Unfortunately, obtaining annotations by hand is cumbersome, labor intensive, and error prone, and the cost of data collection and annotation is high. Owing to the limitations of model design and data diversity, general visual tasks still struggle with weather changes, illumination changes, and occlusions. A critical question therefore arises naturally: how can we ensure that an intelligent vehicle is able to drive safely in complex and challenging traffic? In this paper, the artificial systems, computational experiments, and parallel execution (ACP) method is introduced into the field of visual perception, and we propose parallel visual perception for intelligent driving. The purpose of this paper is to solve the problem of reasonably training and evaluating the vision models of intelligent driving, which helps move intelligent vehicles toward practical application.

Method Parallel visual perception consists of three parts: the artificial driving scene, computational experiments, and parallel execution. Specifically, the artificial driving scene is a software-defined scene built with modern 3D modeling software, computer graphics, and virtual reality. Artificial driving scene modeling combines artificial subsystems, turning the artificial scene into a "computational laboratory" in which the visual perception of intelligent vehicles can be studied under complex conditions. In the artificial scene, we use computer graphics to automatically generate accurate ground-truth labels, including semantic/instance segmentation, object bounding boxes, object tracking, optical flow, and depth. According to the imaging conditions, we design 19 challenging tasks divided into normal, environmental, and difficult tasks. The reliability of the vision model requires repeatable computational experiments to obtain the optimal solution. Two modes of computational experiments are used, namely, learning and training, and experiment and evaluation; a minimal sketch of both modes is given after this abstract. In the training stage, the artificial driving scene provides a large variety of virtual images which, combined with real images, can improve the performance of the vision model; experiments can thus be conducted in the artificial driving scene at low cost and with high efficiency. In the evaluation stage, the complex imaging conditions (weather, illumination, and occlusion) of the artificial driving scene can be used to comprehensively evaluate the performance of the vision model. Vision algorithms can be tested in a targeted manner, which helps improve the visual perception performance of intelligent driving. Finally, parallel execution in artificial and real driving scenes dynamically optimizes the vision model, ensuring that training and evaluation remain effective over the long term. Through this virtual-real interaction, the experimental results of the vision model in the artificial driving scene can become possible outcomes of the real system.
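To make the two modes of computational experiments concrete, the following is a minimal sketch, not the authors' implementation: the task names, the `evaluate` callable, and the demo AP values are illustrative assumptions, and the structure only mirrors the workflow described above, i.e., merging real and virtual training data, then scoring a detector on each challenging task and reporting its drop against a baseline.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Illustrative grouping of challenging tasks into the paper's three
# categories (normal / environmental / difficult). The paper defines
# 19 tasks in total; the names below are placeholders, not its list.
TASK_GROUPS: Dict[str, List[str]] = {
    "normal": ["camera_minus_30deg", "camera_vertical_shift"],
    "environmental": ["fog", "rain"],
    "difficult": ["all_challenges_combined"],
}

@dataclass
class TaskResult:
    group: str   # task category
    ap: float    # average precision on the task's test split (%)
    drop: float  # degradation vs. the baseline split (percentage points)

def build_mixed_training_set(real_samples: list, virtual_samples: list) -> list:
    """Mode 1 (learning and training): merge real data (e.g., KPC images)
    with virtual images rendered in the artificial driving scene."""
    return real_samples + virtual_samples

def evaluate_by_task(evaluate: Callable[[str], float],
                     baseline_ap: float) -> Dict[str, TaskResult]:
    """Mode 2 (experiment and evaluation): score a detector on every task
    split and report the drop relative to the baseline."""
    report: Dict[str, TaskResult] = {}
    for group, tasks in TASK_GROUPS.items():
        for task in tasks:
            ap = evaluate(task)
            report[task] = TaskResult(group, ap, baseline_ap - ap)
    return report

if __name__ == "__main__":
    # Toy evaluator with made-up AP values, standing in for a real
    # detector tested on rendered task splits.
    demo_ap = {"camera_minus_30deg": 52.0, "camera_vertical_shift": 49.6,
               "fog": 39.9, "rain": 43.0, "all_challenges_combined": 27.2}
    for task, r in evaluate_by_task(demo_ap.__getitem__, 60.9).items():
        print(f"{r.group:13s} {task:24s} AP={r.ap:5.1f} drop={r.drop:5.1f}")
```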
Result This paper presents a systematic method for designing driving scene tasks and generating virtual datasets for vehicle intelligence testing research. Currently, the virtual dataset consists of 39 010 frames (virtual training data with 27 970 frames, normal tasks with 5 520 frames, environmental tasks with 2 760 frames, and difficult tasks with 2 760 frames) taken from our constructed artificial scenes. In addition, we conduct a series of comparative experiments on visual object detection. In the training stage, the experimental results show that large-scale, diverse training data greatly improve the performance of object detection, and that this data augmentation significantly improves the accuracy of the vision models. For instance, the highest accuracy of the mixed (virtual and real) training sets is 60.9%, which is 17.9% and 5.3% higher than training on KPC (KITTI (Karlsruhe Institute of Technology and Toyota Technological Institute), PASCAL VOC (pattern analysis, statistical modelling and computational learning visual object classes), and MS COCO (Microsoft common objects in context)) data alone and on pure virtual data, respectively. In the evaluation stage, compared with the baseline model, the average accuracy of normal tasks (a -30° camera angle combined with vertical displacement) decreased by 11.3%, that of environmental tasks (fog) by 21.0%, and that of difficult tasks (all challenges combined) by 33.7%; a worked check of these figures is given after this abstract. The experimental results suggest that 1) object detectors are slightly disturbed by changes in camera angle and are challenged more when the height and angle of the camera change simultaneously; the vision model of an intelligent vehicle is prone to overfitting, which is why object detection succeeds only under limited conditions; 2) the vision model cannot learn the features of unseen environments from the training data, so bad weather (e.g., fog and rain) degrades performance more strongly than normal tasks do; and 3) performance in difficult tasks drops sharply, mainly because of the poor generalization of the vision model.

Conclusion In this study, we use computer graphics, virtual reality technology, and machine learning theory to build artificial driving scenes and generate a realistic and challenging virtual driving dataset. On this basis, we design and conduct visual perception experiments under complex imaging conditions that would be difficult or even impossible to carry out in real driving scenes, so the vision models of intelligent vehicles are effectively trained and evaluated in artificial and real driving scenes. This strengthens the ability of intelligent vehicles to perceive and understand their surroundings while driving. In the future, we plan to add more visual challenges to the artificial driving scene.
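As a quick worked check of the figures above, here is a short sketch using only numbers stated in this abstract; reading the reported training-stage differences as absolute percentage points is an assumption.

```python
# Virtual dataset composition: the split sizes reported above sum to
# the stated total of 39 010 frames.
splits = {"virtual_training": 27_970, "normal_tasks": 5_520,
          "environmental_tasks": 2_760, "difficult_tasks": 2_760}
assert sum(splits.values()) == 39_010

# Training stage: mixed virtual + real data reaches 60.9%; KPC-only and
# virtual-only training are reported as 17.9 and 5.3 points lower.
mixed_ap = 60.9
kpc_only_ap = mixed_ap - 17.9      # 43.0
virtual_only_ap = mixed_ap - 5.3   # 55.6
print(f"KPC only: {kpc_only_ap:.1f}%, virtual only: {virtual_only_ap:.1f}%")
```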
Keywords
