Research progress of automatic driving control technology based on reinforcement learning

Pan Feng1,2, Bao Hong2 (1. Beijing University of Chemical Technology, Beijing 100029, China; 2. Beijing Union University, Beijing 100101, China)

Abstract
An autonomous vehicle is in essence a wheeled mobile robot: an integrated system combining pattern recognition, environment perception, planning and decision making, and intelligent control. Advances in artificial intelligence and machine learning have greatly promoted the development of autonomous driving technology. Mainstream machine learning methods fall into three categories: supervised learning, unsupervised learning, and reinforcement learning. Reinforcement learning is well suited to the intelligent handling of decision making and control for autonomous driving in complex traffic scenarios and helps improve the comfort and safety of autonomous driving. Deep reinforcement learning, which combines deep learning with reinforcement learning, has become a popular research direction in machine learning. This paper first briefly introduces autonomous driving technology, reinforcement learning methods, and the control architecture of autonomous vehicles, and describes the basic principles and research status of reinforcement learning. It then reviews the history and current state of research on reinforcement learning in autonomous driving control, presents typical applications of reinforcement-learning-based control drawn from the research and testing work of the intelligent vehicle research team at Beijing Union University, and discusses the potential of deep reinforcement learning. Finally, it outlines the difficulties and challenges facing reinforcement learning in autonomous driving control, including safety in real-world environments, multi-agent reinforcement learning, and the design of reward functions consistent with human driving behavior. This survey helps readers understand the advantages and limitations of reinforcement learning for autonomous driving control and can serve as a design reference for autonomous driving control systems.

Research on fully automatic driving has been largely spurred by important international challenges and competitions, such as the well-known Defense Advanced Research Projects Agency (DARPA) Grand Challenge held in 2005. Self-driving cars and autonomous vehicles have migrated from laboratory development and testing to driving on public roads. Self-driving cars are autonomous decision-making systems that process streams of observations from different on-board sources, such as cameras, radars, lidars, ultrasonic sensors, global positioning system units, and inertial sensors. The development of autonomous vehicles promises a decrease in road accidents and traffic congestion. Most driving scenarios can be handled with classical perception, path planning, and motion control methods; the remaining unsolved scenarios are corner cases where traditional methods fail. In the past decade, advances in the field of artificial intelligence (AI) and machine learning (ML) have greatly promoted the development of autonomous driving, and autonomous driving is in turn a challenging application domain for ML.

ML methods can be divided into supervised learning, unsupervised learning, and reinforcement learning (RL). RL is a family of algorithms that allow agents to learn how to act in different situations: a map, or policy, is established from situations (states) to actions so as to maximize a numerical reward signal. Most autonomous vehicles have a modular hierarchical structure that can be divided into four components or layers, namely, perception, decision making, control, and actuation. RL is suitable for decision making and control in complex traffic scenarios and can improve the safety and comfort of autonomous driving. Traditional controllers utilize an a priori model composed of fixed parameters; when robots or other autonomous systems are deployed in complex environments such as driving, such controllers cannot foresee every situation the system has to cope with. An RL controller, by contrast, is a learning controller that uses training data to improve its model over time: with every gathered batch of training data, the approximation of the true system model becomes more accurate. Deep neural networks have been applied as function approximators for RL agents, allowing agents to generalize their knowledge to new, unseen situations, and new algorithms have been developed for problems with continuous state and action spaces.

This paper introduces the current status and progress of RL methods applied to autonomous driving control and consists of five sections. The first section introduces the background of autonomous driving and basic knowledge about ML and RL. The second section briefly describes the architecture of an autonomous driving framework. The control layer is an essential part of an autonomous vehicle and has always been a key area of autonomous driving research. The control system mainly comprises lateral control and longitudinal control, namely, steering control and velocity control: lateral control deals with the path-tracking problem, while longitudinal control deals with tracking a reference speed and keeping a safe distance from the preceding vehicle. The third section introduces the basic principles of RL methods and reviews the current research status of RL in autonomous driving control.
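To make the state-to-action mapping described above concrete, the RL objective can be written as follows. This is the textbook formulation for a discounted Markov decision process, not notation taken from any of the surveyed papers:

```latex
% A policy maps states to actions: \pi : \mathcal{S} \rightarrow \mathcal{A}.
% The agent seeks the policy that maximizes the expected discounted return:
J(\pi) = \mathbb{E}_{\pi}\!\left[ \sum_{t=0}^{\infty} \gamma^{t}\, r(s_t, a_t) \right],
\qquad \pi^{*} = \arg\max_{\pi} J(\pi),
```

where $\gamma \in [0, 1)$ is the discount factor and $r(s_t, a_t)$ is the numerical reward signal.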
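The four-layer modular structure mentioned above can be sketched as a minimal processing loop. All class and method names below are hypothetical illustrations of the perception-decision-control-actuator decomposition, not an interface from any cited system:

```python
from dataclasses import dataclass

@dataclass
class Observation:
    """Raw sensor streams (camera, lidar, radar, GPS/IMU)."""
    frames: object
    point_cloud: object

@dataclass
class Command:
    """Low-level command passed to the actuator layer."""
    steering: float   # rad
    throttle: float   # [0, 1]
    brake: float      # [0, 1]

class AutonomousDrivingStack:
    """Hypothetical four-layer pipeline: perception -> decision -> control -> actuator."""

    def step(self, obs: Observation) -> Command:
        world_model = self.perceive(obs)           # perception layer
        maneuver = self.decide(world_model)        # decision-making layer
        cmd = self.control(world_model, maneuver)  # control layer
        return cmd                                 # handed to the actuator layer

    def perceive(self, obs): ...
    def decide(self, world): ...
    def control(self, world, maneuver): ...
```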
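Similarly, the lateral and longitudinal control tasks can be illustrated with two classical baselines: a proportional steering law on cross-track and heading error, and a PI speed tracker. This is a simplified sketch with assumed gains, not a controller from the surveyed work:

```python
class LateralController:
    """Proportional steering on cross-track and heading error (path tracking)."""

    def __init__(self, k_cte: float = 0.5, k_heading: float = 1.0):
        self.k_cte = k_cte
        self.k_heading = k_heading

    def steer(self, cross_track_error: float, heading_error: float) -> float:
        # Negative feedback on lateral deviation from the path and heading deviation.
        return -(self.k_cte * cross_track_error + self.k_heading * heading_error)

class LongitudinalController:
    """PI speed control toward a reference velocity (velocity tracking)."""

    def __init__(self, kp: float = 0.8, ki: float = 0.05, dt: float = 0.05):
        self.kp, self.ki, self.dt = kp, ki, dt
        self._integral = 0.0

    def throttle(self, v_ref: float, v: float) -> float:
        error = v_ref - v
        self._integral += error * self.dt
        u = self.kp * error + self.ki * self._integral
        return max(0.0, min(1.0, u))  # clamp to the valid throttle range [0, 1]
```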
RL algorithms are based on the Markov decision process (MDP) and aim to learn a mapping from situations to actions that maximizes a scalar reward or reinforcement signal. RL is at once a new and a very old topic in AI; it became an active and identifiable area of ML in the 1980s. Q-learning is a widely used RL algorithm, but it relies on a tabular representation and can only deal with problems that have low-dimensional, discrete state and action spaces. A primary goal of AI is to solve complex tasks from unprocessed, high-dimensional sensory input. Significant progress has been made by combining deep learning for sensory processing with RL, resulting in the deep Q-network (DQN) algorithm, which achieves human-level performance on many Atari video games using unprocessed pixels as input. However, DQN can only handle discrete and low-dimensional action spaces. The deep deterministic policy gradient (DDPG) algorithm was proposed to handle problems with continuous state and action spaces; it can learn policies directly from raw pixel inputs.

The fourth section summarizes typical applications of RL algorithms in autonomous driving, including several studies by our team. Unlike supervised learning, RL is well suited to the decision making and control of autonomous driving. Most RL algorithms used in autonomous driving combine deep learning and use raw pixels as input to achieve end-to-end control.

The last section discusses the challenges encountered when applying RL algorithms to autonomous driving control. The first challenge is how to deploy an RL model trained in a simulator to a real environment while ensuring safety. The second challenge is the RL problem in environments with multiple participants: multi-agent RL is an important direction of RL development, but training multiple agents is considerably more complicated than training a single agent. The third challenge is how to train an agent with a reasonable reward function. In most RL settings, we typically assume that a reward function is given, but this is not always the case; imitation learning and inverse RL provide effective means of recovering a reward function under which the agent's behavior approaches that of a human driver. This article helps readers understand the advantages and limitations of RL methods in autonomous driving control and the potential of deep RL, and it can serve as a reference for the design of autonomous driving control systems.
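For reference, the tabular Q-learning update surveyed above takes the standard form:

```latex
Q(s_t, a_t) \leftarrow Q(s_t, a_t)
  + \alpha \left[ r_{t+1} + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t) \right],
```

where $\alpha$ is the learning rate and $\gamma$ the discount factor. Taking $\max_{a'}$ over a stored table is what restricts plain Q-learning to discrete, low-dimensional action spaces; it is this table that DQN replaces with a deep network.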
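A minimal sketch of the two ingredients that made DQN practical, experience replay and a target network, is shown below in PyTorch. Network sizes, hyperparameters, and the vector state input are illustrative assumptions (the original DQN used convolutional layers over raw pixels):

```python
import random
from collections import deque

import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Small fully connected Q-value approximator (illustrative sizes)."""
    def __init__(self, state_dim: int, n_actions: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 128), nn.ReLU(),
            nn.Linear(128, n_actions),
        )

    def forward(self, x):
        return self.net(x)

class DQNAgent:
    def __init__(self, state_dim, n_actions, gamma=0.99, lr=1e-3):
        self.q = QNetwork(state_dim, n_actions)
        self.q_target = QNetwork(state_dim, n_actions)
        self.q_target.load_state_dict(self.q.state_dict())
        self.opt = torch.optim.Adam(self.q.parameters(), lr=lr)
        self.replay = deque(maxlen=100_000)  # experience replay buffer
        self.gamma, self.n_actions = gamma, n_actions

    def act(self, state, eps=0.1):
        if random.random() < eps:  # epsilon-greedy exploration
            return random.randrange(self.n_actions)
        with torch.no_grad():
            return int(self.q(torch.as_tensor(state, dtype=torch.float32)).argmax())

    def remember(self, s, a, r, s2, done):
        self.replay.append((s, a, r, s2, float(done)))

    def train_step(self, batch_size=64):
        if len(self.replay) < batch_size:
            return
        batch = random.sample(self.replay, batch_size)
        s, a, r, s2, done = map(
            lambda x: torch.as_tensor(x, dtype=torch.float32), zip(*batch))
        q_sa = self.q(s).gather(1, a.long().unsqueeze(1)).squeeze(1)
        with torch.no_grad():  # bootstrapped target from the frozen target network
            target = r + self.gamma * (1 - done) * self.q_target(s2).max(1).values
        loss = nn.functional.mse_loss(q_sa, target)
        self.opt.zero_grad()
        loss.backward()
        self.opt.step()
        # Periodically: self.q_target.load_state_dict(self.q.state_dict())
```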
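Finally, the reward-design challenge can be made concrete with a hand-crafted example. The terms and weights below are purely illustrative guesses at what human-like driving might reward; recovering such a function from demonstrations, rather than hand-tuning it, is precisely what imitation learning and inverse RL aim to do:

```python
def driving_reward(v, v_ref, lane_offset, jerk, collision):
    """Illustrative hand-crafted reward: speed tracking + lane keeping + comfort."""
    if collision:
        return -100.0                    # hard safety penalty dominates everything
    r_speed = -0.1 * abs(v - v_ref)      # track the reference velocity
    r_lane = -0.5 * lane_offset ** 2     # stay near the lane center
    r_comfort = -0.05 * jerk ** 2        # penalize jerky control for passenger comfort
    return r_speed + r_lane + r_comfort
```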