Video object detection with improved bounding box localization stability

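Hao Tenglong1,2,3, Li Xiying1,2,3 (1. Research Center of Intelligent Transportation, School of Intelligent Systems Engineering, Sun Yat-sen University, Guangzhou 510006;
2. Guangdong Provincial Key Laboratory of Intelligent Transportation Systems, Guangzhou 510006;
3. Key Laboratory of Video and Image Intelligent Analysis and Application Technology, Ministry of Public Security, Guangzhou 510006)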

Abstract
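Objective Most current research on object detection from video focuses on improving the localization accuracy of predicted bounding boxes, whereas localization stability has received far less attention. However, bounding box stability strongly affects downstream algorithms such as multi-object tracking and vehicle driving control. To improve it, this paper proposes an expanded non-maximum suppression (Exp_NMS) method and an inter-frame smoothing strategy, frame bounding box smooth (FBBS). Method The object detection stage uses the YOLO (you only look once) v3 neural network. In the non-maximum suppression stage, the result is obtained by fusing information from multiple predicted boxes, which enhances the stability of the bounding boxes across a continuous video stream. Then, exploiting the correlation between adjacent video frames, the bounding boxes are smoothed to further improve localization stability. Result The UA-DETRAC (University at Albany detection and tracking benchmark dataset) dataset is used for the analysis experiments, and a Kalman filter multi-object tracking algorithm is used for auxiliary verification. Building on the MOT (multiple object tracking) evaluation metrics, this paper designs the average track-tortuosity (AT) to measure, intuitively and quantitatively, the localization stability of bounding boxes and the smoothness of tracking trajectories. The experimental results show that the proposed method has almost no effect on localization accuracy while substantially improving localization stability, and tracking quality improves markedly as a result. On the test videos, MOTA (multiple object tracking accuracy) increases by 6.0%, IDs (identity switches) decrease by 16.8%, tracking FP (false positive) errors decrease by 45.83%, AT decreases by 36.57%, and mAP (mean average precision) decreases by only 0.07%. Conclusion Strategies are designed from two perspectives: non-maximum suppression and the association of information between preceding and following frames. Experiments verify that the proposed method effectively improves bounding box localization stability while leaving localization accuracy essentially unaffected.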
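The abstract states only that the NMS stage fuses information from multiple predicted boxes instead of discarding them; it does not give the fusion rule. The following is a minimal sketch of one plausible variant, assuming score-weighted averaging of every box that overlaps the top-scoring box. The function names `iou` and `fused_nms`, the weighting scheme, and the 0.5 IoU threshold are all illustrative assumptions, not the authors' Exp_NMS.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def fused_nms(boxes: List[Box], scores: List[float],
              iou_thresh: float = 0.5) -> List[Tuple[Box, float]]:
    """Greedy NMS that merges each overlapping cluster instead of dropping it.

    Assumption: the fused box is the score-weighted mean of the cluster.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    results: List[Tuple[Box, float]] = []
    while order:
        top = order[0]
        # The top box always belongs to its own cluster.
        cluster = [top] + [i for i in order[1:]
                           if iou(boxes[top], boxes[i]) >= iou_thresh]
        total = sum(scores[i] for i in cluster)
        if total <= 0:
            fused = boxes[top]  # no usable weights, keep the top box
        else:
            fused = tuple(
                sum(scores[i] * boxes[i][k] for i in cluster) / total
                for k in range(4)
            )
        results.append((fused, scores[top]))
        order = [i for i in order if i not in cluster]
    return results
```

Because the output is an average over the cluster, a small frame-to-frame change in which candidate scores highest moves the final box less than it would under standard NMS, which keeps only the single top box.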
Keywords
Video object detection method for improving the stability of bounding box

Hao Tenglong1,2,3, Li Xiying1,2,3 (1. Research Centre of Intelligent Transportation System, School of Engineering, Sun Yat-sen University, Guangzhou 510006, China;
2. Key Laboratory of Intelligent Transportation System of Guangdong Province, Guangzhou 510006, China;
3. Key Laboratory of Video and Image Intelligent Analysis and Application Technology, Ministry of Public Security, People's Republic of China, Guangzhou 510006, China)

Abstract
Objective With the development of convolutional neural networks (CNNs), the speed and accuracy of CNN-based object detection algorithms have improved remarkably. However, when these algorithms are applied to video frame by frame, the bounding boxes of the same target change sharply between adjacent frames, which reflects poor bounding box stability. This problem has received little attention because it does not arise in single-image object detection. In object detection from video (VID), stability refers to whether the bounding box of the same target changes smoothly and uniformly across successive frames, whereas accuracy refers to the degree of overlap between the bounding box and the actual position. Mean average precision (mAP), the commonly used evaluation index, considers only accuracy and ignores stability. However, bounding box stability is extremely important for engineering applications. In self-driving systems, system stability is directly related to driving safety. At present, self-driving research is entering the L5 stage, in which vehicle driving control must sense and predict the movement of surrounding vehicles and pedestrians to make decisions rather than simply react to specific external conditions. Object detection is the basic algorithm with which a self-driving system senses its surroundings. Poor stability negatively impacts every algorithm that analyzes the object detection results, ultimately reducing the stability of the entire self-driving system and creating potential safety hazards. Thus, designing strategies to solve this problem is necessary. We propose expanded non-maximum suppression (Exp_NMS) and frame bounding box smoothing (FBBS) strategies in this paper. Method We design the Exp_NMS and FBBS strategies on the basis of the YOLO (you only look once) v3 object detection algorithm.
The overall process of the algorithm is to send the video frame by frame to the YOLOv3 network for object detection, use Exp_NMS to eliminate redundant bounding boxes, and then apply FBBS to smooth the results. The original NMS strategy may directly discard some bounding boxes and thereby cause poor stability, so Exp_NMS instead obtains its results by fusing the information of multiple bounding boxes. FBBS follows the idea of associating information across adjacent frames, which is widely used in VID algorithms. Unlike conventional strategies, FBBS uses least squares regression to pass information between adjacent frames rather than relying on additional information such as optical flow. FBBS mitigates multidetection and missed-detection errors to some extent and has a stronger effect on the stability problem. Result The scenarios in engineering applications are variable and complicated, so the training dataset should cover as many scenarios as possible. This paper uses MIO-TCD (Miovision traffic camera dataset), collected from thousands of real traffic scenarios, as the object detection training dataset and utilizes UA-DETRAC (University at Albany detection and tracking benchmark dataset) as the test dataset, because the MIO-TCD dataset cannot be used to evaluate multiobject tracking results. This paper uses YOLOv3 and a Kalman filter multiobject tracking algorithm for the verification experiments. The stability of the bounding box has a significant effect on the tracking algorithm, and most tracking algorithms are based on the Kalman filter. This paper designs a parameter called average track-tortuosity (AT) to measure the stability of the bounding box and the smoothness of the tracking trajectory. Experimental results show that our method significantly improves bounding box stability with almost no effect on accuracy, and the accuracy of the tracking algorithm improves as well.
Multiple object tracking accuracy (MOTA) increases by 6.0% and the number of track identity switches decreases by 16.8% when Exp_NMS and FBBS are used. The number of tracking false positive errors decreases by 45.83%, AT decreases by 36.57%, and mAP decreases by only 0.07%. Conclusion In this paper, we analyze the causes and manifestations of the bounding box stability problem and design two strategies from the perspectives of NMS and adjacent-frame information association. The experimental results show that the two strategies significantly enhance bounding box stability while leaving accuracy essentially unaffected.
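The abstract says FBBS uses least squares regression to pass information between adjacent frames, but it does not specify the model or window. As a rough illustration, the sketch below fits a straight line to each box coordinate over a trailing window of frames and takes the fitted value at the current frame. The linear model, the per-coordinate fit, the window length of 5, and the names `fit_line` and `smooth_track` are assumptions made for this example, not the authors' implementation.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def fit_line(ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares line y = slope * t + intercept for t = 0..n-1."""
    n = len(ys)
    if n == 1:
        return 0.0, ys[0]
    t_mean = (n - 1) / 2.0
    y_mean = sum(ys) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(ys))
    slope = sxy / sxx
    return slope, y_mean - slope * t_mean


def smooth_track(track: List[Box], window: int = 5) -> List[Box]:
    """Replace each box by its least-squares fit over the trailing window.

    Assumption: `track` holds the boxes of ONE target in frame order.
    """
    smoothed: List[Box] = []
    for f in range(len(track)):
        recent = track[max(0, f - window + 1): f + 1]
        coords = []
        for k in range(4):
            slope, intercept = fit_line([b[k] for b in recent])
            # Evaluate the fitted line at the newest frame in the window.
            coords.append(slope * (len(recent) - 1) + intercept)
        smoothed.append((coords[0], coords[1], coords[2], coords[3]))
    return smoothed
```

Under these assumptions, a target moving at constant velocity passes through unchanged, while frame-to-frame jitter around that motion is damped. A longer window smooths more but reacts more slowly to real changes in speed or size.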
Keywords
