Roadside pedestrian perception and localization using binocular machine vision and the RetinaNet model

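Lian Lirong1, Luo Wenting1, Qin Yong2, Li Lin1 (1. School of Traffic and Civil Engineering, Fujian Agriculture and Forestry University, Fuzhou 350000; 2. State Key Laboratory of Rail Traffic Control and Safety, Beijing Jiaotong University, Beijing 100084)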

Abstract
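Objective Pedestrian perception is an indispensable part of autonomous driving and a safeguard of driving safety. The conventional pedestrian perception approach, which combines LiDAR with monocular vision, has high hardware costs, and matching multi-source data tends to introduce errors. To address this, this paper combines binocular machine vision with deep-learning image recognition to automatically perceive and accurately locate roadside pedestrians in public right-of-way road environments. Method A binocular road intelligent perception system is used to capture road foreground images, from which a training library for the pedestrian recognition model is built covering four traffic environments. The RetinaNet deep learning model automatically recognizes target pedestrians. The semi-global block matching (SGBM) algorithm computes the disparity of each pair of road foreground images containing pedestrians. Disparity values are then counted separately along the U and V directions of the resulting disparity map, and a ranging algorithm that combines the pedestrian recognition model with U-V disparity is proposed to determine the coordinates of target pedestrians. Result On a 2.5 km continuous test section, the pedestrian recognition results were compared with manual counts, and the proposed algorithm achieved a recall of 96.27%. Compared with YOLOv3 (you only look once) and Tiny-YOLOv3 under four traffic conditions, it achieved an average F-value of 96.42%, which is 0.9% and 3.03% higher than YOLOv3 and Tiny-YOLOv3, respectively. In addition, 20 pairs of binocular images of a calibration block were captured indoors at distances of 3 m, 4 m, and 5 m to validate the ranging algorithm; all computed standard deviations were less than 0.01. Conclusion The proposed method, which combines the RetinaNet object recognition model with an improved U-V disparity algorithm, can detect road pedestrians, can provide technical support for safe autonomous driving, and has practical application value.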
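The recall and F-value reported above are standard detection metrics. The sketch below shows how they are usually computed from matched detections, assuming the F-value is the balanced F1 score (the harmonic mean of precision and recall); the abstract reports neither precision nor raw counts, so the counts in the example are hypothetical and `detection_scores` is an illustrative name.

```python
def detection_scores(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Return (precision, recall, F1) from true-positive, false-positive and false-negative counts."""
    # Precision: fraction of reported pedestrians that are real pedestrians.
    precision = tp / (tp + fp) if tp + fp else 0.0
    # Recall: fraction of real pedestrians (e.g. from manual counting) that were detected.
    recall = tp / (tp + fn) if tp + fn else 0.0
    # Balanced F-value (F1): harmonic mean of precision and recall (assumed definition).
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical counts for illustration only; not the paper's data.
    p, r, f = detection_scores(tp=90, fp=10, fn=10)
    print(f"precision={p:.4f} recall={r:.4f} F1={f:.4f}")  # precision=0.9000 recall=0.9000 F1=0.9000
```

Recall alone only needs the manual ground-truth count; the F-value additionally penalizes false detections through precision.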
Keywords
Roadside pedestrian detection and location based on binocular machine vision and RetinaNet

Lian Lirong1, Luo Wenting1, Qin Yong2, Li Lin1(1.School of Traffic and Civil Engineering, Fujian Agriculture and Forestry University, Fuzhou 350000, China;2.State Key Laboratory of Rail Traffic Control and Safety, Beijing Jiaotong University, Beijing 100084, China)

Abstract
Objective Deep learning has been widely used in computer vision, and target recognition based on convolutional neural network (CNN) feature extraction has been applied to driverless vehicles. However, road traffic environments are complex and changeable, which makes obstacle detection under actual traffic conditions difficult. Because pedestrians are road users that vehicles must yield to and their behavior is highly variable, pedestrian detection is a particularly prominent problem in road obstacle detection. 1) Most current pedestrian recognition models are trained and tested on simple backgrounds, and little research has examined how well pedestrian targets are recognized in complex real road traffic. With the development of binocular stereo vision, image disparity has been used for target ranging: image pairs are captured with binocular stereo cameras, the disparity between the left and right images is computed with a stereo matching algorithm, a depth map is then derived from the disparity map, and road obstacles are finally detected. 2) To overcome the difficulties of extracting, matching, and tracking feature points across image sequences and of reconstructing the projected scene, an algorithm has been proposed that extracts obstacle coordinates from U-V histograms by counting disparity values along the U and V directions. Computing the U-V disparity images converts the two-dimensional planar information of the original image into line segments in the U and V directions, and line extraction methods such as least squares and the Hough transform are then used to extract the line segments corresponding to the road and to obstacles. 3) Such methods are computationally simple and favor real-time performance, but they are highly sensitive to noise in complex environments.
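The U-V disparity idea described above can be sketched as follows. This is a minimal NumPy illustration of the general technique, not the authors' implementation: `uv_disparity` and `fit_road_line` are hypothetical helper names, invalid disparities are assumed to be stored as non-positive values, and the road line is fitted by least squares through each row's dominant disparity (the Hough transform, the other option mentioned, is not shown).

```python
import numpy as np


def uv_disparity(disp: np.ndarray, max_disp: int) -> tuple[np.ndarray, np.ndarray]:
    """Build U- and V-disparity histograms from a disparity map.

    u_disp[d, u] counts pixels in column u whose disparity is d (shape: max_disp + 1 by width).
    v_disp[v, d] counts pixels in row v whose disparity is d (shape: height by max_disp + 1).
    Disparities are rounded to integers; values <= 0 or > max_disp are treated as invalid.
    """
    h, w = disp.shape
    d = np.rint(disp).astype(np.int64)
    valid = (d > 0) & (d <= max_disp)
    rows, cols = np.nonzero(valid)
    vals = d[rows, cols]
    u_disp = np.zeros((max_disp + 1, w), dtype=np.int64)
    v_disp = np.zeros((h, max_disp + 1), dtype=np.int64)
    np.add.at(u_disp, (vals, cols), 1)  # column-wise disparity counts
    np.add.at(v_disp, (rows, vals), 1)  # row-wise disparity counts
    return u_disp, v_disp


def fit_road_line(v_disp: np.ndarray) -> tuple[float, float]:
    """Least-squares fit of d = a * v + b through the dominant disparity of each non-empty row.

    On a flat road the ground plane appears as a slanted line in the V-disparity image.
    """
    rows = np.nonzero(v_disp.sum(axis=1) > 0)[0]
    peak_d = v_disp[rows].argmax(axis=1)
    a, b = np.polyfit(rows, peak_d, deg=1)
    return float(a), float(b)


if __name__ == "__main__":
    # Synthetic 40 x 30 disparity map: road disparity grows with the row index,
    # plus an upright obstacle (constant disparity 12) in rows 10..25, columns 12..17.
    disp = np.repeat((np.arange(40) // 4)[:, None], 30, axis=1).astype(float)
    disp[10:26, 12:18] = 12
    u_disp, v_disp = uv_disparity(disp, max_disp=16)
    print(u_disp[12, 10:20])  # the obstacle forms a horizontal segment at d = 12, columns 12..17
    print(fit_road_line(v_disp))  # slope close to 0.25 for the synthetic road
```

In the synthetic example the upright obstacle shows up as a horizontal segment in the U-disparity image, while the road rows form a slanted line in the V-disparity image; real disparity maps are much noisier, which is the weakness point 3) refers to.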
This paper proposes a method that combines deep learning with an improved U-V disparity algorithm to detect road pedestrians (including pedestrian recognition and localization) and thereby improve the driving safety of road vehicles. Method A binocular road intelligent perception system is used to collect road foreground images containing pedestrians, and a training dataset is built from data collected on four types of roadways. The RetinaNet model is used for pedestrian recognition. A deep residual network (ResNet) serves as the feature extraction network, and a feature pyramid network (FPN) builds multi-scale features, strengthening the feature network with multi-scale target information. Two fully convolutional network (FCN) subnetworks with the same structure but different parameters are applied to this feature network: one classifies the category of each target box and the other regresses the bounding box position. In the training phase, the pedestrian dataset was fed to RetinaNet for training and testing. Based on trials, the batch size was set to 24 and the learning rate to 0.000 1, and training ran for 100 epochs. During training, 400 samples were randomly selected from the training set as validation data to evaluate model performance. The loss was recorded at every epoch, and the model with the minimum loss was selected as the pedestrian recognition model. For stereo matching, horizontal gradient filtering is first applied to the left image, and the Birchfield and Tomasi (BT) matching cost between the left and right images is then computed. The left and right costs are fused, and, traversing the image pixel by pixel, the cost of each pixel is replaced by the sum of the costs over its surrounding area.
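The matching-cost steps above can be sketched in NumPy as follows. This is a simplified, hypothetical illustration rather than the authors' code: it computes the Birchfield-Tomasi sampling-insensitive cost for one candidate disparity and aggregates it with a plain box sum; the horizontal-gradient prefilter and the fusion of left and right costs are omitted, and the helper names are made up. For reference, OpenCV's `StereoSGBM`, an SGBM implementation, documents that it uses the Birchfield-Tomasi sub-pixel metric.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def _half_sample_range(img: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Per-pixel min/max of I(x), (I(x) + I(x-1)) / 2 and (I(x) + I(x+1)) / 2 along each row."""
    prev = np.concatenate([img[:, :1], img[:, :-1]], axis=1)  # I(x-1), edge replicated
    nxt = np.concatenate([img[:, 1:], img[:, -1:]], axis=1)   # I(x+1), edge replicated
    minus, plus = 0.5 * (img + prev), 0.5 * (img + nxt)
    lo = np.minimum(np.minimum(minus, plus), img)
    hi = np.maximum(np.maximum(minus, plus), img)
    return lo, hi


def bt_cost(left: np.ndarray, right: np.ndarray, d: int) -> np.ndarray:
    """Birchfield-Tomasi cost for matching left pixel x with right pixel x - d (inf where x < d)."""
    L, R = left.astype(np.float64), right.astype(np.float64)
    h, w = L.shape
    cost = np.full((h, w), np.inf)
    if d >= w:
        return cost
    l_lo, l_hi = _half_sample_range(L)
    r_lo, r_hi = _half_sample_range(R)
    lx, rx = L[:, d:], R[:, : w - d]
    # Distance from I_L(x) to the interpolated interval around I_R(x - d), and vice versa.
    d_lr = np.maximum(0.0, np.maximum(lx - r_hi[:, : w - d], r_lo[:, : w - d] - lx))
    d_rl = np.maximum(0.0, np.maximum(rx - l_hi[:, d:], l_lo[:, d:] - rx))
    cost[:, d:] = np.minimum(d_lr, d_rl)
    return cost


def box_aggregate(cost: np.ndarray, radius: int) -> np.ndarray:
    """Replace each pixel's cost with the sum over the (2r+1) x (2r+1) window around it."""
    k = 2 * radius + 1
    padded = np.pad(cost, radius, mode="edge")
    return sliding_window_view(padded, (k, k)).sum(axis=(-2, -1))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    left = rng.integers(0, 256, size=(20, 40)).astype(np.float64)
    right = np.roll(left, -3, axis=1)  # right[x] = left[x + 3], i.e. a true disparity of 3
    volume = np.stack([box_aggregate(bt_cost(left, right, d), 2) for d in range(8)])
    disparity = volume.argmin(axis=0)  # winner-takes-all over candidate disparities 0..7
    print(np.unique(disparity[:, 10:]))
```

The example stacks the aggregated costs for candidate disparities 0 to 7 and takes the per-pixel argmin; away from the left border, where some candidates have no valid match, it recovers the true 3-pixel shift. A full SGBM pipeline would insert the semi-global path aggregation between the box sum and the argmin.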
The costs are then optimized with the semi-global matching (SGM) cost aggregation algorithm, and the disparity of each pixel is selected by winner-takes-all (WTA), that is, the disparity with the lowest matching cost is chosen. False disparities are eliminated by confidence checking, disparity holes are filled by sub-pixel interpolation, and a left-right consistency check eliminates disparity errors caused by occlusion in the left and right views. Because of interference from the complex road traffic environment, the resulting disparity map is noisy. First, median filtering is applied to the disparity map for preliminary denoising. The range of the disparity statistics is then restricted to the inside of the bounding box to remove as much irrelevant disparity interference as possible. Next, all disparity values within the target pedestrian's rectangular bounding box are traversed to find the maximum, which replaces every other disparity value in the box. The disparity counts along the U and V directions are recomputed from this improved disparity map, and the pedestrian's coordinates are finally obtained. By filling the disparity holes inside the bounding box and replacing noisy disparities with the maximum disparity, the improved U-V disparity algorithm increases the accuracy of pedestrian localization. Result On a 2 500 m continuous test section, the pedestrian recognition results of the self-trained RetinaNet model achieved a recall of 96.27% when compared with manual counts. Compared with you only look once v3 (YOLOv3) and Tiny-YOLOv3 under four traffic conditions, the average F-value reaches 96.42%, which is 0.9% higher than YOLOv3 and 3.03% higher than Tiny-YOLOv3.
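The improved U-V step in the Method (median filtering, restricting the statistics to the bounding box, replacing every disparity in the box with its maximum, and re-counting along U and V) can be sketched as follows. The function name, the box convention `(u0, v0, u1, v1)` with exclusive ends, and the final pinhole triangulation Z = f * B / d with an assumed focal length `f_px`, baseline `baseline_m` and principal point `(cx, cy)` are all assumptions for illustration; the abstract does not give the authors' ranging formula.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def median3(disp: np.ndarray) -> np.ndarray:
    """3 x 3 median filter with edge replication (preliminary denoising)."""
    padded = np.pad(disp.astype(np.float64), 1, mode="edge")
    return np.median(sliding_window_view(padded, (3, 3)), axis=(-2, -1))


def locate_in_box(disp, box, f_px, baseline_m, cx, cy):
    """Hypothetical sketch: box-restricted, max-filled U-V statistics, then pinhole triangulation.

    box = (u0, v0, u1, v1) in pixels with exclusive u1, v1; f_px, baseline_m, cx, cy are
    assumed rectified-camera parameters. Returns (X, Y, Z) in metres in the left-camera frame.
    """
    u0, v0, u1, v1 = box
    roi = median3(disp)[v0:v1, u0:u1]  # restrict the statistics to the bounding box
    d_max = int(np.rint(roi.max()))  # largest disparity in the box (nearest surface)
    if d_max <= 0:
        raise ValueError("no valid disparity inside the bounding box")
    roi = np.full(roi.shape, d_max, dtype=np.int64)  # fill holes and replace noise with d_max
    # Re-count disparities along U (per column) and V (per row) inside the box.
    u_disp = np.stack([np.bincount(col, minlength=d_max + 1) for col in roi.T], axis=1)
    v_disp = np.stack([np.bincount(row, minlength=d_max + 1) for row in roi], axis=0)
    cols = u0 + np.nonzero(u_disp[d_max])[0]  # horizontal segment at d_max in U-disparity
    rows = v0 + np.nonzero(v_disp[:, d_max])[0]  # vertical segment at d_max in V-disparity
    u_c = 0.5 * (cols.min() + cols.max())
    v_c = 0.5 * (rows.min() + rows.max())
    z = f_px * baseline_m / d_max  # depth from disparity (assumed model)
    return (u_c - cx) * z / f_px, (v_c - cy) * z / f_px, z


if __name__ == "__main__":
    disp = np.full((30, 40), 2.0)  # background
    disp[5:25, 10:20] = 8.0  # pedestrian region
    disp[12, 12] = 0.0  # disparity hole
    disp[15, 15] = 30.0  # isolated noise spike, removed by the median filter
    print(locate_in_box(disp, (10, 5, 20, 25), f_px=800.0, baseline_m=0.12, cx=20.0, cy=15.0))
```

Because every pixel in the box is set to the maximum disparity, the U- and V-disparity counts collapse into single segments whose extents match the box width and height. The median filter runs first so that an isolated spike does not become the box maximum.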
In the laboratory, 20 pairs of binocular images of a calibration block were captured at distances of 3 m, 4 m, and 5 m to verify the distance measurement algorithm; the calculated standard deviations were all less than 0.01. Conclusion This study combines the RetinaNet model with a U-V disparity algorithm to recognize and locate pedestrians. The proposed method achieves effective pedestrian detection in traffic environments and is significant for the safety of driverless vehicles.
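The calibration-block experiment reports the standard deviation of repeated distance measurements. A minimal sketch of that kind of check follows, assuming the standard rectified-stereo relation Z = f * B / d; the abstract does not state the ranging formula, the unit of the standard deviation, or whether it is a sample or population value, and the focal length, baseline and disparities below are hypothetical.

```python
import statistics


def depth_from_disparity(d_px: float, f_px: float, baseline_m: float) -> float:
    """Rectified-stereo depth Z = f * B / d (assumed pinhole model)."""
    if d_px <= 0:
        raise ValueError("disparity must be positive")
    return f_px * baseline_m / d_px


def ranging_spread(disparities, f_px: float, baseline_m: float) -> tuple[float, float]:
    """Mean and sample standard deviation of distances computed from repeated image pairs."""
    depths = [depth_from_disparity(d, f_px, baseline_m) for d in disparities]
    return statistics.mean(depths), statistics.stdev(depths)


if __name__ == "__main__":
    # Hypothetical camera (f = 800 px, B = 0.12 m) and disparities for a target near 4 m.
    mean, sd = ranging_spread([24.0, 24.05, 23.95, 24.0], f_px=800.0, baseline_m=0.12)
    print(f"mean={mean:.4f} m, sd={sd:.4f} m")
```

`statistics.stdev` returns the sample standard deviation (n - 1 denominator); `statistics.pstdev` would give the population value if that is what the experiment used.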
Keywords
