Small-scale pedestrian detection using an improved R-FCN model

Liu Wanjun, Dong Libing, Qu Haicheng (School of Software, Liaoning Technical University, Huludao 125105, China)

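Abstract
Objective To address the low detection accuracy of traditional pedestrian detection algorithms on low-resolution images and small pedestrians, the region-based fully convolutional network (R-FCN) object detection algorithm is introduced into pedestrian detection, and a small-scale pedestrian detection algorithm based on an improved R-FCN model is proposed. Method To make feature extraction more accurate, deformable convolutional layers are embedded in the conv5 stage of ResNet-101, enlarging the receptive field of the feature maps. To improve detection accuracy for small pedestrians, a second detection path is added to ResNet-101 so that region-of-interest pooling is performed on feature maps of different sizes. To address false detections of small pedestrians, a non-maximum suppression algorithm with a bootstrap strategy replaces the traditional non-maximum suppression algorithm. Result In an evaluation on the Caltech benchmark dataset, the improved R-FCN algorithm improves detection accuracy by 3.29% and 2.78% over the representative single-stage detector SSD (single shot multiBox detector) and the two-stage detector Faster R-CNN (region convolutional neural network), respectively; with the same ResNet-101 base network, detection accuracy is 12.10% higher than that of the original R-FCN algorithm. Conclusion The proposed improved R-FCN model detects small pedestrians more accurately. Compared with the original model, it balances precision and recall in pedestrian detection better, achieving higher recall while maintaining precision.
Keywords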
Small-scale pedestrian detection based on improved R-FCN model

Liu Wanjun, Dong Libing, Qu Haicheng (School of Software, Liaoning Technical University, Huludao 125105, China)

Abstract
Objective Pedestrian detection is an active research topic in image processing and computer vision, and it is widely used in fields such as autonomous driving, intelligent surveillance, and intelligent robots. Traditional pedestrian detection methods based on background modeling and machine learning can achieve a good detection rate under certain conditions, but they cannot meet the requirements of practical applications. As deep convolutional neural networks have made great progress in general object detection, more and more researchers have adapted general object detection frameworks to pedestrian detection. Compared with traditional methods, deep learning methods have significantly improved the accuracy and robustness of pedestrian detection, and many breakthroughs have been made. However, detection of small-scale pedestrians remains unsatisfactory. The main reason is that the successive convolution and pooling operations of a convolutional neural network shrink the feature maps of small-scale pedestrians, lower their resolution, and discard a large amount of information, which leads to detection failure. To address the low detection accuracy of traditional pedestrian detection algorithms at low resolution and small pedestrian size, an object detection algorithm called the region-based fully convolutional network (R-FCN) is introduced into pedestrian detection. This study proposes a small-scale pedestrian detection algorithm based on an improved R-FCN model. Method The method in this study inherits the advantages of R-FCN, which uses a region proposal network to generate candidate regions of interest and position-sensitive score maps to classify and locate targets. In addition, because the residual network ResNet-101 requires less computation, has fewer parameters, and achieves good accuracy, this study uses ResNet-101 as the base network.
Compared with the original R-FCN, this study makes three main improvements. First, because the pedestrians in the Caltech dataset vary widely in scale, all 3×3 conventional convolutional layers in the Conv5 stage of ResNet-101 are converted into deformable convolutional layers. As a result, the effective stride of the convolution block is reduced from 32 pixels to 16 pixels, the dilation rate is changed from 1 to 2, the padding is set to 2, and the stride is 1. Deformable convolution can increase the generalization ability of the model, expand the receptive field of the feature map, and improve the accuracy of R-FCN feature extraction. Second, another position-sensitive score map is added in the training phase. Because the feature-discriminating ability of the Conv1-3 stages in ResNet-101 is weaker than that of the Conv4 stage, a new position-sensitive score map is added after the Conv4 layer to detect multi-scale pedestrians in parallel with the original position-sensitive score map after the Conv5 layer. Third, the non-maximum suppression (NMS) method often leads to missed detection of neighboring pedestrians in crowded scenes. Therefore, this study improves the traditional NMS algorithm and proposes an NMS algorithm with a bootstrap strategy to solve the problem of pedestrian misdetection. Result The method is evaluated on the Caltech benchmark dataset. The experimental results show that the improved R-FCN algorithm improves the detection accuracy by 3.29% and 2.78% compared with the representative single-stage detector, the single shot multiBox detector (SSD), and the two-stage detector, the faster region convolutional neural network (Faster R-CNN), respectively. With the same ResNet-101 base network, the detection accuracy is 12.10% higher than that of the original R-FCN algorithm.
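The Conv5 settings given in the Method (3×3 kernel, rate 2, pad 2, stride 1) can be checked with the standard convolution output-size formula. The following is a minimal sketch, not the authors' code; the function names and the 30-cell feature-map side (a hypothetical 480-pixel image side at an effective stride of 16) are assumptions chosen for illustration.

```python
def effective_kernel_extent(kernel=3, dilation=1):
    """Span, in input cells, covered by a dilated kernel along one dimension."""
    return dilation * (kernel - 1) + 1


def conv_output_size(size, kernel=3, stride=1, pad=0, dilation=1):
    """Output length of a convolution along one spatial dimension."""
    extent = effective_kernel_extent(kernel, dilation)
    return (size + 2 * pad - extent) // stride + 1


# Hypothetical Conv4 output: 480-pixel side at an effective stride of 16.
conv4_side = 480 // 16

# Ordinary Conv5 entry (stride 2, pad 1, rate 1): halves the map, stride becomes 32.
plain_side = conv_output_size(conv4_side, kernel=3, stride=2, pad=1, dilation=1)

# Modified Conv5 (stride 1, pad 2, rate 2): keeps the map size, stride stays 16.
dilated_side = conv_output_size(conv4_side, kernel=3, stride=1, pad=2, dilation=2)

print(conv4_side, plain_side, dilated_side)
print(effective_kernel_extent(3, 2))
```

With rate 2 and pad 2, a 3×3 kernel spans 5 input cells and the output keeps the input size, which is why the effective stride can stay at 16 pixels while each output cell still sees a wider area.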
Online hard example mining (OHEM) proves necessary on Caltech, where it yields a 7.38% improvement, because the dataset contains a large number of confusing instances in complex backgrounds, which allows OHEM to be used to full effect. Using deformable convolutional layers in the Conv5 stage of ResNet-101 improves accuracy by 0.89% over ordinary convolutional layers. The multi-path detection structure increases the detection accuracy by 2.50%. Correcting non-maximum suppression with the bootstrap strategy performs 1.67% better than the traditional NMS algorithm. Conclusion The improved R-FCN model proposed in this study detects small pedestrians more accurately and reduces pedestrian false detections at low resolution. Compared with the original R-FCN model, the improved model balances precision and recall in pedestrian detection better and achieves higher recall while maintaining precision. However, the accuracy of pedestrian detection in complex scenes is still somewhat low. Future research will therefore focus on improving the accuracy of pedestrian detection in complex scenes.
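For reference, the traditional greedy NMS that serves as the baseline here can be sketched as follows. This is only the standard algorithm, not the bootstrap-strategy variant proposed in the study, whose details the abstract does not give; the function names, the (x1, y1, x2, y2) box format, and the 0.5 threshold are assumptions made for illustration.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, iou_threshold=0.5):
    """Return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # Drop every remaining box that overlaps the kept box too strongly.
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep


# Two heavily overlapping boxes (e.g. adjacent pedestrians) and one isolated box.
boxes = [(0, 0, 10, 20), (1, 0, 11, 20), (30, 0, 40, 20)]
scores = [0.9, 0.8, 0.7]
print(greedy_nms(boxes, scores))
```

In this toy input the second box is suppressed even if it belongs to a different, neighboring pedestrian, which illustrates the missed-detection behavior in crowded scenes that motivates modifying NMS.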
Keywords
