视觉注意机制下结合语义特征的行人检测
黎宁1,2, 龚元1, 许莙苓1, 顾晓蓉3, 徐涛4, Zhou Huiyu5(1.南京航空航天大学 电子信息工程学院, 南京 211106;2.南京航空航天大学 雷达成像与微波光子技术教育部重点实验室, 南京 211106;3.南京航空航天大学理学院,南京 211106;4.中国民航大学中国民航信息技术科研基地, 天津 300300;5.School of Electronics, Electrical Engineering and Computer Science, Queen's University Belfast, Belfast BT3 9DT, Unite Kingdom) 摘 要
目的 为研究多场景下的行人检测,提出一种视觉注意机制下基于语义特征的行人检测方法。方法 首先,在初级视觉特征基础上,结合行人肤色的语义特征,通过将自下而上的数据驱动型视觉注意与自上而下的任务驱动型视觉注意有机结合,建立空域静态视觉注意模型;然后,结合运动信息的语义特征,采用运动矢量熵值计算运动显著性,建立时域动态视觉注意模型;在此基础上,以特征权重融合的方式,构建时空域融合的视觉注意模型,由此得到视觉显著图,并通过视觉注意焦点的选择完成行人检测。结果 选用标准库和实拍视频,在Matlab R2012a平台上,进行实验验证。与其他视觉注意模型进行对比仿真,本文方法具有良好的行人检测效果,在实验视频上的行人检测正确率达93%。结论 本文方法在不同的场景下具有良好的鲁棒性能,能够用于提高现有视频监控系统的智能化性能。
关键词
Semantic feature-based visual attention model for pedestrian detection
Li Ning1,2, Gong Yuan1, Xu Junling1, Gu Xiaorong3, Xu Tao4, Zhou Huiyu5(1.College of Electronic and Information Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China;2.Key Laboratory of Radar Imaging and Microwave Photonics, Ministry of Education, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China;3.College of Science, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China;4.Information Technology Research Base of Civil Aviation Administration of China, Civil Aviation University of China, Tianjin 300300, China;5.School of Electronics, Electrical Engineering and Computer Science, Queen's University Belfast, Belfast BT3 9DT, Unite Kingdom) Abstract
Objective Pedestrian detection under video surveillance systems has always been a hot topic in computer vision research. These systems are widely used in train stations, airports, large commercial plazas, and other public places. However, pedestrian detection remains difficult because of complex backgrounds. Given its development in recent years, the visual attention mechanism has attracted increasing attention in object detection and tracking research, and previous studies have achieved substantial progress and breakthroughs. We propose a novel pedestrian detection method based on the semantic features under the visual attention mechanism. Method The proposed semantic feature-based visual attention model is a spatial-temporal model that consists of two parts: the static visual attention model and the motion visual attention model. The static visual attention model in the spatial domain is constructed by combining bottom-up with top-down attention guidance. Based on the characteristics of pedestrians, the bottom-up visual attention model of Itti is improved by intensifying the orientation vectors of elementary visual features to make the visual saliency map suitable for pedestrian detection. In terms of pedestrian attributes, skin color is selected as a semantic feature for pedestrian detection. The regional and Gaussian models are adopted to construct the skin color model. Skin feature-based visual attention guidance is then proposed to complete the top-down process. The bottom-up and top-down visual attentions are linearly combined using the proper weights obtained from experiments to construct the static visual attention model in the spatial domain. The spatial-temporal visual attention model is then constructed via the motion features in the temporal domain. Based on the static visual attention model in the spatial domain, the frame difference method is combined with optical flowing to detect motion vectors. Filtering is applied to process the field of motion vectors. The saliency of motion vectors can be evaluated via motion entropy to make the selected motion feature more suitable for the spatial-temporal visual attention model. Result Standard datasets and practical videos are selected for the experiments. The experiments are performed on a MATLAB R2012a platform. The experimental results show that our spatial-temporal visual attention model demonstrates favorable robustness under various scenes, including indoor train station surveillance videos and outdoor scenes with swaying leaves. Our proposed model outperforms the visual attention model of Itti, the graph-based visual saliency model, the phase spectrum of quaternion Fourier transform model, and the motion channel model of Liu in terms of pedestrian detection. The proposed model achieves a 93% accuracy rate on the test video. Conclusion This paper proposes a novel pedestrian method based on the visual attention mechanism. A spatial-temporal visual attention model that uses low-level and semantic features is proposed to calculate the saliency map. Based on this model, the pedestrian targets can be detected through focus of attention shifts. The experimental results verify the effectiveness of the proposed attention model for detecting pedestrians.
Keywords
|