快速深度学习的鲁棒视觉跟踪

戴铂, 侯志强, 余旺盛, 胡丹, 范舜奕(空军工程大学信息与导航学院, 西安 710077)

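Abstract
Objective Visual tracking algorithms based on deep learning offer high tracking accuracy and strong adaptability. However, their models have many parameters and are difficult to tune, which makes the time complexity of these algorithms too high. To improve efficiency, a fast deep learning algorithm is proposed that builds a new network structure and reduces model redundancy. Method Extracting robust features is the key to successful visual tracking. Based on deep learning theory, a deep neural network is trained offline on massive data to extract image features layer by layer. The high time complexity of network training is greatly alleviated by shrinking the network, which enables fast deep learning on a GPU. Within a particle filter framework, a scorer based on a support vector machine is designed to track the target online. Result The method simplifies the structure of the feature extraction network and reduces model complexity, and it is more time-efficient than other deep learning-based algorithms. The tracking frame rate stays at about 22 frames/s overall. Conclusion Experimental results show that the proposed algorithm achieves fairly stable and relatively fast tracking when the target undergoes translation, rotation, and scale change, or when illumination change, occlusion, and cluttered background are present. However, it is not sufficiently robust to fast target motion and motion blur, and it is easily disturbed by similar objects.
Keywords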
Robust visual tracking via fast deep learning

Dai Bo, Hou Zhiqiang, Yu Wangsheng, Hu Dan, Fan Shunyi (Institute of Information and Navigation, Air Force Engineering University, Xi'an 710077, China)

Abstract
Objective Deep learning-based trackers typically achieve high tracking precision and strong adaptability in different scenarios. However, because the models have many parameters and fine-tuning is difficult, their time complexity is high. To improve efficiency, we propose a tracker based on fast deep learning, built on a new network with less redundancy. Method The feature extractor plays the most important role in a visual tracking system. Based on deep learning theory, we propose a deep neural network that describes the essential features of images. Fast deep learning is achieved by restricting the network size. With the help of a GPU (graphics processing unit), the time complexity of network training is reduced to a large extent. Within a particle filter framework, the proposed method combines the deep learning feature extractor with a support vector machine scorer to distinguish the target from the background. Result The condensed network structure reduces the complexity of the model. Compared with other deep learning-based trackers, the proposed method achieves higher efficiency. The frame rate stays at about 22 frames per second on average. Conclusion Experiments on an open tracking benchmark demonstrate that both the robustness and the timeliness of the proposed tracker are promising when the target undergoes translation, rotation, and scale change, or when illumination change, occlusion, and cluttered background interfere. However, the tracker is not robust enough when the target moves fast, or when motion blur or similar objects are present.
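
The particle filter loop described in the Method part can be illustrated with a minimal sketch. It is not the authors' implementation: the state layout `(x, y, scale)`, the noise levels, the softmax weighting, and all function names are assumptions made for illustration, and `toy_score` is a hypothetical stand-in for the score that the deep feature extractor and the support vector machine scorer would assign to a candidate region.

```python
import math
import random

def propagate(particles, sigma_xy=4.0, sigma_s=0.02, rng=random):
    """Random-walk motion model: perturb each (x, y, scale) particle."""
    return [
        (x + rng.gauss(0.0, sigma_xy),
         y + rng.gauss(0.0, sigma_xy),
         max(0.1, s + rng.gauss(0.0, sigma_s)))
        for x, y, s in particles
    ]

def weights_from_scores(scores):
    """Turn raw scorer outputs into normalized weights (softmax)."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(v - m) for v in scores]
    total = sum(exps)
    return [e / total for e in exps]

def resample(particles, weights, rng=random):
    """Multinomial resampling: draw particles in proportion to weight."""
    return rng.choices(particles, weights=weights, k=len(particles))

def track_step(particles, score_fn, rng=random):
    """One frame: propagate, score, pick the best candidate, resample."""
    candidates = propagate(particles, rng=rng)
    scores = [score_fn(p) for p in candidates]
    weights = weights_from_scores(scores)
    best_index = max(range(len(candidates)), key=weights.__getitem__)
    return candidates[best_index], resample(candidates, weights, rng=rng)

def toy_score(p, target=(120.0, 80.0, 1.0)):
    """Hypothetical scorer: higher when the candidate is near `target`.

    In a real tracker this would be the SVM score of network features
    extracted from the image patch at state p (assumption).
    """
    dx, dy, ds = p[0] - target[0], p[1] - target[1], p[2] - target[2]
    return -(dx * dx + dy * dy) / 50.0 - 100.0 * ds * ds

rng = random.Random(0)  # fixed seed so the run is repeatable
particles = [(100.0, 100.0, 1.0)] * 200
for _ in range(30):
    estimate, particles = track_step(particles, toy_score, rng=rng)
```

Here the best-scoring candidate is taken as the state estimate for each frame; a weighted mean of the candidates would be an equally plausible choice under the same assumptions.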
Keywords
