Adaptive multi-feature fusion correlation filter target tracking
Abstract
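Objective To address complex backgrounds, illumination variation, fast motion, rotation, and similar problems that arise when tracking targets in real-world scenes, a correlation filter tracking algorithm with adaptive multi-feature fusion is proposed. Method The HOG (histogram of oriented gradients) feature of the target is extracted, and a convolutional neural network is used to extract high- and low-level convolutional features. An adaptive threshold segmentation method evaluates the effectiveness of each feature and yields the feature fusion weights. The response maps of the features are fused according to these weights to obtain the new estimated position of the target, and a scale correlation filter then computes the target scale to complete tracking. Result Experiments are conducted on the public OTB (object tracking benchmark)-2013 dataset. Building on an analysis of the multi-feature fusion, the tracking performance of the proposed algorithm is tested under 11 different attributes and compared with 7 currently popular algorithms. The results show that the proposed algorithm ranks first in both success rate and accuracy. Compared with the baseline algorithm DSST (discriminative scale space tracking), tracking accuracy improves by 4% and success rate by 6%. The algorithm is also more robust than other mainstream algorithms in complex scenes. Conclusion The proposed algorithm takes the DSST correlation filter tracker as its baseline, uses an adaptive threshold segmentation method to evaluate the effectiveness of each feature, and adaptively fuses two layers of convolutional features with the HOG feature, so that a more discriminative single feature receives a larger fusion weight. This represents the appearance model of the target well and gives strong tracking accuracy in scenes with complex backgrounds, target disappearance, illumination variation, fast motion, and rotation.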
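The fusion step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: Otsu's method stands in for the adaptive threshold, a feature's score is assumed to be the inverse of the fraction of its response map that lies above that threshold (a sharp, isolated peak gives a small fraction), and the names `otsu_threshold`, `peak_area_ratio`, and `fuse` are hypothetical.

```python
import numpy as np

def otsu_threshold(values, bins=64):
    """Otsu's threshold on a 1-D array (assumed stand-in for the adaptive threshold)."""
    hist, edges = np.histogram(values, bins=bins)
    centers = 0.5 * (edges[:-1] + edges[1:])
    p = hist / hist.sum()
    w0 = np.cumsum(p)               # probability of the low class
    mu = np.cumsum(p * centers)     # cumulative mean
    mu_total = mu[-1]
    w1 = 1.0 - w0                   # probability of the high class
    valid = (w0 > 1e-12) & (w1 > 1e-12)
    between = np.zeros_like(w0)
    # Between-class variance; the threshold maximizes it.
    between[valid] = (mu_total * w0[valid] - mu[valid]) ** 2 / (w0[valid] * w1[valid])
    return centers[np.argmax(between)]

def peak_area_ratio(resp):
    """Fraction of the response map above the threshold; small means a sharp peak."""
    t = otsu_threshold(resp.ravel())
    return np.count_nonzero(resp > t) / resp.size

def fuse(responses, eps=1e-6):
    """Weight each response map by the inverse of its peak area ratio, then sum."""
    scores = np.array([1.0 / (peak_area_ratio(r) + eps) for r in responses])
    weights = scores / scores.sum()
    fused = sum(w * r for w, r in zip(weights, responses))
    # The estimated target position is the location of the fused maximum.
    pos = np.unravel_index(np.argmax(fused), fused.shape)
    return fused, weights, pos
```

Under these assumptions, a response map with a narrow peak dominates the fused map, while a map with a broad, ambiguous peak contributes little to the position estimate.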
Keywords
Correlation filter target tracking algorithm based on adaptive multifeature fusion
Zhang Yanlin, Qian Xiaoyan, Zhang Miao, Ge Hongjuan (College of Civil Aviation, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China)
Abstract
Objective Target tracking is a basic problem in computer vision and is widely used in security monitoring, military operations, and autonomous driving, among other applications. Tracking algorithms based on correlation filtering have developed rapidly in recent years because they are fast and efficient. However, designing a robust tracking algorithm remains challenging because of background clutter, illumination variation, fast motion, rotation, and other complex factors. An effective appearance model is key to the success of a tracking algorithm. Current appearance models fall into two major types. The first type is hand-crafted. A common hand-crafted appearance model is the histogram of oriented gradients (HOG) feature, which efficiently describes the contour and shape of a target by computing gradient orientations over local regions of the image. The second type is based on deep learning. Low-level convolutional features contain rich texture information but cannot adapt to background changes, whereas high-level convolutional features contain rich semantic information that distinguishes the target from the background even in complex contexts. Because different features describe different information in an image, this study proposes a correlation filter tracking method with adaptive multifeature fusion. Method The DSST (discriminative scale space tracking) correlation filter is adopted as the baseline algorithm, and the conv1 and conv5 layers of the convolutional neural network (CNN) ImageNet-VGG (visual geometry group)-2048 are used. First, the HOG feature of the target is extracted, and the high- and low-level convolutional features of the target are extracted using the CNN. A response map is then obtained for each feature.
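As a rough sketch of how a response map can be obtained from one feature, the following NumPy code trains a multi-channel correlation filter in the Fourier domain, in the style of DSST, and evaluates it on a new patch. Feature extraction (HOG, conv1, conv5) is not shown: `feat` is assumed to be an (H, W, C) array, and the function names, the Gaussian label width, and the regularization value are illustrative assumptions rather than the settings used in the paper.

```python
import numpy as np

def gaussian_label(shape, sigma=2.0):
    """Desired filter output: a 2-D Gaussian peaked at the patch centre."""
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    d2 = (ys - h // 2) ** 2 + (xs - w // 2) ** 2
    return np.exp(-d2 / (2.0 * sigma ** 2))

def train_filter(feat, label, lam=1e-2):
    """Closed-form ridge-regression filter, one numerator per feature channel.

    feat: (H, W, C) feature array; label: (H, W) desired response.
    Returns the numerator (H, W, C) and the shared denominator (H, W).
    """
    F = np.fft.fft2(feat, axes=(0, 1))
    G = np.fft.fft2(label)
    num = G[..., None] * np.conj(F)
    den = np.sum(F * np.conj(F), axis=2).real + lam
    return num, den

def response_map(num, den, feat):
    """Correlate the learned filter with the features of a new patch."""
    Z = np.fft.fft2(feat, axes=(0, 1))
    return np.real(np.fft.ifft2(np.sum(num * Z, axis=2) / den))

# Usage with random stand-in features (no real image is involved here).
rng = np.random.default_rng(0)
feat = rng.standard_normal((32, 32, 3))
num, den = train_filter(feat, gaussian_label((32, 32)))
shifted = np.roll(feat, shift=(3, -2), axis=(0, 1))
resp = response_map(num, den, shifted)
peak = np.unravel_index(np.argmax(resp), resp.shape)
```

Applying the filter to a circularly shifted copy of the training features moves the response peak away from the patch centre by the same shift, which is how the translation of the target is read off the map.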
The peak value and shape of the response map reflect the reliability of the tracking result. Second, to evaluate the validity of each feature, the area ratio of the peak is proposed as a new index of the confidence of the correlation response map, and it is computed using an adaptive threshold segmentation method. If the peak of the response map is sharp and its surroundings are smooth, then the tracking result is reliable. The fusion weights are obtained from this evaluation, such that a more effective feature receives a larger fusion coefficient. Lastly, the response maps of the features are fused in accordance with the fusion coefficients, the final response output is calculated, and the target position is determined from the maximum value of the fused response map. A scale correlation filter is then used to estimate the target scale, which makes the tracking scale-adaptive. Result To evaluate the performance of the proposed method, the algorithm is tested on the public dataset OTB (object tracking benchmark)-2013. The 50 videos cover 11 different challenges encountered in target tracking (including background clutter, deformation, out-of-view, and scale variation). The proposed algorithm is compared with seven mainstream algorithms, with accuracy and success rate as the evaluation indicators. These algorithms fall into two categories: representative, top-ranked traditional tracking algorithms, namely ASLA (adaptive structural local sparse appearance), SCM (sparsity-based collaborative model), and TLD (tracking-learning-detection); and correlation filtering algorithms, namely CFNet (correlation filter networks), KCF (kernel correlation filter), DSST, and SAMF (scale adaptive with multiple features). The experimental results show that the proposed algorithm achieves the highest success rate and accuracy among the compared algorithms.
The accuracy of the proposed method on the OTB-2013 dataset is 77.8%. Those of the other algorithms are as follows: CFNet (76.1%), DSST (74.6%), KCF (73.5%), SAMF (72.5%), and the traditional algorithm SCM (67.8%). The success rate of the proposed algorithm is 71.5%. Those of the other algorithms are CFNet (71.4%), DSST (67.5%), SAMF (66.2%), and KCF (61.1%). Relative to the baseline DSST, the proposed algorithm increases tracking accuracy by 4% and success rate by 6%. These results show that the method effectively improves tracking performance. Compared with CFNet, DSST, SAMF, and KCF, the proposed method ranks first in accuracy on seven attributes: background clutter, deformation, out-of-view, illumination variation, in-plane rotation, out-of-plane rotation, and fast motion. It also achieves the highest success rate on nine attributes. Conclusion Because different features describe different information in an image, this study proposes a correlation filter tracking method with adaptive multifeature fusion. A CNN is used to extract high- and low-level convolutional features, and the HOG feature is extracted alongside them. An adaptive threshold segmentation method is proposed to evaluate the validity of each feature, and the two layers of convolutional features and the HOG feature are fused adaptively: their response maps are combined using fusion coefficients derived from the feature validity analysis. Compared with most feature fusion methods, which connect features serially or in parallel, this algorithm increases the fusion weight of a single feature with strong discriminability, so the appearance model of the target is represented accurately. Therefore, the proposed algorithm exhibits strong robustness and tracking accuracy in scenarios with low resolution and scale variation.
Future work will study the proposed tracking method further under occlusion, motion blur, and fast motion conditions.
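The 4% and 6% gains quoted in the results can be checked against the listed scores. The short calculation below assumes, as an inference from those scores and not a statement in the source, that the gains are relative improvements over DSST and not differences in percentage points. The helper name `relative_gain` is hypothetical.

```python
def relative_gain(new, base):
    """Relative improvement of `new` over `base`, in percent."""
    return 100.0 * (new - base) / base

# Scores on OTB-2013 as reported in the abstract (proposed method vs. DSST).
accuracy_gain = relative_gain(77.8, 74.6)   # about 4.3
success_gain = relative_gain(71.5, 67.5)    # about 5.9

print(f"accuracy: {accuracy_gain:.1f}% relative gain")
print(f"success rate: {success_gain:.1f}% relative gain")
```

Read this way, the rounded figures match the reported 4% and 6%. The corresponding absolute differences are 3.2 and 4.0 percentage points.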
Keywords
target tracking; convolution feature; correlation filter; feature fusion; adaptive threshold segmentation