Real-time object tracking with a weighted multi-feature appearance representation
Abstract
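Objective Visual object tracking is one of the key research directions in computer vision and is widely applied in intelligent transportation, human-computer interaction, and other areas. Correlation filter-based methods have made remarkable progress in this field owing to their efficiency and robustness. Even so, the selection and representation of features remain the primary consideration when building the target appearance model during tracking. To improve the robustness of the appearance model, a growing number of trackers replace the raw single grayscale feature with gradient features, color features, or other combined features. However, these methods do not use the properties of the features themselves to decide how much weight each feature should carry in the model. Method This paper focuses on feature selection and fusion. A weight vector is introduced to fuse the features, and a tracker based on a weighted multi-feature appearance model is designed. Based on how each feature is computed, a linear equation in two unknowns is constructed, which converts solving for the weight vector into determining the proportion coefficients of the features. Together with the dimensionality of each feature, this equation yields a finite set of integer solutions. The final proportion coefficients are chosen experimentally and normalized to obtain the weight vector. The result is a new weighted hybrid feature model used to represent the target appearance. Result Using the 100 video sequences of OTB-100, the proposed algorithm is compared with seven mainstream algorithms, including five correlation filter-based methods. Precision, average center error, and real-time performance serve as the evaluation metrics. While maintaining real-time speed, the proposed algorithm achieves better tracking results on several sequences, including Basketball, DragonBaby, Panda, and Lemming. Averaged over the 100 videos, its precision is 1.2% higher than that of the scale-adaptive tracker with multi-feature fusion. Conclusion Within a correlation filter tracking framework, this paper introduces a weight vector into the description of the target appearance and proposes a weighted multi-feature fusion tracker. The tracker follows targets for longer in complex dynamic scenes and improves the robustness of the algorithm.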
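The abstract says only that the weight search reduces to a linear equation in two unknowns with a finite set of integer solutions, and that the chosen solution is then normalized. It does not give the equation. The Python sketch below illustrates those two steps under a purely hypothetical equation a·x + b·y = c with positive integer coefficients. The values 10, 1 and 31 and the gradient reference coefficient `g_ref` are invented for illustration. `positive_integer_solutions` and `normalize` are illustrative helper names, not the authors' code.

```python
from fractions import Fraction
from typing import List, Sequence, Tuple


def positive_integer_solutions(a: int, b: int, c: int) -> List[Tuple[int, int]]:
    """Return every pair (x, y) of positive integers with a*x + b*y == c.

    For positive a, b, c the set is finite: y >= 1 forces a*x <= c - b.
    """
    if a <= 0 or b <= 0 or c <= 0:
        raise ValueError("a, b and c must be positive integers")
    solutions = []
    for x in range(1, (c - b) // a + 1):
        remainder = c - a * x  # always >= b here, so y = remainder // b >= 1
        if remainder % b == 0:
            solutions.append((x, remainder // b))
    return solutions


def normalize(coefficients: Sequence[int]) -> List[Fraction]:
    """Turn proportion coefficients into a weight vector that sums to 1."""
    total = sum(coefficients)
    return [Fraction(k, total) for k in coefficients]


if __name__ == "__main__":
    # Hypothetical equation 10*x + 1*y = 31 and hypothetical gradient
    # reference coefficient g_ref; neither comes from the paper.
    g_ref = 4
    for x, y in positive_integer_solutions(10, 1, 31):
        print((g_ref, x, y), [float(w) for w in normalize((g_ref, x, y))])
```

Following the abstract, each candidate in the finite set would then be evaluated on test sequences, and the best-performing one kept as the final weight vector.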
Keywords
Real-time visual tracking via weighted multi-feature fusion on an appearance model
Chen Yingying, Fang Sheng, Li Zhe (College of Computer Science and Engineering, Shandong University of Science and Technology, Qingdao 266590, China)
Abstract
Objective Visual tracking is an important research direction in computer vision and is widely applied in intelligent transportation, human-computer interaction, and other areas. Correlation filter-based trackers (CFTs) have achieved excellent performance in the tracking field owing to their efficiency and robustness. However, designing a robust tracking algorithm for complex dynamic scenes remains challenging because of illumination changes, fast motion, background clutter, target rotation, scale change, occlusion, and other factors. In addition, the selection and representation of features have always been the primary considerations when building a target appearance model for tracking. To improve the robustness of the appearance model, many trackers use gradient features, color features, or combinations of several features instead of a single grayscale feature. However, these trackers do not discuss the role of each feature or the relationships among the features in the model. Method Research on correlation filter theory has made remarkable progress. In this line of work, the appearance model is used to represent the target and to verify observations. It is the most important part of any tracking algorithm, and the features are both the foundation and the main difficulty of appearance representation. Therefore, this study mainly focuses on the selection and combination of features. Gradient features, color features, and raw pixels have been discussed in previous works. As a common descriptor of shape and edges, the gradient feature is invariant to translation and illumination. It performs well in tracking scenes with deformation, illumination change, and partial occlusion. However, when the background contains considerable noise or the target rotates or becomes blurred, the gradient features of the target become indistinct and their descriptive capability weakens. In such cases, the target can still be distinguished from the background by color, because their colors usually differ.
On this basis, a new tracking method, the weighted multi-feature fusion (WMFF) tracker, is proposed. It introduces a weight vector to fuse multiple features in the appearance model. The model is dominated by gradient features and supplemented by color features and raw pixels. This compensates for the inadequacy of a single gradient feature, exploits the color information of the target, and makes the features complement each other. Specifically, this study constructs a linear equation in three weight variables based on the calculation method of each feature. Only the proportional relationships among the weights are solved, not their specific values. Taking the gradient feature as the reference turns the search for the weight vector into the task of finding the proportion coefficients of the other features, so the equation reduces to a linear equation in two unknowns. Given the dimensionality of each feature, this equation has a finite set of integer solutions. The final proportion coefficients are determined by experimental verification on test sequences. The proportion coefficients are then normalized into a weight vector, which is used to build a new weighted hybrid feature model of the target appearance. The WMFF tracker adopts a detection-based tracking framework consisting of feature extraction, model construction, filter training, target center detection, and model update. Result A total of 100 video sequences from the object tracking benchmark (OTB-100) are used to compare the proposed tracker with seven other state-of-the-art trackers, five of which are CFTs. The sequences are annotated with 11 different attributes, such as illumination variation, occlusion, and scale variation.
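The detection-based pipeline described above (filter training, target center detection, model update) can be illustrated with a deliberately simplified sketch. It is a single-channel, one-dimensional, linear MOSSE-style correlation filter written in plain Python with a naive DFT, not the WMFF tracker itself. Trackers in this family, such as KCF and SAMF, work on 2-D multi-channel patches with cosine windows and kernels. The class name `LinearCorrelationFilter`, the regularization `lam`, and the learning rate `learning_rate` are illustrative assumptions.

```python
import cmath
import math
from typing import List, Sequence


def dft(x: Sequence[complex]) -> List[complex]:
    """Naive O(n^2) discrete Fourier transform (fine for tiny examples)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum: Sequence[complex]) -> List[complex]:
    """Inverse of dft()."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def gaussian_label(n: int, center: int, sigma: float = 1.0) -> List[float]:
    """Desired response: a Gaussian peak at `center`, using circular distance."""
    label = []
    for t in range(n):
        d = min(abs(t - center), n - abs(t - center))
        label.append(math.exp(-0.5 * (d / sigma) ** 2))
    return label


class LinearCorrelationFilter:
    """MOSSE-style filter: H* = (G . conj(F)) / (F . conj(F) + lam)."""

    def __init__(self, lam: float = 1e-2, learning_rate: float = 0.1):
        self.lam = lam
        self.learning_rate = learning_rate
        self.num = None  # running numerator   G . conj(F)
        self.den = None  # running denominator F . conj(F)

    def train(self, patch: Sequence[float], label: Sequence[float]) -> None:
        """Filter training on the first frame, model update on later frames."""
        F, G = dft(patch), dft(label)
        num = [g * f.conjugate() for g, f in zip(G, F)]
        den = [(f * f.conjugate()).real for f in F]
        if self.num is None:  # first frame: initialise the model
            self.num, self.den = num, den
        else:  # later frames: linear interpolation with the old model
            eta = self.learning_rate
            self.num = [(1 - eta) * a + eta * b for a, b in zip(self.num, num)]
            self.den = [(1 - eta) * a + eta * b for a, b in zip(self.den, den)]

    def detect(self, patch: Sequence[float]) -> int:
        """Return the index of the response peak (the estimated target centre)."""
        Z = dft(patch)
        response = idft([z * a / (b + self.lam) for z, a, b in zip(Z, self.num, self.den)])
        real = [r.real for r in response]
        return max(range(len(real)), key=real.__getitem__)
```

In use, the filter is trained on the first frame's patch with a label that peaks at the target centre. For each new frame, `detect` locates the response peak, and `train` is called again on the patch around the new position, so the interpolation gradually updates the model.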
The trackers are compared and analyzed using precision, average center error, average Pascal VOC overlap ratio, and median frames per second as evaluation criteria. Precision and success plots on different datasets are also presented, and performance under the different attributes is discussed. Experimental results on OTB-100 show that our tracker runs in real time and performs better than the other methods, especially on the Basketball, DragonBaby, Panda, and Lemming sequences. When the scene suffers motion blur, occlusion, or deformation, the edge contours of the target, and especially its gradient information, become indistinct. An appearance model built on gradient features alone then cannot distinguish the target accurately, and tracking easily fails. By contrast, the WMFF tracker can promptly draw on the color feature as a supplement when constructing the appearance model, which keeps tracking robust when the gradient feature becomes ineffective. In these cases, the color feature is as important as the gradient feature, and the two combine effectively. The proposed method outperforms the other algorithms on multiple sequences. Its average precision on OTB-100 is 1.2% higher than that of the scale-adaptive kernel correlation filter tracker with feature integration (SAMF). Conclusion In this study, a weight vector is introduced to combine features when describing the appearance of the target, and a WMFF tracker is proposed within a CFT framework. A new hybrid feature, HCG, is dominated by the gradient feature and supplemented by color and grayscale features, and is used to model the appearance of the target. This model compensates for the deficiencies of any single feature and lets each feature play its role.
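The evaluation criteria listed above can be computed per sequence as in the following sketch. It assumes OTB-style `(x, y, width, height)` boxes, the commonly used 20-pixel center-error threshold for precision, and a 0.5 overlap threshold for the success rate. These thresholds, the function names, and the per-frame timing input are illustrative assumptions, not details given in this abstract.

```python
import math
import statistics
from typing import Dict, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x, y, width, height), top-left corner


def center_error(pred: Box, gt: Box) -> float:
    """Euclidean distance between box centres, in pixels."""
    px, py = pred[0] + pred[2] / 2, pred[1] + pred[3] / 2
    gx, gy = gt[0] + gt[2] / 2, gt[1] + gt[3] / 2
    return math.hypot(px - gx, py - gy)


def overlap(pred: Box, gt: Box) -> float:
    """Pascal VOC overlap ratio: area(intersection) / area(union)."""
    ix = max(0.0, min(pred[0] + pred[2], gt[0] + gt[2]) - max(pred[0], gt[0]))
    iy = max(0.0, min(pred[1] + pred[3], gt[1] + gt[3]) - max(pred[1], gt[1]))
    inter = ix * iy
    union = pred[2] * pred[3] + gt[2] * gt[3] - inter
    return inter / union if union > 0 else 0.0


def evaluate_sequence(preds: Sequence[Box], gts: Sequence[Box],
                      frame_times: Sequence[float],
                      threshold_px: float = 20.0,
                      overlap_threshold: float = 0.5) -> Dict[str, float]:
    """Per-sequence scores; frame_times holds the seconds spent on each frame."""
    errors = [center_error(p, g) for p, g in zip(preds, gts)]
    overlaps = [overlap(p, g) for p, g in zip(preds, gts)]
    n = len(errors)
    return {
        "precision": sum(e <= threshold_px for e in errors) / n,
        "average_center_error": sum(errors) / n,
        "average_overlap": sum(overlaps) / n,
        "success_rate": sum(o > overlap_threshold for o in overlaps) / n,
        "fps": len(frame_times) / sum(frame_times),
    }


def median_fps(per_sequence_fps: Sequence[float]) -> float:
    """Median frames per second across sequences."""
    return statistics.median(per_sequence_fps)
```

For a whole dataset, `median_fps` would be applied to the per-sequence `fps` values. The precision and success plots sweep `threshold_px` and `overlap_threshold` over a range instead of fixing them.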
The model not only makes the features complement one another but also adapts the appearance model to a variety of complex scenes. In complex dynamic scenes, the WMFF tracker follows targets for longer than the other trackers and improves the robustness of the algorithm.
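One way to picture the weighted hybrid feature is to scale each feature's channels by its weight and then stack them into one multi-channel map. The correlation filter would then treat that map as a single feature. The sketch below shows this idea in plain Python. The feature names `hog`, `cn`, and `gray`, the toy channel values, and the weights are hypothetical, and it is an assumption that WMFF applies its weights in exactly this way. One consequence is worth noting. With a linear kernel, or any kernel built on inner products, scaling a feature's channels by w scales that feature's contribution to the inner product by w², and `inner_product` makes this easy to check.

```python
from typing import Dict, List

Channel = List[float]  # one feature channel, flattened over the image patch


def weighted_fusion(features: Dict[str, List[Channel]],
                    weights: Dict[str, float]) -> List[Channel]:
    """Scale every channel of each feature by that feature's weight, then stack."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights are expected to be normalised to sum to 1")
    fused: List[Channel] = []
    for name, channels in features.items():  # dicts keep insertion order
        w = weights[name]
        fused.extend([w * v for v in ch] for ch in channels)
    return fused


def inner_product(a: List[Channel], b: List[Channel]) -> float:
    """Sum of element-wise products over all channels (a linear kernel)."""
    return sum(x * y for ca, cb in zip(a, b) for x, y in zip(ca, cb))


if __name__ == "__main__":
    # Hypothetical toy features: one 2-pixel channel each, gradient-dominated weights.
    toy = {"hog": [[1.0, 2.0]], "cn": [[3.0, 4.0]], "gray": [[5.0, 6.0]]}
    print(weighted_fusion(toy, {"hog": 0.5, "cn": 0.25, "gray": 0.25}))
```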
Keywords