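L1 Target Tracking Algorithm Based on Positive-Example Voting
Abstract
Objective Conventional L1 sparse-representation tracking expresses every candidate target as a linear combination of dictionary templates; it considers only the holistic information of the templates and does not analyze the local structure of the target. To address the tendency of this approach to drift when the background is cluttered, we propose a target tracking algorithm based on positive-example voting. Method The target is represented as a combination of image-patch particles so that its local structure is taken into account. Within a particle filter framework, a confidence function and a similarity function for the image-patch particles are constructed and used to extract positive patches. The optimal position of the tracked target is then estimated by weighted voting of the positive patches. Result Tracking experiments were carried out on 14 public benchmark video sequences. Compared with several state-of-the-art tracking algorithms, the proposed algorithm was the most stable under interference from complex conditions such as background clutter, occlusion, and illumination change; it reached an overlap rate of 0.7 and achieved the lowest average tracking error, 5.90, reflecting its reliability and effectiveness. Conclusion Compared with classical methods, the proposed L1 tracking algorithm with positive-example voting handles occlusion, illumination change, and fast motion while also tracking sequences with background clutter stably and robustly.
Keywords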
L1 object-tracking algorithm with positive example voting
Hu Liangmei, Wang Jian, Zhang Jun, Zhang Xudong (School of Computer Science and Information Engineering, Hefei University of Technology, Hefei 230009, China)
Abstract
Objective Visual tracking estimates the states of a moving target in a video. It is a fundamental topic in computer vision and has many applications, such as surveillance, vehicle tracking, robotics, and human-computer interaction. L1 object tracking based on sparse representation expresses each target candidate as a linear combination of dictionary templates. Such tracking considers only global information and does not analyze local information. To overcome drift under background clutter, this paper proposes a tracking method based on positive patch voting. Method Given the overcompleteness of the sparse representation dictionary and its sensitivity to changes in local features, we represent the target as a set of image patch particles so that the local structure of the target templates is considered. Extracting image patches is the core of our algorithm and directly affects the tracking result. Specifically, we present a tracking reliability metric to measure how reliably a patch can be tracked. Accordingly, a probability model is proposed to estimate the distribution of positive patches under a sequential Monte Carlo framework. To obtain a preliminary estimate of how reliably a patch can be tracked, we adopt the peak-to-sidelobe ratio as a confidence metric. This ratio is widely used in signal processing to measure the strength of the signal peak in a response map. The confidence function is proportional to the response map of the image patches and acts as a distance function between the templates and the patches. Instead of using computationally intensive unsupervised clustering methods to group the image patches, we simply divide the image into two regions using a rectangular box, centered on the target, that is obtained by the L1 method. We then formulate a similarity function to score the patches that lie inside the bounding box and have a confidence score higher than zero, and we label the patches with high scores as positive.
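The peak-to-sidelobe ratio mentioned above as the confidence metric can be sketched as follows. This is a minimal illustration rather than the authors' implementation: it uses the common definition PSR = (peak - mean of sidelobe) / standard deviation of sidelobe, and the function name `peak_to_sidelobe_ratio` and the `exclude` window size are assumptions introduced here.

```python
from statistics import mean, pstdev


def peak_to_sidelobe_ratio(response, exclude=1):
    """Return (peak - mean(sidelobe)) / std(sidelobe) for a 2D response map.

    `response` is a list of equal-length rows of floats. The sidelobe is every
    cell outside a (2*exclude+1) x (2*exclude+1) window centered on the peak.
    The window size is an assumption of this sketch, not a value from the paper.
    """
    rows, cols = len(response), len(response[0])
    # Locate the peak value and its position in the response map.
    peak, pr, pc = max(
        (response[r][c], r, c) for r in range(rows) for c in range(cols)
    )
    # Sidelobe: all cells outside the exclusion window around the peak.
    sidelobe = [
        response[r][c]
        for r in range(rows)
        for c in range(cols)
        if abs(r - pr) > exclude or abs(c - pc) > exclude
    ]
    if not sidelobe:
        raise ValueError("response map is too small for the exclusion window")
    mu, sigma = mean(sidelobe), pstdev(sidelobe)
    if sigma == 0.0:
        # A perfectly flat sidelobe: the ratio is unbounded if a peak stands out.
        return float("inf") if peak > mu else 0.0
    return (peak - mu) / sigma
```

A sharp, isolated peak yields a large ratio, while a flat or noisy response yields a small one, which is why the ratio is a natural candidate for a patch confidence score.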
Finally, we calculate the weights of all positive patches and vote for the optimal location of the tracked target. Result Traditional target tracking based on sparse representation considers only global information without analyzing local information, so L1 object tracking easily drifts in complex situations. We therefore represent the target as a set of image patches and formulate a new patch reliability metric to extract the positive patches. Qualitative and quantitative evaluations on challenging benchmark image sequences demonstrate that the proposed algorithm can handle highly diverse and challenging tracking situations. Conclusion Sparse representation is applied to visual tracking by modeling the target appearance as a sparse approximation over a template set. However, this conventional approach cannot adapt to complex and dynamic scenes because of factors such as background clutter and illumination change. By formulating the confidence and similarity functions, we extract the positive patches, and we then locate the target by weighted voting of the positive patches. Compared with classical methods, our method handles occlusion, illumination change, and fast motion while achieving robust tracking in sequences with background clutter.
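The final voting step described in the Method can be illustrated with a small sketch. The abstract does not specify the voting rule, so this example assumes a simple weight-normalized average of per-patch votes; the function name `vote_location` and the patch tuple layout are hypothetical.

```python
def vote_location(patches):
    """Estimate the target center as the weight-normalized mean of patch votes.

    Each patch is a tuple (x, y, dx, dy, weight): the patch center (x, y), its
    stored offset (dx, dy) to the target center, and a non-negative weight.
    The weighted-average rule is an assumption of this sketch.
    """
    total = sum(w for _, _, _, _, w in patches)
    if total <= 0:
        raise ValueError("no positive patch weight to vote with")
    # Each patch votes for the position (x + dx, y + dy), scaled by its weight.
    cx = sum(w * (x + dx) for x, _, dx, _, w in patches) / total
    cy = sum(w * (y + dy) for _, y, _, dy, w in patches) / total
    return cx, cy


# Two positive patches: the second carries three times the weight of the first.
positive_patches = [
    (0.0, 0.0, 10.0, 10.0, 1.0),     # votes for (10, 10)
    (30.0, 30.0, -10.0, -10.0, 3.0),  # votes for (20, 20)
]
center = vote_location(positive_patches)
```

Because each vote is scaled by its weight, a few unreliable patches pull the estimate only slightly, which fits the stated goal of reducing drift under background clutter.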
Keywords