Object tracking via reverse low-rank sparse learning with variation regularization

Tian Dan1,2, Zhang Guoshan1, Zhang Juanjin3 (1. School of Electrical and Information Engineering, Tianjin University, Tianjin 300072, China; 2. School of Information Engineering, Shenyang University, Shenyang 110044, China; 3. Liaoning Science and Technology Museum, Shenyang 110016, China)

Abstract
Objective Object tracking algorithms based on low-rank sparse learning are prone to tracking drift under fast target motion and severe occlusion. To address this problem, an object tracking algorithm based on reverse low-rank sparse learning with variation regularization is proposed.

Method The temporal correlation among candidate particles is described by a low-rank constraint convexly relaxed via the nuclear norm, which removes uncorrelated particles and adapts to changes in target appearance. The target appearance is described by reverse sparse representation, in which the target template is sparsely represented by the candidate particles; this reduces the number of L1 optimization problems in online tracking and improves tracking efficiency. In bounded variation space, variation regularization is used to model the differences of the sparse coefficients, constraining the target appearance to change only slightly between adjacent frames while allowing jump discontinuities in the differences between consecutive frames, thereby adapting to fast target motion.

Result Experiments are conducted on four standard video sequences from the OTB (object tracking benchmark) dataset that cover challenging factors such as severe occlusion, fast motion, and illumination and scale changes; the proposed algorithm is compared qualitatively and quantitatively with five popular algorithms. The qualitative analysis compares the trackers on the main challenging factors of each sequence, and the quantitative analysis compares tracking precision by the central pixel error (CPE). Compared with the CNT (convolutional networks training), SCM (sparse collaborative model), IST (inverse sparse tracker), DDL (discriminative dictionary learning), and LLR (locally low-rank representation) algorithms, the average CPE is reduced by 2.80, 4.16, 13.37, 35.94, and 41.59, respectively. The experimental results show that the proposed algorithm achieves high tracking precision and is more robust to the above challenging factors.

Conclusion The proposed tracking algorithm combines the advantages of low-rank sparse learning and variational optimization regularization, achieves high tracking precision in complex scenes, and is especially robust in tracking severely occluded and fast-moving targets.
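Taken together, the nuclear-norm, reverse-sparsity, and variation terms suggest a combined objective of roughly the following form. This is a sketch in assumed notation, not necessarily the paper's exact formulation: T denotes the target template matrix, X the matrix of candidate particles (one per column), C the sparse representation coefficients, C_{t-1} the coefficients from the previous frame, and λ1, λ2, λ3 trade-off weights.

```latex
\min_{C}\;
\lVert T - XC \rVert_F^2
+ \lambda_1 \lVert C \rVert_1
+ \lambda_2 \lVert C \rVert_*
+ \lambda_3 \lVert C - C_{t-1} \rVert_1
```

Here the nuclear norm ||C||_* (the sum of singular values of C) is the convex envelope of rank(C), and the final L1 term penalizes coefficient differences between consecutive frames while still permitting the occasional jump discontinuity described above.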

Extended abstract
Objective Visual object tracking is an important research topic in computer vision, with extensive applications that include surveillance, human-computer interaction, and medical imaging. The goal of tracking is to estimate the states of a moving target in a video sequence. Considerable effort has been devoted to this field, but many challenges remain because of appearance variations caused by heavy occlusion, illumination variation, and fast motion. Low-rank approximation can capture the underlying structure of a target because some candidate particles have extremely similar appearance; it prunes irrelevant particles and is robust to global appearance changes, such as pose change and illumination variation. Sparse representation formulates candidate particles as a linear combination of a few dictionary templates and is robust to local appearance changes, e.g., partial occlusion. Combining low-rank approximation with sparse representation can therefore improve both the efficiency and the effectiveness of object tracking. However, trackers based on low-rank sparse learning easily drift when the target moves fast or is severely occluded. Therefore, a tracking algorithm based on reverse low-rank sparse learning with variation regularization is proposed.

Method First, a low-rank constraint is used to model the temporal correlation of the object appearance, thereby removing uncorrelated particles and adapting to appearance changes. Because rank minimization is computationally intractable, we minimize its convex envelope, the nuclear norm. Second, the traditional sparse representation method requires solving numerous L1 optimization problems, and the computational cost increases linearly with the number of candidate particles. We therefore build an inverse sparse representation of the object appearance, using the candidate particles to represent the target template inversely, which reduces the number of L1 optimization problems in online tracking from the number of candidate particles to one. Third, variation regularization is introduced to model the differences of the sparse coefficients. The variation method poses the variable selection problem in bounded variation space, which constrains the object appearance to change only slightly between consecutive frames while allowing the difference at individual frames to jump discontinuously, thereby adapting to fast object motion. Lastly, an online updating scheme based on alternating iteration is proposed for the tracking computation: each iteration updates one variable at a time while the other variables are fixed at their latest values. To accommodate changes in target appearance, we also use a local updating scheme that updates the local parts individually. This scheme captures appearance changes even under heavy occlusion: the unoccluded local parts are still updated in the target template, whereas the occluded ones are discarded. Consequently, we obtain a representation coefficient for the observation model and realize online tracking.
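As an illustration of the alternating iteration described above, the following Python sketch alternates proximal updates for the objective sketched after the first abstract, under the same assumed notation (T, X, C, C_{t-1}). Each pass updates C against one term at a time while the others stay at their latest values; the function and parameter names are hypothetical, and sequential proximal steps are a common approximation rather than the authors' exact solver.

```python
import numpy as np

def soft_threshold(A, tau):
    """Entrywise soft-thresholding: proximal operator of tau * ||A||_1."""
    return np.sign(A) * np.maximum(np.abs(A) - tau, 0.0)

def svt(A, tau):
    """Singular value thresholding: proximal operator of tau * ||A||_*."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def reverse_lrs_coefficients(T, X, C_prev, lam1=0.01, lam2=0.01,
                             lam3=0.01, n_iter=100):
    """Alternating proximal sketch for
        min_C ||T - X C||_F^2 + lam1*||C||_1 + lam2*||C||_*
              + lam3*||C - C_prev||_1
    T: (d, m) target templates, X: (d, n) candidate particles,
    C_prev: (n, m) coefficients from the previous frame."""
    n, m = X.shape[1], T.shape[1]
    C = np.zeros((n, m))
    # Step size from the Lipschitz constant of the fidelity gradient.
    step = 1.0 / (2.0 * np.linalg.norm(X, 2) ** 2 + 1e-12)
    for _ in range(n_iter):
        grad = 2.0 * X.T @ (X @ C - T)       # gradient of ||T - XC||_F^2
        C = C - step * grad
        C = soft_threshold(C, step * lam1)   # sparsity term
        C = svt(C, step * lam2)              # low-rank (nuclear norm) term
        # Variation term: prox of the L1 difference, shifted to C_prev.
        C = C_prev + soft_threshold(C - C_prev, step * lam3)
    return C
```

In an inverse-sparse formulation such as this, the coefficient magnitudes indicate how much each candidate particle contributes to reconstructing the template, which can then serve as that candidate's likelihood in the observation model.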
Result To evaluate the proposed tracker, qualitative and quantitative analyses are performed in MATLAB on benchmark tracking sequences (occlusion 1, David, boy, and deer) from the OTB (object tracking benchmark) dataset. The selected videos include many challenging factors in visual tracking, such as occlusion, fast motion, illumination, and scale variation. The experimental results show that, when faced with these challenging situations, the proposed algorithm tracks effectively in complicated scenes. Comparative studies are conducted with five state-of-the-art visual trackers, namely, DDL (discriminative dictionary learning), SCM (sparse collaborative model), LLR (locally low-rank representation), IST (inverse sparse tracker), and CNT (convolutional networks training). For a fair comparison, we use publicly available source code or results provided by the authors. Among these trackers, DDL, SCM, LLR, and IST are the most relevant; CNT is included mainly because deep networks have attracted considerable attention in complicated visual tracking tasks. For the qualitative comparison, representative tracking results are discussed on the basis of the major challenging factors in each video. For the quantitative comparison, the central pixel error (CPE), which records the Euclidean distance between the central location of the tracked target and the manually labeled ground truth, is used; smaller values indicate more accurate tracking. In terms of CPE versus frame number, our tracker achieves the best results on these challenging sequences. In particular, it outperforms the SCM, IST, and CNT trackers on the occlusion, illumination, and scale variation sequences, and it outperforms the LLR, IST, and DDL trackers on the fast motion sequences. These results demonstrate the effectiveness and robustness of our tracker against occlusion, illumination, scale variation, and fast motion.

Conclusion Qualitative and quantitative evaluations demonstrate that the proposed algorithm achieves higher precision than many state-of-the-art algorithms and, in particular, exhibits better adaptability to fast-moving objects. In the future, we will extend our tracker by applying deep learning to enhance its discriminative ability.
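For reference, the CPE metric used in the quantitative comparison above is straightforward to compute. A minimal sketch, assuming center coordinates are stored as (n_frames, 2) arrays:

```python
import numpy as np

def central_pixel_error(pred_centers, gt_centers):
    """Per-frame CPE: Euclidean distance (in pixels) between the tracked
    target's center and the manually labeled ground-truth center."""
    pred = np.asarray(pred_centers, dtype=float)  # shape (n_frames, 2)
    gt = np.asarray(gt_centers, dtype=float)      # shape (n_frames, 2)
    return np.linalg.norm(pred - gt, axis=1)

# Smaller values indicate more accurate tracking; a sequence is often
# summarized by the mean CPE over all frames:
# mean_cpe = central_pixel_error(pred, gt).mean()
```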