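Human action recognition using mid-level spatial-temporal features
Abstract
Objective Human action recognition is an important research topic in computer vision with broad application prospects. To address the limitations of local and global spatial-temporal features in action recognition, a novel and effective mid-level spatial-temporal feature for human actions is proposed. Method The feature describes the structured distribution of local features in the neighborhood of each spatial-temporal interest point in a video, which strengthens the discriminative power of the interest points. At the same time, it avoids a global description of the human action and can therefore adapt flexibly to intra-class variation. Mutual information is used to measure the correlation between the mid-level spatial-temporal features and each action category, and a video is recognized as the category with which it has the greatest mutual information. Result Experiments show that the proposed mid-level spatial-temporal feature achieves higher recognition accuracy than methods based on local spatial-temporal features and other methods, reaching 96.3% and 98.0% on the KTH dataset and the activities of daily living (ADL) dataset, respectively. Conclusion By exploiting the spatial-temporal distribution of local features, the proposed mid-level spatial-temporal feature markedly strengthens action discrimination and can effectively recognize a variety of complex human actions.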
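Keywords
action recognition; spatial-temporal interest point; mid-level spatial-temporal feature; pointwise mutual information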
Human action recognition using mid-level spatial-temporal features
Wang Taiqing, Wang Shengjin (School of Electronic Engineering, Tsinghua University, State Key Laboratory of Intelligent Technology and Systems, Tsinghua National Laboratory for Information Science and Technology, Beijing 100084, China)
Abstract
Objective Human action recognition is an important research topic in computer vision and has a wide range of potential applications. To address the limitations of local and global spatial-temporal features, a novel and effective mid-level spatial-temporal feature is proposed for action recognition. Method The proposed feature encodes the structural distribution of local features in the neighborhood of each spatial-temporal interest point (STIP), thereby improving the discriminative power of the STIP. Because it avoids a global description of the action, the feature can accommodate flexible intra-action variations. Pointwise mutual information is introduced to measure the correlation between the mid-level features and each action category. The video clip is finally classified as the action category that has the greatest mutual information with its mid-level features. Result Experimental results validated the advantage of the proposed mid-level feature over the local-feature-based baseline methods and other published results. The mid-level feature achieved recognition accuracies of 96.3% and 98.0% on the KTH and activities of daily living (ADL) action datasets, respectively. Conclusion The proposed mid-level spatial-temporal feature enhances the ability to discriminate between actions by harnessing the spatial-temporal distribution of local spatial-temporal features. Consequently, this feature is capable of recognizing realistic human actions.
Keywords
action recognition; spatial-temporal interest point; mid-level spatial-temporal feature; pointwise mutual information
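
The classification rule stated in the abstract (assign a video to the action category with which its mid-level features have the greatest mutual information) can be illustrated with a minimal sketch. The abstract does not say how the features are quantized or how the probabilities are estimated, so everything below is an assumption for illustration: each mid-level feature is taken to be already quantized to an integer codeword, PMI(w, c) = log P(w | c) / P(w) is estimated from training counts with add-alpha smoothing, and a video's score for a class is the sum of the PMI values of its codewords. The names `train_pmi`, `classify`, `vocab_size`, and `alpha` are hypothetical and do not come from the paper.

```python
import math
from collections import Counter


def train_pmi(videos, labels, vocab_size, alpha=1.0):
    """Estimate PMI(w, c) = log(P(w | c) / P(w)) for every codeword and class.

    videos: list of lists of integer codewords (assumed quantized features).
    labels: one class label per video.
    alpha:  add-alpha smoothing constant (an assumption, not from the paper).
    """
    classes = sorted(set(labels))
    per_class = {c: Counter() for c in classes}
    for words, c in zip(videos, labels):
        per_class[c].update(words)

    # Pool the per-class counts to estimate the marginal P(w).
    total = Counter()
    for counts in per_class.values():
        total.update(counts)
    n_total = sum(total.values())

    pmi = {}
    for c in classes:
        n_c = sum(per_class[c].values())
        pmi[c] = [
            math.log((per_class[c][w] + alpha) / (n_c + alpha * vocab_size))
            - math.log((total[w] + alpha) / (n_total + alpha * vocab_size))
            for w in range(vocab_size)
        ]
    return pmi


def classify(words, pmi):
    """Return the class whose summed PMI with the video's codewords is largest."""
    return max(pmi, key=lambda c: sum(pmi[c][w] for w in words))


# Toy data: codewords 0 and 1 occur in "walk" videos, 2 and 3 in "wave" videos.
train_videos = [[0, 0, 1], [0, 1, 0], [2, 3, 3], [3, 2, 2]]
train_labels = ["walk", "walk", "wave", "wave"]
pmi_table = train_pmi(train_videos, train_labels, vocab_size=4)
print(classify([0, 0, 1], pmi_table))
print(classify([3, 2], pmi_table))
```

Summing per-feature PMI values treats the features of a video as independent votes. This is one simple way to aggregate and may differ from the aggregation used in the paper.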