A survey of the bag-of-words framework for visual terrain classification

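Wu Hang1, Liu Baozhen1, Su Weihua1, Zhang Wenchang2, Sun Jinggong1 (1. Institute of Medical Equipment, Academy of Military Medical Science, Tianjin 300161; 2. The State Key Laboratory of Intelligent Technology and System, Tsinghua University, Beijing 100084)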

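Abstract
Objective Visual terrain classification is a research hotspot in the field of outdoor mobile robots. Methods based on the bag-of-words framework aggregate and integrate the low-level visual features of terrain images and link the statistical distribution of those features to high-level semantics; they have become the common approach and standard paradigm for visual terrain classification. This paper comprehensively reviews the bag-of-words framework in visual terrain classification, systematically summarizes existing work, and points out future research directions. Method The bag-of-words framework comprises four main steps: feature extraction, codebook clustering, feature coding, and pooling and normalization. The different methods in each step are summarized and compared, a terrain classification dataset is built, and the effect of the different methods on terrain recognition is evaluated. Result The methods in each step of the bag-of-words framework are systematically categorized and summarized, then evaluated on the terrain dataset; every step proves crucial to the performance of the resulting mid-level feature. Terrain-specific feature design, improvements to the bag-of-words framework, and feature fusion are important future research directions. Conclusion The bag-of-words framework narrows the semantic gap between low-level visual features and high-level semantics, generates a mid-level semantic representation, and improves visual terrain classification. Research on bag-of-words methods for visual terrain classification is therefore of considerable significance.
Keywords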
Bag of words for visual terrain classification: a comprehensive study

Wu Hang1, Liu Baozhen1, Su Weihua1, Zhang Wenchang2, Sun Jinggong1 (1. Institute of Medical Equipment, Academy of Military Medical Science, Tianjin 300161, China; 2. The State Key Laboratory of Intelligent Technology and System, Tsinghua University, Beijing 100084, China)

Abstract
Objective Unlike a mobile robot in a structured indoor environment, an outdoor robot must recognize non-geometric terrain characteristics within a reasonable time and adapt its path, gait, and motion planning strategies to the terrain. Visual terrain classification has become a hot topic in outdoor mobile robot research. The bag-of-visual-words (BOVW) framework, which aggregates low-level visual descriptors and links their statistical distribution to high-level semantics, has become the most common approach and an effective paradigm for visual terrain classification. In this paper, we provide a comprehensive study of each step in the BOVW framework for visual terrain classification. Diverse methods in each step are introduced and summarized, and their characteristics and relations are explored.
Method The BOVW framework includes four main steps: 1) feature extraction, 2) codebook generation, 3) feature coding, and 4) pooling and normalization. Feature extraction acquires low-level feature information from the terrain images to form local descriptors. In the codebook generation step, a codebook is formed through clustering. The coding step uses the codebook to map the descriptors of a terrain image into the coding space. The coding results are then aggregated by pooling and normalization into a single fixed-length vector, the mid-level feature. Finally, the mid-level feature is fed into a linear or nonlinear classifier, such as an SVM, for terrain classification. The diverse methods in each step are summarized and compared systematically, and their performance is preliminarily tested on a terrain dataset.
Result The BOVW framework for visual terrain classification is reviewed, and a preliminary comparison of different BOVW configurations is presented on the terrain dataset. The results show that every step contributes crucially to the final classification performance, and that an improper choice in any one step markedly weakens the effectiveness and efficiency of the classification system as a whole. New handcrafted descriptors specific to visual terrain, modified BOVW frameworks, and feature fusion are three potential research directions.
Conclusion Visual terrain classification is an important technology that lets outdoor mobile robots recognize non-geometric terrain characteristics. Compared with other sensing modalities, vision most closely resembles the way humans perceive the environment and provides richer terrain information, which has made visual terrain classification a hotspot in outdoor mobile robot technology. However, the visual appearance of a single terrain type may vary greatly, and different terrain types may look highly similar. These issues pose numerous challenges to visual terrain classification. Both effectiveness and efficiency must be taken into account when designing a visual terrain classification system. Studies on the BOVW framework for visual terrain classification are therefore of considerable significance.
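The four steps named in the Method part can be illustrated with a minimal sketch. It is not the configuration evaluated in the paper: the raw grey-patch descriptor, k-means codebook, hard-assignment coding, sum pooling, and L2 normalization are assumptions chosen for brevity, the function names are hypothetical, and random arrays stand in for terrain images.

```python
import numpy as np

def extract_patches(image, size=8, stride=8):
    """Step 1: local descriptors (assumed here: flattened, mean-centred grey patches)."""
    h, w = image.shape
    patches = [image[r:r + size, c:c + size].ravel()
               for r in range(0, h - size + 1, stride)
               for c in range(0, w - size + 1, stride)]
    d = np.asarray(patches, dtype=float)
    return d - d.mean(axis=1, keepdims=True)

def assign(descriptors, codebook):
    """Step 3: hard-assignment coding, i.e. index of the nearest visual word."""
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def build_codebook(descriptors, k, iters=20, seed=0):
    """Step 2: codebook generation by plain k-means (Lloyd iterations)."""
    rng = np.random.default_rng(seed)
    picks = rng.choice(len(descriptors), size=k, replace=False)
    centers = descriptors[picks].copy()
    for _ in range(iters):
        labels = assign(descriptors, centers)
        for j in range(k):
            members = descriptors[labels == j]
            if len(members):  # keep the old centre if a cluster empties
                centers[j] = members.mean(axis=0)
    return centers

def encode(image, codebook):
    """Steps 3-4: code the descriptors, sum-pool into a histogram, L2-normalize."""
    words = assign(extract_patches(image), codebook)
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    norm = np.linalg.norm(hist)
    return hist / norm if norm > 0 else hist

# Synthetic stand-ins for terrain images (assumption: 32x32 greyscale arrays).
rng = np.random.default_rng(0)
images = [rng.random((32, 32)) for _ in range(4)]
codebook = build_codebook(np.vstack([extract_patches(im) for im in images]), k=8)
feature = encode(images[0], codebook)  # fixed-length mid-level feature
```

The resulting `feature` vector has one entry per visual word regardless of image size, which is what allows it to be passed to a classifier such as an SVM. Swapping any single function (for example, a different descriptor or a soft coding scheme) changes one step of the framework while leaving the others untouched.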
Keywords
