Scene classification algorithm fusing visual perception characteristics

Shi Jing, Zhu Hong, Wang Dong, Du Sen (School of Automation and Information Engineering, Xi'an University of Technology, Xi'an 710048, China)

Abstract
Objective For scene classification, owing to the diversity and complexity of the internal structure of scenes and the influence of illumination and shooting angle, most existing algorithms model scenes by extracting features alone, without considering the interrelations among the objects in a scene image; consequently, an ideal classification performance still cannot be achieved. Addressing the key and difficult problems in scene classification, this paper fully considers the visual perception characteristics of the human eye and, by combining saliency detection with the traditional bag-of-visual-words model, proposes a scene classification algorithm that fuses visual perception characteristics. Method First, the image is decomposed at multiple scales and features are extracted at each scale; next, the visually salient regions of the image are detected at each scale; finally, the salient-region information is fused with the multi-scale features to form a multi-scale fused window-selected weighted SIFT feature (WSSIFT), which is used to classify scenes. Result To verify its effectiveness, the algorithm was tested on three standard datasets, SE, LS, and IS, and compared with different methods; classification accuracy improved by approximately 3%–17%. Conclusion The proposed algorithm effectively mitigates the limitations of plain feature description and improves the overall representation of the image. Experimental results show that the algorithm classifies well on multiple datasets and is applicable to machine-vision tasks such as scene analysis, understanding, and classification.
Keywords
Scene classification algorithm of fusing visual perception

Shi Jing, Zhu Hong, Wang Dong, Du Sen(School of Automation and Information Engineering, Xi'an University of Technology, Xi'an 710048, China)

Abstract
Objective Scene classification is an important part of machine vision: the content of a scene is identified by analyzing the objects it contains and their relative positions. In recent years, the surge in image data has posed great challenges for image recognition, retrieval, and classification, and accurately extracting the information users need from vast data has become increasingly urgent in this field. Early image recognition technology focused mainly on describing the low-level information of images. The bag-of-words model originated in document processing: a document is first converted into a combination of keywords and then matched on the basis of keyword frequency. In recent years, researchers in computer vision have applied this method to image processing successfully: an image is treated as the document, the visual words of the image are generated by feature extraction, and the bag-of-words representation of the image is built from the frequencies of those visual words. At present, an ideal classification result is difficult to achieve because of the diversity and complexity of the internal structure of scenes. Physiological and psychological research has shown that the human visual system attends to salient regions rather than isolated salient points. Visual attention modeling is a major new research topic: saliency analysis uses a computational method to find the regions of an image that attract the most interest and carry the most content, and represents them with a saliency map. In this study, a scene classification algorithm based on visual perception is proposed to address the key and difficult problems of scene classification. Specifically, the visual perception characteristics of the human eye are considered, and saliency detection is combined with the traditional bag-of-visual-words model.
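The bag-of-visual-words pipeline summarized above (cluster local descriptors into a vocabulary of visual words, then represent each image by the frequency of its words) can be sketched as follows. This is a minimal illustration only: the random 8-D "descriptors", the vocabulary size, and the plain k-means clustering are assumptions for demonstration, not the configuration used in the paper.

```python
import numpy as np

def build_vocabulary(descriptors, k, iters=20, seed=0):
    """Cluster local descriptors into k visual words with plain k-means."""
    rng = np.random.default_rng(seed)
    centers = descriptors[rng.choice(len(descriptors), k, replace=False)]
    for _ in range(iters):
        # Assign each descriptor to its nearest center.
        d = np.linalg.norm(descriptors[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Move each center to the mean of its assigned descriptors.
        for j in range(k):
            if np.any(labels == j):
                centers[j] = descriptors[labels == j].mean(axis=0)
    return centers

def bow_histogram(descriptors, centers):
    """Quantize descriptors to visual words; return a normalized word histogram."""
    d = np.linalg.norm(descriptors[:, None, :] - centers[None, :, :], axis=2)
    words = d.argmin(axis=1)
    hist = np.bincount(words, minlength=len(centers)).astype(float)
    return hist / hist.sum()

# Toy example: 200 random 8-D "descriptors", a 10-word vocabulary.
rng = np.random.default_rng(1)
desc = rng.normal(size=(200, 8))
vocab = build_vocabulary(desc, k=10)
h = bow_histogram(desc, vocab)
print(h.shape)  # 10-bin histogram, one bin per visual word
```

In a real system the descriptors would be SIFT vectors extracted from training images, and the resulting histograms would be fed to a classifier.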
Method On the basis of visual saliency and the bag-of-visual-words model, this study fully considers the visual attention area of the human eye and avoids the shortcoming of simple low-level features, which fail to capture the interrelationships among targets. A multi-scale fused WSSIFT feature is established by screening and weighting the low-level features with the saliency of the regions of interest, which avoids neglecting important details and removes some redundant features. First, the image is decomposed at multiple scales and features are extracted at each scale. Second, the visually salient regions of the image are detected at each scale. Finally, the salient-region information and the multi-scale features are integrated to form the multi-scale fused WSSIFT feature, which is used to classify scenes. Result The proposed algorithm is tested on three standard datasets, namely, SE, LS, and IS, to verify its effectiveness, and the results are compared with those of different methods. The classification accuracy of the proposed method improves by approximately 3%–17%. Conclusion The proposed scene classification algorithm effectively alleviates the limitation of plain feature description and improves the overall representation of the image. It addresses the common situation in scene classification where image features are used simplistically, feature extraction is insufficient, and the interrelations among the objects in the scene are neglected, by fully considering human visual perception. While preserving the advantages of the local feature model, a fused detection algorithm is used to study the overall sensitivity of the image, taking into account the interrelationship across the entire scene and the enhancement of local information; on this basis, the multi-scale fused WSSIFT feature is constructed. Experimental results show that the proposed algorithm achieves good classification performance on multiple datasets, and its results on the three standard datasets are superior to those of other algorithms. The algorithm can be applied to machine-vision tasks such as scene analysis, understanding, and classification.
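The three method steps stated above (multi-scale decomposition, per-scale feature extraction, saliency-weighted fusion) can be sketched as follows. The abstract does not specify the exact WSSIFT construction, so everything here is an assumed stand-in: average-pool downsampling in place of a proper Gaussian pyramid, mean-intensity patch values in place of SIFT descriptors, and a toy global-contrast saliency map in place of the paper's saliency detector.

```python
import numpy as np

def pyramid(img, levels=3):
    """Multi-scale decomposition by simple 2x2 average-pool downsampling."""
    pyr = [img]
    for _ in range(levels - 1):
        h, w = pyr[-1].shape
        p = pyr[-1][: h // 2 * 2, : w // 2 * 2]  # crop to even dimensions
        pyr.append(0.25 * (p[0::2, 0::2] + p[1::2, 0::2]
                           + p[0::2, 1::2] + p[1::2, 1::2]))
    return pyr

def saliency_map(img):
    """Toy saliency: deviation of each pixel from the global mean intensity."""
    s = np.abs(img - img.mean())
    return s / (s.max() + 1e-12)

def weighted_patch_features(img, patch=8):
    """Stand-in patch descriptors, each weighted by the patch's mean saliency."""
    sal = saliency_map(img)
    feats = []
    for i in range(0, img.shape[0] - patch + 1, patch):
        for j in range(0, img.shape[1] - patch + 1, patch):
            w = sal[i:i + patch, j:j + patch].mean()   # saliency weight
            f = img[i:i + patch, j:j + patch].mean()   # stand-in descriptor
            feats.append(w * f)
    return np.array(feats)

# Fuse the saliency-weighted features of every scale into one vector.
rng = np.random.default_rng(0)
img = rng.random((64, 64))
multi_scale = np.concatenate([weighted_patch_features(level)
                              for level in pyramid(img)])
print(multi_scale.shape)
```

The fused vector concatenates the weighted features of all scales, mirroring the idea that salient regions contribute more strongly to the final scene representation.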
Keywords
