Salient object detection fusing deep features and multiple kernel boosting learning
Abstract
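Objective Existing salient object detection algorithms based on handcrafted features cannot yet effectively suppress irrelevant background regions and completely, uniformly highlight the salient objects in complex images with large salient objects, cluttered backgrounds, or multiple salient objects. To address this problem, we propose a salient object detection algorithm that uses deep semantic information and multiple kernel boosting learning. Method First, the input image is segmented into superpixels at multiple scales, and weak saliency maps are built with a manifold-ranking-based algorithm. Second, a pretrained classic convolutional neural network extracts deep features carrying semantic information from the multiscale image sequence; guided by the weak saliency maps, a reliable training sample set is drawn from the multiscale image sequence, and multiple kernel boosting learning yields a strong saliency detection model. Then, the strong saliency detection model is applied to all test samples of the multiscale image sequence, and the detection results across scales are fused by linear weighting into a region-level strong saliency map. Finally, the strong saliency map is updated at the pixel level according to the position and color information between pixels to further improve its accuracy. Result On the widely used MSRA5K, ECSSD, and SOD datasets, the algorithm is compared with 9 mainstream, related algorithms in terms of precision, recall, F-measure, precision-recall (PR) curve, weighted F-measure, and overlapping ratio (OR), as well as visual detection quality. Compared with the second-best performer, a non-end-to-end deep neural network model, the proposed algorithm improves the average F-measure, weighted F-measure, OR, and mean absolute error (MAE) over the three datasets by 1.6%, 22.1%, 5.6%, and 22.9%, respectively. Conclusion Compared with saliency detection algorithms based on handcrafted features, the proposed algorithm exploits the semantic information contained in images and combines multiple single-kernel support vector machine (SVM) classifiers into a strong classifier, achieving good detection results on complex images.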
Keywords
Salient object detection via deep features and multiple kernel boosting learning
Zhang Qing1, Li Yun1, Li Wenju1, Lin Jiajun2, Xiao Mang1, Chen Feiyun1
(1. School of Computer Science and Information Engineering, Shanghai Institute of Technology, Shanghai 201418, China; 2. School of Information Science and Engineering, East China University of Science and Technology, Shanghai 200237, China)
Abstract
Objective Salient object detection identifies the most conspicuous, eye-attracting objects or regions in an image. Results are usually expressed as saliency maps, in which the intensity of each pixel represents the probability that the pixel belongs to a salient region. Visual saliency detection serves as a preprocessing step for a wide range of vision applications, including image and video compression, image retargeting, visual tracking, and robot navigation. Although the performance of salient object detection approaches has improved dramatically in the last few years, the task remains challenging. Most existing methods rely on handcrafted features and use distinct prior knowledge, such as contrast, center, background, and objectness priors, to enhance performance. Recently, convolutional neural network (CNN)-based approaches have proven remarkably effective, overcoming the limits of traditional handcrafted feature-based methods and greatly enhancing saliency detection performance. These CNN-based models, especially the end-to-end ones, excel at feature extraction and efficiently capture high-level information about objects and their cluttered surroundings. Existing handcrafted feature-based salient object detection algorithms cannot effectively suppress irrelevant backgrounds and uniformly highlight the entire salient object on complicated images with large salient objects, cluttered backgrounds, or multiple salient objects. To overcome this drawback, we propose a salient object detection scheme based on multiple kernel boosting learning and deep semantic information.
Method First, we segment the input image into multiscale superpixels and obtain weak saliency maps through graph-based manifold ranking. Second, we extract deep features carrying semantic information by using a classic CNN. Guided by the multiscale weak saliency maps, we collect reliable training sets and learn a strong salient object detection model through multiple kernel boosting learning. Then, the model is applied to the samples from the multiscale superpixel images, and the resulting saliency maps are fused to generate a strong saliency map. Finally, the strong saliency map is refined at the pixel level according to color and position information to improve detection performance.
Result The proposed model is compared with 11 state-of-the-art methods in terms of precision, recall, F-measure, precision-recall (PR) curve, weighted F-measure, overlapping ratio (OR), mean absolute error (MAE), and visual effect on three popular public datasets, namely, MSRA5K, ECSSD, and SOD. Experimental results show improvements over the state-of-the-art methods. Compared with the non-end-to-end deep learning model whose performance ranks second, our algorithm increases the F-measure by 0.7%, 2.0%, and 2.1%; the weighted F-measure by 18.9%, 27.6%, and 19.8%; and the OR score by 2.9%, 6.8%, and 7.2%; and it improves the MAE score by 34.5%, 26.9%, and 7.5% on MSRA5K, ECSSD, and SOD, respectively. The visual comparisons show that our method performs well on various complex images, such as those in which salient objects and backgrounds share a similar appearance, and those with multiple salient objects, salient objects with complex texture and structure, or cluttered backgrounds. The proposed approach not only uniformly highlights entire salient objects but also preserves their contours under various scenarios.
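The scores named above can be illustrated with a minimal Python sketch for a single image. The abstract does not give the formulas, so everything here is an assumption: the F-measure uses the weighting β² = 0.3 that is common in the saliency literature, the saliency map is binarized at a fixed threshold of 0.5, and the function names `f_measure` and `evaluate` are hypothetical.

```python
def f_measure(precision, recall, beta_sq=0.3):
    """Weighted harmonic mean of precision and recall.

    beta_sq = 0.3 is a common convention in saliency work (assumed here).
    """
    denom = beta_sq * precision + recall
    if denom == 0:
        return 0.0
    return (1 + beta_sq) * precision * recall / denom


def evaluate(saliency, ground_truth, threshold=0.5):
    """Score one saliency map (values in [0, 1]) against a binary mask.

    Both arguments are equally sized 2-D lists. The fixed threshold is an
    illustrative choice, not the paper's protocol.
    """
    sal = [v for row in saliency for v in row]
    gt = [1 if v else 0 for row in ground_truth for v in row]
    if not sal or len(sal) != len(gt):
        raise ValueError("inputs must be non-empty and of equal size")

    pred = [1 if v >= threshold else 0 for v in sal]
    tp = sum(p * g for p, g in zip(pred, gt))   # salient in both maps
    n_pred, n_gt = sum(pred), sum(gt)
    union = n_pred + n_gt - tp

    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gt if n_gt else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure(precision, recall),
        # MAE compares the continuous map with the mask, pixel by pixel.
        "mae": sum(abs(v - g) for v, g in zip(sal, gt)) / len(sal),
        # Overlapping ratio: intersection over union of the binary masks.
        "or": tp / union if union else 0.0,
    }


scores = evaluate([[0.9, 0.8], [0.1, 0.6]], [[1, 1], [0, 0]])
```

Note that F-measure and OR are better when higher, whereas MAE is an error and is better when lower.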
Moreover, we evaluate the contribution of each component of the proposed algorithm through PR curves on the three datasets. We also report the average running time of our algorithm and of the methods based on non-end-to-end CNNs. The running times are measured on the ECSSD dataset with implementations in MATLAB or C, and most of the test images have a resolution of 300×400 pixels. An efficient C/C++ implementation with parallelized components would reduce our model's computation time and make it feasible for real-world applications.
Conclusion The proposed salient object detection model, which uses deep features from a classic CNN and learns a strong classifier from four single-kernel support vector machines (SVMs), performs well on complicated images compared with salient object detection methods based on handcrafted features. There is still room for improvement on datasets of images with complex and confusing backgrounds. In future research, we plan to use additional features from a CNN and construct an end-to-end model, which would improve performance and reduce computation cost. Our future work will also address small salient object detection in video.
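The idea of combining several single-kernel classifiers into one strong classifier can be sketched with a standard AdaBoost-style loop. This is an illustration under stated assumptions, not the authors' implementation: the abstract does not give the update rule, the weak classifiers are represented only by their precomputed ±1 decisions (in the paper's setting these would come from SVMs with different kernels), and the names `boost` and `strong_decision` are hypothetical.

```python
import math


def boost(weak_outputs, labels, rounds):
    """AdaBoost-style selection over a fixed pool of weak classifiers.

    weak_outputs[m][i] is the +1/-1 decision of weak classifier m on
    training sample i; labels[i] is the +1/-1 ground truth.
    Returns a list of (classifier_index, weight) pairs.
    """
    n = len(labels)
    sample_w = [1.0 / n] * n
    chosen = []
    for _ in range(rounds):
        # Weighted error of every weak classifier under current weights.
        errors = [
            sum(w for w, out, y in zip(sample_w, outs, labels) if out != y)
            for outs in weak_outputs
        ]
        best = min(range(len(errors)), key=errors.__getitem__)
        eps = errors[best]
        if eps >= 0.5:          # no classifier is better than chance
            break
        eps = max(eps, 1e-10)   # avoid division by zero for a perfect one
        beta = 0.5 * math.log((1 - eps) / eps)
        chosen.append((best, beta))
        # Raise the weight of misclassified samples, lower the others.
        sample_w = [
            w * math.exp(-beta * y * out)
            for w, out, y in zip(sample_w, weak_outputs[best], labels)
        ]
        total = sum(sample_w)
        sample_w = [w / total for w in sample_w]
    return chosen


def strong_decision(chosen, weak_outputs, i):
    """Sign of the weighted vote of the selected weak classifiers."""
    score = sum(beta * weak_outputs[m][i] for m, beta in chosen)
    return 1 if score >= 0 else -1


labels = [1, 1, -1, -1]
# Three toy weak classifiers, each wrong on a different sample.
weak_outputs = [[1, 1, 1, -1], [1, -1, -1, -1], [-1, 1, -1, -1]]
chosen = boost(weak_outputs, labels, rounds=3)
predictions = [strong_decision(chosen, weak_outputs, i) for i in range(4)]
```

In this toy example each weak classifier misclassifies one sample, yet the weighted vote of all three classifies every sample correctly, which is the effect boosting aims for.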
Keywords
salient object detection; saliency detection; deep feature; multiple kernel boosting learning; multiscale detection