Salient object detection based on deep clustering attention mechanism

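Chen Qingwen1, Xie Hongwen2, Zha Hao2, Xi Yu1, Zhang Xue3,4 (1. China Power Construction Corporation Northwest Survey Design and Research Institute Co., Ltd., Xi'an 710065; 2. China Renewable Energy Engineering Institute, Beijing 100120; 3. College of Intelligence and Computing, Tianjin University, Tianjin 300350; 4. Tianjin Key Laboratory of Machine Learning, Tianjin 300350)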

Abstract
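Objective To obtain accurate salient object segmentation results, most deep learning-based methods introduce attention mechanisms to weight features and thereby suppress noise and redundant information. However, they model attention coarsely and treat all features equally, so they cannot explicitly learn the global importance of different channels and different spatial regions. To address this, this paper proposes DCANet (DCA network), a salient object detection algorithm based on a deep cluster attention (DCA) mechanism, to better model feature-level pixel context relationships. Method DCA explicitly partitions the feature map into regions along the channel dimension and the spatial dimension; that is, it clusters the features into foreground-sensitive and background-sensitive regions. It then performs general pixel-wise attention weighting within each class and further performs semantic-level attention weighting between classes. The idea behind DCA is clear and easy to understand, it has few parameters, and it can be conveniently deployed in any saliency detection network. Result Comparative experiments against 19 methods on 6 datasets verify the effectiveness of DCA for obtaining fine salient object segmentation masks. On every evaluation metric, model performance improves after DCA is deployed. On the ECSSD (extended complex scene saliency dataset) dataset, DCANet improves the F-measure by 0.9% over the second-best method; on the DUT-OMRON (Dalian University of Technology and OMRON Corporation) dataset, DCANet improves the F-measure by 0.5% over the second-best method and reduces the mean absolute error (MAE) by 3.2%; on the HKU-IS dataset, DCANet improves the F-measure by 0.3% over the second-best method and reduces the MAE by 2.8%; on the PASCAL (pattern analysis, statistical modeling and computational learning)-S (subset) dataset, DCANet improves the F-measure by 0.8% over the second-best method and reduces the MAE by 4.2%. Conclusion The proposed deep clustering attention mechanism effectively enhances the global saliency scores of foreground-sensitive classes through fine-grained channel partitioning and spatial region partitioning. Compared with existing attention mechanisms, DCA is conceptually clear, effective, and simple to deploy, and it also offers a new and feasible research direction for the study of general attention mechanisms.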
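The abstract describes DCA only at a high level, so the following is a minimal illustrative sketch rather than the authors' implementation. It assumes a single feature tensor of shape (C, H, W), uses a tiny two-cluster k-means on mean activations as a stand-in for the clustering step, a sigmoid gate relative to the cluster center as the intra-class weighting, and a softmax over cluster centers as the inter-class weighting. The function names `two_means` and `cluster_attention` are hypothetical, and in a real network these weights would presumably be learned rather than fixed.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def two_means(values, iters=10):
    """Tiny 1-D k-means with k=2; cluster 1 ends up with the larger center."""
    centers = np.array([values.min(), values.max()], dtype=float)
    labels = np.zeros(values.shape[0], dtype=int)
    for _ in range(iters):
        # Assign each value to its nearest center, then update the centers.
        labels = np.abs(values[:, None] - centers[None, :]).argmin(axis=1)
        for k in range(2):
            if np.any(labels == k):
                centers[k] = values[labels == k].mean()
    return labels, centers

def cluster_attention(feat, mode="channel"):
    """Assumed sketch. feat: (C, H, W); mode: "channel" or "spatial"."""
    feat = np.asarray(feat, dtype=float)
    # One descriptor per channel, shape (C,), or per pixel, shape (H, W).
    desc = feat.mean(axis=(1, 2)) if mode == "channel" else feat.mean(axis=0)
    labels, centers = two_means(desc.ravel())
    # Inter-class weighting (assumption): softmax over the two cluster centers,
    # so the cluster with the stronger mean response gets the larger weight.
    e = np.exp(centers - centers.max())
    class_w = e / e.sum()
    center_map = centers[labels].reshape(desc.shape)
    weight_map = class_w[labels].reshape(desc.shape)
    if mode == "channel":
        center_map = center_map[:, None, None]
        weight_map = weight_map[:, None, None]
    else:
        center_map = center_map[None]
        weight_map = weight_map[None]
    # Intra-class weighting (assumption): gate each element relative to the
    # center of the cluster it belongs to.
    gate = sigmoid(feat - center_map)
    return feat * gate * weight_map, labels.reshape(desc.shape)

if __name__ == "__main__":
    feat = np.zeros((4, 2, 2))
    feat[2:] = 3.0  # two weakly and two strongly responding channels
    out, labels = cluster_attention(feat, mode="channel")
    print(labels, out.shape)
```

Treating the higher-response cluster as "foreground-sensitive" is itself an assumption of this sketch; the abstract does not say how the two classes are identified.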
Keywords
Salient object detection based on deep clustering attention mechanism

Chen Qingwen1, Xie Hongwen2, Zha Hao2, Xi Yu1, Zhang Xue3,4 (1. China Power Construction Corporation Northwest Survey Design and Research Institute Co., Ltd., Xi'an 710065, China; 2. China Renewable Energy Engineering Institute, Beijing 100120, China; 3. College of Intelligence and Computing, Tianjin University, Tianjin 300350, China; 4. Tianjin Key Laboratory of Machine Learning, Tianjin 300350, China)

Abstract
Objective Salient object detection is a fundamental task in computer vision. It simulates the human visual attention mechanism to quickly detect the attractive objects in a scene, that is, those most likely to represent user query variables and to contain the most information. Salient object detection plays an important role as a preprocessing step for other vision tasks, such as image resizing, visual tracking, person re-identification, and image segmentation. Traditional salient object detection methods rely mainly on manually extracted image features. However, this process is time-consuming and labor-intensive, and the results cannot meet requirements. With the rise of deep learning, a large number of feature extraction algorithms based on convolutional neural networks have emerged. Compared with traditional feature extraction methods, deep neural networks extract higher-quality features and yield more accurate predictions. To obtain accurate salient object segmentation results, most deep learning-based methods introduce attention mechanisms for feature weighting to suppress noise and redundant information. However, existing attention mechanisms are modeled coarsely: they treat every position in the feature tensor equally and compute the attention score directly. This strategy cannot explicitly learn the global importance of different channels and different spatial regions, which may lead to missed or false detections. To this end, we propose a deep clustering attention (DCA) mechanism to better model feature-level pixel-wise relationships. Method The proposed DCA explicitly divides the feature tensors into several categories channel-wise and spatial-wise; that is, it clusters the features into foreground-sensitive and background-sensitive regions.
Then, general per-pixel attention weighting is performed within each class, and semantic-level attention weighting is further performed between classes. The idea of DCA is easy to understand, its parameter count is small, and it can be deployed in any salient object detection network. This method can efficiently separate the foreground and background regions. In addition, through supervised learning on the edges of salient objects, the predictions have clearer edges and are more accurate. Result Comparisons with 19 state-of-the-art methods on six large public datasets demonstrate the effectiveness of DCA in modeling pixel-wise attention, which is very helpful for obtaining fine salient object segmentation masks. Across the evaluation metrics, model performance improves after DCA is deployed. On the extended complex scene saliency dataset (ECSSD), the F-measure of DCANet (the DCA network) is 0.9% higher than that of the second-best method. On the Dalian University of Technology and OMRON Corporation (DUT-OMRON) dataset, the F-measure of DCANet is 0.5% higher than that of the second-best method, and the mean absolute error (MAE) is 3.2% lower. On the HKU-IS dataset, the F-measure of DCANet is 0.3% higher than that of the second-best method, and the MAE is 2.8% lower. On the pattern analysis, statistical modeling and computational learning (PASCAL)-subset (S) dataset, the F-measure of DCANet is 0.8% higher than that of the second-best method, and the MAE is 4.2% lower. Conclusion The DCA proposed in this study effectively enhances the global saliency scores of foreground-sensitive classes through more fine-grained channel partitioning and spatial region partitioning. This paper analyzes the deficiencies of existing attention-based salient object detection algorithms and proposes a method for explicitly dividing feature channels and spatial regions.
The attention modeling mechanism helps the model perceive and adapt to the task quickly during training. Compared with existing attention mechanisms, DCA is conceptually clear, its effect is significant, and it is simple to deploy. Meanwhile, DCA provides a viable new research direction for the study of more general attention mechanisms.
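The results above are reported in terms of F-measure and MAE. As a rough illustration of how these two metrics are commonly computed for a single saliency map, the sketch below assumes predictions and ground truth scaled to [0, 1], a fixed binarization threshold of 0.5, and the weighting β² = 0.3 that is widely used in the salient object detection literature. The abstract does not state which threshold or β² the paper uses, so both are assumptions, and the function names are hypothetical.

```python
import numpy as np

def mae(pred, gt):
    """Mean absolute error between a saliency map and its ground truth."""
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    return float(np.mean(np.abs(pred - gt)))

def f_measure(pred, gt, threshold=0.5, beta_sq=0.3):
    """F-measure at a fixed threshold (threshold and beta_sq are assumed)."""
    binary = np.asarray(pred, dtype=float) >= threshold
    gt_mask = np.asarray(gt, dtype=float) >= 0.5
    tp = np.logical_and(binary, gt_mask).sum()
    precision = tp / binary.sum() if binary.sum() > 0 else 0.0
    recall = tp / gt_mask.sum() if gt_mask.sum() > 0 else 0.0
    denom = beta_sq * precision + recall
    if denom == 0:
        return 0.0
    return float((1 + beta_sq) * precision * recall / denom)

if __name__ == "__main__":
    pred = np.array([[0.9, 0.8], [0.2, 0.6]])
    gt = np.array([[1.0, 1.0], [0.0, 0.0]])
    print(mae(pred, gt), f_measure(pred, gt))
```

Lower MAE and higher F-measure indicate better agreement with the ground truth, which matches the direction of the improvements reported for DCANet.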
Keywords
