Clothing retrieval combining deep multi-label parsing and hashing

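Yuan Weifeng1,2, Guo Jiaming1,2, Su Zhuo1,2, Luo Xiaonan3, Zhou Fan1,2 (1. School of Data and Computer Science, Sun Yat-sen University, Guangzhou 510006; 2. National Engineering Research Center of Digital Life, Sun Yat-sen University, Guangzhou 510006; 3. School of Computer Science and Information Security, Guilin University of Electronic Technology, Guilin 541004)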

Abstract
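Objective Clothing retrieval plays an important role in promoting and selling clothing online. However, current clothing retrieval algorithms cannot accurately retrieve clothing that has no text description, and their accuracy on cross-scene, multi-label clothing images in particular still needs to be improved. To address the large variation among cross-scene multi-label clothing images and the excessively high dimensionality of convolutional neural network output features, this paper proposes a clothing retrieval algorithm based on deep multi-label parsing and hashing. Method The method first adds conditional random fields on top of an FCN (fully convolutional network) to post-process the FCN output, building an end-to-end network in which the FCN performs coarse segmentation and CRFs (conditional random fields) perform fine segmentation, which achieves pixel-level semantic recognition. Second, to suit cross-scene clothing retrieval, we adjusted the CCP (Clothing Co-Parsing) dataset and built a Consumer-to-Shop dataset. To counter the semantic drift that readily occurs during retrieval, we use a multi-task learning network to train a clothing classification model and a clothing similarity model. Result We first ran comparative clothing parsing experiments on the Consumer-to-Shop dataset; the results show that adding CRFs as post-processing clearly improves parsing quality. We then compared our method with three mainstream retrieval algorithms; the results show that it achieves good retrieval performance even when using hash features. Its top-5 accuracy is 1.31% higher than that of WTBI (where to buy it) and 0.21% higher than that of DARN (dual attribute-aware ranking network). Conclusion To address the poor cross-scene performance and low efficiency of clothing retrieval, this paper proposes a fast multi-target clothing retrieval method based on pixel-level semantic segmentation and hash encoding. Compared with other retrieval methods, ours has certain advantages in multi-target, multi-label clothing retrieval scenarios, and it effectively reduces storage space and improves retrieval efficiency while maintaining reasonable retrieval accuracy.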
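The abstract describes refining the FCN's coarse segmentation with CRFs but does not specify the CRF model. As a hedged illustration only, the sketch below runs a few mean-field iterations of a 4-connected Potts CRF over per-pixel label scores. This is a deliberate simplification of the fully connected CRFs commonly paired with FCNs, and the function names and the `weight` parameter are hypothetical.

```python
import numpy as np

def softmax(x, axis=0):
    # Numerically stable softmax over the label axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def mean_field_potts(unary_logits, weight=1.0, n_iters=5):
    """Refine per-pixel label scores with a 4-connected Potts CRF (mean field).

    unary_logits: (L, H, W) array, e.g. FCN scores for L labels.
    weight: hypothetical pairwise strength; larger values smooth more.
    Returns an (H, W) label map.
    """
    q = softmax(unary_logits, axis=0)
    for _ in range(n_iters):
        # Sum the label marginals of the up/down/left/right neighbours
        # (zero padding: missing neighbours contribute nothing).
        padded = np.pad(q, ((0, 0), (1, 1), (1, 1)))
        msg = (padded[:, :-2, 1:-1] + padded[:, 2:, 1:-1]
               + padded[:, 1:-1, :-2] + padded[:, 1:-1, 2:])
        # Potts model: penalising disagreement equals rewarding agreement
        # up to a per-pixel constant, which the softmax cancels out.
        q = softmax(unary_logits + weight * msg, axis=0)
    return q.argmax(axis=0)

# Usage: label 0 is favoured everywhere except one noisy centre pixel.
logits = np.zeros((2, 5, 5))
logits[0] = 2.0
logits[1, 2, 2] = 2.5
print(mean_field_potts(logits)[2, 2])              # 0: the spike is smoothed away
print(mean_field_potts(logits, weight=0.0)[2, 2])  # 1: no pairwise term, raw argmax
```

Setting `weight=0` reduces the sketch to a plain per-pixel argmax of the FCN scores, which makes the effect of the pairwise smoothing easy to compare.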
Clothing retrieval by deep multi-label parsing and hashing

Yuan Weifeng1,2, Guo Jiaming1,2, Su Zhuo1,2, Luo Xiaonan3, Zhou Fan1,2(1.School of Data and Computer Science, Sun Yat-sen University, Guangzhou 510006, China;2.National Engineering Research Center of Digital Life, Sun Yat-sen University, Guangzhou 510006, China;3.School of Computer Science and Information Security, Guilin University of Electronic Technology, Guilin 541004, China)

Abstract
Objective Clothing retrieval combines clothing detection, clothing classification, and feature learning, and it plays an important role in clothing promotion and sales. Current clothing retrieval algorithms are mainly based on deep neural networks: they first learn high-dimensional features of a clothing image and then compare these features across images to measure clothing similarity. Such algorithms usually suffer from a semantic gap. They cannot relate the learned features to semantic attributes such as color, texture, and style, which limits their interpretability. As a result, they adapt poorly to new domains and often fail to retrieve clothing in new styles. Their accuracy needs to be improved, especially for cross-domain, multi-label clothing images. This study proposes a clothing retrieval pipeline based on deep multi-label parsing and hashing that increases cross-domain retrieval accuracy and reduces the dimensionality of the deep network's output features. Method To capture the semantics of street-shot photos, we adopt and improve a fully convolutional network (FCN) that parses clothing at the pixel level. To suppress fragmented labels and noise, we append conditional random fields (CRFs) to the FCN as a post-processing step. In addition, we propose an image retrieval algorithm based on multi-task learning and hashing to address the semantic gap and the curse of dimensionality in clothing retrieval. A hashing algorithm maps the extracted high-dimensional feature vectors into a low-dimensional Hamming space while preserving their similarities, which mitigates the curse of dimensionality and enables real-time retrieval. 
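The paper learns its hash codes with a multi-task network whose details the abstract does not give. To show only the general idea of mapping high-dimensional features to compact binary codes and ranking by Hamming distance, the sketch below swaps in random-hyperplane LSH, an untrained baseline that approximately preserves cosine similarity. The 4096-D feature size, 64-bit code length, synthetic data, and helper names are all assumptions.

```python
import numpy as np

def lsh_hash(features, n_bits=64, seed=0):
    """Random-hyperplane LSH: one bit per random projection (sign test).

    features: (N, D) real-valued descriptors.
    Returns an (N, n_bits) uint8 array of 0/1 bits.
    The fixed seed makes queries and database share the same hyperplanes.
    """
    rng = np.random.default_rng(seed)
    planes = rng.standard_normal((features.shape[1], n_bits))
    return (features @ planes > 0).astype(np.uint8)

def hamming_rank(query_code, db_codes):
    # Hamming distance = number of differing bits; smaller means more similar.
    dists = np.count_nonzero(db_codes != query_code, axis=1)
    return np.argsort(dists, kind="stable"), dists

# Usage with synthetic data standing in for CNN features (hypothetical 4096-D).
rng = np.random.default_rng(1)
db = rng.standard_normal((1000, 4096)).astype(np.float32)
query = db[42] + 0.05 * rng.standard_normal(4096)   # near-duplicate of item 42
codes = lsh_hash(db)
order, dists = hamming_rank(lsh_hash(query[None, :])[0], codes)
print(order[0])                      # item 42 is expected to rank first
packed = np.packbits(codes, axis=1)  # 64 bits -> 8 bytes per image
print(packed.shape)                  # (1000, 8)
# Float32 features would need 4096 * 4 = 16384 bytes per image instead of 8.
```

A learned hash, as the paper describes, would replace the random `planes` with trained projections, but the storage and Hamming-ranking mechanics stay the same.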
Moreover, we reorganize the Consumer-to-Shop dataset for cross-scene clothing retrieval. The dataset pairs shop photos with consumer photos so that clothes sharing the same ID are similar. We also propose a clothing classification model and integrate it with a conventional clothing similarity model to overcome semantic drift. In summary, the proposed clothing retrieval model has two parts. The first is a semantic segmentation network for street-shot photos, which identifies the specific clothing targets in an image. The second is a hashing model built on a multi-task network, which maps high-dimensional network features into a low-dimensional hash space. Result We modify the Clothing Co-Parsing dataset, build the Consumer-to-Shop dataset, and run clothing parsing experiments on the modified data. We find that the FCN tends to lose fine image details: after repeated upsampling, its segmentation results show blurred edges and blocky color artifacts. To correct these defects, our method applies CRFs as post-processing. With CRFs added, more regions receive correct labels, and fragmented color blocks are replaced by smooth segments that agree with human visual intuition. We then compare our method with three mainstream retrieval algorithms; the results show that it achieves competitive accuracy while using hash features. Its top-5 accuracy is 1.31% higher than that of WTBI and 0.21% higher than that of DARN. Conclusion We propose a deep multi-label parsing and hashing retrieval network to increase the efficiency and accuracy of clothing retrieval. For clothing parsing, the modified FCN-CRFs model gives the best subjective visual quality among the compared methods and a superior running time. 
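The abstract reports top-5 and top-10 accuracy without defining them. A common definition in consumer-to-shop retrieval counts a query as correct if any of its k nearest gallery items shares the query's item ID. The sketch below assumes that definition and a precomputed query-by-gallery distance matrix (for example, Hamming distances between hash codes); the function name and toy data are hypothetical.

```python
import numpy as np

def top_k_accuracy(dist, query_ids, gallery_ids, k=5):
    """Fraction of queries whose k nearest gallery items contain a matching ID.

    dist: (Q, G) distance matrix, smaller = closer.
    query_ids: (Q,) item IDs of consumer photos.
    gallery_ids: (G,) item IDs of shop photos.
    """
    topk = np.argsort(dist, axis=1, kind="stable")[:, :k]   # k closest per query
    hits = (gallery_ids[topk] == query_ids[:, None]).any(axis=1)
    return hits.mean()

# Usage: 2 consumer queries against 3 shop images.
dist = np.array([[0, 3, 1],
                 [2, 0, 5]])
query_ids = np.array([8, 9])
gallery_ids = np.array([8, 7, 9])
print(top_k_accuracy(dist, query_ids, gallery_ids, k=1))  # 0.5
print(top_k_accuracy(dist, query_ids, gallery_ids, k=3))  # 1.0
```

Under this definition top-k accuracy can only grow with k, which is why top-10 figures are typically higher than top-5 figures for the same method.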
For clothing retrieval, we use approximate nearest neighbor search and a hashing algorithm that compresses high-dimensional features. At the same time, the clothing classification and clothing similarity models are trained jointly in a multi-task learning network to counter semantic drift during retrieval. Compared with other clothing retrieval methods, ours shows advantages in multi-label clothing retrieval scenarios: it achieves the highest top-10 accuracy, effectively reduces storage space, and improves retrieval efficiency.
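The abstract states that the classification and similarity models are trained jointly to counter semantic drift but gives no loss functions. Assuming a softmax cross-entropy term for the garment category and a triplet margin term for similarity (both hypothetical choices, as are `alpha` and `margin`), a joint objective could be sketched as follows. The classification term keeps the shared features category-aware, which is one plausible way to discourage retrieving visually similar items of the wrong category.

```python
import numpy as np

def cross_entropy(logits, labels):
    # Mean softmax cross-entropy; logits (N, C), labels (N,) integer classes.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

def triplet_loss(anchor, positive, negative, margin=0.2):
    # Hinge on squared Euclidean distances: pull same-ID pairs together and
    # push different-ID pairs at least `margin` further away.
    d_pos = ((anchor - positive) ** 2).sum(axis=1)
    d_neg = ((anchor - negative) ** 2).sum(axis=1)
    return np.maximum(0.0, d_pos - d_neg + margin).mean()

def multitask_loss(cls_logits, cls_labels, anchor, positive, negative, alpha=1.0):
    # Hypothetical weighting: classification term + alpha * similarity term.
    return (cross_entropy(cls_logits, cls_labels)
            + alpha * triplet_loss(anchor, positive, negative))

# Usage: 2 images, 3 categories, 8-D embeddings for anchor/positive/negative.
logits = np.array([[2.0, 0.5, 0.1],
                   [0.2, 1.5, 0.3]])
labels = np.array([0, 1])
emb = np.random.default_rng(0).standard_normal((3, 2, 8))
print(multitask_loss(logits, labels, emb[0], emb[1], emb[2], alpha=0.5))
```

In a real network both terms would be computed on outputs of a shared backbone and minimized together by gradient descent; this NumPy version only evaluates the objective.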
