面向视觉搜索的空间局部敏感哈希方法
摘 要
目的 视觉检索需要准确、高效地从大型图像或者视频数据集中检索出最相关的视觉内容,但是由于数据集中图像数据量大、特征维度高的特点,现有方法很难同时保证快速的检索速度和较好的检索效果。方法 对于面向图像视频数据的高维数据视觉检索任务,提出加权语义局部敏感哈希算法(weighted semantic locality-sensitive hashing, WSLSH)。该算法利用两层视觉词典对参考特征空间进行二次空间划分,在每个子空间里使用加权语义局部敏感哈希对特征进行精确索引。其次,设计动态变长哈希码,在保证检索性能的基础上减少哈希表数量。此外,针对局部敏感哈希(locality sensitive hashing, LSH)的随机不稳定性,在LSH函数中加入反映参考特征空间语义的统计性数据,设计了一个简单投影语义哈希函数以确保算法检索性能的稳定性。结果 在Holidays、Oxford5k和DataSetB数据集上的实验表明,WSLSH在DataSetB上取得最短平均检索时间0.034 25 s;在编码长度为64位的情况下,WSLSH算法在3个数据集上的平均精确度均值(mean average precision,mAP)分别提高了1.2%32.6%、1.7%19.1%和2.6%28.6%,与几种较新的无监督哈希方法相比有一定的优势。结论 通过进行二次空间划分、对参考特征的哈希索引次数进行加权、动态使用变长哈希码以及提出简单投影语义哈希函数来对LSH算法进行改进。由此提出的加权语义局部敏感哈希(WSLSH)算法相比现有工作有更快的检索速度,同时,在长编码的情况下,取得了更为优异的性能。
关键词
Locality-sensitive hashing approach based on semantic space for visual retrieval
Huang Xiaoyan1, Sun Bin1, Yang Zhanyuan1, Zhu Yingying1, Tian Qi2(1.College of Computer Science and Software Engineering, Shenzhen University, Shenzhen 518000, China;2.Huawei Technologies Co., Ltd., Shenzhen 518000, China) Abstract
Objective Visual retrieval methods need to accurately and efficiently retrieve the most relevant visual content from large-scale images or video datasets. However,due to a large amount of image data and high feature dimensionality in the dataset, existing methods face difficulty in ensuring fast retrieval speed and good retrieval results. Hashing is a widely studied solution for approximate nearest neighbor search, which aims to convert high-dimensional data items into a low-dimensional representation or a hash code consisting of a set of bit sequences. Locality-sensitive hashing (LSH) is a data-independent, unsupervised hashing algorithm that provides asymptotic theoretical properties, thereby ensuring performance. LSH is considered as one of the most common methods for fast nearest-neighbor search in high-dimensional space. Nevertheless, if the number of hash functions k is set too small, it leads to too many data items falling into each hash bucket, thus increasing the query response time. By contrast, if k is set too large, the number of data items in each hash bucket is reduced. Moreover, to achieve the desired search accuracy, LSH usually needs to use long hash codes, thereby reducing the recall rate. Although the use of multiple hash tables can alleviate this problem, it significantly increases memory cost and query time. Besides, due to the semantic gap between the visual semantic space and metric space, LSH may not obtain good search performance. Method For visual retrieval of high-dimensional data,we first propose a hash algorithm called weighted semantic locality-sensitive hashing (WSLSH), which is based on feature space partitioning, to address the aforementioned drawbacks of LSH. While building the indices, WSLSH considers the distance relationship between reference and query features, divides the reference feature space into two subspaces by a two-layer visual dictionary, and employs weighted-semantic locality sensitive hashing in each subspace to index, thereby forming a hierarchical index structure. The proposed algorithm can rapidly converge the target to a small range in the process of large-scale retrieval and make accurate queries, which greatly improves the retrieval speed. Then, dynamic variable-length hashing codes are applied in a hashing table to retrieve multiple hashing buckets, which can reduce the number of hashing tables and improve the retrieval speed based on guaranteeing the retrieval performance. Through these two improvements, the retrieval speed can be greatly improved. In addition, to solve the random instability of LSH, statistical information reflecting the semantics of reference feature space is introduced into the LSH function, and a simple projection semantic-hashing function is designed to ensure the stability of the retrieval performance. Result Experiment results on Holidays, Oxford5k, and DataSetB datasets show that the retrieval accuracy and retrieval speed are effectively improved in comparison with the representative unsupervised hash methods. WSLSH achieves the shortest average retrieval time (0.034 25 s) on DataSetB. When the encoding length is 64 bit, the mean average precision (mAP) of the WSLSH algorithm is improved by 1.2%32.6%,1.7%19.1%, and 2.6%28.6%. WSLSH is not highly sensitive to the size change of the reference feature subset involved, so the retrieval time does not change significantly, which reflects the retrieval advantage of WSLSH for large-scale datasets. With the increase of encoding length, the performance of the WSLSH algorithm is improved gradually. When the encoding length is 64 bit, the WSLSH algorithm obtains the highest precision and recall on the three datasets, which is superior to other compared methods. Conclusion The LSH algorithm is improved by performing feature space division twice, weighting the number of hash indexes of reference features, dynamically using variable-length hash codes, and introducing a simple-projection semantic-hash function. Thus, the proposed WSLSH algorithm has faster retrieval speed. In the case of long encoding length, WSLSH achieves better performance than the compared works and shows high application value for large-scale image datasets.
Keywords
feature space partitioning locality-sensitive hashing(LSH) dynamic variable-length hashing code visual retrieval nearest neighbor search
|