Results 1 - 10 of 29
Comparing apples to oranges: a scalable solution with heterogeneous hashing
- In Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2013
"... Although hashing techniques have been popular for the large scale similarity search problem, most of the existing methods for designing optimal hash functions focus on homogeneous similarity assessment, i.e., the data entities to be indexed are of the same type. Realizing that heterogeneous entities ..."
Abstract
-
Cited by 10 (6 self)
- Add to MetaCart
(Show Context)
Although hashing techniques have been popular for the large scale similarity search problem, most of the existing methods for designing optimal hash functions focus on homogeneous similarity assessment, i.e., the data entities to be indexed are of the same type. Realizing that heterogeneous entities and relationships are also ubiquitous in real world applications, there is an emerging need to retrieve and search similar or relevant data entities from multiple heterogeneous domains, e.g., recommending relevant posts and images to a certain Facebook user. In this paper, we address the problem of "comparing apples to oranges" under the large scale setting. Specifically, we propose a novel Relation-aware Heterogeneous Hashing (RaHH), which provides a general framework for generating hash codes of data entities sitting in multiple heterogeneous domains. Unlike some existing hashing methods that map heterogeneous data into a common Hamming space, the RaHH approach constructs a Hamming space for each type of data entity and learns optimal mappings between them simultaneously. This lets the learned hash codes flexibly cope with the characteristics of different data domains. Moreover, the RaHH framework encodes both homogeneous and heterogeneous relationships between the data entities to design hash functions with improved accuracy. To validate the proposed RaHH method, we conduct extensive evaluations on two large datasets: one is crawled from a popular social media site, Tencent Weibo, and the other is an open dataset of Flickr (NUS-WIDE). The experimental results clearly demonstrate that RaHH outperforms several state-of-the-art hashing methods with significant performance gains.
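The abstract's core idea, one Hamming space per domain plus learned cross-domain mappings, can be illustrated with a minimal sketch. Everything here is hypothetical scaffolding: the projection `W_img` and mapping `M_img2txt` are placeholder parameters, and RaHH learns the codes and mappings jointly from homogeneous and heterogeneous relations, which is not reproduced here.

```python
import numpy as np

def hash_in_domain(x, W):
    """Per-domain hash: each domain keeps its own projection and its
    own Hamming space (code lengths may differ across domains)."""
    return np.where(W @ x >= 0, 1, -1)

def cross_domain_query(x_img, W_img, M_img2txt):
    """Hash an image query in its own space, then translate the code
    into the text domain's Hamming space through a learned mapping.
    W_img and M_img2txt are hypothetical placeholders, not RaHH's
    learned parameters."""
    h_img = hash_in_domain(x_img, W_img)
    return np.where(M_img2txt @ h_img >= 0, 1, -1)
```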
Optimized product quantization
- TPAMI
"... Abstract-Product quantization (PQ) is an effective vector quantization method. A product quantizer can generate an exponentially large codebook at very low memory/time cost. The essence of PQ is to decompose the high-dimensional vector space into the Cartesian product of subspaces and then quantize ..."
Abstract
-
Cited by 7 (2 self)
- Add to MetaCart
(Show Context)
Product quantization (PQ) is an effective vector quantization method. A product quantizer can generate an exponentially large codebook at very low memory/time cost. The essence of PQ is to decompose the high-dimensional vector space into the Cartesian product of subspaces and then quantize these subspaces separately. The optimal space decomposition is important for PQ performance but remains an unaddressed issue. In this paper, we optimize PQ by minimizing quantization distortions w.r.t. the space decomposition and the quantization codebooks. We present two novel solutions to this challenging optimization problem. The first solution iteratively solves two simpler sub-problems. The second solution is based on a Gaussian assumption and provides a theoretical analysis of the optimality. We evaluate our optimized product quantizers in three applications: (i) compact encoding for exhaustive ranking [1], (ii) building inverted multi-indexing for non-exhaustive search …
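The decompose-then-quantize idea behind plain PQ is easy to sketch. The following is a minimal illustration of standard PQ, not of the paper's optimized space decomposition; it assumes scikit-learn is available, D is divisible by M, and the training set has at least K vectors.

```python
import numpy as np
from sklearn.cluster import KMeans

def train_pq(X, M=4, K=256):
    """Plain PQ training: split the D dimensions into M subspaces and
    run k-means with K centroids in each subspace independently."""
    d = X.shape[1] // M
    return [KMeans(n_clusters=K, n_init=4).fit(X[:, m*d:(m+1)*d]).cluster_centers_
            for m in range(M)]

def encode_pq(X, codebooks):
    """Encode each vector as M centroid indices (one byte each for
    K <= 256), i.e. a code from an implicit codebook of size K**M."""
    d = X.shape[1] // len(codebooks)
    codes = np.empty((X.shape[0], len(codebooks)), dtype=np.uint8)
    for m, C in enumerate(codebooks):
        sub = X[:, m*d:(m+1)*d]
        codes[:, m] = ((sub[:, None, :] - C[None, :, :]) ** 2).sum(-1).argmin(1)
    return codes
```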
Discrete Graph Hashing
"... Hashing has emerged as a popular technique for fast nearest neighbor search in gi-gantic databases. In particular, learning based hashing has received considerable attention due to its appealing storage and search efficiency. However, the perfor-mance of most unsupervised learning based hashing meth ..."
Abstract
-
Cited by 5 (2 self)
- Add to MetaCart
(Show Context)
Hashing has emerged as a popular technique for fast nearest neighbor search in gigantic databases. In particular, learning based hashing has received considerable attention due to its appealing storage and search efficiency. However, the performance of most unsupervised learning based hashing methods deteriorates rapidly as the hash code length increases. We argue that the degraded performance is due to inferior optimization procedures used to achieve discrete binary codes. This paper presents a graph-based unsupervised hashing model to preserve the neighborhood structure of massive data in a discrete code space. We cast the graph hashing problem into a discrete optimization framework which directly learns the binary codes. A tractable alternating maximization algorithm is then proposed to explicitly deal with the discrete constraints, yielding high-quality codes that capture the local neighborhoods well. Extensive experiments performed on four large datasets with up to one million samples show that our discrete optimization based graph hashing method obtains superior search accuracy over state-of-the-art unsupervised hashing methods, especially for longer codes.
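For contrast with the discrete optimization the paper advocates, here is a minimal sketch of the relax-then-round procedure the abstract argues against: solve a relaxed spectral problem on the neighborhood graph, then binarize by sign. The final rounding step is exactly where the authors locate the accuracy loss at longer code lengths.

```python
import numpy as np

def relax_then_round(W, r):
    """Relax-then-round baseline: take the r smoothest non-trivial
    eigenvectors of the graph Laplacian (the relaxed solution), then
    binarize by sign -- the rounding step blamed for degraded codes.
    W: (n, n) symmetric affinity matrix; r: code length."""
    L = np.diag(W.sum(axis=1)) - W          # unnormalized graph Laplacian
    _, vecs = np.linalg.eigh(L)             # eigenvalues in ascending order
    Y = vecs[:, 1:r + 1]                    # skip the trivial constant eigenvector
    return np.where(Y >= 0, 1, -1)
```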
Composite Quantization for Approximate Nearest Neighbor Search
"... This paper presents a novel compact coding ap-proach, composite quantization, for approximate nearest neighbor search. The idea is to use the composition of several elements selected from the dictionaries to accurately approximate a vec-tor and to represent the vector by a short code composed of the ..."
Abstract
-
Cited by 5 (4 self)
- Add to MetaCart
(Show Context)
This paper presents a novel compact coding approach, composite quantization, for approximate nearest neighbor search. The idea is to use the composition of several elements selected from the dictionaries to accurately approximate a vector and to represent the vector by a short code composed of the indices of the selected elements. To efficiently compute the approximate distance of a query to a database vector using the short code, we introduce an extra constraint, a constant inter-dictionary-element-product, so that the distance can be approximated using only the distances from the query to each selected element. Experimental comparison with state-of-the-art algorithms over several benchmark datasets demonstrates the efficacy of the proposed approach.
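The role of the constant inter-dictionary-element-product constraint can be made concrete: expanding ||q − Σ_m c_m||² shows that once the cross terms Σ_{m≠m'} c_m·c_{m'} are constant, ranking items by Σ_m ||q − c_m||² (a sum of per-dictionary lookup-table entries) preserves the true distance ordering up to query-independent constants. Below is a hedged sketch of that query-time path, assuming the dictionaries and codes have already been learned.

```python
import numpy as np

def cq_distance_tables(q, dictionaries):
    """Squared distances from query q to every element of each
    dictionary. dictionaries: list of M arrays of shape (K, D)."""
    return [((q[None, :] - C) ** 2).sum(axis=1) for C in dictionaries]

def cq_rank(codes, tables):
    """Rank database items by the sum of per-dictionary distances.
    Under the constant inter-dictionary-element-product constraint,
    this ordering matches the true distances up to query-constant
    terms. codes: (n, M) integer indices into the M dictionaries."""
    n, M = codes.shape
    scores = np.zeros(n)
    for m in range(M):
        scores += tables[m][codes[:, m]]
    return np.argsort(scores)
```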
Supervised discrete hashing
- In Proc. CVPR, 2015
"... Recently, learning based hashing techniques have at-tracted broad research interests because they can support efficient storage and retrieval for high-dimensional data such as images, videos, documents, etc. However, a ma-jor difficulty of learning to hash lies in handling the dis-crete constraints ..."
Abstract
-
Cited by 4 (1 self)
- Add to MetaCart
(Show Context)
Recently, learning based hashing techniques have attracted broad research interest because they can support efficient storage and retrieval for high-dimensional data such as images, videos, and documents. However, a major difficulty of learning to hash lies in handling the discrete constraints imposed on the pursued hash codes, which typically makes hash optimizations very challenging (NP-hard in general). In this work, we propose a new supervised hashing framework, where the learning objective is to generate the optimal binary hash codes for linear classification. By introducing an auxiliary variable, we reformulate the objective such that it can be solved substantially more efficiently by employing a regularization algorithm. One of the key steps in this algorithm is to solve a regularization sub-problem associated with the NP-hard binary optimization. We show that the sub-problem admits an analytical solution via cyclic coordinate descent. As such, a high-quality discrete solution can eventually be obtained in an efficient computing manner, enabling the method to tackle massive datasets. We evaluate the proposed approach, dubbed Supervised Discrete Hashing (SDH), on four large image datasets and demonstrate its superiority to state-of-the-art hashing methods in large-scale image retrieval.
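The cyclic coordinate descent step mentioned above can be sketched on a generic quadratic fitting subproblem of the form min_B ||Y − BW||²_F with B ∈ {−1, +1}^(n×L): holding all other bit-columns fixed, the optimal l-th column is the sign of the residual projected onto the l-th row of W. This is a simplified stand-in illustrating the bit-wise closed form, not SDH's full algorithm.

```python
import numpy as np

def dcc_binary_codes(Y, W, B, n_sweeps=5):
    """Cyclic coordinate descent for min_B ||Y - B W||_F^2 over
    B in {-1, +1}^(n x L). With all other bit-columns fixed, the
    optimal l-th column has the closed form sign(residual @ W[l]).
    Y: (n, c) targets; W: (L, c); B: (n, L) initial codes."""
    L = B.shape[1]
    for _ in range(n_sweeps):
        for l in range(L):
            R = Y - B @ W + np.outer(B[:, l], W[l])   # residual w/o l-th column
            b = np.sign(R @ W[l])
            b[b == 0] = 1                             # break ties deterministically
            B[:, l] = b
    return B
```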
Harmonious Hashing
- In Proceedings of the Twenty-Third International Joint Conference on Artificial Intelligence
"... Hashing-based fast nearest neighbor search technique has attracted great attention in both research and industry areas recently. Many existing hashing approaches encode data with projection-based hash functions and represent each projected dimension by 1-bit. However, the dimensions with high varian ..."
Abstract
-
Cited by 4 (0 self)
- Add to MetaCart
Hashing-based fast nearest neighbor search techniques have recently attracted great attention in both research and industry. Many existing hashing approaches encode data with projection-based hash functions and represent each projected dimension by one bit. However, dimensions with high variance hold more of the data's energy or information, yet they are treated the same as dimensions with low variance, which leads to serious information loss. In this paper, we introduce a novel hashing algorithm called Harmonious Hashing which aims at learning hash functions with low information loss. Specifically, we learn a set of optimized projections that preserve the maximum cumulative energy while meeting the constraint of equal variance on each dimension as closely as possible. In this way, we minimize the information loss after binarization. Despite its extreme simplicity, our method outperforms many state-of-the-art hashing methods in large-scale and high-dimensional nearest neighbor search experiments.
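One simple way to approximate the "maximum cumulative energy, equal variance per dimension" objective is to keep the top principal directions and then rotate them so variance is spread across dimensions before binarization. The sketch below uses a random rotation for the balancing step; it is a hedged approximation in the same spirit, not the paper's learned projection.

```python
import numpy as np

def balanced_projection_codes(X, r, seed=0):
    """Keep the top-r principal directions (maximum cumulative energy),
    then apply a random rotation so variance is spread more evenly
    across the r dimensions before taking signs."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    Z = Xc @ Vt[:r].T                                 # top-r PCA projection
    rng = np.random.default_rng(seed)
    Q, _ = np.linalg.qr(rng.standard_normal((r, r)))  # random rotation
    return (Z @ Q >= 0).astype(np.uint8)
```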
Weighted hashing for fast large scale similarity search
- In Proceedings of the 22nd ACM Conference on Information and Knowledge Management
"... ABSTRACT Similarity search, or finding approximate nearest neighbors, is an important technique for many applications. Many recent research demonstrate that hashing methods can achieve promising results for large scale similarity search due to its computational and memory efficiency. However, most ..."
Abstract
-
Cited by 3 (0 self)
- Add to MetaCart
(Show Context)
Similarity search, or finding approximate nearest neighbors, is an important technique for many applications. Much recent research demonstrates that hashing methods can achieve promising results for large scale similarity search due to their computational and memory efficiency. However, most existing hashing methods treat all hashing bits equally and compute the distance between data examples as the Hamming distance between their hashing codes, even though different hashing bits may carry different amounts of information. This paper proposes a novel method, named Weighted Hashing (WeiHash), to assign different weights to different hashing bits. The hashing codes and their corresponding weights are jointly learned in a unified framework by simultaneously preserving the similarity between data examples and balancing the variance of each hashing bit. An iterative coordinate descent optimization algorithm is designed to derive the desired hashing codes and weights. Extensive experiments on two large scale datasets demonstrate the superior performance of the proposed approach over several state-of-the-art hashing methods.
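The query-time side of bit weighting is straightforward: a mismatched bit contributes its learned weight rather than a constant 1. A minimal sketch, assuming the codes and weights have already been learned:

```python
import numpy as np

def weighted_hamming(codes_db, code_q, weights):
    """Weighted Hamming distance: each differing bit contributes its
    learned weight instead of a constant 1.
    codes_db: (n, r) codes in {0, 1}; code_q: (r,); weights: (r,)."""
    mismatch = (codes_db != code_q).astype(weights.dtype)  # (n, r) mask
    return mismatch @ weights                              # (n,) distances
```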
Hashing for Similarity Search: A Survey
2014
"... Similarity search (nearest neighbor search) is a problem of pursuing the data items whose distances to a query item are the smallest from a large database. Various methods have been developed to address this problem, and recently a lot of efforts have been devoted to approximate search. In this pap ..."
Abstract
-
Cited by 2 (2 self)
- Add to MetaCart
Similarity search (nearest neighbor search) is the problem of finding the data items in a large database whose distances to a query item are smallest. Various methods have been developed to address this problem, and recently a lot of effort has been devoted to approximate search. In this paper, we present a survey of one of the main solutions, hashing, which has been widely studied since the pioneering work on locality sensitive hashing. We divide hashing algorithms into two main categories: locality sensitive hashing, which designs hash functions without exploring the data distribution, and learning to hash, which learns hash functions according to the data distribution. We review them from various aspects, including hash function design, distance measures, and search schemes in the hash coding space.
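As a concrete example of the first category, the classic data-independent random-hyperplane scheme for cosine similarity hashes each vector by the signs of Gaussian random projections; the expected Hamming distance between two codes is proportional to the angle between the vectors. A minimal sketch:

```python
import numpy as np

def lsh_codes(X, n_bits=32, seed=0):
    """Random-hyperplane LSH for cosine similarity: each bit is the
    sign of a Gaussian random projection, so the Hamming distance
    between two codes concentrates around the angle between vectors."""
    rng = np.random.default_rng(seed)
    R = rng.standard_normal((X.shape[1], n_bits))  # random hyperplane normals
    return (X @ R >= 0).astype(np.uint8)
```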
Large-scale supervised multimodal hashing with semantic correlation maximization
- In Proceedings of the AAAI Conference on Artificial Intelligence, 2014
"... Due to its low storage cost and fast query speed, hashing has been widely adopted for similarity search in mul-timedia data. In particular, more and more attentions have been payed to multimodal hashing for search in multimedia data with multiple modalities, such as im-ages with tags. Typically, sup ..."
Abstract
-
Cited by 2 (1 self)
- Add to MetaCart
Due to its low storage cost and fast query speed, hashing has been widely adopted for similarity search in multimedia data. In particular, more and more attention has been paid to multimodal hashing for search in multimedia data with multiple modalities, such as images with tags. Typically, supervised information in the form of semantic labels is also available for the data points in many real applications. Hence, many supervised multimodal hashing (SMH) methods have been proposed to utilize such semantic labels to further improve the search accuracy. However, the training time complexity of most existing SMH methods is too high, which makes them unscalable to large-scale datasets. In this paper, a novel SMH method, called semantic correlation maximization (SCM), is proposed to seamlessly integrate semantic labels into the hashing learning procedure for large-scale data modeling. Experimental results on two real-world datasets show that SCM can significantly outperform state-of-the-art SMH methods in terms of both accuracy and scalability.
Adaptive Object Retrieval with Kernel Reconstructive Hashing
"... Hashing is very useful for fast approximate similarity search on large database. In the unsupervised settings, most hashing methods aim at preserving the similarity defined by Euclidean distance. Hash codes generated by these ap-proaches only keep their Hamming distance corresponding to the pairwise ..."
Abstract
-
Cited by 2 (0 self)
- Add to MetaCart
(Show Context)
Hashing is very useful for fast approximate similarity search on large databases. In the unsupervised setting, most hashing methods aim at preserving the similarity defined by Euclidean distance. Hash codes generated by these approaches only keep their Hamming distances consistent with the pairwise Euclidean distances, ignoring the local distribution of each data point. This objective does not hold for k-nearest neighbor search. In this paper, we first propose a new adaptive similarity measure which is consistent with k-NN search, and prove that it leads to a valid kernel. We then propose a hashing scheme which uses binary codes to preserve the kernel function. Using low-rank approximation, our hashing framework is more effective than existing methods that preserve similarity over an arbitrary kernel. The proposed kernel function, the hashing framework, and their combination have demonstrated significant advantages compared with several state-of-the-art methods.
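One standard way to preserve a kernel with compact codes, in the low-rank spirit the abstract describes, is to build Nyström features from a set of anchor points and binarize them; the code inner products then roughly track the low-rank kernel approximation. This is a hedged, generic sketch rather than the paper's exact scheme.

```python
import numpy as np

def nystrom_binary_codes(K_nm, K_mm, r, eps=1e-10):
    """Nystrom-style low-rank kernel features, binarized by sign so
    that code inner products roughly track the low-rank kernel
    approximation K_nm K_mm^{-1} K_mn.
    K_nm: (n, m) kernel between data and anchors; K_mm: (m, m)."""
    vals, vecs = np.linalg.eigh(K_mm)
    top = np.argsort(vals)[::-1][:r]                    # top-r eigenpairs
    Phi = K_nm @ vecs[:, top] / np.sqrt(np.maximum(vals[top], eps))
    return np.where(Phi >= 0, 1, -1)
```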