Results 1–9 of 9
Beyond Locality–Sensitive Hashing
Abstract

Cited by 14 (2 self)
We present a new data structure for the c-approximate near neighbor problem (ANN) in the Euclidean space. For n points in R^d, our algorithm achieves O_c(d n^ρ) query time and O_c(n^{1+ρ} + nd) space, where ρ ≤ 7/(8c^2) + O(1/c^3) + o_c(1). This is the first improvement over the result by Andoni and Indyk (FOCS 2006) and the first data structure that bypasses a locality-sensitive hashing lower bound proved by O'Donnell, Wu and Zhou (ITCS 2011). By a standard reduction we obtain a data structure for the Hamming space and ℓ1 norm with ρ ≤ 7/(8c) + O(1/c^{3/2}) + o_c(1), which is the first improvement over the result of Indyk and Motwani (STOC 1998).
Approximate k-Flat Nearest Neighbor Search
Proc. 47th Annu. ACM Sympos. Theory Comput. (STOC), 783–792, 2015
Abstract

Cited by 2 (0 self)
Let k be a nonnegative integer. In the approximate k-flat nearest neighbor (k-ANN) problem, we are given a set P ⊂ R^d of n points in d-dimensional space and a fixed approximation factor c > 1. Our goal is to preprocess P so that we can efficiently answer approximate k-flat nearest neighbor queries: given a k-flat F, find a point in P whose distance to F is within a factor c of the distance between F and the closest point in P. The case k = 0 corresponds to the well-studied approximate nearest neighbor problem, for which a plethora of results are known, both in low and high dimensions. The case k = 1 is called approximate line nearest neighbor. In this case, we are aware of only one provably efficient data structure, due to Andoni, Indyk, Krauthgamer, and Nguyễn (AIKN) [2]. For k ≥ 2, we know of no previous results. We present the first efficient data structure that can handle approximate nearest neighbor queries for arbitrary k. We use a data structure for 0-ANN queries as a black box, and the performance depends on the parameters of the 0-ANN solution: suppose we have a 0-ANN structure with query time O(n^ρ) and space requirement O(n^{1+σ}), for ρ, σ > 0. Then we can answer k-ANN queries in time O(n^{k/(k+1−ρ)+t}) and space O(n^{1+σk/(k+1−ρ)} + n log^{O(1/t)} n). Here, t > 0 is an arbitrary constant and the O-notation ...
Tight Lower Bounds for Data-Dependent Locality-Sensitive Hashing
2015
Abstract
We prove a tight lower bound for the exponent ρ for data-dependent Locality-Sensitive Hashing schemes, recently used to design efficient solutions for the c-approximate nearest neighbor search. In particular, our lower bound matches the bound of ρ ≤ 1/(2c−1) + o(1) for the ℓ1 space, obtained via the recent algorithm from [Andoni–Razenshteyn, ...
Document Retrieval in Big Data
Abstract
Abstract—Nearest Neighbor Search for similar document retrieval suffers from an efficiency problem when scaled to a large dataset. In this paper, we introduce an unsupervised approach based on Locality Sensitive Hashing to alleviate its search complexity problem. The advantage of our proposed approach is that it does not need to scan all the documents to retrieve the top-K Nearest Neighbors; instead, a number of hash table lookup operations are conducted to retrieve the top-K candidates. Experiments on two massive news and tweets datasets demonstrate that our approach achieves over an order of magnitude speedup compared with the traditional Information Retrieval method while maintaining reasonable precision.
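The bucket-lookup idea in this abstract can be sketched with a minimal random-hyperplane LSH index (a standard LSH family for cosine similarity; the function names and parameters below are illustrative, not the paper's implementation):

```python
import numpy as np
from collections import defaultdict

rng = np.random.default_rng(0)

def build_lsh_index(doc_vectors, n_tables=4, n_bits=8):
    """Hash every document vector into n_tables random-hyperplane tables."""
    dim = doc_vectors.shape[1]
    planes = [rng.standard_normal((n_bits, dim)) for _ in range(n_tables)]
    tables = [defaultdict(list) for _ in range(n_tables)]
    for i, v in enumerate(doc_vectors):
        for p, t in zip(planes, tables):
            t[tuple((p @ v > 0).astype(int))].append(i)  # sign pattern = bucket key
    return planes, tables

def candidates(query_vec, planes, tables):
    """Union of the query's buckets across tables: only these candidates
    are ranked exactly, so no full scan of the collection is needed."""
    cand = set()
    for p, t in zip(planes, tables):
        cand.update(t[tuple((p @ query_vec > 0).astype(int))])
    return cand
```

A top-K result would then be obtained by computing exact similarities only against `candidates(...)`, which is typically far smaller than the full collection.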
Feature Fusion for Efficient Content-Based Video Retrieval
Abstract
Abstract—Content-based video retrieval is a complex task because of the large amount of information in single items and because databases of videos can be very large. In this paper we explore a possible solution for efficient similar item retrieval. In our experiments we combine relevant feature sets together with a learned Mahalanobis metric while using an efficient nearest neighbor search algorithm. The efficient nearest neighbor algorithms we compare are Locality Sensitive Hashing and Vantage Point trees. The two options are compared to several baseline systems in the general video retrieval framework. We used three sets of features to test the system: SURF features, color histograms, and topics. The topics were extracted using a Latent Dirichlet Allocation topic model. We show that fusing the individual feature sets with a learned metric improves the performance upon the best individual feature set. The feature fusion can be combined with an efficient nearest neighbor search algorithm to reduce the number of exact distance computations with limited impact on retrieval performance. Index Terms—Content-based video retrieval, feature fusion, metric learning, efficient retrieval, nearest neighbor search, locality sensitive hashing, vantage point trees
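A learned Mahalanobis metric combines cleanly with off-the-shelf Euclidean NN search because the metric matrix factors as M = Lᵀ L. A minimal sketch of this standard identity (illustrative names, not the paper's code):

```python
import numpy as np

def mahalanobis_dist(x, y, L):
    """Distance under the learned metric M = L.T @ L."""
    d = L @ (x - y)
    return float(np.sqrt(d @ d))

def to_euclidean_space(X, L):
    """Map points so the learned metric becomes plain Euclidean distance;
    any Euclidean index (LSH, VP-tree) can then be reused unchanged."""
    return X @ L.T
```

After the mapping, the Euclidean distance between transformed points equals `mahalanobis_dist(x, y, L)`, so indexing the transformed vectors preserves the learned metric exactly.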
Smooth Tradeoffs between Insert and Query Complexity in Nearest Neighbor Search
Abstract
Locality Sensitive Hashing (LSH) has emerged as the method of choice for high-dimensional similarity search, a classical problem of interest in numerous applications. LSH-based solutions require that each data point be inserted into a number A of hash tables, after which a query can be answered by performing B lookups. The original LSH solution of ... In this paper, we present an algorithm for performing similarity search under the Euclidean metric that resolves the problem above. Our solution is inspired by Entropy LSH, but uses a very different analysis to achieve a smooth tradeoff between insert and query complexity. Our results improve upon or match, up to lower order terms in the exponent, the best known data-oblivious algorithms for main-memory LSH for the Euclidean metric.
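The A-versus-B tradeoff can be illustrated with query-time multi-probing (in the spirit of Entropy LSH / multi-probe LSH, not this paper's algorithm): probing more buckets per table at query time (larger B) lets each point be inserted into fewer tables (smaller A). All names below are illustrative.

```python
import numpy as np
from collections import defaultdict
from itertools import combinations

rng = np.random.default_rng(0)

def build(points, n_tables=2, n_bits=10):
    """Insert cost ~ A = n_tables writes per point."""
    planes = [rng.standard_normal((n_bits, points.shape[1])) for _ in range(n_tables)]
    tables = [defaultdict(list) for _ in range(n_tables)]
    for i, v in enumerate(points):
        for p, t in zip(planes, tables):
            t[tuple((p @ v > 0).astype(int))].append(i)
    return planes, tables

def multiprobe_query(q, planes, tables, flips=1):
    """Query cost ~ B = n_tables * (1 + C(n_bits, flips)) bucket lookups:
    the exact bucket plus every bucket at Hamming distance `flips`."""
    cand = set()
    for p, t in zip(planes, tables):
        key = (p @ q > 0).astype(int)
        probes = [tuple(key)]
        for idx in combinations(range(len(key)), flips):
            flipped = key.copy()
            flipped[list(idx)] ^= 1
            probes.append(tuple(flipped))
        for pk in probes:
            cand.update(t[pk])
    return cand
```

Raising `flips` while lowering `n_tables` trades insert work for query work, which is the kind of smooth A/B tradeoff the abstract describes.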
Impact of Spam Exposure on User Engagement
Abstract
In this paper we quantify the effect of unsolicited emails (spam) on the behavior and engagement of email users. Since performing randomized experiments in this setting is rife with practical and moral issues, we seek to determine causal relationships using observational data, something that is difficult in many cases. Using a novel modification of a user matching method combined with a time series regression on matched user pairs, we develop a framework for such causal inference that is particularly suited to the spam exposure use case. Using our matching technique, we objectively quantify the effect that continued exposure to spam has on user engagement in Yahoo! Mail. We find that spam exposure indeed leads to significantly lower user engagement, both statistically and economically. The impact is nonlinear; large changes impact users in a progressively more negative fashion. The impact is strongest on "voluntary" categories of engagement, such as composed emails, and lowest on "responsive" engagement metrics. Our estimation technique and results not only quantify the negative impact of abuse, but also allow decision makers to estimate potential engagement gains from proposed investments in abuse mitigation.
A Survey on Nearest Neighbor Search Methods
Abstract
Nowadays, the need for techniques, approaches, and algorithms to search data has increased due to improvements in computer science and the growing amount of information. This ever-increasing information volume has led to time and computation complexity. Recently, different methods to solve such problems have been proposed. Among them, nearest neighbor search is one of the best techniques to this end and has been the focus of many researchers. Different techniques are used for nearest neighbor search. In addition to resolving some of these complexities, the variety of these techniques has made them suitable for different applications such as pattern recognition, searching in multimedia data, information retrieval, databases, data mining, and computational geometry, to name but a few. In this paper, by opening a new view of this problem, we conduct a comprehensive evaluation of structures, techniques, and algorithms in this field and present a new categorization of techniques in NNS. This categorization consists of seven groups: Weighted, ...