Results 1  10
of
209
Distance metric learning for large margin nearest neighbor classification
 In NIPS
, 2006
"... We show how to learn a Mahanalobis distance metric for knearest neighbor (kNN) classification by semidefinite programming. The metric is trained with the goal that the knearest neighbors always belong to the same class while examples from different classes are separated by a large margin. On seven ..."
Abstract

Cited by 685 (15 self)
 Add to MetaCart
(Show Context)
We show how to learn a Mahanalobis distance metric for knearest neighbor (kNN) classification by semidefinite programming. The metric is trained with the goal that the knearest neighbors always belong to the same class while examples from different classes are separated by a large margin. On seven data sets of varying size and difficulty, we find that metrics trained in this way lead to significant improvements in kNN classification—for example, achieving a test error rate of 1.3 % on the MNIST handwritten digits. As in support vector machines (SVMs), the learning problem reduces to a convex optimization based on the hinge loss. Unlike learning in SVMs, however, our framework requires no modification or extension for problems in multiway (as opposed to binary) classification. 1
Fast construction of nets in lowdimensional metrics and their applications
 SIAM Journal on Computing
, 2006
"... We present a near linear time algorithm for constructing hierarchical nets in finite metric spaces with constant doubling dimension. This datastructure is then applied to obtain improved algorithms for the following problems: approximate nearest neighbor search, wellseparated pair decomposition, s ..."
Abstract

Cited by 130 (14 self)
 Add to MetaCart
(Show Context)
We present a near linear time algorithm for constructing hierarchical nets in finite metric spaces with constant doubling dimension. This datastructure is then applied to obtain improved algorithms for the following problems: approximate nearest neighbor search, wellseparated pair decomposition, spanner construction, compact representation scheme, doubling measure, and computation of the (approximate) Lipschitz constant of a function. In all cases, the running (preprocessing) time is near linear and the space being used is linear. 1
BMultiProbe LSH: Efficient indexing for highdimensional similarity search
 in Proc. 33rd Int. Conf. Very Large Data Bases
"... Similarity indices for highdimensional data are very desirable for building contentbased search systems for featurerich data such as audio, images, videos, and other sensor data. Recently, locality sensitive hashing (LSH) and its variations have been proposed as indexing techniques for approximate ..."
Abstract

Cited by 111 (3 self)
 Add to MetaCart
(Show Context)
Similarity indices for highdimensional data are very desirable for building contentbased search systems for featurerich data such as audio, images, videos, and other sensor data. Recently, locality sensitive hashing (LSH) and its variations have been proposed as indexing techniques for approximate similarity search. A significant drawback of these approaches is the requirement for a large number of hash tables in order to achieve good search quality. This paper proposes a new indexing scheme called multiprobe LSH that overcomes this drawback. Multiprobe LSH is built on the wellknown LSH technique, but it intelligently probes multiple buckets that are likely to contain query results in a hash table. Our method is inspired by and improves upon recent theoretical work on entropybased LSH designed to reduce the space requirement of the basic LSH method. We have implemented the multiprobe LSH method and evaluated the implementation with two different highdimensional datasets. Our evaluation shows that the multiprobe LSH method substantially improves upon previously proposed methods in both space and time efficiency. To achieve the same search quality, multiprobe LSH has a similar timeefficiency as the basic LSH method while reducing the number of hash tables by an order of magnitude. In comparison with the entropybased LSH method, to achieve the same search quality, multiprobe LSH uses less query time and 5 to 8 times fewer number of hash tables. 1.
Learning to hash with binary reconstructive embeddings
 in Proc. NIPS, 2009
"... Fast retrieval methods are increasingly critical for many largescale analysis tasks, and there have been several recent methods that attempt to learn hash functions for fast and accurate nearest neighbor searches. In this paper, we develop an algorithm for learning hash functions based on explicitl ..."
Abstract

Cited by 110 (1 self)
 Add to MetaCart
(Show Context)
Fast retrieval methods are increasingly critical for many largescale analysis tasks, and there have been several recent methods that attempt to learn hash functions for fast and accurate nearest neighbor searches. In this paper, we develop an algorithm for learning hash functions based on explicitly minimizing the reconstruction error between the original distances and the Hamming distances of the corresponding binary embeddings. We develop a scalable coordinatedescent algorithm for our proposed hashing objective that is able to efficiently learn hash functions in a variety of settings. Unlike existing methods such as semantic hashing and spectral hashing, our method is easily kernelized and does not require restrictive assumptions about the underlying distribution of the data. We present results over several domains to demonstrate that our method outperforms existing stateoftheart techniques. 1
Autonomous helicopter aerobatics through apprenticeship learning
 International Journal of Robotics Research
"... Autonomous helicopter flight is widely regarded to be a highly challenging control problem. Despite this fact, human experts can reliably fly helicopters through a wide range of maneuvers, including aerobatic maneuvers at the edge of the helicopter’s capabilities. We present apprenticeship learning ..."
Abstract

Cited by 62 (2 self)
 Add to MetaCart
(Show Context)
Autonomous helicopter flight is widely regarded to be a highly challenging control problem. Despite this fact, human experts can reliably fly helicopters through a wide range of maneuvers, including aerobatic maneuvers at the edge of the helicopter’s capabilities. We present apprenticeship learning algorithms, which leverage expert demonstrations to efficiently learn good controllers for tasks being demonstrated by an expert. These apprenticeship learning algorithms have enabled us to significantly extend the state of the art in autonomous helicopter aerobatics. Our experimental results include the first autonomous execution of a wide range of maneuvers, including but not limited to inplace flips, inplace rolls, loops and hurricanes, and even autorotation landings, chaos and tictocs, which only exceptional human pilots can perform. Our results also include complete airshows, which require autonomous transitions between many of these maneuvers. Our controllers perform as well as, and often even better than, our expert pilot.
Automatic analysis of malware behavior using machine learning
, 2009
"... Malicious software—so called malware—poses a major threat to the security of computer systems. The amount and diversity of its variants render classic security defenses ineffective, such that millions of hosts in the Internet are infected with malware in form of computer viruses, Internet worms and ..."
Abstract

Cited by 47 (4 self)
 Add to MetaCart
(Show Context)
Malicious software—so called malware—poses a major threat to the security of computer systems. The amount and diversity of its variants render classic security defenses ineffective, such that millions of hosts in the Internet are infected with malware in form of computer viruses, Internet worms and Trojan horses. While obfuscation and polymorphism employed by malware largely impede detection at file level, the dynamic analysis of malware binaries during runtime provides an instrument for characterizing and defending against the threat of malicious software. In this article, we propose a framework for automatic analysis of malware behavior using machine learning. The framework allows for automatically identifying novel classes of malware with similar behavior (clustering) and assigning unknown malware to these discovered classes (classification). Based on both, clustering and classification, we propose an incremental approach for behaviorbased analysis, capable to process the behavior of thousands of malware binaries on a daily basis. The incremental analysis significantly reduces the runtime overhead of current analysis methods, while providing an accurate discovery and discrimination of novel malware variants. 1
Searching dynamic point sets in spaces with bounded doubling dimension
 In The thirtyeighth annual ACM symposium on Theory of computing (STOC
, 2006
"... We present a new data structure that facilitates approximate nearest neighbor searches on a dynamic set of points in a metric space that has a bounded doubling dimension. Our data structure has linear size and supports insertions and deletions in O(log n) time, and finds a (1 + ɛ)approximate neares ..."
Abstract

Cited by 42 (15 self)
 Add to MetaCart
(Show Context)
We present a new data structure that facilitates approximate nearest neighbor searches on a dynamic set of points in a metric space that has a bounded doubling dimension. Our data structure has linear size and supports insertions and deletions in O(log n) time, and finds a (1 + ɛ)approximate nearest neighbor in time O(log n) +(1/ɛ) O(1). The search and update times hide multiplicative factors that depend on the doubling dimension; the space does not. These performance times are independent of the aspect ratio (or spread) of the points. Categories and Subject Descriptors: F.2.2 [Nonnumerical Algorithms and Problems]:Sorting and searching, computations on discrete structures; E.1 [Data Structures]:Graphs and networks, trees.
Unsupervised methods for determining object and relation synonyms on the web
 Journal of Artificial Intelligence Research
, 2009
"... The task of identifying synonymous relations and objects, or synonym resolution, is critical for highquality information extraction. This paper investigates synonym resolution in the context of unsupervised information extraction, where neither handtagged training examples nor domain knowledge is ..."
Abstract

Cited by 41 (3 self)
 Add to MetaCart
(Show Context)
The task of identifying synonymous relations and objects, or synonym resolution, is critical for highquality information extraction. This paper investigates synonym resolution in the context of unsupervised information extraction, where neither handtagged training examples nor domain knowledge is available. The paper presents a scalable, fullyimplemented system that runs in O(KN log N) time in the number of extractions, N, and the maximum number of synonyms per word, K. The system, called Resolver, introduces a probabilistic relational model for predicting whether two strings are coreferential based on the similarity of the assertions containing them. On a set of two million assertions extracted from the Web, Resolver resolves objects with 78 % precision and 68 % recall, and resolves relations with 90 % precision and 35 % recall. Several variations of Resolver’s probabilistic model are explored, and experiments demonstrate that under appropriate conditions these variations can improve F1 by 5%. An extension to the basic Resolver system allows it to handle polysemous names with 97 % precision and 95 % recall on a data set from the TREC corpus.
A posteriori multiprobe locality sensitive hashing
 ACM Multimedia
, 2008
"... Efficient highdimensional similarity search structures are essential for building scalable contentbased search systems on featurerich multimedia data. In the last decade, Locality Sensitive Hashing (LSH) has been proposed as indexing technique for approximate similarity search. Among the most rec ..."
Abstract

Cited by 39 (11 self)
 Add to MetaCart
(Show Context)
Efficient highdimensional similarity search structures are essential for building scalable contentbased search systems on featurerich multimedia data. In the last decade, Locality Sensitive Hashing (LSH) has been proposed as indexing technique for approximate similarity search. Among the most recent variations of LSH, multiprobe LSH techniques have been proved to overcome the overlinear space cost drawback of common LSH. Multiprobe LSH is built on the wellknown LSH technique, but it intelligently probes multiple buckets that are likely to contain query results in a hash table. Our method is inspired by previous work on probabilistic similarity search structures and improves upon recent theoretical work on multiprobe and query adaptive LSH. Whereas these methods are based on likelihood criteria that a given bucket contains query results, we define a more reliable a posteriori model taking account some prior about the queries and the searched objects. This prior knowledge allows a better quality control of the search and a more accurate selection of the most probable buckets. We implemented a nearest neighbors search based on this paradigm and performed experiments on different real visual features datasets. We show that our a posteriori scheme outperforms other multiprobe LSH while offering a better quality control. Comparisons to the basic LSH technique show that our method allows consistent improvements both in space and time efficiency.