Results 1 - 7 of 7
Maximum inner-product search using cone trees
In SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2012
Abstract

Cited by 9 (3 self)
The problem of efficiently finding the best match for a query in a given set with respect to the Euclidean distance or the cosine similarity has been extensively studied. However, the closely related problem of efficiently finding the best match with respect to the inner product has never been explored in the general setting, to the best of our knowledge. In this paper we consider this problem and contrast it with the previous problems considered. First, we propose a general branch-and-bound algorithm based on a (single) tree data structure. Subsequently, we present a dual-tree algorithm for the case where there are multiple queries. Our proposed branch-and-bound algorithms are based on novel inner-product bounds. Finally, we present a new data structure, the cone tree, for increasing the efficiency of the dual-tree algorithm. We evaluate our proposed algorithms on a variety of data sets from various applications, and exhibit up to five orders of magnitude improvement in query time over the naive search technique in some cases.
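The single-tree branch-and-bound idea in this abstract can be illustrated with a minimal sketch. It does not reproduce the paper's bounds or its cone tree. As an assumption, it uses ball-shaped tree nodes and the Cauchy-Schwarz bound q·p <= q·c + ||q||·r for any point p in a ball with center c and radius r. The names `BallNode`, `upper_bound`, and `mips` are hypothetical.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

class BallNode:
    """Hypothetical ball-tree node: a center, a radius, and either points or two children."""
    def __init__(self, points, leaf_size=8):
        dim = len(points[0])
        self.points = points
        self.center = [sum(p[i] for p in points) / len(points) for i in range(dim)]
        self.radius = max(math.dist(self.center, p) for p in points)
        self.children = []
        if len(points) > leaf_size:
            # Pick two far-apart pivots and assign each point to the nearer one.
            a = max(points, key=lambda p: math.dist(points[0], p))
            b = max(points, key=lambda p: math.dist(a, p))
            left = [p for p in points if math.dist(p, a) <= math.dist(p, b)]
            right = [p for p in points if math.dist(p, a) > math.dist(p, b)]
            if left and right:  # stay a leaf if the points cannot be separated
                self.children = [BallNode(left, leaf_size), BallNode(right, leaf_size)]

def upper_bound(q, node):
    # Cauchy-Schwarz: q.p = q.c + q.(p - c) <= q.c + ||q|| * r for every p in the ball.
    return dot(q, node.center) + math.sqrt(dot(q, q)) * node.radius

def mips(q, node, best=(-math.inf, None)):
    """Return (score, point) with the largest inner product with q."""
    if upper_bound(q, node) <= best[0]:
        return best  # prune: no point in this ball can beat the current best
    if not node.children:
        for p in node.points:
            score = dot(q, p)
            if score > best[0]:
                best = (score, p)
        return best
    # Visit the more promising child first so the other is more likely to be pruned.
    for child in sorted(node.children, key=lambda c: -upper_bound(q, c)):
        best = mips(q, child, best)
    return best
```

Calling `mips(q, BallNode(points))` should return the same maximum as a linear scan over `points`, up to floating-point rounding; how much work pruning saves depends on the data.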
Fast Exact Max-Kernel Search
Abstract

Cited by 3 (2 self)
The wide applicability of kernels makes the problem of max-kernel search ubiquitous and more general than the usual similarity search in metric spaces. We focus on solving this problem efficiently. We begin by characterizing the inherent hardness of the max-kernel search problem with a novel notion of directional concentration. Following that, we present a method to use an O(n log n) algorithm to index any set of objects (points in R^D or abstract objects) directly in the Hilbert space without any explicit feature representations of the objects in this space. We present the first provably O(log n) algorithm for exact max-kernel search using this index. Empirical results for a variety of data sets as well as abstract objects demonstrate up to 4 orders of magnitude speedup in some cases. Extensions for approximate max-kernel search are also presented.
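One standard way to work in the Hilbert space without explicit features is the kernel-induced distance d_K(x, y) = sqrt(K(x, x) + K(y, y) - 2 K(x, y)), which needs only kernel evaluations. The sketch below illustrates that identity; it is not taken from the paper, and the function names and the `gamma` parameter are assumptions.

```python
import math

def linear_kernel(x, y):
    return sum(a * b for a, b in zip(x, y))

def rbf_kernel(x, y, gamma=0.5):
    # gamma is an illustrative bandwidth parameter, not a value from the source.
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def kernel_distance(K, x, y):
    """Distance between the feature-space images of x and y, using kernel values only."""
    squared = K(x, x) + K(y, y) - 2.0 * K(x, y)
    return math.sqrt(max(squared, 0.0))  # clamp tiny negatives caused by rounding
```

With the linear kernel this reduces to the ordinary Euclidean distance, which gives a quick sanity check; any metric tree built on `kernel_distance` then indexes the objects implicitly in the kernel space.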
Dual-Tree Fast Exact Max-Kernel Search
2013
Abstract

Cited by 1 (1 self)
The problem of max-kernel search arises everywhere: given a query point p_q, a set of reference objects S_r and some kernel K, find arg max_{p_r ∈ S_r} K(p_q, p_r). Max-kernel search is ubiquitous and appears in countless domains of science, thanks to the wide applicability of kernels. A few domains include image matching, information retrieval, bioinformatics, similarity search, and collaborative filtering (to name just a few). However, there are no generalized techniques for efficiently solving max-kernel search. This paper presents a single-tree algorithm called single-tree FastMKS which returns the max-kernel solution for a single query point in provably O(log N) time (where N is the number of reference objects), and also a dual-tree algorithm (dual-tree FastMKS) which is useful for max-kernel search with many query points. If the set of query points is of size O(N), this algorithm returns a solution in provably O(N) time, which is significantly better than the O(N^2) linear scan solution; these bounds are dependent on the expansion constant of the data. These algorithms work for abstract objects, as they do not require explicit representation of the points in kernel space. Empirical results for a variety of datasets show up to 5 orders of magnitude speedup in some cases. In addition, we present approximate extensions of the FastMKS algorithms that can achieve further speedups.
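For reference, the linear scan baseline that the abstract compares against can be sketched as follows. This is only the naive method, not FastMKS. To show that nothing beyond kernel evaluations is required, the example uses strings as abstract objects with a simple character-bigram kernel; the kernel choice and all names are assumptions made for illustration.

```python
from collections import Counter

def bigram_kernel(s, t):
    """Inner product of character-bigram count vectors (a valid kernel on strings)."""
    a = Counter(s[i:i + 2] for i in range(len(s) - 1))
    b = Counter(t[i:i + 2] for i in range(len(t) - 1))
    return sum(count * b[gram] for gram, count in a.items())

def max_kernel_linear_scan(K, query, references):
    """Evaluate K against every reference: O(N) kernel evaluations per query."""
    best = max(references, key=lambda r: K(query, r))
    return best, K(query, best)

def max_kernel_all_queries(K, queries, references):
    """O(|queries| * N) evaluations, i.e. O(N^2) when there are O(N) queries."""
    return [max_kernel_linear_scan(K, q, references)[0] for q in queries]
```

For example, `max_kernel_linear_scan(bigram_kernel, "kernels", ["kernel", "colonel", "tunnel"])` selects `"kernel"`, which shares five bigrams with the query.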
A Comparative Study of Social Media and Traditional Polling in the Egyptian Uprising of 2011
Supervisors:
2014
Abstract
Distributional semantics is a research area investigating unsupervised data-driven models for quantifying semantic relatedness. This thesis investigates the possibilities of using distributional semantic models (DSMs) for sentiment classification of utterances, by composing the distributional vectors of the words in each utterance. For evaluation I use a set of manually classified movie reviews. While the purpose of this study has been to test composition in distributional semantic models, the work has mainly focused on finding a useful model configuration for the DSM. The thesis concludes that more associative window sizes performed better than less associative ones. Weighting the DSM by PPMI also gave the most stable performance improvements. Context selection is essential for achieving higher scores. While the DSM does not exceed baseline results in its evaluation, there are still unexplored areas in which potential improvements may lie.
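The PPMI weighting mentioned in this abstract is commonly defined as PPMI(w, c) = max(0, log(P(w, c) / (P(w) P(c)))), computed from co-occurrence counts within a context window. The sketch below follows that textbook definition and is not the thesis's own configuration; the base-2 logarithm, the symmetric window, and the function names are assumptions.

```python
import math
from collections import Counter

def cooccurrence_counts(sentences, window=2):
    """Count (word, context) pairs within a symmetric window of the given size."""
    counts = Counter()
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[(word, tokens[j])] += 1
    return counts

def ppmi(counts):
    """Map each observed (word, context) pair to its positive PMI (log base 2)."""
    total = sum(counts.values())
    word_totals, context_totals = Counter(), Counter()
    for (word, context), n in counts.items():
        word_totals[word] += n
        context_totals[context] += n
    return {
        (word, context): max(
            0.0, math.log2(n * total / (word_totals[word] * context_totals[context]))
        )
        for (word, context), n in counts.items()
    }
```

Pairs that co-occur less often than chance would predict receive weight 0, which is one reason PPMI tends to be more stable than raw PMI on sparse counts.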
MEASURING THE INFLUENCE OF MAINSTREAM MEDIA
2014
"... Measuring the influence of mainstream media on Twitter users ..."