Results 1-10 of 615
Fast random walk with restart and its applications
In ICDM ’06: Proceedings of the 6th IEEE International Conference on Data Mining, 2006
Cited by 174 (19 self)
How closely related are two nodes in a graph? How to compute this score quickly, on huge, disk-resident, real graphs? Random walk with restart (RWR) provides a good relevance score between two nodes in a weighted graph, and it has been successfully used in numerous settings, like automatic captioning of images, generalizations to the “connection subgraphs”, personalized PageRank, and many more. However, the straightforward implementations of RWR do not scale for large graphs, requiring either quadratic space and cubic precomputation time, or slow response time on queries. We propose fast solutions to this problem. The heart of our approach is to exploit two important properties shared by many real graphs: (a) linear correlations and (b) block-wise, community-like structure. We exploit the linearity by using low-rank matrix approximation, and the community structure by graph partitioning, followed by the Sherman-Morrison lemma for matrix inversion. Experimental results on the Corel image and the DBLP datasets demonstrate that our proposed methods achieve significant savings over the straightforward implementations: they can save several orders of magnitude in precomputation and storage cost, and they achieve up to 150x speedup with 90%+ quality preservation.
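For orientation, the baseline that the abstract calls slow at query time can be sketched as a plain power iteration. This is not the paper's fast method (no low-rank approximation or partitioning); it is a minimal illustration that assumes a dense adjacency matrix given as nested lists, every node having at least one edge, and a hypothetical `restart` parameter for the probability of jumping back to the seed node.

```python
def random_walk_with_restart(adj, seed, restart=0.15, tol=1e-10, max_iter=1000):
    """Iterate r <- (1 - restart) * W r + restart * e_seed until convergence.

    adj   : square list-of-lists of non-negative edge weights (assumed layout)
    seed  : index of the query node
    W     : column-normalized version of adj (each column sums to 1)
    """
    n = len(adj)
    col_sums = [sum(adj[i][j] for i in range(n)) for j in range(n)]
    scores = [0.0] * n
    scores[seed] = 1.0
    for _ in range(max_iter):
        updated = [0.0] * n
        for i in range(n):
            walk = 0.0
            for j in range(n):
                if col_sums[j] > 0:
                    walk += adj[i][j] / col_sums[j] * scores[j]
            updated[i] = (1.0 - restart) * walk + (restart if i == seed else 0.0)
        change = sum(abs(a - b) for a, b in zip(updated, scores))
        scores = updated
        if change < tol:
            break
    return scores


# Path graph 0 - 1 - 2 - 3, queried from node 0.
path = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
scores = random_walk_with_restart(path, seed=0)
```

Each query costs many passes over the graph, which is the response-time problem the abstract describes; precomputing the full matrix inverse instead would trade this for quadratic storage.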
The pyramid match kernel: Efficient learning with sets of features
Journal of Machine Learning Research, 2007
Cited by 136 (10 self)
In numerous domains it is useful to represent a single example by the set of the local features or parts that comprise it. However, this representation poses a challenge to many conventional machine learning techniques, since sets may vary in cardinality and elements lack a meaningful ordering. Kernel methods can learn complex functions, but a kernel over unordered set inputs must somehow solve for correspondences—generally a computationally expensive task that becomes impractical for large set sizes. We present a new fast kernel function called the pyramid match that measures partial match similarity in time linear in the number of features. The pyramid match maps unordered feature sets to multiresolution histograms and computes a weighted histogram intersection in order to find implicit correspondences based on the finest resolution histogram cell where a matched pair first appears. We show the pyramid match yields a Mercer kernel, and we prove bounds on its error relative to the optimal partial matching cost. We demonstrate our algorithm on both classification and regression tasks, including object recognition, 3D human pose inference, and time of publication estimation for documents, and we show that the proposed method is accurate and significantly more efficient than current approaches.
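The weighted histogram intersection described in this abstract can be illustrated on one-dimensional features. The sketch below is a simplification under stated assumptions: features are non-negative numbers, bin widths double at each level (1, 2, 4, ...), and matches first found at a level with bin width w are weighted by 1/w. The function name and the `num_levels` parameter are chosen here for illustration.

```python
from collections import Counter


def pyramid_match(xs, ys, num_levels):
    """Weighted count of new matches found at each resolution level.

    xs, ys     : sequences of non-negative 1-D feature values (may differ in size)
    num_levels : number of histogram levels; level i uses bins of width 2**i
    """
    score = 0.0
    previous_intersection = 0
    for level in range(num_levels):
        width = 2 ** level
        hist_x = Counter(int(x // width) for x in xs)
        hist_y = Counter(int(y // width) for y in ys)
        # Histogram intersection: number of points that can be paired within a bin.
        intersection = sum(min(count, hist_y[b]) for b, count in hist_x.items())
        # Only matches that were not already found at a finer level count here.
        score += (intersection - previous_intersection) / width
        previous_intersection = intersection
    return score


similarity = pyramid_match([0, 5], [0, 6], num_levels=3)
```

In this example the pair (0, 0) matches at the finest level with weight 1, and the pair (5, 6) first shares a bin at width 4 with weight 1/4. No explicit correspondence search is run, and the cost per level is linear in the number of features.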
Gaussian Process Optimization in the Bandit Setting: No Regret and Experimental Design
Cited by 118 (11 self)
Many applications require optimizing an unknown, noisy function that is expensive to evaluate. We formalize this task as a multi-armed bandit problem, where the payoff function is either sampled from a Gaussian process (GP) or has low RKHS norm. We resolve the important open problem of deriving regret bounds for this setting, which imply novel convergence rates for GP optimization. We analyze GP-UCB, an intuitive upper-confidence based algorithm, and bound its cumulative regret in terms of maximal information gain, establishing a novel connection between GP optimization and experimental design. Moreover, by bounding the latter in terms of operator spectra, we obtain explicit sublinear regret bounds for many commonly used covariance functions. In some important cases, our bounds have surprisingly weak dependence on the dimensionality. In our experiments on real sensor data, GP-UCB compares favorably with other heuristical GP optimization approaches.
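An upper-confidence selection rule of the kind this abstract describes can be sketched as follows: pick the candidate maximizing posterior mean plus a scaled posterior standard deviation. Everything concrete here is an assumption made for illustration: a one-dimensional input, a squared-exponential kernel with unit length scale, a fixed noise variance, a finite candidate set, and the confidence schedule beta_t = 2 log(|D| t^2 pi^2 / (6 delta)) for a finite set D. All function names are hypothetical.

```python
import math


def rbf(a, b, length=1.0):
    """Squared-exponential covariance between two scalars."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def posterior(xs, ys, x, noise=1e-2):
    """GP posterior mean and variance at x given observations (xs, ys)."""
    if not xs:
        return 0.0, rbf(x, x)  # zero-mean prior
    gram = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
            for i, a in enumerate(xs)]
    k_x = [rbf(a, x) for a in xs]
    mean = sum(k * w for k, w in zip(k_x, solve(gram, ys)))
    var = rbf(x, x) - sum(k * w for k, w in zip(k_x, solve(gram, k_x)))
    return mean, max(var, 0.0)


def gp_ucb_choose(xs, ys, candidates, t, delta=0.1):
    """Return the candidate with the largest upper confidence bound at round t."""
    beta = 2.0 * math.log(len(candidates) * t ** 2 * math.pi ** 2 / (6.0 * delta))

    def ucb(x):
        mean, var = posterior(xs, ys, x)
        return mean + math.sqrt(beta * var)

    return max(candidates, key=ucb)


candidates = [0.0, 1.0, 2.0, 3.0, 4.0]
next_point = gp_ucb_choose([0.0], [0.0], candidates, t=2)
```

After a single observation at 0.0, the rule favors the candidate farthest from it, because the posterior mean is flat and the variance is largest there. The loop that evaluates the objective and appends to `xs` and `ys` is left to the caller.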
A Hilbert space embedding for distributions
In Algorithmic Learning Theory: 18th International Conference, 2007
Cited by 110 (45 self)
We describe a technique for comparing distributions without the need for density estimation as an intermediate step. Our approach relies on mapping the distributions into a reproducing kernel Hilbert space. Applications of this technique can be found in two-sample tests, which are used for determining whether two sets of observations arise from the same distribution, covariate shift correction, local learning, measures of independence, and density estimation. Kernel methods are widely used in supervised learning [1, 2, 3, 4], however they are much less established in the areas of testing, estimation, and analysis of probability distributions, where information theoretic approaches [5, 6] have long been dominant. Recent examples include [7] in the context of construction of graphical models, [8] in the context of feature extraction, and [9] in the context of independent component analysis. These methods have by and large a common issue: to compute quantities such as the mutual information, entropy, or Kullback-Leibler divergence, we require sophisticated space partitioning and/or
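One way to see how mapping distributions into a reproducing kernel Hilbert space sidesteps density estimation is the squared distance between the two sample mean embeddings, often called the maximum mean discrepancy. The sketch below computes the biased empirical estimate for scalar samples; the Gaussian kernel, its `bandwidth` parameter, and the function names are assumptions made for this illustration.

```python
import math


def gaussian_kernel(a, b, bandwidth=1.0):
    """Gaussian (RBF) kernel between two scalars."""
    return math.exp(-((a - b) ** 2) / (2.0 * bandwidth ** 2))


def mmd_squared(xs, ys, bandwidth=1.0):
    """Biased estimate of the squared distance between mean embeddings.

    Equals mean k(x, x') + mean k(y, y') - 2 * mean k(x, y); only kernel
    evaluations on the samples are needed, no density estimates.
    """
    def mean_kernel(us, vs):
        total = sum(gaussian_kernel(u, v, bandwidth) for u in us for v in vs)
        return total / (len(us) * len(vs))

    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)


same = mmd_squared([0.0, 0.1, 0.2], [0.0, 0.1, 0.2])
different = mmd_squared([0.0, 0.1, 0.2], [5.0, 5.1, 5.2])
```

A value near zero is consistent with both samples coming from the same distribution; a two-sample test would compare the statistic against a threshold, which this sketch does not derive.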
Confidence-weighted linear classification
In ICML ’08: Proceedings of the 25th international conference on Machine learning, 2008
Cited by 95 (15 self)
We introduce confidence-weighted linear classifiers, which add parameter confidence information to linear classifiers. Online learners in this setting update both classifier parameters and the estimate of their confidence. The particular online algorithms we study here maintain a Gaussian distribution over parameter vectors and update the mean and covariance of the distribution with each instance. Empirical evaluation on a range of NLP tasks shows that our algorithm improves over other state-of-the-art online and batch methods, learns faster in the online setting, and lends itself to better classifier combination after parallel training.
Gaussian Processes for Signal Strength-Based Location Estimation
In Proc. of Robotics Science and Systems, 2006
Cited by 93 (8 self)
Estimating the location of a mobile device or a robot from wireless signal strength has become an area of highly active research. The key problem in this context stems from the complexity of how signals propagate through space, especially in the presence of obstacles such as buildings, walls or people. In this paper we show how Gaussian processes can be used to generate a likelihood model for signal strength measurements. We also show how parameters of the model, such as signal noise and spatial correlation between measurements, can be learned from data via hyperparameter estimation. Experiments using WiFi indoor data and GSM cellphone connectivity demonstrate the superior performance of our approach.
WiFi-SLAM Using Gaussian Process Latent Variable Models
In Proceedings of IJCAI 2007, 2007
Cited by 74 (6 self)
WiFi localization, the task of determining the physical location of a mobile device from wireless signal strengths, has been shown to be an accurate method of indoor and outdoor localization and a powerful building block for location-aware applications. However, most localization techniques require a training set of signal strength readings labeled against a ground truth location map, which is prohibitive to collect and maintain as maps grow large. In this paper we propose a novel technique for solving the WiFi SLAM problem using the Gaussian Process Latent Variable Model (GP-LVM) to determine the latent-space locations of unlabeled signal strength data. We show how GP-LVM, in combination with an appropriate motion dynamics model, can be used to reconstruct a topological connectivity graph from a signal strength sequence which, in combination with the learned Gaussian Process signal strength model, can be used to perform efficient localization.
Nonlinear Matrix Factorization with Gaussian Processes
Cited by 72 (1 self)
A popular approach to collaborative filtering is matrix factorization. In this paper we develop a nonlinear probabilistic matrix factorization using Gaussian process latent variable models. We use stochastic gradient descent (SGD) to optimize the model. SGD allows us to apply Gaussian processes to data sets with millions of observations without approximate methods. We apply our approach to benchmark movie recommender data sets. The results show better than previous state-of-the-art performance.
Adaptive Regularization of Weight Vectors
Advances in Neural Information Processing Systems 22, 2009
Cited by 65 (14 self)
We present AROW, a new online learning algorithm that combines several useful properties: large margin training, confidence weighting, and the capacity to handle nonseparable data. AROW performs adaptive regularization of the prediction function upon seeing each new instance, allowing it to perform especially well in the presence of label noise. We derive a mistake bound, similar in form to the second order perceptron bound, that does not assume separability. We also relate our algorithm to recent confidence-weighted online learning techniques and show empirically that AROW achieves state-of-the-art performance and notable robustness in the case of nonseparable data.
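The abstract does not spell out the update, so the following is only a sketch of the commonly stated form of an AROW step: a mean vector and a full covariance matrix are kept, and both are adjusted whenever the margin on the current instance falls below 1. The regularization parameter `r`, the class name, and the nested-list matrix layout are assumptions made for illustration.

```python
class ArowClassifier:
    """Online binary classifier with a mean vector and a confidence matrix."""

    def __init__(self, dim, r=1.0):
        self.r = r
        self.mean = [0.0] * dim
        # Identity covariance: every weight starts equally uncertain.
        self.cov = [[1.0 if i == j else 0.0 for j in range(dim)] for i in range(dim)]

    def score(self, x):
        return sum(m * xi for m, xi in zip(self.mean, x))

    def predict(self, x):
        return 1 if self.score(x) >= 0.0 else -1

    def update(self, x, y):
        """One step on instance x with label y in {-1, +1}."""
        margin = y * self.score(x)
        if margin >= 1.0:
            return  # confident and correct: leave the model unchanged
        cov_x = [sum(row[j] * x[j] for j in range(len(x))) for row in self.cov]
        variance = sum(xi * ci for xi, ci in zip(x, cov_x))
        beta = 1.0 / (variance + self.r)
        alpha = (1.0 - margin) * beta
        # Move the mean toward classifying x correctly, scaled by uncertainty.
        self.mean = [m + alpha * y * c for m, c in zip(self.mean, cov_x)]
        # Shrink uncertainty along the direction of x.
        self.cov = [[self.cov[i][j] - beta * cov_x[i] * cov_x[j]
                     for j in range(len(x))] for i in range(len(x))]


data = [([1.0, 0.0], 1), ([-1.0, 0.0], -1)]
model = ArowClassifier(dim=2)
for features, label in data:
    model.update(features, label)
```

After this single pass the first weight is positive and its variance has dropped below the initial 1.0, while the second, unobserved dimension keeps its prior variance. A diagonal covariance is a common cheaper variant for high-dimensional inputs.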
GP-BayesFilters: Bayesian Filtering Using Gaussian Process Prediction and Observation Models
In Proceedings of the 2008 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2008
Cited by 65 (5 self)
Bayesian filtering is a general framework for recursively estimating the state of a dynamical system. The most common instantiations of Bayes filters are Kalman filters (extended and unscented) and particle filters. Key components of each Bayes filter are probabilistic prediction and observation models. Recently, Gaussian processes have been introduced as a nonparametric technique for learning such models from training data. In the context of unscented Kalman filters, these models have been shown to provide estimates that can be superior to those achieved with standard, parametric models. In this paper we show how Gaussian process models can be integrated into other Bayes filters, namely particle filters and extended Kalman filters. We provide a complexity analysis of these filters and evaluate the alternative techniques using data collected with an autonomous micro-blimp.