Results 1–10 of 119
Cover trees for nearest neighbor
In Proceedings of the 23rd International Conference on Machine Learning, 2006
"... ABSTRACT. We present a tree data structure for fast nearest neighbor operations in generalpoint metric spaces. The data structure requires space regardless of the metric’s structure. If the point set has an expansion constant � in the sense of Karger and Ruhl [KR02], the data structure can be const ..."
Abstract

Cited by 212 (0 self)
 Add to MetaCart
We present a tree data structure for fast nearest neighbor operations in general n-point metric spaces. The data structure requires O(n) space regardless of the metric's structure. If the point set has an expansion constant c in the sense of Karger and Ruhl [KR02], the data structure can be constructed in O(c^6 n log n) time. Nearest neighbor queries obeying the expansion bound require O(c^12 log n) time. In addition, the nearest neighbor of all points can be queried in O(c^16 n) time. We experimentally test the algorithm, showing speedups over brute-force search varying between 1 and 2000 on natural machine learning datasets.
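To make the query algorithm concrete, here is a minimal sketch of the level-by-level descent a cover tree enables; the `CoverNode` class, the exact pruning bound, and the `nearest` helper are illustrative simplifications, not the paper's implementation:

```python
import math

class CoverNode:
    """One node of a simplified cover tree: a point (tuple of floats)
    plus children at the next-lower scale. A real cover tree also
    enforces the covering and separation invariants at build time."""
    def __init__(self, point, level, children=None):
        self.point = point
        self.level = level                 # children live at level - 1
        self.children = children or []

def nearest(root, query):
    """Level-by-level descent in the spirit of the paper's query rule:
    at each scale, drop any node whose subtree provably cannot hold a
    point closer than the best found so far."""
    best_d, best_p = math.dist(root.point, query), root.point
    frontier = [root]
    while frontier:
        children = [c for node in frontier for c in node.children]
        if not children:
            break
        for c in children:
            d = math.dist(c.point, query)
            if d < best_d:
                best_d, best_p = d, c.point
        # 2**(level + 1) upper-bounds the distance from a node at this
        # level to any descendant (geometric series of child radii),
        # so anything farther than best_d + bound can be pruned.
        bound = 2.0 ** (children[0].level + 1)
        frontier = [c for c in children
                    if math.dist(c.point, query) <= best_d + bound]
    return best_p, best_d
```

The expansion-constant assumption enters through the frontier: for well-behaved metrics the surviving candidate set stays small at every level, which is where the logarithmic query bound comes from.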
Training a support vector machine in the primal
In Neural Computation, 2007
"... Most literature on Support Vector Machines (SVMs) concentrate on the dual optimization problem. In this paper, we would like to point out that the primal problem can also be solved efficiently, both for linear and nonlinear SVMs, and that there is no reason for ignoring this possibilty. On the cont ..."
Abstract

Cited by 154 (5 self)
 Add to MetaCart
(Show Context)
Most literature on Support Vector Machines (SVMs) concentrates on the dual optimization problem. In this paper, we point out that the primal problem can also be solved efficiently, both for linear and nonlinear SVMs, and that there is no reason to ignore this possibility. On the contrary, from the primal point of view, new families of algorithms for large-scale SVM training can be investigated.
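As a hedged illustration of the primal view, the sketch below minimizes the L2-regularized squared hinge loss (which is differentiable) by plain gradient descent; the paper itself uses Newton-style steps, and the function and hyperparameter names here are assumptions for illustration:

```python
import numpy as np

def train_primal_svm(X, y, lam=1e-2, lr=0.1, epochs=200):
    """Gradient descent on the L2-regularized *squared* hinge loss,
    a differentiable primal objective. X: (n, d) floats; y: (n,)
    labels in {-1, +1}. Hyperparameter values are illustrative."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        margins = y * (X @ w)
        active = margins < 1              # margin violators only
        # gradient of lam*||w||^2 + (1/n) * sum(max(0, 1 - y x.w)^2)
        grad = 2 * lam * w - (2 / n) * ((y[active] * (1 - margins[active])) @ X[active])
        w -= lr * grad
    return w
```

The point the paper makes is visible even in this toy version: the primal iterate is a single d-dimensional weight vector, so no n-by-n kernel matrix or dual variable bookkeeping is needed in the linear case.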
Dynamic social network analysis using latent space models
 SIGKDD Explorations, Special Issue on Link Mining
"... This paper explores two aspects of social network modeling. First, we generalize a successful static model of relationships into a dynamic model that accounts for friendships drifting over time. Second, we show how to make it tractable to learn such models from data, even as the number of entities n ..."
Abstract

Cited by 117 (5 self)
 Add to MetaCart
This paper explores two aspects of social network modeling. First, we generalize a successful static model of relationships into a dynamic model that accounts for friendships drifting over time. Second, we show how to make it tractable to learn such models from data, even as the number of entities n gets large. The generalized model associates each entity with a point in p-dimensional Euclidean latent space. The points can move as time progresses, but large moves in latent space are improbable. Observed links between entities are more likely if the entities are close in latent space. We show how to make such a model tractable (subquadratic in the number of entities) by the use of appropriate kernel functions for similarity in latent space; the use of low-dimensional KD-trees; a new efficient dynamic adaptation of multidimensional scaling for a first pass of approximate projection of entities into latent space; and an efficient conjugate gradient update rule for nonlinear local optimization in which amortized time per entity during an update is O(log n). We use both synthetic and real-world data on up to 11,000 entities, which indicate near-linear scaling in computation time and improved performance over four alternative approaches. We also illustrate the system operating on twelve years of NIPS co-authorship data.
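A minimal sketch of the two modeling ingredients, closeness-driven links plus a drift penalty, might look as follows; the Gaussian kernel form and the parameter names are illustrative assumptions, and none of the paper's tractability machinery (KD-trees, the MDS first pass, the conjugate gradient rule) appears here:

```python
import numpy as np

def log_likelihood(Z_t, Z_prev, links, sigma_link=1.0, sigma_move=0.1):
    """Unnormalized log-probability with the model's two ingredients:
    links are more likely between entities that are close in latent
    space, and large moves between consecutive time steps are
    penalized. Z_t, Z_prev: (n, p) latent positions at times t and
    t-1; links: iterable of (i, j) index pairs."""
    ll = 0.0
    for i, j in links:
        d2 = float(np.sum((Z_t[i] - Z_t[j]) ** 2))
        ll += -d2 / (2 * sigma_link ** 2)      # closeness makes links likely
    # Gaussian drift prior: big jumps in latent space are improbable.
    ll += -float(np.sum((Z_t - Z_prev) ** 2)) / (2 * sigma_move ** 2)
    return ll
```

Maximizing such an objective naively touches all O(n^2) pairs; the kernel and tree machinery in the abstract exists precisely to avoid that.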
Nonparametric density estimation: toward computational tractability
In SIAM International Conference on Data Mining, 2003
"... Density estimation is a core operation of virtually all probabilistic learning methods (as opposed to discriminative methods). Approaches to density estimation can be divided into two principal classes, parametric methods, such as Bayesian networks, and nonparametric methods such as kernel density e ..."
Abstract

Cited by 71 (9 self)
 Add to MetaCart
(Show Context)
Density estimation is a core operation of virtually all probabilistic learning methods (as opposed to discriminative methods). Approaches to density estimation can be divided into two principal classes: parametric methods, such as Bayesian networks, and nonparametric methods, such as kernel density estimation and smoothing splines. While neither choice should be universally preferred for all situations, a well-known benefit of nonparametric methods is their ability to achieve estimation optimality for ANY input distribution as more data are observed, a property that no model with a parametric assumption can have, and one of great importance in exploratory data analysis and mining where the underlying distribution is decidedly unknown. To date, however, despite a wealth of advanced underlying statistical theory, the use of nonparametric methods has been limited by their computational intractability for all but the smallest datasets. In this paper, we present an algorithm for kernel density estimation, the chief nonparametric approach, which is dramatically faster than previous algorithmic approaches in terms of both dataset size and dimensionality. Furthermore, the algorithm provides arbitrarily tight accuracy guarantees, provides anytime convergence, works for all common kernel choices, and requires no parameter tuning. The algorithm is an instance of a new principle of algorithm design: multi-recursion, or higher-order divide-and-conquer.
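For reference, the computation the paper accelerates is just a kernel sum per query point; this naive O(M*N) version only fixes notation and is exactly what becomes intractable at scale:

```python
import numpy as np

def kde_gaussian(queries, data, h):
    """Exact Gaussian kernel density estimate, O(M*N) time and memory.
    queries: (M, d) evaluation points, data: (N, d) samples,
    h: bandwidth. Returns (M,) density estimates."""
    d2 = ((queries[:, None, :] - data[None, :, :]) ** 2).sum(-1)
    K = np.exp(-d2 / (2 * h * h))
    norm = (2 * np.pi * h * h) ** (data.shape[1] / 2) * len(data)
    return K.sum(axis=1) / norm
```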
Learning spectral clustering, with application to speech separation
In Journal of Machine Learning Research, 2006
"... Spectral clustering refers to a class of techniques which rely on the eigenstructure of a similarity matrix to partition points into disjoint clusters, with points in the same cluster having high similarity and points in different clusters having low similarity. In this paper, we derive new cost fun ..."
Abstract

Cited by 69 (6 self)
 Add to MetaCart
Spectral clustering refers to a class of techniques which rely on the eigenstructure of a similarity matrix to partition points into disjoint clusters, with points in the same cluster having high similarity and points in different clusters having low similarity. In this paper, we derive new cost functions for spectral clustering based on measures of error between a given partition and a solution of the spectral relaxation of a minimum normalized cut problem. Minimizing these cost functions with respect to the partition leads to new spectral clustering algorithms. Minimizing with respect to the similarity matrix leads to algorithms for learning the similarity matrix from fully labelled datasets. We apply our learning algorithm to the blind one-microphone speech separation problem, casting the problem as one of segmentation of the spectrogram.
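A minimal sketch of the spectral relaxation the paper starts from, assuming a symmetric nonnegative similarity matrix W with no isolated points; the rounding step here is a bare-bones k-means loop, not the paper's learned cost functions:

```python
import numpy as np

def spectral_clustering(W, k, iters=50, seed=0):
    """Embed points via the top eigenvectors of the normalized
    affinity D^{-1/2} W D^{-1/2}, then round the embedding to a
    partition with a simple k-means loop. W: symmetric (n, n)
    similarity matrix with positive row sums."""
    d = W.sum(axis=1)
    dinv = 1.0 / np.sqrt(d)
    L = dinv[:, None] * W * dinv[None, :]
    _, vecs = np.linalg.eigh(L)            # eigenvalues ascending
    U = vecs[:, -k:]                       # top-k eigenvectors
    U = U / np.linalg.norm(U, axis=1, keepdims=True)
    rng = np.random.default_rng(seed)
    centers = U[rng.choice(len(U), size=k, replace=False)]
    for _ in range(iters):
        labels = ((U[:, None] - centers[None]) ** 2).sum(-1).argmin(1)
        for c in range(k):                 # keep old center if a cluster empties
            if np.any(labels == c):
                centers[c] = U[labels == c].mean(axis=0)
    return labels
```

The paper's contribution sits upstream of this sketch: it defines differentiable costs measuring how far a partition is from the relaxed solution, so both the partition and W itself can be optimized.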
Fast Gaussian process regression using kd-trees
In Advances in Neural Information Processing Systems 18, 2006
"... The computation required for Gaussian process regression with n training examples is about O(n 3) during training and O(n) for each prediction. This makes Gaussian process regression too slow for large datasets. In this paper, we present a fast approximation method, based on kdtrees, that significa ..."
Abstract

Cited by 47 (3 self)
 Add to MetaCart
(Show Context)
The computation required for Gaussian process regression with n training examples is about O(n^3) during training and O(n) for each prediction. This makes Gaussian process regression too slow for large datasets. In this paper, we present a fast approximation method, based on kd-trees, that significantly reduces both the prediction and the training times of Gaussian process regression.
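The costs quoted in the abstract come from exact GP regression, sketched below with an RBF kernel; the paper's method approximates the weighted sums this code computes exactly, and the helper and parameter names are illustrative:

```python
import numpy as np

def gp_predict(X, y, X_star, h=1.0, noise=1e-2):
    """Exact GP regression mean prediction with an RBF kernel.
    The linear solve is the O(n^3) training cost; each prediction
    then sums over all n training points."""
    def rbf(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2 * h * h))
    K = rbf(X, X) + noise * np.eye(len(X))
    alpha = np.linalg.solve(K, y)          # the O(n^3) training step
    return rbf(X_star, X) @ alpha          # O(n) work per query point
```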
Automatic online tuning for fast Gaussian summation
"... Many machine learning algorithms require the summation of Gaussian kernel functions, an expensive operation if implemented straightforwardly. Several methods have been proposed to reduce the computational complexity of evaluating such sums, including tree and analysis based methods. These achieve va ..."
Abstract

Cited by 35 (13 self)
 Add to MetaCart
(Show Context)
Many machine learning algorithms require the summation of Gaussian kernel functions, an expensive operation if implemented straightforwardly. Several methods have been proposed to reduce the computational complexity of evaluating such sums, including tree- and analysis-based methods. These achieve varying speedups depending on the bandwidth, dimension, and prescribed error, making the choice between methods difficult for machine learning tasks. We provide an algorithm that combines tree methods with the Improved Fast Gauss Transform (IFGT). As originally proposed, the IFGT suffers from two problems: (1) the Taylor series expansion does not perform well for very low bandwidths, and (2) parameter selection is not trivial and can drastically affect performance and ease of use. We address the first problem by employing a tree data structure, resulting in four evaluation methods whose performance varies based on the distribution of sources and targets and input parameters such as desired accuracy and bandwidth. To solve the second problem, we present an online tuning approach that results in a black-box method that automatically chooses the evaluation method and its parameters to yield the best performance for the input data, desired accuracy, and bandwidth. In addition, the new IFGT parameter selection approach allows for tighter error bounds. Our approach chooses the fastest method at negligible additional cost, and has superior performance in comparisons with previous approaches.
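The online-tuning idea can be caricatured in a few lines: time each candidate evaluator on a subsample and keep the winner. The `methods` mapping and the shared call signature are assumptions for illustration; the real system also selects IFGT parameters and respects error bounds, not just wall-clock time:

```python
import time

def pick_fastest(methods, sources_sample, targets_sample, h):
    """Toy online tuner: run each candidate Gaussian-summation
    evaluator on a small subsample and return the fastest one.
    `methods` maps names to callables sharing the (illustrative)
    signature fn(sources, targets, bandwidth)."""
    best_name, best_t = None, float("inf")
    for name, fn in methods.items():
        t0 = time.perf_counter()
        fn(sources_sample, targets_sample, h)
        elapsed = time.perf_counter() - t0
        if elapsed < best_t:
            best_name, best_t = name, elapsed
    return best_name
```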
Rapid Evaluation of Multiple Density Models
In Artificial Intelligence and Statistics, 2003
"... When highlyaccurate and/or assumptionfree density estimation is needed, nonparametric methods are often called upon  most notably the popular kernel density estimation (KDE) method. However, the practitioner is instantly faced with the formidable computational cost of KDE for appreciable da ..."
Abstract

Cited by 31 (4 self)
 Add to MetaCart
(Show Context)
When highly accurate and/or assumption-free density estimation is needed, nonparametric methods are often called upon, most notably the popular kernel density estimation (KDE) method. However, the practitioner is instantly faced with the formidable computational cost of KDE for appreciable dataset sizes, which becomes even more prohibitive when many models with different kernel scales (bandwidths) must be evaluated; this is necessary for finding the optimal model, among other reasons. In previous work we presented an algorithm for fast KDE which addresses large dataset sizes and large dimensionalities, but assumes only a single bandwidth. In this paper we present a generalization of that algorithm allowing multiple models with different bandwidths to be computed simultaneously, in substantially less time than either running the single-bandwidth algorithm for each model independently or running the standard exhaustive method. We show examples of computing the likelihood curve for 100,000 data points and 100 models ranging across 3 orders of magnitude in scale, in minutes or seconds.
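A naive version of the multiple-bandwidth setting shows where the shared work lives: the pairwise distances are computed once and reused across every model. The paper shares work inside a tree recursion rather than materializing the full distance matrix, so this sketch only illustrates the shape of the problem:

```python
import numpy as np

def kde_many_bandwidths(queries, data, bandwidths):
    """Evaluate Gaussian KDE at several bandwidths while computing
    the pairwise squared distances only once. queries: (M, d),
    data: (N, d); returns {bandwidth: (M,) density estimates}."""
    d2 = ((queries[:, None, :] - data[None, :, :]) ** 2).sum(-1)  # shared
    dim, n = data.shape[1], len(data)
    out = {}
    for h in bandwidths:
        K = np.exp(-d2 / (2 * h * h))
        out[h] = K.sum(axis=1) / ((2 * np.pi * h * h) ** (dim / 2) * n)
    return out
```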
Integrated protein interaction networks for 11 microbes
In Proceedings of the 10th Annual International Conference on Research in Computational Molecular Biology (RECOMB), 2006
"... Abstract. We have combined four different types of functional genomic data to create high coverage protein interaction networks for 11 microbes. Our integration algorithm naturally handles statistically dependent predictors and automatically corrects for differing noise levels and data corruption in ..."
Abstract

Cited by 28 (10 self)
 Add to MetaCart
(Show Context)
We have combined four different types of functional genomic data to create high-coverage protein interaction networks for 11 microbes. Our integration algorithm naturally handles statistically dependent predictors and automatically corrects for differing noise levels and data corruption in different evidence sources. We find that many of the predictions in each integrated network hinge on moderate but consistent evidence from multiple sources rather than strong evidence from a single source, yielding novel biology that would be missed if a single data source such as co-expression or co-inheritance were used in isolation. In addition to statistical analysis, we demonstrate via case study that these subtle interactions can discover new aspects of even well-studied functional modules. Our work represents the largest collection of probabilistic protein interaction networks compiled to date, and our methods can be applied to any sequenced organism and any kind of experimental or computational technique which produces pairwise measures of protein interaction.
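As a deliberately naive baseline for the integration problem, the sketch below sums per-source log-odds as if the evidence sources were independent; the paper's point is precisely that its algorithm handles dependent predictors and uneven noise, which this sketch does not attempt. All names and values are illustrative:

```python
import math

def combine_evidence(log_odds_by_source, prior_log_odds=-4.0):
    """Sum per-source log-odds under a naive independence
    assumption and squash to an interaction probability. The
    negative prior reflects that most protein pairs do not
    interact; the value here is an arbitrary illustration."""
    total = prior_log_odds + sum(log_odds_by_source.values())
    return 1.0 / (1.0 + math.exp(-total))

# e.g. combine_evidence({"coexpression": 1.2, "coinheritance": 0.8})
```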
Dual-Tree Fast Gauss Transforms
In Advances in Neural Information Processing Systems 18, 2006
"... In previous work we presented an efficient approach to computing kernel summations which arise in many machine learning methods such as kernel density estimation. This approach, dualtree recursion with finitedifference approximation, generalized existing methods for similar problems arising in c ..."
Abstract

Cited by 28 (5 self)
 Add to MetaCart
(Show Context)
In previous work we presented an efficient approach to computing kernel summations which arise in many machine learning methods such as kernel density estimation. This approach, dual-tree recursion with finite-difference approximation, generalized existing methods for similar problems arising in computational physics in two ways appropriate for statistical problems: toward distribution sensitivity and general dimension, partly by avoiding series expansions. While this proved to be the fastest practical method for multivariate kernel density estimation at the optimal bandwidth, it is much less efficient at larger-than-optimal bandwidths. In this work, we explore the extent to which the dual-tree approach can be integrated with multipole-like Hermite expansions in order to achieve reasonable efficiency across all bandwidth scales, though only for low dimensionalities. In the process, we derive and demonstrate the first truly hierarchical fast Gauss transforms, effectively combining the best tools from discrete algorithms and continuous approximation theory.
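Both the finite-difference and the Hermite-expansion variants plug into the same dual-tree recursion skeleton, sketched below; the `Node` fields and the callback names are illustrative assumptions, not the papers' data structures:

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    points: list                     # points owned by this subtree
    radius: float                    # bounding radius of the subtree
    children: list = field(default_factory=list)

def dual_tree(q, r, can_prune, approximate, exact):
    """Compare a query node q against a reference node r: if the
    kernel varies little enough across the pair (can_prune),
    approximate the whole block at once (finite differences in the
    earlier work, Hermite expansions here); if both are leaves,
    sum exactly; otherwise split the larger node and recurse."""
    if can_prune(q, r):
        approximate(q, r)            # one approximation per node pair
    elif not q.children and not r.children:
        exact(q, r)                  # brute-force sum on a small block
    elif r.children and (not q.children or r.radius >= q.radius):
        for child in r.children:
            dual_tree(q, child, can_prune, approximate, exact)
    else:
        for child in q.children:
            dual_tree(child, r, can_prune, approximate, exact)
```

The bandwidth dependence the abstract describes lives entirely in `can_prune`: at large bandwidths the kernel is flat across distant node pairs, so series expansions prune far more aggressively than finite differences can.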