Learning to rank using gradient descent
In ICML, 2005
Cited by 346 (16 self)
Abstract:
We investigate using gradient descent methods for learning ranking functions; we propose a simple probabilistic cost function, and we introduce RankNet, an implementation of these ideas using a neural network to model the underlying ranking function. We present test results on toy data and on data from a commercial internet search engine.
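The probabilistic cost the abstract refers to can be sketched as a cross-entropy on the modeled probability that one item outranks another (a minimal illustration of the idea, not the paper's full training procedure; in RankNet the scores come from a neural network):

```python
import numpy as np

def ranknet_pair_cost(s_i, s_j, p_target):
    """Cross-entropy cost on the modeled probability that item i should
    rank above item j. s_i, s_j are ranking scores (in RankNet, outputs
    of a neural network); p_target is the target probability that i
    should be ranked above j."""
    o = s_i - s_j
    p = 1.0 / (1.0 + np.exp(-o))  # modeled P(i ranked above j)
    return -p_target * np.log(p) - (1.0 - p_target) * np.log(1.0 - p)
```

When the target says i should definitely outrank j (p_target = 1), the cost reduces to log(1 + exp(-(s_i - s_j))), which is small only when s_i exceeds s_j by a margin.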
Fast Binary Feature Selection with Conditional Mutual Information
Journal of Machine Learning Research, 2004
Cited by 101 (1 self)
Abstract:
We propose in this paper a very fast feature selection technique based on conditional mutual information.
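The selection criterion can be sketched directly from its definition (an illustrative, unoptimized version; the paper's speed comes from lazy evaluation of these scores, which this sketch does not reproduce):

```python
import numpy as np

def mutual_info(x, y):
    """Empirical mutual information I(X; Y) in nats, for discrete arrays."""
    mi = 0.0
    for xv in np.unique(x):
        for yv in np.unique(y):
            pxy = np.mean((x == xv) & (y == yv))
            if pxy > 0:
                px = np.mean(x == xv)
                py = np.mean(y == yv)
                mi += pxy * np.log(pxy / (px * py))
    return mi

def cond_mutual_info(x, y, z):
    """Empirical I(X; Y | Z) = sum over z of p(z) * I(X; Y | Z = z)."""
    cmi = 0.0
    for zv in np.unique(z):
        mask = (z == zv)
        cmi += np.mean(mask) * mutual_info(x[mask], y[mask])
    return cmi

def cmim_select(X, y, k):
    """Greedy CMIM selection: pick first the single most informative
    feature, then repeatedly the feature whose worst-case conditional
    mutual information with the label, given any already-selected
    feature, is largest (so redundant features score poorly)."""
    n_feat = X.shape[1]
    selected = [int(np.argmax([mutual_info(X[:, f], y) for f in range(n_feat)]))]
    while len(selected) < k:
        best_f, best_score = None, -np.inf
        for f in range(n_feat):
            if f in selected:
                continue
            score = min(cond_mutual_info(X[:, f], y, X[:, g]) for g in selected)
            if score > best_score:
                best_f, best_score = f, score
        selected.append(best_f)
    return selected
```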
Boosting as a Regularized Path to a Maximum Margin Classifier
Journal of Machine Learning Research, 2004
Cited by 68 (18 self)
Abstract:
In this paper we study boosting methods from a new perspective. We build on recent work by Efron et al. to show that boosting approximately (and in some cases exactly) minimizes its loss criterion with an l1 constraint on the coefficient vector. This helps explain the success of boosting with early stopping as regularized fitting of the loss criterion. For the two most commonly used criteria (exponential and binomial log-likelihood), we further show that as the constraint is relaxed, or equivalently as the boosting iterations proceed, the solution converges (in the separable case) to an "l1-optimal" separating hyperplane. We prove that this l1-optimal separating hyperplane has the property of maximizing the minimal l1 margin of the training data, as defined in the boosting literature.
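The "l1 margin" invoked at the end can be written out explicitly; with weak learners h_j and coefficient vector β (notation assumed, following the boosting literature the abstract cites):

```latex
m_1(\beta) \;=\; \min_{i} \; \frac{y_i \sum_j \beta_j h_j(x_i)}{\lVert \beta \rVert_1}
```

The convergence claim is then that, in the separable case, the normalized coefficient path β(t)/||β(t)||_1 approaches a maximizer of this quantity.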
Kernel matching pursuit
Machine Learning, 2002
Cited by 62 (0 self)
Abstract:
Matching pursuit algorithms learn a function that is a weighted sum of basis functions, by sequentially appending functions to an initially empty basis, to approximate a target function in the least-squares sense. We show how matching pursuit can be extended to use non-squared error loss functions, and how it can be used to build kernel-based solutions to machine-learning problems, while keeping control of the sparsity of the solution. We also derive MDL-motivated generalization bounds for this type of algorithm and compare them to related SVM (Support Vector Machine) bounds. Finally, we give links to boosting algorithms and RBF training procedures, as well as an extensive experimental comparison with SVMs for classification, showing comparable results with typically sparser models.
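The basic greedy loop the abstract builds on can be sketched as follows (a minimal squared-error version over a fixed dictionary; the paper's kernel variant takes the basis functions to be kernels centered on training points and also handles other losses):

```python
import numpy as np

def matching_pursuit(D, y, n_terms):
    """Greedy least-squares matching pursuit.
    D: (n_samples, n_atoms) dictionary of basis-function evaluations.
    y: target values to approximate.
    At each step, append the atom most correlated with the residual."""
    residual = y.astype(float).copy()
    coefs = np.zeros(D.shape[1])
    for _ in range(n_terms):
        corr = D.T @ residual
        j = int(np.argmax(np.abs(corr)))        # best atom this round
        step = corr[j] / (D[:, j] @ D[:, j])    # least-squares step size
        coefs[j] += step
        residual -= step * D[:, j]
    return coefs, residual
```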
AdaBoosting neural networks
Neural Computation, 1997
Cited by 44 (5 self)
Abstract:
Convexity has recently received a lot of attention in the machine learning community, and the lack of convexity has been seen as a major disadvantage of many learning algorithms, such as multilayer artificial neural networks. We show that training multilayer neural networks in which the number of hidden units is learned can be viewed as a convex optimization problem. This problem involves an infinite number of variables but can be solved by incrementally inserting one hidden unit at a time, each time finding a linear classifier that minimizes a weighted sum of errors.
Point matching as a classification problem for fast and robust object pose estimation
In Conference on Computer Vision and Pattern Recognition, 2004
Cited by 42 (4 self)
Abstract:
We propose a novel approach to point matching under large viewpoint and illumination changes that is suitable for accurate object pose estimation at a much lower computational cost than state-of-the-art methods. Most of these methods rely either on using ad hoc local descriptors or on estimating local affine deformations. By contrast, we treat wide-baseline matching of keypoints as a classification problem, in which each class corresponds to the set of all possible views of such a point. Given one or more images of a target object, we train the system by synthesizing a large number of views of individual keypoints and by using statistical classification tools to produce a compact description of this view set. At runtime, we rely on this description to decide to which class, if any, an observed feature belongs. This formulation allows us to use a classification method to reduce matching error rates, and to move some of the computational burden from matching to training, which can be performed beforehand. In the context of pose estimation, we present experimental results for both planar and non-planar objects in the presence of occlusions, illumination changes, and cluttered backgrounds. We show that our method is both reliable and suitable for initializing real-time applications.
An iterative method for multi-class cost-sensitive learning
In Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2004
Cited by 37 (1 self)
Abstract:
Cost-sensitive learning addresses the issue of classification in the presence of varying costs associated with different types of misclassification. In this paper, we present a method for solving multi-class cost-sensitive learning problems using any binary classification algorithm. This algorithm is derived using three key ideas: 1) iterative weighting; 2) expanding data space; and 3) gradient boosting with stochastic ensembles. We establish some theoretical guarantees concerning the performance of this method. In particular, we show that a certain variant possesses the boosting property, given a form of weak learning assumption on the component binary classifier. We also empirically evaluate the performance of the proposed method using benchmark data sets and verify that our method generally achieves better results than representative methods for cost-sensitive learning, in terms of predictive performance (cost minimization) and, in many cases, computational efficiency.
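As a baseline for what cost-sensitive prediction means (this is the standard Bayes decision rule, not the paper's iterative boosting method): given class-probability estimates and a cost matrix, predict the class with minimum expected cost rather than the most probable class.

```python
import numpy as np

def min_expected_cost_prediction(probs, cost_matrix):
    """probs: (n_samples, n_classes) class-probability estimates.
    cost_matrix[i, j]: cost of predicting class j when the truth is i.
    Returns, for each sample, the class minimizing expected cost."""
    expected = probs @ cost_matrix  # (n_samples, n_classes) expected costs
    return np.argmin(expected, axis=1)
```

With asymmetric costs this can pick an unlikely class: if misclassifying class 1 as class 0 costs 10x the reverse mistake, a 40% chance of class 1 is enough to predict it.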
LDAHash: Improved matching with smaller descriptors
2010
Cited by 34 (7 self)
Abstract:
SIFT-like local feature descriptors are ubiquitously employed in such computer vision applications as content-based retrieval, video analysis, copy detection, object recognition, photo tourism, and 3D reconstruction. Feature descriptors can be designed to be invariant to certain classes of photometric and geometric transformations, in particular, affine and intensity scale transformations. However, real transformations that an image can undergo can only be approximately modeled in this way, and thus most descriptors are only approximately invariant in practice. Secondly, descriptors are usually high-dimensional (e.g. SIFT is represented as a 128-dimensional vector). In large-scale retrieval and matching problems, this can pose challenges in storing and retrieving descriptor data. We map the descriptor vectors into the Hamming space, in which the Hamming metric is used to compare the resulting representations. This way, we reduce the size of the descriptors by representing them as short binary strings and learn descriptor invariance from examples. We show extensive experimental validation, demonstrating the advantage of the proposed approach.
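The core pipeline the abstract describes, short binary codes compared with the Hamming metric, can be sketched as follows (P and t are placeholders for the learned projection and thresholds, not the values LDAHash actually learns from matching and non-matching descriptor pairs):

```python
import numpy as np

def binarize_descriptors(X, P, t):
    """Map real-valued descriptors X (n, d) to binary codes by a linear
    projection P (d, b) followed by per-bit thresholds t (b,).
    P and t stand in for learned parameters."""
    return (X @ P > t).astype(np.uint8)

def hamming_distance(a, b):
    """Hamming metric between two binary codes."""
    return int(np.count_nonzero(a != b))
```

Comparing b-bit codes with popcount-style operations is what makes storage and matching cheap relative to 128-dimensional real vectors.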
The dynamics of adaboost: Cyclic behavior and convergence of margins
Journal of Machine Learning Research, 2004
Cited by 30 (7 self)
Abstract:
In order to study the convergence properties of the AdaBoost algorithm, we reduce AdaBoost to a nonlinear iterated map and study the evolution of its weight vectors. This dynamical systems approach allows us to understand AdaBoost's convergence properties completely in certain cases; for these cases we find stable cycles, allowing us to explicitly solve for AdaBoost's output. Using this unusual technique, we are able to show that AdaBoost does not always converge to a maximum margin combined classifier, answering an open question. In addition, we show that "non-optimal" AdaBoost (where the weak learning algorithm does not necessarily choose the best weak classifier at each iteration) may fail to converge to a maximum margin classifier, even if "optimal" AdaBoost produces a maximum margin. Also, we show that if AdaBoost cycles, it cycles among "support vectors", i.e., examples that achieve the same smallest margin.
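The iterated map in question acts on the vector of example weights; one step can be sketched as the standard AdaBoost reweighting (a simplified illustration of the map being analyzed, with the chosen weak classifier summarized by its signed margins):

```python
import numpy as np

def adaboost_weight_map(d, margins):
    """One step of the iterated map on the example-weight vector d.
    margins[i] = y_i * h_t(x_i) in {-1, +1} for the weak classifier
    chosen at this round."""
    margins = np.asarray(margins, dtype=float)
    edge = float(np.dot(d, margins))               # weighted edge r_t
    alpha = 0.5 * np.log((1 + edge) / (1 - edge))  # weak-classifier weight
    d_new = d * np.exp(-alpha * margins)
    return d_new / d_new.sum()
```

A defining property of the map: after the update, the just-chosen weak classifier has zero edge (weighted error exactly 1/2) on the new weights, which is what forces the algorithm to move on.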
Coresets, sparse greedy approximation and the Frank-Wolfe algorithm
Proceedings of the 19th Annual ACM-SIAM Symposium on Discrete Algorithms
Cited by 30 (1 self)
Abstract:
The problem of maximizing a concave function f(x) in a simplex S can be solved approximately by a simple greedy algorithm. For given k, the algorithm can find a point x(k) on a k-dimensional face of S, such that f(x(k)) ≥ f(x*) − O(1/k). Here f(x*) is the maximum value of f in S. This algorithm and analysis were known before, and related to problems of statistics and machine learning, such as boosting, regression, and density mixture estimation. In other work, coming from computational geometry, the existence of ε-coresets was shown for the minimum enclosing ball problem, by means of a simple greedy algorithm. Similar greedy algorithms, which are special cases of the Frank-Wolfe algorithm, were described for other enclosure problems. Here these results are tied together, stronger convergence results are reviewed, and several coreset bounds are generalized or strengthened.
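The greedy algorithm the abstract describes can be sketched for the probability simplex (a minimal Frank-Wolfe sketch with the standard step size; the gradient oracle is assumed):

```python
import numpy as np

def frank_wolfe_simplex(grad, n, k):
    """Greedy maximization of a concave f over the probability simplex.
    Each step moves toward the vertex e_j with the largest gradient
    coordinate, so after k steps the iterate is a convex combination of
    at most k + 1 vertices -- the 'coreset' view from the abstract.
    grad: callable returning the gradient of f at x."""
    x = np.zeros(n)
    x[0] = 1.0                       # start at a vertex of the simplex
    for t in range(1, k + 1):
        g = grad(x)
        j = int(np.argmax(g))        # best vertex by the linearized objective
        gamma = 2.0 / (t + 2)        # standard step size; gives the O(1/k) gap
        x *= (1.0 - gamma)
        x[j] += gamma
    return x
```

Because each update is a convex combination of the current iterate and a vertex, the iterate stays feasible without any projection step.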