On the algorithmic implementation of multiclass kernel-based vector machines
Journal of Machine Learning Research, 2001
Cited by 239 (10 self)
In this paper we describe the algorithmic implementation of multiclass kernel-based vector machines. Our starting point is a generalized notion of the margin to multiclass problems. Using this notion we cast multiclass categorization problems as a constrained optimization problem with a quadratic objective function. Unlike most previous approaches, which typically decompose a multiclass problem into multiple independent binary classification tasks, our notion of margin yields a direct method for training multiclass predictors. By using the dual of the optimization problem we are able to incorporate kernels with a compact set of constraints and decompose the dual problem into multiple optimization problems of reduced size. We describe an efficient fixed-point algorithm for solving the reduced optimization problems and prove its convergence. We then discuss technical details that yield significant running time improvements for large datasets. Finally, we describe various experiments with our approach comparing it to previously studied kernel-based methods. Our experiments indicate that for multiclass problems we attain state-of-the-art accuracy.
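The abstract does not state the margin formula, so the following is only a minimal sketch of one common form of a generalized multiclass margin loss: the largest score among the classes (with a unit penalty added to every wrong class) minus the score of the correct class. The function name and the example scores are assumptions made for illustration.

```python
import numpy as np

def multiclass_hinge_loss(scores, label):
    """Multiclass hinge loss for a single example.

    scores: one real-valued score per class (hypothetical input).
    label:  index of the correct class.
    Returns max_r (scores[r] + 1 - [r == label]) - scores[label].
    """
    scores = np.asarray(scores, dtype=float)
    margins = scores + 1.0       # every class gets a unit penalty...
    margins[label] -= 1.0        # ...except the correct one
    return float(np.max(margins) - scores[label])
```

The loss is zero only when the correct class beats every other class by at least one unit, which is what lets a single optimization problem handle all classes at once instead of a set of independent binary problems.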
Logistic Regression, AdaBoost and Bregman Distances
2000
Cited by 171 (39 self)
We give a unified account of boosting and logistic regression in which each learning problem is cast in terms of optimization of Bregman distances. The striking similarity of the two problems in this framework allows us to design and analyze algorithms for both simultaneously, and to easily adapt algorithms designed for one problem to the other. For both problems, we give new algorithms and explain their potential advantages over existing methods. These algorithms can be divided into two types based on whether the parameters are iteratively updated sequentially (one at a time) or in parallel (all at once). We also describe a parameterized family of algorithms which interpolates smoothly between these two extremes. For all of the algorithms, we give convergence proofs using a general formalization of the auxiliary-function proof technique. As one of our sequential-update algorithms is equivalent to AdaBoost, this provides the first general proof of convergence for AdaBoost. We show that all of our algorithms generalize easily to the multiclass case, and we contrast the new algorithms with iterative scaling. We conclude with a few experimental results with synthetic data that highlight the behavior of the old and newly proposed algorithms in different settings.
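For orientation, here is a minimal sketch of a single round of the standard AdaBoost reweighting step for binary labels in {-1, +1}. It is not the Bregman-distance formulation the abstract describes; the function name and the toy inputs are assumptions.

```python
import numpy as np

def adaboost_round(weights, y, h):
    """One AdaBoost reweighting step.

    weights: current distribution over examples (sums to 1).
    y, h:    true labels and weak-hypothesis predictions in {-1, +1}.
    Assumes the weighted error is strictly between 0 and 1.
    """
    weights = np.asarray(weights, dtype=float)
    y = np.asarray(y)
    h = np.asarray(h)
    err = np.sum(weights[y != h])              # weighted training error
    alpha = 0.5 * np.log((1.0 - err) / err)    # weight of this hypothesis
    new = weights * np.exp(-alpha * y * h)     # boost the mistakes
    return alpha, new / new.sum()
```

After the update, the misclassified examples carry exactly half of the total weight, so the same weak hypothesis looks no better than chance on the new distribution.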
On the Learnability and Design of Output Codes for Multiclass Problems
In Proceedings of the Thirteenth Annual Conference on Computational Learning Theory, 2000
Cited by 109 (5 self)
Output coding is a general framework for solving multiclass categorization problems. Previous research on output codes has focused on building multiclass machines given predefined output codes. In this paper we discuss for the first time the problem of designing output codes for multiclass problems. For the design problem of discrete codes, which have been used extensively in previous works, we present mostly negative results. We then introduce the notion of continuous codes and cast the design problem of continuous codes as a constrained optimization problem. We describe three optimization problems corresponding to three different norms of the code matrix. Interestingly, for the l2 norm our formalism results in a quadratic program whose dual does not depend on the length of the code. A special case of our formalism provides a multiclass scheme for building support vector machines which can be solved efficiently. We give a time and space efficient algorithm for solving the quadratic program. Preliminary experiments with synthetic data show that our algorithm is often two orders of magnitude faster than standard quadratic programming packages. We conclude with the generalization properties of the algorithm. Keywords: Multiclass categorization, output coding, SVM
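To make the output-coding framework concrete, the sketch below shows how a predefined discrete code is typically used at prediction time: each class owns a row of a {-1, +1} code matrix, each column corresponds to one binary classifier, and the predicted class is the row closest in Hamming distance to the vector of binary outputs. The code matrix and function name are assumptions, and the paper's continuous codes are not covered here.

```python
import numpy as np

def decode_output_code(code_matrix, binary_predictions):
    """Return the class whose code word is nearest in Hamming distance.

    code_matrix:        shape (num_classes, code_length), entries in {-1, +1}.
    binary_predictions: outputs of the code_length binary classifiers.
    Ties are broken in favour of the lowest class index.
    """
    codes = np.asarray(code_matrix)
    preds = np.asarray(binary_predictions)
    distances = np.sum(codes != preds, axis=1)   # mismatches per class row
    return int(np.argmin(distances))
```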
The 2001 Trading Agent Competition
IEEE Internet Computing, 2000
Cited by 85 (9 self)
The 2001 Trading Agent Competition was the second in a series of events aiming to shed light on research issues in automating trading strategies. Based on a challenging market scenario in the domain of travel shopping, the competition presents agents with difficult issues in bidding strategy, market prediction, and resource allocation. Entrants in 2001 demonstrated substantial progress over the prior year, with the overall level of competence exhibited suggesting that trading in online markets is a viable domain for highly autonomous agents.
Statistical Behavior and Consistency of Classification Methods based on Convex Risk Minimization
2001
Cited by 85 (4 self)
We study how closely the optimal Bayes error rate can be approximately reached using a classification algorithm that computes a classifier by minimizing a convex upper bound of the classification error function. The measurement of closeness is characterized by the loss function used in the estimation. We show that such a classification scheme can be generally regarded as a (non maximum-likelihood) conditional in-class probability estimate, and we use this analysis to compare various convex loss functions that have appeared in the literature. Furthermore, the theoretical insight allows us to design good loss functions with desirable properties. Another aspect of our analysis is to demonstrate the consistency of certain classification methods using convex risk minimization.
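As an illustration of "a convex upper bound of the classification error function", the sketch below writes three widely used convex losses as functions of the margin m = y * f(x) and compares them with the 0-1 error. The choice of losses is an assumption; the abstract does not list the ones it analyses.

```python
import numpy as np

def zero_one(m):
    """Classification error: 1 when the margin is non-positive."""
    return (np.asarray(m, dtype=float) <= 0).astype(float)

def hinge(m):
    """Hinge loss, as used by support vector machines."""
    return np.maximum(0.0, 1.0 - np.asarray(m, dtype=float))

def exponential(m):
    """Exponential loss, as used by AdaBoost."""
    return np.exp(-np.asarray(m, dtype=float))

def logistic(m):
    """Logistic loss in base 2, scaled so that it equals 1 at m = 0."""
    return np.log2(1.0 + np.exp(-np.asarray(m, dtype=float)))
```

Each of these is convex in m and lies on or above the 0-1 error everywhere, so minimizing its average over a sample also pushes down an upper bound on the training error.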
Fast and Robust Classification using Asymmetric AdaBoost and a Detector Cascade
Advances in Neural Information Processing Systems 14, 2001
Cited by 62 (0 self)
This paper develops a new approach for extremely fast detection in domains where the distribution of positive and negative examples is highly skewed (e.g. face detection or database retrieval). In such domains a cascade of simple classifiers each trained to achieve high detection rates and modest false positive rates can yield a final detector with many desirable features: including high detection rates, very low false positive rates, and fast performance. Achieving extremely high detection rates, rather than low error, is not a task typically addressed by machine learning algorithms.
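The cascade idea can be sketched as a chain of stages with early rejection: an input is reported as a detection only if every stage accepts it, so most negatives are discarded by the cheap early stages. The representation of a stage as a (score function, threshold) pair is an assumption for illustration, and the asymmetric AdaBoost training of each stage is not shown.

```python
def cascade_detect(stages, x):
    """Evaluate a detector cascade on one input.

    stages: list of (score_fn, threshold) pairs, ordered cheapest first
            (hypothetical representation).
    Returns True only if every stage scores x at or above its threshold.
    """
    for score_fn, threshold in stages:
        if score_fn(x) < threshold:
            return False   # rejected early; later stages are never run
    return True
```

Because the overall detection rate is the product of the per-stage rates, each stage has to keep nearly all positives while it only needs to remove a modest share of the negatives.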
Incorporating Prior Knowledge into Boosting
2002
Cited by 41 (0 self)
We describe a modification to the AdaBoost algorithm that permits the incorporation of prior human knowledge as a means of compensating for a shortage of training data. We give a convergence result for the algorithm.
ATTac-2001: A learning, autonomous bidding agent
In Agent Mediated Electronic Commerce IV. LNCS, 2002
Cited by 21 (1 self)
Auctions are becoming an increasingly popular method for transacting business, especially over the Internet. This paper presents a general approach to building autonomous bidding agents to bid in multiple simultaneous auctions for interacting goods. The core of our approach is learning a model of the empirical price dynamics based on past data and using the model to analytically calculate, to the greatest extent possible, optimal bids. This approach is fully implemented as ATTac-2001, a top-scoring agent in the second Trading Agent Competition (TAC-01). ATTac-2001 uses boosting techniques to learn conditional distributions of auction clearing prices. We present experiments demonstrating the effectiveness of this predictor relative to several reasonable alternatives.
Hyperlink Ensembles: A Case Study in Hypertext Classification
Information Fusion, 2001
Cited by 20 (4 self)
In this paper, we introduce hyperlink ensembles, a novel type of ensemble classifier for classifying hypertext documents. Instead of using the text on a page for deriving features that can be used for training a classifier, we suggest using portions of text from all pages that point to the target page. A hyperlink ensemble is formed by obtaining one prediction for each hyperlink that points to a page.
A Geometric Approach to Leveraging Weak Learners
Computational Learning Theory: 4th European Conference (EuroCOLT '99), 1998
Cited by 20 (4 self)
AdaBoost is a popular and effective leveraging procedure for improving the hypotheses generated by weak learning algorithms. AdaBoost and many other leveraging algorithms can be viewed as performing a constrained gradient descent over a potential function. At each iteration the distribution over the sample given to the weak learner is the direction of steepest descent. We introduce a new leveraging algorithm based on a natural potential function. For this potential function, the direction of steepest descent can have negative components. Therefore we provide two transformations for obtaining suitable distributions from these directions of steepest descent. The resulting algorithms have bounds that are incomparable to AdaBoost's, and their empirical performance is similar to AdaBoost's.
1 Introduction
Algorithms like AdaBoost [7] that are able to improve the hypotheses generated by weak learning methods have great potential and practical benefits. We call any such algorithm a leverag...

