Results 1-10 of 97
Solving multiclass learning problems via error-correcting output codes
 Journal of Artificial Intelligence Research
, 1995
Cited by 569 (9 self)
Multiclass learning problems involve finding a definition for an unknown function f(x) whose range is a discrete set containing k > 2 values (i.e., k "classes"). The definition is acquired by studying collections of training examples of the form ⟨x_i, f(x_i)⟩. Existing approaches to multiclass learning problems include direct application of multiclass algorithms such as the decision-tree algorithms C4.5 and CART, application of binary concept learning algorithms to learn individual binary functions for each of the k classes, and application of binary concept learning algorithms with distributed output representations. This paper compares these three approaches to a new technique in which error-correcting codes are employed as a distributed output representation. We show that these output representations improve the generalization performance of both C4.5 and backpropagation on a wide range of multiclass learning tasks. We also demonstrate that this approach is robust with respect to changes in the size of the training sample, the assignment of distributed representations to particular classes, and the application of overfitting avoidance techniques such as decision-tree pruning. Finally, we show that, like the other methods, the error-correcting code technique can provide reliable class probability estimates. Taken together, these results demonstrate that error-correcting output codes provide a general-purpose method for improving the performance of inductive learning programs on multiclass problems.
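The decoding step of an error-correcting output code can be illustrated with a minimal Python sketch. It assumes that one binary classifier has already been trained per codeword bit and only shows how a predicted bit string is mapped back to a class. The class labels, the 7-bit code matrix, and the names `CODEWORDS`, `hamming`, and `decode` are illustrative assumptions, not taken from the paper.

```python
from typing import Dict, Sequence

# Hypothetical code matrix for k = 4 classes: one 7-bit codeword per class.
# Every pair of codewords differs in exactly 4 positions, so a single
# wrong bit from the binary classifiers can still be corrected.
CODEWORDS: Dict[str, Sequence[int]] = {
    "class_0": (1, 1, 1, 1, 1, 1, 1),
    "class_1": (0, 0, 0, 0, 1, 1, 1),
    "class_2": (0, 0, 1, 1, 0, 0, 1),
    "class_3": (0, 1, 0, 1, 0, 1, 0),
}


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of positions at which two equal-length bit strings differ."""
    return sum(x != y for x, y in zip(a, b))


def decode(predicted_bits: Sequence[int]) -> str:
    """Return the class whose codeword is nearest in Hamming distance."""
    return min(CODEWORDS, key=lambda label: hamming(CODEWORDS[label], predicted_bits))


if __name__ == "__main__":
    # Suppose the seven binary classifiers output class_2's codeword
    # with the first bit predicted incorrectly.
    noisy = (1, 0, 1, 1, 0, 0, 1)
    print(decode(noisy))
```

With a minimum pairwise distance of 4, nearest-codeword decoding in this sketch tolerates any single bit error; longer codes with larger separation would tolerate more.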
Boosting a Weak Learning Algorithm By Majority
, 1995
Cited by 423 (15 self)
We present an algorithm for improving the accuracy of algorithms for learning binary concepts. The improvement is achieved by combining a large number of hypotheses, each of which is generated by training the given learning algorithm on a different set of examples. Our algorithm is based on ideas presented by Schapire in his paper "The strength of weak learnability", and represents an improvement over his results. The analysis of our algorithm provides general upper bounds on the resources required for learning in Valiant's polynomial PAC learning framework, which are the best general upper bounds known today. We show that the number of hypotheses that are combined by our algorithm is the smallest number possible. Other outcomes of our analysis are results regarding the representational power of threshold circuits, the relation between learnability and compression, and a method for parallelizing PAC learning algorithms. We provide extensions of our algorithms to cases in which the conc...
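The idea of combining many hypotheses can be illustrated with the simplest possible combiner, an unweighted majority vote over binary hypotheses. This is only a sketch of the combination step: it does not reproduce how the paper chooses the example sets used to train each hypothesis, and the threshold hypotheses and the name `majority_vote` are assumptions made for illustration.

```python
from typing import Callable, Sequence

# A binary hypothesis maps an input to 0 or 1.
Hypothesis = Callable[[float], int]


def majority_vote(hypotheses: Sequence[Hypothesis], x: float) -> int:
    """Predict 1 if more than half of the hypotheses predict 1, else 0."""
    votes_for_one = sum(h(x) for h in hypotheses)
    return 1 if 2 * votes_for_one > len(hypotheses) else 0


if __name__ == "__main__":
    # Three hypothetical weak hypotheses: thresholds that each misplace
    # the decision boundary slightly.
    hypotheses = [
        lambda x: int(x > 0.4),
        lambda x: int(x > 0.5),
        lambda x: int(x > 0.6),
    ]
    for x in (0.30, 0.45, 0.55, 0.70):
        print(x, majority_vote(hypotheses, x))
```

In this toy case the vote behaves like the median threshold, so a hypothesis that errs on a given input is outvoted whenever the other two agree.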
The Lack of A Priori Distinctions Between Learning Algorithms
, 1996
Cited by 123 (5 self)
This is the first of two papers that use off-training-set (OTS) error to investigate the assumption-free relationship between learning algorithms. This first paper discusses the senses in which there are no a priori distinctions between learning algorithms. (The second paper discusses the senses in which there are such distinctions.) In this first paper it is shown, loosely speaking, that for any two algorithms A and B, there are "as many" targets (or priors over targets) for which A has lower expected OTS error than B as vice versa, for loss functions like zero-one loss. In particular, this is true if A is cross-validation and B is "anti-cross-validation" (choose the learning algorithm with largest cross-validation error). This paper ends with a discussion of the implications of these results for computational learning theory. It is shown that one cannot say: if empirical misclassification rate is low; the Vapnik-Chervonenkis dimension of your generalizer is small; and the trainin...
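A toy enumeration can make the "as many targets" statement concrete. The sketch below is a strong simplification of the paper's setting and rests on assumptions of its own: a five-point input domain, fixed noise-free training inputs, a uniform average over all Boolean targets, and two hypothetical learners (`majority_learner` and its opposite). Under those assumptions both learners have the same mean OTS zero-one error.

```python
from itertools import product
from typing import Callable, Sequence

DOMAIN_SIZE = 5          # inputs are the indices 0..4
TRAIN_IDX = (0, 1, 2)    # inputs whose labels the learner sees
TEST_IDX = (3, 4)        # off-training-set inputs

Learner = Callable[[Sequence[int]], int]


def majority_learner(train_labels: Sequence[int]) -> int:
    """Predict the most common training label on every unseen input."""
    return 1 if 2 * sum(train_labels) > len(train_labels) else 0


def anti_majority_learner(train_labels: Sequence[int]) -> int:
    """Predict the opposite of the majority learner."""
    return 1 - majority_learner(train_labels)


def mean_ots_error(learner: Learner) -> float:
    """Average zero-one OTS error over all 2**DOMAIN_SIZE Boolean targets."""
    targets = list(product((0, 1), repeat=DOMAIN_SIZE))
    total = 0.0
    for target in targets:
        train_labels = [target[i] for i in TRAIN_IDX]
        guess = learner(train_labels)
        wrong = sum(guess != target[i] for i in TEST_IDX)
        total += wrong / len(TEST_IDX)
    return total / len(targets)


if __name__ == "__main__":
    print(mean_ots_error(majority_learner))
    print(mean_ots_error(anti_majority_learner))
```

For each possible training labeling, the unseen labels range over every combination, so any fixed guess is wrong on exactly half of them on average; that is why the two learners tie in this enumeration.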
Bounding the Vapnik-Chervonenkis dimension of concept classes parameterized by real numbers
 Machine Learning
, 1995
Cited by 91 (1 self)
The Vapnik-Chervonenkis (VC) dimension is an important combinatorial tool in the analysis of learning problems in the PAC framework. For polynomial learnability, we seek upper bounds on the VC dimension that are polynomial in the syntactic complexity of concepts. Such upper bounds are automatic for discrete concept classes, but hitherto little has been known about what general conditions guarantee polynomial bounds on VC dimension for classes in which concepts and examples are represented by tuples of real numbers. In this paper, we show that for two general kinds of concept class the VC dimension is polynomially bounded in the number of real numbers used to define a problem instance. One is classes where the criterion for membership of an instance in a concept can be expressed as a formula (in the first-order theory of the reals) with fixed quantification depth and exponentially-bounded length, whose atomic predicates are polynomial inequalities of exponentially-bounded degree. The other is classes where containment of an instance in a concept is testable in polynomial time, assuming we may compute standard arithmetic operations on reals exactly in constant time. Our results show that in the continuous case, as in the discrete, the real barrier to efficient learning in the Occam sense is complexity-theoretic and not information-theoretic. We present examples to show how these results apply to concept classes defined by geometrical figures and neural nets, and derive polynomial bounds on the VC dimension for these classes. Keywords: Concept learning, information theory, Vapnik-Chervonenkis dimension, Milnor's theorem
Special Purpose Parallel Computing
 Lectures on Parallel Computation
, 1993
Cited by 77 (5 self)
A vast amount of work has been done in recent years on the design, analysis, implementation and verification of special purpose parallel computing systems. This paper presents a survey of various aspects of this work. A long, but by no means complete, bibliography is given.
1. Introduction
Turing [365] demonstrated that, in principle, a single general purpose sequential machine could be designed which would be capable of efficiently performing any computation which could be performed by a special purpose sequential machine. The importance of this universality result for subsequent practical developments in computing cannot be overstated. It showed that, for a given computational problem, the additional efficiency advantages which could be gained by designing a special purpose sequential machine for that problem would not be great. Around 1944, von Neumann produced a proposal [66, 389] for a general purpose stored-program sequential computer which captured the fundamental principles of...
Using taxonomy, discriminants, and signatures for navigating in text databases
 In Proceedings of the 23rd VLDB Conference
, 1997
Cited by 76 (5 self)
We explore how to organize a text database hierarchically to aid better searching and browsing. We propose to exploit the natural hierarchy of topics, or taxonomy, that many corpora, such as internet directories, digital libraries, and patent databases, enjoy. In our system, the user navigates through the query response not as a flat unstructured list, but embedded in the familiar taxonomy, and annotated with document signatures computed dynamically with respect to where the user is located at any time. We show how to update such databases with new documents with high speed and accuracy. We use techniques from statistical pattern recognition to efficiently separate the feature words or discriminants from the noise words at each node of the taxonomy. Using these, we build a multi-level classifier. At each node, this classifier can ignore the large number of noise words in a document. Thus the classifier has a small model size and is very fast. However, owing to the use of context-sensitive features, the classifier is very accurate. We report on experiences with the Reuters newswire benchmark, the US Patent database, and web document samples from Yahoo!.
First order jk-clausal theories are PAC-learnable
 Artificial Intelligence
, 1994
Cited by 64 (27 self)
We present positive PAC-learning results for the nonmonotonic inductive logic programming setting. In particular, we show that first order range-restricted clausal theories that consist of clauses with up to k literals of size at most j each are polynomial-sample polynomial-time PAC-learnable with one-sided error from positive examples only. In our framework, concepts are clausal theories and examples are finite interpretations. We discuss the problems encountered when learning theories which only have infinite nontrivial models and propose a way to avoid these problems using a representation change called flattening. Finally, we compare our results to PAC-learnability results for the normal inductive logic programming setting.
Introduction to Statistical Learning Theory
 In O. Bousquet, U. v. Luxburg, and G. Rätsch (Editors)
, 2004
Distance Measures for Point Sets and Their Computation
 Acta Informatica
, 1997
Cited by 50 (2 self)
We consider the problem of measuring the similarity or distance between two finite sets of points in a metric space, and computing the measure. This problem has applications in, e.g., computational geometry, philosophy of science, updating or changing theories, and machine learning. We review some of the distance functions proposed in the literature, among them the minimum distance link measure, the surjection measure, and the fair surjection measure, and supply polynomial time algorithms for the computation of these measures. Furthermore, we introduce the minimum link measure, a new distance function which is more appealing than the other distance functions mentioned. We also present a polynomial time algorithm for computing this new measure. We further address the issue of defining a metric on point sets. We present the metric infimum method that constructs a metric from any distance functions on point sets. In particular, the metric infimum of the minimum link measure is a quite int...
SEARCH, polynomial complexity, and the fast messy genetic algorithm
, 1995
Cited by 50 (10 self)
Blackbox optimization (optimization in the presence of limited knowledge about the objective function) has recently enjoyed a large increase in interest because of the demand from practitioners. This has triggered a race for new high performance algorithms for solving large, difficult problems. Simulated annealing, genetic algorithms, and tabu search are some examples. Unfortunately, each of these algorithms is creating a separate field in itself and their use in practice is often guided by personal discretion rather than scientific reasons. The primary reason behind this confusing situation is the lack of any comprehensive understanding about blackbox search. This dissertation takes a step toward clearing some of the confusion. The main objectives of this dissertation are: 1. present SEARCH (Search Envisioned As Relation & Class Hierarchizing), an alternate perspective of blackbox optimization and its quantitative analysis that lays the foundation essential for transcending the limits of random enumerative search; 2. design and testing of the fast messy genetic algorithm. SEARCH is a general framework for understanding blackbox optimization in terms of relations,