Results 1 - 10
of
92
Solving multiclass learning problems via error-correcting output codes
- Journal of Artificial Intelligence Research
, 1995
"... Multiclass learning problems involve nding a de nition for an unknown function f(x) whose range is a discrete set containing k>2values (i.e., k \classes"). The de nition is acquired by studying collections of training examples of the form hx i;f(x i)i. Existing approaches to multiclass learning ..."
Abstract
-
Cited by 448 (9 self)
- Add to MetaCart
Multiclass learning problems involve nding a de nition for an unknown function f(x) whose range is a discrete set containing k>2values (i.e., k \classes"). The de nition is acquired by studying collections of training examples of the form hx i;f(x i)i. Existing approaches to multiclass learning problems include direct application of multiclass algorithms such as the decision-tree algorithms C4.5 and CART, application of binary concept learning algorithms to learn individual binary functions for each of the k classes, and application of binary concept learning algorithms with distributed output representations. This paper compares these three approaches to a new technique in which error-correcting codes are employed as a distributed output representation. We show that these output representations improve the generalization performance of both C4.5 and backpropagation on a wide range of multiclass learning tasks. We also demonstrate that this approach is robust with respect to changes in the size of the training sample, the assignment of distributed representations to particular classes, and the application of over tting avoidance techniques such as decision-tree pruning. Finally,we show that|like the other methods|the error-correcting code technique can provide reliable class probability estimates. Taken together, these results demonstrate that error-correcting output codes provide a general-purpose method for improving the performance of inductive learning programs on multiclass problems. 1.
Boosting a Weak Learning Algorithm By Majority
, 1995
"... We present an algorithm for improving the accuracy of algorithms for learning binary concepts. The improvement is achieved by combining a large number of hypotheses, each of which is generated by training the given learning algorithm on a different set of examples. Our algorithm is based on ideas pr ..."
Abstract
-
Cited by 358 (15 self)
- Add to MetaCart
We present an algorithm for improving the accuracy of algorithms for learning binary concepts. The improvement is achieved by combining a large number of hypotheses, each of which is generated by training the given learning algorithm on a different set of examples. Our algorithm is based on ideas presented by Schapire in his paper "The strength of weak learnability", and represents an improvement over his results. The analysis of our algorithm provides general upper bounds on the resources required for learning in Valiant's polynomial PAC learning framework, which are the best general upper bounds known today. We show that the number of hypotheses that are combined by our algorithm is the smallest number possible. Other outcomes of our analysis are results regarding the representational power of threshold circuits, the relation between learnability and compression, and a method for parallelizing PAC learning algorithms. We provide extensions of our algorithms to cases in which the conc...
The Lack of A Priori Distinctions Between Learning Algorithms
, 1996
"... This is the first of two papers that use off-training set (OTS) error to investigate the assumption -free relationship between learning algorithms. This first paper discusses the senses in which there are no a priori distinctions between learning algorithms. (The second paper discusses the senses in ..."
Abstract
-
Cited by 103 (5 self)
- Add to MetaCart
This is the first of two papers that use off-training set (OTS) error to investigate the assumption -free relationship between learning algorithms. This first paper discusses the senses in which there are no a priori distinctions between learning algorithms. (The second paper discusses the senses in which there are such distinctions.) In this first paper it is shown, loosely speaking, that for any two algorithms A and B, there are "as many" targets (or priors over targets) for which A has lower expected OTS error than B as vice-versa, for loss functions like zero-one loss. In particular, this is true if A is cross-validation and B is "anti-cross-validation" (choose the learning algorithm with largest cross-validation error). This paper ends with a discussion of the implications of these results for computational learning theory. It is shown that one can not say: if empirical misclassification rate is low; the Vapnik-Chervonenkis dimension of your generalizer is small; and the trainin...
Bounding the Vapnik-Chervonenkis dimension of concept classes parameterized by real numbers
- Machine Learning
, 1995
"... Abstract. The Vapnik-Chervonenkis (V-C) dimension is an important combinatorial tool in the analysis of learning problems in the PAC framework. For polynomial learnability, we seek upper bounds on the V-C dimension that are polynomial in the syntactic complexity of concepts. Such upper bounds are au ..."
Abstract
-
Cited by 89 (1 self)
- Add to MetaCart
Abstract. The Vapnik-Chervonenkis (V-C) dimension is an important combinatorial tool in the analysis of learning problems in the PAC framework. For polynomial learnability, we seek upper bounds on the V-C dimension that are polynomial in the syntactic complexity of concepts. Such upper bounds are automatic for discrete concept classes, but hitherto little has been known about what general conditions guarantee polynomial bounds on V-C dimension for classes in which concepts and examples are represented by tuples of real numbers. In this paper, we show that for two general kinds of concept class the V-C dimension is polynomially bounded in the number of real numbers used to define a problem instance. One is classes where the criterion for membership of an instance in a concept can be expressed as a formula (in the first-order theory of the reals) with fixed quantification depth and exponentially-bounded length, whose atomic predicates are polynomial inequalities of exponentially-bounded degree. The other is classes where containment of an instance in a concept is testable in polynomial time, assuming we may compute standard arithmetic operations on reals exactly in constant time. Our results show that in the continuous case, as in the discrete, the real barrier to efficient learning in the Occam sense is complexity-theoretic and not information-theoretic. We present examples to show how these results apply to concept classes defined by geometrical figures and neural nets, and derive polynomial bounds on the V-C dimension for these classes. Keywords: Concept learning, information theory, Vapnik-Chervonenkis dimension, Milnor’s theorem 1.
Special Purpose Parallel Computing
- Lectures on Parallel Computation
, 1993
"... A vast amount of work has been done in recent years on the design, analysis, implementation and verification of special purpose parallel computing systems. This paper presents a survey of various aspects of this work. A long, but by no means complete, bibliography is given. 1. Introduction Turing ..."
Abstract
-
Cited by 77 (5 self)
- Add to MetaCart
A vast amount of work has been done in recent years on the design, analysis, implementation and verification of special purpose parallel computing systems. This paper presents a survey of various aspects of this work. A long, but by no means complete, bibliography is given. 1. Introduction Turing [365] demonstrated that, in principle, a single general purpose sequential machine could be designed which would be capable of efficiently performing any computation which could be performed by a special purpose sequential machine. The importance of this universality result for subsequent practical developments in computing cannot be overstated. It showed that, for a given computational problem, the additional efficiency advantages which could be gained by designing a special purpose sequential machine for that problem would not be great. Around 1944, von Neumann produced a proposal [66, 389] for a general purpose storedprogram sequential computer which captured the fundamental principles of...
Using taxonomy, discriminants, and signatures for navigating in text databases
- In Proceedings of the 23rd VLDB Conference
, 1997
"... We explore how to organize a text database hierarchically to aid better searching and browsing. We propose to exploit the natural hierarchy of topics, or taxonomy, that many corpora,suchas internet directories, digital libraries, and patent databases enjoy. In our system, the user navigates through ..."
Abstract
-
Cited by 67 (4 self)
- Add to MetaCart
We explore how to organize a text database hierarchically to aid better searching and browsing. We propose to exploit the natural hierarchy of topics, or taxonomy, that many corpora,suchas internet directories, digital libraries, and patent databases enjoy. In our system, the user navigates through the query response not as a at unstructured list, but embedded in the familiar taxonomy, and annotated with document signatures computed dynamically with respect to where the user is located at any time. Weshowhowto update such databases with new documents with high speed and accuracy. Weuse techniques from statistical pattern recognition to e ciently separate the feature words or discriminants from the noise words at each node of the taxonomy. Using these, we build a multi-level classi er. At each node, this classi er can ignore the large number of noise words in a document. Thus the classi er has a small model size and is very fast. However, owing to the use of context-sensitive features, the classi er is very accurate. We report on experiences with the Reuters newswire benchmark, the US Patent database, and web document samples from Yahoo!. 1
First order jk-clausal theories are PAC-learnable
- Artificial Intelligence
, 1994
"... We present positive PAC-learning results for the nonmonotonic inductive logic programming setting. In particular, we show that first order range-restricted clausal theories that consist of clauses with up to k literals of size at most j each are polynomialsample polynomial-time PAC-learnable with on ..."
Abstract
-
Cited by 63 (27 self)
- Add to MetaCart
We present positive PAC-learning results for the nonmonotonic inductive logic programming setting. In particular, we show that first order range-restricted clausal theories that consist of clauses with up to k literals of size at most j each are polynomialsample polynomial-time PAC-learnable with one-sided error from positive examples only. In our framework, concepts are clausal theories and examples are finite interpretations. We discuss the problems encountered when learning theories which only have infinite non-trivial models and propose a way to avoid these problems using a representation change called flattening. Finally, we compare our results to PAC-learnability results for the normal inductive logic programming setting. 1
SEARCH, polynomial complexity, and the fast messy genetic algorithm
, 1995
"... Blackbox optimization---optimization in presence of limited knowledge about the objective function---has recently enjoyed a large increase in interest because of the demand from the practitioners. This has triggered a race for new high performance algorithms for solving large, difficult problems. Si ..."
Abstract
-
Cited by 49 (10 self)
- Add to MetaCart
Blackbox optimization---optimization in presence of limited knowledge about the objective function---has recently enjoyed a large increase in interest because of the demand from the practitioners. This has triggered a race for new high performance algorithms for solving large, difficult problems. Simulated annealing, genetic algorithms, tabu search are some examples. Unfortunately, each of these algorithms is creating a separate field in itself and their use in practice is often guided by personal discretion rather than scientific reasons. The primary reason behind this confusing situation is the lack of any comprehensive understanding about blackbox search. This dissertation takes a step toward clearing some of the confusion. The main objectives of this dissertation are: 1. present SEARCH (Search Envisioned As Relation & Class Hierarchizing)---an alternate perspective of blackbox optimization and its quantitative analysis that lays the foundation essential for transcending the limits of random enumerative search; 2. design and testing of the fast messy genetic algorithm. SEARCH is a general framework for understanding blackbox optimization in terms of relations,
Neural Networks with Quadratic VC Dimension
, 1996
"... This paper shows that neural networks which use continuous activation functions have VC dimension at least as large as the square of the number of weights w. This result settles a long-standing open question, namely whether the well-known O(w log w) bound, known for hard-threshold nets, also held fo ..."
Abstract
-
Cited by 46 (7 self)
- Add to MetaCart
This paper shows that neural networks which use continuous activation functions have VC dimension at least as large as the square of the number of weights w. This result settles a long-standing open question, namely whether the well-known O(w log w) bound, known for hard-threshold nets, also held for more general sigmoidal nets. Implications for the number of samples needed for valid generalization are discussed.
Distance Measures for Point Sets and Their Computation
- Acta Informatica
, 1997
"... We consider the problem of measuring the similarity or distance between two finite sets of points in a metric space, and computing the measure. This problem has applications in, e.g., computational geometry, philosophy of science, updating or changing theories, and machine learning. We review some o ..."
Abstract
-
Cited by 45 (2 self)
- Add to MetaCart
We consider the problem of measuring the similarity or distance between two finite sets of points in a metric space, and computing the measure. This problem has applications in, e.g., computational geometry, philosophy of science, updating or changing theories, and machine learning. We review some of the distance functions proposed in the literature, among them the minimum distance link measure, the surjection measure, and the fair surjection measure, and supply polynomial time algorithms for the computation of these measures. Furthermore, we introduce the minimum link measure, a new distance function which is more appealing than the other distance functions mentioned. We also present a polynomial time algorithm for computing this new measure. We further address the issue of defining a metric on point sets. We present the metric infimum method that constructs a metric from any distance functions on point sets. In particular, the metric infimum of the minimum link measure is a quite int...

