Results 1  10
of
44
Solving multiclass learning problems via errorcorrecting output codes
 JOURNAL OF ARTIFICIAL INTELLIGENCE RESEARCH
, 1995
"... Multiclass learning problems involve nding a de nition for an unknown function f(x) whose range is a discrete set containing k>2values (i.e., k \classes"). The de nition is acquired by studying collections of training examples of the form hx i;f(x i)i. Existing approaches to multiclass l ..."
Abstract

Cited by 730 (8 self)
 Add to MetaCart
(Show Context)
Multiclass learning problems involve nding a de nition for an unknown function f(x) whose range is a discrete set containing k>2values (i.e., k \classes&quot;). The de nition is acquired by studying collections of training examples of the form hx i;f(x i)i. Existing approaches to multiclass learning problems include direct application of multiclass algorithms such as the decisiontree algorithms C4.5 and CART, application of binary concept learning algorithms to learn individual binary functions for each of the k classes, and application of binary concept learning algorithms with distributed output representations. This paper compares these three approaches to a new technique in which errorcorrecting codes are employed as a distributed output representation. We show that these output representations improve the generalization performance of both C4.5 and backpropagation on a wide range of multiclass learning tasks. We also demonstrate that this approach is robust with respect to changes in the size of the training sample, the assignment of distributed representations to particular classes, and the application of over tting avoidance techniques such as decisiontree pruning. Finally,we show thatlike the other methodsthe errorcorrecting code technique can provide reliable class probability estimates. Taken together, these results demonstrate that errorcorrecting output codes provide a generalpurpose method for improving the performance of inductive learning programs on multiclass problems.
Selection of relevant features and examples in machine learning
 ARTIFICIAL INTELLIGENCE
, 1997
"... In this survey, we review work in machine learning on methods for handling data sets containing large amounts of irrelevant information. We focus on two key issues: the problem of selecting relevant features, and the problem of selecting relevant examples. We describe the advances that have been mad ..."
Abstract

Cited by 590 (2 self)
 Add to MetaCart
(Show Context)
In this survey, we review work in machine learning on methods for handling data sets containing large amounts of irrelevant information. We focus on two key issues: the problem of selecting relevant features, and the problem of selecting relevant examples. We describe the advances that have been made on these topics in both empirical and theoretical work in machine learning, and we present a general framework that we use to compare different methods. We close with some challenges for future work in this area.
Toward efficient agnostic learning
 In Proceedings of the Fifth Annual ACM Workshop on Computational Learning Theory
, 1992
"... Abstract. In this paper we initiate an investigation of generalizations of the Probably Approximately Correct (PAC) learning model that attempt to significantly weaken the target function assumptions. The ultimate goal in this direction is informally termed agnostic learning, in which we make virtua ..."
Abstract

Cited by 236 (8 self)
 Add to MetaCart
Abstract. In this paper we initiate an investigation of generalizations of the Probably Approximately Correct (PAC) learning model that attempt to significantly weaken the target function assumptions. The ultimate goal in this direction is informally termed agnostic learning, in which we make virtually no assumptions on the target function. The name derives from the fact that as designers of learning algorithms, we give up the belief that Nature (as represented by the target function) has a simple or succinct explanation. We give a number of positive and negative results that provide an initial outline of the possibilities for agnostic learning. Our results include hardness results for the most obvious generalization of the PAC model to an agnostic setting, an efficient and general agnostic learning method based on dynamic programming, relationships between loss functions for agnostic learning, and an algorithm for a learning problem that involves hidden variables.
Extracting Comprehensible Models from Trained Neural Networks
, 1996
"... To Mom, Dad, and Susan, for their support and encouragement. ..."
Abstract

Cited by 83 (3 self)
 Add to MetaCart
(Show Context)
To Mom, Dad, and Susan, for their support and encouragement.
A Polynomialtime Algorithm for Learning Noisy Linear Threshold Functions
, 1996
"... In this paper we consider the problem of learning a linear threshold function (a halfspace in n dimensions, also called a "perceptron"). Methods for solving this problem generally fall into two categories. In the absence of noise, this problem can be formulated as a Linear Program and s ..."
Abstract

Cited by 75 (14 self)
 Add to MetaCart
In this paper we consider the problem of learning a linear threshold function (a halfspace in n dimensions, also called a "perceptron"). Methods for solving this problem generally fall into two categories. In the absence of noise, this problem can be formulated as a Linear Program and solved in polynomial time with the Ellipsoid Algorithm or Interior Point methods. Alternatively, simple greedy algorithms such as the Perceptron Algorithm are often used in practice and have certain provable noisetolerance properties; but, their running time depends on a separation parameter, which quanties the amount of "wiggle room" available for a solution, and can be exponential in the description length of the input. In this paper, we show how simple greedy methods can be used to nd weak hypotheses (hypotheses that correctly classify noticeably more than half of the examples) in polynomial time, without dependence on any separation parameter. Suitably combining these hypotheses results in a polynomialtime algorithm for learning linear threshold functions in the PAC model in the presence of random classification noise. (Also, a polynomialtime algorithm for learning linear threshold functions in the Statistical Query model of Kearns.) Our algorithm is based on a new method for removing outliers in data. Specifically, for any set S of points in R n , each given to b bits of precision, we show that one can remove only a small fraction of S so that in the remaining set T , for every vector v, max x2T (v x) 2 poly(n; b)E x2T (v x) 2 ; i.e., for any hyperplane through the origin, the maximum distance (squared) from a point in T to the plane is at most polynomially larger than the average. After removing these outliers, we are able to show that a modified v...
New results for learning noisy parities and halfspaces
 In Proceedings of the 47th Annual Symposium on Foundations of Computer Science (FOCS
, 2006
"... We address wellstudied problems concerning the learnability of parities and halfspaces in the presence of classification noise. Learning of parities under the uniform distribution with random classification noise, also called the noisy parity problem is a famous open problem in computational learni ..."
Abstract

Cited by 55 (11 self)
 Add to MetaCart
(Show Context)
We address wellstudied problems concerning the learnability of parities and halfspaces in the presence of classification noise. Learning of parities under the uniform distribution with random classification noise, also called the noisy parity problem is a famous open problem in computational learning. We reduce a number of basic problems regarding learning under the uniform distribution to learning of noisy parities, thus highlighting the central role of this problem for learning under the uniform distribution. We show that under the uniform distribution, learning parities with adversarial classification noise reduces to learning parities with random classification noise. Together with the parity learning algorithm of Blum et al. [BKW03], this gives the first nontrivial algorithm for learning parities with adversarial noise. We show that learning of DNF expressions reduces to learning noisy parities of just logarithmic number of variables. We show that learning of juntas reduces to learning noisy parities of variables. These reductions work even in the presence of random classification noise in the original DNF or junta. We then consider the problem of learning halfspaces over with adversarial noise or finding a halfspace that maximizes the agreement rate with a given set of examples. Finding the best halfspace is known to behard [GJ79, PV88] and many inapproximability results are known for this problem [ABSS97, HSH95, AK95, BDEL00, BB02]. We show that even if there is a halfspace that correctly classifies fraction of the given examples, it is hard to find a halfspace that is correct on a fraction for any
assuming
An Extensible MetaLearning Approach for Scalable and Accurate Inductive Learning
, 1996
"... Much of the research in inductive learning concentrates on problems with relatively small amounts of data. With the coming age of ubiquitous network computing, it is likely that orders of magnitude more data in databases will be available for various learning problems of real world importance. Som ..."
Abstract

Cited by 51 (8 self)
 Add to MetaCart
Much of the research in inductive learning concentrates on problems with relatively small amounts of data. With the coming age of ubiquitous network computing, it is likely that orders of magnitude more data in databases will be available for various learning problems of real world importance. Some learning algorithms assume that the entire data set fits into main memory, which is not feasible for massive amounts of data, especially for applications in data mining. One approach to handling a large data set is to partition the data set into subsets, run the learning algorithm on each of the subsets, and combine the results. Moreover, data can be inherently distributed across multiple sites on the network and merging all the data in one location can be expensive or prohibitive. In this thesis we propose, investigate, and evaluate a metalearning approach to integrating the results of mul...
General Bounds on Statistical Query Learning and PAC Learning with Noise via Hypothesis Boosting
, 1993
"... We derive general bounds on the complexity oflearning in the Statistical Query model and in the ..."
Abstract

Cited by 46 (5 self)
 Add to MetaCart
We derive general bounds on the complexity oflearning in the Statistical Query model and in the
Statistical Queries and Faulty PAC Oracles
 In Proceedings of the Sixth Annual ACM Workshop on Computational Learning Theory
, 1993
"... In this paper we study learning in the PAC model of Valiant [18] in which the example oracle used for learning may be faulty in one of two ways: either by misclassifying the example or by distorting the distribution of examples. We first consider models in which examples are misclassified. Kearns [1 ..."
Abstract

Cited by 40 (6 self)
 Add to MetaCart
(Show Context)
In this paper we study learning in the PAC model of Valiant [18] in which the example oracle used for learning may be faulty in one of two ways: either by misclassifying the example or by distorting the distribution of examples. We first consider models in which examples are misclassified. Kearns [12] recently showed that efficient learning in a new model using statistical queries is a sufficient condition for PAC learning with classification noise. We show that efficient learning with statistical queries is sufficient for learning in the PAC model with malicious error rate proportional to the required statistical query accuracy. One application of this result is a new lower bound for tolerable malicious error in learning monomials of k literals. This is the first such bound which is independent of the number of irrelevant attributes n. We also use the statistical query model to give sufficient conditions for using distribution specific algorithms on distributions outside their prescr...
Learning DNF Over The Uniform Distribution Using A Quantum Example Oracle
 SIAM J. COMPUT
, 1995
"... We generalize the notion of PAC learning from an example oracle to a notion of efficient learning on a quantum computer using a quantum example oracle. This quantum example oracle is a natural extension of the traditional PAC example oracle, and it immediately follows that all PAClearnable function ..."
Abstract

Cited by 38 (1 self)
 Add to MetaCart
We generalize the notion of PAC learning from an example oracle to a notion of efficient learning on a quantum computer using a quantum example oracle. This quantum example oracle is a natural extension of the traditional PAC example oracle, and it immediately follows that all PAClearnable function classes are learnable in the quantum model. Furthermore, we obtain positive quantum learning results for classes that are not known to be PAC learnable. Specifically, we show that DNF is efficiently learnable with respect to the uniform distribution by a quantum algorithm using a quantum example oracle. While it was already known that DNF is uniformlearnable using a membership oracle, we prove that a quantum example oracle with respect to uniform is less powerful than a membership oracle.