Results 1 - 10
of
180
Solving multiclass learning problems via error-correcting output codes
- Journal of Artificial Intelligence Research
, 1995
"... Multiclass learning problems involve nding a de nition for an unknown function f(x) whose range is a discrete set containing k>2values (i.e., k \classes"). The de nition is acquired by studying collections of training examples of the form hx i;f(x i)i. Existing approaches to multiclass learning ..."
Abstract
-
Cited by 448 (9 self)
- Add to MetaCart
Multiclass learning problems involve nding a de nition for an unknown function f(x) whose range is a discrete set containing k>2values (i.e., k \classes"). The de nition is acquired by studying collections of training examples of the form hx i;f(x i)i. Existing approaches to multiclass learning problems include direct application of multiclass algorithms such as the decision-tree algorithms C4.5 and CART, application of binary concept learning algorithms to learn individual binary functions for each of the k classes, and application of binary concept learning algorithms with distributed output representations. This paper compares these three approaches to a new technique in which error-correcting codes are employed as a distributed output representation. We show that these output representations improve the generalization performance of both C4.5 and backpropagation on a wide range of multiclass learning tasks. We also demonstrate that this approach is robust with respect to changes in the size of the training sample, the assignment of distributed representations to particular classes, and the application of over tting avoidance techniques such as decision-tree pruning. Finally,we show that|like the other methods|the error-correcting code technique can provide reliable class probability estimates. Taken together, these results demonstrate that error-correcting output codes provide a general-purpose method for improving the performance of inductive learning programs on multiclass problems. 1.
From HMM's to Segment Models: A Unified View of Stochastic Modeling for Speech Recognition
, 1996
"... ..."
An Application of Recurrent Nets to Phone Probability Estimation
- IEEE Transactions on Neural Networks
, 1994
"... This paper presents an application of recurrent networks for phone probability estimation in large vocabulary speech recognition. The need for efficient exploitation of context information is discussed ..."
Abstract
-
Cited by 165 (8 self)
- Add to MetaCart
This paper presents an application of recurrent networks for phone probability estimation in large vocabulary speech recognition. The need for efficient exploitation of context information is discussed
PROBEN1 - a set of neural network benchmark problems and benchmarking rules
, 1994
"... Proben1 is a collection of problems for neural network learning in the realm of pattern classification and function approximation plus a set of rules and conventions for carrying out benchmark tests with these or similar problems. Proben1 contains 15 data sets from 12 different domains. All datasets ..."
Abstract
-
Cited by 156 (0 self)
- Add to MetaCart
Proben1 is a collection of problems for neural network learning in the realm of pattern classification and function approximation plus a set of rules and conventions for carrying out benchmark tests with these or similar problems. Proben1 contains 15 data sets from 12 different domains. All datasets represent realistic problems which could be called diagnosis tasks and all but one consist of real world data. The datasets are all presented in the same simple format, using an attribute representation that can directly be used for neural network training. Along with the datasets, Proben1 defines a set of rules for how to conduct and how to document neural network benchmarking. The purpose of the problem and rule collection is to give researchers easy access to data for the evaluation of their algorithms and networks and to make direct comparison of the published results feasible. This report describes the datasets and the benchmarking rules. It also gives some basic performance measures indicating the difficulty of the various problems. These measures can be used as baselines for comparison.
Error Correlation And Error Reduction In Ensemble Classifiers
, 1996
"... Using an ensemble of classifiers, instead of a single classifier, can lead to improved generalization. The gains obtained by combining however, are often affected more by the selection of what is presented to the combiner, than by the actual combining method that is chosen. In this paper we focus ..."
Abstract
-
Cited by 139 (21 self)
- Add to MetaCart
Using an ensemble of classifiers, instead of a single classifier, can lead to improved generalization. The gains obtained by combining however, are often affected more by the selection of what is presented to the combiner, than by the actual combining method that is chosen. In this paper we focus on data selection and classifier training methods, in order to "prepare" classifiers for combining. We review a combining framework for classification problems that quantifies the need for reducing the correlation among individual classifiers. Then, we discuss several methods that make the classifiers in an ensemble more complementary. Experimental results are provided to illustrate the benefits and pitfalls of reducing the correlation among classifiers, especially when the training data is in limited supply. 2 1 Introduction A classifier's ability to meaningfully respond to novel patterns, or generalize, is perhaps its most important property (Levin et al., 1990; Wolpert, 1990). In...
A Neural Network Approach to Topic Spotting
, 1995
"... This paper presents an application of nonlinear neural networks to topic spotting. Neural networks allow us to model higherorder interaction between document terms and to simultaneously predict multiple topics using shared hidden features. In the context of this model, we compare two approaches to d ..."
Abstract
-
Cited by 134 (1 self)
- Add to MetaCart
This paper presents an application of nonlinear neural networks to topic spotting. Neural networks allow us to model higherorder interaction between document terms and to simultaneously predict multiple topics using shared hidden features. In the context of this model, we compare two approaches to dimensionality reduction in representation: one based on term selection and another based on Latent Semantic Indexing (LSI). Two different methods are proposed for improving LSI representations for the topic spotting task. We find that term selection and our modified LSI representations lead to similar topic spotting performance, and that this performance is equal to or better than other published results on the same corpus. 1 Introduction Topic spotting is the problem of identifying which of a set of predefined topics are present in a natural language document. More formally, given a set of n topics and a document, the task is to output for each topic the probability that the topic is prese...
Robust Speech Recognition Using Articulatory Information
, 1998
"... Whereas most state-of-the-art speech recognition systems use spectral or cepstral representations of the speech signal, there have also been some promising attempts at using articulatory information. These attempts have been motivated by two major assumptions: first, coarticulation can be modeled mo ..."
Abstract
-
Cited by 67 (1 self)
- Add to MetaCart
Whereas most state-of-the-art speech recognition systems use spectral or cepstral representations of the speech signal, there have also been some promising attempts at using articulatory information. These attempts have been motivated by two major assumptions: first, coarticulation can be modeled more naturally due to the inherently asynchronous nature of articulatory information. Second, it is assumed that the overall patterns in the speech signal caused by articulatory gestures are more robust to noise and speaker-dependent acoustic variation than spectral parameters. A third assumption can be made, viz. that acoustic and articulatory representations of speech can supply mutually complementary information to a speech recognizer, in which case the combination of these representations might be beneficial. Previously, articulatory-based speech recognizers have exclusively been developed for clean speech; the potential of an articulatory representation of the speech signal for noisy test...
Face Detection With Information-Based Maximum Discrimination
- In Computer Vision and Pattern Recognition
, 1997
"... In this paper we present a visual learning technique that maximizes the discrimination between positive and negative examples in a training set. We demonstrate our technique in the context of face detection with complex background without color or motion information, which has proven to be a challen ..."
Abstract
-
Cited by 65 (7 self)
- Add to MetaCart
In this paper we present a visual learning technique that maximizes the discrimination between positive and negative examples in a training set. We demonstrate our technique in the context of face detection with complex background without color or motion information, which has proven to be a challenging problem. We use a family of discrete Markov processes to model the face and background patterns and estimate the probability models using the data statistics. Then, we convert the learning process into an optimization, selecting the Markov process that optimizes the information-based discrimination between the two classes. The detection process is carried out by computing the likelihood ratio using the probability model obtained from the learning procedure. We show that because of the discrete nature of these models, the detection process is, by almost two orders of magnitude, less computationally expensive than neural network approaches. However, no improvement in terms of correct-answ...
Novel Cluster-Based Probability Model for Texture Synthesis, Classification, and Compression
- In Visual Communications and Image Processing
, 1993
"... We present a new probabilistic modeling technique for high-dimensional vector sources, and consider its application to the problems of texture synthesis, classification, and compression. Our model combines kernel estimation with clustering, to obtain a semiparametric probability mass function estima ..."
Abstract
-
Cited by 62 (6 self)
- Add to MetaCart
We present a new probabilistic modeling technique for high-dimensional vector sources, and consider its application to the problems of texture synthesis, classification, and compression. Our model combines kernel estimation with clustering, to obtain a semiparametric probability mass function estimate which summarizes --- rather than contains --- the training data. Because the model is cluster based, it is inferable from a limited set of training data, despite the model's high dimensionality. Moreover, its functional form allows recursive implementation that avoids exponential growth in required memory as the number of dimensions increases. Experimental results are presented for each of the three applications considered. 1. INTRODUCTION In many information processing tasks individual data samples exhibit a great deal of statistical interdependence, and should be treated jointly (e.g., in vectors) rather than separately. For some tasks this requires modeling vectors probabilistically....
Linear and Order Statistics Combiners for Pattern Classification
- Combining Artificial Neural Nets
, 1999
"... Several researchers have experimentally shown that substantial improvements can be obtained in difficult pattern recognition problems by combining or integrating the outputs of multiple classifiers. This chapter provides an analytical framework to quantify the improvements in classification resul ..."
Abstract
-
Cited by 56 (6 self)
- Add to MetaCart
Several researchers have experimentally shown that substantial improvements can be obtained in difficult pattern recognition problems by combining or integrating the outputs of multiple classifiers. This chapter provides an analytical framework to quantify the improvements in classification results due to combining. The results apply to both linear combiners and order statistics combiners. We first show that to a first order approximation, the error rate obtained over and above the Bayes error rate, is directly proportional to the variance of the actual decision boundaries around the Bayes optimum boundary. Combining classifiers in output space reduces this variance, and hence reduces the "added" error. If N unbiased classifiers are combined by simple averaging, the added error rate can be reduced by a factor of N if the individual errors in approximating the decision boundaries are uncorrelated. Expressions are then derived for linear combiners which are biased or correlated, and the effect of output correlations on ensemble performance is quantified. For order statistics based non-linear combiners, we derive expressions that indicate how much the median, the maximum and in general the ith order statistic can improve classifier performance. The analysis presented here facilitates the understanding of the relationships among error rates, classifier boundary distributions, and combining in output space. Experimental results on several public domain data sets are provided to illustrate the benefits of combining and to support the analytical results.

