Results 1  10
of
213
Solving multiclass learning problems via errorcorrecting output codes
 Journal of Artificial Intelligence Research
, 1995
"... Multiclass learning problems involve nding a de nition for an unknown function f(x) whose range is a discrete set containing k>2values (i.e., k \classes"). The de nition is acquired by studying collections of training examples of the form hx i;f(x i)i. Existing approaches to multiclass learning ..."
Abstract

Cited by 564 (9 self)
 Add to MetaCart
Multiclass learning problems involve nding a de nition for an unknown function f(x) whose range is a discrete set containing k>2values (i.e., k \classes"). The de nition is acquired by studying collections of training examples of the form hx i;f(x i)i. Existing approaches to multiclass learning problems include direct application of multiclass algorithms such as the decisiontree algorithms C4.5 and CART, application of binary concept learning algorithms to learn individual binary functions for each of the k classes, and application of binary concept learning algorithms with distributed output representations. This paper compares these three approaches to a new technique in which errorcorrecting codes are employed as a distributed output representation. We show that these output representations improve the generalization performance of both C4.5 and backpropagation on a wide range of multiclass learning tasks. We also demonstrate that this approach is robust with respect to changes in the size of the training sample, the assignment of distributed representations to particular classes, and the application of over tting avoidance techniques such as decisiontree pruning. Finally,we show thatlike the other methodsthe errorcorrecting code technique can provide reliable class probability estimates. Taken together, these results demonstrate that errorcorrecting output codes provide a generalpurpose method for improving the performance of inductive learning programs on multiclass problems. 1.
From HMM's to Segment Models: A Unified View of Stochastic Modeling for Speech Recognition
, 1996
"... ..."
An Application of Recurrent Nets to Phone Probability Estimation
 IEEE Transactions on Neural Networks
, 1994
"... This paper presents an application of recurrent networks for phone probability estimation in large vocabulary speech recognition. The need for efficient exploitation of context information is discussed ..."
Abstract

Cited by 193 (8 self)
 Add to MetaCart
This paper presents an application of recurrent networks for phone probability estimation in large vocabulary speech recognition. The need for efficient exploitation of context information is discussed
PROBEN1  a set of neural network benchmark problems and benchmarking rules
, 1994
"... Proben1 is a collection of problems for neural network learning in the realm of pattern classification and function approximation plus a set of rules and conventions for carrying out benchmark tests with these or similar problems. Proben1 contains 15 data sets from 12 different domains. All datasets ..."
Abstract

Cited by 182 (0 self)
 Add to MetaCart
Proben1 is a collection of problems for neural network learning in the realm of pattern classification and function approximation plus a set of rules and conventions for carrying out benchmark tests with these or similar problems. Proben1 contains 15 data sets from 12 different domains. All datasets represent realistic problems which could be called diagnosis tasks and all but one consist of real world data. The datasets are all presented in the same simple format, using an attribute representation that can directly be used for neural network training. Along with the datasets, Proben1 defines a set of rules for how to conduct and how to document neural network benchmarking. The purpose of the problem and rule collection is to give researchers easy access to data for the evaluation of their algorithms and networks and to make direct comparison of the published results feasible. This report describes the datasets and the benchmarking rules. It also gives some basic performance measures indicating the difficulty of the various problems. These measures can be used as baselines for comparison.
Error Correlation And Error Reduction In Ensemble Classifiers
, 1996
"... Using an ensemble of classifiers, instead of a single classifier, can lead to improved generalization. The gains obtained by combining however, are often affected more by the selection of what is presented to the combiner, than by the actual combining method that is chosen. In this paper we focus ..."
Abstract

Cited by 156 (23 self)
 Add to MetaCart
Using an ensemble of classifiers, instead of a single classifier, can lead to improved generalization. The gains obtained by combining however, are often affected more by the selection of what is presented to the combiner, than by the actual combining method that is chosen. In this paper we focus on data selection and classifier training methods, in order to "prepare" classifiers for combining. We review a combining framework for classification problems that quantifies the need for reducing the correlation among individual classifiers. Then, we discuss several methods that make the classifiers in an ensemble more complementary. Experimental results are provided to illustrate the benefits and pitfalls of reducing the correlation among classifiers, especially when the training data is in limited supply. 2 1 Introduction A classifier's ability to meaningfully respond to novel patterns, or generalize, is perhaps its most important property (Levin et al., 1990; Wolpert, 1990). In...
A Neural Network Approach to Topic Spotting
, 1995
"... This paper presents an application of nonlinear neural networks to topic spotting. Neural networks allow us to model higherorder interaction between document terms and to simultaneously predict multiple topics using shared hidden features. In the context of this model, we compare two approaches to d ..."
Abstract

Cited by 152 (1 self)
 Add to MetaCart
This paper presents an application of nonlinear neural networks to topic spotting. Neural networks allow us to model higherorder interaction between document terms and to simultaneously predict multiple topics using shared hidden features. In the context of this model, we compare two approaches to dimensionality reduction in representation: one based on term selection and another based on Latent Semantic Indexing (LSI). Two different methods are proposed for improving LSI representations for the topic spotting task. We find that term selection and our modified LSI representations lead to similar topic spotting performance, and that this performance is equal to or better than other published results on the same corpus. 1 Introduction Topic spotting is the problem of identifying which of a set of predefined topics are present in a natural language document. More formally, given a set of n topics and a document, the task is to output for each topic the probability that the topic is prese...
Robust Speech Recognition Using Articulatory Information
, 1998
"... Whereas most stateoftheart speech recognition systems use spectral or cepstral representations of the speech signal, there have also been some promising attempts at using articulatory information. These attempts have been motivated by two major assumptions: first, coarticulation can be modeled mo ..."
Abstract

Cited by 83 (2 self)
 Add to MetaCart
Whereas most stateoftheart speech recognition systems use spectral or cepstral representations of the speech signal, there have also been some promising attempts at using articulatory information. These attempts have been motivated by two major assumptions: first, coarticulation can be modeled more naturally due to the inherently asynchronous nature of articulatory information. Second, it is assumed that the overall patterns in the speech signal caused by articulatory gestures are more robust to noise and speakerdependent acoustic variation than spectral parameters. A third assumption can be made, viz. that acoustic and articulatory representations of speech can supply mutually complementary information to a speech recognizer, in which case the combination of these representations might be beneficial. Previously, articulatorybased speech recognizers have exclusively been developed for clean speech; the potential of an articulatory representation of the speech signal for noisy test...
A Tutorial on TextIndependent Speaker Verification
 EURASIP JOURNAL ON APPLIED SIGNAL PROCESSING 2004:4, 430–451
, 2004
"... This paper presents an overview of a stateoftheart textindependent speaker verification system. First, an introduction proposes a modular scheme of the training and test phases of a speaker verification system. Then, the most commonly speech parameterization used in speaker verification, namely, ..."
Abstract

Cited by 77 (9 self)
 Add to MetaCart
This paper presents an overview of a stateoftheart textindependent speaker verification system. First, an introduction proposes a modular scheme of the training and test phases of a speaker verification system. Then, the most commonly speech parameterization used in speaker verification, namely, cepstral analysis, is detailed. Gaussian mixture modeling, which is the speaker modeling technique used in most systems, is then explained. A few speaker modeling alternatives, namely, neural networks and support vector machines, are mentioned. Normalization of scores is then explained, as this is a very important step to deal with realworld data. The evaluation of a speaker verification system is then detailed, and the detection error tradeoff (DET) curve is explained. Several extensions of speaker verification are then enumerated, including speaker tracking and segmentation by speakers. Then, some applications of speaker verification are proposed, including onsite applications, remote applications, applications relative to structuring audio information, and games. Issues concerning the forensic area are then recalled, as we believe it is very important to inform people about the actual performance and limitations of speaker verification systems. This paper concludes by giving a
Analysis of decision boundaries in linearly combined neural classifiers
 Pattern Recognition
, 1996
"... Abstract Combining or integrating the outputs of several pattern classifiers has led to improved performance in a multitude of applications. This paper provides an analytical framework to quantify the improvements in classification results due to combining. We show that combining networks linearly i ..."
Abstract

Cited by 75 (22 self)
 Add to MetaCart
Abstract Combining or integrating the outputs of several pattern classifiers has led to improved performance in a multitude of applications. This paper provides an analytical framework to quantify the improvements in classification results due to combining. We show that combining networks linearly in output space reduces the variance of the actual decision region boundaries around the optimum boundary. This result is valid under the assumption that the a posteriori probability distributions for each class are locally monotonic around the Bayes optimum boundary. In the absence of classifier bias, the error is shown to be proportional to the boundary variance, resulting in a simple expression for error rate improvements. In the presence of bias, the error reduction, expressed in terms of a bias reduction factor, is shown to be less than or equal to the reduction obtained in the absence of bias. The analysis presented here facilitates the understanding of the relationships among error rates, classifier boundary distributions and combining in output space. Combining Decision boundary Neural networks Hybrid networks Variance reduction. Pattern classification 1.
Support vector machines for speech recognition
 Proceedings of the International Conference on Spoken Language Processing
, 1998
"... Statistical techniques based on hidden Markov Models (HMMs) with Gaussian emission densities have dominated signal processing and pattern recognition literature for the past 20 years. However, HMMs trained using maximum likelihood techniques suffer from an inability to learn discriminative informati ..."
Abstract

Cited by 74 (2 self)
 Add to MetaCart
Statistical techniques based on hidden Markov Models (HMMs) with Gaussian emission densities have dominated signal processing and pattern recognition literature for the past 20 years. However, HMMs trained using maximum likelihood techniques suffer from an inability to learn discriminative information and are prone to overfitting and overparameterization. Recent work in machine learning has focused on models, such as the support vector machine (SVM), that automatically control generalization and parameterization as part of the overall optimization process. In this paper, we show that SVMs provide a significant improvement in performance on a static pattern classification task based on the Deterding vowel data. We also describe an application of SVMs to large vocabulary speech recognition, and demonstrate an improvement in error rate on a continuous alphadigit task (OGI Aphadigits) and a large vocabulary conversational speech task (Switchboard). Issues related to the development and optimization of an SVM/HMM hybrid system are discussed.