Results 1 - 10 of 17
Realisable Classifiers: Improving Operating Performance on Variable Cost Problems.
- University of Southampton, UK, 1998
- Cited by 17 (3 self)
A novel method is described for obtaining superior classification performance over a variable range of classification costs. By analysis of a set of existing classifiers using a receiver operating characteristic (ROC) curve, a set of new realisable classifiers may be obtained by a random combination of two of the existing classifiers. These classifiers lie on the convex hull that contains the original ROC points for the existing classifiers. This hull is the maximum realisable ROC (MRROC).
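As a minimal sketch of the idea in this abstract (not the authors' code), the snippet below computes the upper convex hull of a set of (false positive rate, true positive rate) points and shows how randomly switching between two classifiers realises an intermediate operating point. The function names `roc_convex_hull` and `random_combination`, the example points, and the inclusion of the trivial (0, 0) and (1, 1) classifiers are illustrative assumptions.

```python
def _cross(o, a, b):
    """Z-component of the cross product (a - o) x (b - o)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def roc_convex_hull(points):
    """Return the upper convex hull of ROC points as a list of (fpr, tpr).

    Assumption: the trivial classifiers (0, 0) "always negative" and
    (1, 1) "always positive" are added, since both are always available.
    """
    pts = sorted(set(points) | {(0.0, 0.0), (1.0, 1.0)})
    hull = []
    for p in pts:
        # Drop the last hull point while it lies on or below the straight
        # line from the point before it to the new point p.
        while len(hull) >= 2 and _cross(hull[-2], hull[-1], p) >= 0:
            hull.pop()
        hull.append(p)
    return hull


def random_combination(a, b, q):
    """Expected (fpr, tpr) when each case is decided by classifier b with
    probability q and by classifier a otherwise."""
    return ((1 - q) * a[0] + q * b[0], (1 - q) * a[1] + q * b[1])
```

For example, with hypothetical classifiers at (0.1, 0.6), (0.3, 0.5) and (0.4, 0.9), the point (0.3, 0.5) falls below the hull and is discarded, while mixing the first and third classifiers with q = 0.5 gives an expected operating point midway along the hull edge between them.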
Automatic Summarization of Voicemail Messages Using Lexical and Prosodic Features
- 2005
- Cited by 14 (2 self)
This paper presents trainable methods for extracting principal content words from voicemail messages. The short text summaries generated are suitable for mobile messaging applications. The system uses a set of classifiers to identify the summary words, with each word being identified by a vector of lexical and prosodic features. We use an ROC-based algorithm, Parcel, to select input features (and classifiers). We have performed a series of objective and subjective evaluations using unseen data from two different speech recognition systems, as well as human transcriptions of voicemail speech.
Robust Full Bayesian Learning for Neural Networks
- 1999
- Cited by 11 (8 self)
In this paper, we propose a hierarchical full Bayesian model for neural networks. This model treats the model dimension (number of neurons), model parameters, regularisation parameters and noise parameters as random variables that need to be estimated. We develop a reversible jump Markov chain Monte Carlo (MCMC) method to perform the necessary computations. We find that the results obtained using this method are not only better than the ones reported previously, but also appear to be robust with respect to the prior specification. In addition, we propose a novel and computationally efficient reversible jump MCMC simulated annealing algorithm to optimise neural networks. This algorithm enables us to maximise the joint posterior distribution of the network parameters and the number of basis functions. It performs a global search in the joint space of the parameters and number of parameters, thereby surmounting the problem of local minima. We show that by calibrating the full hierarchical ...
The Role of Prosody in a Voicemail Summarization System
- In Proc. ISCA Workshop on Prosody in Speech Recognition and Understanding, 2001
- Cited by 9 (6 self)
When a speaker leaves a voicemail message there are prosodic cues that emphasize the important points in the message, in addition to lexical content. In this paper we compare and visualize the relative contribution of these two types of features within a voicemail summarization system. We describe the system's ability to generate summaries of two test sets, having trained and validated using 700 messages from the IBM Voicemail corpus. Results measuring the quality of summary artifacts show that combined lexical and prosodic features are at least as robust as combined lexical features alone across all operating conditions.
Extracting context-sensitive models in Inductive Logic Programming
- Machine Learning, 2001
- Cited by 7 (0 self)
Given domain-specific background knowledge and data in the form of examples, an Inductive Logic Programming (ILP) system extracts models in the data-analytic sense. We view the model-selection step facing an ILP system as a decision problem, the solution of which requires knowledge of the context in which the model is to be deployed. In this paper, "context" will be defined by the current specification of the prior class distribution and the client's preferences concerning errors of classification. Within this restricted setting, we consider the use of an ILP system in situations where: (a) contexts can change regularly. This can arise, for example, from changes to class distributions or misclassification costs; and (b) the data are from observational studies. That is, they may not have been collected with any particular context in mind. Some repercussions of these are: (a) any one model may not be the optimal choice for all contexts; and (b) not all the background information provided may be relevant for all contexts. Using results from the analysis of Receiver Operating Characteristic curves, we investigate a technique that can equip an ILP system to reject those models that cannot possibly be optimal in any context. We present empirical results from using the technique to analyse two datasets concerned with the toxicity of chemicals (in particular, their mutagenic and carcinogenic properties). Clients can, and typically do, approach such datasets with quite different requirements. For example, a synthetic chemist would require models with a low rate of commission errors, which could be used to direct efficiently the synthesis of new compounds. A toxicologist, on the other hand, would prefer models with a low rate of omission errors. This would enable a more complete identificati...
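To illustrate how a "context" (class prior plus misclassification costs) can change which model is preferred, here is a minimal sketch using the standard expected-cost formula for a binary classifier. It is not the technique from the paper: the function names, the two hypothetical models and their ROC points, and the reading of commission errors as false positives and omission errors as false negatives are all assumptions made for illustration.

```python
def expected_cost(fpr, tpr, p_pos, cost_fp, cost_fn):
    """Expected misclassification cost per case for a classifier at (fpr, tpr).

    p_pos   : prior probability of the positive class
    cost_fp : cost of a false positive (assumed: a commission error)
    cost_fn : cost of a false negative (assumed: an omission error)
    """
    return p_pos * (1 - tpr) * cost_fn + (1 - p_pos) * fpr * cost_fp


def best_model(models, p_pos, cost_fp, cost_fn):
    """Return the name of the model with the lowest expected cost.

    `models` maps a model name to its (fpr, tpr) operating point.
    """
    return min(
        models,
        key=lambda name: expected_cost(*models[name], p_pos, cost_fp, cost_fn),
    )


# Hypothetical operating points, not taken from the paper.
models = {"cautious": (0.05, 0.60), "liberal": (0.40, 0.95)}

# A client who penalises false positives heavily prefers the cautious model;
# one who penalises false negatives heavily prefers the liberal model.
chemist_choice = best_model(models, p_pos=0.5, cost_fp=5.0, cost_fn=1.0)
toxicologist_choice = best_model(models, p_pos=0.5, cost_fp=1.0, cost_fn=5.0)
```

Under these assumed numbers no single model is best in both contexts, which is the situation the abstract describes.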
Genetic programming for improved receiver operating characteristics
- Second International Conference on Multiple Classifier Systems, volume 2096 of LNCS, 2001
- Cited by 4 (4 self)
Genetic programming (GP) can automatically fuse given classifiers of diverse types to produce a combined classifier whose Receiver Operating Characteristics (ROC) are better than [Scott et al. 1998b]'s "Maximum Realisable Receiver Operating Characteristics" (MRROC), i.e. better than their convex hull. This is demonstrated on a satellite image processing benchmark using Naive Bayes, Decision Trees (C4.5) and Clementine artificial neural networks.
ROC Optimisation of Safety Related Systems
- 2004
- Cited by 3 (2 self)
Many safety related and critical systems warn of potentially dangerous events; for example, the Short Term Conflict Alert (STCA) system warns of airspace infractions between aircraft. Although installed with current technology, such critical systems may become out of date due to changes in the circumstances in which they function, operational procedures and the regulatory environment. Current practice is to 'tune' by hand the many parameters governing the system in order to optimise the operating point in terms of the true positive and false positive rates, which are frequently associated with highly imbalanced costs. In this
Using Boundary Methods for Estimating Class Separability
- 1998
- Cited by 2 (0 self)
Designing and operating a classification system becomes drastically more difficult as the data dimensionality increases. A feature extraction (FE) step is often used to reduce the data dimensionality to mitigate this complexity. Thus FE may be viewed as a form of data compression whose objective is to minimize the consequences that reducing the dimensionality has on class separability. This differs from the normal objective of data compression, which is to minimize distortion, typically measured in the mean squared sense. It is often unclear whether the resulting features from an FE method provide an optimum set for classification. Further, extracting discrimination features from finite data sets increases in difficulty as the dimensionality of the data increases. The need for features to reduce complexity, combined with the difficulties of extracting features, justifies the need for studying ways of ranking feature sets for classification, i.e. feature set evaluation (FSE) techniques. This ...
Evaluation of Extractive Voicemail Summarization
- In Proc. ISCA Workshop on Multilingual Spoken Document Retrieval, Hong Kong, 2003
- Cited by 2 (1 self)
This paper is about the evaluation of a system that generates short text summaries of voicemail messages, suitable for transmission as text messages. Our approach to summarization is based on a speech-recognized transcript of the voicemail message, from which a set of summary words is extracted. The system uses a classifier to identify the summary words, with each word being identified by a vector of lexical and prosodic features. The features are selected using Parcel, an ROC-based algorithm. Our evaluations of the system, using a slot error rate metric, have compared manual and automatic summarization, and manual and automatic recognition (using two different recognizers). We also report on two subjective evaluations using mean opinion score of summaries, and a set of comprehension tests. The main results from these experiments were that the perceived difference in quality of summarization was affected more by errors resulting from automatic transcription, than by the automatic summarization process.
Feature selection for the classification of crosstalk in multi-channel audio
- In Proc. EuroSpeech, 2003
- Cited by 2 (0 self)
An extension to the conventional speech / nonspeech classification framework is presented for a scenario in which a number of microphones record the activity of speakers present at a meeting (one microphone per speaker). Since each microphone can receive speech from both the participant wearing the microphone (local speech) and other participants (crosstalk), the recorded audio can be broadly classified in four ways: local speech, crosstalk plus local speech, crosstalk alone and silence. We describe a classifier in which a Gaussian mixture model (GMM) is used to model each class. A large set of potential acoustic features is considered, some of which have been employed in previous speech / nonspeech classifiers. A combination of two feature selection algorithms is used to identify the optimal feature set for each class. Results from the GMM classifier using the selected features are superior to those of a previously published approach.

