Graphical models and automatic speech recognition
- Mathematical Foundations of Speech and Language Processing
, 2003
Cited by 49 (10 self)
Graphical models provide a promising paradigm to study both existing and novel techniques for automatic speech recognition. This paper first provides a brief overview of graphical models and their uses as statistical models. It is then shown that the statistical assumptions behind many pattern recognition techniques commonly used as part of a speech recognition system can be described by a graph – this includes Gaussian distributions, mixture models, decision trees, factor analysis, principal component analysis, linear discriminant analysis, and hidden Markov models. Moreover, this paper shows that many advanced models for speech recognition and language processing can also be simply described by a graph, including many at the acoustic-, pronunciation-, and language-modeling levels. A number of speech recognition techniques born directly out of the graphical-models paradigm are also surveyed. Additionally, this paper includes a novel graphical analysis regarding why derivative (or delta) features improve hidden Markov model-based speech recognition by improving structural discriminability. It also includes an example where a graph can be used to represent language model smoothing constraints. As will be seen, the space of models describable by a graph is quite large. A thorough exploration of this space should yield techniques that ultimately will supersede the hidden Markov model.
Mining Reference Tables for Automatic Text Segmentation
- IN PROCEEDINGS OF THE TENTH ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING
, 2004
Cited by 28 (1 self)
Automatically segmenting unstructured text strings into structured records is necessary for importing the information contained in legacy sources and text collections into a data warehouse for subsequent querying, analysis, mining and integration. In this paper, we mine tables present in data warehouses and relational databases to develop an automatic segmentation system. Thus, we overcome limitations of existing supervised text segmentation approaches, which require comprehensive manually labeled training data. Our segmentation system is robust, accurate, and efficient, and requires no additional manual effort. Thorough evaluation on real datasets demonstrates the robustness and accuracy of our system, with segmentation accuracy exceeding state-of-the-art supervised approaches.
Speech Recognition Using Augmented Conditional Random Fields
Cited by 11 (0 self)
Acoustic modeling based on hidden Markov models (HMMs) is employed by state-of-the-art stochastic speech recognition systems. Although HMMs are a natural choice to warp the time axis and model the temporal phenomena in the speech signal, their conditional independence properties limit their ability to model spectral phenomena well. In this paper, a new acoustic modeling paradigm based on augmented conditional random fields (ACRFs) is investigated and developed. This paradigm addresses some limitations of HMMs while maintaining many of the aspects which have made them successful. In particular, the acoustic modeling problem is reformulated in a data-driven, sparse, augmented space to increase discrimination. Acoustic context modeling is explicitly integrated to handle the sequential phenomena of the speech signal. We present an efficient framework for estimating these models that ensures scalability and generality. In the TIMIT
Hidden Markov models as a support for diagnosis: Formalization of the problem and synthesis of the solution
- PROCEEDINGS OF THE 25TH IEEE SYMPOSIUM ON RELIABLE DISTRIBUTED SYSTEMS, OCTOBER 2006
, 2006
Cited by 10 (2 self)
In modern information infrastructures, diagnosis must be able to assess the status or the extent of the damage of individual components. Traditional one-shot diagnosis is not adequate; instead, streams of data on component behavior need to be collected and filtered over time, as some existing heuristics do. This paper proposes a general framework and a formalism to model such over-time diagnosis scenarios and to find appropriate solutions, which helps system designers support their design choices. Taking advantage of the characteristics of the hidden Markov model formalism, widely used in pattern recognition, the paper proposes a formalization of the diagnosis process that addresses the complete chain of monitored component, deviation detection and state diagnosis. Hidden Markov models are well suited to represent problems where the internal state of an entity is not known and can only be inferred from external observations of what the entity emits. Over-time diagnosis is a first-class representative of this category of problems. The accuracy of diagnosis carried out through the proposed formalization is then discussed, as well as how to use it concretely to perform state diagnosis and allow direct comparison of alternative solutions.
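The idea of inferring a hidden component state from a stream of observations can be illustrated with the standard HMM forward (filtering) recursion. The sketch below is only an illustration and is not the paper's formalization: the state names, observation symbols, and all probabilities are hypothetical values chosen for the example.

```python
from typing import Dict, List

Dist = Dict[str, float]


def forward_filter(
    prior: Dist,
    transition: Dict[str, Dist],
    emission: Dict[str, Dist],
    observations: List[str],
) -> List[Dist]:
    """Return P(state_t | obs_1..obs_t) for every time step t.

    `prior` is the state distribution at the first step, before any
    observation has been taken into account.
    """
    belief = dict(prior)
    history: List[Dist] = []
    for t, obs in enumerate(observations):
        if t > 0:
            # Prediction step: push the belief through the transition model.
            belief = {
                nxt: sum(belief[cur] * transition[cur][nxt] for cur in belief)
                for nxt in prior
            }
        # Update step: weight each state by how well it explains `obs`.
        unnormalized = {s: belief[s] * emission[s][obs] for s in belief}
        total = sum(unnormalized.values())
        if total == 0.0:
            raise ValueError(f"observation {obs!r} is impossible under the model")
        belief = {s: p / total for s, p in unnormalized.items()}
        history.append(belief)
    return history


# Hypothetical two-state component model (all numbers are assumptions).
PRIOR = {"healthy": 0.9, "faulty": 0.1}
TRANSITION = {
    "healthy": {"healthy": 0.95, "faulty": 0.05},
    "faulty": {"healthy": 0.10, "faulty": 0.90},
}
EMISSION = {
    "healthy": {"ok": 0.9, "alarm": 0.1},
    "faulty": {"ok": 0.3, "alarm": 0.7},
}

if __name__ == "__main__":
    for step, b in enumerate(forward_filter(PRIOR, TRANSITION, EMISSION,
                                            ["alarm", "alarm", "alarm"]), 1):
        print(step, round(b["faulty"], 4))
```

With these assumed numbers, a single alarm leaves the component more likely healthy than faulty, while repeated alarms push the belief toward the faulty state, which is the motivation for filtering over time rather than one-shot diagnosis.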
Recognizing activities in multiple contexts using transfer learning
- In AAAI AI in Eldercare Symposium
, 2008
Cited by 4 (0 self)
Activities of daily living are good indicators of the health status of the elderly. Therefore, automating the monitoring of these activities is a crucial step in future caregiving. However, many models for activity recognition rely on labeled examples of activities for learning the model parameters. Due to the high variability of different contexts, parameters learned for one context cannot automatically be used in another. In this paper, we present a method that allows us to transfer knowledge of activity recognition from one context to the next, a task called transfer learning. We show the effectiveness of our method using real-world datasets.
Gaze-Contingent Automatic Speech Recognition
, 2006
Cited by 3 (0 self)
This study investigated recognition systems that combine loosely coupled modalities, integrating eye movements in an Automatic Speech Recognition (ASR) system as an exemplar. A probabilistic framework for combining modalities was formalised and applied to the specific case of integrating eye movement and speech. A corpus of matched eye movement and related spontaneous conversational British English speech for a visual-based, goal-driven task was collected. This corpus enabled the relationship between the modalities to be verified. Robust extraction of visual attention from eye movement data was investigated using Hidden Markov Models and Hidden Semi-Markov Models. Gaze-contingent ASR systems were developed from a research-grade baseline ASR system by redistributing language model probability mass according to the visual attention. The best performing systems maintained the Word Error Rates but showed an increase in the Figure of Merit, a measure of the keyword spotting accuracy and integration success. The core values of this work may be useful for developing robust multimodal decoding system functions.
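The abstract does not say how language model probability mass is redistributed, so the following is only one possible sketch of the general idea. It assumes a unigram distribution and an interpolation weight `shift` (both hypothetical), and moves a fraction of the mass toward words associated with the currently attended object while keeping the distribution normalized.

```python
from typing import Dict, Set


def redistribute_mass(
    lm_probs: Dict[str, float],
    attended_words: Set[str],
    shift: float = 0.3,
) -> Dict[str, float]:
    """Interpolate a unigram LM with the same LM restricted to attended words.

    p'(w) = (1 - shift) * p(w) + shift * p(w | w is attended)

    This interpolation scheme is an assumption made for illustration; it is
    not taken from the study.
    """
    if not 0.0 <= shift <= 1.0:
        raise ValueError("shift must lie in [0, 1]")
    attended_mass = sum(p for w, p in lm_probs.items() if w in attended_words)
    if attended_mass == 0.0:
        # No attended word is in the vocabulary: leave the LM unchanged.
        return dict(lm_probs)
    adjusted = {}
    for word, p in lm_probs.items():
        focus = p / attended_mass if word in attended_words else 0.0
        adjusted[word] = (1.0 - shift) * p + shift * focus
    return adjusted


if __name__ == "__main__":
    # Hypothetical toy vocabulary; the user is looking at a river on a map.
    base = {"river": 0.1, "bridge": 0.2, "the": 0.7}
    print(redistribute_mass(base, {"river"}, shift=0.3))
```

Because the adjusted distribution still sums to one, a scheme of this kind changes which words the decoder prefers without changing the total probability budget.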
Applications of classifying bidding strategies for the CAT Tournament
- Proceedings of the International Trading Agent Design and Analysis Workshop (TADA 2008)
, 2008
Cited by 3 (0 self)
In the CAT Tournament, specialists facilitate transactions between buyers and sellers with the intention of maximizing profit from commission and other fees. Each specialist must find a well-balanced strategy that allows it to entice buyers and sellers to trade in its market while also retaining the buyers and sellers that are currently subscribed to it. Classification techniques can be used to determine the distribution of bidding strategies used by all traders subscribed to a particular specialist. Our experiments showed that Hidden Markov Model classification yielded the best results. The distribution of strategies, along with other competition-related factors, can be used to determine the optimal action in any given game state. Experimental data shows that the GD and ZIP bidding strategies are more volatile than the RE and ZIC strategies, although no traders ever readily switch specialists. An MDP framework for determining optimal actions given an accurate distribution of bidding strategies is proposed as a motivator for future work.
Boosting Diverse Learners for Domain Agnostic Time Series Classification
Although most classification methods benefit from the incorporation of domain knowledge, some situations call for a single algorithm that applies to a wide range of diverse domains. In such cases, the techniques and biases that prove useful in one domain may be irrelevant or even harmful in another. This paper addresses the problem of constructing a domain agnostic time series classification algorithm that allows safe inclusion of domain-specific methods that may be highly effective in some domains yet detrimental in others. Our approach combines MBoost, an extension to AdaBoost that allows robust boosting of multiple weak learners, with SAMME, a multiclass extension of AdaBoost which does not rely on a reduction to a set of binary problems. The resulting algorithm allows the safe and efficient combination of multiple learning algorithms for multiclass classification.
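For readers unfamiliar with SAMME, a single boosting round can be sketched as follows. This shows only the standard SAMME learner weight and example reweighting; it does not attempt to reproduce MBoost or the combined algorithm from the paper, and the function and parameter names are hypothetical. The multiclass-specific part is the extra `log(K - 1)` term, which keeps the learner weight positive whenever the weak learner beats random guessing over `K` classes.

```python
import math
from typing import List, Tuple


def samme_round(
    weights: List[float],
    is_correct: List[bool],
    n_classes: int,
) -> Tuple[float, List[float]]:
    """Run the weight update of one SAMME round.

    weights     current example weights (need not be normalized)
    is_correct  whether the weak learner classified each example correctly
    n_classes   number of classes K

    Returns the learner weight alpha and the renormalized example weights.
    """
    total = sum(weights)
    err = sum(w for w, ok in zip(weights, is_correct) if not ok) / total
    # alpha is positive and finite only if 0 < err < 1 - 1/K.
    if err <= 0.0 or err >= 1.0 - 1.0 / n_classes:
        raise ValueError("weighted error must lie strictly between 0 and 1 - 1/K")
    alpha = math.log((1.0 - err) / err) + math.log(n_classes - 1)
    # Misclassified examples are scaled up by exp(alpha); others are unchanged.
    raw = [w if ok else w * math.exp(alpha) for w, ok in zip(weights, is_correct)]
    norm = sum(raw)
    return alpha, [w / norm for w in raw]


if __name__ == "__main__":
    alpha, new_weights = samme_round([0.25] * 4, [True, True, True, False], 3)
    print(alpha, new_weights)
```

With two classes the extra term vanishes and the learner weight reduces to the familiar `log((1 - err) / err)` used in AdaBoost.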
Hidden Dynamic Models for Speech Processing Applications
© Leo Jingyu Lee 2004. I hereby declare that I am the sole author of this thesis. I authorize the University of Waterloo to lend this thesis to other institutions or individuals for the purpose of scholarly research.
Temporal Patterns
, 2006
Sketching is a natural mode of interaction used in a variety of settings. For example, people sketch during early design and brainstorming sessions to guide the thought process; when we communicate certain ideas, we use sketching as an additional modality to convey ideas that cannot be put in words. The emergence of hardware such as PDAs and Tablet PCs has made it possible to capture freehand sketches, enabling the routine use of sketching as an additional human-computer interaction modality. But despite the availability of pen-based information capture hardware, relatively little effort has been put into developing software capable of understanding and reasoning about sketches. To date, most approaches to sketch recognition have treated sketches as images (i.e., static finished products) and have applied vision algorithms for recognition. However, unlike images, sketches are produced incrementally and interactively, one stroke at a time, and their processing should take advantage of this. This thesis explores ways of doing sketch recognition by extracting as much information as possible from temporal patterns that appear during sketching. We present

