Results 1 - 10
of
17
Shared-Distribution Hidden Markov Models for Speech Recognition
, 1991
"... Parameter sharing plays an important role in statistical modeling since training data are usually limited. On the one hand, we would like to use models that are as detailed as possible. On the other hand, with models too detailed, we can no longer reliably estimate the parameters. Triphone generaliz ..."
Abstract
-
Cited by 227 (5 self)
- Add to MetaCart
Parameter sharing plays an important role in statistical modeling since training data are usually limited. On the one hand, we would like to use models that are as detailed as possible. On the other hand, with models too detailed, we can no longer reliably estimate the parameters. Triphone generalization may force two models to be merged together when only parts of the model output distributions are similar, while the rest of the output distributions are different. This problem can be avoided if clustering is carried out at the distribution level. In this paper, a shared-distribution model is proposed to replace generalized triphone models for speaker-independent continuous speech recognition. Here, output distributions in the hidden Markov model are shared with each other if they exhibit acoustic similarity. In addition to detailed representation, it also gives us the freedom to use a large number of states for each phonetic model. Although an increase in the number of states will inc...
Connectionist Probability Estimation in HMM Speech Recognition
- IEEE Transactions on Speech and Audio Processing
, 1992
"... This report is concerned with integrating connectionist networks into a hidden Markov model (HMM) speech recognition system, This is achieved through a statistical understanding of connectionist networks as probability estimators, first elucidated by Herve Bourlard. We review the basis of HMM speech ..."
Abstract
-
Cited by 45 (9 self)
- Add to MetaCart
This report is concerned with integrating connectionist networks into a hidden Markov model (HMM) speech recognition system, This is achieved through a statistical understanding of connectionist networks as probability estimators, first elucidated by Herve Bourlard. We review the basis of HMM speech recognition, and point out the possible benefits of incorporating connectionist networks. We discuss some issues necessary to the construction of a connectionist HMM recognition system, and describe the performance of such a system, including evaluations on the DARPA database, in collaboration with Mike Cohen and Horacio Franco of SRI International. In conclusion, we show that a connectionist component improves a state of the art HMM system. ii Part I INTRODUCTION Over the past few years, connectionist models have been widely proposed as a potentially powerful approach to speech recognition (e.g. Makino et al. (1983), Huang et al. (1988) and Waibel et al. (1989)). However, whilst connec...
A Tree-Trellis Based Fast Search for Finding the N Best Sentence Hypotheses in Continuous Speech Recognition
"... In this paper a new, tree-trellis based fast search for finding the N best sentence hypotheses in continuous speech recognition is proposed. The search consists of two parts: a forward, time-synchronous, trellis search and a backward, time asynchronous, tree search. In the first module the well know ..."
Abstract
-
Cited by 37 (2 self)
- Add to MetaCart
In this paper a new, tree-trellis based fast search for finding the N best sentence hypotheses in continuous speech recognition is proposed. The search consists of two parts: a forward, time-synchronous, trellis search and a backward, time asynchronous, tree search. In the first module the well known Viterbi algorithm is used for finding the best hypothesis and for preparing a map of all partial paths scores time synchronously. In the second module a tree search is used to grow partial paths backward and time asynchronously. Each partial path in the backward tree search is rank ordered in a stack by the corresponding full path score, which is computed by adding the partial path score with the best possible score of the remaining path obtained from the trellis path map. In each path growing cycle, the current best partial path, which is at the top of the stack, is extended by one arc (word). The new tree-trellis search is different from the traditional time synchronous Viterbi search in its ability for finding not just the best but the N-best paths of different word content. The new search is also different from the A * algorithm, or the stack algorithm, in its capability for providing an exact, full path score estimate of any given partial (i.e., incomplete) path before its completion. When compared with the best candidate Viterbi search, the search complexities for finding the N-best strings are rather low, i.e., only a fraction more computation is needed.
Heterogeneous Acoustic Measurement And Multiple Classifiers For Speech Recognition
, 1998
"... The acoustic-phonetic modeling component of most current speech recognition systems calculates a small set of homogeneous frame-based measurements at a single, #xed time-frequency resolution. This thesis presents evidence indicating that recognition performance can be signi#cantly improved through a ..."
Abstract
-
Cited by 29 (1 self)
- Add to MetaCart
The acoustic-phonetic modeling component of most current speech recognition systems calculates a small set of homogeneous frame-based measurements at a single, #xed time-frequency resolution. This thesis presents evidence indicating that recognition performance can be signi#cantly improved through a contrasting approach using more detailed and more diverse acoustic measurements, which we refer to as heterogeneous measurements.
Speech Recognition System Design Based on Automatically Derived Units
, 1999
"... In most speech recognition systems today, acoustic modeling and lexical modeling are viewed as separable problems. Currently the most popular approach is to manually define canonical word pronunciations in terms of phonetic units and let the acoustic models capture differences between actual spoken ..."
Abstract
-
Cited by 10 (0 self)
- Add to MetaCart
In most speech recognition systems today, acoustic modeling and lexical modeling are viewed as separable problems. Currently the most popular approach is to manually define canonical word pronunciations in terms of phonetic units and let the acoustic models capture differences between actual spoken and canonical pronunciations implicitly with Gaussian mixture models. As a result, these models can be very broad, particularly for casual spontaneous speech. An alternative approach, explored in this thesis, is to learn a unit inventory and pronunciation dictionary from training data using a maximum likelihood objective function. In particular,
The Developmental Approach to Intelligent Robots
- IN AAAI SPRING SYMPOSIUM SERIES, INTEGRATING ROBOTIC RESEARCH: TAKING THE NEXT LEAP
, 1998
"... Integration of major cognitive capabilities and behavioral capabilities is crucial for robots to perform harder, general tasks. Several requirements are raised in this paper: (1) a successful integration of such capabilities need to start with an architecture that is suited for integration, (2) the ..."
Abstract
-
Cited by 9 (0 self)
- Add to MetaCart
Integration of major cognitive capabilities and behavioral capabilities is crucial for robots to perform harder, general tasks. Several requirements are raised in this paper: (1) a successful integration of such capabilities need to start with an architecture that is suited for integration, (2) the development of each individual capability must be an integral process of overall integration process, and (3) the content-level integration process must be fully automated. The developmental approach to intelligent robots is presented in this paper as a way for integrating cognitive and behavioral capabilities in an automated way. This approach is motivated by human cognitive development from infant to adult. Central in the approach is the methodology for sensor-effector-rich robots to perform general-purpose, autonomous, incremental learning directly using their sensors and effectors through interactions with the realworld environment. In other words, the objective is to fully automate the learning process, so that the robots can learn in an automatic mode that is close to the way animals (and humans) learn. A comparison between the developmental approach and other existing approaches is provided.
Context-Dependent Modeling in a Segment-Based Speech Recognition System
- S.M. thesis, MIT
, 1997
"... The goal of this thesis is to explore various strategies for incorporating contextual information into a segment-based speech recognition system, while maintaining computational costs at a level acceptable for implementation in a real-time system. The latter is achieved by using context-independent ..."
Abstract
-
Cited by 5 (0 self)
- Add to MetaCart
The goal of this thesis is to explore various strategies for incorporating contextual information into a segment-based speech recognition system, while maintaining computational costs at a level acceptable for implementation in a real-time system. The latter is achieved by using context-independent models in the search, while contextdependent models are reserved for re-scoring the hypotheses proposed by the contextindependent system. Within this framework, several types of context-dependent sub-word units were evaluated, including word-dependent, biphone, and triphone units. In each case, deleted interpolation was used to compensate for the lack of training data for the models. Other types of context-dependent modeling, such as context-dependent boundary modeling and "offset" modeling, were also used successfully in the re-scoring pass. The evaluation of the system was performed using the Resource Management task. Context-dependent segment models were able to reduce the error rate of t...
Hidden Markov Models in Text Recognition
- International Journal of Pattern Recognition and Artificial Intelligence
, 1995
"... A multi-level multifont character recognition is presented. The system proceeds by first delimiting the context of the characters. As a way or enhancing system performance, typographical information is extracted and used for font identification before actual character recognition is performed. This ..."
Abstract
-
Cited by 5 (0 self)
- Add to MetaCart
A multi-level multifont character recognition is presented. The system proceeds by first delimiting the context of the characters. As a way or enhancing system performance, typographical information is extracted and used for font identification before actual character recognition is performed. This has the advantage of sure character identification as well as text reproduction in original form. The font identification is based on decision trees where the characters are automatically arranged differently in confusion classes according to the physical characteristics of fonts. The character recognizers are built around the first and second order hidden Markov models (HMM) as well as Euclidean distance measures. The HMMs use the Viterbi and the Extended Viterbi algorithms to which enhancements were made. Also present is a majority-vote system that polls the other systems for "advice" before deciding on the identity of a character. Among other things, this last system is shown to give bett...
Symbolic models and emergent models: A review
- IEEE Trans. Autonomous Mental Development
, 2012
"... Abstract—There exists a large conceptual gap between symbolic models and emergent models for the mind. Many emergent models work on low-level sensory data, while many symbolic models deal with high-level abstract (i.e., action) symbols. There has been relatively little study on intermediate represen ..."
Abstract
-
Cited by 3 (2 self)
- Add to MetaCart
Abstract—There exists a large conceptual gap between symbolic models and emergent models for the mind. Many emergent models work on low-level sensory data, while many symbolic models deal with high-level abstract (i.e., action) symbols. There has been relatively little study on intermediate representations, mainly because of a lack of knowledge about how representations fully autonomously emerge inside the closed brain skull, using information from the exposed two ends (the sensory end and the motor end). As reviewed here, this situation is changing. A fundamental challenge for emergent modelsisabstraction,which symbolic models enjoy through human handcrafting. The term abstract refers to properties disassociated with any particular form. Emergent abstraction seems possible, although the brain appears to never receive a computer symbol (e.g., ASCII code) or produce such a symbol. This paper reviews major agent models with an emphasis on representation. It suggests two different ways to relate symbolic representations with emergent representations: One is based on their categorical definitions. The other considers that a symbolic representation corresponds to a brain’s outside behaviors observed and handcrafted by other outside human observers; but an emergent representation is inside the brain. Index Terms—Agents, attention, brain architecture, complexity, computer vision, emergent representation, graphic models, mental
unknown title
"... Abstract—In this paper, a new method of image edge-detection and characterization is presented. “Parametric Filtering method ” uses a judicious defined filter, which preserves the signal correlation structure as input in the autocorrelation of the output. This leads, showing the evolution of the ima ..."
Abstract
- Add to MetaCart
Abstract—In this paper, a new method of image edge-detection and characterization is presented. “Parametric Filtering method ” uses a judicious defined filter, which preserves the signal correlation structure as input in the autocorrelation of the output. This leads, showing the evolution of the image correlation structure as well as various distortion measures which quantify the deviation between two zones of the signal (the two Hamming signals) for the protection of an image edge. Keywords—Edge detection, parametrable recursive filter, autocorrelation structure, distortion measurements. I.

