Results 1 - 10
of
26
An Application of Recurrent Nets to Phone Probability Estimation
- IEEE Transactions on Neural Networks
, 1994
"... This paper presents an application of recurrent networks for phone probability estimation in large vocabulary speech recognition. The need for efficient exploitation of context information is discussed ..."
Abstract
-
Cited by 165 (8 self)
- Add to MetaCart
This paper presents an application of recurrent networks for phone probability estimation in large vocabulary speech recognition. The need for efficient exploitation of context information is discussed
Multi Stream Speech Recognition
, 1996
"... . In this paper, we discuss a new automatic speech recognition (ASR) approach based on independent processing and recombination of several feature streams. In this framework, it is assumed that the speech signal is represented in terms of multiple input streams, each input stream representing a diff ..."
Abstract
-
Cited by 113 (16 self)
- Add to MetaCart
. In this paper, we discuss a new automatic speech recognition (ASR) approach based on independent processing and recombination of several feature streams. In this framework, it is assumed that the speech signal is represented in terms of multiple input streams, each input stream representing a different characteristic of the signal. If the streams are entirely synchronous, they may be accommodated simply (as they usually are in state-of-the-art systems). However, as discussed in the paper, it may be required to permit some degree of asynchrony between streams. This paper introduces the basic framework of a statistical structure that can accommodate multiple (asynchronous) observation streams (possibly exhibiting different frame rates). This approach will then be applied to the particular case of multi-band speech recognition and will be shown to yield significantly better noise robustness. 2 IDIAP--RR 96-07 1 Introduction In current automatic speech recognition (ASR) systems, the a...
Markovian Models for Sequential Data
, 1996
"... Hidden Markov Models (HMMs) are statistical models of sequential data that have been used successfully in many machine learning applications, especially for speech recognition. Furthermore, in the last few years, many new and promising probabilistic models related to HMMs have been proposed. We firs ..."
Abstract
-
Cited by 69 (2 self)
- Add to MetaCart
Hidden Markov Models (HMMs) are statistical models of sequential data that have been used successfully in many machine learning applications, especially for speech recognition. Furthermore, in the last few years, many new and promising probabilistic models related to HMMs have been proposed. We first summarize the basics of HMMs, and then review several recent related learning algorithms and extensions of HMMs, including in particular hybrids of HMMs with artificial neural networks, Input-Output HMMs (which are conditional HMMs using neural networks to compute probabilities), weighted transducers, variable-length Markov models and Markov switching state-space models. Finally, we discuss some of the challenges of future research in this very active area. 1 Introduction Hidden Markov Models (HMMs) are statistical models of sequential data that have been used successfully in many applications in artificial intelligence, pattern recognition, speech recognition, and modeling of biological ...
Connectionist Probability Estimation in HMM Speech Recognition
- IEEE Transactions on Speech and Audio Processing
, 1992
"... This report is concerned with integrating connectionist networks into a hidden Markov model (HMM) speech recognition system, This is achieved through a statistical understanding of connectionist networks as probability estimators, first elucidated by Herve Bourlard. We review the basis of HMM speech ..."
Abstract
-
Cited by 45 (9 self)
- Add to MetaCart
This report is concerned with integrating connectionist networks into a hidden Markov model (HMM) speech recognition system, This is achieved through a statistical understanding of connectionist networks as probability estimators, first elucidated by Herve Bourlard. We review the basis of HMM speech recognition, and point out the possible benefits of incorporating connectionist networks. We discuss some issues necessary to the construction of a connectionist HMM recognition system, and describe the performance of such a system, including evaluations on the DARPA database, in collaboration with Mike Cohen and Horacio Franco of SRI International. In conclusion, we show that a connectionist component improves a state of the art HMM system. ii Part I INTRODUCTION Over the past few years, connectionist models have been widely proposed as a potentially powerful approach to speech recognition (e.g. Makino et al. (1983), Huang et al. (1988) and Waibel et al. (1989)). However, whilst connec...
Modular Neural Networks for Learning Context-Dependent Game Strategies
- Master’s thesis, Computer Speech and Language Processing
, 1992
"... The method of temporal differences (TD) is a learning technique which specialises in predicting the likely outcome of a sequence over time. Examples of such sequences include speech frame vectors, whose outcome is a phoneme or word decision, and positions in a board game, whose outcome is a win/loss ..."
Abstract
-
Cited by 31 (3 self)
- Add to MetaCart
The method of temporal differences (TD) is a learning technique which specialises in predicting the likely outcome of a sequence over time. Examples of such sequences include speech frame vectors, whose outcome is a phoneme or word decision, and positions in a board game, whose outcome is a win/loss decision. Recent results by Tesauro in the domain of backgammon indicate that a neural network, trained by TD methods to evaluate positions generated by self-play, can reach an advanced level of backgammon skill. For my summer thesis project, I first implemented the TD/neural network learning algorithms and confirmed Tesauro's results, using the domains of tic-tac-toe and backgammon. Then, motivated by Waibel's success with modular neural networks for phoneme recognition, I experimented with using two modular architectures (DDD and Meta-Pi) in place of the monolithic networks. I found that using the modular networks significantly enhanced the ability of the backgammon evaluator to change it...
Statistical Trajectory Models for Phonetic Recognition
, 1994
"... The main goal of this work is to develop an alternative methodology for acoustic-- phonetic modelling of speech sounds. The approach utilizes a segment--based framework to capture the dynamical behavior and statistical dependencies of the acoustic attributes used to represent the speech waveform. Te ..."
Abstract
-
Cited by 27 (3 self)
- Add to MetaCart
The main goal of this work is to develop an alternative methodology for acoustic-- phonetic modelling of speech sounds. The approach utilizes a segment--based framework to capture the dynamical behavior and statistical dependencies of the acoustic attributes used to represent the speech waveform. Temporal behavior is modelled explicitly by creating dynamic tracks of the acoustic attributes used to represent the waveform, and by estimating the spatio--temporal correlation structure of the resulting errors. The tracks serve as templates from which synthetic segments of the acoustic attributes are generated. Scoring of an hypothesized phonetic segment is then based on the error between the measured acoustic attributes and the synthetic segments generated for each phonetic model.
Phoneme Probability Estimation with Dynamic Sparsely Connected Artificial Neural Networks
, 1997
"... This paper presents new methods for training large neural networks for phoneme probability estimation. An architecture combining time-delay windows and recurrent connections is used to capture the important dynamic information of the speech signal. Because the number of connections in a fully connec ..."
Abstract
-
Cited by 23 (1 self)
- Add to MetaCart
This paper presents new methods for training large neural networks for phoneme probability estimation. An architecture combining time-delay windows and recurrent connections is used to capture the important dynamic information of the speech signal. Because the number of connections in a fully connected recurrent network grows super-linear with the number of hidden units, schemes for sparse connection and connection pruning are explored. It is found that sparsely connected networks outperform their fully connected counterparts with an equal number of connections. The implementation of the combined architecture and training scheme is described in detail. The networks are evaluated in a hybrid HMM/ANN system for phoneme recognition on the TIMIT database, and for word recognition on the WAXHOLM database. The achieved phone error-rate, 27.8%, for the standard 39 phoneme set on the core test-set of the TIMIT database is in the range of the lowest reported. All training and simulation softwar...
Hierarchical search for large vocabulary conversational speech recognition
- IEEE Signal Processing Magazine
, 1999
"... ABSTRACT 2 Speaker-independent speech recognition technology has made significant progress from the days of isolated word recognition. Today, state-of-the-art systems are capable of performing large vocabulary continuous speech recognition (LVCSR) on audio streams derived from complex information so ..."
Abstract
-
Cited by 15 (5 self)
- Add to MetaCart
ABSTRACT 2 Speaker-independent speech recognition technology has made significant progress from the days of isolated word recognition. Today, state-of-the-art systems are capable of performing large vocabulary continuous speech recognition (LVCSR) on audio streams derived from complex information sources such as broadcast news and two-way telephone dialogs. A significant contribution to this advancement in technology is the development of search techniques that find suboptimal but accurate solutions in problems involving large search spaces and extremely complex statistical models. Moreover, these search strategies are capable of dynamically integrating information from a number of diverse knowledge sources to determine the correct word hypothesis, and limit the scope of the search by using a hierarchical search strategy. We refer to this problem as the decoding or search problem. This paper describes the complexity associated with decoding using hierarchical representations for linguistic and acoustic knowledge sources. An extensible object-oriented decoder available in the public domain, that leverages current state-of-the-art technology is described to illustrate these concepts. This decoder supports efficient handling of acoustic models for cross-word contextdependent phones, multiple pronunciations of words using lexical trees, and rescoring of word graphs based on N-gram language models in a single pass. It employs a state-of-the-art Viterbistyle dynamic programming algorithm, and is equipped with several heuristic pruning criteria to minimize the consumption of computational resources while maintaining good accuracy.
Stable Encoding of Finite-State Machines in Discrete-Time Recurrent Neural Nets with Sigmoid Units
, 1998
"... In recent years, there has been a lot of interest in the use of discrete-time recurrent neural nets (DTRNN) to learn finite-state tasks, with interesting results regarding the induction of simple finite-state machines from input-output strings. Parallel work has studied the computational power of DT ..."
Abstract
-
Cited by 10 (3 self)
- Add to MetaCart
In recent years, there has been a lot of interest in the use of discrete-time recurrent neural nets (DTRNN) to learn finite-state tasks, with interesting results regarding the induction of simple finite-state machines from input-output strings. Parallel work has studied the computational power of DTRNN in connection with finite-state computation. This paper describes a simple strategy to devise stable encodings of finite-state machines in computationally capable discrete-time recurrent neural architectures with sigmoid units, and gives a detailed presentation on how this strategy may be applied to encode a general class of finite-state machines in a variety of commonly-used first- and second-order recurrent neural networks. Unlike previous work that either imposed some restrictions to state values, or used a detailed analysis based on fixed-point attractors, the present approach applies to any positive, bounded, strictly growing, continuous activation function, and uses simple bounding criteri...

