Results 1 - 10 of 12
Large Vocabulary Continuous Speech Recognition: a Review
- of INCIS Project, Schedule 6 in (Small
, 1996
Cited by 62 (1 self)
This article will discuss the principles and architecture of current LVR systems and identify the key issues affecting their future deployment. To illustrate the various points raised, the Cambridge University HTK system will be described. This is a modern design giving state-of-the-art performance and it is typical of the current generation of recognition systems.
Sphinx-4: A flexible open source framework for speech recognition
, 2004
Cited by 48 (0 self)
Sphinx-4 is a flexible, modular and pluggable framework to help foster new innovations in the core research of hidden Markov model (HMM) speech recognition systems. The design of Sphinx-4 is based on patterns that have emerged from the design of past systems as well as new requirements based on areas that researchers currently want to explore. To exercise this framework, and to provide researchers with a “research-ready” system, Sphinx-4 also includes several implementations of both simple and state-of-the-art techniques. The framework and the implementations are all freely available via open source.
Efficient Search Using Posterior Phone Probability Estimates
- In Proc. ICASSP
, 1995
Cited by 30 (8 self)
In this paper we present a novel, efficient search strategy for large vocabulary continuous speech recognition (LVCSR). The search algorithm, based on stack decoding, uses posterior phone probability estimates to substantially increase its efficiency with minimal effect on accuracy. In particular, the search space is dramatically reduced by phone deactivation pruning, where phones with a small local posterior probability are deactivated. This approach is particularly well-suited to hybrid connectionist/hidden Markov model systems because posterior phone probabilities are directly computed by the acoustic model. On large vocabulary tasks, using a trigram language model, this increased the search speed by an order of magnitude, with 2% or less relative search error. Results from a hybrid system are presented using the Wall Street Journal LVCSR database for a 20,000 word task using a backed-off trigram language model. For this task, our single-pass decoder took around 15× realtime on an HP73...
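The phone deactivation pruning idea described in this abstract can be sketched in a few lines. This is only an illustrative sketch, not the authors' decoder: the function name `active_phones`, the threshold value, and the toy posterior values are all assumptions made for the example.

```python
def active_phones(posteriors, threshold):
    """Return, for each frame, the set of phone indices that stay active.

    posteriors: list of frames, each a list of posterior probabilities
                (one entry per phone), as an acoustic model might emit.
    threshold:  hypothetical pruning threshold; a phone whose local
                posterior falls below it is deactivated for that frame.
    """
    return [
        {phone for phone, prob in enumerate(frame) if prob >= threshold}
        for frame in posteriors
    ]


# Toy posteriors for two frames and three phones (made-up numbers).
frames = [
    [0.90, 0.05, 0.05],
    [0.40, 0.59, 0.01],
]
kept = active_phones(frames, threshold=0.1)
```

A search would then only extend hypotheses through phones in `kept[t]` at frame `t`, which is where the reduction in search space comes from.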
Decoder Technology For Connectionist Large Vocabulary Speech Recognition
, 1995
Cited by 23 (3 self)
The search problem in large vocabulary continuous speech recognition (LVCSR) is to locate the most probable string of words for a spoken utterance given the acoustic signal and a set of sentence models. Searching the space of possible utterances is difficult because of the large vocabulary size and the complexity imposed when long-span language models are used. This report describes an efficient search procedure and its software embodiment in a decoder, NOWAY, which has been incorporated in ABBOT, a hybrid connectionist/hidden Markov model (HMM) LVCSR system [15]. The search algorithm is based on stack decoding and uses both likelihood- and posterior-based pruning. The use of the posterior-based phone deactivation pruning techniques is well-suited to hybrid connectionist/HMM systems because posterior phone probabilities are directly computed by the connectionist acoustic model. The single-pass decoder has been evaluated on the large vocabulary North American Business News task using a...
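As a rough illustration of the stack decoding idea mentioned above, the following is a generic best-first search over partial word sequences. It is not the NOWAY decoder: the `extend` and `is_complete` callbacks, the toy score table, and the absence of any pruning are simplifying assumptions for the sketch.

```python
import heapq


def stack_decode(extend, is_complete):
    """Best-first (stack) search over partial word sequences.

    extend(words)      -> iterable of (next_word, log_prob) pairs
                          (hypothetical scorer combining all knowledge sources)
    is_complete(words) -> True once the hypothesis covers the utterance
    """
    # heapq is a min-heap, so store negated log scores to pop the best first.
    stack = [(0.0, ())]
    while stack:
        cost, words = heapq.heappop(stack)
        if is_complete(words):
            return list(words), -cost
        for word, log_prob in extend(words):
            heapq.heappush(stack, (cost - log_prob, words + (word,)))
    return None, float("-inf")


# Toy extension table keyed by the partial hypothesis (made-up log scores).
TOY = {
    (): [("the", -0.1), ("a", -0.5)],
    ("the",): [("cat", -0.2), ("end", -2.0)],
    ("a",): [("cat", -0.1)],
}
result, score = stack_decode(lambda w: TOY.get(w, []), lambda w: len(w) == 2)
```

Because every log probability is at most zero, the first complete hypothesis popped from the stack is the best-scoring one; a practical decoder would add pruning on top of this loop to keep the stack small.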
Start-synchronous search for large vocabulary continuous speech recognition
- IEEE Trans. Speech and Audio Processing
Cited by 17 (9 self)
In this paper, we present a novel, efficient search strategy for large vocabulary continuous speech recognition. The search algorithm, based on a stack decoder framework, utilizes phone-level posterior probability estimates (produced by a connectionist/hidden Markov model acoustic model) as a basis for phone deactivation pruning, a highly efficient method of reducing the required computation. The single-pass algorithm is naturally factored into the time-asynchronous processing of the word sequence and the time-synchronous processing of the hidden Markov model state sequence. This enables the search to be decoupled from the language model while still maintaining the computational benefits of time-synchronous processing. The incorporation of the language model in the search is discussed and computationally cheap approximations to the full language model are introduced. Experiments were performed on the North American Business News task using a 60,000 word vocabulary and a trigram language model. Results indicate that the computational cost of the search may be reduced by more than a factor of 40 with a relative search error of less than 2% using the techniques discussed in the paper.
Index Terms: Hidden Markov model, large vocabulary continuous speech recognition, phone deactivation pruning, search, stack decoding.
Optimising the lexical representation to improve A* lexical search
, 1994
Cited by 7 (1 self)
The A* algorithm is defined in a directed graph formalism. Pruning, path merging and modification of the algorithm to output word graphs rather than N-best lists are discussed. The concept of quotient graphs is utilised to improve the lexical graph. Results from experiments with the continuous speech recognition task of the WAXHOLM project are reported. The use of a particular quotient graph in the first pass of the A* algorithm is shown to speed up the search significantly without degrading the accuracy.
Introduction
In addition to the acoustical analysis, lexical constraints are perhaps the most obvious knowledge source available for automatic speech recognition. Lately, much effort has been directed at incorporating lexical knowledge early in the recognition process (Zue et al. 1991, Schwartz et al. 1992, Murveit et al. 1993). This reduces the search space for other, possibly more computationally intensive components of the system, such as context-dependent acoustical analysis and ...
A survey on automatic speech recognition with an illustrative example on continuous speech recognition of Mandarin
- Computat. Linguistics Chinese Language Processing
, 1996
Cited by 2 (0 self)
For the past two decades, research in speech recognition has been intensively carried out worldwide, spurred on by advances in signal processing, algorithms, architectures, and hardware. Speech recognition systems have been developed for a wide variety of applications, ranging from small vocabulary keyword recognition over dial-up telephone lines, to medium size vocabulary voice interactive command and control systems on personal computers, to large vocabulary speech dictation, spontaneous speech understanding, and limited-domain speech translation. In this paper we review some of the key advances in several areas of automatic speech recognition. We also illustrate, by examples, how these key advances can be used for continuous speech recognition of Mandarin. Finally we elaborate the requirements in designing successful real-world applications and address technical challenges that need to be harnessed in order to reach the ultimate goal of providing an easy-to-use, natural, and flexible voice interface between people and machines.
Shrinking Language Models by Robust Approximation
- in Proc. IEEE Int'l. Conf. on Acoustics, Speech, and Signal Processing '98
, 1998
Cited by 1 (1 self)
We study the problem of reducing the size of a language model while preserving recognition performance (accuracy and speed). A successful approach has been to represent language models by weighted finite-state automata (WFAs). Analogues of classical automata determinization and minimization algorithms then provide a general method to produce smaller but equivalent WFAs. We extend this approach by introducing the notion of approximate determinization. We provide an algorithm that, when applied to language models for the North American Business task, achieves 25-35% size reduction compared to previous techniques, with negligible effects on recognition time and accuracy.
1. INTRODUCTION
An important goal of language model engineering is to produce small language models that guarantee fast and accurate automatic speech recognition (ASR). In practice we see tradeoffs: e.g., in size vs. accuracy and in accuracy vs. speed. There has been recent progress, however, on automatic methods for r...
Rigorous Approximated Determinization of Weighted Automata
Cited by 1 (1 self)
A nondeterministic weighted finite automaton (WFA) maps an input word to a numerical value. Applications of weighted automata include formal verification of quantitative properties, as well as text, speech, and image processing. Many of these applications require the WFAs to be deterministic, or work substantially better when the WFAs are deterministic. Unlike NFAs, which can always be determinized, not all WFAs have an equivalent deterministic weighted automaton (DWFA). In [1], Mohri describes a determinization construction for a subclass of WFA. He also describes a property of WFAs (the twins property), such that all WFAs that satisfy the twins property are determinizable and the algorithm terminates on them. Unfortunately, many natural WFAs cannot be determinized. In this paper we study approximated determinization of WFAs. We describe an algorithm that, given a WFA A and an approximation factor t ≥ 1, constructs a DWFA A′ that t-determinizes A. Formally, for all words w ∈ Σ*, the value of w in A′ is at least its value in A and at most t times its value in A. Our construction involves two new ideas: attributing states in the subset construction by both upper and lower residues, and collapsing attributed subsets whose residues can be tightened. The larger the approximation factor is, the more attributed subsets we can collapse. Thus, t-determinization is helpful not only for WFAs that cannot be determinized, but also in cases where determinization is possible but results in automata that are too big to handle. In addition, t-determinization is useful for reasoning about the competitive ratio of online algorithms. We also describe a property (the t-twins property) and use it in order to characterize t-determinizable WFAs. Finally, we describe a polynomial algorithm for deciding whether a given WFA has the t-twins property.
Index Terms: Weighted automata; Determinization.
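The t-determinization condition stated in this abstract can be written compactly as follows. The notation val_A(w) for "the value of w in A" is an assumption introduced here for readability; it does not appear in the listing itself.

```latex
\[
  \forall w \in \Sigma^{*}:\quad
  \mathrm{val}_{A}(w) \;\le\; \mathrm{val}_{A'}(w) \;\le\; t \cdot \mathrm{val}_{A}(w),
  \qquad t \ge 1 .
\]
```

With t = 1 both inequalities collapse to equality, so under this reading 1-determinization coincides with exact determinization.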
A*-Admissible Key-Phrase Spotting With Sub-Syllable Level Utterance Verification
- Proc. ICSLP
In this paper, we propose an A*-admissible key-phrase spotting framework, which needs little domain knowledge and is capable of extracting salient key-phrase fragments from an input utterance in real-time. There are two key features in our approach. Firstly, the acoustic models and the search framework are specially designed such that a very high degree of vocabulary flexibility can be achieved for any desired application task. Secondly, the search framework uses an efficient two-pass A* search to generate N-best key-phrase candidates, and then several sub-syllable level verification functions are properly weighted and used to further improve the recognition accuracy. Experimental results show that the A*-admissible key-phrase spotting method with sub-word level utterance verification outperforms the baseline methods used in common approaches.
1. INTRODUCTION
In recent years, various spoken dialog systems have been widely investigated for the fast growing demand for real-world applications. It is diff...

