A voice-controlled automatic telephone switchboard and directory information system
- Speech Communication, 1997
- Cited by 17 (5 self)
The Philips automatic telephone switchboard and directory information system PADIS provides a natural-language user interface to a telephone directory database. Using speech recognition and language understanding technologies, the system offers phone numbers, fax numbers, email addresses, and room numbers as well as direct call completion to a desired party. In this paper, we present the underlying probabilistic framework, the system architecture, and the individual modules for speech recognition, language understanding, dialogue control, and speech output. In addition, we report results on performance and user behaviour obtained from a field test in our research lab with a 600-entry database. We derive a new maximum-a-posteriori decision rule which incorporates database knowledge and dialogue history as constraints in speech recognition and language understanding. It has improved speech understanding accuracy by 19% (in terms of concept error rate), and reduced attribute substitution errors (e.g., recognition of a wrong name) by 38%. The decision rule is implemented in a multi-stage approach as a combination of state-of-the-art speech recognition, partial parsing with an attributed stochastic context-free grammar, and an N-best algorithm which is also described in this paper. The system conducts a flexible mixed-initiative dialogue rather than using a rigid form-filling scheme, and incorporates database knowledge to optimize the dialogue flow.
Start-synchronous search for large vocabulary continuous speech recognition
- IEEE Trans. Speech and Audio Processing
- Cited by 17 (9 self)
In this paper, we present a novel, efficient search strategy for large vocabulary continuous speech recognition. The search algorithm, based on a stack decoder framework, utilizes phone-level posterior probability estimates (produced by a connectionist/hidden Markov model acoustic model) as a basis for phone deactivation pruning, a highly efficient method of reducing the required computation. The single-pass algorithm is naturally factored into the time-asynchronous processing of the word sequence and the time-synchronous processing of the hidden Markov model state sequence. This enables the search to be decoupled from the language model while still maintaining the computational benefits of time-synchronous processing. The incorporation of the language model in the search is discussed and computationally cheap approximations to the full language model are introduced. Experiments were performed on the North American Business News task using a 60,000-word vocabulary and a trigram language model. Results indicate that the computational cost of the search may be reduced by more than a factor of 40 with a relative search error of less than 2% using the techniques discussed in the paper.
Index Terms: Hidden Markov model, large vocabulary continuous speech recognition, phone deactivation pruning, search, stack decoding.
Is N-Best Dead?
- In Proceedings of the Human Language Technology Workshop, 1994
- Cited by 15 (2 self)
We developed a faster search algorithm that avoids the use of the N-Best paradigm until after more powerful knowledge sources have been used. We found, however, that there was little or no decrease in word errors. We then showed that the use of the N-Best paradigm is still essential for the use of still more powerful knowledge sources, and for several other purposes that are outlined in the paper.
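The N-Best paradigm mentioned in this abstract can be illustrated with a minimal sketch: a first pass produces a short list of scored hypotheses, and a more expensive knowledge source then re-ranks them. Everything below is an assumption for illustration (the function name `rescore_nbest`, the toy scores, and the stand-in `toy_lm` scorer); it is not the algorithm from the paper.

```python
from typing import Callable, List, Tuple

# A hypothesis is (word sequence, first-pass log score). Hypothetical type.
Hypothesis = Tuple[str, float]


def rescore_nbest(
    nbest: List[Hypothesis],
    extra_score: Callable[[str], float],
    weight: float = 1.0,
) -> List[Hypothesis]:
    """Re-rank an N-best list by adding a weighted second-pass log score."""
    rescored = [
        (text, first_pass + weight * extra_score(text))
        for text, first_pass in nbest
    ]
    # Highest combined log score first.
    return sorted(rescored, key=lambda h: h[1], reverse=True)


def toy_lm(text: str) -> float:
    """Stand-in knowledge source: penalize each word by one log unit."""
    return -1.0 * len(text.split())


# Made-up first-pass output; the second hypothesis wins before rescoring.
nbest = [("recognize speech", -10.0), ("wreck a nice beach", -9.5)]
best_text, best_score = rescore_nbest(nbest, toy_lm)[0]
print(best_text, best_score)  # recognize speech -12.0
```

The design point is that the second-pass scorer only ever sees N complete hypotheses, so it can be arbitrarily expensive without affecting the first-pass search.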
Hierarchical search for large vocabulary conversational speech recognition
- IEEE Signal Processing Magazine, 1999
- Cited by 15 (5 self)
Speaker-independent speech recognition technology has made significant progress from the days of isolated word recognition. Today, state-of-the-art systems are capable of performing large vocabulary continuous speech recognition (LVCSR) on audio streams derived from complex information sources such as broadcast news and two-way telephone dialogs. A significant contribution to this advancement in technology is the development of search techniques that find suboptimal but accurate solutions in problems involving large search spaces and extremely complex statistical models. Moreover, these search strategies are capable of dynamically integrating information from a number of diverse knowledge sources to determine the correct word hypothesis, and limit the scope of the search by using a hierarchical search strategy. We refer to this problem as the decoding or search problem. This paper describes the complexity associated with decoding using hierarchical representations for linguistic and acoustic knowledge sources. An extensible object-oriented decoder, available in the public domain and leveraging current state-of-the-art technology, is described to illustrate these concepts. This decoder supports efficient handling of acoustic models for cross-word context-dependent phones, multiple pronunciations of words using lexical trees, and rescoring of word graphs based on N-gram language models in a single pass. It employs a state-of-the-art Viterbi-style dynamic programming algorithm, and is equipped with several heuristic pruning criteria to minimize the consumption of computational resources while maintaining good accuracy.
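The combination of Viterbi-style dynamic programming with heuristic pruning described in this abstract can be sketched on a toy hidden Markov model. This is a generic illustration under stated assumptions, not the decoder from the paper: the function name `viterbi_beam`, the two-state model, and the beam width are all hypothetical, and scores are kept in the log domain.

```python
import math


def viterbi_beam(obs, states, log_init, log_trans, log_emit, beam=10.0):
    """Time-synchronous Viterbi search with beam pruning (log domain).

    Returns (best log score, best state path).
    """
    # active maps state -> (log score, best path ending in that state).
    active = {s: (log_init[s] + log_emit[s][obs[0]], [s]) for s in states}
    for o in obs[1:]:
        best = max(score for score, _ in active.values())
        # Beam pruning: drop states scoring more than `beam` below the best.
        active = {s: v for s, v in active.items() if v[0] >= best - beam}
        new_active = {}
        for prev, (score, path) in active.items():
            for s in states:
                cand = score + log_trans[prev][s] + log_emit[s][o]
                # Keep only the best-scoring predecessor for each state.
                if s not in new_active or cand > new_active[s][0]:
                    new_active[s] = (cand, path + [s])
        active = new_active
    return max(active.values(), key=lambda v: v[0])


def logs(table):
    """Convert a dict of probabilities to log probabilities."""
    return {k: math.log(p) for k, p in table.items()}


# Toy two-state model with made-up probabilities.
states = ["A", "B"]
log_init = logs({"A": 0.6, "B": 0.4})
log_trans = {"A": logs({"A": 0.7, "B": 0.3}), "B": logs({"A": 0.4, "B": 0.6})}
log_emit = {"A": logs({"x": 0.9, "y": 0.1}), "B": logs({"x": 0.2, "y": 0.8})}

score, path = viterbi_beam(["x", "x", "y"], states, log_init, log_trans, log_emit)
print(path)  # ['A', 'A', 'B']
```

A narrower beam discards more partial hypotheses per frame, which saves computation but can prune the path that would have turned out best; this is the search error that pruning criteria trade against speed.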
An efficient two-pass search algorithm using word trellis index
- in Proc. ICSLP, 1998
- Cited by 13 (5 self)
We propose an efficient two-pass search algorithm for LVCSR. Instead of a conventional word graph, the first preliminary pass generates a "word trellis index", keeping track of all surviving word hypotheses within the beam at every time frame. As it represents all found word boundaries non-deterministically, we can (1) obtain accurate sentence-dependent hypotheses on the second search, and (2) avoid expensive word-pair approximation on the first pass. The second pass performs an efficient stack decoding search, where the index is referred to as predicted word list and heuristics. Experimental results on a 5,000-word Japanese dictation task show that, compared with the word-graph method, this trellis-based method runs with less than 1/10 of the memory cost while keeping high accuracy. Finally, by handling inter-word context dependency, we achieved a word error rate of 5.6%.
Adding Linguistic Constraints to Document Image Decoding: Comparing the Iterated Complete Path and Stack Algorithms
- 2000
- Cited by 12 (4 self)
Beginning with an observed document image and a model of how the image has been degraded, Document Image Decoding recognizes printed text by attempting to find a most probable path through a hypothesized Markov source. The incorporation of linguistic constraints, which are expressed by a sequential predictive probabilistic language model, can improve recognition accuracy significantly in the case of moderately to severely corrupted documents. Two methods of incorporating linguistic constraints in the best-path search are described, analyzed and compared. The first, called the iterated complete path algorithm, involves iteratively rescoring complete paths using conditional language model probability distributions of increasing order, expanding state only as necessary with each iteration. A property of this approach is that it results in a solution that is exactly optimal with respect to the specified source, degradation, and language models; no approximation is necessary. The second app...
The BBN/HARC spoken language understanding system
- 1993
- Cited by 12 (3 self)
We describe the design and performance of a complete spoken language understanding system currently under development at BBN. The system, dubbed HARC (Hear And Respond to Continuous speech), successfully integrates state-of-the-art speech recognition and natural language understanding subsystems. The system has been tested extensively on a restricted airline travel information (ATIS) domain with a vocabulary of about 2000 words. HARC is implemented in portable, high-level software that runs in real time on today's workstations to support interactive online human-machine dialogs. No special purpose hardware is required other than an A/D converter to digitize the speech. The system works well for any native speaker of American English and does not require any enrollment data from the users. We present results of formal DARPA tests in Feb. '92 and Nov. '92.
A Prototype Voice-Response Questionnaire For The U.S. Census
- Proceedings of the ICSLP '94, 1994
- Cited by 11 (5 self)
This paper describes a study conducted to determine the feasibility of using a spoken questionnaire to collect information for the Year 2000 Census in the USA. To refine the dialogue and to train recognizers, we collected complete protocols from over 4000 callers. For the responses labeled (about half), over 99 percent of the answers contain the desired information. The recognizers trained so far range in performance from 75 percent correct on year of birth to over 99 percent for marital status. We developed a prototype system that engages the callers in a dialogue to obtain the desired information, reviews the recognized information at the end of the call, and asks the caller to identify the response categories that are incorrect.
1. INTRODUCTION
We have conducted a study to determine the feasibility of using an automated spoken questionnaire to collect information for the Year 2000 Census in the United States of America. The goal of the study was to develop and evaluate a telephone ...
Integrating Language Models with Speech Recognition
- In Proceedings of the AAAI94 Workshop on the Integration of Natural Language and Speech Processing, 1994
- Cited by 11 (5 self)
The question of how to integrate language models with speech recognition systems is becoming more important as speech recognition technology matures. For the purposes of this paper, we have classified the level of integration of current and past approaches into three categories: tightly-coupled, loosely-coupled, or semi-coupled systems. We then argue that loose coupling is more appropriate given the current state of the art and given that it allows one to measure more precisely which components of the language model are most important. We will detail how the speech component in our approach interacts with the language model and discuss why we chose our language model.
1 Introduction
State of the art speech recognition systems achieve high recognition accuracies only on tasks that have low perplexities. The perplexity of a task is, roughly speaking, the average number of choices at any decision point. The perplexity of a task is at a minimum when the true language model is known and co...
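The informal description of perplexity as "the average number of choices at any decision point" can be made concrete with the standard definition: the exponential of the average negative log probability that a language model assigns to each word. The sketch below is a minimal illustration; the function name `perplexity` and the probability values are assumptions, not taken from the paper.

```python
import math


def perplexity(word_probs):
    """Perplexity of a word sequence, given the model probability of each word.

    Computed as exp(-(1/N) * sum(log p_i)), i.e. the inverse geometric mean
    of the per-word probabilities.
    """
    n = len(word_probs)
    return math.exp(-sum(math.log(p) for p in word_probs) / n)


# A model that always spreads its probability evenly over 4 words
# has perplexity 4: four equally likely choices at every decision point.
uniform = perplexity([0.25] * 8)
print(round(uniform, 6))  # 4.0

# Uneven probabilities with the same geometric mean give the same value.
uneven = perplexity([0.5, 0.125])
print(round(uneven, 6))  # 4.0
```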
Progress in Dynamic Programming Search for LVCSR
- Proceedings of the IEEE, 1997
- Cited by 10 (2 self)
This paper gives an overview of the recent improvements in dynamic programming search for large vocabulary continuous speech recognition: search using lexical trees, time-conditioned search and word graph construction.
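The lexical trees mentioned in this abstract are prefix trees over phone sequences: words whose pronunciations share initial phones share tree nodes, so the search evaluates each shared prefix once. The sketch below is a minimal illustration only; the function names, the dictionary-based node layout, and the toy pronunciations are assumptions rather than anything described in the paper.

```python
def build_lexical_tree(lexicon):
    """Build a phone-prefix tree from a {word: [phones]} lexicon."""
    root = {"children": {}, "words": []}
    for word, phones in lexicon.items():
        node = root
        for phone in phones:
            # Reuse the child for this phone if it exists, else create it.
            node = node["children"].setdefault(
                phone, {"children": {}, "words": []}
            )
        # The word identity is only known at the end of its phone sequence.
        node["words"].append(word)
    return root


def count_nodes(node):
    """Count all nodes in the tree, including the root."""
    return 1 + sum(count_nodes(child) for child in node["children"].values())


# Toy lexicon with made-up phone symbols.
lexicon = {
    "speech": ["s", "p", "iy", "ch"],
    "speed": ["s", "p", "iy", "d"],
    "spoke": ["s", "p", "ow", "k"],
}
tree = build_lexical_tree(lexicon)

# 7 phone nodes plus the root, versus 12 phones in a flat (linear) lexicon.
print(count_nodes(tree))  # 8
```

One consequence of this layout is that the word identity is unknown until a leaf is reached, which is why language model scores are harder to apply early in a tree-organized search.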

