Signal modeling techniques in speech recognition
- PROCEEDINGS OF THE IEEE
, 1993
Cited by 99 (5 self)
We have seen three important trends develop in the last five years in speech recognition. First, heterogeneous parameter sets that mix absolute spectral information with dynamic, or time-derivative, spectral information, have become common. Second, similarity transform techniques, often used to normalize and decorrelate parameters in some computationally inexpensive way, have become popular. Third, the signal parameter estimation problem has merged with the speech recognition process so that more sophisticated statistical models of the signal’s spectrum can be estimated in a closed-loop manner. In this paper, we review the signal processing components of these algorithms. These algorithms are presented as part of a unified view of the signal parameterization problem in which there are three major tasks: measurement, transformation, and statistical modeling. This paper is by no means a comprehensive survey of all possible techniques of signal modeling in speech recognition. There are far too many algorithms in use today to make an exhaustive survey feasible (and cohesive). Instead, this paper is meant to serve as a tutorial on signal processing in state-of-the-art speech recognition systems and to review those techniques most commonly used. In keeping with this goal, a complete mathematical description of each algorithm has been included in the paper.
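As an illustration of the first trend named in this abstract, the sketch below appends time-derivative ("delta") coefficients to a sequence of static feature vectors. It uses the common linear-regression delta formula with edge frames repeated; the abstract does not say which formula the paper itself uses, and the names delta_features, append_deltas, and window are assumptions made for this example only.

```python
def delta_features(frames, window=2):
    """Regression-based time derivatives for a list of feature vectors.

    frames: list of equal-length lists of floats (one list per time frame).
    window: number of frames looked at on each side (assumed default).
    """
    num_frames = len(frames)
    # Normalizer of the regression formula: 2 * sum(n^2) for n = 1..window.
    denom = 2 * sum(n * n for n in range(1, window + 1))
    deltas = []
    for t in range(num_frames):
        row = []
        for k in range(len(frames[t])):
            acc = 0.0
            for n in range(1, window + 1):
                # Clamp indices so the first/last frame is repeated at the edges.
                ahead = frames[min(t + n, num_frames - 1)][k]
                behind = frames[max(t - n, 0)][k]
                acc += n * (ahead - behind)
            row.append(acc / denom)
        deltas.append(row)
    return deltas


def append_deltas(frames, window=2):
    """Build a heterogeneous parameter set: static values followed by deltas."""
    return [static + delta
            for static, delta in zip(frames, delta_features(frames, window))]
```

For a feature that rises by one unit per frame, the interior delta values come out as 1.0, which is the slope the regression is meant to recover.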
An Investigation of Tightly Coupled Time Synchronous Speech Language Interfaces Using a Unification Grammar
, 1994
Cited by 19 (4 self)
This paper reports on some experiments on time synchronous interfaces between word recognition and parsing, performed with a beam decoder and a chart parser. Using the same acoustic models, language model, and unification grammar, bottom-up and two interactive protocols were implemented and examined. Results show that close integration is possible without unbearable time penalties, if restrictions from both modules are applied to focus the search process.
Integrating Language Models with Speech Recognition
- In Proceedings of the AAAI94 Workshop on the Integration of Natural Language and Speech Processing
, 1994
Cited by 11 (5 self)
The question of how to integrate language models with speech recognition systems is becoming more important as speech recognition technology matures. For the purposes of this paper, we have classified the level of integration of current and past approaches into three categories: tightly-coupled, loosely-coupled, or semicoupled systems. We then argue that loose coupling is more appropriate given the current state of the art and given that it allows one to measure more precisely which components of the language model are most important. We will detail how the speech component in our approach interacts with the language model and discuss why we chose our language model.
1 Introduction
State of the art speech recognition systems achieve high recognition accuracies only on tasks that have low perplexities. The perplexity of a task is, roughly speaking, the average number of choices at any decision point. The perplexity of a task is at a minimum when the true language model is known and co...
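The rough definition of perplexity quoted above can be made concrete with the standard formula: two raised to the negative average log2 probability that a language model assigns to each word of a test sequence. The sketch below is a minimal illustration under that standard definition, not code from the paper; the function name perplexity and its input format are assumptions.

```python
import math


def perplexity(word_probabilities):
    """Perplexity of a word sequence under a language model.

    word_probabilities: the model's probability for each word in the
    sequence, given its history (each value must be in (0, 1]).
    """
    n = len(word_probabilities)
    if n == 0:
        raise ValueError("need at least one word probability")
    # Average negative log2 probability is the per-word cross-entropy in bits.
    cross_entropy = -sum(math.log2(p) for p in word_probabilities) / n
    return 2 ** cross_entropy
```

If the model spreads its probability evenly over four words at every position, each word receives probability 0.25 and the perplexity is 4, which matches the reading "average number of choices at any decision point".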
A Robust Loose Coupling for Speech Recognition and Natural Language Understanding
- IEEE, Bob O'Hara and Al
, 1995
Cited by 4 (0 self)
The focus of this thesis proposal is to improve the ability of a computational system to understand spoken utterances in a dialogue with a human. Available computational methods for word recognition do not perform as well on spontaneous speech as we would hope. Even a state of the art recognizer achieves slightly worse than 70% word accuracy on (nearly) spontaneous speech in a conversation about a specific problem. To address this problem, I will explore novel methods for post-processing the output of a speech recognizer in order to correct errors. I adopt statistical techniques for modeling the noisy channel from the speaker to the listener in order to correct some of the errors introduced there. The statistical model accounts for frequent errors such as simple word/word confusions and short phrasal problems (one-to-many word substitutions and many-to-one word concatenations). To use the model, a search algorithm is required to find the most likely correction of a given word sequence ...
An Investigation of Tightly Coupled Time Synchronous Speech Language Interfaces Using a Unification Grammar
- In Proceedings of AAAI-94 Workshop on Integration of Natural Language and Speech Processing
, 1994
Cited by 3 (0 self)
This paper reports on some experiments on time synchronous interfaces between word recognition and parsing, performed with a beam decoder and a chart parser. Using the same acoustic models, language model, and unification grammar, bottom-up and two interactive protocols were implemented and examined. Results show that close integration is possible without unbearable time penalties, if restrictions from both modules are applied to focus the search process.
1 Introduction
Integration of speech and language technology has been of growing interest for a couple of years. A variety of interfaces has been introduced between acoustic and linguistic processing. In this article, we concentrate on some variations of the time synchronous strategies that we consider prototypical. There are several reasons why these strategies are of special interest:
• Humans seem to do acoustic-linguistic processing in a tightly coupled manner.
• A partial syntactic analysis of the word sequence exists while it...
GLR-Parsing of Word Lattices
, 1994
Cited by 1 (1 self)
The goal of this thesis is to present an approach that allows the efficient integration of speech recognition and language understanding using Tomita's generalized LR-parsing algorithm. For this purpose the GLRP-algorithm is revised so that an agenda mechanism can be used to control the flow of computation of the parsing process. Subsequently two agenda strategies are presented that describe how this mechanism could be used to form a state-of-the-art coupling between the recognition and the constraint processing system. The first strategy uses an A*-search with exactly computed rest costs at each point of the search. It is reformulated so that the number of redundant parsing operations can be reduced. The second strategy uses an incremental integration of the two systems and finds the analysis which is evaluated as the best one and which is correct with a beam search method. This approach has been implemented and its performance has been tested on 10 word lattices. Keywords: Integra...
A Viterbi-based morphological analysis for speech and natural language integration
- In Proceedings of the 17th international conference on computer processing of oriental languages (ICCPOL), Hong-Kong
, 1997
Cited by 1 (0 self)
This paper presents a statistical/symbolic hybrid morphological analysis, called V-morph, for large scale speech and natural language integration for Korean. In the V-morph approach, statistical Viterbi-based lexical decoding and symbolic morphological modeling are integrated together on top of a connectionist phoneme recognition engine. Linguistic characteristics of Korean are appropriately considered in this speech and language modeling. Unlike the word-based speech and language integration used in most research on Indo-European languages, we developed a morpheme-graph as a speech and language integration model for agglutinative languages. Preliminary experiments on morpheme spotting and sentence recognition based on connectionist phoneme recognition results verify that the V-morph and morpheme-graph are viable for complex integrated speech and natural language processing, and the approaches can be extended to other morphologically-complex agglutinative languages such as Japanese.
1 Introd...
Efficient Generalized LR Parsing of Word Lattices
- Presented at Bar-Ilan Symposium on Foundations of Artificial Intelligence (BISFAI'93)
, 1993
Cited by 1 (1 self)
A word lattice is an efficient representation of a large set of possible sentence candidates, of which only a few are grammatical and thus parsable. Word lattices are a common output of some speech recognizers, and may also arise as a result of multiple part-of-speech tags of sentence words. Parsing a word lattice involves finding a path of connecting words within the lattice that is grammatical. In typical cases, where the words of the lattice are assigned probabilities or confidence scores, the goal of the parser is to find the grammatical path of highest overall score within the lattice. We describe an efficient algorithm for parsing such word lattices. Our algorithm is based on a Generalized LR style substring parser that can parse an input string in arbitrary order. An efficient computation strategy is achieved by using an A* heuristic to determine the order in which words of the lattice are processed.
1 Introduction
This paper is concerned with the problem of parsing word latt...
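To make the notion of "path of highest overall score" concrete, the sketch below finds the best-scoring word path through a small lattice by dynamic programming. It is a deliberate simplification and not the paper's algorithm: it performs no grammaticality check and uses neither GLR parsing nor an A* heuristic. The edge format, the function name best_path, and the assumption that nodes are integers numbered in time order (every edge goes from a lower to a higher node) are all made up for this example.

```python
def best_path(edges, start, end):
    """Highest-scoring word sequence from start to end in a word lattice.

    edges: list of (from_node, to_node, word, log_score) tuples, where
           from_node < to_node for every edge (nodes are in time order).
    Returns (total_log_score, [words]) or None if end is unreachable.
    """
    # best[node] holds the best score and word sequence reaching that node.
    best = {start: (0.0, [])}
    # Visiting edges by increasing source node guarantees that every path
    # into a node has been considered before its outgoing edges are used.
    for src, dst, word, score in sorted(edges, key=lambda e: e[0]):
        if src not in best:
            continue  # source node is not reachable from start
        candidate = best[src][0] + score
        if dst not in best or candidate > best[dst][0]:
            best[dst] = (candidate, best[src][1] + [word])
    return best.get(end)
```

A parser of the kind described in the abstract would additionally reject paths that the grammar cannot derive, so its answer may differ from the unconstrained best path computed here.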
Natural Language Analysis and Generation Technologies
, 1993
Analysis and generation are the two main aspects of natural language processing. In this paper we survey some of the progress made towards natural language analysis (parsing) and generation. In particular, we consider syntactic analysis and present two frequently used parsing algorithms, namely the Chart and GLR parsing algorithms. We then discuss the importance of context sensitivity in syntactic analysis by surveying probabilistic parsing methods, which are some of the recent developments made in this direction. In the second part of this paper we first give a brief survey of natural language generation research and discuss the future research direction.
Part I: Natural Language Analysis
1 Introduction
The idea of natural language processing emerged with the advent of the electronic computer. Parsing (syntactic analysis) and generation of natural languages began with the formal linguistic theory developed by N. Chomsky who classified languages into four classes: unrestricted langua...
continuous speech understanding for
, 1996
"... Integrated speech and morphological processing in a connectionist ..."

