Precise N-Gram Probabilities from Stochastic Context-Free Grammars
, 1994
Cited by 34 (5 self)
We present an algorithm for computing n-gram probabilities from stochastic context-free grammars, a procedure that can alleviate some of the standard problems associated with n-grams (estimation from sparse data, lack of linguistic structure, among others). The method operates via the computation of substring expectations, which in turn is accomplished by solving systems of linear equations derived from the grammar. The procedure is fully implemented and has proved viable and useful in practice.
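The idea of obtaining expectations by solving linear equations derived from the grammar can be illustrated for the simplest case, unigram counts. The sketch below is not the paper's algorithm (which handles general n-grams). The toy grammar, the function names, and the restriction to unigrams are all assumptions made for illustration. For each nonterminal X, the expected count c(X) of a word satisfies c(X) = sum over rules of p * (occurrences of the word in the right-hand side + c(Y) for each nonterminal Y in the right-hand side), which is a linear system (I - A) c = b.

```python
from fractions import Fraction

# Hypothetical toy grammar: nonterminal -> list of (probability, right-hand side).
# Any symbol that is not a key of the dict is treated as a terminal word.
GRAMMAR = {
    "S": [(Fraction(1, 2), ("A", "S")), (Fraction(1, 2), ("A",))],
    "A": [(Fraction(3, 4), ("a",)), (Fraction(1, 4), ("b",))],
}


def solve(matrix, rhs):
    """Solve matrix * x = rhs by Gauss-Jordan elimination on exact fractions.

    Assumes the system is non-singular (true for a proper, consistent grammar).
    """
    n = len(rhs)
    aug = [list(row) + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        # Find a row with a non-zero entry in this column and move it into place.
        pivot_row = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot_row] = aug[pivot_row], aug[col]
        pivot = aug[col][col]
        aug[col] = [v / pivot for v in aug[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [aug[i][n] for i in range(n)]


def expected_unigram_counts(grammar, word):
    """Expected number of occurrences of `word` in a string derived from each nonterminal."""
    nonterminals = sorted(grammar)
    index = {nt: i for i, nt in enumerate(nonterminals)}
    n = len(nonterminals)
    # Build the system (I - A) c = b.
    m = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    b = [Fraction(0)] * n
    for nt, rules in grammar.items():
        i = index[nt]
        for prob, symbols in rules:
            for sym in symbols:
                if sym in index:
                    m[i][index[sym]] -= prob  # contribution of a nonterminal child
                elif sym == word:
                    b[i] += prob  # direct emission of the word
    return dict(zip(nonterminals, solve(m, b)))


print(expected_unigram_counts(GRAMMAR, "a")["S"])  # 3/2
```

In this toy grammar a sentence contains two A symbols on average, and each A yields "a" with probability 3/4, so the expected count of "a" per sentence is 3/2, matching the solution of the linear system.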
The Berkeley Restaurant Project
, 1994
Cited by 28 (7 self)
This paper describes the architecture and performance of the Berkeley Restaurant Project (BeRP), a medium-vocabulary, speaker-independent, spontaneous continuous speech understanding system currently under development at ICSI. BeRP serves as a testbed for a number of our speech-related research projects, including robust feature extraction, connectionist phonetic likelihood estimation, automatic induction of multiple-pronunciation lexicons, foreign accent detection and modeling, advanced language models, and lip-reading. In addition, it has proved quite usable in its function as a database frontend, even though many of our subjects are non-native speakers of English. 1 OVERVIEW The BeRP system functions as a knowledge consultant whose domain is restaurants in the city of Berkeley, California. As a knowledge consultant, it draws inspiration from earlier consultants like VOYAGER [15]. Users ask spoken language questions of BeRP, which directs questions to the user and then queries a dat...
Extensions to Constraint Dependency Parsing for Spoken Language Processing
- COMPUTER SPEECH AND LANGUAGE
, 1995
Cited by 21 (10 self)
A text-based and spoken language processing framework based on the Constraint Dependency Grammar (CDG) developed by Maruyama [24, 25] is discussed. The scope of CDG is expanded to allow for the analysis of sentences containing lexically ambiguous words, to allow feature analysis in constraints, and to efficiently process multiple sentence candidates that are likely to arise in spoken language processing. The benefits of the CDG parsing approach are summarized. Additionally, the development of CDG grammars using our grammar tools and parser is discussed.
Towards Multi-Domain Speech Understanding with Flexible and Dynamic Vocabulary
, 2001
Cited by 14 (3 self)
In developing telephone-based conversational systems, we foresee future systems capable of supporting multiple domains and flexible vocabulary. Users can pursue several topics of interest within a single telephone call, and the system is able to switch transparently among domains within a single dialog. This system is able to detect the presence of any out-of-vocabulary (OOV) words and to automatically hypothesize the pronunciation, spelling and meaning of each. These can be confirmed with the user and the new words are subsequently incorporated into the recognizer lexicon for future use. This thesis
Integrating Language Models with Speech Recognition
- In Proceedings of the AAAI94 Workshop on the Integration of Natural Language and Speech Processing
, 1994
Cited by 11 (5 self)
The question of how to integrate language models with speech recognition systems is becoming more important as speech recognition technology matures. For the purposes of this paper, we have classified the level of integration of current and past approaches into three categories: tightly-coupled, loosely-coupled, or semi-coupled systems. We then argue that loose coupling is more appropriate given the current state of the art and given that it allows one to measure more precisely which components of the language model are most important. We will detail how the speech component in our approach interacts with the language model and discuss why we chose our language model. 1 Introduction State-of-the-art speech recognition systems achieve high recognition accuracies only on tasks that have low perplexities. The perplexity of a task is, roughly speaking, the average number of choices at any decision point. The perplexity of a task is at a minimum when the true language model is known and co...
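The informal description of perplexity in this abstract ("the average number of choices at any decision point") corresponds to the standard definition: 2 raised to the negative average log2 probability the model assigns to each word. A minimal sketch, assuming the per-word model probabilities are already available as a list (the function name is hypothetical):

```python
import math


def perplexity(word_probabilities):
    """Perplexity of a word sequence, given the model probability of each word in context."""
    n = len(word_probabilities)
    log_sum = sum(math.log2(p) for p in word_probabilities)
    return 2 ** (-log_sum / n)


# A model that always spreads its probability uniformly over 4 words
# faces 4 equally likely choices at every step.
print(perplexity([0.25] * 8))  # 4.0
```

A model that concentrates probability on the words that actually occur yields a lower perplexity, which is why the measure is used to compare how tightly a language model constrains the recognizer.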
Continuous Speech Recognition in the WAXHOLM Dialogue System
, 1996
Cited by 7 (0 self)
This paper presents the status of the continuous speech recognition engine of the WAXHOLM project. The engine is a software-only system written in portable C code. The design is flexible and different modes for phonetic pattern matching are available. In particular, artificial neural networks and standard multiple Gaussian mixtures are implemented for phone probability estimation, and for research purposes, a general mode where the input consists of a phone-graph also exists. A lexicon with multiple pronunciations for many words and a class bigram grammar are used. The lexicon and grammar constraints are represented by a lexical graph, optimised for efficient lexical decoding. The decoding is performed in a two-pass search. The first pass is a Viterbi beam search and the second is an A* stack-decoding search. Pruning strategies and memory management in the two passes are discussed in the report. Several different output formats are available. Results can be reported either on the word or phoneme level with or without the time alignment information. Multiple hypotheses can be output either as standard N-best lists or in a more compact word-graph format. Continuous speech recognition can be performed on a standard UNIX workstation in real-time with a lexicon of about 1000 words.
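The first-pass technique named in this abstract, Viterbi beam search, can be sketched on a toy hidden Markov model. This is a generic illustration in Python, not the WAXHOLM engine (which the abstract says is written in C and decodes over a lexical graph). The model, the state names, and the `beam` parameter are assumptions. At each frame, hypotheses whose log score falls more than `beam` below the best score are discarded.

```python
import math


def prune(hypotheses, beam):
    """Keep only hypotheses within `beam` (in log units) of the best score."""
    best = max(score for score, _ in hypotheses.values())
    return {s: h for s, h in hypotheses.items() if h[0] >= best - beam}


def viterbi_beam(observations, states, start, trans, emit, beam=5.0):
    """Return (log probability, state path) of the best surviving hypothesis."""
    # active maps each surviving state to (log score, best path ending in that state).
    active = {
        s: (math.log(start[s]) + math.log(emit[s][observations[0]]), [s])
        for s in states
    }
    active = prune(active, beam)
    for obs in observations[1:]:
        extended = {}
        for prev, (score, path) in active.items():
            for s in states:
                candidate = score + math.log(trans[prev][s]) + math.log(emit[s][obs])
                # Viterbi recombination: keep only the best path into each state.
                if s not in extended or candidate > extended[s][0]:
                    extended[s] = (candidate, path + [s])
        active = prune(extended, beam)
    return max(active.values(), key=lambda h: h[0])


# Hypothetical toy model with strictly positive probabilities.
STATES = ["Healthy", "Fever"]
START = {"Healthy": 0.6, "Fever": 0.4}
TRANS = {
    "Healthy": {"Healthy": 0.7, "Fever": 0.3},
    "Fever": {"Healthy": 0.4, "Fever": 0.6},
}
EMIT = {
    "Healthy": {"normal": 0.5, "cold": 0.4, "dizzy": 0.1},
    "Fever": {"normal": 0.1, "cold": 0.3, "dizzy": 0.6},
}

score, path = viterbi_beam(["normal", "cold", "dizzy"], STATES, START, TRANS, EMIT)
print(path)  # ['Healthy', 'Healthy', 'Fever']
```

A wide beam reproduces exact Viterbi decoding. A narrow beam does less work per frame but may prune the hypothesis that would eventually have won, which is the trade-off that pruning strategies have to manage.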
Parallel Viterbi Search Algorithm for Speech Recognition
- E-mail address: mertins@math.tu-clausthal.de MATHEMATISCHES INSTITUT I, UNIVERSITÄT KARLSRUHE, KAISERSTR. 12, 76128
, 1992
Cited by 4 (2 self)
The Viterbi search is an important, but computationally expensive, algorithm for speech recognition. Even with the substantial advances expected in processor technology, the massive computational resources required will remain prohibitive for operation of a speech recognition system in real time. This problem motivates the development of a parallel Viterbi search algorithm. A software implementation of a Viterbi search algorithm was written for NuMesh, a network of programmable communications routers supporting a set of digital signal processors with local memory. Communication between the processors occurs in the logical pattern of a binary tree, embedded in the physical topology of a two-dimensional Cartesian mesh. Despite the limited architecture of the routers, efficient merging and broadcasting of data were achieved by simple protocols for pipelined communication. Experimental results were collected in evaluation of an analytical model, which projects excellent scaling of performan...
Providing Computer Game Characters with Conversational Abilities
- IN PROC. OF INTELLIGENT VIRTUAL AGENT (IVA05)
, 2005
Cited by 4 (2 self)
This paper presents the NICE fairy-tale game system, which enables adults and children to engage in conversation with animated characters in a 3D world. In this paper we argue that spoken dialogue technology has the potential to greatly enrich the user's experience in future computer games. We also present some requirements that have to be fulfilled to successfully integrate spoken dialogue technology with a computer game application. Finally, we briefly describe an implemented system that has provided computer game characters with some conversational abilities that kids have interacted with in studies.
Towards a Unified Framework for Sub-lexical and Supra-lexical Linguistic Modeling
, 2002
Conversational interfaces have received much attention as a promising natural communication channel between humans and computers. A typical conversational interface consists of three major systems: speech understanding, dialog management and spoken language generation. In such a conversational interface, speech recognition as the front-end of speech understanding remains one of the fundamental challenges for establishing robust and effective human/computer communications. On the one hand, the speech recognition component in a conversational interface lives in a rich system environment. Diverse sources of knowledge are available and can potentially be beneficial to its robustness and accuracy. For example, the natural language understanding component can provide linguistic knowledge in syntax and semantics that helps constrain the recognition search space. On the other hand, the speech recognition component also faces the challenge of spontaneous speech, and it is important to address the casualness of speech using the knowledge sources available. For example, sub-lexical linguistic information would be very useful in providing linguistic support for previously unseen words, and dynamic reliability modeling may help improve recognition robustness for poorly articulated speech.

