Lexical Modeling in a Speaker Independent Speech Understanding System
, 1993
Cited by 39 (8 self)
Over the past 40 years, significant progress has been made in the fields of speech recognition and speech understanding. Current state-of-the-art speech recognition systems are capable of achieving word-level accuracies of 90% to 95% on continuous speech recognition tasks using 5000 words. Even larger systems, capable of recognizing 20,000 words, are just now being developed. Speech understanding systems have recently been developed that perform fairly well within a restricted domain. While the size and performance of modern speech recognition and understanding systems are impressive, it is evident to anyone who has used these systems that the technology is primitive compared to our own human ability to understand speech. Some of the difficulties hampering progress in the fields of speech recognition and understanding stem from the many sources of variation that occur during human communication. One of the sources of variation that occurs in human communication is the different ways that words can be pronounced. There are many causes of pronunciation variation, such as: the phonetic environment in which the word occurs, the dialect of the speaker,
Probabilistic Methods in Spoken Dialogue Systems
- Philosophical Transactions of the Royal Society (Series A)
, 1999
Cited by 24 (5 self)
This paper presents a probabilistic framework for modelling spoken dialogue systems. On the assumption that the overall system behaviour can be represented as a Markov Decision Process, the optimisation of dialogue management strategy using reinforcement learning is reviewed. Examples of learning behaviour are presented for both dynamic programming and sampling methods, but the latter is preferred. The paper concludes by noting the importance of user simulation models for the practical application of these techniques and the need for developing methods of mapping system features in order to achieve sufficiently compact state spaces.
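As a rough illustration of the sampling-based approach this abstract mentions, the sketch below runs tabular Q-learning on a toy slot-filling dialogue. It is not taken from the paper: the two-slot state space, the "ask" and "close" actions, the rewards, and the 0.8 recognition success rate are all assumptions chosen to keep the example small.

```python
import random

ACTIONS = ("ask", "close")
N_SLOTS = 2  # assumed state: number of slots filled so far (0, 1 or 2)

def step(state, action, rng):
    """Return (next_state, reward, done) for the toy dialogue MDP."""
    if action == "close":
        # Closing is rewarded only when every slot has been filled.
        return state, (10.0 if state == N_SLOTS else -10.0), True
    # "ask" costs one turn; the next slot is filled with probability 0.8.
    if state < N_SLOTS and rng.random() < 0.8:
        state += 1
    return state, -1.0, False

def q_learning(episodes=5000, alpha=0.1, gamma=1.0, epsilon=0.2, seed=0):
    """Estimate action values from sampled dialogues (epsilon-greedy)."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_SLOTS + 1) for a in ACTIONS}
    for _ in range(episodes):
        state, done, turns = 0, False, 0
        while not done and turns < 50:  # cap dialogue length
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])
            nxt, reward, done = step(state, action, rng)
            best_next = 0.0 if done else max(q[(nxt, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state, turns = nxt, turns + 1
    return q

def greedy_policy(q):
    """Pick the highest-valued action in each state."""
    return {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_SLOTS + 1)}
```

Calling `greedy_policy(q_learning())` yields a dialogue strategy for this toy model. A realistic system would need a far larger state space, which is why the abstract stresses compact state representations and user simulation.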
A Phonetic Model of English Intonation
, 1992
Cited by 14 (6 self)
This thesis proposes a phonetic model of English intonation which is a system for linking the phonological and F0 descriptions of an utterance. It is argued that such a model should take the form of a rigorously defined formal system which does not require any human intuition or expertise to operate. It is also argued that this model should be capable of both analysis (F0 to phonology) and synthesis (phonology to F0). Existing phonetic models are reviewed and it is shown that none meet the specification for the type of formal model required. A new phonetic model is presented that has three levels of description: the F0 level, the intermediate level and the phonological level. The intermediate level uses the three basic elements of rise, fall and connection to model F0 contours. A mathematical equation is specified for each of these elements so that a continuous F0 contour can be created from a sequence of elements. The phonological system uses H and L to describe high and low pi...
Automatic Acquisition of Language Models for Speech Recognition
, 1994
Cited by 14 (3 self)
This thesis focuses on the automatic acquisition of language structure and the subsequent use of the learned language structure to improve the performance of a speech recognition system. First, we develop a grammar inference process which is able to learn a grammar describing a large set of training sentences. The process of acquiring this grammar is one of generalization so that the resulting grammar predicts likely sentences beyond those contained in the training set. From the grammar we construct a novel probabilistic language model called the phrase class n-gram model (pcng), which is a natural generalization of the word class n-gram model [11] to phrase classes. This model utilizes the grammar in such a way that it maintains full coverage of any test set while at the same time reducing the complexity, or number of parameters, of the resulting predictive model. Positive results are shown in terms of perplexity of the acquired phrase class n-gram models and in terms of reduction of ...
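For orientation, the sketch below shows the word class bigram that the abstract names as the starting point, in which P(w | w_prev) is approximated by P(c | c_prev) * P(w | c). It is not the phrase class n-gram model itself, which would substitute grammar-derived phrase classes for the word classes. The class map, the toy sentences, and the function names are assumptions, and the estimates are unsmoothed.

```python
from collections import Counter

# Hypothetical word-to-class map; words without a shared class map to themselves.
WORD_CLASS = {"boston": "CITY", "denver": "CITY",
              "to": "to", "from": "from", "fly": "fly"}

def train(sentences):
    """Collect class-bigram counts and word-given-class counts."""
    bigram, context, word_count, class_count = Counter(), Counter(), Counter(), Counter()
    for sent in sentences:
        classes = ["<s>"] + [WORD_CLASS[w] for w in sent] + ["</s>"]
        for prev, cur in zip(classes, classes[1:]):
            bigram[(prev, cur)] += 1
            context[prev] += 1
        for w in sent:
            word_count[w] += 1
            class_count[WORD_CLASS[w]] += 1
    return bigram, context, word_count, class_count

def prob(model, prev_word, word):
    """Unsmoothed P(word | prev_word) = P(class | prev_class) * P(word | class)."""
    bigram, context, word_count, class_count = model
    c_prev = "<s>" if prev_word is None else WORD_CLASS[prev_word]
    c = WORD_CLASS[word]
    p_class = bigram[(c_prev, c)] / context[c_prev]
    p_word = word_count[word] / class_count[c]
    return p_class * p_word

model = train([["fly", "from", "boston", "to", "denver"],
               ["fly", "to", "boston"]])
# "from denver" never occurs in the training data, yet it receives
# probability mass because "from" was followed by another CITY word.
p_unseen = prob(model, "from", "denver")
```

Sharing statistics across a class is what lets such a model cover unseen word pairs with fewer parameters than a word bigram, which is the trade-off the abstract describes.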
Phonological Parsing for Bi-directional Letter-to-Sound/Sound-to-Letter Generation
- Journal of Speech Communication
, 1995
Cited by 14 (2 self)
In this paper, we describe a reversible letter-to-sound/sound-to-letter generation system based on an approach which combines a rule-based formalism with data-driven techniques. We adopt a probabilistic parsing strategy to provide a hierarchical lexical analysis of a word, including information such as morphology, stress, syllabification, phonemics and graphemics. Long-distance constraints are propagated by enforcing local constraints throughout the hierarchy. Our training and testing corpora are derived from the high-frequency portion of the Brown Corpus (10,000 words), augmented with markers indicating stress and word morphology. We evaluated our performance based on an unseen test set. The percentages of nonparsable words for letter-to-sound and sound-to-letter generation were 6% and 5% respectively. Of the remaining words, our system achieved a word accuracy of 71.8% and a phoneme accuracy of 92.5% for letter-to-sound generation, and a word accuracy of 55.8% and a letter accuracy of 89.4% for sound-to-letter generation. We also compared our hierarchical approach with an alternative, single-layer approach to demonstrate how the hierarchy provides a parsimonious description for English orthographic-phonological regularities, while simultaneously attaining competitive generation accuracy.
The Use Of Linguistic Hierarchies In Speech Understanding
- In Proc. ICSLP
, 1998
Cited by 13 (6 self)
This paper describes two related systems which provide frameworks for encoding linguistic knowledge into formal rules within the context of a trainable probabilistic model. The first system, TINA [33], drives top-down from sentence level structure, terminating in either words or syllables. Its main purpose is to provide a meaning representation for the sentence. The other system, ANGIE [36], operates bottom-up from phonetic or orthographic units, characterizing the substructure of syllables/words. It provides a framework for both phonological rule modelling and letter-to-sound/sound-to-letter transformations. The two systems logically converge on the syllable or word layer. We have recently been successful in integrating their combined constraint into a recognizer search, achieving considerable improvement in understanding accuracy [9, 23]. In this paper, I will look both toward the past and the future, identifying and motivating the decisions that were made in the design of TINA and ANGIE and the associated rule formalisms, and contemplating various remaining open research issues.
Multilingual Human-Computer Interactions: From Information Access To Language Learning
- In Proc. ICSLP
, 1996
Cited by 5 (4 self)
This paper describes our recent work in developing multilingual conversational systems that support human-computer interactions. Our approach is based on the premise that a common semantic representation can be extracted from the input for all languages, at least within the context of restricted domains. In our design of such systems, language dependent information is separated from the system kernel as much as possible, and encoded in external data structures. The internal system manager, discourse and dialogue component, and database are all maintained in a language transparent form. We will describe two possible application areas for such multilingual capabilities: on-line information access using multilingual spoken dialogue, and the learning and maintenance of a foreign language using a multilingual conversational system.
Integrating Experimental Models of Syntax, Phonology, and Accent/Dialect in a Speech Recognizer
- In Proceedings of the 12th National Conference on Artificial Intelligence Workshop on the Integration of Natural Language and Speech Processing
, 1994
Cited by 4 (0 self)
This paper describes three preliminary experiments in adding new language knowledge to the recognizer BeRP:
Integrating A Context-Dependent Phrase Grammar In The Variable N-Gram Framework
, 2000
This paper focuses on the learning of multi-word lexical units, or phrases, and how to model them within the variable n-gram framework. We introduce the notion of context-dependent phrases and suggest an algorithm for unsupervised learning of phrases. Also, we propose an approach to integrate a phrase grammar and a variable n-gram without the need to handle multi-word lexical items explicitly. The combined variable n-gram phrase grammar improves recognition accuracy on the Switchboard corpus over both the baseline trigram and the variable n-gram alone. 1. INTRODUCTION Although words in English are reasonable lexical units for language modeling, there are many cases in which longer lexical units may be more appropriate. Frequently used word sequences, such as "I mean" or "you know", are so common in conversational speech that they may be effectively used by the speaker as a single lexical item. We call these multi-word units "phrases". There are several ways of treating a multi-word sequ...
Viterbi Beam Search with Layered Bigrams
, 1996
We outline an implementation of Viterbi beam search that incorporates layered bigrams. Layered bigrams are class bigrams in which some nodes are themselves bigrams, resulting in a recursive structure. The implementation is in C++ and involves a hierarchy of classes. The paper outlines the main concepts and the corresponding C++ classes.
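To make the search technique concrete, the sketch below implements Viterbi beam search over a plain, flat bigram. It does not reproduce the layered structure or the C++ class hierarchy the paper describes, and it is written in Python purely for brevity. The input format, the function name, and the toy scores are assumptions.

```python
import math

def viterbi_beam(observations, bigram_logp, beam=5.0, start="<s>"):
    """Return (best_log_score, best_word_sequence).

    observations: list of dicts, one per step, mapping candidate word
                  to an acoustic log score (assumed input format).
    bigram_logp:  dict mapping (prev_word, word) to a log probability.
    """
    # One hypothesis per distinct last word: word -> (score, path)
    hyps = {start: (0.0, [])}
    for candidates in observations:
        new_hyps = {}
        for prev, (score, path) in hyps.items():
            for word, acoustic in candidates.items():
                lm = bigram_logp.get((prev, word))
                if lm is None:
                    continue  # transition not covered by the bigram
                cand = score + lm + acoustic
                # Viterbi recombination: keep the best path ending in `word`
                if word not in new_hyps or cand > new_hyps[word][0]:
                    new_hyps[word] = (cand, path + [word])
        if not new_hyps:
            return -math.inf, []
        best = max(s for s, _ in new_hyps.values())
        # Beam pruning: drop hypotheses more than `beam` below the best
        hyps = {w: h for w, h in new_hyps.items() if h[0] >= best - beam}
    return max(hyps.values(), key=lambda h: h[0])

# Toy example with made-up probabilities
obs = [{"i": math.log(0.6), "eye": math.log(0.4)},
       {"mean": math.log(0.5), "bean": math.log(0.5)}]
lm = {("<s>", "i"): math.log(0.7), ("<s>", "eye"): math.log(0.3),
      ("i", "mean"): math.log(0.8), ("i", "bean"): math.log(0.2),
      ("eye", "mean"): math.log(0.5), ("eye", "bean"): math.log(0.5)}
score, words = viterbi_beam(obs, lm)
```

A narrower `beam` discards more hypotheses at each step, trading search accuracy for speed. In a layered variant, a node of the bigram would itself expand into a nested bigram rather than a single word.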

