Results 1 - 10
of
15
Modeling Out-Of-Vocabulary Words For Robust Speech Recognition
, 2000
"... This thesis concerns the problem of unknown or out-of-vocabulary (00V) words in continuous speech recognition. Most of today's state-of-the-art speech recognition systems can recognize only words that belong to some predefined finite word vocabulary. When encountering an OOV word, a speech recognize ..."
Abstract
-
Cited by 43 (5 self)
- Add to MetaCart
This thesis concerns the problem of unknown or out-of-vocabulary (00V) words in continuous speech recognition. Most of today's state-of-the-art speech recognition systems can recognize only words that belong to some predefined finite word vocabulary. When encountering an OOV word, a speech recognizer erroneously substitutes the OOV word with a similarly sounding word from its vocabulary. Furthermore, a recognition error due to an OOV word tends to spread errors into neighboring words; dramatically degrading overall recognition performance.
Algorithms for Grapheme-Phoneme Translation for English and French: Applications
- COMPUTATIONAL LINGUISTICS
, 1997
"... Letter-to-sound rules, also known as grapheme-to-phoneme rules, are important computational tools and have been used for a variety of purposes including word or name lookups for database searches and speech synthesis. These rules are especially useful when integrated into database searches on names ..."
Abstract
-
Cited by 34 (0 self)
- Add to MetaCart
Letter-to-sound rules, also known as grapheme-to-phoneme rules, are important computational tools and have been used for a variety of purposes including word or name lookups for database searches and speech synthesis. These rules are especially useful when integrated into database searches on names and ad-dresses, since they can complement orthographic search algorithms that make use of permutation, deletion, and insertion by allowing for a comparison with the phonetic equivalent. In databases, phonetics can help retrieve a word or a proper name without the user needing to know the correct spelling. A phonetic index is built with the vocabulary of the application. This could be an entire dictionary, or a list of proper names. The searched word is then converted into phonetics and retrieved with its information, if the word is in the phonetic index. This phonetic lookup can be used to retrieve a misspelled word in a dictionary or a database, or in a text editor to suggest corrections. Such rules are also necessary to formalize grapheme-phoneme correspondences in speech synthesis architecture. In text-to-speech systems, these rules are typically used to create phonemes
ANGIE: A new framework for speech analysis based on morpho-phonological modelling
- in Proc. ICSLP '96
, 1996
"... This paper describes a new system for speech analysis, ANGIE, which characterizes word substructure in terms of a trainable grammar. ANGIE capture morpho-phonemic and phonological phenomena through a hierarchical framework. The terminal categories can be alternately letters or phone units, yielding ..."
Abstract
-
Cited by 25 (12 self)
- Add to MetaCart
This paper describes a new system for speech analysis, ANGIE, which characterizes word substructure in terms of a trainable grammar. ANGIE capture morpho-phonemic and phonological phenomena through a hierarchical framework. The terminal categories can be alternately letters or phone units, yielding a reversible letter-tosound/sound-to-letter system. In conjunction with a segment network and acoustic phone models, the system can produce phonemicto-phonetic alignments for speech waveforms. For speech recognition, ANGIE uses a one-pass bottom-up best-first search strategy. Evaluated in the ATIS domain, ANGIE achieveda phone error rate of 36%, as compared with 40 % achieved with a baseline phone-bigram based recognizer under similar conditions. ANGIE potentially offers many attractive features, including dynamic vocabulary adaptation, as well as a framework for handling unknown words.
A Statistical Text-To-Phone Function Using Ngrams And Rules
- in ICASSP
, 1999
"... Adopting concepts from statistical language modeling and rulebased transformations can lead to effective and efficient text-tophone (TTP) functions. We present here the methods and results of one such effort, resulting in a relatively compact and fast set of TTP rules that achieves 94.5% segmental p ..."
Abstract
-
Cited by 14 (1 self)
- Add to MetaCart
Adopting concepts from statistical language modeling and rulebased transformations can lead to effective and efficient text-tophone (TTP) functions. We present here the methods and results of one such effort, resulting in a relatively compact and fast set of TTP rules that achieves 94.5% segmental phonemic accuracy. 1.
Towards Multi-Domain Speech Understanding with Flexible and Dynamic Vocabulary
, 2001
"... In developing telephone-based conversational systems, we foresee future systems capable of supporting multiple domains and flexible vocabulary. Users can pursue several topics of interest within a single telephone call, and the system is able to switch transparently among domains within a single dia ..."
Abstract
-
Cited by 14 (3 self)
- Add to MetaCart
In developing telephone-based conversational systems, we foresee future systems capable of supporting multiple domains and flexible vocabulary. Users can pursue several topics of interest within a single telephone call, and the system is able to switch transparently among domains within a single dialog. This system is able to detect the presence of any out-of-vocabulary (OOV) words, and automatically hypothesizes each of their pronunciation, spelling and meaning. These can be confirmed with the user and the new words are subsequently incorporated into the recognizer lexicon for future use. This thesis
The Use Of Linguistic Hierarchies In Speech Understanding
- IN PROC. ICSLP
, 1998
"... This paper describes two related systems which provide frameworks for encoding linguistic knowledge into formal rules within the context of a trainable probabilistic model. The first system, TINA [33], drives top-down from sentence level structure, terminating in either words or syllables. Its main ..."
Abstract
-
Cited by 13 (6 self)
- Add to MetaCart
This paper describes two related systems which provide frameworks for encoding linguistic knowledge into formal rules within the context of a trainable probabilistic model. The first system, TINA [33], drives top-down from sentence level structure, terminating in either words or syllables. Its main purpose is to provide a meaning representation for the sentence. The other system, ANGIE [36], operates bottom-up from phonetic or orthographic units, characterizing the substructure of syllables/words. It provides a framework for both phonological rule modelling and letter-to-sound/sound-to-letter transformations. The two systems logically converge on the syllable or word layer. We have recently been successful in integrating their combined constraint into a recognizer search, achieving considerable improvement in understanding accuracy [9, 23]. In this paper, I will look both toward the past and the future, identifying and motivating the decisions that were made in the design of TINA and ANGIE and the associated rule formalisms, and contemplating various remaining open research issues.
Pronunciation Modeling in Speech Synthesis
, 1998
"... iii ACKNOWLEDGMENTS I am very pleased to have had the encouragement and support of a committee of three linguists for whom I have the greatest respect and admiration: Mark Liberman, William Labov and Eugene Buckley. Each of them made my transition back to Penn pleasant after what seemed like a long ..."
Abstract
-
Cited by 4 (0 self)
- Add to MetaCart
iii ACKNOWLEDGMENTS I am very pleased to have had the encouragement and support of a committee of three linguists for whom I have the greatest respect and admiration: Mark Liberman, William Labov and Eugene Buckley. Each of them made my transition back to Penn pleasant after what seemed like a long absence. It was a great pleasure to have Mark Randolph both as an external reader and as a colleague at Motorola. Mark’s work at MIT a decade ago has served as an inspiration to me. Orhan Karaali made this dissertation possible in this millennium. As my manager for over two years at Motorola, Orhan insisted on making my dissertation a priority at work. Harry Bliss provided his voice to this project and our whole group is very grateful for his patience and cooperation. My colleagues at Motorola listened to my ideas and provided technical and theoretical assistance at every turn: Noel
A Semi-Automatic System for the Syllabification and Stress Assignment of Large Lexicons
, 1997
"... This Master's Thesis concerns research in the automatic analysis of the sub-lexical structure of English words. Sub-lexical structure includes linguistic categories such as syllabification, stress, phonemic representation, phonetics, and spelling. This information could be very useful in all sorts o ..."
Abstract
-
Cited by 2 (0 self)
- Add to MetaCart
This Master's Thesis concerns research in the automatic analysis of the sub-lexical structure of English words. Sub-lexical structure includes linguistic categories such as syllabification, stress, phonemic representation, phonetics, and spelling. This information could be very useful in all sorts of speech applications, including duration modeling and speech recognition. Angie is a system that can parse words, given either their phonetic or orthographic representation, into a common hierarchical framework with the categories mentioned above. A new feature enforcing morphological constraints has recently been added to this paradigm. We define "morphs" to be somewhat like syllable units of a word, but each of them are tagged morphologically, and associated with both an orthographic sequence and a phonemic representation. Each word is represented as concatenations of these morphs, which then encode both the orthography and the phonemics of the word. This thesis defines a procedure to se...
A Diphone-Based Text-to-Speech System for Scottish Gaelic
, 1997
"... In this thesis, a diphone--based text--to--speech system for Scottish Gaelic, a language spoken by about 80.000 native speakers in Scotland and Canada, is presented. Text-- to--speech systems convert orthographic text input into speech output. The present system consists of two main parts: ffl an a ..."
Abstract
-
Cited by 1 (1 self)
- Add to MetaCart
In this thesis, a diphone--based text--to--speech system for Scottish Gaelic, a language spoken by about 80.000 native speakers in Scotland and Canada, is presented. Text-- to--speech systems convert orthographic text input into speech output. The present system consists of two main parts: ffl an automatic phonetic transcription module which produces an orthophonic transcription of the orthographic input text ffl a speech synthesis module which synthesizes an utterance from its transcription by concatenating and modifying previously recorded speech units. Diphones, speech units that cover two sounds and the transition between them, form the basis of the synthesis module. Duration and intonation are modelled on the basis of simple heuristics. The diphone inventory was designed for the Gaelic of Bayble, Lewis. Scottish Gaelic distinguishes four main phonetic settings: velarised, palatalised, nasalised, and neutral. As the domain of these settings is the syllable, they are difficult t...
Framework for Joint Recognition of Pronounced and Spelled Proper Names
, 2000
"... In this thesis, the framework for a proper name recognition system is studied. We attempt to combine the information in both the spelling and the pronunciation of a proper name and find ways to let both sources of information improve the recognition accuracies of that name from the accuracies obta ..."
Abstract
-
Cited by 1 (0 self)
- Add to MetaCart
In this thesis, the framework for a proper name recognition system is studied. We attempt to combine the information in both the spelling and the pronunciation of a proper name and find ways to let both sources of information improve the recognition accuracies of that name from the accuracies obtained by performing the spelling recognition and pronunciation recognition tasks separately. A set of "morph" sub-units is introduced. These syllable-sized units capture both orthographic and phonemic information of the names they represent. Our morphs are arbitrarily categorized into six groups: prefix, onset, rhyme, uroot, dsuf and isuf. Certain combinations of these morph categories, defined by a specific word decomposition rule, are used to represent proper names. For each proper name, the name-spelling utterance is recognized by a letter recognizer. Then, the proposed letter sequences are parsed by the TINA parser into possible sequences of morphs. These sequences of morphs are used t...

