Open-Vocabulary Speech Indexing for Voice and Video Mail Retrieval
1996
Cited by 39 (2 self)
This paper presents recent work on a multimedia retrieval project at Cambridge University and Olivetti Research Limited (ORL). We present novel techniques that allow extremely rapid audio indexing, at rates approaching several thousand times real time. Unlike other methods, these techniques do not depend on a fixed vocabulary recognition system or on keywords that must be known well in advance. Using statistical methods developed for text, these indexing techniques allow rapid and efficient retrieval and browsing of audio and video documents. This paper presents the project background, the indexing and retrieval techniques, and a video mail retrieval application incorporating content-based audio indexing, retrieval, and browsing.
Automatic Generation Of Detailed Pronunciation Lexicons
1995
Cited by 36 (3 self)
We explore different ways of "spelling" a word in a speech recognizer's lexicon and how to obtain those spellings. In particular, we compare using as the source of sub-word units for which we build acoustic models (1) a coarse phonemic representation, (2) a single, fine phonetic realization, and (3) multiple phonetic realizations with associated likelihoods. We describe how we obtain these different pronunciations from text-to-speech systems and from procedures that build decision trees trained on phonetically labeled corpora. We evaluate these methods applied to speech recognition with the DARPA Resource Management (RM) and the North American Business News (NAB) tasks. For the RM task (with perplexity 60 grammar), we obtain 93.4% word accuracy using phonemic pronunciations, 94.1% using a single phonetic pronunciation per word, and 96.3% using multiple phonetic pronunciations per word with associated likelihoods. For the NAB task (with 60K vocabulary and 34M 1-5 grams), we obtain 87.3% word accuracy with phonemic pronunciations and 90.0% using multiple phonetic pronunciations
Pronunciation by Analogy: Impact of Implementational Choices on Performance
1997
Cited by 20 (9 self)
Pronunciation by analogy (PbA) is an emerging, data-driven technique with potential application in text-to-speech (TTS) systems, as well as being an influential psychological model of reading aloud. The underlying idea is that a pronunciation for an unknown word (i.e. one not in the dictionary, or lexicon, of the human or machine 'reader') is assembled by matching substrings of the input to substrings of known, lexical words, hypothesising a partial pronunciation for each matched substring from the lexical knowledge of the 'reader', and concatenating the partial pronunciations. This paper assesses the capability of PbA to derive pronunciations for unknown words of English. As a psychological model, PbA is 'underspecified', i.e. the implementor of a simulation of the process faces detailed choices which can only be resolved by trial and error. One goal for this paper is to explore the impact of certain basic implementational choices on the performance of PbA systems. The variables stud...
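The substring-matching idea in the abstract above can be illustrated with a toy sketch. This is not the method evaluated in the paper: it assumes a hypothetical three-word lexicon in which every letter is pre-aligned with exactly one phoneme symbol (an empty string marks a silent letter), and it replaces the pronunciation lattice used in published PbA systems with a much simpler greedy, non-overlapping, longest-match-first cover of the input. The names `LEXICON`, `partial_pronunciations`, and `pronounce` are invented for illustration.

```python
# Toy lexicon (hypothetical data): each spelling is aligned letter by letter
# with a phoneme symbol; "" marks a silent letter.
LEXICON = {
    "cat": ["k", "ae", "t"],
    "bath": ["b", "ae", "th", ""],
    "pin": ["p", "ih", "n"],
}


def partial_pronunciations(lexicon):
    """Map every substring of every lexical word to its aligned phonemes.

    If the same substring occurs in several words, the first word seen wins;
    real PbA systems keep all candidates and score them.
    """
    table = {}
    for word, phones in lexicon.items():
        for i in range(len(word)):
            for j in range(i + 1, len(word) + 1):
                table.setdefault(word[i:j], phones[i:j])
    return table


def pronounce(word, lexicon):
    """Assemble a pronunciation by greedy longest-match substring cover.

    Returns a list of phoneme symbols, or None if some letter of the input
    never occurs in the lexicon.
    """
    table = partial_pronunciations(lexicon)
    result = []
    pos = 0
    while pos < len(word):
        # Try the longest remaining substring first, then shorter ones.
        for end in range(len(word), pos, -1):
            chunk = word[pos:end]
            if chunk in table:
                # Concatenate the partial pronunciation, dropping silent slots.
                result.extend(p for p in table[chunk] if p)
                pos = end
                break
        else:
            return None  # no lexical substring covers this position
    return result


print(pronounce("pat", LEXICON))  # "p" from "pin", "at" from "cat"
print(pronounce("bin", LEXICON))  # "b" from "bath", "in" from "pin"
```

Here "pat" is not in the lexicon, so its pronunciation is assembled from the "p" of "pin" and the "at" of "cat". The greedy cover is only one of the many implementational choices that the abstract says must be resolved when building a PbA system.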
Grapheme-to-Phoneme Conversion using Multiple Unbounded Overlapping Chunks
1996
Cited by 16 (2 self)
We present in this paper an original extension of two data-driven algorithms for the transcription of a sequence of graphemes into the corresponding sequence of phonemes. In particular, our approach, a formal extension of the algorithm reported in [20], generalizes the algorithm originally proposed by Dedina and Nusbaum (D&N) [7], which had originally been promoted as a model of the human ability to pronounce unknown words by analogy to familiar lexical items. We will show that D&N's algorithm performs comparatively poorly when evaluated on a realistic test set, and that our extension allows us to improve substantially the performance of the analogy-based model. We will also suggest that both algorithms can be reformulated in a much more general framework, which allows us to anticipate other useful extensions. However, considering the inability to define in these models important notions like lexical neighborhood, we conclude that both approaches fail to offer a proper model of the ana...
Phonological Parsing for Bi-directional Letter-to-Sound/Sound-to-Letter Generation
Journal of Speech Communication, 1995
Cited by 14 (2 self)
In this paper, we describe a reversible letter-to-sound/sound-to-letter generation system based on an approach which combines a rule-based formalism with data-driven techniques. We adopt a probabilistic parsing strategy to provide a hierarchical lexical analysis of a word, including information such as morphology, stress, syllabification, phonemics and graphemics. Long-distance constraints are propagated by enforcing local constraints throughout the hierarchy. Our training and testing corpora are derived from the high-frequency portion of the Brown Corpus (10,000 words), augmented with markers indicating stress and word morphology. We evaluated our performance based on an unseen test set. The percentages of nonparsable words for letter-to-sound and sound-to-letter generation were 6% and 5%, respectively. Of the remaining words, our system achieved a word accuracy of 71.8% and a phoneme accuracy of 92.5% for letter-to-sound generation, and a word accuracy of 55.8% and a letter accuracy of 89.4% for sound-to-letter generation. We also compared our hierarchical approach with an alternative, single-layer approach to demonstrate how the hierarchy provides a parsimonious description for English orthographic-phonological regularities, while simultaneously attaining competitive generation accuracy.
A comparison of Anapron with seven other name-pronunciation systems
Journal of the American Voice Input/Output Society, 1993
Cited by 8 (0 self)
This paper presents an experiment comparing a new name-pronunciation system, Anapron, with seven existing systems: three state-of-the-art commercial systems (from Bellcore, Bell Labs, and DEC), two variants of a machine-learning system (NETtalk), and two humans. Anapron works by combining rule-based and case-based reasoning. It is based on the idea that it is much easier to improve a rule-based system by adding case-based reasoning to it than by tuning the rules to deal with every exception. In the experiment described here, Anapron used a set of rules adapted from MITalk and elementary foreign-language textbooks, and a case library of 5000 names. With these components, which required relatively little knowledge engineering, Anapron was found to perform almost at the level of the commercial systems, and significantly better than the two versions of NETtalk.
Self-Learning Techniques for Grapheme-to-Phoneme Conversion
1994
Cited by 5 (1 self)
In this article, we present a comprehensive review of various experiences with different self-learning techniques applied to the task of converting a graphemic string into the corresponding phonemic sequence. We also report some experiments carried out both with English words and French proper names. These experiments support the view that taking full advantage of the huge pronunciation dictionaries that we have been developing during the ONOMASTICA project is possible only if the traditional understanding of grapheme-to-phoneme conversion as a classification problem is questioned.
Name pronunciation in German text-to-speech synthesis
In Proc. 5th Conf. on Applied Natural Language Processing, 1997
Cited by 5 (2 self)
We describe the name analysis and pronunciation component in the German version of the Bell Labs multilingual text-to-speech system. We concentrate on street names because they encompass interesting aspects of geographical and personal names. The system was implemented in the framework of finite-state transducer technology, using linguistic criteria as well as frequency distributions derived from a database. In evaluation experiments, we compared the performances of the general-purpose text analysis and the name-specific system on training and test materials. The name-specific system significantly outperforms the generic system. The error rates compare favorably with results reported in the research literature. Finally, we discuss areas for future work.
Pronunciation Modeling in Speech Synthesis
1998
Cited by 4 (0 self)
Acknowledgments: I am very pleased to have had the encouragement and support of a committee of three linguists for whom I have the greatest respect and admiration: Mark Liberman, William Labov and Eugene Buckley. Each of them made my transition back to Penn pleasant after what seemed like a long absence. It was a great pleasure to have Mark Randolph both as an external reader and as a colleague at Motorola. Mark’s work at MIT a decade ago has served as an inspiration to me. Orhan Karaali made this dissertation possible in this millennium. As my manager for over two years at Motorola, Orhan insisted on making my dissertation a priority at work. Harry Bliss provided his voice to this project and our whole group is very grateful for his patience and cooperation. My colleagues at Motorola listened to my ideas and provided technical and theoretical assistance at every turn: Noel
Recent Advances In Multilingual Text-To-Speech Synthesis
In Fortschritte der Akustik, DAGA '96, 1996

