A Computational Grammar Of Discourse-Neutral Prosodic Phrasing In English
Computational Linguistics, 1990
Cited by 61 (0 self)
This paper reconsiders those assumptions and describes an analysis of phrasing that we believe corrects many of the problems of the earlier version. Like the earlier version, it has been implemented in a text-to-speech system that uses a natural language parser and prosody rules to generate information about the location and relative strength of prosodic phrase boundaries.
A Multi-Strategy Approach to Improving Pronunciation by Analogy
Cited by 25 (3 self)
Pronunciation by analogy (PbA) is a data-driven method for relating letters to sound, with potential application to next-generation text-to-speech systems. This paper extends previous work on PbA in several directions. First, we have included `full' pattern matching between input letter string and dictionary entries, as well as including lexical stress in letter-to-phoneme conversion. Second, we have extended the method to phoneme-to-letter conversion. Third, and most important, we have experimented with multiple, different strategies for scoring the candidate pronunciations. Individual scores for each strategy are obtained on the basis of rank and either multiplied or summed to produce a final, overall score. Five strategies have been studied and results obtained from all 31 possible combinations. The two combination methods perform comparably, with the product rule only very marginally superior to the sum rule. Nonparametric statistical analysis reveals that performance improves as more strategies are included in the combination: this trend is very highly significant (p < 0.0005). Accordingly for letter-to-phoneme conversion, best results are obtained when all five strategies are combined: word accuracy is raised to 65.5% relative to 61.7% for our best previous result and 63.0% for the best-performing single strategy. These improvements are very highly significant (p ≈ 0 and p < 0.00011 respectively). Similar results were found for phoneme-to-letter and letter-to-stress conversion, although the former was an easier problem for PbA than letter-to-phoneme conversion and the latter was harder. The main sources of error for the multi-strategy approach are very similar to those for the best single strategy, and mostly involve vowel letters and phonemes.
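The rank-based combination described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the strategy names, the raw scores, the tie handling and the function names `rank_points`, `combine` and `all_combinations` are all assumptions made for illustration. The sketch converts each strategy's raw scores into rank points, fuses them with either the product rule or the sum rule, and enumerates the non-empty subsets of a set of strategies (31 subsets for five strategies).

```python
from itertools import combinations
from math import prod


def rank_points(scores):
    """Turn raw scores (higher is better) into rank points.

    A candidate earns one point for every candidate it ties or beats,
    so the best of N candidates gets N points. Tie handling is an
    assumption of this sketch.
    """
    values = list(scores.values())
    return {cand: sum(1 for v in values if v <= s) for cand, s in scores.items()}


def combine(per_strategy, strategies, rule="product"):
    """Fuse the rank points of the chosen strategies for each candidate."""
    fuse = prod if rule == "product" else sum
    points = [rank_points(per_strategy[name]) for name in strategies]
    return {cand: fuse(p[cand] for p in points) for cand in points[0]}


def all_combinations(strategies):
    """Every non-empty subset of the strategies (2**n - 1 subsets)."""
    return [
        combo
        for size in range(1, len(strategies) + 1)
        for combo in combinations(strategies, size)
    ]


# Hypothetical raw scores from two made-up strategies for three candidates.
per_strategy = {
    "s1": {"A": 0.9, "B": 0.5, "C": 0.1},
    "s2": {"A": 0.2, "B": 0.8, "C": 0.3},
}
product_scores = combine(per_strategy, ["s1", "s2"], rule="product")
sum_scores = combine(per_strategy, ["s1", "s2"], rule="sum")
best_candidate = max(product_scores, key=product_scores.get)
```

With these made-up scores both rules pick the same candidate, which mirrors the abstract's remark that the two combination methods perform comparably, although a toy example proves nothing about real data.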
Evaluating the Pronunciation Component of Text-to-Speech Systems for English: A Performance Comparison of Different Approaches
In Speech and Language Technology (SALT) Club Workshop on Evaluation in Speech and Language Technology, 1997
Cited by 24 (8 self)
The automatic derivation of word pronunciations from input text is a central task for any text-to-speech system. For general English text at least, this is often thought to be a solved problem, with manually-derived linguistic rules assumed capable of handling `novel' words missing from the system dictionary. Data-driven methods, based on machine learning of the regularities implicit in a large pronouncing dictionary, have received considerable attention recently but are generally thought to perform less well. However, these tentative beliefs are at best uncertain without powerful methods for comparing text-to-phoneme subsystems. This paper contributes to the development of such methods by comparing the performance of four representative approaches to automatic phonemisation on the same test dictionary. As well as rule-based approaches, three data-driven techniques are evaluated: pronunciation by analogy (PbA), NETspeak and IB1-IG (a modified k-nearest neighbour method). Issues involved in comparative evaluation are detailed and elucidated. The data-driven techniques outperform rules in accuracy of letter-to-phoneme translation by a very significant margin but require aligned text-phoneme training data and are slower. Best translation results are obtained with PbA at approximately 72% words correct on a reasonably large pronouncing dictionary, compared to something like 26% words correct for the rules, indicating that automatic pronunciation of text is not a solved problem.
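The "words correct" figure quoted in this abstract counts a word as correct only when its entire phoneme string matches the reference. A minimal Python sketch of such a metric follows; the function name `words_correct`, the phoneme symbols and the tiny test dictionary are hypothetical and are not taken from the paper.

```python
def words_correct(predicted, reference):
    """Percentage of reference words whose full pronunciation is reproduced.

    Both arguments map a spelling to a tuple of phoneme symbols. A word
    missing from `predicted` counts as wrong.
    """
    if not reference:
        raise ValueError("reference dictionary is empty")
    hits = sum(1 for word, pron in reference.items() if predicted.get(word) == pron)
    return 100.0 * hits / len(reference)


# Hypothetical four-word test dictionary and system output.
reference = {
    "cat": ("k", "a", "t"),
    "dog": ("d", "o", "g"),
    "ship": ("S", "i", "p"),
    "tough": ("t", "V", "f"),
}
predicted = {
    "cat": ("k", "a", "t"),
    "dog": ("d", "o", "g"),
    "ship": ("s", "i", "p"),   # one wrong phoneme makes the whole word wrong
    "tough": ("t", "V", "f"),
}
score = words_correct(predicted, reference)
```

Because a single wrong phoneme fails the whole word, word-level figures are much stricter than phoneme-level accuracy on the same output.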
Pronunciation by Analogy: Impact of Implementational Choices on Performance
1997
Cited by 20 (9 self)
Pronunciation by analogy (PbA) is an emerging, data-driven technique with potential application in text-to-speech (TTS) systems, as well as being an influential psychological model of reading aloud. The underlying idea is that a pronunciation for an unknown word (i.e. one not in the dictionary, or lexicon, of the human or machine `reader') is assembled by matching substrings of the input to substrings of known, lexical words, hypothesising a partial pronunciation for each matched substring from the lexical knowledge of the `reader', and concatenating the partial pronunciations. This paper assesses the capability of PbA to derive pronunciations for unknown words of English. As a psychological model, PbA is `underspecified', i.e. the implementor of a simulation of the process faces detailed choices which can only be resolved by trial and error. One goal for this paper is to explore the impact of certain basic implementational choices on the performance of PbA systems. The variables stud...
An Efficient Way To Learn English Grapheme-To-Phoneme Rules Automatically
Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 1993
Cited by 15 (0 self)
We present an efficient way to learn automatically grapheme-to-phoneme mapping rules for English by using Kohonen's concept of Dynamically Expanding Context. This method constructs rules that are most general in the sense of an explicitly defined specificity hierarchy. As the hierarchy, we have used the amount of expanding context around the symbol to be transformed, weighted towards the right. To apply this concept to English text-to-speech mapping, we have used the 20008-word corpus provided in the public domain by Sejnowski and Rosenberg, that was also used in the NETTALK-experiments. Phoneme-level mapping accuracies of 91 per cent with data not used in training demonstrate that the Dynamically Expanding Context is able to capture quite efficiently the context-dependent relationships in the corpus.
1 INTRODUCTION
The problem addressed in this paper is automatic learning of grapheme-to-phoneme mapping rules. We present an efficient way to learn these for English by using Kohonen's c...
Comparative Evaluation Of Letter-To-Sound Conversion Techniques For English Text-To-Speech Synthesis
1998
Cited by 5 (0 self)
Dictionary look-up is the primary strategy for deriving pronunciations for input words in a text-to-speech (TTS) system. This strategy is accurate for dictionary words, but it is not complete: it is impossible to list exhaustively all input words. The proper treatment of `unknown' words is currently an unsolved problem in TTS synthesis. There are many competing techniques for letter-to-sound conversion and the system developer must make a rational selection among them. However, it is unclear how different techniques should be properly compared. In this paper, we report a comparative assessment of the competitor methods of letter-to-sound rules, pronunciation by analogy, feedforward neural networks and a k-nearest neighbour method, with respect to their success at automatic phonemisation. This is achieved by using standardised scoring methods, test lexicon and phoneme inventories. The problem of standardising the phoneme set (`harmonisation') is deceptive: this is much harder than at fi...
Pronunciation Modeling in Speech Synthesis
1998
Cited by 4 (0 self)
ACKNOWLEDGMENTS
I am very pleased to have had the encouragement and support of a committee of three linguists for whom I have the greatest respect and admiration: Mark Liberman, William Labov and Eugene Buckley. Each of them made my transition back to Penn pleasant after what seemed like a long absence. It was a great pleasure to have Mark Randolph both as an external reader and as a colleague at Motorola. Mark’s work at MIT a decade ago has served as an inspiration to me. Orhan Karaali made this dissertation possible in this millennium. As my manager for over two years at Motorola, Orhan insisted on making my dissertation a priority at work. Harry Bliss provided his voice to this project and our whole group is very grateful for his patience and cooperation. My colleagues at Motorola listened to my ideas and provided technical and theoretical assistance at every turn: Noel
A Pronunciation-by-Analogy Module for the Festival Text-to-Speech Synthesiser
In 4th ISCA Workshop on Speech Synthesis, 2001
Cited by 3 (0 self)
Pronunciation by analogy (PbA) is a data-driven technique for the automatic phonemisation of text which is receiving renewed attention from workers in text-to-speech synthesis. It uses the dictionary which provides the primary source of pronunciations via direct look-up as a secondary source of information about the pronunciation of unknown words. In this paper, we provide theoretical and empirical motivations for the use of PbA, review approaches to automatic pronunciation generation by analogy, and report on the implementation of a PbA module for the Festival text-to-speech synthesiser. We have used a much larger dictionary (British English Example Pronunciation or BEEP, approximately 200,000 words) than hitherto. New results of 86.7% words correct are obtained for this dictionary on our best-performing PbA implementation. The Festival PbA module is still under development, however, and currently does less well.
A Comparison Of Letter-To-Sound Conversion Techniques For English Text-To-Speech Synthesis
Cited by 3 (0 self)
this paper are those of Elovitz et al. [2] obtained by anonymous ftp from directory comp.speech/synthesis at svr-ftp.eng.cam.ac.uk.
2.2 Pronunciation by Analogy
Pronunciation by analogy (PbA) exploits the phonological knowledge implicitly contained in a dictionary of words and their corresponding pronunciations. The underlying idea is that a pronunciation for an unknown word is assembled by matching substrings of the input to substrings of known, lexical words, hypothesising a partial pronunciation for each matched substring from the phonological knowledge, and concatenating the partial pronunciations. The variant of PbA evaluated here is based on Dedina and Nusbaum's PRONOUNCE [4], but with several further enhancements as detailed by Marchand and Damper [5]. In PRONOUNCE, a data structure called the pronunciation lattice is built from matching substrings in the input word and the dictionary entries. This is a graph containing information about the position and total number of matched substrings, and their partial pronunciations. A possible pronunciation for the input string then corresponds to a complete path through its lattice, with the output string assembled by concatenating the phoneme labels on the nodes/arcs in the order that they are traversed. (Different paths can, of course, correspond to the same pronunciation.) Scoring of candidate pronunciations uses two heuristics in PRONOUNCE. If there is a unique shortest path, then the pronunciation corresponding to this path is taken as the output. If there are tied shortest paths, then the pronunciation corresponding to the best scoring of these is taken as the output. The version of PbA evaluated here features several enhancements over PRONOUNCE. First, we use `full' pattern matching between input letter string and d...
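The lattice idea in this excerpt can be sketched in Python under some loud simplifying assumptions. This is not PRONOUNCE or the authors' code: it assumes every dictionary entry is aligned one phoneme symbol per letter, it omits any special start and end nodes, it breaks ties between shortest paths by the sum of arc counts (the excerpt only says "best scoring"), and the names `build_lattice` and `pronounce` plus the tiny lexicon are hypothetical.

```python
from collections import defaultdict


def build_lattice(word, lexicon):
    """Count arcs contributed by substrings shared with lexicon entries.

    `lexicon` is a list of (letters, phonemes) strings of equal length.
    An arc joins node (i, first phoneme) to node (j, last phoneme),
    carries the phonemes in between, and is counted once per match.
    """
    arcs = defaultdict(int)
    for letters, phonemes in lexicon:
        for i in range(len(word)):
            for j in range(i + 1, len(word)):      # substrings of length >= 2
                sub = word[i:j + 1]
                start = letters.find(sub)
                while start != -1:
                    ph = phonemes[start:start + len(sub)]
                    arcs[((i, ph[0]), (j, ph[-1]), tuple(ph[1:-1]))] += 1
                    start = letters.find(sub, start + 1)
    return arcs


def pronounce(word, lexicon):
    """Return the phonemes on the shortest complete path, or None.

    Fewer arcs wins; ties go to the larger total arc count
    (an assumption of this sketch).
    """
    outgoing = defaultdict(list)
    for (src, dst, inner), count in build_lattice(word, lexicon).items():
        outgoing[src].append((dst, inner, count))
    last = len(word) - 1
    best_key, best_phonemes = None, None
    stack = [(node, (node[1],), 0, 0) for node in list(outgoing) if node[0] == 0]
    while stack:
        node, phonemes, n_arcs, score = stack.pop()
        if node[0] == last:                        # reached the final letter
            key = (n_arcs, -score)
            if best_key is None or key < best_key:
                best_key, best_phonemes = key, phonemes
            continue
        for dst, inner, count in outgoing.get(node, []):
            stack.append((dst, phonemes + inner + (dst[1],), n_arcs + 1, score + count))
    return best_phonemes


# Hypothetical aligned lexicon: one phoneme symbol per letter.
lexicon = [("cat", "k@t"), ("man", "m@n")]
unknown = pronounce("mat", lexicon)
```

Here "mat" is not in the lexicon, so its pronunciation is assembled from "ma" (from "man") and "at" (from "cat"), which overlap at the shared vowel node. When no complete path exists, the function returns None rather than guessing.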
The Delta Rule Development System for Speech Synthesis from Text
Cited by 2 (0 self)
Progress in speech synthesis has been hampered by the lack of rule-writing tools of sufficient flexibility and power. This paper presents a new system, Delta, that gives linguists and programmers a versatile rule language and friendly debugging environment. Delta's central data structure is well-suited for representing a broad class of multi-level utterance structures. The Delta language has flexible pattern-matching expressions, control structures, and utterance manipulation statements. Its dictionary facilities provide elegant exception handling. The interactive symbolic debugger speeds rule development and tuning. Delta can not only accommodate existing synthesis models, but can also be used to develop new ones.

