Parallel Networks that Learn to Pronounce English Text
- COMPLEX SYSTEMS
, 1987
Cited by 413 (5 self)
This paper describes NETtalk, a class of massively-parallel network systems that learn to convert English text to speech. The memory representations for pronunciations are learned by practice and are shared among many processing units. The performance of NETtalk has some similarities with observed human performance. (i) The learning follows a power law. (ii) The more words the network learns, the better it is at generalizing and correctly pronouncing new words. (iii) The performance of the network degrades very slowly as connections in the network are damaged: no single link or processing unit is essential. (iv) Relearning after damage is much faster than learning during the original training. (v) Distributed or spaced practice is more effective for long-term retention than massed practice. Network models can be constructed that have the same performance and learning characteristics on a particular task, but differ completely at the levels of synaptic strengths and single-unit responses. However, hierarchical clustering techniques applied to NETtalk reveal that these different networks have similar internal representations of letter-to-sound correspondences within groups of processing units. This suggests that invariant internal representations may be found in assemblies of neurons intermediate in size between highly localized and completely distributed representations.
Algorithms for Grapheme-Phoneme Translation for English and French: Applications
- COMPUTATIONAL LINGUISTICS
, 1997
Cited by 34 (0 self)
Letter-to-sound rules, also known as grapheme-to-phoneme rules, are important computational tools and have been used for a variety of purposes including word or name lookups for database searches and speech synthesis. These rules are especially useful when integrated into database searches on names and addresses, since they can complement orthographic search algorithms that make use of permutation, deletion, and insertion by allowing for a comparison with the phonetic equivalent. In databases, phonetics can help retrieve a word or a proper name without the user needing to know the correct spelling. A phonetic index is built with the vocabulary of the application. This could be an entire dictionary, or a list of proper names. The searched word is then converted into phonetics and retrieved with its information, if the word is in the phonetic index. This phonetic lookup can be used to retrieve a misspelled word in a dictionary or a database, or in a text editor to suggest corrections. Such rules are also necessary to formalize grapheme-phoneme correspondences in speech synthesis architecture. In text-to-speech systems, these rules are typically used to create phonemes
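The phonetic-index lookup described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the paper's rule set: the rewrite rules in `TOY_RULES`, the function names, and the sample vocabulary are all assumptions chosen only to show the index-then-lookup flow.

```python
from collections import defaultdict

# Toy spelling-to-sound rewrites standing in for real letter-to-sound rules.
# These are illustrative assumptions, not the rules from the paper.
TOY_RULES = [("ph", "f"), ("ck", "k"), ("gh", ""), ("kn", "n"), ("c", "k"), ("y", "i")]


def toy_phonetic_key(word):
    """Convert a spelling to a crude phonetic key."""
    key = word.lower()
    for spelling, sound in TOY_RULES:
        key = key.replace(spelling, sound)
    # Collapse doubled letters so "ll" and "l" compare equal.
    collapsed = []
    for ch in key:
        if not collapsed or collapsed[-1] != ch:
            collapsed.append(ch)
    return "".join(collapsed)


def build_phonetic_index(vocabulary):
    """Map each phonetic key to the vocabulary entries that share it."""
    index = defaultdict(list)
    for word in vocabulary:
        index[toy_phonetic_key(word)].append(word)
    return index


def phonetic_lookup(index, query):
    """Return vocabulary entries whose phonetic key matches the query's key."""
    return index.get(toy_phonetic_key(query), [])


index = build_phonetic_index(["Philip", "Nichols", "Knight"])
print(phonetic_lookup(index, "Fillip"))  # misspelling still finds "Philip"
```

A query whose key is absent from the index simply returns an empty list, which mirrors the abstract's condition that retrieval succeeds only "if the word is in the phonetic index".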
An Algorithm for High Accuracy Name Pronunciation by Parametric Speech Synthesizer
- COMPUTATIONAL LINGUISTICS
, 1991
Cited by 24 (1 self)
... This paper describes how an algorithm for high accuracy name pronunciation was implemented in software based on a combination of cryptanalysis, statistics, and linguistics. The algorithm behind the utility is a two-stage procedure: (1) the decoding of the name to determine its etymological grouping; and (2) specific letter-to-sound rules (both segmental rules as well as stress-assignment rules) that provide the synthesizer parameters with sufficient additional information to accurately pronounce the name as would a typical speaker of American English. Default language and thresholds are settable parameters and are also described. While the complexity of the software is invisible to applications writers as well as users, this functionality now makes possible the automation of highly accurate name pronunciation by parametric speech synthesizer
Knowledge of Language Origin Improves Pronunciation Accuracy of Proper Names
- In Eurospeech
, 2001
Cited by 16 (0 self)
As it is impossible to have a lexicon with complete coverage, and a high proportion of unknown words are proper names, this paper addresses the issue of automatically finding pronunciations of unseen proper names in US English. Proper names, especially in the US, may come from a large range of ethnic backgrounds. We present a model and results showing that including ethnic origin of words in a statistical model can improve pronunciation results.
An Efficient Way To Learn English Grapheme-To-Phoneme Rules Automatically
- Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP)
, 1993
Cited by 15 (0 self)
We present an efficient way to learn automatically grapheme-to-phoneme mapping rules for English by using Kohonen's concept of Dynamically Expanding Context. This method constructs rules that are most general in the sense of an explicitly defined specificity hierarchy. As the hierarchy, we have used the amount of expanding context around the symbol to be transformed, weighted towards the right. To apply this concept to English text-to-speech mapping, we have used the 20,008-word corpus provided in the public domain by Sejnowski and Rosenberg, which was also used in the NETtalk experiments. Phoneme-level mapping accuracies of 91 per cent with data not used in training demonstrate that the Dynamically Expanding Context is able to capture quite efficiently the context-dependent relationships in the corpus. 1 INTRODUCTION The problem addressed in this paper is automatic learning of grapheme-to-phoneme mapping rules. We present an efficient way to learn these for English by using Kohonen's c...
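The idea of widening the context around a letter only until its pronunciation becomes unambiguous can be sketched as below. This is a simplified illustration under stated assumptions, not Kohonen's algorithm or the paper's implementation: the context schedule in `LEVELS` (grown on the right side first), the one-phoneme-per-letter alignment, the toy training words, and the phoneme symbols are all hypothetical.

```python
from collections import Counter, defaultdict

# Context widths as (left, right) symbols around the focus letter.
# The schedule grows one symbol at a time, right side first (an assumption).
LEVELS = [(0, 0), (0, 1), (1, 1), (1, 2), (2, 2)]
PAD = "_"


def window(letters, i, left, right):
    """Return the focus letter at position i with its surrounding context."""
    padded = PAD * 2 + letters + PAD * 2
    j = i + 2
    return padded[j - left : j + right + 1]


def train(aligned_pairs):
    """Count, per context level, which phonemes each letter window maps to."""
    tables = [defaultdict(Counter) for _ in LEVELS]
    for letters, phonemes in aligned_pairs:
        for i, phoneme in enumerate(phonemes):
            for table, (left, right) in zip(tables, LEVELS):
                table[window(letters, i, left, right)][phoneme] += 1
    return tables


def transcribe(tables, letters):
    """Pick, per letter, the narrowest context that gives a single phoneme."""
    output = []
    for i in range(len(letters)):
        best = None
        for table, (left, right) in zip(tables, LEVELS):
            counts = table.get(window(letters, i, left, right))
            if not counts:
                break  # context never seen in training: keep the last guess
            best = counts.most_common(1)[0][0]
            if len(counts) == 1:
                break  # unambiguous at this width: no need to expand further
        output.append(best if best is not None else "?")
    return output


# Hypothetical training data, aligned one phoneme symbol per letter.
tables = train([
    ("cat", ["k", "a", "t"]),
    ("city", ["s", "i", "t", "i"]),
    ("cot", ["k", "o", "t"]),
])
print(transcribe(tables, "cit"))
```

In this toy data the letter "c" alone is ambiguous between "k" and "s", so the lookup expands one symbol to the right and resolves "ci" to "s". Letters never seen in training are marked "?".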
Phonological Parsing for Bi-directional Letter-to-Sound/Sound-to-Letter Generation
- Journal of Speech Communication
, 1995
Cited by 14 (2 self)
In this paper, we describe a reversible letter-to-sound/sound-to-letter generation system based on an approach which combines a rule-based formalism with data-driven techniques. We adopt a probabilistic parsing strategy to provide a hierarchical lexical analysis of a word, including information such as morphology, stress, syllabification, phonemics and graphemics. Long-distance constraints are propagated by enforcing local constraints throughout the hierarchy. Our training and testing corpora are derived from the high-frequency portion of the Brown Corpus (10,000 words), augmented with markers indicating stress and word morphology. We evaluated our performance based on an unseen test set. The percentages of nonparsable words for letter-to-sound and sound-to-letter generation were 6% and 5% respectively. Of the remaining words our system achieved a word accuracy of 71.8% and a phoneme accuracy of 92.5% for letter-to-sound generation, and a word accuracy of 55.8% and letter accuracy of 89.4% for sound-to-letter generation. We also compared our hierarchical approach with an alternative, single-layer approach to demonstrate how the hierarchy provides a parsimonious description for English orthographic-phonological regularities, while simultaneously attaining competitive generation accuracy.
Automatic Script Identification from Images Using Cluster-based Templates
- IEEE Transactions on Pattern Analysis and Machine Intelligence
, 1995
Cited by 14 (0 self)
We describe a system that automatically identifies the script used in documents stored electronically in image form. The system can learn to distinguish any number of scripts. It develops a set of representative symbols (templates) for each script by clustering textual symbols from a set of training documents and representing each cluster by its centroid. "Textual symbols" include discrete characters in scripts such as Cyrillic, as well as adjoined characters, character fragments, and whole words in connected scripts such as Arabic. To identify a new document's script, the system compares a subset of symbols from the document to each script's templates, screening out rare or unreliable templates, and choosing the script whose templates provide the best match. Our current system, trained on thirteen scripts, correctly identifies all test documents except those printed in fonts that differ markedly from fonts in the training set. 1. Introduction Script identification is a key part of th...
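The template-matching step in this abstract (one centroid per cluster, then choosing the script whose templates match best) might look roughly like the sketch below. It assumes the clustering has already been done and that each symbol is a small numeric feature vector. The function names, the two-dimensional toy features, and the mean nearest-template distance used as the match score are assumptions, and the screening of rare or unreliable templates is omitted.

```python
import math


def centroid(vectors):
    """Average a cluster of equal-length feature vectors into one template."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def build_templates(clusters_by_script):
    """Represent every cluster of training symbols by its centroid."""
    return {
        script: [centroid(cluster) for cluster in clusters]
        for script, clusters in clusters_by_script.items()
    }


def identify_script(templates, symbols):
    """Choose the script whose templates lie closest to the document's symbols."""
    def mean_best_distance(script_templates):
        # For each symbol, take the distance to its nearest template.
        return sum(
            min(math.dist(symbol, template) for template in script_templates)
            for symbol in symbols
        ) / len(symbols)

    return min(templates, key=lambda script: mean_best_distance(templates[script]))


# Hypothetical pre-clustered training symbols for two made-up scripts.
templates = build_templates({
    "script_a": [[[0.0, 0.0], [0.2, 0.0]], [[1.0, 1.0], [1.0, 1.2]]],
    "script_b": [[[5.0, 5.0], [5.0, 5.2]]],
})
print(identify_script(templates, [[0.1, 0.1], [0.9, 1.0]]))
```

Euclidean distance is used here only for simplicity; the abstract does not say which similarity measure the system uses.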
Phonetic Transcription Standards For European Names (ONOMASTICA)
, 1993
Cited by 10 (0 self)
exchanging national names amongst the partners to create a matrix of 'nativised' pronunciations for each (thereby) foreign name in each other language. This paper details the standards identified for phonetic transcription of names as part of the ONOMASTICA project, a European-wide research initiative for the construction of a multi-language pronunciation lexicon of proper names. The main design criteria adopted by the consortium for the development of this multi-language pronunciation dictionary are discussed, including aspects such as phonetic transcription standards, definitions of quality, quality control mechanisms and language specific details concerning phonetic transcription and the annotation of the language of origin. 1.1 OBJECTIVES AND EXPECTED IMPACT The non-availability of large pronunciation dictionaries of names continues to impede the development of many applications in speech technology. In particular, the acceptability of applications where speech output system...
Improving Pronunciation Accuracy of Proper Names with Language Origin Classes
- in Proc. of the Seventh ESSLLI Student Session
, 2001
Cited by 6 (0 self)
I would like to thank my advisor Alan Black for all his support and dedication, without him this thesis would not have been possible; Kenji Sagae for the insightful discussions about this thesis and, most importantly, for his patience and support; Guy Lebanon and Christian Monson, LTI colleagues, for the discussion about unsupervised clustering; and Toni Badia for having introduced me to the field of Natural Language Processing and for his support during all these years. This work was supported by a “La Caixa” Fellowship.

