Pronunciation by Analogy: Impact of Implementational Choices on Performance
, 1997
Cited by 20 (9 self)
Pronunciation by analogy (PbA) is an emerging, data-driven technique with potential application in text-to-speech (TTS) systems, as well as being an influential psychological model of reading aloud. The underlying idea is that a pronunciation for an unknown word (i.e. one not in the dictionary, or lexicon, of the human or machine 'reader') is assembled by matching substrings of the input to substrings of known, lexical words, hypothesising a partial pronunciation for each matched substring from the lexical knowledge of the 'reader', and concatenating the partial pronunciations. This paper assesses the capability of PbA to derive pronunciations for unknown words of English. As a psychological model, PbA is 'underspecified', i.e. the implementor of a simulation of the process faces detailed choices which can only be resolved by trial and error. One goal for this paper is to explore the impact of certain basic implementational choices on the performance of PbA systems. The variables stud...
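The substring-matching idea in the abstract above can be illustrated with a toy sketch. This is not the paper's system: it assumes a hypothetical lexicon in which every letter is aligned to exactly one phoneme symbol, and it uses a greedy longest-match strategy from left to right, whereas full PbA implementations typically weigh many overlapping matches. The function name `pronounce_by_analogy` and the phoneme symbols are illustrative assumptions.

```python
def pronounce_by_analogy(word, lexicon):
    """Toy pronunciation by analogy.

    lexicon maps a spelling to a list of phonemes, one per letter
    (an assumed one-to-one alignment, chosen only to keep the sketch short).
    Returns a list of phonemes, or None if some letter cannot be matched.
    """
    output = []
    i = 0
    while i < len(word):
        match = None
        # Try the longest remaining substring first, then shorter ones.
        for length in range(len(word) - i, 0, -1):
            chunk = word[i:i + length]
            for spelling, phonemes in lexicon.items():
                pos = spelling.find(chunk)
                if pos != -1:
                    # Hypothesise the partial pronunciation aligned to the match.
                    match = phonemes[pos:pos + length]
                    break
            if match:
                break
        if match is None:
            return None  # no lexical word contains this letter
        output.extend(match)  # concatenate the partial pronunciations
        i += len(match)
    return output


if __name__ == "__main__":
    toy_lexicon = {
        "cat": ["k", "ae", "t"],
        "mop": ["m", "aa", "p"],
    }
    print(pronounce_by_analogy("cap", toy_lexicon))
```

Here the unknown word "cap" is assembled from "ca" (matched in "cat") and "p" (matched in "mop"). Greedy matching is only one of the implementational choices such a system could make.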
Speech-based retrieval using semantic co-occurrence filtering
- In Proc. ARPA Human Language Technology Workshop, Plainsboro, NJ
, 1994
Cited by 18 (1 self)
In this paper we demonstrate that speech recognition can be effectively applied to information retrieval (IR) applications. Our system exploits the fact that the intended words of a spoken query tend to co-occur in text documents in close proximity whereas word combinations that are the result of recognition errors are usually not semantically correlated and thus do not appear together. Termed "Semantic Co-occurrence Filtering", this enables the system to simultaneously disambiguate word hypotheses and find relevant text for retrieval. The system is built by integrating standard IR and speech recognition techniques. An evaluation of the system is presented and we discuss several refinements to the functionality.
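A minimal sketch of the co-occurrence idea described in this abstract, under stated assumptions: each spoken query word yields a list of competing recogniser hypotheses, documents are pre-tokenised, and a combination of hypotheses is scored simply by how many documents contain all of its words within a fixed token window. The function names, the `window` parameter and the scoring rule are hypothetical and are not taken from the paper.

```python
from itertools import product


def occurs_within_window(tokens, words, window):
    """Return True if all `words` appear inside some span of `window` tokens."""
    needed = set(words)
    for start in range(len(tokens)):
        if needed <= set(tokens[start:start + window]):
            return True
    return False


def cooccurrence_filter(hypotheses, documents, window=5):
    """Pick the hypothesis combination that co-occurs in the most documents.

    hypotheses: one list of candidate words per spoken query word.
    documents:  list of token lists.
    Returns (best_combination, indices_of_matching_documents).
    """
    best_combo, best_docs = None, []
    for combo in product(*hypotheses):
        docs = [i for i, tokens in enumerate(documents)
                if occurs_within_window(tokens, combo, window)]
        # Keep the combination supported by the most documents.
        if len(docs) > len(best_docs):
            best_combo, best_docs = combo, docs
    return best_combo, best_docs


if __name__ == "__main__":
    hyps = [["president", "precedent"], ["kennedy", "canada"]]
    docs = [
        "president kennedy was elected in 1960".split(),
        "the trade precedent was set long before".split(),
        "canada exports timber".split(),
    ]
    print(cooccurrence_filter(hyps, docs))
```

In this toy run only "president" and "kennedy" appear close together in any document, so that pairing is selected and the supporting document is returned with it, which mirrors the joint disambiguation and retrieval the abstract describes.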
An Efficient Way To Learn English Grapheme-To-Phoneme Rules Automatically
- Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP)
, 1993
Cited by 15 (0 self)
We present an efficient way to learn automatically grapheme-to-phoneme mapping rules for English by using Kohonen's concept of Dynamically Expanding Context. This method constructs rules that are most general in the sense of an explicitly defined specificity hierarchy. As the hierarchy, we have used the amount of expanding context around the symbol to be transformed, weighted towards the right. To apply this concept to English text-to-speech mapping, we have used the 20,008-word corpus provided in the public domain by Sejnowski and Rosenberg, which was also used in the NETtalk experiments. Phoneme-level mapping accuracies of 91 per cent with data not used in training demonstrate that the Dynamically Expanding Context is able to capture quite efficiently the context-dependent relationships in the corpus.
Phonological Parsing for Bi-directional Letter-to-Sound/Sound-to-Letter Generation
- Journal of Speech Communication
, 1995
Cited by 14 (2 self)
In this paper, we describe a reversible letter-to-sound/sound-to-letter generation system based on an approach which combines a rule-based formalism with data-driven techniques. We adopt a probabilistic parsing strategy to provide a hierarchical lexical analysis of a word, including information such as morphology, stress, syllabification, phonemics and graphemics. Long-distance constraints are propagated by enforcing local constraints throughout the hierarchy. Our training and testing corpora are derived from the high-frequency portion of the Brown Corpus (10,000 words), augmented with markers indicating stress and word morphology. We evaluated our performance based on an unseen test set. The percentages of nonparsable words for letter-to-sound and sound-to-letter generation were 6% and 5% respectively. Of the remaining words our system achieved a word accuracy of 71.8% and a phoneme accuracy of 92.5% for letter-to-sound generation, and a word accuracy of 55.8% and letter accuracy of 89.4% for sound-to-letter generation. We also compared our hierarchical approach with an alternative, single-layer approach to demonstrate how the hierarchy provides a parsimonious description for English orthographic-phonological regularities, while simultaneously attaining competitive generation accuracy.
Context-Dependent Acoustic Modeling Using Graphemes For Large Vocabulary Speech Recognition
- in Proceedings of the ICASSP
, 2002
Cited by 12 (2 self)
In this paper we propose to use a decision tree based on graphemic acoustic sub-word units together with phonetic questions. We also show that automatic question generation can be used to completely eliminate any manual effort.
Speech Recognition System Design Based on Automatically Derived Units
, 1999
Cited by 10 (0 self)
In most speech recognition systems today, acoustic modeling and lexical modeling are viewed as separable problems. Currently the most popular approach is to manually define canonical word pronunciations in terms of phonetic units and let the acoustic models capture differences between actual spoken and canonical pronunciations implicitly with Gaussian mixture models. As a result, these models can be very broad, particularly for casual spontaneous speech. An alternative approach, explored in this thesis, is to learn a unit inventory and pronunciation dictionary from training data using a maximum likelihood objective function. In particular,
A Language-Independent Unsupervised Model for Morphological Segmentation
, 2007
Cited by 9 (1 self)
Morphological segmentation has been shown to be beneficial to a range of NLP tasks such as machine translation, speech recognition, speech synthesis and information retrieval. Recently, a number of approaches to unsupervised morphological segmentation have been proposed. This paper describes an algorithm that draws from previous approaches and combines them into a simple model for morphological segmentation that outperforms other approaches on English and German, and also yields good results on agglutinative languages such as Finnish and Turkish. We also propose a method for detecting variation within stems in an unsupervised fashion. The segmentation quality reached with the new algorithm is good enough to improve grapheme-to-phoneme conversion.
Advances in Analogy-Based Learning: False Friends and Exceptional Items in Pronunciation By Paradigm-Driven Analogy
- In Proceedings of IJCAI'95 workshop on 'New Approaches to Learning for Natural Language Processing'
, 1995
Cited by 6 (2 self)
When looked at from a multilingual perspective, grapheme-to-phoneme conversion is a challenging task, fraught with most of the classical NLP "vexed questions": bottle-neck problem of data acquisition, pervasiveness of exceptions, difficulty to state range and order of rule application, proper treatment of context-sensitive phenomena and long-distance dependencies, and so on. The hand-crafting of transcription rules by a human expert is onerous and time-consuming, and yet, for some European languages, still stops short of a level of correctness and accuracy acceptable for practical applications. We illustrate here a self-learning multilingual system for analogy-based pronunciation which was tested on Italian, English and French, and whose performances are assessed against the output of both statistically and rule-based transcribers. The general point is made that analogy-based self-learning techniques are no longer just psycholinguistically-plausible models, but competitive tools, combining the advantages of using language-independent, self-learning, tractable algorithms, with the welcome bonus of being more reliable for applications than traditional text-to-speech systems.
Improving Pronunciation Accuracy of Proper Names with Language Origin Classes
- in Proc. of the Seventh ESSLLI Student Session
, 2001
Cited by 6 (0 self)
I would like to thank my advisor Alan Black for all his support and dedication, without him this thesis would not have been possible; Kenji Sagae for the insightful discussions about this thesis and, most importantly, for his patience and support; Guy Lebanon and Christian Monson, LTI colleagues, for the discussion about unsupervised clustering; and Toni Badia for having introduced me to the field of Natural Language Processing and for his support during all these years. This work was supported by a “La Caixa” Fellowship.
Learning to Read Aloud: A Neural Network Approach Using Sparse Distributed Memory
- Journal of Computing in Civil Engineering, American Society of Civil Engineers
, 1989

