Results 1 - 10 of 14
Algorithms for Grapheme-Phoneme Translation for English and French: Applications
COMPUTATIONAL LINGUISTICS, 1997
Cited by 34 (0 self)
Letter-to-sound rules, also known as grapheme-to-phoneme rules, are important computational tools and have been used for a variety of purposes, including word or name lookups in database searches and speech synthesis. These rules are especially useful when integrated into database searches on names and addresses, since they complement orthographic search algorithms (which rely on permutation, deletion, and insertion of letters) by allowing a comparison with the phonetic equivalent. In databases, phonetics can help retrieve a word or a proper name without the user needing to know its correct spelling. A phonetic index is built from the vocabulary of the application, which could be an entire dictionary or a list of proper names. The searched word is then converted into its phonetic form and, if that form is in the phonetic index, retrieved with its associated information. This phonetic lookup can be used to retrieve a misspelled word in a dictionary or a database, or to suggest corrections in a text editor. Such rules are also necessary to formalize grapheme-phoneme correspondences in a speech synthesis architecture. In text-to-speech systems, these rules are typically used to create phonemes
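The phonetic-index lookup described in this abstract can be sketched in Python. This is a minimal illustration, not the paper's algorithm: `RULES` is a tiny hypothetical set of English letter-to-sound rules applied by greedy longest match, and the helpers `to_phonemes`, `build_index`, and `phonetic_lookup` are names assumed here for illustration.

```python
from collections import defaultdict

# Hypothetical toy English letter-to-sound rules: (grapheme, phoneme symbol).
# Multi-letter graphemes are tried first (greedy longest match).
RULES = sorted(
    [("ph", "f"), ("ck", "k"), ("sh", "S"), ("th", "T"), ("ee", "i"),
     ("ea", "i"), ("c", "k"), ("q", "k"), ("z", "s"), ("y", "i")],
    key=lambda rule: -len(rule[0]),
)


def to_phonemes(word: str) -> str:
    """Convert a word to a phoneme string with the toy rules above."""
    word = word.lower()
    out, i = [], 0
    while i < len(word):
        for grapheme, phoneme in RULES:
            if word.startswith(grapheme, i):
                out.append(phoneme)
                i += len(grapheme)
                break
        else:
            out.append(word[i])  # no rule: the letter stands for itself
            i += 1
    # Collapse repeated phonemes so that "ll" and "l" index identically.
    return "".join(p for j, p in enumerate(out) if j == 0 or p != out[j - 1])


def build_index(vocabulary):
    """Map each phonetic key to every vocabulary entry that produces it."""
    index = defaultdict(list)
    for entry in vocabulary:
        index[to_phonemes(entry)].append(entry)
    return index


def phonetic_lookup(index, query):
    """Convert the query to phonemes and return the matching entries, if any."""
    return index.get(to_phonemes(query), [])


index = build_index(["Philips", "Filips", "Catherine", "Katherine", "Smith"])
print(phonetic_lookup(index, "Fillips"))  # ['Philips', 'Filips']
print(phonetic_lookup(index, "Smyth"))    # ['Smith']
```

Because "Philips", "Filips", and the misspelling "Fillips" all reduce to the same phoneme string, any of them retrieves both entries; a usable system would need a far larger, context-sensitive rule set.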
An Algorithm for High Accuracy Name Pronunciation by Parametric Speech Synthesizer
COMPUTATIONAL LINGUISTICS, 1991
Cited by 24 (1 self)
... This paper describes how an algorithm for high-accuracy name pronunciation was implemented in software based on a combination of cryptanalysis, statistics, and linguistics. The algorithm behind the utility is a two-stage procedure: (1) decoding the name to determine its etymological grouping; and (2) applying specific letter-to-sound rules (both segmental rules and stress-assignment rules) that give the synthesizer enough additional information to pronounce the name accurately, as a typical speaker of American English would. The default language and thresholds are settable parameters and are also described. While the complexity of the software is invisible to application writers as well as users, this functionality now makes it possible to automate highly accurate name pronunciation by a parametric speech synthesizer
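The two-stage procedure can be sketched as follows, under clearly stated assumptions: stage (1) is approximated by a letter-trigram overlap score against tiny hypothetical training lists (a stand-in for the paper's cryptanalytic and statistical decoding), and stage (2) by small, invented language-specific segmental rule tables; stress assignment is omitted. The `threshold` and `default` parameters mirror the settable thresholds and default language mentioned in the abstract, but their values here are made up.

```python
# Stage 1 (hypothetical): letter-trigram "fingerprints" per etymological group,
# learned from tiny illustrative name lists.
TRAINING = {
    "italian": ["bianchi", "ricci", "marchetti", "ferrari", "romano"],
    "german": ["schmidt", "schneider", "schwarz", "becker", "wagner"],
}

# Stage 2 (hypothetical): segmental rules per group, in a SAMPA-like notation.
RULES = {
    "italian": {"cch": "k", "cci": "tSi", "ch": "k", "ci": "tSi", "gn": "nj"},
    "german": {"sch": "S", "ei": "aI", "w": "v", "z": "ts"},
    "english": {"ch": "tS", "sh": "S", "th": "T"},
}


def trigrams(name):
    padded = f"^{name.lower()}$"  # mark word boundaries
    return [padded[i:i + 3] for i in range(len(padded) - 2)]


MODELS = {lang: {g for n in names for g in trigrams(n)} for lang, names in TRAINING.items()}


def classify(name, threshold=0.3, default="english"):
    """Stage 1: pick the group whose trigrams best cover the name."""
    grams = trigrams(name)
    if not grams:
        return default
    best_lang, best_score = default, 0.0
    for lang, model in MODELS.items():
        score = sum(g in model for g in grams) / len(grams)
        if score > best_score:
            best_lang, best_score = lang, score
    # Fall back to the default language when the evidence is too weak.
    return best_lang if best_score >= threshold else default


def letter_to_sound(name, lang):
    """Stage 2: greedy longest-match rewriting with the group's rules."""
    rules = RULES[lang]
    graphemes = sorted(rules, key=len, reverse=True)
    word, out, i = name.lower(), [], 0
    while i < len(word):
        for g in graphemes:
            if word.startswith(g, i):
                out.append(rules[g])
                i += len(g)
                break
        else:
            out.append(word[i])
            i += 1
    return "".join(out)


def pronounce_name(name, threshold=0.3, default="english"):
    lang = classify(name, threshold, default)
    return lang, letter_to_sound(name, lang)


print(pronounce_name("Schwab"))  # ('german', 'Svab')
print(pronounce_name("Ricci"))   # ('italian', 'ritSi')
print(pronounce_name("Smith"))   # ('english', 'smiT')
```

Raising `threshold` makes the classifier more conservative: "Bianco" is classed as Italian at 0.3 but falls back to the English default at 0.9.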
XML tools and architecture for Named Entity recognition
1999
Cited by 17 (5 self)
This paper reports on the development of a Named Entity recognition system built entirely within the XML paradigm.
Putting People First: Specifying Proper Names in Speech Interfaces
In Proceedings of the ACM Symposium on User Interface Software and Technology, ACM, 1994
Cited by 11 (0 self)
Communication is about people, not machines. But as firms and families alike spread out geographically, we rely increasingly on telecommunications tools to keep us "connected." The challenge of such systems is to enable conversation between individuals without computational infrastructure getting in the way. This paper compares two speech-based communication systems, Phoneshell and Chatter, in how they deal with the keys to communication: proper names. Chatter, a conversational system using speech-recognition, improves upon the hierarchical nature of the touch-tone based Phoneshell by maintaining context and enabling use of anaphora. Proper names can present particular problems for speech recognizers, so an interface algorithm for reliable name specification by spelling is offered. Since individual letter recognition is non-robust, Chatter implicitly disambiguates strings of letters based on context. We hypothesize that the right interface can make faulty speech recognition as usable a...
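One way to picture letter disambiguation by context is to treat acoustically confusable letters as equivalent and keep only the directory entries consistent with the recognized string. The sketch below is an assumption, not Chatter's actual algorithm: the groupings in `CONFUSION_CLASSES` are hypothetical (the first is the rhyming "E-set"), and the directory list stands in for the system's context.

```python
# Hypothetical groups of letters a recognizer tends to confuse; the first is
# the rhyming "E-set". Real confusion classes would be measured, not assumed.
CONFUSION_CLASSES = ["BCDEGPTVZ", "MN", "FSX", "AJK", "IY", "QU"]


def letter_class(ch):
    """Return the confusion class containing ch (or ch itself if none)."""
    ch = ch.upper()
    for cls in CONFUSION_CLASSES:
        if ch in cls:
            return cls
    return ch


def candidates(recognized, directory):
    """Names consistent with the recognized letters, up to confusable swaps."""
    key = [letter_class(c) for c in recognized]
    return [
        name for name in directory
        if len(name) == len(key)
        and all(letter_class(a) == k for a, k in zip(name, key))
    ]


directory = ["Marx", "Mark", "Nash", "Bond", "Pond"]
print(candidates("NARX", directory))  # ['Marx']
print(candidates("POND", directory))  # ['Bond', 'Pond']
```

A misrecognized "N" still resolves to "Marx" because no other entry fits, while "POND" leaves a genuine collision between "Bond" and "Pond" that context alone cannot settle.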
Phonetic Transcription Standards For European Names (onomastica)
1993
Cited by 10 (0 self)
exchanging national names amongst the partners to create a matrix of 'nativised' pronunciations for each (thereby) foreign name in each other language. This paper details the standards identified for phonetic transcription of names as part of the ONOMASTICA project, a European-wide research initiative for the construction of a multi-language pronunciation lexicon of proper names. The main design criteria adopted by the consortium for the development of this multi-language pronunciation dictionary are discussed, including aspects such as phonetic transcription standards, definitions of quality, quality control mechanisms and language specific details concerning phonetic transcription and the annotation of the language of origin.
1.1 OBJECTIVES AND EXPECTED IMPACT
The non-availability of large pronunciation dictionaries of names continues to impede the development of many applications in speech technology. In particular, the acceptability of applications where speech output system...
A Computational Memory And Processing Model For Prosody
In Proceedings of the Intl. Conf. on Spoken Language Processing, 1998
Cited by 9 (0 self)
This paper links prosody to the information in the text and how it is processed by the speaker. It describes the operation and output of Loq, a text-to-speech implementation that includes a model of limited attention and working memory. Attentional limitations are key. Varying the attentional parameter in the simulations changes, in turn, what counts as given and new in a text and, therefore, the intonational contours with which it is uttered. Currently, the system produces prosody in three different styles: child-like, adult expressive, and knowledgeable. This prosody also varies within each style: no two simulations are alike. The limited-resource approach captures some of the stylistic and individual variety found in natural prosody.
1. INTRODUCTION
Ask any lay person to imitate computer speech and you will be treated to an utterance delivered in melodic and rhythmic monotone, possibly accompanied by choppy articulation and a voice quality that is nasal and strained. ...
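The link between limited attention and the given/new distinction can be illustrated with a bounded working memory. This is a simplified sketch, not Loq's model: memory is assumed to be a least-recently-used store of `capacity` items, and a word counts as given if it is still in memory and as new (a candidate for a pitch accent) otherwise.

```python
from collections import OrderedDict


def given_new(words, capacity):
    """Label each word "given" if still in a bounded working memory, else "new".

    Memory is a hypothetical least-recently-used store holding at most
    `capacity` items; new words would receive a pitch accent.
    """
    memory = OrderedDict()
    labels = []
    for word in words:
        key = word.lower()
        if key in memory:
            labels.append((word, "given"))
            memory.move_to_end(key)  # re-attending refreshes the item
        else:
            labels.append((word, "new"))
            memory[key] = True
            if len(memory) > capacity:
                memory.popitem(last=False)  # forget the least recently used item
    return labels


text = ["cat", "dog", "bird", "cat"]
print(given_new(text, capacity=3)[-1])  # ('cat', 'given')
print(given_new(text, capacity=2)[-1])  # ('cat', 'new')
```

With a smaller memory, the second "cat" has already been forgotten and is re-accented as new, which is the kind of effect the abstract attributes to varying the attentional parameter.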
A comparison of Anapron with seven other name-pronunciation systems
JOURNAL OF THE AMERICAN VOICE INPUT/OUTPUT SOCIETY, 1993
Cited by 8 (0 self)
This paper presents an experiment comparing a new name-pronunciation system, Anapron, with seven existing systems: three state-of-the-art commercial systems (from Bellcore, Bell Labs, and DEC), two variants of a machine-learning system (NETtalk), and two humans. Anapron works by combining rule-based and case-based reasoning. It is based on the idea that it is much easier to improve a rule-based system by adding case-based reasoning to it than by tuning the rules to deal with every exception. In the experiment described here, Anapron used a set of rules adapted from MITalk and elementary foreign-language textbooks, and a case library of 5000 names. With these components, which required relatively little knowledge engineering, Anapron was found to perform almost at the level of the commercial systems, and significantly better than the two versions of NETtalk.
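The idea of patching a rule-based system with case-based reasoning can be sketched as below. This is a heavily simplified stand-in, not Anapron's method: the rules are a hypothetical context-free letter-to-phoneme table, the two cases are invented, pronunciations are stored as one phoneme string per letter (empty for silent letters), and the only analogy used is the longest shared name ending, accepted when it is at least `min_suffix` letters long.

```python
# Hypothetical context-free default rules (letter -> phoneme); unlisted letters
# map to themselves.
RULES = {"c": "k", "e": "E", "i": "I"}

# Hypothetical case library: one phoneme string per letter ("" = silent letter).
CASES = {
    "mancini": ["m", "a", "n", "tS", "i", "n", "i"],
    "wright": ["", "r", "aI", "", "", "t"],
}


def apply_rules(name):
    return [RULES.get(ch, ch) for ch in name]


def common_suffix_len(a, b):
    n = 0
    while n < min(len(a), len(b)) and a[-1 - n] == b[-1 - n]:
        n += 1
    return n


def pronounce_with_cases(name, min_suffix=3):
    """Rule output, overridden by the closest case's pronunciation of a shared ending."""
    name = name.lower()
    phones = apply_rules(name)
    best, best_len = None, 0
    for case in CASES:
        k = common_suffix_len(name, case)
        if k > best_len:
            best, best_len = case, k
    if best is not None and best_len >= min_suffix:
        # Analogy: copy the case's (exceptional) phonemes for the shared letters.
        phones[-best_len:] = CASES[best][-best_len:]
    return "".join(phones)


print(pronounce_with_cases("Rancini"))  # rantSini
print(pronounce_with_cases("Bright"))   # braIt
print(pronounce_with_cases("Smith"))    # smIth (no close case: rules only)
```

The rules stay simple while the case library absorbs exceptions such as Italian "c" before "i"; `min_suffix` controls how weak an analogy may be before the system falls back on the rules (for example, "Bertini" borrows "ini" from "Mancini" at 3 but not at 4).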
Employing Voice Back Channels to Facilitate Audio Document Retrieval
Proceedings of ACM Conference on Office Information Systems (COIS), 1988
Cited by 7 (1 self)
Human listeners use voice back channels to indicate their comprehension of a talker’s remarks. This paper describes an attempt to build a user interface capable of employing these back-channel responses for flow control while presenting a variety of audio information to a listener. Acoustic evidence based on the duration and prosody (rhythm and melody) of listeners’ utterances is employed as a means of discriminating responses by discourse function without using word recognition. Such an interface has been applied to three tasks: speech synthesis of driving directions, speech synthesis of electronic mail, and retrieval of recorded voice messages.
1 Audio document access
This paper describes research in progress to develop a user interface to facilitate voice retrieval of on-line information over a telephone connection. Information may be synthesized from text, such as human-authored electronic mail or a response to a database query, or it may be recorded, for example a telephone message or a dictated document. We need to control the rate and order of presentation of such audio information for an efficient interaction. We aim to exploit those aspects of human dialog behavior whereby the listener gives cues to the information provider indicating comprehension and ability to keep up. We are attempting to build an intuitive and robust user interface based on the duration and prosody (rhythm and melody) of the listener’s voice responses, independent of any word recognition.
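A word-free classifier of back-channel responses might look like the sketch below. The features (utterance duration and a least-squares F0 slope), the three discourse-function labels, the `ACTIONS` policy, and the thresholds `max_backchannel_s` and `rise_hz_per_s` are all assumptions made for illustration; the paper's actual acoustic criteria may differ.

```python
from dataclasses import dataclass


@dataclass
class Response:
    duration_s: float   # length of the listener's utterance, in seconds
    pitch_slope: float  # F0 trend over the utterance, in Hz per second


def pitch_slope(times, f0):
    """Least-squares slope of F0 samples against time (Hz per second)."""
    n = len(times)
    mean_t, mean_f = sum(times) / n, sum(f0) / n
    num = sum((t - mean_t) * (f - mean_f) for t, f in zip(times, f0))
    den = sum((t - mean_t) ** 2 for t in times)
    return num / den if den else 0.0


def classify_response(r, max_backchannel_s=0.6, rise_hz_per_s=50.0):
    """Map a response to a hypothetical discourse function, without word recognition."""
    if r.duration_s <= max_backchannel_s:
        # Short rising responses are treated as questioning back channels.
        return "query" if r.pitch_slope >= rise_hz_per_s else "continue"
    return "interrupt"  # longer utterances are treated as taking the floor


# Hypothetical flow-control policy for the audio presentation.
ACTIONS = {"continue": "play next chunk", "query": "repeat last chunk", "interrupt": "pause"}

slope = pitch_slope([0.0, 0.1, 0.2], [100.0, 110.0, 120.0])
print(ACTIONS[classify_response(Response(0.3, slope))])  # repeat last chunk
```

Keeping the decision to duration and pitch contour means the interface never depends on recognizing words, which is the property the abstract emphasizes.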
Improving Pronunciation Accuracy of Proper Names with Language Origin Classes
in Proc. of the Seventh ESSLLI Student Session, 2001
Cited by 6 (0 self)
I would like to thank my advisor Alan Black for all his support and dedication (without him this thesis would not have been possible); Kenji Sagae for the insightful discussions about this thesis and, most importantly, for his patience and support; Guy Lebanon and Christian Monson, LTI colleagues, for the discussion about unsupervised clustering; and Toni Badia for having introduced me to the field of Natural Language Processing and for his support during all these years. This work was supported by a “La Caixa” Fellowship.
Table of Contents
Abbreviations
Abstract
Reliable Spelling Despite Poor Spoken Letter Recognition
1994
this paper was to make speech recognition systems more usable by providing a reliable method of specifying names when recognition fails. Indeed, few things are as exasperating as repeating a command to a speech system that just isn't getting it. Being able to overcome errors in recognition of proper names by spelling is a boon to users. How long it takes to spell words out, however, is another question entirely. Clearly, spelling a word takes longer than saying it, particularly when each letter is prompted individually: spelling "Marx" discretely can take ten seconds, whereas saying it takes about one second. Still, if spelling is the only way to get the system to recognize the name, the extra time may be worth it. A study would be worthwhile to determine whether this is so. Another aspect of spelling names is dealing with collisions. Though collisions occur with varying frequency, they take extra time to deal with and drop the user into yet another modality, that of answering 'yes' or 'no' to the question "Is it <name>?" If the user has already said the name, spelled it continuously, and spelled it discretely, a fourth step may prove to be too much. The need to explicitly or interactively disambiguate names is not unique to spelling, however. Many lists contain duplicate entries that require disambiguation: in my rolodex, for example, I have several entries with the last name "Marx". Regardless of whether I speak the name, spell it out, or type it on a keyboard, I will have to explicitly disambiguate by saying which of the several Marxes I meant. Hence, although spelling names with a speech recognizer may require some explicit disambiguation on the part of the user, this may only add to the disambiguation already required in the system. It is not an additional task. CONCLUSIO...
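The explicit disambiguation step described above ("Is it <name>?") can be sketched as a simple confirmation loop. The function `disambiguate` and the `ask` callback (which in a real system would wrap speech output and yes/no recognition) are hypothetical; the sketch only shows that each colliding candidate costs one extra yes/no turn, while a unique match costs none.

```python
from typing import Callable, Optional, Sequence


def disambiguate(candidates: Sequence[str], ask: Callable[[str], bool]) -> Optional[str]:
    """Resolve a collision by asking "Is it <name>?" until the user says yes."""
    if not candidates:
        return None  # nothing matched; the caller must re-prompt
    if len(candidates) == 1:
        return candidates[0]  # no collision, so no extra question
    for name in candidates:
        if ask(f"Is it {name}?"):
            return name
    return None  # every candidate was rejected


# Scripted yes/no answers stand in for speech output plus yes/no recognition.
def scripted_user(question: str) -> bool:
    print(question)
    return question == "Is it Pond?"


print(disambiguate(["Bond", "Pond"], scripted_user))
```

The same loop serves whether the collision came from spelling or from genuine duplicates such as several entries named "Marx", which is why the text argues that spelling does not introduce a new kind of task.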

