The Mbrola Project: Towards A Set Of High Quality Speech Synthesizers Free Of Use For Non Commercial Purposes
Cited by 59 (0 self)
The aim of the MBROLA project, recently initiated by the Faculté Polytechnique de Mons (Belgium), is to obtain a set of speech synthesizers for as many voices, languages and dialects as possible, free of use for non-commercial and non-military applications. The ultimate goal is to boost academic research on speech synthesis, and particularly on prosody generation, known as one of the biggest challenges taken up by Text-to-Speech synthesizers for the years to come.
Survey of the State of the Art in Human Language Technology
, 1995
Cited by 47 (0 self)
Contents. 1 Spoken Language Input (Ron Cole & Victor Zue, chapter editors): 1.1 Overview (Victor Zue & Ron Cole); 1.2 Speech Recognition (Victor Zue, Ron Cole, & Wayne Ward); 1.3 Signal Representation (Melvyn J. Hunt); 1.4 Robust Speech Recognition (Richard M. Stern); 1.5 HMM Methods in Speech Recognition (Renato De Mori & Fabio Brugnara); 1.6 Language Representation (Salim Roukos); 1.7 Speaker Recognition ...
Toward Interface Design for Human Language Technology: Modality and Structure as Determinants of Linguistic Complexity
, 1995
Cited by 34 (13 self)
Before next-generation human language technology can be designed to function successfully in actual field settings, interface techniques will be needed that can guide users' language to coincide with current system capabilities. The present study examines how input modality and presentation structure influence the linguistic complexity observed in people's spoken and written input to an interactive system. Using a semi-automatic simulation technique, language was collected during speech-only, writing-only, and combined pen/voice exchanges, and using presentation formats that either were structured or unconstrained. Results indicate that both modality and presentation format substantially influence linguistic complexity, although the specific nature of their impact differs. A comprehensive analysis is provided of how both factors affect people's observed language in terms of total words, disfluencies, utterance length, lexical variability, perplexity, syntactic ambiguity, and semanti...
Overview of Evaluation in Speech and Natural Language Processing
, 1997
Cited by 26 (0 self)
Introduction to Evaluation Terminology and Use. We can broadly distinguish three kinds of evaluation, appropriate to three different goals. 1. Adequacy Evaluation. This is determination of the fitness of a system for a purpose: will it do what is required, how well, at what cost, etc. Typically for a prospective user, it may be comparative or not, and may require considerable work to identify a user's needs. One model is consumer organizations which publish the results of tests on, e.g., cars or appliances, and identify best buys for certain price-performance targets. This also goes by the names evaluation and evaluation proper. 2. Diagnostic Evaluation. This is production of a system performance profile with respect to some taxonomization of the space of possible inputs. It is typically used by system developers, but sometimes offered to end-us
Computers Seeing People
- AI Magazine
, 1999
Cited by 19 (1 self)
In this paper, we present methods that give machines the ability to see people, interpret their actions and interact with them. We present the motivating factors behind this work, examples of how such computational methods are developed and their applications. The basic reason for providing machines the ability to see people really depends on the task we are associating with a machine. An industrial vision system aimed at extracting defects on an assembly line need not know anything about people. Similarly, a computer used for email and text writing need not see and perceive the user's gestures and expressions. However, if our interest is to build intelligent machines that can work with us, support our needs and be our helpers, then it may be required for these machines to know more about who they are supporting and helping. If our computers are to do more than support our text-based needs like writing papers, spreadsheets, and communicating via email, and perhaps take on the role of a personal assistant, then the ability to see a person is essential. Such an ability to perceive people is something that we take for granted in our everyday interactions with each other. At present, our model of a machine, or more specifically of a computer, is something that is placed in the corner of the room. It is deaf, dumb, and blind, having no sense of the environment that it is in or of the person that is near it. We communicate with this computer using a coded sequence of tappings on a keyboard. Imagine a computer that knows that you are near it, that you are looking at it, knows who you are and what you are trying to do. Imagine a machine that can interpret a video signal based on who is in the scene and what they are doing. Such abilities in a computer are hard to imagine, unless it has...
Revealing translators' knowledge: statistical methods in constructing practical translation lexicons for language and speech processing
- in International Journal of Speech Technology
, 2002
Cited by 17 (7 self)
Abstract. Parallel corpora encode extremely valuable linguistic knowledge about paired languages, both in terms of vocabulary and syntax. A professional translation of a text represents a series of linguistic decisions made by the translator in order to convey as faithfully as possible the meaning of the original text and to produce a “natural” text from the perspective of a native speaker of the target language. The “naturalness” of a translation implies not only the grammaticality of the translated text, but also style and cultural or social specificity. We describe a program that exploits the knowledge embedded in the parallel corpora and produces a set of translation equivalents (a translation lexicon). The program uses almost no linguistic knowledge, relying on statistical evidence and some simplifying assumptions. Our experiments were conducted on the MULTEXT-EAST multilingual parallel corpus (Orwell’s “1984”), and the evaluation of the system performance is presented in some detail in terms of precision, recall and processing time. We conclude by briefly mentioning some applications of the automatically extracted lexicons for text and speech processing. Keywords: alignment, bitext, lemmatization, tagging, translation lexicon
Histogram Equalization of the Speech Representation for Robust Speech Recognition
, 2001
Cited by 12 (2 self)
Noise degrades the performance of Automatic Speech Recognition systems mainly because of the mismatch it introduces between the training and recognition conditions. Noise distorts the feature space, and this distortion usually behaves non-linearly. To reduce this mismatch, the methods proposed for robust speech recognition try to compensate for the noise effect either by estimating the clean speech or by adapting the recognizer's acoustic models so that they properly model the noisy speech. In this paper we propose a method to compensate for the noise effect on the speech representation. This method is based on the histogram equalization technique frequently applied in Digital Image Processing, which has been adapted to the speech representation. For each component of the feature vectors representing the speech signal, the histogram is estimated and the transformation that converts it into a reference histogram is calculated. Such transformations tend to compensate for the distortion that noise produces in the different components of the feature vector and improve the performance of recognition systems under noisy conditions. We describe how the histogram equalization method can be adapted to robust speech recognition and present some recognition experiments to evaluate the proposed method.
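The per-component transformation described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names are hypothetical, the histogram is approximated by a rank-based empirical CDF, and a standard Gaussian is assumed as the reference distribution, none of which the abstract specifies.

```python
from statistics import NormalDist


def equalize_component(values, reference=NormalDist(0.0, 1.0)):
    """Map one feature component onto the (assumed) reference distribution."""
    n = len(values)
    # Rank the frames by value; sorted() is stable, so ties keep input order.
    order = sorted(range(n), key=lambda i: values[i])
    equalized = [0.0] * n
    for rank, i in enumerate(order):
        # Empirical CDF, offset by 0.5 so it stays strictly inside (0, 1).
        cdf = (rank + 0.5) / n
        # Inverse reference CDF gives the equalized value.
        equalized[i] = reference.inv_cdf(cdf)
    return equalized


def equalize_features(frames, reference=NormalDist(0.0, 1.0)):
    """Equalize each component (column) of a list of feature vectors independently."""
    if not frames:
        return []
    columns = zip(*frames)
    eq_columns = [equalize_component(list(col), reference) for col in columns]
    return [list(row) for row in zip(*eq_columns)]


if __name__ == "__main__":
    clean = [[0.1, 2.0], [0.5, 1.0], [-0.3, 3.0], [0.9, 0.5]]
    # Toy "noise": a monotonic distortion of each component (one linear, one non-linear).
    noisy = [[0.5 * x + 2.0, y ** 3] for x, y in clean]
    print(equalize_features(clean) == equalize_features(noisy))
```

Because the mapping depends only on the rank of each value within its component, any strictly increasing distortion of a component, linear or not, yields the same equalized features in this sketch. Real noise is not purely monotonic, so this shows the idea rather than the reported recognition gains.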
Spoken-Language Access to Multimedia (SLAM): Masters Thesis
Cited by 11 (0 self)
Introduction. 1.1 The problem. The World-Wide Web (WWW) (CERN, 1994) is a network-based standard for hypermedia documents that combines documents prepared in HyperText Markup Language (HTML) (NCSA, 1994a) with an extensible set of multimedia resources. The most popular WWW browser with available source code is Mosaic (NCSA, 1994b), a cross-platform program developed and distributed by NCSA, now running in X11-based Unix, Macintosh and PC-Windows environments. As a hypermedia viewer, Mosaic combines the flexibility and navigability of hypermedia with multimedia outputs such as audio and GIF images. The World-Wide Web, especially as viewed with Mosaic, is phenomenally popular. By mid-Spring of 1994, Internet traffic was doubling about every six months. Of this growth, the World-Wide Web's proportional usage was doubling approximately every four months. In absolute volume of traffic, use of the WWW was doubling every two and a half months (Wallach, 1994). Much of the popu
The Integrality of Speech in Multimodal Interfaces
- ACM TRANSACTIONS ON COMPUTER-HUMAN INTERACTION
, 1998
Cited by 11 (0 self)
A framework of complementary behavior has been proposed which maintains that direct manipulation and speech interfaces have reciprocal strengths and weaknesses. This suggests that user interface performance and acceptance may increase by adopting a multimodal approach that combines speech and direct manipulation. This effort examined the hypothesis that the speed, accuracy, and acceptance of multimodal speech and direct manipulation interfaces will increase when the modalities match the perceptual structure of the input attributes. A software prototype that supported a typical biomedical data collection task was developed to test this hypothesis. A group of 20 clinical and veterinary pathologists evaluated the prototype in an experimental setting using repeated measures. The results of this experiment supported the hypothesis that the perceptual structure of an input task is an important consideration when designing a multimodal computer interface. Task completion time, the number of speech errors, and user acceptance improved when the interface best matched the perceptual structure of the input attributes.
User-centered Modeling for Spoken Language and Multimodal Interfaces
Cited by 10 (0 self)
By modeling difficult sources of linguistic variability in spontaneous speech and language, interfaces can be designed that transparently guide human input to match system processing capabilities. Such work is yielding more user-centered and robust interfaces for next-generation spoken language and multimodal systems. Historically, the development of spoken language systems has been primarily a technology-driven phenomenon. However, successful processing of spontaneous speech and dialogue, especially in actual field settings, requires a considerably broader understanding of performance issues during human-computer spoken interactions. Research from this perspective currently represents a gap in our scientific knowledge, which is widely recognized as having generated a bottleneck in our ability to support robust speech for real commercial applications. The present article summarizes recent research on user-centered modeling of human language and performance during spoken and multimodal interaction, as well as interface design aimed at next-generation systems.

