Automatic prosodic analysis for computer aided pronunciation teaching (1994)

by Paul Bagshaw

Results 1 - 9 of 9

The Prosody Module

by Anton Batliner, Jan Buckow, Heinrich Niemann, Elmar Nöth, Volker Warnke - Verbmobil: Foundations of Speech-to-Speech Translations , 1993
Cited by 24 (16 self)
We describe the acoustic-prosodic and syntactic-prosodic annotation and classification of boundaries, accents and sentence mood integrated in the Verbmobil system for the three languages German, English, and Japanese. For the acoustic-prosodic classification, a large feature vector with normalized prosodic features is used. For the three languages, a multilingual prosody module was developed that reduces memory requirements considerably compared to three monolingual modules. For classification, neural networks and statistical language models are used.

Duration Features in Prosodic Classification: Why Normalization Comes Second, and what they Really Encode.

by Anton Batliner, Elmar Nöth, Jan Buckow, Richard Huber, Volker Warnke, Heinrich Niemann , 2001
Cited by 10 (2 self)
For the classification of boundaries and accents in German and English spontaneous speech in the VERBMOBIL project (speech-to-speech translation system), we use a large prosodic feature vector; duration features represent the most important feature class. They are computed in three different ways: (1) The word duration is normalized with respect to the 'expected' word duration: DURNORM; (2) Duration is normalized with respect to the number of syllables in the word: DURSYLL; (3) The absolute duration value DURABS of a word is taken. Normally, we use all these feature classes simultaneously. In the present paper, we look at the impact of each of these duration classes separately. In addition, we use part-of-speech (POS) information as a further knowledge source. It turns out that, throughout, the best feature class, if used alone, is DURABS, followed by DURSYLL, with DURNORM third. Best results are achieved by using all feature classes together. With POS information, better results can be achieved than without. This effect is larger for accent classification than for boundary classification, and much larger in combination with DURNORM than in combination with DURSYLL or DURABS. These results indicate that DURABS in particular encodes not only prosodic but, to a large extent, syntactic POS information as well: content words are normally more prone to be accentuated than function words, and at the same time, they tend to be longer. This information is of course lost if duration is normalized, as is the case for DURSYLL and DURNORM.
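
The three duration feature classes named in this abstract can be illustrated with a minimal Python sketch. The abstract does not give the formulas, so the ratio forms used here are assumptions, as are the `Word` type, its field names, and the idea that the 'expected' duration comes from some external estimate (for example an average over a lexicon). It is not the VERBMOBIL implementation.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """Hypothetical container for one word token (not from the paper)."""
    text: str
    duration_s: float   # observed word duration in seconds
    n_syllables: int    # number of syllables in the word
    expected_s: float   # assumed 'expected' duration, e.g. a lexicon average


def duration_features(word: Word) -> dict:
    """Compute the three duration feature classes for a single word.

    DURABS  : absolute duration, unchanged.
    DURSYLL : duration divided by the syllable count (assumed form).
    DURNORM : duration divided by the expected duration (assumed form).
    """
    if word.n_syllables <= 0 or word.expected_s <= 0:
        raise ValueError("syllable count and expected duration must be positive")
    return {
        "DURABS": word.duration_s,
        "DURSYLL": word.duration_s / word.n_syllables,
        "DURNORM": word.duration_s / word.expected_s,
    }
```

Under these assumptions, a long content word and a short function word spoken at the same relative tempo get similar DURNORM values but very different DURABS values, which is consistent with the abstract's point that normalization discards word-class information.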

Classical and novel discriminant features for affect recognition from speech

by Raul Fernandez, Rosalind W. Picard - In Proc. Interspeech , 2005
Cited by 8 (1 self)
This paper investigates the performance and relevance of a set of acoustic features for the task of automatic recognition of affect from speech using machine learning techniques. Eighty-seven novel and classical features related to loudness, intonation, and voice quality are examined. Using feature selection, the results yield a performance level of 49.4% recognition rate (compared to a human performance rate of 60.4% and a chance level of 20%), while the relevance results show that the more exploratory and novel subset of these features outranks the more classical features in the recognition task.

Language training system utilizing speech modification

by Meron Yoram, Keikichi Hirose - Proceedings ICSLP , 1996
Cited by 2 (0 self)
In this paper, a computer-assisted language training system, focusing on speech input and output, is described. The system is intended to help students of a foreign language (typically Japanese or English) to improve their pronunciation, with an emphasis on prosodic features of speech. The system incorporates a combination of speech processing techniques in order to analyze the input speech and to produce effective speech feedback. The system is implemented on a Unix PC, with audio I/O capability, in a window environment.

Detecting pitch accent using pitch-corrected energy-based predictors

by Andrew Rosenberg, Julia Hirschberg - In Interspeech , 2007
Cited by 2 (2 self)
Previous work has shown that the energy components of frequency subbands with a variety of frequencies and bandwidths predict pitch accent with various degrees of accuracy, and produce correct predictions for distinct subsets of data points. In this paper, we describe a series of experiments exploring techniques to leverage the predictive power of these energy components by including pitch and duration features, which are other known correlates of pitch accent. We perform these experiments on Standard American English read, spontaneous and broadcast news speech, each corpus containing at least four speakers. Using an approach by which we correct energy-based predictions using pitch and duration information prior to using a majority voting classifier, we were able to detect pitch accent in read, spontaneous and broadcast news speech at 84.0%, 88.3% and 88.5% accuracy, respectively. Human performance at pitch accent detection is generally taken to be between 85% and 90%. Index Terms: prosodic analysis, spectral emphasis

Yet Another Algorithm for Pitch Tracking (YAAPT)

by Kavita Kasi , 2002
This thesis presents a pitch detection algorithm that is extremely robust for both high quality and telephone speech. The kernel method for this algorithm is the Normalized Cross Correlation (NCCF) reported by David Talkin [16]. Major innovations include: processing of the original acoustic signal and a nonlinearly processed version of the signal to partially restore very weak F0 components; intelligent peak picking to select multiple F0 candidates and assign merit factors; and incorporation of highly robust pitch contours obtained from smoothed versions of low frequency portions of spectrograms. Dynamic programming is used to find the “best” pitch track among all the candidates, using both local and transition costs. The algorithm has been evaluated using the Keele pitch extraction reference database as “ground truth” for both “high quality” and “telephone” speech. For both types of speech, the error rates obtained are lower than the lowest reported in the literature.
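
The normalized cross-correlation kernel mentioned in this abstract can be sketched in a few lines of Python. This is only the standard NCCF definition applied to a single frame, with hypothetical function and parameter names (`nccf`, `best_lag`, `start`, `window`). It omits the thesis's nonlinear processing, spectrogram-based contours, merit factors, and dynamic programming, so it should not be read as the YAAPT algorithm itself.

```python
import math


def nccf(signal, start, window, lag):
    """Normalized cross-correlation of one frame with its copy shifted by `lag`.

    Returns a value in [-1, 1]; values near 1 suggest that `lag` samples
    is close to a multiple of the pitch period.
    """
    a = signal[start:start + window]
    b = signal[start + lag:start + lag + window]
    if len(a) < window or len(b) < window:
        raise ValueError("frame runs past the end of the signal")
    num = sum(x * y for x, y in zip(a, b))
    # Normalize by the energies of both frames so amplitude changes cancel out.
    denom = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return num / denom if denom > 0 else 0.0


def best_lag(signal, start, window, min_lag, max_lag):
    """Return the lag in [min_lag, max_lag] with the highest NCCF score."""
    scores = {k: nccf(signal, start, window, k) for k in range(min_lag, max_lag + 1)}
    return max(scores, key=scores.get)
```

For a 100 Hz sine sampled at 8000 Hz, the period is 80 samples, so `best_lag(sig, 0, 240, 20, 120)` should select lag 80. Picking a single maximum per frame is the simplest possible choice; keeping several candidate peaks per frame, as the abstract describes, is what makes a later dynamic-programming pass useful.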

A Multilingual Prosody Module In A Speech-To-Speech Translation System

by E. Nöth, A. Batliner, J. Buckow, R. Huber, V. Warnke, H. Niemann, Lehrstuhl Fur Mustererkennung (informatik , 2000
In our previous research, we have shown that prosody can be used to dramatically improve the performance of the automatic speech translation system VERBMOBIL [16]. The methods to classify prosodic events have been developed on the German subcorpus of the VERBMOBIL speech database. In this paper we describe how the methods that we developed on the German subcorpus can be applied to other languages. Experiments show that these methods are suited to English and Japanese as well. Efficiency problems are addressed and a new set of features is presented. The new set of features facilitates a multilingual module for prosodic processing. We present an architecture for such a multilingual module and discuss the advantages of this approach compared to an approach that uses separate modules for different languages. This multilingual module and the new feature set are evaluated with respect to computation time, memory requirement, and classification performance. The results show that the memory requirement can be reduced by 78%, whereas the recognition accuracy does not decrease.

A Prosodic Module for Self-Learning Activities

by Rodolfo Delmonte , 2002
We created an application specialized in prosodic tutoring, called the Prosodic Module (PM). The PM is composed of two different sets of Learning Activities, the first one dealing with prosodic problems at word syllabic level, the second one dealing with prosodic problems at phonological phrase and utterance level. The PM is able to detect significant deviations from a master's word/phrase/utterance and offers visual aids and a written diagnosis of the problem as well as indications on how to overcome and correct the error. This is achieved by means of a comparison between the two signals, the master's and the student's; the elements of comparison are the acoustic correlates of well-known prosodic elements such as intonational contour, sentence accent and word stress, and duration at syllable, word and sentence level. We argue that the use of Automatic Speech Recognition as a Teaching Aid should be targeted to narrowly focussed spoken exercises, for intermediate or higher level students, disallowing open-ended dialogues, in order to ensure consistency of evaluation. In addition, we support the conjoined use of ASR technology and prosodic tools to gauge Goodness of Pronunciation for linguistically consistent feedback.

F0 Extraction Methods

by Keelan Evanini, Catherine Lai
• Many studies have compared the performance of different