Learning to predict pitch accents and prosodic boundaries in Dutch
- In Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics, 2003
Cited by 7 (3 self)
We train a decision tree inducer (CART) and a memory-based classifier (MBL) on predicting prosodic pitch accents and breaks in Dutch text, on the basis of shallow, easy-to-compute features. We train the algorithms on both tasks individually and on the two tasks simultaneously.
A Comparison Between Syntactic And Prosodic Phrasing
- In Proceedings of the European Conference on Speech Communication and Technology, 1999
Cited by 6 (0 self)
This study presents a comparison between syntactic and prosodic phrasing. A parser is used to calculate the syntactic structures from the orthographic text, the prosodic structures of which are given by means of ToBI label files. For the automatic evaluation, the prosodic break indices "3" (intermediate phrase boundary) and "4" (intonation phrase boundary) are compared with the terminals extracted from the extensive syntactic structures generated by the parser. These terminals are assumed to be the carriers of the phrase boundaries. Keywords: syntax, parsing, prosody, phrase boundaries, prosodic labelling. 1. INTRODUCTION This study aims to show that there is a strong correspondence between syntactic phrasing and prosodic phrasing. It is based on the comparison of ToBI-labelled break indices and the output of a parser. There has been a lot of discussion about the correspondence between syntactic and prosodic phrasing ([9], [10], [11], [13]). Most of the criticism concerns syntactic ana...
Assigning Prosodic Structure for Speech Synthesis via Syntax-Prosody Mapping
- 2000
Cited by 4 (1 self)
This thesis presents a model that assigns prosodic structure to unrestricted text. The model is linguistically motivated. For the implementation, an XML pipeline is used as the data architecture. The output can be processed by a text-to-speech synthesiser to determine the locations of phrase breaks. The model's performance is evaluated in various ways. It outperforms another rule-based approach and achieves results comparable to, or close to, those of a statistical model, while being psychologically more plausible.
Integrating Linguistic and Performance-Based Constraints for Assigning Phrase Breaks
- In Proceedings of the First International Conference on Speech, 2002
Cited by 4 (0 self)
The mapping between syntactic structure and prosodic structure is a widely discussed topic in linguistics. In this work we use insights gained from research on syntax-to-prosody mapping to develop a computational model which assigns prosodic structure to unrestricted text. The resulting structure is intended to help a text-to-speech (TTS) system predict phrase breaks. In addition to linguistic constraints, the model also incorporates a performance-oriented parameter which approximates the effect of speaking rate. The model is rule-based rather than probabilistic, and does not require training. We present the model and implementations for both English and German, and give evaluation results for both implementations. We then examine how far the approach can account for the different break patterns which are associated with slow, normal and fast speech rates.
Prosodic Phrasing: Machine and Human Evaluation
- In Proceedings 4th ISCA Tutorial and Research Workshop on Speech Synthesis, Perthshire, 2001
Cited by 4 (1 self)
In this paper we describe a set of experiments aiming at building and evaluating a new phrasing module for European Portuguese Text-to-Speech Synthesis, using Classification and Regression Tree (CART) techniques on hand-labeled texts. Using the assessment criteria of matching boundary predictions against a reference example of phrased sentences, the best solution found up to now achieves an overall performance of 91.9%, with 86.3% of breaks correctly assigned and 4.3% of false insertions. Although in absolute terms such scores may be considered surprisingly good considering the size of the training set, the total number of exact matches at the sentence level is much lower. This suggested a more formal experiment to test the acceptability of the predicted phrasing in the judgment of human evaluators. The experiment involved 90 participants who were asked to grade both the predicted and reference phrasing, and also to express their opinion on where the breaks should be placed. The results showed that, as expected, there is a large variability among the subjects in the acceptance of a specific partitioning. However, the performance of the automatic assignment procedure is better rated by human evaluators.
Modeling Prosodic Structures in Linguistically Enriched Environments
- In “Text, Speech and Dialogue”, Lecture Notes in Artificial Intelligence (LNAI), Springer-Verlag Berlin Heidelberg, Vol 3206, 2004
Cited by 4 (3 self)
A significant challenge in Text-to-Speech (TtS) synthesis is the formulation of the prosodic structures (phrase breaks, pitch accents, phrase accents and boundary tones) of utterances. Robust prediction of these elements relies on the accuracy and quality of error-prone linguistic procedures, such as the identification of the part-of-speech and the syntactic tree. Additional linguistic factors, such as rhetorical relations, improve the naturalness of the prosody, but are hard to extract from plain texts. In this work, we propose a method to generate enhanced prosodic events for TtS by utilizing accurate, error-free and high-level linguistic information. We also present an appropriate XML annotation scheme to encode syntax, grammar, new or given information, phrase subject/object information, as well as rhetorical elements. These linguistically enriched documents have been utilized to build realistic machine learning models for the prediction of the prosodic structures in terms of segmental information and ToBI marks. The methodology has been applied by exploiting a Natural Language Generator (NLG) system. The trained models have been built using classification via regression trees and the results strongly indicate the realistic effect on the generated prosody. The approach has been evaluated by comparing the models produced from the enriched documents to those produced from plain text of the same domain. The results show an improved accuracy of up to 23%.
Modeling Improved Prosody Generation from High-Level Linguistically Annotated Corpora
- 2005
Cited by 3 (3 self)
this paper has been partially supported by the GR-PROSODY project of the KAPODISTRIAS Program, Special Account for Research Grants, National and Kapodistrian University of Athens and by the HERACLITUS project of the Operational Programme for Education and Initial Vocational Training (EPEAEK), Greek Ministry of Education, under the 3rd European Community Support Framework for Greece
Prominence Prediction For Super-Sentential Prosodic Modeling Based On A New Database
- In 5th ISCA Speech Synthesis Workshop, 2004
Cited by 2 (1 self)
Most current prosodic modeling techniques are concerned with variation within the sentence. With the improvement of local prosodic variation modeling in techniques like unit selection, we would like to address issues of wider context in producing appropriate synthetic output. A common experience found in unit selection synthesis is that a sentence that sounds natural in isolation does not sound so natural when embedded in a wider context, because it has inappropriate prosody. This work
Forced Alignment For Speech Synthesis Databases Using Duration And Prosodic Phrase Breaks
Cited by 2 (2 self)
Alignment of text to recorded audio is limited by the fact that standard techniques do not handle very long utterances well. This work presents a model for segmenting long recordings into smaller utterances. Our approach differs from typical forced alignment techniques in that prosodic phrase break locations are first estimated, and then words are placed around breaks based on length and break probabilities for each word. This last step
Bayesian Induction of intonational phrase breaks
Cited by 2 (1 self)
For the present paper, a Bayesian probabilistic framework for the task of automatic acquisition of intonational phrase breaks was established. By considering two different conditional independence assumptions, the naïve Bayes and Bayesian network approaches were considered and evaluated against the CART algorithm, which has previously been used with success. A finite-length window of minimal morphological and syntactic resources was incorporated, i.e. the POS label and the kind of phrase boundary, a novel syntactic feature that has not been applied to intonational phrase break detection before. This feature can be used in languages where syntactic parsers are not available and proves to be important, not only for the proposed Bayesian methodologies but also for other algorithms, like CART. Trained on a 5500-word database, Bayesian networks proved to be the most effective in terms of precision (82.3%) and recall (77.2%) for predicting phrase breaks.
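Several of the abstracts above report precision and recall for phrase break prediction. As a minimal sketch of how such figures are commonly computed, and not the evaluation code of any listed paper, the example below assumes one 0/1 label per word juncture, with 1 marking a break. The function name `break_precision_recall` and the label encoding are hypothetical choices made for illustration.

```python
from typing import Sequence, Tuple


def break_precision_recall(
    predicted: Sequence[int], reference: Sequence[int]
) -> Tuple[float, float]:
    """Return (precision, recall) for the break class.

    Both sequences hold one label per word juncture (assumed encoding):
    1 = phrase break, 0 = no break.
    """
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")

    pairs = list(zip(predicted, reference))
    true_pos = sum(1 for p, r in pairs if p == 1 and r == 1)   # correct breaks
    false_pos = sum(1 for p, r in pairs if p == 1 and r == 0)  # inserted breaks
    false_neg = sum(1 for p, r in pairs if p == 0 and r == 1)  # missed breaks

    # Guard against empty denominators (no predicted or no reference breaks).
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    return precision, recall


if __name__ == "__main__":
    predicted = [1, 0, 1, 0, 1]
    reference = [1, 0, 0, 1, 1]
    precision, recall = break_precision_recall(predicted, reference)
    print(f"precision={precision:.3f} recall={recall:.3f}")
```

Precision here is the share of predicted breaks that also appear in the reference, and recall is the share of reference breaks that were predicted. Individual papers may define their scores differently, for example by counting false insertions relative to all junctures.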

