Weighted Finite-State Transducers in Speech Recognition, 2001
Cited by 101 (3 self)
We survey the use of weighted finite-state transducers (WFSTs) in speech recognition. We show that WFSTs provide a common and natural representation for HMM models, context-dependency, pronunciation dictionaries, grammars, and alternative recognition outputs. Furthermore, general transducer operations combine these representations flexibly and efficiently. Weighted
A Flexible, Scalable Finite-State Transducer Architecture For Corpus-Based Concatenative Speech Synthesis, 2000
Cited by 19 (3 self)
In this paper we describe our work involving the conversion of our phonologically-based synthesizer into a finite-state transducer (FST) representation which can be used for real-time natural-sounding synthesis. We have designed a transducer structure to efficiently perform the common task of unit selection in concatenative speech synthesis. By encapsulating domain-independent concatenative synthesis costs into a constraint kernel, we have obtained a topology that scales linearly with the size of the synthesis corpus. The FST representation provides a flexible, unified framework in which we can leverage our previous work in speech recognition in areas such as pronunciation modelling and search. The FST synthesizer has been incorporated into two servers which operate within our conversational system architecture to convert meaning representations into waveforms. We have had preliminary success with the new FST-based synthesis in several constrained spoken dialogue applications.
Joint Prosody Prediction And Unit Selection For Concatenative Speech Synthesis, 2001
Cited by 15 (4 self)
In this paper we describe how prosody prediction can be efficiently integrated with the unit selection process in a concatenative speech synthesizer under a weighted finite-state transducer (WFST) architecture. WFSTs representing prosody prediction and unit selection can be composed during synthesis, thus effectively expanding the space of possible prosodic targets. We implemented a symbolic prosody prediction module and a unit selection database as the synthesis components of a travel planning system. Results of perceptual experiments show that by combining the steps of prosody prediction and unit selection we are able to achieve improved naturalness of synthetic speech compared to the sequential implementation. 1. INTRODUCTION The growing popularity of speech-enabled computer interfaces demands high quality speech output, particularly for telephone applications. The perceived quality of standard general purpose text-to-speech (TTS) systems is not good enough, which forces applicatio...
Corpus-Based Speech Synthesis: Methods and Challenges
Cited by 11 (0 self)
Corpus-based approaches to speech synthesis have been advocated to overcome the limitations of concatenative synthesis from a fixed acoustic unit inventory. The frequency of unit concatenations in, e.g., diphone synthesis has been argued to contribute to the perceived lack of naturalness of synthetic speech. The key idea of corpus-based synthesis, or unit selection, is to use an entire speech corpus as the acoustic inventory and to select at run-time from this corpus the longest available strings of phonetic segments that match a sequence of target speech sounds in the utterance to be synthesized, thereby minimizing the number of concatenations and reducing the need for signal processing. This paper reviews the assumptions underlying this synthesis strategy and the different approaches to unit selection, as well as the major challenges encountered by corpus-based methods. One of the biggest problems to date is the relative weighting of acoustic distance measures. We further argue agains...
Unit Selection for Speech Synthesis Using Splicing Costs with Weighted Finite State Transducers, 2001
Cited by 9 (5 self)
In this paper we describe how unit selection for concatenative speech synthesis can be implemented efficiently for sub-phonetic units using weighted finite state transducers (WFST). We also introduce splicing costs as a measure to indicate which unit boundaries are particularly good or poor joint points. Splicing costs extend the flexibility offered by the unit selection paradigm. Through a perceptual experiment we demonstrate an improvement in speech quality achieved by using splicing costs during unit selection.
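The unit-selection search summarized in the abstract above can be illustrated with a minimal sketch. It is not the authors' WFST implementation: it swaps in a plain dynamic-programming (Viterbi-style) search over candidate units, assuming a per-unit target cost and a pairwise splicing cost. The names `select_units`, `target_cost`, and `splice_cost`, the unit labels, and all cost values are hypothetical and chosen only for illustration.

```python
from typing import Callable, List, Sequence, Tuple


def select_units(
    candidates: Sequence[Sequence[str]],
    target_cost: Callable[[int, str], float],
    splice_cost: Callable[[str, str], float],
) -> Tuple[float, List[str]]:
    """Return (total cost, unit sequence) with one unit per target position."""
    if not candidates or any(len(c) == 0 for c in candidates):
        raise ValueError("every target position needs at least one candidate")
    # best[u] = (cost of the cheapest path ending in unit u, that path)
    best = {u: (target_cost(0, u), [u]) for u in candidates[0]}
    for i in range(1, len(candidates)):
        nxt = {}
        for u in candidates[i]:
            # Cheapest way to reach u: previous path cost plus the splice into u.
            prev_cost, prev_path = min(
                ((c + splice_cost(p, u), path) for p, (c, path) in best.items()),
                key=lambda item: item[0],
            )
            nxt[u] = (prev_cost + target_cost(i, u), prev_path + [u])
        best = nxt
    return min(best.values(), key=lambda item: item[0])


# Toy data (hypothetical): two target positions, two candidate units each.
TARGET = {"a1": 0.5, "a2": 0.0, "b1": 0.0, "b2": 0.4}


def toy_target_cost(position: int, unit: str) -> float:
    return TARGET[unit]


def toy_splice_cost(left: str, right: str) -> float:
    # Assumption: units sharing a numeric suffix were adjacent in the corpus,
    # so joining them is free; any other join pays a fixed penalty.
    return 0.0 if left[-1] == right[-1] else 1.0


cost, path = select_units([["a1", "a2"], ["b1", "b2"]], toy_target_cost, toy_splice_cost)
print(cost, path)  # -> 0.4 ['a2', 'b2']
```

In this toy setup the search prefers `a2, b2` even though `b1` has the lower target cost, because the splice between `a2` and `b1` is penalized. This is the kind of trade-off a splicing cost is meant to express.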
High-Quality and Flexible Speech Synthesis with Segment Selection and Voice Conversion, 2003
Cited by 8 (1 self)
Text-to-Speech (TTS) is a useful technology that converts any text into a speech signal. It can be utilized for various purposes, e.g. car navigation, announcements in railway stations, response services in telecommunications, and e-mail reading. Corpus-based TTS makes it possible to dramatically improve the naturalness of synthetic speech compared with the early TTS. However, no general-purpose TTS has been developed that can consistently synthesize sufficiently natural speech. Furthermore, there is not yet enough flexibility in corpus-based TTS.
Restricted unlimited domain synthesis - In Proc. EUROSPEECH-2003, 2003
Cited by 8 (1 self)
This paper describes the hybrid unit selection strategy for restricted domain synthesis in the SmartKom dialog system. Restricted domains are characterized as being biased toward domain specific utterances while being unlimited in terms of vocabulary size. This entails that unit selection in restricted domains must deal with both domain specific and open-domain material. The strategy presented here combines the advantages of two existing unit selection approaches, motivated by the claim that the phonological structure matching approach is advantageous for domain specific parts of utterances, while the acoustic clustering algorithm is more appropriate for open-domain material. This dichotomy is also reflected in the speech database, which consists of a domain specific and an open-domain part. The text material for the open-domain part was constructed to optimize coverage of diphones and phonemes in different contexts.
Information-Theoretic Criteria for Unit Selection Synthesis, 2002
Cited by 8 (1 self)
In our recent work on concatenative speech synthesis, we have devised an efficient, graph-based search to perform unit selection given symbolic information. By encapsulating concatenation and substitution costs defined at the class level, the graph expands only linearly with respect to corpus size. To date, these costs were manually tuned over pre-specified classes, which was a knowledge-intensive engineering process. In this research paper, we turn to information-theoretic metrics for automatically learning the costs from data. These costs can be analyzed in a minimum description length (MDL) framework. The performance of these automatically determined weights is compared against that of manually tuned weights in a perceptual evaluation.
Weighted Grammar Tools: The GRM Library, 2000
Cited by 5 (3 self)
We describe the algorithmic and software design principles of a general grammar library (GRM library) designed for use in spoken-dialogue systems, speech synthesis, and other speech processing applications. The library is a set of general-purpose software tools for constructing and modifying weighted finite-state acceptors and transducers representing grammars. The tools can be used in particular to compile weighted context-dependent rewrite rules into weighted finite-state transducers, read and compile, when possible, weighted context-free grammars into weighted automata, and dynamically modify the compiled grammar automata. The dynamic modifications allowed include: grammar switching, dynamic modification of rules, dynamic activation or non-activation of rules, and the use of dynamic lists. Access to these features is essential in spoken-dialogue applications.
The Impact Of Speech Recognition On Speech Synthesis, 2002
Cited by 4 (0 self)
Speech synthesis has changed dramatically in the past few years to have a corpus-based focus, borrowing heavily from advances in automatic speech recognition. In this paper, we survey technology in speech recognition systems and how it translates (or doesn't translate) to speech synthesis systems. We further speculate on future areas where ASR may impact synthesis and vice versa.

