Phonetic Vocoding With Speaker Adaptation
1997
Cited by 11 (2 self)
This paper describes a phonetic vocoding scheme that relies on speaker adaptation to capture important speaker characteristics. These are typically lost in phonetic vocoders, which transmit only information about the recognized phones together with some prosodic information. In our scheme, however, additional speaker characteristics are transmitted in vowel regions (average values of LSP coefficients for each phone). In informal listening tests, this additional information yielded promising speaker recognizability while still achieving a rather low average bit rate, suitable for many transmission and storage applications. This work extends our previous phonetic vocoding scheme described in [5]. The vocoder is now fully quantized and the number of transmitted parameters has been significantly reduced.
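As a rough illustration of the "average values of LSP coefficients for each phone" mentioned in the abstract above, the following sketch averages LSP vectors over each run of vowel-labelled frames. It is not the authors' implementation: the function name `mean_lsp_per_segment`, the `VOWELS` label set, and the frame data are all hypothetical, and quantization is left out entirely.

```python
# Hypothetical vowel label set; the paper's phone inventory is not given here.
VOWELS = {"a", "e", "i", "o", "u"}


def mean_lsp_per_segment(labels, lsp_frames, vowels=VOWELS):
    """Return (phone, mean LSP vector) for each run of identical vowel labels.

    labels     -- one phone label per frame (assumed output of a recogniser)
    lsp_frames -- one LSP coefficient vector per frame, same length as labels
    """
    segments = []
    start = 0
    for end in range(1, len(labels) + 1):
        # A segment closes at the end of the utterance or when the label changes.
        if end == len(labels) or labels[end] != labels[start]:
            phone = labels[start]
            if phone in vowels:
                frames = lsp_frames[start:end]
                # Average each coefficient across the frames of the segment.
                mean = [sum(col) / len(frames) for col in zip(*frames)]
                segments.append((phone, mean))
            start = end
    return segments


# Toy input: four frames, two LSP coefficients each (made-up values).
labels = ["s", "a", "a", "t"]
lsp_frames = [[0.1, 0.2], [0.25, 0.5], [0.75, 1.0], [0.3, 0.4]]
print(mean_lsp_per_segment(labels, lsp_frames))
```

Only the vowel segment contributes a mean vector; the consonant frames are skipped, mirroring the abstract's statement that the extra speaker information is sent in vowel regions.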
DFW-based spectral smoothing for concatenative speech synthesis
in Proceedings of ICSLP, 2004
Cited by 4 (2 self)
This paper proposes and evaluates a new spectral smoothing technique whose performance is comparable with LSP interpolation in terms of Euclidean spectral distance measurements but whose interpolated formant trajectories are more reasonable from a phonetic point of view. The approach first estimates derivative logarithmic magnitude spectra from both the source and the target frame, each represented by autoregressive filter coefficients. Dynamic programming (DP) then yields the best alignment between these two spectral representations. Smoothed frequency responses are obtained by weighted linear interpolation between the corresponding source and target spectral lines whose alignment was found by DP backtracking. Finally, the spectrum is converted to autoregressive filter coefficients via an intermediate stage of autocorrelation coefficients.
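The alignment and interpolation steps in the abstract above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's algorithm: it uses a plain dynamic-time-warping recurrence with an absolute-difference local cost, takes log-magnitude spectra as ready-made lists, and omits both the estimation from autoregressive coefficients and the final conversion back to them. All function names are hypothetical.

```python
def derivative(spec):
    """First difference of a log-magnitude spectrum."""
    return [b - a for a, b in zip(spec, spec[1:])]


def dp_align(src, tgt):
    """Align two sequences with a basic DTW recurrence and backtracking.

    Returns a monotonic list of (i, j) index pairs from (0, 0) to the end.
    """
    n, m = len(src), len(tgt)
    inf = float("inf")
    cost = [[inf] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = abs(src[i] - tgt[j])  # assumed local cost
            if i == 0 and j == 0:
                cost[i][j] = d
                continue
            best = min(
                cost[i - 1][j] if i > 0 else inf,
                cost[i][j - 1] if j > 0 else inf,
                cost[i - 1][j - 1] if i > 0 and j > 0 else inf,
            )
            cost[i][j] = d + best
    # Backtrack from the last cell to (0, 0) along the cheapest predecessors.
    i, j = n - 1, m - 1
    path = [(i, j)]
    while (i, j) != (0, 0):
        candidates = []
        if i > 0 and j > 0:
            candidates.append((cost[i - 1][j - 1], i - 1, j - 1))
        if i > 0:
            candidates.append((cost[i - 1][j], i - 1, j))
        if j > 0:
            candidates.append((cost[i][j - 1], i, j - 1))
        _, i, j = min(candidates)
        path.append((i, j))
    path.reverse()
    return path


def interpolate(src_spec, tgt_spec, weight):
    """Weighted linear interpolation between aligned spectral lines.

    weight = 0 reproduces the source lines, weight = 1 the target lines.
    Each result is (warped bin position, interpolated log magnitude).
    """
    path = dp_align(derivative(src_spec), derivative(tgt_spec))
    return [
        ((1 - weight) * i + weight * j,
         (1 - weight) * src_spec[i] + weight * tgt_spec[j])
        for i, j in path
    ]


# Toy spectra with a single peak at different bins (made-up values).
source = [0.0, 1.0, 4.0, 1.0, 0.0, 0.0, 0.0]
target = [0.0, 0.0, 0.0, 1.0, 4.0, 1.0, 0.0]
for position, magnitude in interpolate(source, target, 0.5):
    print(position, magnitude)
```

Because each aligned pair contributes both an interpolated position and an interpolated magnitude, a spectral peak can move along the frequency axis between source and target instead of fading out at one bin and in at another.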
Speech Processing with Linear and Neural Network Models
1996
Cited by 4 (0 self)
ion, for imposing continuity between models of adjacent speech segments, and learning rate adaptation, for improving back-propagation training, are discussed. For synthesising real speech utterances, an audio tape demonstrates that ARX models produce the highest quality synthetic speech and that the quality is maintained when pitch modifications are applied. The second part of the dissertation studies the operation of recurrent neural networks in classifying patterns of correlated feature vectors. Such patterns are typical of speech classification tasks. The operation of a hidden node with a recurrent connection is explained in terms of a decision boundary which changes position in feature space. The feedback is shown to delay switching from one class to another and to smooth output decisions for sequences of feature vectors from the same class. For networks trained with constant class targets, a sequence of feature vectors from the same class tends to drive the operation of hidden nod
Application Of Speaker Modification Techniques To Phonetic Vocoding
Proc. ICSLP, 1996
Cited by 3 (1 self)
The goal of the work described in this paper is to develop a very low bit rate vocoding scheme. The vocoder is a typical LPC vocoder, whose parameters are post-processed on a phone-by-phone basis, resulting in a variable bit rate segment vocoder. Given
Phonetic Vocoding
Cited by 1 (0 self)
A segmental vocoder is a type of coder that exploits the correlation between frames in order to achieve significant bit rate savings, in a variable bit rate framework. The coder proposed in this paper falls in the class of segmental vocoders known as phonetic vocoders. As in other basic LPC vocoders, the transmitter stage performs LPC analysis and estimates pitch, voicing and energy parameters on a frame-by-frame basis. Here, however, the LPC coefficients are fed into a phone recogniser which segments the speech signal, producing a phone index that is transmitted together with the prosodic information. Speaker recognisability is one of the main problems faced by vocoders at the lowest bit rates, given the need to reduce speaker-specific information. Hence, phonetic vocoders are well suited to speaker-dependent coding, and can achieve bit rates as low as 250 bit/s. For speaker-independent coding, a speaker adaptation methodology may be adopted, although resulting in higher bit r...
AUTOMATIC SPEECH RECOGNITION AND INTRINSIC SPEECH VARIATION
This paper briefly reviews the state of the art on sources of speech variability in automatic speech recognition (ASR) systems. It focuses on some variations within the speech signal that make the ASR task difficult. The variations detailed in the paper are intrinsic to the speech and affect the different levels of the ASR processing chain. For different sources of speech variation, the paper summarizes the current knowledge and highlights specific feature extraction or modeling weaknesses and current trends.

