An Automatic Algorithm For Segmenting And Labelling A Connected Digit Sequence, 2000
Group delay functions provide an alternative representation of signal information. Their main features are additivity and high resolution. The Fourier transform (FT) phase is generally featureless because of its random polarity and wrapping. However, the group delay function, defined as the negative derivative of the phase, can be processed to derive significant information such as peaks and valleys in the spectral envelope. In this paper, we show an application of the group delay function to the segmentation problem in speech. In the proposed method, a new signal is generated by symmetrising the short-term energy function. The minimum phase group delay function of this signal is computed, and its valleys correspond to segment boundaries. The proposed technique was tested on manually segmented digit utterances of the TI-DIGITS database. The overall correct segmentation performance is 77.8%. Digitwise recognition performance on the correctly segmented database is 87.1%.
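The steps named in this abstract (short-term energy, symmetrisation, group delay, valley picking) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The abstract gives no frame sizes, windowing or thresholds, so the function names and the `window_fraction` parameter are hypothetical. Using the causal part of the inverse transform as the minimum phase signal is likewise an assumption of this sketch. The group delay is computed as $\tau(\omega) = -\,d\theta(\omega)/d\omega$ through the standard identity $\tau = (X_R Y_R + X_I Y_I)/|X|^2$, where $Y$ is the transform of $n\,x[n]$, which avoids phase unwrapping.

```python
import numpy as np

def short_term_energy(x, frame_len, hop):
    """Sum of squared samples per frame (frame_len and hop are in samples)."""
    n_frames = 1 + (len(x) - frame_len) // hop
    return np.array([np.sum(x[i * hop:i * hop + frame_len] ** 2)
                     for i in range(n_frames)])

def group_delay(sig, n_fft):
    """Group delay -d(phase)/d(omega) on an n_fft-point grid, no unwrapping."""
    n = np.arange(len(sig))
    X = np.fft.fft(sig, n_fft)
    Y = np.fft.fft(n * sig, n_fft)  # transform of n * x[n]
    return (X.real * Y.real + X.imag * Y.imag) / np.maximum(np.abs(X) ** 2, 1e-12)

def local_minima(g):
    """Indices of strict interior local minima of a 1-D array."""
    return np.flatnonzero((g[1:-1] < g[:-2]) & (g[1:-1] < g[2:])) + 1

def boundary_candidates(energy, window_fraction=0.1):
    """Frame indices where the group delay of the symmetrised energy has a valley.

    window_fraction is a hypothetical tuning knob: the share of the causal
    sequence that is kept before the group delay is computed.
    """
    m = len(energy)
    # irfft mirrors the energy contour into an even, spectrum-like sequence
    # (the symmetrisation) and returns its real inverse DFT, length 2*(m-1).
    c = np.fft.irfft(energy)
    n_keep = max(2, int(window_fraction * (m - 1)))
    causal = c[:n_keep]  # assumption: causal part stands in for the minimum phase signal
    gd = group_delay(causal, len(c))[:m]  # bin k lines up with energy frame k
    return local_minima(gd)
```

A smaller `window_fraction` smooths the group delay and yields fewer candidate boundaries, while a larger one follows the energy contour more closely. Any concrete value would need tuning on real data.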
NATURAL SOUNDING TTS BASED ON SYLLABLE-LIKE UNITS
In this work we describe a new "syllable-like" speech unit that is suitable for concatenative speech synthesis. These units are automatically generated using a group delay based segmentation algorithm and acoustically correspond to the form C*VC* (C: consonant, V: vowel). The effectiveness of the unit is demonstrated by synthesizing natural-sounding speech in Tamil, a regional Indian language. Significant quality improvement is obtained if bisyllable units are used in addition to monosyllables, with results far superior to the traditional diphone-based approach. An important advantage of this approach is the elimination of prosody rules. Since f0 is part of the target cost, the unit selection procedure chooses the best unit from among the many candidates. The naturalness of the synthesized speech demonstrates the effectiveness of this approach.

