Robust Text-Independent Speaker Identification over Telephone Channels
- IEEE Trans. on Speech and Audio Processing, 1997
Abstract - Cited by 10 (1 self)
This paper addresses the issue of closed-set text-independent speaker identification from samples of speech recorded over the telephone. It focuses on the effects of acoustic mismatches between training and testing data, and concentrates on two approaches: extracting features that are robust against channel variations, and transforming the speaker models to compensate for channel effects. First, an experimental study shows that optimizing the front-end processing of the speech signal can significantly improve speaker recognition performance. A new filterbank design is introduced to improve the robustness of the speech spectrum computation in the front-end unit. Next, a new feature based on spectral slopes is described. Its ability to discriminate between speakers is shown to be superior to that of the traditional cepstrum. This feature can be used alone or combined with the cepstrum. The second part of the paper presents two model transformation methods that further reduce channel effe...
The Modified Group Delay Function And Its Application To Phoneme
- Proc. IEEE Int. Conf. Acoust., Speech, Signal Processing, 2003
Abstract - Cited by 7 (1 self)
We explore a new spectral representation of speech signals through group delay functions. The group delay functions by themselves are noisy and difficult to interpret owing to zeroes that are close to the unit circle in the z-domain, which clutter the spectra. A new modified group delay function [1] that reduces the effects of zeroes close to the unit circle is used. Assuming that this new function is minimum phase, the modified group delay spectrum is converted to a sequence of cepstral coefficients. A preliminary phoneme recogniser is built using features derived from these cepstra. Results are compared with those obtained from features derived from the traditional mel frequency cepstral coefficients (MFCC). The baseline MFCC performance is 34.7%, while that of the best modified group delay cepstrum is 39.2%. The performance of the composite MFCC feature, which includes the derivatives and double derivatives, is 60.7%, while that of the composite modified group delay feature is 57.3%. When these two composite features are combined, an improvement of approximately 2% is achieved (62.8%). When this new system is combined with linear frequency cepstra (LFC) [2], performance improves by approximately another 0.8% (63.6%).
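The conversion from a group delay spectrum to cepstral coefficients mentioned in this abstract can be illustrated with a small sketch. It assumes the textbook relation for a minimum-phase signal, tau(w) = sum over n of n*c[n]*cos(n*w), and uses the plain (unmodified) group delay; the modified function of [1] is not reproduced here, and the helper names `group_delay` and `cepstrum_from_group_delay` are hypothetical, not taken from the paper.

```python
import numpy as np

def group_delay(x, n_fft):
    """Plain group delay of a finite sequence x, sampled at n_fft DFT bins."""
    x = np.asarray(x, dtype=float)
    n = np.arange(len(x))
    X = np.fft.fft(x, n_fft)        # spectrum of x[n]
    Y = np.fft.fft(n * x, n_fft)    # spectrum of n * x[n]
    # tau = (X_R * Y_R + X_I * Y_I) / |X|^2, avoiding phase unwrapping
    return (X.real * Y.real + X.imag * Y.imag) / (np.abs(X) ** 2)

def cepstrum_from_group_delay(tau, n_coeffs):
    """Recover c[1..n_coeffs], assuming tau[k] = sum_n n*c[n]*cos(2*pi*k*n/N)."""
    g = np.fft.ifft(tau).real       # g[n] is close to n*c[n]/2 for 0 < n < N/2
    n = np.arange(1, n_coeffs + 1)
    return 2.0 * g[1:n_coeffs + 1] / n

# Toy minimum-phase signal (zero at z = -0.5, well inside the unit circle).
x = [1.0, 0.5]
tau = group_delay(x, 256)
c = cepstrum_from_group_delay(tau, 3)

# Analytic cepstrum of 1 + 0.5 z^-1 is c[n] = (-1)^(n+1) * 0.5^n / n.
expected = np.array([0.5, -0.125, 0.125 / 3])
print(np.allclose(c, expected))
```

The toy signal is chosen so that the cepstrum is known in closed form; real speech frames would first need the smoothing step that the modified function provides.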
M.: New algorithm for spectral smoothing and envelope modification for LP-PSOLA synthesis
- Proc. IEEE Int. Conf. Acoust., Speech, Signal Processing, 1994
Abstract - Cited by 1 (0 self)
The final quality of a concatenation synthesis system is directly related to the continuity of the spectrum at the concatenation point. Due to the subjective auditory masking, if we minimize the spectral distortion in the formant frequencies, the quality will increase significantly. In this paper we present, along with results concerning pitch marking, an algorithm capable of modifying the LPC envelope in a flexible way which is the heart of a spectral smoothing module for a diphone-based Linear Prediction Pitch-Synchronous Overlap-Add (LP-PSOLA) concatenation system.
An Automatic Algorithm For Segmenting And Labelling A Connected Digit Sequence
, 2000
Abstract - Cited by 1 (0 self)
Group delay functions provide an alternative representation of signal information. The main features of group delay functions are the additive and high resolution properties. The Fourier transform (FT) phase is generally featureless due to random polarity and wrapping. But the group delay function, which is defined as the negative derivative of phase, can be processed to derive significant information such as peaks and valleys in the spectral envelope. In this paper, we show an application of the group delay function to solve the segmentation problem in speech. In the proposed method a new signal is generated by symmetrising the short term energy function. The minimum phase group delay function of this signal is computed, the valleys of which correspond to segment boundaries. The proposed technique was tested on manually segmented digit utterances of the TI-DIGITS database. The overall correct segmentation performance is 77.8%. Digitwise recognition performance on the correctly segmented database is 87.1%.
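The additive property mentioned in this abstract can be checked numerically. The sketch below is illustrative only and does not implement the segmentation method: it computes the group delay through the standard identity tau = Re(Y / X), where Y is the transform of n*x[n], and verifies that the group delay of two cascaded (convolved) sequences equals the sum of their individual group delays. The function name `group_delay` and the toy sequences are assumptions made for the example.

```python
import numpy as np

def group_delay(x, n_fft=64):
    """Negative derivative of the Fourier phase, without phase unwrapping."""
    x = np.asarray(x, dtype=float)
    n = np.arange(len(x))
    X = np.fft.fft(x, n_fft)        # spectrum of x[n]
    Y = np.fft.fft(n * x, n_fft)    # spectrum of n * x[n]
    return (Y / X).real             # valid where X has no zeros on the bins

# Two short sequences whose zeros lie well inside the unit circle.
a = np.array([1.0, 0.5])
b = np.array([1.0, -0.3, 0.1])

tau_sum = group_delay(a) + group_delay(b)
tau_cascade = group_delay(np.convolve(a, b))   # cascade = convolution in time

# Sanity check of the definition: a pure delay of 3 samples has linear
# phase -3*w, so its group delay is 3 at every frequency.
delayed_impulse = np.zeros(8)
delayed_impulse[3] = 1.0
tau_delay = group_delay(delayed_impulse)

print(np.allclose(tau_cascade, tau_sum), np.allclose(tau_delay, 3.0))
```

Because magnitudes multiply while group delays add, components that merge in the magnitude spectrum can remain separable in the group delay domain, which is consistent with the high resolution property the abstract refers to.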

