Results 1-6 of 6
An overview of text-independent speaker recognition: from features to supervectors
2009
Cited by 31 (14 self)
This paper gives an overview of automatic speaker recognition technology, with an emphasis on text-independent recognition. Speaker recognition has been studied actively for several decades. We give an overview of both the classical and the state-of-the-art methods. We start with the fundamentals of automatic speaker recognition, concerning feature extraction and speaker modeling. We elaborate on advanced computational techniques that address robustness and session variability. The recent progress from vectors towards supervectors opens up a new area of exploration and represents a technology trend. We also provide an overview of this recent development and discuss the evaluation methodology of speaker recognition systems. We conclude the paper with a discussion of future directions.
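To make the move from vectors to supervectors concrete, here is a minimal Python sketch of one common construction, assumed here rather than taken from the paper: the means of a diagonal-covariance GMM background model (UBM) are MAP-adapted to a single utterance and stacked into one long vector. The relevance factor of 16 and all function names are illustrative assumptions.

```python
import math

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at frame x."""
    return -0.5 * sum(
        math.log(2 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )

def posteriors(x, weights, means, variances):
    """Per-component posteriors for one frame, computed with log-sum-exp."""
    logs = [math.log(w) + log_gauss_diag(x, m, v)
            for w, m, v in zip(weights, means, variances)]
    top = max(logs)
    exps = [math.exp(l - top) for l in logs]
    total = sum(exps)
    return [e / total for e in exps]

def map_mean_supervector(frames, weights, means, variances, relevance=16.0):
    """MAP-adapt UBM means to one utterance and stack them (relevance value is an assumption)."""
    K, D = len(means), len(means[0])
    n = [0.0] * K                      # zeroth-order statistics per component
    f = [[0.0] * D for _ in range(K)]  # first-order statistics per component
    for x in frames:
        for k, g in enumerate(posteriors(x, weights, means, variances)):
            n[k] += g
            for d in range(D):
                f[k][d] += g * x[d]
    supervector = []
    for k in range(K):
        alpha = n[k] / (n[k] + relevance)  # data-dependent adaptation coefficient
        for d in range(D):
            data_mean = f[k][d] / n[k] if n[k] > 0 else means[k][d]
            supervector.append(alpha * data_mean + (1 - alpha) * means[k][d])
    return supervector
```

Every utterance maps to a vector of the same length K·D regardless of its duration, which is what lets fixed-length classifiers such as SVMs operate on supervectors.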
Blind stochastic feature transformation for channel robust speaker verification
J. of VLSI Signal Processing, 2006
Cited by 3 (3 self)
To improve the reliability of telephone-based speaker verification systems, channel compensation is indispensable. However, it is also important to ensure that the channel compensation algorithms in these systems suppress channel variations and enhance interspeaker distinction. This paper addresses this problem with a blind feature-based transformation approach in which the transformation parameters are determined online without any a priori knowledge of channel characteristics. Specifically, a composite statistical model formed by fusing a speaker model and a background model is used to represent the characteristics of enrollment speech. Based on the difference between the claimant’s speech and the composite model, a stochastic-matching approach is proposed to transform the claimant’s speech to a region close to the enrollment speech. As a result, the algorithm can estimate the transformation online without needing to detect the handset type. Experimental results on the 2001 NIST evaluation set show that the proposed transformation approach achieves significant improvements in both equal error rate and minimum detection cost compared with cepstral mean subtraction, Z-norm, and short-time Gaussianization.
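The stochastic-matching idea can be illustrated with a deliberately simplified, bias-only variant: an EM loop that estimates a per-dimension cepstral bias b so that y - b fits a composite GMM built by pooling the speaker and background components. This is a sketch under assumptions, not the paper's algorithm: the paper's transformation may have more parameters, and the pooling weight `mix`, the iteration count, and all function names are hypothetical.

```python
import math

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at frame x."""
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))

def posteriors(x, gmm):
    """Component posteriors for one frame under gmm = (weights, means, variances)."""
    weights, means, variances = gmm
    logs = [math.log(w) + log_gauss_diag(x, m, v)
            for w, m, v in zip(weights, means, variances)]
    top = max(logs)
    exps = [math.exp(l - top) for l in logs]
    total = sum(exps)
    return [e / total for e in exps]

def composite_gmm(speaker, background, mix=0.5):
    """Pool speaker and background components; `mix` is a hypothetical fusion weight."""
    ws, ms, vs = speaker
    wb, mb, vb = background
    weights = [mix * w for w in ws] + [(1.0 - mix) * w for w in wb]
    return weights, ms + mb, vs + vb

def estimate_bias(frames, gmm, iterations=10):
    """EM estimate of a per-dimension bias b maximising the likelihood of y - b."""
    _, means, variances = gmm
    dim = len(frames[0])
    bias = [0.0] * dim
    for _ in range(iterations):
        num = [0.0] * dim
        den = [0.0] * dim
        for y in frames:
            # E-step: posteriors are computed on the currently compensated frame.
            shifted = [yd - bd for yd, bd in zip(y, bias)]
            for k, g in enumerate(posteriors(shifted, gmm)):
                for d in range(dim):
                    num[d] += g * (y[d] - means[k][d]) / variances[k][d]
                    den[d] += g / variances[k][d]
        # M-step: precision-weighted average of the frame-to-mean offsets.
        bias = [n / q for n, q in zip(num, den)]
    return bias

def compensate(frames, bias):
    """Apply the estimated transformation x = y - b to every frame."""
    return [[yd - bd for yd, bd in zip(y, bias)] for y in frames]
```

With a single zero-mean Gaussian this update reduces to the frame average, i.e. cepstral mean subtraction; the GMM posteriors are what let the estimate take into account where each frame lies in the acoustic space.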
A New Approach to Channel Robust Speaker Verification via Constrained Stochastic Feature Transformation
In Proc. ICSLP’04
Cited by 1 (1 self)
This paper proposes a constrained stochastic feature transformation algorithm for robust speaker verification. The algorithm computes the feature transformation parameters based on the statistical difference between a test utterance and a composite GMM formed by combining the speaker and background models. The test utterance is then transformed to fit the clean speaker model and background model before verification. By implicitly constraining the transformation, the transformed features can fit both models simultaneously. Experimental results on the 2001 NIST evaluation set show that the proposed algorithm achieves significant improvements in both equal error rate and minimum detection cost compared with cepstral mean subtraction and Z-norm. The proposed transformation also performs slightly better than the short-time Gaussianization method proposed in [1].
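For reference, the two baselines named in this abstract can be sketched in a few lines of Python. This assumes cepstral frames stored as lists of floats and, for Z-norm, a set of impostor scores already computed offline against the target speaker model; the function names are illustrative.

```python
import statistics

def cepstral_mean_subtraction(frames):
    """Subtract the per-utterance mean from every cepstral dimension."""
    dim = len(frames[0])
    means = [statistics.fmean(f[d] for f in frames) for d in range(dim)]
    return [[f[d] - means[d] for d in range(dim)] for f in frames]

def znorm(raw_score, impostor_scores):
    """Normalise a claimant score with the impostor-score statistics of the target model."""
    mu = statistics.fmean(impostor_scores)
    sigma = statistics.stdev(impostor_scores)  # sample standard deviation
    return (raw_score - mu) / sigma
```

CMS acts on the features before scoring, whereas Z-norm acts on the scores afterwards, so the two can be combined.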
Extraction of Speaker Features from Different Stages of DSR Front-ends for Distributed Speaker Verification
2004
ETSI has recently published a front-end processing standard for distributed speech recognition (DSR) systems. The key idea of the standard is to extract the spectral features of speech signals at the front-end terminals so that acoustic distortion caused by communication channels can be avoided. This paper investigates the effect of extracting spectral features from different stages of the front-end processing on the performance of distributed speaker verification systems. A technique that combines handset selectors with stochastic feature transformation is also employed in a back-end speaker verification system to reduce the acoustic mismatch between different handsets. Because the feature vectors obtained at the back-end server are vector quantized, the paper proposes two approaches to adding Gaussian noise to the quantized feature vectors for training the Gaussian mixture speaker models. In one approach, the variances of the Gaussian noise depend on the codeword distance. In the other, the variances are a function of the distance between some unquantized training vectors and their closest code vector. The HTIMIT corpus was ...

Correspondence should be sent to M.W. Mak, Dept. of Electronic and Information Engineering, The Hong Kong Polytechnic University, Hong Kong. Email: enmwmak@polyu.edu.hk. Tel: (852)27666257. Fax: (852)23628439.
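The first noise-addition scheme (variance tied to codeword distance) might look like the following sketch. The specific rule used here, a noise variance equal to a hypothetical `scale` times the squared distance from the code vector to its nearest neighbour in the codebook, is an assumption for illustration; the abstract does not give the paper's formula.

```python
import math
import random

def nearest_codeword_distance(codebook):
    """Euclidean distance from each code vector to its closest neighbour in the codebook."""
    return [
        min(math.dist(c, o) for j, o in enumerate(codebook) if j != i)
        for i, c in enumerate(codebook)
    ]

def add_quantization_noise(indices, codebook, scale=0.1, copies=1, seed=0):
    """Dither quantized vectors; variance = scale * spacing**2 (hypothetical rule)."""
    rng = random.Random(seed)
    spacing = nearest_codeword_distance(codebook)
    noisy = []
    for idx in indices:
        std = math.sqrt(scale) * spacing[idx]  # wider codeword spacing -> more noise
        for _ in range(copies):
            noisy.append([v + rng.gauss(0.0, std) for v in codebook[idx]])
    return noisy
```

The dithered vectors would then replace the raw quantized vectors when training the Gaussian mixture speaker models, so that the models do not collapse onto the discrete codewords.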
Cluster-Dependent Feature Transformation with Divergence-Based Out-of-Handset Rejection for . . .
2003
This paper proposes a divergence-based cluster selector with out-of-handset (OOH) rejection capability to identify ‘unseen’ handsets. This is achieved by measuring the Jensen difference between the selector’s output and a constant vector with identical elements. The resulting cluster selector is combined with a feature-based channel compensation algorithm for telephone-based speaker verification. Utterances whose handsets are identified as ‘unseen’ are normalized by cepstral mean subtraction (CMS). Otherwise, if the handset can be identified (considered ‘seen’), a corresponding set of cluster-dependent transformation parameters is used to transform the utterances. Experiments on ten handsets of the HTIMIT corpus show that the best result is achieved by transforming utterances from correctly identified handsets with the cluster-dependent parameters and processing utterances from ‘unseen’ handsets with CMS.
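The out-of-handset test can be sketched as follows, assuming the selector output is a probability vector over handset clusters and using the Jensen difference H((p+q)/2) - (H(p)+H(q))/2 against the uniform vector. An output close to uniform means no cluster stands out, so the handset is treated as unseen. The threshold and function names are hypothetical; a real threshold would be tuned on development data.

```python
import math

def entropy(p):
    """Shannon entropy (natural log) of a probability vector."""
    return -sum(x * math.log(x) for x in p if x > 0)

def jensen_difference(p, q):
    """Entropy of the average minus the average entropy."""
    m = [(a + b) / 2 for a, b in zip(p, q)]
    return entropy(m) - (entropy(p) + entropy(q)) / 2

def select_handset(selector_output, threshold):
    """Return the winning cluster index, or None if the handset looks 'unseen'."""
    n = len(selector_output)
    uniform = [1.0 / n] * n
    if jensen_difference(selector_output, uniform) < threshold:
        return None  # too close to uniform: fall back to CMS
    return max(range(n), key=lambda k: selector_output[k])
```

A `None` result would route the utterance to CMS, while an index would pick that cluster's transformation parameters.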

