An overview of text-independent speaker recognition: from features to supervectors
, 2009
Cited by 156 (37 self)
This paper gives an overview of automatic speaker recognition technology, with an emphasis on text-independent recognition. Speaker recognition has been studied actively for several decades. We give an overview of both the classical and the state-of-the-art methods. We start with the fundamentals of automatic speaker recognition, concerning feature extraction and speaker modeling. We elaborate advanced computational techniques to address robustness and session variability. The recent progress from vectors towards supervectors opens up a new area of exploration and represents a technology trend. We also provide an overview of this recent development and discuss the evaluation methodology of speaker recognition systems. We conclude the paper with discussion on future directions.
Particle swarm optimization for sorted adapted Gaussian mixture models
- IEEE Trans. Audio, Speech, Lang. Process
, 2009
Cited by 11 (6 self)
Abstract. Recently, we introduced the sorted Gaussian mixture models (SGMMs) algorithm, which provides the means to trade off performance for operational speed and thus permits the speed-up of GMM-based classification schemes. The performance of the SGMM algorithm depends on the proper choice of the sorting function and the proper adjustment of its parameters. In the present work, we employ particle swarm optimization (PSO) and an appropriate fitness function to find the most advantageous parameters of the sorting function. We evaluate the practical significance of our approach on the text-independent speaker verification task utilizing the NIST 2002 speaker recognition evaluation (SRE) database while following the NIST SRE experimental protocol. The experimental results demonstrate a superior performance of the SGMM algorithm using PSO when compared to the original SGMM. For comprehensiveness, we also compared these results with those from a baseline Gaussian mixture model–universal background model (GMM-UBM) system. The experimental results suggest that the performance loss due to speed-up is partially mitigated using PSO-derived weights in a sorted GMM-based scheme. Index Terms: Gaussian mixture model–universal background model (GMM-UBM), particle swarm optimization (PSO), sorted GMM, speed-up, text-independent speaker verification.
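As a rough illustration of the optimization step described in this abstract, the following is a minimal sketch of a basic global-best PSO loop. It is not the authors' implementation: the sorting function and fitness function of the SGMM paper are not given in this listing, so `toy_fitness` is a hypothetical stand-in, and the inertia and acceleration constants are common textbook choices assumed here for illustration.

```python
import random


def pso_minimize(fitness, dim, n_particles=20, n_iters=100,
                 bounds=(-5.0, 5.0), inertia=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize `fitness` over a box with a basic global-best PSO."""
    rng = random.Random(seed)
    lo, hi = bounds
    # Random initial positions, zero initial velocities.
    pos = [[rng.uniform(lo, hi) for _ in range(dim)]
           for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    # Personal bests start at the initial positions.
    pbest = [p[:] for p in pos]
    pbest_val = [fitness(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity: inertia + pull to personal best + pull to global best.
                vel[i][d] = (inertia * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Move and clamp the position to the search box.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = fitness(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def toy_fitness(w):
    # Hypothetical stand-in fitness: squared distance to an arbitrary target.
    target = [1.0, -2.0, 0.5]
    return sum((wi - ti) ** 2 for wi, ti in zip(w, target))


best_w, best_val = pso_minimize(toy_fitness, dim=3)
```

In a real system the fitness would presumably be derived from verification performance on development data; that part is left out here because the listing does not describe it.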
Comparison of Clustering Methods: a Case Study of Text-Independent Speaker Modeling
Clustering is needed in various applications such as biometric person authentication, speech coding and recognition, image compression and information retrieval. Hundreds of clustering methods have been proposed for the task in various fields but, surprisingly, there are few extensive studies actually comparing them. An important question is how much the choice of a clustering method matters for the final pattern recognition application. Our goal is to provide a thorough experimental comparison of clustering methods for text-independent speaker verification. We consider parametric Gaussian mixture model (GMM) and non-parametric vector quantization (VQ) model using the best known clustering algorithms including iterative (K-means, random swap, expectation-maximization), hierarchical (pairwise nearest neighbor, split, split-and-merge), evolutionary (genetic algorithm), neural (self-organizing map) and fuzzy (fuzzy C-means) approaches. We study recognition accuracy, processing time, clustering validity, and correlation of clustering quality and recognition accuracy. Experiments from these complementary observations indicate clustering is not a critical task in speaker recognition and the choice of the algorithm should be based on computational complexity and simplicity of the implementation. This is mainly because of three reasons: the data is not clustered, large models are used and only the best algorithms are considered. For low-order models, choice of the algorithm, however, can have a significant effect. Index Terms: Clustering methods, speaker recognition, vector quantization, Gaussian mixture model, universal background model. List of abbreviations: ANN, artificial neural network; DET, detection error trade-off.
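To make the VQ side of this comparison concrete, here is a minimal sketch of K-means codebook training and scoring by average quantization distortion. It is an illustration under stated assumptions, not the paper's experimental setup: the 2-D synthetic "features" produced by the hypothetical helper `toy_features` stand in for real spectral features, and the codebook size of 2 is chosen only to keep the example small.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, n_iters=20, seed=0):
    """Train a VQ codebook of k centroids with plain K-means."""
    rng = random.Random(seed)
    codebook = [v[:] for v in rng.sample(vectors, k)]
    for _ in range(n_iters):
        # Assignment step: map each vector to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for v in vectors:
            j = min(range(k), key=lambda c: sq_dist(v, codebook[c]))
            clusters[j].append(v)
        # Update step: move each centroid to the mean of its members.
        for j, members in enumerate(clusters):
            if members:  # an empty cluster keeps its old centroid
                dim = len(members[0])
                codebook[j] = [sum(m[d] for m in members) / len(members)
                               for d in range(dim)]
    return codebook


def avg_distortion(vectors, codebook):
    """Mean squared distance to the nearest centroid (lower = better match)."""
    return sum(min(sq_dist(v, c) for c in codebook)
               for v in vectors) / len(vectors)


def toy_features(centres, n, rng):
    # Hypothetical synthetic data: n noisy points around each centre.
    return [[rng.gauss(cx, 0.3), rng.gauss(cy, 0.3)]
            for _ in range(n) for cx, cy in centres]


rng = random.Random(1)
train_a = toy_features([(0, 0), (2, 2)], 50, rng)    # "speaker A" training data
test_a = toy_features([(0, 0), (2, 2)], 20, rng)     # same speaker, new data
test_b = toy_features([(5, -3), (-4, 4)], 20, rng)   # a different "speaker"

codebook_a = kmeans(train_a, k=2)
score_a = avg_distortion(test_a, codebook_a)
score_b = avg_distortion(test_b, codebook_a)
```

With data this well separated, the matching speaker yields the lower distortion; real feature distributions overlap far more, which is consistent with the abstract's remark that the data is not clustered.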
Clustering methods Speaker recognition Vector quantization
Gaussian mixture model Universal background model [...] methods for text-independent speaker verification. We consider parametric Gaussian mixture model [...] features (Huang et al., 2001) are sensitive to noise and channel recognition; see Ramachandran et al. (2002), Kinnunen and Li (2010) for an overview. Speaker models can be divided into generative and discriminative models. Generative models characterize the distribution of the feature vectors within the classes (speakers), whereas discriminative modeling focuses on modeling the decision boundary between the classes. For generative modeling, vector
A Gaussian Selection Method for Speaker Verification with Short Utterances
Abstract. Speaker recognition systems frequently use the GMM-MAP method for modeling speakers. This method represents the speaker using a Gaussian mixture. However, not all Gaussian components in this mixture are truly representative of the speaker. In order to remove the model redundancy, this work proposes a Gaussian selection method to obtain a new GMM containing only the more representative Gaussian components. The results of speaker verification experiments applying the proposal show a similar performance to the baseline; however, the speaker models used have a reduction of 80% compared to the speaker model used as the baseline. Our proposal was also applied to a speaker recognition system with short test signals of 15, 5 and 3 seconds, obtaining an improvement in EER of 0.43%, 2.64% and 1.60%, respectively, compared to the baseline. The application of this method in real or embedded speaker verification systems could be very useful for reducing computational and memory costs.
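The improvements above are reported as equal error rate (EER), the operating point where the false acceptance and false rejection rates coincide. The following is a minimal sketch of how an EER can be approximated from two lists of verification scores. The function name and the toy scores are hypothetical, and this is not the evaluation tooling used in the paper; it sweeps only the observed scores as thresholds, without interpolation.

```python
def equal_error_rate(target_scores, impostor_scores):
    """Approximate the EER by sweeping thresholds over the observed scores.

    A trial is accepted when its score is >= the threshold.
    """
    best_gap, eer = float("inf"), None
    for t in sorted(set(target_scores) | set(impostor_scores)):
        # False rejection rate: genuine trials scored below the threshold.
        frr = sum(s < t for s in target_scores) / len(target_scores)
        # False acceptance rate: impostor trials at or above the threshold.
        far = sum(s >= t for s in impostor_scores) / len(impostor_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            # Keep the threshold where the two error rates are closest.
            best_gap, eer = gap, (far + frr) / 2
    return eer


# Toy scores (hypothetical): one impostor outscores one genuine trial.
targets = [0.9, 0.8, 0.7, 0.4]
impostors = [0.1, 0.2, 0.3, 0.5]
eer = equal_error_rate(targets, impostors)  # 0.25 for these toy scores
```

At a threshold of 0.5, one of four genuine trials is rejected and one of four impostor trials is accepted, so both error rates equal 0.25.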