Support vector machines using GMM supervectors for speaker verification
- IEEE Signal Processing Letters, 2006
Cited by 58 (1 self)
pretations, conclusions, and recommendations are those of the authors and are not necessarily endorsed by the United States
SVM based speaker verification using a GMM supervector kernel and NAP variability compensation
- in Proceedings of ICASSP, 2006
Cited by 53 (3 self)
Gaussian mixture models with universal backgrounds (UBMs) have become the standard method for speaker recognition. Typically, a speaker model is constructed by MAP adaptation of the means of the UBM. A GMM supervector is constructed by stacking the means of the adapted mixture components. A recent discovery is that latent factor analysis of this GMM supervector is an effective method for variability compensation. We consider this GMM supervector in the context of support vector machines. We construct a support vector machine kernel using the GMM supervector. We show similarities based on this kernel between the method of SVM nuisance attribute projection (NAP) and the recent results in latent factor analysis. Experiments on a NIST SRE 2005 corpus demonstrate the effectiveness of the new technique.
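The supervector construction described in this abstract (MAP adaptation of the UBM means, then stacking the adapted means) can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a diagonal-covariance UBM, the common relevance-factor form of MAP mean adaptation, and hypothetical function names (`posteriors`, `map_adapt_means`, `supervector`). The relevance value of 16 is likewise an assumption.

```python
import numpy as np

def posteriors(X, weights, means, variances):
    """Per-frame component responsibilities of a diagonal-covariance GMM, shape (T, C)."""
    diff = X[:, None, :] - means[None, :, :]                      # (T, C, D)
    log_prob = -0.5 * (np.sum(diff ** 2 / variances[None], axis=2)
                       + np.sum(np.log(2 * np.pi * variances), axis=1)[None])
    log_w = np.log(weights)[None] + log_prob                      # (T, C)
    log_w -= log_w.max(axis=1, keepdims=True)                     # numerical stability
    post = np.exp(log_w)
    return post / post.sum(axis=1, keepdims=True)

def map_adapt_means(X, weights, means, variances, relevance=16.0):
    """Relevance-MAP adaptation of the UBM means only (weights and variances stay fixed)."""
    post = posteriors(X, weights, means, variances)
    n = post.sum(axis=0)                                          # soft counts, (C,)
    first_order = post.T @ X                                      # (C, D)
    data_mean = first_order / np.maximum(n, 1e-10)[:, None]
    alpha = (n / (n + relevance))[:, None]                        # data vs. prior weight
    return alpha * data_mean + (1.0 - alpha) * means

def supervector(adapted_means):
    """Stack the C adapted mean vectors into one vector of length C * D."""
    return adapted_means.reshape(-1)
```

A supervector for one utterance would then be `supervector(map_adapt_means(X, weights, means, variances))`, where `X` holds the utterance's feature frames. Components that see little data stay close to the UBM mean, which is the usual behaviour of this adaptation rule.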
Factor analysis simplified
- in ICASSP, 2005
Cited by 20 (1 self)
We show how the factor analysis model for speaker verification can be successfully implemented using some fast approximations which result in minor degradations in accuracy and open up the possibility of training the model on very large databases such as the union of all of the Switchboard corpora. We tested our algorithms on the NIST 1999 evaluation set (carbon data as well as electret). Using warped cepstral features we obtained equal error rates of about 6.3% and minimum detection costs of about 0.022.
Experiments in session variability modelling for speaker verification
- in Proc. ICASSP, 2006
Cited by 11 (7 self)
Presented is an approach to modelling session variability for GMM-based text-independent speaker verification incorporating a constrained session variability component in both the training and testing procedures. The proposed technique reduces the data labelling requirements, removes the discrete categorisation needed by previous techniques, and provides superior performance. Experiments on Mixer conversational telephony data show improvements of as much as 46% in equal error rate over a baseline system. In this paper the algorithm used for the enrollment procedure is described in detail. Results are also presented investigating the response of the technique to short test utterances and varying session subspace dimension.
On the use of factor analysis with restricted target data in speaker verification
- In Odyssey 2010, 2010
Cited by 1 (0 self)
Factor Analysis (FA) based techniques have become the state of the art in automatic speaker verification thanks to their great ability to model session variability. This ability, in turn, relies on accurately estimating a session variability subspace for the operating conditions of interest. In cases such as forensic speaker recognition, however, this requirement cannot always be satisfied due to the very limited quantity of appropriate development data. As a first step toward understanding the application of FA in these restricted data scenarios, this work analyzes the performance of FA with very limited development data and then explores several FA estimation methods that augment the target domain data with examples from a data-rich domain. Experiments on NIST SRE 2006 microphone data conditions demonstrate that telephone data can be effectively exploited to improve performance over a baseline system.
SVM Speaker Verification using Session Variability Modelling and GMM Supervectors
Cited by 1 (1 self)
This paper demonstrates that modelling session variability during GMM training can improve the performance of a GMM supervector SVM speaker verification system. Recently, a method of modelling session variability in GMM-UBM systems has led to significant improvements when the training and testing conditions are subject to session effects. In this work, session variability modelling is applied during the extraction of GMM supervectors prior to SVM speaker model training and classification. Experiments performed on the NIST 2005 corpus show major improvements over the baseline GMM supervector SVM system.
INTERSESSION VARIABILITY COMPENSATION FOR LANGUAGE DETECTION
Gaussian mixture models (GMM) have become one of the standard acoustic approaches for Language Detection. These models are typically incorporated to produce a log-likelihood ratio (LLR) verification statistic. In this framework, the intersession variability within each language becomes an adverse factor degrading the accuracy. To address this problem, we formulate the LLR as a function of the GMM parameters concatenated into normalized mean supervectors, and estimate the distribution of each language in this (high dimensional) supervector space. The goal is to de-emphasize the directions with the largest intersession variability. We compare this method with two other popular intersession variability compensation methods known as Nuisance Attribute Projection (NAP) and Within-Class Covariance Normalization (WCCN). Experiments on the NIST LRE 2003 and NIST LRE 2005 speech corpora show that the presented technique reduces the error by 50% relative to the baseline, and performs competitively with the NAP and WCCN approaches. Fusion results with a phonotactic component are also presented. Index Terms: WCCN-LLR, NAP, ISV
unknown title
This paper examines combining both relevance MAP and subspace speaker adaptation processes to train GMM speaker models for use in speaker verification systems with a particular focus on short utterance lengths. The subspace speaker adaptation method involves developing a speaker GMM mean supervector as the sum of a speaker-independent prior distribution and a speaker-dependent offset constrained to lie within a low-rank subspace, and has been shown to provide improvements in accuracy over ordinary relevance MAP when the amount of training data is limited. It is shown through testing on NIST SRE data that combining the two processes provides speaker models which lead to modest improvements in verification accuracy for limited data situations, in addition to improving the performance of the speaker verification system when a larger amount of training data is available. Index Terms: speaker verification, factor analysis, probabilistic PCA.
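As a rough illustration of the subspace model this abstract describes, the speaker mean supervector can be written in the form below. The symbols are assumptions chosen for illustration and are not taken from the paper: s is the speaker supervector, m the speaker-independent term, V a low-rank basis, and y the low-dimensional speaker coordinates.

```latex
\mathbf{s} = \mathbf{m} + \mathbf{V}\,\mathbf{y},
\qquad \mathbf{V} \in \mathbb{R}^{CD \times R}, \quad R \ll CD
```

Here C is the number of mixture components and D the feature dimension, so the offset V y is confined to an R-dimensional subspace of the full supervector space.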
ADVANCES IN CHANNEL COMPENSATION FOR SVM SPEAKER RECOGNITION
Cross-channel degradation is one of the significant challenges facing speaker recognition systems. We study the problem for speaker recognition using support vector machines (SVMs). We perform channel compensation in SVM modeling by removing non-speaker nuisance dimensions in the SVM expansion space via projections. Training to remove these dimensions is accomplished via an eigenvalue problem. The eigenvalue problem attempts to reduce multisession variation for the same speaker, reduce different channel effects, and increase “distance” between different speakers. We apply our methods to a subset of the Switchboard 2 corpus. Experiments show dramatic improvement in performance for the cross-channel case.
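The projection idea in this abstract can be sketched as below. This is a simplified illustration, not the paper's formulation: it assumes the nuisance directions are the top eigenvectors of the within-speaker scatter only, whereas the abstract describes an eigenvalue problem that also accounts for channel effects and between-speaker distance. The function names `nap_basis` and `nap_project` are hypothetical.

```python
import numpy as np

def nap_basis(vectors, speaker_ids, k):
    """Top-k eigenvectors of the within-speaker scatter (assumed nuisance directions)."""
    vectors = np.asarray(vectors, dtype=float)
    speaker_ids = np.asarray(speaker_ids)
    centered = np.empty_like(vectors)
    for spk in np.unique(speaker_ids):
        mask = speaker_ids == spk
        # Removing each speaker's mean leaves only session-to-session variation.
        centered[mask] = vectors[mask] - vectors[mask].mean(axis=0)
    scatter = centered.T @ centered                 # (D, D), symmetric
    _, eigvecs = np.linalg.eigh(scatter)            # eigenvalues in ascending order
    return eigvecs[:, -k:]                          # (D, k), orthonormal columns

def nap_project(v, U):
    """Apply P = I - U U^T without forming the D x D matrix."""
    return v - (v @ U) @ U.T
```

Projected vectors would then be passed to SVM training and scoring in place of the originals. Computing `v - (v @ U) @ U.T` rather than building the full projection matrix keeps the cost linear in the expansion-space dimension, which matters when that dimension is large.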
MIT CSAIL,
Research in the speaker recognition community has continued to address methods of mitigating variational nuisances. Telephone and auxiliary-microphone recorded speech emphasize the need for a robust way of dealing with unwanted variation. The design of the recent 2010 NIST Speaker Recognition Evaluation (SRE) reflects this research emphasis. In this paper, we present the MIT submission applied to the tasks of the 2010 NIST SRE with two main goals: language-independent scalable modeling and robust nuisance mitigation. For modeling, exclusive use of inner product-based and cepstral systems produced a language-independent, computationally scalable system. For robustness, systems that captured spectral and prosodic information, modeled nuisance subspaces using multiple novel methods, and fused scores of multiple systems were implemented. The performance of the system is presented on a subset of the NIST SRE 2010 core tasks.

