Results 1-10 of 15
A GENERATIVE-DISCRIMINATIVE FRAMEWORK USING ENSEMBLE METHODS FOR TEXT-DEPENDENT SPEAKER VERIFICATION
Cited by 3 (0 self)
Speaker verification can be treated as a statistical hypothesis-testing problem. The most commonly used approach is the likelihood ratio test (LRT), which can be shown to be optimal using the Neyman-Pearson lemma. However, in most practical situations the Neyman-Pearson lemma does not apply, because the likelihoods under both hypotheses are not known exactly and must be estimated from data. In this paper, we present a more robust approach that uses a hybrid generative-discriminative framework for text-dependent speaker verification. Our algorithm uses generative models to learn the characteristics of a speaker and then discriminative models to discriminate between a speaker and an impostor. One advantage of the proposed algorithm is that it does not require retraining the generative model. On average, the proposed model yields a 36.41% relative improvement in equal error rate (EER) over an LRT.
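The following minimal Python sketch makes the LRT decision and the EER metric concrete. It assumes that per-trial log-likelihoods are already available from some speaker and background models. All function names are hypothetical. This is not the paper's hybrid framework: it only shows the baseline LRT decision and how an EER and a relative improvement, such as the 36.41% figure, are computed.

```python
import math

def llr(log_p_speaker: float, log_p_background: float) -> float:
    """Log-likelihood ratio for one trial: log p(X|H0) - log p(X|H1)."""
    return log_p_speaker - log_p_background

def lrt_accept(score: float, threshold: float) -> bool:
    """LRT decision: accept the identity claim when the LLR reaches the threshold."""
    return score >= threshold

def equal_error_rate(target_scores, impostor_scores) -> float:
    """Approximate the EER by sweeping the threshold over all observed scores.

    At each candidate threshold t (accept iff score >= t), compute the
    false-rejection rate (FRR) and false-acceptance rate (FAR), and return
    their average at the threshold where they are closest.
    """
    best_gap, best_eer = math.inf, 1.0
    for t in sorted(set(target_scores) | set(impostor_scores)):
        frr = sum(s < t for s in target_scores) / len(target_scores)
        far = sum(s >= t for s in impostor_scores) / len(impostor_scores)
        if abs(frr - far) < best_gap:
            best_gap, best_eer = abs(frr - far), (frr + far) / 2
    return best_eer

def relative_improvement(baseline_err: float, new_err: float) -> float:
    """Relative error reduction, e.g. 0.3641 means a 36.41% improvement."""
    return (baseline_err - new_err) / baseline_err
```

For example, a 36.41% relative improvement would bring a hypothetical baseline EER of 10% down to about 6.36%.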
Improving Speaker Verification With Figure Of Merit Training
In Proc. ICASSP, 2002
Cited by 2 (0 self)
This paper proposes Figure of Merit (FOM) training, a novel discriminative training method for Gaussian mixture models (GMMs) in text-independent speaker verification. FOM training adjusts the model parameters to maximize the FOM of the ROC curve, rather than merely approximating the underlying distribution of each speaker's acoustic observations, as maximum likelihood estimation does. Text-independent speaker verification experiments were conducted on the 1996 NIST Speaker Recognition Evaluation corpus. Compared with the standard EM training method, FOM training significantly improves performance: the detection cost function (DCF) was reduced from 0.0369 to 0.0286 in matched conditions and from 0.0826 to 0.0537 in mismatched conditions.
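A short sketch relates the DCF figures above to the detection cost formula. The cost parameters in the code (C_miss = 10, C_fa = 1, P_target = 0.01) are an assumption: they are the values commonly associated with NIST SRE evaluations of that period, and the abstract does not state them. The last two lines only redo the arithmetic on the reported numbers.

```python
def detection_cost(p_miss, p_fa, c_miss=10.0, c_fa=1.0, p_target=0.01):
    """Detection cost C_det = C_miss*P_miss*P_target + C_fa*P_fa*(1 - P_target).

    The default costs and target prior are assumptions (values commonly
    associated with NIST SRE evaluations of that era), not paper settings.
    """
    return c_miss * p_miss * p_target + c_fa * p_fa * (1.0 - p_target)

def relative_reduction(before, after):
    """Fractional reduction from `before` to `after`."""
    return (before - after) / before

# Reported DCFs (EM baseline -> FOM training), taken from the abstract.
matched = relative_reduction(0.0369, 0.0286)     # about 0.225
mismatched = relative_reduction(0.0826, 0.0537)  # about 0.350
```

The reported drops therefore correspond to relative DCF reductions of roughly 22.5% in matched conditions and 35.0% in mismatched conditions.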
Improving GMM-UBM Speaker Verification Using Discriminative Feedback Adaptation
Cited by 1 (0 self)
The Gaussian Mixture Model-Universal Background Model (GMM-UBM) system is one of the predominant approaches for text-independent speaker verification, because both the target speaker model and the impostor model (UBM) have the generalization ability to handle “unseen” acoustic patterns. However, since GMM-UBM uses a common anti-model, namely the UBM, for all target speakers, it tends to be weak at rejecting impostors’ voices that are similar to the target speaker’s voice. To overcome this limitation, we propose a discriminative feedback adaptation (DFA) framework that reinforces the discriminability between the target speaker model and the anti-model while preserving the generalization ability of the GMM-UBM approach. This is achieved by adapting the UBM to a target-speaker-dependent anti-model based on a minimum verification squared-error criterion, rather than estimating the model from scratch with conventional discriminative training schemes. The results of experiments conducted on the NIST 2001 SRE database show that DFA substantially improves the performance of the conventional GMM-UBM approach.
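The conventional GMM-UBM scoring that DFA builds on can be sketched as follows. Each test frame is scored against the target GMM and the shared UBM, and the per-frame log-likelihood ratios are averaged. This is a minimal pure-Python sketch with diagonal covariances and hypothetical function names. It does not implement DFA itself, which would replace the common `ubm` argument with a target-speaker-dependent anti-model.

```python
import math

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at vector x."""
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))

def log_gmm(x, weights, means, variances):
    """log sum_k w_k N(x; mu_k, diag(var_k)), computed with log-sum-exp."""
    terms = [math.log(w) + log_gauss_diag(x, m, v)
             for w, m, v in zip(weights, means, variances)]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))

def gmm_ubm_score(frames, target, ubm):
    """Average per-frame LLR: log p(x|target) - log p(x|UBM).

    `target` and `ubm` are hypothetical (weights, means, variances) tuples.
    In plain GMM-UBM the same `ubm` anti-model is shared by every speaker.
    """
    total = sum(log_gmm(x, *target) - log_gmm(x, *ubm) for x in frames)
    return total / len(frames)
```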
DISCRIMINATIVE FEEDBACK ADAPTATION FOR GMM-UBM SPEAKER VERIFICATION
The GMM-UBM system is the current state-of-the-art approach for text-independent speaker verification. Its advantage is that both the target speaker model and the impostor model (UBM) have the generalization ability to handle “unseen” acoustic patterns. However, since GMM-UBM uses a common anti-model, namely the UBM, for all target speakers, it tends to be weak at rejecting impostors’ voices that are similar to the target speaker’s voice. To overcome this limitation, we propose a discriminative feedback adaptation (DFA) framework that reinforces the discriminability between the target speaker model and the anti-model while preserving the generalization ability of the GMM-UBM approach. This is done by adapting the UBM to a target-speaker-dependent anti-model based on a minimum verification squared-error criterion, rather than estimating it from scratch with conventional discriminative training schemes. The results of experiments conducted on the NIST 2001 SRE database show that DFA substantially improves the performance of the conventional GMM-UBM approach.
Index Terms: discriminative feedback adaptation, log-likelihood ratio, minimum verification squared-error linear regression, speaker verification.
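DFA's regression-based adaptation is not reproduced here. For contrast, the sketch below shows the relevance-MAP mean adaptation that conventional GMM-UBM systems typically use to derive a target model from the UBM. The relevance factor of 16 is an assumed, commonly used default, and all names are hypothetical.

```python
import math

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at vector x."""
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))

def map_adapt_means(frames, weights, means, variances, relevance=16.0):
    """Relevance-MAP adaptation of GMM means; weights and variances are kept.

    Per component k, with posteriors gamma_k(t):
      n_k   = sum_t gamma_k(t)           (soft count)
      F_k   = sum_t gamma_k(t) * x_t     (first-order statistic)
      mu_k' = alpha_k * F_k / n_k + (1 - alpha_k) * mu_k,
      alpha_k = n_k / (n_k + r),
    which simplifies to (F_k + r * mu_k) / (n_k + r), avoiding division by n_k.
    The default relevance factor r = 16 is an assumption.
    """
    K, D = len(weights), len(means[0])
    n = [0.0] * K
    F = [[0.0] * D for _ in range(K)]
    for x in frames:
        # Component posteriors via a numerically stable softmax.
        logs = [math.log(w) + log_gauss_diag(x, m, v)
                for w, m, v in zip(weights, means, variances)]
        top = max(logs)
        probs = [math.exp(lp - top) for lp in logs]
        z = sum(probs)
        for k in range(K):
            g = probs[k] / z
            n[k] += g
            for d in range(D):
                F[k][d] += g * x[d]
    return [[(F[k][d] + relevance * means[k][d]) / (n[k] + relevance)
             for d in range(D)] for k in range(K)]
```

Components that see little adaptation data stay close to the UBM. This is the generalization property the abstract refers to.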
Speaker Verification Using Coded Speech
The implementation of a pseudo text-independent speaker verification system is described. This system was designed to use only information extracted directly from the coded parameters embedded in the ITU-T G.729 bitstream. Experiments were performed on the YOHO database [1]. The feature vector, a short-time representation of speech, consists of 16 LPC-cepstral coefficients, with residual information appended in the form of a pitch estimate and a measure of the vocality of the speech. The robustness of verification accuracy is also studied. The results show that speech coders, G.729 in particular, introduce coding distortions that degrade verification performance. Nevertheless, proper use of additional, unconventional information yields performance on par with that of a well-studied traditional system that does not involve signal coding and transmission. This suggests that speaker verification over a cell phone connection remains feasible even though the signal has been encoded at 8 kbit/s.
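The features are 16 LPC-cepstral coefficients derived from coded parameters, so a sketch of the standard LPC-to-cepstrum recursion may help. The sketch makes three assumptions:

- The predictor coefficients have already been recovered from the G.729 bitstream. G.729 transmits them as line spectral pairs, and that conversion is not shown.
- It uses the all-pole convention H(z) = G / (1 - Σ a_k z^-k).
- The gain term c_0 is omitted.

The function name is hypothetical.

```python
def lpc_to_cepstrum(a, n_ceps=16):
    """Convert LPC predictor coefficients to cepstral coefficients c_1..c_n_ceps.

    `a` holds a_1..a_p (as a[0]..a[p-1]) of the assumed all-pole model
    H(z) = G / (1 - sum_k a_k z^-k). Recursion:
      c_n = a_n + sum_{k=1}^{n-1} (k/n) c_k a_{n-k}    for n <= p
      c_n =       sum_{k=n-p}^{n-1} (k/n) c_k a_{n-k}  for n >  p
    The gain term c_0 = log G is omitted.
    """
    p = len(a)
    c = [0.0] * (n_ceps + 1)  # c[0] unused
    for n in range(1, n_ceps + 1):
        acc = a[n - 1] if n <= p else 0.0
        for k in range(max(1, n - p), n):
            acc += (k / n) * c[k] * a[n - k - 1]
        c[n] = acc
    return c[1:]
```

With a 10th-order predictor, the recursion still produces all 16 coefficients. The entries beyond order 10 come entirely from the second branch.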
Unknown title, 2012
This article appeared in a journal published by Elsevier.
EARLY RESULTS FOR A NONPARAMETRIC HIDDEN MARKOV MODEL FOR TALKER CHARACTERIZATION
COMPARISON OF DISCRIMINATIVE TRAINING METHODS FOR SPEAKER VERIFICATION
The maximum likelihood estimation (MLE) and Bayesian maximum a posteriori (MAP) adaptation methods for Gaussian mixture models (GMMs) have proven effective and efficient for speaker verification, even though each speaker model is trained using only that speaker's own training utterances. Discriminative criteria aim to increase discriminability by also using out-of-class data. In this paper, we compare the performance of three discriminative training criteria for speaker verification: maximum mutual information (MMI), minimum classification error (MCE), and figure of merit (FOM). Experiments on the 1996 NIST Speaker Recognition Evaluation data set show that FOM training outperforms the other two methods. In addition, logistic regression is investigated and successfully employed as a discriminative score-normalization technique.
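Logistic regression as a score-normalization step can be sketched as fitting a sigmoid that maps a raw verification score to a posterior probability of the target hypothesis. The abstract does not specify the features or optimizer. This minimal sketch therefore assumes a single raw score per trial and plain batch gradient descent. Names and hyperparameters are hypothetical.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(scores, labels, lr=0.1, epochs=2000):
    """Fit P(target | s) = sigmoid(w * s + b) by batch gradient descent.

    Minimizes mean cross-entropy; labels are 1 for target trials and
    0 for impostor trials. lr and epochs are assumed, untuned values.
    """
    w, b = 0.0, 0.0
    n = len(scores)
    for _ in range(epochs):
        gw = gb = 0.0
        for s, y in zip(scores, labels):
            err = sigmoid(w * s + b) - y  # gradient of cross-entropy w.r.t. the logit
            gw += err * s
            gb += err
        w -= lr * gw / n
        b -= lr * gb / n
    return w, b

def normalize(score, w, b):
    """Map a raw score to a normalized posterior-like value in (0, 1)."""
    return sigmoid(w * score + b)
```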
PERFORMANCE OF DISCRIMINATIVELY TRAINED AUDITORY FEATURES ON
The design of acoustic models involves two main tasks, feature extraction and data modeling, and hidden Markov models (HMMs) are commonly used in contemporary automatic speech recognition. In the past, discriminative training has been applied successfully to refine HMM parameters that are initially trained by the EM algorithm. Recently, we applied discriminative training to the feature extraction process. We proposed a novel Discriminative Auditory Feature extraction method (DAF) in which the filters are discriminatively trained from data. In DAF, we make no assumptions about the functional form of the auditory filters except that they must be smooth and triangular-like. For the discriminative training itself, we also proposed an alternative approach to finding competing hypotheses, which we call N-nearest hypotheses (as opposed to the traditional N-best hypotheses). By applying these two new ideas together with the new robust auditory features proposed by Li et al. of Bell Labs, we reduce the overall word error rate (WER) by 30.27% over the ICSLP 2002 Aurora-2 baseline with multi-condition training. Similarly, we obtain a relative WER reduction of 38.42% over the ICSLP 2002 Aurora-3 baseline.
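As a reference point for DAF, the sketch below builds the fixed triangular filterbank used by conventional MFCC-style front ends. DAF instead learns the filter shapes from data, constraining them only to be smooth and triangular-like. The band edges are given directly in FFT-bin units; choosing a mel-spaced layout is an assumption left to the caller. The function names are hypothetical.

```python
import math

def triangular_filterbank(edges, n_bins):
    """Build len(edges) - 2 fixed triangular filters over FFT bins 0..n_bins-1.

    Filter m rises linearly from edges[m] to a peak of 1 at edges[m+1]
    and falls back to 0 at edges[m+2]; elsewhere it is 0.
    """
    bank = []
    for m in range(len(edges) - 2):
        lo, c, hi = edges[m], edges[m + 1], edges[m + 2]
        row = []
        for k in range(n_bins):
            rising = (k - lo) / (c - lo)
            falling = (hi - k) / (hi - c)
            row.append(max(0.0, min(rising, falling)))
        bank.append(row)
    return bank

def log_filterbank_energies(power_spectrum, bank, floor=1e-10):
    """Log band energies log(sum_k H_m(k) |X(k)|^2), floored to avoid log(0)."""
    return [math.log(max(floor, sum(h * p for h, p in zip(row, power_spectrum))))
            for row in bank]
```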
Determination of A Priori Decision Thresholds for Phrase-Prompted Speaker Verification
2000
Speaker verification systems are often compared based on an equal error rate (equal chance of false acceptance and false rejection) obtained by adjusting a decision threshold during verification. However, the threshold should be determined before verification, because the identity of a claimant is unknown in real-world situations. This paper presents a novel method to determine the decision thresholds of speaker verification systems using enrollment data only. In the method, a speaker model is trained to differentiate the voice of the corresponding speaker from that of a general population. This is accomplished by using the speaker's utterances and those of some other speakers (denoted as anti-speakers) as the training set. Then, an operating environment is simulated by presenting the utterances of some pseudo-impostors (none of whom is an anti-speaker) to the speaker model. The threshold is adjusted until the chance of falsely accepting a pseudo-impostor falls below an application d...
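The threshold-setting step can be sketched as choosing the smallest threshold whose false-acceptance rate on pseudo-impostor scores does not exceed a target. The abstract is truncated before it names the exact criterion. Two details are therefore assumptions: `max_false_accept` is a hypothetical parameter standing in for that target, and the accept rule (score strictly above the threshold) is assumed.

```python
def a_priori_threshold(pseudo_impostor_scores, max_false_accept):
    """Smallest threshold t such that the pseudo-impostor FAR is <= max_false_accept.

    Assumed accept rule: a claim is accepted iff score > t. Candidates are
    -inf plus every observed score; at the largest observed score the FAR
    is 0, so any non-negative target is always met.
    """
    scores = sorted(pseudo_impostor_scores)
    n = len(scores)
    for t in [float("-inf")] + scores:
        far = sum(s > t for s in scores) / n
        if far <= max_false_accept:
            return t
    return scores[-1]  # only reached if max_false_accept is negative
```

Because only enrollment-time pseudo-impostor data enter this computation, the threshold is fixed before any real claimant is seen. That is the property the abstract emphasizes.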