Experiments in speaker verification using factor analysis likelihood ratios
 in Odyssey: The Speaker and Language Recognition Workshop
Abstract

Cited by 20 (2 self)
We report the results of some speaker verification experiments on the NIST 1999 and NIST 2000 test sets using factor analysis likelihood ratio statistics. For the experiments on the 1999 test set we had to use a mismatched training set, namely Phases 1 and 2 of the Switchboard II corpus, to train the factor analysis model. Our results on this test set are comparable to (but not better than) the best results that have been attained with standard methods (GMM likelihood ratios and handset detection). In order to experiment with well-matched training and test sets, we used half of the target speakers in the NIST 2000 evaluation for testing and a disjoint set of speakers taken from Switchboard II, Phases 1 and 2, for training. In this situation we obtained an equal error rate of 7.2% and a minimum detection cost of 0.028. These figures represent an improvement of about 25% over standard methods.
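The two figures quoted in this abstract, equal error rate and minimum detection cost, can be computed from lists of target and impostor trial scores. The following is only a minimal sketch: the function names are hypothetical, and the cost parameters (C_miss = 10, C_fa = 1, P_target = 0.01) are assumed to be the values conventionally used in NIST evaluations of that period; they are not stated in the abstract.

```python
def error_rates(target_scores, impostor_scores):
    """Return (threshold, P_miss, P_fa) for a threshold at every observed score."""
    thresholds = sorted(set(target_scores) | set(impostor_scores))
    thresholds.append(thresholds[-1] + 1.0)  # extra point that rejects every trial
    rates = []
    for t in thresholds:
        # A trial is accepted when its score is >= t.
        p_miss = sum(s < t for s in target_scores) / len(target_scores)
        p_fa = sum(s >= t for s in impostor_scores) / len(impostor_scores)
        rates.append((t, p_miss, p_fa))
    return rates


def equal_error_rate(rates):
    """Approximate the EER at the point where P_miss and P_fa are closest."""
    _, p_miss, p_fa = min(rates, key=lambda r: abs(r[1] - r[2]))
    return (p_miss + p_fa) / 2.0


def min_detection_cost(rates, c_miss=10.0, c_fa=1.0, p_target=0.01):
    """Minimum over thresholds of the detection cost (assumed parameters)."""
    return min(
        c_miss * p_target * p_miss + c_fa * (1.0 - p_target) * p_fa
        for _, p_miss, p_fa in rates
    )


# Toy scores, invented for illustration only.
targets = [2.0, 1.5, 1.0, 0.2]
impostors = [0.5, 0.0, -0.5, -1.0]
rates = error_rates(targets, impostors)
eer = equal_error_rate(rates)
dcf = min_detection_cost(rates)
```

With real evaluation data the score lists would hold one score per trial, and the EER would usually be interpolated rather than read off the nearest threshold.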
Robust Speaker Verification From GSM-Transcoded Speech Based On Decision Fusion And Feature Transformation
 in Proc. IEEE ICASSP’03, 2003
Abstract

Cited by 9 (6 self)
In speaker verification, a claimant may produce two or more utterances. Typically, the scores of the speech patterns extracted from these utterances are averaged and the resulting mean score is compared with a decision threshold. Rather than simply computing the mean score, we propose to compute the optimal weights for fusing the scores based on the score distribution of the independent utterances and our prior knowledge about the score statistics. More specifically, we use enrollment data to compute the mean scores of client speakers and impostors and consider them to be the prior scores. During verification, we set the fusion weights for individual speech patterns to be a function of the dispersion between the scores of these speech patterns and the prior scores. Experimental results based on the GSM-transcoded speech of 150 speakers from the HTIMIT corpus demonstrate that the proposed fusion algorithm can increase the dispersion between the mean speaker scores and the mean impostor scores. Compared with a baseline approach where equal weights are assigned to all scores, the proposed approach provides a relative error reduction of 19%.
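One way to picture dispersion-dependent fusion weights is sketched below. The abstract does not give the weighting formula, so the exponential form, the single scalar prior score, and the names `fuse_scores`, `prior_score`, and `prior_var` are assumptions made for illustration; they should not be read as the authors' exact method.

```python
import math


def fuse_scores(scores, prior_score, prior_var=1.0):
    """Fuse utterance scores with weights that grow with distance from the prior.

    Assumed weight form: w_k proportional to exp((s_k - prior)^2 / (2 * var)).
    """
    devs = [(s - prior_score) ** 2 / (2.0 * prior_var) for s in scores]
    shift = max(devs)  # subtract the maximum before exp() for numerical stability
    exps = [math.exp(d - shift) for d in devs]
    total = sum(exps)
    weights = [e / total for e in exps]
    fused = sum(w * s for w, s in zip(weights, scores))
    return fused, weights


# Two utterance scores; the second lies further from the prior score.
fused, weights = fuse_scores([1.0, 3.0], prior_score=1.0)
baseline = sum([1.0, 3.0]) / 2.0  # equal-weight averaging, as in the baseline
```

In this toy case the score further from the prior receives the larger weight, so the fused score moves away from the plain average, which is the kind of increased dispersion the abstract reports.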
Adaptive Decision Fusion for Multi-Sample Speaker Verification over GSM Networks
 in Eurospeech’03, 2003
Abstract

Cited by 3 (3 self)
In speaker verification, a claimant may produce two or more utterances. In our previous study [1], we proposed to compute the optimal weights for fusing the scores of these utterances based on their score distribution and our prior knowledge about the score statistics estimated from the mean scores of the corresponding client speaker and some pseudo-impostors during enrollment. As the fusion weights depend on the prior scores, in this paper, we propose to adapt the prior scores during verification based on the likelihood of the claimant being an impostor. To this end, a pseudo-impostor GMM score model is created for each speaker. During verification, the claimant's scores are fed to the score model to obtain a likelihood for adapting the prior score. Experimental results based on the GSM-transcoded speech of 150 speakers from the HTIMIT corpus demonstrate that the proposed prior score adaptation approach provides a relative error reduction of 15% when compared with our previous approach where the prior scores are non-adaptive.
Cluster-Dependent Feature Transformation for Telephone-Based Speaker Verification
 In: Proc. International Conference on Audio- and Video-Based Biometric Person Authentication (AVBPA’03), 2003
Abstract

Cited by 2 (2 self)
This paper presents a cluster-based feature transformation technique for telephone-based speaker verification when labels of the handset types are not available during the training phase. The technique combines a cluster selector with cluster-dependent feature transformations to reduce the acoustic mismatches among different handsets. Specifically, a...
Multi-Sample Data-Dependent Fusion Of Sorted Score Sequences For
 in Proc. IEEE ICASSP’04, 2004
Abstract
In many biometric systems, the scores of multiple samples (e.g. utterances) are averaged and the average score is compared against a decision threshold for decision making. The average score, however, may not be optimal because the distribution of the scores is ignored. To address this limitation, we have recently proposed a fusion model that incorporates the score distribution by making the fusion weights dependent on the dispersion between the frame-based scores and the prior score statistics obtained from training data. As the fusion weights are data-dependent, the positions of scores in the score sequences become detrimental to the final fused scores. In this paper, we propose to enhance the fusion model by sorting the score sequences before fusion takes place. The fusion model was evaluated on a speaker verification task where each claimant utters two utterances in a verification session. Results demonstrate that fusion of sorted scores has the effect of maximizing the dispersion between the client scores and the impostor scores, making the verification process more reliable. Compared with our previous work where no sorting is applied, the new approach reduces the equal error rate by 11%.
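The effect of sorting can be illustrated with two frame-score sequences, one per utterance. This is a hedged sketch only: the abstract says the sequences are sorted before fusion but does not give the sort directions or the weight formula, so sorting one sequence ascending and the other descending, the exponential weights, and the names `pair_weighted` and `fuse_sequences` are all assumptions.

```python
import math


def pair_weighted(s1, s2, prior, var=1.0):
    """Fuse one pair of frame scores; the score further from the prior weighs more."""
    d1 = (s1 - prior) ** 2 / (2.0 * var)
    d2 = (s2 - prior) ** 2 / (2.0 * var)
    shift = max(d1, d2)  # keeps exp() arguments non-positive
    e1, e2 = math.exp(d1 - shift), math.exp(d2 - shift)
    return (e1 * s1 + e2 * s2) / (e1 + e2)


def fuse_sequences(seq_a, seq_b, prior, var=1.0, sort=True):
    """Pair up frame scores from two utterances, fuse each pair, then average."""
    n = min(len(seq_a), len(seq_b))
    a, b = list(seq_a[:n]), list(seq_b[:n])
    if sort:
        a.sort()              # assumed: first sequence ascending
        b.sort(reverse=True)  # assumed: second sequence descending
    fused = [pair_weighted(x, y, prior, var) for x, y in zip(a, b)]
    return sum(fused) / n


# The same frame scores presented in two different orders (toy values).
sorted_1 = fuse_sequences([1.0, 3.0, 2.0], [0.0, 2.0, 4.0], prior=0.0)
sorted_2 = fuse_sequences([3.0, 2.0, 1.0], [4.0, 0.0, 2.0], prior=0.0)
unsorted_1 = fuse_sequences([1.0, 3.0, 2.0], [0.0, 2.0, 4.0], prior=0.0, sort=False)
unsorted_2 = fuse_sequences([3.0, 2.0, 1.0], [0.0, 2.0, 4.0], prior=0.0, sort=False)
```

Without sorting, the fused score depends on which frames happen to be paired; with sorting, any reordering of the same scores gives the same result.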
Probabilistic Fusion of Sorted Score Sequences for Robust Speaker Verification
Abstract
Fusion techniques have been widely used in multimodal biometric authentication systems. While these techniques are mainly applied to combine the outputs of modality-dependent classifiers, they can also be applied to fuse the decisions or scores from a single modality. The idea is to consider the multiple samples extracted from a single modality as independent but coming from the same source. In this chapter, we propose a single-source, multi-sample data-dependent fusion algorithm for speaker verification. The algorithm is data-dependent in that the fusion weights are dependent on the verification scores and the prior score statistics of claimed speakers and background speakers. To obtain the best out of the speaker’s scores, scores from multiple utterances are sorted before they are probabilistically combined. Evaluations based on 150 speakers from a GSM-transcoded corpus are presented. Results show that data-dependent fusion of speaker’s scores is significantly better than the conventional score averaging approach. It was also found that the proposed fusion algorithm can be further enhanced by sorting the score sequences before they are probabilistically combined.