Results 1–6 of 6
Robust Speaker Verification From GSM-Transcoded Speech Based On Decision Fusion And Feature Transformation
 in Proc. IEEE ICASSP’03
, 2003
Abstract

Cited by 9 (6 self)
In speaker verification, a claimant may produce two or more utterances. Typically, the scores of the speech patterns extracted from these utterances are averaged and the resulting mean score is compared with a decision threshold. Rather than simply computing the mean score, we propose to compute the optimal weights for fusing the scores based on the score distribution of the independent utterances and our prior knowledge about the score statistics. More specifically, we use enrollment data to compute the mean scores of client speakers and impostors and consider them to be the prior scores. During verification, we set the fusion weights for individual speech patterns to be a function of the dispersion between the scores of these speech patterns and the prior scores. Experimental results based on the GSM-transcoded speech of 150 speakers from the HTIMIT corpus demonstrate that the proposed fusion algorithm can increase the dispersion between the mean speaker scores and the mean impostor scores. Compared with a baseline approach where equal weights are assigned to all scores, the proposed approach provides a relative error reduction of 19%.
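The dispersion-weighted fusion described in this abstract can be sketched as follows. This is a minimal illustration, not the paper's derivation: the exact optimal-weight formula is not given in the abstract, so the weighting here (weights proportional to each score's distance from the midpoint of the prior client and impostor mean scores) is an assumption for illustration only.

```python
import numpy as np

def fuse_scores(scores, prior_client, prior_impostor):
    """Fuse per-utterance scores with weights that grow with the
    dispersion between each score and the prior score statistics.
    Hypothetical weighting; the paper derives optimal weights."""
    scores = np.asarray(scores, dtype=float)
    prior = 0.5 * (prior_client + prior_impostor)  # midpoint of the prior scores
    dispersion = np.abs(scores - prior)            # distance of each score from the prior
    if dispersion.sum() == 0:
        return float(scores.mean())                # all scores sit on the prior: fall back to the mean
    weights = dispersion / dispersion.sum()        # normalise weights to sum to 1
    return float(np.dot(weights, scores))          # weighted mean replaces the plain mean
```

A score far from the prior midpoint then dominates the fused score, which is what pushes client and impostor score distributions apart.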
Applying Articulatory Features To Telephone-Based Speaker Verification
 in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing
, 2004
Abstract

Cited by 6 (4 self)
This paper presents an approach that uses articulatory features (AFs) derived from spectral features for telephone-based speaker verification. To minimize the acoustic mismatch caused by different handsets, handset-specific normalization is applied to the spectral features before the AFs are extracted. Experimental results based on 150 speakers using 10 different handsets show that AFs contain useful speaker-specific information for speaker verification and the use of handset-specific normalization significantly lowers the error rates under handset-mismatched conditions. Results also demonstrate that fusing the scores obtained from an AF-based system with those obtained from a spectral-feature-based (MFCC) system helps lower the error rates of the individual systems.
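The final sentence describes score-level fusion of the AF-based and MFCC-based systems. The abstract does not give the combination rule, so the weighted sum below is an assumption; the weight `w` is illustrative and would normally be tuned on development data.

```python
def fuse_system_scores(af_score, mfcc_score, w=0.5):
    """Score-level fusion of an AF-based and an MFCC-based verifier.
    A simple convex combination; the fusion rule and weight w are
    assumptions, not taken from the paper."""
    return w * af_score + (1 - w) * mfcc_score
```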
Adaptive Decision Fusion for Multi-Sample Speaker Verification over GSM Networks
 in Eurospeech’03
, 2003
Abstract

Cited by 3 (3 self)
In speaker verification, a claimant may produce two or more utterances. In our previous study [1], we proposed to compute the optimal weights for fusing the scores of these utterances based on their score distribution and our prior knowledge about the score statistics estimated from the mean scores of the corresponding client speaker and some pseudo-impostors during enrollment. As the fusion weights depend on the prior scores, in this paper, we propose to adapt the prior scores during verification based on the likelihood of the claimant being an impostor. To this end, a pseudo-impostor GMM score model is created for each speaker. During verification, the claimant's scores are fed to the score model to obtain a likelihood for adapting the prior score. Experimental results based on the GSM-transcoded speech of 150 speakers from the HTIMIT corpus demonstrate that the proposed prior score adaptation approach provides a relative error reduction of 15% when compared with our previous approach where the prior scores are non-adaptive.
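The prior-score adaptation step might look like the sketch below. Assumptions to note: the paper builds a GMM score model for the pseudo-impostors, whereas this sketch uses a single Gaussian (`mu_imp`, `var_imp`), and the mapping from likelihood to an adaptation weight `alpha` is illustrative rather than taken from the paper.

```python
import math

def adapt_prior(prior_score, impostor_prior, claimant_scores, mu_imp, var_imp):
    """Shift the prior score toward the impostor prior in proportion to
    the likelihood that the claimant's scores came from the pseudo-impostor
    score model (a single Gaussian here; the paper uses a GMM)."""
    # average likelihood of the claimant's scores under the impostor score model
    lik = sum(
        math.exp(-(s - mu_imp) ** 2 / (2 * var_imp)) / math.sqrt(2 * math.pi * var_imp)
        for s in claimant_scores
    ) / len(claimant_scores)
    peak = 1.0 / math.sqrt(2 * math.pi * var_imp)  # likelihood at the Gaussian's mode
    alpha = lik / peak                             # adaptation weight in [0, 1]
    return (1 - alpha) * prior_score + alpha * impostor_prior
```

When the claimant's scores look like typical impostor scores, `alpha` approaches 1 and the prior is pulled toward the impostor prior; otherwise the enrollment prior is left essentially unchanged.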
Multi-Sample Fusion with Constrained Feature Transformation for Robust Speaker Verification
Abstract

Cited by 1 (0 self)
This paper proposes a single-source multi-sample fusion approach to text-independent speaker verification. In conventional speaker verification systems, the scores obtained from a claimant's utterances are averaged and the resulting mean score is used for decision making. Instead of using an equal weight for all scores, this paper proposes assigning a different weight to each score, where the weights are made dependent on the difference between the score values and a speaker-dependent reference score obtained during enrollment. Because the fusion weights depend on the verification scores, a technique called constrained stochastic feature transformation is applied to minimize the mismatch between enrollment and verification data in order to enhance the scores' reliability. Experimental results based on the 2001 NIST evaluation set show that the proposed fusion approach outperforms the equal-weight approach by 22% in terms of equal error rate and 16% in terms of minimum detection cost.
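The reference-score weighting described here can be sketched as follows. The Gaussian kernel and the sharpness parameter `beta` are assumptions for illustration; the abstract only states that the weights depend on the difference between each score and a speaker-dependent reference score.

```python
import numpy as np

def fuse_with_reference(scores, ref_score, beta=1.0):
    """Weight each utterance score by its closeness to a speaker-dependent
    reference score obtained during enrollment (hypothetical Gaussian-kernel
    weighting; the paper derives its weights differently)."""
    scores = np.asarray(scores, dtype=float)
    weights = np.exp(-beta * (scores - ref_score) ** 2)  # near the reference -> large weight
    weights /= weights.sum()                             # normalise to sum to 1
    return float(np.dot(weights, scores))
```

Under this choice, a score far from the enrollment reference (e.g. one corrupted by channel mismatch) is down-weighted rather than averaged in at full strength.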
Multi-Sample Data-Dependent Fusion Of Sorted Score Sequences For
 in Proc. IEEE ICASSP’04
, 2004
Abstract
In many biometric systems, the scores of multiple samples (e.g. utterances) are averaged and the average score is compared against a decision threshold for decision making. The average score, however, may not be optimal because the distribution of the scores is ignored. To address this limitation, we have recently proposed a fusion model that incorporates the score distribution by making the fusion weights dependent on the dispersion between the frame-based scores and the prior score statistics obtained from training data. As the fusion weights are data-dependent, the positions of scores in the score sequences affect the final fused scores. In this paper, we propose to enhance the fusion model by sorting the score sequences before fusion takes place. The fusion model was evaluated on a speaker verification task where each claimant utters two utterances in a verification session. Results demonstrate that fusion of sorted scores has the effect of maximizing the dispersion between the client scores and the impostor scores, making the verification process more reliable. Compared with our previous work where no sorting is applied, the new approach reduces the equal error rate by 11%.
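Sorting before fusion might be implemented as below. The dispersion-based weight function is carried over from the earlier fusion model's idea (weights proportional to distance from a prior score), which is an assumption here; the paper's exact weight function is not given in the abstract.

```python
import numpy as np

def fuse_sorted(scores_a, scores_b, prior):
    """Sort each frame-level score sequence before data-dependent fusion,
    so scores at the same rank are fused together. The dispersion-based
    weighting below is an illustrative assumption."""
    a = np.sort(np.asarray(scores_a, dtype=float))   # sorting aligns the sequences by rank
    b = np.sort(np.asarray(scores_b, dtype=float))
    fused = []
    for sa, sb in zip(a, b):
        pair = np.array([sa, sb])
        d = np.abs(pair - prior)                     # dispersion of each score from the prior
        w = d / d.sum() if d.sum() > 0 else np.array([0.5, 0.5])
        fused.append(float(w @ pair))                # data-dependent weighted fusion per rank
    return float(np.mean(fused))
```

Because the weights depend on the scores themselves, fusing rank-matched (sorted) scores rather than time-matched ones removes the arbitrary influence of score ordering.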
Extraction of Speaker Features from Different Stages of DSR Front-Ends for Distributed Speaker Verification
, 2004
Abstract
 Add to MetaCart
The ETSI has recently published a front-end processing standard for distributed speech recognition systems. The key idea of the standard is to extract the spectral features of speech signals at the front-end terminals so that acoustic distortion caused by communication channels can be avoided. This paper investigates the effect of extracting spectral features from different stages of the front-end processing on the performance of distributed speaker verification systems. A technique that combines handset selectors with stochastic feature transformation is also employed in a back-end speaker verification system to reduce the acoustic mismatch between different handsets. Because the feature vectors obtained from the back-end server are vector quantized, the paper proposes two approaches to adding Gaussian noise to the quantized feature vectors for training the Gaussian mixture speaker models. In one approach, the variances of the Gaussian noise are made dependent on the codeword distance. In another approach, the variances are a function of the distance between some unquantized training vectors and their closest code vector. The HTIMIT corpus was ...
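The first noise-injection approach (codeword-distance-dependent variances) might be sketched as follows. The interpretation of "codeword distance" as the distance to the nearest other code vector, and the `0.5` scaling constant, are assumptions for illustration only.

```python
import numpy as np

def dequantize_with_noise(quantized, codebook, rng=None):
    """Add zero-mean Gaussian noise to VQ-quantized feature vectors,
    with a per-codeword noise standard deviation proportional to the
    distance to the nearest other code vector. One plausible reading of
    the codeword-distance approach; constants are illustrative."""
    rng = np.random.default_rng() if rng is None else rng
    codebook = np.asarray(codebook, dtype=float)
    out = []
    for v in np.asarray(quantized, dtype=float):
        # distance from this code vector to every codebook entry
        dists = np.linalg.norm(codebook - v, axis=1)
        nonzero = dists[dists > 0]
        nearest = np.min(nonzero) if nonzero.size else 1.0  # nearest *other* codeword
        sigma = 0.5 * nearest                               # illustrative scaling
        out.append(v + rng.normal(0.0, sigma, size=v.shape))
    return np.array(out)
```

Spreading the quantized vectors this way restores some of the within-codeword variability that vector quantization discards, which the GMM training otherwise never sees.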