An overview of text-independent speaker recognition: from features to supervectors
, 2009
Cited by 31 (14 self)
This paper gives an overview of automatic speaker recognition technology, with an emphasis on text-independent recognition. Speaker recognition has been studied actively for several decades. We give an overview of both the classical and the state-of-the-art methods. We start with the fundamentals of automatic speaker recognition, concerning feature extraction and speaker modeling. We elaborate on advanced computational techniques to address robustness and session variability. The recent progress from vectors towards supervectors opens up a new area of exploration and represents a technology trend. We also provide an overview of this recent development and discuss the evaluation methodology of speaker recognition systems. We conclude the paper with a discussion of future directions.
Adaptive Decision Fusion for Multi-Sample Speaker Verification over GSM Networks
- in Eurospeech’03
, 2003
Cited by 3 (3 self)
In speaker verification, a claimant may produce two or more utterances. In our previous study [1], we proposed to compute the optimal weights for fusing the scores of these utterances based on their score distribution and our prior knowledge about the score statistics estimated from the mean scores of the corresponding client speaker and some pseudo-impostors during enrollment. As the fusion weights depend on the prior scores, in this paper, we propose to adapt the prior scores during verification based on the likelihood of the claimant being an impostor. To this end, a pseudo-impostor GMM score model is created for each speaker. During verification, the claimant's scores are fed to the score model to obtain a likelihood for adapting the prior score. Experimental results based on the GSM-transcoded speech of 150 speakers from the HTIMIT corpus demonstrate that the proposed prior score adaptation approach provides a relative error reduction of 15% when compared with our previous approach where the prior scores are non-adaptive.
On consistent fusion on multimodal biometrics
- in Proceedings of International Conference on Acoustics, Speech, and Signal Processing (ICASSP ’06)
, 2006
Cited by 1 (0 self)
Audio-visual (AV) biometrics offer complementary information sources, and the use of both voice and facial images for biometric authentication has recently become economically feasible. Therefore, multi-modality adaptive fusion, combining audio and visual information, offers an efficient tool for substantially improving the classification performance. In terms of implementation, we propose to integrate an audio classifier (based on Gaussian mixture models) and a visual classifier (based on FaceIT, commercially available software) into a well-established mixture-of-expert fusion architecture. In addition, a consistent fusion strategy is introduced as a baseline fusion scheme, which establishes the lower bound of the “consistent region” in the FAR-FRR ROC. Our simulation results indicate that the prediction performance of the proposed adaptive fusion schemes falls in the consistent region. More importantly, the notion of consistent fusion can also facilitate the selection of the best modalities to fuse.
Multi-Sample Fusion with Constrained Feature Transformation for Robust Speaker Verification
Cited by 1 (0 self)
This paper proposes a single-source multi-sample fusion approach to text-independent speaker verification. In conventional speaker verification systems, the scores obtained from a claimant's utterances are averaged and the resulting mean score is used for decision making. Instead of using an equal weight for all scores, this paper proposes assigning a different weight to each score, where the weights are made dependent on the difference between the score values and a speaker-dependent reference score obtained during enrollment. Because the fusion weights depend on the verification scores, a technique called constrained stochastic feature transformation is applied to minimize the mismatch between enrollment and verification data in order to enhance the scores' reliability. Experimental results based on the 2001 NIST evaluation set show that the proposed fusion approach outperforms the equal-weight approach by 22% in terms of equal error rate and 16% in terms of minimum detection cost.
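The contrast between equal-weight averaging and score-dependent weighting can be illustrated with a minimal Python sketch. The abstract does not give the weighting formula, so the softmax-style weights, the `scale` parameter, and the function names below are assumptions made purely for illustration, not the authors' method.

```python
import math

def equal_weight_fuse(scores):
    """Conventional baseline: plain mean of the utterance scores."""
    return sum(scores) / len(scores)

def score_dependent_fuse(scores, reference, scale=1.0):
    """Weighted mean whose weights depend on (score - reference).

    ASSUMPTION: softmax weights over scaled deviations from the
    speaker-dependent reference score; the paper's formula may differ.
    """
    deviations = [scale * (s - reference) for s in scores]
    peak = max(deviations)  # subtract the max for numerical stability
    raw = [math.exp(d - peak) for d in deviations]
    total = sum(raw)
    weights = [r / total for r in raw]
    return sum(w * s for w, s in zip(weights, scores))
```

With `scale=0.0` every weight is equal and the result reduces to the conventional mean; larger values of `scale` shift the fused score towards the utterances that score highest relative to the reference.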
Multi-Sample Data-Dependent Fusion Of Sorted Score Sequences For
- in Proc. IEEE ICASSP’04
, 2004
In many biometric systems, the scores of multiple samples (e.g. utterances) are averaged and the average score is compared against a decision threshold for decision making. The average score, however, may not be optimal because the distribution of the scores is ignored. To address this limitation, we have recently proposed a fusion model that incorporates the score distribution by making the fusion weights dependent on the dispersion between the frame-based scores and the prior score statistics obtained from training data. As the fusion weights are data-dependent, the positions of scores in the score sequences become detrimental to the final fused scores. In this paper, we propose to enhance the fusion model by sorting the score sequences before fusion takes place. The fusion model was evaluated on a speaker verification task where each claimant produces two utterances in a verification session. Results demonstrate that fusion of sorted scores has the effect of maximizing the dispersion between the client scores and the impostor scores, making the verification process more reliable. Compared with our previous work where no sorting is applied, the new approach reduces the equal error rate by 11%.
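Why the position of scores matters once the weights are data-dependent can be seen in a small Python sketch. The pairwise weighting rule, the `spread` parameter, and the choice to sort both sequences in ascending order are assumptions for illustration only; the abstract does not specify them.

```python
import math

def pair_fuse(a, b, prior, spread=1.0):
    """Fuse two frame-level scores with data-dependent weights.

    ASSUMPTION: each weight grows with the squared distance between
    the score and the prior score; the paper's rule may differ.
    """
    wa = math.exp((a - prior) ** 2 / (2 * spread ** 2))
    wb = math.exp((b - prior) ** 2 / (2 * spread ** 2))
    return (wa * a + wb * b) / (wa + wb)

def fuse_sequences(seq_a, seq_b, prior, sort_first=True):
    """Fuse two equal-length score sequences position by position."""
    if sort_first:
        # ASSUMPTION: both sequences sorted ascending before pairing.
        seq_a, seq_b = sorted(seq_a), sorted(seq_b)
    fused = [pair_fuse(a, b, prior) for a, b in zip(seq_a, seq_b)]
    return sum(fused) / len(fused)
```

Without sorting, shuffling the frames of either utterance changes which scores are paired and therefore changes the fused result; with sorting, the pairing (and the fused score) no longer depends on the original frame order.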
Intramodal And Intermodal Fusion For Audio-Visual Biometric
This paper proposes a multiple-source multiple-sample fusion approach to identity verification. Fusion is performed at two levels: intramodal and intermodal. In intramodal fusion, the scores of multiple samples (e.g. utterances or video shots) obtained from the same modality are linearly combined, where the combination weights are dependent on the difference between the score values and a user-dependent reference score obtained during enrollment. This is followed by intermodal fusion in which the means of intramodal fused scores obtained from different modalities are fused. The final fused score is then used for decision making. This two-level fusion approach was applied to audio-visual biometric authentication, and experimental results based on the XM2VTSDB corpus show that the proposed fusion approach can achieve an error rate reduction of up to 83%.
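The two-level structure described above can be sketched in Python as follows. This is a simplified illustration under stated assumptions: each modality is reduced to a single intramodal fused score, the intramodal weights use a softmax of the deviation from the reference score, and the intermodal step is a fixed weighted average. None of these specifics, nor the function names, come from the abstract.

```python
import math

def intramodal_fuse(scores, reference):
    """Linearly combine one modality's sample scores.

    ASSUMPTION: weights are a softmax of (score - reference).
    """
    peak = max(s - reference for s in scores)
    raw = [math.exp((s - reference) - peak) for s in scores]
    return sum(r * s for r, s in zip(raw, scores)) / sum(raw)

def two_level_fuse(samples, references, modality_weights):
    """Intramodal fusion per modality, then intermodal fusion.

    All three arguments are dicts keyed by modality name.
    ASSUMPTION: intermodal fusion is a fixed weighted average.
    """
    total = sum(modality_weights[m] for m in samples)
    fused = sum(
        modality_weights[m] * intramodal_fuse(samples[m], references[m])
        for m in samples
    )
    return fused / total

def accept(samples, references, modality_weights, threshold):
    """Decision step: accept the claim if the fused score clears the threshold."""
    return two_level_fuse(samples, references, modality_weights) >= threshold
```

A call might pass `samples={"audio": [...], "video": [...]}` with matching keys in `references` and `modality_weights`; the modality names here are placeholders.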

