Feature And Score Normalization For Speaker Verification Of Cellular Data
- In Proc. ICASSP, 2003
Cited by 34 (3 self)
This paper presents some experiments with feature and score normalization for text-independent speaker verification of cellular data. The speaker verification system is based on cepstral features and Gaussian mixture models with 1024 components. The following methods, which have been proposed for feature and score normalization, are reviewed and evaluated on cellular data: cepstral mean subtraction (CMS), variance normalization, feature warping, T-norm, Z-norm and the cohort method. We found that the combination of feature warping and T-norm gives the best results on the NIST 2002 test data (for the one-speaker detection task). Compared to a baseline system using both CMS and variance normalization and achieving a 0.410 minimal decision cost function (DCF), feature warping and T-norm respectively bring 8% and 12% relative reductions, whereas the combination of both techniques yields a 22% relative reduction, reaching a DCF of 0.320. This result approaches the state-of-the-art performance level obtained for speaker verification with land-line telephone speech.
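The two families of normalization named in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names are hypothetical, frames are plain lists of cepstral coefficients, and feature warping and the cohort method are left out. CMS with variance normalization acts on features, per coefficient over an utterance. T-norm acts on scores, using the scores the same test segment obtains against a set of impostor models. Z-norm has the same form, but its statistics come from scoring impostor utterances against the target model. The last helper only checks the arithmetic behind the quoted relative reduction.

```python
from statistics import fmean, pstdev


def cmvn(frames):
    """CMS plus variance normalization, applied per cepstral coefficient."""
    dims = list(zip(*frames))  # one tuple of values per coefficient
    means = [fmean(d) for d in dims]
    stds = [pstdev(d) or 1.0 for d in dims]  # avoid dividing by zero
    return [[(x - m) / s for x, m, s in zip(f, means, stds)] for f in frames]


def t_norm(raw_score, cohort_scores):
    """Standardize a trial score with the mean and standard deviation of
    the scores the same test segment gets against cohort (impostor) models."""
    mu = fmean(cohort_scores)
    sigma = pstdev(cohort_scores) or 1.0
    return (raw_score - mu) / sigma


def relative_reduction(baseline, improved):
    """Relative reduction of a cost, e.g. the minimal DCF."""
    return (baseline - improved) / baseline
```

With the figures from the abstract, `relative_reduction(0.410, 0.320)` is about 0.22, which matches the reported 22% relative reduction.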
Robust speaker recognition in noisy conditions
- IEEE Trans. Audio, Speech Lang. Process., 2007
New MAP estimates for speaker recognition
- In Proc. EUROSPEECH, 2003
Cited by 13 (6 self)
We report the results of some experiments which demonstrate that eigenvoice MAP and eigenphone MAP are at least as effective as classical MAP for discriminative speaker modeling on SWITCHBOARD data. We show how eigenvoice MAP can be modified to yield a new model-based channel compensation technique which we call eigenchannel MAP. When compared with multi-channel training, eigenchannel MAP was found to reduce speaker identification errors by 50%.
Unsupervised Online Adaptation for Speaker Verification Over The Telephone
- In Proc. Speaker Odyssey, 2004
Cited by 7 (0 self)
This paper presents experiments on unsupervised adaptation for a speaker detection system. The system is a standard speaker verification system based on cepstral features and Gaussian mixture models. Experiments were performed on cellular speech data taken from the NIST 2002 speaker detection evaluation. There were about 30,000 trials in total, involving 330 target speakers, and more than 90% of them were impostor trials. Unsupervised adaptation significantly increases system accuracy, reducing the minimal detection cost function (DCF) from 0.33 for the baseline system to 0.25 with unsupervised online adaptation. Two incremental adaptation modes were tested: using a fixed decision threshold for adaptation, or weighting the adaptation by the a posteriori probability of the true target. Both methods give similar results in the best configurations, but the latter is less sensitive to the actual threshold value.
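The difference between the two adaptation modes can be sketched in a few lines of Python. This is a simplified illustration, not the system described in the paper: the speaker model is reduced to a single mean vector rather than a Gaussian mixture, and the names `rate`, `prior`, `hard_update` and `soft_update` are hypothetical. The score is assumed to be a natural-log likelihood ratio.

```python
import math


def posterior_target(llr, prior=0.5):
    """Posterior probability of the target hypothesis given a
    log-likelihood ratio and an assumed target prior."""
    log_odds = llr + math.log(prior / (1.0 - prior))
    if log_odds >= 0:
        return 1.0 / (1.0 + math.exp(-log_odds))
    e = math.exp(log_odds)  # numerically safe branch for negative values
    return e / (1.0 + e)


def adapt_mean(model_mean, test_mean, weight, rate=0.2):
    """Move the model mean toward the test-segment mean.
    `weight` in [0, 1] scales how much the segment is trusted."""
    step = rate * weight
    return [(1.0 - step) * m + step * t for m, t in zip(model_mean, test_mean)]


def hard_update(model_mean, test_mean, llr, threshold):
    """Fixed-threshold mode: adapt fully or not at all."""
    weight = 1.0 if llr >= threshold else 0.0
    return adapt_mean(model_mean, test_mean, weight)


def soft_update(model_mean, test_mean, llr, prior=0.5):
    """Weighted mode: adapt in proportion to the target posterior."""
    return adapt_mean(model_mean, test_mean, posterior_target(llr, prior))
```

In the hard mode a score just below the threshold contributes nothing and a score just above it contributes fully, whereas the soft mode changes smoothly with the score, which is consistent with the abstract's remark that the weighted mode is less sensitive to the threshold value.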
Databases For Speaker Recognition: Activities In COST250 Working Group 2
- In Proceedings COST250 Workshop on Speaker Recognition in Telephony, 1999
Cited by 4 (0 self)
Working Group (WG) 2 of the COST250 Action "Speaker Recognition in Telephony" has dealt with databases for speaker recognition. The present final report gives an overview of the activities in this WG, and presents its main results. The first result is an overview of 36 existing databases that have been used in speaker recognition research. These include both public and proprietary databases. As part of the overview, some of the variability represented in those databases is analyzed. The second result is the publicly available Polycost database, a telephony-speech multi-session database with 134 speakers from all around Europe. Together with pre-defined experiment specifications, this database is a useful resource to aid in the assessment of speaker recognition systems in general, and in comparing systems across sites in particular.
Non Directly Acoustic Process for Costless Speaker Recognition and Indexation
- 1999
Cited by 2 (2 self)
recognition and indexation systems, based on non-directly-acoustic processing. This new method is specifically designed to lower the complexity of the modeling phase, compared to classical techniques, as well as to decrease the required amount of learning data, making it particularly well-suited to on-line learning (needed for speaker indexation) and use on embedded systems.
On Factors Affecting MFCC-Based Speaker Recognition Accuracy
- 2005
Cited by 1 (1 self)
We evaluate the accuracy of an MFCC-based speaker recognition method. We analyse the recognition results using speech signals from everyday life environments. We study the mismatch effects of text-dependency, sample length, language, style of speaking, cheating, microphone, sample quality, and noise. The experiments on a self-collected corpus of 30 subjects indicate that any mismatch degrades recognition accuracy. The dominant factors are noise, microphone, disguise, and degradation of the sample rate and quality. Speech-related factors and sample length are less critical.
Robust Speaker Recognition in Unknown Noisy Conditions
- 2005
Cited by 1 (0 self)
This paper investigates the problem of speaker identification and verification in noisy conditions, assuming that speech signals are corrupted by environmental noise but knowledge about the noise characteristics is not available. This research is motivated in part by the potential application of speaker recognition technologies on handheld devices or the Internet. While the technologies promise an additional biometric layer of security to protect the user, the practical implementation of such systems faces many challenges. One of these is environmental noise. Due to the mobile nature of such systems, the noise sources can be highly time-varying and potentially unknown. This raises the requirement for noise robustness in the absence of information about the noise. This paper describes a method, named universal compensation (UC), that combines multi-condition training and the missing-feature method to model noises with unknown temporal-spectral characteristics. Multi-condition training is conducted using simulated noisy data with limited noise varieties, providing a "coarse" compensation for the noise, and the missing-feature method refines the compensation by ignoring noise variations outside the given training conditions, thereby reducing the training and testing mismatch. This paper focuses on several issues relating to the implementation of the UC model for real-world applications. These include the generation
An Investigation into Better Frequency Warping for Time-Varying Speaker Recognition
Cited by 1 (1 self)
Performance degradation has been observed in practical speaker recognition systems when there is a time interval between enrollment and testing. Researchers usually resort to enrollment data augmentation, speaker model adaptation, and variable verification thresholds to alleviate the time-varying impact. In this paper, however, the effort is made in the feature domain, through an investigation into better frequency warping for the target task. Two methods to determine the discrimination sensitivity of frequency bands are explored: an energy-based F-ratio measure and a performance-driven one. Frequency warping is performed according to the discrimination sensitivity curves over the whole frequency range. Experimental results show that the proposed features outperform both MFCCs and LFCCs and, to some extent, alleviate the time-varying impact on speaker recognition.
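The abstract does not spell out how its energy-based F-ratio is computed. As a hedged illustration, the sketch below uses the common textbook form of the F-ratio for a single frequency band: the variance of the per-speaker mean energies divided by the average within-speaker variance. The function name and the input layout (one list of band energies per speaker) are assumptions made for this example.

```python
from statistics import fmean, pvariance


def f_ratio(band_energies_by_speaker):
    """Textbook F-ratio for one frequency band.

    `band_energies_by_speaker` holds one list of band energies per speaker.
    A high value means the band separates speakers well relative to the
    variation seen within each speaker. The sketch assumes the average
    within-speaker variance is nonzero.
    """
    speaker_means = [fmean(v) for v in band_energies_by_speaker]
    between = pvariance(speaker_means)
    within = fmean([pvariance(v) for v in band_energies_by_speaker])
    return between / within
```

Computing this value for every band gives a discrimination curve over frequency. A warping scheme in the spirit of the abstract would then allocate finer resolution to bands with higher values.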