du Preez, “Application-independent evaluation of speaker detection
- Computer Speech and Language
, 2006
Cited by 79 (3 self)
We present a Bayesian analysis of the evaluation of speaker detection performance. We use expectation of utility to confirm that likelihood-ratio is both an optimum and application-independent form of output for speaker detection systems. We point out that the problem of likelihood-ratio calculation is equivalent to the problem of optimization of decision thresholds. It is shown that the decision cost that is used in the existing NIST evaluations effectively forms a utility (a proper scoring rule) for the evaluation of the quality of likelihood-ratio presentation. As an alternative, a logarithmic utility (a strictly proper scoring rule) is proposed. Finally, an information-theoretic interpretation of the expected logarithmic utility is given. It is hoped that this analysis and the proposed evaluation method will promote the use of likelihood-ratio detector output rather than decision output.
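The abstract does not state the logarithmic utility in formula form. As a rough illustration, the sketch below uses the commonly cited log-likelihood-ratio cost (often written Cllr), which averages a logarithmic penalty over target and non-target trials. The function name `cllr`, the use of natural-log likelihood ratios as input, and the equal weighting of the two trial classes are assumptions made for this example, not details taken from the abstract.

```python
import math


def cllr(target_llrs, nontarget_llrs):
    """Logarithmic cost in bits for natural-log likelihood ratios.

    Assumed form: 0.5 * (mean log2(1 + 1/LR) over target trials
                         + mean log2(1 + LR) over non-target trials).
    """
    def softplus(x):
        # Numerically stable log(1 + exp(x)).
        return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

    # Target trials are penalised when the log-LR is low.
    target_cost = sum(softplus(-s) for s in target_llrs) / len(target_llrs)
    # Non-target trials are penalised when the log-LR is high.
    nontarget_cost = sum(softplus(s) for s in nontarget_llrs) / len(nontarget_llrs)
    return 0.5 * (target_cost + nontarget_cost) / math.log(2)
```

Under this form, a system that always outputs a log-likelihood ratio of zero scores exactly 1 bit, well-separated and well-calibrated scores approach 0, and confidently wrong scores exceed 1.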
The Expected Performance Curve: a New Assessment Measure for Person Authentication
, 2004
Cited by 61 (35 self)
ROC and DET curves are often used in the field of person authentication to assess the quality of a model or even to compare several models. We argue in this paper that this measure can be misleading as it compares performance measures that cannot be reached simultaneously by all systems. We propose instead new curves, called Expected Performance Curves (EPC). These curves enable the comparison between several systems according to a criterion, decided by the application, which is used to set thresholds according to a separate validation set. Free software is available to compute these curves. A real case study is used throughout the paper to illustrate them. Finally, note that while this study was done on an authentication problem, it also applies to most 2-class classification tasks.
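The procedure described above, setting a threshold on a validation set according to an application criterion and then measuring performance on separate data, can be sketched as follows. The specific criterion (a weighted sum `alpha * FAR + (1 - alpha) * FRR`), the reported measure (half total error rate), and all function names are assumptions chosen for illustration; the abstract does not specify them.

```python
def error_rates(impostor, genuine, threshold):
    """Return (false acceptance rate, false rejection rate); accept if score >= threshold."""
    far = sum(s >= threshold for s in impostor) / len(impostor)
    frr = sum(s < threshold for s in genuine) / len(genuine)
    return far, frr


def pick_threshold(impostor, genuine, alpha):
    """Choose the validation threshold minimising alpha*FAR + (1-alpha)*FRR."""
    candidates = sorted(set(impostor) | set(genuine))
    candidates.append(candidates[-1] + 1.0)  # threshold that rejects everything

    def cost(t):
        far, frr = error_rates(impostor, genuine, t)
        return alpha * far + (1 - alpha) * frr

    return min(candidates, key=cost)


def expected_performance_curve(valid, test, alphas):
    """valid and test are (impostor_scores, genuine_scores) pairs.

    For each alpha the threshold is fixed on the validation scores only,
    then the half total error rate is measured on the test scores.
    """
    curve = []
    for alpha in alphas:
        threshold = pick_threshold(valid[0], valid[1], alpha)
        far, frr = error_rates(test[0], test[1], threshold)
        curve.append((alpha, (far + frr) / 2))
    return curve
```

Because the threshold never sees the test scores, each point on the curve reflects a performance level the system could actually reach in deployment for that trade-off.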
CONFIDENCE MEASURES FOR MULTIMODAL IDENTITY VERIFICATION
, 2002
Cited by 37 (10 self)
Multimodal fusion for identity verification has already shown great improvement over unimodal algorithms. In this paper, we propose to integrate confidence measures during the fusion process. We present a comparison of three different methods to generate such confidence information from unimodal identity verification systems. These methods can be used either to enhance the performance of a multimodal fusion algorithm or to obtain a confidence level on the decisions taken by the system. All the algorithms are compared on the same benchmark database, namely XM2VTS, containing both speech and face information. Results show that some confidence measures yielded a statistically significant improvement in performance, while other measures produced reliable confidence levels for the fusion decisions.
Real-Time Speaker Identification and Verification
- ACCEPTED FOR PUBLICATION IN IEEE TRANS. SPEECH & AUDIO PROCESSING
Cited by 34 (10 self)
In speaker identification, most of the computation originates from the distance or likelihood computations between the feature vectors of the unknown speaker and the models in the database. The identification time depends on the number of feature vectors, their dimensionality, the complexity of the speaker models and the number of speakers. In this paper, we concentrate on optimizing vector quantization (VQ) based speaker identification. We reduce the number of test vectors by pre-quantizing the test sequence prior to matching, and the number of speakers by pruning out unlikely speakers during the identification process. The best variants are then generalized to Gaussian mixture model (GMM) based modeling. We also apply the algorithms to efficient cohort set search for score normalization in speaker verification. We obtain a speed-up factor of 16:1 in the case of VQ-based modeling with minor degradation in the identification accuracy, and 34:1 in the case of GMM-based modeling. An equal error rate of 7% can be reached in 0.84 seconds on average when the test utterance is 30.4 seconds long.
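The two speed-up ideas named in the abstract, pre-quantizing the test sequence and pruning unlikely speakers during matching, might be sketched as below. This is only a toy illustration: averaging consecutive vectors as the pre-quantizer, dropping a fixed fraction of speakers after each block, and every name and parameter (`group`, `block`, `keep`) are assumptions for the example and are not claimed to be the variants evaluated in the paper.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def prequantize(vectors, group):
    """Replace each run of `group` consecutive vectors by its mean (assumed pre-quantizer)."""
    reduced = []
    for i in range(0, len(vectors), group):
        chunk = vectors[i:i + group]
        reduced.append([sum(col) / len(chunk) for col in zip(*chunk)])
    return reduced


def identify(test_vectors, codebooks, group=2, block=2, keep=0.5):
    """Return the speaker whose VQ codebook gives the lowest accumulated distortion.

    codebooks maps a speaker name to a list of code vectors. After every
    `block` test vectors, only the best `keep` fraction of speakers survives.
    """
    vectors = prequantize(test_vectors, group)
    active = {name: 0.0 for name in codebooks}
    for start in range(0, len(vectors), block):
        for v in vectors[start:start + block]:
            for name in active:
                # Distortion of v is its distance to the nearest code vector.
                active[name] += min(sq_dist(v, c) for c in codebooks[name])
        if len(active) > 1:
            ranked = sorted(active, key=active.get)
            survivors = ranked[:max(1, int(len(ranked) * keep))]
            active = {name: active[name] for name in survivors}
    return min(active, key=active.get)
```

Both steps reduce the number of distance computations: pre-quantization shortens the test sequence, and pruning shrinks the set of codebooks that later vectors are matched against.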
The NIST Speaker Recognition Evaluations: 1996-2001
, 1998
Cited by 16 (3 self)
We discuss the history and purposes of the NIST evaluations of speaker recognition performance. We cover the sites that have participated, the performance measures used, and the formats used to report results. We consider the extent to which there has been measurable progress over the years. In particular, we examine apparent performance improvements seen in the 2001 evaluation. Information for prospective participants is included.
Person Authentication by Voice : A Need For Caution
- in "Proc. Eurospeech’03
, 2003
Cited by 12 (0 self)
Because of recent events and as members of the scientific community working in the field of speech processing, we feel compelled to publicize our views concerning the possibility of identifying or authenticating a person from his or her voice. The need for a clear and common message was indeed shown by the diversity of information that has been circulating on this matter in the media and general public over the past year. In a press release initiated by the AFCP and further elaborated in collaboration with the SpLC ISCA-SIG, the two groups herein discuss and present a summary of the current state of scientific knowledge and technological development in the field of speaker recognition, in accessible wording for nonspecialists. Our main conclusion is that, despite the existence of technological solutions to some constrained applications, at the present time, there is no scientific process that enables one to uniquely characterize a person’s voice or to identify with absolute certainty an individual from his or her voice.
Conversational telephone speech corpus collection for the NIST speaker recognition evaluation 2004
- in Proceedings 4th International Conference on Language Resources and Evaluation
, 2004
Cited by 11 (0 self)
This paper discusses some of the factors that should be considered when designing a speech corpus collection to be used for text-independent speaker recognition evaluation. The factors include telephone handset type, telephone transmission type, language, and (non-telephone) microphone type. The paper describes the design of the new corpus collection being undertaken by the Linguistic Data Consortium (LDC) to support the 2004 and subsequent NIST speech recognition evaluations. Some preliminary information on the resulting 2004 evaluation test set is offered.
Hautamäki, “Long-term F0 modeling for text-independent speaker recognition
- in Int. Conf. on Speech and Computer (SPECOM’2005
, 2005
Application of Time-Frequency Principal Component Analysis to Speaker Verification
- Digital Signal Processing
, 2000
Cited by 6 (4 self)
This article presents the RIMO/ELISA speaker verification system, which was used in the 1999 NIST speaker recognition evaluation. This system is based on a new technique for analyzing speech signals called time-frequency principal component (TFPC) analysis. This technique consists of extracting principal components from the contextual covariance matrix, which is the covariance matrix of a sequence of vectors expanded by their temporal context. The database used for the experiments is a subset of the Switchboard corpus. © 2000 Academic Press
Key Words: time-frequency principal components (TFPC); contextual principal components (CPC); contextual covariance matrix; speaker verification; speaker recognition; speaker detection; Switchboard
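The contextual covariance matrix described in the abstract can be illustrated with a small sketch: each frame is concatenated with its neighbouring frames, and an ordinary sample covariance is computed over the expanded vectors. The symmetric window of `width` frames on each side, the unbiased (n - 1) estimator, and the function names are assumptions for this example; the abstract does not fix these details.

```python
def expand_context(frames, width):
    """Concatenate each frame with its `width` neighbours on both sides.

    Frames too close to either end to have a full context are skipped.
    """
    expanded = []
    for t in range(width, len(frames) - width):
        stacked = []
        for k in range(t - width, t + width + 1):
            stacked.extend(frames[k])
        expanded.append(stacked)
    return expanded


def covariance(rows):
    """Sample covariance matrix (unbiased) of a list of equal-length vectors."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - mean[j] for j in range(d)] for r in rows]
    return [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def contextual_covariance(frames, width=1):
    """Covariance matrix of the context-expanded frame sequence."""
    return covariance(expand_context(frames, width))
```

The principal components would then be the leading eigenvectors of this matrix, which could be obtained with any standard symmetric eigensolver.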