Results 1 - 10 of 10
Speaker verification using Adapted Gaussian mixture models
- Digital Signal Processing, 2000
Cited by 385 (15 self)
Abstract
In this paper we describe the major elements of MIT Lincoln Laboratory’s Gaussian mixture model (GMM)-based speaker verification system used successfully in several NIST Speaker Recognition Evaluations (SREs). The system is built around the likelihood ratio test for verification, using simple but effective GMMs for likelihood functions, a universal background model (UBM) for alternative speaker representation, and a form of Bayesian adaptation to derive speaker models from the UBM. The development and use of a handset detector and score normalization to greatly improve verification performance are also described and discussed. Finally, representative performance benchmarks and system behavior experiments on NIST SRE corpora are presented.
© 2000 Academic Press
Key Words: speaker recognition; Gaussian mixture models; likelihood ratio detector; universal background model; handset normalization; NIST evaluation.
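The GMM-UBM recipe summarized above (a likelihood ratio test, a UBM as the alternative hypothesis, and Bayesian adaptation of the UBM into a speaker model) can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the Lincoln Laboratory implementation: it assumes diagonal covariances, adapts only the means, and uses a relevance factor of 16, a value commonly quoted for this kind of adaptation. The function names and the toy data are hypothetical.

```python
import numpy as np

def gmm_loglik(X, w, mu, var):
    """Per-frame log-likelihood of X (T, D) under a diagonal-covariance GMM.

    w: (M,) mixture weights; mu, var: (M, D) means and variances.
    Returns (T,) frame log-likelihoods and (T, M) weighted component log-densities.
    """
    diff = X[:, None, :] - mu[None, :, :]                               # (T, M, D)
    log_gauss = -0.5 * (np.sum(diff ** 2 / var, axis=2)
                        + np.sum(np.log(2.0 * np.pi * var), axis=1))    # (T, M)
    log_terms = np.log(w) + log_gauss
    peak = log_terms.max(axis=1, keepdims=True)                         # log-sum-exp for stability
    frame_ll = peak[:, 0] + np.log(np.exp(log_terms - peak).sum(axis=1))
    return frame_ll, log_terms

def map_adapt_means(X, w, mu_ubm, var, relevance=16.0):
    """Mean-only MAP adaptation of the UBM toward enrollment frames X (assumed simplification)."""
    frame_ll, log_terms = gmm_loglik(X, w, mu_ubm, var)
    post = np.exp(log_terms - frame_ll[:, None])             # (T, M) component posteriors
    n = post.sum(axis=0)                                     # soft frame count per component
    ex = (post.T @ X) / np.maximum(n, 1e-10)[:, None]        # posterior-weighted data means
    alpha = (n / (n + relevance))[:, None]                   # more data -> trust data more
    return alpha * ex + (1.0 - alpha) * mu_ubm

def llr_score(X, w, mu_spk, mu_ubm, var):
    """Average per-frame log-likelihood ratio: log p(X|speaker) - log p(X|UBM)."""
    spk_ll, _ = gmm_loglik(X, w, mu_spk, var)
    ubm_ll, _ = gmm_loglik(X, w, mu_ubm, var)
    return float(np.mean(spk_ll - ubm_ll))

# Toy usage: a 2-component UBM in 2-D with synthetic "speaker" data (all values invented)
rng = np.random.default_rng(0)
w = np.array([0.5, 0.5])
mu_ubm = np.array([[-2.0, 0.0], [2.0, 0.0]])
var = np.ones((2, 2))
mu_spk = map_adapt_means(rng.normal([2.5, 1.0], 1.0, size=(200, 2)), w, mu_ubm, var)
target = llr_score(rng.normal([2.5, 1.0], 1.0, size=(100, 2)), w, mu_spk, mu_ubm, var)
impostor = llr_score(rng.normal([-2.0, 0.0], 1.0, size=(100, 2)), w, mu_spk, mu_ubm, var)
print(f"target LLR {target:.2f}, impostor LLR {impostor:.2f}")
```

Because the speaker model shares the UBM's weights and variances and only shifts means where enrollment data exist, components that saw little data stay close to the UBM and contribute little to the ratio. A verification decision would then compare the score against a threshold.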
Handset-Dependent Background Models For Robust Text-Independent Speaker Recognition
- In Proc. IEEE Intl. Conf. on Acoustics, Speech, and Signal Processing, 1997
Cited by 18 (5 self)
Abstract
This paper studies the effects of handset distortion on telephone-based speaker recognition performance, resulting in the following observations: (1) the major factor in speaker recognition errors is whether the handset type (e.g., electret, carbon) differs between training and testing, not whether the telephone lines are mismatched; (2) the distribution of speaker recognition scores for true speakers is bimodal, with one mode dominated by matched-handset tests and the other by mismatched-handset tests; (3) cohort-based normalization methods derive much of their performance gain from implicitly selecting cohorts trained with the same handset type as the claimant; and (4) using a handset-dependent background model matched to the handset type of the claimant's training data sharpens and separates the true- and false-speaker score distributions. Results on the 1996 NIST Speaker Recognition Evaluation corpus show that using handset-matched background models reduces false accep...
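Observation (4) amounts to choosing the background model in the likelihood ratio by the handset type of the claimant's training data. The sketch below shows only that selection step. For brevity each model is a single diagonal Gaussian standing in for a full GMM, and the handset labels, model parameters, and function names are hypothetical.

```python
import numpy as np

def diag_gauss_loglik(X, mean, var):
    """Per-frame log-density of X (T, D) under one diagonal Gaussian (stand-in for a full GMM)."""
    return -0.5 * (np.sum((X - mean) ** 2 / var, axis=1) + np.sum(np.log(2.0 * np.pi * var)))

def handset_matched_llr(X, speaker_model, backgrounds, train_handset):
    """Average frame LLR against the background model matched to the claimant's training handset."""
    mean_s, var_s = speaker_model
    mean_b, var_b = backgrounds[train_handset]   # pick the background by handset label
    return float(np.mean(diag_gauss_loglik(X, mean_s, var_s)
                         - diag_gauss_loglik(X, mean_b, var_b)))

# Hypothetical models: one background per handset type; the claimant enrolled on a carbon handset
backgrounds = {
    "carbon": (np.array([1.0, 0.0]), np.ones(2)),
    "electret": (np.array([0.0, 0.0]), np.ones(2)),
}
speaker = (np.array([1.5, 0.5]), np.ones(2))
X = np.array([[1.5, 0.5]])
matched = handset_matched_llr(X, speaker, backgrounds, "carbon")       # 0.25
mismatched = handset_matched_llr(X, speaker, backgrounds, "electret")  # 1.25
```

In this toy setup the mismatched background inflates the score, because part of the difference it measures is handset coloring rather than speaker identity; the matched background leaves only the speaker-specific part.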
Speaker Verification Through Large Vocabulary Continuous Speech Recognition
- Proc. ICSLP, 1996
Cited by 13 (0 self)
Abstract
We present a study of a speaker verification system for telephone data based on large-vocabulary speech recognition. After describing the recognition engine, we give details of the verification algorithm and draw comparisons with other systems. The system has been tested on a test set taken from the Switchboard corpus of conversational telephone speech, and we present results showing how performance varies with length of test utterance, and whether or not the training data has been transcribed. The dominant factor in performance appears to be channel or handset mismatch between training and testing data.
Speaker Tracking and Detection with Multiple Speakers
Cited by 9 (1 self)
Abstract
We describe a speaker tracking and detection system for Switchboard conversations that uses a two-speaker and silence hidden Markov model (HMM) with a minimum state duration constraint and Gaussian mixture model (GMM) state distributions adapted from a single gender- and handset-independent imposter model distribution. Speaker tracking is used to segment speakers for detection, which is carried out by averaging the frame scores along the Viterbi path and applying HNORM through a novel parameter interpolation extension of HNORM for use with files of arbitrary length. The use of duration statistics to augment the acoustic scores is also introduced via a nonlinear combination function. Results are reported on the NIST 1998 Multispeaker development evaluation dataset.
1. INTRODUCTION
As speech starts being exploited fully as an information source, multispeaker tracking and detection systems are increasingly in demand in a wide range of applications, from indexing and archiving of broadcast news sources to...
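HNORM normalizes a raw detection score with the mean and standard deviation of impostor scores that the same speaker model produced on speech from each handset type; at test time the statistics for the detected handset are used. The abstract does not spell out its interpolation extension, so `interpolated_hnorm` below is only one hypothetical reading: HNORM parameters estimated at a few reference file durations and linearly interpolated for a file of arbitrary length. All names, durations, and scores are illustrative assumptions.

```python
import numpy as np

def hnorm_params(impostor_scores_by_handset):
    """Per-handset impostor score mean and standard deviation for one speaker model."""
    return {h: (float(np.mean(s)), float(np.std(s)))
            for h, s in impostor_scores_by_handset.items()}

def hnorm(score, handset, params):
    """Normalize a raw score using the statistics of the handset detected on the test file."""
    mu, sigma = params[handset]
    return (score - mu) / sigma

def interpolated_hnorm(score, handset, duration, durations, params_per_duration):
    """Hypothetical extension: interpolate HNORM mean/std between reference durations.

    durations: increasing reference file lengths (seconds);
    params_per_duration: one HNORM parameter dict per reference duration.
    np.interp clamps to the end values outside the reference range.
    """
    mus = [p[handset][0] for p in params_per_duration]
    sigmas = [p[handset][1] for p in params_per_duration]
    mu = np.interp(duration, durations, mus)
    sigma = np.interp(duration, durations, sigmas)
    return float((score - mu) / sigma)

# Invented impostor scores for one speaker model
params = hnorm_params({"carbon": np.array([-1.0, 0.0, 1.0]),
                       "electret": np.array([0.5, 1.5])})
z_electret = hnorm(2.0, "electret", params)   # (2.0 - 1.0) / 0.5
z_interp = interpolated_hnorm(2.0, "carbon", 20.0, [10.0, 30.0],
                              [{"carbon": (0.0, 1.0)}, {"carbon": (1.0, 2.0)}])
```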
Automated biometrics
- 2000 Biometrics Consortium Workshop, 2001
Cited by 7 (0 self)
Abstract
Identity verification becomes a challenging task when it has to be automated with high accuracy and non-repudiability. Existing methods such as passwords and photo identity cards are inadequate to meet such heavy demands. Automated biometrics-based authentication methods can meet all of these demands. This paper provides an overview of the fast-developing and exciting area of automated biometrics. Several popular biometrics, including fingerprint, face, and iris, are briefly described, and an introduction to evaluation methods is presented.
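A common starting point for evaluating a biometric verifier is the pair of error rates at a decision threshold, the false accept rate (FAR) and false reject rate (FRR), and the equal error rate (EER) where the two meet. The sketch below is a generic illustration, not necessarily the evaluation method described in the paper; the threshold sweep over observed scores is a simple approximation, and the scores are invented.

```python
import numpy as np

def far_frr(genuine, impostor, threshold):
    """False accept and false reject rates when scores >= threshold are accepted."""
    genuine = np.asarray(genuine, dtype=float)
    impostor = np.asarray(impostor, dtype=float)
    far = float(np.mean(impostor >= threshold))   # impostors wrongly accepted
    frr = float(np.mean(genuine < threshold))     # genuine users wrongly rejected
    return far, frr

def equal_error_rate(genuine, impostor):
    """Approximate EER: at the observed threshold where |FAR - FRR| is smallest, average the two."""
    thresholds = np.unique(np.concatenate([genuine, impostor]))
    gap, eer = min((abs(far - frr), (far + frr) / 2.0)
                   for far, frr in (far_frr(genuine, impostor, t) for t in thresholds))
    return eer

genuine = [0.9, 0.8, 0.7, 0.4]      # invented match scores
impostor = [0.1, 0.2, 0.3, 0.75]    # invented non-match scores
print(far_frr(genuine, impostor, 0.5), equal_error_rate(genuine, impostor))
```

Raising the threshold trades false accepts for false rejects; plotting the pair over all thresholds gives the ROC or DET curve often used in biometric evaluations.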
Speaker recognition — general classifier approaches and data fusion methods
- Pattern Recognition, 2002
Bayesian Approach based-Decision in Speaker Verification
- In Proceedings of 2001: A Speaker Odyssey, 2001
Cited by 4 (2 self)
Abstract
Considering the Bayesian decision framework applied in the context of speaker verification, this paper presents a new way of handling the troublesome anti-speaker model by proposing a redefinition of the hypotheses involved in the classical statistical hypothesis test. This new definition of the hypotheses is then implemented through a speaker-independent normalization technique, named the MAP approach. Besides supporting these new hypotheses, the MAP approach takes advantage of projecting likelihood scores into a probabilistic domain and therefore provides the decision threshold with bounded and meaningful values.
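The benefit of moving scores into a probabilistic domain can be illustrated with plain Bayes' rule: a log-likelihood ratio plus the log prior odds gives the log posterior odds, and the logistic function turns that into a posterior probability in [0, 1], so the decision threshold becomes a probability. This is a generic sketch under that assumption; the paper's speaker-independent MAP normalization may compute the posterior differently, and the function names are hypothetical.

```python
import math

def posterior_target(llr, prior_target=0.5):
    """Map a log-likelihood ratio to P(target | X) with Bayes' rule.

    llr: log p(X|target) - log p(X|non-target); prior_target: P(target) in (0, 1).
    """
    log_odds = llr + math.log(prior_target / (1.0 - prior_target))
    # numerically stable logistic function
    if log_odds >= 0:
        return 1.0 / (1.0 + math.exp(-log_odds))
    z = math.exp(log_odds)
    return z / (1.0 + z)

def decide(llr, prior_target=0.5, threshold=0.5):
    """Accept the identity claim when the posterior exceeds a threshold bounded in [0, 1]."""
    return posterior_target(llr, prior_target) > threshold
```

For example, an LLR of log 3 gives a posterior of 0.75 under equal priors, but only 0.5 when the prior probability of a true claimant is 0.25.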
Speaker Recognition in the Text-Independent Domain Using Keyword Hidden Markov Models
- Unpublished master’s thesis, 2005
Speaker Indexing In Large Audio Databases Using Anchor Models
Abstract
This paper introduces the technique of anchor modeling in the applications of speaker detection and speaker indexing. The anchor modeling algorithm is refined by pruning the number of models needed. The system is applied to the speaker detection problem, where its performance is shown to fall short of the state-of-the-art Gaussian Mixture Model with Universal Background Model (GMM-UBM) system. However, it is further shown that its computational efficiency lends itself to speaker indexing for searching large audio databases for desired speakers, where excessive computation may prohibit the use of the GMM-UBM recognition system. Finally, the paper presents a method for cascading anchor model and GMM-UBM detectors for speaker indexing. This approach benefits from the efficiency of anchor modeling and the high accuracy of GMM-UBM recognition.
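In anchor modeling, each utterance is characterized by its vector of scores against a fixed set of anchor speaker models, so comparing a target speaker with a database segment reduces to comparing two short vectors. The sketch below assumes the score vectors are already computed, uses Euclidean distance as one possible comparison, and illustrates the cascade idea: a cheap anchor pass builds a shortlist that a costlier detector such as GMM-UBM rescores. The function names, the `expensive_score` callable, and all numbers are hypothetical.

```python
import numpy as np

def rank_by_anchor_distance(target_vec, database_vecs):
    """Order database segments by Euclidean distance between anchor-score vectors (closest first)."""
    diffs = np.asarray(database_vecs, dtype=float) - np.asarray(target_vec, dtype=float)
    return np.argsort(np.linalg.norm(diffs, axis=1), kind="stable")

def cascade_search(target_vec, database_vecs, expensive_score, shortlist=2):
    """Anchor pass picks a shortlist; the expensive detector rescores only those segments.

    expensive_score: hypothetical callable mapping a segment index to a detection score
    (higher means more likely the target speaker).
    """
    candidates = rank_by_anchor_distance(target_vec, database_vecs)[:shortlist]
    rescored = sorted(candidates, key=lambda i: expensive_score(int(i)), reverse=True)
    return [int(i) for i in rescored]

# Invented anchor-score vectors (3 anchor models) for a target and 4 database segments
target_vec = [1.0, -1.0, 0.5]
database = [[0.9, -1.1, 0.4], [-1.0, 1.0, 0.0], [1.2, -0.8, 0.6], [0.0, 0.0, 0.0]]
gmm_scores = {0: 1.0, 1: 0.0, 2: 3.0, 3: 0.0}   # stand-in for GMM-UBM detection scores
hits = cascade_search(target_vec, database, gmm_scores.get, shortlist=2)
```

Only the shortlisted segments reach the expensive detector, which is where the computational savings of a cascade come from.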
On decision making in forensic casework
- Forensic Linguistics
Abstract
In forensic applications of speaker recognition it is necessary to be able to specify a confidence level for a decision that two sets of recordings have been produced by the same speaker (or by different speakers). Forensic phoneticians are sometimes criticized because they find it impossible to provide ‘hard’ estimates of the confidence level of their expert opinions. This paper investigates to what extent the problem can be solved by deploying automatic speaker verification algorithms, either alone or in support of the work of forensic phoneticians. It is shown that, although heavily dependent on operating conditions, one of the advantages of automatic systems is that their performance is in fact measurable. We construct a confidence measure that takes into account the past performance of the automatic system, the operating conditions, and the probative value of the speech evidence, as well as the non-speech evidence. It is very important to note that such a confidence measure will never lead to a fully automatic procedure, since it still requires human input to weigh the non-speech evidence, human explanation of the procedure followed, and, finally, human interpretation. However, when all conditions are met, this procedure is able to (1) provide an interpretative measure in the individual forensic case and (2) join together the strengths of the human interpretation of the non-speech evidence and the automatic interpretation of the speech evidence, so that the joint performance of human and machine is better than that of either one in isolation.
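One standard way to join a measured strength of speech evidence with other evidence is Bayes' rule in odds form: posterior odds equal prior odds multiplied by the likelihood ratio of each piece of evidence. The sketch below shows only that arithmetic. It assumes the speech evidence has already been expressed as a likelihood ratio, that prior odds come from human assessment of the non-speech evidence, and that multiple likelihood ratios are conditionally independent; it is not the paper's specific confidence measure, which also accounts for past system performance and operating conditions. Function names and numbers are hypothetical.

```python
import math

def posterior_odds(speech_lr, prior_odds, other_lrs=()):
    """Bayes' rule in odds form: posterior odds = prior odds x product of likelihood ratios.

    Multiplying several LRs assumes the pieces of evidence are conditionally independent.
    """
    odds = prior_odds
    for lr in (speech_lr, *other_lrs):
        odds *= lr
    return odds

def odds_to_probability(odds):
    """Convert odds in favor of the same-speaker hypothesis to a probability."""
    return odds / (1.0 + odds)

# Invented numbers: speech LR of 10, prior odds of 1:10 set from the non-speech evidence
odds = posterior_odds(10.0, 0.1)
print(odds, odds_to_probability(odds))
```

Keeping the prior odds as an explicit input reflects the division of labor in the abstract: the automatic system supplies the strength of the speech evidence, while weighing the non-speech evidence remains a human task.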

