Speaker verification using Adapted Gaussian mixture models
- Digital Signal Processing
, 2000
Cited by 1010 (42 self)
In this paper we describe the major elements of MIT Lincoln Laboratory’s Gaussian mixture model (GMM)-based speaker verification system used successfully in several NIST Speaker Recognition Evaluations (SREs). The system is built around the likelihood ratio test for verification, using simple but effective GMMs for likelihood functions, a universal background model (UBM) for alternative speaker representation, and a form of Bayesian adaptation to derive speaker models from the UBM. The development and use of a handset detector and score normalization to greatly improve verification performance are also described and discussed. Finally, representative performance benchmarks and system behavior experiments on NIST SRE corpora are presented.
© 2000 Academic Press
Key Words: speaker recognition; Gaussian mixture models; likelihood ratio detector; universal background model; handset normalization; NIST evaluation.
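The GMM-UBM recipe summarized above (likelihood ratio test, UBM as the alternative hypothesis, Bayesian adaptation of speaker models from the UBM) can be sketched as follows. This is a minimal pure-Python sketch, not the Lincoln Laboratory implementation: it assumes diagonal covariances, adapts only the means using the common data-dependent coefficient alpha_i = n_i / (n_i + r), and uses r = 16 as an illustrative relevance factor. Handset detection and score normalization are omitted, and the names `DiagGMM`, `map_adapt_means`, `llr_score`, and `verify` are hypothetical.

```python
import math
from dataclasses import dataclass


def logsumexp(values):
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


@dataclass
class DiagGMM:
    weights: list      # mixture weights, summing to 1
    means: list        # one mean vector per component
    variances: list    # one diagonal-covariance vector per component

    def component_log_densities(self, x):
        """log(w_i * N(x; mu_i, Sigma_i)) for every component i."""
        out = []
        for w, mu, var in zip(self.weights, self.means, self.variances):
            ll = math.log(w)
            for xd, md, vd in zip(x, mu, var):
                ll -= 0.5 * (math.log(2.0 * math.pi * vd) + (xd - md) ** 2 / vd)
            out.append(ll)
        return out

    def log_likelihood(self, x):
        return logsumexp(self.component_log_densities(x))


def map_adapt_means(ubm, frames, relevance=16.0):
    """Mean-only Bayesian (MAP) adaptation of a speaker model from the UBM."""
    n_comp, dim = len(ubm.weights), len(ubm.means[0])
    counts = [0.0] * n_comp                          # n_i = sum_t Pr(i | x_t)
    firsts = [[0.0] * dim for _ in range(n_comp)]    # sum_t Pr(i | x_t) * x_t
    for x in frames:
        lds = ubm.component_log_densities(x)
        total = logsumexp(lds)
        for i, ld in enumerate(lds):
            post = math.exp(ld - total)              # posterior Pr(i | x_t) under the UBM
            counts[i] += post
            for d in range(dim):
                firsts[i][d] += post * x[d]
    new_means = []
    for i in range(n_comp):
        alpha = counts[i] / (counts[i] + relevance)  # data-dependent adaptation coefficient
        ex = [f / counts[i] for f in firsts[i]] if counts[i] > 0 else ubm.means[i]
        new_means.append([alpha * e + (1.0 - alpha) * m for e, m in zip(ex, ubm.means[i])])
    # Weights and variances are copied unchanged from the UBM.
    return DiagGMM(list(ubm.weights), new_means, [list(v) for v in ubm.variances])


def llr_score(frames, speaker, ubm):
    """Average per-frame log-likelihood ratio log p(X|speaker) - log p(X|UBM)."""
    return sum(speaker.log_likelihood(x) - ubm.log_likelihood(x) for x in frames) / len(frames)


def verify(frames, speaker, ubm, threshold=0.0):
    """Accept the claimed identity when the LLR exceeds a (hypothetical) threshold."""
    return llr_score(frames, speaker, ubm) > threshold
```

Adapting from the UBM means components that see little speaker data stay close to the background model, so they contribute almost nothing to the ratio.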
Generalized Linear Discriminant Sequence Kernels For Speaker Recognition
, 2002
Cited by 95 (23 self)
Support Vector Machines have recently shown dramatic performance gains in many application areas. We show that the same gains can be realized in the area of speaker recognition via sequence kernels. A sequence kernel provides a numerical comparison of speech utterances as entire sequences rather than a probability at the frame level. We introduce a novel sequence kernel derived from generalized linear discriminants. The kernel has several advantages. First, the kernel uses an explicit expansion into "feature space"; this property allows all of the support vectors to be collapsed into a single vector, creating a small speaker model. Second, the kernel retains the computational advantage of generalized linear discriminants trained using mean-squared error training. Finally, the kernel shows dramatic reductions in equal error rates over standard mean-squared error training in matched and mismatched conditions on a NIST speaker recognition task.
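To illustrate the explicit-expansion idea, the sketch below maps each frame through a polynomial expansion b(x), averages it over the utterance, and compares two utterances with K(X, Y) = b̄_X^T R^{-1} b̄_Y. As a simplifying assumption, R is replaced by its diagonal, estimated from hypothetical background frames, and the SVM weights `alphas` and `labels` are assumed to come from some external SVM trainer. Because the kernel is linear in b̄_Y, the weighted sum over support vectors collapses into one vector w, which is the small speaker model the abstract refers to. All function names are hypothetical.

```python
import math
from itertools import combinations_with_replacement


def poly_expand(x, degree=2):
    """All monomials of x with total degree <= degree, constant term first."""
    terms = []
    for k in range(degree + 1):
        for idx in combinations_with_replacement(range(len(x)), k):
            terms.append(math.prod(x[i] for i in idx))
    return terms


def average_expansion(frames, degree=2):
    """b̄_X: mean polynomial expansion of an utterance (a sequence of frames)."""
    expanded = [poly_expand(x, degree) for x in frames]
    return [sum(col) / len(expanded) for col in zip(*expanded)]


def diagonal_r(background_frames, degree=2):
    """Diagonal approximation of R = E[b(x) b(x)^T] over background frames (assumption)."""
    expanded = [poly_expand(x, degree) for x in background_frames]
    return [sum(v * v for v in col) / len(expanded) for col in zip(*expanded)]


def glds_kernel(bx, by, r_diag):
    """K(X, Y) = b̄_X^T R^{-1} b̄_Y with a diagonal R."""
    return sum(a * b / r for a, b, r in zip(bx, by, r_diag))


def collapse_support_vectors(support, alphas, labels, r_diag):
    """w = sum_i alpha_i t_i R^{-1} b̄_i, so the SVM output is f(Y) = w . b̄_Y + bias."""
    w = [0.0] * len(r_diag)
    for b, a, t in zip(support, alphas, labels):
        for j in range(len(w)):
            w[j] += a * t * b[j] / r_diag[j]
    return w
```

Scoring a test utterance then costs one expansion average and one inner product, independent of how many support vectors the SVM kept.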
Computational Auditory Scene Recognition
- In IEEE Int’l Conf. on Acoustics, Speech, and Signal Processing
, 2001
The Impact Of Speech Recognition On Speech Synthesis
, 2002
Cited by 17 (0 self)
Speech synthesis has changed dramatically in the past few years to have a corpus-based focus, borrowing heavily from advances in automatic speech recognition. In this paper, we survey technology in speech recognition systems and how it translates (or doesn't translate) to speech synthesis systems. We further speculate on future areas where ASR may impact synthesis and vice versa.
Speaker recognition with polynomial classifiers
- IEEE Transactions on Speech and Audio Processing
A Sequence Kernel and its Application to Speaker Recognition
- in Neural Information Processing Systems 14
, 2001
Cited by 9 (0 self)
A novel approach for comparing sequences of observations using an explicit-expansion kernel is demonstrated. The kernel is derived using the assumption of the independence of the sequence of observations and a mean-squared error training criterion. The use of an explicit-expansion kernel reduces classifier model size and computation dramatically, making both one hundred times smaller in our application. The explicit expansion also preserves the computational advantages of an earlier architecture based on mean-squared error training.
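One way to see how a frame-independence assumption and mean-squared error training lead to an explicit-expansion sequence kernel is the short derivation below. It is a sketch under stated assumptions (a polynomial expansion b(·), speaker frames given target 1 and background frames target 0, and R the expansion correlation accumulated over all training frames); the paper's own derivation may differ in normalization.

```latex
\begin{aligned}
&\text{Frame-independent scoring of } Y=\{y_1,\dots,y_N\}:\quad
  s(Y) = \frac{1}{N}\sum_{i=1}^{N} \mathbf{w}^{\top}\mathbf{b}(y_i) = \mathbf{w}^{\top}\bar{\mathbf{b}}_Y,\\
&\text{MSE training:}\quad
  \mathbf{w} = \arg\min_{\mathbf{w}} \sum_{x\in\text{spk}} \big(\mathbf{w}^{\top}\mathbf{b}(x)-1\big)^2
  + \sum_{x\in\text{bkg}} \big(\mathbf{w}^{\top}\mathbf{b}(x)\big)^2,\\
&\text{normal equations:}\quad
  R\,\mathbf{w} = \sum_{x\in\text{spk}} \mathbf{b}(x) = N_{\text{spk}}\,\bar{\mathbf{b}}_{X_{\text{spk}}},
  \qquad R = \sum_{x\in\text{spk}\cup\text{bkg}} \mathbf{b}(x)\,\mathbf{b}(x)^{\top},\\
&\text{so}\quad
  s(Y) = N_{\text{spk}}\,\bar{\mathbf{b}}_{X_{\text{spk}}}^{\top} R^{-1}\bar{\mathbf{b}}_Y
  \;\propto\; K(X_{\text{spk}},Y) = \bar{\mathbf{b}}_{X_{\text{spk}}}^{\top} R^{-1}\bar{\mathbf{b}}_Y .
\end{aligned}
```

Under these assumptions the whole speaker model is the single vector w, whose length is the number of expansion terms, rather than a stored set of support vectors.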
CipherVOX: Scalable Low-Complexity Speaker Verification
- in Proceedings of The IEEE International Conference on Acoustics, Speech and Signal Processing
, 2000
Cited by 3 (3 self)
Biometrics is gaining strong support for access control in the
Speaker Identification Using A Polynomial-Based Classifier
- in International Symposium on Signal Processing and its Applications
, 1999
Cited by 3 (2 self)
A new set of techniques for using polynomial-based classifiers for speaker identification is examined. This set of techniques makes application of polynomial classifiers practical for speaker identification by enabling discriminative training for large data sets. The training technique is shown to be invariant to fixed liftering and affine transforms of the feature space. Efficient methods for new class addition, low-complexity retraining, and identification across large populations are given. The method is illustrated by application to the YOHO database.
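The polynomial-classifier approach can be sketched as follows: each frame is expanded into monomials, one weight vector per speaker is trained by mean-squared error (target 1 for that speaker's frames, 0 for everyone else's), and identification picks the speaker with the highest average frame score. Because every speaker shares the same matrix R = sum_k R_k, adding a speaker only means accumulating its R_k and right-hand side, which hints at why class addition and retraining can be cheap. This is an illustrative pure-Python sketch: the degree-2 expansion, the small ridge term added for numerical stability, and the names `PolyIdentifier` and `solve` are assumptions, not details from the paper.

```python
import math
from itertools import combinations_with_replacement


def poly_expand(x, degree=2):
    """All monomials of x with total degree <= degree, constant term first."""
    return [math.prod(x[i] for i in idx)
            for k in range(degree + 1)
            for idx in combinations_with_replacement(range(len(x)), k)]


def solve(A, b):
    """Solve A w = b by Gaussian elimination with partial pivoting (A is not modified)."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        w[r] = (M[r][n] - sum(M[r][k] * w[k] for k in range(r + 1, n))) / M[r][r]
    return w


class PolyIdentifier:
    def __init__(self, degree=2, ridge=1e-6):
        self.degree = degree
        self.ridge = ridge   # hypothetical regularizer for numerical stability
        self.R = None        # shared matrix: sum over all classes of p p^T
        self.rhs = {}        # per class: sum of p over that class's frames (M_k^T 1)

    def add_class(self, name, frames):
        """Adding a class only accumulates its statistics; no other class is touched."""
        P = [poly_expand(x, self.degree) for x in frames]
        n = len(P[0])
        if self.R is None:
            self.R = [[0.0] * n for _ in range(n)]
        for p in P:
            for i in range(n):
                for j in range(n):
                    self.R[i][j] += p[i] * p[j]
        self.rhs[name] = [sum(p[i] for p in P) for i in range(n)]

    def models(self):
        """MSE weights w_k = R^{-1} M_k^T 1 for every class, sharing one R."""
        n = len(self.R)
        A = [[self.R[i][j] + (self.ridge if i == j else 0.0) for j in range(n)] for i in range(n)]
        return {k: solve(A, v) for k, v in self.rhs.items()}

    def identify(self, frames):
        """Return the class whose model gives the highest average frame score."""
        ws = self.models()
        P = [poly_expand(x, self.degree) for x in frames]

        def score(w):
            return sum(sum(a * b for a, b in zip(w, p)) for p in P) / len(P)

        return max(ws, key=lambda k: score(ws[k]))
```

A production system would cache the factorization of R instead of re-solving on every call; the sketch keeps everything explicit for readability.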
Speaker identification in the presence of room reverberation
- in Proc. IEEE Biometrics Symp
, 2007
Cited by 2 (1 self)
Speaker identification (SI) systems based on Gaussian Mixture Models (GMMs) have demonstrated high levels of accuracy when both training and testing signals are acquired in near-ideal conditions. These same systems, when trained and tested with signals acquired over non-ideal channels such as the telephone, have been shown to have markedly lower accuracy. In this paper, we consider a reverberant test environment and its impact on SI. We measure the degradation in SI accuracy when the system is trained with clean signals but tested with reverberant signals. Next, we propose a method whereby training signals are first filtered with a family of reverberation filters prior to construction of speaker models; the reverberation filters are designed to approximate the expected test room reverberation. Reverberant test signals are then scored against the family of speaker models, and an identification decision is made. Our research demonstrates that by approximating test room reverberation in the training signals, the channel mismatch problem can be reduced and SI accuracy increased.
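The training-side idea (filter clean training speech with a family of reverberation filters, build one model per filter, then score reverberant test speech against the whole family) can be sketched as below. The paper designs its filters to approximate the expected test room; here, purely as an assumption, each filter is a synthetic exponentially decaying noise impulse response whose amplitude falls by 60 dB after `rt60` seconds. `train_fn` and `score_fn` stand in for any speaker-modeling back end (for example a GMM), and taking the best score over each speaker's family is a hypothetical decision rule, not necessarily the paper's.

```python
import math
import random


def synthetic_rir(rt60, fs, seed=0):
    """Hypothetical room impulse response: white noise with an exponential envelope.

    The envelope exp(-ln(1000) * n / (rt60 * fs)) reaches 10^-3 (-60 dB) at n = rt60 * fs.
    """
    n_taps = max(1, int(rt60 * fs))
    rng = random.Random(seed)
    decay = math.log(1000.0) / (rt60 * fs)
    h = [rng.gauss(0.0, 1.0) * math.exp(-decay * n) for n in range(n_taps)]
    h[0] = 1.0  # direct-path component
    return h


def convolve(x, h):
    """Full linear convolution y = x * h (length len(x) + len(h) - 1)."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


def train_reverberant_models(train_signals, rt60s, fs, train_fn):
    """For each speaker, train one model per reverberation filter in the family."""
    return {
        speaker: [train_fn(convolve(signal, synthetic_rir(rt60, fs, seed=k)))
                  for k, rt60 in enumerate(rt60s)]
        for speaker, signal in train_signals.items()
    }


def identify(test_signal, models, score_fn):
    """Pick the speaker whose best-matching reverberant model scores highest (assumed rule)."""
    return max(models, key=lambda spk: max(score_fn(test_signal, m) for m in models[spk]))
```

Direct convolution is quadratic in signal length; for real audio an FFT-based convolution would be the usual choice.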
Segmental approaches for automatic speaker verification
- Digital Signal Processing
, 2000
Cited by 2 (2 self)
Speech is composed of different sounds (acoustic segments). Speakers differ in their pronunciation of these sounds. The segmental approaches described in this paper are meant to exploit these differences for speaker verification purposes. For such approaches, the speech is divided into different classes, and the speaker modeling is done for each class. The speech segmentation applied is based on automatic language-independent speech processing tools that provide a segmentation of the speech requiring neither phonetic nor orthographic transcriptions of the speech data. Two different speaker modeling approaches, based on multilayer perceptrons (MLPs) and on Gaussian mixture models (GMMs), are studied. The MLP-based segmental systems have performance comparable to that of the global MLP-based systems, and in the mismatched train-test conditions slightly better results are obtained with the segmental MLP system. The segmental GMM systems gave poorer results than the equivalent global GMM systems.
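A segmental verification score can be sketched as follows: each frame carries a segment-class label, a speaker model and a world (background) model are kept per class, and each frame is scored only by the models of its own class. The labels are assumed to come from an automatic segmenter like the one described above; single 1-D Gaussians stand in for the paper's MLP or GMM class models, and averaging the per-frame log-likelihood ratios over all frames is an assumed fusion rule. All names are hypothetical.

```python
import math
from collections import defaultdict


def gauss_fit(values, var_floor=1e-3):
    """Mean and floored variance of a list of scalar features."""
    mu = sum(values) / len(values)
    var = max(sum((v - mu) ** 2 for v in values) / len(values), var_floor)
    return mu, var


def gauss_loglik(x, mu, var):
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mu) ** 2 / var)


def train_segmental(frames, labels):
    """One model per segment class, trained only on frames carrying that label."""
    by_class = defaultdict(list)
    for x, c in zip(frames, labels):
        by_class[c].append(x)
    return {c: gauss_fit(v) for c, v in by_class.items()}


def segmental_llr(frames, labels, speaker, world):
    """Average per-frame LLR, each frame scored by its own class's models.

    Frames whose class has no speaker or world model are skipped.
    """
    total, n = 0.0, 0
    for x, c in zip(frames, labels):
        if c in speaker and c in world:
            total += gauss_loglik(x, *speaker[c]) - gauss_loglik(x, *world[c])
            n += 1
    return total / n if n else 0.0
```

Weighting every frame equally means classes that occur more often dominate the score; per-class weights would be an alternative fusion choice.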