Automatic speaker recognition using gaussian mixture speaker models, (1995)

by D A Reynolds
Venue:Lincoln Lab. J.
Results 1 - 10 of 41

Speaker verification using Adapted Gaussian mixture models

by Douglas A. Reynolds, Thomas F. Quatieri, Robert B. Dunn - Digital Signal Processing, 2000
Cited by 1010 (42 self)
In this paper we describe the major elements of MIT Lincoln Laboratory’s Gaussian mixture model (GMM)-based speaker verification system used successfully in several NIST Speaker Recognition Evaluations (SREs). The system is built around the likelihood ratio test for verification, using simple but effective GMMs for likelihood functions, a universal background model (UBM) for alternative speaker representation, and a form of Bayesian adaptation to derive speaker models from the UBM. The development and use of a handset detector and score normalization to greatly improve verification performance is also described and discussed. Finally, representative performance benchmarks and system behavior experiments on NIST SRE corpora are presented. © 2000 Academic Press
Key Words: speaker recognition; Gaussian mixture models; likelihood ratio detector; universal background model; handset normalization; NIST evaluation.

Citation Context

...-independent speaker identification was first described in [1–3]. An extension of GMM-based systems to speaker verification was described and evaluated on several publicly available speech corpora in [4, 5]. In more recent years, GMM-based systems have been applied to the annual NIST Speaker Recognition Evaluations (SRE). These systems, fielded by different sites, have consistently produced state-of-the...
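The likelihood ratio test named in the abstract above can be illustrated with a small sketch. It assumes diagonal-covariance GMMs and scores an utterance by the average per-frame log-likelihood ratio between a speaker model and the UBM. The function names, the toy model parameters, and the threshold are hypothetical and are not taken from the paper; the Bayesian adaptation step that derives the speaker model from the UBM is not shown.

```python
import math

def log_gmm(x, weights, means, variances):
    """Log density of a diagonal-covariance GMM at frame x, via log-sum-exp."""
    terms = []
    for w, mean, var in zip(weights, means, variances):
        log_b = sum(-0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
                    for xi, m, v in zip(x, mean, var))
        terms.append(math.log(w) + log_b)
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))

def llr_score(frames, speaker, ubm):
    """Average per-frame log-likelihood ratio of speaker model vs. UBM."""
    total = sum(log_gmm(x, *speaker) - log_gmm(x, *ubm) for x in frames)
    return total / len(frames)

# Toy 1-D models as (weights, means, variances); values are illustrative only.
ubm = ([0.5, 0.5], [[-1.0], [1.0]], [[1.0], [1.0]])
speaker = ([0.5, 0.5], [[-1.0], [2.0]], [[1.0], [1.0]])
frames = [[1.9], [2.1], [2.0]]

THRESHOLD = 0.0  # hypothetical decision threshold, normally tuned on held-out data
accept = llr_score(frames, speaker, ubm) > THRESHOLD
```

Working in the log domain with log-sum-exp avoids underflow when frame likelihoods are very small, which is the usual reason for scoring this way.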

Generalized Linear Discriminant Sequence Kernels For Speaker Recognition

by William M. Campbell, 2002
Cited by 95 (23 self)
Support Vector Machines have recently shown dramatic performance gains in many application areas. We show that the same gains can be realized in the area of speaker recognition via sequence kernels. A sequence kernel provides a numerical comparison of speech utterances as entire sequences rather than a probability at the frame level. We introduce a novel sequence kernel derived from generalized linear discriminants. The kernel has several advantages. First, the kernel uses an explicit expansion into "feature space"--this property allows all of the support vectors to be collapsed into a single vector creating a small speaker model. Second, the kernel retains the computational advantage of generalized linear discriminants trained using mean-squared error training. Finally, the kernel shows dramatic reductions in equal error rates over standard mean-squared error training in matched and mismatched conditions on a NIST speaker recognition task.
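The "collapse into a single vector" property mentioned in the abstract follows from linearity in an explicit feature space, and a small sketch can make that concrete. The code below assumes a degree-2 monomial expansion averaged over frames and a plain inner product as the kernel; the paper's actual kernel may include a normalization that is omitted here, and all function names are hypothetical.

```python
from itertools import combinations_with_replacement

def expand(frame, degree=2):
    """Monomials of the frame's components up to the given degree."""
    terms = [1.0]
    for d in range(1, degree + 1):
        for idx in combinations_with_replacement(range(len(frame)), d):
            prod = 1.0
            for i in idx:
                prod *= frame[i]
            terms.append(prod)
    return terms

def sequence_vector(frames, degree=2):
    """Average expansion over all frames: one vector per utterance."""
    expanded = [expand(f, degree) for f in frames]
    return [sum(col) / len(frames) for col in zip(*expanded)]

def kernel(u, v):
    """Plain inner product in the explicit feature space (an assumption)."""
    return sum(a * b for a, b in zip(u, v))

def collapse(support_vectors, coefficients):
    """w = sum_i c_i * b_i, so that sum_i c_i * K(b_i, b) equals K(w, b)."""
    return [sum(c * sv[k] for c, sv in zip(coefficients, support_vectors))
            for k in range(len(support_vectors[0]))]
```

Because the kernel is an inner product of explicit vectors, scoring a new utterance against the collapsed vector costs one inner product regardless of how many support vectors there were.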

Computational Auditory Scene Recognition

by Vesa Peltonen - In IEEE Int’l Conf. on Acoustics, Speech, and Signal Processing, 2001
Cited by 67 (1 self)
Abstract not found

Citation Context

... distributions with multiple modes or distributions with nonlinear correlation. A Gaussian mixture density is able to approximate an arbitrary pdf with a weighted sum of N multivariate Gaussian pdf’s [Reynolds95]. The Gaussian mixture density with a model order M is given by $p(x \mid \lambda) = \sum_{i=1}^{M} p_i b_i(x)$ (27), where x is a d-dimensional random vector, $b_i(x)$ are the M Gaussian pdf’s, and $p_i$ are the M mixture w...
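As a minimal sketch of the mixture density in the excerpt above, the following evaluates a weighted sum of M Gaussian pdf's at a point x. It assumes diagonal covariance matrices for simplicity, which the excerpt does not state; the function names and toy parameters are illustrative only.

```python
import math

def gaussian_pdf_diag(x, mean, var):
    """Density of a d-dimensional Gaussian with diagonal covariance."""
    log_p = 0.0
    for xi, mi, vi in zip(x, mean, var):
        log_p += -0.5 * (math.log(2.0 * math.pi * vi) + (xi - mi) ** 2 / vi)
    return math.exp(log_p)

def gmm_pdf(x, weights, means, variances):
    """Weighted sum of M component densities: sum_i p_i * b_i(x)."""
    return sum(w * gaussian_pdf_diag(x, m, v)
               for w, m, v in zip(weights, means, variances))

# Toy two-component model in two dimensions; the weights sum to 1.
weights = [0.6, 0.4]
means = [[0.0, 0.0], [3.0, 3.0]]
variances = [[1.0, 1.0], [1.0, 1.0]]
density_at_origin = gmm_pdf([0.0, 0.0], weights, means, variances)
```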

The Impact Of Speech Recognition On Speech Synthesis

by Mari Ostendorf, Ivan Bulyko, 2002
Cited by 17 (0 self)
Speech synthesis has changed dramatically in the past few years to have a corpus-based focus, borrowing heavily from advances in automatic speech recognition. In this paper, we survey technology in speech recognition systems and how it translates (or doesn't translate) to speech synthesis systems. We further speculate on future areas where ASR may impact synthesis and vice versa.

Speaker recognition with polynomial classifiers

by W Campbell, K Assaleh, C Broun - IEEE Transactions on Speech and Audio Processing
Cited by 16 (0 self)
Abstract not found

A Sequence Kernel and its Application to Speaker Recognition

by William M. Campbell - in Neural Information Processing Systems 14, 2001
Cited by 9 (0 self)
A novel approach for comparing sequences of observations using an explicit-expansion kernel is demonstrated. The kernel is derived using the assumption of the independence of the sequence of observations and a mean-squared error training criterion. The use of an explicit expansion kernel reduces classifier model size and computation dramatically, resulting in model sizes and computation one-hundred times smaller in our application. The explicit expansion also preserves the computational advantages of an earlier architecture based on mean-squared error training.

Citation Context

...endent recognition implies that knowledge of the text of the speech data is not used. Traditional methods for text-independent speaker recognition are vector quantization [8], Gaussian mixture models [9], and artificial neural networks [8]. A state-of-the-art approach based on polynomial classifiers was presented in [7]. The polynomial approach has several advantages over traditional methods--1) it i...

CipherVOX: Scalable Low-Complexity Speaker Verification

by B. A. Fette, C. C. Broun, W. M. Campbell, C. Jaskie - in Proceedings of The IEEE International Conference on Acoustics, Speech and Signal Processing, 2000
Cited by 3 (3 self)
Biometrics is gaining strong support for access control in the

Citation Context

...al methods are used to model the speaker’s speech data from the feature extraction phase. Two of the most popular approaches are the Hidden Markov Model (HMM) [2] and the Gaussian Mixture Model (GMM) [3]. More recently, discriminative classification techniques employing artificial neural networks, such as neural tree networks (NTN) [4], have been applied to the problem. In order to provide the best p...

Speaker Identification Using A Polynomial-Based Classifier

by K. T. Assaleh, W. M. Campbell - in International Symposium on Signal Processing and its Applications, 1999
Cited by 3 (2 self)
A new set of techniques for using polynomial-based classifiers for speaker identification is examined. This set of techniques makes application of polynomial classifiers practical for speaker identification by enabling discriminative training for large data sets. The training technique is shown to be invariant to fixed liftering and affine transforms of the feature space. Efficient methods for new class addition, low-complexity retraining, and identification across large populations are given. The method is illustrated by application to the YOHO database.

Speaker identification in the presence of room reverberation

by Phillip L De Leon, Audrey L Trevizo - in Proc. IEEE Biometrics Symp, 2007
Cited by 2 (1 self)
Speaker identification (SI) systems based on Gaussian Mixture Models (GMMs) have demonstrated high levels of accuracy when both training and testing signals are acquired in near ideal conditions. These same systems when trained and tested with signals acquired under non-ideal channels such as telephone have been shown to have markedly lower accuracy levels. In this paper, we consider a reverberant test environment and its impact on SI. We measure the degradation in SI accuracy when the system is trained with clean signals but tested with reverberant signals. Next, we propose a method whereby training signals are first filtered with a family of reverberation filters prior to construction of speaker models; the reverberation filters are designed to approximate expected test room reverberation. Reverberant test signals are then scored against the family of speaker models and identification is made. Our research demonstrates that by approximating test room reverberation in the training signals, the channel mismatch problem can be reduced and SI accuracy increased.

Citation Context

...y the total number of tests. Using the TIMIT corpus (630 speakers, clean speech), approximately 24 s training signals/6 s test signals, our SI system (29 × 1 feature vector and 32 component GMM) has 99.68% accuracy which agrees closely with that published in recent literature [3]. One well-known problem in both SI and SV is the loss of accuracy when channel distortions such as those from the telephone are present in the speech signals. For SI systems which use the NTIMIT corpus (630 speakers, telephone-quality speech), accuracy of approximately 70% (30% lower than with TIMIT) has been reported [4]. When using the NIST 1999 speaker recognition evaluation corpus (230 speakers, telephone-quality speech), SI accuracy of approximately 83% has been reported [3]. For signals which come from cellular telephones, two distortions can be present: distortions due to speech coding and distortions due to packet loss. In [5], the authors passed TIMIT signals through Global System for Mobile (GSM) speech coders and measured SI accuracy of approximately 60% (40% lower compared to TIMIT). In [6], the authors considered the problem whereby a SI system is trained with clean speech but tested with speech a...
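The training-side step described in this entry's abstract, filtering clean training signals with a family of reverberation filters before building speaker models, can be sketched as follows. The impulse response used here (a geometrically decaying tail) is a toy stand-in chosen for illustration and is not the filter design from the paper; all function names and parameter values are hypothetical.

```python
def convolve(signal, impulse_response):
    """Direct-form linear convolution of two finite sequences."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out

def toy_reverb_filter(decay, length):
    """Hypothetical impulse response: unit direct path plus a geometric tail."""
    return [decay ** n for n in range(length)]

def reverberant_training_set(signal, decays, length=4):
    """One filtered copy of the training signal per reverberation filter."""
    return {d: convolve(signal, toy_reverb_filter(d, length)) for d in decays}

# Each filtered copy would then be used to train its own speaker model.
clean = [1.0, 0.0, -1.0, 0.0]
family = reverberant_training_set(clean, decays=[0.3, 0.6, 0.9])
```

At test time, a reverberant utterance would be scored against every model in the family, which is the part of the method this sketch leaves out.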

Segmental approaches for automatic speaker verification

by Dijana Petrovska-Delacrétaz, Jan Černocký, Jean Hennebert, Gérard Chollet - Digital Signal Processing, 2000
Cited by 2 (2 self)
Speech is composed of different sounds (acoustic segments). Speakers differ in their pronunciation of these sounds. The segmental approaches described in this paper are meant to exploit these differences for speaker verification purposes. For such approaches, the speech is divided into different classes, and the speaker modeling is done for each class. The speech segmentation applied is based on automatic language independent speech processing tools that provide a segmentation of the speech requiring neither phonetic nor orthographic transcriptions of the speech data. Two different speaker modeling approaches, based on multilayer perceptrons (MLPs) and on Gaussian mixture models (GMMs), are studied. The MLP-based segmental systems have performance comparable to that of the global MLP-based systems, and in the mismatched train-test conditions slightly better results are obtained with the segmental MLP system. The segmental GMM systems gave poorer results than the equivalent global GMM systems.

Citation Context

...nemes. As we have no indications about the correspondence of the ALISP units with the usual phone units, we decided to choose, as a first experiment, not as many ALISP classes as phonetic units to ensure a proper training of all the ALISP classes. This is the reason the number of speech classes is set to 8 for all the experiments presented in this paper. 2.2. Speaker Modeling The classical way to do pattern classification in text-independent systems is to assign a unique probability density function (pdf) to the whole vector sequence. One way to build the pdf is to use Gaussian mixture models [26] in which the multivariate distribution is modeled with a weighted sum of Gaussian distributions. Another way to perform classification is to use artificial neural networks (ANNs) [16]. In previous studies [2, 11, 18, 20], ANNs have successfully been used for speaker verification. Among the different ANN architectures, multilayer perceptrons (MLPs) are often used. As explained in [3, 14, 28], the main advantages of MLPs include a discrimination-based learning procedure, a flexible architecture that permits easy use of contextual information, and weaker hypotheses about statistical distribution...
