Speaker Verification Using Adapted Gaussian Mixture Models (2000)
Venue: Digital Signal Processing
Citations: 1010 (42 self)
Citations
11964 | Maximum Likelihood from Incomplete Data via the EM Algorithm
- Dempster, Laird, et al.
- 1977
Citation Context ...diagonal matrix GMMs outperform full matrix GMMs. Given a collection of training vectors, maximum likelihood model parameters are estimated using the iterative expectation–maximization (EM) algorithm [17]. The EM algorithm iteratively refines the GMM parameters to monotonically increase the likelihood of the estimated model for the observed feature vectors, i.e., for iterations k and k + 1, p(X | λ (k...
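The monotonic-likelihood property quoted in this excerpt can be checked on a toy problem. The following is a minimal sketch, not the paper's implementation: it assumes a one-dimensional, two-component GMM, and the helper names (`gauss_pdf`, `em_step`, `run_em`, `toy_data`) and the synthetic data are all hypothetical.

```python
import math
import random


def gauss_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def log_likelihood(data, weights, means, variances):
    """log p(X | lambda) for a 1-D GMM, assuming independent observations."""
    return sum(
        math.log(sum(w * gauss_pdf(x, m, v) for w, m, v in zip(weights, means, variances)))
        for x in data
    )


def em_step(data, weights, means, variances):
    """One EM iteration: posteriors (E-step), then re-estimated parameters (M-step)."""
    resp = []
    for x in data:
        joint = [w * gauss_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
        norm = sum(joint)
        resp.append([j / norm for j in joint])
    new_w, new_m, new_v = [], [], []
    for i in range(len(weights)):
        n_i = sum(r[i] for r in resp)  # soft count for component i
        mean_i = sum(r[i] * x for r, x in zip(resp, data)) / n_i
        var_i = sum(r[i] * (x - mean_i) ** 2 for r, x in zip(resp, data)) / n_i
        new_w.append(n_i / len(data))
        new_m.append(mean_i)
        new_v.append(var_i)
    return new_w, new_m, new_v


def run_em(data, weights, means, variances, n_iter=5):
    """Run n_iter EM steps; return the log-likelihood before and after each step."""
    history = [log_likelihood(data, weights, means, variances)]
    for _ in range(n_iter):
        weights, means, variances = em_step(data, weights, means, variances)
        history.append(log_likelihood(data, weights, means, variances))
    return history


def toy_data(seed=0, n_per_cluster=100):
    """Synthetic two-cluster data (an assumption made only for this illustration)."""
    rng = random.Random(seed)
    return ([rng.gauss(0.0, 1.0) for _ in range(n_per_cluster)]
            + [rng.gauss(5.0, 1.0) for _ in range(n_per_cluster)])
```

Calling `run_em(toy_data(), [0.5, 0.5], [-1.0, 6.0], [1.0, 1.0])` returns six log-likelihood values that should never decrease from one iteration to the next, up to floating-point rounding.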
3781 |
Introduction to Statistical Pattern Recognition
- Fukunaga
- 1990
Citation Context ...adapted from the gender-independent UBM described in Section 3.3. Each model was adapted with two minutes of data and only means were adapted (α_i^w = α_i^v = 0). We computed the Bhattacharyya distance [29] between corresponding Gaussians in the speaker model and UBM and counted the number of zero distances (those Gaussians which were unchanged in adaptation). For the male speakers, we found that 24% of...
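The zero-distance count described in this excerpt can be sketched as follows. This is an illustration under assumptions: it uses the standard closed form of the Bhattacharyya distance between two Gaussians, specialized to diagonal covariances, and the function names and the tolerance `tol` are hypothetical rather than taken from the paper.

```python
import math


def bhattacharyya_diag(mean1, var1, mean2, var2):
    """Bhattacharyya distance between two Gaussians with diagonal covariances.

    Each argument is a list with one entry per feature dimension.
    """
    dist = 0.0
    for m1, v1, m2, v2 in zip(mean1, var1, mean2, var2):
        v = 0.5 * (v1 + v2)                               # averaged variance
        dist += 0.125 * (m1 - m2) ** 2 / v                # mean-separation term
        dist += 0.5 * math.log(v / math.sqrt(v1 * v2))    # variance-mismatch term
    return dist


def count_unchanged(speaker, ubm, tol=1e-12):
    """Count corresponding Gaussians whose distance is (numerically) zero.

    speaker and ubm are lists of (mean, variance) pairs in matching order.
    """
    return sum(
        1
        for (m1, v1), (m2, v2) in zip(speaker, ubm)
        if bhattacharyya_diag(m1, v1, m2, v2) <= tol
    )
```

A small tolerance is used instead of an exact comparison with zero because the logarithm term can pick up rounding error even when the two Gaussians are identical.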
704 | Maximum a posteriori estimation for multivariate Gaussian mixture observations of Markov chains
- Gauvain, Lee
- 1994
Citation Context ...aptation of Speaker Model In the GMM-UBM system, we derive the hypothesized speaker model by adapting the parameters of the UBM using the speaker’s training speech and a form of Bayesian adaptation [18, 27]. Unlike the standard approach of maximum likelihood training of a model for the speaker independently of the UBM, the basic idea in the adaptation approach is to derive the speaker’s model by updatin...
631 |
Robust text-independent speaker identification using Gaussian mixture models
- Reynolds, Rose
- 1995
Citation Context ...ature vectors, i.e., for iterations k and k + 1, p(X | λ^(k+1)) > p(X | λ^(k)). Generally, five iterations are sufficient for parameter convergence. The EM equations for training a GMM can be found in [3, 18]. As discussed later, parameters for 5 GMMs with M > 1 using diagonal covariance matrices can model distributions of feature vectors with correlated elements. Only in the degenerate case of M = 1 is the...
368 |
Speaker Identification and Verification using Gaussian Mixture Speaker Models. Speech Communication
- Reynolds
- 1995
Citation Context ...-independent speaker identification was first described in [1–3]. An extension of GMM-based systems to speaker verification was described and evaluated on several publicly available speech corpora in [4, 5]. In more recent years, GMM-based systems have been applied to the annual NIST Speaker Recognition Evaluations (SRE). These systems, fielded by different sites, have consistently produced state-of-the...
364 | The DET curve in assessment of detection task performance
- Martin, Doddington, et al.
- 1997
Citation Context ...er the two sets of scores and the probability of miss and probability of false alarm are computed for each threshold. The error probabilities are then plotted as detection error tradeoff (DET) curves [35] to show system performance. The first set of experiments, which examined the composition of the UBM, were conducted on the 1998 summer-development data. Training data were selected from the 1997 SRE ...
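The threshold sweep mentioned in this excerpt can be sketched as below. This is a hedged illustration, not the evaluation software: the function name `det_points` is hypothetical, the rule "accept when score >= threshold" is an assumption, and the normal-deviate axis scaling used when drawing an actual DET curve is left out.

```python
def det_points(target_scores, impostor_scores):
    """Return (threshold, P_miss, P_false_alarm) for every observed score.

    A trial is accepted when its score is >= the threshold (assumed rule).
    """
    thresholds = sorted(set(target_scores) | set(impostor_scores))
    points = []
    for th in thresholds:
        # Miss: a true-speaker trial that falls below the threshold.
        p_miss = sum(s < th for s in target_scores) / len(target_scores)
        # False alarm: an impostor trial at or above the threshold.
        p_fa = sum(s >= th for s in impostor_scores) / len(impostor_scores)
        points.append((th, p_miss, p_fa))
    return points
```

For example, with target scores `[2.0, 3.0]` and impostor scores `[0.0, 1.0]`, the threshold 2.0 separates the two sets perfectly, so both error probabilities are zero at that point.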
126 |
Comparison of Background Normalization Methods for Text-Independent Speaker Verification
- Reynolds
- 1997
Citation Context ...ion Evaluations (SRE). These systems, fielded by different sites, have consistently produced state-of-the-art performance [6, 7]. In particular, a GMM-based system developed by MIT Lincoln Laboratory [8], employing Bayesian adaptation of speaker models from a universal background model and handset-based score normalization, has been the basis of the top performing systems in the NIST SREs since 1996....
104 |
Experimental evaluation of features for robust speaker identification
- Reynolds
- 1994
Citation Context ...ynomial temporal fit over ±2 feature vectors (two to the left and two to the right over time) from the current vector [21]. The choice of features is based on previous good performance and results in [22] comparing several standard speech features for speaker identification. Finally, the feature vectors are channel normalized to remove linear channel convolutional effects. Since we are using cepstral ...
99 |
A Gaussian mixture modeling approach to text-independent speaker identification
- Reynolds
- 1992
Citation Context ...n components can be considered to be modeling the underlying broad phonetic sounds that characterize a person’s voice. A more detailed discussion of how GMMs apply to speaker modeling can be found in [2, 3]. The advantages of using a GMM as the likelihood function are that it is computationally inexpensive, is based on a well-understood statistical model, and, for text-independent tasks, is insensitive ...
98 |
On the use of Instantaneous and Transitional Spectral Information in Speaker Recognition
- Soong, E
- 1988
Citation Context ...processing. Finally, delta cepstra are computed using a first order orthogonal polynomial temporal fit over ±2 feature vectors (two to the left and two to the right over time) from the current vector [21]. The choice of features is based on previous good performance and results in [22] comparing several standard speech features for speaker identification. Finally, the feature vectors are channel norma...
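A first-order polynomial fit over ±2 frames reduces to the least-squares slope through five points, which gives the delta computation sketched below. This is a minimal sketch under assumptions: the function name `delta_features` is hypothetical, and repeating the first and last frame at the utterance edges is an illustrative choice that the excerpt does not specify.

```python
def delta_features(frames, half_window=2):
    """Delta coefficients as the regression slope over +/- half_window frames.

    frames is a list of feature vectors (lists of floats), one per time step.
    Edge frames are handled by repeating the first/last frame (an assumption).
    """
    n = len(frames)
    dim = len(frames[0])
    offsets = range(-half_window, half_window + 1)
    denom = sum(k * k for k in offsets)  # 10 when half_window == 2
    deltas = []
    for t in range(n):
        row = []
        for d in range(dim):
            acc = 0.0
            for k in offsets:
                idx = min(max(t + k, 0), n - 1)  # clamp to valid frame indices
                acc += k * frames[idx][d]
            row.append(acc / denom)
        deltas.append(row)
    return deltas
```

For a coefficient that rises linearly by 3 per frame, the delta at any interior frame is 3, and for a constant coefficient it is 0.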
78 |
The use of cohort normalized scores for speaker verification
- Rosenberg, DeLong, et al.
- 1992
Citation Context ... approach is to use a set of other speaker models to cover the space of the alternative hypothesis. In various contexts, this set of other speakers has been called likelihood ratio sets [10], cohorts [11], and background speakers [4]. Given a set of N background speaker models {λ_1, ..., λ_N}, the alternative hypothesis model is represented by p(X | λ_hyp) = F(p(X | λ_1), ..., p(X | λ_N)), (3) where F(...
64 |
RASTA-PLP speech analysis technique.
- Hermansky, Morgan, et al.
- 1992
Citation Context ... to remove linear channel convolutional effects. Since we are using cepstral features, linear convolutional effects appear as additive biases. Both cepstral mean subtraction (CMS) and RASTA filtering [23] have been used successfully and, in general, both methods have comparable performance for single speaker detection tasks. When training and recognition speech are collected from different microphones...
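Because a linear channel shows up as an additive bias in the cepstral domain, subtracting the per-utterance mean removes it. The sketch below illustrates only CMS (not RASTA filtering); the function name is hypothetical and whole-utterance averaging is an assumption.

```python
def cepstral_mean_subtraction(frames):
    """Subtract the per-dimension mean, computed over the whole utterance.

    frames is a list of cepstral vectors (lists of floats), one per time step.
    """
    n = len(frames)
    dim = len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dim)]
    return [[f[d] - means[d] for d in range(dim)] for f in frames]
```

Adding the same constant vector to every frame, as a fixed channel would, leaves the output unchanged, which is the property the excerpt relies on.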
56 |
HTIMIT and LLHDB: Speech corpora for the study of handset transducer effects.
- Reynolds
- 1997
Citation Context ...n electret microphone handset (ELEC). The handset detector is a simple maximum likelihood classifier in which handset dependent GMMs were trained using the Lincoln Laboratory Handset Database (LLHDB) [31, 32]. A 1024 mixture GMM was trained using speech from 40 speakers spoken over two carbon-button microphone handsets and another 1024 mixture GMM was trained using speech from the same 40 speakers spoken ...
53 |
Speaker verification using randomized phrase prompting.
- Higgins, Bahler, et al.
- 1992
Citation Context ...ing. The first approach is to use a set of other speaker models to cover the space of the alternative hypothesis. In various contexts, this set of other speakers has been called likelihood ratio sets [10], cohorts [11], and background speakers [4]. Given a set of N background speaker models {λ_1, ..., λ_N}, the alternative hypothesis model is represented by p(X | λ_hyp) = F(p(X | λ_1), ..., p(X | λ_N))...
47 |
Speaker background models for connected digit password speaker verification.
- Rosenberg, Parthasarathy
- 1996
Citation Context ... a single model, λbkg, is trained to represent the alternative hypothesis. Research on this approach has focused on selection and composition of the speakers and speech used to train the single model [14, 15]. The main advantage of this approach is that a single speaker-independent model can be trained once for a particular task and then used for all hypothesized speakers in that task. It is also possible...
41 | Automatic speaker recognition using Gaussian mixture speaker models
- Reynolds
- 1995
Citation Context ...-independent speaker identification was first described in [1–3]. An extension of GMM-based systems to speaker verification was described and evaluated on several publicly available speech corpora in [4, 5]. In more recent years, GMM-based systems have been applied to the annual NIST Speaker Recognition Evaluations (SRE). These systems, fielded by different sites, have consistently produced state-of-the...
37 | Handset-Dependent Background Models for Robust Text-Independent Speaker Recognition. Submitted to
- Heck, Weintraub
- 1997
Citation Context ...t model can be trained once for a particular task and then used for all hypothesized speakers in that task. It is also possible to use multiple background models tailored to specific sets of speakers [15, 16]. In this paper we will use a single background model for all hypothesized speakers and we refer to this as the universal background model (UBM). 3. GMM-UBM VERIFICATION SYSTEM Given the canonical fra...
36 |
The Effects of Handset Variability on Speaker Recognition Performance: Experiments on the Switchboard Corpus
- Reynolds
- 1996
Citation Context ...microphones or channels (e.g., different telephone handsets and/or lines), this is a crucial step for achieving good recognition accuracy. However, as seen in several NIST SRE results and reported in [24], this linear compensation does not completely eliminate the performance loss under mismatched microphone conditions. In this paper, we describe one approach to address this remaining mismatch using a...
28 |
Magnitude-only estimation of handset nonlinearity with application to speaker recognition
- Quatieri, Reynolds, et al.
- 1998
Citation Context ...pecifically for differences in microphone nonlinearities across train and test data is to operate on the waveform with nonlinear transformations, rather than adjusting the log-likelihood ratio scores [25]. 3.3. Universal Background Model In the GMM-UBM system we use a single, speaker-independent background model to represent p(X | λ_hyp). The UBM is a large GMM trained to represent the speaker-indepen...
24 |
The effect of telephone transmission degradation on speaker recognition performance. Acoustics, Speech, and Signal Processing
- Reynolds, Zissman, et al.
- 1995
Citation Context ...ng addresses linear channel effects, but there is evidence that handset transducer effects are nonlinear in nature and are thus difficult to remove from the features prior to training and recognition [25, 30]. Because the handset effects remain in the features, the speaker’s model will represent the speaker’s acoustic characteristics coupled with the distortions caused by the handset from which the traini...
23 |
Text independent speaker identification using automatic acoustic segmentation
- Rose, Reynolds
- 1990
Citation Context ...om the hypothesized speaker S and H1: Y is not from the hypothesized speaker S. The optimum test to decide between these two hypotheses is a likelihood ratio test given by p(Y | H0) / p(Y | H1) ≥ θ: accept H0; p(Y | H0) / p(Y | H1) < θ: reject H0, (1) where p(Y | Hi), i = 0, 1, is the probability density function for the hypothesis Hi evaluated for the observed speech segment Y, also referred to as the likelihood of the hypothes...
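The decision rule in Eq. (1) is usually evaluated in the log domain to avoid numerical underflow. The following is a minimal sketch of that rule; the function name `accept_h0` and the choice to pass log-likelihoods as inputs are assumptions made for illustration.

```python
import math


def accept_h0(log_p_h0, log_p_h1, theta):
    """Likelihood ratio test of Eq. (1), applied to log-likelihoods.

    Accept H0 when p(Y | H0) / p(Y | H1) >= theta, which is equivalent to
    log p(Y | H0) - log p(Y | H1) >= log(theta) for theta > 0.
    """
    return (log_p_h0 - log_p_h1) >= math.log(theta)
```

With `theta = 1.0` the test simply accepts H0 whenever the hypothesized-speaker likelihood is at least as large as the alternative; raising `theta` makes acceptance stricter.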
21 |
A speaker verification system using alpha-nets
- Carey, Parris, et al.
- 1991
Citation Context ...und speaker set. The second major approach to alternative hypothesis modeling is to pool speech from several speakers and train a single model. Various terms for this single model are a general model [13], a world model, and a universal background model [8]. Given a collection of speech samples from a large number of speakers representative of the population of speakers expected during recognition, a ...
21 |
Likelihood normalization for speaker verification using a phoneme- and speaker-independent model
- Matsui, Furui
- 1995
Citation Context ... a single model, λbkg, is trained to represent the alternative hypothesis. Research on this approach has focused on selection and composition of the speakers and speech used to train the single model [14, 15]. The main advantage of this approach is that a single speaker-independent model can be trained once for a particular task and then used for all hypothesized speakers in that task. It is also possible...
20 |
The NIST speaker recognition evaluation—Overview, methodology, systems, results, perspective
- Doddington, Przybocki, et al.
- 2000
Citation Context ...years, GMM-based systems have been applied to the annual NIST Speaker Recognition Evaluations (SRE). These systems, fielded by different sites, have consistently produced state-of-the-art performance [6, 7]. In particular, a GMM-based system developed by MIT Lincoln Laboratory [8], employing Bayesian adaptation of speaker models from a universal background model and handset-based score normalization, ha...
17 |
Similarity Normalization Method for Speaker Verification Based on a Posteriori Probability.
- Matsui, Furui
- 1994
Citation Context ...ch as average or maximum, of the likelihood values from the background speaker set. The selection, size, and combination of the background speakers has been the subject of much research (for example, [4, 11, 12]). In general, it has been found that to obtain the best performance with this approach requires the use of speaker-specific background speaker sets. This can be a drawback in applications using a lar...
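The two combining functions named in this excerpt, average and maximum, can be sketched on log-likelihoods as follows. This is an illustration only: the function name, the `mode` argument, and the use of a log-sum-exp shift for the average are assumptions, not details given in the excerpt.

```python
import math


def background_log_likelihood(log_likes, mode="average"):
    """Combine background-speaker log-likelihoods into one alternative score.

    mode="maximum" returns the largest value; mode="average" returns the log
    of the arithmetic mean of the likelihoods, computed stably.
    """
    if mode == "maximum":
        return max(log_likes)
    if mode == "average":
        peak = max(log_likes)  # shift by the peak so exp() cannot overflow
        mean_shifted = sum(math.exp(v - peak) for v in log_likes) / len(log_likes)
        return peak + math.log(mean_shifted)
    raise ValueError("mode must be 'average' or 'maximum'")
```

The result can be used as the alternative-hypothesis log-likelihood in a log-domain likelihood ratio score.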
16 |
Approaches to speaker detection and tracking
- Dunn, Reynolds, et al.
- 2000
Citation Context ...aker, the task becomes multispeaker detection. In this paper we will focus on the core single-speaker detection task. Discussion of systems that handle the multispeaker detection task can be found in [9]. 2 We will use the terms verification and detection interchangeably in this paper. FIG. 1. Likelihood ratio-based speaker dete...
16 | Speaker verification through large vocabulary continuous speech recognition.
- Neuman, Gillick, et al.
- 1996
Citation Context ...ormation about the speaker conveyed in the temporal speech signal are not used. The modeling and exploitation of these higher levels of information may be where approaches based on speech recognition [19] produce benefits in the future. To date, however, these approaches (e.g., large vocabulary or phoneme recognizers) have basically been used only as means to compute likelihood values, without explici...
13 |
The NIST 1999 speaker recognition evaluation–an overview
- Martin, Przybocki
Citation Context ...years, GMM-based systems have been applied to the annual NIST Speaker Recognition Evaluations (SRE). These systems, fielded by different sites, have consistently produced state-of-the-art performance [6, 7]. In particular, a GMM-based system developed by MIT Lincoln Laboratory [8], employing Bayesian adaptation of speaker models from a universal background model and handset-based score normalization, ha...
9 |
PC-based TMS320C30 implementation of the Gaussian Mixture model Text-Independent Speaker Recognition System
- Reynolds, Rose, et al.
- 1992
Citation Context ...en used to discard silence–noise frames. The speech activity detector is a self-normalizing, energy-based detector that tracks the noise floor of the signal and can adapt to changing noise conditions [2, 20]. The speech detector discards 20–25% of the signal from conversational telephone recordings such as that in the Switchboard da...
9 |
The 1999 NIST speaker recognition evaluation, using summed two-channel telephone data for speaker detection and speaker tracking
- Przybocki, Martin
- 1999
Citation Context ...IST evaluation paradigm. The NIST SRE plans detailing the evaluation paradigm can be found in [33]. A more complete description of the NIST SRE along with detailed analysis of results can be found in [7, 34]. The experiments presented here show the general effects on performance of various components and parameters of the GMM-UBM system. The 1998 and 1999 NIST SRE one-speaker corpora are derived from the...
5 |
Speaker Verification in a Time-Feature Space
- Vuuren
- 1999
Citation Context ...vance factors (and hence parameter-dependent adaptation coefficients α_i^ρ) further allows tuning of different adaptation rates for the weights, means, and variances. However, experiments reported in [28] found there was only a minor gain in using parameter-dependent adaptation coefficients. In the GMM-UBM system we use a single adaptation coefficient for all parameters (α_i^w = α_i^m = α_i^v = n_i/(n_i + ...
3 |
Text-independent speaker verification using virtual speaker based cohort normalization
- Isobe, Takahashi
- 1999
Citation Context ...ch has the advantages that one can effectively use unbalanced data and can carefully control the composition of the final UBM. Still other approaches can be found in the literature (see, for example, [15, 26]). Over the past several SREs, our approach has been to train UBMs over subpopulations in the data and then pool the models to create the final UBM (Fig. 2b). For the 1999 NIST SRE we created a gender...
1 |
recognition evaluation plans
- speaker
Citation Context ...UBM system. Experiments are conducted on the 1998 summer-development and 1999 NIST SRE corpora using the NIST evaluation paradigm. The NIST SRE plans detailing the evaluation paradigm can be found in [33]. A more complete description of the NIST SRE along with detailed analysis of results can be found in [7, 34]. The experiments presented here show the general effects on performance of various compone...