Results 1 - 10
of
52
Speaker verification using Adapted Gaussian mixture models
- Digital Signal Processing
, 2000
"... In this paper we describe the major elements of MIT Lincoln Laboratory’s Gaussian mixture model (GMM)-based speaker verification system used successfully in several NIST Speaker Recognition Evaluations (SREs). The system is built around the likelihood ratio test for verification, using simple but ef ..."
Abstract
-
Cited by 385 (15 self)
- Add to MetaCart
In this paper we describe the major elements of MIT Lincoln Laboratory’s Gaussian mixture model (GMM)-based speaker verification system used successfully in several NIST Speaker Recognition Evaluations (SREs). The system is built around the likelihood ratio test for verification, using simple but effective GMMs for likelihood functions, a universal background model (UBM) for alternative speaker representation, and a form of Bayesian adaptation to derive speaker models from the UBM. The development and use of a handset detector and score normalization to greatly improve verification performance is also described and discussed. Finally, representative performance benchmarks and system behavior experiments on NIST SRE corpora are presented. © 2000 Academic Press Key Words: speaker recognition; Gaussian mixture models; likelihood ratio detector; universal background model; handset normalization; NIST evaluation. 1.
Person identification using multiple cues
- IEEE Transactions on Pattern Analysis and Machine Intelligence
, 1995
"... Abstract-This paper presents a person identification system based on acoustic and visual features. The system is organized as a set of non-homogeneous classifiers whose outputs are integrated after a normalization step. In particular, two classifiers based on acoustic features and three based on vis ..."
Abstract
-
Cited by 142 (1 self)
- Add to MetaCart
Abstract-This paper presents a person identification system based on acoustic and visual features. The system is organized as a set of non-homogeneous classifiers whose outputs are integrated after a normalization step. In particular, two classifiers based on acoustic features and three based on visual ones provide data for an integration module whose performance is evaluated. A novel technique for the integration of multiple classifiers at an hybrid ranWmeasurement level is introduced using HyperBF networks. Two different methods for the rejection of an unknown person are introduced. The performance of the integrated system is shown to be superior to that of the acoustic and visual subsystems. The resulting identification system can be used to log personal access and, with minor modifications, as an identity verification system. Index Tenns-Template matching, robust statistics, correlation, face recognition, speaker recognition, learning, classification. I.
Fast Features for Face Authentication under Illumination Direction Changes
- PATTERN RECOGNITION LETTERS
, 2003
"... In this letter we propose a facGE feature extracA-W tecracA whic utilizes polynomial clynomial derived from 2D DiscHWE Cosine Transform (DCT)cT)2:EEB8 obtained from horizontally and vertic:2) neighbouringblochb Fac authenticing2 results on the VidTIMIT database suggest that the proposed featur ..."
Abstract
-
Cited by 57 (22 self)
- Add to MetaCart
In this letter we propose a facGE feature extracA-W tecracA whic utilizes polynomial clynomial derived from 2D DiscHWE Cosine Transform (DCT)cT)2:EEB8 obtained from horizontally and vertic:2) neighbouringblochb Fac authenticing2 results on the VidTIMIT database suggest that the proposed feature set is superior (in terms of robustness to illuminationclumin anddiscAB:2)AH8# ability) to features extracs2 using four popular methods: Princs:2 Component Analysis (PCA), PCA with histogram equalizationpre-procion2AB 2D DCT and 2D Gabor wavelets; the results also suggest that histogram equalizationpre-procion2A inc-proc the error rate and o#ers no help against illuminationcuminat Moreover, the proposed feature set is over 80 times faster toc2GWW# than features based on Gabor wavelets. Further experiments on the Weizmann database also show that the proposed approac is more robust than 2D Gabor wavelets and 2D DCT coefficients.
An overview of text-independent speaker recognition: from features to supervectors
, 2009
"... This paper gives an overview of automatic speaker recognition technology, with an emphasis on text-independent recognition. Speaker recognition has been studied actively for several decades. We give an overview of both the classical and the state-of-the-art methods. We start with the fundamentals of ..."
Abstract
-
Cited by 31 (14 self)
- Add to MetaCart
This paper gives an overview of automatic speaker recognition technology, with an emphasis on text-independent recognition. Speaker recognition has been studied actively for several decades. We give an overview of both the classical and the state-of-the-art methods. We start with the fundamentals of automatic speaker recognition, concerning feature extraction and speaker modeling. We elaborate advanced computational techniques to address robustness and session variability. The recent progress from vectors towards supervectors opens up a new area of exploration and represents a technology trend. We also provide an overview of this recent development and discuss the evaluation methodology of speaker recognition systems. We conclude the paper with discussion on future directions.
Statistical Trajectory Models for Phonetic Recognition
, 1994
"... The main goal of this work is to develop an alternative methodology for acoustic-- phonetic modelling of speech sounds. The approach utilizes a segment--based framework to capture the dynamical behavior and statistical dependencies of the acoustic attributes used to represent the speech waveform. Te ..."
Abstract
-
Cited by 27 (3 self)
- Add to MetaCart
The main goal of this work is to develop an alternative methodology for acoustic-- phonetic modelling of speech sounds. The approach utilizes a segment--based framework to capture the dynamical behavior and statistical dependencies of the acoustic attributes used to represent the speech waveform. Temporal behavior is modelled explicitly by creating dynamic tracks of the acoustic attributes used to represent the waveform, and by estimating the spatio--temporal correlation structure of the resulting errors. The tracks serve as templates from which synthetic segments of the acoustic attributes are generated. Scoring of an hypothesized phonetic segment is then based on the error between the measured acoustic attributes and the synthetic segments generated for each phonetic model.
Methods of Combining Multiple Classifiers with Different Features and Their Applications to Text-Independent Speaker Identification
- INTERNATIONAL JOURNAL OF PATTERN RECOGNITION AND ARTIFICIAL INTELLIGENCE
, 1997
"... In practical applications of pattern recognition, there are often different features extracted from raw data which needs recognizing. Methods of combining multiple classifiers with different features are viewed as a general problem in various application areas of pattern recognition. In this paper, ..."
Abstract
-
Cited by 25 (4 self)
- Add to MetaCart
In practical applications of pattern recognition, there are often different features extracted from raw data which needs recognizing. Methods of combining multiple classifiers with different features are viewed as a general problem in various application areas of pattern recognition. In this paper, a systematic investigation has been made and possible solutions are classified into three frameworks, i.e. linear opinion pools, winnertake -all and evidential reasoning. For combining multiple classifiers with different features, a novel method is presented in the framework of linear opinion pools and a modified training algorithm for associative switch is also proposed in the framework of winner-take-all. In the framework of evidential reasoning, several typical methods are briefly reviewed for use. All aforementioned methods have already been applied to text-independent speaker identification. The simulations show that results yielded by the methods described in this paper are better than...
Automatic Person Verification Using Speech and Face Information
, 2003
"... Identity verification systems are an important part of our every day life. A typical example is the Automatic Teller Machine (ATM) which employs a simple identity verification scheme: the user is asked to enter their secret password after inserting their ATM card; if the password matches the one pre ..."
Abstract
-
Cited by 23 (7 self)
- Add to MetaCart
Identity verification systems are an important part of our every day life. A typical example is the Automatic Teller Machine (ATM) which employs a simple identity verification scheme: the user is asked to enter their secret password after inserting their ATM card; if the password matches the one prescribed to the card, the user is allowed access to their bank account. This scheme suffers from a major drawback: only the validity of the combination of a certain possession (the ATM card) and certain knowledge (the password) is verified. The ATM card can be lost or stolen, and the password can be compromised. Thus new verification methods have emerged, where the password has either been replaced by, or used in addition to, biometrics such as the person's speech, face image or fingerprints. Apart from the ATM example described above, biometrics can be applied to other areas, such as telephone & internet based banking, airline reservations & check-in, as well as forensic work and law enforcement applications. Biometric systems
Identity Verification Using Speech And Face Information
- Digital Signal Processing
, 2004
"... This article first provides an review of important concepts in the field of information fusion, followed by a review of important milestones in audio--visual person identification and verification. Several recent adaptive and nonadaptive techniques for reaching the verification decision (i.e., to ac ..."
Abstract
-
Cited by 23 (1 self)
- Add to MetaCart
This article first provides an review of important concepts in the field of information fusion, followed by a review of important milestones in audio--visual person identification and verification. Several recent adaptive and nonadaptive techniques for reaching the verification decision (i.e., to accept or reject the claimant), based on speech and face information, are then evaluated in clean and noisy audio conditions on a common database; it is shown that in clean conditions most of the nonadaptive approaches provide similar performance and in noisy conditions most exhibit a severe deterioration in performance; it is also shown that current adaptive approaches are either inadequate or utilize restrictive assumptions. A new category of classifiers is then introduced, where the decision boundary is fixed but constructed to take into account how the distributions of opinions are likely to change due to noisy conditions; compared to a previously proposed adaptive approach, the proposed classifiers do not make a direct assumption about the type of noise that causes the mismatch between training and testing conditions.
User Authentication via Adapted Statistical Models of Face Images
, 2006
"... It has been previously demonstrated that systems based on local features and relatively complex statistical models, namely, one-dimensional (1-D) hidden Markov models (HMMs) and pseudo-two-dimensional (2-D) HMMs, are suitable for face recognition. Recently, a simpler statistical model, namely, th ..."
Abstract
-
Cited by 20 (8 self)
- Add to MetaCart
It has been previously demonstrated that systems based on local features and relatively complex statistical models, namely, one-dimensional (1-D) hidden Markov models (HMMs) and pseudo-two-dimensional (2-D) HMMs, are suitable for face recognition. Recently, a simpler statistical model, namely, the Gaussian mixture model (GMM), was also shown to perform well. In much of the literature devoted to these models, the experiments were performed with controlled images (manual face localization, controlled lighting, background, pose, etc). However, a practical recognition system has to be robust to more challenging conditions. In this article we evaluate, on the relatively difficult BANCA database, the performance, robustness and complexity of GMM and HMM-based approaches, using both manual and automatic face localization. We extend the GMM approach through the use of local features with embedded positional information, increasing performance without sacrificing its low complexity. Furthermore, we show that the traditionally used maximum likelihood (ML) training approach has problems estimating robust model parameters when there is only a few training images available. Considerably more precise models can be obtained through the use of Maximum a posteriori probability (MAP) training. We also show that face recognition techniques which obtain good performance on manually located faces do not necessarily obtain good performance on automatically located faces, indicating that recognition techniques must be designed from the ground up to handle imperfect localization. Finally, we show that while the pseudo-2-D HMM approach has the best overall performance, authentication time on current hardware makes it impractical. The best tradeoff in terms of authentication ...

