Speaker verification using Adapted Gaussian mixture models
Digital Signal Processing, 2000
Cited by 1010 (42 self)
In this paper we describe the major elements of MIT Lincoln Laboratory’s Gaussian mixture model (GMM)-based speaker verification system used successfully in several NIST Speaker Recognition Evaluations (SREs). The system is built around the likelihood ratio test for verification, using simple but effective GMMs for likelihood functions, a universal background model (UBM) for alternative speaker representation, and a form of Bayesian adaptation to derive speaker models from the UBM. The development and use of a handset detector and score normalization to greatly improve verification performance is also described and discussed. Finally, representative performance benchmarks and system behavior experiments on NIST SRE corpora are presented. © 2000 Academic Press. Key Words: speaker recognition; Gaussian mixture models; likelihood ratio detector; universal background model; handset normalization; NIST evaluation.
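The likelihood-ratio scoring and UBM-based adaptation described above can be sketched in plain Python. This is a minimal illustration, not Lincoln Laboratory's implementation: it assumes diagonal-covariance GMMs, adapts only the component means, and uses a relevance factor of 16 as an assumed default. `DiagGMM`, `map_adapt_means` and `llr_score` are hypothetical names.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence

Frame = Sequence[float]


def logsumexp(values: List[float]) -> float:
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


@dataclass
class DiagGMM:
    """Gaussian mixture with diagonal covariances."""
    weights: List[float]
    means: List[List[float]]
    variances: List[List[float]]

    def component_log_densities(self, x: Frame) -> List[float]:
        """log(w_i * N(x; mu_i, diag(var_i))) for every component i."""
        out = []
        for w, mu, var in zip(self.weights, self.means, self.variances):
            ll = math.log(w)
            for xd, md, vd in zip(x, mu, var):
                ll -= 0.5 * (math.log(2.0 * math.pi * vd) + (xd - md) ** 2 / vd)
            out.append(ll)
        return out

    def log_likelihood(self, x: Frame) -> float:
        return logsumexp(self.component_log_densities(x))


def map_adapt_means(ubm: DiagGMM, frames: List[Frame], relevance: float = 16.0) -> DiagGMM:
    """Derive a speaker model from the UBM by mean-only MAP adaptation.

    Assumption: only means are adapted; relevance factor r defaults to 16.
    """
    n_comp, dim = len(ubm.weights), len(ubm.means[0])
    counts = [0.0] * n_comp                      # n_i: soft frame counts
    sums = [[0.0] * dim for _ in range(n_comp)]  # first-order statistics
    for x in frames:
        logs = ubm.component_log_densities(x)
        total = logsumexp(logs)
        for i, lg in enumerate(logs):
            post = math.exp(lg - total)          # P(i | x) under the UBM
            counts[i] += post
            for d in range(dim):
                sums[i][d] += post * x[d]
    new_means = []
    for i in range(n_comp):
        alpha = counts[i] / (counts[i] + relevance)  # data-dependent mixing weight
        ex = [s / counts[i] for s in sums[i]] if counts[i] > 0.0 else list(ubm.means[i])
        new_means.append([alpha * e + (1.0 - alpha) * m for e, m in zip(ex, ubm.means[i])])
    # Weights and variances are copied unchanged from the UBM.
    return DiagGMM(list(ubm.weights), new_means, [list(v) for v in ubm.variances])


def llr_score(speaker: DiagGMM, ubm: DiagGMM, frames: List[Frame]) -> float:
    """Average per-frame log-likelihood ratio: log p(x|speaker) - log p(x|UBM)."""
    total = sum(speaker.log_likelihood(x) - ubm.log_likelihood(x) for x in frames)
    return total / len(frames)
```

A trial would be accepted when `llr_score(speaker, ubm, test_frames)` exceeds a decision threshold. The handset detector and score normalization mentioned in the abstract are not part of this sketch.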
Feature Warping for Robust Speaker Verification
ISCA Archive, 2001
Cited by 191 (10 self)
We propose a novel feature mapping approach that is robust to channel mismatch, additive noise and, to some extent, nonlinear effects attributed to handset transducers. These adverse effects can distort the short-term distribution of the speech features. Some methods have addressed this issue by conditioning the variance of the distribution, but not to the extent of conforming the speech statistics to a target distribution. The proposed target mapping method warps the distribution of a cepstral feature stream to a standardised distribution over a specified time interval. We evaluate a number of enhancement methods for speaker verification and compare them against a Gaussian target mapping implementation. Results indicate that the warping technique improves on a number of methods, such as Cepstral Mean Subtraction (CMS), modulation spectrum processing, and short-term windowed CMS and variance normalisation. This technique is a suitable feature post-processing method that may be combined with other techniques to enhance speaker recognition robustness under adverse conditions.
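The warping step can be illustrated for a single cepstral coefficient. This sketch assumes the rank-to-quantile mapping (N + 1/2 - R)/N, where R is the descending rank of the centre frame within an N-frame window, and an assumed default window of 301 frames; neither value is stated in the abstract, and `warp_feature_stream` is a hypothetical name. Only the standard library's `statistics.NormalDist` is used for the Gaussian target.

```python
from statistics import NormalDist
from typing import List, Sequence


def warp_feature_stream(values: Sequence[float], window: int = 301) -> List[float]:
    """Warp one cepstral coefficient's time series to a standard normal target.

    For each frame t, the value is ranked within a sliding window centred on t
    (truncated at the utterance edges) and replaced by the standard normal
    quantile of that rank. The window length is an assumed parameter.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number of frames")
    half = window // 2
    target = NormalDist(mu=0.0, sigma=1.0)
    warped = []
    for t, v in enumerate(values):
        seg = values[max(0, t - half): t + half + 1]
        n = len(seg)
        # Descending rank: 1 for the largest value in the window (ties share a rank).
        rank = 1 + sum(1 for s in seg if s > v)
        # Assumed mapping (N + 1/2 - R) / N keeps the probability strictly in (0, 1).
        warped.append(target.inv_cdf((n + 0.5 - rank) / n))
    return warped
```

Because the output depends only on ranks within the window, adding a constant offset to the input or applying any increasing transformation to it leaves the warped stream unchanged. This is one way to see why warping can suppress channel effects that are roughly constant over the window.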
A Tutorial on Text-Independent Speaker Verification
EURASIP Journal on Applied Signal Processing 2004:4, 430–451, 2004
Cited by 138 (13 self)
This paper presents an overview of a state-of-the-art text-independent speaker verification system. First, an introduction proposes a modular scheme of the training and test phases of a speaker verification system. Then, the speech parameterization most commonly used in speaker verification, namely, cepstral analysis, is detailed. Gaussian mixture modeling, which is the speaker modeling technique used in most systems, is then explained. A few speaker modeling alternatives, namely, neural networks and support vector machines, are mentioned. Normalization of scores is then explained, as this is a very important step to deal with real-world data. The evaluation of a speaker verification system is then detailed, and the detection error trade-off (DET) curve is explained. Several extensions of speaker verification are then enumerated, including speaker tracking and segmentation by speakers. Then, some applications of speaker verification are proposed, including on-site applications, remote applications, applications related to structuring audio information, and games. Issues concerning the forensic area are then discussed, as we believe it is very important to inform people about the actual performance and limitations of speaker verification systems. This paper concludes by giving a ...
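The DET curve mentioned above is built from miss and false-alarm probabilities swept over a decision threshold. A minimal sketch, assuming that a trial is accepted when its score is at least the threshold, and approximating the equal error rate (EER) at the swept threshold where the two error rates are closest. `det_points` and `approx_eer` are hypothetical helper names.

```python
from typing import List, Sequence, Tuple


def det_points(target: Sequence[float],
               nontarget: Sequence[float]) -> List[Tuple[float, float, float]]:
    """(threshold, P_miss, P_fa) for every distinct score used as a threshold.

    Assumption: a trial is accepted when its score >= threshold.
    """
    points = []
    for th in sorted(set(target) | set(nontarget)):
        p_miss = sum(1 for s in target if s < th) / len(target)        # rejected clients
        p_fa = sum(1 for s in nontarget if s >= th) / len(nontarget)   # accepted impostors
        points.append((th, p_miss, p_fa))
    return points


def approx_eer(target: Sequence[float], nontarget: Sequence[float]) -> float:
    """EER approximated at the threshold where P_miss and P_fa are closest."""
    _, p_miss, p_fa = min(det_points(target, nontarget), key=lambda p: abs(p[1] - p[2]))
    return (p_miss + p_fa) / 2.0
```

To draw an actual DET plot, each probability strictly between 0 and 1 would be mapped to the normal-deviate scale with `statistics.NormalDist().inv_cdf` before plotting.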
Corpora for the evaluation of speaker recognition systems
In: Proc. ICASSP, 1999
Cited by 47 (4 self)
Using standard speech corpora for development and evaluation has proven to be very valuable in promoting progress in speech and speaker recognition research. In this paper, we present an overview of current publicly available corpora intended for speaker recognition research and evaluation. We outline the corpora’s salient features with respect to their suitability for conducting speaker recognition experiments and evaluations. Links to these corpora, and to new corpora, will appear on the web
Automatic Person Verification Using Speech and Face Information
2003
Cited by 37 (7 self)
Identity verification systems are an important part of our everyday life. A typical example is the Automatic Teller Machine (ATM), which employs a simple identity verification scheme: the user is asked to enter their secret password after inserting their ATM card; if the password matches the one assigned to the card, the user is allowed access to their bank account. This scheme suffers from a major drawback: only the validity of the combination of a certain possession (the ATM card) and certain knowledge (the password) is verified. The ATM card can be lost or stolen, and the password can be compromised. Thus new verification methods have emerged, where the password has either been replaced by, or used in addition to, biometrics such as the person's speech, face image or fingerprints. Apart from the ATM example described above, biometrics can be applied to other areas, such as telephone- and internet-based banking, airline reservations and check-in, as well as forensic work and law enforcement applications. Biometric systems ...
Texture for Script Identification
IEEE Trans. Pattern Analysis and Machine Intelligence, 2005
Cited by 36 (0 self)
The problem of determining the script and language of a document image has a number of important applications in the field of document analysis, such as indexing and sorting of large collections of such images, or as a precursor to optical character recognition (OCR). In this paper, we investigate the use of texture as a tool for determining the script of a document image, based on the observation that text has a distinct visual texture. An experimental evaluation of a number of commonly used texture features is conducted on a newly created script database, providing a qualitative measure of which features are most appropriate for this task. Strategies for improving classification results in situations with limited training data and multiple font types are also proposed. Index Terms: Script identification, wavelets and fractals, texture, document analysis, clustering, classification and association rules.
Recent advances in biometric person authentication
In: Proc. Internat. Conf. Acoustics, Speech and Signal Processing, 2002
Cited by 22 (0 self)
Biometrics is an emerging topic in the field of signal processing. While enabling technologies (e.g., audio, video) for biometrics have mostly been used separately, biometric technologies could ultimately find their strongest role as intertwined and complementary pieces of a multimodal authentication system. In this paper, a short overview of voice, fingerprint, and face authentication algorithms is provided.
Using Prosodic And Lexical Information For Speaker Identification
In: Proc. Internat. Conf. Acoustics, Speech and Signal Processing, 2002
Cited by 21 (2 self)
We investigate the incorporation of larger time-scale information, such as prosody, into standard speaker ID systems. Our study is based on the Extended Data Task of the NIST 2001 Speaker ID evaluation, which provides much more test and training data than has traditionally been available to similar speaker ID investigations. In addition, we have had access to a detailed prosodic feature database of Switchboard-I conversations, including data not previously applied to speaker ID. We describe two baseline acoustic systems: an approach using Gaussian Mixture Models and an LVCSR-based speaker ID system. Their results are compared to and combined with those of two larger time-scale systems: a system based on an "idiolect" language model, and a system making use of the contents of the prosody database. We find that, with sufficient test and training data, suprasegmental information can significantly enhance the performance of traditional speaker ID systems.
Speaker adaptive cohort selection for tnorm in text-independent speaker verification
In: Proc. ICASSP
Cited by 18 (4 self)
In this paper we discuss an extension to test normalization (Tnorm), a widely used score normalization technique for text-independent speaker verification. A new method, speaker Adaptive-Tnorm, is presented; it offers advantages over standard Tnorm by adapting the set of cohort speakers to the target model. Examples of this improvement on the 2004 NIST SRE data are also presented.
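Tnorm standardizes a test score using the mean and standard deviation of the scores that a cohort of impostor models assigns to the same test utterance. The sketch below shows standard Tnorm and one possible adaptive variant that keeps only the k cohort models nearest to the target model. The distance measure (`cohort_distances`), the value of k, and all function names are assumptions for illustration; the abstract does not specify the paper's selection criterion.

```python
import statistics
from typing import Sequence


def tnorm(raw_score: float, cohort_scores: Sequence[float]) -> float:
    """Test normalization: standardize a score with cohort-model scores
    computed on the same test utterance."""
    mu = statistics.fmean(cohort_scores)
    sigma = statistics.pstdev(cohort_scores)
    if sigma == 0.0:
        raise ValueError("cohort scores have zero spread")
    return (raw_score - mu) / sigma


def adaptive_tnorm(raw_score: float,
                   cohort_scores: Sequence[float],
                   cohort_distances: Sequence[float],
                   k: int) -> float:
    """Tnorm restricted to the k cohort models closest to the target model.

    cohort_distances[i] is a hypothetical target-to-cohort distance; how it
    is measured is an assumption of this sketch.
    """
    if not 2 <= k <= len(cohort_scores):
        raise ValueError("k must be between 2 and the cohort size")
    nearest = sorted(range(len(cohort_scores)), key=lambda i: cohort_distances[i])[:k]
    return tnorm(raw_score, [cohort_scores[i] for i in nearest])
```

Restricting the cohort makes the normalizing statistics specific to each target model, at the cost of estimating them from fewer scores.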
On the Use of Score Pruning in Speaker Verification for Speaker Dependent Threshold Estimation
In: The Speaker and Language Recognition Workshop (Odyssey), 2004
Cited by 17 (1 self)
The use of a priori speaker-dependent thresholds has been shown to be advantageous for speaker verification. However, their estimation is strongly affected by the difficulty of obtaining impostor data, mismatched conditions, the scarcity of data in real applications, and the need to set the threshold a priori, during enrollment. In this context, possible outliers, i.e., client scores that lie far from the mean in terms of Log-Likelihood Ratio (LLR), can lead to wrong estimates of the client mean and variance. To overcome this problem, we propose several methods based on pruning LLR scores with different statistical criteria. Before the threshold is estimated, score pruning removes outliers and improves the subsequent estimates. To address the lack of impostor data, we also suggest a speaker-dependent threshold estimation that uses only client data. Text-dependent and text-independent experiments have been carried out on a multisession telephone database in Spanish with 184 speakers, recorded by the authors.
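The abstract mentions several statistical pruning criteria without naming them. As an illustration only, this sketch assumes one simple rule (drop client scores more than c standard deviations from their mean) and an assumed threshold form of client mean minus alpha times client standard deviation, computed from client scores alone. The constants `c` and `alpha` and both function names are hypothetical.

```python
import statistics
from typing import List, Sequence


def prune_scores(scores: Sequence[float], c: float = 2.0) -> List[float]:
    """Assumed pruning rule: drop client LLR scores more than c std devs from the mean."""
    mu = statistics.fmean(scores)
    sd = statistics.pstdev(scores)
    if sd == 0.0:
        return list(scores)
    # For c >= 1 at least one score always survives, so the result is non-empty.
    return [s for s in scores if abs(s - mu) <= c * sd]


def client_only_threshold(scores: Sequence[float], alpha: float = 1.0, c: float = 2.0) -> float:
    """Assumed speaker-dependent threshold from client scores only:
    mean minus alpha times standard deviation, after pruning outliers."""
    kept = prune_scores(scores, c)
    return statistics.fmean(kept) - alpha * statistics.pstdev(kept)
```

Pruning before estimation keeps a single badly mismatched enrollment score from dragging the mean down and inflating the standard deviation, which would otherwise push the threshold too low.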