Feature Warping for Robust Speaker Verification
- ISCA ARCHIVE
, 2001
Cited by 86 (4 self)
We propose a novel feature mapping approach that is robust to channel mismatch, additive noise and, to some extent, nonlinear effects attributed to handset transducers. These adverse effects can distort the short-term distribution of the speech features. Some methods have addressed this issue by conditioning the variance of the distribution, but not to the extent of conforming the speech statistics to a target distribution. The proposed target mapping method warps the distribution of a cepstral feature stream to a standardised distribution over a specified time interval. We evaluate a number of enhancement methods for speaker verification, and compare them against a Gaussian target mapping implementation. Results indicate improvements of the warping technique over a number of methods such as Cepstral Mean Subtraction (CMS), modulation spectrum processing, and short-term windowed CMS and variance normalisation. This technique is a suitable feature post-processing method that may be combined with other techniques to enhance speaker recognition robustness under adverse conditions.
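The warping step described in this abstract can be illustrated with a minimal sketch. It assumes a rank-based mapping: each frame's value is ranked within a sliding window and the rank is converted to the matching quantile of a standard normal target. The function name `warp_features`, the default window of 300 frames, the centred window, and the mid-rank formula are assumptions made for illustration; the abstract does not specify them.

```python
from statistics import NormalDist


def warp_features(stream, window=300):
    """Warp one cepstral coefficient stream toward a standard normal target.

    stream: list of floats, one feature dimension over time.
    window: sliding-window length in frames (assumed value, not from the abstract).
    """
    std_normal = NormalDist()  # target distribution: mean 0, standard deviation 1
    half = window // 2
    warped = []
    for t, value in enumerate(stream):
        # Window centred on frame t, truncated at the stream boundaries.
        lo = max(0, t - half)
        hi = min(len(stream), t + half + 1)
        segment = stream[lo:hi]
        n = len(segment)
        # 1-based ascending rank; tied values share the lowest rank.
        rank = 1 + sum(1 for v in segment if v < value)
        # Mid-rank probability stays strictly inside (0, 1).
        p = (rank - 0.5) / n
        warped.append(std_normal.inv_cdf(p))
    return warped


if __name__ == "__main__":
    # Hypothetical feature values, for illustration only.
    print(warp_features([0.3, -1.2, 2.5, 0.9, -0.4, 1.7], window=5))
```

Because only ranks are used, any monotonically increasing distortion of the stream (a gain or offset, for example) leaves the warped output unchanged, which is the sense in which a mapping of this kind goes further than mean subtraction or variance normalisation.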
Support vector machines for speaker and language recognition
- Computer Speech and Language
, 2006
An overview of text-independent speaker recognition: from features to supervectors
, 2009
Cited by 31 (14 self)
This paper gives an overview of automatic speaker recognition technology, with an emphasis on text-independent recognition. Speaker recognition has been studied actively for several decades. We give an overview of both the classical and the state-of-the-art methods. We start with the fundamentals of automatic speaker recognition, concerning feature extraction and speaker modeling. We elaborate advanced computational techniques to address robustness and session variability. The recent progress from vectors towards supervectors opens up a new area of exploration and represents a technology trend. We also provide an overview of this recent development and discuss the evaluation methodology of speaker recognition systems. We conclude the paper with discussion on future directions.
Automatic Person Verification Using Speech and Face Information
, 2003
Cited by 23 (7 self)
Identity verification systems are an important part of our everyday life. A typical example is the Automatic Teller Machine (ATM) which employs a simple identity verification scheme: the user is asked to enter their secret password after inserting their ATM card; if the password matches the one prescribed to the card, the user is allowed access to their bank account. This scheme suffers from a major drawback: only the validity of the combination of a certain possession (the ATM card) and certain knowledge (the password) is verified. The ATM card can be lost or stolen, and the password can be compromised. Thus new verification methods have emerged, where the password has either been replaced by, or used in addition to, biometrics such as the person's speech, face image or fingerprints. Apart from the ATM example described above, biometrics can be applied to other areas, such as telephone & internet based banking, airline reservations & check-in, as well as forensic work and law enforcement applications. Biometric systems
MLLR transforms as features in speaker recognition
- in Proceedings of the 9th European Conference on Speech Communication and Technology
, 2005
Cited by 22 (9 self)
We explore the use of adaptation transforms employed in speech recognition systems as features for speaker recognition. This approach is attractive because, unlike standard frame-based cepstral speaker recognition models, it normalizes for the choice of spoken words in text-independent speaker verification. Affine transforms are computed for the Gaussian means of the acoustic models used in a recognizer, using maximum likelihood linear regression (MLLR). The high-dimensional vectors formed by the transform coefficients are then modeled as speaker features using support vector machines (SVMs). The resulting speaker verification system is competitive with, and in some cases significantly more accurate than, state-of-the-art cepstral Gaussian mixture and SVM systems. Further improvements are obtained by combining baseline and MLLR-based systems.
Large Scale Evaluation of Multimodal Biometric Authentication Using State-of-the-Art Systems
- IEEE Transactions on Pattern Analysis and Machine Intelligence
, 2005
Cited by 21 (5 self)
We examine the performance of multimodal biometric authentication systems using state-of-the-art Commercial Off-the-Shelf (COTS) fingerprint and face biometric systems on a population approaching 1,000 individuals. The majority of prior studies of multimodal biometrics have been limited to relatively low-accuracy non-COTS systems and populations of a few hundred users. Our work is the first to demonstrate that multimodal fingerprint and face biometric systems can achieve significant accuracy gains over either biometric alone, even when using highly accurate COTS systems on a relatively large-scale population. In addition to examining well-known multimodal methods, we introduce new methods of normalization and fusion that further improve the accuracy.
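As a rough illustration of what score normalization followed by fusion means in this setting, the sketch below uses min-max normalization and an averaged sum rule, two widely used baseline choices. It is not the paper's new method, which the abstract does not describe. The function names and the score ranges in the usage lines are hypothetical.

```python
def min_max_normalize(score, lo, hi):
    """Map a matcher score from its own range [lo, hi] onto [0, 1]."""
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    return (score - lo) / (hi - lo)


def sum_rule_fusion(normalized_scores):
    """Fuse normalized scores by averaging them (sum rule scaled by the count)."""
    if not normalized_scores:
        raise ValueError("need at least one score")
    return sum(normalized_scores) / len(normalized_scores)


if __name__ == "__main__":
    # Hypothetical matchers: fingerprint scores in [0, 100], face scores in [0, 1].
    fingerprint = min_max_normalize(80.0, 0.0, 100.0)
    face = min_max_normalize(0.4, 0.0, 1.0)
    print(sum_rule_fusion([fingerprint, face]))
```

Normalization is needed because the two matchers report scores on unrelated scales; without it, the matcher with the larger numeric range would dominate the fused score.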
Target Dependent Score Normalization Techniques and . . .
- IEEE TRANS. ON SYSTEMS, MAN AND CYBERNETICS, PART C
, 2005
Cited by 19 (14 self)
Score normalization methods in biometric verification, which encompass the more traditional user-dependent decision thresholding techniques, are reviewed from a test hypotheses point of view. These are classified into test dependent and target dependent methods. The focus of the paper is on target dependent methods, which are further classified into impostor-centric, target-centric and target-impostor. These are applied to an on-line signature verification system on signature data from SVC 2004. In particular, a target-centric technique based on a variant of the cross-validation procedure provides the best relative performance improvement both for skilled (19%) and random forgeries (53%) as compared to the raw verification performance without score normalization (7.14% EER and 1.06% EER for skilled and random forgeries respectively).
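To make the idea of an impostor-centric, target-dependent normalization concrete, here is a minimal sketch in the style of z-norm: each raw score is shifted and scaled by the mean and standard deviation of impostor scores collected against the same target. Whether the paper uses exactly this form is an assumption; the function name and the sample scores are hypothetical.

```python
from statistics import mean, pstdev


def impostor_centric_norm(raw_score, impostor_scores):
    """Normalize a raw score with one target's impostor statistics (z-norm style)."""
    mu = mean(impostor_scores)
    sigma = pstdev(impostor_scores)
    if sigma == 0:
        raise ValueError("impostor scores have zero spread")
    return (raw_score - mu) / sigma


if __name__ == "__main__":
    # Hypothetical impostor scores for two different target users.
    target_a_impostors = [0.0, 2.0]
    target_b_impostors = [4.0, 8.0]
    # The same raw score lands at different normalized positions per target.
    print(impostor_centric_norm(3.0, target_a_impostors))
    print(impostor_centric_norm(3.0, target_b_impostors))
```

After normalization, a single global decision threshold can act like a per-user threshold, which is the link to user-dependent decision thresholding mentioned in the abstract.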
Fusion of heterogeneous speaker recognition systems in the STBU submission for the NIST speaker recognition evaluation 2006
- IEEE Transactions on Audio, Speech and Signal Processing
, 2007
Cited by 17 (4 self)
This paper describes and discusses the ‘STBU’ speaker recognition system, which performed well in the NIST Speaker Recognition Evaluation 2006 (SRE). STBU is a consortium
Within-class Covariance Normalization for SVM-based Speaker Recognition
- Proc. of ICSLP
, 2006
Cited by 16 (2 self)
This paper extends the within-class covariance normalization (WCCN) technique described in [1, 2] for training generalized linear kernels. We describe a practical procedure for applying WCCN to an SVM-based speaker recognition system where the input feature vectors reside in a high-dimensional space. Our approach involves using principal component analysis (PCA) to split the original feature space into two subspaces: a low-dimensional “PCA space” and a high-dimensional “PCA-complement space.” After performing WCCN in the PCA space, we concatenate the resulting feature vectors with a weighted version of their PCA-complements. When applied to a state-of-the-art MLLR-SVM speaker recognition system, this approach achieves improvements of up to 22% in EER and 28% in minimum decision cost function (DCF) over our previous baseline. We also achieve substantial improvements over an MLLR-SVM system that performs WCCN in the PCA space but discards the PCA-complement. Index Terms: kernel machines, support vector machines, feature normalization, generalized linear kernels, speaker recognition.
Higher-Level Features in Speaker Recognition
- in Speaker Classification I, Lecture Notes in Computer Science / Artificial Intelligence. Springer, Heidelberg / Berlin
, 2007
Cited by 15 (4 self)
Higher-level features based on linguistic or long-range information have attracted significant attention in automatic speaker recognition. This article briefly summarizes approaches to using higher-level features for text-independent speaker verification over the last decade. To clarify how each approach uses higher-level information, features are described in terms of their type, temporal span, and reliance on automatic speech recognition for both feature extraction and feature conditioning. A subsequent analysis of higher-level features in a state-of-the-art system illustrates that (1) a higher-level cepstral system outperforms standard systems, (2) a prosodic system shows excellent performance individually and in combination, (3) other higher-level systems provide further gains, and (4) higher-level systems provide increasing relative gains as training data increases. Implications for the general field of speaker classification are discussed.

