Support vector machines using GMM supervectors for speaker verification
- IEEE Signal Processing Letters, 2006
Cited by 188 (6 self)
... interpretations, conclusions, and recommendations are those of the authors and are not necessarily endorsed by the United States ...
SVM based speaker verification using a GMM supervector kernel and NAP variability compensation
- in Proceedings of ICASSP, 2006
Cited by 161 (16 self)
Gaussian mixture models with universal backgrounds (UBMs) have become the standard method for speaker recognition. Typically, a speaker model is constructed by MAP adaptation of the means of the UBM. A GMM supervector is constructed by stacking the means of the adapted mixture components. A recent discovery is that latent factor analysis of this GMM supervector is an effective method for variability compensation. We consider this GMM supervector in the context of support vector machines. We construct a support vector machine kernel using the GMM supervector. We show similarities based on this kernel between the method of SVM nuisance attribute projection (NAP) and the recent results in latent factor analysis. Experiments on a NIST SRE 2005 corpus demonstrate the effectiveness of the new technique.
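The supervector construction described in this abstract (MAP adaptation of the UBM means, then stacking the adapted means) can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names, the relevance factor of 16, and the use of precomputed per-frame component posteriors are assumptions made for the example. The SVM kernel scaling and the NAP projection from the paper are not shown.

```python
import numpy as np

def map_adapt_means(ubm_means, posteriors, frames, relevance=16.0):
    """Mean-only relevance-MAP adaptation (illustrative sketch).

    ubm_means:  (C, D) UBM component means
    posteriors: (T, C) per-frame component posteriors under the UBM
    frames:     (T, D) feature vectors from one utterance
    relevance:  relevance factor (assumed value, not from the source)
    """
    n = posteriors.sum(axis=0)                        # (C,) soft counts
    first = posteriors.T @ frames                     # (C, D) first-order stats
    data_mean = first / np.maximum(n, 1e-10)[:, None]  # per-component data mean
    alpha = (n / (n + relevance))[:, None]            # data-vs-prior weight
    # Components with little data stay close to the UBM mean.
    return alpha * data_mean + (1.0 - alpha) * ubm_means

def supervector(adapted_means):
    """Stack the C adapted D-dimensional means into one (C*D,) vector."""
    return np.asarray(adapted_means).reshape(-1)
```

Each utterance then maps to one fixed-length vector regardless of its duration, which is what allows a standard SVM to be trained on it.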
An overview of text-independent speaker recognition: from features to supervectors
- 2009
Cited by 156 (37 self)
This paper gives an overview of automatic speaker recognition technology, with an emphasis on text-independent recognition. Speaker recognition has been studied actively for several decades. We give an overview of both the classical and the state-of-the-art methods. We start with the fundamentals of automatic speaker recognition, concerning feature extraction and speaker modeling. We elaborate advanced computational techniques to address robustness and session variability. The recent progress from vectors towards supervectors opens up a new area of exploration and represents a technology trend. We also provide an overview of this recent development and discuss the evaluation methodology of speaker recognition systems. We conclude the paper with discussion on future directions.
The MIT-LL/IBM 2006 speaker recognition system: High-performance reduced-complexity recognition
- in Proc. ICASSP 2007
Cited by 15 (1 self)
Many powerful methods for speaker recognition have been introduced in recent years: high-level features, novel classifiers, and channel compensation methods. A common arena for evaluating these methods has been the NIST speaker recognition evaluation (SRE). In the NIST SRE from 2002-2005, a popular approach was to fuse multiple systems based upon cepstral features and different linguistic tiers of high-level features. With enough enrollment data, this approach produced dramatic error rate reductions and showed conceptually that better performance was attainable. A drawback of this approach is that many high-level systems were being run independently, requiring significant computational complexity and resources. In 2006, MIT Lincoln Laboratory focused on a new system architecture which emphasized reduced complexity. This system was a carefully selected mixture of high-level techniques, new classifier methods, and novel channel compensation techniques. This new system has excellent accuracy and has substantially reduced complexity. The performance and computational aspects of the system are detailed on a NIST 2006 SRE task. Index Terms: speech processing, speaker recognition
Adaptive Client-Impostor Centric Score Normalization: A Case Study in Fingerprint Verification
- in IEEE BTAS, 2009
Cited by 5 (0 self)
Cohort-based score normalization, as exemplified by the T-norm (for Test normalization), has been the state-of-the-art approach to account for the variability of signal quality in testing. On the other hand, user-specific score normalization such as the Z-norm and the F-norm, designed to handle variability in performance across different reference models, has also been shown to be very effective. Exploiting the strength of both approaches, this paper proposes a novel score normalization called adaptive F-norm, which is client-impostor centric, i.e., utilizing both the genuine and impostor score information, as well as adaptive, i.e., adaptive to the test condition thanks to the use of a pool of cohort models. Experiments based on the BioSecure DS2 database, which contains 6 fingers of 415 subjects, each acquired using a thermal and an optical device, show that the proposed adaptive F-norm is better than or at least as good as the other alternatives, including those recently proposed in the literature.
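For readers unfamiliar with the two baseline normalizations named in this abstract, the following sketch shows the commonly used forms of T-norm and Z-norm. It does not implement the paper's adaptive F-norm. The function names and the small `eps` guard are assumptions made for the example.

```python
import numpy as np

def _standardize(raw_score, reference_scores, eps=1e-12):
    """Shift and scale a score by the mean and std of reference scores."""
    ref = np.asarray(reference_scores, dtype=float)
    # eps (assumed guard) avoids division by zero for constant references.
    return (raw_score - ref.mean()) / max(ref.std(), eps)

def t_norm(raw_score, cohort_scores):
    """T-norm: cohort_scores are the scores of the *test sample* against a
    pool of cohort models, so the statistics follow the test condition."""
    return _standardize(raw_score, cohort_scores)

def z_norm(raw_score, impostor_scores):
    """Z-norm: impostor_scores are scores of impostor samples against the
    *claimed reference model*, typically computed offline per user."""
    return _standardize(raw_score, impostor_scores)
```

The arithmetic is identical in both cases; what differs is where the reference scores come from, which is why T-norm tracks test-time signal quality while Z-norm tracks differences between reference models.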
Kernel Methods for Text-Independent Speaker Verification
- 2010
Cited by 4 (0 self)
In recent years, systems based on support vector machines (SVMs) have become standard for speaker verification (SV) tasks. An important aspect of these systems is the dynamic kernel. These operate on sequence data and handle the dynamic nature of the speech. In this thesis a number of techniques are proposed for improving dynamic kernel-based SV systems.
The first contribution of this thesis is the development of alternative forms of dynamic kernel. Several popular dynamic kernels proposed for SV are based on the Kullback-Leibler divergence between Gaussian mixture models. Since this has no closed-form solution, typically a matched-pair upper bound is used instead. This places significant restrictions on the forms of model structure that may be used. In this thesis, dynamic kernels are proposed based on alternative, variational approximations to the divergence. Unlike standard approaches, these allow the use of a more flexible modelling framework. Also, using a more accurate approximation may lead to performance gains.
The second contribution of this thesis is to investigate the combination of multiple systems to improve SV performance. Typically, systems are combined by fusing the output scores. For SVM classifiers, an alternative strategy is to combine at the kernel level. Recently an efficient maximum-margin scheme for learning kernel weights has been developed. In this thesis several modifications are proposed to allow this scheme to be applied to SV tasks.
System combination will only lead to gains when the kernels are complementary. In this thesis it is shown that many commonly used dynamic kernels can be placed into one of two broad classes, derivative and parametric kernels. The attributes of these classes are contrasted and the conditions under which the two forms of kernel are identical are described. By avoiding these conditions gains may be obtained by combining derivative and parametric kernels.
The final contribution of this thesis is to investigate the combination of dynamic kernels with traditional static kernels for vector data. Here two general combination strategies are available: static kernel functions may be defined over the dynamic feature vectors. Alternatively, a static kernel may be applied at the observation level. In general, it is not possible to explicitly train a model in the feature space associated with a static kernel. However, it is shown in this thesis that this form of kernel can be computed by using a suitable metric with approximate component posteriors. Generalised versions of standard parametric and derivative kernels, that include an observation-level static kernel, are proposed based on this approach.
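The matched-pair upper bound mentioned in this abstract can be illustrated for one common setting. The sketch assumes that both GMMs share the same component weights and diagonal covariances (for example, two models mean-adapted from one UBM), so that only the means differ. Under that assumption the bound reduces to a weighted sum of per-component Gaussian divergences. The function name and argument layout are hypothetical, and the thesis's variational approximations are not shown.

```python
import numpy as np

def matched_pair_kl_bound(weights, means_a, means_b, variances):
    """Upper bound on KL(GMM_a || GMM_b) by pairing component c with c.

    Assumes (for this sketch) shared weights and shared diagonal
    covariances, so each paired Gaussian KL is a Mahalanobis term.

    weights:   (C,)   shared component weights, summing to 1
    means_a:   (C, D) component means of the first GMM
    means_b:   (C, D) component means of the second GMM
    variances: (C, D) shared diagonal covariance entries
    """
    diff = np.asarray(means_a, dtype=float) - np.asarray(means_b, dtype=float)
    # KL between two Gaussians with equal covariance:
    # 0.5 * (mu_a - mu_b)^T Sigma^{-1} (mu_a - mu_b)
    per_component = 0.5 * np.sum(diff ** 2 / np.asarray(variances), axis=1)
    return float(np.dot(weights, per_component))
```

The one-to-one pairing of components is what restricts the model structure: both models need the same number of components in a fixed correspondence.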
User-Specific Cohort Selection and Score Normalization for Biometric Systems
- IEEE Transactions on Information Forensics and Security, 2012
Cited by 4 (3 self)
An increasing body of evidence suggests that cohort-based score normalization can improve the performance of biometric authentication. This approach relies on the use of cohort biometric templates, which can be computationally expensive. We contribute to the advancement of cohort score normalization in two ways. First, we show both theoretically and empirically that the most similar and the most dissimilar cohort templates to a target user contain discriminative information. We then investigate the extraction of this information using polynomial regression. Extensive evaluation on the face and fingerprint modalities in the Biosecure DS2 dataset indicates that the proposed method outperforms the state-of-the-art cohort score normalization methods, while reducing the computation cost by as much as half. Index Terms: Biometric authentication, cohort-based score normalization, discriminative cohort, ordered cohort selection.
A new speaker identification algorithm for gaming scenarios
- 2007
Cited by 2 (0 self)
Speaker identification is a well-established research problem but has not been a major application used in gaming scenarios. In this paper, we propose a new algorithm for the open-set, text-independent, speaker ID problem, applied as an important component (among other cues) of a game player identification system. This scenario poses new challenges: far-field, limited training and very short test data, and almost real-time processing. To tackle this, we introduce new and more informative feature sets. The scores given by these feature sets are then combined in an optimal way to construct the final score. Experimental results on the gaming device’s processed reverberated-speech show the effectiveness of the new features, and that reliable decisions can be made after very short (2-5 second) test utterances required by the gaming scheme. Index Terms: acoustic arrays, games, speaker recognition
Cohort-based speaker model synthesis for channel robust speaker recognition
- Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2006
Cited by 1 (0 self)
Speaker recognition over a public telephone network involves various types of transmission channels and handsets, which leads to mismatched channels (between the enrolled models and the test utterances), and hence to a significant decline in the speaker recognition performance. In this paper a cohort-based speaker model synthesis algorithm, which aims at synthesizing speaker models for channels where no enrollment data is available, is proposed. This algorithm applies a priori knowledge of channels extracted from speaker-specific cohort sets to synthesize speaker models. Results for the China Criminal Police College (CCPC) speaker recognition corpus, which contains utterances from both a landline and a mobile channel, show significant improvements over the HT-Norm and UBM-based speaker model synthesis algorithms.