An overview of text-independent speaker recognition: from features to supervectors
, 2009
Cited by 156 (37 self)
This paper gives an overview of automatic speaker recognition technology, with an emphasis on text-independent recognition. Speaker recognition has been studied actively for several decades. We give an overview of both the classical and the state-of-the-art methods. We start with the fundamentals of automatic speaker recognition, concerning feature extraction and speaker modeling. We elaborate advanced computational techniques to address robustness and session variability. The recent progress from vectors towards supervectors opens up a new area of exploration and represents a technology trend. We also provide an overview of this recent development and discuss the evaluation methodology of speaker recognition systems. We conclude the paper with discussion on future directions.
Language recognition via i-vectors and dimensionality reduction
- in Interspeech
, 2011
Cited by 35 (4 self)
In this paper, a new language identification system is presented based on the total variability approach previously developed in the field of speaker identification. Various techniques are employed to extract the most salient features in the lower-dimensional i-vector space, and the resulting system achieves excellent performance on the 2009 LRE evaluation set without the need for any post-processing or backend techniques. Additional performance gains are observed when the system is combined with other acoustic systems.
Temporally weighted linear prediction features for tackling additive noise in speaker verification
, 2010
Cited by 26 (19 self)
We consider text-independent speaker verification under additive noise corruption. In the popular mel-frequency cepstral coefficient (MFCC) front-end, we replace the conventional Fourier-based spectrum estimation with weighted linear predictive methods, which have earlier shown success in noise-robust speech recognition. We introduce two temporally weighted variants of linear predictive (LP) modeling to speaker verification and compare them to the FFT, which is normally used in computing MFCCs, and to conventional LP. We also investigate the effect of speech enhancement (spectral subtraction) on system performance with each of the four feature representations. Our experiments on the NIST 2002 SRE corpus indicate that the accuracies of the conventional and proposed features are close to each other on clean data. At the 0 dB SNR level, the baseline FFT and the better of the proposed features give EERs of 17.4 % and 15.6 %, respectively. These improve to 11.6 % and 11.2 %, respectively, when spectral subtraction is included as a pre-processing method. The new features hold promise for noise-robust speaker verification.
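The equal error rate (EER) quoted in several of these abstracts is the operating point at which the false acceptance rate equals the false rejection rate. The following is a minimal sketch of one way to approximate it from trial scores, not the scoring procedure used in any of the listed papers. The function name `equal_error_rate` and the toy scores are hypothetical, and the sketch assumes that a higher score means stronger support for the target speaker.

```python
def equal_error_rate(target_scores, impostor_scores):
    """Approximate the EER (in [0, 1]) by using every observed score as a threshold."""
    thresholds = sorted(set(target_scores) | set(impostor_scores))
    best_gap, best_eer = None, None
    for t in thresholds:
        # A trial is accepted when its score is >= t (assumed convention).
        far = sum(s >= t for s in impostor_scores) / len(impostor_scores)
        frr = sum(s < t for s in target_scores) / len(target_scores)
        gap = abs(far - frr)
        # Keep the threshold where the two error rates are closest.
        if best_gap is None or gap < best_gap:
            best_gap, best_eer = gap, (far + frr) / 2
    return best_eer


# Toy scores, invented for illustration only.
targets = [2.1, 1.7, 0.9, 0.4, 1.2]
impostors = [-0.5, 0.1, 0.6, 1.0, -1.3]
eer = equal_error_rate(targets, impostors)
print(f"EER = {100 * eer:.1f} %")
```

Evaluation toolkits usually interpolate between neighbouring thresholds; this sketch simply takes the closest observed point, which is adequate for illustrating the definition.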
ALIZE/SpkDet: a state-of-the-art open source software for speaker recognition
Cited by 25 (7 self)
This paper presents the ALIZE/SpkDet open source software packages for text-independent speaker recognition. This software is based on the well-known UBM/GMM approach. It also includes the latest speaker recognition developments such as Latent Factor Analysis (LFA) and unsupervised adaptation. Discriminant classifiers such as SVM supervectors are also provided, linked with the Nuisance Attribute Projection (NAP). The software performance is demonstrated within the framework of the NIST’06 SRE evaluation campaign. Several other applications like speaker diarization, embedded speaker recognition, password-dependent speaker recognition and pathological voice assessment are also presented.
Vulnerability of Speaker Verification Systems Against Voice Conversion Spoofing Attacks: the Case of Telephone Speech
Cited by 23 (10 self)
Voice conversion, the methodology of automatically converting one’s utterances to sound as if spoken by another speaker, presents a threat for applications relying on speaker verification. We study the vulnerability of text-independent speaker verification systems against voice conversion attacks using telephone speech. We implemented a voice conversion system with two types of features and nonparallel frame alignment methods, and five speaker verification systems ranging from simple Gaussian mixture models (GMMs) to a state-of-the-art joint factor analysis (JFA) recognizer. Experiments on a subset of the NIST 2006 SRE corpus indicate that the JFA method is the most resilient against conversion attacks. Even so, it experiences a more than 5-fold increase in the false acceptance rate, from 3.24 % to 17.33 %. Index Terms: speaker verification, voice conversion, security
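As a quick check of the reported figures, the relative and absolute change in false acceptance rate can be computed directly from the two percentages given in the abstract. The variable names below are illustrative only.

```python
# False acceptance rates (in percent) as reported in the abstract.
far_baseline = 3.24
far_under_attack = 17.33

fold_increase = far_under_attack / far_baseline      # relative change
absolute_increase = far_under_attack - far_baseline  # percentage points

print(f"{fold_increase:.2f}-fold increase")          # prints "5.35-fold increase"
print(f"{absolute_increase:.2f} percentage points")  # prints "14.09 percentage points"
```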
State-of-the-Art Performance in Text-Independent Speaker Verification through Open-Source Software
- IEEE Transactions on Audio, Speech and Language Processing
, 2007
Cited by 21 (3 self)
This paper illustrates an evolution in state-of-the-art speaker verification by highlighting the contribution from newly developed techniques. Starting from a baseline system based on Gaussian mixture models that reached state-of-the-art performance during the NIST’04 SRE, final systems with new intersession compensation techniques show a relative gain of around 50%. This work highlights that a key element in recent improvements is still the classical maximum a posteriori (MAP) adaptation, while the latest compensation methods have a crucial impact on overall performance. Nuisance attribute projection (NAP) and factor analysis (FA) are examined and shown to provide significant improvements. For FA, a new symmetrical scoring (SFA) approach is proposed. We also show further improvement with an original combination of a support vector machine and SFA. This work is undertaken through the open-source ALIZE toolkit. Index Terms: channel compensation, factor analysis, nuisance attribute projection, speaker verification.
Augmented Statistical Models for Classifying Sequence Data
, 2006
Cited by 21 (0 self)
Declaration: This dissertation is the result of my own work and includes nothing that is the outcome of work done in collaboration. It has not been submitted in whole or in part for a degree at any other university. Some of the work has been published previously in conference proceedings [66,69], two journal articles [36,68], two workshop papers [35,67] and a technical report [65]. The length of this thesis including appendices, bibliography, footnotes, tables and equations is approximately 60,000 words. This thesis contains 27 figures and 20 tables.
Combining Derivative and Parametric Kernels for Speaker Verification
, 2007
Cited by 18 (1 self)
Support Vector Machine-based speaker verification (SV) has become a standard approach in recent years. These systems typically use dynamic kernels to handle the dynamic nature of the speech utterances. This paper shows that many of these kernels fall into one of two general classes, derivative and parametric kernels. The attributes of these classes are contrasted and the conditions under which the two forms of kernel are identical are described. By avoiding these conditions gains may be obtained by combining derivative and parametric kernels. One combination strategy is to combine at the kernel level. This paper describes a maximum-margin based scheme for learning kernel weights for the SV task. Various dynamic kernels and combinations were evaluated on the NIST 2002 SRE task, including derivative and parametric kernels based upon different model structures. The best overall performance was 7.78 % EER achieved when combining five kernels.
A covariance kernel for SVM language recognition
- in Proc. of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Las Vegas
, 2008
Cited by 16 (3 self)
Discriminative training for language recognition has been a key tool for improving system performance. In addition, recognition directly from shifted-delta cepstral features has proven effective. A recent successful example of this paradigm is SVM-based discrimination of languages based on GMM mean supervectors (GSVs). GSVs are created through MAP adaptation of a universal background model (UBM) GMM. This work proposes a novel extension to this idea by extending the supervector framework to the covariances of the UBM. We demonstrate a new SVM kernel including this covariance structure. In addition, we propose a method for pushing SVM model parameters back to GMM models. These GMM models can be used as an alternate form of scoring. The new approach is demonstrated on a fourteen-language task with substantial performance improvements over prior techniques. Index Terms: language recognition, support vector machines
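The GMM mean supervector mentioned in this abstract can be illustrated with the classical relevance-MAP update of the UBM means, followed by stacking the adapted means into one vector. The sketch below is a simplified illustration under stated assumptions and is not the implementation from the paper: it uses diagonal covariances, adapts means only, and takes a relevance factor of 16 as an assumed default. It also omits the covariance extension and any kernel normalisation. All function names are hypothetical.

```python
import math


def gaussian_logpdf(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at frame x."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))


def map_mean_supervector(frames, weights, means, variances, relevance=16.0):
    """Adapt UBM means to the frames by relevance MAP and stack them.

    relevance must be > 0 (16.0 is an assumed value, not taken from the paper).
    """
    K, D = len(means), len(means[0])
    n = [0.0] * K                          # zeroth-order statistics
    first = [[0.0] * D for _ in range(K)]  # first-order statistics
    for x in frames:
        logp = [math.log(w) + gaussian_logpdf(x, m, v)
                for w, m, v in zip(weights, means, variances)]
        peak = max(logp)                   # subtract the max for stability
        post = [math.exp(l - peak) for l in logp]
        total = sum(post)
        for k in range(K):
            gamma = post[k] / total        # posterior of component k
            n[k] += gamma
            for d in range(D):
                first[k][d] += gamma * x[d]
    supervector = []
    for k in range(K):
        alpha = n[k] / (n[k] + relevance)  # data-dependent adaptation weight
        for d in range(D):
            data_mean = first[k][d] / n[k] if n[k] > 0 else means[k][d]
            supervector.append(alpha * data_mean + (1 - alpha) * means[k][d])
    return supervector


# Toy two-component UBM in two dimensions, invented for illustration.
ubm_weights = [0.5, 0.5]
ubm_means = [[0.0, 0.0], [10.0, 10.0]]
ubm_vars = [[1.0, 1.0], [1.0, 1.0]]
utterance = [[1.0, 1.0]] * 16
sv = map_mean_supervector(utterance, ubm_weights, ubm_means, ubm_vars)
```

With 16 frames close to the first component and a relevance factor of 16, the first adapted mean moves about halfway from the UBM mean towards the data, while the second component, which receives almost no frames, stays at its UBM value.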
A kernel trick for sequences applied to text-independent speaker verification systems
- Pattern Recognition, 2007. IDIAP-RR
Cited by 15 (1 self)
To appear in Pattern Recognition. This paper presents a principled SVM-based speaker verification system. We propose a new framework and a new sequence kernel that can make use of any Mercer kernel at the frame level. An extension of the sequence kernel based on the Max operator is also proposed. The new system is compared to state-of-the-art GMM and other SVM-based systems found in the literature on the Banca and Polyvar databases. In most cases, the new system outperforms the other systems by a statistically significant margin. Finally, the proposed framework clarifies previous SVM-based systems and suggests interesting future research directions. IDIAP–RR 05-77