Applying Articulatory Features To Telephone-Based Speaker Verification
- in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing
, 2004
Cited by 5 (4 self)
This paper presents an approach that uses articulatory features (AFs) derived from spectral features for telephone-based speaker verification. To minimize the acoustic mismatch caused by different handsets, handset-specific normalization is applied to the spectral features before the AFs are extracted. Experimental results based on 150 speakers using 10 different handsets show that AFs contain useful speaker-specific information for speaker verification and that handset-specific normalization significantly lowers the error rates under handset-mismatched conditions. Results also demonstrate that fusing the scores obtained from an AF-based system with those obtained from a spectral feature-based (MFCC) system helps lower the error rates of the individual systems.
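The score fusion mentioned in this abstract can be illustrated with a weighted sum of the two system scores. This is only a minimal sketch under stated assumptions: the abstract does not give the fusion rule, so the linear form, the function names `fuse_scores` and `accept`, and the weight `w_af` are hypothetical.

```python
def fuse_scores(af_score: float, mfcc_score: float, w_af: float = 0.5) -> float:
    """Linearly combine an AF-based score and an MFCC-based score.

    The linear rule and the default weight are assumptions for illustration;
    the abstract does not specify how the scores are fused.
    """
    if not 0.0 <= w_af <= 1.0:
        raise ValueError("w_af must lie in [0, 1]")
    return w_af * af_score + (1.0 - w_af) * mfcc_score


def accept(fused_score: float, threshold: float) -> bool:
    """Accept the claimant when the fused score reaches the decision threshold."""
    return fused_score >= threshold
```

In practice the weight and the threshold would be tuned on held-out data; setting `w_af` to 0 or 1 recovers either individual system.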
Adaptive Decision Fusion for Multi-Sample Speaker Verification over GSM Networks
- in Eurospeech’03
, 2003
Cited by 3 (3 self)
In speaker verification, a claimant may produce two or more utterances. In our previous study [1], we proposed to compute the optimal weights for fusing the scores of these utterances based on their score distribution and our prior knowledge about the score statistics estimated from the mean scores of the corresponding client speaker and some pseudo-impostors during enrollment. As the fusion weights depend on the prior scores, in this paper, we propose to adapt the prior scores during verification based on the likelihood of the claimant being an impostor. To this end, a pseudo-impostor GMM score model is created for each speaker. During verification, the claimant's scores are fed to the score model to obtain a likelihood for adapting the prior score. Experimental results based on the GSM-transcoded speech of 150 speakers from the HTIMIT corpus demonstrate that the proposed prior score adaptation approach provides a relative error reduction of 15% when compared with our previous approach, in which the prior scores are non-adaptive.
A New Adaptation Method for Speaker-Model Creation in High-Level Speaker Verification
Cited by 3 (3 self)
Research has shown that speaker verification based on high-level speaker features requires long enrollment utterances to be reliable. However, in practical speaker verification, it is common to model speakers based on a limited amount of enrollment data. To minimize the undesirable effect of insufficient enrollment data on system performance, this paper proposes a new adaptation method for creating speaker models based on high-level features. Unlike conventional methods, the proposed adaptation method adapts not only the phoneme-dependent background model but also the phoneme-independent speaker model. The amount of adaptation in the latter is adjusted by a proportional factor derived from the phoneme-independent background models. The proposed method was compared with traditional MAP adaptation under the NIST2000 SRE framework. Experimental results show that the proposed method can solve the data-sparseness problem effectively and achieves better performance than traditional MAP adaptation.
Databases For Speaker Recognition: Activities In COST250 Working Group 2
- in Proceedings COST250 Workshop on Speaker Recognition in Telephony
, 1999
Cited by 2 (0 self)
Working Group (WG) 2 of the COST250 Action "Speaker Recognition in Telephony" has dealt with databases for speaker recognition. The present final report gives an overview of the activities in this WG, and presents its main results. The first result is an overview of 36 existing databases that have been used in speaker recognition research. These include both public and proprietary databases. As part of the overview, some of the variability represented in those databases is analyzed. The second result is the publicly available Polycost database, a telephony-speech multi-session database with 134 speakers from all around Europe. Together with pre-defined experiment specifications, this database is a useful resource to aid in the assessment of speaker recognition systems in general, and in comparing systems across sites in particular.
Speaker Verification Using Adapted Articulatory Feature-based Conditional Pronunciation Modeling
- in Proc. ICASSP 2005
, 2005
Cited by 2 (1 self)
This paper proposes a speaker verification system based on articulatory feature-based conditional pronunciation modeling (AFCPM). The system captures the pronunciation characteristics of speakers by modeling the linkage between the actual phones produced by the speakers and the states of articulation during speech production. The speaker models, which consist of conditional probabilities of two articulatory classes, are adapted from a set of universal background models (UBMs) via MAP adaptation. This creates a direct coupling between the speaker and background models, which prevents over-fitting the speaker models when the amount of speaker data is limited. Experimental results demonstrate that MAP adaptation not only enhances the discriminative power of the speaker models but also improves their robustness against handset mismatches. Results also show that fusing the scores derived from an AFCPM-based system and a conventional spectral-based system achieves an error rate that is significantly lower than that achievable by the individual systems. This suggests that AFCPM and spectral features are complementary to each other.
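The coupling between speaker and background models that MAP adaptation provides can be illustrated for a discrete probability table. The sketch below uses the generic relevance-factor form of MAP adaptation, which is an assumption: the abstract does not give the adaptation formula, and the function name `map_adapt`, the `relevance` parameter, and its default value are hypothetical.

```python
from typing import List, Sequence


def map_adapt(
    ubm_probs: Sequence[float],
    speaker_counts: Sequence[float],
    relevance: float = 16.0,
) -> List[float]:
    """Interpolate speaker relative frequencies with background probabilities.

    Generic relevance-factor MAP sketch (an assumption, not the paper's
    formula): the more speaker data there is, the more the result moves away
    from the background model.
    """
    if len(ubm_probs) != len(speaker_counts):
        raise ValueError("ubm_probs and speaker_counts must have equal length")
    total = float(sum(speaker_counts))
    if total == 0.0:
        # No speaker data: fall back to the background model unchanged.
        return list(ubm_probs)
    alpha = total / (total + relevance)  # data-dependent adaptation weight
    return [
        alpha * (count / total) + (1.0 - alpha) * prior
        for prior, count in zip(ubm_probs, speaker_counts)
    ]
```

With little speaker data the adaptation weight stays near zero, so the adapted table remains close to the background model; this is the sense in which the coupling guards against over-fitting.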
Enhancing GMM Scores Using SVM "Hints"
, 2001
Cited by 2 (0 self)
This paper proposes a classification scheme that combines statistical models and support vector machines. It exploits the fact (observed in [1]) that GMM and SVM classifiers with roughly the same level of performance produce uncorrelated errors. We describe a novel scheme which employs an SVM classifier as an "advisor" to the GMM classifier in uncertain cases. The utility of the combined generative/discriminative approach is demonstrated on standard text-independent speaker verification and speaker identification tasks in matched and mismatched training and test conditions. Results indicate significant improvements in performance without much computational overhead.
Cluster-Dependent Feature Transformation for Telephone-Based Speaker Verification
- In: Proc. International Conference on Audio- and Video-Based Biometric Person Authentication (AVBPA’03)
, 2003
Cited by 2 (2 self)
This paper presents a cluster-based feature transformation technique for telephone-based speaker verification when labels of the handset types are not available during the training phase. The technique combines a cluster selector with cluster-dependent feature transformations to reduce the acoustic mismatches among different handsets. Specifically, a...
Eigen-Prosody Analysis for Robust Speaker Recognition under Mismatch Handset Environment
- In: Proceedings of the International Conference on Spoken Language Processing (ICSLP ’04), Jeju Island, South Korea
, 2004
Cited by 2 (0 self)
Most speaker recognition systems utilize only low-level short-term spectral features and ignore high-level long-term information, such as prosody and speaking style. This paper presents a novel eigen-prosody analysis (EPA) approach to capture long-term prosodic information of a speaker for robust speaker recognition under mismatched environments. It converts the prosodic feature contours of a speaker’s speech into sequences of prosody symbols, and then transforms the speaker recognition problem into a task similar to full-text document retrieval. Experimental results on the well-known HTIMIT database show that, even when only a small amount of training/test data is available, a remarkable improvement, a relative error rate reduction of about 28.7% compared with the GMM/cepstral mean subtraction (CMS) baseline, can be achieved.
Optimized Discriminative Kernel for SVM Scoring and its Application to Speaker Verification
, 2010
Cited by 2 (1 self)
The decision-making process of many binary classification systems is based on the likelihood-ratio (LR) scores of test patterns. This paper shows that LR scores can be expressed in terms of the similarity between the supervectors formed by stacking the mean vectors of Gaussian mixture models corresponding to the test patterns, the target model, and the background model. By interpreting the SVM kernels as a specific similarity (or discriminant) function between supervectors, this paper shows that LR scoring is a special case of SVM scoring and that most sequence kernels can be obtained by assuming a specific form for the similarity function of supervectors. The paper further shows that this assumption can be relaxed to derive a new general kernel. The kernel function is general in that it is a linear combination of any kernels belonging to the reproducing kernel Hilbert space. The combination weights are obtained by optimizing the ability of a discriminant function to separate the positive and negative classes using either regression analysis or SVM training. The idea was applied to both high- and low-level speaker verification. In both cases, results show that the proposed kernels achieve better performance than several state-of-the-art sequence kernels. Further performance enhancement was also observed when the high-level scores were combined with acoustic scores.
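The idea of a kernel formed as a linear combination of base kernels can be sketched as follows. This is an illustration under stated assumptions rather than the paper's kernel: the base kernels (`linear_kernel`, `rbf_kernel`), the helper `combine_kernels`, and the fixed weights are hypothetical, whereas the abstract describes weights obtained by regression analysis or SVM training.

```python
import math
from typing import Callable, Sequence

Vector = Sequence[float]
Kernel = Callable[[Vector, Vector], float]


def linear_kernel(x: Vector, y: Vector) -> float:
    """Inner product of two vectors (for example, two supervectors)."""
    return sum(a * b for a, b in zip(x, y))


def rbf_kernel(x: Vector, y: Vector, gamma: float = 1.0) -> float:
    """Gaussian (RBF) kernel; gamma is a hypothetical width parameter."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


def combine_kernels(kernels: Sequence[Kernel], weights: Sequence[float]) -> Kernel:
    """Return K(x, y) = sum_i w_i * K_i(x, y) for the given base kernels."""
    if len(kernels) != len(weights):
        raise ValueError("kernels and weights must have equal length")

    def combined(x: Vector, y: Vector) -> float:
        return sum(w * k(x, y) for w, k in zip(weights, kernels))

    return combined
```

For example, `combine_kernels([linear_kernel, rbf_kernel], [0.5, 2.0])` returns a function usable wherever a single kernel is expected. Keeping the weights non-negative guarantees that the combination is again a valid (positive semi-definite) kernel.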
PROSODY MODELING AND EIGEN-PROSODY ANALYSIS FOR ROBUST SPEAKER RECOGNITION
Cited by 1 (0 self)
Unseen handset mismatch and limited training/test data are the major sources of performance degradation for speaker identification in telecommunication environments. In this paper, vector quantization (VQ)-based prosody modeling and eigen-prosody analysis (EPA) are integrated to transform the closed-set speaker identification problem into a task similar to full-text document retrieval. The prosody modeling labels the prosodic feature contours of a speaker’s speech as sequences of prosody states. EPA then constructs a compact eigen-prosody space to represent the constellation of speakers. Furthermore, EPA is fused with a lower-level a priori knowledge interpolation (AKI) handset distortion compensator so that the two complement each other. Experimental results on the HTIMIT database show that relative error rate reductions of about 41.0% and 32.8% for seen and unseen handsets, respectively, were achieved compared with the maximum a posteriori-adapted Gaussian mixture model/cepstral mean subtraction (MAP-GMM/CMS) baseline.

