Results 1 - 5 of 5
A K-Phoneme-Class based Multi-Model Method for Short Utterance Speaker Recognition
Cited by 2 (2 self)
For GMM-UBM based text-independent speaker recognition, performance decreases significantly when the test speech is too short. Because text information can help, a K-phoneme-class scoring based multiple phoneme class speaker model method (K-phoneme-class based multi-model method, abbreviated KPCMMM) is proposed. It consists of a phoneme class speech recognition stage and a phoneme class dependent multi-model speaker recognition stage, where K is the number of most likely phoneme classes used in the second stage. Two phoneme class definitions, expert-knowledge based and data-driven, are compared, and performance as a function of K is also studied. Experimental results show that the data-driven phoneme class definition outperforms the expert-knowledge based one, and that an appropriate value of K leads to much better performance. Compared with the baseline GMM-UBM system, the proposed KPCMMM achieves a relative equal error rate (EER) reduction of 38.60% for text-independent speaker recognition with less than 2 seconds of test speech.
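The two-stage idea in this abstract (pick the K most likely phoneme classes, then score against the matching class-dependent speaker models) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not say how the K class-dependent scores are combined, so a plain average is assumed, and the names `top_k_classes`, `kpc_score`, `class_posteriors` and `class_llr` are hypothetical.

```python
from typing import Dict, List


def top_k_classes(class_posteriors: Dict[str, float], k: int) -> List[str]:
    """Return the k phoneme classes that the first-stage recognizer rates highest."""
    if not 1 <= k <= len(class_posteriors):
        raise ValueError("k must be between 1 and the number of phoneme classes")
    ranked = sorted(class_posteriors, key=lambda c: class_posteriors[c], reverse=True)
    return ranked[:k]


def kpc_score(class_posteriors: Dict[str, float],
              class_llr: Dict[str, float],
              k: int) -> float:
    """Fuse class-dependent speaker scores over the k most likely classes.

    class_llr maps each phoneme class to the score of the test segment against
    the claimed speaker's model for that class (for example a log-likelihood
    ratio against the UBM). Averaging is an assumed fusion rule.
    """
    selected = top_k_classes(class_posteriors, k)
    return sum(class_llr[c] for c in selected) / len(selected)
```

With K equal to 1 only the single best class model is used; with K equal to the number of classes every class model contributes, which is one way to see why performance can depend on the choice of K.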
A fishervoice based feature fusion method for short utterance speaker recognition
- IEEE China Summit and International Conference on Signal and Information Processing (ChinaSIP), 2013
Cited by 1 (1 self)
For GMM-UBM based text-independent speaker recognition, performance decreases significantly when the utterance is too short, mostly because a single kind of feature carries too little discriminative information. Fusing different features and then applying dimensionality reduction has proved to be a useful solution. However, some fusion methods based on traditional Linear Discriminant Analysis (LDA) may suffer from the singular matrix problem. Therefore, a Fishervoice based feature fusion method that combines Principal Component Analysis (PCA) with LDA is proposed, in which several commonly used features, such as MFCC, PLAR and LPCC, are concatenated and then projected into a lower-dimensional subspace. Compared with baseline GMM-UBM systems that use any single feature and with the LDA based fusion method, the proposed method effectively reduces the equal error rate and gives the best performance for text-independent speaker recognition on utterances as short as about 2 seconds. Index Terms: Short utterance speaker recognition,
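The PCA-then-LDA projection mentioned in this abstract can be illustrated with a small NumPy sketch. It is not the paper's Fishervoice pipeline: it only shows the generic idea that reducing concatenated features with PCA first keeps the within-class scatter matrix invertible before LDA is applied. The function name `pca_lda_projection` and the parameters `pca_dim` and `lda_dim` are assumptions made for this example.

```python
import numpy as np


def pca_lda_projection(X, y, pca_dim, lda_dim):
    """Learn a PCA-then-LDA projection for concatenated feature vectors.

    X: (n_samples, n_features) array, each row a concatenation of several
       feature types. y: (n_samples,) integer class (speaker) labels.
    Returns the training mean and an (n_features, lda_dim) projection matrix.
    """
    mean = X.mean(axis=0)
    Xc = X - mean

    # PCA: keep the leading right singular vectors of the centered data.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    W_pca = Vt[:pca_dim].T              # (n_features, pca_dim)
    Z = Xc @ W_pca                      # data in the PCA subspace (zero mean)

    # Within-class and between-class scatter in the PCA subspace.
    Sw = np.zeros((pca_dim, pca_dim))
    Sb = np.zeros((pca_dim, pca_dim))
    for c in np.unique(y):
        Zc = Z[y == c]
        mu_c = Zc.mean(axis=0)
        Sw += (Zc - mu_c).T @ (Zc - mu_c)
        Sb += len(Zc) * np.outer(mu_c, mu_c)  # global mean of Z is zero

    # LDA: leading eigenvectors of Sw^{-1} Sb. Sw is invertible here only
    # because pca_dim was chosen small enough (assumed by the caller).
    evals, evecs = np.linalg.eig(np.linalg.solve(Sw, Sb))
    order = np.argsort(evals.real)[::-1][:lda_dim]
    W_lda = evecs[:, order].real        # (pca_dim, lda_dim)

    return mean, W_pca @ W_lda
```

A new feature vector `x` would then be projected as `(x - mean) @ W`. Choosing `pca_dim` no larger than the number of training samples minus the number of classes is the usual way to keep `Sw` nonsingular in this kind of two-step scheme.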
Confidence Measure by Substring Comparison for Automatic Speech Recognition
, Gang Wang
The length of the test speech greatly influences the performance of GMM-UBM based text-independent speaker recognition systems. For example, when the valid speech is as short as 1 to 5 seconds, performance decreases significantly, because the GMM-UBM method is a statistical one that depends on sufficient data. Considering that text information can help speaker recognition, a multi-model method is proposed to improve short-utterance speaker recognition (SUSR) in Chinese. A few phoneme class models are built for each speaker to represent different parts of the characteristic space, and the scores of the test data on these models are fused, with the aim of increasing the degree of matching between the training models and the test utterance. Experimental results show that the proposed method achieves a relative EER reduction of about 26% compared with the traditional GMM-UBM method.
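For readers unfamiliar with the metric, the sketch below shows one simple way to estimate an equal error rate from trial scores and to express an improvement as a relative reduction. The threshold sweep is a coarse approximation chosen for brevity, the function names are hypothetical, and any numbers passed in are illustrative inputs rather than results from the paper.

```python
def equal_error_rate(target_scores, impostor_scores):
    """Approximate the EER by sweeping a threshold over all observed scores.

    A trial is accepted when its score is >= the threshold. The EER is taken
    at the threshold where false acceptance and false rejection rates are
    closest, as the mean of the two rates.
    """
    best_gap, eer = None, None
    for t in sorted(set(target_scores) | set(impostor_scores)):
        frr = sum(s < t for s in target_scores) / len(target_scores)
        far = sum(s >= t for s in impostor_scores) / len(impostor_scores)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer


def relative_reduction(baseline_eer, new_eer):
    """Relative EER reduction in percent: 100 * (baseline - new) / baseline."""
    return 100.0 * (baseline_eer - new_eer) / baseline_eer
```

A relative reduction of about 26% therefore means the new EER is roughly 0.74 times the baseline EER, whatever the absolute baseline value is.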
An Overview of Robustness Related Issues in Speaker Recognition
Speaker recognition technologies have improved rapidly in recent years. However, critical robustness issues need to be addressed when they are applied in practical situations. This paper provides an overview of technologies that deal with robustness related issues in automatic speaker recognition. We first divide the robustness issues into three categories: environment-related, speaker-related and application-oriented. For each category, we then describe the current hot topics, existing technologies, and potential future research directions.