Robust Speaker Verification From GSM-Transcoded Speech Based On Decision Fusion And Feature Transformation
- in Proc. IEEE ICASSP’03, 2003
Cited by 9 (6 self)
In speaker verification, a claimant may produce two or more utterances. Typically, the scores of the speech patterns extracted from these utterances are averaged and the resulting mean score is compared with a decision threshold. Rather than simply computing the mean score, we propose to compute the optimal weights for fusing the scores based on the score distribution of the independent utterances and our prior knowledge about the score statistics. More specifically, we use enrollment data to compute the mean scores of client speakers and impostors and consider them to be the prior scores. During verification, we set the fusion weights for individual speech patterns to be a function of the dispersion between the scores of these speech patterns and the prior scores. Experimental results based on the GSM-transcoded speech of 150 speakers from the HTIMIT corpus demonstrate that the proposed fusion algorithm can increase the dispersion between the mean speaker scores and the mean impostor scores. Compared with a baseline approach where equal weights are assigned to all scores, the proposed approach provides a relative error reduction of 19%.
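The abstract above does not give the weighting formula, so the following is only a minimal sketch of the general idea of data-dependent score fusion. It assumes a single prior score taken as the midpoint of the enrollment client and impostor means, and a hypothetical weighting rule in which each weight grows with the squared deviation of a score from that prior. The names `fuse_scores`, `alpha`, and all numeric values are illustrative assumptions, not the authors' method.

```python
import math

def fuse_scores(scores, prior_score, alpha=1.0):
    """Fuse utterance-level scores with data-dependent weights.

    Assumed rule (not taken from the paper): the weight of each score is
    proportional to exp(alpha * (score - prior_score)**2), so scores that
    lie far from the prior dominate the fused score.
    """
    raw = [math.exp(alpha * (s - prior_score) ** 2) for s in scores]
    total = sum(raw)
    weights = [r / total for r in raw]
    fused = sum(w * s for w, s in zip(weights, scores))
    return fused, weights

# Hypothetical enrollment statistics (assumed values).
client_mean, impostor_mean = 1.0, -1.0
prior = 0.5 * (client_mean + impostor_mean)

client_scores = [0.2, 1.5, 0.9]
impostor_scores = [-0.2, -1.5, -0.9]

fused_client, w_client = fuse_scores(client_scores, prior)
fused_impostor, w_impostor = fuse_scores(impostor_scores, prior)
```

Under this assumed rule, the fused client score lies above the plain mean of the client scores and the fused impostor score lies below the plain mean of the impostor scores, which is one way the dispersion between the two classes could grow. Setting `alpha=0` recovers equal-weight averaging.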
Applying Articulatory Features To Telephone-Based Speaker Verification
- in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing, 2004
Cited by 8 (4 self)
This paper presents an approach that uses articulatory features (AFs) derived from spectral features for telephone-based speaker verification. To minimize the acoustic mismatch caused by different handsets, handset-specific normalization is applied to the spectral features before the AFs are extracted. Experimental results based on 150 speakers using 10 different handsets show that AFs contain useful speaker-specific information for speaker verification and that the use of handset-specific normalization significantly lowers the error rates under handset-mismatched conditions. Results also demonstrate that fusing the scores obtained from an AF-based system with those obtained from a spectral feature-based (MFCC) system helps lower the error rates of the individual systems.
Adaptive Decision Fusion for Multi-Sample Speaker Verification over GSM Networks
- in Eurospeech’03, 2003
Cited by 4 (3 self)
In speaker verification, a claimant may produce two or more utterances. In our previous study [1], we proposed to compute the optimal weights for fusing the scores of these utterances based on their score distribution and our prior knowledge about the score statistics estimated from the mean scores of the corresponding client speaker and some pseudo-impostors during enrollment. As the fusion weights depend on the prior scores, in this paper, we propose to adapt the prior scores during verification based on the likelihood of the claimant being an impostor. To this end, a pseudo-impostor GMM score model is created for each speaker. During verification, the claimant's scores are fed to the score model to obtain a likelihood for adapting the prior score. Experimental results based on the GSM-transcoded speech of 150 speakers from the HTIMIT corpus demonstrate that the proposed prior score adaptation approach provides a relative error reduction of 15% when compared with our previous approach where the prior scores are non-adaptive.
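The abstract above states that a per-speaker GMM over pseudo-impostor scores yields a likelihood that drives the prior score adaptation, but it does not state the adaptation rule. The sketch below evaluates a one-dimensional Gaussian mixture on the claimant's scores and then applies a purely illustrative shift to the prior. The mixture parameters, `ref_density`, `max_shift`, and the direction of the shift are all assumptions made for the example.

```python
import math

def gmm_pdf(x, weights, means, variances):
    """Density of a one-dimensional Gaussian mixture evaluated at x."""
    return sum(
        w * math.exp(-0.5 * (x - m) ** 2 / v) / math.sqrt(2.0 * math.pi * v)
        for w, m, v in zip(weights, means, variances)
    )

def adapt_prior(base_prior, scores, gmm, max_shift=0.5, ref_density=1.0):
    """Shift the prior score according to the impostor likelihood.

    Assumed rule (not from the paper): the average impostor likelihood is
    squashed into (0, 1) and the prior is raised by at most max_shift.
    """
    weights, means, variances = gmm
    lik = sum(gmm_pdf(s, weights, means, variances) for s in scores) / len(scores)
    confidence = lik / (lik + ref_density)  # monotone map into (0, 1)
    return base_prior + max_shift * confidence

# Hypothetical pseudo-impostor score model: (weights, means, variances).
impostor_gmm = ([0.6, 0.4], [-1.0, -0.3], [0.2, 0.1])

prior_if_impostor_like = adapt_prior(0.0, [-0.9, -0.5], impostor_gmm)
prior_if_client_like = adapt_prior(0.0, [1.5, 2.0], impostor_gmm)
```

Scores that resemble the pseudo-impostor distribution move the prior more than scores that lie far from it, so any fusion weights computed from the prior would change accordingly.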
Multi-Sample Fusion with Constrained Feature Transformation for Robust Speaker Verification
Cited by 2 (0 self)
This paper proposes a single-source multi-sample fusion approach to text-independent speaker verification. In conventional speaker verification systems, the scores obtained from the claimant's utterances are averaged and the resulting mean score is used for decision making. Instead of using an equal weight for all scores, this paper proposes assigning a different weight to each score, where the weights are made dependent on the difference between the score values and a speaker-dependent reference score obtained during enrollment. Because the fusion weights depend on the verification scores, a technique called constrained stochastic feature transformation is applied to minimize the mismatch between enrollment and verification data in order to enhance the scores' reliability. Experimental results based on the 2001 NIST evaluation set show that the proposed fusion approach outperforms the equal-weight approach by 22% in terms of equal error rate and 16% in terms of minimum detection cost.
Extraction of Speaker Features from Different Stages of DSR Front-ends for Distributed Speaker Verification
2004
Cited by 1 (0 self)
The ETSI has recently published a front-end processing standard for distributed speech recognition systems. The key idea of the standard is to extract the spectral features of speech signals at the front-end terminals so that acoustic distortion caused by communication channels can be avoided. This paper investigates the effect of extracting spectral features from different stages of the front-end processing on the performance of distributed speaker verification systems. A technique that combines handset selectors with stochastic feature transformation is also employed in a back-end speaker verification system to reduce the acoustic mismatch between different handsets. Because the feature vectors obtained from the back-end server are vector quantized, the paper proposes two approaches to adding Gaussian noise to the quantized feature vectors for training the Gaussian mixture speaker models. In one approach, the variances of the Gaussian noise are made dependent on the codeword distance. In another approach, the variances are a function of the distance between some unquantized training vectors and their closest code vector. The HTIMIT corpus was ...
Correspondence should be sent to M.W. Mak, Dept. of Electronic and Information Engineering, The Hong Kong Polytechnic University, Hong Kong. Email: enmwmak@polyu.edu.hk. Tel: (852)27666257. Fax: (852)23628439.
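The first noise-injection approach mentioned in the abstract above, where the noise variance depends on the codeword distance, can be sketched as follows. The abstract does not define the exact dependence, so this example assumes that the noise standard deviation for a codeword is a fixed fraction (`scale`, a hypothetical parameter) of the Euclidean distance to its nearest neighbouring codeword. The codebook and indices are made-up data.

```python
import math
import random

def nearest_codeword_distance(codebook):
    """For each codeword, the Euclidean distance to its nearest other codeword."""
    return [
        min(math.dist(c, other) for j, other in enumerate(codebook) if j != i)
        for i, c in enumerate(codebook)
    ]

def add_noise(indices, codebook, scale=0.25, rng=None):
    """Replace each quantization index by its codeword plus Gaussian noise.

    Assumed rule (not from the paper): sigma = scale * distance to the
    nearest neighbouring codeword, applied independently per dimension.
    """
    rng = rng or random.Random(0)
    spacing = nearest_codeword_distance(codebook)
    noisy = []
    for idx in indices:
        sigma = scale * spacing[idx]
        noisy.append([x + rng.gauss(0.0, sigma) for x in codebook[idx]])
    return noisy

# Hypothetical 2-D codebook and a short sequence of quantization indices.
codebook = [[0.0, 0.0], [1.0, 0.0], [5.0, 0.0]]
indices = [0, 2, 1, 2]
training_vectors = add_noise(indices, codebook, rng=random.Random(1))
```

Codewords in sparse regions of the codebook receive wider noise than codewords in dense regions, which spreads the training vectors roughly in proportion to the size of each quantization cell.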
Multibiometric Security in Wireless Communication Systems
2010
This thesis explores an application of multibiometrics to secure wireless communications. The media studied were Wi-Fi, 3G, and WiMAX, over which simulations and experimental studies were carried out to assess performance. Specifically, access is restricted to authorized users by a technique referred to hereafter as a multibiometric cryptosystem. In brief, the system is built upon a complete challenge/response methodology in order to obtain a high level of security on the basis of user identification by fingerprint and further confirmation by verification of the user through text-dependent speaker recognition. First is the enrolment phase, in which a database of watermarked fingerprints with memorable texts, along with voice features based on the same texts, is created by sending them to the server through a wireless channel. Next is the verification stage, in which claimed users (those who claim to be genuine) are verified against the database; it consists of five steps. At the identification level, the user is first asked to present a fingerprint and a memorable word, the former being watermarked into the latter, in order for the system to authenticate the ...
Probabilistic Fusion of Sorted Score Sequences for Robust Speaker Verification
Fusion techniques have been widely used in multi-modal biometric authentication systems. While these techniques are mainly applied to combine the outputs of modality-dependent classifiers, they can also be applied to fuse the decisions or scores from a single modality. The idea is to consider the multiple samples extracted from a single modality as independent but coming from the same source. In this chapter, we propose a single-source, multi-sample data-dependent fusion algorithm for speaker verification. The algorithm is data-dependent in that the fusion weights are dependent on the verification scores and the prior score statistics of claimed speakers and background speakers. To obtain the best out of the speaker’s scores, scores from multiple utterances are sorted before they are probabilistically combined. Evaluations based on 150 speakers from a GSM-transcoded corpus are presented. Results show that data-dependent fusion of speaker’s scores is significantly better than the conventional score averaging approach. It was also found that the proposed fusion algorithm can be further enhanced by sorting the score sequences before they are probabilistically combined.
Multi-Sample Data-Dependent Fusion Of Sorted Score Sequences For
- in Proc. IEEE ICASSP’04, 2004
In many biometric systems, the scores of multiple samples (e.g. utterances) are averaged and the average score is compared against a decision threshold for decision making. The average score, however, may not be optimal because the distribution of the scores is ignored. To address this limitation, we have recently proposed a fusion model that incorporates the score distribution by making the fusion weights dependent on the dispersion between the frame-based scores and the prior score statistics obtained from training data. As the fusion weights are data-dependent, the positions of scores in the score sequences become detrimental to the final fused scores. In this paper, we propose to enhance the fusion model by sorting the score sequences before fusion takes place. The fusion model was evaluated on a speaker verification task where each claimant utters two utterances in a verification session. Results demonstrate that fusion of sorted scores has the effect of maximizing the dispersion between the client scores and the impostor scores, making the verification process more reliable. Compared with our previous work where no sorting is applied, the new approach reduces the equal error rate by 11%.
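To illustrate why the order of scores matters once the weights are data-dependent, here is a minimal sketch of pairwise fusion of two frame-level score sequences. The abstract above does not specify the sort directions or the weighting rule, so this example assumes that one sequence is sorted in ascending order and the other in descending order, and that each pair is fused with weights proportional to `exp(alpha * (score - prior)**2)`. All names and values are hypothetical.

```python
import math

def pairwise_fuse(seq_a, seq_b, prior, alpha=1.0):
    """Fuse two equal-length score sequences pair by pair, then average.

    Assumed rule (not from the paper): within each pair, the score lying
    farther from the prior receives the larger weight.
    """
    fused = []
    for a, b in zip(seq_a, seq_b):
        wa = math.exp(alpha * (a - prior) ** 2)
        wb = math.exp(alpha * (b - prior) ** 2)
        fused.append((wa * a + wb * b) / (wa + wb))
    return sum(fused) / len(fused)

def sorted_fuse(seq_a, seq_b, prior, alpha=1.0):
    """Sort the sequences in opposite directions before pairwise fusion."""
    return pairwise_fuse(sorted(seq_a), sorted(seq_b, reverse=True), prior, alpha)

# Hypothetical frame-level scores from two utterances of one claimant.
utt1 = [0.4, 1.8, -0.2, 1.1]
utt2 = [1.5, 0.1, 0.9, -0.6]

unsorted_score = pairwise_fuse(utt1, utt2, prior=0.0)
sorted_score = sorted_fuse(utt1, utt2, prior=0.0)
```

Without sorting, the fused score depends on which frames happen to be paired. With sorting, the result is the same for any reordering of the frames within each utterance, so the pairing is fixed by the score values alone.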
Model selection and score normalization for text-dependent single utterance speaker verification
doi:10.3906/elk-1103-35