A Two-Stage Scoring Method Combining World And Cohort Models For Speaker Verification
- In Proc. ICASSP 2000
, 2000
Cited by 6 (3 self)
The cohort and world models are commonly used for scoring normalization in speaker verification. As these models represent different regions of the feature space, a better solution could be obtained by integrating them into a single framework. In this paper, we embed the two models in elliptical basis function networks and propose a two-stage decision procedure for improving verification performance. In the first stage, the score of an unknown utterance is normalized by a world model. If the difference between the resulting normalized score and a world threshold is sufficiently large, the claimant is accepted or rejected immediately. Otherwise, the score will be normalized by a cohort model and compared with a cohort threshold to make a final accept/reject decision. Experimental evaluations based on the YOHO corpus suggest that the two-stage method achieves a lower error rate as compared to the case where only one background model is used.
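The two-stage procedure described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes scores are log-likelihoods and that normalization is a simple difference, and the names `two_stage_decision` and `margin` are hypothetical. The elliptical basis function networks that produce the scores in the paper are not modelled here.

```python
def two_stage_decision(claimant_score, world_score, cohort_score,
                       world_threshold, cohort_threshold, margin):
    """Return True to accept the claimant, False to reject.

    All scores are assumed to be log-likelihoods (an assumption of this
    sketch); `margin` is a hypothetical parameter for "sufficiently large".
    """
    # Stage 1: normalize by the world model and compare with the world threshold.
    world_normalized = claimant_score - world_score
    gap = world_normalized - world_threshold
    if abs(gap) >= margin:
        # Far enough from the threshold: decide immediately.
        return gap > 0

    # Stage 2: borderline case, so fall back to cohort normalization.
    cohort_normalized = claimant_score - cohort_score
    return cohort_normalized > cohort_threshold
```

For example, `two_stage_decision(-10.0, -14.0, -11.0, 1.0, 0.5, 2.0)` is decided in the first stage, while a claimant whose world-normalized score lies within `margin` of the world threshold is passed to the cohort stage.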
Improved Data Modeling for Text-Dependent Speaker Recognition Using Sub-Band Processing
- Internat. J. Speech Technol
, 2000
Cited by 3 (0 self)
A growing body of recent work documents the potential benefits of sub-band processing over wideband processing in automatic speech recognition and, less usually, speaker recognition. It is often found that the subband approach delivers performance improvements (especially in the presence of noise), but not always so. This raises the question of precisely when and how sub-band processing might be advantageous, which is difficult to answer because there is as yet only a rudimentary theoretical framework guiding this work. We describe a simple sub-band speaker recognition system designed to facilitate experimentation aimed at increasing understanding of the approach. This splits the time-domain speech signal into 16 sub-bands using a bank of second-order filters spaced on the psychophysical mel scale. Each sub-band has its own separate cepstral-based recognition system, the outputs of which are combined using the sum rule to produce a final decision. We find that sub-band processing leads...
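The sum-rule combination mentioned in this abstract can be illustrated with a short sketch. It assumes each sub-band recognizer outputs one score per enrolled speaker (for instance a posterior or a log-likelihood) and that higher is better; the function name `sum_rule_identify` and the data layout are hypothetical, and the filter bank and cepstral front end are not shown.

```python
def sum_rule_identify(subband_scores):
    """Combine per-sub-band speaker scores with the sum rule.

    subband_scores: list of dicts, one per sub-band, mapping a speaker
    label to that sub-band recognizer's score (higher means more likely).
    Returns the speaker with the largest summed score.
    """
    if not subband_scores:
        raise ValueError("at least one sub-band is required")

    totals = {}
    for band in subband_scores:
        for speaker, score in band.items():
            # Sum rule: add the scores from every sub-band recognizer.
            totals[speaker] = totals.get(speaker, 0.0) + score
    return max(totals, key=totals.get)
```

With 16 sub-bands, as in the system described, `subband_scores` would hold 16 such dictionaries.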
Face Processing & Frontal Face Verification
- IDIAP RESEARCH REPORT 03-20
, 2003
Cited by 3 (3 self)
In this report we first review important publications in the field of face recognition; geometric features, templates, Principal Component Analysis (PCA), pseudo-2D Hidden Markov Models, Elastic Graph Matching, as well as other points are covered; important issues, such as the effects of an illumination direction change and the use of different face areas, are also covered. A new feature set (termed DCT-mod2) is then proposed; the feature set utilizes polynomial coefficients derived from 2D Discrete Cosine Transform (DCT) coefficients obtained from horizontally & vertically neighbouring blocks. Face authentication results on the VidTIMIT database suggest that the proposed feature set is superior (in terms of robustness to illumination changes and discrimination ability) to features extracted using four popular methods: PCA, PCA with histogram equalization pre-processing, 2D DCT and 2D Gabor wavelets; the results also suggest that histogram equalization pre-processing increases the error rate and offers no help against illumination changes. Moreover, the proposed feature set is over 80 times faster to compute than features based on 2D Gabor wavelets. Further experiments on the Weizmann Database also show that the proposed approach is more robust than 2D Gabor wavelets and 2D DCT coefficients.
Autoassociative Neural Network Models for Speaker Verification
, 1999
Cited by 3 (0 self)
KEYWORDS: speaker verification; autoassociative neural network; distribution estimation; matching technique; dimensionality reduction.
A New Two-Stage Scoring Normalization Approach to Speaker Verification
- Proc. Int. Sym. on Intelligent Multimedia, Video and Speech Processing
Cited by 2 (0 self)
In speaker verification, the cohort and world models have been separately used for scoring normalization. In this work, we embed the two models in elliptical basis function networks and propose a two-stage decision procedure for improving verification performance. The procedure begins with normalization of an utterance by a world model. If the difference between the resulting score and a world threshold is sufficiently large, the claimant is accepted or rejected immediately. Otherwise, the score will...
Relative Effectiveness of Score Normalisation Methods in Open-Set Speaker Identification
, 2004
Cited by 2 (2 self)
This paper presents an investigation into the relative effectiveness of various well-known score normalisation methods in the context of open-set, text-independent speaker identification. The scope of the study includes a thorough experimental analysis of the performance of the methods considered. The experimental investigations are based on the use of the dataset proposed for the 1-speaker detection task of the NIST Speaker Recognition Evaluation 2003. The results clearly demonstrate that significant benefits can be achieved by using score normalisation in open-set identification, and that the level of this depends highly on the type of the approach adopted. Based on the experimental results, it is found that amongst the various normalisation methods considered, those which are based on the Bayesian solution provide the best performance. In particular, the unconstrained cohort method with a small cohort size appears to outperform all other approaches. The paper provides a detailed description of the experimental set up, and presents an analysis of the results obtained.
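To make the cohort idea concrete, the sketch below shows one common formulation of unconstrained cohort normalisation in an open-set setting: the best-matching speaker's score is reduced by the mean of the K highest competing scores and then compared with a threshold. The abstract does not give the paper's exact definition, so this formulation, the log-likelihood assumption, and the names `cohort_normalize` and `open_set_identify` are all assumptions.

```python
def cohort_normalize(target_score, competing_scores, cohort_size):
    """Subtract the mean of the top `cohort_size` competing scores."""
    if cohort_size < 1 or not competing_scores:
        raise ValueError("need a positive cohort size and competing scores")
    # The cohort is chosen per test utterance: the highest-scoring competitors.
    cohort = sorted(competing_scores, reverse=True)[:cohort_size]
    return target_score - sum(cohort) / len(cohort)


def open_set_identify(scores, cohort_size, threshold):
    """Return the best-matching speaker, or None if judged unknown.

    scores: dict mapping each registered speaker to a log-likelihood
    (assumed) for the test utterance.
    """
    best = max(scores, key=scores.get)
    competitors = [s for spk, s in scores.items() if spk != best]
    normalized = cohort_normalize(scores[best], competitors, cohort_size)
    return best if normalized > threshold else None
```

A small `cohort_size` keeps only the closest competitors in the normalisation term, which is the configuration the abstract reports as performing best.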
Combining Biometric Scores in Identification Systems
, 2006
Cited by 1 (0 self)
Combination approaches in biometric identification systems usually consider only the matching scores related to a single person in order to derive a combined score for that person. We present the use of all scores received by all persons and explore the advantages of such an approach when enough training data is available. More fundamentally, we identify four types of classifier combinations determined by the numbers of trained combining functions and their input parameters. We illustrate how an improper choice of the combination type can lead to a decrease in the system performance, even though an optimal combination algorithm within this type is used. We investigate combinations that consider all available matching scores and have only a single trainable combination function. We introduce a particular kind of such combinations utilizing identification models, which account for dependencies between scores output by any one classifier. We present several experiments validating the advantage of our proposed combination algorithms for problems dealing with a large number of classes, in particular, biometric person identification systems.
Cohort Normalization based Sparse Representation for Undersampled Face Recognition
Cited by 1 (1 self)
In recent years, sparse representation based classification (SRC) has received much attention in face recognition with multiple training samples of each subject. However, it cannot be easily applied to a recognition task with insufficient training samples under uncontrolled environments. On the other hand, cohort normalization, as a way of measuring the degradation effect under challenging environments in relation to a pool of cohort samples, has been widely used in the area of biometric authentication. In this paper, for the first time, we introduce cohort normalization to SRC-based face recognition with insufficient training samples. Specifically, a user-specific cohort set is selected to normalize the raw residual, which is obtained from comparing the test sample with its sparse representations corresponding to the gallery subject, using polynomial regression. Experimental results on AR and FERET databases show that cohort normalization can bring SRC much robustness against various forms of degradation factors for undersampled face recognition.
Picture-Specific Cohort Score Normalization for Face Pair Matching
- In: Proceedings of IEEE International Conference on Biometrics: Theory, Applications and Systems
, 2013
Cited by 1 (1 self)
Face pair matching is the task of deciding whether or not two face images belong to the same person. This has been a very active and challenging topic recently due to the presence of various sources of variation in facial images, especially under unconstrained environment. We investigate cohort normalization that has been widely used in biometric verification as means to improve the robustness of face recognition under challenging environments to the face pair matching problem. Specifically, given a pair of images and an additional fixed cohort set (identities of cohort samples never appear in the test stage), two picture-specific cohort score lists are computed and the correspondent score profiles of which are modeled by polynomial regression. The extracted regression coefficients are subsequently classified using a classifier. We advance the state-of-the-art in cohort normalization by providing a better understanding of the cohort behavior. In particular, we found that the choice of the cohort set had little impact on the generalization performance. Furthermore, the larger the size of the cohort set, the more stable the system performance becomes. Experiments performed on the Labeled Faces in the Wild (LFW) benchmark show that our system achieves performance that is comparable to state-of-the-art methods.
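The score-profile step in this abstract can be sketched as below. This is a simplification under stated assumptions: the paper uses polynomial regression of unspecified degree, whereas this sketch fits only a degree-1 polynomial (a straight line) by ordinary least squares to the cohort scores sorted in descending order. The function name `score_profile_coefficients` is hypothetical, and the downstream classifier is omitted.

```python
def score_profile_coefficients(cohort_scores):
    """Fit a line to the sorted cohort score profile.

    cohort_scores: similarity scores between one picture and every cohort
    sample. Returns (slope, intercept) of the least-squares line through
    the points (rank, score), with scores sorted from highest to lowest.
    """
    ordered = sorted(cohort_scores, reverse=True)
    n = len(ordered)
    if n < 2:
        raise ValueError("at least two cohort scores are required")

    mean_x = (n - 1) / 2          # mean of the ranks 0..n-1
    mean_y = sum(ordered) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(ordered))

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept
```

For a face pair, the coefficients from the two picture-specific profiles would be concatenated (possibly with the raw pair score, which is an assumption here) and passed to a classifier.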
Combining Speech Recognition and Speaker Verification
, 2008