Fast Features for Face Authentication under Illumination Direction Changes
- PATTERN RECOGNITION LETTERS, 2003
Cited by 57 (22 self)
In this letter we propose a face feature extraction technique which utilizes polynomial coefficients derived from 2D Discrete Cosine Transform (DCT) coefficients obtained from horizontally and vertically neighbouring blocks. Face authentication results on the VidTIMIT database suggest that the proposed feature set is superior (in terms of robustness to illumination changes and discrimination ability) to features extracted using four popular methods: Principal Component Analysis (PCA), PCA with histogram equalization pre-processing, 2D DCT and 2D Gabor wavelets; the results also suggest that histogram equalization pre-processing increases the error rate and offers no help against illumination changes. Moreover, the proposed feature set is over 80 times faster to compute than features based on Gabor wavelets. Further experiments on the Weizmann database also show that the proposed approach is more robust than 2D Gabor wavelets and 2D DCT coefficients.
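The abstract above does not spell out how the polynomial coefficients are formed, so the following is only a minimal sketch of the underlying building block: an orthonormal 2D DCT-II applied to one square image block. The function name `dct_2d` and the list-of-lists block representation are assumptions made for illustration, not the authors' implementation.

```python
import math


def dct_2d(block):
    """Orthonormal 2D DCT-II of a square block given as a list of rows."""
    n = len(block)

    def alpha(k):
        # Normalisation factors that make the transform orthonormal.
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

    out = [[0.0] * n for _ in range(n)]
    for u in range(n):          # vertical frequency index
        for v in range(n):      # horizontal frequency index
            total = 0.0
            for y in range(n):
                for x in range(n):
                    total += (block[y][x]
                              * math.cos((2 * y + 1) * u * math.pi / (2 * n))
                              * math.cos((2 * x + 1) * v * math.pi / (2 * n)))
            out[u][v] = alpha(u) * alpha(v) * total
    return out


# Hypothetical 4x4 block of pixel intensities.
block = [[52, 55, 61, 66],
         [63, 59, 55, 90],
         [62, 59, 68, 113],
         [63, 58, 71, 122]]
coeffs = dct_2d(block)
```

Because the transform is linear, adding a uniform brightness offset to every pixel of a block changes only the DC coefficient `coeffs[0][0]`. This is a general property of the DCT and is not by itself the illumination-direction robustness reported in the letter.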
An overview of text-independent speaker recognition: from features to supervectors
- 2009
Cited by 31 (14 self)
This paper gives an overview of automatic speaker recognition technology, with an emphasis on text-independent recognition. Speaker recognition has been studied actively for several decades. We give an overview of both the classical and the state-of-the-art methods. We start with the fundamentals of automatic speaker recognition, concerning feature extraction and speaker modeling. We elaborate on advanced computational techniques to address robustness and session variability. The recent progress from vectors towards supervectors opens up a new area of exploration and represents a technology trend. We also provide an overview of this recent development and discuss the evaluation methodology of speaker recognition systems. We conclude the paper with a discussion of future directions.
Biometrics: a grand challenge
- Proc. of ICPR, 2004
Cited by 28 (1 self)
Reliable person identification is an important problem in diverse businesses. Biometrics, identification based on distinctive personal traits, has the potential to become an irreplaceable part of any identification system. While successful in some niche markets, the biometrics technology has not yet delivered its promise of foolproof automatic identification. With the availability of inexpensive biometric sensors and computing power, it is becoming increasingly clear that widespread usage of biometric person identification is being stymied by our lack of understanding of three fundamental problems: (i) How to accurately and efficiently represent and recognize biometric patterns? (ii) How to guarantee that the sensed measurements are not fraudulent? and (iii) How to make sure that the application is indeed exclusively using pattern recognition for the expressed purpose (function creep [16])? Solving these core problems will not only catapult biometrics into mainstream applications but will also stimulate adoption of other pattern recognition applications for providing effective automation of sensitive tasks without jeopardizing our individual freedoms. For these reasons, we view biometrics as a grand challenge: "a fundamental problem in science and engineering with broad economic and scientific impact".
Automatic Person Verification Using Speech and Face Information
- 2003
Cited by 23 (7 self)
Identity verification systems are an important part of our everyday life. A typical example is the Automatic Teller Machine (ATM), which employs a simple identity verification scheme: the user is asked to enter their secret password after inserting their ATM card; if the password matches the one prescribed to the card, the user is allowed access to their bank account. This scheme suffers from a major drawback: only the validity of the combination of a certain possession (the ATM card) and certain knowledge (the password) is verified. The ATM card can be lost or stolen, and the password can be compromised. Thus new verification methods have emerged, where the password has either been replaced by, or used in addition to, biometrics such as the person's speech, face image or fingerprints. Apart from the ATM example described above, biometrics can be applied to other areas, such as telephone and internet based banking, airline reservations and check-in, as well as forensic work and law enforcement applications. Biometric systems...
Identity Verification Using Speech And Face Information
- Digital Signal Processing, 2004
Cited by 23 (1 self)
This article first provides a review of important concepts in the field of information fusion, followed by a review of important milestones in audio-visual person identification and verification. Several recent adaptive and nonadaptive techniques for reaching the verification decision (i.e., to accept or reject the claimant), based on speech and face information, are then evaluated in clean and noisy audio conditions on a common database. It is shown that in clean conditions most of the nonadaptive approaches provide similar performance and in noisy conditions most exhibit a severe deterioration in performance; it is also shown that current adaptive approaches are either inadequate or utilize restrictive assumptions. A new category of classifiers is then introduced, where the decision boundary is fixed but constructed to take into account how the distributions of opinions are likely to change due to noisy conditions. Compared to a previously proposed adaptive approach, the proposed classifiers do not make a direct assumption about the type of noise that causes the mismatch between training and testing conditions.
Noise Compensation in a Person Verification System Using Face and Multiple Speech Features
- PATTERN RECOGNITION, 2003
Cited by 16 (9 self)
In this paper, we demonstrate that use of a recently proposed feature set, termed Maximum Auto-Correlation Values, which utilizes information from the source part of the speech signal, significantly improves the robustness of a text independent identity verification system. We also propose an adaptive fusion technique for integration of audio and visual information in a multi-modal verification system. The proposed technique explicitly measures the quality of the speech signal, adjusting the amount of contribution of the speech modality to the final verification decision. Results on the VidTIMIT database indicate that the proposed approach outperforms existing adaptive and non-adaptive fusion techniques. For a wide range of audio SNRs, the performance of the multi-modal system utilizing the proposed technique is always found to be better than the performance of the face modality.
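The idea of letting measured speech quality control the speech modality's contribution can be sketched as a quality-weighted sum of the two modality scores. Everything specific here is an assumption for illustration: the linear SNR-to-weight mapping, the parameter names `low_db`, `high_db` and `max_speech_weight`, and the function names. The abstract does not state the paper's actual quality measure or weighting rule.

```python
def speech_weight(snr_db, low_db=0.0, high_db=30.0):
    """Map an estimated SNR in dB to a value in [0, 1] by clamped linear interpolation.

    The mapping and its end points are hypothetical, not taken from the paper.
    """
    if high_db <= low_db:
        raise ValueError("high_db must be greater than low_db")
    w = (snr_db - low_db) / (high_db - low_db)
    return max(0.0, min(1.0, w))


def fuse(speech_score, face_score, snr_db, max_speech_weight=0.5):
    """Weighted sum of modality scores; noisier speech contributes less."""
    w = max_speech_weight * speech_weight(snr_db)
    return w * speech_score + (1.0 - w) * face_score


def accept(speech_score, face_score, snr_db, threshold=0.0):
    """Accept the identity claim when the fused score reaches the threshold."""
    return fuse(speech_score, face_score, snr_db) >= threshold
```

With this mapping, a very noisy recording (estimated SNR at or below `low_db`) drives the speech weight to zero, so the decision falls back on the face score alone.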
Improved Learning Algorithms for Mixture of Experts in Multiclass Classification
- 1999
Cited by 14 (3 self)
Mixture of experts (ME) is a modular neural network architecture for supervised learning. A double-loop Expectation-Maximization (EM) algorithm has been introduced to the ME architecture for adjusting the parameters, and the iteratively reweighted least squares (IRLS) algorithm is used to perform maximization in the inner loop [Jordan, M.I., Jacobs, R.A. (1994). Hierarchical mixture of experts and the EM algorithm, Neural Computation, 6(2), 181-214]. However, it is reported in the literature that the IRLS algorithm is unstable and that the ME architecture trained by the EM algorithm, where the IRLS algorithm is used in the inner loop, often performs poorly in multiclass classification. In this paper, the reason for this instability is explored. We find that, due to an implicitly imposed incorrect assumption on parameter independence in multiclass classification, an incomplete Hessian matrix is used in that IRLS algorithm. Based on this finding, we apply the Newton-Raphson met...
Comparison of Clustering Algorithms in Speaker Identification
Cited by 14 (6 self)
In speaker identification, we match a given (unknown) speaker to the set of known speakers in a database. The database is constructed from the speech samples of each known speaker. Feature vectors are extracted from the samples by short-term spectral analysis, and processed further by vector quantization for locating the clusters in the feature space. We study the role of the vector quantization in the speaker identification system. We compare the performance of different clustering algorithms, and the influence of the codebook size. We want to find out which method provides the best clustering result, and whether the difference in quality contributes to improvement in recognition accuracy of the system.
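As a rough illustration of how a vector-quantization codebook can be used for matching, the sketch below scores an unknown speaker's feature vectors against each known speaker's codebook by average quantization distortion and picks the smallest. The matching rule, the squared Euclidean distance, and the names `avg_distortion` and `identify` are assumptions here; the abstract does not state which matching function the paper uses, and the codebooks are taken as already trained by some clustering algorithm.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def avg_distortion(vectors, codebook):
    """Mean distance from each feature vector to its nearest code vector."""
    total = 0.0
    for v in vectors:
        total += min(squared_distance(v, c) for c in codebook)
    return total / len(vectors)


def identify(vectors, codebooks):
    """Return the speaker whose codebook gives the lowest average distortion."""
    return min(codebooks, key=lambda name: avg_distortion(vectors, codebooks[name]))


# Hypothetical two-dimensional codebooks for two known speakers.
codebooks = {
    "alice": [(0.0, 0.0), (1.0, 1.0)],
    "bob": [(5.0, 5.0), (6.0, 6.0)],
}
unknown = [(0.1, 0.0), (0.9, 1.2)]
best = identify(unknown, codebooks)
```

In this setup the codebook size is simply the length of each list, which is the parameter whose influence the abstract says is studied.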
A Method of Combining Multiple Probabilistic Classifiers through Soft Competition on Different Feature Sets
- Neurocomputing, 1998
Cited by 12 (3 self)
A novel method is proposed for combining multiple probabilistic classifiers on different feature sets. To improve classification performance, a generalized finite mixture model is proposed as a linear combination scheme and implemented based on radial basis function networks. In the linear combination scheme, soft competition on different feature sets is adopted as an automatic feature rank mechanism so that different feature sets can always be used simultaneously in an optimal way to determine linear combination weights. For training the linear combination scheme, a learning algorithm is developed based on the Expectation-Maximization (EM) algorithm. The proposed method has been applied to a typical real-world problem, viz., speaker identification, in which different feature sets often need consideration simultaneously for robustness. Simulation results show that the proposed method yields good performance in speaker identification.
Unsupervised Speaker Recognition Based on Competition Between Self-Organizing Maps
- 2002
Cited by 11 (1 self)
We present a method for clustering the speakers from unlabeled and unsegmented conversation (with known number of speakers), when no a priori knowledge about the identity of the participants is given. Each speaker was modeled by a self-organizing map (SOM). The SOMs were randomly initialized. An iterative algorithm allows the data to move from one model to another and adjusts the SOMs. The restriction that the data can move only in small groups, rather than each feature vector moving separately, forces the SOMs to adjust to speakers (instead of phonemes or other vocal events). This method was applied to high-quality conversations with two to five participants and to two-speaker telephone-quality conversations. The results for two (both high- and telephone-quality) and three speakers were over 80% correct segmentation. The problem becomes even harder when the number of participants is also unknown. Based on the iterative clustering algorithm, a validity criterion was also developed to estimate the number of speakers. In 16 out of 17 high-quality conversations between two and three participants, the estimation of the number of participants was correct. For telephone-quality conversations the results were poorer.

