Generalized Linear Discriminant Sequence Kernels For Speaker Recognition
, 2002
Cited by 50 (9 self)
Support Vector Machines have recently shown dramatic performance gains in many application areas. We show that the same gains can be realized in the area of speaker recognition via sequence kernels. A sequence kernel provides a numerical comparison of speech utterances as entire sequences rather than a probability at the frame level. We introduce a novel sequence kernel derived from generalized linear discriminants. The kernel has several advantages. First, the kernel uses an explicit expansion into "feature space"--this property allows all of the support vectors to be collapsed into a single vector creating a small speaker model. Second, the kernel retains the computational advantage of generalized linear discriminants trained using mean-squared error training. Finally, the kernel shows dramatic reductions in equal error rates over standard mean-squared error training in matched and mismatched conditions on a NIST speaker recognition task.
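The "collapse" property mentioned in this abstract follows from linearity: when the feature map is explicit, a weighted sum of kernel evaluations against the support vectors equals a single inner product with one precomputed vector. The sketch below illustrates only that general identity with a plain polynomial expansion. It is not the paper's kernel, and the function names (`expand`, `collapse`, and so on), the expansion degree, and the coefficients are assumptions made for illustration.

```python
from itertools import combinations_with_replacement
from math import prod


def expand(x, degree=2):
    """Explicit polynomial expansion: all monomials of x up to `degree`.

    Hypothetical feature map chosen for illustration only.
    """
    feats = [1.0]
    for d in range(1, degree + 1):
        for idx in combinations_with_replacement(range(len(x)), d):
            feats.append(prod(x[i] for i in idx))
    return feats


def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(u * v for u, v in zip(a, b))


def score_with_support_vectors(x, support_vectors, coeffs, bias):
    """Score by summing weighted kernel values over every support vector."""
    phi_x = expand(x)
    return sum(c * dot(expand(sv), phi_x)
               for sv, c in zip(support_vectors, coeffs)) + bias


def collapse(support_vectors, coeffs):
    """Fold all support vectors into one weight vector in feature space."""
    w = [0.0] * len(expand(support_vectors[0]))
    for sv, c in zip(support_vectors, coeffs):
        for j, v in enumerate(expand(sv)):
            w[j] += c * v
    return w


def score_collapsed(x, w, bias):
    """Score with the single collapsed vector: one inner product."""
    return dot(w, expand(x)) + bias


# Made-up support vectors and signed coefficients (alpha_i * y_i).
svs = [[1.0, 2.0], [0.5, -1.0], [-2.0, 0.0]]
coeffs = [0.7, -1.2, 0.5]
w = collapse(svs, coeffs)
x = [0.3, -0.4]
print(score_with_support_vectors(x, svs, coeffs, 0.1))
print(score_collapsed(x, w, 0.1))
```

Both calls give the same score up to floating-point rounding, but the collapsed form stores one vector per model instead of every support vector.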
Support vector machines for speaker and language recognition
- Computer Speech and Language
, 2006
Support vector machines for speaker verification and identification
- IEEE Proceeding
, 2000
Cited by 32 (2 self)
In this paper the performance of the support vector machine (SVM) on a speaker verification task is assessed. Since speaker verification requires binary decisions, support vector machines seem to be a promising candidate to perform the task. A new technique for normalising the polynomial kernel is developed and used to achieve performance comparable to other classifiers on the YOHO database. We also present results on a speaker identification task.
An overview of text-independent speaker recognition: from features to supervectors
, 2009
Cited by 31 (14 self)
This paper gives an overview of automatic speaker recognition technology, with an emphasis on text-independent recognition. Speaker recognition has been studied actively for several decades. We give an overview of both the classical and the state-of-the-art methods. We start with the fundamentals of automatic speaker recognition, concerning feature extraction and speaker modeling. We elaborate advanced computational techniques to address robustness and session variability. The recent progress from vectors towards supervectors opens up a new area of exploration and represents a technology trend. We also provide an overview of this recent development and discuss the evaluation methodology of speaker recognition systems. We conclude the paper with discussion on future directions.
Methods of Combining Multiple Classifiers with Different Features and Their Applications to Text-Independent Speaker Identification
- INTERNATIONAL JOURNAL OF PATTERN RECOGNITION AND ARTIFICIAL INTELLIGENCE
, 1997
Cited by 25 (4 self)
In practical applications of pattern recognition, there are often different features extracted from raw data which needs recognizing. Methods of combining multiple classifiers with different features are viewed as a general problem in various application areas of pattern recognition. In this paper, a systematic investigation has been made and possible solutions are classified into three frameworks, i.e. linear opinion pools, winner-take-all and evidential reasoning. For combining multiple classifiers with different features, a novel method is presented in the framework of linear opinion pools and a modified training algorithm for associative switch is also proposed in the framework of winner-take-all. In the framework of evidential reasoning, several typical methods are briefly reviewed for use. All aforementioned methods have already been applied to text-independent speaker identification. The simulations show that results yielded by the methods described in this paper are better than...
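Of the three frameworks named in this abstract, the linear opinion pool is the simplest to state: each classifier reports class posteriors, and the combined posterior is their weighted average. The sketch below shows only that generic textbook form, not the novel method the paper proposes. The function names, weights, and posterior values are assumptions made for illustration.

```python
def linear_opinion_pool(posteriors, weights):
    """Combine per-classifier class posteriors by a normalized weighted average.

    posteriors: one list of class probabilities per classifier.
    weights:    one non-negative weight per classifier (hypothetical values).
    """
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    n_classes = len(posteriors[0])
    return [
        sum(w * p[k] for w, p in zip(weights, posteriors)) / total
        for k in range(n_classes)
    ]


def decide(pooled):
    """Pick the index of the class (speaker) with the highest pooled posterior."""
    return max(range(len(pooled)), key=pooled.__getitem__)


# Two classifiers built on different features, three candidate speakers.
posteriors = [
    [0.6, 0.3, 0.1],  # made-up posteriors from the first classifier
    [0.2, 0.5, 0.3],  # made-up posteriors from the second classifier
]
pooled = linear_opinion_pool(posteriors, weights=[3.0, 1.0])
print(pooled, decide(pooled))
```

Because the weights are normalized, the pooled values remain a probability distribution whenever each input row is one.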
Spectral Features for Automatic Text-Independent Speaker Recognition
, 2003
Cited by 9 (2 self)
Front-end or feature extractor is the first component in an automatic speaker recognition system. Feature extraction transforms the raw speech signal into a compact but effective representation that is more stable and discriminative than the original signal. Since the front-end is the first component in the chain, the quality of the later components (speaker modeling and pattern matching) is strongly determined by the quality of the front-end. In other words, classification can be at most as accurate as the features.
CNeT: Competitive Neural Trees for Pattern Classification
- Proceedings of the IEEE International Conference on Neural Networks
, 1996
Cited by 6 (2 self)
This paper introduces Competitive Neural Trees (CNeT) for pattern classification. The CNeT performs hierarchical classification and employs competitive unsupervised learning at the node level. The generalization ability of the CNeT is guaranteed by forward pruning, which is an inherent part of the learning process. Different search methods for the CNeT are introduced and used for training and recall. The influence of different search methods on the performance of the CNeT is experimentally evaluated.
A Sequence Kernel and its Application to Speaker Recognition
- in Neural Information Processing Systems 14
, 2001
Cited by 6 (0 self)
A novel approach for comparing sequences of observations using an explicit-expansion kernel is demonstrated. The kernel is derived using the assumption of the independence of the sequence of observations and a mean-squared error training criterion. The use of an explicit expansion kernel reduces classifier model size and computation dramatically, resulting in model sizes and computation one-hundred times smaller in our application. The explicit expansion also preserves the computational advantages of an earlier architecture based on mean-squared error training.
Speaker recognition — general classifier approaches and data fusion methods
- Pattern Recognition
, 2002
Sample-adaptive product quantization: Asymptotic analysis and examples
- IEEE Trans. Signal Processing
, 2000
Cited by 3 (0 self)
Vector quantization (VQ) is an efficient data compression technique for low bit rate applications. However, the major disadvantage of VQ is that its encoding complexity increases dramatically with bit rate and vector dimension. Even though one can use a modified VQ, such as the tree-structured VQ, to reduce the encoding complexity, it is practically infeasible to implement such a VQ at a high bit rate or for large vector dimensions because of the huge memory requirement for its codebook and for the very large training sequence requirement. To overcome this difficulty, a structurally constrained VQ called the sample-adaptive product quantizer (SAPQ) has recently been proposed. In this paper, we extensively study the SAPQ that is based on scalar quantizers in order to exploit the simplicity of scalar quantization. Through an asymptotic distortion result, we discuss the achievable performance and the relationship between distortion and encoding complexity. We illustrate that even when SAPQ is based on scalar quantizers, it can provide VQ-level performance. We also provide numerical results that show a 2–3 dB improvement over the Lloyd–Max quantizers for data rates above 4 b/point. Index Terms: Lattice vector quantizer, product quantizer, sample-adaptive product quantizer (SAPQ), vector quantizer.

