Results 1-10 of 16
An overview of text-independent speaker recognition: from features to supervectors
, 2009
Cited by 156 (37 self)
This paper gives an overview of automatic speaker recognition technology, with an emphasis on text-independent recognition. Speaker recognition has been studied actively for several decades. We give an overview of both the classical and the state-of-the-art methods. We start with the fundamentals of automatic speaker recognition, concerning feature extraction and speaker modeling. We elaborate on advanced computational techniques to address robustness and session variability. The recent progress from vectors towards supervectors opens up a new area of exploration and represents a technology trend. We also provide an overview of this recent development and discuss the evaluation methodology of speaker recognition systems. We conclude the paper with a discussion of future directions.
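The move "from vectors towards supervectors" is commonly illustrated with the GMM supervector: the means of a universal background model (UBM) are MAP-adapted to one utterance and stacked into a single long vector. The sketch below shows the widely used relevance-MAP mean adaptation under stated assumptions; it is not taken from the paper, and the function names, the diagonal-covariance UBM and the relevance factor of 16 are illustrative choices.

```python
import math
from typing import List, Sequence


def log_gauss_diag(x: Sequence[float], mean: Sequence[float], var: Sequence[float]) -> float:
    """Log density of a Gaussian with diagonal covariance."""
    return -0.5 * sum(
        math.log(2 * math.pi * v) + (xi - m) ** 2 / v for xi, m, v in zip(x, mean, var)
    )


def map_supervector(frames, weights, means, variances, relevance: float = 16.0) -> List[float]:
    """MAP-adapt the UBM means to `frames` and stack them (hypothetical helper)."""
    K, D = len(weights), len(means[0])
    n = [0.0] * K                      # zeroth-order statistics (soft frame counts)
    f = [[0.0] * D for _ in range(K)]  # first-order statistics (posterior-weighted sums)
    for x in frames:
        logs = [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in zip(weights, means, variances)]
        top = max(logs)
        post = [math.exp(l - top) for l in logs]
        total = sum(post)
        for k in range(K):
            p = post[k] / total        # posterior probability of component k
            n[k] += p
            for d in range(D):
                f[k][d] += p * x[d]
    supervector: List[float] = []
    for k in range(K):
        alpha = n[k] / (n[k] + relevance)  # more data -> more weight on the utterance
        for d in range(D):
            data_mean = f[k][d] / n[k] if n[k] > 0 else means[k][d]
            supervector.append(alpha * data_mean + (1 - alpha) * means[k][d])
    return supervector
```

With 16 frames all equal to (2, 2), a one-component UBM at the origin and relevance 16, the adaptation weight is 0.5, so the adapted mean (and the supervector) is (1, 1).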
Identifying Perceptually Similar Languages Using Teager Energy Based Cepstrum
Cited by 1 (0 self)
Language Identification (LID) refers to the task of identifying an unknown language from test utterances. In this paper, a new feature extraction method, Teager Energy Based Mel Frequency Cepstral Coefficients (T-MFCC), is developed for identifying perceptually similar languages. Finally, an LID system is presented for Hindi and Urdu (perceptually similar Indian languages) to demonstrate the effectiveness of the newly proposed feature set, with a short discussion of the experimental results. Keywords: language identification, Teager Energy Operator (TEO), Mel cepstrum, polynomial classifier, discriminative training.
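T-MFCC builds on the Teager Energy Operator (TEO). The sketch below shows only the standard discrete TEO, psi[n] = x[n]^2 - x[n-1] x[n+1]; the abstract does not say where the operator enters the MFCC chain, so that step is deliberately left out, and the function name is hypothetical.

```python
from typing import List, Sequence


def teager_energy(x: Sequence[float]) -> List[float]:
    """Discrete Teager energy operator: psi[n] = x[n]^2 - x[n-1] * x[n+1].

    The output is two samples shorter than the input, because the first and
    last samples lack a neighbour on one side.
    """
    return [x[n] ** 2 - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]
```

For a sampled cosine A cos(Omega n + phi), every output sample equals A^2 sin^2(Omega), so the operator responds to both amplitude and frequency rather than to amplitude alone.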
Speaker Identification Using Probabilistic PCA Model Selection
Cited by 1 (1 self)
Gaussian mixture model (GMM) techniques are popular for speaker identification. Theoretically, each Gaussian component should have a full covariance matrix. However, a diagonal covariance matrix is usually used because its inverse can be easily calculated within the expectation maximization (EM) algorithm. This paper proposes a new probabilistic principal component analysis (PPCA) model for speaker identification in which the full covariance of a speaker's data is considered. The model originates from factor analysis theory, and the probability distributions under PPCA are well defined. In particular, GMM and PPCA are found to be equivalent when a diagonal covariance matrix is used. In this study, we derive a novel PPCA model selection procedure and establish models for different speakers. Applying PPCA model selection, we can dynamically determine the numbers of speech features and mixture components. Experiments show that PPCA achieves desirable speaker recognition performance with proper model regularization.
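To make the diagonal-covariance point concrete, here is a minimal sketch of the baseline the abstract contrasts PPCA with: diagonal GMM scoring plus identification by highest average frame log-likelihood. It is not the PPCA model, and the class and function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence


class DiagGMM:
    """Gaussian mixture with diagonal covariances (minimal illustrative class)."""

    def __init__(self, weights, means, variances):
        self.weights = weights
        self.means = means
        self.variances = variances

    def log_likelihood(self, x: Sequence[float]) -> float:
        comps = []
        for w, mu, var in zip(self.weights, self.means, self.variances):
            # Diagonal covariance: the inverse is 1/var per dimension and the
            # log-determinant is a sum of logs, so no matrix inversion is needed.
            quad = sum((xi - mi) ** 2 / vi for xi, mi, vi in zip(x, mu, var))
            log_det = sum(math.log(vi) for vi in var)
            comps.append(math.log(w) - 0.5 * (len(x) * math.log(2 * math.pi) + log_det + quad))
        top = max(comps)  # log-sum-exp for numerical stability
        return top + math.log(sum(math.exp(c - top) for c in comps))


def identify(frames: List[Sequence[float]], models: Dict[str, DiagGMM]) -> str:
    """Return the speaker whose model gives the highest average frame log-likelihood."""
    scores = {
        name: sum(g.log_likelihood(f) for f in frames) / len(frames)
        for name, g in models.items()
    }
    return max(scores, key=scores.get)
```

A full-covariance model would replace the per-dimension division with a matrix inverse and determinant per component, which is the cost PPCA is meant to manage.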
A Text Independent Speaker Recognition System Using a Novel Parametric Neural Network
This paper presents a new speaker recognition technique aimed at high identification accuracy and low impostor acceptance. The method is based on a modified neural network, an extended and improved version of the Self-Organizing Map in multiple dimensions. The goal of this methodology is to achieve high-accuracy identification and impostor rejection. The proposed method, Multiple Parametric Self-Organizing Maps (M-PSOM), is a classification and verification technique. It was successfully implemented and tested on the CSLU Speaker Recognition Corpora of the Oregon School of Engineering, with excellent results. The method builds a unique parametric neural network for each speaker, as opposed to a single neural network for the whole system, as has been done in the past. With this technology, a parametric neural network is a unique representation of a speaker's acoustic signature.
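The "one network per speaker" idea can be sketched with an ordinary one-dimensional self-organizing map per speaker and identification by lowest quantization error. This is a plain SOM, not the paper's parametric M-PSOM; the map size, learning-rate and neighbourhood schedules, and all function names are assumptions for illustration.

```python
import math
import random
from typing import Dict, List, Sequence


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def train_som(data: List[List[float]], n_units: int = 4, epochs: int = 30,
              lr0: float = 0.5, seed: int = 0) -> List[List[float]]:
    """Train a 1-D self-organizing map (a chain of units) on one speaker's frames."""
    rng = random.Random(seed)
    units = [list(rng.choice(data)) for _ in range(n_units)]  # initialise from the data
    radius0 = max(n_units / 2, 1.0)
    total, step = epochs * len(data), 0
    for _ in range(epochs):
        for x in rng.sample(data, len(data)):        # shuffled pass over the frames
            frac = step / total
            lr = lr0 * (1 - frac)                    # linearly decaying learning rate
            radius = max(radius0 * (1 - frac), 0.5)  # shrinking neighbourhood
            bmu = min(range(n_units), key=lambda j: sq_dist(x, units[j]))
            for j in range(n_units):
                h = math.exp(-((j - bmu) ** 2) / (2 * radius ** 2))
                for k in range(len(x)):
                    units[j][k] += lr * h * (x[k] - units[j][k])
            step += 1
    return units


def quantization_error(frames, units) -> float:
    """Mean squared distance from each frame to its best-matching unit."""
    return sum(min(sq_dist(f, u) for u in units) for f in frames) / len(frames)


def identify(frames, maps: Dict[str, List[List[float]]]) -> str:
    """Pick the speaker whose map represents the test frames best."""
    return min(maps, key=lambda name: quantization_error(frames, maps[name]))
```

A verification variant would additionally compare the winning quantization error against a threshold to reject impostors.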
A New Data Fusion Technique and Performance Measure for Identification of Twins in Marathi
Speaker Recognition (SR) is an economical biometric method because of the availability of low-cost, high-power computers. An important question that must be answered for an SR system is how well it resists determined mimics, such as those based on physiological characteristics, especially identical twins or triplets. In this paper, a new data fusion technique (viz., a majority rule for combining evidence from different feature sets) and a new performance measure are proposed for speaker identification of twins in an Indian language, viz., Marathi. The results are compared with a baseline SR system that uses Linear Prediction Coefficients (LPC), Linear Prediction Cepstral Coefficients (LPCC) and Mel Frequency Cepstral Coefficients (MFCC) as input feature vectors and polynomial classifiers of 2nd and 3rd order approximation for speaker modeling.
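A majority rule for combining evidence can be sketched as voting over the speaker decisions made separately with each feature set (for example LPC, LPCC and MFCC). The abstract does not say how ties are broken, so returning None when no label has a strict majority is an assumption, as is the function name.

```python
from collections import Counter
from typing import Optional, Sequence


def majority_vote(decisions: Sequence[str]) -> Optional[str]:
    """Fuse per-feature-set speaker decisions by majority rule.

    Returns the label chosen by more than half of the classifiers, or None
    when no label has a strict majority (assumed tie handling).
    """
    label, count = Counter(decisions).most_common(1)[0]
    return label if count > len(decisions) / 2 else None
```

With three feature sets, two agreeing classifiers decide the outcome; three different answers yield no decision.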
Lead Contractor for this Deliverable: JR (partner 20)
, 2002
"... alternatives and advanced solutions for ..."
A Probabilistic Approach to Processing Imperfect Transcription in Real World Speech Recognition
In modern speech recognition technology, it is very important to have verbatim, correct word labels for speech data in order to train a well-performing speech recognizer. However, in many real-world speech recognition tasks, it is very difficult to find appropriate segments with completely correct word labels. We often have to work in a non-ideal setting in which only speech data with non-verbatim transcriptions are available. This project explores common problems faced by many real-world speech recognition tasks, including mismatches between transcription and speech data and inappropriate segmentation of the word labels, and introduces a probabilistic approach to automatic segmentation and flexible alignment to solve them. The results of the approach are analyzed, and a further enhanced approach that integrates the EM algorithm into the basic one is constructed. The results of all approaches, including a baseline in which no probabilistic preprocessing is applied, are compared with each other. Finally, several suggestions for future experiments in this area are discussed. The main findings are that the proposed approach does improve the performance of real-world speech recognition while reducing the human work involved in
Effectiveness of LP Based Features for Identification of Professional Mimics in Indian Languages
Automatic Speaker Recognition (ASR) is an economical tool for voice biometrics because of the availability of low-cost, powerful processors. For an ASR system to be successful in practical environments, it must have high mimic resistance, i.e., the system should not be defeated by determined mimics, who may be either identical twins or professional mimics. In this paper, we demonstrate the effectiveness of Linear Prediction (LP) based features, viz., Linear Prediction Coefficients (LPC) and Linear Prediction Cepstral Coefficients (LPCC), over filterbank-based features such as Mel-Frequency Cepstral Coefficients (MFCC) and the newly proposed Teager energy based MFCC (T-MFCC) for the identification of professional mimics in Marathi and Hindi.
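For readers unfamiliar with LP-based features, the sketch below computes LPC from a frame's autocorrelation with the Levinson-Durbin recursion and converts them to LPCC with the standard all-pole cepstral recursion. Windowing, pre-emphasis, model order and number of cepstra are not given in the abstract and are left to the caller; the function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def autocorrelation(frame: Sequence[float], order: int) -> List[float]:
    """r[k] = sum_n x[n] * x[n + k] for k = 0..order (unnormalised)."""
    n = len(frame)
    return [sum(frame[i] * frame[i + k] for i in range(n - k)) for k in range(order + 1)]


def levinson_durbin(r: Sequence[float], order: int) -> Tuple[List[float], float]:
    """LPC a_1..a_p for the predictor x[n] ~ sum_k a_k x[n-k], plus residual energy."""
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err                    # reflection coefficient
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1 - k * k                 # prediction error shrinks at each order
    return a[1:], err


def lpc_to_lpcc(a: Sequence[float], n_ceps: int) -> List[float]:
    """Cepstrum of the all-pole model 1 / (1 - sum_k a_k z^-k) via the standard recursion."""
    p = len(a)
    c: List[float] = []
    for n in range(1, n_ceps + 1):
        acc = a[n - 1] if n <= p else 0.0
        for k in range(max(1, n - p), n):
            acc += (k / n) * c[k - 1] * a[n - k - 1]
        c.append(acc)
    return c
```

As a check, an AR(1)-shaped autocorrelation [1, 0.5, 0.25] gives a_1 = 0.5, a_2 = 0, and a single coefficient a gives cepstra a^n / n.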
Scalable Optimal Linear Representation for Face and Object Recognition
Optimal Component Analysis (OCA) is a linear method for feature extraction and dimension reduction. It has been widely used in applications such as face and object recognition. The optimal OCA basis is obtained by solving an optimization problem on a Grassmann manifold. However, one limitation of OCA is that its computational cost becomes heavy when the training set is large, which prevents OCA from being applied efficiently in many real applications. In this paper, a scalable OCA (S-OCA) that uses a two-stage strategy is developed to bridge this gap. In the first stage, the training data are clustered with the K-means algorithm and reduced to a low-dimensional space. In the second stage, the OCA search is performed in the reduced space and the gradient is updated using a numerical approximation. When updating the OCA gradient, instead of using the entire training set, S-OCA randomly chooses a small subset of the training images in each class. This makes the OCA gradient update stochastic and at the same time reduces the search time of OCA by an order of magnitude. Experimental results on face and object datasets show the efficiency of the S-OCA method in terms of both classification accuracy and computational complexity.
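Two ingredients of S-OCA can be sketched independently of the Grassmann-manifold optimization, which is not reproduced here: K-means clustering of the training data (first stage) and drawing a small random subset from each class for each gradient update (second stage). The abstract specifies neither the K-means initialisation nor how clusters feed the dimension reduction, so the deterministic farthest-first initialisation and the helper names are assumptions.

```python
import random
from typing import Dict, List, Sequence


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def kmeans(data: List[List[float]], k: int, iters: int = 100) -> List[List[float]]:
    """Lloyd's k-means with deterministic farthest-first initialisation."""
    centroids = [list(data[0])]
    while len(centroids) < k:  # next centroid: the point farthest from all existing ones
        far = max(data, key=lambda x: min(sq_dist(x, c) for c in centroids))
        centroids.append(list(far))
    for _ in range(iters):
        clusters: List[List[List[float]]] = [[] for _ in range(k)]
        for x in data:  # assignment step
            j = min(range(k), key=lambda i: sq_dist(x, centroids[i]))
            clusters[j].append(x)
        new = [  # update step; an empty cluster keeps its old centroid
            [sum(col) / len(cl) for col in zip(*cl)] if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
        if new == centroids:
            break
        centroids = new
    return centroids


def per_class_subset(labels: Sequence[str], m: int, rng: random.Random) -> List[int]:
    """Indices of at most m randomly chosen training samples from each class."""
    by_class: Dict[str, List[int]] = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    chosen: List[int] = []
    for idx in by_class.values():
        chosen.extend(rng.sample(idx, min(m, len(idx))))
    return chosen
```

Evaluating the gradient on per_class_subset indices instead of the whole training set is what makes each update cheaper and stochastic.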
A Speaker Recognition System Using Multiple Parametric Self-Organizing Maps
Florida International University, Fall 2003
Speaker recognition is the process of automatically recognizing a person's voice, which allows systems to automatically verify identity in applications such as banking by telephone or forensic science. This paper describes a new speaker recognition technique whose goal is to achieve high-accuracy identification and impostor rejection. This novel method, Multiple Parametric Self-Organizing Maps (MPSOM), is a classification and verification technique. It was successfully implemented and tested on the CSLU Speaker Recognition Corpora of the Oregon School of Engineering, with excellent results.