Kernel Metric Learning For Phonetic Classification
While a spoken sound is described by a handful of frame-level spectral vectors, not all frames contribute equally to either human perception or machine classification. In this paper, we introduce a novel framework to automatically emphasize the speech frames most relevant to phonetic information. We jointly learn the importance of speech frames through a distance metric across the phone classes, attempting to satisfy a large margin constraint: the distance from a segment to its correct label class should be less than the distance to any other phone class by the largest possible margin. Furthermore, a universal background model structure is proposed to give the correspondence between statistical models of phone types and tokens, allowing us to use statistical models of each phone token in a large margin speech recognition framework. Experiments on the TIMIT database demonstrate the effectiveness of our framework.
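The large margin constraint described in this abstract can be illustrated with a small sketch. This is not the paper's kernel metric or its universal background model structure. It assumes, purely for illustration, that each segment and each phone class is a fixed-length list of frame vectors, that the learned metric reduces to one non-negative weight per frame, and that the names `weighted_distance`, `margin_violation`, `templates`, and `frame_weights` are hypothetical.

```python
def weighted_distance(segment, template, frame_weights):
    """Frame-weighted squared Euclidean distance between two segments.

    Assumption: segment and template have the same number of frames,
    and frame_weights holds one non-negative weight per frame.
    """
    total = 0.0
    for frame, ref, weight in zip(segment, template, frame_weights):
        total += weight * sum((a - b) ** 2 for a, b in zip(frame, ref))
    return total


def margin_violation(segment, label, templates, frame_weights, margin=1.0):
    """Hinge-style penalty for violating the large margin constraint.

    The constraint asks that the distance to the correct class be smaller
    than the distance to every other class by at least `margin`.
    Returns 0.0 when the constraint holds for all competing classes.
    """
    d_correct = weighted_distance(segment, templates[label], frame_weights)
    penalty = 0.0
    for other, template in templates.items():
        if other == label:
            continue
        d_other = weighted_distance(segment, template, frame_weights)
        penalty += max(0.0, margin + d_correct - d_other)
    return penalty


# Hypothetical two-class example: the classes differ only in the second frame.
templates = {
    "aa": [[0.0, 0.0], [1.0, 1.0]],
    "iy": [[0.0, 0.0], [3.0, 3.0]],
}
segment = [[0.0, 0.0], [1.0, 1.0]]

informative = margin_violation(segment, "aa", templates, [1.0, 1.0])
uninformative = margin_violation(segment, "aa", templates, [1.0, 0.0])
```

In this toy setup, weighting the second frame satisfies the constraint, while setting its weight to zero removes the only evidence that separates the two classes and incurs the full margin penalty. A learning procedure would adjust the weights to reduce such penalties over a training set.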
SVM-HMM LANDMARK BASED SPEECH RECOGNITION
Support vector machines (SVMs) are trained to detect acoustic-phonetic landmarks and to identify, with high accuracy, both the manner and place of articulation of the phones producing each landmark. The discriminant outputs of these SVMs are used as input features for a standard HMM-based ASR system. Using these SVM discriminant features significantly improves both phone and word recognition accuracy over an MFCC-based recognizer.
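The idea of feeding SVM discriminant outputs to a recognizer as features can be sketched as follows. This is only an illustration under stated assumptions: the detectors are taken to be linear, each represented by a hypothetical `(weights, bias)` pair that has already been trained, and the function names are invented here. The abstract does not specify the kernel, the feature dimensions, or how the outputs are combined.

```python
def discriminant(weights, bias, frame):
    """Signed discriminant value w . x + b of a linear SVM for one frame."""
    return sum(w * x for w, x in zip(weights, frame)) + bias


def landmark_features(frame, detectors):
    """Stack the discriminant outputs of all detectors into one feature vector.

    Assumption: `detectors` is a list of (weights, bias) pairs, one per
    landmark, manner, or place classifier.
    """
    return [discriminant(weights, bias, frame) for weights, bias in detectors]


# Two hypothetical detectors applied to a two-dimensional acoustic frame.
detectors = [
    ([1.0, -1.0], 0.5),
    ([0.0, 2.0], -1.0),
]
features = landmark_features([2.0, 1.0], detectors)
```

Each frame is mapped to a vector with one entry per detector, and a sequence of such vectors would then replace or supplement the usual acoustic observations passed to the HMM.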
Advisor:
In this thesis, we describe a biometric authentication system that recognizes its users’ voices using advanced machine learning and digital signal processing tools. The proposed system can both validate a person’s identity (i.e. verification) and recognize that person within a larger known group of people (i.e. identification). We designed the entire speaker recognition system to be integrated into the Siebel Center’s infrastructure, and named it “Biometric Authentication System for the Siebel Center (BASS)”. The main idea is to extract discriminative characteristics of an individual’s voiceprint and use them to train classifiers by binary classification. We formed the training data set by recording 11 speakers’ voices in a laboratory environment. Most of the speakers were from different nations, with different language backgrounds and therefore various accents. They were considered to be a subset of the Siebel Center community. We asked them to speak 13 words, including numeric digits (0-9) and proper nouns, and used triplet combinations of these words as passwords. We chose Mel-Frequency Cepstral Coefficients to represent the voice signals as frame-based feature vectors. With these, we trained Support Vector Machine and Artificial Neural Network classifiers using a “one-vs-all” strategy. We tested our recognition

