LANDMARK-BASED SPEECH RECOGNITION: REPORT OF THE 2004 Johns Hopkins Summer Workshop
, 2005
Speech Segmentation Using Probabilistic Phonetic Feature Hierarchy and Support Vector Machines
, 2003
Cited by 12 (4 self)
We propose a method that combines a probabilistic phonetic feature hierarchy with support vector machines for segmentation of continuous speech into five classes: vowel, sonorant consonant, fricative, stop and silence. We show that by using the hierarchy, only four binary classifiers are required to recognize the five classes. Because the hierarchy is probabilistic, the method overcomes the disadvantage of traditional acoustic-phonetic methods, in which an error is carried down the hierarchy. In addition, the hierarchical approach allows each binary classifier to be trained on comparable amounts of data from the two classes it is designed to discriminate. The segmentation method with 13 knowledge-based parameters performs considerably better than a context-independent Hidden Markov Model (HMM) based approach that uses 39 mel-cepstrum based parameters.
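To make the "four binary classifiers, five classes" idea concrete, here is a minimal sketch of how the posteriors of four binary decisions could be multiplied along a tree to score the five classes. The particular tree (speech, sonorant, syllabic, continuant) and the function name `class_posteriors` are assumptions made for illustration; the abstract does not name the four classifiers or give this formula.

```python
# Assumed hierarchy (hypothetical; not specified in the abstract):
#   speech vs. silence
#     sonorant vs. obstruent
#       sonorant  -> vowel vs. sonorant consonant   ("syllabic")
#       obstruent -> fricative vs. stop             ("continuant")

def class_posteriors(p_speech, p_sonorant, p_syllabic, p_continuant):
    """Combine four binary posteriors into five class probabilities.

    Each argument is the probability of the positive outcome of one binary
    classifier. Probabilities are multiplied along each path of the tree,
    so no hard decision is taken at any internal node.
    """
    for p in (p_speech, p_sonorant, p_syllabic, p_continuant):
        if not 0.0 <= p <= 1.0:
            raise ValueError("posteriors must lie in [0, 1]")
    return {
        "silence": 1.0 - p_speech,
        "vowel": p_speech * p_sonorant * p_syllabic,
        "sonorant consonant": p_speech * p_sonorant * (1.0 - p_syllabic),
        "fricative": p_speech * (1.0 - p_sonorant) * p_continuant,
        "stop": p_speech * (1.0 - p_sonorant) * (1.0 - p_continuant),
    }


def best_class(posteriors):
    """Return the class label with the highest probability."""
    return max(posteriors, key=posteriors.get)
```

With `class_posteriors(1.0, 0.45, 0.95, 0.5)`, a hard decision at the sonorant node would commit to the obstruent branch, whereas the soft combination still ranks "vowel" highest. This is one way to read the claim that errors are not carried down the hierarchy.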
Large margin hidden markov models for speech recognition
, 2005
Cited by 12 (1 self)
In this work, motivated by large margin classifiers in machine learning, we propose a novel method to estimate continuous density hidden Markov models (CDHMMs) for speech recognition according to the principle of maximizing the minimum multi-class separation margin. The approach is named large margin HMM. First, we show that this type of large margin HMM estimation problem can be formulated as a constrained minimax optimization problem. Second, by imposing different constraints on the minimax problem, we propose three solutions to the large margin HMM estimation problem, namely the iterative localized optimization method, the constrained joint optimization method and the semidefinite programming (SDP) method. These new training methods are evaluated on the isolated E-set recognition task using the ISOLET database and on the TIDIGITS connected digit string recognition task. Experimental results clearly show that the large margin HMMs consistently outperform the conventional HMM training methods. The large margin training method consistently yields a significant reduction in recognition error rate, even on top of some popular discriminative training methods.
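As a rough illustration of "maximizing the minimum multi-class separation margin", the criterion can be written as below. The notation is assumed here and is not taken from the abstract: $\Lambda$ is the set of HMM parameters, $F(X \mid \lambda_j)$ a discriminant score (for example a log-likelihood) of utterance $X$ under model $\lambda_j$, $W_i$ the true label of training token $X_i$, and $\mathcal{S}$ the set of training tokens considered.

```latex
% Separation margin of a training token X_i with true label W_i
d(X_i) = F(X_i \mid \lambda_{W_i}) - \max_{j \neq W_i} F(X_i \mid \lambda_j)

% Maximize the smallest margin, or equivalently solve a minimax problem
\tilde{\Lambda}
  = \arg\max_{\Lambda} \min_{X_i \in \mathcal{S}} d(X_i)
  = \arg\min_{\Lambda} \max_{X_i \in \mathcal{S}} \max_{j \neq W_i}
      \left[ F(X_i \mid \lambda_j) - F(X_i \mid \lambda_{W_i}) \right]
```

The second form follows from the first by negating the margin, which is one way to see why the estimation problem takes a minimax shape.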
Nonnegativity Constraints in Numerical Analysis
Cited by 8 (2 self)
A survey of the development of algorithms for enforcing nonnegativity constraints in scientific computation is given. Special emphasis is placed on such constraints in least squares computations in numerical linear algebra and in nonlinear optimization. Techniques involving nonnegative low-rank matrix and tensor factorizations are also emphasized. Details are provided for some important classical and modern applications in science and engineering. For completeness, this report also includes an effort toward a literature survey of the various algorithms and applications of nonnegativity constraints in numerical analysis. Key Words: nonnegativity constraints, nonnegative least squares, matrix and tensor factorizations, image processing, optimization.
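For readers unfamiliar with nonnegative least squares, the following is a minimal sketch of the problem: minimize ||Ax - b||^2 subject to x >= 0. It uses plain projected gradient descent, chosen here only for brevity; it is not claimed to be one of the algorithms the survey covers, and the function name `nnls_projected_gradient` is hypothetical.

```python
def nnls_projected_gradient(A, b, steps=5000):
    """Approximately solve min 0.5 * ||A x - b||^2 subject to x >= 0.

    A is a list of rows (m x n), b a list of length m.
    Each iteration takes a gradient step and then clips negatives to zero.
    """
    m, n = len(A), len(A[0])
    # The squared Frobenius norm bounds the largest eigenvalue of A^T A,
    # so 1 / lipschitz is a safe (if conservative) step size.
    lipschitz = sum(a * a for row in A for a in row)
    if lipschitz == 0.0:
        return [0.0] * n
    step = 1.0 / lipschitz
    x = [0.0] * n
    for _ in range(steps):
        residual = [sum(A[i][j] * x[j] for j in range(n)) - b[i]
                    for i in range(m)]
        grad = [sum(A[i][j] * residual[i] for i in range(m))
                for j in range(n)]
        # Projection onto the nonnegative orthant.
        x = [max(0.0, x[j] - step * grad[j]) for j in range(n)]
    return x
```

For the identity matrix and `b = [1.0, -1.0]`, the unconstrained solution would be `[1, -1]`; the constraint forces the second component to zero while the first converges to 1.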
A Support Vector Machines-Based Rejection Technique For Speech
- Proceedings of ICASSP'01
, 2001
Cited by 8 (0 self)
Support Vector Machines represent a new approach to pattern classification developed from the theory of Structural Risk Minimization [1]. In this paper, we present an investigation into the application of Support Vector Machines to the confidence measurement problem in speech recognition. Specifically, based on the results from an initial decoding of an utterance during speech recognition, we derive a feature vector consisting of parameters such as word score density, N-best word score density differences, relative word score and relative word duration. This vector is the input to the confidence measurement process, in which hypothetically correct utterances are accepted and utterances determined to be incorrect are rejected. We propose a new approach to training Support Vector Machines. We have trained and tested a Support Vector Machines classifier and compared the results with other statistical classification methods.
Techniques for modelling Phonological Processes in Automatic Speech Recognition
, 2001
Cited by 6 (0 self)
Declaration: This dissertation is the result of my own work and includes nothing which is the outcome of work done in collaboration, except where stated. It has not been submitted in whole or part for a degree at any other university. The length of this thesis including footnotes and appendices does not exceed 29,500 words and includes no more than 40 figures. Systems which automatically transcribe carefully dictated speech are now commercially available, but their performance degrades dramatically when the speaking style of users becomes more relaxed or conversational. This dissertation focuses on techniques that aim to improve the robustness of statistical speech transcription systems to conversational speaking styles. The dissertation shows first that the performance degradation occurring as speech becomes more conversational is severe and is partially attributable to differences in the acoustic realizations of sentences. Hypothesizing that the quantifiably wider range of
Segmentation of Continuous Speech Using Acoustic-Phonetic Parameters and Statistical Learning
- Proceedings of 9th International Conference on Neural Information Processing
, 2002
Cited by 4 (1 self)
In this paper, we present a methodology for combining acoustic-phonetic knowledge with statistical learning for automatic segmentation and classification of continuous speech. At present we focus on the recognition of broad classes: vowel, stop, fricative, sonorant consonant and silence. Judicious use is made of 13 knowledge-based acoustic parameters (APs) and support vector machines (SVMs). It has been shown earlier that SVMs perform comparably to hidden Markov models (HMMs) for detection of stop consonants. We achieve better performance on segmentation of continuous speech than the HMM-based approach that uses 39 cepstrum-based speech parameters.
Speech recognition using randomized relational decision trees
- IEEE Transactions on Speech and Audio Processing 9
, 1999
Cited by 4 (2 self)
We explore the possibility of recognizing speech signals using a large collection of coarse acoustic events, which describe temporal relations between a small number of local features of the spectrogram. The major issue of invariance to changes in duration of speech signal events is addressed by defining temporal relations in a rather coarse manner, allowing for a large degree of slack. The approach is greedy in that it does not offer an “explanation” of the entire signal as the hidden Markov models (HMMs) approach does; rather, it accesses small amounts of relational information to determine a speech unit or class. This implies that we recognize words as units, without recognizing their subcomponents. Multiple randomized decision trees are used to access the large pool of acoustic events in a systematic manner and are aggregated to produce the classifier. Index Terms: classification, decision trees, labeled graphs, spectrogram, speech recognition.
Speech Recognition Using Acoustic Landmarks and Binary Phonetic Feature Classifiers
, 2003
Cited by 3 (0 self)
In spite of decades of research, Automatic Speech Recognition (ASR) is far from reaching the goal of performance close to Human Speech Recognition (HSR). One of the reasons for the unsatisfactory performance of state-of-the-art ASR systems, which are based largely on Hidden Markov Models (HMMs), is the inferior acoustic modeling of low-level or phonetic-level linguistic information in the speech signal. An acoustic-phonetic approach to ASR, on the other hand, explicitly targets linguistic information in the speech signal. However, no acoustic-phonetic system yet carries out large speech recognition tasks such as connected word or continuous speech recognition. We propose a probabilistic and statistical framework for connected word ASR based on the knowledge of acoustic phonetics. The proposed system is based on the idea of representing speech sounds by bundles of binary-valued articulatory phonetic features. For connected word speech recognition, the probabilistic framework requires only binary classifiers of phonetic features and the knowledge-based acoustic correlates of the features. We explore the use of Support Vector Machines (SVMs) for binary phonetic feature classification because SVMs offer favorable properties well suited to our recognition task. In the proposed method, probabilistic segmentation of speech is obtained using SVM-based classifiers of manner phonetic features. The linguistically motivated landmarks obtained in each segmentation are used for classification of source and place phonetic features. Probabilistic segmentation paths are constrained using Finite State Automata (FSA) for isolated or connected word recognition. The proposed method could overcome the disadvantages ...
SVM Approximation for Real-time Image Segmentation by Using an Improved Hyperrectangles-based Method
- Bourennane; Real Time Imaging, Elsevier
, 2003
Cited by 3 (1 self)
A real-time implementation of an approximation of the support vector machine decision rule is proposed. This method is based on an improvement of a supervised classification method using hyperrectangles, which is useful for real-time image segmentation. The final decision combines the accuracy of the SVM learning algorithm and the speed of a hyperrectangles-based method. We review the principles of the classification methods and evaluate the hardware implementation cost of each method. We present the combination algorithm, which rejects ambiguities in the learning set using the SVM decision before applying the learning step of the hyperrectangles-based method. We present results obtained using Gaussian distributions and give an example of image segmentation from an industrial inspection problem. The results are evaluated with regard to hardware cost as well as classification performance.

