Phonetic Context-Dependency In a Hybrid ANN/HMM Speech Recognition System
1997
Cited by 8 (0 self)
This report uses a Bark scale, which has been replaced here with a mel scale. (Chapter 3: The ABBOT Speech Recognition System) … where i = 1 …
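The mel and Bark scales are both perceptual frequency warpings used to place filterbank channels. The sketch below is an illustration only and does not come from the thesis. It assumes the widely used mel formula 2595·log10(1 + f/700) and Traunmüller's (1990) Bark approximation, and the function names are hypothetical.

```python
import math


def hz_to_mel(f_hz):
    """Widely used mel formula: 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def hz_to_bark(f_hz):
    """Traunmueller's (1990) Bark approximation, without its low/high-end corrections."""
    return 26.81 * f_hz / (1960.0 + f_hz) - 0.53


def mel_band_centres(f_low, f_high, n_bands):
    """Centre frequencies (Hz) of n_bands filters spaced uniformly on the mel scale
    between f_low and f_high (the two band edges themselves are excluded)."""
    lo, hi = hz_to_mel(f_low), hz_to_mel(f_high)
    step = (hi - lo) / (n_bands + 1)
    return [mel_to_hz(lo + step * (i + 1)) for i in range(n_bands)]
```

Replacing a Bark scale with a mel scale therefore amounts to swapping the warping function used to place the filter centres. Both warpings compress high frequencies, so the channels become wider in Hz as frequency rises.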
Large Vocabulary Continuous Speech Recognition: from Laboratory Systems towards Real-World Applications
1996
Cited by 6 (4 self)
This paper provides an overview of the state of the art in laboratory speaker-independent, large vocabulary continuous speech recognition (LVCSR) systems, with a view towards adapting such technology to the requirements of real-world applications. While the principal concern in speech recognition is to transcribe the speech signal as a sequence of words, the same core technology can be applied to domains other than dictation. The main topics addressed are acoustic-phonetic modeling, lexical representation, language modeling, decoding and model adaptation. After a brief summary of experimental results, some directions towards usable systems are given. In moving from laboratory systems towards real-world applications, different constraints arise that influence the system design: the application imposes limits on computational resources, constraints on signal capture, requirements for noise and channel compensation, and the need for rejection capability. The difficulties and costs of adapting existing technology to new languages and applications need to be assessed. Near-term applications of LVCSR technology are likely to grow in somewhat limited domains, such as spoken language systems for information retrieval and limited-domain dictation. Perspectives on some unresolved problems are given, indicating areas for future research.
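The acoustic, lexical and language models listed above meet in the decoding problem. That problem is usually stated as the Bayes decision rule below. This is the textbook formulation, not an equation quoted from the paper. Here X is the observed sequence of acoustic feature vectors and W a candidate word sequence. p(X | W) is the acoustic model, built by concatenating phone models according to the lexicon, and P(W) is the language model. The evidence p(X) can be dropped because it does not depend on W.

```latex
\hat{W} = \arg\max_{W} P(W \mid X)
        = \arg\max_{W} \frac{p(X \mid W)\, P(W)}{p(X)}
        = \arg\max_{W} p(X \mid W)\, P(W)
```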
Towards Formal Structural Representation of Spoken Language: An Evolving Transformation System (ETS) Approach
2005
Cited by 5 (0 self)
Speech recognition has been a very active area of research over the past twenty years. Despite evident progress, practitioners in the field generally agree that the performance of current speech recognition systems is rather suboptimal and that new approaches are needed. The motivation behind this research is the observation that the notion of representation of objects and concepts, once considered central in the early days of pattern recognition, has been largely marginalised by the advent of statistical approaches. Because speech recognition is approached predominantly statistically, with numeric, feature-vector-based representations, the classes inductively discovered from real data using decision-theoretic techniques have little meaning outside the statistical framework: decision surfaces and probability distributions are difficult to analyse linguistically. Because of this latter limitation, it is doubtful that numeric representations can bridge the gap between speech recognition and linguistic research. This thesis investigates an alternative, structural approach to spoken language representation and categorisation ...
Mapping context dependent acoustic information into context independent form by LVQ
1994
Cited by 4 (2 self)
In the framework of phonemic speech recognition using Hidden Markov Models (HMMs) together with codebooks trained by Learning Vector Quantization (LVQ), a novel way to model context dependencies in speech is presented. We use LVQ to map acoustic contextual data into context-independent phonemic form. The acoustic data take the form of concatenated averages of successive short-time feature vectors. This mapping eliminates the need for context-dependent phonemic HMMs (for example, triphone HMMs) and the difficulties associated with them. Simpler context-independent discrete-observation HMMs suffice instead. We report excellent results for a speaker-dependent task in Finnish. Summary: We discuss a new model of context dependencies in phonemic speech recognition. Our model is based on the methods of hidden Markov models (HMM) and learning vector quantization (LVQ). We use LVQ to convert information about the acoustic context into a context-independent ...
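The mapping step described above can be illustrated as follows. The sketch averages successive short-time feature vectors over a few consecutive segments and concatenates the averages into one context vector. It then trains labelled codebook vectors with the standard LVQ1 rule and maps a context vector to the label of its nearest codebook entry. The segment count, learning rate, toy data and all function names are hypothetical, and the paper's exact feature layout and LVQ variant may differ.

```python
def context_vector(frames, n_segments):
    """Split a run of short-time feature vectors into n_segments consecutive
    chunks, average each chunk, and concatenate the chunk averages."""
    n = len(frames)
    if n < n_segments:
        raise ValueError("need at least one frame per segment")
    out = []
    for s in range(n_segments):
        # Integer boundaries so every frame lands in exactly one chunk.
        chunk = frames[s * n // n_segments:(s + 1) * n // n_segments]
        for d in range(len(chunk[0])):
            out.append(sum(f[d] for f in chunk) / len(chunk))
    return out


def nearest(codebook, x):
    """Index of the codebook entry whose vector is closest to x (squared Euclidean)."""
    return min(range(len(codebook)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(codebook[i][0], x)))


def train_lvq1(codebook, data, alpha=0.1, epochs=20):
    """LVQ1: move the winning codebook vector towards a sample with the same
    label and away from a sample with a different label.
    codebook and data are lists of (vector, phoneme_label) pairs."""
    cb = [(list(v), lab) for v, lab in codebook]  # copy; caller's codebook is untouched
    for epoch in range(epochs):
        rate = alpha * (1.0 - epoch / epochs)  # linearly decaying learning rate
        for x, label in data:
            i = nearest(cb, x)
            v, lab = cb[i]
            sign = 1.0 if lab == label else -1.0
            cb[i] = ([vi + sign * rate * (xi - vi) for vi, xi in zip(v, x)], lab)
    return cb


def map_to_phoneme(codebook, x):
    """Context-independent phoneme label of the nearest codebook vector."""
    return codebook[nearest(codebook, x)][1]
```

In this reading, the returned label (or the index of the winning codebook vector) would serve as the discrete observation symbol for a context-independent discrete HMM.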
Structural Representation of Speech for Phonetic Classification
In: Proc. 17th ICPR, Volume 3, 2004
Cited by 4 (2 self)
This paper explores the issues involved in using symbolic metric algorithms for automatic speech recognition (ASR) via a structural representation of speech. This representation is based on a set of phonological distinctive features, which offers a linguistically well-motivated alternative to the "beads-on-a-string" view of speech that is standard in current ASR systems. We report promising results from phoneme classification experiments on a standard continuous speech task.
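The contrast with the "beads-on-a-string" view can be made concrete. The sketch represents each phoneme as a bundle of binary distinctive features and classifies an observed bundle by its nearest prototype under the Hamming distance, a simple symbolic metric. The feature inventory, the feature values and the function names are simplified and hypothetical. The paper's actual features and metric algorithms are not reproduced here.

```python
FEATURES = ("voice", "nasal", "continuant", "labial", "coronal")

# Hypothetical, simplified feature bundles (1 = +feature, 0 = -feature).
PROTOTYPES = {
    "p": (0, 0, 0, 1, 0), "b": (1, 0, 0, 1, 0), "m": (1, 1, 0, 1, 0),
    "t": (0, 0, 0, 0, 1), "d": (1, 0, 0, 0, 1), "n": (1, 1, 0, 0, 1),
    "f": (0, 0, 1, 1, 0), "v": (1, 0, 1, 1, 0),
    "s": (0, 0, 1, 0, 1), "z": (1, 0, 1, 0, 1),
}


def hamming(a, b):
    """Number of feature positions on which two bundles disagree."""
    return sum(x != y for x, y in zip(a, b))


def classify_bundle(bundle):
    """Return (phoneme, distance) for the prototype nearest to bundle;
    ties go to the phoneme listed first in PROTOTYPES."""
    best = min(PROTOTYPES, key=lambda ph: hamming(PROTOTYPES[ph], bundle))
    return best, hamming(PROTOTYPES[best], bundle)
```

In this toy setup, a bundle with one misdetected feature still lands at distance 1 from the intended phoneme, so errors stay between linguistically close classes such as /p/ and /b/.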
A Continuous Density Interpretation of Discrete HMM Systems and MMI-Neural Networks
IEEE Transactions on Speech and Audio Processing, 2001
Cited by 3 (0 self)
The subject of this paper is the integration of the traditional combination of a vector quantizer (VQ) and discrete hidden Markov models (HMMs) into the mixture emission density framework commonly used in automatic speech recognition (ASR). It is shown that the probability density of a system consisting of a VQ and a discrete classifier can be interpreted as a special case of a semicontinuous mixture model. The VQ parameters and the classifier can therefore be trained jointly. Within this framework, a gradient-based VQ training method is derived for single- and multiple-feature-stream systems. This leads to an approach that is directly related to the paradigm of maximum mutual information (MMI) neural networks, which have previously been applied successfully as VQs in ASR. In continuous speech recognition experiments on the Resource Management and Wall Street Journal databases, the presented systems achieve recognition accuracies that compare well with those of comparable Gaussian mixture HMMs. This demonstrates that the performance degradations often reported for discrete HMM systems are not mainly caused by the vector quantization process itself. They are instead due to the traditional separation of the VQ and the HMM during parameter estimation. These degradations can be avoided by training the entire system as described here, while retaining the attractive computational speed of discrete HMMs.
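The structural link between the two emission models mentioned above can be written out as follows. The notation is generic textbook notation rather than the paper's own. $c_{jk}$ are the weights of state $j$ (the discrete output probabilities) over a codebook of $K$ entries with centroids $\mathbf{m}_k$. $p(\mathbf{x} \mid k)$ is the density attached to entry $k$, $k^*(\mathbf{x})$ is the VQ label of $\mathbf{x}$, and $\delta$ is the Kronecker delta.

```latex
% Semicontinuous (tied-mixture) emission density of HMM state j
b_j(\mathbf{x}) = \sum_{k=1}^{K} c_{jk}\, p(\mathbf{x} \mid k),
\qquad \sum_{k=1}^{K} c_{jk} = 1

% Discrete HMM behind a hard vector quantizer
b_j(\mathbf{x}) = c_{j,k^*(\mathbf{x})}
  = \sum_{k=1}^{K} c_{jk}\, \delta_{k,\,k^*(\mathbf{x})},
\qquad k^*(\mathbf{x}) = \arg\min_{k} \lVert \mathbf{x} - \mathbf{m}_k \rVert
```

Both forms share the same state-dependent weights $c_{jk}$. The discrete case replaces the codebook term with a hard assignment. The abstract does not state the exact density the paper associates with each codebook cell, so the equations only show the shared structure.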
Spatiotemporal Pattern Recognition via Liquid State Machines
Cited by 3 (2 self)
The applicability of complex networks of spiking neurons as a general-purpose machine learning technique remains an open question. Building on previous work using macroscopic exploration of the parameter space of an (artificial) neural microcircuit, we investigate the possibility of using a liquid state machine to solve two real-world problems: stockpile surveillance signal alignment and spoken phoneme recognition.
Phoneme Classification over Reconstructed Phase Space Using Principal Component Analysis
Proceedings of the ISCA Tutorial and Research Workshop on Non-Linear Speech Processing (NOLISP), Le Croisic, 2003
Cited by 2 (0 self)
Although isolated phoneme classification using features from time-domain phase space reconstruction has been investigated recently, which feature-vector representation best discriminates between phoneme classes is still an open question. This paper applies Principal Component Analysis (PCA) to feature vectors from the reconstructed phase space. PCA projection orthogonalizes the basis of the feature space, and a Bayes classifier uses the transformed feature vectors to classify phoneme exemplars. The results show that classification accuracy with the PCA method surpasses the accuracy obtained with the original features alone in most cases. PCA projection was implemented in three ways over the reconstructed phase space, on both speaker-dependent and speaker-independent data. Models are trained and tested using data drawn from the TIMIT database.
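The pipeline described above can be sketched under two assumptions. The reconstructed phase space is taken to be a standard time-delay embedding, and the PCA is a plain covariance-based PCA computed by power iteration. The embedding dimension, lag and function names are hypothetical, and the paper's three PCA variants and its Bayes classifier are not reproduced.

```python
import math
import random


def delay_embed(signal, dim, lag):
    """Reconstructed phase space: row t is [s[t], s[t+lag], ..., s[t+(dim-1)*lag]]."""
    n = len(signal) - (dim - 1) * lag
    if n <= 0:
        raise ValueError("signal too short for this dim/lag")
    return [[signal[t + i * lag] for i in range(dim)] for t in range(n)]


def covariance(rows):
    """Sample mean and sample covariance matrix of the embedded points."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[i] for r in rows) / n for i in range(d)]
    centered = [[r[i] - mean[i] for i in range(d)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    return mean, cov


def _orthonormalize(v, basis):
    """Remove components along an orthonormal basis, then normalise (None if ~0)."""
    for b in basis:
        dot = sum(x * y for x, y in zip(v, b))
        v = [x - dot * y for x, y in zip(v, b)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 1e-12 else None


def pca(cov, k, iters=200, seed=0):
    """Top-k eigenpairs of a symmetric covariance matrix by power iteration,
    keeping each new component orthogonal to those already found."""
    d = len(cov)
    if k > d:
        raise ValueError("k must not exceed the dimension")
    rng = random.Random(seed)
    values, comps = [], []
    for _ in range(k):
        v = _orthonormalize([rng.uniform(-1, 1) for _ in range(d)], comps)
        for _ in range(iters):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            w = _orthonormalize(w, comps)
            if w is None:  # no variance left in the remaining directions
                break
            v = w
        lam = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d)) for i in range(d))
        values.append(lam)
        comps.append(v)
    return values, comps


def project(rows, mean, comps):
    """Coordinates of each centred point in the principal-component basis."""
    return [[sum((r[i] - mean[i]) * c[i] for i in range(len(r))) for c in comps]
            for r in rows]
```

The projected coordinates, rather than the raw embedded points, would then be the features handed to the classifier.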
Controlling the Complexity of HMM Systems by Regularization
Cited by 2 (0 self)
This paper introduces a method for regularizing HMM systems that avoids the parameter overfitting caused by insufficient training data. Regularization is done by augmenting the EM training method with a penalty term that favors simple and smooth HMM systems. The penalty term is constructed as a mixture model of negative exponential distributions that is assumed to generate the state-dependent emission probabilities of the HMMs. The method successfully transfers a well-known regularization approach from neural networks to the HMM domain and can be interpreted as a generalization of traditional state tying for HMM systems. The effect of regularization is demonstrated on continuous speech recognition tasks by improving overfitted triphone models and by speaker adaptation with limited training data. [1 Introduction] One general problem when constructing statistical pattern recognition systems is to ensure the ability to generalize well, i.e. the system must be able ...
Exploiting Phonetic and Phonological Similarities as a First Step for Robust Speech Recognition
Cited by 2 (2 self)
This paper presents two speech recognition systems that use the notion of phonetic and phonological similarity to improve the robustness of phoneme recognition. The first system, YASPER, uses phonetic feature extraction engines to identify phonemes based on overlap relations between phonetic features. The second system uses the CMU Sphinx 3.7 decoder with statistical context-dependent phone models. Experiments on the TIMIT corpus show improvements in phoneme error rate when a projection set constructed with respect to phonetic and phonological similarity is used. It is envisaged that, in the future, the two systems will provide alternative parallel streams of hypotheses for each interval of the speech signal and will work together as experts in the phoneme recognition process.
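Phoneme error rate is conventionally the edit (Levenshtein) distance between the recognised and reference phoneme strings, divided by the reference length. The sketch below applies a many-to-one projection map to both strings before scoring. This is one possible reading of "a projection set constructed with respect to phonetic and phonological similarity". The map entries are hypothetical merges in the style of the common TIMIT label folding, chosen only for illustration and not taken from the paper.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance (substitutions, insertions and deletions all cost 1)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(prev[j] + 1,              # deletion of r
                         cur[j - 1] + 1,           # insertion of h
                         prev[j - 1] + (r != h))   # substitution or match
        prev = cur
    return prev[-1]


def phoneme_error_rate(ref, hyp, projection=None):
    """PER = edit distance / reference length, after mapping both strings
    through an optional many-to-one projection of similar phonemes."""
    if projection:
        ref = [projection.get(p, p) for p in ref]
        hyp = [projection.get(p, p) for p in hyp]
    if not ref:
        raise ValueError("empty reference")
    return edit_distance(ref, hyp) / len(ref)


# Hypothetical projection merging phonetically similar labels.
PROJECTION = {"ix": "ih", "ax": "ah", "zh": "sh"}
```

With such a projection, confusions between merged phonemes stop counting as errors. This is why the choice of projection set directly affects the reported phoneme error rate.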

