Large Vocabulary Continuous Speech Recognition: from Laboratory Systems towards Real-World Applications
1996
Cited by 6 (4 self)
This paper provides an overview of the state of the art in laboratory speaker-independent, large vocabulary continuous speech recognition (LVCSR) systems with a view towards adapting such technology to the requirements of real-world applications. While in speech recognition the principal concern is to transcribe the speech signal as a sequence of words, the same core technology can be applied to domains other than dictation. The main topics addressed are acoustic-phonetic modeling, lexical representation, language modeling, decoding and model adaptation. After a brief summary of experimental results, some directions towards usable systems are given. In moving from laboratory systems towards real-world applications, different constraints arise which influence the system design. The application imposes limitations on computational resources, constraints on signal capture, requirements for noise and channel compensation, and rejection capability. The difficulties and costs of adapting existing technology to new languages and applications need to be assessed. Near-term applications for LVCSR technology are likely to grow in somewhat limited domains such as spoken language systems for information retrieval, and limited domain dictation. Perspectives on some unresolved problems are given, indicating areas for future research.
Speech Recognition for an Information Kiosk
Proc. ICSLP 96, 1996
Cited by 5 (4 self)
In the context of the ESPRIT MASK project we face the problem of adapting a "state-of-the-art" laboratory speech recognizer for use in the real world with naive users. The speech recognizer is a software-only system that runs in real time on a standard RISC processor. All aspects of the speech recognizer have been reconsidered, from signal capture to adaptive acoustic models and language models. The resulting system includes such features as microphone selection, response cancellation, noise compensation, query rejection capability and decoding strategies for real-time recognition.
1. INTRODUCTION In this paper we address issues that must be faced in adapting a "state-of-the-art" speech recognizer developed in a laboratory for real-world use. All aspects of the speech recognizer must be reconsidered, from signal capture to adaptive acoustic and language models. We have confronted these issues in the context of the ESPRIT MASK (Multimodal-Multimedia Automated Service Kiosk) project, aime...
Gaussian Selection Applied to Text-Independent Speaker Verification
In Proc. Speaker Odyssey 2001, 2001
Cited by 5 (0 self)
Fast speaker verification systems can be realised by reducing the computation associated with searching the mixture components of a statistical model such as a Gaussian mixture model (GMM). Several improvements regarding computational efficiency have already been proposed
A New Approach To Generalized Mixture Tying For Continuous HMM-Based Speech Recognition
Proc. EUROSPEECH, Rhodes, 1997
Cited by 4 (3 self)
In this paper we present a new approach for a generalized tying of mixture components for continuous mixture-density HMM-based speech recognition systems. With an iterative pruning and splitting procedure for the mixture components, this approach offers a very accurate and detailed representation of the acoustic space and at the same time keeps the number of parameters reasonably small, in favor of robust parameter estimation and fast decoding. Contrary to other approaches, it does not require a strict clustering of the pdfs into subsets that share their mixture components, so it is capable of providing more general and flexible types of mixture tying. We applied the new approach to a semi-continuous HMM (SCHMM) system for the Resource Management task and improved its recognition performance by 12%, and vastly accelerated the decoding because of a much faster likelihood computation.
1. INTRODUCTION In continuous mixture-density HMM-based speech recognition systems the HMM stat...
Use Of Gaussian Selection In Large Vocabulary Continuous Speech Recognition Using HMMs
1996
Cited by 4 (1 self)
This paper investigates the use of Gaussian Selection (GS) to reduce the state likelihood computation in HMM-based systems. These likelihood calculations contribute significantly (30 to 70%) to the computational load. Previously, it has been reported that when GS is used on large systems the recognition accuracy tends to degrade above a ×3 reduction in likelihood computation. To explain this degradation, this paper investigates the trade-offs necessary between achieving good state likelihoods and low computation. In addition, the problem of unseen states in a cluster is examined. It is shown that further improvements are possible. For example, using a different assignment measure, with a constraint on the number of components per state per cluster, enabled the recognition accuracy on a 5k speaker-independent task to be maintained up to a ×5 reduction in likelihood computation.
1. INTRODUCTION In recent years, high accuracy large vocabulary continuous speech recognition sys...
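The general idea of Gaussian selection described in this abstract can be illustrated with a minimal sketch. It is not the paper's method: the Euclidean `radius` test used to assign Gaussians to codewords, the `floor` value given to unevaluated Gaussians, and all function names are assumptions chosen for illustration. The feature space is covered by a few codewords, each codeword keeps a shortlist of nearby Gaussians, and for each frame only the shortlist of the nearest codeword is evaluated exactly.

```python
import math

def log_gauss(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at x."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )

def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def build_shortlists(codewords, gaussians, radius):
    """For each codeword, list the Gaussians whose mean lies within `radius`.
    The radius test is an assumed assignment measure, not the paper's."""
    return [
        [g for g, (mean, _var) in enumerate(gaussians)
         if sq_dist(cw, mean) <= radius ** 2]
        for cw in codewords
    ]

def selected_log_likelihoods(x, codewords, shortlists, gaussians, floor=-1e3):
    """Evaluate only the shortlist of the codeword nearest to x.
    Every other Gaussian receives the (assumed) floor value."""
    nearest = min(range(len(codewords)), key=lambda c: sq_dist(x, codewords[c]))
    scores = [floor] * len(gaussians)
    for g in shortlists[nearest]:
        mean, var = gaussians[g]
        scores[g] = log_gauss(x, mean, var)
    return scores, len(shortlists[nearest])

# Toy model: four 2-D Gaussians (mean, variance) and two codewords.
gaussians = [([0.0, 0.0], [1.0, 1.0]), ([0.5, 0.5], [1.0, 1.0]),
             ([5.0, 5.0], [1.0, 1.0]), ([5.5, 4.5], [1.0, 1.0])]
codewords = [[0.0, 0.0], [5.0, 5.0]]
shortlists = build_shortlists(codewords, gaussians, radius=2.0)
scores, evaluated = selected_log_likelihoods(
    [0.2, 0.1], codewords, shortlists, gaussians)
print(evaluated, "of", len(gaussians), "Gaussians evaluated")
```

In this toy setup half of the Gaussians are skipped per frame. A larger radius gives better likelihoods at higher cost, which is the kind of trade-off the abstract refers to.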
Towards A Compact Speech Recognizer: Subspace Distribution Clustering Hidden Markov Model
1998
Cited by 2 (1 self)
1 Introduction
  1.1 The Problem: Too Many Parameters
  1.2 Proposed Solution: It Is Time to Share More!
  1.3 Thesis Summary and Outline
2 Review of Acoustic Modeling Using Hidden Markov Model
  2.1 Speech Characteristics
  2.2 Selection of Input Speech Space and Speech Model
    2.2.1 Cepstral Input
    2.2.2 Hidden Markov Model
    2.2.3 Our Choice of HMM for Acoustic Modeling
  2.3 Speech Unit to Model ...
Decision-Tree Based Quantization Of The Feature Space Of A Speech Recognizer
In Proceedings of the European Conference on Speech Communication and Technology, 1997
Cited by 2 (0 self)
We present a decision-tree based procedure to quantize the feature space of a speech recognizer, with the motivation of reducing the computation time required for evaluating Gaussians in a speech recognition system. The entire feature space is quantized into non-overlapping regions where each region is bounded by a number of hyperplanes. Further, each region is characterized by the occurrence of only a small number of the total alphabet of allophones (sub-phonetic speech units); by identifying the region in which a test feature vector lies, only the Gaussians that model the density of allophones that exist in that region need be evaluated. The quantization of the feature space is done in a hierarchical manner using a binary decision tree. Each node of the decision tree represents a region of the feature space, and is further characterized by a hyperplane (a vector v_n and a scalar threshold value h_n) that subdivides the region corresponding to the current node into two non-overlapping...
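The region lookup described in this abstract can be sketched as a walk down a binary tree of hyperplane tests. This is only an illustration under stated assumptions: the `Node` layout, the convention that a projection below the threshold goes left, and the allophone labels are all hypothetical and not taken from the paper.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Node:
    v: Optional[List[float]] = None          # hyperplane normal v_n (internal nodes)
    h: float = 0.0                           # scalar threshold h_n (internal nodes)
    left: Optional["Node"] = None            # assumed side: v_n . x < h_n
    right: Optional["Node"] = None           # assumed side: v_n . x >= h_n
    allophones: Optional[List[str]] = None   # set only at leaves

def find_region(node: Node, x: List[float]) -> List[str]:
    """Descend the tree and return the allophones active in x's region."""
    while node.allophones is None:
        projection = sum(vi * xi for vi, xi in zip(node.v, x))
        node = node.left if projection < node.h else node.right
    return node.allophones

# Toy tree over a 2-D feature space with made-up allophone labels.
tree = Node(
    v=[1.0, 0.0], h=0.0,
    left=Node(allophones=["AA_1", "AE_2"]),
    right=Node(
        v=[0.0, 1.0], h=1.0,
        left=Node(allophones=["S_1"]),
        right=Node(allophones=["S_1", "Z_3"]),
    ),
)

print(find_region(tree, [2.0, 0.5]))
```

Only the Gaussians belonging to the returned allophones would then need to be evaluated for that feature vector, at the cost of one dot product per tree level.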
Four-layer categorization scheme of fast GMM computation techniques in large vocabulary continuous speech recognition systems
In Proceedings of ICSLP, 2004
Cited by 2 (1 self)
Large vocabulary continuous speech recognition systems are known to be computationally intensive. A major bottleneck is the Gaussian mixture model (GMM) computation, and various techniques have been proposed to address this problem. We present a systematic study of fast GMM computation techniques. As there are a large number of these and it is impractical to exhaustively evaluate all of them, we first categorized techniques into four layers and selected representative ones to evaluate in each layer. Based on this framework of study, we provide a detailed analysis and comparison of GMM computation techniques from the four-layer perspective and explore two subtle practical issues: 1) how different techniques can be combined effectively and 2) how beam pruning will affect the performance of GMM computation techniques. All techniques are evaluated in the CMU Communicator domain. We also compare their performance with others reported in the literature.
Gaussian-selection-based non-optimal search for speaker identification
2005
Cited by 2 (0 self)
Most speaker identification systems train individual models for each speaker. This is done as individual models often yield better performance and they permit easier adaptation and enrollment. When classifying a speech token, the token is scored against each model and the maximum a priori decision rule is used to decide the classification label. Consequently, the cost of classification grows linearly for each token as the population size grows. When considering that the number of tokens to classify is also likely to grow linearly with the population, the total workload increases exponentially. This paper presents a preclassifier which generates an N-best hypothesis using a novel application of Gaussian selection, and a transformation of the traditional tail test statistic which lets the implementer specify the tail region in terms of probability. The system is trained using parameters of individual speaker models and does not require the original feature vectors, even when enrolling new speakers or adapting existing ones. As the correct class label need only be in the N-best hypothesis set, it is possible to prune more Gaussians than in a traditional Gaussian selection application. The N-best hypothesis set is then evaluated using individual speaker models, resulting in an overall reduction of workload.
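The two-stage structure in this abstract (a cheap preclassifier produces an N-best set, and only that set is scored with the full speaker models) can be sketched generically. This does not reproduce the paper's Gaussian-selection preclassifier: `cheap_score`, `full_score`, and the one-dimensional toy speaker models are placeholders assumed for illustration.

```python
import heapq

def identify(token, cheap_score, full_score, speakers, n_best):
    """Shortlist n_best speakers with a cheap score, then rescore only those."""
    shortlist = heapq.nlargest(
        n_best, speakers, key=lambda s: cheap_score(s, token))
    best = max(shortlist, key=lambda s: full_score(s, token))
    return best, shortlist

# Toy models: each hypothetical speaker is a single 1-D mean.
speaker_means = {"spk_a": 0.0, "spk_b": 1.0, "spk_c": 4.0, "spk_d": 9.0}
full_calls = []  # records how often the expensive score is computed

def cheap_score(speaker, token):
    # Coarse stand-in for a preclassifier: compare integer-rounded values.
    return -abs(round(speaker_means[speaker]) - round(token))

def full_score(speaker, token):
    # Stand-in for scoring against the full individual speaker model.
    full_calls.append(speaker)
    return -(speaker_means[speaker] - token) ** 2

best, shortlist = identify(
    1.2, cheap_score, full_score, list(speaker_means), n_best=2)
print(best, shortlist, len(full_calls))
```

The expensive score is computed for `n_best` speakers rather than the whole population, so the result is correct whenever the true speaker survives the shortlist stage.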
Transformation Streams and the HMM Error Model
Computer Speech and Language, 2001
Cited by 1 (1 self)
The most popular model used in automatic speech recognition is the hidden Markov model (HMM). Though good performance has been obtained with such models there are well known limitations for its ability to model speech. A variety of modifications to the standard HMM topology have been proposed to handle these problems. One such scheme is the factorial HMM. This paper introduces a new form of factorial HMM which makes use of transformation streams. This new scheme is a generalisation of the standard factorial HMM and other related schemes in speech processing. A particular form of this model, the HMM error model (HEM) is described in detail. The HEM is evaluated on two standard large vocabulary speaker independent speech recognition tasks. On both tasks significant reductions in word error rate are obtained over standard HMM-based systems.

