Results 1 - 10 of 15
From HMM's to Segment Models: A Unified View of Stochastic Modeling for Speech Recognition
, 1996
The LIMSI Broadcast News Transcription System
- Speech Communication
, 2002
Cited by 84 (5 self)
This paper reports on activities at LIMSI over the last few years directed at the transcription of broadcast news data. We describe our development work in moving from laboratory read speech data to real-world or `found' speech data in preparation for the ARPA Nov96, Nov97 and Nov98 evaluations. Two main problems needed to be addressed to deal with the continuous flow of inhomogeneous data. These concern the varied acoustic nature of the signal (signal quality, environmental and transmission noise, music) and different linguistic styles (prepared and spontaneous speech on a wide range of topics, spoken by a large variety of speakers).
Genones: Generalized Mixture Tying in Continuous Hidden Markov Model-Based Speech Recognizers
- IEEE Transactions on Speech and Audio Processing
, 1996
Cited by 36 (7 self)
An algorithm is proposed that achieves a good trade-off between modeling resolution and robustness by using a new, general scheme for tying of mixture components in continuous mixture-density hidden Markov model (HMM)-based speech recognizers. The sets of HMM states that share the same mixture components are determined automatically using agglomerative clustering techniques. Experimental results on ARPA's Wall Street Journal corpus show that this scheme reduces errors by 25% over typical tied-mixture systems. New fast algorithms for computing Gaussian likelihoods, the most time-consuming aspect of continuous-density HMM systems, are also presented. These new algorithms significantly reduce the number of Gaussian densities that are evaluated with little or no impact on speech recognition accuracy.
Corresponding Author: Vassilios Digalakis. Address: Electronic and Computer Engineering Department, Technical University of Crete, Kounoupidiana, Chania, 73100 Greece. Phone: +30-821...
Partitioning and Transcription of Broadcast News Data
- ICSLP'98
, 1998
Cited by 36 (17 self)
Radio and television broadcasts consist of a continuous stream of data comprised of segments of different linguistic and acoustic natures, which poses challenges for transcription. In this paper we report on our recent work in transcribing broadcast news data [2, 4], including the problem of partitioning the data into homogeneous segments prior to word recognition. Gaussian mixture models are used to identify speech and non-speech segments. A maximum-likelihood segmentation/clustering process is then applied to the speech segments using GMMs and an agglomerative clustering algorithm. The clustered segments are then labeled according to bandwidth and gender. The recognizer is a continuous mixture density, tied-state cross-word context-dependent HMM system with a 65k trigram language model. Decoding is carried out in three passes, with a final pass incorporating cluster-based test-set MLLR adaptation. The overall word transcription error on the Nov'97 unpartitioned evaluation test data was...
Language Modeling With Sentence-Level Mixtures
, 1994
Cited by 23 (1 self)
Language models play an important role in improving the accuracy of a continuous speech recognizer. In this thesis, we introduce a new statistical language model which captures long term topic dependencies of words within and across sentences. The model includes two main contributions. First, we develop a topic-dependent sentence-level mixture language model which takes advantage of the topic constraints in a sentence or a paragraph. Since this language model is not Markov and has a large search space, it is used only in the last stage of a multi-pass search strategy in the recognizer. Second, we introduce topic-dependent dynamic adaptation techniques in the framework of the mixture model. During the course of this thesis, we also investigate robust parameter estimation techniques, which are extremely important in light of the sparse data problems in language modeling. The model is implemented in the BU speech recognition system and provides a significant improvement in recognition accuracy. An important advantage of the framework of our model is that it is a simple extension of existing language modeling techniques that can easily be integrated with other language modeling advances.
Speech Recognition System Design Based on Automatically Derived Units
, 1999
Cited by 10 (0 self)
In most speech recognition systems today, acoustic modeling and lexical modeling are viewed as separable problems. Currently the most popular approach is to manually define canonical word pronunciations in terms of phonetic units and let the acoustic models capture differences between actual spoken and canonical pronunciations implicitly with Gaussian mixture models. As a result, these models can be very broad, particularly for casual spontaneous speech. An alternative approach, explored in this thesis, is to learn a unit inventory and pronunciation dictionary from training data using a maximum likelihood objective function. In particular,
The LIMSI 1998 Hub-4E Transcription System
- In Proc. of the DARPA Broadcast News Workshop
, 1999
Cited by 9 (2 self)
In this paper we report on our Nov98 Hub-4E system, which is an extension of our Nov97 system [4]. The LIMSI system for the November 1998 Hub-4E evaluation is a continuous mixture density, tied-state cross-word context-dependent HMM system. The acoustic models were trained on the 1995, 1996 and 1997 official Hub-4E training data containing about 150 hours of transcribed speech material. 65K-word language models were obtained by interpolation of backoff n-gram language models trained on different text data sets. Prior to word decoding, a maximum likelihood partitioning algorithm segments the data into homogeneous regions and assigns gender, bandwidth and cluster labels to the speech segments. Word decoding is carried out in three steps, integrating cluster-based MLLR acoustic model adaptation. The final decoding step uses a 4-gram language model interpolated with a category trigram model. The main differences compared to last year's system arise from the use of additional acoustic and lang...
Phonetic Context-Dependency In a Hybrid ANN/HMM Speech Recognition System
, 1997
Cited by 8 (0 self)
This report uses a bark scale, which has been replaced here with a mel-scale.
Utterance Clustering For Large Vocabulary Continuous Speech Recognition
- In Proceedings of the European Conference on Speech Technology
, 1995
Cited by 5 (1 self)
Conventional speaker independent speech recognition systems are trained using data from many different speakers. Inter-speaker variability is a major problem because parametric representations of speech are highly speaker dependent. This paper describes a technique which allows speaker dependent parameters to be considered when building a speaker independent speech recognition system. The technique is based on utterance clustering, where subsets of the training data are formed and the variability within each subset minimized. Cluster dependent connectionist models are then used to estimate phone probabilities as part of a hybrid connectionist hidden Markov model based large vocabulary talker independent speech recognition system. The system has been evaluated on the ARPA Wall Street Journal continuous speech recognition task. 1. INTRODUCTION Speaker dependent speech recognition systems are generated using training utterances from a single speaker, resulting in a system tuned to a spec...
Context-dependent alignment models for Statistical Machine Translation
- In Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics
, 2009
Cited by 4 (1 self)
We introduce alignment models for Machine Translation that take into account the context of a source word when determining its translation. Since the use of these contexts alone causes data sparsity problems, we develop a decision tree algorithm for clustering the contexts based on optimisation of the EM auxiliary function. We show that our context-dependent models lead to an improvement in alignment quality, and an increase in translation quality when the alignments are used in Arabic-English and Chinese-English translation.

