Results 1 - 10
of
14
The LIMSI Broadcast News Transcription System
- Speech Communication
, 2002
"... This paper reports on activites at LIMSI over the last few years directed at the transcription of broadcast news data. We describe our development work in moving from laboratory read speech data to real-world or `found' speech data in preparation for the ARPA Nov96, Nov97 and Nov98 evaluations. T ..."
Abstract
-
Cited by 84 (5 self)
- Add to MetaCart
This paper reports on activites at LIMSI over the last few years directed at the transcription of broadcast news data. We describe our development work in moving from laboratory read speech data to real-world or `found' speech data in preparation for the ARPA Nov96, Nov97 and Nov98 evaluations. Two main problems needed to be addressed to deal with the continuous flow of inhomogenous data. These concern the varied acoustic nature of the signal (signal quality, environmental and transmission noise, music) and different linguistic styles (prepared and spontaneous speech on a wide range of topics, spoken by a large variety of speakers).
Algorithms For Bigram And Trigram Word Clustering
- Speech Communication
, 1995
"... . This paper presents and analyzes improved algorithms for clustering bigram and trigram word equivalence classes, and their respective results: 1) We give a detailed time complexity analysis of bigram clustering algorithms. 2) We present an improved implementation of bigram clustering so that large ..."
Abstract
-
Cited by 46 (0 self)
- Add to MetaCart
. This paper presents and analyzes improved algorithms for clustering bigram and trigram word equivalence classes, and their respective results: 1) We give a detailed time complexity analysis of bigram clustering algorithms. 2) We present an improved implementation of bigram clustering so that large corpora (38 million words and more) can be clustered within a small number of days or even hours. 3) We extend the clustering approach from bigrams to trigrams. 4) We present experimental results on a 38 million word training corpus. 1. INTRODUCTION Word equivalence classes are a method for improving undertrained word M--gram language models [1], [2], [4]. Words are grouped into classes, and each word belongs to only one such class. Thus, if a word pair is not seen in training, it is quite likely that the corresponding class pair is seen. For bigram and trigram class models, we have the equations p(wn jwn\Gamma1 ) = p0(wn jG(wn)) (1) \Deltap 1(G(wn)jG(wn\Gamma1)) p(wn jwn\Gamma2 ; wn\Gam...
Modeling Out-Of-Vocabulary Words For Robust Speech Recognition
, 2000
"... This thesis concerns the problem of unknown or out-of-vocabulary (00V) words in continuous speech recognition. Most of today's state-of-the-art speech recognition systems can recognize only words that belong to some predefined finite word vocabulary. When encountering an OOV word, a speech recognize ..."
Abstract
-
Cited by 43 (5 self)
- Add to MetaCart
This thesis concerns the problem of unknown or out-of-vocabulary (00V) words in continuous speech recognition. Most of today's state-of-the-art speech recognition systems can recognize only words that belong to some predefined finite word vocabulary. When encountering an OOV word, a speech recognizer erroneously substitutes the OOV word with a similarly sounding word from its vocabulary. Furthermore, a recognition error due to an OOV word tends to spread errors into neighboring words; dramatically degrading overall recognition performance.
Statistical language model adaptation: review and perspectives
- Speech Communication
, 2004
"... Speech recognition performance is severely affected when the lexical, syntactic, or semantic characteristics of the discourse in the training and recognition tasks differ. The aim of language model adaptation is to exploit specific, albeit limited, knowledge about the recognition task to compensate ..."
Abstract
-
Cited by 35 (0 self)
- Add to MetaCart
Speech recognition performance is severely affected when the lexical, syntactic, or semantic characteristics of the discourse in the training and recognition tasks differ. The aim of language model adaptation is to exploit specific, albeit limited, knowledge about the recognition task to compensate for this mismatch. More generally, an adaptive language model seeks to maintain an adequate representation of the current task domain under changing conditions involving potential variations in vocabulary, syntax, content, and style. This paper presents an overview of the major approaches proposed to address this issue, and offers some perspectives regarding their comparative merits and associated tradeoffs. Ó 2003 Elsevier B.V. All rights reserved. 1.
Hierarchical search for large vocabulary conversational speech recognition
- IEEE Signal Processing Magazine
, 1999
"... ABSTRACT 2 Speaker-independent speech recognition technology has made significant progress from the days of isolated word recognition. Today, state-of-the-art systems are capable of performing large vocabulary continuous speech recognition (LVCSR) on audio streams derived from complex information so ..."
Abstract
-
Cited by 15 (5 self)
- Add to MetaCart
ABSTRACT 2 Speaker-independent speech recognition technology has made significant progress from the days of isolated word recognition. Today, state-of-the-art systems are capable of performing large vocabulary continuous speech recognition (LVCSR) on audio streams derived from complex information sources such as broadcast news and two-way telephone dialogs. A significant contribution to this advancement in technology is the development of search techniques that find suboptimal but accurate solutions in problems involving large search spaces and extremely complex statistical models. Moreover, these search strategies are capable of dynamically integrating information from a number of diverse knowledge sources to determine the correct word hypothesis, and limit the scope of the search by using a hierarchical search strategy. We refer to this problem as the decoding or search problem. This paper describes the complexity associated with decoding using hierarchical representations for linguistic and acoustic knowledge sources. An extensible object-oriented decoder available in the public domain, that leverages current state-of-the-art technology is described to illustrate these concepts. This decoder supports efficient handling of acoustic models for cross-word contextdependent phones, multiple pronunciations of words using lexical trees, and rescoring of word graphs based on N-gram language models in a single pass. It employs a state-of-the-art Viterbistyle dynamic programming algorithm, and is equipped with several heuristic pruning criteria to minimize the consumption of computational resources while maintaining good accuracy.
Retrieval Methods for English-Text with Missrecognized OCR Characters
"... This paper presents three probabilistic text retrieval methods designed to carry out a full-text search of English documents containing OCR errors. By searching for any query term on the premise that there are errors in the recognized text, the methods presented can tolerate such errors, and therefo ..."
Abstract
-
Cited by 12 (0 self)
- Add to MetaCart
This paper presents three probabilistic text retrieval methods designed to carry out a full-text search of English documents containing OCR errors. By searching for any query term on the premise that there are errors in the recognized text, the methods presented can tolerate such errors, and therefore costly manual postediting is not required after OCR recognition. In the applied approach, confusion matrices are used to store characters which are likely to be interchanged when a particular character is missrecognized, and the respective probability of each occurrence. Moreover, a 2-gram matrix is used to store probabilities of character connection, i.e., which letter is likely to come after another. Multiple search terms are generated for an input query term by making reference to confusion matrices, after which a full-text search is run for each search term. The validity of retrieved terms is determined based on error-occurrence and characterconnection probabilities. The performance o...
Category-Based Statistical Language Models
, 1997
"... this document. The first section, in chapter 3, develops a model for syntactic dependencies based on word-category n-grams. The second section, in chapter 4, extends this model by allowing short-range word relations to be captured through the incorporation of selected word n-grams. ..."
Abstract
-
Cited by 11 (2 self)
- Add to MetaCart
this document. The first section, in chapter 3, develops a model for syntactic dependencies based on word-category n-grams. The second section, in chapter 4, extends this model by allowing short-range word relations to be captured through the incorporation of selected word n-grams.
Assessment Of Smoothing Methods And Complex Stochastic Language Modeling
- IN PROCEEDINGS OF THE 6TH EUROPEAN CONFERENCE ON SPEECH COMMUNICATION AND TECHNOLOGY
"... This paper studies the overall effect of language modeling on perplexity and word error rate, starting from a trigram model with a standard smoothing method up to complex state--of--the-- art language models: (1) We compare different smoothing methods, namely linear vs. absolute discounting, interpo ..."
Abstract
-
Cited by 10 (2 self)
- Add to MetaCart
This paper studies the overall effect of language modeling on perplexity and word error rate, starting from a trigram model with a standard smoothing method up to complex state--of--the-- art language models: (1) We compare different smoothing methods, namely linear vs. absolute discounting, interpolation vs. backing-off, and back-off functions based on relative frequencies vs. singleton events. (2) We show the effect of complex language model techniques by using distant-trigrams and automatically selected word classes and word phrases using a maximum likelihood criterion (i.e. minimum perplexity). (3) We show the overall gain of the combined application of the above techniques, as opposed to their separate assessment in past publications. (4) We give perplexity and word error rate results on the North American Business corpus (NAB) with a training text of about 240 million words and on the German Verbmobil corpus.
The LIMSI 1998 Hub-4E Transcription System
- IN PROC. OF THE DARPA BROADCAST NEWS WORKSHOP
, 1999
"... In this paper we report on our Nov98 Hub-4E system, which is an extension of our Nov97 system[4]. The LIMSI system for the November 1998 Hub-4E evaluation is a continuous mixture density, tied-state cross-word context-dependent HMM system. The acoustic models were trained on the 1995, 1996 and 1997 ..."
Abstract
-
Cited by 9 (2 self)
- Add to MetaCart
In this paper we report on our Nov98 Hub-4E system, which is an extension of our Nov97 system[4]. The LIMSI system for the November 1998 Hub-4E evaluation is a continuous mixture density, tied-state cross-word context-dependent HMM system. The acoustic models were trained on the 1995, 1996 and 1997 official Hub-4E training data containing about 150 hours of transcribed speech material. 65K word language models were obtained by interpolation of backoff n-gram language models trained on different text data sets. Prior to word decoding a maximum likelihood partitioning algorithm segments the data into homogenous regions and assigns gender, bandwidth and cluster labels to the speech segments. Word decoding is carried out in three steps, integrating cluster-based MLLR acoustic model adaptation. The final decoding step uses a 4-gram languagemodel interpolated with a category trigram model. The main differences compared to last year's system arise from the use of additional acoustic and lang...
Recent Advances in Transcribing Television and Radio Broadcasts
- Proc. Eurospeech '99
"... Transcription of broadcast news shows (radio and television) is a major step in developing automatic tools for indexation and retrieval of the vast amounts of information generated on a daily basis. Broadcast shows are challenging to transcribe as they consist of a continuous data stream with segmen ..."
Abstract
-
Cited by 8 (5 self)
- Add to MetaCart
Transcription of broadcast news shows (radio and television) is a major step in developing automatic tools for indexation and retrieval of the vast amounts of information generated on a daily basis. Broadcast shows are challenging to transcribe as they consist of a continuous data stream with segments of different linguistic and acoustic natures. Transcribing such data requires addressing two main problems: those related to the varied acoustic properties of the signal, and those related to the linguistic properties of the speech. Prior to word transcription, the data is partitioned into homogeneous acoustic segments. Non-speech segments are identified and rejected, and the speech segments are clustered and labeled according to bandwidth and gender. The speaker-independent large vocabulary, continuous speech recognizer makes use of n-gram statistics for language modeling and of continuous density HMMs with Gaussian mixtures for acoustic modeling. The LIMSI system has consistently obtain...

