SRILM at Sixteen: Update and Outlook
Cited by 1 (0 self)
Abstract: We review developments in the SRI Language Modeling Toolkit (SRILM) since 2002, when a previous paper on SRILM was published. These developments include measures to make training from large data sets more efficient, to implement additional language modeling techniques (such as for adaptation and smoothing), and for client/server operation. In addition, the functionality for lattice processing has been greatly expanded. We also highlight several external contributions and notable applications of the toolkit, and assess SRILM’s impact on the research community.
SRI's 1998 Broadcast News System -- Toward Faster, Better, Smaller Speech Recognition
- In Proceedings of the DARPA Broadcast News Workshop, 1999
We describe several new research directions we investigated toward the development of our broadcast news transcription system for the 1998 DARPA H4 evaluations. Our goal was to develop significantly faster and smaller speech recognition systems without degrading the word error rate of our 1997 system. We did this through significant algorithmic research creating various new techniques. A sample of these techniques was used to put together our 1998 broadcast news system, which is conceptually much simpler, faster, and smaller, but gives the same word error rate as our 1997 system. In particular, our 1998 system is based on a simple phonetically tied mixture (PTM) model with a total of only 13,000 Gaussians, as compared to a 67,000-Gaussian state-clustered system we used in 1997.

1. Introduction
One of our main goals in 1998 was to significantly increase speed and decrease model size, while maintaining or improving accuracy. These goals are difficult to achieve simultaneously because o...
DynaSpeak: SRI's Scalable Speech Recognizer for
- In Proceedings of HLT, 2002
We introduce SRI's new speech recognition engine, DynaSpeak, which is characterized by its scalability and flexibility, high recognition accuracy, memory and speed efficiency, adaptation capability, efficient grammar optimization, support for natural language parsing functionality, and operation based on integer arithmetic. These features are designed to address the needs of the fast-developing and changing domain of embedded and mobile computing platforms.
Statistical Lattice-Based Spoken Document Retrieval
Recent research efforts on spoken document retrieval have tried to overcome the low quality of 1-best automatic speech recognition transcripts, especially in the case of conversational speech, by using statistics derived from speech lattices containing multiple transcription hypotheses as output by a speech recognizer. We present a method for lattice-based spoken document retrieval based on a statistical n-gram modeling approach to information retrieval. In this statistical lattice-based retrieval (SLBR) method, a smoothed statistical model is estimated for each document from the expected counts of words given the information in a lattice, and the relevance of each document to a query is measured as a probability under such a model. We investigate the efficacy of our method under various parameter settings of the speech recognition and lattice processing engines, using the Fisher English Corpus of conversational telephone speech. Experimental results show that our method consistently achieves better retrieval performance than using only the 1-best transcripts in statistical retrieval, outperforms a recently proposed lattice-based vector space retrieval method, and also compares favorably with a lattice-based retrieval method based on the Okapi BM25 model.
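The scoring idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the expected word counts have already been extracted from each document's lattice, it uses a unigram model rather than a general n-gram model, and it picks Jelinek-Mercer interpolation as the smoothing scheme because the abstract does not say which one is used. All names (`collection_model`, `query_log_likelihood`, `rank`, the toy document IDs and counts) are hypothetical.

```python
import math


def collection_model(doc_counts):
    """Background unigram model pooled over all documents' expected counts."""
    totals = {}
    for counts in doc_counts.values():
        for word, count in counts.items():
            totals[word] = totals.get(word, 0.0) + count
    mass = sum(totals.values())
    return {word: count / mass for word, count in totals.items()}


def query_log_likelihood(query, counts, background, lam=0.5, floor=1e-9):
    """Log P(query | document) under a smoothed unigram model.

    `counts` holds expected (fractional) word counts for one document.
    Jelinek-Mercer smoothing with weight `lam` is an assumption here.
    """
    doc_mass = sum(counts.values())
    score = 0.0
    for word in query:
        p_doc = counts.get(word, 0.0) / doc_mass if doc_mass > 0 else 0.0
        p_bg = background.get(word, floor)  # floor for words unseen anywhere
        score += math.log((1.0 - lam) * p_doc + lam * p_bg)
    return score


def rank(query, doc_counts, lam=0.5):
    """Return (doc_id, score) pairs sorted from most to least relevant."""
    background = collection_model(doc_counts)
    scored = [
        (doc_id, query_log_likelihood(query, counts, background, lam))
        for doc_id, counts in doc_counts.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy expected counts, standing in for lattice posteriors (made-up values).
doc_counts = {
    "call_a": {"flight": 1.6, "boston": 0.9, "weather": 0.1},
    "call_b": {"weather": 1.8, "rain": 0.7, "boston": 0.2},
}
ranking = rank(["boston", "flight"], doc_counts)
```

Because the counts are fractional, a word that appears only on low-probability lattice paths still contributes a small amount of evidence, which a 1-best transcript would discard entirely.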
Joint Decoding for Speech Recognition and Semantic Tagging
Most conversational understanding (CU) systems today employ a cascade approach, where the best hypothesis from the automatic speech recognizer (ASR) is fed into a spoken language understanding (SLU) module, whose best hypothesis is then fed into other systems such as an interpreter or dialog manager. In such approaches, errors from one statistical module irreversibly propagate into the next, seriously degrading the overall performance of the conversational understanding system. It is therefore desirable to optimize all the statistical modules jointly. As a first step towards this, we propose a joint decoding framework in which we predict the optimal word sequence and slot (semantic tag) sequence jointly given the input acoustic stream. On Microsoft’s CU system, we show a 1.3% absolute reduction in word error rate (WER) and a 1.2% absolute improvement in F-measure for slot prediction compared to a very strong cascade baseline comprising a state-of-the-art recognizer followed by a slot sequence tagger.

