Statistical Language Modeling Using The Cmu-Cambridge Toolkit
, 1997
Cited by 264 (3 self)
The CMU Statistical Language Modeling toolkit was released in 1994 in order to facilitate the construction and testing of bigram and trigram language models. It is currently in use in over 40 academic, government and industrial laboratories in over 12 countries. This paper presents a new version of the toolkit. We outline the conventional language modeling technology, as implemented in the toolkit, and describe the extra efficiency and functionality that the new toolkit provides as compared to previous software for this task. Finally, we give an example of the use of the toolkit in constructing and testing a simple language model.
Augmenting Naive Bayes Classifiers with Statistical Language Models
, 2003
Cited by 38 (0 self)
We augment naive Bayes models with statistical n-gram language models to address shortcomings of the standard naive Bayes text classifier. The result is a generalized naive Bayes classifier
Dialogos: A Robust System for Human-Machine Spoken Dialogue on the Telephone
- these Proceedings
Cited by 15 (1 self)
This paper presents Dialogos, a real-time system for human-machine spoken dialogue on the telephone in task-oriented domains. The system has been tested in a large trial with inexperienced users, and it has proved robust enough to allow spontaneous interactions both for users who get good recognition performance and for those who get lower scores. The robust behavior of the system has been achieved by combining the use of specific language models during the recognition phase of analysis, tolerance toward spontaneous speech phenomena, the activity of a robust parser, and the use of pragmatic-based dialogue knowledge. This integration of the different modules makes it possible to deal with partial or total breakdowns of the different levels of analysis. We report the field trial data of the system and the evaluation results of the overall system and of the submodules.
Word Triggers and the EM Algorithm
- In Proceedings of the Workshop Computational Natural Language Learning (CoNLL 97)
, 1997
Cited by 9 (3 self)
In this paper, we study the use of so-called word trigger pairs to improve an existing language model, which is typically a trigram model in combination with a cache component. A word trigger pair is defined as a long-distance word pair. We present two methods to select the most significant single word trigger pairs. The selected trigger pairs are used in a combined model where the interpolation parameters and trigger interaction parameters are trained by the EM algorithm.
Phonetic Context-Dependency In a Hybrid ANN/HMM Speech Recognition System
, 1997
Cited by 8 (0 self)
This report uses a bark scale, which has been replaced here with a mel-scale.
Statistical Modelling in Continuous Speech Recognition (CSR)
- In Conference on Uncertainty in Artificial Intelligence
, 2001
Cited by 7 (1 self)
Automatic continuous speech recognition (CSR) is sufficiently
Statistical approach to the semantic analysis of spoken dialogues
, 2007
Cited by 1 (1 self)
Spoken speech is a basic, comfortable, and easy-to-use means of communication among humans. We are taught the art of speaking and understanding when we are small children, and from that time on we consider it an effortless activity. However, it is a very complex process: it starts with the preparation of a message in a speaker’s mind, continues with the transmission of the idea as an acoustic signal, and ends with the recognition of the acoustic signal by a listener, including understanding (interpreting) the message.
Topic Tracking in a News Stream
- In Proceedings of DARPA Broadcast News Workshop
, 1999
In this paper we describe a Topic Tracking system based on unigram models, submitted by Dragon Systems in the December 1998 Topic Detection and Tracking (TDT) Evaluation. We focus on the most recent developments, including improvements in the smoothing of sparse unigram models, a better discriminator, and the implementation of unsupervised adaptation. We give results on the default test conditions, namely, tracking in newswire and automatically recognized broadcast given four story samples, as well as several variations: one story sample, automatically recognized broadcast only, and automatically recognized broadcast with automatically determined story boundaries. Finally, we show the effect of interpolating this system with Dragon's other tracking system based on a BetaBinomial model.
1. INTRODUCTION
The DARPA Topic Detection and Tracking (TDT) program is concerned with the development of information processing technology that can be applied to large streams of data, such as newswire a...
The Design Principles and Algorithms of a General Weighted Grammar Library
We present the software design principles, algorithms, and utilities of a general weighted grammar library, the GRM Library, that can be used in a variety of applications in text, speech, and biosequence processing. Several of the algorithms and utilities of this library are described, including in some cases their pseudocode and pointers to their use in applications. The algorithms and the utilities were designed to support a wide variety of semirings and the representation and use of large grammars and automata of several hundred million rules or transitions.

