VERBMOBIL: The Use of Prosody in the Linguistic Components of a Speech Understanding System
, 2000
Cited by 25 (5 self)
In this paper, we show how prosody can be used in speech understanding systems. This is demonstrated with the VERBMOBIL speech-to-speech translation system which, to our knowledge, is the first complete system which successfully uses prosodic information in the linguistic analysis. Prosody is used by computing probabilities for clause boundaries, accentuation, and different types of sentence mood for each of the word hypotheses computed by the word recognizer. These probabilities guide the search of the linguistic analysis. Disambiguation is already achieved during the analysis and not by a prosodic verification of different linguistic hypotheses. So far, the most useful prosodic information is provided by clause boundaries. These are detected with a recognition rate of 94%. For the parsing of word hypotheses graphs, the use of clause boundary probabilities yields a speed-up of 92% and a 96% reduction of alternative readings.
Estimating dependency structure as a hidden variable
- In NIPS
, 1998
Cited by 25 (6 self)
This publication can be retrieved by anonymous ftp to publications.ai.mit.edu. This paper introduces a probability model, the mixture of trees, that can account for sparse, dynamically changing dependence relationships. We present a family of efficient algorithms based on the EM and the Minimum Spanning Tree algorithms that learn mixtures of trees in the ML framework. The method can be extended to take into account priors and, for a wide class of priors that includes the Dirichlet and the MDL priors, it preserves its computational efficiency. Experimental results demonstrate the excellent performance of the new model both in density estimation and in classification. Finally, we show that a single tree classifier acts like an implicit feature selector, thus making the classification performance insensitive to irrelevant attributes.
Improving Statistical Language Model Performance with Automatically Generated Word Hierarchies
- COMPUTATIONAL LINGUISTICS
, 2003
Generalized Algorithms for Constructing Statistical Language Models
- IN PROC. OF THE 41ST MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS
, 2003
Cited by 22 (2 self)
Recent text and speech processing applications such as speech mining raise new and more general problems related to the construction of language models. We present and describe in detail several new and efficient algorithms to address these more general problems and report experimental results demonstrating their usefulness. We give an algorithm for efficiently computing the expected counts of any sequence in a word lattice output by a speech recognizer or any arbitrary weighted automaton; describe a new technique for creating exact representations of n-gram language models by weighted automata whose size is practical for offline use even for a vocabulary size of about 500,000 words and an n-gram order n = 6; and present a simple and more general technique for constructing class-based language models that allows each class to represent an arbitrary weighted automaton. An efficient implementation of our algorithms and techniques has been incorporated in a general software library for language modeling, the GRM Library, that includes many other text and grammar processing functionalities.
LoPar: Design and Implementation
, 2000
Cited by 21 (0 self)
This report is mainly a documentation of the parser implementation and the underlying theoretical concepts and not a manual for the LoPar program. For the latter, the reader is referred to the online manual pages.
Statistical Cross-Language Information Retrieval using N-Best Query Translations
, 2002
Cited by 20 (2 self)
This paper presents a novel statistical model for cross-language information retrieval. Given a written query in the source language, documents in the target language are ranked by integrating probabilities computed by two statistical models: a query-translation model, which generates the most probable term-by-term translations of the query, and a query-document model, which evaluates the likelihood of each document and translation. Integration of the two scores is performed over the set of N most probable translations of the query. Experimental results with values N = 1, 5, 10 are presented on the Italian-English bilingual track data used in the CLEF 2000 and 2001 evaluation campaigns.
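The integration over N-best translations described in this abstract can be illustrated with a minimal sketch. It assumes the two scores are combined as a sum, over the N translations, of the translation probability times the document score. The abstract does not state the exact combination rule, so that formula, the function name `rank_documents`, and the toy probabilities are all hypothetical.

```python
from typing import Dict, List, Tuple


def rank_documents(
    nbest: List[Tuple[str, float]],
    doc_scores: Dict[str, Dict[str, float]],
) -> List[Tuple[str, float]]:
    """Rank documents by summing, over the N-best translations,
    P(translation | query) * P(document, translation).

    nbest: (translation, translation probability) pairs (hypothetical input).
    doc_scores: translation -> {doc_id: query-document score} (hypothetical input).
    """
    totals: Dict[str, float] = {}
    for translation, p_trans in nbest:
        for doc_id, p_doc in doc_scores.get(translation, {}).items():
            # Assumed combination rule: product of the two model scores,
            # accumulated across translations.
            totals[doc_id] = totals.get(doc_id, 0.0) + p_trans * p_doc
    # Highest combined score first.
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


# Toy example with N = 2 translations of one query.
nbest = [("a", 0.6), ("b", 0.4)]
doc_scores = {
    "a": {"d1": 0.5, "d2": 0.1},
    "b": {"d1": 0.1, "d2": 0.5},
}
ranking = rank_documents(nbest, doc_scores)
```

Passing only the first entry of `nbest` corresponds to the N = 1 setting, where ranking relies on the single best translation.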
The Thoughtful Elephant: Strategies for Spoken Dialog Systems
- IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING
, 2000
Cited by 19 (0 self)
In this paper we present technology used in spoken dialog systems for a wide range of applications. They include tasks from the travel domain and automatic switchboards as well as large-scale directory assistance. The overall goal in developing spoken dialog systems is to allow for a natural and flexible dialog flow similar to human-human interaction. This imposes the challenging task of recognizing and interpreting user input, where the user is allowed to choose from an unrestricted vocabulary and an infinite set of possible formulations. We therefore put emphasis on strategies that make the system more robust while still maintaining a high level of naturalness and flexibility. In view of this paradigm, we found that two fundamental principles characterize many of the proposed methods: 1) to consider available sources of information as early as possible, and 2) to keep alternative hypotheses and delay the decision for a single option as long as possible. We describe
A voice-controlled automatic telephone switchboard and directory information system
- Speech Communication
, 1997
Cited by 17 (5 self)
The Philips automatic telephone switchboard and directory information system PADIS provides a natural-language user interface to a telephone directory database. Using speech recognition and language understanding technologies, the system offers phone numbers, fax numbers, email addresses, and room numbers as well as direct call completion to a desired party. In this paper, we present the underlying probabilistic framework, the system architecture, and the individual modules for speech recognition, language understanding, dialogue control, and speech output. In addition, we report results on performance and user behaviour obtained from a field test in our research lab with a 600-entry database. We derive a new maximum-a-posteriori decision rule which incorporates database knowledge and dialogue history as constraints in speech recognition and language understanding. It has improved speech understanding accuracy by 19% (in terms of concept error rate), and reduced attribute substitution errors (e.g. recognition of a wrong name) by 38%. The decision rule is implemented in a multi-stage approach as a combination of state-of-the-art speech recognition, partial parsing with an attributed stochastic context-free grammar, and an N-best algorithm which is also described in this paper. The system conducts a flexible mixed-initiative dialogue rather than using a rigid form-filling scheme, and incorporates database knowledge to optimize the dialogue flow.
Efficient Sampling and Feature Selection in Whole Sentence Maximum Entropy Language Models
Cited by 17 (5 self)
Conditional Maximum Entropy models have been successfully
Multilingual Stochastic N-Gram Class Language Models
, 1996
Cited by 16 (2 self)
Stochastic language models are widely used in continuous speech recognition systems where a priori probabilities of word sequences are needed. These probabilities are usually given by n-gram word models, estimated on very large training texts. When n increases, it becomes harder to find reliable statistics, even with huge texts. Grouping words is a way to overcome this problem. We have developed an automatic language-independent classification procedure, which is able to optimize the classification of tens of millions of untagged words in less than a few hours on a Unix workstation. With this language-independent approach, three corpora each containing about 30 million words of newspaper texts, in French, German and English, have been mapped into different numbers of classes. From these classifications, bi-gram and tri-gram class language models have been built. The perplexities of held-out test texts have been assessed, showing that tri-gram class models give lower values than those ob...
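To make the idea of a class-based n-gram model concrete, here is a minimal sketch of a bi-gram class model and its perplexity. It uses the common factorization P(w | w_prev) = P(c | c_prev) * P(w | c), which is an assumption here: the abstract does not give the authors' exact formulation or their classification procedure. The word-to-class mapping is supplied by hand, the add-one smoothing is an illustrative choice, and every test word is assumed to occur in the training text.

```python
import math
from collections import Counter
from typing import Callable, Dict, List


def train_class_bigram(
    tokens: List[str], word2class: Dict[str, str]
) -> Callable[[str, str], float]:
    """Return prob(prev_word, word) under a class bi-gram model."""
    classes = [word2class[w] for w in tokens]
    word_counts = Counter(tokens)
    class_counts = Counter(classes)
    class_bigrams = Counter(zip(classes, classes[1:]))
    context_counts = Counter(classes[:-1])
    num_classes = len(set(word2class.values()))

    def prob(prev_word: str, word: str) -> float:
        c_prev, c = word2class[prev_word], word2class[word]
        # Add-one smoothed class transition probability (illustrative choice).
        p_class = (class_bigrams[(c_prev, c)] + 1) / (
            context_counts[c_prev] + num_classes
        )
        # Relative frequency of the word within its class;
        # assumes the word was seen in training.
        p_word = word_counts[word] / class_counts[c]
        return p_class * p_word

    return prob


def perplexity(prob: Callable[[str, str], float], tokens: List[str]) -> float:
    """Perplexity over the bi-gram events of a token sequence."""
    log_sum = sum(math.log(prob(p, w)) for p, w in zip(tokens, tokens[1:]))
    return math.exp(-log_sum / (len(tokens) - 1))


# Toy corpus and hand-made classes (hypothetical data).
train = "the cat sat the dog sat".split()
word2class = {"the": "DET", "cat": "N", "dog": "N", "sat": "V"}
model = train_class_bigram(train, word2class)
ppl = perplexity(model, "the dog sat".split())
```

Because transition statistics are shared across all words in a class, the model needs far fewer parameters than a word n-gram model, which is the motivation for grouping words given in the abstract.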

