Robust Grammatical Analysis for Spoken Dialogue Systems
Natural Language Engineering, 1997
Cited by 49 (8 self)
We argue that grammatical analysis is a viable alternative to concept spotting for processing spoken input in a practical spoken dialogue system. We discuss the structure of the grammar, and a model for robust parsing which combines linguistic sources of information and statistical sources of information. We discuss test results suggesting that grammatical processing allows fast and accurate processing of spoken input.
VERBMOBIL: The Use of Prosody in the Linguistic Components of a Speech Understanding System
2000
Cited by 25 (5 self)
In this paper, we show how prosody can be used in speech understanding systems. This is demonstrated with the VERBMOBIL speech-to-speech translation system which, to our knowledge, is the first complete system which successfully uses prosodic information in the linguistic analysis. Prosody is used by computing probabilities for clause boundaries, accentuation, and different types of sentence mood for each of the word hypotheses computed by the word recognizer. These probabilities guide the search of the linguistic analysis. Disambiguation is already achieved during the analysis and not by a prosodic verification of different linguistic hypotheses. So far, the most useful prosodic information is provided by clause boundaries. These are detected with a recognition rate of 94%. For the parsing of word hypotheses graphs, the use of clause boundary probabilities yields a speed-up of 92% and a 96% reduction of alternative readings.
Building a Tree-Bank of Modern Hebrew Text
2001
Cited by 21 (1 self)
This paper describes the process of building the first tree-bank for Modern Hebrew texts. A major concern in this process is the need for reducing the cost of manual annotation by the use of automatic means. To this end, the joint utility of an automatic morphological analyzer, a probabilistic parser and a small manually annotated tree-bank was explored.
FSA Utilities: A Toolbox to Manipulate Finite-state Automata
Automata Implementation, 1997
Cited by 20 (3 self)
This paper describes the FSA Utilities toolbox: a collection of utilities to manipulate finite-state automata and finite-state transducers. Manipulations include determinization (both for finite-state acceptors and finite-state transducers), minimization, composition, complementation, intersection, Kleene closure, etc. Furthermore, various visualization tools are available to browse finite-state automata. The toolbox is implemented in SICStus Prolog.
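As an illustration of one manipulation named in this abstract, determinization of a finite-state acceptor, here is a minimal Python sketch of the textbook subset construction. It is not the toolbox's own code (which is written in SICStus Prolog). The function names `determinize` and `accepts`, the dictionary-based automaton encoding, and the example automaton are all assumptions made for this sketch, and epsilon transitions are not handled.

```python
from collections import deque


def determinize(start, accepting, transitions):
    """Subset construction for an NFA without epsilon moves.

    transitions maps (state, symbol) -> set of next states.
    Returns (dfa_start, dfa_accepting, dfa_transitions); every DFA state
    is a frozenset of NFA states.
    """
    symbols = {sym for (_, sym) in transitions}
    dfa_start = frozenset([start])
    dfa_transitions = {}
    dfa_accepting = set()
    seen = {dfa_start}
    queue = deque([dfa_start])
    while queue:
        current = queue.popleft()
        # A subset is accepting if it contains any accepting NFA state.
        if current & accepting:
            dfa_accepting.add(current)
        for sym in symbols:
            target = frozenset(
                nxt for s in current for nxt in transitions.get((s, sym), ())
            )
            if not target:
                continue  # no move on this symbol; leave it undefined
            dfa_transitions[(current, sym)] = target
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return dfa_start, dfa_accepting, dfa_transitions


def accepts(dfa, word):
    """Run the deterministic automaton on a sequence of symbols."""
    start, accepting, trans = dfa
    state = start
    for sym in word:
        state = trans.get((state, sym))
        if state is None:
            return False
    return state in accepting


# Hypothetical NFA over {a, b} accepting strings that end in "ab".
nfa_transitions = {(0, "a"): {0, 1}, (0, "b"): {0}, (1, "b"): {2}}
dfa = determinize(0, {2}, nfa_transitions)
print(accepts(dfa, "aab"), accepts(dfa, "aba"))
```

Representing each deterministic state as a frozenset keeps the sketch short; a production tool would typically renumber the states afterwards.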
A Comparison Of Time Conditioned And Word Conditioned Search Techniques For Large Vocabulary Speech Recognition
Proc. Int. Conf. on Spoken Language Processing, 1996
Cited by 19 (14 self)
In this paper, we compare the search effort of the word conditioned and the time conditioned tree search methods. Both methods are based on a time-synchronous, left-to-right beam search using a tree-organized lexicon. Whereas the word conditioned method is well known and widely used, the time conditioned method is novel in the context of 20 000-word vocabulary recognition. We extend both methods to handle trigram language models in a one-pass strategy. Both methods were tested on a train schedule inquiry task (1 850 words, telephone speech) and on the North American Business (Nov. '94) development corpus (20 000 words).
The Thoughtful Elephant: Strategies for Spoken Dialog Systems
IEEE Transactions on Speech and Audio Processing, 2000
Cited by 19 (0 self)
In this paper we present technology used in spoken dialog systems for a wide range of applications. They include tasks from the travel domain and automatic switchboards as well as large scale directory assistance. The overall goal in developing spoken dialog systems is to allow for a natural and flexible dialog flow similar to human-human interaction. This imposes the challenging task of recognizing and interpreting user input when the user may choose from an unrestricted vocabulary and an infinite set of possible formulations. We therefore put emphasis on strategies that make the system more robust while still maintaining a high level of naturalness and flexibility. In view of this paradigm, we found that two fundamental principles characterize many of the proposed methods: 1) to consider available sources of information as early as possible, and 2) to keep alternative hypotheses and delay the decision for a single option as long as possible. We describe
A voice-controlled automatic telephone switchboard and directory information system
Speech Communication, 1997
Cited by 17 (5 self)
The Philips automatic telephone switchboard and directory information system PADIS provides a natural-language user interface to a telephone directory database. Using speech recognition and language understanding technologies, the system offers phone numbers, fax numbers, email addresses, and room numbers as well as direct call completion to a desired party. In this paper, we present the underlying probabilistic framework, the system architecture, and the individual modules for speech recognition, language understanding, dialogue control, and speech output. In addition, we report results on performance and user behaviour obtained from a field test in our research lab with a 600-entry database. We derive a new maximum-a-posteriori decision rule which incorporates database knowledge and dialogue history as constraints in speech recognition and language understanding. It has improved speech understanding accuracy by 19% (in terms of concept error rate), and reduced attribute substitution errors (e.g. recognition of a wrong name) by 38%. The decision rule is implemented in a multi-stage approach as a combination of state-of-the-art speech recognition, partial parsing with an attributed stochastic context-free grammar, and an N-best algorithm which is also described in this paper. The system conducts a flexible mixed-initiative dialogue rather than using a rigid form-filling scheme, and incorporates database knowledge to optimize the dialogue flow.
Near-Miss Modeling: A Segment-Based Approach to Speech Recognition
1998
Cited by 15 (0 self)
Currently, most approaches to speech recognition are frame-based in that they represent speech as a temporal sequence of feature vectors. Although these approaches have been successful, they cannot easily incorporate complex modeling strategies that may further improve speech recognition performance. In contrast, segment-based approaches represent speech as a temporal graph of feature vectors and facilitate the incorporation of a wide range of modeling strategies. However, difficulties in segment-based recognition have impeded the realization of potential advantages in modeling. This thesis
High Quality Word Graphs Using Forward-Backward Pruning
Proc. of the IEEE Int. Conf. on Acoustics, Speech and Signal Processing, 1999
Cited by 14 (3 self)
This paper presents an efficient method for constructing high quality word graphs for large vocabulary continuous speech recognition. The word graphs are constructed in a two-pass strategy. In the first pass, a huge word graph is produced using the time-synchronous lexical tree search method. Then, in the second pass, this huge word graph is pruned by applying a modified forward-backward algorithm. To analyze the characteristic properties of this word graph pruning method, we present a detailed comparison with the conventional time-synchronous forward pruning. The recognition experiments, carried out on the North American Business (NAB) 20 000-word task, demonstrate that, in comparison to the forward pruning, the new method leads to a significant reduction in the size of the word graph without an increase in the graph word error rate. 1. INTRODUCTION In this paper, we present a different approach to the word graph forward pruning technique [7]. This approach is based on the paradigm of...
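The second-pass idea in this abstract, pruning a word graph with forward and backward scores, can be sketched generically in Python. The abstract does not spell out the paper's modified algorithm, so what follows is only a plain max-based (Viterbi-style) variant under stated assumptions: the function name `prune_word_graph`, the edge tuple layout, the `beam` threshold, the log-score values, and the requirement that node ids are integers in topological order are all hypothetical choices for illustration.

```python
import math


def prune_word_graph(edges, start, end, beam):
    """Keep edges that lie on a path scoring within `beam` of the best path.

    edges: list of (src, dst, word, log_score); higher scores are better.
    Assumes integer node ids in topological order, i.e. src < dst.
    """
    # Forward pass: best score of any partial path from start to each node.
    forward = {start: 0.0}
    for src, dst, _, score in sorted(edges, key=lambda e: e[0]):
        if src in forward:
            forward[dst] = max(forward.get(dst, -math.inf), forward[src] + score)

    # Backward pass: best score of any partial path from each node to end.
    backward = {end: 0.0}
    for src, dst, _, score in sorted(edges, key=lambda e: e[1], reverse=True):
        if dst in backward:
            backward[src] = max(backward.get(src, -math.inf), score + backward[dst])

    best = forward.get(end, -math.inf)
    kept = []
    for edge in edges:
        src, dst, _, score = edge
        if src in forward and dst in backward:
            # Score of the best complete path that passes through this edge.
            through = forward[src] + score + backward[dst]
            if through >= best - beam:
                kept.append(edge)
    return kept


# Hypothetical four-node word graph with made-up log scores.
graph = [
    (0, 1, "the", -1.0),
    (0, 1, "a", -4.0),
    (1, 2, "train", -1.0),
    (1, 2, "rain", -2.5),
    (2, 3, "leaves", -1.0),
]
print([word for _, _, word, _ in prune_word_graph(graph, 0, 3, beam=2.0)])
```

Unlike forward-only pruning, the decision for each edge here uses the score of the best complete path through it, so an edge is dropped only when no full path containing it comes within the beam.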
Towards an Automated Directory Information System
Proc. EUROSPEECH, 1997
Cited by 12 (3 self)
This paper describes a design and feasibility study for a large-scale automatic directory information system with a scalable architecture. The current demonstrator, called PADIS-XL, operates in real time and handles a database of a medium-size German city with 130,000 listings. The system uses a new technique of taking a combined decision on the joint probability over multiple dialogue turns, and a dialogue strategy that strives to restrict the search space more and more with every dialogue turn. During the course of the dialogue, the last name of the desired subscriber must be spelled out. The spelling recognizer permits continuous spelling and uses a context-free grammar to parse common spelling expressions. This paper describes the system architecture, our maximum a-posteriori (MAP) decision rule, the spelling grammar, and the dialogue strategy. We give results on the SPEECHDAT and SIETILL databases on recognition of first names by spelling and on jointly deciding on the spelled and the spoken name. In a 35,000-names setup, the joint decision reduced name-recognition errors by 31%.
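The idea of deciding jointly on the spelled and the spoken name can be illustrated with a small Python sketch. It is not the paper's MAP decision rule, which the abstract does not give. The sketch assumes that each recognizer returns a log-probability per candidate name, that the two turns can be treated as independent so their log scores add, and that candidates are restricted to database entries. The function name `joint_decision`, the `floor` back-off score, and all names and numbers are invented for illustration.

```python
import math


def joint_decision(spoken_scores, spelled_scores, database, floor=-20.0):
    """Pick the database name with the highest combined log score.

    spoken_scores / spelled_scores: dict mapping name -> log-probability
    from the respective dialogue turn. Names missing from a recognizer's
    hypothesis list receive the (assumed) back-off score `floor`.
    """
    best_name, best_score = None, -math.inf
    for name in database:
        # Independence assumption: joint log score is the sum of both turns.
        score = spoken_scores.get(name, floor) + spelled_scores.get(name, floor)
        if score > best_score:
            best_name, best_score = name, score
    return best_name, best_score


# Made-up scores: the spoken turn alone slightly prefers "maier",
# while the spelled turn clearly favours "meier".
database = ["meier", "maier", "mayer"]
spoken = {"meier": -1.2, "maier": -1.0, "mayer": -1.1}
spelled = {"meier": -0.3, "maier": -2.5, "mayer": -3.0}
print(joint_decision(spoken, spelled, database))
```

Restricting the loop to database entries is what lets directory knowledge act as a constraint: a hypothesis that no listing supports can never win, however well it scored acoustically.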

