A Speech-In List-Out Approach to Spoken User Interfaces
- in Proc. Human Language Technologies 2004, 2004
Cited by 4 (2 self)
Spoken user interfaces are conventionally either dialogue-based or menu-based. In this paper we propose a third approach, in which the task of invoking responses from the system is treated as one of retrieval from the set of all possible responses. Unlike conventional spoken user interfaces that return a unique response to the user, the proposed interface returns a shortlist of possible responses, from which the user must make the final selection. We refer to such interfaces as Speech-In List-Out or SILO interfaces. Experiments show that SILO interfaces can be very effective, are highly robust to degraded speech recognition performance, and can impose significantly lower cognitive load on the user as compared to menu-based interfaces.
Speech-based interactive information guidance system using question-answering technique
- In Proc. IEEE-ICASSP, 2007
Cited by 4 (1 self)
This paper addresses an interactive framework for information navigation based on document knowledge base. In conventional audio guidance systems, such as those deployed in museums, the information flow is one-way and the content is fixed. In order to make an interactive guidance system, we propose the application of question-answering (QA) techniques. Since users tend to use anaphoric expressions in successive questions, we investigate appropriate handling of contextual information based on topic detection, together with the effect of using N-best information in ASR output. Moreover, we apply the QA technique to generation of system-initiative information recommendation. A navigation system on Kyoto city information was implemented. Effectiveness of the proposed techniques was confirmed through a field trial by a number of real novice users. Index Terms: spoken dialogue system, question-answering, information guidance
A discriminative HMM/n-gram-based retrieval approach for Mandarin spoken documents
- ACM Transactions on Asian Language Information Processing, 2004
Cited by 4 (2 self)
Statistical modeling approaches have been steadily gaining popularity in the field of information retrieval in recent years. This paper presents an HMM/N-gram-based retrieval approach for Mandarin spoken documents. The underlying characteristics and different structures of this approach were extensively investigated and analyzed. The retrieval capabilities were verified by tests with indexing features of word- and syllable-levels and comparison with the conventional vector space model approach. To further improve the discrimination capabilities of the HMMs, both the expectation-maximization (EM) and minimum classification error (MCE) training algorithms were introduced in training. The information fusion of indexing features of word- and syllable-levels was also investigated. The spoken document retrieval experiments were performed on the Topic Detection and Tracking Corpora (TDT-2 and TDT-3). Very encouraging retrieval performance was obtained.
Spoken Document Understanding and Organization
- IEEE Signal Processing Magazine, 2005
Cited by 4 (2 self)
Speech is the primary and most convenient means of communication between individuals [1]. In the future network era, the digital content over the network will include all the information activities for human life, from real-time information to knowledge archives, from working environments to private services. Apparently, the most attractive form of the network content will be in multimedia, including speech information. Such speech information usually provides insight concerning the subjects, topics, and concepts of the multimedia content. As a result, the spoken documents associated with the network content will become key for retrieval and browsing. On the other hand, the rapid development of network and wireless technologies is making it possible for people to access the network content not only from the office/home, but from anywhere, at any time, via small handheld devices such as personal digital assistants (PDAs) or cell phones. Today, network access is primarily text based. The users enter the instructions by words or texts, and the network or search engine offers text materials from which the user can select. The users interact with the network or search engine and obtain the desired information via text-based media. In the future, it can be imagined that almost all such functions of text can also be performed with speech. The user’s instructions can be entered not only by text but possibly through speech as well since speech is a convenient user interface for a variety of user terminals, especially for small handheld devices. The network content may be indexed/retrieved and browsed not only by text but possibly also by the associated spoken documents as mentioned above. The users may also interact with the network or the search engine via either text-based media or spoken/multimodal dialogs. Text-to-speech synthesis can be used to transform the text information in the content into speech when required. 
This is the general environment of retrieval/browsing applications for multimedia content with associated spoken documents.
Mobile Information Access with Spoken Query Answering
- in COST278 Final Workshop on "Applied Spoken Language Interaction in Distributed Environments" ASIDE, 2005
Cited by 2 (2 self)
This paper addresses the problem of information and service accessibility in mobile devices with limited resources. A solution is developed and tested through a prototype that applies state-of-the-art Distributed Speech Recognition (DSR) and knowledge-based Information Retrieval (IR) processing for spoken query answering. For the DSR part, a configurable DSR system is implemented on the basis of the ETSI-DSR advanced front-end and the SPHINX IV recognizer. For the knowledge-based IR part, a distributed system solution is developed for fast retrieval of the most relevant documents, with a text window focused over the part which most likely contains an answer to the query. The two systems are integrated into a full spoken query answering system. The prototype can answer queries and questions within the chosen football (soccer) test domain, but the system has the flexibility for being ported to other domains.
Spoken versus Written Queries for Mobile Information Access
- Proceedings of the MobileHCI03 workshop on Mobile and Ubiquitous Information Access, 2003
Cited by 2 (1 self)
As Chinese is not alphabetic and the input of Chinese characters into computer is still a difficult and unsolved problem, voice retrieval of information becomes apparently an important application area of mobile information retrieval (IR). It is intuitive to think that users would speak more words and require less time when issuing queries vocally to an IR system than forming queries in writing. This paper presents some new findings derived from an experimental study on Mandarin Chinese to test this hypothesis and assesses the feasibility of spoken queries for search purposes.
A Speech Interface for Open-Domain Question-Answering
- In Proceedings of the 41st Annual Meeting of the ACL, 2003
Speech interfaces to question-answering systems offer significant potential for finding information with phones and mobile networked devices. We describe a demonstration of spoken question answering using a commercial dictation engine whose language models we have customized to questions, a Web-based text-prediction interface allowing quick correction of errors, and an open-domain question-answering system, AnswerBus, which is freely available on the Web. We describe a small evaluation of the effect of recognition errors on the precision of the answers returned and make some concrete recommendations for modifying a question-answering system for improving robustness to spoken input.

