Mandarin-English Information (MEI): Investigating Translingual Speech Retrieval
- In First International Conference on Human Language Technologies, 2000
- Cited by 14 (10 self)
We describe a system which supports English text queries searching for Mandarin Chinese spoken documents.
A system for spoken query information retrieval on mobile devices
- IEEE Trans. Speech and Audio Proc., 2002
- Cited by 12 (0 self)
With the proliferation of handheld devices, information access on mobile devices is a topic of growing relevance. This paper presents a system that allows the user to search for information on mobile devices using spoken natural-language queries. We explore several issues related to the creation of this system, which combines state-of-the-art speech-recognition and information-retrieval technologies. This is the first work that we are aware of which evaluates spoken query based information retrieval on a commonly available and well researched text database, the Chinese news corpus used in National Institute of Standards and Technology (NIST)’s TREC-5 and TREC-6 benchmarks. To compare spoken-query retrieval performance for different relevant scenarios and recognition accuracies, the benchmark queries—read verbatim by 20 speakers—were recorded simultaneously through three channels: headset microphone, PDA microphone, and cellular phone. Our results show that for mobile devices with high-quality microphones, spoken-query retrieval based on existing technologies yields retrieval precisions that come close to that for perfect text input (mean average precision 0.459 and 0.489, respectively, on TREC-6).
Searching the Web by Voice
- Proc. Conf. Computational Linguistics (COLING), 2002
- Cited by 6 (0 self)
Spoken queries are a natural medium for searching the Web in settings where typing on a keyboard is not practical. This paper describes a speech interface to the Google search engine. We present experiments with various statistical language models, concluding that a unigram model with collocations provides the best combination of broad coverage, predictive power, and real-time performance. We also report accuracy results of the prototype system.
Multi-scale audio indexing for translingual spoken document retrieval
- In Proceedings of International Conference on Acoustics, Speech, and Signal Processing, 2001
- Cited by 5 (3 self)
MEI (Mandarin-English Information) is an English-Chinese crosslingual spoken document retrieval (CL-SDR) system developed during the Johns Hopkins University Summer Workshop 2000. We integrate speech recognition, machine translation, and information retrieval technologies to perform CL-SDR. MEI advocates a multi-scale paradigm, where both Chinese words and subwords (characters and syllables) are used in retrieval. The use of subword units can complement the word unit in handling the problems of Chinese word tokenization ambiguity, Chinese homophone ambiguity, and out-of-vocabulary words in audio indexing. This paper focuses on multi-scale audio indexing in MEI. Experiments are based on the Topic Detection and Tracking Corpora (TDT-2 and TDT-3), where we indexed Voice of America Mandarin news broadcasts by speech recognition on both the word and subword scales. In this paper, we discuss the development of the MEI syllable recognizer, the representations of spoken documents using overlapping subword n-grams and lattice structures. Results show that augmenting words with subwords is beneficial to CL-SDR performance.
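As a rough illustration of the overlapping subword n-grams mentioned in this abstract, the sketch below turns a sequence of subword units (characters or syllables) into overlapping n-gram indexing terms. The function name `overlapping_ngrams` and the toy syllable sequence are assumptions made for illustration; they are not taken from the MEI system.

```python
from typing import List, Sequence, Tuple


def overlapping_ngrams(units: Sequence[str], n: int = 2) -> List[Tuple[str, ...]]:
    """Return every overlapping n-gram over a sequence of subword units."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # A sequence shorter than n yields an empty list.
    return [tuple(units[i:i + n]) for i in range(len(units) - n + 1)]


# Hypothetical input: a toy pinyin syllable sequence (illustrative only).
syllables = ["bei", "jing", "da", "xue"]
bigrams = overlapping_ngrams(syllables, n=2)
```

Because adjacent n-grams share units, a recognition error in one syllable affects only the n-grams that contain it, while the remaining terms can still match.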
Spoken Document Understanding and Organization
- IEEE Signal Processing Magazine, 2005
- Cited by 4 (2 self)
Speech is the primary and most convenient means of communication between individuals [1]. In the future network era, the digital content over the network will include all the information activities for human life, from real-time information to knowledge archives, from working environments to private services. Apparently, the most attractive form of the network content will be in multimedia, including speech information. Such speech information usually provides insight concerning the subjects, topics, and concepts of the multimedia content. As a result, the spoken documents associated with the network content will become key for retrieval and browsing. On the other hand, the rapid development of network and wireless technologies is making it possible for people to access the network content not only from the office/home, but from anywhere, at any time, via small handheld devices such as personal digital assistants (PDAs) or cell phones. Today, network access is primarily text based. The users enter the instructions by words or texts, and the network or search engine offers text materials from which the user can select. The users interact with the network or search engine and obtain the desired information via text-based media. In the future, it can be imagined that almost all such functions of text can also be performed with speech. The user’s instructions can be entered not only by text but possibly through speech as well since speech is a convenient user interface for a variety of user terminals, especially for small handheld devices. The network content may be indexed/retrieved and browsed not only by text but possibly also by the associated spoken documents as mentioned above. The users may also interact with the network or the search engine via either text-based media or spoken/multimodal dialogs. Text-to-speech synthesis can be used to transform the text information in the content into speech when required. 
This is the general environment of retrieval/browsing applications for multimedia content with associated spoken documents.
Query-by-example spoken term detection using phonetic posteriorgram templates
- In Proc. ASRU, 2009
- Cited by 4 (0 self)
This paper examines a query-by-example approach to spoken term detection in audio files. The approach is designed for low-resource situations in which limited or no in-domain training material is available and accurate word-based speech recognition capability is unavailable. Instead of using word or phone strings as search terms, the user presents the system with audio snippets of desired search terms to act as the queries. Query and test materials are represented using phonetic posteriorgrams obtained from a phonetic recognition system. Query matches in the test data are located using a modified dynamic time warping search between query templates and test utterances. Experiments using this approach are presented using data from the Fisher corpus.
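To make the matching step in this abstract more concrete, here is a minimal sketch of standard dynamic time warping between two posteriorgrams, each a list of per-frame phone posterior vectors. It is not the paper's modified search: the whole query is aligned against the whole test segment, and the local distance (negative log of the dot product), the length normalization, and all names and toy values are assumptions chosen for illustration.

```python
import math
from typing import Sequence

Posteriorgram = Sequence[Sequence[float]]  # frames x phone classes


def frame_distance(p: Sequence[float], q: Sequence[float], eps: float = 1e-10) -> float:
    """Negative log of the dot product of two posterior vectors (assumed choice)."""
    dot = sum(a * b for a, b in zip(p, q))
    return -math.log(max(dot, eps))


def dtw_cost(query: Posteriorgram, test: Posteriorgram) -> float:
    """Length-normalized cost of the best standard DTW alignment."""
    n, m = len(query), len(test)
    if n == 0 or m == 0:
        raise ValueError("both posteriorgrams need at least one frame")
    inf = float("inf")
    # acc[i][j] = minimum accumulated cost aligning query[:i] with test[:j]
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(query[i - 1], test[j - 1])
            acc[i][j] = d + min(acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1])
    return acc[n][m] / (n + m)


# Toy two-class posteriorgrams (hypothetical values).
query = [[0.9, 0.1], [0.1, 0.9]]
same = [[0.9, 0.1], [0.9, 0.1], [0.1, 0.9]]       # same phone order, slower
different = [[0.1, 0.9], [0.1, 0.9], [0.9, 0.1]]  # reversed phone order
```

In this toy setup `dtw_cost(query, same)` is lower than `dtw_cost(query, different)`, which is the property a detector would threshold on; locating a short query inside a long utterance would need a subsequence variant of the search.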
Open-Vocabulary Spoken Term Detection Using Graphone-Based Hybrid Recognition Systems
- Cited by 3 (0 self)
We address the problem of retrieving out-of-vocabulary (OOV) words/queries from audio archives for spoken term detection (STD) task. Many STD systems use the output of an automatic speech recognition (ASR) system which has a limited and fixed vocabulary, and are not capable of detecting rare words of high information content, such as named entities. Since such words are often of great interest for a retrieval task it is important to index spoken archives in a way that allows a user to search an OOV query/term. In this work, we employ hybrid recognition systems which contain both words and subword units (graphones) to generate hybrid lattice indexes. We use a word-based STD system as our baseline, and present improvements by employing our proposed hybrid STD system that uses words plus graphones on the English broadcast news genre of the 2006 NIST STD task. Index Terms: spoken term detection, audio indexing, voice search, open vocabulary
Retrieval and browsing of spoken content
- IEEE Signal Processing Mag., 2008
- Cited by 3 (1 self)
[A discussion of the technical issues involved in developing information retrieval systems for the spoken word] Ever-increasing computing power and connectivity bandwidth, together with falling storage costs, are resulting in an overwhelming amount of data of various types being produced, exchanged, and stored. Consequently, information search and retrieval has emerged as a key application area. Text-based search is the most active area, with applications that range from Web and local network search to searching for personal information residing on one’s own hard-drive. Speech search has received less attention perhaps because large collections of spoken material have previously not been available. However, with cheaper storage and increased broadband access, there has been a subsequent increase in the availability of online spoken audio content such as news broadcasts, podcasts, and academic lectures.
Vocabulary-independent search in spontaneous speech
- In Proceedings of ICASSP, 2004
- Cited by 3 (0 self)
For efficient organization of speech recordings – meetings, interviews,
Phonetic Confusion Based Document Expansion for Spoken Document Retrieval
- 2004
- Cited by 2 (1 self)
This paper presents a phone-based approach to spoken document retrieval (SDR), developed in the framework of the emerging MPEG-7 standard. We describe an indexing and retrieval system that uses phonetic information only. The retrieval method is based on the vector space IR model, using phone N-grams as indexing terms. We propose a technique to expand the representation of documents by means of phone confusion probabilities in order to improve the retrieval performance. This method is tested on a collection of short German spoken documents, using 10 city names as queries.

