Subword-based Approaches for Spoken Document Retrieval (2000)
Citations: 40 (0 self)
BibTeX
@MISC{Ng00subword-basedapproaches,
author = {Kenney Ng},
title = {Subword-based Approaches for Spoken Document Retrieval},
year = {2000}
}
Abstract
This thesis explores approaches to the problem of spoken document retrieval (SDR), the task of automatically indexing and then retrieving relevant items from a large collection of recorded speech messages in response to a user-specified natural language text query. We investigate the use of subword unit representations for SDR as an alternative to words generated by either keyword spotting or continuous speech recognition. Our investigation is motivated by the observation that word-based retrieval approaches must either know a priori which keywords to search for or use a very large recognition vocabulary to cover the contents of growing and diverse message collections. The use of subword units in the recognizer constrains the size of the vocabulary needed to cover the language, and the use of subword units as indexing terms allows for the detection of new user-specified query terms during retrieval. Four







