Results 1 - 3 of 3
Position specific posterior lattices for indexing speech
- In Proceedings of ACL, Ann Arbor, 2005
Cited by 15 (2 self)
The paper presents the Position Specific Posterior Lattice (PSPL), a novel representation of automatic speech recognition (ASR) lattices that naturally lends itself to efficient indexing of position information and subsequent relevance ranking of spoken documents using proximity. In experiments performed on a collection of lecture recordings (MIT iCampus data), the spoken document ranking accuracy was improved by 20% relative over the commonly used baseline of indexing the 1-best output from an automatic speech recognizer. The Mean Average Precision (MAP) increased from 0.53 when using 1-best output to 0.62 when using the new lattice representation. The reference used for evaluation is the output of a standard retrieval engine working on the manual transcription of the speech collection. Albeit lossy, the PSPL lattice is also much more compact than the ASR 3-gram lattice from which it is computed, which translates into a reduced inverted index size as well, at virtually no degradation in word-error-rate performance. Since new paths are introduced in the lattice, the ORACLE accuracy increases over the original ASR lattice.
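As an illustration of the metric quoted in this abstract, the following is a minimal sketch of how Mean Average Precision is commonly computed from ranked result lists. It is not the paper's evaluation code: the function names, document IDs, and query IDs are hypothetical, and the relevant sets stand in for the reference output that the abstract says comes from a retrieval engine run on manual transcriptions.

```python
from typing import Dict, List, Set


def average_precision(ranked: List[str], relevant: Set[str]) -> float:
    """Average precision of one ranked list against a set of relevant IDs."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            hits += 1
            # Precision at this rank, counted only where a relevant item appears.
            precision_sum += hits / rank
    # Relevant items that were never retrieved contribute zero.
    return precision_sum / len(relevant)


def mean_average_precision(
    rankings: Dict[str, List[str]], relevance: Dict[str, Set[str]]
) -> float:
    """Mean of per-query average precision over all queries in `rankings`."""
    if not rankings:
        return 0.0
    total = sum(
        average_precision(ranked, relevance.get(query, set()))
        for query, ranked in rankings.items()
    )
    return total / len(rankings)


# Hypothetical toy data: two queries with made-up document IDs.
rankings = {"q1": ["d1", "d2", "d3"], "q2": ["d2", "d1"]}
relevance = {"q1": {"d1", "d3"}, "q2": {"d1"}}
map_score = mean_average_precision(rankings, relevance)
```

Under this definition, a system that ranks relevant spoken documents higher scores closer to 1.0, which is the sense in which the abstract reports an increase from 0.53 to 0.62.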
Web Derived Pronunciations for Spoken Term Detection
- Dogan Can, Boğaziçi University
Indexing and retrieval of speech content in various forms such as broadcast news, customer care data and on-line media has gained a lot of interest for a wide range of applications, from customer analytics to on-line media search. For most retrieval applications, the speech content is typically first converted to a lexical or phonetic representation using automatic speech recognition (ASR). The first step in searching through indexes built on these representations is the generation of pronunciations for named entities and foreign language query terms. This paper summarizes the results of the work conducted during the 2008 JHU Summer Workshop by the Multilingual Spoken Term Detection team on mining the web for pronunciations and analyzing their impact on spoken term detection. We first present methods to use the vast amount of pronunciation information available on the Web, in the form of IPA and ad-hoc transcriptions. We describe techniques for extracting candidate pronunciations from Web pages and associating them with orthographic words, filtering out poorly extracted pronunciations, normalizing IPA pronunciations to better conform to a common transcription standard, and generating phonemic representations from ad-hoc transcriptions. We then present an analysis of how effectively these pronunciations represent Out-Of-Vocabulary (OOV) query terms, measured by the performance of a spoken term detection (STD) system. We also provide comparisons of Web pronunciations against automated techniques for pronunciation generation as well as pronunciations generated by human experts. Our results ...
Authors listed in alphabetical order.
A Critical Assessment of Spoken Utterance Retrieval through Approximate Lattice Representations
This paper compares the performance of position-specific posterior lattices (PSPL) and confusion networks applied to spoken utterance retrieval, and tests these recent proposals against several baselines in two disparate domains. These lossy methods provide compact representations that generalize the original segment lattices and provide greater recall and robustness, but have yet to be evaluated against each other in multiple WER conditions for spoken utterance retrieval. Our comparisons suggest that while PSPL and confusion networks have comparable recall, the former is slightly more precise, although its merit appears to be coupled to the assumptions of low-frequency search queries and low-WER environments.

