Acoustic Model Clustering Based on Syllable Structure (2002)
BibTeX
@MISC{Shafran02acousticmodel,
author = {Izhak Shafran and Mari Ostendorf},
title = {Acoustic Model Clustering Based on Syllable Structure},
year = {2002}
}
Abstract
Current speech recognition systems perform worse on conversational speech than on read speech, arguably because of the large acoustic variability inherent in conversational speech. Our hypothesis is that there are systematic effects in local context, associated with syllabic structure, that current acoustic models do not capture. Such variation may be modeled using a broader definition of context than in traditional systems, which restrict context to the neighboring phonemes. In this paper, we study the use of word- and syllable-level context conditioning in recognizing conversational speech. We describe a method to extend standard tree-based clustering to incorporate a large number of features, and we report results on the Switchboard task which indicate that conditioning on syllable structure outperforms pentaphones and incurs less computational cost. It has been hypothesized that previous work using syllable models for recognition of English was limited because it ignored re-syllabification (the change of syllable structure at word boundaries), but our analysis shows that accounting for re-syllabification does not impact recognition performance.







