Automatic Sentence Structure Annotation for Spoken Language Processing (2008)
Citations: 1 (0 self)
BibTeX
@MISC{Hillard08automaticsentence,
author = {Dustin Lundring Hillard},
title = {Automatic Sentence Structure Annotation for Spoken Language Processing},
year = {2008}
}
Abstract
Increasing amounts of easily available electronic data are precipitating a need for automatic processing that can aid humans in digesting large amounts of data. Speech and video are becoming an increasingly significant portion of on-line information, from news and television broadcasts to oral histories, on-line lectures, and user-generated content. Automatic processing of audio and video sources requires automatic speech recognition (ASR) in order to provide transcripts. Typical ASR generates only words, without punctuation, capitalization, or further structure. Many techniques available from natural language processing therefore suffer when applied to speech recognition output, because they assume the presence of reliable punctuation and structure. In addition, errors from automatic transcription degrade the performance of downstream processing such as machine translation, name detection, or information retrieval. We develop approaches for automatically annotating structure in speech, including sentence and sub-sentence segmentation, and then turn towards optimizing ASR and annotation for downstream applications.