Robust Statistical Techniques for the Categorization of Images Using Associated Text (2003)
Citations: 2 (0 self)
BibTeX
@TECHREPORT{Sable03robuststatistical,
author = {Carl Sable},
title = {Robust Statistical Techniques for the Categorization of Images Using Associated Text},
institution = {},
year = {2003}
}
Abstract
The field of text categorization, which aids applications such as browsing, filtering, and search, has experienced a revival due to the vast amounts of unlabeled data available online and as part of digital collections. Almost all of the literature in the field, however, deals with the categorization of text-only documents. Many of the same techniques can be applied to text associated with multimedia documents to label the multimedia component. My dissertation provides an in-depth exploration of the automatic categorization of images using associated text. This research takes advantage of a corpus I have created containing news documents with embedded captioned images and multiple sets of categories. It turns out that the text and categories associated with images tend to have different properties from those associated with full-length text documents such as e-mails, articles, and web pages. Also, images provide us with an additional type of information, namely low-level image features. For these reasons, I have achieved success in several areas of research that have previously been problematic, such as combining systems and using NLP techniques to improve performance. Some benefits of this work







