Automatic Detection of Text Genre, 1997
Cited by 112 (0 self)
As the text databases available to users become larger and more heterogeneous, genre becomes increasingly important for computational linguistics as a complement to topical and structural principles of classification. We propose a th
Using Visual Features for Fine-Grained Genre Classification of Web Pages
Cited by 3 (1 self)
The field of automatic genre classification has primarily focused on extracting textual features from documents. The goal of this research is to investigate whether visual features of HTML web pages can improve the classification of fine-grained genres. Intuitively, this should be helpful; the challenge is to extract those visual features that capture the layout characteristics of the genres. A corpus of Web pages from different e-commerce sites was generated and manually classified into several genres. Three different sets of features were compared: one with just textual features, one with HTML-level features added, and a third with visual features added. Our experiments confirm that using HTML features, and particularly URL address features, can improve classification beyond using textual features alone. We also show that adding visual features can be useful for further improving fine-grained genre classification.
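To make the feature-set comparison above concrete, here is a minimal sketch of how the first two sets (textual only, and textual plus HTML-level and URL features) might be extracted from a page. It is an illustration under stated assumptions, not the authors' method: the function name `extract_features`, the `text:`/`tag:`/`url:` key prefixes, and the choice of word counts, tag counts, and URL tokens are all hypothetical. Visual features are left out because they would need a rendered page.

```python
import re
from collections import Counter
from html.parser import HTMLParser
from urllib.parse import urlparse


class _PageParser(HTMLParser):
    """Collects text nodes and counts opening tags (illustrative helper)."""

    def __init__(self):
        super().__init__()
        self.tag_counts = Counter()
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        self.tag_counts[tag] += 1

    def handle_data(self, data):
        # Note: this also captures text inside <script>/<style> elements.
        self.text_parts.append(data)


def tokenize(text):
    """Lowercase and split on runs of non-alphanumeric characters."""
    return re.findall(r"[a-z0-9]+", text.lower())


def extract_features(url, html, feature_set="text"):
    """Return a Counter of features for one page.

    feature_set is an assumed label:
      "text"      -> word counts only
      "text+html" -> word counts plus tag counts and URL tokens
    """
    if feature_set not in ("text", "text+html"):
        raise ValueError("unknown feature set: " + feature_set)

    parser = _PageParser()
    parser.feed(html)
    parser.close()

    features = Counter()
    for tok in tokenize(" ".join(parser.text_parts)):
        features["text:" + tok] += 1

    if feature_set == "text+html":
        # HTML-level features: how often each tag occurs.
        for tag, count in parser.tag_counts.items():
            features["tag:" + tag] += count
        # URL address features: tokens from the host and path.
        parts = urlparse(url)
        for tok in tokenize(parts.netloc + " " + parts.path):
            features["url:" + tok] += 1

    return features
```

The resulting counters could be fed to any standard classifier once per feature set, so that accuracy can be compared across sets on the same labelled pages.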

