Results 1 - 2 of 2
Chaker Jebari A New Centroid-based Approach for Genre Categorization of Web Pages
Abstract
In this paper we propose a new centroid-based approach for genre categorization of web pages. Our approach constructs genre centroids using a set of genre-labeled web pages, called training web pages. The obtained centroids are then used to classify new web pages. The aim of our approach is to provide a flexible, incremental, refined and combined categorization, which is more suitable for automatic web genre identification. Our approach is flexible because it assigns a web page to all predefined genres with a confidence score; it is incremental because it classifies web pages one by one; it is refined because each web page either refines the centroids or is discarded as a noisy page; finally, our approach combines three different feature sets, i.e. URL addresses, logical structure and hypertext structure. The experiments conducted on two known corpora show that our approach is very fast and outperforms other approaches.
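The abstract gives no formulas, so the following is only a rough sketch of what a centroid-based, incremental genre classifier of this kind might look like, not the author's actual method. Several things here are assumptions made for illustration: cosine similarity as the confidence score, a mean vector as the centroid, a fixed `threshold` for deciding whether a page is noisy, and feature names prefixed with `url:`, `struct:` and `link:` to stand in for the three combined feature sets.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


class GenreCentroids:
    """Hypothetical sketch: one mean-vector centroid per genre."""

    def __init__(self, threshold=0.5):
        self.threshold = threshold  # assumed noise cutoff, not from the paper
        self.sums = {}    # genre -> summed feature weights
        self.counts = {}  # genre -> number of pages absorbed

    def add(self, genre, vec):
        """Fold a labeled (or accepted) page into the genre's centroid."""
        sums = self.sums.setdefault(genre, {})
        for t, w in vec.items():
            sums[t] = sums.get(t, 0.0) + w
        self.counts[genre] = self.counts.get(genre, 0) + 1

    def centroid(self, genre):
        n = self.counts[genre]
        return {t: w / n for t, w in self.sums[genre].items()}

    def scores(self, vec):
        """Flexible: a confidence score for every known genre."""
        return {g: cosine(vec, self.centroid(g)) for g in self.counts}

    def classify(self, vec):
        """Incremental and refined: score one page, then either refine the
        best-matching centroid or discard the page as noisy.
        Assumes at least one training page has been added."""
        scores = self.scores(vec)
        best = max(scores, key=scores.get)
        refined = scores[best] >= self.threshold
        if refined:
            self.add(best, vec)
        return scores, refined


# Training pages: features from the three sets share one vector (an assumption).
model = GenreCentroids(threshold=0.5)
model.add("faq", {"url:faq": 1.0, "struct:list": 2.0})
model.add("blog", {"url:blog": 1.0, "link:external": 3.0})

page_scores, was_refined = model.classify({"url:faq": 1.0, "struct:list": 1.0})
print(page_scores, was_refined)
```

Storing per-genre sums and counts rather than the centroid itself keeps each refinement step cheap, since absorbing a page only touches that page's features.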
F. JUNGERMANN
Abstract
»Sprachtechnologie und eHumanities« ("Language Technology and eHumanities"), 26.02.2009 - 27.02.2009, Universität Duisburg-Essen, Campus Duisburg. Program committee and organization.

