

Structural classifiers of text types: Towards a novel model of text representation. LDV Forum – Zeitschrift für Computerlinguistik und Sprachtechnologie (2007)

by A Mehler, P Geibel, O Pustylnikov
Results 1 - 2 of 2

A New Centroid-based Approach for Genre Categorization of Web Pages

by Chaker Jebari
Abstract
In this paper we propose a new centroid-based approach for genre categorization of web pages. Our approach constructs genre centroids from a set of genre-labeled web pages, called training web pages, and then uses these centroids to classify new web pages. The aim of our approach is to provide a flexible, incremental, refined and combined categorization, which is better suited to automatic web genre identification. Our approach is flexible because it assigns a web page to all predefined genres, each with a confidence score; it is incremental because it classifies web pages one by one; it is refined because each web page either refines the centroids or is discarded as a noisy page; finally, it is combined because it uses three different feature sets, namely URL addresses, logical structure and hypertext structure. Experiments conducted on two existing corpora show that our approach is very fast and outperforms other approaches.
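The abstract describes the approach only at a high level. The following is a minimal sketch, not Jebari's actual method, of how a flexible, incremental, refined and combined centroid classifier could fit together. It assumes that each page has already been reduced to three sparse term-weight vectors (the hypothetical keys `url`, `logical` and `hypertext`), that similarity is cosine similarity, that the three similarities are combined with equal weights, and that a fixed similarity threshold decides whether a page refines a centroid or is discarded as noisy. The class `CentroidGenreClassifier`, its methods and the `threshold` parameter are hypothetical names chosen for illustration.

```python
import math

# Hypothetical keys for the three feature sets named in the abstract.
FEATURE_SETS = ("url", "logical", "hypertext")


def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors (dicts)."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


class CentroidGenreClassifier:
    """Sketch only: one centroid per genre and per feature set."""

    def __init__(self, threshold=0.3, weights=None):
        # Assumed minimum combined score for a page to refine a centroid.
        self.threshold = threshold
        # Assumed equal weighting of the three feature sets.
        self.weights = weights or {f: 1.0 / len(FEATURE_SETS) for f in FEATURE_SETS}
        self.centroids = {}  # genre -> {feature set -> {term: mean weight}}
        self.counts = {}     # genre -> number of pages averaged into the centroid

    def _add(self, genre, page):
        """Fold a page into a genre centroid as a running mean."""
        centroid = self.centroids.setdefault(genre, {f: {} for f in FEATURE_SETS})
        n = self.counts.get(genre, 0)
        for f in FEATURE_SETS:
            vec = centroid[f]
            for term in set(vec) | set(page[f]):
                vec[term] = (n * vec.get(term, 0.0) + page[f].get(term, 0.0)) / (n + 1)
        self.counts[genre] = n + 1

    def fit(self, labeled_pages):
        """Build the initial centroids from (page, genre) training pairs."""
        for page, genre in labeled_pages:
            self._add(genre, page)
        return self

    def scores(self, page):
        """Combined confidence score of the page for every known genre."""
        return {
            genre: sum(self.weights[f] * cosine(page[f], centroid[f]) for f in FEATURE_SETS)
            for genre, centroid in self.centroids.items()
        }

    def classify(self, page):
        """Classify one page; refine the best centroid or discard the page as noisy."""
        scores = self.scores(page)
        if not scores:
            return scores, None
        best = max(scores, key=scores.get)
        if scores[best] < self.threshold:
            return scores, None  # treated as noisy: centroids are left unchanged
        self._add(best, page)  # incremental refinement of the winning centroid
        return scores, best


train = [
    ({"url": {"blog": 1}, "logical": {"h1": 1, "p": 3}, "hypertext": {"comment": 2}}, "blog"),
    ({"url": {"shop": 1}, "logical": {"table": 2}, "hypertext": {"cart": 3}}, "shop"),
]
clf = CentroidGenreClassifier(threshold=0.3).fit(train)
scores, genre = clf.classify({"url": {"blog": 1}, "logical": {"p": 2}, "hypertext": {"comment": 1}})
print(genre)  # blog
```

Returning the full `scores` dictionary corresponds to the "flexible" assignment to all genres, while calling `classify` once per page and updating only the winning centroid corresponds to the "incremental" and "refined" behaviour. How the real approach weights the three feature sets and chooses the noise threshold is not stated in the abstract.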

F. JUNGERMANN

by Angela Klutsch, Marc Lechtenfeld, Nino Simunic, G. Büchel, M. Burghardt, C. Wolff, F. Fritzinger, M. Kisselew, U. Heid, A. Madsack, H. Schmid, K. Ignatova, C. Toprak, D. Bernhard, I. Gurevych, A. Mehler (affiliations listed: Angewandte Kognitionswissenschaft, Abteilung Informatik, Universität Duisburg-Essen)
Abstract
»Sprachtechnologie und eHumanities« (Language Technology and eHumanities), 26–27 February 2009, Universität Duisburg-Essen, Campus Duisburg. Program Committee and Organization

Developed at and hosted by The College of Information Sciences and Technology

© 2007-2010 The Pennsylvania State University