WysiWyg Web Wrapper Factory (W4F) - Proceedings of WWW Conference, 1999
Cited by 20 (0 self)
In this paper, we present the W4F toolkit for the generation of wrappers for Web sources. W4F consists of a retrieval language to identify Web sources, a declarative extraction language (the HTML Extraction Language) to express robust extraction rules and a mapping interface to export the extracted information into some user-defined data-structures. To assist the user and make the creation of wrappers rapid and easy, the toolkit offers some WYSIWYG support via some wizards. Together, they permit the fast and semi-automatic generation of ready-to-go wrappers provided as Java classes. W4F has been successfully used to generate wrappers for database systems and software agents, making the content of Web sources easily accessible to any kind of application.
Keywords: Web wrapper, information extraction, HTML parsing, HTML to XML conversion.
Automated Meta-Data Extraction for Confsearch Semester Thesis, 2011
Extracting meta-data from websites is an open field and up till now there exists no satisfying solution for extracting important dates (e.g. the Paper Submission Deadline) from conference websites. We present an automated way to extract the meta-data of an academic conference from its website. We aim to facilitate the manual update of such data on the conference directory Confsearch

