Results 1 - 10 of 27
Improving pseudo-relevance feedback in web information retrieval using web page segmentation
- In Intl. World Wide Web Conf. (WWW), 2003
Cited by 56 (10 self)
In contrast to traditional document retrieval, a web page as a whole is not a good information unit to search because it often contains multiple topics and a lot of irrelevant information from the navigation, decoration, and interaction parts of the page. In this paper, we propose a VIsion-based Page Segmentation (VIPS) algorithm to detect the semantic content structure in a web page. Compared with simple DOM-based segmentation methods, our page segmentation scheme utilizes useful visual cues to obtain a better partition of a page at the semantic level. By using our VIPS algorithm to assist the selection of query expansion terms in pseudo-relevance feedback in web information retrieval, we achieve a 27% performance improvement on the Web Track dataset.
Towards Domain-Independent Information Extraction from Web Tables
- World Wide Web Conference, 2007
"... Traditionally, information extraction from web tables has focused on small, more or less homogeneous corpora, often based on assumptions about the use of tags. A multitude of different HTML implementations of web tables make these approaches difficult to scale. In this paper, we approach the ..."

