Wrapper Induction for Information Extraction (1997)
Citations: 460 (30 self)
BibTeX
@MISC{Kushmerick97wrapperinduction,
author = {Nicholas Kushmerick},
title = {Wrapper Induction for Information Extraction},
year = {1997}
}
Abstract
The Internet presents numerous sources of useful information---telephone directories, product catalogs, stock quotes, weather forecasts, etc. Recently, many systems have been built that automatically gather and manipulate such information on a user's behalf. However, these resources are usually formatted for use by people (e.g., the relevant content is embedded in HTML pages), so extracting their content is difficult. Wrappers are often used for this purpose. A wrapper is a procedure for extracting a particular resource's content. Unfortunately, hand-coding wrappers is tedious. We introduce wrapper induction, a technique for automatically constructing wrappers. Our techniques can be described in terms of three main contributions. First, we pose the problem of wrapper construction as one of inductive learn...
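To make the notion of a wrapper concrete, the following is a minimal sketch of a hand-coded wrapper, not the paper's method. It assumes a hypothetical HTML page that lists countries and telephone codes, and it assumes that each attribute is enclosed by a fixed pair of left and right delimiter strings. The page contents and the names `extract_between`, `run_wrapper`, `PAGE` and `DELIMS` are illustrative and do not come from the abstract.

```python
def extract_between(page, left, right, start=0):
    """Find the first span after `start` enclosed by `left` and `right`.

    Returns (text, index just past `right`), or None if a delimiter is missing.
    """
    begin = page.find(left, start)
    if begin == -1:
        return None
    begin += len(left)
    end = page.find(right, begin)
    if end == -1:
        return None
    return page[begin:end], end + len(right)


def run_wrapper(page, delimiters):
    """Extract tuples from `page`.

    `delimiters` holds one (left, right) pair per attribute, in page order.
    Scanning stops at the first missing delimiter; a partially filled
    tuple is discarded.
    """
    rows = []
    pos = 0
    while True:
        row = []
        for left, right in delimiters:
            hit = extract_between(page, left, right, pos)
            if hit is None:
                return rows
            value, pos = hit
            row.append(value)
        rows.append(tuple(row))


# Hypothetical resource: a page formatted for people, not programs.
PAGE = (
    "<html><body>"
    "<b>Congo</b> <i>242</i><br>"
    "<b>Egypt</b> <i>20</i><br>"
    "<b>Spain</b> <i>34</i><br>"
    "</body></html>"
)

# Hand-chosen delimiters for this one page layout (an assumption).
DELIMS = [("<b>", "</b>"), ("<i>", "</i>")]

records = run_wrapper(PAGE, DELIMS)
for country, code in records:
    print(country, code)
```

The delimiters here are chosen by hand for a single page layout, which is the kind of per-resource effort the abstract calls tedious. Under the assumptions of this sketch, automatic wrapper construction would mean obtaining something like `DELIMS` from example pages instead of writing it manually.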







