Wrapper Induction for Information Extraction (1997) [408 citations — 28 self]

Abstract:

The Internet presents numerous sources of useful information: telephone directories, product catalogs, stock quotes, weather forecasts, etc. Recently, many systems have been built that automatically gather and manipulate such information on a user's behalf. However, these resources are usually formatted for use by people (e.g., the relevant content is embedded in HTML pages), so extracting their content is difficult. Wrappers are often used for this purpose. A wrapper is a procedure for extracting a particular resource's content. Unfortunately, hand-coding wrappers is tedious. We introduce wrapper induction, a technique for automatically constructing wrappers. Our techniques can be described in terms of three main contributions. First, we pose the problem of wrapper construction as one of inductive learn...

Citations

1328 A theory of the learnable – Valiant - 1984
525 Learnability and the Vapnik-Chervonenkis dimension – Blumer, Ehrenfeucht, et al. - 1989
422 An Introduction to Computational Learning Theory – Kearns, Vazirani - 1994
408 The TSIMMIS project: Integration of heterogeneous information sources – Chawathe, Garcia-Molina, et al.
365 Learning Regular Sets from Queries and Counterexamples – Angluin - 1987
272 A Scalable Comparison-Shopping Agent for the World Wide Web – Doorenbos, Etzioni, et al. - 1997
265 A survey of inductive inference: Theory and methods – Smith - 1983
261 A softbot-based interface to the Internet – Etzioni, Weld - 1994
200 Introduction to Software Agents – Bradshaw - 1997
184 Middle-agents for the internet – Decker, Sycara, et al. - 1997
179 Learning from noisy examples – Angluin, Laird - 1988
170 Query caching and optimization in distributed mediator systems – Adali, Candan, et al. - 1996
150 Resource integration using a large knowledge base in Carnot – Huhns, Collet, et al. - 1991
140 The Information Manifold – Kirk, Levy, et al. - 1995
122 Inference of reversible languages – Angluin - 1982
106 Semi-automatic wrapper generation for Internet information sources – Ashish, Knoblock
103 A comparative review of selected methods for learning from examples – Dietterich - 1983
91 Migrating Legacy Systems – Brodie, Stonebraker - 1995
90 Towards heterogeneous multimedia information systems: The Garlic approach – Carey, Haas, et al. - 1995
86 KQML–A language and protocol for knowledge and information exchange – Finin, Fritzson - 1994
68 Computational learning theory: Survey and selected bibliography – Angluin - 1992
48 The World Wide Web: Quagmire or gold mine? – Etzioni - 1996
39 Intelligence without robots (a reply to Brooks) – Etzioni - 1993
38 Learnability by fixed distributions – Benedek, Itai - 1988
30 Building softbots for UNIX (preliminary report) – Etzioni, Lesh, et al. - 1993
29 The Constraint-Based Knowledge Broker Model: Semantics, Implementation and Analysis – Andreoli, Borghoff, et al. - 1996
27 Learning to query the web – Cohen, Singer - 1996
26 Getty’s Synoname and its cousins: A survey of applications of personal name-matching algorithms – Borgman, Siegfried - 1992
21 Using natural language processing for identifying and interpreting tables in texts – Douglas, Hurst, et al. - 1995
19 Towards Sophisticated Wrapping of Web-based Information Repositories – Chidlovskii, Borghoff, et al. - 1997
16 On learning from noisy and incomplete examples – Decatur, Gennaro - 1995
12 Investigating the distributional assumptions of the PAC learning model – Bartlett, Williamson - 1991
12 On the synthesis of finite state machines from samples of their behavior – Biermann, Feldman - 1972
12 Moving up the information food chain: softbots as information carnivores – Etzioni - 1996
7 Quantifying inductive bias – Haussler - 1988
6 6th Message Understanding Conference – Proc - 1995
5 Database Language SQL – ANSI - 1986
5 Wrapper generation for semi-structured information sources – Ashish, Knoblock - 1997
4 SIMS: Single interface to multiple sources – Arens, Knoblock, et al. - 1996
2 Wrapper Construction for Information Extraction – Kushmerick - 1997
2 Data Reverse Engineering: Slaying the Legacy Dragon – Aiken - 1995
2 6th Message Understanding Conf – Proc - 1995
2 Scalable Internet discovery: Research problems and approaches – Bowman, Danzig, et al. - 1994
1 Layout and language: Lists and tables in technical documents – Douglas, Hurst - 1996