Combining classifiers to identify online databases (2007)
| Venue: | In Proceedings of WWW |
| Citations: | 11 - 4 self |
BibTeX
@INPROCEEDINGS{Barbosa07combiningclassifiers,
author = {Luciano Barbosa},
title = {Combining classifiers to identify online databases},
booktitle = {Proceedings of WWW},
year = {2007},
pages = {431--440}
}
Abstract
We address the problem of identifying the domain of online databases. More precisely, given a set F of Web forms automatically gathered by a focused crawler and an online database domain D, our goal is to select from F only the forms that are entry points to databases in D. Having a set of Web forms that serve as entry points to similar online databases is a requirement for many applications and techniques that aim to extract and integrate hidden-Web information, such as meta-searchers, online database directories, hidden-Web crawlers, and form-schema matching and merging. We propose a new strategy that automatically and accurately classifies online databases based on features that can be easily extracted from Web forms. By judiciously partitioning the space of form features, this strategy allows the use of simpler classifiers that can be constructed using learning techniques that are better suited for the features of each partition. Experiments using real Web data in a representative set of domains show that the use of different classifiers leads to high accuracy, precision and recall. This indicates that our modular classifier composition provides an effective and scalable solution for classifying online databases.
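The composition idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the split into "structural" and "textual" feature partitions, the hand-written rules standing in for learned classifiers, and every name below (`WebForm`, `looks_searchable`, `make_domain_classifier`, `select_forms`) are assumptions made only for illustration. Each stage reads only its own partition of the form features, and a form is selected from F only if every stage accepts it.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class WebForm:
    """Hypothetical form representation with two feature partitions."""
    # Structural partition (assumed features).
    num_text_inputs: int
    has_password_field: bool
    # Textual partition (assumed feature): visible text of the form.
    text: str


Classifier = Callable[[WebForm], bool]


def looks_searchable(form: WebForm) -> bool:
    """Stage 1: uses structural features only.

    A hand-written rule standing in for a learned classifier.
    """
    return form.num_text_inputs >= 1 and not form.has_password_field


def make_domain_classifier(domain_terms: Iterable[str], min_hits: int = 2) -> Classifier:
    """Stage 2: uses textual features only.

    Accepts a form when at least `min_hits` distinct domain terms occur in
    its text. A keyword count standing in for a learned text classifier.
    """
    terms = {t.lower() for t in domain_terms}

    def classify(form: WebForm) -> bool:
        words = set(form.text.lower().split())
        return len(words & terms) >= min_hits

    return classify


def select_forms(forms: Iterable[WebForm], stages: List[Classifier]) -> List[WebForm]:
    """Keep only the forms accepted by every stage, evaluated in order."""
    return [f for f in forms if all(stage(f) for stage in stages)]


# Toy data: the set F of gathered forms and a target domain D ("autos").
forms = [
    WebForm(2, False, "search used cars by make model and price"),
    WebForm(2, True, "login to your car dealer account"),
    WebForm(1, False, "search flights by departure city and date"),
]
is_auto = make_domain_classifier({"car", "cars", "make", "model", "dealer"})
selected = select_forms(forms, [looks_searchable, is_auto])
```

Because each stage depends on a single partition, either one could be replaced by a classifier trained with a technique suited to that kind of feature without changing the other stage or the composition function.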







