Results 1 - 4 of 4
Google’s WebTables and Deep Web Crawler
Abstract
... identify and deliver this otherwise inaccessible resource directly to end users.
by Michael J. Cafarella, Alon Halevy, ...
WebSets: Unsupervised Information Extraction approach to Extract Sets of Entities from the Web [Extended Abstract]
Abstract
We propose an unsupervised information extraction system, which exploits the structured information in the form of HTML tables to build meaningful sets of entities belonging to certain categories. Due to redundancy on the Web, we believe that entities belonging to important categories will frequently co-occur in table columns. We present a clustering algorithm to cluster such frequently occurring entities ...
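The co-occurrence idea can be illustrated with a minimal Python sketch, assuming each HTML table column has already been extracted as a list of cell strings. This is not the WebSets clustering algorithm. As a hypothetical stand-in, it counts how many columns each pair of entities shares and then takes connected components (via union-find) over pairs that co-occur in at least min_support columns. The names cooccurrence_counts, cluster_entities and min_support are illustrative only.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(columns):
    """Count how many table columns each unordered entity pair shares."""
    counts = Counter()
    for column in columns:
        # Normalize cells and de-duplicate them within one column.
        entities = sorted({cell.strip().lower() for cell in column if cell.strip()})
        for pair in combinations(entities, 2):
            counts[pair] += 1
    return counts


def cluster_entities(columns, min_support=2):
    """Group entities linked by pairs seen together in >= min_support columns.

    Hypothetical stand-in: connected components of a thresholded
    co-occurrence graph, computed with union-find.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    for (a, b), n in cooccurrence_counts(columns).items():
        if n >= min_support:
            union(a, b)

    clusters = {}
    for entity in list(parent):
        clusters.setdefault(find(entity), set()).add(entity)
    return sorted((sorted(c) for c in clusters.values()), key=lambda c: c[0])


if __name__ == "__main__":
    sample = [
        ["Paris", "London", "Berlin"],
        ["paris", "London", "Madrid"],
        ["Berlin", "Paris", "London"],
        ["Red", "Green", "Blue"],
        ["Blue", "Red", "Green"],
    ]
    print(cluster_entities(sample))
```

On the sample columns the three cities and the three colors form two clusters, while "Madrid", which appears in only one column, never reaches the threshold and is left out. Entities with no sufficiently frequent partner are omitted rather than returned as singletons; a real system would need a policy for them.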
WebSets: Extracting Sets of Entities from the Web Using Unsupervised Information Extraction
Abstract
We describe an open-domain information extraction method for extracting concept-instance pairs from an HTML corpus. Most earlier approaches to this problem rely on combining clusters of distributionally similar terms and concept-instance pairs obtained with Hearst patterns. In contrast, our method relies on a novel approach for clustering terms found in HTML tables, and then assigning concept names to these clusters using Hearst patterns. The method can be efficiently applied to a large corpus, and experimental results on several datasets show that our method can accurately extract large numbers of concept-instance pairs.
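The step of assigning concept names to clusters with Hearst patterns can be sketched in Python as below, assuming the term clusters already exist and the corpus is plain text. The two regular expressions are hypothetical, heavily simplified forms of the patterns "X such as A, B and C" and "A and other X" that handle single-word concepts and instances only. Each cluster is labeled with the concept most often paired with its members; this majority vote is an illustrative choice, not the paper's scoring.

```python
import re
from collections import Counter

# Simplified Hearst patterns (hypothetical): single-word concepts and instances.
SUCH_AS = re.compile(
    r"(\w+) such as (\w+(?:, (?!and\b|or\b)\w+)*(?:,? (?:and|or) \w+)?)",
    re.IGNORECASE,
)
AND_OTHER = re.compile(r"(\w+) (?:and|or) other (\w+)", re.IGNORECASE)
# Separators inside an enumeration: ", ", ", and ", " and ", " or ", ...
LIST_SEP = re.compile(r",\s*(?:(?:and|or)\s+)?|\s+(?:and|or)\s+", re.IGNORECASE)


def hearst_pairs(text):
    """Return lower-cased (concept, instance) pairs matched in text."""
    pairs = []
    for m in SUCH_AS.finditer(text):
        concept = m.group(1).lower()
        for instance in LIST_SEP.split(m.group(2)):
            pairs.append((concept, instance.lower()))
    for m in AND_OTHER.finditer(text):
        pairs.append((m.group(2).lower(), m.group(1).lower()))
    return pairs


def label_cluster(cluster, text):
    """Name a cluster with the concept most often paired with its members."""
    members = {e.lower() for e in cluster}
    votes = Counter(c for c, inst in hearst_pairs(text) if inst in members)
    return votes.most_common(1)[0][0] if votes else None


if __name__ == "__main__":
    corpus = ("We visited cities such as Paris, London, and Berlin. "
              "Madrid and other cities were skipped.")
    print(label_cluster(["Paris", "London", "Madrid"], corpus))
```

Splitting the work this way mirrors the order in the abstract: clustering supplies coherent groups of terms, and the patterns only have to agree on a name for each group rather than extract every instance themselves.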
AIDA: An Online Tool for Accurate Disambiguation of Named Entities in Text and Tables
Abstract
We present AIDA, a framework and online tool for entity detection and disambiguation. Given a natural-language text or a Web table, we map mentions of ambiguous names onto canonical entities like people or places, registered in a knowledge base like DBpedia, Freebase, or YAGO. AIDA is a robust framework centred around collective disambiguation exploiting the prominence of entities, similarity between the context of the mention and its candidates, and the coherence among candidate entities for all mentions. We have developed a Web-based online interface for AIDA where different formats of inputs can be processed on the fly, returning proper entities and showing intermediate steps of the disambiguation process.
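The three signals named in this abstract (entity prominence, mention-context similarity, and coherence among the chosen entities) can be combined in a toy joint objective. The Python sketch below is a hypothetical stand-in, not AIDA's algorithm: it brute-forces every combination of candidates, so it only suits a handful of mentions. The weights alpha, beta and gamma and the input tables prior, context_sim and coherence are illustrative assumptions.

```python
from itertools import combinations, product


def disambiguate(mentions, prior, context_sim, coherence,
                 alpha=1.0, beta=1.0, gamma=1.0):
    """Jointly pick one entity per mention by exhaustive search.

    mentions: dict mention -> list of candidate entity ids
    prior[(m, e)]: prominence of entity e for mention m (hypothetical input)
    context_sim[(m, e)]: similarity of m's context to e (hypothetical input)
    coherence(e1, e2): relatedness of two entities (hypothetical input)
    """
    names = list(mentions)
    best, best_score = None, float("-inf")
    for choice in product(*(mentions[m] for m in names)):
        # Per-mention evidence: prominence plus context similarity.
        local = sum(alpha * prior[(m, e)] + beta * context_sim[(m, e)]
                    for m, e in zip(names, choice))
        # Collective evidence: coherence over every pair of chosen entities.
        pairwise = sum(coherence(a, b) for a, b in combinations(choice, 2))
        score = local + gamma * pairwise
        if score > best_score:
            best, best_score = dict(zip(names, choice)), score
    return best


if __name__ == "__main__":
    # Toy inputs for "They performed Kashmir, written by Page and Plant."
    mentions = {"Kashmir": ["Kashmir_(region)", "Kashmir_(song)"],
                "Page": ["Larry_Page", "Jimmy_Page"]}
    prior = {("Kashmir", "Kashmir_(region)"): 0.9,
             ("Kashmir", "Kashmir_(song)"): 0.1,
             ("Page", "Larry_Page"): 0.6,
             ("Page", "Jimmy_Page"): 0.4}
    context_sim = {key: 0.0 for key in prior}

    def coherence(a, b):
        return 2.0 if {a, b} == {"Kashmir_(song)", "Jimmy_Page"} else 0.0

    print(disambiguate(mentions, prior, context_sim, coherence))
```

In the example the priors alone favor Kashmir_(region) and Larry_Page (gamma=0.0 returns that pair), but the strong coherence between Kashmir_(song) and Jimmy_Page flips the joint choice, which is the effect collective disambiguation is meant to capture. Exhaustive search grows exponentially with the number of mentions, so a practical system needs a more scalable inference procedure.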

