Annotating and Searching Web Tables Using Entities, Types and Relationships

by Girija Limaye, Sunita Sarawagi, Soumen Chakrabarti

Results 1 - 4 of 4

Google’s WebTables and Deep Web Crawler

by Jayant Madhavan
identify and deliver this otherwise inaccessible resource directly to end users. by Michael J. Cafarella, Alon Halevy,

WebSets: Unsupervised Information Extraction approach to Extract Sets of Entities from the Web [Extended Abstract]

by Bhavana Dalvi, William Cohen, Jamie Callan
We propose an unsupervised information extraction system, which exploits the structured information in the form of HTML tables to build meaningful sets of entities belonging to certain categories. Due to redundancy on the Web, we believe that entities belonging to important categories will frequently co-occur in table columns. We present a clustering algorithm to cluster such frequently occurring entities

WebSets: Extracting Sets of Entities from the Web Using Unsupervised Information Extraction

by Bhavana Dalvi, William Cohen, Jamie Callan
We describe an open-domain information extraction method for extracting concept-instance pairs from an HTML corpus. Most earlier approaches to this problem rely on combining clusters of distributionally similar terms and concept-instance pairs obtained with Hearst patterns. In contrast, our method relies on a novel approach for clustering terms found in HTML tables, and then assigning concept names to these clusters using Hearst patterns. The method can be efficiently applied to a large corpus, and experimental results on several datasets show that our method can accurately extract large numbers of concept-instance pairs.

AIDA: An Online Tool for Accurate Disambiguation of Named Entities in Text and Tables

by Mohamed Amir Yosef, Johannes Hoffart, Ilaria Bordino, Marc Spaniol, Gerhard Weikum
We present AIDA, a framework and online tool for entity detection and disambiguation. Given a natural-language text or a Web table, we map mentions of ambiguous names onto canonical entities like people or places, registered in a knowledge base like DBpedia, Freebase, or YAGO. AIDA is a robust framework centred around collective disambiguation exploiting the prominence of entities, similarity between the context of the mention and its candidates, and the coherence among candidate entities for all mentions. We have developed a Web-based online interface for AIDA where different formats of inputs can be processed on the fly, returning proper entities and showing intermediate steps of the disambiguation process.