@MISC{_understandingtables, author = {}, title = {Understanding Tables on the Web}, year = {} }
Abstract
The Web contains a wealth of information, and a key challenge is to make this information machine-processable. Because natural language understanding at Web scale remains difficult and costly, in this paper we focus on understanding well-structured HTML tables on the Web. From 0.3 billion Web documents, we obtain 1.95 billion tables, of which 0.5% to 1% contain meaningful information about various entities and their properties. Our work focuses on detecting these tables, understanding their content, and using the obtained information and knowledge to support important applications such as search. Our starting point is a rich, general-purpose taxonomy whose content is harvested automatically from the Web and search log data. We use the taxonomy to help us interpret and understand tables. We then use the content we understand to enrich the taxonomy, which, in turn, enables us to understand more tables. We report large-scale experimental results that demonstrate the feasibility of this approach, and we build a semantic search engine over tables to demonstrate how structured data can empower information retrieval on the Web.
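The feedback loop between the taxonomy and table understanding can be sketched as follows. This is a toy illustration under our own assumptions, not the paper's algorithm: the taxonomy is modeled as a plain dict from concept names to sets of instance strings, a table is a list of columns, and the helper names `best_concept` and `bootstrap` and the `min_coverage` threshold are hypothetical. A column is labeled with the concept whose known instances cover the largest share of its cells, and its unseen cells are then added to that concept, so later rounds can interpret tables that were not understood at first.

```python
from collections import Counter


def best_concept(column, taxonomy, min_coverage=0.5):
    """Return the concept whose known instances cover the largest share of
    the column's cells, or None if no concept reaches min_coverage.
    (Hypothetical helper; exact string matching only.)"""
    counts = Counter()
    for cell in column:
        for concept, instances in taxonomy.items():
            if cell in instances:
                counts[concept] += 1
    if not counts:
        return None
    concept, hits = counts.most_common(1)[0]
    return concept if hits / len(column) >= min_coverage else None


def bootstrap(tables, taxonomy, min_coverage=0.5, max_rounds=10):
    """Alternate between labeling table columns with the taxonomy and
    enriching the taxonomy with the labeled columns' unseen cells.
    Works on a copy, so the caller's taxonomy is not modified."""
    taxonomy = {concept: set(instances) for concept, instances in taxonomy.items()}
    labels = {}  # (table index, column index) -> concept name
    for _ in range(max_rounds):
        changed = False
        for t_idx, table in enumerate(tables):
            for c_idx, column in enumerate(table):
                concept = best_concept(column, taxonomy, min_coverage)
                if concept is None:
                    continue  # not understood yet; may succeed in a later round
                labels[(t_idx, c_idx)] = concept
                new_instances = set(column) - taxonomy[concept]
                if new_instances:
                    taxonomy[concept] |= new_instances  # enrich the taxonomy
                    changed = True
        if not changed:
            break  # fixed point: no table added anything new
    return labels, taxonomy


if __name__ == "__main__":
    seed = {"country": {"France", "Japan"}, "city": {"Paris", "Tokyo"}}
    tables = [
        [["Brazil", "Kenya"], ["Brasilia", "Nairobi"]],                # nothing known yet
        [["France", "Japan", "Brazil"], ["Paris", "Tokyo", "Brasilia"]],
    ]
    labels, taxonomy = bootstrap(tables, seed)
    print(labels)
    print(sorted(taxonomy["country"]))
```

Here the first table contains no known instances, so it stays unlabeled in the first round; it becomes interpretable in the second round, after the second table has added "Brazil" and "Brasilia" to the taxonomy. A real system would need far more robust entity matching and safeguards against enriching the taxonomy with noisy values.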