What can you do with a Web in your Pocket? (1998) [41 citations — 2 self]
Abstract:
The amount of information available online has grown enormously over the past decade. Fortunately, computing power, disk capacity, and network bandwidth have also increased dramatically. It is currently possible for a university research project to store and process the entire World Wide Web. Since there is a limit on how much text humans can generate, it is plausible that within a few decades one will be able to store and process all the human-generated text on the Web in a shirt pocket. The Web is a very rich and interesting data source. In this paper, we describe the Stanford WebBase, a local repository of a significant portion of the Web. Furthermore, we describe a number of recent experiments that leverage the size and the diversity of the WebBase. First, we have largely automated the process of extracting a sizable relation of books (title, author pairs) from hundreds of data sources spread across the World Wide Web using a technique we call Dual Iterative Pattern Relation Extra...
Citations
| 1839 | The Anatomy of a Large-Scale Hypertextual Web Search Engine – Brin, Page - 1998 |
| 1064 | The PageRank Citation Ranking: Bringing Order to the Web – Page, Brin, et al. - 1999 |
| 209 | Extracting patterns and relations from the world wide web – Brin - 1998 |
| 128 | A Technique for Measuring the Relative Size and Overlap of Public Web Search Engines – Bharat, Broder - 1998 |
| 93 | GENVL and WWWW: Tools for Taming the Web – McBryan - 1994 |
| 49 | Finding near-replicas of documents on the Web – Shivakumar, Garcia-Molina - 1998 |
| 5 | Speculation in the biomedical community abounds over likely candidates for Nobel – Sankaran - 1995 |
| 4 | Google search engine. http://google.stanford.edu – Brin, Page |
| 4 | Dynamic data mining: Exploring large rule space by sampling – Brin, Page - 1999 |
| 4 | Search engine watch. http://www.searchenginewatch.com – Sullivan - 2000 |

