Extracting Patterns and Relations from the World Wide Web (1998) [209 citations — 1 self]
http://bolek.ii.pw.edu.pl/~gawrysia/WEDT/brin.pdf
http://www.ai.mit.edu/people/jimmylin/papers/Brin9
Abstract:
The World Wide Web is a vast resource for information. At the same time it is extremely distributed. A particular type of data such as restaurant lists may be scattered across thousands of independent information sources in many different formats. In this paper, we consider the problem of extracting a relation for such a data type from all of these sources automatically. We present a technique which exploits the duality between sets of patterns and relations to grow the target relation starting from a small sample. To test our technique we use it to extract a relation of (author,title) pairs from the World Wide Web.
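The pattern/relation duality mentioned in the abstract can be illustrated with a toy bootstrapping loop. The sketch below is only an illustration under stated assumptions, not the paper's implementation. The corpus strings, the seed pair, the function names (`find_patterns`, `apply_patterns`, `grow_relation`), the fixed context width `ctx`, and the (prefix, middle, suffix) pattern shape are all hypothetical choices made for this example. It also omits the pattern filtering a system running over real Web pages would need to avoid overly general patterns.

```python
import re

# Hypothetical toy corpus and seed; not data from the paper.
CORPUS = [
    "<li><b>Foundation</b> by Isaac Asimov</li>",
    "<li><b>Dune</b> by Frank Herbert</li>",
    "<li><b>Neuromancer</b> by William Gibson</li>",
    "<p>Isaac Asimov wrote many novels.</p>",
]
SEED = {("Isaac Asimov", "Foundation")}  # (author, title) pairs


def find_patterns(corpus, pairs, ctx=4):
    """Relation -> patterns: record the text surrounding each known pair."""
    patterns = set()
    for text in corpus:
        for author, title in pairs:
            a, t = text.find(author), text.find(title)
            if a == -1 or t == -1:
                continue  # this document does not mention both fields
            title_first = t < a
            first, second = (title, author) if title_first else (author, title)
            start1, start2 = min(a, t), max(a, t)
            end1, end2 = start1 + len(first), start2 + len(second)
            prefix = text[max(0, start1 - ctx):start1]
            middle = text[end1:start2]
            suffix = text[end2:end2 + ctx]
            # Require all three contexts so the lazy groups below are bounded.
            if prefix and middle and suffix:
                patterns.add((title_first, prefix, middle, suffix))
    return patterns


def apply_patterns(corpus, patterns):
    """Patterns -> relation: extract new pairs wherever a pattern matches."""
    found = set()
    for title_first, prefix, middle, suffix in patterns:
        regex = re.compile(
            re.escape(prefix) + "(.+?)" + re.escape(middle)
            + "(.+?)" + re.escape(suffix)
        )
        for text in corpus:
            for m in regex.finditer(text):
                first, second = m.group(1), m.group(2)
                found.add((second, first) if title_first else (first, second))
    return found


def grow_relation(corpus, seed, max_rounds=5):
    """Alternate between the two steps until the relation stops growing."""
    relation = set(seed)
    for _ in range(max_rounds):
        grown = relation | apply_patterns(corpus, find_patterns(corpus, relation))
        if grown == relation:
            break
        relation = grown
    return relation


result = grow_relation(CORPUS, SEED)
```

In this toy setting the single seed pair yields one pattern from the first list item, and that pattern then matches the other two list items, so `result` grows from one pair to three. The fourth document contributes nothing because it mentions an author without a title.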
Citations
| 1636 | Indexing by latent semantic analysis – Deerwester, Dumais, et al. - 1990 |
| 1 | Google search engine. http://google.stanford.edu – Brin, Page |
| 1 | List of books. http://www-db.stanford.edu/~sergey/booklist.html – Brin |
| 1 | The Young Gardeners' Kalendar – Radford - 1904 |
| 1 | Indexing by latent semantic analysis – Press - 1990 |

