Trawling the Web for Emerging Cyber-Communities (1999) [220 citations — 6 self]

by Ravi Kumar ,  Prabhakar Raghavan ,  Sridhar Rajagopalan ,  Andrew Tomkins
Computer Networks

Abstract:

The web harbors a large number of communities - groups of content-creators sharing a common interest which manifests itself as a set of web pages. Whereas newsgroups and commercial web directories together contain on the order of 10,000 such communities, our particular interest here is in emerging communities - those that have little or no representation in such fora. The subject of this paper is the systematic enumeration of over 100,000 such emerging communities from a web crawl; we call our process trawling. We motivate a graph-theoretic approach to locating such communities, and describe the algorithms and the algorithmic engineering necessary to find structures that subscribe to this notion, the challenges in handling such a huge data set, and the results of our experiment.

1. Overview

The web has several thousand well-known, explicitly-defined communities - groups of individuals who share a common interest, together with the web pages most popular amongst them. Consider for ...

Citations

1669 Authoritative sources in a hyperlinked environment – Kleinberg - 1999
1636 Indexing by latent semantic analysis – Deerwester, Dumais, et al. - 1990
595 The Lorel Query Language for Semistructured Data – Abiteboul, Quass, et al. - 1997
349 Improved algorithms for topic distillation in hyperlinked environments – Bharat, Henzinger - 1998
263 Syntactic clustering of the Web – Broder, Glassman, et al.
253 Inferring Web communities from link topology – Gibson, Kleinberg, et al. - 1998
244 Automatic resource compilation by analyzing hyperlink structure and associated text – Chakrabarti, Dom, et al. - 1998
205 Silk from a sow’s ear: Extracting usable structures from the Web – Pirolli, Pitkow, et al. - 1996
128 A Technique for Measuring the Relative Size and Overlap of Public Web Search Engines – Bharat, Broder - 1998
101 The anatomy of a large scale hypertextual Web search engine – Brin, Page - 1998
86 WebQuery: Searching and visualizing the Web through connectivity – Carrière, Kazman
86 Finding Regular Simple Paths in Graph Databases – Mendelzon, Wood - 1995
65 Applications of a Web query language – Arocena, Mendelzon, et al.
39 Intermediaries: New Places for Producing and Manipulating Web Content – Barrett, Maglio - 1998
19 The limits of Web metadata, and beyond – Marchiori - 1998
15 Query flocks: A generalization of association-rule mining – Tsur, Ullman, et al. - 1998
11 Information gathering on the World Wide Web: the W3QL query language and the W3QS system. Trans. on Database Systems – Konopnicki, Shmueli - 1998
6 Mining the Link Structure of the World Wide Web – Chakrabarti, Dom, et al. - 1999
4 Inferring Web communities from link topology – Gibson, Kleinberg - 1998
1 Fast Algorithms for mining Association rules – Agrawal, Srikant - 1994
1 Database Techniques for the World-Wide Web: A Survey – Florescu, Levy, Mendelzon - 1998
1 Computing on data streams. AMS-DIMACS series, special issue on computing on very large datasets. Also technical note – Henzinger, Raghavan, Rajagopalan - 1998
1 Information gathering on the World Wide Web: the W3QL query language and the W3QS system. Transactions on Database Systems – Konopnicki, Shmueli - 1998
1 A Declarative Approach to Querying and Restructuring the World-Wide-Web – Lakshmanan, Sadri, et al. - 1996
1 WebOQL: Restructuring Documents – Arocena, Mendelzon - 1998
1 Experiments in Topic Distillation – Chakrabarti, Dom, et al. - 1998
1 Computing iceberg queries efficiently – Fang, Shivakumar, et al. - 1998
1 ParaSite: mining structural information on the Web – Spertus, Stein - 1998