The Web as a graph: measurements, models, and methods
, 1999
Cited by 257 (10 self)
The pages and hyperlinks of the World-Wide Web may be viewed as nodes and edges in a directed graph. This graph is a fascinating object of study: it has several hundred million nodes today, over a billion links, and appears to grow exponentially with time. There are many reasons --- mathematical, sociological, and commercial --- for studying the evolution of this graph. In this paper we begin by describing two algorithms that operate on the Web graph, addressing problems from Web search and automatic community discovery. We then report a number of measurements and properties of this graph that manifested themselves as we ran these algorithms on the Web. Finally, we observe that traditional random graph models do not explain these observations, and we propose a new family of random graph models. These models point to a rich new sub-field of the study of random graphs, and raise questions about the analysis of graph algorithms on the Web.
The Web as a graph
, 2000
Cited by 147 (2 self)
The pages and hyperlinks of the World-Wide Web may be viewed as nodes and edges in a directed graph. This graph has about a billion nodes today, several billion links, and appears to grow exponentially with time. There are many reasons --- mathematical, sociological, and commercial --- for studying the evolution of this graph. We first review a set of algorithms that operate on the Web graph, addressing problems from Web search, automatic community discovery, and classification. We then recall a number of measurements and properties of the Web graph. Noting that traditional random graph models do not explain these observations, we propose a new family of random graph models.
Finding related pages in the World Wide Web
- In International World Wide Web Conference
, 1999
Cited by 138 (1 self)
When using traditional search engines, users have to formulate queries to describe their information need. This paper discusses a different approach to web searching where the input to the search process is not a set of query terms, but instead is the URL of a page, and the output is a set of related web pages. A related web page is one that addresses the same topic as the original page. For example, www.washingtonpost.com is a page related to www.nytimes.com, since both are online newspapers. We describe two algorithms to identify related web pages. These algorithms use only the connectivity information in the web (i.e., the links between pages) and not the content of pages or usage information. We have implemented both algorithms and measured their runtime performance. To evaluate the effectiveness of our algorithms, we performed a user study comparing our algorithms with Netscape's "What's Related" service [12]. Our study showed that the precision at 10 of our two algorithms is 73% better and 51% better than that of Netscape, despite the fact that Netscape uses both content and usage pattern information in addition to connectivity information.
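The abstract above does not spell out the two algorithms, so the following is only a minimal sketch of one connectivity-only idea, co-citation counting: pages that are frequently linked from the same parent pages as the input URL are treated as related. The function name `cocited_pages`, the `links` mapping, and the `*.example` hosts are hypothetical and chosen purely for illustration.

```python
from collections import Counter


def cocited_pages(links, seed, top_k=10):
    """Rank pages by how many parent pages they share with `seed`.

    `links` maps each page URL to the set of URLs it links to
    (an assumed in-memory stand-in for real connectivity data).
    """
    # Parents of the seed: pages that contain a link to it.
    parents = [page for page, targets in links.items() if seed in targets]

    # Every other page linked from a parent is co-cited with the seed once.
    counts = Counter()
    for parent in parents:
        for sibling in links[parent]:
            if sibling != seed:
                counts[sibling] += 1

    # Highest shared-parent count first; ties broken by URL for stable output.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_k]


# Toy link data (hypothetical): two hub pages link to both newspapers.
links = {
    "hub-a.example": {"www.nytimes.com", "www.washingtonpost.com"},
    "hub-b.example": {"www.nytimes.com", "www.washingtonpost.com", "weather.example"},
    "hub-c.example": {"weather.example"},
}

print(cocited_pages(links, "www.nytimes.com"))
```

On this toy data, www.washingtonpost.com shares two parents with www.nytimes.com and therefore ranks first. No page content or usage data is consulted, which matches the connectivity-only constraint stated in the abstract.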
Node Similarity in Networked Information Spaces
- In Proceedings of the Conference of the IBM Centre for Advanced Studies on Collaborative Research (CASCON’01). IBM
, 2001
Cited by 15 (2 self)
Networked information spaces contain information entities, corresponding to nodes, which are connected by associations, corresponding to links in the network. Examples of networked information spaces are: the World Wide Web, where information entities are web pages, and associations are hyperlinks; the scientific literature, where information entities are articles and associations are references to other articles. Similarity between information entities in a networked information space can be defined not only based on the content of the information entities, but also based on the connectivity established by the associations present. This paper explores the definition of similarity based on connectivity only, and proposes several algorithms for this purpose. Our metrics take advantage of the local neighborhoods of the nodes in the net- [...] is not required, as long as a query engine is available for following links and extracting the necessary local neighbourhoods for similarity estimation. Two variations of similarity estimation between two nodes are described, one based on the separate local neighbourhoods of the nodes, and another based on the joint local neighbourhood expanded from both nodes at the same time. The algorithms are implemented and evaluated on the citation graph of computer science. The immediate application of this work is in finding papers similar to a given paper on the Web.
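As a rough illustration of the "separate local neighbourhoods" variation mentioned above, the sketch below extracts each node's neighbourhood and compares the two sets. The Jaccard overlap used here is an assumption made for illustration, not necessarily the metric proposed in the paper, and the names `local_neighbourhood`, `neighbourhood_similarity`, and the toy citation graph are hypothetical.

```python
def local_neighbourhood(graph, node, radius=1):
    """Return all nodes within `radius` hops of `node` (including `node`).

    `graph` maps each node to the set of its neighbours; edges are assumed
    to be stored in both directions, so direction is ignored.
    """
    seen = {node}
    frontier = {node}
    for _ in range(radius):
        next_frontier = set()
        for current in frontier:
            for neighbour in graph.get(current, ()):
                if neighbour not in seen:
                    seen.add(neighbour)
                    next_frontier.add(neighbour)
        frontier = next_frontier
    return seen


def neighbourhood_similarity(graph, a, b, radius=1):
    """Jaccard overlap of the separate local neighbourhoods of `a` and `b`."""
    # Exclude the two endpoints so a direct link does not inflate the score.
    hood_a = local_neighbourhood(graph, a, radius) - {a, b}
    hood_b = local_neighbourhood(graph, b, radius) - {a, b}
    union = hood_a | hood_b
    if not union:
        return 0.0
    return len(hood_a & hood_b) / len(union)


# Toy citation graph (hypothetical): two papers share one reference.
graph = {
    "paper1": {"ref_x", "ref_y"},
    "paper2": {"ref_y", "ref_z"},
    "ref_x": {"paper1"},
    "ref_y": {"paper1", "paper2"},
    "ref_z": {"paper2"},
}

print(neighbourhood_similarity(graph, "paper1", "paper2"))
```

Only the neighbourhoods of the two query nodes are touched, so the full graph never has to be held in memory; a query engine that can follow links from a node would be enough to supply `graph.get`.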

