Results 1 - 10
of
62
Authoritative Sources in a Hyperlinked Environment
- JOURNAL OF THE ACM
, 1999
"... The network structure of a hyperlinked environment can be a rich source of information about the content of the environment, provided we have effective means for understanding it. We develop a set of algorithmic tools for extracting information from the link structures of such environments, and repo ..."
Abstract
-
Cited by 2222 (9 self)
- Add to MetaCart
The network structure of a hyperlinked environment can be a rich source of information about the content of the environment, provided we have effective means for understanding it. We develop a set of algorithmic tools for extracting information from the link structures of such environments, and report on experiments that demonstrate their effectiveness in a variety of contexts on the World Wide Web. The central issue we address within our framework is the distillation of broad search topics, through the discovery of “authoritative ” information sources on such topics. We propose and test an algorithmic formulation of the notion of authority, based on the relationship between a set of relevant authoritative pages and the set of “hub pages ” that join them together in the link structure. Our formulation has connections to the eigenvectors of certain matrices associated with the link graph; these connections in turn motivate additional heuristics for link-based analysis.
Inferring Web Communities from Link Topology
, 1998
"... The World Wide Web grows through a decentralized, almost anarchic process, and this has resulted in a large hyperlinked corpus without the kind of logical organization that can be built into more traditionally-created hypermedia. To extract meaningful structure under such circumstances, we develop a ..."
Abstract
-
Cited by 298 (4 self)
- Add to MetaCart
The World Wide Web grows through a decentralized, almost anarchic process, and this has resulted in a large hyperlinked corpus without the kind of logical organization that can be built into more traditionally-created hypermedia. To extract meaningful structure under such circumstances, we develop a notion of hyperlinked communities on the www through an analysis of the link topology. Byinvoking a simple, mathematically clean method for de ning and exposing the structure of these communities, we are able to derive anumber of themes: The communities can be viewed as containing a core of central, "authoritative" pages linked together by "hub pages"; and they exhibit a natural type of hierarchical topic generalization that can be inferred directly from the pattern of linkage. Our investigation shows that although the process by which users of the Web create pages and links is very di cult to understand at a "local" level, it results in a much greater degree of orderly high-level structure than has typically been assumed.
The Web as a graph: measurements, models, and methods
, 1999
"... . The pages and hyperlinks of the World-Wide Web may be viewed as nodes and edges in a directed graph. This graph is a fascinating object of study: it has several hundred million nodes today, over a billion links, and appears to grow exponentially with time. There are many reasons --- mathematical, ..."
Abstract
-
Cited by 257 (10 self)
- Add to MetaCart
. The pages and hyperlinks of the World-Wide Web may be viewed as nodes and edges in a directed graph. This graph is a fascinating object of study: it has several hundred million nodes today, over a billion links, and appears to grow exponentially with time. There are many reasons --- mathematical, sociological, and commercial --- for studying the evolution of this graph. In this paper we begin by describing two algorithms that operate on the Web graph, addressing problems from Web search and automatic community discovery. We then report a number of measurements and properties of this graph that manifested themselves as we ran these algorithms on the Web. Finally, we observe that traditional random graph models do not explain these observations, and we propose a new family of random graph models. These models point to a rich new sub-field of the study of random graphs, and raise questions about the analysis of graph algorithms on the Web. 1 Overview Few events in the history of comput...
Trawling the Web for Emerging Cyber-Communities
- Computer Networks
, 1999
"... : The web harbors a large number of communities -- groups of content-creators sharing a common interest -- each of which manifests itself as a set of interlinked web pages. Newgroups and commercial web directories together contain of the order of 20000 such communities; our particular interest here ..."
Abstract
-
Cited by 257 (7 self)
- Add to MetaCart
: The web harbors a large number of communities -- groups of content-creators sharing a common interest -- each of which manifests itself as a set of interlinked web pages. Newgroups and commercial web directories together contain of the order of 20000 such communities; our particular interest here is on emerging communities -- those that have little or no representation in such fora. The subject of this paper is the systematic enumeration of over 100,000 such emerging communities from a web crawl: we call our process trawling. We motivate a graph-theoretic approach to locating such communities, and describe the algorithms, and the algorithmic engineering necessary to find structures that subscribe to this notion, the challenges in handling such a huge data set, and the results of our experiment. Keywords: web mining, communities, trawling, link analysis 1. Overview The web has several thousand well-known, explicitly-defined communities -- groups of individuals who share a common int...
Graph structure in the Web
- In Proceedings of the 9th International World Wide Web conference on Computer Networks: The International Journal of Computer and Telecommunications Networking
, 2000
"... The study of the web as a graph is not only fascinating in its own right, but also yields valuable insight into web algorithms for crawling, searching and community discovery, and the sociological phenomena which characterize its evolution. We report on experiments on local and global properties of ..."
Abstract
-
Cited by 185 (6 self)
- Add to MetaCart
The study of the web as a graph is not only fascinating in its own right, but also yields valuable insight into web algorithms for crawling, searching and community discovery, and the sociological phenomena which characterize its evolution. We report on experiments on local and global properties of the web graph using two Altavista crawls each with over 200 million pages and 1.5 billion links. Our study indicates that the macroscopic structure of the web is considerably more intricate than suggested by earlier experiments on a smaller scale.
The Web as a graph
, 2000
"... The pages and hyperlinks of the World-Wide Web maybe viewed as nodes and edges in a directed graph. This graph has about a billion nodes today,several billion links, and appears to grow exponentially with time. There are many reasons---mathematical, sociological, and commercial---for studying the e ..."
Abstract
-
Cited by 147 (2 self)
- Add to MetaCart
The pages and hyperlinks of the World-Wide Web maybe viewed as nodes and edges in a directed graph. This graph has about a billion nodes today,several billion links, and appears to grow exponentially with time. There are many reasons---mathematical, sociological, and commercial---for studying the evolution of this graph. We first review a set of algorithms that operate on the Web graph, addressing problems from Web search, automatic community discovery, and classification. We then recall a number of measurements and properties of the Web graph. Noting that traditional random graph models do not explain these observations, we propose a new family of random graph models.
Finding related pages in the World Wide Web
- IN INTERNATIONAL WORLD WIDE WEB CONFERENCE
, 1999
"... When using traditional search engines, users have to formulate queries to describe their information need. This paper discusses a different approach toweb searching where the input to the search process is not a set of query terms, but instead is the URL of a page, and the output is a set of related ..."
Abstract
-
Cited by 138 (1 self)
- Add to MetaCart
When using traditional search engines, users have to formulate queries to describe their information need. This paper discusses a different approach toweb searching where the input to the search process is not a set of query terms, but instead is the URL of a page, and the output is a set of related web pages. A related web page is one that addresses the same topic as the original page. For example, www.washingtonpost.com is a page related to www.nytimes.com, since both are online newspapers. We describe two algorithms to identify related web pages. These algorithms use only the connectivity information in the web (i.e., the links between pages) and not the content of pages or usage information. We haveimplemented both algorithms and measured their runtime performance. To evaluate the e ectiveness of our algorithms, we performed a user study comparing our algorithms with Netscape's \What's Related " service [12]. Our study showed that the precision at 10 for our two algorithms are 73 % better and 51 % better than that of Netscape, despite the fact that Netscape uses both content and usage pattern information in addition to connectivity information.
Trust Networks on the Semantic Web
- In Proceedings of Cooperative Intelligent Agents
, 2003
"... Abstract. The so-called "Web of Trust " is one of the ultimate goals of the Semantic Web. Research on the topic of trust in this domain has focused largely on digital signatures, certificates, and authentication. At the same time, there is a wealth of research into trust and social network ..."
Abstract
-
Cited by 109 (1 self)
- Add to MetaCart
Abstract. The so-called "Web of Trust " is one of the ultimate goals of the Semantic Web. Research on the topic of trust in this domain has focused largely on digital signatures, certificates, and authentication. At the same time, there is a wealth of research into trust and social networks in the physical world. In this paper, we describe an approach for integrating the two to build a web of trust in a more social respect. This paper describes the applicability of social network analysis to the semantic web, particularly discussing the multi-dimensional networks that evolve from ontological trust specifications. As a demonstration of algorithms used to infer trust relationships, we present several tools that allow users to take advantage of trust metrics that use the network. 1

