The Web as a graph: measurements, models, and methods
, 1999
Abstract

Cited by 312 (11 self)
The pages and hyperlinks of the World-Wide Web may be viewed as nodes and edges in a directed graph. This graph is a fascinating object of study: it has several hundred million nodes today, over a billion links, and appears to grow exponentially with time. There are many reasons (mathematical, sociological, and commercial) for studying the evolution of this graph. In this paper we begin by describing two algorithms that operate on the Web graph, addressing problems from Web search and automatic community discovery. We then report a number of measurements and properties of this graph that manifested themselves as we ran these algorithms on the Web. Finally, we observe that traditional random graph models do not explain these observations, and we propose a new family of random graph models. These models point to a rich new subfield of the study of random graphs, and raise questions about the analysis of graph algorithms on the Web. 1 Overview Few events in the history of comput...
The Web as a graph
, 2000
Abstract

Cited by 192 (2 self)
The pages and hyperlinks of the World-Wide Web may be viewed as nodes and edges in a directed graph. This graph has about a billion nodes today, several billion links, and appears to grow exponentially with time. There are many reasons (mathematical, sociological, and commercial) for studying the evolution of this graph. We first review a set of algorithms that operate on the Web graph, addressing problems from Web search, automatic community discovery, and classification. We then recall a number of measurements and properties of the Web graph. Noting that traditional random graph models do not explain these observations, we propose a new family of random graph models.
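As a rough illustration of the directed-graph view described in this abstract, the sketch below stores hyperlinks as (source, target) pairs and tallies how many pages have each in-degree. The function names and the toy link list are hypothetical, chosen only for this example; they do not come from the paper.

```python
from collections import defaultdict

def build_web_graph(links):
    """Build out- and in-adjacency sets from (source_page, target_page) pairs."""
    out_links = defaultdict(set)
    in_links = defaultdict(set)
    for src, dst in links:
        out_links[src].add(dst)
        in_links[dst].add(src)
    return out_links, in_links

def in_degree_counts(in_links, pages):
    """Map each in-degree value to the number of pages that have it."""
    counts = defaultdict(int)
    for page in pages:
        counts[len(in_links.get(page, ()))] += 1
    return dict(counts)

# Toy data (an assumption for illustration, not a real crawl).
links = [("a", "b"), ("a", "c"), ("b", "c"), ("d", "c")]
out_links, in_links = build_web_graph(links)
pages = {page for edge in links for page in edge}
histogram = in_degree_counts(in_links, pages)
```

On a real crawl the same tally is the starting point for the degree measurements the abstract mentions, although the adjacency sets would need a more compact representation at that scale.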
Extracting Large-Scale Knowledge Bases From the Web
 Proceedings of the 25th VLDB Conference
, 1999
Abstract

Cited by 110 (2 self)
The subject of this paper is the creation of knowledge bases by enumerating and organizing all web occurrences of certain subgraphs. We focus on subgraphs that are signatures of web phenomena such as tightly-focused topic communities, webrings, taxonomy trees, keiretsus, etc. For instance, the signature of a webring is a central page with bidirectional links to a number of other pages. We develop novel algorithms for such enumeration problems. A key technical contribution is the development of a model for the evolution of the web graph, based on experimental observations derived from a snapshot of the web. We argue that our algorithms run efficiently in this model, and use the model to explain some statistical phenomena on the web that emerged during our experiments. Finally, we describe the design and implementation of Campfire, a knowledge base of over one hundred thousand web communities. 1 Overview The subject of this paper is the creation of knowledge bases by ...
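The webring signature mentioned in this abstract (a central page with bidirectional links to a number of other pages) can be sketched as a simple scan over a link list. This is a minimal sketch, not the paper's enumeration algorithm: the function name, the `min_members` threshold, and the toy data are all assumptions made for illustration.

```python
from collections import defaultdict

def find_webring_centers(links, min_members=3):
    """Return {page: mutual_neighbors} for pages with enough bidirectional links.

    `min_members` is a hypothetical cutoff for "a number of other pages".
    """
    out_links = defaultdict(set)
    for src, dst in links:
        if src != dst:  # ignore self-links
            out_links[src].add(dst)

    centers = {}
    for page, targets in list(out_links.items()):
        # Keep only targets that link back to the candidate center.
        mutual = {t for t in targets if page in out_links.get(t, ())}
        if len(mutual) >= min_members:
            centers[page] = mutual
    return centers

# Toy data: "hub" exchanges links with three members; "m1" also links out to "x".
links = [
    ("hub", "m1"), ("m1", "hub"),
    ("hub", "m2"), ("m2", "hub"),
    ("hub", "m3"), ("m3", "hub"),
    ("m1", "x"),
]
centers = find_webring_centers(links)
```

A scan like this visits every edge once and then checks each link for a reverse link, which is workable for small link lists; the abstract's claim of efficiency concerns the authors' own algorithms under their evolution model, which this sketch does not reproduce.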
Mining the Link Structure of the World Wide Web
 IEEE Computer
, 1999
Abstract

Cited by 68 (0 self)
The World Wide Web contains an enormous amount of information, but it can be exceedingly difficult for users to locate resources that are both high in quality and relevant to their information needs. We develop algorithms that exploit the hyperlink structure of the WWW for information discovery and categorization, the construction of high-quality resource lists, and the analysis of online hyperlinked communities.
Visual Analysis of Large Heterogeneous Social Networks by Semantic and Structural Abstraction
 IEEE Transactions on Visualization and Computer Graphics
Infinite Limits of Copying Models of the Web Graph
 INTERNET MATHEMATICS
Abstract

Cited by 18 (10 self)
Several stochastic models were proposed recently to model the dynamic evolution of the web graph. We study the infinite limits of the stochastic processes proposed to model the web graph when time goes to infinity. We prove that deterministic variations of the so-called copying model can lead to several non-isomorphic limits. Some models converge to the infinite random graph R, while the convergence of other models is sensitive to initial conditions or minor changes in the rules of the model. We explain how limits of the copying model of the web graph share several properties with R that seem to reflect known properties of the web graph.
Detecting Emerging Concepts in Textual Data Mining
 In Computational Information Retrieval
, 2001
Abstract

Cited by 15 (4 self)
This article summarizes our research to date in the automatic identification of emerging trends in textual data. Applications are numerous: the detection of trends in warranty repair claims, for example, is of genuine interest to NCSA industrial partners Caterpillar and Boeing. Technology forecasting is another example with numerous applications of both academic and practical interest. In general, trending analysis of textual data can be performed in any domain that involves written records of human endeavors, whether scientific or artistic in nature.
Applications of Linear Algebra to Information Retrieval and Hypertext Analysis
 Proc. ACM Symp. Principles of Database Systems Conf., Tutorial Survey
, 1999
Abstract

Cited by 14 (1 self)
Information retrieval is concerned with representing content in a form that can be easily accessed by users with information needs [61, 65]. A definition at this level of generality applies equally well to any index-based retrieval system or database application; so let us focus the topic a little more carefully. Information retrieval, as a field, works primarily with highly unstructured content, such as text documents written in natural language; it deals with information needs that are generally not formulated according to precise specifications; and its criteria for success are based in large part on the demands of a diverse set of human users. Our purpose in this short article is not to provide a survey of the field of information retrieval; for this we refer the reader to texts and surveys such as [25, 29,
A novel Web usage mining approach for search engines
 COMPUTER NETWORKS
, 2002
Abstract

Cited by 11 (0 self)
Web usage mining can be very useful to search engines. This paper proposes a novel, effective approach to exploit the relationships among users, queries, and resources based on the search engine's log. How this method can be applied is illustrated by a Chinese image search engine.
Decoding the structure of the WWW: A comparative analysis of Web crawls
 ACM Trans. Web
Abstract

Cited by 9 (2 self)
The understanding of the immense and intricate topological structure of the World Wide Web (WWW) is a major scientific and technological challenge. This has been recently tackled by characterizing the properties of its representative graphs, in which vertices and directed edges are identified with Web pages and hyperlinks, respectively. Data gathered in large-scale crawls have been analyzed by several groups, resulting in a general picture of the WWW that encompasses many of the complex properties typical of rapidly evolving networks. In this article, we report a detailed statistical analysis of the topological properties of four different WWW graphs obtained with different crawlers. We find that, despite the very large size of the samples, the statistical measures characterizing these graphs differ quantitatively, and in some cases qualitatively, depending on the domain analyzed and the crawl used for gathering the data. This spurs the issue of the presence of sampling biases and structural differences of Web crawls that might induce properties not representative of the actual global underlying graph. In short, the stability of the widely accepted statistical description of the Web is called into question. In order to provide a more accurate characterization of the Web graph, we study statistical measures beyond the degree distribution,