The Web as a graph: measurements, models, and methods
, 1999
Cited by 257 (10 self)
The pages and hyperlinks of the World-Wide Web may be viewed as nodes and edges in a directed graph. This graph is a fascinating object of study: it has several hundred million nodes today, over a billion links, and appears to grow exponentially with time. There are many reasons --- mathematical, sociological, and commercial --- for studying the evolution of this graph. In this paper we begin by describing two algorithms that operate on the Web graph, addressing problems from Web search and automatic community discovery. We then report a number of measurements and properties of this graph that manifested themselves as we ran these algorithms on the Web. Finally, we observe that traditional random graph models do not explain these observations, and we propose a new family of random graph models. These models point to a rich new sub-field of the study of random graphs, and raise questions about the analysis of graph algorithms on the Web.
The Web as a graph
, 2000
Cited by 147 (2 self)
The pages and hyperlinks of the World-Wide Web may be viewed as nodes and edges in a directed graph. This graph has about a billion nodes today, several billion links, and appears to grow exponentially with time. There are many reasons --- mathematical, sociological, and commercial --- for studying the evolution of this graph. We first review a set of algorithms that operate on the Web graph, addressing problems from Web search, automatic community discovery, and classification. We then recall a number of measurements and properties of the Web graph. Noting that traditional random graph models do not explain these observations, we propose a new family of random graph models.
Extracting Large-Scale Knowledge Bases From the Web
- Proceedings of the 25th VLDB Conference
, 1999
Cited by 97 (2 self)
The subject of this paper is the creation of knowledge bases by enumerating and organizing all web occurrences of certain subgraphs. We focus on subgraphs that are signatures of web phenomena such as tightly-focused topic communities, webrings, taxonomy trees, keiretsus, etc. For instance, the signature of a webring is a central page with bidirectional links to a number of other pages. We develop novel algorithms for such enumeration problems. A key technical contribution is the development of a model for the evolution of the web graph, based on experimental observations derived from a snapshot of the web. We argue that our algorithms run efficiently in this model, and use the model to explain some statistical phenomena on the web that emerged during our experiments. Finally, we describe the design and implementation of Campfire, a knowledge base of over one hundred thousand web communities.
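The webring signature mentioned in this abstract (a central page with bidirectional links to a number of other pages) can be illustrated with a small sketch. This is not the paper's enumeration algorithm; the function name `find_webring_centers`, the `min_members` threshold, and the toy edge list are assumptions made purely for illustration.

```python
from collections import defaultdict


def find_webring_centers(edges, min_members=3):
    """Map each candidate central page to its sorted list of mutual neighbours.

    edges: iterable of (source, target) hyperlinks.
    min_members: hypothetical threshold for "a number of other pages".
    """
    # Build the out-link set of every page, ignoring self-links.
    out_links = defaultdict(set)
    for src, dst in edges:
        if src != dst:
            out_links[src].add(dst)

    centers = {}
    for page, targets in list(out_links.items()):
        # A link is bidirectional when the target also links back to the page.
        mutual = sorted(t for t in targets if page in out_links.get(t, ()))
        if len(mutual) >= min_members:
            centers[page] = mutual
    return centers


# Toy graph (invented data): "hub" exchanges links with a, b and c.
edges = [
    ("hub", "a"), ("a", "hub"),
    ("hub", "b"), ("b", "hub"),
    ("hub", "c"), ("c", "hub"),
    ("a", "x"),
]
print(find_webring_centers(edges))
```

A scan like this only checks one signature on an in-memory edge list; enumerating such subgraphs at web scale is the harder problem the abstract refers to.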
Mining the Link Structure of the World Wide Web
- IEEE Computer
, 1999
Cited by 53 (0 self)
The World Wide Web contains an enormous amount of information, but it can be exceedingly difficult for users to locate resources that are both high in quality and relevant to their information needs. We develop algorithms that exploit the hyperlink structure of the WWW for information discovery and categorization, the construction of high-quality resource lists, and the analysis of on-line hyperlinked communities.
Infinite Limits of Copying Models of the Web Graph
- Internet Mathematics
Cited by 13 (8 self)
Several stochastic models were proposed recently to model the dynamic evolution of the web graph. We study the infinite limits of the stochastic processes proposed to model the web graph when time goes to infinity. We prove that deterministic variations of the so-called copying model can lead to several nonisomorphic limits. Some models converge to the infinite random graph R, while the convergence of other models is sensitive to initial conditions or minor changes in the rules of the model. We explain how limits of the copying model of the web graph share several properties with R that seem to reflect known properties of the web graph.
Detecting Emerging Concepts in Textual Data Mining
- In Computational Information Retrieval
, 2001
Cited by 9 (4 self)
This article summarizes our research to date in the automatic identification of emerging trends in textual data. Applications are numerous: the detection of trends in warranty repair claims, for example, is of genuine interest to NCSA industrial partners Caterpillar and Boeing. Technology forecasting is another example with numerous applications of both academic and practical interest. In general, trending analysis of textual data can be performed in any domain that involves written records of human endeavors, whether scientific or artistic in nature.
Visual analysis of large heterogeneous social networks by semantic and structural abstraction
- IEEE Transactions on Visualization and Computer Graphics
, 2006
Cited by 9 (1 self)
Social network analysis is an active area of study beyond sociology. It uncovers the invisible relationships between actors in a network and provides understanding of social processes and behaviors. It has become an important technique in a variety of application areas such as the Web, organizational studies, and homeland security. This paper presents a visual analytics tool, OntoVis, for understanding large, heterogeneous social networks, in which nodes and links could represent different concepts and relations, respectively. These concepts and relations are related through an ontology (also known as a schema). OntoVis is so named because it uses information in the ontology associated with a social network to semantically prune a large, heterogeneous network. In addition to semantic abstraction, OntoVis also allows users to do structural abstraction and importance filtering to make large networks manageable and to facilitate analytic reasoning. All these capabilities of OntoVis are illustrated with several case studies.
A novel Web usage mining approach for search engines
- Computer Networks
, 2002
Cited by 7 (0 self)
Web usage mining can be very useful to search engines. This paper proposes a novel, effective approach to exploit the relationships among users, queries and resources based on the search engine's log. How this method can be applied is illustrated with a Chinese image search engine.
Decoding the structure of the WWW: A comparative analysis of Web crawls
- ACM Trans. Web
Cited by 4 (2 self)
The understanding of the immense and intricate topological structure of the World Wide Web (WWW) is a major scientific and technological challenge. This has been recently tackled by characterizing the properties of its representative graphs, in which vertices and directed edges are identified with Web pages and hyperlinks, respectively. Data gathered in large-scale crawls have been analyzed by several groups, resulting in a general picture of the WWW that encompasses many of the complex properties typical of rapidly evolving networks. In this article, we report a detailed statistical analysis of the topological properties of four different WWW graphs obtained with different crawlers. We find that, despite the very large size of the samples, the statistical measures characterizing these graphs differ quantitatively, and in some cases qualitatively, depending on the domain analyzed and the crawl used for gathering the data. This raises the issue of sampling biases and structural differences of Web crawls that might induce properties not representative of the actual global underlying graph. In short, the stability of the widely accepted statistical description of the Web is called into question. In order to provide a more accurate characterization of the Web graph, we study statistical measures beyond the degree distribution,
On Approximation Algorithms for Data Mining Applications
, 2002
Cited by 1 (0 self)
We aim to present current trends in the theoretical computer science research on topics which have applications in data mining. We briefly describe data mining tasks in various application contexts. We give an overview of some of the questions and algorithmic issues that are of concern when mining huge amounts of data that do not fit in main memory.

