Results 1 - 10
of
33
Evolution of networks
- Adv. Phys
, 2002
"... We review the recent fast progress in statistical physics of evolving networks. Interest has focused mainly on the structural properties of random complex networks in communications, biology, social sciences and economics. A number of giant artificial networks of such a kind came into existence rece ..."
Abstract
-
Cited by 201 (1 self)
- Add to MetaCart
We review the recent fast progress in statistical physics of evolving networks. Interest has focused mainly on the structural properties of random complex networks in communications, biology, social sciences and economics. A number of giant artificial networks of such a kind came into existence recently. This opens a wide field for the study of their topology, evolution, and complex processes occurring in them. Such networks possess a rich set of scaling properties. A number of them are scale-free and show striking resilience against random breakdowns. In spite of large sizes of these networks, the distances between most their vertices are short — a feature known as the “smallworld” effect. We discuss how growing networks self-organize into scale-free structures and the role of the mechanism of preferential linking. We consider the topological and structural properties of evolving networks, and percolation in these networks. We present a number of models demonstrating the main features of evolving networks and discuss current approaches for their simulation and analytical study. Applications of the general results to particular networks in Nature are discussed. We demonstrate the generic connections of the network growth processes with the general problems
The Web as a graph
, 2000
"... The pages and hyperlinks of the World-Wide Web maybe viewed as nodes and edges in a directed graph. This graph has about a billion nodes today,several billion links, and appears to grow exponentially with time. There are many reasons---mathematical, sociological, and commercial---for studying the e ..."
Abstract
-
Cited by 147 (2 self)
- Add to MetaCart
The pages and hyperlinks of the World-Wide Web maybe viewed as nodes and edges in a directed graph. This graph has about a billion nodes today,several billion links, and appears to grow exponentially with time. There are many reasons---mathematical, sociological, and commercial---for studying the evolution of this graph. We first review a set of algorithms that operate on the Web graph, addressing problems from Web search, automatic community discovery, and classification. We then recall a number of measurements and properties of the Web graph. Noting that traditional random graph models do not explain these observations, we propose a new family of random graph models.
Extracting Large-Scale Knowledge Bases From the Web
- Proceedings of the 25th VLDB Conference
, 1999
"... The subject of this paper is the creation of knowledge bases by enumerating and organizing all web occurrences of certain subgraphs. We focus on subgraphs that are signatures of web phenomena such as tightly-focused topic communities, webrings, taxonomy trees, keiretsus, etc. For instance, the ..."
Abstract
-
Cited by 97 (2 self)
- Add to MetaCart
The subject of this paper is the creation of knowledge bases by enumerating and organizing all web occurrences of certain subgraphs. We focus on subgraphs that are signatures of web phenomena such as tightly-focused topic communities, webrings, taxonomy trees, keiretsus, etc. For instance, the signature of a webring is a central page with bidirectional links to a number of other pages. We develop novel algorithms for such enumeration problems. A key technical contribution is the development of a model for the evolution of the web graph, based on experimental observations derived from a snapshot of the web. We argue that our algorithms run efficiently in this model, and use the model to explain some statistical phenomena on the web that emerged during our experiments. Finally, we describe the design and implementation of Campfire, a knowledge base of over one hundred thousand web communities. 1 Overview The subject of this paper is the creation of knowledge bases by ...
Mining Topic Specific Concepts and Definitions on the Web
, 2003
"... Traditionally, when one wants to learn about a particular topic, one reads a book or a survey paper. With the rapid expansion of the Web, learning in-depth knowledge about a topic from the Web is becoming increasingly important and popular. This is also due to the Webs convenience and its richne ..."
Abstract
-
Cited by 34 (0 self)
- Add to MetaCart
Traditionally, when one wants to learn about a particular topic, one reads a book or a survey paper. With the rapid expansion of the Web, learning in-depth knowledge about a topic from the Web is becoming increasingly important and popular. This is also due to the Webs convenience and its richness of information. In many cases, learning from the Web may even be essential because in our fast changing world, emerging topics appear constantly and rapidly. There is often not enough time for someone to write a book on such topics. To learn such emerging topics, one can resort to research papers. However, research papers are often hard to understand by non-researchers, and few research papers cover every aspect of the topic. In contrast, many Web pages often contain intuitive descriptions of the topic. To find such Web pages, one typically uses a search engine. However, current search techniques are not designed for in-depth learning. Top ranking pages from a search engine may not contain any description of the topic. Even if they do, the description is usually incomplete since it is unlikely that the owner of the page has good knowledge of every aspect of the topic. In this paper, we attempt a novel and challenging task, mining topic-specific knowledge on the Web. Our goal is to help people learn in-depth knowledge of a topic systematically on the Web. The proposed techniques first identify those sub-topics or salient concepts of the topic, and then find and organize those informative pages, containing definitions and descriptions of the topic and sub-topics, just like those in a traditional book. Experimental results using 28 topics show that the proposed techniques are highly effective. Categories and Subject Descriptors H.3.3 [Information S...
Scaling link-based similarity search
, 2004
"... To exploit the similarity information hidden in the hyperlink structure of the web, this paper introduces algorithms scalable to graphs with billions of vertices on a distributed architecture. The similarity of multi-step neighborhoods of vertices are numerically evaluated by similarity functions in ..."
Abstract
-
Cited by 27 (1 self)
- Add to MetaCart
To exploit the similarity information hidden in the hyperlink structure of the web, this paper introduces algorithms scalable to graphs with billions of vertices on a distributed architecture. The similarity of multi-step neighborhoods of vertices are numerically evaluated by similarity functions including SimRank [20], a recursive refinement of cocitation; PSimRank, a novel variant with better theoretical characteristics; and the Jaccard coefficient, extended to multi-step neighborhoods. Our methods are presented in a general framework of Monte Carlo similarity search algorithms that precompute an index database of random fingerprints, and at query time, similarities are estimated from the fingerprints. The performance and quality of the methods were tested on the Stanford Webbase [19] graph of 80M pages by comparing our scores to similarities extracted from the ODP directory [26]. Our experimental results suggest that the hyperlink structure of vertices within four to five steps provide more adequate information for similarity search than singlestep neighborhoods.
Link Mining: A New Data Mining Challenge
- SIGKDD Explorations
, 2003
"... A key challenge for data mining is tackling the problem of mining richly structured datasets, where the objects are linked in some way. Links among the objects may demonstrate certain patterns, which can be helpful for many data mining tasks and are usually hard to capture with traditional statistic ..."
Abstract
-
Cited by 25 (0 self)
- Add to MetaCart
A key challenge for data mining is tackling the problem of mining richly structured datasets, where the objects are linked in some way. Links among the objects may demonstrate certain patterns, which can be helpful for many data mining tasks and are usually hard to capture with traditional statistical models. Recently there has been a surge of interest in this area, fueled largely by interest in web and hypertext mining, but also by interest in mining social networks, security and law enforcement data, bibliographic citations and epidemiological records. 1.
Evaluating contents-link coupled web page clustering for web search results
- In Proc. 11th Intl. Conference on Information and Knowledge Management
, 2002
"... ..."
PicASHOW: Pictorial Authority Search by Hyperlinks on the Web
, 2001
"... We describe PicASHOW, a fully automated WWW image retrieval system that is based on several link-structure analyzing algorithms. Our basic premise is that a page # displays (or links to) an image when the author of # considers the image to be of value to the viewers of the page. Wethus extend some w ..."
Abstract
-
Cited by 23 (0 self)
- Add to MetaCart
We describe PicASHOW, a fully automated WWW image retrieval system that is based on several link-structure analyzing algorithms. Our basic premise is that a page # displays (or links to) an image when the author of # considers the image to be of value to the viewers of the page. Wethus extend some well known link-based WWW #### ######### schemes to the context of image retrieval. PicASHOW's analysis of the link structure enables it to retrieve relevant images even when those are stored in les with meaningless names. The same analysis also allows it to identify ##### ########## and ##### ####. We dene these as Web pages that are rich in relevant images, or from which many images are readily accessible. PicASHOW requires no image analysis whatsoever and no creation of taxonomies for pre-classication of the Web's images. It can be implemented by standard WWW search engines with reasonable overhead, in terms of both computations and storage, and with no change to user query formats. It can thus be used to easily add image retrieving capabilities to standard search engines. Our results demonstrate that PicASHOW, while relying almost exclusively on link analysis, compares well with dedicated WWW image retrieval systems. We conclude that link analysis, a bona-de eective technique for Web page search, can improve the performance of Web image retrieval, as well as extend its denition to include the retrieval of image hubs and containers. Keywords Image Retrieval; Link Structure Analysis; Hubs and Authorities; Image Hubs. 1.
Discovering Unexpected Information from Your Competitors' Web Sites
- In Proceedings of ACM SIG KDD-2001
, 2001
"... Ever since the beginning of the Web, finding useful information from the Web has been an important problem. Existing approaches include keyword-based search, wrapper-based information extraction, Web query and user preferences. These approaches essentially find information that matches the user's ex ..."
Abstract
-
Cited by 21 (1 self)
- Add to MetaCart
Ever since the beginning of the Web, finding useful information from the Web has been an important problem. Existing approaches include keyword-based search, wrapper-based information extraction, Web query and user preferences. These approaches essentially find information that matches the user's explicit specifications. This paper argues that this is insufficient. There is another type of information that is also of great interest, i.e., unexpected information, which is unanticipated by the user. Finding unexpected information is useful in many applications. For example, it is useful for a company to find unexpected information about its competitors, e.g., unexpected services and products that its competitors offer. With this information, the company can learn from its competitors and/or design counter measures to improve its competitiveness. Since the number of pages of a typical commercial site is very large and there are also many relevant sites (competitors), it is very difficult for a human user to view each page to discover the unexpected information. Automated assistance is needed. In this paper, we propose a number of methods to help the user find various types of unexpected information from his/her competitors' Web sites. Experiment results show that these techniques are very useful in practice and also efficient. Keywords Information interestingness, Web comparison, Web mining. 1.
Beyond Similarity
- IN PROCEEDINGS OF THE 2000 WORKSHOP ON ARTIFICIAL INTELLIGENCE AND WEB SEARCH
, 2000
"... Agents that provide just-in-time access to relevant online material by observing user behavior in everyday applications have been the focus of much research, both in our lab, and elsewhere. These systems analyze information objects the user is manipulating in order to recommend additional infor ..."
Abstract
-
Cited by 16 (2 self)
- Add to MetaCart
Agents that provide just-in-time access to relevant online material by observing user behavior in everyday applications have been the focus of much research, both in our lab, and elsewhere. These systems analyze information objects the user is manipulating in order to recommend additional information. Designers of such systems typically make the assumption that objects similar to the one being manipulated by the user will be useful to her. Our own experiments show that users do find many of the documents retrieved by a system of this type are relevant. Yet in the context of a specific task, users find fewer of these documents are useful. Our main point is that in order to make just-in-time information systems truly useful, we need to reexamine the "similarity assumption" inherent in many of these systems' designs. In light of this, we propose techniques that bring modest amounts of task-specific knowledge to bear in order to perform lexical transformations on the queri...

