Results 11 - 20
of
1,870
The Web as a graph: measurements, models, and methods
, 1999
"... . The pages and hyperlinks of the World-Wide Web may be viewed as nodes and edges in a directed graph. This graph is a fascinating object of study: it has several hundred million nodes today, over a billion links, and appears to grow exponentially with time. There are many reasons --- mathematical, ..."
Abstract
-
Cited by 257 (10 self)
- Add to MetaCart
. The pages and hyperlinks of the World-Wide Web may be viewed as nodes and edges in a directed graph. This graph is a fascinating object of study: it has several hundred million nodes today, over a billion links, and appears to grow exponentially with time. There are many reasons --- mathematical, sociological, and commercial --- for studying the evolution of this graph. In this paper we begin by describing two algorithms that operate on the Web graph, addressing problems from Web search and automatic community discovery. We then report a number of measurements and properties of this graph that manifested themselves as we ran these algorithms on the Web. Finally, we observe that traditional random graph models do not explain these observations, and we propose a new family of random graph models. These models point to a rich new sub-field of the study of random graphs, and raise questions about the analysis of graph algorithms on the Web. 1 Overview Few events in the history of comput...
Rank Aggregation Methods for the Web
, 2001
"... We consider the problem of combining ranking results from various sources. In the context of the Web, the main applications include building meta-search engines, combining ranking functions, selecting documents based on multiple criteria, and improving search precision through word associations. Wed ..."
Abstract
-
Cited by 235 (4 self)
- Add to MetaCart
We consider the problem of combining ranking results from various sources. In the context of the Web, the main applications include building meta-search engines, combining ranking functions, selecting documents based on multiple criteria, and improving search precision through word associations. Wedevelop a set of techniques for the rank aggregation problem and compare their performance to that of well-known methods. A primary goal of our work is to design rank aggregation techniques that can effectively combat "spam," a serious problem in Web searches. Experiments show that our methods are simple, efficient, and effective. Keywords: rank aggregation, ranking functions, metasearch, multi-word queries, spam 1.
Mining the Network Value of Customers
- In Proceedings of the Seventh International Conference on Knowledge Discovery and Data Mining
, 2002
"... One of the major applications of data mining is in helping companies determine which potential customers to market to. If the expected pro t from a customer is greater than the cost of marketing to her, the marketing action for that customer is executed. So far, work in this area has considered only ..."
Abstract
-
Cited by 217 (10 self)
- Add to MetaCart
One of the major applications of data mining is in helping companies determine which potential customers to market to. If the expected pro t from a customer is greater than the cost of marketing to her, the marketing action for that customer is executed. So far, work in this area has considered only the intrinsic value of the customer (i.e, the expected pro t from sales to her). We propose to model also the customer's network value: the expected pro t from sales to other customers she may inuence to buy, the customers those may inuence, and so on recursively. Instead of viewing a market as a set of independent entities, we view it as a social network and model it as a Markov random eld. We show the advantages of this approach using a social network mined from a collaborative ltering database. Marketing that exploits the network value of customers|also known as viral marketing|can be extremely eective, but is still a black art. Our work can be viewed as a step towards providing a more solid foundation for it, taking advantage of the availability of large relevant databases. Categories and Subject Descriptors H.2.8 [Database Management]: Database Applications| data mining
Evolution of networks
- Adv. Phys
, 2002
"... We review the recent fast progress in statistical physics of evolving networks. Interest has focused mainly on the structural properties of random complex networks in communications, biology, social sciences and economics. A number of giant artificial networks of such a kind came into existence rece ..."
Abstract
-
Cited by 201 (1 self)
- Add to MetaCart
We review the recent fast progress in statistical physics of evolving networks. Interest has focused mainly on the structural properties of random complex networks in communications, biology, social sciences and economics. A number of giant artificial networks of such a kind came into existence recently. This opens a wide field for the study of their topology, evolution, and complex processes occurring in them. Such networks possess a rich set of scaling properties. A number of them are scale-free and show striking resilience against random breakdowns. In spite of large sizes of these networks, the distances between most their vertices are short — a feature known as the “smallworld” effect. We discuss how growing networks self-organize into scale-free structures and the role of the mechanism of preferential linking. We consider the topological and structural properties of evolving networks, and percolation in these networks. We present a number of models demonstrating the main features of evolving networks and discuss current approaches for their simulation and analytical study. Applications of the general results to particular networks in Nature are discussed. We demonstrate the generic connections of the network growth processes with the general problems
The Evolution of the Web and Implications for an Incremental Crawler
, 1999
"... In this paper we study how to build an effective incremental crawler. The crawler selectively and incrementally updates its index and/or local collection of web pages, instead of periodically refreshing the collection in batch mode. The incremental crawler can improve the "freshness" of the collecti ..."
Abstract
-
Cited by 182 (16 self)
- Add to MetaCart
In this paper we study how to build an effective incremental crawler. The crawler selectively and incrementally updates its index and/or local collection of web pages, instead of periodically refreshing the collection in batch mode. The incremental crawler can improve the "freshness" of the collection significantly and bring in new pages in a more timely manner. We first present results from an experiment conducted on more than half million web pages over 4 months, to estimate how web pages evolve over time. Based on these experimental results, we compare various design choices for an incremental crawler and discuss their trade-offs. We propose an architecture for the incremental crawler, which combines the best design choices.
Keyword searching and browsing in databases using BANKS
- In ICDE
, 2002
"... With the growth of the Web, there has been a rapid increase in the number of users who need to access online databases without having a detailed knowledge of the schema or of query languages; even relatively simple query languages designed for non-experts are too complicated for them. We describe BA ..."
Abstract
-
Cited by 181 (11 self)
- Add to MetaCart
With the growth of the Web, there has been a rapid increase in the number of users who need to access online databases without having a detailed knowledge of the schema or of query languages; even relatively simple query languages designed for non-experts are too complicated for them. We describe BANKS, a system which enables keyword-based search on relational databases, together with data and schema browsing. BANKS enables users to extract information in a simple manner without any knowledge of the schema or any need for writing complex queries. A user can get information by typing a few keywords, following hyperlinks, and interacting with controls on the displayed results. BANKS models tuples as nodes in a graph, connected by links induced by foreign key and other relationships. Answers to a query are modeled as rooted trees connecting tuples that match individual keywords in the query. Answers are ranked using a notion of proximity coupled with a notion of prestige of nodes based on inlinks, similar to techniques developed for Web search. We present an efficient heuristic algorithm for finding and ranking query results. 1.
Efficient Peer-to-Peer Keyword Searching
"... The recent file storage applications built on top of peer-to-peer distributed hash tables lack search capabilities. We believe that search is an important part of any document publication system. To that end, we have designed and analyzed a distributed search engine based on a distributed hash ta ..."
Abstract
-
Cited by 165 (1 self)
- Add to MetaCart
The recent file storage applications built on top of peer-to-peer distributed hash tables lack search capabilities. We believe that search is an important part of any document publication system. To that end, we have designed and analyzed a distributed search engine based on a distributed hash table. Our simulation results predict that our search engine can answer an average query in under one second, using under one kilobyte of bandwidth.

