The WebGraph Framework I: Compression Techniques
In Proc. of the Thirteenth International World Wide Web Conference, 2003
Cited by 114 (23 self)
Studying web graphs is often difficult due to their large size. Recently, several proposals have been published about various techniques that allow a web graph to be stored in memory in limited space by exploiting the inner redundancies of the web. The WebGraph framework is a suite of codes, algorithms and tools that aims at making it easy to manipulate large web graphs. This paper presents the compression techniques used in WebGraph, which are centred around referentiation and intervalisation (which in turn are dual to each other).
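The two techniques named in this abstract can be illustrated with a minimal Python sketch. It is not WebGraph's actual bit-level format: the function names, the `min_len` threshold and the example lists are assumptions made for illustration. Referentiation describes a successor list as a copy of part of a reference list plus extra nodes; intervalisation replaces runs of consecutive node numbers with (start, length) pairs.

```python
def referentiate(successors, reference):
    """Describe `successors` relative to `reference` (both sorted lists).

    Returns a copy mask over the reference list plus the extra nodes
    that the reference list does not contain.
    """
    wanted = set(successors)
    known = set(reference)
    copy_mask = [r in wanted for r in reference]
    extras = [s for s in successors if s not in known]
    return copy_mask, extras


def dereference(copy_mask, extras, reference):
    """Rebuild the successor list from the mask, the extras and the reference."""
    copied = [r for r, keep in zip(reference, copy_mask) if keep]
    return sorted(copied + extras)


def intervalise(successors, min_len=3):
    """Split a sorted list into (start, length) intervals and residual nodes.

    Runs of consecutive integers shorter than `min_len` (an assumed
    threshold) are left as residuals.
    """
    intervals, residuals = [], []
    i, n = 0, len(successors)
    while i < n:
        j = i
        while j + 1 < n and successors[j + 1] == successors[j] + 1:
            j += 1
        run = j - i + 1
        if run >= min_len:
            intervals.append((successors[i], run))
        else:
            residuals.extend(successors[i:j + 1])
        i = j + 1
    return intervals, residuals


def expand(intervals, residuals):
    """Inverse of `intervalise`."""
    out = list(residuals)
    for start, length in intervals:
        out.extend(range(start, start + length))
    return sorted(out)


# Hypothetical successor lists of two pages with similar link sets.
reference = [13, 15, 16, 17, 18, 19, 23, 24, 203]
successors = [15, 16, 17, 22, 23, 24, 315, 316, 317, 3041]

mask, extras = referentiate(successors, reference)
intervals, residuals = intervalise(reference)
```

In this sketch the two steps are lossless: `dereference` and `expand` recover the original lists, so any space saving would come only from how compactly the mask, intervals and residuals are then coded.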
Trust Networks on the Semantic Web
In Proceedings of Cooperative Intelligent Agents, 2003
Cited by 109 (1 self)
The so-called "Web of Trust" is one of the ultimate goals of the Semantic Web. Research on the topic of trust in this domain has focused largely on digital signatures, certificates, and authentication. At the same time, there is a wealth of research into trust and social networks in the physical world. In this paper, we describe an approach for integrating the two to build a web of trust in a more social respect. This paper describes the applicability of social network analysis to the semantic web, particularly discussing the multi-dimensional networks that evolve from ontological trust specifications. As a demonstration of algorithms used to infer trust relationships, we present several tools that allow users to take advantage of trust metrics that use the network.
Deeper Inside PageRank
Internet Mathematics, 2004
Cited by 107 (4 self)
This paper serves as a companion or extension to the “Inside PageRank” paper by Bianchini et al. [Bianchini et al. 03]. It is a comprehensive survey of all issues associated with PageRank, covering the basic PageRank model, available and recommended solution methods, storage issues, existence, uniqueness, and convergence properties, possible alterations to the basic model, suggested alternatives to the traditional solution methods, sensitivity and conditioning, and finally the updating problem. We introduce a few new results, provide an extensive reference list, and speculate about exciting areas of future research.
Exploiting the Block Structure of the Web for Computing PageRank
2003
Cited by 106 (5 self)
The web link graph has a nested block structure: the vast majority of hyperlinks link pages on a host to other pages on the same host, and many of those that do not link pages within the same domain. We show how to exploit this structure to speed up the computation of PageRank by a 3-stage algorithm whereby (1) the local PageRanks of pages for each host are computed independently using the link structure of that host, (2) these local PageRanks are then weighted by the "importance" of the corresponding host, and (3) the standard PageRank algorithm is then run using as its starting vector the weighted concatenation of the local PageRanks. Empirically, this algorithm speeds up the computation of PageRank by a factor of 2 in realistic scenarios. Further, we develop a variant of this algorithm that efficiently computes many different "personalized" PageRanks, and a variant that efficiently recomputes PageRank after node updates.
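The three stages described in this abstract can be sketched in plain Python as follows. This is an illustrative sketch, not the authors' implementation: the abstract does not say how host "importance" is computed, so running PageRank on a host-level graph is an assumption here, as are the function names, the damping factor of 0.85 and the toy graph.

```python
def pagerank(nodes, out_links, damping=0.85, start=None, tol=1e-10, max_iter=1000):
    """Power iteration; returns (rank dict, iterations used).

    `out_links[v]` lists the targets of v (repeats count as multiple links).
    Nodes without out-links spread their rank uniformly.
    """
    n = len(nodes)
    rank = dict(start) if start else {v: 1.0 / n for v in nodes}
    for it in range(1, max_iter + 1):
        dangling = sum(rank[v] for v in nodes if not out_links.get(v))
        new = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            targets = out_links.get(v, [])
            for w in targets:
                new[w] += damping * rank[v] / len(targets)
        delta = sum(abs(new[v] - rank[v]) for v in nodes)
        rank = new
        if delta < tol:
            break
    return rank, it


def blockrank(nodes, out_links, host_of, damping=0.85):
    hosts = sorted({host_of[v] for v in nodes})

    # Stage 1: local PageRank per host, using only intra-host links.
    local = {}
    for h in hosts:
        members = [v for v in nodes if host_of[v] == h]
        inner = {v: [w for w in out_links.get(v, []) if host_of[w] == h]
                 for v in members}
        local_rank, _ = pagerank(members, inner, damping)
        local.update(local_rank)

    # Stage 2 (assumed): host importance = PageRank of the host graph,
    # where every page-level link contributes one host-level link.
    host_links = {h: [] for h in hosts}
    for v in nodes:
        for w in out_links.get(v, []):
            host_links[host_of[v]].append(host_of[w])
    host_rank, _ = pagerank(hosts, host_links, damping)

    # Weighted concatenation of the local ranks, normalised to sum to 1.
    start = {v: local[v] * host_rank[host_of[v]] for v in nodes}
    total = sum(start.values())
    start = {v: s / total for v, s in start.items()}

    # Stage 3: standard PageRank from the informed starting vector.
    return pagerank(nodes, out_links, damping, start=start)


# Hypothetical two-host graph; the host is the part before the slash.
nodes = ["a/1", "a/2", "a/3", "b/1", "b/2"]
host_of = {v: v.split("/")[0] for v in nodes}
out_links = {
    "a/1": ["a/2", "a/3"],
    "a/2": ["a/1"],
    "a/3": ["a/1", "b/1"],
    "b/1": ["b/2"],
    "b/2": ["b/1", "a/1"],
}

ranks, iterations = blockrank(nodes, out_links, host_of)
plain, plain_iterations = pagerank(nodes, out_links)
```

Stage 3 converges to the same vector as the standard computation; only the starting point differs, which is where any reduction in iterations would come from.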
The Index-based XXL Search Engine for Querying XML Data with Relevance Ranking
In EDBT, 2002
Cited by 89 (7 self)
Query languages for XML such as XPath or XQuery support Boolean retrieval: a query result is a (possibly restructured) subset of XML elements or entire documents that satisfy the search conditions of the query. This search paradigm works for highly schematic XML data collections such as electronic catalogs. However, for searching information in open environments such as the Web or intranets of large corporations, ranked retrieval is more appropriate: a query result is a ranked list of XML elements in descending order of (estimated) relevance. Web search engines, which are based on the ranked retrieval paradigm, do not, however, consider the additional information and rich annotations provided by the structure of XML documents and their element names. This paper presents the XXL search engine that supports relevance ranking on XML data. XXL is particularly geared for path queries with wildcards that can span multiple XML collections and contain both exact-match and semantic-similarity search conditions. In addition, ontological information and suitable index structures are used to improve the search efficiency and effectiveness.
Power-Laws and the AS-level Internet Topology
IEEE/ACM Transactions on Networking, 2003
Cited by 77 (8 self)
In this paper, we study and characterize the topology of the Internet at the Autonomous System level. First, we show that the topology can be described efficiently with power-laws. The elegance and simplicity of the power-laws provide a novel perspective into the seemingly uncontrolled Internet structure. Second, we show that power-laws appear consistently over the last 5 years. We also observe that the power-laws hold even in the most recent and more complete topology [10] with correlation coefficient above 99% for the degree power-law. In addition, we study the evolution of the power-law exponents over the 5-year interval and observe a variation for the degree-based power-law of less than 10%. Third, we provide relationships between the exponents and other topological metrics.
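One common way to obtain a power-law exponent together with a correlation coefficient is a least-squares line fit on log-log axes. The sketch below assumes that procedure; the abstract does not state the exact fitting method, and the function names and synthetic data are illustrative only.

```python
import math
from collections import Counter


def degree_frequencies(degrees):
    """Return (sorted distinct degrees, number of nodes having each degree)."""
    counts = Counter(degrees)
    distinct = sorted(counts)
    return distinct, [counts[d] for d in distinct]


def fit_power_law(xs, ys):
    """Fit y = c * x**exponent by linear regression of log y on log x.

    Returns (exponent, c, r), where r is the Pearson correlation
    coefficient of the log-transformed data. All inputs must be positive
    and ys must not all be equal.
    """
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    sxx = sum((a - mean_x) ** 2 for a in lx)
    syy = sum((b - mean_y) ** 2 for b in ly)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly))
    exponent = sxy / sxx
    c = math.exp(mean_y - exponent * mean_x)
    r = sxy / math.sqrt(sxx * syy)
    return exponent, c, r


# Synthetic, noise-free data following f = 1000 * d**-2.2 (made-up values).
xs = list(range(1, 21))
ys = [1000.0 * x ** -2.2 for x in xs]
exponent, c, r = fit_power_law(xs, ys)

# Degree/frequency pairs for a small hypothetical degree sequence.
distinct, freqs = degree_frequencies([1, 1, 1, 2, 2, 3])
```

For a decreasing power law `r` is negative, so a figure such as "above 99%" would correspond to its absolute value.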
ANF: A Fast and Scalable Tool for Data Mining in Massive Graphs
International Conference on Knowledge Discovery and Data Mining, 2002
Cited by 73 (15 self)
Graphs are an increasingly important data source, with such important graphs as the Internet and the Web. Other familiar graphs include CAD circuits, phone records, gene sequences, city streets, social networks and academic citations. Any kind of relationship, such as actors appearing in movies, can be represented as a graph. This work presents a data mining tool, called ANF, that can quickly answer a number of interesting questions on graph-represented data, such as the following. How robust is the Internet to failures? What are the most influential database papers? Are there gender differences in movie appearance patterns? At its core, ANF is based on a fast and memory-efficient approach for approximating the complete "neighbourhood function" for a graph. For the Internet graph (268K nodes), ANF's highly-accurate approximation is more than 700 times faster than the exact computation. This reduces the running time from nearly a day to a matter of a minute or two, allowing users to perform ad hoc drill-down tasks and to repeatedly answer questions about changing data sources. To enable this drill-down, ANF employs new techniques for approximating neighbourhood-type functions for graphs with distinguished nodes and/or edges. When compared to the best existing approximation, ANF's approach is both faster and more accurate, given the same resources. Additionally, unlike previous approaches, ANF scales gracefully to handle disk resident graphs. Finally, we present some of our results from mining large graphs using ANF.
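For orientation, the quantity that ANF approximates can be computed exactly on small graphs. The sketch below is the exact baseline, not ANF itself: it runs one breadth-first search per node. It assumes the neighbourhood function N(h) counts ordered pairs (u, v) with distance at most h, including each node paired with itself; the function name and the example graph are illustrative.

```python
from collections import deque


def neighbourhood_function(adj, max_h):
    """Return [N(0), N(1), ..., N(max_h)] for the graph `adj`.

    `adj` maps every node to a list of its neighbours; every neighbour
    must itself be a key of `adj`.
    """
    at_distance = [0] * (max_h + 1)
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            if dist[u] == max_h:
                continue  # do not explore beyond the requested radius
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        for d in dist.values():
            at_distance[d] += 1

    # Turn "exactly at distance d" counts into cumulative "within h" counts.
    result, total = [], 0
    for count in at_distance:
        total += count
        result.append(total)
    return result


# Undirected path 0 - 1 - 2 - 3, stored with both edge directions.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
values = neighbourhood_function(path, 3)
```

This baseline performs a full search from every node, which is the kind of cost that makes an approximation attractive on graphs with hundreds of thousands of nodes.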
A General Model of Web Graphs
2003
Cited by 72 (6 self)
We describe a very general model of a random graph process whose proportional degree sequence obeys a power law. Such laws have recently been observed in graphs associated with the world wide web.
The XML Web: a First Study
2003
Cited by 39 (2 self)
Although originally designed for large-scale electronic publishing, XML plays an increasingly important role in the exchange of data on the Web. In fact, it is expected that XML will become the lingua franca of the Web, eventually replacing HTML. Not surprisingly, there has been a great deal of interest in XML both in industry and in academia. Nevertheless, to date no comprehensive study of the XML Web (i.e., the subset of the Web made of XML documents only) or of its contents has been made. This paper is the first attempt at describing the XML Web and the documents contained in it. Our results are drawn from a sample of a repository of the publicly available XML documents on the Web, consisting of about 200,000 documents. Our results show that, despite its short history, XML already permeates the Web, both in terms of generic domains and geographically. Also, our results about the contents of the XML Web provide valuable input for the design of algorithms, tools and systems that use XML in one form or another.

