Efficiently Answering Reachability Queries on Very Large Directed Graphs
Cited by 8 (0 self)
Efficiently processing queries against very large graphs is an important research topic largely driven by emerging real-world applications as diverse as XML databases, GIS, web mining, social network analysis, ontologies, and bioinformatics. In particular, graph reachability has attracted a lot of research attention, as reachability queries are not only common on graph databases but also serve as fundamental operations for many other graph queries. The main idea behind answering reachability queries in graphs is to build indices based on reachability labels. Essentially, each vertex in the graph is assigned certain labels such that the reachability between any two vertices can be determined from their labels. Several approaches have been proposed for building these reachability labels; among them are interval labeling (tree cover) and 2-hop labeling. However, due to the large number of vertices in many real-world graphs (some graphs can easily contain millions of vertices), the computational cost and index size of the labels produced by existing methods would prove too expensive to be practical. In this paper, we introduce a novel graph structure, referred to as path-tree, to help label very large graphs. The path-tree cover is a spanning subgraph of G in a tree shape. We demonstrate both analytically and empirically the effectiveness of our new approaches.
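The interval labeling (tree cover) idea mentioned in this abstract can be illustrated with a minimal sketch. It assumes the input is already a rooted tree given as a child-list dictionary; the names `interval_labels` and `reaches` are hypothetical, and the sketch does not implement the paper's path-tree cover, only the basic interval test that such schemes build on.

```python
from typing import Dict, List, Tuple


def interval_labels(tree: Dict[str, List[str]], root: str) -> Dict[str, Tuple[int, int]]:
    """Label each vertex with (start, end) DFS timestamps.

    Assumption: `tree` maps a vertex to its children and really is a tree.
    A descendant's interval is nested inside every ancestor's interval.
    """
    labels: Dict[str, Tuple[int, int]] = {}
    start: Dict[str, int] = {}
    counter = 0
    stack = [(root, False)]  # (vertex, children already processed?)
    while stack:
        node, done = stack.pop()
        if done:
            labels[node] = (start[node], counter)
        else:
            start[node] = counter
            stack.append((node, True))
            for child in reversed(tree.get(node, [])):
                stack.append((child, False))
        counter += 1
    return labels


def reaches(labels: Dict[str, Tuple[int, int]], u: str, v: str) -> bool:
    """u reaches v in the tree iff v's interval lies inside u's interval."""
    su, eu = labels[u]
    sv, ev = labels[v]
    return su <= sv and ev <= eu


tree = {"a": ["b", "c"], "b": ["d"]}
labels = interval_labels(tree, "a")
print(reaches(labels, "a", "d"), reaches(labels, "b", "c"))
```

Each query costs two integer comparisons, which is why label-based indices are attractive. Edges outside the tree are what general graph schemes have to handle in addition.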
Fast computation of reachability labeling for large graphs
- In Proc. of EDBT’06, 2006
Cited by 6 (2 self)
There are numerous applications that need to deal with a large graph and need to query reachability between nodes in the graph. A 2-hop cover can compactly represent the whole edge transitive closure of a graph in O(|V| · |E|^{1/2}) space, and can be used to answer reachability queries efficiently. However, it is challenging to compute a 2-hop cover. The existing approaches suffer from either large resource consumption or a low compression rate. In this paper, we propose a hierarchical partitioning approach that partitions a large graph G into two subgraphs repeatedly in a top-down fashion. The unique feature of our approach is that we compute the 2-hop cover while partitioning. In brief, in every iteration of top-down partitioning, we provide techniques to compute the 2-hop cover for connections between the two subgraphs first. A cover is computed to cut the graph into two subgraphs, which results in an overall cover with high compression for the entire graph G. Two approaches are proposed, namely a node-oriented approach and an edge-oriented approach. Our approach can efficiently compute a 2-hop cover for a large graph with a high compression rate. Our extensive experimental study shows that the 2-hop cover for a graph with 1,700,000 nodes and 169 billion connections can be obtained in less than 30 minutes, with a compression rate of about 40,000, using a PC.
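To make the 2-hop idea concrete: each vertex u stores a set of hubs it can reach, L_out(u), and a set of hubs that can reach it, L_in(u), and u reaches v exactly when the two sets share a hub. The sketch below builds such labels with a simple pruned BFS from each vertex in index order. This is a swapped-in construction chosen for brevity, not the hierarchical partitioning approach of the paper, and the function names are hypothetical.

```python
from collections import deque
from typing import List, Set, Tuple


def build_2hop(n: int, edges: List[Tuple[int, int]]) -> Tuple[List[Set[int]], List[Set[int]]]:
    """Build 2-hop reachability labels for vertices 0..n-1 by pruned BFS."""
    fwd: List[List[int]] = [[] for _ in range(n)]
    bwd: List[List[int]] = [[] for _ in range(n)]
    for u, v in edges:
        fwd[u].append(v)
        bwd[v].append(u)
    l_out: List[Set[int]] = [set() for _ in range(n)]
    l_in: List[Set[int]] = [set() for _ in range(n)]

    def sweep(hub: int, adj: List[List[int]], forward: bool) -> None:
        seen = {hub}
        queue = deque([hub])
        while queue:
            x = queue.popleft()
            # Prune: the pair is already answered by an earlier hub.
            pair = (hub, x) if forward else (x, hub)
            if x != hub and not l_out[pair[0]].isdisjoint(l_in[pair[1]]):
                continue
            (l_in if forward else l_out)[x].add(hub)
            for y in adj[x]:
                if y not in seen:
                    seen.add(y)
                    queue.append(y)

    for hub in range(n):
        sweep(hub, fwd, True)   # hub reaches x: record hub in L_in(x)
        sweep(hub, bwd, False)  # x reaches hub: record hub in L_out(x)
    return l_out, l_in


def reaches(l_out: List[Set[int]], l_in: List[Set[int]], u: int, v: int) -> bool:
    """u reaches v iff some hub lies in both L_out(u) and L_in(v)."""
    return not l_out[u].isdisjoint(l_in[v])


edges = [(0, 1), (1, 2), (2, 0), (2, 3), (4, 3)]
l_out, l_in = build_2hop(5, edges)
print(reaches(l_out, l_in, 0, 3), reaches(l_out, l_in, 3, 0))
```

The pruning step is what keeps the labels small. Without it, every vertex would end up listing all of its ancestors and descendants, which is the full transitive closure.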
Fast and accurate estimation of shortest paths in large graphs
- In Proceedings of Conference on Information and Knowledge Management (CIKM), 2010
Cited by 5 (0 self)
Computing shortest paths between two given nodes is a fundamental operation over graphs, but is known to be nontrivial over large disk-resident instances of graph data. While a number of techniques exist for answering reachability queries and approximating node distances efficiently, determining actual shortest paths (i.e., the sequence of nodes involved) is often neglected. However, in applications arising in massive online social networks, biological networks, and knowledge graphs, it is often essential to find many, if not all, shortest paths between two given nodes. In this paper, we address this problem and present a scalable sketch-based index structure that not only supports estimation of node distances but also computes the corresponding shortest paths themselves. Generating the actual path information allows for further improvements to the estimation accuracy of distances (and paths), leading to near-exact shortest-path approximations in real-world graphs. We evaluate our techniques, implemented within a fully functional RDF graph database system, over large real-world social and biological networks of sizes ranging from tens of thousands to millions of nodes and edges. Experiments on several datasets show that we can achieve query response times several orders of magnitude faster than traditional path computations while keeping the estimation errors between 0% and 1% on average.
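The benefit of keeping actual paths, rather than distances alone, can be seen in a simple landmark-based sketch. It assumes an undirected, unweighted graph and a hand-picked set of landmarks; the names `bfs_tree`, `path_to_root`, and `estimate_path` are hypothetical, and this is a simplified stand-in for, not a reproduction of, the paper's sketch-based index. For each landmark, the paths from both endpoints are joined, and because the paths are known, their shared tail can be cut off to tighten the estimate.

```python
from collections import deque
from typing import Dict, List, Optional


def bfs_tree(adj: Dict[int, List[int]], source: int) -> Dict[int, Optional[int]]:
    """Return BFS parent pointers toward `source` (the landmark)."""
    parent: Dict[int, Optional[int]] = {source: None}
    queue = deque([source])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in parent:
                parent[y] = x
                queue.append(y)
    return parent


def path_to_root(parent: Dict[int, Optional[int]], node: int) -> List[int]:
    """Follow parent pointers: [node, ..., landmark]."""
    path = [node]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    return path


def estimate_path(trees: List[Dict[int, Optional[int]]], u: int, v: int) -> Optional[List[int]]:
    """Shortest u-v path found by routing through any landmark tree."""
    best: Optional[List[int]] = None
    for parent in trees:
        if u not in parent or v not in parent:
            continue  # this landmark cannot connect u and v
        up, down = path_to_root(parent, u), path_to_root(parent, v)
        # Drop the tail both paths share, keeping one meeting vertex.
        while len(up) > 1 and len(down) > 1 and up[-2] == down[-2]:
            up.pop()
            down.pop()
        candidate = up + down[-2::-1]
        if best is None or len(candidate) < len(best):
            best = candidate
    return best


# A ring of six vertices; the true distance between 2 and 4 is 2.
ring = {0: [1, 5], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4, 0]}
one = [bfs_tree(ring, 0)]
two = [bfs_tree(ring, 0), bfs_tree(ring, 3)]
print(estimate_path(one, 2, 4), estimate_path(two, 2, 4))
```

With only landmark 0 the estimate takes the long way around the ring. Adding landmark 3 yields the exact path, which shows how the accuracy of such an index depends on which vertices are indexed.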
On Pushing Multilingual Query Operators into Relational Engines
- In ICDE ’06: Proc. of the 22nd International Conference on Data Engineering, 2006
Cited by 4 (0 self)
To effectively support today’s global economy, database systems need to manage data in multiple languages simultaneously. While current database systems do support the storage and management of multilingual data, they are not capable of querying across different natural languages. To address this lacuna, we have recently proposed two cross-lingual functionalities, LexEQUAL[13] and SemEQUAL[14], for matching multilingual names and concepts, respectively. In this paper, we investigate the native implementation of these multilingual functionalities as first-class operators on relational engines. Specifically, we propose a new multilingual storage datatype and an associated algebra of the multilingual operators on this datatype. These components have been successfully implemented in the PostgreSQL database system, including integration of the algebra with the query optimizer and inclusion of a metric index in the access layer. Our experiments demonstrate that the native implementation is up to two orders of magnitude faster than the corresponding outside-the-server implementation. Further, these multilingual additions do not adversely impact the existing functionality and performance. To the best of our knowledge, our prototype represents the first practical implementation of a cross-lingual database query engine.
FliX: A flexible framework for indexing complex XML document collections
- In 1st Int. Workshop on Database Technologies for Handling XML Information on the Web, 2004
Cited by 3 (2 self)
While there are many proposals for path indexes on XML documents, none of them is perfectly suited for indexing large-scale collections of interlinked XML documents. Existing strategies lack support for intra- or inter-document links, require large amounts of time to build or space to store the index, or cannot efficiently answer connection queries. This paper presents the FliX framework for connection indexing that supports large, heterogeneous document collections with many links, using the existing path indexes as building blocks. We introduce some example configurations of the framework that are appropriate for many important application scenarios. Experiments show the feasibility of our approach.
Autonomous Index Optimization in XML Databases
- 2005
Cited by 1 (0 self)
Defining suitable indexes is a major task when optimizing a database. Usually, a human database administrator defines a set of indexes in the design phase of the database. This can be done manually or with the help of so-called index wizard tools that analyze predefined database operations. Even with an optimal initial set of indexes when setting up a database, there is no guarantee that these indexes will suit future demands. Rather, it is realistic to expect that the typical usage of the database will change after a while, for instance because new queries appear. As a consequence, the existing indexes become suboptimal. The typical way to handle this problem is for a database administrator to maintain the database permanently. In XML database management systems (XDBMS) this problem becomes even worse: because XML queries cover both content and structure, the number of possible queries and indexes is significantly higher. Additionally, for XML data without schema information, queries and indexes cannot be defined in advance, because the structure and the content of the data are not restricted. Both facts tend to result in higher maintenance costs for XML indexes compared to relational indexes. In this paper we show by performance measurements that an adaptive XDBMS that analyzes its workload periodically and creates or drops XML indexes automatically guarantees high performance over the total lifetime of a database. Although we present our index system, called KeyX, the idea and the results are transferable to other XML indexing approaches.
The Index Update Problem for XML Data in XDBMS
- Proceedings of the 9th International Database Engineering & Application Symposium (IDEAS 2005)
Cited by 1 (1 self)
Database Management Systems are a major component of almost every information system. In relational Database Management Systems (RDBMS), indexes are well known and essential for the performant execution of frequent queries. For XML Database Management Systems (XDBMS), no index standards are established yet, although they are no less necessary. An inevitable side effect of any index is that modifications of the indexed data have to be reflected by the index structure itself. This leads to two problems: first, it has to be determined whether a modifying operation affects an index or not. Second, if an index is affected, the index has to be updated efficiently, ideally without rebuilding the whole index. In recent years, many approaches have been introduced for indexing XML data in an XDBMS. All of these approaches fall short, to varying degrees, in their support for updates. In this paper we give an algorithm that is based on finite automaton theory and determines whether an XPath-based database operation affects an index that is defined universally upon keys, qualifiers, and a return value of an XPath expression. In addition, we give algorithms for updating our KeyX indexes efficiently if they are affected by a modification. The Index Update Problem is relevant for all applications that use a secondary XML data representation (e.g. indexes, caches, XML replication/synchronization services) where updates must be identified and realized.
XML Perspectives on RDF Querying: Towards integrated Access to Data and Metadata on the Web
Cited by 1 (1 self)
The integral processing of data and metadata is starting to be recognized as a central challenge for the next decade (e.g. in Pat Selinger’s ICDE 2005 Keynote), not only as part of realizing the Semantic Web vision, but also on a smaller scale as part of the next generation of desktop data management (cf. Apple’s Spotlight and Microsoft’s WinFS). In this article, we focus on metadata represented in the W3C’s RDF formalism. We illustrate first steps towards integrating access to RDF metadata and access to standard Web data in XML format. For this, two XML views over RDF data are expressed in the query language Xcerpt and discussed. These views illustrate two different approaches for integrating RDF metadata processing and current data processing techniques.
Marriages of Convenience: Triples and Graphs, RDF and XML in Web Querying
Cited by 1 (0 self)
Metadata processing is recognized as a central challenge for database research in the next decade. Already, novel desktop data management and search applications (cf. Apple’s Spotlight and Microsoft’s WinFS) are enabled by rich metadata. Efficient and effective access to such data becomes a crucial issue for more and more application scenarios. In this article, we focus on metadata represented in RDF. A number of query languages for RDF have been presented in recent years. This article argues that most of these approaches fail to properly address two core issues: the provision of rich operators and constructs to adequately support RDF’s graph data model, and the ability to intertwine access to metadata (in RDF format) and data (in XML format). To address these points, two XML views over RDF data are expressed in the query language Xcerpt and discussed. Furthermore, it is shown how these views, together with Xcerpt’s rich graph patterns, allow the succinct expression of complex but common queries against RDF graphs.

