Distributed SociaLite: A Datalog-based language for large-scale graph analysis. (2013)

by J Seo, J Park, J Shin, M S Lam

Results 1 - 5 of 5

The case against specialized graph analytics engines

by Jing Fan, Adalbert Gerald Soosai Raj, Jignesh M. Patel
Cited by 2 (0 self)
Graph analytic processing has started to become a nearly ubiquitous component in the enterprise data analytics ecosystem. In response to this growing need, various specialized graph processing engines have been created in recent years. Sadly, the use of relational database management systems (RDBMSs) for graph processing is largely ignored in most enterprise settings. This oversight is surprising since in most enterprise settings, RDBMSs are already present and used for a variety of other analytic tasks. This situation then begs the question of whether the use of RDBMSs for graph processing is fundamentally lacking in some respect compared to the specialized graph processing engines. In this paper, we aim to address this question both from the programmer productivity perspective and from the performance perspective. We present Grail – a syntactic layer for querying graphs in a vertex-centric way in an RDBMS, which can be compiled to translate graph queries to SQL. In a single node setting, we also compare Grail to GraphLab and Giraph, and examine the performance implications of using Grail, showing that the RDBMS engine is competitive with these specialized engines. Given that RDBMSs are ubiquitous in enterprise settings, and have a robust and mature technology that has been hardened over decades, and are part of existing administrative methods in place, we argue that it is time to reconsider if specialized graph engines have a role to play in most enterprises.
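
As a rough illustration of the idea in the abstract above (a vertex-centric graph computation expressed as SQL that an RDBMS executes), the following Python sketch runs minimum-label propagation for connected components on SQLite. It is not Grail's syntax and not the SQL that Grail generates; the table names `edge` and `vertex`, the function name `connected_components`, and the choice of SQLite are all assumptions made for this example.

```python
import sqlite3


def connected_components(edges):
    """Label every vertex with the smallest vertex id in its component.

    Hypothetical sketch: each UPDATE plays the role of one vertex-centric
    round in which a vertex adopts the minimum label among its neighbors.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE edge (src INTEGER, dst INTEGER)")
    # Store both directions so the graph is treated as undirected.
    conn.executemany(
        "INSERT INTO edge VALUES (?, ?)",
        [(s, d) for s, d in edges] + [(d, s) for s, d in edges],
    )
    conn.execute("CREATE TABLE vertex (id INTEGER PRIMARY KEY, label INTEGER)")
    # Every vertex starts with its own id as its label.
    conn.execute("INSERT OR IGNORE INTO vertex SELECT src, src FROM edge")

    while True:
        # Lower a vertex's label when some neighbor carries a smaller one.
        cur = conn.execute(
            """
            UPDATE vertex
            SET label = (SELECT MIN(n.label)
                         FROM edge e JOIN vertex n ON n.id = e.dst
                         WHERE e.src = vertex.id)
            WHERE label > (SELECT MIN(n.label)
                           FROM edge e JOIN vertex n ON n.id = e.dst
                           WHERE e.src = vertex.id)
            """
        )
        if cur.rowcount == 0:  # no label changed: fixpoint reached
            break

    result = dict(conn.execute("SELECT id, label FROM vertex"))
    conn.close()
    return result
```

For the edge list `[(1, 2), (2, 3), (4, 5)]`, the function returns the labels `{1: 1, 2: 1, 3: 1, 4: 4, 5: 4}`. Labels only ever decrease, so the loop terminates regardless of the order in which the database visits rows.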

Citation Context

...g graph analytic queries. We note that even this area of designing APIs and/or languages for graph analytics is an active area of research, with a number of recent Datalog-inspired proposals, such as [16, 18, 7]. But these previous efforts largely end up building a new data processing platform. We also note that in the future, it may make sense to generate other APIs/languages like Grail to map to SQL engine...

Keys for Graphs

by Wenfei Fan, Zhe Fan, Chao Tian, Xin Luna Dong
Keys for graphs aim to uniquely identify entities represented by vertices in a graph. We propose a class of keys that are recursively defined in terms of graph patterns, and are interpreted with subgraph isomorphism. Extending conventional keys for relations and XML, these keys find applications in object identification, knowledge fusion and social network reconciliation. As an application, we study the entity matching problem that, given a graph G and a set Σ of keys, is to find all pairs of entities (vertices) in G that are identified by keys in Σ. We show that the problem is intractable, and cannot be parallelized in logarithmic rounds. Nonetheless, we provide two parallel scalable algorithms for entity matching, in MapReduce and a vertex-centric asynchronous model. Using real-life and synthetic data, we experimentally verify the effectiveness and scalability of the algorithms.

Citation Context

...ods. Our algorithms differ from previous ones in the following. (a) Entity matching is far more intriguing than conventional subgraph isomorphism, and the prior algorithms [22, 26, 35, 38] cannot be applied to entity matching. (b) For the same reasons, entity matching is more involved than the record matching of [7, 27, 32, 36], which identifies tuples in relations, and than the task of [25], which does not enforce topological constraints in the matching process. (c) We propose optimization strategies that have not been studied before. Related to this work are also parallel algorithms for evaluating datalog [4, 37]. However, entity matching with keys requires identifying bijective functions for subgraph isomorphism, which are more challenging to compute. Worse still, we show that entity linking does not have PFP [4], and is harder to parallelize than, e.g., transitive closures.

2. SPECIFYING KEYS WITH GRAPH PATTERNS

In this section, we formally define keys for graphs.

2.1 Graphs and Graph Pattern Matching

We start with graphs, patterns and pattern matching.

[Figure: example graphs G1 and G2; node labels include “Anthology 2”, “1996”, “The Beatles”, “John Farnham”, “AT&T”, “SBC”, “1997”, alb1, alb2, alb3; edge labels include release year, recorded by, name of, parent of] ...
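
The citation context above uses transitive closure as its reference point for a datalog computation that parallelizes well. For orientation, here is a minimal single-threaded Python sketch of semi-naive evaluation of the usual two-rule transitive closure program. It is not taken from any of the cited papers; the function name `transitive_closure` and the predicate names `edge` and `path` are assumptions chosen for this example.

```python
def transitive_closure(edges):
    """Semi-naive evaluation of the datalog program

        path(x, y) :- edge(x, y).
        path(x, z) :- path(x, y), edge(y, z).

    Only facts derived in the previous round (delta) are joined with edge,
    so no derivation is repeated from scratch in later rounds.
    """
    # Index edge by source vertex so the join on y is a dictionary lookup.
    successors = {}
    for x, y in edges:
        successors.setdefault(x, set()).add(y)

    path = set(edges)   # all facts derived so far
    delta = set(edges)  # facts that are new as of the last round
    while delta:
        derived = {(x, z) for (x, y) in delta for z in successors.get(y, ())}
        delta = derived - path  # keep only genuinely new facts
        path |= delta
    return path
```

On the chain `[(1, 2), (2, 3), (3, 4)]` this yields the six pairs `(1, 2)`, `(2, 3)`, `(3, 4)`, `(1, 3)`, `(2, 4)` and `(1, 4)`. The loop stops as soon as a round derives nothing new, which also guarantees termination on cyclic graphs.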

GRAPHiQL: A Graph Intuitive Query Language for Relational Databases

by Alekh Jindal, Samuel Madden
Graph analytics is becoming increasingly popular, driving many important business applications from social network analysis to machine learning. Since most graph data is collected in a relational database, it seems natural to attempt to perform graph analytics within the relational environment. However, SQL, the query language for relational databases, makes it difficult to express graph analytics operations. This is because SQL requires programmers to think in terms of tables and joins, rather than the more natural representation of graphs as collections of nodes and edges. As a result, even relatively simple graph operations can require very complex SQL queries. In this paper, we present GRAPHiQL, an intuitive query language for graph analytics, which allows developers to reason in terms of nodes and edges. GRAPHiQL provides key graph constructs such as looping, recursion, and neighborhood operations. At runtime, GRAPHiQL compiles graph programs into efficient SQL queries that can run on any relational database. We demonstrate the applicability of GRAPHiQL on several applications and compare the performance of GRAPHiQL queries with those of Apache Giraph (a popular ‘vertex centric’ graph programming language).

Citation Context

...nd its extensions [4], [5], [6], GPS [7], Trinity [8], GRACE [9], [10], Pregelix [11]; neighborhood-centric systems, e.g. Giraph++ [12], NScale [13], [14]; datalog-based systems, e.g. Socialite [15], [16], GrDB [17], [18]; SPARQL-based systems, e.g. G-SPARQL [19]; RDF stores, e.g. Jena [20] and AllegroGraph [21]; key-value stores, e.g. Neo4j [22], HypergraphDB [23]; and others such as TAO [24] and Flo...

†Parallel Computing Lab, Intel Labs

by unknown authors
Graph algorithms are becoming increasingly important for analyzing large datasets in many fields. Real-world graph data follows a pattern of sparsity that is not uniform but highly skewed towards a few items. Implementing graph traversal, statistics and machine learning algorithms on such data in a scalable manner is quite challenging. As a result, several graph analytics frameworks (GraphLab, CombBLAS, Giraph, SociaLite and Galois among others) have been developed, each offering a solution with different programming models and targeted at different users. Unfortunately, the "Ninja performance gap" between optimized code and most of these frameworks is very large (2-30X for most frameworks and up to 560X for Giraph) for common graph algorithms, and moreover varies widely with algorithms. This makes the end-users’ choice of graph framework dependent not only on ease of use but also on performance. In this work, we offer a quantitative roadmap for improving the performance of all these frameworks and bridging the "ninja gap". We first present hand-optimized baselines that get performance close to hardware limits and higher than any published performance figure for these graph algorithms. We characterize the performance of both this native implementation as well as popular graph frameworks on a variety of algorithms. This study helps end-users delineate bottlenecks arising from the algorithms themselves vs. programming model abstractions vs. the framework implementations. Further, by analyzing the system-level behavior of these frameworks, we obtain bottlenecks that are agnostic to specific algorithms. We recommend changes to alleviate these bottlenecks (and implement some of them) and reduce the performance gap with respect to native code. These changes will enable end-users to choose frameworks based mostly on ease of use.

Citation Context

...This difficulty has motivated the rise of numerous in-memory frameworks to help improve productivity and performance on graph computations such as GraphLab, Giraph, CombBLAS/KDT, SociaLite and Galois [8, 11, 21, 26, 31]. These frameworks are aimed at computations varying from classical graph traversals to graph statistics calculations such as triangle counting to complex machine learning tasks like collaborative fil...
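
Triangle counting, named in the citation context above as a typical graph statistics workload, can be sketched in a few lines of plain Python. This is only an illustrative single-threaded version, not the hand-optimized native baseline the paper describes nor the code of any listed framework; the function name `count_triangles` and the edge-list input format are assumptions made for this example.

```python
def count_triangles(edges):
    """Count triangles in an undirected graph given as (u, v) pairs.

    Each edge is oriented from the smaller to the larger vertex id, so a
    triangle a < b < c is found exactly once: at edge (a, b), with c in the
    intersection of the two oriented neighbor sets.
    """
    neighbors = {}
    for u, v in edges:
        if u == v:
            continue  # self-loops cannot be part of a triangle
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)

    # Keep only neighbors with a larger id (the oriented adjacency).
    higher = {u: {v for v in nbrs if v > u} for u, nbrs in neighbors.items()}

    return sum(len(higher[u] & higher[v]) for u in higher for v in higher[u])
```

A complete graph on four vertices contains four triangles, and a simple path contains none, which makes both convenient sanity checks. Orienting edges by vertex id is a common way to avoid counting each triangle several times; production implementations often orient by degree instead to cope with the skewed degree distributions the abstract mentions.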

†Parallel Computing Lab, Intel Labs

by unknown authors
Graph algorithms are becoming increasingly important for analyzing large datasets in many fields. Real-world graph data follows a pattern of sparsity that is not uniform but highly skewed towards a few items. Implementing graph traversal, statistics and machine learning algorithms on such data in a scalable manner is quite challenging. As a result, several graph analytics frameworks (GraphLab, CombBLAS, Giraph, SociaLite and Galois among others) have been developed, each offering a solution with different programming models and targeted at different users. Unfortunately, the "Ninja performance gap" between optimized code and most of these frameworks is very large (2-30X for most frameworks and up to 560X for Giraph) for common graph algorithms, and moreover varies widely with algorithms. This makes the end-users’ choice of graph framework dependent not only on ease of use but also on performance. In this work, we offer a quantitative roadmap for improving the performance of all these frameworks and bridging the "ninja gap". We first present hand-optimized baselines that get performance close to hardware limits and higher than any published performance figure for these graph algorithms. We characterize the performance of both this native implementation as well as popular graph frameworks on a variety of algorithms. This study helps end-users delineate bottlenecks arising from the algorithms themselves vs. programming model abstractions vs. the framework implementations. Further, by analyzing the system-level behavior of these frameworks, we obtain bottlenecks that are agnostic to specific algorithms. We recommend changes to alleviate these bottlenecks (and implement some of them) and reduce the performance gap with respect to native code. These changes will enable end-users to choose frameworks based mostly on ease of use.

Citation Context

...This difficulty has motivated the rise of numerous in-memory frameworks to help improve productivity and performance on graph computations such as GraphLab, Giraph, CombBLAS/KDT, SociaLite and Galois [8, 11, 21, 26, 31]. These frameworks are aimed at computations varying from classical graph traversal algorithms to graph statistics calculations such as triangle counting to complex machine learning algorithms like co...

