Results 1 - 5 of 5
The case against specialized graph analytics engines
Abstract
Cited by 2 (0 self)
Graph analytic processing has started to become a nearly ubiquitous component of the enterprise data analytics ecosystem. In response to this growing need, various specialized graph processing engines have been created in recent years. Sadly, the use of relational database management systems (RDBMSs) for graph processing is largely ignored in most enterprise settings. This oversight is surprising, since in most enterprise settings RDBMSs are already present and used for a variety of other analytic tasks. This situation raises the question of whether RDBMSs are fundamentally lacking in some respect for graph processing compared to the specialized graph processing engines. In this paper, we aim to address this question from both the programmer productivity perspective and the performance perspective. We present Grail – a syntactic layer for querying graphs in a vertex-centric way in an RDBMS, which can be compiled to translate graph queries to SQL. In a single-node setting, we also compare Grail to GraphLab and Giraph, and examine the performance implications of using Grail, showing that the RDBMS engine is competitive with these specialized engines. Given that RDBMSs are ubiquitous in enterprise settings, rest on robust and mature technology that has been hardened over decades, and are part of the administrative methods already in place, we argue that it is time to reconsider whether specialized graph engines have a role to play in most enterprises.
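To make the idea of running a vertex-centric computation inside an RDBMS concrete, here is a minimal sketch using Python's built-in `sqlite3` module. It is not Grail's syntax or the SQL that Grail generates; the table names (`vertex`, `edge`, `message`), the function `sssp`, and the choice of single-source shortest paths are all assumptions made for illustration. Each loop iteration plays the role of one superstep: a join "sends" messages along edges, a grouped `MIN` combines them, and an `UPDATE` applies them to vertex state.

```python
import sqlite3


def sssp(edges, source):
    """Single-source shortest paths expressed as repeated SQL supersteps.

    edges: list of (src, dst, weight) with non-negative weights (assumed).
    Returns {vertex_id: distance or None if unreachable}.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE edge (src INTEGER, dst INTEGER, w REAL)")
    conn.execute("CREATE TABLE vertex (id INTEGER PRIMARY KEY, dist REAL)")
    conn.executemany("INSERT INTO edge VALUES (?, ?, ?)", edges)

    ids = {v for s, d, _ in edges for v in (s, d)} | {source}
    conn.executemany(
        "INSERT INTO vertex VALUES (?, ?)",
        [(v, 0.0 if v == source else None) for v in sorted(ids)],
    )

    while True:
        # "Send" and "combine": every reached vertex offers dist + w to each
        # out-neighbour; MIN keeps the best offer per receiving vertex.
        conn.execute("DROP TABLE IF EXISTS message")
        conn.execute(
            """
            CREATE TEMP TABLE message AS
            SELECT e.dst AS id, MIN(v.dist + e.w) AS val
            FROM vertex v JOIN edge e ON e.src = v.id
            WHERE v.dist IS NOT NULL
            GROUP BY e.dst
            """
        )
        # "Apply": update only vertices whose distance strictly improves.
        cur = conn.execute(
            """
            UPDATE vertex
            SET dist = (SELECT m.val FROM message m WHERE m.id = vertex.id)
            WHERE EXISTS (
                SELECT 1 FROM message m
                WHERE m.id = vertex.id
                  AND (vertex.dist IS NULL OR m.val < vertex.dist)
            )
            """
        )
        if cur.rowcount == 0:  # no vertex changed, so the computation halts
            break

    result = dict(conn.execute("SELECT id, dist FROM vertex"))
    conn.close()
    return result
```

The sketch assumes non-negative edge weights; with a negative cycle the loop would not terminate. A real engine would also restrict each superstep to vertices that changed in the previous one instead of rescanning the whole `vertex` table.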
Keys for Graphs
Abstract
Keys for graphs aim to uniquely identify entities represented by vertices in a graph. We propose a class of keys that are recursively defined in terms of graph patterns, and are interpreted with subgraph isomorphism. Extending conventional keys for relations and XML, these keys find applications in object identification, knowledge fusion and social network reconciliation. As an application, we study the entity matching problem that, given a graph G and a set Σ of keys, is to find all pairs of entities (vertices) in G that are identified by keys in Σ. We show that the problem is intractable, and cannot be parallelized in logarithmic rounds. Nonetheless, we provide two parallel scalable algorithms for entity matching, in MapReduce and a vertex-centric asynchronous model. Using real-life and synthetic data, we experimentally verify the effectiveness and scalability of the algorithms.
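The entity matching problem described above can be illustrated with a heavily simplified sketch. It is not the paper's algorithm: the keys here are restricted to star-shaped patterns of radius one (a list of predicates per entity type) rather than general graph patterns under subgraph isomorphism, and the computation is a sequential fixpoint rather than a MapReduce or vertex-centric one. The names `match_entities`, `types`, `triples` and `keys`, and the example data in the usage note, are hypothetical. What the sketch does preserve is the recursive flavour: a predicate can be marked as an entity reference, in which case two objects count as equal once they have themselves been identified.

```python
from itertools import combinations


def match_entities(types, triples, keys):
    """Find pairs of same-type entities identified by simplified keys.

    types:   {entity_id: type_name}
    triples: iterable of (subject, predicate, object)
    keys:    {type_name: [key, ...]}, where each key is a list of
             (predicate, is_entity_ref) pairs that must all agree.
    Returns a set of (a, b) pairs with a < b.
    """
    parent = {e: e for e in types}  # union-find over entities

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    out = {}
    for s, p, o in triples:
        out.setdefault((s, p), set()).add(o)

    def agree(a, b, pred, is_ref):
        xs = out.get((a, pred), set())
        ys = out.get((b, pred), set())
        if is_ref:
            # Recursive case: compare referenced entities up to identification.
            xs = {find(x) if x in parent else x for x in xs}
            ys = {find(y) if y in parent else y for y in ys}
        return bool(xs & ys)

    entities = sorted(types)
    changed = True
    while changed:  # iterate to a fixpoint: new matches can enable others
        changed = False
        for a, b in combinations(entities, 2):
            if types[a] != types[b] or find(a) == find(b):
                continue
            for key in keys.get(types[a], []):
                if all(agree(a, b, p, r) for p, r in key):
                    parent[find(a)] = find(b)
                    changed = True
                    break

    return {(a, b) for a, b in combinations(entities, 2) if find(a) == find(b)}
```

As a usage example, with two album vertices sharing a name and year, and a key for artists that requires the same name plus a recorded album that has been identified, the album match is found first and then enables the artist match. The pairwise comparison is quadratic in the number of entities, which is acceptable only for illustration.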
GRAPHiQL: A Graph Intuitive Query Language for Relational Databases
Abstract
Graph analytics is becoming increasingly popular, driving many important business applications from social network analysis to machine learning. Since most graph data is collected in a relational database, it seems natural to attempt to perform graph analytics within the relational environment. However, SQL, the query language for relational databases, makes it difficult to express graph analytics operations. This is because SQL requires programmers to think in terms of tables and joins, rather than the more natural representation of graphs as collections of nodes and edges. As a result, even relatively simple graph operations can require very complex SQL queries. In this paper, we present GRAPHiQL, an intuitive query language for graph analytics, which allows developers to reason in terms of nodes and edges. GRAPHiQL provides key graph constructs such as looping, recursion, and neighborhood operations. At runtime, GRAPHiQL compiles graph programs into efficient SQL queries that can run on any relational database. We demonstrate the applicability of GRAPHiQL on several applications and compare the performance of GRAPHiQL queries with those of Apache Giraph (a popular ‘vertex centric’ graph programming language).
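The point about tables and joins can be seen in a small example. The sketch below counts triangles in an undirected graph, a simple neighborhood operation, using plain SQL through Python's built-in `sqlite3` module. It does not show GRAPHiQL syntax or its compiler output; the `edge` table layout and the function name `count_triangles` are assumptions for illustration. Even this small task needs a three-way self-join and a convention for storing each undirected edge once.

```python
import sqlite3


def count_triangles(edges):
    """Count triangles in an undirected graph given as (u, v) pairs."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE edge (src INTEGER, dst INTEGER)")

    # Store each undirected edge once as (smaller, larger); drop self-loops
    # and duplicates so that every triangle is counted exactly once.
    normalized = {(min(u, v), max(u, v)) for u, v in edges if u != v}
    conn.executemany("INSERT INTO edge VALUES (?, ?)", sorted(normalized))

    # A triangle a < b < c appears as the rows (a, b), (b, c) and (a, c).
    (count,) = conn.execute(
        """
        SELECT COUNT(*)
        FROM edge e1
        JOIN edge e2 ON e2.src = e1.dst
        JOIN edge e3 ON e3.src = e1.src AND e3.dst = e2.dst
        """
    ).fetchone()
    conn.close()
    return count
```

The graph-level statement "for each node, count pairs of neighbors that are themselves connected" is short, while the SQL form requires the reader to reconstruct that intent from join conditions.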
†Parallel Computing Lab, Intel Labs
Abstract
Graph algorithms are becoming increasingly important for analyzing large datasets in many fields. Real-world graph data follows a pattern of sparsity that is not uniform but highly skewed towards a few items. Implementing graph traversal, statistics and machine learning algorithms on such data in a scalable manner is quite challenging. As a result, several graph analytics frameworks (GraphLab, CombBLAS, Giraph, SociaLite and Galois among others) have been developed, each offering a solution with different programming models and targeted at different users. Unfortunately, the "Ninja performance gap" between optimized code and most of these frameworks is very large (2-30X for most frameworks and up to 560X for Giraph) for common graph algorithms, and moreover varies widely with algorithms. This makes the end-users' choice of graph framework dependent not only on ease of use but also on performance. In this work, we offer a quantitative roadmap for improving the performance of all these frameworks and bridging the "ninja gap". We first present hand-optimized baselines that get performance close to hardware limits and higher than any published performance figure for these graph algorithms. We characterize the performance of both this native implementation as well as popular graph frameworks on a variety of algorithms. This study helps end-users delineate bottlenecks arising from the algorithms themselves vs. programming model abstractions vs. the framework implementations. Further, by analyzing the system-level behavior of these frameworks, we obtain bottlenecks that are agnostic to specific algorithms. We recommend changes to alleviate these bottlenecks (and implement some of them) and reduce the performance gap with respect to native code. These changes will enable end-users to choose frameworks based mostly on ease of use.
†Parallel Computing Lab, Intel Labs
Abstract
Graph algorithms are becoming increasingly important for analyzing large datasets in many fields. Real-world graph data follows a pattern of sparsity that is not uniform but highly skewed towards a few items. Implementing graph traversal, statistics and machine learning algorithms on such data in a scalable manner is quite challenging. As a result, several graph analytics frameworks (GraphLab, CombBLAS, Giraph, SociaLite and Galois among others) have been developed, each offering a solution with different programming models and targeted at different users. Unfortunately, the "Ninja performance gap" between optimized code and most of these frameworks is very large (2-30X for most frameworks and up to 560X for Giraph) for common graph algorithms, and moreover varies widely with algorithms. This makes the end-users' choice of graph framework dependent not only on ease of use but also on performance. In this work, we offer a quantitative roadmap for improving the performance of all these frameworks and bridging the "ninja gap". We first present hand-optimized baselines that get performance close to hardware limits and higher than any published performance figure for these graph algorithms. We characterize the performance of both this native implementation as well as popular graph frameworks on a variety of algorithms. This study helps end-users delineate bottlenecks arising from the algorithms themselves vs. programming model abstractions vs. the framework implementations. Further, by analyzing the system-level behavior of these frameworks, we obtain bottlenecks that are agnostic to specific algorithms. We recommend changes to alleviate these bottlenecks (and implement some of them) and reduce the performance gap with respect to native code. These changes will enable end-users to choose frameworks based mostly on ease of use.