Distributed SociaLite: A Datalog-Based Language for Large-Scale Graph Analysis
"... Large-scale graph analysis is becoming important with the rise of world-wide social network services. Recently in SociaLite, we proposed extensions to Datalog to efficiently and succinctly implement graph analysis programs on sequential machines. This paper describes novel extensions and optimizatio ..."
Cited by 5 (0 self).
Abstract:
Large-scale graph analysis is becoming important with the rise of world-wide social network services. Recently, in SociaLite, we proposed extensions to Datalog to efficiently and succinctly implement graph analysis programs on sequential machines. This paper describes novel extensions and optimizations of SociaLite for parallel and distributed execution to support large-scale graph analysis. With distributed SociaLite, programmers simply annotate how data are to be distributed; the necessary communication is then automatically inferred to generate parallel code for a cluster of multi-core machines. SociaLite optimizes the evaluation of recursive monotone aggregate functions using a delta-stepping technique. In addition, approximate computation is supported, allowing programmers to trade off accuracy for less time and space. We evaluated SociaLite with six core graph algorithms used in many social network analyses. Our experiment with 64 Amazon EC2 8-core instances shows that SociaLite programs performed within a factor of two of ideal weak scaling. Compared to optimized Giraph, an open-source alternative to Pregel, SociaLite programs are 4 to 12 times faster across the benchmark algorithms and 22 times more succinct on average. As a declarative query language, SociaLite, with the help of a compiler that generates efficient parallel and approximate code, can easily be used to create many social apps that operate on large-scale distributed graphs.
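The recursive-aggregate optimization the abstract mentions can be pictured with a small sketch. The following Python is illustrative only, not SociaLite code: it evaluates a shortest-paths rule with a monotone min aggregate semi-naively, re-joining only the tuples that changed in the previous round. The graph `g` and all names are invented for the example; delta stepping proper additionally orders this work into distance buckets.

```python
# Minimal sketch (not SociaLite code): semi-naive evaluation of the rule
#   Path(t, min(d)) :- Path(s, d1), Edge(s, t, d2), d = d1 + d2.
# The min aggregate is monotone: new facts can only lower a distance,
# so only tuples changed in the last round (the "delta") need re-joining.

def shortest_paths(edges, source):
    """edges: dict mapping node -> list of (neighbor, weight) pairs."""
    dist = {source: 0}          # best distance found so far (the aggregate)
    delta = {source: 0}         # tuples derived in the previous iteration
    while delta:
        new_delta = {}
        for s, d1 in delta.items():
            for t, d2 in edges.get(s, []):
                d = d1 + d2
                if d < dist.get(t, float("inf")):   # monotone min update
                    dist[t] = d
                    new_delta[t] = d
        delta = new_delta       # only changed tuples drive the next round
    return dist

g = {"a": [("b", 1), ("c", 4)], "b": [("c", 1)], "c": []}
print(shortest_paths(g, "a"))   # {'a': 0, 'b': 1, 'c': 2}
```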
NScale: Neighborhood-Centric Large-Scale Graph Analytics in the Cloud
http://arxiv.org/abs/1405.1499, 2014
"... There is an increasing interest in executing rich and complex analy-sis tasks over large-scale graphs, many of which require processing and reasoning about a large number of multi-hop neighborhoods or subgraphs in the graph. Examples of such tasks include ego net-work analysis, motif counting, findi ..."
Cited by 2 (1 self).
Abstract:
There is an increasing interest in executing rich and complex analysis tasks over large-scale graphs, many of which require processing and reasoning about a large number of multi-hop neighborhoods or subgraphs in the graph. Examples of such tasks include ego network analysis, motif counting, finding social circles, personalized recommendations, link prediction, anomaly detection, analyzing influence cascades, and so on. These tasks are not well served by existing vertex-centric graph processing frameworks, whose computation and execution models limit the user program to directly accessing the state of a single vertex; this results in high communication, scheduling, and memory overheads when executing such tasks on those frameworks. Further, most existing graph processing frameworks typically ignore the challenges in extracting the relevant portion of the graph that an analysis task needs, and loading it …
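The multi-hop neighborhoods these tasks operate on are induced subgraphs around a center vertex, which is exactly what a vertex-at-a-time model cannot see. As a rough illustration, and not NScale's API, a k-hop ego network can be extracted like this in plain Python over an adjacency list (the graph `adj` is made up for the example):

```python
from collections import deque

def ego_network(adj, center, k):
    """Return the node set and induced edges of the k-hop neighborhood
    around `center`. adj: dict mapping node -> iterable of neighbors.
    Illustrative only; NScale extracts and packs such subgraphs itself."""
    seen = {center}
    frontier = deque([(center, 0)])
    while frontier:                         # breadth-first out to depth k
        node, depth = frontier.popleft()
        if depth == k:
            continue
        for nbr in adj.get(node, ()):
            if nbr not in seen:
                seen.add(nbr)
                frontier.append((nbr, depth + 1))
    # Induced edges: both endpoints inside the neighborhood.
    induced = [(u, v) for u in seen for v in adj.get(u, ()) if v in seen]
    return seen, induced

adj = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
nodes, edges = ego_network(adj, "a", 2)
print(nodes)    # {'a', 'b', 'c'} (set order may vary)
```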
Optimizing Recursive Queries with Monotonic Aggregates in DeALS
In ICDE 2015. IEEE, 2015
"... Abstract—The exploding demand for analytics has refocused the attention of data scientists on applications requiring aggrega-tion in recursion. After resisting the efforts of researchers for more than twenty years, this problem is being addressed by innovative systems that are raising logic-oriented ..."
Cited by 1 (1 self).
Abstract:
The exploding demand for analytics has refocused the attention of data scientists on applications requiring aggregation in recursion. After resisting the efforts of researchers for more than twenty years, this problem is being addressed by innovative systems that are raising logic-oriented data languages to the levels of generality and performance needed to efficiently support a broad range of applications. Foremost among these new systems, the Deductive Application Language System (DeALS) achieves superior generality and performance via new constructs and optimization techniques for monotonic aggregates, which are described in this paper. The use of a special class of monotonic aggregates in recursion was made possible by recent theoretical results proving that they preserve the rigorous least-fixpoint semantics of core Datalog programs. This paper thus describes how DeALS extends their definitions and modifies their syntax to enable a concise expression of applications that, without them, could not be expressed in performance-conducive ways, or could not be expressed at all. The paper then turns to the performance issue and introduces novel implementation and optimization techniques that outperform traditional approaches, including semi-naive evaluation. An extensive experimental evaluation compared DeALS with other systems on large datasets. The results suggest that, unlike other systems, DeALS indeed combines superior generality with superior performance.
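For readers unfamiliar with the semi-naive evaluation used here as the traditional baseline, a minimal Python sketch of the technique on transitive closure follows. This is illustrative only; DeALS's own operators and monotonic aggregates are considerably more general.

```python
def transitive_closure_seminaive(edges):
    """Least fixpoint of  Reach(x,y) :- Edge(x,y).
                          Reach(x,z) :- Reach(x,y), Edge(y,z).
    Semi-naive: each round joins only the newly derived facts (delta)
    with Edge, instead of re-deriving everything from scratch."""
    reach = set(edges)
    delta = set(edges)
    while delta:
        new = {(x, z) for (x, y) in delta for (y2, z) in edges if y == y2}
        delta = new - reach          # keep only genuinely new facts
        reach |= delta
    return reach

edges = {("a", "b"), ("b", "c"), ("c", "d")}
print(sorted(transitive_closure_seminaive(edges)))
# [('a','b'), ('a','c'), ('a','d'), ('b','c'), ('b','d'), ('c','d')]
```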
Compiled Plans for In-Memory Path-Counting Queries
2013
"... Dissatisfaction with relational databases for large-scale graph processing has motivated a new class of graph databases that offer fast graph processing but sacrifice the ability to express basic relational idioms. However, we hypothesize that the performance benefits amount to implementation detai ..."
Cited by 1 (0 self).
Abstract:
Dissatisfaction with relational databases for large-scale graph processing has motivated a new class of graph databases that offer fast graph processing but sacrifice the ability to express basic relational idioms. However, we hypothesize that the performance benefits amount to implementation details, not a fundamental limitation of the relational model. To evaluate this hypothesis, we are exploring code generation to produce fast in-memory algorithms and data structures for graph patterns that are inaccessible to conventional relational optimizers. In this paper, we present preliminary results for this approach on path-counting queries, which include triangle counting as a special case. We compile Datalog queries into main-memory pipelined hash-join plans in C++, and show that the resulting programs easily outperform PostgreSQL on real graphs with different degrees of skew. We then produce analogous parallel programs for Grappa, a runtime system for distributed memory architectures. Grappa is a good target for building a parallel query system, as its shared-memory programming model and communication mechanisms provide productivity and performance when building communication-intensive applications. Our experiments suggest that Grappa programs using hash joins have performance competitive with queries executed on Greenplum, a commercial parallel database. We find preliminary evidence that a code-generation approach simplifies the design of a query engine for graph analysis and improves performance over conventional relational databases.
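The shape of such a compiled plan is easy to sketch. The following Python mirrors a pipelined hash-join plan for triangle counting, building one hash table on the edge relation and streaming probes through it; this is a hedged illustration of the plan shape, not the paper's generated C++ or Grappa code, and the example edge set is invented.

```python
from collections import defaultdict

def count_triangles(edges):
    """Count triangles via the 3-way self-join
       Edge(a,b), Edge(b,c), Edge(a,c)  with a < b < c to avoid
    counting a triangle more than once. `edges` is a set of undirected
    pairs. Build one hash table on the first column, then pipeline
    probes through it, as a hash-join plan would."""
    # Keep each undirected edge once, oriented low -> high.
    directed = {(min(u, v), max(u, v)) for (u, v) in edges}
    succ = defaultdict(set)            # hash table: a -> {b : (a,b) in E}
    for a, b in directed:
        succ[a].add(b)
    count = 0
    for a, b in directed:              # probe: extend (a,b) to (a,b,c)
        for c in succ[b]:              # join with Edge(b,c)
            if c in succ[a]:           # join with Edge(a,c): cycle closed
                count += 1
    return count

print(count_triangles({(1, 2), (2, 3), (1, 3), (3, 4)}))   # 1
```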
GRAPHiQL: A Graph Intuitive Query Language for Relational Databases
"... Abstract—Graph analytics is becoming increasingly popular, driving many important business applications from social net-work analysis to machine learning. Since most graph data is collected in a relational database, it seems natural to attempt to perform graph analytics within the relational environ ..."
Abstract:
Graph analytics is becoming increasingly popular, driving many important business applications from social network analysis to machine learning. Since most graph data is collected in a relational database, it seems natural to attempt to perform graph analytics within the relational environment. However, SQL, the query language for relational databases, makes it difficult to express graph analytics operations. This is because SQL requires programmers to think in terms of tables and joins, rather than the more natural representation of graphs as collections of nodes and edges. As a result, even relatively simple graph operations can require very complex SQL queries. In this paper, we present GRAPHiQL, an intuitive query language for graph analytics, which allows developers to reason in terms of nodes and edges. GRAPHiQL provides key graph constructs such as looping, recursion, and neighborhood operations. At runtime, GRAPHiQL compiles graph programs into efficient SQL queries that can run on any relational database. We demonstrate the applicability of GRAPHiQL on several applications and compare the performance of GRAPHiQL queries with those of Apache Giraph (a popular 'vertex-centric' graph programming framework).
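The mismatch the abstract describes is visible even on a tiny query. The sketch below, in Python with sqlite3, contrasts the nodes-and-edges reading of a common-neighbor count with the self-join SQL a relational engine needs for the same question; the syntax is not GRAPHiQL's, and the table, graph, and values are invented for the example.

```python
import sqlite3

# Graph-style reading: for nodes u and v, count the neighbors they share.
def common_neighbors(adj, u, v):
    return len(set(adj.get(u, ())) & set(adj.get(v, ())))

adj = {1: [2, 3], 2: [3, 4], 3: [4]}
print(common_neighbors(adj, 1, 2))   # 1 (they share node 3)

# Relational reading of the same query: a self-join on an edge table,
# the kind of SQL a tool like GRAPHiQL would emit (illustrative only).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE edge (src INTEGER, dst INTEGER)")
db.executemany("INSERT INTO edge VALUES (?, ?)",
               [(1, 2), (1, 3), (2, 3), (2, 4), (3, 4)])
row = db.execute("""
    SELECT COUNT(*) FROM edge e1 JOIN edge e2
    ON e1.dst = e2.dst
    WHERE e1.src = 1 AND e2.src = 2
""").fetchone()
print(row[0])   # 1, the same answer via joins
```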
Navigating the Maze of Graph Analytics Frameworks Using Massive Graph Datasets
Parallel Computing Lab, Intel Labs
"... Graph algorithms are becoming increasingly important for analyz-ing large datasets in many fields. Real-world graph data follows a pattern of sparsity, that is not uniform but highly skewed to-wards a few items. Implementing graph traversal, statistics and machine learning algorithms on such data in ..."
Abstract:
Graph algorithms are becoming increasingly important for analyzing large datasets in many fields. Real-world graph data follows a pattern of sparsity that is not uniform but highly skewed towards a few items. Implementing graph traversal, statistics, and machine learning algorithms on such data in a scalable manner is quite challenging. As a result, several graph analytics frameworks (GraphLab, CombBLAS, Giraph, SociaLite, and Galois, among others) have been developed, each offering a solution with different programming models and targeted at different users. Unfortunately, the "Ninja performance gap" between optimized code and most of these frameworks is very large (2-30X for most frameworks and up to 560X for Giraph) for common graph algorithms, and moreover varies widely with algorithms. This makes the end-users' choice of graph framework dependent not only on ease of use but also on performance. In this work, we offer a quantitative roadmap for improving the performance of all these frameworks and bridging the "ninja gap". We first present hand-optimized baselines that get performance close to hardware limits and higher than any published performance figure for these graph algorithms. We characterize the performance of both this native implementation and popular graph frameworks on a variety of algorithms. This study helps end-users delineate bottlenecks arising from the algorithms themselves vs. programming model abstractions vs. the framework implementations. Further, by analyzing the system-level behavior of these frameworks, we identify bottlenecks that are agnostic to specific algorithms. We recommend changes to alleviate these bottlenecks (and implement some of them) to reduce the performance gap with respect to native code. These changes will enable end-users to choose frameworks based mostly on ease of use.
Simit: A Language for Physical Simulation
Adobe Systems, 2015
"... Using existing programming tools, writing high-performance simulation code is labor intensive and requires sacrificing readability and portability. The alternative is to prototype simulations in a high-level language like Matlab, thereby sacrificing performance. The Matlab programming model naturall ..."
Abstract:
Using existing programming tools, writing high-performance simulation code is labor intensive and requires sacrificing readability and portability. The alternative is to prototype simulations in a high-level language like Matlab, thereby sacrificing performance. The Matlab programming model naturally describes the behavior of an entire physical system using the language of linear algebra. However, simulations also manipulate individual geometric elements, which are best represented using linked data structures like meshes. Translating between the linked data structures and linear algebra comes at significant cost, both to the programmer and the machine. High-performance implementations avoid the cost by rephrasing the computation in terms of linked or index data structures, leaving the code complicated and monolithic, often increasing its size by an order of magnitude. In this paper, we present Simit, a new language for physical simulations that lets the programmer view the system both as a linked data structure in the form of a hypergraph, and as a set of global vectors, matrices, and tensors …
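The dual view described here, a mesh of elements on one side and global vectors and matrices on the other, can be sketched without Simit's syntax. The following Python assembles a global stiffness matrix from per-spring contributions and applies it as a global operator; the toy spring system and all names are assumptions made for illustration, not Simit code.

```python
from collections import defaultdict

# Sketch of the two views Simit unifies (illustrative, not Simit syntax):
# the mesh is a linked structure of elements over vertices; simulation
# also needs a global matrix assembled from per-element contributions.

springs = [(0, 1, 10.0), (1, 2, 5.0)]   # (vertex i, vertex j, stiffness k)

K = defaultdict(float)                   # global stiffness matrix, COO-style
for i, j, k in springs:                  # assemble per-element 2x2 blocks
    K[(i, i)] += k
    K[(j, j)] += k
    K[(i, j)] -= k
    K[(j, i)] -= k

def matvec(K, x):
    """Apply the assembled global operator to a global vector."""
    y = [0.0] * len(x)
    for (i, j), v in K.items():
        y[i] += v * x[j]
    return y

print(matvec(K, [1.0, 0.0, 0.0]))        # [10.0, -10.0, 0.0]
```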
Continuous query processing; Temporal analytics; Dynamic social networks; Incremental computation.