Results 1 - 10
of
25
DryadLINQ: A System for General-Purpose Distributed Data-Parallel Computing Using a High-Level Language
"... DryadLINQ is a system and a set of language extensions that enable a new programming model for large scale distributed computing. It generalizes previous execution environments such as SQL, MapReduce, and Dryad in two ways: by adopting an expressive data model of strongly typed.NET objects; and by s ..."
Abstract
-
Cited by 273 (27 self)
- Add to MetaCart
(Show Context)
DryadLINQ is a system and a set of language extensions that enable a new programming model for large scale distributed computing. It generalizes previous execution environments such as SQL, MapReduce, and Dryad in two ways: by adopting an expressive data model of strongly typed.NET objects; and by supporting general-purpose imperative and declarative operations on datasets within a traditional high-level programming language. A DryadLINQ program is a sequential program composed of LINQ expressions performing arbitrary sideeffect-free transformations on datasets, and can be written and debugged using standard.NET development tools. The DryadLINQ system automatically and transparently translates the data-parallel portions of the program into a distributed execution plan which is passed to the Dryad execution platform. Dryad, which has been in continuous operation for several years on production clusters made up of thousands of computers, ensures efficient, reliable execution of this plan. We describe the implementation of the DryadLINQ compiler and runtime. We evaluate DryadLINQ on a varied set of programs drawn from domains such as web-graph analysis, large-scale log mining, and machine learning. We show that excellent absolute performance can be attained—a general-purpose sort of 1012 Bytes of data executes in 319 seconds on a 240-computer, 960disk cluster—as well as demonstrating near-linear scaling of execution time on representative applications as we vary the number of computers used for a job. 1
Distributed data-parallel computing using a highlevel programming language
- In SIGMOD’09: Proc. of the 35th SIGMOD Intnl. Conf. on Mgmt. of Data
, 2009
"... The Dryad and DryadLINQ systems offer a new programming model for large scale data-parallel computing. They generalize previous execution environments such as SQL and MapReduce in three ways: by providing a general-purpose distributed execution engine for data-parallel applications; by adopting an e ..."
Abstract
-
Cited by 38 (1 self)
- Add to MetaCart
(Show Context)
The Dryad and DryadLINQ systems offer a new programming model for large scale data-parallel computing. They generalize previous execution environments such as SQL and MapReduce in three ways: by providing a general-purpose distributed execution engine for data-parallel applications; by adopting an expressive data model of strongly typed.NET objects; and by supporting general-purpose imperative and declarative operations on datasets within a traditional high-level programming language. A DryadLINQ program is a sequential program composed of LINQ expressions performing arbitrary side-effect-free operations on datasets, and can be written and debugged using standard.NET development tools. The DryadLINQ system automatically and transparently translates the data-parallel portions of the program into a distributed execution plan which is passed to the Dryad execution platform. Dryad, which has been in continuous operation for several years on production clusters made up of thousands of computers, ensures efficient, reliable execution of this plan on a large compute cluster. This paper describes the programming model, provides a high-level overview of the design and implementation of the Dryad and DryadLINQ systems, and discusses the tradeoffs and connections to parallel and distributed databases.
The Hadoop Distributed Filesystem: Balancing Portability and Performance
"... Abstract—Hadoop is a popular open-source implementation of MapReduce for the analysis of large datasets. To manage storage resources across the cluster, Hadoop uses a distributed user-level filesystem. This filesystem — HDFS — is written in Java and designed for portability across heterogeneous hard ..."
Abstract
-
Cited by 27 (0 self)
- Add to MetaCart
(Show Context)
Abstract—Hadoop is a popular open-source implementation of MapReduce for the analysis of large datasets. To manage storage resources across the cluster, Hadoop uses a distributed user-level filesystem. This filesystem — HDFS — is written in Java and designed for portability across heterogeneous hardware and software platforms. This paper analyzes the performance of HDFS and uncovers several performance issues. First, architectural bottlenecks exist in the Hadoop implementation that result in inefficient HDFS usage due to delays in scheduling new MapReduce tasks. Second, portability limitations prevent the Java implementation from exploiting features of the native platform. Third, HDFS implicitly makes portability assumptions about how the native platform manages storage resources, even though native filesystems and I/O schedulers vary widely in design and behavior. This paper investigates the root causes of these performance bottlenecks in order to evaluate tradeoffs between portability and performance in the Hadoop distributed filesystem. I.
Tree Indexing on Solid State Drives
, 2010
"... Large flash disks, or solid state drives (SSDs), have become an attractive alternative to magnetic hard disks, due to their high random read performance, low energy consumption and other features. However, writes, especially small random writes, on flash disks are inherently much slower than reads b ..."
Abstract
-
Cited by 25 (5 self)
- Add to MetaCart
Large flash disks, or solid state drives (SSDs), have become an attractive alternative to magnetic hard disks, due to their high random read performance, low energy consumption and other features. However, writes, especially small random writes, on flash disks are inherently much slower than reads because of the erase-beforewrite mechanism. To address this asymmetry of read-write speeds in tree indexing on the flash disk, we propose FD-tree, a tree index designed with the logarithmic method and fractional cascading techniques. With the logarithmic method, an FD-tree consists of the head tree – a small B+-tree on the top, and a few levels of sorted runs of increasing sizes at the bottom. This design is write-optimized for the flash disk; in particular, an index search will potentially go through more levels or visit more nodes, but random writes are limited to a small area – the head tree, and are subsequently transformed into sequential ones through merging into the lower runs. With the fractional cascading technique, we store pointers, called fences, in lower level runs to speed up the search. Given an FD-tree of n entries, we analytically show that it performs an update in O(log B n) sequential I/Os and completes a search in O(log B n) random I/Os, where B is the flash page size. We evaluate FD-tree in comparison with representative B+-tree variants under a variety of workloads on three commodity flash SSDs. Our results show that FD-tree has a similar search performance to the standard B+-tree, and a similar update performance to the write-optimized B+-tree variant. As a result, FD-tree dominates the other B+-tree index variants on the overall performance on flash disks as well as on magnetic disks.
Unbundling Transaction Services in the Cloud
"... The traditional architecture for a DBMS engine has the recovery, concurrency control and access method code tightly bound together in a storage engine for records. We propose a different approach, where the storage engine is factored into two layers (each of which might have multiple heterogeneous i ..."
Abstract
-
Cited by 22 (1 self)
- Add to MetaCart
(Show Context)
The traditional architecture for a DBMS engine has the recovery, concurrency control and access method code tightly bound together in a storage engine for records. We propose a different approach, where the storage engine is factored into two layers (each of which might have multiple heterogeneous instances). A Transactional Component (TC) works at a logical level only: it knows about transactions and their ―logical ‖ concurrency control and undo/redo recovery, but it does not know about page layout, B-trees etc. A Data Component (DC) knows about the physical storage structure. It supports a record oriented interface that provides atomic operations, but it does not know about transactions. Providing atomic record operations may itself involve DC-local concurrency control and recovery, which can be implemented using system transactions. The interaction of the mechanisms in TC and DC leads to multi-level redo (unlike the repeat history paradigm for redo in integrated engines). This refactoring of the system architecture could allow easier deployment of application-specific physical structures and may also be helpful to exploit multi-core hardware. Particularly promising is its potential to enable flexible transactions in cloud database deployments. We describe the necessary principles for unbundled recovery, and discuss implementation issues.
A scalable lock manager for multicores
- In SIGMOD
, 2013
"... Modern implementations of DBMS software are intended to take advantage of high core counts that are becoming common in high-end servers. However, we have observed that several database platforms, including MySQL, Shore-MT, and a commercial system, exhibit throughput collapse as load increases, even ..."
Abstract
-
Cited by 7 (0 self)
- Add to MetaCart
(Show Context)
Modern implementations of DBMS software are intended to take advantage of high core counts that are becoming common in high-end servers. However, we have observed that several database platforms, including MySQL, Shore-MT, and a commercial system, exhibit throughput collapse as load increases, even for a workload with little or no logical contention for locks. Our analysis of MySQL identifies latch contention within the lock manager as the bottleneck responsible for this collapse. We design a lock manager with reduced latching, implement it in MySQL, and show that it avoids the collapse and generally improves performance. Our efficient implementation of a lock manager is enabled by a staged allocation and de-allocation of locks. Locks are pre-allocated in bulk, so that the lock manager only has to perform simple list-manipulation operations during the acquire and release phases of a transaction. De-allocation of the lock datastructures is also performed in bulk, which enables the use of fast implementations of lock acquisition and release, as well as concurrent deadlock checking.
A.: H2O: A Hands-free Adaptive Store
- SIGMOD
"... Modern state-of-the-art database systems are designed around a single data storage layout. This is a fixed decision that drives the whole architectural design of a database system, i.e., row-stores, column-stores. However, none of those choices is a universally good solution; different workloads req ..."
Abstract
-
Cited by 7 (3 self)
- Add to MetaCart
(Show Context)
Modern state-of-the-art database systems are designed around a single data storage layout. This is a fixed decision that drives the whole architectural design of a database system, i.e., row-stores, column-stores. However, none of those choices is a universally good solution; different workloads require different storage layouts and data access methods in order to achieve good performance. In this paper, we present the H2O system which introduces two novel concepts. First, it is flexible to support multiple storage layouts and data access patterns in a single engine. Second, and most importantly, it decides on-the-fly, i.e., during query process-ing, which design is best for classes of queries and the respective data parts. At any given point in time, parts of the data might be materialized in various patterns purely depending on the query workload; as the workload changes and with every single query, the storage and access patterns continuously adapt. In this way, H2O makes no a priori and fixed decisions on how data should be stored, allowing each single query to enjoy a storage and access pattern which is tailored to its specific properties. We present a detailed analysis of H2O using both synthetic bench-marks and realistic scientific workloads. We demonstrate that while existing systems cannot achieve maximum performance across all workloads, H2O can always match the best case performance with-out requiring any tuning or workload knowledge.
Quantifying Isolation Anomalies
"... Choosing a weak isolation level such as Read Committed is understood as a trade-off, where less isolation means that higher performance is gained but there is an increased possibility that data integrity will be lost. Previously, one side of this trade-off has been carefully studied quantitatively – ..."
Abstract
-
Cited by 5 (2 self)
- Add to MetaCart
(Show Context)
Choosing a weak isolation level such as Read Committed is understood as a trade-off, where less isolation means that higher performance is gained but there is an increased possibility that data integrity will be lost. Previously, one side of this trade-off has been carefully studied quantitatively – there are well-known metrics for performance such as transactions per minute, standardized benchmarks that measure these in a controlled way, and analytic models that can predict how performance is influenced by system parameters like multiprogramming level. This paper contributes to quantifying the other aspect of the trade-off. We define a novel microbenchmark that measures how rapidly integrity violations are produced at different isolation levels, for a simple set of transactions. We explore how this rate is impacted by configuration factors such as multiprogramming level, or contention frequency. For the isolation levels in multi-version platforms (Snapshot Isolation and the multiversion variant of Read Committed), we offer a simple probabilistic model that predicts the rate of integrity violations in our microbenchmark from configuration parameters. We validate the predictive model against measurements from the microbenchmark. The model identifies a region of the configuration space where a surprising inversion occurs: for these parameter settings, more integrity violations happen with Snapshot Isolation than with multi-version Read Committed, even though the latter is considered a lower isolation level. 1.
Modularity and Scalability in Calvin
"... Calvin is a transaction scheduling and replication management layer for distributed storage systems. By first writing transaction requests to a durable, replicated log, and then using a concurrency control mechanism that emulates a deterministic serial execution of the log’s transaction requests, Ca ..."
Abstract
-
Cited by 4 (2 self)
- Add to MetaCart
(Show Context)
Calvin is a transaction scheduling and replication management layer for distributed storage systems. By first writing transaction requests to a durable, replicated log, and then using a concurrency control mechanism that emulates a deterministic serial execution of the log’s transaction requests, Calvin supports strongly consistent replication and fully ACID distributed transactions while incurring significantly lower inter-partition transaction coordination costs than traditional distributed database systems. Furthermore, Calvin’s declarative specification of target concurrency-control behavior allows system components to avoid interacting with actual transaction scheduling mechanisms—whereas in traditional DBMSs, the analogous components often have to explicitly observe concurrency control modules’ (highly nondeterministic) procedural behaviors in order to function correctly. 1
PMAX: Tenant Placement in Multitenant Databases for
"... There has been a great interest in exploiting the cloud as a platform for database as a service. As with other cloud-based services, database services may enjoy cost efficiency through consolidation: hosting multiple databases within a single physical server. Aggressive consolidation, however, may h ..."
Abstract
-
Cited by 4 (0 self)
- Add to MetaCart
(Show Context)
There has been a great interest in exploiting the cloud as a platform for database as a service. As with other cloud-based services, database services may enjoy cost efficiency through consolidation: hosting multiple databases within a single physical server. Aggressive consolidation, however, may hurt the service quality, leading to SLA violation penalty, which in turn reduces the total business profit, called SLA profit. In this paper, we consider the problem of tenant placement in the cloud for SLA profit maximization, which, as will be shown in the paper, is strongly NP-hard. We propose SLA profit-aware solutions for database tenant placement based on our model for expected penalty computation for multitenant servers. Specifically, we present two approximation algorithms, which have constant approximation ratios, and we further discuss improving the quality of tenant placement using a dynamic programming algorithm. Extensive experiments based on TPC-W workload verified the performance of the proposed approaches.