Results 1 - 10 of 10
Declarative Networking, 2009
"... Declarative Networking is a programming methodology that enables developers to concisely specify network protocols and services, which are directly compiled to a dataflow framework that executes the specifications. This paper provides an introduction to basic issues in declarative networking, includ ..."
Cited by 76 (31 self)
Declarative Networking is a programming methodology that enables developers to concisely specify network protocols and services, which are directly compiled to a dataflow framework that executes the specifications. This paper provides an introduction to basic issues in declarative networking, including language design, optimization and dataflow execution. We present the intuition behind declarative programming of networks, including roots in Datalog, extensions for networked environments, and the semantics of long-running queries over network state. We focus on a sublanguage we call Network Datalog (NDlog), including execution strategies that provide crisp eventual consistency semantics with significant flexibility in execution. We also describe a more general language called Overlog, which makes some compromises between expressive richness and semantic guarantees. We provide an overview of declarative network protocols, with a focus on routing protocols and overlay networks. Finally, we highlight related work in declarative networking, and new declarative approaches to related problems.
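To make the Datalog roots concrete, consider network reachability, the canonical example in this line of work: two recursive rules over a link relation. The Python sketch below evaluates them semi-naively on a single node; the relation names and the centralized evaluator are illustrative assumptions, since NDlog actually partitions such rules across nodes and compiles them to dataflows.

```python
# Sketch of the reachability rules often used to illustrate NDlog:
#   reachable(S, D) :- link(S, D).
#   reachable(S, D) :- link(S, Z), reachable(Z, D).
# This centralized semi-naive evaluator is an illustration only; a real
# NDlog runtime shards these rules across nodes and runs them as dataflows.

def reachable(links):
    """Transitive closure of a set of (src, dst) link facts."""
    facts = set(links)              # rule 1: every link is reachable
    delta = set(links)              # facts derived in the last round
    while delta:
        # rule 2: join link(S, Z) with newly derived reachable(Z, D)
        new = {(s, d) for (s, z) in links for (z2, d) in delta if z == z2}
        delta = new - facts         # keep only genuinely new facts
        facts |= delta
    return facts

if __name__ == "__main__":
    links = {("a", "b"), ("b", "c"), ("c", "d")}
    for fact in sorted(reachable(links)):
        print("reachable%s" % (fact,))
```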
The Declarative Imperative Experiences and Conjectures in Distributed Logic
"... The rise of multicore processors and cloud computing is putting enormous pressure on the software community to find solutions to the difficulty of parallel and distributed programming. At the same time, there is more—and more varied—interest in data-centric programming languages than at any time in ..."
Cited by 50 (5 self)
The rise of multicore processors and cloud computing is putting enormous pressure on the software community to find solutions to the difficulty of parallel and distributed programming. At the same time, there is more—and more varied—interest in data-centric programming languages than at any time in computing history, in part because these languages parallelize naturally. This juxtaposition raises the possibility that the theory of declarative database query languages can provide a foundation for the next generation of parallel and distributed programming languages. In this paper I reflect on my group’s experience over seven years using Datalog extensions to build networking protocols and distributed systems. Based on that experience, I present a number of theoretical conjectures that may both interest the database community and clarify important practical issues in distributed computing. Most importantly, I make a case for database researchers to take a leadership role in addressing the impending programming crisis. This is an extended version of an invited lecture at the ACM PODS 2010 conference [32].
Osprey: Implementing MapReduce-Style Fault Tolerance in a Shared-Nothing Distributed Database
"... Abstract — In this paper, we describe a scheme for tolerating and recovering from mid-query faults in a distributed shared nothing database. Rather than aborting and restarting queries, our system, Osprey, divides running queries into subqueries, and replicates data such that each subquery can be re ..."
Cited by 23 (0 self)
In this paper, we describe a scheme for tolerating and recovering from mid-query faults in a distributed shared-nothing database. Rather than aborting and restarting queries, our system, Osprey, divides running queries into subqueries, and replicates data such that each subquery can be rerun on a different node if the node initially responsible fails or returns too slowly. Our approach is inspired by the fault tolerance properties of MapReduce, in which map or reduce jobs are greedily assigned to workers, and failed jobs are rerun on other workers. Osprey is implemented using a middleware approach, with only a small amount of custom code to handle cluster coordination. Each node in the system is a discrete database system running on a separate machine. Data, in the form of tables, is partitioned amongst database nodes and each partition is replicated on several nodes, using a technique called chained declustering [1]. A coordinator machine acts as a standard SQL interface to users; it transforms an input SQL query into a set of subqueries that are then executed on the nodes. Each subquery represents only a small fraction of the total execution of the query; worker nodes are assigned a new subquery as they finish their current one. In this greedy approach, the amount of work lost due to node failure is small (at most one subquery’s work), and the system is automatically load balanced, because slow nodes will be assigned fewer subqueries. We demonstrate Osprey’s viability as a distributed system for a small data warehouse data set and workload. Our experiments show that the overhead introduced by the middleware is small compared to the workload, and that the system shows promising load balancing and fault tolerance properties.
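As a rough illustration of the greedy assignment loop the abstract describes, the sketch below re-queues a failed node's subquery so a replica can rerun it. The worker rotation, the execute callback, and the use of ConnectionError to signal node failure are all assumptions made for the sake of a runnable example; Osprey itself dispatches SQL subqueries to real database nodes over chained-declustered replicas.

```python
from collections import deque

# Toy sketch of Osprey-style greedy scheduling. A failed subquery is simply
# put back on the queue so another worker (holding a replica of the data)
# can rerun it; at most one subquery's work is lost per failure.

def run_query(subqueries, workers, execute):
    """execute(worker, subquery) -> result, raising ConnectionError on failure."""
    pending = deque(subqueries)
    idle = deque(workers)
    results = []
    while pending:
        subquery = pending.popleft()
        worker = idle[0]
        idle.rotate(-1)            # stand-in for "whichever worker is idle next"
        try:
            results.append(execute(worker, subquery))
        except ConnectionError:
            pending.append(subquery)  # rerun on a replica; a real system would
                                      # also blacklist permanently failed nodes
    return results
```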
Fast Checkpoint Recovery Algorithms for Frequently Consistent Applications
"... Advances in hardware have enabled many long-running applications to execute entirely in main memory. As a result, these applications have increasingly turned to database techniques to ensure durability in the event of a crash. However, many of these applications, such as massively multiplayer online ..."
Cited by 11 (1 self)
Advances in hardware have enabled many long-running applications to execute entirely in main memory. As a result, these applications have increasingly turned to database techniques to ensure durability in the event of a crash. However, many of these applications, such as massively multiplayer online games and main-memory OLTP systems, must sustain extremely high update rates – often hundreds of thousands of updates per second. Providing durability for these applications without introducing excessive overhead or latency spikes remains a challenge for application developers. In this paper, we take advantage of frequent points of consistency in many of these applications to develop novel checkpoint recovery algorithms that trade additional space in main memory for significantly lower overhead and latency. Compared to previous work, our new algorithms do not require any locking or bulk copies of the application state. Our experimental evaluation shows that one of our new algorithms attains nearly constant latency and reduces overhead by more than an order of magnitude for low to medium update rates. Additionally, in a heavily loaded main-memory transaction processing system, it still reduces overhead by more than a factor of two.
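The paper's algorithms are lock-free and avoid bulk copies; the sketch below only illustrates the broader idea of incremental snapshots taken at points of consistency, using two alternating buffers in the spirit of ping-pong schemes. The class, the key-value state, and the per-key dirty tracking are simplifying assumptions for illustration, not the algorithms evaluated in the paper.

```python
# Simplified illustration of checkpointing at points of consistency with two
# alternating snapshot buffers. Each buffer tracks which keys have gone stale
# since its own last refresh, so a checkpoint copies only changed entries
# (incremental, no bulk copy). This is a stand-in, not the paper's algorithms.

class PingPongCheckpointer:
    def __init__(self, state):
        self.state = dict(state)                    # live application state
        self.snapshots = [dict(state), dict(state)]
        self.dirty = [set(), set()]                 # stale keys per buffer
        self.current = 0

    def apply_update(self, key, value):
        self.state[key] = value
        self.dirty[0].add(key)      # key is now stale in both buffers
        self.dirty[1].add(key)

    def checkpoint(self):
        """Call at a point of consistency; returns a consistent image that is
        safe to write out until this buffer's turn comes around again."""
        snap = self.snapshots[self.current]
        for key in self.dirty[self.current]:
            snap[key] = self.state[key]             # refresh stale entries only
        self.dirty[self.current].clear()
        result, self.current = snap, self.current ^ 1
        return result
```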
Cogset: A High-Performance MapReduce Engine, 2011
"... MapReduce has become a widely employed programming model for large-scale data-intensive computations. Traditional MapReduce engines employ dynamic routing of data as a core mech-anism for fault tolerance and load balancing. An alternative mechanism is static routing, which reduces the need to store ..."
Cited by 2 (1 self)
MapReduce has become a widely employed programming model for large-scale data-intensive computations. Traditional MapReduce engines employ dynamic routing of data as a core mechanism for fault tolerance and load balancing. An alternative mechanism is static routing, which reduces the need to store temporary copies of intermediate data, but requires a tighter coupling between the components for storage and processing. The initial intuition motivating our work is that reading and writing less temporary data could improve performance, while the tight coupling of storage and processing could be leveraged to improve data locality. We therefore conjecture that a high-performance MapReduce engine can be based on static routing, while preserving the non-functional properties associated with traditional engines. To investigate this thesis, we design, implement, and experiment with Cogset, a distributed MapReduce engine that deviates considerably from the traditional design. We evaluate the performance of Cogset by comparing it to a widely used traditional MapReduce engine using a previously established benchmark. The results confirm our thesis that a high-performance MapReduce engine can be based on static routing, although analysis indicates ...
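A minimal sketch of the static-routing idea, assuming a fixed node list and a hash-based ownership function (both made up for illustration): because the node that will reduce each key is known up front, map output can be grouped for direct shipment to its destination instead of being staged in temporary files and routed dynamically.

```python
import hashlib

# Sketch of static routing: each intermediate key's destination node is fixed
# in advance by a hash function, so map output goes straight to its reducer's
# node. Node names are made up; Cogset couples this with its storage layer.

NODES = ["node0", "node1", "node2"]

def owner(key):
    """Statically route a key to the node that will reduce it."""
    digest = hashlib.md5(key.encode()).hexdigest()
    return NODES[int(digest, 16) % len(NODES)]

def map_and_route(records, mapper):
    """Group map output by its statically assigned destination node."""
    outbound = {node: [] for node in NODES}
    for record in records:
        for key, value in mapper(record):
            outbound[owner(key)].append((key, value))
    return outbound   # each list ships directly to its node, no re-routing

if __name__ == "__main__":
    words = ["the quick brown fox", "the lazy dog"]
    print(map_and_route(words, lambda line: [(w, 1) for w in line.split()]))
```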
Nectar: Automatic Management of Data and Computation in Data Centers
"... Managing data and computation is at the heart of data center computing. Manual management of data can lead to data loss, wasteful consumption of storage, and laborious bookkeeping. Lack of proper management of computation can result in lost opportunities to share common computations across multiple ..."
Managing data and computation is at the heart of data center computing. Manual management of data can lead to data loss, wasteful consumption of storage, and laborious bookkeeping. Lack of proper management of computation can result in lost opportunities to share common computations across multiple jobs or to compute results incrementally. Nectar is a system designed to address all the aforementioned problems. Nectar uses a novel approach that automates and unifies the management of data and computation in a data center. With Nectar, the result of a computation, called a derived dataset, is uniquely identified by the program that computes it, and together with the program is automatically managed by a data-center-wide caching service. All computations and uses of derived datasets are controlled by the system. The system automatically regenerates a derived dataset from its program if it is determined to be missing. Nectar greatly improves data center management and resource utilization: obsolete or infrequently used derived datasets are automatically garbage collected, and shared common computations are computed only once and reused by others. This paper describes the design and implementation of Nectar, and reports our evaluation of the system using both analysis of actual logs from a number of production clusters and an actual deployment on a 240-node cluster.
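The key mechanism is easy to sketch: identify a derived dataset by a fingerprint of the producing program plus its inputs, and consult a cache before recomputing. The in-process dictionary and the function names below are stand-ins for the data-center-wide service the paper describes.

```python
import hashlib

# Sketch of Nectar's central idea: a derived dataset is keyed by a fingerprint
# of the program that produces it plus its inputs, so identical computations
# are run once and reused. The in-memory dict is a stand-in for the
# distributed caching service described in the paper.

_cache = {}

def fingerprint(program_source, input_ids):
    h = hashlib.sha256(program_source.encode())
    for input_id in sorted(input_ids):   # order-independent input identity
        h.update(input_id.encode())
    return h.hexdigest()

def run_cached(program_source, input_ids, execute):
    """Return the cached derived dataset, or compute and register it."""
    key = fingerprint(program_source, input_ids)
    if key not in _cache:
        _cache[key] = execute()          # cache miss: run the computation once
    return _cache[key]                   # later identical jobs reuse the result
```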
Applying Prolog to Develop Distributed Systems, 2003
"... Development of distributed systems is a difficult task. Declarative programming techniques hold a promising potential for effectively supporting programmer in this challenge. While Datalog-based languages have been actively explored for programming distributed systems, Prolog received relatively lit ..."
Development of distributed systems is a difficult task. Declarative programming techniques hold promising potential for effectively supporting programmers in this challenge. While Datalog-based languages have been actively explored for programming distributed systems, Prolog has received relatively little attention in this application area so far. In this paper, we investigate the applicability of a Prolog-based programming system, called DAHL, for the declarative development of distributed systems. For this task, we extend Prolog with an event-driven control mechanism and built-in networking procedures. Our experimental evaluation using a distributed hash table, a protocol for achieving Byzantine fault tolerance, and a distributed software model checker – all implemented in DAHL – indicates the viability of the approach.
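Transliterated out of Prolog, the event-driven control mechanism amounts to a dispatch loop that routes incoming network events to registered handlers. The Python sketch below is an analogy under that assumption; the decorator-based registration and the in-memory queue are inventions for illustration, while DAHL itself binds handlers to Prolog predicates and real sockets.

```python
import queue

# Analogy for DAHL's event-driven control, in Python: network events are
# queued and dispatched to registered handlers. Handler names and the
# in-memory queue are illustrative; DAHL dispatches to Prolog predicates
# fed by built-in networking procedures.

handlers = {}
events = queue.Queue()

def on(event_type):
    """Register a handler for an event type (akin to a DAHL predicate)."""
    def register(fn):
        handlers[event_type] = fn
        return fn
    return register

def dispatch_loop():
    while True:
        event_type, payload = events.get()
        if event_type == "stop":
            break
        handlers[event_type](payload)    # run the matching handler

@on("lookup")
def handle_lookup(key):
    print("looking up", key)             # e.g. a DHT lookup request

if __name__ == "__main__":
    events.put(("lookup", "some-key"))
    events.put(("stop", None))
    dispatch_loop()
```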
Introduction to Linguistic Annotation and Text Analytics, 2009
"... withMapReduce ..."