Results 1 - 10 of 137
MapReduce: Simplified data processing on large clusters.
- In Proceedings of the Sixth Symposium on Operating System Design and Implementation (OSDI-04), 2004
"... Abstract MapReduce is a programming model and an associated implementation for processing and generating large data sets. Programs written in this functional style are automatically parallelized and executed on a large cluster of commodity machines. The run-time system takes care of the details of ..."
Cited by 3439 (3 self)
MapReduce is a programming model and an associated implementation for processing and generating large data sets. Programs written in this functional style are automatically parallelized and executed on a large cluster of commodity machines. The run-time system takes care of the details of partitioning the input data, scheduling the program's execution across a set of machines, handling machine failures, and managing the required inter-machine communication. This allows programmers without any experience with parallel and distributed systems to easily utilize the resources of a large distributed system. Our implementation of MapReduce runs on a large cluster of commodity machines and is highly scalable: a typical MapReduce computation processes many terabytes of data on thousands of machines. Programmers find the system easy to use: hundreds of MapReduce programs have been implemented and upwards of one thousand MapReduce jobs are executed on Google's clusters every day.
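As a rough illustration of the programming model described above (a sketch, not Google's implementation), the following Python word count expresses a job as a map function and a reduce function, with a small sequential driver standing in for the distributed run-time system; all names here are hypothetical.

```python
from collections import defaultdict

# Hypothetical word-count job in the map/reduce style the abstract describes:
# the user supplies only map_fn() and reduce_fn(); partitioning, scheduling,
# and fault handling would be the run-time system's job.
def map_fn(doc_id, text):
    for word in text.split():
        yield word, 1

def reduce_fn(word, counts):
    yield word, sum(counts)

def run_job(inputs, map_fn, reduce_fn):
    # Sequential stand-in for the distributed run-time: group intermediate
    # key/value pairs by key, then apply the reducer to each group.
    groups = defaultdict(list)
    for doc_id, text in inputs.items():
        for key, value in map_fn(doc_id, text):
            groups[key].append(value)
    return dict(kv for key, values in groups.items()
                for kv in reduce_fn(key, values))

if __name__ == "__main__":
    docs = {"d1": "the quick brown fox", "d2": "the lazy dog"}
    print(run_job(docs, map_fn, reduce_fn))  # {'the': 2, 'quick': 1, ...}
```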
The Google File System
- ACM SIGOPS Operating Systems Review, 2003
"... We have designed and implemented the Google File System, a scalable distributed file system for large distributed data-intensive applications. It provides fault tolerance while running on inexpensive commodity hardware, and it delivers high aggregate performance to a large number of clients. While s ..."
Cited by 1501 (3 self)
We have designed and implemented the Google File System, a scalable distributed file system for large distributed data-intensive applications. It provides fault tolerance while running on inexpensive commodity hardware, and it delivers high aggregate performance to a large number of clients. While sharing many of the same goals as previous distributed file systems, our design has been driven by observations of our application workloads and technological environment, both current and anticipated, that reflect a marked departure from some earlier file system assumptions. This has led us to reexamine traditional choices and explore radically different design points. The file system has successfully met our storage needs. It is widely deployed within Google as the storage platform for the generation and processing of data used by our service as well as research and development efforts that require large data sets. The largest cluster to date provides hundreds of terabytes of storage across thousands of disks on over a thousand machines, and it is concurrently accessed by hundreds of clients. In this paper, we present file system interface extensions designed to support distributed applications, discuss many aspects of our design, and report measurements from both micro-benchmarks and real world use.
Hippodrome: Running Circles around Storage Administration
- In Proceedings of the Conference on File and Storage Technologies, 2002
"... Enterprise-scale computer storage systems are extremely difficult to manage due to their size and complexity. It is difficult to generate a good storage system design for a given workload and to correctly implement the selected design. Traditionally, initial system configuration is performed by admi ..."
Cited by 142 (10 self)
Enterprise-scale computer storage systems are extremely difficult to manage due to their size and complexity. It is difficult to generate a good storage system design for a given workload and to correctly implement the selected design. Traditionally, initial system configuration is performed by administrators who are guided by rules of thumb. Unfortunately, this process involves trial and error, and as a result is tedious and error-prone. In this paper, we introduce Hippodrome, an approach to automating initial system configuration. Hippodrome is an iterative loop that analyzes an existing system to determine its requirements, creates a new storage system design to better meet these requirements, and migrates the existing system to the new design. In this paper, we show how Hippodrome automates initial system configuration.
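A minimal sketch of the iterative loop the abstract describes, with placeholder analysis and design rules (the real system's workload models and migration machinery are far more involved; all functions and constants here are hypothetical):

```python
# Hypothetical sketch of the analyze -> design -> migrate loop: measure the
# running system, derive a candidate configuration, apply it, and repeat
# until the design stops changing.
def analyze_workload(system):
    """Summarize the running system's observed I/O requirements."""
    return {"iops": system["observed_iops"], "capacity_gb": system["used_gb"]}

def design_storage(requirements):
    """Produce a configuration that meets the measured requirements."""
    disks = max(1, requirements["iops"] // 200)             # assume ~200 IOPS per disk
    disks = max(disks, requirements["capacity_gb"] // 500)  # assume ~500 GB per disk
    return {"disks": disks}

def migrate(system, design):
    """Apply the new design to the system (stub)."""
    system["config"] = design
    return system

def configuration_loop(system, max_iters=10):
    previous = None
    for _ in range(max_iters):
        design = design_storage(analyze_workload(system))
        if design == previous:      # converged: requirements are met
            break
        system = migrate(system, design)
        previous = design
    return system

print(configuration_loop({"observed_iops": 1200, "used_gb": 3000, "config": None}))
```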
Interposed Request Routing for Scalable Network Storage
- In Proceedings of the Fourth Symposium on Operating System Design and Implementation (OSDI), 2000
"... This paper presents Slice, a new storage system architecture for highspeed LANs incorporating network-attached block storage. Slice interposes a request switching filter -- called a /proxy -- along the network path between the client and the network storage system (e.g., in a network adapter or swit ..."
Cited by 103 (11 self)
This paper presents Slice, a new storage system architecture for high-speed LANs incorporating network-attached block storage. Slice interposes a request switching filter -- called a µproxy -- along the network path between the client and the network storage system (e.g., in a network adapter or switch). The purpose of the µproxy is to route requests among a server ensemble that implements the file service. We present a prototype that uses this approach to virtualize the standard NFS file protocol to provide scalable, high-bandwidth file service to ordinary NFS clients. The paper presents and justifies the architecture, proposes and evaluates several request routing policies realizable within the architecture, and explores the effects of these policies on service structure.
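To make the interposition idea concrete, here is a hypothetical request-routing filter that picks a server from the ensemble by hashing each request's file handle; this is an illustrative placeholder policy, not one of the routing policies evaluated in the paper:

```python
import hashlib
from dataclasses import dataclass

# Toy illustration of interposed request routing: a filter sitting on the
# client-to-server path selects a server from an ensemble for each request.
@dataclass
class NFSRequest:
    file_handle: bytes
    op: str          # e.g. "read", "write", "lookup"
    offset: int = 0

class RequestRouter:
    def __init__(self, servers):
        self.servers = servers  # addresses of the server ensemble

    def route(self, request: NFSRequest) -> str:
        # Hash the file handle so all requests for a file reach the same server.
        digest = hashlib.sha1(request.file_handle).digest()
        index = int.from_bytes(digest[:4], "big") % len(self.servers)
        return self.servers[index]

router = RequestRouter(["fs0:2049", "fs1:2049", "fs2:2049"])
req = NFSRequest(file_handle=b"\x01\x02\x03\x04", op="read", offset=4096)
print(router.route(req))
```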
The Kangaroo Approach to Data Movement on the Grid
2001
"... Access to remote data is one of the principal challenges of grid computing. While performing I/O, grid applications must be prepared for server crashes, performance variations, and exhausted resources. To achieve high throughput in such a hostile environment, applications need a resilient service th ..."
Cited by 102 (20 self)
Access to remote data is one of the principal challenges of grid computing. While performing I/O, grid applications must be prepared for server crashes, performance variations, and exhausted resources. To achieve high throughput in such a hostile environment, applications need a resilient service that moves data while hiding errors and latencies. We illustrate this idea with Kangaroo, a simple data movement system that makes opportunistic use of disks and networks to keep applications running. We demonstrate that Kangaroo can achieve better end-to-end performance than traditional data movement techniques, even though its individual components do not achieve high performance.
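A toy sketch of the staging idea, assuming a hypothetical send function: the application hands blocks to a local spool and continues, while a background mover ships them to the destination and retries on transient failures:

```python
import queue
import threading
import time

# Hypothetical staging sketch: data is buffered locally and forwarded in the
# background, hiding server errors and network latency from the application.
class StagedMover:
    def __init__(self, send_fn):
        self.spool = queue.Queue()
        self.send_fn = send_fn  # ships one block; may raise on transient failure
        threading.Thread(target=self._mover, daemon=True).start()

    def put(self, block):
        self.spool.put(block)   # returns immediately; data is spooled locally

    def _mover(self):
        while True:
            block = self.spool.get()
            while True:
                try:
                    self.send_fn(block)
                    break
                except OSError:
                    time.sleep(1.0)  # back off and retry after a failure

def demo_send(block):
    print(f"shipped {len(block)} bytes")

mover = StagedMover(demo_send)
mover.put(b"x" * 4096)   # the application continues without waiting on the network
time.sleep(0.1)          # give the background mover a moment in this demo
```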
Adaptive Query Processing: Technology in Evolution
- IEEE Data Engineering Bulletin, 2000
"... As query engines are scaled and federated, they must cope with highly unpredictable and changeable environments. In the Telegraph project, we are attempting to architect and implement a continuously adaptive query engine suitable for global-area systems, massive parallelism, and sensor networks. To ..."
Cited by 97 (11 self)
As query engines are scaled and federated, they must cope with highly unpredictable and changeable environments. In the Telegraph project, we are attempting to architect and implement a continuously adaptive query engine suitable for global-area systems, massive parallelism, and sensor networks. To set the stage for our research, we present a survey of prior work on adaptive query processing, focusing on three characterizations of adaptivity: the frequency of adaptivity, the effects of adaptivity, and the extent of adaptivity. Given this survey, we sketch directions for research in the Telegraph project.
Diamond: A storage architecture for early discard in interactive search
2004
"... Permission is granted for noncommercial reproduction of the work for educational or research purposes. ..."
Abstract
-
Cited by 66 (21 self)
- Add to MetaCart
(Show Context)
Permission is granted for noncommercial reproduction of the work for educational or research purposes.
Robustness in Complex Systems
- In Proceedings of the 8th Workshop on Hot Topics in Operating Systems (HotOS-VIII), 2001
"... This paper argues that a common design paradigm for systems is fundamentally awed, resulting in unstable, unpredictable behavior as the complexity of the system grows. In this awed paradigm, designers carefully attempt to predict the operating environment and failure modes of the system in order to ..."
Cited by 39 (0 self)
This paper argues that a common design paradigm for systems is fundamentally flawed, resulting in unstable, unpredictable behavior as the complexity of the system grows. In this flawed paradigm, designers carefully attempt to predict the operating environment and failure modes of the system in order to design its basic operational mechanisms. However, as a system grows in complexity, the diffuse coupling between the components in the system inevitably leads to the butterfly effect, in which small perturbations can result in large changes in behavior. We explore this in the context of distributed data structures, a scalable, cluster-based storage server. We then consider a number of design techniques that help a system to be robust in the face of the unexpected, including overprovisioning, admission control, introspection, and adaptivity through closed control loops. Ultimately, however, all complex systems eventually must contend with the unpredictable. Because of this, we believe systems should...
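One of the techniques listed, admission control, can be illustrated with a minimal hypothetical sketch (not the system studied in the paper): refuse new work once the pending queue passes a threshold, so overload produces predictable rejections rather than unpredictable behavior deep inside the system:

```python
# Minimal admission-control sketch: shed load at the front door once the
# number of pending requests exceeds a fixed threshold.
class AdmissionController:
    def __init__(self, max_pending=100):
        self.max_pending = max_pending
        self.pending = 0

    def try_admit(self):
        if self.pending >= self.max_pending:
            return False          # reject: the system is already saturated
        self.pending += 1
        return True

    def complete(self):
        self.pending -= 1         # a request finished; free its slot

ac = AdmissionController(max_pending=2)
print([ac.try_admit() for _ in range(3)])  # [True, True, False]
```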
Adaptive control of extreme-scale stream processing systems
- In ICDCS, 2006
"... Abstract — Distributed stream processing systems offer a highly scalable and dynamically configurable platform for time-critical applications ranging from real-time, exploratory data mining to high performance transaction processing. Resource management for distributed stream processing systems is c ..."
Cited by 37 (2 self)
Distributed stream processing systems offer a highly scalable and dynamically configurable platform for time-critical applications ranging from real-time, exploratory data mining to high performance transaction processing. Resource management for distributed stream processing systems is complicated by a number of factors – processing elements are constrained by their producer-consumer relationships, data and processing rates can be highly bursty, and traditional measures of effectiveness, such as utilization, can be misleading. In this paper, we propose a novel distributed, adaptive control algorithm that maximizes weighted throughput while ensuring stable operation in the face of highly bursty workloads. Our algorithm is designed to meet the challenges of extreme-scale stream processing systems, where overprovisioning is not an option, by making the best use of resources even when the proffered load is greater than available resources. We have implemented our algorithm in a real-world distributed stream processing system and a simulation environment. Our results show that our algorithm is not only self-stabilizing and robust to errors, but also outperforms traditional approaches over a broad range of buffer sizes, processing graphs, and burstiness types and levels.
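A toy illustration of the objective the abstract sets up (not the paper's distributed algorithm): with a processing budget smaller than the offered load, capacity is granted to higher-weight processing elements first so that weighted throughput is maximized; all names and numbers below are hypothetical:

```python
# Toy weighted-throughput allocation: serve higher-weight processing elements
# first until the processing budget runs out. For this simple fractional model,
# the greedy order by weight maximizes sum(weight * admitted_rate).
def allocate(elements, budget):
    """elements: list of (name, weight, offered_rate); budget: total capacity."""
    allocation = {}
    for name, weight, offered in sorted(elements, key=lambda e: -e[1]):
        granted = min(offered, budget)
        allocation[name] = granted
        budget -= granted
    return allocation

elements = [("pe_a", 3.0, 400), ("pe_b", 1.0, 500), ("pe_c", 2.0, 300)]
print(allocate(elements, budget=600))  # {'pe_a': 400, 'pe_c': 200, 'pe_b': 0}
```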
Bridging the Information Gap in Storage Protocol Stacks
- In Proceedings of the USENIX Annual Technical Conference (USENIX ’02), 2002
"... The functionality and performance innovations in file systems and storage systems have proceeded largely independently from each other over the past years. The result is an information gap: neither has information about how the other is designed or implemented, which can result in a high cost of mai ..."
Cited by 37 (6 self)
The functionality and performance innovations in file systems and storage systems have proceeded largely independently from each other over the years. The result is an information gap: neither has information about how the other is designed or implemented, which can result in a high cost of maintenance, poor performance, duplication of features, and limitations on functionality. To bridge this gap, we introduce and evaluate a new division of labor between the storage system and the file system. We develop an enhanced storage layer known as Exposed RAID (ERAID), which reveals information to file systems built above; specifically, ERAID exports the parallelism and failure-isolation boundaries of the storage layer, and tracks performance and failure characteristics on a fine-grained basis. To take advantage of the information made available by ERAID, we develop an Informed Log-Structured File System (ILFS). ILFS is an extension of the standard log-structured file system (LFS) that has been altered to take advantage of the performance and failure information exposed by ERAID. Experiments reveal that our prototype implementation yields benefits in the management, flexibility, reliability, and performance of the storage system, with only a small increase in file system complexity. For example, ILFS/ERAID can incorporate new disks into the system on-the-fly, dynamically balance workloads across the disks of the system, allow for user control of file replication, and delay replication of files for increased performance. Much of this functionality would be difficult or impossible to implement with the traditional division of labor between file systems and storage.
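A hypothetical sketch of the kind of interface the abstract describes: the storage layer exposes its parallelism and failure-isolation boundaries along with per-region performance and failure statistics, and an informed file system uses them when placing data (the names, fields, and placement rule below are illustrative, not the ERAID/ILFS API):

```python
from dataclasses import dataclass, field

# The storage layer reveals its boundaries (individual disks, modeled here as
# regions) plus fine-grained performance and failure statistics; the file
# system above consults them when choosing where to place a new segment.
@dataclass
class Region:
    disk_id: int
    free_blocks: int
    avg_latency_ms: float = 1.0   # tracked on a fine-grained basis
    recent_failures: int = 0

@dataclass
class ExposedStorage:
    regions: list = field(default_factory=list)

    def boundaries(self):
        """Report parallelism / failure-isolation boundaries to the file system."""
        return self.regions

def pick_region_for_segment(storage: ExposedStorage):
    # Toy informed-placement policy: avoid regions with recent failures and
    # prefer the lowest-latency remaining region.
    healthy = [r for r in storage.boundaries() if r.recent_failures == 0]
    return min(healthy or storage.boundaries(), key=lambda r: r.avg_latency_ms)

storage = ExposedStorage([Region(0, 10_000, 2.5), Region(1, 8_000, 1.2),
                          Region(2, 12_000, 0.9, recent_failures=3)])
print(pick_region_for_segment(storage).disk_id)  # -> 1
```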