Ceph: A scalable, high-performance distributed file system
- In Proceedings of the 7th Symposium on Operating Systems Design and Implementation (OSDI), 2006
Cited by 112 (21 self)
We have developed Ceph, a distributed file system that provides excellent performance, reliability, and scalability. Ceph maximizes the separation between data and metadata management by replacing allocation tables with a pseudo-random data distribution function (CRUSH) designed for heterogeneous and dynamic clusters of unreliable object storage devices (OSDs). We leverage device intelligence by distributing data replication, failure detection and recovery to semi-autonomous OSDs running a specialized local object file system. A dynamic distributed metadata cluster provides extremely efficient metadata management and seamlessly adapts to a wide range of general purpose and scientific computing file system workloads. Performance measurements under a variety of workloads show that Ceph has excellent I/O performance and scalable metadata management, supporting more than 250,000 metadata operations per second.
Dynamic Metadata Management for Petabyte-scale File Systems
Cited by 35 (8 self)
In petabyte-scale distributed file systems that decouple read and write from metadata operations, behavior of the metadata server cluster will be critical to overall system performance and scalability. We present a dynamic subtree partitioning and adaptive metadata management system designed to efficiently manage hierarchical metadata workloads that evolve over time. We examine the relative merits of our approach in the context of traditional workload partitioning strategies, and demonstrate the performance, scalability and adaptability advantages in a simulation environment.
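As a rough illustration of subtree partitioning, the sketch below maps each path to the metadata server that holds the nearest delegated ancestor directory, and lets a subtree be re-delegated at run time. The `SubtreeMap` class, its method names, and the server labels are hypothetical and are not taken from the paper, which also covers load measurement and migration policy that this sketch omits.

```python
from pathlib import PurePosixPath


class SubtreeMap:
    """Hypothetical table of subtree delegations (not the paper's data structure)."""

    def __init__(self, root_server):
        # The root directory is always delegated, so every absolute path resolves.
        self.delegations = {PurePosixPath("/"): root_server}

    def delegate(self, subtree, server):
        """Assign (or reassign) a whole subtree to one metadata server."""
        self.delegations[PurePosixPath(subtree)] = server

    def authority(self, path):
        """Return the server for the deepest delegated ancestor of an absolute path."""
        p = PurePosixPath(path)
        # Walk from the path itself up towards the root.
        for candidate in (p, *p.parents):
            if candidate in self.delegations:
                return self.delegations[candidate]
        raise KeyError(path)


mds = SubtreeMap("mds0")
mds.delegate("/home", "mds1")
print(mds.authority("/home/alice/notes.txt"))
# Re-delegating a hot subtree changes the answer only for paths beneath it.
mds.delegate("/home/alice", "mds2")
print(mds.authority("/home/alice/notes.txt"), mds.authority("/home/bob"))
```

Keeping whole subtrees on one server preserves directory locality, which is the property that a pure hash of the path gives up.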
CRUSH: Controlled, scalable, decentralized placement of replicated data
- In Proceedings of the 2006 ACM/IEEE Conference on Supercomputing (SC ’06), 2006
Cited by 32 (10 self)
Emerging large-scale distributed storage systems are faced with the task of distributing petabytes of data among tens or hundreds of thousands of storage devices. Such systems must evenly distribute data and workload to efficiently utilize available resources and maximize system performance, while facilitating system growth and managing hardware failures. We have developed CRUSH, a scalable pseudorandom data distribution function designed for distributed object-based storage systems that efficiently maps data objects to storage devices without relying on a central directory. Because large systems are inherently dynamic, CRUSH is designed to facilitate the addition and removal of storage while minimizing unnecessary data movement. The algorithm accommodates a wide variety of data replication and reliability mechanisms and distributes data in terms of user-defined policies that enforce separation of replicas across failure domains.
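The sketch below is not the CRUSH algorithm. It uses weighted rendezvous (highest-random-weight) hashing as a simpler stand-in that shares the properties named in the abstract: placement is computed from the object name alone, with no central directory; removing a device that holds no replica of an object leaves that object where it is; and replicas are kept in distinct failure domains. The device names, weights, and rack labels are made up for illustration.

```python
import hashlib
import math


def _unit_hash(obj_id, device):
    """Deterministically map an (object, device) pair to a float strictly inside (0, 1)."""
    digest = hashlib.sha256(f"{obj_id}:{device}".encode()).digest()
    top52 = int.from_bytes(digest[:8], "big") >> 12  # keep 52 bits so the float is exact
    return (top52 + 0.5) / 2**52


def _score(obj_id, device, weight):
    """Weighted rendezvous score: a larger weight wins proportionally more often."""
    return -weight / math.log(_unit_hash(obj_id, device))


def place(obj_id, devices, replicas=2):
    """Pick devices for an object.

    devices maps a device name to (weight, failure_domain).
    Devices are ranked by score; a device is skipped if its failure
    domain already holds a replica.
    """
    ranked = sorted(devices, key=lambda d: _score(obj_id, d, devices[d][0]), reverse=True)
    chosen, used_domains = [], set()
    for name in ranked:
        domain = devices[name][1]
        if domain in used_domains:
            continue
        chosen.append(name)
        used_domains.add(domain)
        if len(chosen) == replicas:
            break
    return chosen


cluster = {
    "osd0": (1.0, "rackA"),
    "osd1": (1.0, "rackA"),
    "osd2": (2.0, "rackB"),
    "osd3": (1.0, "rackC"),
}
print(place("object-42", cluster))
```

Any client holding the same device table computes the same answer, which is what removes the need for a lookup service in this kind of scheme.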
OBFS: A File System for Object-based Storage Devices
- In Proceedings of the 21st IEEE / 12th NASA Goddard Conference on Mass Storage Systems and Technologies, College Park, MD, 2004
Cited by 28 (6 self)
The object-based storage model, in which files are made up of one or more data objects stored on self-contained Object-Based Storage Devices (OSDs), is emerging as an architecture for distributed storage systems. The workload presented to the OSDs will be quite different from that of general-purpose file systems, yet many distributed file systems employ general-purpose file systems as their underlying file system. We present OBFS, a small and highly efficient file system designed for use in OSDs. Our experiments show that our user-level implementation of OBFS outperforms Linux Ext2 and Ext3 by a factor of two or three, and while OBFS is 1/25 the size of XFS, it provides only slightly lower read performance and 10% to 40% higher write performance.
Efficient Metadata Management in Large Distributed Storage Systems
- 2003
Cited by 20 (2 self)
Efficient metadata management is a critical aspect of overall system performance in large distributed storage systems. Directory subtree partitioning and pure hashing are two common techniques used for managing metadata in such systems, but both suffer from bottlenecks at very high concurrent access rates. We present a new approach called Lazy Hybrid (LH) metadata management that combines the best aspects of these two approaches while avoiding their shortcomings.
Towards an Object Store
- In Proceedings of the 20th IEEE / 11th NASA Goddard Conference on Mass Storage Systems and Technologies, 2003
Cited by 19 (2 self)
Today’s SAN architectures promise unmediated host access to storage (i.e., without going through a server). To achieve this promise, however, we must address several issues and opportunities raised by SANs, including security, scalability and management. Object storage, such as introduced by the NASD work [14], is a means of addressing these issues and opportunities. An object store raises the level of abstraction presented by a storage control unit from an array of 512-byte blocks to a collection of objects. The object store provides “fine-grain,” object-level security, improved scalability by localizing space management, and improved management by allowing end-to-end management of semantically meaningful entities. This paper presents a detailed description of how an object store works and describes the design of Antara, our prototype object store. For a cache hit workload, our pure software prototype is able to service roughly 14,000 4K I/O requests per second. We also present a layered security model for an object store which separates concerns of access security and network security, leveraging existing security infrastructure.
Object storage: The future building block for storage systems
- In 2nd International IEEE Symposium on Mass Storage Systems and Technologies, Sardinia, 2005
Cited by 13 (2 self)
The concept of object storage was introduced in the early 1990’s by the research community. Since then it has greatly matured and is now in its early stages of adoption by the industry. Yet, object storage is still not widely accepted. Viewing object store technology as the future building block, particularly for large storage systems, our team in IBM Haifa Research Lab has invested substantial efforts in this area. In this position paper we survey the latest developments in the area of object store technology, focusing on standardization, research prototypes, and technology adoption and deployment. A major step has been the approval of the T10 OSD protocol (version 1) as an OSD standard in late 2004. We also report on prototyping efforts that are carried out in IBM Haifa Research Lab in building an object store. Our latest prototype is compliant with a large subset of the T10 standard. To facilitate deployment of the new technology and protocol in the community at large, our team also implemented a T10-compliant OSD (iSCSI) initiator for Linux. The initiator is interoperable with object disks of other vendors. The initiator will be available as an open source driver for Linux.
1. Object Store in a Nutshell
An object store (ObS) or object storage device (OSD) enables the creation of self-managed, shared and secure storage for storage networks. This moves lower-level functionalities such as space management into the storage device itself, accessing the device through a standard object interface [12]. An object store (ObS) raises the level of abstraction presented by today’s block devices. Instead of presenting the abstraction of a logical array of unrelated blocks, addressed
Scalable security for large, high performance storage systems
- In Proceedings of the 2006 ACM Workshop on Storage Security and Survivability. ACM, 2006
Cited by 11 (4 self)
New designs for petabyte-scale storage systems are now capable of transferring hundreds of gigabytes of data per second, but lack strong security. We propose a scalable and efficient protocol for security in high performance, object-based storage systems that reduces protocol overhead and eliminates bottlenecks, thus increasing performance without sacrificing security primitives. Our protocol enforces security using cryptographically secure capabilities, with three novel features that make them ideal for high performance workloads: a scheme for managing coarse-grained capabilities, methods for describing client and file groups, and strict security control through capability lifetime extensions. By reducing the number of unique capabilities that must be generated, metadata server load is reduced. Combining and caching client verifications reduces client latencies and workload because metadata and data requests are more frequently serviced by cached capabilities. Strict access control is handled quickly and efficiently through short-lived capabilities and lifetime extensions. We have implemented a prototype of our security protocol and evaluated its performance and scalability using a high performance file system workload. Our numbers demonstrate the ability of our protocol to drastically reduce client security latency to nearly zero. Additionally, our approach improves MDS performance considerably, serving over 99% of all file access requests with cached capabilities. OSD scalability is greatly improved; our solution requires 95 times fewer capability verifications than previous solutions.
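To make the idea of a short-lived, coarse-grained capability concrete, here is a minimal sketch in which one token covers a group of clients and a group of files, carries an expiry time, and can be reissued with a later expiry. It assumes a single HMAC key shared by the metadata server and the storage devices; that is a simplification chosen for this example, and the abstract does not say which cryptographic primitive the paper's protocol uses. All function and field names are hypothetical.

```python
import hashlib
import hmac
import json


def _tag(key, body):
    """Authenticate the capability fields with HMAC-SHA256 (an assumed primitive)."""
    payload = json.dumps(body, sort_keys=True).encode()
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def issue(key, client_group, file_group, rights, expires):
    """Metadata-server side: one capability covers a client group and a file group."""
    body = {"clients": client_group, "files": file_group,
            "rights": rights, "expires": expires}
    return {**body, "tag": _tag(key, body)}


def verify(key, cap, now):
    """Storage-device side: check integrity and lifetime without asking the server."""
    body = {k: v for k, v in cap.items() if k != "tag"}
    authentic = hmac.compare_digest(cap.get("tag", ""), _tag(key, body))
    return authentic and now < body.get("expires", 0)


def extend(key, cap, now, extra_seconds):
    """Reissue a still-valid capability with a later expiry time."""
    if not verify(key, cap, now):
        raise PermissionError("capability is invalid or already expired")
    return issue(key, cap["clients"], cap["files"], cap["rights"], now + extra_seconds)


key = b"demo-key-shared-by-mds-and-osds"
cap = issue(key, "project-team", "/data/run7/*", "r", expires=1000.0)
print(verify(key, cap, now=900.0), verify(key, cap, now=1100.0))
```

Because a single token spans many clients and files, a device that has verified it once can cache the result, which is the effect the abstract credits for the drop in verification work.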
Leveraging Intra-object Locality with EBOFS
Cited by 7 (0 self)
The current and coming generations of large distributed file systems stripe data across large numbers of object-based storage devices (OSDs). Consequently, individual OSD workloads tend to exhibit no inter-object locality of reference. Small object sizes reduce OSD efficiency due to disk seek overheads. EBOFS, an extent- and B+tree-based object file system, allows arbitrarily sized objects and preserves intra-object locality of reference by allocating data contiguously on disk. It maintains high levels of contiguity even over the entire lifetime of a disk’s file system, allowing OSDs to operate more efficiently and distributed file systems to maximize performance.
Handling Heterogeneity in Shared-Disk File Systems
- In Proceedings of the 2003 ACM/IEEE Conference on Supercomputing (SC ’03), 2003
Cited by 6 (1 self)
We develop and evaluate a system for load management in shared-disk file systems built on clusters of heterogeneous computers. The system generalizes load balancing and server provisioning. It balances file metadata workload by moving file sets among cluster server nodes. It also responds to changes in server resources that arise from failure and recovery and from dynamically adding or removing servers. The system is adaptive and self-managing. It operates without any a priori knowledge of workload properties or the capabilities of the servers. Rather, it continuously tunes load placement using a technique called adaptive, non-uniform (ANU) randomization. ANU randomization realizes the scalability and metadata reduction benefits of hash-based, randomized placement techniques. It also avoids hashing's drawbacks: load skew, inability to cope with heterogeneity, and lack of tunability. Simulation results show that our load-management algorithm performs comparably to a prescient algorithm.

