Results 1 - 10 of 19
Dynamic Metadata Management for Petabyte-scale File Systems
Cited by 35 (8 self)
In petabyte-scale distributed file systems that decouple read and write from metadata operations, behavior of the metadata server cluster will be critical to overall system performance and scalability. We present a dynamic subtree partitioning and adaptive metadata management system designed to efficiently manage hierarchical metadata workloads that evolve over time. We examine the relative merits of our approach in the context of traditional workload partitioning strategies, and demonstrate the performance, scalability and adaptability advantages in a simulation environment.
An efficient data location protocol for self-organizing storage clusters
- In Proc. of ACM/IEEE SC’03, 2003
Cited by 5 (1 self)
Component additions and failures are common for large-scale storage clusters in production environments. To improve availability and manageability, we investigate and compare data location schemes for a large self-organizing storage cluster that can quickly adapt to the additions or departures of storage nodes. We further present an efficient location scheme that differentiates between small and large file blocks for reduced management overhead compared to uniform strategies. In our protocol, small blocks, which are typically in large quantities, are placed through consistent hashing. Large blocks, much fewer in practice, are placed through a usage-based policy, and their locations are tracked by Bloom filters. The proposed scheme results in improved storage utilization even with non-uniform cluster nodes. To achieve high scalability and fault resilience, this protocol is fully distributed, relies only on soft states, and supports data replication. We demonstrate the effectiveness and efficiency of this protocol through trace-driven simulation.
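The abstract above says small blocks are placed through consistent hashing. As a rough illustration of that general technique only (not the paper's protocol, whose details are not given here), the following sketch builds a toy hash ring. The class name `HashRing`, the node names, the block IDs, and the virtual-node count are all hypothetical choices made for this example.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string to a 64-bit position on the ring."""
    return int.from_bytes(hashlib.sha1(key.encode()).digest()[:8], "big")


class HashRing:
    """Toy consistent-hash ring (hypothetical names, not the paper's design)."""

    def __init__(self, nodes=(), vnodes=64):
        self.vnodes = vnodes  # virtual nodes per storage node (assumed value)
        self._points = []     # sorted ring positions
        self._owner = {}      # ring position -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        """Insert the node's virtual points into the ring."""
        for i in range(self.vnodes):
            point = _hash(f"{node}#{i}")
            bisect.insort(self._points, point)
            self._owner[point] = node

    def remove_node(self, node):
        """Drop every virtual point that belongs to the node."""
        self._points = [p for p in self._points if self._owner[p] != node]
        self._owner = {p: n for p, n in self._owner.items() if n != node}

    def locate(self, block_id):
        """Return the owner of the first ring point at or after the block's hash."""
        if not self._points:
            raise LookupError("ring is empty")
        i = bisect.bisect_left(self._points, _hash(block_id))
        return self._owner[self._points[i % len(self._points)]]


# Usage: adding a node relocates only the blocks that the new node takes over.
ring = HashRing(["node-a", "node-b", "node-c"])
blocks = [f"block-{i}" for i in range(1000)]
before = {b: ring.locate(b) for b in blocks}
ring.add_node("node-d")
after = {b: ring.locate(b) for b in blocks}
moved = [b for b in blocks if before[b] != after[b]]
```

In this sketch, every block in `moved` ends up on the newly added node while all other blocks keep their previous location, which is the property that lets a cluster absorb node additions and departures without a global reshuffle.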
Distributed Speculations: Providing Fault-tolerance and Improving Performance
, 2006
Experiences in Building an Object-Based Storage System based on the OSD T-10 Standard
- Proceedings of the IEEE Conference on Mass Storage Systems and Technologies, MSST, 2006
Cited by 3 (0 self)
With ever-increasing storage demands and management costs, object-based storage is on the verge of becoming the next standard storage interface. The American National Standards Institute (ANSI) ratified the object-based storage interface standard (also referred to as OSD T-10) in January 2005. In this paper we present our experiences building a reference implementation of the T-10 standard based on an initial implementation done at Intel Corporation. Our implementation consists of a file system, an object-based target, and a security manager. To the best of our knowledge, there is no reference implementation suite that is as complete as ours. Efforts are underway to open source our implementation very soon. We also present a performance analysis of our implementation and compare it with iSCSI-based SAN and NFS storage configurations. In the future, we intend to use this implementation as a platform to explore different forms of storage intelligence.
Sorrento: A Self-Organizing Storage Cluster for Parallel Data-Intensive Applications
Cited by 3 (1 self)
This paper describes the design and implementation of Sorrento – a self-organizing storage cluster built upon commodity components. Sorrento complements previous research on distributed file/storage systems by focusing on incremental expandability and manageability of the system and on design choices for optimizing performance of parallel data-intensive applications with low write-sharing patterns. Sorrento virtualizes distributed storage devices as incrementally expandable volumes and automatically manages storage node additions and failures. Its consistency model uses a version-based scheme for data updating and replica management, which is especially suitable for data-intensive applications where distributed processes access disjoint datasets most of the time. To further facilitate parallel I/O, Sorrento provides load-aware or locality-driven data placement and an adaptive migration strategy. This paper presents experimental results to demonstrate features and performance of Sorrento using both microbenchmarks and trace-replay of real applications from several domains, including scientific computing, data mining, and offline processing for web search.
Revisiting the Metadata Architecture of Parallel File Systems
Cited by 3 (1 self)
As the types of problems we solve in high-performance computing and other areas become more complex, the amount of data generated and used is growing at a rapid rate. Today many terabytes of data are common; tomorrow petabytes of data will be the norm. Much work has been put into increasing capacity and I/O performance for large-scale storage systems. However, one often-ignored area is metadata management. Metadata can have a significant impact on the performance of a system. Past approaches have moved metadata activities to a separate server in order to avoid potential interference with data operations. However, with the advent of object-based storage technology, there is a compelling argument to recouple metadata and data. In this paper we present two metadata management schemes, both of which remove the need for a separate metadata server and replace it with object-based storage.
An OSD-based approach to managing directory operations in parallel file systems
- In IEEE International Conference on Cluster Computing, 2008
Cited by 3 (2 self)
Distributed file systems that use multiple servers to store data in parallel are becoming commonplace. Much work has already gone into such systems to maximize data throughput. However, metadata management has historically been treated as an afterthought. In previous work we focused on improving metadata management techniques by placing file metadata along with data on Object-based Storage Devices (OSDs). However, we did not investigate directory operations. This work looks at the possibility of designing directory structures directly on OSDs, without the need for intervening servers. In particular, the need for atomicity is a fundamental requirement that we explore in depth. Through performance results of benchmarks and applications we show the feasibility of using OSDs directly for metadata, including directory operations.
Ceph: A Scalable Object-Based Storage System
, 2006
Cited by 2 (0 self)
The data storage needs of large high-performance and general-purpose computing environments are generally best served by distributed storage systems. Traditional solutions, exemplified by NFS, provide a simple distributed storage system model, but cannot meet the demands of high-performance computing environments where a single server may become a bottleneck, nor do they scale well due to the need to manually partition (or repartition) the data among the servers. Object-based storage promises to address these needs through a simple networked data storage unit, the Object Storage Device (OSD), which manages all local storage issues and exports a simple read/write data interface. Despite this simple concept, many challenges remain, including efficient object storage, centralized metadata management, data and metadata replication, and data and metadata reliability. We describe Ceph, a distributed object-based storage system that meets these challenges, providing high-performance file storage that scales directly with the number of OSDs and metadata servers.
Metadata driven filesystem
, 2005
Cited by 1 (0 self)
Filesystems should allow users to store data in a way that makes it easy for them to retrieve it. Everyone has their own method of naming files and directories to make them meaningful. This is moderately efficient for a person finding his or her own information, but can be quite confusing to others. One reason for this tight coupling between user and data is that the semantics for that data is encoded in a way that only the original user fully understands. I propose a better way of encoding semantics into filesystems. This system will give the data creator the same flexibility they have with directories and filenames, but encode the semantics in such a way that they can be useful for other users.

