Finding a Needle in Haystack: Facebook’s Photo Storage
In Proc. of OSDI, 2010
Cited by 10 (0 self)
This paper describes Haystack, an object storage system optimized for Facebook’s Photos application. Facebook currently stores over 260 billion images, which translates to over 20 petabytes of data. Users upload one billion new photos (∼60 terabytes) each week and Facebook serves over one million images per second at peak. Haystack provides a less expensive and higher-performing solution than our previous approach, which leveraged network-attached storage appliances over NFS. Our key observation is that this traditional design incurs an excessive number of disk operations because of metadata lookups. We carefully reduce this per-photo metadata so that Haystack storage machines can perform all metadata lookups in main memory. This choice conserves disk operations for reading actual data and thus increases overall throughput.
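The idea of keeping all per-photo metadata in main memory can be illustrated with a minimal sketch. It is not Haystack's actual design: the `Volume` class, its method names, and the `(offset, size)` index layout are assumptions made for illustration, and an in-memory buffer stands in for a large on-disk file.

```python
import io


class Volume:
    """Hypothetical append-only store: one large file plus an in-memory index."""

    def __init__(self):
        self._data = io.BytesIO()  # stands in for a single large on-disk file
        self._index = {}           # photo_id -> (offset, size), held in RAM

    def put(self, photo_id, blob):
        # Append the photo at the end of the file and record where it landed.
        offset = self._data.seek(0, io.SEEK_END)
        self._data.write(blob)
        self._index[photo_id] = (offset, len(blob))

    def get(self, photo_id):
        # The metadata lookup touches only the in-memory dictionary.
        offset, size = self._index[photo_id]
        # The single read against the file fetches the photo bytes themselves.
        self._data.seek(offset)
        return self._data.read(size)
```

In this sketch a read costs one dictionary lookup and one file read, which is the property the abstract attributes to in-memory metadata lookups.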
Rump File Systems: Kernel Code Reborn
Cited by 3 (1 self)
When kernel functionality is desired in userspace, the common approach is to reimplement it for userspace interfaces. We show that use of existing kernel file systems in userspace programs is possible without modifying the kernel file system code base. Two different operating modes are explored: 1) a transparent mode, in which the file system is mounted in the typical fashion by using the kernel code as a userspace server, and 2) a standalone mode, in which applications can use a kernel file system as a library. The first mode provides isolation from the trusted computing base and a secure way for mounting untrusted file systems on a monolithic kernel. The second mode is useful for file system utilities and applications, such as populating an image or viewing the contents without requiring host operating system kernel support. Additional uses for both modes include debugging, development and testing. The design and implementation of the Runnable Userspace Meta Program file system (rump fs) framework for NetBSD is presented. Using rump, ten disk-based file systems, a memory file system, a network file system and a userspace framework file system have been tested to be functional. File system performance for an estimated typical workload is found to be within ±5% of kernel performance. The prototype of a similar framework for Linux was also implemented and portability was verified: Linux file systems work on NetBSD and NetBSD file systems work on Linux. Finally, the implementation is shown to be maintainable by examining the 1.5-year period it has been a part of NetBSD.
Object-based SCM: An Efficient Interface for Storage Class Memories
Storage Class Memory (SCM) has become increasingly popular in enterprise systems as well as embedded and mobile systems. However, replacing hard drives with SCMs in current storage systems often forces either major changes in file systems or suboptimal performance, because the current block-based interface does not deliver enough information to the device to allow it to optimize data management for specific device characteristics such as the out-of-place update. To alleviate this problem and fully utilize different characteristics of SCMs, we propose the use of an object-based model that provides the hardware and firmware the ability to optimize performance for the underlying implementation, and allows drop-in replacement for devices based on new types of SCM. We discuss the design of object-based SCMs and implement an object-based flash memory prototype. By analyzing different design choices for several subsystems, such as data placement policies and index structures, we show that our object-based model provides comparable performance to other flash file systems while enabling advanced features such as object-level reliability.
TABLEFS: Embedding a NoSQL Database Inside the Local File System
2012
Conventional file systems are optimized for large file transfers instead of workloads that are dominated by metadata and small file accesses. This paper examines using techniques adopted from NoSQL databases to manage file system metadata and small files, which feature a high rate of changes and efficient out-of-core data representation. A FUSE file system prototype was built by storing file system metadata and small files in LevelDB, a modern key-value store. We demonstrate that such techniques can improve the performance of modern local file systems in Linux by as much as an order of magnitude for workloads dominated by metadata and tiny files.
TABLEFS: Enhancing Metadata Efficiency in the Local File System
2013
File systems that manage magnetic disks have long recognized the importance of sequential allocation and large transfer sizes for file data. Fast random access has dominated metadata lookup data structures with increasing use of on-disk B-trees. Yet our experiments indicate that even sophisticated local disk file systems like Ext4, XFS and Btrfs leave a lot of opportunity for performance improvement in workloads dominated by metadata and small files. In this paper we present a stacked file system, TABLEFS, which uses another local file system as an object store. TABLEFS organizes all metadata into a single sparse table backed on disk using a Log-Structured Merge (LSM) tree, LevelDB in our experiments. By stacking, TABLEFS asks only for efficient large file allocation and access from the local file system. By using an LSM tree, TABLEFS ensures metadata is written to disk in large, non-overwrite, sorted and indexed logs. Even an inefficient FUSE-based user-level implementation of TABLEFS can perform comparably to Ext4, XFS and Btrfs on data-intensive benchmarks, and can outperform them by 50% to as much as 1000% for metadata-intensive workloads. Such promising performance results from TABLEFS suggest that local disk file systems can be significantly improved by more aggressive aggregation and batching of metadata updates.
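A single sorted metadata table of the kind described above might look roughly like the following sketch. It is an illustration only, not the TABLEFS implementation: the `MetadataTable` class and the `(parent_inode, name)` key schema are assumptions, and a sorted Python list stands in for an LSM tree such as LevelDB.

```python
import bisect


class MetadataTable:
    """Hypothetical sorted table: (parent_inode, name) -> file attributes."""

    def __init__(self):
        self._keys = []   # kept sorted, standing in for an LSM tree's key order
        self._vals = {}   # key -> attributes

    def put(self, parent, name, attrs):
        key = (parent, name)
        if key not in self._vals:
            bisect.insort(self._keys, key)  # keep keys in sorted order
        self._vals[key] = attrs

    def lookup(self, parent, name):
        # Point lookup of one directory entry; returns None if absent.
        return self._vals.get((parent, name))

    def readdir(self, parent):
        # Entries of one directory are adjacent in key order,
        # so listing a directory is a single range scan.
        lo = bisect.bisect_left(self._keys, (parent, ""))
        hi = bisect.bisect_left(self._keys, (parent + 1, ""))
        return [name for _, name in self._keys[lo:hi]]
```

Because the assumed keys sort by parent inode first, all entries of a directory sit next to each other, which is one reason a sorted store suits metadata-heavy workloads.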

