Results 1 - 10 of 19
Magellan: A Searchable Metadata Architecture for Large-Scale File Systems
2009
Cited by 9 (0 self)
As file systems continue to grow, metadata search is becoming an increasingly important way to access and manage files. However, existing solutions that build a separate metadata database outside of the file system face consistency and management challenges at large scales. To address these issues, we developed Magellan, a new large-scale file system metadata architecture that enables the file system’s metadata to be efficiently and directly searched. This allows Magellan to avoid the consistency and management challenges of a separate database, while providing performance comparable to that of other large file systems. Magellan enables metadata search by introducing several techniques to metadata server design. First, Magellan uses a new on-disk inode layout that makes metadata retrieval efficient for searches. Second, Magellan indexes inodes in data structures that enable fast, multi-attribute search and allow all metadata lookups, including directory searches, to be handled as queries. Third, a query routing technique helps to keep the search space small, even at large scales. Fourth, a new journaling mechanism enables efficient update performance and metadata reliability. An evaluation with real-world metadata from a file system shows that, by combining these techniques, Magellan is capable of searching millions of files in under a second, while providing metadata performance comparable to, and sometimes better than, other large-scale file systems.
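As an illustrative aside on the second technique above (serving directory lookups and attribute searches through a single query interface), here is a minimal sketch in Python. It is not Magellan's implementation; the Inode fields and the in-memory index structure are assumptions made only for this example.

```python
# Illustrative sketch only: a tiny in-memory, multi-attribute inode index in
# which a directory listing and an attribute search share one query path.
# Field names (parent, owner, size) are assumptions, not Magellan's layout.
from dataclasses import dataclass

@dataclass
class Inode:
    ino: int
    parent: int   # inode number of the containing directory
    name: str
    owner: str
    size: int

class MetadataIndex:
    def __init__(self):
        self.inodes = {}      # ino -> Inode
        self.by_parent = {}   # parent ino -> set of child inos

    def insert(self, inode):
        self.inodes[inode.ino] = inode
        self.by_parent.setdefault(inode.parent, set()).add(inode.ino)

    def query(self, **attrs):
        """Return inodes matching every given attribute (multi-attribute search)."""
        candidates = self.inodes.values()
        # A directory listing is just a query constrained on 'parent'.
        if "parent" in attrs:
            candidates = (self.inodes[i]
                          for i in self.by_parent.get(attrs.pop("parent"), ()))
        return [node for node in candidates
                if all(getattr(node, k) == v for k, v in attrs.items())]

idx = MetadataIndex()
idx.insert(Inode(ino=3, parent=2, name="report.txt", owner="alice", size=4096))
idx.insert(Inode(ino=4, parent=2, name="data.bin", owner="bob", size=1 << 20))
print(idx.query(parent=2))                  # readdir expressed as a query
print(idx.query(owner="alice", size=4096))  # multi-attribute metadata search
```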
Just-in-time analytics on large file systems
In FAST ’11, 2011
Cited by 5 (0 self)
As file systems reach the petabyte scale, users and administrators are increasingly interested in acquiring high-level analytical information for file management and analysis. Two particularly important tasks are the processing of aggregate and top-k queries which, unfortunately, cannot be quickly answered by hierarchical file systems such as ext3 and NTFS. Existing pre-processing based solutions, e.g., file system crawling and index building, consume a significant amount of time and space (for generating and maintaining the indexes) which in many cases cannot be justified by the infrequent usage of such solutions. In this paper, we advocate that user interests can often be sufficiently satisfied by approximate, i.e., statistically accurate, answers. We develop Glance, a just-in-time sampling-based system which, after consuming a small number of disk accesses, is capable of producing extremely accurate answers for a broad class of aggregate and top-k queries over a file system without the requirement of any prior knowledge. We use a number of real-world file systems to demonstrate the efficiency, accuracy and scalability of Glance.
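The following sketch illustrates the general idea behind just-in-time, sampling-based aggregation over a directory tree: a random descent from the root reaches each file with a computable probability, and weighting the observed value by the inverse of that probability gives an unbiased estimate of an aggregate such as total file size. The tree encoding and function names are assumptions for illustration, not Glance's code.

```python
# Illustrative sketch of sampling-based aggregate estimation over a directory
# tree (the general idea of just-in-time analytics, not Glance's code).
import random

def random_descent(tree):
    """Walk from the root, picking a uniform random child at each level.
    Returns (value, probability) for the file reached, or (0.0, None) if the
    descent dead-ends in an empty directory."""
    node, prob = tree, 1.0
    while isinstance(node, dict):      # dict = directory, number = file size
        if not node:
            return 0.0, None           # empty directory contributes nothing
        prob /= len(node)
        node = random.choice(list(node.values()))
    return node, prob

def estimate_total_size(tree, samples=1000):
    """Estimate the sum of file sizes from `samples` random descents."""
    total = 0.0
    for _ in range(samples):
        value, prob = random_descent(tree)
        if prob:
            total += value / prob      # inverse-probability weighting
    return total / samples

# Toy file system; the exact total size is 600.
fs = {"a": {"x": 100, "y": 200}, "b": {"z": 300}}
print(estimate_total_size(fs))         # close to 600 for enough samples
```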
Copernicus: A Scalable, High-Performance Semantic File System
2009
Cited by 5 (1 self)
Hierarchical file systems do not effectively meet the needs of users at the petabyte-scale. Users need dynamic, search-based file access in order to properly manage and use their growing sea of data. This paper presents the design of Copernicus, a new scalable, semantic file system that provides a searchable namespace for billions of files. Instead of augmenting a traditional file system with a search index, Copernicus uses a dynamic, graph-based file system design that indexes file attributes and relationships to provide scalable search and navigation of files.
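A minimal sketch of the graph-based idea described above: files are nodes with indexed attributes, relationships are labeled edges, and a "directory" is simply the result of a query. The class, attribute names, and edge labels are assumptions for illustration, not Copernicus's design.

```python
# Illustrative sketch of a searchable, graph-based namespace in the spirit of
# a semantic file system; not Copernicus's actual data structures.
from collections import defaultdict

class SemanticNamespace:
    def __init__(self):
        self.attrs = {}                     # file id -> {attribute: value}
        self.edges = defaultdict(set)       # (file id, label) -> linked file ids
        self.attr_index = defaultdict(set)  # (attribute, value) -> file ids

    def add_file(self, fid, **attributes):
        self.attrs[fid] = attributes
        for k, v in attributes.items():
            self.attr_index[(k, v)].add(fid)

    def relate(self, src, label, dst):
        """Record a relationship edge, e.g. ('report.pdf', 'derived_from', 'data.csv')."""
        self.edges[(src, label)].add(dst)

    def search(self, **attributes):
        """Dynamic 'directory': all files matching every given attribute."""
        sets = [self.attr_index[(k, v)] for k, v in attributes.items()]
        return set.intersection(*sets) if sets else set()

    def neighbors(self, fid, label):
        """Navigate along a relationship instead of a path component."""
        return self.edges[(fid, label)]

ns = SemanticNamespace()
ns.add_file("report.pdf", owner="alice", project="fusion")
ns.add_file("data.csv", owner="alice", project="fusion")
ns.relate("report.pdf", "derived_from", "data.csv")
print(ns.search(owner="alice", project="fusion"))
print(ns.neighbors("report.pdf", "derived_from"))
```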
Propeller: A scalable metadata organization for a versatile searchable file system
2011
Cited by 1 (1 self)
The exponentially increasing amount of data in file systems has made it increasingly important for file systems to provide fast file-search services. The quality of the file-search services is significantly affected by the file-index overhead, the file-search responsiveness and the accuracy of search results. Unfortunately, the existing file-search solutions either are so poorly scalable that their performance degrades unacceptably when the systems scale up, or incur so much ...
Security Aware Partitioning for Efficient File System Search
Cited by 1 (1 self)
Index partitioning techniques—where indexes are broken into multiple distinct sub-indexes—are a proven way to improve metadata search speeds and scalability for large file systems, permitting early triage of the file system. A partitioned metadata index can rule out irrelevant files and quickly focus on files that are more likely to match the search criteria. Also, in a large file system that contains many users, a user’s search should not include confidential files the user doesn’t have permission to view. To meet these two parallel goals, we propose a new partitioning algorithm, Security Aware Partitioning, that integrates security with the partitioning method to enable efficient and secure file system search. In order to evaluate our claim of improved efficiency, we compare the results of Security Aware Partitioning to six other partitioning methods, including implementations of the metadata partitioning algorithms of SmartStore and Spyglass, two recent systems doing partitioned search in similar environments. We propose a general set of criteria for comparing partitioning algorithms, and use them to evaluate the partitioning algorithms. Our results show that Security Aware Partitioning can provide excellent search performance at a low computational cost to build indexes, O(n). Based on metrics such as information gain, we also conclude that expensive clustering algorithms do not offer enough benefit to make them worth the additional cost in time and memory.
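A minimal sketch of the general idea of security-aware index partitioning: sub-indexes are keyed by the set of users permitted to read the files they cover, so a search can skip whole partitions the searcher is not allowed to see. The partition key and record layout are assumptions made for this example, not the paper's exact algorithm.

```python
# Illustrative sketch of permission-aware partition pruning; not the paper's
# actual Security Aware Partitioning implementation.
from collections import defaultdict

class PartitionedIndex:
    def __init__(self):
        # frozenset of authorized users -> list of (path, metadata) records
        self.partitions = defaultdict(list)

    def insert(self, path, metadata, readers):
        self.partitions[frozenset(readers)].append((path, metadata))

    def search(self, user, predicate):
        results = []
        for readers, records in self.partitions.items():
            if user not in readers:
                continue                      # triage: whole partition ruled out
            results.extend(p for p, m in records if predicate(m))
        return results

idx = PartitionedIndex()
idx.insert("/home/alice/notes.txt", {"size": 4096}, readers={"alice"})
idx.insert("/shared/plan.doc", {"size": 123456}, readers={"alice", "bob"})
print(idx.search("bob", lambda m: m["size"] > 1000))  # sees only /shared/plan.doc
```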
Direct Lookup and Hash-Based Metadata Placement for Local File Systems
Cited by 1 (0 self)
New challenges to file systems’ metadata performance are imposed by the continuously growing number of files existing in file systems. The total amount of metadata can become too big to be cached, potentially leading to multiple storage device accesses for a single metadata lookup operation. This paper takes a look at the limitations of traditional file system designs and discusses an alternative metadata handling approach, using hash-based concepts already established for metadata and data placement in distributed storage systems. Furthermore, a POSIX compliant prototype implementation based on these concepts is introduced and benchmarked. A variety of file system metadata and data operations as well as the influence of different storage technologies are taken into account and performance is compared with traditional file systems.
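A minimal sketch of hash-based metadata placement for direct lookup: the full path is hashed to select a bucket, so a lookup costs one bucket probe instead of one directory traversal per path component. The bucket count and record layout are assumptions for illustration only, not the paper's prototype.

```python
# Illustrative sketch of hash-based metadata placement; not the paper's
# POSIX prototype.
import hashlib

class HashedMetadataStore:
    def __init__(self, num_buckets=1024):
        self.buckets = [dict() for _ in range(num_buckets)]

    def _bucket(self, path):
        digest = hashlib.sha1(path.encode()).digest()
        return self.buckets[int.from_bytes(digest[:4], "big") % len(self.buckets)]

    def put(self, path, metadata):
        self._bucket(path)[path] = metadata

    def lookup(self, path):
        # Direct lookup: no per-component directory walk.
        return self._bucket(path).get(path)

store = HashedMetadataStore()
store.put("/var/log/syslog", {"ino": 42, "size": 8192})
print(store.lookup("/var/log/syslog"))
```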
SmartStore: A New Metadata Organization Paradigm with Semantic-Awareness
Fast and flexible metadata retrieval is critical in next-generation data storage systems. As the storage capacity approaches the Exabyte level and the number of stored files reaches billions, directory-tree based metadata management widely ...
Data
Emerging HPC analytics applications urgently demand file-search services to drastically reduce the scale of the input data in real-time, so that the speed of computation and data analytics can be greatly accelerated. Unfortunately, the existing file-search solutions are either poorly scalable for large-scale systems, or lack a well-integrated interface to allow applications to easily use them for critical tasks. We believe that the time is ripe for the design of a searchable file system capable of accurate and scalable system-level file-search functionality.
Searchable File System
The exponentially increasing amount of data in file systems has made it increasingly important for file systems to provide fast file-search services. The quality of the file-search services is significantly affected by the file-index overhead, the file-search responsiveness and the accuracy of search results. Unfortunately, the existing file-search solutions either are so poorly scalable that their performance degrades unacceptably when the systems scale up, or incur such long crawling delays that they produce unacceptably inaccurate results. We believe that the time is ripe for the redesign of a searchable file system capable of accurate and scalable system-level file search.
File System and Pipes for Streaming Physical Data Management
The amount of sensor data produced continues to grow at a tremendous rate, from wired sensors distributed throughout an office building to sensor data collected from smartphones, PCs, and laptops. Most of ...