Results 1–10 of 21
Object-based Storage
 In Proceedings of the 9th USENIX Conference on File and Storage Technologies (FAST '11), San Jose, CA, Feb. 15–17, 2011. The USENIX Association.
Abstract

Cited by 51 (1 self)
We propose an I/O classification architecture to close the widening semantic gap between computer systems and storage systems. By classifying I/O, a computer system can request that different classes of data be handled with different storage system policies. Specifically, when a storage system is first initialized, we assign performance policies to predefined classes, such as the filesystem journal. Then, online, we include a classifier with each I/O command (e.g., SCSI), thereby allowing the storage system to enforce the associated policy for each I/O that it receives. Our immediate application is caching. We present filesystem prototypes and a database proof-of-concept that classify all disk I/O — with very little modification to the filesystem, database, and operating system. We associate caching policies with various classes (e.g., large files shall be evicted before metadata and small files), and we show that end-to-end file system performance can be improved by over a factor of two, relative to conventional caches like LRU. And caching is simply one of many possible applications. As part of our ongoing work, we are exploring other classes, policies and storage system mechanisms that can be used to improve end-to-end performance, reliability and security.
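The class-based eviction idea can be sketched in a few lines. The class names, priorities, and the `ClassAwareCache` interface below are illustrative assumptions, not the paper's actual taxonomy or API: each cached block carries an I/O class, eviction removes the least-recently-used block of the most evictable class present, and LRU is used within a class.

```python
from collections import OrderedDict

# Hypothetical eviction priorities: higher values are evicted first.
EVICT_PRIORITY = {"metadata": 0, "journal": 0, "small_file": 1, "large_file": 2}

class ClassAwareCache:
    """LRU within a class; higher-priority classes are evicted first."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()  # block_id -> io_class, LRU order (oldest first)

    def access(self, block_id, io_class):
        """Returns True on a cache hit, False on a miss."""
        if block_id in self.entries:
            self.entries.move_to_end(block_id)  # refresh LRU position
            return True
        if len(self.entries) >= self.capacity:
            self._evict()
        self.entries[block_id] = io_class
        return False

    def _evict(self):
        # Pick the most evictable class present, then its oldest (LRU) block.
        worst = max(EVICT_PRIORITY[c] for c in self.entries.values())
        for block_id, io_class in self.entries.items():  # oldest first
            if EVICT_PRIORITY[io_class] == worst:
                del self.entries[block_id]
                return
```

With capacity 2, caching a metadata block and a large-file block and then inserting a small-file block evicts the large-file block, even though the metadata block is older, which is exactly where a class-aware policy diverges from plain LRU.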
On Generalized Gossiping and Broadcasting
 In Proceedings of the 11th Annual European Symposium on Algorithms (ESA), Lecture Notes in Comput. Sci. 2832
, 2003
Abstract

Cited by 5 (1 self)
The problems of gossiping and broadcasting have been widely studied. The basic gossip problem is defined as follows: there are n individuals, with each individual having an item of gossip. The goal is to communicate each item of gossip to every other individual.
Efficient disk replacement and data migration algorithms for large disk subsystems
, 2004
Abstract

Cited by 5 (0 self)
Random data placement has recently emerged as an alternative to traditional data striping. From a performance perspective, it has been demonstrated to be an efficient and scalable approach for large-scale storage systems. In this study we address the challenge of effectively managing the physical size of large data repositories. Specifically, we define the disk replacement problem (DRP) as the challenge of finding a sequence of disk additions and removals for a storage system while migrating the data and respecting the following constraints: (1) the data is initially balanced across the existing configuration, (2) the data must again be balanced across the new configuration, and (3) the data migration cost (either the amount of data moved or the elapsed time) must be minimized. Removing and adding disks in a large storage system may be required when devices are approaching the end of their life span (i.e., old disks are replaced with new ones) or when applications require increased storage space or performance. In practice, migrating data from old disks to new devices is complicated by the fact that the total number of disks that can physically be connected to the storage system is often limited by a fixed number of available slots, and not all the old and new disks can be connected at the same time. We present solutions for both cases, where the number of disk slots is either unconstrained or constrained. We introduce a cost model that allows our algorithms to either optimize for minimal data movement or shortest elapsed time. Additionally, we suggest a heuristic to minimize the time cost while reducing the computational complexity. Finally, we extensively compare and evaluate all algorithms with analytical models, and the results show that the presented approach provides efficient solutions to the disk replacement problem.
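The balance constraints imply simple lower bounds on migration cost. The two functions below are back-of-the-envelope bounds under the stated assumptions (equal-capacity disks, data measured in one unit), not the paper's algorithms: expanding a balanced system must move at least the data that ends up on the added disks, and replacing a disk must move everything stored on it.

```python
def min_migration_expand(total_data, old_disks, new_disks):
    """Lower bound on data moved when expanding a balanced system
    from old_disks to new_disks: at minimum, the data that must land
    on the added disks has to move there."""
    assert 0 < old_disks <= new_disks
    per_disk_after = total_data / new_disks
    return (new_disks - old_disks) * per_disk_after

def min_migration_replace(total_data, disks, replaced):
    """Lower bound when `replaced` of `disks` are swapped for new disks
    of equal capacity: everything on a removed disk must move off it."""
    assert 0 <= replaced <= disks
    return replaced * (total_data / disks)
```

For example, growing a balanced 4-disk system holding 100 units to 5 disks forces at least 20 units to move, while replacing one of the 4 disks forces at least 25 units to move, regardless of the migration schedule.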
Data Migration on Parallel Disks
Abstract

Cited by 4 (3 self)
Our work is motivated by the problem of managing data on storage devices, typically a set of disks. Such storage servers are used as web servers or multimedia servers, for handling high demand for data. As the system is running, it needs to dynamically respond to changes in demand for different data items. There are known algorithms for mapping demand to a layout. When the demand changes, a new layout is computed. In this work we study the data migration problem, which arises when we need to quickly change one layout to another. This problem has been studied earlier when for each disk the new layout has been prescribed. However, to apply these algorithms effectively, we identify another problem that we refer to as the correspondence problem, whose solution has a significant impact on the solution for the data migration problem. We examine algorithms for the data migration problem in more detail and identify variations of the basic algorithm that seem to improve performance in practice, even though some of the variations have poor worst case behavior.
Minimal Cost Reconfiguration of Data Placement in Storage Area Network
Abstract

Cited by 4 (1 self)
Video-on-Demand (VoD) services require frequent updates to the file configuration on the storage subsystem, so as to keep up with the frequent changes in movie popularity. This defines a natural reconfiguration problem in which the goal is to minimize the cost of moving from one file configuration to another. The cost is incurred by file replications performed throughout the transition. The problem shows up also in production planning, preemptive scheduling with setup costs, and dynamic placement of Web applications. We show that the reconfiguration problem is NP-hard already on very restricted instances. We then develop algorithms which achieve the optimal cost by using servers whose load capacities are increased by O(1), in particular, by a factor of 1 + δ for any small 0 < δ < 1 when the number of servers is fixed, and by a factor of 2 + ε for an arbitrary number of servers, for some ε ∈ [0, 1). To the best of our knowledge, this fundamental optimization problem is studied here for the first time.
Efficient data migration in self-managing storage systems
 In Proc. ICAC ’06
, 2006
Abstract

Cited by 3 (1 self)
Abstract — Self-managing storage systems automate the tasks of detecting hotspots and triggering data migration to alleviate them. This paper argues that existing data migration techniques do not minimize the data copying overhead incurred during reconfiguration, which in turn impacts application performance. We propose a novel technique that automatically detects hotspots and uses the bandwidth-to-space ratio metric to greedily reconfigure the system while minimizing the resulting data copying overhead. We validate our technique with simulations and a prototype implemented in the Linux kernel. Our prototype and simulations show that our algorithm successfully eliminates hotspots with a factor-of-two reduction in data copying overhead compared to other approaches.
Primal-dual algorithms for combinatorial optimization problems
, 2007
Abstract

Cited by 1 (0 self)
Combinatorial optimization problems such as routing, scheduling, covering and packing problems abound in everyday life. At a very high level, a combinatorial optimization problem amounts to finding a solution with minimum or maximum cost among a large number of feasible solutions. An algorithm for a given optimization problem is said to be exact if it always returns an optimal solution, and efficient if it runs in time polynomial in the size of its input. The theory of NP-completeness suggests that exact and efficient algorithms are unlikely to exist for the class of NP-hard problems. Unfortunately, a large number of natural and interesting combinatorial optimization problems are NP-hard. One way to cope with NP-hardness is to relax the optimality requirement and instead look for solutions that are provably close to the optimum. This is the main idea behind approximation algorithms. An algorithm is said to be a ρ-approximation if it always returns a solution whose cost is at most a factor ρ away from the optimal cost. Arguably, one of the most important techniques in the design of combinatorial algorithms is the primal-dual schema, in which the cost of the primal solution is compared to the cost of a dual solution. In this dissertation we study the primal-dual schema in the design of approximation algorithms for a number of covering and scheduling problems.
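The flavor of the primal-dual schema shows in its textbook instance, unweighted vertex cover; the sketch below is standard material, not taken from the dissertation. Repeatedly pick an uncovered edge, raise its dual variable until both endpoints' (unit-cost) constraints are tight, and add both endpoints to the cover; the dual lower-bounds the optimum, so the cover is at most twice optimal.

```python
def primal_dual_vertex_cover(edges):
    """Primal-dual 2-approximation for unweighted vertex cover.

    For each uncovered edge, raising its dual variable to 1 makes both
    endpoints' unit constraints tight, so both join the cover. The chosen
    edges form a matching, hence |cover| = 2 * |matching| <= 2 * OPT.
    """
    cover = set()
    for u, v in edges:
        if u not in cover and v not in cover:
            cover.update((u, v))  # both dual constraints just went tight
    return cover
```

On the path 1–2–3–4 this may return {1, 2, 3, 4} while an optimal cover is {2, 3}, matching the factor-2 guarantee.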
Hierarchical Replication
Abstract
Abstract—This paper describes work in progress whereby a dynamic data replication scheme, under market-based control, is applied to a proposed autonomic distributed data layer for managing configuration management data. The scope of the proposed autonomic system is described and some experimental work is presented. Analytic approximations of the performance achieved for management requests under various static data replication schemes are compared with event-based simulations of the same system under dynamic market-based replication control. The purpose of this comparison is to evaluate the performance and suitability of a market-based control approach for such autonomic replication systems.
unknown title
, 2008
Abstract
My research interest primarily lies in theoretical computer science, and more specifically in approximation algorithms and game theory. In the first half of my Ph.D., the focus of my research was mostly on approximation algorithms. After spending a summer at Yahoo! as an intern, I became interested in sponsored search problems and, as a result, in the new field of algorithmic game theory, which lies at the intersection of algorithms and game theory. During the past two years, I studied various problems related to sponsored search and electronic commerce, from both combinatorial and game-theoretic aspects. Next, I provide a short summary of the problems that I would like to work on in the future and a very short summary of the work that I have done during my Ph.D. Recurring Auctions: A large number of sellers on eBay sell their goods through eBay's ascending proxy auction. These sellers usually put only a small number of copies up for sale at any time, although they may have a large inventory. Most of the time, a bidder interested in buying a copy keeps participating in subsequent auctions until she wins. In the classical analysis of auctions, a bidder who does not win an auction is assumed to have a utility of 0, and that is used as the basis for the argument that truthful bidding is
Energy-Efficient Data Redistribution in Sensor Networks
Abstract
We address the energy-efficient data redistribution problem in data-intensive sensor networks (DISNs). In a DISN, a large volume of data is generated that is first stored in the network and is later collected for further analysis when the next uploading opportunity arises. The key concern in DISNs is to be able to redistribute the data from data-generating nodes into the network, under limited storage and energy constraints at the sensor nodes. We formulate the data redistribution problem where the objective is to minimize the energy consumption during this process, while guaranteeing full utilization of the distributed storage capacity in the DISNs. We show that the problem is APX-hard for arbitrary data sizes; therefore a polynomial-time approximation scheme is unlikely. For unit data sizes, we show that the problem is equivalent to the minimum-cost flow problem, which can be solved optimally. However, the optimal solution's centralized nature makes it unsuitable for large-scale distributed sensor networks. We thus design a distributed algorithm for the data redistribution problem which performs very close to the optimal, and compare its performance with various intuitive heuristics. The distributed algorithm relies on potential-function-based computations, incurs limited message and computational overhead at both the sensor nodes and data-generator nodes, and is easily implementable in a distributed manner.
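The unit-size case can be sketched with a generic successive-shortest-path minimum-cost flow solver; the graph model and function below are illustrative assumptions, not the authors' formulation. One would connect a super-source to the data-generating nodes, storage nodes to a super-sink with capacities equal to their free slots, and set edge costs to the per-item transmission energy.

```python
def min_cost_flow(n, edges, source, sink, flow_wanted):
    """Successive-shortest-path min-cost flow on n nodes.

    edges: list of (u, v, capacity, cost). Sends up to flow_wanted units
    (one per augmentation, matching unit data sizes) and returns
    (flow_sent, total_cost).
    """
    graph = [[] for _ in range(n)]  # adjacency with residual arcs

    def add(u, v, cap, cost):
        graph[u].append([v, cap, cost, len(graph[v])])
        graph[v].append([u, 0, -cost, len(graph[u]) - 1])

    for u, v, cap, cost in edges:
        add(u, v, cap, cost)

    flow = cost = 0
    while flow < flow_wanted:
        # Bellman-Ford shortest path by cost in the residual graph
        # (handles the negative-cost residual arcs).
        INF = float("inf")
        dist = [INF] * n
        dist[source] = 0
        parent = [None] * n  # (node, edge index) used to reach each node
        for _ in range(n - 1):
            for u in range(n):
                if dist[u] == INF:
                    continue
                for i, (v, cap, c, _) in enumerate(graph[u]):
                    if cap > 0 and dist[u] + c < dist[v]:
                        dist[v] = dist[u] + c
                        parent[v] = (u, i)
        if dist[sink] == INF:
            break  # no more augmenting paths
        # Push one unit along the cheapest path, updating residuals.
        v = sink
        while v != source:
            u, i = parent[v]
            graph[u][i][1] -= 1
            graph[v][graph[u][i][3]][1] += 1
            v = u
        flow += 1
        cost += dist[sink]
    return flow, cost
```

On a toy instance with two unit-capacity routes of cost 3 and 6 to the sink, the solver sends the first unit along the cheaper route and the second along the other, for a total cost of 9.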