Results 1-10 of 13
Network Coding for Joint Storage and Transmission with Minimum Cost
In ISIT, 2006
Cited by 33 (0 self)
Network coding provides elegant solutions to many data transmission problems. The use of coding for distributed data storage has also been explored. In this work, we study a joint storage and transmission problem, where a source transmits a file to storage nodes whenever the file is updated, and clients read the file by retrieving data from the storage nodes. The cost includes the transmission cost for file updates and file reads, as well as the storage cost. We show that such a problem can be transformed into a pure flow problem and is solvable in polynomial time using linear programming. Coding is often necessary for obtaining the optimal solution with the minimum cost. However, we prove that for networks of generalized tree structures, where adjacent nodes can have asymmetric links between them, file splitting (instead of coding) is sufficient for achieving optimality. In particular, if there is no constraint on the number of bits that can be stored in storage nodes, there exists an optimal solution that always transmits and stores the file as a whole. The proof is accompanied by an algorithm that optimally assigns file segments to storage nodes.
Distributed Storage Allocations
2010
Cited by 14 (5 self)
We examine the problem of allocating a given total storage budget in a distributed storage system for maximum reliability. A source has a single data object that is to be coded and stored over a set of storage nodes; it is allowed to store any amount of coded data in each node, as long as the total amount of storage used does not exceed the given budget. A data collector subsequently attempts to recover the original data object by accessing only the data stored in a random subset of the nodes. By using an appropriate code, successful recovery can be achieved whenever the total amount of data accessed is at least the size of the original data object. The goal is to find an optimal storage allocation that maximizes the probability of successful recovery. This optimization problem is challenging in general because of its combinatorial nature, despite its simple formulation. We study several variations of the problem, assuming different allocation models and access models. The optimal allocation and the optimal symmetric allocation (in which all nonempty nodes store the same amount of data) are determined for a variety of cases. Our results indicate that the optimal allocations often have nonintuitive structure and are difficult to specify. We also show that depending on the circumstances, coding may or may not be beneficial for reliable storage.
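The recovery criterion in this abstract can be checked by brute force on small instances. The following Python sketch assumes the data object has unit size and that the data collector accesses a uniformly random subset of exactly r nodes, which is only one possible access model; the function name `recovery_probability` and the example allocations are illustrative and not taken from the paper.

```python
from fractions import Fraction
from itertools import combinations


def recovery_probability(allocation, r):
    """Probability that a uniformly random r-subset of the nodes holds at
    least one unit of coded data (object size normalized to 1, an assumption)."""
    amounts = [Fraction(a) for a in allocation]
    if not 0 <= r <= len(amounts):
        raise ValueError("r must be between 0 and the number of nodes")
    # Enumerate every possible r-subset; feasible only for small systems.
    subsets = list(combinations(amounts, r))
    successes = sum(1 for subset in subsets if sum(subset) >= 1)
    return Fraction(successes, len(subsets))


# Three ways to spend a total budget of 2 over 4 nodes, with r = 2 accessed.
half = Fraction(1, 2)
spread_evenly = [half, half, half, half]
two_full_copies = [1, 1, 0, 0]
three_nodes = [Fraction(2, 3), Fraction(2, 3), Fraction(2, 3), 0]

for name, alloc in [("spread evenly", spread_evenly),
                    ("two full copies", two_full_copies),
                    ("three nodes", three_nodes)]:
    print(name, recovery_probability(alloc, 2))
```

In this toy setting the three allocations use the same budget but give different recovery probabilities, which is the kind of comparison the allocation problem formalizes.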
Symmetric Allocations for Distributed Storage
Cited by 4 (2 self)
We consider the problem of optimally allocating a given total storage budget in a distributed storage system. A source has a data object which it can code and store over a set of storage nodes; it is allowed to store any amount of coded data in each node, as long as the total amount of storage used does not exceed the given budget. A data collector subsequently attempts to recover the original data object by accessing each of the nodes independently with some constant probability. By using an appropriate code, successful recovery occurs when the total amount of data in the accessed nodes is at least the size of the original data object. The goal is to find an optimal storage allocation that maximizes the probability of successful recovery. This optimization problem is challenging because of its discrete nature and nonconvexity, despite its simple formulation. Symmetric allocations (in which all nonempty nodes store the same amount of data), though intuitive, may be suboptimal; the problem is nontrivial even if we optimize over only symmetric allocations. Our main result shows that the symmetric allocation that spreads the budget maximally over all nodes is asymptotically optimal in a regime of interest. Specifically, we derive an upper bound for the suboptimality of this allocation and show that the performance gap vanishes asymptotically in the specified regime. Further, we explicitly find the optimal symmetric allocation for a variety of cases. Our results can be applied to distributed storage systems and other problems dealing with reliability under uncertainty, including delay-tolerant networks (DTNs) and content delivery networks (CDNs).
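For a symmetric allocation under independent access, the recovery probability reduces to a binomial tail, which the following Python sketch evaluates exactly. It assumes a unit-size data object, a budget split evenly over m nonempty nodes, and an access probability p per node; the function name `symmetric_recovery_probability` and the numeric examples are illustrative and not taken from the paper.

```python
from fractions import Fraction
from math import ceil, comb


def symmetric_recovery_probability(m, budget, p):
    """Recovery probability when `budget` is split evenly over m nodes and
    each node is accessed independently with probability p.

    Each nonempty node stores budget/m, so recovery needs at least
    ceil(m / budget) accessed nodes (object size normalized to 1)."""
    budget, p = Fraction(budget), Fraction(p)
    if m <= 0 or budget <= 0:
        raise ValueError("m and budget must be positive")
    needed = ceil(Fraction(m) / budget)
    # Binomial tail: P[number of accessed nodes >= needed].
    return sum(
        (comb(m, k) * p**k * (1 - p) ** (m - k) for k in range(needed, m + 1)),
        Fraction(0),
    )


# Budget 2: spread over 4 nodes versus concentrated on 2 nodes.
for p in (Fraction(1, 2), Fraction(3, 4)):
    print(p,
          symmetric_recovery_probability(4, 2, p),
          symmetric_recovery_probability(2, 2, p))
```

In this small example, spreading over four nodes is worse than using two nodes at p = 1/2 but better at p = 3/4, so even the choice among symmetric allocations depends on the parameters.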
Minimum Cost Mirror Sites Using Network Coding: Replication versus Coding at the Source Nodes
2011
Cited by 3 (0 self)
Content distribution over networks is often achieved by using mirror sites that hold copies of files or portions thereof to avoid congestion and delay issues arising from excessive demands to a single location. Accordingly, there are distributed storage solutions that divide the file into pieces and place copies of the pieces (replication) or coded versions of the pieces (coding) at multiple source nodes. We consider a network which uses network coding for multicasting the file. There is a set of source nodes that contains either subsets or coded versions of the pieces of the file. The cost of a given storage solution is defined as the sum of the storage cost and the cost of the flows required to support the multicast. Our interest is in finding the storage capacities and flows at minimum combined cost. We formulate the corresponding optimization problems by using the theory of information measures. In particular, we show that when there are two source nodes, there is no loss in considering subset sources. For three source nodes, we derive a tight upper bound on the cost gap between the coded and uncoded cases. We also present algorithms for determining the content of the source nodes.
Optimal interleaving on tori
In Proc. IEEE Int. Symp. Information Theory (ISIT 2004), 2004
Cited by 2 (0 self)
This paper studies t-interleaving on two-dimensional tori, which is defined by the property that every connected subgraph of order t in the torus is labelled by t distinct integers. This is the first time that the t-interleaving problem is solved for graphs of modular structures. t-interleaving on tori has applications in distributed data storage and burst error correction, and is closely related to Lee metric codes. We say that a torus can be perfectly t-interleaved if its t-interleaving number (the minimum number of distinct integers needed to t-interleave the torus) meets the sphere-packing lower bound. We prove the necessary and sufficient conditions for tori that can be perfectly t-interleaved, and present efficient perfect t-interleaving constructions. The most important contribution of this paper is to prove that when a torus is large enough in both dimensions, its t-interleaving number is at most one more than the sphere-packing lower bound, and to present an optimal and efficient t-interleaving scheme for such tori. Then we prove bounds for the t-interleaving numbers of the remaining cases, completing a general characterization of the t-interleaving problem on 2-dimensional tori.
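The defining property can be verified mechanically for a given labelling. The Python sketch below relies on an equivalent characterization that is assumed here rather than quoted from the paper: on a torus with at least t nodes, every connected subgraph of order t has distinct labels exactly when any two nodes sharing a label are at Lee distance at least t. The function names and the example labelling (i + 2j) mod 5 are illustrative choices.

```python
from collections import defaultdict
from itertools import combinations


def lee_distance(u, v, rows, cols):
    """Shortest-path distance between nodes u and v on a rows x cols torus."""
    di = abs(u[0] - v[0])
    dj = abs(u[1] - v[1])
    return min(di, rows - di) + min(dj, cols - dj)


def is_t_interleaved(labels, t):
    """Check that any two equally labelled nodes are at Lee distance >= t."""
    rows, cols = len(labels), len(labels[0])
    cells_by_label = defaultdict(list)
    for i in range(rows):
        for j in range(cols):
            cells_by_label[labels[i][j]].append((i, j))
    return all(
        lee_distance(u, v, rows, cols) >= t
        for cells in cells_by_label.values()
        for u, v in combinations(cells, 2)
    )


# Example labelling of a 5 x 5 torus with 5 distinct integers.
labels = [[(i + 2 * j) % 5 for j in range(5)] for i in range(5)]
print(is_t_interleaved(labels, 3))  # True
print(is_t_interleaved(labels, 4))  # False
```

The check is quadratic in the number of nodes per label, which is adequate for inspecting small tori but is not meant as an efficient construction.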
Distributed Storage Allocations
We examine the problem of allocating a given total storage budget in a distributed storage system for maximum reliability. A source has a single data object that is to be coded and stored over a set of storage nodes; it is allowed to store any amount of coded data in each node, as long as the total amount of storage used does not exceed the given budget. A data collector subsequently attempts to recover the original data object by accessing only the data stored in a random subset of the nodes. By using an appropriate code, successful recovery can be achieved whenever the total amount of data accessed is at least the size of the original data object. The goal is to find an optimal storage allocation that maximizes the probability of successful recovery. This optimization problem is challenging in general because of its combinatorial nature, despite its simple formulation. We study several variations of the problem, assuming different allocation models and access models. The optimal allocation and the optimal symmetric allocation (in which all nonempty nodes store the same amount of data) are determined for a variety of cases. Our results indicate that the optimal allocations often have nonintuitive structure and are difficult to specify. We also show that depending on the circumstances, coding may or may not be beneficial for reliable storage. Index Terms: data storage systems, distributed storage, network coding, reliability, storage allocation.
On Erasure Coding for Distributed Storage and Streaming Communications
2013
To my parents and grandparents. Acknowledgments: I would like to express my gratitude to my research adviser Tracey Ho for her guidance and infinite wisdom, generosity, and patience, and also for her constant encouragement and prodding to tighten this bound and generalize that theorem. I would also like to extend my appreciation to my research mentor and collaborator Alex Dimakis for sharing his intuitions and insights on various problems, and for his candid tips on preventing paper rejections and audience narcolepsy. My thanks also go to my other thesis committee members Michelle Effros, Steven Low, Babak Hassibi, and Jehoshua (Shuki) Bruck for their feedback and comments on improving my work. I would also like
Research Statement (1. General field of research)
My research interest is in the general field of information networks. My study and research are in the areas of algorithms, combinatorial and convex optimization, distributed systems and information theory. So far my research has focused on two fields: file storage in networks, and wireless ad hoc communication and sensor networks. I plan to use my research experience and knowledge to explore broader aspects of information networks, including overlay storage/distribution networks, sensor networks and many other forms, all essential for pervasive computing. Two key components shared by different kinds of information networks are data storage/sharing and network structure design/utilization. The first component, data storage/sharing, requires optimized placement of data for efficient access, even when the users of the data are extensively distributed, mobile or have very different communication and computing capabilities. Information theory can be applied to help both the storage and the retrieval of data to achieve an optimal performance/redundancy tradeoff. Examples include the storage of shared files in networks using erasure codes for high availability, rate allocation for nodes collecting data in sensor networks, fractional cascading of information for fast data detection and locating, multicast based on network coding, etc. The second component, network structure design/utilization, is on the design of real or overlay network
Hitting Set Algorithms for Fast Data Recovery in the Face of Geographic Correlated Attacks
In distributed storage networks, ensuring data availability in the presence of hardware faults is an important requirement. Typically, redundancy schemes such as replication and erasure coding are used to ensure this. In case of hardware failures, these networks may be disconnected into multiple components, each of which may require access to the data. In addition, the placement of redundant information must also be optimized as it is ever-changing and requires constant updating. We study the problem of selecting a set of nodes in networks of this kind so that data availability is maintained in the face of geographically correlated failures. We model failure events of arbitrary shapes as the union of disks or lines in the plane and present approximation algorithms for the problem of selecting a minimum number of redundant information locations (such as replicas or coded file segments) so that data recovery is guaranteed at every node in the face of any failure event. Using tools from computational geometry, our algorithms are efficient and provide good guarantees.
Minimum cost content distribution using network coding: Replication vs. coding at the source nodes by