Results 1 - 7 of 7
Network Coding for Joint Storage and Transmission with Minimum Cost
In ISIT, 2006
Cited by 20 (0 self)
Abstract: Network coding provides elegant solutions to many data transmission problems. The use of coding for distributed data storage has also been explored. In this work, we study a joint storage and transmission problem, where a source transmits a file to storage nodes whenever the file is updated, and clients read the file by retrieving data from the storage nodes. The cost includes the transmission cost for file updates and file reads, as well as the storage cost. We show that such a problem can be transformed into a pure flow problem and is solvable in polynomial time using linear programming. Coding is often necessary for obtaining the optimal solution with the minimum cost. However, we prove that for networks of generalized tree structures, where adjacent nodes can have asymmetric links between them, file splitting (instead of coding) is sufficient for achieving optimality. In particular, if there is no constraint on the number of bits that can be stored in storage nodes, there exists an optimal solution that always transmits and stores the file as a whole. The proof is accompanied by an algorithm that optimally assigns file segments to storage nodes.
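For orientation, the textbook minimum-cost flow linear program has the form below. This is a generic sketch, not the paper's specific transformation: the graph (V, E), per-unit edge costs c_{ij}, capacities u_{ij}, and node supplies b_i are illustrative symbols that do not appear in the abstract.

```latex
\begin{aligned}
\min_{f}\quad & \sum_{(i,j)\in E} c_{ij}\, f_{ij} \\
\text{s.t.}\quad & \sum_{j:(i,j)\in E} f_{ij} \;-\; \sum_{j:(j,i)\in E} f_{ji} \;=\; b_i
  && \forall\, i \in V, \\
& 0 \le f_{ij} \le u_{ij} && \forall\, (i,j) \in E .
\end{aligned}
```

Because both the objective and the constraints are linear in the flow variables f_{ij}, a problem of this shape can be solved in polynomial time by linear programming, which is the property the abstract relies on.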
Symmetric Allocations for Distributed Storage
Cited by 2 (1 self)
Abstract: We consider the problem of optimally allocating a given total storage budget in a distributed storage system. A source has a data object which it can code and store over a set of storage nodes; it is allowed to store any amount of coded data in each node, as long as the total amount of storage used does not exceed the given budget. A data collector subsequently attempts to recover the original data object by accessing each of the nodes independently with some constant probability. By using an appropriate code, successful recovery occurs when the total amount of data in the accessed nodes is at least the size of the original data object. The goal is to find an optimal storage allocation that maximizes the probability of successful recovery. This optimization problem is challenging because of its discrete nature and nonconvexity, despite its simple formulation. Symmetric allocations (in which all nonempty nodes store the same amount of data), though intuitive, may be suboptimal; the problem is nontrivial even if we optimize over only symmetric allocations. Our main result shows that the symmetric allocation that spreads the budget maximally over all nodes is asymptotically optimal in a regime of interest. Specifically, we derive an upper bound for the suboptimality of this allocation and show that the performance gap vanishes asymptotically in the specified regime. Further, we explicitly find the optimal symmetric allocation for a variety of cases. Our results can be applied to distributed storage systems and other problems dealing with reliability under uncertainty, including delay tolerant networks (DTNs) and content delivery networks (CDNs).
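The recovery condition for a symmetric allocation can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' method: the object size is normalized to 1, the budget is spread evenly over m nonempty nodes, and the names `symmetric_recovery_probability` and `best_symmetric_spread` are hypothetical. Recovery then succeeds when at least ceil(m / budget) of the m nonempty nodes are accessed, which is a binomial tail probability.

```python
from fractions import Fraction
from math import ceil, comb


def symmetric_recovery_probability(budget, p, m):
    """Recovery probability when `budget` is split evenly over m nodes.

    Assumptions (not from the abstract): the data object has size 1, and
    `budget` is an int or Fraction so the threshold below is exact.
    Each node is accessed independently with probability p.
    """
    # Each nonempty node stores budget / m, so k accessed nodes suffice
    # exactly when k * budget / m >= 1, i.e. k >= ceil(m / budget).
    k_min = ceil(Fraction(m) / Fraction(budget))
    if k_min > m:
        return 0.0
    # Binomial tail: P[at least k_min of the m nonempty nodes are accessed].
    return sum(comb(m, k) * p**k * (1 - p) ** (m - k) for k in range(k_min, m + 1))


def best_symmetric_spread(n, budget, p):
    """Number of nonempty nodes (1..n) that maximizes the recovery probability."""
    return max(range(1, n + 1),
               key=lambda m: symmetric_recovery_probability(budget, p, m))


if __name__ == "__main__":
    for p in (0.5, 0.9):
        print(p, best_symmetric_spread(4, 2, p))
```

With 4 nodes and a budget of 2, this toy search picks 2 nonempty nodes at p = 0.5 and all 4 nodes at p = 0.9, a small instance of why the choice among symmetric allocations is not obvious.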
Optimal interleaving on tori
In Proc. IEEE Int. Symp. Information Theory (ISIT 2004), 2004
Cited by 2 (0 self)
This paper studies t-interleaving on two-dimensional tori, which is defined by the property that every connected subgraph of order t in the torus is labelled by t distinct integers. This is the first time that the t-interleaving problem is solved for graphs of modular structures. t-interleaving on tori has applications in distributed data storage and burst error correction, and is closely related to Lee metric codes. We say that a torus can be perfectly t-interleaved if its t-interleaving number (the minimum number of distinct integers needed to t-interleave the torus) meets the sphere-packing lower bound. We prove the necessary and sufficient conditions for tori that can be perfectly t-interleaved, and present efficient perfect t-interleaving constructions. The most important contribution of this paper is to prove that when a torus is large enough in both dimensions, its t-interleaving number is at most one more than the sphere-packing lower bound, and to present an optimal and efficient t-interleaving scheme for such tori. Then we prove bounds for the t-interleaving numbers of the remaining cases, completing a general characterization of the t-interleaving problem on two-dimensional tori.
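The defining property can be checked by brute force on small tori. The sketch below is illustrative only and rests on an assumption not stated in the abstract: it tests the equivalent-looking condition that any two nodes at Lee (wrap-around Manhattan) distance less than t carry different labels, since two such nodes can be joined by a path of at most t nodes. The function names and the example labelings are hypothetical.

```python
from itertools import product


def lee_distance(a, b, rows, cols):
    """Wrap-around Manhattan distance between nodes a and b on a rows x cols torus."""
    dr = abs(a[0] - b[0])
    dc = abs(a[1] - b[1])
    return min(dr, rows - dr) + min(dc, cols - dc)


def is_t_interleaved(labels, t):
    """Check the distance form of t-interleaving for a 2D list of labels.

    Assumption: "every connected subgraph of order t has t distinct labels"
    is tested as "no two nodes at Lee distance < t share a label".
    """
    rows, cols = len(labels), len(labels[0])
    nodes = list(product(range(rows), range(cols)))
    for idx, a in enumerate(nodes):
        for b in nodes[idx + 1:]:
            same_label = labels[a[0]][a[1]] == labels[b[0]][b[1]]
            if same_label and lee_distance(a, b, rows, cols) < t:
                return False
    return True


# Hypothetical example labelings: a checkerboard on a 4 x 4 torus (2 labels)
# and the linear labeling (i + 2j) mod 5 on a 5 x 5 torus (5 labels).
checkerboard = [[(i + j) % 2 for j in range(4)] for i in range(4)]
linear_mod5 = [[(i + 2 * j) % 5 for j in range(5)] for i in range(5)]

if __name__ == "__main__":
    print(is_t_interleaved(checkerboard, 2), is_t_interleaved(linear_mod5, 3))
```

The pairwise check costs time quadratic in the number of nodes, which is fine for validating a candidate labeling on a small torus but is not a construction method.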
Distributed Storage Allocations
2010
Cited by 1 (0 self)
We examine the problem of allocating a given total storage budget in a distributed storage system for maximum reliability. A source has a single data object that is to be coded and stored over a set of storage nodes; it is allowed to store any amount of coded data in each node, as long as the total amount of storage used does not exceed the given budget. A data collector subsequently attempts to recover the original data object by accessing only the data stored in a random subset of the nodes. By using an appropriate code, successful recovery can be achieved whenever the total amount of data accessed is at least the size of the original data object. The goal is to find an optimal storage allocation that maximizes the probability of successful recovery. This optimization problem is challenging in general because of its combinatorial nature, despite its simple formulation. We study several variations of the problem, assuming different allocation models and access models. The optimal allocation and the optimal symmetric allocation (in which all nonempty nodes store the same amount of data) are determined for a variety of cases. Our results indicate that the optimal allocations often have nonintuitive structure and are difficult to specify. We also show that depending on the circumstances, coding may or may not be beneficial for reliable storage.
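To make the recovery criterion concrete, the sketch below evaluates an arbitrary allocation under one assumed access model: the collector reads a uniformly random subset of exactly r nodes. The abstract mentions several access models without specifying them here, so this choice, the normalization of the object size to 1, and the name `recovery_probability` are all assumptions made for illustration.

```python
from fractions import Fraction
from itertools import combinations


def recovery_probability(allocation, r):
    """Exact probability that a uniformly random r-subset of nodes suffices.

    `allocation` lists the amount stored at each node (object size = 1).
    Recovery succeeds when the accessed amounts sum to at least 1.
    """
    subsets = list(combinations(allocation, r))
    successes = sum(1 for chosen in subsets if sum(chosen) >= 1)
    return Fraction(successes, len(subsets))


if __name__ == "__main__":
    budget_allocations = {
        "spread over 4 nodes": [Fraction(3, 8)] * 4,
        "spread over 3 nodes": [Fraction(1, 2)] * 3 + [Fraction(0)],
        "asymmetric": [Fraction(1), Fraction(1, 2), Fraction(0), Fraction(0)],
    }
    # All three allocations use the same total budget of 3/2.
    for name, allocation in budget_allocations.items():
        print(name, recovery_probability(allocation, 2))
```

Using `Fraction` keeps the threshold comparison exact, which matters because the success condition is a hard cutoff at the object size. Enumerating subsets is exponential in general, so this is only suitable for small examples.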
Minimum Cost Mirror Sites Using Network Coding: Replication versus Coding at the Source Nodes
2011
Content distribution over networks is often achieved by using mirror sites that hold copies of files or portions thereof to avoid congestion and delay issues arising from excessive demands to a single location. Accordingly, there are distributed storage solutions that divide the file into pieces and place copies of the pieces (replication) or coded versions of the pieces (coding) at multiple source nodes. We consider a network which uses network coding for multicasting the file. There is a set of source nodes that contains either subsets or coded versions of the pieces of the file. The cost of a given storage solution is defined as the sum of the storage cost and the cost of the flows required to support the multicast. Our interest is in finding the storage capacities and flows at minimum combined cost. We formulate the corresponding optimization problems by using the theory of information measures. In particular, we show that when there are two source nodes, there is no loss in considering subset sources. For three source nodes, we derive a tight upper bound on the cost gap between the coded and uncoded cases. We also present algorithms for determining the content of the source nodes.
Minimum cost content distribution using network coding: Replication vs. coding at the source nodes by