Results 1-10 of 32
Set Reconciliation with Nearly Optimal Communication Complexity
In International Symposium on Information Theory, 2000
Abstract

Cited by 59 (15 self)
We consider the problem of efficiently reconciling two similar sets held by different hosts while minimizing the communication complexity. This type of problem arises naturally from gossip protocols used for the distribution of information. We describe an approach to set reconciliation based on the encoding of sets as polynomials. The resulting protocols exhibit tractable computational complexity and nearly optimal communication complexity. Also, these protocols can be adapted to work over a broadcast channel, allowing many clients to reconcile with one host based on a single broadcast, even if each client is missing a different subset.
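The polynomial encoding can be illustrated with a toy sketch (hypothetical code, not the authors' implementation): each host represents its set by the characteristic polynomial χ_S(x) = ∏_{s∈S}(x − s) over a prime field. Common elements cancel in the ratio χ_A/χ_B, so a number of evaluation points proportional to the symmetric difference, not to the set sizes, determines the differences. Here a brute-force search over small candidate differences stands in for the rational-function interpolation the paper actually uses; the modulus, universe, and bound `max_diff` are all assumptions of the sketch.

```python
from itertools import combinations

P = 10007  # prime modulus; assumed larger than any set element

def chi(s, x):
    """Evaluate the characteristic polynomial of set s at point x, mod P."""
    v = 1
    for e in s:
        v = v * (x - e) % P
    return v

def reconcile(evals_a, set_b, points, universe, max_diff):
    """Recover host A's set from chi_A sampled at `points` -- the only data
    A transmits.  Brute-force search over small symmetric differences stands
    in for the paper's rational-function interpolation; it needs more than
    2 * max_diff sample points (chosen outside the universe) to be sound."""
    candidates_a = sorted(universe - set_b)  # possible elements of A \ B
    for da in range(max_diff + 1):
        for only_a in combinations(candidates_a, da):
            for db in range(max_diff - da + 1):
                for only_b in combinations(sorted(set_b), db):
                    cand = (set_b - set(only_b)) | set(only_a)
                    if all(chi(cand, x) == evals_a[x] for x in points):
                        return cand  # enough points => cand == A
    return None
```

With A = {1, 2, 3, 4, 10} and B = {1, 2, 3, 4, 5, 6}, the symmetric difference has size 3, so seven evaluation points suffice for B to recover A exactly, even though A itself is never sent.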
Communication Complexity of Document Exchange
2000
Abstract

Cited by 54 (2 self)
We address the problem of minimizing the communication involved in the exchange of similar documents. We consider two users, A and B, who hold documents x and y respectively. Neither of the users has any information about the other's document. They exchange messages so that B computes x; it may be required that A compute y as well. Our goal is to design communication protocols with the main objective of minimizing the total number of bits they exchange; other objectives are minimizing the number of rounds and the complexity of internal computations. An important notion which determines the efficiency of the protocols is how one measures the distance between x and y. We consider several metrics for measuring this distance, namely the Hamming metric, the Levenshtein metric (edit distance), and a new LZ metric, which is introduced in this paper. We show how to estimate the distance between x and y using a single message of logarithmic size. For each metric, we present the first communica...
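Of the distance measures mentioned, the Levenshtein (edit) metric is the most widely used; as a refresher (this is the standard dynamic program, not code from the paper), it counts the minimum number of insertions, deletions, and substitutions turning one string into the other:

```python
def levenshtein(x, y):
    """Edit distance between strings x and y via the classic DP,
    keeping only the previous row of the table."""
    prev = list(range(len(y) + 1))  # distance from "" to each prefix of y
    for i, cx in enumerate(x, 1):
        cur = [i]  # distance from x[:i] to ""
        for j, cy in enumerate(y, 1):
            cur.append(min(prev[j] + 1,            # delete cx
                           cur[j - 1] + 1,         # insert cy
                           prev[j - 1] + (cx != cy)))  # substitute / match
        prev = cur
    return prev[-1]
```

For example, `levenshtein("kitten", "sitting")` is 3 (two substitutions and one insertion).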
Improved File Synchronization Techniques for Maintaining Large Replicated Collections over Slow Networks
In Proc. of the Int. Conf. on Data Engineering, 2004
Abstract

Cited by 18 (5 self)
We study the problem of maintaining large replicated collections of files or documents in a distributed environment with limited bandwidth. This problem arises in a number of important applications, such as synchronization of data between accounts or devices, content distribution and web caching networks, web site mirroring, storage networks, and large scale web search and mining. At the core of the problem lies the following challenge, called the file synchronization problem: given two versions of a file on different machines, say an outdated and a current one, how can we update the outdated version with minimum communication cost, by exploiting the significant similarity between the versions? While a popular open source tool for this problem called rsync is used in hundreds of thousands of installations, there have been only very few attempts to improve upon this tool in practice. In this paper,
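rsync's core mechanism can be sketched as follows (a simplification, not rsync's actual implementation: real rsync slides its weak checksum incrementally and picks a file-size-dependent block size): the receiver sends a weak and a strong checksum per block of its outdated copy; the sender scans the new file byte by byte and emits "copy this block you already have" or "here are literal bytes".

```python
import hashlib
import zlib

BLOCK = 8  # toy block size for the sketch

def signatures(old):
    """Receiver side: weak (adler32) and strong (md5) checksum per block."""
    return {zlib.adler32(old[i:i + BLOCK]): (i, hashlib.md5(old[i:i + BLOCK]).digest())
            for i in range(0, len(old), BLOCK)}

def delta(new, sigs):
    """Sender side: scan `new`, emitting ("copy", offset) for blocks the
    receiver already holds and ("lit", bytes) runs for everything else.
    (Real rsync rolls the weak checksum instead of recomputing it.)"""
    ops, i, lit = [], 0, b""
    while i < len(new):
        block = new[i:i + BLOCK]
        hit = sigs.get(zlib.adler32(block))
        if len(block) == BLOCK and hit and hit[1] == hashlib.md5(block).digest():
            if lit:
                ops.append(("lit", lit))
                lit = b""
            ops.append(("copy", hit[0]))
            i += BLOCK
        else:
            lit += new[i:i + 1]
            i += 1
    if lit:
        ops.append(("lit", lit))
    return ops

def apply_delta(old, ops):
    """Receiver side: rebuild the new file from its old copy plus the delta."""
    return b"".join(old[arg:arg + BLOCK] if op == "copy" else arg
                    for op, arg in ops)
```

Because matching happens at every byte offset of the new file, an insertion near the front does not prevent the rest of the file from being expressed as cheap copy instructions.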
Algorithms for Delta Compression and Remote File Synchronization
In Khalid Sayood, editor, Lossless Compression Handbook, 2002
Abstract

Cited by 17 (9 self)
Delta compression and remote file synchronization techniques are concerned with efficient file transfer over a slow communication link in the case where the receiving party already has a similar file (or files). This problem arises naturally, e.g., when distributing updated versions of software over a network or synchronizing personal files between different accounts and devices. More generally, the problem is becoming increasingly common in many network-based applications where files and content are widely replicated, frequently modified, and cut and reassembled in different contexts and packagings.
On Interactive Communication
1997
Abstract

Cited by 16 (1 self)
Almost two decades ago Ahlswede introduced an abstract correlated source (V × W, S) with outputs (v, w) ∈ S ⊆ V × W, where persons P_V and P_W observe v and w, respectively. More recently, Orlitsky considered the minimal number C_m of bits to be transmitted in m rounds to "inform P_W about v over one channel". He showed that C_2 ≤ 4C_∞ + 3 and that generally C_2 ≠ C_∞. We give a simpler example than Zhang and Xia for C_3 ≠ C_∞. However, for the new model "inform P_W over two channels", 4 rounds are optimal for this example, a result we conjecture to hold in general. If both P_V and P_W are to be informed over two channels about the other's outcome, we determine the complexities asymptotically for all sources. In our last model, "inform P_V and P_W over one channel", the total number T_2 of required bits is known asymptotically for all sources, and T_1 is bounded from below in terms of average degrees. There are exact results for several classes of regular sources. An attempt is made to discuss the m...
Data Verification and Reconciliation With Generalized Error-Control Codes
IEEE Trans. on Info. Theory, 2001
Abstract

Cited by 14 (6 self)
We consider the problem of data reconciliation, which we model as two separate multisets of data that must be reconciled with minimum communication. Under this model, we show that the problem of reconciliation is equivalent to a variant of the graph coloring problem and provide consequent upper and lower bounds on the communication complexity of reconciliation. More interestingly, we show by means of an explicit construction that the problem of reconciliation is, under certain general conditions, equivalent to the problem of finding good error-correcting codes. We show analogous results for the problem of multiset verification, in which we wish to determine whether two multisets are equal using minimum communication. As a result, a wide body of literature in coding theory may be applied to the problems of reconciliation and verification.
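The codes-to-reconciliation connection has a one-line toy instance (an illustration only, far weaker than the paper's general construction): a constant-size parity summary that, like the syndrome of a single-error-correcting code, pins down a single differing element.

```python
from functools import reduce
from operator import xor

def xor_summary(items):
    """Constant-size 'codeword' of a set of non-negative integers."""
    return reduce(xor, items, 0)

# If two sets differ in exactly one element, XORing their summaries
# recovers that element -- just as a syndrome locates a single error.
# With more differences, a stronger code (more summary bits) is needed,
# which is exactly the trade-off the paper formalizes.
```

For example, `xor_summary({3, 7, 9, 12}) ^ xor_summary({3, 7, 12})` yields 9, the lone element the second set is missing.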
DESIGNING COMPRESSIVE SENSING DNA MICROARRAYS
Abstract

Cited by 14 (3 self)
A Compressive Sensing Microarray (CSM) is a new device for DNA-based identification of target organisms that leverages the nascent theory of Compressive Sensing (CS). In contrast to a conventional DNA microarray, in which each genetic sensor spot is designed to respond to a single target organism, in a CSM each sensor spot responds to a group of targets. As a result, significantly fewer total sensor spots are required. In this paper, we study how to design group identifier probes that simultaneously account for both the constraints from the CS theory and the biochemistry of probe-target DNA hybridization. We employ Belief Propagation as a CS recovery method to estimate target concentrations from the microarray intensities.
Fast PDA Synchronization Using Characteristic Polynomial Interpolation
Proc. INFOCOM, 2002
Abstract

Cited by 12 (8 self)
Modern Personal Digital Assistant (PDA) architectures often utilize a wholesale data transfer protocol known as "slow sync" for synchronizing PDAs with Personal Computers (PCs). This approach is markedly inefficient with respect to bandwidth usage and latency, since the PDA and PC typically share many common records. We propose, analyze, and implement a novel PDA synchronization scheme (CPIsync) predicated upon recent information-theoretic research. The salient property of this scheme is that its communication complexity depends on the number of differences between the PDA and PC, and is essentially independent of the overall number of records. Moreover, our implementation shows that the computational complexity of CPIsync is practical, and that the overall latency is typically much smaller than that of slow sync. Thus, CPIsync has potential for significantly improving synchronization protocols for PDAs and, more generally, for heterogeneous networks of many machines.
Three Results on Interactive Communication
IEEE Transactions on Information Theory, 1993
Abstract

Cited by 11 (1 self)
X and Y are random variables. Person P_X knows X, person P_Y knows Y, and both know the underlying probability distribution of the random pair (X, Y). Using a predetermined protocol, they exchange messages over a binary, error-free channel in order for P_Y to learn X. P_X may or may not learn Y. C_m is the number of information bits that must be transmitted (by both persons) in the worst case if only m messages are allowed. C_∞ is the corresponding number of bits when there is no restriction on the number of messages exchanged. We consider three aspects of this problem. C_4: It is known that one-message communication may require exponentially more bits than the minimum possible: for some random pairs, C_1 = 2^(C_∞) − 1. Yet just two messages suffice to reduce communication to almost the minimum: for all random pairs, C_2 ≤ 4C_∞ + 3. We show that, asymptotically, four messages require at most three times the minimum number of bits: for all random pairs, C_4 ≤ 3C_∞ + ...
Bandwidth efficient string reconciliation using puzzles
IEEE Transactions on Parallel and Distributed Systems, 2006
Abstract

Cited by 8 (2 self)
Of considerable interest in recent years has been the problem of exchanging correlated data with minimum communication. We thus consider the problem of exchanging two similar strings held by different hosts. Our approach involves transforming a string into a multiset of substrings that are reconciled efficiently using known multiset reconciliation algorithms, and then put back together on a remote host using tools from graph theory. We present analyses, experiments and results to show that the communication complexity of our approach for high-entropy data compares favorably to existing algorithms including rsync, a widely used string reconciliation engine. We also quantify the tradeoff between communication and the computation complexity of our approach. Index Terms: efficient file synchronization, string reconstruction, rsync.
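The transform-and-reassemble step can be illustrated with a generic sketch (the paper's "puzzle" construction differs in detail): split the string into overlapping k-grams, reconcile those as a multiset, then reassemble by walking an Eulerian path in the de Bruijn-style graph whose edges are the k-grams and whose nodes are their (k−1)-gram overlaps. The reconstruction is only guaranteed unique when that Eulerian path is unique, which is one source of the extra information the paper's puzzles supply.

```python
from collections import defaultdict

def shingles(s, k):
    """All overlapping k-grams of s, as a list (a multiset)."""
    return [s[i:i + k] for i in range(len(s) - k + 1)]

def reassemble(grams):
    """Rebuild a string from its k-gram multiset via an Eulerian path.
    Assumes such a path exists; unique only when the path is unique."""
    graph = defaultdict(list)
    indeg = defaultdict(int)
    for g in grams:
        graph[g[:-1]].append(g[1:])  # edge: (k-1)-prefix -> (k-1)-suffix
        indeg[g[1:]] += 1
    # Start where out-degree exceeds in-degree (path start), else anywhere.
    start = next((n for n in list(graph) if len(graph[n]) > indeg[n]),
                 next(iter(graph)))
    # Hierholzer's algorithm for an Eulerian path.
    stack, path = [start], []
    while stack:
        n = stack[-1]
        if graph[n]:
            stack.append(graph[n].pop())
        else:
            path.append(stack.pop())
    path.reverse()
    return path[0] + "".join(node[-1] for node in path[1:])
```

For instance, the 3-gram multiset of "reconcile" walks back to "reconcile"; strings whose k-grams admit several Eulerian orderings would need disambiguating hints.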