Improved File Synchronization Techniques for Maintaining Large Replicated Collections (2004)

by Torsten Suel, Patrick Noel, Dimitre Trendafilov
In Proc. of the Int. Conf. on Data Engineering

Abstract:

We study the problem of maintaining large replicated collections of files or documents in a distributed environment with limited bandwidth. This problem arises in a number of important applications, such as synchronization of data between accounts or devices, content distribution and web caching networks, web site mirroring, storage networks, and large-scale web search and mining. At the core of the problem lies the following challenge, called the file synchronization problem: given two versions of a file on different machines, say an outdated and a current one, how can we update the outdated version with minimum communication cost, by exploiting the significant similarity between the versions? While a popular open-source tool for this problem called rsync is used in hundreds of thousands of installations, there have been only very few attempts to improve upon this tool in practice.
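
To make the file synchronization problem concrete, the following is a minimal sketch of a single-round, block-hash scheme in the general spirit of rsync. It is not the method proposed in the paper, and it is not rsync itself: rsync pairs a cheap rolling checksum with a strong hash, whereas this sketch simply recomputes an MD5 hash at every offset. The names `BLOCK`, `block_signatures`, `make_delta`, and `apply_delta` are hypothetical and chosen for illustration only.

```python
import hashlib

BLOCK = 4  # hypothetical block size; tiny so the example is easy to follow


def block_signatures(old: bytes, block: int = BLOCK) -> dict:
    """Holder of the outdated file: hash each full fixed-size block.

    Returns a map from block hash to block index. Only this map
    (not the file itself) would be sent over the network.
    """
    sigs = {}
    for start in range(0, len(old) - block + 1, block):
        digest = hashlib.md5(old[start:start + block]).digest()
        sigs.setdefault(digest, start // block)
    return sigs


def make_delta(new: bytes, sigs: dict, block: int = BLOCK) -> list:
    """Holder of the current file: encode it as block references and literals."""
    delta, literal, pos = [], bytearray(), 0
    while pos < len(new):
        chunk = new[pos:pos + block]
        # Only a full-length chunk can match a block of the old file.
        idx = sigs.get(hashlib.md5(chunk).digest()) if len(chunk) == block else None
        if idx is not None:
            if literal:  # flush pending unmatched bytes first
                delta.append(("lit", bytes(literal)))
                literal = bytearray()
            delta.append(("ref", idx))
            pos += block
        else:
            literal.append(new[pos])  # no match here: slide forward one byte
            pos += 1
    if literal:
        delta.append(("lit", bytes(literal)))
    return delta


def apply_delta(old: bytes, delta: list, block: int = BLOCK) -> bytes:
    """Holder of the outdated file: rebuild the current file from the delta."""
    out = bytearray()
    for kind, value in delta:
        if kind == "ref":
            out += old[value * block:(value + 1) * block]
        else:
            out += value
    return bytes(out)


old = b"the quick brown fox jumps"
new = b"the quick red fox jumps"
delta = make_delta(new, block_signatures(old))
rebuilt = apply_delta(old, delta)
```

In this sketch the communication cost is the set of block hashes sent in one direction plus the delta sent in the other. Blocks that survive unchanged cost only a short reference, which is how the similarity between the two versions is exploited. The block size controls the trade-off: smaller blocks match more often but require more hashes to be sent.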
