## An Efficient Compression Scheme for Data Communication Which Uses a New Family of Self-Organizing Binary Search Trees

### BibTeX

@MISC{Rueda_anefficient,
    author = {Luis Rueda and B. John Oommen},
    title = {An Efficient Compression Scheme for Data Communication Which Uses a New Family of Self-Organizing Binary Search Trees},
    year = {}
}

### Abstract

In this paper, we demonstrate that results from the field of adaptive self-organizing data structures can be used effectively to enhance compression schemes. Adaptive lists have already been used in compression but, to the best of our knowledge, adaptive self-organizing trees have not. To this end, we introduce a new data structure, the Partitioning Binary Search Tree (PBST), which is based on the well-known Binary Search Tree (BST) but also partitions the data elements into mutually exclusive sets. When used in conjunction with Fano encoding, the PBST leads to the so-called Fano Binary Search Tree (FBST), which incorporates the required Fano coding (nearly-equal-probability) property into the BST. We demonstrate how both the PBST and the FBST can be maintained adaptively and in a self-organizing manner. The updating procedure that converts a PBST into an FBST, and the corresponding new tree-based operators, namely the Shift-To-Left (STL) and Shift-To-Right (STR) operators, are explicitly presented. The encoding and decoding procedures, which also update the FBST, have been implemented and rigorously tested. Our empirical results on files of the well-known Canterbury corpus benchmark show that adaptive Fano coding using FBSTs, Huffman coding, and greedy adaptive Fano coding achieve similar compression ratios. In terms of speed, however, the new scheme is much faster than the latter two in the encoding phase, while all three achieve approximately the same speed in the decoding phase. We believe that the same philosophy, namely that of using an adaptive self-organizing BST to maintain the frequencies, can also be utilized for other data encoding mechanisms, just as the Fenwick scheme has been used in arithmetic coding.
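The nearly-equal-probability property mentioned in the abstract can be illustrated with a minimal sketch of classical static Fano coding. This is not the authors' PBST/FBST procedure and involves no STL or STR operators; it only shows the recursive split into two groups of nearly equal total weight that the FBST is said to maintain. The function name `fano_codes` and the frequency-dictionary input are assumptions made for this illustration.

```python
def fano_codes(freqs):
    """Return {symbol: bitstring} for classical (static) Fano coding.

    Illustrative sketch only: `freqs` maps each symbol to a positive count.
    A one-symbol alphabet gets the empty string as its code.
    """
    # Sort by decreasing frequency; break ties by symbol for determinism.
    items = sorted(freqs.items(), key=lambda kv: (-kv[1], kv[0]))
    codes = {sym: "" for sym, _ in items}

    def split(lo, hi):
        # Nothing to split when the group holds zero or one symbol.
        if hi - lo <= 1:
            return
        total = sum(weight for _, weight in items[lo:hi])
        running = 0
        best_cut, best_diff = lo + 1, None
        # Pick the cut that makes the two groups' weights as equal as possible.
        for cut in range(lo + 1, hi):
            running += items[cut - 1][1]
            diff = abs(total - 2 * running)
            if best_diff is None or diff < best_diff:
                best_cut, best_diff = cut, diff
        # Left group gets a 0 appended, right group a 1.
        for i in range(lo, hi):
            codes[items[i][0]] += "0" if i < best_cut else "1"
        split(lo, best_cut)
        split(best_cut, hi)

    split(0, len(items))
    return codes


if __name__ == "__main__":
    print(fano_codes({"a": 4, "b": 2, "c": 1, "d": 1}))
```

For the counts in the usage line, every split is exactly balanced, so the resulting code lengths match the symbols' information content. An adaptive scheme would additionally have to restore this balance after each frequency update, which is the role the abstract assigns to the FBST.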

### Citations

2318 | The art of computer programming
- Knuth
- 1973
Citation Context ... BST during the search process. Consider a set of records whose keys are given by the ordered set of distinct elements $K = \{k_1, \ldots, k_m\}$, where $k_1 < \ldots < k_m$. By following the procedure given in [23], the optimal BST can be constructed using dynamic programming in $O(m^2)$ time and space. Alternatively, using dynamic programming and divide-and-conquer techniques [37], a nearly-optimal BST can be con... |

1210 | A universal algorithm for sequential data compression
- Ziv, Lempel
- 1977
Citation Context ...ta from universal sources. On the other hand, adaptive coding approaches that use higher-order statistical models, and other structural models, include dictionary techniques (LZ and its enhancements) [39, 40], prediction with partial matching (PPM) [11], and grammar based compression (GBC) [22]. Splay trees have also been used in adaptive data compression [21]. Adaptive methods for Fano coding have been r... |

1034 | A method for the construction of minimum-redundancy codes
- Huffman
- 1952
Citation Context ...omplish the encoding. Most of the well-known static encoding techniques have been extended to also function in an adaptive manner. The most well-known adaptive coding technique is Huffman’s algorithm [19], which was first presented by Faller in 1973 [13]. Being unaware of the work done by Faller, Gallager presented an alternate adaptive version of Huffman’s algorithm in 1978 [17]. The latter was later... |

775 | Compression of individual sequences via variable-rate coding
- Ziv, Lempel
- 1978
Citation Context ...ta from universal sources. On the other hand, adaptive coding approaches that use higher-order statistical models, and other structural models, include dictionary techniques (LZ and its enhancements) [39, 40], prediction with partial matching (PPM) [11], and grammar based compression (GBC) [22]. Splay trees have also been used in adaptive data compression [21]. Adaptive methods for Fano coding have been r... |

386 | Self-adjusting binary search trees
- Sleator, Tarjan
- 1985
Citation Context ...ssed record one level up towards the root. Although this approach is not very efficient, it has the advantage that it does not use extra space. Splaying is another technique due to Sleator and Tarjan [20, 34, 38]. It uses its own tree structure called the splay tree. The main idea of this technique is to move the accessed record towards the root, and to simultaneously allow accesses to each record by an in-or... |

352 | Data compression using adaptive coding and partial string matching - Cleary, Witten - 1984 |

298 | Introduction to Data Compression
- Sayood
- 1996
Citation Context ...Vitter in 1987 [36]. Another important encoding method that has been extended for its adaptive version, is the arithmetic coding scheme. Details of its modeling and its implementation can be found in [18, 32]. Other important adaptive methods are the interval and recency rank encoding [18], and the Elias omega codes [2]. While the former methods are efficient for a particular source distribution only, the... |

184 | An algorithm for the organization of information
- Adel’son-Vel’skii, Landis
- 1962
Citation Context ...e. These are used by the heuristic called the conditional rotation, based on the fundamental rotation operation (also known as the promotion operation [25]) introduced by Adel’son-Vel’skii and Landis [1]. 1.2 Available Compression Schemes Since we intend to propose a “marriage” between the fields of adaptive tree-based data structures and compression, a brief introduction of the latter field is not o... |

142 | Randomized search trees
- Aragon, Seidel
- 1989
Citation Context ...gh entropy. Empirical results have also shown that, on the average, it behaves poorly. Other adaptive binary search tree approaches are biasing [8], dynamic binary search [26], weighted randomization [7], deepsplaying [33], and the technique that uses conditional rotations [10]. The basic idea of the latter approach is to maintain certain key pieces of information in each node. These are used by th... |

123 | Variations on a theme by Huffman
- Gallager
- 1978
Citation Context ...s Huffman’s algorithm [19], which was first presented by Faller in 1973 [13]. Being unaware of the work done by Faller, Gallager presented an alternate adaptive version of Huffman’s algorithm in 1978 [17]. The latter was later augmented by Knuth in 1985, who presented a more efficient algorithm to adaptively maintain the Huffman tree [24]. The most recent and efficient version of the adaptive Huffman ... |

106 | Dynamic Huffman coding
- Knuth
- 1985
Citation Context ...ed an alternate adaptive version of Huffman’s algorithm in 1978 [17]. The latter was later augmented by Knuth in 1985, who presented a more efficient algorithm to adaptively maintain the Huffman tree [24]. The most recent and efficient version of the adaptive Huffman coding is the one introduced by Vitter in 1987 [36]. Another important encoding method that has been extended for its adaptive version, ... |

92 | Design and Analysis of Dynamic Huffman Codes
- Vitter
- 1987
Citation Context ...85, who presented a more efficient algorithm to adaptively maintain the Huffman tree [24]. The most recent and efficient version of the adaptive Huffman coding is the one introduced by Vitter in 1987 [36]. Another important encoding method that has been extended for its adaptive version, is the arithmetic coding scheme. Details of its modeling and its implementation can be found in [18, 32]. Other imp... |

58 | Grammar-Based Codes: a New Class of Universal Lossless Source Codes
- Kieffer, Yang
Citation Context ...der statistical models, and other structural models, include dictionary techniques (LZ and its enhancements) [39, 40], prediction with partial matching (PPM) [11], and grammar based compression (GBC) [22]. Splay trees have also been used in adaptive data compression [21]. Adaptive methods for Fano coding have been recently introduced (for the binary and multi-symbol code alphabets) [30], which have be... |

50 | Biased search trees
- Bent, Sleator, et al.
- 1985
Citation Context ...ssed. This approach performs poorly for key sets with high entropy. Empirical results have also shown that, on the average, it behaves poorly. Other adaptive binary search tree approaches are biasing [8], dynamic binary search [26], weighted randomization [7], deepsplaying [33], and the technique that uses conditional rotations [10]. The basic idea of the latter approach is to maintain certain key pi... |

42 | Alternatives to splay trees with O(log n) worst-case access times
- Iacono
Citation Context ...ssed record one level up towards the root. Although this approach is not very efficient, it has the advantage that it does not use extra space. Splaying is another technique due to Sleator and Tarjan [20, 34, 38]. It uses its own tree structure called the splay tree. The main idea of this technique is to move the accessed record towards the root, and to simultaneously allow accesses to each record by an in-or... |

41 | Improved randomized on-line algorithms for the list update problem
- Albers
- 1995
Citation Context ... Adaptive lists have been investigated for more than three decades, and schemes such as the Move-to-Front (MTF), Transposition, Move-k-Ahead, the Move-to-Rear families [28], and randomized algorithms [3] have been proposed. A complete survey of these methods and their applications can be found in [5], and their applicability in compression has also been acclaimed, for example, of the MTF in block sor... |

39 | Introduction to Information Theory and Data Compression
- Hankerson, Harris, et al.
- 1998
Citation Context ...Vitter in 1987 [36]. Another important encoding method that has been extended for its adaptive version, is the arithmetic coding scheme. Details of its modeling and its implementation can be found in [18, 32]. Other important adaptive methods are the interval and recency rank encoding [18], and the Elias omega codes [2]. While the former methods are efficient for a particular source distribution only, the... |

38 | Heuristics that dynamically organize data structures - Bitner - 1979 |

35 | Applications of splay trees to data compression
- Jones
- 1988
Citation Context ...nary techniques (LZ and its enhancements) [39, 40], prediction with partial matching (PPM) [11], and grammar based compression (GBC) [22]. Splay trees have also been used in adaptive data compression [21]. Adaptive methods for Fano coding have been recently introduced (for the binary and multi-symbol code alphabets) [30], which have been shown to work faster than adaptive Huffman coding, and consume o... |

27 | Self-organizing binary search trees
- Allen, Munro
Citation Context ...robabilities are unknown, and the structure and the content of the BST are dynamically changed while the records are searched for, in the tree. The move-to-root heuristic, proposed by Allen and Munro [6], is a very simple approach to maintain an adaptive BST. The aim of this approach is to maintain the most frequently accessed records near the root, and consequently, to minimize the average cost of s... |

24 | Second step algorithms in the Burrows-Wheeler compression algorithm
Citation Context ...da. E-mail: oommen@scs.carleton.ca. This author is also an Adjunct Professor with the Department of Information and Communication Technology, University of Agder, Grimstad, Norway. ... in block sorting [12], and in [4]) we argue that adaptive tree-based schemes have their distinct advantages, and this claim has been demonstrated by using the principles on a Fano scheme, which recently, has attracted fas... |

22 | Average case analyses of list update algorithms, with applications to data compression - Albers, Mitzenmacher - 1996 |

18 | Self-organizing data structures
- Albers, Westbrook
- 1998
Citation Context ...o-Front (MTF), Transposition, Move-k-Ahead, the Move-to-Rear families [28], and randomized algorithms [3] have been proposed. A complete survey of these methods and their applications can be found in [5], and their applicability in compression has also been acclaimed, for example, of the MTF in block sorting [12], and by Albers et al. in [4]. As opposed to this, a number of adaptive tree-based algori... |

16 | Dynamic binary search
- Mehlhorn
- 1979
Citation Context ...s poorly for key sets with high entropy. Empirical results have also shown that, on the average, it behaves poorly. Other adaptive binary search tree approaches are biasing [8], dynamic binary search [26], weighted randomization [7], deepsplaying [33], and the technique that uses conditional rotations [10]. The basic idea of the latter approach is to maintain certain key pieces of information in eac... |

15 | A new data structure for cumulative frequency tables
- Fenwick
- 1994
Citation Context ...can effectively use results from the field of adaptive self-organizing data structures in enhancing compression schemes. Adaptive lists have been used earlier in compression [4], and the Fenwick tree [14, 27] has brilliantly used the list of probabilities, maintained as a tree, to maintain probability estimates. But, to the best of our knowledge, adaptive self-organizing trees have not been used in this... |

13 | Universal coding of integers and unbounded search trees - Ahlswede, Han, et al. - 1997 |

13 | Self-adjusting trees in practice for large text collections
- Williams, Zobel, et al.
Citation Context ...ssed record one level up towards the root. Although this approach is not very efficient, it has the advantage that it does not use extra space. Splaying is another technique due to Sleator and Tarjan [20, 34, 38]. It uses its own tree structure called the splay tree. The main idea of this technique is to move the accessed record towards the root, and to simultaneously allow accesses to each record by an in-or... |

9 | Dynamic Shannon coding
- Gagie
- 2004
Citation Context ...adaptive tree-based schemes have their distinct advantages, and this claim has been demonstrated by using the principles on a Fano scheme, which recently, has attracted fascinating research attention [15, 16]. 1.1 Adaptive Lists and Trees Adaptive lists have been investigated for more than three decades, and schemes such as the Move-to-Front (MTF), Transposition, Move-k-Ahead, the Move-to-Rear families [2... |

8 | Adaptive heuristics for binary search trees and constant linkage cost
- Lai, Wood
- 1991
Citation Context ...n certain key pieces of information in each node. These are used by the heuristic called the conditional rotation, based on the fundamental rotation operation (also known as the promotion operation [25]) introduced by Adel’son-Vel’skii and Landis [1]. 1.2 Available Compression Schemes Since we intend to propose a “marriage” between the fields of adaptive tree-based data structures and compression, a... |

7 | An improved data structure for cumulative probability tables. Software: Practice and Experience
- Moffat
- 1999
Citation Context ...can effectively use results from the field of adaptive self-organizing data structures in enhancing compression schemes. Adaptive lists have been used earlier in compression [4], and the Fenwick tree [14, 27] has brilliantly used the list of probabilities, maintained as a tree, to maintain probability estimates. But, to the best of our knowledge, adaptive self-organizing trees have not been used in this... |

6 | Adaptive structuring of binary search trees using conditional rotations
- Cheetham, Oommen, et al.
- 1993
Citation Context ...aves poorly. Other adaptive binary search tree approaches are biasing [8], dynamic binary search [26], weighted randomization [7], deepsplaying [33], and the technique that uses conditional rotations [10]. The basic idea of the latter approach is to maintain certain key pieces of information in each node. These are used by the heuristic called the conditional rotation, based on the fundamental rotat... |

5 | Self-adjusting k-ary Search Trees and Self-adjusting Balanced Search Trees
- Sherk
- 1990
Citation Context ...cal results have also shown that, on the average, it behaves poorly. Other adaptive binary search tree approaches are biasing [8], dynamic binary search [26], weighted randomization [7], deepsplaying [33], and the technique that uses conditional rotations [10]. The basic idea of the latter approach is to maintain certain key pieces of information in each node. These are used by the heuristic called ... |

5 | A top-down algorithm for constructing nearly optimal lexicographical trees, in Graph theory and Computing
- Walker, Gotlieb
- 1972
Citation Context ...following the procedure given in [23], the optimal BST can be constructed using dynamic programming in $O(m^2)$ time and space. Alternatively, using dynamic programming and divide-and-conquer techniques [37], a nearly-optimal BST can be constructed in $O(m)$ space and $O(m \log m)$ time. These two approaches can be used whenever the statistical information about the access to the records is known beforehand. ... |

4 | A Learning Automaton Solution to Breaking Substitution Ciphers
- Oommen, Zgierski
- 1993
Citation Context ...6]. 1.1 Adaptive Lists and Trees Adaptive lists have been investigated for more than three decades, and schemes such as the Move-to-Front (MTF), Transposition, Move-k-Ahead, the Move-to-Rear families [28], and randomized algorithms [3] have been proposed. A complete survey of these methods and their applications can be found in [5], and their applicability in compression has also been acclaimed, for e... |

4 | A nearly optimal Fano-based coding algorithm
- Rueda, Oommen
- 2004
Citation Context ...on a conditional shifting heuristic, and used to transform a PBST into an FBST. The conditional shifting heuristic is based on the principles of the Fano coding – the nearly-equal-probability property [29]. This heuristic, used in conjunction with the STL and the STR operators defined in this paper, are used to transform a PBST into an FBST, as per the following rule. ... |

1 | Dynamic Shannon Coding. Submitted to Elsevier Science. Electronically available at http://www.cs.toronto.edu/~travis/ipl4.pdf
- Gagie
- 2008
Citation Context ...adaptive tree-based schemes have their distinct advantages, and this claim has been demonstrated by using the principles on a Fano scheme, which recently, has attracted fascinating research attention [15, 16]. 1.1 Adaptive Lists and Trees Adaptive lists have been investigated for more than three decades, and schemes such as the Move-to-Front (MTF), Transposition, Move-k-Ahead, the Move-to-Rear families [2... |

1 | A New Approach to Adaptive Encoding Data using Self-organizing Data Structures
- Rueda, Oommen
- 2007
Citation Context ...d principles. Acknowledgments: A preliminary version of this paper was presented at ISCIS 2007, the 2007 International Symposium on Computer and Information Sciences, Ankara, Turkey, November 2007 [31]. We sincerely thank the anonymous Referees of this present paper. Their comments helped to improve the readability of the paper. The work of L. Rueda was partially supported by CONICYT, the Chilean N... |