Results 11-20 of 36
Self-Adjusting Trees in Practice for Large Text Collections
 Software: Practice and Experience
, 2002
Abstract
Cited by 17 (4 self)
Splay and randomised search trees are self-balancing binary tree structures with little or no space overhead compared to a standard binary search tree. Both trees are intended for use in applications where node accesses are skewed, for example in gathering the distinct words in a large text collection for index construction. We investigate the efficiency of these trees for such vocabulary accumulation. Surprisingly, unmodified splaying and randomised search trees are on average around 25% slower than using a standard binary tree. We investigate heuristics to limit splay tree reorganisation costs and show their effectiveness in practice. In particular, a periodic rotation scheme improves the speed of splaying by 27%, while other proposed heuristics are less effective. We also report the performance of efficient bitwise hashing and red-black trees for comparison.
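The vocabulary-accumulation task this abstract describes can be sketched with the baseline the paper compares against: a plain, unbalanced binary search tree mapping each distinct word to its frequency. This is only an illustrative sketch; the class and function names are not from the paper.

```python
class Node:
    """One vocabulary entry: a word and its occurrence count."""
    __slots__ = ("word", "count", "left", "right")

    def __init__(self, word):
        self.word, self.count = word, 1
        self.left = self.right = None


def accumulate(words):
    """Insert each word into an unbalanced BST, counting repeats."""
    root = None
    for w in words:
        if root is None:
            root = Node(w)
            continue
        node = root
        while True:
            if w == node.word:
                node.count += 1          # already seen: bump the frequency
                break
            elif w < node.word:
                if node.left is None:
                    node.left = Node(w)  # new distinct word
                    break
                node = node.left
            else:
                if node.right is None:
                    node.right = Node(w)
                    break
                node = node.right
    return root


def in_order(node):
    """Yield (word, count) pairs in sorted word order."""
    if node is None:
        return []
    return in_order(node.left) + [(node.word, node.count)] + in_order(node.right)
```

An in-order traversal then emits the accumulated vocabulary in sorted order, which is convenient for index construction.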
Splay Trees for Data Compression
, 1995
Abstract
Cited by 10 (0 self)
We present applications of splay trees to two topics in data compression. First is a variant of the move-to-front (mtf) data compression algorithm (of Bentley, Sleator, Tarjan and Wei), where we introduce secondary list(s). This seems to capture higher-order correlations. An implementation of this algorithm with Sleator-Tarjan splay trees runs in time (provably) proportional to the entropy of the input sequence. When tested on some telephony data, compression ratio and run time showed significant improvements over the original mtf algorithm, making it competitive with or better than popular programs. For stationary ergodic sources, we analyse the compression and output distribution of the original mtf algorithm, which suggests why the secondary list is appropriate to introduce. We also derive analytical upper bounds on the average codeword length in terms of stochastic parameters of the source. Secondly, we consider the compression (or coding) of source sequences where the codewords are required ...
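The basic move-to-front scheme this abstract builds on can be sketched directly. This is the plain mtf list coder (no secondary list and no splay-tree implementation, which are the paper's contributions); the function names are illustrative.

```python
def mtf_encode(data, alphabet):
    """Move-to-front coding: emit each symbol's current position in the
    list, then move that symbol to the front. Recently used symbols
    therefore get small indices."""
    table = list(alphabet)
    out = []
    for s in data:
        i = table.index(s)
        out.append(i)
        table.insert(0, table.pop(i))  # move the accessed symbol to the front
    return out


def mtf_decode(codes, alphabet):
    """Inverse transform: maintain the same list and apply the same moves."""
    table = list(alphabet)
    out = []
    for i in codes:
        s = table[i]
        out.append(s)
        table.insert(0, table.pop(i))
    return "".join(out)
```

Because the decoder performs the identical list updates, the transform round-trips exactly; skewed inputs produce index streams dominated by small values, which a subsequent entropy coder compresses well.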
Lossless Compression for Text and Images
 International Journal of High Speed Electronics and Systems
, 1995
Abstract
Cited by 10 (0 self)
Most data that is inherently discrete needs to be compressed in such a way that it can be recovered exactly, without any loss. Examples include text of all kinds, experimental results, and statistical databases. Other forms of data may need to be stored exactly, such as images, particularly bi-level ones, or ones arising in medical and remote-sensing applications, or ones that may be required to be certified true for legal reasons. Moreover, during the process of lossy compression, many occasions for lossless compression of coefficients or other information arise. This paper surveys techniques for lossless compression. The process of compression can be broken down into modeling and coding. We provide an extensive discussion of coding techniques, and then introduce methods of modeling that are appropriate for text and images. Standard methods used in popular utilities (in the case of text) and international standards (in the case of images) are described. Keywords: Text compression, ima...
Cyclic Debugging for pSather, a Parallel Object-Oriented Programming Language
Abstract
Cited by 4 (0 self)
The paper discusses the main aspects of a parallel debugger for the parallel object-oriented language pSather. pSather provides for a single shared-address space and for multiple threads per processor. Threads can arbitrarily migrate between processors. The debugger supports cyclic debugging, which is a standard and quite effective technique for sequential programs. To address nondeterminism, deterministic replay is provided. For that reason, one program run is traced. The debugger uses these traces to repeatedly and identically re-execute the recorded program run. An efficient trace and replay scheme that is particularly suited for pSather and that reduces program perturbation and trace file length is presented. Furthermore, breakpoints and single-stepping for such a debugger using deterministic replay are discussed.
1 Introduction
Debugging of sequential programs seems to be well understood. The standard technique is testing until an error manifests as a program failure and then to l...
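The trace-and-replay idea can be illustrated in miniature: record every nondeterministic decision during one run, then re-execute identically by consuming the recorded trace. This is only an analogy for the paper's scheme, which traces pSather scheduling events rather than random draws; all names here are illustrative.

```python
import random


class Recorder:
    """Record each nondeterministic decision into a trace."""

    def __init__(self):
        self.trace = []

    def choose(self, options):
        c = random.choice(options)  # the nondeterministic event
        self.trace.append(c)        # log it for later replay
        return c


class Replayer:
    """Re-execute identically by consuming the recorded trace."""

    def __init__(self, trace):
        self.trace = iter(trace)

    def choose(self, options):
        c = next(self.trace)
        assert c in options, "replay diverged from the recorded run"
        return c


def run(scheduler):
    """A toy 'program' whose output depends on scheduling decisions."""
    out = []
    for _ in range(5):
        out.append(scheduler.choose(["t1", "t2", "t3"]))
    return out
```

Replaying the same trace reproduces the recorded run exactly, which is what makes cyclic debugging of a nondeterministic program possible.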
Dynamic Length-Restricted Coding
, 2003
Abstract
Cited by 3 (2 self)
Suppose that $S$ is a string of length $m$ drawn from an alphabet of $n$ characters, $d$ of which occur in $S$. Let $P$ be the relative frequency distribution of characters in $S$. We present a new algorithm for dynamic coding that uses at most \(\lceil \lg n \rceil + 1\) bits to encode each character in $S$
Multi-Splay Trees
, 2006
Abstract
Cited by 3 (0 self)
In this thesis, we introduce a new binary search tree data structure called the multi-splay tree and prove that multi-splay trees have most of the useful properties that different binary search trees (BSTs) have. First, we demonstrate a close variant of the splay tree access lemma [ST85] for multi-splay trees, a lemma that implies multi-splay trees have the O(log n) runtime property, the static finger property, and the static optimality property. Then, we extend the access lemma by showing the remassing lemma, which is similar to the reweighting lemma for splay trees [Geo04]. The remassing lemma shows that multi-splay trees satisfy the working set property and key-independent optimality, and that multi-splay trees are competitive to parametrically balanced trees, as defined in [Geo04]. Furthermore, we also prove that multi-splay trees achieve O(log log n)-competitiveness and that sequential access in multi-splay trees costs O(n). Then we naturally extend the static model to allow insertions and deletions and show how to carry out these operations in multi-splay trees to achieve
Profiling of Lossless-Compression Algorithms for a Novel Biomedical-Implant Architecture
 International Conference on Hardware/Software Codesign and System Synthesis (CODES’08), Atlanta
, 2008
Abstract
Cited by 1 (1 self)
In view of a booming market for microelectronic implants, our ongoing research work is focusing on the specification and design of a novel biomedical microprocessor core targeting a large subset of existing and future biomedical applications. Towards this end, we have taken steps in identifying various tasks commonly required by such applications and profiling their behavior and requirements. A prominent family of such tasks is lossless data compression. In this work we profile a large collection of compression algorithms on suitably selected biomedical workloads. Compression ratio, average and peak power consumption, total energy budget, compression rate and program-code size metrics have been evaluated. Findings indicate the best-performing algorithms across most metrics to be mlzo (scoring high in 5 out of 6 imposed metrics) and fin (present in 4 out of 6 metrics). Further mlzo profiling reveals the dominance of i) address-generation, load, branch and compare instructions, and ii) inter-dependent logical-logical and logical-compare instruction combinations.
Fast Codes for Large Alphabet Sources and its Application to Block Encoding
 IEEE Intl. Symp. Inform. Theory
, 2003
Abstract
Cited by 1 (0 self)
The computational efficiency of lossless data compression for large alphabets has long attracted the attention of researchers due to its great importance in practice. The point is that, on the one hand, a source alphabet is often very large or even infinite (see, for example, [2]) and, on the other hand, for many adaptive codes the speed of coding depends substantially on the alphabet size. Thus, the number of operations of an obvious (or naive) method of updating the cumulative probabilities is proportional to the alphabet size N. Jones [1] and Ryabko [4] independently suggested two different updating algorithms, which perform all the necessary transitions between individual and cumulative probabilities in O(log N) operations. Later many such algorithms were developed and investigated in numerous papers; see, for example, [3] for a review. In this paper we suggest a method for speeding up codes based on the following main idea. Letters of the alphabet are put in order according to their probabilities (or frequencies of occurrence), and letters with probabilities close to each other are grouped into subsets (as new super letters), which contain letters with small probabilities. The key point is the following: equal probability is ascribed to all letters in one subset, and, consequently, their codewords have the same length. This makes it possible to encode and decode them much faster than if they were all distinct. Then each subset of grouped letters is treated as one letter in a new alphabet, whose size is much smaller than that of the original alphabet. Such a grouping can increase the redundancy of the code. It turns out, however, that a large decrease in the alphabet size may cause a relatively small increase in the redundancy. Since the frequencies change after the coding of each message letter, the order should be updated. There now exist algorithms and data structures which make it possible to carry out this updating with a few operations per message letter; see [5, 3]. Let us give some definitions. Let $A = \{a_1, a_2, \ldots, a_N\}$ be an alphabet with a probability distribution $\bar{p} =$
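The grouping idea above can be sketched as follows: sort the letters by frequency and start a new group whenever a letter's frequency drops below a fixed fraction of the current group's largest frequency, so that all members of a group can share one codeword length. The `ratio` threshold is an illustrative assumption, not the paper's actual grouping rule.

```python
def group_letters(freqs, ratio=2.0):
    """Partition letters into 'super letter' groups of similar frequency.

    freqs: dict mapping letter -> frequency of occurrence.
    ratio: letters stay in a group while their frequency is within a
           factor of `ratio` of the group's largest frequency
           (an illustrative parameter, not from the paper).
    """
    letters = sorted(freqs, key=freqs.get, reverse=True)
    groups, current, bound = [], [], None
    for a in letters:
        # Frequency fell too far below the group leader: close the group.
        if current and freqs[a] * ratio < bound:
            groups.append(current)
            current, bound = [], None
        if bound is None:
            bound = freqs[a]  # first (largest) frequency in the new group
        current.append(a)
    if current:
        groups.append(current)
    return groups
```

Each returned group would then be treated as a single letter of the smaller alphabet, with one shared codeword length per group; coarser grouping (larger `ratio`) trades a smaller alphabet for more redundancy, as the abstract notes.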
KIST: A new encryption algorithm based on splay
Abstract
Cited by 1 (0 self)
In this paper, we propose a new encryption algorithm called KIST. This algorithm uses an asynchronous key sequence and a splay tree. It is very efficient in its usage of both space and time. Some elementary security tests have been done.
Key words: asynchronous key sequence, splay tree, symmetric key encryption
An Application of Self-organizing Data Structures to Compression
Abstract
Cited by 1 (0 self)
List update algorithms have been widely used as subroutines in compression schemes, most notably as part of Burrows-Wheeler compression. The Burrows-Wheeler transform (BWT), which is the basis of many state-of-the-art general-purpose compressors, applies a compression algorithm to a permuted version of the original text. List update algorithms are a common choice for this second stage of BWT-based compression. In this paper we perform an experimental comparison of various list update algorithms both as stand-alone compression mechanisms and as a second stage of BWT-based compression. Our experiments show that MTF outperforms other list update algorithms in practice after BWT. This is consistent with the intuition that BWT increases locality of reference and with the result predicted by the locality-of-reference model of Angelopoulos et al. [1]. Lastly, we observe that due to an often neglected difference in the cost models, good list update algorithms may be far from optimal for BWT compression, and we construct an explicit example of this phenomenon. This fact had yet to be supported theoretically in the literature.
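The first stage of the pipeline described above, the Burrows-Wheeler transform, can be sketched with the naive quadratic sorted-rotations construction and the textbook inverse; the sentinel character and function names are illustrative, not tied to any particular implementation in the paper.

```python
def bwt(s, end="\0"):
    """Burrows-Wheeler transform: sort all rotations of s (with a unique
    sentinel appended) and emit the last column. O(n^2 log n); fine for a demo."""
    s += end
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(r[-1] for r in rotations)


def inverse_bwt(last, end="\0"):
    """Textbook inverse: repeatedly prepend the last column to the table
    and re-sort; the row ending in the sentinel is the original string."""
    table = [""] * len(last)
    for _ in range(len(last)):
        table = sorted(last[i] + table[i] for i in range(len(last)))
    row = next(r for r in table if r.endswith(end))
    return row[:-1]
```

The transform is a permutation (plus sentinel), so it round-trips exactly; its value for compression is that it clusters equal characters into runs, which is precisely the locality that makes a second-stage list update coder such as MTF effective.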