Results 1-10 of 25
Data Compression
ACM Computing Surveys, 1987
Cited by 84 (3 self)
This paper surveys a variety of data compression methods spanning almost forty years of research, from the work of Shannon, Fano and Huffman in the late 40's to a technique developed in 1986. The aim of data compression is to reduce redundancy in stored or communicated data, thus increasing effective data density. Data compression has important application in the areas of file storage and distributed systems. Concepts from information theory, as they relate to the goals and evaluation of data compression methods, are discussed briefly. A framework for evaluation and comparison of methods is constructed and applied to the algorithms presented. Comparisons of both theoretical and empirical natures are reported and possibilities for future research are suggested.
INTRODUCTION
Data compression is often referred to as coding, where coding is a very general term encompassing any special representation of data which satisfies a given need. Information theory is defined to be the study of eff...
A Natural Law of Succession
1995
Cited by 35 (3 self)
Consider the following problem. You are given an alphabet of k distinct symbols and are told that the i-th symbol occurred exactly n_i times in the past. On the basis of this information alone, you must now estimate the conditional probability that the next symbol will be i. In this report, we present a new solution to this fundamental problem in statistics and demonstrate that our solution outperforms standard approaches, both in theory and in practice.
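To make the estimation problem concrete, here is a minimal sketch of one classical baseline, Laplace's add-one rule, which estimates the probability of symbol i as (n_i + 1) / (n + k). This is an assumption about what a "standard approach" might look like. It is not the new solution the report proposes, which the abstract does not describe. The function name and the counts are hypothetical.

```python
from fractions import Fraction


def laplace_estimate(counts, i):
    """Add-one (Laplace) estimate of P(next symbol = i) from past counts.

    counts[j] is n_j, the number of times symbol j occurred so far.
    This is a classical baseline, not the estimator proposed in the report.
    """
    k = len(counts)   # alphabet size
    n = sum(counts)   # total number of past observations
    return Fraction(counts[i] + 1, n + k)


# Hypothetical counts for an alphabet of k = 3 symbols.
counts = [3, 1, 0]
probs = [laplace_estimate(counts, i) for i in range(len(counts))]
print(probs)
```

Note that the unseen third symbol still receives a nonzero probability, which is the behaviour any solution to this problem has to reason about.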
Near-optimal routing lookups with bounded worst case performance
In IEEE INFOCOM’00, 2000
Cited by 29 (0 self)
The problem of route address lookup has received much attention recently, and several algorithms and data structures for performing address lookups at high speeds have been proposed. In this paper we consider one such data structure: a binary search tree built on the intervals created by the routing table prefixes. We wish to exploit the difference in the probabilities with which the various leaves of the tree (where the intervals are stored) are accessed by incoming packets in order to speed up the lookup process. More precisely, we seek an answer to the question “How can the search tree be drawn so as to minimize the average packet lookup time while keeping the worst-case lookup time within a fixed bound?” We use ideas from information theory to derive efficient algorithms for computing near-optimal routing lookup trees. Finally, we consider the practicality of our algorithms through analysis and simulation.
Is Huffman Coding Dead?
COMPUTING, 1993
Cited by 17 (3 self)
In recent publications about data compression, arithmetic codes are often suggested as the state of the art, rather than the more popular Huffman codes. While it is true that Huffman codes are not optimal in all situations, we show that the advantage of arithmetic codes in compression performance is often negligible. Referring also to other criteria, we conclude that for many applications, Huffman codes should still remain a competitive choice.
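One way to see how large the gap in question can be is to compare a Huffman code's average length with the source entropy, which an arithmetic coder can approach. The sketch below uses the textbook heap-based Huffman construction and two hypothetical distributions. It illustrates the quantity being compared and does not reproduce the paper's experiments.

```python
import heapq
import math


def huffman_lengths(probs):
    """Codeword lengths from the textbook heap-based Huffman construction."""
    # Heap items: (weight, unique tie-breaker, symbols in this subtree).
    heap = [(p, i, [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    counter = len(probs)
    while len(heap) > 1:
        w1, _, s1 = heapq.heappop(heap)
        w2, _, s2 = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        for s in s1 + s2:
            lengths[s] += 1
        heapq.heappush(heap, (w1 + w2, counter, s1 + s2))
        counter += 1
    return lengths


def entropy(probs):
    """Shannon entropy in bits per symbol."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def redundancy(probs):
    """Average Huffman codeword length minus entropy, in bits per symbol."""
    lengths = huffman_lengths(probs)
    return sum(p * l for p, l in zip(probs, lengths)) - entropy(probs)


dyadic = [0.5, 0.25, 0.125, 0.125]   # hypothetical source, powers of 1/2
skewed = [0.9, 0.1]                  # hypothetical highly skewed source
print(redundancy(dyadic), redundancy(skewed))
```

For the dyadic source the Huffman code already meets the entropy, so there is nothing left to gain. The skewed two-symbol source is the kind of case where a Huffman code, forced to spend at least one bit per symbol, falls well short.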
Bounding the Depth of Search Trees
The Computer Journal, 1993
Cited by 16 (5 self)
For an ordered sequence of n weights, Huffman's algorithm constructs in time and space O(n) a search tree with minimum average path length, or, which is equivalent, a minimum redundancy code. However, if an upper bound B is imposed on the length of the codewords, the best known algorithms for the construction of an optimal code have time and space complexities O(Bn^2). A new algorithm is presented, which yields suboptimal codes, but in time O(n log n) and space O(n). Under certain conditions, these codes are shown to be close to optimal, and extensive experiments suggest that in many practical applications, the deviation from the optimum is negligible.
1. Motivation and Introduction
We consider the set B(n; b) of extended binary trees with n leaves, labelled 1 to n, and with depth b, henceforth called b-restricted trees. An extended binary tree is a binary tree in which every internal node has two sons (here, and in what follows, we use the terminology of Knuth [16, pp. 399...
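The linear-time claim for ordered weights can be sketched with the well-known two-queue variant of Huffman's algorithm, assuming "ordered" means the weights arrive in non-decreasing order. The sketch only computes unrestricted codeword lengths, to show how the depth can exceed a bound B. It is not the length-limited algorithm the paper presents, and the function name, weights, and B are hypothetical.

```python
from collections import deque


def huffman_depths_sorted(weights):
    """Leaf depths of a Huffman tree for weights in non-decreasing order.

    Two-queue method: merged nodes are created in non-decreasing weight
    order, so a plain FIFO queue replaces the heap and the loop is O(n).
    """
    n = len(weights)
    if n == 1:
        return [0]
    leaves = deque((w, i) for i, w in enumerate(weights))  # node ids 0..n-1
    internal = deque()                                     # merged nodes
    parent = {}
    next_id = n

    def pop_min():
        # Take the lighter front element; prefer a leaf on ties.
        if not internal or (leaves and leaves[0][0] <= internal[0][0]):
            return leaves.popleft()
        return internal.popleft()

    while len(leaves) + len(internal) > 1:
        w1, a = pop_min()
        w2, b = pop_min()
        parent[a] = parent[b] = next_id
        internal.append((w1 + w2, next_id))
        next_id += 1

    root = next_id - 1
    depth = {root: 0}
    # A parent always has a larger id than its children, so walk ids downward.
    for node in range(root - 1, -1, -1):
        depth[node] = depth[parent[node]] + 1
    return [depth[i] for i in range(n)]


weights = [1, 1, 2, 3, 5, 8]   # hypothetical, Fibonacci-like weights
B = 4                          # hypothetical bound on codeword length
depths = huffman_depths_sorted(weights)
print(depths, max(depths) <= B)
```

With these weights the unrestricted tree degenerates into a path of depth n - 1, so the longest codeword violates the bound, which is the situation that motivates length-restricted constructions.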
PAC-Bayesian Analysis of Co-clustering and Beyond
Cited by 11 (5 self)
We derive PAC-Bayesian generalization bounds for supervised and unsupervised learning models based on clustering, such as co-clustering, matrix tri-factorization, graphical models, graph clustering, and pairwise clustering. We begin with the analysis of co-clustering, which is a widely used approach to the analysis of data matrices. We distinguish between two tasks in matrix data analysis: discriminative prediction of the missing entries in data matrices and estimation of the joint probability distribution of row and column variables in co-occurrence matrices. We derive PAC-Bayesian generalization bounds for the expected out-of-sample performance of co-clustering-based solutions for these two tasks. The analysis yields regularization terms that were absent in the previous formulations of co-clustering. The bounds suggest that the expected performance of co-clustering is governed by a trade-off between its empirical performance and the mutual information preserved by the cluster variables on row and column IDs. We derive an iterative projection algorithm for finding a local optimum of this trade-off for discriminative prediction tasks. This algorithm achieved state-of-the-art performance in the MovieLens collaborative filtering task. Our co-clustering model can also be seen as matrix tri-factorization and the results provide generalization bounds, regularization
The MiniMISS speech synthesis system
In P. Suppes (Ed.), University-level
Cited by 9 (2 self)
collection of hardware and software modules, designed and constructed at the Institute for Mathematical Studies in the Social Sciences (IMSSS), that provide high-quality speech synthesis for users of IMSSS's computer-assisted instruction (CAI) programs. In a CAI course, certain sorts of information can most appropriately be presented to the student through an audio channel. For example, it is often helpful for the student to hear informal comments about formulas or tables that are simultaneously being displayed. In addition, emphasis patterns of speech convey semantic information in a way that is often awkward to reproduce in written form. Finally, it is useful to have a second channel available for communicating with the student and for directing his attention. The MISS system reproduces speech in two ways: first, by resynthesizing recorded phrases and, second, by resynthesizing the prosodically adjusted concatenation of individually recorded words. Prosodic adjustment of concatenated words is directed by linguistic text-analysis routines and is accomplished by modifying the fundamental-frequency (F0) contour, duration, and amplitude of the words during concatenation in real time. In conjunction with a program to develop CAI curriculums, the concatenation of words with prosody provides speech of
Resource-aware conference key establishment for heterogeneous networks
2005
Cited by 8 (5 self)
The Diffie–Hellman problem is often the basis for establishing conference keys. In heterogeneous networks, many conferences have participants of varying resources, yet most conference keying schemes do not address this concern and place the same burden upon less powerful clients as more powerful ones. The establishment of conference keys should minimize the burden placed on resource-limited users while ensuring that the entire group can establish the key. In this paper, we present a hierarchical conference keying scheme that forms subgroup keys for successively larger subgroups en route to establishing the group key. A tree, called the conference tree, governs the order in which subgroup keys are formed. Key establishment schemes that consider users with varying costs or budgets are built by appropriately designing the conference tree. We then examine the scenario where users have both varying costs and budget constraints. A greedy algorithm is presented that achieves near-optimal performance, and requires significantly less computational effort than finding the optimal solution. We provide a comparison of the total cost of tree-based conference keying schemes against several existing schemes, and introduce a new performance criterion, the probability of establishing the session key (PESKY), to study the likelihood that a conference key can be established in the presence of budget constraints. Simulations show that the likelihood of forming a group key using a tree-based conference keying scheme is higher than the GDH schemes of Steiner et al. Finally, we study the effect that greedy users have upon the Huffman-based conference keying scheme, and present a method to mitigate the detrimental effects of the greedy users upon the total cost.
Index Terms: Conference key agreement, Diffie–Hellman, Huffman algorithm.
Restructuring Ordered Binary Trees
Cited by 5 (0 self)
We consider the problem of restructuring an ordered binary tree T, preserving the inorder sequence of its nodes, so as to reduce its height to some target value h. Such a restructuring necessarily involves the downward displacement of some of the nodes of T. Our results, focusing both on the maximum displacement over all nodes and on the maximum displacement
The minimum average code for finite memoryless monotone sources
in Proc., IEEE Information Theory Workshop, 2002
Cited by 5 (0 self)
The problem of selecting a code for finite monotone sources with x symbols is considered. The selection criterion is based on minimizing the average redundancy (called Minave criterion) instead of its maximum (i.e., Minimax criterion). The average probability distribution € x, whose associated Huffman code has the minimum average redundancy, is derived. The entropy of the average distribution (i.e.,