
Using difficulty of prediction to decrease computation: Fast sort, priority queue and convex hull on entropy bounded inputs (1993)

by S Chen, J H Reif
Venue:In 34th Symposium on Foundations of Computer Science

Opportunistic Data Structures with Applications

by Paolo Ferragina, Giovanni Manzini, 2000
Cited by 142 (11 self)
In this paper we address the issue of compressing and indexing data. We devise a data structure whose space occupancy is a function of the entropy of the underlying data set. We call the data structure opportunistic since its space occupancy is decreased when the input is compressible, and this space reduction is achieved at no significant slowdown in the query performance. More precisely, its space occupancy is optimal in an information-content sense because a text $T[1, u]$ is stored using $O(H_k(T)) + o(1)$ bits per input symbol in the worst case, where $H_k(T)$ is the $k$th-order empirical entropy of $T$ (the bound holds for any fixed $k$). Given an arbitrary string $P[1, p]$, the opportunistic data structure allows searching for the occ occurrences of $P$ in $T$ in $O(p + \mathrm{occ} \log^{\epsilon} u)$ time (for any fixed $\epsilon > 0$). If data are incompressible we achieve the best space bound currently known [12]; on compressible data our solution improves the succinct suffix array of [12] and the classical suffix tree and suffix array data structures either in space or in query time or both.
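
The quantity $H_k(T)$ in the abstract above can be illustrated with a small sketch. It assumes the usual definition of $k$th-order empirical entropy (the length-weighted average of the zeroth-order entropies of the symbols that follow each length-$k$ context); the function names `h0` and `hk` are hypothetical, and this is not the paper's data structure, only a way to compute the bound's parameter.

```python
from collections import Counter, defaultdict
from math import log2


def h0(symbols):
    """Zeroth-order empirical entropy, in bits per symbol."""
    n = len(symbols)
    if n == 0:
        return 0.0
    counts = Counter(symbols)
    return -sum(c / n * log2(c / n) for c in counts.values())


def hk(text, k):
    """k-th order empirical entropy of text, in bits per symbol (assumed definition)."""
    if k == 0:
        return h0(text)
    # Group every symbol by the k symbols that precede it.
    followers = defaultdict(list)
    for i in range(k, len(text)):
        followers[text[i - k:i]].append(text[i])
    # Weight each context's zeroth-order entropy by its number of followers.
    return sum(len(f) * h0(f) for f in followers.values()) / len(text)
```

For a highly predictable text such as `"abababab"`, `hk(text, 1)` is zero because each symbol is determined by its predecessor, while `h0(text)` is one bit per symbol.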

Faster Deterministic Sorting and Searching in Linear Space

by Arne Andersson, 1995
Cited by 35 (7 self)
We present a significant improvement on linear space deterministic sorting and searching. On a unit-cost RAM with word size $w$, an ordered set of $n$ $w$-bit keys (viewed as binary strings or integers) can be maintained in $O\left(\min\left\{\sqrt{\log n},\ \frac{\log n}{\log w} + \log\log n,\ \log w \log\log n\right\}\right)$ time per operation, including insert, delete, member search, and neighbour search. The cost for searching is worst-case while the cost for updates is amortized. For range queries, there is an additional cost of reporting the found keys. As an application, $n$ keys can be sorted in linear space at a worst-case cost of $O(n\sqrt{\log n})$. The best previous method for deterministic sorting and searching in linear space has been the fusion trees, which support queries in $O(\log n / \log\log n)$ amortized time and sorting in $O(n \log n / \log\log n)$ worst-case time. We also make two minor observations on adapting our data structure to the input distribution and on the complexity of perfect hashing.

A New Efficient Radix Sort

by Arne Andersson, Stefan Nilsson, 1994
Cited by 29 (7 self)
We present new improved algorithms for the sorting problem. The algorithms are not only efficient but also clear and simple. First, we introduce Forward Radix Sort which combines the advantages of traditional left-to-right and right-to-left radix sort in a simple manner. We argue that this algorithm will work very well in practice. Adding a preprocessing step, we obtain an algorithm with attractive theoretical properties. For example, $n$ binary strings can be sorted in $\Theta\left(n \log\left(\frac{B}{n \log n} + 2\right)\right)$ time, where $B$ is the minimum number of bits that have to be inspected to distinguish the strings. This is an improvement over the previously best known result by Paige and Tarjan. The complexity may also be expressed in terms of $H$, the entropy of the input: $n$ strings from a stationary ergodic process can be sorted in $\Theta\left(n \log\left(\frac{1}{H} + 1\right)\right)$ time, an improvement over the result recently presented by Chen and Reif.

Dynamic ordered sets with exponential search trees

by Arne Andersson, Mikkel Thorup - CoRR cs.DS/0210006. See also FOCS’96, STOC’00, 2002
Cited by 14 (1 self)
We introduce exponential search trees as a novel technique for converting static polynomial space search structures for ordered sets into fully-dynamic linear space data structures. This leads to an optimal bound of $O(\sqrt{\log n / \log\log n})$ for searching and updating a dynamic set $X$ of $n$ integer keys in linear space. Searching $X$ for an integer $y$ means finding the maximum key in $X$ which is smaller than or equal to $y$. This problem is equivalent to the standard textbook problem of maintaining an ordered set. The best previous deterministic linear space bound was $O(\log n / \log\log n)$ due to Fredman and Willard from STOC 1990. No better deterministic search bound was known using polynomial space. We also get the following worst-case linear space trade-offs between the number $n$, the word length $W$, and the maximal key $U < 2^W$: $O\left(\min\left\{\log\log n + \frac{\log n}{\log W},\ \log\log n \cdot \frac{\log\log U}{\log\log\log U}\right\}\right)$. These trade-offs are, however, not likely to be optimal. Our results are generalized to finger searching and string searching, providing optimal results for both in terms of $n$.
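
The search operation defined in the abstract above (the maximum key in $X$ that is smaller than or equal to $y$) can be sketched on a static sorted list. This is a plain binary search taking $O(\log n)$ time, not an exponential search tree; it only illustrates the query semantics, and the function name `predecessor` is hypothetical.

```python
from bisect import bisect_right


def predecessor(sorted_keys, y):
    """Return the largest key <= y in a sorted list, or None if no such key exists."""
    # bisect_right returns the number of keys that are <= y.
    i = bisect_right(sorted_keys, y)
    return sorted_keys[i - 1] if i > 0 else None
```

With keys `[2, 5, 9]`, a query for 6 returns 5, a query for 5 returns 5, and a query for 1 returns `None` because no key is small enough.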

Nonparametric Entropy Estimation for Stationary Processes and Random Fields, with Applications to English Text

by I. Kontoyiannis, P. H. Algoet, Yu. M. Suhov, A. J. Wyner, 1998
Cited by 12 (5 self)
We discuss a family of estimators for the entropy rate of a stationary ergodic process and prove their pointwise and mean consistency under a Doeblin-type mixing condition. The estimators are Cesàro averages of longest match-lengths, and their consistency follows from a generalized ergodic theorem due to Maker. We provide examples of their performance on English text, and we generalize our results to countable alphabet processes and to random fields.

Efficient Lossless Compression of Trees and Graphs

by Shenfeng Chen, John H. Reif - In IEEE Data Compression Conference (DCC , 1996
Cited by 5 (0 self)
In this paper, we study the problem of compressing a data structure (e.g. tree, undirected and directed graphs) in an efficient way while keeping a similar structure in the compressed form. To date, there has been no proven optimal algorithm for this problem. We use the idea of building LZW tree in LZW compression to compress a binary tree generated by a stationary ergodic source in an optimal manner. We also extend our tree compression algorithm to compress undirected and directed acyclic graphs.

The Complexity and Entropy of Literary Styles

by I. Kontoyiannis, 1996
Cited by 5 (1 self)
Since Shannon's original experiment in 1951, several methods have been applied to the problem of determining the entropy of English text. These methods were based either on prediction by human subjects, or on computer-implemented parametric models for the data, of a certain Markov order. We ask why computer-based experiments almost always yield much higher entropy estimates than the ones produced by humans. We argue that there are two main reasons for this discrepancy. First, the long-range correlations of English text are not captured by Markovian models and, second, computer-based models only take advantage of the text statistics without being able to "understand" the contextual structure and the semantics of the given text. The second question we address is what the "entropy" of a text says about the author's literary style. In particular, is there an intuitive notion of "complexity of style" that is captured by the entropy? We present preliminary results based on a non-parametric entropy estimation algorithm that offer partial answers to these questions. These results indicate that taking long-range correlations into account significantly improves the entropy estimates. We get an estimate of 1.77 bits-per-character for a one-million-character sample taken from Jane Austen's works. Also, comparing the estimates obtained from several different texts provides some insight into the interpretation of the notion of "entropy" when applied to English text rather than to random processes, and the relationship between the entropy and the "literary complexity" of an author's style. Advantages of this entropy estimation method are that it does not require prior training, it is uniformly good over different styles and languages, and it seems to converge reasonably fast.

Stationary Entropy Estimation via String Matching (Extended Abstract)

by I. Kontoyiannis, Yurii M. Suhov, 1996
Cited by 2 (2 self)
Submitted to DCC 1996, Snowbird, Utah; September 1995, revised March 1996. We prove an asymptotic relationship between certain longest match-lengths along a single realization of a stationary process, and its entropy rate: Given a process $X = \{X_n ;\ n \in \mathbb{Z}\}$ and a realization $x$ from $X$, we define $N_i(x)$ as the length of the shortest substring starting at $x_i$ that does not appear as a contiguous substring of $(x_{i-N}, x_{i-N+1}, \ldots, x_{i-1})$. We show that, for a class of stationary processes with finite state-space (including all i.i.d. and mixing Markov processes of all orders), the following limiting relation holds: $$\lim_{N \to \infty} \frac{\sum_{i=1}^{N} N_i(x)}{N \log N} = \frac{1}{H}, \qquad (1)$$ almost surely and in $L^1$, where $H > 0$ is the entropy rate of the process. We generalize this result to the cases where the alphabet of the process is countably infinite, and to random fields in several dimensions.
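
The limiting relation (1) in the abstract above suggests a plug-in estimate of the entropy rate: divide $N \log N$ by the sum of the match lengths. A minimal sketch follows, assuming base-2 logarithms (so the result is in bits per symbol), a realization given as a Python string, and the hypothetical names `match_length` and `entropy_estimate`; it uses naive substring search and is not the authors' implementation.

```python
from math import log2


def match_length(x, i, n):
    """Length of the shortest substring starting at x[i] that is absent from x[i-n:i]."""
    window = x[i - n:i]
    length = 1
    # Grow the candidate substring while it still occurs inside the window.
    # Near the end of x the search is cut off, so the result may be truncated there.
    while i + length <= len(x) and x[i:i + length] in window:
        length += 1
    return length


def entropy_estimate(x, n):
    """Estimate the entropy rate as n*log2(n) / sum of n match lengths (assumed base 2)."""
    if len(x) < 2 * n:
        raise ValueError("need at least 2*n symbols: n for the first window, n positions")
    total = sum(match_length(x, i, n) for i in range(n, 2 * n))
    return n * log2(n) / total
```

Long matches into the past mean the sequence is predictable, so the sum grows and the estimate shrinks; short matches push the estimate up.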

Estimating the Entropy of Binary Time Series: Methodology, Some Theory and a Simulation Study

by Yun Gao, Ioannis Kontoyiannis, Elie Bienenstock
Cited by 2 (0 self)
Abstract not found

Fast Pattern Matching for Entropy Bounded Text

by Shenfeng Chen, John H. Reif - in Proceedings of DCC'95 Data Compression Conference, Snowbird, 1995
Cited by 1 (0 self)
We present the first known case of one-dimensional and two-dimensional string matching algorithms for text with bounded entropy. Let $n$ be the length of the text and $m$ be the length of the pattern. We show that the expected complexity of the algorithms is related to the entropy of the text for various assumptions of the distribution of the pattern. For the case of uniformly distributed patterns, our one-dimensional matching algorithm works in $O(n \log m / (pm))$ expected running time, where $H$ is the entropy of the text and $p = 1 - (1 - H^2)^{H/(1+H)}$. The worst case running time $T$ can also be bounded by $\frac{n \log m}{p(m + \sqrt{V})} \le T \le \frac{n \log m}{p(m - \sqrt{V})}$ if $V$ is the variance of the source from which the pattern is generated. Our algorithm utilizes data structures and probabilistic analysis techniques that are found in certain lossless data compression schemes.