Efficiently Supporting Ad Hoc Queries in Large Datasets of Time Sequences
 In SIGMOD
, 1997
Abstract

Cited by 102 (14 self)
Ad hoc querying is difficult on very large datasets, since it is usually not possible to have the entire dataset on disk. While compression can be used to decrease the size of the dataset, compressed data is notoriously difficult to index or access. In this paper we consider a very large dataset comprising multiple distinct time sequences. Each point in a sequence is a numerical value. We show how to compress such a dataset into a format that supports ad hoc querying, provided that a small error can be tolerated when the data is uncompressed. Experiments on large, real-world datasets (AT&T customer calling patterns) show that the proposed method achieves an average of less than 5% error in any data value after compressing to a mere 2.5% of the original space (i.e., a 40:1 compression ratio), with these numbers not very sensitive to dataset size. Experiments on aggregate queries achieved a 0.5% reconstruction error with a space requirement under 2%.
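The idea of lossy compression that still answers point queries can be illustrated with a much simpler scheme than the one evaluated in the paper. The sketch below (a hypothetical piecewise-mean encoding, not the authors' method) stores one mean per fixed-size block; any individual value is then reconstructed, with error bounded by the in-block spread, from the compressed form alone:

```python
def compress(seq, block=4):
    """Lossy compression sketch: keep one mean per block of values.
    Space shrinks by roughly a factor of `block`."""
    means = [sum(seq[i:i + block]) / len(seq[i:i + block])
             for i in range(0, len(seq), block)]
    return means, block

def point_query(compressed, i):
    """Answer an ad hoc point query directly from the compressed form,
    without access to the original data."""
    means, block = compressed
    return means[i // block]
```

Real schemes use far better approximations, but the interface is the same: queries never touch the uncompressed dataset.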
Linear Approximation of Shortest Superstrings
, 1991
Abstract

Cited by 76 (5 self)
We consider the following problem: given a collection of strings s_1, ..., s_m, find the shortest string s such that each s_i appears as a substring (a consecutive block) of s. Although this problem is known to be NP-hard, a simple greedy procedure appears to do quite well and is routinely used in DNA sequencing and data compression practice, namely: repeatedly merge the pair of distinct strings with maximum overlap until only one string remains. Let n denote the length of the optimal superstring. A common conjecture states that the above greedy procedure produces a superstring of length O(n) (in fact, 2n), yet the only previous nontrivial bound known for any polynomial-time algorithm is a recent O(n log n) result. We show that the greedy algorithm does in fact achieve a constant-factor approximation, proving an upper bound of 4n. Furthermore, we present a simple modified version of the greedy algorithm that we show produces a superstring of length at most 3n. We also show the sup...
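The greedy procedure the abstract names is short enough to sketch directly. This naive O(m^2 · L) version (function names are mine) merges the pair of distinct strings with maximum suffix/prefix overlap until one string remains:

```python
def overlap(a, b):
    """Length of the longest suffix of a that is a prefix of b."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0

def greedy_superstring(strings):
    """Greedy superstring: repeatedly merge the pair of distinct
    strings with maximum overlap until only one string remains."""
    # Drop strings contained in another; they add no constraint.
    strings = [s for s in strings
               if not any(s != t and s in t for t in strings)]
    strings = list(dict.fromkeys(strings))  # dedupe, keep order
    while len(strings) > 1:
        best = (-1, None, None)
        for i, a in enumerate(strings):
            for j, b in enumerate(strings):
                if i != j:
                    k = overlap(a, b)
                    if k > best[0]:
                        best = (k, i, j)
        k, i, j = best
        merged = strings[i] + strings[j][k:]
        strings = [s for idx, s in enumerate(strings)
                   if idx not in (i, j)]
        strings.append(merged)
    return strings[0]
```

For example, greedy_superstring(["abc", "bcd", "cde"]) merges down to "abcde", which contains all three inputs; the paper's result is that the length of such an output is at most 4n.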
A New Challenge for Compression Algorithms: Genetic Sequences
 Information Processing & Management
, 1994
Abstract

Cited by 70 (0 self)
Universal data compression algorithms fail to compress genetic sequences. This is due to the specificity of this particular kind of "text". We analyze in some detail the properties of the sequences which cause the failure of classical algorithms. We then present a lossless algorithm, biocompress-2, to compress the information contained in DNA and RNA sequences, based on the detection of regularities such as the presence of palindromes. The algorithm combines substitutional and statistical methods and, to the best of our knowledge, leads to the highest compression of DNA. The results, although not satisfactory, give insight into the necessary correlation between compression and comprehension of genetic sequences.
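One regularity mentioned here, the complemented palindrome, is easy to state precisely: a DNA segment equals the reverse of its base-wise complement. A minimal check (assuming the standard A–T, C–G pairing; the helper name is mine) is:

```python
# Standard Watson-Crick complement table (assumed pairing A-T, C-G).
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def is_dna_palindrome(s):
    """True if s equals the reverse of its base-wise complement,
    the kind of regularity a DNA-aware compressor can exploit."""
    return s == s.translate(COMPLEMENT)[::-1]
```

For instance, the EcoRI recognition site "GAATTC" is such a palindrome, while an arbitrary string like "GATTACA" is not.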
Finding Maximal Repetitions in a Word in Linear Time
 In Symposium on Foundations of Computer Science
, 1999
Abstract

Cited by 50 (4 self)
A repetition in a word is a subword whose period is at most half of the subword's length. We study maximal repetitions occurring in a word w, that is, those for which any extended subword of w has a bigger period. The set of such repetitions represents in a compact way all repetitions in w. We first prove a combinatorial result asserting that the sum of exponents of all maximal repetitions of a word of length n is bounded by a linear function of n. This implies, in particular, that there is only a linear number of maximal repetitions in a word. This allows us to construct a linear-time algorithm for finding all maximal repetitions. Some consequences and applications of these results are discussed, as well as related works.
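The definition can be made concrete with a naive O(n^2) enumerator (nothing like the paper's linear-time algorithm; names and representation are mine). It reports each maximal repetition as a triple (i, j, p): the subword w[i:j] has period p with at least two full periods, and the periodicity extends neither left nor right:

```python
def maximal_repetitions(w):
    """Naive enumeration of maximal repetitions (runs): subwords w[i:j]
    with period p, 2*p <= j - i, not extendable in either direction."""
    n = len(w)
    found = set()
    for i in range(n):
        for p in range(1, (n - i) // 2 + 1):
            # Extend the periodicity w[k] == w[k - p] as far right as possible.
            j = i + p
            while j < n and w[j] == w[j - p]:
                j += 1
            if j - i >= 2 * p:                    # at least two full periods
                # Keep only left-maximal occurrences.
                if i == 0 or w[i - 1] != w[i - 1 + p]:
                    found.add((i, j, p))
    # For each interval keep the smallest (i.e., the true) period.
    best = {}
    for i, j, p in found:
        if (i, j) not in best or p < best[(i, j)]:
            best[(i, j)] = p
    return sorted((i, j, p) for (i, j), p in best.items())
```

On "aabaabaa" this reports three occurrences of "aa" (period 1) plus the whole word with period 3, illustrating why runs summarize all repetitions compactly.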
Dynamic Dictionary Matching
, 1993
Abstract

Cited by 48 (8 self)
We consider the dynamic dictionary matching problem. We are given a set of pattern strings (the dictionary) that can change over time; that is, we can insert a new pattern into the dictionary or delete a pattern from it. Moreover, given a text string, we must be able to find all occurrences of any pattern of the dictionary in the text. Let D_0 be the empty dictionary. We present an algorithm that performs any sequence of the following operations in the given time bounds: (1) insert(p, D_{i-1}): Insert pattern p[1..m] into the dictionary D_{i-1}. D_i is the dictionary after the operation. The time complexity is O(m log |D_i|). (2) delete(p, D_{i-1}): Delete pattern p[1..m] from the dictionary D_{i-1}. D_i is the dictionary after the operation. The time complexity is O(m log |D_{i-1}|). (3) search(t, D_i): Search text t[1..n] for all occurrences of the patterns of dictionary D_i. The time complexity is O((n + tocc) log |D_i|), where tocc is the total number of occurrences of patterns i...
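The three operations define a clean interface, which this toy class illustrates. It is only the interface: the naive search below scans the text once per pattern, so it costs O(|D| · n · m) rather than the O((n + tocc) log |D|) the paper achieves.

```python
class DynamicDictionary:
    """Sketch of the dynamic dictionary matching interface
    (insert / delete / search); the implementation is deliberately
    naive and does not attain the paper's time bounds."""
    def __init__(self):
        self.patterns = set()

    def insert(self, p):
        self.patterns.add(p)

    def delete(self, p):
        self.patterns.discard(p)

    def search(self, t):
        """Return (start, pattern) for every occurrence of every
        dictionary pattern in text t."""
        occ = []
        for p in self.patterns:
            start = t.find(p)
            while start != -1:
                occ.append((start, p))
                start = t.find(p, start + 1)
        return sorted(occ)
```

Deleting a pattern immediately removes its occurrences from subsequent searches, which is exactly the behaviour that makes the dynamic version harder than classical Aho-Corasick matching over a fixed dictionary.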
Compressing relations and indexes
 In proceedings of IEEE International Conference on Data Engineering
, 1998
Abstract

Cited by 42 (0 self)
We propose a new compression algorithm that is tailored to database applications. It can be applied to a collection of records, and is especially effective for records with many low to medium cardinality fields and numeric fields. In addition, this new technique supports very fast decompression. Promising application domains include decision support systems (DSS), since "fact tables", which are by far the largest tables in these applications, contain many low and medium cardinality fields and typically no text fields. Further, our decompression rates are faster than typical disk throughputs for sequential scans; in contrast, gzip is slower. This is important in DSS applications, which often scan large ranges of records. An important distinguishing characteristic of our algorithm, in contrast to compression algorithms proposed earlier, is that we can decompress individual tuples (even individual fields), rather than a full page (or an entire relation) at a time. Also, all the information needed for tuple decompression resides on the same page with the tuple. This means that a page can be stored in the buffer pool and used in compressed form, simplifying the job of the buffer manager and improving memory utilization. Our compression algorithm also improves index structures such as B-trees and R-trees significantly by reducing the number of leaf pages and compressing index entries, which greatly increases the fanout. We can also use lossy compression on the internal nodes of an index.
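The key property, decompressing an individual field without touching anything else, is easy to demonstrate with dictionary encoding of a low-cardinality column (a generic technique used for illustration; the paper's actual encoding and class names here are not from the paper):

```python
class DictColumn:
    """Dictionary-encode a low-cardinality column: each value is stored
    as a small integer code pointing into a per-column dictionary, so
    any single field decodes independently of its neighbours."""
    def __init__(self, values):
        self.dictionary = sorted(set(values))     # distinct values
        index = {v: i for i, v in enumerate(self.dictionary)}
        self.codes = [index[v] for v in values]   # could be bit-packed

    def decode(self, row):
        """Decompress one field: a single array lookup."""
        return self.dictionary[self.codes[row]]
```

With, say, a country column of four rows over three distinct values, the codes fit in two bits each, and decoding row 2 never touches rows 0, 1, or 3, which is what lets pages stay compressed in the buffer pool.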
Mobility-Based Predictive Call Admission Control and Bandwidth Reservation in Wireless Cellular Networks
 IEEE INFOCOM
, 2001
Abstract

Cited by 41 (3 self)
This paper presents call admission control and bandwidth reservation schemes in wireless cellular networks that have been developed based on assumptions more realistic than in existing proposals. In order to guarantee the handoff dropping probability, we propose to statistically predict user mobility based on the mobility history of users. Our mobility prediction scheme is motivated by computational learning theory, which has shown that prediction is synonymous with data compression. We derive our mobility prediction scheme from data compression techniques that are both theoretically optimal and good in practice. In order to utilize resources more efficiently, we predict not only the cell to which the mobile will hand off but also when the handoff will occur. Based on the mobility prediction, bandwidth is reserved to guarantee some target handoff dropping probability. We also adaptively control the admission threshold to achieve a better balance between guaranteeing the handoff dropping probability and maximizing resource utilization. Simulation results show that the proposed schemes meet our design goals and outperform the static-reservation and cell-reservation schemes.
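The "prediction is synonymous with data compression" idea can be sketched with a context-counting predictor of the kind underlying compressors such as PPM. The class below (my own stand-in, not the paper's scheme) counts which cell follows each short context of recently visited cells and predicts the most frequent successor:

```python
from collections import defaultdict

class MobilityPredictor:
    """Hypothetical order-2 context predictor standing in for the
    paper's compression-derived scheme: the same statistics a
    context-model compressor keeps also yield next-cell predictions."""
    def __init__(self):
        self.counts = defaultdict(lambda: defaultdict(int))
        self.context = ()

    def observe(self, cell):
        """Record a handoff into `cell` and slide the context window."""
        self.counts[self.context][cell] += 1
        self.context = (self.context + (cell,))[-2:]

    def predict(self):
        """Most likely next cell given the current context, or None."""
        following = self.counts.get(self.context)
        if not following:
            return None
        return max(following, key=following.get)
```

After observing a user cycling through cells A, B, C, the predictor's context (A, B) points to C, which is the information a base station would use to reserve bandwidth ahead of the handoff.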
Data Compression and Database Performance
 In Proc. ACM/IEEECS Symp. On Applied Computing
, 1991
Abstract

Cited by 37 (0 self)
Data compression is widely used in data management to save storage space and network bandwidth. In this report, we outline the performance improvements that can be achieved by exploiting data compression in query processing. The novel idea is to leave data in compressed state as long as possible, and to only uncompress data when absolutely necessary. We will show that many query processing algorithms can manipulate compressed data just as well as decompressed data, and that processing compressed data can speed query processing by a factor much larger than the compression factor.
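The "leave data compressed as long as possible" idea has a simple instance: over dictionary-encoded data, a selection predicate can be evaluated once per distinct value instead of once per record, and the scan itself never decompresses anything. (This sketch and its names are mine, used to illustrate the principle.)

```python
def count_matches(codes, dictionary, predicate):
    """Evaluate a selection over dictionary-encoded data without
    decompressing: test the predicate once per distinct value,
    then count the matching integer codes."""
    matching = {code for code, value in enumerate(dictionary)
                if predicate(value)}
    return sum(1 for c in codes if c in matching)
```

When the column has few distinct values, the predicate runs a handful of times regardless of table size, which is one way processing compressed data can beat the compression factor itself.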
Crowd Modelling in Collaborative Virtual Environments
 ACM VRST
, 1998
Abstract

Cited by 32 (14 self)
This paper presents a crowd modelling method in Collaborative Virtual Environments (CVEs) which aims to create a sense of group presence, providing a more realistic virtual world. An adaptive display is also presented as a key element to optimise the information needed to keep an acceptable frame rate during crowd visualisation. This system has been integrated in several CVE platforms, which are presented at the end of this paper. Keywords: autonomous agents, virtual crowds, virtual environments.