Models of Bitmap Generation: A Systematic Approach to Bitmap Compression
Inf. Proc. & Management, v28, 1992
Abstract

Cited by 5 (2 self)
In large IR systems, information about word occurrence may be stored in the form of a bit matrix, with rows corresponding to different words and columns to documents. Such a matrix is generally very large and very sparse. New methods for compressing such matrices are presented, which exploit possible correlations between rows and between columns. The methods are based on partitioning the matrix into small blocks and predicting the 1-bit distribution within a block by means of various bit generation models. Each block is then encoded using Huffman or arithmetic coding. The methods also use a new way of enumerating subsets of fixed size from a given superset. Preliminary experimental results indicate improvements over previous methods. 1. Introduction: The common approach to processing complex boolean queries in large full-text document retrieval systems is to use inverted files: a concordance is accessed via a dictionary, and includes for each different word of the text, the ordered list ...
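The abstract mentions "a new way of enumerating subsets of fixed size from a given superset". The paper's exact scheme is not given here, but the standard technique it builds on is enumerative (combinatorial) ranking: a block containing k one-bits among n positions can be stored as a single integer rank in ceil(log2(C(n, k))) bits. A minimal sketch, not the authors' implementation:

```python
from math import comb

def rank_subset(positions, n):
    """Colexicographic rank of a sorted k-subset of {0, ..., n-1}.

    Ranks run from 0 to C(n, k) - 1, so the rank fits in
    ceil(log2(C(n, k))) bits -- the information-theoretic minimum
    for a block with exactly k one-bits.
    """
    return sum(comb(p, i + 1) for i, p in enumerate(positions))

def unrank_subset(r, n, k):
    """Invert rank_subset: recover the k-subset from its rank r."""
    positions = []
    for i in range(k, 0, -1):
        # find the largest p with C(p, i) <= r
        p = i - 1
        while comb(p + 1, i) <= r:
            p += 1
        positions.append(p)
        r -= comb(p, i)
    return sorted(positions)
```

For example, the 2-subsets of a 5-element set map bijectively onto the ranks 0 through 9, so a 5-bit block known to hold two one-bits needs only 4 bits.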
DNA compression challenge revisited
 In Proc. CPM 2005, Combinatorial Pattern Matching, Jeju Island, Korea
, 2005
Abstract

Cited by 5 (0 self)
Standard compression algorithms are not able to compress DNA sequences. Recently, new algorithms have been introduced specifically for this purpose, often using detection of long approximate repeats. In this paper, we present another algorithm, DNAPack, based on dynamic programming. In comparison with existing programs, it compresses DNA slightly better, while the cost of dynamic programming is almost negligible.
Combinatorial Representation of Generalized Fibonacci Numbers
, 1991
Abstract

Cited by 4 (0 self)
New formulae are presented which express various generalizations of Fibonacci numbers as simple sums of binomial and multinomial coefficients. The equalities are inferred from the special properties of the representations of the integers in certain numeration systems.
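The paper's specific generalized formulae are not reproduced in this snippet, but the classical identity in this family expresses F(n+1) as a diagonal sum of Pascal's triangle. A quick check of that base case (function names are ours):

```python
from math import comb

def fib(n):
    """Fibonacci numbers with F(1) = F(2) = 1."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

def fib_binomial_sum(n):
    """Diagonal sum of Pascal's triangle: sum over k of C(n-k, k).

    The classical identity states this equals F(n+1); the paper
    generalizes such sums to higher-order Fibonacci numbers via
    multinomial coefficients.
    """
    return sum(comb(n - k, k) for k in range(n // 2 + 1))
```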
Comparative Study between Various Algorithms of Data Compression Techniques
Abstract

Cited by 3 (0 self)
The spread of computing has led to an explosion in the volume of data to be stored on hard disks and sent over the Internet. This growth has led to a need for "data compression", that is, the ability to reduce the amount of storage or Internet bandwidth required to handle this data. This paper provides a survey of data compression techniques. The focus is on the most prominent data compression schemes, particularly for popular .DOC, .TXT, .BMP, .TIF, .GIF, and .JPG files. By running different compression algorithms we obtain results, and based on these results we suggest the most efficient algorithm for each type of file to be compressed, taking into consideration both the compression ratio and the compressed file size.
Bridging Lossy and Lossless Compression by Motif Pattern Discovery ∗
Abstract

Cited by 2 (1 self)
We present data compression techniques hinged on the notion of a motif, interpreted here as a string of intermittently solid and wild characters that recurs more or less frequently in an input sequence or family of sequences. This notion arises originally in the analysis of sequences, particularly biomolecules, due to its multiple implications in the understanding of biological structure and function, and it has been the subject of various characterizations and studies. Correspondingly, motif discovery techniques and tools have been devised. This task is made hard by the circumstance that the number of motifs identifiable in general in a sequence can be exponential in the size of that sequence. A significant gain in the direction of reducing the number of motifs is achieved through the introduction of irredundant motifs, which in intuitive terms are motifs whose structure and list of occurrences cannot be inferred from a combination of other motifs' occurrences. Although suboptimal, the available procedures for the extraction of some such motifs are not prohibitively expensive. Here we show that irredundant motifs can be usefully exploited in lossy compression methods based on textual substitution and suitable for signals as well as text. Moreover, once the motifs in our lossy encodings are disambiguated into corresponding lossless codebooks, they still prove capable of yielding savings over popular methods in use. Preliminary experiments with these fungible strategies at the crossroads of lossless and lossy data compression show performances that improve over popular methods (e.g., GZip) by more than 20% in lossy and 10% in lossless implementations.
Using Fibonacci Compression Codes as Alternatives to Dense Codes
Abstract

Cited by 2 (1 self)
Recent publications advocate the use of various variable-length codes for which each codeword consists of an integral number of bytes in compression applications using large alphabets. This paper shows that another tradeoff with similar properties can be obtained by Fibonacci codes. These are fixed codeword sets, using binary representations of integers based on Fibonacci numbers of order m ≥ 2. Fibonacci codes have been used before, and this paper extends previous work, presenting several novel features. In particular, ...
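The order-2 Fibonacci code the abstract refers to is well known: write n in its Zeckendorf representation (a sum of non-consecutive Fibonacci numbers) and append a final 1, so every codeword ends in the unique pattern "11" and the code is self-delimiting. A minimal sketch for positive integers (the paper's order-m generalization is not shown here):

```python
def fib_encode(n):
    """Order-2 Fibonacci codeword for a positive integer n.

    Emits Zeckendorf bits from F(2)=1 upward, then appends '1';
    the terminating '11' makes the code instantaneously decodable.
    """
    fibs = [1, 2]                      # F(2), F(3), ...
    while fibs[-1] + fibs[-2] <= n:
        fibs.append(fibs[-1] + fibs[-2])
    while fibs[-1] > n:                # drop Fibonacci numbers above n
        fibs.pop()
    bits = []
    for f in reversed(fibs):           # greedy Zeckendorf decomposition
        if f <= n:
            bits.append('1')
            n -= f
        else:
            bits.append('0')
    return ''.join(reversed(bits)) + '1'

def fib_decode(code):
    """Decode a single Fibonacci codeword (must end in '11')."""
    fibs = [1, 2]
    while len(fibs) < len(code) - 1:
        fibs.append(fibs[-1] + fibs[-2])
    return sum(f for bit, f in zip(code[:-1], fibs) if bit == '1')
```

For example, 1 encodes as "11" and 4 as "1011"; unlike byte-aligned dense codes, codeword lengths grow bit by bit, which is the tradeoff the paper examines.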
The Responsa Storage and Retrieval System: Whither?
, 1996
Abstract
p. 173). We did develop such a tool [CCDFS1971]. As each of these methods has certain advantages and disadvantages, we ended up merging them into a joint analysis-synthesis method; a global analysis of all words in the database is done, but without prepositions (otiyot shimush), in order to end up with a database of manageable size; the prepositions are left to the synthesis phase. See [AFCS1972] for full details. I also set up a "Committee for the Mechanization in Jewish Law Research" whose first members were, I think, Dr. Choueka, Mr. Asa Kasher, later professor of Philosophy at Tel Aviv University, Mr. Joseph Dueck, a young lawyer and research assistant at the IRJL, who served as their representative, and assistants, to formulate procedures for pre-editing and post-editing texts to be input, and various algorithms needed for the work. (Many other persons, such as Mr. Reuven Mirkin of the Academy of the Hebrew Language, and research students, joined later.) I also felt ...
Bioinformatics
, 2003
Abstract
Selection of significant genes via expression patterns is an important problem in microarray experiments. Owing to small sample size and the large number of variables (genes), the selection process can be unstable. This paper proposes a hierarchical Bayesian model for gene (variable) selection. We employ latent variables to specialize the model to a regression setting and use a Bayesian mixture prior to perform the variable selection. We control the size of the model by assigning a prior distribution over the dimension (number of significant genes) of the model. The posterior distributions of the parameters are not in explicit form, so we use a combination of truncated sampling and Markov Chain Monte Carlo (MCMC) computation techniques to simulate the parameters from the posteriors. The Bayesian model is flexible enough to identify significant genes as well as to perform future predictions. The method is applied to cancer classification via cDNA microarrays, where the genes BRCA1 and BRCA2 are associated with a hereditary disposition to breast cancer, and it is used to identify a set of significant genes. The method is also applied successfully to the leukemia data.
The Burrows-Wheeler compression algorithm is even better than what you have thought
, 2005
Abstract
The best compression algorithm today for English text is based on the Burrows-Wheeler transform. This algorithm (whose common implementation is bzip2) consists of the following three essential steps: 1) Obtain the Burrows-Wheeler transform of the text, 2) Convert the transform into a sequence of integers using the move-to-front algorithm, 3) Encode the integers using arithmetic code or any order-0 encoding (possibly with run-length encoding). In this paper we achieve a strong bound on the worst-case compression ratio of this algorithm, one that is significantly better than bounds known to date and is obtained via simple analytical techniques. Specifically, for any input string s and any µ > 1, the length of the compressed string is bounded by µ·|s|·H_k(s) + log(ζ(µ))·|s| + g_k, where H_k is the k-th order empirical entropy, g_k is a constant depending only on k and on the size of the alphabet, and ζ(µ) = 1/1^µ + 1/2^µ + ... is the standard zeta function. In fact we prove a stronger result: this bound without the additive term g_k holds when we replace H_k(s) by the sum of the logarithms of the integers obtained by the move-to-front encoding of the transform. This refined bound is tight and close to the actual compression achieved in practice. To obtain this result we prove a tight result on the compressibility of integer sequences, which is of independent interest.
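The first two steps of the pipeline the abstract describes are small enough to sketch directly. Below is a naive rotation-sorting BWT and a move-to-front encoder; the third step (an order-0 arithmetic or run-length coder) is omitted, and real implementations such as bzip2 use suffix sorting rather than materializing all rotations:

```python
def bwt(s):
    """Burrows-Wheeler transform via sorted rotations.

    Naive O(n^2 log n) version: append a unique sentinel, sort all
    cyclic rotations, and output the last column. Production code
    uses suffix arrays instead.
    """
    s = s + '\x00'                     # sentinel, assumed absent from s
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return ''.join(r[-1] for r in rotations)

def move_to_front(s):
    """Encode each character as its index in a self-adjusting list.

    Because the BWT clusters equal characters, the output is dominated
    by small integers, which an order-0 coder then compresses well --
    exactly the quantity the paper's refined bound is stated in terms of.
    """
    alphabet = sorted(set(s))
    out = []
    for c in s:
        i = alphabet.index(c)
        out.append(i)
        alphabet.insert(0, alphabet.pop(i))   # move the symbol to the front
    return out
```

For instance, move_to_front(bwt("banana")) yields a sequence of mostly repeated or small indices, reflecting the run-forming effect of the transform.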