Results 11 - 20
of
136
Compressing Integers for Fast File Access
- The Computer Journal
, 1999
"... this paper we show experimentally that, for large or small collections, storing integers in a compressed format reduces the time required for either sequential stream access or random access. We compare di#erent approaches to compressing integers, including the Elias gamma and delta codes, Golom ..."
Abstract
-
Cited by 51 (13 self)
- Add to MetaCart
this paper we show experimentally that, for large or small collections, storing integers in a compressed format reduces the time required for either sequential stream access or random access. We compare di#erent approaches to compressing integers, including the Elias gamma and delta codes, Golomb coding, and a variable-byte integer scheme. As a conclusion, we recommend that, for fast access to integers, files be stored compressed
System-on-a-Chip Test-Data Compression and Decompression Architectures Based on Golomb Codes
, 2001
"... We present a new test-data compression method and decompression architecture based on variable-to-variable-length Golomb codes. The proposed method is especially suitable for encoding precomputed test sets for embedded cores in a system-on-a-chip (SoC). The major advantages of Golomb coding of test ..."
Abstract
-
Cited by 50 (11 self)
- Add to MetaCart
We present a new test-data compression method and decompression architecture based on variable-to-variable-length Golomb codes. The proposed method is especially suitable for encoding precomputed test sets for embedded cores in a system-on-a-chip (SoC). The major advantages of Golomb coding of test data include very high compression, analytically predictable compression results, and a low-cost and scalable on-chip decoder. In addition, the novel interleaving decompression architecture allows multiple cores in an SoC to be tested concurrently using a single automatic test equipment input-output channel. We demonstrate the effectiveness of the proposed approach by applying it to the Internaional Symposium on Circuits and Systems' benchmark circuits and to two industrial production circuits. We also use analytical and experimental means to highlight the superiority of Golomb codes over run-length codes.
Adding Compression to Block Addressing Inverted Indexes
, 2000
"... . Inverted index compression, block addressing and sequential search on compressed text are three techniques that have been separately developed for efficient, low-overhead text retrieval. Modern text compression techniques can reduce the text to less than 30% of its size and allow searching it dire ..."
Abstract
-
Cited by 47 (26 self)
- Add to MetaCart
. Inverted index compression, block addressing and sequential search on compressed text are three techniques that have been separately developed for efficient, low-overhead text retrieval. Modern text compression techniques can reduce the text to less than 30% of its size and allow searching it directly and faster than the uncompressed text. Inverted index compression obtains significant reduction of their original size at the same processing speed. Block addressing makes the inverted lists point to text blocks instead of exact positions and pay the reduction in space with some sequential text scanning. In this work we combine the three ideas in a single scheme. We present a compressed inverted file that indexes compressed text and uses block addressing. We consider different techniques to compress the index and study their performance with respect to the block size. We compare the index against three separate techniques for varying block sizes, showing that our index is superior to each isolated approach. For instance, with just 4% of extra space overhead the index has to scan less than 12% of the text for exact searches and about 20% allowing one error in the matches. Keywords: Text compression, inverted files, block addressing, text databases. 1.
Indexing and Retrieval for Genomic Databases
- IEEE Transactions on Knowledge and Data Engineering
, 2002
"... Genomic sequence databases are widely used by molecular biologists for homology searching. Amino-acid and nucleotide databases are increasing in size exponentially, and mean sequence lengths are also increasing. In searching such databases, it is desirable to use heuristics to perform computationall ..."
Abstract
-
Cited by 40 (6 self)
- Add to MetaCart
Genomic sequence databases are widely used by molecular biologists for homology searching. Amino-acid and nucleotide databases are increasing in size exponentially, and mean sequence lengths are also increasing. In searching such databases, it is desirable to use heuristics to perform computationally intensive local alignments on selected sequences only and to reduce the costs of the alignments that are attempted. We present an index-based approach for both selecting sequences that display broad similarity to a query and for fast local alignment. We show experimentally that the indexed approach results in signi cant savings in computationally intensive local alignments, and that index-based searching is as accurate as existing exhaustive search schemes.
A Low-Complexity Modeling Approach for Embedded Coding of Wavelet Coefficients
- In Proc. IEEE Data Compression Conference
, 1997
"... We present a new low-complexity method for modeling and coding the bitplanes of a wavelet-transformed image in a fully embedded fashion. The scheme uses a simple ordering model for embedding, based on the principle that coefficient bits that are likely to reduce the distortion the most should be ..."
Abstract
-
Cited by 37 (2 self)
- Add to MetaCart
We present a new low-complexity method for modeling and coding the bitplanes of a wavelet-transformed image in a fully embedded fashion. The scheme uses a simple ordering model for embedding, based on the principle that coefficient bits that are likely to reduce the distortion the most should be described first in the encoded bitstream. The ordering model is tied to a conditioning model in a way that deinterleaves the conditioned subsequences of coefficient bits, making them amenable to coding with a very simple, adaptive elementary Golomb code. The proposed scheme, without relying on zerotrees or arithmetic coding, attains PSNR vs. bit rate performance superior to that of SPIHT, and competitive with its arithmetic coding variant, SPIHT-AC. Keywords: Progressive image compression; Laplacian density; Run-length coding; Rate-distortion. 1 1 Introduction Progressive image compression refers to the encoding of an image into a bitstream that can be parsed efficiently to obtain ...
Efficient Single-Pass Index Construction for Text Databases
- Jour. of the American Society for Information Science and Technology
, 2003
"... Efficient construction of inverted indexes is essential to provision of search over large collections of text data. In this paper, we review the principal approaches to inversion, analyse their theoretical cost, and present experimental results. We identify the drawbacks of existing inversion approa ..."
Abstract
-
Cited by 31 (2 self)
- Add to MetaCart
Efficient construction of inverted indexes is essential to provision of search over large collections of text data. In this paper, we review the principal approaches to inversion, analyse their theoretical cost, and present experimental results. We identify the drawbacks of existing inversion approaches and propose a single-pass inversion method that, in contrast to previous approaches, does not require the complete vocabulary of the indexed collection in main memory, can operate within limited resources, and does not sacrifice speed with high temporary storage requirements. We show that the performance of the single-pass approach can be improved by constructing inverted files in segments, reducing the cost of disk accesses during inversion of large volumes of data.
Fast and efficient lossless image compression
- in Proc. 1993 Data Compression Conference, (Snowbird)
, 1993
"... We present a new method for lossless image compression that gives compression comparable to JPEG lossless mode with about five times the speed. Our method, called FELICS, is based on a novel use of two neighboring pixels for both prediction and error modeling. For coding we use single bits, adjusted ..."
Abstract
-
Cited by 30 (0 self)
- Add to MetaCart
We present a new method for lossless image compression that gives compression comparable to JPEG lossless mode with about five times the speed. Our method, called FELICS, is based on a novel use of two neighboring pixels for both prediction and error modeling. For coding we use single bits, adjusted binary codes, and Golomb or Rice codes. For the latter we present and analyze a provably good method for estimating the single coding parameter.
In-Place versus Re-Build versus Re-Merge: Index Maintenance Strategies for . . .
- IN PROCEEDINGS OF THE 27TH CONFERENCE ON AUSTRALASIAN COMPUTER SCIENCE
, 2004
"... Indexes are the key technology underpinning efficient text search. A range of algorithms have been developed for fast query evaluation and for index creation, but update algorithms for high-performance indexes have not been evaluated or even fully described. In this paper, we explore the three main ..."
Abstract
-
Cited by 27 (2 self)
- Add to MetaCart
Indexes are the key technology underpinning efficient text search. A range of algorithms have been developed for fast query evaluation and for index creation, but update algorithms for high-performance indexes have not been evaluated or even fully described. In this paper, we explore the three main alternative strategies for index update: in-place update, index merging, and complete re-build. Our experiments with large volumes of web data show that re-merge is for large numbers of updates the fastest approach, but in-place update is suitable when the rate of update is low or buffer size is limited.
Parameterised Compression for Sparse Bitmaps
- Proc. ACM-SIGIR International Conference on Research and Development in Information Retrieval
, 1992
"... : Full-text retrieval systems typically use either a bitmap or an inverted file to identify which documents contain which words, so that the documents containing any combination of words can be quickly located. Bitmaps of word occurrences are large, but are usually sparse, and thus are amenable to a ..."
Abstract
-
Cited by 26 (8 self)
- Add to MetaCart
: Full-text retrieval systems typically use either a bitmap or an inverted file to identify which documents contain which words, so that the documents containing any combination of words can be quickly located. Bitmaps of word occurrences are large, but are usually sparse, and thus are amenable to a variety of compression techniques. Here we consider techniques in which the encoding of each bitvector within the bitmap is parameterised, so that a different code can be used for each bitvector. Our experimental results show that the new methods yield better compression than previous techniques. Categories and Subject Descriptors: E.4 [Coding and Information Theory]: Data compaction and compression; H.3.2 [Information Storage]: File organisation . Keywords: Full-text retrieval, data compression, document database, Huffman coding, geometric distribution, inverted file. 1 Introduction Full-text retrieval systems are used for storing and accessing document collections such as newspaper a...

