Results 1–10 of 101
Inverted files for text search engines
 ACM Computing Surveys
, 2006
"... The technology underlying text search engines has advanced dramatically in the past decade. The development of a family of new index representations has led to a wide range of innovations in index storage, index construction, and query evaluation. While some of these developments have been consolida ..."
Abstract

Cited by 316 (6 self)
 Add to MetaCart
The technology underlying text search engines has advanced dramatically in the past decade. The development of a family of new index representations has led to a wide range of innovations in index storage, index construction, and query evaluation. While some of these developments have been consolidated in textbooks, many specific techniques are not widely known or the textbook descriptions are out of date. In this tutorial, we introduce the key techniques in the area, describing both a core implementation and how the core can be enhanced through a range of extensions. We conclude with a comprehensive bibliography of text indexing literature.
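The inverted lists at the heart of this tutorial can be illustrated with a toy index. The sketch below uses made-up documents and in-memory Python structures; real engines compress their postings lists and keep them on disk.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each term to a sorted list of the IDs of documents containing it."""
    postings = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for term in text.lower().split():
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}

def conjunctive_query(index, terms):
    """Return IDs of documents containing ALL query terms (postings intersection)."""
    lists = [set(index.get(t, ())) for t in terms]
    return sorted(set.intersection(*lists)) if lists else []

docs = ["the quick brown fox", "the lazy dog", "quick brown dog"]
index = build_inverted_index(docs)
print(conjunctive_query(index, ["quick", "brown"]))  # -> [0, 2]
```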
The LOCO-I Lossless Image Compression Algorithm: Principles and Standardization into JPEG-LS
 IEEE TRANSACTIONS ON IMAGE PROCESSING
, 2000
"... LOCOI (LOw COmplexity LOssless COmpression for Images) is the algorithm at the core of the new ISO/ITU standard for lossless and nearlossless compression of continuoustone images, JPEGLS. It is conceived as a "low complexity projection" of the universal context modeling paradigm, match ..."
Abstract

Cited by 246 (11 self)
 Add to MetaCart
LOCO-I (LOw COmplexity LOssless COmpression for Images) is the algorithm at the core of the new ISO/ITU standard for lossless and near-lossless compression of continuous-tone images, JPEG-LS. It is conceived as a "low complexity projection" of the universal context modeling paradigm, matching its modeling unit to a simple coding unit. By combining simplicity with the compression potential of context models, the algorithm "enjoys the best of both worlds." It is based on a simple fixed context model, which approaches the capability of the more complex universal techniques for capturing high-order dependencies. The model is tuned for efficient performance in conjunction with an extended family of Golomb-type codes, which are adaptively chosen, and an embedded alphabet extension for coding of low-entropy image regions. LOCO-I attains compression ratios similar or superior to those obtained with state-of-the-art schemes based on arithmetic coding. Moreover, it is within a few percentage points of the best available compression ratios, at a much lower complexity level. We discuss the principles underlying the design of LOCO-I, and its standardization into JPEG-LS.
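The Golomb-type codes mentioned in the abstract can be illustrated with their power-of-two special case, the Rice code. This is a minimal sketch only: JPEG-LS additionally maps signed prediction errors to nonnegative integers and chooses the parameter k adaptively per context.

```python
def rice_encode(n, k):
    """Rice code of nonnegative n: unary quotient, '0' terminator, k-bit remainder."""
    q, r = n >> k, n & ((1 << k) - 1)
    rem = format(r, "0{}b".format(k)) if k > 0 else ""
    return "1" * q + "0" + rem

def rice_decode(bits, k):
    """Invert rice_encode for a single, well-formed codeword."""
    q = 0
    while bits[q] == "1":
        q += 1
    i = q + 1  # skip the terminating '0'
    r = int(bits[i:i + k], 2) if k > 0 else 0
    return (q << k) | r
```

For example, rice_encode(9, 2) splits 9 into quotient 2 and remainder 1, giving "110" + "01" = "11001"; small k suits small values, large k suits large ones, which is why adaptive parameter selection matters.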
Context-based adaptive binary arithmetic coding in the H.264/AVC video compression standard. IEEE Transactions on Circuits and Systems for Video Technology
"... (CABAC) as a normative part of the new ITUT/ISO/IEC standard H.264/AVC for video compression is presented. By combining an adaptive binary arithmetic coding technique with context modeling, a high degree of adaptation and redundancy reduction is achieved. The CABAC framework also includes a novel l ..."
Abstract

Cited by 207 (13 self)
 Add to MetaCart
(Show Context)
Context-based adaptive binary arithmetic coding (CABAC) as a normative part of the new ITU-T/ISO/IEC standard H.264/AVC for video compression is presented. By combining an adaptive binary arithmetic coding technique with context modeling, a high degree of adaptation and redundancy reduction is achieved. The CABAC framework also includes a novel low-complexity method for binary arithmetic coding and probability estimation that is well suited for efficient hardware and software implementations. CABAC significantly outperforms the baseline entropy coding method of H.264/AVC for the typical area of envisaged target applications. For a set of test sequences representing typical material used in broadcast applications and for a range of acceptable video quality of about 30 to 38 dB, average bit-rate savings of 9%–14% are achieved. Index Terms—Binary arithmetic coding, CABAC, context modeling, entropy coding, H.264, MPEG-4 AVC.
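The gain from combining context modeling with adaptive probability estimation can be illustrated with a toy estimator: per-context bit counts with a +1 prior, accumulating the ideal code length -log2(p). This shows only the principle; CABAC's actual estimator is a multiplication-free, table-driven state machine.

```python
import math
from collections import defaultdict

def adaptive_code_length(bits, order=2):
    """Ideal code length (in bits) when each bit is coded with an adaptive
    probability estimate conditioned on the previous `order` bits.
    Counts start at (1, 1), i.e. a +1 Laplace-style prior per context."""
    counts = defaultdict(lambda: [1.0, 1.0])  # per-context [zeros, ones]
    total = 0.0
    ctx = (0,) * order
    for b in bits:
        c0, c1 = counts[ctx]
        p = (c1 if b else c0) / (c0 + c1)
        total += -math.log2(p)           # ideal arithmetic-coding cost
        counts[ctx][b] += 1              # adapt the model
        ctx = ctx[1:] + (b,)
    return total
```

A strictly alternating sequence of 1000 bits costs only a few dozen bits once the two relevant contexts have adapted, whereas a fixed p = 0.5 model would cost exactly 1000 bits.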
Self-Indexing Inverted Files for Fast Text Retrieval
 ACM Transactions on Information Systems
, 1996
"... Query processing costs on large text databases are dominated by the need to retrieve and scan the inverted list of each query term. Here we show that query response time for conjunctive Boolean queries and for informal ranked queries can be dramatically reduced, at little cost in terms of storage, b ..."
Abstract

Cited by 171 (27 self)
 Add to MetaCart
(Show Context)
Query processing costs on large text databases are dominated by the need to retrieve and scan the inverted list of each query term. Here we show that query response time for conjunctive Boolean queries and for informal ranked queries can be dramatically reduced, at little cost in terms of storage, by the inclusion of an internal index in each inverted list. This method has been applied in a retrieval system for a collection of nearly two million short documents. Our experimental results show that the self-indexing strategy adds less than 20% to the size of the inverted file, but, for Boolean queries of 5–10 terms, can reduce processing time to under one fifth of the previous cost. Similarly, ranked queries of 40–50 terms can be evaluated in as little as 25% of the previous time, with little or no loss of retrieval effectiveness.
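The internal index is, in essence, a set of skip pointers that lets a scan jump over runs of postings that cannot match. The idea can be sketched on plain in-memory lists (illustrative only; the paper embeds the skips inside each compressed, on-disk inverted list):

```python
import math

def intersect_with_skips(a, b):
    """Intersect two sorted posting lists, skipping ahead by ~sqrt(len) entries."""
    skip_a = max(1, int(math.sqrt(len(a))))
    skip_b = max(1, int(math.sqrt(len(b))))
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i]); i += 1; j += 1
        elif a[i] < b[j]:
            # follow skip pointers while they do not overshoot the target
            while i + skip_a < len(a) and a[i + skip_a] <= b[j]:
                i += skip_a
            if a[i] < b[j]:
                i += 1
        else:
            while j + skip_b < len(b) and b[j + skip_b] <= a[i]:
                j += skip_b
            if b[j] < a[i]:
                j += 1
    return out
```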
Code and parse trees for lossless source encoding
 Communications in Information and Systems
, 2001
"... This paper surveys the theoretical literature on fixedtovariablelength lossless source code trees, called code trees, and on variablelengthtofixed lossless sounce code trees, called parse trees. Huffman coding [ l] is the most well known code tree problem, but there are a number of interestin ..."
Abstract

Cited by 63 (1 self)
 Add to MetaCart
(Show Context)
This paper surveys the theoretical literature on fixed-to-variable-length lossless source code trees, called code trees, and on variable-length-to-fixed lossless source code trees, called parse trees. Huffman coding [1] is the best-known code tree problem, but there are a number of interesting variants of the problem formulation which lead to other combinatorial optimization problems. Huffman coding as an
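The canonical code-tree problem, Huffman coding, fits in a few lines: repeatedly merge the two least frequent nodes, then read codewords off the tree. In this sketch internal nodes are represented as tuples, so symbols themselves must not be tuples.

```python
import heapq

def huffman_codes(freqs):
    """Build a Huffman code tree from a symbol->frequency map; return symbol->codeword."""
    # Heap entries carry a unique tiebreaker so nodes are never compared directly.
    heap = [(f, i, sym) for i, (sym, f) in enumerate(freqs.items())]
    heapq.heapify(heap)
    if len(heap) == 1:
        return {heap[0][2]: "0"}   # single symbol still needs one bit
    count = len(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, count, (left, right)))
        count += 1
    codes = {}
    def walk(node, prefix):
        if isinstance(node, tuple):      # internal node: recurse on children
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:                            # leaf: record the accumulated codeword
            codes[node] = prefix
    walk(heap[0][2], "")
    return codes
```

With frequencies {a: 5, b: 2, c: 1, d: 1} the weighted code length is 15 bits, matching the optimum for a prefix code on these counts.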
A low-complexity modeling approach for embedded coding of wavelet coefficients
 in Proc. 1998 IEEE Data Compression Conference
, 1998
"... We present a new lowcomplexity method for modeling and coding the bitplanes of a wavelettransformed image in a fully embedded fashion. The scheme uses a simple ordering model for embedding, based on the principle that coefficient bits that are likely to reduce the distortion the most should be d ..."
Abstract

Cited by 55 (2 self)
 Add to MetaCart
(Show Context)
We present a new low-complexity method for modeling and coding the bit-planes of a wavelet-transformed image in a fully embedded fashion. The scheme uses a simple ordering model for embedding, based on the principle that coefficient bits that are likely to reduce the distortion the most should be described first in the encoded bitstream. The ordering model is tied to a conditioning model in a way that deinterleaves the conditioned subsequences of coefficient bits, making them amenable to coding with a very simple, adaptive
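The embedding idea, most significant magnitude bits first so that any prefix of the stream decodes to a coarser approximation, can be sketched as follows. Signs and the paper's ordering and conditioning models are omitted; this shows only the plane-by-plane traversal.

```python
def bitplane_stream(coeffs):
    """Emit coefficient magnitude bits plane by plane, most significant plane first.
    Truncating the returned stream anywhere yields a coarser approximation."""
    mags = [abs(c) for c in coeffs]
    top = max(mags).bit_length() - 1 if max(mags) else 0
    stream = []
    for plane in range(top, -1, -1):     # MSB plane down to LSB plane
        for m in mags:
            stream.append((m >> plane) & 1)
    return stream
```

For coefficients [5, 2, 0] (binary 101, 010, 000) the planes come out as 100, then 010, then 100.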
Fast and efficient lossless image compression
 in Proc. 1993 Data Compression Conference (Snowbird)
, 1993
"... We present a new method for lossless image compression that gives compression comparable to JPEG lossless mode with about five times the speed. Our method, called FELICS, is based on a novel use of two neighboring pixels for both prediction and error modeling. For coding we use single bits, adjusted ..."
Abstract

Cited by 46 (0 self)
 Add to MetaCart
We present a new method for lossless image compression that gives compression comparable to JPEG lossless mode with about five times the speed. Our method, called FELICS, is based on a novel use of two neighboring pixels for both prediction and error modeling. For coding we use single bits, adjusted binary codes, and Golomb or Rice codes. For the latter we present and analyze a provably good method for estimating the single coding parameter.
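Choosing the single Golomb/Rice coding parameter can be sketched with the common mean-based heuristic: pick k so that 2^k is near the mean residual magnitude. This is a simplification; FELICS itself tracks cumulative code lengths per candidate parameter, which is the "provably good" estimator the abstract refers to.

```python
def estimate_rice_k(residuals):
    """Pick Rice parameter k with 2^k close to the mean residual magnitude
    (a simple stand-in for FELICS's cumulative-cost parameter estimator)."""
    mean = sum(abs(r) for r in residuals) / max(1, len(residuals))
    k = 0
    while (1 << k) < mean:
        k += 1
    return k

def rice_cost(residuals, k):
    """Total bits if each |residual| is Rice-coded with parameter k:
    unary quotient + terminator + k remainder bits per value."""
    return sum((abs(r) >> k) + 1 + k for r in residuals)
```

For residuals that are all 8, the heuristic picks k = 3, costing 5 bits per value versus 9 bits per value at k = 0.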
Lossless Compression of Color Mosaic Images,” presented at the Int
 McMaster University
, 2004
"... Abstract—Lossless compression of color mosaic images poses a unique and interesting problem of spectral decorrelation of spatially interleaved R, G, B samples. We investigate reversible lossless spectralspatial transforms that can remove statistical redundancies in both spectral and spatial doma ..."
Abstract

Cited by 31 (2 self)
 Add to MetaCart
(Show Context)
Lossless compression of color mosaic images poses a unique and interesting problem of spectral decorrelation of spatially interleaved R, G, B samples. We investigate reversible lossless spectral-spatial transforms that can remove statistical redundancies in both spectral and spatial domains and discover that a particular wavelet decomposition scheme, called Mallat wavelet packet transform, is ideally suited to the task of decorrelating color mosaic data. We also propose a low-complexity adaptive context-based Golomb–Rice coding technique to compress the coefficients of Mallat wavelet packet transform. The lossless compression performance of the proposed method on color mosaic images is apparently the best so far among the existing lossless image codecs. Index Terms—Context quantization, entropy coding, digital camera, image compression.
Parameterised Compression for Sparse Bitmaps
 Proc. ACM-SIGIR International Conference on Research and Development in Information Retrieval
, 1992
"... : Fulltext retrieval systems typically use either a bitmap or an inverted file to identify which documents contain which words, so that the documents containing any combination of words can be quickly located. Bitmaps of word occurrences are large, but are usually sparse, and thus are amenable to a ..."
Abstract

Cited by 31 (8 self)
 Add to MetaCart
Full-text retrieval systems typically use either a bitmap or an inverted file to identify which documents contain which words, so that the documents containing any combination of words can be quickly located. Bitmaps of word occurrences are large, but are usually sparse, and thus are amenable to a variety of compression techniques. Here we consider techniques in which the encoding of each bit-vector within the bitmap is parameterised, so that a different code can be used for each bit-vector. Our experimental results show that the new methods yield better compression than previous techniques.
Categories and Subject Descriptors: E.4 [Coding and Information Theory]: Data compaction and compression; H.3.2 [Information Storage]: File organisation.
Keywords: Full-text retrieval, data compression, document database, Huffman coding, geometric distribution, inverted file.
1 Introduction
Full-text retrieval systems are used for storing and accessing document collections such as newspaper a...
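Parameterised coding of a sparse bit-vector can be sketched by Golomb-coding the gaps between set bits, choosing the parameter b per vector from its mean gap (the ln 2 ≈ 0.69 rule for geometrically distributed gaps). The remainder length used here is an upper-bound approximation of the true truncated-binary remainder, so the lengths are slightly pessimistic.

```python
def gaps(bitvector):
    """Gaps between consecutive set bits, first gap measured from position -1."""
    out, prev = [], -1
    for i, bit in enumerate(bitvector):
        if bit:
            out.append(i - prev)
            prev = i
    return out

def choose_b(gap_list):
    """Per-vector Golomb parameter: b ~ 0.69 * mean gap for geometric gaps."""
    mean = sum(gap_list) / len(gap_list)
    return max(1, round(0.69 * mean))

def golomb_length(n, b):
    """Approximate bit length of the Golomb code of gap n >= 1 with parameter b:
    unary quotient + terminator + ceil(log2 b)-bit remainder (upper bound)."""
    q = (n - 1) // b
    return q + 1 + (b - 1).bit_length()
```

A vector with one set bit every 10 positions yields b = 7, and each 10-long gap then costs about 5 bits instead of the 10 bits of unary (b = 1) coding.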