The JPEG 2000 still image compression standard - IEEE Signal Processing Magazine, 2001
Cited by 67 (9 self)
The development of standards (emerging and established) by the International Organization for Standardization (ISO), the International Telecommunication Union (ITU), and the International Electrotechnical Commission (IEC) for audio, image, and video, for both transmission and storage, has led to worldwide activity in developing hardware and software systems and products applicable to a number of diverse disciplines [7], [22], [23], [55], [56], [73]. Although the standards implicitly address the basic encoding operations, there is freedom and flexibility in the actual design and development of devices. This is because only the syntax and semantics of the bit stream for decoding are specified by standards, their main objective being the compatibility and interoperability among the systems (hardware/software) manufactured by different companies. There is, thus, much room for innovation and ingenuity. Since the mid-1980s, members from both the ITU and the ISO have been working together to establish a joint international standard for the compression of grayscale and color still images. This effort has been known as JPEG, the Joint
An overview of the JPEG2000 still image compression standard - Signal Processing: Image Communication, 2002
Cited by 58 (0 self)
In 1996, the JPEG committee began to investigate possibilities for a new still image compression standard to serve current and future applications. This initiative, which was named JPEG2000, has resulted in a comprehensive standard (ISO 15444 / ITU-T Recommendation T.800) that is being issued in six parts. Part 1, in the same vein as the JPEG baseline system, is aimed at minimal complexity and maximal interchange and was issued as an International Standard at the end of 2000. Parts 2–6 define extensions to both the compression technology and the file format and are currently in various stages of development. In this paper, a technical description of Part 1 of the JPEG2000 standard is provided, and the rationale behind the selected technologies is explained. Although the JPEG2000 standard only specifies the decoder and the codestream syntax, the discussion will span both encoder and decoder issues to provide a better
Evaluation of Design Alternatives for the 2D-Discrete Wavelet Transform - IEEE TRANS. CIRC. AND SYST. FOR VIDEO TECH, 2001
Cited by 6 (1 self)
In this paper, the three main hardware architectures for the two-dimensional discrete wavelet transform (2D-DWT) are reviewed. Optimization techniques applicable to all three architectures are also described. The main contribution of this work is the quantitative comparison among these design alternatives for the 2D-DWT. The comparison is performed in terms of memory requirements, throughput, and energy dissipation, and is based on a theoretical analysis of the alternative architectures and schedules. Memory requirements, throughput, and energy are expressed by analytical equations with parameters from both the 2D-DWT algorithm and the implementation platform. The parameterized equations enable an early but efficient exploration of the various trade-offs involved in selecting one architecture over another.
Ever considered SystemC - In Proceedings of the 15th ProRISC Workshop, 2004
Cited by 4 (3 self)
In recent years, many new C-based design languages have been developed. They all promise a smoother transition from a high-level to a low-level description of a hardware system. A disadvantage of these new languages is that many simulation models, e.g. of FPGA cores, are only available in standard languages like VHDL or Verilog. This makes it hard to develop a complete system with one of those new languages. However, in a modular design approach this should not cause a problem. Here a distinction is made between application-specific and hardware-platform-specific modules. It is possible to describe the application at a high level with e.g. SystemC and refine its modules with the same language, using simple models of the underlying hardware, until a level is reached where translation to VHDL/Verilog is trivial. The platform-specific modules can then be developed bottom-up using the available VHDL/Verilog models of the cores used. In this paper, the design of an IDWT is taken as a test case. The design trajectory starts from a C description and uses SystemC to arrive at a synthesizable level where an automatic translation to VHDL is done, which allows implementation on an FPGA. One can conclude that although tool support for these new C-based languages has only recently begun to emerge, their benefits can already be exploited when using the right language at the right place.
Analysis Of Wavelet Transform Implementations For Image And Texture Coding Applications In Programmable Platforms - IN PROC. OF THE 2001 IEEE SIGNAL PROCESSING SYSTEMS, 2001
Cited by 4 (2 self)
This paper compares various software implementations of the 2-D binary-tree wavelet decomposition by analyzing the data-related cache penalties in processor-based platforms. Such penalties appear to be the dominant factors that determine performance in this type of application. The comparisons include various image-scanning techniques, from the classical Row-Column approach to the Local Wavelet Transform and the Line-Based Wavelet Transform, which are proposed in the framework of multimedia-coding standards. For a conflict-free cache model, a theoretical framework is constructed allowing for predictions of the data-cache penalties that are expected to diminish the system performance. The theoretical results are verified with measurements from simulations and also from a real platform.
Improving the Memory Behavior of Vertical Filtering in the Discrete Wavelet Transform - In Proc. 3rd ACM Int. Conf. on Computing Frontiers, 2006
Cited by 4 (4 self)
The discrete wavelet transform (DWT) is used in several image and video compression standards, in particular JPEG2000. A 2D DWT consists of horizontal filtering along the rows followed by vertical filtering along the columns. It is well known that a straightforward implementation of vertical filtering (assuming a row-major layout) induces many cache misses, due to lack of spatial locality. This can be avoided by interchanging the loops. This paper shows, however, that the resulting implementation suffers significantly from 64K aliasing, which occurs in the Pentium 4 when two data blocks are accessed that are a multiple of 64K apart, and we propose two techniques to avoid it. In addition, if the filter length is longer than four, the number of ways of the L1 data cache of the Pentium 4 is insufficient to avoid cache conflict misses. Consequently, we propose two methods for reducing conflict misses. Although experimental results have been collected on the Pentium 4, the techniques are general and can be applied to other processors with different cache organizations as well. The proposed techniques improve the performance of vertical filtering compared to already optimized baseline implementations by a factor of 3.11 for the (5, 3) lifting scheme, 3.11 for Daubechies' transform of four coefficients, and by a factor of 1.99 for the Cohen, Daubechies, and Feauveau 9/7 transform.
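The loop interchange mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' code: the function names, the flat row-major list layout, and the choice to show only the predict step of the (5, 3) lifting scheme (with symmetric extension at the bottom edge) are assumptions made for illustration. Python is used only to show the access order; the cache effects themselves would be observed in a compiled language.

```python
def predict_53(above, centre, below):
    """Predict step of the (5, 3) lifting scheme for one odd-indexed sample."""
    return centre - ((above + below) >> 1)


def mirrored_row_below(y, height):
    """Row index below y, mirrored back at the bottom edge (symmetric extension)."""
    return y + 1 if y + 1 < height else y - 1


def vertical_column_order(img, width, height):
    """Straightforward order: finish one column before moving to the next.
    Consecutive accesses are `width` elements apart in a row-major layout."""
    out = [0] * (width * (height // 2))
    for x in range(width):
        for k in range(height // 2):
            y = 2 * k + 1
            yb = mirrored_row_below(y, height)
            out[k * width + x] = predict_53(
                img[(y - 1) * width + x], img[y * width + x], img[yb * width + x])
    return out


def vertical_row_order(img, width, height):
    """Interchanged loops: sweep along x in the inner loop, so consecutive
    accesses within each of the three source rows are adjacent in memory."""
    out = [0] * (width * (height // 2))
    for k in range(height // 2):
        y = 2 * k + 1
        yb = mirrored_row_below(y, height)
        for x in range(width):
            out[k * width + x] = predict_53(
                img[(y - 1) * width + x], img[y * width + x], img[yb * width + x])
    return out


# Hypothetical 4x4 test image stored row-major: pixel (x, y) has value 4*y + x.
image = list(range(16))
print(vertical_row_order(image, 4, 4))  # [0, 0, 0, 0, 4, 4, 4, 4]
```

Both functions compute the same high-pass coefficients; only the traversal order differs, which is the point of the interchange.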
Implementing the 2-D Wavelet Transform on SIMD-Enhanced General-Purpose Processors
Cited by 4 (4 self)
... up to 68% of the JPEG2000 encoding time. In this paper, we develop efficient implementations of this important kernel on general-purpose processors (GPPs), in particular the Pentium 4 (P4). Efficient implementations of the 2-D DWT on the P4 must address three issues. First, the P4 suffers from a problem known as 64K aliasing, which can degrade performance by an order of magnitude. We propose two techniques to avoid 64K aliasing which improve performance by a factor of up to 4.20. Second, a straightforward implementation of vertical filtering incurs many cache misses. Cache performance can be improved by applying loop interchange, but there will still be many conflict misses if the filter length exceeds the cache associativity. Two methods are proposed to reduce the number of conflict misses which provide an additional performance improvement of up to 1.24. To show that these methods are general, results for the P3 and Opteron are also provided. Third, efficient implementations of the 2-D DWT must exploit the SIMD instructions supported by most GPPs, including the P4, and we present MMX and SSE implementations of horizontal and vertical filtering which provide a maximum speedup of 3.39 and 6.72, respectively. Index Terms—Cache, Discrete Wavelet Transform, memory hierarchy, multimedia extensions, SIMD.
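To make the 64K aliasing condition concrete, the following sketch checks which rows touched by a vertical filter lie a multiple of 64K bytes apart for a given row stride, and searches for a padded stride that avoids this. The abstract does not say what the two proposed techniques are, so the row-padding idea here is an assumed, generic remedy rather than the authors' method; the helper names and the 64-byte padding step are likewise hypothetical.

```python
ALIAS_BYTES = 64 * 1024  # distance at which the abstract says aliasing occurs


def aliasing_row_pairs(row_stride_bytes, filter_length):
    """Return pairs (i, j) of rows inside one filter window whose start
    addresses are a multiple of 64K bytes apart."""
    pairs = []
    for i in range(filter_length):
        for j in range(i + 1, filter_length):
            if ((j - i) * row_stride_bytes) % ALIAS_BYTES == 0:
                pairs.append((i, j))
    return pairs


def padded_stride(row_stride_bytes, filter_length, pad_bytes=64):
    """Grow the row stride in pad_bytes steps until no two rows in the
    filter window are a multiple of 64K apart (assumed remedy)."""
    stride = row_stride_bytes
    while aliasing_row_pairs(stride, filter_length):
        stride += pad_bytes
    return stride


# Hypothetical example: rows of 8192 four-byte samples give a 32 KiB stride,
# so rows two apart in a 4-tap window are exactly 64K apart.
print(aliasing_row_pairs(32768, 4))  # [(0, 2), (1, 3)]
print(padded_stride(32768, 4))       # 32832
```

The trade-off in this sketch is a small amount of unused memory per row in exchange for breaking the power-of-two spacing between rows.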
High-Level Cache Modeling for 2-D Discrete Wavelet Transform Implementations - JOURNAL OF VLSI SIGNAL PROC, 2003
Cited by 3 (2 self)
The main implementations of the 2-D binary-tree discrete wavelet decomposition are theoretically analyzed and compared with respect to data-cache performance on instruction-set processor-based realizations. These implementations include various image-scanning techniques, from the classical row-column approach to the block-based and line-based methods, which are proposed in the framework of multimedia-coding standards. Analytical parameterized equations for the prediction of data-cache misses under general realistic assumptions are proposed. The accuracy and the consistency of the theory are verified through simulations on test platforms and a comparison is made with the results from a real platform.
Cache Misses And Energy-Dissipation Results For Jpeg-2000 Filtering
Cited by 2 (0 self)
After its establishment as a new standard for still-image coding, JPEG-2000 is now at the stage where efficient implementations in real-life systems are being explored. In this paper we focus on the implementation of the front-end part, i.e. the filtering processes performed by the discrete wavelet transform (DWT). The target platforms are programmable processors, since the modern evolution of their design positions them as good alternatives for efficient, flexible and fast time-to-market designs. To facilitate the incorporation of the new standard in such designs, we present experimental and theoretical results for the data-related cache misses that occur during the DWT, since it has been shown that these constitute the main bottleneck in this type of application. In addition, we rank the presented designs with respect to power efficiency, under a high-level power estimation scheme. To validate our models, apart from the simulation results, indicative experimental results are presented for real-life superscalar and VLIW architectures.
Combined Spatial And Subband Block Coding Of Images - In International Conference on Image Processing, ICIP’00, 2000
Cited by 1 (0 self)
This paper describes a low-memory, cache-efficient Hybrid Block Coder (HBC) for images in which an image subband decomposition is partitioned into a combination of spatial blocks and subband blocks, which are independently coded. Spatial blocks contain hierarchical trees spanning subband levels, and are each encoded using the SPIHT algorithm. Subband blocks contain a block of coefficients from within a single subband, and are each encoded by the SPECK algorithm. The decomposition may have the dyadic or a wavelet packet structure. Rate is allocated amongst the sub-bitstreams produced for each block and they are packetized. The partitioning structure supports resolution embedding. The final bitstream may be progressive in fidelity or in resolution.

