Cache Misses And Energy-Dissipation Results For JPEG-2000 Filtering
Cited by 2 (0 self)
After its establishment as a new standard for still-image coding, JPEG-2000 is now at the stage where efficient implementations in real-life systems are being explored. In this paper we focus on the implementation of the front-end part, i.e. the filtering processes performed by the discrete wavelet transform (DWT). The target platforms are programmable processors, since recent developments in their design position them as good alternatives for efficient, flexible designs with fast time-to-market. To facilitate the incorporation of the new standard in such designs, we present experimental and theoretical results for the data-related cache misses that occur during the DWT, since it has been shown that these constitute the main bottleneck in this type of application. In addition, we rank the presented designs with respect to power efficiency under a high-level power estimation scheme. To validate our models, apart from the simulation results, indicative experimental results are presented for real-life superscalar and VLIW architectures.
VLSI Design for High-Speed Image Computing Using Fast Convolution-Based Discrete Wavelet Transform
2009
This paper presents a VLSI design for very high-speed image computing using the discrete wavelet transform. The proposed architecture, based on a new and fast convolution approach, reduces the hardware complexity and shortens the critical path to the delay of a single multiplier. Furthermore, an advanced two-dimensional (2-D) discrete wavelet transform (DWT) implementation with an efficient memory area is designed to produce one output in every clock cycle. As a result, a very high speed is attained. The system is verified, using the JPEG2000 filter coefficients, on a Xilinx Virtex-II Field Programmable Gate Array (FPGA) device without accessing any external memory. The resulting computing rate is up to 275 M samples/s, and the (9,7) 2-D wavelet filter uses only 16 kb of memory for a 256×256 image. The developed design thus requires less memory and provides very high-speed processing as well as high PSNR quality.
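As a software illustration of what "convolution-based" DWT filtering with the (9,7) filter means, the following minimal sketch computes one 1-D decomposition level by direct convolution followed by downsampling by two. It is not the paper's hardware architecture. The filter taps are the commonly published CDF 9/7 analysis values (low-pass DC gain of 1), and the function names `reflect` and `dwt97_1d` as well as the whole-sample symmetric boundary extension are assumptions made for this example.

```python
# Analysis taps of the CDF 9/7 wavelet (commonly published values, not taken
# from the paper). The low-pass taps sum to 1, the high-pass taps sum to 0.
LOW = [0.026748757411, -0.016864118443, -0.078223266529, 0.266864118443,
       0.602949018236,
       0.266864118443, -0.078223266529, -0.016864118443, 0.026748757411]
HIGH = [0.091271763114, -0.057543526229, -0.591271763114,
        1.115087052457,
        -0.591271763114, -0.057543526229, 0.091271763114]


def reflect(i, n):
    """Map any index into [0, n) by whole-sample symmetric extension (n >= 2)."""
    period = 2 * (n - 1)
    i %= period
    return period - i if i >= n else i


def dwt97_1d(x):
    """One decomposition level: convolve with each filter, keep every 2nd output."""
    n = len(x)
    half_low, half_high = len(LOW) // 2, len(HIGH) // 2
    # Low-pass outputs are centred on even samples, high-pass on odd samples.
    low = [sum(c * x[reflect(2 * k + j - half_low, n)] for j, c in enumerate(LOW))
           for k in range((n + 1) // 2)]
    high = [sum(c * x[reflect(2 * k + 1 + j - half_high, n)] for j, c in enumerate(HIGH))
            for k in range(n // 2)]
    return low, high


low, high = dwt97_1d([5.0] * 8)
```

For a constant input the low-pass band reproduces the input value and the high-pass band is zero up to rounding, which is a quick sanity check on the taps. A 2-D transform applies the same filtering along rows and then along columns.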
A reconfigurable platform for high-end real-time digital film processing
Digital film processing is characterized by a resolution of at least 2K (2048x1536 pixels per frame at 30 bit/pixel and 24 pictures/s, data rate of 2.2 Gbit/s); higher resolutions of 4K (8.8 Gbit/s) and even 8K (35.2 Gbit/s) are on their way. Real-time processing at this data rate is beyond the scope of today’s standard and DSP processors, and ASICs are not economically viable due to the small market volume. Therefore, an FPGA-based approach was followed in the FlexFilm project. Different applications are supported on a single hardware platform by using different FPGA configurations. The multi-board, multi-FPGA hardware/software architecture is based on Xilinx Virtex-II Pro FPGAs, which contain the reconfigurable image stream processing data path, large SDRAM memories for multiple frame storage and a PCI Express communication backbone network. The FPGA-embedded CPU is used for control and less computation-intensive tasks. This paper will focus on three key aspects: a) the design methodology used, which combines macro component configuration and macro-level floorplanning with weak programmability using distributed microcoding, b) the global communication framework with communication scheduling and c) the configurable, multi-stream scheduling SDRAM controller with QoS support by access prioritization and traffic shaping. As an example, a complex noise reduction algorithm including a 2.5-dimensional DWT and a full 16x16 motion estimation at 24 fps, requiring a total of 203 Gops/s net computing performance and a total of 28 Gbit/s DDR-SDRAM frame memory bandwidth, will be shown.
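The quoted 2K data rate follows from multiplying frame size, bit depth and frame rate. The short sketch below reproduces that arithmetic; the function name `raw_data_rate_gbit` is hypothetical, and it assumes 1 Gbit = 10^9 bit. The 4K and 8K figures in the abstract equal 4 and 16 times the rounded 2K value, which matches doubling and quadrupling both frame dimensions.

```python
def raw_data_rate_gbit(width, height, bits_per_pixel, frames_per_second):
    """Uncompressed video data rate in Gbit/s, assuming 1 Gbit = 1e9 bit."""
    return width * height * bits_per_pixel * frames_per_second / 1e9


# 2K film format as stated in the abstract: 2048x1536, 30 bit/pixel, 24 frames/s.
rate_2k = raw_data_rate_gbit(2048, 1536, 30, 24)

# Doubling both dimensions quadruples the pixel count, hence the data rate.
rate_4k = 4 * rate_2k
rate_8k = 16 * rate_2k

print(f"2K: {rate_2k:.2f} Gbit/s, 4K: {rate_4k:.2f} Gbit/s, 8K: {rate_8k:.2f} Gbit/s")
```

The exact 2K product is about 2.26 Gbit/s, so the abstract's 2.2 Gbit/s appears to be a truncated value.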
VLSI Design of 2-D Discrete Wavelet Transform for Area-Efficient and High-Speed Image Computing
This paper presents a VLSI design approach for high-speed, real-time 2-D Discrete Wavelet Transform computing. The proposed architecture, based on a new and fast convolution approach, reduces the hardware complexity and shortens the critical path to the delay of a single multiplier. Furthermore, an advanced two-dimensional (2-D) discrete wavelet transform (DWT) implementation with an efficient memory area is designed to produce one output in every clock cycle. As a result, a very high speed is attained. The system is verified, using the JPEG2000 filter coefficients, on a Xilinx Virtex-II Field Programmable Gate Array (FPGA) device without accessing any external memory. The resulting computing rate is up to 270 M samples/s, and the (9,7) 2-D wavelet filter uses only 18 kb of memory (16 kb of first-in-first-out memory) for a 256×256 image. The developed design thus requires less memory and provides very high-speed processing as well as high PSNR quality.

