Results 1 - 10 of 25
Order-independent texture synthesis, 2002
Cited by 75 (7 self)
Search-based texture synthesis algorithms are sensitive to the order in which texture samples are generated; different synthesis orders yield different textures. Unfortunately, most polygon rasterizers and ray tracers do not guarantee the order in which surfaces are sampled. To circumvent this problem, textures are synthesized beforehand at some maximum resolution and rendered using texture mapping. We describe a search-based texture synthesis algorithm in which samples can be generated in arbitrary order, yet the resulting texture remains identical. The key to our algorithm is a pyramidal representation in which each texture sample depends only on a fixed number of neighboring samples at each level of the pyramid. The bottom (coarsest) level of the pyramid consists of a noise image, which is small and predetermined. When a sample is requested by the renderer, all samples on which it depends are generated at once. Using this approach, samples can be generated in any order. To make the algorithm efficient, we propose storing texture samples and their dependents in a pyramidal cache. Although the first few samples are expensive to generate, there is substantial reuse, so subsequent samples cost less. Fortunately, most rendering algorithms exhibit good coherence, so cache reuse is high.
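The idea of on-demand, order-independent generation with a pyramidal cache can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's algorithm: the neighborhood search is replaced by a simple deterministic blend of four coarser-level samples, and the names `PyramidCache`, `BASE_SIZE`, and `noise` are hypothetical. It only shows why fixed dependencies plus a predetermined coarsest level give the same result in any request order.

```python
import hashlib

BASE_SIZE = 4  # side length of the coarsest level (hypothetical value)


def noise(x, y, seed=0):
    """Deterministic pseudo-random value in [0, 1) derived from the coordinates."""
    digest = hashlib.sha256(f"{seed}:{x}:{y}".encode()).digest()
    return int.from_bytes(digest[:4], "big") / 2**32


class PyramidCache:
    """Generates samples lazily; level 0 is the coarsest (assumed indexing)."""

    def __init__(self, seed=0):
        self.seed = seed
        self.cache = {}  # (level, x, y) -> sample value

    def get(self, level, x, y):
        size = BASE_SIZE << level  # each finer level doubles the resolution
        key = (level, x % size, y % size)  # wrap around (toroidal texture)
        if key not in self.cache:
            self.cache[key] = self._generate(*key)
        return self.cache[key]

    def _generate(self, level, x, y):
        if level == 0:
            # Coarsest level: predetermined noise image.
            return noise(x, y, self.seed)
        # Fixed set of dependencies: a 2x2 block at the next coarser level.
        px, py = x // 2, y // 2
        parents = [self.get(level - 1, px + dx, py + dy)
                   for dx in (0, 1) for dy in (0, 1)]
        # Stand-in for the neighborhood search (assumption, not the paper's method).
        return (sum(parents) / 4 + noise(x, y, self.seed + level)) / 2


forward, backward = PyramidCache(), PyramidCache()
coords = [(3, x, y) for x in range(8) for y in range(8)]
values_forward = {c: forward.get(*c) for c in coords}
values_backward = {c: backward.get(*c) for c in reversed(coords)}
print(values_forward == values_backward)  # True: request order does not matter
```

Because every sample is a pure function of its coordinates and of a fixed set of coarser samples, the cache affects only cost, never the values returned.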
The Design of a Parallel Graphics Interface, 1998
Cited by 50 (6 self)
It has become increasingly difficult to drive a modern high-performance graphics accelerator at full speed with a serial immediate-mode graphics interface. To resolve this problem, retained-mode constructs have been integrated into graphics interfaces. While retained-mode constructs provide a good solution in many cases, at times they provide an undesirable interface model for the application programmer, and in some cases they do not solve the performance problem. In order to resolve some of these cases, we present a parallel graphics interface that may be used in conjunction with the existing API as a new paradigm for high-performance graphics applications.
Prefetching in a texture cache architecture
- SIGGRAPH / Eurographics Workshop on Graphics Hardware, 1998
Cited by 43 (3 self)
Texture mapping has become so ubiquitous in real-time graphics hardware that many systems are able to perform filtered texturing without any penalty in fill rate. The computation rates available in hardware have been outpacing the memory access rates, and texture systems are becoming constrained by memory bandwidth and latency. Caching in conjunction with prefetching can be used to alleviate this problem. In this paper, we introduce a prefetching texture cache architecture designed to take advantage of the access characteristics of texture mapping. The structures needed are relatively simple and are amenable to high clock rates. To quantify the robustness of our architecture, we identify a set of six scenes whose texture locality varies over nearly two orders of magnitude and a set of four memory systems with varying bandwidths and latencies. Through the use of a cycle-accurate simulation, we demonstrate that even in the presence of a high-latency memory system, our architecture can attain at least 97% of the performance of a zero-latency memory system.
Cache Performance for Multimedia Applications
- In Proceedings of the 15th IEEE International Conference on Supercomputing, 2001
Cited by 26 (1 self)
The caching behavior of multimedia applications has been described as having high instruction reference locality within small loops, very large working sets, and poor data cache performance due to non-locality of data references. Despite this, there is no published research deriving or measuring these qualities. Utilizing the previously developed Berkeley Multimedia Workload, we present the results of execution-driven cache simulations with the goal of aiding future media processing architecture design. Our analysis examines the differences between multimedia and traditional applications in cache behavior. We find that multimedia applications actually exhibit lower instruction miss ratios and comparable data miss ratios when contrasted with other widely studied workloads. In addition, we find that longer data cache line sizes than are currently used would benefit multimedia processing.
Parallel Texture Caching, 1999
Cited by 15 (3 self)
The creation of high-quality images requires new functionality and higher performance in real-time graphics architectures. In terms of functionality, texture mapping has become an integral component of graphics systems, and in terms of performance, parallel techniques are used at all stages of the graphics pipeline. In rasterization, texture caching has become prevalent for reducing texture bandwidth requirements. However, parallel rasterization architectures divide work across multiple functional units, thus potentially decreasing the locality of texture references. For such architectures to scale well, it is necessary to develop efficient parallel texture caching subsystems. We quantify the effects of parallel rasterization on texture locality for a number of rasterization architectures, representing both current commercial products and proposed future architectures. A cycle-accurate simulation of the rasterization system demonstrates the parallel speedup obtained by these systems a...
Dynamic 3D Graphics Workload Characterization and the Architectural Implications
- In Proceedings of the 32nd Annual ACM/IEEE International Symposium on Microarchitecture (MICRO), 1999
Cited by 14 (3 self)
Although PC-class 3D graphics hardware has made significant strides in the last several years, the underlying architectural design principles are still generally considered a black art. The quantitative approach prevalent in mainstream computer architecture design is rarely applied, at least as far as publicly available research literature is concerned. One main reason for this deficiency is the absence of a detailed workload characterization of 3D applications. This paper reports the results of a dynamic 3D workload characterization effort, which, to the best of our knowledge, is one of the first such studies that have ever been attempted. This study is different from previous similar studies because it focuses on dynamic behaviors of 3D applications, specifically, correlations of workload statistics among neighboring frames in interactive 3D applications. Such inter-frame coherence is the basis of many performance optimization techniques, but has never been actually quantified in a syste...
A Hardware/Software Co-Simulation Environment for Graphics Accelerator Development in ARM-Based SOCs
- In Proceedings of 13th Annual Workshop on Circuits, Systems and Signal Processing, ProRISC 2002, 2002
Cited by 5 (1 self)
This paper focuses on the challenging aspects of developing a versatile hardware/software co-design and co-simulation environment for the development of 3D graphics hardware accelerators in ARM-based system-on-chip designs. The tool we propose integrates the ARMulator, the cycle-accurate instruction-level simulator for the ARM low-power processor family, with an augmented open source SystemC modeling framework and simulation engine, which allows the development of cycle-accurate or more abstract models of software algorithms, hardware architectures, and system-level design. The tool permits the simulation of an entire computer graphics pipeline, allowing experimental software/hardware partitioning schemes and performance monitoring in terms of throughput and power consumption. Moreover, it provides graphical output for visualizing the potential impact that tweaking the algorithms or the operand bit-width precision may have on the resulting image quality.
Compression-Based 3D Texture Mapping for Real-Time Rendering
Cited by 4 (2 self)
While 2D texture mapping is one of the most powerful rendering techniques that make 3D objects appear visually interesting, it suffers from visual artifacts produced when 2D image patterns are wrapped onto the surface of objects with arbitrary shapes. On the other hand, 3D texture mapping generates highly natural visual effects in which objects appear carved from lumps of materials rather than laminated with thin sheets as in 2D texture mapping. Storing 3D texture images in a table for fast mapping computations, instead of evaluating procedures on the fly, however, has been considered impractical due to the extremely high memory requirements. In this paper, we present a new effective method for 3D texture mapping designed for real-time rendering of polygonal models. Our scheme attempts to resolve the potential texture memory problem arising from the very large size of 3D images by compressing them using a wavelet-based encoding method. The experimental results on various non-t...
The Best Distribution for a Parallel OpenGL 3D Engine with Texture Caches
- In Proceedings of the 6th International Symposium on High Performance Computer Architecture, 2000
Cited by 3 (0 self)
The quality of a real-time high-end virtual reality system depends on its ability to draw millions of textured triangles in 1/60 s. The idea of using commodity PC 3D accelerators to build a parallel machine instead of custom ASICs seems more and more attractive as such chips are getting faster. If image parallelism is used, designers have the choice between two distributions: line interleaving and square block interleaving. Having a fixed block shape and size makes chip design easier. A PC 3D accelerator has a cost-effective external bus and an on-chip texture cache. The performance of such a cache depends on spatial locality. If the image is rendered in multiple engines, this locality is reduced. Locality and load balancing depend on the distribution scheme of the machine. This paper investigates the impact of the distribution scheme on the performance of such a machine. We use detailed cache and memory system simulations on virtual reality benchmarks running on different configurati...
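The two distributions named in this abstract can be sketched as pixel-to-engine mappings. This is only an illustration under assumptions: the abstract does not give the exact mappings, so the functions `line_interleaved` and `block_interleaved`, the square `grid` of engines, and the helper names are hypothetical. The sketch shows that both schemes can balance pixel counts while splitting a small screen region across a different number of engines, which is one rough proxy for lost texture locality.

```python
from collections import Counter


def line_interleaved(x, y, n_engines):
    """Scanline y is assigned to engine y mod n_engines (assumed mapping)."""
    return y % n_engines


def block_interleaved(x, y, block, grid):
    """Square blocks of side `block` are dealt out to a grid x grid array of engines (assumed mapping)."""
    bx, by = x // block, y // block
    return (by % grid) * grid + (bx % grid)


def pixels_per_engine(width, height, assign):
    """Count how many pixels of a width x height image each engine receives."""
    return Counter(assign(x, y) for y in range(height) for x in range(width))


def engines_in_region(x0, y0, size, assign):
    """Set of engines that touch a size x size screen region starting at (x0, y0)."""
    return {assign(x, y)
            for y in range(y0, y0 + size) for x in range(x0, x0 + size)}


def by_line(x, y):
    return line_interleaved(x, y, 4)


def by_block(x, y):
    return block_interleaved(x, y, block=16, grid=2)


# Both schemes give each of the 4 engines 1024 pixels of a 64x64 image.
print(pixels_per_engine(64, 64, by_line))
print(pixels_per_engine(64, 64, by_block))

# An 8x8 region is split across all 4 engines by lines, but stays on 1 engine by blocks.
print(len(engines_in_region(0, 0, 8, by_line)))   # 4
print(len(engines_in_region(0, 0, 8, by_block)))  # 1
```

The block size and engine count here are arbitrary; the trade-off between locality and load balancing that the paper studies depends on those parameters and on the workload.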
A reconfigurable multilevel parallel texture cache memory with 75-GB/s parallel cache replacement bandwidth
- IEEE Journal of Solid-State Circuits, 2002
Cited by 2 (0 self)
Recently, the level of realism in PC graphics applications has been approaching that of high-end graphics workstations, necessitating a more sophisticated texture data cache memory to overcome the finite bandwidth of the AGP or PCI bus. This paper proposes a multilevel parallel texture cache memory to reduce the required data bandwidth on the AGP or PCI bus and to accelerate the operations of parallel graphics pipelines in PC graphics cards. The proposed cache memory is fabricated by 0.16-µm DRAM-based SOC technology. It is composed of four components: an 8-MB DRAM L2 cache, 8-way parallel SRAM L1 caches, pipelined texture data filters, and a serial-to-parallel loader. For high-speed parallel L1 cache data replacement, the internal bus bandwidth has been maximized up to 75 GB/s with a newly proposed hidden double data transfer scheme. In addition, the cache memory has a reconfigurable architecture in its line size for optimal caching performance in various graphics applications from three-dimensional (3-D) games to high-quality 3-D movies. This architecture also leads to optimal power consumption with an adaptive sub-wordline activation scheme. The pipelined texture data filters and the dedicated structure of the L1 caches implemented by the DRAM peripheral transistors show the potential of DRAM-based SOC design with better performance-to-cost ratio.
Index Terms: 3-D graphics, DRAM-based SOC, DRAM L2 cache, multilevel parallel cache, texture cache.