NanoFabrics: Spatial Computing Using Molecular Electronics
Cited by 110 (9 self)
The continuation of the remarkable exponential increases in processing power over the recent past faces imminent challenges due in part to the physics of deep-submicron CMOS devices and the costs of both chip masks and future fabrication plants. A promising solution to these problems is offered by an alternative to CMOS-based computing, chemically assembled electronic nanotechnology (CAEN). In this paper we outline how CAEN-based computing can become a reality. We briefly describe recent work in CAEN and how CAEN will affect computer architecture. We show how the inherently reconfigurable nature of CAEN devices can be exploited to provide high-density chips with defect tolerance at significantly reduced manufacturing costs. We develop a layered abstract architecture for CAEN-based computing devices and we present preliminary results which indicate that such devices will be competitive with CMOS circuits.
Cube-4 -- A Scalable Architecture for Real-Time Volume Rendering
1996
Cited by 86 (30 self)
We present Cube-4, a special-purpose volume rendering architecture that is capable of rendering high-resolution (e.g., 1024³) datasets at 30 frames per second. The underlying algorithm, called slice-parallel ray-casting, uses tri-linear interpolation of samples between data slices for parallel and perspective projections. The architecture uses a distributed interleaved memory, several parallel processing pipelines, and an innovative parallel dataflow scheme that requires no global communication, except at the pixel level. This leads to local, fixed bandwidth interconnections and has the benefits of high memory bandwidth, real-time data input, modularity, and scalability. We have simulated the architecture and have implemented a working prototype of the complete hardware on a configurable custom hardware machine. Our results indicate true real-time performance for high-resolution datasets and linear scalability of performance with the number of processing pipelines.
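The tri-linear interpolation step named in this abstract can be illustrated in software. The following is a minimal sketch of generic tri-linear interpolation only; it does not model the Cube-4 pipelines, memory interleaving, or the slice-parallel dataflow. The names `lerp`, `trilinear`, and the nested-list `volume[z][y][x]` layout are assumptions chosen for illustration, and coordinates are assumed to be non-negative and inside the volume.

```python
def lerp(a, b, t):
    """Linear interpolation between a and b for t in [0, 1]."""
    return a + (b - a) * t


def trilinear(volume, x, y, z):
    """Sample volume[z][y][x] at a fractional position (hypothetical layout)."""
    x0, y0, z0 = int(x), int(y), int(z)
    # Clamp the upper neighbours so they stay inside the volume.
    x1 = min(x0 + 1, len(volume[0][0]) - 1)
    y1 = min(y0 + 1, len(volume[0]) - 1)
    z1 = min(z0 + 1, len(volume) - 1)
    fx, fy, fz = x - x0, y - y0, z - z0

    # Interpolate along x on the four edges of the surrounding cell.
    c00 = lerp(volume[z0][y0][x0], volume[z0][y0][x1], fx)
    c10 = lerp(volume[z0][y1][x0], volume[z0][y1][x1], fx)
    c01 = lerp(volume[z1][y0][x0], volume[z1][y0][x1], fx)
    c11 = lerp(volume[z1][y1][x0], volume[z1][y1][x1], fx)

    # Then along y within each slice, and finally between the two slices.
    c0 = lerp(c00, c10, fy)
    c1 = lerp(c01, c11, fy)
    return lerp(c0, c1, fz)


# Tiny 2 x 2 x 2 test volume whose value at (x, y, z) is x + 2y + 4z.
volume = [[[0.0, 1.0], [2.0, 3.0]], [[4.0, 5.0], [6.0, 7.0]]]
print(trilinear(volume, 0.5, 0.5, 0.5))  # centre of the cell: 3.5
```

Because the test volume is a linear function of position, the interpolated value at the cell centre equals the function value there, which gives a quick sanity check.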
Reconfigurable Computing for Digital Signal Processing: A Survey
Journal of VLSI Signal Processing, 2000
Cited by 45 (2 self)
Steady advances in VLSI technology and design tools have extensively expanded the application domain of digital signal processing over the past decade. While application-specific integrated circuits (ASICs) and programmable digital signal processors (PDSPs) remain the implementation mechanisms of choice for many DSP applications, increasingly new system implementations based on reconfigurable computing are being considered. These flexible platforms, which offer the functional efficiency of hardware and the programmability of software, are quickly maturing as the logic capacity of programmable devices follows Moore's Law and advanced automated design techniques become available. As initial reconfigurable technologies have emerged, new academic and commercial efforts have been initiated to support power optimization, cost reduction, and enhanced run-time performance. This paper presents a survey of academic research and commercial development in reconfigurable computing for DSP systems o...
Configurable computing solutions for automatic target recognition
Proceedings of IEEE Workshop on FPGAs for Custom Computing Machines, 1996
Cited by 40 (0 self)
FPGAs can be used to build systems for automatic target recognition (ATR) that achieve an order of magnitude increase in performance over systems built using general purpose processors. This improvement is possible because the bit-level operations that comprise much of the ATR computational burden map extremely efficiently into FPGAs, and because the specificity of ATR target templates can be leveraged via fast reconfiguration. We describe here algorithms, design tools, and implementation strategies that are being used in a configurable computing
Reconfigurable Computing Systems
Proceedings of the IEEE, 2002
Cited by 38 (0 self)
Reconfigurable computing is emerging as the new paradigm for satisfying the simultaneous demand for application performance and flexibility. The ability to customize the architecture to match the computation and the dataflow of the application has demonstrated significant performance benefits compared to general purpose architectures. Computer vision applications are one class of applications that have significant heterogeneity in their computation and communication structures. At the low level, vision algorithms have regular, repetitive computations operating on large sets of image data with predictable data dependencies. At the higher level, the computations have irregular dependencies. Computer vision application characteristics have significant overlap with the advantages of reconfigurable architectures. The main focus of the paper is on outlining the methodologies required to realize the potential of reconfigurable architectures for vision applications. After giving a broad introduction to reconfigurable computing, the advantages of utilizing reconfigurable architectures for vision applications are outlined and illustrated using example computations. The paper discusses the development of fundamental configurable computing models that abstract the underlying hardware for high level application mapping. The Hybrid System Architecture Model and algorithms utilizing the model are illustrated to demonstrate a formal framework. The paper also outlines ongoing research and provides a comprehensive list of references for further reading.
The RAW Benchmark Suite: Computation Structures for General Purpose Computing
IEEE Symposium on Field-Programmable Custom Computing Machines, 1997
Cited by 37 (7 self)
The RAW benchmark suite consists of twelve programs designed to facilitate comparing, validating, and improving reconfigurable computing systems. These benchmarks run the gamut of algorithms found in general purpose computing, including sorting, matrix operations, and graph algorithms. The suite includes an architecture-independent compilation framework, Raw Computation Structures (RawCS), to express each algorithm's dependencies and to support automatic synthesis, partitioning, and mapping to a reconfigurable computer. Within this framework, each benchmark is portably designed in both C and Behavioral Verilog and scalably parameterized to consume a range of hardware resource capacities. To establish initial benchmark ratings, we have targeted a commercial logic emulation system based on virtual wires technology to automatically generate designs up to millions of gates (14 to 379 FPGAs). Because the virtual wires techniques abstract away machine-level details like FPGA capacity and interconnect, our hardware target for this system is an abstract reconfigurable logic fabric with memory-mapped host I/O. We report initial speeds in the range of 2X to 1800X faster than a 2.82 SPECint95 SparcStation 20 and encourage others in the field to run these benchmarks on other systems to provide a standard comparison.
MORPH: A System Architecture for Robust High Performance Using Customization (An NSF 100 TeraOps Point Design Study)
1996
Using Configurable Computing to Accelerate Boolean Satisfiability
1999
Cited by 12 (0 self)
The issues of software compute time and complexity are very important in current CAD tools. As FPGA speeds and densities increase, the opportunity for effective hardware accelerators built from FPGA technology has opened up. This paper describes and evaluates a formula-specific method for implementing Boolean satisfiability solver circuits in configurable hardware. That is, using a template generator, we create circuits specific to the problem instance to be solved. This approach yields impressive runtime speedups of up to several hundred times compared to the software approaches. The high performance comes from realizing fine-grained parallelism inherent in the clause evaluation and implication and from direct mapping of Boolean relations into logic gates. Our implementation uses a commercially-available hardware system for proof of concept. This system yields more than 100 times run-time speedup on many problems, even though the clock rate of the hardware is 100 times slower than tha...
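The clause evaluation and implication that this abstract mentions can be sketched in software. The example below is a sequential analogy only, assuming a DIMACS-style encoding (positive and negative integers as literals). It is not the paper's template generator or circuit, and the function name `evaluate_clauses` is hypothetical. The loop over clauses is the part that a formula-specific circuit could, in principle, evaluate in parallel.

```python
def evaluate_clauses(clauses, assignment):
    """Evaluate every clause under a partial assignment.

    clauses: list of clauses, each a list of nonzero ints (DIMACS-style literals).
    assignment: dict mapping variable number -> bool for assigned variables.
    Returns (conflict, implied), where implied maps variable -> forced value.
    """
    implied = {}
    for clause in clauses:  # independent per clause, hence parallelisable
        satisfied = False
        unassigned = []
        for lit in clause:
            var = abs(lit)
            if var not in assignment:
                unassigned.append(lit)
            elif assignment[var] == (lit > 0):
                satisfied = True
                break
        if satisfied:
            continue
        if not unassigned:
            return True, {}  # every literal is false: conflict
        if len(unassigned) == 1:  # unit clause: the last literal is forced
            lit = unassigned[0]
            value = lit > 0
            if implied.get(abs(lit), value) != value:
                return True, {}  # two clauses force opposite values
            implied[abs(lit)] = value
    return False, implied


# (x1 or x2) and (not x1 or x3) and (not x2 or not x3)
clauses = [[1, 2], [-1, 3], [-2, -3]]
print(evaluate_clauses(clauses, {1: True}))  # (False, {3: True})
```

With `x1` set to true, the second clause becomes a unit clause and forces `x3` to true; a full solver would repeat this step until no new implications appear or a conflict is found.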
Computer Vision Algorithms on Reconfigurable Logic Arrays
IEEE Trans. on Parallel and Distributed Systems, 1999
Cited by 11 (1 self)
Computer vision algorithms are natural candidates for high performance computing due to their inherent parallelism and intense computational demands. For example, a simple 3 x 3 convolution on a 512 x 512 gray scale image at 30 frames per second requires 67.5 million multiplications and 60 million additions to be performed in one second. Computer vision tasks can be classified into three categories based on their computational complexity and communication complexity: low-level, intermediate-level and high-level. Special-purpose hardware provides better performance than general-purpose hardware for all three levels of vision tasks. With recent advances in very large scale integration (VLSI) technology, an application specific integrated circuit (ASIC) can provide the best performance in terms of total execution time. However, long design cycle time, high development cost and inflexibility of dedicated hardware deter the design of ASICs. In contrast, field programmable gate arrays (FPGAs) support lower design verification time and easier design adaptability at a lower cost. Hence, FPGAs with an array of reconfigurable logic blocks can be very useful compute elements. FPGA-based custom computing machines are
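The operation counts quoted in this abstract can be checked with a few lines of arithmetic. The sketch below rests on assumptions that are not stated in the source: each output pixel needs 9 multiplications and 8 additions, border pixels are counted like interior ones, and "million" is read as 2^20, which is the reading under which the quoted figures come out exactly. The function name `conv_ops_per_second` is hypothetical.

```python
def conv_ops_per_second(width, height, kernel, fps):
    """Return (multiplications, additions) per second for a kernel x kernel convolution."""
    pixels_per_second = width * height * fps
    taps = kernel * kernel
    mults = pixels_per_second * taps        # one multiply per kernel tap
    adds = pixels_per_second * (taps - 1)   # summing `taps` products takes taps - 1 adds
    return mults, adds


MEBI = 2 ** 20  # assumption: "million" read as 2**20

mults, adds = conv_ops_per_second(512, 512, 3, 30)
print(mults / MEBI, adds / MEBI)  # 67.5 60.0
```

Read with a decimal million instead, the same counts are roughly 70.8 million multiplications and 62.9 million additions per second.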
Supporting FPGA Microprocessors through Retargetable Software Tools
Proceedings of IEEE Workshop on FPGAs for Custom Computing Machines, 1996
Cited by 10 (1 self)
FPGA systems outperform many ASIC and super computer systems through effective use of the reconfigurable resource. Reusing design effort across different applications requires a standard, flexible software environment. Driving FPGA systems from ANSI C is possible using lcc (an ANSI C compiler) targeted at an FPGA system and dasm (a retargetable, flexible assembler). The compiler supports custom hardware capabilities of FPGA systems, as well as all constructs of C. The assembler reads instruction definitions at assemble time, allowing the user to add new custom hardware functions which dasm can assemble correctly to an instruction stream the hardware executes. A source code debugger has been implemented for this system.
1 Introduction
FPGAs are capable of achieving high performance on many application-specific tasks. In many cases performance achievable with FPGAs on certain applications exceeds comparable ASIC designs or even super computers [2, 7]. One approach used in obtaining this...

