Results 1 - 10 of 100
Combining MBP-speculative computation and loop pipelining in high-level synthesis
In Proc. European Design & Test Conf., 1995
"... Frequent control dependencies caused by IF- and loop-statements limit the parallelism usable in High-Level Synthesis. Loop pipelining is a powerful way to increase parallelism, but is often limited by these control dependencies. Multiple branch prediction (MBP-SC) applies loop pipelining and specula ..."
Cited by 14 (0 self)
Rotation scheduling: A loop pipelining algorithm
Dept. of Computer Science, Princeton University, 1997
"... Abstract — We consider the resource-constrained scheduling of loops with interiteration dependencies. A loop is modeled as a data flow graph (DFG), where edges are labeled with the number of iterations between dependencies. We design a novel and flexible technique, called rotation scheduling, for sc ..."
Cited by 114 (53 self)
... very good performance. Index Terms: high-level synthesis, loop pipelining, parallel compiler, retiming, scheduling.
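The loop model in the rotation-scheduling abstract (a DFG whose edges carry the number of iterations between dependent operations) can be illustrated with a small sketch. This is not the paper's algorithm. It assumes the usual retiming view of a "rotation": moving an operation into the next iteration takes one delay from each of its incoming edges and adds one to each outgoing edge. The names `Edge`, `can_rotate`, `rotate`, and the example graph are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Edge:
    src: str
    dst: str
    delay: int  # iterations between producer (src) and consumer (dst)

def can_rotate(edges, node):
    """`node` may be moved into the next iteration only if every incoming
    edge already carries at least one delay, i.e. nothing in the same
    iteration feeds it."""
    return all(e.delay >= 1 for e in edges if e.dst == node)

def rotate(edges, node):
    """Return a new edge list with `node` retimed by one iteration: one
    delay is taken from each incoming edge and added to each outgoing edge.
    A self-loop gets -1 and +1, so its delay is unchanged."""
    if not can_rotate(edges, node):
        raise ValueError(f"{node} has a zero-delay predecessor")
    result = []
    for e in edges:
        d = e.delay
        if e.dst == node:
            d -= 1
        if e.src == node:
            d += 1
        result.append(Edge(e.src, e.dst, d))
    return result

# Hypothetical three-node recurrence: A -> B -> C -> A, one delay on C -> A.
g = [Edge("A", "B", 0), Edge("B", "C", 0), Edge("C", "A", 1)]
print(can_rotate(g, "B"))  # False: B is fed by A in the same iteration
g2 = rotate(g, "A")
print([(e.src, e.dst, e.delay) for e in g2])
# [('A', 'B', 1), ('B', 'C', 0), ('C', 'A', 0)]
```

In this retiming view, the total delay around every cycle stays the same (here 1). Only the iteration each operation belongs to changes, which is what frees a rotated operation to be rescheduled later in the loop body.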
Pipeline vectorization
IEEE Trans. Comput.-Aided Des.
"... Abstract—This paper presents pipeline vectorization, a method for synthesizing hardware pipelines based on software vectorizing compilers. The method improves efficiency and ease of development of hardware designs, particularly for users with little electronics design experience. We propose several ..."
Cited by 52 (11 self)
... been found to improve vectorization performance 30–40 times above a PC-based software implementation, depending on whether runtime reconfiguration (RTR) is used. Index Terms: high-level synthesis, parallelization, pipelining, reconfigurable computing, vectorization.
A Transformational Approach To Asynchronous High-Level Synthesis
In Proceedings of VLSI 93, 1993
"... Asynchronous high-level synthesis is aimed at transforming high level descriptions of algorithms into efficient asynchronous circuit implementations. This approach is attractive from the point of view of the flexibility it affords in performing high level program transformations on users' initi ..."
Cited by 2 (0 self)
Loop Pipelining with Resource and Timing Constraints
1995
"... 'as Lang, David Padua and Mateo Valero for giving me part of their valuable time, listening to my ideas and giving me their suggestions. I am equally grateful to Q. Ning, R. Govindarajan, Eric R. Altman and Guang G. Gao for supplying me the data dependence graphs used for comparisons in supersc ..."
Cited by 5 (0 self)
... to such a wonderful family. This work is dedicated to them. To my family, the best in the world. Contents: List of Figures; List of Tables; List of Algorithms; Preface; 1 Introduction; 1.1 Motivation of this work; 1.2 High-level synthesis and parallel architectures; 1.2.1 High-l ...
Efficient Pipelining of Nested Loops: Unroll-and-Squash
In 16th Intl. Parallel and Distributed Processing Symposium (IPDPS ’02), Fort Lauderdale, FL, 2001
"... The size and complexity of current custom VLSI have forced the use of high-level programming languages to describe hardware, and compiler and synthesis technology to map abstract designs into silicon. Since streaming data processing in DSP applications is typically described by loop constructs in ..."
Cited by 9 (0 self)
Automatically Optimizing the Latency, Area, and Accuracy of C Programs for High-Level Synthesis
"... ABSTRACT Loops are pervasive in numerical programs, so high-level synthesis (HLS) tools use state-of-the-art scheduling techniques to pipeline them efficiently. Still, the run time performance of the resultant FPGA implementation is limited by data dependences between loop iterations. Some of these ..."
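The abstract's point that pipelining is limited by data dependences between loop iterations can be made concrete with the standard lower bound on the initiation interval (II) used in modulo scheduling. For each dependence cycle, II must be at least ceil(total latency / total iteration distance). Resources impose a second bound. The sketch below is a generic illustration, not the paper's method. The function names, the 4-cycle adder latency, and the resource counts are hypothetical.

```python
import math

def rec_mii(cycles):
    """Recurrence bound: for each dependence cycle, given as
    (total_latency, total_distance) with distance >= 1 iterations,
    II must be at least ceil(latency / distance)."""
    return max(math.ceil(lat / dist) for lat, dist in cycles)

def res_mii(uses, units):
    """Resource bound: resource class r is used uses[r] times per
    iteration and has units[r] copies available."""
    return max(math.ceil(uses[r] / units[r]) for r in uses)

def mii(uses, units, cycles):
    """Lower bound on the initiation interval of a pipelined loop."""
    return max(res_mii(uses, units), rec_mii(cycles))

# Hypothetical example: s += a[i] with a 4-cycle floating-point adder.
uses, units = {"fadd": 1, "load": 1}, {"fadd": 1, "load": 1}
print(mii(uses, units, [(4, 1)]))  # 4: each add waits for the previous one
# Four interleaved partial sums: each accumulator's cycle spans 4 iterations.
print(mii(uses, units, [(4, 4)]))  # 1
```

Splitting the sum reassociates floating-point additions and can change the rounded result. Lowering II this way may therefore cost accuracy. The snippet does not say which transformations the paper itself applies.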
Automatic Synthesis of Pipelined Processors
"... The conventional classification of inter-instruction dependencies (data, anti and output dependencies) provides a basic scheme for the analysis of pipeline hazards in pipelined instruction set processors. However, it does not consider the relative spatial positions of micro-operations in the pipelin ..."
... -end of the supporting compilers. Keywords: inter-instruction dependency, pipeline hazard resolution, high-level synthesis, compiler back-end generation, hardware/software tradeoffs
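The conventional classification this abstract starts from (data, anti and output dependencies) can be written down directly from what each instruction reads and writes. The sketch below assumes each instruction is summarized by register read and write sets. The names `Instr` and `classify` and the register names are hypothetical. It deliberately does not model the pipeline-position information that the paper argues this scheme is missing.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Instr:
    name: str
    reads: frozenset   # registers read
    writes: frozenset  # registers written

def classify(first, second):
    """Conventional dependence kinds of `second` on an earlier `first`."""
    kinds = []
    if first.writes & second.reads:
        kinds.append("data")    # read after write (true dependence)
    if first.reads & second.writes:
        kinds.append("anti")    # write after read
    if first.writes & second.writes:
        kinds.append("output")  # write after write
    return sorted(kinds)

# Hypothetical three-instruction sequence.
i1 = Instr("i1", frozenset({"r2", "r3"}), frozenset({"r1"}))  # r1 = r2 + r3
i2 = Instr("i2", frozenset({"r1", "r4"}), frozenset({"r2"}))  # r2 = r1 * r4
i3 = Instr("i3", frozenset({"r5", "r6"}), frozenset({"r1"}))  # r1 = r5 - r6
print(classify(i1, i2))  # ['anti', 'data']
print(classify(i1, i3))  # ['output']
print(classify(i2, i3))  # ['anti']
```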
Optimizing Remote Accesses for Offloaded Kernels: Application to High-Level Synthesis for FPGA
In Design, Automation, and Test in Europe (DATE’13), 2013
"... Some data- and compute-intensive applications can be accelerated by offloading portions of codes to platforms such as GPGPUs or FPGAs. However, to get high performance for these kernels, it is mandatory to restructure the application, to generate adequate communication mechanisms for the transfer ..."
Cited by 4 (0 self)
... of remote data, and to make good usage of the memory bandwidth. In the context of the high-level synthesis (HLS), from a C program, of hardware accelerators on FPGA, we show how to automatically generate optimized remote accesses for an accelerator communicating to an external DDR memory. Loop tiling ...
GPU-TLS: an efficient runtime for speculative loop parallelization on GPUs
"... Abstract—Recently GPUs have risen as one important parallel platform for general purpose applications, both in HPC and cloud environments. Due to the special execution model, developing programs for GPUs is difficult even with the recent introduction of high-level languages like CUDA and OpenCL. To ..."
Cited by 4 (1 self)