Rethinking custom ISE identification: A new processor-agnostic method
 In: Proceedings of the International Conference on Compilers, Architectures, and Synthesis for Embedded Systems
, 2007
Cited by 17 (6 self)
The last decade has witnessed the emergence of the Application Specific Instruction-set Processor (ASIP) as a viable platform for embedded systems. Extensible ASIPs allow the user to augment a base processor with Instruction Set Extensions (ISEs) that execute on Application Specific Functional Units (AFUs), dedicated hardware that executes the ISEs. Due to the limited number of read and write ports in the register file of the base processor, the size and complexity of AFUs are generally limited. Recent work has focused on overcoming these constraints by serialising access to the register file. Apart from these complications, the primary challenge in the identification and selection of the best AFU is the modelling of AFU performance in the context of different base processors: once the base processor changes, the ISE identification and AFU selection process must be redone from scratch. Exhaustive ISE/AFU enumeration methods are not scalable and generally fail for larger applications. To address this concern, a new approach to ISE/AFU identification is proposed. In particular, we show that the speedup model of ISEs/AFUs is independent of the specific details of the base processor, under fairly reasonable assumptions. The approach presented here significantly prunes the list of best ISE/AFU candidates compared to previous approaches. Experimentally, we observe the new approach produces optimal results on larger applications where prior approaches either fail or produce inferior results.
Polynomial-time subgraph enumeration for automated instruction set extension
 In Proceedings of the Design, Automation and Test in Europe Conference and Exhibition
, 2007
Cited by 10 (1 self)
This paper proposes a novel algorithm that, given a dataflow graph and an input/output constraint, enumerates all convex subgraphs under the given constraint in polynomial time with respect to the size of the graph. These subgraphs have been shown to represent efficient Instruction Set Extensions for customizable processors. The search space for this problem is inherently polynomial but, to our knowledge, this is the first paper to prove this and to present a practical algorithm for this problem with polynomial complexity. Our algorithm is based on properties of convex subgraphs that link them to the concept of multiple-vertex dominators. We discuss several pruning techniques that, without sacrificing the optimality of the algorithm, make it practical for dataflow graphs of a thousand nodes or more.
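To make the problem statement concrete, here is a minimal brute-force sketch of what "enumerating convex subgraphs under an input/output constraint" means. It is not the polynomial-time algorithm the abstract describes: it tests every subset, so it is exponential and only usable on toy graphs. The function names, the adjacency-dictionary representation, and the exact I/O counting rule are illustrative assumptions. Inputs are counted as distinct outside predecessors, and outputs as subgraph nodes with at least one outside successor.

```python
from itertools import combinations

def reverse_edges(adj):
    """Build a predecessor map from a successor map (hypothetical helper)."""
    radj = {u: [] for u in adj}
    for u, succs in adj.items():
        for v in succs:
            radj.setdefault(v, []).append(u)
    return radj

def reachable(edges, sources):
    """Vertices reachable from `sources` via one or more edges in `edges`."""
    seen, stack = set(), list(sources)
    while stack:
        u = stack.pop()
        for v in edges.get(u, ()):
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen

def is_convex(adj, radj, S):
    """True if no vertex outside S lies on a directed path between vertices of S."""
    S = set(S)
    between = reachable(adj, S) & reachable(radj, S)
    return bool(S) and not (between - S)

def io_counts(adj, radj, S):
    """Assumed counting rule: inputs = outside predecessors, outputs = nodes feeding outside."""
    S = set(S)
    inputs = {p for v in S for p in radj.get(v, ()) if p not in S}
    outputs = {v for v in S if any(w not in S for w in adj.get(v, ()))}
    return len(inputs), len(outputs)

def enumerate_io_convex(adj, max_in, max_out):
    """Brute force: keep every convex subset that meets the I/O constraint."""
    radj = reverse_edges(adj)
    nodes = sorted(radj)
    found = []
    for k in range(1, len(nodes) + 1):
        for subset in combinations(nodes, k):
            if not is_convex(adj, radj, subset):
                continue
            n_in, n_out = io_counts(adj, radj, subset)
            if n_in <= max_in and n_out <= max_out:
                found.append(frozenset(subset))
    return found

# Toy dataflow graph: a and b feed c, and c feeds d.
dfg = {"a": ["c"], "b": ["c"], "c": ["d"], "d": []}
candidates = enumerate_io_convex(dfg, max_in=2, max_out=1)
```

In this toy graph, {a, d} is rejected because c lies on the path between them, and {a, b} is rejected under max_out=1 because both nodes feed c. A real dataflow graph would also treat live-in and live-out values as inputs and outputs, which this sketch ignores.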
Introduction of architecturally visible storage in instruction set extensions
 IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
, 2007
Cited by 7 (4 self)
Abstract—Instruction set extensions (ISEs) can be used effectively
Automatic identification of application-specific functional units with architecturally visible storage
 In Proceedings of the Design, Automation and Test in Europe Conference and Exhibition
, 2006
Cited by 6 (2 self)
Instruction Set Extensions (ISEs) can be used effectively to accelerate the performance of embedded processors. The critical and difficult task of ISE selection is often performed manually by designers. A few automatic methods for ISE generation have shown good capabilities, but are still limited in the handling of memory accesses, and so they fail to directly address the memory wall problem. We present here the first ISE identification technique that can automatically identify state-holding Application-specific Functional Units (AFUs) comprehensively, thus being able to eliminate a large portion of memory traffic from cache and main memory. Our cycle-accurate results obtained by the SimpleScalar simulator show that the identified AFUs with architecturally visible storage gain significantly more than previous techniques, and achieve an average speedup of 2.8× over pure software execution. Moreover, the number of required memory-access instructions is reduced by two thirds on average, suggesting corresponding benefits on energy consumption.
Speculative DMA for Architecturally Visible Storage in Instruction Set Extensions
 In: Proceedings of the International Conference on Hardware/Software Codesign and System Synthesis
, 2008
Cited by 6 (6 self)
Instruction set extensions (ISEs) can accelerate embedded processor performance. Many algorithms for ISE generation have shown good potential; some of them have recently been expanded to include Architecturally Visible Storage (AVS): compiler-controlled memories, similar to scratchpads, that are accessible only to ISEs. To achieve a speedup using AVS, Direct Memory Access (DMA) transfers are required to move data from the main memory to the AVS. Unfortunately, this creates coherence problems between the AVS and the cache, which previous methods for ISEs with AVS failed to address. Additionally, these methods need to leave many conservative DMA transfers in place, whose execution significantly limits the achievable speedup. This paper presents a memory coherence scheme for ISEs with AVS, which can ensure execution correctness and memory consistency with minimal area overhead. We also present a method that speculatively removes redundant DMA transfers. Cycle-accurate experimental results were obtained using an FPGA-emulation platform. These results show that the application-specific instruction-set extended processors with speculative DMA-enhanced AVS gain significantly over previous techniques, despite the overhead of the coherence mechanism.
Algorithms for Generating Convex Sets in Acyclic Digraphs
, 2008
Cited by 4 (1 self)
A set X of vertices of an acyclic digraph D is convex if X ≠ ∅ and there is no directed path between vertices of X which contains a vertex not in X. A set X is connected if X ≠ ∅ and the underlying undirected graph of the subgraph of D induced by X is connected. Connected convex sets and convex sets of acyclic digraphs are of interest in the area of modern embedded processor technology. We construct an algorithm A for enumeration of all connected convex sets of an acyclic digraph D of order n. The time complexity of A is O(n · cc(D)), where cc(D) is the number of connected convex sets in D. We also give an optimal algorithm for enumeration of all (not just connected) convex sets
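The two definitions in this abstract translate directly into predicates. The sketch below checks them and counts cc(D) by exhaustive search on a small example. It is not the paper's algorithm A, whose O(n · cc(D)) bound this brute-force count does not achieve. All names and the adjacency-dictionary representation are illustrative assumptions.

```python
from itertools import combinations

def predecessors(adj):
    """Predecessor map of a digraph given as a successor map (hypothetical helper)."""
    radj = {u: [] for u in adj}
    for u, succs in adj.items():
        for v in succs:
            radj.setdefault(v, []).append(u)
    return radj

def closure(edges, X):
    """Vertices reachable from X by at least one edge of `edges`."""
    seen, stack = set(), list(X)
    while stack:
        u = stack.pop()
        for v in edges.get(u, ()):
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen

def is_convex(adj, radj, X):
    """X is non-empty and no vertex outside X is both a descendant and an ancestor of X."""
    X = set(X)
    return bool(X) and not ((closure(adj, X) & closure(radj, X)) - X)

def is_connected(adj, radj, X):
    """X is non-empty and the underlying undirected induced subgraph is connected."""
    X = set(X)
    if not X:
        return False
    start = next(iter(X))
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for v in list(adj.get(u, ())) + list(radj.get(u, ())):
            if v in X and v not in seen:
                seen.add(v)
                stack.append(v)
    return seen == X

def count_connected_convex(adj):
    """cc(D) by testing every non-empty subset (exponential, toy graphs only)."""
    radj = predecessors(adj)
    nodes = sorted(radj)
    return sum(
        1
        for k in range(1, len(nodes) + 1)
        for X in combinations(nodes, k)
        if is_convex(adj, radj, X) and is_connected(adj, radj, X)
    )

# Diamond-shaped acyclic digraph: a -> b -> d and a -> c -> d.
D = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
R = predecessors(D)
cc = count_connected_convex(D)
```

On the diamond, {b, c} is convex but not connected, while {a, b, d} is connected but not convex because c lies on the path a → c → d.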
Resource sharing in custom instruction set extensions
 In: Proceedings of the 6th IEEE Symposium on Application Specific Processors
, 2008
Cited by 4 (1 self)
Customised processor performance generally increases as additional custom instructions are added. However, performance is not the only metric that modern systems must take into account; die area and energy efficiency are equally important. Resource sharing during synthesis of instruction set extensions (ISEs) can significantly reduce the die area and energy consumption of a customized processor. This may increase the number of custom instructions that can be synthesized with a given area budget. Resource sharing involves combining the graph representations of two or more ISEs which contain a similar subgraph. This coupling of multiple subgraphs, if performed naively, can increase the latency of the extension instructions considerably. And yet, as we show in this paper, an appropriate level of resource sharing provides a significantly simpler design with only modest increases in average latency for extension instructions. Based on existing resource-sharing techniques, this study presents a new heuristic that controls the degree of resource sharing between a given set of custom instructions. Our main contributions are the introduction of a parametric method for exploring the trade-offs that can be achieved between instruction latency and implementation complexity, and the coupling of design-space exploration with fast area-delay models for the operators comprising each ISE. We present experimental evidence that our heuristic exposes a broad range of design points, allowing advantageous trade-offs between die area and latency to be found and exploited.
A High-Level Synthesis Flow for Custom Instruction Set Extensions for Application-Specific Processors
, 2010
Cited by 3 (0 self)
Custom instruction set extensions (ISEs) are added to an extensible base processor to provide application-specific functionality at a low cost. As only one ISE executes at a time, resources can be shared. This paper presents a new high-level synthesis flow targeting ISEs. We emphasize a new technique for resource allocation, binding, and port assignment during synthesis. Our method is derived from prior work on datapath merging, and increases area reduction by accounting for the cost of multiplexors that must be inserted into the resulting datapath to achieve multi-operational functionality.
Exhaustive Enumeration of Legal Custom Instructions for Extensible Processors
Cited by 2 (1 self)
Today’s customizable processors allow the designer to augment the base processor with custom accelerators. By choosing an appropriate set of accelerators, the designer can significantly enhance the performance and power of an application. Due to the large number of accelerator choices and their complex trade-offs among reuse, gain and area, manually deciding the optimal combination of accelerators is quite cumbersome and time consuming. This calls for CAD tools that select the optimal combination of accelerators by thoroughly searching the entire design space. The term pattern is commonly used to represent the computation performed by a custom accelerator. In this paper, we propose an algorithm for rapidly enumerating all the legal patterns, taking into account several constraints posed by a typical microarchitecture. The proposed algorithm achieves a significant reduction in runtime by a) enumerating the patterns in increasing order of size and b) relating the characteristics of a (k+1)-node pattern to the characteristics of its k-node subgraphs. Also, in scenarios where I/O is not a bottleneck, the designer can optionally relax the I/O constraint, and our algorithm efficiently enumerates all legal I/O-unbound patterns. The experimental evidence indicates an order of two runtime speedup over state-of-the-art techniques.
An algorithm for finding input-output constrained convex sets in an acyclic digraph
Cited by 2 (0 self)
A set X of vertices of an acyclic digraph is convex if any vertex on a directed path between elements of X is itself in X. We construct an algorithm for generating all input-output constrained convex (IOCC) sets in an acyclic digraph, which uses several novel ideas. We show that our algorithm is more efficient than algorithms described in the literature in both the worst case and computational experiments. IOCC sets of acyclic digraphs are of interest in the area of modern embedded processor technology.