## Application-specific instruction generation for configurable processor architectures (2004)

Venue: Proc. ACM International Symposium on Field-Programmable Gate Arrays

Citations: 57 (7 self)

### BibTeX

@INPROCEEDINGS{Cong04application-specificinstruction,

author = {Jason Cong and Yiping Fan and Guoling Han and Zhiru Zhang},

title = {Application-specific instruction generation for configurable processor architectures},

booktitle = {Proc. ACM International Symposium on Field-Programmable Gate Arrays},

year = {2004},

pages = {183--189}

}

### Abstract

Designing an application-specific embedded system in nanometer technologies has become more difficult than ever due to the rapid increase in design complexity and manufacturing cost. Efficiency and flexibility must be carefully balanced to meet different application requirements. The recently emerged configurable and extensible processor architectures offer a favorable tradeoff between efficiency and flexibility, and a promising way to minimize certain important metrics (e.g., execution time, code size, etc.) of the embedded processors. This paper addresses the problem of generating the application-specific instructions to improve the execution speed for configurable processors. A set of algorithms, including pattern generation, pattern selection, and application mapping, are proposed to efficiently utilize the instruction set extensibility of the target configurable processor. Applications of our approach to several real-life benchmarks on the Altera Nios processor show encouraging performance speedup (2.75X on average and up to 3.73X in some cases).
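The pattern generation step named in the abstract is described in the citation contexts on this page as an enumeration of K-feasible cuts over the application's data-flow DAG. The following is a minimal Python sketch of that general cut-enumeration technique, not the authors' implementation; the function name `enumerate_cuts`, the dict-based DAG encoding, and the toy graph are assumptions made for illustration.

```python
from graphlib import TopologicalSorter
from itertools import product


def enumerate_cuts(dag, k):
    """Return {node: set of cuts}; each cut is a frozenset of at most k leaves.

    dag maps every node to the list of its predecessors (operands);
    primary inputs map to an empty list. (Hypothetical encoding.)
    """
    cuts = {}
    # static_order() yields predecessors before the nodes that use them.
    for node in TopologicalSorter(dag).static_order():
        node_cuts = {frozenset([node])}  # trivial cut: the node itself
        preds = dag.get(node, [])
        if preds:
            # Pick one cut per operand, merge them, keep merges with <= k leaves.
            for combo in product(*(cuts[p] for p in preds)):
                merged = frozenset().union(*combo)
                if len(merged) <= k:
                    node_cuts.add(merged)
        cuts[node] = node_cuts
    return cuts


# Toy expression u = (a * b) + c
dag = {"a": [], "b": [], "c": [], "t": ["a", "b"], "u": ["t", "c"]}
cuts = enumerate_cuts(dag, k=3)
```

With `k=3` the root `u` receives the cuts {u}, {t, c}, and {a, b, c}; each non-trivial cut is the input set of a candidate multiple-input, single-output pattern rooted at `u`. The number of cuts can grow exponentially with `k`, which is consistent with the remark in the citation contexts that the computation is efficient in practice for K≤5.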

### Citations

4325 | Computer Architecture: A Quantitative Approach
- Hennessy, Patterson
- 1996
Citation Context ...ll the instructions in a pattern need to be executed sequentially in a basic (single-issue) pipeline processor. Therefore, the number of cycles should be added. With the consideration of data hazards [11] in the pipeline execution, it is not trivial to compute the total latency. In our estimation, we assume an ideal pipeline without any data hazards. Equation (2) uses the critical path of a schedule t...

1515 | Reducibility among combinatorial problems
- Karp
- 1972
Citation Context ...ded to identify whether two pattern instances are identical. The graph isomorphism problem is known to be in the set of NP (nondeterministic polynomial), but it is not clear whether it is NP-complete [12] or not. A number of algorithms such as [21], [3], and [17] have been proposed to compute graph isomorphism. [Footnote 1: We assume that the custom logic would not degrade the clock period of the processor.] In o...

236 | Practical graph isomorphism
- McKay
- 1981
Citation Context ...omial), but it is not clear whether it is NP-complete [12] or not. A number of algorithms such as [21], [3], and [17] have been proposed to compute graph isomorphism. [Footnote 1: We assume that the custom logic would not degrade the clock period of the processor.] In our system, we use the nauty package [25] for the isomorphism test. The pattern size is normally small because of architecture constraints, so the ...

176 | The Chimaera reconfigurable functional unit
- Hauck, Fry, et al.
- 1997
Citation Context ...of customized instructions supported by the specific hardware resources provided on the ASIP. The hardware implementing the specific instructions can be either runtime reconfigurable functional units [10][5], or pre-synthesized circuits [27]. As an example, Figure 1 (taken from Altera’s website [24]) shows the instruction logic of a commercially available configurable processor architecture, called Ni...

149 | DSPstone: A DSP-oriented benchmarking methodology
- Zivojnovic, Velarde, et al.
- 1994
Citation Context ...Figure 5. A mapping solution for the example. 4. EXPERIMENTAL RESULTS. We implemented our algorithms in a C++/Unix environment. The C examples used in the experiments are DSP applications from [23] and [22]. Figure 6 shows the relationships between pattern size and occurrence. The trend is quite consistent for these benchmarks. Basically, there are more small patterns than large ones in a DAG. ...

148 | DAGON: technology binding and local optimization by DAG matching. DAC’87
- Keutzer
- 1987
Citation Context ...has been extensively studied in the logic synthesis domain. Any existing algorithms to solve the minimum-area technology mapping problem, such as binate covering [20] and tree-based decomposition [20][15], can be applied. • This work allows the operation duplication implicitly during the cut enumeration and the mapping, and thus potentially achieves a higher speedup. • In contrast to previous works, o...

128 | Synthesis and Optimization of Digital Circuits
- De Micheli
- 1994
Citation Context ...imum execution time. THEOREM: The application mapping problem is equivalent to the minimum-area technology mapping problem. The basic idea of the proof is as follows: Library-based technology mapping [18] transforms a technology-independent logic network into a bounded network, i.e., into an interconnection of components that are instances of elements of a given library. For minimum-area technology mapp...

121 | Xtensa: a configurable and extensible processor
- Gonzalez
- 2000
Citation Context ... For a trivial pattern, we define its execution time in hardware to be equal to that in software. Since most existing configurable processors only have one write port in register file (or memory) [24][8], we only consider the instruction format with multiple inputs and single output (MISO). Considering the limited reconfigurable resources, we introduce area constraint for the final ASIP implementatio...

119 | Automatic application-specific instruction-set extensions under microarchitectural constraints
- Atasu, Pozzi, et al.
- 2003
Citation Context ...r of input and output operands, but this work does not address the architecture constraints for the templates. A more general method for application-specific instruction-set extension is presented in [2]. The authors define the candidate extended instruction to be a convex directed acyclic subgraph (which is defined as a cut) with certain input and output constraints. They use a branch and bound meth...

96 | Optimal Code Generation for Expression Trees
- Aho, Johnson
- 1976
Citation Context ...g. In [20] the DAG is partitioned into a forest of trees for DAG covering. Then a tree pattern matching automaton is used to match the individual trees. Dynamic programming, based on the approach of [1], is used to achieve the minimum area mapping. In [20] and [16] binate covering is applied to a DAG covering problem. Although binate covering is an NP-hard problem, much effort has been spent on this ...

92 | Logic synthesis for VLSI design
- Rudell
- 1989
Citation Context ... the area minimization problem, which has been extensively studied in the logic synthesis domain. Any existing algorithms to solve the minimum-area technology mapping problem, such as binate covering [20] and tree-based decomposition [20][15], can be applied. • This work allows the operation duplication implicitly during the cut enumeration and the mapping, and thus potentially achieves a higher speed...

80 | Canonical labeling of graphs
- Babai, Luks
- 1983
Citation Context ...tic polynomial), but it is not clear whether it is NP-complete [12] or not. A number of algorithms such as [21], [3], and [17] have been proposed to compute graph isomorphism. [Footnote 1: We assume that the custom logic would not degrade the clock period of the processor.] In our system, we use the nauty package [25] for the isomorphism test. The pattern size is normally small because of architecture constraint...

70 | Instruction generation for hybrid reconfigurable systems
- Kastner, Kaplan, et al.
Citation Context ...ate application-specific instructions, taking full advantage of the extensible capability of the ASIP. Several techniques and tools to aid ASIP design automation have been presented in recent years. [13] proposed a template generation, matching, and covering algorithm. The candidate templates are first generated by a clustering algorithm based on the occurrence frequency. Then the directed acyclic gr...

54 | Cut ranking and pruning: Enabling a general and efficient FPGA mapping solution
- Cong, Wu, et al.
Citation Context ... In theory, the number of K-feasible cuts grows exponentially with respect to K. However, for K≤5, this computation is very efficient in practice. The same technique is used in FPGA technology mapping [6], where a certain (homogeneous or heterogeneous) LUT library is given for covering a gate-level network. If we regard the LUT size as the input number constraint of the patterns, and regard the gates ...

53 | Instruction Selection Using Binate Covering for Code Size Optimization
- Liao, Devadas, et al.
- 1995
Citation Context ...DAG covering. Then a tree pattern matching automaton is used to match the individual trees. Dynamic programming, based on the approach of [1], is used to achieve the minimum area mapping. In [20] and [16] binate covering is applied to a DAG covering problem. Although binate covering is an NP-hard problem, much effort has been spent on this because of its wide application. In [4] and [9] an exact solut...

47 | A method for minimizing the number of internal states in incompletely specified sequential networks
- Grasselli, Luccio
- 1965
Citation Context .... In [20] and [16] binate covering is applied to a DAG covering problem. Although binate covering is an NP-hard problem, much effort has been spent on this because of its wide application. In [4] and [9] an exact solution based on branch and bound algorithm was discussed. The work in [7] has provided better lower-bound computation and two new pruning techniques for an exact solver. In this work, we u...

32 | Optimum and heuristic transformation techniques for simultaneous optimization of latency and throughput
- Srivastava, Potkonjak
- 1995
Citation Context ...igure 5. A mapping solution for the example. 4. EXPERIMENTAL RESULTS. We implemented our algorithms in a C++/Unix environment. The C examples used in the experiments are DSP applications from [23] and [22]. Figure 6 shows the relationships between pattern size and occurrence. The trend is quite consistent for these benchmarks. Basically, there are more small patterns than large ones in a DAG. In this e...

26 | A fast backtracking algorithm to test directed graphs for isomorphism using distance matrices
- Schmidt, Druffel
- 1976
Citation Context ...rministic polynomial), but it is not clear whether it is NP-complete [12] or not. A number of algorithms such as [21], [3], and [17] have been proposed to compute graph isomorphism. [Footnote 1: We assume that the custom logic would not degrade the clock period of the processor.] In our system, we use the nauty package [25] for the isomorphism test. The pattern size is normally small because of architecture const...

22 | Boolean relations and the incomplete specification of logic networks
- Brayton, Somenzi
- 1989
Citation Context ... mapping. In [20] and [16] binate covering is applied to a DAG covering problem. Although binate covering is an NP-hard problem, much effort has been spent on this because of its wide application. In [4] and [9] an exact solution based on branch and bound algorithm was discussed. The work in [7] has provided better lower-bound computation and two new pruning techniques for an exact solver. In this wo...

13 | On solving binate covering problems
- Coudert
- 1996
Citation Context ...ate covering is an NP-hard problem, much effort has been spent on this because of its wide application. In [4] and [9] an exact solution based on branch and bound algorithm was discussed. The work in [7] has provided better lower-bound computation and two new pruning techniques for an exact solver. In this work, we use the binate covering approach because it produces the exact solutions with affordab...

9 | Computational complexity of logic synthesis and optimization
- Keutzer, Richards
- 1989
Citation Context .... Therefore, these two problems are equivalent. COROLLARY: The application mapping problem is NP-hard. Since library-based technology mapping for the area minimization problem is proven to be NP-hard [14], the application mapping problem is NP-hard as well, according to the above theorem. Several approaches have been proposed for minimum-area technology mapping. In [20] the DAG is partitioned into a f...

3 | Automatic instruction-set extension and utilization for embedded processors
- Peymandoust, Pozzi, et al.
- 2003
Citation Context ...ize the sum of speedup of each individual cut may not result in the minimum execution time. More importantly, cut reuse is not considered in this work. A complete ASIP compilation flow is proposed in [19]. The flow contains two phases: instruction selection and instruction mapping. The authors use either a greedy algorithm or the method in [2] to solve the instruction selection problem. Symbolic algeb...