Results 1-10 of 162
Some scheduling techniques and an easily schedulable horizontal architecture for high performance scientific computing
 PROCEEDINGS OF THE FOURTEENTH ANNUAL WORKSHOP ON MICROPROGRAMMING
, 1981
Cited by 264 (10 self)
Horizontal architectures are attractive for cost-effective, high performance scientific computing. They are, however, very difficult to schedule. Consequently, it is difficult to develop compilers that can generate efficient code for such architectures. The polycyclic architecture has been developed specifically to make the task of scheduling easy. As a result, it has been possible to develop a powerful scheduling algorithm that yields optimal and near-optimal schedules for iterative computations. This novel architecture and this scheduling algorithm are the topic of this paper.
Optimizing Power Using Transformations
 IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
, 1995
Cited by 203 (14 self)
The increasing demand for portable computing has elevated power consumption to one of the most critical design parameters. A high-level synthesis system, HYPER-LP, is presented for minimizing power consumption in application-specific, datapath-intensive CMOS circuits using a variety of architectural and computational transformations. The synthesis environment consists of high-level estimation of power consumption, a library of transformation primitives, and heuristic/probabilistic optimization search mechanisms for fast and efficient scanning of the design space. Examples with varying degrees of computational complexity and structure are optimized and synthesized using the HYPER-LP system. The results indicate that more than an order of magnitude reduction in power can be achieved over current-day design methodologies while maintaining the system throughput; in some cases this can be accomplished while preserving or reducing the implementation area.
The Multiflow Trace Scheduling Compiler
 Journal of Supercomputing
, 1993
Cited by 192 (1 self)
The Multiflow compiler uses the trace scheduling algorithm to find and exploit instruction-level parallelism beyond basic blocks. The compiler generates code for VLIW computers that issue up to 28 operations each cycle and maintain more than 50 operations in flight. At Multiflow the compiler generated code for eight different target machine architectures and compiled over 50 million lines of FORTRAN and C applications and systems code. The requirement of finding large amounts of parallelism in ordinary programs, the trace scheduling algorithm, and the many unique features of the Multiflow hardware placed novel demands on the compiler. New techniques in instruction scheduling, register allocation, memory-bank management, and intermediate-code optimizations were developed, as were refinements to reduce the overhead of trace scheduling. This paper describes the Multiflow compiler and reports on the Multiflow practice and experience with compiling for instruction-level parallelism beyond basic blocks.
Finite differencing of computable expressions
, 1980
Cited by 133 (6 self)
Finite differencing is a program optimization method that generalizes strength reduction, and provides an efficient implementation for a host of program transformations including "iterator inversion." Finite differencing is formally specified in terms of more basic transformations shown to preserve program semantics. Estimates of the speedup that the technique yields are given. A full illustrative example of algorithm derivation is presented.
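As a rough illustration of the strength-reduction idea that finite differencing generalizes, the sketch below replaces a repeated multiplication with incremental additions. It is not taken from the paper; the function names `squares_naive` and `squares_differenced` are hypothetical and chosen only for this example.

```python
def squares_naive(n):
    """Recompute i * i from scratch on every iteration."""
    return [i * i for i in range(n)]


def squares_differenced(n):
    """Maintain sq == i * i incrementally instead of recomputing it."""
    out = []
    sq = 0      # invariant at loop top: sq == i * i
    delta = 1   # invariant at loop top: delta == 2 * i + 1 (first difference)
    for _ in range(n):
        out.append(sq)
        sq += delta   # (i + 1)^2 = i^2 + (2i + 1)
        delta += 2    # 2(i + 1) + 1 = (2i + 1) + 2
    return out
```

Both functions return the same list; the second one does so using only additions inside the loop, which is the kind of saving the abstract's speedup estimates refer to in a more general setting.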
Global communication and memory optimizing transformations for low power systems
, 1994
Cited by 104 (18 self)
In this paper we first illustrate the crucial impact of memory-related power consumption on the global system power budget, in particular for multi-dimensional real-time signal processing subsystems. A realistic medical imaging demonstrator shows that the power budget of the system is clearly dominated by memory access and that up to an order of magnitude can be gained by transforming the initial specification, even without incorporating the effect of a possible supply voltage reduction. We have analyzed the most relevant contributions to this power budget and propose several ways to reduce them. In addition, an automated transformation technique is described to decrease the memory-related power budget. LOW POWER SYSTEM DESIGN. Many ways exist to realize a given application. The system designer has, for instance, the choice between many algorithmic procedures for a desired behaviour (e.g. loop reordering). In general, a good tradeoff between several characteristics (like power and area) is crucial, so there is a need for fast and early feedback already at the algorithm level without always descending to detailed RT/logic realization.
An Efficient Augmented-Context-Free Parsing Algorithm
 Computational Linguistics
, 1987
Cited by 79 (3 self)
This paper introduces an efficient online parsing algorithm, and focuses on its practical application to natural language interfaces. The algorithm can be viewed as a generalized LR parsing algorithm that can handle arbitrary context-free grammars, including ambiguous grammars. Section 2 describes the algorithm by extending the standard LR parsing algorithm with the idea of a "graph-structured stack". Section 3 describes how to represent parse trees efficiently, so that all possible parse trees (the parse forest) take at most polynomial space as the ambiguity of a sentence grows exponentially. In section 4, several examples are given. Section 5 presents several empirical results of the algorithm's practical performance, including comparison with Earley's algorithm. In section 6, we discuss how to enhance the algorithm to handle augmented context-free grammars rather than pure context-free grammars. Section 7 describes the concept of online parsing, taking advantage of the left-to-right operation of our parsing algorithm. The online parser parses a sentence strictly from left to right, and starts parsing as soon as the user types in the first word, without waiting for the end of line. Benefits of online parsing are then discussed. Finally, several versions of the online parser have been implemented, and they are mentioned in section 8.
Compiling a Functional Language
 IN CONFERENCE RECORD OF THE 1984 ACM SYMPOSIUM ON LISP AND FUNCTIONAL PROGRAMMING
, 1984
A simplified universal relation assumption and its properties
 ACM Transactions on Database Systems
, 1982
Cited by 72 (0 self)
One problem concerning the universal relation assumption is the inability of known methods to obtain a database scheme design in the general case, where the real-world constraints are given by a set of dependencies that includes embedded multivalued dependencies. We propose a simpler method of describing the real world, where constraints are given by functional dependencies and a single join dependency. The relationship between this method of defining the real world and the classical methods is exposed. We characterize in terms of hypergraphs those multivalued dependencies that are the consequence of a given join dependency. Also characterized in terms of hypergraphs are those join dependencies that are equivalent to a set of multivalued dependencies.
Recognizing Mathematical Expressions Using Tree Transformation
 IEEE Transactions on Pattern Analysis and Machine Intelligence
, 2002
Cited by 71 (16 self)
We describe a robust and efficient system for recognizing typeset and handwritten mathematical notation. From a list of symbols with bounding boxes the system analyzes an expression in three successive passes. The Layout Pass constructs a Baseline Structure Tree (BST) describing the two-dimensional arrangement of input symbols. Reading order and operator dominance are used to allow efficient recognition of symbol layout even when symbols deviate greatly from their ideal positions. Next, the Lexical Pass produces a Lexed BST from the initial BST by grouping tokens comprised of multiple input symbols; these include decimal numbers, function names, and symbols comprised of non-overlapping primitives such as "=". The Lexical Pass also labels vertical structures such as fractions and accents. The Lexed BST is translated into LaTeX. Additional processing, necessary for producing output for symbolic algebra systems, is carried out in the Expression Analysis Pass. The Lexed BST is translated into an Operator Tree, which describes the order and scope of operations in the input expression. The tree manipulations used in each pass are represented compactly using tree transformations. The compiler-like architecture of the system allows robust handling of unexpected input, increases the scalability of the system, and provides the groundwork for handling dialects of mathematical notation.
Using Program Slicing to Simplify Testing
 EUROSTAR'94
, 1994
Cited by 67 (37 self)
Program slicing is a technique for automatically identifying all the lines in a program which affect a selected subset of variables. A large program can be divided into a number of smaller programs (its slices), each constructed for different variable subsets. The slices are typically simpler than the original program, thereby simplifying the process of testing a property of the program which only concerns a subset of its variables. Some aspects of a program's computation are not captured by a set of variables, rendering slicing inapplicable. To overcome this difficulty we make a program introspective, adding assignments to denote these 'implicit' computations. Initially this makes the program longer. However, slicing can now be applied to the introspective program, forming a slice concerned solely with the implicit computation. We improve the simplification power of slicing using program transformation. To illustrate our approach we consider the implicit computation which ...
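The basic idea of slicing on a subset of variables can be sketched for the simplest possible case: straight-line code with no branches or loops. This is an assumption-laden toy, not the paper's method; the `(target, uses)` statement encoding and the name `backward_slice` are hypothetical, and a real slicer would also need to track control dependence.

```python
def backward_slice(statements, criterion):
    """Return indices of statements that affect the variables in `criterion`.

    `statements` is a list of (target, uses) pairs describing straight-line
    assignments: `target` is the variable written, `uses` the set read.
    """
    relevant = set(criterion)
    kept = []
    # Walk backwards: a statement matters only if it defines a variable
    # that is still needed by something later in the program.
    for index in range(len(statements) - 1, -1, -1):
        target, uses = statements[index]
        if target in relevant:
            kept.append(index)
            relevant.discard(target)  # this definition satisfies the need
            relevant.update(uses)     # its operands are now needed instead
    return sorted(kept)


# Hypothetical five-statement program, encoded by hand.
program = [
    ("n", set()),                # 0: n = read()
    ("total", set()),            # 1: total = 0
    ("count", set()),            # 2: count = 0
    ("total", {"total", "n"}),   # 3: total = total + n
    ("count", {"count"}),        # 4: count = count + 1
]
```

Slicing `program` on `{"total"}` keeps statements 0, 1 and 3, while slicing on `{"count"}` keeps only 2 and 4; each slice is a smaller program that can be tested separately for the variables it concerns.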