Results 1 - 9 of 9
MULTILISP: a language for concurrent symbolic computation
- ACM Transactions on Programming Languages and Systems, 1985
Cited by 409 (1 self)
Multilisp is a version of the Lisp dialect Scheme extended with constructs for parallel execution. Like Scheme, Multilisp is oriented toward symbolic computation. Unlike some parallel programming languages, Multilisp incorporates constructs for causing side effects and for explicitly introducing parallelism. The potential complexity of dealing with side effects in a parallel context is mitigated by the nature of the parallelism constructs and by support for abstract data types: a recommended Multilisp programming style is presented which, if followed, should lead to highly parallel, easily understandable programs. Multilisp is being implemented on the 32-processor Concert multiprocessor; however, it is ultimately intended for use on larger multiprocessors. The current implementation, called Concert Multilisp, is complete enough to run the Multilisp compiler itself and has been run on Concert prototypes including up to eight processors. Concert Multilisp uses novel techniques for task scheduling and garbage collection. The task scheduler helps control excessive resource utilization by means of an unfair scheduling policy; the garbage collector uses a multiprocessor algorithm based on the incremental garbage collector of Baker.
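The abstract above does not name the parallelism constructs. The best known one from the Multilisp paper is `future`, which starts evaluating an expression concurrently and returns a placeholder for its value. The following is a rough Python analogy only, not Multilisp code: the function name `parallel_pair_sum` is hypothetical, and Python requires an explicit `.result()` call where Multilisp resolves a future implicitly when its value is needed.

```python
from concurrent.futures import ThreadPoolExecutor


def parallel_pair_sum(xs, ys):
    """Sum two lists, evaluating the first sum as a future.

    Loosely analogous to (+ (future (sum xs)) (sum ys)) in Multilisp;
    this is an illustration, not the paper's implementation.
    """
    with ThreadPoolExecutor(max_workers=2) as pool:
        # Start the first computation concurrently and get a placeholder back.
        xs_future = pool.submit(sum, xs)
        # Keep working on the second computation in the current thread.
        ys_total = sum(ys)
        # Block only at the point where the first value is actually needed.
        return xs_future.result() + ys_total


if __name__ == "__main__":
    print(parallel_pair_sum([1, 2, 3], [4, 5]))
```

The sketch spawns only one level of futures on purpose: recursive spawning into a bounded thread pool can deadlock when every worker is waiting on a child task.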
Job Scheduling in Multiprogrammed Parallel Systems, 1997
Cited by 145 (15 self)
Scheduling in the context of parallel systems is often thought of in terms of assigning tasks in a program to processors, so as to minimize the makespan. This formulation assumes that the processors are dedicated to the program in question. But when the parallel system is shared by a number of users, this is not necessarily the case. In the context of multiprogrammed parallel machines, scheduling refers to the execution of threads from competing programs. This is an operating system issue, concerned with resource allocation, not a program development issue. Scheduling schemes for multiprogrammed parallel systems can be classified as single-level or two-level. Single-level scheduling combines the allocation of processing power with the decision of which thread will use it. Two-level scheduling decouples the two issues: first, processors are allocated to the job, and then the job's threads are scheduled using this pool of processors. The processors of a parallel system can be shared i...
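The two-level scheme described above can be sketched as two separate functions. This is a minimal illustration under assumed policies, not the survey's method: the function names, the one-at-a-time equal-share allocation, and the round-robin thread placement are all hypothetical choices made for the example.

```python
def allocate_processors(jobs, total_processors):
    """Level 1 (assumed policy): share processors among jobs.

    jobs maps a job name to its list of threads. Processors are handed out
    one at a time in job order, and no job gets more processors than threads.
    """
    allocation = {name: 0 for name in jobs}
    remaining = total_processors
    progress = True
    while remaining > 0 and progress:
        progress = False
        for name, threads in jobs.items():
            if remaining > 0 and allocation[name] < len(threads):
                allocation[name] += 1
                remaining -= 1
                progress = True
    return allocation


def schedule_threads(threads, num_processors):
    """Level 2 (assumed policy): place one job's threads on its own pool."""
    if num_processors == 0:
        return []
    queues = [[] for _ in range(num_processors)]
    for index, thread in enumerate(threads):
        queues[index % num_processors].append(thread)
    return queues


if __name__ == "__main__":
    jobs = {"A": ["a1", "a2", "a3"], "B": ["b1"]}
    allocation = allocate_processors(jobs, total_processors=3)
    for name, threads in jobs.items():
        print(name, schedule_threads(threads, allocation[name]))
```

Keeping the two functions independent mirrors the decoupling in the text: the allocator never looks at individual threads beyond their count, and the per-job scheduler never sees other jobs.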
MULTIPROCESSOR SCHEDULING TO ACCOUNT FOR INTERPROCESSOR COMMUNICATION, 1991
Cited by 64 (11 self)
Interprocessor communication (IPC) overheads have emerged as the major performance limitation in parallel processing systems, due to the transmission delays, synchronization overheads, and conflicts for shared communication resources created by data exchange. Accounting for these overheads is essential for attaining efficient hardware utilization. This thesis introduces two new compile-time heuristics for scheduling precedence graphs onto multiprocessor architectures, which account for interprocessor communication overheads and interconnection constraints in the architecture. These algorithms perform scheduling and routing simultaneously to account for irregular interprocessor interconnections, and schedule all communications as well as all computations to eliminate shared resource contention. The first technique, called dynamic-level scheduling, modifies the classical HLFET list scheduling strategy to account for IPC and synchronization overheads. By using dynamically changing priorities to match nodes and processors at each step, this technique attains an equitable tradeoff between load balancing and interprocessor communication cost. This method is fast, flexible, widely targetable, and displays promising performance. The second technique, called declustering, establishes a parallelism hierarchy upon the precedence graph using graph-analysis techniques which explicitly address the tradeoff between exploiting parallelism and incurring communication cost. By systematically decomposing this hierarchy, the declustering process exposes parallelism instances in order of importance, assuring efficient use of the available processing resources. In contrast with traditional clustering schemes, this technique can adjust the level of cluster granularity to suit the characteristics of the specified architecture, leading to a more effective solution.
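The idea of matching nodes and processors with dynamically changing priorities can be illustrated with a simplified list scheduler. The abstract does not give the priority formula, so this sketch assumes one: the priority of a node-processor pair is the node's static level (longest execution-time path to an exit node, as in HLFET) minus the earliest time the node could start on that processor, including communication delay from predecessors placed elsewhere. Routing, interconnection constraints, and contention are omitted, and all names are hypothetical.

```python
def static_levels(exec_time, succs):
    """Longest path of execution times from each node to an exit node."""
    levels = {}

    def level(node):
        if node not in levels:
            tail = max((level(s) for s in succs.get(node, [])), default=0)
            levels[node] = exec_time[node] + tail
        return levels[node]

    for node in exec_time:
        level(node)
    return levels


def dynamic_level_schedule(exec_time, edges, num_procs):
    """Greedy scheduling of a precedence graph (assumed priority rule).

    exec_time: {node: execution time}
    edges: {(pred, succ): communication cost if placed on different processors}
    Returns {node: (processor, start, finish)}.
    """
    succs, preds = {}, {}
    for (u, v) in edges:
        succs.setdefault(u, []).append(v)
        preds.setdefault(v, []).append(u)
    levels = static_levels(exec_time, succs)

    proc_free = [0] * num_procs
    placed = {}
    unscheduled = set(exec_time)
    while unscheduled:
        ready = [n for n in unscheduled
                 if all(p in placed for p in preds.get(n, []))]
        best = None
        for node in sorted(ready):
            for proc in range(num_procs):
                # Data arrives later if a predecessor ran on another processor.
                arrival = max(
                    (placed[q][2] + (0 if placed[q][0] == proc else edges[(q, node)])
                     for q in preds.get(node, [])),
                    default=0,
                )
                start = max(proc_free[proc], arrival)
                priority = levels[node] - start
                if best is None or priority > best[0]:
                    best = (priority, node, proc, start)
        _, node, proc, start = best
        finish = start + exec_time[node]
        placed[node] = (proc, start, finish)
        proc_free[proc] = finish
        unscheduled.remove(node)
    return placed


if __name__ == "__main__":
    exec_time = {"a": 2, "b": 3, "c": 3, "d": 1}
    edges = {("a", "b"): 1, ("a", "c"): 1, ("b", "d"): 1, ("c", "d"): 1}
    for node, slot in sorted(dynamic_level_schedule(exec_time, edges, 2).items()):
        print(node, slot)
```

Because the priority is recomputed for every ready node and processor at each step, a node with a high static level can still lose to another node whose data is already local, which is the load-balance versus communication tradeoff the abstract refers to.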
Strategic Directions in Computer Architecture
- ACM Computing Surveys, 1996
Cited by 4 (0 self)
Looking back on the last 30 years, we have seen the remarkable developments in semiconductor technology enabling the implementation of ideas that were previously
Message-Passing Algorithms for a SIMD Torus with Coteries
- ACM SIGArch Computer Architecture News
Cited by 1 (0 self)
This paper describes the results of an investigation into routing algorithms to be used when programming the CAAPP (Content Addressable Array Parallel Processor) [19], a SIMD mesh-connected array processor enhanced with the coterie network, a mechanism similar to reconfigurable buses. We will show that the coterie network gives the CAAPP a capability far beyond solely mesh-connected processors; in fact, the performance of routing on many classes of permutations is more comparable to the Connection Machine, which has a dedicated hypercube routing network. Most of the current routing algorithms for mesh-connected array processors (with N PEs in an n × n
CA Computer Architecture
Abstraction means programmers can describe algorithms in a "high level" notation that is independent of details about the machine that will execute the algorithm. Portability is a byproduct of abstraction that allows programs to be run on a wide variety of computers as long as there is a compiler that will translate them for each machine. In most programming situations reality is close to the ideal. Compilers for many high-level languages are very good at generating efficient and portable code for typical computer systems, so programmers are able to express algorithms in high-level languages and expect them to run efficiently on almost any machine. There may be a few isolated places where a programmer who invests a lot of effort may be able to write a more efficient routine in assembly language (the native language of the machine), but it is hardly ever worth the effort to write an entire program in assembly language. Obviously when all or part of a program is written in assembler it is not as...
CA Computer Architecture
Abstraction means programmers can describe algorithms in a "high level" notation that is independent of details about the machine that will execute the algorithm. Portability is a byproduct of abstraction that allows programs to be run on a wide variety of computers as long as there is a compiler that will translate them for each machine. In most programming situations reality is close to the ideal. Compilers for many high-level languages are very good at generating efficient and portable code for typical computer systems, so programmers are able to express algorithms in high-level languages and expect them to run efficiently on almost any machine. There may be a few isolated places where a programmer who invests a lot of effort may be able to write a more efficient routine in assembly language (the native language of the machine), but it is hardly ever worth the effort to write an entire program in assembly language. Obviously when all or part of a program is written in assembler it is not a...
Dynamically Reconfigurable Architecture for a Class of Real-Time Applications, 1992
This report (thesis) presents an architectural design methodology for computing systems suitable for a class of real-time applications, characterized by a large volume of periodic real-time data input at a high rate and vector operations on the real-time data. The proposed methodology incorporates into the architectural design the notion of resource sharing as well as techniques for satisfying timing requirements.
Efficient Throughput Cores for Asymmetric Manycore Processors, 2009
The microprocessor industry has had to switch from developing ever more complex and more deeply pipelined single-core processors to multicore processors due to running into power, thermal and complexity limits. Future microprocessors will be asymmetric manycore chip multiprocessors, with a small number of complex cores for serial programs and serial sections of parallel programs. The majority of the cores will be small, power- and area-efficient cores to maximize overall throughput in a limited power budget. The main contributions of this dissertation are techniques for improving the performance and area-efficiency of these throughput-oriented cores. This work shows how the single-thread performance of small, scalar cores can be increased or dynamically combined to speed up programs with only a limited number of parallel threads. It also shows how to improve both the cores and the cache subsystem of a multicore processor using SIMD cores.

