Results 1 - 10 of 48,895
Compile-time Performance Prediction with
- In Proc. of the 4th Int. Workshop on Compilers for Parallel Computers
, 1993
"... A procedure is described to automatically compile symbolic performance predictions in the course of program translation. It is also shown that a lower bound on the execution time can be predicted which outperforms traditional static estimations at a negligible increase in cost. The method is demonst ..."
is demonstrated by its application to a parallel LU factorization algorithm on a multiprocessor. 1 Introduction Compile-time performance prediction can provide essential feedback to enable program and machine parameter optimization by both the user and the compiler. In this paper we study the possibility
TreadMarks: Distributed Shared Memory on Standard Workstations and Operating Systems
- IN PROCEEDINGS OF THE 1994 WINTER USENIX CONFERENCE
, 1994
"... TreadMarks is a distributed shared memory (DSM) system for standard Unix systems such as SunOS and Ultrix. This paper presents a performance evaluation of TreadMarks running on Ultrix using DECstation-5000/240's that are connected by a 100-Mbps switch-based ATM LAN and a 10-Mbps Ethernet. Ou ..."
Cited by 527 (17 self)
. Our objective is to determine the efficiency of a user-level DSM implementation on commercially available workstations and operating systems. We achieved good speedups on the 8-processor ATM network for Jacobi (7.4), TSP (7.2), Quicksort (6.3), and ILINK (5.7). For a slightly modified version
Compile-Time Pointer Reversal
, 1996
"... This paper introduces an alternative representation for λ-terms which has the notable property that the search for the leftmost outermost redex is restricted to two steps. This is important in the implementation of a lazy functional programming language, as this search consumes time and space. The r ..."
Transfer of Cognitive Skill
, 1989
"... A framework for skill acquisition is proposed that includes two major stages in the development of a cognitive skill: a declarative stage in which facts about the skill domain are interpreted and a procedural stage in which the domain knowledge is directly embodied in procedures for performing the s ..."
Cited by 869 (21 self)
. These processes include generalization, discrimination, and strengthening of productions. Comparisons are made to similar concepts from past learning theories. How these learning mechanisms apply to produce the power law speedup in processing time with practice is discussed. It requires at least 100 hours
Parallel database systems: the future of high performance database systems
- Communications of the ACM
, 1992
"... Abstract: Parallel database machine architectures have evolved from the use of exotic hardware to a software parallel dataflow architecture based on conventional shared-nothing hardware. These new designs provide impressive speedup and scaleup when processing relational database queries. This paper ..."
Cited by 638 (13 self)
The implementation of the Cilk-5 multithreaded language
- In PLDI ’98: Proceedings of the ACM SIGPLAN 1998 conference on Programming language design and implementation
, 1998
"... The fifth release of the multithreaded language Cilk uses a provably good "work-stealing" scheduling algorithm similar to the first system, but the language has been completely redesigned and the runtime system completely reengineered. The efficiency of the new implementation was aided by a clear st ..."
Cited by 493 (30 self)
-first" principle has led to a portable Cilk-5 implementation in which the typical cost of spawning a parallel thread is only between 2 and 6 times the cost of a C function call on a variety of contemporary machines. Many Cilk programs run on one processor with virtually no degradation compared
Cilk: An Efficient Multithreaded Runtime System
- JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING
, 1995
"... Cilk (pronounced "silk") is a C-based runtime system for multithreaded parallel programming. In this paper, we document the efficiency of the Cilk work-stealing scheduler, both empirically and analytically. We show that on real and synthetic applications, the "work" and "cri ..."
Cited by 750 (40 self)
strict" (well-structured) programs, the Cilk scheduler achieves space, time, and communication bounds all within a constant factor of optimal. The Cilk
Multiscalar Processors
- In Proceedings of the 22nd Annual International Symposium on Computer Architecture
, 1995
"... Multiscalar processors use a new, aggressive implementation paradigm for extracting large quantities of instruction level parallelism from ordinary high level language programs. A single program is divided into a collection of tasks by a combination of software and hardware. The tasks are distribute ..."
Cited by 585 (30 self)
are dynamically routed among the many parallel processing units with the help of compiler-generated masks. Memory accesses may occur speculatively without knowledge of preceding loads or stores. Addresses are disambiguated dynamically, many in parallel, and processing waits only for true data dependence
Simultaneous Multithreading: Maximizing On-Chip Parallelism
, 1995
"... This paper examines simultaneous multithreading, a technique permitting several independent threads to issue instructions to a superscalar’s multiple functional units in a single cycle. We present several models of simultaneous multithreading and compare them with alternative organizations: a wide s ..."
Cited by 802 (48 self)
multithreading has the potential to achieve 4 times the throughput of a superscalar, and double that of fine-grain multithreading. We evaluate several cache configurations made possible by this type of organization and evaluate tradeoffs between them. We also show that simultaneous multithreading