Results 1 - 10 of 48,895

Compile-time Performance Prediction with PAMELA

by Arjan J. C. Van Gemund - In Proc. of the 4th Int. Workshop on Compilers for Parallel Computers, 1993
"... A procedure is described to automatically compile symbolic performance predictions in the course of program translation. It is also shown that a lower bound on the execution time can be predicted which outperforms traditional static estimations at a negligible increase in cost. The method is demonst ..."
is demonstrated by its application to a parallel LU factorization algorithm on a multiprocessor. 1 Introduction Compile-time performance prediction can provide essential feedback to enable program and machine parameter optimization by both the user and the compiler. In this paper we study the possibility

TreadMarks: Distributed Shared Memory on Standard Workstations and Operating Systems

by Pete Keleher, Alan L. Cox, Sandhya Dwarkadas, Willy Zwaenepoel - In Proceedings of the 1994 Winter USENIX Conference, 1994
"... TreadMarks is a distributed shared memory (DSM) system for standard Unix systems such as SunOS and Ultrix. This paper presents a performance evaluation of TreadMarks running on Ultrix using DECstation-5000/240's that are connected by a 100-Mbps switch-based ATM LAN and a 10-Mbps Ethernet. Ou ..."
Cited by 527 (17 self)
Our objective is to determine the efficiency of a user-level DSM implementation on commercially available workstations and operating systems. We achieved good speedups on the 8-processor ATM network for Jacobi (7.4), TSP (7.2), Quicksort (6.3), and ILINK (5.7). For a slightly modified version

Compile-Time Pointer Reversal

by Simon Brock, 1996
"... This paper introduces an alternative representation for λ-terms which has the notable property that the search for the leftmost outermost redex is restricted to two steps. This is important in the implementation of a lazy functional programming language, as this search consumes time and space. The r ..."

Transfer of Cognitive Skill

by John R. Anderson, 1989
"... A framework for skill acquisition is proposed that includes two major stages in the development of a cognitive skill: a declarative stage in which facts about the skill domain are interpreted and a procedural stage in which the domain knowledge is directly embodied in procedures for performing the s ..."
Cited by 869 (21 self)
These processes include generalization, discrimination, and strengthening of productions. Comparisons are made to similar concepts from past learning theories. How these learning mechanisms apply to produce the power law speedup in processing time with practice is discussed. It requires at least 100 hours

Parallel database systems: the future of high performance database systems

by David J. DeWitt, Jim Gray - Communications of the ACM, 1992
"... Abstract: Parallel database machine architectures have evolved from the use of exotic hardware to a software parallel dataflow architecture based on conventional shared-nothing hardware. These new designs provide impressive speedup and scaleup when processing relational database queries. This paper ..."
Cited by 638 (13 self)

The Implementation of the Cilk-5 Multithreaded Language

by Matteo Frigo, Charles E. Leiserson, Keith H. Randall - In PLDI ’98: Proceedings of the ACM SIGPLAN 1998 conference on Programming language design and implementation, 1998
"... The fifth release of the multithreaded language Cilk uses a provably good "work-stealing" scheduling algorithm similar to the first system, but the language has been completely redesigned and the runtime system completely reengineered. The efficiency of the new implementation was aided by a clear st ..."
Cited by 493 (30 self)
-first" principle has led to a portable Cilk-5 implementation in which the typical cost of spawning a parallel thread is only between 2 and 6 times the cost of a C function call on a variety of contemporary machines. Many Cilk programs run on one processor with virtually no degradation compared

Cilk: An Efficient Multithreaded Runtime System

by Robert D. Blumofe, Christopher F. Joerg, Bradley C. Kuszmaul, Charles E. Leiserson, Keith H. Randall, Yuli Zhou - Journal of Parallel and Distributed Computing, 1995
"... Cilk (pronounced "silk") is a C-based runtime system for multithreaded parallel programming. In this paper, we document the efficiency of the Cilk work-stealing scheduler, both empirically and analytically. We show that on real and synthetic applications, the "work" and "cri ..."
Cited by 750 (40 self)
strict" (well-structured) programs, the Cilk scheduler achieves space, time, and communication bounds all within a constant factor of optimal. The Cilk

Multiscalar Processors

by Gurindar S. Sohi, Scott E. Breach, T. N. Vijaykumar - In Proceedings of the 22nd Annual International Symposium on Computer Architecture, 1995
"... Multiscalar processors use a new, aggressive implementation paradigm for extracting large quantities of instruction level parallelism from ordinary high level language programs. A single program is divided into a collection of tasks by a combination of software and hardware. The tasks are distribute ..."
Cited by 585 (30 self)
are dynamically routed among the many parallel processing units with the help of compiler-generated masks. Memory accesses may occur speculatively without knowledge of preceding loads or stores. Addresses are disambiguated dynamically, many in parallel, and processing waits only for true data dependence

VLFeat -- An open and portable library of computer vision algorithms

by Andrea Vedaldi, et al., 2010
Cited by 514 (10 self)

Simultaneous Multithreading: Maximizing On-Chip Parallelism

by Dean M. Tullsen, Susan J. Eggers, Henry M. Levy, 1995
"... This paper examines simultaneous multithreading, a technique permitting several independent threads to issue instructions to a superscalar’s multiple functional units in a single cycle. We present several models of simultaneous multithreading and compare them with alternative organizations: a wide s ..."
Cited by 802 (48 self)
multithreading has the potential to achieve 4 times the throughput of a superscalar, and double that of fine-grain multithreading. We evaluate several cache configurations made possible by this type of organization and evaluate tradeoffs between them. We also show that simultaneous multithreading