Results 1 - 10
of
127
Scheduler Activations: Effective Kernel Support for the User-Level Management of Parallelism
- ACM Transactions on Computer Systems
, 1992
"... Threads are the vehicle,for concurrency in many approaches to parallel programming. Threads separate the notion of a sequential execution stream from the other aspects of traditional UNIX-like processes, such as address spaces and I/O descriptors. The objective of this separation is to make the expr ..."
Abstract
-
Cited by 420 (18 self)
- Add to MetaCart
Threads are the vehicle,for concurrency in many approaches to parallel programming. Threads separate the notion of a sequential execution stream from the other aspects of traditional UNIX-like processes, such as address spaces and I/O descriptors. The objective of this separation is to make the expression and control of parallelism sufficiently cheap that the programmer or compiler can exploit even fine-grained parallelism with acceptable overhead. Threads can be supported either by the operating system kernel or by user-level library code in the application address space, but neither approach has been fully satisfactory. This paper addresses this dilemma. First, we argue that the performance of kernel threads is inherently worse than that of user-level threads, rather than this being an artifact of existing implementations; we thus argue that managing par- allelism at the user level is essential to high-performance parallel computing. Next, we argue that the lack of system integration exhibited by user-level threads is a consequence of the lack of kernel support for user-level threads provided by contemporary multiprocessor operating systems; we thus argue that kernel threads or processes, as currently conceived, are the wrong abstraction on which to support user- level management of parallelism. Finally, we describe the design, implementation, and performance of a new kernel interface and user-level thread package that together provide the same functionality as kernel threads without compromis- ing the performance and flexibility advantages of user-level management of parallelism.
A Decomposition of Multi-Dimensional Point Sets with Applications to k-Nearest-Neighbors and n-Body Potential Fields
- J. ACM
, 1992
"... We define the notion of a well-separated pair decomposition of points in d-dimensional space. We then develop efficient sequential and parallel algorithms for computing such a decomposition. We apply the resulting decomposition to the efficient computation of k-nearest neighbors and n-body potential ..."
Abstract
-
Cited by 214 (4 self)
- Add to MetaCart
We define the notion of a well-separated pair decomposition of points in d-dimensional space. We then develop efficient sequential and parallel algorithms for computing such a decomposition. We apply the resulting decomposition to the efficient computation of k-nearest neighbors and n-body potential fields.
The University of Florida sparse matrix collection
- NA DIGEST
, 1997
"... The University of Florida Sparse Matrix Collection is a large, widely available, and actively growing set of sparse matrices that arise in real applications. Its matrices cover a wide spectrum of problem domains, both those arising from problems with underlying 2D or 3D geometry (structural enginee ..."
Abstract
-
Cited by 205 (8 self)
- Add to MetaCart
The University of Florida Sparse Matrix Collection is a large, widely available, and actively growing set of sparse matrices that arise in real applications. Its matrices cover a wide spectrum of problem domains, both those arising from problems with underlying 2D or 3D geometry (structural engineering, computational fluid dynamics, model reduction, electromagnetics, semiconductor devices, thermodynamics, materials, acoustics, computer graphics/vision, robotics/kinematics, and other discretizations) and those that typically do not have such geometry (optimization, circuit simulation, networks and graphs, economic and financial modeling, theoretical and quantum chemistry, chemical process simulation, mathematics and statistics, and power networks). The collection meets a vital need that artificially-generated matrices cannot meet, and is widely used by the sparse matrix algorithms community for the development and performance evaluation of sparse matrix algorithms. The collection includes software for accessing and managing the collection, from MATLAB, Fortran, and C.
The PARSEC benchmark suite: Characterization and architectural implications
- IN PRINCETON UNIVERSITY
, 2008
"... This paper presents and characterizes the Princeton Application Repository for Shared-Memory Computers (PARSEC), a benchmark suite for studies of Chip-Multiprocessors (CMPs). Previous available benchmarks for multiprocessors have focused on high-performance computing applications and used a limited ..."
Abstract
-
Cited by 150 (1 self)
- Add to MetaCart
This paper presents and characterizes the Princeton Application Repository for Shared-Memory Computers (PARSEC), a benchmark suite for studies of Chip-Multiprocessors (CMPs). Previous available benchmarks for multiprocessors have focused on high-performance computing applications and used a limited number of synchronization methods. PARSEC includes emerging applications in recognition, mining and synthesis (RMS) as well as systems applications which mimic large-scale multithreaded commercial programs. Our characterization shows that the benchmark suite covers a wide spectrum of working sets, locality, data sharing, synchronization and off-chip traffic. The benchmark suite has been made available to the public.
NAMD2: Greater Scalability for Parallel Molecular Dynamics
- JOURNAL OF COMPUTATIONAL PHYSICS
, 1998
"... Molecular dynamics programs simulate the behavior of biomolecular systems, leading to insights and understanding of their functions. However, the computational complexity of such simulations is enormous. Parallel machines provide the potential to meet this computational challenge. To harness this ..."
Abstract
-
Cited by 136 (31 self)
- Add to MetaCart
Molecular dynamics programs simulate the behavior of biomolecular systems, leading to insights and understanding of their functions. However, the computational complexity of such simulations is enormous. Parallel machines provide the potential to meet this computational challenge. To harness this potential, it is necessary to develop a scalable program. It is also necessary that the program be easily modified by application-domain programmers. The
Fast Parallel Algorithms for Short-Range Molecular Dynamics
- JOURNAL OF COMPUTATIONAL PHYSICS
, 1995
"... Three parallel algorithms for classical molecular dynamics are presented. The first assigns each processor a fixed subset of atoms; the second assigns each a fixed subset of inter-atomic forces to compute; the third assigns each a fixed spatial region. The algorithms are suitable for molecular dyn ..."
Abstract
-
Cited by 128 (6 self)
- Add to MetaCart
Three parallel algorithms for classical molecular dynamics are presented. The first assigns each processor a fixed subset of atoms; the second assigns each a fixed subset of inter-atomic forces to compute; the third assigns each a fixed spatial region. The algorithms are suitable for molecular dynamics models which can be difficult to parallelize efficiently -- those with short-range forces where the neighbors of each atom change rapidly. They can be implemented on any distributed--memory parallel machine which allows for message--passing of data between independently executing processors. The algorithms are tested on a standard Lennard-Jones benchmark problem for system sizes ranging from 500 to 100,000,000 atoms on several parallel supercomputers -- the nCUBE 2, Intel iPSC/860 and Paragon, and Cray T3D. Comparing the results to the fastest reported vectorized Cray Y-MP and C90 algorithm shows that the current generation of parallel machines is competitive with conventi...
Hoard: A Scalable Memory Allocator for Multithreaded Applications
, 2000
"... Parallel, multithreaded C and C++ programs such as web servers, database managers, news servers, and scientific applications are becoming increasingly prevalent. For these applications, the memory allocator is often a bottleneck that severely limits program performance and scalability on multiproces ..."
Abstract
-
Cited by 93 (14 self)
- Add to MetaCart
Parallel, multithreaded C and C++ programs such as web servers, database managers, news servers, and scientific applications are becoming increasingly prevalent. For these applications, the memory allocator is often a bottleneck that severely limits program performance and scalability on multiprocessor systems. Previous allocators suffer from problems that include poor performance and scalability, and heap organizations that introduce false sharing. Worse, many allocators exhibit a dramatic increase in memory consumption when confronted with a producer-consumer pattern of object allocation and freeing. This increase in memory consumption can range from a factor of P (the number of processors) to unbounded memory consumption.
Software Caching and Computation Migration in Olden
, 1995
"... The goal of the Olden project is to build a system that provides parallelism for general purpose C programs with minimal programmer annotations. We focus on programs using dynamic structures such as trees, lists, and DAGs. We demonstrate that providing both software caching and computation migratio ..."
Abstract
-
Cited by 84 (0 self)
- Add to MetaCart
The goal of the Olden project is to build a system that provides parallelism for general purpose C programs with minimal programmer annotations. We focus on programs using dynamic structures such as trees, lists, and DAGs. We demonstrate that providing both software caching and computation migration can improve the performance of these programs, and provide a compile-time heuristic that selects between them for each pointer dereference. We have implemented a prototype system on the Thinking Machines CM-5. We describe our implementation and report on experiments with ten benchmarks.
BSPlib: The BSP Programming Library
, 1998
"... BSPlib is a small communications library for bulk synchronous parallel (BSP) programming which consists of only 20 basic operations. This paper presents the full definition of BSPlib in C, motivates the design of its basic operations, and gives examples of their use. The library enables programming ..."
Abstract
-
Cited by 80 (5 self)
- Add to MetaCart
BSPlib is a small communications library for bulk synchronous parallel (BSP) programming which consists of only 20 basic operations. This paper presents the full definition of BSPlib in C, motivates the design of its basic operations, and gives examples of their use. The library enables programming in two distinct styles: direct remote memory access using put or get operations, and bulk synchronous message passing. Currently, implementations of BSPlib exist for a variety of modern architectures, including massively parallel computers with distributed memory, shared memory multiprocessors, and networks of workstations. BSPlib has been used in several scientific and industrial applications; this paper briefly describes applications in benchmarking, Fast Fourier Transforms, sorting, and molecular dynamics. Key words: Bulk synchronous parallel, parallel communications library, one-sided communication. 1 Introduction Since the earliest days of computing it has been clear that, sooner or lat...
Meta-level Programming with CodA
- IN PROCEEDINGS OF ECOOP'95, VOLUME LNCS 952
, 1995
"... Meta-levels are complex pieces of software with diverse demands in both the computation and interaction domains. Common techniques using just code to express behaviour fail to clearly assign responsibility for a particular behaviour's definition or to provide support for the reuse or integration o ..."
Abstract
-
Cited by 74 (1 self)
- Add to MetaCart
Meta-levels are complex pieces of software with diverse demands in both the computation and interaction domains. Common techniques using just code to express behaviour fail to clearly assign responsibility for a particular behaviour's definition or to provide support for the reuse or integration of existing behaviour descriptions. The techniques of ne-grained decomposition of meta-level behaviour into objects and their subsequent composition into object models provides a framework for creating, reusing and integrating complex object behaviours. Using such a framework, we show that users can develop and integrate quite different object models while retaining a high degree of abstraction and fostering meta-level component reuse.

