Results 1  10
of
550,573
Dryad: Distributed DataParallel Programs from Sequential Building Blocks
 In EuroSys
, 2007
"... Dryad is a generalpurpose distributed execution engine for coarsegrain dataparallel applications. A Dryad application combines computational “vertices ” with communication “channels ” to form a dataflow graph. Dryad runs the application by executing the vertices of this graph on a set of availa ..."
Abstract

Cited by 730 (27 self)
 Add to MetaCart
Dryad is a generalpurpose distributed execution engine for coarsegrain dataparallel applications. A Dryad application combines computational “vertices ” with communication “channels ” to form a dataflow graph. Dryad runs the application by executing the vertices of this graph on a set
Fast Parallel Algorithms for ShortRange Molecular Dynamics
 JOURNAL OF COMPUTATIONAL PHYSICS
, 1995
"... Three parallel algorithms for classical molecular dynamics are presented. The first assigns each processor a fixed subset of atoms; the second assigns each a fixed subset of interatomic forces to compute; the third assigns each a fixed spatial region. The algorithms are suitable for molecular dyn ..."
Abstract

Cited by 622 (6 self)
 Add to MetaCart
Three parallel algorithms for classical molecular dynamics are presented. The first assigns each processor a fixed subset of atoms; the second assigns each a fixed subset of interatomic forces to compute; the third assigns each a fixed spatial region. The algorithms are suitable for molecular
LogP: Towards a Realistic Model of Parallel Computation
, 1993
"... A vast body of theoretical research has focused either on overly simplistic models of parallel computation, notably the PRAM, or overly specific models that have few representatives in the real world. Both kinds of models encourage exploitation of formal loopholes, rather than rewarding developme ..."
Abstract

Cited by 562 (15 self)
 Add to MetaCart
development of techniques that yield performance across a range of current and future parallel machines. This paper offers a new parallel machine model, called LogP, that reflects the critical technology trends underlying parallel computers. It is intended to serve as a basis for developing fast, portable
UNet: A UserLevel Network Interface for Parallel and Distributed Computing
 In Fifteenth ACM Symposium on Operating System Principles
, 1995
"... The UNet communication architecture provides processes with a virtual view of a network interface to enable userlevel access to highspeed communication devices. The architecture, implemented on standard workstations using offtheshelf ATM communication hardware, removes the kernel from the communi ..."
Abstract

Cited by 596 (17 self)
 Add to MetaCart
The UNet communication architecture provides processes with a virtual view of a network interface to enable userlevel access to highspeed communication devices. The architecture, implemented on standard workstations using offtheshelf ATM communication hardware, removes the kernel from the communication path, while still providing full protection. The model presented by UNet allows for the construction of protocols at user level whose performance is only limited by the capabilities of network. The architecture is extremely flexible in the sense that traditional protocols like TCP and UDP, as well as novel abstractions like Active Messages can be implemented efficiently. A UNet prototype on an 8node ATM cluster of standard workstations offers 65 microseconds roundtrip latency and 15 Mbytes/sec bandwidth. It achieves TCP performance at maximum network bandwidth and demonstrates performance equivalent to Meiko CS2 and TMC CM5 supercomputers on a set of SplitC benchmarks. 1
The program dependence graph and its use in optimization
 ACM Transactions on Programming Languages and Systems
, 1987
"... In this paper we present an intermediate program representation, called the program dependence graph (PDG), that makes explicit both the data and control dependence5 for each operation in a program. Data dependences have been used to represent only the relevant data flow relationships of a program. ..."
Abstract

Cited by 989 (3 self)
 Add to MetaCart
. Control dependence5 are introduced to analogously represent only the essential control flow relationships of a program. Control dependences are derived from the usual control flow graph. Many traditional optimizations operate more efficiently on the PDG. Since dependences in the PDG connect
Optimization Flow Control, I: Basic Algorithm and Convergence
 IEEE/ACM TRANSACTIONS ON NETWORKING
, 1999
"... We propose an optimization approach to flow control where the objective is to maximize the aggregate source utility over their transmission rates. We view network links and sources as processors of a distributed computation system to solve the dual problem using gradient projection algorithm. In thi ..."
Abstract

Cited by 690 (64 self)
 Add to MetaCart
We propose an optimization approach to flow control where the objective is to maximize the aggregate source utility over their transmission rates. We view network links and sources as processors of a distributed computation system to solve the dual problem using gradient projection algorithm
Interior Point Methods in Semidefinite Programming with Applications to Combinatorial Optimization
 SIAM Journal on Optimization
, 1993
"... We study the semidefinite programming problem (SDP), i.e the problem of optimization of a linear function of a symmetric matrix subject to linear equality constraints and the additional condition that the matrix be positive semidefinite. First we review the classical cone duality as specialized to S ..."
Abstract

Cited by 557 (12 self)
 Add to MetaCart
We study the semidefinite programming problem (SDP), i.e the problem of optimization of a linear function of a symmetric matrix subject to linear equality constraints and the additional condition that the matrix be positive semidefinite. First we review the classical cone duality as specialized
Near Optimal Signal Recovery From Random Projections: Universal Encoding Strategies?
, 2004
"... Suppose we are given a vector f in RN. How many linear measurements do we need to make about f to be able to recover f to within precision ɛ in the Euclidean (ℓ2) metric? Or more exactly, suppose we are interested in a class F of such objects— discrete digital signals, images, etc; how many linear m ..."
Abstract

Cited by 1513 (20 self)
 Add to MetaCart
Suppose we are given a vector f in RN. How many linear measurements do we need to make about f to be able to recover f to within precision ɛ in the Euclidean (ℓ2) metric? Or more exactly, suppose we are interested in a class F of such objects— discrete digital signals, images, etc; how many linear measurements do we need to recover objects from this class to within accuracy ɛ? This paper shows that if the objects of interest are sparse or compressible in the sense that the reordered entries of a signal f ∈ F decay like a powerlaw (or if the coefficient sequence of f in a fixed basis decays like a powerlaw), then it is possible to reconstruct f to within very high accuracy from a small number of random measurements. typical result is as follows: we rearrange the entries of f (or its coefficients in a fixed basis) in decreasing order of magnitude f  (1) ≥ f  (2) ≥... ≥ f  (N), and define the weakℓp ball as the class F of those elements whose entries obey the power decay law f  (n) ≤ C · n −1/p. We take measurements 〈f, Xk〉, k = 1,..., K, where the Xk are Ndimensional Gaussian
Wattch: A Framework for ArchitecturalLevel Power Analysis and Optimizations
 In Proceedings of the 27th Annual International Symposium on Computer Architecture
, 2000
"... Power dissipation and thermal issues are increasingly significant in modern processors. As a result, it is crucial that power/performance tradeoffs be made more visible to chip architects and even compiler writers, in addition to circuit designers. Most existing power analysis tools achieve high ..."
Abstract

Cited by 1295 (43 self)
 Add to MetaCart
Power dissipation and thermal issues are increasingly significant in modern processors. As a result, it is crucial that power/performance tradeoffs be made more visible to chip architects and even compiler writers, in addition to circuit designers. Most existing power analysis tools achieve high accuracy by calculating power estimates for designs only after layout or floorplanning are complete In addition to being available only late in the design process, such tools are often quite slow, which compounds the difficulty of running them for a large space of design possibilities.
Scheduler Activations: Effective Kernel Support for the UserLevel Management of Parallelism
 ACM Transactions on Computer Systems
, 1992
"... Threads are the vehicle,for concurrency in many approaches to parallel programming. Threads separate the notion of a sequential execution stream from the other aspects of traditional UNIXlike processes, such as address spaces and I/O descriptors. The objective of this separation is to make the expr ..."
Abstract

Cited by 475 (21 self)
 Add to MetaCart
Threads are the vehicle,for concurrency in many approaches to parallel programming. Threads separate the notion of a sequential execution stream from the other aspects of traditional UNIXlike processes, such as address spaces and I/O descriptors. The objective of this separation is to make
Results 1  10
of
550,573