Results 1 - 10
of
1,909
Analysis of Optimal Thread Pool Size
- ACM SIGOPS Operating Systems Review
, 2000
"... The success of e-commerce, messaging middleware, and other Internet-based applications depends in part on the ability of network servers to respond in a timely and reliable manner to simultaneous service requests. Multithreaded systems, due to their efficient use of system resources and the populari ..."
Cited by 7 (0 self)
resources are wasted maintaining the thread pool. If the thread pool is too small, then additional threads must be created and destroyed on the fly to handle new requests. We analytically determine the optimal thread pool size to maximize the expected gain of using a thread.
An Evaluation of Optimized Threaded Code Generation
- In Proc. of the Conf. on Parallel Architectures and Compilation Techniques
, 1994
"... Multithreaded architectures hold many promises: the exploitation of intra-thread locality and the latency tolerance of multithreaded synchronization can result in a more efficient processor utilization and higher scalability. The challenge for a code generation scheme is to make effective use of t ..."
Cited by 16 (10 self)
only a limited view of the code at any one time limits the thread size. These top-down generated threads can therefore be optimized by global, bottom-up optimization techniques. In this paper, we present such bottom-up optimizations and evaluate their effectiveness in terms of overall performance
Optimizing Threaded MPI Execution on SMP Clusters
- In Proc. of 15th ACM International Conference on Supercomputing
, 2001
"... Our previous work has shown that using threads to execute MPI programs can yield great performance gain on multiprogrammed shared-memory machines. This paper investigates the design and implementation of a thread-based MPI system on SMP clusters. Our study indicates that with a proper design for thr ..."
Cited by 30 (1 self)
Best Practices for Developing and Optimizing Threaded Applications
"... While threading can be a challenge, new software development tools help simplify the process by identifying thread correctness issues and performance opportunities. We present a methodology that has been used to successfully thread many applications and discuss tools that can assist in developing mu ..."
multi-threaded applications. Microprocessor design is experiencing a shift away from a predominant focus on pure performance to a balanced approach that optimizes for power as well as performance. Multi-core processors are capable of greater performance with optimal power consumption by concurrently
Optimal Thread-to-Core Mapping for Pipeline Programs
"... Pipelining is commonly used in multi-threaded code. In pipeline programs, the computation is divided into stages that perform different types of computations. Unlike in a data parallel program, threads in a pipeline program have different behavior. Because of the asymmetry, the performance varies si ..."
Cited by 1 (1 self)
Composable memory transactions
- In Symposium on Principles and Practice of Parallel Programming (PPoPP)
, 2005
"... Atomic blocks allow programmers to delimit sections of code as ‘atomic’, leaving the language’s implementation to enforce atomicity. Existing work has shown how to implement atomic blocks over word-based transactional memory that provides scalable multiprocessor performance without requiring changes ..."
Cited by 509 (43 self)
leaving the block. This paper takes a four-pronged approach to improving performance: (1) we introduce a new ‘direct access’ implementation that avoids searching thread-private logs, (2) we develop compiler optimizations to reduce the amount of logging (e.g. when a thread accesses the same data
Scheduling Multithreaded Computations by Work Stealing
, 1994
"... This paper studies the problem of efficiently scheduling fully strict (i.e., well-structured) multithreaded computations on parallel computers. A popular and practical method of scheduling this kind of dynamic MIMD-style computation is “work stealing,” in which processors needing work steal com ..."
Cited by 568 (34 self)
of the algorithm is at most $O(T_\infty S_{\max} P)$, where $S_{\max}$ is the size of the largest activation record of any thread, thereby justifying the folk wisdom that work-stealing schedulers are more communication efficient than their work-sharing counterparts. All three of these bounds are existentially optimal to within a
Automatically characterizing large scale program behavior
, 2002
"... Understanding program behavior is at the foundation of computer architecture and program optimization. Many programs have wildly different behavior on even the very largest of scales (over the complete execution of the program). This realization has ramifications for many architectural and compile ..."
Cited by 778 (41 self)
compiler techniques, from thread scheduling, to feedback directed optimizations, to the way programs are simulated. However, in order to take advantage of time-varying behavior, we must first develop the analytical tools necessary to automatically and efficiently analyze program behavior over large sections
Evaluating and Optimizing Thread Pool Strategies for Real-Time CORBA
- Proc. of the ACM SIGPLAN Workshop on Language, Compiler and Tool Support for Embedded Systems
, 2000
"... Strict control over the scheduling and execution of processor resources is essential for many fixed-priority real-time applications. To facilitate this common requirement, the Real-Time CORBA (RT-CORBA) specification defines standard middleware features that support end-to-end predictability for ope ..."
Cited by 10 (0 self)
of endpoints and event demultiplexers required, (3) efficiency in terms of data movement, context switches, memory allocations, and synchronizations required, (4) optimizations in terms of stack and thread specific storage memory allocations, and (5) bounded and unbounded priority inversion incurred in each