Results 1 - 10 of 1,393
Algorithms for Scalable Synchronization on Shared-Memory Multiprocessors
- ACM Transactions on Computer Systems, 1991
"... Busy-wait techniques are heavily used for mutual exclusion and barrier synchronization in shared-memory parallel programs. Unfortunately, typical implementations of busy-waiting tend to produce large amounts of memory and interconnect contention, introducing performance bottlenecks that become marke ..."
Cited by 573 (32 self)
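The busy-wait entry describes spin waiting that floods memory and the interconnect when every waiter polls one shared location. One well-known remedy is a queue-based lock in which each waiter spins only on a flag in its own queue node (the style usually called an MCS lock). The Python sketch below only models that idea. Python has no atomic swap or compare-and-swap, so a small `threading.Lock` stands in for those hardware primitives (an assumption of this sketch), and `time.sleep(0)` yields the interpreter inside spin loops.

```python
import threading
import time


class QNode:
    """Per-thread queue node; a waiter spins only on its own `locked` flag."""
    __slots__ = ("locked", "next")

    def __init__(self):
        self.locked = False
        self.next = None


class MCSLock:
    def __init__(self):
        self.tail = None                  # last node in the waiting queue, or None if free
        self._atomic = threading.Lock()   # stand-in for hardware swap/CAS (assumption)

    def _swap(self, new):
        # Atomically store `new` into tail and return the previous tail.
        with self._atomic:
            old, self.tail = self.tail, new
            return old

    def _compare_and_swap(self, expected, new):
        # Atomically replace tail with `new` only if it is still `expected`.
        with self._atomic:
            if self.tail is expected:
                self.tail = new
                return True
            return False

    def acquire(self, node):
        node.next = None
        node.locked = True
        pred = self._swap(node)           # join the end of the queue
        if pred is not None:
            pred.next = node              # let the predecessor find us
            while node.locked:            # spin on our own node only
                time.sleep(0)

    def release(self, node):
        if node.next is None:
            if self._compare_and_swap(node, None):
                return                    # no waiter: the lock is now free
            while node.next is None:      # a waiter is still linking itself in
                time.sleep(0)
        node.next.locked = False          # hand the lock directly to the successor
```

Each thread allocates one `QNode` and passes the same node to `acquire` and `release`; the handoff writes only the successor's flag, so waiters do not all poll one shared word.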
Simultaneous Multithreading: Maximizing On-Chip Parallelism, 1995
"... This paper examines simultaneous multithreading, a technique permitting several independent threads to issue instructions to a superscalar’s multiple functional units in a single cycle. We present several models of simultaneous multithreading and compare them with alternative organizations: a wide s ..."
Cited by 823 (48 self)
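The SMT entry describes several independent threads issuing instructions to a superscalar's functional units in the same cycle. The toy issue model below contrasts one thread issuing alone with leftover issue slots being filled from other threads. The per-cycle ready counts are made-up inputs, and the function name, the fixed thread-order fill policy, and the neglect of fetch, dependences, and functional-unit types are all simplifying assumptions.

```python
def instructions_issued(ready, width, smt):
    """Total instructions issued by a toy `width`-wide machine.

    ready[t][c] is how many independent instructions thread t has ready in
    cycle c (hypothetical numbers). Without SMT only thread 0 may issue; with
    SMT, slots left over by one thread are filled from the next, in thread order.
    """
    total = 0
    for c in range(len(ready[0])):
        free = width                          # issue slots available this cycle
        for t in range(len(ready) if smt else 1):
            take = min(free, ready[t][c])
            total += take
            free -= take
    return total
```

Dividing the result by `width * cycles` gives slot utilization, which is the quantity that filling leftover slots from other threads is meant to raise.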
X-means: Extending K-means with Efficient Estimation of the Number of Clusters
- In Proceedings of the 17th International Conf. on Machine Learning, 2000
"... Despite its popularity for general clustering, K-means suffers from three major shortcomings: it scales poorly computationally, the number of clusters K has to be supplied by the user, and the search is prone to local minima. We propose solutions for the first two problems, and a partial remedy for the third. Building on prior work for algorithmic acceleration that is not based on approximation, we introduce a new algorithm that efficiently searches the space of cluster locations and number of clusters to optimize the Bayesian Information Criterion (BIC) or the Akaike Information Criterion (AIC ..."
Cited by 418 (5 self)
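The X-means entry scores candidate clusterings with the Bayesian Information Criterion. The sketch below computes a BIC score for a hard K-cluster partition, assuming identical spherical Gaussian clusters that share one variance, estimated by maximum likelihood as SSE / (M · R) for R points in M dimensions. That estimator and the name `bic_score` are assumptions of this sketch; the paper's own formulation and search procedure are not reproduced.

```python
import math


def bic_score(points, labels, k):
    """BIC of a hard k-cluster partition (higher is better).

    Assumes spherical Gaussian clusters sharing one variance, estimated by
    maximum likelihood as SSE / (M * R).
    """
    r, m = len(points), len(points[0])
    sizes = [0] * k
    sums = [[0.0] * m for _ in range(k)]
    for p, c in zip(points, labels):
        sizes[c] += 1
        for d in range(m):
            sums[c][d] += p[d]
    centroids = [[s / n for s in row] if n else row for row, n in zip(sums, sizes)]

    def dist2(p, c):
        return sum((p[d] - centroids[c][d]) ** 2 for d in range(m))

    sse = sum(dist2(p, c) for p, c in zip(points, labels))
    var = sse / (m * r)
    loglik = sum(
        math.log(sizes[c] / r)                    # mixing weight of the point's cluster
        - 0.5 * m * math.log(2 * math.pi * var)   # Gaussian normalizing constant
        - dist2(p, c) / (2 * var)                 # squared distance to its centroid
        for p, c in zip(points, labels)
    )
    n_params = (k - 1) + m * k + 1   # mixing weights, centroid coordinates, variance
    return loglik - 0.5 * n_params * math.log(r)
```

A split-or-keep decision can compare the score of one cluster against the score of the same points divided into two; the penalty term `0.5 * n_params * log(R)` is what stops extra clusters from always winning.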
Cache-Aware Scheduling and Analysis for Multicores
"... The major obstacle to using multicores for real-time applications is that we cannot predict, or provide any guarantee on, the real-time properties of embedded software on such platforms; the way of handling on-chip shared resources such as the L2 cache may have a significant impact on timing predictability. In this paper, we propose to use cache space isolation techniques to avoid cache contention for hard real-time tasks running on multicores with shared caches. We present a scheduling strategy for real-time tasks with both timing and cache space constraints, which allows each task to use a fixed ..."
Cited by 35 (5 self)
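The scheduling entry gives each task a fixed amount of isolated cache space, so a task may run only when both a core and enough unallocated cache partitions are free. A minimal dispatcher under that constraint is sketched below. The name `dispatch`, the `(priority, name, partitions)` tuple format, and the choice to let a lower-priority task run when a higher-priority one does not fit are assumptions of this sketch, not necessarily the paper's policy.

```python
def dispatch(ready, cores, cache_partitions):
    """Choose which ready tasks run now.

    ready: list of (priority, name, partitions_needed); a smaller priority
    number means more urgent. Each running task gets its own partitions, so
    running tasks never share cache space.
    """
    running, used = [], 0
    for priority, name, need in sorted(ready):
        if len(running) == cores:
            break                                  # every core is busy
        if used + need <= cache_partitions:        # enough isolated cache left?
            running.append(name)
            used += need
    return running
```

Skipping a task that does not fit keeps cores busy; a stricter variant would stop at the first task that does not fit, trading utilization for simpler priority reasoning.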
Cache-Conscious Data Placement
- In Proceedings of the Eighth International Conference on Architectural Support for Programming Languages and Operating Systems, 1998
"... As the gap between memory and processor speeds continues to widen, cache efficiency is an increasingly important component of processor performance. Compiler techniques have been used to improve instruction cache performance by mapping code with temporal locality to different cache blocks in the virtual address space, eliminating cache conflicts. These code placement techniques can be applied directly to the problem of placing data for improved data cache performance. In this paper we present a general framework for Cache-Conscious Data Placement. This is a compiler-directed approach that creates ..."
Cited by 163 (4 self)
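The data-placement entry rests on the observation that objects mapping to the same cache sets evict each other. The toy model below assumes a direct-mapped cache with 64-byte lines and 512 sets (32 KiB); those sizes and the function names are assumptions. It shows that two hot 4 KiB objects placed 32 KiB apart land on exactly the same sets, while packing hot objects contiguously keeps them in disjoint sets as long as their total size fits in the cache. It illustrates the conflict reasoning only, not the compiler framework the paper describes.

```python
LINE_BYTES = 64    # cache line size (assumed)
NUM_SETS = 512     # sets in a direct-mapped cache: 64 * 512 = 32 KiB (assumed)


def sets_touched(base, size):
    """Cache sets occupied by an object of `size` bytes starting at address `base`."""
    first_line = base // LINE_BYTES
    last_line = (base + size - 1) // LINE_BYTES
    return {line % NUM_SETS for line in range(first_line, last_line + 1)}


def pack_hot_objects(sizes, base=0):
    """Give frequently used objects consecutive, line-aligned addresses.

    `base` is assumed to be line-aligned.
    """
    addresses, addr = [], base
    for size in sizes:
        addresses.append(addr)
        addr += -(-size // LINE_BYTES) * LINE_BYTES   # round up to whole lines
    return addresses
```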
Dealing with disaster: Surviving misbehaved kernel extensions
- In OSDI, 1996
"... Today’s extensible operating systems allow applications to modify kernel behavior by providing mechanisms for application code to run in the kernel address space. The advantage of this approach is that it provides improved application flexibility and performance; the disadvantage is that buggy or ma ..."
Cited by 276 (9 self)
Shasta: A Low Overhead, Software-Only Approach for Supporting Fine-Grain Shared Memory
"... This paper describes Shasta, a system that supports a shared address space in software on clusters of computers with physically distributed memory. A unique aspect of Shasta compared to most other software distributed shared memory systems is that shared data can be kept coherent at a fine granulari ..."
Cited by 236 (5 self)
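The Shasta entry keeps shared data coherent in software at a fine granularity. A common way to do that is a check before each shared access that consults a per-block state table and calls a protocol handler only on a miss. The model below covers loads only. The 64-byte block size, the `fetch_block` callback, the class name, and the two-state table are hypothetical, and stores, ownership, and the home directory are omitted.

```python
INVALID, SHARED = "invalid", "shared"
BLOCK = 64   # coherence granularity in bytes (hypothetical value)


class SoftDSMNode:
    """One processor's view of shared memory, with a software access check."""

    def __init__(self, fetch_block):
        self.fetch_block = fetch_block   # hypothetical: block number -> bytes from its home node
        self.copies = {}                 # block number -> local bytearray copy
        self.state = {}                  # block number -> coherence state
        self.misses = 0

    def load(self, addr):
        blk, offset = divmod(addr, BLOCK)
        # The check performed before a shared load: is the local copy valid?
        if self.state.get(blk, INVALID) == INVALID:
            self.misses += 1             # miss: run the (simplified) protocol handler
            self.copies[blk] = bytearray(self.fetch_block(blk))
            self.state[blk] = SHARED
        return self.copies[blk][offset]

    def invalidate(self, blk):
        """Called when another node writes the block."""
        self.state[blk] = INVALID
```

Because the unit of coherence is a small block rather than a page, a write elsewhere invalidates only the 64 bytes it touches.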
State-Space Caching Revisited, 1992
"... State-space caching is a verification technique for finite-state concurrent systems. It performs an exhaustive exploration of the state-space of the system being checked while storing only all states of just one execution sequence plus as many other previously visited states as available memory a ..."
Cited by 30 (1 self)
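The state-space caching entry can be sketched as an exhaustive depth-first search that always stores the states of the current execution sequence (which guarantees termination on cyclic graphs) plus a bounded cache of other visited states. Evicting the oldest cached state first, and the names `reaches_bad_state` and `cache_size`, are assumptions of this sketch.

```python
from collections import OrderedDict


def reaches_bad_state(initial, successors, is_bad, cache_size):
    """Exhaustive DFS storing the current path plus at most `cache_size` other states."""
    on_path = set()          # always stored: prevents looping forever on cycles
    cache = OrderedDict()    # other fully explored states, oldest first

    def dfs(state):
        if is_bad(state):
            return True
        on_path.add(state)
        for nxt in successors(state):
            if nxt in on_path or nxt in cache:
                continue     # already on the path or already fully explored
            if dfs(nxt):
                return True
        on_path.discard(state)
        cache[state] = None              # cache the explored state...
        if len(cache) > cache_size:
            cache.popitem(last=False)    # ...evicting the oldest if over capacity
        return False

    return dfs(initial)
```

With `cache_size=0` the search still terminates but may re-explore states many times; a larger cache trades memory for less repeated work.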
Optimizing dynamically-typed object-oriented languages with polymorphic inline caches, 1991
"... We have developed and implemented techniques that double the performance of dynamically-typed object-oriented languages. Our SELF implementation runs twice as fast as the fastest Smalltalk implementation, despite SELF’s lack of classes and explicit variables. To compensate for the absence ..."
Cited by 130 (10 self)
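A polymorphic inline cache lets one call site remember several receiver types and their methods, so repeated sends skip the full method lookup. SELF generates such caches as machine-code stubs; the Python class below only models the dispatch logic. `CallSite`, `max_entries`, and the behavior once the cache is full (keep doing full lookups without caching) are hypothetical choices for illustration.

```python
class CallSite:
    """One message-send site with a polymorphic inline cache."""

    def __init__(self, selector, max_entries=4):
        self.selector = selector        # method name sent at this site
        self.entries = []               # (receiver type, method) pairs, tested in order
        self.max_entries = max_entries  # hypothetical limit before the site stops caching
        self.misses = 0

    def invoke(self, receiver, *args):
        cls = type(receiver)
        for cached_cls, method in self.entries:
            if cached_cls is cls:       # fast path: a type test hit
                return method(receiver, *args)
        self.misses += 1
        method = getattr(cls, self.selector)   # slow path: full method lookup
        if len(self.entries) < self.max_entries:
            self.entries.append((cls, method))
        return method(receiver, *args)
```

A site that only ever sees one or two receiver types misses once per type and then stays on the fast path.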
Cache Conscious Indexing for Decision-Support in Main Memory, 1999
"... We study indexing techniques for main memory, including hash indexes, binary search trees, T-trees, B+-trees, interpolation search, and binary search on arrays. In a decision-support context, our primary concerns are the lookup time, and the space occupied by the index structure. Our goal is to provide faster lookup times than binary search by paying attention to reference locality and cache behavior, without using substantial extra space. We propose a new indexing technique called "Cache-Sensitive Search Trees" (CSS-trees). Our technique stores a directory structure on top of a sorted ..."
Cited by 106 (9 self)
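The CSS-tree entry describes a directory stored on top of a sorted array. The sketch below is a simplified pointerless directory in that spirit: each node holds `fanout - 1` separator keys, and the position of a child is computed arithmetically (`node * fanout + j`) instead of being stored, which saves the space child pointers would take. The names, the leaf size equal to `fanout`, and the `+inf` padding are assumptions; the paper's exact node layout is not reproduced.

```python
import math


def build_directory(data, fanout):
    """Internal levels of the directory, root first; each node has fanout-1 separators."""
    # Smallest key of each leaf block of `fanout` consecutive sorted entries.
    mins = [data[i] for i in range(0, len(data), fanout)]
    levels = []
    while len(mins) > 1:
        keys, parent_mins = [], []
        for start in range(0, len(mins), fanout):
            children = mins[start:start + fanout]
            parent_mins.append(children[0])
            # Separator j is the smallest key under child j+1; +inf pads missing children.
            keys.extend(children[1:] + [math.inf] * (fanout - len(children)))
        levels.append(keys)
        mins = parent_mins
    levels.reverse()
    return levels


def css_search(data, levels, fanout, x):
    """Index of `x` in the sorted list `data`, or -1 if it is absent."""
    node = 0
    for keys in levels:
        base = node * (fanout - 1)
        j = 0
        while j < fanout - 1 and x >= keys[base + j]:
            j += 1
        node = node * fanout + j          # child position computed, not stored
    start = node * fanout                 # first entry of the chosen leaf block
    for i in range(start, min(start + fanout, len(data))):
        if data[i] == x:
            return i
    return -1
```

In a cache-conscious implementation the fanout would be chosen so that one node's separator keys fill a cache line, making each directory level cost about one cache miss.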