Virtual Memory Primitives for User Programs
, 1991
Cited by 170 (2 self)
Memory Management Units (MMUs) are traditionally used by operating systems to implement disk-paged virtual memory. Some operating systems allow user programs to specify the protection level (inaccessible, read-only, read-write) of pages, and allow user programs to handle protection violations, but these mechanisms are not always robust, efficient, or well-matched to the needs of applications.
Practical, transparent operating system support for superpages
- SIGOPS Oper. Syst. Rev
, 2002
Cited by 32 (3 self)
Most general-purpose processors provide support for memory pages of large sizes, called superpages. Superpages enable each entry in the translation lookaside buffer (TLB) to map a large physical memory region into a virtual address space. This dramatically increases TLB coverage, reduces TLB misses, and promises performance improvements for many applications. However, supporting superpages poses several challenges to the operating system, in terms of superpage allocation and promotion tradeoffs, fragmentation control, etc. We analyze these issues, and propose the design of an effective superpage management system. We implement it in FreeBSD on the Alpha CPU, and evaluate it on real workloads and benchmarks. We obtain substantial performance benefits, often exceeding 30%; these benefits are sustained even under stressful workload scenarios.
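The claim that superpages increase TLB coverage comes down to simple arithmetic: coverage is the number of TLB entries times the size of the page each entry maps. The sketch below illustrates this; the 128-entry TLB and the specific page sizes are assumptions chosen for illustration, not figures taken from the abstract, and `tlb_coverage_bytes` is a hypothetical helper name.

```python
KIB = 1024
MIB = 1024 * KIB


def tlb_coverage_bytes(entries: int, page_size_bytes: int) -> int:
    """Memory reachable without a TLB miss when each entry maps one page."""
    return entries * page_size_bytes


if __name__ == "__main__":
    entries = 128  # assumed TLB size, for illustration only
    # Assumed page sizes: an 8 KiB base page and three larger superpage sizes.
    for size in (8 * KIB, 64 * KIB, 512 * KIB, 4 * MIB):
        coverage = tlb_coverage_bytes(entries, size)
        print(f"{size // KIB:>5} KiB pages -> {coverage // MIB} MiB coverage")
```

Under these assumptions, the same 128 entries cover 1 MiB with 8 KiB pages and 512 MiB with 4 MiB superpages, which is why fewer TLB misses are expected for workloads with large working sets.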
Angel: A Proposed Multiprocessor Operating System Kernel (Extended Abstract)
, 1991
Cited by 23 (7 self)
T. Wilkinson, T. Stiemerling and P. Osmon, Computer Science Department, City University, Northampton Square, London EC1V 0HB, UK, and A. Saulsbury and P. Kelly, Department of Computing, Imperial College, 180 Queens Gate, London SW7 2BZ, UK.
1 Introduction
We describe an experimental multiprocessor operating system called Angel. The design of Angel builds on experience gained from constructing the Meshix operating system [1] and from the implementation of a distributed shared memory server on this system [2]. The aim is for Angel to support a single uniform shared memory address space on a multiprocessor machine which contains both shared and distributed memory. Such a machine is seen to be a hierarchical system containing multiprocessor clusters using physically shared memory, which are then loosely coupled using a multi-path network. A number of current operating systems have a micro-kernel implementation and support lightweight threads, examples being Mach [3] and Chorus [4]. In these syst...
Virtual Shared Memory: A Survey of Techniques and Systems
, 1992
Cited by 13 (1 self)
Shared memory abstraction on distributed memory hardware has become very popular recently. The abstraction can be provided at various levels in the architecture e.g. hardware, software, employing special mechanisms to maintain coherence of data. In this paper we present a survey of basic techniques and review a large number of architectures that provide such an abstraction. We also propose new terminology which is more consistent and orderly as compared with the existing use of terminology for such architectures.
1 Introduction
Virtual Shared Memory (VSM) in its most general sense refers to a provision of a shared address space on distributed memory hardware. Such architectures contain no physically shared memory. Instead the distributed local memories collectively provide a virtual address space shared by all the processors. VSM combines the benefits of the ease of programming found in shared-memory multiprocessors with the scalability of message-passing multiprocessors. The implemen...
A Robust Main-Memory Compression Scheme
- In Proceedings of the 32nd Annual International Symposium on Computer Architecture
, 2005
Cited by 10 (1 self)
Lossless data compression techniques can potentially free up more than 50% of the memory resources. However, previously proposed schemes suffer from high access costs. The proposed main-memory compression scheme practically eliminates performance losses of previous schemes by exploiting a simple and yet effective compression scheme, a highly-efficient structure for locating a compressed block in memory, and a hierarchical memory layout that allows compressibility of blocks to vary with a low fragmentation overhead. We have evaluated an embodiment of the proposed scheme in detail using 14 integer and floating point applications from the SPEC2000 suite along with two server applications and we show that the scheme robustly frees up 30% of the memory resources, on average, with a negligible impact on the performance of only
FUGU: Implementing Translation and Protection in a Multiuser Multimodel Multiprocessor
, 1994
Cited by 9 (1 self)
Multimodel multiprocessors provide both shared memory and message passing primitives to the user for efficient communication. In a multiuser machine, translation permits machine resources to be virtualized and protection permits users to be isolated. The challenge in a multiuser multiprocessor is to provide translation and protection sufficient for general-purpose computing without compromising communication performance, particularly the performance of communication between parallel threads belonging to the same computation. FUGU is a proposed architecture that integrates translation and protection with a set of communication mechanisms originally designed for high performance on a single-user, physically-addressed, large-scale, multimodel multiprocessor. Communication
Legba: Fast Hardware Support for Fine-Grained Protection
- In Proceedings of the 8th Australia-Pacific Computer Systems Architecture Conference (ACSAC’2003)
, 2003
Cited by 5 (2 self)
Fine-grained hardware protection, if it can be done without slowing down the processor, could deliver significant benefits to software, enabling the implementation of strongly encapsulated light-weight objects. In this paper we introduce Legba, a new caching architecture that aims at supporting fine-grained memory protection and protected procedure calls without slowing down the processor's clock speed.
Emulation of a Virtual Shared Memory Architecture
- Department of Computer Science, University of Bristol, Bristol
, 1993
Cited by 4 (0 self)
In designing a multiprocessor architecture, the motivating factors are that the architecture should be general purpose, easier to program and at the same time scalable. The Data Diffusion Machine (DDM) seeks to fulfil such criteria. The DDM provides shared-data access on distributed memory hardware, allowing data to freely migrate to processors on demand. The DDM concept was originally proposed in terms of a hierarchy of buses, but has since been elaborated for different interconnects. This thesis presents a link-based realisation of the architecture and a link-based coherence protocol which is central in maintaining coherence of data. The link-based protocol exploits the combining properties of the DDM network to minimise traffic in the DDM hierarchy. The protocol also contains efficient and general support for synchronisation. To evaluate the design and performance of new architectures, trace-driven simulation is often used. This thesis presents a novel prototyping and performance ev...
Subspace snooping: Filtering snoops with operating system support
- In Proceedings of the Nineteenth International Conference on Parallel Architectures and Compilation Techniques
, 2010
Cited by 4 (1 self)
Although snoop-based coherence protocols provide fast cache-to-cache transfers with a simple and robust coherence mechanism, scaling the protocols has been difficult due to the overheads of broadcast snooping. In this paper, we propose a coherence filtering technique called subspace snooping, which stores the potential sharers of each memory page in the page table entry. By using the sharer information in the page table entry, coherence transactions for a page generate snoop requests only to the subset of nodes in the system (subspace). However, the coherence subspace of a page may evolve, as the phases of applications may change or the operating system may migrate threads to different nodes. To adjust subspaces dynamically, subspace snooping supports a shrinking mechanism, which removes obsolete nodes from subspaces. Subspace snooping can be integrated with any type of coherence protocol and network topology. As subspace snooping guarantees that a subspace always contains the precise sharers of a page, it does not restrict the designs of coherence protocols and networks. We evaluate subspace snooping with Token Coherence on un-ordered mesh networks. For scientific and server applications on a 16-core system, subspace snooping reduces snoops by 44% on average.
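The filtering idea in this abstract can be pictured as a per-page bitmask of potential sharers that limits where snoop requests are sent. The following is only a rough sketch under stated assumptions: the class `PageSubspace` and its methods are hypothetical names, the bitmask stands in for the sharer field the abstract places in the page table entry, and the shrinking step is simplified to an explicit call that is told which nodes still share the page. It is not the paper's actual mechanism.

```python
class PageSubspace:
    """Hypothetical per-page record of potential sharers, kept as a bitmask."""

    def __init__(self, num_nodes: int) -> None:
        self.num_nodes = num_nodes
        self.sharers = 0  # bit n set => node n may hold data from this page

    def add_sharer(self, node: int) -> None:
        """Grow the subspace when a node starts accessing the page."""
        if not 0 <= node < self.num_nodes:
            raise ValueError("node id out of range")
        self.sharers |= 1 << node

    def snoop_targets(self, requester: int) -> list[int]:
        """Nodes to snoop: every potential sharer except the requester."""
        return [n for n in range(self.num_nodes)
                if n != requester and (self.sharers >> n) & 1]

    def shrink(self, still_sharing: set[int]) -> None:
        """Drop obsolete nodes; the caller must pass every node that still shares."""
        keep = 0
        for n in still_sharing:
            keep |= 1 << n
        self.sharers &= keep


if __name__ == "__main__":
    page = PageSubspace(num_nodes=16)
    for node in (1, 5, 9):
        page.add_sharer(node)
    # A broadcast protocol would snoop 15 other nodes; the subspace has 2.
    print(page.snoop_targets(requester=5))
```

The design point worth noting is that the subspace may over-approximate the sharers but must never miss one, so `shrink` only ever clears bits for nodes known to be obsolete.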
Design and Evaluation of the Hamal Parallel Computer
, 2002
Cited by 3 (0 self)
Over the years there has been an enormous amount of hardware research in parallel computation. It is a testament...

