Weak Ordering -- A New Definition
, 1990
Cited by 213 (12 self)
A memory model for a shared-memory multiprocessor commonly and often implicitly assumed by programmers is that of sequential consistency. This model guarantees that all memory accesses will appear to execute atomically and in program order. An alternative model, weak ordering, offers greater performance potential. Weak ordering was first defined by Dubois, Scheurich and Briggs in terms of a set of rules for hardware that have to be made visible to software. The central hypothesis of this work is that programmers prefer to reason about sequentially consistent memory, rather than having to think about weaker memory, or even write buffers. Following this hypothesis, we re-define weak ordering as a contract between software and hardware. By this contract, software agrees to some formally specified constraints, and hardware agrees to appear sequentially consistent to at least the software that obeys those constraints. We illustrate the power of the new definition with a set of software constraints that forbid data races and an implementation for cache-coherent systems that is not allowed by the old definition.
A Unified Formalization of Four Shared-Memory Models
- IEEE Transactions on Parallel and Distributed Systems
, 1993
Cited by 105 (9 self)
This paper presents a shared-memory model, data-race-free-1, that unifies four earlier models: weak ordering, release consistency (with sequentially consistent special operations), the VAX memory model, and data-race-free-0. The most intuitive and commonly assumed shared-memory model, sequential consistency, limits performance. The models of weak ordering, release consistency, the VAX, and data-race-free-0 are based on the common intuition that if programs synchronize explicitly and correctly, then sequential consistency can be guaranteed with high performance. However, each model formalizes this intuition differently and has different advantages and disadvantages with respect to the other models. Data-race-free-1 unifies the models of weak ordering, release consistency, the VAX, and data-race-free-0 by formalizing the above intuition in a manner that retains the advantages of each of the four models. A multiprocessor is data-race-free-1 if it guarantees sequential consistency to data-...
Successive Overrelaxation for Support Vector Machines
- IEEE Transactions on Neural Networks
, 1998
Cited by 61 (14 self)
Successive overrelaxation (SOR) for symmetric linear complementarity problems and quadratic programs [11, 12, 9] is used to train a support vector machine (SVM) [20, 3] for discriminating between the elements of two massive datasets, each with millions of points. Because SOR handles one point at a time, similar to Platt's sequential minimal optimization (SMO) algorithm [18] which handles two constraints at a time, it can process very large datasets that need not reside in memory. The algorithm converges linearly to a solution. Encouraging numerical results are presented on datasets with up to 10 million points. Such massive discrimination problems cannot be processed by conventional linear or quadratic programming methods, and to our knowledge have not been solved by other methods. 1 Introduction Successive overrelaxation, originally developed for the solution of large systems of linear equations [16, 15] has been successfully applied to mathematical programming problems [4, 11, 12, 1...
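The abstract above notes that SOR was originally developed for large systems of linear equations and that it updates one component at a time. As a rough illustration of that classical setting only, here is a minimal sketch of SOR for a small linear system Ax = b. It is not the SVM training procedure of the paper; the function name `sor_solve`, the relaxation factor `omega=1.2`, the stopping tolerance, and the 2x2 example system are all assumptions chosen for illustration.

```python
def sor_solve(A, b, omega=1.2, tol=1e-10, max_iter=10_000):
    """Classical SOR sweep for A x = b (illustrative sketch, hypothetical helper).

    A is a list of rows, b a list of floats. Convergence is assumed here
    because the example matrix is symmetric positive definite and
    0 < omega < 2.
    """
    n = len(b)
    x = [0.0] * n
    for _ in range(max_iter):
        largest_change = 0.0
        # Update one component at a time, always using the newest values.
        for i in range(n):
            sigma = sum(A[i][j] * x[j] for j in range(n) if j != i)
            gauss_seidel = (b[i] - sigma) / A[i][i]
            # Blend the old value with the Gauss-Seidel value.
            new_xi = (1.0 - omega) * x[i] + omega * gauss_seidel
            largest_change = max(largest_change, abs(new_xi - x[i]))
            x[i] = new_xi
        if largest_change < tol:
            break
    return x


# Hypothetical example: 4x + y = 1, x + 3y = 2, exact solution (1/11, 7/11).
A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x = sor_solve(A, b)
print(x)
```

Because each sweep touches only one row at a time, the full matrix never has to be processed at once, which is the property the abstract relies on for datasets that need not reside in memory.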
Where is Time Spent in Message-Passing and Shared-Memory Programs?
- In Proceedings of the Sixth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS VI)
, 1994
Cited by 51 (4 self)
Message passing and shared memory are two techniques parallel programs use for coordination and communication. This paper studies the strengths and weaknesses of these two mechanisms by comparing equivalent, well-written message-passing and shared-memory programs running on similar hardware. To ensure that our measurements are comparable, we produced two carefully tuned versions of each program and measured them on closely-related simulators of a message-passing and a shared-memory machine, both of which are based on the same underlying hardware assumptions. We examined the behavior and performance of each program carefully. Although the cost of computation in each pair of programs was similar, synchronization and communication differed greatly. We found that message-passing's advantage over shared-memory is not clear-cut. Three of the four shared-memory programs ran at roughly the same speed as their message-passing equivalent, even though their communication patterns were different. 1 In...
Designing Memory Consistency Models for Shared-Memory Multiprocessors
, 1993
Cited by 51 (8 self)
The memory consistency model (or memory model) of a shared-memory multiprocessor system influences both the performance and the programmability of the system. The simplest and most intuitive model for programmers, sequential consistency, restricts the use of many performance-enhancing optimizations exploited by uniprocessors. For higher performance, several alternative models have been proposed. However, many of these are hardware-centric in nature and difficult to program. Further, the multitude of many seemingly unrelated memory models inhibits portability. We use a 3P criteria of programmability, portability, and performance to assess memory models, and find current models lacking in one or more of these criteria. This thesis establishes a unifying framework for reasoning about memory models that leads to models that adequately satisfy the 3P criteria. The first contribution of this thesis is a programmer-centric methodology, called sequential consistency normal form (SCNF), for specifying memory models. This methodology is based on the observation that performance enhancing optimizations can be allowed without violating sequential consistency if the system is given some information about the program. An SCNF model is a contract between the system and the programmer, where the system guarantees both high performance and sequential consistency only if the programmer provides certain information about the program. Insufficient information gives lower performance, but incorrect information
Location Consistency: Stepping Beyond the Barriers of Memory Coherence and Serializability
- McGill University, School of Computer
, 1994
Cited by 17 (3 self)
A memory consistency model represents a binding "contract" between software and hardware in a shared-memory multiprocessor system. It is important to provide a memory consistency model that is easy to understand and that also facilitates efficient implementation. The memory consistency model that has been most commonly used in past work is sequential consistency (SC), which requires the execution of a parallel program to appear as some interleaving of the memory operations on a sequential machine. To reduce the rigid constraints of the SC model, several relaxed consistency models have been proposed, notably weak ordering (or weak consistency) (WC), release consistency (RC), data-race-free-0, and data-race-free-1. These models allow performance optimizations to be correctly applied, while guaranteeing that sequential consistency is retained for a specified class of programs. We call these models SC-derived models. A central assumption in the definitions of all SC-derived memory consist...
Data Discrimination via Nonlinear Generalized Support Vector Machines
- Complementarity: Applications, Algorithms and Extensions
, 1999
Cited by 12 (8 self)
The main purpose of this paper is to show that new formulations of support vector machines can generate nonlinear separating surfaces which can discriminate between elements of a given set better than a linear surface. The principal approach used is that of generalized support vector machines (GSVMs) which employ possibly indefinite kernels [17]. The GSVM training procedure is carried out by either the simple successive overrelaxation (SOR) [18] iterative method or by linear programming. This novel combination of powerful support vector machines [24, 5] with the highly effective SOR computational algorithm [15, 16, 14] or with linear programming allows us to use a nonlinear surface to discriminate between elements of a dataset that belong to one of two categories. Numerical results on a number of datasets show improved testing set correctness, by as much as a factor of two, when comparing the nonlinear GSVM surface to a linear separating surface. 1 Introduction A very simple convex qu...
Weak Ordering - A New Definition And Some Implications
, 1989
Cited by 8 (0 self)
This paper is primarily concerned with the programmer's model of a shared memory system and its implications on hardware design and performance. A model for correct behavior of programs commonly (and often implicitly) assumed by programmers is that of sequential consistency, formally defined by Lamport [Lam79] as follows: [A system is sequentially consistent if] the result of any execution is the same as if the operations of all the processors were executed in some sequential order, and the operations of each individual processor appear in this sequence in the order specified by its program. While the definition leaves the specific interpretation of the term operations undefined, for a shared memory system, it is usually assumed to refer to memory operations or accesses (e.g., reads and writes). Thus, stated simply for a shared memory system, the above definition translates into the following two conditions - (1) all memory accesses appear to execute atomically in some total order, and (2) all memory accesses of a single process appear to execute in program order. Uniprocessor systems offer the model of sequential consistency almost naturally and without much compromise in performance. In the simplest of architectures, where a processor is allowed to issue a memory access only after the previous access in program order is complete, a total order of memory accesses can be obtained based on the wall-clock time of their issue or execution. More sophisticated architectures allow overlap of instruction execution, out-of-order memory accesses, write buffers, caches (which may be lock-up free [Kro81]), etc. In these machines, an ordering of memory accesses based on wall-clock time of issue or execution may violate program order, but interlock logic assures that accesses appear...
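The two conditions in the abstract above (a single total order of atomic accesses, with each processor's accesses in program order) can be checked by brute force on a tiny program. The following is a minimal sketch, not taken from the paper: the two-processor program, the register names `r1` and `r2`, and the helper names are hypothetical, chosen to mirror the well-known store-buffering litmus test.

```python
from itertools import combinations


def interleavings(a, b):
    """Yield every merge of a and b that keeps each sequence's own order."""
    n, m = len(a), len(b)
    for positions in combinations(range(n + m), n):
        slots = set(positions)
        ia, ib = iter(a), iter(b)
        yield [next(ia) if i in slots else next(ib) for i in range(n + m)]


def run(schedule):
    """Execute accesses atomically, one at a time, on one shared memory."""
    memory = {"x": 0, "y": 0}
    registers = {}
    for op, var, arg in schedule:
        if op == "write":
            memory[var] = arg
        else:  # "read": arg names the destination register
            registers[arg] = memory[var]
    return registers["r1"], registers["r2"]


def sc_outcomes(p1, p2):
    """Collect (r1, r2) over all sequentially consistent executions."""
    return {run(schedule) for schedule in interleavings(p1, p2)}


# Hypothetical program: each processor writes one flag, then reads the other.
P1 = [("write", "x", 1), ("read", "y", "r1")]
P2 = [("write", "y", 1), ("read", "x", "r2")]

outcomes = sc_outcomes(P1, P2)
print(sorted(outcomes))
```

Under these assumptions the result (0, 0) never appears, because some write must come first in any total order. Hardware that lets reads bypass buffered writes can produce (0, 0), which is the kind of behavior the abstract has in mind when it says wall-clock ordering on more sophisticated architectures may violate program order.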
Alternating Directions Methods for the Parallel Solution of Large-Scale Block-Structured Optimization Problems
, 1994
Cited by 7 (2 self)
Prompted by advances in computer technology and the increasing confidence of decision makers in large-scale market models, practitioners of operations research are now tackling problems of increasing detail, complexity and size. This necessitates the development of new solution algorithms that exploit problem structure as well as the properties of the target hardware, in order to minimize turnaround time and maximize model utilization. Many models in planning and scheduling exhibit a block-angular structure, that can represent spatial or temporal partial decomposability: decision variables can be broken down to largely independent blocks, that correspond to first-level decisions satisfying a subset of the constraints, which may represent a time period, or a geographical region, or a commodity. The blocks interact via coupling constraints related to second-level coordination of block decisions, such as shared resource allocation restrictions. In this thesis we construct three efficient decomposition algorithms for such block-angular problems. These algorithms belong to the family of alternating directions methods, and can be thought of as block Gauss-Seidel iterative schemes for an augmented Lagrangian, that exploit the block structure. Alternatively, they can be thought of as Douglas--Rachford schemes for calculating a zero of the maximal monotone subgradient operator. Our algorithms are of the "fork--join" type, alternating a local and a global computation phase. In the local phase, decoupled optimization subproblems corresponding to blocks are solved. In the global phase, solution information is combined and a coordination problem is solved, the results of which are used in modifying the objective function of the subproblems. The algorithms are thus similar to price-d...
Optimization Methods In Massive Datasets
Cited by 6 (0 self)
We describe the role of generalized support vector machines in separating massive and complex data using arbitrary nonlinear kernels. Feature selection that improves generalization is implemented via an effective procedure that utilizes a polyhedral norm or a concave function minimization. Massive data is separated using a linear programming chunking algorithm as well as a successive overrelaxation algorithm, each of which is capable of processing data with millions of points. 1. INTRODUCTION We address here the problem of classifying data in n-dimensional real (Euclidean) space R^n into one of two disjoint finite point sets (i.e. classes). The support vector machine (SVM) approach to classification [57, 2, 25, 58, 13, 54, 55] attempts to separate points belonging to two given sets in R^n by a nonlinear surface, often only implicitly defined by a kernel function. Since the nonlinear surface in R^n is typically linear in its parameters, it can be represented as a linear func...

