Obstruction-free synchronization: Double-ended queues as an example
In preparation, 2003
Cited by 167 (17 self)
We introduce obstruction-freedom, a new nonblocking property for shared data structure implementations. This property is strong enough to avoid the problems associated with locks, but it is weaker than previous nonblocking properties (specifically lock-freedom and wait-freedom), allowing greater flexibility in the design of efficient implementations. Obstruction-freedom admits substantially simpler implementations, and we believe that in practice it provides the benefits of wait-free and lock-free implementations. To illustrate the benefits of obstruction-freedom, we present two obstruction-free CAS-based implementations of double-ended queues (deques); the first is implemented on a linear array, the second on a circular array. To our knowledge, all previous nonblocking deque implementations are based on unrealistic assumptions about hardware support for synchronization, have restricted functionality, or have operations that interfere with operations at the opposite end of the deque even when the deque has many elements in it. Our obstruction-free implementations have none of these drawbacks, and thus suggest that it is much easier to design obstruction-free implementations than lock-free and wait-free ones. We also briefly discuss other obstruction-free data structures and operations that we have implemented.
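To make the linear-array idea concrete, here is a minimal sketch of the right-end operations only, as we understand the general scheme: each slot holds a (value, counter) pair, the sentinels LN ("left null") and RN ("right null") mark unused slots, and each operation performs two CASes, bumping counters so that a stale snapshot cannot succeed. Assumptions, loudly: Python has no hardware CAS, so `_cas` emulates one with a lock purely for illustration (which makes this sketch lock-based, not nonblocking); the "oracle" that locates the right end is a plain linear scan; the class and method names are hypothetical; the symmetric left-end operations are omitted.

```python
import threading

# Sentinel markers for unused slots; user values must never be these objects.
LN = object()  # "left null"
RN = object()  # "right null"


class ArrayDequeRight:
    """Right-end operations of a linear-array, CAS-based deque (sketch).

    Invariant: the slots read LN+ data* RN+, with A[0] always LN and
    A[max+1] always RN. Every successful CAS bumps the slot's counter.
    """

    def __init__(self, capacity):
        self.max = capacity
        # Right-end-only sketch: every slot except A[0] starts as RN.
        self.a = [(LN, 0)] + [(RN, 0)] * (capacity + 1)
        self._lock = threading.Lock()  # only used to emulate hardware CAS

    def _cas(self, i, expected, new):
        """Emulated single-word compare-and-swap on slot i."""
        with self._lock:
            if self.a[i] == expected:
                self.a[i] = new
                return True
            return False

    def _oracle_right(self):
        """Return k such that A[k] is RN and A[k-1] is not (by scanning)."""
        for k in range(1, self.max + 2):
            if self.a[k][0] is RN:
                return k
        raise AssertionError("A[max+1] is always RN")

    def right_push(self, v):
        """Push v at the right end; return False if the array is full."""
        while True:
            k = self._oracle_right()
            prev, cur = self.a[k - 1], self.a[k]
            if prev[0] is not RN and cur[0] is RN:
                if k == self.max + 1:
                    return False  # A[max+1] must stay RN
                # Bump the neighbour's counter so a concurrent pop that read
                # the old A[k-1] fails, then install v in the first RN slot.
                if self._cas(k - 1, prev, (prev[0], prev[1] + 1)):
                    if self._cas(k, cur, (v, cur[1] + 1)):
                        return True
            # Otherwise the snapshot was inconsistent or a CAS lost: retry.

    def right_pop(self):
        """Pop from the right end; raise IndexError if the deque is empty."""
        while True:
            k = self._oracle_right()
            cur, nxt = self.a[k - 1], self.a[k]
            if cur[0] is not RN and nxt[0] is RN:
                if cur[0] is LN and self.a[k - 1] == cur:
                    raise IndexError("pop from empty deque")
                # Bump the RN slot first, then turn the data slot into RN.
                if self._cas(k, nxt, (RN, nxt[1] + 1)):
                    if self._cas(k - 1, cur, (RN, cur[1] + 1)):
                        return cur[0]
```

In a single thread every operation completes on its first attempt; under contention, a failed CAS simply causes a retry, which is the obstruction-free pattern of guaranteeing progress only to a thread that eventually runs without interference.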
SPIRAL: Code Generation for DSP Transforms
Proceedings of the IEEE, Special Issue on Program Generation, Optimization, and Adaptation, 2005
Cited by 143 (32 self)
Fast-changing, increasingly complex, and diverse computing platforms pose central problems in scientific computing: How to achieve, with reasonable effort, portable optimal performance? We present SPIRAL, which considers this problem for the performance-critical domain of linear digital signal processing (DSP) transforms. For a specified transform, SPIRAL automatically generates high-performance code that is tuned to the given platform. SPIRAL formulates the tuning as an optimization problem and exploits the domain-specific mathematical structure of transform algorithms to implement a feedback-driven optimizer. Similar to a human expert, for a specified transform, SPIRAL “intelligently” generates and explores algorithmic and implementation choices to find the best match to the computer’s microarchitecture. The “intelligence” is provided by search and learning techniques that exploit the structure of the algorithm and implementation space to guide the exploration and optimization. SPIRAL generates high-performance code for a broad set of DSP transforms, including the discrete Fourier transform, other trigonometric transforms, filter transforms, and discrete wavelet transforms. Experimental results show that the code generated by SPIRAL competes with, and sometimes outperforms, the best available human-tuned transform library code.
Index Terms: library generation, code optimization, adaptation, automatic performance tuning, high performance computing, linear signal transform, discrete Fourier transform, FFT, discrete cosine transform, wavelet, filter, search, learning, genetic and evolutionary algorithm, Markov decision process.
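The search over "algorithmic choices" can be illustrated on the DFT, where each Cooley-Tukey split n = n1 * n2 is one choice and choices nest recursively. The sketch below uses dynamic programming over splits, one common strategy for this kind of search; it is not SPIRAL's implementation or API. Assumption: where SPIRAL gets feedback from timing generated code on the target machine, this sketch substitutes a hypothetical operation-count cost model so that the result is deterministic. All function names are illustrative.

```python
import cmath


def dft_naive(x):
    """Direct O(n^2) DFT: X[k] = sum_j x[j] * exp(-2*pi*i*j*k/n)."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]


def ct_fft(x, plan):
    """Cooley-Tukey DFT where plan[n] = n1 splits size n as n1 * n2.

    Sizes absent from plan are computed with the naive DFT (the leaves of
    the recursion).
    """
    n = len(x)
    n1 = plan.get(n)
    if n1 is None:
        return dft_naive(x)
    n2 = n // n1
    # n2 DFTs of size n1 on the strided inputs x[j2 + n2*j1].
    inner = [ct_fft(x[j2::n2], plan) for j2 in range(n2)]
    # Twiddle factors W_n^(j2*k1).
    for j2 in range(n2):
        for k1 in range(n1):
            inner[j2][k1] *= cmath.exp(-2j * cmath.pi * j2 * k1 / n)
    # n1 DFTs of size n2, written to output positions k1 + n1*k2.
    out = [0j] * n
    for k1 in range(n1):
        col = ct_fft([inner[j2][k1] for j2 in range(n2)], plan)
        for k2 in range(n2):
            out[k1 + n1 * k2] = col[k2]
    return out


def search_plan(n):
    """Dynamic-programming search for the cheapest split of each size.

    Hypothetical cost model (stand-in for measured runtime): a naive DFT of
    size m costs m*m; a split costs its sub-DFTs plus m twiddle multiplies.
    """
    best = {}  # size -> (cost, chosen n1, or None for "use naive DFT")

    def solve(m):
        if m not in best:
            choice = (m * m, None)
            for n1 in range(2, m):
                if m % n1 == 0:
                    n2 = m // n1
                    cost = n2 * solve(n1) + n1 * solve(n2) + m
                    if cost < choice[0]:
                        choice = (cost, n1)
            best[m] = choice
        return best[m][0]

    solve(n)
    return {m: n1 for m, (_, n1) in best.items() if n1 is not None}
```

For example, `ct_fft(x, search_plan(len(x)))` agrees with `dft_naive(x)` up to rounding for any plan the search returns; only the cost changes. Swapping the cost model for real timings of generated code is what would make such a search feedback-driven.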
Partial Online Cycle Elimination in Inclusion Constraint Graphs
In Proceedings of the 1998 ACM SIGPLAN Conference on Programming Language Design and Implementation, 1998
Cited by 114 (13 self)
Many program analyses are naturally formulated and implemented using inclusion constraints. We present new results on the scalable implementation of such analyses based on two insights: first, that online elimination of cyclic constraints yields orders-of-magnitude improvements in analysis time for large problems; second, that the choice of constraint representation affects the quality and efficiency of online cycle elimination. We present an analytical model that explains our design choices and show that the model's predictions agree well with the results of a substantial experiment.
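The core observation behind cycle elimination is that all variables on a cycle of inclusion constraints x ⊆ y ⊆ ... ⊆ x must denote the same set, so they can be merged into one node. The sketch below shows that idea with a union-find forest and a cycle check on every insertion. It is an assumption-laden simplification: the paper's method is deliberately partial (it does not look for every cycle), whereas this sketch runs a full depth-first search on each insertion, which is simpler but can cost more. Class and method names are hypothetical.

```python
class InclusionGraph:
    """Inclusion constraints x ⊆ y stored as edges x -> y (sketch).

    Variables found on a cycle are collapsed into one representative
    using a union-find forest.
    """

    def __init__(self):
        self.parent = {}  # union-find parent pointers
        self.succ = {}    # representative -> successor variables (may be stale)

    def find(self, v):
        if v not in self.parent:
            self.parent[v], self.succ[v] = v, set()
        while self.parent[v] != v:
            self.parent[v] = self.parent[self.parent[v]]  # path halving
            v = self.parent[v]
        return v

    def same(self, x, y):
        return self.find(x) == self.find(y)

    def add_subset(self, x, y):
        """Record x ⊆ y; if y already reaches x, collapse the cycle."""
        rx, ry = self.find(x), self.find(y)
        if rx == ry:
            return
        path = self._path(ry, rx)
        if path is None:
            self.succ[rx].add(ry)
        else:
            self._collapse(path)

    def _path(self, src, dst):
        """Depth-first search from src to dst over representatives."""
        prev, stack = {src: None}, [src]
        while stack:
            u = stack.pop()
            if u == dst:
                path = []
                while u is not None:
                    path.append(u)
                    u = prev[u]
                return path
            for w in self.succ[u]:
                w = self.find(w)
                if w not in prev:
                    prev[w] = u
                    stack.append(w)
        return None

    def _collapse(self, nodes):
        """Merge every node on the cycle into nodes[0]."""
        root = nodes[0]
        for v in nodes[1:]:
            self.parent[v] = root
            self.succ[root] |= self.succ.pop(v)
        # Drop edges that became self-loops after the merge.
        self.succ[root] = {w for w in self.succ[root] if self.find(w) != root}
```

Adding a ⊆ b, b ⊆ c, and then c ⊆ a leaves a single representative for all three variables, so later analysis work is done once for the merged node instead of being propagated around the cycle.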
Wrappers for Performance Enhancement and Oblivious Decision Graphs
1995
Cited by 107 (8 self)
In this doctoral dissertation, we study three basic problems in machine learning and two new hypothesis spaces with corresponding learning algorithms. The problems we investigate are accuracy estimation, feature subset selection, and parameter tuning. The latter two problems are related and are studied under the wrapper approach. The hypothesis spaces we investigate are decision tables with a default majority rule (DTMs) and oblivious read-once decision graphs (OODGs).
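The wrapper approach scores a candidate feature subset by running the induction algorithm itself on that subset and estimating the resulting accuracy. A minimal sketch follows, assuming a decision table with a default majority rule (DTM) as the induction algorithm, leave-one-out accuracy as the estimator, and greedy forward selection as the search; the dissertation's actual search and estimation settings may differ, and all function names are hypothetical.

```python
from collections import Counter, defaultdict


def dtm_fit(rows, labels, features):
    """Decision table with default majority over the given feature indices."""
    table = defaultdict(Counter)
    for row, y in zip(rows, labels):
        table[tuple(row[f] for f in features)][y] += 1
    majority = Counter(labels).most_common(1)[0][0]
    return table, majority


def dtm_predict(model, row, features):
    """Majority label of matching table entries, else the global majority."""
    table, majority = model
    key = tuple(row[f] for f in features)
    if key in table:
        return table[key].most_common(1)[0][0]
    return majority


def loo_accuracy(rows, labels, features):
    """Leave-one-out accuracy estimate of a DTM on the feature subset."""
    hits = 0
    for i in range(len(rows)):
        model = dtm_fit(rows[:i] + rows[i + 1:], labels[:i] + labels[i + 1:], features)
        hits += dtm_predict(model, rows[i], features) == labels[i]
    return hits / len(rows)


def forward_select(rows, labels, n_features):
    """Wrapper search: greedily add the feature that most improves accuracy."""
    chosen, best = [], loo_accuracy(rows, labels, [])
    while True:
        scored = [(loo_accuracy(rows, labels, chosen + [f]), f)
                  for f in range(n_features) if f not in chosen]
        if not scored:
            break
        acc, f = max(scored)
        if acc <= best:
            break  # no single added feature helps: stop
        chosen.append(f)
        best = acc
    return chosen, best
```

The design choice that makes this a wrapper rather than a filter is that `loo_accuracy` calls the same learner that will be deployed, so the selected subset is tuned to that learner's biases.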
polymake: a Framework for Analyzing Convex Polytopes
1999
Cited by 97 (15 self)
polymake is a software tool designed for the algorithmic treatment of polytopes and polyhedra. We give an overview of its functionality as well as its structure. This paper can be seen as a first approximation to a polymake handbook. The tutorial starts with the very basics and ends with a few polymake applications to research problems. Then we present the main features of the system, including the interfaces to other software products. polymake is free software; it is available on the Internet at http://www.math.tu-berlin.de/diskregeom/polymake/.
Topologically Sweeping Visibility Complexes via Pseudo-Triangulations
1996
Cited by 86 (9 self)
This paper describes a new algorithm for constructing the set of free bitangents of a collection of n disjoint convex obstacles of constant complexity. The algorithm runs in time O(n log n + k), where k is the output size, and uses O(n) space. While earlier algorithms achieve the same optimal running time, this is the first optimal algorithm that uses only linear space. The visibility graph or the visibility complex can be computed in the same time and space. The only complicated data structure used by the algorithm is a splittable queue, which can be implemented easily using red-black trees. The algorithm is conceptually very simple, and should therefore be easy to implement and quite fast in practice. The algorithm relies on greedy pseudo-triangulations, which are subgraphs of the visibility graph with many nice combinatorial properties. These properties, and thus the correctness of the algorithm, are partially derived from properties of a certain partial order on the faces of th...
A Note on the Height of Binary Search Trees
1986
Cited by 79 (23 self)
Let $H_n$ be the height of a binary search tree with $n$ nodes constructed by standard insertions from a random permutation of $1, \ldots, n$. It is shown that $H_n / \log n \to c = 4.31107\ldots$ in probability as $n \to \infty$, where $c$ is the unique solution of $c \log(2e/c) = 1$, $c \ge 2$. Also, for all $p > 0$, $\lim_{n \to \infty} \mathbf{E}(H_n^p) / \log^p n = c^p$. Finally, it is proved that $S_n / \log n \to c^* = 0.3733\ldots$ in probability, where $c^*$ is defined by $c \log(2e/c) = 1$, $c \le 1$, and $S_n$ is the saturation level of the same tree, that is, the number of full levels in the tree.
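Both constants are roots of the same equation $c \log(2e/c) = 1$ (natural logarithm), one on each side of $c = 2$, where $c \log(2e/c)$ reaches its maximum value 2. The sketch below finds them by bisection and also builds the tree by standard insertions so the height can be observed empirically. Assumptions: height is counted in edges on the longest root-to-leaf path, keys are distinct, and the function names are hypothetical.

```python
import math
import random


def bst_height(keys):
    """Height (edges on the longest root-to-leaf path) of the BST obtained by
    inserting the distinct keys in the given order."""
    left, right = {}, {}
    root, height = None, 0
    for k in keys:
        if root is None:
            root = k
            continue
        node, depth = root, 0
        while True:
            depth += 1
            child = left if k < node else right
            if node in child:
                node = child[node]
            else:
                child[node] = k
                break
        height = max(height, depth)
    return height


def solve_c(lo, hi, tol=1e-12):
    """Bisection for the root of c*log(2e/c) = 1 on [lo, hi]."""
    def f(c):
        return c * math.log(2 * math.e / c) - 1

    f_lo = f(lo)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return (lo + hi) / 2


if __name__ == "__main__":
    c = solve_c(2, 2 * math.e)  # the root with c >= 2
    c_star = solve_c(1e-9, 1)   # the root with c <= 1
    n = 10_000
    keys = list(range(n))
    random.shuffle(keys)
    print(c, c_star, bst_height(keys) / math.log(n))
```

The convergence in probability is slow, so the printed ratio at this n should not be expected to match c closely.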
Auto-Blocking Matrix-Multiplication or Tracking BLAS3 Performance from Source Code
In Proceedings of the Sixth ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 1997
Cited by 76 (6 self)
An elementary, machine-independent, recursive algorithm for matrix multiplication C += A*B provides implicit blocking at every level of the memory hierarchy and tests out faster than classically optimal code, tracking hand-coded BLAS3 routines. “Proof of concept” is demonstrated by racing the in-place algorithm against the manufacturer's hand-tuned BLAS3 routines; it can win. The recursive code bifurcates naturally at the top level into independent block-oriented processes, each of which writes to a disjoint and contiguous region of memory. Experience has shown that the indexing vastly improves the patterns of memory access at all levels of the memory hierarchy, independently of the sizes of caches or pages and without ad hoc programming. It also exposed a weakness in SGI's C compilers, which merrily unroll loops for the superscalar R8000 processor but do not analogously unfold the base cases of the most elementary recursions. Such deficiencies might deter programmers from using this rich class of recursive algorithms.
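The "implicit blocking" comes from recursion alone: once a subproblem is small enough, its operands fit in whatever cache level happens to be nearby, with no machine-specific block size in the code. A minimal sketch of recursive C += A*B follows. Assumptions: it uses ordinary row-major lists and halves the largest of the three dimensions (a common variant of quadrant recursion), so it does not reproduce the paper's in-place indexing scheme; the leaf size and function names are hypothetical, and pure Python will not show the cache effects, only the control structure.

```python
def rec_matmul(C, A, B, i0, i1, j0, j1, k0, k1, leaf=16):
    """C[i0:i1][j0:j1] += A[i0:i1][k0:k1] * B[k0:k1][j0:j1] by recursive halving."""
    di, dj, dk = i1 - i0, j1 - j0, k1 - k0
    if max(di, dj, dk) <= leaf:
        # Base case: a small block, assumed to fit in a fast cache level.
        for i in range(i0, i1):
            Ci, Ai = C[i], A[i]
            for k in range(k0, k1):
                a, Bk = Ai[k], B[k]
                for j in range(j0, j1):
                    Ci[j] += a * Bk[j]
        return
    if di >= dj and di >= dk:
        # Split rows: the two halves write disjoint rows of C.
        m = i0 + di // 2
        rec_matmul(C, A, B, i0, m, j0, j1, k0, k1, leaf)
        rec_matmul(C, A, B, m, i1, j0, j1, k0, k1, leaf)
    elif dj >= dk:
        # Split columns: again disjoint regions of C.
        m = j0 + dj // 2
        rec_matmul(C, A, B, i0, i1, j0, m, k0, k1, leaf)
        rec_matmul(C, A, B, i0, i1, m, j1, k0, k1, leaf)
    else:
        # Split the inner dimension: both halves accumulate into the same block.
        m = k0 + dk // 2
        rec_matmul(C, A, B, i0, i1, j0, j1, k0, m, leaf)
        rec_matmul(C, A, B, i0, i1, j0, j1, m, k1, leaf)


def matmul_add(C, A, B, leaf=16):
    """C += A*B for list-of-lists matrices of shapes (n x p) and (p x q)."""
    rec_matmul(C, A, B, 0, len(A), 0, len(B[0]), 0, len(B), leaf)
    return C
```

The row and column splits produce independent subproblems that write disjoint parts of C, which is what lets the top level fork into independent processes; the inner-dimension split does not, so its two halves run one after the other.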
Partial Encryption of Compressed Images and Videos
2000
Cited by 73 (1 self)
The increased popularity of multimedia applications places a great demand on efficient data storage and transmission techniques. Network communication, especially over a wireless network, can easily be intercepted and must be protected from eavesdroppers. Unfortunately, encryption and decryption are slow, and it is often difficult, if not impossible, to carry out real-time secure image and video communication and processing. Methods have been proposed to combine compression and encryption to reduce the overall processing time [3, 4, 12, 18, 20], but they are either insecure or too computationally intensive. We propose a novel solution, called partial encryption, in which a secure encryption algorithm is used to encrypt only part of the compressed data. Partial encryption is applied to several image and video compression algorithms in this paper. Only 13% to 27% of the output from quadtree compression algorithms [13, 17, 29, 30, 31, 32] is encrypted for typical images, and less than 2% is encrypted for 512 × 512 images compressed by the SPIHT algorithm [26]. The results are similar for video compression, resulting in a significant reduction in encryption and decryption time. The proposed partial encryption schemes are fast, secure, and do not reduce the compression performance of the underlying compression algorithm.
EDICS Number: SP 7.8. This research is supported in part by the Motorola Wireless Data Group and the Canadian Natural Sciences and Engineering Research Council under Grant OGP9198 and Postgraduate Scholarship. One author is presently at the Department of Computer Science, University of Waterloo.
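For quadtree compression, one natural way to split the output is into the tree structure (which blocks are subdivided) and the leaf values; without the structure, the leaf values cannot be placed back into an image. A minimal sketch of partial encryption along that split follows, assuming that only the structure bits are encrypted and the leaf values are sent in the clear. The keystream built from SHA-256 in counter mode is a stand-in for a real secure cipher such as AES, chosen only because Python's standard library has no block cipher; the key, image, and function names are hypothetical.

```python
import hashlib

QUADRANTS = ((0, 0), (0, 1), (1, 0), (1, 1))  # NW, NE, SW, SE as half-size offsets


def quadtree_encode(img, r, c, size, bits, leaves):
    """Preorder quadtree of the size x size block at (r, c).

    Appends 1 to bits and recurses if the block is non-uniform; otherwise
    appends 0 to bits and the block's value to leaves.
    """
    first = img[r][c]
    if all(img[i][j] == first for i in range(r, r + size) for j in range(c, c + size)):
        bits.append(0)
        leaves.append(first)
        return
    bits.append(1)
    h = size // 2
    for dr, dc in QUADRANTS:
        quadtree_encode(img, r + dr * h, c + dc * h, h, bits, leaves)


def quadtree_decode(bits, leaves, size):
    """Rebuild the image from structure bits and leaf values."""
    img = [[None] * size for _ in range(size)]
    bit_iter, leaf_iter = iter(bits), iter(leaves)

    def fill(r, c, s):
        if next(bit_iter) == 0:
            v = next(leaf_iter)
            for i in range(r, r + s):
                img[i][c:c + s] = [v] * s
        else:
            h = s // 2
            for dr, dc in QUADRANTS:
                fill(r + dr * h, c + dc * h, h)

    fill(0, 0, size)
    return img


def keystream_xor(data, key):
    """XOR data with a SHA-256 counter-mode keystream (stand-in cipher)."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        block = hashlib.sha256(key + offset.to_bytes(8, "big")).digest()
        out.extend(b ^ k for b, k in zip(data[offset:offset + 32], block))
    return bytes(out)


# Example: a 4 x 4 image (size must be a power of two).
img = [[0, 0, 1, 1],
       [0, 0, 1, 0],
       [2, 2, 3, 3],
       [2, 2, 3, 3]]
key = b"hypothetical key"
bits, leaves = [], []
quadtree_encode(img, 0, 0, 4, bits, leaves)
secret = keystream_xor(bytes(bits), key)  # only the structure is encrypted
# leaves travel unencrypted; the receiver decrypts the structure, then decodes.
restored = quadtree_decode(list(keystream_xor(secret, key)), leaves, 4)
```

Here only the 9 structure symbols pass through the cipher while the 7 leaf values do not; the fraction that must be encrypted is what drives the savings the abstract reports.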
Static Cache Simulation and its Applications
1994
Cited by 45 (13 self)
This work takes a fresh look at the simulation of cache memories. It introduces the technique of static cache simulation, which statically predicts a large portion of cache references. To utilize this technique efficiently, a method to perform efficient on-the-fly analysis of programs in general is developed and proved correct. This method is combined with static cache simulation for a number of applications. The application of fast instruction cache analysis provides a new framework to evaluate instruction cache memories that outperforms even the fastest published techniques. Static cache simulation is shown to address the issue of predicting cache behavior, contrary to the belief that cache memories introduce unpredictability to real-time systems that cannot be efficiently analyzed. Static cache simulation for instruction caches provides a large degree of predictability for real-time systems. In addition, an architectural modification through bit-encoding is introduced that provides fu...
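The idea of predicting references statically can be sketched for a direct-mapped instruction cache with a small data-flow analysis. Assumptions: each cache line is abstracted as the set of program lines that may be resident in it (None meaning empty), sets are merged by union where control flow joins, and iteration runs to a fixpoint over the control-flow graph. Each instruction reference is then classified as always-hit, always-miss, or unclassified. Finer categories (for instance, separating a miss that occurs only on a loop's first iteration) are lumped into "unclassified" here; the function and category names are hypothetical.

```python
from collections import defaultdict


def classify(blocks, succs, entry, line_size, n_lines):
    """blocks: name -> instruction addresses; succs: name -> successor names."""
    preds = defaultdict(list)
    for b, ss in succs.items():
        for s in ss:
            preds[s].append(b)
    empty = {cl: frozenset([None]) for cl in range(n_lines)}
    out = {b: None for b in blocks}  # None = block not yet reached

    def join(states):
        states = [s for s in states if s is not None]
        if not states:
            return None
        return {cl: frozenset().union(*(s[cl] for s in states)) for cl in range(n_lines)}

    def in_state(b):
        states = [out[p] for p in preds[b]]
        if b == entry:
            states.append(empty)  # the cache starts empty
        return join(states)

    def transfer(state, b):
        state = dict(state)
        for a in blocks[b]:
            pl = a // line_size
            state[pl % n_lines] = frozenset([pl])  # this line now holds pl
        return state

    changed = True
    while changed:  # sets only grow, so this terminates
        changed = False
        for b in blocks:
            st = in_state(b)
            if st is not None:
                new = transfer(st, b)
                if new != out[b]:
                    out[b], changed = new, True

    result = {}
    for b in blocks:
        st = in_state(b)
        if st is None:
            continue  # unreachable block
        st = dict(st)
        for a in blocks[b]:
            pl = a // line_size
            cl = pl % n_lines
            if st[cl] == {pl}:
                result[a] = "always-hit"
            elif pl not in st[cl]:
                result[a] = "always-miss"
            else:
                result[a] = "unclassified"
            st[cl] = frozenset([pl])
    return result
```

With a block B1 that loops on itself, its first instruction is unclassified because the line may or may not already be loaded on entry, while every later reference to the same line within the block is an always-hit.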