Results 1–10 of 32
Recursive Array Layouts and Fast Parallel Matrix Multiplication
In Proceedings of the Eleventh Annual ACM Symposium on Parallel Algorithms and Architectures, 1999
Cited by 53 (4 self)
Matrix multiplication is an important kernel in linear algebra algorithms, and the performance of both serial and parallel implementations is highly dependent on the memory system behavior. Unfortunately, due to false sharing and cache conflicts, traditional column-major or row-major array layouts incur high variability in memory system performance as matrix size varies. This paper investigates the use of recursive array layouts for improving the performance of parallel recursive matrix multiplication algorithms. We extend previous work by Frens and Wise on recursive matrix multiplication to examine several recursive array layouts and three recursive algorithms: standard matrix multiplication, and the more complex algorithms of Strassen and Winograd. We show that while recursive array layouts significantly outperform traditional layouts (reducing execution times by a factor of 1.2–2.5) for the standard algorithm, they offer little improvement for Strassen's and Winograd's algorithms;...
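As a rough illustration of what a recursive array layout can look like, the sketch below stores a square matrix in Z-Morton order, one commonly used recursive layout. The abstract does not say which layouts the paper examines, so the choice of Z-Morton order and the names `morton_index` and `to_morton_layout` are assumptions made for this example only.

```python
def morton_index(row: int, col: int, bits: int = 16) -> int:
    """Interleave the bits of row and col (assumed Z-Morton order).

    Column bits land in even positions and row bits in odd positions,
    so each 2x2 quadrant of the matrix occupies a contiguous range.
    """
    index = 0
    for b in range(bits):
        index |= ((col >> b) & 1) << (2 * b)
        index |= ((row >> b) & 1) << (2 * b + 1)
    return index


def to_morton_layout(matrix):
    """Copy an n x n matrix (n a power of two) into a flat Z-Morton list."""
    n = len(matrix)
    flat = [None] * (n * n)
    for r in range(n):
        for c in range(n):
            flat[morton_index(r, c)] = matrix[r][c]
    return flat


# Hypothetical 4x4 example: element (r, c) holds r * 4 + c.
example = [[r * 4 + c for c in range(4)] for r in range(4)]
layout = to_morton_layout(example)
```

In this layout the first four entries of `layout` are the top-left 2x2 block, which is the property a recursive (quadrant-by-quadrant) multiplication can exploit for locality.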
Recursive Array Layouts and Fast Matrix Multiplication
1999
Cited by 36 (0 self)
The performance of both serial and parallel implementations of matrix multiplication is highly sensitive to memory system behavior. False sharing and cache conflicts cause traditional column-major or row-major array layouts to incur high variability in memory system performance as matrix size varies. This paper investigates the use of recursive array layouts to improve performance and reduce variability. Previous work on recursive matrix multiplication is extended to examine several recursive array layouts and three recursive algorithms: standard matrix multiplication, and the more complex algorithms of Strassen and Winograd. While recursive layouts significantly outperform traditional layouts (reducing execution times by a factor of 1.2–2.5) for the standard algorithm, they offer little improvement for Strassen's and Winograd's algorithms. For a purely sequential implementation, it is possible to reorder computation to conserve memory space and improve performance between ...
Efficient Calculation of Spectral Coefficients and Their Applications
IEEE Trans. on CAD/ICAS, 1995
Cited by 24 (7 self)
Spectral methods for analysis and design of digital logic circuits have been proposed and developed for several years. The widespread use of these techniques has suffered due to the associated computational complexity. This paper presents a new approach for the computation of spectral coefficients with polynomial complexity. Usually, the computation of the spectral coefficients involves the evaluation of inner products of vectors of exponential length. In the new approach, it is not necessary to compute inner products, rather, each spectral coefficient is expressed in terms of a measure of correlation between two Boolean functions. This formulation coupled with compact BDD representations of the functions reduces the overall complexity. Further, some computer aided design applications are presented that can make use of the new spectrum evaluation approach. In particular, the basis for a synthesis method that allows spectral coefficients to be computed in an iterative manner ...
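To make the idea of a spectral coefficient as a correlation measure concrete, here is a minimal sketch that assumes a Walsh-type spectrum, where each coefficient compares the function against the XOR of a chosen subset of its inputs. It enumerates the full truth table, so it has exponential cost and is not the BDD-based polynomial method the abstract describes; the name `walsh_coefficient` and the `mask` parameter are hypothetical.

```python
from itertools import product


def walsh_coefficient(f, n, mask):
    """Correlation between f and the XOR of the inputs selected by mask.

    Returns (number of agreements) - (number of disagreements) over all
    2**n input assignments. Brute-force illustration only.
    """
    total = 0
    for bits in product((0, 1), repeat=n):
        parity = sum(b for b, m in zip(bits, mask) if m) % 2
        total += 1 if f(*bits) == parity else -1
    return total


# Hypothetical usage with two-input functions.
xor_with_xor = walsh_coefficient(lambda a, b: a ^ b, 2, (1, 1))
and_with_const = walsh_coefficient(lambda a, b: a & b, 2, (0, 0))
```

A coefficient of large magnitude means the function agrees (or disagrees) with the chosen parity function on most inputs, and a coefficient of zero means the two are uncorrelated.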
An Efficient Homomorphic Encryption Protocol for Multi-User Systems
Cited by 5 (0 self)
The homomorphic encryption problem has been an open one for three decades. Recently, Gentry has proposed a full solution. Subsequent works have made improvements on it. However, the time complexities of these algorithms are still too high for practical use. For example, Gentry’s homomorphic encryption scheme takes more than 900 seconds to add two 32-bit numbers, and more than 67000 seconds to multiply them. In this paper, we develop a non-circuit-based symmetric-key homomorphic encryption scheme. It is proven that the security of our encryption scheme is equivalent to the large integer factorization problem, and it can withstand an attack with up to m ln poly(λ) chosen plaintexts for any predetermined m, where λ is the security parameter. Multiplication, encryption, and decryption are almost linear in mλ, and addition is linear in mλ. Performance analyses show that our algorithm runs multiplication in 108 milliseconds and addition in a tenth of a millisecond for λ = 1024 and m = 16. We further consider practical multiple-user data-centric applications. Existing homomorphic encryption schemes only consider one master key. To allow multiple users to retrieve data from a server, all users need to have the same key. In this paper, we propose to transform the master encryption key into different user keys and develop a protocol to support correct and secure communication between the users and the server using different user keys. In order to prevent collusion between some user and the server to derive the master key, one or more key agents can be added to mediate the interaction.
Design of Reversible Sequential Circuits Optimizing Quantum Cost, Delay, and Garbage Outputs
Cited by 3 (3 self)
Reversible logic has shown potential to have extensive applications in emerging technologies such as quantum computing, optical computing, quantum-dot cellular automata, and ultra-low-power VLSI circuits. Recently, several researchers have focused their efforts on the design and synthesis of efficient reversible logic circuits. In these works, the primary design focus has been on optimizing the number of reversible gates and the garbage outputs. The number of reversible gates is not a good optimization metric, as each reversible gate is of a different type and computational complexity, and thus will have a different quantum cost and delay. The computational complexity of a reversible gate can be represented by its quantum cost. Further, delay constitutes an important metric, which has not been addressed in prior works on reversible sequential circuits as a design metric to be optimized. In this work, we present novel designs of reversible sequential circuits that are optimized in terms of quantum cost, delay, and garbage outputs. The optimized designs of several reversible sequential circuits are presented, including the D latch, the JK latch, the T latch, and the SR latch, and their corresponding reversible master-slave flip-flop designs. The proposed master-slave flip-flop designs have the special property that they don’t require the inversion of the clock for use in the slave latch. Further, we introduce a novel strategy of cascading a Fredkin gate
Efficient Spectral Coefficient Calculation Using Circuit Output Probabilities
Digital Signal Processing: A Review Journal, 1994
Cited by 3 (3 self)
Many problems in the field of digital logic may be solved more efficiently in the spectral domain than in the Boolean domain. However, the primary drawback of spectral techniques is the large complexity associated with the calculation of the spectrum of a Boolean function. We present a new method for the computation of a spectral coefficient that has a complexity equal to O(|E|), where |E| is the number of edges in a binary decision diagram characterizing the circuit. This result is especially significant for techniques that require the calculation of only a few spectral coefficients, since it allows the computations to be accomplished very efficiently and does not require storage resources for a large number of values. Furthermore, this method holds for any general spectral transform and does not require the transformation matrix to be recursively defined or sparse. 1 Introduction There have been many applications proposed and developed using spectral methods for logic c...
Linking anonymous transactions: The consistent view attack
In Proceedings of Privacy Enhancing Technologies, 6th International Workshop, PET 2006, number 4258 in Lecture Notes in Computer Science, 2006
Cited by 2 (1 self)
In this paper we study a particular attack that may be launched by cooperating organisations in order to link the transactions and the pseudonyms of the users of an anonymous credential system. The results of our analysis are both positive and negative. The good (resp. bad) news, from a privacy protection (resp. evidence gathering) viewpoint, is that the attack may be computationally intensive. In particular, it requires solving a problem that is polynomial-time equivalent to ALLSAT. The bad (resp. good) news is that a typical instance of this problem may be efficiently solvable.
Applications and Efficient Computation of Spectral Coefficients for Digital Logic
1994
Cited by 2 (2 self)
Spectral methods for analysis and design of digital logic circuits have been proposed and developed for several years. The widespread use of these techniques has suffered due to the associated computational complexity. This paper presents a new approach for the computation of spectral coefficients with polynomial complexity. Usually, the computation of the spectral coefficients involves the evaluation of inner products of vectors of exponential length. In the new approach, it is not necessary to compute inner products, rather, each spectral coefficient is expressed in terms of a measure of correlation between two Boolean functions. This formulation coupled with compact BDD representations of the functions reduces the overall complexity. Further, some computer aided design applications are presented that can make use of the new spectrum evaluation approach. In particular, a spectralbased synthesis algorithm that allows spectral coefficients to be computed in an iterative man...
CYSEP: A CYBERSECURITY PROCESSOR FOR 10 GBPS NETWORKS AND BEYOND
Cited by 1 (0 self)
In this paper, we describe the architecture of a CyberSecurity Processor (CYSEP) which can serve as a key module for enhancing security for high-speed networks/systems. The CYSEP supports, at wire speed, four major functions, namely, firewall/intrusion detection, encryption/decryption, message authentication, and distributed denial of service (DDoS) attack protection. The CYSEP is to be implemented on an application-specific integrated circuit (ASIC) with state-of-the-art CMOS 0.18-micron technology and is expected to operate at 10 Gbps or higher. Massive parallelism and pipelining techniques are to be employed in the ASIC to achieve the 10 Gbps wire-speed operation.
Enhancing and Using an Automatic Design System for Creating FPGAs
2005
Cited by 1 (0 self)
The creation of integrated circuits has progressed from custom design and layout to the less time-intensive implementation media of ASICs and FPGAs. FPGAs provide the lowest development cost and fastest development time; however, the design of the FPGA itself is still a time-consuming, expensive, custom layout task that takes at least 50 person-years to complete. This work explores new techniques to automate the design and layout of FPGAs. An existing automatic layout system is improved by changing the grouping of transistors that form the basic building blocks of the system. These improvements result in a 16.8% area savings over previous versions and only a 36% area increase compared to equivalent custom designs. The system was also extended to create the first automatic layout of an FPGA from a generic architecture description. These improvements and additions suggest that the automatic layout system is a viable alternative to custom layout of FPGAs.
Acknowledgements. I would like to thank my supervisor, Professor Jonathan Rose, for his advice and guidance in all aspects of this work and my education. Also, Ian Kuon deserves my profound thanks and gratitude for his achievements and cooperation that led to the completion of this work. This work would not have been possible without the people who worked on it