Fully homomorphic encryption with polylog overhead
Cited by 63
We show that homomorphic evaluation of (wide enough) arithmetic circuits can be accomplished with only polylogarithmic overhead. Namely, we present a construction of fully homomorphic encryption (FHE) schemes that for security parameter λ can evaluate any width-Ω(λ) circuit with t gates in time t · polylog(λ). To get low overhead, we use the recent batch homomorphic evaluation techniques of Smart-Vercauteren and Brakerski-Gentry-Vaikuntanathan, who showed that homomorphic operations can be applied to “packed” ciphertexts that encrypt vectors of plaintext elements. In this work, we introduce permuting/routing techniques to move plaintext elements across these vectors efficiently. Hence, we are able to implement general arithmetic circuits in a batched fashion without ever needing to “unpack” the plaintext vectors. We also introduce some other optimizations that can speed up homomorphic evaluation in certain cases. For example, we show how to use the Frobenius map to raise plaintext elements to powers of p at the “cost” of a linear operation.
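The packed-vector idea in this abstract can be pictured with a plaintext-only toy model. The sketch below involves no encryption at all and is not the paper's construction; it only illustrates how componentwise operations, cyclic rotations, and 0/1 masks can move slot contents around without unpacking the vector. The modulus `P` and all function names are hypothetical, chosen for illustration.

```python
from typing import List

P = 17  # hypothetical small plaintext modulus, for illustration only


def slot_add(a: List[int], b: List[int]) -> List[int]:
    """Componentwise addition of two packed vectors, modulo P."""
    return [(x + y) % P for x, y in zip(a, b)]


def slot_mul(a: List[int], b: List[int]) -> List[int]:
    """Componentwise multiplication of two packed vectors, modulo P."""
    return [(x * y) % P for x, y in zip(a, b)]


def rotate(a: List[int], k: int) -> List[int]:
    """Cyclically shift the slots left by k positions (negative k shifts right)."""
    k %= len(a)
    return a[k:] + a[:k]


def swap_adjacent(a: List[int]) -> List[int]:
    """Swap slots (0,1), (2,3), ... using two rotations and two 0/1 masks.

    Assumes len(a) is even.
    """
    even_mask = [1 if i % 2 == 0 else 0 for i in range(len(a))]
    odd_mask = [1 - m for m in even_mask]
    from_right = slot_mul(rotate(a, 1), even_mask)   # even slots take their right neighbour
    from_left = slot_mul(rotate(a, -1), odd_mask)    # odd slots take their left neighbour
    return slot_add(from_right, from_left)


if __name__ == "__main__":
    print(swap_adjacent([1, 2, 3, 4]))
```

In this model a fixed permutation costs a handful of rotations, maskings, and additions regardless of how many slots the vector holds, which is the intuition behind keeping data packed.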
Embedding graphs in books: a layout problem with applications to VLSI design
SIAM J. Algebraic Discrete Methods, 1987
Cited by 60
We study the graph-theoretic problem of embedding a graph in a book with its vertices in a line along the spine of the book and its edges on the pages in such a way that edges residing on the same page do not cross. This problem abstracts layout problems arising in the routing of multilayer printed circuit boards and in the design of fault-tolerant processor arrays. In devising an embedding, one strives to minimize both the number of pages used and the "cutwidth" of the edges on each page. Our main results (1) present optimal embeddings of a variety of families of graphs; (2) exhibit situations where one can achieve small pagenumber only at the expense of large cutwidth; and (3) establish bounds on the minimum pagenumber of a graph based on various structural properties of the graph. Notable in the last category are proofs that (a) every n-vertex d-valent graph can be embedded using O(d n^{1/2}) pages, and (b) for every d > 2 and all large n, there are n-vertex d-valent graphs whose pagenumber is at least log n]
Oblivious RAM Revisited
Cited by 31
We reinvestigate the oblivious RAM concept introduced by Goldreich and Ostrovsky, which enables a client that can store locally only a constant amount of data to store n data items remotely and to access them while hiding the identities of the items being accessed. Oblivious RAM is often cited as a powerful tool, which can be used, for example, for search on encrypted data or for preventing cache attacks. However, oblivious RAM is also commonly considered to be impractical due to its overhead, which is asymptotically efficient but is quite high: each data request is replaced by O(log^4 n) requests, or by O(log^3 n) requests where the constant in the “O” notation is a few thousands. In addition, O(n log n) external memory is required in order to store the n data items. We redesign the oblivious RAM protocol using modern tools, namely Cuckoo hashing and a new oblivious sorting algorithm. The resulting protocol uses only O(n) external memory, and replaces each data request by only O(log^2 n) requests (with a small constant). This analysis is validated by experiments that we ran. Keywords: Secure two-party computation, oblivious RAM.
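For readers unfamiliar with Cuckoo hashing, the following is a minimal sketch of the generic data structure only, not of the oblivious RAM protocol in this paper. Each key has exactly one candidate slot in each of two tables, so a lookup probes a fixed pair of locations. The class name, the parameters, the use of Python's built-in `hash`, and the small overflow stash are all assumptions made for illustration.

```python
from typing import Hashable, List, Optional


class CuckooTable:
    """Toy cuckoo hash set with two tables and an overflow stash.

    Keys must be hashable and must not be None (None marks an empty slot).
    """

    def __init__(self, capacity: int = 16, max_kicks: int = 32) -> None:
        self.capacity = capacity
        self.max_kicks = max_kicks
        self.tables: List[List[Optional[Hashable]]] = [
            [None] * capacity,
            [None] * capacity,
        ]
        self.stash: List[Hashable] = []  # keys that could not be placed

    def _slot(self, which: int, key: Hashable) -> int:
        # Hypothetical choice: derive two hash functions from Python's hash().
        return hash((which, key)) % self.capacity

    def contains(self, key: Hashable) -> bool:
        # Two table probes, plus a scan of the (ideally tiny) stash.
        for which in (0, 1):
            if self.tables[which][self._slot(which, key)] == key:
                return True
        return key in self.stash

    def insert(self, key: Hashable) -> None:
        if self.contains(key):
            return
        current = key
        which = 0
        for _ in range(self.max_kicks):
            i = self._slot(which, current)
            displaced = self.tables[which][i]
            self.tables[which][i] = current
            if displaced is None:
                return
            # The displaced key moves to its slot in the other table.
            current = displaced
            which = 1 - which
        # Too many evictions: park the last displaced key in the stash.
        self.stash.append(current)


if __name__ == "__main__":
    table = CuckooTable(capacity=8, max_kicks=4)
    for k in range(20):
        table.insert(k)
    print(all(table.contains(k) for k in range(20)), len(table.stash))
```

A production table would typically rehash or grow when evictions fail; the stash here is a simplification that keeps every inserted key findable.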
An Implementation of Back-Propagation Learning on GF11, a Large SIMD Parallel Computer
Parallel Computing, 1990
Cited by 21
Current connectionist simulations require huge computational resources. We describe a neural network simulator for the IBM GF11, an experimental SIMD machine with 566 processors and a peak arithmetic performance of 11 Gigaflops. We present our parallel implementation of the backpropagation learning algorithm, techniques for increasing efficiency, performance measurements on the NetTalk text-to-speech benchmark, and a performance model for the simulator. Our simulator currently runs the backpropagation learning algorithm at 900 million connections per second, where each "connection per second" includes both a forward and backward pass. This figure was obtained on the machine when only 356 processors were working; with all 566 processors operational, our simulation will run at over one billion connections per second. We conclude that the GF11 is well-suited to neural network simulation, and we analyze our use of the machine to determine which features are the most important for high per...
Hypercubic Sorting Networks
SIAM J. Comput., 1998
Cited by 18
This paper provides an analysis of a natural d-round tournament over n = 2^d players, and demonstrates that the tournament possesses a surprisingly strong ranking property. The ranking property of this tournament is used to design efficient sorting algorithms for a variety of different models of parallel computation: (i) a comparator network of depth c · lg n, c ≈ 7.44, that sorts the vast majority of the n! possible input permutations, (ii) an O(lg n)-depth hypercubic comparator network that sorts the vast majority of permutations, (iii) a hypercubic sorting network with nearly logarithmic depth, (iv) an O(lg n)-time randomized sorting algorithm for any hypercubic machine (other such algorithms have been previously discovered, but this algorithm has a significantly smaller failure probability than any previously known algorithm), and (v) a randomized algorithm for sorting n O(m)-bit records on an (n lg n)-node omega machine in O(m + lg n) bit steps. Key words. parallel sort...
Juggling Networks
1993
Cited by 18
Switching networks of various kinds have come to occupy a prominent position in computer science as well as communication engineering. The classical switching network technology has been space-division-multiplex switching, in which each switching function is performed by a spatially separate switching component (such as a crossbar switch). A recent trend in switching network technology has been the advent of time-division-multiplex switching, wherein a single switching component performs the function of many switches at successive moments of time according to a periodic schedule. This technology has the advantage that nearly all of the cost of the network is in inertial memory (such as delay lines), with the cost of switching elements growing much more slowly as a function of the capacity of the network. In order for a classical space-division-multiplex network to be adaptable to time-division-multiplex technology, its interconnection pattern must satisfy stringent requirements. For ...
Randomized Protocols for Low-Congestion Circuit Routing in Multistage Interconnection Networks
Cited by 16
In this paper we study randomized algorithms for circuit switching on multistage networks related to the butterfly. We devise algorithms that route messages by constructing circuits (or paths) for the messages with small congestion, dilation, and setup time. Our algorithms are based on the idea of having each message choose a route from two possibilities, a technique that has previously proven successful in simpler load balancing settings. As an application of our techniques, we propose a novel design for a data server.
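The "choose from two possibilities" idea mentioned in this abstract is easiest to see in the simpler balls-into-bins load balancing setting. The sketch below models only that simpler setting, not circuit routing on a butterfly network; the function name, parameters, and seed are hypothetical.

```python
import random
from typing import List


def assign(n_items: int, n_bins: int, choices: int, seed: int = 0) -> List[int]:
    """Place items one at a time; each item samples `choices` random bins
    (with replacement) and goes to the least loaded of them.

    Returns the final load of every bin.
    """
    rng = random.Random(seed)
    load = [0] * n_bins
    for _ in range(n_items):
        candidates = [rng.randrange(n_bins) for _ in range(choices)]
        best = min(candidates, key=lambda b: load[b])
        load[best] += 1
    return load


if __name__ == "__main__":
    n = 10_000
    print("one choice, max load:", max(assign(n, n, choices=1)))
    print("two choices, max load:", max(assign(n, n, choices=2)))
```

Running it with `choices=1` and `choices=2` side by side gives a feel for how much a second random option tends to flatten the maximum load; the exact numbers depend on the seed.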
Iterative Decoding of Concatenated Convolutional Codes: Implementation Issues
2007
Cited by 15
This tutorial paper gives an overview of the implementation aspects related to turbo decoders, where the term turbo generally refers to iterative decoders intended for Parallel Concatenated Convolutional Codes as well as for Serial Concatenated Convolutional Codes. We start by considering the general structure of iterative decoders, and the main features of the SISO algorithm that forms the heart of iterative decoders. Then, we show that very efficient parallel architectures are available for all types of turbo decoders allowing high speed implementations. Other implementation aspects like quantization issues and stopping rules used in conjunction with buffering for increasing throughput are considered. Finally, we perform an evaluation of the complexities of the turbo decoders as a function of the main parameters of the code.
Routing on butterfly networks with random faults
Proc. IEEE Annu Symp Foundations Comput Sci, IEEE Computer Society Press, Los Alamitos, CA, 1995
