Results 1  10
of
176
Performance analysis of kary ncube interconnection networks
 IEEE Transactions on Computers
, 1990
"... AbstmctVLSI communication networks are wirelimited. The cost of a network is not a function of the number of switches required, but rather a function of the wiring density required to construct the network. This paper analyzes communication networks of varying dimension under the assumption of co ..."
Abstract

Cited by 359 (18 self)
 Add to MetaCart
(Show Context)
AbstmctVLSI communication networks are wirelimited. The cost of a network is not a function of the number of switches required, but rather a function of the wiring density required to construct the network. This paper analyzes communication networks of varying dimension under the assumption of constant wire bisection. Expressions for the latency, average case throughput, and hotspot throughput of kary ncube networks with constant bisection are derived that agree closely with experimental measurements. It is shown that lowdimensional networks (e.g., tori) have lower latency and higher hotspot throughput than highdimensional networks (e.g., binary ncubes) with the same bisection width. Index Terms Communication networks, concurrent computing, interconnection networks, messagepassing multiprocessors, parallel processing, VLSI. I.
Programming Parallel Algorithms
, 1996
"... In the past 20 years there has been treftlendous progress in developing and analyzing parallel algorithftls. Researchers have developed efficient parallel algorithms to solve most problems for which efficient sequential solutions are known. Although some ofthese algorithms are efficient only in a th ..."
Abstract

Cited by 238 (10 self)
 Add to MetaCart
In the past 20 years there has been treftlendous progress in developing and analyzing parallel algorithftls. Researchers have developed efficient parallel algorithms to solve most problems for which efficient sequential solutions are known. Although some ofthese algorithms are efficient only in a theoretical framework, many are quite efficient in practice or have key ideas that have been used in efficient implementations. This research on parallel algorithms has not only improved our general understanding ofparallelism but in several cases has led to improvements in sequential algorithms. Unf:ortunately there has been less success in developing good languages f:or prograftlftling parallel algorithftls, particularly languages that are well suited for teaching and prototyping algorithms. There has been a large gap between languages
Parallel Algorithms for Hierarchical Clustering
 Parallel Computing
, 1995
"... Hierarchical clustering is a common method used to determine clusters of similar data points in multidimensional spaces. O(n 2 ) algorithms are known for this problem [3, 4, 10, 18]. This paper reviews important results for sequential algorithms and describes previous work on parallel algorithms f ..."
Abstract

Cited by 107 (2 self)
 Add to MetaCart
(Show Context)
Hierarchical clustering is a common method used to determine clusters of similar data points in multidimensional spaces. O(n 2 ) algorithms are known for this problem [3, 4, 10, 18]. This paper reviews important results for sequential algorithms and describes previous work on parallel algorithms for hierarchical clustering. Parallel algorithms to perform hierarchical clustering using several distance metrics are then described. Optimal PRAM algorithms using n log n processors are given for the average link, complete link, centroid, median, and minimum variance metrics. Optimal butterfly and tree algorithms using n log n processors are given for the centroid, median, and minimum variance metrics. Optimal asymptotic speedups are achieved for the best practical algorithm to perform clustering using the single link metric on a n log n processor PRAM, butterfly, or tree. Keywords. Hierarchical clustering, pattern analysis, parallel algorithm, butterfly network, PRAM algorithm. 1 In...
Basic Techniques for the Efficient Coordination of Very Large Numbers of Cooperating Sequential Processors
, 1981
"... In this paper we implement several basic operating system primitives by using a "replaceadd" operation, which can supersede the standard "test and set", and which appears to be a universal primitive for efficiently coordinating large numbers of independently acting sequential pr ..."
Abstract

Cited by 93 (2 self)
 Add to MetaCart
In this paper we implement several basic operating system primitives by using a "replaceadd" operation, which can supersede the standard "test and set", and which appears to be a universal primitive for efficiently coordinating large numbers of independently acting sequential processors. We also present a hardware implementation of replaceadd that permits multiple replaceadds to be processed nearly as efficiently as loads and stores. Moreover, the crucial special case of concurrent replaceadds updating the same variable is handled particularly well: If every PE simultaneously addresses a replaceadd at the same variable, all these requests are satisfied in the time required to process just one request.
Randomized routing and sorting on fixedconnection networks
 JOURNAL OF ALGORITHMS
, 1994
"... This paper presents a general paradigm for the design of packet routing algorithms for fixedconnection networks. Its basis is a randomized online algorithm for scheduling any set of N packets whose paths have congestion c on any boundeddegree leveled network with depth L in O(c + L + log N) steps ..."
Abstract

Cited by 86 (13 self)
 Add to MetaCart
(Show Context)
This paper presents a general paradigm for the design of packet routing algorithms for fixedconnection networks. Its basis is a randomized online algorithm for scheduling any set of N packets whose paths have congestion c on any boundeddegree leveled network with depth L in O(c + L + log N) steps, using constantsize queues. In this paradigm, the design of a routing algorithm is broken into three parts: (1) showing that the underlying network can emulate a leveled network, (2) designing a path selection strategy for the leveled network, and (3) applying the scheduling algorithm. This strategy yields randomized algorithms for routing and sorting in time proportional to the diameter for meshes, butterflies, shuffleexchange graphs, multidimensional arrays, and hypercubes. It also leads to the construction of an areauniversal network: an Nnode network with area Θ(N) that can simulate any other network of area O(N) with slowdown O(log N).
Local parallel biomolecular computing
 DNA Based Computers III, volume 48 of DIMACS
, 1999
"... Biomolecular Computation(BMC) is computation at the molecular scale, using biotechnology engineering techniques. Most proposed methods for BMC used distributed (molecular) parallelism (DP); where operations are executed in parallel on large numbers of distinct molecules. BMC done exclusively by DP r ..."
Abstract

Cited by 56 (16 self)
 Add to MetaCart
(Show Context)
Biomolecular Computation(BMC) is computation at the molecular scale, using biotechnology engineering techniques. Most proposed methods for BMC used distributed (molecular) parallelism (DP); where operations are executed in parallel on large numbers of distinct molecules. BMC done exclusively by DP requires that the computation execute sequentially within any given molecule (though done in parallel for multiple molecules). In contrast, local parallelism (LP) allows operations to be executed in parallel on each given molecule. Winfree, et al [W96, WYS96]) proposed an innovative method for LPBMC, that of computation by unmediated selfassembly of � arrays of DNA molecules, applying known domino tiling techniques (see Buchi [B62], Berger [B66], Robinson [R71], and Lewis and Papadimitriou [LP81]) in combination with the DNA selfassembly techniques of Seeman et al [SZC94]. The likelihood for successful unmediated selfassembly of computations has not been determined (we discuss a simple model of assembly where there may be blockages in selfassembly, but more sophisticated models may have a higher likelihood of success). We develop improved techniques to more fully exploit the potential power of LPBMC. To increase
Randomized Routing on FatTrees
 Advances in Computing Research
, 1996
"... Fattrees are a class of routing networks for hardwareefficient parallel computation. This paper presents a randomized algorithm for routing messages on a fattree. The quality of the algorithm is measured in terms of the load factor of a set of messages to be routed, which is a lower bound on the ..."
Abstract

Cited by 55 (11 self)
 Add to MetaCart
Fattrees are a class of routing networks for hardwareefficient parallel computation. This paper presents a randomized algorithm for routing messages on a fattree. The quality of the algorithm is measured in terms of the load factor of a set of messages to be routed, which is a lower bound on the time required to deliver the messages. We show that if a set of messages has load factor on a fattree with n processors, the number of delivery cycles (routing attempts) that the algorithm requires is O(+lg n lg lg n) with probability 1 \Gamma O(1=n). The best previous bound was O( lg n) for the offline problem in which the set of messages is known in advance. In the context of a VLSI model that equates hardware cost with physical volume, the routing algorithm can be used to demonstrate that fattrees are universal routing networks. Specifically, we prove that any routing network can be efficiently simulated by a fattree of comparable hardware cost. 1 Introduction Fattrees constitute...
Designing Least Cost Nonblocking Broadband Networks
, 1997
"... Integrated network technologies, such as ATM, support multimedia applications with vastly different bandwidth needs, connection request rates, and holding patterns. Due to their high level of flexibility and communication rates approaching several gigabits per second, the classical network planning ..."
Abstract

Cited by 54 (7 self)
 Add to MetaCart
Integrated network technologies, such as ATM, support multimedia applications with vastly different bandwidth needs, connection request rates, and holding patterns. Due to their high level of flexibility and communication rates approaching several gigabits per second, the classical network planning techniques, which rely heavily on statistical analysis, are less relevant to this new generation of networks. In this paper, we propose a new model for broadband networks and investigate the question of their optimal topology from a worstcase performance point of view. Our model is more flexible and realistic than others in the literature, and our worstcase bounds are among the first in this area. Our results include a proof of intractability for some simple versions of the network design problem, and efficient approximation algorithms for designing nonblocking networks of provably small cost. More specifically, assuming some mild global traffic constraints, we show that a minimumcost non...
Private Set Intersection: Are Garbled Circuits Better than Custom Protocols?
, 2012
"... Cryptographic protocols for Private Set Intersection (PSI) are the basis for many important privacypreserving applications. Over the past few years, intensive research has been devoted to designing custom protocols for PSI based on homomorphic encryption and other publickey techniques, apparently ..."
Abstract

Cited by 50 (7 self)
 Add to MetaCart
(Show Context)
Cryptographic protocols for Private Set Intersection (PSI) are the basis for many important privacypreserving applications. Over the past few years, intensive research has been devoted to designing custom protocols for PSI based on homomorphic encryption and other publickey techniques, apparently due to the belief that solutions using generic approaches would be impractical. This paper explores the validity of that belief. We develop three classes of protocols targeted to different set sizes and domains, all based on Yao’s generic garbledcircuit method. We then compare the performance of our protocols to the fastest custom PSI protocols in the literature. Our results show that a careful application of garbled circuits leads to solutions that can run on millionelement sets on typical desktops, and that can be competitive with the fastest custom protocols. Moreover, generic protocols like ours can be used directly for performing more complex secure computations, something we demonstrate by adding a simple informationauditing mechanism to our PSI protocols.