A Precorrected-FFT Method for Electrostatic Analysis of Complicated 3D Structures
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 1997
Cited by 69 (26 self)
In this paper we present a new algorithm for accelerating the potential calculation which occurs in the inner loop of iterative algorithms for solving electromagnetic boundary integral equations. Such integral equations arise, for example, in the extraction of coupling capacitances in three-dimensional (3D) geometries. We present extensive experimental comparisons with the capacitance extraction code FASTCAP [1] and demonstrate that, for a wide variety of geometries commonly encountered in integrated circuit packaging, on-chip interconnect and microelectromechanical systems, the new "precorrected-FFT" algorithm is superior to the fast multipole algorithm used in FASTCAP in terms of execution time and memory use. At engineering accuracies, in terms of a speed-memory product, the new algorithm can be superior to the fast-multipole-based schemes by more than an order of magnitude.
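The abstract does not list the algorithm's stages; the precorrected-FFT method is usually summarized as projecting panel charges onto a uniform grid, computing grid potentials as a discrete convolution with the FFT, interpolating back to the panels, and directly correcting nearby interactions. The sketch below shows only the convolution stage, assuming a 1D grid and a 1/r kernel whose singular self-term is simply set to zero. The function names are hypothetical, and the real method works on a 3D grid with projection, interpolation, and precorrection stages that are omitted here.

```python
import cmath


def fft(a, invert=False):
    """Recursive radix-2 Cooley-Tukey FFT (unnormalized); len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return [complex(a[0])]
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def kernel(r):
    """1/r Green's function; the singular self-interaction is zeroed (assumption)."""
    return 0.0 if r == 0 else 1.0 / r


def grid_potential_fft(q, h):
    """phi_i = sum_j kernel(|i - j| h) q_j in O(n log n) via a circulant embedding."""
    n = len(q)
    size = 1
    while size < 2 * n:
        size *= 2
    # Embed the Toeplitz kernel: positive offsets at the front, negative ones wrapped to the back.
    k = [0.0] * size
    for d in range(n):
        k[d] = kernel(d * h)
        if d > 0:
            k[size - d] = kernel(d * h)
    padded = list(q) + [0.0] * (size - n)
    spectrum = [a * b for a, b in zip(fft(k), fft(padded))]
    conv = fft(spectrum, invert=True)
    return [conv[i].real / size for i in range(n)]


def grid_potential_direct(q, h):
    """O(n^2) reference sum for checking the FFT result."""
    n = len(q)
    return [sum(kernel(abs(i - j) * h) * q[j] for j in range(n)) for i in range(n)]


q = [1.0, -2.0, 0.5, 3.0, 0.0, 1.5]
fast = grid_potential_fft(q, h=0.1)
slow = grid_potential_direct(q, h=0.1)
print(max(abs(a - b) for a, b in zip(fast, slow)))  # floating-point round-off only
```

The FFT path costs O(n log n) in the number of grid points versus O(n^2) for the direct loop; the grid represents nearby interactions poorly, which is why the full method corrects those directly.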
Load Balancing and Data Locality in Adaptive Hierarchical N-body Methods: Barnes-Hut, Fast Multipole, and Radiosity
Journal of Parallel and Distributed Computing, 1995
Cited by 64 (2 self)
... processes, are increasingly being used to solve large-scale problems in a variety of scientific/engineering domains. Applications that use these methods are challenging to parallelize effectively, however, owing to their nonuniform, dynamically changing characteristics and their need for long-range communication.
Clustering for Glossy Global Illumination
ACM Transactions on Graphics, 1997
Cited by 62 (4 self)
We present a new clustering algorithm for global illumination in complex environments. The new algorithm extends previous work on clustering for radiosity to allow for non-diffuse (glossy) reflectors. We represent clusters as points with directional distributions of outgoing and incoming radiance and importance, and we derive an error bound for transfers between these clusters. The algorithm groups input surfaces into a hierarchy of clusters, and then permits clusters to interact only if the error bound is below an acceptable tolerance. We show that the algorithm is asymptotically more efficient than previous clustering algorithms even when restricted to ideally diffuse environments. Finally, we demonstrate the performance of our method on two complex glossy environments.
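The refinement rule described above (clusters interact only when their error bound is below a tolerance, otherwise the hierarchy is descended) can be sketched as a recursive link-refinement loop. This is a simplified sketch: `error_bound` is a hypothetical placeholder based only on cluster size, a scalar "power", and distance, not the paper's bound for glossy transfers, and splitting the larger cluster first is a choice made here for illustration.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Cluster:
    center: tuple
    radius: float
    power: float  # hypothetical scalar stand-in for radiance/importance
    children: list = field(default_factory=list)


def error_bound(a, b):
    """Hypothetical bound: larger, brighter, closer clusters give larger errors."""
    d = math.dist(a.center, b.center)
    if d <= a.radius + b.radius:
        return math.inf  # overlapping clusters are never accepted
    return a.power * b.power * (a.radius + b.radius) / d ** 3


def refine(a, b, eps, links):
    """Record an a-b link if the bound is below eps; otherwise split a cluster and recurse."""
    if error_bound(a, b) < eps:
        links.append((a, b))
    elif a.children and (a.radius >= b.radius or not b.children):
        for child in a.children:  # split the larger cluster when possible
            refine(child, b, eps, links)
    elif b.children:
        for child in b.children:
            refine(a, child, eps, links)
    else:
        links.append((a, b))  # two leaves: the transfer is computed directly


def leaf(x):
    return Cluster((x, 0.0, 0.0), 0.1, 1.0)


left = Cluster((0.5, 0.0, 0.0), 0.7, 2.0, [leaf(0.2), leaf(0.8)])
right = Cluster((10.5, 0.0, 0.0), 0.7, 2.0, [leaf(10.2), leaf(10.8)])
for eps in (1e-2, 1e-6):
    links = []
    refine(left, right, eps, links)
    print(eps, len(links))  # loose tolerance: 1 cluster link; tight: 4 leaf links
```

Loosening the tolerance lets whole clusters interact through a single link, which is where the savings over surface-to-surface transfers come from.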
Efficient Reduced-Order Modeling of Frequency-Dependent Coupling Inductances associated with 3D Interconnect Structures
1994
Cited by 52 (10 self)
Reduced-order modeling techniques are now commonly used to efficiently simulate circuits combined with interconnect, but generating reduced-order models from realistic 3D structures has received less attention. In this paper we describe a Krylov-subspace-based method for deriving reduced-order models directly from the 3D magnetoquasistatic analysis program FastHenry. This new approach is no more expensive than computing an impedance matrix at a single frequency.
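The abstract does not give the reduction algorithm. As a generic illustration of Krylov-subspace model reduction (not the paper's FastHenry-specific formulation), the sketch below assumes a small dense system whose transfer function is written as $H(s) = c^T (I - sA)^{-1} b$, so its Taylor coefficients are the moments $m_k = c^T A^k b$. Projecting onto an Arnoldi basis of span{b, Ab, ..., A^(q-1) b} yields a q-state model whose first q moments match the original's. The matrices and function names are hypothetical, and Arnoldi breakdown is not handled.

```python
import math


def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def arnoldi_basis(A, b, q):
    """Orthonormal basis of span{b, Ab, ..., A^(q-1) b} via modified Gram-Schmidt.
    Assumes no breakdown (the Krylov vectors stay linearly independent)."""
    norm_b = math.sqrt(dot(b, b))
    V = [[x / norm_b for x in b]]
    for _ in range(q - 1):
        w = matvec(A, V[-1])
        for v in V:
            h = dot(v, w)
            w = [wi - h * vi for wi, vi in zip(w, v)]
        norm_w = math.sqrt(dot(w, w))
        V.append([x / norm_w for x in w])
    return V  # q basis vectors, each of length n


def reduce_model(A, b, c, q):
    """Galerkin projection: A_r = V^T A V, b_r = V^T b, c_r = V^T c."""
    V = arnoldi_basis(A, b, q)
    AV = [matvec(A, v) for v in V]
    Ar = [[dot(vi, avj) for avj in AV] for vi in V]
    return Ar, [dot(v, b) for v in V], [dot(v, c) for v in V]


def moments(A, b, c, count):
    """m_k = c^T A^k b for k = 0 .. count-1."""
    out, x = [], list(b)
    for _ in range(count):
        out.append(dot(c, x))
        x = matvec(A, x)
    return out


A = [[-2.0, 1.0, 0.0, 0.0],
     [0.5, -3.0, 1.0, 0.0],
     [0.0, 0.4, -1.5, 0.7],
     [0.1, 0.0, 0.3, -1.0]]
b = [1.0, 0.0, 0.5, 0.0]
c = [0.0, 1.0, 0.0, 1.0]
Ar, br, cr = reduce_model(A, b, c, q=3)
print(moments(A, b, c, 4))
print(moments(Ar, br, cr, 4))  # the first 3 entries agree up to round-off
```

This one-sided projection matches q moments with q states; two-sided (Lanczos-type) variants match more moments per state, a trade-off this sketch does not explore.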
Barycentric Lagrange Interpolation
SIAM Rev
Cited by 52 (4 self)
Barycentric interpolation is a variant of Lagrange polynomial interpolation that is fast and stable. It deserves to be known as the standard method of polynomial interpolation.
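A minimal sketch of the second ("true") barycentric formula: the weights $w_j = 1/\prod_{k \ne j}(x_j - x_k)$ are computed once in O(n^2), after which each evaluation $p(x) = \left(\sum_j \frac{w_j f_j}{x - x_j}\right) \big/ \left(\sum_j \frac{w_j}{x - x_j}\right)$ costs O(n). The function names and the test polynomial are illustrative.

```python
import math


def bary_weights(xs):
    """w_j = 1 / prod_{k != j} (x_j - x_k); O(n^2), computed once per node set."""
    w = []
    for j, xj in enumerate(xs):
        prod = 1.0
        for k, xk in enumerate(xs):
            if k != j:
                prod *= xj - xk
        w.append(1.0 / prod)
    return w


def bary_eval(x, xs, fs, w):
    """Second (true) barycentric formula; O(n) per evaluation point."""
    num = den = 0.0
    for xj, fj, wj in zip(xs, fs, w):
        if x == xj:
            return fj  # exactly on a node: avoid division by zero
        t = wj / (x - xj)
        num += t * fj
        den += t
    return num / den


# Chebyshev points of the second kind on [-1, 1]
n = 4
xs = [math.cos(j * math.pi / n) for j in range(n + 1)]
f = lambda x: x**3 - 2 * x + 1
fs = [f(x) for x in xs]
w = bary_weights(xs)
print(bary_eval(0.3, xs, fs, w))  # matches f(0.3) = 0.427 up to round-off
```

For Chebyshev points of the second kind the weights are known in closed form, proportional to $(-1)^j$ with the two endpoint weights halved, so the O(n^2) setup can be skipped; any common scale factor cancels between numerator and denominator.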
Effective Flow Analysis for Avoiding Run-Time Checks
In Proceedings of the 1995 International Static Analysis Symposium, 1995
Cited by 49 (5 self)
This paper describes a general-purpose program analysis that computes global control-flow and data-flow information for higher-order, call-by-value programs. This information can be used to drive global program optimizations such as inlining and run-time check elimination, as well as optimizations like constant folding and loop-invariant code motion that are typically based on special-purpose local analyses. The analysis employs a novel approximation technique called polymorphic splitting that uses let-expressions as syntactic clues to gain precision. Polymorphic splitting borrows ideas from Hindley-Milner polymorphic type inference systems to create an analog to polymorphism for flow analysis. Experimental results derived from an implementation of the analysis for Scheme indicate that the analysis is extremely precise and has reasonable cost. In particular, it eliminates significantly more run-time checks than simple flow analyses (i.e., 0CFA) or analyses based on type ...
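The abstract compares polymorphic splitting against simple flow analyses such as 0CFA. The sketch below implements only that 0CFA baseline, not polymorphic splitting, for a hypothetical four-form lambda language (`var`, `lam`, `app`, `let`), assuming every bound variable has a unique name. It iterates the flow constraints to a fixed point and shows the imprecision in question: a let-bound identity function applied to two different procedures merges both results at each call.

```python
from collections import defaultdict


class Node:
    """AST node (hashed by identity); kind is 'var', 'lam', 'app', or 'let'."""
    def __init__(self, kind, *kids, name=None):
        self.kind, self.kids, self.name = kind, kids, name


def Var(x): return Node('var', name=x)
def Lam(x, body): return Node('lam', body, name=x)
def App(f, arg): return Node('app', f, arg)
def Let(x, rhs, body): return Node('let', rhs, body, name=x)


def subterms(e):
    yield e
    for k in e.kids:
        yield from subterms(k)


def zero_cfa(program):
    """Map each subterm to the set of lambda nodes it may evaluate to (0CFA)."""
    cache, env = defaultdict(set), defaultdict(set)  # per-node / per-variable flow sets
    nodes = list(subterms(program))

    def grow(target, values):
        before = len(target)
        target |= values
        return len(target) != before

    changed = True
    while changed:  # iterate the constraints until no set grows
        changed = False
        for n in nodes:
            if n.kind == 'lam':
                changed |= grow(cache[n], {n})
            elif n.kind == 'var':
                changed |= grow(cache[n], env[n.name])
            elif n.kind == 'let':
                rhs, body = n.kids
                changed |= grow(env[n.name], cache[rhs])
                changed |= grow(cache[n], cache[body])
            elif n.kind == 'app':
                f, arg = n.kids
                for lam in list(cache[f]):  # every procedure that may reach the operator
                    changed |= grow(env[lam.name], cache[arg])
                    changed |= grow(cache[n], cache[lam.kids[0]])
    return cache


# let id = (lambda x. x) in let u = id (lambda a. a) in id (lambda b. b)
lam_a, lam_b = Lam('a', Var('a')), Lam('b', Var('b'))
call_a, call_b = App(Var('id'), lam_a), App(Var('id'), lam_b)
program = Let('id', Lam('x', Var('x')), Let('u', call_a, call_b))
flows = zero_cfa(program)
print(flows[call_b] == {lam_a, lam_b})  # True: 0CFA merges both uses of id
```

Polymorphic splitting, as the abstract describes it, would use the `let` binding of `id` as a cue to analyze its uses separately, so that `call_a` sees only `lam_a` and `call_b` only `lam_b`.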
The Parallel Multipole Method on the Connection Machine
SIAM J. on Scientific & Statistical Computing, 12(6):1420-1437, 1991
Cited by 46 (5 self)
This paper reports on a fast implementation of the three-dimensional non-adaptive Parallel Multipole Method (PMM) on the Connection Machine system model CM-2. The data interactions within the decomposition tree are modeled by a hierarchy of three-dimensional grids forming a pyramid in which parent nodes have degree eight. The base of the pyramid is embedded in the Connection Machine as a three-dimensional grid, using the standard grid embedding feature. For 10 or more particles per processor, the communication time is insignificant. The evaluation of the potential field for a system with 128k particles takes 5 seconds, and for a million-particle system about 3 minutes. The maximum number of particles that can be represented in 2 Gbytes of primary storage is about 50 million. The execution rate of this implementation of the PMM is about 1.7 Gflop/s for a particle-to-processor ratio of 10 or greater. A further speed improvement is possible by an improved use of the memory hierarchy associate...
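The pyramid of grids described above, with each parent cell having eight children, can be sketched as a sequential upward pass. This is an illustration only, assuming an n x n x n base grid with n a power of two, stored as a dictionary keyed by (i, j, k); it aggregates just the total charge per cell (the lowest-order multipole term), whereas the actual method translates full multipole expansions along the same parent-child links and runs data-parallel on the CM-2.

```python
from itertools import product


def upward_pass(base, n):
    """base maps (i, j, k) -> charge on an n x n x n grid (n a power of two).
    Returns the levels from finest to coarsest; parent (i, j, k) sums its
    eight children (2i + di, 2j + dj, 2k + dk) with di, dj, dk in {0, 1}."""
    levels = [base]
    while n > 1:
        n //= 2
        child = levels[-1]
        parent = {}
        for i, j, k in product(range(n), repeat=3):
            parent[(i, j, k)] = sum(
                child[(2 * i + di, 2 * j + dj, 2 * k + dk)]
                for di, dj, dk in product((0, 1), repeat=3)
            )
        levels.append(parent)
    return levels


n = 4
base = {(i, j, k): i + j + k for i, j, k in product(range(n), repeat=3)}
levels = upward_pass(base, n)
print([len(level) for level in levels])  # [64, 8, 1]
print(levels[-1][(0, 0, 0)])             # 288, the total charge of the base grid
```

Because every level is itself a regular 3D grid, the parent-child index arithmetic stays fixed, which is what makes a grid embedding on a machine like the CM-2 natural.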
Flow-Directed Inlining
In Proceedings of the ACM Conference on Programming Language Design and Implementation, 1996
Cited by 41 (2 self)
A flow-directed inlining strategy uses information derived from control-flow analysis to specialize and inline procedures for functional and object-oriented languages. Since it uses control-flow analysis to identify candidate call sites, flow-directed inlining can inline procedures whose relationships to their call sites are not apparent. For instance, procedures defined in other modules, passed as arguments, returned as values, or extracted from data structures can all be inlined. Flow-directed inlining specializes procedures for particular call sites, and can selectively inline a particular procedure at some call sites but not at others. Finally, flow-directed inlining encourages modular implementations: control-flow analysis, inlining, and post-inlining optimizations are all orthogonal components. Results from a prototype implementation indicate that this strategy effectively reduces procedure call overhead and leads to significant reduction in execution time.
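The call-site selection described above can be sketched as follows, assuming the control-flow analysis has already produced, for each call site, the set of procedures that may reach its operator. The `flows` and `size` tables and the size budget are hypothetical illustration data, not the paper's heuristics: a site qualifies only if exactly one procedure can reach it, which is how the same procedure can be inlined at one site and left alone at another.

```python
def inline_candidates(call_sites, flows, size, budget):
    """Pick call sites whose operator can only be one known procedure.

    flows: call site -> set of procedures that may reach its operator
    size:  procedure -> body size (a hypothetical cost measure)
    """
    chosen = {}
    for site in call_sites:
        procs = flows.get(site, set())
        if len(procs) != 1:
            continue  # unknown or polymorphic operator: keep the ordinary call
        (proc,) = procs
        if size[proc] <= budget:
            chosen[site] = proc
    return chosen


# 'square' reaches s1 alone, but s2 may also see 'cube', so only s1 qualifies.
flows = {'s1': {'square'}, 's2': {'square', 'cube'}, 's3': {'render'}}
size = {'square': 3, 'cube': 4, 'render': 120}
print(inline_candidates(['s1', 's2', 's3', 's4'], flows, size, budget=10))
# {'s1': 'square'}
```

Keeping the selection separate from the analysis that fills `flows` mirrors the orthogonality the abstract emphasizes: the analysis, the inliner, and later optimizations can be swapped independently.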
Skeletons from the Treecode Closet
J. Comp. Phys., 1994
Cited by 40 (10 self)
We consider treecodes (N-body programs which use a tree data structure) from the standpoint of their worst-case behavior. That is, we derive upper bounds on the largest possible errors that are introduced into a calculation by use of various multipole acceptability criteria (MAC). We find that the conventional Barnes-Hut MAC can introduce potentially unbounded errors unless $\theta < 1/\sqrt{3}$, and that this behavior, while rare, is demonstrable in astrophysically reasonable examples. We consider two other MACs closely related to the BH MAC. While they don't admit the same unbounded errors, they nevertheless require extraordinary amounts of CPU time to guarantee modest levels of accuracy. We derive new error bounds based on some additional, easily computed moments of the mass distribution. These error bounds form the basis for four new MACs which can be used to limit the absolute or relative error introduced by each multipole evaluation, or, with the introduction of some additional data struc...
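The failure mode behind the $\theta < 1/\sqrt{3}$ condition can be checked with a small sketch of the Barnes-Hut MAC, assuming the common form in which a cell of side $s$ is accepted when $s/d < \theta$, with $d$ measured from the cell's center of mass to the evaluation point. If the center of mass sits at a corner of the cell, a point inside the same cell can be up to $s\sqrt{3}$ away, so any $\theta \ge 1/\sqrt{3}$ can accept a cell for a point lying inside it, where a multipole approximation of that cell cannot be trusted. The helper names are illustrative.

```python
import math
from itertools import product


def bh_accept(cell_size, center_of_mass, point, theta):
    """Barnes-Hut MAC: accept the cell's multipole approximation if s / d < theta."""
    d = math.dist(center_of_mass, point)
    return d > 0 and cell_size / d < theta


def inside_cell(corner, size, point):
    """True if point lies in the axis-aligned cube [corner, corner + size]^3."""
    return all(c <= x <= c + size for c, x in zip(corner, point))


# Unit cell with all of its mass concentrated at the corner (0, 0, 0).
cell, s, com = (0.0, 0.0, 0.0), 1.0, (0.0, 0.0, 0.0)
p = (0.9, 0.9, 0.9)  # evaluation point inside the cell, d = 0.9 * sqrt(3)
print(inside_cell(cell, s, p), bh_accept(s, com, p, theta=0.7))  # True True
print(bh_accept(s, com, p, theta=0.5))                           # False

# With theta < 1/sqrt(3), no point inside the cell is accepted, wherever the mass sits.
grid = [0.25 * t for t in range(5)]
pts = list(product(grid, repeat=3))
print(any(bh_accept(s, c, q, theta=0.57) for c in pts for q in pts))  # False
```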
Efficient Kernel Density Estimation using the Fast Gauss Transform with Applications to Color Modeling and Tracking
IEEE Transactions on Pattern Analysis and Machine Intelligence, 2003
Cited by 39 (0 self)
The study of many vision problems reduces to the estimation of a probability density function from observations. Kernel density estimation techniques are quite general and powerful methods for this problem, but have a significant disadvantage in that they are computationally intensive. In this paper we explore the use of kernel density estimation with the fast Gauss transform (FGT) for problems in vision. The FGT allows the summation of a mixture of M Gaussians at N evaluation points in O(M + N) time, as opposed to O(MN) time for a naive evaluation, and can be used to considerably speed up kernel density estimation. We present applications of the technique to problems from image segmentation and tracking, and show that the algorithm allows application of advanced statistical techniques to solve practical vision problems in real time with today's computers.
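The core of the fast Gauss transform can be sketched in one dimension with a single Hermite expansion. Writing $t = (y - c)/h$ and $s_j = (x_j - c)/h$ for a center $c$, the identity $e^{-(t - s)^2} = \sum_{n \ge 0} \frac{s^n}{n!} H_n(t) e^{-t^2}$ (with $H_n$ the physicists' Hermite polynomials) collapses the sum over sources into $p$ moments, giving $O(p(M + N))$ work instead of $O(MN)$. This is a simplified sketch: the full FGT subdivides space into boxes and also uses local Taylor expansions, and the bandwidth $h$, truncation order $p$, and function names here are illustrative assumptions.

```python
import math


def hermite_functions(t, p):
    """h_n(t) = H_n(t) * exp(-t^2) for n = 0 .. p-1 (physicists' Hermite, p >= 1)."""
    H = [1.0, 2.0 * t][:p]
    for n in range(1, p - 1):
        H.append(2.0 * t * H[n] - 2.0 * n * H[n - 1])
    g = math.exp(-t * t)
    return [Hn * g for Hn in H]


def gauss_transform_hermite(sources, weights, targets, h, p):
    """G(y) = sum_j q_j exp(-((y - x_j) / h)^2) via one Hermite expansion about
    the source centroid: O(p (M + N)) work. Accurate when sources lie close to
    the centroid relative to h."""
    c = sum(sources) / len(sources)
    A = [0.0] * p  # A_n = sum_j q_j s_j^n / n!
    for x, q in zip(sources, weights):
        s, term = (x - c) / h, q
        for n in range(p):
            A[n] += term
            term *= s / (n + 1)
    return [sum(a * hn for a, hn in zip(A, hermite_functions((y - c) / h, p)))
            for y in targets]


def gauss_transform_direct(sources, weights, targets, h):
    """O(M N) reference evaluation."""
    return [sum(q * math.exp(-((y - x) / h) ** 2) for x, q in zip(sources, weights))
            for y in targets]


sources = [0.1 * j for j in range(11)]  # M = 11 points within h/2 of their centroid
weights = [1.0] * len(sources)
targets = [-2.0 + 0.3 * i for i in range(15)]
fast = gauss_transform_hermite(sources, weights, targets, h=1.0, p=12)
slow = gauss_transform_direct(sources, weights, targets, h=1.0)
print(max(abs(a - b) for a, b in zip(fast, slow)))  # small truncation error
```

A single expansion is only accurate while the sources stay within about one bandwidth of the center; the full algorithm, as usually described, enforces this by grouping sources into boxes of side proportional to $h$, each with its own expansion.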