Multilevel graph layout on the GPU
IEEE Trans. Vis. Comput. Graph., 2007
Cited by 19 (1 self)
This paper presents a new algorithm for force-directed graph layout on the GPU. The algorithm, whose goal is to compute layouts accurately and quickly, has two contributions. The first contribution is proposing a general multilevel scheme, which is based on spectral partitioning. The second contribution is computing the layout on the GPU. Since the GPU requires a data parallel programming model, the challenge is devising a mapping of a naturally unstructured graph into a well-partitioned structured one. This is done by computing a balanced partitioning of a general graph. This algorithm provides a general multilevel scheme, which has the potential to be used not only for computation on the GPU, but also on emerging multicore architectures. The algorithm manages to compute high-quality layouts of large graphs in a fraction of the time required by existing algorithms of similar quality. An application for visualization of the topologies of ISP (Internet Service Provider) networks is presented. Index Terms: Graph layout, GPU, graph partitioning.
The Transmission of Shifts and Shift Blurring in the QR Algorithm
1992
Cited by 19 (7 self)
The QR algorithm is one of the most widely used algorithms for calculating the eigenvalues of matrices. The multishift QR algorithm with multiplicity m is a version that effects m iterations of the QR algorithm at a time. It is known that roundoff errors cause the multishift QR algorithm to perform poorly when m is large. In this paper the mechanism by which the shifts are transmitted through the matrix in the course of a multishift QR iteration is identified. Numerical evidence showing that the mechanism works well when m is small and poorly when m is large is presented. When the mechanism works poorly, the convergence of the algorithm is degraded proportionately.
1. Introduction
The QR algorithm is one of the most widely used algorithms for calculating the eigenvalues of matrices [7], [9], [16]. It is therefore worrisome that attempts to parallelize the QR algorithm have been mostly unsatisfactory. (However, the work of Henry and van de Geijn [10], [11] is recent good news.) One atte...
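For orientation, one multishift QR iteration of multiplicity m can be written in its explicit form as below. The notation (shifts \sigma_1, ..., \sigma_m, polynomial p, factors Q_k and R_k) is assumed here for illustration and is not taken from the paper, which studies how the shifts travel through the matrix when the step is carried out implicitly.

```latex
% One multishift QR step of multiplicity m (explicit form, assumed notation).
\begin{align*}
  p(A_k) &= (A_k - \sigma_1 I)(A_k - \sigma_2 I)\cdots(A_k - \sigma_m I) = Q_k R_k, \\
  A_{k+1} &= Q_k^{*} A_k Q_k .
\end{align*}
% In exact arithmetic this is equivalent to m single-shift QR steps
% with shifts \sigma_1, \dots, \sigma_m applied one after another.
```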
Parallel performance studies for an elliptic test problem
2008
Cited by 18 (15 self)
The performance of parallel computer code depends on an intricate interplay of the processors, the architecture of the compute nodes, their interconnect network, the numerical algorithm, and the scheduling policy used. The solution of large, sparse, highly structured systems of linear equations by an iterative linear solver that requires communication between the parallel processes at every iteration is an instructive test of this interplay. This note considers the classical elliptic test problem of a Poisson equation with Dirichlet boundary conditions, whose approximation by the finite difference method results in a linear system of this type. Our existing implementation of the conjugate gradient method for the iterative solution of this system is known to have the potential to perform well up to many parallel processes, provided the interconnect network has low latency. Since the algorithm is known to be memory bound, it is also vital for good performance that the architecture of the nodes in conjunction with the scheduling policy does not create a bottleneck. The results presented here show excellent performance on the cluster hpc in the UMBC High Performance Computing Facility and give guidance on the scheduling policy to be implemented. Specifically, they confirm that it is beneficial to use all four cores of the two dual-core processors on each node simultaneously, giving us in effect a computer that can run jobs efficiently with up to 128 parallel processes.
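The test problem described in this abstract can be sketched serially as follows. This is a minimal illustration, not the authors' parallel code: it assumes the unit square, homogeneous Dirichlet data, the standard 5-point stencil, and a manufactured right-hand side with known solution sin(pi x) sin(pi y). All function names are hypothetical.

```python
import math


def apply_laplacian(u, n):
    """Apply the 5-point stencil (scaled by h^2) to an n*n grid stored row by row.

    Neighbours outside the grid are the zero Dirichlet boundary values.
    """
    v = [0.0] * (n * n)
    for j in range(n):
        for i in range(n):
            k = j * n + i
            s = 4.0 * u[k]
            if i > 0:
                s -= u[k - 1]
            if i < n - 1:
                s -= u[k + 1]
            if j > 0:
                s -= u[k - n]
            if j < n - 1:
                s -= u[k + n]
            v[k] = s
    return v


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def conjugate_gradient(b, n, tol=1e-10, max_iter=1000):
    """Matrix-free conjugate gradient for the stencil system, starting from zero."""
    x = [0.0] * len(b)
    r = b[:]
    p = r[:]
    rr = dot(r, r)
    b_norm = math.sqrt(rr)
    if b_norm == 0.0:
        return x, 0
    iters = 0
    for iters in range(1, max_iter + 1):
        q = apply_laplacian(p, n)
        alpha = rr / dot(p, q)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * qi for ri, qi in zip(r, q)]
        rr_new = dot(r, r)
        if math.sqrt(rr_new) <= tol * b_norm:
            break
        p = [ri + (rr_new / rr) * pi for ri, pi in zip(r, p)]
        rr = rr_new
    return x, iters


def solve_model_problem(n):
    """Solve -Laplace(u) = 2 pi^2 sin(pi x) sin(pi y); return (max error, iterations)."""
    h = 1.0 / (n + 1)
    exact = [math.sin(math.pi * (i + 1) * h) * math.sin(math.pi * (j + 1) * h)
             for j in range(n) for i in range(n)]
    b = [h * h * 2.0 * math.pi ** 2 * e for e in exact]
    u, iters = conjugate_gradient(b, n)
    err = max(abs(ui - ei) for ui, ei in zip(u, exact))
    return err, iters
```

In a distributed version, each iteration would need neighbour exchanges for the stencil and global reductions for the two dot products, which is the per-iteration communication the abstract refers to.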
Computation of Gauss-Kronrod Quadrature Rules with Non-Positive Weights
Math. Comp., 1999
Cited by 16 (5 self)
Recently Laurie presented a fast algorithm for the computation of (2n + 1)-point Gauss-Kronrod quadrature rules with real nodes and positive weights. We describe modifications of this algorithm that allow the computation of Gauss-Kronrod quadrature rules with complex conjugate nodes and weights or with real nodes and positive and negative weights.
Test generation based diagnosis of device parameters for analog circuits
In Design, Automation and Test in Europe, 2001
Cited by 15 (1 self)
With the increasing complexity of manufacturing processes and the shrinking of device geometries, the performance metrics of integrated circuits (ICs) are becoming increasingly sensitive to random fluctuations in the manufacturing process. We propose a diagnosis methodology that can be used to infer the cause(s) of variations in performance of analog ICs. The methodology consists of (a) a device parameter computation technique which is used to compute the device parameters of an IC from measurements made on it and (b) a cause-effect analysis module that is used to compute the cause of the variation in performance metrics of a given set of ICs. Simulation results to demonstrate the effectiveness of the technique are presented.
Scheduling of QR factorization algorithms on SMP and multicore architectures
In PDP ’08: Proceedings of the Sixteenth Euromicro International Conference on Parallel, Distributed and Network-Based Processing, 2008
Cited by 14 (9 self)
This paper examines the scalable parallel implementation of QR factorization of a general matrix, targeting SMP and multicore architectures. Two implementations of algorithms-by-blocks are presented. Each implementation views a block of a matrix as the fundamental unit of data, and likewise, operations over these blocks as the primary unit of computation. The first is a conventional blocked algorithm similar to those included in libFLAME and LAPACK but expressed in a way that allows operations in the so-called critical path of execution to be computed as soon as their dependencies are satisfied. The second algorithm captures a higher degree of parallelism with an approach based on Givens rotations while preserving the performance benefits of algorithms based on blocked Householder transformations. We show that the implementation effort is greatly simplified by expressing the algorithms in code with the FLAME/FLASH API, which allows matrices stored by blocks to be viewed and managed as matrices of matrix blocks. The SuperMatrix runtime system utilizes FLASH to assemble and represent matrices but also provides out-of-order scheduling of operations that is transparent to the programmer. Scalability of the solution is demonstrated on a ccNUMA platform with 16 processors.
Adaptive Multiprecision Path Tracking
Cited by 14 (7 self)
This article treats numerical methods for tracking an implicitly defined path. The numerical precision required to successfully track such a path is difficult to predict a priori, and indeed, it may change dramatically through the course of the path. In current practice, one must either choose a conservatively large numerical precision at the outset or rerun paths multiple times in successively higher precision until success is achieved. To avoid unnecessary computational cost, it would be preferable to adaptively adjust the precision as the tracking proceeds in response to the local conditioning of the path. We present an algorithm that can be set to either reactively adjust precision in response to step failure or proactively set the precision using error estimates. We then test the relative merits of reactive and proactive adaptation on several examples arising as homotopies for solving systems of polynomial equations.
E B: Spectral instability for some Schrödinger operators
Numer. Math.
Cited by 14 (2 self)
We define the concept of instability index of an isolated eigenvalue of a non-self-adjoint operator, and prove some of its general properties. We also describe a stable procedure for computing this index for Schrödinger operators in one dimension, and apply it to the complex resonances of a typical operator with a dilation analytic potential.
Product eigenvalue problems
SIAM Review, 2005
Cited by 14 (0 self)
Many eigenvalue problems are most naturally viewed as product eigenvalue problems. The eigenvalues of a matrix A are wanted, but A is not given explicitly. Instead it is presented as a product of several factors: $A = A_k A_{k-1} \cdots A_1$. Usually more accurate results are obtained by working with the factors rather than forming A explicitly. For example, if we want eigenvalues/vectors of $B^T B$, it is better to work directly with B and not compute the product. The intent of this paper is to demonstrate that the product eigenvalue problem is a powerful unifying concept. Diverse examples of eigenvalue problems are discussed and formulated as product eigenvalue problems. For all but a couple of these examples it is shown that the standard algorithms for solving them are instances of a generic GR algorithm applied to a related cyclic matrix.
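The advice to work with B rather than with the explicit product $B^T B$ can be illustrated with a small sketch. It compares the eigenvalues of an explicitly formed $B^T B$ with the singular values of B obtained by rotating the columns of B directly. The one-sided Jacobi rotation used here is a stand-in chosen for brevity, not the generic GR algorithm of the paper, and the test matrix and function names are assumptions made for illustration.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def gram_eigenvalues(B):
    """Eigenvalues (ascending) of the explicitly formed 2x2 matrix B^T B."""
    x = [row[0] for row in B]
    y = [row[1] for row in B]
    a, b, d = dot(x, x), dot(x, y), dot(y, y)
    mean = 0.5 * (a + d)
    radius = math.hypot(0.5 * (a - d), b)
    return mean - radius, mean + radius


def jacobi_singular_values(B, sweeps=10):
    """Singular values (ascending) of a two-column B via one-sided Jacobi.

    Rotations are applied to the columns of B itself; B^T B is never stored.
    """
    x = [row[0] for row in B]
    y = [row[1] for row in B]
    for _ in range(sweeps):
        alpha, beta, gamma = dot(x, x), dot(y, y), dot(x, y)
        # Stop once the two columns are orthogonal to working precision.
        if abs(gamma) <= 1e-15 * math.sqrt(alpha * beta):
            break
        zeta = (beta - alpha) / (2.0 * gamma)
        t = math.copysign(1.0, zeta) / (abs(zeta) + math.sqrt(1.0 + zeta * zeta))
        c = 1.0 / math.sqrt(1.0 + t * t)
        s = c * t
        x, y = ([c * xi - s * yi for xi, yi in zip(x, y)],
                [s * xi + c * yi for xi, yi in zip(x, y)])
    return sorted([math.sqrt(dot(x, x)), math.sqrt(dot(y, y))])


# Assumed test matrix: its exact singular values are eps and sqrt(2 + eps^2).
eps = 1e-9
B = [[1.0, 1.0], [eps, 0.0], [0.0, eps]]
small_eig, large_eig = gram_eigenvalues(B)
small_sv, large_sv = jacobi_singular_values(B)
```

With this matrix, eps squared falls below double-precision resolution next to 1, so the explicitly formed product loses the small eigenvalue, while the rotations applied to B keep the small singular value close to eps.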