Results 1–10 of 101
The Landscape of Parallel Computing Research: A View from Berkeley
Technical Report, UC Berkeley, 2006
OSKI: A library of automatically tuned sparse matrix kernels
Institute of Physics Publishing, 2005
A Compact Discontinuous Galerkin (CDG) Method for Elliptic Problems
Submitted to SIAM J. Numer. Anal., 2006
Abstract

Cited by 20 (12 self)
Abstract. We present a compact discontinuous Galerkin (CDG) method for an elliptic model problem. The problem is first cast as a system of first order equations by introducing the gradient of the primal unknown, or flux, as an additional variable. A standard discontinuous Galerkin (DG) method is then applied to the resulting system of equations. The numerical interelement fluxes are such that the equations for the additional variable can be eliminated at the element level, thus resulting in a global system that involves only the original unknown variable. The proposed method is closely related to the local discontinuous Galerkin (LDG) method [B. Cockburn and C.W. Shu, SIAM J. Numer. Anal., 35 (1998), pp. 2440–2463], but, unlike the LDG method, the sparsity pattern of the CDG method involves only nearest neighbors. Also, unlike the LDG method, the CDG method works without stabilization for an arbitrary orientation of the element interfaces. The computation of the numerical interface fluxes for the CDG method is slightly more involved than for the LDG method, but this additional complication is clearly offset by increased compactness and flexibility.
The design and implementation of the MRRR algorithm
ACM Trans. Math. Software, 2004
Abstract

Cited by 19 (4 self)
In the 1990s, Dhillon and Parlett devised the algorithm of multiple relatively robust representations (MRRR) for computing numerically orthogonal eigenvectors of a symmetric tridiagonal matrix T with O(n²) cost. While previous publications related to MRRR focused on theoretical aspects of the algorithm, a documentation of software issues has been missing. In this article, we discuss the design and implementation of the new MRRR version STEGR that will be included in the next LAPACK release. By giving an algorithmic description of MRRR and identifying governing parameters, we hope to make STEGR more easily accessible and suitable for future performance tuning. Furthermore, this should help users understand design choices and tradeoffs when using the code.
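The MRRR algorithm described in this entry is what modern LAPACK exposes through the xSTEGR/xSTEMR routines. As a hedged sketch (assuming a SciPy build whose `eigh_tridiagonal` accepts the `'stemr'` driver, which wraps LAPACK's MRRR implementation, the successor of the STEGR code above), one can check MRRR's key selling point: eigenvectors that are orthogonal to working accuracy without an explicit reorthogonalization pass.

```python
import numpy as np
from scipy.linalg import eigh_tridiagonal

n = 200
rng = np.random.default_rng(0)
d = rng.standard_normal(n)      # diagonal of the tridiagonal matrix T
e = rng.standard_normal(n - 1)  # off-diagonal of T

# Dispatch to LAPACK's MRRR code (driver name is an assumption about
# the installed SciPy; 'auto' would pick it for the full spectrum too).
w, V = eigh_tridiagonal(d, e, lapack_driver='stemr')

# Orthogonality of the computed eigenvectors, achieved at O(n^2) cost
# rather than the O(n^3) of Gram-Schmidt-based approaches.
ortho_err = np.max(np.abs(V.T @ V - np.eye(n)))
print(ortho_err)
```

The eigenvalues come back sorted ascending and `ortho_err` should be on the order of a few hundred machine epsilons for this matrix size.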
Multi-Threading and One-Sided Communication in Parallel LU Factorization
Abstract

Cited by 19 (0 self)
Dense LU factorization has a high ratio of computation to communication and, as evidenced by the High Performance Linpack (HPL) benchmark, this property makes it scale well on most parallel machines. Nevertheless, the standard algorithm for this problem has nontrivial dependence patterns which limit parallelism, and local computations require large matrices in order to achieve good single processor performance. We present an alternative programming model for this type of problem, which combines UPC's global address space with lightweight multithreading. We introduce the concept of memory-constrained lookahead, where the amount of concurrency managed by each processor is controlled by the amount of memory available. We implement novel techniques for steering the computation to optimize for high performance and demonstrate the scalability and portability of UPC with Teraflop-level performance on some machines, comparing favourably to other state-of-the-art MPI codes.
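For context, the serial kernel being parallelized here is blocked right-looking LU. The sketch below is a minimal NumPy illustration, not the paper's UPC code; pivoting is omitted by assuming a diagonally dominant input. The trailing-matrix GEMM in step 3 is the work that lookahead schemes overlap with the factorization of the next panel.

```python
import numpy as np

def blocked_lu(A, b=32):
    """Blocked right-looking LU without pivoting; returns L, U with A = L @ U."""
    A = A.astype(float).copy()
    n = A.shape[0]
    for k in range(0, n, b):
        kb = min(b, n - k)
        # 1. Panel factorization: unblocked LU of the block column A[k:n, k:k+kb].
        for j in range(k, k + kb):
            A[j + 1:n, j] /= A[j, j]
            A[j + 1:n, j + 1:k + kb] -= np.outer(A[j + 1:n, j],
                                                 A[j, j + 1:k + kb])
        if k + kb < n:
            # 2. Triangular solve for the block row of U: U12 = L11^{-1} A12.
            L11 = np.tril(A[k:k + kb, k:k + kb], -1) + np.eye(kb)
            A[k:k + kb, k + kb:n] = np.linalg.solve(L11, A[k:k + kb, k + kb:n])
            # 3. Trailing update: the GEMM that dominates the flop count.
            A[k + kb:n, k + kb:n] -= A[k + kb:n, k:k + kb] @ A[k:k + kb, k + kb:n]
    return np.tril(A, -1) + np.eye(n), np.triu(A)

n = 192
rng = np.random.default_rng(1)
A = rng.standard_normal((n, n)) + n * np.eye(n)  # diagonally dominant
L, U = blocked_lu(A, b=32)
```

With lookahead, step 1 for panel k+1 can begin as soon as its columns of the trailing matrix are updated, instead of waiting for the full update to finish; the paper's memory-constrained variant bounds how many such in-flight panels each processor holds.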
Implementation of a primal-dual method for SDP on a shared memory parallel architecture
Computational Optimization and Applications, 2006
Abstract

Cited by 18 (0 self)
Primal–dual interior point methods, and the HKM method in particular, have been implemented in a number of software packages for semidefinite programming. These methods have performed well in practice on small to medium sized SDPs. However, primal–dual codes have had some trouble in solving larger problems because of the storage requirements and required computational effort. In this paper we describe a parallel implementation of the primal–dual method on a shared memory system. Computational results are presented, including the solution of some large scale problems with over 50,000 constraints.
AUTOMATING THE FINITE ELEMENT METHOD
2006
Abstract

Cited by 16 (3 self)
The finite element method can be viewed as a machine that automates the discretization of differential equations, taking as input a variational problem, a finite element and a mesh, and producing as output a system of discrete equations. However, the generality of the framework provided by the finite element method is seldom reflected in implementations (realizations), which are often specialized and can handle only a small set of variational problems and finite elements (but are typically parametrized over the choice of mesh). This paper reviews ongoing research in the direction of a complete automation of the finite element method. In particular, this work discusses algorithms for the efficient and automatic computation of a system of discrete equations from a given variational problem, finite element and mesh. It is demonstrated that by automatically generating and compiling efficient low-level code, it is possible to parametrize a finite element code over variational problem and finite element in addition to the mesh.
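To make the idea of generated element code concrete, here is a minimal sketch (not taken from the paper; function name and structure are illustrative) of the kind of low-level kernel such an automated pipeline emits from a variational problem: the element stiffness matrix for Poisson's equation with linear (P1) elements on a triangle.

```python
import numpy as np

def p1_stiffness(verts):
    """3x3 element stiffness matrix for one triangle; verts has shape (3, 2)."""
    x0, x1, x2 = verts
    J = np.column_stack([x1 - x0, x2 - x0])   # Jacobian of the affine map
    area = 0.5 * abs(np.linalg.det(J))
    # Gradients of the three P1 basis functions on the reference triangle.
    g_ref = np.array([[-1.0, -1.0], [1.0, 0.0], [0.0, 1.0]])
    g = g_ref @ np.linalg.inv(J)              # physical gradients (constant)
    # Since the gradients are constant, the bilinear form grad(u).grad(v) dx
    # integrates exactly to area * g g^T.
    return area * g @ g.T

# Unit right triangle as a smoke test.
K = p1_stiffness(np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]))
```

Each row of `K` sums to zero, reflecting that constants lie in the kernel of the Laplacian; a generated code would produce many such kernels, specialized per element and per variational form.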
Parallel Spectral Clustering
Abstract

Cited by 15 (2 self)
Abstract. The spectral clustering algorithm has been shown to be more effective in finding clusters than most traditional algorithms. However, spectral clustering suffers from a scalability problem in both memory use and computational time when the dataset is large. To perform clustering on large datasets, we propose to parallelize both memory use and computation on distributed computers. Through an empirical study on a large document dataset of 193,844 data instances and a large photo dataset of 637,137 photos, we demonstrate that our parallel algorithm can effectively alleviate the scalability problem. Key words: parallel spectral clustering, distributed computing
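As a point of reference for what is being parallelized, the following is a minimal serial sketch of spectral clustering (assumptions: Gaussian affinities and a two-cluster problem, so the sign of the second Laplacian eigenvector replaces the k-means step). The dense n×n affinity matrix built here is exactly the O(n²) memory bottleneck the paper distributes across machines.

```python
import numpy as np

def spectral_bipartition(X, sigma=1.0):
    """Split the rows of X into two clusters via the normalized Laplacian."""
    # Dense Gaussian affinity matrix W -- O(n^2) memory, the scalability
    # bottleneck for large n.
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    d = W.sum(axis=1)
    # Symmetric normalized Laplacian L = I - D^{-1/2} W D^{-1/2}.
    dinv = 1.0 / np.sqrt(d)
    L = np.eye(len(X)) - (dinv[:, None] * W) * dinv[None, :]
    # The eigenvector for the second-smallest eigenvalue carries the cut.
    _, vecs = np.linalg.eigh(L)
    return (vecs[:, 1] > 0).astype(int)

# Two well-separated blobs should land in different clusters.
rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0.0, 0.1, (20, 2)),
               rng.normal(5.0, 0.1, (20, 2))])
labels = spectral_bipartition(X)
```

The parallel algorithm in the paper distributes both the rows of W and the eigensolve (and uses k-means on several eigenvectors for k > 2); this sketch only shows the serial structure.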
spBayes: An R Package for Univariate and Multivariate Hierarchical Point-referenced Spatial Models
Journal of Statistical Software, 2007
Abstract

Cited by 9 (0 self)
Scientists and investigators in such diverse fields as geological and environmental sciences, ecology, forestry, disease mapping, and economics often encounter spatially referenced data collected over a fixed set of locations with coordinates (latitude–longitude, Easting–Northing, etc.) in a region of study. Such point-referenced or geostatistical data are often best analyzed with Bayesian hierarchical models. Unfortunately, fitting such models involves computationally intensive Markov chain Monte Carlo (MCMC) methods whose efficiency depends upon the specific problem at hand. This requires extensive coding on the part of the user, and the situation is not helped by the lack of available software for such algorithms. Here, we introduce a statistical software package, spBayes, built upon the R statistical computing platform, that implements a generalized template encompassing a wide variety of Gaussian spatial process models for univariate as well as multivariate point-referenced data. We discuss the algorithms behind our package and illustrate its use with a synthetic and real data example.