OSKI: A library of automatically tuned sparse matrix kernels
- Institute of Physics Publishing
, 2005
"... kernels ..."
A Super-Programming Technique for Large Sparse Matrix Multiplication on PC Clusters
- IEICE Trans. Info. Systems E87-D
, 2004
Cited by 4 (3 self)
The multiplication of large sparse matrices is a basic operation for many scientific and engineering applications. There exist some high-performance library routines for this operation. They are often optimized based on the target architecture. The PC cluster computing paradigm has recently emerged as a viable alternative for high-performance, low-cost computing. In this paper, we apply our super-programming approach [24] to study the load balance and runtime management overhead for implementing parallel large matrix multiplication on PC clusters. For a parallel environment, it is essential to partition the entire operation into tasks and assign them to individual processing elements. Most of the existing approaches partition the given sub-matrices based on some kind of workload estimation. For dense matrices on some architectures, estimations may be accurate. For sparse matrices on PCs, however, the workloads of block operations may not necessarily depend on the size of data. The workloads may not be well estimated in advance. Any approach other than run-time dynamic partitioning may degrade performance. Moreover, in a heterogeneous environment, static partitioning is NP-complete. For embedded problems, it also introduces management overhead. We adopt our super-programming approach, which partitions the entire task into medium-grain tasks that are implemented using super-instructions; the workload of super-instructions is easy to estimate. These tasks are dynamically assigned to member computer nodes. A node may execute more than one super-instruction. Our results prove the viability of our approach.
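The contrast this abstract draws between static and run-time dynamic assignment can be illustrated with a small simulation. This is only a sketch under stated assumptions: the function names, the round-robin static split, and the node speeds are hypothetical and are not taken from the paper, and the model ignores communication and management overhead.

```python
import heapq


def dynamic_makespan(task_costs, node_speeds):
    """Simulate a shared queue: whichever node becomes idle first takes the next task."""
    # Heap of (time at which the node becomes free, node index).
    free_at = [(0.0, n) for n in range(len(node_speeds))]
    heapq.heapify(free_at)
    for cost in task_costs:
        t, n = heapq.heappop(free_at)
        heapq.heappush(free_at, (t + cost / node_speeds[n], n))
    return max(t for t, _ in free_at)


def static_makespan(task_costs, node_speeds):
    """Deal tasks out round-robin before execution starts (a naive static split)."""
    finish = [0.0] * len(node_speeds)
    for i, cost in enumerate(task_costs):
        n = i % len(node_speeds)
        finish[n] += cost / node_speeds[n]
    return max(finish)


if __name__ == "__main__":
    tasks = [1.0] * 8      # eight equal-cost tasks (hypothetical)
    speeds = [1.0, 3.0]    # a heterogeneous pair of nodes (hypothetical)
    print("static :", static_makespan(tasks, speeds))
    print("dynamic:", dynamic_makespan(tasks, speeds))
```

In this toy setting the faster node ends up taking more of the tasks under dynamic assignment, so the overall finish time is shorter than with the fixed round-robin split.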
On the Representation and Multiplication of Hypersparse Matrices
, 2008
Cited by 4 (4 self)
Multicore processors are marking the beginning of a new era of computing where massive parallelism is available and necessary. Slightly slower but easy to parallelize kernels are becoming more valuable than sequentially faster kernels that are unscalable when parallelized. In this paper, we focus on the multiplication of sparse matrices (SpGEMM). We first present the issues with existing sparse matrix representations and multiplication algorithms that make them unscalable to thousands of processors. Then, we develop and analyze two new algorithms that overcome these limitations. We consider our algorithms first as the sequential kernel of a scalable parallel sparse matrix multiplication algorithm and second as part of a polyalgorithm for SpGEMM that would execute different kernels depending on the sparsity of the input matrices. Such a sequential kernel requires a new data structure that exploits the hypersparsity of the individual submatrices owned by a single processor after the 2D partitioning. We experimentally evaluate the performance and characteristics of our algorithms and show that they scale significantly better than existing kernels.
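To make the idea of hypersparsity concrete, the sketch below multiplies two sparse matrices stored so that only nonempty rows take any space. The nested-dictionary layout and the names `to_row_dict` and `spgemm` are assumptions chosen for illustration; they are a stand-in and not the data structure or the algorithms developed in the paper.

```python
from collections import defaultdict


def to_row_dict(triples):
    """Build {row: {col: value}} from (row, col, value) triples.

    Only rows that actually hold a nonzero get an entry, so storage is
    proportional to the number of nonzeros rather than to the dimension.
    """
    rows = defaultdict(dict)
    for i, j, v in triples:
        if v != 0:
            rows[i][j] = rows[i].get(j, 0) + v
    return dict(rows)


def spgemm(a_rows, b_rows):
    """Row-by-row sparse product C = A * B on the nested-dict layout."""
    c_rows = {}
    for i, a_row in a_rows.items():
        acc = defaultdict(float)          # sparse accumulator for row i of C
        for k, a_ik in a_row.items():
            b_row = b_rows.get(k)
            if b_row is None:             # row k of B is empty: nothing to add
                continue
            for j, b_kj in b_row.items():
                acc[j] += a_ik * b_kj
        if acc:
            c_rows[i] = dict(acc)
    return c_rows


if __name__ == "__main__":
    # Two nonzeros each, in matrices whose dimension is about one million.
    a = to_row_dict([(0, 5, 2.0), (1_000_000, 0, 3.0)])
    b = to_row_dict([(5, 7, 4.0), (0, 1, 1.0)])
    print(spgemm(a, b))
```

The loops touch only stored entries, so the work depends on the nonzeros involved and not on the matrix dimensions, which is the property that matters when a submatrix has far fewer nonzeros than rows or columns.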
The Combinatorial BLAS: Design, Implementation, and Applications
, 2010
Cited by 4 (1 self)
This paper presents a scalable high-performance software library to be used for graph analysis and data mining. Large combinatorial graphs appear in many applications of high-performance computing, including computational biology, informatics, analytics, web search, dynamical systems, and sparse matrix methods. Graph computations are difficult to parallelize using traditional approaches due to their irregular nature and low operational intensity. Many graph computations, however, contain sufficient coarse-grained parallelism for thousands of processors, which can be uncovered by using the right primitives. We describe the Parallel Combinatorial BLAS, which consists of a small but powerful set of linear algebra primitives specifically targeting graph and data mining applications. We provide an extendible library interface and some guiding principles for future development. The library is evaluated using two important graph algorithms, in terms of both performance and ease of use. The scalability and raw performance of the example applications, using the combinatorial BLAS, are unprecedented on distributed memory clusters.
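As an illustration of how a graph algorithm can be phrased with a linear algebra primitive, the sketch below runs breadth-first search where each step is a sparse matrix times sparse vector product over a Boolean (OR, AND) semiring. The function name `bfs_levels` and the adjacency-set representation are assumptions made for this example; this is not the Combinatorial BLAS interface.

```python
def bfs_levels(adj, source):
    """Breadth-first search as repeated sparse matrix-vector products.

    adj maps each vertex to the set of its out-neighbours, i.e. the
    nonzero pattern of one row of the adjacency matrix. The frontier is
    a sparse Boolean vector stored as a set of vertex ids.
    """
    levels = {source: 0}
    frontier = {source}
    depth = 0
    while frontier:
        depth += 1
        # One product over the (OR, AND) semiring: a vertex is reached
        # if any frontier vertex has an edge to it.
        reached = set()
        for u in frontier:
            reached |= adj.get(u, set())
        # Mask out vertices that already have a level.
        frontier = {v for v in reached if v not in levels}
        for v in frontier:
            levels[v] = depth
    return levels


if __name__ == "__main__":
    graph = {0: {1, 2}, 1: {3}, 2: {3}, 3: set()}
    print(bfs_levels(graph, 0))
```

Writing the traversal this way confines the irregular work to a single product step, which is the kind of primitive a library can then parallelize.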
SparseM: A sparse matrix package for R
- J. of Statistical Software
Cited by 3 (0 self)
SparseM provides some basic R functionality for linear algebra with sparse matrices. Use of the package is illustrated by a family of linear model fitting functions that implement least squares methods for problems with sparse design matrices. Significant performance improvements in memory utilization and computational speed are possible for applications involving large sparse matrices.
A FRISCH-NEWTON ALGORITHM FOR SPARSE QUANTILE REGRESSION
Cited by 2 (2 self)
Recent experience has shown that interior-point methods using a log barrier approach are far superior to classical simplex methods for computing solutions to large parametric quantile regression problems. In many large empirical applications, the design matrix has a very sparse structure. A typical example is the classical fixed-effect model for panel data where the parametric dimension of the model can be quite large, but the number of non-zero elements is quite small. Adopting recent developments in sparse linear algebra we introduce a modified version of the Frisch-Newton algorithm for quantile regression described in Portnoy and Koenker (1997). The new algorithm substantially reduces the storage (memory) requirements and increases computational speed. The modified algorithm also facilitates the development of nonparametric quantile regression methods. The pseudo design matrices employed in nonparametric quantile regression smoothing are inherently sparse in both the fidelity and roughness penalty components. Exploiting the sparse structure of these problems opens up a whole range of new possibilities for multivariate smoothing on large data sets via ANOVA-type decomposition and partial linear models.
COMPUTING THE ACTION OF THE MATRIX EXPONENTIAL, WITH AN APPLICATION TO EXPONENTIAL INTEGRATORS
, 2010
Cited by 1 (0 self)
A new algorithm is developed for computing $e^{tA}B$, where $A$ is an $n \times n$ matrix and $B$ is $n \times n_0$ with $n_0 \ll n$. The algorithm works for any $A$, its computational cost is dominated by the formation of products of $A$ with $n \times n_0$ matrices, and the only input parameter is a backward error tolerance. The algorithm can return a single matrix $e^{tA}B$ or a sequence $e^{t_k A}B$ on an equally spaced grid of points $t_k$. It uses the scaling part of the scaling and squaring method together with a truncated Taylor series approximation to the exponential. It determines the amount of scaling and the Taylor degree using the recent analysis of Al-Mohy and Higham [SIAM J. Matrix Anal. Appl. 31 (2009), pp. 970-989], which provides sharp truncation error bounds expressed in terms of the quantities $\|A^k\|^{1/k}$ for a few values of $k$, where the norms are estimated using a matrix norm estimator. Shifting and balancing are used as preprocessing steps to reduce the cost of the algorithm. Numerical experiments show that the algorithm performs in a numerically stable fashion across a wide range of problems, and analysis of rounding errors and of the conditioning of the problem provides theoretical support. Experimental comparisons with two Krylov-based MATLAB codes show the new algorithm to be sometimes much superior in terms of computational cost and accuracy. An important application of the algorithm is to exponential integrators for ordinary differential equations. It is shown that the sums of the form $\sum_{k=0}^{p} \varphi_k(A) u_k$ that arise in exponential integrators, where the $\varphi_k$ are related to the exponential function, can be expressed in terms of a single exponential of a matrix of dimension $n + p$ built by augmenting $A$ with additional rows and columns, and the algorithm of this paper can therefore be employed.
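The combination of scaling with a truncated Taylor series described in this abstract can be sketched in a few lines. The sketch assumes fixed, hand-picked values for the scaling parameter `s` and the Taylor degree `m`; the paper instead selects them from its error analysis, and it also applies shifting and balancing, none of which is reproduced here. The names `matmul` and `expm_action` are hypothetical.

```python
import math


def matmul(a, b):
    """Dense product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def expm_action(a, b, t=1.0, s=8, m=20):
    """Approximate exp(t*A) @ B without ever forming exp(t*A).

    Uses exp(t*A) B = (exp(t*A/s))^s B and applies a degree-m Taylor
    polynomial of exp(t*A/s) to the current block s times. Only products
    of A with n-by-n0 blocks are needed.
    """
    f = [row[:] for row in b]
    for _ in range(s):
        term = [row[:] for row in f]      # k = 0 term of the Taylor sum
        for k in range(1, m + 1):
            # term_k = (t / (s * k)) * A @ term_{k-1}
            term = [[(t / (s * k)) * x for x in row]
                    for row in matmul(a, term)]
            f = [[fx + tx for fx, tx in zip(f_row, t_row)]
                 for f_row, t_row in zip(f, term)]
    return f


if __name__ == "__main__":
    # For this rotation generator, exp(t*A) = [[cos t, sin t], [-sin t, cos t]].
    a = [[0.0, 1.0], [-1.0, 0.0]]
    b = [[1.0], [0.0]]
    print(expm_action(a, b, t=1.0))
    print([[math.cos(1.0)], [-math.sin(1.0)]])
```

The example compares the result with the closed form for a 2 × 2 rotation generator, for which the two printed vectors should agree closely.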
Sun Microsystems
Numerical linear algebra, particularly the solution of linear systems of equations, linear least squares problems, eigenvalue problems and singular value
Minimum Classification Error Training in Example Based Speech and Pattern Recognition Using Sparse Weight Matrices
, 2009
The Minimum Classification Error (MCE) criterion is a well-known criterion in pattern classification systems. The aim of MCE training is to minimize the resulting classification error when trying to classify a new data set. Usually, these classification systems use some form of statistical model to describe the data. These systems usually do not work very well when this underlying model is incorrect. Speech recognition systems traditionally use Hidden Markov Models (HMM) with Gaussian (or Gaussian mixture) probability density functions as their basic model. It is well known that these models make some assumptions that are not correct. In example-based approaches, these statistical models are absent and are replaced by the pure data. The absence of statistical models has created the need for parameters to model the data space accurately. For this work, we use the MCE criterion to create a system that is able to work together with this example-based approach. Moreover, we extend the locally scaled distance measure with sparse, block diagonal weight matrices, resulting in a better model for the data space and avoiding the computational load caused by using full matrices. We illustrate the approach with some example experiments on databases from pattern recognition and with speech recognition.

