Results 1–10 of 14
OSKI: A library of automatically tuned sparse matrix kernels
Institute of Physics Publishing, 2005
The Combinatorial BLAS: Design, Implementation, and Applications
2010
Cited by 22 (9 self)
Abstract
This paper presents a scalable high-performance software library to be used for graph analysis and data mining. Large combinatorial graphs appear in many applications of high-performance computing, including computational biology, informatics, analytics, web search, dynamical systems, and sparse matrix methods. Graph computations are difficult to parallelize using traditional approaches due to their irregular nature and low operational intensity. Many graph computations, however, contain sufficient coarse-grained parallelism for thousands of processors, which can be uncovered by using the right primitives. We describe the Parallel Combinatorial BLAS, which consists of a small but powerful set of linear algebra primitives specifically targeting graph and data mining applications. We provide an extendible library interface and some guiding principles for future development. The library is evaluated using two important graph algorithms, in terms of both performance and ease of use. The scalability and raw performance of the example applications, using the Combinatorial BLAS, are unprecedented on distributed memory clusters.
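The kind of primitive the abstract describes, graph traversal expressed as sparse matrix-vector multiplication over a semiring, can be sketched in a few lines. The scipy version below is illustrative only; it assumes nothing about the Combinatorial BLAS's actual distributed C++ interface.

```python
# Sketch: breadth-first search driven by sparse matrix-vector products,
# the style of linear-algebra primitive the Combinatorial BLAS builds on
# (illustrative only; the real library is a distributed C++ library).
import numpy as np
from scipy.sparse import csr_matrix

# Adjacency matrix of a small directed graph: edge i -> j stored as
# A[j, i] = 1, so that A @ x maps a vertex set x to its out-neighbors.
edges = [(0, 1), (0, 2), (1, 3), (2, 3), (3, 4)]
rows = [j for i, j in edges]
cols = [i for i, j in edges]
A = csr_matrix((np.ones(len(edges)), (rows, cols)), shape=(5, 5))

def bfs_levels(A, source):
    """Return the BFS level of each vertex (-1 if unreachable)."""
    n = A.shape[0]
    levels = -np.ones(n, dtype=int)
    frontier = np.zeros(n, dtype=bool)
    frontier[source] = True
    level = 0
    while frontier.any():
        levels[frontier] = level
        # "Semiring multiply": next frontier = unvisited out-neighbors.
        reached = (A @ frontier.astype(float)) > 0
        frontier = reached & (levels < 0)
        level += 1
    return levels

print(bfs_levels(A, 0))  # levels along the chain 0 -> {1,2} -> 3 -> 4
```

Replacing the (+, ×) semiring by (min, +) or (or, and) in the same skeleton yields shortest paths or reachability, which is why a small set of such primitives covers many graph algorithms.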
On the Representation and Multiplication of Hypersparse Matrices
2008
Cited by 9 (7 self)
Abstract
Multicore processors are marking the beginning of a new era of computing where massive parallelism is available and necessary. Slightly slower but easy to parallelize kernels are becoming more valuable than sequentially faster kernels that are unscalable when parallelized. In this paper, we focus on the multiplication of sparse matrices (SpGEMM). We first present the issues with existing sparse matrix representations and multiplication algorithms that make them unscalable to thousands of processors. Then, we develop and analyze two new algorithms that overcome these limitations. We consider our algorithms first as the sequential kernel of a scalable parallel sparse matrix multiplication algorithm and second as part of a polyalgorithm for SpGEMM that would execute different kernels depending on the sparsity of the input matrices. Such a sequential kernel requires a new data structure that exploits the hypersparsity of the individual submatrices owned by a single processor after the 2D partitioning. We experimentally evaluate the performance and characteristics of our algorithms and show that they scale significantly better than existing kernels.
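The flavor of sequential SpGEMM kernel the abstract discusses can be sketched with a dictionary-of-columns representation, which, like the paper's doubly compressed formats, stores only the nonempty columns. The layout and names below are illustrative and are not the paper's actual data structure.

```python
# Sketch: column-by-column SpGEMM on a {col: {row: value}} representation
# that stores only nonempty columns -- the property that matters for
# hypersparse matrices, where nonzeros are fewer than rows or columns.
def spgemm(A, B):
    """C = A @ B for matrices given as {col: {row: value}}."""
    C = {}
    for j, bcol in B.items():                 # each nonempty column j of B
        acc = {}                              # accumulator for column j of C
        for k, bkj in bcol.items():           # nonzeros B[k, j]
            for i, aik in A.get(k, {}).items():   # column k of A
                acc[i] = acc.get(i, 0.0) + aik * bkj
        if acc:
            C[j] = acc
    return C

# Hypersparse example: 4x4 matrices with fewer nonzeros than rows.
A = {0: {1: 2.0}, 3: {2: 5.0}}   # A[1,0] = 2, A[2,3] = 5
B = {1: {0: 3.0, 3: 1.0}}        # B[0,1] = 3, B[3,1] = 1
print(spgemm(A, B))              # {1: {1: 6.0, 2: 5.0}}
```

Note that the cost depends only on the number of nonzeros touched, never on the matrix dimension; a CSR-style kernel would instead pay O(n) per matrix just to scan row pointers, which is exactly the overhead that becomes dominant after 2D partitioning.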
A Super-Programming Technique for Large Sparse Matrix Multiplication on PC Clusters
IEICE Trans. Info. Systems, E87-D, 2004
Cited by 4 (3 self)
Abstract
The multiplication of large sparse matrices is a basic operation for many scientific and engineering applications. There exist some high-performance library routines for this operation; they are often optimized for the target architecture. The PC cluster computing paradigm has recently emerged as a viable alternative for high-performance, low-cost computing. In this paper, we apply our super-programming approach [24] to study the load balance and runtime management overhead of parallel large matrix multiplication on PC clusters. For a parallel environment, it is essential to partition the entire operation into tasks and assign them to individual processing elements. Most existing approaches partition the given submatrices based on some kind of workload estimation. For dense matrices on some architectures such estimates may be accurate. For sparse matrices on PCs, however, the workloads of block operations do not necessarily depend on the size of the data, so they may not be well estimated in advance, and any approach other than runtime dynamic partitioning may degrade performance. Moreover, in a heterogeneous environment, static partitioning is NP-complete, and for embedded problems it also introduces management overhead. We adopt our super-programming approach, which partitions the entire task into medium-grain tasks implemented using super-instructions; the workload of super-instructions is easy to estimate. These tasks are dynamically assigned to member computer nodes, and a node may execute more than one super-instruction. Our results demonstrate the viability of our approach.
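The dynamic-assignment idea in the abstract, medium-grain block tasks pulled from a shared pool so that faster workers simply take more of them, can be sketched with a generic work queue. This is a minimal sketch of that scheduling pattern, not the paper's super-instruction runtime.

```python
# Sketch: block tasks of a blocked matrix multiplication are pulled from
# a shared queue by worker threads, so load balancing happens at runtime
# rather than via static workload estimation (generic work-queue sketch,
# not the paper's super-programming runtime).
import queue
import threading
import numpy as np

def blocked_matmul(A, B, bs, n_workers=4):
    n = A.shape[0]
    C = np.zeros((n, n))
    tasks = queue.Queue()
    for i in range(0, n, bs):          # one task per output block (i, j)
        for j in range(0, n, bs):
            tasks.put((i, j))

    def worker():
        while True:
            try:
                i, j = tasks.get_nowait()
            except queue.Empty:
                return
            # Each task owns one whole output block, so no two workers
            # ever write to the same region of C.
            C[i:i+bs, j:j+bs] = A[i:i+bs, :] @ B[:, j:j+bs]

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return C

A = np.random.rand(8, 8)
B = np.random.rand(8, 8)
assert np.allclose(blocked_matmul(A, B, bs=4), A @ B)
```

On a real heterogeneous cluster the queue would be distributed (e.g. a master process handing out block indices over MPI), but the invariant is the same: task grain is fixed and predictable while assignment is decided at runtime.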
Feature selection for log-linear acoustic models
in Proc. ICASSP ’11
Cited by 4 (1 self)
Abstract
Log-linear acoustic models have been shown to be competitive with Gaussian mixture models in speech recognition. Their high training time can be reduced by feature selection. We compare a simple univariate feature selection algorithm with ReliefF, an efficient multivariate algorithm. An alternative to feature selection is ℓ1-regularized training, which leads to sparse models. We observe that this gives no speedup when sparse features are used, hence feature selection methods are preferable. For dense features, ℓ1-regularization can reduce training and recognition time. We generalize the well-known Rprop algorithm for the optimization of ℓ1-regularized functions. Experiments on the Wall Street Journal corpus showed that a large number of sparse features could be discarded without loss of performance. A strong regularization led to slight performance degradations, but can be useful on large tasks where training the full model is not tractable. Index Terms — feature selection, ℓ1-regularization, ReliefF, acoustic modeling, log-linear models
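A "simple univariate feature selection algorithm" of the kind the abstract compares against ReliefF scores each feature independently and keeps the top k. The correlation-based score below is one common univariate criterion chosen for illustration; the paper's exact score may differ.

```python
# Sketch: univariate feature selection -- score each feature on its own
# (here by absolute correlation with the label, an illustrative choice)
# and keep the k highest-scoring features.
import numpy as np

def select_top_k(X, y, k):
    """Return indices of the k features most correlated with y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    # |correlation| per feature; eps guards against zero-variance columns.
    scores = np.abs(Xc.T @ yc) / (np.linalg.norm(Xc, axis=0)
                                  * np.linalg.norm(yc) + 1e-12)
    return np.argsort(scores)[::-1][:k]

rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=200).astype(float)
X = rng.normal(size=(200, 5))
X[:, 2] += 3.0 * y            # make feature 2 strongly class-dependent
print(select_top_k(X, y, 1))  # feature 2 should rank first
```

Univariate scoring is cheap (one pass over the data per feature) but blind to feature interactions, which is precisely the gap that a multivariate method such as ReliefF is designed to close.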
SparseM: A sparse matrix package for R
J. of Statistical Software
Cited by 3 (0 self)
Abstract
SparseM provides some basic R functionality for linear algebra with sparse matrices. Use of the package is illustrated by a family of linear model fitting functions that implement least squares methods for problems with sparse design matrices. Significant performance improvements in memory utilization and computational speed are possible for applications involving large sparse matrices.
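SparseM itself is an R package; to stay in one language with the other sketches here, the same idea, least squares with a sparse design matrix solved through the normal equations, is shown below in scipy. This is illustrative and is not SparseM's interface.

```python
# Sketch: sparse least squares via the normal equations X'X b = X'y,
# keeping the design matrix sparse throughout (scipy stand-in for the
# R/SparseM workflow; not SparseM's actual API).
import numpy as np
from scipy.sparse import random as sprandom, eye
from scipy.sparse.linalg import spsolve

rng = np.random.default_rng(0)
X = sprandom(500, 20, density=0.05, format="csr", random_state=0)
beta_true = rng.normal(size=20)
y = X @ beta_true                    # noiseless so the answer is checkable

# A tiny ridge term guards against rank deficiency of the sparse design.
XtX = (X.T @ X) + 1e-10 * eye(20)
beta = spsolve(XtX.tocsc(), X.T @ y)
print(np.allclose(beta, beta_true, atol=1e-6))
```

Since X'X is 20 × 20 here, the factorization is trivial; the memory savings the abstract mentions come from never densifying the 500 × 20 (or, in real applications, far larger) design matrix X.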
A FRISCH-NEWTON ALGORITHM FOR SPARSE QUANTILE REGRESSION
Cited by 3 (2 self)
Abstract
Recent experience has shown that interior-point methods using a log-barrier approach are far superior to classical simplex methods for computing solutions to large parametric quantile regression problems. In many large empirical applications, the design matrix has a very sparse structure. A typical example is the classical fixed-effect model for panel data, where the parametric dimension of the model can be quite large but the number of nonzero elements is quite small. Adopting recent developments in sparse linear algebra, we introduce a modified version of the Frisch-Newton algorithm for quantile regression described in Portnoy and Koenker (1997). The new algorithm substantially reduces the storage (memory) requirements and increases computational speed. The modified algorithm also facilitates the development of nonparametric quantile regression methods. The pseudo design matrices employed in nonparametric quantile regression smoothing are inherently sparse in both the fidelity and roughness penalty components. Exploiting the sparse structure of these problems opens up a whole range of new possibilities for multivariate smoothing on large data sets via ANOVA-type decomposition and partial linear models.
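For reference, the optimization problem the Frisch-Newton algorithm solves is the standard quantile regression program: minimize the asymmetrically weighted absolute residuals under the check function,

```latex
% Quantile regression at quantile level \tau \in (0, 1):
\min_{\beta \in \mathbb{R}^p} \; \sum_{i=1}^{n} \rho_\tau\!\left( y_i - x_i^{\top}\beta \right),
\qquad
\rho_\tau(u) = u \,\bigl( \tau - \mathbf{1}\{u < 0\} \bigr).
```

The sparsity discussed in the abstract enters through the design vectors x_i: each Newton step of the interior-point method solves a weighted least squares system whose matrix inherits the sparsity of the design.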
COMPUTING THE ACTION OF THE MATRIX EXPONENTIAL, WITH AN APPLICATION TO EXPONENTIAL INTEGRATORS
2010
Cited by 2 (1 self)
Abstract
A new algorithm is developed for computing e^{tA}B, where A is an n × n matrix and B is n × n0 with n0 ≪ n. The algorithm works for any A, its computational cost is dominated by the formation of products of A with n × n0 matrices, and the only input parameter is a backward error tolerance. The algorithm can return a single matrix e^{tA}B or a sequence e^{t_k A}B on an equally spaced grid of points t_k. It uses the scaling part of the scaling and squaring method together with a truncated Taylor series approximation to the exponential. It determines the amount of scaling and the Taylor degree using the recent analysis of Al-Mohy and Higham [SIAM J. Matrix Anal. Appl. 31 (2009), pp. 970-989], which provides sharp truncation error bounds expressed in terms of the quantities ‖A^k‖^{1/k} for a few values of k, where the norms are estimated using a matrix norm estimator. Shifting and balancing are used as preprocessing steps to reduce the cost of the algorithm. Numerical experiments show that the algorithm performs in a numerically stable fashion across a wide range of problems, and analysis of rounding errors and of the conditioning of the problem provides theoretical support. Experimental comparisons with two Krylov-based MATLAB codes show the new algorithm to be sometimes much superior in terms of computational cost and accuracy. An important application of the algorithm is to exponential integrators for ordinary differential equations. It is shown that the sums of the form ∑_{k=0}^{p} ϕ_k(A) u_k that arise in exponential integrators, where the ϕ_k are related to the exponential function, can be expressed in terms of a single exponential of a matrix of dimension n + p built by augmenting A with additional rows and columns, and the algorithm of this paper can therefore be employed.
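The core mechanism, scaling combined with a truncated Taylor series, so that only products of A with thin matrices are ever formed, can be sketched as below. The scaling factor and Taylor degree are fixed here for simplicity; the paper's actual algorithm chooses both adaptively from the ‖A^k‖^{1/k} backward-error bounds.

```python
# Sketch: approximate e^{tA} B by s applications of a degree-m Taylor
# approximation of e^{(t/s)A}, touching A only through sparse products
# with thin matrices (fixed s and m here; the real algorithm picks them
# from backward error bounds).
import numpy as np
from scipy.sparse import diags

def expm_action_taylor(A, B, t=1.0, s=8, m=20):
    """Approximate e^{tA} B."""
    F = B.copy()
    for _ in range(s):                 # e^{tA} = (e^{(t/s)A})^s
        term = F
        out = F.copy()
        for k in range(1, m + 1):      # Taylor series applied to F
            term = (t / s) * (A @ term) / k
            out = out + term
        F = out
    return F

A = diags([1.0, -2.0, 0.5])            # simple diagonal test matrix
B = np.eye(3)
approx = expm_action_taylor(A, B)
exact = np.diag(np.exp([1.0, -2.0, 0.5]))
print(np.max(np.abs(approx - exact)))  # should be near machine precision
```

For production use, scipy exposes an implementation of this family of methods as `scipy.sparse.linalg.expm_multiply`; the sketch above only illustrates why the cost is dominated by A-times-thin-matrix products.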
Sun Microsystems
Abstract
Numerical linear algebra, particularly the solution of linear systems of equations, linear least squares problems, eigenvalue problems and singular value
Minimum Classification Error Training in Example-Based Speech and Pattern Recognition Using Sparse Weight Matrices
2009
Abstract
The Minimum Classification Error (MCE) criterion is a well-known criterion in pattern classification systems. The aim of MCE training is to minimize the resulting classification error when classifying a new data set. Usually, these classification systems use some form of statistical model to describe the data, and they tend not to work well when the underlying model is incorrect. Speech recognition systems traditionally use Hidden Markov Models (HMMs) with Gaussian (or Gaussian mixture) probability density functions as their basic model, and it is well known that these models make some assumptions that are not correct. In example-based approaches, these statistical models are absent and are replaced by the pure data. The absence of statistical models creates the need for other parameters to model the data space accurately. In this work, we use the MCE criterion to create a system that works together with this example-based approach. Moreover, we extend the locally scaled distance measure with sparse, block-diagonal weight matrices, resulting in a better model for the data space while avoiding the computational load of full matrices. We illustrate the approach with example experiments on databases from pattern recognition and with speech recognition.
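The computational argument for block-diagonal weight matrices can be sketched directly: a weighted squared distance with a block-diagonal W decomposes into independent small quadratic forms, so its cost scales with the block sizes rather than the square of the full feature dimension. Block sizes and values below are illustrative, not the paper's configuration.

```python
# Sketch: locally weighted squared distance with a block-diagonal weight
# matrix -- each feature block gets its own small weight matrix, and the
# result equals the full-matrix quadratic form with W = blockdiag(W_b).
import numpy as np

def block_weighted_sqdist(x, y, blocks):
    """d(x, y) = sum over blocks of (x_b - y_b)' W_b (x_b - y_b),
    with `blocks` a list of (slice, W_b) pairs."""
    d = x - y
    return sum(d[sl] @ W @ d[sl] for sl, W in blocks)

rng = np.random.default_rng(0)
x, y = rng.normal(size=6), rng.normal(size=6)
# Two 3x3 SPD blocks standing in for the sparse weight matrix.
W1 = np.eye(3) * 2.0
W2 = np.eye(3) * 0.5
blocks = [(slice(0, 3), W1), (slice(3, 6), W2)]

# Sanity check against the equivalent full-matrix form.
W_full = np.zeros((6, 6))
W_full[:3, :3], W_full[3:, 3:] = W1, W2
d = x - y
assert np.isclose(block_weighted_sqdist(x, y, blocks), d @ W_full @ d)
```

With b blocks of size k, evaluation costs O(b k^2) instead of O((bk)^2) for a full matrix, which is the trade-off the abstract points to for keeping example-based distance computations tractable.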