Results 1–7 of 7
Use of Hybrid Recursive CSR/COO Data Structures in Sparse Matrix-Vector Multiplication
In IMCSIT, 2010
Cited by 5 (3 self)
Abstract—Recently, we have introduced an approach to basic sparse matrix computations on multicore cache-based machines using recursive partitioning. Here, the memory representation of a sparse matrix consists of a set of submatrices, which are used as leaves of a quadtree structure. In this paper, we evaluate the performance impact, on Sparse Matrix-Vector Multiplication (SpMV), of a modification to our Recursive CSR implementation allowing the use of multiple data structures in leaf matrices (CSR/COO, with either 16- or 32-bit indices).
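The quadtree layout the abstract describes can be illustrated in a few lines of Python. This is a sketch under assumptions, not the authors' Recursive CSR code: it splits a SciPy matrix into quadrants until leaves are small, stores a leaf as COO when it has fewer nonzeros than rows and as CSR otherwise, and multiplies by visiting leaves at their offsets (the `leaf_nnz` threshold and the leaf-format rule are illustrative guesses, not the paper's heuristics):

```python
import numpy as np
from scipy.sparse import coo_matrix, csr_matrix

def build(m, r0=0, c0=0, leaf_nnz=8):
    """Recursively split m into quadrants; return (row_off, col_off, leaf)
    triples, with hypersparse leaves in COO and the rest in CSR."""
    if m.nnz == 0:
        return []
    if m.nnz <= leaf_nnz or min(m.shape) <= 1:
        fmt = coo_matrix if m.nnz < m.shape[0] else csr_matrix
        return [(r0, c0, fmt(m))]
    m = csr_matrix(m)                        # CSR supports 2-D slicing
    rm, cm = m.shape[0] // 2, m.shape[1] // 2
    return (build(m[:rm, :cm], r0, c0, leaf_nnz)
            + build(m[:rm, cm:], r0, c0 + cm, leaf_nnz)
            + build(m[rm:, :cm], r0 + rm, c0, leaf_nnz)
            + build(m[rm:, cm:], r0 + rm, c0 + cm, leaf_nnz))

def spmv(leaves, shape, x):
    """y = A x, accumulating each leaf's contribution at its offsets."""
    y = np.zeros(shape[0])
    for r0, c0, leaf in leaves:
        y[r0:r0 + leaf.shape[0]] += leaf @ x[c0:c0 + leaf.shape[1]]
    return y
```

A real implementation would additionally narrow leaf index types to 16 bits when the leaf dimensions allow, which is the second axis of the hybrid design evaluated in the paper.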
Assembling Recursively Stored Sparse Matrices
Cited by 3 (1 self)
Abstract—Recently, we have introduced an approach to multicore computations on sparse matrices using recursive partitioning, called Recursive Sparse Blocks (RSB). In this document, we discuss issues involved in assembling matrices in the RSB format. Since the main expected application area is iterative methods, we compare the performance of matrix assembly to that of matrix-vector multiply (SpMV), outlining both the scalability of the method and the ratio of execution times.
Fast Matrix-Vector Multiplications for Large-Scale Logistic Regression on Shared-Memory Systems
Cited by 1 (1 self)
Abstract—Shared-memory systems such as regular desktops now possess enough memory to store large data. However, the training process for data classification can still be slow if we do not fully utilize the power of multicore CPUs. Many existing works have proposed parallel machine learning algorithms by modifying serial ones, but the convergence analysis may be complicated. Instead, we do not modify the machine learning algorithms, but consider those that can take advantage of parallel matrix operations. We particularly investigate the use of parallel sparse matrix-vector multiplications in a Newton method for large-scale logistic regression. Various implementations, from easy to sophisticated, are analyzed and compared. Results indicate that under suitable settings excellent speedup can be achieved. Keywords: sparse matrix; parallel matrix-vector multiplication; classification; Newton method.
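To see where the sparse matrix-vector products enter such a Newton method, consider the Hessian-vector product at its core. The sketch below assumes L2-regularized logistic regression with the labels folded into the rows of X (a common convention; the constant C, the variable names, and the exact regularization form are illustrative, not taken from the paper):

```python
import numpy as np
from scipy.sparse import csr_matrix

def hessian_vector_product(X, w, v, C=1.0):
    """Hv for H = I + C * X^T D X, the (regularized) logistic-regression
    Hessian. Each call costs exactly two sparse matrix-vector products
    (X v, then X^T u) -- the operations worth parallelizing."""
    sigma = 1.0 / (1.0 + np.exp(-(X @ w)))  # per-row sigmoid values
    d = sigma * (1.0 - sigma)               # diagonal of D
    return v + C * (X.T @ (d * (X @ v)))
```

A truncated-Newton solver calls this inside conjugate gradient, so the two SpMV calls dominate training time, and a multithreaded SpMV speeds up the whole method without altering its convergence behavior.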
Efficient Multithreaded Untransposed, Transposed or Symmetric Sparse Matrix-Vector Multiplication with the Recursive Sparse Blocks Format
, 2014
Cited by 1 (0 self)
In earlier work we have introduced the “Recursive Sparse Blocks” (RSB) sparse matrix storage scheme, oriented towards cache-efficient matrix-vector multiplication (SpMV) and triangular solution (SpSV) on cache-based shared-memory parallel computers. Both the transposed (SpMV^T) and symmetric (SymSpMV) matrix-vector multiply variants are supported. RSB is a meta-format: it recursively partitions a rectangular sparse matrix into quadrants; leaf submatrices are stored in an appropriate traditional format, either Compressed Sparse Rows (CSR) or Coordinate (COO). In this work, we compare the performance of our RSB implementation of SpMV, SpMV^T, and SymSpMV to that of the state-of-the-art Intel Math Kernel Library (MKL) CSR implementation on the recent Intel Sandy Bridge processor. Our results with a few dozen large real-world matrices suggest the efficiency of the approach: in all of the cases, RSB’s SymSpMV (and in most cases, SpMV^T as well) took less than half of MKL CSR’s time; SpMV’s advantage was smaller. Furthermore, RSB’s SpMV^T is more scalable than MKL’s CSR, in that it performs almost as well as SpMV. Additionally, we include comparisons to the state-of-the-art Compressed Sparse Blocks (CSB) implementation. We observed RSB to be slightly superior to CSB in SpMV^T, slightly inferior in SpMV, and better (in most cases by a factor of two or more) in SymSpMV. Although RSB is a non-traditional storage format and thus needs a special constructor, it can be assembled from CSR or any other similar row-ordered representation arrays in the time of a few dozen matrix-vector multiply executions. Thanks to its significant advantage over MKL’s CSR routines for symmetric or transposed matrix-vector multiplication, in most of the observed cases the assembly cost amortizes within fewer than fifty iterations.
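The SymSpMV advantage reported above has a simple source: only one triangle is stored, and each stored off-diagonal entry serves both y_i and y_j, roughly halving the memory traffic. A minimal COO-style sketch of the idea (plain loops for clarity; RSB's actual blocked kernels are not shown):

```python
import numpy as np

def sym_spmv(rows, cols, vals, x):
    """y = S x for symmetric S, given only its lower triangle as COO
    arrays: every off-diagonal a_ij updates both y_i and y_j."""
    y = np.zeros_like(x)
    for i, j, a in zip(rows, cols, vals):
        y[i] += a * x[j]
        if i != j:
            y[j] += a * x[i]  # mirrored contribution of the stored entry
    return y
```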
Environmental Modelling & Software, 68
Abstract: This paper details a strategy for modifying the source code of a complex model so that the model may be used in a data assimilation context, and gives the standards for implementing a data assimilation code to use such a model. The strategy relies on keeping the model separate from any data assimilation code, and coupling the two through the use of Message Passing Interface (MPI) functionality. This strategy limits the changes necessary to the model and as such is rapid to program, at the expense of ultimate performance. The implementation technique is applied in different models with state dimension up to 2.7 × 10^8. The overheads added by using this implementation strategy in a coupled ocean-atmosphere climate model are shown to be an order of magnitude smaller than the addition of correlated stochastic random errors necessary for some nonlinear data assimilation techniques.
Quarterly Journal of the Royal Meteorological Society: Twin experiments with the equivalent weights particle filter and
This paper investigates the use of a particle filter for data assimilation with a full-scale coupled ocean-atmosphere general circulation model. Synthetic twin experiments are performed to assess the performance of the equivalent weights filter in such a high-dimensional system. Artificial 2-dimensional sea surface temperature fields are used as observational data every day. Results are presented for different values of the free parameters in the method. Measures of the performance of the filter are root mean square errors, trajectories of individual variables in the model, and rank histograms. Filter degeneracy is not observed, and the performance of the filter is shown to depend on the ability to keep maximum spread in the ensemble.
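The filter degeneracy that the experiments check for can be demonstrated with a toy importance-sampling example (a generic illustration of the problem, not the equivalent-weights scheme itself): as the state dimension grows, the log-weights spread out and the effective sample size of the ensemble collapses towards one.

```python
import numpy as np

def effective_sample_size(log_w):
    """ESS = 1 / sum(w_i^2) over the normalized weights; N means
    perfectly balanced weights, values near 1 mean a degenerate ensemble."""
    w = np.exp(log_w - log_w.max())  # stabilize before normalizing
    w /= w.sum()
    return 1.0 / np.sum(w ** 2)

rng = np.random.default_rng(0)
for dim in (1, 100):
    # Gaussian log-likelihoods for 50 particles in `dim` dimensions.
    log_w = -0.5 * np.sum(rng.standard_normal((50, dim)) ** 2, axis=1)
    print(dim, round(effective_sample_size(log_w), 1))
```

In the high-dimensional case a handful of particles carry almost all the weight, which is exactly the situation the equivalent-weights construction is designed to avoid.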