## Cache-oblivious dynamic programming (2006)


Venue: In Proc. of the Seventeenth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA ’06

Citations: 17 (5 self)

### BibTeX

@INPROCEEDINGS{Chowdhury06cache-obliviousdynamic,
  author = {Rezaul Alam Chowdhury and Vijaya Ramachandran},
  title = {Cache-oblivious dynamic programming},
  booktitle = {In Proc. of the Seventeenth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA ’06},
  year = {2006},
  pages = {591--600}
}

### Abstract

We present efficient cache-oblivious algorithms for several fundamental dynamic programs. These include new algorithms with improved cache performance for longest common subsequence (LCS), edit distance, gap (i.e., edit distance with gaps), and least weight subsequence. We present a new cache-oblivious framework called the Gaussian Elimination Paradigm (GEP) for Gaussian elimination without pivoting that also gives cache-oblivious algorithms for Floyd-Warshall all-pairs shortest paths in graphs and ‘simple DP’, among other problems.
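The abstract names Gaussian elimination without pivoting and Floyd-Warshall as instances of one paradigm but does not spell out the shared structure. As a rough illustration only, the sketch below assumes the plain iterative, triply nested loop that both computations have in common. It is not the cache-oblivious recursive algorithm of the paper, and the names `gep_iterative`, `in_sigma`, `floyd_warshall_update`, and `gaussian_update` are hypothetical.

```python
from typing import Callable, List

Matrix = List[List[float]]


def gep_iterative(
    c: Matrix,
    f: Callable[[float, float, float, float], float],
    in_sigma: Callable[[int, int, int], bool],
) -> None:
    """Naive in-place triply nested update loop (illustrative, not cache-oblivious).

    For every (i, j, k) accepted by `in_sigma` (a hypothetical predicate that
    selects which updates to apply), c[i][j] is replaced by
    f(c[i][j], c[i][k], c[k][j], c[k][k]).
    """
    n = len(c)
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if in_sigma(i, j, k):
                    c[i][j] = f(c[i][j], c[i][k], c[k][j], c[k][k])


def floyd_warshall_update(x: float, u: float, v: float, w: float) -> float:
    # Relax path i -> j through intermediate vertex k.
    return min(x, u + v)


def gaussian_update(x: float, u: float, v: float, w: float) -> float:
    # Eliminate column k from row i using pivot w = c[k][k] (no pivoting).
    return x - (u / w) * v


if __name__ == "__main__":
    INF = float("inf")
    dist = [[0, 3, INF], [INF, 0, 1], [2, INF, 0]]
    gep_iterative(dist, floyd_warshall_update, lambda i, j, k: True)
    print(dist)

    a = [[2.0, 1.0], [4.0, 5.0]]
    gep_iterative(a, gaussian_update, lambda i, j, k: i > k and j > k)
    print(a)
```

Only the update function and the set of accepted index triples change between the two problems, which is what makes a single framework plausible. The loop above still performs all n³ iterations in the standard order and makes no attempt to improve locality.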

### Citations

9061 | Introduction to Algorithms
- Cormen, Leiserson, et al.
- 2001
Citation Context ... and an arbitrarily large external memory partitioned into blocks of size B. The I/O complexity of an algorithm is the number of blocks transferred between these two levels. The cache-oblivious model =-=[9]-=- is an extension of this model with the additional feature that algorithms do not use knowledge of M and B. A cache-oblivious algorithm is flexible and portable, and simultaneously adapts to all level... |

2901 |
Dynamic programming
- Bellman
- 1957
Citation Context ... and also that as much useful work as possible is performed on this data before it is written back to external memory (‘temporal locality’). Dynamic programming is a widely-used algorithmic technique =-=[3, 21, 7]-=-. However, standard implementations of these algorithms often fail to exploit the temporal locality of data which leads to poor I/O performance. ∗ Department of Computer Sciences, University of Texas,... |

561 |
The Input/Output complexity of sorting and related problems
- Aggarwal, Vitter
- 1988
Citation Context ...rarchy with registers in the lowest level followed by L1 cache, L2 cache, L3 cache, main memory, and disk, with the access time of each memory level increasing with its level. The two-level I/O model =-=[1]-=- is a simple abstraction of this hierarchy that consists of an internal memory of size M, and an arbitrarily large external memory partitioned into blocks of size B. The I/O complexity of an algorithm... |
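The two-level I/O model described in this excerpt can be made concrete with a small counter. The sketch below is a toy illustration under stated assumptions: addresses are word indices, internal memory holds `M // B` blocks, and eviction is LRU (the excerpt does not specify a replacement policy). The helper names `count_block_transfers`, `row_major`, and `column_major` are hypothetical.

```python
from collections import OrderedDict
from typing import Iterable, Iterator


def count_block_transfers(addresses: Iterable[int], M: int, B: int) -> int:
    """Count blocks loaded into an internal memory of M words, block size B.

    Assumes LRU eviction and counts only loads (write-backs are ignored).
    """
    capacity = M // B            # number of blocks that fit in internal memory
    resident = OrderedDict()     # block id -> True, ordered by recency of use
    transfers = 0
    for address in addresses:
        block = address // B
        if block in resident:
            resident.move_to_end(block)       # mark as most recently used
        else:
            transfers += 1                    # block must be brought in
            resident[block] = True
            if len(resident) > capacity:
                resident.popitem(last=False)  # evict least recently used
    return transfers


def row_major(n: int) -> Iterator[int]:
    # Visit an n x n row-major array row by row.
    return (i * n + j for i in range(n) for j in range(n))


def column_major(n: int) -> Iterator[int]:
    # Visit the same array column by column.
    return (i * n + j for j in range(n) for i in range(n))


if __name__ == "__main__":
    n, M, B = 64, 256, 16
    print(count_block_transfers(row_major(n), M, B))
    print(count_block_transfers(column_major(n), M, B))
```

With these parameters the row-by-row scan loads each block once, while the column-by-column scan reloads a block on every access because a single column touches more blocks than fit in internal memory. This is the kind of gap in I/O complexity that the model is meant to expose.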

390 |
Algorithm 97: Shortest Path
- Floyd
Citation Context ...tion. We also show that GEP not only gives the cache-oblivious Gaussian elimination algorithm, but it also gives cache-oblivious algorithms for LU decomposition without pivoting, Floyd-Warshall’s APSP =-=[8, 25]-=-, matrix multiplication, and sequence alignment with gaps; with some modification, it also gives a cache-oblivious algorithm for a class of dynamic programs termed as ‘simple-DP’ [6] which includes dy... |

293 | A linear space algorithm for computing maximal common subsequences - Hirschberg - 1975 |

293 |
Introduction to Computational Biology
- Waterman
- 1995
Citation Context ...r one with the allowable edit operations being insertion, deletion and substitution of symbols each having a cost based on the symbol(s) on which it is to be applied. We also consider the gap problem =-=[11, 12, 26]-=- which is a natural generalization of the edit distance problem, and which arises in molecular biology, geology, and speech recognition. Unlike the edit distance problem, however, in this problem a se... |

221 | A Theorem on Boolean Matrices
- Warshall
- 1962
Citation Context ...tion. We also show that GEP not only gives the cache-oblivious Gaussian elimination algorithm, but it also gives cache-oblivious algorithms for LU decomposition without pivoting, Floyd-Warshall’s APSP =-=[8, 25]-=-, matrix multiplication, and sequence alignment with gaps; with some modification, it also gives a cache-oblivious algorithm for a class of dynamic programs termed as ‘simple-DP’ [6] which includes dy... |

193 | Algorithms for the longest common subsequence problem
- Hirschberg
- 1977
Citation Context ...l three methods have the same asymptotic bounds, but the triangular partitioning gave the best performance experimentally. For comparison we coded the widely used linear-space algorithm of Hirschberg =-=[14]-=-. Both algorithms were tested on both random and real-world sequences consisting of up to 2 million symbols each, and timing and caching data were obtained on three state-of-the-art architectures: Intel Xe... |

188 |
Algorithm design
- Kleinberg, Tardos
- 2006
Citation Context ... modification, it also gives a cache-oblivious algorithm for a class of dynamic programs termed as ‘simple-DP’ [6] which includes dynamic programming algorithms for RNA secondary structure prediction =-=[19]-=-, matrix chain multiplication and optimal binary search trees. The I/O-complexity of each of these algorithms matches the best I/O bound known for the corresponding problem. Related Work. The linear-s... |

177 |
I/O Complexity: The Red-Blue Pebbling Game
- Hong, Kung
- 1981
Citation Context ...cutes mn operations in order to implement the type of computation defined by equation 2.1 (i.e., compute c[m, n]), must perform $\Omega\left(1 + \frac{m+n}{B} + \frac{mn}{BM}\right)$ I/Os. We use the red-blue pebble game technique =-=[16]-=- in order to obtain this lower bound. First we construct a computation DAG G given by the computation of the algorithm. Figure 4(a) shows an example of the computation DAG given by equation 2.1. Nodes... |

107 | An Analysis of Dag-Consistent Distributed Shared-Memory Algorithms
- Blumofe, Frigo, et al.
- 1996
Citation Context ... to obtain a cache-oblivious algorithm for Gaussian elimination without pivoting. Our algorithm is in-place, and is arguably simpler than the known cache-oblivious algorithms for solving this problem =-=[27, 4]-=-, since it is not based on LU decomposition and does not perform matrix multiplication. We also show that GEP not only gives the cache-oblivious Gaussian elimination algorithm, but it also gives cache... |

100 |
Comparative analyses of multi-species sequences from targeted genomic regions
- Thomas, Touchman, et al.
- 2003
Citation Context ...hms on the Intel Xeon and the Sun UltraSPARC-III+. • CO ran a factor of 2 to 6 times faster than Hi on random sequences. In Table 2 we tabulate running times on the AMD Opteron for CFTR DNA sequences =-=[22]-=-, where again, CO performs approximately twice as fast as Hi. • CO executed 40%-50% fewer instructions than Hi. • Unlike Hi, CO was able to conceal the effects of caches on its running time; its actua... |

98 | Locality of Reference in LU Decomposition with Partial Pivoting
- Toledo
- 1997
Citation Context ...linear equations are based on LU decomposition. In [27, 4] cache-oblivious algorithms performing $O\left(\frac{n^3}{B\sqrt{M}}\right)$ I/O operations are given for LU decomposition without pivoting, while the algorithm in =-=[23]-=- performs LU decomposition with partial pivoting within the same I/O bound. These algorithms use matrix multiplication and solution of triangular linear systems as subroutines. In [6], an $O(n^3)$... |

67 |
General context-free recognition in less than cubic time
- Valiant
- 1975
Citation Context ...cation and solution of triangular linear systems as subroutines. In [6], an $O(n^3)$ time and $O\left(\frac{n^3}{B\sqrt{M}}\right)$ I/O cache-oblivious algorithm based on Valiant’s context-free language recognition algorithm =-=[24]-=-, is given for simple-DP. A cache-oblivious algorithm for Floyd-Warshall’s APSP algorithm is given in [20]. The algorithm runs in $O(n^3)$ time and incurs $O\left(\frac{n^3}{B\sqrt{M}}\right)$ cache misses. The rest of the p... |

49 |
Speeding up dynamic programming with applications to molecular biology
- Galil, Giancarlo
- 1989
Citation Context ...r one with the allowable edit operations being insertion, deletion and substitution of symbols each having a cost based on the symbol(s) on which it is to be applied. We also consider the gap problem =-=[11, 12, 26]-=- which is a natural generalization of the edit distance problem, and which arises in molecular biology, geology, and speech recognition. Unlike the edit distance problem, however, in this problem a se... |

36 |
Dynamic Programming
- Sniedovich
- 1992
Citation Context ... and also that as much useful work as possible is performed on this data before it is written back to external memory (‘temporal locality’). Dynamic programming is a widely-used algorithmic technique =-=[3, 21, 7]-=-. However, standard implementations of these algorithms often fail to exploit the temporal locality of data which leads to poor I/O performance. ∗ Department of Computer Sciences, University of Texas,... |

35 |
The least weight subsequence problem
- Hirschberg, Larmore
- 1987
Citation Context ... space and incurs $O\left(\frac{n^3}{B}\right)$ I/Os. We present a cache-oblivious algorithm that incurs only $O\left(\frac{n^3}{B\sqrt{M}}\right)$ I/Os without changing the time and space complexities. The least weight subsequence problem =-=[15, 12]-=- can be viewed as a 1-dimensional version of the gap problem, and we present a cache-oblivious algorithm that runs in $O(n^2)$ time and $O\left(\frac{n^2}{BM}\right)$ I/Os under some natural assumptions. Finally we introd... |

32 | Optimizing graph algorithms for improved cache performance
- Park, Penner, et al.
- 2002
Citation Context ...I/O cache-oblivious algorithm based on Valiant’s context-free language recognition algorithm [24], is given for simple-DP. A cache-oblivious algorithm for Floyd-Warshall’s APSP algorithm is given in =-=[20]-=-. The algorithm runs in $O(n^3)$ time and incurs $O\left(\frac{n^3}{B\sqrt{M}}\right)$ cache misses. The rest of the paper is organized as follows. In section 2 we describe and analyze our cache-oblivious algorithm for the L... |

28 | Beyond core: Making parallel computer I/O practical
- Womble, Greenberg, et al.
- 1993
Citation Context ... to obtain a cache-oblivious algorithm for Gaussian elimination without pivoting. Our algorithm is in-place, and is arguably simpler than the known cache-oblivious algorithms for solving this problem =-=[27, 4]-=-, since it is not based on LU decomposition and does not perform matrix multiplication. We also show that GEP not only gives the cache-oblivious Gaussian elimination algorithm, but it also gives cache... |

18 | Fast linear-space computations of longest common subsequences - Apostolico, Browne, et al. - 1992 |

12 |
Parallel algorithms for dynamic programming recurrences with more than O(1) dependency
- Galil, Park
- 1994
Citation Context ...r one with the allowable edit operations being insertion, deletion and substitution of symbols each having a cost based on the symbol(s) on which it is to be applied. We also consider the gap problem =-=[11, 12, 26]-=- which is a natural generalization of the edit distance problem, and which arises in molecular biology, geology, and speech recognition. Unlike the edit distance problem, however, in this problem a se... |

7 |
A linear-space algorithm for the LCS problem
- Kumar, Rangan
- 1987
Citation Context ...ynamic programming solution [7] that runs in $\Theta(mn)$ time, uses $\Theta(mn)$ space and performs $\Theta\left(\frac{mn}{B}\right)$ I/Os when working on two sequences of lengths m and n. Linear space implementations of this algorithm =-=[13, 18, 2]-=- also have I/O complexity $\Omega\left(\frac{mn}{B}\right)$. The LCS problem arises in a wide range of applications, and is especially prominent in computational biology in sequence alignment. We present a cache-oblivious ... |
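For orientation, the classic dynamic program mentioned in this excerpt can be sketched in a few lines. The version below is a minimal illustration that keeps only two rows of the table, so it returns the LCS length in Θ(mn) time and linear space. It is neither Hirschberg's algorithm (which also recovers a subsequence) nor the cache-oblivious algorithm of the paper, and the function name `lcs_length` is hypothetical.

```python
from typing import Sequence


def lcs_length(x: Sequence, y: Sequence) -> int:
    """Length of a longest common subsequence of x and y.

    Row-by-row scan of the standard DP table, keeping only the previous row.
    """
    if len(y) > len(x):
        x, y = y, x                      # keep the shorter sequence as the row
    prev = [0] * (len(y) + 1)            # table row for the previous symbol of x
    for xi in x:
        curr = [0]                       # column 0: empty prefix of y
        for j, yj in enumerate(y, start=1):
            if xi == yj:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(lcs_length("ABCBDAB", "BDCABA"))
```

Each row is swept from left to right once per symbol of the longer sequence, so when a row does not fit in internal memory every sweep reloads it block by block, which is consistent with the mn/B I/O behaviour the excerpt attributes to linear-space implementations.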

6 | Cache efficient simple dynamic programming
- Cherng, Ladner
- 2005
Citation Context ...rshall’s APSP [8, 25], matrix multiplication, and sequence alignment with gaps; with some modification, it also gives a cache-oblivious algorithm for a class of dynamic programs termed as ‘simple-DP’ =-=[6]-=- which includes dynamic programming algorithms for RNA secondary structure prediction [19], matrix chain multiplication and optimal binary search trees. The I/O-complexity of each of these algorithms ... |

5 |
Cache-oblivious stencil computations
- Frigo, Strumpen
- 2005
Citation Context ...rably better than the naïve bound of $O(n^2)$ it is considerably larger than $O\left(\frac{n^2}{BM}\right)$, which is the bound we achieve. If only the length of the LCS is needed, the technique for stencil computation =-=[10]-=- can achieve the same bound as our algorithm. However, that technique does not extend to computing an actual sequence. Known cache-oblivious algorithms for Gaussian elimination for solving systems of ... |

1 |
Experimental Evaluation of a Cache-Oblivious LCS Algorithm
- Chowdhury
- 2005
Citation Context ...on symbols each, and timing and caching data were obtained on three state-of-the-art architectures: Intel Xeon, AMD Opteron and SUN UltraSPARC-III+. Detailed results of our experiments can be found in =-=[5]-=-. Below we summarize our results, where CO and Hi denote the new cache-oblivious algorithm and Hirschberg’s algorithm, respectively: • CO incurred considerably fewer cache misses compared to Hi. In Ta... |