## Efficient string matching algorithms for combinatorial universal denoising (2005)

Venue: In Proc. of IEEE Data Compression Conference (DCC), Snowbird

Citations: 2 (0 self)

### BibTeX

@INPROCEEDINGS{Chen05efficientstring,

author = {S. Chen and S. Diggavi and S. Dusad and S. Muthukrishnan},

title = {Efficient string matching algorithms for combinatorial universal denoising},

booktitle = {In Proc. of IEEE Data Compression Conference (DCC), Snowbird},

year = {2005}

}

### Abstract

Inspired by the combinatorial denoising method DUDE [13], we present efficient algorithms for implementing this idea for arbitrary contexts or for using it within subsequences. We also propose effective, efficient denoising error estimators so we can find the best denoising of an input sequence over different context lengths. Our methods are simple, drawing from string matching methods and radix sorting. We also present experimental results of our proposed algorithms.
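As a rough illustration of the counting step that DUDE-style denoisers build on, the following Python sketch tallies, for every two-sided context of length k, how often each center symbol occurs. The function name `context_counts`, the dictionary-of-tuples representation, and the toy input are assumptions made for illustration. The paper's own algorithms rely on substring naming and radix sorting rather than hashing tuples, and the denoising rule itself (which also needs the channel matrix) is not shown here.

```python
from collections import Counter, defaultdict


def context_counts(z, k):
    """Count center symbols per two-sided context of length k.

    For each position i with k symbols on both sides, the context is
    the pair (z[i-k:i], z[i+1:i+k+1]). Returns a dict that maps each
    context to a Counter of the center symbols observed in it.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    counts = defaultdict(Counter)
    for i in range(k, len(z) - k):
        left = tuple(z[i - k:i])
        right = tuple(z[i + 1:i + k + 1])
        counts[(left, right)][z[i]] += 1
    return dict(counts)


# Hypothetical toy sequence, not data from the paper.
noisy = "abcabcabd"
counts = context_counts(noisy, 1)
```

This direct version costs time proportional to n times k because every context is rebuilt as a tuple; replacing the tuples with integer names for the length-k substrings is what brings the cost down.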

### Citations

310 | Efficient randomized pattern-matching algorithms
- Karp, Rabin
- 1987
Citation Context ...nd then the denoising step from the empirical distribution. We will present three solutions for naming. Solution A. (Randomized algorithm) This approach is quite simple. We use a fingerprint function [7] on substrings $z[\ldots]$ of string $z^n$, which has the property that the fingerprints of two substrings agree if the substrings are equal, and with probability at most ...

153 | Simple linear work suffix array construction
- Kärkkäinen, Sanders
- 2003
Citation Context ...us on giving names to substrings $z[\ldots]$ first. We will construct the suffix tree $T$ of string $z^n$. A number of algorithms exist for suffix tree construction. The earliest result is [3] and a simple one is [5]; several implementations exist, e.g., [12]. This takes $O(n)$ time deterministically for alphabet [...] for any constant [...]. Recall that the suffix tree is a rooted trie with each edge labeled by some substring ...

81 | Universal Discrete Denoising: Known Channel
- Weissman, Ordentlich, et al.
- 2005
Citation Context ...the input sequence is a discrete memoryless channel (DMC). In this case, both the noise-free sequence as well as the noisy observations are assumed to belong to an alphabet $S$ of size $m$. The authors in [13] proposed an algorithm called DUDE that relies on computing, for each position of the string $z$, a certain empirical distributional value based on counting occurrences of its context, i.e., ...

73 | Rapid identification of repeated patterns in strings, trees and arrays
- Karp, Miller, et al.
- 1972
Citation Context ... in the range $O(\log n)$. The algorithm succeeds with high probability because of the low probability of error in the fingerprint function. Solution B. (Deterministic naming). This will follow the ideas in [6]. Without loss of generality we will focus on giving names to strings $z[\ldots]$. Our algorithm proceeds in rounds. In a given round, we will have names $N$ for the strings $z[\ldots]$ ...

64 | Linear algorithm for data compression via string matching
- Rodeh, Pratt, et al.
Citation Context ...havior of the denoiser performance prediction seemed to be close for datasets examined. 5 Concluding Remarks There has been a rich history of interplay between string matching methods and compression [11, 2, 8], so our work in this paper adds to this lore. In particular, string matching methods in [2, 8] use the context like we do here, but do a different, elaborate search within many contexts for a particu...

14 | Optimal suffix tree construction with large alphabets
- Farach-Colton
- 1997
Citation Context ...c algorithm. We will focus on giving names to substrings $z[\ldots]$ first. We will construct the suffix tree $T$ of string $z^n$. A number of algorithms exist for suffix tree construction. The earliest result is [3] and a simple one is [5]; several implementations exist, e.g., [12]. This takes $O(n)$ time deterministically for alphabet [...] for any constant [...]. Recall that the suffix tree is a rooted trie with each edge la...

10 | Perfect hashing for strings: Formalization and algorithms
- Farach, Muthukrishnan
- 1996
Citation Context ...tical to above, but we only work on the induced tree $t$ obtained by $\mathrm{WA}_l(T, \ldots)$ for [...], in time $O(\ldots \log\log n)$. The bottleneck is the procedure for computing $\mathrm{WA}_l(T, \ldots)$, which currently takes $O(\log\log n)$ time per [...] using [4]. Summarizing, Theorem 3 Given a string $z^n$ and probability transition matrix $Q$ which can be preprocessed, a query DUDE-DENOISE(...) can be answered in [...] deterministically using suffix trees where [...] is the n...

10 | Augmenting suffix trees, with applications
- Matias, Muthukrishnan, et al.
- 1998
Citation Context ...havior of the denoiser performance prediction seemed to be close for datasets examined. 5 Concluding Remarks There has been a rich history of interplay between string matching methods and compression [11, 2, 8], so our work in this paper adds to this lore. In particular, string matching methods in [2, 8] use the context like we do here, but do a different, elaborate search within many contexts for a particu...

8 | Efficient pruning of bidirectional context trees with applications to universal denoising and compression
- Ordentlich, Weinberger, et al.
- 2004
Citation Context ...E for a given context length. Since this loss function depends on the unknown input $x^n$, we need to have estimators for the loss function that depend only on the observed noisy sequence $z^n$. Recently in [10], an estimator for the cumulative loss function has been presented for the denoising error; however, it is rather expensive, taking time $O(nm\ldots)$, and hence more efficient estimators are needed specially f...

6 | Augmenting suffix trees with applications
- Muthukrishnan, Ziv
- 1998
Citation Context ...havior of the denoiser performance prediction seemed to be close for datasets examined. 5 Concluding Remarks There has been a rich history of interplay between string matching methods and compression [11, 2, 8], so our work in this paper adds to this lore. In particular, string matching methods in [2, 8] use the context like we do here, but do a different, elaborate search within many contexts for a particu...

4 | Substring compression problems
- Cormode, Muthukrishnan
- 2005
Citation Context ...ata structures that enable the denoising of particular portions of the noisy sequence using specified context lengths. This is the denoising analog of the problem of compressing substrings studied in [1]. We address these concerns by studying a general data structural problem for subsequence denoising using DUDE. Formally, we study the following problem. Problem 1 We have a sequence $z^n$ and a noise model...

1 | On the temporal HZY compression scheme
- Cohen, Matias, et al.

1 | Simple Optimal Parallel Multiple Pattern Matching
- Muthukrishnan
Citation Context ...of technical problems. Can we design a deterministic algorithm for the naming of substrings of length [...] in $O(n)$ time and $O(\log \ldots)$ rounds? We suspect it is possible but one may need to go into the guts of [5] or [9] and use [...] naming or [...] naming respectively. A conceptual question is to develop an effective denoising method in presence of noise comprising inserts and deletes. Also, it will be of interest to extend DU...

1 | Code: http://www.cs.sunysb.edu/~algorith/files/suffix-trees.shtml
- Skiena
Citation Context ...rst. We will construct the suffix tree $T$ of string $z^n$. A number of algorithms exist for suffix tree construction. The earliest result is [3] and a simple one is [5]; several implementations exist, e.g., [12]. This takes $O(n)$ time deterministically for alphabet [...] for any constant [...]. Recall that the suffix tree is a rooted trie with each edge labeled by some substring $z[\ldots]$. Define the string depth of node $v$ den...
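
The deterministic naming idea that appears in the citation contexts above (Solution B, following [6]) can be sketched in Python as a round-based doubling scheme: each round combines the names of two shorter substrings into a name for a longer one. The function name `name_substrings`, the use of Python's built-in sort where a radix sort would be used for linear time per round, and the toy string are all assumptions for illustration. This is a generic sketch in the style of Karp, Miller and Rosenberg, not a reproduction of the paper's exact procedure.

```python
def name_substrings(z, k):
    """Assign integer names to all length-k substrings of z.

    The result satisfies names[i] == names[j] exactly when
    z[i:i+k] == z[j:j+k]. Uses about log2(k) renaming rounds.
    """
    n = len(z)
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= len(z)")
    # Round 0: name single symbols by their rank in the alphabet.
    rank = {c: r for r, c in enumerate(sorted(set(z)))}
    names = [rank[c] for c in z]
    length = 1
    while length < k:
        # Two (possibly overlapping) named blocks of size `length`,
        # offset by `step`, together cover length + step symbols.
        step = min(length, k - length)
        new_length = length + step
        pairs = [(names[i], names[i + step])
                 for i in range(n - new_length + 1)]
        # Rename each distinct pair by its rank among all pairs.
        rank = {p: r for r, p in enumerate(sorted(set(pairs)))}
        names = [rank[p] for p in pairs]
        length = new_length
    return names


# Hypothetical toy string, not data from the paper.
names = name_substrings("abracadabra", 3)
```

With such names in hand, the two-sided context of a position can be represented by a pair of integers, so contexts can be grouped by sorting pairs instead of comparing substrings symbol by symbol.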