Results 1  10
of
54
The String BTree: A New Data Structure for String Search in External Memory and its Applications.
 Journal of the ACM
, 1998
"... We introduce a new textindexing data structure, the String BTree, that can be seen as a link between some traditional externalmemory and stringmatching data structures. In a short phrase, it is a combination of Btrees and Patricia tries for internalnode indices that is made more effective by a ..."
Abstract

Cited by 138 (12 self)
 Add to MetaCart
(Show Context)
We introduce a new textindexing data structure, the String BTree, that can be seen as a link between some traditional externalmemory and stringmatching data structures. In a short phrase, it is a combination of Btrees and Patricia tries for internalnode indices that is made more effective by adding extra pointers to speed up search and update operations. Consequently, the String BTree overcomes the theoretical limitations of inverted files, Btrees, prefix Btrees, suffix arrays, compacted tries and suffix trees. String Btrees have the same worstcase performance as Btrees but they manage unboundedlength strings and perform much more powerful search operations such as the ones supported by suffix trees. String Btrees are also effective in main memory (RAM model) because they improve the online suffix tree search on a dynamic set of strings. They also can be successfully applied to database indexing and software duplication.
Fast Kernels for String and Tree Matching
, 2004
"... Introduction Many problems in machine learning require a data classification algorithm to work with a set of discrete objects. Common examples include biological sequence analysis where data is represented as strings (Durbin et al., 1998) and Natural Language Processing (NLP) where the data is give ..."
Abstract

Cited by 110 (7 self)
 Add to MetaCart
(Show Context)
Introduction Many problems in machine learning require a data classification algorithm to work with a set of discrete objects. Common examples include biological sequence analysis where data is represented as strings (Durbin et al., 1998) and Natural Language Processing (NLP) where the data is given in the form of a string combined with a parse tree (Collins and Du#y, 2001) or an annotated sequence (Altun et al., 2003). In order to apply kernel methods one defines a measure of similarity between discrete structures via a feature map # : X F. Here X is the set of discrete structures (eg. the set of all parse trees of a language) and F is a Hilbert space. Since #(x) F we can define a kernel by evaluating the scalar products k(x, x # ) = ##(x), #(x # )# (1.1) where x, x # X. The success of a kernel method employing k depends both on the faithful representation of discrete data and an e#cient means of computing k. Recent research e#ort has focussed on defining meaningful ker
Dictionary Matching and Indexing with Errors and Don't Cares
 In Proceedings of STOC
, 2004
"... ..."
(Show Context)
Marked Ancestor Problems
, 1998
"... Consider a rooted tree whose nodes can be marked or unmarked. Given a node, we want to find its nearest marked ancestor. This generalises the wellknown predecessor problem, where the tree is a path. ..."
Abstract

Cited by 62 (5 self)
 Add to MetaCart
(Show Context)
Consider a rooted tree whose nodes can be marked or unmarked. Given a node, we want to find its nearest marked ancestor. This generalises the wellknown predecessor problem, where the tree is a path.
Indexing and Dictionary Matching with One Error (Extended Abstract)
, 1999
"... The indexing problem is the one where a text is preprocessed and subsequent queries of the form: "Find all occurrences of pattern P in the text" are answered in time proportional to the length of the query and the number of occurrences. In the dictionary matching problem a set of patterns ..."
Abstract

Cited by 29 (5 self)
 Add to MetaCart
The indexing problem is the one where a text is preprocessed and subsequent queries of the form: "Find all occurrences of pattern P in the text" are answered in time proportional to the length of the query and the number of occurrences. In the dictionary matching problem a set of patterns is preprocessed and subsequent queries of the form: "Find all occurrences of dictionary patterns in text T" are answered in time proportional to the length of the text and the number of occurrences. There exist efficient worstcase solutions for the indexing problem and the dictionary matching problem, but none that find approximate occurrences of the patterns, i.e. where the pattern is within a bound edit (or hamming...
Matching a Set of Strings with Variable Length Don’t Cares, Theoretical Computer Science 178
, 1997
"... Given an alphabet A, a pattern p is a sequence (vl,...,vm) of words from A * called keywords. We represent p as a single word ..."
Abstract

Cited by 23 (4 self)
 Add to MetaCart
(Show Context)
Given an alphabet A, a pattern p is a sequence (vl,...,vm) of words from A * called keywords. We represent p as a single word
An efficient algorithm for dynamic text indexing
 Proc. of 5th Annual ACMSIAM Symposium on Discrete Algorithms
, 1994
"... ABSTRACT Text indexing is one of the fundamental problems of string matching. Indeed, the suffix tree, the central data structure of string matching, was developed as an efficient static text indexer. The text indexing problem is that of building a data structure on a text which allows the occurrenc ..."
Abstract

Cited by 21 (0 self)
 Add to MetaCart
(Show Context)
ABSTRACT Text indexing is one of the fundamental problems of string matching. Indeed, the suffix tree, the central data structure of string matching, was developed as an efficient static text indexer. The text indexing problem is that of building a data structure on a text which allows the occurrences of patterns to be quickly looked up. All previous text indexing schemes have been static in the sense that if the text is modified, the data structure must be rebuilt from scratch. In this paper, we present a first dynamic data structure and algorithms for the Online Dynamic Text Indexing problem. Our algorithms are based on a novel data structure, the border tree, which exploits string periodicities. 1 Introduction Pattern matching is one of the most wellstudied fields in computer science. Problems in this field have very broad applications in many areas of computer science. Elegant and efficient algorithms have been developed for exact pattern matching. (e.g. [4, 9]).
MultiMethod Dispatching: A Geometric Approach with Applications to String Matching Problems
, 1999
"... Current object oriented programming languages (OOPLs) rely on monomethod dispatching. Recent research has identified multimethods as a new, powerful feature to be added to OOPLs, and several experimental OOPLs now have multimethods. Their ultimate success and impact in practice depends, among ..."
Abstract

Cited by 20 (3 self)
 Add to MetaCart
Current object oriented programming languages (OOPLs) rely on monomethod dispatching. Recent research has identified multimethods as a new, powerful feature to be added to OOPLs, and several experimental OOPLs now have multimethods. Their ultimate success and impact in practice depends, among other things, on whether multimethod dispatching can be supported efficiently. We show that the multimethod dispatching problem can be transformed to a geometric problem on multidimensional integer grids, for which we then develop a data structure that uses nearlinear space and has O(log log n) query time. This gives a solution whose performance almost matches that of the best known algorithm for standard monomethod dispatching. Our geometric data structure has other applications as well, namely in two string matching problems: matching multiple rectangular patterns against a rectangular query text, and approximate dictionary matching with edit distance at most one. Our results f...
Efficient Randomized Dictionary Matching Algorithms (Extended Abstract)
, 1992
"... The standard string matching problem involves finding all occurrences of a single pattern in a single text. While this approach works well in many application areas, there are some domains in which it is more appropriate to deal with dictionaries of patterns. A dictionary is a set of patterns; the ..."
Abstract

Cited by 18 (5 self)
 Add to MetaCart
The standard string matching problem involves finding all occurrences of a single pattern in a single text. While this approach works well in many application areas, there are some domains in which it is more appropriate to deal with dictionaries of patterns. A dictionary is a set of patterns; the goal of dictionary matching is to find all dictionary patterns in a given text, simultaneously. In string matching, randomized algorithms have primarily made use of randomized hashing functions which convert strings into "signatures" or "finger prints". We explore the use of finger prints in conjunction with other randomized and deterministic techniques and data structures. We present several new algorithms for dictionary matching, along with parallel algorithms which are simpler of more efficient than previously known algorithms.