Results 1 - 6 of 6
Lempel-Ziv parsing and sublinear-size index structures for string matching (Extended Abstract)
 Proc. 3rd South American Workshop on String Processing (WSP'96), 1996
Cited by 48 (1 self)
String matching over a long text can be significantly speeded up with an index structure formed by preprocessing the text. For very long texts, the size of such an index can be a problem. This paper presents the first sublinear-size index structure. The new structure is based on Lempel-Ziv parsing of the text and has size linear in N, the size of the Lempel-Ziv parse. For a text of length n, N = O(n / log n) and can be still smaller if the text is compressible. With the new index structure, all occurrences of a pattern string of length m can be found in time O(m^2 ...
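As a rough illustration of what a Lempel-Ziv parse looks like, here is a minimal sketch of an LZ78-style parse. The abstract does not say which Lempel-Ziv variant the paper uses, so the choice of LZ78 and the function name `lz78_parse` are assumptions made for this example only. Each phrase is the longest previously seen phrase extended by one character, which is why repetitive texts yield few phrases.

```python
def lz78_parse(text):
    """Split text into LZ78-style phrases (illustrative sketch, not the paper's code).

    Each new phrase is the longest phrase already in the dictionary,
    extended by one further character.
    """
    dictionary = {""}   # phrases seen so far; starts with the empty phrase
    phrases = []
    current = ""
    for ch in text:
        candidate = current + ch
        if candidate in dictionary:
            # Keep extending while the candidate is a known phrase.
            current = candidate
        else:
            # First unseen extension: emit it as a new phrase.
            dictionary.add(candidate)
            phrases.append(candidate)
            current = ""
    if current:
        # The text ended inside a known phrase; emit the remainder.
        phrases.append(current)
    return phrases
```

For example, `lz78_parse("aaaaaa")` produces three phrases for six characters; N in the abstract corresponds to the number of phrases.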
Efficient Searches for Similar Subsequences of Different Lengths in Sequence Databases
 In ICDE, 2000
Cited by 39 (4 self)
We propose an indexing technique for fast retrieval of similar subsequences using time warping distances. A time warping distance is a more suitable similarity measure than the Euclidean distance in many applications, where sequences may be of different lengths or different sampling rates. Our indexing technique uses a disk-based suffix tree as an index structure and employs lower-bound distance functions to filter out dissimilar subsequences without false dismissals. To make the index structure compact and thus accelerate the query processing, we convert sequences of continuous values to sequences of discrete values via a categorization method and store only a subset of suffixes whose first values are different from their preceding values. The experimental results reveal that our proposed technique can be a few orders of magnitude faster than sequential scanning.
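To make the notion of a time warping distance concrete, the following is a minimal sketch of the textbook dynamic-programming formulation. The abstract does not give the exact definition used in the paper, so the absolute difference as the per-element cost and the function name `time_warping_distance` are assumptions for illustration. It shows how sequences of different lengths can still be compared, because one element may be matched against several consecutive elements of the other sequence.

```python
def time_warping_distance(a, b):
    """Textbook dynamic time warping distance between two numeric sequences.

    Illustrative sketch: the per-element cost |x - y| is an assumption,
    not necessarily the base distance used in the paper.
    """
    inf = float("inf")
    n, m = len(a), len(b)
    # d[i][j] = warping distance between a[:i] and b[:j]
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three allowed alignments.
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]
```

Under this definition, `[1, 2, 3]` and `[1, 2, 2, 3]` are at distance 0, since the repeated 2 can be absorbed by the warping; a Euclidean distance is not even defined for sequences of different lengths.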
Finding Optimal Pairs of Cooperative and Competing Patterns with Bounded Distance
 In Proc. 7th International Conference on Discovery Science (DS’04), 2004
Cited by 5 (4 self)
We consider the problem of discovering the optimal pair of substring patterns with bounded distance α, from a given set S of strings.
Sparse Directed Acyclic Word Graphs
 in Proc. 13th International Symp. on String Processing and Information Retrieval (SPIRE’06), Lecture Notes in Computer Science, 2006
Cited by 3 (1 self)
Abstract. The suffix tree of string w is a text indexing structure that represents all suffixes of w. A sparse suffix tree of w represents only a subset of suffixes of w. An application of sparse suffix trees is composite pattern discovery from biological sequences. In this paper, we introduce a new data structure named sparse directed acyclic word graphs (SDAWGs), which are a sparse text indexing version of directed acyclic word graphs (DAWGs) of Blumer et al. We show that the size of SDAWGs is linear in the length of w, and present an online linear-time construction algorithm for SDAWGs.
Linear-time offline text compression by longest-first substitution
 in Proc. 10th International Symp. on String Processing and Information Retrieval (SPIRE’03), 2003
Cited by 3 (3 self)
Abstract. Given a text, grammar-based compression is to construct a grammar that generates the text. There are many kinds of text compression techniques of this type. Each compression scheme is categorized as being either offline or online, according to how a text is processed. One representative tactic for offline compression is to substitute the longest repeated factors of a text with a production rule. In this paper, we present an algorithm that compresses a text based on this longest-first principle, in linear time. The algorithm employs a suitable index structure for a text, and involves technically efficient operations on the structure.
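The longest-first principle can be sketched with a deliberately naive example. This is not the paper's linear-time algorithm, which relies on an index structure the abstract does not name; the brute-force search below is far slower and only illustrates one substitution step. The function names and the nonterminal symbol `"A"` are hypothetical, and the sketch assumes the nonterminal does not already occur in the text.

```python
def longest_repeated_factor(text):
    """Return the longest substring with two non-overlapping occurrences.

    Naive illustrative search; returns "" if no factor repeats.
    """
    n = len(text)
    for length in range(n // 2, 0, -1):
        first_seen = {}  # factor -> position of its first occurrence
        for i in range(n - length + 1):
            factor = text[i:i + length]
            first = first_seen.setdefault(factor, i)
            if i - first >= length:
                # This occurrence does not overlap the first one.
                return factor
    return ""


def substitute_once(text, nonterminal="A"):
    """Replace the longest repeated factor by a nonterminal (one step only).

    Returns the rewritten text and the production rule, or (text, None)
    if no factor of length at least 2 repeats.
    """
    factor = longest_repeated_factor(text)
    if len(factor) < 2:
        return text, None
    # str.replace substitutes non-overlapping occurrences left to right.
    return text.replace(factor, nonterminal), (nonterminal, factor)
```

Applied to `"abcabcabc"`, one step yields the text `"AAA"` together with the rule `A -> abc`; a full compressor would repeat the step with fresh nonterminals until no repeated factor of length at least 2 remains.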
METHODOLOGY ARTICLE Open Access
"... integration of multiscale data for the host-pathogen studies ..."