A.K.H.: Similarity evaluation on tree-structured data
 In: SIGMOD
, 2005
Cited by 32 (2 self)
Tree-structured data are becoming ubiquitous, and manipulating them based on similarity is essential for many applications. The generally accepted similarity measure for trees is the edit distance. Although similarity search has been extensively studied, searching for similar trees is still an open problem due to the high complexity of computing the tree edit distance. In this paper, we propose to transform tree-structured data into an approximate numerical multidimensional vector which encodes the original structure information. We prove that the L1 distance of the corresponding vectors, whose computational complexity is O(|T1| + |T2|), forms a lower bound for the edit distance between trees. Based on the theoretical analysis, we describe a novel algorithm which embeds the proposed distance into a filter-and-refine framework to process similarity search on tree-structured data. The experimental results show that our algorithm dramatically reduces the distance computation cost. Our method is especially suitable for accelerating similarity query processing on large trees in massive datasets.
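The filter-and-refine idea in this abstract can be illustrated with a minimal Python sketch. It does not reproduce the paper's vector embedding: as a stand-in, it uses a plain label histogram, whose L1 distance divided by 2 is a lower bound for the unit-cost tree edit distance (an insertion or deletion changes the histogram by 1, a relabeling by at most 2). The tree representation `(label, children)`, the function names, and the `exact_distance` callback are all assumptions made for illustration.

```python
from collections import Counter

# Assumed representation: a tree is a (label, children) pair,
# where children is a tuple of trees.

def label_histogram(tree):
    """Count how often each label occurs in the tree."""
    counts = Counter()
    stack = [tree]
    while stack:
        label, children = stack.pop()
        counts[label] += 1
        stack.extend(children)
    return counts

def l1_distance(h1, h2):
    """L1 distance between two label histograms."""
    return sum(abs(h1[k] - h2[k]) for k in h1.keys() | h2.keys())

def lower_bound(h1, h2):
    """Stand-in lower bound for the unit-cost tree edit distance.

    Each insert/delete changes the histogram L1 norm by 1 and each
    relabel by at most 2, so L1 / 2 never exceeds the edit distance.
    """
    return l1_distance(h1, h2) / 2

def range_query(query, dataset, threshold, exact_distance):
    """Return (tree, distance) pairs within threshold, plus the number
    of trees that needed the expensive exact computation."""
    query_hist = label_histogram(query)
    results = []
    refined = 0
    for tree in dataset:
        # Filter step: a cheap lower bound above the threshold rules
        # the tree out without computing the exact distance.
        if lower_bound(query_hist, label_histogram(tree)) > threshold:
            continue
        # Refine step: only surviving candidates pay the full cost.
        refined += 1
        d = exact_distance(query, tree)
        if d <= threshold:
            results.append((tree, d))
    return results, refined
```

Because the filter only discards trees whose lower bound already exceeds the threshold, the result set is the same as with a full scan; the saving comes from the reduced number of `exact_distance` calls. In practice the histograms of the dataset would be precomputed once rather than rebuilt per query.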
The Gapped-Factor Tree
 PRAGUE STRINGOLOGY CONFERENCE (PSC)
, 2006
Cited by 4 (0 self)
We present a data structure to index a specific kind of factors, that is, of substrings, called gapped-factors. A gapped-factor is a factor containing a gap that is ignored during indexing. The data structure presented is based on the suffix tree and indexes all the gapped-factors of a text with a fixed gap size, and only those. The data structure is constructed online in O(n × |Σ|) time and space, where n is the length of the text and |Σ| is the size of the alphabet. Such a data structure may play an important role in some pattern matching and motif inference problems, for instance in text filtration.
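To make the notion of a gapped-factor concrete, here is a naive Python sketch that indexes them in a dictionary. It is not the suffix-tree-based structure described in the abstract and does not achieve its online construction bound. The parameters `k`, `d`, and `k2` (lengths of the left part, the gap, and the right part) and the function name are assumptions made for illustration.

```python
from collections import defaultdict

def index_gapped_factors(text, k, d, k2):
    """Map each (left, right) pair to the start positions where the text
    contains left, then any d characters, then right.

    Assumed parameters: k = len(left), d = gap size, k2 = len(right).
    The d gap characters are skipped and never stored in the key.
    """
    index = defaultdict(list)
    span = k + d + k2
    for i in range(len(text) - span + 1):
        left = text[i:i + k]
        right = text[i + k + d:i + span]
        index[(left, right)].append(i)
    return index

# Both occurrences of "ac" followed by a one-character gap and then "t"
# share a key, although the gap holds "g" at position 0 and "c" at 4.
occurrences = index_gapped_factors("acgtacct", 2, 1, 1)[("ac", "t")]
```

Keying on the two flanking parts alone is what lets occurrences with different gap contents collapse into a single entry.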
Indexing gapped-factors using a tree
 INTERNATIONAL JOURNAL OF FOUNDATIONS OF COMPUTER SCIENCE
, 2008
Cited by 1 (0 self)
We present a data structure to index a specific kind of factors, that is, of substrings, called gapped-factors. A gapped-factor is a factor containing a gap that is ignored during indexing. The data structure presented is based on the suffix tree and indexes all the gapped-factors of a text with a fixed gap size, and only those. The data structure is constructed online in linear time and space. Such a data structure may play an important role in various pattern matching and motif inference problems, for instance in text filtration.
, 2002
"... This is the second annual progress report for the ALCOM-FT project, supported by the European ..."