Results 1 - 7 of 7
A.K.H.: Similarity evaluation on tree-structured data
 In: SIGMOD, 2005
Abstract

Cited by 32 (2 self)
Tree-structured data are becoming ubiquitous, and manipulating them based on similarity is essential for many applications. The generally accepted similarity measure for trees is the edit distance. Although similarity search has been extensively studied, searching for similar trees is still an open problem due to the high complexity of computing the tree edit distance. In this paper, we propose to transform tree-structured data into an approximate numerical multidimensional vector which encodes the original structure information. We prove that the L1 distance of the corresponding vectors, whose computational complexity is O(|T1| + |T2|), forms a lower bound for the edit distance between trees. Based on the theoretical analysis, we describe a novel algorithm which embeds the proposed distance into a filter-and-refine framework to process similarity search on tree-structured data. The experimental results show that our algorithm dramatically reduces the cost of distance computation. Our method is especially suitable for accelerating similarity query processing on large trees in massive datasets.
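The filter-and-refine idea in this abstract can be illustrated with a small sketch. The abstract does not spell out the paper's vector embedding, so the code below swaps in a simpler, well-known stand-in: a vector of node-label counts, whose L1 distance divided by two is a lower bound on the unit-cost tree edit distance (a relabel changes the L1 distance by at most 2, an insert or delete by 1). The tree encoding as `(label, children)` tuples and all function names are assumptions made for illustration, and the exact distance is a plain memoized recursion meant only for small trees.

```python
from collections import Counter
from functools import lru_cache

# Assumed encoding: a tree is (label, children), children being a tuple of trees.

def label_counts(tree):
    """Embed a tree as a vector of label counts (stand-in embedding)."""
    counts = Counter()
    stack = [tree]
    while stack:
        label, children = stack.pop()
        counts[label] += 1
        stack.extend(children)
    return counts

def l1_lower_bound(c1, c2):
    """ceil(L1 / 2) of two label-count vectors; never exceeds the edit distance."""
    l1 = sum(abs(c1[k] - c2[k]) for k in c1.keys() | c2.keys())
    return (l1 + 1) // 2

def size(forest):
    """Number of nodes in an ordered forest (tuple of trees)."""
    return sum(1 + size(children) for _, children in forest)

@lru_cache(maxsize=None)
def forest_dist(f1, f2):
    """Unit-cost edit distance between two ordered forests (small inputs only)."""
    if not f1:
        return size(f2)
    if not f2:
        return size(f1)
    (l1, c1), (l2, c2) = f1[-1], f2[-1]
    return min(
        forest_dist(f1[:-1] + c1, f2) + 1,   # delete rightmost root of f1
        forest_dist(f1, f2[:-1] + c2) + 1,   # insert rightmost root of f2
        forest_dist(c1, c2) + forest_dist(f1[:-1], f2[:-1]) + (l1 != l2),  # match or relabel
    )

def tree_edit_distance(t1, t2):
    return forest_dist((t1,), (t2,))

def range_query(query, dataset, tau):
    """Return trees within edit distance tau, plus how many exact distances were computed."""
    q_counts = label_counts(query)
    matches, refined = [], 0
    for tree in dataset:
        # Filter: if the cheap lower bound already exceeds tau, skip the tree.
        if l1_lower_bound(q_counts, label_counts(tree)) > tau:
            continue
        # Refine: only surviving candidates pay for the exact distance.
        refined += 1
        if tree_edit_distance(query, tree) <= tau:
            matches.append(tree)
    return matches, refined
```

Because the filter is a true lower bound, it can discard trees but never a real answer, so the query result is the same as with exact distances alone.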
The Gapped-Factor Tree
 PRAGUE STRINGOLOGY CONFERENCE (PSC), 2006
Abstract

Cited by 4 (0 self)
We present a data structure to index a specific kind of factors (that is, substrings) called gapped-factors. A gapped-factor is a factor containing a gap that is ignored during indexing. The data structure is based on the suffix tree and indexes all the gapped-factors of a text with a fixed gap size, and only those. It is constructed online in O(n × |Σ|) time and space, with n the length of the text and |Σ| the size of the alphabet. Such a data structure may play an important role in some pattern matching and motif inference problems, for instance in text filtration.
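To make the notion of a gapped-factor concrete, here is a deliberately naive sketch. It is not the suffix-tree-based structure of the paper and does not achieve its time or space bounds; it only shows what "a factor containing a gap that is ignored" could mean under one assumed reading, in which a gapped-factor is `k` characters, then `gap` skipped characters, then `m` more characters. The function name and the parameters `k`, `gap`, and `m` are hypothetical.

```python
from collections import defaultdict

def build_gapped_index(text, k, gap, m):
    """Map each gapped-factor (k chars + m chars, with `gap` chars skipped
    in between) to the list of start positions where it occurs in `text`."""
    index = defaultdict(list)
    span = k + gap + m  # total window length, gap included
    for i in range(len(text) - span + 1):
        left = text[i:i + k]
        right = text[i + k + gap:i + span]
        # The characters inside the gap are ignored when building the key.
        index[left + right].append(i)
    return index

index = build_gapped_index("abXcdabYcd", k=2, gap=1, m=2)
print(index["abcd"])  # positions whose windows match "ab?cd"
```

Both "abXcd" and "abYcd" fall under the same key because the gap character is ignored, which is the behaviour that makes such an index useful for filtering approximate matches.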
Indexing gapped-factors using a tree
 INTERNATIONAL JOURNAL OF FOUNDATIONS OF COMPUTER SCIENCE, 2008
Abstract

Cited by 1 (0 self)
We present a data structure to index a specific kind of factors (that is, substrings) called gapped-factors. A gapped-factor is a factor containing a gap that is ignored during indexing. The data structure is based on the suffix tree and indexes all the gapped-factors of a text with a fixed gap size, and only those. It is constructed online in linear time and space. Such a data structure may play an important role in various pattern matching and motif inference problems, for instance in text filtration.
Similarity Evaluation on Tree-structured Data
Abstract
Tree-structured data are becoming ubiquitous, and manipulating them based on similarity is essential for many applications. The generally accepted similarity measure for trees is the edit distance. Although similarity search has been extensively studied, searching for similar trees is still an open problem due to the high complexity of computing the tree edit distance. In this paper, we propose to transform tree-structured data into an approximate numerical multidimensional vector which encodes the original structure information. We prove that the L1 distance of the corresponding vectors, whose computational complexity is O(|T1| + |T2|), forms a lower bound for the edit distance between trees. Based on the theoretical analysis, we describe a novel algorithm which embeds the proposed distance into a filter-and-refine framework to process similarity search on tree-structured data. The experimental results show that our algorithm dramatically reduces the cost of distance computation. Our method is especially suitable for accelerating similarity query processing on large trees in massive datasets.
Barcelona Aarhus Barcelona, 2002
Abstract
This is the second annual progress report for the ALCOM-FT project, supported by the European
Indexing gapped-factors using a tree
 International Journal of Foundations of Computer Science, World Scientific Publishing Company
Abstract
We present a data structure to index a specific kind of factors (that is, substrings) called gapped-factors. A gapped-factor is a factor containing a gap that is ignored during indexing. The data structure is based on the suffix tree and indexes all the gapped-factors of a text with a fixed gap size, and only those. It is constructed online in linear time and space. Such a data structure may play an important role in various pattern matching and motif inference problems, for instance in text filtration.